# Image generation: landscape and workflows

State of the OSS image-gen stack as of 2026-05 on the Framework Desktop (Strix Halo gfx1151, 128 GB unified, ComfyUI via `kyuz0/amd-strix-halo-comfyui`). Practical operator guide, not a tour of the diffusion literature.

## What's deployed today

- **ComfyUI** at `http://framework:8188` (compose stack: `comfyui`). Manual-start only; competes with kimi-linear for GPU.
- **Flux.1-Dev** (`/srv/docker/comfyui/models/diffusion_models/flux1-dev.safetensors`) + CLIPs + VAE. Workhorse text-to-image model.
- **t5xxl** in both fp8 and fp16 variants; templates may reference either.

See [`pyinfra/framework/compose/comfyui/README.md`](pyinfra/framework/compose/comfyui/README.md) for bring-up and model dir layout.

## Hardware envelope

Flux.1-Dev was trained on 1024×1024. Peak memory grows roughly in proportion to pixel count, i.e. quadratically with edge length. With 128 GB unified the **quality wall hits well before the memory wall**:

| Resolution | Memory peak | Quality reality |
|---|---|---|
| 1024×1024 (native) | ~14 GB | Sharp, composition correct |
| 1024×1536, 1536×1024 (native ratios) | ~18 GB | Native-class (trained on these) |
| 1536×1536 | ~30 GB | Soft drift in composition |
| 2048×2048 | ~50-60 GB | Memory fits; duplicated subjects, anatomy errors |
| 3072×3072 | ~120 GB | At the wall; quality degrades meaningfully |
| 4096+ | OOM territory | Don't direct-render here |

**Rule of thumb**: render at 1024-class, *upscale* to anything larger. Direct 2K+ rendering on Flux is technically possible on this box but produces visibly worse images than upscaling. For native >1K rendering, use a model trained for it (HiDream-O1; see [Model landscape](#model-landscape)).

## Workflow patterns

### 1. Native render

The baseline. Template: **Workflow → Browse Templates → Flux.1 Dev**. 1024×1024, fp8 t5xxl, ~20-30 sampling steps. First generation takes ~5 min (kernel JIT); subsequent ones take ~30-60 sec.

### 2. Hires fix: same seed, higher final resolution

The standard "I like the small render, give me the same thing bigger" pattern. Latent upscale + second sampler pass:

```
[KSampler 1024×1024 @ seed]
        ↓
[Upscale Latent By 2.0]
        ↓
[KSampler 2048×2048 @ same seed, denoise=0.4–0.6]
        ↓
[VAE Decode]
```

Denoise knob:

- **0.3–0.4**: stays very close to the small render, adds detail
- **0.5–0.6**: preserves composition, reinterprets details
- **0.7+**: composition drifts; treat as "variation at higher res"

Bundled ComfyUI templates include "Flux Hires Fix"; load that and edit the prompt/seed.

### 3. Image upscale + img2img refine

Cheaper than hires fix, less faithful to the original prompt. Decode the small image, run an ESRGAN-class upscaler on it, then img2img with low denoise to sharpen:

```
[Render 1024×1024]
        ↓
[Upscale Image with Model: 4x-UltraSharp or 4x_NMKD-Siax]
        ↓
[VAE Encode] → [KSampler denoise=0.2–0.3]
        ↓
[VAE Decode]
```

Faster than hires fix because the upscaler is a small model; the diffusion pass only refines existing detail rather than re-rendering. Best when the original render is already correct and you just want sharper output.

### 4. Tiled upscale (>2× scaling)

For 4K / 8K targets without OOM, process the upscaled image in overlapping tiles. Custom node: [`ComfyUI_UltimateSDUpscale`](https://github.com/ssitu/ComfyUI_UltimateSDUpscale). Each tile is its own sampler pass; tile overlap blends seams. Slower end-to-end but uses constant memory regardless of final resolution.

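
To get a feel for how much slower tiled upscaling is, it helps to count the sampler passes. The sketch below covers an image with a generic overlapping grid; the 1024-pixel tile and 64-pixel overlap are illustrative assumptions, and the custom node's own tiling and padding parameters may differ.

```python
import math


def tile_grid(width, height, tile=1024, overlap=64):
    """Return (columns, rows) of overlapping tiles needed to cover an image.

    tile and overlap are hypothetical defaults, not values taken from
    ComfyUI_UltimateSDUpscale.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than the tile size")
    stride = tile - overlap  # each additional tile advances by this many pixels

    def along(length):
        # One tile covers the first `tile` pixels; the rest is covered stride by stride.
        if length <= tile:
            return 1
        return 1 + math.ceil((length - tile) / stride)

    return along(width), along(height)


for side in (2048, 4096, 8192):
    cols, rows = tile_grid(side, side)
    print(f"{side}x{side}: {cols} x {rows} = {cols * rows} sampler passes")
```

Memory stays flat because each pass only ever sees one tile, while wall-clock time grows with the pass count.
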
## Reference / style conditioning

Three distinct tools, picked by what you want preserved from the reference:

| Tool | Reference role | When to use |
|---|---|---|
| **Flux Redux** | "In the style of this" | Single inspiration image + text prompt; start here |
| **IP-Adapter Flux** | "Blend these N references with weights" | Multiple inspirations, fine control over each |
| **ControlNet (Flux)** | "Match this structure" (edges, depth, pose) | Composition matters more than style |

### Flux Redux (recommended starting point)

Black Forest Labs' official "remix this image" adapter. Pass an inspiration image + a text prompt describing what you want generated; Redux blends the visual cues into the text-conditioned output.

```
[Load Reference Image] → [Redux Style Model]
                                ↓
[Text Prompt] ────────→ [Conditioning combiner]
                                ↓
                        [KSampler] → output
```

Models needed:

- `black-forest-labs/FLUX.1-Redux-dev` → `flux1-redux-dev.safetensors` (gated; accept the license as for Flux.1-Dev)
- `google/siglip-so400m-patch14-384` → image encoder for Redux

These go in `/srv/docker/comfyui/models/style_models/` and `/srv/docker/comfyui/models/clip_vision/` respectively.

### IP-Adapter Flux

Community-built reference-conditioning adapter with per-image weight sliders. Better when you want to blend multiple inspirations or control intensity precisely. Custom node: [`ComfyUI_IPAdapter_plus`](https://github.com/cubiq/ComfyUI_IPAdapter_plus) (supports Flux as of 2025). Heavier setup than Redux.

### ControlNet Flux

Geometric/structural conditioning: pass a Canny edge map / depth map / pose skeleton extracted from a reference, and Flux follows that structure. Different question from "style of": this preserves *composition*, not look. [`XLabs-AI/flux-controlnet-collections`](https://huggingface.co/XLabs-AI/flux-controlnet-collections) ships Canny, Depth, HED variants.

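
Of the three tools above, Redux is the one whose file locations are spelled out, so its setup can be checked mechanically before loading a workflow. The following is a minimal sketch: the function name is hypothetical, and because the encoder's file name is not fixed here, it accepts any `.safetensors` file in `clip_vision/`.

```python
from pathlib import Path


def redux_missing(models_root="/srv/docker/comfyui/models"):
    """List what still appears to be missing for a Flux Redux workflow."""
    root = Path(models_root)
    missing = []

    # The Redux adapter itself, under the file name given in this guide.
    style_model = root / "style_models" / "flux1-redux-dev.safetensors"
    if not style_model.is_file():
        missing.append(str(style_model))

    # The image encoder: any weights file counts (assumption, see lead-in).
    clip_vision = root / "clip_vision"
    has_encoder = clip_vision.is_dir() and any(clip_vision.glob("*.safetensors"))
    if not has_encoder:
        missing.append(str(clip_vision / "*.safetensors"))

    return missing


for item in redux_missing():
    print("missing:", item)
```

An empty result only says the files exist; it does not confirm that the gated download completed or that the weights load.
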
## Model landscape

| Model | Strength | When |
|---|---|---|
| **Flux.1-Dev** | Prompt adherence, current open SOTA on most tasks | Default; already deployed |
| **Flux.1-Schnell** | Distilled 4-step Flux | When you want 4× speed and can tolerate quality drop |
| **HiDream-O1-Image-Dev** | Pixel-native 2K, better text rendering | When native >1K matters (signs, posters, UI mockups) |
| **Wan-Image-2** | Stylized / illustrative range | Specific aesthetics not Flux's strong suit |
| **SDXL** + community LoRAs | Mature LoRA ecosystem | When a specific LoRA exists only for SDXL |
| **Flux.2** | (Newer, presumably better) | ⚠ **Crashes on Strix Halo ROCm** as of 2026-05; skip until upstream patch |

**HiDream-O1** is the recommended next pull when you want image generation at >1K native resolution. Quote from the May 2026 multimodal research brief: "native 2K, much better text rendering than Flux, beats Flux.2 on benchmarks at 1/7 the params."

## Custom nodes worth installing

Drop these in `/srv/docker/comfyui/custom_nodes/` (RW bind-mount, so they survive a container recreate):

| Repo | Why |
|---|---|
| [`ComfyUI_essentials`](https://github.com/cubiq/ComfyUI_essentials) | Already bundled in kyuz0 image. General-purpose utility nodes. |
| [`ComfyUI-AMDGPUMonitor`](https://github.com/general/ComfyUI-AMDGPUMonitor) | Already bundled. VRAM monitoring in the UI. |
| [`ComfyUI-GGUF`](https://github.com/city96/ComfyUI-GGUF) | Already bundled. Lets you use GGUF-quantized Flux variants. |
| [`ComfyUI_UltimateSDUpscale`](https://github.com/ssitu/ComfyUI_UltimateSDUpscale) | Tiled upscale for 4K+ outputs. Manual install. |
| [`ComfyUI_IPAdapter_plus`](https://github.com/cubiq/ComfyUI_IPAdapter_plus) | Multi-reference style conditioning. Manual install. |
| [`ComfyUI-Manager`](https://github.com/ltdrdata/ComfyUI-Manager) | Install/update custom nodes from inside the UI. Quality-of-life. |

Install pattern for a custom node:

```sh
cd /srv/docker/comfyui/custom_nodes
# --recursive also fetches any git submodules the node repo relies on
git clone --recursive https://github.com/ssitu/ComfyUI_UltimateSDUpscale
# Restart ComfyUI to pick up the new node
cd /srv/docker/comfyui && docker compose restart
```

## Practical "starter kit" downloads

If you want to unlock the four main workflow patterns at once:

```sh
# Upscaler (for image upscale + img2img refine, and tiled upscale)
hf download Kim2091/UltraSharp \
  4x-UltraSharp.pth \
  --local-dir /srv/docker/comfyui/models/upscale_models

# Flux Redux for "style of these inspirations" (gated; accept license)
hf download black-forest-labs/FLUX.1-Redux-dev \
  flux1-redux-dev.safetensors \
  --local-dir /srv/docker/comfyui/models/style_models

# Image encoder Redux needs
hf download google/siglip-so400m-patch14-384 \
  model.safetensors \
  --local-dir /srv/docker/comfyui/models/clip_vision

# Optional: HiDream for native 2K rendering
# hf download HiDream-AI/HiDream-I1-Dev ... (verify path on HF)
```

After download, refresh the browser tab so ComfyUI rescans the dirs.

## Operational notes

- **First generation is slow** (kernel JIT). Subsequent ones are much faster.
- **`--disable-mmap` is mandatory** for gfx1151; it is already set in compose. Don't remove.
- **VAE OOMs at high res**: `--bf16-vae` mitigates (set). For >2K outputs, decode in tiles.
- **Concurrent generation**: ComfyUI queues requests rather than running them in parallel, so you can line up N images from one prompt and they render back to back.
- **Output lands in `/srv/docker/comfyui/output/`** as well as in the UI.

## Pointers to other docs

- Bring-up + model dir layout: [`pyinfra/framework/compose/comfyui/README.md`](pyinfra/framework/compose/comfyui/README.md)
- Top-level coding-workflow status: [`README.md`](README.md)
- Strategic landscape (Layer 0 image/video/audio): [`Roadmap.md`](Roadmap.md)

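
## Sketch: queueing a batch from a script

The queueing behaviour noted under Operational notes can also be driven from outside the UI, since ComfyUI accepts workflows as JSON over its `/prompt` HTTP endpoint. The sketch below assumes a workflow exported from the UI in API format; the node id `"25"` and the `noise_seed` field are hypothetical placeholders that depend on the exported graph, and `workflow_api.json` is an illustrative file name.

```python
import copy
import json
import urllib.request


def build_payloads(workflow, seeds, seed_node="25", seed_field="noise_seed"):
    """Return one /prompt request body per seed, leaving the input workflow untouched.

    seed_node and seed_field are placeholders: look them up in your own export.
    """
    payloads = []
    for seed in seeds:
        wf = copy.deepcopy(workflow)
        wf[seed_node]["inputs"][seed_field] = seed
        payloads.append({"prompt": wf})
    return payloads


def queue_all(payloads, base_url="http://framework:8188"):
    """POST each body to ComfyUI, which appends it to its queue."""
    for body in payloads:
        req = urllib.request.Request(
            f"{base_url}/prompt",
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            print(json.load(resp))  # server's JSON acknowledgement


# Usage (needs a running ComfyUI, so it is left commented out):
# with open("workflow_api.json") as fh:
#     workflow = json.load(fh)
# queue_all(build_payloads(workflow, seeds=[1, 2, 3]))
```

Copying the workflow per seed keeps the loaded template reusable for later batches; the jobs themselves still run one at a time on the server.
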