Image generation — landscape and workflows
State of the OSS image-gen stack as of 2026-05 on the Framework Desktop
(Strix Halo gfx1151, 128 GB unified, ComfyUI via
kyuz0/amd-strix-halo-comfyui). Practical operator guide, not a tour
of the diffusion literature.
What's deployed today
- ComfyUI at http://framework:8188 (compose stack: comfyui). Manual-start only; competes with kimi-linear for GPU.
- Flux.1-Dev (/srv/docker/comfyui/models/diffusion_models/flux1-dev.safetensors) + CLIPs + VAE. Workhorse text-to-image model.
- t5xxl in both fp8 and fp16 variants — templates may reference either.
See pyinfra/framework/compose/comfyui/README.md
for bring-up and model dir layout.
Hardware envelope
Flux.1-Dev was trained on 1024×1024. Peak memory grows roughly linearly with pixel count, so quadratically with edge length. With 128 GB unified, the quality wall hits well before the memory wall:
| Resolution | Memory peak | Quality reality |
|---|---|---|
| 1024×1024 (native) | ~14 GB | Sharp, composition correct |
| 1024×1536, 1536×1024 (native ratios) | ~18 GB | Native-class — trained on these |
| 1536×1536 | ~30 GB | Soft drift in composition |
| 2048×2048 | ~50-60 GB | Memory fits; duplicated subjects, anatomy errors |
| 3072×3072 | ~120 GB | At the wall; quality degrades meaningfully |
| 4096+ | OOM territory | Don't direct-render here |
Rule of thumb: render at 1024-class, upscale to anything larger. Direct 2K+ rendering on Flux is technically possible on this box but produces visibly worse images than upscaling. For native >1K rendering, use a model trained for it (HiDream-O1 — see Model landscape).
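To sanity-check a resolution before queueing it, the table can be approximated with a one-line estimate. This is only a sketch: it assumes peak memory is linear in pixel count and anchors on the ~14 GB figure for 1024×1024, which is a rough fit to the rows above rather than a measured model. The function name `estimate_peak_gb` is made up for illustration.

```shell
# Rough peak-memory estimate for a Flux.1-Dev render, in whole GB.
# Assumption: memory is linear in pixel count, anchored at ~14 GB for 1024x1024.
estimate_peak_gb() (
    width=$1
    height=$2
    base_gb=14                      # table value for 1024x1024
    base_pixels=$((1024 * 1024))
    echo $(( base_gb * width * height / base_pixels ))
)

estimate_peak_gb 2048 2048   # lands in the same ballpark as the table's 2048x2048 row
```

Treat the result as a memory check only. It says nothing about the quality wall, which arrives much earlier.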
Workflow patterns
1. Native render
The baseline. Template: Workflow → Browse Templates → Flux.1 Dev. 1024×1024, fp8 t5xxl, ~20-30 sampling steps. First generation takes ~5 min (kernel JIT); subsequent ~30-60 sec.
2. Hires fix — same seed, higher final resolution
The standard "I like the small render, give me the same thing bigger" pattern. Latent upscale + second sampler pass:
[KSampler 1024×1024 @ seed]
↓
[Upscale Latent By 2.0]
↓
[KSampler 2048×2048 @ same seed, denoise=0.4–0.6]
↓
[VAE Decode]
Denoise knob:
- 0.3–0.4 — stays very close to the small render, adds detail
- 0.5–0.6 — preserves composition, reinterprets details
- 0.7+ — composition drifts; treat as "variation at higher res"
Bundled ComfyUI templates include "Flux Hires Fix" — load that and edit the prompt/seed.
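For non-square renders or scale factors other than 2.0, it helps to work out the second-pass resolution up front. The sketch below assumes Flux wants width and height divisible by 16 (an assumption, not something this doc verifies) and takes the scale in percent so integer arithmetic is enough. `hires_dims` is a hypothetical helper name.

```shell
# Second-pass resolution for a hires fix, snapped down to a multiple of 16.
# Assumption: Flux wants width/height divisible by 16. Scale is in percent,
# so 200 corresponds to "Upscale Latent By 2.0".
hires_dims() (
    width=$1
    height=$2
    scale_pct=$3
    new_w=$(( width * scale_pct / 100 / 16 * 16 ))
    new_h=$(( height * scale_pct / 100 / 16 * 16 ))
    echo "$new_w $new_h"
)

hires_dims 1024 1024 200   # the 1024 -> 2048 case from the diagram above
```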
3. Image upscale + img2img refine
Cheaper than hires fix, less faithful to the original prompt. Decode the small image, run an ESRGAN-class upscaler on it, then img2img with low denoise to sharpen:
[Render 1024×1024]
↓
[Upscale Image with Model: 4x-UltraSharp or 4x_NMKD-Siax]
↓
[VAE Encode] → [KSampler denoise=0.2–0.3]
↓
[VAE Decode]
Faster than hires fix because the upscaler is a small model; the diffusion pass only refines existing detail rather than re-rendering. Best when the original render is already correct and you just want sharper output.
4. Tiled upscale (>2× scaling)
For 4K / 8K targets without OOM, process the upscaled image in
overlapping tiles. Custom node:
ComfyUI_UltimateSDUpscale.
Each tile is its own sampler pass; tile overlap blends seams. Slower
end-to-end but uses constant memory regardless of final resolution.
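Since each tile is a separate sampler pass, the tile count is a fair proxy for run time. The sketch below uses generic sliding-window arithmetic (stride = tile size minus overlap) and is not claimed to match the custom node's exact tiling. The tile size and overlap are illustrative values, and the helper names are hypothetical.

```shell
# Tiles needed along one axis when tiles of size `tile` overlap by `overlap` px.
# Requires overlap < tile.
tiles_per_axis() (
    length=$1
    tile=$2
    overlap=$3
    stride=$(( tile - overlap ))
    if [ "$length" -le "$tile" ]; then
        echo 1
    else
        # ceil((length - overlap) / stride) using integer arithmetic
        echo $(( (length - overlap + stride - 1) / stride ))
    fi
)

# Total sampler passes for a width x height target.
# Arguments: width height tile overlap
tile_count() (
    cols=$(tiles_per_axis "$1" "$3" "$4")
    rows=$(tiles_per_axis "$2" "$3" "$4")
    echo $(( cols * rows ))
)

tile_count 4096 4096 1024 64   # 5 x 5 tiles under these assumptions
```

Doubling the edge length roughly quadruples the number of passes, which is where the "slower end-to-end" cost comes from.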
Reference / style conditioning
Three distinct tools, picked by what you want preserved from the reference:
| Tool | Reference role | When to use |
|---|---|---|
| Flux Redux | "In the style of this" | Single inspiration image + text prompt — start here |
| IP-Adapter Flux | "Blend these N references with weights" | Multiple inspirations, fine control over each |
| ControlNet (Flux) | "Match this structure" — edges, depth, pose | Composition matters more than style |
Flux Redux (recommended starting point)
Black Forest Labs' official "remix this image" adapter. Pass an inspiration image + a text prompt describing what you want generated; Redux blends the visual cues into the text-conditioned output.
[Load Reference Image] → [Redux Style Model]
↓
[Text Prompt] ────────→ [Conditioning combiner]
↓
[KSampler] → output
Models needed:
- black-forest-labs/FLUX.1-Redux-dev → flux1-redux-dev.safetensors (gated; accept the license as for Flux.1-Dev)
- google/siglip-so400m-patch14-384 → image encoder for Redux
These go in /srv/docker/comfyui/models/style_models/ and
/srv/docker/comfyui/models/clip_vision/ respectively.
IP-Adapter Flux
Community-built reference-conditioning adapter with per-image weight
sliders. Better when you want to blend multiple inspirations or
control intensity precisely. Custom node:
ComfyUI_IPAdapter_plus
(supports Flux as of 2025). Heavier setup than Redux.
ControlNet Flux
Geometric/structural conditioning — pass a canny edge map / depth map / pose skeleton extracted from a reference, and Flux follows that structure. Different question from "style of": this preserves composition, not look.
XLabs-AI/flux-controlnet-collections
ships Canny, Depth, HED variants.
Model landscape
| Model | Strength | When |
|---|---|---|
| Flux.1-Dev | Prompt adherence, current open SOTA on most tasks | Default; already deployed |
| Flux.1-Schnell | Distilled 4-step Flux | When you want 4× speed and can tolerate quality drop |
| HiDream-O1-Image-Dev | Pixel-native 2K, better text rendering | When native >1K matters (signs, posters, UI mockups) |
| Wan-Image-2 | Stylized / illustrative range | Specific aesthetics not Flux's strong suit |
| SDXL + community LoRAs | Mature LoRA ecosystem | When a specific LoRA exists only for SDXL |
| Flux.2 | (Newer, presumably better) | ⚠ Crashes on Strix Halo ROCm as of 2026-05; skip until upstream patch |
HiDream-O1 is the recommended next pull when you want higher native resolution. The May 2026 multimodal research brief describes it as: "native 2K, much better text rendering than Flux, beats Flux.2 on benchmarks at 1/7 the params."
Custom nodes worth installing
Drop these in /srv/docker/comfyui/custom_nodes/ (RW bind-mount —
survives container recreate):
| Repo | Why |
|---|---|
| ComfyUI_essentials | Already bundled in kyuz0 image. General-purpose utility nodes. |
| ComfyUI-AMDGPUMonitor | Already bundled. VRAM monitoring in the UI. |
| ComfyUI-GGUF | Already bundled. Lets you use GGUF-quantized Flux variants. |
| ComfyUI_UltimateSDUpscale | Tiled upscale for 4K+ outputs. Manual install. |
| ComfyUI_IPAdapter_plus | Multi-reference style conditioning. Manual install. |
| ComfyUI-Manager | Install/update custom nodes from inside the UI. Quality-of-life. |
Install pattern for a custom node:
cd /srv/docker/comfyui/custom_nodes
# --recursive also fetches any git submodules the node ships
git clone --recursive https://github.com/ssitu/ComfyUI_UltimateSDUpscale
# Restart ComfyUI to pick up the new node
cd /srv/docker/comfyui && docker compose restart
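If you install nodes often, the pattern above can be wrapped so that re-running it updates an existing checkout instead of failing. This is a minimal sketch: `install_node`, `node_dir_name`, and the `CUSTOM_NODES_DIR` override are hypothetical names, and the default path is the one used in this doc.

```shell
# Directory name a repo URL clones into (strips a trailing ".git" if present).
node_dir_name() {
    basename "$1" .git
}

# Clone a custom node if it is missing, otherwise fast-forward it.
# Assumption: nodes live under /srv/docker/comfyui/custom_nodes unless
# CUSTOM_NODES_DIR (a hypothetical override) says otherwise.
install_node() {
    repo_url=$1
    nodes_dir=${CUSTOM_NODES_DIR:-/srv/docker/comfyui/custom_nodes}
    target="$nodes_dir/$(node_dir_name "$repo_url")"
    if [ -d "$target/.git" ]; then
        git -C "$target" pull --ff-only
    else
        # --recursive also fetches submodules, which some nodes rely on
        git clone --recursive "$repo_url" "$target"
    fi
}

# Example (not run here):
#   install_node https://github.com/ssitu/ComfyUI_UltimateSDUpscale
```

ComfyUI still needs a restart afterwards to pick up the node.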
Practical "starter kit" downloads
If you want to unlock the four main workflow patterns at once:
# Upscaler (for img2img refine + tiled upscale; hires fix upscales in latent space)
hf download Kim2091/UltraSharp \
4x-UltraSharp.pth \
--local-dir /srv/docker/comfyui/models/upscale_models
# Flux Redux for "style of these inspirations" (gated — accept license)
hf download black-forest-labs/FLUX.1-Redux-dev \
flux1-redux-dev.safetensors \
--local-dir /srv/docker/comfyui/models/style_models
# Image encoder Redux needs
hf download google/siglip-so400m-patch14-384 \
model.safetensors \
--local-dir /srv/docker/comfyui/models/clip_vision
# Optional: HiDream for native 2K rendering
# hf download HiDream-AI/HiDream-I1-Dev ... (verify path on HF)
After download, refresh the browser tab so ComfyUI rescans the dirs.
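A quick way to confirm the downloads landed where the workflows expect them is to check for the three files by name. The sketch below assumes the file names and directories from the commands above. `check_starter_kit` is a hypothetical helper, and it only tests that the files exist, not that they are complete or loadable.

```shell
# Report which starter-kit files are missing under a models root.
# Exits 0 when everything is present, 1 otherwise.
check_starter_kit() (
    root=${1:-/srv/docker/comfyui/models}
    missing=0
    for rel in \
        upscale_models/4x-UltraSharp.pth \
        style_models/flux1-redux-dev.safetensors \
        clip_vision/model.safetensors
    do
        if [ ! -f "$root/$rel" ]; then
            echo "missing: $rel"
            missing=1
        fi
    done
    exit "$missing"   # body runs in a subshell, so this only ends the function
)

# Example (not run here):
#   check_starter_kit && echo "starter kit complete"
```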
Operational notes
- First generation is slow (kernel JIT). Subsequent ones are much faster.
- --disable-mmap is mandatory for gfx1151; already set in compose. Don't remove.
- VAE OOMs at high res: --bf16-vae mitigates (set). For >2K outputs, decode in tiles.
- Concurrent generation: ComfyUI queues requests and runs them one after another, so a busy workflow can chain N images on one prompt.
- Output lands in /srv/docker/comfyui/output/ as well as in the UI.
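The queue can also be fed from the command line, which is handy for batch runs. ComfyUI's HTTP server accepts a POST to /prompt whose JSON body carries the workflow (exported in API format) under a "prompt" key. The sketch below assumes that export already exists as a file; `build_payload`, `queue_workflow`, and the `COMFY_URL` override are hypothetical names.

```shell
# Wrap an API-format workflow JSON file in the body expected by POST /prompt.
build_payload() {
    printf '{"prompt": %s}' "$(cat "$1")"
}

# Queue a workflow file N times; ComfyUI works through the queue in order.
# Assumptions: the COMFY_URL default below, and that the workflow changes
# its own seed between runs if distinct images are wanted.
queue_workflow() {
    workflow_file=$1
    count=${2:-1}
    comfy_url=${COMFY_URL:-http://framework:8188}
    i=0
    while [ "$i" -lt "$count" ]; do
        build_payload "$workflow_file" |
            curl -sS -X POST -H 'Content-Type: application/json' \
                 --data-binary @- "$comfy_url/prompt"
        i=$(( i + 1 ))
    done
}

# Example (not run here; the file name is illustrative):
#   queue_workflow flux_dev_api.json 4
```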
Pointers to other docs
- Bring-up + model dir layout: pyinfra/framework/compose/comfyui/README.md
- Top-level coding-workflow status: README.md
- Strategic landscape (Layer 0 image/video/audio): Roadmap.md