noisedestroyers de1635872f progress 235b

2026-06-08 15:31:50 +01:00

9.1 KiB

Image generation — landscape and workflows

State of the OSS image-gen stack as of 2026-05 on the Framework Desktop (Strix Halo gfx1151, 128 GB unified, ComfyUI via kyuz0/amd-strix-halo-comfyui). Practical operator guide, not a tour of the diffusion literature.

What's deployed today

ComfyUI at http://framework:8188 (compose stack: comfyui). Manual-start only; competes with kimi-linear for GPU.
Flux.1-Dev (/srv/docker/comfyui/models/diffusion_models/flux1-dev.safetensors) + CLIPs + VAE. Workhorse text-to-image model.
t5xxl in both fp8 and fp16 variants — templates may reference either.

See pyinfra/framework/compose/comfyui/README.md for bring-up and model dir layout.

Hardware envelope

Flux.1-Dev was trained at 1024×1024. Peak memory grows roughly in proportion to pixel count (so quadratically with edge length). With 128 GB unified memory, the quality wall hits well before the memory wall:

Resolution	Memory peak	Quality reality
1024×1024 (native)	~14 GB	Sharp, composition correct
1024×1536, 1536×1024 (native ratios)	~18 GB	Native-class — trained on these
1536×1536	~30 GB	Soft drift in composition
2048×2048	~50-60 GB	Memory fits; duplicated subjects, anatomy errors
3072×3072	~120 GB	At the wall; quality degrades meaningfully
4096+	OOM territory	Don't direct-render here

Rule of thumb: render at 1024-class, upscale to anything larger. Direct 2K+ rendering on Flux is technically possible on this box but produces visibly worse images than upscaling. For native >1K rendering, use a model trained for it (HiDream-O1 — see Model landscape).
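As a quick sanity check before queueing a large render, the table above can be interpolated with a single GB-per-megapixel factor. This is a minimal sketch: the function name `estimate_peak_gb` and the 13.3 GB/MP factor are assumptions fitted by eye to the table rows, not measured constants.

```shell
#!/usr/bin/env bash
# Rough peak-memory estimate for a Flux.1-Dev render on this box.
# GB_PER_MP is an assumption fitted by eye to the table above
# (about 14 GB at 1024x1024, about 120 GB at 3072x3072).
GB_PER_MP=13.3

estimate_peak_gb() {
    local width=$1 height=$2
    # One "megapixel" here is 1024*1024 pixels, matching the 1024-class rows.
    awk -v w="$width" -v h="$height" -v k="$GB_PER_MP" \
        'BEGIN { printf "%.0f\n", k * (w * h) / (1024 * 1024) }'
}
```

For example, `estimate_peak_gb 2048 2048` prints `53`, inside the ~50-60 GB band in the table. The estimate says nothing about quality, which is the real limit above 1024-class sizes.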

Workflow patterns

1. Native render

The baseline. Template: Workflow → Browse Templates → Flux.1 Dev. 1024×1024, fp8 t5xxl, ~20-30 sampling steps. The first generation takes ~5 min (kernel JIT); subsequent ones take ~30-60 sec.

2. Hires fix — same seed, higher final resolution

The standard "I like the small render, give me the same thing bigger" pattern. Latent upscale + second sampler pass:

[KSampler  1024×1024 @ seed]
        ↓
[Upscale Latent By 2.0]
        ↓
[KSampler 2048×2048 @ same seed, denoise=0.4–0.6]
        ↓
[VAE Decode]

Denoise knob:

0.3–0.4 — stays very close to the small render, adds detail
0.5–0.6 — preserves composition, reinterprets details
0.7+ — composition drifts; treat as "variation at higher res"

Bundled ComfyUI templates include "Flux Hires Fix" — load that and edit the prompt/seed.
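The two numbers you set for the second pass (target size and denoise) can be sketched as small helpers. The function names are hypothetical. Rounding to a multiple of 16 is an assumption about keeping the latent grid aligned, and denoise values that fall between the documented bands are assigned to the next band up.

```shell
#!/usr/bin/env bash
# Target size for the second sampler pass: scale each edge, then round
# down to a multiple of 16 (assumption: keeps the latent grid aligned).
hires_target() {
    local width=$1 height=$2 scale=$3
    awk -v w="$width" -v h="$height" -v s="$scale" 'BEGIN {
        printf "%dx%d\n", int(w * s / 16) * 16, int(h * s / 16) * 16
    }'
}

# Map a denoise value onto the bands listed above. Values between the
# documented bands go to the next band up (an assumption).
denoise_band() {
    awk -v d="$1" 'BEGIN {
        if (d < 0.3)       print "below documented range"
        else if (d <= 0.4) print "close to small render, adds detail"
        else if (d <= 0.6) print "keeps composition, reinterprets details"
        else               print "composition drifts"
    }'
}
```

`hires_target 1024 1024 2.0` prints `2048x2048`, matching the diagram above.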

3. Image upscale + img2img refine

Cheaper than hires fix, less faithful to the original prompt. Decode the small image, run an ESRGAN-class upscaler on it, then img2img with low denoise to sharpen:

[Render 1024×1024]
       ↓
[Upscale Image with Model: 4x-UltraSharp or 4x_NMKD-Siax]
       ↓
[VAE Encode] → [KSampler denoise=0.2–0.3]
       ↓
[VAE Decode]

Faster than hires fix because the upscaler is a small model; the diffusion pass only refines existing detail rather than re-rendering. Best when the original render is already correct and you just want sharper output.

4. Tiled upscale (>2× scaling)

For 4K / 8K targets without OOM, process the upscaled image in overlapping tiles. Custom node: ComfyUI_UltimateSDUpscale. Each tile gets its own sampler pass, and the tile overlap is blended to hide seams. Slower end-to-end, but sampling memory depends on the tile size rather than the final resolution.
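To get a feel for why tiling is slower, it helps to count sampler passes. The sketch below is generic overlapping-tile arithmetic, not the node's exact layout; the function names and the defaults (1024 px tiles, 64 px overlap) are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Tiles needed along one edge when each tile overlaps its neighbour.
tiles_along() {
    local edge=$1 tile=$2 overlap=$3
    local stride=$(( tile - overlap ))
    # Ceiling division: how many strides are needed to cover the edge.
    local n=$(( (edge - overlap + stride - 1) / stride ))
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
}

# Total tiles (one sampler pass each) for a width x height image.
tile_count() {
    local width=$1 height=$2 tile=${3:-1024} overlap=${4:-64}
    local cols rows
    cols=$(tiles_along "$width" "$tile" "$overlap")
    rows=$(tiles_along "$height" "$tile" "$overlap")
    echo $(( cols * rows ))
}
```

Under these assumptions `tile_count 4096 4096` prints `25` and `tile_count 8192 8192` prints `81`, so an 8K target costs roughly three times the sampler passes of a 4K one.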

Reference / style conditioning

Three distinct tools, picked by what you want preserved from the reference:

Tool	Reference role	When to use
Flux Redux	"In the style of this"	Single inspiration image + text prompt — start here
IP-Adapter Flux	"Blend these N references with weights"	Multiple inspirations, fine control over each
ControlNet (Flux)	"Match this structure" — edges, depth, pose	Composition matters more than style

Flux Redux (recommended starting point)

Black Forest Labs' official "remix this image" adapter. Pass an inspiration image + a text prompt describing what you want generated; Redux blends the visual cues into the text-conditioned output.

[Load Reference Image] → [Redux Style Model]
                                ↓
[Text Prompt] ────────→ [Conditioning combiner]
                                ↓
                          [KSampler] → output

Models needed:

black-forest-labs/FLUX.1-Redux-dev → flux1-redux-dev.safetensors (gated — accept license like Flux.1-Dev)
google/siglip-so400m-patch14-384 → image encoder for Redux

Place the Redux model in /srv/docker/comfyui/models/style_models/ and the SigLIP image encoder in /srv/docker/comfyui/models/clip_vision/.

IP-Adapter Flux

Community-built reference-conditioning adapter with per-image weight sliders. Better when you want to blend multiple inspirations or control intensity precisely. Custom node: ComfyUI_IPAdapter_plus (supports Flux as of 2025). Heavier setup than Redux.

ControlNet Flux

Geometric/structural conditioning — pass a canny edge map / depth map / pose skeleton extracted from a reference, and Flux follows that structure. Different question from "style of": this preserves composition, not look.

XLabs-AI/flux-controlnet-collections ships Canny, Depth, HED variants.

Model landscape

Model	Strength	When
Flux.1-Dev	Prompt adherence, current open SOTA on most tasks	Default; already deployed
Flux.1-Schnell	Distilled 4-step Flux	When you want 4× speed and can tolerate quality drop
HiDream-O1-Image-Dev	Pixel-native 2K, better text rendering	When native >1K matters (signs, posters, UI mockups)
Wan-Image-2	Stylized / illustrative range	Specific aesthetics not Flux's strong suit
SDXL + community LoRAs	Mature LoRA ecosystem	When a specific LoRA exists only for SDXL
Flux.2	(Newer, presumably better)	⚠ Crashes on Strix Halo ROCm as of 2026-05; skip until upstream patch

HiDream-O1 is the recommended next pull when you want higher native resolution. The May 2026 multimodal research brief describes it as "native 2K, much better text rendering than Flux, beats Flux.2 on benchmarks at 1/7 the params." Reach for it when the image has to be generated above 1K natively.

Custom nodes worth installing

Drop these in /srv/docker/comfyui/custom_nodes/ (RW bind-mount — survives container recreate):

Repo	Why
`ComfyUI_essentials`	Already bundled in kyuz0 image. General-purpose utility nodes.
`ComfyUI-AMDGPUMonitor`	Already bundled. VRAM monitoring in the UI.
`ComfyUI-GGUF`	Already bundled. Lets you use GGUF-quantized Flux variants.
`ComfyUI_UltimateSDUpscale`	Tiled upscale for 4K+ outputs. Manual install.
`ComfyUI_IPAdapter_plus`	Multi-reference style conditioning. Manual install.
`ComfyUI-Manager`	Install/update custom nodes from inside the UI. Quality-of-life.

Install pattern for a custom node:

cd /srv/docker/comfyui/custom_nodes
# --recursive also fetches any git submodules the node repo depends on
git clone --recursive https://github.com/ssitu/ComfyUI_UltimateSDUpscale
# Restart ComfyUI to pick up the new node
cd /srv/docker/comfyui && docker compose restart

Practical "starter kit" downloads

To unlock the upscale patterns (3 and 4) and Redux reference conditioning in one go:

# Upscaler (for image upscale + img2img refine, and tiled upscale)
hf download Kim2091/UltraSharp \
    4x-UltraSharp.pth \
    --local-dir /srv/docker/comfyui/models/upscale_models

# Flux Redux for "style of these inspirations" (gated — accept license)
hf download black-forest-labs/FLUX.1-Redux-dev \
    flux1-redux-dev.safetensors \
    --local-dir /srv/docker/comfyui/models/style_models

# Image encoder Redux needs
hf download google/siglip-so400m-patch14-384 \
    model.safetensors \
    --local-dir /srv/docker/comfyui/models/clip_vision

# Optional: HiDream for native 2K rendering
# hf download HiDream-AI/HiDream-I1-Dev ... (verify path on HF)

After download, refresh the browser tab so ComfyUI rescans the dirs.
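Before opening the UI, it can save a confusing "model not found" error to confirm that the files landed where the loaders look. The sketch below only checks the paths named in this document; the function name `check_models` is hypothetical, and the directory argument exists so it can be pointed at a different models root.

```shell
#!/usr/bin/env bash
# Report which expected model files exist under a ComfyUI models directory.
# Returns non-zero if anything is missing.
check_models() {
    local root=${1:-/srv/docker/comfyui/models}
    local status=0 rel
    for rel in \
        diffusion_models/flux1-dev.safetensors \
        upscale_models/4x-UltraSharp.pth \
        style_models/flux1-redux-dev.safetensors \
        clip_vision/model.safetensors
    do
        if [ -f "$root/$rel" ]; then
            echo "ok       $rel"
        else
            echo "MISSING  $rel"
            status=1
        fi
    done
    return "$status"
}
```

Run `check_models` with no arguments on the host. Any line starting with `MISSING` means the matching download did not complete or went to a different directory.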

Operational notes

First generation is slow (kernel JIT). Subsequent ones much faster.
--disable-mmap is mandatory for gfx1151 — already set in compose. Don't remove.
VAE decode can OOM at high resolution. --bf16-vae mitigates this (already set). For >2K outputs, decode in tiles.
Concurrent generation: ComfyUI queues requests and runs them one after another, so you can queue N images from one prompt and let them run unattended.
Output lands in /srv/docker/comfyui/output/ as well as in the UI.
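Queueing several seeds of one prompt can also be scripted against ComfyUI's HTTP endpoint (a POST to /prompt) rather than clicked through the UI. This is a minimal sketch under stated assumptions: the workflow has been exported from the UI in API format, the seed value in that file has been replaced by hand with the hypothetical placeholder `__SEED__`, and the function names are illustrative.

```shell
#!/usr/bin/env bash
COMFY_URL="${COMFY_URL:-http://framework:8188}"

# Print a /prompt request body: the template with __SEED__ substituted,
# wrapped in {"prompt": ...}. __SEED__ is a hypothetical placeholder.
render_payload() {
    local template=$1 seed=$2
    printf '{"prompt": %s}\n' "$(sed "s/__SEED__/${seed}/g" "$template")"
}

# Queue one job per seed. ComfyUI runs them in order from its queue.
queue_seeds() {
    local template=$1; shift
    local seed
    for seed in "$@"; do
        render_payload "$template" "$seed" |
            curl -sS -X POST -H 'Content-Type: application/json' \
                 --data-binary @- "${COMFY_URL}/prompt"
        echo
    done
}
```

With a template saved as, say, flux-api.json (a hypothetical filename), `queue_seeds flux-api.json 1 2 3` queues three renders. The results land in the output directory noted above.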

Pointers to other docs

Bring-up + model dir layout: pyinfra/framework/compose/comfyui/README.md
Top-level coding-workflow status: README.md
Strategic landscape (Layer 0 image/video/audio): Roadmap.md
