# Image generation — landscape and workflows

State of the OSS image-gen stack as of 2026-05 on the Framework Desktop
(Strix Halo gfx1151, 128 GB unified, ComfyUI via
`kyuz0/amd-strix-halo-comfyui`). Practical operator guide, not a tour
of the diffusion literature.

## What's deployed today

- **ComfyUI** at `http://framework:8188` (compose stack: `comfyui`).
  Manual-start only; competes with kimi-linear for GPU.
- **Flux.1-Dev** (`/srv/docker/comfyui/models/diffusion_models/flux1-dev.safetensors`)
  plus CLIPs and VAE. Workhorse text-to-image model.
- **t5xxl** in both fp8 and fp16 variants — templates may reference either.

See [`pyinfra/framework/compose/comfyui/README.md`](pyinfra/framework/compose/comfyui/README.md)
for bring-up and model dir layout.

## Hardware envelope

Flux.1-Dev was trained on 1024×1024. Memory scales roughly linearly with
pixel count, so quadratically with edge length. With 128 GB unified the
**quality wall hits well before the memory wall**:

| Resolution | Memory peak | Quality reality |
|---|---|---|
| 1024×1024 (native) | ~14 GB | Sharp, composition correct |
| 1024×1536, 1536×1024 (native ratios) | ~18 GB | Native-class — trained on these |
| 1536×1536 | ~30 GB | Soft drift in composition |
| 2048×2048 | ~50-60 GB | Memory fits; duplicated subjects, anatomy errors |
| 3072×3072 | ~120 GB | At the wall; quality degrades meaningfully |
| 4096+ | OOM territory | Don't direct-render here |

**Rule of thumb**: render at 1024-class, *upscale* to anything larger.
Direct 2K+ rendering on Flux is technically possible on this box but
produces visibly worse images than upscaling. For native >1K rendering,
use a model trained for it (HiDream-O1 — see [Model landscape](#model-landscape)).

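
To see why cost climbs so fast with resolution, it can help to count the image tokens the transformer attends over. The sketch below assumes Flux's usual layout (8× VAE downsample followed by 2×2 latent patching, so one token per 16×16 pixel block); the function names are illustrative. The memory figures in the table are measurements from this box and are not derived from this arithmetic.

```python
def flux_image_tokens(width: int, height: int) -> int:
    """Number of image tokens for a render of the given size.

    Assumption: 8x VAE downsample plus 2x2 latent patching,
    i.e. one token per 16x16 pixel block.
    """
    if width % 16 or height % 16:
        raise ValueError("width and height must be multiples of 16")
    return (width // 16) * (height // 16)


def relative_cost(width: int, height: int, base: int = 1024) -> float:
    """Token count relative to a base x base render."""
    return flux_image_tokens(width, height) / flux_image_tokens(base, base)


if __name__ == "__main__":
    for w, h in [(1024, 1024), (1536, 1024), (2048, 2048), (3072, 3072)]:
        print(f"{w}x{h}: {flux_image_tokens(w, h)} tokens, "
              f"{relative_cost(w, h):.2f}x the 1024x1024 count")
```

Doubling the edge length quadruples the token count, which is roughly the jump between the 1024×1024 and 2048×2048 rows above.
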
## Workflow patterns

### 1. Native render

The baseline. Template: **Workflow → Browse Templates → Flux.1 Dev**.
1024×1024, fp8 t5xxl, ~20-30 sampling steps. First generation takes
~5 min (kernel JIT); subsequent ~30-60 sec.

### 2. Hires fix — same seed, higher final resolution

The standard "I like the small render, give me the same thing bigger"
pattern. Latent upscale + second sampler pass:

```
[KSampler 1024×1024 @ seed]
        ↓
[Upscale Latent By 2.0]
        ↓
[KSampler 2048×2048 @ same seed, denoise=0.4–0.6]
        ↓
[VAE Decode]
```

Denoise knob:
- **0.3–0.4** — stays very close to the small render, adds detail
- **0.5–0.6** — preserves composition, reinterprets details
- **0.7+** — composition drifts; treat as "variation at higher res"

Bundled ComfyUI templates include "Flux Hires Fix" — load that and edit
the prompt/seed.

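
For scripting, the sampling half of the graph above can be written as ComfyUI API-format JSON. This is a minimal sketch, not the bundled template: the node ids (`"model"`, `"positive"`, `"negative"`, `"vae"`) are hypothetical placeholders for loader and prompt-encode nodes that are not defined here, and the class names and input keys are the built-in ones as I understand them. Verify them by exporting a working workflow in API format from your own build.

```python
import json


def hires_fix_nodes(seed, denoise=0.5, scale_by=2.0, steps=25):
    """Return the sampling part of a hires-fix graph as a dict.

    Links use the API-format convention [node_id, output_index].
    Loader/prompt nodes ("model", "positive", "negative", "vae") are
    placeholders and must be supplied by the caller's full graph.
    """
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")

    def ksampler(latent_link, denoise_value):
        return {
            "class_type": "KSampler",
            "inputs": {
                "model": ["model", 0],
                "positive": ["positive", 0],
                "negative": ["negative", 0],
                "latent_image": latent_link,
                "seed": seed,                # same seed in both passes
                "steps": steps,
                "cfg": 1.0,                  # assumed Flux-style setting
                "sampler_name": "euler",     # assumed sampler choice
                "scheduler": "simple",       # assumed scheduler choice
                "denoise": denoise_value,
            },
        }

    return {
        "latent": {"class_type": "EmptySD3LatentImage",
                   "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "base": ksampler(["latent", 0], 1.0),
        "upscale": {"class_type": "LatentUpscaleBy",
                    "inputs": {"samples": ["base", 0],
                               "upscale_method": "bislerp",
                               "scale_by": scale_by}},
        "refine": ksampler(["upscale", 0], denoise),
        "decode": {"class_type": "VAEDecode",
                   "inputs": {"samples": ["refine", 0], "vae": ["vae", 0]}},
    }


if __name__ == "__main__":
    print(json.dumps(hires_fix_nodes(seed=1234, denoise=0.5), indent=2))
```

Only the second sampler gets the `denoise` knob; the first pass always runs at 1.0 so the small render is a full generation.
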
### 3. Image upscale + img2img refine

Cheaper than hires fix, less faithful to the original prompt. Decode
the small image, run an ESRGAN-class upscaler on it, then img2img with
low denoise to sharpen:

```
[Render 1024×1024]
        ↓
[Upscale Image with Model: 4x-UltraSharp or 4x_NMKD-Siax]
        ↓
[VAE Encode] → [KSampler denoise=0.2–0.3]
        ↓
[VAE Decode]
```

Faster than hires fix because the upscaler is a small model; the
diffusion pass only refines existing detail rather than re-rendering.
Best when the original render is already correct and you just want
sharper output.

### 4. Tiled upscale (>2× scaling)

For 4K / 8K targets without OOM, process the upscaled image in
overlapping tiles. Custom node:
[`ComfyUI_UltimateSDUpscale`](https://github.com/ssitu/ComfyUI_UltimateSDUpscale).
Each tile is its own sampler pass; tile overlap blends seams. Slower
end-to-end, but sampler memory is set by the tile size rather than the
final resolution.

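
To estimate how much slower, count the sampler passes. The sketch below uses a plain overlapping grid; it is not necessarily the exact tiling the custom node uses, and the 1024-pixel tile and 64-pixel overlap are illustrative values rather than recommended settings.

```python
import math


def tiles_per_axis(size: int, tile: int, overlap: int) -> int:
    """Tiles needed to cover `size` pixels when neighbours share `overlap` pixels."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if size <= tile:
        return 1
    stride = tile - overlap  # how far each new tile advances
    return 1 + math.ceil((size - tile) / stride)


def sampler_passes(width: int, height: int, tile: int = 1024, overlap: int = 64) -> int:
    """Total tiles, i.e. sampler passes, for a width x height target."""
    return tiles_per_axis(width, tile, overlap) * tiles_per_axis(height, tile, overlap)


if __name__ == "__main__":
    for edge in (2048, 4096, 8192):
        print(f"{edge}x{edge}: {sampler_passes(edge, edge)} passes")
```

Pass count grows with output area while per-pass memory stays at the tile's footprint, which is the trade the section describes.
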
## Reference / style conditioning

Three distinct tools, picked by what you want preserved from the
reference:

| Tool | Reference role | When to use |
|---|---|---|
| **Flux Redux** | "In the style of this" | Single inspiration image + text prompt — start here |
| **IP-Adapter Flux** | "Blend these N references with weights" | Multiple inspirations, fine control over each |
| **ControlNet (Flux)** | "Match this structure" — edges, depth, pose | Composition matters more than style |

### Flux Redux (recommended starting point)

Black Forest Labs' official "remix this image" adapter. Pass an
inspiration image + a text prompt describing what you want generated;
Redux blends the visual cues into the text-conditioned output.

```
[Load Reference Image] → [Redux Style Model]
                                ↓
[Text Prompt] ────────→ [Conditioning combiner]
                                ↓
                          [KSampler] → output
```

Models needed:
- `black-forest-labs/FLUX.1-Redux-dev` → `flux1-redux-dev.safetensors`
  (gated — accept license like Flux.1-Dev)
- `google/siglip-so400m-patch14-384` → image encoder for Redux

These go in `/srv/docker/comfyui/models/style_models/` and
`/srv/docker/comfyui/models/clip_vision/`, respectively.

### IP-Adapter Flux

Community-built reference-conditioning adapter with per-image weight
sliders. Better when you want to blend multiple inspirations or
control intensity precisely. Custom node:
[`ComfyUI_IPAdapter_plus`](https://github.com/cubiq/ComfyUI_IPAdapter_plus)
(supports Flux as of 2025). Heavier setup than Redux.

### ControlNet Flux

Geometric/structural conditioning — pass a canny edge map / depth map
/ pose skeleton extracted from a reference, and Flux follows that
structure. Different question from "style of": this preserves
*composition*, not look.

[`XLabs-AI/flux-controlnet-collections`](https://huggingface.co/XLabs-AI/flux-controlnet-collections)
ships Canny, Depth, HED variants.

## Model landscape

| Model | Strength | When |
|---|---|---|
| **Flux.1-Dev** | Prompt adherence, current open SOTA on most tasks | Default; already deployed |
| **Flux.1-Schnell** | Distilled 4-step Flux | When you want 4× speed and can tolerate quality drop |
| **HiDream-O1-Image-Dev** | Pixel-native 2K, better text rendering | When native >1K matters (signs, posters, UI mockups) |
| **Wan-Image-2** | Stylized / illustrative range | Specific aesthetics not Flux's strong suit |
| **SDXL** + community LoRAs | Mature LoRA ecosystem | When a specific LoRA exists only for SDXL |
| **Flux.2** | (Newer, presumably better) | ⚠ **Crashes on Strix Halo ROCm** as of 2026-05; skip until upstream patch |

**HiDream-O1** is the recommended next pull when you want image
generation at >1K native resolution. Quote from the May 2026 multimodal
research brief: "native 2K, much better text rendering than Flux, beats
Flux.2 on benchmarks at 1/7 the params."

## Custom nodes worth installing

Drop these in `/srv/docker/comfyui/custom_nodes/` (RW bind-mount —
survives container recreate):

| Repo | Why |
|---|---|
| [`ComfyUI_essentials`](https://github.com/cubiq/ComfyUI_essentials) | Already bundled in kyuz0 image. General-purpose utility nodes. |
| [`ComfyUI-AMDGPUMonitor`](https://github.com/general/ComfyUI-AMDGPUMonitor) | Already bundled. VRAM monitoring in the UI. |
| [`ComfyUI-GGUF`](https://github.com/city96/ComfyUI-GGUF) | Already bundled. Lets you use GGUF-quantized Flux variants. |
| [`ComfyUI_UltimateSDUpscale`](https://github.com/ssitu/ComfyUI_UltimateSDUpscale) | Tiled upscale for 4K+ outputs. Manual install. |
| [`ComfyUI_IPAdapter_plus`](https://github.com/cubiq/ComfyUI_IPAdapter_plus) | Multi-reference style conditioning. Manual install. |
| [`ComfyUI-Manager`](https://github.com/ltdrdata/ComfyUI-Manager) | Install/update custom nodes from inside the UI. Quality-of-life. |

Install pattern for a custom node:

```sh
cd /srv/docker/comfyui/custom_nodes
# --recursive also fetches any git submodules the node repo depends on
git clone --recursive https://github.com/ssitu/ComfyUI_UltimateSDUpscale
# Restart ComfyUI to pick up the new node
cd /srv/docker/comfyui && docker compose restart
```

## Practical "starter kit" downloads

If you want to unlock the four main workflow patterns at once:

```sh
# Upscaler (for img2img refine + tiled upscale)
hf download Kim2091/UltraSharp \
  4x-UltraSharp.pth \
  --local-dir /srv/docker/comfyui/models/upscale_models

# Flux Redux for "style of these inspirations" (gated — accept license)
hf download black-forest-labs/FLUX.1-Redux-dev \
  flux1-redux-dev.safetensors \
  --local-dir /srv/docker/comfyui/models/style_models

# Image encoder Redux needs
hf download google/siglip-so400m-patch14-384 \
  model.safetensors \
  --local-dir /srv/docker/comfyui/models/clip_vision

# Optional: HiDream for native 2K rendering
# hf download HiDream-AI/HiDream-I1-Dev ... (verify path on HF)
```

After download, refresh the browser tab so ComfyUI rescans the dirs.

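
Before opening the UI, a quick check that the files landed where the loaders look can save a confusing "model not found" error. This is a small sketch: the relative paths are copied from the download commands above, while `missing_models` is an illustrative helper, not part of ComfyUI.

```python
from pathlib import Path

# Relative paths implied by the download commands in this section.
STARTER_KIT = [
    "upscale_models/4x-UltraSharp.pth",
    "style_models/flux1-redux-dev.safetensors",
    "clip_vision/model.safetensors",
]


def missing_models(models_dir, expected=STARTER_KIT):
    """Return the expected files that are not present under models_dir."""
    root = Path(models_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_models("/srv/docker/comfyui/models")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Starter kit complete")
```
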
## Operational notes

- **First generation is slow** (kernel JIT). Subsequent ones much
  faster.
- **`--disable-mmap` is mandatory** for gfx1151 — already set in
  compose. Don't remove.
- **VAE OOMs at high res** — `--bf16-vae` mitigates (set). For >2K
  outputs, decode in tiles.
- **Generation is serialized, not concurrent.** ComfyUI queues requests
  and runs them one at a time, so you can queue N images from one prompt
  and let them run back to back.
- **Output lands in `/srv/docker/comfyui/output/`** as well as in the UI.

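
The queue can also be fed from a script by POSTing API-format workflows to ComfyUI's `/prompt` endpoint. The sketch below only builds the requests, one per seed, and sends nothing; `queue_requests` and the sampler node ids you pass in are illustrative, and the workflow itself should come from an API-format export of a graph that already works in the UI.

```python
import copy
import json
import urllib.request

COMFYUI_URL = "http://framework:8188"  # host used throughout this guide


def queue_requests(graph, sampler_ids, seeds, base_url=COMFYUI_URL):
    """Build one POST /prompt request per seed. Nothing is sent here.

    graph       -- API-format workflow dict (exported from the UI)
    sampler_ids -- ids of the sampler nodes whose "seed" input to override
    seeds       -- iterable of integer seeds, one queued job each
    """
    requests = []
    for seed in seeds:
        variant = copy.deepcopy(graph)  # leave the caller's graph untouched
        for node_id in sampler_ids:
            variant[node_id]["inputs"]["seed"] = seed
        body = json.dumps({"prompt": variant}).encode("utf-8")
        requests.append(urllib.request.Request(
            f"{base_url}/prompt",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        ))
    return requests


def send(request, timeout=10):
    """Submit one request and return the server's JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `send(req)` for each built request hands the jobs to the server, which works through them in order; results appear in the output directory noted above.
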
## Pointers to other docs

- Bring-up + model dir layout: [`pyinfra/framework/compose/comfyui/README.md`](pyinfra/framework/compose/comfyui/README.md)
- Top-level coding-workflow status: [`README.md`](README.md)
- Strategic landscape (Layer 0 image/video/audio): [`Roadmap.md`](Roadmap.md)