localgenai/ImageGen.md
2026-06-08 15:31:50 +01:00

# Image generation — landscape and workflows
State of the OSS image-gen stack as of 2026-05 on the Framework Desktop
(Strix Halo gfx1151, 128 GB unified, ComfyUI via
`kyuz0/amd-strix-halo-comfyui`). Practical operator guide, not a tour
of the diffusion literature.
## What's deployed today
- **ComfyUI** at `http://framework:8188` (compose stack: `comfyui`).
Manual-start only, because it competes with kimi-linear for the GPU.
- **Flux.1-Dev** (`/srv/docker/comfyui/models/diffusion_models/flux1-dev.safetensors`)
+ CLIPs + VAE. Workhorse text-to-image model.
- **t5xxl** in both fp8 and fp16 variants — templates may reference either.
See [`pyinfra/framework/compose/comfyui/README.md`](pyinfra/framework/compose/comfyui/README.md)
for bring-up and model dir layout.
## Hardware envelope
Flux.1-Dev was trained on 1024×1024. Going by the peaks below, memory
grows roughly linearly with pixel count (quadratically with edge
length). With 128 GB unified the **quality wall hits well before
the memory wall**:
| Resolution | Memory peak | Quality reality |
|---|---|---|
| 1024×1024 (native) | ~14 GB | Sharp, composition correct |
| 1024×1536, 1536×1024 (native ratios) | ~18 GB | Native-class — trained on these |
| 1536×1536 | ~30 GB | Soft drift in composition |
| 2048×2048 | ~50-60 GB | Memory fits; duplicated subjects, anatomy errors |
| 3072×3072 | ~120 GB | At the wall; quality degrades meaningfully |
| 4096+ | OOM territory | Don't direct-render here |
**Rule of thumb**: render at 1024-class, *upscale* to anything larger.
Direct 2K+ rendering on Flux is technically possible on this box but
produces visibly worse images than upscaling. For native >1K rendering,
use a model trained for it (HiDream-O1 — see [Model landscape](#model-landscape)).
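
For a resolution that is not in the table, a rough estimate can be interpolated from the measured peaks. The following is a minimal sketch, not a measured law: the figures are copied from the table above (using the midpoint of the 50-60 GB range), `estimate_peak_gb` is a hypothetical helper, and linear interpolation between rows is an assumption.

```python
# Peak-memory figures copied from the table above, as (pixel count, GB).
# The 2048x2048 row uses the midpoint of its 50-60 GB range.
MEASURED_GB = [
    (1024 * 1024, 14.0),
    (1024 * 1536, 18.0),
    (1536 * 1536, 30.0),
    (2048 * 2048, 55.0),
    (3072 * 3072, 120.0),
]


def estimate_peak_gb(width: int, height: int) -> float:
    """Linearly interpolate peak memory between the measured resolutions."""
    pixels = width * height
    points = sorted(MEASURED_GB)
    # Below the smallest measurement, report the smallest measured peak.
    if pixels <= points[0][0]:
        return points[0][1]
    for (p0, g0), (p1, g1) in zip(points, points[1:]):
        if pixels <= p1:
            t = (pixels - p0) / (p1 - p0)
            return g0 + t * (g1 - g0)
    # Above the largest measurement, extrapolate along the last segment.
    (p0, g0), (p1, g1) = points[-2], points[-1]
    return g1 + (pixels - p1) * (g1 - g0) / (p1 - p0)
```

Under these assumptions `estimate_peak_gb(1280, 1280)` comes out at about 19 GB, and `estimate_peak_gb(4096, 4096)` extrapolates well past 128 GB, which matches the "OOM territory" row.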
## Workflow patterns
### 1. Native render
The baseline. Template: **Workflow → Browse Templates → Flux.1 Dev**.
1024×1024, fp8 t5xxl, ~20-30 sampling steps. First generation takes
~5 min (kernel JIT); subsequent ~30-60 sec.
### 2. Hires fix — same seed, higher final resolution
The standard "I like the small render, give me the same thing bigger"
pattern. Latent upscale + second sampler pass:
```
[KSampler 1024×1024 @ seed]
[Upscale Latent By 2.0]
[KSampler 2048×2048 @ same seed, denoise=0.4-0.6]
[VAE Decode]
```
Denoise knob:
- **0.3-0.4**: stays very close to the small render, adds detail
- **0.5-0.6**: preserves composition, reinterprets details
- **0.7+**: composition drifts; treat as "variation at higher res"
Bundled ComfyUI templates include "Flux Hires Fix" — load that and edit
the prompt/seed.
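
If you drive ComfyUI from a script instead of the UI, the second half of the chain can be written as an API-format graph fragment. This is a sketch under assumptions: the node class names (`LatentUpscaleBy`, `KSampler`, `VAEDecode`) are the stock ComfyUI ones as I understand them, the node IDs `"100"` to `"102"` are arbitrary placeholders, and `cfg=1.0` with `euler`/`simple` are assumed defaults rather than values from this guide. Compare against an API export of the bundled template before relying on it.

```python
def hires_fix_nodes(model, positive, negative, vae, base_latent,
                    seed, denoise=0.5, scale_by=2.0, steps=20):
    """Build API-format nodes for: upscale latent -> second sampler -> decode.

    Every link argument is a [node_id, output_index] pair pointing at a node
    that already exists in your graph; base_latent is the first KSampler's output.
    """
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    return {
        "100": {"class_type": "LatentUpscaleBy",
                "inputs": {"samples": base_latent,
                           "upscale_method": "bicubic",
                           "scale_by": scale_by}},
        "101": {"class_type": "KSampler",
                "inputs": {"model": model,
                           "positive": positive,
                           "negative": negative,
                           "latent_image": ["100", 0],
                           "seed": seed,  # same seed as the small render
                           "steps": steps,
                           "cfg": 1.0,  # assumed default, not from this guide
                           "sampler_name": "euler",
                           "scheduler": "simple",
                           "denoise": denoise}},
        "102": {"class_type": "VAEDecode",
                "inputs": {"samples": ["101", 0], "vae": vae}},
    }


# Hypothetical upstream node IDs; replace them with the ones in your export.
nodes = hires_fix_nodes(model=["1", 0], positive=["2", 0], negative=["3", 0],
                        vae=["4", 0], base_latent=["5", 0], seed=42, denoise=0.5)
```

Merging `nodes` into the exported workflow dict keeps the first pass untouched; only the `denoise` argument needs changing to move between the bands listed above.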
### 3. Image upscale + img2img refine
Cheaper than hires fix, less faithful to the original prompt. Decode
the small image, run an ESRGAN-class upscaler on it, then img2img with
low denoise to sharpen:
```
[Render 1024×1024]
[Upscale Image with Model: 4x-UltraSharp or 4x_NMKD-Siax]
[VAE Encode] → [KSampler denoise=0.2-0.3]
[VAE Decode]
```
Faster than hires fix because the upscaler is a small model; the
diffusion pass only refines existing detail rather than re-rendering.
Best when the original render is already correct and you just want
sharper output.
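
The same chain as an API-format graph fragment, again only a sketch. Node class names (`UpscaleModelLoader`, `ImageUpscaleWithModel`, `ImageScaleBy`, `VAEEncode`, `KSampler`, `VAEDecode`) are the stock ComfyUI ones as I understand them, and the node IDs are placeholders. The `ImageScaleBy` step is an addition of mine, not part of the pattern above: a 4x model turns 1024×1024 into 4096×4096, which the hardware table marks as OOM territory, so the sketch shrinks back to a net 2x before the diffusion pass.

```python
def upscale_refine_nodes(image, model, positive, negative, vae, seed,
                         upscaler="4x-UltraSharp.pth", model_factor=4,
                         net_scale=2.0, denoise=0.25, steps=20):
    """Build API-format nodes for: model upscale -> resize -> img2img refine.

    Link arguments are [node_id, output_index] pairs from your existing graph;
    image is the decoded small render.
    """
    return {
        "200": {"class_type": "UpscaleModelLoader",
                "inputs": {"model_name": upscaler}},
        "201": {"class_type": "ImageUpscaleWithModel",
                "inputs": {"upscale_model": ["200", 0], "image": image}},
        # Assumed extra step: bring the fixed 4x output back to net_scale.
        "202": {"class_type": "ImageScaleBy",
                "inputs": {"image": ["201", 0],
                           "upscale_method": "lanczos",
                           "scale_by": net_scale / model_factor}},
        "203": {"class_type": "VAEEncode",
                "inputs": {"pixels": ["202", 0], "vae": vae}},
        "204": {"class_type": "KSampler",
                "inputs": {"model": model,
                           "positive": positive,
                           "negative": negative,
                           "latent_image": ["203", 0],
                           "seed": seed,
                           "steps": steps,
                           "cfg": 1.0,  # assumed default, not from this guide
                           "sampler_name": "euler",
                           "scheduler": "simple",
                           "denoise": denoise}},  # low: refine, do not re-render
        "205": {"class_type": "VAEDecode",
                "inputs": {"samples": ["204", 0], "vae": vae}},
    }


# Hypothetical upstream node IDs; replace them with the ones in your export.
refine = upscale_refine_nodes(image=["8", 0], model=["1", 0], positive=["2", 0],
                              negative=["3", 0], vae=["4", 0], seed=42)
```

If you do want the full 4x output, set `net_scale=4.0` so the resize becomes a no-op, and expect to need the tiled approach for the sampler pass.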
### 4. Tiled upscale (>2× scaling)
For 4K / 8K targets without OOM, process the upscaled image in
overlapping tiles. Custom node:
[`ComfyUI_UltimateSDUpscale`](https://github.com/ssitu/ComfyUI_UltimateSDUpscale).
Each tile is its own sampler pass; tile overlap blends seams. Slower
end-to-end but uses constant memory regardless of final resolution.
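
To see why tiling is slower end-to-end, count the sampler passes. The sketch below is generic overlapping-tile arithmetic, not the custom node's exact algorithm; `tile_grid` is a hypothetical helper, and the 1024 px tile and 128 px overlap are assumed values.

```python
import math


def tile_grid(width, height, tile=1024, overlap=128):
    """Return (columns, rows) of overlapping tiles needed to cover an image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be >= 0 and smaller than the tile size")
    stride = tile - overlap  # each extra tile adds this many new pixels
    cols = max(1, math.ceil((width - overlap) / stride))
    rows = max(1, math.ceil((height - overlap) / stride))
    return cols, rows


cols, rows = tile_grid(4096, 4096)
print(cols * rows)  # number of sampler passes for a 4K square
```

With these assumed settings a 4096×4096 target needs a 5×5 grid (25 passes) and 8192×8192 needs 9×9 (81 passes), each at roughly the memory cost of a single 1024-class render.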
## Reference / style conditioning
Three distinct tools, picked by what you want preserved from the
reference:
| Tool | Reference role | When to use |
|---|---|---|
| **Flux Redux** | "In the style of this" | Single inspiration image + text prompt — start here |
| **IP-Adapter Flux** | "Blend these N references with weights" | Multiple inspirations, fine control over each |
| **ControlNet (Flux)** | "Match this structure" — edges, depth, pose | Composition matters more than style |
### Flux Redux (recommended starting point)
Black Forest Labs' official "remix this image" adapter. Pass an
inspiration image + a text prompt describing what you want generated;
Redux blends the visual cues into the text-conditioned output.
```
[Load Reference Image] → [Redux Style Model]
[Text Prompt] ────────→ [Conditioning combiner]
[KSampler] → output
```
Models needed:
- `black-forest-labs/FLUX.1-Redux-dev` → `flux1-redux-dev.safetensors`
(gated; accept the license as for Flux.1-Dev)
- `google/siglip-so400m-patch14-384` → image encoder for Redux

These go in `/srv/docker/comfyui/models/style_models/` and
`/srv/docker/comfyui/models/clip_vision/` respectively.
### IP-Adapter Flux
Community-built reference-conditioning adapter with per-image weight
sliders. Better when you want to blend multiple inspirations or
control intensity precisely. Custom node:
[`ComfyUI_IPAdapter_plus`](https://github.com/cubiq/ComfyUI_IPAdapter_plus)
(supports Flux as of 2025). Heavier setup than Redux.
### ControlNet Flux
Geometric/structural conditioning — pass a canny edge map / depth map
/ pose skeleton extracted from a reference, and Flux follows that
structure. Different question from "style of": this preserves
*composition*, not look.
[`XLabs-AI/flux-controlnet-collections`](https://huggingface.co/XLabs-AI/flux-controlnet-collections)
ships Canny, Depth, HED variants.
## Model landscape
| Model | Strength | When |
|---|---|---|
| **Flux.1-Dev** | Prompt adherence, current open SOTA on most tasks | Default; already deployed |
| **Flux.1-Schnell** | Distilled 4-step Flux | When you want 4× speed and can tolerate quality drop |
| **HiDream-O1-Image-Dev** | Pixel-native 2K, better text rendering | When native >1K matters (signs, posters, UI mockups) |
| **Wan-Image-2** | Stylized / illustrative range | Specific aesthetics not Flux's strong suit |
| **SDXL** + community LoRAs | Mature LoRA ecosystem | When a specific LoRA exists only for SDXL |
| **Flux.2** | (Newer, presumably better) | ⚠ **Crashes on Strix Halo ROCm** as of 2026-05; skip until upstream patch |
**HiDream-O1** is the recommended next pull when you want higher
native resolution. Quote from the May 2026 multimodal research brief:
"native 2K, much better text rendering than Flux, beats Flux.2 on
benchmarks at 1/7 the params."
## Custom nodes worth installing
Drop these in `/srv/docker/comfyui/custom_nodes/` (RW bind-mount —
survives container recreate):
| Repo | Why |
|---|---|
| [`ComfyUI_essentials`](https://github.com/cubiq/ComfyUI_essentials) | Already bundled in kyuz0 image. General-purpose utility nodes. |
| [`ComfyUI-AMDGPUMonitor`](https://github.com/general/ComfyUI-AMDGPUMonitor) | Already bundled. VRAM monitoring in the UI. |
| [`ComfyUI-GGUF`](https://github.com/city96/ComfyUI-GGUF) | Already bundled. Lets you use GGUF-quantized Flux variants. |
| [`ComfyUI_UltimateSDUpscale`](https://github.com/ssitu/ComfyUI_UltimateSDUpscale) | Tiled upscale for 4K+ outputs. Manual install. |
| [`ComfyUI_IPAdapter_plus`](https://github.com/cubiq/ComfyUI_IPAdapter_plus) | Multi-reference style conditioning. Manual install. |
| [`ComfyUI-Manager`](https://github.com/ltdrdata/ComfyUI-Manager) | Install/update custom nodes from inside the UI. Quality-of-life. |
Install pattern for a custom node:
```sh
cd /srv/docker/comfyui/custom_nodes
git clone https://github.com/ssitu/ComfyUI_UltimateSDUpscale
# Restart ComfyUI to pick up the new node
cd /srv/docker/comfyui && docker compose restart
```
## Practical "starter kit" downloads
If you want to unlock the four main workflow patterns at once:
```sh
# Upscaler (for img2img refine + tiled upscale)
hf download Kim2091/UltraSharp \
4x-UltraSharp.pth \
--local-dir /srv/docker/comfyui/models/upscale_models
# Flux Redux for "style of these inspirations" (gated — accept license)
hf download black-forest-labs/FLUX.1-Redux-dev \
flux1-redux-dev.safetensors \
--local-dir /srv/docker/comfyui/models/style_models
# Image encoder Redux needs
hf download google/siglip-so400m-patch14-384 \
model.safetensors \
--local-dir /srv/docker/comfyui/models/clip_vision
# Optional: HiDream for native 2K rendering
# hf download HiDream-AI/HiDream-I1-Dev ... (verify path on HF)
```
After download, refresh the browser tab so ComfyUI rescans the dirs.
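
A quick way to confirm the downloads landed where ComfyUI looks for them is to check the expected paths. This is a minimal sketch: `missing_models` is a hypothetical helper, and the file list is assembled from the paths and filenames mentioned in this guide, so adjust it if your filenames differ.

```python
from pathlib import Path

MODELS_ROOT = Path("/srv/docker/comfyui/models")

# Relative paths taken from this guide; edit to match your own downloads.
EXPECTED = [
    "diffusion_models/flux1-dev.safetensors",
    "upscale_models/4x-UltraSharp.pth",
    "style_models/flux1-redux-dev.safetensors",
    "clip_vision/model.safetensors",
]


def missing_models(root=MODELS_ROOT, expected=EXPECTED):
    """Return the expected model files that are not present under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_models()` on the host returns an empty list once everything is in place; anything it does return is a path to re-download.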
## Operational notes
- **First generation is slow** (kernel JIT, ~5 min). Subsequent ones
take ~30-60 sec.
- **`--disable-mmap` is mandatory** for gfx1151 — already set in
compose. Don't remove.
- **VAE OOMs at high res** — `--bf16-vae` mitigates (set). For >2K
outputs, decode in tiles.
- **Concurrent generation**: there is none. ComfyUI queues requests and
runs them one at a time, so you can queue N images from one prompt and
let them run back to back.
- **Output lands in `/srv/docker/comfyui/output/`** as well as in the UI.
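
Because requests are queued, a batch of seed variations can be submitted in one go over ComfyUI's HTTP API. The sketch below assumes a workflow exported from the UI in API format and posts it to the `/prompt` endpoint; `seeded_payloads` and `queue_prompt` are hypothetical helper names, and the sampler node ID has to be looked up in your own export.

```python
import copy
import json
import urllib.request


def seeded_payloads(workflow, sampler_id, seeds):
    """Yield one /prompt payload per seed, leaving the input workflow untouched."""
    for seed in seeds:
        wf = copy.deepcopy(workflow)
        wf[sampler_id]["inputs"]["seed"] = seed
        yield {"prompt": wf}


def queue_prompt(payload, base_url="http://framework:8188"):
    """POST one payload to ComfyUI and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/prompt",
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A typical use would be `for p in seeded_payloads(workflow, "7", range(4)): queue_prompt(p)`, where `"7"` stands in for whatever ID your KSampler node has. The images then render back to back and land in the output directory noted above.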
## Pointers to other docs
- Bring-up + model dir layout: [`pyinfra/framework/compose/comfyui/README.md`](pyinfra/framework/compose/comfyui/README.md)
- Top-level coding-workflow status: [`README.md`](README.md)
- Strategic landscape (Layer 0 image/video/audio): [`Roadmap.md`](Roadmap.md)