progress 235b
pyinfra/framework/compose/comfyui/README.md
# comfyui

ComfyUI for image generation on the Strix Halo box via
`kyuz0/amd-strix-halo-comfyui` (battle-tested gfx1151 toolbox). Web UI
at `http://framework:8188`. **Not** auto-started: bring it up manually so
it doesn't contend with kimi-linear for GPU.

## Coexistence notes (read first)

ComfyUI competes for GPU memory with the always-resident kimi-linear
(vLLM) and on-demand ollama. The stack reflects this:

- `restart: "no"`: it won't come back after a box reboot. You start it.
- Stop kimi-linear before heavy ComfyUI work, or accept a slowdown from
  memory swapping.
- Usage order: `docker compose up -d` here → use the UI → `docker compose down`.
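
Before bringing the stack up, it can help to see which GPU-hungry containers are already running. The following is a minimal sketch, assuming the containers are literally named `kimi-linear` and `ollama`; those names are an assumption, so adjust them to whatever `docker ps` reports on the box.

```sh
# Reads container names on stdin (one per line) and prints only those that
# compete with ComfyUI for GPU memory.
# ASSUMPTION: the container names below are illustrative, not confirmed.
gpu_contenders() {
  grep -E -x 'kimi-linear|ollama' || true
}

# Example (not run here):
#   docker ps --format '{{.Names}}' | gpu_contenders
```

An empty result suggests nothing else is holding GPU memory; any printed name is a candidate to stop before heavy ComfyUI work.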

## Prereqs

- Pyinfra deploy has run (creates
  `/srv/docker/comfyui/{models,output,custom_nodes,workflows}` with the right perms).
- BIOS UMA at 0.5 GB + `ttm.pages_limit` cmdline active (same recipe as
  kimi-linear). Verify with `cat /proc/cmdline | grep ttm.pages_limit`.
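
To go one step beyond the `grep`, the value itself can be extracted and converted to a rough size. This is a sketch under two assumptions: the parameter appears on the command line as `ttm.pages_limit=<pages>`, and pages are 4 KiB (the usual x86-64 default). The function names are hypothetical helpers, not part of the deploy.

```sh
# Prints the ttm.pages_limit value from a kernel command line string,
# or nothing if the parameter is absent.
ttm_pages_limit() {
  printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^ttm\.pages_limit=//p'
}

# Converts a page count to whole GiB (rounded down).
# ASSUMPTION: 4 KiB pages.
pages_to_gib() {
  [ -n "$1" ] || return 1
  echo $(( $1 * 4 / 1024 / 1024 ))
}

# Example (not run here):
#   pages_to_gib "$(ttm_pages_limit "$(cat /proc/cmdline)")"
```

If `pages_to_gib` fails, the parameter was not found on the command line and the recipe is probably not active.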

## Bring up

```sh
cd /srv/docker/comfyui
docker compose pull              # ~8-12 GB image
docker compose up -d
docker compose logs -f           # wait for "To see the GUI go to: http://0.0.0.0:8188", then Ctrl-C
./smoke.sh
```

Open `http://framework:8188` in a browser. The blank workflow loads.
Model lists are empty until you add models; see the next section.
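
Watching the logs by hand works, but a scripted bring-up needs a way to wait for the UI. The contents of `smoke.sh` are not shown here, so the following is not that script; it is a generic retry helper (the name `wait_until` is hypothetical) that could wrap any readiness probe.

```sh
# Retries a command until it succeeds or the attempts run out.
# usage: wait_until <attempts> <delay_seconds> <command...>
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (not run here): poll the web UI for up to about two minutes.
#   wait_until 60 2 curl -fsS -o /dev/null http://framework:8188
```

Passing the probe as arguments keeps the helper independent of `curl`, so the same loop could wait on a different check if one turns out to be more reliable.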

## Adding Flux.1-Dev (CF-P1)

Flux.1-Dev is gated on HF: first accept the license at
<https://huggingface.co/black-forest-labs/FLUX.1-dev>, then set
`HF_TOKEN` in your shell.

```sh
export HF_TOKEN=...

# Diffusion model (UNet)
hf download black-forest-labs/FLUX.1-dev \
  flux1-dev.safetensors \
  --local-dir /srv/docker/comfyui/models/diffusion_models

# VAE
hf download black-forest-labs/FLUX.1-dev \
  ae.safetensors \
  --local-dir /srv/docker/comfyui/models/vae

# Text encoders (CLIP-L + T5XXL)
hf download comfyanonymous/flux_text_encoders \
  clip_l.safetensors t5xxl_fp8_e4m3fn.safetensors \
  --local-dir /srv/docker/comfyui/models/text_encoders
```

The fp8 T5XXL encoder (~5 GB) is used instead of fp16 (~9 GB): quality is
generally indistinguishable and it is much faster. Refresh the UI; the
canonical Flux txt2img workflow is at File → Load → flux-txt2img.
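
If the workflow reports a missing model after the refresh, a quick check that all four downloads landed where expected can narrow it down. This is a minimal sketch; the helper name `missing_flux_files` is hypothetical, and the relative paths are taken from the `--local-dir` targets and filenames in the download commands.

```sh
# Prints the Flux.1-Dev files that are missing under a models root,
# one relative path per line. Prints nothing when all four are present.
missing_flux_files() {
  root=$1
  for f in \
    diffusion_models/flux1-dev.safetensors \
    vae/ae.safetensors \
    text_encoders/clip_l.safetensors \
    text_encoders/t5xxl_fp8_e4m3fn.safetensors
  do
    [ -f "$root/$f" ] || printf '%s\n' "$f"
  done
}

# Example (not run here):
#   missing_flux_files /srv/docker/comfyui/models
```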

## Model dir layout

```
/srv/docker/comfyui/models/
├── checkpoints/         # full pipeline checkpoints (SDXL, SD1.5)
├── diffusion_models/    # standalone UNet/transformer (Flux, HiDream, etc.)
├── vae/                 # VAEs
├── text_encoders/       # CLIP-L, T5XXL, etc.
├── loras/
├── controlnet/
├── clip_vision/
└── upscale_models/
```

The image's `extra_model_paths.yaml` maps these into ComfyUI's load
paths automatically.
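
To see at a glance what is actually on disk, the layout can be walked with a small inventory function. This is a sketch only: `model_inventory` is a hypothetical helper, and it reports on the host directories, not on what ComfyUI has loaded.

```sh
# For each expected subdirectory under a models root, prints
# "<subdir> <file count>", or "<subdir> missing" if it does not exist.
model_inventory() {
  root=$1
  for sub in checkpoints diffusion_models vae text_encoders \
             loras controlnet clip_vision upscale_models
  do
    if [ -d "$root/$sub" ]; then
      n=$(find "$root/$sub" -type f | wc -l | tr -d ' ')
      printf '%s %s\n' "$sub" "$n"
    else
      printf '%s missing\n' "$sub"
    fi
  done
}

# Example (not run here):
#   model_inventory /srv/docker/comfyui/models
```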

## Operations

```sh
docker compose logs -f              # tail
docker compose restart comfyui      # reload
docker compose down                 # stop
docker compose exec comfyui bash    # shell in
amdgpu_top                          # GPU view on host
./smoke.sh                          # health check
```

## Pin manifest

| Component | Pin |
|---|---|
| Image | `kyuz0/amd-strix-halo-comfyui:20260213-143435` (sha-7242b4d) |
| Default port | 8188 (host) → 8188 (container) |
| Flux text encoder quant | fp8_e4m3fn (override to fp16 if you observe quality loss) |

Bump the image pin deliberately after re-validating that Flux + any
custom nodes still work.
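
A pin table can drift from the compose file it describes. One way to catch that is to read the image reference back out of the file and compare it with the table. The sketch below assumes the compose file has a plain, unquoted `image:` line and is named `compose.yaml`; both are assumptions, and `compose_image` is a hypothetical helper.

```sh
# Prints the first `image:` value found in a compose file.
# ASSUMPTION: the value is unquoted and on the same line as the key.
compose_image() {
  sed -n 's/^[[:space:]]*image:[[:space:]]*//p' "$1" | head -n 1
}

# Example (not run here):
#   [ "$(compose_image compose.yaml)" = "kyuz0/amd-strix-halo-comfyui:20260213-143435" ] \
#     || echo "image pin differs from README"
```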

## Known caveats

- **`--disable-mmap` is mandatory** (kyuz0 README: "mmap above 64 GB is
  currently very slow due to a ROCm issue"). Already set in compose.
- **No `controlnet/` subdir is created by `set_extra_paths.sh`**, so we add
  it manually via pyinfra. If kyuz0 adds it to their script, our
  pyinfra dir creation is harmless, idempotent overhead.
- **Custom nodes can write to `/opt/ComfyUI/custom_nodes`**: that dir
  is bind-mounted RW so installs persist across container recreates.
- **Flux.2 crashes on Strix Halo ROCm** as of 2026-05; stick with
  Flux.1-Dev or HiDream-O1 until an upstream patch lands.
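
Since `--disable-mmap` is easy to lose in a compose edit, a one-line guard can confirm it is still present in the resolved configuration. This is a sketch; `has_disable_mmap` is a hypothetical helper, and how the flag is passed inside the compose file is not shown in this README.

```sh
# Succeeds if the text on stdin mentions --disable-mmap anywhere.
has_disable_mmap() {
  grep -q -e '--disable-mmap'
}

# Example (not run here):
#   docker compose config | has_disable_mmap || echo "--disable-mmap is missing"
```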

## Status

CF-P0 is in progress (this commit). Still to come: CF-P1 (Flux first image),
CF-P2 (productionize), CF-P3 (workflow library + OpenWebUI hook), and CF-P4
(HiDream, LTX-2, LoRAs); see the top-level task list.