progress 235b

2026-06-08 15:31:50 +01:00
parent a29793032d
commit de1635872f
25 changed files with 1598 additions and 53 deletions
--- a/README.md
+++ b/README.md
@@ -31,6 +31,7 @@ Tailscale. Coding agents, monitoring, voice — all self-hosted.
 | [`TODO.md`](TODO.md) | Open questions and follow-ups. |
 | [`testing/qwen3-coder-30b/`](testing/qwen3-coder-30b/) | Small evaluation harness for Qwen3-Coder. |
 | [`VoiceModels.md`](VoiceModels.md) | STT / TTS landscape and upgrade paths from the Wyoming defaults. |
+| [`ImageGen.md`](ImageGen.md) | Image generation workflows on the ComfyUI stack — resolution envelope, hires fix, Redux/IP-Adapter for reference-driven generation. |

 ## Service ports (on `framework`)

@@ -42,11 +43,14 @@ Tailscale. Coding agents, monitoring, voice — all self-hosted.
 | `11434` | Ollama | ROCm with `HSA_OVERRIDE_GFX_VERSION=11.0.0` |
 | `3000` | OpenWebUI | ChatGPT-style UI in front of Ollama |
 | `3001` | OpenLIT | LLM fleet metrics dashboard |
+| `4000` | LiteLLM | OpenAI-compatible router in front of all local backends — `Bearer $LITELLM_MASTER_KEY` |
 | `3030` | OpenHands | Autonomous agent + sandbox runtime — Tailscale-only by design |
 | `4317` | Phoenix OTLP/gRPC | Trace ingestion |
 | `6006` | Phoenix UI / OTLP/HTTP | Per-trace agent waterfall (also `:6006/v1/traces`) |
 | `8001` | faster-whisper | STT (OpenAI API) — large-v3-turbo, for OpenWebUI/Conduit |
 | `8090` | Beszel | Host + container + AMD GPU dashboard |
+| `8081` | llama.cpp (Qwen3-235B) | Long-task model, ~5-10 tok/s — manual start only (can't coexist with 30B/Kimi/ComfyUI) |
+| `8188` | ComfyUI | Image generation web UI — manual start only (GPU contention) |
 | `8880` | Kokoro | TTS (OpenAI API) — Kokoro-82M, for OpenWebUI/Conduit |
 | `10200` | Piper | TTS (Wyoming protocol) — for Home Assistant Assist |
 | `10300` | Whisper | STT (Wyoming protocol) — for Home Assistant Assist |
@@ -80,10 +84,12 @@ Send a prompt; watch it land in Phoenix at <http://framework:6006>.
 | **In flight**: oc-tree TUI sidecar | M0 skeleton done, awaiting probe | Live tree view of opencode's SSE event stream; tmux-side companion to Phoenix's post-hoc traces. See [`oc-tree/NEXT_STEPS.md`](oc-tree/NEXT_STEPS.md). |
 | **In flight**: Kimi-Linear context ramp | P0 done at 32K, BIOS+GTT recipe applied | Next: ramp `--max-model-len` toward 1M and benchmark. See [`kimi-linear/NEXT_STEPS.md`](kimi-linear/NEXT_STEPS.md). |
 | **Configured but not deployed**: VS Code Continue | YAML ready | Drop [`vscode-continue-config.yml`](vscode-continue-config.yml) into `~/.continue/config.yaml`; `ollama pull nomic-embed-text` on the box for `@codebase` indexing. |
-| **Planned next**: ComfyUI for image generation | Phase plan written | Flux.1-Dev via kyuz0 toolbox, same pyinfra pattern as kimi-linear. See task list (CF-P0..P4). |
-| **Roadmap-only** (not started): Aider, Cline, OpenHands, LiteLLM, Multica | — | See `Roadmap.md`. |
+| **In flight**: ComfyUI for image generation | CF-P0 — artifacts written, awaiting box-side bring-up | Flux.1-Dev via `kyuz0/amd-strix-halo-comfyui`. Manual start (`restart: "no"`) so it doesn't contend with kimi-linear at boot. See [`pyinfra/framework/compose/comfyui/README.md`](pyinfra/framework/compose/comfyui/README.md). |
+| **In flight**: Qwen3-235B long-task model | M0 — artifacts written, awaiting weight pull + bring-up | Unsloth UD-Q2_K_XL (~88.8 GB) via `kyuz0:rocm-7.2.2`. The "overnight" model at ~5-10 tok/s — can't coexist with the 30B/Kimi/ComfyUI. See [`pyinfra/framework/compose/qwen3-235b/README.md`](pyinfra/framework/compose/qwen3-235b/README.md). Opencode wired via direct `framework-llama` provider. |
+| **In flight**: LiteLLM router | M1 — artifacts written, awaiting box-side bring-up | Single OpenAI-compatible endpoint (port 4000) in front of all 4 local backends; routing table in [`pyinfra/framework/compose/litellm/config.yaml`](pyinfra/framework/compose/litellm/config.yaml). Opencode stays direct; OpenHands + future orchestrator point here. |
+| **Roadmap-only** (not started): Aider, Cline, OpenHands integration, Multica | — | See `Roadmap.md`. |

-**Single-line summary**: opencode + Qwen3-Coder + the MCP toolbox is the working coding harness; Kimi-Linear is alongside for long-context chat; oc-tree and ComfyUI are the next two infra additions.
+**Single-line summary**: opencode + Qwen3-Coder + the MCP toolbox is the working coding harness; Kimi-Linear is alongside for long-context chat; Qwen3-235B is being added as the overnight long-task model, fronted by a LiteLLM router for the future orchestrator; oc-tree and ComfyUI are the next two infra additions.

 ## Why this stack