Document current coding-workflow stack state

Snapshot of where opencode + Qwen3-Coder + MCPs + Kimi-Linear + voice + Phoenix tracing land today, plus in-flight (oc-tree, kimi-linear context ramp) and next (ComfyUI) items with pointers to per-project NEXT_STEPS.md guides.
2026-05-10 21:14:43 -04:00
parent 228fe8d1ac
commit a29793032d
35 changed files with 2067 additions and 37 deletions
--- a/pyinfra/framework/compose/kimi-linear/README.md
+++ b/pyinfra/framework/compose/kimi-linear/README.md
@@ -0,0 +1,124 @@
+# kimi-linear
+
+Kimi-Linear-48B-A3B-Instruct on vLLM, ROCm/TheRock 7.x, gfx1151. Sits
+beside Ollama (port 11434, Qwen3-Coder) on port 8000. OpenAI-compatible.
+
+This is the **P0 verification stage** — no public Strix Halo numbers
+exist for this model as of 2026-05. Three things are unverified until a
+first generation succeeds: KDA Triton kernel on gfx1151,
+compressed-tensors loader on ROCm, and AITER + Kimi MoE topology.
+Smoke-test below confirms all three at once.
+
+## Prereqs
+
+- Pyinfra deploy has run (`./run.sh` from `pyinfra/framework/`) — gives
+  you `/srv/docker/kimi-linear/`, GPU group membership, `/models/`
+  layout, and `huggingface-cli` on the box.
+- Hugging Face CLI authenticated (`huggingface-cli login`) if the
+  weights repo gates downloads. cyankiwi's repo is currently public.
+
+## Step 1 — Download weights
+
+```sh
+huggingface-cli download \
+    cyankiwi/Kimi-Linear-48B-A3B-Instruct-AWQ-4bit \
+    --local-dir /models/moonshotai/Kimi-Linear-48B-A3B-Instruct-AWQ-4bit
+```
+
+~35 GB. The repo is named `AWQ-4bit` but the actual format is
+`compressed-tensors` int4 group-quantized — see `config.json`.
+
+## Step 2 — Try the upstream image first
+
+```sh
+cd /srv/docker/kimi-linear
+docker compose pull       # ~8.5 GB
+docker compose up -d
+docker compose logs -f
+```
+
+Watch for one of three things:
+
+- **Loads cleanly, model serves on :8000** → P0 passes. Run `./smoke.sh`.
+- **`MLAModules.__init__() missing 'indexer_rotary_emb'`** → upstream
+  image is on vLLM 0.12.x; need the v0.11.2 source build. Skip to
+  Step 3.
+- **KDA / Triton / fla-core compile error** → kernel doesn't work on
+  gfx1151 yet. Fall back path: llama.cpp ROCm + bartowski Q4_K_M GGUF
+  in `compose/llama.yml`. Document the error in
+  `localgenai/kimi-linear/NOTES.md` and stop.
+
+## Step 3 — Source build (if needed)
+
+```sh
+cd /srv/docker/kimi-linear
+tmux new -s kimi-build
+./build.sh        # multi-hour. Detach with C-b d; reattach with `tmux a -t kimi-build`
+```
+
+Builds `kimi-linear-local:v0.11.2` from kyuz0 SHA `e2288d6` with
+`VLLM_COMMIT=v0.11.2`. Then edit `docker-compose.yml`:
+
+```yaml
+    image: kimi-linear-local:v0.11.2
+```
+
+…and `docker compose up -d` again.
+
+## Step 4 — Smoke test
+
+```sh
+./smoke.sh
+```
+
+Expects: `/v1/models` returns `kimi-linear`; a four-token generation
+returns "ok". If both pass, **P0 is done**. Update task #6 and proceed
+to P1.
+
+## Operations
+
+```sh
+docker compose logs -f kimi-linear         # tail
+docker compose restart kimi-linear         # reload
+docker compose down                        # stop
+docker compose exec kimi-linear bash       # shell in
+amdgpu_top                                 # on host: GPU power, mem, util
+```
+
+## Pin manifest
+
+| Component                   | Pin                                |
+| --------------------------- | ---------------------------------- |
+| kyuz0 toolbox               | commit `e2288d6` (2026-04-22)      |
+| vLLM                        | tag `v0.11.2` (Moonshot recipe)    |
+| Image (default)             | `kyuz0/vllm-therock-gfx1151:stable`|
+| Image (pinned, if built)    | `kimi-linear-local:v0.11.2`        |
+| Weights                     | `cyankiwi/Kimi-Linear-48B-A3B-Instruct-AWQ-4bit` (compressed-tensors int4) |
+| ROCm                        | TheRock nightlies via kyuz0 base   |
+| Python                      | 3.12 (hardcoded in kyuz0 Dockerfile) |
+
+Bump policy: don't move vLLM to 0.12.x; don't move kyuz0 commit without
+re-running smoke; bump weights only when an 8-bit A/B is in scope (P3).
+
+## Port collision warning
+
+`compose/vllm.yml` is a placeholder stub that also binds `:8000`. Only
+one of `kimi-linear` and `vllm` can run at a time. Don't `docker compose
+up` both. Long term either delete the stub or move it to a different
+port; not in scope here.
+
+## Known issues / mitigations
+
+- **HIP graph capture broken on gfx1151** (vllm-project/vllm#32180) —
+  `--enforce-eager` mitigates at a throughput cost. Re-test without it
+  once the upstream fix lands.
+- **vLLM 0.12.0 crash on Kimi-Linear** —
+  `MLAModules.__init__() missing 'indexer_rotary_emb'`. Hard pin to
+  0.11.2.
+- **No published gfx1151 numbers** — we are first. Findings stay
+  private (no upstream filings) per project policy.
+
+## Status
+
+P0 in progress. Update `oc-tree`-style `NEXT_STEPS.md` if you set this
+aside mid-verification.