# localgenai

Local LLM stack on a Framework Desktop (AMD Ryzen AI Max+ 395 / "Strix Halo", Radeon 8060S, 128 GB unified LPDDR5x), driven from a Mac over Tailscale. Coding agents, monitoring, voice — all self-hosted.

## Topology

```
┌─────────────────┐    Tailscale     ┌──────────────────────────────┐
│  Mac            │ ───────────────▶ │  Framework Desktop           │
│                 │                  │  (Ubuntu 26.04, gfx1151 iGPU)│
│  • OpenCode     │                  │  • Inference: Ollama,        │
│    + Phoenix    │                  │    llama.cpp, vLLM           │
│    bridge       │                  │  • Agent UIs: OpenWebUI,     │
│  • install.sh   │                  │    OpenHands                 │
│                 │                  │  • Monitoring: Beszel,       │
│                 │                  │    OpenLIT, Phoenix          │
└─────────────────┘                  └──────────────────────────────┘
```
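
From the Mac, a quick way to confirm the tailnet path before anything else (a sketch; it assumes the box's MagicDNS name is `framework`, as used throughout this README):

```sh
# Confirm the Framework Desktop is reachable over Tailscale
tailscale ping framework

# Homepage (the front door) should answer with 200
curl -s -o /dev/null -w '%{http_code}\n' http://framework:7575/
```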

## Repo layout

| Path | What's there |
| --- | --- |
| `pyinfra/` | pyinfra deploy targeting the Framework Desktop: host setup (kernel params, Docker, AMD driver bits) plus all docker-compose files for the services below. See `pyinfra/framework/README.md`. |
| `opencode/` | OpenCode config + Phoenix bridge plugin. `install.sh` deploys it to `~/.config/opencode/` on a Mac, wires up the Playwright / SearXNG MCPs, and ships every prompt's spans to Phoenix. |
| `StrixHaloSetup.md` | Original phased bring-up plan for the box. |
| `StrixHaloMemory.md` | UMA / GTT / bandwidth notes — "what fits, what's slow, why." |
| `Roadmap.md` | What to build on top of the stack to narrow the gap with hosted Claude Code / Cowork / Skills / Design. |
| `TODO.md` | Open questions and follow-ups. |
| `testing/qwen3-coder-30b/` | Small evaluation harness for Qwen3-Coder. |
| `VoiceModels.md` | STT / TTS landscape and upgrade paths from the Wyoming defaults. |

## Service ports (on `framework`)

| Port | Service | Notes |
| --- | --- | --- |
| 7575 | Homepage | Front door — start here. One tile per service, with live widgets. |
| 8080 | llama.cpp | Vulkan backend; `--metrics` for Prometheus. |
| 8000 | vLLM | ROCm; gfx1151 support varies. |
| 11434 | Ollama | ROCm with `HSA_OVERRIDE_GFX_VERSION=11.0.0`. |
| 3000 | OpenWebUI | ChatGPT-style UI in front of Ollama. |
| 3001 | OpenLIT | LLM fleet-metrics dashboard. |
| 3030 | OpenHands | Autonomous agent + sandbox runtime; Tailscale-only by design. |
| 4317 | Phoenix OTLP/gRPC | Trace ingestion. |
| 6006 | Phoenix UI / OTLP/HTTP | Per-trace agent waterfall (also `:6006/v1/traces`). |
| 8001 | faster-whisper | STT (OpenAI API), `large-v3-turbo`; for OpenWebUI/Conduit. |
| 8090 | Beszel | Host + container + AMD GPU dashboard. |
| 8880 | Kokoro | TTS (OpenAI API), Kokoro-82M; for OpenWebUI/Conduit. |
| 10200 | Piper | TTS (Wyoming protocol); for Home Assistant Assist. |
| 10300 | Whisper | STT (Wyoming protocol); for Home Assistant Assist. |
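
Most of these speak plain HTTP, so a few `curl` probes from the Mac make a decent smoke test. A sketch, assuming the `framework` hostname and the standard OpenAI-style paths these servers expose (exact model and voice names may differ per deployment):

```sh
# llama.cpp: health check, plus the Prometheus endpoint enabled by --metrics
curl -s http://framework:8080/health
curl -s http://framework:8080/metrics | head

# Ollama: list pulled models
curl -s http://framework:11434/api/tags | jq '.models[].name'

# Kokoro TTS (OpenAI-style /v1/audio/speech); voice name is an example
curl -s http://framework:8880/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{"model": "kokoro", "voice": "af_heart", "input": "stack check"}' \
  -o /tmp/check.wav

# faster-whisper STT (OpenAI-style /v1/audio/transcriptions): round-trip the clip
curl -s http://framework:8001/v1/audio/transcriptions \
  -F file=@/tmp/check.wav -F model=large-v3-turbo
```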

## Quick start

On the box (after the pyinfra deploy):

```sh
cd /srv/docker/ollama  && docker compose up -d
cd /srv/docker/phoenix && docker compose up -d
cd /srv/docker/beszel  && docker compose up -d   # then add system per beszel.yml
```
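
Then sanity-check that everything actually came up (container names are whatever the compose files assign, so the generic views below are safest):

```sh
docker compose ls        # one row per running compose project
docker ps --format '{{.Names}}\t{{.Status}}\t{{.Ports}}'

# Phoenix UI should answer locally before you point the Mac at it
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:6006
```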

On the Mac:

```sh
cd opencode && ./install.sh
opencode
```

Send a prompt; watch it land in Phoenix at http://framework:6006.

## Coding workflows — state of the stack (2026-05-10)

| Component | Status | Notes |
| --- | --- | --- |
| Primary: opencode → `framework/qwen3-coder:30b` (Ollama) | Working daily-driver | Full MCP toolbox (playwright, searxng, serena, basic-memory, sequential-thinking, task-master). The Phoenix bridge emits OTel traces for every call. |
| Secondary: opencode → `framework-vllm/kimi-linear` (vLLM) | Configured, experimental | `tool_call: false` — Kimi-Linear is a research model and isn't strongly tool-trained. Long-context chat only. Switch via `/model framework-vllm/kimi-linear`. |
| Chat front-end for Kimi-Linear: OpenWebUI | Working | vLLM endpoint wired in via `OPENAI_API_BASE_URLS`; pick `kimi-linear` from the model selector in OpenWebUI. |
| Voice loop: faster-whisper + Kokoro | Deployed | OpenAI-compatible endpoints; not yet wired into a hands-free chat loop. |
| Observability: Phoenix per-trace + OpenLIT fleet | Working | All opencode prompts traced via `.opencode/plugin/phoenix-bridge.js`. |
| In flight: oc-tree TUI sidecar | M0 skeleton done, awaiting probe | Live tree view of opencode's SSE event stream; a tmux-side companion to Phoenix's post-hoc traces. See `oc-tree/NEXT_STEPS.md`. |
| In flight: Kimi-Linear context ramp | P0 done at 32K; BIOS + GTT recipe applied | Next: ramp `--max-model-len` toward 1M and benchmark. See `kimi-linear/NEXT_STEPS.md`. |
| Configured but not deployed: VS Code Continue | YAML ready | Drop `vscode-continue-config.yml` into `~/.continue/config.yaml`; `ollama pull nomic-embed-text` on the box for `@codebase` indexing. |
| Planned next: ComfyUI for image generation | Phase plan written | Flux.1-Dev via the kyuz0 toolbox, same pyinfra pattern as kimi-linear. See the task list (CF-P0..P4). |
| Roadmap-only (not started): Aider, Cline, OpenHands, LiteLLM, Multica | — | See `Roadmap.md`. |

Single-line summary: opencode + Qwen3-Coder + the MCP toolbox is the working coding harness; Kimi-Linear is alongside for long-context chat; oc-tree and ComfyUI are the next two infra additions.
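
For the secondary vLLM lane, a minimal smoke test from the Mac (a sketch assuming vLLM's standard OpenAI-compatible server on `:8000` and a served model id of `kimi-linear`, matching the `/model framework-vllm/kimi-linear` alias):

```sh
# What is vLLM serving?
curl -s http://framework:8000/v1/models | jq '.data[].id'

# One chat turn; tool calling stays off (tool_call: false), so plain text only
curl -s http://framework:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "kimi-linear",
       "messages": [{"role": "user", "content": "Say hello in one line."}],
       "max_tokens": 32}' | jq -r '.choices[0].message.content'
```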

## Why this stack

- Bandwidth, not VRAM, is the ceiling on Strix Halo. 256 GB/s of memory bandwidth means MoE models with a small active-parameter count (Qwen3-Coder-30B-A3B, Kimi-Linear-48B-A3B) dominate; see the back-of-envelope numbers after this list, and `StrixHaloMemory.md` for the full story.
- AMD APU monitoring is broken upstream. `amd-smi` returns N/A on gfx1151, which kills the official Prometheus/Grafana exporters. Beszel reads sysfs directly and works. See the monitoring section of `pyinfra/framework/README.md`.
- Two-layer observability. OpenLIT gives fleet metrics across sessions; Phoenix gives the per-trace waterfall for "what did this one prompt do?"
- Reproducible. Every host-side config lives in `pyinfra/`; every Mac-side config in `opencode/install.sh`. Re-running either is safe.
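
The back-of-envelope numbers behind that first point (illustrative only; ~4.5 bits/param approximates a Q4 quant with overhead, and real decode speed lands below the ceiling):

```
decode ceiling ≈ memory bandwidth / bytes read per token
  MoE, ~3B active @ ~4.5 bits/param ≈ 1.7 GB/token → 256 / 1.7 ≈ 150 tok/s
  dense 30B       @ ~4.5 bits/param ≈ 17 GB/token  → 256 / 17  ≈ 15 tok/s
```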