# localgenai
Local LLM stack on a Framework Desktop (AMD Ryzen AI Max+ 395 / "Strix
Halo", Radeon 8060S, 128 GB unified LPDDR5x), driven from a Mac over
Tailscale. Coding agents, monitoring, voice — all self-hosted.
## Topology
```
┌─────────────────┐      Tailscale     ┌──────────────────────────────┐
│       Mac       │ ───────────────▶   │      Framework Desktop       │
│                 │                    │ (Ubuntu 26.04, gfx1151 iGPU) │
│ • OpenCode      │                    │ • Inference: Ollama,         │
│   + Phoenix     │                    │   llama.cpp, vLLM            │
│   bridge        │                    │ • Agent UIs: OpenWebUI,      │
│ • install.sh    │                    │   OpenHands                  │
│                 │                    │ • Monitoring: Beszel,        │
│                 │                    │   OpenLIT, Phoenix           │
└─────────────────┘                    └──────────────────────────────┘
```
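The box is reachable as `framework` on the tailnet throughout this README. A quick reachability check from the Mac:
```sh
tailscale status | grep framework   # box shows up on the tailnet
tailscale ping framework            # ideally a direct path, not via DERP
```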
## Repo layout
| Path | What's there |
|---|---|
| [`pyinfra/`](pyinfra/) | pyinfra deploy targeting the Framework Desktop. Host setup (kernel params, Docker, AMD driver bits) plus all docker-compose files for the services below. See [`pyinfra/framework/README.md`](pyinfra/framework/README.md). |
| [`opencode/`](opencode/) | OpenCode config + Phoenix bridge plugin. `install.sh` deploys it to `~/.config/opencode/` on a Mac. Wires up Playwright / SearXNG MCP and ships every prompt's spans to Phoenix. |
| [`StrixHaloSetup.md`](StrixHaloSetup.md) | Original phased bring-up plan for the box. |
| [`StrixHaloMemory.md`](StrixHaloMemory.md) | UMA / GTT / bandwidth notes — "what fits, what's slow, why." |
| [`Roadmap.md`](Roadmap.md) | What to build on top of the stack to narrow the gap with hosted Claude Code / Cowork / Skills / Design. |
| [`TODO.md`](TODO.md) | Open questions and follow-ups. |
| [`testing/qwen3-coder-30b/`](testing/qwen3-coder-30b/) | Small evaluation harness for Qwen3-Coder. |
| [`VoiceModels.md`](VoiceModels.md) | STT / TTS landscape and upgrade paths from the Wyoming defaults. |
## Service ports (on `framework`)
| Port | Service | Notes |
|---|---|---|
| `7575` | **Homepage** | **Front door — start here.** Tile per service with live widgets. |
| `8080` | llama.cpp | Vulkan backend, `--metrics` for Prometheus |
| `8000` | vLLM | ROCm; gfx1151 support varies |
| `11434` | Ollama | ROCm with `HSA_OVERRIDE_GFX_VERSION=11.0.0` |
| `3000` | OpenWebUI | ChatGPT-style UI in front of Ollama |
| `3001` | OpenLIT | LLM fleet metrics dashboard |
| `3030` | OpenHands | Autonomous agent + sandbox runtime — Tailscale-only by design |
| `4317` | Phoenix OTLP/gRPC | Trace ingestion |
| `6006` | Phoenix UI / OTLP/HTTP | Per-trace agent waterfall (also `:6006/v1/traces`) |
| `8001` | faster-whisper | STT (OpenAI API) — large-v3-turbo, for OpenWebUI/Conduit |
| `8090` | Beszel | Host + container + AMD GPU dashboard |
| `8880` | Kokoro | TTS (OpenAI API) — Kokoro-82M, for OpenWebUI/Conduit |
| `10200` | Piper | TTS (Wyoming protocol) — for Home Assistant Assist |
| `10300` | Whisper | STT (Wyoming protocol) — for Home Assistant Assist |
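A quick smoke test of the HTTP services from the Mac. The model and voice names below are placeholders, not values this repo pins; the STT/TTS calls follow the OpenAI audio API shapes these servers emulate:
```sh
# Inference endpoints
curl -s http://framework:8080/v1/models    # llama.cpp
curl -s http://framework:8000/v1/models    # vLLM
curl -s http://framework:11434/api/tags    # Ollama (native API)

# Voice endpoints; swap in the model/voice names your containers actually serve
curl -s http://framework:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro", "input": "stack is up", "voice": "af_heart", "response_format": "wav"}' \
  -o /tmp/tts.wav
curl -s http://framework:8001/v1/audio/transcriptions \
  -F file=@/tmp/tts.wav -F model=large-v3-turbo
```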
## Quick start
On the box (after the pyinfra deploy):
```sh
cd /srv/docker/ollama && docker compose up -d
cd /srv/docker/phoenix && docker compose up -d
cd /srv/docker/beszel && docker compose up -d # then add system per beszel.yml
```
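To confirm everything came up (container names vary by compose file, so list rather than guess):
```sh
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
```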
On the Mac:
```sh
cd opencode && ./install.sh
opencode
```
Send a prompt; watch it land in Phoenix at <http://framework:6006>.
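If nothing lands, it can help to bypass OpenCode and hit the inference server directly; a sketch against llama.cpp's OpenAI-compatible endpoint, with a placeholder model name:
```sh
curl -s http://framework:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3-coder", "messages": [{"role": "user", "content": "say hi"}]}'
```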
## Why this stack
- **Bandwidth, not VRAM, is the ceiling on Strix Halo.** 256 GB/s of memory
bandwidth means MoE models with few active parameters (Qwen3-Coder-30B-A3B,
Kimi-Linear-48B-A3B) dominate; a back-of-envelope estimate follows this
list. See `StrixHaloMemory.md`.
- **AMD APU monitoring is broken upstream.** `amd-smi` returns N/A on
gfx1151, which kills the official Prometheus/Grafana exporters. Beszel
reads sysfs directly and works (see the sysfs sketch after this list and
the monitoring section of `pyinfra/framework/README.md`).
- **Two-layer observability.** OpenLIT is fleet metrics across sessions;
Phoenix is per-trace waterfall for "what did this one prompt do."
- **Reproducible.** Every host-side config lives in `pyinfra/`; every
Mac-side config in `opencode/install.sh`. Re-running either is safe.
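The bandwidth point in numbers, as a minimal sketch. It assumes decode is
memory-bound, active weights are re-read once per token, and Q4 quantization
costs roughly 0.56 bytes per parameter; all three are approximations.
```sh
# ~3e9 active params * ~0.56 B/param ≈ 1.7 GB read per decoded token.
# 256 GB/s / 1.7 GB/token ≈ 150 tok/s: a theoretical ceiling, not a benchmark.
# A dense 30B at the same quant reads ~17 GB/token, capping near 15 tok/s.
echo "256 / 1.7" | bc   # prints 150
```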
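What "reads sysfs directly" means in practice: the amdgpu driver exposes
utilization and memory counters as plain files. A sketch, assuming the iGPU
is `card0` (the index can differ per machine):
```sh
# Run on the box: GPU busy % and memory use straight from the amdgpu driver.
cat /sys/class/drm/card0/device/gpu_busy_percent
cat /sys/class/drm/card0/device/mem_info_vram_used   # bytes
cat /sys/class/drm/card0/device/mem_info_gtt_used    # bytes of system RAM mapped for the GPU
```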