# localgenai
Local LLM stack on a Framework Desktop (AMD Ryzen AI Max+ 395 / "Strix Halo", Radeon 8060S, 128 GB unified LPDDR5x), driven from a Mac over Tailscale. Coding agents, monitoring, voice — all self-hosted.
## Topology

```
┌─────────────────┐     Tailscale     ┌──────────────────────────────┐
│ Mac             │ ───────────────▶  │ Framework Desktop            │
│                 │                   │ (Ubuntu 24.04, gfx1151 iGPU) │
│ • OpenCode      │                   │ • Inference: Ollama,         │
│   + Phoenix     │                   │   llama.cpp, vLLM            │
│     bridge      │                   │ • Agent UIs: OpenWebUI,      │
│ • install.sh    │                   │   OpenHands                  │
│                 │                   │ • Monitoring: Beszel,        │
│                 │                   │   OpenLIT, Phoenix           │
└─────────────────┘                   └──────────────────────────────┘
```
## Repo layout

| Path | What's there |
|---|---|
| `pyinfra/` | pyinfra deploy targeting the Framework Desktop. Host setup (kernel params, Docker, AMD driver bits) plus all docker-compose files for the services below. See `pyinfra/framework/README.md`; a minimal sketch follows this table. |
| `opencode/` | OpenCode config + Phoenix bridge plugin. `install.sh` deploys it to `~/.config/opencode/` on a Mac. Wires up Playwright / SearXNG MCP and ships every prompt's spans to Phoenix. |
| `StrixHaloSetup.md` | Original phased bring-up plan for the box. |
| `StrixHaloMemory.md` | UMA / GTT / bandwidth notes — "what fits, what's slow, why." |
| `Roadmap.md` | What to build on top of the stack to narrow the gap with hosted Claude Code / Cowork / Skills / Design. |
| `TODO.md` | Open questions and follow-ups. |
| `testing/qwen3-coder-30b/` | Small evaluation harness for Qwen3-Coder. |
| `VoiceModels.md` | STT / TTS landscape and upgrade paths from the Wyoming defaults. |
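The `pyinfra/` deploy is ordinary pyinfra code. As a minimal, hypothetical sketch of the style (the file name and package choices here are illustrative; the real operations live under `pyinfra/framework/`):

```python
# deploy_docker.py — hypothetical sketch, not a file from this repo.
# Installs Docker and makes sure the daemon is running.
from pyinfra.operations import apt, systemd

apt.packages(
    name="Install Docker from the Ubuntu archive",
    packages=["docker.io", "docker-compose-v2"],
    update=True,
    _sudo=True,
)

systemd.service(
    name="Enable and start the Docker daemon",
    service="docker",
    running=True,
    enabled=True,
    _sudo=True,
)
```

Run it with `pyinfra <inventory> deploy_docker.py`; operations are no-ops when the host already matches the declared state, which is what makes re-running safe.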
## Service ports (on framework)

| Port | Service | Notes |
|---|---|---|
| 7575 | Homepage | Front door — start here. Tile per service with live widgets. |
| 8080 | llama.cpp | Vulkan backend, `--metrics` for Prometheus |
| 8000 | vLLM | ROCm; gfx1151 support varies |
| 11434 | Ollama | ROCm with `HSA_OVERRIDE_GFX_VERSION=11.0.0` |
| 3000 | OpenWebUI | ChatGPT-style UI in front of Ollama |
| 3001 | OpenLIT | LLM fleet metrics dashboard |
| 3030 | OpenHands | Autonomous agent + sandbox runtime — Tailscale-only by design |
| 4317 | Phoenix OTLP/gRPC | Trace ingestion |
| 6006 | Phoenix UI / OTLP/HTTP | Per-trace agent waterfall (also `:6006/v1/traces`) |
| 8001 | faster-whisper | STT (OpenAI API) — large-v3-turbo, for OpenWebUI/Conduit |
| 8090 | Beszel | Host + container + AMD GPU dashboard |
| 8880 | Kokoro | TTS (OpenAI API) — Kokoro-82M, for OpenWebUI/Conduit |
| 10200 | Piper | TTS (Wyoming protocol) — for Home Assistant Assist |
| 10300 | Whisper | STT (Wyoming protocol) — for Home Assistant Assist |
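A quick way to confirm the HTTP services are up from the Mac (a hypothetical smoke test, not part of this repo; it assumes `framework` resolves over Tailscale MagicDNS):

```python
# smoke_check.py — hypothetical helper; probes a few of the ports above.
import urllib.request

def check(url: str) -> None:
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            print(f"OK   {url} ({resp.status})")
    except Exception as exc:
        print(f"FAIL {url} ({exc})")

check("http://framework:11434/api/version")  # Ollama
check("http://framework:8080/health")        # llama.cpp server health endpoint
check("http://framework:3000")               # OpenWebUI
check("http://framework:6006")               # Phoenix UI
```

The Wyoming ports (10200/10300) speak their own protocol rather than HTTP, so they aren't covered here.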
## Quick start

On the box (after the pyinfra deploy):

```bash
cd /srv/docker/ollama && docker compose up -d
cd /srv/docker/phoenix && docker compose up -d
cd /srv/docker/beszel && docker compose up -d   # then add system per beszel.yml
```

On the Mac:

```bash
cd opencode && ./install.sh
opencode
```

Send a prompt; watch it land in Phoenix at http://framework:6006.
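To verify the trace pipeline without involving OpenCode, a minimal sketch that sends one test span to Phoenix over OTLP/gRPC (port 4317 in the table above); it assumes `opentelemetry-sdk` and `opentelemetry-exporter-otlp` are pip-installed and that `framework` resolves over Tailscale:

```python
# send_test_span.py — hypothetical helper, not part of this repo.
# Emits a single span to Phoenix's OTLP/gRPC ingest.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://framework:4317", insecure=True))
)
trace.set_tracer_provider(provider)

with trace.get_tracer("smoke-test").start_as_current_span("hello-phoenix"):
    pass  # any work done here shows up in the Phoenix waterfall at :6006

provider.shutdown()  # flush the batch processor before the script exits
```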
## Why this stack

- Bandwidth, not VRAM, is the ceiling on Strix Halo. 256 GB/s memory bandwidth means MoE models with small active-param counts (Qwen3-Coder-30B-A3B, Kimi-Linear-48B-A3B) dominate. See `StrixHaloMemory.md`.
- AMD APU monitoring is broken upstream. `amd-smi` returns N/A on gfx1151, which kills the official Prometheus/Grafana exporters. Beszel reads sysfs directly and works (a rough sketch of that route follows this list). See the monitoring section of `pyinfra/framework/README.md`.
- Two-layer observability. OpenLIT is fleet metrics across sessions; Phoenix is a per-trace waterfall for "what did this one prompt do."
- Reproducible. Every host-side config lives in `pyinfra/`; every Mac-side config in `opencode/install.sh`. Re-running either is safe.
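As a rough illustration of the sysfs route Beszel takes: the amdgpu driver exposes utilization and memory counters as plain text files (this sketch assumes `card0`; the card index and available files vary by machine and kernel):

```python
# read_gpu_sysfs.py — hypothetical sketch of reading amdgpu stats
# straight from sysfs, the route that works when amd-smi reports N/A.
from pathlib import Path

DEV = Path("/sys/class/drm/card0/device")  # card index may differ

def read_int(name: str) -> int | None:
    try:
        return int((DEV / name).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None

busy = read_int("gpu_busy_percent")    # % of time the GPU was busy
vram = read_int("mem_info_vram_used")  # bytes of carved-out VRAM in use
gtt  = read_int("mem_info_gtt_used")   # bytes of GTT (shared system RAM) in use

print(f"busy={busy}%  vram={vram}  gtt={gtt}")
```

On a UMA part like Strix Halo the GTT number matters as much as VRAM, since with a small VRAM carve-out most model weights end up in GTT-mapped system memory (see `StrixHaloMemory.md`).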