Add top-level README
One-page orientation: topology diagram, repo layout, service-port table, quick-start, and the rationale notes that get glossed over in the sub-READMEs (bandwidth ceiling, AMD APU monitoring gap, two-layer observability).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# localgenai

Local LLM stack on a Framework Desktop (AMD Ryzen AI Max+ 395 / "Strix
Halo", Radeon 8060S, 128 GB unified LPDDR5x), driven from a Mac over
Tailscale. Coding agents, monitoring, voice — all self-hosted.

## Topology

```
┌─────────────────┐      Tailscale       ┌──────────────────────────────┐
│ Mac             │   ───────────────▶   │ Framework Desktop            │
│                 │                      │ (Ubuntu 26.04, gfx1151 iGPU) │
│ • OpenCode      │                      │ • Inference: Ollama,         │
│   + Phoenix     │                      │   llama.cpp, vLLM            │
│   bridge        │                      │ • Agent UIs: OpenWebUI,      │
│ • install.sh    │                      │   OpenHands                  │
│                 │                      │ • Monitoring: Beszel,        │
│                 │                      │   OpenLIT, Phoenix           │
└─────────────────┘                      └──────────────────────────────┘
```

## Repo layout

| Path | What's there |
|---|---|
| [`pyinfra/`](pyinfra/) | pyinfra deploy targeting the Framework Desktop. Host setup (kernel params, Docker, AMD driver bits) plus all docker-compose files for the services below. See [`pyinfra/framework/README.md`](pyinfra/framework/README.md). |
| [`opencode/`](opencode/) | OpenCode config + Phoenix bridge plugin. `install.sh` deploys it to `~/.config/opencode/` on a Mac. Wires up Playwright / SearXNG MCP and ships every prompt's spans to Phoenix. |
| [`StrixHaloSetup.md`](StrixHaloSetup.md) | Original phased bring-up plan for the box. |
| [`StrixHaloMemory.md`](StrixHaloMemory.md) | UMA / GTT / bandwidth notes — "what fits, what's slow, why." |
| [`Roadmap.md`](Roadmap.md) | What to build on top of the stack to narrow the gap with hosted Claude Code / Cowork / Skills / Design. |
| [`TODO.md`](TODO.md) | Open questions and follow-ups. |
| [`testing/qwen3-coder-30b/`](testing/qwen3-coder-30b/) | Small evaluation harness for Qwen3-Coder. |
| `piper-compose.yaml`, `whisper-compose.yaml` | Voice (TTS / STT) compose files; not yet wired into the pyinfra deploy. |

## Service ports (on `framework`)

| Port | Service | Notes |
|---|---|---|
| `8080` | llama.cpp | Vulkan backend, `--metrics` for Prometheus |
| `8000` | vLLM | ROCm; gfx1151 support varies |
| `11434` | Ollama | ROCm with `HSA_OVERRIDE_GFX_VERSION=11.0.0` |
| `3000` | OpenWebUI | ChatGPT-style UI in front of Ollama |
| `3001` | OpenLIT | LLM fleet metrics dashboard |
| `3030` | OpenHands | Loopback only — SSH-tunnel from your laptop |
| `4317` | Phoenix OTLP/gRPC | Trace ingestion |
| `6006` | Phoenix UI / OTLP/HTTP | Per-trace agent waterfall (also `:6006/v1/traces`) |
| `8090` | Beszel | Host + container + AMD GPU dashboard |

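The port table above can double as a machine-readable smoke test. A minimal sketch in Python, where the probe paths are assumptions based on each server's usual health endpoint (llama.cpp's `/health`, Ollama's `/api/tags`, vLLM's OpenAI-compatible `/v1/models`) and `framework` is the Tailscale hostname:

```python
"""Quick reachability check for the services in the port table."""
import urllib.request

HOST = "framework"  # Tailscale MagicDNS name; adjust if yours differs

# service -> (port, probe path); paths are the servers' usual endpoints
SERVICES = {
    "llama.cpp": (8080, "/health"),
    "vllm": (8000, "/v1/models"),
    "ollama": (11434, "/api/tags"),
    "openwebui": (3000, "/"),
    "phoenix": (6006, "/"),
}

def url_for(name: str) -> str:
    port, path = SERVICES[name]
    return f"http://{HOST}:{port}{path}"

def is_up(name: str, timeout: float = 2.0) -> bool:
    """True if the service answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url_for(name), timeout=timeout) as resp:
            return resp.status < 500
    except OSError:  # connection refused, DNS failure, timeout, ...
        return False

if __name__ == "__main__":
    for name in SERVICES:
        print(f"{name:10s} {url_for(name):45s} up={is_up(name)}")
```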
## Quick start

On the box (after the pyinfra deploy):

```sh
cd /srv/docker/ollama && docker compose up -d
cd /srv/docker/phoenix && docker compose up -d
cd /srv/docker/beszel && docker compose up -d   # then add system per beszel.yml
```
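What `docker compose up -d` brings up in each directory is defined by the pyinfra deploy. As a hedged sketch of the shape such a file takes for ROCm on this box (the image tag, volume layout, and exact contents are assumptions; the `HSA_OVERRIDE_GFX_VERSION` value and the port come from the tables in this README):

```yaml
# Hypothetical sketch of /srv/docker/ollama/docker-compose.yaml;
# the real file is generated by the pyinfra deploy.
services:
  ollama:
    image: ollama/ollama:rocm
    devices:
      - /dev/kfd            # ROCm compute interface
      - /dev/dri            # GPU render nodes
    environment:
      - HSA_OVERRIDE_GFX_VERSION=11.0.0   # present gfx1151 as gfx1100
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
volumes:
  ollama:
```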
On the Mac:

```sh
cd opencode && ./install.sh
opencode
```

Send a prompt; watch it land in Phoenix at <http://framework:6006>.
Send a prompt; watch it land in Phoenix at <http://framework:6006>.

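If nothing lands in Phoenix, check the exporter endpoints first. The bridge plugin ships spans itself, but any OpenTelemetry SDK on the Mac can point at the same collector via the standard environment variables (ports from the table above; which variable an exporter honors depends on the SDK):

```shell
# Standard OpenTelemetry exporter variables. Phoenix ingests OTLP/HTTP
# on 6006 (at /v1/traces) and OTLP/gRPC on 4317.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://framework:6006"
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://framework:6006/v1/traces"
# For gRPC-based exporters, use the gRPC port instead:
# export OTEL_EXPORTER_OTLP_ENDPOINT="http://framework:4317"
```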
## Why this stack

- **Bandwidth, not VRAM, is the ceiling on Strix Halo.** 256 GB/s memory
  bandwidth means MoE models with small active-params (Qwen3-Coder-30B-A3B,
  Kimi-Linear-48B-A3B) dominate. See `StrixHaloMemory.md`.
- **AMD APU monitoring is broken upstream.** `amd-smi` returns N/A on
  gfx1151, which kills the official Prometheus/Grafana exporters. Beszel
  reads sysfs directly and works. See the monitoring section of
  `pyinfra/framework/README.md`.
- **Two-layer observability.** OpenLIT is fleet metrics across sessions;
  Phoenix is per-trace waterfall for "what did this one prompt do."
- **Reproducible.** Every host-side config lives in `pyinfra/`; every
Mac-side config in `opencode/install.sh`. Re-running either is safe.
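
The bandwidth ceiling is easy to sanity-check with roofline arithmetic: every decoded token must stream at least the active weights through memory once, so tokens/s is bounded by bandwidth divided by active-weight bytes per token. A back-of-envelope sketch (parameter counts rounded, quantization simplified to a flat bytes-per-weight):

```python
BANDWIDTH_GB_S = 256.0  # Strix Halo LPDDR5x, per StrixHaloMemory.md

def decode_ceiling_tok_s(active_params_b: float, bytes_per_weight: float) -> float:
    """Upper bound on decode tokens/s: bandwidth / active-weight bytes per token.

    Ignores KV-cache and activation traffic, so real throughput is lower.
    """
    gb_per_token = active_params_b * bytes_per_weight
    return BANDWIDTH_GB_S / gb_per_token

# ~3B-active MoE (an A3B model) at 8-bit vs a dense 70B at 8-bit:
moe = decode_ceiling_tok_s(3, 1.0)     # 256/3  ≈ 85 tok/s ceiling
dense = decode_ceiling_tok_s(70, 1.0)  # 256/70 ≈ 3.7 tok/s ceiling
print(f"MoE A3B ceiling: {moe:.0f} tok/s; dense 70B ceiling: {dense:.1f} tok/s")
```

Roughly a 20x gap at equal quantization, which is why the A3B-style MoE models dominate on this hardware.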