Initial commit: localgenai stack

Containerized local LLM stack for the Framework Desktop / Strix Halo,
plus the OpenCode harness on the Mac side.

- pyinfra/framework/: pyinfra deploy targeting the box
  - llama.cpp (Vulkan), vLLM (ROCm), Ollama (ROCm with HSA override
    for gfx1151), OpenWebUI
  - Beszel (host + container + AMD GPU dashboard via sysfs)
  - OpenLIT (LLM fleet metrics)
  - Phoenix (per-trace agent waterfall)
  - OpenHands (autonomous agent in a Docker sandbox)
- opencode/: OpenCode config + Phoenix bridge plugin (OTel exporter)
  - install.sh deploys to ~/.config/opencode/
- StrixHaloSetup.md / StrixHaloMemory.md / Roadmap.md / TODO.md:
  documentation and planning
- testing/qwen3-coder-30b/: small evaluation harness

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:35:10 -04:00
commit 2c4bfefa95
36 changed files with 5265 additions and 0 deletions

pyinfra/HowToScan.md Normal file

@@ -0,0 +1,82 @@
# How to onboard an existing host into pyinfra
There's no auto-generator that reads a running system and emits a
`deploy.py` — not in pyinfra, not in Ansible, not in any mainstream IaC
tool. Reverse-engineering what's intentional vs. incidental needs human
judgment. Below is the workflow that actually works.
## What pyinfra gives you for free
The **facts** API queries live state without writing any deploy:
```sh
pyinfra @host fact deb.DebPackages # installed packages
pyinfra @host fact apt.AptSources # extra apt repos
pyinfra @host fact systemd.SystemdServices # service state
pyinfra @host fact server.Crontab # cron
pyinfra @host fact server.Users # users
```
Output is JSON. There's no fact → op translator, but the data is easy
to grep through and tells you what's actually on the box.
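For example, to pull just the installed-package names out of the fact
output (assuming `jq` on your workstation; the JSON is keyed by host):
```sh
pyinfra noise@10.0.0.237 fact deb.DebPackages > packages.json
# First (only) host's fact payload, then package names, filtered:
jq -r 'to_entries[0].value | keys[]' packages.json | grep -i rocm
```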
## Practical workflow
1. Run a scanner against the live host that dumps the *meaningful*
state — packages explicitly installed, enabled services, custom
configs, dotfiles, sudoers fragments, fstab non-defaults. Skip the
distro-default noise.
2. Hand-pick the bits worth managing into a new `<station>/deploy.py`.
3. Run with `--dry` against the live box; pyinfra's diff shows what's
still drifting.
4. Iterate until `--dry` reports "no changes."
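Steps 3–4 of the workflow above, concretely, for a hypothetical
`station/` folder following this repo's layout:
```sh
cd pyinfra/station
pyinfra inventory.py deploy.py --dry   # diff only; nothing is changed
# edit deploy.py, re-run --dry until it reports no changes, then apply:
pyinfra inventory.py deploy.py
```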
## One-shot scanner
```sh
ssh you@host '
echo "=== manually-installed packages ==="
apt-mark showmanual
echo "=== enabled services ==="
systemctl list-unit-files --state=enabled --type=service --no-legend | awk "{print \$1}"
echo "=== extra apt repos ==="
ls /etc/apt/sources.list.d/
echo "=== sudoers fragments ==="
ls /etc/sudoers.d/
echo "=== fstab non-default ==="
grep -vE "^#|^$|/proc|/sys|/dev/pts" /etc/fstab
echo "=== users with login shells ==="
getent passwd | awk -F: "\$3>=1000 && \$3<60000 {print \$1}"
echo "=== docker stacks ==="
ls /srv/docker/ 2>/dev/null
echo "=== home dotfiles ==="
sudo find /home -maxdepth 3 -name ".*rc" -o -name ".*config" 2>/dev/null
'
```
That surfaces roughly 80% of what's worth managing, and the result
typically fits in ~30 lines of pyinfra. The remaining 20% is
case-by-case (kernel cmdline tweaks, hand-edited /etc files, dotfile
contents) that no scanner fully captures — you write those as you
re-discover the box's quirks.
## What scanners can't catch
- The *intent* behind a config (why this `sysctl` value, why this
cron entry).
- Hand-edited fragments mixed into otherwise-default files (e.g. a
custom line in `/etc/ssh/sshd_config`).
- Out-of-band state: data in databases, container volumes,
`/var/lib/<service>`.
For these, treat pyinfra as managing the *bones* (packages, services,
users, configs) and leave the data alone. Re-deploying onto fresh
hardware should give you a working system; restoring data is a
separate concern.
## If "scan and reproduce" is a hard requirement
The native answer is **NixOS**. The whole OS is declarative;
`nixos-generate-config` reads your machine and writes a full config
that recreates it. Different paradigm, big switch — but the right
answer if reproducible-from-a-file is core to your workflow and you
aren't already deep in Ubuntu-land.
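For scale, the NixOS equivalent of this whole document is one command,
run from the installer with `/mnt` as the target root:
```sh
# Writes configuration.nix plus hardware-configuration.nix describing
# the detected filesystems, kernel modules, and firmware quirks.
nixos-generate-config --root /mnt
```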

pyinfra/README.md Normal file

@@ -0,0 +1,12 @@
# pyinfra
One folder per station. Each subfolder is a self-contained pyinfra
deploy: `inventory.py`, `deploy.py`, `run.sh`, plus any compose files
or assets that ship to the host.
| Station | Host | Notes |
|---------|------|-------|
| [`framework/`](framework/README.md) | `10.0.0.237` | Framework Desktop (Strix Halo, 128 GB) — local LLM box |
To bring up a station, `cd` into its folder and run `./run.sh`. See the
station's own README for prerequisites.

pyinfra/framework/README.md Normal file

@@ -0,0 +1,184 @@
# pyinfra: Strix Halo bring-up
Containerized setup for the Framework Desktop (Ryzen AI Max+ 395, Radeon
8060S, 128 GB). The host stays minimal — kernel + driver + Docker +
diagnostics. Inference engines (llama.cpp, vLLM, Ollama) run as docker
compose services, each shipping its own ROCm/Vulkan stack.
## Manual prerequisites
1. **Phase 0** — update Framework BIOS, set GPU UMA carve-out (96 GB).
2. **OS install** — Ubuntu Server 24.04 LTS. AMD ROCm only ships for
jammy/noble; it installs on later Ubuntu releases but breaks the
host-side toolchain (libxml2 ABI). Enable SSH, import your laptop key, create user `noise`.
Recommended partitioning: ≥300 GB on `/`, big disk mounted at `/models`,
plain ext4 (skip LVM).
3. The host must be reachable at `10.0.0.237` over SSH (edit `inventory.py`
if it moves).
4. **NOPASSWD sudo for `noise`** — pyinfra's fact layer doesn't reliably
thread sudo passwords. One-time setup:
```sh
ssh noise@10.0.0.237 'echo "noise ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/noise-nopasswd && sudo chmod 440 /etc/sudoers.d/noise-nopasswd'
```
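Verify it took: `sudo -n` fails instead of prompting if NOPASSWD isn't in effect.
```sh
ssh noise@10.0.0.237 'sudo -n true && echo NOPASSWD OK'
```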
## Run
```sh
uv tool install pyinfra
./run.sh # equivalent to: pyinfra inventory.py deploy.py
./run.sh --dry # any extra args are forwarded to pyinfra
```
Or run it ephemerally without installing: `uvx pyinfra inventory.py deploy.py`.
## What the deploy does
- Base CLI: tmux, vim, htop, btop, nvtop, amdgpu_top, uv, lazydocker, huggingface-cli
- Tailscale (run `sudo tailscale up` on the box once, interactively)
- Docker engine + compose plugin, user added to `docker` group
- ROCm host diagnostics only (`rocminfo`) — no full toolchain
- GRUB kernel params for Strix Halo perf: `amd_iommu=off`,
`amdgpu.gttsize=117760` (per [Gygeek/Framework-strix-halo-llm-setup](https://github.com/Gygeek/Framework-strix-halo-llm-setup)).
Requires a reboot to activate — pyinfra rewrites `/etc/default/grub`
and runs `update-grub`, but won't reboot for you.
- `/models/<vendor>/` layout
- `/srv/docker/{llama,vllm,ollama,openwebui,beszel,openlit,phoenix,openhands}/docker-compose.yml`
dropped in, not auto-started — you edit the model path then
`docker compose up -d`. OpenWebUI needs no edits: it's pre-configured to
find Ollama at `host.docker.internal:11434` and uses `searxng.n0n.io`
for web search. Beszel's hub is at :8090, OpenLIT's UI at :3001,
Phoenix's UI at :6006, and OpenHands' UI at :3030; see "Monitoring
stack" and "Agent harnesses" below.
If a previous run installed the native llama.cpp build / full ROCm /
native Ollama, those are auto-cleaned the next time `./run.sh` runs.
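After the post-deploy reboot, confirm the kernel picked up the GRUB tuning:
```sh
ssh noise@10.0.0.237 cat /proc/cmdline
# expect: ... amd_iommu=off amdgpu.gttsize=117760
```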
## After the deploy: starting an inference service
```sh
ssh noise@10.0.0.237
sudo tailscale up # one-time, interactive
# Drop a GGUF somewhere under /models, then:
cd /srv/docker/llama
vim docker-compose.yml # edit the --model path
docker compose up -d
curl localhost:8080/v1/models # smoke test
```
Same shape for `vllm` (port 8000) and `ollama` (port 11434, no model edit
needed — Ollama serves models on demand).
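A quick smoke test across all three engines once they're up:
```sh
curl -s localhost:8080/v1/models    # llama.cpp
curl -s localhost:8000/v1/models    # vLLM
curl -s localhost:11434/api/tags    # Ollama native API: lists pulled models
```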
## Tunables
Top of `deploy.py`:
- `ROCM_VERSION` and `AMDGPU_INSTALL_DEB` — bump when AMD ships a newer
release. The .deb filename has a build suffix that doesn't derive from
the version; find it at https://repo.radeon.com/amdgpu-install/.
- `AMDGPU_TOP_VERSION` — bump when a newer release lands at
https://github.com/Umio-Yasuno/amdgpu_top/releases.
Compose images in
`compose/{llama,vllm,ollama,openwebui,beszel,openlit,phoenix,openhands}.yml`
— pin tags here.
## Monitoring stack
Two compose stacks watch the inference services:
- **Beszel** (`/srv/docker/beszel`, http://framework:8090) — host + Docker
container + AMD GPU dashboard. The agent's `amd_sysfs` collector reads
`/sys/class/drm/cardN/device/` directly; this is the only path that
reports real numbers on Strix Halo (gfx1151) — `amd-smi` returns N/A
for util/power/temp on this APU.
First-time bring-up:
```sh
cd /srv/docker/beszel
docker compose up -d beszel # 1. hub
# 2. open http://framework:8090 → create admin → "Add system"
# 3. copy the TOKEN and the SSH KEY from the dialog into .env:
sudo tee /srv/docker/beszel/.env >/dev/null <<EOF
BESZEL_TOKEN=<token-from-dialog>
BESZEL_KEY=ssh-ed25519 AAAA…
EOF
sudo chgrp docker /srv/docker/beszel/.env
sudo chmod 640 /srv/docker/beszel/.env
docker compose up -d --force-recreate beszel-agent # 4. agent
```
- **OpenLIT** (`/srv/docker/openlit`, http://framework:3001) — fleet
metrics for the LLM stack (cost, tokens, latency aggregated across
sessions and models). Auto-instruments Ollama and vLLM via
OpenTelemetry without app-code changes; for llama.cpp, the compose
file already passes `--metrics` so OpenLIT can scrape `/metrics`.
ClickHouse-backed. OTLP receivers exposed on the host at **:4327
(gRPC) / :4328 (HTTP)** — Phoenix owns the canonical :4317 and serves
HTTP OTLP on :6006.
Bring-up: `cd /srv/docker/openlit && docker compose up -d`. UI prompts
to create an admin account on first load.
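To confirm the remapped receivers are reachable from off-box (any HTTP
status code proves the listener is up; an empty OTLP batch should
return 200):
```sh
curl -s -o /dev/null -w '%{http_code}\n' \
  -X POST http://framework:4328/v1/traces \
  -H 'Content-Type: application/json' -d '{}'
```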
- **Phoenix** (`/srv/docker/phoenix`, http://framework:6006) — per-trace
agent waterfall / flamegraph. Designed to answer "show me what one
OpenCode turn actually did" — full call tree, nested LLM/tool calls,
token counts inline. Single container, SQLite-backed. OTLP receivers
on the host at **:4317 (gRPC)** and **:6006/v1/traces (HTTP)** —
Phoenix 15.x serves HTTP OTLP on the UI port, not a separate 4318.
Bring-up: `cd /srv/docker/phoenix && docker compose up -d`. UI
immediately available; no auth in the self-hosted single-user path.
See [`localgenai/opencode/README.md`](../../opencode/README.md) for
how OpenCode is configured to ship traces here.
The official AMD Prometheus exporters (`amd-smi-exporter`,
`device-metrics-exporter`) are intentionally **not** deployed — they're
broken on gfx1151 (ROCm#6035). If you ever want full Prometheus +
Grafana, the migration path is to replace Beszel with `node_exporter` +
a textfile collector reading `/sys/class/drm/cardN/device/` and
`/sys/class/hwmon/`.
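If you ever make that migration, the textfile collector is a small
cron-able script — a sketch, with hypothetical metric names and the
usual node_exporter textfile directory assumed:
```sh
#!/usr/bin/env bash
set -euo pipefail
card=/sys/class/drm/card1/device                 # adjust card index
out=/var/lib/node_exporter/textfile/amdgpu.prom  # assumed collector dir
{
  echo "amdgpu_busy_percent $(cat "$card/gpu_busy_percent")"
  echo "amdgpu_vram_used_bytes $(cat "$card/mem_info_vram_used")"
  echo "amdgpu_gtt_used_bytes $(cat "$card/mem_info_gtt_used")"
} > "$out.tmp"
mv "$out.tmp" "$out"   # atomic rename so scrapes never see partial files
```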
## Agent harnesses
Two agent UIs in front of the local model stack — pick one (or both)
based on how you like to drive the agent:
- **OpenWebUI** (`/srv/docker/openwebui`, http://framework:3000) —
ChatGPT-style web UI. Pre-wired to Ollama + SearXNG web search.
Best for casual chat and household-shared LLM access.
Bring-up: `cd /srv/docker/openwebui && docker compose up -d`.
- **OpenHands** (`/srv/docker/openhands`, http://framework:3030,
loopback-only) — autonomous agent in a Docker sandbox. Spawns a
per-conversation `agent-server` container that can write code, run
tests, browse the web. Pre-configured for Ollama at
`openai/qwen3-coder:30b` over the OpenAI-compatible endpoint;
ships traces to Phoenix.
Bring-up:
```sh
cd /srv/docker/openhands && docker compose up -d
# Tunnel the loopback-bound UI from your laptop:
ssh -L 3030:127.0.0.1:3030 noise@framework
open http://localhost:3030
```
The agent-server image (~2 GB) is pulled lazily on the first
conversation, not at startup, so the orchestrator comes up fast but
your first message takes 30–60 s. The pre-0.44 state path was
`~/.openhands-state`; not relevant on a fresh install.
Tool-call quality with local models is much better when Ollama's
context is bumped — the compose at `/srv/docker/ollama` already sets
`OLLAMA_CONTEXT_LENGTH=65536`, which is plenty.
OpenCode (the terminal driver, configured in
[`localgenai/opencode/`](../../opencode/)) runs on the Mac and points
at the same Ollama endpoint, so all three harnesses share the same
model server.
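All three ride the same OpenAI-compatible endpoint, so any client works
against it — e.g. from the Mac (model name per the OpenHands compose
config):
```sh
curl -s http://framework:11434/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "qwen3-coder:30b", "messages": [{"role": "user", "content": "hello"}]}'
```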
The llama.cpp image is `kyuz0/amd-strix-halo-toolboxes:vulkan-radv`,
gfx1151-optimized with rocWMMA flash attention (the latter only kicks
in on the ROCm tags). Auto-rebuilt against llama.cpp master.

pyinfra/framework/compose/beszel.yml Normal file

@@ -0,0 +1,66 @@
# Beszel — host + container + GPU dashboard.
# https://beszel.dev
#
# Picked over Prometheus+Grafana for this box because:
# - The agent's `amd_sysfs` collector reads /sys/class/drm/card*/device/
# directly, which is the only reliable GPU metric source on Strix Halo
# (gfx1151). AMD's amd-smi / Device Metrics Exporter return N/A for
# util/power/temp on this APU (ROCm#6035), so the official Prometheus
# exporter path is dead.
# - Two containers vs six.
#
# First-time setup (WebSocket connection model — current Beszel default):
# 1. `docker compose up -d beszel` (start the hub)
# 2. Open http://framework:8090, create the admin account
# 3. Click "Add system" — the dialog gives you a TOKEN and an SSH KEY.
# 4. Edit /srv/docker/beszel/.env (created empty by pyinfra; pyinfra
# doesn't overwrite). Add:
# BESZEL_TOKEN=<token-from-dialog>
# BESZEL_KEY=ssh-ed25519 AAAA…
# 5. `docker compose up -d --force-recreate beszel-agent`
#
# Docker Compose auto-reads the sibling .env file for ${VAR} interpolation
# in the environment block below — so secrets stay out of the compose
# file (which pyinfra overwrites) but the env-var names match exactly
# what the agent expects.
#
# Why both TOKEN and KEY: TOKEN identifies which system this agent is,
# KEY authenticates the agent (the SSH key is reused as the auth secret
# in the WebSocket handshake). Rotate either by editing the .env and
# `docker compose up -d --force-recreate`.
services:
  beszel:
    image: henrygd/beszel:latest
    container_name: beszel
    restart: unless-stopped
    ports:
      - "8090:8090"
    volumes:
      - /srv/docker/beszel/data:/beszel_data

  beszel-agent:
    image: henrygd/beszel-agent:latest
    container_name: beszel-agent
    restart: unless-stopped
    # Host networking so the agent sees real CPU/memory/network counters
    # without bridge-NAT distortion.
    network_mode: host
    volumes:
      # Read-only Docker socket for per-container CPU/mem/net.
      - /var/run/docker.sock:/var/run/docker.sock:ro
      # Sysfs paths the AMD GPU collector reads.
      - /sys/class/drm:/sys/class/drm:ro
      - /sys/class/hwmon:/sys/class/hwmon:ro
    environment:
      # Pulled from /srv/docker/beszel/.env at compose-parse time.
      TOKEN: "${BESZEL_TOKEN:-}"
      KEY: "${BESZEL_KEY:-}"
      # WebSocket dial-out target — the hub on this same host. The agent
      # is on host networking, so localhost is the host machine, where
      # the hub container exposes port 8090.
      HUB_URL: "http://localhost:8090"
      # Optional fallback: legacy SSH listener for hub-initiated probing.
      # Harmless to keep — hub only uses it if WebSocket is unreachable.
      LISTEN: "45876"
      # Enable the AMD sysfs GPU collector.
      GPU: "true"

pyinfra/framework/compose/llama.yml Normal file

@@ -0,0 +1,44 @@
# llama.cpp server, gfx1151-optimized via kyuz0's Strix Halo toolboxes.
# https://github.com/kyuz0/amd-strix-halo-toolboxes
#
# Tag options on docker.io/kyuz0/amd-strix-halo-toolboxes:
# vulkan-radv — most stable, recommended default (this one)
# vulkan-amdvlk — alternate Vulkan driver, sometimes faster
# rocm-7.2.2 — ROCm 7.x; needs /dev/kfd + render group_add (see vllm.yml pattern)
# rocm-6.4.4 — ROCm 6.x fallback
# rocm7-nightlies — avoid: caps memory allocation to 64 GB (May 2026)
#
# Toolbox images use a shell entrypoint, so we override to launch
# llama-server directly. Edit the --model path before `docker compose up -d`.
services:
  llama:
    image: kyuz0/amd-strix-halo-toolboxes:vulkan-radv
    container_name: llama
    restart: unless-stopped
    devices:
      - /dev/dri:/dev/dri
    volumes:
      - /models:/models:ro
    ports:
      - "8080:8080"
    entrypoint: ["llama-server"]
    command:
      - --model
      - /models/REPLACE/ME/model.gguf
      - --host
      - 0.0.0.0
      - --port
      - "8080"
      - --n-gpu-layers
      - "999"
      - --ctx-size
      - "32768"
      # Required for GPU backends on Strix Halo per Gygeek's setup
      # guide. Forces full load into GPU memory rather than mmap.
      - --no-mmap
      # Flash attention — works on Vulkan too; the big win is on the
      # ROCm tag where kyuz0's build has rocWMMA acceleration.
      - --flash-attn
      # Expose Prometheus metrics at /metrics — scraped by OpenLIT for
      # tokens/sec, KV-cache use, queue depth, and request latency.
      - --metrics

pyinfra/framework/compose/ollama.yml Normal file

@@ -0,0 +1,38 @@
# Ollama, ROCm backend. Serves models on demand — safe to start before
# you've put anything in /models.
#
# Storage: Ollama's content-addressed blob store is bind-mounted under
# /models/ollama so all model data on the host lives under /models.
# Note: Ollama's blobs are SHA256-named, not raw GGUFs — llama.cpp/vLLM
# can't load them directly. Keep curated GGUFs at /models/<vendor>/...
# for those engines.
services:
  ollama:
    image: ollama/ollama:rocm
    container_name: ollama
    restart: unless-stopped
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    # Numeric GIDs of host's video (44) and render (991) groups — names
    # don't exist inside the container, but the GIDs need to match the
    # host so /dev/kfd + /dev/dri are accessible.
    group_add:
      - "44"
      - "991"
    environment:
      # Strix Halo's iGPU is gfx1151 (RDNA 3.5), which Ollama's bundled
      # ROCm runtime doesn't recognize — without this override it falls
      # back to CPU silently. 11.0.0 = gfx1100 (Navi 31); the RDNA 3.x
      # ISAs are close enough that gfx1100 kernels run on gfx1151.
      - HSA_OVERRIDE_GFX_VERSION=11.0.0
      # Default context. 256K (the upstream default for Qwen3-Coder)
      # blows the KV cache up to ~25-30 GB and forces Ollama to split
      # layers between GPU and CPU. 64K keeps the model fully on GPU
      # while still being plenty for coding contexts.
      - OLLAMA_CONTEXT_LENGTH=65536
    volumes:
      - /models/ollama:/root/.ollama
      - /models:/models:ro
    ports:
      - "11434:11434"

pyinfra/framework/compose/openhands.yml Normal file

@@ -0,0 +1,94 @@
# OpenHands 1.7 (May 2026) — autonomous agent in a Docker sandbox.
# https://docs.openhands.dev — repo: github.com/OpenHands/OpenHands
#
# Architecture: this container is a thin orchestrator. Per conversation
# it spawns a separate `agent-server` container on the host Docker daemon
# (that's what the docker.sock mount is for) and talks to it over REST.
# AGENT_SERVER_IMAGE_TAG below pins the per-session sandbox image.
#
# Complements OpenCode: OpenCode is the interactive terminal driver,
# OpenHands is for autonomous loops (write code, run tests, browse the
# web in a sandbox, report back).
services:
  openhands:
    # Org rebranded All-Hands-AI → OpenHands at v1.0 (Dec 2025); the old
    # docker.all-hands.dev/all-hands-ai/openhands image is gone.
    image: docker.openhands.dev/openhands/openhands:1.7
    container_name: openhands
    restart: unless-stopped
    # 3030 host-side because :3000 is OpenWebUI and :3001 is OpenLIT.
    # Loopback-only — reach via SSH tunnel or Tailscale, don't expose
    # this directly.
    ports:
      - "127.0.0.1:3030:3000"
    volumes:
      # Required: orchestrator spawns sandbox containers via the host daemon.
      - /var/run/docker.sock:/var/run/docker.sock
      # State, settings, conversation history, MCP config, secrets.
      # Pre-0.44 used ~/.openhands-state — N/A on a fresh install.
      - /srv/docker/openhands/state:/.openhands
      # Workspace the sandbox reads/writes. The host path on the LEFT must
      # match SANDBOX_VOLUMES below — the sandbox container is spawned by
      # the host daemon, so its bind mount is resolved on the host, not
      # via this container's filesystem.
      - /srv/docker/openhands/workspace:/srv/docker/openhands/workspace
    # Linux Docker doesn't auto-provide host.docker.internal; this fixes it.
    extra_hosts:
      - "host.docker.internal:host-gateway"
    environment:
      # ---- Sandbox / agent-server image pin ----
      # Replaces the V0.x SANDBOX_RUNTIME_CONTAINER_IMAGE. 1.19.1-python is
      # the agent-server tag the 1.7 main image expects; bumping the main
      # image will likely want a newer agent-server tag — check the
      # upstream docker-compose.yml on each upgrade.
      AGENT_SERVER_IMAGE_REPOSITORY: ghcr.io/openhands/agent-server
      AGENT_SERVER_IMAGE_TAG: 1.19.1-python
      # ---- Workspace mount into the per-session sandbox ----
      # SANDBOX_VOLUMES is the V1 replacement for the deprecated
      # WORKSPACE_BASE / WORKSPACE_MOUNT_PATH variables.
      SANDBOX_VOLUMES: /srv/docker/openhands/workspace:/workspace:rw
      # Match the host's `noise` UID so files the agent writes aren't
      # owned by root.
      SANDBOX_USER_ID: "1000"
      # ---- LLM: host Ollama via OpenAI-compatible endpoint ----
      # Per the official local-llms doc, the recommended path is the
      # /v1 OpenAI-compatible endpoint with the `openai/` LiteLLM prefix
      # — NOT `ollama/...`, which has worse tool-call behaviour.
      LLM_MODEL: "openai/qwen3-coder:30b"
      LLM_BASE_URL: "http://host.docker.internal:11434/v1"
      LLM_API_KEY: "ollama"  # any non-empty string; Ollama doesn't auth.
      # Default tool-calling renderer mismatches Qwen3-Coder's training
      # format and produces malformed calls (issue #8140). Forcing false
      # falls back to OpenHands' prompt-based protocol — costs some token
      # efficiency, gains reliability with local models.
      LLM_NATIVE_TOOL_CALLING: "false"
      LOG_ALL_EVENTS: "true"
      # ---- Optional: ship traces to Phoenix ----
      # OpenHands V1 uses LiteLLM + OpenTelemetry; standard OTLP env vars
      # are honoured. Phoenix 15.x serves HTTP OTLP on its UI port
      # (:6006/v1/traces), not a separate 4318. Comment out to disable.
      OTEL_EXPORTER_OTLP_ENDPOINT: "http://host.docker.internal:6006"
      OTEL_EXPORTER_OTLP_PROTOCOL: "http/protobuf"
      OTEL_SERVICE_NAME: "openhands"
    # Per-session agent-server containers spawn headless chromium for
    # browser tasks; default 64 MB shm causes silent crashes.
    shm_size: "2gb"
    # Playwright/chromium needs higher fd limits than Docker's default.
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    # Bridge networking is correct here. Don't switch to network_mode: host
    # — the spawned sandbox containers reach this orchestrator via Docker
    # bridge DNS, which only works on a bridge network.

pyinfra/framework/compose/openlit.yml Normal file

@@ -0,0 +1,59 @@
# OpenLIT — LLM observability (traces, costs, KV-cache, prompt/decode
# latencies, tokens/sec). https://openlit.io
#
# Two services:
# - clickhouse : columnar store for traces (internal only, no host port)
# - openlit : Next.js UI on :3001 (3000 is OpenWebUI)
#
# Why OpenLIT vs Langfuse/Phoenix/Laminar: it's the only OSS dashboard
# (May 2026) that auto-instruments Ollama AND vLLM via OpenTelemetry
# without adding code to client apps. For llama.cpp, start the server
# with --metrics (see ../llama/docker-compose.yml) and OpenLIT can scrape
# /metrics.
#
# To send traces from a Python script calling Ollama/vLLM:
# pip install openlit
# python -c "import openlit; openlit.init(otlp_endpoint='http://framework:4318')"
#
# To wire OpenWebUI → OpenLIT, install OpenLIT's pipeline middleware
# in OpenWebUI per https://openlit.io/blogs/openlit-openwebui.
services:
  clickhouse:
    image: clickhouse/clickhouse-server:25.3-alpine
    container_name: openlit-clickhouse
    restart: unless-stopped
    environment:
      CLICKHOUSE_USER: default
      CLICKHOUSE_PASSWORD: OPENLIT
      CLICKHOUSE_DB: openlit
    volumes:
      - /srv/docker/openlit/clickhouse:/var/lib/clickhouse
    ulimits:
      nofile:
        soft: 262144
        hard: 262144

  openlit:
    image: ghcr.io/openlit/openlit:latest
    container_name: openlit
    restart: unless-stopped
    depends_on:
      - clickhouse
    ports:
      # Host:container — UI on 3001 (OpenWebUI owns 3000).
      - "3001:3000"
      # OTLP receivers exposed on the host so SDKs running off-box can
      # ship traces here. gRPC + HTTP. Remapped (4327/4328 → 4317/4318)
      # because Phoenix owns the canonical :4317 for OpenCode traces
      # (its HTTP OTLP rides on :6006) — OpenLIT here is a
      # secondary/fleet-metrics destination.
      - "4327:4317"
      - "4328:4318"
    environment:
      INIT_DB_HOST: clickhouse
      INIT_DB_PORT: "8123"
      INIT_DB_USERNAME: default
      INIT_DB_PASSWORD: OPENLIT
      INIT_DB_DATABASE: openlit
      SQLITE_DATABASE_URL: file:/app/client/data/data.db
    volumes:
      - /srv/docker/openlit/data:/app/client/data

pyinfra/framework/compose/openwebui.yml Normal file

@@ -0,0 +1,25 @@
# OpenWebUI — ChatGPT-like web UI in front of Ollama. Pre-configured to
# use the host's Ollama instance and the project's SearXNG for web
# search. Default port 3000.
#
# Persistent state (users, conversations, uploaded docs, RAG vector
# index) lives at /srv/docker/openwebui/data so backups touch one path.
services:
  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: openwebui
    restart: unless-stopped
    ports:
      - "3000:8080"
    extra_hosts:
      # Lets the container reach Ollama on the host's :11434 without
      # needing to share Docker networks.
      - "host.docker.internal:host-gateway"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      # Built-in web search via the project's SearXNG instance.
      - ENABLE_RAG_WEB_SEARCH=true
      - RAG_WEB_SEARCH_ENGINE=searxng
      - SEARXNG_QUERY_URL=https://searxng.n0n.io/search?q=<query>&format=json
    volumes:
      - /srv/docker/openwebui/data:/app/backend/data

pyinfra/framework/compose/phoenix.yml Normal file

@@ -0,0 +1,35 @@
# Arize Phoenix — per-trace agent waterfall / flamegraph viz.
# https://github.com/Arize-ai/phoenix
#
# Picked over Langfuse for "show me one OpenCode turn as a tree":
# - Single container vs Langfuse's six (Postgres+ClickHouse+Redis+MinIO+web+worker).
# - First-class ingestion of Vercel AI SDK spans (which is what OpenCode
# emits under the hood when experimental.openTelemetry=true).
# - Best-in-class waterfall + agent-graph view for nested LLM/tool calls.
#
# Complements OpenLIT, doesn't replace it: OpenLIT is the fleet-metrics
# layer (cost / tokens / latency aggregated across sessions). Phoenix is
# the per-prompt debugger (see what one turn actually did).
#
# Bring-up: `docker compose up -d` — no first-run setup needed; UI prompts
# for project name on first trace ingest. Storage is SQLite at /data.
services:
  phoenix:
    image: arizephoenix/phoenix:latest
    container_name: phoenix
    restart: unless-stopped
    ports:
      # UI + OTLP/HTTP both ride on 6006 in Phoenix 15.x — HTTP traces go
      # to http://framework:6006/v1/traces. (Pre-15 had a separate 4318;
      # the consolidation happened in Phoenix v15.0.)
      - "6006:6006"
      # OTLP/gRPC stays separate.
      - "4317:4317"
    environment:
      PHOENIX_WORKING_DIR: /data
      # Phoenix listens on all interfaces by default; explicit for clarity.
      PHOENIX_HOST: 0.0.0.0
      PHOENIX_PORT: "6006"
      PHOENIX_GRPC_PORT: "4317"
    volumes:
      - /srv/docker/phoenix/data:/data

pyinfra/framework/compose/vllm.yml Normal file

@@ -0,0 +1,36 @@
# vLLM, ROCm backend.
#
# NOTE: vLLM's official ROCm support targets datacenter cards (MI300X /
# gfx942). Strix Halo is gfx1151 — support varies by image tag and
# release. If `rocm/vllm:latest` doesn't run on this iGPU, try
# `rocm/vllm-dev:nightly` or build from source against ROCm 7.x.
services:
  vllm:
    image: rocm/vllm:latest
    container_name: vllm
    restart: unless-stopped
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    cap_add:
      - SYS_PTRACE
    security_opt:
      - seccomp=unconfined
    # Numeric GIDs of host's video (44) and render (991) groups — names
    # don't exist inside the container.
    group_add:
      - "44"
      - "991"
    shm_size: 16g
    ipc: host
    volumes:
      - /models:/models:ro
    ports:
      - "8000:8000"
    command:
      - --model
      - /models/REPLACE/ME
      - --host
      - 0.0.0.0
      - --port
      - "8000"

pyinfra/framework/deploy.py Normal file

@@ -0,0 +1,487 @@
"""
pyinfra deploy for Framework Desktop (Ryzen AI Max+ 395 / Strix Halo).
Containerized layout: inference engines (llama.cpp, vLLM, Ollama) run as
docker compose services, each shipping its own ROCm/Vulkan stack. The
host stays minimal — kernel + driver + diagnostics + Docker.
Run with:
./run.sh # equivalent to pyinfra inventory.py deploy.py
Idempotent — re-running only does work that's actually needed. Includes
one-shot cleanup steps for artifacts left over from the earlier native
build (llama.cpp git checkout, symlinks, systemd unit, full ROCm).
"""
from io import StringIO
from pyinfra.operations import apt, files, server, systemd
from pyinfra import host
# --- Tunables ----------------------------------------------------------------
# Latest stable as of 2026-04-30. Verify at https://repo.radeon.com/amdgpu-install/.
ROCM_VERSION = "7.2.3"
# The .deb filename has an opaque build suffix — find the exact name at
# the URL above and paste it here.
AMDGPU_INSTALL_DEB = "amdgpu-install_7.2.3.70203-1_all.deb"
# amdgpu_top — modern GPU monitor that handles new AMD cards / APUs
# (Strix Halo / gfx1151). Replaces radeontop, which doesn't know about
# RDNA 3.5. Verify at https://github.com/Umio-Yasuno/amdgpu_top/releases.
AMDGPU_TOP_VERSION = "0.11.4-1"
AMDGPU_TOP_DEB = f"amdgpu-top_without_gui_{AMDGPU_TOP_VERSION}_amd64.deb"
SSH_USER = host.data.get("ssh_user", "noise")
MODELS_DIR = "/models"
# /srv is the FHS-blessed location for "data and configuration for
# services this system provides." Owned root:docker with setgid +
# group-write so any docker-group member can manage compose stacks.
COMPOSE_DIR = "/srv/docker"
# --- Phase 1 — Base OS basics ------------------------------------------------
apt.update(name="apt update", _sudo=True)
apt.packages(
name="HWE kernel (>=6.11 for ROCm 7)",
packages=["linux-generic-hwe-24.04"],
_sudo=True,
)
# User basics + monitoring tools.
apt.packages(
name="Base CLI tools",
# radeontop intentionally omitted — it predates RDNA 3.5 / Strix Halo
# and just errors with "no VRAM support". amdgpu_top installed below.
packages=[
"tmux",
"vim",
"htop",
"btop",
"nvtop",
"git",
"curl",
"ca-certificates",
"unzip",
],
_sudo=True,
)
# Vulkan diagnostics on the host (containers ship their own runtime).
apt.packages(
name="Vulkan host diagnostics",
packages=["mesa-vulkan-drivers", "vulkan-tools"],
_sudo=True,
)
# uv (Python package/tool manager).
server.shell(
name="Install / upgrade uv",
commands=[
"curl -LsSf https://astral.sh/uv/install.sh | "
"env UV_INSTALL_DIR=/usr/local/bin UV_UNMANAGED_INSTALL=1 sh",
],
_sudo=True,
)
# huggingface_hub CLI for `huggingface-cli download <repo> --local-dir ...`
# Lands at ~/.local/bin/huggingface-cli for the SSH user. Other users can
# repeat the command themselves.
server.shell(
name="Install / upgrade huggingface_hub CLI",
commands=["uv tool install --upgrade 'huggingface_hub[cli]'"],
_sudo=True,
_sudo_user=SSH_USER,
)
# gpakosz/.tmux config (Oh My Tmux!).
TMUX_CONF_DIR = f"/home/{SSH_USER}/.tmux"
server.shell(
name="Clone gpakosz/.tmux",
commands=[
f"test -d {TMUX_CONF_DIR}/.git || "
f"git clone --depth 1 https://github.com/gpakosz/.tmux.git {TMUX_CONF_DIR}",
],
_sudo=True,
_sudo_user=SSH_USER,
)
files.link(
name="Symlink ~/.tmux.conf -> ~/.tmux/.tmux.conf",
path=f"/home/{SSH_USER}/.tmux.conf",
target=f"{TMUX_CONF_DIR}/.tmux.conf",
user=SSH_USER,
group=SSH_USER,
_sudo=True,
)
# Seed the user override file only if absent — preserves customizations.
server.shell(
name="Seed ~/.tmux.conf.local (if missing)",
commands=[
f"test -f /home/{SSH_USER}/.tmux.conf.local || "
f"cp {TMUX_CONF_DIR}/.tmux.conf.local /home/{SSH_USER}/",
],
_sudo=True,
_sudo_user=SSH_USER,
)
# --- Tailscale ---------------------------------------------------------------
# Tailscale's noble repo works fine on later Ubuntus — the package is
# self-contained Go binaries.
files.download(
name="Fetch Tailscale apt key",
src="https://pkgs.tailscale.com/stable/ubuntu/noble.noarmor.gpg",
dest="/usr/share/keyrings/tailscale-archive-keyring.gpg",
mode="644",
_sudo=True,
)
files.download(
name="Fetch Tailscale apt list",
src="https://pkgs.tailscale.com/stable/ubuntu/noble.tailscale-keyring.list",
dest="/etc/apt/sources.list.d/tailscale.list",
mode="644",
_sudo=True,
)
apt.update(name="apt update (Tailscale)", _sudo=True)
apt.packages(name="Install Tailscale", packages=["tailscale"], _sudo=True)
systemd.service(
name="Enable tailscaled",
service="tailscaled",
running=True,
enabled=True,
_sudo=True,
)
# `sudo tailscale up` is interactive (browser auth) — run manually once.
# --- Docker -----------------------------------------------------------------
apt.packages(
name="Install Docker + compose plugin",
packages=["docker.io", "docker-compose-v2"],
_sudo=True,
)
systemd.service(
name="Enable docker daemon",
service="docker",
running=True,
enabled=True,
_sudo=True,
)
# lazydocker (TUI for docker). System-wide install via official script.
server.shell(
name="Install / upgrade lazydocker",
commands=[
"curl -fsSL "
"https://raw.githubusercontent.com/jesseduffield/lazydocker/master/scripts/install_update_linux.sh "
"| DIR=/usr/local/bin bash",
],
_sudo=True,
)
# --- GPU access (host kernel/driver bits only) ------------------------------
# AMD's amdgpu-install package adds the ROCm apt repo; we use it just to
# get rocminfo for host-side diagnostics. Containers ship
# their own ROCm.
files.directory(
name="apt keyring dir",
path="/etc/apt/keyrings",
mode="755",
_sudo=True,
)
amdgpu_deb = f"/tmp/{AMDGPU_INSTALL_DEB}"
amdgpu_url = (
f"https://repo.radeon.com/amdgpu-install/{ROCM_VERSION}/ubuntu/noble/"
f"{AMDGPU_INSTALL_DEB}"
)
server.shell(
name="Fetch amdgpu-install .deb",
commands=[f"test -f {amdgpu_deb} || curl -fsSL {amdgpu_url} -o {amdgpu_deb}"],
_sudo=True,
)
server.shell(
name="Install amdgpu-install package",
commands=[f"apt install -y {amdgpu_deb}"],
_sudo=True,
)
# Idempotent cleanup: if a prior run installed the full ROCm userspace
# (~25 GB), tear it down before installing the diagnostic-only subset.
# On a fresh box this is a no-op.
server.shell(
name="Remove full ROCm install if present",
commands=[
"if dpkg -l rocm-dev 2>/dev/null | grep -q '^ii'; then "
" amdgpu-install -y --uninstall || true; "
"fi",
],
_sudo=True,
)
apt.packages(
name="ROCm host diagnostics (rocminfo)",
# rocminfo is the stable diagnostic. The SMI tool's package name has
# churned across ROCm releases (rocm-smi-lib → amd-smi-lib in 7.x);
# install on demand if you need it.
packages=["rocminfo"],
_sudo=True,
)
# amdgpu_top — fetch + install the .deb if not already at the pinned
# version. Replaces radeontop for newer AMD cards (Strix Halo / gfx1151).
amdgpu_top_deb = f"/tmp/{AMDGPU_TOP_DEB}"
amdgpu_top_url = (
f"https://github.com/Umio-Yasuno/amdgpu_top/releases/download/"
f"v{AMDGPU_TOP_VERSION.split('-')[0]}/{AMDGPU_TOP_DEB}"
)
server.shell(
name="Install / upgrade amdgpu_top",
commands=[
f"dpkg -s amdgpu-top 2>/dev/null | grep -q '^Version: {AMDGPU_TOP_VERSION}' || "
f"(test -f {amdgpu_top_deb} || curl -fsSL {amdgpu_top_url} -o {amdgpu_top_deb}; "
f"apt install -y {amdgpu_top_deb})",
],
_sudo=True,
)
# Group membership for /dev/kfd + /dev/dri access (needed for GPU passthrough
# into containers, and for unprivileged host-side rocminfo).
server.group(name="ensure render group", group="render", _sudo=True)
server.group(name="ensure video group", group="video", _sudo=True)
server.user(
name="Add login user to render/video/docker",
user=SSH_USER,
groups=["render", "video", "docker"],
append=True,
_sudo=True,
)
# Kernel cmdline tuning per Gygeek/Framework-strix-halo-llm-setup:
# - amd_iommu=off — ~6 % memory-read improvement on Strix Halo
# - amdgpu.gttsize=117760 — ~115 GB GTT ceiling so the GPU can borrow
# most of system RAM dynamically. Acts as a
# ceiling, not an allocation. See ../../StrixHaloMemory.md
# for the UMA-vs-GTT trade-off discussion.
# Requires a reboot to take effect; pyinfra leaves that to you.
files.line(
name="GRUB cmdline (amd_iommu, gttsize)",
path="/etc/default/grub",
line=r"^GRUB_CMDLINE_LINUX_DEFAULT=.*",
replace='GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off amdgpu.gttsize=117760"',
_sudo=True,
)
server.shell(
name="Regenerate GRUB config",
commands=["update-grub"],
_sudo=True,
)
# --- Storage layout ---------------------------------------------------------
files.directory(
name="/models root",
path=MODELS_DIR,
user=SSH_USER,
group=SSH_USER,
mode="755",
_sudo=True,
)
for sub in ("moonshotai", "qwen", "deepseek", "zai", "mistralai"):
files.directory(
name=f"{MODELS_DIR}/{sub}",
path=f"{MODELS_DIR}/{sub}",
user=SSH_USER,
group=SSH_USER,
mode="755",
_sudo=True,
)
# Ollama bind-mounts its content-addressed store here.
files.directory(
name=f"{MODELS_DIR}/ollama",
path=f"{MODELS_DIR}/ollama",
user=SSH_USER,
group=SSH_USER,
mode="755",
_sudo=True,
)
# --- Compose files for inference services ----------------------------------
files.directory(
name="Compose root",
path=COMPOSE_DIR,
group="docker",
mode="2775", # setgid: new files inherit docker group
_sudo=True,
)
# Idempotent cleanup of earlier locations.
files.directory(
name="Remove old /srv/compose",
path="/srv/compose",
present=False,
_sudo=True,
)
files.directory(
name="Remove old ~/docker compose dir",
path=f"/home/{SSH_USER}/docker",
present=False,
_sudo=True,
)
for svc in (
"llama",
"vllm",
"ollama",
"openwebui",
"beszel",
"openlit",
"phoenix",
"openhands",
):
files.directory(
name=f"compose/{svc} dir",
path=f"{COMPOSE_DIR}/{svc}",
group="docker",
mode="2775",
_sudo=True,
)
files.put(
name=f"compose/{svc}/docker-compose.yml",
src=f"compose/{svc}.yml",
dest=f"{COMPOSE_DIR}/{svc}/docker-compose.yml",
group="docker",
mode="664",
_sudo=True,
)
# OpenWebUI persistent state (users, conversations, uploaded docs,
# RAG vector index) — bind-mounted into the container.
files.directory(
name="OpenWebUI data dir",
path=f"{COMPOSE_DIR}/openwebui/data",
group="docker",
mode="2775",
_sudo=True,
)
# Beszel persistent state (admin account, system list, metric history).
files.directory(
name="Beszel data dir",
path=f"{COMPOSE_DIR}/beszel/data",
group="docker",
mode="2775",
_sudo=True,
)
# Sibling .env file Docker Compose auto-reads for variable interpolation.
# Pyinfra creates it empty if missing and never overwrites — the user
# fills in BESZEL_TOKEN= and BESZEL_KEY= from the hub UI on first setup.
# Mode 640 root:docker so docker group members can read it at compose
# parse time but it isn't world-readable.
files.file(
name="Beszel .env (placeholder)",
path=f"{COMPOSE_DIR}/beszel/.env",
present=True,
mode="640",
user="root",
group="docker",
_sudo=True,
)
# Tear down the previous /etc/default/beszel-agent location if a prior
# deploy created it (idempotent — no-op on a fresh box).
files.file(
name="Remove legacy /etc/default/beszel-agent",
path="/etc/default/beszel-agent",
present=False,
_sudo=True,
)
# OpenLIT persistent state — Next.js app config + ClickHouse trace store.
files.directory(
name="OpenLIT data dir",
path=f"{COMPOSE_DIR}/openlit/data",
group="docker",
mode="2775",
_sudo=True,
)
files.directory(
name="OpenLIT ClickHouse data dir",
path=f"{COMPOSE_DIR}/openlit/clickhouse",
group="docker",
mode="2775",
_sudo=True,
)
# Phoenix persistent state (SQLite-backed trace store).
files.directory(
name="Phoenix data dir",
path=f"{COMPOSE_DIR}/phoenix/data",
group="docker",
mode="2775",
_sudo=True,
)
# OpenHands state + workspace. Owned by the SSH user (UID 1000) so the
# sandbox containers — which run as SANDBOX_USER_ID=1000 — can write to
# the shared workspace without root-owned files leaking out.
files.directory(
name="OpenHands state dir",
path=f"{COMPOSE_DIR}/openhands/state",
user=SSH_USER,
group=SSH_USER,
mode="2775",
_sudo=True,
)
files.directory(
name="OpenHands workspace dir",
path=f"{COMPOSE_DIR}/openhands/workspace",
user=SSH_USER,
group=SSH_USER,
mode="2775",
_sudo=True,
)
# --- Cleanup of artifacts from the prior native-build deploy ----------------
# All idempotent — `present=False` is a no-op when the target is absent.
server.shell(
name="Stop & disable old native llama-server.service",
commands=[
"systemctl disable --now llama-server.service 2>/dev/null || true",
],
_sudo=True,
)
files.file(
name="Remove old llama-server.service",
path="/etc/systemd/system/llama-server.service",
present=False,
_sudo=True,
)
files.link(
name="Remove old llama-server-vulkan symlink",
path="/usr/local/bin/llama-server-vulkan",
present=False,
_sudo=True,
)
files.link(
name="Remove old llama-server-rocm symlink",
path="/usr/local/bin/llama-server-rocm",
present=False,
_sudo=True,
)
files.directory(
name="Remove old llama.cpp checkout",
path="/opt/llama.cpp",
present=False,
_sudo=True,
)
server.shell(
name="Stop & remove native Ollama install",
commands=[
"systemctl disable --now ollama.service 2>/dev/null || true",
"rm -f /etc/systemd/system/ollama.service /usr/local/bin/ollama",
"userdel ollama 2>/dev/null || true",
],
_sudo=True,
)
systemd.daemon_reload(name="systemctl daemon-reload", _sudo=True)

pyinfra/framework/inventory.py Normal file

@@ -0,0 +1,5 @@
# pyinfra inventory for the Framework Desktop / Strix Halo box.
framework_desktop = [
("framework", {"ssh_hostname": "10.0.0.70", "ssh_user": "noise"}),
]

pyinfra/framework/run.sh Executable file

@@ -0,0 +1,7 @@
#!/usr/bin/env bash
# Wrapper: invokes pyinfra against the Framework Desktop, forwarding any
# extra args (e.g. --dry, -v) to pyinfra. Assumes NOPASSWD sudo on the box.
set -euo pipefail
cd "$(dirname "$0")"
exec pyinfra inventory.py deploy.py "$@"