Initial commit: localgenai stack

Containerized local LLM stack for the Framework Desktop / Strix Halo, plus the OpenCode harness on the Mac side.

- pyinfra/framework/: pyinfra deploy targeting the box
  - llama.cpp (Vulkan), vLLM (ROCm), Ollama (ROCm with HSA override for gfx1151), OpenWebUI
  - Beszel (host + container + AMD GPU dashboard via sysfs)
  - OpenLIT (LLM fleet metrics)
  - Phoenix (per-trace agent waterfall)
  - OpenHands (autonomous agent in a Docker sandbox)
- opencode/: OpenCode config + Phoenix bridge plugin (OTel exporter)
  - install.sh deploys to ~/.config/opencode/
- StrixHaloSetup.md / StrixHaloMemory.md / Roadmap.md / TODO.md: documentation and planning
- testing/qwen3-coder-30b/: small evaluation harness

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:35:10 -04:00
commit 2c4bfefa95
36 changed files with 5265 additions and 0 deletions
--- a/pyinfra/framework/compose/beszel.yml
+++ b/pyinfra/framework/compose/beszel.yml
@@ -0,0 +1,66 @@
+# Beszel — host + container + GPU dashboard.
+# https://beszel.dev
+#
+# Picked over Prometheus+Grafana for this box because:
+# - The agent's `amd_sysfs` collector reads /sys/class/drm/card*/device/
+#   directly, which is the only reliable GPU metric source on Strix Halo
+#   (gfx1151). AMD's amd-smi / Device Metrics Exporter return N/A for
+#   util/power/temp on this APU (ROCm#6035), so the official Prometheus
+#   exporter path is dead.
+# - It runs two containers (hub + agent) where Prometheus+Grafana needs six.
+#
+# First-time setup (WebSocket connection model — current Beszel default):
+#   1. `docker compose up -d beszel`            (start the hub)
+#   2. Open http://framework:8090, create the admin account
+#   3. Click "Add system" — the dialog gives you a TOKEN and an SSH KEY.
+#   4. Edit /srv/docker/beszel/.env (pyinfra creates it empty and never
+#      overwrites it afterwards). Add:
+#        BESZEL_TOKEN=<token-from-dialog>
+#        BESZEL_KEY=ssh-ed25519 AAAA…
+#   5. `docker compose up -d --force-recreate beszel-agent`
+#
+# Docker Compose auto-reads the sibling .env file for ${VAR} interpolation
+# in the environment block below — so secrets stay out of the compose
+# file (which pyinfra overwrites) but the env-var names match exactly
+# what the agent expects.
+#
+# Why both TOKEN and KEY: TOKEN identifies which system this agent is;
+# KEY is the hub's public SSH key, which the agent uses to verify the
+# hub during the WebSocket handshake. Rotate either by editing the .env
+# and running `docker compose up -d --force-recreate beszel-agent`.
+services:
+  beszel:
+    image: henrygd/beszel:latest
+    container_name: beszel
+    restart: unless-stopped
+    ports:
+      - "8090:8090"
+    volumes:
+      - /srv/docker/beszel/data:/beszel_data
+
+  beszel-agent:
+    image: henrygd/beszel-agent:latest
+    container_name: beszel-agent
+    restart: unless-stopped
+    # Host networking so the agent sees real CPU/memory/network counters
+    # without bridge-NAT distortion.
+    network_mode: host
+    volumes:
+      # Read-only Docker socket for per-container CPU/mem/net.
+      - /var/run/docker.sock:/var/run/docker.sock:ro
+      # Sysfs paths the AMD GPU collector reads.
+      - /sys/class/drm:/sys/class/drm:ro
+      - /sys/class/hwmon:/sys/class/hwmon:ro
+    environment:
+      # Pulled from /srv/docker/beszel/.env at compose-parse time.
+      TOKEN: "${BESZEL_TOKEN:-}"
+      KEY: "${BESZEL_KEY:-}"
+      # WebSocket dial-out target — the hub on this same host. The agent
+      # is on host networking, so localhost is the host machine, where
+      # the hub container exposes port 8090.
+      HUB_URL: "http://localhost:8090"
+      # Optional fallback: legacy SSH listener for hub-initiated probing.
+      # Harmless to keep — hub only uses it if WebSocket is unreachable.
+      LISTEN: "45876"
+      # Enable the AMD sysfs GPU collector.
+      GPU: "true"
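
Because the compose file interpolates `${BESZEL_TOKEN:-}` and `${BESZEL_KEY:-}`, a missing or empty entry in the `.env` file does not fail at parse time; the agent simply starts with empty credentials. A pre-flight check before step 5 could catch that. The sketch below is a minimal illustration, not part of this commit: the function name `check_beszel_env` is hypothetical, and it assumes the plain `NAME=value` layout shown in the setup comments.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): verify that a .env file
# defines non-empty BESZEL_TOKEN and BESZEL_KEY entries.
# Assumes plain NAME=value lines; a quoted-empty value such as KEY=""
# would still pass this simple check.
check_beszel_env() {
    env_file="$1"
    if [ ! -f "$env_file" ]; then
        echo "missing: $env_file" >&2
        return 1
    fi
    rc=0
    for name in BESZEL_TOKEN BESZEL_KEY; do
        # Require "NAME=" followed by at least one character.
        if ! grep -Eq "^${name}=.+" "$env_file"; then
            echo "unset or empty: $name" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

One possible use is to gate the recreate step on the check, for example `check_beszel_env /srv/docker/beszel/.env && docker compose up -d --force-recreate beszel-agent`.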