Initial commit: localgenai stack

Containerized local LLM stack for the Framework Desktop / Strix Halo,
plus the OpenCode harness on the Mac side.

- pyinfra/framework/: pyinfra deploy targeting the box
  - llama.cpp (Vulkan), vLLM (ROCm), Ollama (ROCm with HSA override
    for gfx1151), OpenWebUI
  - Beszel (host + container + AMD GPU dashboard via sysfs)
  - OpenLIT (LLM fleet metrics)
  - Phoenix (per-trace agent waterfall)
  - OpenHands (autonomous agent in a Docker sandbox)
- opencode/: OpenCode config + Phoenix bridge plugin (OTel exporter)
  - install.sh deploys to ~/.config/opencode/
- StrixHaloSetup.md / StrixHaloMemory.md / Roadmap.md / TODO.md:
  documentation and planning
- testing/qwen3-coder-30b/: small evaluation harness

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:35:10 -04:00
commit 2c4bfefa95
36 changed files with 5265 additions and 0 deletions
--- a/TODO.md
+++ b/TODO.md
@@ -0,0 +1,44 @@
+# TODO
+
+## ROCm / vLLM on Strix Halo (gfx1151)
+
+The Framework Desktop runs **Ubuntu 26.04 LTS**; AMD ships ROCm 7.2.3
+packages only for jammy (22.04) and noble (24.04). We installed the
+noble repo but pulled only `rocminfo` + `rocm-smi-lib` for host-side
+diagnostics — all heavy ROCm work runs in containers, which ship their
+own ROCm stack. This sidesteps the host-side libxml2 ABI mismatch (noble
+ships `libxml2.so.2`, 26.04 ships `libxml2.so.16`) that broke the native
+HIP toolchain.
+
+### Open questions
+
+- **Does `rocm/vllm:latest` actually run on Strix Halo's iGPU?** vLLM's
+  AMD support officially targets datacenter cards (MI300X / gfx942).
+  gfx1151 (RDNA 3.5 consumer) is a different ISA. If the stock image
+  doesn't initialize the device, try `rocm/vllm-dev:nightly` or build
+  from source against ROCm 7.x with `PYTORCH_ROCM_ARCH=gfx1151` set.
+- **AMD support for 26.04** — watch https://repo.radeon.com/amdgpu-install/<latest>/ubuntu/
+  for a directory matching the box's codename. AMD historically lags
+  Ubuntu LTS by 6–12 months for ROCm packaging.
+
+### When 26.04 ROCm packages land
+
+If you ever want to do native ROCm work on the host (rather than via
+containers):
+1. Bump `ROCM_VERSION` and `AMDGPU_INSTALL_DEB` in `pyinfra/deploy.py`
+   to the new release.
+2. Update the apt source URL path in `deploy.py` if AMD adds a new
+   release codename (currently hardcoded to `noble`).
+3. Add a step that runs `amdgpu-install -y --usecase=rocm --no-dkms`
+   (the current deploy explicitly avoids this to stay slim).
+4. Run `./run.sh` to apply the updated deploy.
+
+For container-only workflows (current default), no action is needed —
+container images update independently of the host.
+
+## Pick a coding model (StrixHaloSetup Phase 6)
+
+Open question: research current Strix Halo benchmarks before committing
+to a model. Candidates: Qwen3-Coder, DeepSeek-Coder-V3.x, GLM-4.6,
+Devstral, Kimi-K2. Track Kimi Linear separately via the weekly routine
+referenced in `StrixHaloSetup.md`.