Containerized local LLM stack for the Framework Desktop / Strix Halo,
plus the OpenCode harness on the Mac side.
- pyinfra/framework/: pyinfra deploy targeting the box
- llama.cpp (Vulkan), vLLM (ROCm), Ollama (ROCm with HSA override
for gfx1151), OpenWebUI
- Beszel (host + container + AMD GPU dashboard via sysfs)
- OpenLIT (LLM fleet metrics)
- Phoenix (per-trace agent waterfall)
- OpenHands (autonomous agent in a Docker sandbox)
- opencode/: OpenCode config + Phoenix bridge plugin (OTel exporter)
- install.sh deploys to ~/.config/opencode/
- StrixHaloSetup.md / StrixHaloMemory.md / Roadmap.md / TODO.md:
documentation and planning
- testing/qwen3-coder-30b/: small evaluation harness
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TODO
ROCm / vLLM on Strix Halo (gfx1151)
The Framework Desktop runs Ubuntu 26.04 LTS; AMD only ships ROCm
7.2.3 packages for jammy (22.04) and noble (24.04). We installed the
noble repo but pulled only rocminfo + rocm-smi-lib for host-side
diagnostics — all heavy ROCm work runs in containers, which ship their
own ROCm stack. This sidesteps the host-side libxml2 ABI mismatch (noble
ships libxml2.so.2, 26.04 ships libxml2.so.16) that broke the native
HIP toolchain.
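The diagnostics-only host install described above can be sketched as follows. The repo URL layout and the 7.2.3 version pin are assumptions (match whatever release you actually registered); the point is that only rocminfo and rocm-smi-lib land on the host, so the HIP toolchain and its libxml2.so.2 dependency never touch 26.04:

```shell
# Minimal sketch: register AMD's noble ROCm apt repo, but install only
# the two diagnostic packages. Everything else runs in containers.
ROCM_VERSION="7.2.3"                     # assumed pin -- adjust to match
ROCM_REPO="deb [arch=amd64] https://repo.radeon.com/rocm/apt/${ROCM_VERSION} noble main"
HOST_PKGS="rocminfo rocm-smi-lib"        # diagnostics only -- no hipcc, no dkms

# Dry run: print what would be written/installed instead of doing it,
# so the commands can be reviewed before touching the host.
echo "$ROCM_REPO  # -> /etc/apt/sources.list.d/rocm.list"
echo "sudo apt-get update && sudo apt-get install -y $HOST_PKGS"
```

Verifying the pin afterwards is just `apt list --installed | grep rocm` — anything beyond these two packages means the heavy stack leaked onto the host.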
Open questions
- Does rocm/vllm:latest actually run on Strix Halo's iGPU? vLLM's AMD
  support officially targets datacenter cards (MI300X / gfx942); gfx1151
  (RDNA 3.5 consumer) is a different ISA. If the stock image doesn't
  initialize the device, try rocm/vllm-dev:nightly or build from source
  against ROCm 7.x with -DAMDGPU_TARGETS=gfx1151.
- AMD support for 26.04 — watch https://repo.radeon.com/amdgpu-install//ubuntu/
  for a directory matching the box's codename. AMD historically lags
  Ubuntu LTS by 6–12 months for ROCm packaging.
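A quick way to settle the first question is a rocminfo smoke test inside the vLLM image before loading any model. This is a sketch built as a dry run (it only prints the command); the device flags mirror common ROCm container setups, and the HSA_OVERRIDE_GFX_VERSION value of 11.5.1 is an assumption carried over from the Ollama/gfx1151 override mentioned above:

```shell
# Smoke-test sketch: does the vLLM image see the Strix Halo iGPU at all?
IMAGE="rocm/vllm:latest"                 # fall back to rocm/vllm-dev:nightly
DOCKER_ARGS="--device=/dev/kfd --device=/dev/dri --group-add video \
  --security-opt seccomp=unconfined \
  -e HSA_OVERRIDE_GFX_VERSION=11.5.1"    # assumed override for gfx1151

# Print the command for review rather than executing it here.
echo "docker run --rm -it $DOCKER_ARGS $IMAGE rocminfo"
```

If rocminfo reports no agent for gfx1151, that answers the open question for the stock image and the nightly/build-from-source path is next.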
When 26.04 ROCm packages land
If you ever want to do native ROCm work on the host (rather than via containers):
- Bump ROCM_VERSION and AMDGPU_INSTALL_DEB in pyinfra/deploy.py to the
  new release.
- Update the apt source URL path in deploy.py if AMD adds a new release
  codename (currently hardcoded to noble).
- Add a step that runs amdgpu-install -y --usecase=rocm --no-dkms (the
  current deploy explicitly avoids this to stay slim), then redeploy
  with ./run.sh.
For container-only workflows (current default), no action is needed — container images update independently of the host.
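The "watch repo.radeon.com" chore above can be semi-automated. A sketch, with heavy caveats: the exact directory layout under amdgpu-install/ is an assumption, and the version segment is a placeholder — the idea is just to probe for a dists/ entry matching the box's codename:

```shell
# Sketch: check whether AMD's apt repo has grown a directory for this
# box's Ubuntu codename (read from os-release; "noble" is a fallback
# for machines where that file is missing).
CODENAME="$( { . /etc/os-release && echo "$VERSION_CODENAME"; } 2>/dev/null || true )"
CODENAME="${CODENAME:-noble}"
ROCM_VER="7.2.3"                         # placeholder -- bump as releases land

# Dry run: print the probe command; repo layout under amdgpu-install/
# is assumed, verify against the browsable index first.
echo "curl -fsS https://repo.radeon.com/amdgpu-install/${ROCM_VER}/ubuntu/dists/ | grep -q ${CODENAME} && echo '26.04 packaging has landed'"
```

This could hang off the weekly routine already referenced in StrixHaloSetup.md rather than a separate cron entry.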
Pick a coding model (StrixHaloSetup Phase 6)
Open question — research current Strix Halo benchmarks before
committing. Candidates: Qwen3-Coder, DeepSeek-Coder-V3.x, GLM-4.6,
Devstral, Kimi-K2. Track Kimi Linear separately via the weekly routine
referenced in StrixHaloSetup.md.
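Once candidate GGUF builds are downloaded, the comparison can lean on llama-bench from the llama.cpp container already in the stack. A sketch — the model filenames and the /models mount point are placeholders, not files that exist yet:

```shell
# Sketch: generate one llama-bench invocation per candidate model.
# -p = prompt-processing tokens, -n = generation tokens; identical
# settings across models keeps the throughput numbers comparable.
MODELS="qwen3-coder-30b.gguf deepseek-coder-v3.gguf glm-4.6.gguf devstral.gguf kimi-k2.gguf"
for m in $MODELS; do
  echo "llama-bench -m /models/$m -p 512 -n 128"
done
```

Raw tok/s only ranks speed; quality still needs the testing/qwen3-coder-30b/ harness (or an extension of it) run against each candidate.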