localgenai/TODO.md
noisedestroyers 2c4bfefa95 Initial commit: localgenai stack
Containerized local LLM stack for the Framework Desktop / Strix Halo,
plus the OpenCode harness on the Mac side.

- pyinfra/framework/: pyinfra deploy targeting the box
  - llama.cpp (Vulkan), vLLM (ROCm), Ollama (ROCm with HSA override
    for gfx1151), OpenWebUI
  - Beszel (host + container + AMD GPU dashboard via sysfs)
  - OpenLIT (LLM fleet metrics)
  - Phoenix (per-trace agent waterfall)
  - OpenHands (autonomous agent in a Docker sandbox)
- opencode/: OpenCode config + Phoenix bridge plugin (OTel exporter)
  - install.sh deploys to ~/.config/opencode/
- StrixHaloSetup.md / StrixHaloMemory.md / Roadmap.md / TODO.md:
  documentation and planning
- testing/qwen3-coder-30b/: small evaluation harness

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:35:10 -04:00


# TODO

## ROCm / vLLM on Strix Halo (gfx1151)

The Framework Desktop runs Ubuntu 26.04 LTS; AMD only ships ROCm 7.2.3 packages for jammy (22.04) and noble (24.04). We installed the noble repo but pulled only rocminfo + rocm-smi-lib for host-side diagnostics — all heavy ROCm work runs in containers, which ship their own ROCm stack. This sidesteps the host-side libxml2 ABI mismatch (noble ships libxml2.so.2, 26.04 ships libxml2.so.16) that broke the native HIP toolchain.
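The host-side setup described above can be sketched roughly as follows. This is a hedged sketch, not the actual deploy: the repo line and key URL follow AMD's usual repo.radeon.com layout and are assumptions; verify them against the current ROCm install documentation before use.

```shell
# Sketch: host gets only ROCm diagnostic tools; all heavy ROCm work
# stays in containers. Repo/key URLs below are assumptions.

# Add AMD's noble ROCm apt repo (usable on 26.04 because we install so little)
sudo mkdir -p /etc/apt/keyrings
wget -qO- https://repo.radeon.com/rocm/rocm.gpg.key | \
  gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/7.2.3 noble main' | \
  sudo tee /etc/apt/sources.list.d/rocm.list

# Install only the diagnostics -- no HIP toolchain, so the noble-vs-26.04
# libxml2 ABI mismatch never comes into play
sudo apt update
sudo apt install -y rocminfo rocm-smi-lib
```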

### Open questions

- Does rocm/vllm:latest actually run on Strix Halo's iGPU? vLLM's AMD support officially targets datacenter cards (MI300X / gfx942), and gfx1151 (RDNA 3.5, consumer) is a different ISA. If the stock image doesn't initialize the device, try rocm/vllm-dev:nightly or build from source against ROCm 7.x with -DAMDGPU_TARGETS=gfx1151.
- AMD support for 26.04: watch https://repo.radeon.com/amdgpu-install//ubuntu/ for a directory matching the box's codename. AMD has historically lagged new Ubuntu LTS releases by 6–12 months for ROCm packaging.
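A quick way to answer the first question is a smoke test against the stock image. The sketch below uses the standard device-passthrough flags for ROCm containers; whether rocminfo inside rocm/vllm:latest reports gfx1151 at all is exactly the open question, so treat this as a probe, not a guarantee.

```shell
# Smoke test: does the ROCm runtime inside the container see the iGPU?
# --device/--group-add are the usual ROCm-in-Docker passthrough flags.
docker run --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --security-opt seccomp=unconfined \
  rocm/vllm:latest rocminfo | grep -i gfx

# If gfx1151 appears, try an actual model load next. If rocminfo fails or
# lists no agent, fall back to rocm/vllm-dev:nightly or a source build
# with -DAMDGPU_TARGETS=gfx1151 as noted above.
```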

### When 26.04 ROCm packages land

If you ever want to do native ROCm work on the host (rather than via containers):

  1. Bump ROCM_VERSION and AMDGPU_INSTALL_DEB in pyinfra/deploy.py to the new release.
  2. Update the apt source URL path in deploy.py if AMD adds a new release codename (currently hardcoded to noble).
  3. Add a step that runs amdgpu-install -y --usecase=rocm --no-dkms (the current deploy explicitly avoids this to stay slim).
  4. Run ./run.sh.
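On the host, those steps might look like the sketch below. The ROCM_VERSION and AMDGPU_INSTALL_DEB names come from this repo's deploy.py, but the sed patterns and the placeholder deb URL are assumptions; step 3 is shown here as its manual host-side equivalent rather than as a pyinfra operation.

```shell
# Sketch of the native-ROCm migration (steps 1-2): bump the constants in
# deploy.py. X.Y.Z and <codename> are placeholders, not real values.
sed -i 's/^ROCM_VERSION = .*/ROCM_VERSION = "X.Y.Z"/' pyinfra/deploy.py
sed -i 's|^AMDGPU_INSTALL_DEB = .*|AMDGPU_INSTALL_DEB = "https://repo.radeon.com/amdgpu-install/X.Y.Z/ubuntu/<codename>/..."|' pyinfra/deploy.py

# Step 3: full host ROCm usecase, skipping DKMS (the in-kernel driver is fine)
sudo amdgpu-install -y --usecase=rocm --no-dkms

# Step 4: re-run the deploy
./run.sh
```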

For container-only workflows (current default), no action is needed — container images update independently of the host.

## Pick a coding model (StrixHaloSetup Phase 6)

Open question — research current Strix Halo benchmarks before committing. Candidates: Qwen3-Coder, DeepSeek-Coder-V3.x, GLM-4.6, Devstral, Kimi-K2. Track Kimi Linear separately via the weekly routine referenced in StrixHaloSetup.md.