2026-05-08 11:35:10 -04:00
# TODO

## ROCm / vLLM on Strix Halo (gfx1151)
2026-05-08 16:34:52 -04:00

The Framework Desktop runs **Ubuntu 24.04 LTS (noble)**, which aligns
with AMD's ROCm 7.x packaging. The deploy installs `rocminfo` and
`librocm-smi-dev` host-side; heavier ROCm bits (full HIP toolchain,
device-mapped libraries) still run inside containers that ship their
own ROCm stack. The host stays slim by design.
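
A minimal sanity check for this split, assuming the deploy's `rocminfo` is on `PATH` and the user is in the `render`/`video` groups; `gfx1151` is the ISA string the iGPU should report, but confirm against actual output:

```shell
# Confirm the slim host stack can still see the iGPU before debugging
# anything inside a container. rocminfo talks to the kernel amdgpu driver,
# so no ROCm userspace beyond the deploy's packages is needed.
if rocminfo | grep -q 'gfx1151'; then
    echo "host runtime sees the gfx1151 iGPU"
else
    echo "gfx1151 not reported: check amdgpu driver and render/video groups" >&2
fi
```

This is a command sketch for the target machine, not something runnable off-host.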
### Open questions
- **Does `rocm/vllm:latest` actually run on Strix Halo's iGPU?** vLLM's
  AMD support officially targets datacenter cards (MI300X / gfx942).
  gfx1151 (RDNA 3.5 consumer) is a different ISA. If the stock image
  doesn't initialize the device, try `rocm/vllm-dev:nightly` or build
  from source against ROCm 7.x with `-DAMDGPU_TARGETS=gfx1151`.
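
One way to answer this empirically is a container smoke test. A sketch, assuming the standard ROCm container device mappings; `HSA_OVERRIDE_GFX_VERSION=11.0.0` (spoofing gfx1100) is a commonly reported workaround for ISAs the prebuilt kernels don't cover, not something verified on gfx1151:

```shell
# Map the iGPU into the stock image and see whether PyTorch's ROCm
# backend (exposed through the torch.cuda API) initializes the device.
docker run --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --security-opt seccomp=unconfined \
  -e HSA_OVERRIDE_GFX_VERSION=11.0.0 \
  rocm/vllm:latest \
  python3 -c 'import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))'
```

If this prints `True` plus a device name, basic initialization works; actual vLLM serving may still fail on missing gfx1151 kernels.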
### If you ever want full host-side ROCm
For native ROCm work on the host (compiling HIP kernels, full toolchain):
1. Bump `ROCM_VERSION` and `AMDGPU_INSTALL_DEB` in
   `pyinfra/framework/deploy.py` to the latest release.
2. Add a step that runs `amdgpu-install -y --usecase=rocm --no-dkms`
   (currently avoided to stay slim — ~25 GB toolchain).
3. `./run.sh`.

For container-only workflows (current default), no action is needed —
container images update independently of the host.
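
For reference, steps 1–3 above roughly expand to the commands below; `VERSION` is a placeholder (take the real values from `ROCM_VERSION` / `AMDGPU_INSTALL_DEB` in `deploy.py`), and the group/re-login step is an assumption about a fresh host:

```shell
# Sketch only: substitute the pinned version from deploy.py for VERSION.
wget "https://repo.radeon.com/amdgpu-install/VERSION/ubuntu/noble/amdgpu-install_VERSION_all.deb"
sudo apt install ./amdgpu-install_VERSION_all.deb
sudo amdgpu-install -y --usecase=rocm --no-dkms   # the ~25 GB toolchain
sudo usermod -aG render,video "$USER"             # takes effect on next login
./run.sh
```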
## Pick a coding model (StrixHaloSetup Phase 6)

Open question — research current Strix Halo benchmarks before
committing. Candidates: Qwen3-Coder, DeepSeek-Coder-V3.x, GLM-4.6,
Devstral, Kimi-K2. Track Kimi Linear separately via the weekly routine
referenced in `StrixHaloSetup.md`.