pyinfra: Strix Halo bring-up

Containerized setup for the Framework Desktop (Ryzen AI Max+ 395, Radeon 8060S, 128 GB). The host stays minimal — kernel + driver + Docker + diagnostics. Inference engines (llama.cpp, vLLM, Ollama) run as docker compose services, each shipping its own ROCm/Vulkan stack.

Manual prerequisites

Phase 0 — update Framework BIOS, set GPU UMA carve-out (96 GB).
OS install — Ubuntu Server 24.04 LTS. AMD ROCm only ships for jammy/noble; later Ubuntus install but break the host-side toolchain (libxml2 ABI). Enable SSH, import your laptop key, create user noise. Recommended partitioning: ≥300 GB on /, big disk mounted at /models, plain ext4 (skip LVM).
The host must be reachable at 10.0.0.237 over SSH (edit inventory.py if it moves).

NOPASSWD sudo for noise — pyinfra's fact layer doesn't reliably thread sudo passwords. One-time setup:

ssh noise@10.0.0.237 'echo "noise ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/noise-nopasswd && sudo chmod 440 /etc/sudoers.d/noise-nopasswd'

Run

uv tool install pyinfra
./run.sh           # equivalent to: pyinfra inventory.py deploy.py
./run.sh --dry     # any extra args are forwarded to pyinfra

Or run it ephemerally without installing: uvx pyinfra inventory.py deploy.py.

What the deploy does

Base CLI: tmux, vim, htop, btop, nvtop, radeontop, uv
Tailscale (run sudo tailscale up on the box once, interactively)
Docker engine + compose plugin, user added to docker group
ROCm host diagnostics only (rocminfo, rocm-smi) — no full toolchain
/models/<vendor>/ layout
~/docker/{llama,vllm,ollama}/docker-compose.yml dropped in, not auto-started — you edit the model path then docker compose up -d

If a previous run installed the native llama.cpp build / full ROCm / native Ollama, those are auto-cleaned the next time ./run.sh runs.

After the deploy: starting an inference service

ssh noise@10.0.0.237
sudo tailscale up                            # one-time, interactive

# Drop a GGUF somewhere under /models, then:
cd ~/docker/llama
vim docker-compose.yml                       # edit the --model path
docker compose up -d
curl localhost:8080/v1/models                # smoke test

Same shape for vllm (port 8000) and ollama (port 11434, no model edit needed — Ollama serves models on demand).

Tunables

Top of deploy.py:

ROCM_VERSION and AMDGPU_INSTALL_DEB — bump when AMD ships a newer release. The .deb filename has a build suffix that doesn't derive from the version; find it at https://repo.radeon.com/amdgpu-install/.

Compose images in compose/{llama,vllm,ollama}.yml — pin tags here.

2.7 KiB Raw Blame History

pyinfra: Strix Halo bring-up

Manual prerequisites

Run

What the deploy does

After the deploy: starting an inference service

Tunables

2.7 KiB

Raw Blame History