# pyinfra: Strix Halo bring-up

Containerized setup for the Framework Desktop (Ryzen AI Max+ 395, Radeon 8060S, 128 GB). The host stays minimal: kernel + driver + Docker + diagnostics. Inference engines (llama.cpp, vLLM, Ollama) run as docker compose services, each shipping its own ROCm/Vulkan stack.

## Manual prerequisites

1. **Phase 0**: update the Framework BIOS and set the GPU UMA carve-out (96 GB).
2. **OS install**: Ubuntu Server (24.04 LTS recommended; 26.04 also works, but ROCm host-side support is patchy, see `../TODO.md`). Enable SSH, import your laptop key, create user `noise`.
3. The host must be reachable at `10.0.0.237` over SSH (edit `inventory.py` if it moves).
4. **NOPASSWD sudo for `noise`**: pyinfra's fact layer doesn't reliably thread sudo passwords. One-time setup (`-t` allocates a terminal so sudo can prompt for the password this one time):

   ```sh
   ssh -t noise@10.0.0.237 'echo "noise ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/noise-nopasswd && sudo chmod 440 /etc/sudoers.d/noise-nopasswd'
   ```

## Run

```sh
uv tool install pyinfra
./run.sh           # equivalent to: pyinfra inventory.py deploy.py
./run.sh --dry     # any extra args are forwarded to pyinfra
```

Or run it ephemerally without installing: `uvx pyinfra inventory.py deploy.py`.

## What the deploy does

- Base CLI: tmux, vim, htop, btop, nvtop, radeontop, uv
- Tailscale (run `sudo tailscale up` on the box once, interactively)
- Docker engine + compose plugin, user added to `docker` group
- ROCm host diagnostics only (`rocminfo`, `rocm-smi`), no full toolchain
- `/models//` layout
- `~/docker/{llama,vllm,ollama}/docker-compose.yml` dropped in, not auto-started: you edit the model path, then run `docker compose up -d`

If a previous run installed the native llama.cpp build, full ROCm, or native Ollama, those are cleaned up automatically the next time `./run.sh` runs.
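
Before the first `./run.sh`, it can help to confirm that prerequisites 3 and 4 actually hold, since a missing key or a sudo password prompt tends to surface as a confusing failure partway through a deploy. The following is a minimal sketch, not part of this repository: the function names `preflight_command` and `preflight_ok` are hypothetical, and it assumes the OpenSSH client is on your PATH. It relies on `ssh -o BatchMode=yes` (never prompt) and `sudo -n` (fail instead of asking for a password).

```python
import subprocess

HOST = "10.0.0.237"  # from the prerequisites; change it if the box moves
USER = "noise"


def preflight_command(user: str = USER, host: str = HOST) -> list[str]:
    """Build an ssh command that succeeds only with key auth and NOPASSWD sudo."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never prompt for an SSH password
        "-o", "ConnectTimeout=5",   # give up quickly if the host is unreachable
        f"{user}@{host}",
        "sudo", "-n", "true",       # -n: fail rather than ask for a sudo password
    ]


def preflight_ok(user: str = USER, host: str = HOST) -> bool:
    """Return True if the host is reachable and sudo needs no password."""
    try:
        result = subprocess.run(
            preflight_command(user, host), capture_output=True, timeout=15
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh binary missing, or the connection hung past the timeout
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("ready for ./run.sh" if preflight_ok() else "fix SSH or sudoers first")
```

A non-zero exit does not say which prerequisite failed; running the printed command by hand without `BatchMode` usually makes that obvious.
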
## After the deploy: starting an inference service

```sh
ssh noise@10.0.0.237
sudo tailscale up                     # one-time, interactive

# Drop a GGUF somewhere under /models, then:
cd ~/docker/llama
vim docker-compose.yml                # edit the --model path
docker compose up -d
curl localhost:8080/v1/models         # smoke test
```

Same shape for `vllm` (port 8000) and `ollama` (port 11434, no model edit needed: Ollama serves models on demand).

## Tunables

Top of `deploy.py`:

- `ROCM_VERSION` and `AMDGPU_INSTALL_DEB`: bump when AMD ships a newer release. The .deb filename has a build suffix that doesn't derive from the version; find it at https://repo.radeon.com/amdgpu-install/.

Compose images in `compose/{llama,vllm,ollama}.yml`: pin tags here.
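
After bumping an image tag or restarting a service, a quick way to see which of the three engines are up is to probe the ports named in this README. This is a rough sketch under stated assumptions: the names `SERVICES`, `port_open`, and `check_services` are hypothetical, it is meant to run on the box itself, and it only checks that something accepts TCP connections on each port. It does not verify that a model is loaded, so the `curl` smoke test is still the real check.

```python
import socket

# Ports as listed in this README; adjust if your compose files publish others.
SERVICES = {"llama": 8080, "vllm": 8000, "ollama": 11434}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:<7} {'listening' if up else 'not listening'}")
```
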