# pyinfra: Strix Halo bring-up

Containerized setup for the Framework Desktop (Ryzen AI Max+ 395, Radeon
8060S, 128 GB). The host stays minimal: kernel, driver, Docker, and
diagnostics. Inference engines (llama.cpp, vLLM, Ollama) run as Docker
Compose services, each shipping its own ROCm/Vulkan stack.

## Manual prerequisites

1. **Phase 0**: update the Framework BIOS and set the GPU UMA carve-out (96 GB).
2. **OS install**: Ubuntu Server 24.04 LTS. AMD ships ROCm only for
jammy/noble; later Ubuntu releases install but break the host-side toolchain
(libxml2 ABI). Enable SSH, import your laptop key, create user `noise`.
Recommended partitioning: ≥300 GB on `/`, big disk mounted at `/models`,
plain ext4 (skip LVM).
3. The host must be reachable at `10.0.0.237` over SSH (edit `inventory.py`
if it moves).
4. **NOPASSWD sudo for `noise`**: pyinfra's fact layer doesn't reliably
pass sudo passwords through. One-time setup:
```sh
# -t allocates a TTY so sudo can still prompt for the password this one time
ssh -t noise@10.0.0.237 'echo "noise ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/noise-nopasswd && sudo chmod 440 /etc/sudoers.d/noise-nopasswd'
```
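
Prerequisites 3 and 4 can be checked in one step before the first deploy. The sketch below is not part of this repo; the helper names `preflight_cmd` and `preflight_ok` are made up for illustration. It relies only on standard OpenSSH options and on `sudo -n`, which fails instead of prompting when a password would be required.

```python
import subprocess

HOST = "10.0.0.237"  # address from the prerequisites
USER = "noise"       # user created during OS install


def preflight_cmd(user: str = USER, host: str = HOST) -> list[str]:
    """Build an ssh command that succeeds only if key login and passwordless sudo both work."""
    return [
        "ssh",
        "-o", "BatchMode=yes",     # never prompt; fail if key auth is not set up
        "-o", "ConnectTimeout=5",  # give up quickly if the host is unreachable
        f"{user}@{host}",
        "sudo", "-n", "true",      # -n: non-interactive, errors out if a password is needed
    ]


def preflight_ok(cmd: list[str]) -> bool:
    """Run the command and report whether it exited with status 0."""
    try:
        return subprocess.run(cmd, capture_output=True, timeout=15).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        # ssh is missing from PATH, or the connection hung
        return False
```

Calling `preflight_ok(preflight_cmd())` from the laptop should return `True` once the sudoers drop-in is in place.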

## Run

```sh
uv tool install pyinfra
./run.sh          # equivalent to: pyinfra inventory.py deploy.py
./run.sh --dry    # any extra args are forwarded to pyinfra
```

Or run it ephemerally without installing: `uvx pyinfra inventory.py deploy.py`.
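
For orientation, a pyinfra inventory file is plain Python: each top-level list is a host group, and a host can be a `(name, data)` tuple. The sketch below is a guess at a minimal shape, not the actual contents of this repo's `inventory.py`; the group name `strix_halo` is hypothetical.

```python
# inventory.py (hypothetical contents; the real file may differ)

# Each top-level list is a group. A host given as a (name, data) tuple
# carries per-host settings such as the SSH login user.
strix_halo = [
    ("10.0.0.237", {"ssh_user": "noise"}),
]
```

If the machine moves to another address, the host string is the only value that would need to change in a file shaped like this.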

## What the deploy does

- Base CLI: tmux, vim, htop, btop, nvtop, radeontop, uv
- Tailscale (run `sudo tailscale up` on the box once, interactively)
- Docker engine + compose plugin, user added to `docker` group
- ROCm host diagnostics only (`rocminfo`, `rocm-smi`), no full toolchain
- `/models/<vendor>/` layout
- `~/docker/{llama,vllm,ollama}/docker-compose.yml` dropped in but
not auto-started: edit the model path, then run `docker compose up -d`

If a previous run installed the native llama.cpp build, full ROCm, or
native Ollama, those are cleaned up automatically the next time `./run.sh` runs.

## After the deploy: starting an inference service

```sh
ssh noise@10.0.0.237
sudo tailscale up                 # one-time, interactive

# Drop a GGUF somewhere under /models, then:
cd ~/docker/llama
vim docker-compose.yml            # edit the --model path
docker compose up -d
curl localhost:8080/v1/models     # smoke test
```

The same steps apply to `vllm` (port 8000) and `ollama` (port 11434; no model
edit needed, since Ollama serves models on demand).
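
The `curl` smoke test can be extended to all three services. The sketch below is an illustration, not a script shipped with this repo. The ports come from this section, but the vLLM and Ollama paths are assumptions: vLLM exposes an OpenAI-compatible API, and `/api/tags` is Ollama's model-list route. The helper names `is_up` and `report` are invented here.

```python
import http.client
import urllib.request

# Ports as listed in this README; the vllm and ollama paths are assumptions.
ENDPOINTS = {
    "llama": "http://localhost:8080/v1/models",
    "vllm": "http://localhost:8000/v1/models",
    "ollama": "http://localhost:11434/api/tags",
}


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # connection refused, timeout, HTTP error status, or malformed reply
        return False


def report(endpoints: dict[str, str] = ENDPOINTS) -> dict[str, bool]:
    """Probe every service and return a name -> reachable mapping."""
    return {name: is_up(url) for name, url in endpoints.items()}
```

Run on the box itself, `report()` should show `True` only for the services that have been started with `docker compose up -d`.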

## Tunables

Top of `deploy.py`:
- `ROCM_VERSION` and `AMDGPU_INSTALL_DEB`: bump when AMD ships a newer
release. The .deb filename has a build suffix that cannot be derived from
the version; find it at https://repo.radeon.com/amdgpu-install/.

Compose images live in `compose/{llama,vllm,ollama}.yml`; pin tags there.
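
To show why `ROCM_VERSION` and `AMDGPU_INSTALL_DEB` have to be bumped together, here is one way the two constants might combine into a download URL. This is an assumption about how `deploy.py` uses them, based on the directory layout of repo.radeon.com. The values are placeholders rather than real release numbers, and `UBUNTU_CODENAME` and `installer_url` are hypothetical names.

```python
# Placeholder values, not real release numbers: copy the actual ones from
# https://repo.radeon.com/amdgpu-install/ when bumping.
ROCM_VERSION = "X.Y.Z"
AMDGPU_INSTALL_DEB = "amdgpu-install_BUILD_all.deb"

UBUNTU_CODENAME = "noble"  # hypothetical constant; matches the 24.04 install


def installer_url(version: str, deb: str, codename: str = UBUNTU_CODENAME) -> str:
    """Join the pieces following the repo's version/ubuntu/codename directory layout."""
    return f"https://repo.radeon.com/amdgpu-install/{version}/ubuntu/{codename}/{deb}"
```

Because the build suffix lives only in the filename, changing the version without also updating the filename would produce a URL that does not exist.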