framework/README.md

# pyinfra: Strix Halo bring-up

Containerized setup for the Framework Desktop (Ryzen AI Max+ 395, Radeon
8060S, 128 GB). The host stays minimal — kernel + driver + Docker +
diagnostics. Inference engines (llama.cpp, vLLM, Ollama) run as docker
compose services, each shipping its own ROCm/Vulkan stack.

## Manual prerequisites

1. **Phase 0** — update Framework BIOS, set GPU UMA carve-out (96 GB).
2. **OS install** — Ubuntu Server 24.04 LTS. AMD ROCm only ships for
   jammy/noble; later Ubuntu releases install but break the host-side toolchain
   (libxml2 ABI). Enable SSH, import your laptop key, create user `noise`.
   Recommended partitioning: ≥300 GB on `/`, big disk mounted at `/models`,
   plain ext4 (skip LVM).
3. The host must be reachable at `10.0.0.237` over SSH (edit `inventory.py`
   if it moves).
4. **NOPASSWD sudo for `noise`** — pyinfra's fact layer doesn't reliably
   thread sudo passwords. One-time setup:
   ```sh
   # -t allocates a TTY so sudo can prompt for the password this one last time
   ssh -t noise@10.0.0.237 'echo "noise ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/noise-nopasswd && sudo chmod 440 /etc/sudoers.d/noise-nopasswd'
   ```
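
For orientation, a minimal sketch of what `inventory.py` might look like for the host and user named above. This is not the repo's actual file: the group name `strix_halo` is a placeholder, and it assumes pyinfra's usual convention of module-level lists of `(host, data)` tuples with `ssh_user` as host data.

```python
# inventory.py (sketch, not the repo's actual file)
# pyinfra treats each module-level list as a host group.
# `strix_halo` is a hypothetical group name chosen for illustration.
strix_halo = [
    # (address, per-host data); change the address here if the box moves
    ("10.0.0.237", {"ssh_user": "noise"}),
]
```

If the box gets a new address, the tuple's first element is the only thing that should need to change in a file shaped like this.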

## Run

```sh
uv tool install pyinfra
./run.sh           # equivalent to: pyinfra inventory.py deploy.py
./run.sh --dry     # any extra args are forwarded to pyinfra
```

Or run it ephemerally without installing: `uvx pyinfra inventory.py deploy.py`.

## What the deploy does

- Base CLI: tmux, vim, htop, btop, nvtop, radeontop, uv
- Tailscale (run `sudo tailscale up` on the box once, interactively)
- Docker engine + compose plugin, user added to `docker` group
- ROCm host diagnostics only (`rocminfo`, `rocm-smi`) — no full toolchain
- `/models/<vendor>/` layout
- `~/docker/{llama,vllm,ollama}/docker-compose.yml` dropped in,
  not auto-started — you edit the model path then `docker compose up -d`

If a previous run installed the native llama.cpp build / full ROCm /
native Ollama, those are auto-cleaned the next time `./run.sh` runs.

## After the deploy: starting an inference service

```sh
ssh noise@10.0.0.237
sudo tailscale up                            # one-time, interactive

# Drop a GGUF somewhere under /models, then:
cd ~/docker/llama
vim docker-compose.yml                       # edit the --model path
docker compose up -d
curl localhost:8080/v1/models                # smoke test
```

Same shape for `vllm` (port 8000) and `ollama` (port 11434, no model edit
needed — Ollama serves models on demand).
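
To check all three services at once, something like the following could be run on the box. It is a sketch under assumptions: the ports come from this README, but the paths are guesses based on each engine's usual HTTP API (`/v1/models` for the OpenAI-compatible servers, `/api/tags` for Ollama), and `SERVICES` and `is_up` are names made up for this example.

```python
import http.client
import urllib.error
import urllib.request

# Ports are from this README; the URL paths are assumptions (see above).
SERVICES = {
    "llama": "http://localhost:8080/v1/models",
    "vllm": "http://localhost:8000/v1/models",
    "ollama": "http://localhost:11434/api/tags",
}


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connection, timeout, non-2xx status, or malformed reply.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        state = "up" if is_up(url) else "down"
        print(f"{name:7s} {state:5s} {url}")
```

A service that was never started with `docker compose up -d` simply reports `down`; nothing here starts or stops containers.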

## Tunables

Top of `deploy.py`:
- `ROCM_VERSION` and `AMDGPU_INSTALL_DEB` — bump when AMD ships a newer
  release. The .deb filename has a build suffix that doesn't derive from
  the version; find it at https://repo.radeon.com/amdgpu-install/.
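
As a rough illustration of why both values are needed, the sketch below shows how they might combine into a download URL. The values are placeholders, `UBUNTU_CODENAME` and `installer_url` are hypothetical names, and the directory layout under `repo.radeon.com` is an assumption to verify against the index page before relying on it.

```python
# Placeholder values; substitute the real ones from the repo index page.
ROCM_VERSION = "X.Y.Z"
AMDGPU_INSTALL_DEB = "amdgpu-install_X.Y.BUILD-1_all.deb"

# Hypothetical constant: noble is the codename for Ubuntu 24.04.
UBUNTU_CODENAME = "noble"


def installer_url(version: str, deb: str, codename: str = UBUNTU_CODENAME) -> str:
    """Build the installer URL, assuming a <version>/ubuntu/<codename>/ layout."""
    base = "https://repo.radeon.com/amdgpu-install"
    return f"{base}/{version}/ubuntu/{codename}/{deb}"


if __name__ == "__main__":
    print(installer_url(ROCM_VERSION, AMDGPU_INSTALL_DEB))
```

Because the build suffix in the filename cannot be computed from the version, the two constants have to be bumped together.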

Compose images in `compose/{llama,vllm,ollama}.yml` — pin tags here.