Initial commit: localgenai stack

Containerized local LLM stack for the Framework Desktop / Strix Halo,
plus the OpenCode harness on the Mac side.

- pyinfra/framework/: pyinfra deploy targeting the box
  - llama.cpp (Vulkan), vLLM (ROCm), Ollama (ROCm with HSA override
    for gfx1151), OpenWebUI
  - Beszel (host + container + AMD GPU dashboard via sysfs)
  - OpenLIT (LLM fleet metrics)
  - Phoenix (per-trace agent waterfall)
  - OpenHands (autonomous agent in a Docker sandbox)
- opencode/: OpenCode config + Phoenix bridge plugin (OTel exporter)
  - install.sh deploys to ~/.config/opencode/
- StrixHaloSetup.md / StrixHaloMemory.md / Roadmap.md / TODO.md:
  documentation and planning
- testing/qwen3-coder-30b/: small evaluation harness

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pyinfra/HowToScan.md (new file, 82 lines)

# How to onboard an existing host into pyinfra

There's no auto-generator that reads a running system and emits a
deploy.py — not in pyinfra, not in Ansible, not in any IaC tool.
Reverse-engineering "what's intentional vs incidental" needs human
judgment. Below is the workflow that actually works.

## What pyinfra gives you for free

The **facts** API queries live state without writing any deploy:

```sh
pyinfra @host fact deb.DebPackages          # installed packages
pyinfra @host fact apt.AptSources           # extra apt repos
pyinfra @host fact systemd.SystemdServices  # service state
pyinfra @host fact server.Crontab           # cron
pyinfra @host fact server.Users             # users
```

Output is JSON. There's no fact → op translator, but the data is easy
to grep through and tells you what's actually on the box.
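
The same facts are also available inside a deploy via `host.get_fact`, which
is handy once you start turning the scan into operations. A minimal sketch
(assuming the pyinfra import paths used elsewhere in this repo):

```python
# deploy_sketch.py: illustrative only; read a fact, branch on it.
from pyinfra import host
from pyinfra.facts.deb import DebPackages
from pyinfra.operations import apt

installed = host.get_fact(DebPackages)  # dict keyed by package name

# Only schedule the install if the fact shows it's missing.
if "htop" not in installed:
    apt.packages(name="Install htop", packages=["htop"], _sudo=True)
```
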
## Practical workflow

1. Run a scanner against the live host that dumps the *meaningful*
   state — packages explicitly installed, enabled services, custom
   configs, dotfiles, sudoers fragments, fstab non-defaults. Skip the
   distro-default noise.
2. Hand-pick the bits worth managing into a new `<station>/deploy.py`.
3. Run with `--dry` against the live box; pyinfra's diff shows what's
   still drifting.
4. Iterate until `--dry` reports "no changes."

## One-shot scanner

```sh
ssh you@host '
echo "=== manually-installed packages ==="
apt-mark showmanual
echo "=== enabled services ==="
systemctl list-unit-files --state=enabled --type=service --no-legend | awk "{print \$1}"
echo "=== extra apt repos ==="
ls /etc/apt/sources.list.d/
echo "=== sudoers fragments ==="
ls /etc/sudoers.d/
echo "=== fstab non-default ==="
grep -vE "^#|^$|/proc|/sys|/dev/pts" /etc/fstab
echo "=== users with login shells ==="
getent passwd | awk -F: "\$3>=1000 && \$3<60000 {print \$1}"
echo "=== docker stacks ==="
ls /srv/docker/ 2>/dev/null
echo "=== home dotfiles ==="
sudo find /home -maxdepth 3 \( -name ".*rc" -o -name ".*config" \) 2>/dev/null
'
```

That gets you ~80% of what's worth managing in roughly 30 lines of
pyinfra. The remaining 20% is case-by-case (kernel cmdline tweaks,
hand-edited /etc files, dotfile contents) that no scanner fully
captures — you write those as you re-discover the box's quirks.
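
For a concrete picture, the hand-picked `<station>/deploy.py` from step 2
might start out like the sketch below. The package names, service, and file
paths are placeholders, not output from any real scan:

```python
# deploy.py (sketch): the hand-picked subset of what the scanner reported.
from pyinfra.operations import apt, files, systemd

# Packages the scan showed as explicitly installed and worth keeping.
apt.packages(
    name="Base packages",
    packages=["tmux", "vim", "htop"],  # placeholders from apt-mark showmanual
    _sudo=True,
)

# A service the scan showed enabled.
systemd.service(
    name="Enable ssh",
    service="ssh",
    running=True,
    enabled=True,
    _sudo=True,
)

# A sudoers fragment and a dotfile the scan surfaced (local asset paths
# are hypothetical; keep them next to the deploy and ship with files.put).
files.put(
    name="Sudoers fragment",
    src="files/10-ops-nopasswd",
    dest="/etc/sudoers.d/10-ops-nopasswd",
    mode="440",
    _sudo=True,
)
files.put(
    name="Shell rc",
    src="files/bashrc",
    dest="/home/ops/.bashrc",
    user="ops",
    group="ops",
    _sudo=True,
)
```

Run it with `--dry` (step 3) and keep folding in whatever still shows up as a diff.
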
## What scanners can't catch

- The *intent* behind a config (why this `sysctl` value, why this
  cron entry).
- Hand-edited fragments mixed into otherwise-default files (e.g. a
  custom line in `/etc/ssh/sshd_config`).
- Out-of-band state: data in databases, container volumes,
  `/var/lib/<service>`.

For these, treat pyinfra as managing the *bones* (packages, services,
users, configs) and leave the data alone. Re-deploying onto fresh
hardware should give you a working system; restoring data is a
separate concern.

## If "scan and reproduce" is a hard requirement

The native answer is **NixOS**. The whole OS is declarative;
`nixos-generate-config` reads your machine and writes a full config
that recreates it. Different paradigm, big switch — but the right
answer if reproducible-from-a-file is core to your workflow and you
aren't already deep in Ubuntu-land.

pyinfra/README.md (new file, 12 lines)

# pyinfra

One folder per station. Each subfolder is a self-contained pyinfra
deploy: `inventory.py`, `deploy.py`, `run.sh`, plus any compose files
or assets that ship to the host.

| Station | Host | Notes |
|---------|------|-------|
| [`framework/`](framework/README.md) | `10.0.0.237` | Framework Desktop (Strix Halo, 128 GB) — local LLM box |

To bring up a station, `cd` into its folder and run `./run.sh`. See the
station's own README for prerequisites.

pyinfra/framework/README.md (new file, 184 lines)

# pyinfra: Strix Halo bring-up

Containerized setup for the Framework Desktop (Ryzen AI Max+ 395, Radeon
8060S, 128 GB). The host stays minimal — kernel + driver + Docker +
diagnostics. Inference engines (llama.cpp, vLLM, Ollama) run as docker
compose services, each shipping its own ROCm/Vulkan stack.

## Manual prerequisites

1. **Phase 0** — update Framework BIOS, set GPU UMA carve-out (96 GB).
2. **OS install** — Ubuntu Server 24.04 LTS. AMD ROCm only ships for
   jammy/noble; later Ubuntus install but break the host-side toolchain
   (libxml2 ABI). Enable SSH, import your laptop key, create user `noise`.
   Recommended partitioning: ≥300 GB on `/`, big disk mounted at `/models`,
   plain ext4 (skip LVM).
3. The host must be reachable at `10.0.0.237` over SSH (edit `inventory.py`
   if it moves).
4. **NOPASSWD sudo for `noise`** — pyinfra's fact layer doesn't reliably
   thread sudo passwords. One-time setup:
   ```sh
   ssh noise@10.0.0.237 'echo "noise ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/noise-nopasswd && sudo chmod 440 /etc/sudoers.d/noise-nopasswd'
   ```

## Run

```sh
uv tool install pyinfra
./run.sh          # equivalent to: pyinfra inventory.py deploy.py
./run.sh --dry    # any extra args are forwarded to pyinfra
```

Or run it ephemerally without installing: `uvx pyinfra inventory.py deploy.py`.

## What the deploy does

- Base CLI: tmux, vim, htop, btop, nvtop, amdgpu_top, uv, lazydocker, huggingface-cli
- Tailscale (run `sudo tailscale up` on the box once, interactively)
- Docker engine + compose plugin, user added to `docker` group
- ROCm host diagnostics only (`rocminfo`) — no full toolchain
- GRUB kernel params for Strix Halo perf: `amd_iommu=off`,
  `amdgpu.gttsize=117760` (per [Gygeek/Framework-strix-halo-llm-setup](https://github.com/Gygeek/Framework-strix-halo-llm-setup)).
  Requires a reboot to activate — pyinfra rewrites `/etc/default/grub`
  and runs `update-grub`, but won't reboot for you.
- `/models/<vendor>/` layout
- `/srv/docker/{llama,vllm,ollama,openwebui,beszel,openlit,phoenix,openhands}/docker-compose.yml`
  dropped in, not auto-started — you edit the model path then
  `docker compose up -d`
  (OpenWebUI needs no edits — it's pre-configured to find Ollama at
  `host.docker.internal:11434` and uses `searxng.n0n.io` for web search)
  (Beszel hub at :8090, OpenLIT UI at :3001, Phoenix UI at :6006,
  OpenHands UI at :3030 — see "Monitoring stack" and "Agent harnesses"
  below)

If a previous run installed the native llama.cpp build / full ROCm /
native Ollama, those are auto-cleaned the next time `./run.sh` runs.

## After the deploy: starting an inference service

```sh
ssh noise@10.0.0.237
sudo tailscale up          # one-time, interactive

# Drop a GGUF somewhere under /models, then:
cd /srv/docker/llama
vim docker-compose.yml     # edit the --model path
docker compose up -d
curl localhost:8080/v1/models   # smoke test
```

Same shape for `vllm` (port 8000) and `ollama` (port 11434, no model edit
needed — Ollama serves models on demand).
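
Any OpenAI-compatible client can then talk to whichever engine is up. A quick
Python smoke test against the llama.cpp endpoint might look like the sketch
below (assumes `pip install openai` on your machine; the vLLM and Ollama
endpoints work the same way on :8000 and :11434/v1):

```python
# smoke_test.py: illustrative; point base_url at whichever engine is running.
from openai import OpenAI

client = OpenAI(base_url="http://10.0.0.237:8080/v1", api_key="none")  # no auth needed

models = client.models.list()        # same data as the curl smoke test above
model_id = models.data[0].id         # whatever model the server has loaded

resp = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(resp.choices[0].message.content)
```
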
## Tunables

Top of `deploy.py`:
- `ROCM_VERSION` and `AMDGPU_INSTALL_DEB` — bump when AMD ships a newer
  release. The .deb filename has a build suffix that doesn't derive from
  the version; find it at https://repo.radeon.com/amdgpu-install/.
- `AMDGPU_TOP_VERSION` — bump when a newer release lands at
  https://github.com/Umio-Yasuno/amdgpu_top/releases.

Compose images in
`compose/{llama,vllm,ollama,openwebui,beszel,openlit,phoenix,openhands}.yml`
— pin tags here.

## Monitoring stack

Two compose stacks watch the inference services:

- **Beszel** (`/srv/docker/beszel`, http://framework:8090) — host + Docker
  container + AMD GPU dashboard. The agent's `amd_sysfs` collector reads
  `/sys/class/drm/cardN/device/` directly; this is the only path that
  reports real numbers on Strix Halo (gfx1151) — `amd-smi` returns N/A
  for util/power/temp on this APU.

  First-time bring-up:
  ```sh
  cd /srv/docker/beszel
  docker compose up -d beszel          # 1. hub
  # 2. open http://framework:8090 → create admin → "Add system"
  # 3. copy the TOKEN and the SSH KEY from the dialog into .env:
  sudo tee /srv/docker/beszel/.env >/dev/null <<EOF
  BESZEL_TOKEN=<token-from-dialog>
  BESZEL_KEY=ssh-ed25519 AAAA…
  EOF
  sudo chgrp docker /srv/docker/beszel/.env
  sudo chmod 640 /srv/docker/beszel/.env
  docker compose up -d --force-recreate beszel-agent   # 4. agent
  ```

- **OpenLIT** (`/srv/docker/openlit`, http://framework:3001) — fleet
  metrics for the LLM stack (cost, tokens, latency aggregated across
  sessions and models). Auto-instruments Ollama and vLLM via
  OpenTelemetry without app-code changes; for llama.cpp, the compose
  file already passes `--metrics` so OpenLIT can scrape `/metrics`.
  ClickHouse-backed. OTLP receivers exposed on the host at **:4327
  (gRPC) / :4328 (HTTP)** — Phoenix owns 4317/4318.

  Bring-up: `cd /srv/docker/openlit && docker compose up -d`. UI prompts
  to create an admin account on first load. (An example of pointing an
  off-box script at the remapped OTLP ports follows this list.)

- **Phoenix** (`/srv/docker/phoenix`, http://framework:6006) — per-trace
  agent waterfall / flamegraph. Designed to answer "show me what one
  OpenCode turn actually did" — full call tree, nested LLM/tool calls,
  token counts inline. Single container, SQLite-backed. OTLP receivers
  on the host at **:4317 (gRPC)** and **:6006/v1/traces (HTTP)** —
  Phoenix 15.x serves HTTP OTLP on the UI port, not a separate 4318.

  Bring-up: `cd /srv/docker/phoenix && docker compose up -d`. UI
  immediately available; no auth in the self-hosted single-user path.

  See [`localgenai/opencode/README.md`](../../opencode/README.md) for
  how OpenCode is configured to ship traces here.
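
For OpenLIT specifically (the fleet-metrics stack above), an off-box Python
script can ship traces to the remapped OTLP/HTTP port. A sketch, assuming
`pip install openlit` on the sending machine:

```python
# trace_to_openlit.py: illustrative; 4328 is OpenLIT's remapped OTLP/HTTP
# host port (the canonical 4317/4318 are reserved for Phoenix).
import openlit

openlit.init(otlp_endpoint="http://framework:4328")

# Any supported LLM client call made in this process after init() (for
# example an OpenAI-compatible request against Ollama on :11434/v1) is
# auto-instrumented and shows up at http://framework:3001.
```
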
The official AMD Prometheus exporters (`amd-smi-exporter`,
`device-metrics-exporter`) are intentionally **not** deployed — they're
broken on gfx1151 (ROCm#6035). If you ever want full Prometheus +
Grafana, the migration path is to replace Beszel with `node_exporter` +
a textfile collector reading `/sys/class/drm/cardN/device/` and
`/sys/class/hwmon/`.
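
If that migration ever happens, the textfile collector itself is small. A
sketch, assuming the standard amdgpu sysfs attribute names
(`gpu_busy_percent`, `mem_info_vram_used`, ...) are present for this card and
that node_exporter is pointed at the output directory used below:

```python
# amdgpu_textfile_collector.py: illustrative; run from cron or a systemd
# timer, then let node_exporter's textfile collector pick up the .prom file.
from pathlib import Path

CARD = Path("/sys/class/drm/card0/device")                  # adjust cardN for this box
OUT = Path("/var/lib/node_exporter/textfile/amdgpu.prom")   # hypothetical collector dir

def read_int(name: str) -> int:
    return int((CARD / name).read_text().strip())

lines = [
    f"amdgpu_gpu_busy_percent {read_int('gpu_busy_percent')}",
    f"amdgpu_vram_used_bytes {read_int('mem_info_vram_used')}",
    f"amdgpu_vram_total_bytes {read_int('mem_info_vram_total')}",
    f"amdgpu_gtt_used_bytes {read_int('mem_info_gtt_used')}",
]
OUT.write_text("\n".join(lines) + "\n")
```
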
## Agent harnesses

Two agent UIs in front of the local model stack — pick one (or both)
based on how you like to drive the agent:

- **OpenWebUI** (`/srv/docker/openwebui`, http://framework:3000) —
  ChatGPT-style web UI. Pre-wired to Ollama + SearXNG web search.
  Best for casual chat and household-shared LLM access.

  Bring-up: `cd /srv/docker/openwebui && docker compose up -d`.

- **OpenHands** (`/srv/docker/openhands`, http://framework:3030,
  loopback-only) — autonomous agent in a Docker sandbox. Spawns a
  per-conversation `agent-server` container that can write code, run
  tests, browse the web. Pre-configured for Ollama at
  `openai/qwen3-coder:30b` over the OpenAI-compatible endpoint;
  ships traces to Phoenix.

  Bring-up:
  ```sh
  cd /srv/docker/openhands && docker compose up -d
  # Tunnel the loopback-bound UI from your laptop:
  ssh -L 3030:127.0.0.1:3030 noise@framework
  open http://localhost:3030
  ```

  The agent-server image (~2 GB) is pulled lazily on the first
  conversation, not at startup, so the orchestrator comes up fast but
  your first message takes 30–60 s. Pre-0.44 state path was
  `~/.openhands-state`; not relevant on a fresh install.

  Tool-call quality with local models is much better when Ollama's
  context is bumped — the compose at `/srv/docker/ollama` already sets
  `OLLAMA_CONTEXT_LENGTH=65536`, which is plenty.

OpenCode (the terminal driver, configured in
[`localgenai/opencode/`](../../opencode/)) runs on the Mac and points
at the same Ollama endpoint, so all three harnesses share the same
model server.

The llama.cpp image is `kyuz0/amd-strix-halo-toolboxes:vulkan-radv`,
gfx1151-optimized with rocWMMA flash attention (the latter only kicks
in on the ROCm tags). Auto-rebuilt against llama.cpp master.

pyinfra/framework/compose/beszel.yml (new file, 66 lines)

# Beszel — host + container + GPU dashboard.
# https://beszel.dev
#
# Picked over Prometheus+Grafana for this box because:
# - The agent's `amd_sysfs` collector reads /sys/class/drm/card*/device/
#   directly, which is the only reliable GPU metric source on Strix Halo
#   (gfx1151). AMD's amd-smi / Device Metrics Exporter return N/A for
#   util/power/temp on this APU (ROCm#6035), so the official Prometheus
#   exporter path is dead.
# - Two containers vs six.
#
# First-time setup (WebSocket connection model — current Beszel default):
#   1. `docker compose up -d beszel` (start the hub)
#   2. Open http://framework:8090, create the admin account
#   3. Click "Add system" — the dialog gives you a TOKEN and an SSH KEY.
#   4. Edit /srv/docker/beszel/.env (created empty by pyinfra; pyinfra
#      doesn't overwrite). Add:
#        BESZEL_TOKEN=<token-from-dialog>
#        BESZEL_KEY=ssh-ed25519 AAAA…
#   5. `docker compose up -d --force-recreate beszel-agent`
#
# Docker Compose auto-reads the sibling .env file for ${VAR} interpolation
# in the environment block below — so secrets stay out of the compose
# file (which pyinfra overwrites) but the env-var names match exactly
# what the agent expects.
#
# Why both TOKEN and KEY: TOKEN identifies which system this agent is,
# KEY authenticates the agent (the SSH key is reused as the auth secret
# in the WebSocket handshake). Rotate either by editing the .env and
# `docker compose up -d --force-recreate`.
services:
  beszel:
    image: henrygd/beszel:latest
    container_name: beszel
    restart: unless-stopped
    ports:
      - "8090:8090"
    volumes:
      - /srv/docker/beszel/data:/beszel_data

  beszel-agent:
    image: henrygd/beszel-agent:latest
    container_name: beszel-agent
    restart: unless-stopped
    # Host networking so the agent sees real CPU/memory/network counters
    # without bridge-NAT distortion.
    network_mode: host
    volumes:
      # Read-only Docker socket for per-container CPU/mem/net.
      - /var/run/docker.sock:/var/run/docker.sock:ro
      # Sysfs paths the AMD GPU collector reads.
      - /sys/class/drm:/sys/class/drm:ro
      - /sys/class/hwmon:/sys/class/hwmon:ro
    environment:
      # Pulled from /srv/docker/beszel/.env at compose-parse time.
      TOKEN: "${BESZEL_TOKEN:-}"
      KEY: "${BESZEL_KEY:-}"
      # WebSocket dial-out target — the hub on this same host. The agent
      # is on host networking, so localhost is the host machine, where
      # the hub container exposes port 8090.
      HUB_URL: "http://localhost:8090"
      # Optional fallback: legacy SSH listener for hub-initiated probing.
      # Harmless to keep — hub only uses it if WebSocket is unreachable.
      LISTEN: "45876"
      # Enable the AMD sysfs GPU collector.
      GPU: "true"

pyinfra/framework/compose/llama.yml (new file, 44 lines)

# llama.cpp server, gfx1151-optimized via kyuz0's Strix Halo toolboxes.
# https://github.com/kyuz0/amd-strix-halo-toolboxes
#
# Tag options on docker.io/kyuz0/amd-strix-halo-toolboxes:
#   vulkan-radv     — most stable, recommended default (this one)
#   vulkan-amdvlk   — alternate Vulkan driver, sometimes faster
#   rocm-7.2.2      — ROCm 7.x; needs /dev/kfd + render group_add (see vllm.yml pattern)
#   rocm-6.4.4      — ROCm 6.x fallback
#   rocm7-nightlies — avoid: caps memory allocation to 64 GB (May 2026)
#
# Toolbox images use a shell entrypoint, so we override to launch
# llama-server directly. Edit the --model path before `docker compose up -d`.
services:
  llama:
    image: kyuz0/amd-strix-halo-toolboxes:vulkan-radv
    container_name: llama
    restart: unless-stopped
    devices:
      - /dev/dri:/dev/dri
    volumes:
      - /models:/models:ro
    ports:
      - "8080:8080"
    entrypoint: ["llama-server"]
    command:
      - --model
      - /models/REPLACE/ME/model.gguf
      - --host
      - 0.0.0.0
      - --port
      - "8080"
      - --n-gpu-layers
      - "999"
      - --ctx-size
      - "32768"
      # Required for GPU backends on Strix Halo per Gygeek's setup
      # guide. Forces full load into GPU memory rather than mmap.
      - --no-mmap
      # Flash attention — works on Vulkan too; the big win is on the
      # ROCm tag where kyuz0's build has rocWMMA acceleration.
      - --flash-attn
      # Expose Prometheus metrics at /metrics — scraped by OpenLIT for
      # tokens/sec, KV-cache use, queue depth, and request latency.
      - --metrics

pyinfra/framework/compose/ollama.yml (new file, 38 lines)

# Ollama, ROCm backend. Serves models on demand — safe to start before
# you've put anything in /models.
#
# Storage: Ollama's content-addressed blob store is bind-mounted under
# /models/ollama so all model data on the host lives under /models.
# Note: Ollama's blobs are SHA256-named, not raw GGUFs — llama.cpp/vLLM
# can't load them directly. Keep curated GGUFs at /models/<vendor>/...
# for those engines.
services:
  ollama:
    image: ollama/ollama:rocm
    container_name: ollama
    restart: unless-stopped
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    # Numeric GIDs of host's video (44) and render (991) groups — names
    # don't exist inside the container, but the GIDs need to match the
    # host so /dev/kfd + /dev/dri are accessible.
    group_add:
      - "44"
      - "991"
    environment:
      # Strix Halo's iGPU is gfx1151 (RDNA 3.5), which Ollama's bundled
      # ROCm runtime doesn't recognize — without this override it falls
      # back to CPU silently. 11.0.0 = gfx1100 (Navi 31); the RDNA 3.x
      # ISAs are close enough that gfx1100 kernels run on gfx1151.
      - HSA_OVERRIDE_GFX_VERSION=11.0.0
      # Default context. 256K (the upstream default for Qwen3-Coder)
      # blows the KV cache up to ~25-30 GB and forces ollama to split
      # layers between GPU and CPU. 64K keeps the model fully on GPU
      # while still being plenty for coding contexts.
      - OLLAMA_CONTEXT_LENGTH=65536
    volumes:
      - /models/ollama:/root/.ollama
      - /models:/models:ro
    ports:
      - "11434:11434"

pyinfra/framework/compose/openhands.yml (new file, 94 lines)

# OpenHands 1.7 (May 2026) — autonomous agent in a Docker sandbox.
# https://docs.openhands.dev — repo: github.com/OpenHands/OpenHands
#
# Architecture: this container is a thin orchestrator. Per conversation
# it spawns a separate `agent-server` container on the host Docker daemon
# (that's what the docker.sock mount is for) and talks to it over REST.
# AGENT_SERVER_IMAGE_TAG below pins the per-session sandbox image.
#
# Complements OpenCode: OpenCode is the interactive terminal driver,
# OpenHands is for autonomous loops (write code, run tests, browse the
# web in a sandbox, report back).
services:
  openhands:
    # Org rebranded All-Hands-AI → OpenHands at v1.0 (Dec 2025); the old
    # docker.all-hands.dev/all-hands-ai/openhands image is gone.
    image: docker.openhands.dev/openhands/openhands:1.7
    container_name: openhands
    restart: unless-stopped

    # 3030 host-side because :3000 is OpenWebUI and :3001 is OpenLIT.
    # Loopback-only — reach via SSH tunnel or Tailscale, don't expose
    # this directly.
    ports:
      - "127.0.0.1:3030:3000"

    volumes:
      # Required: orchestrator spawns sandbox containers via the host daemon.
      - /var/run/docker.sock:/var/run/docker.sock
      # State, settings, conversation history, MCP config, secrets.
      # Pre-0.44 used ~/.openhands-state — N/A on a fresh install.
      - /srv/docker/openhands/state:/.openhands
      # Workspace the sandbox reads/writes. The host path on the LEFT must
      # match SANDBOX_VOLUMES below — the sandbox container is spawned by
      # the host daemon, so its bind mount is resolved on the host, not
      # via this container's filesystem.
      - /srv/docker/openhands/workspace:/srv/docker/openhands/workspace

    # Linux Docker doesn't auto-provide host.docker.internal; this fixes it.
    extra_hosts:
      - "host.docker.internal:host-gateway"

    environment:
      # ---- Sandbox / agent-server image pin ----
      # Replaces the V0.x SANDBOX_RUNTIME_CONTAINER_IMAGE. 1.19.1-python is
      # the agent-server tag the 1.7 main image expects; bumping the main
      # image will likely want a newer agent-server tag — check the
      # upstream docker-compose.yml on each upgrade.
      AGENT_SERVER_IMAGE_REPOSITORY: ghcr.io/openhands/agent-server
      AGENT_SERVER_IMAGE_TAG: 1.19.1-python

      # ---- Workspace mount into the per-session sandbox ----
      # SANDBOX_VOLUMES is the V1 replacement for the deprecated
      # WORKSPACE_BASE / WORKSPACE_MOUNT_PATH variables.
      SANDBOX_VOLUMES: /srv/docker/openhands/workspace:/workspace:rw
      # Match the host's `noise` UID so files the agent writes aren't
      # owned by root.
      SANDBOX_USER_ID: "1000"

      # ---- LLM: host Ollama via OpenAI-compatible endpoint ----
      # Per the official local-llms doc, the recommended path is the
      # /v1 OpenAI-compatible endpoint with the `openai/` LiteLLM prefix
      # — NOT `ollama/...`, which has worse tool-call behaviour.
      LLM_MODEL: "openai/qwen3-coder:30b"
      LLM_BASE_URL: "http://host.docker.internal:11434/v1"
      LLM_API_KEY: "ollama"   # any non-empty string; Ollama doesn't auth.

      # Default tool-calling renderer mismatches Qwen3-Coder's training
      # format and produces malformed calls (issue #8140). Forcing false
      # falls back to OpenHands' prompt-based protocol — costs some token
      # efficiency, gains reliability with local models.
      LLM_NATIVE_TOOL_CALLING: "false"

      LOG_ALL_EVENTS: "true"

      # ---- Optional: ship traces to Phoenix (OTLP/HTTP) ----
      # OpenHands V1 uses LiteLLM + OpenTelemetry; standard OTLP env vars
      # are honoured. Phoenix 15.x serves OTLP/HTTP on the UI port (6006),
      # not a separate 4318; see phoenix.yml. Comment out to disable.
      OTEL_EXPORTER_OTLP_ENDPOINT: "http://host.docker.internal:6006"
      OTEL_EXPORTER_OTLP_PROTOCOL: "http/protobuf"
      OTEL_SERVICE_NAME: "openhands"

    # Per-session agent-server containers spawn headless chromium for
    # browser tasks; default 64 MB shm causes silent crashes.
    shm_size: "2gb"

    # Playwright/chromium needs higher fd limits than Docker's default.
    ulimits:
      nofile:
        soft: 65536
        hard: 65536

    # Bridge networking is correct here. Don't switch to network_mode: host
    # — the spawned sandbox containers reach this orchestrator via Docker
    # bridge DNS, which only works on a bridge network.

pyinfra/framework/compose/openlit.yml (new file, 59 lines)

# OpenLIT — LLM observability (traces, costs, KV-cache, prompt/decode
# latencies, tokens/sec). https://openlit.io
#
# Two services:
#   - clickhouse : columnar store for traces (internal only, no host port)
#   - openlit    : Next.js UI on :3001 (3000 is OpenWebUI)
#
# Why OpenLIT vs Langfuse/Phoenix/Laminar: it's the only OSS dashboard
# (May 2026) that auto-instruments Ollama AND vLLM via OpenTelemetry
# without adding code to client apps. For llama.cpp, start the server
# with --metrics (see ../llama/docker-compose.yml) and OpenLIT can scrape
# /metrics.
#
# To send traces from a Python script calling Ollama/vLLM:
#   pip install openlit
#   python -c "import openlit; openlit.init(otlp_endpoint='http://framework:4328')"
#   (4328 is the remapped OTLP/HTTP host port published below.)
#
# To wire OpenWebUI → OpenLIT, install OpenLIT's pipeline middleware
# in OpenWebUI per https://openlit.io/blogs/openlit-openwebui.
services:
  clickhouse:
    image: clickhouse/clickhouse-server:25.3-alpine
    container_name: openlit-clickhouse
    restart: unless-stopped
    environment:
      CLICKHOUSE_USER: default
      CLICKHOUSE_PASSWORD: OPENLIT
      CLICKHOUSE_DB: openlit
    volumes:
      - /srv/docker/openlit/clickhouse:/var/lib/clickhouse
    ulimits:
      nofile:
        soft: 262144
        hard: 262144

  openlit:
    image: ghcr.io/openlit/openlit:latest
    container_name: openlit
    restart: unless-stopped
    depends_on:
      - clickhouse
    ports:
      # Host:container — UI on 3001 (OpenWebUI owns 3000).
      - "3001:3000"
      # OTLP receivers exposed on the host so SDKs running off-box can
      # ship traces here. gRPC + HTTP. Remapped (4327/4328 → 4317/4318)
      # because Phoenix owns the canonical 4317/4318 ports for OpenCode
      # traces — OpenLIT here is a secondary/fleet-metrics destination.
      - "4327:4317"
      - "4328:4318"
    environment:
      INIT_DB_HOST: clickhouse
      INIT_DB_PORT: "8123"
      INIT_DB_USERNAME: default
      INIT_DB_PASSWORD: OPENLIT
      INIT_DB_DATABASE: openlit
      SQLITE_DATABASE_URL: file:/app/client/data/data.db
    volumes:
      - /srv/docker/openlit/data:/app/client/data

pyinfra/framework/compose/openwebui.yml (new file, 25 lines)

# OpenWebUI — ChatGPT-like web UI in front of Ollama. Pre-configured to
# use the host's Ollama instance and the project's SearXNG for web
# search. Default port 3000.
#
# Persistent state (users, conversations, uploaded docs, RAG vector
# index) lives at /srv/docker/openwebui/data so backups touch one path.
services:
  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: openwebui
    restart: unless-stopped
    ports:
      - "3000:8080"
    extra_hosts:
      # Lets the container reach Ollama on the host's :11434 without
      # needing to share Docker networks.
      - "host.docker.internal:host-gateway"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      # Built-in web search via the project's SearXNG instance.
      - ENABLE_RAG_WEB_SEARCH=true
      - RAG_WEB_SEARCH_ENGINE=searxng
      - SEARXNG_QUERY_URL=https://searxng.n0n.io/search?q=<query>&format=json
    volumes:
      - /srv/docker/openwebui/data:/app/backend/data

pyinfra/framework/compose/phoenix.yml (new file, 35 lines)

# Arize Phoenix — per-trace agent waterfall / flamegraph viz.
# https://github.com/Arize-ai/phoenix
#
# Picked over Langfuse for "show me one OpenCode turn as a tree":
#   - Single container vs Langfuse's six (Postgres+ClickHouse+Redis+MinIO+web+worker).
#   - First-class ingestion of Vercel AI SDK spans (which is what OpenCode
#     emits under the hood when experimental.openTelemetry=true).
#   - Best-in-class waterfall + agent-graph view for nested LLM/tool calls.
#
# Complements OpenLIT, doesn't replace it: OpenLIT is the fleet-metrics
# layer (cost / tokens / latency aggregated across sessions). Phoenix is
# the per-prompt debugger (see what one turn actually did).
#
# Bring-up: `docker compose up -d` — no first-run setup needed; UI prompts
# for project name on first trace ingest. Storage is SQLite at /data.
services:
  phoenix:
    image: arizephoenix/phoenix:latest
    container_name: phoenix
    restart: unless-stopped
    ports:
      # UI + OTLP/HTTP both ride on 6006 in Phoenix 15.x — HTTP traces go
      # to http://framework:6006/v1/traces. (Pre-15 had a separate 4318;
      # the consolidation happened in Phoenix v15.0.)
      - "6006:6006"
      # OTLP/gRPC stays separate.
      - "4317:4317"
    environment:
      PHOENIX_WORKING_DIR: /data
      # Phoenix listens on all interfaces by default; explicit for clarity.
      PHOENIX_HOST: 0.0.0.0
      PHOENIX_PORT: "6006"
      PHOENIX_GRPC_PORT: "4317"
    volumes:
      - /srv/docker/phoenix/data:/data

pyinfra/framework/compose/vllm.yml (new file, 36 lines)

# vLLM, ROCm backend.
#
# NOTE: vLLM's official ROCm support targets datacenter cards (MI300X /
# gfx942). Strix Halo is gfx1151 — support varies by image tag and
# release. If `rocm/vllm:latest` doesn't run on this iGPU, try
# `rocm/vllm-dev:nightly` or build from source against ROCm 7.x.
services:
  vllm:
    image: rocm/vllm:latest
    container_name: vllm
    restart: unless-stopped
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    cap_add:
      - SYS_PTRACE
    security_opt:
      - seccomp=unconfined
    # Numeric GIDs of host's video (44) and render (991) groups — names
    # don't exist inside the container.
    group_add:
      - "44"
      - "991"
    shm_size: 16g
    ipc: host
    volumes:
      - /models:/models:ro
    ports:
      - "8000:8000"
    command:
      - --model
      - /models/REPLACE/ME
      - --host
      - 0.0.0.0
      - --port
      - "8000"

pyinfra/framework/deploy.py (new file, 487 lines)

"""
pyinfra deploy for Framework Desktop (Ryzen AI Max+ 395 / Strix Halo).

Containerized layout: inference engines (llama.cpp, vLLM, Ollama) run as
docker compose services, each shipping its own ROCm/Vulkan stack. The
host stays minimal — kernel + driver + diagnostics + Docker.

Run with:
    ./run.sh        # equivalent to pyinfra inventory.py deploy.py

Idempotent — re-running only does work that's actually needed. Includes
one-shot cleanup steps for artifacts left over from the earlier native
build (llama.cpp git checkout, symlinks, systemd unit, full ROCm).
"""

from io import StringIO

from pyinfra.operations import apt, files, server, systemd
from pyinfra import host

# --- Tunables ----------------------------------------------------------------

# Latest stable as of 2026-04-30. Verify at https://repo.radeon.com/amdgpu-install/.
ROCM_VERSION = "7.2.3"
# The .deb filename has an opaque build suffix — find the exact name at
# the URL above and paste it here.
AMDGPU_INSTALL_DEB = "amdgpu-install_7.2.3.70203-1_all.deb"

# amdgpu_top — modern GPU monitor that handles new AMD cards / APUs
# (Strix Halo / gfx1151). Replaces radeontop, which doesn't know about
# RDNA 3.5. Verify at https://github.com/Umio-Yasuno/amdgpu_top/releases.
AMDGPU_TOP_VERSION = "0.11.4-1"
AMDGPU_TOP_DEB = f"amdgpu-top_without_gui_{AMDGPU_TOP_VERSION}_amd64.deb"

SSH_USER = host.data.get("ssh_user", "noise")
MODELS_DIR = "/models"
# /srv is the FHS-blessed location for "data and configuration for
# services this system provides." Owned root:docker with setgid +
# group-write so any docker-group member can manage compose stacks.
COMPOSE_DIR = "/srv/docker"

# --- Phase 1 — Base OS basics ------------------------------------------------

apt.update(name="apt update", _sudo=True)

apt.packages(
    name="HWE kernel (>=6.11 for ROCm 7)",
    packages=["linux-generic-hwe-24.04"],
    _sudo=True,
)

# User basics + monitoring tools.
apt.packages(
    name="Base CLI tools",
    # radeontop intentionally omitted — it predates RDNA 3.5 / Strix Halo
    # and just errors with "no VRAM support". amdgpu_top installed below.
    packages=[
        "tmux",
        "vim",
        "htop",
        "btop",
        "nvtop",
        "git",
        "curl",
        "ca-certificates",
        "unzip",
    ],
    _sudo=True,
)

# Vulkan diagnostics on the host (containers ship their own runtime).
apt.packages(
    name="Vulkan host diagnostics",
    packages=["mesa-vulkan-drivers", "vulkan-tools"],
    _sudo=True,
)

# uv (Python package/tool manager).
server.shell(
    name="Install / upgrade uv",
    commands=[
        "curl -LsSf https://astral.sh/uv/install.sh | "
        "env UV_INSTALL_DIR=/usr/local/bin UV_UNMANAGED_INSTALL=1 sh",
    ],
    _sudo=True,
)

# huggingface_hub CLI for `huggingface-cli download <repo> --local-dir ...`
# Lands at ~/.local/bin/huggingface-cli for the SSH user. Other users can
# repeat the command themselves.
server.shell(
    name="Install / upgrade huggingface_hub CLI",
    commands=["uv tool install --upgrade 'huggingface_hub[cli]'"],
    _sudo=True,
    _sudo_user=SSH_USER,
)

# gpakosz/.tmux config (Oh My Tmux!).
TMUX_CONF_DIR = f"/home/{SSH_USER}/.tmux"
server.shell(
    name="Clone gpakosz/.tmux",
    commands=[
        f"test -d {TMUX_CONF_DIR}/.git || "
        f"git clone --depth 1 https://github.com/gpakosz/.tmux.git {TMUX_CONF_DIR}",
    ],
    _sudo=True,
    _sudo_user=SSH_USER,
)
files.link(
    name="Symlink ~/.tmux.conf -> ~/.tmux/.tmux.conf",
    path=f"/home/{SSH_USER}/.tmux.conf",
    target=f"{TMUX_CONF_DIR}/.tmux.conf",
    user=SSH_USER,
    group=SSH_USER,
    _sudo=True,
)
# Seed the user override file only if absent — preserves customizations.
server.shell(
    name="Seed ~/.tmux.conf.local (if missing)",
    commands=[
        f"test -f /home/{SSH_USER}/.tmux.conf.local || "
        f"cp {TMUX_CONF_DIR}/.tmux.conf.local /home/{SSH_USER}/",
    ],
    _sudo=True,
    _sudo_user=SSH_USER,
)

# --- Tailscale ---------------------------------------------------------------

# Tailscale's noble repo works fine on later Ubuntus — the package is
# self-contained Go binaries.
files.download(
    name="Fetch Tailscale apt key",
    src="https://pkgs.tailscale.com/stable/ubuntu/noble.noarmor.gpg",
    dest="/usr/share/keyrings/tailscale-archive-keyring.gpg",
    mode="644",
    _sudo=True,
)
files.download(
    name="Fetch Tailscale apt list",
    src="https://pkgs.tailscale.com/stable/ubuntu/noble.tailscale-keyring.list",
    dest="/etc/apt/sources.list.d/tailscale.list",
    mode="644",
    _sudo=True,
)
apt.update(name="apt update (Tailscale)", _sudo=True)
apt.packages(name="Install Tailscale", packages=["tailscale"], _sudo=True)
systemd.service(
    name="Enable tailscaled",
    service="tailscaled",
    running=True,
    enabled=True,
    _sudo=True,
)
# `sudo tailscale up` is interactive (browser auth) — run manually once.

# --- Docker -----------------------------------------------------------------

apt.packages(
    name="Install Docker + compose plugin",
    packages=["docker.io", "docker-compose-v2"],
    _sudo=True,
)
systemd.service(
    name="Enable docker daemon",
    service="docker",
    running=True,
    enabled=True,
    _sudo=True,
)

# lazydocker (TUI for docker). System-wide install via official script.
server.shell(
    name="Install / upgrade lazydocker",
    commands=[
        "curl -fsSL "
        "https://raw.githubusercontent.com/jesseduffield/lazydocker/master/scripts/install_update_linux.sh "
        "| DIR=/usr/local/bin bash",
    ],
    _sudo=True,
)

# --- GPU access (host kernel/driver bits only) ------------------------------

# AMD's amdgpu-install package adds the ROCm apt repo; we use it just to
# get rocminfo for host-side diagnostics. Containers ship
# their own ROCm.
files.directory(
    name="apt keyring dir",
    path="/etc/apt/keyrings",
    mode="755",
    _sudo=True,
)

amdgpu_deb = f"/tmp/{AMDGPU_INSTALL_DEB}"
amdgpu_url = (
    f"https://repo.radeon.com/amdgpu-install/{ROCM_VERSION}/ubuntu/noble/"
    f"{AMDGPU_INSTALL_DEB}"
)
server.shell(
    name="Fetch amdgpu-install .deb",
    commands=[f"test -f {amdgpu_deb} || curl -fsSL {amdgpu_url} -o {amdgpu_deb}"],
    _sudo=True,
)
server.shell(
    name="Install amdgpu-install package",
    commands=[f"apt install -y {amdgpu_deb}"],
    _sudo=True,
)

# Idempotent cleanup: if a prior run installed the full ROCm userspace
# (~25 GB), tear it down before installing the diagnostic-only subset.
# On a fresh box this is a no-op.
server.shell(
    name="Remove full ROCm install if present",
    commands=[
        "if dpkg -l rocm-dev 2>/dev/null | grep -q '^ii'; then "
        "  amdgpu-install -y --uninstall || true; "
        "fi",
    ],
    _sudo=True,
)
apt.packages(
    name="ROCm host diagnostics (rocminfo)",
    # rocminfo is the stable diagnostic. The SMI tool's package name has
    # churned across ROCm releases (rocm-smi-lib → amd-smi-lib in 7.x);
    # install on demand if you need it.
    packages=["rocminfo"],
    _sudo=True,
)

# amdgpu_top — fetch + install the .deb if not already at the pinned
# version. Replaces radeontop for newer AMD cards (Strix Halo / gfx1151).
amdgpu_top_deb = f"/tmp/{AMDGPU_TOP_DEB}"
amdgpu_top_url = (
    f"https://github.com/Umio-Yasuno/amdgpu_top/releases/download/"
    f"v{AMDGPU_TOP_VERSION.split('-')[0]}/{AMDGPU_TOP_DEB}"
)
server.shell(
    name="Install / upgrade amdgpu_top",
    commands=[
        f"dpkg -s amdgpu-top 2>/dev/null | grep -q '^Version: {AMDGPU_TOP_VERSION}' || "
        f"(test -f {amdgpu_top_deb} || curl -fsSL {amdgpu_top_url} -o {amdgpu_top_deb}; "
        f"apt install -y {amdgpu_top_deb})",
    ],
    _sudo=True,
)

# Group membership for /dev/kfd + /dev/dri access (needed for GPU passthrough
# into containers, and for unprivileged host-side rocminfo).
server.group(name="ensure render group", group="render", _sudo=True)
server.group(name="ensure video group", group="video", _sudo=True)
server.user(
    name="Add login user to render/video/docker",
    user=SSH_USER,
    groups=["render", "video", "docker"],
    append=True,
    _sudo=True,
)

# Kernel cmdline tuning per Gygeek/Framework-strix-halo-llm-setup:
#   - amd_iommu=off — ~6 % memory-read improvement on Strix Halo
#   - amdgpu.gttsize=117760 — ~115 GB GTT ceiling so the GPU can borrow
#     most of system RAM dynamically. Acts as a
#     ceiling, not an allocation. See ../../StrixHaloMemory.md
#     for the UMA-vs-GTT trade-off discussion.
# Requires a reboot to take effect; pyinfra leaves that to you.
files.line(
    name="GRUB cmdline (amd_iommu, gttsize)",
    path="/etc/default/grub",
    line=r"^GRUB_CMDLINE_LINUX_DEFAULT=.*",
    replace='GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off amdgpu.gttsize=117760"',
    _sudo=True,
)
server.shell(
    name="Regenerate GRUB config",
    commands=["update-grub"],
    _sudo=True,
)

# --- Storage layout ---------------------------------------------------------

files.directory(
    name="/models root",
    path=MODELS_DIR,
    user=SSH_USER,
    group=SSH_USER,
    mode="755",
    _sudo=True,
)
for sub in ("moonshotai", "qwen", "deepseek", "zai", "mistralai"):
    files.directory(
        name=f"{MODELS_DIR}/{sub}",
        path=f"{MODELS_DIR}/{sub}",
        user=SSH_USER,
        group=SSH_USER,
        mode="755",
        _sudo=True,
    )
# Ollama bind-mounts its content-addressed store here.
files.directory(
    name=f"{MODELS_DIR}/ollama",
    path=f"{MODELS_DIR}/ollama",
    user=SSH_USER,
    group=SSH_USER,
    mode="755",
    _sudo=True,
)

# --- Compose files for inference services ----------------------------------

files.directory(
    name="Compose root",
    path=COMPOSE_DIR,
    group="docker",
    mode="2775",  # setgid: new files inherit docker group
    _sudo=True,
)
# Idempotent cleanup of earlier locations.
files.directory(
    name="Remove old /srv/compose",
    path="/srv/compose",
    present=False,
    _sudo=True,
)
files.directory(
    name="Remove old ~/docker compose dir",
    path=f"/home/{SSH_USER}/docker",
    present=False,
    _sudo=True,
)
for svc in (
    "llama",
    "vllm",
    "ollama",
    "openwebui",
    "beszel",
    "openlit",
    "phoenix",
    "openhands",
):
    files.directory(
        name=f"compose/{svc} dir",
        path=f"{COMPOSE_DIR}/{svc}",
        group="docker",
        mode="2775",
        _sudo=True,
    )
    files.put(
        name=f"compose/{svc}/docker-compose.yml",
        src=f"compose/{svc}.yml",
        dest=f"{COMPOSE_DIR}/{svc}/docker-compose.yml",
        group="docker",
        mode="664",
        _sudo=True,
    )

# OpenWebUI persistent state (users, conversations, uploaded docs,
# RAG vector index) — bind-mounted into the container.
files.directory(
    name="OpenWebUI data dir",
    path=f"{COMPOSE_DIR}/openwebui/data",
    group="docker",
    mode="2775",
    _sudo=True,
)

# Beszel persistent state (admin account, system list, metric history).
files.directory(
    name="Beszel data dir",
    path=f"{COMPOSE_DIR}/beszel/data",
    group="docker",
    mode="2775",
    _sudo=True,
)
# Sibling .env file Docker Compose auto-reads for variable interpolation.
# Pyinfra creates it empty if missing and never overwrites — the user
# fills in BESZEL_TOKEN= and BESZEL_KEY= from the hub UI on first setup.
# Mode 640 root:docker so docker group members can read it at compose
# parse time but it isn't world-readable.
files.file(
    name="Beszel .env (placeholder)",
    path=f"{COMPOSE_DIR}/beszel/.env",
    present=True,
    mode="640",
    user="root",
    group="docker",
    _sudo=True,
)
# Tear down the previous /etc/default/beszel-agent location if a prior
# deploy created it (idempotent — no-op on a fresh box).
files.file(
    name="Remove legacy /etc/default/beszel-agent",
    path="/etc/default/beszel-agent",
    present=False,
    _sudo=True,
)

# OpenLIT persistent state — Next.js app config + ClickHouse trace store.
files.directory(
    name="OpenLIT data dir",
    path=f"{COMPOSE_DIR}/openlit/data",
    group="docker",
    mode="2775",
    _sudo=True,
)
files.directory(
    name="OpenLIT ClickHouse data dir",
    path=f"{COMPOSE_DIR}/openlit/clickhouse",
    group="docker",
    mode="2775",
    _sudo=True,
)

# Phoenix persistent state (SQLite-backed trace store).
files.directory(
    name="Phoenix data dir",
    path=f"{COMPOSE_DIR}/phoenix/data",
    group="docker",
    mode="2775",
    _sudo=True,
)

# OpenHands state + workspace. Owned by the SSH user (UID 1000) so the
# sandbox containers — which run as SANDBOX_USER_ID=1000 — can write to
# the shared workspace without root-owned files leaking out.
files.directory(
    name="OpenHands state dir",
    path=f"{COMPOSE_DIR}/openhands/state",
    user=SSH_USER,
    group=SSH_USER,
    mode="2775",
    _sudo=True,
)
files.directory(
    name="OpenHands workspace dir",
    path=f"{COMPOSE_DIR}/openhands/workspace",
    user=SSH_USER,
    group=SSH_USER,
    mode="2775",
    _sudo=True,
)

# --- Cleanup of artifacts from the prior native-build deploy ----------------
# All idempotent — `present=False` is a no-op when the target is absent.

server.shell(
    name="Stop & disable old native llama-server.service",
    commands=[
        "systemctl disable --now llama-server.service 2>/dev/null || true",
    ],
    _sudo=True,
)
files.file(
    name="Remove old llama-server.service",
    path="/etc/systemd/system/llama-server.service",
    present=False,
    _sudo=True,
)
files.link(
    name="Remove old llama-server-vulkan symlink",
    path="/usr/local/bin/llama-server-vulkan",
    present=False,
    _sudo=True,
)
files.link(
    name="Remove old llama-server-rocm symlink",
    path="/usr/local/bin/llama-server-rocm",
    present=False,
    _sudo=True,
)
files.directory(
    name="Remove old llama.cpp checkout",
    path="/opt/llama.cpp",
    present=False,
    _sudo=True,
)
server.shell(
    name="Stop & remove native Ollama install",
    commands=[
        "systemctl disable --now ollama.service 2>/dev/null || true",
        "rm -f /etc/systemd/system/ollama.service /usr/local/bin/ollama",
        "userdel ollama 2>/dev/null || true",
    ],
    _sudo=True,
)
systemd.daemon_reload(name="systemctl daemon-reload", _sudo=True)

pyinfra/framework/inventory.py (new file, 5 lines)

# pyinfra inventory for the Framework Desktop / Strix Halo box.

framework_desktop = [
    ("framework", {"ssh_hostname": "10.0.0.70", "ssh_user": "noise"}),
]

pyinfra/framework/run.sh (new executable file, 7 lines)

#!/usr/bin/env bash
# Wrapper: invokes pyinfra against the Framework Desktop, forwarding any
# extra args (e.g. --dry, -v) to pyinfra. Assumes NOPASSWD sudo on the box.

set -euo pipefail
cd "$(dirname "$0")"
exec pyinfra -v --ssh-password-prompt inventory.py deploy.py "$@"