Build btop 1.4 from source with AMD GPU support

apt's btop on 24.04 is 1.3.x, which has no AMD GPU monitoring. 1.4+
adds it but requires C++23, which gcc-13 (24.04 default) doesn't fully
support. Plan:

- Add ubuntu-toolchain-r/test PPA, install g++-14 (C++23-capable).
- Add librocm-smi-dev to ROCm host diagnostics — btop dlopens
  librocm_smi64 at runtime; the headers are needed at compile time.
- Drop btop from apt list, build from a pinned BTOP_VERSION tag with
  GPU_SUPPORT=true CXX=g++-14 -j; install to /usr/local/bin.
- Idempotent — only rebuilds if installed version doesn't match.
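
A minimal sketch of that idempotency check in Python, not the deploy's actual implementation (which greps the output of `btop --version` in a shell one-liner). The helper names `installed_version` and `needs_rebuild` are hypothetical, and the sample output strings in the usage note are assumptions; the real output of `btop --version` may be formatted differently. Comparing the parsed version for equality avoids a substring match, where a pin of 1.4.7 would also match an installed 1.4.70.

```python
import re
from typing import Optional

# Matches the first dotted version number, e.g. "1.4.7" (assumed format).
_VERSION_RE = re.compile(r"\d+(?:\.\d+)+")


def installed_version(version_output: str) -> Optional[str]:
    """Return the first dotted version number found in the text, or None."""
    match = _VERSION_RE.search(version_output)
    return match.group(0) if match else None


def needs_rebuild(version_output: str, wanted: str) -> bool:
    """True unless the installed version equals the pinned one exactly."""
    return installed_version(version_output) != wanted
```

Under these assumptions, `needs_rebuild("btop version: 1.4.7", "1.4.7")` is false, while empty output (binary missing) or any other version triggers a rebuild.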

After deploy: btop → Esc → Options → "show_gpu_info" → On to enable
the GPU panel.
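
The same setting can presumably be applied without the menu by editing btop's config file. A hedged sketch: it assumes the file uses `key = "value"` lines and that `show_gpu_info` accepts `"On"`; neither the file location (commonly `~/.config/btop/btop.conf`) nor the accepted values are confirmed by this commit, and `set_btop_option` is a hypothetical helper.

```python
import re


def set_btop_option(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with `key = "value"` set, replacing or appending."""
    new_line = f'{key} = "{value}"'
    # Match an existing assignment for this key at the start of a line.
    pattern = re.compile(rf"^{re.escape(key)}\s*=.*$", re.MULTILINE)
    if pattern.search(conf_text):
        return pattern.sub(lambda _m: new_line, conf_text, count=1)
    # Key not present: append it, adding a newline separator if needed.
    sep = "" if conf_text.endswith("\n") or not conf_text else "\n"
    return f"{conf_text}{sep}{new_line}\n"
```

The function only transforms text; reading and writing the file is left to the caller, so it can be tried on a copy of the config first.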

Also clean up TODO.md — the box is on 24.04 (noble), not 26.04. The
libxml2 ABI mismatch / "ROCm gap" section was stale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:34:52 -04:00
parent 37b0cd9a58
commit 228fe8d1ac
3 changed files with 74 additions and 32 deletions

TODO.md

@@ -2,13 +2,11 @@
 ## ROCm / vLLM on Strix Halo (gfx1151)
-The Framework Desktop runs **Ubuntu 26.04 LTS**; AMD only ships ROCm
-7.2.3 packages for jammy (22.04) and noble (24.04). We installed the
-noble repo but pulled only `rocminfo` + `rocm-smi-lib` for host-side
-diagnostics — all heavy ROCm work runs in containers, which ship their
-own ROCm stack. This sidesteps the host-side libxml2 ABI mismatch (noble
-ships `libxml2.so.2`, 26.04 ships `libxml2.so.16`) that broke the native
-HIP toolchain.
+The Framework Desktop runs **Ubuntu 24.04 LTS (noble)**, which aligns
+with AMD's ROCm 7.x packaging. The deploy installs `rocminfo` and
+`librocm-smi-dev` host-side; heavier ROCm bits (full HIP toolchain,
+device-mapped libraries) still run inside containers that ship their
+own ROCm stack. The host stays slim by design.
 ### Open questions
@@ -17,21 +15,15 @@ HIP toolchain.
 gfx1151 (RDNA 3.5 consumer) is a different ISA. If the stock image
 doesn't initialize the device, try `rocm/vllm-dev:nightly` or build
 from source against ROCm 7.x with `-DAMDGPU_TARGETS=gfx1151`.
-- **AMD support for 26.04** — watch https://repo.radeon.com/amdgpu-install/<latest>/ubuntu/
-for a directory matching the box's codename. AMD historically lags
-Ubuntu LTS by 6–12 months for ROCm packaging.
-### When 26.04 ROCm packages land
-If you ever want to do native ROCm work on the host (rather than via
-containers):
-1. Bump `ROCM_VERSION` and `AMDGPU_INSTALL_DEB` in `pyinfra/deploy.py`
-to the new release.
-2. Update the apt source URL path in `deploy.py` if AMD adds a new
-release codename (currently hardcoded to `noble`).
-3. Add a step that runs `amdgpu-install -y --usecase=rocm --no-dkms`
-(the current deploy explicitly avoids this to stay slim).
-4. `./run.sh`.
+### If you ever want full host-side ROCm
+For native ROCm work on the host (compiling HIP kernels, full toolchain):
+1. Bump `ROCM_VERSION` and `AMDGPU_INSTALL_DEB` in
+`pyinfra/framework/deploy.py` to the latest release.
+2. Add a step that runs `amdgpu-install -y --usecase=rocm --no-dkms`
+(currently avoided to stay slim — ~25 GB toolchain).
+3. `./run.sh`.
 For container-only workflows (current default), no action is needed —
 container images update independently of the host.

View File

@@ -80,10 +80,15 @@ Top of `deploy.py`:
 the version; find it at https://repo.radeon.com/amdgpu-install/.
 - `AMDGPU_TOP_VERSION` — bump when a newer release lands at
 https://github.com/Umio-Yasuno/amdgpu_top/releases.
-- `NVTOP_VERSION` — built from source because Ubuntu 26.04's apt
-package (3.0.2) predates gfx1151 detection. Bump when a newer release
-lands at https://github.com/Syllo/nvtop/releases. Run `sudo nvtop` to
-see all GPU processes (non-root only sees the calling user's own).
+- `NVTOP_VERSION` — built from source because apt's nvtop predates
+gfx1151 detection. Bump when a newer release lands at
+https://github.com/Syllo/nvtop/releases. Run `sudo nvtop` to see all
+GPU processes (non-root only sees the calling user's own).
+- `BTOP_VERSION` — built from source because apt's btop has no AMD GPU
+support. 1.4+ requires C++23, hence the ubuntu-toolchain-r/test PPA
+for g++-14. The build links `librocm-smi-dev` for AMD GPU monitoring.
+Bump at https://github.com/aristocratos/btop/releases. In btop, Esc
+→ Options → "show_gpu_info" → On to enable the GPU panel.
 Compose images in
 `compose/{llama,vllm,ollama,openwebui,beszel,openlit,phoenix,openhands,homepage}.yml`

View File

@@ -32,12 +32,19 @@ AMDGPU_INSTALL_DEB = "amdgpu-install_7.2.3.70203-1_all.deb"
 AMDGPU_TOP_VERSION = "0.11.4-1"
 AMDGPU_TOP_DEB = f"amdgpu-top_without_gui_{AMDGPU_TOP_VERSION}_amd64.deb"
-# nvtop — htop-like GPU monitor with multi-vendor support. Ubuntu 26.04
-# ships 3.0.2 in apt, which predates the gfx1151 sysfs detection
-# improvements; we build 3.2.x from source instead. Verify at
+# nvtop — htop-like GPU monitor with multi-vendor support. Ubuntu's apt
+# package predates the gfx1151 sysfs detection improvements; we build
+# 3.2.x from source instead. Verify at
 # https://github.com/Syllo/nvtop/releases.
 NVTOP_VERSION = "3.2.0"
+# btop — system monitor with optional GPU panel. AMD GPU support
+# requires building with GPU_SUPPORT=true against librocm-smi-dev.
+# Ubuntu 24.04's apt package is 1.3.x (no GPU support). 1.4+ requires
+# C++23, hence g++-14 from the ubuntu-toolchain-r/test PPA. Verify at
+# https://github.com/aristocratos/btop/releases.
+BTOP_VERSION = "1.4.7"
 SSH_USER = host.data.get("ssh_user", "noise")
 MODELS_DIR = "/models"
 # /srv is the FHS-blessed location for "data and configuration for
@@ -60,17 +67,19 @@ apt.packages(
     name="Base CLI tools",
     # radeontop intentionally omitted — predates RDNA 3.5 / Strix Halo,
     # errors with "no VRAM support". amdgpu_top installed below.
-    # nvtop from apt intentionally omitted — Ubuntu 26.04 ships 3.0.2,
-    # which doesn't pick up gfx1151. Built from source below instead.
+    # nvtop from apt intentionally omitted — apt's 3.0.2 doesn't pick up
+    # gfx1151. Built from source below instead.
+    # btop from apt intentionally omitted — apt's 1.3.x has no AMD GPU
+    # support. Built from source below.
     packages=[
         "tmux",
         "vim",
         "htop",
-        "btop",
         "git",
         "curl",
         "ca-certificates",
         "unzip",
+        "software-properties-common", # for add-apt-repository (g++-14 PPA)
     ],
     _sudo=True,
 )
@@ -228,11 +237,14 @@ server.shell(
     _sudo=True,
 )
 apt.packages(
-    name="ROCm host diagnostics (rocminfo)",
+    name="ROCm host diagnostics (rocminfo, librocm-smi-dev)",
     # rocminfo is the stable diagnostic. The SMI tool's package name has
     # churned across ROCm releases (rocm-smi-lib → amd-smi-lib in 7.x);
     # install on demand if you need it.
-    packages=["rocminfo"],
+    # librocm-smi-dev provides librocm_smi64.so + headers; btop dlopens
+    # it at runtime for AMD GPU monitoring (compiled against headers,
+    # loaded dynamically). Cheap to install (~50 MB), no full ROCm tail.
+    packages=["rocminfo", "librocm-smi-dev"],
     _sudo=True,
 )
@@ -286,6 +298,39 @@ server.shell(
     _sudo=True,
 )
+# btop from source. apt's 1.3.x has no AMD GPU support; 1.4+ requires
+# C++23 (g++-14), which 24.04 doesn't ship by default — add the
+# ubuntu-toolchain-r/test PPA. Build with GPU_SUPPORT=true so btop
+# dlopens librocm_smi64 (provided by librocm-smi-dev installed above).
+# Idempotent — only rebuilds if installed version doesn't match.
+server.shell(
+    name="Add ubuntu-toolchain-r/test PPA (g++-14 on 24.04)",
+    commands=[
+        "grep -rq ubuntu-toolchain-r /etc/apt/sources.list.d/ 2>/dev/null || "
+        "add-apt-repository -y ppa:ubuntu-toolchain-r/test",
+    ],
+    _sudo=True,
+)
+apt.update(name="apt update (post-toolchain PPA)", _sudo=True)
+apt.packages(
+    name="btop build deps (g++-14 + ncurses)",
+    packages=["g++-14", "libncurses-dev"],
+    _sudo=True,
+)
+server.shell(
+    name=f"Build & install btop {BTOP_VERSION} from source",
+    commands=[
+        f"/usr/local/bin/btop --version 2>/dev/null | grep -q '{BTOP_VERSION}' && exit 0; "
+        f"rm -rf /tmp/btop-build && "
+        f"git clone --depth 1 --branch v{BTOP_VERSION} "
+        f"https://github.com/aristocratos/btop.git /tmp/btop-build && "
+        f"make -C /tmp/btop-build GPU_SUPPORT=true CXX=g++-14 -j && "
+        f"make -C /tmp/btop-build install PREFIX=/usr/local && "
+        f"rm -rf /tmp/btop-build",
+    ],
+    _sudo=True,
+)
 # Group membership for /dev/kfd + /dev/dri access (needed for GPU passthrough
 # into containers, and for unprivileged host-side rocminfo).
 server.group(name="ensure render group", group="render", _sudo=True)