Build btop 1.4 from source with AMD GPU support
apt's btop on 24.04 is 1.3.x, which has no AMD GPU monitoring. 1.4+ adds it but requires C++23, which gcc-13 (24.04 default) doesn't fully support.

Plan:
- Add ubuntu-toolchain-r/test PPA, install g++-14 (C++23-capable).
- Add librocm-smi-dev to ROCm host diagnostics: btop dlopens librocm_smi64 at runtime; the headers are needed at compile time.
- Drop btop from apt list, build from a pinned BTOP_VERSION tag with GPU_SUPPORT=true CXX=g++-14 -j; install to /usr/local/bin.
- Idempotent: only rebuilds if installed version doesn't match.

After deploy: btop → Esc → Options → "show_gpu_info" → On to enable the GPU panel.

Also clean up TODO.md: the box is on 24.04 (noble), not 26.04. The libxml2 ABI mismatch / "ROCm gap" section was stale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
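The idempotency rule in the plan (rebuild only when the installed version doesn't match the pin) reduces to a small version check plus a fixed command sequence. Below is a minimal Python sketch of that logic, assuming a pyinfra-style deploy that can capture `btop --version` output and run shell commands; it is not the deploy's actual code. Assumptions: `btop --version` prints a line like `btop version: 1.4.0`, BTOP_VERSION is a git tag of the form `v1.4.0` (the value below is hypothetical), and `parse_btop_version`, `needs_rebuild` and `build_commands` are illustrative names. `PREFIX=/usr/local` is btop's own Makefile variable for `make install`, which places the binary in /usr/local/bin.

```python
import re
import shlex
from typing import Optional

# Hypothetical values; the real deploy pins its own BTOP_VERSION.
BTOP_VERSION = "v1.4.0"
BTOP_REPO = "https://github.com/aristocratos/btop.git"


def parse_btop_version(output: str) -> Optional[str]:
    """Return "X.Y.Z" from `btop --version` output, or None if absent.

    Assumes the output contains something like "btop version: 1.4.0".
    """
    match = re.search(r"version:?\s*v?(\d+\.\d+\.\d+)", output, re.IGNORECASE)
    return match.group(1) if match else None


def needs_rebuild(version_output: Optional[str], pinned_tag: str) -> bool:
    """True when btop is missing or reports any version other than the pin."""
    if not version_output:
        # Command failed or printed nothing: treat btop as not installed.
        return True
    installed = parse_btop_version(version_output)
    # Tags look like "v1.4.0" while btop reports "1.4.0", so drop the "v".
    return installed != pinned_tag.removeprefix("v")


def build_commands(src_dir: str, pinned_tag: str) -> list[str]:
    """Shell steps for a GPU-enabled build of the pinned tag.

    The install step writes to /usr/local/bin, so it needs root.
    """
    src = shlex.quote(src_dir)
    return [
        # Start from a clean checkout so a rebuild can re-clone safely.
        f"rm -rf {src}",
        # Shallow clone of exactly the pinned tag.
        f"git clone --depth 1 --branch {shlex.quote(pinned_tag)} {BTOP_REPO} {src}",
        # Flags from the plan: GPU support on, C++23-capable g++-14, parallel make.
        f"make -C {src} GPU_SUPPORT=true CXX=g++-14 -j",
        f"make -C {src} install PREFIX=/usr/local",
    ]


if __name__ == "__main__":
    print(needs_rebuild("btop version: 1.3.0", BTOP_VERSION))  # True
    print(needs_rebuild("btop version: 1.4.0", BTOP_VERSION))  # False
    for cmd in build_commands("/tmp/btop-src", BTOP_VERSION):
        print(cmd)
```

Keeping the check a pure function of the command output makes it testable without a host. Comparing against the exact pin (rather than "at least 1.4") also means a newer or older hand-installed btop is replaced by the pinned build.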
TODO.md | 32
@@ -2,13 +2,11 @@
 
 ## ROCm / vLLM on Strix Halo (gfx1151)
 
-The Framework Desktop runs **Ubuntu 26.04 LTS**; AMD only ships ROCm
-7.2.3 packages for jammy (22.04) and noble (24.04). We installed the
-noble repo but pulled only `rocminfo` + `rocm-smi-lib` for host-side
-diagnostics — all heavy ROCm work runs in containers, which ship their
-own ROCm stack. This sidesteps the host-side libxml2 ABI mismatch (noble
-ships `libxml2.so.2`, 26.04 ships `libxml2.so.16`) that broke the native
-HIP toolchain.
+The Framework Desktop runs **Ubuntu 24.04 LTS (noble)**, which aligns
+with AMD's ROCm 7.x packaging. The deploy installs `rocminfo` and
+`librocm-smi-dev` host-side; heavier ROCm bits (full HIP toolchain,
+device-mapped libraries) still run inside containers that ship their
+own ROCm stack. The host stays slim by design.
 
 ### Open questions
 
@@ -17,21 +15,15 @@ HIP toolchain.
 gfx1151 (RDNA 3.5 consumer) is a different ISA. If the stock image
 doesn't initialize the device, try `rocm/vllm-dev:nightly` or build
 from source against ROCm 7.x with `-DAMDGPU_TARGETS=gfx1151`.
-- **AMD support for 26.04** — watch https://repo.radeon.com/amdgpu-install/<latest>/ubuntu/
-for a directory matching the box's codename. AMD historically lags
-Ubuntu LTS by 6–12 months for ROCm packaging.
 
-### When 26.04 ROCm packages land
+### If you ever want full host-side ROCm
 
-If you ever want to do native ROCm work on the host (rather than via
-containers):
-1. Bump `ROCM_VERSION` and `AMDGPU_INSTALL_DEB` in `pyinfra/deploy.py`
-to the new release.
-2. Update the apt source URL path in `deploy.py` if AMD adds a new
-release codename (currently hardcoded to `noble`).
-3. Add a step that runs `amdgpu-install -y --usecase=rocm --no-dkms`
-(the current deploy explicitly avoids this to stay slim).
-4. `./run.sh`.
+For native ROCm work on the host (compiling HIP kernels, full toolchain):
+1. Bump `ROCM_VERSION` and `AMDGPU_INSTALL_DEB` in
+`pyinfra/framework/deploy.py` to the latest release.
+2. Add a step that runs `amdgpu-install -y --usecase=rocm --no-dkms`
+(currently avoided to stay slim — ~25 GB toolchain).
+3. `./run.sh`.
 
 For container-only workflows (current default), no action is needed —
 container images update independently of the host.
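Step 2 of the new "If you ever want full host-side ROCm" list in TODO.md adds an `amdgpu-install` run to the deploy. One way to keep that step idempotent, in the same spirit as the btop rebuild rule in the commit message, is to skip it when a HIP compiler is already installed. The Python sketch below shows only that decision. It assumes ROCm's usual `/opt/rocm/bin/hipcc` location, and `full_rocm_step` and `want_host_rocm` are hypothetical names, not anything defined in `pyinfra/framework/deploy.py`.

```python
import os
from typing import Callable, Optional

# Assumed location of the HIP compiler after a full ROCm install.
HIPCC_PATH = "/opt/rocm/bin/hipcc"

# Command from the TODO.md step. --no-dkms leaves out the amdgpu-dkms
# kernel module package and relies on the kernel's own amdgpu driver.
AMDGPU_INSTALL_CMD = ("amdgpu-install", "-y", "--usecase=rocm", "--no-dkms")


def full_rocm_step(
    want_host_rocm: bool,
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[list[str]]:
    """Return the argv to run for the host-side ROCm step, or None to skip it.

    `exists` is injectable so the decision can be checked without a real host.
    """
    if not want_host_rocm:
        # Container-only workflow (the current default): nothing to install.
        return None
    if exists(HIPCC_PATH):
        # Toolchain already present; skip re-running the large install.
        return None
    # Return a fresh list so callers can't mutate the shared constant.
    return list(AMDGPU_INSTALL_CMD)


if __name__ == "__main__":
    print(full_rocm_step(want_host_rocm=True))
```

A file check is cheap, but it only proves `hipcc` exists, not that the whole `rocm` use case is installed; a real deploy might prefer inspecting the installed package list instead.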