# How to onboard an existing host into pyinfra
There's no auto-generator that reads a running system and emits a
usable `deploy.py`. pyinfra doesn't have one, Ansible doesn't, and no
mainstream IaC tool does it well. Telling what's intentional from
what's incidental takes human judgment. Below is the workflow that
actually works.
## What pyinfra gives you for free
The **facts** API queries live state without you writing a deploy first:
```sh
# host = an SSH hostname (or an inventory file, to query every host in it)
pyinfra host fact deb.DebPackages # installed packages
pyinfra host fact apt.AptSources # extra apt repos
pyinfra host fact systemd.SystemdEnabled # enabled/disabled services
pyinfra host fact server.Crontab # cron
pyinfra host fact server.Users # users
```
Output is JSON. There's no fact → op translator, but the data is easy
to grep through and tells you what's actually on the box.
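Since each fact comes back as JSON, one low-effort way to get something greppable is to save each fact to its own file. The following is a minimal sketch. `dump_facts` and the `PYINFRA` override are hypothetical conveniences; the override exists only so the loop can be tried with a stub. It also assumes your pyinfra version writes the fact JSON to stdout and sends progress output to stderr, so check one file by eye before relying on it.

```sh
# Hypothetical helper: save selected pyinfra facts for one host as
# <outdir>/<fact>.json so they can be grepped or diffed later.
# Assumes the JSON goes to stdout; set PYINFRA to swap in a wrapper or stub.
dump_facts() {
  host=$1
  outdir=$2
  mkdir -p "$outdir" || return 1
  for fact in deb.DebPackages apt.AptSources systemd.SystemdEnabled server.Users; do
    "${PYINFRA:-pyinfra}" "$host" fact "$fact" > "$outdir/$fact.json" || return 1
  done
}
```

For example, run `dump_facts my-host facts/my-host` and then `grep -l docker facts/my-host/*.json` to see which facts mention Docker (`my-host` is a placeholder).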
## Practical workflow
1. Run a scanner against the live host that dumps the *meaningful*
state — packages explicitly installed, enabled services, custom
configs, dotfiles, sudoers fragments, fstab non-defaults. Skip the
distro-default noise.
2. Hand-pick the bits worth managing into a new `<station>/deploy.py`.
3. Run with `--dry` against the live box; the operations pyinfra says it
would run are exactly where the box still drifts from your deploy.
4. Iterate until `--dry` reports no changes.
## One-shot scanner
```sh
# -t gives sudo a terminal to prompt on; drop it if sudo is passwordless
ssh -t you@host '
echo "=== manually-installed packages ==="
apt-mark showmanual
echo "=== enabled services ==="
systemctl list-unit-files --state=enabled --type=service --no-legend | awk "{print \$1}"
echo "=== extra apt repos ==="
ls -1 /etc/apt/sources.list.d/
echo "=== sudoers fragments ==="
sudo ls -1 /etc/sudoers.d/
echo "=== fstab non-default ==="
grep -vE "^#|^$|/proc|/sys|/dev/pts" /etc/fstab
echo "=== users with login shells ==="
getent passwd | awk -F: "\$3>=1000 && \$3<60000 && \$7 !~ /(nologin|false)\$/ {print \$1}"
echo "=== docker stacks ==="
ls -1 /srv/docker/ 2>/dev/null
echo "=== home dotfiles ==="
sudo find /home -maxdepth 3 \( -name ".*rc" -o -name ".*config" \) 2>/dev/null
'
```
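Hand-picking (step 2) is less of a scroll if you save the scanner output to a file and read it one section at a time. To save it, add `> host-scan.txt` after the closing quote. The following is a minimal sketch that assumes the `=== title ===` headers the scanner prints. `scan_section` and `host-scan.txt` are hypothetical names, not part of pyinfra.

```sh
# Hypothetical helper: print the lines of one "=== title ===" section
# from a saved scan. $1 = saved scanner output, $2 = section title.
scan_section() {
  awk -v want="=== $2 ===" '
    { sub(/\r$/, "") }                          # drop trailing CRs (ssh -t output uses CRLF)
    /^=== .* ===$/ { on = ($0 == want); next }  # a header switches sections
    on { print }                                # inside the wanted section
  ' "$1"
}
```

For example, `scan_section host-scan.txt "enabled services"` lists only the enabled units. That is a convenient starting point for the `systemd.service` calls in a new `deploy.py`.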
In practice that covers roughly 80% of what's worth managing, and the
result tends to fit in about 30 lines of pyinfra. The remaining 20% is
case-by-case work that no scanner fully captures (kernel cmdline tweaks,
hand-edited `/etc` files, dotfile contents); you write those as you
rediscover the box's quirks.
## What scanners can't catch
- The *intent* behind a config (why this `sysctl` value, why this
cron entry).
- Hand-edited fragments mixed into otherwise-default files (e.g. a
custom line in `/etc/ssh/sshd_config`).
- Out-of-band state: data in databases, container volumes,
`/var/lib/<service>`.
For these, treat pyinfra as managing the *bones* (packages, services,
users, configs) and leave the data alone. Re-deploying onto fresh
hardware should give you a working system; restoring data is a
separate concern.
## If "scan and reproduce" is a hard requirement
The native answer is **NixOS**. The whole OS is declarative: the config
file *is* the system, so there's nothing to reverse-engineer.
`nixos-generate-config` only detects hardware (filesystems, kernel
modules) and writes a starter config; everything else you declare
yourself. It's a different paradigm and a big switch, but it's the right
answer if reproducible-from-a-file is core to your workflow and you
aren't already deep in Ubuntu-land.