# How to onboard an existing host into pyinfra

There's no auto-generator that reads a running system and emits a `deploy.py`. pyinfra doesn't have one, Ansible doesn't either, and no IaC tool does this reliably. Reverse-engineering what's intentional versus incidental needs human judgment. Below is the workflow that works in practice.

## What pyinfra gives you for free

The **facts** API queries live state without writing any deploy:

```sh
pyinfra host fact deb.DebPackages          # installed packages
pyinfra host fact apt.AptSources           # extra apt repos
pyinfra host fact systemd.SystemdEnabled   # which units are enabled
pyinfra host fact systemd.SystemdStatus    # which units are running
pyinfra host fact server.Crontab           # cron
pyinfra host fact server.Users             # users
```

Output is JSON. There's no fact → op translator, but the data is easy to grep and tells you what's actually on the box.

## Practical workflow

1. Run a scanner against the live host that dumps the *meaningful* state: explicitly installed packages, enabled services, custom configs, dotfiles, sudoers fragments, non-default fstab entries. Skip the distro-default noise.
2. Hand-pick the bits worth managing into a new `/deploy.py`.
3. Run it with `--dry` against the live box; any operation pyinfra still reports as a change marks drift between your deploy and reality.
4. Iterate until `--dry` reports no changes.

## One-shot scanner

```sh
ssh you@host '
  echo "=== manually-installed packages ==="
  apt-mark showmanual
  echo "=== enabled services ==="
  systemctl list-unit-files --state=enabled --type=service --no-legend | awk "{print \$1}"
  echo "=== extra apt repos ==="
  ls /etc/apt/sources.list.d/
  echo "=== sudoers fragments ==="
  ls /etc/sudoers.d/
  echo "=== fstab non-default ==="
  grep -vE "^#|^$|/proc|/sys|/dev/pts" /etc/fstab
  echo "=== users with login shells ==="
  getent passwd | awk -F: "\$3>=1000 && \$3<60000 {print \$1}"
  echo "=== docker stacks ==="
  ls /srv/docker/ 2>/dev/null
  echo "=== home dotfiles ==="
  sudo find /home -maxdepth 3 -name ".*rc" -o -name ".*config" 2>/dev/null
'
```

The dotfile step runs under `sudo` with stderr discarded, so if sudo needs a password it fails silently; use `ssh -t` in that case so it can prompt.

In practice that covers roughly 80% of what's worth managing, and the result fits in about 30 lines of pyinfra.
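
To cut the typing in step 2, a small script can turn the scanner's output into a *draft* deploy to prune. This is a hypothetical helper, not part of pyinfra: the file names `draft_deploy.py` and `scan.txt` are made up, and it assumes the exact `=== section ===` headers printed by the scanner. It only drafts ops for sections that map cleanly onto one (`apt.packages`, `systemd.service`, `server.user`); everything else becomes a `# TODO` comment, because those still need the human judgment described above. Requires Python 3.9+.

```python
"""Draft a pyinfra deploy.py from the one-shot scanner's output.

Hypothetical helper (not part of pyinfra). Assumes the exact
"=== section ===" headers printed by the scanner. Prune the result by hand.
"""
import sys
from collections import defaultdict

# Sections that need a human decision; emitted as comments only.
MANUAL_SECTIONS = (
    "extra apt repos",
    "sudoers fragments",
    "fstab non-default",
    "docker stacks",
    "home dotfiles",
)


def parse_sections(text: str) -> dict[str, list[str]]:
    """Group non-empty lines under the most recent '=== name ===' header."""
    sections: dict[str, list[str]] = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()  # also drops the \r that `ssh -t` adds
        if line.startswith("===") and line.endswith("==="):
            current = line.strip("= ")
        elif line and current is not None:
            sections[current].append(line)
    return dict(sections)


def render_deploy(sections: dict[str, list[str]]) -> str:
    """Return draft deploy.py source text; review it before running."""
    out = ["from pyinfra.operations import apt, server, systemd", ""]
    packages = sections.get("manually-installed packages", [])
    if packages:
        out.append(f"apt.packages(name='Manually installed packages', packages={packages!r})")
    for unit in sections.get("enabled services", []):
        if unit.endswith("@.service"):
            continue  # template units need an instance name; handle by hand
        service = unit.removesuffix(".service")
        label = f"Enable {service}"
        out.append(f"systemd.service(name={label!r}, service={service!r}, enabled=True)")
    for user in sections.get("users with login shells", []):
        label = f"User {user}"
        out.append(f"server.user(name={label!r}, user={user!r})")
    for header in MANUAL_SECTIONS:
        for item in sections.get(header, []):
            out.append(f"# TODO ({header}): {item}")
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 draft_deploy.py scan.txt > deploy.py
    with open(sys.argv[1], encoding="utf-8") as fh:
        sys.stdout.write(render_deploy(parse_sections(fh.read())))
```

Save the scanner's output with `> scan.txt`, generate the draft, then delete whatever you don't want managed (distro-default services such as `ssh` or `cron` will show up too) before starting the `--dry` loop.
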

The remaining 20% is case-by-case (kernel cmdline tweaks, hand-edited `/etc` files, dotfile contents) that no scanner fully captures; you write those as you rediscover the box's quirks.

## What scanners can't catch

- The *intent* behind a config (why this `sysctl` value, why this cron entry).
- Hand-edited fragments mixed into otherwise-default files (e.g. a custom line in `/etc/ssh/sshd_config`).
- Out-of-band state: data in databases, container volumes, `/var/lib/`.

For these, treat pyinfra as managing the *bones* (packages, services, users, configs) and leave the data alone. Re-deploying onto fresh hardware should give you a working system; restoring data is a separate concern.

## If "scan and reproduce" is a hard requirement

The native answer is **NixOS**. The whole OS is declarative: the system is built from `configuration.nix`, so the config file *is* the reproducible description. Note that `nixos-generate-config` only detects hardware (filesystems, kernel modules) and writes a starter config; it does not read an existing system's packages or services into Nix. The reproducibility comes from building the machine from the config in the first place. It's a different paradigm and a big switch, but it's the right answer if reproducible-from-a-file is core to your workflow and you aren't already deep in Ubuntu-land.