Add Wyoming voice stack to pyinfra + landscape doc

- Move piper-compose.yaml / whisper-compose.yaml from repo root into
  pyinfra/framework/compose/{piper,whisper}.yml; bind paths shifted to
  /srv/docker/{piper,whisper}/data on the box.
- deploy.py registers both stacks and provisions the data dirs.
- Homepage gets a "Voice" group with informational tiles (Wyoming has
  no web UI, so tiles show container status without click-through).
- New VoiceModels.md captures the May 2026 STT/TTS landscape, why the
  current Wyoming defaults aren't SOTA, and concrete upgrade paths
  (whisper-large-v3-turbo + faster-whisper-server, Kokoro, Sesame CSM,
  F5-TTS for cloning).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:33:17 -04:00
parent 1816ae2458
commit 36b8cfe835
10 changed files with 292 additions and 32 deletions

@@ -42,14 +42,15 @@ Or run it ephemerally without installing: `uvx pyinfra inventory.py deploy.py`.
Requires a reboot to activate — pyinfra rewrites `/etc/default/grub`
and runs `update-grub`, but won't reboot for you.
- `/models/<vendor>/` layout
- `/srv/docker/{llama,vllm,ollama,openwebui,beszel,openlit,phoenix,openhands,homepage}/docker-compose.yml`
- `/srv/docker/{llama,vllm,ollama,openwebui,beszel,openlit,phoenix,openhands,homepage,whisper,piper}/docker-compose.yml`
dropped in, not auto-started — you edit the model path then
`docker compose up -d`
(OpenWebUI needs no edits — it's pre-configured to find Ollama at
`host.docker.internal:11434` and uses `searxng.n0n.io` for web search)
(Beszel hub at :8090, OpenLIT UI at :3001, Phoenix UI at :6006,
OpenHands UI at :3030, Homepage at :7575 — see "Monitoring stack",
"Agent harnesses", and "Front door" below)
OpenHands UI at :3030, Homepage at :7575, Whisper STT at :10300,
Piper TTS at :10200 — see "Monitoring stack", "Agent harnesses",
"Front door", and "Voice" below)
If a previous run installed the native llama.cpp build / full ROCm /
native Ollama, those are auto-cleaned the next time `./run.sh` runs.
@@ -85,6 +86,31 @@ Compose images in
(`services.yaml`, `settings.yaml`, etc.); edit there, `./run.sh`, restart
the homepage container.
## Voice
Wyoming-protocol speech servers (Home Assistant's voice stack). No web
UI — these are TCP protocol servers consumable by HA Assist, Wyoming
satellites, or any Wyoming client.
- **Whisper** (`/srv/docker/whisper`, `tcp://framework:10300`) —
speech-to-text. Default model is `tiny-int8` (~75 MB). Models
download into `/srv/docker/whisper/data/` on first run.
- **Piper** (`/srv/docker/piper`, `tcp://framework:10200`) —
text-to-speech. Default voice is `en_US-lessac-medium` (~63 MB).
Browse alternatives at <https://github.com/rhasspy/piper/blob/master/VOICES.md>.
Bring-up:
```sh
cd /srv/docker/whisper && docker compose up -d
cd /srv/docker/piper && docker compose up -d
```
Both are conservative defaults sized for "small command-style usage";
on Strix Halo's hardware budget you can run substantially better
models. See [`localgenai/VoiceModels.md`](../../VoiceModels.md) for the
landscape and upgrade options.
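
Since neither service has a web UI, a quick way to confirm bring-up worked is to check that both TCP ports accept connections. The sketch below is illustrative only and not part of the repo: the helper names (`port_open`, `check_voice_stack`) are hypothetical, and it assumes the hostname `framework` resolves from wherever you run it. It only tests TCP reachability; it does not speak the Wyoming protocol.

```python
import socket

# Ports as listed in this section; adjust if your compose files differ.
VOICE_SERVICES = {"whisper": 10300, "piper": 10200}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_voice_stack(host: str = "framework") -> dict[str, bool]:
    """Map each voice service name to whether its port is reachable."""
    return {name: port_open(host, port) for name, port in VOICE_SERVICES.items()}


if __name__ == "__main__":
    for name, ok in check_voice_stack().items():
        print(f"{name}: {'listening' if ok else 'unreachable'}")
```

A port that is still unreachable shortly after `docker compose up -d` may just mean the container is downloading its model on first run, so it can be worth retrying before digging into logs.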
## Front door
- **Homepage** (`/srv/docker/homepage`, http://framework:7575) — single