Add Wyoming voice stack to pyinfra + landscape doc

- Move piper-compose.yaml / whisper-compose.yaml from repo root into pyinfra/framework/compose/{piper,whisper}.yml; bind paths shifted to /srv/docker/{piper,whisper}/data on the box.
- deploy.py registers both stacks and provisions the data dirs.
- Homepage gets a "Voice" group with informational tiles (Wyoming has no web UI, so tiles show container status without click-through).
- New VoiceModels.md captures the May 2026 STT/TTS landscape, why the current Wyoming defaults aren't SOTA, and concrete upgrade paths (whisper-large-v3-turbo + faster-whisper-server, Kokoro, Sesame CSM, F5-TTS for cloning).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:33:17 -04:00
parent 1816ae2458
commit 36b8cfe835
10 changed files with 292 additions and 32 deletions
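
Because the Wyoming services added in this commit expose no web UI, a quick way to confirm they came up is to probe their TCP ports. The following is a minimal sketch, not part of the commit: the ports 10300 and 10200 come from the compose files in the diff, while the helper names `port_open` and `check_voice_stack` and the default host name `framework` (borrowed from the README URLs) are assumptions for illustration. It only checks that something is listening; it does not speak the Wyoming protocol.

```python
import socket

# Ports as published by whisper.yml and piper.yml in this commit.
WYOMING_SERVICES = {"whisper": 10300, "piper": 10200}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_voice_stack(host: str = "framework") -> dict[str, bool]:
    """Probe each Wyoming service port on the given host (hypothetical helper)."""
    return {name: port_open(host, port) for name, port in WYOMING_SERVICES.items()}
```

Calling `check_voice_stack()` after `docker compose up -d` in both directories should return `True` for each service once the containers are listening. A `False` entry most likely means the container is not running, is still starting, or the host name does not resolve from where the script runs.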
--- a/pyinfra/framework/README.md
+++ b/pyinfra/framework/README.md
@@ -42,14 +42,15 @@ Or run it ephemerally without installing: `uvx pyinfra inventory.py deploy.py`.
  Requires a reboot to activate — pyinfra rewrites `/etc/default/grub`
  and runs `update-grub`, but won't reboot for you.
 - `/models/<vendor>/` layout
- `/srv/docker/{llama,vllm,ollama,openwebui,beszel,openlit,phoenix,openhands,homepage}/docker-compose.yml`
+- `/srv/docker/{llama,vllm,ollama,openwebui,beszel,openlit,phoenix,openhands,homepage,whisper,piper}/docker-compose.yml`
  dropped in, not auto-started — you edit the model path then
  `docker compose up -d`
  (OpenWebUI needs no edits — it's pre-configured to find Ollama at
  `host.docker.internal:11434` and uses `searxng.n0n.io` for web search)
  (Beszel hub at :8090, OpenLIT UI at :3001, Phoenix UI at :6006,
-  OpenHands UI at :3030, Homepage at :7575 — see "Monitoring stack",
-  "Agent harnesses", and "Front door" below)
+  OpenHands UI at :3030, Homepage at :7575, Whisper STT at :10300,
+  Piper TTS at :10200 — see "Monitoring stack", "Agent harnesses",
+  "Front door", and "Voice" below)

 If a previous run installed the native llama.cpp build / full ROCm /
 native Ollama, those are auto-cleaned the next time `./run.sh` runs.
@@ -85,6 +86,31 @@ Compose images in
 (`services.yaml`, `settings.yaml`, etc.); edit there, `./run.sh`, restart
 the homepage container.

+## Voice
+
+Wyoming-protocol speech servers (Home Assistant's voice stack). No web
+UI — these are TCP protocol servers consumable by HA Assist, Wyoming
+satellites, or any Wyoming client.
+
+- **Whisper** (`/srv/docker/whisper`, `tcp://framework:10300`) —
+  speech-to-text. Default model is `tiny-int8` (~75 MB). Models
+  download into `/srv/docker/whisper/data/` on first run.
+
+- **Piper** (`/srv/docker/piper`, `tcp://framework:10200`) —
+  text-to-speech. Default voice is `en_US-lessac-medium` (~63 MB).
+  Browse alternatives at <https://github.com/rhasspy/piper/blob/master/VOICES.md>.
+
+Bring-up:
+```sh
+cd /srv/docker/whisper && docker compose up -d
+cd /srv/docker/piper && docker compose up -d
+```
+
+Both are conservative defaults sized for "small command-style usage";
+on Strix Halo's hardware budget you can run substantially better
+models. See [`localgenai/VoiceModels.md`](../../VoiceModels.md) for the
+landscape and upgrade options.
+
 ## Front door

 - **Homepage** (`/srv/docker/homepage`, http://framework:7575) — single
--- a/pyinfra/framework/compose/homepage/services.yaml
+++ b/pyinfra/framework/compose/homepage/services.yaml
@@ -73,6 +73,21 @@
        server: localhost-docker
        container: phoenix

+- Voice:
+    # Wyoming-protocol services have no web UI; tiles are informational
+    # (container status + port). Click-through goes nowhere meaningful.
+    - Whisper:
+        icon: mdi-microphone-message
+        description: Speech-to-text (Wyoming :10300)
+        server: localhost-docker
+        container: wyoming-whisper
+
+    - Piper:
+        icon: mdi-account-voice
+        description: Text-to-speech (Wyoming :10200)
+        server: localhost-docker
+        container: wyoming-piper
+
 - External:
    - SearXNG:
        icon: searxng.svg
--- a/pyinfra/framework/compose/homepage/settings.yaml
+++ b/pyinfra/framework/compose/homepage/settings.yaml
@@ -21,6 +21,9 @@ layout:
  Observability:
    style: row
    columns: 3
+  Voice:
+    style: row
+    columns: 3
  External:
    style: row
    columns: 3
--- a/pyinfra/framework/compose/piper.yml
+++ b/pyinfra/framework/compose/piper.yml
@@ -0,0 +1,22 @@
+# Wyoming Piper — text-to-speech over the Wyoming protocol.
+# https://github.com/rhasspy/wyoming-piper
+#
+# Wyoming is Home Assistant's voice protocol; it's also consumable by any
+# Wyoming client. No web UI — this is a protocol server on TCP :10200.
+#
+# Voice selection: en_US-lessac-medium is the most balanced English voice
+# (~63 MB, natural prosody). Browse alternatives at
+# https://github.com/rhasspy/piper/blob/master/VOICES.md — pulled into
+# /srv/docker/piper/data on first start.
+services:
+  piper:
+    image: rhasspy/wyoming-piper:latest
+    container_name: wyoming-piper
+    restart: unless-stopped
+    ports:
+      - "10200:10200"
+    volumes:
+      - /srv/docker/piper/data:/data
+    command:
+      - --voice
+      - en_US-lessac-medium
--- a/pyinfra/framework/compose/whisper.yml
+++ b/pyinfra/framework/compose/whisper.yml
@@ -0,0 +1,26 @@
+# Wyoming Whisper — speech-to-text over the Wyoming protocol.
+# https://github.com/rhasspy/wyoming-whisper
+#
+# Wyoming is Home Assistant's voice protocol; it's also consumable by any
+# Wyoming client. No web UI — this is a protocol server on TCP :10300.
+#
+# Model selection: `tiny-int8` is the smallest viable model (~75 MB),
+# fast and good enough for command-style transcription. Bump to
+# `base-int8` (140 MB) or `small-int8` (480 MB) for general dictation.
+# Models are downloaded into /srv/docker/whisper/data on first start.
+services:
+  whisper:
+    image: rhasspy/wyoming-whisper:latest
+    container_name: wyoming-whisper
+    restart: unless-stopped
+    ports:
+      - "10300:10300"
+    volumes:
+      - /srv/docker/whisper/data:/data
+    command:
+      - --model
+      - tiny-int8
+      - --language
+      - en
+      - --beam-size
+      - "1"
--- a/pyinfra/framework/deploy.py
+++ b/pyinfra/framework/deploy.py
@@ -339,6 +339,8 @@ for svc in (
    "phoenix",
    "openhands",
    "homepage",
+    "whisper",
+    "piper",
 ):
    files.directory(
        name=f"compose/{svc} dir",
@@ -470,6 +472,24 @@ for cfg in (
        _sudo=True,
    )

+# Voice stack — Wyoming-protocol Whisper (STT) and Piper (TTS). Models
+# are downloaded on first start; bind-mounting these dirs lets them
+# survive container recreation.
+files.directory(
+    name="Whisper data dir",
+    path=f"{COMPOSE_DIR}/whisper/data",
+    group="docker",
+    mode="2775",
+    _sudo=True,
+)
+files.directory(
+    name="Piper data dir",
+    path=f"{COMPOSE_DIR}/piper/data",
+    group="docker",
+    mode="2775",
+    _sudo=True,
+)
+
 # --- Cleanup of artifacts from the prior native-build deploy ----------------
 # All idempotent — `present=False` is a no-op when the target is absent.