Add Wyoming voice stack to pyinfra + landscape doc

- Move piper-compose.yaml / whisper-compose.yaml from repo root into
  pyinfra/framework/compose/{piper,whisper}.yml; bind paths shifted to
  /srv/docker/{piper,whisper}/data on the box.
- deploy.py registers both stacks and provisions the data dirs.
- Homepage gets a "Voice" group with informational tiles (Wyoming has
  no web UI, so tiles show container status without click-through).
- New VoiceModels.md captures the May 2026 STT/TTS landscape, why the
  current Wyoming defaults aren't SOTA, and concrete upgrade paths
  (whisper-large-v3-turbo + faster-whisper-server, Kokoro, Sesame CSM,
  F5-TTS for cloning).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -42,14 +42,15 @@ Or run it ephemerally without installing: `uvx pyinfra inventory.py deploy.py`.
 Requires a reboot to activate — pyinfra rewrites `/etc/default/grub`
 and runs `update-grub`, but won't reboot for you.
 - `/models/<vendor>/` layout
-- `/srv/docker/{llama,vllm,ollama,openwebui,beszel,openlit,phoenix,openhands,homepage}/docker-compose.yml`
+- `/srv/docker/{llama,vllm,ollama,openwebui,beszel,openlit,phoenix,openhands,homepage,whisper,piper}/docker-compose.yml`
 dropped in, not auto-started — you edit the model path then
 `docker compose up -d`
 (OpenWebUI needs no edits — it's pre-configured to find Ollama at
 `host.docker.internal:11434` and uses `searxng.n0n.io` for web search)
 (Beszel hub at :8090, OpenLIT UI at :3001, Phoenix UI at :6006,
-OpenHands UI at :3030, Homepage at :7575 — see "Monitoring stack",
-"Agent harnesses", and "Front door" below)
+OpenHands UI at :3030, Homepage at :7575, Whisper STT at :10300,
+Piper TTS at :10200 — see "Monitoring stack", "Agent harnesses",
+"Front door", and "Voice" below)
 
 If a previous run installed the native llama.cpp build / full ROCm /
 native Ollama, those are auto-cleaned the next time `./run.sh` runs.
@@ -85,6 +86,31 @@ Compose images in
 (`services.yaml`, `settings.yaml`, etc.); edit there, `./run.sh`, restart
 the homepage container.
 
+## Voice
+
+Wyoming-protocol speech servers (Home Assistant's voice stack). No web
+UI — these are TCP protocol servers consumable by HA Assist, Wyoming
+satellites, or any Wyoming client.
+
+- **Whisper** (`/srv/docker/whisper`, `tcp://framework:10300`) —
+speech-to-text. Default model is `tiny-int8` (~75 MB). Models
+download into `/srv/docker/whisper/data/` on first run.
+
+- **Piper** (`/srv/docker/piper`, `tcp://framework:10200`) —
+text-to-speech. Default voice is `en_US-lessac-medium` (~63 MB).
+Browse alternatives at <https://github.com/rhasspy/piper/blob/master/VOICES.md>.
+
+Bring-up:
+```sh
+cd /srv/docker/whisper && docker compose up -d
+cd /srv/docker/piper && docker compose up -d
+```
+
+Both are conservative defaults sized for "small command-style usage";
+on Strix Halo's hardware budget you can run substantially better
+models. See [`localgenai/VoiceModels.md`](../../VoiceModels.md) for the
+landscape and upgrade options.
+
 ## Front door
 
 - **Homepage** (`/srv/docker/homepage`, http://framework:7575) — single
@@ -73,6 +73,21 @@
         server: localhost-docker
         container: phoenix
 
+- Voice:
+    # Wyoming-protocol services have no web UI; tiles are informational
+    # (container status + port). Click-through goes nowhere meaningful.
+    - Whisper:
+        icon: mdi-microphone-message
+        description: Speech-to-text (Wyoming :10300)
+        server: localhost-docker
+        container: wyoming-whisper
+
+    - Piper:
+        icon: mdi-account-voice
+        description: Text-to-speech (Wyoming :10200)
+        server: localhost-docker
+        container: wyoming-piper
+
 - External:
     - SearXNG:
         icon: searxng.svg
@@ -21,6 +21,9 @@ layout:
   Observability:
     style: row
     columns: 3
+  Voice:
+    style: row
+    columns: 3
   External:
     style: row
     columns: 3
22
pyinfra/framework/compose/piper.yml
Normal file
@@ -0,0 +1,22 @@
+# Wyoming Piper — text-to-speech over the Wyoming protocol.
+# https://github.com/rhasspy/wyoming-piper
+#
+# Wyoming is Home Assistant's voice protocol; it's also consumable by any
+# Wyoming client. No web UI — this is a protocol server on TCP :10200.
+#
+# Voice selection: en_US-lessac-medium is the most balanced English voice
+# (~63 MB, natural prosody). Browse alternatives at
+# https://github.com/rhasspy/piper/blob/master/VOICES.md — pulled into
+# /srv/docker/piper/data on first start.
+services:
+  piper:
+    image: rhasspy/wyoming-piper:latest
+    container_name: wyoming-piper
+    restart: unless-stopped
+    ports:
+      - "10200:10200"
+    volumes:
+      - /srv/docker/piper/data:/data
+    command:
+      - --voice
+      - en_US-lessac-medium
26
pyinfra/framework/compose/whisper.yml
Normal file
@@ -0,0 +1,26 @@
+# Wyoming Whisper — speech-to-text over the Wyoming protocol.
+# https://github.com/rhasspy/wyoming-whisper
+#
+# Wyoming is Home Assistant's voice protocol; it's also consumable by any
+# Wyoming client. No web UI — this is a protocol server on TCP :10300.
+#
+# Model selection: `tiny-int8` is the smallest viable model (~75 MB),
+# fast and good enough for command-style transcription. Bump to
+# `base-int8` (140 MB) or `small-int8` (480 MB) for general dictation.
+# Models are downloaded into /srv/docker/whisper/data on first start.
+services:
+  whisper:
+    image: rhasspy/wyoming-whisper:latest
+    container_name: wyoming-whisper
+    restart: unless-stopped
+    ports:
+      - "10300:10300"
+    volumes:
+      - /srv/docker/whisper/data:/data
+    command:
+      - --model
+      - tiny-int8
+      - --language
+      - en
+      - --beam-size
+      - "1"
@@ -339,6 +339,8 @@ for svc in (
     "phoenix",
     "openhands",
     "homepage",
+    "whisper",
+    "piper",
 ):
     files.directory(
         name=f"compose/{svc} dir",
@@ -470,6 +472,24 @@ for cfg in (
         _sudo=True,
     )
 
+# Voice stack — Wyoming-protocol Whisper (STT) and Piper (TTS). Models
+# are downloaded on first start; bind-mounting these dirs survives
+# container recreation.
+files.directory(
+    name="Whisper data dir",
+    path=f"{COMPOSE_DIR}/whisper/data",
+    group="docker",
+    mode="2775",
+    _sudo=True,
+)
+files.directory(
+    name="Piper data dir",
+    path=f"{COMPOSE_DIR}/piper/data",
+    group="docker",
+    mode="2775",
+    _sudo=True,
+)
+
 # --- Cleanup of artifacts from the prior native-build deploy ----------------
 # All idempotent — `present=False` is a no-op when the target is absent.
 
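Since neither Wyoming service has a web UI, a quick way to confirm the stack came up after `docker compose up -d` is to check that the two TCP ports accept connections. The following is a minimal sketch and not part of this commit: the port numbers come from the compose files above, the `framework` hostname is the one the README uses and is assumed to resolve on your network, and `port_open` / `check_voice_stack` are hypothetical helper names. It only tests TCP reachability and does not speak the Wyoming protocol.

```python
import socket

# Ports taken from whisper.yml and piper.yml in this commit.
WYOMING_SERVICES = {"whisper": 10300, "piper": 10200}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_voice_stack(host: str = "framework") -> dict[str, bool]:
    """Map each service name to whether its port is reachable on host.

    The default host "framework" is an assumption; pass your own hostname or IP.
    """
    return {name: port_open(host, port) for name, port in WYOMING_SERVICES.items()}
```

Calling `check_voice_stack()` from any machine that can reach the box returns a dict with one boolean per service. A `False` entry usually means the container is not running or the port is not published.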