Add OpenAI-compatible voice servers (faster-whisper + Kokoro)

Path B from VoiceModels.md: adds two new compose stacks alongside the Wyoming pair so OpenWebUI/Conduit get voice without a Wyoming shim:

- compose/faster-whisper.yml: fedirz/faster-whisper-server CPU image, large-v3-turbo by default, OpenAI /v1/audio/transcriptions on :8001. Built-in web UI for ad-hoc transcription.
- compose/kokoro.yml: ghcr.io/remsky/kokoro-fastapi-cpu, Kokoro-82M, OpenAI /v1/audio/speech on :8880.

Both run alongside (not instead of) Wyoming Whisper + Piper: Wyoming keeps serving HA Assist, OpenAI-API serves OpenWebUI / Conduit. Memory budget on Strix Halo accommodates everything plus Qwen3-Coder loaded concurrently with plenty of headroom.

Homepage gets dedicated tiles for both. README documents the OpenWebUI Audio configuration that wires the new endpoints. Conduit inherits voice via OpenWebUI without app-side setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
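
For a quick check of the two endpoints named above, a minimal Python sketch might look like the following. The ports and paths are the ones stated in this commit message; the host (`localhost`), the default `model` and `voice` values, and the helper names `build_speech_request` and `send_to_file` are assumptions made for illustration and are not taken from the compose files.

```python
import json
import urllib.request

# Hypothetical host; the ports and paths come from the commit message.
HOST = "localhost"
STT_URL = f"http://{HOST}:8001/v1/audio/transcriptions"  # faster-whisper stack
TTS_URL = f"http://{HOST}:8880/v1/audio/speech"          # Kokoro stack


def build_speech_request(text, voice="af_bella", model="kokoro"):
    """Build (but do not send) an OpenAI-style text-to-speech request.

    The default voice and model names are assumptions, not values
    confirmed by this commit; adjust them to what the server exposes.
    """
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": "mp3",
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_to_file(request, path, timeout=60):
    """Send a prepared request and write the response body to `path`."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        with open(path, "wb") as out:
            out.write(resp.read())
```

With the Kokoro stack running, `send_to_file(build_speech_request("Voice check"), "check.mp3")` should write an audio file if the assumed model and voice names match the server. The transcription endpoint at `STT_URL` expects a multipart file upload instead of JSON, which is omitted here to keep the sketch short.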
2026-05-08 14:42:45 -04:00
parent 36b8cfe835
commit 6db46d8f6a
7 changed files with 157 additions and 27 deletions
--- a/VoiceModels.md
+++ b/VoiceModels.md
@@ -163,6 +163,12 @@ weekend project (gives you voice in the existing browser UIs without
 forcing the Wyoming dependency on every consumer). Skip C/D until you
 have a concrete use case that demands them.

+**Status:** Path B is now live in pyinfra — `compose/faster-whisper.yml`
+and `compose/kokoro.yml` deploy alongside Wyoming. Wire OpenWebUI's
+Audio settings at them per the "Wiring OpenWebUI for voice" section in
+[`pyinfra/framework/README.md`](pyinfra/framework/README.md). The
+Conduit Android app inherits voice transitively through OpenWebUI.
+
 ## Reference catalogs

 - [Hugging Face TTS model leaderboard](https://huggingface.co/spaces/TTS-AGI/TTS-Arena) — blind preference scores across SOTA TTS