Path B from VoiceModels.md — adds two new compose stacks alongside the Wyoming pair so OpenWebUI/Conduit get voice without a Wyoming shim:

- compose/faster-whisper.yml — fedirz/faster-whisper-server CPU image, large-v3-turbo by default, OpenAI /v1/audio/transcriptions on :8001. Built-in web UI for ad-hoc transcription.
- compose/kokoro.yml — ghcr.io/remsky/kokoro-fastapi-cpu, Kokoro-82M, OpenAI /v1/audio/speech on :8880.

Both run alongside (not instead of) Wyoming Whisper + Piper — Wyoming keeps serving HA Assist, the OpenAI API serves OpenWebUI / Conduit. The memory budget on Strix Halo accommodates everything plus Qwen3-Coder loaded concurrently, with plenty of headroom.

Homepage gets dedicated tiles for both. The README documents the OpenWebUI Audio configuration that wires the new endpoints; Conduit inherits voice via OpenWebUI without app-side setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
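Only kokoro.yml appears in the file view below. For orientation, here is a minimal sketch of what the companion compose/faster-whisper.yml plausibly contains, assembled from the bullet above; the image tag, container listen port, environment variable name, model id, and volume path are assumptions taken from the upstream fedirz/faster-whisper-server README and from the conventions in kokoro.yml, not from the committed file itself.

# faster-whisper-server sketch (assumed contents, not the committed file).
# https://github.com/fedirz/faster-whisper-server
services:
  faster-whisper:
    image: fedirz/faster-whisper-server:latest-cpu
    container_name: faster-whisper
    restart: unless-stopped
    environment:
      # Env var name and model id are assumptions from the upstream README;
      # the commit message only says "large-v3-turbo by default".
      - WHISPER__MODEL=deepdml/faster-whisper-large-v3-turbo-ct2
    ports:
      # Host :8001 per the commit message; 8000 is assumed to be the
      # image's default listen port.
      - "8001:8000"
    volumes:
      # Persist downloaded model weights across container recreation,
      # mirroring the /srv/docker layout used by kokoro.yml.
      - /srv/docker/faster-whisper/models:/root/.cache/huggingface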
compose/kokoro.yml (21 lines, 782 B, YAML)
# Kokoro-FastAPI — OpenAI-compatible TTS in front of Kokoro-82M.
# https://github.com/remsky/Kokoro-FastAPI
#
# Speaks `/v1/audio/speech`. Pair with faster-whisper-server for a full
# OpenAI-compatible voice loop driving OpenWebUI / Conduit.
#
# Kokoro-82M (hexgrad, Jan 2025) is small (~340 MB) but produces
# noticeably more natural prosody than Piper. Apache 2.0 licence.
# CPU image is plenty fast for this model size on Strix Halo.

services:
  kokoro:
    image: ghcr.io/remsky/kokoro-fastapi-cpu:latest
    container_name: kokoro
    restart: unless-stopped
    ports:
      # 8880 is Kokoro-FastAPI's own default — keeping the same on the
      # host side so docs/tutorials line up.
      - "8880:8880"
    volumes:
      - /srv/docker/kokoro/models:/app/api/src/models
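The README text covering the OpenWebUI Audio configuration is not part of this diff. As a hedged sketch of the wiring it describes, the same endpoints can be set via Open WebUI's documented audio environment variables in its own compose service; the variable names follow Open WebUI's configuration docs, while the hostname, API-key placeholder, and model/voice ids are assumptions for illustration.

# Open WebUI audio wiring sketch (assumed values, not the repo's README).
# The same settings can be entered in Admin Settings -> Audio instead.
# "host.docker.internal" stands in for wherever the two services resolve
# from the OpenWebUI container; adjust to your network setup.
services:
  openwebui:
    environment:
      # STT -> faster-whisper-server (OpenAI-compatible /v1/audio/transcriptions)
      - AUDIO_STT_ENGINE=openai
      - AUDIO_STT_OPENAI_API_BASE_URL=http://host.docker.internal:8001/v1
      - AUDIO_STT_OPENAI_API_KEY=none   # endpoints are unauthenticated here
      # TTS -> Kokoro-FastAPI (OpenAI-compatible /v1/audio/speech)
      - AUDIO_TTS_ENGINE=openai
      - AUDIO_TTS_OPENAI_API_BASE_URL=http://host.docker.internal:8880/v1
      - AUDIO_TTS_OPENAI_API_KEY=none
      - AUDIO_TTS_MODEL=kokoro          # model/voice ids are assumptions
      - AUDIO_TTS_VOICE=af_bella

Because Conduit talks to OpenWebUI's API rather than to the speech services directly, this single configuration is what lets it inherit both STT and TTS with no app-side setup.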