# localgenai/pyinfra/framework/compose/faster-whisper.yml
# faster-whisper-server — OpenAI-compatible STT.
# https://github.com/fedirz/faster-whisper-server
#
# Speaks `/v1/audio/transcriptions` (and `/v1/audio/translations`) so any
# client that talks to OpenAI's audio API works without changes —
# OpenWebUI, Conduit (via OpenWebUI), arbitrary scripts.
#
# Runs alongside (not instead of) Wyoming Whisper. Wyoming stays for
# Home Assistant Assist; this server is for OpenAI-API consumers.
#
# CPU mode: Strix Halo's 16 Zen 5 cores comfortably real-time even on
# large-v3-turbo. CTranslate2's ROCm support for gfx1151 is unreliable;
# CPU sidesteps that.
services:
  faster-whisper:
    image: fedirz/faster-whisper-server:latest-cpu
    container_name: faster-whisper
    restart: unless-stopped
    ports:
      # Host 8001 → container 8000. Quoted so YAML doesn't misparse the
      # colon-separated pair (sexagesimal trap for low port numbers).
      - "8001:8000"
    environment:
      # Default model loaded on first request. Auto-downloads on use.
      WHISPER__MODEL: Systran/faster-whisper-large-v3-turbo
      WHISPER__INFERENCE_DEVICE: cpu
      WHISPER__COMPUTE_TYPE: int8
      # Built-in web UI at /. Quoted string: env values are strings, not booleans.
      ENABLE_UI: "true"
    volumes:
      # Persist model downloads across container recreates.
      - /srv/docker/faster-whisper/cache:/root/.cache/huggingface