Run Without the Fish Sidecar

For single-GPU hosts, or when you don't need voice cloning, skip the Fish Audio container entirely.

One-service start

bash

TTS_BACKEND=kokoro docker compose up -d protovoice

This starts protovoice alone; the fish-speech container is not started.
Kokoro runs in-process inside the protovoice container (~2 GB VRAM, no GPU sharing penalty).
FISH_URL is ignored.
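
To confirm that only one service came up, a check along the following lines may help. This is a minimal sketch: `only_protovoice` is a hypothetical helper, not part of the project, and it assumes the Compose service is named `protovoice` as in the command above.

```bash
# Succeeds only if the service names on stdin are exactly "protovoice".
# Hypothetical helper for illustration; not shipped with the project.
only_protovoice() {
  # Deduplicate the names and join them on one line for a single comparison.
  services="$(sort -u | tr '\n' ' ')"
  [ "$services" = "protovoice " ]
}

# Usage, run from the directory containing the compose file:
#   docker compose ps --services --status running | only_protovoice \
#     && echo "Kokoro-only: no sidecar is running"
```

If the check fails, `docker compose ps` shows which other services are still up.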

This configuration fits on a single card with 24 GB or more of VRAM (RTX 3090, 4090, PRO 6000, A100, H100, etc.).

To select a Kokoro voice, set KOKORO_VOICE at startup:

bash

TTS_BACKEND=kokoro KOKORO_VOICE=am_michael docker compose up -d protovoice

Common voices:

The full list is on the Kokoro Hugging Face model card.
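
To avoid retyping the variables on every start, they could be kept in a `.env` file next to the compose file, which Docker Compose reads automatically for variable substitution. This sketch assumes the project's compose file picks up `TTS_BACKEND` and `KOKORO_VOICE` through that mechanism, which the source does not confirm; the values are the ones used in the commands above.

```bash
# .env (same directory as the compose file)
# Assumption: the compose file interpolates these two variables.
TTS_BACKEND=kokoro
KOKORO_VOICE=am_michael
```

With this file in place, `docker compose up -d protovoice` should behave like the longer command. Variables set in the shell still take precedence over `.env`.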

If you care about turn latency and don't need cloning, Kokoro is often the better choice.