How-To Guides
Task-oriented. Each page assumes you already know what you want to do — if you're looking for orientation, start with Tutorials.
TTS
- Switch TTS Backend — Fish ↔ Kokoro
- Clone a Voice — upload a reference clip and speak in that voice
- Run Without the Fish Sidecar — Kokoro-only, single container
LLM
- Use an External LLM — point at a gateway, a remote vLLM, or OpenAI
All-API setup
- Use LocalAI (all-API) — swap STT + TTS + LLM to OpenAI-compat endpoints; run protoVoice on a CPU box
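The all-API setup works because each component speaks the OpenAI-compatible HTTP API, so swapping backends is just a matter of changing a base URL. A minimal stdlib-only sketch of what such a request looks like — the endpoint URL and model name below are placeholders, not protoVoice defaults:

```python
import json
import urllib.request


def chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completion request for any OpenAI-compatible server.

    Repointing at LocalAI, a remote vLLM, or OpenAI itself is just a
    different base_url; the payload shape stays the same.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint: an OpenAI-compatible server on the same box.
req = chat_request("http://localhost:8080", "gpt-4o-mini", "ping")
print(req.full_url)
```

The same pattern covers the STT and TTS endpoints, each behind its own path on the gateway.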
Voice agent behaviour
- Configure Verbosity — tune filler chattiness (silent / brief / narrated / chatty)
- Backchannels — listener-acks ("mm-hmm") during long user turns
- Delivery Policies — `now` / `next_silence` / `when_asked` for async tool results
- Personas & Skills — swap voice + system prompt per skill YAML
- Users & API Keys — configure the auth roster (Infisical or `config/users.yaml`)
- Audio Handling — echo guard, half-duplex, noise filter, smart-turn
Extending the agent
- Build a Tool — sync vs async patterns, latency tiers, the result_callback gotcha
Fleet integration
- A2A Integration — inbound JSON-RPC + callback webhook + outbound dispatch
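Inbound A2A traffic is plain JSON-RPC, so the envelope shape is fixed by the JSON-RPC 2.0 spec even though the method name used below is hypothetical, not one of protoVoice's actual methods. A sketch of building a request and its matching response:

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids


def jsonrpc_request(method: str, params: dict) -> str:
    """Serialise a JSON-RPC 2.0 request (spec fields: jsonrpc/method/params/id)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": next(_ids),
    })


def jsonrpc_result(request: str, result) -> str:
    """Build the matching success response; it must echo the request id."""
    req_id = json.loads(request)["id"]
    return json.dumps({"jsonrpc": "2.0", "result": result, "id": req_id})


# Hypothetical method name, for illustration only.
req = jsonrpc_request("agent.dispatch", {"task": "summarise"})
print(jsonrpc_result(req, {"status": "accepted"}))
```

Async results arriving later over the callback webhook follow the same id-matching rule, which is what lets the caller correlate them with the original dispatch.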
Ops
- Benchmarking — measure LLM / TTS / STT / A2A latency with `scripts/bench.py`
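`scripts/bench.py` is the project's own tool; as a generic illustration of the kind of wall-clock measurement involved, here is a stdlib-only sketch (the stage function is a stand-in, not a real protoVoice call):

```python
import statistics
import time


def bench(fn, *, runs: int = 5):
    """Time fn over several runs; return (median_ms, worst_ms).

    Median resists one-off outliers (cold caches, GC pauses); the worst
    case matters for interactive latency budgets.
    """
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples), max(samples)


# Stand-in for one stage round trip (e.g. a single STT request).
def fake_stage():
    time.sleep(0.01)


median_ms, worst_ms = bench(fake_stage)
print(f"median={median_ms:.1f}ms worst={worst_ms:.1f}ms")
```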