HTTP API

Served by the protovoice FastAPI app on PORT (default 7866).

Authentication

Every /api/* route requires an API key identifying the caller. Send one of:

http

X-API-Key: <key>
Authorization: Bearer <key>

Keys are resolved to users via the roster loaded from Infisical (when INFISICAL_CLIENT_ID + INFISICAL_CLIENT_SECRET + INFISICAL_PROJECT_ID are set) or config/users.yaml as a fallback. See Personas & Skills → Users.
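A minimal client-side sketch of the two header forms above. The helper name `auth_headers` and its `scheme` parameter are illustrative and not part of protoVoice.

```python
from __future__ import annotations


def auth_headers(key: str, scheme: str = "x-api-key") -> dict[str, str]:
    """Build one of the two header forms accepted by the /api/* routes."""
    if not key:
        raise ValueError("API key must be non-empty")
    if scheme == "x-api-key":
        return {"X-API-Key": key}
    if scheme == "bearer":
        return {"Authorization": f"Bearer {key}"}
    raise ValueError(f"unknown scheme: {scheme!r}")
```

Either dictionary can be passed as the `headers` argument of whichever HTTP client you use.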

When no user source is configured (empty registry), protoVoice operates in single-user fallback mode: every request resolves to a synthetic default user regardless of credentials. This keeps local development and existing tailnet-only deployments working unchanged. As soon as the registry contains at least one user, authentication is enforced.
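The fallback rule can be sketched as a small resolution function. This is an illustration of the behaviour described above, not the actual protoVoice code: the `User` class, the `"default"` id, and the key-to-user dictionary are assumptions. The default user's admin role follows the `auth_source` notes under `GET /api/whoami`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    id: str
    role: str = "user"


# Hypothetical stand-in for the synthetic default user.
DEFAULT_USER = User(id="default", role="admin")


def resolve_user(registry: dict[str, User], api_key: Optional[str]) -> Optional[User]:
    """Map an API key to a user; None means the request should be rejected."""
    if not registry:
        # Empty registry: single-user fallback, credentials are ignored.
        return DEFAULT_USER
    if api_key is None:
        return None
    return registry.get(api_key)
```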

GET /healthz, GET /, and the PWA static assets stay public (no auth). GET /.well-known/agent-card.json + POST /a2a use their own A2A_AUTH_TOKEN shared-secret scheme — see A2A Integration.

`GET /api/whoami`

Returns the caller's resolved identity, role, skill access list, and any admin-set viz override. Clients use this to confirm that their API key is valid, display the user's name, and branch the UI for admin-only controls.

json

{
  "id": "alice",
  "display_name": "Alice",
  "role": "user",
  "allowed_skills": ["josh", "chef"],
  "pinned_viz": { "palette": "Noir" },
  "auth_source": "infisical"
}

role — "user" (default) or "admin". Admins see the full drawer (Voice + Orb tabs, admin CRUD when shipped); users see Voice only.
allowed_skills — list of skill slugs the user can activate, or null for unconstrained. A one-element list causes the client to render a read-only chip in place of the dropdown.
pinned_viz — optional {variant?, palette?, params?} block that overrides the active skill's viz. null when unset.
auth_source — "infisical", "file", or "empty" (single-user fallback; the DEFAULT user resolves as admin).
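As a rough sketch of how a client might turn this payload into UI decisions, the function below derives three flags from the fields described above. The name `ui_flags` and the flag names are hypothetical.

```python
import json


def ui_flags(whoami: dict) -> dict:
    """Derive client UI decisions from a /api/whoami response body."""
    allowed = whoami.get("allowed_skills")
    return {
        # Admins see the Orb tab in addition to Voice.
        "show_orb_tab": whoami.get("role") == "admin",
        # A one-element access list means a read-only chip, not a dropdown.
        "skill_locked": allowed is not None and len(allowed) == 1,
        # None when no admin-set override exists.
        "pinned_viz": whoami.get("pinned_viz"),
    }


sample = json.loads("""
{
  "id": "alice",
  "display_name": "Alice",
  "role": "user",
  "allowed_skills": ["josh", "chef"],
  "pinned_viz": {"palette": "Noir"},
  "auth_source": "infisical"
}
""")
flags = ui_flags(sample)
```

For the sample response, `flags["show_orb_tab"]` and `flags["skill_locked"]` are both false, and `flags["pinned_viz"]` carries the `Noir` palette override.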

`POST /api/users/reload`

Re-fetches the user roster from the active source (Infisical or config/users.yaml). Active clients keep their authenticated state until they reconnect; new connections use the refreshed registry. Returns {"ok": true, "users": [...], "source": "..."}.

`GET /`

Returns the browser client HTML. Single-page; click Start to connect.

`GET /healthz`

Public — no auth. Reports process-wide shape, not per-user state.

json

{
  "status": "ok",
  "stt_backend": "local",
  "tts_backend": "fish",
  "auth_source": "infisical",
  "user_count": 3,
  "active_sessions": 1,
  "delegates": [{"name": "ava", "type": "openai"}],
  "skills": ["default", "chef", "voice-01"],
  "audio": {
    "half_duplex": false,
    "echo_guard_ms": 300,
    "noise_filter": "off",
    "smart_turn": "off"
  }
}

auth_source — "infisical", "file", or "empty" (single-user fallback)
user_count — size of the roster (0 in single-user mode)
active_sessions — users with a live voice connection right now

Used by Docker HEALTHCHECK and external monitoring.
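A monitoring probe could look roughly like the sketch below. It assumes the default port and treats anything other than `"status": "ok"` as unhealthy. Other status values are not documented here, so that rule is an assumption, as are the function names.

```python
import json
import urllib.request


def is_healthy(body: bytes) -> bool:
    """True when the /healthz body parses as JSON and reports status 'ok'."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def probe(url: str = "http://localhost:7866/healthz", timeout: float = 3.0) -> bool:
    """Fetch /healthz and evaluate it; network failures count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except OSError:
        return False
```

A health-check script would call `probe()` and exit non-zero when it returns false.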

`GET /api/verbosity`

json

{"verbosity":"brief"}

Scoped to the calling user. Each user has their own verbosity setting.

`POST /api/verbosity`

http

POST /api/verbosity
Content-Type: application/json

{"level":"narrated"}

Accepts silent / brief / narrated / chatty. Returns the new value or {"error":"..."} on invalid input.

Session-level (module singleton for now); shared across all connected clients. Per-session keying is planned for multi-tenant.
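The sketch below builds the POST request with the standard library and rejects unknown levels on the client before anything is sent. The function name and the choice of the `X-API-Key` header are illustrative.

```python
import json
import urllib.request

LEVELS = ("silent", "brief", "narrated", "chatty")


def verbosity_request(base_url: str, api_key: str, level: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/verbosity request."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}, got {level!r}")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/verbosity",
        data=json.dumps({"level": level}).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

Pass the returned object to `urllib.request.urlopen` to send it.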

`GET /api/skills`

json

{
  "active": "chef",
  "locked": false,
  "skills": [
    {"slug": "default", "name": "Default", "description": "", "viz": {}},
    {
      "slug": "chef",
      "name": "Chef Bruno",
      "description": "An Italian-American chef...",
      "viz": {
        "variant": "fractal",
        "palette": "Ember",
        "params": {"density": 2.4, "speed": 0.45}
      }
    }
  ]
}

active — the resolved skill for the calling user. For users with allowed_skills set, always falls inside that list (snaps to the first entry when mutable state drifts outside).
locked — true when the caller's allowed_skills has exactly one entry. Clients use this to render a read-only chip in place of the skill dropdown.
skills[] — filtered to the caller's allowed_skills for non-admins. Admins (and unconstrained users) see the full catalog.
skills[].viz — optional orb viz block. The client applies it on skill switch (priority: whoami.pinned_viz → skill.viz → nothing). See Orb visualizer.
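The two rules above, the viz priority chain and the snap-to-first behaviour of `active`, can be sketched as follows. Both function names are hypothetical. The sketch assumes a pinned viz replaces the skill's viz block as a whole rather than merging with it, which the text does not specify.

```python
from __future__ import annotations

from typing import Optional


def resolve_viz(whoami: dict, skill: dict) -> Optional[dict]:
    """Client side: whoami.pinned_viz wins, then skill.viz, then nothing."""
    return whoami.get("pinned_viz") or skill.get("viz") or None


def resolve_active(current: str, allowed: Optional[list[str]]) -> str:
    """Server side: keep the current slug unless it falls outside allowed_skills."""
    if allowed and current not in allowed:
        return allowed[0]
    return current
```

An empty `viz` block, as on the `default` skill in the example response, is treated the same as no viz at all.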

`POST /api/skills`

http

POST /api/skills
Content-Type: application/json

{"slug": "chef"}

Applies on the next Start click — skill is snapshotted at connect time. Returns {"error":"..."} if the slug is unknown.

Returns HTTP 403 with detail "skill '<slug>' not in allowed_skills for user '<id>'" when a non-admin caller posts a slug outside their allowed_skills. Admins can change any user's skill via POST /api/admin/skills below.
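The access rule behind the 403 can be sketched as a single check. This is an illustration, not the server's code: the function name is hypothetical, and a `PermissionError` stands in for the HTTP 403 response. The message mirrors the documented `detail` string.

```python
from __future__ import annotations

from typing import Optional


def authorize_skill(user_id: str, role: str, allowed: Optional[list[str]], slug: str) -> None:
    """Raise PermissionError (standing in for HTTP 403) when the switch is blocked."""
    # Admins and unconstrained users (allowed_skills is null) may pick any skill.
    if role == "admin" or allowed is None or slug in allowed:
        return
    raise PermissionError(
        f"skill '{slug}' not in allowed_skills for user '{user_id}'"
    )
```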

`POST /api/admin/skills`

Admin-only. Sets any user's mutable active skill, bypassing the skill selector UI.

http

POST /api/admin/skills
Content-Type: application/json

{"user_id": "alice", "slug": "chef"}

json

{"ok": true, "user_id": "alice", "active": "chef"}

Requires the caller's role to be admin (returns 403 otherwise).
Updates the target's in-memory UserState.skill_slug. Applies on their next connect; active sessions keep their snapshotted skill.
Does not modify allowed_skills — for persistent access-list changes, edit config/users.yaml (or the USERS_YAML Infisical secret) and POST /api/users/reload.
Returns {"error": "..."} for unknown user_id or unknown slug.

`POST /api/skills/reload`

http

POST /api/skills/reload

Re-reads every config/skills/*.yaml + SOUL.md from disk. Active sessions keep their captured skill snapshot until they reconnect. Returns {"ok": true, "skills": [...], "active": "<slug>"}.

`POST /api/delegates/reload`

http

POST /api/delegates/reload

Re-reads config/delegates.yaml from disk. Safe mid-session — delegate lookup happens per delegate_to() call, so in-flight sessions see the new registry on their next dispatch. Returns {"ok": true, "delegates": [...]}.

`GET /api/voice/references`

json

{"backend": "fish", "references": ["josh_sample_1", "voice_01", "voice_02"]}

Returns the Fish server's saved reference IDs. Empty list if TTS_BACKEND isn't fish.

`POST /api/voice/clone`

Multipart upload — clones a new voice AND creates a skill pointing at it.

http

POST /api/voice/clone
Content-Type: multipart/form-data

audio:       <file>       required — 10-30 s WAV/MP3/FLAC/OGG
slug:        <string>     required — lowercase [a-z0-9\-_] 2-64 chars
name:        <string>     optional — defaults to title-cased slug
transcript:  <string>     optional — omit to auto-transcribe via Whisper
description: <string>     optional

Response:

json

{
  "ok": true,
  "slug": "alex",
  "name": "Alex",
  "transcript": "...",
  "auto_transcribed": true
}

Side effects:

Saved reference on the Fish server (POST /v1/references/add)
New YAML at config/skills/<slug>.yaml pointing at the reference, reusing SOUL.md for the system prompt
In-memory skill dict reloaded — the dropdown picks up the new skill immediately

Returns {"error": "..."} if: the slug collides with an existing skill, Whisper produces empty text, Fish rejects the reference, etc.
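Clients can pre-validate the `slug` field before uploading audio. The sketch below encodes the documented character set and length. The `default_name` helper is a guess at what "title-cased slug" means, and its treatment of `-` and `_` as word separators is an assumption.

```python
import re

# Lowercase letters, digits, hyphen, underscore; 2 to 64 characters.
SLUG_RE = re.compile(r"[a-z0-9_-]{2,64}")


def is_valid_slug(slug: str) -> bool:
    """Check a slug against the documented pattern for POST /api/voice/clone."""
    return SLUG_RE.fullmatch(slug) is not None


def default_name(slug: str) -> str:
    """Assumed fallback display name: separators become spaces, then title-case."""
    return re.sub(r"[-_]+", " ", slug).title()
```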

`POST /api/offer`

Pipecat SmallWebRTCTransport signalling. Accepts the SDP offer, instantiates a SmallWebRTCConnection, returns the SDP answer + pc_id.

http

POST /api/offer
Content-Type: application/json

{
  "sdp": "v=0\r\no=...",
  "type": "offer"
}

Response:

json

{
  "sdp": "v=0\r\no=...",
  "type": "answer",
  "pc_id": "SmallWebRTCConnection#0-<uuid>"
}

See WebRTC Protocol for the full handshake.

`PATCH /api/offer`

Trickle ICE candidates. Required after the POST answer returns with a pc_id.

http

PATCH /api/offer
Content-Type: application/json

{
  "pc_id": "SmallWebRTCConnection#0-<uuid>",
  "candidates": [
    {
      "candidate": "candidate:1 1 udp 2113667326 192.168.1.10 58387 typ host",
      "sdp_mid": "0",
      "sdp_mline_index": 0
    }
  ]
}

Returns {"status":"success"} when the candidates are accepted, or 404 if pc_id is unknown (e.g. the connection has expired).
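Browsers serialize ICE candidates with camelCase keys (`sdpMid`, `sdpMLineIndex`), while the body above uses snake_case. The sketch below shows one way a client could map between the two. The function name is hypothetical, and dropping empty end-of-candidates entries is an assumption rather than documented behaviour.

```python
from __future__ import annotations


def to_patch_body(pc_id: str, browser_candidates: list[dict]) -> dict:
    """Convert RTCIceCandidate-style dicts into a PATCH /api/offer body."""
    return {
        "pc_id": pc_id,
        "candidates": [
            {
                "candidate": c["candidate"],
                "sdp_mid": c.get("sdpMid"),
                "sdp_mline_index": c.get("sdpMLineIndex"),
            }
            for c in browser_candidates
            if c.get("candidate")  # skip empty end-of-candidates markers
        ],
    }
```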

`GET /.well-known/agent-card.json`

A2A agent card. Aliased as /.well-known/agent.json for legacy callers. Cached one minute (Cache-Control: public, max-age=60).

`POST /a2a`

JSON-RPC 2.0 inbound. Supports method: "message/send". When A2A_AUTH_TOKEN is set, the request must carry the token as either X-API-Key: <A2A_AUTH_TOKEN> or Authorization: Bearer <A2A_AUTH_TOKEN>.

Returns an A2A task result with artifacts[0].parts[0] containing the assistant text. See A2A Integration for the full flow.
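A hedged sketch of building a `message/send` request and reading the reply. The message shape (`role`, `parts` with `kind: "text"`, `messageId`) follows the public A2A specification as best understood and is not confirmed by this page. The `text` key on the returned part is likewise an assumption. Both function names are illustrative.

```python
import uuid
from typing import Optional


def message_send(text: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 envelope for the A2A message/send method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "message/send",
        "params": {
            "message": {
                "role": "user",
                "parts": [{"kind": "text", "text": text}],
                "messageId": uuid.uuid4().hex,
            }
        },
    }


def first_artifact_text(response: dict) -> Optional[str]:
    """Read result.artifacts[0].parts[0].text, or None if the shape differs."""
    try:
        return response["result"]["artifacts"][0]["parts"][0]["text"]
    except (KeyError, IndexError, TypeError):
        return None
```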

`POST /a2a/callback`

Receives push-notification results from agents we dispatched to. Accepts flexible shapes; looks for text at body.text, body.result.artifacts[].parts[].text, or body.message.parts[].text.

When an active voice session exists, speaks the result via the DeliveryController with next_silence policy. Otherwise returns {"ok": true, "delivered": false, "reason": "no active session"}.
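The flexible lookup described above can be sketched as below. The order in which the three locations are tried, and joining multiple parts with a space, are assumptions. The function names are hypothetical.

```python
from typing import Any, Optional


def _parts_text(parts: Any) -> Optional[str]:
    """Join the text of every part that has a string 'text' field."""
    if not isinstance(parts, list):
        return None
    texts = [
        p["text"]
        for p in parts
        if isinstance(p, dict) and isinstance(p.get("text"), str)
    ]
    return " ".join(texts) if texts else None


def find_text(body: dict) -> Optional[str]:
    """Look for callback text in the three documented locations."""
    if isinstance(body.get("text"), str):
        return body["text"]
    result = body.get("result")
    if isinstance(result, dict):
        for artifact in result.get("artifacts") or []:
            if isinstance(artifact, dict):
                text = _parts_text(artifact.get("parts"))
                if text:
                    return text
    message = body.get("message")
    if isinstance(message, dict):
        return _parts_text(message.get("parts"))
    return None
```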

`GET /static/*`

Static assets from ./static/. Currently used only for the browser client.
