# Build a Tool

How to add a new tool that the voice agent can call. Two patterns: sync (block until done, agent speaks the result) and async (return immediately, deliver later — the user can keep chatting).
## Sync tools — quick lookups

Use when: the tool returns in under ~3 seconds and the user wants the answer right away.
### Steps

- Pick a name.
- Add a `FunctionSchema` to `agent/tools.py`.
- Write the handler.
- Register it.
- Add a latency hint.
### Example: a sync weather tool
```python
# agent/tools.py
WEATHER_SCHEMA = FunctionSchema(
    name="weather",
    description="Get the current weather for a city. Returns a one-sentence summary.",
    properties={
        "city": {"type": "string", "description": "City name, e.g. 'Tokyo'"},
    },
    required=["city"],
)

async def weather_handler(params: FunctionCallParams) -> None:
    city = params.arguments.get("city", "").strip()
    # ... do the work, e.g. call a weather API ...
    result = f"It's 72 and partly cloudy in {city}."
    await params.result_callback(result)  # ← THE result. LLM speaks this.
```

Add to the registry inside `register_tools()`:
```python
llm.register_function(
    "weather",
    _wrap_sync(weather_handler),
    cancel_on_interruption=True,  # interrupt cancels the call
)
standard.append(WEATHER_SCHEMA)
```

And register a latency tier in `TOOL_LATENCY` near the top of `agent/tools.py`:
```python
TOOL_LATENCY: dict[str, Latency] = {
    ...
    "weather": Latency.MEDIUM,  # 0.5-3s expected
}
```

That's it. The LLM will:

- Emit a brief preamble ("hmm, let me check") via the TOOL USE prompt block
- Call `weather(city="Tokyo")`
- Get the result back
- Speak the answer
## Async tools — long-running investigations
Use when: the tool takes many seconds (or minutes) and the user shouldn't have to wait silently.
### The critical pattern
**Never call `result_callback` in the foreground for async tools.**

Pipecat treats whatever you pass to `result_callback()` as the finished result. If you call it in the foreground with a placeholder ("I'll get back to you"), the LLM thinks the tool is done with that as the answer — and will fabricate follow-up data about the topic.

The correct shape: do nothing in the foreground, spawn a background task, and call `result_callback` with the real result when the work completes.
### Example: an async `deep_dive` tool
```python
# agent/tools.py
DEEP_DIVE_SCHEMA = FunctionSchema(
    name="deep_dive",
    description=(
        "Kick off a long-running investigation (30s+). Use when the user "
        "doesn't need an immediate answer — they can keep chatting and "
        "the agent will share the result when it's ready."
    ),
    properties={
        "topic": {"type": "string", "description": "What to investigate"},
    },
    required=["topic"],
)

async def deep_dive_handler(params: FunctionCallParams) -> None:
    topic = params.arguments.get("topic", "")
    logger.info(f"[deep_dive] starting: {topic!r}")
    # NOTHING in the foreground. Pipecat auto-injects a "tool is running"
    # placeholder for the LLM context. The LLM's preamble is what the
    # user heard immediately ("alright, give me a few minutes").

    async def _background() -> None:
        result = await actually_do_the_research(topic)  # 30+ seconds
        # NOW call result_callback. Pipecat injects this as a developer
        # message and the LLM speaks the actual answer at the next
        # natural opportunity (typically the next user-silence).
        await params.result_callback(result)
        logger.info(f"[deep_dive] result_callback fired ({len(result)} chars)")

    asyncio.create_task(_background())
```

Register it with `cancel_on_interruption=False` (this is what makes it async):
```python
llm.register_function(
    "deep_dive",
    deep_dive_handler,  # NOTE: not wrapped in _wrap_sync
    cancel_on_interruption=False,  # ← async path; pipecat treats as deferred
)
standard.append(DEEP_DIVE_SCHEMA)
```

Mark it as async + slow in the metadata:
```python
ASYNC_TOOL_NAMES: frozenset[str] = frozenset({"slow_research", "deep_dive"})

TOOL_LATENCY: dict[str, Latency] = {
    ...
    "deep_dive": Latency.SLOW,  # 3s+ expected, generated progress fires
}
```

### What happens at runtime
- User asks "can you do a deep dive on quantum computing when you have a moment?"
- LLM emits inline preamble ("alright, give me a few minutes") — spoken immediately via the TOOL USE prompt block
- LLM calls `deep_dive(topic="quantum computing")`
- Handler returns immediately; pipecat marks the call `in_progress`
- User keeps chatting — agent answers normally
- 30 seconds later: the background task finishes and calls `result_callback(real_result)`
- Pipecat injects a developer message with the result
- LLM generates a response speaking the real answer; the pipeline streams it through TTS at the next natural pause
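The deferred-result shape can be exercised in isolation with a toy handler. This is a minimal sketch, not the real pipecat plumbing — `FakeParams` stands in for `FunctionCallParams`, and a short sleep stands in for the 30-second investigation:

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class FakeParams:
    """Stand-in for pipecat's FunctionCallParams, just enough for the demo."""
    arguments: dict
    results: list = field(default_factory=list)

    async def result_callback(self, result: str) -> None:
        self.results.append(result)

async def deep_dive_handler(params: FakeParams) -> None:
    topic = params.arguments.get("topic", "")

    async def _background() -> None:
        await asyncio.sleep(0.05)  # stands in for 30+ seconds of research
        await params.result_callback(f"Findings on {topic}")

    asyncio.create_task(_background())
    # Nothing else in the foreground: the handler returns before the result exists.

async def main() -> list[str]:
    params = FakeParams({"topic": "quantum computing"})
    await deep_dive_handler(params)
    assert params.results == []  # handler already returned; no result yet
    await asyncio.sleep(0.1)     # the user keeps chatting meanwhile...
    return params.results        # now the deferred result has landed

print(asyncio.run(main()))  # → ['Findings on quantum computing']
```

The two assertions are the whole point: the foreground returns with nothing, and the callback delivers the real answer later.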
## Latency tiers

Each tool registers a tier that gates filler behavior:

| Tier | When | Behaviour |
|---|---|---|
| `FAST` | <500ms | No preamble, no progress. The answer arrives faster than any filler could. (`calculator`, `get_datetime`) |
| `MEDIUM` | 0.5-3s | Inline preamble + answer. (`web_search`, `deep_research`, `a2a_dispatch`, `weather`) |
| `SLOW` | 3s+ | Inline preamble + periodic generated progress lines until the result lands. (`slow_research`, `deep_dive`) |

See `agent/tools.TOOL_LATENCY` for the registration table.
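The gating logic implied by the table can be sketched as follows. The `Latency` enum values mirror the tiers above, but the helper names (`wants_preamble`, `wants_progress`) are illustrative, not the actual API in `agent/tools.py`:

```python
from enum import Enum

class Latency(Enum):
    FAST = "fast"      # <500ms: the answer beats any filler
    MEDIUM = "medium"  # 0.5-3s: inline preamble only
    SLOW = "slow"      # 3s+: preamble plus periodic progress lines

TOOL_LATENCY: dict[str, Latency] = {
    "calculator": Latency.FAST,
    "weather": Latency.MEDIUM,
    "deep_dive": Latency.SLOW,
}

def wants_preamble(tool: str) -> bool:
    # Unregistered tools default to MEDIUM, per Common pitfalls below.
    return TOOL_LATENCY.get(tool, Latency.MEDIUM) is not Latency.FAST

def wants_progress(tool: str) -> bool:
    return TOOL_LATENCY.get(tool, Latency.MEDIUM) is Latency.SLOW
```

Note the default: a tool missing from the table behaves as MEDIUM, which is why FAST tools need an explicit entry.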
## Hand-off as a delegate (not a tool)

If your "tool" is actually delegating a question to another agent or a heavier model, don't write a new tool — add it to `config/delegates.yaml`. The agent already has `delegate_to(target, query)`, which handles both A2A agents and OpenAI-compat endpoints. You get the right behaviour for free, and the LLM picks based on the delegate's description (no prompt engineering required).

```yaml
delegates:
  - name: my_specialist
    description: "What this delegate is good at."
    type: a2a  # or openai
    url: ...
```

See Delegates for the full schema. Add a new specialist agent or a new heavier model — same tool, same code path.
## Conventions

- Handler is `async def` — even sync tools (the LLM call is async-only).
- Handler signature is `(params: FunctionCallParams) -> None` — the return value is ignored; speak via `result_callback`.
- Sync handlers MUST be wrapped with `_wrap_sync()` in `register_tools()` — that wrapper handles the `on_finish` hook (cancels the SLOW progress loop). Async handlers do NOT use the wrapper because their work continues after the function returns.
- Catch your own connection errors — see how `delegate_to` catches `httpx.ConnectError` to surface a friendly spoken message rather than crash the turn (see `agent/tools.py`).
- Tool args feed the inline preamble. The TOOL USE prompt instructs the LLM to ground its preamble in the topic, so naming args clearly (`query`, `city`, `topic`) helps natural acknowledgement.
- Don't mention tool names in persona prompts. The LLM picks tools from the schemas, not from prompt re-statements. A persona that says "use `web_search` for X" goes stale the moment you rename or remove that tool. Write personas that describe behavior — the schemas tell the LLM what's available.
## Common pitfalls

- Calling `result_callback` early in async tools. See the danger callout above. The LLM will hallucinate follow-ups.
- Forgetting `ASYNC_TOOL_NAMES`. If you register `cancel_on_interruption=False` but don't add the name to `ASYNC_TOOL_NAMES`, the progress loop keeps firing forever (the `on_finish` cancel hook never fires for async tools).
- Forgetting `TOOL_LATENCY`. Defaults to MEDIUM, which is usually fine — but FAST tools without an explicit hint will get a useless preamble for `15 * 1.2 + 3`.
- Not catching backend errors. Tools that throw raw exceptions break the LLM turn (pipecat surfaces an `ErrorFrame`). Catch and convert to a spoken error string via `result_callback`.
- Verbose results. The LLM speaks the raw `result_callback` payload. Keep it short (1-3 sentences) or instruct the LLM to summarize via the persona prompt.