0024 — Spawn CLI coding agents over ACP (code_with)
Status: Accepted (PR1 — thin vertical)
Context
protoAgent's lead agent already delegates in two ways: to in-process LLM subagents via task() (DeerFlow pattern, graph/subagents/) and to remote A2A peers via peer_consult (tools/peer_tools.py). Both are talk to another model delegations — neither can pick up a repo, edit files, run the test suite, and hand back a diff.
There is a whole class of work that wants exactly that: "add a /healthz route and run the tests", "fix the failing import in server/chat.py". A purpose-built CLI coding agent — protoCLI (proto), Claude Code, Codex, Gemini CLI — does this far better than a generic tool loop, because it carries its own file access, shell, repo-map, edit/verify harness, and approval UX.
The protoLabs companion stack already solved this. ORBIS (the Python voice companion) is an ACP client: it launches a coding agent as a subprocess and drives one session over the Agent Client Protocol — JSON-RPC 2.0, newline-delimited, on the child's stdin/stdout. The same delegate_to registry routes a2a / openai / acp delegate types. protoCLI, on the other side, speaks the matching server role (proto --acp).
This ADR brings the ACP-client leg into protoAgent so our agents can spawn CLI coding agents. We do not port ORBIS's whole delegate_to registry — protoAgent already covers the a2a and openai legs differently (peers + subagents + the LiteLLM gateway). The new capability is just the ACP one.
Decision
Ship ACP-client support as a first-party, opt-in plugin (plugins/coding_agent), not a core tool. It contributes one tool — code_with(agent, task) — backed by a small ACP client (a port of ORBIS's acp/client.py).
Why a plugin, not core
- The plugin seam already gives config + secrets + Settings + enable/disable for free, with zero core edits — matching the operator-fork contract (ADR 0019) and the Discord/Google precedent (ADR 0018).
- Spawning a coding agent with file + shell access in a workdir is a real authority delegation. It should be off by default and explicitly opted into, exactly like the shipped
helloexample plugin (enabled: false). - Not every fork wants this. A plugin keeps the default tool surface lean (ADR 0005, tool pollution).
Shape
plugins/coding_agent/
protoagent.plugin.yaml # config_section: coding_agent; agents: []; enabled: false
acp_client.py # AcpClient — JSON-RPC 2.0 over the child's stdio
__init__.py # register(): builds code_with from configured agentsConfig (a top-level coding_agent section, ADR 0019):
coding_agent:
default_timeout_s: 600 # coding is slow; per-agent override available
agents:
- name: proto # the name the LLM passes to code_with(agent=…)
command: proto # binary on PATH
args: ["--acp"] # ACP server mode
workdir: ~/dev/my-repo # session cwd — the confinement boundary
# env: { FOO: bar } # optional extra env (merged over the process env)
# timeout_s: 900 # optional per-agent overrideThe tool the lead agent sees:
code_with(agent="proto", task="add a /healthz route and run the tests")
→ the agent's final message text (the work happens in its own session)Confinement & permission posture
- Workdir is config-pinned.
code_withtakes onlyagent+task— never a caller-chosen path. The cwd comes from the matched config entry, so the LLM cannot point a coding agent at an arbitrary directory. Workdirs must be listed in config; an unknownagentreturns an error listing the configured ones. - By-kind permission policy (PR3). The client advertises no client-served
fs/terminalcapability, so the coding agent uses its own file/shell access, scoped to the session cwd. Inboundsession/request_permissionis answered by the agent'spermissionspolicy, keyed on the request'stoolCall.kind:auto(allow all — the PR1 default, mirroring ORBIS),allowlist(denyexecute/delete, allow the rest), orreadonly(allow only read-like kinds). Overridable per agent withallow_kinds/deny_kinds. - Per-call consent gate (PR3).
confirm: truemakescode_withask the operator (viaask_human→input-required) to approve before each call to that agent. The gate runs before any side effect, so the LangGraph resume re-execution is idempotent. - Per-action live HITL is deferred — approving each individual edit/shell command mid-turn would require pausing a blocking subprocess session while asking the operator. LangGraph's
interrupt()checkpoints and re-runs the node on resume, which can't resume a half-finished ACP session awaiting a specific permission response.readonly/allowlistgive deterministic per-action control;confirmgives a per-call human gate. (Same coupling blocks live narration — see Scope.) - The subprocess inherits the server's env (plus any per-agent
env). Run protoAgent under an account whose ambient credentials you're willing to lend the coding agent — or scope itsworkdirto a throwaway checkout.
Wire protocol (ACP, client side)
→ initialize {protocolVersion: 1, clientCapabilities: {fs:{…false}, terminal:false}}
→ session/new {cwd, mcpServers: []} ← {sessionId}
→ session/prompt {sessionId, prompt: [{type:text, text}]} ← {stopReason}
← session/update {update:{sessionUpdate:"agent_message_chunk", content:{text}}} (accumulated → answer)
← session/update {update:{sessionUpdate:"tool_call", title}} (narration → logged)
← session/request_permission {options:[…]} → auto-allowOne AcpClient owns one subprocess + one session, cached per agent so follow-up code_with calls continue the same thread (mirrors the A2A peer's sticky contextId). A per-agent lock serializes turns (a session is a single conversation; task_batch must not interleave two prompts on one session).
Scope
PR1 (#596): ACP client + code_with + config + auto-allow + tests + docs. Synchronous — the final answer is returned; tool_call titles are logged.
PR3 (this update): by-kind permission policy (auto/allowlist/readonly + allow_kinds/deny_kinds); per-call consent gate (confirm); shipped agent recipes (claude-code-acp, codex, gemini). Per-action live HITL is documented as deferred (the blocking-session constraint above).
PR4: a gated eval case (code_with_delegation) verifying end-to-end delegation over a live A2A turn, plus a requires_env skip mechanism in the eval runner so the case (and any future case needing an optional integration) is skipped — not failed — when its prerequisite env var is unset.
Later PRs: live narration of tool_call titles onto A2A working-status frames (so an operator watching a turn sees "Editing app.py") — blocked on the same mid-session channel as per-action HITL, so it wants a general progress mechanism (event-bus ride-along or a stream-factory progress event), not a one-off.
Consequences
- protoAgent gains a third delegation altitude: hand a real coding job to a purpose-built CLI agent and get the result back — without forking.
- The security surface is explicit and opt-in: disabled by default, empty agent list by default, workdir-confined, documented.
- We take a dependency on the target agent being installed and on PATH; a missing binary returns a clear error string, not a crash.
Alternatives considered
- Core tool module (
tools/acp_tools.py) — always present likepeer_tools. Rejected: edits core for a capability most forks won't use, and loses the free config/Settings/enable-disable plumbing. - Full
delegate_toregistry port (a2a + openai + acp) — most faithful to ORBIS. Rejected for now: largest blast radius, and it overlaps protoAgent's existing peer + subagent + gateway seams. The ACP leg is the only genuinely missing one.