# DECISIONS – architectural decision records
Lightweight ADRs. Each entry: context (what we knew at the time), decision (what we chose), consequences (what we accepted, what's now closed off). Sorted newest-first.
## 0015 – Pin `[tracing]` extra to langfuse v2 (LiteLLM compatibility)
2026-05-03 · Status: accepted
Context. First production deploy of the chat agent (#13) crashed LiteLLM's own Langfuse callback at boot:
`Langfuse.__init__() got an unexpected keyword argument 'sdk_integration'`

LiteLLM's proxy-runtime extra hard-pins `langfuse==2.59.7` and constructs `Langfuse(sdk_integration=...)`. Langfuse v3 dropped that kwarg without a deprecation. Our `[tracing]` extra requested `langfuse>=3.0,<4`, pip resolved to 3.14.6, and LiteLLM's callback registration broke. The error is non-blocking (the request still flows), but it killed Langfuse tracing on the gateway entirely – both LiteLLM's coarse traces and our fine-grained ones.
Two fixes were on the table:
- Pin our extra to v2: LiteLLM's callback works again. Our v3-only `_tracing.py` (uses `from langfuse import get_client` + `start_as_current_span`) gracefully fails the import on v2 and falls back to no-op spans (sketched below). Trade: no fine-grained sub-spans from us; only LiteLLM's per-request traces emit.
- Make `_tracing.py` support both the v2 and v3 APIs: full observability under either pin. Cost: write a v2 adapter (the `Langfuse()` + `trace().span()` shape, very different from v3's OTel-flavoured `start_as_current_span` context manager).
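The fallback in the first option amounts to a guarded import. A minimal sketch of the shape, assuming a module-level flag and a hypothetical `_noop_span` helper (the real `_tracing.py` may differ):

```python
# Sketch of a version-guarded tracing shim: use langfuse v3's API when it imports,
# otherwise degrade to no-op spans so the provider keeps working under the v2 pin.
from contextlib import contextmanager

try:
    from langfuse import get_client  # v3-only import; fails under langfuse v2

    _LANGFUSE_V3 = True
except ImportError:
    _LANGFUSE_V3 = False


@contextmanager
def _noop_span(*_args, **_kwargs):
    """Stand-in span that records nothing."""
    yield None


def span(name: str):
    """Return a context manager for a trace span; a no-op under langfuse v2."""
    if _LANGFUSE_V3:
        return get_client().start_as_current_span(name=name)
    return _noop_span()
```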
Decision. Pin to v2 immediately; defer the v2 adapter.
The fast path was the right call because (a) LiteLLM's callback was the production-blocking failure, (b) LiteLLM's per-request traces already give the most-asked-for observability ("which model, what input, what output, latency"), (c) the v2 adapter is real engineering work that shouldn't gate a hotfix.
Consequences.
- ✅ LiteLLM's Langfuse callback resumes immediately on the next deploy
- ✅ Our `_tracing.py` no-ops cleanly on v2 (verified locally) – no crashes, no silent corruption
- ⚠️ Our fine-grained sub-spans (`workflow_stem`, `prompt_id`, `comfyui.wait_for_completion` latency, agent iteration tree) are invisible until a v2 adapter lands. The agent's "which tool got called" stays observable via LiteLLM's request/response capture.
- ⚠️ When LiteLLM eventually upgrades to langfuse v3, this pin becomes a problem in reverse. Treat it as known migration debt (track in CHANGELOG).
## 0014 – Agent must use AsyncOpenAI inside an async LiteLLM proxy
2026-05-03 · Status: accepted
Context. First deploy of #13 hung for 2+ minutes per chat turn, no error, no progress. Diagnosis took longer than the fix because the proxy logs showed nothing useful – the request just sat there.
Root cause: the agent's LLM call used the synchronous `OpenAI(...).chat.completions.create(...)`, which inside an async coroutine blocks the entire event loop. The agent's `PROTOBANANA_AGENT_BASE` points at the same gateway (`http://localhost:4000/v1`), so the agent calls back through LiteLLM. But the sync call held the event loop while waiting on a response that needed the same loop to be served. Instant deadlock. The block was invisible: no exception, no "request in flight" log line, just an infinitely waiting socket.
Decision. Hard rule: any LLM client invoked inside protoBanana's async path uses `AsyncOpenAI` and is awaited. The agent module already had `async def run`; the OpenAI client was the one synchronous call. Fix was a 2-line change.
Linting/contract: `protobanana.agent._build_lm_client` returns `AsyncOpenAI`, and tests use `AsyncMock` for the `chat.completions.create` mock, so an accidental switch back to sync would fail the suite.
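A minimal sketch of the rule in context – `_build_lm_client` and `async def run` are named in this ADR, but the body here is illustrative, not the actual agent module:

```python
# Sketch: the agent's LLM call must be async and awaited. A sync client here would
# block the event loop while waiting on a request that loops back through the same
# LiteLLM gateway, which is the deadlock this ADR describes.
import os

from openai import AsyncOpenAI


def _build_lm_client() -> AsyncOpenAI:
    # PROTOBANANA_AGENT_BASE points back at the gateway (e.g. http://localhost:4000/v1),
    # so the call is served by the very event loop this coroutine runs on.
    return AsyncOpenAI(
        base_url=os.environ["PROTOBANANA_AGENT_BASE"],
        api_key="placeholder",  # the gateway does its own auth; value here is illustrative
    )


async def run(messages: list[dict]) -> str:
    client = _build_lm_client()
    response = await client.chat.completions.create(  # awaited – never a sync OpenAI() call
        model="protolabs/fast",
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message.content or ""
```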
Consequences.
- ✅ Chat turn that previously hung forever now completes in seconds
- ✅ Tests catch a regression to sync (`AsyncMock` vs `MagicMock` – a sync call on an `AsyncMock` returns a coroutine that's never awaited, so the failure pattern is loud)
- ⚠️ Future contributors who reflexively reach for `OpenAI()` will hit the same trap. Documented in `agent.py`'s `_build_lm_client()` docstring + this ADR.
- This trap generalises: any sync HTTP/LLM call inside the proxy's async path will deadlock the same way if it loops back. Worth remembering when writing future tools that call out.
## 0013 – Tool-use chat agent on `/v1/chat/completions`
2026-05-03 · Status: accepted
Context. The Phase 1 chat path was a deterministic keyword→workflow dispatcher: walk the message history, extract the latest user prompt, classify the operation by keyword match, run one image op, embed the result as a markdown data URL. That worked for simple imperative prompts ("draw a cat", "make her shirt blue") but couldn't handle the things users actually do in conversation:
- Conversational replies – "thanks!" got an image of "thanks!" generated, because the keyword router can't not run an op
- Clarifying questions – "make it red" was ambiguous (the what?) but there was no way to ask
- Chained operations – "remove the bg, then add a sunset" needed two ops in one reply; the keyword router picked one
- Natural feedback – "actually I prefer the previous one" had nowhere to land; the router had no concept of "no op needed"
These weren't fixable with smarter keywords. The chat path needed an LLM that owned the entire conversation surface; image ops became tools it could choose to call.
Decision. Replace the keyword classifier with a tool-use agent loop on the chat path:
- Build OpenAI-shape tool definitions for all 6 image ops (`generate_image`, `edit_image`, `region_edit`, `remove_background`, `multi_ref_compose`, `outpaint`).
- On each chat completion, call the configured LM (default `protolabs/fast` = Qwen3.6-35B-A3B-FP8) with the chat history + tool list.
- If the LM returns text only → return it (conversational reply). If the LM returns tool calls → execute each against ComfyUI → feed `{success, image_size_bytes}` back to the LM → loop. Cap iterations at 3 (see the loop sketch after this list).
- The LM never sees image bytes. Server-side state holds the most recent image + the full image list for `multi_ref_compose`. The final response embeds the last produced image as a markdown data URL (same shape as before, so existing clients keep rendering).
- Keep the keyword classifier as a fallback. When the agent is disabled (no `PROTOBANANA_AGENT_BASE`), the openai client isn't importable, or the first LM call fails, the provider falls through to the existing keyword dispatch – degraded UX (image-only output) but the system stays up.
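A compressed sketch of that loop. The `execute_tool` helper and its result shape are assumptions for illustration; server-side image state and final-image embedding are elided:

```python
# Sketch of the tool-use loop: the LM owns the conversation, image ops are tools,
# and only a tiny result summary (never image bytes) goes back to the LM.
import json

MAX_ITERATIONS = 3  # hard cap so a confused model can't loop forever


async def execute_tool(name: str, args: dict) -> dict:
    """Placeholder: the real executor submits a ComfyUI workflow and waits for the image."""
    raise NotImplementedError


async def agent_turn(client, messages: list[dict], tools: list[dict]) -> str:
    for _ in range(MAX_ITERATIONS):
        response = await client.chat.completions.create(
            model="protolabs/fast",
            messages=messages,
            tools=tools,
            temperature=0,  # deterministic routing
        )
        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content or ""  # plain conversational reply, no image op needed

        messages.append(msg.model_dump(exclude_none=True))
        for call in msg.tool_calls:
            result = await execute_tool(call.function.name, json.loads(call.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps({"success": result["success"],
                                       "image_size_bytes": result["image_size_bytes"]}),
            })
    return "Stopped after reaching the iteration cap."
```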
Consequences.
- ✅ Conversational replies, clarifying questions, multi-step chains all become possible
- ✅ The natural way to add Phase 8+ ops is "add a new tool definition + executor" – no new keyword arms, no new dispatch branches
- ⚠️ Smaller routing models (Qwen-fast) work but need anchored prompts; larger models (Claude, GPT-4) would need less hand-holding. `PROTOBANANA_AGENT_MODEL=protolabs/smart` switches to a thinking model when routing quality matters more than latency.
- ⚠️ +500-800 ms latency per chat turn (the LM call before any image work starts)
- ⚠️ Failure surface expanded: prompt drift, hallucinated tool args, infinite tool loops. Mitigated by `temperature=0`, `max_iterations=3`, and the keyword fallback.
- ⚠️ Tool descriptions become a contract the LLM reads; bad copy → bad routing. Locked the description style in `protobanana/tools.py`.
- See ADR 0014 (AsyncOpenAI) and 0015 (langfuse v2 pin) for the production-deploy bugs this surfaced.
## 0012 – Static workflow validator + e2e smoke as the pre-merge gate
2026-05-03 · Status: accepted
Context. Three predecessor incidents took the gateway alias offline:
- A `_meta` top-level key crashed ComfyUI as an orphan node (PR #50)
- `ImageScaleToTotalPixels` silently added `resolution_steps` as required
- `edit_qwen_image_2511.json` shipped with `CLIPTextEncode` instead of `TextEncodeQwenImageEditPlus` – the workflow was syntactically valid, submitted cleanly, returned an image; the image just had nothing to do with the input (the conditioning bug, see ADR 0011)
The first two are pure schema mismatches. The third is a semantic bug – the workflow is structurally fine, but the meaning is wrong: text-only conditioning routed to an instruction-edit model. Static validation can never catch the third.
Decision. Two-tier pre-merge gate:
- Static validator (`scripts/validate_workflows.py`): hits ComfyUI's `/object_info` and asserts that every node has a real `class_type`, every required input is present, and every COMBO value is in the allowed list. Runtime-substituted fields (e.g. `LoadImage.image` placeholders) are whitelisted via `_RUNTIME_SUBSTITUTED`. Catches schema drift.
- E2E smoke: a one-off submission with a known-recognizable input, plus a visual or numeric assertion on the output. For edit-shaped workflows: red square + white circle + a prompt that should preserve the background. The output's average RGB confirms the input was respected. Catches semantic bugs.
Static is fast (<1 s) and runs in pytest when `COMFYUI_BASE_URL` is set. E2E is slower (~20 s per submission) and lives as a documented script + checklist item in DECISIONS / validating-workflows.md, not blocking CI yet.
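For concreteness, a stripped-down sketch of the static tier's core check (illustrative only; the real `scripts/validate_workflows.py` handles more cases and reports differently):

```python
# Sketch: validate an API-format workflow against a live ComfyUI's /object_info schema.
import json
import sys

import requests

_RUNTIME_SUBSTITUTED = {("LoadImage", "image")}  # placeholders filled in per request


def validate_workflow(path: str, base_url: str) -> list[str]:
    schema = requests.get(f"{base_url}/object_info", timeout=30).json()
    with open(path) as fh:
        workflow = json.load(fh)
    errors = []
    for node_id, node in workflow.items():
        class_type = node.get("class_type")
        if class_type not in schema:
            errors.append(f"{node_id}: unknown class_type {class_type!r}")
            continue
        required = schema[class_type]["input"].get("required", {})
        for name, spec in required.items():
            if (class_type, name) in _RUNTIME_SUBSTITUTED:
                continue
            if name not in node.get("inputs", {}):
                errors.append(f"{node_id}: missing required input {name!r}")
                continue
            value = node["inputs"][name]
            is_link = isinstance(value, list)  # [node_id, output_index] connections
            if isinstance(spec[0], list) and not is_link and value not in spec[0]:
                errors.append(f"{node_id}: {name!r}={value!r} not in COMBO allowed list")
    return errors


if __name__ == "__main__":
    problems = validate_workflow(sys.argv[1], sys.argv[2])
    print("\n".join(problems) or "OK")
    sys.exit(1 if problems else 0)
```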
Consequences.
- ✅ Schema-class bugs blocked at PR time (caught the BiRefNet `class_type` bug on first run)
- ✅ Semantic bugs caught before user reports (would have caught the conditioning bug if the e2e step had existed)
- ✅ Documents the gap explicitly so we don't pretend the validator is enough
- ⚠️ E2E smoke isn't automated yet – relies on the dev to run it. Worth promoting to CI once we have a managed ComfyUI test endpoint.
- ⚠️ Validator depends on a live ComfyUI; can't run in pure-unit CI. Mitigated by a graceful skip when `COMFYUI_BASE_URL` is unset.
## 0011 – `TextEncodeQwenImageEditPlus` is the only correct conditioning path for edit/multiref
2026-05-03 · Status: accepted
Context. The initial `edit_qwen_image_2511.json` and `multiref_qwen_image_2511.json` used the generic SDXL-era pattern:
- `LoadImage → ImageScale → VAEEncode → KSampler(latent_image)`
- `CLIPTextEncode("the prompt") → KSampler(positive)`
- `CLIPTextEncode("negative") → KSampler(negative)`

This is correct for SDXL/SD3-style img2img, where the input image acts as a noisy starting point and `denoise < 1.0` preserves some of it. It is wrong for Qwen-Image-Edit, where:
- The model expects the image as visual conditioning (encoded via the Qwen2.5-VL vision tower, attended to during denoising), not as a noisy latent.
- The published example workflows always run `denoise=1.0` – the `latent_image` provides spatial dimensions, but the actual edit signal comes from the text-encoder branch with the image attached.
With our broken pattern: at `denoise=1.0` the `latent_image` is fully re-noised, and `CLIPTextEncode` only sees text. Net result: zero image conditioning → fresh unrelated output → the user-visible "edit produced a new image."
Decision. Both edit and multiref use `TextEncodeQwenImageEditPlus` on the positive AND negative encoders, with the scaled input image piped into `image1` (and `image2`/`image3` for multiref). The negative encoder receives the same image so the model has consistent visual context across both conditioning streams.
Consequences.
- ✅ Edit actually edits – the input image is respected, the output is a modified version
- ✅ The same encoder shape works for 1-3 images via `image1`/`2`/`3` – keeps edit and multiref symmetric
- ✅ Documents that "looks like img2img" is a misleading shape for instruction-edit models – important for any future edit-class model we adopt
- ⚠️ The `_set_prompt()` helper introduces a small branch (write to `prompt` for Qwen edit encoders, `text` for legacy `CLIPTextEncode`); see the sketch after this list. Acceptable: workflows that haven't migrated keep working.
- ⚠️ The bug shipped to production for ~2 days. Caught by user testing in the Gradio app, not by tests. Drove ADR 0012.
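A minimal sketch of that branch, assuming the loader hands over the API-format workflow dict (the real helper may differ):

```python
# Sketch: write the user prompt into whichever input field the encoder node expects.
QWEN_EDIT_ENCODERS = {"TextEncodeQwenImageEditPlus"}  # takes the text in "prompt"
LEGACY_ENCODERS = {"CLIPTextEncode"}                  # takes the text in "text"


def _set_prompt(workflow: dict, node_id: str, prompt: str) -> None:
    node = workflow[node_id]
    if node["class_type"] in QWEN_EDIT_ENCODERS:
        node["inputs"]["prompt"] = prompt
    elif node["class_type"] in LEGACY_ENCODERS:
        node["inputs"]["text"] = prompt
    else:
        raise ValueError(f"node {node_id} ({node['class_type']}) is not a text encoder")
```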
## 0010 – Standalone repo extraction (protoBanana)
2026-05-03 · Status: accepted
Context. Provider lived inline in `homelab-iac/stacks/ai/config/litellm/providers/comfyui_image.py` after PRs #52-#53. As Phases 2-7 came into scope, the inline approach was hard to test, hard to share, hard to publish, and hard to track in commits.
Decision. Extract to protoLabsAI/protoBanana. Apache-2.0. pip-installable. Workflows ship with the package. Tests are first-class. Documentation is the publishable artifact.
Consequences.
- ✅ Test suite is now isolated; can run in CI on every PR
- ✅ External users can install without joining the protoLabs repo
- ✅ Phases 4-7 land as PRs against this repo, not infra commits
- ⚠️ One more repo to maintain and version
- ⚠️ homelab-iac must now `pip install protobanana` from a tag, not read from a mounted directory – adds release discipline
## 0009 – Operation enum, single classifier function
2026-05-03 · Status: accepted
Context. Need to choose between (a) one big classifier with a switch statement, (b) a chain-of-classifiers (each operation has its own bool predicate), (c) a state machine.
Decision. A single `classify_operation()` returning an `Operation` enum. Inside it, a fixed priority order: explicit-mask > bgremove > outpaint > inpaint > region_edit > multiref > edit > gen.
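A compressed sketch of that shape – the enum and the first-match-wins priority walk follow this ADR, but the individual keyword predicates and the mask-to-operation mapping are illustrative placeholders, not the real checks:

```python
# Sketch: one classifier, one auditable top-to-bottom priority order, first match wins.
from enum import Enum


class Operation(Enum):
    BGREMOVE = "bgremove"
    OUTPAINT = "outpaint"
    INPAINT = "inpaint"
    REGION_EDIT = "region_edit"
    MULTIREF = "multiref"
    EDIT = "edit"
    GEN = "gen"


def classify_operation(prompt: str, images: list[bytes], mask: bytes | None) -> Operation:
    text = prompt.lower()
    if mask is not None:                          # explicit mask outranks every keyword rule
        return Operation.INPAINT                  # (placeholder mapping)
    if "remove" in text and "background" in text:
        return Operation.BGREMOVE
    if "outpaint" in text or "extend" in text:
        return Operation.OUTPAINT
    if "inpaint" in text:
        return Operation.INPAINT
    if "region" in text:
        return Operation.REGION_EDIT
    if len(images) > 1:
        return Operation.MULTIREF
    if images:
        return Operation.EDIT
    return Operation.GEN
```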
Consequences.
- ✅ Adding a new operation = add to the enum + add a check + add tests. ~30 LOC.
- ✅ Priority is auditable – read top-to-bottom in the function
- ⚠️ Tightly coupled – every new op requires editing the central function
- ⚠️ Order matters; reordering an arm can break previously-passing tests
## 0008 – Markdown-embedded data URL for chat output
2026-05-03 · Status: accepted
Context. `/v1/chat/completions` with image output has two protocols:
- A multimodal `content` list with `image_url` parts (OpenAI's vision protocol; well-defined, but client-side rendering is patchy)
- A string `content` with a markdown image embed (renders in any markdown-capable UI; not OpenAI-canonical)
Decision. Use option 2 (the markdown embed) by default. Phase 7 may add a config flag to switch per `model_list` entry.
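The embed itself is just a data URL inside standard markdown image syntax; a minimal sketch of building the reply content (the helper name and PNG MIME type are assumptions):

```python
# Sketch: wrap the generated PNG bytes in a markdown image with a base64 data URL.
import base64


def embed_image_markdown(png_bytes: bytes, alt: str = "generated image") -> str:
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"![{alt}](data:image/png;base64,{encoded})"


# The assistant message in the chat completion then carries this string as its content:
#   {"role": "assistant", "content": embed_image_markdown(image_bytes)}
```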
Consequences.
- ✅ Works in Open WebUI, Slack rich-text bridges, Discord, GitHub comments, etc. – any markdown renderer
- ✅ Client SDKs that just print `response.content` show a working data URL that browsers can open
- ⚠️ Programmatic detection harder – clients must regex-extract
- ⚠️ Token-count accounting weird – base64 is huge in token terms but zero useful signal
- ⚠️ Diverges from OpenAI's preferred multimodal output protocol
## 0007 – Cap reference images at 3
2026-05-03 · Status: accepted (Qwen ceiling)
Context. Qwen-Image-Edit-2511, per its spec, cleanly handles up to 3 reference images via parallel encode + conditioning stack. Nano-Banana 2 supports 14. Some clients might pass more.
Decision. Hard cap at 3 in the provider. Take the first 3 in document order when more arrive. Document the limit in HOWTO + PHASES.
Consequences.
- ✅ Predictable behavior; no silent degradation past N=3
- ✅ Matches the model spec – pushing past 3 hurts quality even if the workflow tolerates it
- ⚠️ Can't compete with nano-banana on multi-ref-heavy tasks until Qwen ships a higher-ref variant
- ⚠️ Open question: should we eventually pre-mux 4+ images into 3 via semantic selection? Defer until real demand.
## 0006 – Server-side workflow substitution (not client)
2026-05-03 · Status: accepted (PR #52 outcome)
Context. Open WebUI's `IMAGE_GENERATION_ENGINE=comfyui` mode requires the client to know ComfyUI's prompt schema and node IDs. PRs #49-#51 showed the contract is brittle: each Open WebUI version shifts the env schema (`COMFYUI_WORKFLOW_NODES` went from a CSV string → a JSON list of strings → a list of arrays).
Decision. Provider mutates node IDs server-side in Python. Clients just send standard OpenAI requests; they need to know nothing about ComfyUI.
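In spirit, each route owns a small substitution function. A sketch under assumed node IDs and a hypothetical workflow location (the real per-route `substitute()` functions own the actual conventions):

```python
# Sketch: the provider owns the ComfyUI workflow JSON and patches it per request,
# so clients never need to know node IDs or the prompt schema.
import copy
import json
import random
from pathlib import Path

WORKFLOW_DIR = Path(__file__).parent / "workflows"  # hypothetical location


def load_workflow(name: str) -> dict:
    return json.loads((WORKFLOW_DIR / f"{name}.json").read_text())


def substitute_gen(template: dict, prompt: str, width: int, height: int) -> dict:
    """Per-route substitution: this function owns the gen workflow's node-ID convention."""
    wf = copy.deepcopy(template)
    wf["6"]["inputs"]["text"] = prompt                  # positive text encoder (hypothetical ID)
    wf["5"]["inputs"]["width"] = width                  # empty-latent node (hypothetical ID)
    wf["5"]["inputs"]["height"] = height
    wf["3"]["inputs"]["seed"] = random.getrandbits(32)  # sampler seed (hypothetical ID)
    return wf
```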
Consequences.
- ✅ Open WebUI version drift doesn't affect us
- ✅ Adding a client (protoCLI, raw curl, etc.) is zero work
- ✅ Workflow JSONs can be debugged standalone in ComfyUI's UI
- ⚠️ Provider must know each workflow's node-ID conventions (mitigated by per-route `substitute()` functions that own the convention)
- ⚠️ If we want to support arbitrary user-supplied workflows, we'd need a node-mapping config – defer until that's a real ask
## 0005 – BiRefNet default, RMBG-2.0 opt-in
2026-05-03 · Status: accepted
Context. RMBG-2.0 outperforms BiRefNet (90% vs 85% on the bg-removal benchmark) but is CC BY-NC 4.0 – non-commercial only. BiRefNet is open-license, commercial-safe.
Decision. The default workflow `bgremove_birefnet.json` ships with BiRefNet. `bgremove_rmbg2.json` is the opt-in alternative for personal or non-commercial users. Both workflows live in `workflows/`; the user picks via `model_name` (`protolabs/qwen-image-bgremove` or `protolabs/qwen-image-bgremove-rmbg`).
Consequences.
- ✅ Out-of-box installs are commercial-safe
- ✅ Users who don't care about the license get the higher-quality option by deliberate choice
- ✅ Documentation can be honest about the trade
- ⚠️ Two workflows to maintain (mitigated: bg-remove workflows are trivial – 3 nodes each)
- ⚠️ Slightly more complex `model_list` entry naming
## 0004 – `huggingface-cli download` cache-only by default
2026-05-03 · Status: accepted (after the deletion incident)
Context. Used `--local-dir` for a "redundant" download, then `rm -rf`'d the duplicate to free 29 GB. Six ComfyUI symlinks broke because `--local-dir` shares blob inodes with the cache via hardlinks; deleting the local dir cascaded into the cache.
Decision. Default to cache-only downloads (`HF_HOME=/mnt/models/huggingface huggingface-cli download <repo> <files>`, no `--local-dir`). Document in the lab CLAUDE.md storage rules. If `--local-dir` is required, also pass `--local-dir-use-symlinks False` so files are full copies.
Consequences.
- ✅ No more cascading deletes
- ⚠️ Slightly less convenient for "I want files in a specific location" use cases – must accept the cache layout
## 0003 – Strip metadata keys (`_meta`, `_doc`) from workflow JSON
2026-05-03 · Status: accepted (after PR #50)
Context. Wanted documentation embedded in workflow JSONs (model filenames, expected paths, license notes). Initially used a top-level `_meta` key.
ComfyUI iterates all top-level keys as node IDs and validates that each has a `class_type`. `_meta` raised `missing_node_type` → HTTP 500 → Open WebUI failed parsing the response with `Expecting value: line 1 column 1 (char 0)`.
Decision. Move documentation to `workflows/README.md` and per-workflow `_doc` fields that the loader strips before submission. The loader filters top-level keys, keeping only those with `class_type` set.
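The filter is essentially a dict comprehension; a minimal sketch, assuming the loader operates on the parsed API-format dict:

```python
# Sketch: keep only real nodes (dicts that declare a class_type); drop _meta, _doc,
# and any other documentation-only top-level keys before submission to ComfyUI.
def strip_metadata(workflow: dict) -> dict:
    return {
        node_id: node
        for node_id, node in workflow.items()
        if isinstance(node, dict) and "class_type" in node
    }
```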
Consequences.
- ✅ Workflows are valid ComfyUI input
- ✅ The inline `_doc` field still works for IDE tooltips and side-by-side reading; it just doesn't get submitted
- ⚠️ One more thing to remember when authoring workflows; tested in `tests/test_workflow_loader.py`
## 0002 – Gateway-routed image gen, not Open WebUI direct
2026-05-03 · Status: accepted (PR #52)
Context. Open WebUI's `IMAGE_GENERATION_ENGINE=comfyui` was failing in three ways: a brittle workflow contract, no intent inference (required an explicit UI gesture), and single-client coupling (couldn't reuse for protoCLI).
Decision. Build a LiteLLM `CustomLLM` provider that exposes ComfyUI as an OpenAI-compatible image endpoint. Open WebUI swaps to `IMAGE_GENERATION_ENGINE=openai` pointing at the gateway.
Consequences.
- ✅ Same call shape as `protolabs/nano-banana-2`. Architectural symmetry.
- ✅ Every client gets image gen for free.
- ✅ Langfuse traces, Prometheus metrics, retries – gateway features apply automatically.
- ⚠️ One more service in the chain (gateway → provider → ComfyUI vs Open WebUI → ComfyUI directly). Adds ~20-50 ms.
## 0001 – Qwen-Image-2512 + Qwen-Image-Edit-2511 over alternatives
2026-05-02 · Status: accepted
Context. OSS image gen + edit landscape (May 2026):
- HunyuanImage 3.0 Instruct: 80B MoE, top Elo, but too big for our 96 GB
- Qwen-Image-2512: 20B MMDiT, #1 AA Arena both gen+edit, native 2K
- BAGEL-7B-MoT: 7B active, has world-modeling, smaller
- FLUX.1 Kontext: 7s/edit, weak text rendering
Decision. Qwen-Image-2512 (gen) + Qwen-Image-Edit-2511 (edit). Already on disk via lab experiments/image-gen-eval. Best fit for our GPU. Strongest text rendering of the OSS options.
Consequences.
- ✅ Best OSS quality at our scale
- ✅ Native multi-ref (3-image cap) baked into Edit-2511
- ⚠️ 3-ref ceiling vs nano-banana's 14
- ⚠️ Two UNets (gen + edit) need swap-in/swap-out
- ⚠️ Quality 6-12 months behind frontier closed models on the hardest cases