
Changelog

All notable changes to protoBanana. Format: Keep a Changelog, SemVer.

[0.1.0a4] — 2026-05-03 — chat agent (LLM as router + brain), region edit, outpaint

Added — agent + the rest of the ChatGPT-image-2 op set

  • Tool-use chat agent (default for /v1/chat/completions). The LLM (protolabs/fast by default) decides whether to respond conversationally, call an image tool, or chain multiple tools. Replaces the deterministic keyword classifier on the chat path; the keyword path remains as a hard fallback when the agent is disabled or unreachable. New modules: protobanana/agent.py, protobanana/tools.py. Configured via PROTOBANANA_AGENT_BASE / _KEY / _MODEL / _MAX_ITERS. New [agent] extra. Full docs: docs/agent.md.
  • Phase 4 — agent-driven region edit via SAM 3 + Qwen-Image-Edit-2511 + ImageCompositeMasked. The agent names a region (e.g. "the man's tie"), SAM 3 produces a mask from text (no GroundingDINO/Florence-2 dependency — those are broken on current ComfyUI's transformers), Qwen inpaints inside, the composite step preserves outside-mask pixels exactly. New workflow region_edit_sam3_qwen_image_2511.json, new route routes/region_edit.py, splitter extract_region_edit_parts().
  • Phase 5 — inpaint route + workflow for /v1/images/edits with a mask multipart. The agent doesn't drive this directly; routes exist for clients that want to send their own mask.
  • Phase 6 — outpaint via ImagePadForOutpaint + InpaintModelConditioning. New workflow outpaint_qwen_image_2511.json, splitter extract_outpaint_directions() (parses "extend left", "make this wider", "show more sky", "uncrop" into per-side pad amounts; clamped to [64, 1024]).
  • Phase 7 — optional LM intent classifier as a second-pass refiner for ambiguous EDIT/GEN cases. Mostly superseded by the agent itself; ships as a diagnostic for the keyword fallback path.
  • Langfuse tracing of provider entry points + ComfyUI HTTP sub-spans + agent iterations + tool calls. New [tracing] extra pinned to langfuse>=2.59,<3 (LiteLLM compatibility — see Fixed). Docs: docs/observability.md.
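A minimal sketch of what a direction parser like extract_outpaint_directions() could look like. The phrase tables and the 256 default pad are assumptions; only the [64, 1024] per-side clamp comes from the entry above:

```python
import re

PAD_MIN, PAD_MAX = 64, 1024  # per-side clamp from this release
DEFAULT_PAD = 256            # assumed default pad amount

def extract_outpaint_directions(prompt: str, pad: int = DEFAULT_PAD) -> dict:
    """Map free-text outpaint phrasing to per-side pad amounts (sketch)."""
    p = prompt.lower()
    pad = max(PAD_MIN, min(PAD_MAX, pad))
    sides = {"left": 0, "right": 0, "top": 0, "bottom": 0}
    if re.search(r"\b(uncrop|zoom out|expand)\b", p):
        sides = {k: pad for k in sides}           # grow all four sides
    if "wider" in p:
        sides["left"] = sides["right"] = pad      # horizontal only
    if "more sky" in p or "taller" in p:
        sides["top"] = pad                        # vertical headroom
    for side in ("left", "right", "top", "bottom"):
        if re.search(rf"\bextend {side}\b", p):
            sides[side] = pad
    return sides
```

The real splitter presumably handles more phrasings; the point is that every path ends in a clamped per-side pad dict that ImagePadForOutpaint can consume.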

Fixed

  • workflow_stem extraction silently fell back to a hardcoded default for the bare-name case. LiteLLM strips the provider prefix on /v1/images/{generations,edits} but keeps it on /v1/chat/completions. The if "/" in model else DEFAULT guard routed every bare-name request to the wrong workflow.
  • Multi-ref with fewer than 3 reference images failed with Invalid image file: ref3.png — multiref.substitute() only populated the slots it had filenames for, leaving the others with placeholder defaults. Now also prunes the unused LoadImage + ImageScale pairs and drops the corresponding image_N input from both encoder nodes.
  • Chat path tried to load gen_qwen_image_2512.json against gateways named after upstream models. Renamed gen.DEFAULT_STEM and edit.DEFAULT_STEM to upstream Qwen names.
  • Edit + multi-ref workflows ignored the input image. Switched CLIPTextEncode (text-only conditioning) → TextEncodeQwenImageEditPlus so the image flows into Qwen2.5-VL's vision tower.
  • Sticker tab returned a "blue cat". Inline gateway provider rewrote workflow_stem to the edit workflow whenever the stem name didn't contain "edit". Migrated the gateway to install protoBanana as a package; dispatch is now stem-prefix-based.
  • Agent deadlocked on first deploy (OpenAI sync client inside the async LiteLLM proxy, calling back through the same gateway → blocked event loop). Switched to AsyncOpenAI.
  • LiteLLM Langfuse callback failed at boot (Langfuse.__init__() got an unexpected keyword argument 'sdk_integration') once the [tracing] extra forced langfuse v3. LiteLLM hard-pins v2; v3 removed the kwarg. Pinned [tracing] to langfuse>=2.59,<3. Trade: until a v2 adapter ships, our fine-grained sub-spans no-op cleanly while LiteLLM's per-request traces emit again.
  • Agent misrouted "make it a bowling cap" to generate_image. System prompt described tool-choice rules but framed the image-in-conversation context as informational ("the recent assistant image is available for edit_image..."). Rewrote as a directive contract + few-shot examples. Live verified against vLLM local-fast: now picks region_edit(region="the hat", edit_prompt="a bowling cap").
  • Static workflow validator now skips LoadImageMask.image as a runtime-substituted COMBO field, mirroring the LoadImage.image skip from earlier.
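The multi-ref prune can be sketched as a pure function over the workflow dict. The node IDs and the LoadImage/ImageScale pairing here are illustrative assumptions, not the shipped layout:

```python
def prune_unused_refs(workflow: dict, ref_count: int, max_refs: int = 3) -> dict:
    """Drop LoadImage/ImageScale pairs and encoder image_N inputs for empty slots.

    Sketch only: assumes LoadImage nodes at ids "100".."102" paired with
    ImageScale nodes "110".."112" (hypothetical ids for the scale nodes).
    """
    for i in range(ref_count, max_refs):
        workflow.pop(str(100 + i), None)   # unused LoadImage slot
        workflow.pop(str(110 + i), None)   # its paired ImageScale
    # The encoder's image_1..image_3 inputs are optional per /object_info,
    # so unfilled ones can simply be removed from every remaining node.
    for node in workflow.values():
        for i in range(ref_count + 1, max_refs + 1):
            node.get("inputs", {}).pop(f"image_{i}", None)
    return workflow
```

With the empty slots gone, ComfyUI never tries to load a placeholder filename, which is what produced the Invalid image file: ref3.png error.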

Changed

  • Default chat path is now the agent, not the keyword classifier. Set PROTOBANANA_AGENT_BASE to enable; if unset, the provider falls back to keyword dispatch (no behavioral regression for existing clients without an LM endpoint).
  • System prompt in agent.py rewritten as a directive contract with few-shot examples — the conversation-has-an-image case now reads as a constraint, not a fact.
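A sketch of enabling the agent path. The variable names are the ones documented in this release; every value below is a placeholder:

```shell
# Enable the agent chat path; leaving PROTOBANANA_AGENT_BASE unset keeps
# the keyword-classifier fallback. All values here are examples.
export PROTOBANANA_AGENT_BASE="http://gateway.local:4000/v1"
export PROTOBANANA_AGENT_KEY="sk-example"
export PROTOBANANA_AGENT_MODEL="protolabs/fast"   # documented default
export PROTOBANANA_AGENT_MAX_ITERS=6              # example cap on tool-call iterations
```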

Discovered via

  • homelab-iac#56 — gateway migration to the protoBanana package surfaced everything in this release as the live stack started running real requests against each component in turn.

[0.1.0a3] — 2026-05-03 — stem alignment + multiref prune + workflow_stem extraction

Fixed

  • Chat path tried to load gen_qwen_image_2512.json against gateways named after upstream models. gen.DEFAULT_STEM and edit.DEFAULT_STEM were prefixed with the operation name (gen_* / edit_*) which forced gateway maintainers to keep the same naming. Renamed both to match the upstream Qwen model names (qwen_image_2512, qwen_image_edit_2511) so a chat request through any gateway using the standard model names just works without per-deployment config.
  • Multi-ref with fewer than 3 reference images failed with Invalid image file: ref3.png. multiref.substitute() only populated the slots it had filenames for, leaving the others with placeholder defaults. Now also prunes the unused LoadImage + ImageScale pairs and drops the corresponding image_N input from both encoder nodes (the encoder inputs are optional per /object_info). 5 new unit tests + e2e verified.
  • workflow_stem extraction silently fell back to a hardcoded default for the bare-name case. LiteLLM strips the provider prefix on /v1/images/{generations,edits} but keeps it on /v1/chat/completions. The if "/" in model else DEFAULT guard routed every bare-name request to the wrong workflow. Now uses model.split("/", 1)[-1] or DEFAULT — handles both shapes. 3 new regression tests.
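The bare-name fix is small enough to show inline. A sketch, assuming the gen-side default stem (the helper name is illustrative):

```python
DEFAULT_STEM = "qwen_image_2512"  # gen default after the rename in this release

def workflow_stem(model: str) -> str:
    """Handle both shapes LiteLLM produces:
    - images routes strip the provider prefix -> "qwen_image_2512"
    - chat route keeps it                     -> "protobanana/qwen_image_2512"
    An empty model name falls back to the default stem.
    """
    return model.split("/", 1)[-1] or DEFAULT_STEM
```

split("/", 1)[-1] is the whole trick: with no slash it returns the string unchanged, with a slash it returns everything after the first one, so both request shapes resolve to the same stem.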

Changed

  • workflows/gen_qwen_image_2512.json → workflows/qwen_image_2512.json
  • workflows/edit_qwen_image_2511.json → workflows/qwen_image_edit_2511.json
  • gen.DEFAULT_STEM constant + docstrings updated to match
  • edit.DEFAULT_STEM constant + docstrings updated to match
  • docs/workflows-cookbook.md — naming convention now distinguishes upstream-model-direct (gen/edit) vs. operation-prefix (multiref/bgremove). Stem MUST match JSON filename.

Discovered via

  • homelab-iac#56 — gateway migration to the protoBanana package surfaced all three issues in sequence as each piece of the live stack started running real requests.

[0.1.0a2] — 2026-05-03 — workflow validator + edit conditioning fix

Fixed

  • Edit + multi-ref workflows ignored the input image. Both edit_qwen_image_2511.json and multiref_qwen_image_2511.json used CLIPTextEncode (text-only conditioning) and routed the input only through VAEEncode → latent_image for KSampler. With denoise=1.0 that latent gets fully overwritten with random noise, so the model saw zero visual context. Switched both to TextEncodeQwenImageEditPlus on positive AND negative — the image now flows into Qwen2.5-VL's vision tower as proper conditioning. Verified end-to-end: red+circle input + "change the white circle to a yellow star, keep the red background" → red+star output (avg RGB 225,49,29).
  • bgremove_birefnet.json used wrong class_type. Was RMBG (which only accepts RMBG-2.0/INSPYRENET/BEN/BEN2); BiRefNet needs the separate BiRefNetRMBG node from ComfyUI-RMBG. Caught by the new static validator on its first run.
  • ImageScaleToTotalPixels now requires resolution_steps. Patched all 5 instances across edit + multiref workflows.
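For orientation, a hypothetical encoder node after the switch. The node ID, link targets, and exact input names are illustrative; the point is that the prompt and the input image both feed the Qwen edit encoder, where CLIPTextEncode carried text only:

```json
{
  "6": {
    "class_type": "TextEncodeQwenImageEditPlus",
    "inputs": {
      "prompt": "change the white circle to a yellow star",
      "clip": ["38", 0],
      "vae": ["39", 0],
      "image1": ["40", 0]
    }
  }
}
```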

Added

  • scripts/validate_workflows.py — static validator hits ComfyUI's /object_info, checks every workflow JSON: class_type exists, required inputs present, COMBO values valid. Skips runtime-substituted fields (LoadImage.image). Exit code = number of failed workflows.
  • tests/test_workflows_static.py — pytest gate over every workflow JSON. Skipped when COMFYUI_BASE_URL unset / ComfyUI unreachable so unit-test CI without a ComfyUI dep still runs clean.
  • docs/validating-workflows.md — when to run, what it catches, the schema-vs-semantic gap with an e2e smoke pattern.
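The validator's core check can be sketched against a cached /object_info payload. The nested input/required layout follows ComfyUI's object_info response; the function name and error strings are assumptions:

```python
def validate_workflow(workflow: dict, object_info: dict) -> list:
    """Check each node against a ComfyUI /object_info dump (sketch).

    Returns human-readable failure strings; an empty list means ComfyUI
    should accept the graph. Runtime-substituted COMBO fields such as
    LoadImage.image are skipped.
    """
    skip = {("LoadImage", "image")}
    failures = []
    for node_id, node in workflow.items():
        ct = node.get("class_type")
        info = object_info.get(ct)
        if info is None:
            failures.append(f"node {node_id}: unknown class_type {ct!r}")
            continue
        required = info.get("input", {}).get("required", {})
        for name, spec in required.items():
            if (ct, name) in skip:
                continue
            if name not in node.get("inputs", {}):
                failures.append(f"node {node_id}: missing required input {name!r}")
            elif isinstance(spec[0], list) and node["inputs"][name] not in spec[0]:
                # COMBO inputs declare their legal values as a list
                failures.append(f"node {node_id}: invalid COMBO value for {name!r}")
    return failures
```

This is exactly the class of check that caught the RMBG/BiRefNetRMBG mix-up above, and exactly the class that cannot catch the conditioning bug (see Lesson below).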

Changed

  • protobanana.routes.edit.substitute() / protobanana.routes.multiref.substitute() now write the prompt to prompt (Qwen edit encoder) or text (legacy CLIPTextEncode) based on the node's class_type, via a _set_prompt() helper. This lets bgremove / gen keep using CLIPTextEncode while edit-shaped workflows use the Plus encoder.

Lesson

Static schema validation can't catch "this workflow is the wrong shape for the model loaded at node 37." The conditioning bug passed the new validator (CLIPTextEncode is a real node, all required fields were set). Schema validation answers "will ComfyUI accept this graph"; an end-to-end smoke (real input → check output is related to input) is what answers "will the model actually do the work." Both are now documented as the standing pre-merge gate.


[0.1.0a1] — 2026-05-03 — Gradio test/eval UI + HF Space scaffold

Added

  • app/gradio_app.py — Gradio 5.x UI with 5 tabs (Generate, Edit, Multi-ref, Sticker/BG remove, Chat). Settings accordion for gateway URL + API key + model alias overrides. Defaults pull from env.
  • app/__main__.py — python -m app entry point with --share, --port, --auth flags
  • app/README.md — Gradio app docs (configuration, troubleshooting)
  • app/spaces/app.py — HuggingFace Spaces entry point that re-exports the canonical build_app()
  • app/spaces/requirements.txt — minimal Space deps (gradio, openai, pillow)
  • app/spaces/README.md — Space frontmatter + deploy walk-through
  • docs/GRADIO-APP.md — UI architecture + Space deploy strategy
  • gradio optional extra in pyproject (pip install -e ".[gradio]")

Architecture note

The Gradio app is a thin OpenAI client (~600 LOC). All model logic stays server-side in the gateway + provider; the Space deploy is CPU-only because nothing on the UI side touches model weights. Users bring their own gateway URL + API key (or the Space owner sets them as Space secrets).


[0.1.0a0] — 2026-05-03 — initial extraction

Standalone repo carved out of protoLabsAI/homelab-iac PRs #52 and #53. Phases 1-3 implemented; Phases 4-7 specced.

Added

Phase 1 — Foundation

  • protobanana.provider.ProtoBananaProvider (LiteLLM CustomLLM) with three entry points: aimage_generation, aimage_edit, acompletion
  • protobanana.client.ComfyUIClient — async HTTP transport (upload, submit, poll, fetch, view)
  • protobanana.workflows.WorkflowLoader — caches templates, returns deep copies, strips metadata keys
  • protobanana.intents.keywords — operation classifier + aspect-ratio inference from prompt text
  • protobanana.routes.gen / edit / multiref / bgremove modules
  • Workflow JSONs:
    • workflows/gen_qwen_image_2512.json
    • workflows/edit_qwen_image_2511.json
    • workflows/multiref_qwen_image_2511.json
    • workflows/bgremove_birefnet.json (default, commercial-safe)
    • workflows/bgremove_rmbg2.json (opt-in, CC BY-NC 4.0)
  • 46 unit tests covering: intent classification (all 7 ops + aspect inference), workflow loader (cache + deep-copy + metadata strip), chat-message extraction (multimodal + markdown data URLs + 3-image cap)

Phase 2 — Background removal

  • Operation.BGREMOVE + keyword triggers ("sticker", "transparent background", "remove background", "alpha background", etc.)
  • routes/bgremove.py with BiRefNet (default) and RMBG-2.0 (opt-in) workflow stems
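A compressed sketch of the keyword classifier and aspect-ratio inference from Phases 1-2. The trigger lists, Operation values shown, and resolution pairs are illustrative, not the shipped tables:

```python
import re
from enum import Enum

class Operation(Enum):
    GEN = "gen"
    EDIT = "edit"
    BGREMOVE = "bgremove"

# Trigger phrases are examples; the real tables live in protobanana.intents
BGREMOVE_TRIGGERS = ("sticker", "transparent background", "remove background",
                     "alpha background")
EDIT_TRIGGERS = ("change", "replace", "make it", "turn the")

def classify(prompt: str, has_image: bool = False) -> Operation:
    """First-match keyword dispatch: bgremove beats edit beats gen."""
    p = prompt.lower()
    if any(t in p for t in BGREMOVE_TRIGGERS):
        return Operation.BGREMOVE
    if has_image and any(t in p for t in EDIT_TRIGGERS):
        return Operation.EDIT
    return Operation.GEN

def infer_aspect(prompt: str, base: int = 1024) -> tuple:
    """Very rough aspect inference from phrasing (sketch)."""
    p = prompt.lower()
    if re.search(r"\b(portrait|vertical|tall)\b", p):
        return (768, 1344)
    if re.search(r"\b(landscape|wide|banner)\b", p):
        return (1344, 768)
    return (base, base)
```

The deterministic version of this logic is what the 0.1.0a4 agent later replaces on the chat path, and what remains as the hard fallback when the agent is disabled.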

Phase 3 — Multi-reference compose

  • Operation.MULTIREF (auto-routes when ≥2 images present in chat)
  • routes/multiref.py — uploads up to 3 refs to ComfyUI, substitutes filenames into parallel LoadImage nodes (IDs 100/101/102)
  • provider._extract_chat_request collects ALL images from history, capped at 3 (Qwen-Image-Edit-2511 ceiling)
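The collect-all-then-cap rule can be sketched over OpenAI-style messages. The function name and markdown regex are assumptions; the 3-image cap comes from the entry above:

```python
import re

MAX_REFS = 3  # Qwen-Image-Edit-2511 ceiling
_MD_DATA_URL = re.compile(r"!\[[^\]]*\]\((data:image/[^)]+)\)")

def collect_images(messages: list, cap: int = MAX_REFS) -> list:
    """Gather image URLs from chat history (sketch): multimodal parts plus
    markdown-embedded data URLs, oldest first, capped at `cap`."""
    images = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):            # OpenAI multimodal parts
            for part in content:
                if part.get("type") == "image_url":
                    images.append(part["image_url"]["url"])
        elif isinstance(content, str):           # markdown data URLs in text
            images.extend(_MD_DATA_URL.findall(content))
    return images[:cap]
```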

Changed

  • N/A (initial release)

Documentation

  • README.md — quickstart, headline, prior-art accounting
  • PROPOSAL.md — strategic system design, antagonistic review, architecture
  • PHASES.md — 7-phase roadmap with status, models, acceptance criteria
  • JOURNEY.md — full backfill from research → broken integrations → repo
  • HOWTO.md — user-facing recipes (gen, edit, multi-ref, sticker, queued Phases 4-6)
  • DECISIONS.md — architectural decision records
  • docs/ — INSTALLATION, OPERATING, ARCHITECTURE, WORKFLOWS-COOKBOOK, INTENT-ROUTER, API, BENCHMARKS

Known limitations

  • 3-reference cap — Qwen-Image-Edit-2511 ceiling; Nano-Banana 2 supports 14. Cloud fallback recommended for ≥4 refs.
  • No streaming — /v1/chat/completions is buffered until the image is ready. (Streaming the markdown image chunk-by-chunk doesn't add value for indivisible base64 blobs.)
  • Phase 4-6 ops fall back to single EDIT — provider logs a warning and routes through edit.run(). No-op until those phases ship.
  • Usage is zero — ComfyUI doesn't report token-equivalent usage; we report zeros to keep response shape valid.
  • aimage_edit may not route from /v1/images/edits depending on LiteLLM version. Chat-completions path covers edit comprehensively.

Lineage

  • Extracted from protoLabsAI/homelab-iac PRs:
    • #49 (initial Open WebUI ↔ ComfyUI integration; brittle)
    • #50 (workflow JSON _meta strip + node-mapping format fix)
    • #52 (LiteLLM CustomLLM aimage_generation)
    • #53 (aimage_edit + acompletion + size inference)
  • Companion follow-up PR feat/protobanana-package swaps the inline provider for pip install protobanana and drops the inline providers/comfyui_image.py file.

Apache-2.0 licensed. Docs follow the Diátaxis framework.