# Changelog

All notable changes to protoBanana. Format: Keep a Changelog, SemVer.
## [0.1.0a4] - 2026-05-03 - chat agent (LLM as router + brain), region edit, outpaint
### Added - agent + the rest of the ChatGPT-image-2 op set

- Tool-use chat agent (default for `/v1/chat/completions`). The LLM (`protolabs/fast` by default) decides whether to respond conversationally, call an image tool, or chain multiple tools. Replaces the deterministic keyword classifier on the chat path; the keyword path remains as a hard fallback when the agent is disabled or unreachable. New modules: `protobanana/agent.py`, `protobanana/tools.py`. Configured via `PROTOBANANA_AGENT_BASE`/`_KEY`/`_MODEL`/`_MAX_ITERS`. New `[agent]` extra. Full docs: docs/agent.md.
- Phase 4 - agent-driven region edit via SAM 3 + Qwen-Image-Edit-2511 + `ImageCompositeMasked`. The agent names a region (e.g. "the man's tie"), SAM 3 produces a mask from that text (no GroundingDINO/Florence-2 dependency - those are broken on current ComfyUI's transformers), Qwen inpaints inside it, and the composite step preserves outside-mask pixels exactly. New workflow `region_edit_sam3_qwen_image_2511.json`, new route `routes/region_edit.py`, splitter `extract_region_edit_parts()`.
- Phase 5 - inpaint route + workflow for `/v1/images/edits` with a mask multipart. The agent doesn't drive this directly; the routes exist for clients that want to send their own mask.
- Phase 6 - outpaint via `ImagePadForOutpaint` + `InpaintModelConditioning`. New workflow `outpaint_qwen_image_2511.json`, splitter `extract_outpaint_directions()` (parses "extend left", "make this wider", "show more sky", "uncrop" into per-side pad amounts; clamped to [64, 1024]).
- Phase 7 - optional LM intent classifier as a second-pass refiner for ambiguous EDIT/GEN cases. Mostly superseded by the agent itself; ships as a diagnostic for the keyword fallback path.
- Langfuse tracing of provider entry points + ComfyUI HTTP sub-spans + agent iterations + tool calls. New `[tracing]` extra pinned to `langfuse>=2.59,<3` (LiteLLM compatibility - see Fixed). Docs: docs/observability.md.
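The per-side clamp behind `extract_outpaint_directions()` can be sketched roughly as below. This is a hypothetical reconstruction, not the shipped splitter: the exact phrase set, the `DEFAULT_PAD` of 256, and the return shape are all assumptions; only the example phrases and the [64, 1024] clamp come from the changelog.

```python
# Hypothetical sketch of outpaint direction parsing; the real
# extract_outpaint_directions() in protoBanana may differ.
import re

PAD_MIN, PAD_MAX = 64, 1024  # per-side clamp noted in the changelog
DEFAULT_PAD = 256            # assumed default pad amount

def extract_outpaint_directions(prompt: str, pad: int = DEFAULT_PAD) -> dict:
    """Map free-text outpaint requests to per-side pad amounts."""
    p = prompt.lower()
    pad = max(PAD_MIN, min(PAD_MAX, pad))  # clamp to [64, 1024]
    sides = {"left": 0, "right": 0, "top": 0, "bottom": 0}
    if "wider" in p:
        sides["left"] = sides["right"] = pad
    if "uncrop" in p or "zoom out" in p:
        sides = {k: pad for k in sides}
    if "more sky" in p:
        sides["top"] = pad
    for side in ("left", "right", "top", "bottom"):
        if re.search(rf"\b(extend|more)\s+{side}\b", p):
            sides[side] = pad
    return sides
```

The pad values would then feed `ImagePadForOutpaint`'s per-side inputs directly.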
### Fixed

- `workflow_stem` extraction silently fell back to a hardcoded default for the bare-name case. LiteLLM strips the provider prefix on `/v1/images/{generations,edits}` but keeps it on `/v1/chat/completions`. The `if "/" in model else DEFAULT` guard routed every bare-name request to the wrong workflow.
- Multi-ref with fewer than 3 reference images failed with `Invalid image file: ref3.png` - `multiref.substitute()` only populated the slots it had filenames for, leaving the others with placeholder defaults. Now also prunes the unused `LoadImage` + `ImageScale` pairs and drops the corresponding `image_N` input from both encoder nodes.
- Chat path tried to load `gen_qwen_image_2512.json` against gateways named after upstream models. Renamed `gen.DEFAULT_STEM` and `edit.DEFAULT_STEM` to the upstream Qwen names.
- Edit + multi-ref workflows ignored the input image. Switched `CLIPTextEncode` (text-only conditioning) → `TextEncodeQwenImageEditPlus` so the image flows into Qwen2.5-VL's vision tower.
- Sticker tab returned a "blue cat". The inline gateway provider rewrote `workflow_stem` to the edit workflow whenever the stem name didn't contain "edit". Migrated the gateway to install protoBanana as a package; dispatch is now stem-prefix-based.
- Agent deadlocked on first deploy (sync `OpenAI` client inside the async LiteLLM proxy, calling back through the same gateway - blocked event loop). Switched to `AsyncOpenAI`.
- LiteLLM Langfuse callback failed at boot (`Langfuse.__init__() got an unexpected keyword argument 'sdk_integration'`) once the `[tracing]` extra forced langfuse v3. LiteLLM hard-pins v2; v3 removed the kwarg. Pinned `[tracing]` to `langfuse>=2.59,<3`. Trade-off: until a v2 adapter ships, our fine-grained sub-spans no-op cleanly while LiteLLM's per-request traces emit again.
- Agent misrouted "make it a bowling cap" to `generate_image`. The system prompt described tool-choice rules but framed the image-in-conversation context as informational ("the recent assistant image is available for edit_image..."). Rewrote it as a directive contract + few-shot examples. Live-verified against vLLM `local-fast`: it now picks `region_edit(region="the hat", edit_prompt="a bowling cap")`.
- Static workflow validator now skips `LoadImageMask.image` as a runtime-substituted COMBO field, mirroring the earlier `LoadImage.image` skip.
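The stem-prefix dispatch that replaced the "contains edit" substring check could look like the sketch below. The prefix table, route names, and fallback rule here are illustrative assumptions, not the gateway's actual code; the old bug it fixes (any stem without "edit" getting rewritten to the edit workflow) is from the entry above.

```python
# Illustrative stem-prefix dispatch. The old inline-provider bug:
# `if "edit" not in stem` rewrote every non-edit stem to the edit
# workflow, so the sticker tab got edit outputs ("blue cat").
ROUTES = {
    "multiref_": "multiref",        # operation-prefix stems
    "bgremove_": "bgremove",
    "region_edit_": "region_edit",
    "outpaint_": "outpaint",
}

def dispatch(workflow_stem: str) -> str:
    """Pick a route by stem prefix; unprefixed stems are gen/edit-shaped."""
    for prefix, route in ROUTES.items():
        if workflow_stem.startswith(prefix):
            return route
    # upstream-model-direct stems (qwen_image_2512, qwen_image_edit_2511)
    return "edit" if "edit" in workflow_stem else "gen"
```

Prefix matching keeps dispatch deterministic: a stem either names its operation explicitly or falls back to the gen/edit pair named after the upstream model.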
### Changed

- Default chat path is now the agent, not the keyword classifier. Set `PROTOBANANA_AGENT_BASE` to enable; if unset, the provider falls back to keyword dispatch (no behavioral regression for existing clients without an LM endpoint).
- System prompt in `agent.py` rewritten as a directive contract with few-shot examples - the conversation-has-an-image case now reads as a constraint, not a fact.
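Enabling the agent is just the four env vars; a sketch with placeholder values (the URL, key, and iteration cap are illustrative, and the model alias is the documented default):

```shell
# Hypothetical values - point these at your own OpenAI-compatible endpoint.
export PROTOBANANA_AGENT_BASE="http://litellm.internal:4000/v1"  # unset = keyword fallback
export PROTOBANANA_AGENT_KEY="sk-example"                        # gateway API key
export PROTOBANANA_AGENT_MODEL="protolabs/fast"                  # default per this changelog
export PROTOBANANA_AGENT_MAX_ITERS=6                             # tool-loop cap (assumed value)
```

Leaving `PROTOBANANA_AGENT_BASE` unset keeps existing deployments on the keyword classifier unchanged.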
### Discovered via

- homelab-iac#56 - gateway migration to the protoBanana package surfaced everything in this release as the live stack started running real requests against each component in turn.
## [0.1.0a3] - 2026-05-03 - stem alignment + multiref prune + workflow_stem extraction
### Fixed

- Chat path tried to load `gen_qwen_image_2512.json` against gateways named after upstream models. `gen.DEFAULT_STEM` and `edit.DEFAULT_STEM` were prefixed with the operation name (`gen_*`/`edit_*`), which forced gateway maintainers to keep the same naming. Renamed both to match the upstream Qwen model names (`qwen_image_2512`, `qwen_image_edit_2511`) so a chat request through any gateway using the standard model names just works, without per-deployment config.
- Multi-ref with fewer than 3 reference images failed with `Invalid image file: ref3.png`. `multiref.substitute()` only populated the slots it had filenames for, leaving the others with placeholder defaults. Now also prunes the unused `LoadImage` + `ImageScale` pairs and drops the corresponding `image_N` input from both encoder nodes (the encoder inputs are optional per `/object_info`). 5 new unit tests + e2e verified.
- `workflow_stem` extraction silently fell back to a hardcoded default for the bare-name case. LiteLLM strips the provider prefix on `/v1/images/{generations,edits}` but keeps it on `/v1/chat/completions`. The `if "/" in model else DEFAULT` guard routed every bare-name request to the wrong workflow. Now uses `model.split("/", 1)[-1] or DEFAULT`, which handles both shapes. 3 new regression tests.
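The one-line fix handles both model-name shapes LiteLLM can send. A minimal sketch (the function name and `DEFAULT` stand-in are illustrative; the split expression is the one from the changelog):

```python
# DEFAULT stands in for the route's DEFAULT_STEM constant.
DEFAULT = "qwen_image_2512"

def workflow_stem(model: str) -> str:
    """Handle both shapes: 'provider/stem' and bare 'stem'.

    The old guard - take the suffix only `if "/" in model else DEFAULT` -
    sent every bare-name request to the hardcoded default workflow.
    """
    return model.split("/", 1)[-1] or DEFAULT
```

`split("/", 1)[-1]` is the whole string when there is no slash, so bare names pass through; the `or DEFAULT` only catches an empty model string.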
### Changed

- `workflows/gen_qwen_image_2512.json` → `workflows/qwen_image_2512.json`
- `workflows/edit_qwen_image_2511.json` → `workflows/qwen_image_edit_2511.json`
- `gen.DEFAULT_STEM` constant + docstrings updated to match
- `edit.DEFAULT_STEM` constant + docstrings updated to match
- docs/workflows-cookbook.md - naming convention now distinguishes upstream-model-direct (gen/edit) from operation-prefix (multiref/bgremove). Stem MUST match the JSON filename.
### Discovered via

- homelab-iac#56 - gateway migration to the protoBanana package surfaced all three issues in sequence as each piece of the live stack started running real requests.
## [0.1.0a2] - 2026-05-03 - workflow validator + edit conditioning fix
### Fixed

- Edit + multi-ref workflows ignored the input image. Both `edit_qwen_image_2511.json` and `multiref_qwen_image_2511.json` used `CLIPTextEncode` (text-only conditioning) and routed the input only through `VAEEncode` → `latent_image` for KSampler. With `denoise=1.0` that latent gets fully overwritten with random noise, so the model saw zero visual context. Switched both to `TextEncodeQwenImageEditPlus` on positive AND negative - the image now flows into Qwen2.5-VL's vision tower as proper conditioning. Verified end-to-end: red+circle input + "change the white circle to a yellow star, keep the red background" → red+star output (avg RGB 225,49,29).
- `bgremove_birefnet.json` used the wrong `class_type`. Was `RMBG` (which only accepts RMBG-2.0/INSPYRENET/BEN/BEN2); BiRefNet needs the separate `BiRefNetRMBG` node from ComfyUI-RMBG. Caught by the new static validator on its first run.
- `ImageScaleToTotalPixels` now requires `resolution_steps`. Patched all 5 instances across the `edit` + `multiref` workflows.
### Added

- `scripts/validate_workflows.py` - static validator that hits ComfyUI's `/object_info` and checks every workflow JSON: class_type exists, required inputs present, COMBO values valid. Skips runtime-substituted fields (`LoadImage.image`). Exit code = number of failed workflows.
- `tests/test_workflows_static.py` - pytest gate over every workflow JSON. Skipped when `COMFYUI_BASE_URL` is unset or ComfyUI is unreachable, so unit-test CI without a ComfyUI dep still runs clean.
- docs/validating-workflows.md - when to run, what it catches, and the schema-vs-semantic gap with an e2e smoke pattern.
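The core of such a check fits in one function. A minimal sketch, assuming an already-fetched `/object_info` dict and ComfyUI's workflow-JSON shape (`{node_id: {"class_type": ..., "inputs": {...}}}`); the real `scripts/validate_workflows.py` also validates COMBO values and does the HTTP fetch itself:

```python
# Runtime-substituted fields skipped by the validator (per the changelog).
SKIP_FIELDS = {("LoadImage", "image"), ("LoadImageMask", "image")}

def validate_workflow(workflow: dict, object_info: dict) -> list[str]:
    """Return a list of problems found in one ComfyUI workflow JSON."""
    problems = []
    for node_id, node in workflow.items():
        ctype = node.get("class_type")
        info = object_info.get(ctype)
        if info is None:
            problems.append(f"node {node_id}: unknown class_type {ctype!r}")
            continue
        required = info.get("input", {}).get("required", {})
        for field in required:
            if (ctype, field) in SKIP_FIELDS:
                continue  # substituted per-request, placeholder is fine
            if field not in node.get("inputs", {}):
                problems.append(f"node {node_id}: missing required input {field!r}")
    return problems
```

Summing non-empty results across workflows gives the "exit code = number of failed workflows" behavior.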
### Changed

- `protobanana.routes.edit.substitute()` / `protobanana.routes.multiref.substitute()` now write the prompt to `prompt` (Qwen edit encoder) or `text` (legacy CLIPTextEncode) based on the node's `class_type`, via a `_set_prompt()` helper. This lets `bgremove`/`gen` keep using `CLIPTextEncode` while edit-shaped workflows use the Plus encoder.
### Lesson
Static schema validation can't catch "this workflow is the wrong shape for the model loaded at node 37." The conditioning bug passed the new validator (CLIPTextEncode is a real node, all required fields were set). Schema validation answers "will ComfyUI accept this graph"; an end-to-end smoke (real input β check output is related to input) is what answers "will the model actually do the work." Both are now documented as the standing pre-merge gate.
## [0.1.0a1] - 2026-05-03 - Gradio test/eval UI + HF Space scaffold
### Added

- `app/gradio_app.py` - Gradio 5.x UI with 5 tabs (Generate, Edit, Multi-ref, Sticker/BG remove, Chat). Settings accordion for gateway URL + API key + model alias overrides. Defaults pull from env.
- `app/__main__.py` - `python -m app` entry point with `--share`, `--port`, `--auth` flags
- `app/README.md` - Gradio app docs (configuration, troubleshooting)
- `app/spaces/app.py` - HuggingFace Spaces entry point that re-exports the canonical `build_app()`
- `app/spaces/requirements.txt` - minimal Space deps (gradio, openai, pillow)
- `app/spaces/README.md` - Space frontmatter + deploy walk-through
- `docs/GRADIO-APP.md` - UI architecture + Space deploy strategy
- `gradio` optional extra in pyproject (`pip install -e ".[gradio]"`)
### Architecture note
The Gradio app is a thin OpenAI client (~600 LOC). All model logic stays server-side in the gateway + provider; the Space deploy is CPU-only because nothing on the UI side touches model weights. Users bring their own gateway URL + API key (or the Space owner sets them as Space secrets).
## [0.1.0a0] - 2026-05-03 - initial extraction
Standalone repo carved out of `protoLabsAI/homelab-iac` PRs #52 and #53. Phases 1-3 implemented; Phases 4-7 specced.
### Added

Phase 1 - Foundation

- `protobanana.provider.ProtoBananaProvider` (LiteLLM `CustomLLM`) with three entry points: `aimage_generation`, `aimage_edit`, `acompletion`
- `protobanana.client.ComfyUIClient` - async HTTP transport (upload, submit, poll, fetch, view)
- `protobanana.workflows.WorkflowLoader` - caches templates, returns deep copies, strips metadata keys
- `protobanana.intents.keywords` - operation classifier + aspect-ratio inference from prompt text
- `protobanana.routes.gen/edit/multiref/bgremove` modules
- Workflow JSONs:
  - `workflows/gen_qwen_image_2512.json`
  - `workflows/edit_qwen_image_2511.json`
  - `workflows/multiref_qwen_image_2511.json`
  - `workflows/bgremove_birefnet.json` (default, commercial-safe)
  - `workflows/bgremove_rmbg2.json` (opt-in, CC BY-NC 4.0)
- 46 unit tests covering: intent classification (all 7 ops + aspect inference), workflow loader (cache + deep-copy + metadata strip), chat-message extraction (multimodal + markdown data URLs + 3-image cap)
Phase 2 - Background removal

- `Operation.BGREMOVE` + keyword triggers ("sticker", "transparent background", "remove background", "alpha background", etc.)
- `routes/bgremove.py` with BiRefNet (default) and RMBG-2.0 (opt-in) workflow stems
Phase 3 - Multi-reference compose

- `Operation.MULTIREF` (auto-routes when ≥2 images are present in chat)
- `routes/multiref.py` - uploads up to 3 refs to ComfyUI, substitutes filenames into parallel `LoadImage` nodes (IDs 100/101/102)
- `provider._extract_chat_request` collects ALL images from history, capped at 3 (the Qwen-Image-Edit-2511 ceiling)
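The history-wide collection with a hard cap can be sketched as below, assuming the OpenAI multimodal chat format (`content` as a list of `image_url` parts); the real `_extract_chat_request` also handles markdown data URLs and the rest of the request:

```python
MAX_REFS = 3  # Qwen-Image-Edit-2511 ceiling

def collect_images(messages: list[dict], cap: int = MAX_REFS) -> list[str]:
    """Gather image URLs/data URLs from ALL messages, oldest first, capped."""
    urls = []
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            continue  # plain-string content carries no image parts
        for part in content:
            if part.get("type") == "image_url":
                urls.append(part["image_url"]["url"])
    return urls[:cap]
```

Two or more collected images would trigger the MULTIREF auto-route.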
### Changed
- N/A (initial release)
### Documentation
- README.md - quickstart, headline, prior-art accounting
- PROPOSAL.md - strategic system design, antagonistic review, architecture
- PHASES.md - 7-phase roadmap with status, models, acceptance criteria
- JOURNEY.md - full backfill from research → broken integrations → repo
- HOWTO.md - user-facing recipes (gen, edit, multi-ref, sticker, queued Phases 4-6)
- DECISIONS.md - architectural decision records
- docs/ - INSTALLATION, OPERATING, ARCHITECTURE, WORKFLOWS-COOKBOOK, INTENT-ROUTER, API, BENCHMARKS
### Known limitations

- 3-reference cap - the Qwen-Image-Edit-2511 ceiling; Nano-Banana 2 supports 14. Cloud fallback recommended for ≥4 refs.
- No streaming - `/v1/chat/completions` is buffered until the image is ready. (Streaming the markdown image chunk-by-chunk doesn't add value for indivisible base64 blobs.)
- Phase 4-6 ops fall back to single EDIT - the provider logs a warning and routes through `edit.run()`. No-op until those phases ship.
- `usage` is zero - ComfyUI doesn't report token-equivalent usage; we report zeros to keep the response shape valid.
- `aimage_edit` may not route from `/v1/images/edits` depending on LiteLLM version. The chat-completions path covers edit comprehensively.
### Lineage

- Extracted from `protoLabsAI/homelab-iac` PRs:
  - #49 (initial Open WebUI → ComfyUI integration; brittle)
  - #50 (workflow JSON `_meta` strip + node-mapping format fix)
  - #52 (LiteLLM CustomLLM `aimage_generation`)
  - #53 (`aimage_edit` + `acompletion` + size inference)
- Companion follow-up PR `feat/protobanana-package` swaps the inline provider for `pip install protobanana` and drops the `providers/comfyui_image.py` inline file.