Quickstart
5 minutes to your first generated image through a protoBanana-backed gateway.
Prerequisites
You need:
- A running LiteLLM gateway (or the willingness to spin one up)
- ComfyUI running on a machine with an NVIDIA GPU (~24 GB VRAM recommended for Qwen-Image FP8)
- The Qwen-Image-2512 model files in ComfyUI's standard directories
If you don't have these yet, see the full Installation guide.
1. Install the package
In your LiteLLM gateway environment:
```shell
pip install git+https://github.com/protoLabsAI/protoBanana.git
```

The package ships a CustomLLM handler at `protobanana.handler` plus default workflows under `protobanana/workflows/`.
2. Wire LiteLLM
Add to your config.yaml:

```yaml
model_list:
  - model_name: protolabs/qwen-image
    litellm_params:
      model: protobanana/qwen_image_2512
      api_base: http://your-comfyui-host:8188
    model_info: { mode: image_generation }
  - model_name: protolabs/qwen-image-chat
    litellm_params:
      model: protobanana/chat
      api_base: http://your-comfyui-host:8188
    model_info: { mode: chat, supports_vision: true }

litellm_settings:
  custom_provider_map:
    - { provider: "protobanana", custom_handler: "protobanana.handler" }
```

Restart the gateway. If you're running it in Docker:

```shell
docker compose restart gateway
```

3. Smoke-test
```shell
# Generation endpoint
curl -X POST http://your-gateway:4000/v1/images/generations \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"protolabs/qwen-image","prompt":"a watercolor of a cat in a hat"}' \
  | jq '.data[0].b64_json | length'
# expect ~2,000,000 characters (a 1024×1024 PNG, base64-encoded)
```
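To actually look at the result, decode the `b64_json` field and write it to disk. A minimal sketch — the `save_image` helper is ours, not part of protoBanana, and the demo payload below stands in for a real response:

```python
import base64
import pathlib

def save_image(b64_png: str, path: str = "out.png") -> int:
    """Decode a base64-encoded PNG (the b64_json field) and write it to disk.
    Returns the number of bytes written."""
    raw = base64.b64decode(b64_png)
    pathlib.Path(path).write_bytes(raw)
    return len(raw)

# Demo with a tiny stand-in payload; in practice pass
# response["data"][0]["b64_json"] from the curl call above.
fake_png = base64.b64encode(b"\x89PNG\r\n\x1a\n...demo bytes...").decode()
print(save_image(fake_png, "out.png"))
```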
```shell
# Chat-completions endpoint (the conversational UX)
curl -X POST http://your-gateway:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"protolabs/qwen-image-chat","messages":[
    {"role":"user","content":"a cat in a hat, watercolor, portrait"}
  ]}' \
  | jq '.choices[0].message.content[:200]'
```

The chat call returns an assistant message with a markdown-embedded data:image/png;base64,... URL. Any markdown-rendering chat client (Open WebUI, a Slack rich-text bridge, Discord) displays it inline like a regular image.
4. Try the UX from a real chat client
In Open WebUI: select protolabs/qwen-image-chat, type "draw a cat in a hat" → image appears inline. Then "now make it blue" → edited image. Then "remove the background" → transparent PNG. The provider auto-routes per turn.
5. Spin up the test/eval Gradio app (optional)
```shell
pip install -e ".[gradio]"
GATEWAY_URL=http://your-gateway:4000/v1 GATEWAY_API_KEY=sk-... python -m app
```

The app opens at http://localhost:7860 with five tabs covering Generate, Edit, Multi-ref, Sticker, and Chat. See the Gradio app docs.
What just happened
Your chat client → LiteLLM gateway → the protoBanana provider → ComfyUI → image. The provider:
- Parsed your OpenAI request
- Walked the messages history to find prompt + any input images
- Classified the operation (gen, edit, multi-ref, sticker, etc.) per intent router rules
- Loaded the matching workflow JSON, substituted prompt/seed/size into the right node IDs
- Submitted to ComfyUI, polled for completion, fetched the result image
- Returned an OpenAI-shaped response
Total provider code: ~600 LOC. Most of the value is in the orchestration, not the model layer.
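The "substituted prompt/seed/size into the right node IDs" step above can be sketched in a few lines. This is an illustration of the technique, not protoBanana's actual code — the node IDs and field names below are hypothetical stand-ins for a real ComfyUI API-format workflow:

```python
import json
import random

# Toy ComfyUI workflow in API format: node-id -> {class_type, inputs}.
# Real workflows ship under protobanana/workflows/; these IDs are made up.
workflow = {
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "PROMPT_PLACEHOLDER"}},
    "3": {"class_type": "KSampler", "inputs": {"seed": 0, "steps": 20}},
    "5": {"class_type": "EmptyLatentImage", "inputs": {"width": 512, "height": 512}},
}

def substitute(wf: dict, prompt: str, seed: int, width: int, height: int) -> dict:
    """Deep-copy the workflow and patch prompt/seed/size into the right nodes."""
    wf = json.loads(json.dumps(wf))  # cheap deep copy via round-trip
    wf["6"]["inputs"]["text"] = prompt
    wf["3"]["inputs"]["seed"] = seed
    wf["5"]["inputs"]["width"], wf["5"]["inputs"]["height"] = width, height
    return wf

patched = substitute(workflow, "a watercolor of a cat in a hat",
                     random.randrange(2**32), 1024, 1024)
print(patched["6"]["inputs"]["text"], patched["5"]["inputs"]["width"])
# The patched dict is then POSTed to ComfyUI's /prompt endpoint,
# polled via /history/{prompt_id}, and the output image fetched via /view.
```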
Next steps
- Operating guide — GPU planning, model swap behavior, troubleshooting
- Architecture — component breakdown + extension model
- How-to recipes — prompting tricks, multi-ref tips, intent keywords
- Phases roadmap — what's queued (region edit, inpaint, outpaint, LM intent classifier)