The proxy
What `{provider}-proxy.vendo.run` does to your request, what you can observe, and what to expect when it 4xxs.
When your tool calls an LLM, TTS, or transcription provider through Vendo, the request doesn't go to api.openai.com. It goes to openai-proxy.vendo.run — a Cloudflare Worker that authenticates, meters, and forwards on your behalf. Your tool never sees the real provider API key.
The proxy accepts both Authorization: Bearer vendo_sk_*** (the OpenAI convention) and x-api-key: vendo_sk_*** (the Anthropic SDK convention), so either style of SDK authenticates without changes.
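To make the dual-header behavior concrete, here is a minimal sketch of how a proxy could pull the Vendo key out of either header. The function name `extractVendoKey` is hypothetical, and the precedence (Authorization checked before x-api-key) is an assumption; the source does not say which header wins if both are present.

```typescript
// Hypothetical sketch: accept a Vendo key from either header convention.
// Not Vendo's actual code; precedence of Authorization over x-api-key is assumed.
type HeaderBag = Record<string, string | undefined>;

function extractVendoKey(headers: HeaderBag): string | null {
  // HTTP header names are case-insensitive, so normalize them first.
  const lower: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    if (value !== undefined) lower[name.toLowerCase()] = value;
  }

  // OpenAI convention: "Authorization: Bearer <key>".
  const auth = lower["authorization"];
  if (auth !== undefined) {
    const match = /^Bearer\s+(\S+)$/i.exec(auth.trim());
    if (match) return match[1];
  }

  // Anthropic SDK convention: "x-api-key: <key>".
  const apiKey = lower["x-api-key"];
  if (apiKey !== undefined && apiKey.trim() !== "") return apiKey.trim();

  // Neither header present: the proxy would answer 401.
  return null;
}
```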
The path your request takes
```
Your tool
│ POST https://openai-proxy.vendo.run/v1/chat/completions
│ Authorization: Bearer vendo_sk_***
│
▼
Proxy Worker
│ 1. Resolve adapter from subdomain (openai-proxy → openai adapter)
│ 2. SHA-256 the bearer, look up key:{hash} in KV
│ 3. Check balance:{tenantId} in KV
│ 4. Resolve binding: app_id + provider → which connection's credentials
│
▼
Upstream provider (api.openai.com)
│ Authorization: Bearer sk-*** (real key, injected by adapter)
│
▼
Response back to your tool
│
└─► Worker meters tokens, debits credits (waitUntil; non-blocking)
```

The whole thing typically adds 10–40ms of overhead per request. Streaming responses pipe straight through.
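Steps 1 and 2 of the Worker path (resolve the adapter from the subdomain, then hash the bearer into a KV lookup key) can be sketched as follows. This is an illustration, not Vendo's code: the hex encoding of the digest is an assumption, and it uses Node's `node:crypto` so it runs locally, whereas a Cloudflare Worker would typically call `crypto.subtle.digest("SHA-256", …)` instead. The adapter set is taken from the provider table on this page.

```typescript
import { createHash } from "node:crypto";

// Adapters named in the provider table; any other subdomain gets a 404.
const KNOWN_ADAPTERS = new Set([
  "openai", "anthropic", "openrouter", "gemini", "elevenlabs",
  "assemblyai", "muapi", "telegram", "composio",
]);

// Step 1: "openai-proxy.vendo.run" -> "openai"; null means "unknown adapter".
function resolveAdapter(hostname: string): string | null {
  const match = /^([a-z0-9]+)-proxy\.vendo\.run$/.exec(hostname.toLowerCase());
  if (!match || !KNOWN_ADAPTERS.has(match[1])) return null;
  return match[1];
}

// Step 2: hash the bearer so the raw key is never stored or used as a KV key.
// Assumption: the digest is hex-encoded inside "key:{hash}".
function kvKeyForBearer(bearer: string): string {
  const digest = createHash("sha256").update(bearer, "utf8").digest("hex");
  return `key:${digest}`;
}
```

Hashing before lookup means a leaked KV dump exposes digests rather than usable `vendo_sk_` keys.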
The providers
Each provider you call gets its own subdomain:
| Provider | Hostname | Notes |
|---|---|---|
| OpenAI | openai-proxy.vendo.run | Token-metered. |
| Anthropic | anthropic-proxy.vendo.run | Token-metered. Accepts x-api-key and Authorization. |
| OpenRouter | openrouter-proxy.vendo.run | Token-metered. Adapter prepends /api to paths that don't start with /api/, so /v1/chat/completions is silently rewritten to /api/v1/chat/completions. |
| Gemini | gemini-proxy.vendo.run | Token-metered. Streaming uses :streamGenerateContent URL paths. |
| ElevenLabs | elevenlabs-proxy.vendo.run | TTS per-character; STT per audio-hour. |
| AssemblyAI | assemblyai-proxy.vendo.run | Per audio-hour; deduped by transcript id. |
| MuAPI | muapi-proxy.vendo.run | Per-request flat cost per model. |
| Telegram | telegram-proxy.vendo.run | Outbound bot-API forwarder. Not metered — Vendo doesn't bill Telegram traffic. |
| Composio | composio-proxy.vendo.run | Meta-integration brokering ~1,000 toolkits. Per-call cost; uses Composio-managed OAuth tokens under the hood. |
The path you send (/v1/chat/completions, /v1/messages, etc.) matches the upstream provider's path — the SDKs you already use work without modification once you swap the base URL.
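A small sketch of the URL rules above: the hostname follows the `{provider}-proxy.vendo.run` pattern, the path is passed through as-is, and OpenRouter's adapter adds an `/api` prefix upstream. Both helper names are hypothetical; in practice you would read the base URL from the env var Vendo injects rather than build it yourself.

```typescript
// Hypothetical helper: proxy URL for a provider plus the provider's own path.
function proxyUrl(provider: string, path: string): string {
  const normalized = path.startsWith("/") ? path : `/${path}`;
  return `https://${provider}-proxy.vendo.run${normalized}`;
}

// OpenRouter rule from the table: paths that don't start with /api/
// get /api prepended before the request is forwarded upstream.
function openRouterUpstreamPath(path: string): string {
  return path.startsWith("/api/") ? path : `/api${path}`;
}
```

So a client posting to `/v1/chat/completions` on the OpenRouter proxy and one posting to `/api/v1/chat/completions` reach the same upstream route.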
Status codes you'll see
On success, the proxy passes the provider's response through unchanged. When the proxy itself rejects a request, it returns its own status code instead. Where the table below lists a Vendo-Error-Code, the response carries that stable, machine-readable code in a Vendo-Error-Code header: match on the header rather than on the prose.
| Status | Vendo-Error-Code | Meaning |
|---|---|---|
| 400 | (none) | Path-traversal guard rejected the URL (e.g. /foo/../healthz). |
| 401 | app_unknown | Missing or unrecognized bearer. |
| 401 | app_revoked | Bearer was issued but later revoked. |
| 401 | connection_needs_reauth | The bound connection is in needs_reauth state. User must re-authorize. |
| 402 | (none) | Insufficient credits. Tenant balance is at or below zero. Top up. |
| 403 | binding_missing | No active connection bound for this provider on your app, or tenant/provider mismatch on a connection-scoped key. |
| 403 | connection_revoked | The bound connection itself is revoked. |
| 404 | (none) | Unknown adapter: you hit a subdomain Vendo doesn't proxy. |
| 429 | spend_cap_daily | App key hit its daily spend cap. Retry-After header gives seconds until UTC midnight. |
| 429 | spend_cap_monthly | App key hit its monthly spend cap. Retry-After gives seconds until UTC month-end. |
| 503 | (none) | Billing service unavailable. Either Supabase is unreachable or the per-tenant Wallet Durable Object is degraded. Transient; retry. |
Any 4xx or 5xx from the upstream is passed back to you unchanged, with the same status code and body the provider would have returned and no Vendo-Error-Code header. A response that carries the header is therefore always a proxy rejection; a code-less 400, 402, 404, or 503 may come from either side.
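A minimal client-side sketch of "match on the header, not the prose", using the codes from the table. The action labels (`check_key`, `reconnect`, and so on) are hypothetical names for what your tool might do; they are not part of any Vendo API. Headers are taken as a plain object here so the sketch has no runtime dependencies.

```typescript
// Hypothetical action labels for each Vendo-Error-Code in the table.
type Action = "check_key" | "reauthorize" | "reconnect" | "wait_for_reset" | "unknown_code";

function actionForVendoCode(code: string): Action {
  switch (code) {
    case "app_unknown":
    case "app_revoked":
      return "check_key";
    case "connection_needs_reauth":
      return "reauthorize";
    case "binding_missing":
    case "connection_revoked":
      return "reconnect";
    case "spend_cap_daily":
    case "spend_cap_monthly":
      return "wait_for_reset";
    default:
      return "unknown_code"; // a code this sketch doesn't know about
  }
}

interface ErrorInfo {
  status: number;
  hasVendoCode: boolean;
  code?: string;
  action?: Action;
  retryAfterSeconds?: number;
}

function describeError(status: number, headers: Record<string, string>): ErrorInfo {
  // Case-insensitive header lookup.
  const lookup = (name: string): string | undefined => {
    const key = Object.keys(headers).find((k) => k.toLowerCase() === name);
    return key === undefined ? undefined : headers[key];
  };

  const code = lookup("vendo-error-code");
  if (code === undefined) {
    // Either an upstream error passed through, or a code-less proxy status.
    return { status, hasVendoCode: false };
  }

  const info: ErrorInfo = { status, hasVendoCode: true, code, action: actionForVendoCode(code) };
  // Spend-cap 429s include Retry-After in seconds.
  const retryAfter = lookup("retry-after");
  if (retryAfter !== undefined) {
    const seconds = Number.parseInt(retryAfter, 10);
    if (Number.isFinite(seconds)) info.retryAfterSeconds = seconds;
  }
  return info;
}
```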
The proxy also responds to CORS preflights unconditionally with Access-Control-Allow-Origin: *, so the providers can be called from a browser.
Free vs metered routes
Not every call costs credits. Each provider's adapter classifies routes:
- Metered: completions, embeddings, audio synthesis, transcription. Reserves a hold against your balance and settles it after the response.
- Free: metadata endpoints (e.g. GET /v1/models, OpenRouter's catalog). Zero credits, no balance check. A tenant with a $0 balance can still list models.
- Blocked: a small denylist of provider routes Vendo explicitly refuses to forward.
You don't have to track which is which. The proxy decides at dispatch time.
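For intuition, dispatch-time classification might look like the sketch below. The rule table is hypothetical: only "GET /v1/models is free" and "completions and embeddings are metered" come from the list above, Vendo's real per-adapter route lists are not shown here, and treating unlisted routes as metered is an assumption.

```typescript
type RouteClass = "metered" | "free" | "blocked";

interface RouteRule {
  method: string;
  pattern: RegExp;
  cls: RouteClass;
}

// Hypothetical OpenAI-adapter rules; blocked routes would be listed the same way.
const OPENAI_RULES: RouteRule[] = [
  { method: "GET", pattern: /^\/v1\/models(\/.*)?$/, cls: "free" },
  { method: "POST", pattern: /^\/v1\/chat\/completions$/, cls: "metered" },
  { method: "POST", pattern: /^\/v1\/embeddings$/, cls: "metered" },
];

function classifyRoute(rules: RouteRule[], method: string, path: string): RouteClass {
  for (const rule of rules) {
    if (rule.method === method.toUpperCase() && rule.pattern.test(path)) return rule.cls;
  }
  // Assumption: unlisted routes are metered, the conservative default for billing.
  return "metered";
}

// Free routes skip the balance check, which is why a $0 tenant can list models.
function needsBalanceCheck(cls: RouteClass): boolean {
  return cls === "metered";
}
```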
What gets metered, and how
For text generation: usage is extracted from the response (tokens in/out) and priced from Vendo's rates table, which mirrors provider pricing plus a per-meter margin. For TTS: per-character. For transcription: per audio-hour.
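The pricing arithmetic can be sketched in integer micros. Everything numeric here is hypothetical: the rates, the per-million-token and per-1k-character units, expressing the margin in basis points, and rounding up to the next micro are all assumptions for illustration, not Vendo's rates table.

```typescript
// Apply a margin (in basis points, 100 bps = 1%) and round up to whole micros.
// With integer inputs the numerator stays exact while below 2^53.
function ceilWithMargin(numerator: number, denominator: number, marginBps: number): number {
  return Math.ceil((numerator * (10_000 + marginBps)) / (denominator * 10_000));
}

interface TokenRate {
  inputMicrosPerMTok: number;  // micros per 1M input tokens (hypothetical unit)
  outputMicrosPerMTok: number; // micros per 1M output tokens
}

// Text generation: tokens in/out priced separately.
function tokenCostMicros(inputTokens: number, outputTokens: number, rate: TokenRate, marginBps: number): number {
  const numerator = inputTokens * rate.inputMicrosPerMTok + outputTokens * rate.outputMicrosPerMTok;
  return ceilWithMargin(numerator, 1_000_000, marginBps);
}

// TTS: per character (rate quoted per 1k characters here).
function ttsCostMicros(characters: number, microsPer1kChars: number, marginBps: number): number {
  return ceilWithMargin(characters * microsPer1kChars, 1_000, marginBps);
}

// Transcription: per audio-hour, measured in seconds.
function transcriptionCostMicros(audioSeconds: number, microsPerAudioHour: number, marginBps: number): number {
  return ceilWithMargin(audioSeconds * microsPerAudioHour, 3_600, marginBps);
}
```

For example, 1,000 input tokens at 2,500,000 micros/M plus 500 output tokens at 10,000,000 micros/M is 7,500 micros; a 10% margin (1,000 bps) makes it 8,250.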
Billing runs after the response is sent (via Cloudflare's waitUntil). If billing fails, your call has still succeeded. The catch: balance updates can lag by a few seconds, so don't poll the balance immediately after a call expecting an instant decrement.
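The "respond first, bill later" pattern looks roughly like this. `WaitUntilContext` mirrors only the `waitUntil` method of a Worker's execution context; `respondThenBill`, `ProxiedResult`, and the `debit` callback are hypothetical stand-ins, not Vendo's code.

```typescript
// Only the waitUntil shape of a Worker's execution context.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

interface ProxiedResult {
  status: number;
  body: string;
}

function respondThenBill(
  upstream: ProxiedResult,
  ctx: WaitUntilContext,
  debit: () => Promise<void>, // should be async so failures surface as rejections
  onBillingError: (err: unknown) => void,
): ProxiedResult {
  // Schedule billing without awaiting it: the caller gets the response now,
  // and a billing failure is logged instead of turning the call into an error.
  ctx.waitUntil(debit().catch(onBillingError));
  return upstream;
}
```

Because the debit is not awaited, a balance read issued right after the response can still show the pre-call value.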
For streaming responses (stream: true), the proxy reads usage from the final SSE chunk. OpenRouter's proxy auto-injects stream_options: { include_usage: true } so the chunk is present.
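Here is a sketch of both halves: reading usage from the last SSE chunk that carries it, and injecting `stream_options.include_usage` into a streaming request body. It assumes the OpenAI-style chat-completions chunk shape (`usage` is null on earlier chunks and populated on the final one); other providers, Gemini among them, stream differently. Function names are hypothetical.

```typescript
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

// Scan "data:" lines and keep the last non-null usage object.
function usageFromSse(streamText: string): Usage | null {
  let last: Usage | null = null;
  for (const line of streamText.split("\n")) {
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue;
    try {
      const chunk = JSON.parse(payload) as { usage?: Usage | null };
      if (chunk.usage) last = chunk.usage;
    } catch {
      // Skip partial or non-JSON lines in this sketch.
    }
  }
  return last;
}

// Ensure streaming requests ask for the usage chunk, keeping other stream_options.
function withIncludeUsage(body: Record<string, unknown>): Record<string, unknown> {
  if (body.stream !== true) return body;
  const existing = (body.stream_options ?? {}) as Record<string, unknown>;
  return { ...body, stream_options: { ...existing, include_usage: true } };
}
```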
What you can observe
- Your tool's logs: the request, status code, latency. Same as calling the provider directly.
- `GET /api/billing/balance`: current credit balance (the endpoint the SDK uses; returns micros). `/api/v1/balance` is the legacy USD-shaped endpoint kept for the CLI.
- `GET /api/billing/usage`: recent usage rolled up per provider.
- Dashboard → Billing: historical usage by provider.
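If you display the balance returned in micros, a small formatter helps. This sketch assumes the value is an integer count of micro-USD (1 USD = 1,000,000 micros) and truncates sub-cent amounts; the response field name is not documented here, so confirm the unit and shape before relying on it.

```typescript
// Assumption: input is integer micro-USD. Truncates toward zero at the cent.
function formatMicrosAsUsd(micros: number): string {
  const sign = micros < 0 ? "-" : "";
  const abs = Math.abs(Math.trunc(micros));
  const dollars = Math.floor(abs / 1_000_000);
  const cents = Math.floor((abs % 1_000_000) / 10_000);
  return `${sign}$${dollars}.${String(cents).padStart(2, "0")}`;
}
```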
You cannot see the real provider API key, every upstream rate-limit header (the proxy doesn't relay all of them), or another tenant's calls.
Gotchas
- Don't hard-code base URLs. Read the URL from the env var Vendo injects (e.g. `OPENROUTER_BASE_URL`, `OPENAI_BASE_URL`). If you bake it in, you can't run the same code locally against the upstream provider.
- `X-Vendo-Connection` for disambiguation. If a tenant has two connections to the same provider (rare, but possible), pass `X-Vendo-Connection: <connection_id>` to pick one. Otherwise the proxy picks the oldest binding.
- Holds can leave a 60s shadow. If your tool retries quickly on a transient error before the hold settles, you may briefly see a smaller available balance than expected.
- Telegram is forwarded, not metered. Telegram bot calls go through `telegram-proxy.vendo.run`, but Vendo doesn't bill them.
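The first two gotchas combine naturally in client code. This is a sketch: `baseUrlFromEnv` and `vendoHeaders` are hypothetical helpers, and the fallback URL you pass is your choice (for example, the upstream provider's own base URL when running outside Vendo). Only the env var names and the `X-Vendo-Connection` header come from this page.

```typescript
type Env = Record<string, string | undefined>;

// Prefer the injected base URL; fall back when the var is absent (e.g. local runs).
function baseUrlFromEnv(env: Env, varName: string, fallback: string): string {
  const value = env[varName];
  return value !== undefined && value !== "" ? value.replace(/\/+$/, "") : fallback;
}

// Add X-Vendo-Connection only when you need to pick among several connections
// to the same provider; without it, the proxy uses the oldest binding.
function vendoHeaders(apiKey: string, connectionId?: string): Record<string, string> {
  const headers: Record<string, string> = { Authorization: `Bearer ${apiKey}` };
  if (connectionId) headers["X-Vendo-Connection"] = connectionId;
  return headers;
}
```

In a Node tool you would pass `process.env` as `env`, so the same code targets the proxy under Vendo and the provider directly elsewhere.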
Related
- Credentials worker — where OAuth tokens come from.
- Secrets and env vars — env vars your tool reads.