Vendo Docs

The proxy

What `{provider}-proxy.vendo.run` does to your request, what you can observe, and what to expect when it 4xxs.

When your tool calls an LLM, TTS, or transcription provider through Vendo, the request doesn't go to the provider's own host. A call to OpenAI, for example, goes to openai-proxy.vendo.run instead of api.openai.com. That host is a Cloudflare Worker that authenticates, meters, and forwards on your behalf. Your tool never sees the real provider API key.

The proxy accepts both Authorization: Bearer vendo_sk_*** (the OpenAI convention) and x-api-key: vendo_sk_*** (Anthropic SDK compatibility). SDKs you already use work without modification once you swap the base URL.
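A minimal sketch of the two header conventions described above. The helper name `vendo_auth_headers` and its `style` parameter are hypothetical, introduced only for illustration; the header names and the `vendo_sk_` key format come from this page.

```python
def vendo_auth_headers(vendo_key: str, style: str = "bearer") -> dict[str, str]:
    """Build the auth header for one of the two conventions the proxy accepts.

    Hypothetical helper: "bearer" follows the OpenAI convention,
    "x-api-key" follows the Anthropic SDK convention.
    """
    if style == "bearer":
        return {"Authorization": f"Bearer {vendo_key}"}
    if style == "x-api-key":
        return {"x-api-key": vendo_key}
    raise ValueError(f"unknown auth style: {style!r}")
```

In practice most SDKs build these headers for you once you pass the Vendo key as the API key, so a helper like this is only needed for hand-rolled HTTP calls.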

The path your request takes

Your tool
  │  POST https://openai-proxy.vendo.run/v1/chat/completions
  │  Authorization: Bearer vendo_sk_***
  ▼
Proxy Worker
  │ 1. Resolve adapter from subdomain  (openai-proxy → openai adapter)
  │ 2. SHA-256 the bearer, look up key:{hash} in KV
  │ 3. Check balance:{tenantId} in KV
  │ 4. Resolve binding: app_id + provider → which connection's credentials
  ▼
Upstream provider  (api.openai.com)
  │  Authorization: Bearer sk-***  (real key, injected by adapter)
  ▼
Response back to your tool
  │
  └─► Worker meters tokens, debits credits  (waitUntil; non-blocking)

The whole thing typically adds 10–40ms of overhead per request. Streaming responses pipe straight through.
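Steps 1 and 2 of the path above can be sketched as follows. This is an illustration, not the Worker's actual code: the function names are hypothetical, and two details are assumptions that this page does not state, namely that the hash covers the token without the `Bearer ` prefix and that the digest is hex-encoded in the KV key.

```python
import hashlib
from urllib.parse import urlsplit

PROXY_SUFFIX = "-proxy.vendo.run"


def adapter_from_url(url: str) -> str:
    """Step 1: derive the adapter name from the subdomain (openai-proxy -> openai)."""
    host = urlsplit(url).hostname or ""
    if not host.endswith(PROXY_SUFFIX):
        raise ValueError(f"not a Vendo proxy host: {host!r}")
    return host[: -len(PROXY_SUFFIX)]


def kv_key_for_bearer(authorization: str) -> str:
    """Step 2: SHA-256 the bearer token and build the key:{hash} lookup key.

    Assumption: the token is hashed without the "Bearer " prefix and the
    digest is rendered as lowercase hex.
    """
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise ValueError("expected an 'Authorization: Bearer <token>' value")
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return f"key:{digest}"
```

Looking keys up by hash means the KV store never has to hold the raw `vendo_sk_` value.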

The providers

Each provider you call gets its own subdomain:

| Provider | Hostname | Notes |
| --- | --- | --- |
| OpenAI | openai-proxy.vendo.run | Token-metered. |
| Anthropic | anthropic-proxy.vendo.run | Token-metered. Accepts x-api-key and Authorization. |
| OpenRouter | openrouter-proxy.vendo.run | Token-metered. Adapter prepends /api to paths that don't start with /api/, so /v1/chat/completions is silently rewritten to /api/v1/chat/completions. |
| Gemini | gemini-proxy.vendo.run | Token-metered. Streaming uses :streamGenerateContent URL paths. |
| ElevenLabs | elevenlabs-proxy.vendo.run | TTS per-character; STT per audio-hour. |
| AssemblyAI | assemblyai-proxy.vendo.run | Per audio-hour; deduped by transcript id. |
| MuAPI | muapi-proxy.vendo.run | Per-request flat cost per model. |
| Telegram | telegram-proxy.vendo.run | Outbound bot-API forwarder. Not metered; Vendo doesn't bill Telegram traffic. |
| Composio | composio-proxy.vendo.run | Meta-integration brokering ~1,000 toolkits. Per-call cost; uses Composio-managed OAuth tokens under the hood. |

The path you send (/v1/chat/completions, /v1/messages, etc.) matches the upstream provider's path, so the SDKs you already use work without modification once you swap the base URL.
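The hostname pattern and the OpenRouter path rewrite noted in the table can be sketched like this. Both function names are hypothetical; the rewrite rule is taken from the OpenRouter row, and the code is an illustration of that rule rather than the adapter's implementation.

```python
def proxy_url(provider: str, path: str) -> str:
    """Build the proxy URL for a provider slug and an upstream-style path."""
    return f"https://{provider}-proxy.vendo.run{path}"


def openrouter_upstream_path(path: str) -> str:
    """Apply the OpenRouter rule: prepend /api unless the path starts with /api/."""
    if path.startswith("/api/"):
        return path
    return "/api" + path
```

So a tool that sends `/v1/chat/completions` to `openrouter-proxy.vendo.run` and one that sends `/api/v1/chat/completions` both reach the same upstream path.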

Status codes you'll see

On success, the proxy passes the provider's response through unchanged. When it rejects a request itself, it returns one of the status codes below. Where the table lists a code, the response carries it in a Vendo-Error-Code header as a stable machine-readable value; match on the header rather than the prose.

| Status | Vendo-Error-Code | Meaning |
| --- | --- | --- |
| 400 | (none) | Path-traversal guard rejected the URL (e.g. /foo/../healthz). |
| 401 | app_unknown | Missing or unrecognized bearer. |
| 401 | app_revoked | Bearer was issued but later revoked. |
| 401 | connection_needs_reauth | The bound connection is in needs_reauth state. User must re-authorize. |
| 402 | (none) | Insufficient credits. Tenant balance is at or below zero. Top up. |
| 403 | binding_missing | No active connection bound for this provider on your app, or tenant/provider mismatch on a connection-scoped key. |
| 403 | connection_revoked | The bound connection itself is revoked. |
| 404 | (none) | Unknown adapter: you hit a subdomain Vendo doesn't proxy. |
| 429 | spend_cap_daily | App key hit its daily spend cap. Retry-After header gives seconds until UTC midnight. |
| 429 | spend_cap_monthly | App key hit its monthly spend cap. Retry-After gives seconds until UTC month-end. |
| 503 | (none) | Billing service unavailable. Either Supabase is unreachable or the per-tenant Wallet Durable Object is degraded. Transient; retry. |

Any 4xx or 5xx from the upstream is passed back to you unchanged, with the same status code and body the provider would have returned (without a Vendo-Error-Code header, so you can distinguish provider failures from proxy rejections).
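One way a client might act on this is sketched below. The `Failure` type and `classify_failure` function are hypothetical names. Note that rows marked (none) in the table carry no code, so the sketch labels a response without the header as "unlabelled" rather than asserting that it came from the upstream.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Failure:
    origin: str                    # "vendo" if Vendo-Error-Code is present, else "unlabelled"
    code: Optional[str]            # e.g. "spend_cap_daily"
    retry_after_s: Optional[int]   # parsed Retry-After seconds, if any


def classify_failure(status: int, headers: Mapping[str, str]) -> Optional[Failure]:
    """Return None for non-error statuses, otherwise a Failure description."""
    if status < 400:
        return None
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    code = lowered.get("vendo-error-code")
    try:
        retry_after: Optional[int] = int(lowered.get("retry-after", ""))
    except ValueError:
        retry_after = None
    return Failure("vendo" if code else "unlabelled", code, retry_after)
```

A caller could, for instance, sleep for `retry_after_s` on `spend_cap_daily`, prompt the user to re-authorize on `connection_needs_reauth`, and hand unlabelled failures to the same error handling it would use when calling the provider directly.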

The proxy also responds to CORS preflights unconditionally with Access-Control-Allow-Origin: *, so the providers can be called from a browser.

Free vs metered routes

Not every call costs credits. Each provider's adapter classifies routes:

  • Metered: completions, embeddings, audio synthesis, transcription. Reserves a hold against your balance, settles after the response.
  • Free: metadata endpoints (e.g. GET /v1/models, OpenRouter's catalog). Zero credits, no balance check. A tenant with $0 balance can still list models.
  • Blocked: a small denylist of provider routes that Vendo explicitly refuses to forward.

You don't have to track which is which. The proxy decides at dispatch time.

What gets metered, and how

For text generation: usage is extracted from the response (tokens in/out) and priced from Vendo's rates table, which mirrors provider pricing plus a per-meter margin. For TTS: per-character. For transcription: per audio-hour.
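To make the token pricing concrete, here is a worked sketch. Everything numeric in it is an assumption: the rates are invented, the unit (micro-USD per token) is chosen to match the micros the balance endpoint reports, and the margin is modelled as a percentage in basis points rounded up. Vendo's real rates table and margin formula are not documented on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    # Hypothetical rate row; not Vendo's actual schema or prices.
    input_micros_per_token: int
    output_micros_per_token: int
    margin_bps: int  # margin in basis points (1000 = 10%)


def price_micros(rate: Rate, tokens_in: int, tokens_out: int) -> int:
    """Provider cost plus margin, in integer micros (margin rounded up)."""
    base = (tokens_in * rate.input_micros_per_token
            + tokens_out * rate.output_micros_per_token)
    margin = (base * rate.margin_bps + 9_999) // 10_000  # integer ceiling
    return base + margin


# Example with made-up numbers: 1000 input and 500 output tokens.
example = price_micros(Rate(2, 8, 1000), tokens_in=1000, tokens_out=500)
# base = 1000*2 + 500*8 = 6000; margin = 10% of 6000 = 600; total = 6600
```

Integer micros avoid floating-point drift when many small charges are summed.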

Billing runs after the response is sent (Cloudflare's waitUntil). If billing fails, your call still succeeded. The catch: balance updates can lag by a few seconds — don't poll balance immediately after a call expecting an instant decrement.

For streaming responses (stream: true), the proxy reads usage from the final SSE chunk. OpenRouter's proxy auto-injects stream_options: { include_usage: true } so the chunk is present.
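Reading usage from the final chunk might look like the sketch below. It assumes OpenAI-style chat-completion framing: `data:` lines carrying JSON, a `usage` object on the last chunk, and a `data: [DONE]` sentinel. Other providers frame their streams differently, and this is a client-side illustration with a hypothetical function name, not the proxy's parser.

```python
import json
from typing import Iterable, Optional


def usage_from_sse(lines: Iterable[str]) -> Optional[dict]:
    """Return the last non-empty `usage` object seen before [DONE], if any."""
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore fragments that are not complete JSON
        if isinstance(chunk, dict) and chunk.get("usage"):
            usage = chunk["usage"]
    return usage
```

If the stream ends without a usage chunk the function returns None, which is the case that injecting `include_usage` is meant to prevent.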

What you can observe

  • Your tool's logs — the request, status code, latency. Same as calling the provider directly.
  • GET /api/billing/balance — current credit balance (the endpoint the SDK uses; returns micros). /api/v1/balance is the legacy USD-shaped endpoint kept for the CLI.
  • GET /api/billing/usage — recent usage rolled up per provider.
  • Dashboard → Billing — historical usage by provider.

You cannot see the real provider API key or another tenant's calls. You also may not see every upstream rate-limit header, because the proxy doesn't relay all of them.

Gotchas

  • Don't hard-code base URLs. Read the URL from the env var Vendo injects (e.g. OPENROUTER_BASE_URL, OPENAI_BASE_URL). If you bake it in, you can't run the same code locally against the upstream provider.
  • X-Vendo-Connection for disambiguation. If a tenant has two connections to the same provider (rare, but possible), pass X-Vendo-Connection: <connection_id> to pick one. Otherwise the proxy picks the oldest binding.
  • Holds can leave a 60s shadow. If your tool retries fast on a transient error before the hold settles, you may briefly see a smaller available balance than expected.
  • Telegram is forwarded, not metered. Telegram bot calls go through telegram-proxy.vendo.run but Vendo doesn't bill them.
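The first two gotchas can be sketched together. The helper names are hypothetical, and the fallback URL in the usage note is only an example of pointing the same code at the upstream provider; the env var names and the X-Vendo-Connection header come from the list above.

```python
from typing import Mapping, Optional


def resolve_base_url(env: Mapping[str, str], var: str, fallback: str) -> str:
    """Prefer the injected env var; fall back to a caller-supplied URL."""
    value = env.get(var, "").strip()
    return (value or fallback).rstrip("/")


def with_connection(headers: Mapping[str, str],
                    connection_id: Optional[str]) -> dict[str, str]:
    """Copy headers, adding X-Vendo-Connection only when an id is given."""
    merged = dict(headers)
    if connection_id:
        merged["X-Vendo-Connection"] = connection_id
    return merged
```

A tool might call `resolve_base_url(os.environ, "OPENAI_BASE_URL", "https://api.openai.com/v1")` at startup, so the same code targets the proxy when Vendo injects the variable and the upstream provider when run locally without it.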
