The proxy and credentials

How your tool calls an upstream provider through Vendo — the metering proxy, the credentials worker, and why both exist.

Your tool never talks to OpenAI directly. It talks to openai-proxy.vendo.run with a vendo_sk_* bearer, and Vendo forwards the call to the real upstream with the real credential. The same applies to every brokered integration — there's a {provider}-proxy.vendo.run for each.

This page covers why that indirection exists and what it does on every request.

The request flow

When your tool calls openai-proxy.vendo.run/v1/chat/completions:

The proxy hashes the bearer (vendo_sk_...) and looks it up in a fast KV cache. The lookup yields the bearer's scope (app or connection), tenant id, and either the app id (app-scoped keys) or the connection id directly (connection-scoped keys), plus spend caps and revocation state.
For app-scoped keys, the proxy resolves the app's binding for openai. If a tenant has multiple connections for the same provider bound to the same app, the proxy picks the oldest by default — pass X-Vendo-Connection: {connection_id} to disambiguate. If no binding exists, the request gets a 403 (binding_missing) and never reaches OpenAI. Connection-scoped keys skip this step entirely.
The proxy reserves a pre-authorization hold up to the adapter's holdUsd ceiling and rejects with 402 if the tenant's balance + caps don't cover it. Free routes (see below) skip the hold.
It rewrites the request: strips the Vendo bearer, injects the real provider key (or an OAuth-refreshed token via the credentials worker), forwards to the upstream.
Once the response is on its way back to your tool, the proxy extracts usage from the response body (tokens for LLM calls, characters for TTS, seconds for transcription), computes the cost, settles (hold − actual) as a single net ledger entry, and writes a usage_events row.

Steps 4 and 5 stream — for SSE responses the proxy pipes chunks through to your tool while it watches for the final usage block. There's no buffering penalty.

The two-step (hold → settle) semantics matter when debugging edge cases: a tenant with a tiny positive balance may still get 402 on a small call if the hold ceiling exceeds the balance. The hold also covers refunds when an upstream stream dies mid-flight.

Why this indirection exists

Three reasons, in order of weight:

Credentials never leak. Tenants don't trust your code with their OpenAI key, and you shouldn't want to hold it. The provider key only ever exists in Vendo's Worker memory for the duration of one forwarded request. Your tool sees a vendo_sk_* bearer that is useless outside the proxy and revocable in one click.

Billing happens by construction. Because every call goes through the proxy, the meter is the request log. There's no per-tool accounting code to write, no Stripe webhook to wire up. Your tool reads vendo.billing.balance() if it wants to gate a feature on remaining credits, but you never compute a charge.

Centralised rate, retry, and failure semantics. Each provider adapter knows how to normalise model ids, how to read usage out of the response shape, and what counts as a billable error vs a free retry. Adapters get updated centrally when providers change their APIs; your tool stays unchanged.

OAuth-style credentials: the credentials worker

For OAuth integrations (Google Drive, Notion, Composio-managed providers), the credential isn't a static key — it's a short-lived token that has to be refreshed. The proxy can't bake that into adapter code for every provider, so it's split out.

credentials.vendo.run is a small Worker that, given a vendo_sk_* bearer and a provider slug, returns a fresh access token for the bound connection. It handles refresh transparently: it returns a cached token if one is still valid, otherwise it refreshes against the upstream and caches the result. Your tool gets { access_token, expires_at } and uses the token to call the provider directly (for SDKs that don't ship a configurable base URL) or hands it back to the SDK.

In practice the SDK hides this: vendo.token("notion") returns a usable token whether the underlying connection is byok_static (proxy injects on every call) or oauth_app_install (credentials worker vends a refreshed one). You don't pick a code path; the SDK does.

What your tool actually configures

For most brokered providers, your tool runs an unmodified provider SDK with the base URL pointed at Vendo's proxy and the API key set to the deployment's app key. The exact env vars each integration injects are declared per-adapter in packages/integrations/<slug>/integration.ts. For OpenAI that's OPENAI_BASE_URL and OPENAI_API_KEY; for Telegram it's TELEGRAM_BOT_TOKEN (no base URL because the Telegram client library is unconfigurable). Either way the provider SDK reads its native env vars with no code change.

from openai import OpenAI
client = OpenAI()  # OPENAI_BASE_URL and OPENAI_API_KEY come from the deployment env

That OPENAI_API_KEY is a vendo_sk_*, not a real OpenAI key. The proxy swaps it for the real one on the way out.

Free vs metered routes

Not every route is metered. GET /v1/models (the catalog of available models) is free — it doesn't consume tokens, so the proxy skips the balance check and the usage write. Each adapter declares which routes are free. Tools using a metered route against a tenant at zero balance get a 402 before the upstream call, so a credit-exhausted tenant never incurs a charge.

The full proxy and credentials reference — exact headers, error codes, key scopes — lives under Reference. This page is the mental model; that one is the contract.