Credits and billing

How tenants pay for using your tool — the credits model from a tool author's view.

Vendo's payment surface is a prepaid credit balance, per tenant. Tenants buy credits in USD; every proxied API call drains a small amount; when the balance gets low, the tenant tops up. There are no subscriptions, no invoices, no usage caps you set. As a tool author this means most of the billing work has already been done — you mostly need to know what defaults you're getting and where the levers are.

The model in one paragraph

A tenant clicks "Add credits", pays via Stripe, and gets a positive entry in their credit ledger. Each call your tool makes through {provider}-proxy.vendo.run writes a negative entry: the upstream cost (e.g. OpenAI's price for those tokens) plus a margin Vendo configures per integration. The balance is the running sum. When it hits zero, the tenant's deployments are suspended — paused, not destroyed — and a top-up resumes them. Data is preserved for 90 days; only after that does the deployment get torn down.
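The running-sum model can be sketched in a few lines of Python. This is an illustration only, not Vendo's implementation: the `LedgerEntry` shape, the `balance` helper, and the dollar amounts are all hypothetical, and amounts are held as `Decimal` to avoid float drift.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LedgerEntry:
    """One ledger row (hypothetical shape): positive = top-up, negative = usage."""
    amount_usd: Decimal
    reason: str


def balance(entries):
    """The balance is the running sum of every entry."""
    return sum((e.amount_usd for e in entries), Decimal("0"))


# Made-up amounts for illustration.
ledger = [
    LedgerEntry(Decimal("10.00"), "top-up via Stripe"),
    LedgerEntry(Decimal("-0.0120"), "proxied call: upstream cost plus margin"),
    LedgerEntry(Decimal("-0.0036"), "proxied call: upstream cost plus margin"),
]

remaining = balance(ledger)
suspended = remaining <= 0  # at zero the deployment is suspended, not destroyed
```

Because every change is an append-only entry, the balance can always be recomputed from the ledger instead of being stored as a separate mutable value.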

The full ledger semantics and suspension flow are documented under Pricing & revenue. This page is the summary for tool authors.

What you declare: marketing.pricing

Your vendo.yaml has one field that toggles the wizard's payment step:

| marketing.pricing | What the wizard does | When to pick it |
| --- | --- | --- |
| credits (default) | Asks the tenant to fund their wallet before launching | Any tool that calls the proxy |
| free | Skips the payment step | Tools that never call the proxy (pure utility, no LLM/TTS/etc.) |

If your tool calls the proxy and you set pricing: "free", the wizard skips payment and the tenant will hit a 402 on the first proxy call. The flag does not exempt anyone from billing — it only controls the wizard.

pricing: "free" does not make your tool's calls free. It only suppresses the wizard's payment step. Tools that proxy any metered call must use pricing: "credits".

What costs what

Cost is per call, computed by the proxy. The formula:

cost = upstream_unit_cost × quantity × (1 + margin_pct / 100)
  • upstream_unit_cost comes from a rates table Vendo keeps in sync with each provider. For LLMs it's per million tokens; for TTS it's per character; for transcription it's per second.
  • quantity is the usage the upstream reported in its response (input plus output tokens, characters, or seconds), expressed in the unit the rate is quoted in.
  • margin_pct is Vendo's markup, resolved through a six-level cascade listed here from narrowest to broadest: tenant_meter, tenant_integration, tenant, meter, integration, global. The narrowest matching row wins. The global default is 20%, but most calls hit a narrower row first (negotiated tenant rates or per-integration overrides), so the actual margin a tenant sees varies.
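The formula and the cascade lookup can be sketched together. The rate, token count, and override values below are made up, and `resolve_margin_pct` and `call_cost` are hypothetical helpers, not part of the Vendo SDK. The sketch assumes the rate is quoted per million tokens, so the token count is divided by 1,000,000 before multiplying.

```python
# Narrowest scope first, matching the cascade order described above.
CASCADE = (
    "tenant_meter",
    "tenant_integration",
    "tenant",
    "meter",
    "integration",
    "global",
)


def resolve_margin_pct(overrides):
    """Return the margin from the narrowest scope that has a row.

    `overrides` maps scope name -> margin percent (a hypothetical structure).
    """
    for scope in CASCADE:
        if scope in overrides:
            return overrides[scope]
    raise LookupError("no margin row matched; a global default is expected")


def call_cost(upstream_unit_cost, quantity, margin_pct):
    """cost = upstream_unit_cost x quantity x (1 + margin_pct / 100).

    `quantity` must be in the same unit the rate is quoted in.
    """
    return upstream_unit_cost * quantity * (1 + margin_pct / 100)


# Made-up example: $2.50 per million tokens, 1,200 tokens used.
margins = {"global": 20.0, "integration": 15.0}
margin = resolve_margin_pct(margins)  # the integration row beats global
cost = call_cost(2.50, 1_200 / 1_000_000, margin)
```

With these numbers the integration-level 15% applies instead of the 20% global default, so the call costs roughly $0.00345.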

You don't set rates or margin from vendo.yaml. The public pricing page at vendo.run/pricing reads live from the rates table, so what a tenant sees on the marketing site is what they'll pay.

What your tool can do at runtime

The SDK gives you read access to the tenant's balance:

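import vendo

def low_balance_notice():
    # Advisory only: the proxy still enforces the balance on every metered call.
    remaining = vendo.billing.balance()  # USD float
    if remaining < 0.10:
        return "Top up to continue"
    return None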

This is purely advisory. The proxy is the enforcer — it'll 402 a metered call past zero regardless of whether your tool checks. The reason to read the balance is UX: showing a friendly banner before a long-running operation rather than letting it fail mid-stream.
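One way to build that banner is a preflight check that compares the balance against a rough estimate of what the operation will cost. This is a sketch under assumptions: `preflight_banner`, the cost estimate, and the $0.10 floor are hypothetical choices made by the tool author, and the balance reader is injected as a callable so the example runs without the SDK.

```python
def preflight_banner(read_balance, estimated_cost_usd, floor_usd=0.10):
    """Return a banner string if the job is likely to run out of credits, else None.

    `read_balance` is any zero-argument callable returning USD as a float,
    for example `vendo.billing.balance`. The estimate and floor are the
    tool author's own guesses; the proxy remains the enforcer.
    """
    remaining = read_balance()
    if remaining - estimated_cost_usd < floor_usd:
        return (
            f"Balance is ${remaining:.2f}; this job may need about "
            f"${estimated_cost_usd:.2f}. Top up to continue."
        )
    return None


# Usage with a stand-in balance reader (no SDK or network needed):
banner = preflight_banner(lambda: 0.25, estimated_cost_usd=0.40)
ok = preflight_banner(lambda: 5.00, estimated_cost_usd=0.40)
```

A `None` result means the job can start without a warning. It is not a guarantee of success, since other calls can drain the balance while the job runs.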

vendo.billing.balance() is a Vendo-only feature; it raises VendoOnlyFeature in OSS mode. In OSS mode (no VENDO_API_KEY set), calls use the tenant's own provider keys instead of going through the proxy, so there is no balance to read.

Suspension is gentle (until day 90)

When a tenant runs out of credits, their deployment goes through a state machine: running → suspending → suspended → resuming → running. The container is stopped, the database is paused (where the integration supports branch-pause), the public URL serves a status page. Everything is preserved.

A top-up via Stripe webhook auto-resumes any deployment suspended for insufficient_credits. Tenants who manually pause stay paused — Vendo doesn't override a deliberate decision.
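The suspend and resume cycle, together with the auto-resume rule, can be modelled as a small transition table. This is a sketch of the behaviour described above, not Vendo's code: the `State` enum, `advance`, and `should_auto_resume` are hypothetical names, and `"manual"` is an assumed label for a tenant-initiated pause.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    SUSPENDING = "suspending"
    SUSPENDED = "suspended"
    RESUMING = "resuming"


# Allowed transitions for the suspend/resume cycle only.
NEXT = {
    State.RUNNING: State.SUSPENDING,
    State.SUSPENDING: State.SUSPENDED,
    State.SUSPENDED: State.RESUMING,
    State.RESUMING: State.RUNNING,
}


def advance(state):
    """Move one step along running -> suspending -> suspended -> resuming -> running."""
    return NEXT[state]


def should_auto_resume(state, suspend_reason):
    """A top-up resumes only deployments suspended for insufficient credits."""
    return state is State.SUSPENDED and suspend_reason == "insufficient_credits"


# A manually paused deployment is left alone after a top-up.
resume_after_topup = should_auto_resume(State.SUSPENDED, "insufficient_credits")
resume_manual_pause = should_auto_resume(State.SUSPENDED, "manual")
```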

Suspension is not infinite. The suspension-reaper worker warns the tenant at day 83 and day 89, and at day 90 transitions a still-suspended deployment through destroying → destroyed. Once destroyed, data is gone and the deployment must be reinstalled from scratch.
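The reaper's schedule can be expressed as a function of how long a deployment has been suspended. This is only a sketch: `reaper_action` is a hypothetical name, and counting whole days from the suspension date is an assumption about how the worker measures "day 83", "day 89", and "day 90".

```python
from datetime import date, timedelta

WARNING_DAYS = (83, 89)  # warnings described above
DESTROY_DAY = 90         # teardown begins for still-suspended deployments


def reaper_action(suspended_on, today):
    """Return 'warn', 'destroy', or None for a deployment suspended on `suspended_on`."""
    days = (today - suspended_on).days
    if days >= DESTROY_DAY:
        return "destroy"
    if days in WARNING_DAYS:
        return "warn"
    return None


start = date(2025, 1, 1)
actions = [reaper_action(start, start + timedelta(days=d)) for d in (10, 83, 89, 90)]
```

Here `actions` is `[None, "warn", "warn", "destroy"]`: nothing happens early in the suspension, two warnings go out, and teardown starts at day 90.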

For your tool, this means: write your code assuming the deployment can be paused and resumed at any time, with no warning to the running container. Persist anything you care about to a database or an R2 bucket. The Build a tool section covers what is and isn't preserved across suspend cycles.

Test-mode tenants and admins

Two categories of tenant bypass the credit gate: those marked test = true in the database (used internally for E2E suites) and Vendo admins. Both can deploy and call the proxy at zero balance. Your tool sees them like any other tenant — the SDK doesn't surface "test" status. If you need to gate something on it, do it at the application layer, not the SDK.

What's not covered here

Revenue share — the question of whether you, the author, see any of what tenants pay — is being designed separately and will land in Pricing & revenue. The rest of that section covers ledger semantics, rate cards, the deploy-time billing gate, and auto-reload (Vendo's opt-in mechanism for tenants who want a card on file to recharge automatically).
