Vendo Docs

Credits and billing

How tenants pay for using your tool — the credits model from a tool author's view.

Vendo's payment surface is a prepaid credit balance, per tenant. Tenants buy credits in USD; every proxied API call drains a small amount; when the balance gets low, the tenant tops up. There are no subscriptions, no invoices, and no usage caps for you to set. As a tool author, this means most of the billing work is already done for you: you mainly need to know which defaults you get and where the levers are.

The model in one paragraph

A tenant clicks "Add credits", pays via Stripe, and gets a positive entry in their credit ledger. Each call your tool makes through {provider}-proxy.vendo.run writes a negative entry: the upstream cost (e.g. OpenAI's price for those tokens) plus a margin Vendo configures per integration. The balance is the running sum. When it hits zero, the tenant's deployments are suspended — paused, not destroyed — and a top-up resumes them. Data is preserved for 90 days; only after that does the deployment get torn down.
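A minimal sketch of the "balance is the running sum" idea, assuming a ledger of signed USD amounts. The entry descriptions, the amounts, and the `balance` helper are hypothetical and only illustrate the arithmetic; Vendo's real ledger schema is not described on this page.

```python
from decimal import Decimal

def balance(entries):
    """Return the running sum of signed ledger entries, in USD."""
    return sum((Decimal(amount) for _, amount in entries), Decimal("0"))

# Hypothetical entries: (description, signed USD amount as a string).
ledger = [
    ("top-up via Stripe", "10.00"),                        # positive entry
    ("proxied call: upstream cost + margin", "-0.0036"),   # negative entry
    ("proxied call: upstream cost + margin", "-0.0120"),   # negative entry
]

current = balance(ledger)

# Per the text above, deployments are suspended once the balance hits zero.
should_suspend = current <= 0
```

Decimal is used here instead of float so that many small debits do not accumulate rounding error; that is a choice made for this sketch, not a statement about how Vendo stores amounts.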

The full ledger semantics and suspension flow are covered under Pricing & revenue. This page is the tool-author-facing summary.

What you declare: `marketing.pricing`

Your vendo.yaml has one field that toggles the wizard's payment step:

| `marketing.pricing` | What the wizard does | When to pick it |
| --- | --- | --- |
| `credits` (default) | Asks the tenant to fund their wallet before launching | Any tool that calls the proxy |
| `free` | Skips the payment step | Tools that never call the proxy (pure utility, no LLM/TTS/etc) |

If your tool calls the proxy and you set pricing: "free", the wizard skips payment and the tenant will hit a 402 on the first proxy call. The flag does not exempt anyone from billing — it only controls the wizard.

pricing: "free" does not make your tool's calls free. It only suppresses the wizard's payment step. Tools that proxy any metered call must use pricing: "credits".

What costs what

Cost is per call, computed by the proxy. The formula:

cost = upstream_unit_cost × quantity × (1 + margin_pct / 100)

upstream_unit_cost comes from a rates table Vendo keeps in sync with each provider. For LLMs it's per million tokens; for TTS it's per character; for transcription it's per second.
quantity is what the upstream returned in the response (input + output tokens, characters, seconds).
margin_pct is Vendo's markup. The cascade is six levels — tenant_meter, tenant_integration, tenant, meter, integration, global (in that order); the narrowest match wins. The global default is 20%, but most calls hit a narrower row first (negotiated tenant rates or per-integration overrides), so the actual margin a tenant sees varies.
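The formula and the margin cascade can be sketched as follows. This is an illustration under stated assumptions, not the proxy's implementation: the `rows` dictionary stands in for whatever margin rows exist for a given call, the helper names are hypothetical, the numbers are made up rather than real rates, and the per-million-token rate is converted to a per-token rate before multiplying.

```python
# Cascade levels from the text above, narrowest scope first.
CASCADE = (
    "tenant_meter",
    "tenant_integration",
    "tenant",
    "meter",
    "integration",
    "global",
)

def resolve_margin(rows):
    """Return (level, margin_pct) for the narrowest level that has a row.

    `rows` maps a cascade level to its margin_pct; levels without a row
    are simply absent from the mapping.
    """
    for level in CASCADE:
        if level in rows:
            return level, rows[level]
    raise LookupError("no margin row matched; a global row is expected")

def call_cost(upstream_unit_cost, quantity, margin_pct):
    """cost = upstream_unit_cost * quantity * (1 + margin_pct / 100)."""
    return upstream_unit_cost * quantity * (1 + margin_pct / 100)

# Illustrative numbers only: $3.00 per million tokens, 1,200 tokens used,
# and a hypothetical 15% per-integration override of the 20% global default.
rows = {"integration": 15.0, "global": 20.0}
level, margin_pct = resolve_margin(rows)
cost = call_cost(3.00 / 1_000_000, 1_200, margin_pct)
```

With these inputs the `integration` row wins over `global`, and the call costs about $0.00414 instead of the $0.00432 it would cost at the global 20%.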

You don't set rates or margin from vendo.yaml. The public pricing page at vendo.run/pricing reads live from the rates table, so what a tenant sees on the marketing site is what they'll pay.

What your tool can do at runtime

The SDK gives you read access to the tenant's balance:

import vendo

def low_balance_notice():
    # Advisory check only: the proxy enforces the real limit.
    remaining = vendo.billing.balance()  # USD float
    if remaining < 0.10:
        return "Top up to continue"
    return None

This is purely advisory. The proxy is the enforcer — it'll 402 a metered call past zero regardless of whether your tool checks. The reason to read the balance is UX: showing a friendly banner before a long-running operation rather than letting it fail mid-stream.
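Because the proxy enforces the limit regardless, a tool may also want to handle the 402 itself. The sketch below assumes the proxy is called over plain HTTP with the standard library's `urllib`; the `call_metered` helper, its `opener` parameter, and the shape of the returned dictionary are hypothetical choices for illustration.

```python
import urllib.error
import urllib.request

def call_metered(request, opener=urllib.request.urlopen):
    """Send a proxied request; report an out-of-credits 402 instead of raising."""
    try:
        with opener(request) as response:
            return {"ok": True, "body": response.read()}
    except urllib.error.HTTPError as err:
        if err.code == 402:
            # The proxy refused the call because the balance is exhausted.
            return {"ok": False, "reason": "insufficient_credits"}
        raise  # any other HTTP error is not a billing problem
```

The caller can then turn the `insufficient_credits` result into the same banner it would have shown from a balance check, rather than surfacing a raw HTTP error.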

vendo.billing.balance() is a Vendo-only feature; it raises VendoOnlyFeature in OSS mode. OSS mode (no VENDO_API_KEY set) doesn't go through the proxy, so there is no balance to read: calls use the tenant's own provider keys.
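One way to keep a balance check working in both modes is to treat VendoOnlyFeature as "nothing to warn about". In the sketch below, the exception class is a local stand-in so the code runs without the SDK (this page does not say where the real class is imported from), and `preflight_message` and its parameters are hypothetical names.

```python
class VendoOnlyFeature(Exception):
    """Local stand-in for the SDK exception of the same name."""

def preflight_message(read_balance, threshold_usd=0.10):
    """Return a banner string when the balance is low, otherwise None.

    `read_balance` is any zero-argument callable that returns USD as a
    float, for example vendo.billing.balance.
    """
    try:
        remaining = read_balance()
    except VendoOnlyFeature:
        # OSS mode: no proxy and no balance, so there is nothing to warn about.
        return None
    if remaining < threshold_usd:
        return f"Balance is ${remaining:.2f}. Top up to continue."
    return None
```

Passing the reader in as a callable keeps the check easy to test and keeps the SDK import in one place.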

Suspension is gentle (until day 90)

When a tenant runs out of credits, their deployment goes through a state machine: running → suspending → suspended → resuming → running. The container is stopped, the database is paused (where the integration supports branch-pause), the public URL serves a status page. Everything is preserved.

A top-up via Stripe webhook auto-resumes any deployment suspended for insufficient_credits. Tenants who manually pause stay paused — Vendo doesn't override a deliberate decision.

Suspension is not infinite. The suspension-reaper worker warns the tenant at day 83 and day 89, and at day 90 transitions a still-suspended deployment through destroying → destroyed. Once destroyed, data is gone and the deployment must be reinstalled from scratch.
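The lifecycle and the reaper schedule described above can be written down as a small transition table. This is a sketch for reasoning about the states, not Vendo's implementation: the state names and day thresholds come from this page, while the function names and the treatment of unlisted transitions as invalid are assumptions.

```python
# Transitions named on this page; any pair not listed is treated as invalid.
TRANSITIONS = {
    "running": {"suspending"},
    "suspending": {"suspended"},
    "suspended": {"resuming", "destroying"},
    "resuming": {"running"},
    "destroying": {"destroyed"},
    "destroyed": set(),
}

def can_transition(current, target):
    """True if the page describes a direct move from `current` to `target`."""
    return target in TRANSITIONS.get(current, set())

def reaper_action(days_suspended):
    """What the reaper does for a deployment suspended this many days."""
    if days_suspended >= 90:
        return "destroy"
    if days_suspended in (83, 89):
        return "warn"
    return "none"
```

Note that `destroyed` has no outgoing transitions, which matches the statement that a destroyed deployment must be reinstalled from scratch.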

For your tool, this means: write your code assuming the deployment can be paused and resumed at any time, with no warning to the running container. Persist anything you care about to a database or an R2 bucket. The Build a tool section covers what is and isn't preserved across suspend cycles.
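A minimal sketch of what "assume you can be paused at any time" can look like in code: progress is committed to durable storage after every item, so a resumed container picks up where the last one stopped. SQLite is used here only as a stand-in for the deployment's database, and the table, function, and job names are hypothetical.

```python
import sqlite3

def open_store(path):
    """Open the checkpoint store, creating its table on first use."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoint ("
        "job TEXT PRIMARY KEY, next_item INTEGER NOT NULL)"
    )
    conn.commit()
    return conn

def process_job(conn, job, items, handle):
    """Process items in order, committing progress after each one."""
    row = conn.execute(
        "SELECT next_item FROM checkpoint WHERE job = ?", (job,)
    ).fetchone()
    start = row[0] if row else 0  # resume from the last committed position
    for index in range(start, len(items)):
        handle(items[index])
        conn.execute(
            "INSERT OR REPLACE INTO checkpoint (job, next_item) VALUES (?, ?)",
            (job, index + 1),
        )
        conn.commit()
```

If the container stops between `handle` and the commit, that item is handled again after resume, so handlers in this design should be safe to repeat.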

Test-mode tenants and admins

Two categories of tenant bypass the credit gate: those marked test = true in the database (used internally for E2E suites) and Vendo admins. Both can deploy and call the proxy at zero balance. Your tool sees them like any other tenant — the SDK doesn't surface "test" status. If you need to gate something on it, do it at the application layer, not the SDK.

What's not covered here

Revenue share — the question of whether you, the author, see any of what tenants pay — is being designed separately and will land in Pricing & revenue. The rest of that section covers ledger semantics, rate cards, the deploy-time billing gate, and auto-reload (Vendo's opt-in mechanism for tenants who want a card on file to recharge automatically).
