
Healthchecks

The contract Vendo uses to decide whether your deployment is alive — and what happens when it isn't.

A healthcheck is how Vendo knows your tool is up. The deploy worker polls a path on your container after launch; if it returns 2xx within the readiness budget, your deployment moves to ready and the tenant sees a green light. If it doesn't, the deploy is marked failed and the tenant gets a "your tool didn't come up" error with logs attached.

Healthchecks apply only to deployment-type tools. Downloadable and npm tools have their own readiness signals (binary uploaded, CLI authenticated) handled by the dashboard.

Declaring the check

In vendo.yaml:

health:
  path: /healthz
  port: 8787
  • path — the HTTP path the deploy worker GETs. Any 2xx response counts as healthy. Convention is /healthz; the schema validates only that the value starts with /.
  • port — the container port to hit. Templates pick their own (e.g. hermes uses 8787, twenty uses 3000); pick whatever your server listens on.

path and port are the only fields the vendo.yaml schema accepts under health:. Both are optional. The retry budget (how many times the worker polls and how long it waits between polls) is not configured here; see Retry budget below.

Earlier docs described health.timeout and health.interval in vendo.yaml. Those fields are not in the schema — adding them fails CLI validation. The retry behaviour lives in the template manifest's readiness block (maxRetries, retryIntervalSec), set by the platform on your behalf.
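The schema rules above can be illustrated with a small validator. This is a sketch of the described behaviour, not the CLI's actual validation code: the function name, the error wording, and the 1-65535 port range are assumptions made for illustration.

```python
from typing import Any

# The only fields the docs say the schema accepts under health:
ALLOWED_HEALTH_KEYS = {"path", "port"}


def validate_health(block: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the block looks valid."""
    errors: list[str] = []

    # Unknown keys such as timeout or interval are rejected, not ignored.
    for key in sorted(set(block) - ALLOWED_HEALTH_KEYS):
        errors.append(f"health.{key} is not a recognised field")

    # path is optional, but when present it must start with "/".
    path = block.get("path")
    if path is not None and not (isinstance(path, str) and path.startswith("/")):
        errors.append("health.path must be a string starting with '/'")

    # port is optional; the range check is this sketch's assumption.
    port = block.get("port")
    if port is not None:
        is_int = isinstance(port, int) and not isinstance(port, bool)
        if not (is_int and 1 <= port <= 65535):
            errors.append("health.port must be an integer between 1 and 65535")

    return errors
```

Under these assumptions, `validate_health({"path": "/healthz", "port": 8787})` returns an empty list, while a block containing `timeout` or `interval` returns one error per unknown field.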

What to put in the endpoint

The simplest healthy implementation returns a 2xx with a small body. In Python (FastAPI-style routing):

@app.get("/healthz")
def healthz():
    return {"ok": True}

And in Node (Express-style routing):

app.get("/healthz", (_req, res) => res.json({ ok: true }));

For tools that talk to a database, a slightly richer check confirms the connection is live:

@app.get("/healthz")
async def healthz():
    await db.execute("select 1")
    return {"ok": True}
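One way to keep the database check above from hanging is to bound it with a timeout. The sketch below is framework-agnostic and uses only the standard library; `ping` is a hypothetical zero-argument coroutine function standing in for something like `db.execute("select 1")`, and the 2-second default is an arbitrary choice, not a platform value.

```python
import asyncio
from typing import Awaitable, Callable


async def db_reachable(
    ping: Callable[[], Awaitable[object]], timeout_sec: float = 2.0
) -> bool:
    """Return True if ping() completes within timeout_sec, False otherwise."""
    try:
        await asyncio.wait_for(ping(), timeout=timeout_sec)
        return True
    except Exception:
        # Timeouts and driver errors both mean "not healthy right now".
        return False


async def healthz_status(ping: Callable[[], Awaitable[object]]) -> int:
    # A non-2xx status tells the poller the database is not usable yet.
    return 200 if await db_reachable(ping) else 503
```

A route handler would then return the status from `healthz_status` instead of letting a slow query hold the request open.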

Don't make the check expensive. The worker polls it repeatedly during deploy (every 10 seconds by default), and the platform's monitoring may keep polling for liveness afterwards. A healthcheck that spawns work, locks a row, or calls a third-party API will make the check flap. Stick to a couple of cheap checks: process up, DB reachable, in-memory state initialized.

If you genuinely need a deeper check, expose it on a different path (e.g. /diagnostics) and leave /healthz cheap.
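The split between a cheap /healthz and a deeper /diagnostics might look like the following, written against Python's standard-library HTTP server so it has no dependencies. The handler names, the `example_dependency` check, and the `serve` helper are hypothetical; a real tool would wire the same two functions into whatever framework it already uses.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def cheap_health() -> tuple[int, dict]:
    # No I/O here: if the process can answer at all, it is up.
    return 200, {"ok": True}


def deep_diagnostics() -> tuple[int, dict]:
    # Placeholder for slower checks (third-party APIs, queue depth, ...).
    checks = {"example_dependency": True}  # hypothetical check result
    status = 200 if all(checks.values()) else 503
    return status, {"ok": status == 200, "checks": checks}


ROUTES = {"/healthz": cheap_health, "/diagnostics": deep_diagnostics}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        route = ROUTES.get(self.path)
        status, payload = route() if route else (404, {"error": "not found"})
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep healthcheck polling out of the request log


def serve(port: int = 8787) -> None:
    # 8787 matches the port in the vendo.yaml example; use your own.
    ThreadingHTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Only /healthz is polled by the platform in this arrangement, so the cost of /diagnostics is paid only when someone asks for it.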

Retry budget

The platform polls the healthcheck repeatedly throughout the readiness window rather than failing the deploy on a single bad response. Defaults (from the deploy worker, cloudflare/deploy-worker/src/workflows/deploy.ts): 30 retries × 10 seconds, for a 5-minute total budget. Slow-starting tools (Python with cold imports, services running migrations on boot) have headroom inside this window.
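To make the budget concrete, here is a minimal sketch of a readiness poll using the default values quoted above. It is not the deploy worker's code (that lives in the TypeScript file named above); the function names, the injectable `probe` and `sleep` parameters, and the choice to sleep only between attempts are assumptions for illustration.

```python
import time
from typing import Callable


def wait_for_ready(
    probe: Callable[[], int],
    max_retries: int = 30,
    retry_interval_sec: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until probe() returns a 2xx status or the attempts run out."""
    for attempt in range(max_retries):
        try:
            status = probe()
        except OSError:
            status = 0  # connection refused or timed out: not listening yet
        if 200 <= status < 300:
            return True
        if attempt < max_retries - 1:
            sleep(retry_interval_sec)  # wait before the next attempt
    return False


def nominal_budget_sec(max_retries: int = 30, retry_interval_sec: float = 10.0) -> float:
    # 30 retries x 10 seconds = 300 seconds, the 5-minute figure quoted above.
    return max_retries * retry_interval_sec
```

A tool that starts answering 2xx on the third poll is marked ready after roughly two intervals; one that never does exhausts all attempts and the function returns False.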

Templates can override the budget via readiness.maxRetries and readiness.retryIntervalSec in the template manifest. You don't set this in vendo.yaml; raise it during template authoring if your tool genuinely needs more than 5 minutes to come up — but first, move boot work behind a real readiness signal (cache warm, migrations complete) rather than papering over a slow /healthz.

Behavior on failure

During deploy. If the endpoint doesn't return 2xx within the readiness budget, the deploy worker marks the workflow failed at the wait_for_ready step, captures recent container logs, and surfaces them to the tenant in the dashboard. The container is left running so the tenant (and you) can inspect it. To fix: push a new release; the tenant clicks Redeploy and the workflow retries.

Post-deploy. Once a deployment is marked ready, Vendo doesn't restart your container automatically when /healthz starts failing. Railway's own liveness logic governs that: Railway restarts unhealthy containers up to a small retry budget before flagging them. If you want a wedged process restarted rather than left running, return 5xx from /healthz when your process is in an unrecoverable state and Railway will restart the container.
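Returning 5xx on an unrecoverable state can be as simple as a shared flag that background code sets and the health handler reads. The class below is a hypothetical sketch: `ProcessState`, `mark_fatal`, and the response body shape are illustrative names, not part of any Vendo or Railway API.

```python
import threading
from typing import Optional


class ProcessState:
    """Thread-safe flag that workers flip when they hit a fatal error."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._fatal_reason: Optional[str] = None

    def mark_fatal(self, reason: str) -> None:
        with self._lock:
            # Keep the first reason; later failures are often knock-on effects.
            if self._fatal_reason is None:
                self._fatal_reason = reason

    def health(self) -> tuple[int, dict]:
        """Status code and body the /healthz handler should return."""
        with self._lock:
            reason = self._fatal_reason
        if reason is None:
            return 200, {"ok": True}
        return 503, {"ok": False, "reason": reason}
```

The handler stays cheap because it only reads a flag; the code that detects the failure does the work of deciding that the state is unrecoverable.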

Things that look like healthchecks but aren't

  • Readiness vs liveness. Vendo's healthcheck is closer to readiness — "the deploy is done, the tenant can use the tool". Liveness (restart on failure) is Railway's. For most tools the same endpoint serves both, but if your tool has a long warmup, expose readiness on /healthz and liveness on a separate path that only returns 5xx when the process is genuinely stuck.
  • Auth-protected endpoints. Don't put auth on /healthz. The deploy worker hits it from inside Vendo's network, and requiring credentials leads to "container starts, healthcheck 401s, deploy fails" loops.
  • Database migrations. Run migrations before your server starts listening, not in the healthcheck.
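The readiness and liveness split described in the list above could be sketched as two status functions over shared state. The `/livez` path, the `AppState` fields, and the function names are assumptions for illustration; only /healthz is the path Vendo polls.

```python
from dataclasses import dataclass


@dataclass
class AppState:
    warmed_up: bool = False  # set once migrations have run and caches are warm
    stuck: bool = False      # set only when the process cannot recover by itself


def readiness(state: AppState) -> int:
    """Status for /healthz: 2xx only once tenants can actually use the tool."""
    return 200 if state.warmed_up and not state.stuck else 503


def liveness(state: AppState) -> int:
    """Status for the liveness path: tolerate warmup, fail only when stuck."""
    return 503 if state.stuck else 200


def status_for(path: str, state: AppState) -> int:
    # "/livez" is a hypothetical path name for the separate liveness endpoint.
    routes = {"/healthz": readiness, "/livez": liveness}
    handler = routes.get(path)
    return handler(state) if handler else 404
```

During a long warmup this reports "not ready" on /healthz without reporting "dead" on the liveness path, so a slow boot is not mistaken for a stuck process.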

What about downloadable and npm tools?

Skip the health: block entirely; the field is ignored for non-deployment tools. Vendo decides "ready" for those by other signals: the binary is uploaded and signed (downloadable), or the CLI has authenticated and registered itself via /api/cli/apps (npm). Neither is customizable from the manifest.
