Vendo Docs

Where to find logs for a deployed tool, what each log surface tells you, and a basic debug flow.

A Vendo deployment writes to three distinct log surfaces. Each captures a different layer of the request flow. Knowing which to read first is most of debugging.

The three log surfaces

| Surface | What it logs | Where it lives | Use it for |
|---|---|---|---|
| `deploy_logs` (Supabase) | Every step of the deploy worker workflow, with structured metadata | Postgres in the Vendo control plane | Failed deploys, retries, upgrade workflows, suspend/resume transitions |
| Compute logs (Railway / Cloudflare) | Your container's stdout/stderr | Provider platform, surfaced via the dashboard | Application errors, request handling, your `console.log` output and equivalents |
| App-proxy logs (Cloudflare Worker) | Per-request routing, status, and body size for the `*.vendo.run` subdomain | Cloudflare Logs | "Why does the URL 404", upstream timeouts, KV staleness |

You read all three surfaces from the Logs tab on each deployment in the dashboard. You don't need direct Railway or Cloudflare access.

For operator-level access (when the dashboard isn't enough), each Cloudflare Worker logs separately and can be tailed via `wrangler tail`:

| Worker | What it does | Tail command |
|---|---|---|
| `vendo-app-proxy` | Routes `{deployment_slug}.vendo.run` traffic to Railway. | `wrangler tail vendo-app-proxy` |
| `vendo-hooks-worker` | Receives provider webhooks at `hooks.vendo.run/{external_id}` and forwards them to the deployment. | `wrangler tail vendo-hooks-worker` |
| `vendo-credentials-worker` | Serves `credentials.vendo.run/v1/token` to running deployments. | `wrangler tail vendo-credentials-worker` |
| `vendo-credit-watchdog` | Hourly cron that suspends deployments at zero balance. | `wrangler tail vendo-credit-watchdog` |
| `vendo-suspension-reaper` | Daily cron that destroys deployments suspended for more than 90 days. | `wrangler tail vendo-suspension-reaper` |
| `vendo-deploy-worker` | Runs `DeployWorkflow`, `UpgradeWorkflow`, `SuspendWorkflow`, `ResumeWorkflow`, `TeardownWorkflow`, `UpdateWorkflow`. | `wrangler tail vendo-deploy-worker` |
| `vendo-update-watcher` | Hourly cron for customize-enabled auto-update detection. | `wrangler tail vendo-update-watcher` |

The `*-proxy.vendo.run` subdomains (one per integration: `openrouter-proxy`, `anthropic-proxy`, `openai-proxy`, `telegram-proxy`, etc.) are their own Workers. Tail each with `wrangler tail {provider}-proxy` when debugging metered upstream calls.

Which to check first

- Deploy never reached running → `deploy_logs`. The deploy worker's 14 timed phases tell you exactly which step failed and why (see Railway deployments § Pipeline): `validate_template`, `collect_secrets`, `provision_r2`, `provision_databases`, `resolve_env`, `deploy_compute`, `await_compute_done`, `domain_setup`, `health_check_done`, `bootstrap_admin`, `seed_app_credentials`, `sync_proxy_keys`, `await_domain_ready_done`, `provision_bundle`. `collect_app_logs_on_failure` runs automatically and dumps the container's build/runtime output into `deploy_logs` alongside the workflow trace.
- Deploy succeeded but the tool 500s → compute logs. Your container's stdout/stderr is the source of truth.
- Tool returns 404 / "deployment not found" → app-proxy logs. The KV mapping (`deploy:{subdomain}`) is either missing or wrong.
- Tool's API calls to OpenAI / Telegram / etc. fail with 401 or 402 → check the proxy adapter logs (a fourth surface, scoped to `{provider}-proxy.vendo.run`). 402 means the tenant's balance is zero. 401 typically means the proxy key is stale or the connection isn't bound.
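The triage rules above can be written down as a small decision function. This is only a sketch: `Surface`, `first_surface`, and its parameters are hypothetical names invented for illustration, not part of any Vendo API, and the final fallback to compute logs is an assumption rather than something the rules state.

```python
from enum import Enum


class Surface(Enum):
    DEPLOY_LOGS = "deploy_logs"
    COMPUTE = "compute logs"
    APP_PROXY = "app-proxy logs"
    PROVIDER_PROXY = "provider proxy adapter logs"


def first_surface(reached_running: bool, status: int,
                  from_integration: bool = False) -> Surface:
    """Pick the log surface to read first for a given symptom.

    reached_running:  did the deploy ever reach the running state?
    status:           HTTP status observed (ignored if the deploy never ran)
    from_integration: True if the status came from an upstream call
                      (OpenAI, Telegram, ...) made through a provider proxy
    """
    if not reached_running:
        return Surface.DEPLOY_LOGS
    if from_integration and status in (401, 402):
        return Surface.PROVIDER_PROXY
    if status == 404:
        return Surface.APP_PROXY
    # 5xx maps to compute logs per the rules above. Falling back to compute
    # logs for everything else is an assumption of this sketch.
    return Surface.COMPUTE


print(first_surface(True, 404).value)
```

Keeping the deploy-state check first mirrors the order of the list: a deployment that never reached running has no useful request-level logs to read yet.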

What `deploy_logs` looks like

[deploy-abc123] validate_template           → ok
[deploy-abc123] collect_secrets             → ok (3 generated)
[deploy-abc123] provision_databases         → ok (neon: pg-xyz)
[deploy-abc123] deploy_compute              → ok (railway: project-789)
[deploy-abc123] health_attempt_1            → 503 (boot)
[deploy-abc123] health_attempt_30           → 503 (no response)
[deploy-abc123] health_check_failed         → FAILED (timeout after 300s)
[deploy-abc123] collect_app_logs_on_failure → captured 47 lines
[deploy-abc123]   container stderr: ECONNREFUSED 127.0.0.1:6379

The `health_check` phase emits `health_attempt_N` rows on each poll and ends in either `health_check_done` (success) or `health_check_failed` (timeout / non-2xx). Look for the failure step name, not just the surrounding phase.

Each row is structured: query `deploy_logs` by `deployment_id`, `step`, or `level` from the dashboard. The container logs captured on failure are inlined into the same table for convenience.
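If you have copied the rendered trace out of the dashboard as text, a few lines of Python can pull out the failing step. This is a sketch that assumes the text looks exactly like the sample above (`[deployment] step → outcome`); the regex and the helper names `parse_rows` and `failing_step` are illustrative, not a documented export format.

```python
import re
from typing import NamedTuple, Optional

# Matches "[deploy-abc123] step_name   → outcome". Lines without an arrow
# (such as the inlined container stderr) are skipped.
ROW = re.compile(r"^\[(?P<deployment>[^\]]+)\]\s+(?P<step>\w+)\s+→\s+(?P<outcome>.+)$")


class Row(NamedTuple):
    deployment: str
    step: str
    outcome: str


def parse_rows(text: str) -> list[Row]:
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append(Row(m["deployment"], m["step"], m["outcome"].strip()))
    return rows


def failing_step(rows: list[Row]) -> Optional[Row]:
    """Return the first terminal failure, ignoring intermediate health_attempt_N polls."""
    for row in rows:
        if row.step.endswith("_failed") or row.outcome.startswith("FAILED"):
            return row
    return None


SAMPLE = """\
[deploy-abc123] deploy_compute              → ok (railway: project-789)
[deploy-abc123] health_attempt_30           → 503 (no response)
[deploy-abc123] health_check_failed         → FAILED (timeout after 300s)
[deploy-abc123]   container stderr: ECONNREFUSED 127.0.0.1:6379
"""

failure = failing_step(parse_rows(SAMPLE))
print(failure.step if failure else "no failure found")
```

Note that the 503 on `health_attempt_30` is deliberately not treated as the failure: only the terminal `health_check_failed` row is.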

Debug flow — failed deploy

1. Open the deployment row → Logs tab → filter by `deploy_logs`.
2. Find the first row with `level='error'`. The step name tells you which phase failed.
3. Common failure modes by step:
   - `validate_template`: manifest schema violation. Re-validate locally.
   - `collect_secrets`: almost never fails; if it does, it's a Vendo platform issue.
   - `provision_databases`: Neon or Railway provisioning hiccup. Retry.
   - `deploy_compute`: image not pullable, build failed, or the registry rate-limited the pull. Check the captured container logs.
   - `health_check_failed`: your readiness endpoint didn't respond 2xx in time. Most common cause: a missing DB migration on first boot, or an env var that's unset because of a missing integration binding.
   - `bootstrap_admin`: your `seedEndpoint` returned an error. Check container logs.
   - `await_domain_ready_done`: Cloudflare cert provisioning timing out. Usually a platform issue; destroy + retry.
4. If a quick fix is possible (e.g. integration not bound), do it and retry from the dashboard.
5. If the fix is in your code, cut a new patch version; the tenant can then retry against the updated release.
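The first two steps plus the failure-mode list amount to "find the first error row, then look up its step". A minimal sketch, assuming you have the rows as dictionaries with the `step` and `level` fields mentioned earlier, in chronological order. The `HINTS` table paraphrases the list above; `first_error` and `diagnose` are hypothetical helpers, and level values other than `'error'` (such as `'info'` in the usage example) are assumptions.

```python
from typing import Optional

# Paraphrased from the failure-mode list; illustrative only.
HINTS = {
    "validate_template": "manifest schema violation; re-validate locally",
    "collect_secrets": "rare; likely a platform issue",
    "provision_databases": "provisioning hiccup; retry",
    "deploy_compute": "image pull, build, or registry rate limit; check captured container logs",
    "health_check_failed": "readiness endpoint not 2xx in time; check migrations and env vars",
    "bootstrap_admin": "seedEndpoint returned an error; check container logs",
    "await_domain_ready_done": "cert provisioning timed out; usually a platform issue",
}


def first_error(rows: list[dict]) -> Optional[dict]:
    """Return the first row whose level is 'error' (rows assumed chronological)."""
    return next((r for r in rows if r.get("level") == "error"), None)


def diagnose(rows: list[dict]) -> str:
    row = first_error(rows)
    if row is None:
        return "no error rows found"
    hint = HINTS.get(row["step"], "no hint for this step; read the captured container logs")
    return f"{row['step']}: {hint}"


rows = [
    {"step": "validate_template", "level": "info"},
    {"step": "deploy_compute", "level": "error"},
    {"step": "health_check_failed", "level": "error"},
]
print(diagnose(rows))
```

Taking the first error rather than the last matters: later errors are often consequences of the first one.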

Don't tear down a failed deployment to "start fresh." `POST /retry` reuses the original `vendo_api_key`, admin password, and user env vars, and every step is idempotent. Teardown drops state you don't get back.

Debug flow — running deployment misbehaving

1. Reproduce the issue (have the tenant trigger the failing path, or do it yourself if internal).
2. Compute logs → grep for the timestamp. Look for stack traces, uncaught exceptions, or non-2xx upstream responses.
3. If the tool calls an integration and you see 401/402/429 from the proxy, switch to the proxy-side adapter logs to confirm whether it's a key issue (your connection binding), a balance issue (tenant out of credits), or a rate limit (upstream provider).
4. If the failure correlates with a recent release, check whether the upgrade workflow actually swapped the image (see Updating a tool).
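"Grep for the timestamp" is easier when your tool emits JSON lines. The sketch below filters exported compute-log lines to a window around the moment the issue was reproduced. It assumes each line is a JSON object with an ISO-8601 timestamp in a field called `ts`; that field name, the `lines_near` helper, and the sample records are assumptions about your own log schema, not something Vendo defines.

```python
import json
from datetime import datetime, timedelta


def lines_near(log_lines, when: datetime, window_s: float = 5.0, ts_field: str = "ts"):
    """Keep JSON log records whose timestamp is within window_s seconds of `when`.

    Lines that are not JSON objects, lack the timestamp field, or mix
    naive and timezone-aware timestamps are skipped rather than raising.
    """
    delta = timedelta(seconds=window_s)
    hits = []
    for line in log_lines:
        try:
            record = json.loads(line)
            ts = datetime.fromisoformat(record[ts_field])
            close = abs(ts - when) <= delta
        except (ValueError, KeyError, TypeError):
            continue
        if close:
            hits.append(record)
    return hits


logs = [
    '{"ts": "2024-05-01T12:00:03+00:00", "msg": "upstream 402"}',
    '{"ts": "2024-05-01T12:10:00+00:00", "msg": "ok"}',
    "plain text line",
]
when = datetime.fromisoformat("2024-05-01T12:00:00+00:00")
for record in lines_near(logs, when):
    print(record["msg"])
```

A few seconds of window on either side usually captures both the incoming request and the upstream call it triggered; widen it if the failing path involves retries.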

Structured logging in your tool

The container surface is just stdout/stderr. Whatever your runtime writes there lands in compute logs. To make those logs searchable later, write JSON lines that carry a `request_id` and a `tenant_id`. Vendo doesn't enforce a format, but a consistent schema makes incidents an order of magnitude faster to triage.
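A minimal sketch of a JSON-lines logger using only the Python standard library. Only `request_id` and `tenant_id` come from the recommendation above; the other field names (`ts`, `level`, `msg`) and the `log` helper itself are one possible schema, chosen here for illustration.

```python
import json
import sys
import time


def log(level: str, message: str, *, request_id: str, tenant_id: str,
        stream=None, **fields) -> str:
    """Write one JSON object per line to stdout (or `stream`) and return the line."""
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "level": level,
        "msg": message,
        "request_id": request_id,
        "tenant_id": tenant_id,
        **fields,  # extra context, e.g. status=502
    }
    # default=str keeps logging from raising on non-JSON-serializable values.
    line = json.dumps(record, default=str)
    out = stream if stream is not None else sys.stdout
    out.write(line + "\n")
    out.flush()  # flush so lines are not stuck in a buffer if the container dies
    return line


log("error", "upstream call failed",
    request_id="req-1", tenant_id="t-42", status=502)
```

One object per line keeps each event greppable on its own, and carrying `request_id` on every line lets you follow a single request across handlers.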

Don't log secrets, raw API keys, or `vendo_sk_*` proxy keys. Compute logs are retained and visible to whoever has admin access on the tenant.
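One way to guard against accidental leaks is to scrub messages before they reach stdout. The sketch below masks anything that looks like a proxy key. The exact key format is not documented here, so the pattern (the `vendo_sk_` prefix followed by letters, digits, `_`, or `-`) is an assumption, and `redact` and `RedactFilter` are hypothetical names. It catches only this one prefix; other secrets need their own patterns.

```python
import logging
import re

# Assumption: a proxy key is "vendo_sk_" followed by URL-safe characters.
PROXY_KEY = re.compile(r"vendo_sk_[A-Za-z0-9_\-]+")


def redact(text: str) -> str:
    """Replace anything that looks like a vendo_sk_* key with a placeholder."""
    return PROXY_KEY.sub("vendo_sk_[REDACTED]", text)


class RedactFilter(logging.Filter):
    """logging filter that scrubs the fully formatted message of each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # message is already formatted; drop the args
        return True


logger = logging.getLogger("tool")
handler = logging.StreamHandler()  # writes to stderr by default
handler.addFilter(RedactFilter())
logger.addHandler(handler)
logger.warning("calling upstream with key=%s", "vendo_sk_abc123")
```

Attaching the filter to the handler rather than scattering `redact` calls through the code means new log statements are covered by default.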

Log retention

- `deploy_logs`: retained indefinitely while the deployment row exists. Teardown deletes them.
- Compute logs: Railway's default retention (7 days at time of writing; check the provider for the current value).
- App-proxy logs: Cloudflare Logs retention (varies by plan).

For long-term auditability, your tool should ship its own logs to a sink you control. Vendo's surfaces are for operational debugging, not compliance evidence.

Next: Updating a tool.
