Rollout & upgrades

What happens to existing tenant deployments when you ship a new release.

Cutting a new release (see Versioning & releases) does nothing to existing deployments. Each tenant's instance is pinned to the manifest it deployed against. Upgrades are explicit: either you initiate them, or the tenant does. This page covers what an upgrade actually does, how to roll one out, and how to roll back.

What an upgrade workflow does

The deploy worker has a dedicated UpgradeWorkflow (cloudflare/deploy-worker/src/workflows/upgrade.ts) that runs these steps in order:

  1. fetch_deployment — loads the current deployment row.
  2. resolve_upgrade_env (or fetch_upgrade_image_services for image-only bumps) — re-resolves env vars so new placeholders or integration bindings get picked up.
  3. snapshot_current — writes deployments.previousTemplateVersion = the current bundle_version so a rollback has somewhere to land.
  4. upgrade_compute — calls serviceInstanceUpdate({ source: { image } }) on each Railway service and triggers a redeploy.
  5. await_ready_* — polls the readiness endpoint until it returns 2xx.
  6. finalize — flips the deployment back to status='running' and progress: 100.
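
The step order above can be sketched as a small model. This is an illustration only, not the contents of upgrade.ts: the step names come from the list, while planUpgradeSteps, runUpgrade, and the UpgradeStep type are hypothetical, and treating any failed step as a failed deployment is an assumption (the page only describes the readiness failure case).

```typescript
// Step names are taken from the list above. The function and type names are
// hypothetical and are not the deploy worker's actual code.
type UpgradeStep =
  | "fetch_deployment"
  | "resolve_upgrade_env"
  | "fetch_upgrade_image_services"
  | "snapshot_current"
  | "upgrade_compute"
  | "await_ready"
  | "finalize";

// Returns the ordered steps for one upgrade run. Image-only bumps take the
// fetch_upgrade_image_services branch instead of resolve_upgrade_env.
function planUpgradeSteps(imageOnly: boolean): UpgradeStep[] {
  return [
    "fetch_deployment",
    imageOnly ? "fetch_upgrade_image_services" : "resolve_upgrade_env",
    "snapshot_current",
    "upgrade_compute",
    "await_ready",
    "finalize",
  ];
}

// Runs the steps in order and stops at the first one that reports failure.
// Assumption: any failed step leaves the deployment in the "failed" state.
function runUpgrade(
  steps: UpgradeStep[],
  exec: (step: UpgradeStep) => boolean,
): { status: "running" | "failed"; failedAt?: UpgradeStep } {
  for (const step of steps) {
    if (!exec(step)) {
      return { status: "failed", failedAt: step };
    }
  }
  return { status: "running" };
}
```

Calling runUpgrade(planUpgradeSteps(false), exec) with an exec that fails on "await_ready" returns a "failed" status without ever reaching finalize.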

The web API writes deployments.bundle_version to the target version before firing the workflow, so the row's version pin is what tenants see while the upgrade is in flight. deployments.manifest is also updated to the new snapshot. There is no tool_release_id column; pinning is via deployments.deployed_template_version (varchar) and the manifest JSONB.

If the health check fails after the image swap, the deployment is marked failed — Railway keeps the old running revision because redeploy is not destructive.

Without the image swap, redeploying just reruns the previously-set image. The image tag has to change for the upgrade to actually pull new code.

Tenant-driven vs platform-driven vs auto-update

Three upgrade paths exist, only one of which the tenant initiates:

  • Tenant-driven upgrade. The dashboard shows an "Update available" banner on deployments whose deployments.deployed_template_version is older than the active tool_releases.template_version. The tenant clicks Update, which POSTs to /api/deployments/[id]/upgrade and kicks off UpgradeWorkflow. This is the default path.
  • Platform-driven upgrade. For a security patch or hotfix, a Vendo operator (or a tool author with admin access) initiates the upgrade on every affected deployment without waiting for the tenant.
  • Auto-update (customize-enabled tools). workers/update-watcher runs hourly (cron "0 * * * *"). For deployments backed by a customize-enabled tool, it detects upstream commits, queues a pending_updates row, and fires UpdateWorkflow in the deploy worker (resolver → judge → gate). Depending on the gate's decision the patch is either auto-applied or surfaced to the tenant for confirmation. The watcher honors a weekly cap (default 5/week) and per-deployment auto_update_paused_at. See Updating a tool for the full description.
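
The watcher's two guards (the weekly cap and the per-deployment pause) can be sketched as one predicate. This is a minimal sketch under assumptions: canQueueAutoUpdate and the WatchedDeployment shape are hypothetical, only auto_update_paused_at and the default of 5 per week come from the text, and counting the cap per deployment is a guess about how the real watcher scopes it.

```typescript
// Hypothetical row shape. Only auto_update_paused_at is a field named in the
// text; customizeEnabled stands in for "backed by a customize-enabled tool".
interface WatchedDeployment {
  id: string;
  customizeEnabled: boolean;
  auto_update_paused_at: Date | null;
}

// Default weekly cap stated in the text.
const DEFAULT_WEEKLY_CAP = 5;

// Decides whether the watcher may queue another update for this deployment.
// Assumption: updatesThisWeek is counted per deployment.
function canQueueAutoUpdate(
  deployment: WatchedDeployment,
  updatesThisWeek: number,
  weeklyCap: number = DEFAULT_WEEKLY_CAP,
): boolean {
  if (!deployment.customizeEnabled) return false; // tool has not opted in
  if (deployment.auto_update_paused_at !== null) return false; // paused
  return updatesThisWeek < weeklyCap; // cap not yet reached
}
```

Passing this check only means an update may be queued. Whether it is auto-applied or surfaced to the tenant is still the gate's decision inside UpdateWorkflow.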

The manifest-driven release-upgrade path (the focus of this page) never auto-applies — only the customize/update-watcher path does, and it only touches tools that explicitly opt in.

Rolling out a patch (mass upgrade)

For a security patch or fix where every existing deployment should move:

  1. Cut the new release.

  2. Query for deployments still on old versions:

    SELECT d.id, d.tenant_id, d.deployed_template_version, d.status
    FROM deployments d
    JOIN apps_catalog t ON t.slug = d.tool_slug
    JOIN tool_releases tr ON tr.tool_id = t.id AND tr.status = 'active'
    WHERE d.tool_slug = 'my-tool'
      AND d.status = 'running'
      AND d.deployed_template_version <> tr.template_version;

  3. For each row, POST /api/deployments/[id]/upgrade via an admin script. Rate-limit so you don't saturate Railway.

  4. Watch deploy_logs for failures and retry as needed.
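
A minimal sketch of such an admin script follows, assuming the deployment IDs from the query are already in hand. Only the endpoint path comes from this page. The base URL, the injected post function, the two-second default delay, and the names upgradeUrl and rollOut are illustrative assumptions, and authentication is left to the caller because the page does not describe it.

```typescript
// Hypothetical admin script helpers. The caller supplies `post` so that
// authentication and the HTTP client stay outside this sketch.
type PostFn = (url: string) => Promise<{ ok: boolean; status: number }>;

// Builds the upgrade URL for one deployment.
function upgradeUrl(baseUrl: string, deploymentId: string): string {
  const trimmed = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${trimmed}/api/deployments/${encodeURIComponent(deploymentId)}/upgrade`;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Upgrades deployments one at a time, pausing between requests as a simple
// rate limit. Returns the IDs whose request failed so they can be retried.
async function rollOut(
  baseUrl: string,
  deploymentIds: string[],
  post: PostFn,
  delayMs: number = 2000,
): Promise<string[]> {
  const failed: string[] = [];
  for (const id of deploymentIds) {
    try {
      const res = await post(upgradeUrl(baseUrl, id));
      if (!res.ok) failed.push(id);
    } catch {
      failed.push(id); // network error: treat as failed and keep going
    }
    await sleep(delayMs);
  }
  return failed;
}
```

In practice post would wrap an HTTP client call that sends a POST with admin credentials. A sequential loop with a fixed delay is the simplest rate limit; the returned list is a natural input for the retry pass in step 4.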

Vendo doesn't ship a one-click "upgrade everyone" UI — it would be too easy to take down every tenant of a tool simultaneously. A patch rollout is a deliberate, scriptable operation.

Rolling back

Rollback is just an upgrade aimed backwards:

  1. Cut a new release pointing at the previous version (don't reactivate the old row — keep created_at ordering meaningful).
  2. Upgrade affected deployments to the new (older-content) release.

Old manifest versions stay in R2 indefinitely. The image tag you reference has to still exist in its registry — if you garbage-collected ghcr.io/me/my-tool:1.0.3, rollback fails.

Deployment management endpoints

| Endpoint | What it does | When to use |
| --- | --- | --- |
| POST /api/deployments/[id]/restart | Pushes every deployment_env_vars row into Railway via variableUpsert and redeploys each service. Status flows running → restarting → running. | Env var edits, transient crash loops, a fresh boot without a new image. |
| POST /api/deployments/[id]/upgrade | Runs UpgradeWorkflow: re-resolves env vars, swaps image tag, redeploys, polls readiness. Updates deployments.manifest, bundle_version, deployed_template_version. | Shipping a new release. |
| POST /api/deployments/[id]/retry | Spawns a fresh DeployWorkflow that re-runs idempotent phases on a failed row. Reuses the original vendo_api_key, admin password, and user env vars, so no state is lost. | A previous deploy died midway. |
| POST /api/deployments/[id]/rollback | Swaps bundle_version and previousTemplateVersion, then fires the deploy worker's /upgrade endpoint with the previous version. | Undoing a release that's already in production. |
| POST /api/deployments/[id]/suspend | Runs SuspendWorkflow: Railway deploymentRemove, Neon compute pause, app proxy flips to status-page mode. | Pausing a deployment without losing state. |
| POST /api/deployments/[id]/resume | Runs ResumeWorkflow: databases first, then compute, then readiness. | Bringing a suspended deployment back. |
| POST /api/deployments/[id]/destroy | Runs TeardownWorkflow, which is destructive. See Teardown. | Permanent decommission. |

All seven endpoints are tenant-callable from the dashboard. None require operator intervention.

Suspend/resume during rollout

If you upgrade a suspended deployment, the upgrade resumes it first. Resumed deployments inherit the new version. This is usually what you want — sleeping deployments should not stay on a known-bad version.

What can break

  • Env var rename across versions. If 1.0.0 reads BOT_TOKEN and 1.1.0 reads TELEGRAM_BOT_TOKEN, existing deployments still have the old name in deployment_env_vars until the upgrade re-resolves. Use the provider's connectionEnvVars registry to keep names stable across releases.
  • Database schema migrations. Your tool owns its Postgres. If 1.1.0 needs a new column, your container's startup must run the migration before serving traffic. The readiness probe is your gate.
  • Volume layout changes. Volumes are preserved across upgrades. Moving data from /data/v1 to /data/v2 is your job, not Vendo's.
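
The first two failure modes can be guarded against in the container's startup path. The sketch below is one possible shape, not a Vendo API: resolveBotToken, ReadinessGate, and startUp are hypothetical names, the env var names are the ones from the example above, and migrate stands in for whatever migration runner your tool uses.

```typescript
type Env = Record<string, string | undefined>;

// Prefer the new variable name and fall back to the legacy one, so the same
// image boots on deployments that still carry the old name.
function resolveBotToken(env: Env): string {
  const token = env["TELEGRAM_BOT_TOKEN"] || env["BOT_TOKEN"];
  if (!token) {
    throw new Error("missing TELEGRAM_BOT_TOKEN (or legacy BOT_TOKEN)");
  }
  return token;
}

// Backs the readiness endpoint: 503 until startup work is done, then 200.
class ReadinessGate {
  private ready = false;

  markReady(): void {
    this.ready = true;
  }

  statusCode(): number {
    return this.ready ? 200 : 503;
  }
}

// Startup order: resolve config, run migrations, then open the gate.
// If migrate() throws, the gate stays closed and readiness never returns 2xx.
function startUp(env: Env, migrate: () => void, gate: ReadinessGate): string {
  const token = resolveBotToken(env);
  migrate();
  gate.markReady();
  return token;
}
```

Wiring statusCode() into the handler for your readiness path means the upgrade's readiness polling only sees a 2xx after the schema is in place.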

Next: Operate — running a tool after it's deployed.
