Database migrations are one of the riskiest operations in application deployment. A poorly planned migration can lock tables for minutes, break running application instances, or corrupt data. Zero-downtime migrations ensure that your application remains fully operational throughout schema changes, which is essential for services that cannot tolerate maintenance windows.
The Fundamental Challenge
In a typical deployment, the new application code and the new database schema must be compatible. But during a rolling deployment, both the old and new versions of your application run simultaneously. If the new schema is not backward-compatible with the old code, the old instances will fail. The solution is to decouple schema changes from application deployments using the expand-and-contract pattern.
Expand-and-Contract Pattern
This pattern splits every breaking schema change into three phases:
Phase 1: Expand
Add the new schema elements (columns, tables, indexes) without removing or modifying existing ones. Both old and new application code works with the schema. For example, to rename a column from name to full_name:
-- Step 1: Add new column (non-breaking)
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
-- Step 2: Backfill data
UPDATE users SET full_name = name WHERE full_name IS NULL;
-- Step 3: Deploy application code that writes to both columns
-- The application writes to both 'name' and 'full_name'
Phase 2: Migrate
Deploy the new application version that reads from the new column and writes to both. Run data backfill jobs to populate the new column for any rows that were written between the expand migration and the application deployment.
Phase 3: Contract
Once all application instances use the new column and backfill is complete, remove the old column in a subsequent migration:
-- Only after all app instances are updated
ALTER TABLE users DROP COLUMN name;
Online DDL and Lock-Free Migrations
Standard ALTER TABLE commands in PostgreSQL may acquire locks that block reads and writes. For large tables, this can cause downtime. Use tools designed for lock-free schema changes:
- pg_repack: Reorganizes tables without exclusive locks, useful for removing bloat and reorganizing indexes.
- pgroll: Manages schema migrations with automatic versioning, allowing old and new schemas to coexist during rollout.
- gh-ost (MySQL): GitHub's tool for online schema migrations that uses row-based replication to copy data to a shadow table with the new schema.
PostgreSQL has made significant progress with non-blocking DDL. Adding a column with a default value, creating indexes concurrently, and adding constraints with NOT VALID are all operations that avoid long-held locks:
-- Non-blocking index creation
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);
-- Add constraint without scanning existing rows
ALTER TABLE orders
ADD CONSTRAINT chk_positive_total
CHECK (total >= 0) NOT VALID;
-- Validate separately (still acquires a weaker lock)
ALTER TABLE orders VALIDATE CONSTRAINT chk_positive_total;
Migration Tooling
Use a migration framework that supports versioned, idempotent migrations. Popular choices include:
- Flyway: SQL-based migrations with version tracking. Supports Java, Node.js, and CLI usage.
- Knex.js: JavaScript query builder with built-in migration support, ideal for Node.js backends.
- Prisma Migrate: Declarative schema-driven migrations with drift detection, tightly integrated with the Prisma ORM.
- Alembic: Python migration tool commonly used with SQLAlchemy, supporting auto-generation of migration scripts.
Testing Migrations
Every migration should be tested against a copy of production data. Synthetic test data rarely exposes the edge cases—null values, encoding issues, constraint violations—that real data contains. Run migrations in a staging environment that mirrors production in size and schema. Measure lock duration, execution time, and disk I/O to ensure the migration completes within acceptable thresholds.
Rollback Planning
Every migration should have a corresponding rollback script. For additive changes like adding a column, the rollback is simply dropping it. For destructive changes, rollback may be impossible without data loss. This is another reason to use the expand-and-contract pattern—each phase is individually reversible, and destructive changes only happen after the new schema is fully validated.
Zero-downtime migrations require discipline and planning, but they are achievable with the right patterns and tooling. The investment pays off in the form of reliable deployments and the confidence to evolve your schema without fear. If your team is deploying frequently and struggling with migration-related incidents, let us help you establish a robust migration workflow.