Migrations Without Maintenance Windows

Database schema changes are among the riskiest operations in production systems. A poorly planned migration can lock tables, cause downtime, or corrupt data. Zero-downtime migrations ensure your application remains available throughout schema changes. At Nexis Limited, our SaaS products serve customers worldwide across time zones — there is no maintenance window, so all migrations must be zero-downtime.

The Expand-and-Contract Pattern

The fundamental strategy for zero-downtime migrations is expand-and-contract:

  1. Expand: Add new columns, tables, or indexes alongside existing ones. The old schema still works.
  2. Migrate: Deploy application code that writes to both old and new schema. Backfill existing data.
  3. Contract: Once all data is migrated and the application only reads from the new schema, remove the old columns or tables.

Each phase is deployed independently, allowing rollback at any point without data loss.
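As a concrete sketch of the three phases, here is one way to make a nullable column NOT NULL without a long lock (the `users` table and `email` column are hypothetical, and the PostgreSQL version notes apply as stated):

```sql
-- Phase 1 (expand): add the constraint without validating existing rows.
-- NOT VALID means only new writes are checked, so no full table scan or
-- long-held lock.
ALTER TABLE users
  ADD CONSTRAINT users_email_not_null CHECK (email IS NOT NULL) NOT VALID;

-- Phase 2 (migrate): backfill NULLs (in batches on large tables), then
-- validate. VALIDATE CONSTRAINT scans the table but takes only a light lock.
UPDATE users SET email = '' WHERE email IS NULL;  -- placeholder backfill value
ALTER TABLE users VALIDATE CONSTRAINT users_email_not_null;

-- Phase 3 (contract): promote to a real NOT NULL and drop the helper.
-- PostgreSQL 12+ uses the validated constraint to skip another table scan.
ALTER TABLE users ALTER COLUMN email SET NOT NULL;
ALTER TABLE users DROP CONSTRAINT users_email_not_null;
```

Each phase above is its own migration, deployed and verified before the next one runs.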

Safe Migration Operations

Adding a Column

Adding a nullable column with no default is safe in PostgreSQL — it does not lock the table or rewrite data. Adding a column with a default value is safe in PostgreSQL 11+ (it stores the default in the catalog rather than rewriting all rows). For older versions, add the column as nullable, set the default separately, and backfill existing rows.
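In SQL, the two cases look like this (the `orders` table and `status` column are illustrative):

```sql
-- PostgreSQL 11+: safe as a single statement. The default is stored in the
-- catalog, so existing rows are not rewritten.
ALTER TABLE orders ADD COLUMN status text DEFAULT 'pending';

-- Pre-11 equivalent: three steps, none of which rewrites the table.
ALTER TABLE orders ADD COLUMN status text;                     -- fast, nullable
ALTER TABLE orders ALTER COLUMN status SET DEFAULT 'pending';  -- new rows only
UPDATE orders SET status = 'pending' WHERE status IS NULL;     -- backfill
-- (Run the backfill in batches on large tables; see the backfill section.)
```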

Renaming a Column

Never rename a column directly — it breaks running application code that references the old name. Instead: add a new column, deploy code that writes to both, backfill existing data, deploy code that reads from the new column, and finally drop the old column.
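The database side of that sequence, using a hypothetical `customers.name` to `customers.full_name` rename:

```sql
-- Migration 1 (expand): add the new column.
ALTER TABLE customers ADD COLUMN full_name text;

-- Deploy application code that writes to BOTH name and full_name.

-- Migration 2: backfill rows written before the dual-write deploy.
UPDATE customers SET full_name = name WHERE full_name IS NULL;

-- Deploy application code that reads full_name only.

-- Migration 3 (contract): drop the old column once no code references it.
ALTER TABLE customers DROP COLUMN name;
```

Each migration ships separately; as long as the old column exists, reverting the application code is an instant rollback.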

Changing a Column Type

Similar to renaming: add a new column with the new type, dual-write, backfill with type conversion, switch reads, drop the old column. Direct type changes can lock tables and fail if data does not convert cleanly.
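A sketch for a common case, widening an integer column to bigint (table and column names are hypothetical):

```sql
-- Expand: add a column with the target type.
ALTER TABLE events ADD COLUMN external_id_big bigint;

-- Deploy code (or a trigger) that dual-writes both columns, then backfill
-- with an explicit cast. Rows that fail to convert surface here, in the
-- migration, rather than as a mid-flight ALTER TABLE failure.
UPDATE events
SET external_id_big = external_id::bigint
WHERE external_id_big IS NULL;

-- Contract: after reads have switched to external_id_big, drop the original.
ALTER TABLE events DROP COLUMN external_id;
```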

Adding an Index

Use CREATE INDEX CONCURRENTLY in PostgreSQL. Standard CREATE INDEX locks the table for writes during index creation, which can take minutes or hours on large tables. CONCURRENTLY builds the index without locking, at the cost of longer build time.
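For example (index and table names are illustrative):

```sql
-- Builds the index without blocking writes. It cannot run inside a
-- transaction block, so it must be its own migration statement.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_customer_id
  ON orders (customer_id);

-- If a concurrent build fails partway, PostgreSQL leaves an INVALID index
-- behind. Find such indexes, then drop and retry:
SELECT indexrelid::regclass
FROM pg_index
WHERE NOT indisvalid;
```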

Dropping a Column

Before dropping a column, ensure no application code references it. Deploy code that stops reading the column first, wait for all old application instances to drain, then drop the column in a subsequent migration.

Data Backfill Strategies

  • Batch processing: Update rows in batches (1000-10000 at a time) with pauses between batches to avoid overwhelming the database.
  • Background jobs: Run backfill as a background job that can be paused, resumed, and monitored.
  • Trigger-based: Use database triggers to populate new columns on row writes, combined with a batch backfill to cover existing rows.
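The batch and trigger strategies can be sketched in PL/pgSQL. This assumes a hypothetical `orders` table gaining a `status_v2` column; the procedure form requires PostgreSQL 11+, where transaction control inside procedures lets each batch commit independently:

```sql
-- Batch backfill: commit between batches so locks and WAL volume stay small.
CREATE PROCEDURE backfill_status_v2()
LANGUAGE plpgsql
AS $$
DECLARE
  rows_updated integer;
BEGIN
  LOOP
    UPDATE orders
    SET status_v2 = status
    WHERE id IN (
      SELECT id FROM orders WHERE status_v2 IS NULL ORDER BY id LIMIT 1000
    );
    GET DIAGNOSTICS rows_updated = ROW_COUNT;
    COMMIT;                      -- release locks after each batch
    EXIT WHEN rows_updated = 0;
    PERFORM pg_sleep(0.5);       -- pause to limit load and replication lag
  END LOOP;
END $$;

CALL backfill_status_v2();

-- Trigger-based dual-write: keeps status_v2 populated on every write, so the
-- batch job only has to cover historical rows.
CREATE FUNCTION copy_status() RETURNS trigger AS $$
BEGIN
  NEW.status_v2 := NEW.status;
  RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER orders_copy_status
BEFORE INSERT OR UPDATE ON orders
FOR EACH ROW EXECUTE FUNCTION copy_status();
```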

Multi-Tenant Migration Considerations

For PostgreSQL schema-per-tenant architectures, migrations must run across all tenant schemas. For shared-schema architectures, migrations affect all tenants simultaneously. Our approach is to perform migrations on a non-production replica first, monitor for issues, then apply to production.
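For schema-per-tenant setups, one way to apply a change across all tenant schemas is a loop over the catalog. This sketch assumes tenant schemas follow a `tenant_*` naming convention and that each contains an `orders` table:

```sql
DO $$
DECLARE
  tenant text;
BEGIN
  FOR tenant IN
    SELECT nspname FROM pg_namespace
    WHERE nspname LIKE 'tenant_%'   -- assumed naming convention
  LOOP
    -- format(%I) quotes the schema name safely for dynamic DDL.
    EXECUTE format(
      'ALTER TABLE %I.orders ADD COLUMN IF NOT EXISTS status_v2 text',
      tenant
    );
  END LOOP;
END $$;
```

Note that the whole DO block runs in one transaction; for very large tenant counts, a migration runner that commits per schema limits lock scope.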

Rollback Strategies

  • Design every migration to be reversible.
  • Keep old columns and tables during the transition period — they enable instant rollback by reverting application code.
  • Test rollback procedures in staging before applying migrations to production.
  • Have a pre-tested database backup and restore procedure for catastrophic failures.

Conclusion

Zero-downtime database migrations require more planning and more deployment steps than traditional migrations, but they eliminate the risk and customer impact of maintenance windows. Use the expand-and-contract pattern, avoid locking operations, and test every migration in staging before production.

Planning a complex migration? Our team has experience with zero-downtime migrations across multiple SaaS products.