How to Write a Migration Runbook in English

Learn the English structure and vocabulary for writing a database or system migration runbook: pre-checks, execution steps, and the rollback plan.

A migration runbook exists so that whoever executes the migration (possibly not the person who wrote it, possibly at 2 a.m. during a maintenance window) doesn’t have to improvise. Vague steps like “update the schema carefully” don’t survive that test. This guide covers the structure and vocabulary that do.

Key Vocabulary

Pre-check — a verification step performed before the migration starts, confirming assumptions (backup exists, replication lag is low, disk space is sufficient) that the migration depends on being true. “Pre-check: confirm the latest backup completed successfully and replication lag is under 5 seconds before proceeding — do not start the migration if either check fails.”
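A pre-check is easiest to trust when it is written as an explicit pass/fail gate. The sketch below is one possible way to express the two checks from the example sentence in Python. The names `PrecheckResult`, `run_prechecks`, and `safe_to_start` are hypothetical, and the backup status and lag value are passed in as plain arguments because how you obtain them depends on your own tooling.

```python
from dataclasses import dataclass


@dataclass
class PrecheckResult:
    name: str
    passed: bool
    detail: str


def run_prechecks(backup_succeeded: bool, replication_lag_seconds: float,
                  max_lag_seconds: float = 5.0) -> list[PrecheckResult]:
    """Evaluate the two pre-checks from the example sentence (hypothetical helper)."""
    return [
        PrecheckResult(
            name="latest backup completed",
            passed=backup_succeeded,
            detail="ok" if backup_succeeded else "latest backup did not complete",
        ),
        PrecheckResult(
            name="replication lag",
            # "Under 5 seconds" means strictly less than the limit.
            passed=replication_lag_seconds < max_lag_seconds,
            detail=f"{replication_lag_seconds:.1f}s (limit {max_lag_seconds:.1f}s)",
        ),
    ]


def safe_to_start(results: list[PrecheckResult]) -> bool:
    # The migration may start only if every pre-check passed.
    return all(r.passed for r in results)


if __name__ == "__main__":
    results = run_prechecks(backup_succeeded=True, replication_lag_seconds=7.2)
    for r in results:
        print(f"{'PASS' if r.passed else 'FAIL'}: {r.name} ({r.detail})")
    print("Proceed" if safe_to_start(results) else "Do not start the migration")
```

Returning every result, rather than stopping at the first failure, lets the executor paste the full pre-check outcome into the migration log.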

Execution step — a single, explicit, numbered action in the migration itself, written so specifically that it requires no judgment call from whoever is running it. “Execution step 3: run migrate up --step=1 against the primary only, not the full migration set — confirm the new column exists before proceeding to step 4.”

Point of no return — the step in the migration after which rollback becomes significantly harder or impossible, explicitly flagged so the executor knows exactly where the risk profile changes. “Step 5 is the point of no return — once we drop the old column, reverting requires restoring from backup, not just running the down migration.”
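To make the idea concrete, here is a minimal sketch of a runbook in which the point of no return is data rather than a remark in passing. The `Step` class, the `rollback_method` function, and the step descriptions are all assumptions made for illustration. Only the rule itself comes from the example above: once step 5 has run, reverting means restoring from backup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    action: str
    point_of_no_return: bool = False


def rollback_method(steps: list[Step], last_completed: int) -> str:
    """Return how to revert, given the number of the last completed step."""
    for step in steps:
        # Once a flagged step has run, the down migration is no longer enough.
        if step.point_of_no_return and step.number <= last_completed:
            return "restore from backup"
    return "run the down migration"


# Hypothetical runbook: step 5 drops the old column, as in the example sentence.
RUNBOOK = [
    Step(3, "add the new column"),
    Step(4, "backfill the new column"),
    Step(5, "drop the old column", point_of_no_return=True),
]

if __name__ == "__main__":
    print(rollback_method(RUNBOOK, last_completed=4))  # run the down migration
    print(rollback_method(RUNBOOK, last_completed=5))  # restore from backup
```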

Rollback plan — the documented, specific steps to revert the migration if something goes wrong, written with the same precision as the forward steps, not left as “we’ll figure it out if needed.” “Rollback plan: run migrate down --step=1, then restart the application pods to pick up the reverted schema — verified this works in staging before writing it down here.”

Verification step (post-migration) — an explicit check performed after the migration completes, confirming it actually succeeded rather than assuming success from the absence of an error. “Verification step: query row count on the new table and compare against the pre-migration count on the old table — they should match within the expected delta from ongoing writes.”
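The row-count comparison in that example can be sketched as follows. This is an illustration under stated assumptions, not a recommended tool: it uses an in-memory SQLite database so that it runs on its own, the table names `orders_old` and `orders_new` are hypothetical, and the allowed delta of 5 rows is an arbitrary figure you would replace with your own estimate of ongoing writes.

```python
import sqlite3


def row_count(conn: sqlite3.Connection, table: str) -> int:
    # Table names cannot be bound as query parameters, so pass trusted names only.
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def counts_match(pre_migration_count: int, post_migration_count: int,
                 expected_delta: int) -> bool:
    """True if the two counts differ by no more than the expected delta."""
    return abs(post_migration_count - pre_migration_count) <= expected_delta


def build_rowcount_demo_db() -> sqlite3.Connection:
    """Create a throwaway database with an old and a new table (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders_old (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE orders_new (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO orders_old (id) VALUES (?)",
                     [(i,) for i in range(100)])
    # Two extra rows stand in for writes that arrived during the migration.
    conn.executemany("INSERT INTO orders_new (id) VALUES (?)",
                     [(i,) for i in range(102)])
    return conn


if __name__ == "__main__":
    conn = build_rowcount_demo_db()
    before = row_count(conn, "orders_old")
    after = row_count(conn, "orders_new")
    verdict = "PASS" if counts_match(before, after, expected_delta=5) else "FAIL"
    print(f"old={before} new={after} verification={verdict}")
```

Printing both counts alongside the verdict gives the executor evidence to record, which is the difference between "it succeeded" and "it didn't error."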

Common Phrases

  • “Have all the pre-checks passed, or are we proceeding with an unverified assumption?”
  • “Is this execution step specific enough that someone unfamiliar with the migration could run it correctly?”
  • “Where’s the point of no return in this runbook — is it clearly flagged?”
  • “Has the rollback plan actually been tested, or is it theoretical?”
  • “What’s the verification step confirming this succeeded, not just that it didn’t error?”

Example Sentences

Opening a migration runbook: “Migration: adding a NOT NULL constraint to orders.customer_id. Pre-checks: confirm no NULL values exist in the column (query below) and confirm current backup is less than 1 hour old. Point of no return: step 4, after which rollback requires backup restoration.”
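The NULL pre-check mentioned in that opening could look something like the sketch below. The `orders` table and `customer_id` column come from the example. Everything else is an assumption: the helper names are hypothetical, and an in-memory SQLite database with three sample rows stands in for the real system so the snippet runs on its own.

```python
import sqlite3

# The pre-check query: any result above zero blocks the NOT NULL migration.
NULL_CHECK_SQL = "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL"


def count_null_customer_ids(conn: sqlite3.Connection) -> int:
    return conn.execute(NULL_CHECK_SQL).fetchone()[0]


def build_orders_demo_db() -> sqlite3.Connection:
    """Create a throwaway orders table with one NULL customer_id (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
    conn.executemany(
        "INSERT INTO orders (id, customer_id) VALUES (?, ?)",
        [(1, 10), (2, None), (3, 12)],
    )
    return conn


if __name__ == "__main__":
    conn = build_orders_demo_db()
    nulls = count_null_customer_ids(conn)
    if nulls == 0:
        print("Pre-check passed: no NULL values in orders.customer_id")
    else:
        print(f"Pre-check failed: {nulls} row(s) with NULL customer_id; do not proceed")
```

Putting the exact query in the runbook, with the expected result stated next to it, means the executor never has to decide what "no NULL values" should look like.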

Flagging a risky step during execution: “We’re at the point of no return now: step 4 is about to run. Confirming with the team: are we good to proceed, or do we need another minute to double-check the pre-check results?”

Writing a rollback plan section: “Rollback plan: if verification fails after step 3, run the down migration immediately — this is safe up through step 3. Do not attempt rollback after step 4 without first consulting the on-call DBA.”

Professional Tips

  • Write every execution step as if the reader has never seen the system before. Specificity is what makes a runbook usable under pressure, rather than a summary that only its author can follow.
  • Flag the point of no return explicitly and visually (bold, a warning callout) — it’s the single most important piece of information for whoever is deciding whether to pause and double-check.
  • Test the rollback plan in staging before publishing the runbook, and say so in the document — an untested rollback plan is a guess dressed up as a plan.
  • Include a concrete verification step after execution, not just “check that it worked” — a specific query or check removes the ambiguity of what “worked” actually means.

Practice Exercise

  1. Write a pre-check step for a hypothetical schema migration.
  2. Write a rollback plan entry for a specific migration step.
  3. Explain, in one sentence, why the point of no return should be visually flagged.