How to Write a Disaster Recovery Plan in English

Learn the English vocabulary and structure for writing a disaster recovery plan, including RPO, RTO, failover, and recovery procedures.

A disaster recovery (DR) plan is a document you hope you never need. When you do need it, you will be reading it under pressure, so the English has to be unambiguous even to someone reading it at 3 a.m. during an actual outage. This guide covers the standard vocabulary and phrasing patterns for writing a DR plan that’s clear, specific, and actionable.

Key Vocabulary

RPO (Recovery Point Objective) — the maximum acceptable amount of data loss, measured in time, if a disaster occurs. “Our RPO for the primary database is 15 minutes, meaning we can tolerate losing at most 15 minutes of writes.”

RTO (Recovery Time Objective) — the maximum acceptable time to restore service after a disaster. “The RTO for the payments service is one hour — if it’s down longer than that, we escalate to the executive incident bridge.”
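Because RPO and RTO are both durations, a stated target can be checked against a measured incident with simple arithmetic. The following is a minimal sketch, not a standard tool: the incident timestamps are invented, the helper names (data_loss_window, downtime, meets_objective) are hypothetical, and the 15-minute and one-hour targets are taken from the example sentences above.

```python
from datetime import datetime, timedelta

# Targets from the example sentences; adjust to your own plan.
RPO = timedelta(minutes=15)
RTO = timedelta(hours=1)


def data_loss_window(last_replicated_write: datetime, failure_time: datetime) -> timedelta:
    """Writes made after the last replicated write are the ones lost."""
    return failure_time - last_replicated_write


def downtime(failure_time: datetime, service_restored: datetime) -> timedelta:
    """Time from the failure until service was restored."""
    return service_restored - failure_time


def meets_objective(measured: timedelta, objective: timedelta) -> bool:
    """An objective is met when the measured duration does not exceed it."""
    return measured <= objective


# Hypothetical incident timeline (invented values for illustration).
last_write = datetime(2024, 1, 1, 2, 50)
failure = datetime(2024, 1, 1, 3, 0)
restored = datetime(2024, 1, 1, 4, 10)

rpo_met = meets_objective(data_loss_window(last_write, failure), RPO)  # 10 min vs 15 min
rto_met = meets_objective(downtime(failure, restored), RTO)            # 70 min vs 60 min
print(f"RPO met: {rpo_met}, RTO met: {rto_met}")  # RPO met: True, RTO met: False
```

In this invented timeline the data loss stays inside the RPO, but the service is down ten minutes longer than the RTO allows, which is the situation the escalation sentence describes.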

Failover — the process of switching operations to a backup system, region, or data center when the primary one fails. “Failover to the secondary region is automated and should complete within five minutes of the health check failing.”

Failback — the process of returning operations to the original primary system once it’s restored. “Failback is a manual step — we don’t automate it, because we want a human to confirm the primary region is stable first.”

Runbook step — a single, specific, ordered instruction within a recovery procedure, written so it can be followed without additional judgment calls. “Each runbook step names the exact command to run and the expected output, so an on-call engineer unfamiliar with the system can still execute it correctly.”
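One way to keep every step to "exact command plus expected output" is to store steps as structured data and render them into the document. This is only an illustrative sketch: the RunbookStep class and its fields are hypothetical, and the sample step is an invented example that uses the PostgreSQL function pg_is_in_recovery().

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunbookStep:
    number: int
    action: str           # one imperative sentence, one action
    command: str          # the exact command to run
    expected_output: str  # what the engineer should see if all is well

    def render(self) -> str:
        """Format the step the same way every time."""
        return (
            f"Step {self.number}: {self.action}\n"
            f"  Run: {self.command}\n"
            f"  Expect: {self.expected_output}"
        )


# Invented example step, not taken from a real plan.
step = RunbookStep(
    number=1,
    action="Confirm the standby replica is in recovery mode.",
    command="SELECT pg_is_in_recovery();",
    expected_output="t",
)
print(step.render())
```

A fixed template makes a missing command or a missing expected output obvious during review, long before anyone has to follow the step during an incident.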

Common Phrases

  • “In the event of a regional outage, failover is triggered automatically once health checks fail for two consecutive minutes.”
  • “This procedure assumes the primary database is unreachable, not merely slow.”
  • “Data loss in this scenario is bounded by our RPO of [X].”
  • “Escalate to [role/team] if recovery has not completed within [RTO].”
  • “This plan is tested quarterly via a scheduled failover drill.”
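A trigger condition such as "health checks fail for two consecutive minutes" is precise enough to be written as code, which is a good test of whether the sentence is unambiguous. The sketch below is an assumption about how such a rule could be evaluated, not a description of any particular monitoring product; should_fail_over and the sample check data are hypothetical.

```python
from datetime import datetime, timedelta

FAILURE_WINDOW = timedelta(minutes=2)  # "two consecutive minutes" from the phrase above


def should_fail_over(checks: list[tuple[datetime, bool]], now: datetime) -> bool:
    """checks holds (timestamp, healthy) pairs, oldest first.

    Returns True only if the current unbroken run of failed checks
    started at least FAILURE_WINDOW ago.
    """
    streak_start = None
    for timestamp, healthy in reversed(checks):
        if healthy:
            break  # a healthy check ends the failing streak
        streak_start = timestamp
    if streak_start is None:
        return False  # no checks yet, or the latest check was healthy
    return now - streak_start >= FAILURE_WINDOW


# Invented sample: one healthy check, then failures every 30 seconds.
base = datetime(2024, 1, 1, 3, 0, 0)
checks = [(base, True)] + [
    (base + timedelta(seconds=30 * i), False) for i in range(1, 6)
]
now = base + timedelta(seconds=150)
print(should_fail_over(checks, now))  # True: failing since 03:00:30, now 03:02:30
```

If a single healthy check arrives in the middle of the run, the streak restarts. The English phrase "consecutive" carries exactly that meaning, which is why it is worth including in the plan.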

Example Sentences

Defining scope at the top of the document: “This plan covers recovery procedures for the primary Postgres cluster and the object storage layer. It does not cover recovery of the internal analytics pipeline, which has a separate DR plan with different RPO/RTO targets.”

Writing an unambiguous runbook step: “Step 4: Confirm the standby replica’s replication lag is under 10 seconds by running SELECT now() - pg_last_xact_replay_timestamp(); on the replica. Do not proceed to Step 5 until this value is under 10 seconds.”
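The "do not proceed until" wording in Step 4 describes a gate, and a gate can be expressed as a small function. The sketch below assumes the query result has already been read into a Python timedelta (or None when the query returns NULL, which PostgreSQL does when no transactions have been replayed yet); the function name may_proceed_to_step_5 is hypothetical.

```python
from datetime import timedelta
from typing import Optional

MAX_LAG = timedelta(seconds=10)  # threshold stated in Step 4


def may_proceed_to_step_5(replication_lag: Optional[timedelta]) -> bool:
    """Return True only when the measured lag is strictly under the threshold.

    A missing value (None) is treated as "stop", because the step says
    to proceed only once the value is known to be under 10 seconds.
    """
    if replication_lag is None:
        return False
    return replication_lag < MAX_LAG


print(may_proceed_to_step_5(timedelta(seconds=3)))   # True
print(may_proceed_to_step_5(timedelta(seconds=10)))  # False: "under 10" excludes 10
print(may_proceed_to_step_5(None))                   # False
```

Writing the gate out exposes the boundary cases (exactly 10 seconds, no value at all) that the English sentence should also settle. "Under 10 seconds" settles the first one; a good plan also says what to do when the query returns nothing.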

Stating an assumption explicitly: “This procedure assumes DNS failover has already completed. If traffic is still routing to the primary region, stop and escalate — do not proceed with the database promotion.”

Describing test cadence: “We run a full failover drill every quarter, and the results — including actual time to recovery — are recorded in the DR test log linked at the bottom of this document.”

Professional Tips

  • State RPO and RTO as specific numbers, not vague terms like “minimal data loss” — a number is testable and an on-call engineer can act on it under pressure.
  • Write runbook steps as imperative sentences with a single action each: “Run X,” “Confirm Y,” “Do not proceed until Z” — avoid combining multiple actions in one step.
  • Distinguish failover (moving to backup) from failback (returning to primary) explicitly — conflating them in the plan causes real confusion during an actual incident.
  • State your assumptions at the start of each procedure (“this assumes X is already true”) so a reader under pressure knows exactly when the procedure applies.
  • State who to escalate to, and when, as an explicit trigger condition, not a vague “if things go wrong.”
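The tips above name two phrases to avoid, "minimal data loss" and "if things go wrong." A draft plan can be scanned for wording like this before review. The checker below is a rough sketch: the phrase list is a hypothetical starting point (only the first two entries come from this guide), and find_vague_phrases is an invented helper, not an existing tool.

```python
# Hypothetical list; the first two phrases come from the tips above,
# the rest are assumed examples of similarly untestable wording.
VAGUE_PHRASES = [
    "minimal data loss",
    "if things go wrong",
    "as soon as possible",
    "in a timely manner",
]


def find_vague_phrases(text: str) -> list[str]:
    """Return the vague phrases that appear in the text, ignoring case."""
    lowered = text.lower()
    return [phrase for phrase in VAGUE_PHRASES if phrase in lowered]


draft = "We expect minimal data loss. Escalate if things go wrong."
print(find_vague_phrases(draft))
# ['minimal data loss', 'if things go wrong']

revised = "Our RPO is 15 minutes. Escalate to the on-call lead after one hour."
print(find_vague_phrases(revised))
# []
```

A simple substring match will miss rewordings, so treat the result as a prompt for a human reviewer rather than proof that the plan is specific.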

Practice Exercise

  1. Write an RPO and RTO statement for a hypothetical service, with specific numbers.
  2. Write three ordered runbook steps for a failover procedure, each a single imperative action.
  3. Write one sentence stating an assumption a reader must confirm before starting the procedure.