Runbook writing essentials

  • Purpose: step-by-step procedures for known incidents — executable by any on-call engineer at 3am
  • Step formula: exact command + expected healthy output + decision branch for each outcome
  • Header: trigger alert + severity + ETA + escalation contact if the runbook fails
  • Decision branches: "if X → go to step N; if Y → escalate to @person"
  • Runbooks handle the most common 80% of incidents; for unknown failure modes, escalation IS the answer
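
The step formula and decision branches above can be sketched as data plus a small walker. This is a minimal illustration, not a standard runbook format: the field names, the alert name `MyAppDown`, the service `myapp`, the health URL, and the contact `@oncall-lead` are all hypothetical, and `eta_minutes` assumes ETA means the expected time to finish the procedure. The walker compares output by exact match after stripping whitespace, and it escalates whenever it reaches a step or outcome the runbook does not cover.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

ESCALATE = "escalate"  # sentinel: hand off to the escalation contact
DONE = "done"          # sentinel: incident resolved

Branch = Union[int, str]  # a step number, DONE, or ESCALATE


@dataclass
class Step:
    command: str          # exact command the engineer runs
    expected: str         # exact healthy output (compared after strip())
    if_healthy: Branch    # where to go when output matches
    if_unhealthy: Branch  # where to go when it does not


@dataclass
class Runbook:
    trigger_alert: str
    severity: str
    eta_minutes: int        # assumption: ETA = expected time to complete
    escalation_contact: str
    steps: Dict[int, Step] = field(default_factory=dict)


def walk(runbook: Runbook, observed: Dict[int, str]) -> List[str]:
    """Follow decision branches using the output seen at each step.

    `observed` maps step number -> output the engineer saw.
    Anything the runbook does not cover ends in escalation.
    """
    trace: List[str] = []
    visited = set()
    current: Branch = 1
    while True:
        uncovered = (
            current in visited              # loop: runbook is not converging
            or current not in runbook.steps  # branch points at a missing step
            or current not in observed       # no output recorded for this step
        )
        if uncovered:
            trace.append(f"escalate to {runbook.escalation_contact}")
            return trace
        visited.add(current)
        step = runbook.steps[current]
        healthy = observed[current].strip() == step.expected
        trace.append(f"step {current}: {'healthy' if healthy else 'unhealthy'}")
        nxt = step.if_healthy if healthy else step.if_unhealthy
        if nxt == DONE:
            trace.append("done")
            return trace
        if nxt == ESCALATE:
            trace.append(f"escalate to {runbook.escalation_contact}")
            return trace
        current = nxt


# Hypothetical runbook for a hypothetical "MyAppDown" alert.
runbook = Runbook(
    trigger_alert="MyAppDown",
    severity="SEV2",
    eta_minutes=15,
    escalation_contact="@oncall-lead",
    steps={
        1: Step("systemctl is-active myapp", "active", 2, 3),
        2: Step("curl -fsS http://localhost:8080/healthz", "ok", DONE, ESCALATE),
        3: Step("sudo systemctl restart myapp && systemctl is-active myapp",
                "active", 2, ESCALATE),
    },
)
```

Calling `walk(runbook, {1: "inactive", 3: "active", 2: "ok"})` follows the restart branch and then the health check. Exact matching is a deliberate choice here: a substring check would treat `inactive` as healthy because it contains `active`.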

What is the primary purpose of a DevOps runbook?