Build fluency in the language of writing and maintaining operational runbooks.
1 / 5
At standup, a dev references the step-by-step document engineers follow to diagnose and resolve a specific known type of incident. What is this document called?
A runbook provides step-by-step procedures for diagnosing and resolving a specific, previously encountered type of incident or operational task. It exists to reduce reliance on tribal knowledge during high-pressure situations. Well-maintained runbooks speed up incident response because responders follow tested steps instead of rediscovering the fix.
2 / 5
During a design review, the team debates how detailed a runbook step should be versus leaving room for engineer judgment. What tension does this represent?
Runbooks must be prescriptive enough to guide someone unfamiliar with the issue, yet flexible enough to leave room for the situational judgment an engineer must exercise in the moment. Overly rigid runbooks can mislead in edge cases; overly vague ones fail to help under pressure. Getting this balance right is a common runbook-writing challenge.
3 / 5
In a code review, a dev notices a runbook step still references a deprecated dashboard that no longer exists. What maintenance gap does this reveal?
A runbook referencing a deprecated dashboard or tool signals it wasn't updated as the underlying systems evolved, which can actively slow down responders during a real incident. Regularly reviewing and testing runbooks against the current environment prevents this drift. Stale runbooks can be worse than no runbook if they send responders down a dead end.
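One way to review runbooks against the current environment is an automated scan for references to retired tools. The following is a minimal sketch, not an established tool: the function name `find_stale_references`, the `StaleReference` record, and the idea of maintaining a set of deprecated names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaleReference:
    line_number: int  # 1-based line number within the runbook text
    term: str         # the deprecated name that was matched
    line: str         # the offending line, with surrounding whitespace stripped


def find_stale_references(runbook_text: str, deprecated_terms: set[str]) -> list[StaleReference]:
    """Return every runbook line that mentions a deprecated tool or dashboard.

    Matching is a case-insensitive substring check. That is an assumption
    of this sketch; a real check might resolve URLs or query a tool inventory.
    """
    findings = []
    for number, line in enumerate(runbook_text.splitlines(), start=1):
        lowered = line.lower()
        # Sort the terms so the output order is deterministic.
        for term in sorted(deprecated_terms):
            if term.lower() in lowered:
                findings.append(StaleReference(number, term, line.strip()))
    return findings


if __name__ == "__main__":
    # Hypothetical runbook excerpt and deprecated name, for illustration only.
    runbook = (
        "1. Open the legacy-latency dashboard and check p99.\n"
        "2. Restart the worker pool if the queue depth keeps growing.\n"
    )
    for finding in find_stale_references(runbook, {"legacy-latency dashboard"}):
        print(f"line {finding.line_number}: references deprecated '{finding.term}'")
```

A check like this could run on a schedule or whenever a tool is retired, so that stale steps are flagged before an incident rather than during one.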
4 / 5
An incident report shows a new on-call engineer resolved an incident quickly by following an existing runbook. What benefit does this illustrate?
A well-written runbook lets even an engineer unfamiliar with a specific system resolve a known issue quickly, reducing dependence on one specialist's memorized knowledge. This resilience matters especially during off-hours on-call shifts. It is a core motivation for investing in runbook documentation.
5 / 5
During a PR review, a teammate asks how a runbook differs from a postmortem document. What is the distinction?
A runbook is a forward-looking, reusable guide for handling a known situation when it recurs. A postmortem is a retrospective analysis of a specific past incident, and its findings often feed improvements such as new or updated runbooks. The two serve complementary but distinct purposes in incident management, and confusing them can lead to writing one when the other is actually needed.