5 exercises: impact statements, blameless timelines, Five Whys root-cause analysis, action items with owners, and lessons learned. The core vocabulary of SRE culture.
Blameless post-mortem structure
Impact: When, how long, how many users, revenue impact — with numbers
Timeline: Chronological events with UTC timestamps — system-level, not people-level
Root cause: Five Whys — systemic gap, not "a bug" or "human error"
Contributing factors: Everything that made the failure worse or harder to detect
Action items: Owner + due date + success criterion — no vague "we should"
Lessons learned: What went well, what to improve, key systemic insight
1 / 5
A post-mortem impact statement reads: "The payment system was down and lots of users were affected." A tech lead says it needs to be rewritten. What should a stronger version include?
A strong impact statement answers four questions with specific, measurable data:
1. When? "2026-04-05 from 14:32 to 16:15 UTC" — exact timestamps, always in UTC
2. How long? "103 minutes" — duration, not just start/end
3. Who was affected and how? "14,800 checkout attempts failed" — not "lots of users"
4. What was the business impact? "Estimated revenue impact: $87,000; 230 support tickets" — quantifies the cost
Why specificity matters. Post-mortems are used to:
• Prioritise which systems to improve first (high impact → high priority)
• Report to stakeholders who need numbers, not impressions
• Set a baseline for future incidents ("was this worse than last time?")
• Drive SLO and error budget conversations
Post-mortem impact vocabulary:
• "The outage lasted [N minutes/hours] from [start UTC] to [end UTC]."
• "Approximately [N] users / transactions / requests were affected."
• "Estimated revenue impact: [amount]."
• "[N] support tickets / customer complaints were received."
• "[Service X] was fully unavailable / degraded at [N]% reduced capacity."
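To make the mapping concrete, here is a minimal sketch that assembles an impact statement from structured incident data, using the numbers from the example above. The IncidentImpact class and its field names are illustrative, not a standard schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class IncidentImpact:
    # All timestamps in UTC, per post-mortem convention.
    start: datetime
    end: datetime
    failed_requests: int
    revenue_impact_usd: int
    support_tickets: int

    def statement(self) -> str:
        # Duration is derived, so "103 minutes" always matches the timestamps.
        minutes = int((self.end - self.start).total_seconds() // 60)
        return (
            f"The outage lasted {minutes} minutes, from "
            f"{self.start:%Y-%m-%d %H:%M} to {self.end:%H:%M} UTC. "
            f"Approximately {self.failed_requests:,} checkout attempts failed. "
            f"Estimated revenue impact: ${self.revenue_impact_usd:,}. "
            f"{self.support_tickets} support tickets were received."
        )

impact = IncidentImpact(
    start=datetime(2026, 4, 5, 14, 32, tzinfo=timezone.utc),
    end=datetime(2026, 4, 5, 16, 15, tzinfo=timezone.utc),
    failed_requests=14_800,
    revenue_impact_usd=87_000,
    support_tickets=230,
)
print(impact.statement())  # answers all four questions with numbers
```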
2 / 5
A post-mortem timeline entry reads: "14:47 — John noticed the error rate going up and eventually decided to check the database." A reviewer says this is not blameless enough. The improved version is _____.
Blameless post-mortems focus on systems, signals, and decisions — not on people and their shortcomings. This is one of the most important concepts in SRE culture (pioneered by Google).
A corrected version reads: "14:47 — Error rate at 12% (up from baseline 0.1%). On-call engineer initiated database health check." What makes it blameless:
• Names a system-level observable: "Error rate at 12% (up from baseline 0.1%)"
• Uses a role rather than a name: "On-call engineer" — the individual isn't blamed; the system's response is tracked
• Describes an action, not a judgment: "initiated database health check"
• Could not be added to a performance review as evidence of failure
Blameless writing vocabulary:
• "[Role/system] observed [metric] at [value] in [tool]."
• "The monitoring alert for [condition] did not fire." (system gap, not person gap)
• "The runbook did not cover this failure mode." (documentation gap)
• "At the time, the available information suggested [explanation] — leading to [decision]."
Blameful vs. blameless rewrites:
❌ "Bob deployed without testing." → ✅ "The deployment pipeline did not require a staging test pass."
❌ "Alice missed the alert." → ✅ "The alert was not configured for the affected service."
❌ "The team took too long." → ✅ "Detection time was 34 minutes; the alert was configured with a 15-minute delay."
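As an illustration, a hypothetical structure for timeline entries that makes blameless phrasing the default (TimelineEntry and its fields are invented for this sketch, not part of any incident tool):

```python
from dataclasses import dataclass

@dataclass
class TimelineEntry:
    time_utc: str     # "HH:MM", always UTC
    actor: str        # a role or system name, never an individual
    observation: str  # the system-level signal, with numbers
    action: str       # what was done, stated neutrally

    def render(self) -> str:
        # Signal first, then the role's response: systems, not people.
        return f"{self.time_utc} — {self.observation}. {self.actor} {self.action}."

entry = TimelineEntry(
    time_utc="14:47",
    actor="On-call engineer",
    observation="Error rate at 12% (up from baseline 0.1%)",
    action="initiated database health check",
)
print(entry.render())
# 14:47 — Error rate at 12% (up from baseline 0.1%). On-call engineer initiated database health check.
```

Because the structure has no field for judgments, entries like "eventually decided to" have nowhere to go.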
3 / 5
A post-mortem lists "Root Cause" as: "A bug in the payment service caused the outage." A senior SRE says this is incomplete. What is missing?
"A bug caused the outage" is never the root cause in a blameless post-mortem. It's the proximate cause — the immediate trigger. The root cause is the systemic condition that allowed the bug to reach production and cause an outage.
The Five Whys applied:
1. Why did the outage happen? → The payment service crashed.
2. Why did it crash? → A null pointer exception on a required field.
3. Why did code with a null pointer exception reach production? → Unit tests didn't cover nil input on that path.
4. Why didn't tests cover that path? → Test coverage was below 60% for that module.
5. Why was coverage that low? → The service was migrated from Python 2 three years ago and tests were never backfilled.
Root cause: "Insufficient test coverage on the payment service, a legacy of the Python 2 migration, allowed a nil-input regression to ship undetected."
Post-mortem root cause vocabulary:
• "The proximate cause was X. The root cause was Y."
• "Five Whys analysis revealed that the underlying systemic issue was…"
• "Contributing factors included: …"
• "This failure mode was possible because [systemic gap]."
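A small sketch (purely illustrative) of the Five Whys chain as data, which keeps the proximate/root distinction explicit:

```python
# Each entry answers "why?" about the one before it.
five_whys = [
    "The payment service crashed.",                               # proximate cause
    "A null pointer exception occurred on a required field.",
    "Unit tests did not cover nil input on that path.",
    "Test coverage was below 60% for that module.",
    "Tests were never backfilled after the Python 2 migration.",  # root cause
]

# The first answer is only the trigger; the last is the systemic condition.
print(f"The proximate cause was: {five_whys[0]}")
print(f"The root cause was: {five_whys[-1]}")
```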
4 / 5
The "Action Items" section of a post-mortem lists: "Fix the bug. Improve monitoring. Update documentation." A tech lead says these are unacceptable. Why?
Vague post-mortem action items are one of the most common reasons incidents repeat. "Fix the bug" assigns no one, sets no deadline, and has no success criterion.
What a complete action item requires:
• Owner: one named person or team — not "the team" or "DevOps"
• Due date: specific date, not "soon" or "next sprint"
• What exactly: specific, measurable change
• Success criterion: a measurable outcome that shows the change worked
• Priority: P1/P2 or equivalent — links to the incident's severity
Rewritten action items:
❌ "Fix the bug." → ✅ "Add nil-safety checks to payment processor inputs. Owner: Maria G. Due: 2026-04-12."
❌ "Improve monitoring." → ✅ "Add Datadog alert for payment service error rate > 1% sustained for 2 minutes. Owner: SRE team. Due: 2026-04-10."
❌ "Update documentation." → ✅ "Update payments runbook with nil-input failure mode and recovery steps. Owner: Alex T. Due: 2026-04-15."
Post-mortem action item vocabulary:
• "Action item: [specific task]. Owner: [name]. Priority: P[N]. Due: [date]. Success criterion: [measurable outcome]."
• "This is a preventive action." / "This is a detective action." / "This is a corrective action."
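A minimal sketch of how a team might enforce this structure in tooling. ActionItem, VAGUE_OWNERS, and the field names are assumptions for illustration, not any real tracker's API:

```python
from dataclasses import dataclass
from datetime import date

# Owner values a tech lead would reject as vague (illustrative list).
VAGUE_OWNERS = {"the team", "devops", "someone", "tbd"}

@dataclass
class ActionItem:
    task: str               # specific, measurable change
    owner: str              # one named person or team
    due: date               # a real date, not "next sprint"
    priority: str           # e.g. "P1", linked to incident severity
    success_criterion: str  # how we'll know it worked

    def __post_init__(self) -> None:
        # Reject the vague phrasings that let incidents repeat.
        if self.owner.lower() in VAGUE_OWNERS:
            raise ValueError(f"Owner must be a specific person or team, not {self.owner!r}")
        if not self.success_criterion:
            raise ValueError("Every action item needs a measurable success criterion")

item = ActionItem(
    task="Add nil-safety checks to payment processor inputs",
    owner="Maria G.",
    due=date(2026, 4, 12),
    priority="P1",
    success_criterion="Nil input on any payment path is rejected without a crash",
)
print(f"Action item: {item.task}. Owner: {item.owner}. "
      f"Priority: {item.priority}. Due: {item.due:%Y-%m-%d}. "
      f"Success criterion: {item.success_criterion}.")
```

"Fix the bug" cannot even be constructed here: the missing owner, date, and success criterion are required fields, not prose conventions.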
5 / 5
A post-mortem document ends with: "We apologise to our users for the inconvenience caused." A senior engineer objects to this in an internal post-mortem. Why?
Different document types serve different purposes and audiences, and their tone and content should match.
Internal post-mortem:
• Audience: engineers, SREs, tech leads
• Purpose: understand what happened, prevent recurrence
• Ending: lessons learned + action items with owners
• Tone: analytical, neutral, forward-looking
• No apologies, marketing language, or customer service tone

Customer-facing incident communication:
• Audience: users, customers
• Purpose: acknowledge impact, explain what happened in non-technical terms, restore confidence
• Includes: apology, impact acknowledgment, what was fixed, what prevents recurrence
• Tone: empathetic, professional, clear

Standard internal post-mortem closing sections:
• "Lessons Learned / What Went Well / What Could Be Improved" — three lists
• "Action Items" — the decisions made
• "Review date" — when this post-mortem will be reviewed for progress
Lessons Learned vocabulary:
• "What went well: Our on-call rotation detected the issue within [time]."
• "What could be improved: Our runbook did not cover this failure mode."
• "Key lesson: [systemic insight that applies beyond this incident]."