4 exercises — write complete blameless post-mortems: root cause analysis, impact, action items, and what went well.
1 / 4
Which section of a post-mortem describes what failed and why without assigning blame to individuals?
The Root Cause Analysis (RCA) section explains the technical and process failures that caused the incident. It answers: What specifically failed? Why did it fail? What conditions allowed it to fail?
Standard post-mortem sections:
• Executive Summary: 1–2 sentences for leadership
• Impact: duration, users affected, business effect
• Timeline: chronological events
• Root Cause Analysis: what failed and why
• Contributing Factors: conditions that made failure worse
• What Went Well: response actions that helped
• What Went Poorly: gaps in response or process
• Action Items: concrete follow-up with owner + due date
The RCA should identify root causes (the fundamental reasons), not just proximate causes (the immediate triggers). Example: the proximate cause was "a config file had an error". The root cause might be "there was no automated config validation in the deployment pipeline".
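To make the root cause in that example concrete, here is a minimal sketch of what an automated config check in a deployment pipeline might look like. The field names in REQUIRED_FIELDS and the sample configs are hypothetical, and a real pipeline would more likely use a schema library than hand-written checks.

```python
# Hypothetical schema: required keys and their expected types.
REQUIRED_FIELDS = {
    "service_name": str,
    "timeout_seconds": int,
    "replicas": int,
}


def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for key, expected_type in REQUIRED_FIELDS.items():
        if key not in config:
            problems.append(f"missing required key: {key}")
        elif not isinstance(config[key], expected_type):
            problems.append(
                f"{key} should be {expected_type.__name__}, "
                f"got {type(config[key]).__name__}"
            )
    return problems


# Invented sample configs, for illustration only.
good = {"service_name": "checkout", "timeout_seconds": 30, "replicas": 3}
bad = {"service_name": "checkout", "timeout_seconds": "30"}

print(validate_config(good))
print(validate_config(bad))
```

A deploy step could refuse to proceed whenever the returned list is non-empty, which turns "a config file had an error" from an outage into a failed build.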
2 / 4
Which "Impact" section entry is most complete and useful for a post-mortem?
Option C is the professional standard. A complete Impact section quantifies:
• Duration: exact start and end times in UTC
• Users affected: count and percentage
• Business impact: transactions, revenue, or equivalent
• Scope: which regions, services, features were affected
• SLO/SLA impact: did this breach your availability target?
Vague impact statements ("many users", "for a while") make it impossible to prioritise fixes, communicate with customers accurately, or calculate the cost of prevention vs. remediation. Numbers are non-negotiable for post-mortems that leadership will read.
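The arithmetic behind those numbers is simple enough to script. The sketch below derives duration, percentage of users affected, and a rough SLO check from raw figures. Every figure is invented, the 99.9% target and 30-day window are assumptions, and the function name summarize_impact is hypothetical.

```python
from datetime import datetime, timezone


def summarize_impact(start, end, users_affected, total_users,
                     slo_target, window_days=30):
    """Turn raw incident figures into the numbers an Impact section needs."""
    duration_minutes = (end - start).total_seconds() / 60
    # Error budget: the downtime the availability target allows per window.
    window_minutes = window_days * 24 * 60
    error_budget_minutes = (1 - slo_target) * window_minutes
    return {
        "duration_minutes": duration_minutes,
        "percent_users_affected": 100 * users_affected / total_users,
        "error_budget_minutes": error_budget_minutes,
        # Simplification: the whole incident is counted as full downtime.
        "slo_breached": duration_minutes > error_budget_minutes,
    }


# Invented figures, for illustration only.
impact = summarize_impact(
    start=datetime(2024, 3, 5, 14, 2, tzinfo=timezone.utc),
    end=datetime(2024, 3, 5, 14, 49, tzinfo=timezone.utc),
    users_affected=12_400,
    total_users=80_000,
    slo_target=0.999,
)
for name, value in impact.items():
    print(f"{name}: {value}")
```

The SLO check here ignores any other downtime in the same window and treats a partial outage as a full one, so it is a first estimate rather than a replacement for your actual availability reporting.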
3 / 4
Write the best "Action Item" entry for fixing missing config validation. Which format is correct?
Option C follows the required format for action items: What + Owner + Due Date + Priority. Each element is essential:
• What: specific, actionable ("add automated config schema validation to the deployment pipeline", not "fix config")
• Owner: a named person, not a team (teams don't complete tasks, individuals do)
• Due date: a specific date, not "soon" or "next sprint"
• Priority: P1 (prevent recurrence) vs P2 (improve) vs P3 (monitoring)
Action items without owners and dates are aspirations, not commitments. In a blameless culture, assigning ownership to an action item is not blame — it's accountability for a future improvement, which is healthy.
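One way to keep action items honest is to store them in a structure that rejects the vague versions. The sketch below models the What + Owner + Due Date + Priority format. The ActionItem class, its crude heuristics (a four-word minimum, an owner name ending in "team"), and the owner "Jane Doe" are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    what: str
    owner: str
    due: date       # a real date type rules out "soon" or "next sprint"
    priority: str   # "P1", "P2", or "P3"

    def problems(self) -> list[str]:
        """Return format problems; an empty list means the item is usable."""
        issues = []
        # Crude heuristic: very short descriptions are rarely actionable.
        if len(self.what.split()) < 4:
            issues.append("'what' is too vague; describe the specific change")
        # Crude heuristic: catches owners such as "Platform team".
        if not self.owner.strip() or self.owner.lower().endswith("team"):
            issues.append("owner must be a named person, not a team")
        if self.priority not in {"P1", "P2", "P3"}:
            issues.append("priority must be P1, P2, or P3")
        return issues


# Invented examples, for illustration only.
good_item = ActionItem(
    what="Add automated config schema validation to the deployment pipeline",
    owner="Jane Doe",
    due=date(2024, 4, 1),
    priority="P1",
)
vague_item = ActionItem("fix config", "Platform team", date(2024, 4, 1), "soon")

print(good_item.problems())
print(vague_item.problems())
```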
4 / 4
Complete the post-mortem "What Went Well" section. Scenario: the incident team identified the root cause in 15 minutes. Which entry is most useful?
Option C is the professional standard for a "What Went Well" entry. It: (1) names the specific positive behaviour (rapid root cause identification in 15 minutes); (2) explains the mechanism that made it possible (deployment logs + correlated dashboards); (3) connects to a metric (MTTR reduction).
The "What Went Well" section is often written hastily, but it is valuable. It captures which investments paid off (the logging, the dashboards) and provides evidence that good practices are worth maintaining. Generic entries like "the team was fast" provide no institutional knowledge or replicable lesson.
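A figure such as "root cause identified in 15 minutes" is easier to defend when it is derived from the timeline rather than remembered. The sketch below computes time to root cause and time to resolve from timestamped events. The event names and times are invented, and minutes_between is a hypothetical helper.

```python
from datetime import datetime, timezone

# Invented timeline; event names and times are assumptions for illustration.
timeline = {
    "detected": datetime(2024, 3, 5, 14, 2, tzinfo=timezone.utc),
    "root_cause_identified": datetime(2024, 3, 5, 14, 17, tzinfo=timezone.utc),
    "resolved": datetime(2024, 3, 5, 14, 49, tzinfo=timezone.utc),
}


def minutes_between(events: dict, earlier: str, later: str) -> float:
    """Minutes elapsed between two named timeline events."""
    return (events[later] - events[earlier]).total_seconds() / 60


time_to_root_cause = minutes_between(timeline, "detected", "root_cause_identified")
time_to_resolve = minutes_between(timeline, "detected", "resolved")

print(f"Time to root cause: {time_to_root_cause:.0f} min")
print(f"Time to resolve: {time_to_resolve:.0f} min")
```

Tracking the same two durations across incidents gives the "What Went Well" entry a baseline to compare against.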