
Advanced Postmortem & Blameless Culture Vocabulary

5 exercises — Practice advanced postmortem and blameless incident culture vocabulary in English: contributing factors, CAPA, learning reviews, and systemic analysis.

Core Postmortem & Blameless Culture vocabulary clusters

Blameless culture: blameless postmortem, just culture, psychological safety, learning review, contributing factors (vs. root cause)
Analysis: 5 Whys (blameless), contributing conditions, systemic factors, cognitive biases (hindsight bias, counterfactual reasoning)
Action items: CAPA (Corrective and Preventive Action), action item owner, due date, preventive action, corrective action
Timeline: detection time, mitigation time, resolution time, MTTR, SLA breach timeline, contributing events
Communication: incident summary, customer communication, status page, external postmortem, internal postmortem


1 / 5

An SRE lead introduces blameless postmortems:
"A blameless postmortem assumes that nobody involved acted maliciously: people were doing their best with the information and tools available to them. When we find 'human error', we don't stop there. We ask: why did the system make it easy for a human to make this mistake? What safeguards were missing? The goal is to improve systems and processes, not to assign blame to individuals."
What is the core principle of a blameless postmortem?

2 / 5

An incident commander facilitates a postmortem discussion:
"I want us to avoid hindsight bias. When we review what the on-call engineer did at 2am with an unfamiliar alert, we need to put ourselves in their position at that moment — not evaluate decisions from the comfort of knowing what we know now. What information did they have? What did the system tell them? What seemed like the right decision at the time?"
What is hindsight bias in incident analysis and why is it problematic?

3 / 5

A reliability engineer presents contributing factors analysis:
"Instead of asking 'what was the root cause?' we ask 'what were the contributing factors?' There's rarely a single cause; usually several factors aligned to make the incident possible. Think of the Swiss cheese model: each slice of cheese is a layer of defence, and every slice has holes. A serious incident happens when the holes in all the slices line up. Each contributing factor is a hole in one slice, so closing any one of those holes would have stopped this particular incident."
Why do advanced postmortems use contributing factors instead of a single root cause?
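The Swiss cheese model can be sketched in a few lines of Python. This is a minimal illustration, assuming each defensive layer can be represented as the set of hazards it fails to stop (its holes); the layer names and the "bad config" hazard are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One slice of cheese: a defensive layer plus the hazards it fails to stop (its holes)."""
    name: str
    holes: set[str] = field(default_factory=set)


def incident_occurs(hazard: str, layers: list[Layer]) -> bool:
    """A hazard becomes an incident only if it passes through a hole in every layer."""
    return all(hazard in layer.holes for layer in layers)


def contributing_factors(hazard: str, layers: list[Layer]) -> list[str]:
    """Each layer the hazard slipped through is one contributing factor."""
    return [layer.name for layer in layers if hazard in layer.holes]


# Hypothetical defensive layers, for illustration only.
layers = [
    Layer("code review", {"bad config"}),
    Layer("canary deployment", {"bad config"}),
    Layer("alerting", {"bad config"}),
]
print(incident_occurs("bad config", layers))        # True: every hole lines up
print(contributing_factors("bad config", layers))   # all three layers contributed

layers[1].holes.discard("bad config")               # close one hole (e.g. fix the canary check)
print(incident_occurs("bad config", layers))        # False: the canary now stops it
```

The analysis returns every layer that failed rather than a single "root cause", which is the shift in framing the exercise describes.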

4 / 5

A platform team reviews their incident metrics:
"We track MTTD, MTTI, MTTR, and MTTF; the M stands for 'mean', because we average each time across incidents. Time to detect is how long it took from the start of the incident until monitoring alerted us. Time to identify is how long until we understood what was wrong. Time to resolve is how long until the problem was fully fixed; mitigation, when user impact stops, usually comes earlier. Time to failure is how long the service typically runs between failures. For this incident: detection at T+3min, identification at T+22min, mitigation at T+35min, full resolution at T+2hr. The long gap between detection and identification is our biggest problem: our alerts aren't telling us enough."
What does MTTD measure and why is minimising it important?
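To make the timeline arithmetic concrete, here is a minimal Python sketch, assuming each milestone is recorded as minutes elapsed since the incident began (T+0). The class and function names are hypothetical, and the second incident uses made-up values purely so the mean covers more than one sample.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class IncidentTimeline:
    """Minutes elapsed since the incident began (T+0) when each milestone was reached."""
    detected: float
    identified: float
    mitigated: float
    resolved: float

    def phases(self) -> dict[str, float]:
        """Duration of each phase of the response, to show where time was lost."""
        return {
            "start -> detection": self.detected,
            "detection -> identification": self.identified - self.detected,
            "identification -> mitigation": self.mitigated - self.identified,
            "mitigation -> resolution": self.resolved - self.mitigated,
        }


def mean_time_to(incidents: list[IncidentTimeline], milestone: str) -> float:
    """An MTT* metric is the mean of one milestone across many incidents."""
    return mean(getattr(incident, milestone) for incident in incidents)


# The incident from the platform team's review: T+3min, T+22min, T+35min, T+2hr.
this_incident = IncidentTimeline(detected=3, identified=22, mitigated=35, resolved=120)
for phase, minutes in this_incident.phases().items():
    print(f"{phase}: {minutes} min")

# A second, purely hypothetical incident, so the mean covers more than one sample.
other_incident = IncidentTimeline(detected=5, identified=10, mitigated=20, resolved=40)
print("MTTD:", mean_time_to([this_incident, other_incident], "detected"), "min")
```

Of the 35 minutes before mitigation, 19 fall between detection and identification, which is why the team treats uninformative alerts as its biggest problem.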

5 / 5

An engineering manager writes the CAPA for a major incident:
"Our CAPA has two parts: corrective actions address the specific failure that occurred — in this case, adding rate limiting to the authentication endpoint. Preventive actions address the broader category — we're auditing all public-facing endpoints for rate limiting gaps, and adding a checklist item to our feature review process. Each action has an owner and a due date — tracked in our incident management system."
What is the difference between a corrective action and a preventive action in CAPA?
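A CAPA list is essentially a small data structure: each action has a kind, an owner, and a due date. The Python sketch below is an illustrative assumption rather than any particular incident management system's schema; the action descriptions come from the manager's example, while the owners ("owner-a" and so on) and dates are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Kind(Enum):
    CORRECTIVE = "corrective"   # fixes the specific failure that occurred
    PREVENTIVE = "preventive"   # addresses the broader class of failure


@dataclass
class ActionItem:
    description: str
    kind: Kind
    owner: str      # a single accountable owner (hypothetical placeholder below)
    due: date       # due date (hypothetical placeholder below)
    done: bool = False


def overdue(items: list[ActionItem], today: date) -> list[ActionItem]:
    """Open items past their due date, e.g. for a weekly CAPA review."""
    return [item for item in items if not item.done and item.due < today]


capa = [
    ActionItem("Add rate limiting to the authentication endpoint",
               Kind.CORRECTIVE, "owner-a", date(2024, 1, 15)),
    ActionItem("Audit all public-facing endpoints for rate-limiting gaps",
               Kind.PREVENTIVE, "owner-b", date(2024, 2, 15)),
    ActionItem("Add a rate-limiting item to the feature review checklist",
               Kind.PREVENTIVE, "owner-c", date(2024, 2, 1)),
]
capa[0].done = True  # the corrective fix has shipped

for item in overdue(capa, today=date(2024, 2, 10)):
    print(f"OVERDUE [{item.kind.value}] {item.description} ({item.owner}, due {item.due})")
```

Only the checklist item is reported: the corrective action is done and the audit is not yet due. Tracking owners and dates explicitly is what keeps preventive actions, which are easy to deprioritise once service is restored, from quietly lapsing.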