How to Write a Root Cause Analysis Report in English

Learn the English vocabulary and structure needed to write a clear, blameless root cause analysis report after a technical incident.

A root cause analysis report is often the most-read document to come out of an incident, and its readers have very different levels of technical context: the engineers who fixed the problem, the leaders who need assurance it won’t recur, and future on-call engineers debugging something similar. Writing it in clear, precise English that separates fact from speculation is what makes it useful months later, instead of just another postmortem nobody trusts.

Key Vocabulary

Root cause — the underlying condition without which the incident would not have happened, as distinct from the immediate trigger or the symptom that was first noticed. “The symptom was a spike in 500 errors, the trigger was a deploy, but the root cause was a missing null check that had been in the code for months and only got exercised by that day’s traffic pattern.”

Contributing factor — a condition that made the incident worse, more likely, or harder to detect, without being the single root cause on its own. “The missing null check was the root cause, but slow alerting was a contributing factor that turned a five-minute blip into a forty-minute outage.”

Detection time — the interval between when an incident actually began and when the team first became aware of it, a metric that’s often as important to fix as the root cause itself. “Our detection time was eighteen minutes, which is too long for a customer-facing outage of this severity — we need an alert on this specific failure mode.”
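
As a quick illustration of the arithmetic behind this term, here is a minimal Python sketch. The timestamps and the function name `detection_time` are invented for this example; in a real report the two times would come from your monitoring data and the incident timeline.

```python
from datetime import datetime, timedelta

# Hypothetical timestamps, chosen to match the eighteen-minute example above.
incident_began = datetime(2024, 3, 5, 14, 2)
first_alert = datetime(2024, 3, 5, 14, 20)


def detection_time(began: datetime, detected: datetime) -> timedelta:
    """Interval between the start of an incident and the team's first awareness of it."""
    if detected < began:
        raise ValueError("detection cannot precede the start of the incident")
    return detected - began


minutes = detection_time(incident_began, first_alert).total_seconds() / 60
print(f"Detection time: {minutes:.0f} minutes")  # Detection time: 18 minutes
```

Stating the figure with both timestamps behind it lets a reader verify the number instead of taking it on trust.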

Corrective action — a specific, assignable, and time-bound task committed to as a result of the incident, intended to prevent recurrence or reduce impact next time. “Each corrective action in this report has an owner and a due date — ‘be more careful’ isn’t a corrective action, it’s a wish.”
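
One way to make “specific, assignable, and time-bound” concrete is to treat a corrective action as a small record that cannot exist without an owner and a due date. The sketch below is only an illustration: the class name `CorrectiveAction`, its fields, and the sample values are assumptions, not a standard format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CorrectiveAction:
    description: str  # what will change, stated specifically
    owner: str        # the person or team accountable (hypothetical field)
    due: date         # the committed deadline (hypothetical field)

    def __post_init__(self) -> None:
        # An action with no owner or no concrete task is a wish, not a commitment.
        if not self.description.strip():
            raise ValueError("a corrective action needs a specific description")
        if not self.owner.strip():
            raise ValueError("a corrective action needs an owner")


# Hypothetical example values.
action = CorrectiveAction(
    description="Add an alert on checkout 500-error rate above 1% for 2 minutes",
    owner="payments-oncall",
    due=date(2024, 3, 19),
)
print(f"{action.description} (owner: {action.owner}, due: {action.due.isoformat()})")
```

Trying to create an action with an empty owner raises an error, which mirrors the rule in the definition: no owner, no corrective action.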

Blameless — describing an incident review culture and document that focuses on systems and conditions rather than individual fault, on the premise that people acted reasonably given the information they had at the time. “This report is blameless by design — we’re not naming who ran the command, we’re explaining why the system allowed that command to cause this much damage.”

Structuring the Report

  • Summary: “In two or three sentences, state what happened, how long it lasted, and who was affected — write this section last, but put it first.”
  • Timeline: “List events in chronological order with timestamps, distinguishing what the system did automatically from what a person did manually.”
  • Root cause and contributing factors: “Separate the single root cause from the conditions that made it worse or harder to catch — don’t blend them into one paragraph.”
  • Impact: “Quantify it wherever possible — number of failed requests, affected customers, or minutes of degraded service, not just ‘some users were affected.’”
  • Corrective actions: “List each action with an owner and a deadline, and distinguish immediate fixes already shipped from longer-term follow-up work.”
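
The timeline and impact sections are the most mechanical parts of the report, so they are easy to sketch in code. The following Python example is a rough illustration under stated assumptions: the `TimelineEvent` class, the "system"/"human" labels, the event descriptions, and the request counts are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEvent:
    at: datetime
    actor: str  # "system" = automatic, "human" = manual (labels assumed here)
    what: str


def render_timeline(events: list[TimelineEvent]) -> list[str]:
    """Return one line per event, in chronological order, tagged auto or manual."""
    lines = []
    for event in sorted(events, key=lambda e: e.at):
        tag = "auto" if event.actor == "system" else "manual"
        lines.append(f"{event.at:%H:%M} [{tag}] {event.what}")
    return lines


def impact_percent(failed: int, total: int) -> float:
    """Share of requests that failed, as a percentage rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * failed / total, 1)


# Hypothetical events, deliberately listed out of order.
events = [
    TimelineEvent(datetime(2024, 3, 5, 14, 42), "human", "Rollback completed; error rate back to normal"),
    TimelineEvent(datetime(2024, 3, 5, 14, 2), "system", "Deploy pipeline released the new build"),
    TimelineEvent(datetime(2024, 3, 5, 14, 20), "system", "Alert fired on 500-error rate"),
]
timeline = render_timeline(events)
print("\n".join(timeline))
print(f"Impact: {impact_percent(4_000, 50_000)}% of checkout requests failed")
```

Sorting by timestamp and tagging each line keeps the automatic and manual steps visibly separate, and a computed percentage replaces a vague phrase like “some users were affected.”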

Communicating Findings to Different Audiences

  • “For leadership: ‘The outage lasted forty minutes and affected roughly eight percent of checkout traffic; the fix is already deployed, and we’ve identified two follow-up actions to prevent recurrence.’”
  • “For the engineering team: ‘The root cause was a race condition between the cache invalidation and the write path — see the timeline for the exact sequence.’”
  • “For future on-call engineers: ‘If you see this exact error signature again, check the cache invalidation order first — this is the second time it’s caused a similar symptom.’”

Professional Tips

  1. Write the root cause as a sentence that starts with “because.” If you can’t complete “the incident happened because ___” with a specific, falsifiable condition, you likely still have a symptom description, not a root cause.
  2. Keep contributing factors separate from the root cause. Blending them (“the root cause was a bug, plus slow alerting, plus an unclear runbook”) dilutes the report and hides which fix should come first.
  3. Make every corrective action independently understandable. A reader six months from now, with no memory of the incident, should be able to read one corrective action and know exactly what changed and why, without re-reading the whole report.

Practice Exercise

  1. Write a two-sentence summary of a hypothetical incident, suitable for a leadership audience.
  2. Draft one corrective action with a clear owner and deadline, distinct from a vague intention like “improve monitoring.”
  3. Explain, in one sentence, the difference between a root cause and a contributing factor.