Post-Incident Report English: Writing Effective Postmortems

Learn postmortem vocabulary and writing style — blameless language, precise timeline writing, root cause analysis, and corrective action ownership.

Introduction

A postmortem (also called a post-incident review or PIR) is a written document that analyses what went wrong during an incident, why it happened, and what the team will do to prevent it in the future. Writing a good postmortem in English requires precision — every word matters, because these documents are read by engineers, managers, and sometimes customers. The writing style is also deliberately blameless, which requires specific linguistic choices that many non-native writers find counterintuitive.

The Vocabulary of Postmortems

Understanding core postmortem vocabulary allows you to read, write, and discuss incidents accurately.

Timeline terms:

  • “The incident began at 14:32 UTC when the first alert fired.”
  • “Approximately 20 minutes after the initial deploy, error rates began to rise.”
  • “The service was restored at 16:14 UTC, giving a total outage duration of 1 hour 42 minutes.”
  • “Detection time was 8 minutes: the gap between the incident starting and the first alert being raised.”

MTTR stands for Mean Time to Recovery — the average time it takes to restore a service after an incident. You might write: “Our MTTR for this class of incident is approximately 45 minutes based on the last five occurrences.”
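The arithmetic behind these sentences is worth checking before you publish it. The following is a minimal sketch, assuming all timestamps fall on the same UTC day; the helper names and the five past recovery times are hypothetical values chosen for illustration, not data from a real incident.

```python
from datetime import datetime, timedelta


def duration(start: str, end: str) -> timedelta:
    """Elapsed time between two HH:MM timestamps on the same UTC day."""
    fmt = "%H:%M"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


def format_duration(delta: timedelta) -> str:
    """Render a timedelta as hours and minutes for use in a report sentence."""
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} h {minutes} min"


def mean_time_to_recovery(recovery_minutes: list[float]) -> float:
    """MTTR as the arithmetic mean of past recovery times, in minutes."""
    if not recovery_minutes:
        raise ValueError("at least one recovery time is required")
    return sum(recovery_minutes) / len(recovery_minutes)


# Start and restore times taken from the example sentences above.
outage = duration("14:32", "16:14")
print(f"Total outage duration: {format_duration(outage)}")

# Hypothetical recovery times (minutes) for the last five occurrences.
past_recoveries = [38, 52, 41, 47, 47]
print(f"MTTR: {mean_time_to_recovery(past_recoveries):.0f} minutes")
```

Computing the figures from the raw timestamps, rather than typing them by hand, avoids a timeline whose stated duration contradicts its own start and end times.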

Root cause vs contributing factors:

  • The root cause is the fundamental reason the incident happened: “The root cause was an uncaught exception in the payment processor when the currency field was empty.”
  • Contributing factors are conditions that made the incident worse or more likely: “Contributing factors included the lack of input validation on the upstream API and insufficient monitoring coverage for this error type.”

Drawing this distinction correctly is important, because many incidents have several contributing factors but only one root cause.

The Blameless Writing Style

The “blameless postmortem” culture, pioneered by companies like Google and Etsy, holds that incidents are caused by systemic failures, not by individual mistakes. The writing style reflects this.

Avoid naming individuals as causes:

  • Instead of: “John deployed without running tests and caused the outage.”
  • Write: “A deployment was made without running the integration test suite. The incident was caused by the absence of a mandatory pre-deployment gate in the CI pipeline.”

Use passive voice to describe actions:

  • “The configuration change was applied without triggering a canary rollout.”
  • “The alert threshold was set too high, which delayed detection.”
  • “The rollback was initiated at 15:47 UTC.”

Frame human error as system design failure:

  • “The engineer was not aware that this change required a database migration — the runbook did not make this clear.”
  • “The on-call engineer was not notified in time because the escalation path was not documented.”

This is not about hiding accountability — it is about identifying where the system failed to prevent human error, which is where the most durable improvements lie.

Writing a Precise Timeline

The timeline section is often the most technically demanding to write. Precision matters.

Time markers:

  • “At 14:32 UTC, the deployment of version 2.4.1 began.”
  • “Three minutes later, the first 5xx errors appeared in the monitoring dashboard.”
  • “At approximately 14:41 UTC, based on log timestamps, the error rate crossed the 5% threshold.”
  • “By 15:00 UTC, 100% of traffic to the checkout service was returning errors.”

Sequence connectors:

  • “Following the initial rollback, error rates decreased but did not return to baseline.”
  • “Simultaneously, the database team began investigating connection pool exhaustion.”
  • “As a result, the decision was made to initiate a full rollback to version 2.3.8.”

Hedging for uncertainty:

  • “It is believed that the issue began before the alert fired, based on the pattern of slow requests in the access logs.”
  • “The exact start time of the degradation is unclear — logs suggest it may have begun as early as 14:20 UTC.”

Using hedging language like “it is believed that” or “logs suggest” is correct and honest when you cannot verify a fact precisely. Inventing false certainty in a timeline is a common mistake that undermines the document’s credibility.
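One way to keep verified and inferred times visibly separate is to record the evidence next to each entry and let it drive the wording. This is only an illustrative sketch: the `TimelineEntry` class and its `evidence` field are hypothetical names, and sorting by the time string assumes zero-padded HH:MM values on a single day.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimelineEntry:
    time_utc: str                   # zero-padded "HH:MM", same day assumed
    event: str                      # clause that completes the sentence
    evidence: Optional[str] = None  # set only when the time is inferred

    def render(self) -> str:
        """Hedge the sentence when the time rests on indirect evidence."""
        if self.evidence:
            return (f"At approximately {self.time_utc} UTC, "
                    f"based on {self.evidence}, {self.event}")
        return f"At {self.time_utc} UTC, {self.event}"


def render_timeline(entries: list[TimelineEntry]) -> list[str]:
    """Return the rendered sentences in chronological order."""
    ordered = sorted(entries, key=lambda entry: entry.time_utc)
    return [entry.render() for entry in ordered]


entries = [
    TimelineEntry("14:41", "the error rate crossed the 5% threshold.",
                  evidence="log timestamps"),
    TimelineEntry("14:32", "the deployment of version 2.4.1 began."),
]
for line in render_timeline(entries):
    print(line)
```

An entry without evidence is stated plainly; an entry with evidence is automatically hedged, so the document never claims more certainty than the data supports.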

Writing Corrective Actions

Corrective actions are the most important part of a postmortem — they define what will actually change. Weak corrective actions use vague language; strong ones are specific, owned, and time-bound.

Weak (avoid):

  • “Improve monitoring.”
  • “Engineers should be more careful with deployments.”
  • “We need better documentation.”

Strong (use):

  • “Add an alert for payment processor exceptions with a threshold of more than 10 errors per minute. Owner: Platform team. Due: 2026-06-30.”
  • “Update the deployment runbook to include a mandatory pre-deploy checklist that requires integration test results. Owner: DevOps lead. Due: 2026-06-23.”
  • “Schedule a review of all services missing structured error logging. Owner: Engineering manager. Due: 2026-07-07.”

The formula for a strong corrective action is: action verb + specific outcome + owner + deadline.
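The formula can be turned into a simple completeness check. The sketch below is an assumption about how a team might model an action item, with a hypothetical `CorrectiveAction` class; it can detect a missing owner or deadline, but it cannot judge whether the wording of the action itself is specific enough.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CorrectiveAction:
    action: str                  # action verb + specific outcome
    owner: Optional[str] = None  # team or role responsible
    due: Optional[date] = None   # deadline

    def missing_parts(self) -> list[str]:
        """List which parts of the formula are still absent."""
        missing = []
        if not self.action.strip():
            missing.append("action")
        if not self.owner:
            missing.append("owner")
        if self.due is None:
            missing.append("deadline")
        return missing

    def render(self) -> str:
        """Format the item in the style of the strong examples above."""
        missing = self.missing_parts()
        if missing:
            raise ValueError(f"action item not ready, missing: {missing}")
        return f"{self.action} Owner: {self.owner}. Due: {self.due.isoformat()}."


strong = CorrectiveAction(
    "Add an alert for payment processor exceptions with a threshold "
    "of more than 10 errors per minute.",
    owner="Platform team",
    due=date(2026, 6, 30),
)
weak = CorrectiveAction("Improve monitoring.")

print(strong.render())
print("Weak item is missing:", weak.missing_parts())
```

The weak example fails the check because nobody owns it and it has no date, which matches the reason it is weak on the page.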

Key Vocabulary

Term                  Definition
postmortem            A written analysis of an incident: what happened, why, and what will change
root cause            The fundamental underlying reason an incident occurred
contributing factor   A condition that made an incident more likely or more severe
MTTR                  Mean Time to Recovery: the average time to restore a service after failure
detection time        The time between an incident starting and the team becoming aware of it
blameless postmortem  A postmortem culture that attributes incidents to system failures, not individual blame
corrective action     A specific, owned, time-bound step to prevent a recurrence
outage duration       The total length of time a service was unavailable or degraded

Practice Tips

  1. Write the timeline first, then the analysis. Reconstructing events in chronological order forces you to be precise about what you know vs what you are inferring. This discipline flows through into the rest of the document.
  2. Use “the system did not…” instead of “no one noticed…”. When you catch yourself blaming a person, reframe it as a system failure: what alert, test, or process should have caught this but did not?
  3. Distinguish root cause from contributing factors explicitly. Write a separate sentence for each. If you find yourself listing more than one root cause, reconsider — there is usually one, and the others are contributing factors.
  4. Read back every corrective action and ask: who is doing what, by when? If you cannot answer all three, the action item is not ready.

Conclusion

A well-written postmortem is a learning document — it turns a painful incident into permanent institutional knowledge. The vocabulary in this guide (root cause, contributing factor, MTTR, blameless culture) and the writing techniques (passive voice, precise timelines, strong corrective actions) are the building blocks of postmortems that actually improve systems. Practise writing clearly and without blame, and your postmortems will become one of the most valued forms of engineering communication on your team.