Writing Incident Communications
3 exercises: executive summaries during live incidents, customer-facing status page updates, and all-hands resolution notifications, written to a professional standard.
Scenario: A SEV-1 database outage is affecting 100% of paying customers. You are the incident commander. 12 minutes have elapsed since detection. Your VP of Engineering is asking for a status update via Slack. Which executive summary is most effective?
Option B follows the standard executive incident summary format. In a live SEV-1, the VP has decisions to make: Does this require customer communication? Board notification? More resources? Every word of the summary should help answer those questions.
Five-part executive incident summary structure:
① Header with severity, component, scope, and elapsed time: "SEV-1 | Database outage | All paying customers | 12 min elapsed". This can be read in about 4 seconds, and the reader immediately knows how severe and how widespread the incident is.
② Current impact in business terms — "100% of paying customers, ~4,200 active sessions" — not just a technical metric. The VP uses this to decide if customer communications need to start now.
③ What you know / what you don't — "Replica is healthy. Failover has NOT triggered — investigating why." Two pieces of information: one reassuring (replica is fine), one concerning (why didn't auto-failover work?). Don't give only reassuring facts — the VP needs the full picture.
④ Current action with named owner — "On-call DBA is running diagnostics; manual failover decision pending in 5 min" — this shows the incident is managed, not chaotic. The 5-minute decision window is also critical: the VP knows there's a fork point coming.
⑤ Specific next update time — "14:55 UTC (in ~6 min)" — not "soon." The VP doesn't need to Slack you again, which keeps your incident channel clear of requests for updates.
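The five parts above can also be captured as a small template, which makes it harder to skip a part under pressure. The following is a minimal sketch, not a standard tool: the `IncidentSummary` class, its field names, and the `render` method are hypothetical, and the 14:37 UTC detection time and the calendar date are assumptions chosen so that the output matches the numbers quoted in the example (12 min elapsed, next update at 14:55 UTC).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class IncidentSummary:
    """Hypothetical container for the five-part executive summary."""
    severity: str
    component: str
    scope: str
    detected_at: datetime      # used to compute elapsed time for the header
    impact: str                # part 2: impact in business terms
    known: str                 # part 3: what we know
    unknown: str               # part 3: what we don't know yet
    owner: str                 # part 4: who is acting
    action: str                # part 4: what they are doing
    update_interval_min: int   # part 5: minutes until the next update

    def render(self, now: datetime) -> str:
        # Whole minutes since detection, for the one-line header.
        elapsed = int((now - self.detected_at).total_seconds() // 60)
        # A concrete clock time for the next update, never "soon".
        next_update = now + timedelta(minutes=self.update_interval_min)
        return "\n".join([
            f"{self.severity} | {self.component} | {self.scope} | {elapsed} min elapsed",
            f"Impact: {self.impact}",
            f"Known: {self.known}",
            f"Unknown: {self.unknown}",
            f"Action: {self.owner} is {self.action}",
            f"Next update: {next_update:%H:%M} UTC (in ~{self.update_interval_min} min)",
        ])


# Assumed values: the date is a placeholder and 14:37 UTC is inferred, not given.
example = IncidentSummary(
    severity="SEV-1",
    component="Database outage",
    scope="All paying customers",
    detected_at=datetime(2024, 1, 1, 14, 37, tzinfo=timezone.utc),
    impact="100% of paying customers, ~4,200 active sessions",
    known="Replica is healthy.",
    unknown="Failover has NOT triggered; investigating why.",
    owner="On-call DBA",
    action="running diagnostics; manual failover decision pending in 5 min",
    update_interval_min=6,
)
text = example.render(now=datetime(2024, 1, 1, 14, 49, tzinfo=timezone.utc))
print(text)
```

Computing the elapsed time and the next update time from timestamps, rather than typing them by hand, keeps the header and the final line consistent with each other on every update.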
Options C and D contain only sentiment ("working on it", "investigating") with no information useful for decision-making. "Sorry for the disruption" is appropriate in a customer email, not in an executive incident update.
Related exercise: Post-mortems & Incident Writing →