Incident Response Phrases — IT Phrasebook

Incident communication rules

Declare early — it's easier to downgrade a P1 than to under-respond to one
Update on a cadence — every 20–30 min for P1, even if there's nothing new to say
Separate mitigation from fix — mitigation reduces impact; a fix removes the root cause
Post-mortem = blameless — focus on systems and processes, never individuals

Declaring & Opening an Incident

We're currently experiencing [issue] affecting [service / users].

First public status update — factual, no speculation
There is an ongoing issue with [service]. We are investigating.

Use on the status page before the root cause is known
This is a P1 / SEV-1 — I'm declaring an incident. Joining the war room now.

Formal declaration — use your company's severity language
I'm spinning up an incident channel: please join #incident-[date] now.

Centralize communication immediately
We need an incident commander — can someone volunteer / [Name], can you take IC?

Assign roles early: IC, scribe, comms lead
Symptoms so far: [X]. I'm not sure of the root cause yet.

Share what you know, be honest about what you don't

Updates & Mitigation

Update at [HH:MM UTC]: [status]. Root cause is still under investigation.

Regular cadence — every 20–30 min for P1
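To keep a P1 cadence without relying on memory, the update times can be worked out from the moment the incident was declared. The following is a minimal Python sketch. It assumes a timezone-aware declaration timestamp and a fixed interval. `update_schedule` is a hypothetical helper and is not part of any incident tool.

```python
from datetime import datetime, timedelta, timezone


def update_schedule(declared_at: datetime, interval_min: int = 30, count: int = 4) -> list[str]:
    """Return the next `count` status-update times as 'HH:MM UTC' strings.

    `declared_at` must be timezone-aware; it is converted to UTC first.
    """
    start = declared_at.astimezone(timezone.utc)
    return [
        (start + timedelta(minutes=interval_min * i)).strftime("%H:%M UTC")
        for i in range(1, count + 1)
    ]


# Hypothetical declaration time, used only for illustration.
declared = datetime(2024, 5, 1, 14, 5, tzinfo=timezone.utc)
for t in update_schedule(declared, interval_min=30, count=3):
    print(f"Update at {t}: [status]. Root cause is still under investigation.")
```

The times are converted to UTC so that everyone reading the channel sees the same clock, whatever their local timezone.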
We believe the root cause is [X]. Still confirming.

Tentative root cause — hedged language is appropriate here
We've applied a mitigation: [action]. Monitoring the impact now.

Distinguish mitigation (reduces impact) from fix (resolves root cause)
Error rate has dropped from [X]% to [Y]% following the rollback.

Quantify improvement — builds confidence
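The [X]% and [Y]% figures are simply failed requests divided by total requests, measured once before the change and once after it. The following is a minimal sketch. It assumes you already have (errors, total) counts from two windows of equal length. The helper names are hypothetical.

```python
def error_rate(errors: int, total: int) -> float:
    """Failed requests as a percentage of all requests (0.0 if there was no traffic)."""
    return 0.0 if total == 0 else 100.0 * errors / total


def improvement_message(before: tuple[int, int], after: tuple[int, int], action: str = "rollback") -> str:
    """Build the status line from (errors, total) counts taken before and after the change."""
    x = error_rate(*before)
    y = error_rate(*after)
    return f"Error rate has dropped from {x:.1f}% to {y:.1f}% following the {action}."


# Hypothetical counts from two equal-length windows (e.g. 10 minutes each).
print(improvement_message((1_200, 10_000), (30, 10_000)))
```

Comparing windows of the same length, and using a rate instead of a raw error count, prevents a drop in traffic from looking like a fix.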
The issue appears to be contained — no new alerts in the past [X] minutes.

Cautious all-clear — don't declare resolved too early
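The "no new alerts in the past [X] minutes" claim can be checked mechanically before anyone says it. The following is a minimal sketch. It assumes you can get a list of recent alert timestamps from your monitoring system. `quiet_for` is a hypothetical helper. A quiet alert channel is necessary but not sufficient, so you should still look at the metrics themselves.

```python
from datetime import datetime, timedelta, timezone


def quiet_for(alert_times: list[datetime], now: datetime, minutes: int) -> bool:
    """True if no alert fired within the last `minutes` minutes.

    All datetimes must be timezone-aware. An empty list counts as quiet.
    """
    cutoff = now - timedelta(minutes=minutes)
    return all(t < cutoff for t in alert_times)


now = datetime(2024, 5, 1, 15, 0, tzinfo=timezone.utc)
alerts = [
    datetime(2024, 5, 1, 14, 20, tzinfo=timezone.utc),
    datetime(2024, 5, 1, 14, 42, tzinfo=timezone.utc),
]
print(quiet_for(alerts, now, 15))  # True: the last alert was 18 minutes ago
print(quiet_for(alerts, now, 30))  # False: 14:42 falls inside the 30-minute window
```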
We are rolling back [service / deployment] to [version]. ETA to complete: [X] minutes.

Rollback update with time estimate

Resolution & Closing

The issue has been resolved as of [HH:MM UTC].

Resolution statement — always include a timestamp
Normal service has been restored. All metrics are back within SLO.

Confirm the service is healthy, not just "fixed"
Root cause: [brief explanation]. Fix: [action taken].

Brief root-cause summary for the closing message
Impact: approximately [X] users / [Y] requests affected over [duration].

Quantify impact for the record
A post-mortem will be conducted this week. I'll share a draft by [date].

Commit to the post-mortem timeline immediately
Thank you to everyone who helped debug and respond. Great teamwork.

Close on a human note — blameless culture includes credit

Writing the Post-Mortem

Root cause: [specific technical cause] led to [failure].

Be precise — avoid "human error" as a root cause
Contributing factors: [factor A], [factor B], [factor C].

Multiple causes — incidents are usually a chain of failures
Timeline: [HH:MM] — [event]; [HH:MM] — [event]; …

Chronological timeline — basis for all analysis
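Notes taken during an incident often arrive out of order, for example from a chat channel, a deploy log, and the alerting system. The following is a minimal sketch that sorts events by their full timestamp and renders the timeline line. It assumes the events are stored as (timezone-aware datetime, description) pairs. `format_timeline` is a hypothetical helper, and the sketch uses a hyphen as the separator between the time and the event.

```python
from datetime import datetime, timezone


def format_timeline(events: list[tuple[datetime, str]]) -> str:
    """Sort events chronologically and render them as 'HH:MM - event; ...'."""
    ordered = sorted(events, key=lambda e: e[0])  # full datetime, so midnight rollovers sort correctly
    return "Timeline: " + "; ".join(f"{at:%H:%M} - {what}" for at, what in ordered)


events = [
    (datetime(2024, 5, 1, 14, 20, tzinfo=timezone.utc), "Rollback started"),
    (datetime(2024, 5, 1, 14, 2, tzinfo=timezone.utc), "Alert fired"),
    (datetime(2024, 5, 1, 14, 35, tzinfo=timezone.utc), "Error rate back to baseline"),
]
print(format_timeline(events))
```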
Impact: [X] users affected, [Y]% error rate for [Z] minutes.

Always quantify: users, error rate, duration
What went well: [monitoring alerted quickly / rollback was fast / team coordinated well].

Blameless means also recognizing what worked
Action items: [item] — owner: [Name] — due: [date]

Every action item needs an owner and a deadline
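If action items are tracked in code, the "owner and deadline" rule can be enforced when an item is created, so that an ownerless item is never recorded in the first place. The following is a minimal sketch using a dataclass. `ActionItem`, its fields, and the sample owner are hypothetical and are not tied to any ticketing system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ActionItem:
    task: str
    owner: str
    due: date  # required field: an item cannot be created without a deadline

    def __post_init__(self) -> None:
        # Reject items that would otherwise belong to nobody.
        if not self.task.strip() or not self.owner.strip():
            raise ValueError("every action item needs a task and an owner")

    def overdue(self, today: date) -> bool:
        return today > self.due

    def render(self) -> str:
        return f"{self.task} - owner: {self.owner} - due: {self.due.isoformat()}"


item = ActionItem("Add canary stage to deploy pipeline", "alice", date(2024, 5, 15))
print(item.render())
```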
To prevent recurrence, we will [specific system / process change].

Prevention is the goal — not punishment