Declaring and Coordinating

  • We're seeing elevated error rates on [service] — declaring a P[1/2] incident.
    Clear declaration with severity level
    "We're seeing elevated error rates on the payment service — declaring a P1 incident."
  • I'm taking incident command — [name], can you take notes?
    Assigning the IC and scribe roles
    "I'm taking incident command — Sarah, can you take notes in the incident doc?"
  • What's the blast radius?
    Asking how many users or systems are affected
    "What's the blast radius right now — are all users affected or just a subset?"
  • Don't make changes without announcing in the incident channel first.
    Preventing uncoordinated changes during an incident
    "Important: don't make changes without announcing in #incidents first — we need to track what's been tried."

Status Updates and Resolution

  • [Time] update: [impact]. We're investigating [hypothesis].
    Timed status update template
    "14:32 update: ~20% of checkout requests failing. We're investigating possible DB connection pool exhaustion."
  • We've identified the root cause: [cause].
    Root cause announcement
    "We've identified the root cause: a bad deploy at 14:15 introduced a null pointer dereference in the order service."
  • Rolling back to the previous version — ETA 5 minutes.
    Announcing a rollback with timeline
    "Rolling back to the previous version — ETA 5 minutes, monitoring for recovery."
  • We're monitoring — error rates are dropping. Will confirm resolution shortly.
    Transitioning from mitigation to resolution
    "We're monitoring — error rates are dropping from 18% to 4%. Will confirm full resolution shortly."
  • Incident resolved at [time]. A postmortem will follow within 48 hours.
    All-clear announcement with postmortem commitment
    "Incident resolved at 15:07. A blameless postmortem will follow within 48 hours."

Phrases to Avoid

These common phrasings undermine your professionalism. Here are better alternatives.

Avoid "I think maybe it could be the database."
Better "Current hypothesis: DB connection pool exhaustion — [metric] supports this."

Vague guesses in an incident waste time. State a hypothesis with supporting evidence.

Avoid "Whose fault is this?"
Better "What changed in the last hour that could explain this?"

Blame-seeking during an incident delays resolution and damages team culture. Focus on the timeline of changes.

Avoid "It should be fine now."
Better "Monitoring metrics — will confirm resolution in 10 minutes when we see sustained recovery."

"Should be fine" sets false expectations. Commit to a monitoring window before declaring resolution.

Frequently Asked Questions

What is a P1 vs P2 incident?

Priority levels indicate how severe an incident is and how urgently it needs a response. P1 (Priority 1) typically means a critical outage affecting all or many users. P2 typically means a significant degradation with partial impact or an available workaround. Teams define the exact thresholds in their incident runbooks.

What is an incident commander?

The incident commander (IC) is the person coordinating the response — assigning roles, making decisions, and managing communication. They focus on coordination, not necessarily on fixing the issue themselves.

What is a blameless postmortem?

A blameless postmortem analyses what went wrong without assigning personal blame. It focuses on system and process failures, producing action items to prevent recurrence.