4 exercises: write clear, structured incident announcements, status updates, escalations, and resolution messages when production is down.
Exercise 1 of 4
Production payment service is returning 500 errors for 30% of users since 14:32 UTC. You are the Incident Commander. Which Slack message best declares the incident?
Option B follows a widely used incident announcement structure. It contains every element an initial announcement needs:
• [P1 INCIDENT]: severity and incident label, visible at a glance
• 🔴: visual severity indicator
• What is broken: payment service, 500 errors
• Scope: 30% of users since 14:32 UTC
• Business impact: ~6,000 transactions/hr
• Incident Commander: a clear owner (@alex)
• Bridge link: where to join the response call
• Status: "Investigating now"
By comparison, Option A is too casual and has no structure, and Options C and D omit the scope, the Incident Commander, and the bridge link. The initial announcement sets the tone for the whole response: a well-structured one lets engineers get up to speed immediately, and during an incident seconds matter.
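The elements listed above can be captured as a small template. The sketch below is a hypothetical Python helper, not the text of Option B: the `IncidentAnnouncement` class, its field names, the line layout, and the bridge URL are all assumptions for illustration. `missing_fields` flags exactly the kind of gaps that weaken Options C and D (no scope, no Incident Commander, no bridge link).

```python
from dataclasses import dataclass, fields


@dataclass
class IncidentAnnouncement:
    """Hypothetical container for the elements of an initial incident announcement."""
    severity: str         # e.g. "P1"
    what_is_broken: str   # failing component and symptom
    scope: str            # who is affected and since when
    business_impact: str  # impact in business terms
    commander: str        # Slack handle of the Incident Commander
    bridge_url: str       # where responders join the call
    status: str           # current phase, e.g. "Investigating now"


def missing_fields(a: IncidentAnnouncement) -> list:
    """Return the names of any fields that are empty or whitespace-only."""
    return [f.name for f in fields(a) if not getattr(a, f.name).strip()]


def format_announcement(a: IncidentAnnouncement) -> str:
    """Render the announcement as a multi-line Slack message (assumed layout)."""
    gaps = missing_fields(a)
    if gaps:
        raise ValueError(f"announcement is missing: {', '.join(gaps)}")
    # Label and red marker come first so the message is recognisable at a glance.
    return "\n".join([
        f"[{a.severity} INCIDENT] 🔴 {a.what_is_broken}",
        f"Scope: {a.scope}",
        f"Business impact: {a.business_impact}",
        f"IC: {a.commander}",
        f"Bridge: {a.bridge_url}",
        f"Status: {a.status}",
    ])


example = IncidentAnnouncement(
    severity="P1",
    what_is_broken="Payment service returning 500 errors",
    scope="30% of users since 14:32 UTC",
    business_impact="~6,000 transactions/hr",
    commander="@alex",
    bridge_url="https://meet.example.com/inc-payments",  # hypothetical link
    status="Investigating now",
)
print(format_announcement(example))
```

Raising on a missing field is a deliberate choice in this sketch: an incomplete announcement fails loudly before it is posted, instead of relying on the Incident Commander to spot the gap mid-incident.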
Exercise 2 of 4
You're posting a status update 15 minutes into the incident. Complete the update: "14:47 UTC — Root cause identified: DB connection pool exhausted after config change deployed at 14:28. _____ Estimated recovery: 10–15 minutes."
"Rolling back the config change now." is the ideal update because it states the action in progress with the active present tense ("Rolling back"), which signals that work is actively happening right now.
Incident updates are easiest to follow when each one uses the same order: timestamp → status → root cause → action being taken → ETA. Every update should move the story forward. "We are trying to fix it" is vague (fix what, and how?), and "Someone is working on this" says neither who is working nor what they are doing.
Key language patterns for incident updates:
• "Rolled back [X]": past tense, action completed
• "Rolling back [X] now": in progress
• "Deploying hotfix for [X]": in progress
• "Monitoring recovery metrics": observation phase
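The update order described above (timestamp → status → root cause → action → ETA) maps directly onto a formatting function. This is a minimal sketch: `format_update` is a hypothetical helper, and it uses a plain hyphen after the timestamp where the exercise's sample message uses a dash. It refuses to build an update with an empty field, but it cannot judge whether the action text is specific enough; that part is still up to the writer.

```python
def format_update(timestamp_utc: str, status: str, root_cause: str,
                  action: str, eta: str) -> str:
    """Build a one-line status update: timestamp, status, cause, action, ETA."""
    parts = {
        "timestamp": timestamp_utc,
        "status": status,
        "root cause": root_cause,
        "action": action,
        "ETA": eta,
    }
    # Reject empty fields: every update should say what is being done and by when.
    missing = [name for name, value in parts.items() if not value.strip()]
    if missing:
        raise ValueError(f"update is missing: {', '.join(missing)}")
    # Strip trailing periods so sentences are joined with exactly one.
    cause = root_cause.strip().rstrip(".")
    act = action.strip().rstrip(".")
    return (f"{timestamp_utc} UTC - {status}: {cause}. "
            f"{act}. Estimated recovery: {eta}.")


print(format_update(
    "14:47",
    "Root cause identified",
    "DB connection pool exhausted after config change deployed at 14:28",
    "Rolling back the config change now",
    "10-15 minutes",
))
```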
Exercise 3 of 4
The incident is resolved. Which closing message is best?
Option C is the model resolution message. It contains:
• [RESOLVED] 🟢: a clear status change that people scanning the channel see instantly
• Timestamp: when the incident was resolved (needed for the timeline and SLA calculations)
• What was restored: the payment service
• How it was fixed: the config change was rolled back
• Confirmation: P99 latency is back to baseline (evidence that the recovery is real)
• Next step: post-mortem scheduled
The resolution message is the final formal communication about the incident, and post-mortems, SLA calculations, and customer communications typically reference it. Options A, B, and D might be adequate for a minor fix, but not for a production P1 incident.
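Because the resolution timestamp feeds the timeline and SLA calculations, it helps to generate the message and the incident duration from the same data. A minimal sketch follows: `format_resolution` and `incident_duration_minutes` are hypothetical helpers, the 15:01 UTC resolution time is an invented example value (the exercise does not give one), and the duration helper assumes the incident starts and ends on the same UTC day.

```python
from datetime import datetime


def incident_duration_minutes(start_utc: str, end_utc: str) -> int:
    """Whole minutes between two same-day "HH:MM" UTC times."""
    fmt = "%H:%M"
    delta = datetime.strptime(end_utc, fmt) - datetime.strptime(start_utc, fmt)
    if delta.total_seconds() < 0:
        # Same-day assumption: an incident crossing midnight needs full dates.
        raise ValueError("end time is before start time (crosses midnight?)")
    return int(delta.total_seconds()) // 60


def format_resolution(resolved_utc: str, restored: str, fix: str,
                      confirmation: str, next_step: str) -> str:
    """Render a resolution message with the elements listed above (assumed layout)."""
    return "\n".join([
        f"[RESOLVED] 🟢 {resolved_utc} UTC: {restored} restored",
        f"Fix: {fix}",
        f"Confirmation: {confirmation}",
        f"Next step: {next_step}",
    ])


print(format_resolution(
    "15:01",  # hypothetical resolution time
    "Payment service",
    "rolled back the config change deployed at 14:28",
    "P99 latency back to baseline",
    "post-mortem scheduled",
))
print(incident_duration_minutes("14:32", "15:01"))  # 29
```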
Exercise 4 of 4
During an incident bridge call, the database issue requires a specialist. How should you escalate?
Option C demonstrates effective escalation communication. Key elements:
• "Escalating to DB team" — names the escalation clearly for the channel log • Direct mention — @maria (specific person, not vague) • Context included — what the problem is and what was already tried • Bridge link — the expert knows exactly where to go
In incident response, vague escalations ("we need a DBA") cost precious minutes while people work out who should respond, where to join, and what they are walking into. A good escalation message lets the expert grasp the situation within seconds and join the call ready to contribute.
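The escalation elements described above can also be enforced in code. The sketch below uses a hypothetical `format_escalation` helper; the team name and @maria come from the exercise, while the problem description, the list of steps already tried, and the bridge URL are invented placeholders. It rejects a target that is not a direct @-mention and an empty "already tried" list, the two gaps that turn an escalation into "we need a DBA".

```python
def format_escalation(team: str, expert: str, problem: str,
                      already_tried: list, bridge_url: str) -> str:
    """Build an escalation message naming the team, the person, context, and bridge."""
    # A specific person, not "someone from DB" or "we need a DBA".
    if not expert.startswith("@"):
        raise ValueError("escalate to a specific person via an @-mention")
    # Context: the expert should not have to repeat work already done.
    if not already_tried:
        raise ValueError("list what has already been tried")
    tried = "; ".join(already_tried)
    return (f"Escalating to {team}: {expert}, we need you on the bridge. "
            f"Problem: {problem}. Already tried: {tried}. "
            f"Join: {bridge_url}")


print(format_escalation(
    "DB team",
    "@maria",
    "DB connection pool exhausted on the payments database",  # placeholder context
    ["checked pool metrics", "restarted one app instance"],   # hypothetical steps
    "https://meet.example.com/inc-payments",                  # hypothetical link
))
```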