5 exercises: choose the best-structured answer to common Incident Commander (IC) interview questions. Focus on command assumption, role assignment, the mitigated vs. resolved distinction, stakeholder update templates, and post-mortem close-out.
Structure for incident command questions
Command first, investigate second: structure before debugging
Assign roles explicitly: Scribe, Comms Lead, Ops Lead by name
Precise language: mitigated vs. resolved, declare severity formally
Fixed update cadence: template + next update time, no open-ended promises
1 / 5
At 2:47am, the on-call page fires. The system message reads: "Checkout service error rate 94%, latency p99 > 60s. Revenue impact estimated at $15K/min." You are the first senior engineer awake. The interviewer asks: "Walk me through your first 90 seconds as Incident Commander." Which answer best demonstrates IC discipline?
Option B is strongest. It demonstrates the key IC discipline: assuming command explicitly before investigating, immediately assigning roles (scribe, comms lead), declaring severity formally, and starting the stakeholder communication cadence. The explicit "do not start debugging yet" is a critical IC insight: an IC who dives into debugging loses the command role, and the incident becomes chaotic.
Option D is a good first technical hypothesis (correlate deployment time with incident start), but that is the Ops Lead's job, not the IC's. The IC coordinates; the subject matter experts investigate. Option C (page everyone) adds coordination overhead without command structure: a war room with 8 engineers and no coordinator is often slower than 2 engineers with clear roles. Option A is the most common beginner mistake: investigating before establishing command.
Key structure: assume IC explicitly → assign scribe + comms lead → declare severity → start stakeholder cadence → let ops lead investigate.
2 / 5
The interviewer asks: "It's 45 minutes into the incident. The Operations Lead says: 'I think the database is the issue — the primary is showing high lock wait times.' The Database engineer says: 'Our monitoring looks fine on our end, no unusual activity.' How do you handle this disagreement as IC?" Choose the most effective IC response.
Option B is strongest on multiple dimensions. It time-boxes the disagreement (preventing indefinite debate during a live incident), demands concrete evidence (screenshots) rather than assertions, identifies the root cause of the disagreement (the two engineers are looking at different data sources), and pivots to parallel investigation if the data remains contradictory. The framing "I don't resolve the technical debate, I resolve the coordination problem" is the essence of IC thinking.
Option D relies on assumed expertise, which can fail: the SME may not have the right monitoring, or their monitoring may lag. Option C is effectively "you two sort it out", which does nothing about the time pressure. Option A is directionally correct but doesn't specify the mechanism for resolving the disagreement: "I make the final call" is meaningless without a process for gathering the information needed to make that call.
Key structure: time-box → demand concrete comparable evidence → parallel investigation if unresolved → IC resolves the coordination, not the technical debate.
3 / 5
The interviewer asks: "When do you call an incident 'mitigated' vs. 'resolved'? Why does the distinction matter?" Choose the most precise answer.
Option B is the strongest. It defines both terms precisely (mitigation = impact stopped, root cause may persist; resolution = root cause addressed), provides a concrete example (rollback scenario), and explains why the distinction matters in three practical domains: monitoring duration, customer communication accuracy, and post-incident follow-up tracking.
Option C is partially correct but defines resolution incorrectly: completing the post-mortem is a subsequent activity, not the definition of resolution. Option D is a reasonable operational heuristic for declaring resolution but doesn't address the conceptual distinction between mitigation and resolution. Option A conflates the post-mortem with resolution.
The IC must be precise about this distinction because it affects the on-call roster (when to stand down), the status page (what to communicate), and the follow-up work (what action items remain).
Key tip: mitigated = impact stopped (root cause may persist → keep monitoring); resolved = root cause fixed (post-mortem action items track toward resolution, not mitigation).
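The mitigated vs. resolved distinction can be sketched as a small state check. This is only an illustration of the definitions above; the names `IncidentState`, `classify`, and `keep_monitoring` are hypothetical and not taken from any incident tooling.

```python
from enum import Enum


class IncidentState(Enum):
    ACTIVE = "active"        # customer impact is still ongoing
    MITIGATED = "mitigated"  # impact stopped, root cause may persist
    RESOLVED = "resolved"    # impact stopped and root cause addressed


def classify(impact_stopped: bool, root_cause_fixed: bool) -> IncidentState:
    """Map the two questions an IC asks onto a single incident state."""
    if not impact_stopped:
        return IncidentState.ACTIVE
    if not root_cause_fixed:
        return IncidentState.MITIGATED
    return IncidentState.RESOLVED


def keep_monitoring(state: IncidentState) -> bool:
    """Heightened monitoring continues until the incident is resolved."""
    return state is not IncidentState.RESOLVED


# Rollback scenario: the bad deploy was reverted, so impact has stopped,
# but the underlying defect has not been fixed yet.
after_rollback = classify(impact_stopped=True, root_cause_fixed=False)
print(after_rollback.value, keep_monitoring(after_rollback))
```

Keeping the two booleans separate makes it hard to announce "resolved" on the status page while the root cause is still open.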
4 / 5
The interviewer asks: "How do you structure a stakeholder update during a major incident? Your CTO and Head of Customer Success are both in the incident channel." Which answer best demonstrates professional communication under pressure?
Option B is the strongest. It specifies a template with five components (current status, customer impact, what we know, what we're doing, next update), identifies the critical "read channel vs. write channel" principle for executive presence during incidents, explains why executive questions slow resolution, and gives a full, realistic example update. The prohibition on time-to-resolution promises unless the timeline is known is a real-world discipline: premature ETA promises create a second crisis when the ETA is missed.
Option C is directionally correct (separating executives from the war room), but dedicating a full engineer to executive communication during a major incident is too expensive; the Comms Lead handles this. Option D is reasonable but doesn't use the structured update format, which matters because executives in different timezones who see the update 30 minutes late need to understand the current state immediately, without reading back through context. Option A is true but doesn't describe a specific structure or address the executive-in-channel problem.
Key structure: 5-part template → read channel for executives → Comms Lead handles responses → no ETAs without high confidence.
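As a rough sketch of the five-part template, the update can be modelled as a record with one field per component and a fixed rendering order. The class name, field names, the 30-minute default cadence, and all example values below are assumptions made for illustration; they are not the wording of the exercise's example update.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StakeholderUpdate:
    """One incident update; the five fields mirror the five-part template."""
    current_status: str
    customer_impact: str
    what_we_know: str
    what_we_are_doing: str
    next_update_at: datetime

    def render(self) -> str:
        # Fixed field order so a late reader can scan every update the same way.
        return "\n".join([
            f"STATUS: {self.current_status}",
            f"CUSTOMER IMPACT: {self.customer_impact}",
            f"WHAT WE KNOW: {self.what_we_know}",
            f"WHAT WE ARE DOING: {self.what_we_are_doing}",
            f"NEXT UPDATE: {self.next_update_at:%H:%M}",
        ])


def next_update_time(sent_at: datetime, cadence_minutes: int = 30) -> datetime:
    # The 30-minute default is an assumption, not a value from the exercise.
    return sent_at + timedelta(minutes=cadence_minutes)


sent_at = datetime(2024, 1, 1, 2, 50)  # hypothetical timestamp
update = StakeholderUpdate(
    current_status="Investigating (hypothetical example)",
    customer_impact="Checkout requests are failing for most users",
    what_we_know="Error rate and latency spiked on the checkout service",
    what_we_are_doing="Ops Lead is comparing recent deployments to the incident start",
    next_update_at=next_update_time(sent_at),
)
print(update.render())
```

The record deliberately has no ETA field: the only time it commits to is the next update, which matches the "no ETAs without high confidence" rule.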
5 / 5
The interviewer asks: "After 3 hours, the incident is mitigated. How do you close out the incident and hand off to the post-mortem process?" Choose the most complete answer.
Option B is the strongest. It covers stand-down (with the distinction between mitigated and resolved), timeline documentation timing (while memory is fresh; the timeline is impossible to reconstruct accurately 48 hours later), stakeholder final communication, and post-mortem kickoff with two specific best practices: a neutral facilitator (not the IC) and action item ownership confirmed at close-out, not at the post-mortem meeting. The "action items rot in the backlog without named ownership" observation is a real failure mode.
Option D makes two errors: calling the incident "resolved" when it is only mitigated (a timeline error), and scheduling the post-mortem for "when the team is rested". A 48–72 hour maximum is the standard because memory degrades. Option C archives the channel (destroying evidence) and delays the post-mortem to "next week." Option A is the minimum viable answer: accurate, but lacking the operational depth expected of an IC.
Key structure: formal stand-down → timeline documented immediately → stakeholder all-clear → post-mortem with neutral facilitator + named action item owners before sleep.
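Two of the close-out checks lend themselves to a small sketch: confirming that every action item has a named owner, and computing the latest acceptable post-mortem time from the 72-hour upper bound mentioned above. The names `ActionItem`, `unowned_items`, and `postmortem_deadline`, and the example items and timestamps, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ActionItem:
    description: str
    owner: Optional[str] = None  # a named person, not a team alias


def unowned_items(items: List[ActionItem]) -> List[str]:
    """Return descriptions of action items that still lack a named owner."""
    return [item.description for item in items if not item.owner]


def postmortem_deadline(mitigated_at: datetime, max_hours: int = 72) -> datetime:
    """Latest post-mortem time, using the 72-hour upper bound as the default."""
    return mitigated_at + timedelta(hours=max_hours)


items = [
    ActionItem("Fix root cause behind the rolled-back change", owner="Ops Lead"),
    ActionItem("Add alert on lock wait time"),  # no owner yet
]
mitigated_at = datetime(2024, 1, 1, 5, 47)  # hypothetical timestamp

missing = unowned_items(items)
if missing:
    print("Do not close out yet; unowned action items:", missing)
print("Hold the post-mortem by:", postmortem_deadline(mitigated_at))
```

Running the ownership check at close-out, rather than at the post-mortem meeting, is what keeps items from sitting unowned in the backlog.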