This set builds vocabulary for on-call escalation, alert routing, and incident notification practices.
0 / 5 completed
1 / 5
At standup, a dev mentions a defined sequence of people who get notified in order if the first responder doesn't acknowledge an alert within a set time. What is this called?
An escalation policy defines an ordered sequence of people or teams to notify if an alert isn't acknowledged within a set time, ensuring a critical issue doesn't go unaddressed just because the first responder is unavailable. This automated fallback removes single points of failure in incident response. It's a foundational concept in any on-call and incident management platform.
2 / 5
During a design review, the team wants alerts from multiple monitoring tools to be automatically grouped into one incident instead of paging separately for each. Which capability supports this?
Alert grouping, or deduplication, automatically combines related alerts from multiple monitoring sources into a single incident, preventing an on-call engineer from being paged repeatedly for what is really one underlying problem manifesting across several systems. This reduces alert fatigue and keeps the incident response focused on the root issue rather than a flood of related noise. It's a common feature in incident management platforms handling alerts from many integrated tools.
3 / 5
In a code review, a dev sets up a rule so alerts during business hours page a different team than alerts overnight. What does this represent?
Time-based routing directs alerts to different teams or individuals depending on the schedule, like business hours versus overnight, matching who's actually best positioned to respond at that moment. This scheduling awareness avoids waking up an unrelated team for an issue better handled by whoever is on duty. It's a common configuration in incident management tools supporting rotating on-call schedules.
4 / 5
An incident report shows a critical alert paged an engineer who was on approved leave, delaying response until someone else noticed. What practice would prevent this?
Keeping the on-call schedule, including temporary overrides for planned leave, accurately updated ensures an alert reaches someone who is actually available to respond rather than paging someone known to be unreachable. Assuming a static schedule will always reflect real availability is how this kind of delay happens. This upkeep is a basic operational responsibility of maintaining any on-call rotation.
5 / 5
During a PR review, a teammate asks why the team configures automated escalation policies instead of just paging one fixed person for every alert. What is the reasoning?
Relying on a single fixed contact for every alert creates a single point of failure if that person is unavailable, unreachable, or simply misses the notification, while an escalation policy automatically falls back to the next person in line. This redundancy is essential for genuinely critical alerts where a missed response has real consequences. The tradeoff is the added complexity of designing and maintaining a multi-step escalation chain.