Learn the vocabulary of on-call escalation and alert routing during an incident.
0 / 5 completed
1 / 5
At standup, an engineer mentions an alert that automatically notifies the next person on a defined on-call schedule if the first responder doesn't acknowledge it within a set time window. What is this capability called?
An escalation policy automatically routes an unacknowledged alert to the next person on a defined on-call schedule after a set time window passes, rather than sending a single notification and leaving the alert unaddressed if the first responder doesn't see it in time. This ensures a critical alert doesn't go unnoticed simply because one specific person happened to be unavailable. It's a foundational capability of any incident alerting system built around a rotating on-call schedule.
2 / 5
During a design review, the team wants a burst of many related alerts from the same underlying incident to be grouped into a single notification, rather than paging the on-call engineer dozens of times in a row. Which capability supports this?
Alert deduplication and grouping combines a burst of many related alerts stemming from the same underlying incident into a single consolidated notification, rather than repeatedly paging the on-call engineer for what is effectively the same problem. This prevents alert fatigue, where a responder becomes desensitized to notifications because there are simply too many of them. Reducing this kind of noise is essential for keeping an on-call engineer actually responsive to genuinely distinct, important alerts.
3 / 5
In a code review, a dev notices an alert's routing rule sends it to a different specific team depending on which service or component the underlying alert originated from. What does this represent?
Service-based alert routing directs an incoming alert to the specific team responsible for the service or component it actually originated from, rather than sending every alert to one single team regardless of relevance. This ensures the alert reaches whoever has the most context and ability to actually resolve the underlying issue quickly. It scales far better than a single team manually triaging and forwarding every alert across the entire organization by hand.
4 / 5
An incident report shows a critical alert was silently suppressed because an overly broad deduplication rule incorrectly grouped it together with an unrelated, lower-priority alert. What practice would prevent this?
Carefully scoping a deduplication rule to only group alerts that are actually related, and periodically reviewing that rule's real-world behavior, catches an overly broad configuration before it silently suppresses a genuinely critical, distinct alert. Assuming a deduplication rule can never over-suppress ignores a real and documented failure mode of poorly scoped grouping logic. This periodic review is an important safeguard for any alerting system that relies on deduplication to manage alert volume.
5 / 5
During a PR review, a teammate asks why the team configures an escalation policy instead of relying on a single on-call engineer to always be available and responsive to every alert. What is the reasoning?
Depending entirely on a single on-call engineer to always be available and responsive creates a single point of failure, since anyone can be temporarily unreachable for a legitimate reason. An escalation policy automatically routes the alert onward if the first responder doesn't acknowledge it in time, ensuring the alert still reaches someone. The tradeoff is the need to carefully configure and periodically review the escalation timing and grouping rules, so they route effectively without either over-paging or over-suppressing alerts.