Intermediate Numbers & Data #on-call #incident #rotation #handoff #pagerduty

On-Call Scheduling Language

Five exercises on describing on-call rotations and incident timelines in professional IT English.

On-call vocabulary essentials

Primary on-call: first responder — gets paged first
Secondary on-call: escalation path if primary doesn't respond
Handoff: transfer of on-call responsibility between engineers
Paged: received an alert via PagerDuty, Opsgenie, or a similar tool
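
The primary/secondary relationship defined above can be sketched as a small escalation rule. This is only an illustration: the `Page` class, the `current_responder` function, and the 5-minute acknowledgement window (borrowed from the wording of exercise 4) are hypothetical, and real paging tools let each team configure their own policy.

```python
from dataclasses import dataclass

# Hypothetical value: how long the primary has to acknowledge a page.
ACK_TIMEOUT_MINUTES = 5


@dataclass
class Page:
    minutes_since_sent: int  # time elapsed since the alert paged the primary
    acknowledged: bool       # True once someone has acknowledged the page


def current_responder(page: Page, timeout: int = ACK_TIMEOUT_MINUTES) -> str:
    """Return which on-call role currently holds the page."""
    # The page stays with the primary while it is acknowledged
    # or still inside the acknowledgement window.
    if page.acknowledged or page.minutes_since_sent < timeout:
        return "primary"
    # Otherwise it follows the escalation path to the secondary.
    return "secondary"


print(current_responder(Page(minutes_since_sent=3, acknowledged=False)))
print(current_responder(Page(minutes_since_sent=6, acknowledged=False)))
```

In an incident log this maps to phrasing such as "The page was not acknowledged within 5 minutes and escalated to the secondary on-call."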

1 / 5

You're the primary on-call engineer. An alert fires at 02:47 UTC. How do you describe this in the incident log?

2 / 5

You're handing off on-call responsibility to a colleague at the end of your shift. Which handoff message is most professional?

3 / 5

An incident post-mortem says: "MTTD: 14 minutes. MTTR: 2 hours 22 minutes." What do these metrics tell you?

MTTD and MTTR — distinct phases of incident duration

Incident lifecycle:

Issue starts (unknown to the team)
[MTTD: 14 minutes] Alert fires / issue detected
Team begins investigation and mitigation
[MTTR: 2h 22m from detection] Service restored

Total user-facing impact duration = MTTD + MTTR = 14 min + 2h 22m = 2h 36m
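
The addition above can be checked with Python's standard `datetime.timedelta`. The `describe` helper is a hypothetical name used only for formatting here.

```python
from datetime import timedelta

mttd = timedelta(minutes=14)           # issue start -> detection
mttr = timedelta(hours=2, minutes=22)  # detection -> service restored

total_impact = mttd + mttr


def describe(duration: timedelta) -> str:
    """Format a duration as 'Xh Ym' for a post-mortem summary."""
    total_minutes = int(duration.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m"


print(describe(total_impact))  # 2h 36m
```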

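Why MTTD matters separately: A high MTTD indicates monitoring or alerting gaps, because the issue existed for a long time before anyone detected it. Users are affected for the whole detection window, so when detection is slow, reducing MTTD can shorten total impact more than reducing MTTR.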

DORA thresholds for MTTR (time to restore service; exact boundaries vary between report years):

Tier	MTTR
Elite	Under 1 hour
High	1 hour to 1 day
Medium	1 day to 1 week

An MTTR of 2 hours 22 minutes falls in the High performer tier.
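
As a rough sketch, the table above can be turned into a lookup. The `mttr_tier` function is a hypothetical helper, and how the exact boundary values (exactly 1 hour, 1 day, or 1 week) are assigned is an assumption, since the table does not say which tier they belong to.

```python
from datetime import timedelta


def mttr_tier(mttr: timedelta) -> str:
    """Classify an MTTR value using the three tiers listed in the table."""
    if mttr < timedelta(hours=1):
        return "Elite"
    # Assumption: boundary values count toward the better tier.
    if mttr <= timedelta(days=1):
        return "High"
    if mttr <= timedelta(weeks=1):
        return "Medium"
    # The table lists no tier beyond one week.
    return "Outside the listed tiers"


print(mttr_tier(timedelta(hours=2, minutes=22)))  # High
```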

Vocabulary:

"MTTD was [X] — the alert fired [X] minutes after the issue started."
"MTTR was [Y] — from detection to service restoration."

4 / 5

The on-call rotation cycles every week. An engineer says: "I'm on primary on-call from Monday 08:00 UTC to the following Monday 08:00 UTC." A new team member asks: "What happens if you don't respond to a page within 5 minutes?" What is the correct answer?
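
The weekly shift window in this exercise can be computed rather than counted by hand. The start date below is a hypothetical example (1 January 2024 was a Monday); only the "Monday 08:00 UTC, one week long" pattern comes from the exercise.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical shift start: a Monday at 08:00 UTC.
shift_start = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)

# A weekly rotation hands off exactly seven days later.
shift_end = shift_start + timedelta(weeks=1)

fmt = "%A %H:%M UTC, %d %b %Y"
print(f"Primary on-call from {shift_start.strftime(fmt)}")
print(f"Handoff at           {shift_end.strftime(fmt)}")
```

Stating both ends of the window with the time zone, as the engineer in the exercise does, avoids ambiguity about who holds the pager at the handoff boundary.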

5 / 5

After resolving an incident, you write in the post-mortem: "Incident duration: 47 minutes (14:32 UTC – 15:19 UTC). Impact: ~12,000 users affected. SLA status: within budget (error budget consumed: 23%)." Which additional sentence best completes this summary?
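
The duration quoted in this summary can be verified from its two timestamps. A minimal sketch, assuming both times fall on the same UTC day; `minutes_between` is a hypothetical helper name.

```python
from datetime import datetime


def minutes_between(start_hhmm: str, end_hhmm: str) -> int:
    """Return whole minutes between two HH:MM times on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end_hhmm, fmt) - datetime.strptime(start_hhmm, fmt)
    return int(delta.total_seconds() // 60)


print(minutes_between("14:32", "15:19"))  # 47
```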