Reading a Postmortem Timeline — Comprehension Exercises
Read the incident timeline below, then answer comprehension questions about time-to-detect, the distinction between mitigation and resolution, and how to interpret timeline gaps.
📄 PASSAGE — Read carefully before answering
Incident Timeline — INC-2026-0314 (All times UTC)
09:00 — Scheduled certificate renewal job runs; completes successfully for 47 public-facing endpoints. queue-svc is not in scope.
09:17 — Synthetic monitoring detects elevated 503 error rate on /api/deploy endpoint. Alert fires automatically.
09:31 — On-call engineer (OCE) acknowledges page and begins investigation.
09:44 — OCE rules out database as cause; escalates to platform team lead.
09:51 — Platform team lead joins incident channel.
10:04 — Root cause identified: certificate expiry on queue-svc confirmed via logs.
10:19 — Certificate manually renewed; queue-svc restarted. Some traffic resumes.
[No entries 10:22 – 11:34]
11:34 — Remaining errors traced to CDN edge nodes caching stale 503 responses.
11:47 — Cache purge initiated across all CDN nodes.
11:52 — Error rate returns to baseline. Incident resolved.
Notes: "Mitigated" (10:19) refers to the point at which the root cause was addressed and recovery began. "Resolved" (11:52) refers to full restoration of normal service. The gap between escalation (09:44) and root cause identification (10:04) represents active investigation time with no discrete loggable events.
Question 1 of 4