Reading a Postmortem Timeline — Comprehension Exercises

📄 PASSAGE — Read carefully before answering

Incident Timeline — INC-2026-0314 (All times UTC)

09:00 — Scheduled certificate renewal job runs; completes successfully for 47 public-facing endpoints. queue-svc is not in scope.
09:17 — Synthetic monitoring detects elevated 503 error rate on /api/deploy endpoint. Alert fires automatically.
09:31 — On-call engineer (OCE) acknowledges page and begins investigation.
09:44 — OCE rules out database as cause; escalates to platform team lead.
09:51 — Platform team lead joins incident channel.
10:04 — Root cause identified: certificate expiry on queue-svc confirmed via logs.
10:19 — Certificate manually renewed; queue-svc restarted. Some traffic resumes.
[No entries 10:22 – 11:34]
11:34 — Remaining errors traced to CDN edge nodes caching stale 503 responses.
11:47 — Cache purge initiated across all CDN nodes.
11:52 — Error rate returns to baseline. Incident resolved.

Notes: "Mitigated" (10:19) refers to the point at which the root cause was addressed and recovery began. "Resolved" (11:52) refers to full restoration of normal service. The gap between escalation (09:44) and root cause identification (10:04) represents active investigation time with no discrete loggable events.
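
The intervals implied by the Notes can be checked with a short script. This is a minimal sketch in Python, not part of the incident record: it uses only the four times the Notes state explicitly plus the two ends of the bracketed no-entries window, and it assumes all of them fall on the same UTC day. The helper name `minutes_between` and the dictionary keys are hypothetical labels introduced for illustration.

```python
from datetime import datetime

# Times (UTC) quoted in the Notes and in the bracketed no-entries line.
# The keys are illustrative labels, not terms taken from the incident record.
TIMES = {
    "escalated": "09:44",
    "root_cause": "10:04",
    "mitigated": "10:19",
    "gap_start": "10:22",
    "gap_end": "11:34",
    "resolved": "11:52",
}


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes from start to end, both HH:MM on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


investigation = minutes_between(TIMES["escalated"], TIMES["root_cause"])
mitigated_to_resolved = minutes_between(TIMES["mitigated"], TIMES["resolved"])
unlogged_window = minutes_between(TIMES["gap_start"], TIMES["gap_end"])

print(f"Escalation to root cause: {investigation} min")
print(f"Mitigated to resolved: {mitigated_to_resolved} min")
print(f"Window with no entries: {unlogged_window} min")
```

Under these assumptions the script reports 20, 93, and 72 minutes. Computing the intervals this way makes it easier to compare what the Notes say about each period with what the timeline actually records for it.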

Question 1 of 4