Advanced Log Reading

Incident Log Analysis

Five exercises: reconstruct incident timelines, distinguish root cause from proximate cause, recognise cascading failures, write blameless post-mortem timelines, and compose effective incident status updates.

Exercise 1 of 5
An incident is declared at 03:47 UTC, and you are the on-call engineer. A log search covering 03:30 to 03:47 UTC returns the five entries below. Use them to reconstruct the incident timeline and identify what went wrong first:

[03:30] INFO order-service: processed 1,240 orders/min (normal baseline)
[03:41] WARN order-service: response time p99=2.1s (threshold: 1.0s)
[03:43] WARN db-primary: connections=485/500 (97% utilised)
[03:44] ERROR order-service: query timeout after 30s on SELECT * FROM orders WHERE status='pending'
[03:47] ERROR order-service: health check failed — database unreachable

What is the correct reconstruction of the incident cause?
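A first mechanical pass over the entries is to parse them, sort them by timestamp and list the non-INFO events in order. The sketch below does that for the five entries above. The line format regex, the `LogEntry` fields and the helper names (`parse`, `timeline`, `anomalies`) are assumptions made for illustration, not part of any real tool, and timestamps are treated as minutes since midnight, so incidents that cross midnight are not handled. The earliest non-INFO entry it reports is only the first *observed* symptom; deciding whether that symptom is the root cause or a proximate cause is still the reasoning this exercise asks for.

```python
import re
from dataclasses import dataclass

# Assumed format, inferred from the sample entries: "[HH:MM] LEVEL service: message"
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2})\] (INFO|WARN|ERROR) ([\w-]+): (.*)$")


@dataclass
class LogEntry:
    minute: int   # minutes since 00:00 UTC (no date, so no midnight rollover)
    level: str
    service: str
    message: str


def parse(line: str) -> LogEntry:
    """Parse one log line; raise ValueError if it does not match the assumed format."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    hh, mm, level, service, message = m.groups()
    return LogEntry(int(hh) * 60 + int(mm), level, service, message)


def timeline(lines: list[str]) -> list[LogEntry]:
    """Return parsed entries oldest first; sorted() is stable for equal timestamps."""
    return sorted((parse(line) for line in lines), key=lambda e: e.minute)


def anomalies(entries: list[LogEntry]) -> list[LogEntry]:
    """Keep only WARN/ERROR entries: the observed symptoms, in time order."""
    return [e for e in entries if e.level != "INFO"]


# The exercise's entries, deliberately shuffled to show the sort recovers the order.
SAMPLE_LOGS = [
    "[03:47] ERROR order-service: health check failed — database unreachable",
    "[03:43] WARN db-primary: connections=485/500 (97% utilised)",
    "[03:30] INFO order-service: processed 1,240 orders/min (normal baseline)",
    "[03:44] ERROR order-service: query timeout after 30s on SELECT * FROM orders WHERE status='pending'",
    "[03:41] WARN order-service: response time p99=2.1s (threshold: 1.0s)",
]

if __name__ == "__main__":
    for e in anomalies(timeline(SAMPLE_LOGS)):
        print(f"{e.minute // 60:02d}:{e.minute % 60:02d} {e.level:5} {e.service}: {e.message}")
```

Sorting alone only gives the order in which symptoms were *logged*. A component that degrades first may log its warning later than the services it affects, so the timeline is an input to the root-cause analysis, not a substitute for it.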