Incident Timeline Writing
Writing precise, UTC-timestamped, blameless incident timeline entries
Timeline entry format
- HH:MM UTC — always use UTC, never local time or "around"
- Actor — who or what: on-call engineer, deployment pipeline, customer, monitoring alert
- Action — specific verb: received, deployed, rolled back, escalated, resolved
- Evidence — metric, ticket number, log entry, error code
- Blameless — describe the technical event, not the person's fault
Question 0 of 5
Which incident timeline entry is written correctly?
Timestamp + UTC + actor + specific action + metric is the correct format. Timeline entry components:
- Timestamp: exact time in UTC (not "around", not local time)
- Actor: who or what triggered the event — "On-call engineer", "Deployment pipeline", "Customer report"
- Action: specific verb + object — "received PagerDuty alert", not "got paged"
- Evidence/metric: the specific signal — "p99 latency exceeded 2s threshold"
A post-mortem timeline should be written in which verb tense?
Past simple — events that happened at specific points in time. Why past simple in incident timelines:
- Timeline entries record completed events: "alert fired", "engineer was paged", "deployment was rolled back"
- Past continuous is used for ongoing states between events: "The service was serving degraded responses from 14:32 to 15:10 UTC"
- Present tense is wrong — the incident is over; the timeline is a historical record
You need to add a timeline entry for when the incident was detected by a customer. Which entry is best?
Specific actor (account ID), action (submitted support ticket), and symptom (504 errors) is correct. What customer-reported incidents need in timelines:
- Identifier: ticket number, account ID, or customer segment — allows traceability
- Symptom as reported: "intermittent 504 errors" — preserves the external perspective
- Detection gap: "15:03 customer-reported" vs "14:32 alert" = 31-minute detection gap — this matters for MTTD (Mean Time to Detect)
Which phrase is NOT appropriate in a blameless post-mortem timeline?
"Alex Chen deployed the broken configuration that caused the outage" assigns blame to an individual. Blameless language in timelines:
- ❌ "Alex deployed the broken config" — blames a person
- ✅ "A misconfigured load balancer timeout was deployed at 14:15 UTC" — describes the technical event
- ❌ "John forgot to run the health check" — blames oversight
- ✅ "The health check step was not included in the deployment checklist" — identifies a process gap
In a post-mortem timeline, which time-related language is most precise?
Exact timestamps with calculated duration is most precise. Why calculated durations matter:
- "3 minutes after deployment" — calculated from timestamps, shows causation clearly
- Stakeholders reading the post-mortem need concrete numbers for SLA reporting and process improvement
- "Shortly" and "a few minutes" are subjective and untraceable
- MTTD (Mean Time to Detect) = alert time − incident start time
- MTTR (Mean Time to Resolve) = resolution time − incident start time
- MTTF (Mean Time to Fail) = time between incidents