Lessons-Learned Post Writing
Blameless framing, deepening lessons, timeline entries, tense for improvements, and findings vs. lessons
Lessons-learned writing essentials
- Blameless: "a configuration change introduced an error" not "an engineer made a mistake"
- Lesson depth: "we underestimated X" must include why + specific fix — vague acknowledgement teaches nothing
- Timeline format: timestamp UTC + specific system/version + observable state + no blame
- Tense accuracy: future for planned ("we will add"), present perfect + future for done ("we have added and will monitor")
- Findings → Lessons: finding + root cause of the gap + specific fix + scope of change
Which opening for a lessons-learned post is most effective at setting a blameless tone?
Blameless: describe what the system did, not what a person did, then state what will be learned from it. Blameless language principles:
- ❌ "an engineer made a configuration mistake" — names a person as the failure point; discourages others from reporting mistakes
- ✅ "a configuration change introduced an error" — describes the action and its effect; the system (process, tooling, review) failed to prevent it
- Structure signals blamelessness: "what happened, why our systems didn't catch it sooner, what we've changed" — the focus is systemic, not personal
- Engineers who fear blame hide problems instead of reporting them
- Most incidents are caused by systems/processes, not individual error — focusing on people misses the root cause
- The "blameless postmortem" culture practiced at companies such as Google, Netflix, and Etsy is now widely adopted because it produces better learning outcomes
A lessons-learned post contains: "We underestimated the complexity of the migration." How should this be strengthened?
"We underestimated X" must be followed by: why we underestimated it + what changes prevent us underestimating it again. Lessons-learned depth formula:
- ❌ "We underestimated complexity" — acknowledges failure but teaches nothing
- ✅ Specific cause: "our load testing used synthetic data that didn't match production query patterns" — tells readers exactly what the gap was
- ✅ Specific fix: "going forward, all load tests will use anonymised production snapshots" — concrete, verifiable action
- ❌ "We should have tested more thoroughly" — tells no one how to test more thoroughly
- ✅ "We should have load-tested with production query patterns at 3× expected peak, using the shadow traffic replay tool" — actionable
Which timeline entry in a lessons-learned post is written most effectively?
Timestamp UTC + specific system/version + observable state + no blame. Timeline entry formula:
- Timestamp in UTC: "14:35 UTC" — unambiguous across time zones
- Specific system + version: "auth-service v2.3.1" — makes the timeline reproducible for anyone tracing in logs
- Observable state: "No errors in the first 3 minutes" — what monitoring showed at that moment
- No blame language: no named person as the subject ("The deployment began", not "John started the deployment")
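As a rough illustration of the timeline entry formula above, the following Python sketch renders a single entry. The `TimelineEntry` class, its field names, and the 2024-01-15 date are hypothetical and chosen only for this example; the service name, version, and observation are taken from the bullets above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TimelineEntry:
    """One timeline line: UTC time + system/version + event + observed state."""

    when: datetime   # must be timezone-aware
    system: str      # e.g. "auth-service"
    version: str     # e.g. "v2.3.1"
    event: str       # what happened, with no person as the subject
    observed: str    # what monitoring showed at that moment

    def __post_init__(self) -> None:
        # A naive datetime has no offset, so it cannot be converted to UTC safely.
        if self.when.utcoffset() is None:
            raise ValueError("when must be timezone-aware")

    def render(self) -> str:
        # Convert to UTC so readers in any time zone see the same clock time.
        utc = self.when.astimezone(timezone.utc)
        return (
            f"{utc:%H:%M} UTC: {self.system} {self.version} {self.event}. "
            f"{self.observed}."
        )


# Hypothetical entry; the date is invented for illustration.
entry = TimelineEntry(
    when=datetime(2024, 1, 15, 14, 35, tzinfo=timezone.utc),
    system="auth-service",
    version="v2.3.1",
    event="deployment began",
    observed="No errors in the first 3 minutes",
)
print(entry.render())
# 14:35 UTC: auth-service v2.3.1 deployment began. No errors in the first 3 minutes.
```

Keeping the event as a free-text field with the system as the grammatical subject is one way to make blame-free phrasing the default; it does not enforce it.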
What is the correct tense to use when describing future improvements in a lessons-learned post?
Future simple for planned items; present perfect + future for completed items — be honest about what is done vs. planned. Improvement tense guide:
- Not yet implemented: "We will add X by [date]" — clear commitment with timeline
- Implemented, effectiveness TBD: "We have added X and will monitor effectiveness over the next two sprints"
- Implemented and verified: "We added X on [date] and have seen no recurrence in N subsequent deployments"
- Readers (customers, executives, other teams) use the lessons-learned post to assess risk — overstating progress is a trust violation
- "We added monitoring" implies it's done; if it's actually "on the backlog", the reader is misled
- Conditional phrasing ("we would add... if...") reads as deflection; avoid it unless explaining a genuine constraint
A team writes in their lessons-learned post: "We discovered that our monitoring was insufficient." This is a finding, not a lesson. How should it be rewritten?
Finding → Root cause of the gap → Specific fix implemented → Scope of change. Finding-to-lesson transformation:
- Finding: "monitoring was insufficient" — describes the gap
- Root cause of the gap: "did not track per-region error rates" — explains specifically what was missing
- Specific fix: "per-region dashboards in Grafana + PagerDuty alerts at 5% per region" — verifiable, named tools
- Why it was missed before: "we only alerted on global error rate" — explains how the gap was created
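To see why a global-only alert can hide a regional problem, here is a minimal Python sketch of the per-region check described above. It is not Grafana or PagerDuty configuration; the function names, region names, and traffic numbers are all hypothetical, and only the 5% threshold comes from the example.

```python
from typing import Dict, List, Tuple

THRESHOLD = 0.05  # 5% error rate, the per-region alert level from the example

# Maps region name -> (error count, request count).
Counts = Dict[str, Tuple[int, int]]


def error_rate(errors: int, requests: int) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def global_rate(counts: Counts) -> float:
    """Error rate across all regions combined (the only signal alerted on before)."""
    total_errors = sum(errors for errors, _ in counts.values())
    total_requests = sum(requests for _, requests in counts.values())
    return error_rate(total_errors, total_requests)


def regions_over_threshold(counts: Counts, threshold: float = THRESHOLD) -> List[str]:
    """Regions whose own error rate is at or above the threshold."""
    return sorted(
        region
        for region, (errors, requests) in counts.items()
        if error_rate(errors, requests) >= threshold
    )


# Hypothetical traffic: one small region is failing badly.
counts: Counts = {
    "us-east": (50, 10_000),   # 0.5%
    "eu-west": (40, 8_000),    # 0.5%
    "ap-south": (200, 1_000),  # 20%
}

print(global_rate(counts) >= THRESHOLD)  # False: 290 / 19,000 is about 1.5%
print(regions_over_threshold(counts))    # ['ap-south']
```

In this invented data set the global rate stays near 1.5%, so a global 5% alert never fires, while the per-region check flags the one region at 20%.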