Learn to write effective troubleshooting guides: symptom-cause-resolution structure, 'If X then Y' patterns, prerequisites, escalation paths, and common mistakes sections.
1 / 5
What is the 'symptom-cause-resolution' structure in troubleshooting guide writing and why is it more effective than a purely procedural format?
Readers of troubleshooting docs arrive in a problem state: they have a symptom, not a diagnosis. A symptom-first entry runs symptom → cause → resolution, for example 'Error: Connection refused on port 5432' → cause: PostgreSQL service not running → resolution: systemctl start postgresql. Structuring by cause alone ('PostgreSQL service management'), or as a purely procedural walkthrough, forces readers to diagnose the problem before they can navigate to the fix. Symptom-first lets users Ctrl+F their exact error message and land at the fix. Write each symptom as the user would describe it, including the exact error message text.
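As a rough illustration of why symptom-first works, the sketch below stores guide entries keyed by symptom text and looks them up the way a reader would with Ctrl+F. The `Entry` class, the `GUIDE` list, and `find_entry` are hypothetical names introduced for this example; the single entry reuses the PostgreSQL case above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    symptom: str      # exact text the reader sees; this is the lookup key
    cause: str
    resolution: str


# Hypothetical guide content; the one entry mirrors the PostgreSQL example.
GUIDE = [
    Entry(
        symptom="Connection refused on port 5432",
        cause="PostgreSQL service not running",
        resolution="systemctl start postgresql",
    ),
]


def find_entry(error_text: str, guide=GUIDE) -> Optional[Entry]:
    """Return the first entry whose symptom text appears in the error text."""
    lowered = error_text.lower()
    for entry in guide:
        if entry.symptom.lower() in lowered:
            return entry
    return None
```

Calling `find_entry("Error: Connection refused on port 5432")` returns the PostgreSQL entry, while an error with no matching symptom returns `None`. A guide indexed by cause would have no key to match the raw error text against.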
2 / 5
What makes the 'If you see X, do Y' pattern effective in troubleshooting documentation?
Examples of the pattern: 'If the error includes ECONNREFUSED, the service is not running. Start it with: sudo systemctl start myapp.' Compare: 'If the error includes ETIMEDOUT, the service is running but unreachable; check firewall rules.' Each branch pairs an observable condition with a specific action, so readers can skip the branches that do not apply. Avoid vague instructions such as 'Check if the service is running.' Better: 'Run: systemctl status myapp. If it shows inactive (dead), proceed to Step 3. If it shows active (running), skip to Step 5.'
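The branching in these examples can be sketched as two small lookup functions. This is only an illustration of the pattern: the function names and the fallback messages are assumptions, while the conditions and actions are taken from the examples above.

```python
def next_step_for_error(error_text: str) -> str:
    """Map an observed error string to the action the guide prescribes."""
    if "ECONNREFUSED" in error_text:
        return "Service is not running. Start it with: sudo systemctl start myapp"
    if "ETIMEDOUT" in error_text:
        return "Service is running but unreachable. Check firewall rules."
    # Fallback wording is an assumption, not part of the original examples.
    return "No matching branch. Continue to the next section."


def next_step_for_status(status_output: str) -> str:
    """Map `systemctl status myapp` output to the next step in the guide."""
    if "inactive (dead)" in status_output:
        return "Proceed to Step 3"
    if "active (running)" in status_output:
        return "Skip to Step 5"
    # Fallback wording is an assumption, not part of the original examples.
    return "Unrecognized status. Escalate or consult the full guide."
```

Every branch names something the reader can observe and returns exactly one action, which is what separates this pattern from an instruction like 'Check if the service is running.'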
3 / 5
What should a 'Prerequisites' section in a troubleshooting guide contain?
Prerequisites to document: (1) Access required: 'You need SSH access to the affected server and sudo privileges.' (2) Tools required: 'Ensure kubectl, jq, and curl are installed.' (3) Information to gather first: 'Collect the request ID from the error message and the approximate time the issue occurred.' (4) Environment context: 'These steps apply to Linux/systemd environments. For macOS, see the macOS troubleshooting guide.' Missing prerequisites are a common reason a troubleshooting guide fails: readers get blocked partway through by an undocumented dependency.
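A 'Tools required' prerequisite can also be made checkable. The sketch below, assuming Python is available on the machine, uses the standard library's `shutil.which` to report which of the listed tools are missing from PATH. The `missing_tools` helper and the `REQUIRED_TOOLS` name are hypothetical.

```python
import shutil

# Tool list taken from the 'Tools required' example above.
REQUIRED_TOOLS = ["kubectl", "jq", "curl"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed locally.
    """
    return [tool for tool in tools if which(tool) is None]
```

A guide could ask readers to run such a check before Step 1 and to stop if the returned list is not empty, so a missing dependency surfaces up front rather than partway through the procedure.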
4 / 5
How should escalation paths be documented in a troubleshooting guide?
Escalation section template: 'If you have completed all steps above and the issue persists, escalate to the Platform Engineering team. Include in your escalation: (1) The exact error message. (2) Steps you have already tried. (3) Output of: kubectl describe pod [pod-name]. (4) The time the issue started. Contact via: Slack #platform-incidents for P1/P2, Jira ticket for P3/P4. Response SLA: P1 15 min, P2 1 hour, P3 next business day.' Clear escalation paths prevent both under-escalation (struggling too long) and over-escalation (paging on-call for config issues).
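The routing in this template can be written down as data, which makes gaps easy to spot. The sketch below is illustrative only: `ROUTES` and `build_escalation` are hypothetical names, and the P4 response time is left as `None` because the template does not state one.

```python
# Channel and response SLA per priority, copied from the template above.
ROUTES = {
    "P1": ("Slack #platform-incidents", "15 min"),
    "P2": ("Slack #platform-incidents", "1 hour"),
    "P3": ("Jira ticket", "next business day"),
    "P4": ("Jira ticket", None),  # the template gives no SLA for P4
}


def build_escalation(priority, error_message, steps_tried, describe_output, started_at):
    """Assemble the four items the template asks for, plus channel and SLA."""
    if priority not in ROUTES:
        raise ValueError(f"Unknown priority: {priority}")
    channel, sla = ROUTES[priority]
    lines = [
        f"Escalate to: Platform Engineering via {channel}",
        f"Response SLA: {sla if sla else 'not stated in the template'}",
        f"1. Exact error message: {error_message}",
        "2. Steps already tried: " + "; ".join(steps_tried),
        f"3. Output of kubectl describe pod: {describe_output}",
        f"4. Time the issue started: {started_at}",
    ]
    return "\n".join(lines)
```

Writing the table out this way shows that P4 has a channel but no response time, a gap that is easy to miss in a prose template.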
5 / 5
What is the purpose of a 'Common mistakes' or 'Pitfalls' section in a troubleshooting guide?
A common mistakes section adds institutional knowledge: 'Common mistake: Running the database migration before stopping the application servers. This causes constraint violations. Always stop app servers first (Step 2) before running migrations (Step 4).' Or: 'Do not restart the auth service while a migration is running; it will invalidate all active sessions.' These warnings come from real incidents and support tickets. Phrase them without blame: write 'A common mistake is to...' or 'Note: Do not...' rather than 'Users often forget to...'. They turn one engineer's painful experience into a guardrail for all future readers.