Toil Reduction Language
5 exercises — Practice vocabulary for toil identification and reduction: defining toil, quantifying time saved, the 50% toil cap, and making the case for automation.
At a team retrospective, an SRE says: "This task is manual and repetitive — it qualifies as toil." A developer asks what makes something "toil" in the SRE sense, rather than just "a boring task." Which definition is most accurate?
Google's SRE definition of toil lists six characteristics. A task does not have to show all six to count as toil, but the more of them it matches, the more likely it is toil. Being boring is not one of them.
The six characteristics from Google's SRE book:
1. Manual — a human performs the steps
2. Repetitive — done again and again, not a one-time task
3. Automatable — a machine could do it with sufficient engineering
4. Tactical — reactive/interrupt-driven, not proactive
5. Scales with service growth — more traffic/users = more of this work
6. No enduring value — completing it leaves the service in exactly the same state
A post-mortem analysis is manual and time-consuming, but it is not toil: it requires human judgment and produces enduring value (improved reliability, documented learnings). Toil, by contrast, is work that a machine could do and that leaves nothing lasting behind once it is finished.
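The six characteristics can be treated as a rough checklist. The sketch below is only an illustration: the `TaskTraits` class, the `toil_score` function, and the trait values given to the two example tasks are hypothetical and are not taken from Google's SRE book.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TaskTraits:
    """One flag per toil characteristic (hypothetical helper type)."""
    manual: bool
    repetitive: bool
    automatable: bool
    tactical: bool
    scales_with_service: bool
    no_enduring_value: bool


def toil_score(task: TaskTraits) -> int:
    """Count how many of the six characteristics the task matches."""
    return sum(getattr(task, f.name) for f in fields(task))


# Illustrative judgments only; a real team would assess its own tasks.
manual_restart = TaskTraits(
    manual=True,
    repetitive=True,
    automatable=True,
    tactical=True,
    scales_with_service=True,
    no_enduring_value=True,
)
post_mortem = TaskTraits(
    manual=True,
    repetitive=False,
    automatable=False,
    tactical=False,
    scales_with_service=False,
    no_enduring_value=False,
)

print(toil_score(manual_restart))  # matches all six characteristics
print(toil_score(post_mortem))     # matches only "manual"
```

A higher score suggests a stronger case for calling the task toil; the score is a conversation aid, not a formal threshold.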
Key vocabulary:
• toil — manual, repetitive, automatable operational work that scales with service size
• enduring value — lasting improvement to the system or team capability
• tactical work — reactive, interrupt-driven work; opposite of strategic engineering
• toil budget — the maximum acceptable percentage of an SRE's time spent on toil (typically <50%)
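To make the toil budget concrete, here is a minimal sketch that checks one person's logged hours against the 50% cap. The function names, the default `cap` value, and the sample hours are assumptions chosen for illustration.

```python
def toil_fraction(toil_hours: float, total_hours: float) -> float:
    """Share of working time spent on toil, as a value between 0 and 1."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return toil_hours / total_hours


def exceeds_toil_budget(toil_hours: float, total_hours: float, cap: float = 0.5) -> bool:
    """True when toil reaches or passes the cap (the budget is 'below 50%')."""
    return toil_fraction(toil_hours, total_hours) >= cap


# Hypothetical week: 22 of 40 hours went to toil.
print(toil_fraction(22, 40))        # 0.55
print(exceeds_toil_budget(22, 40))  # True: over the toil budget
print(exceeds_toil_budget(12, 40))  # False: 30% is within budget
```

In a retrospective this supports a sentence such as "We spent 55% of last week on toil, which is over our toil budget."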