Toil & Automation Vocabulary
5 exercises — Practice identifying toil, calculating ROI for automation, and communicating toil reduction strategy to leadership.
Quick reference: Toil
- Toil definition: manual, repetitive, automatable, tactical (reactive), no enduring value, O(n) with service growth
- 50% cap: SRE teams aim to spend less than half their time on toil
- Toil ROI: hours saved per year ÷ hours to automate; the payback period is the inverse (hours to automate ÷ hours saved per year)
- Alert noise: a primary toil source; every non-actionable alert costs human time
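A minimal Python sketch of the ROI and payback arithmetic from the quick reference. The helper names and the sample figures are hypothetical. The sketch assumes the automation eliminates the manual work completely and ignores the ongoing cost of maintaining the automation itself.

```python
def annual_hours_saved(hours_per_occurrence: float, occurrences_per_year: float) -> float:
    """Manual hours per year the automation would remove (assumes it removes all of them)."""
    return hours_per_occurrence * occurrences_per_year


def toil_roi(hours_saved_per_year: float, hours_to_automate: float) -> float:
    """Hours saved per year for every hour spent building the automation."""
    if hours_to_automate <= 0:
        raise ValueError("hours_to_automate must be positive")
    return hours_saved_per_year / hours_to_automate


def payback_period_years(hours_saved_per_year: float, hours_to_automate: float) -> float:
    """Years until the saved toil hours repay the automation effort."""
    if hours_saved_per_year <= 0:
        raise ValueError("automation that saves no time never pays back")
    return hours_to_automate / hours_saved_per_year


# Hypothetical task: 2 hours of manual work every week, estimated 40 hours to automate.
saved = annual_hours_saved(hours_per_occurrence=2, occurrences_per_year=52)
roi = toil_roi(saved, hours_to_automate=40)
payback = payback_period_years(saved, hours_to_automate=40)

print(f"Hours saved per year: {saved}")
print(f"ROI: {roi:.1f}x per year")
print(f"Payback period: {payback * 12:.1f} months")
```

With these made-up figures the automation saves 104 hours a year, returns 2.6 hours per hour invested, and pays for itself in under five months. Framing the result as a payback period is usually easier to communicate to leadership than a raw ratio.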
An SRE team lead says in a planning meeting: "We need to quantify our toil so we can justify the automation work to leadership." Which of the following is the correct SRE definition of toil?
Google SRE describes toil by six characteristics: manual, repetitive, automatable, tactical (interrupt-driven and reactive), devoid of enduring value, and O(n) with service growth.
The classic toil examples:
• Manually restarting services that crash on deploys
• Manually acknowledging recurring noisy alerts
• Running the same SQL query to fix data inconsistencies every week
• Manually rotating credentials on a schedule
Key distinction: toil is automatable. Hard engineering problems aren't toil just because they're difficult — toil is specifically work a script or automation could handle.
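One way to make the definition and the key distinction concrete is a checklist that scores a task against the six characteristics. This is a hypothetical sketch: the `Task` fields, the `toil_score` and `is_likely_toil` names, and the cutoff of five criteria are illustrative assumptions, not an official rubric. It treats automatability as mandatory, matching the key distinction above, and scores the remaining criteria on the assumption that a task can be toil without matching every characteristic.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A unit of operational work, checked against the six toil characteristics."""
    name: str
    manual: bool              # a human has to perform it
    repetitive: bool          # it recurs instead of being done once
    automatable: bool         # a script or system could do it instead
    tactical: bool            # interrupt-driven and reactive, not strategy-driven
    no_enduring_value: bool   # the service is no better off once it is done
    scales_with_growth: bool  # O(n): the work grows as the service grows


def toil_score(task: Task) -> int:
    """Count how many of the six characteristics the task has (0 to 6)."""
    return sum([
        task.manual,
        task.repetitive,
        task.automatable,
        task.tactical,
        task.no_enduring_value,
        task.scales_with_growth,
    ])


def is_likely_toil(task: Task, threshold: int = 5) -> bool:
    """Hypothetical rule: must be automatable AND meet at least `threshold` criteria."""
    return task.automatable and toil_score(task) >= threshold


restart = Task("Restart services that crash on deploys",
               manual=True, repetitive=True, automatable=True,
               tactical=True, no_enduring_value=True, scales_with_growth=True)
rotate = Task("Rotate credentials on a schedule",
              manual=True, repetitive=True, automatable=True,
              tactical=False,  # scheduled, not interrupt-driven
              no_enduring_value=True, scales_with_growth=True)
redesign = Task("Design a new sharding scheme",
                manual=True, repetitive=False, automatable=False,
                tactical=False, no_enduring_value=False, scales_with_growth=False)

for task in (restart, rotate, redesign):
    print(f"{task.name}: score={toil_score(task)}, toil={is_likely_toil(task)}")
```

The redesign is difficult but fails the automatability check, so it counts as engineering work rather than toil. The scheduled credential rotation is not interrupt-driven, yet it still clears the hypothetical cutoff.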
Key vocabulary:
• Toil: manual, repetitive, automatable, non-cumulative operational work
• O(n) scaling: toil that grows linearly as the service grows (more servers = more manual work)
• Toil budget: SRE teams keep toil below 50% of their time and spend the rest on engineering
• Enduring value: work that permanently improves the system (toil lacks it)
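The toil budget and O(n) scaling interact: if toil grows linearly with the fleet, a fixed-size team eventually crosses the 50% cap. A minimal Python sketch, assuming a hypothetical team of four SREs working 40 hours a week and a made-up cost of half an hour of manual work per server per week; the function names are illustrative only.

```python
TOIL_BUDGET = 0.5  # the 50% cap: toil should stay below half of the team's time


def projected_toil_hours(hours_per_server: float, servers: int) -> float:
    """O(n) toil: weekly manual hours grow linearly with the number of servers."""
    return hours_per_server * servers


def toil_fraction(toil_hours: float, team_hours: float) -> float:
    """Share of the team's available hours consumed by toil."""
    return toil_hours / team_hours


def within_budget(toil_hours: float, team_hours: float, budget: float = TOIL_BUDGET) -> bool:
    """True while toil stays strictly below the budget share of team time."""
    return toil_fraction(toil_hours, team_hours) < budget


def servers_at_budget_limit(hours_per_server: float, team_hours: float,
                            budget: float = TOIL_BUDGET) -> float:
    """Fleet size at which linear toil reaches the budget cap."""
    return budget * team_hours / hours_per_server


# Hypothetical team: 4 SREs x 40 hours/week, 0.5 hours of manual work per server per week.
TEAM_HOURS = 4 * 40
PER_SERVER = 0.5

for servers in (100, 160, 200):
    toil = projected_toil_hours(PER_SERVER, servers)
    print(f"{servers} servers: {toil:.0f} h/week toil, "
          f"{toil_fraction(toil, TEAM_HOURS):.0%} of team time, "
          f"within budget: {within_budget(toil, TEAM_HOURS)}")

print(f"Budget reached at about {servers_at_budget_limit(PER_SERVER, TEAM_HOURS):.0f} servers")
```

Because this toil is O(n), doubling the fleet from 100 to 200 servers doubles the weekly toil hours and moves the hypothetical team from under a third of its time to over half. Without automation, the only way to stay under the cap is to grow headcount along with the fleet.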