Toil & Automation Vocabulary
5 exercises — Practice identifying toil, calculating ROI for automation, and communicating toil reduction strategy to leadership.
Exercise 1 of 5
An SRE team lead says in a planning meeting: "We need to quantify our toil so we can justify the automation work to leadership." Which of the following is the correct SRE definition of toil?
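Google's SRE book characterizes toil as work that is manual, repetitive, automatable, tactical (interrupt-driven and reactive rather than strategy-driven), devoid of enduring value, and O(n) with service growth.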
The classic toil examples:
• Manually restarting services that crash on deploys
• Manually acknowledging recurring noisy alerts
• Running the same SQL query to fix data inconsistencies every week
• Manually rotating credentials on a schedule
Key distinction: toil is automatable. Hard engineering problems aren't toil just because they're difficult — toil is specifically work a script or automation could handle.
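Take the weekly SQL fix from the list above: the repair itself is something a script can run on a schedule instead of a person running it by hand. This is a minimal sketch under invented assumptions. The `orders` table, its columns, and the rule that a row is inconsistent when it has a `shipped_at` date but is still marked `pending` are all hypothetical, and SQLite stands in for whatever database is actually involved.

```python
import sqlite3


def fix_inconsistent_orders(conn: sqlite3.Connection) -> int:
    """Mark orders 'shipped' when a ship date exists but the status lags behind.

    Hypothetical schema: orders(id, status, shipped_at). The UPDATE is
    idempotent: a second run finds no matching rows and changes nothing.
    Returns the number of rows repaired.
    """
    with conn:  # commits on success, rolls back if the UPDATE raises
        cur = conn.execute(
            "UPDATE orders SET status = 'shipped' "
            "WHERE status = 'pending' AND shipped_at IS NOT NULL"
        )
    return cur.rowcount


if __name__ == "__main__":
    # Throwaway in-memory database with made-up rows for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, shipped_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (status, shipped_at) VALUES (?, ?)",
        [("pending", "2024-05-01"), ("pending", None), ("shipped", "2024-05-02")],
    )
    conn.commit()
    print(fix_inconsistent_orders(conn))  # 1: only the first row was inconsistent
    print(fix_inconsistent_orders(conn))  # 0: nothing left to fix
```

Because the UPDATE only touches rows that are still inconsistent, running it twice is harmless, which is what makes it safe to hand to a scheduler. Automating the fix removes the weekly manual step, but whatever code path produces the bad rows is still there; fixing that is the work with enduring value.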
Key vocabulary:
• Toil — manual, repetitive, automatable, non-cumulative operational work
• O(n) scaling — toil that grows linearly as the service grows (more servers = more manual work)
• Toil budget — SRE teams target < 50% of time spent on toil; rest on engineering
• Enduring value — work that permanently improves the system (toil lacks this)
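The toil budget and O(n) scaling interact: if manual work costs a fixed amount per server, toil grows with the fleet while the team's hours stay flat. A small sketch of that arithmetic follows. The team size, hours per server, and server counts are made-up inputs, and the 50% ceiling is treated as a strict `<`, matching the "< 50%" target in the vocabulary list.

```python
TOIL_BUDGET = 0.50  # target: keep toil strictly below 50% of the team's time


def toil_fraction(toil_hours: float, team_hours: float) -> float:
    """Share of the team's weekly working time spent on toil."""
    return toil_hours / team_hours


def projected_toil_hours(hours_per_server: float, servers: int) -> float:
    """O(n) scaling: manual work grows linearly with the number of servers."""
    return hours_per_server * servers


if __name__ == "__main__":
    team_hours = 4 * 40       # hypothetical: four engineers, 40 h/week each
    hours_per_server = 0.5    # hypothetical: manual work per server per week
    for servers in (80, 160, 240):
        toil = projected_toil_hours(hours_per_server, servers)
        frac = toil_fraction(toil, team_hours)
        status = "within budget" if frac < TOIL_BUDGET else "over budget"
        print(f"{servers} servers: {toil:.0f} h toil, {frac:.0%} of time, {status}")
```

With these invented numbers, toil rises from 25% of the team's time at 80 servers to 50% at 160 and 75% at 240, even though the per-server task never changed. That linear climb is why O(n) toil eventually crowds out engineering work unless the task itself is automated.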