Build fluency in the language of SRE error budgets and reliability policy.
0 / 5 completed
1 / 5
At standup, a dev mentions the team has nearly exhausted the allowed amount of unreliability for this quarter under their SLO. What is this allowance called?
The error budget is the amount of allowable unreliability, derived from an SLO, that a service can spend before corrective action is required. It quantifies how much failure is acceptable rather than demanding perfection. This budget framing helps balance reliability work against feature velocity.
2 / 5
During a design review, the team debates pausing new feature launches because reliability work is overdue. What policy are they invoking?
An error-budget policy defines what happens once the budget is exhausted, commonly pausing risky launches in favor of reliability work until the service is back within budget. This creates an automatic, objective trigger rather than relying on ad hoc debate. It formalizes the tradeoff between velocity and reliability.
3 / 5
In a code review, a dev references the specific reliability target the error budget is calculated against. What is this target called?
The SLO (service-level objective) defines the target reliability level, such as 99.9% availability, from which the error budget is derived as the complementary allowance for failure. Without a defined SLO, there is no basis for calculating a budget. SLOs are the foundation the error-budget framework is built on.
4 / 5
An incident report shows the team kept shipping risky changes despite an exhausted error budget. What process gap does this reveal?
If risky changes continue after the error budget is exhausted, the policy meant to trigger a pause wasn't actually enforced in practice. Having a policy on paper doesn't help unless it is genuinely acted on when the threshold is crossed. This gap is a common finding in postmortems about repeated reliability incidents.
5 / 5
During a PR review, a teammate asks why a small feature is being delayed despite passing all tests. What might the error budget explain?
Even a fully tested change can be intentionally delayed if the team is currently operating under an exhausted error budget, which prioritizes stabilization over adding new risk. This is a deliberate policy decision, not a sign the change itself is flawed. Understanding this context avoids misreading a reliability-driven delay as a quality problem.