Learn the vocabulary of setting and tracking a measurable reliability target.
0 / 5 completed
1 / 5
At standup, a dev mentions defining a measurable target for how reliably a specific user-facing behavior, like request latency, should perform, expressed as a percentage over a rolling time window. What is this target called?
A service-level objective, or SLO, defines a measurable target for how reliably a specific user-facing behavior, like request latency or successful response rate, should perform, typically expressed as a percentage over a rolling time window, such as 99.9% of requests succeeding over 30 days. An informal, undocumented expectation gives the team nothing concrete to actually measure against or hold themselves accountable to. This explicit, measurable target is what turns a vague reliability goal into something the team can objectively track and make decisions around.
2 / 5
During a design review, the team wants to select the specific underlying metric, like request latency or error rate, that will actually be measured to determine whether an SLO is being met. Which concept supports this?
A service-level indicator, or SLI, is the specific underlying metric, like request latency or error rate, that gets actually measured to determine whether the associated SLO's target is being met. Setting an SLO target with no underlying SLI actually being measured leaves the target unenforceable and unverifiable in practice. Choosing a well-defined, genuinely representative SLI is a critical first step, since the SLO's usefulness depends entirely on that indicator accurately reflecting the user experience it's meant to represent.
3 / 5
In a code review, a dev notices the team calculates how much unreliability is still allowed within the current time window before breaching the SLO, and uses that remaining margin to decide whether it's safe to ship a risky change. What does this represent?
An error budget derived from the SLO calculates how much unreliability is still allowed within the current time window before breaching that objective, giving the team a concrete, quantified basis for deciding whether it's currently safe to ship a riskier change. Ignoring the SLO entirely when making that decision means the team is flying blind on how much reliability margin they actually have left. This error-budget-driven decision-making is one of the most practical, widely adopted applications of having a well-defined SLO in the first place.
4 / 5
An incident report shows the team had already breached their SLO's error budget for the month, yet continued shipping several more risky changes without adjusting their release approach. What practice would prevent this?
Tightening the release process, such as requiring extra review or pausing a risky change once the error budget is exhausted, is the actual point of tracking that budget in the first place, since it's meant to inform real decisions about acceptable risk. Continuing to ship risky changes at the same pace regardless of an exhausted budget makes the whole SLO and error-budget exercise purely theoretical with no real behavioral impact. This budget-informed adjustment to the release process is what makes an SLO a genuinely actionable tool rather than just a dashboard number.
5 / 5
During a PR review, a teammate asks why the team defines a formal SLO with a specific numeric target instead of just informally aiming to keep the service reliable. What is the reasoning?
An informal, unmeasured aspiration to keep the service reliable gives the team nothing concrete to track or hold themselves accountable to, since there's no defined target or underlying metric being measured. A formal SLO with a specific numeric target, backed by a well-defined SLI, lets the team objectively track reliability and derive a concrete error budget to guide real decisions. The tradeoff is the upfront work of selecting a genuinely representative SLI and setting a realistic, meaningful target.