5 exercises on the language of reliability engineering: SLOs, error budgets, burn rate, the SLA/SLI/SLO family, meeting and breaching targets, and nines of availability.
Key patterns
define / set an SLO; measure the SLI; sign the SLA
spend / burn / exhaust the error budget
burn rate = how fast the budget is consumed
meet / hit vs. miss / breach the target
"five nines of availability" = 99.999%
0 / 5 completed
1 / 5
An SRE kicks off a reliability project: "Step one is to ___ an SLO for the API — let's say 99.9% of requests succeed under 300ms." Which verb collocates with setting a target like an SLO?
Define an SLO — the standard verb for setting the target:
An SLO (Service-Level Objective) is the internal target for a reliability metric — e.g. "99.9% of requests succeed in under 300ms over 30 days."
Collocations: define / set / agree / meet / miss / breach an SLO
"We set an SLO of 99.95% availability and track it monthly."
The SLO/SLI/SLA family — know the difference:
SLI (Indicator) — the measured number itself ("success rate = 99.97%")
SLO (Objective) — your internal target for that SLI ("≥ 99.9%")
SLA (Agreement) — the contractual promise to customers, usually with penalties if breached; the SLA is typically looser than the SLO
Why not the distractors?Compile and deploy act on code/software; render acts on visuals. Only define (or set) collocates with an objective. You define the SLO, measure the SLI, and sign the SLA.
2 / 5
A reliability review notes: "We've only spent 20% of our ___ this month, so we have room to ship riskier changes." Which term names the allowed amount of unreliability?
Error budget — the permitted amount of unreliability:
An error budget is the flip side of an SLO: if your SLO is 99.9% success, then 0.1% of requests are allowed to fail — that 0.1% is your error budget for the period.
Collocations: spend / burn / consume / exhaust / blow the error budget; "we have budget left", "the budget is used up"
Policy: "if the error budget is exhausted, we freeze feature releases and focus on reliability."
Why it's powerful: the error budget turns reliability into a shared, quantitative trade-off — teams can spend remaining budget on bold releases, but must slow down once it's gone. It aligns developers (who want to ship) and SREs (who want stability).
Distractors:tech debt = accumulated shortcuts to repay later; capacity budget = resource/cost allocation; sprint budget = planning effort. Only error budget measures allowed failure against an SLO.
3 / 5
An on-call engineer raises the alarm: "The error budget is being consumed fast — the ___ is 10x normal, we'll exhaust it in two days." Which term names the rate at which an error budget is spent?
Burn rate — how fast the error budget is spent:
Burn rate measures how quickly you are consuming your error budget relative to the SLO window. A burn rate of 1 means you will use exactly the whole budget over the period; 10x means you will exhaust it ten times faster.
"A burn rate of 14.4 over 1 hour means a full month's budget gone in ~2 hours."
Collocations: "the budget is burning fast", "a fast-burn vs slow-burn alert", "high burn rate triggers a page"
Modern SRE practice uses multi-window, multi-burn-rate alerts — page fast on a steep burn, ticket slowly on a gentle one.
Why it beats raw error-rate alerts: burn-rate alerting ties severity to budget impact, reducing noise from brief blips while still paging quickly on serious, sustained failures.
Distractors:refresh rate (display), bounce rate (analytics), and sample rate (data sampling) all use "rate" but in unrelated domains. Only burn rate measures error-budget consumption.
4 / 5
A status update reports a near-miss: "We came close, but we just managed to ___ our 99.9% SLA this quarter." Which verb means "successfully achieve the target"?
Meet the target — and its opposite, breach:
The verbs for hitting or missing reliability targets are a tight, contrasting set:
meet / hit / achieve / satisfy the SLO/SLA — succeed ("we met our availability target")
miss / breach / violate / blow the SLO/SLA — fail ("we breached the SLA and owe service credits")
"We stayed within the SLO" / "we fell short of the target"
Breach is especially important with contracts: breaching an SLA often triggers penalties or refunds, whereas missing an internal SLO triggers a process response (e.g. a release freeze).
Why not the distractors?Breach is the antonym of success; waive means to formally drop a requirement; escalate means to push an issue to higher severity/seniority. Only meet (or hit/achieve) means "we reached the target." Pairing meet/breach correctly with SLA/SLO is core reliability English.
5 / 5
A sales engineer explains availability tiers: "Our premium plan offers ___ — that's 99.999%, about five minutes of downtime a year." Which idiomatic phrase describes availability levels by counting 9s?
Nines of availability — counting the leading 9s:
Availability is commonly described by the number of nines in the percentage, a fixed idiom in SRE and sales:
Two nines = 99% ≈ 3.65 days of downtime/year
Three nines = 99.9% ≈ 8.76 hours/year
Four nines = 99.99% ≈ 52.6 minutes/year
Five nines = 99.999% ≈ 5.26 minutes/year
Collocations: "we offer five nines", "chasing the extra nine gets exponentially expensive", "this tier guarantees three nines of uptime".
Reality check: each additional nine roughly tenfolds the engineering cost, which is exactly why error budgets and SLOs exist — to decide how many nines are actually worth paying for.
Distractors:bars (phone signal), sigmas (Six Sigma quality / standard deviations), and marks are not availability idioms. The phrase is always "N nines of availability/uptime".