🧭 SRE & Reliability Vocabulary Hub
5 categories, 120 exercises. One map for every site reliability and resilience English topic on Coders Lingo.
The SRE English landscape, in plain terms
Broadly, the five categories below form a pipeline. Observability Engineering Language is the foundation — instrumenting a system with traces, metrics, and logs so you can see what it is doing. SLO & Error Budget Engineering Language turns that data into targets — SLIs become SLOs, and error budgets decide how much risk a team can take. Chaos Engineering Language and Progressive Delivery Language are the two practices that protect those targets: chaos engineering deliberately breaks things to verify resilience in general, while progressive delivery limits the blast radius of any one specific change through canaries and feature flags. Service Mesh Operations Language is the networking layer that often implements the traffic-splitting behind progressive delivery and produces much of the trace data observability tooling consumes — tying the other four together at the infrastructure level.
These categories are not duplicates of each other, even where terms like SLI, SLO, reliability, and canary recur across them. Each card below states in one line exactly what makes that category distinct, so you can see how the vocabulary connects rather than assuming overlap means repetition.
The 5 SRE & reliability vocabulary categories
- Advanced
Observability Engineering Language
OpenTelemetry, distributed tracing, SLIs, alerting strategy, cardinality, and structured logging vocabulary.
Not a duplicate because: The instrumentation vocabulary for seeing what a system is doing — the data collection layer underneath every reliability conversation.
- Advanced
SLO & Error Budget Engineering Language
SLI/SLO/SLA hierarchy, error budget burn, toil quantification, burn rate alerts, and reliability communication.
Not a duplicate because: The measurement and negotiation vocabulary for how reliable a service needs to be — turns observability data into targets and budgets.
- Intermediate – Advanced
Chaos Engineering Language
Experiment design, GameDay planning, resilience reports, and fault injection vocabulary.
Not a duplicate because: Proactively testing whether a system survives failure — verifying resilience before an incident happens, not measuring or reacting to one.
- Intermediate – Advanced
Progressive Delivery Language
Feature flags, canary deployments, blue-green deployments, and traffic-splitting strategies.
Not a duplicate because: How changes are rolled out safely rather than all-at-once — a release-strategy vocabulary that limits blast radius rather than measuring or injecting failure.
- Advanced
Service Mesh Operations Language
Service mesh fundamentals, traffic management, mTLS security, mesh observability, and troubleshooting vocabulary.
Not a duplicate because: Networking-layer vocabulary for service-to-service communication — the infrastructure that often implements canary routing and produces the traces observability tools consume.
Frequently asked questions
Why are there so many separate SRE and reliability vocabulary categories on Coders Lingo?
Site reliability engineering covers several distinct activities: instrumenting a system to see what it is doing, setting targets for how reliable it must be, deliberately breaking it to verify resilience, rolling out changes safely, and managing the network layer that connects services. Coders Lingo splits these into five focused categories rather than one broad category. Recurring terms such as SLI, SLO, reliability, and canary appear in more than one category because engineers use them across the whole discipline. This hub explains how the pieces fit together.
Which SRE category should I start with?
Start with Observability Engineering Language — tracing, metrics, and structured logging are the data foundation that SLO Engineering, Chaos Engineering, and Service Mesh Operations all assume you understand. From there, move to SLO & Error Budget Engineering Language to learn how that data becomes reliability targets, then Chaos Engineering Language and Progressive Delivery Language for the practices that protect those targets.
What is the difference between "Chaos Engineering" and "Progressive Delivery"?
Chaos Engineering Language is about deliberately injecting failure into a system to verify it survives — GameDays, fault injection, and resilience reporting. Progressive Delivery Language is about safely rolling out a new change — canary releases, blue-green deployments, and feature flags. Chaos engineering tests whether a system is resilient to failure in general; progressive delivery limits the blast radius of a specific new change. Many teams practice both.
Why do "SLO Engineering" and "Observability Engineering" share terms like SLI?
Because they are adjacent layers of the same discipline. Observability Engineering Language covers how you collect the underlying signals (traces, metrics, and logs) and how an SLI is measured from them. SLO & Error Budget Engineering Language covers what you do with that indicator once collected: setting a target (the SLO), tracking an error budget, and deciding when to alert on burn rate. You cannot set a meaningful SLO without observability data feeding it, so the categories are sequential rather than duplicates.
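To make the relationship between SLI, SLO, error budget, and burn rate concrete, here is a minimal Python sketch of the arithmetic. The function names and the request counts are illustrative assumptions, not part of any Coders Lingo exercise or any specific monitoring tool.

```python
def _check_target(slo_target: float) -> None:
    # An SLO of exactly 100% leaves no error budget, so it is rejected here.
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")


def sli(good_events: int, total_events: int) -> float:
    """Service level indicator: the fraction of events that were good."""
    if total_events == 0:
        return 1.0  # assumption: a window with no traffic counts as fully good
    return good_events / total_events


def error_budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the window's error budget still unspent (negative if overspent)."""
    _check_target(slo_target)
    if total_events == 0:
        return 1.0  # nothing has been spent yet
    allowed_failures = (1.0 - slo_target) * total_events
    actual_failures = total_events - good_events
    return 1.0 - actual_failures / allowed_failures


def burn_rate(slo_target: float, good_events: int, total_events: int) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 means the budget is being spent exactly as fast as the
    SLO permits; values above 1.0 mean it will run out before the window ends.
    """
    _check_target(slo_target)
    return (1.0 - sli(good_events, total_events)) / (1.0 - slo_target)


# Hypothetical window: 1,000,000 requests, 400 of them failed, SLO of 99.9%.
print(sli(999_600, 1_000_000))
print(error_budget_remaining(0.999, 999_600, 1_000_000))
print(burn_rate(0.999, 999_600, 1_000_000))
```

In this hypothetical window the SLO allows about 1,000 failed requests, so 400 failures spend roughly 40% of the budget and the burn rate is roughly 0.4. The SLI comes from observability data; the target, the budget, and the burn rate are what the SLO vocabulary adds on top of it.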
How does "Service Mesh Operations" relate to the rest of this cluster?
A service mesh is infrastructure that often implements the traffic-splitting used in progressive delivery (canary routing) and produces much of the trace data consumed by observability tooling, in addition to its own mTLS and networking vocabulary. It is included here because engineers working across this cluster frequently need mesh vocabulary alongside SLOs, tracing, and chaos experiments, even though the mesh itself is a distinct networking layer.
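The traffic-splitting idea behind canary routing can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular service mesh implements it: the function name `route_version`, the `"stable"` and `"canary"` labels, and the choice to hash a request ID into 100 buckets are all hypothetical.

```python
import hashlib


def route_version(request_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent% of requests to the canary, the rest to stable.

    Hashing the request ID (an assumption for this sketch) makes the decision
    deterministic, so the same ID always lands on the same version.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


# Hypothetical rollout step: about 10% of traffic goes to the new version.
sample = [route_version(f"req-{i}", 10) for i in range(10_000)]
print(sample.count("canary") / len(sample))
```

Raising `canary_percent` in steps widens the rollout, and setting it to 0 sends all traffic back to the stable version, which is the "limit the blast radius" behavior described above.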
How many total exercises are covered across the SRE and reliability vocabulary cluster?
The five categories in this hub cover 120 exercises in total, spanning instrumentation, target-setting, resilience testing, safe rollout strategy, and service mesh networking. Each category is self-contained, so you can start with whichever matches your current work.