Chaos Engineering English: Game Days, Experiments, and Findings Vocabulary
Learn the English vocabulary of chaos engineering: game days, blast radius, steady state, hypotheses, and experiment findings, explained for IT professionals.
Introduction
Chaos engineering is the practice of deliberately injecting failure into systems to discover weaknesses before they cause real incidents. Pioneered by Netflix, it has grown into a mainstream reliability practice with its own distinct vocabulary. Whether you are participating in a game day, writing a chaos experiment, or reviewing findings with your team, understanding the English terms used in chaos engineering discussions will help you contribute effectively. This guide covers the vocabulary from hypothesis to findings report.
The Chaos Engineering Mindset
Chaos engineering starts with a specific way of thinking about reliability that is expressed in English with precise terms:
- steady state — the normal, healthy behaviour of the system before any experiment; “we define steady state as: error rate below 0.1%, p99 latency below 200ms, all health checks passing”
- hypothesis — a prediction about what will happen during the experiment; “our hypothesis is that if one availability zone goes down, traffic will automatically route to the remaining zones with no user-visible impact”
- blast radius — the scope of potential impact if an experiment causes unexpected harm; “we limit the blast radius by starting with 5% of traffic before expanding”
- “Start small” — a core chaos engineering principle; begin with limited blast radius and expand gradually
The word hypothesis is important. Chaos experiments are not random destruction — they are structured scientific tests. Engineers say: “We write a hypothesis before the experiment, so we know whether the system behaved as expected or not.” Without a hypothesis, you are running chaos without a purpose.
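A steady-state definition becomes easier to discuss once it is written as a check. The following is a minimal sketch in Python that encodes the example thresholds quoted above (error rate below 0.1%, p99 latency below 200 ms, all health checks passing). The `Snapshot` class and its field names are hypothetical, chosen only for illustration; a real system would read these values from its own monitoring stack.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Metrics sampled over one observation window (hypothetical shape)."""
    error_rate: float            # fraction of failed requests, 0.001 == 0.1%
    p99_latency_ms: float        # 99th-percentile latency in milliseconds
    health_checks_passing: bool  # True only if every health check passes


def is_steady_state(s: Snapshot) -> bool:
    """Return True when the snapshot meets the example thresholds."""
    return (
        s.error_rate < 0.001          # error rate below 0.1%
        and s.p99_latency_ms < 200    # p99 latency below 200 ms
        and s.health_checks_passing   # all health checks passing
    )


healthy = Snapshot(error_rate=0.0004, p99_latency_ms=150, health_checks_passing=True)
degraded = Snapshot(error_rate=0.02, p99_latency_ms=150, health_checks_passing=True)
print(is_steady_state(healthy))   # True
print(is_steady_state(degraded))  # False
```

In conversation, you might describe this as: “The steady-state check failed because the error rate went above our threshold.”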
Game Days
A game day is a planned exercise where the team deliberately triggers failures to test system and team response:
- “We are running a game day on Thursday” — a planned chaos experiment session
- facilitator — the person who runs the game day; “the chaos engineering team facilitates, but the on-call team responds as they would in a real incident”
- scenario — a specific failure situation the game day tests; “our scenario is: the primary database becomes unavailable”
- “The game day is a no-blame exercise” — failures discovered are system failures, not people failures
- “We debrief after the game day” — a structured review of what was learned; “the debrief lasts one hour and covers what we expected, what actually happened, and what we learned”
- tabletop exercise — a lower-intensity game day where the team talks through scenarios rather than triggering real failures; “we run tabletop exercises for scenarios that are too risky to test in production”
The phrase no-blame culture (you will also hear blameless culture, as in “blameless postmortem”) is common in chaos engineering and in reliability discussions generally. It means that problems are treated as system failures, not personal failures, which encourages honest reporting and learning.
Running an Experiment
When running a chaos experiment, specific vocabulary describes the workflow:
- inject — deliberately introduce a failure; “we inject latency into the payment service to test timeout handling”
- abort — stop the experiment if it causes unexpected harm; “we abort the experiment if error rates exceed 5%”
- rollback — restore the system to its state before the experiment; “the experiment includes an automatic rollback trigger”
- control group — a portion of the system not affected by the experiment, used for comparison; “we inject failures into 10% of users and compare their error rates against the control group”
- observe — monitor the system during the experiment; “we observe CPU, latency, and error rate throughout”
- “The system behaved as expected” — the hypothesis was confirmed; resilience is verified
- “The system did not behave as expected” — the hypothesis was falsified; a weakness was discovered
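The terms inject, observe, abort, and rollback fit together as a small workflow. The sketch below illustrates that workflow under simple assumptions: the `inject` and `rollback` callables and the stream of observed error rates are hypothetical stand-ins for whatever your chaos tooling and monitoring provide, and the 5% abort threshold is taken from the example above.

```python
from typing import Callable, Iterable

ABORT_ERROR_RATE = 0.05  # abort if the error rate exceeds 5%


def run_experiment(
    inject: Callable[[], None],
    rollback: Callable[[], None],
    observed_error_rates: Iterable[float],
) -> str:
    """Inject a failure, observe error rates, and abort on excessive harm.

    Returns "completed" or "aborted". The rollback always runs, whether the
    experiment finishes normally, is aborted, or raises an exception.
    """
    inject()
    try:
        for rate in observed_error_rates:
            if rate > ABORT_ERROR_RATE:
                return "aborted"
        return "completed"
    finally:
        rollback()


def excess_error_rate(experiment_rate: float, control_rate: float) -> float:
    """Compare the experiment group's error rate against the control group."""
    return experiment_rate - control_rate


log = []
result = run_experiment(
    inject=lambda: log.append("inject"),
    rollback=lambda: log.append("rollback"),
    observed_error_rates=[0.01, 0.02, 0.08, 0.01],
)
print(result)  # aborted
print(log)     # ['inject', 'rollback']
```

Here the third observation (8%) exceeds the abort threshold, so you would report: “We aborted the experiment and the rollback ran automatically.”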
Findings and Remediation
After an experiment, findings are documented:
- finding — a discovered weakness or surprising behaviour; “we have three findings from today’s game day”
- “The finding indicates a single point of failure” — one component failure causes a larger outage
- action item — a task to address a finding; “the action item is to implement circuit breakers for all downstream service calls”
- follow-up experiment — a future experiment to verify the fix; “we will run a follow-up experiment after the circuit breakers are deployed”
- findings report — a document summarising the experiment, observations, and action items; “we publish findings reports to the engineering organisation to share learnings broadly”
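To see how finding, action item, and findings report relate, it can help to model them as data. This is only an illustrative sketch: the class and field names are assumptions made for this example, not a standard report format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Finding:
    """A discovered weakness, paired with the task that addresses it."""
    summary: str
    action_item: str


@dataclass
class FindingsReport:
    """Summarises the experiment, the observations, and the action items."""
    hypothesis: str
    observed: str
    findings: List[Finding] = field(default_factory=list)

    def render(self) -> str:
        lines = [
            f"Hypothesis: {self.hypothesis}",
            f"What we observed: {self.observed}",
            f"Findings: {len(self.findings)}",
        ]
        for number, finding in enumerate(self.findings, start=1):
            lines.append(f"  {number}. {finding.summary}")
            lines.append(f"     Action item: {finding.action_item}")
        return "\n".join(lines)


report = FindingsReport(
    hypothesis="Losing the primary database causes no user-visible impact.",
    observed="Checkout requests failed until the replica was promoted.",
    findings=[
        Finding(
            summary="The primary database is a single point of failure.",
            action_item="Implement circuit breakers for downstream service calls.",
        )
    ],
)
print(report.render())
```

Each finding carries its own action item, which makes it easy to say in a debrief: “We have one finding, and the action item is already assigned.”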
Key Vocabulary
| Term | Definition |
|---|---|
| steady state | The normal, healthy behaviour of a system before experimentation |
| hypothesis | A prediction about how the system will behave during an experiment |
| blast radius | The scope of potential harm if an experiment causes unexpected damage |
| game day | A planned exercise where failures are deliberately triggered |
| tabletop exercise | A discussion-based scenario review without triggering real failures |
| inject | Deliberately introduce a failure condition |
| abort | Stop an experiment to prevent unexpected harm |
| control group | The unaffected portion of the system used as a comparison baseline |
| finding | A weakness or unexpected behaviour discovered during an experiment |
| no-blame culture | A principle that treats failures as system issues, not personal failures |
Practice Tips
- Write a hypothesis before your next experiment. Use the format: “We believe that [action] will result in [outcome] because [rationale]. We will verify this by observing [metrics].” This forces precise English and clear thinking before running any experiment.
- Practise limiting blast radius in conversation. When proposing a chaos experiment, say: “We will start with a 1% traffic sample to limit blast radius, observe for 10 minutes, then expand to 10% if everything looks healthy.” This demonstrates disciplined practice.
- Write a findings report after every experiment. Even a simple one. Structure it as: Hypothesis, Setup, What we expected, What we observed, Findings, Action items. Practise using this structure in English to build the habit.
- Use “no-blame” explicitly in game day introductions. When you facilitate a game day, say at the start: “This is a no-blame exercise. We are testing the system, not the people. If something breaks unexpectedly, that is valuable information, not a failure to be ashamed of.” Setting this tone in English matters for participation.
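The first two tips can be turned into small helpers for practice. The sketch below fills in the hypothesis template from the tips above and walks through a staged blast-radius expansion (1%, then 10%). The function names and the `is_healthy` callback are hypothetical; in practice the health check would observe real metrics for the agreed observation window.

```python
from typing import Callable, Sequence


def format_hypothesis(
    action: str, outcome: str, rationale: str, metrics: Sequence[str]
) -> str:
    """Fill in the hypothesis template from the practice tips."""
    return (
        f"We believe that {action} will result in {outcome} "
        f"because {rationale}. "
        f"We will verify this by observing {', '.join(metrics)}."
    )


def ramp_blast_radius(
    stages: Sequence[int], is_healthy: Callable[[int], bool]
) -> int:
    """Expand the traffic percentage stage by stage.

    Stops at the first stage that looks unhealthy and returns the largest
    percentage that stayed healthy (0 if even the first stage failed).
    """
    reached = 0
    for percent in stages:
        if not is_healthy(percent):
            break
        reached = percent
    return reached


print(format_hypothesis(
    action="terminating one availability zone",
    outcome="no user-visible impact",
    rationale="traffic reroutes to the remaining zones",
    metrics=["error rate", "p99 latency"],
))

# Pretend the system stays healthy at 1% but not at 10%.
print(ramp_blast_radius([1, 10], is_healthy=lambda percent: percent <= 1))  # 1
```

Reading the generated hypothesis aloud before a game day is a quick way to practise the sentence pattern.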
Conclusion
Chaos engineering vocabulary (steady state, hypothesis, blast radius, game day, findings, no-blame) describes a rigorous scientific approach to building reliable systems. Using this vocabulary precisely signals that your team is running disciplined experiments rather than random destruction. For non-native English speakers, this vocabulary is accessible and consistent: once you learn the terms, they appear in the same form across Chaos Monkey, LitmusChaos, AWS Fault Injection Service (formerly AWS Fault Injection Simulator), and the broader SRE community.