Practice vocabulary for running chaos experiments in production: safeguards, self-healing, monitoring, traffic timing, and production chaos boundaries.
1 / 5
Your team says 'We run chaos experiments in production with safeguards.' What are typical safeguards for production chaos?
Typical production chaos safeguards are automatic abort conditions (stop if the error rate exceeds a set threshold), blast radius limits (affect only a small percentage of traffic), rollback procedures, and pre-agreed halt criteria. Together they ensure an experiment can be stopped before it causes customer impact.
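As a minimal sketch of the first two safeguards, the Python below checks an abort condition and a blast radius limit. The names `Safeguards`, `should_abort`, and `in_blast_radius`, and the example thresholds, are hypothetical and chosen only for illustration; a real chaos tool would read the counts from its own monitoring.

```python
import random
from dataclasses import dataclass


@dataclass
class Safeguards:
    # Hypothetical settings: both values are fractions between 0.0 and 1.0.
    max_error_rate: float  # abort when the observed error rate exceeds this
    blast_radius: float    # share of requests eligible for fault injection


def should_abort(errors: int, requests: int, guards: Safeguards) -> bool:
    """Return True when the observed error rate crosses the abort threshold."""
    if requests == 0:
        return False  # no traffic observed yet, nothing to judge
    return errors / requests > guards.max_error_rate


def in_blast_radius(guards: Safeguards, rng: random.Random) -> bool:
    """Decide whether a single request is eligible for fault injection."""
    return rng.random() < guards.blast_radius


guards = Safeguards(max_error_rate=0.05, blast_radius=0.01)
if should_abort(errors=6, requests=100, guards=guards):
    print("abort: error rate above threshold, stop injecting faults")
```

Keeping the abort check as a pure function of observed counts makes the halt criteria easy to agree on and review before the experiment starts.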
2 / 5
A chaos runbook says 'The chaos experiment is automated and self-healing.' What does self-healing mean in this context?
A self-healing chaos experiment is one in which the system recovers automatically once fault injection stops, with no manual intervention. The experiment tests the system's ability to restore itself, not just its ability to fail gracefully.
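One way to verify self-healing is to poll a health check after the fault is removed and confirm that the system recovers without help. The sketch below assumes a caller-supplied `is_healthy` function; that name, `wait_for_recovery`, and the timing values are hypothetical.

```python
import time
from typing import Callable


def wait_for_recovery(
    is_healthy: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 1.0,
) -> bool:
    """Poll a health check after fault injection has stopped.

    Returns True if the system reports healthy before the deadline,
    False if it has not recovered on its own within `timeout_s`.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_healthy():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A `False` result means the system did not self-heal in the allowed time, which is the point at which a manual rollback procedure would take over.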
3 / 5
Your chaos protocol says 'The chaos engineer monitors during the experiment.' What should the engineer specifically watch?
During a chaos experiment, the engineer watches business metrics (error rates, latency, transaction success rates) and technical metrics at the same time. The goal is to detect an unexpectedly large blast radius and abort before real customer harm occurs.
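Detecting an unexpected blast radius can be sketched as comparing each service's current error rate with its pre-experiment baseline and flagging degradation outside the intended targets. The function name `unexpected_impact`, the service names, and the tolerance value are assumptions made for this example.

```python
from typing import Dict, List, Set


def unexpected_impact(
    baseline: Dict[str, float],
    current: Dict[str, float],
    targets: Set[str],
    tolerance: float = 0.01,
) -> List[str]:
    """List services outside the experiment's targets whose error rate
    rose by more than `tolerance` compared with the baseline."""
    return sorted(
        name
        for name, rate in current.items()
        if name not in targets and rate - baseline.get(name, 0.0) > tolerance
    )


# Hypothetical readings: only "search" is the intended target.
baseline = {"checkout": 0.01, "search": 0.01, "cart": 0.02}
current = {"checkout": 0.01, "search": 0.08, "cart": 0.06}
spillover = unexpected_impact(baseline, current, targets={"search"})
```

Any non-empty result indicates that the fault is reaching services it was not meant to affect, which is a reasonable trigger for aborting.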
4 / 5
A post-experiment report says 'We ran the experiment during low-traffic hours.' Why is traffic timing important for production chaos?
Running experiments during low-traffic periods (nights, weekends) reduces the number of customers affected if something goes wrong. It is a risk management approach suited to early-stage production chaos experiments.
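A scheduling guard for this can be as small as a time-window check. The sketch below is illustrative only: `in_low_traffic_window` and the 22:00 to 04:00 window are hypothetical, and a real low-traffic window should come from the service's own traffic data.

```python
from datetime import time


def in_low_traffic_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls in the window [start, end).

    Handles windows that wrap past midnight, such as 22:00 to 04:00.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


# Hypothetical window: experiments allowed between 22:00 and 04:00.
allowed = in_low_traffic_window(time(23, 30), start=time(22, 0), end=time(4, 0))
```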
5 / 5
Your chaos policy document defines 'the production chaos boundary.' What does this boundary define?
The production chaos boundary explicitly defines what an experiment is allowed to do: which services are in scope, what percentage of requests can be affected, which failure types can be injected, and what is strictly off-limits. Its purpose is to protect critical paths.
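Such a boundary can be written down as data and checked before any fault is injected. The following is a minimal sketch under assumed names: `ChaosBoundary`, its fields, and the example services and fault types are hypothetical and not taken from any particular chaos tool.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ChaosBoundary:
    in_scope_services: FrozenSet[str]
    allowed_faults: FrozenSet[str]
    max_traffic_fraction: float          # e.g. 0.05 means at most 5% of requests
    off_limits_services: FrozenSet[str]  # critical paths, never touched

    def permits(self, service: str, fault: str, traffic_fraction: float) -> bool:
        """Return True only if the proposed experiment stays inside the boundary."""
        if service in self.off_limits_services:
            return False  # off-limits wins even if the service is also listed in scope
        return (
            service in self.in_scope_services
            and fault in self.allowed_faults
            and 0.0 <= traffic_fraction <= self.max_traffic_fraction
        )


# Hypothetical policy for illustration.
boundary = ChaosBoundary(
    in_scope_services=frozenset({"search", "recommendations"}),
    allowed_faults=frozenset({"latency"}),
    max_traffic_fraction=0.05,
    off_limits_services=frozenset({"payments"}),
)
```

Checking the off-limits list first means a configuration mistake that lists a critical service as in scope still cannot authorize an experiment against it.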