Learn vocabulary for retry with backoff, jitter, timeout, bulkhead pattern, fallback mechanisms, degraded mode, and service health indicators.
1 / 5
What is 'exponential backoff with jitter' in resilience pattern vocabulary?
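Exponential backoff: double the wait between retries (1s, 2s, 4s, 8s, ...) up to a cap. Jitter: randomize each wait so that clients do not retry in lockstep (e.g., wait = min(cap, base * 2^attempt) * random(0.5, 1.5)). Without jitter, all clients that hit the same failure retry at the same moments (a 'thundering herd') and re-overload the recovering service. Two common variants, both described on the AWS Architecture Blog, are 'full jitter' (wait = random(0, min(cap, base * 2^attempt))) and 'decorrelated jitter' (wait = min(cap, random(base, previous_wait * 3))). Google Cloud's retry guidance likewise recommends exponential backoff with jitter.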
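A minimal Python sketch of the 'full jitter' variant, where each wait is drawn uniformly between zero and the capped exponential ceiling. The function names, the defaults (base of 1s, cap of 30s, 5 attempts), and the injectable `sleep` and `rng` parameters are illustrative assumptions, not taken from any particular SDK.

```python
import random
import time


def backoff_ceiling(attempt, base=1.0, cap=30.0):
    """Upper bound in seconds for retry number `attempt` (0-based)."""
    return min(cap, base * 2 ** attempt)


def full_jitter_delay(attempt, base=1.0, cap=30.0, rng=random):
    """Full jitter: a uniformly random wait between 0 and the capped ceiling."""
    return rng.uniform(0.0, backoff_ceiling(attempt, base, cap))


def retry_with_backoff(operation, max_attempts=5, base=1.0, cap=30.0,
                       sleep=time.sleep):
    """Call `operation` until it succeeds or `max_attempts` calls have failed."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait a randomized, exponentially growing delay before retrying.
            sleep(full_jitter_delay(attempt, base, cap))
```

This sketch retries on any exception for brevity. Real code would normally retry only errors known to be transient (timeouts, throttling responses) and fail immediately on the rest.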
2 / 5
What is the 'bulkhead pattern' in resilience vocabulary?
Bulkhead (from ship design, where watertight compartments keep a ship afloat if one compartment floods): in software, allocate a separate thread pool or semaphore to each downstream service. If Service A becomes slow and exhausts its pool, calls to Service B and Service C use their own pools and remain unaffected. Netflix Hystrix popularized the pattern. It prevents one slow dependency from cascading into full service failure, one of the most common failure modes in distributed systems.
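A semaphore-based bulkhead can be sketched in Python as follows. The `Bulkhead` class and `BulkheadFullError` are hypothetical names for illustration, not the Hystrix API. The sketch rejects a call immediately when the compartment is full instead of queueing it, which is one possible design choice among several.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a bulkhead has no free slot for another call."""


class Bulkhead:
    """Limits concurrent calls to one downstream service."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Non-blocking acquire: reject at once instead of queueing,
        # so a slow dependency cannot tie up the caller's threads.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"bulkhead '{self.name}' is full")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()
```

Each downstream service gets its own instance, for example `Bulkhead("service-a", 10)` and `Bulkhead("service-b", 10)`, so exhausting one leaves the other untouched.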
3 / 5
What is a 'fallback mechanism' in resilience pattern vocabulary?
A fallback is the alternative response a service returns when the primary call fails or times out. Common strategies: (1) Cached response: return the last successful result (acceptable if the data can be slightly stale). (2) Default value: return a safe default ('0 items in cart' instead of an error). (3) Stub/static content: return a simplified response. (4) Fail fast: return an immediate error rather than wait for a slow timeout. Key vocabulary: 'graceful degradation' means the user still gets partial functionality instead of a hard error.
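The first two strategies can be chained: try the live call, fall back to the cache, then to a default. A minimal Python sketch, assuming a plain dict as the cache; the function name and the returned source labels are hypothetical.

```python
def call_with_fallback(primary, cache, key, default):
    """Return (value, source), where source is 'live', 'cached', or 'default'."""
    try:
        value = primary()
    except Exception:
        if key in cache:
            # Strategy 1: serve the last successful (possibly stale) result.
            return cache[key], "cached"
        # Strategy 2: nothing cached yet, so serve a safe default.
        return default, "default"
    cache[key] = value  # remember the fresh result for future fallbacks
    return value, "live"
```

Returning the source alongside the value lets the caller log or display that the response is stale or defaulted.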
4 / 5
What is 'degraded mode' in resilience vocabulary?
Degraded mode (graceful degradation) means the service keeps running with reduced functionality while a dependency is down. Example status update: 'We are operating in degraded mode: the recommendation engine is unavailable, but checkout and order placement remain fully functional.' Key design principle: identify which features are critical (checkout) and which are non-critical (personalized recommendations, product reviews). When a dependency of a non-critical feature fails, disable that feature instead of failing the entire request. Feature flags let you switch degraded mode on and off programmatically.
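One way to picture this split between critical and non-critical features is the Python sketch below. The function, the `recommendations` flag name, and the page layout are assumptions made for the example. The critical product lookup is allowed to fail the request, while the non-critical recommendations are either switched off by a flag or skipped on error.

```python
def build_product_page(product_id, load_product, load_recommendations, flags):
    """Build a page dict; non-critical parts are skipped rather than fatal."""
    # Critical: if this raises, the whole request fails.
    page = {"product": load_product(product_id), "degraded": []}

    if not flags.get("recommendations", True):
        # Feature flag is off: degraded mode was switched on deliberately.
        page["degraded"].append("recommendations")
        return page

    try:
        page["recommendations"] = load_recommendations(product_id)
    except Exception:
        # Non-critical dependency failed: drop the feature, keep the page.
        page["degraded"].append("recommendations")
    return page
```

The `degraded` list makes the reduced state visible to callers, so a UI or status page can report which features are currently unavailable.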
5 / 5
What is a 'service health indicator' (SHI) in resilience and chaos engineering vocabulary?
Service health indicators go beyond 'is the process running?' They measure business-meaningful signals: 'order completion rate > 99%' (not just 'HTTP 200 rate'), 'payment processing latency p99 < 400ms' (not just 'server is reachable'). SHIs are inputs to chaos experiment steady-state definitions, SLO burn rate alerts, and automated circuit breakers. Choosing the right SHIs is itself a resilience engineering discipline.
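The two example indicators can be evaluated with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the thresholds are the ones quoted in the card, and p99 is computed with the simple nearest-rank method, which may differ from what a given monitoring system uses.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; `pct` is an integer 1-100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def steady_state(orders_started, orders_completed, payment_latencies_ms,
                 min_completion_rate=0.99, max_p99_ms=400):
    """Evaluate both indicators and report whether the service is healthy."""
    if orders_started == 0 or not payment_latencies_ms:
        raise ValueError("no traffic in the window; health cannot be judged")
    completion_rate = orders_completed / orders_started
    p99_ms = percentile(payment_latencies_ms, 99)
    return {
        "completion_rate": completion_rate,
        "p99_ms": p99_ms,
        "healthy": completion_rate > min_completion_rate and p99_ms < max_p99_ms,
    }
```

A chaos experiment could call a check like this before and after injecting a fault, treating a change in `healthy` as a violated steady-state hypothesis.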