Soak test: sustained load over hours — detects memory leaks and resource exhaustion
1 / 5
A load test report states: "p99 latency: 420ms, p50 latency: 38ms, error rate: 0.3%, throughput: 1,240 req/s." Which is the most accurate interpretation?
Reading a load test summary — percentile interpretation
p99 = 420ms: 99% of all requests completed within 420ms. The slowest 1% took longer.
p50 = 38ms: The median request. Half were faster, half were slower.
Error rate 0.3%: 3 in every 1,000 requests failed.
Throughput 1,240 req/s: The system processed 1,240 requests per second at this load level.
p99 ≠ average: The average (mean) could be much lower than p99 if most requests are fast with rare outliers. Percentiles are far more informative than averages for latency.
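To make the gap between percentiles and the mean concrete, here is a minimal sketch using the nearest-rank definition of a percentile. The sample latencies are invented for illustration, and load-testing tools may use a slightly different percentile method (for example, interpolation), so treat the helper as an assumption rather than any tool's exact calculation.

```python
import math
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in ms: 98 fast requests plus two slow outliers.
latencies_ms = [38] * 98 + [420, 900]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
mean = statistics.mean(latencies_ms)

print(f"p50={p50}ms p99={p99}ms mean={mean:.2f}ms")
```

With this data the mean stays close to the median, so it says almost nothing about the slow tail that p99 exposes.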
Percentile vocabulary:
"p50 represents the median response time."
"p99 is the latency that the slowest 1% of requests exceed."
"The p99 latency is the binding constraint in our SLO."
"Latency at the 99th percentile was [X]ms under [N] virtual users."
2 / 5
A test engineer says: "We ran a soak test at 500 VUs for 4 hours." What is the primary purpose of a soak test and what are they looking for?
Soak test — purpose and what it finds
A soak test (also called an endurance test) runs sustained load for an extended period — hours, not minutes. Its purpose is to find issues invisible in short load tests:
Memory leaks: memory that keeps growing over time and is never released
Connection pool exhaustion: database connections that accumulate and eventually run out
Log disk space: verbose logging that fills disks over hours
Thread/resource leaks: handles or threads that aren't released after requests
GC pressure: garbage collection pauses that become longer and more frequent as the heap fills up
Contrast with load test: A load test at 500 VUs for 5 minutes may show acceptable performance. Running the same load for 4 hours reveals problems that only build up over time.
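One way to spot a leak in soak-test data is to fit a trend line to periodic heap readings and check its slope. The sketch below assumes heap samples are available as (elapsed hours, heap in MB) pairs. The readings and the 50 MB/hour threshold are hypothetical and would need tuning for a real service.

```python
def slope_per_hour(samples):
    """Least-squares slope of (hours, heap_mb) samples, in MB per hour."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    return cov / var

def looks_like_leak(samples, threshold_mb_per_hour=50):
    """Flag sustained heap growth above a (hypothetical) threshold."""
    return slope_per_hour(samples) > threshold_mb_per_hour

# Hypothetical heap readings over a 4-hour soak: (elapsed hours, heap in MB).
leaking = [(0, 1024), (1, 1536), (2, 2048), (3, 2560), (4, 3072)]
stable = [(0, 1024), (1, 1060), (2, 1010), (3, 1050), (4, 1020)]

print(slope_per_hour(leaking), looks_like_leak(leaking))
print(slope_per_hour(stable), looks_like_leak(stable))
```

The leaking series grows by 2 GB over 4 hours, while the stable series only fluctuates around a flat baseline and stays under the threshold.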
Soak test vocabulary:
"The soak test revealed a memory leak — heap grew by 2 GB over 4 hours."
"Connection pool was exhausted after 3 hours of sustained load."
"No degradation detected during the 8-hour soak — the service is stable under sustained load."
3 / 5
A load test shows the system handles 800 req/s with p99 under 200ms, but at 1,200 req/s p99 climbs to 2,400ms and error rate increases to 8%. How would you describe this finding?
Capacity cliff — non-linear degradation
The pattern described is a classic capacity cliff (or saturation point):
Below ~800 req/s: system handles load well (p99 under 200ms)
At 1,200 req/s: performance has degraded non-linearly (p99 of 2,400ms, 8% errors), so the saturation point lies somewhere between 800 and 1,200 req/s. Latency multiplies rather than growing proportionally
The math: p99 goes from under 200ms to 2,400ms, at least a 12x increase for a 50% increase in load (800 to 1,200 req/s). This is non-linear degradation.
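The arithmetic can be checked in a few lines. This sketch uses only the figures from the question. The "amplification" ratio is an illustrative name, not a standard metric.

```python
def growth(before, after):
    """How many times larger `after` is than `before`."""
    return after / before

load_growth = growth(800, 1200)      # req/s
latency_growth = growth(200, 2400)   # p99 in ms; 200ms is an upper bound at 800 req/s

# Latency growth per unit of load growth. A value near 1 would mean roughly proportional scaling.
amplification = latency_growth / load_growth

print(f"load x{load_growth}, p99 x{latency_growth}, amplification x{amplification}")
```

Because p99 at 800 req/s was under 200ms, the 12x figure is a lower bound on the real increase.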
Why non-linear degradation happens:
Queue saturation — requests wait in a queue longer than they take to process
Thread pool exhaustion — new requests wait for threads to free up
Database connection pool saturation
CPU contention causing excessive context switching
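A simplified queueing model shows why waiting time explodes near capacity. The sketch below uses the textbook M/M/1 result for mean time in system, W = 1 / (mu - lambda). Real services are not M/M/1 queues, the 1,250 req/s service rate is a hypothetical value, and the formula gives a mean rather than a p99, so this only illustrates the shape of the curve.

```python
def mm1_mean_latency_ms(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue, in ms: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrivals meet or exceed capacity")
    return 1000.0 / (service_rate - arrival_rate)

SERVICE_RATE = 1250  # hypothetical capacity in req/s

for load in (800, 1000, 1200, 1240):
    latency = mm1_mean_latency_ms(load, SERVICE_RATE)
    print(f"{load} req/s -> {latency:.1f} ms")
```

In this model, each step closer to the service rate costs far more latency than the previous one, which matches the cliff-shaped curve described above.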
Vocabulary for this pattern:
"The system has a capacity ceiling at approximately [X] req/s."
"The saturation point is around [X] concurrent users."
4 / 5
A load test report mentions a "warm-up phase of 60 seconds before measurements begin." Why is a warm-up phase necessary?
Warm-up phase — why it matters
Many systems have higher latency immediately after startup than during steady-state operation:
JIT compilation (JVM): Java/Kotlin/Scala apps start out interpreting bytecode and compile it to native code only once a code path has run often enough to be considered "hot". Initial requests are slower until those paths are compiled.
Cache cold start: Application caches, CDN caches, and database query caches are empty. First requests miss the cache and hit slower data sources.
Connection pool establishment: Database and HTTP connection pools need time to fill with reusable connections.
Class loading (JVM): First use of classes triggers loading overhead.
Without warm-up, benchmarks are misleading: measurements from the first 30–60 seconds can show latency several times higher than steady state, which makes the system look far worse than it is once warmed up in production.
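Excluding the warm-up phase usually amounts to dropping samples recorded before a cutoff. The sketch below assumes each sample is an (elapsed seconds, latency in ms) pair. The sample values are invented, and the 60-second cutoff mirrors the report in the question.

```python
import statistics

WARMUP_SECONDS = 60

def steady_state(samples, warmup_seconds=WARMUP_SECONDS):
    """Keep only (elapsed_seconds, latency_ms) samples recorded at or after the warm-up cutoff."""
    return [(t, ms) for t, ms in samples if t >= warmup_seconds]

# Hypothetical samples: slow right after startup, settling by the one-minute mark.
samples = [(5, 310), (20, 180), (45, 95), (60, 41), (90, 38), (120, 40)]
measured = steady_state(samples)

mean_all = statistics.mean(ms for _, ms in samples)
mean_steady = statistics.mean(ms for _, ms in measured)

print(f"with warm-up: {mean_all:.1f} ms, steady state only: {mean_steady:.1f} ms")
```

Including the cold-start samples roughly triples the mean in this example, even though steady-state behaviour is unchanged.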
Warm-up vocabulary:
"Results exclude the 60-second warm-up phase."
"Measurements begin after steady state is reached."
"The cold-start overhead is excluded from benchmark figures."
5 / 5
After a load test, a developer says: "Throughput plateaued at 950 req/s regardless of how many VUs we added." What does this indicate?
Throughput plateau — bottleneck identification
When adding more virtual users no longer increases throughput, the system has hit a bottleneck. Throughput has saturated.
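A plateau can be detected by comparing throughput across successive load steps. The sketch below assumes results are recorded as (virtual users, req/s) pairs in increasing VU order. The data and the 5% tolerance are hypothetical.

```python
def find_plateau(results, tolerance=0.05):
    """Return the first (vus, rps) step after which adding VUs gains less than `tolerance`.

    Returns None if throughput keeps growing across every step.
    """
    for (vus_a, rps_a), (_, rps_b) in zip(results, results[1:]):
        if rps_b < rps_a * (1 + tolerance):
            return vus_a, rps_a
    return None

# Hypothetical stepped load test: (virtual users, measured req/s).
results = [(100, 400), (200, 780), (300, 950), (400, 950), (500, 950)]

plateau = find_plateau(results)
print(plateau)
```

Here throughput stops growing at 300 VUs, so any further VUs only add queueing and latency rather than completed requests.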
Common bottleneck sources:
CPU-bound: all CPU cores are fully utilised; additional requests queue
Database I/O: DB queries are the slowest component; more app instances don't help if they all wait on the same DB
Network bandwidth: the network link is saturated
Single-threaded component: a component that can't parallelize (e.g. a global lock)
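How to diagnose: Check CPU, memory, DB connection pool, and network metrics during the plateau. The resource that is pinned at or near its limit is usually the bottleneck. If nothing looks saturated, suspect lock contention or a slow downstream dependency.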
Throughput ceiling vocabulary:
"Throughput plateaued at [X] req/s — the system is bottlenecked at [component]."
"Adding more capacity at the application tier won't help until we address the DB bottleneck."
"The system is I/O-bound — horizontal scaling of the app servers won't improve throughput."