5 exercises — practice reading and explaining p50/p95/p99 percentile latency, tail latency, and the "99th percentile latency" pattern in monitoring and SLA contexts.
Key percentile vocabulary for performance discussions
"p50 = median: half of requests are faster, half are slower."
"p99 = tail latency: 99% of requests are under this value; the slowest 1% take longer."
"At 1,000 RPS, 1% of traffic is 10 requests per second that are slower than the p99 value."
"Fat tail: when p99.9 is 100x higher than p50, the distribution is probably bimodal, with a distinct slow mode."
"Percentiles are not additive: p95 of A + p95 of B ≠ p95 of A+B (requires measurement)."
1 / 5
A monitoring alert fires: "p99 latency: 4,200ms." The p50 is 45ms. What does this tell you?
Option B correctly interprets the p50/p99 gap. Key insights: (1) p50 = 45ms means half of requests complete in under 45ms, which is the "normal" experience; (2) p99 = 4,200ms means the slowest 1% of requests take 4.2 seconds or longer; (3) the 93x ratio between p50 and p99 is the alarming signal: latency does not degrade gradually, which suggests a bimodal distribution with specific slow cases; (4) likely causes of severe tail latency include garbage collection pauses (JVM languages), lock contention on hot data, cold caches, and database queries filtering on unindexed columns. Option A incorrectly averages two percentiles, which is statistically invalid. Option C dismisses "only 1%", but at 1,000 RPS, 1% is 10 requests per second taking over 4 seconds. Option D misreads p99: it means 99% of requests are faster than 4,200ms and only 1% are slower.
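A minimal sketch of how a small slow mode produces this kind of gap. The sample data is hypothetical (980 fast requests and 20 slow ones, chosen only for illustration), and `percentile` is a simple nearest-rank helper, not the method of any particular monitoring tool.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical data: 980 fast requests (30-60 ms) plus 20 slow ones (4,000 ms and up).
fast = [30 + i % 31 for i in range(980)]
slow = [4000 + 10 * i for i in range(20)]
latencies_ms = fast + slow

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)

print(f"p50={p50} ms  p99={p99} ms  ratio={p99 / p50:.0f}x  mean={mean:.1f} ms")
```

In this sketch the median stays in the fast mode while p99 lands in the slow mode, and the mean falls between the two modes where almost no real request sits. That is one reason averaging (of samples or of percentiles) says little about what users experience.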
2 / 5
Why do teams monitor p99 (or p99.9) latency rather than just p50 (median)?
Option B gives the correct rationale for tracking p99. The key framing: percentiles translate into counts of affected requests, not abstract statistics. At 1,000 requests per second, p95 issues affect 50 requests/second, p99 issues affect 10 requests/second, and p99.9 issues affect 1 request/second. At scale, even p99.9 represents significant real user impact. Additionally: (1) SLAs are often written in percentile terms ("99.9% of requests under 200ms"); (2) tail latency is where investigation is most valuable, because if p50 is fine but p99 is bad, a specific code path is usually causing the problem; (3) the median can look fine even when user experience is poor, because it hides the worst cases. Option A is wrong: measuring a high percentile is not meaningfully harder than measuring the median.
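The arithmetic behind those counts, as a short sketch. The function name `requests_over_percentile` is made up for this example, and the per-day figure assumes a constant request rate, which real traffic rarely has.

```python
def requests_over_percentile(rps, percentile):
    """Requests per second that are slower than the given percentile threshold."""
    return rps * (100 - percentile) / 100

SECONDS_PER_DAY = 24 * 60 * 60

for p in (95, 99, 99.9):
    per_second = requests_over_percentile(1000, p)
    per_day = per_second * SECONDS_PER_DAY
    print(f"p{p}: about {per_second:.0f} requests/second, {per_day:,.0f} per day")
```

Even the p99.9 row, at roughly one request per second, adds up to tens of thousands of slow requests per day at this traffic level.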
3 / 5
An SLA states: "p95 response time under 200ms." A new microservice call adds 80ms to the critical path. What's the impact?
Option B gives the nuanced correct answer. The important statistical subtlety: percentiles are not simply additive across services. If Service A p95 = 100ms and Service B p95 = 80ms, the combined p95 is NOT necessarily 180ms. If the two services are independent, both are at or beyond their p95 at the same time for only 0.25% of requests (0.05 × 0.05), not 5%, so the combined p95 is usually lower than the sum of the individual p95s. In practice, however, services are often correlated: under the same load spike they slow down together, so the independence assumption does not fully apply and the combined p95 can approach the sum. The correct answer is: measure under load. Option A is partially right (the impact does depend on existing latency) but doesn't quantify the risk. Option C is wrong in its "won't necessarily affect" framing: the added call will definitely affect p95, and the question is by how much. Option D is the worst engineering decision: adjusting SLAs to hide problems.
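A small simulation can make the independence point concrete. This is a sketch under loud assumptions: latencies are drawn from exponential distributions with arbitrary means (30 ms and 24 ms), and the "correlated" case is the extreme one where Service B's latency is a fixed multiple of Service A's. Real services fall somewhere in between, which is why measuring under load is still the answer.

```python
import random

def p95(samples):
    """Nearest-rank 95th percentile."""
    ordered = sorted(samples)
    return ordered[len(ordered) * 95 // 100 - 1]

random.seed(42)
N = 50_000

# Assumed latency models (illustrative only, not fitted to any real service).
a = [random.expovariate(1 / 30) for _ in range(N)]        # Service A, mean 30 ms
b_indep = [random.expovariate(1 / 24) for _ in range(N)]  # Service B, independent of A
b_corr = [0.8 * x for x in a]                             # Service B, moves in lockstep with A

sum_of_p95s_indep = p95(a) + p95(b_indep)
p95_of_sum_indep = p95([x + y for x, y in zip(a, b_indep)])

sum_of_p95s_corr = p95(a) + p95(b_corr)
p95_of_sum_corr = p95([x + y for x, y in zip(a, b_corr)])

print(f"independent: p95(A)+p95(B)={sum_of_p95s_indep:.1f}  p95(A+B)={p95_of_sum_indep:.1f}")
print(f"correlated:  p95(A)+p95(B)={sum_of_p95s_corr:.1f}  p95(A+B)={p95_of_sum_corr:.1f}")
```

With independent services the end-to-end p95 comes out below the sum of the two p95s. In the lockstep case the two numbers coincide, which is the pessimistic scenario a shared load spike pushes a system toward.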
4 / 5
A dashboard shows: "p50: 12ms, p95: 45ms, p99: 210ms, p99.9: 3,200ms." Which statement correctly characterises the system's latency profile?
Option B correctly characterises a "fat tail" latency distribution. The vocabulary to know: (1) fat tail = extreme values are much higher than a normal distribution would predict; (2) the 267x ratio between p50 (12ms) and p99.9 (3,200ms) is a classic sign of a bimodal distribution with a "slow mode" triggered by specific conditions; (3) actionability: the p99.9 cases are often the easiest to investigate, because they tend to have a specific cause rather than reflecting general slowness. Common causes of p99.9 fat tails: JVM garbage collection pauses, database query plan changes, cold-cache misses, network retries, background job contention. Option A is colloquially accurate but not technically precise. Option C makes the same error as before: averaging percentiles is statistically invalid. Option D dismisses p99.9, but at 1,000 RPS, p99.9 means 1 request per second taking 3.2 seconds or more.
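The ratios can be read straight off the dashboard values. The sketch below only redoes that arithmetic; the dictionary name `dashboard_ms` is made up for the example.

```python
# Dashboard values from the question, in milliseconds.
dashboard_ms = {"p50": 12, "p95": 45, "p99": 210, "p99.9": 3200}

names = list(dashboard_ms)
# Ratio between each percentile and the next one up.
step_ratios = {
    f"{lo}->{hi}": dashboard_ms[hi] / dashboard_ms[lo]
    for lo, hi in zip(names, names[1:])
}
overall = dashboard_ms["p99.9"] / dashboard_ms["p50"]

for step, ratio in step_ratios.items():
    print(f"{step}: {ratio:.1f}x")
print(f"p50->p99.9 overall: {overall:.0f}x")
```

The largest single jump is the last step, from p99 to p99.9, so the slow mode mostly lives in the slowest 1% of requests.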
5 / 5
How should you describe p50/p95/p99 latency metrics to a non-technical product manager?
Option B is the gold standard plain-English explanation of percentile metrics. The technique: (1) anchor to 1,000 requests — percentiles become intuitive counts at this scale, (2) translate to users — "950 out of 1,000 users" is more meaningful than "95th percentile," (3) name the slow users ("our slowest 10 users") — this makes p99 feel real, not abstract, (4) connect to work ("that's what we're optimising") — gives the PM a handle on why this number matters. Option A is technically correct but meaningless to a non-technical PM. Option C uses "distribution curve" — still technical. Option D calls p99 "basically the maximum" — it is not (p100 is the maximum; some systems have much higher maximums than p99).
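The "anchor to 1,000 requests" technique is mechanical enough to sketch in code. The helper `describe` and its wording are hypothetical, and the latency values reuse the dashboard numbers from the previous exercise purely as sample input.

```python
def describe(percentile, latency_ms, per=1000):
    """Turn a percentile reading into a plain-English count per `per` requests."""
    within = round(per * percentile / 100)
    slower = per - within
    return (
        f"Out of every {per:,} requests, {within} finish within "
        f"{latency_ms:g} ms and the slowest {slower} take longer."
    )

for pct, ms in ((50, 12), (95, 45), (99, 210)):
    print(describe(pct, ms))
```

Framing each number as a count of requests, or of users when one request maps to one user, keeps the conversation on who is affected and not on the statistics.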