Performance Engineering Vocabulary: From Baseline to Bottleneck
A comprehensive English vocabulary guide for performance engineers — baselines, regressions, percentile latency, saturation, load test types, and key performance metrics explained.
Performance Engineering Has a Precise Vocabulary
Performance engineering is the discipline of making software systems fast, stable, and scalable under real-world conditions. Like any engineering discipline, it has developed a precise vocabulary for describing problems, measurements, and solutions.
If English is not your first language, learning this vocabulary is not just about communication — it is about thinking clearly about performance problems. The right word often encodes an entire concept that would otherwise take a paragraph to explain.
Measurement Vocabulary
| Term | Definition |
|---|---|
| Baseline | A reference measurement taken under known, stable conditions, used for comparison |
| Benchmark | A standardised test or procedure for measuring performance, often for comparison between configurations |
| Regression | A performance degradation compared to a previous baseline — things that got worse |
| Improvement | A performance enhancement compared to a previous baseline — things that got better |
| Measurement noise | Variability in measurements caused by factors unrelated to the system under test |
| Steady state | The condition where a system’s performance metrics have stabilised and are no longer warming up |
| Warm-up period | The interval before steady state, during which JIT compilation, caches, and connection pools are initialising |
| Variance | The spread of measurement values around the mean; high variance points to unstable performance or a noisy test environment |
In performance engineering, “regression” is a specific term. It means performance has become measurably worse compared to a known baseline. When reviewing a pull request, writing “this change introduces a 15% p95 latency regression” is precise and actionable; “this change is slower” is not.
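To make the baseline, noise, and regression terms concrete, here is a minimal sketch of how a percentage regression might be computed and compared against run-to-run noise. The sample values, the function names, and the "three times the noise" threshold are all illustrative assumptions, not an established standard.

```python
from statistics import mean, stdev


def relative_change(baseline_runs, candidate_runs):
    """Fractional change of the candidate mean versus the baseline mean."""
    base = mean(baseline_runs)
    return (mean(candidate_runs) - base) / base


def noise_level(runs):
    """Coefficient of variation: run-to-run standard deviation over the mean."""
    return stdev(runs) / mean(runs)


# Hypothetical p95 latencies (ms) from five repeated steady-state runs each.
baseline_p95_ms = [200.0, 204.0, 198.0, 201.0, 197.0]
candidate_p95_ms = [231.0, 229.0, 233.0, 228.0, 229.0]

change = relative_change(baseline_p95_ms, candidate_p95_ms)
noise = noise_level(baseline_p95_ms)

# Assumed rule of thumb: report a regression only when the change is
# clearly larger than the measurement noise of the baseline itself.
is_regression = change > 3 * noise

print(f"p95 change: {change:+.1%}")
print(f"baseline noise: {noise:.1%}")
print(f"regression: {is_regression}")
```

With these made-up numbers the candidate is 15% slower at p95 while the baseline varies by roughly 1.4% between runs, so the change is well outside the noise.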
Latency Vocabulary
| Term | Definition |
|---|---|
| Latency | The time between a client sending a request and receiving a response |
| Round-trip time (RTT) | The total time for a packet to travel from sender to receiver and back |
| p50 (median) | The latency value below which 50% of requests fall |
| p95 | The latency value below which 95% of requests fall |
| p99 | The latency value below which 99% of requests fall |
| p99.9 | The latency value below which 99.9% of requests fall; sometimes loosely called “three nines”, a term more commonly used for 99.9% availability |
| Tail latency | Latency at the extreme high end of the distribution (p99 and above) |
| Jitter | Variability in latency over time — high jitter means unpredictable response times |
| Percentile distribution | The full statistical distribution of latency values across all measured requests |
The choice between p50, p95, and p99 depends on your user impact model. p50 (median) tells you the typical experience. p99 is the threshold that the slowest 1% of requests exceed, so it describes the tail of the distribution rather than the absolute worst case. For payment processing or authentication, p99 and p99.9 are typically the most relevant, because the slowest 1% of requests may represent real user pain.
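The difference between the percentiles is easier to see on a small data set. The sketch below uses the nearest-rank method, which is one of several common percentile definitions; the latency samples are invented for illustration.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies (ms): 90 fast requests, 8 slower, 2 very slow.
latencies_ms = [20.0] * 90 + [50.0] * 8 + [900.0] * 2

print("mean:", mean(latencies_ms))
for pct in (50, 95, 99):
    print(f"p{pct}:", percentile(latencies_ms, pct))
```

In this sample the median is 20 ms and the mean is 40 ms, yet p99 is 900 ms. Neither the mean nor the median reveals the two slow requests, which is why tail percentiles are reported separately.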
Throughput and Capacity Vocabulary
| Term | Definition |
|---|---|
| Throughput | The rate of successful work completed per unit of time (requests per second, transactions per second) |
| Capacity | The maximum throughput a system can sustain while meeting its latency and error rate targets |
| Saturation | The state where a resource is fully utilised and cannot absorb additional load without degradation |
| Bottleneck | The constrained resource that limits overall system throughput |
| Concurrency | The number of requests being processed simultaneously |
| Queue depth | The number of requests waiting for a resource that is currently busy |
| Headroom | The difference between current load and the system’s capacity limit |
| Little’s Law | The relationship: average concurrency = average throughput × average latency; useful for reasoning about queue depth and pool sizing |
Saturation is one of the four “golden signals” from Google’s SRE book (alongside latency, traffic, and errors). When a resource is saturated, adding more load causes latency to climb and errors to appear — it is the signature of a bottleneck.
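Little’s Law from the table above can be applied in both directions. The following sketch assumes steady state and average values; the traffic figures, pool size, and function names are hypothetical.

```python
def required_concurrency(throughput_rps, mean_latency_s):
    """Little's Law: average in-flight requests = throughput x latency."""
    return throughput_rps * mean_latency_s


def throughput_ceiling(concurrency_limit, mean_latency_s):
    """Little's Law rearranged: the most throughput a fixed number of
    concurrent slots can sustain at a given mean latency."""
    return concurrency_limit / mean_latency_s


# Hypothetical service: 1,800 requests/s at a 50 ms mean latency.
in_flight = required_concurrency(1800, 0.050)

# Hypothetical pool: 20 connections, each query holding one for 10 ms.
ceiling_rps = throughput_ceiling(20, 0.010)

print(f"average requests in flight: {in_flight:.0f}")
print(f"pool throughput ceiling: {ceiling_rps:.0f} queries/s")
```

Under these assumptions about 90 requests are in flight on average, and the 20-connection pool saturates at roughly 2,000 queries per second. Beyond that rate, requests start to queue for a connection.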
Load Test Type Vocabulary
| Test type | What it measures |
|---|---|
| Smoke test | Whether the system functions correctly at all under minimal load |
| Load test | Performance at expected normal and peak load |
| Stress test | Behaviour beyond expected peak; where does the system break? |
| Soak test | Long-duration stability; does performance degrade over hours or days? |
| Spike test | Response to sudden sharp load increases |
| Scalability test | How performance changes as hardware resources are added |
| Endurance test | Synonym for soak test; emphasises the extended duration |
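The test types differ mainly in the shape of the load over time. The sketch below expresses each one as a target request rate per minute; every number and the `load_profile` function itself are illustrative assumptions, not the API of any load-testing tool.

```python
def load_profile(test_type, minute, peak_rps=1000):
    """Target request rate at a given minute for a hypothetical test shape."""
    if test_type == "smoke":
        # Minimal constant load: just enough to prove the system works.
        return max(1, peak_rps // 100)
    if test_type == "load":
        # Ramp to expected peak over 10 minutes, then hold.
        return min(peak_rps, peak_rps * minute // 10)
    if test_type == "stress":
        # Start at peak and step up by 25% of peak every 5 minutes.
        return peak_rps + (minute // 5) * peak_rps // 4
    if test_type == "soak":
        # Steady, realistic load held for hours or days.
        return peak_rps * 8 // 10
    if test_type == "spike":
        # Half of peak, with a two-minute burst at three times peak.
        return 3 * peak_rps if 10 <= minute < 12 else peak_rps // 2
    raise ValueError(f"unknown test type: {test_type}")


for name in ("smoke", "load", "stress", "soak", "spike"):
    shape = [load_profile(name, m) for m in (0, 5, 10, 11, 15)]
    print(f"{name:>6}: {shape}")
```

A scalability test is not shown because it varies the resources given to the system rather than the load shape: the same profile is typically repeated against different hardware configurations.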
Resource Utilisation Vocabulary
| Term | Definition |
|---|---|
| CPU-bound | A bottleneck where CPU utilisation is the limiting factor |
| Memory-bound | A bottleneck where memory bandwidth or capacity is the limiting factor |
| I/O-bound | A bottleneck where disk or network I/O is the limiting factor |
| Lock contention | Performance degradation caused by threads waiting for shared locks |
| Cache miss rate | The proportion of cache lookups that fail to find the requested data |
| GC pressure | Excessive garbage collection activity reducing application throughput (relevant to JVM, .NET, Go) |
| Connection pool exhaustion | A state where all available database or network connections are in use, causing requests to queue |
Example Sentences
- “After the refactoring, we re-ran the benchmark suite and confirmed a 22% reduction in p99 latency — this is a meaningful improvement, not measurement noise.”
- “The stress test revealed that saturation occurs at approximately 3,200 RPS; our current production peak is 1,800 RPS, giving us about 44% headroom relative to capacity.”
- “The bottleneck is connection pool exhaustion on the read replica — increasing the pool size from 20 to 50 connections should push the saturation point well beyond our capacity requirements.”
- “We identified a p99.9 tail latency regression of 400 ms introduced in the last deploy — it is caused by a new synchronous call to the notification service that blocks the request path.”
- “The soak test revealed gradual GC pressure building over six hours; heap analysis shows retained objects in the session cache — we suspect a cache eviction bug.”
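A headroom percentage depends on which denominator is used, so it is worth stating which one is meant. Here is a minimal sketch using the figures from the stress-test sentence above; the function names are illustrative.

```python
def headroom_of_capacity(current_rps, capacity_rps):
    """Spare capacity as a fraction of the saturation point."""
    return (capacity_rps - current_rps) / capacity_rps


def growth_margin(current_rps, capacity_rps):
    """How much current load can grow before reaching saturation."""
    return (capacity_rps - current_rps) / current_rps


current_peak_rps = 1800
saturation_rps = 3200

print(f"headroom: {headroom_of_capacity(current_peak_rps, saturation_rps):.1%} of capacity")
print(f"growth margin: {growth_margin(current_peak_rps, saturation_rps):.1%} above current peak")
```

The spare 1,400 RPS is 43.75% of capacity, but it allows traffic to grow by about 78% from today’s peak. Both statements are correct, and they answer different questions.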
Talking About Performance in Team Discussions
Use precise vocabulary in performance discussions to avoid misunderstandings:
- Say “p99 latency regression” not “it got slower”
- Say “CPU-bound bottleneck at the API tier” not “the API is struggling”
- Say “20% headroom before saturation” not “we have some room left”
The more precise your vocabulary, the faster a team can diagnose and resolve performance issues — without spending meeting time clarifying what the problem actually is.