Performance Engineering Vocabulary: From Baseline to Bottleneck

A comprehensive English vocabulary guide for performance engineers — baselines, regressions, percentile latency, saturation, load test types, and key performance metrics explained.

Performance Engineering Has a Precise Vocabulary

Performance engineering is the discipline of making software systems fast, stable, and scalable under real-world conditions. Like any engineering discipline, it has developed a precise vocabulary for describing problems, measurements, and solutions.

If English is not your first language, learning this vocabulary is not just about communication — it is about thinking clearly about performance problems. The right word often encodes an entire concept that would otherwise take a paragraph to explain.


Measurement Vocabulary

| Term | Definition |
|---|---|
| Baseline | A reference measurement taken under known, stable conditions, used for comparison |
| Benchmark | A standardised test or procedure for measuring performance, often for comparison between configurations |
| Regression | A performance degradation compared to a previous baseline: things that got worse |
| Improvement | A performance enhancement compared to a previous baseline: things that got better |
| Measurement noise | Variability in measurements caused by factors unrelated to the system under test |
| Steady state | The condition where a system’s performance metrics have stabilised and are no longer warming up |
| Warm-up period | The interval before steady state, during which JIT compilation, caches, and connection pools are still initialising |
| Variance | The spread of measurement values around the mean (formally, the average squared deviation from the mean); high variance suggests instability or noisy measurements |
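To make warm-up and variance concrete, here is a minimal Python sketch that discards a fixed number of initial samples as warm-up and summarises the rest. The fixed warm-up count and the sample latencies are illustrative assumptions; in practice the warm-up period is usually identified by watching when metrics stabilise, not by a hard-coded count.

```python
from statistics import mean, pvariance


def steady_state_stats(samples_ms, warmup_count):
    """Return (mean, population variance) of samples after dropping the warm-up period.

    Assumes the first `warmup_count` samples belong to the warm-up period.
    """
    steady = samples_ms[warmup_count:]
    if not steady:
        raise ValueError("no samples left after discarding warm-up")
    return mean(steady), pvariance(steady)


# Illustrative latencies (ms): the first three requests are slow while caches warm up.
samples = [120.0, 95.0, 70.0, 41.0, 40.0, 39.0, 40.0, 40.0]

all_mean, all_var = mean(samples), pvariance(samples)
steady_mean, steady_var = steady_state_stats(samples, warmup_count=3)
print(f"including warm-up: mean={all_mean:.1f} ms, variance={all_var:.1f} ms^2")
print(f"steady state:      mean={steady_mean:.1f} ms, variance={steady_var:.1f} ms^2")
```

Including the warm-up samples inflates both the mean and the variance, which is one way a baseline and a candidate run that were warmed up differently can appear to differ when the system itself has not changed.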

A regression in performance engineering is a specific term. It means performance has become worse compared to a known baseline. When reviewing a pull request, writing “this change introduces a 15% p95 latency regression” is precise and actionable; “this change is slower” is not.
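One way to produce a statement like that is to compute the relative change of a percentile against the baseline and label it. This is a minimal sketch; the function names are hypothetical and the 5% noise threshold is an assumed placeholder, since a real threshold should come from the measured run-to-run variation of your own benchmarks.

```python
def percent_change(baseline, candidate):
    """Relative change from baseline, in percent (positive means the value went up)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


def describe_latency_change(metric, baseline_ms, candidate_ms, noise_threshold_pct=5.0):
    """Label a latency change; for latency, higher is worse, so an increase is a regression."""
    change = percent_change(baseline_ms, candidate_ms)
    if change > noise_threshold_pct:
        return f"{change:.0f}% {metric} latency regression"
    if change < -noise_threshold_pct:
        return f"{-change:.0f}% {metric} latency improvement"
    return f"{metric} latency change within {noise_threshold_pct:.0f}% (treat as noise)"


print(describe_latency_change("p95", baseline_ms=200.0, candidate_ms=230.0))
# prints: 15% p95 latency regression
```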


Latency Vocabulary

| Term | Definition |
|---|---|
| Latency | The time between a client sending a request and receiving a response |
| Round-trip time (RTT) | The total time for a packet to travel from sender to receiver and back |
| p50 (median) | The latency value below which 50% of requests fall |
| p95 | The latency value below which 95% of requests fall |
| p99 | The latency value below which 99% of requests fall |
| p99.9 | The latency value below which 99.9% of requests fall; sometimes called “three nines” |
| Tail latency | Latency at the extreme high end of the distribution (p99 and above) |
| Jitter | Variability in latency over time; high jitter means unpredictable response times |
| Percentile distribution | The full statistical distribution of latency values across all measured requests |

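The choice between p50, p95, and p99 depends on your user impact model. p50 (median) tells you the typical experience. p99 is the threshold that the slowest 1% of requests exceed; it is not the worst case, which can be much higher. Because a single page load or user session often involves many requests, far more than 1% of users may encounter p99-level latency: a page that makes 100 independent requests has roughly a 63% chance that at least one of them is slower than p99. For payment processing or authentication, p99 and p99.9 are typically the most relevant, because the slowest requests represent real user pain.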
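The following Python sketch computes percentiles with the nearest-rank method and shows how a mean can hide the tail. Monitoring tools differ in how they compute percentiles (many interpolate between samples or estimate from histogram buckets), so treat this as one illustrative definition rather than the method any particular tool uses; the latency data is made up.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


# Illustrative latencies (ms) for 100 requests: mostly fast, with a slow tail.
latencies = [20] * 90 + [50] * 5 + [200] * 4 + [900]

print(f"mean={mean(latencies):.1f} ms")  # mean=37.5 ms
for p in (50, 95, 99):
    print(f"p{p}={percentile(latencies, p)} ms")  # p50=20, p95=50, p99=200
```

Here the mean (37.5 ms) sits between p50 and p95 and gives no hint of the 900 ms request, which only appears at the very top of the distribution.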


Throughput and Capacity Vocabulary

| Term | Definition |
|---|---|
| Throughput | The rate of successful work completed per unit of time (requests per second, transactions per second) |
| Capacity | The maximum throughput a system can sustain while meeting its latency and error rate targets |
| Saturation | The state where a resource is fully utilised and cannot absorb additional load without degradation |
| Bottleneck | The constrained resource that limits overall system throughput |
| Concurrency | The number of requests being processed simultaneously |
| Queue depth | The number of requests waiting for a resource that is currently busy |
| Headroom | The difference between current load and the system’s capacity limit |
| Little’s Law | For a stable system, average concurrency = throughput × average latency (L = λW); useful for reasoning about concurrency and queue depth |
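Little’s Law holds for long-run averages in a stable system, which makes it handy for back-of-the-envelope capacity questions. A minimal Python sketch; the function names are hypothetical and the throughput, latency, and pool figures are illustrative assumptions, not measurements.

```python
def avg_concurrency(throughput_per_s, avg_latency_s):
    """Little's Law, L = lambda * W: average number of requests in flight."""
    return throughput_per_s * avg_latency_s


def max_throughput(concurrency_limit, avg_latency_s):
    """Rearranged, lambda = L / W: the throughput ceiling imposed by a fixed concurrency limit."""
    return concurrency_limit / avg_latency_s


# Serving 500 req/s at 200 ms average latency keeps about 100 requests in flight,
# so a worker pool much smaller than that cannot sustain this rate without queueing.
print(avg_concurrency(500, 0.200))

# A pool of 20 database connections, each held for 50 ms per request,
# caps throughput at about 400 req/s; beyond that, requests wait for a connection.
print(max_throughput(20, 0.050))
```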

Saturation is one of the four “golden signals” from Google’s SRE book (alongside latency, traffic, and errors). When a resource is saturated, adding more load causes latency to climb and errors to appear — it is the signature of a bottleneck.
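Why latency climbs as a resource approaches saturation can be illustrated with the textbook M/M/1 queue (a single server, Poisson arrivals, exponentially distributed service times), where mean time in system is W = S / (1 - ρ) for service time S and utilisation ρ. Real services rarely match these assumptions exactly, so this Python sketch shows the shape of the curve rather than a prediction for any particular system; the 10 ms service time is an assumed value.

```python
def mm1_mean_latency_s(service_time_s, utilisation):
    """Mean time in system for an M/M/1 queue: W = S / (1 - rho)."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1); at 1 the queue grows without bound")
    return service_time_s / (1 - utilisation)


SERVICE_TIME_S = 0.010  # assumed 10 ms to process one request with no queueing

for rho in (0.50, 0.80, 0.90, 0.95, 0.99):
    latency_ms = mm1_mean_latency_s(SERVICE_TIME_S, rho) * 1000
    print(f"utilisation {rho:.0%}: mean latency {latency_ms:.0f} ms")
```

In this model, going from 90% to 99% utilisation multiplies mean latency by ten, which is why capacity targets usually keep utilisation well below 100%.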


Load Test Type Vocabulary

| Test type | What it measures |
|---|---|
| Smoke test | Whether the system functions correctly at all under minimal load |
| Load test | Performance at expected normal and peak load |
| Stress test | Behaviour beyond expected peak: where does the system break? |
| Soak test | Long-duration stability: does performance degrade over hours or days? |
| Spike test | Response to sudden, sharp load increases |
| Scalability test | How performance changes as hardware resources are added |
| Endurance test | Synonym for soak test; emphasises the extended duration |
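The test types differ mainly in the shape of the load applied over time. The Python sketch below expresses each shape as a target request rate; the function name, the 1,000 RPS peak, and every duration and multiplier are hypothetical values chosen for illustration, not recommendations or the defaults of any load-testing tool. A scalability test is left out because it varies the resources under test rather than the load shape.

```python
def target_rps(test_type, t_s, peak_rps=1000.0):
    """Hypothetical target request rate t_s seconds into a test of the given type."""
    if test_type == "smoke":
        return 1.0                                    # minimal load: does it work at all?
    if test_type == "load":
        return peak_rps * min(1.0, t_s / 300)         # ramp to expected peak over 5 min, then hold
    if test_type == "stress":
        return peak_rps * (1 + t_s / 600)             # keep climbing past peak: 2x peak after 10 min
    if test_type == "spike":
        in_spike = 600 <= t_s < 660
        return peak_rps * (5.0 if in_spike else 0.5)  # one-minute jump to 10x the background rate
    if test_type in ("soak", "endurance"):
        return peak_rps * 0.8                         # steady, realistic load held for hours
    raise ValueError(f"unknown test type: {test_type}")


for kind in ("smoke", "load", "stress", "spike", "soak"):
    samples = [round(target_rps(kind, t)) for t in (0, 300, 630, 1200)]
    print(f"{kind:>6}: {samples}")
```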

Resource Utilisation Vocabulary

| Term | Definition |
|---|---|
| CPU-bound | A bottleneck where CPU utilisation is the limiting factor |
| Memory-bound | A bottleneck where memory bandwidth or capacity is the limiting factor |
| I/O-bound | A bottleneck where disk or network I/O is the limiting factor |
| Lock contention | Performance degradation caused by threads waiting for shared locks |
| Cache miss rate | The proportion of cache lookups that fail to find the requested data |
| GC pressure | Excessive garbage collection activity reducing application throughput (relevant to garbage-collected runtimes such as the JVM, .NET, and Go) |
| Connection pool exhaustion | A state where all available database or network connections are in use, causing requests to queue |
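Cache miss rate is easy to compute when the cache exposes hit and miss counters. Python’s standard `functools.lru_cache` does, through `cache_info()`, so it makes a convenient illustration; the helper name, access pattern, and cache sizes below are made up, and a production cache would expose its counters through its own interface.

```python
from functools import lru_cache


def miss_rate(cache_size, accesses):
    """Replay an access pattern against an LRU cache of the given size and return its miss rate."""

    @lru_cache(maxsize=cache_size)
    def lookup(key):
        return key * 2  # stand-in for an expensive fetch, such as a database read

    for key in accesses:
        lookup(key)
    info = lookup.cache_info()  # CacheInfo(hits, misses, maxsize, currsize)
    return info.misses / (info.hits + info.misses)


pattern = [1, 2, 1, 3, 1, 2]
for size in (2, 3):
    print(f"cache size {size}: miss rate {miss_rate(size, pattern):.0%}")
```

With room for only two entries, the cache evicts keys that are requested again shortly afterwards (4 misses out of 6 lookups); with three entries, the only misses are the three first-time lookups.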

Example Sentences

  1. “After the refactoring, we re-ran the benchmark suite and confirmed a 22% reduction in p99 latency — this is a meaningful improvement, not measurement noise.”
  2. “The stress test revealed that saturation occurs at approximately 3,200 RPS; our current production peak is 1,800 RPS, giving us about 1,400 RPS of headroom, roughly 44% of capacity.”
  3. “The bottleneck is connection pool exhaustion on the read replica — increasing the pool size from 20 to 50 connections should push the saturation point well beyond our capacity requirements.”
  4. “We identified a p99.9 tail latency regression of 400 ms introduced in the last deploy — it is caused by a new synchronous call to the notification service that blocks the request path.”
  5. “The soak test revealed gradual GC pressure building over six hours; heap analysis shows retained objects in the session cache — we suspect a cache eviction bug.”
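Headroom percentages depend on the denominator, so it helps to say which one you mean. Using the figures from the second example sentence (3,200 RPS at saturation, 1,800 RPS current peak), a short Python calculation with a hypothetical helper:

```python
def headroom(current_peak_rps, saturation_rps):
    """Return spare RPS, spare as a share of capacity, and spare as growth over current peak."""
    spare = saturation_rps - current_peak_rps
    return spare, spare / saturation_rps, spare / current_peak_rps


spare, share_of_capacity, growth_over_peak = headroom(1800, 3200)
print(f"spare capacity: {spare} RPS")
print(f"as a share of capacity: {share_of_capacity:.0%}")     # how far below saturation we run
print(f"as growth over current peak: {growth_over_peak:.0%}")  # how much traffic can still grow
```

The two framings (44% versus 78%) describe the same situation, which is why a headroom figure is clearer with its reference point stated.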

Talking About Performance in Team Discussions

Use precise vocabulary in performance discussions to avoid misunderstandings:

  • Say “p99 latency regression” not “it got slower”
  • Say “CPU-bound bottleneck at the API tier” not “the API is struggling”
  • Say “20% headroom before saturation” not “we have some room left”

The more precise your vocabulary, the faster a team can diagnose and resolve performance issues — without spending meeting time clarifying what the problem actually is.