Performance Testing Vocabulary: k6, JMeter, and Load Testing Language
Master the English vocabulary for performance testing — throughput, p99 latency, virtual users, ramp-up, and the language of load, stress, soak, and smoke tests.
Performance Testing: Why Vocabulary Precision Matters
Performance testing results mean nothing if you cannot communicate them clearly. Terms like “the system was slow” or “it couldn’t handle much load” are too vague to drive engineering decisions. Performance engineers and QA professionals use a precise vocabulary that quantifies behaviour and guides remediation. This guide covers the core terms you need to discuss performance test results, design test plans, and interpret reports in English.
Core Metrics Vocabulary
Throughput
Throughput is the number of requests or transactions a system processes per unit of time. It is typically expressed as requests per second (RPS) or transactions per second (TPS).
“At peak load, the checkout service achieved a throughput of 1,200 requests per second before response times began to degrade.”
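Here is a rough sketch of how a throughput figure is derived. It assumes you have recorded each request's completion time in seconds since the test started. The function names are hypothetical and not part of k6 or JMeter, both of which report throughput for you. The sketch computes two things: the average rate over a whole run, and a per-second series, which is where a peak value such as the 1,200 RPS above would be read from.

```python
from collections import Counter
from typing import Iterable, List


def mean_throughput(completed_requests: int, duration_s: float) -> float:
    """Average throughput in requests per second over the whole run."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return completed_requests / duration_s


def per_second_throughput(completion_times_s: Iterable[float]) -> List[int]:
    """Completed requests in each one-second bucket, from second 0 onwards.

    Assumes timestamps are non-negative seconds since the test started.
    """
    buckets = Counter(int(t) for t in completion_times_s)
    if not buckets:
        return []
    return [buckets.get(second, 0) for second in range(max(buckets) + 1)]


# 72,000 requests completed in a 60-second window is 1,200 RPS on average.
print(mean_throughput(72_000, 60))                             # 1200.0
print(per_second_throughput([0.1, 0.5, 0.9, 1.2, 2.7, 2.8]))   # [3, 1, 2]
```

The per-second series matters because an average taken over a run that includes ramp-up understates the rate the system sustained at its peak.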
Latency and Percentiles
Latency is the time between sending a request and receiving its response. A single summary number such as the average is rarely meaningful on its own, because it hides how latency is distributed across requests.
Percentile latency is more informative than an average because high percentiles reveal what the worst-affected users experience:
- p50 (median) — 50% of requests completed within this time.
- p95 — 95% of requests completed within this time.
- p99 — 99% of requests completed within this time. “Our SLO requires p99 latency to remain below 500ms under normal load.”
- p999 (99.9th percentile) — only 0.1% of requests are slower than this. Used for systems requiring very low tail latency.
Tail latency — the latency experienced by the slowest requests, represented by high percentiles. High tail latency often indicates resource contention, GC pauses, or cold code paths.
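The following minimal Python sketch makes the percentile definitions concrete. It uses the nearest-rank method and assumes latency samples in milliseconds. Tools differ in how they compute percentiles (some interpolate between samples), so treat this as one common definition rather than the exact algorithm k6 or JMeter uses. The example data is made up to show how a mean can hide a slow tail.

```python
import math
from fractions import Fraction
from typing import Sequence


def percentile(samples_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    # Exact arithmetic, so a float like 99.9 cannot nudge the rank up by one.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up run: 98 fast requests and 2 very slow ones.
latencies = [100] * 98 + [2000] * 2
print(sum(latencies) / len(latencies))  # 138.0 (the mean looks healthy)
print(percentile(latencies, 50))        # 100
print(percentile(latencies, 99))        # 2000 (the tail is not)
```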
Saturation Point
The saturation point is the load level at which a critical resource (CPU, memory, connections) becomes fully utilised, so that throughput plateaus or response times increase sharply. “The saturation point for the API gateway was reached at approximately 800 concurrent virtual users.”
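In a stepped load test, one way to locate the saturation point is to find the step where adding virtual users no longer buys proportionally more throughput. The Python sketch below applies a hypothetical rule of thumb (less than 5% throughput gain over the previous step) to illustrative, made-up measurements. The function name and the 5% cut-off are assumptions. In practice you would confirm the result against resource metrics such as CPU or connection-pool usage.

```python
from typing import Optional, Sequence, Tuple


def find_saturation_bracket(
    steps: Sequence[Tuple[int, float]],
    min_gain: float = 0.05,
) -> Optional[Tuple[int, int]]:
    """Return (last VU level that still scaled, first VU level that did not).

    `steps` holds (virtual users, measured throughput in RPS), ordered by load.
    A step "does not scale" if throughput grows by less than `min_gain`
    (a hypothetical 5% default) over the previous step.
    """
    for (prev_vus, prev_rps), (vus, rps) in zip(steps, steps[1:]):
        if vus > prev_vus and rps < prev_rps * (1 + min_gain):
            return prev_vus, vus
    return None  # throughput kept scaling across every step tested


# Illustrative numbers only, not real measurements.
measurements = [(200, 400.0), (400, 790.0), (600, 1150.0), (800, 1180.0), (1000, 1175.0)]
print(find_saturation_bracket(measurements))  # (600, 800)
```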
Error Rate
Error rate is the percentage of requests that fail, either by returning an error response (typically a 4xx or 5xx HTTP status code) or by not completing at all, for example because of a timeout or a refused connection. “During the stress test, the error rate climbed from 0.1% to 8% as the system passed its saturation point.”
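The calculation itself is simple. The judgement lies in deciding which responses count as errors. This minimal Python sketch assumes a list of HTTP status codes and treats status 0 as a stand-in for requests that got no response, which is an assumed convention. It also adds a hypothetical `count_4xx` switch, because some teams exclude expected client errors (for example, a deliberate 404 check) from the headline rate. The status codes in the example are made up.

```python
from typing import Iterable


def error_rate(status_codes: Iterable[int], count_4xx: bool = True) -> float:
    """Percentage of responses treated as errors.

    Status 0 is treated as an error too, standing in for requests that never
    received a response (timeouts, refused connections).
    """
    codes = list(status_codes)
    if not codes:
        raise ValueError("no responses recorded")
    lowest_error = 400 if count_4xx else 500
    errors = sum(1 for code in codes if code == 0 or code >= lowest_error)
    return 100.0 * errors / len(codes)


codes = [200] * 95 + [404] * 2 + [503] * 3
print(error_rate(codes))                   # 5.0
print(error_rate(codes, count_4xx=False))  # 3.0
```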
Test Types Vocabulary
Virtual User (VU)
A virtual user (VU) is a simulated user in a load testing tool such as k6 or JMeter (JMeter models them as threads in a Thread Group). Each virtual user executes the test scenario independently and concurrently with the others. “We ramped from 0 to 500 virtual users over five minutes to simulate gradual traffic growth.”
Ramp-Up
Ramp-up is the phase at the start of a test during which load is gradually increased to the target level. Ramping up avoids an instantaneous spike of load, which rarely reflects how real users arrive.
“The ramp-up period is set to 10 minutes so the JVM's JIT compiler can warm up before we measure steady-state performance.”
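Load tools typically describe ramp-up as a series of stages, each with a duration and a target number of virtual users, and interpolate linearly between them (k6's `stages` option behaves this way). The Python sketch below is not k6 code. It is a hypothetical helper that computes the target VU count at any moment for the 0-to-500-over-five-minutes ramp quoted above, followed by a hold period.

```python
from typing import Sequence, Tuple

# (duration in seconds, target VUs at the end of the stage)
Stage = Tuple[float, int]


def target_vus(stages: Sequence[Stage], t: float, start_vus: int = 0) -> int:
    """VU count at time t >= 0 (seconds since start), linearly interpolated per stage."""
    current, elapsed = start_vus, 0.0
    for duration, target in stages:
        if t <= elapsed + duration:
            fraction = (t - elapsed) / duration if duration > 0 else 1.0
            return round(current + (target - current) * fraction)
        elapsed += duration
        current = target
    return current  # past the last stage: stay at the final target


# Ramp 0 -> 500 VUs over 5 minutes, then hold 500 VUs for 10 minutes.
schedule = [(300, 500), (600, 500)]
print([target_vus(schedule, t) for t in (0, 60, 150, 300, 900)])
# [0, 100, 250, 500, 500]
```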
Test Types
Smoke test — a minimal load test (e.g., 1–5 virtual users) run to verify that the test script and system are functioning correctly before a full test run. “We run a smoke test on every deployment to catch configuration regressions before the nightly load test.”
Load test — a test at the expected normal or peak traffic level, used to verify that performance meets SLOs. “The load test runs at 80% of our projected peak traffic for 30 minutes.”
Stress test — a test that pushes the system beyond its normal operating capacity to find the breaking point and observe failure behaviour. “The stress test revealed that the database connection pool was the first resource to saturate.”
Soak test (also called endurance test) — a test that runs at a sustained load level for an extended period (hours to days) to detect memory leaks, slow resource exhaustion, and other time-dependent issues. “The overnight soak test revealed a memory leak in the session cache that was not visible in 30-minute tests.”
Spike test — a test that applies a sudden, sharp increase in load to simulate a flash crowd event. “We ran a spike test to verify the auto-scaling policy would respond before response times breached the SLO.”
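The five test types differ mainly in the shape of the load over time. As an illustration, the Python below encodes each shape as hypothetical (duration in minutes, target VUs) stages. Every number is a made-up placeholder chosen to show the pattern, not a recommended value.

```python
from typing import Dict, List, Tuple

# (duration in minutes, target VUs reached by the end of the stage)
Stages = List[Tuple[float, int]]

PROFILES: Dict[str, Stages] = {
    # Tiny load, just long enough to prove the script and system work.
    "smoke": [(1, 2)],
    # Ramp to the expected traffic level, hold, ramp down.
    "load": [(5, 400), (30, 400), (5, 0)],
    # Keep stepping past expected capacity to find the breaking point.
    "stress": [(5, 400), (5, 800), (5, 1200), (5, 1600), (5, 0)],
    # Moderate load held for hours to surface leaks and slow exhaustion.
    "soak": [(10, 400), (8 * 60, 400), (10, 0)],
    # Near-instant jump to a flash-crowd level, then back down.
    "spike": [(2, 100), (0.5, 2000), (3, 2000), (0.5, 100), (5, 100)],
}


def peak_vus(stages: Stages) -> int:
    return max(target for _, target in stages)


def total_minutes(stages: Stages) -> float:
    return sum(duration for duration, _ in stages)


for name, stages in PROFILES.items():
    print(f"{name:>6}: peak {peak_vus(stages):>5} VUs over {total_minutes(stages):g} min")
```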
k6 and JMeter Language
Script — in k6, a JavaScript file defining the test scenario. In JMeter, a test plan (.jmx file) serves the same purpose.
Think time — a pause inserted between requests in a virtual user script to simulate realistic user behaviour. “Without think time, virtual users hammer the API at maximum rate, which doesn’t reflect real usage patterns.”
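The interactive response time law from queueing theory gives a way to estimate how think time affects throughput. Consider a closed workload where each of N virtual users waits for a response (R) and then thinks (Z) before sending the next request. Its steady-state throughput is X = N / (R + Z). The minimal Python sketch below assumes R is a measured average response time that stays roughly constant, which stops being true past the saturation point.

```python
def closed_loop_throughput(virtual_users: int, response_time_s: float, think_time_s: float) -> float:
    """Steady-state requests per second for a closed workload: X = N / (R + Z)."""
    cycle_s = response_time_s + think_time_s
    if cycle_s <= 0:
        raise ValueError("response time plus think time must be positive")
    return virtual_users / cycle_s


# 500 VUs with a 200 ms average response time (illustrative figures).
print(closed_loop_throughput(500, 0.2, 0.0))  # 2500.0 RPS with no think time
print(closed_loop_throughput(500, 0.2, 2.0))  # about 227 RPS with 2 s of think time
```

The drop from about 2,500 to about 227 RPS shows why a script without think time can generate far more load than the same number of real users would.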
Threshold — in k6, a pass/fail criterion applied to a metric. “We set a threshold of p99 < 500ms and error rate < 1% so the CI pipeline fails automatically if performance regresses.”
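In a k6 script, the thresholds in that quote would be declared under `options.thresholds`, for example `'p(99)<500'` on `http_req_duration` and `'rate<0.01'` on `http_req_failed`. k6 exits with a non-zero status when a threshold fails, which is what fails the CI job. The Python below shows the pass/fail logic without depending on k6. It is a hypothetical, much-simplified evaluator for just those two expression forms, not how k6 is implemented, and its nearest-rank percentile may differ slightly from k6's.

```python
import math
import re
from fractions import Fraction
from typing import Dict, List, Sequence

# Accepts only "p(N)<X", "p(N)<=X", "rate<X" and "rate<=X" (spaces ignored).
_EXPR = re.compile(r"^(?:p\((\d+(?:\.\d+)?)\)|(rate))(<=|<)(\d+(?:\.\d+)?)$")


def _nearest_rank(samples: Sequence[float], p: str) -> float:
    ordered = sorted(samples)
    rank = max(1, math.ceil(Fraction(p) * len(ordered) / 100))
    return ordered[rank - 1]


def check_thresholds(
    thresholds: Dict[str, List[str]],
    metrics: Dict[str, Sequence[float]],
) -> Dict[str, bool]:
    """Evaluate each threshold expression; True means the threshold passed."""
    results: Dict[str, bool] = {}
    for metric, expressions in thresholds.items():
        samples = metrics[metric]
        if not samples:
            raise ValueError(f"no samples for {metric}")
        for expr in expressions:
            match = _EXPR.match(expr.replace(" ", ""))
            if match is None:
                raise ValueError(f"unsupported expression: {expr}")
            pct, is_rate, op, limit = match.groups()
            if is_rate:
                value = sum(samples) / len(samples)  # samples are 1 (failed) or 0
            else:
                value = _nearest_rank(samples, pct)
            passed = value < float(limit) if op == "<" else value <= float(limit)
            results[f"{metric} {expr}"] = passed
    return results


# Made-up samples: durations in ms, failures as 0/1 flags.
metrics = {
    "http_req_duration": [120.0] * 990 + [900.0] * 10,
    "http_req_failed": [0] * 995 + [1] * 5,
}
thresholds = {
    "http_req_duration": ["p(99)<500"],
    "http_req_failed": ["rate<0.01"],
}
results = check_thresholds(thresholds, metrics)
print(results)
# In a CI step you might then call: sys.exit(0 if all(results.values()) else 1)
```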
Five Example Sentences
- “The load test results show that p99 latency remains below 300ms up to 600 virtual users but climbs sharply to 1.8 seconds at 800 users, indicating that the saturation point lies between 600 and 800 virtual users.”
- “We always run a smoke test after deploying to the staging environment to confirm the test script itself is working before committing to a two-hour soak test.”
- “The stress test exposed a bottleneck in the connection pool configuration — the database ran out of available connections at 350 concurrent virtual users.”
- “After adding a 2-second think time between requests, the test scenario more accurately reflected real user behaviour and our throughput numbers became much more realistic.”
- “Our k6 threshold on p95 latency is set 20% above the current baseline, so any deployment that degrades p95 by more than 20% fails the CI performance gate automatically.”
Communicating Results
When presenting performance test results to stakeholders, always contextualise numbers. “P99 is 480ms” is less useful than “p99 latency is 480ms against our SLO of 500ms — we have a 20ms margin, which we consider adequate for the current traffic projection but will need to revisit before the marketing campaign in Q3.”
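Because the arithmetic is the same every time, some teams generate this kind of contextual sentence in their test reports. Here is a small Python sketch, with a hypothetical function name, that states the observed value, the SLO, and the remaining headroom in both milliseconds and percent.

```python
def slo_summary(metric: str, observed_ms: float, slo_ms: float) -> str:
    """Phrase a latency result relative to its SLO, including the remaining margin."""
    if slo_ms <= 0:
        raise ValueError("SLO must be positive")
    margin_ms = slo_ms - observed_ms
    if margin_ms >= 0:
        headroom_pct = 100.0 * margin_ms / slo_ms
        return (
            f"{metric} latency is {observed_ms:g}ms against our SLO of {slo_ms:g}ms "
            f"({margin_ms:g}ms, {headroom_pct:.1f}% headroom)"
        )
    return (
        f"{metric} latency is {observed_ms:g}ms, exceeding our SLO of {slo_ms:g}ms "
        f"by {-margin_ms:g}ms"
    )


print(slo_summary("p99", 480, 500))
# p99 latency is 480ms against our SLO of 500ms (20ms, 4.0% headroom)
```

The generated sentence still leaves out what the numbers cannot supply: a judgement on whether the margin is adequate for upcoming traffic.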