Latency vs Throughput: English Usage Guide for IT Professionals
Latency measures how long one request takes; throughput measures how many requests a system handles per second. A system can have low latency but low throughput (fast for one user, slow under load) or high throughput but high latency (processes many requests, each slowly). Both matter, but for different reasons: latency shapes what each user experiences, while throughput determines how much total load the system can carry.
Side-by-side comparison
| Aspect | Latency | Throughput |
|---|---|---|
| What it measures | Time for one operation (ms) | Operations per unit of time (req/s) |
| Analogy | How fast one car travels | How many cars pass a toll booth per hour |
| Optimised by | Caching, reducing hops, faster code | Parallelism, horizontal scaling, queues |
| Key metric form | p50, p95, p99 in milliseconds | Requests per second (RPS), TPS |
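The two metric forms in the table can both be computed from the same request log. The sketch below assumes a hypothetical log of (start, end) timestamps in milliseconds, collected over a one-second window. It uses the nearest-rank percentile definition; monitoring tools may interpolate between samples instead, so their numbers can differ slightly.

```python
import math

def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]

# Hypothetical log: (start_ms, end_ms) for each request completed in a 1-second window.
requests_ms = [
    (0, 40), (80, 110), (150, 600), (200, 230), (310, 345),
    (400, 420), (520, 575), (610, 640), (700, 735), (850, 880),
]
window_s = 1.0

# Latency: time taken by each individual request, in milliseconds.
latencies_ms = sorted(end - start for start, end in requests_ms)
# Throughput: completed requests per second over the whole window.
throughput_rps = len(requests_ms) / window_s

for p in (50, 95, 99):
    print(f"p{p} latency: {percentile(latencies_ms, p)} ms")
print(f"throughput: {throughput_rps:.0f} req/s")
```

With only ten samples, p95 and p99 both land on the single slowest request. Real dashboards need far more samples before tail percentiles become meaningful.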
Example sentences
Latency
- "Our p99 latency is 450 ms: the slowest 1% of requests take at least 450 ms, nearly half a second."
- "Adding a Redis cache cut database latency from 80 ms to under 2 ms."
Throughput
- "After horizontal scaling, throughput increased from 500 to 3,000 requests per second."
- "The message queue increases throughput by letting workers process jobs in parallel."
Exercises: choose the correct English usage
Answer each question, then compare your reasoning with the explanation.
1. A user complains the page "takes forever to load". They are describing high ___.
Explanation: "Latency" — the delay experienced for a single request.
2. An engineer says "we need to handle 10,000 requests per second." They are talking about ___.
Explanation: "Throughput" — the volume of requests the system can handle per unit of time.
3. Which metric uses "p99"?
Explanation: Latency. "p99 latency" (99th-percentile latency) is the response time that 99% of requests meet or beat; only the slowest 1% take longer.
4. Which sentence uses "throughput" correctly?
Explanation: Adding workers increases concurrency and therefore throughput (more work done per second).
5. True or false: reducing latency always increases throughput.
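Explanation: False. Latency and throughput are related but not locked together, so you can improve one without improving the other. For example, a system that is not saturated already handles every request as it arrives, so faster responses leave its throughput unchanged.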
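The answers to exercises 4 and 5 can be checked with a simplified model. This is a sketch, not a benchmark. A *closed* system has a fixed number of requests always in flight, and Little's law (requests in flight = throughput × latency) applies to it. An *open* system instead receives requests at its own rate, and its long-run throughput is that arrival rate, capped by capacity. The function names and numbers below are illustrative assumptions.

```python
def closed_system_throughput(concurrency, latency_s):
    """Little's law solved for throughput: with a fixed number of requests
    always in flight, throughput = concurrency / latency."""
    return concurrency / latency_s

def open_system_throughput(arrival_rps, service_time_s, workers=1):
    """Simplified open system: long-run throughput equals the arrival rate,
    capped by total service capacity (workers / service time)."""
    capacity_rps = workers / service_time_s
    return min(arrival_rps, capacity_rps)

# Closed system: 10 workers, each always busy. Halving latency doubles throughput.
print(closed_system_throughput(10, 0.100), closed_system_throughput(10, 0.050))

# Open system at 50 req/s, well below capacity. Halving latency changes nothing.
print(open_system_throughput(50, 0.010), open_system_throughput(50, 0.005))

# Exercise 4: more workers raise capacity, which matters once arrivals exceed it.
print(open_system_throughput(500, 0.010, workers=1),
      open_system_throughput(500, 0.010, workers=4))
```

Reducing latency raises throughput only when something else, here fixed concurrency, holds the system at its limit. That is why the answer to exercise 5 is "false" rather than "true".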
Frequently asked questions
What is the difference between latency and response time?
"Response time" usually means the total time from request to response, including processing. "Latency" often refers specifically to the network delay component, though in practice the terms are used interchangeably.
What does p99 mean?
"p99 latency" is the 99th-percentile latency: 99% of requests complete at or below this value. It captures the tail experience without being skewed by the rare extreme outliers in the top 1%.
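To see why p99 is preferred over the mean, compare the two on a small hypothetical sample of 100 latencies. The sample contains one genuinely slow request and one extreme outlier. The sketch uses the nearest-rank percentile definition, which is an assumption; tools may interpolate instead.

```python
import math
import statistics

def nearest_rank(values, p):
    """Nearest-rank percentile of an unsorted list of samples."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical sample: 98 fast requests, one slow request, one 30-second outlier.
latencies_ms = [20] * 98 + [300] + [30_000]

print("mean:", statistics.mean(latencies_ms))
print("p99: ", nearest_rank(latencies_ms, 99))
print("max: ", max(latencies_ms))
```

The single outlier drags the mean (322.6 ms) above every other sample, even though 98% of requests took 20 ms. The p99 value still reports the real 300 ms tail.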
How is throughput measured?
Requests per second (RPS), transactions per second (TPS), or messages per second (MPS), depending on the system.
Can high throughput coexist with high latency?
Yes. A batch-processing system might handle millions of records per hour (high throughput) while each record takes seconds to process (high latency). Streaming pipelines aim for high throughput and low latency at the same time.
What is "the latency-throughput trade-off"?
Batching requests increases throughput but adds latency (you wait for the batch to fill). Real-time systems prioritise low latency at the cost of lower throughput.
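The batching trade-off can be made concrete with a toy model. Every parameter and name here is a hypothetical assumption: each batch pays a fixed 5 ms overhead plus 0.1 ms per item, requests arrive evenly at 100 per second, and queueing behind earlier batches is ignored.

```python
def batch_tradeoff(batch_size, overhead_ms, per_item_ms, arrival_rps):
    """Return (capacity_rps, mean_latency_ms) for a hypothetical batching worker.

    Model assumptions: every batch pays a fixed overhead plus a per-item cost,
    requests arrive evenly spaced, and queueing behind earlier batches is ignored.
    """
    processing_ms = overhead_ms + batch_size * per_item_ms
    # Capacity: items per second the worker can sustain when batches are always full.
    capacity_rps = batch_size / processing_ms * 1000
    # On average an item waits (B - 1) / 2 arrival gaps for its batch to fill.
    gap_ms = 1000 / arrival_rps
    fill_wait_ms = (batch_size - 1) / 2 * gap_ms
    return capacity_rps, fill_wait_ms + processing_ms

for b in (1, 10, 100):
    cap, lat = batch_tradeoff(b, overhead_ms=5.0, per_item_ms=0.1, arrival_rps=100)
    print(f"batch={b:>3}  capacity={cap:,.0f} req/s  mean latency={lat:.1f} ms")
```

Larger batches spread the fixed overhead over more items, so capacity rises from about 196 to about 6,667 requests per second. Each item, however, now waits for its batch to fill, so mean latency grows from 5.1 ms to 510 ms.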
What is "end-to-end latency"?
The total delay from when a user action starts to when they see the result — including network, server processing, and rendering time.
What does "saturate" mean in this context?
"The system is saturated" means throughput has hit its limit: adding more requests only makes them queue longer, which raises latency without increasing the work done.
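Saturation can be illustrated with the textbook M/M/1 queue, which is swapped in here as an assumption. That model has one server, random (Poisson) arrivals, and exponentially distributed service times. Its mean response time is 1 / (service rate − arrival rate), and it only holds while arrivals stay below the service rate.

```python
def mm1(arrival_rps, service_rps):
    """Throughput (req/s) and mean response time (ms) for an M/M/1 queue."""
    if arrival_rps >= service_rps:
        # Saturated: throughput is capped and the queue grows without bound.
        return service_rps, float("inf")
    return arrival_rps, 1000 / (service_rps - arrival_rps)

for offered in (500, 800, 900, 990, 1_000, 1_200):
    x, w = mm1(offered, service_rps=1_000)
    print(f"offered {offered:>5} req/s -> throughput {x:>5} req/s, mean latency {w:.1f} ms")
```

Latency climbs steeply well before the limit: 2 ms at half load, 10 ms at 90% load, 100 ms at 99% load. Past 1,000 req/s, throughput stays flat while latency is unbounded.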
What is "bandwidth" and how does it differ?
Bandwidth is the maximum data transfer rate of a network connection (e.g. 1 Gbps). Throughput is the rate actually achieved, which is usually lower. Latency is the delay before data starts arriving, and a faster link does not remove it.
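A back-of-the-envelope calculation shows how the three terms interact. The model is deliberately simplified: transfer time is a fixed delay plus the time to push the bits through the link, ignoring handshakes, TCP slow start, and protocol overhead. The 50 ms delay and the function names are hypothetical.

```python
def transfer_time_ms(size_bytes, bandwidth_bps, latency_ms):
    """Simplified transfer time: fixed delay plus serialization time on the link."""
    return latency_ms + size_bytes * 8 / bandwidth_bps * 1000

def achieved_mbps(size_bytes, time_ms):
    """Throughput actually achieved for one transfer, in megabits per second."""
    return size_bytes * 8 / (time_ms / 1000) / 1_000_000

ONE_GBPS = 1_000_000_000
for size in (1_000, 100_000_000):  # 1 KB and 100 MB
    t = transfer_time_ms(size, ONE_GBPS, latency_ms=50)
    print(f"{size:>11,} bytes: {t:8.1f} ms, achieved {achieved_mbps(size, t):7.2f} Mbps")
```

On the same 1 Gbps link, the 1 KB message is almost pure latency (about 50 ms) and achieves well under 1 Mbps. The 100 MB transfer is dominated by bandwidth (850 ms) and achieves roughly 940 Mbps.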
How do you say "the system is slow"?
Say precisely what is wrong: "latency is high", "p99 latency is above the SLO", or "response times are degraded". Avoid the vague "the system is slow" in technical discussions.
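What is "tail latency"?
Latency at the high percentiles (p99, p99.9). Even if average latency is low, tail latency affects the users with the worst experience. It also compounds in microservices: when one user request fans out to many backend calls, the slowest call sets the overall response time.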
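The fan-out effect can be estimated with simple probability. The sketch assumes that backend calls are independent, that the user request waits for all of them, and that each call has a 1% chance of landing in its backend's slowest 1%. The function name is hypothetical.

```python
def prob_hits_tail(fan_out, percentile=99.0):
    """Chance that at least one of `fan_out` independent calls lands beyond
    the given percentile of its latency distribution."""
    p_fast = percentile / 100
    return 1 - p_fast ** fan_out

for n in (1, 10, 100):
    print(f"{n:>3} parallel calls: {prob_hits_tail(n):.1%} of user requests wait on a p99-or-worse call")
```

With 100 parallel calls, roughly 63% of user requests experience at least one backend's p99 latency. A rare event per service becomes the common case for users.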