Can high throughput coexist with high latency?

Yes — a batch-processing system might handle millions of records per hour (high throughput) while each record takes seconds to process (high latency). Streaming pipelines aim to reduce both.

What is "the latency-throughput trade-off"?

Batching requests increases throughput but adds latency (you wait for the batch to fill). Real-time systems prioritise low latency at the cost of lower throughput.

What is "end-to-end latency"?

The total delay from when a user action starts to when they see the result — including network, server processing, and rendering time.

What does "saturate" mean in this context?

"The system is saturated" means throughput has hit its limit — adding more requests only increases latency, it doesn't increase work done.

What is "bandwidth" and how does it differ?

Bandwidth is the maximum data transfer rate of a network connection (e.g. 1 Gbps). Throughput is the actual rate achieved. Latency is the delay regardless of bandwidth.

How do you say "the system is slow"?

Precisely: "latency is high", "p99 is above SLO", or "response times are degraded". Avoid vague "the system is slow" in technical discussions.

What is a "tail latency"?

Latency at the high percentiles (p99, p99.9). Even if average latency is low, tail latency affects the worst-experiencing users and can cascade in microservice calls.

Performance · English usage comparison

Latency vs Throughput: English Usage Guide for IT Professionals

Latency measures how long one request takes; throughput measures how many requests a system handles per second. A system can have low latency but low throughput (fast for one user, slow under load) or high throughput but high latency (processes many requests, each slowly). Both matter — but for different reasons.

Side-by-side comparison

Aspect	Latency	Throughput
What it measures	Time for one operation (ms)	Operations per unit of time (req/s)
Analogy	How fast one car travels	How many cars pass a toll booth per hour
Optimised by	Caching, reducing hops, faster code	Parallelism, horizontal scaling, queues
Key metric form	p50, p95, p99 in milliseconds	Requests per second (RPS), TPS

Example sentences

Latency

"Our p99 latency is 450 ms — the slowest 1% of requests take nearly half a second."
"Adding a Redis cache cut database latency from 80 ms to under 2 ms."

Throughput

"After horizontal scaling, throughput increased from 500 to 3,000 requests per second."
"The message queue increases throughput by letting workers process jobs in parallel."

Exercises: choose the correct English usage

Select the best answer for each question, then check your reasoning.

1. A user complains the page "takes forever to load". They are describing high ___.

2. An engineer says "we need to handle 10,000 requests per second." They are talking about ___.

3. Which metric uses "p99"?

4. Which sentence uses "throughput" correctly?

5. True or false: reducing latency always increases throughput.

Frequently asked questions

What is the difference between latency and response time?

"Response time" usually means the total time from request to response, including processing. "Latency" often refers specifically to the network delay component, though in practice the terms are used interchangeably.

What does p99 mean?

"p99 latency" is the 99th-percentile latency — 99% of requests are faster than this value. It captures the tail experience without being skewed by occasional outliers.

How is throughput measured?

Requests per second (RPS), transactions per second (TPS), or messages per second (MPS), depending on the system.

Show more questions (7)