increase / boost / scale throughput, in requests per second
reduce round-trip time (RTT); collapse round trips
keep work off the hot path
latency = time per request; throughput = requests per time
1 / 5
A platform engineer writes: "After adding the cache, we managed to ___ p99 latency from 800ms to 120ms." Which verb is the standard collocation for bringing latency down?
Reduce latency — the canonical performance collocation:
Latency is the time a single request takes to travel and be processed. The verbs that pair with it are precise:
reduce / cut / lower / shave latency — bring the number down ("we shaved 40ms off the request path")
increase / add latency — make it worse ("the extra hop added 20ms of latency")
measure / track / observe latency — watch it
Tail latency refers to the slowest requests — the long "tail" of the distribution. It is reported as percentiles:
p50 (median), p95, p99, p99.9: "the p99 is 500ms" means 99% of requests complete in 500ms or less
"We optimise for tail latency, because the slowest 1% of requests hurt user experience most."
Why not the distractors? "Shorten" collocates with response time or duration but sounds wrong with the abstract noun latency. "Downgrade" and "minify" belong to entirely different domains (versions/quality and code/assets).
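To make the percentile vocabulary concrete, here is a minimal sketch of how p50, p99 and p99.9 could be computed from raw latency samples. It assumes the nearest-rank definition of a percentile (monitoring tools may interpolate instead), and the `percentile` helper and the sample data are hypothetical, invented for illustration.

```python
import math


def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample that at least p% of samples do not exceed."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # 1-based rank; multiply before dividing to keep whole-number percentiles exact.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical data: 98 fast requests and 2 slow ones, in milliseconds.
latencies_ms = [20] * 98 + [500, 900]

p50 = percentile(latencies_ms, 50)     # the median request
p99 = percentile(latencies_ms, 99)     # 99% of requests are at or below this
p999 = percentile(latencies_ms, 99.9)  # the far end of the tail

print(p50, p99, p999)
```

With this data the median looks healthy while the p99 and p99.9 expose the two slow requests, which is why engineers report tail latency separately from the median.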
2 / 5
A load-testing report states: "At peak load the service handled 12,000 ___ — well above our target." Which phrase is the standard unit of throughput?
Requests per second (RPS) — the core throughput metric:
Throughput is how much work a system completes per unit of time — the rate, as opposed to latency (the time per request). The two are distinct: you can have low latency and low throughput, or high throughput with high latency.
Common units: requests per second (RPS), queries per second (QPS), transactions per second (TPS), messages/sec
"The endpoint sustained 12k RPS" — held that rate steadily
Contrast the distractors:
Round-trip time (RTT) — the latency of one request going out and the response coming back; a duration, not a rate
Hot path — the code path executed most frequently or most critical to performance ("keep allocations off the hot path")
Tail latency — the p95/p99 slow requests; also a duration
Remember: latency = time per request; throughput = requests per time. Improving one does not automatically improve the other.
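The difference between the two metrics can be shown with a little arithmetic. The sketch below assumes an idealised steady state in which every in-flight request takes exactly the same time; the function name and the numbers are hypothetical.

```python
def throughput_rps(concurrent_requests, latency_ms):
    """Idealised rate: each in-flight slot completes 1000 / latency_ms requests per second."""
    return concurrent_requests * 1000 / latency_ms


# Low latency, low throughput: one request at a time, 10 ms each.
serial = throughput_rps(concurrent_requests=1, latency_ms=10)

# High latency, high throughput: 2,000 requests in flight, 200 ms each.
parallel = throughput_rps(concurrent_requests=2000, latency_ms=200)

print(serial, parallel)
```

The second system is twenty times slower per request yet completes a hundred times more requests per second, so a sentence such as "we reduced latency" says nothing by itself about throughput.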
3 / 5
A network engineer explains a delay: "Most of the slowness is ___ — the packet has to cross the Atlantic and back for every call." Which term names that out-and-back travel time?
Round-trip time (RTT) — the out-and-back latency:
Round-trip time is the time it takes a signal to travel from sender to receiver and for the acknowledgement to come back. It dominates latency over long network distances and on chatty protocols.
"Transatlantic RTT is ~70-90ms; you can't beat the speed of light."
"Each extra round trip in the TLS handshake adds one RTT of delay."
Collocations: reduce / measure / add a round trip; "collapse multiple round trips into one"
Why it matters for design: a hot path (the most-executed code route) that makes several sequential network calls pays the RTT cost repeatedly. Engineers reduce round trips by batching, pipelining, or co-locating services.
Distractors: throughput is a rate (RPS); cardinality is the number of distinct values a metric label can take (a metrics-storage concern); saturation means a resource is fully utilised. None of them describes per-request travel time.
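The cost of "paying the RTT repeatedly" can be sketched with simple arithmetic. The model below is an assumption made for illustration: an 80 ms round trip (inside the ~70-90ms transatlantic range quoted above), 2 ms of server work per item, and no other overhead. The function and constant names are hypothetical.

```python
def total_time_ms(round_trips, rtt_ms, items, server_ms_per_item):
    """Total wall-clock time: network round trips plus server-side work."""
    return round_trips * rtt_ms + items * server_ms_per_item


RTT_MS = 80            # assumed transatlantic round-trip time
SERVER_MS_PER_ITEM = 2  # assumed processing cost per item
ITEMS = 5

# Five sequential calls: one round trip per item.
sequential_ms = total_time_ms(ITEMS, RTT_MS, ITEMS, SERVER_MS_PER_ITEM)

# One batched call: the five round trips are collapsed into one.
batched_ms = total_time_ms(1, RTT_MS, ITEMS, SERVER_MS_PER_ITEM)

print(sequential_ms, batched_ms)
```

Under these assumptions the server work is identical in both cases; the whole saving comes from the four round trips that batching removes.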
4 / 5
During an optimisation review someone says: "This allocation runs on the ___, so even a tiny inefficiency is multiplied millions of times." Which term describes the most frequently executed code route?
Hot path — the performance-critical, high-frequency route:
The hot path (or hot loop) is the part of the code that runs most often or where latency matters most: the inner loop of a request handler, a tight serialization loop, and so on. Optimising it yields the biggest wins.
"Keep allocations and logging off the hot path."
"The serializer is on the hot path for every API call."
Related: hot spot (a place a profiler shows burning CPU), hot loop
Contrast the distractors carefully — these are commonly confused:
Critical section — a region protected by a lock so only one thread enters at a time; about concurrency safety, not frequency
Happy path — the scenario where everything succeeds and no errors occur; about control flow, not performance
Hot vs. cold is about execution frequency; happy vs. unhappy is about error state; critical section is about mutual exclusion. Choosing the right one signals real engineering fluency.
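As an illustration of "keeping work off the hot path", the sketch below shows the same check written twice. The function names and the data are hypothetical; the only difference is whether an allocation happens inside the loop or once before it.

```python
def count_blocked_slow(request_ips, blocklist):
    hits = 0
    for ip in request_ips:
        # A new set is allocated on every iteration: work on the hot path.
        if ip in set(blocklist):
            hits += 1
    return hits


def count_blocked_fast(request_ips, blocklist):
    blocked = set(blocklist)  # built once, off the hot path
    hits = 0
    for ip in request_ips:
        if ip in blocked:     # the loop body is now just a lookup
            hits += 1
    return hits


# Hypothetical traffic and blocklist.
request_ips = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.9"]
blocklist = ["10.0.0.1", "10.0.0.9"]

slow_hits = count_blocked_slow(request_ips, blocklist)
fast_hits = count_blocked_fast(request_ips, blocklist)
print(slow_hits, fast_hits)
```

Both versions return the same answer. With four requests the difference is invisible, but if the loop runs millions of times the repeated allocation is exactly the kind of "tiny inefficiency" the quiz sentence describes.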
5 / 5
A capacity planner reports: "We need to ___ throughput to handle Black Friday — current limit is 8k RPS, we expect 20k." Which verb best expresses raising the system's rate of work?
Increase throughput — the standard scaling collocation:
When a system must handle more work per second, engineers talk about increasing throughput. The verb set is direction-aware:
increase / raise / boost / improve / scale up throughput — handle more RPS
cap / limit / throttle / rate-limit throughput — deliberately restrict it
"We scaled out to 12 replicas to increase throughput to 20k RPS."
Throughput vs. latency in scaling talk:
Adding replicas usually increases throughput (more parallel work) but does not reduce per-request latency.
"We hit a ceiling at 8k RPS — the database became the bottleneck."
Why not the distractors? "Prolong" and "deepen" simply do not collocate with the rate noun throughput. "Tighten" implies reducing or constraining — the wrong direction when you need to serve 20k requests per second. Pairing the right directional verb with throughput, latency, and RPS is a hallmark of native engineering English.
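The capacity-planning sentence can also be read as a small calculation. The sketch below assumes throughput scales linearly with replica count and that each replica sustains 2,000 RPS; both are hypothetical simplifications, since a shared bottleneck such as the database can cap the total well before the target is reached.

```python
import math


def replicas_needed(target_rps, per_replica_rps):
    """Smallest replica count whose combined capacity meets the target (linear scaling assumed)."""
    return math.ceil(target_rps / per_replica_rps)


PER_REPLICA_RPS = 2000  # assumed capacity of one replica

current = replicas_needed(8_000, PER_REPLICA_RPS)        # today's 8k RPS limit
black_friday = replicas_needed(20_000, PER_REPLICA_RPS)  # expected 20k RPS peak

print(current, black_friday)
```

In engineering English this result would be reported as "we need to scale out from 4 to 10 replicas to increase throughput to 20k RPS".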