5 exercises — practice requests per second, transactions per second, throughput vs. latency, and what happens to each metric under scaling.
Key throughput vocabulary
"Throughput = work per unit time (RPS, TPS, msg/s, MB/s) — measures system capacity."
"Horizontal scaling improves throughput but does NOT improve individual request latency."
"The knee of the curve: near capacity, small load increases cause large latency spikes."
"Consumer < producer = the queue grows indefinitely. Scale consumer or throttle producer."
"Throughput is only meaningful relative to requirements — always compare to your peak load + headroom."
1 / 5
What is the correct definition of "throughput" in a software system context?
Option B is the accurate, complete definition. Throughput = work done per unit time. Common units: (1) RPS (requests per second) — web APIs, (2) TPS (transactions per second) — databases, payment systems, (3) messages/second — message queues (Kafka, RabbitMQ), (4) MB/s or GB/s — storage, data pipelines. The key distinction from latency: throughput is aggregate system capacity, latency is individual operation speed. Option A defines latency, not throughput. Option C defines concurrent connection capacity, which is related but distinct from throughput. Option D defines availability/success rate (error rate), which is also a separate metric.
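To make the throughput vs. latency distinction concrete, here is a minimal sketch that computes both from the same set of request records. The `Request` type, the function names, and the sample timings are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    start_s: float  # time the request arrived, in seconds
    end_s: float    # time the response was sent, in seconds


def throughput_rps(requests: list[Request]) -> float:
    """Aggregate metric: completed requests divided by the observation window."""
    window_s = max(r.end_s for r in requests) - min(r.start_s for r in requests)
    return len(requests) / window_s


def mean_latency_ms(requests: list[Request]) -> float:
    """Per-operation metric: average time each individual request took."""
    total_s = sum(r.end_s - r.start_s for r in requests)
    return 1000 * total_s / len(requests)


# Hypothetical sample: 4 requests, handled 2 at a time, each taking 0.5 s.
sample = [
    Request(0.0, 0.5), Request(0.0, 0.5),
    Request(0.5, 1.0), Request(0.5, 1.0),
]

print(throughput_rps(sample))   # 4 requests over a 1-second window
print(mean_latency_ms(sample))  # each request took 500 ms
```

The two numbers answer different questions: the first describes the system as a whole, the second describes what a single caller experiences.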
2 / 5
A payment system processes 500 TPS. A product manager asks "is that good?" Which answer is most technically accurate?
Option B is the complete contextual answer. The key principle: throughput numbers are only meaningful relative to requirements. The answer provides: (1) industry benchmarks (Visa ~24K TPS, PayPal ~450 TPS) — real context, (2) a method for self-evaluation (current volume vs. TPS capacity), (3) the right framing ("exceeds peak load with headroom") — not just "is X big?" but "is X enough for our needs?" Note: the Visa and PayPal numbers are approximate public figures and should be verified. Option A makes a universal positive claim without context. Option C defers without providing any useful guidance. Option D invents a standard that doesn't exist.
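The "exceeds peak load with headroom" framing can be written as a simple check. This is only a sketch: the function name, the 30% default margin, and the peak-load figures are assumptions chosen for illustration, not recommended values.

```python
def has_headroom(capacity_tps: float, peak_tps: float, headroom: float = 0.3) -> bool:
    """Return True if capacity covers peak load plus a fractional safety margin.

    headroom=0.3 (30%) is an assumed margin; choose one that fits your own
    traffic variability and growth expectations.
    """
    return capacity_tps >= peak_tps * (1 + headroom)


# The 500 TPS system from the question, against two hypothetical peak loads.
print(has_headroom(500, peak_tps=300))  # True: 300 * 1.3 = 390 fits within 500
print(has_headroom(500, peak_tps=450))  # False: 450 * 1.3 = 585 exceeds 500
```

The same 500 TPS figure is "good" in one case and insufficient in the other, which is the point of the question.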
3 / 5
Your system's latency is 10ms (p99) and throughput is 500 RPS. If you horizontally scale to 5 servers with a load balancer, what happens to each metric?
Option B correctly describes horizontal scaling's effect on throughput vs. latency. This is one of the most important concepts in distributed systems: (1) Horizontal scaling improves throughput: 5x servers = ~5x request handling capacity, (2) Horizontal scaling does NOT improve individual request latency: each request is still processed by a single server and still takes 10ms, (3) Queueing latency may improve: if latency was high because requests were waiting in a queue (load > capacity), more capacity reduces queue wait time, (4) The load balancer adds overhead: typically an additional hop of 0.1-1ms. This distinction is why "just add more servers" solves capacity (throughput) problems but not algorithmic (latency) problems. If p99 is 10ms because of database queries, 5 servers still have 10ms queries.
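The arithmetic behind this answer can be sketched as follows. The model assumes ideal linear scaling, a system that is not queue-bound, and a fixed 0.5ms load-balancer hop (an assumed midpoint of the 0.1-1ms range above); `scale_out` and its parameters are hypothetical names.

```python
def scale_out(per_server_rps: float, servers: int, service_ms: float,
              lb_overhead_ms: float = 0.5) -> tuple[float, float]:
    """Estimate throughput and per-request latency after horizontal scaling.

    Assumes ideal linear scaling and no queueing; real systems usually
    fall somewhat short of this.
    """
    # Capacity adds up across servers.
    throughput_rps = per_server_rps * servers
    # Each request still runs on one server, so its service time is unchanged;
    # only the extra load-balancer hop is added once there is more than one server.
    latency_ms = service_ms + (lb_overhead_ms if servers > 1 else 0.0)
    return throughput_rps, latency_ms


print(scale_out(500, servers=1, service_ms=10))  # (500, 10.0)
print(scale_out(500, servers=5, service_ms=10))  # (2500, 10.5)
```

Throughput grows fivefold while latency stays roughly where it was, and is in fact slightly higher because of the extra hop.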
4 / 5
What is the relationship between throughput and latency under increasing load?
Option B describes the "knee of the curve" relationship, one of the most fundamental concepts in capacity planning. The three phases: (1) Below capacity: throughput scales with load; latency is stable (requests are processed as fast as they arrive), (2) Near capacity (the "knee"): throughput is still increasing but latency begins to rise as queues start forming, (3) At/beyond capacity: throughput plateaus (the system is at its maximum) and latency spikes as queue depth grows without bound. This is why running at "90% utilisation" is dangerous: a small traffic spike (10% more) pushes the system to the edge of saturation, and latency can double, triple, or worse as queuing kicks in. Option A is wrong: they are deeply linked under load. Option C has it backwards: higher throughput (meaning more load) typically causes higher latency once past the knee. Option D is wrong for the below-capacity region.
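One way to see the shape of the knee is a textbook single-server queueing model (M/M/1), where mean time in the system is service time divided by (1 - utilisation). This model is an assumption introduced here for illustration; the quiz does not specify one, and real systems with multiple workers or bursty traffic will follow a different curve with the same general shape.

```python
def mm1_latency_ms(service_ms: float, utilisation: float) -> float:
    """Mean time in system for an M/M/1 queue: service_ms / (1 - utilisation).

    M/M/1 (random arrivals, one server) is an assumed simplification.
    """
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1); at 1.0 the queue never drains")
    return service_ms / (1 - utilisation)


# Hypothetical 10 ms service time at increasing utilisation levels.
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"utilisation {rho:.0%}: ~{mm1_latency_ms(10, rho):.0f} ms")
```

Under this model latency stays modest up to about half of capacity, then climbs steeply: the step from 90% to 95% utilisation alone doubles it.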
5 / 5
A Kafka consumer processes 50,000 messages per second. The producer generates 80,000 messages per second. What phrase best describes this situation?
Option B correctly identifies a throughput imbalance. The key concept: sustainable throughput requires consumer rate ≥ producer rate. When consumer rate < producer rate: (1) the queue grows at (producer_rate - consumer_rate) = 80,000 - 50,000 = 30,000 msg/s, (2) at this rate, one hour of operation leaves a backlog of 30,000 × 3,600 = 108 million messages, (3) resolution requires either increasing consumer throughput (horizontal scaling of the consumer group, optimising per-message processing) or reducing producer throughput (rate limiting, backpressure). Option D is technically accurate (Kafka buffers) but misses the sustainability problem: buffering is for temporary spikes, not sustained throughput gaps. Option A focuses on the absolute number without the relative comparison that matters. Option C suggests hardware without diagnosing whether the bottleneck is CPU, I/O, or algorithmic.
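The backlog arithmetic can be checked with a short sketch. The function names are hypothetical, and the consumer-count estimate assumes each consumer instance sustains the same 50,000 msg/s and that throughput scales linearly across the consumer group (which in Kafka also requires enough partitions).

```python
import math


def backlog_after(producer_rate: int, consumer_rate: int, seconds: int) -> int:
    """Messages queued after `seconds` of sustained imbalance (0 if consumers keep up)."""
    return max(0, producer_rate - consumer_rate) * seconds


def consumers_needed(producer_rate: int, per_consumer_rate: int) -> int:
    """Smallest consumer count whose combined rate covers the producer.

    Assumes linear scaling across identical consumers, which is an idealisation.
    """
    return math.ceil(producer_rate / per_consumer_rate)


print(backlog_after(80_000, 50_000, seconds=3_600))  # 108000000
print(consumers_needed(80_000, 50_000))              # 2
```

Two consumers at 50,000 msg/s each would give 100,000 msg/s combined, enough to keep up with the producer and slowly drain any existing backlog.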