Learn the vocabulary of how a load balancer decides which backend instance handles the next request.
1 / 5
At standup, a dev mentions a load balancer cycling through backend instances one after another in a fixed rotation, sending each new request to the next instance in the list regardless of how busy any of them currently are. What is this algorithm called?
Round-robin load balancing cycles through backend instances in a fixed rotation, sending each new request to the next instance on the list regardless of how much work that instance is currently handling. Sticky sessions instead pin a specific client's requests to one particular instance for session-affinity reasons, an entirely different concern from how new, unrelated requests get distributed in the first place. This simple, order-based rotation is easy to implement but assumes every request costs roughly the same amount of work, which isn't always true.
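The rotation described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular load balancer's implementation; the class name `RoundRobinBalancer` and the backend names `app-1` to `app-3` are made up for the example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in a fixed rotation, ignoring current load."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        # cycle() repeats the list forever, wrapping around at the end.
        self._rotation = cycle(list(backends))

    def pick(self):
        # The choice depends only on position in the list, never on load.
        return next(self._rotation)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [balancer.pick() for _ in range(5)]
print(picks)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```

Nothing in `pick` looks at how busy a backend is, which is both why the algorithm is simple and why it assumes uniform request cost.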
2 / 5
During a design review, the team picks least-connections load balancing instead of round-robin, specifically because some requests take far longer to process than others and the team wants new requests routed to whichever instance currently has the fewest active connections. Which capability does this algorithm provide?
Least-connections load balancing accounts for uneven request processing times. It tracks each instance's active connection count and sends the next request to whichever instance has the fewest, so new work goes to instances that are actually less busy right now instead of following a fixed rotation. Round-robin, by contrast, assumes every request is roughly equal in cost and can overload an instance that happens to be stuck processing several slow requests while an idle instance waits further down the rotation. This load awareness is why least-connections is usually preferred when request processing times vary significantly.
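As a rough sketch of the bookkeeping involved, the following Python tracks an active connection count per backend and picks the minimum. The class name, the `acquire`/`release` methods, and the backend names are hypothetical; real load balancers expose this differently.

```python
class LeastConnectionsBalancer:
    """Route each new request to the backend with the fewest active connections."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self.active = {name: 0 for name in backends}

    def acquire(self):
        # min() returns the first minimum, so ties go to the earliest backend.
        name = min(self.active, key=self.active.get)
        self.active[name] += 1
        return name

    def release(self, name):
        # Call this when the request on `name` has finished.
        self.active[name] -= 1


lb = LeastConnectionsBalancer(["app-1", "app-2", "app-3"])
first = lb.acquire()    # app-1
second = lb.acquire()   # app-2
third = lb.acquire()    # app-3
lb.release(second)      # the request on app-2 finishes early
fourth = lb.acquire()   # app-2 again, because it is now the least busy
print(fourth)  # app-2
```

A fixed rotation would have sent the fourth request to `app-1` even though it is still busy; here the count decides.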
3 / 5
In a code review, a dev notices a load balancer configured with round-robin distribution serving a workload where certain request types take ten times longer to process than others, and one backend instance keeps ending up overloaded while others sit comparatively idle. What does this represent?
This is a load-balancing algorithm mismatch. Round-robin's order-based distribution evens out the number of requests per instance, not the amount of work, because it treats every request as roughly equal in cost. When request types differ tenfold in processing time, several expensive requests can land on whichever instance's turn happens to come up, overloading it while others sit comparatively idle. A cache eviction policy is an unrelated concept: it governs which cache entries are discarded. This is the scenario where switching to a load-aware algorithm like least-connections, or weighted routing based on actual response time, would distribute work more evenly than a fixed rotation.
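To make the mismatch concrete, here is a small Python sketch that totals the work each instance receives under round-robin. The request pattern is deliberately contrived (every third request costs 10 units, the rest cost 1) so that the expensive type always lines up with the same instance; the function name and the numbers are illustrative assumptions, not data from a real system.

```python
def round_robin_work(costs, n_backends):
    """Return the total work each backend receives under a fixed rotation."""
    totals = [0] * n_backends
    for i, cost in enumerate(costs):
        # Request i goes to backend i mod n, whatever it costs.
        totals[i % n_backends] += cost
    return totals


# Twelve requests: one expensive (10 units), then two cheap (1 unit), repeated.
costs = [10, 1, 1] * 4
totals = round_robin_work(costs, 3)
print(totals)  # [40, 4, 4]
```

Each instance receives exactly four requests, so the distribution looks even by count, yet the first instance carries ten times the work of its siblings.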
4 / 5
An incident report shows one backend instance repeatedly became overloaded and timed out under normal traffic volume, while sibling instances remained comfortably under capacity, because round-robin distribution kept sending it a disproportionate share of the workload's especially expensive request type by pure chance of rotation order. What practice would prevent this?
Switching to a load-aware algorithm, such as least-connections or response-time-weighted routing, accounts for how busy each instance genuinely is at the moment a new request arrives, instead of following a rotation order blind to actual load, which directly addresses the repeated overload described in this incident. Continuing to rely on round-robin's fixed order with no load awareness is exactly what let one instance keep absorbing a disproportionate share of expensive requests by chance. This load-aware routing is the standard fix once an uneven workload has made a simple rotation-based algorithm inadequate.
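The effect of a load-aware choice can be sketched with a toy simulation. The assumptions are loud ones: one request arrives per time unit, a request occupies its instance for `cost` time units, and ties go to the lowest-numbered instance. The function name and the workload are hypothetical. With this workload (every third request costs 10 units), a fixed rotation over three instances would hand all four expensive requests, 40 units of work, to the first instance.

```python
def least_connections_work(costs, n_backends):
    """Simulate least-connections routing; return total work per backend."""
    in_flight = [[] for _ in range(n_backends)]  # finish times per backend
    totals = [0] * n_backends
    for now, cost in enumerate(costs):
        # Drop requests that have finished by the time this one arrives.
        in_flight = [[f for f in finishes if f > now] for finishes in in_flight]
        # Fewest active connections wins; ties go to the lowest index.
        target = min(range(n_backends), key=lambda b: len(in_flight[b]))
        in_flight[target].append(now + cost)
        totals[target] += cost
    return totals


costs = [10, 1, 1] * 4
totals = least_connections_work(costs, 3)
print(totals, "busiest instance:", max(totals))
```

Because an instance still working on an expensive request shows an open connection, the next expensive request is steered elsewhere, and the busiest instance ends up with less than the 40 units a fixed rotation would give it.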
5 / 5
During a PR review, a teammate asks why the team switches to least-connections load balancing instead of just staying with round-robin, given that round-robin is simpler to reason about and implement. What is the reasoning?
Round-robin assumes every request costs roughly the same to process, which holds up fine for a genuinely uniform workload but breaks down the moment request costs vary significantly, since it can end up funneling several expensive requests onto the same instance purely by the luck of rotation order. Least-connections instead actively tracks how busy each instance is right now and routes new work toward whichever one has the fewest active connections, adapting to real load rather than following a fixed sequence. The tradeoff is the small added overhead of tracking and comparing connection counts across instances, which is a modest cost against the meaningfully better load distribution it provides for an uneven workload.