Learn the vocabulary of merging many concurrent identical cache misses into a single call.
0 / 5 completed
1 / 5
At standup, a dev mentions many concurrent requests for the same cache key, all missing at once, being merged so only the first one actually triggers the expensive backend call while the rest simply wait for and share its result. What is this technique called?
Request coalescing, also called the single-flight pattern, merges many concurrent requests for the same cache key that all miss at once, so only the first one actually triggers the expensive backend call while every other concurrent request simply waits for and shares that same result. Cache eviction decides which already-cached entries to discard, which is a different concern from coalescing concurrent misses for the same key. This coalescing is what prevents a single expired cache entry from turning into a flood of redundant backend calls the instant many clients request it at once.
2 / 5
During a design review, the team wants a cache layer to track which keys currently have an in-flight backend call and have any new concurrent request for that same key attach to the existing call instead of starting a new one. Which capability supports this?
In-flight request tracking that deduplicates a concurrent miss for the same key lets the cache layer recognize an already-running backend call for that key and attach any new concurrent request to it, rather than starting a redundant second call for data that's already being fetched. Letting every concurrent miss independently trigger its own call defeats the purpose of coalescing entirely, since the backend would still see as many calls as there were concurrent requests. This in-flight tracking is the mechanism that actually makes request coalescing possible.
3 / 5
In a code review, a dev notices a cache-reading function has no mechanism at all for detecting that another concurrent call for the same key is already in progress, so each caller independently triggers its own backend fetch on a miss. What does this represent?
This is missing request coalescing, leaving the system exposed to many redundant concurrent backend calls for the same key, since without any in-flight tracking, every caller that misses the cache at roughly the same moment triggers its own independent fetch. A hot partition problem is an unrelated concept about partition-key distribution in a messaging system. Catching this gap in review matters because it becomes a serious problem specifically at the moment a popular cache entry expires and many requests miss simultaneously.
4 / 5
An incident report shows a database was overwhelmed by a sudden spike of thousands of identical queries the instant a popular cache entry expired, because every one of those concurrent requests independently missed the cache and hit the database with no coalescing in place. What practice would prevent this?
Implementing request coalescing ensures only the first concurrent miss for that key actually triggers a database query, with every other concurrent request for the same key simply waiting for and sharing that one result instead of triggering a redundant query of its own. Continuing to let every concurrent miss independently hit the database is exactly what overwhelmed the database the instant the popular cache entry expired in this incident. This coalescing is a standard, well-known defense against exactly this kind of expiration-triggered spike, sometimes discussed alongside the closely related cache stampede problem.
5 / 5
During a PR review, a teammate asks why the team adds request coalescing on top of an already reasonable cache TTL instead of just trusting the TTL alone to keep backend load manageable. What is the reasoning?
A cache TTL only controls how often a key expires, but says nothing about how many concurrent requests happen to arrive at the exact moment that expiration occurs, which for a popular key can easily be thousands at once. Request coalescing specifically protects against that scenario by ensuring only one of those concurrent requests actually triggers a backend call, regardless of how many arrived simultaneously. The tradeoff is the added implementation complexity of tracking in-flight requests per key, which a bare TTL-based cache doesn't need to handle at all.