Learn the vocabulary of two different keys hashing to the same bucket in a hash table.
0 / 5 completed
1 / 5
At standup, a dev mentions two different keys hashing to the exact same bucket index in a hash table, requiring the table to store both under that one slot instead of overwriting one with the other. What is this situation called?
A hash collision is exactly two different keys hashing to the same bucket index, which a well-designed hash table handles gracefully, typically by chaining both entries in a small list at that bucket or by probing to a nearby open slot, rather than one key silently overwriting the other. A cache eviction is an unrelated concept about discarding an existing cached entry to make room, not about two keys sharing a hash. Collisions are an expected, unavoidable consequence of mapping a much larger space of possible keys down to a much smaller number of buckets.
2 / 5
During a design review, the team picks a hash function specifically chosen to spread keys evenly across buckets, minimizing how often two keys collide for the kind of data the table will actually store. Which capability does this choice support?
Choosing a hash function that spreads keys evenly supports keeping average lookup time close to constant, since a low collision rate means most buckets hold only one or two entries, so finding a specific key rarely requires scanning through a long chain. No realistic hash function can guarantee zero collisions ever, since mapping a much larger key space into a fixed number of buckets makes some collision mathematically inevitable once enough keys are inserted. This is why hash function quality is judged by how evenly it distributes keys in practice, not by claiming to eliminate collisions altogether.
3 / 5
In a code review, a dev notices a hash table's load factor, the ratio of stored entries to available buckets, has grown far past the recommended threshold with no resizing triggered. What does this represent?
This is a hash table at risk of degraded performance, since a high load factor directly means more collisions on average and longer chains at each affected bucket, pushing lookup and insertion time away from constant and toward scanning an increasingly long list. A deadlock is an unrelated concurrency concept about blocked threads. This is exactly why hash table implementations typically resize, roughly doubling their bucket count, once the load factor crosses a set threshold, trading a one-time rehashing cost for restored average-case performance.
4 / 5
An incident report shows a service's lookup times degraded sharply under heavy traffic because a hash table kept growing its entry count with no automatic resizing, pushing its load factor far past a healthy range and lengthening the collision chains at nearly every bucket. What practice would prevent this?
Triggering an automatic resize, roughly doubling the bucket count and rehashing the existing entries across the larger table, restores the load factor to a healthy range and shortens collision chains back toward constant-time lookups, which is exactly the safeguard missing in this incident. Continuing to let entries accumulate with no resizing is precisely what let lookup times degrade sharply under heavy traffic. This automatic resizing on a load-factor threshold is a standard, built-in behavior of most production hash table implementations, though it does introduce a temporary cost spike during the rehashing itself.
5 / 5
During a PR review, a teammate asks why the team accepts an occasional collision-handling cost, like chaining or probing, instead of designing a hash function guaranteed to never collide for any possible set of keys. What is the reasoning?
Mapping a much larger space of possible keys down to a fixed, much smaller number of buckets makes some collision mathematically unavoidable once the number of stored keys exceeds the number of buckets, by the pigeonhole principle, regardless of how cleverly the hash function is designed. The practical, achievable goal is therefore minimizing how often collisions occur and how long any single chain grows, not eliminating collisions outright. The tradeoff is between a hash function's computation cost and how evenly it distributes keys, since a slightly more expensive hash function that spreads keys better can still be a net win if it keeps chains short enough to preserve near-constant-time lookups.