Learn the vocabulary of estimating item frequencies in a huge stream with a small, fixed-size hashed counter grid.
1 / 5
At standup, a dev mentions estimating how many times each item has appeared in a huge stream using a small, fixed-size grid of counters updated by several hash functions. The structure accepts a small, one-directional overestimation error in exchange for far less memory than an exact per-item counter. What is this structure called?
Count-min sketch is exactly this. It hashes every item with several independent hash functions, one per row, into a small, fixed-size grid of counters, and increments one counter per row on every occurrence. An item's estimated count is the minimum value across its hashed counters. Because other items that land on the same counter can only add to it, the estimate can overestimate the true count but never underestimate it. A hash collision is not a data structure: it is the event of two keys mapping to the same bucket, and in a sketch it is the source of that overestimation. The fixed-size grid is exactly why a count-min sketch can approximate frequency counts for an enormous stream using far less memory than an exact counter per distinct item.
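The update and query steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the class name `CountMinSketch`, the default `width` and `depth`, and the choice of salted BLAKE2b as the per-row hash are all assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Minimal count-min sketch: a depth x width grid of integer counters."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        # width and depth are illustrative defaults, not recommended values.
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Salting the hash with the row number gives one hash function per row.
        salt = row.to_bytes(8, "little")
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=salt
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment exactly one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # The minimum across rows is the counter least inflated by collisions.
        return min(
            self.rows[row][self._index(item, row)] for row in range(self.depth)
        )


sketch = CountMinSketch()
for word in ["apple", "pear", "apple", "fig", "apple"]:
    sketch.add(word)

# The estimate is never below the true count (3 here); it may be higher.
print(sketch.estimate("apple"))
```

The estimate for an item that never appeared can also be above zero, since its cells may have been incremented by other items.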
2 / 5
During a design review, the team uses a count-min sketch to track approximate item frequencies in a massive, high-cardinality stream, specifically because a fixed-size grid of counters uses far less memory than an exact counter per distinct item, at the cost of a small, always-one-directional overestimation error. Which capability does this provide?
Count-min sketch here provides memory-bounded frequency estimation regardless of how many distinct items appear. The counter grid's size is chosen up front and stays fixed, unlike an exact per-item counter map, whose memory grows in proportion to the number of distinct items and is therefore unbounded on an open-ended stream. The price of that bounded footprint is a small, always-one-directional overestimation error. This tradeoff is exactly why a count-min sketch is favored for approximate frequency tracking over an enormous, high-cardinality stream.
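To make the memory contrast concrete, the following sketch feeds the same stream of distinct items into an exact `Counter` and into a fixed counter grid, then compares how many entries each holds. The grid size (`WIDTH`, `DEPTH`), the `user-N` item names, and the salted BLAKE2b hash are assumptions chosen for illustration.

```python
import hashlib
from collections import Counter

WIDTH, DEPTH = 256, 4  # hypothetical grid size, fixed before the stream starts
grid = [[0] * WIDTH for _ in range(DEPTH)]
exact = Counter()


def cell(item: str, row: int) -> int:
    # One salted hash per row stands in for independent hash functions.
    salt = row.to_bytes(8, "little")
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little") % WIDTH


def observe(item: str) -> None:
    exact[item] += 1  # the exact map gains a new key per distinct item
    for row in range(DEPTH):
        grid[row][cell(item, row)] += 1  # the grid only increments existing cells


for n in range(50_000):
    observe(f"user-{n}")  # every item in this stream is distinct

exact_entries = len(exact)
grid_cells = WIDTH * DEPTH
print(exact_entries)  # 50000
print(grid_cells)     # 1024
```

Doubling the stream length would double the exact map's entry count while leaving the grid at the same number of cells.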
3 / 5
In a code review, a dev notices a frequency-tracking feature over a massive, high-cardinality stream maintains one exact counter per distinct item ever seen, letting its memory usage grow without bound, instead of using a fixed-size probabilistic counter grid. What does this represent?
This is a missed count-min-sketch opportunity: a count-min sketch's fixed-size grid of hashed counters would bound memory usage regardless of how many distinct items appear, at the cost of only a small, one-directional overestimation error. A cache eviction policy is an unrelated concept about which cache entries to discard. Unbounded memory growth is exactly the kind of risk a reviewer flags once the stream's item cardinality is confirmed to be extremely high.
4 / 5
An incident report shows a frequency-tracking service's memory usage grew without bound and eventually exhausted available memory, because it maintained one exact counter per distinct item over a massive, high-cardinality stream instead of using a fixed-size probabilistic structure. What practice would prevent this?
Switching to a count-min sketch's fixed-size hashed-counter grid removes the unbounded memory growth entirely. Maintaining one exact counter per distinct item over a high-cardinality stream, with no cap on how many items are tracked, is exactly what caused the issue described in this incident. The count-min-sketch approach is the standard fix once the stream's item cardinality is confirmed high enough to make exact per-item counting impractical.
5 / 5
During a PR review, a teammate asks why the team reaches for a count-min sketch instead of an exact hash map of counters, given that an exact hash map gives perfectly accurate counts with no estimation error at all. What is the reasoning?
A count-min sketch bounds memory usage to a small, fixed footprint regardless of how many distinct items appear, at the cost of a small, always-one-directional overestimation error, while an exact hash map of counters gives perfectly accurate counts but its memory usage grows without bound as distinct items accumulate. This is exactly why a count-min sketch is favored for a massive, high-cardinality stream, while an exact hash map remains preferable when perfect accuracy is required and cardinality stays manageable.
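How small the overestimation is depends on how the grid is sized. The quiz does not give sizes, so the following is a sketch under the standard count-min analysis: with width ceil(e / epsilon) and depth ceil(ln(1 / delta)), an estimate exceeds the true count by more than epsilon times the total number of stream items with probability at most delta. The function name `sketch_dimensions` and the example values are illustrative assumptions.

```python
import math
from typing import Tuple


def sketch_dimensions(epsilon: float, delta: float) -> Tuple[int, int]:
    """Return (width, depth) for a target error factor and failure probability."""
    width = math.ceil(math.e / epsilon)       # controls the size of the overestimate
    depth = math.ceil(math.log(1.0 / delta))  # controls how often the bound fails
    return width, depth


# Example: overestimate by at most 0.1% of the stream total, 99% of the time.
width, depth = sketch_dimensions(epsilon=0.001, delta=0.01)
print(width, depth)   # 2719 5
print(width * depth)  # 13595 counters, independent of the number of distinct items
```

The error bound scales with the total stream length rather than with an item's own count, which is one more reason an exact hash map stays preferable when rare items must be counted precisely.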