AdvancedVocabulary#software-architecture#developer-tools#backend

HyperLogLog Vocabulary

Build fluency in the vocabulary of estimating a distinct count over a huge stream with a small, fixed memory footprint.

0 / 5 completed

1 / 5

At standup, a dev mentions estimating the number of distinct items in a huge stream using only a small, fixed amount of memory, by hashing each item and tracking the longest run of leading zero bits observed across many hashed buckets. What is this structure called?

2 / 5

During a design review, the team uses HyperLogLog to estimate the number of distinct visitors to a high-traffic service, specifically because tracking leading-zero-bit runs across a fixed number of buckets uses only a small, constant amount of memory, unlike storing every distinct visitor identifier seen. Which capability does this provide?

3 / 5

In a code review, a dev notices a distinct-visitor-counting feature over a high-traffic service stores an exact set of every visitor identifier ever seen, letting its memory usage grow in direct proportion to the true distinct count, instead of using a fixed-memory probabilistic estimator. What does this represent?

4 / 5

An incident report shows a distinct-visitor-counting service's memory usage grew without bound and became unsustainable, because it stored an exact set of every visitor identifier ever seen instead of using a fixed-memory probabilistic estimator. What practice would prevent this?

5 / 5

During a PR review, a teammate asks why the team reaches for HyperLogLog instead of an exact hash set of every distinct item, given that an exact hash set gives a perfectly accurate distinct count with no estimation error at all. What is the reasoning?