Reservoir Sampling Vocabulary

Learn the vocabulary of selecting a fixed number of uniformly random items from a stream of unknown length.

1 / 5

At standup, a dev describes selecting a fixed number of random items from a stream of unknown, possibly enormous length without ever storing the whole stream, so that every item has an equal chance of ending up in the final selection. What is this technique called?

2 / 5

During a design review, the team picks reservoir sampling so that a fixed-size random sample can be drawn from a stream whose total length isn't known in advance and which is too large to hold in memory. Which capability does this provide?
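
For reference, the single-pass idea in this scenario can be sketched in Python. This is a minimal illustration of the classic "Algorithm R" form of reservoir sampling, not any particular team's implementation; the function name `reservoir_sample` and its parameters are assumptions made for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items chosen uniformly at random from stream.

    Hypothetical helper: memory use is O(k) no matter how long the stream is.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Item number i + 1 is kept with probability k / (i + 1):
            # draw j uniformly from 0..i (inclusive) and keep the item
            # only if j lands on one of the k reservoir slots.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Calling `reservoir_sample(open("app.log"), 10)` (with a hypothetical log file) would read the file line by line and hold at most 10 lines at any time. If the stream has fewer than `k` items, the sketch simply returns all of them.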

3 / 5

In a code review, a dev notices that a feature meant to sample a fixed number of random log lines from an unbounded log stream instead buffers the entire stream in memory and then selects randomly from the buffer. What does this represent?

4 / 5

An incident report shows that a log-sampling feature crashed with an out-of-memory error because it buffered an entire, effectively unbounded log stream in memory before randomly selecting a fixed number of sample lines. What practice would prevent this?

5 / 5

During a PR review, a teammate asks why the team reaches for reservoir sampling instead of making one pass to count the stream's total length and then a second pass to pick items at random, given that both approaches can produce a uniformly random sample. What is the reasoning?
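
To make the alternative in this scenario concrete, here is a rough Python sketch of the two-pass approach. The name `two_pass_sample` and the `open_stream` callable are hypothetical, introduced only for illustration; the sketch assumes the source can be opened and read twice, which a live stream generally cannot.

```python
import random
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def two_pass_sample(
    open_stream: Callable[[], Iterable[T]],
    k: int,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Two-pass uniform sample; open_stream must yield the same items each call."""
    rng = rng or random.Random()
    # Pass 1: count the items without storing them.
    n = sum(1 for _ in open_stream())
    # Choose k distinct positions uniformly at random (fewer if n < k).
    chosen = set(rng.sample(range(n), min(k, n)))
    # Pass 2: re-read the source and keep only the chosen positions.
    return [item for i, item in enumerate(open_stream()) if i in chosen]
```

Note that the sketch has to call `open_stream()` twice. That works for a file on disk, but not for data that can only be consumed once as it arrives.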