Learn the vocabulary for a storage engine that physically writes far more data than a logical write requested.
1 / 5
At standup, a dev mentions that a storage engine ends up physically writing far more data to disk than the size of the logical write a client actually requested, because of extra work like compaction or metadata updates triggered by that write. What is this phenomenon called?
Write amplification is the term: the storage engine physically writes far more data to disk than the logical write the client requested, because that write triggers extra work such as background compaction, rewriting adjacent data, or updating metadata structures. A hash collision is an unrelated hash-table concept in which two keys map to the same bucket. Because every logical write can trigger extra physical writes, write-heavy storage engines, especially log-structured ones, monitor their write-amplification factor closely and try to minimize it.
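The write-amplification factor mentioned above is usually expressed as physical bytes written divided by logical bytes requested. The sketch below is only an illustration of that ratio: the function name and the byte counts for the log append, flush, and compaction are hypothetical, not taken from any particular engine.

```python
def write_amplification_factor(logical_bytes: int, physical_bytes: int) -> float:
    """Ratio of bytes physically written to bytes the client asked to write."""
    if logical_bytes <= 0:
        raise ValueError("logical_bytes must be positive")
    return physical_bytes / logical_bytes


# Hypothetical accounting for one 4 KiB client write.
logical = 4 * 1024
log_append = 4 * 1024           # assumed: the write is first appended to a log
flush = 4 * 1024                # assumed: it is later flushed to a data file
compaction = 32 * 1024          # assumed: compaction rewrites neighboring data

physical = log_append + flush + compaction
waf = write_amplification_factor(logical, physical)
print(f"WAF = {waf:.1f}")  # WAF = 10.0
```

A factor of 1.0 would mean the engine wrote only what the client requested; anything above that is amplification.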
2 / 5
During a design review, the team tunes a log-structured storage engine's compaction settings specifically to reduce how much extra physical data gets rewritten to disk for every logical write the engine accepts. Which capability does this tuning target?
This tuning targets reducing write amplification. Compaction is one of the main sources of extra physical rewriting triggered by logical writes, so a less aggressive or better-scheduled compaction strategy can directly cut the number of extra bytes rewritten to disk, and with them disk I/O and hardware wear, for the same logical write traffic. The claim that compaction settings have no effect on write amplification contradicts the reason storage engines expose these settings for tuning in the first place. This direct link between compaction behavior and total physical writes is why write amplification is a key metric for operators.
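To make the link between a compaction setting and write amplification concrete, here is a toy model, not a real engine. It assumes every flush writes one run of unique keys and that, once `trigger` runs exist, compaction merges all of them into a single run. The name `trigger` and the merge-everything policy are assumptions made for illustration.

```python
def simulate_waf(num_flushes: int, flush_bytes: int, trigger: int) -> float:
    """Toy model: return physical bytes written / logical bytes written."""
    runs = []       # sizes of the runs currently on disk
    physical = 0
    for _ in range(num_flushes):
        runs.append(flush_bytes)      # the flush itself is one physical write
        physical += flush_bytes
        if len(runs) >= trigger:
            merged = sum(runs)        # compaction rewrites every existing run
            physical += merged
            runs = [merged]
    logical = num_flushes * flush_bytes
    return physical / logical


aggressive = simulate_waf(num_flushes=8, flush_bytes=1, trigger=2)
relaxed = simulate_waf(num_flushes=8, flush_bytes=1, trigger=4)
print(aggressive, relaxed)  # 5.375 2.375
```

In this model, raising the trigger from 2 to 4 cuts the physical bytes written for the same logical traffic by more than half.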
3 / 5
In a code review, a dev notices a storage engine's default compaction strategy aggressively rewrites large amounts of adjacent data on every small logical write, with no consideration given to how much extra physical disk I/O that generates. What does this represent?
This is a missed opportunity to reduce write amplification. An aggressive default compaction strategy that rewrites large amounts of adjacent data on every small logical write generates physical disk I/O out of proportion to the size of the write, when a less aggressive or better-tuned strategy could avoid much of that rewriting. A cache eviction policy is an unrelated concept that governs which cache entries get discarded. Ignoring the extra I/O cost is the kind of avoidable overhead a reviewer flags once write amplification is measured and found to be high.
4 / 5
An incident report shows a storage engine's disks wore out and needed replacement far sooner than expected, because its aggressive default compaction strategy rewrote large amounts of adjacent data for every small logical write, driving physical disk I/O far above what the logical write volume alone would suggest. What practice would prevent this?
Tuning the compaction strategy to reduce write amplification cuts the extra physical rewriting triggered by each small logical write, which lowers total disk I/O and extends disk lifespan, addressing the premature wear described in this incident. Continuing to use the aggressive default strategy regardless of the extra I/O it generates is what drove physical disk I/O, and with it disk wear, far above what the logical write volume alone would suggest. Compaction tuning is the standard fix once write amplification is identified as the driver of excessive physical disk I/O in a storage engine.
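A rough back-of-the-envelope calculation shows how write amplification translates into disk lifespan. It assumes the disk has a fixed write-endurance budget and that wear scales linearly with physical bytes written. The endurance figure, daily write volume, and both amplification factors are hypothetical numbers, not values from the incident.

```python
def expected_lifespan_days(endurance_gb: int, logical_gb_per_day: int, waf: float) -> float:
    """Days until the endurance budget is used up, assuming linear wear."""
    physical_gb_per_day = logical_gb_per_day * waf
    return endurance_gb / physical_gb_per_day


ENDURANCE_GB = 600_000      # hypothetical endurance budget for the disk
LOGICAL_GB_PER_DAY = 100    # hypothetical client write volume

before = expected_lifespan_days(ENDURANCE_GB, LOGICAL_GB_PER_DAY, waf=20)
after = expected_lifespan_days(ENDURANCE_GB, LOGICAL_GB_PER_DAY, waf=5)
print(before, after)  # 300.0 1200.0
```

Under these assumptions, cutting the write-amplification factor from 20 to 5 quadruples the expected lifespan without any change in logical write volume.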
5 / 5
During a PR review, a teammate asks why the team accepts some residual write amplification instead of tuning compaction all the way down to zero extra rewriting. What is the reasoning?
Compaction's extra rewriting also reclaims disk space from deleted or overwritten data and keeps the on-disk layout organized for efficient reads. Tuning it all the way down to zero extra rewriting would eliminate those benefits too, trading lower write amplification for wasted space and slower, more fragmented reads over time. Storage engines therefore accept some residual write amplification as the cost of keeping space reclamation and read performance healthy, and they tune compaction to balance all three concerns rather than minimizing write amplification alone.
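The three-way balance can be sketched with a deliberately extreme toy model. It assumes every flush overwrites the same set of keys, so only one unit of data is ever live, and that compaction merges all runs into one unit once `trigger` runs exist. The function, its parameter, and the metric names are illustrative assumptions.

```python
def run_workload(num_flushes: int, trigger: int) -> dict:
    """Toy model reporting write amplification, space amplification, and runs per read."""
    runs = []       # size of each on-disk run, in units
    physical = 0
    for _ in range(num_flushes):
        runs.append(1)                # each flush writes one unit
        physical += 1
        if len(runs) >= trigger:
            # Every run holds versions of the same keys, so the merged
            # output is one unit and the stale versions are reclaimed.
            physical += 1
            runs = [1]
    live_units = 1
    return {
        "write_amp": physical / num_flushes,
        "space_amp": sum(runs) / live_units,
        "runs_to_read": len(runs),
    }


with_compaction = run_workload(num_flushes=8, trigger=2)
no_compaction = run_workload(num_flushes=8, trigger=9)   # trigger never reached
print(with_compaction)  # {'write_amp': 1.875, 'space_amp': 1.0, 'runs_to_read': 1}
print(no_compaction)    # {'write_amp': 1.0, 'space_amp': 8.0, 'runs_to_read': 8}
```

With compaction disabled, write amplification drops to 1.0, but the disk holds eight times the live data and a read may have to consult eight runs.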