Practice the vocabulary of merging on-disk segments to keep a log-structured merge-tree's reads fast.
1 / 5
At standup, a dev mentions a background process that merges and rewrites sorted on-disk segments in a log-structured merge-tree, reclaiming the space held by overwritten or deleted keys. What is this process called?
LSM-tree compaction is the background process that merges and rewrites sorted on-disk segments, reclaiming the space held by overwritten or deleted keys and keeping the number of segments a read has to check bounded. Appending every new write as a brand-new segment forever, with no merging, would let stale, overwritten data accumulate on disk indefinitely. Compaction is what keeps a log-structured merge-tree's storage usage and read performance sustainable over time.
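To make the merge step concrete, here is a minimal in-memory sketch, assuming each segment is a key-sorted list of (key, value) pairs and that deletes are recorded with a marker value. The names `TOMBSTONE` and `compact` are hypothetical, and a real engine would stream segments from disk rather than hold them in a dict.

```python
TOMBSTONE = object()  # hypothetical marker recorded when a key is deleted


def compact(segments):
    """Merge segments (ordered oldest to newest) into one sorted segment.

    Each segment is a list of (key, value) pairs sorted by key.
    """
    latest = {}
    for segment in segments:  # oldest first, so newer values overwrite older ones
        for key, value in segment:
            latest[key] = value
    # Dropping tombstones is safe here only because this sketch merges
    # every segment at once, so no older copy of the key can survive.
    return [(k, v) for k, v in sorted(latest.items()) if v is not TOMBSTONE]


old = [("a", 1), ("b", 2), ("c", 3)]
new = [("b", 20), ("c", TOMBSTONE)]
merged = compact([old, new])
print(merged)  # [('a', 1), ('b', 20)]
```

Two segments holding five entries shrink to one segment holding two: the overwritten value of "b" and the deleted key "c" no longer take up space.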
2 / 5
During a design review, the team accepts that compaction rewrites the same data to disk multiple times over its lifetime, in exchange for keeping reads fast. Which tradeoff does this represent?
Write amplification traded for bounded read latency is the tradeoff in which compaction rewrites the same underlying data to disk multiple times over its lifetime, in exchange for keeping the number of segments a read must check under control. A system with no extra write amplification, where each byte reaches disk exactly once, would be one where compaction never ran, letting an unbounded number of segments accumulate and slow reads down over time. This tradeoff is a deliberate, well-understood design choice at the core of how an LSM-tree balances write and read performance.
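As a rough worked example, write amplification can be expressed as bytes written to disk divided by bytes the application asked to write. The sketch below uses made-up numbers and a hypothetical helper name, and it ignores any write-ahead log.

```python
def write_amplification(user_bytes, flush_bytes, compaction_bytes):
    """Ratio of bytes physically written to bytes the application wrote."""
    return (flush_bytes + compaction_bytes) / user_bytes


# Hypothetical workload: 100 MB of user data is flushed to disk once,
# then rewritten in full by three later compaction passes.
wa = write_amplification(user_bytes=100, flush_bytes=100, compaction_bytes=3 * 100)
print(wa)  # 4.0

# With no compaction at all, each byte is written exactly once.
no_compaction = write_amplification(user_bytes=100, flush_bytes=100, compaction_bytes=0)
print(no_compaction)  # 1.0
```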
3 / 5
In a code review, a dev notices the team chose a leveled compaction strategy over a size-tiered one, accepting more write amplification in exchange for lower space amplification and more predictable read latency. What does this represent?
A compaction strategy choice, like leveled versus size-tiered, trades write amplification differently against read and space amplification, since each strategy organizes and merges on-disk segments in a structurally different way. Assuming there's a single universal strategy with no tradeoff ignores that a real system has to pick the balance that best fits its actual read-versus-write workload. This strategy choice is a meaningful, workload-specific tuning decision for any system built on an LSM-tree.
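One structural difference behind this tradeoff can be sketched in a few lines, under simplifying assumptions: size-tiered segments may have overlapping key ranges, while each level in a leveled layout holds segments with disjoint key ranges. The function names and key ranges below are hypothetical, and the sketch ignores Bloom filters and any special handling of freshly flushed segments.

```python
def segments_to_check_size_tiered(segments, key):
    """Segments may overlap, so every segment whose range covers the key is a candidate."""
    return sum(1 for lo, hi in segments if lo <= key <= hi)


def segments_to_check_leveled(levels, key):
    """Ranges within a level are disjoint, so at most one segment per level is a candidate."""
    return sum(1 for level in levels if any(lo <= key <= hi for lo, hi in level))


# Hypothetical size-tiered layout: four segments that all span the full key range.
size_tiered = [("a", "z"), ("a", "z"), ("a", "z"), ("a", "z")]

# Hypothetical leveled layout: two levels, each split into disjoint ranges.
leveled = [
    [("a", "m"), ("n", "z")],
    [("a", "f"), ("g", "p"), ("q", "z")],
]

tiered_cost = segments_to_check_size_tiered(size_tiered, "h")
leveled_cost = segments_to_check_leveled(leveled, "h")
print(tiered_cost, leveled_cost)  # 4 2
```

Keeping the ranges inside a level disjoint is what costs leveled compaction its extra rewrites: data moving down a level has to be merged with whatever overlaps it there.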
4 / 5
An incident report shows read latency degraded significantly under heavy write load because compaction had fallen behind, letting a large number of small on-disk segments accumulate that every read had to check individually. What practice would prevent this?
Ensuring compaction keeps pace with the write rate puts a bound on the number of on-disk segments a read has to check, keeping read latency predictable even under sustained write pressure. Letting compaction fall arbitrarily behind lets segments accumulate without bound, exactly as this incident describes, and directly hurts read performance. This is why monitoring compaction and provisioning enough resources for it to keep pace is essential to the health of an LSM-tree-based system.
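The effect of a compaction backlog can be illustrated with a toy simulation, assuming each flush adds one segment and each merge combines a fixed number of segments into one. The function name, rates, and `fan_in` value are all hypothetical.

```python
def simulate_segment_count(flushes_per_tick, merges_per_tick, ticks, fan_in=4):
    """Track the segment count over time; each merge turns fan_in segments into one."""
    segments = 0
    history = []
    for _ in range(ticks):
        segments += flushes_per_tick
        for _ in range(merges_per_tick):
            if segments >= fan_in:
                segments -= fan_in - 1
        history.append(segments)
    return history


keeping_up = simulate_segment_count(flushes_per_tick=4, merges_per_tick=2, ticks=5)
falling_behind = simulate_segment_count(flushes_per_tick=4, merges_per_tick=1, ticks=5)
print(keeping_up)      # [1, 2, 3, 1, 2]
print(falling_behind)  # [1, 2, 3, 4, 5]
```

With two merges per tick the count stays within a small band; with one merge per tick it grows by one segment every tick, which is the kind of trend a segment-count or pending-compaction metric would surface.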
5 / 5
During a PR review, a teammate asks why the team accepts the extra write cost of ongoing compaction instead of simply appending every write as a new segment forever with no merging. What is the reasoning?
Appending every write as a new segment forever, with no compaction, lets the number of segments a read must check grow without bound as writes accumulate. Compaction bounds that number at the cost of rewriting the same data to disk multiple times over its lifetime. That cost is real, ongoing write amplification, which is why choosing an appropriate compaction strategy for the workload matters so much.
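The read-side cost can be sketched as a lookup that walks segments from newest to oldest and counts how many it touches. This is a simplified model with hypothetical names: segments are plain dicts, and real engines typically use per-segment filters or indexes to skip some of them.

```python
def get(segments, key):
    """Look up key in segments ordered newest first; return (value, segments_checked)."""
    checked = 0
    for segment in segments:
        checked += 1
        if key in segment:
            return segment[key], checked
    return None, checked


# Ten uncompacted segments, newest first, each holding a single key.
uncompacted = [{f"k{i}": i} for i in range(9, -1, -1)]
before = get(uncompacted, "k0")
print(before)  # (0, 10)

# Merge oldest to newest into a single segment, as compaction would.
merged = {}
for segment in reversed(uncompacted):
    merged.update(segment)
after = get([merged], "k0")
print(after)  # (0, 1)
```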