Build fluency in the vocabulary of buffering writes in memory and flushing them as immutable sorted files that are merged later.
1 / 5
A teammate explains that a storage engine buffers writes in an in-memory sorted structure, periodically flushes it to disk as an immutable sorted file, and later merges those files in the background, so writes stay sequential and fast even though reads may need to check several files. What data structure is being described?
This describes a log-structured merge tree, or LSM tree. Writes are buffered in an in-memory sorted structure called a memtable, which is periodically flushed to disk as an immutable sorted file, and a background compaction process merges those files over time. Writes stay sequential and fast, at the cost of reads sometimes having to check multiple files before a value is found. A DNS zone transfer is an unrelated concept about replicating name server records. This buffer-flush-merge approach is why LSM trees underlie write-heavy storage engines like Cassandra, RocksDB, and LevelDB.
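The buffer-flush-merge cycle can be sketched in a few lines of Python. This is a toy model, not how Cassandra, RocksDB, or LevelDB are implemented: the class name `TinyLSM`, the `memtable_limit` parameter, and the choice to keep "files" as in-memory sorted tuples are all assumptions made for illustration.

```python
from bisect import bisect_left


class TinyLSM:
    """Toy LSM tree. Sorted tuples held in memory stand in for on-disk files."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # in-memory write buffer
        self.runs = []                        # immutable sorted runs, oldest first

    def put(self, key, value):
        # Writes only touch the in-memory buffer.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the buffer into an immutable sorted run.
        if self.memtable:
            self.runs.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Check the memtable first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect_left(run, (key,))  # binary search within the sorted run
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one; entries from newer runs overwrite older ones.
        merged = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [tuple(sorted(merged.items()))] if merged else []
```

With `memtable_limit=2`, four `put` calls produce two runs, a lookup for an overwritten key returns the newer value because newer runs are searched first, and `compact()` collapses the runs into one.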
2 / 5
During a design review, the team chooses an LSM-tree-based storage engine for a write-heavy time-series database, specifically because writes only need to append to an in-memory buffer and a sequential flush file rather than performing random-access disk writes. Which capability does this provide?
An LSM-tree-based storage engine provides high sustained write throughput here, because each write is an append to an in-memory buffer that is later flushed sequentially, rather than a random-access update scattered across disk pages. A B-tree that updates individual pages in place performs random-access disk writes, which become a bottleneck under the high sustained write rate this time-series database must handle. This append-then-flush behavior is why LSM trees are favored for write-heavy workloads.
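A back-of-envelope cost model makes the gap concrete. The sketch below is a deliberate simplification: it assumes one seek per in-place write and one seek per flush, and it ignores write-ahead logging, compaction rewrites, caching, and the very different seek behavior of SSDs. The function names and all numeric inputs are hypothetical.

```python
import math


def in_place_write_seconds(n_points, seek_ms):
    """Assumed model: every point pays one random-access seek."""
    return n_points * seek_ms / 1000.0


def lsm_write_seconds(n_points, point_bytes, memtable_points, seek_ms, seq_mb_per_s):
    """Assumed model: each flush pays one seek plus a sequential transfer."""
    flushes = math.ceil(n_points / memtable_points)
    transfer_seconds = n_points * point_bytes / (seq_mb_per_s * 1_000_000)
    return flushes * seek_ms / 1000.0 + transfer_seconds


# Made-up inputs: 1M points of 100 bytes, 5 ms seeks, 100 MB/s sequential writes.
in_place = in_place_write_seconds(1_000_000, seek_ms=5)
buffered = lsm_write_seconds(1_000_000, point_bytes=100, memtable_points=100_000,
                             seek_ms=5, seq_mb_per_s=100)
```

Under these made-up numbers the in-place path spends about 5000 seconds seeking, while the buffered path spends roughly one second. The exact ratio depends entirely on the assumed inputs, but the shape of the result is the point: seek cost scales with the number of flushes instead of the number of writes.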
3 / 5
In a code review, a dev notices a storage engine performs an in-place random-access disk write for every incoming data point in a write-heavy time-series workload, instead of buffering writes in memory and flushing them sequentially as an LSM tree would. What does this represent?
This is a missed LSM-tree opportunity, since buffering writes and flushing them sequentially would sustain far higher write throughput than in-place random-access writes. A cache eviction policy is an unrelated concept about which cache entries to discard. This in-place random-write pattern is the kind of throughput ceiling a reviewer flags once the workload is confirmed to be write-heavy.
4 / 5
An incident report shows write throughput collapsed under load because every incoming data point triggered an in-place random-access disk write, and disk seek time dominated the write path once concurrent writers scaled up. What practice would prevent this?
Switching to an LSM-tree-based storage engine that buffers writes in memory and flushes them as sequential sorted files removes the per-write random disk seek from the write path. Continuing to issue an in-place random-access write for every data point, no matter how much seek time accumulates under concurrent load, is what caused the throughput collapse in this incident. Buffering and then flushing sequentially is the standard fix once random-access seeks are confirmed to be the write-path bottleneck.
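The buffer-then-sequential-flush write path might look like the sketch below, which writes real files so the batching is visible on disk. The class name `SegmentWriter`, the `buffer_limit` parameter, the `segment-NNNNNN.jsonl` naming, and the JSON-lines encoding are all assumptions for illustration. The sketch also has no durability guarantee: production engines typically pair the buffer with a sequential write-ahead log so that buffered points survive a crash.

```python
import json
import os
import tempfile


class SegmentWriter:
    """Buffers points in memory and writes each full buffer as one sorted file."""

    def __init__(self, directory, buffer_limit=1000):
        self.directory = directory
        self.buffer_limit = buffer_limit  # hypothetical flush threshold
        self.buffer = {}                  # timestamp -> value
        self.segment_count = 0

    def write(self, timestamp, value):
        # No disk I/O here: the point only lands in the in-memory buffer.
        self.buffer[timestamp] = value
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        # One sequential pass writes the whole buffer in timestamp order.
        if not self.buffer:
            return None
        name = f"segment-{self.segment_count:06d}.jsonl"
        path = os.path.join(self.directory, name)
        with open(path, "w", encoding="utf-8") as f:
            for ts in sorted(self.buffer):
                f.write(json.dumps([ts, self.buffer[ts]]) + "\n")
        self.segment_count += 1
        self.buffer = {}
        return path


# Usage: five out-of-order points with a buffer of three yield two segments.
with tempfile.TemporaryDirectory() as tmp:
    writer = SegmentWriter(tmp, buffer_limit=3)
    for ts in (30, 10, 20, 50, 40):
        writer.write(ts, ts * 0.5)
    writer.flush()  # persist the partially filled buffer
    segments = sorted(os.listdir(tmp))
    with open(os.path.join(tmp, segments[0]), encoding="utf-8") as f:
        first_segment = [json.loads(line) for line in f]
```

Five writes result in two file writes rather than five, and each file is already sorted by timestamp even though the points arrived out of order.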
5 / 5
During a PR review, a teammate asks why the team reaches for an LSM tree instead of a B-tree, given that a B-tree gives more predictable single-key read latency without needing to check multiple files. What is the reasoning?
An LSM tree accepts some read amplification, because a lookup may have to check multiple sorted files, in exchange for much higher sustained write throughput. A B-tree gives more predictable single-key reads but pays random-access write costs that limit sustained write throughput. This is why LSM trees are favored for write-heavy workloads like time-series ingestion, while B-trees remain preferable when read latency predictability matters more than raw write throughput.
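Read amplification can be made visible by counting how many sorted files a lookup touches before and after compaction. In this sketch, plain dictionaries ordered newest-first stand in for sorted files, and the function names `lookup` and `compact` are hypothetical. Real engines commonly reduce this cost with per-file Bloom filters, which the sketch leaves out.

```python
def lookup(runs, key):
    """Search runs newest-first; return (value, number_of_runs_checked)."""
    checked = 0
    for run in runs:
        checked += 1
        if key in run:
            return run[key], checked
    return None, checked


def compact(runs):
    """Merge all runs into one; entries from newer runs win."""
    merged = {}
    for run in reversed(runs):  # apply oldest first so newer values overwrite
        merged.update(run)
    return [merged]


# Three runs, newest first; key "a" lives only in the oldest one.
runs = [{"c": 3}, {"b": 2}, {"a": 1, "b": 0}]
before = lookup(runs, "a")          # checks all three runs
after = lookup(compact(runs), "a")  # checks the single merged run
```

Before compaction the lookup for `"a"` checks three runs, and afterward it checks one. A missing key is the worst case, since every run has to be consulted before the lookup can give up, which is one reason single-key read latency is less predictable than in a B-tree.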