Cover the failure modes: bloat, wraparound, thrashing, over-indexing
1 / 5
The interviewer asks: "Compare LSM-trees and B-trees for a write-heavy workload — explain write amplification, read amplification, and space amplification for each." Which answer best covers the trade-off analysis?
Option B gives the full RUM-conjecture analysis: B-tree write amplification (WAF 2–10×, random I/O pattern), LSM write amplification mechanics (memtable → L0 → levelled compaction, WAF 10–30× with real numbers), LSM read amplification (multiple SSTable checks, Bloom filter mitigation), space amplification during compaction, and practical tuning knobs (RocksDB parameters). It also gives the key insight about which workloads favour each. Options C and D each identify one aspect but don't give the full amplification picture across all three dimensions (read/write/space).
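To make the read-amplification point concrete, here is a minimal sketch of an LSM read path, assuming a toy model in which each SSTable is a Python dict with a small Bloom filter attached. The names `BloomFilter`, `SSTable`, and `lookup` are illustrative only and do not correspond to any RocksDB API; the filter size and hash count are arbitrary.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: k hash probes into an m-bit array (toy parameters)."""

    def __init__(self, m_bits=1024, k=3):
        self.m, self.k, self.bits = m_bits, k, 0

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Immutable sorted run, modelled here as a dict plus its Bloom filter."""

    def __init__(self, items):
        self.data = dict(items)
        self.bloom = BloomFilter()
        for key in self.data:
            self.bloom.add(key)


def lookup(sstables, key, use_bloom=True):
    """Search the given tables in order; return (value, tables actually read)."""
    reads = 0
    for table in sstables:
        if use_bloom and not table.bloom.might_contain(key):
            continue  # filter rules the table out, so the read is skipped
        reads += 1
        if key in table.data:
            return table.data[key], reads
    return None, reads
```

With five tables and a key that lives only in the last one, `lookup(..., use_bloom=False)` reads all five tables, while the filtered path usually reads just one. A false positive can still cost an extra read, which is why the filter mitigates read amplification rather than eliminating it.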
2 / 5
The interviewer asks: "Explain how MVCC (Multi-Version Concurrency Control) works in PostgreSQL — how are versions stored, how are they cleaned up, and what can go wrong?" Which answer demonstrates the deepest understanding?
Option B covers every layer: tuple format (xmin/xmax), snapshot visibility rule with exact logic, MVCC storage bloat mechanics, the difference between VACUUM and VACUUM FULL, XID wraparound (32-bit limit, 2.1B transactions, VACUUM FREEZE as the solution, and the emergency shutdown consequence), and the long-transaction + bloat interaction. Option C states the xmin/xmax fact but doesn't explain the snapshot rule, wraparound risk, or tuning implications. Options A and D are correct but surface-level.
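The snapshot visibility rule can be sketched as follows. This is a deliberately simplified model, not PostgreSQL's actual code: it ignores a transaction's own writes, command IDs, subtransactions, hint bits, and wraparound-aware XID comparison. `Snapshot`, `TupleVersion`, and `is_visible` are hypothetical names, and `committed` stands in for the commit log.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    xmax: int  # first XID not yet assigned when the snapshot was taken
    in_progress: frozenset = field(default_factory=frozenset)  # XIDs running then

    def sees(self, xid, committed):
        """True if xid started before the snapshot, was not running, and committed."""
        return xid < self.xmax and xid not in self.in_progress and xid in committed


@dataclass
class TupleVersion:
    xmin: int      # XID that inserted this version
    xmax: int = 0  # XID that deleted or updated it; 0 means none
    value: str = ""


def is_visible(tup, snap, committed):
    if not snap.sees(tup.xmin, committed):
        return False  # the inserting transaction is not visible to this snapshot
    if tup.xmax != 0 and snap.sees(tup.xmax, committed):
        return False  # the deleting transaction is visible, so the version is gone
    return True
```

Under this model, a snapshot taken while an updating transaction is still running keeps seeing the old version even after that transaction commits. That is why a long-running transaction prevents VACUUM from reclaiming old versions: some snapshot can still see them.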
3 / 5
The interviewer asks: "Explain the WAL (Write-Ahead Log) recovery process in a crash scenario — what are the ARIES phases and how does PostgreSQL implement them?" Which answer best covers crash recovery?
Option B is the complete answer: ARIES phases with the nuance that redo replays ALL transactions (not just committed ones — a common misconception), WAL record structure with LSN as a 64-bit pointer, checkpoint mechanics with the specific configuration knobs, PITR with WAL archiving, WAL level modes, and full-page writes (explaining why they're needed — torn page prevention). Options C and D each name 2-3 correct concepts but miss the ARIES redo-all-transactions nuance, full-page writes, and PITR mechanics.
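The redo-all-transactions nuance is easier to see in a toy model. The sketch below follows textbook ARIES (analysis, redo, undo) over an in-memory log and is not PostgreSQL's recovery code. It assumes a hypothetical `LogRecord` layout, replays every update unconditionally instead of comparing page LSNs, and omits compensation log records and checkpoints.

```python
from collections import namedtuple

# Hypothetical record layout: before/after hold the old and new value of `key`.
LogRecord = namedtuple("LogRecord", "lsn txn kind key before after")


def recover(log, disk):
    """Return (recovered state, loser transactions) from the log and on-disk state."""
    state = dict(disk)

    # Analysis: transactions with no commit record are "losers".
    seen = {r.txn for r in log}
    committed = {r.txn for r in log if r.kind == "commit"}
    losers = seen - committed

    # Redo: repeat history for every update, committed or not, in LSN order.
    for r in sorted(log, key=lambda r: r.lsn):
        if r.kind == "update":
            state[r.key] = r.after

    # Undo: roll back loser updates in reverse LSN order using before-images.
    for r in sorted(log, key=lambda r: r.lsn, reverse=True):
        if r.kind == "update" and r.txn in losers:
            if r.before is None:
                state.pop(r.key, None)  # the key did not exist before this update
            else:
                state[r.key] = r.before
    return state, losers
```

Replaying the uncommitted updates first brings the state back to exactly what it was at the crash, so the undo pass can apply before-images without guessing which changes had reached disk.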
4 / 5
The interviewer asks: "How does a database buffer pool work, and what are the key eviction policy trade-offs between LRU and clock-sweep?" Which answer best covers buffer pool internals?
Option B covers the full picture: buffer pool structure with the page table hash map and descriptor fields (pin count, dirty flag, usage count, LSN), LRU mechanics and its sequential scan thrashing failure mode, clock-sweep with exact PostgreSQL usage counter values (max 5), the sequential scan ring buffer optimisation (a key PostgreSQL-specific detail rarely mentioned), ARC as an alternative with its use in ZFS/Oracle, and pinning semantics and their impact on effective pool size. Options C and D mention the right algorithm names but don't explain the sequential scan ring buffer, the eviction counter mechanics, or ARC.
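A minimal clock-sweep sketch illustrates the usage-counter mechanics. It assumes a fixed pool, a counter capped at 5 as cited above, and a newly loaded page starting at a count of 1. The class and method names are illustrative, and dirty-page write-back and the sequential-scan ring buffer are left out.

```python
class ClockSweepPool:
    MAX_USAGE = 5  # cap on the per-buffer usage counter

    def __init__(self, n_buffers):
        self.pages = [None] * n_buffers  # page id held by each buffer
        self.usage = [0] * n_buffers     # usage counter per buffer
        self.pins = [0] * n_buffers      # pin count per buffer
        self.table = {}                  # page id -> buffer index
        self.hand = 0                    # position of the clock hand

    def _find_victim(self):
        # MAX_USAGE + 1 full sweeps are enough to drain any unpinned buffer.
        for _ in range(len(self.pages) * (self.MAX_USAGE + 1)):
            i = self.hand
            self.hand = (self.hand + 1) % len(self.pages)
            if self.pins[i] > 0:
                continue                 # pinned buffers are never evicted
            if self.usage[i] == 0:
                return i
            self.usage[i] -= 1           # decrement and move on
        raise RuntimeError("no unpinned buffer available")

    def access(self, page_id):
        """Return the buffer index for page_id, loading the page on a miss."""
        i = self.table.get(page_id)
        if i is None:
            i = self._find_victim()
            if self.pages[i] is not None:
                del self.table[self.pages[i]]  # evict the previous occupant
            self.pages[i] = page_id
            self.table[page_id] = i
            self.usage[i] = 1
        else:
            self.usage[i] = min(self.usage[i] + 1, self.MAX_USAGE)
        return i
```

Because a hot page only loses one count per sweep, it survives several passes of the hand. Pinned buffers are skipped entirely, which is how heavy pinning shrinks the effective pool size.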
5 / 5
The interviewer asks: "Design an index advisor for a relational database — what signals would you collect, how would you recommend indexes, and how do you avoid over-indexing?" Which answer demonstrates the most complete design?
Option B covers all five design dimensions: signal collection with specific PostgreSQL views and the metrics they provide, candidate index generation with the column ordering rule (equality first, range last), cost estimation using `hypopg` for hypothetical evaluation, over-indexing prevention with specific thresholds (zero-scan detection, 30-day window, 5-index limit, partial indexes), and output format with write overhead reporting and human approval gates for large tables. Options C and D each identify one or two of the five dimensions, but neither covers column ordering rules, write overhead estimation, or the over-indexing prevention mechanisms.
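Two of these rules are simple enough to sketch directly: the column ordering rule and zero-scan detection. The functions below are a hedged illustration, not part of any real advisor. The input shapes (predicate pairs, stats dicts with `name`, `scans`, and `age_days` keys) are assumptions made for the example, and the 30-day default mirrors the window mentioned above.

```python
def order_index_columns(predicates):
    """Order candidate index columns: equality predicates first, ranges last.

    predicates: list of (column, operator) pairs taken from a WHERE clause.
    A column that appears more than once keeps its first classification.
    """
    equality, ranged = [], []
    for column, op in predicates:
        if column in equality or column in ranged:
            continue
        (equality if op == "=" else ranged).append(column)
    return equality + ranged


def drop_candidates(index_stats, min_age_days=30):
    """Return names of indexes with zero scans over at least min_age_days.

    index_stats: iterable of dicts with 'name', 'scans', and 'age_days' keys
    (a hypothetical shape; a real advisor would fill these from statistics views).
    """
    return [
        s["name"]
        for s in index_stats
        if s["scans"] == 0 and s["age_days"] >= min_age_days
    ]
```

For a query filtering on `created_at > ...`, `tenant_id = ...`, and `status = ...`, the first function proposes `(tenant_id, status, created_at)`, which lets the index narrow on the equality columns before scanning the range.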