Senior Distributed Systems Engineer Interview Questions
5 exercises — choose the best-structured answer to common Senior Distributed Systems Engineer interview questions. Focus on Paxos and Raft consensus comparison, CRDTs and eventual consistency, linearizability vs serializability, two-phase commit and its failure modes, and CAP theorem practical application.
Structure for Senior Distributed Systems Engineer interview answers
Name the property precisely: distinguish linearizability, serializability, sequential consistency
Explain failure modes: what happens when nodes crash, partition occurs, or messages are delayed
State the trade-off: consistency vs availability vs latency
1 / 5
"Compare Paxos and Raft — why was Raft designed as an alternative?"
Option B is best because it explains the single-decree vs Multi-Paxos gap, names Raft's three sub-problems with their mechanisms (randomised timeouts, majority acknowledgement, election restriction), explains the strong-leader design and its throughput trade-off, contrasts EPaxos as an alternative, and lists real-world adopters. Options A, C, and D correctly note the understandability motivation but none explains the Paxos completeness gap, Raft's specific sub-problems, or the strong-leader throughput trade-off.
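Two of the mechanisms named above, the election restriction and majority acknowledgement, can be sketched in a few lines. This is an illustrative sketch only, not a Raft implementation: the names `LogPosition`, `candidate_is_up_to_date`, and `majority_match_index` are hypothetical, and the commit rule omits Raft's extra requirement that the entry belong to the leader's current term.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogPosition:
    """Term and index of the last entry in a server's log (hypothetical type)."""
    last_term: int
    last_index: int


def candidate_is_up_to_date(candidate: LogPosition, voter: LogPosition) -> bool:
    """Election restriction: a voter grants its vote only if the candidate's
    log is at least as up-to-date as its own. A higher last term wins; on
    equal terms, the longer log wins."""
    if candidate.last_term != voter.last_term:
        return candidate.last_term > voter.last_term
    return candidate.last_index >= voter.last_index


def majority_match_index(match_indexes: List[int]) -> int:
    """Majority acknowledgement: the highest log index stored on a majority
    of servers. The list is assumed to include the leader's own last index."""
    ranked = sorted(match_indexes, reverse=True)
    majority = len(ranked) // 2 + 1
    return ranked[majority - 1]
```

In a five-node cluster with match indexes `[5, 4, 4, 2, 1]`, index 4 is the highest entry held by three servers, so it is the highest candidate for commitment.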
2 / 5
"What are CRDTs and when would you use them over consensus-based approaches?"
Option B is best because it defines both CRDT variants (state-based CvRDTs with join-semilattice and operation-based CmRDTs), names concrete CRDT types (G-Counter, OR-Set, RGA), gives three specific decision criteria for choosing CRDTs over consensus including the CAP positioning, explicitly identifies the limitation (invariants like non-negative balance), and names real systems that use them. Options A, C, and D give correct high-level descriptions but none explains the mathematical properties, names CRDT types, or articulates the bank-balance limitation.
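The G-Counter mentioned above is the simplest state-based CRDT to sketch. In the minimal version below, the class name and method names are illustrative choices. Each replica increments only its own slot, and `merge` takes the element-wise maximum, which is the join of the semilattice and is commutative, associative, and idempotent.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: one non-decreasing slot per replica."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        # Only increments are allowed; a decrement would break monotonicity.
        if amount < 0:
            raise ValueError("G-Counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum of all replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max: safe to apply in any order, any number of times.
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)
```

The rejected negative increment hints at the limitation the answer identifies: a structure that converges without coordination cannot by itself enforce an invariant such as a non-negative balance across replicas.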
3 / 5
"Explain the difference between linearizability and serializability with examples."
Option B is best because it clearly separates the abstraction levels (single-object operations vs multi-object transactions), gives a concrete read-after-write linearizability example, gives a concrete multi-account transaction serializability example, defines strict serializability as the combination, and names real systems that implement each (etcd for linearizability, Spanner and FoundationDB for strict serializability). Options A, C, and D correctly state the high-level distinction but none provides the concrete examples, defines strict serializability, or names real implementations.
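The serializability half of the distinction can be illustrated with a brute-force check: an outcome is acceptable if it equals the result of running the transactions in some serial order. The sketch below is a simplification under stated assumptions. It compares final states only, ignores what each transaction read, and ignores real-time order, which strict serializability would also constrain. The helpers `transfer`, `add_interest`, and `is_serializable_outcome` are hypothetical names.

```python
from itertools import permutations
from typing import Callable, Dict, Sequence

Accounts = Dict[str, int]
Transaction = Callable[[Accounts], Accounts]


def transfer(src: str, dst: str, amount: int) -> Transaction:
    """Multi-account transaction: move `amount` from src to dst."""
    def apply(state: Accounts) -> Accounts:
        new_state = dict(state)
        new_state[src] -= amount
        new_state[dst] += amount
        return new_state
    return apply


def add_interest(account: str, percent: int) -> Transaction:
    """Single-account transaction: add integer-percent interest."""
    def apply(state: Accounts) -> Accounts:
        new_state = dict(state)
        new_state[account] += new_state[account] * percent // 100
        return new_state
    return apply


def is_serializable_outcome(initial: Accounts,
                            transactions: Sequence[Transaction],
                            observed: Accounts) -> bool:
    """True if `observed` matches the final state of some serial order."""
    for order in permutations(transactions):
        state = dict(initial)
        for txn in order:
            state = txn(state)
        if state == observed:
            return True
    return False
```

With both accounts starting at 100, a transfer of 50 from `a` to `b` and 10% interest on `b` can serially yield `b = 165` or `b = 160`. An observed `b = 150` means the interest write was lost, so no serial order explains it.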
4 / 5
"How does two-phase commit work and what are its failure modes?"
Option B is best because it names the exact phases with precise message vocabulary (PREPARE, VOTE-COMMIT, VOTE-ABORT, COMMIT/ABORT), explains why each failure mode occurs (participants cannot unilaterally decide post-VOTE-COMMIT because the coordinator's decision is the authoritative record), covers three distinct failure scenarios, explains 3PC as a mitigation and its own limitations, and names PostgreSQL as a real implementation. Options A, C, and D correctly identify the basic protocol and the blocking failure but none explains why it blocks mechanically, covers all three failure scenarios, or discusses 3PC.
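The decision rule and the blocking behaviour described above can be sketched as two small functions. This is a simplified model, not a full protocol: there is no logging, networking, or recovery, and the function names are hypothetical. A vote of `None` stands for a participant that timed out during the PREPARE phase.

```python
from enum import Enum
from typing import Dict, Optional


class Vote(Enum):
    COMMIT = "VOTE-COMMIT"
    ABORT = "VOTE-ABORT"


class Decision(Enum):
    COMMIT = "COMMIT"
    ABORT = "ABORT"


def coordinator_decide(votes: Dict[str, Optional[Vote]]) -> Decision:
    """Phase 2: commit only if every participant voted VOTE-COMMIT.
    A missing vote (None) or any VOTE-ABORT forces ABORT."""
    if votes and all(vote is Vote.COMMIT for vote in votes.values()):
        return Decision.COMMIT
    return Decision.ABORT


def participant_on_coordinator_timeout(own_vote: Optional[Vote]) -> str:
    """What a participant may do when the coordinator goes silent."""
    if own_vote is None or own_vote is Vote.ABORT:
        # It never promised to commit, so aborting alone is safe.
        return "ABORT"
    # After VOTE-COMMIT the outcome belongs to the coordinator: the
    # participant must keep its locks until it learns the decision.
    return "BLOCKED"
```

The `"BLOCKED"` branch is the mechanical reason 2PC blocks: a participant that voted VOTE-COMMIT can neither commit nor abort on its own without risking disagreement with the others.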
5 / 5
"How do you apply the CAP theorem to a practical system design decision?"
Option B is best because it explains why partition tolerance is non-optional (partitions are inevitable), gives a detailed concrete example (e-commerce inventory) with both CP and AP implementations naming specific databases (etcd, Cassandra, DynamoDB), articulates the real business consequences of each choice (downtime vs overselling), provides a three-step decision process, and introduces PACELC as the extension that covers latency trade-offs outside partition scenarios. Options A, C, and D state the CP/AP choice correctly but none provides the concrete inventory example, names specific databases, explains the PACELC extension, or shows a structured decision process.
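The CP and AP choices in the inventory example can be contrasted with a toy write handler. This is a sketch under stated assumptions rather than a model of any named database: the mode strings, the return values, and the function names are all hypothetical, and `reachable_replicas` is assumed to count the node handling the request.

```python
def has_quorum(reachable_replicas: int, cluster_size: int) -> bool:
    """A strict majority of the cluster is reachable."""
    return reachable_replicas > cluster_size // 2


def handle_reservation(mode: str, reachable_replicas: int, cluster_size: int) -> str:
    """Decide what to do with an inventory reservation during a partition."""
    if mode == "CP":
        # Consistency first: refuse the write on the minority side.
        # Business cost: customers on that side see downtime.
        if has_quorum(reachable_replicas, cluster_size):
            return "accepted"
        return "rejected"
    if mode == "AP":
        # Availability first: accept locally and reconcile after the
        # partition heals. Business cost: possible overselling.
        if reachable_replicas >= 1:
            return "accepted_locally"
        return "rejected"
    raise ValueError(f"unknown mode: {mode}")
```

In a five-node cluster split 2/3, the two-node side rejects reservations in CP mode and accepts them locally in AP mode, which is exactly the downtime-versus-overselling trade-off the answer describes.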