Question 1

What does the CAP theorem state and how do engineers use it in design discussions?

Accepted Answer

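The CAP theorem states that a distributed system cannot guarantee all three of Consistency, Availability, and Partition tolerance at once. Because network partitions cannot be ruled out in practice, the real choice arises during a partition: either reject some requests to stay consistent, or keep answering and risk stale or conflicting data. Engineers use it to frame trade-off decisions, describing systems as CP (consistent during partitions, at the cost of availability) or AP (available during partitions, but may serve stale data).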

Question 2

How do engineers explain the Raft consensus algorithm in technical discussions?

Accepted Answer

Raft achieves consensus through leader election, log replication, and monotonically increasing term numbers. Engineers describe it as follows: each term has at most one leader, elected by a majority vote; all writes go through the leader, and an entry is committed only once it has been replicated to a majority (quorum) of nodes.
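
The majority rules above can be sketched in a few lines of Python. This is a simplified illustration, not Raft itself: the function names are made up for this example, and real Raft adds further conditions (for instance, a leader only commits entries from its own term this way).

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def wins_election(votes_received: int, cluster_size: int) -> bool:
    """A candidate becomes leader once a majority has voted for it in its term."""
    return votes_received >= majority(cluster_size)


def highest_replicated_index(match_index: list[int]) -> int:
    """Highest log index that is stored on a majority of nodes.

    match_index holds, for every node including the leader, the last log
    index known to be stored on that node.
    """
    ranked = sorted(match_index, reverse=True)
    return ranked[majority(len(match_index)) - 1]
```

With five nodes whose logs end at indexes 5, 5, 3, 2 and 1, the highest index present on three of them is 3, so entries up to 3 are the ones a leader could treat as replicated to a quorum.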

Question 3

What is split-brain in distributed systems and how is it communicated?

Accepted Answer

Split-brain occurs when a network partition leaves two parts of a cluster each believing it is the active primary, leading to conflicting writes. Engineers prevent it with quorum (only the side holding a majority may stay active) and fencing, and describe resolution using terms like STONITH ("Shoot The Other Node In The Head") for forcibly powering off or isolating a node.

Question 4

What is linearisability and how does it differ from eventual consistency?

Accepted Answer

Linearisability guarantees that each operation appears to take effect instantaneously at some point between its invocation and its response, making a distributed system behave as if there were a single copy of the data. Eventual consistency only guarantees that replicas converge once updates stop, so reads may be stale in the meantime. Engineers describe linearisability as the strongest single-object consistency model, bought at a cost in latency and availability.

Question 5

How do engineers discuss distributed locks and their failure modes?

Accepted Answer

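Distributed locks coordinate exclusive access across processes using systems like Redis or ZooKeeper. Failure modes include the lock expiring while its holder is still working (for example during a long pause or slow processing) and lease renewals failing because of a partition. In both cases a client can believe it still holds a lock that has already been granted to someone else. Fencing tokens, monotonically increasing numbers issued with each lock grant, let the protected resource detect and reject stale lock holders.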
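
A minimal in-memory sketch of the fencing-token idea follows. `LockService` and `FencedStore` are hypothetical classes written for this example; they are not the Redis or ZooKeeper APIs, and a real lock service would also handle leases and expiry.

```python
class LockService:
    """Hypothetical lock service: every grant carries a larger token."""

    def __init__(self) -> None:
        self._next_token = 0

    def acquire(self) -> int:
        self._next_token += 1
        return self._next_token


class FencedStore:
    """Hypothetical resource that rejects writes carrying an older token."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.value = None

    def write(self, token: int, value: str) -> bool:
        if token < self.highest_token:
            return False  # stale holder: a newer token has already been seen
        self.highest_token = token
        self.value = value
        return True


locks = LockService()
store = FencedStore()

token_a = locks.acquire()  # client A gets the lock, then stalls
token_b = locks.acquire()  # A's lease expires and client B gets the lock

accepted_b = store.write(token_b, "from B")  # accepted
accepted_a = store.write(token_a, "from A")  # rejected: A's token is older
```

The check lives in the resource, not in the client, because the stalled client has no way of knowing that its lock has expired.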

Question 6

What is two-phase commit (2PC) and what are its limitations?

Accepted Answer

In two-phase commit, a coordinator asks every participant to prepare, then tells them all to commit if every vote was yes, or to abort otherwise. Its key limitation is the blocking problem: if the coordinator fails after participants have voted yes, they cannot safely commit or abort on their own, so they hold their locks until the coordinator recovers or an operator intervenes.
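
The two phases can be sketched as a toy, single-process simulation. The `Participant` class and `two_phase_commit` function are illustrative names only; a real implementation would add durable logging, timeouts, and network calls.

```python
class Participant:
    """Toy participant that votes yes or no and records its state."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # After voting yes, the participant must wait for the coordinator.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> str:
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"

    # Phase 2: broadcast the decision to every participant.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision
```

The blocking problem corresponds to the coordinator crashing between the two phases: every participant that voted yes stays in the "prepared" state, and nothing in its local state tells it whether the others voted yes or no.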

Question 7

What does quorum mean in distributed consensus and how is it calculated?

Accepted Answer

A quorum is the minimum number of nodes that must agree for an operation to succeed, typically a simple majority: floor(N/2) + 1 of N nodes, for example 3 of 5. Any two majority quorums overlap in at least one node, which guarantees that a committed decision is visible to any subsequent quorum read.
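
As a small worked example, the majority formula and the overlap property can be checked by brute force in Python. The function names are chosen for this sketch, and it covers simple majority quorums only.

```python
from itertools import combinations


def quorum_size(n: int) -> int:
    """Simple majority of n nodes."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that can be down while a quorum is still reachable."""
    return n - quorum_size(n)


def quorums_always_overlap(n: int) -> bool:
    """Check that every pair of majority-sized groups shares a node."""
    groups = [set(c) for c in combinations(range(n), quorum_size(n))]
    return all(a & b for a in groups for b in groups)
```

For example, `quorum_size(5)` is 3 and `tolerated_failures(5)` is 2, while a 4-node cluster needs 3 nodes and tolerates only 1 failure, which is why odd cluster sizes are usually preferred.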

Question 8

How do engineers explain Lamport timestamps and vector clocks?

Accepted Answer

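Lamport timestamps assign a logical counter to each event such that if one event happened before another, it has the smaller timestamp. The converse does not hold, so Lamport timestamps alone cannot tell whether two events are concurrent. Vector clocks extend this by keeping one counter per node, which enables detection of concurrent events. Two events are causally related if one vector clock dominates the other (at least as large in every entry and strictly larger in at least one), and concurrent if neither dominates.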
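
The dominance test can be sketched as follows, assuming a vector clock is represented as a dict from node name to counter, with missing entries treated as 0. That representation and the function names are choices made for this example.

```python
def dominates(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if a is >= b in every entry and > b in at least one."""
    nodes = a.keys() | b.keys()
    at_least = all(a.get(n, 0) >= b.get(n, 0) for n in nodes)
    strictly = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    return at_least and strictly


def compare(a: dict[str, int], b: dict[str, int]) -> str:
    """Classify event a relative to event b by their vector clocks."""
    if dominates(a, b):
        return "after"
    if dominates(b, a):
        return "before"
    nodes = a.keys() | b.keys()
    if all(a.get(n, 0) == b.get(n, 0) for n in nodes):
        return "equal"
    return "concurrent"
```

For instance, `{"n1": 2, "n2": 1}` is after `{"n1": 1, "n2": 1}`, whereas `{"n1": 2, "n2": 0}` and `{"n1": 1, "n2": 1}` are concurrent because each is ahead of the other in one entry.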

Question 9

What is the PACELC theorem and how does it extend CAP?

Accepted Answer

PACELC extends CAP to cover the latency/consistency trade-off under normal operation: during a Partition choose Availability or Consistency; Else choose Latency or Consistency. Engineers use it for more nuanced discussions beyond failure-only CAP scenarios.

Question 10

How do engineers discuss consistency levels in distributed databases like Cassandra?

Accepted Answer

Cassandra allows per-query consistency levels such as ONE, QUORUM, and ALL. Writing and reading at QUORUM ensures that the replicas acknowledging a write and the replicas answering a read overlap in at least one replica (R + W > replication factor), so a read sees the latest acknowledged write without requiring ALL replicas to respond.
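
The overlap condition is plain arithmetic and can be sketched without any database driver. The helper names below are hypothetical and only model the three levels mentioned above for a single datacenter with replication factor `rf`.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond at a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level in this sketch: {level}")


def read_sees_latest_write(write_level: str, read_level: str, rf: int) -> bool:
    """True when the read and write replica sets must overlap (R + W > RF)."""
    w = replicas_required(write_level, rf)
    r = replicas_required(read_level, rf)
    return r + w > rf
```

With a replication factor of 3, QUORUM writes plus QUORUM reads give 2 + 2 > 3, so the sets overlap, while ONE plus ONE gives 1 + 1, which does not exceed 3 and therefore permits stale reads.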

Distributed Systems Consensus

Frequently Asked Questions