Senior Distributed Systems Engineer English: Consensus, CRDTs, and CAP Theorem Vocabulary

Consensus, Raft vs Paxos, split-brain, CRDT, CAP theorem — English vocabulary for senior distributed systems engineers in design reviews and architecture discussions.

Distributed systems has a vocabulary that separates engineers who have worked in the domain from those who have read about it. In design reviews, architecture discussions, and postmortem write-ups, senior engineers use this language precisely — and imprecise usage is immediately noticeable to peers.

This post focuses on the vocabulary you need to discuss distributed systems confidently in English, with the collocations and sentence patterns that native speakers actually use.


Consensus Algorithms

Consensus in distributed systems means agreement — specifically, the ability of a cluster of nodes to agree on a single value or decision even when some nodes fail or messages are delayed.

“The core problem of distributed consensus is getting nodes to agree on the order of operations without a single source of truth.”

A consensus algorithm is the protocol that achieves this agreement. The two you will hear in almost every senior distributed systems discussion are Raft and Paxos.

Raft was designed to be easier to understand than Paxos. It uses a leader node that coordinates log replication. If the leader fails, an election selects a new one. Engineers often say Raft is more approachable or more understandable than Paxos.

“We chose etcd over ZooKeeper partly because etcd uses Raft, which is easier to reason about when you’re debugging a leader election issue.”
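To make the election step concrete, here is a simplified Python sketch of how a Raft node might decide whether to grant its vote to a candidate. It follows the RequestVote rules described in the Raft paper: reject stale terms, vote at most once per term, and vote only for a candidate whose log is at least as up to date. The class and function names are hypothetical, and persistence, election timers, and networking are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeState:
    current_term: int
    voted_for: Optional[str]  # candidate voted for in current_term, if any
    last_log_term: int        # term of the last entry in this node's log
    last_log_index: int       # index of the last entry in this node's log


def handle_request_vote(state: NodeState, candidate_id: str, term: int,
                        cand_last_log_term: int, cand_last_log_index: int) -> bool:
    """Return True if this node grants its vote to the candidate (hypothetical helper)."""
    # A request from an older term is stale: reject it.
    if term < state.current_term:
        return False
    # Seeing a newer term moves this node into that term with no vote cast yet.
    if term > state.current_term:
        state.current_term = term
        state.voted_for = None
    # The candidate's log must be at least as up to date as ours:
    # compare last log terms first, then log length.
    log_ok = (cand_last_log_term > state.last_log_term or
              (cand_last_log_term == state.last_log_term and
               cand_last_log_index >= state.last_log_index))
    # Vote at most once per term (re-granting to the same candidate is allowed).
    if state.voted_for in (None, candidate_id) and log_ok:
        state.voted_for = candidate_id
        return True
    return False
```

Because each node casts at most one vote per term and a candidate needs votes from a majority, at most one leader can win any given term.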

Paxos is the older, foundational consensus algorithm, proposed by Leslie Lamport. It is notoriously difficult to implement correctly. Engineers who work with it often use the phrase “getting Paxos right” to describe this difficulty.

“Multi-Paxos is what most production systems actually implement, not the single-decree version from the original paper.”
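For comparison, the core of single-decree Paxos can be sketched as the acceptor's two rules plus the proposer's rule for choosing a value. This is a simplified, in-memory Python sketch with hypothetical names; a real implementation must also persist acceptor state, survive message loss and retries, and generate unique proposal numbers, and those omitted details are a large part of why Paxos is hard to get right.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

Proposal = Tuple[int, Any]  # (proposal number, value)


@dataclass
class Acceptor:
    promised: int = -1                   # highest proposal number promised so far
    accepted: Optional[Proposal] = None  # last proposal this acceptor accepted

    def prepare(self, n: int) -> Tuple[bool, Optional[Proposal]]:
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            # Report any earlier acceptance so the proposer can adopt its value.
            return True, self.accepted
        return False, None

    def accept(self, n: int, value: Any) -> bool:
        """Phase 2: accept unless a higher-numbered promise has been made."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def choose_value(promises: List[Tuple[bool, Optional[Proposal]]], own_value: Any) -> Any:
    """Proposer rule: after a quorum of promises, reuse the value of the
    highest-numbered proposal already accepted, if any; otherwise use our own."""
    previously_accepted = [p for ok, p in promises if ok and p is not None]
    if previously_accepted:
        return max(previously_accepted, key=lambda p: p[0])[1]
    return own_value
```

The value-selection rule is what keeps a later proposer from overwriting a value that may already have been chosen.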

A quorum is the minimum number of nodes that must agree for a decision to be valid. In a cluster of five nodes, a quorum is typically three.

“We can tolerate up to two node failures and still maintain quorum. Lose a third, and the cluster becomes unavailable rather than inconsistent.”
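The arithmetic behind these numbers is the usual majority-quorum rule, assuming every node has one vote: a quorum is more than half the cluster, so a cluster of n nodes tolerates the failure of fewer than half of them. A minimal Python sketch (the function names are illustrative):

```python
from itertools import combinations


def quorum_size(n: int) -> int:
    """Smallest majority of an n-node cluster."""
    return n // 2 + 1


def max_tolerated_failures(n: int) -> int:
    """Nodes that can fail while the survivors still form a quorum."""
    return n - quorum_size(n)


def majorities_always_overlap(n: int) -> bool:
    """Brute-force check that any two minimal quorums share at least one node,
    which is why two conflicting decisions cannot both reach quorum."""
    quorums = [set(c) for c in combinations(range(n), quorum_size(n))]
    return all(a & b for a in quorums for b in quorums)


for n in (3, 4, 5, 7):
    print(n, quorum_size(n), max_tolerated_failures(n))
```

The printed table shows why clusters are usually sized with an odd number of nodes: a four-node cluster needs three votes and still tolerates only one failure, the same as a three-node cluster.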


Failure Modes

Split-brain is a failure scenario in which a network partition causes two parts of a cluster to operate independently, each believing it is the sole authority. Both sides may accept writes, leading to conflicting state.

“The incident was a classic split-brain scenario. The network partition lasted 90 seconds, and both primaries accepted writes. We spent six hours reconciling the diverged data.”

Use cause, trigger, result in, or lead to with split-brain: “The asymmetric network failure triggered a split-brain condition.”
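One common safeguard is to let only the side of a partition that still holds a majority accept writes. A minimal Python sketch, assuming a fixed, known cluster membership (the function name is hypothetical):

```python
from typing import List, Set


def sides_allowed_to_write(cluster_size: int, partition: List[Set[str]]) -> List[Set[str]]:
    """Return the partition sides that still hold a write quorum.

    Two disjoint sides can never both hold a majority, so at most one side
    is returned; an empty list means the cluster is unavailable for writes."""
    quorum = cluster_size // 2 + 1
    return [side for side in partition if len(side) >= quorum]


five_node_split = [{"n1", "n2", "n3"}, {"n4", "n5"}]
even_split = [{"n1", "n2"}, {"n3", "n4"}]
print(sides_allowed_to_write(5, five_node_split))
print(sides_allowed_to_write(4, even_split))
```

The first call keeps only the three-node side writable; the second returns no side at all, so an even split leaves the cluster unavailable instead of letting both halves diverge.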


Consistency Models

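The CAP theorem (Consistency, Availability, Partition tolerance) states that when a network partition occurs, a distributed system must choose between consistency and availability; it cannot guarantee both. This is often summarised as “pick two of three”, but because partitions cannot be ruled out on a real network, the practical question is which property to give up during a partition. Engineers say a system is CP (consistent and partition-tolerant) or AP (available and partition-tolerant).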

“Cassandra is usually classed as an AP system: by default it prioritises availability over strong consistency. If you need linearisable reads everywhere, Cassandra is probably the wrong choice.”

“People often misapply CAP. The real tradeoff in practice is consistency versus latency, not consistency versus availability.”
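The CP/AP distinction becomes a concrete decision on a replica that has lost contact with part of the cluster. The sketch below is deliberately simplified and uses hypothetical names; it assumes the replica knows the cluster size and how many nodes, itself included, it can still reach.

```python
from enum import Enum


class Mode(Enum):
    CP = "consistent under partition"
    AP = "available under partition"


def accepts_write(mode: Mode, reachable_nodes: int, cluster_size: int) -> bool:
    """Decide whether a replica accepts a write during a partition.

    reachable_nodes counts this replica plus the peers it can still contact."""
    has_quorum = reachable_nodes >= cluster_size // 2 + 1
    if mode is Mode.CP:
        # CP: refuse the write (unavailable) rather than risk divergent state.
        return has_quorum
    # AP: accept locally; replicas must reconcile after the partition heals.
    return True
```

On the minority side of a five-node cluster, a CP replica rejects the write while an AP replica accepts it and defers reconciliation.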

Eventual consistency means that, given enough time without new updates, all replicas will converge to the same value. Reads may return stale data in the meantime.

“The shopping cart is eventually consistent — if two clients add items simultaneously, both additions will eventually be reflected, but there’s a window where one client won’t see the other’s change.”
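A minimal Python sketch of the shopping-cart scenario, assuming the cart is an add-only set of item IDs and that replicas exchange their full state and merge it with set union. Removals, which a real cart needs, are ignored, and the class name is hypothetical.

```python
from typing import Set


class CartReplica:
    def __init__(self) -> None:
        self.items: Set[str] = set()

    def add(self, item: str) -> None:
        # Local write: visible on this replica immediately, elsewhere only after sync.
        self.items.add(item)

    def sync_from(self, other: "CartReplica") -> None:
        # Set union is order-insensitive and safe to repeat, so replicas converge.
        self.items |= other.items


r1, r2 = CartReplica(), CartReplica()
r1.add("book")
r2.add("pen")
# Before sync, each replica sees only its own addition (the stale-read window).
stale_view = set(r1.items)
r1.sync_from(r2)
r2.sync_from(r1)
converged = r1.items == r2.items == {"book", "pen"}
```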

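Strong consistency (or linearisability) means that every read reflects the most recent write, as if the system were a single node. It is more expensive to achieve, because replicas must coordinate, typically through a leader or a quorum, before an operation completes, and that coordination adds latency.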

“We need strong consistency for account balances. We cannot show a user a stale balance before they initiate a payment.”


CRDTs and Clocks

A CRDT (Conflict-free Replicated Data Type) is a data structure designed so that concurrent updates from multiple nodes can always be merged without conflicts. Its mathematical properties (for state-based CRDTs, a merge operation that is commutative, associative, and idempotent) guarantee that all replicas converge to the same state once they have received the same updates.

“We use a CRDT for the collaborative document editor. No matter how many clients edit simultaneously, the state always converges deterministically.”

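“CRDTs are not free; they impose constraints on the data model. A grow-only counter (G-Counter) can only increment, never decrement. If you need decrements, you have to move to a PN-Counter, which tracks increments and decrements as two separate grow-only counters.”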
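A grow-only counter (G-Counter) is one of the simplest CRDTs and illustrates the counter constraint described above. Each node increments only its own slot, and merging takes the per-node maximum, which is commutative, associative, and idempotent, so replicas converge regardless of the order or number of merges. A minimal Python sketch (the class name is illustrative, not a specific library's API):

```python
from typing import Dict


class GCounter:
    """Grow-only counter: one non-decreasing count per node."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-node maximum: safe to apply in any order, any number of times.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)  # repeating a merge changes nothing
```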

Pronounce CRDT as individual letters: “C-R-D-T.”

A vector clock is a mechanism for tracking the causal order of events across distributed nodes. Each node keeps a vector with one counter per node; comparing the vectors attached to two events reveals whether one happened before the other or whether the two were concurrent.

“We use vector clocks to detect causal ordering violations in the audit log. If neither event’s vector clock dominates the other’s, we know the two events are concurrent.”
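A minimal Python sketch of the comparison rules, assuming clocks are stored as dictionaries from node ID to counter, with missing entries counting as zero (the helper names are illustrative). Event A happened before event B when every entry of A's clock is less than or equal to B's and at least one is strictly less; if neither clock dominates the other, the events are concurrent.

```python
from typing import Dict

Clock = Dict[str, int]


def tick(clock: Clock, node: str) -> Clock:
    """Local event on `node`: increment that node's entry."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def receive(local: Clock, remote: Clock, node: str) -> Clock:
    """On message receipt: take the per-node maximum, then tick."""
    merged = {n: max(local.get(n, 0), remote.get(n, 0)) for n in local.keys() | remote.keys()}
    return tick(merged, node)


def happened_before(a: Clock, b: Clock) -> bool:
    nodes = a.keys() | b.keys()
    return (all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) < b.get(n, 0) for n in nodes))


def concurrent(a: Clock, b: Clock) -> bool:
    """Neither event happened before the other, and the clocks differ."""
    differs = any(a.get(n, 0) != b.get(n, 0) for n in a.keys() | b.keys())
    return differs and not happened_before(a, b) and not happened_before(b, a)


a1 = tick({}, "A")         # event on node A
b1 = tick({}, "B")         # independent event on node B
b2 = receive(b1, a1, "B")  # B receives a message sent at a1
```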

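A Lamport timestamp is a simpler logical clock (a single counter per node) that orders events consistently with causality: if event A happened before event B, A’s timestamp is lower. Breaking ties by node ID turns this into a total order, which is sufficient for many use cases where full causal tracking is not needed.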

“Lamport timestamps give us a consistent order to replay events, but they don’t tell us about causality — two events with different timestamps might still be concurrent.”
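A minimal Lamport clock in Python, with illustrative names: each node increments its counter on every local event and, on receiving a message, moves past the sender's timestamp. Pairing the counter with the node ID breaks ties and gives a total order that can be used for replay.

```python
from typing import Tuple

Timestamp = Tuple[int, str]  # (counter, node ID); tuples compare the counter first


class LamportClock:
    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.time = 0

    def tick(self) -> Timestamp:
        """Local event or message send."""
        self.time += 1
        return (self.time, self.node_id)

    def receive(self, remote_time: int) -> Timestamp:
        """Message receipt: move past both the local and the sender's time."""
        self.time = max(self.time, remote_time) + 1
        return (self.time, self.node_id)


clock_a, clock_b = LamportClock("A"), LamportClock("B")
e1 = clock_a.tick()          # A sends a message stamped e1
e2 = clock_b.tick()          # independent event on B, concurrent with e1
e3 = clock_b.receive(e1[0])  # B receives A's message
replay_order = sorted([e3, e2, e1])
```

Here e1 sorts before e2 only because of the node-ID tie-break, even though the two events are concurrent; that is the causality information Lamport timestamps cannot provide.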


Phrases for Design Reviews and Architecture Discussions

Use these in system design interviews, architecture review boards, and technical postmortems:

  • “Under partition, this design falls into the AP category — we tolerate stale reads to maintain availability.”
  • “The split-brain risk is the main concern here. We need to ensure that write quorum is strictly enforced.”
  • “We’re using a CRDT for the presence indicators — it handles concurrent updates from multiple devices without coordination.”
  • “Raft gives us a simpler mental model for leader election, but we still need to reason carefully about term numbers and log replication.”
  • “The postmortem identified the root cause as a quorum misconfiguration — the cluster was accepting writes with only one replica acknowledged.”

Key Collocations

  • achieve consensus: “The algorithm achieves consensus in two rounds under normal conditions.”
  • maintain quorum: “We cannot maintain quorum with fewer than three nodes.”
  • trigger a split-brain: “The network partition triggered a split-brain condition.”
  • guarantee eventual consistency: “The CRDT guarantees eventual consistency without coordination.”
  • reason about consistency: “It’s easier to reason about consistency with Raft than with Paxos.”
  • tolerate node failures: “The system tolerates up to two node failures without data loss.”

Practice

Take any system design problem — a distributed key-value store, a leaderboard, or a collaborative text editor. Write a short paragraph (four to six sentences) explaining your consistency model choice using the vocabulary from this post. Include at least: one reference to the CAP theorem, one consistency model (eventual or strong), and one mechanism (quorum, CRDT, or vector clock). Read it aloud to a colleague or record yourself — does the vocabulary feel natural, or are you still translating in your head?