Vocabulary for Distributed Databases: Consensus, Replication, and Partitioning
Master the English of distributed databases: consensus, quorum, replication lag, sharding, partition tolerance, and the CAP theorem. Precise terms for backend and platform engineers.
Distributed databases come with a vocabulary that’s notoriously easy to misuse. Words like consistency, availability, and partition mean something specific here — and not always what they mean in everyday English. This guide walks through the core terms, the verb collocations engineers actually use, and the pronunciation traps that catch non-native speakers.
The CAP theorem: three words you must use precisely
The CAP theorem says a distributed system can guarantee only two of three properties when a network partition happens:
- Consistency — every read sees the most recent write.
- Availability — every request gets a (non-error) response.
- Partition tolerance — the system keeps working despite dropped network messages between nodes.
The classic mistake is saying “CAP means pick two.” More precisely: when a partition occurs, you must choose between consistency and availability — partition tolerance isn’t optional in a real distributed system. Sounding fluent means saying:
“Under a network partition, this system favours availability over consistency — it’s an AP system. It’ll keep serving reads even if they’re slightly stale.”
Vocabulary: stale read (out-of-date data), strong consistency, eventual consistency (replicas converge over time), PACELC (the extended theorem: else, trade latency vs. consistency).
Replication
Replication means keeping copies of data on multiple nodes. The terms:
- Leader / follower (also primary / replica) — the node that accepts writes vs. the copies.
- Synchronous vs. asynchronous replication — wait for replicas to confirm vs. don’t.
- Replication lag — how far behind a follower is.
- Failover — promoting a follower to leader when the leader dies.
- Split-brain — two nodes both think they’re the leader (a serious failure).
Verb collocations:
- a leader replicates writes to followers
- a follower falls behind / catches up
- the system promotes a replica and demotes the old leader
- replication lag spikes under load
“Our replication lag spiked to 30 seconds during the import, so read replicas served stale data. We failed over cleanly, with no split-brain.”
Pronunciation: replica is /ˈreplɪkə/ — stress the first syllable, short e. Not “ree-PLY-ka.”
Consensus and quorum
When multiple nodes must agree on a value, they run a consensus algorithm — Raft or Paxos. Key terms:
- Quorum — the minimum number of nodes that must agree (usually a majority).
- Leader election — choosing a coordinator node.
- Commit — a write is committed once a quorum acknowledges it.
- Linearizability — the strongest consistency: operations appear to happen instantly, in order.
Collocations:
- nodes reach consensus / agree on a value
- a write is acknowledged by a quorum
- the cluster elects a leader
- a partition loses quorum and goes read-only
“With five nodes, quorum is three. If a partition isolates two nodes, the minority side loses quorum and stops accepting writes to avoid split-brain.”
Pronunciation: quorum is /ˈkwɔːrəm/. Paxos is /ˈpæksɒs/. Raft rhymes with “craft.”
Partitioning and sharding
Careful: “partition” has two meanings in this field — the network partition (above) and data partitioning. Context disambiguates, but be aware.
Sharding = splitting data across nodes so each holds a subset.
- Shard / partition key — the field that decides which node holds a row.
- Hash vs. range partitioning — distribute by hash of the key vs. by ranges.
- Hot shard / hotspot — one shard getting disproportionate traffic.
- Rebalancing — moving data between shards to even out load.
- Resharding — changing the sharding scheme (expensive).
“We shard by
customer_idusing hash partitioning to avoid hotspots. One large customer created a hot shard, so we had to rebalance.”
A subtle word: data is partitioned or sharded (split); it isn’t “divided into” in idiomatic English. Say “we partition the data by region,” not “we divide the data on region.”
Transactions in a distributed world
- ACID — Atomicity, Consistency, Isolation, Durability (the classic single-node guarantees).
- BASE — Basically Available, Soft state, Eventual consistency (the relaxed distributed alternative).
- Two-phase commit (2PC) — a protocol to commit a transaction across nodes atomically.
- Isolation levels — read committed, repeatable read, serializable.
- Write skew, phantom reads, dirty reads — anomalies weaker isolation levels allow.
“We need serializable isolation for the ledger, so we accept the latency cost. Everything else runs at read committed.”
Conflict resolution
When replicas accept writes independently, conflicts happen:
- Last-write-wins (LWW) — newest timestamp wins (simple, can lose data).
- Vector clocks — track causality to detect concurrent writes.
- CRDTs (Conflict-free Replicated Data Types) — data structures that merge automatically.
- Tombstone — a marker that records a deletion so it propagates.
“We use CRDTs for the collaborative editor so concurrent edits merge without conflicts, rather than relying on last-write-wins.”
Before / after: sounding fluent
Before: “The copy database is late and sometimes the data is old, and when the main one dies we change another to main.”
After: “Replication lag caused stale reads on the followers. When the leader failed, we promoted a replica via automatic failover.”
Quick glossary
| Term | One-line meaning |
|---|---|
| Consensus / quorum | Nodes agreeing / the majority needed |
| Replication lag | How far a follower is behind |
| Failover | Promoting a replica to leader |
| Split-brain | Two leaders at once (bad) |
| Sharding | Splitting data across nodes |
| Hot shard | One shard overloaded |
| Eventual consistency | Replicas converge over time |
| Linearizable | Strongest consistency |
| CRDT | Auto-merging data structure |
Key takeaways
- CAP forces a choice between consistency and availability under partition — phrase it that way.
- Learn the leader/follower, failover, replication lag collocations cold.
- “Partition” means two things — the network split and data partitioning. Read context.
- Use shard / partition as verbs the idiomatic way: “we shard by key.”
- Nail the pronunciation of replica, quorum, Paxos — these are the words you’ll say most.