Distributed systems comparison
Leader election vs consensus
Engineers often use these two terms almost interchangeably, which causes confusion in design reviews. Leader election is one specific, narrow problem. Consensus is the general problem class that leader election belongs to — and understanding the relationship explains why "just pick a leader" is harder than it sounds.
TL;DR
- Consensus is the general problem: get a set of nodes to agree on a single value (or an ordered sequence of values) despite failures and delays.
- Leader election is a specific application of consensus: the "value" being agreed on is simply "who is in charge right now."
- The trap: leader election that is not built on real consensus (no quorum requirement) can silently produce two leaders at once — the exact failure consensus protocols are designed to rule out.
Side-by-side comparison
| Aspect | Leader election | Consensus (general) |
|---|---|---|
| Scope | Narrow — agree on exactly one thing: who is leader | Broad — agree on any value, or an ordered log of many values |
| Typical output | A single node ID, valid until the next election | A committed value, or a growing replicated log |
| Can be solved without full consensus? | Yes, with weaker guarantees (e.g. TTL locks) — but split-brain risk increases | No — by definition, consensus requires the strong agreement guarantee |
| Relationship | A special case / building block | The general theory leader election is built on |
| Common implementations | ZooKeeper ephemeral-sequential recipe, Kubernetes Lease API, Redis-based locks (Redlock) | Raft, Paxos / Multi-Paxos, ZAB |
| Failure mode if done wrong | Split-brain: two leaders active simultaneously | Divergent state: nodes disagree on committed values |
| Key mechanism | Lease / lock with expiry, or a consensus-backed election | Quorum (majority) agreement before anything is final |
| Where it appears in interviews | "Design a distributed lock service" | "Design a replicated key-value store" / "explain Raft" |
Code side-by-side
A weak (lock-based) leader election vs. a leader election built on real consensus:
Weak: TTL lock (no consensus)
# Sketch using the redis-py client.
# Any node can attempt this independently.
import redis
r = redis.Redis()
my_node_id = "node-a"  # assumed: this node's unique ID
acquired = r.set(
    "leader-lock", my_node_id,
    nx=True, ex=10,  # set only if absent, with a 10s TTL
)
if acquired:
    is_leader = True
# Risk: clock skew, a GC pause, or a
# slow network write can let the lock
# expire while this node still thinks
# it's the leader -> TWO leaders possible
Strong: election via Raft-backed lease
# Kubernetes-style leader election (pseudocode)
# (built on etcd, which runs Raft)
lease = coordination.Lease(
    name="controller-leader",
    holderIdentity=my_node_id,
    leaseDurationSeconds=15,
)
# A lease renewal is a write that etcd commits
# only once it reaches a Raft quorum.
# A partitioned former leader's renewals fail,
# so it learns it must stop acting as leader
When "just do leader election" (a lock) is enough
- Low-stakes coordination. Picking which instance runs a nightly batch job — if two instances briefly both run it, the job is idempotent and safe to duplicate.
- You already depend on a single, reliable coordination store. A managed lock service (e.g. a cloud provider's distributed lock API) removes the need to reason about consensus yourself.
- Latency matters more than perfect safety. Simple TTL locks fail over faster than full consensus re-elections in some designs, at the cost of a small split-brain window.
- You can tolerate brief, bounded duplication. If "acting as leader twice for a few seconds" only costs a duplicate log line, don't over-engineer it.
When you need real consensus underneath
- The leader's actions are irreversible or high-stakes. Committing writes to a primary database, issuing payments, or promoting a replica — two leaders acting simultaneously here causes data corruption or financial loss.
- You are building the coordination layer itself. If you're writing a distributed lock service (not just using one), you need Raft or a Paxos variant underneath it, not a bare TTL.
- Regulatory or correctness guarantees are required. Systems that must prove "at most one leader was active at any instant" for audit or compliance reasons need consensus-backed leases, not best-effort locks.
- Network partitions are common in your environment. Multi-region deployments, unreliable networks, or long GC pauses make clock-based TTL locks unsafe; quorum-based leases degrade more predictably.
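The pause hazard in the last bullet can be made concrete with a small simulation. This is a minimal sketch under stated assumptions: `FakeClock`, `TtlLockStore`, and `Node` are hypothetical in-memory stand-ins (not a real Redis client), and time is advanced by hand so the run is deterministic.

```python
class FakeClock:
    """Manually advanced clock so the scenario is deterministic."""
    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


class TtlLockStore:
    """Hypothetical in-memory stand-in for a single-key TTL lock store."""
    def __init__(self, clock):
        self.clock = clock
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, node_id, ttl):
        # Mimics "set if absent, with expiry": succeed only if the
        # key is unset or its TTL has already elapsed.
        if self.holder is None or self.clock.now >= self.expires_at:
            self.holder = node_id
            self.expires_at = self.clock.now + ttl
            return True
        return False


class Node:
    def __init__(self, node_id, store):
        self.node_id = node_id
        self.store = store
        self.believes_leader = False

    def campaign(self, ttl=10):
        if self.store.try_acquire(self.node_id, ttl):
            self.believes_leader = True
        return self.believes_leader


clock = FakeClock()
store = TtlLockStore(clock)
a, b = Node("node-a", store), Node("node-b", store)

a.campaign()        # node-a wins the lock at t=0
clock.advance(12)   # node-a stalls (e.g. a long GC pause) past the 10s TTL
b.campaign()        # the lock has expired, so node-b wins it

# node-a never noticed that its lock expired: both nodes now act as leader.
leaders = [n.node_id for n in (a, b) if n.believes_leader]
print(leaders)  # ['node-a', 'node-b']
```

The store itself is consistent here (it records only `node-b` as holder). The split-brain comes from `node-a` acting on a belief it has not re-validated.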
English phrases engineers use
Leader election conversations
- "We acquired the lock, so this instance is now the leader."
- "The lease expired before the renewal reached the server — we lost leadership."
- "That's a weak leader election — it doesn't rule out split-brain."
- "Let's use the Kubernetes Lease API instead of hand-rolling this."
- "Which node is currently the leader? Check the lease holder identity."
Consensus conversations
- "That write didn't reach quorum, so it can't be considered committed."
- "We need strong consistency here, which means a real consensus protocol."
- "This is a consensus problem, not just a coordination convenience."
- "The partitioned node can't commit — it lost contact with the majority."
- "Don't reinvent consensus — build leader election on top of Raft, not a database row."
Quick decision tree
- Low-stakes, idempotent job scheduling → Simple lock-based leader election
- Leader actions are irreversible (writes, payments, promotions) → Consensus-backed leader election
- Already running Kubernetes/etcd → Use the built-in Lease API (Raft underneath)
- Building a new coordination primitive from scratch → Build on Raft (preferably a proven implementation); don't invent your own quorum logic
- Network partitions are rare and brief in your environment → TTL lock may be an acceptable trade-off
- Auditors or compliance require provable single-leader guarantees → Full consensus, no exceptions
Frequently asked questions
Is leader election a type of consensus, or something different?
Leader election is a specific instance of the general consensus problem: the cluster must agree on one value (the identity of the leader) even though nodes may crash or messages may be delayed. Consensus is the broader class — agreeing on any value, or an ordered sequence of values (a replicated log). Every leader election is a consensus problem, but not every consensus problem is a leader election.
Can you have leader election without a full consensus protocol?
Yes, with weaker guarantees. Simple approaches, such as a distributed lock with a TTL in Redis or "lowest node ID wins", can pick a leader quickly, but under some failure modes (partitions, long pauses, clock skew) they cannot guarantee that at most one leader is active at a time, so they offer no split-brain protection. Consensus protocols like Raft or ZooKeeper's ZAB provide that protection because leadership is itself established through majority agreement: at most one leader can win a given term or epoch, and a deposed leader cannot get anything committed.
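To see why "lowest node ID wins" is weak, consider what each side of a partition computes. The following is a toy sketch: the five-node cluster and the particular partition are assumptions chosen for illustration, and `elect_lowest_id` is a hypothetical helper rather than part of any library.

```python
def elect_lowest_id(visible_nodes):
    """'Lowest node ID wins': pick the minimum ID among the nodes you can see."""
    return min(visible_nodes)


cluster = {"n1", "n2", "n3", "n4", "n5"}

# Healthy network: every node sees the full membership and picks the same leader.
healthy_view = {node: elect_lowest_id(cluster) for node in cluster}

# Assumed partition: {n1, n2} cannot reach {n3, n4, n5}, and vice versa.
side_a, side_b = {"n1", "n2"}, {"n3", "n4", "n5"}
partitioned_view = {}
for side in (side_a, side_b):
    for node in side:
        partitioned_view[node] = elect_lowest_id(side)

leaders_when_healthy = sorted(set(healthy_view.values()))
leaders_when_partitioned = sorted(set(partitioned_view.values()))
print(leaders_when_healthy)      # ['n1']
print(leaders_when_partitioned)  # ['n1', 'n3']
```

A quorum rule would change the outcome: only the side holding three of the five nodes could elect anyone, so `n1` could not claim leadership from the minority side.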
Why do people say "ZooKeeper does leader election"?
The phrase usually refers to a well-known recipe, not to a built-in API call. ZooKeeper does not have a leader-election primitive baked into its API. It exposes ephemeral, sequential znodes and watches, and application code implements leader election as a recipe on top of those primitives. In the classic recipe, each candidate creates an ephemeral sequential node, and whoever created the lowest-numbered node is the leader. Separately, ZooKeeper itself uses ZAB, an atomic broadcast protocol closely related to Paxos, internally to keep its own znode data consistent across its cluster.
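The selection step of that recipe is simple enough to sketch without a ZooKeeper client. The code below assumes the candidates' znode names have already been listed and only shows the ordering logic. The `n_` prefix and the helper names are hypothetical, while the 10-digit zero-padded suffix matches how ZooKeeper numbers sequential znodes. Having each non-leader watch its immediate predecessor is a refinement from the commonly documented recipe, not something the text above requires.

```python
def sequence_number(znode_name):
    """Extract the 10-digit counter ZooKeeper appends to sequential znodes."""
    return int(znode_name[-10:])


def leader_and_watch_targets(children):
    """Return (leader, {candidate: znode_to_watch}) for the lowest-sequence recipe."""
    ordered = sorted(children, key=sequence_number)
    leader = ordered[0]
    # Each non-leader watches the candidate immediately ahead of it, so only
    # one client is notified when a given znode disappears.
    watches = {ordered[i]: ordered[i - 1] for i in range(1, len(ordered))}
    return leader, watches


# Hypothetical result of listing the children of an election znode.
children = ["n_0000000007", "n_0000000003", "n_0000000005"]
leader, watches = leader_and_watch_targets(children)
print(leader)   # n_0000000003
print(watches)  # {'n_0000000005': 'n_0000000003', 'n_0000000007': 'n_0000000005'}
```

Because the znodes are ephemeral, a crashed leader's node vanishes when its session ends, and the candidate watching it re-runs the same check.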
What happens if the elected leader is wrong about being the leader?
This is called a "stale leader", and consensus protocols are designed to prevent it from causing damage. In Raft, a leader must reach a majority of the cluster (counting itself) to commit writes. If it is partitioned away, it can still believe locally that it is the leader, but it cannot commit new entries because it cannot reach quorum, so its stale belief never affects the replicated state.
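The commit rule described above can be illustrated with a deliberately tiny model. This is not Raft: `ToyLeader` is a hypothetical class with no terms, logs, or retries, and the set of reachable followers is passed in by hand. It only shows that a leader cut off from the majority cannot commit.

```python
class ToyLeader:
    """Hypothetical, heavily simplified leader: no terms, logs, or retries."""
    def __init__(self, node_id, cluster_size):
        self.node_id = node_id
        self.cluster_size = cluster_size
        self.committed = []

    def try_commit(self, entry, reachable_followers):
        # The leader counts itself plus every follower that acknowledged.
        acks = 1 + len(reachable_followers)
        majority = self.cluster_size // 2 + 1
        if acks >= majority:
            self.committed.append(entry)
            return True
        return False  # entry stays uncommitted; the client gets no success


leader = ToyLeader("n1", cluster_size=5)

# Healthy: n1 reaches two followers, so 3 of 5 nodes acknowledge.
ok_before = leader.try_commit("x=1", reachable_followers={"n2", "n3"})

# Partitioned: n1 reaches only n2, so 2 of 5 nodes acknowledge.
ok_after = leader.try_commit("x=2", reachable_followers={"n2"})

print(ok_before, ok_after, leader.committed)  # True False ['x=1']
```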
Is "quorum" the same thing as "consensus"?
No. A quorum is a mechanism — a minimum number of nodes (usually a strict majority) that must agree before an action is considered final. Consensus is the outcome — all correct nodes agreeing on the same value. Quorums are the tool most practical consensus protocols use to achieve consensus safely, but you can imagine (weaker) systems that use quorums without providing full consensus guarantees.
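Why a strict majority in particular? Any two majorities of the same cluster must share at least one node, so two conflicting decisions cannot both gather a quorum. The sketch below checks this exhaustively for an assumed five-node cluster. `majority` and `groups_always_overlap` are hypothetical helper names.

```python
from itertools import combinations


def majority(cluster_size):
    """Smallest group size that is a strict majority of the cluster."""
    return cluster_size // 2 + 1


def groups_always_overlap(nodes, size):
    """Check exhaustively whether every pair of groups of this size shares a node."""
    groups = [set(g) for g in combinations(nodes, size)]
    return all(a & b for a, b in combinations(groups, 2))


nodes = ["n1", "n2", "n3", "n4", "n5"]
quorum = majority(len(nodes))

print(quorum)                                # 3
print(groups_always_overlap(nodes, quorum))  # True
print(groups_always_overlap(nodes, 2))       # False: {n1, n2} and {n3, n4} are disjoint
```

Groups of two can form on both sides of a partition at once, which is exactly the opening that split-brain needs. Groups of three cannot.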
Do I need to implement my own leader election?
Almost never. Use an existing coordination service: Kubernetes leases (built on etcd/Raft) for anything running in a cluster, ZooKeeper recipes for JVM-heavy stacks, or a managed locking service from your cloud provider. Hand-rolled leader election using a database row with a TTL is a common but risky shortcut: clock skew and slow queries can produce two active leaders simultaneously.
What is a "split-brain" and how does it relate to both terms?
Split-brain is the failure mode both concepts exist to prevent: a network partition causes two groups of nodes to each believe they have a valid leader, and both accept writes independently, corrupting shared state. Leader election protocols that lack real consensus (no quorum requirement) are especially prone to split-brain; leader election built on a genuine consensus protocol is specifically designed to make split-brain writes impossible.