Distributed systems comparison

Leader election vs consensus

Engineers often use these two terms almost interchangeably, which causes confusion in design reviews. Leader election is one specific, narrow problem. Consensus is the general problem class that leader election belongs to — and understanding the relationship explains why "just pick a leader" is harder than it sounds.

TL;DR

  • Consensus is the general problem: get a set of nodes to agree on a single value (or an ordered sequence of values) despite failures and delays.
  • Leader election is a specific application of consensus: the "value" being agreed on is simply "who is in charge right now."
  • The trap: leader election that is not built on real consensus (no quorum requirement) can silently produce two leaders at once — the exact failure consensus protocols are designed to rule out.
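
The "quorum requirement" in the last bullet comes down to simple arithmetic: any two majorities of the same cluster share at least one node, so two conflicting decisions cannot both collect a majority. The following is a minimal sketch of that property, checked by brute force; the function names are illustrative and not taken from any library.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of an n-node cluster."""
    return n // 2 + 1


def quorums_always_overlap(n: int) -> bool:
    """Brute-force check that any two majority quorums share at least one node."""
    q = majority(n)
    return all(
        set(a) & set(b)
        for a in combinations(range(n), q)
        for b in combinations(range(n), q)
    )


for n in (3, 4, 5):
    print(n, majority(n), quorums_always_overlap(n))
# 3 2 True
# 4 3 True
# 5 3 True
```

A bare TTL lock has no such overlap to lean on: whether a second node can take over depends only on a timer running out.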

Side-by-side comparison

Aspect | Leader election | Consensus (general)
--- | --- | ---
Scope | Narrow: agree on exactly one thing, who is leader | Broad: agree on any value, or an ordered log of many values
Typical output | A single node ID, valid until the next election | A committed value, or a growing replicated log
Can be solved without full consensus? | Yes, with weaker guarantees (e.g. TTL locks), but split-brain risk increases | No: by definition, consensus requires the strong agreement guarantee
Relationship | A special case / building block | The general theory leader election is built on
Common implementations | ZooKeeper ephemeral-sequential recipe, Kubernetes Lease API, Redis-based locks (Redlock) | Raft, Paxos / Multi-Paxos, ZAB
Failure mode if done wrong | Split-brain: two leaders active simultaneously | Divergent state: nodes disagree on committed values
Key mechanism | Lease / lock with expiry, or a consensus-backed election | Quorum (majority) agreement before anything is final
Where it appears in interviews | "Design a distributed lock service" | "Design a replicated key-value store" / "explain Raft"

Code side-by-side

A weak (lock-based) leader election vs. a leader election built on real consensus:

Weak: TTL lock (no consensus)

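# Any node can attempt this independently.
# Uses the redis-py client; host and node ID are placeholders.
import redis

r = redis.Redis(host="localhost", port=6379)
my_node_id = "node-a"

# SET leader-lock <id> NX EX 10: succeeds only if the key does not exist yet
acquired = r.set("leader-lock", my_node_id, nx=True, ex=10)  # 10 s TTL
is_leader = bool(acquired)
# Risk: clock skew, a GC pause, or a slow network write can let the lock
# expire while this node still thinks it's the leader -> TWO leaders possible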
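
To make the risk in the comments concrete, here is a small simulation of the failure. It is a sketch only: `FakeTTLStore` is a hypothetical in-memory stand-in for the lock store with a manually advanced clock, not the Redis API, and the 12-second stall is an invented number.

```python
class FakeTTLStore:
    """In-memory stand-in for a TTL lock store, driven by a manual clock."""

    def __init__(self):
        self.now = 0.0
        self.holder = None
        self.expires_at = 0.0

    def set_nx(self, node_id: str, ttl: float) -> bool:
        # Mimics "set if not exists, with expiry": fails while a live lock exists.
        if self.holder is not None and self.now < self.expires_at:
            return False
        self.holder, self.expires_at = node_id, self.now + ttl
        return True


store = FakeTTLStore()
believes_leader = set()

# t=0: node A acquires the lock with a 10 s TTL.
if store.set_nx("A", ttl=10):
    believes_leader.add("A")

# A stalls (GC pause, slow I/O) for 12 s and never re-checks the lock.
store.now = 12.0

# t=12: the lock has expired, so node B acquires it.
if store.set_nx("B", ttl=10):
    believes_leader.add("B")

# A wakes up still holding its stale in-memory flag.
print(sorted(believes_leader))  # ['A', 'B']
```

The store itself only ever records one holder; the split-brain lives in node A's local belief, which nothing forces it to revisit.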

Strong: election via Raft-backed lease

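# Kubernetes-style leader election
# (Lease objects are stored in etcd, which runs Raft).
# Sketch using the official Python client; namespace and node ID are placeholders.
from kubernetes import client, config

config.load_kube_config()
my_node_id = "node-a"

lease = client.V1Lease(
    metadata=client.V1ObjectMeta(name="controller-leader"),
    spec=client.V1LeaseSpec(holder_identity=my_node_id, lease_duration_seconds=15),
)
client.CoordinationV1Api().create_namespaced_lease(namespace="default", body=lease)
# The API server persists the Lease in etcd, so a create or renewal only
# succeeds once the write is committed by a RAFT QUORUM.
# A partitioned former leader cannot renew, so it must stop acting as
# leader once its lease duration has elapsed.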
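
A lease only protects you if the holder also stops acting when it fails to renew. The sketch below shows one way a holder might track that locally. `LeaseHolder` and the `renew` callable are hypothetical names standing in for whatever write your coordination store exposes, and the clock is injected so the example can run without waiting; real clients typically also subtract a safety margin for clock drift.

```python
import time
from typing import Callable, Optional


class LeaseHolder:
    """Tracks locally whether this node may still act as leader."""

    def __init__(self, duration: float, clock: Callable[[], float] = time.monotonic):
        self.duration = duration
        self.clock = clock
        self.deadline: Optional[float] = None  # None means "not leader"

    def try_renew(self, renew: Callable[[], bool]) -> None:
        # Take the timestamp BEFORE the request, so network delay shortens
        # our local view of the lease instead of extending it.
        started = self.clock()
        if renew():
            self.deadline = started + self.duration

    def is_leader(self) -> bool:
        return self.deadline is not None and self.clock() < self.deadline


t = [0.0]  # fake clock, advanced by hand
holder = LeaseHolder(duration=15, clock=lambda: t[0])

holder.try_renew(lambda: True)    # renewal reached a quorum
t[0] = 10.0
print(holder.is_leader())         # True

holder.try_renew(lambda: False)   # partitioned: renewal rejected
t[0] = 16.0
print(holder.is_leader())         # False
```

Checking `is_leader()` immediately before every leader-only action narrows the window in which a stale leader can act, although a pause between the check and the action can still slip through.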

When "just do leader election" (a lock) is enough

  • Low-stakes coordination. Picking which instance runs a nightly batch job — if two instances briefly both run it, the job is idempotent and safe to duplicate.
  • You already depend on a single, reliable coordination store. A managed lock service (e.g. a cloud provider's distributed lock API) removes the need to reason about consensus yourself.
  • Latency matters more than perfect safety. Simple TTL locks fail over faster than full consensus re-elections in some designs, at the cost of a small split-brain window.
  • You can tolerate brief, bounded duplication. If "acting as leader twice for a few seconds" only costs a duplicate log line, don't over-engineer it.

When you need real consensus underneath

  • The leader's actions are irreversible or high-stakes. Committing writes to a primary database, issuing payments, or promoting a replica — two leaders acting simultaneously here causes data corruption or financial loss.
  • You are building the coordination layer itself. If you're writing a distributed lock service (not just using one), you need Raft or a Paxos variant underneath it, not a bare TTL.
  • Regulatory or correctness guarantees are required. Systems that must prove "exactly one leader was active at any instant" for audit or compliance reasons need consensus-backed leases, not best-effort locks.
  • Network partitions are common in your environment. Multi-region deployments, unreliable networks, or long GC pauses make clock-based TTL locks unsafe; quorum-based leases degrade more predictably.

English phrases engineers use

Leader election conversations

  • "We acquired the lock, so this instance is now the leader."
  • "The lease expired before the renewal reached the server — we lost leadership."
  • "That's a weak leader election — it doesn't rule out split-brain."
  • "Let's use the Kubernetes Lease API instead of hand-rolling this."
  • "Which node is currently the leader? Check the lease holder identity."

Consensus conversations

  • "That write didn't reach quorum, so it can't be considered committed."
  • "We need strong consistency here, which means a real consensus protocol."
  • "This is a consensus problem, not just a coordination convenience."
  • "The partitioned node can't commit — it lost contact with the majority."
  • "Don't reinvent consensus — build leader election on top of Raft, not a database row."

Quick decision tree

  • Low-stakes, idempotent job scheduling → Simple lock-based leader election
  • Leader actions are irreversible (writes, payments, promotions) → Consensus-backed leader election
  • Already running Kubernetes/etcd → Use the built-in Lease API (Raft underneath)
  • Building a new coordination primitive from scratch → Build it on Raft (ideally a proven implementation); don't invent your own quorum logic
  • Network partitions are rare and brief in your environment → TTL lock may be an acceptable trade-off
  • Auditors or compliance require provable single-leader guarantees → Full consensus, no exceptions

Frequently asked questions

Is leader election a type of consensus, or something different?

Leader election is a specific instance of the general consensus problem: the cluster must agree on one value (the identity of the leader) even though nodes may crash or messages may be delayed. Consensus is the broader class: agreeing on any value, or an ordered sequence of values (a replicated log). Every leader election that must guarantee a single leader is a consensus problem, but not every consensus problem is a leader election.

Can you have leader election without a full consensus protocol?

Yes, with weaker guarantees. Simple approaches, such as a distributed lock with a TTL in Redis or "lowest node ID wins", can pick a leader quickly, but they do not rule out split-brain (more than one leader active at the same time) under all failure modes. Consensus protocols like Raft or ZooKeeper's ZAB establish leadership through majority agreement, so at most one leader can be elected per term (or epoch), and a stale leader that has lost contact with the majority cannot commit anything.
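
The difference shows up as soon as the cluster partitions. The sketch below contrasts "lowest node ID wins" with a majority vote in which each node casts at most one vote per term. It is a toy model under simplifying assumptions (no logs, no timeouts, hypothetical function names), not an implementation of Raft.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def lowest_id_leader(visible_nodes: Iterable[str]) -> str:
    """Weak rule: each node picks the lowest ID among the nodes it can see."""
    return min(visible_nodes)


def elect(cluster_size: int, votes: Dict[str, str]) -> Optional[str]:
    """Majority rule for one term; returns None if nobody reaches a majority.

    `votes` maps voter -> candidate. A dict key holds a single value, which
    models the rule that each node votes at most once per term.
    """
    needed = cluster_size // 2 + 1
    tally = Counter(votes.values())
    winners = [candidate for candidate, n in tally.items() if n >= needed]
    return winners[0] if winners else None


# Partition of a 5-node cluster: {n1, n2} | {n3, n4, n5}

# Weak rule: each side elects its own leader -> split-brain.
print(lowest_id_leader(["n1", "n2"]), lowest_id_leader(["n3", "n4", "n5"]))  # n1 n3

# Majority rule: only the side that can gather 3 of 5 votes gets a leader.
print(elect(5, {"n1": "n1", "n2": "n1"}))              # None
print(elect(5, {"n3": "n3", "n4": "n3", "n5": "n3"}))  # n3
```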

Why do people say "ZooKeeper does leader election"?

Mostly because leader election is one of ZooKeeper's best-known recipes, not because it is an API call. ZooKeeper has no leader-election primitive in its API. It exposes ephemeral and sequential znodes plus watches, and application code implements leader election as a recipe on top of them: each candidate creates an ephemeral sequential znode, and whoever holds the lowest-numbered one is the leader. Separately, ZooKeeper uses ZAB (ZooKeeper Atomic Broadcast), a consensus protocol related to but distinct from Paxos, internally to keep its own znode data consistent across its ensemble.
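
The recipe is easy to see in a toy model. The class below is a hypothetical in-memory imitation of ephemeral sequential znodes, not the ZooKeeper client API; it only illustrates the "lowest sequence number wins, and the next candidate takes over when the leader's session ends" behaviour.

```python
import itertools


class FakeZNodeDir:
    """Toy in-memory model of ephemeral sequential znodes under one parent."""

    def __init__(self):
        self._seq = itertools.count()
        self.children = {}  # znode name -> owning session

    def create_ephemeral_sequential(self, session: str) -> str:
        # Sequence numbers are zero-padded, so string order matches numeric order.
        name = f"candidate-{next(self._seq):010d}"
        self.children[name] = session
        return name

    def expire_session(self, session: str) -> None:
        # Ephemeral znodes disappear when their owner's session ends.
        self.children = {n: s for n, s in self.children.items() if s != session}

    def leader(self):
        # The owner of the lowest-numbered znode is the leader.
        return self.children[min(self.children)] if self.children else None


election = FakeZNodeDir()
for session in ("app-1", "app-2", "app-3"):
    election.create_ephemeral_sequential(session)

print(election.leader())          # app-1
election.expire_session("app-1")
print(election.leader())          # app-2
```

In the real recipe each candidate typically watches only the znode immediately ahead of its own, so a leader failure wakes one candidate rather than all of them.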