Consensus algorithm comparison
Raft vs Paxos
Both algorithms solve the same problem — a cluster of machines agreeing on a value, or a sequence of values, even when some of them crash or the network partitions. Paxos got there first (1989/1998) and proved it was possible. Raft (2014) was designed from day one to be understandable enough that ordinary engineering teams could implement it correctly.
TL;DR
- Paxos is the original, general consensus protocol. It is provably correct but famously difficult to extend into a production replicated log — teams had to invent their own multi-decree extensions.
- Raft solves the identical problem but decomposes it into three understandable pieces (leader election, log replication, and safety) built around a strong leader that handles every client write.
- Neither is "more correct." Both guarantee safety under the same failure model (crash-recovery, majority quorum). The practical difference is implementability and the clarity of the specification.
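The majority-quorum rule in that failure model comes down to simple arithmetic. With n voting nodes, a quorum is n // 2 + 1 and the cluster survives (n - 1) // 2 crashes. Any two quorums also share at least one node, which is how a new leader or proposer learns about anything already committed. A minimal Python sketch follows; the function names are illustrative and not taken from any library:

```python
from itertools import combinations


def quorum_size(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def max_failures(n: int) -> int:
    """Crashed nodes the cluster survives while a majority stays reachable."""
    return (n - 1) // 2


def all_quorums_intersect(n: int) -> bool:
    """Brute-force check that every pair of minimal majority quorums shares a node."""
    quorums = [set(q) for q in combinations(range(n), quorum_size(n))]
    return all(a & b for a in quorums for b in quorums)


for n in (3, 4, 5):
    print(n, quorum_size(n), max_failures(n), all_quorums_intersect(n))
# 3 2 1 True
# 4 3 1 True
# 5 3 2 True
```

The 4-node row shows why odd cluster sizes are the usual recommendation. Adding a fourth node raises the quorum from 2 to 3 but does not let the cluster tolerate any extra failure.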
Side-by-side comparison
| Aspect | Raft | Paxos |
|---|---|---|
| Published | 2014 ("In Search of an Understandable Consensus Algorithm") | 1989/1998 (Lamport's original papers) |
| Design goal | Understandability — explicit, teachable state machine | Generality and minimal assumptions |
| Leader model | Strong leader — all writes go through one elected leader at a time | No mandated leader in the base protocol; Multi-Paxos adds a de facto leader as an optimisation |
| Roles | Follower, Candidate, Leader (named, explicit) | Proposer, Acceptor, Learner (roles can overlap on one node) |
| Term / ballot number | "Term": monotonically increasing, at most one leader per term | "Ballot" or proposal number: same purpose, different name |
| Log replication | Specified as part of the core algorithm | Only sketched in the original papers; Multi-Paxos details are left to implementers |
| Membership changes | Joint consensus, specified in the paper | Only briefly sketched in the original papers; most systems design their own |
| Reference implementations | Plentiful and well documented (etcd/raft, hashicorp/raft) | Fewer and more varied (each team's "Paxos" often differs) |
| Used by | etcd, Consul, CockroachDB, TiKV, Kafka's KRaft mode | Chubby, Spanner (Paxos groups), ZooKeeper's ZAB (Paxos-like) |
| Learning curve | Lower: designed for understandability, and in the authors' user study most students scored better on Raft than on Paxos | Higher: the original paper uses an allegory (a Greek parliament) that many engineers find hard to map to code |
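The "joint consensus" entry refers to Raft's original approach to membership changes. While the cluster moves from an old configuration to a new one, an entry is committed (or an election won) only with separate majorities of both configurations. A minimal sketch of that quorum check follows. It assumes configurations are plain sets of node IDs, and the helper names are hypothetical:

```python
from __future__ import annotations


def has_majority(acks: set[str], config: set[str]) -> bool:
    """True if acks include a strict majority of the nodes in config."""
    return len(acks & config) > len(config) // 2


def joint_quorum(acks: set[str], old: set[str], new: set[str]) -> bool:
    """During joint consensus (C_old,new), BOTH configurations need a majority."""
    return has_majority(acks, old) and has_majority(acks, new)


old = {"a", "b", "c"}
new = {"c", "d", "e"}

print(joint_quorum({"a", "b", "d"}, old, new))  # False: only 1 of 3 new nodes
print(joint_quorum({"a", "c", "d"}, old, new))  # True: 2 of 3 in each config
```

Because both majorities are required, neither member set can make decisions on its own during the transition. This rules out the old and new configurations each electing a leader for the same term.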
Code / pseudocode side-by-side
Both protocols commit an entry once a leader (or proposer) collects acknowledgements from a majority. Here is the shape of a single "commit a log entry" round in each:
Raft — AppendEntries RPC
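```
// Leader for term T replicates entry E
for each follower:
    send AppendEntries(term=T, prevLogIndex, prevLogTerm,
                       entries=[E], leaderCommit=commitIndex)

// Follower handling AppendEntries(req):
if req.term < currentTerm:
    reply(success=false)                 // stale leader
    return
if req.term > currentTerm:
    currentTerm = req.term               // adopt the newer term
if log has no entry at req.prevLogIndex
   or log[req.prevLogIndex].term != req.prevLogTerm:
    reply(success=false)   // logs diverge: leader decrements nextIndex, retries
    return
delete existing entries that conflict with req.entries
append any entries not already in the log
commitIndex = max(commitIndex, min(req.leaderCommit, index of last new entry))
reply(success=true)

// Leader commits E once it is stored on a MAJORITY of servers
// (the leader counts itself) and E is from the current term T
```
Paxos — one instance (Prepare/Accept)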
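```
// Proposer picks a ballot number N higher than any it has used, and a value V
// Phase 1: Prepare
send Prepare(N) to all acceptors
// Acceptor: if N is higher than any ballot it has promised, it promises
// not to accept proposals numbered < N and replies with the
// highest-numbered proposal (ballot, value) it has already accepted, if any
// Phase 2: Accept
// If a MAJORITY of acceptors promised, proposer sends:
send Accept(N, V') to all acceptors
// (V' = value of the highest-numbered proposal reported in those
//  promises, or V if none of them had accepted anything)
// Acceptor: accepts (N, V') unless it has promised a ballot higher than N
// Value is CHOSEN once a MAJORITY
// of acceptors accept (N, V')
```
When to reach for Raft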
- You are building or choosing infrastructure from scratch. etcd, Consul, and hashicorp/raft give you battle-tested implementations with clear documentation.
- Your team needs to reason about failure scenarios. Raft's explicit follower/candidate/leader states and term numbers make split-brain and network-partition behaviour much easier to trace in an incident review.
- You want strong consistency for configuration or coordination data. Kubernetes' control plane, service discovery, and distributed locks are the classic Raft use case — a small amount of critical state that must never diverge.
- You are teaching or onboarding engineers onto distributed systems. Raft was designed explicitly to be understandable and teachable; use it as the reference model even if your production system runs something else.
When Paxos (or a Paxos variant) still shows up
- You are integrating with an existing Paxos-based system. Google Chubby, Spanner, and many internal Google services are built on Paxos; if you work in that ecosystem you inherit its vocabulary.
- You need Paxos's more flexible leaderless mode. Base Paxos does not require a single stable leader, which can matter for certain multi-datacentre designs that want to avoid a single point of coordination for proposing values.
- ZooKeeper is already in your stack. ZooKeeper's ZAB protocol is a Paxos-family, leader-based atomic broadcast protocol — you get Paxos-style guarantees through a mature, widely deployed coordination service without implementing consensus yourself.
- Academic or research contexts. Paxos remains the canonical reference for proving new consensus variants correct; most published extensions (Fast Paxos, Generalized Paxos, EPaxos) are framed as Paxos variants and compare themselves against it.
English phrases engineers use
Raft conversations
- "The cluster lost quorum — we're down to two out of five nodes."
- "A new leader was elected after the election timeout fired."
- "That write is committed (it's on a majority of logs), but it hasn't been applied to the state machine yet."
- "We saw a term bump in the logs right before the failover."
- "Don't worry, Raft handles the split-brain case for us — the minority side can't commit."
Paxos conversations
- "That's single-decree Paxos — we need Multi-Paxos for an ordered log."
- "The proposer retried with a higher ballot number after being rejected."
- "This looks like a Paxos variant, not vanilla Paxos — probably closer to Fast Paxos."
- "ZooKeeper's ZAB is basically Paxos with a stable leader, similar to Multi-Paxos."
- "We had to roll our own membership-change logic — the Paxos papers don't specify it."
Quick decision tree
- Building new coordination infrastructure from scratch → Raft (etcd/raft, hashicorp/raft)
- Need a battle-tested library with clear documentation → Raft
- Already running ZooKeeper or Chubby-style systems → Stick with the existing Paxos-family protocol
- Teaching consensus to a team new to distributed systems → Start with Raft's model
- Researching or extending consensus theory → Compare against Paxos, the academic baseline
- Need a leaderless mode for a specific multi-region design → Consider base Paxos or EPaxos
Frequently asked questions
What is the core difference between Raft and Paxos?
Both solve the same problem (getting a cluster of nodes to agree on a sequence of values even when some nodes fail), but Raft was explicitly designed for understandability. Raft decomposes consensus into leader election, log replication, and safety as separate sub-problems, built around a strong leader, with at most one leader per term. Classic Paxos lets any proposer start a round at any time, which is more general but notoriously hard to reason about and even harder to implement correctly.
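Leader election, the first of Raft's sub-problems, comes down to two voting rules from the Raft paper. A server grants at most one vote per term, and it votes only for a candidate whose log is at least as up-to-date as its own (compared by last log term, then last log index). A candidate votes for itself and becomes leader if it collects votes from a majority. The minimal in-memory sketch below omits networking and timers, and its class and function names are illustrative rather than taken from etcd/raft or hashicorp/raft:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Voter:
    current_term: int = 0
    voted_for: str | None = None
    last_log_term: int = 0
    last_log_index: int = 0

    def request_vote(self, term: int, candidate: str,
                     cand_last_term: int, cand_last_index: int) -> bool:
        """Handle a RequestVote RPC; return True if the vote is granted."""
        if term < self.current_term:
            return False                      # candidate is from a stale term
        if term > self.current_term:
            self.current_term = term          # newer term: earlier vote no longer counts
            self.voted_for = None
        # Candidate's log must be at least as up-to-date as ours.
        up_to_date = (cand_last_term, cand_last_index) >= (
            self.last_log_term, self.last_log_index)
        if self.voted_for in (None, candidate) and up_to_date:
            self.voted_for = candidate
            return True
        return False


def run_election(candidate: str, term: int, last_term: int, last_index: int,
                 voters: list[Voter], cluster_size: int) -> bool:
    """Candidate votes for itself, then needs a majority of the whole cluster."""
    votes = 1 + sum(v.request_vote(term, candidate, last_term, last_index)
                    for v in voters)
    return votes > cluster_size // 2


# Five-node cluster: candidate "n1" plus four other servers.
peers = [Voter(current_term=1, last_log_term=1, last_log_index=i) for i in (3, 3, 5, 5)]
print(run_election("n1", term=2, last_term=1, last_index=4,
                   voters=peers, cluster_size=5))   # True: 3 of 5 votes
print(peers[0].request_vote(2, "n3", 1, 9))         # False: already voted in term 2
```

The two peers with longer logs refuse "n1", but its own vote plus two others still form a majority of five.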
Why do engineers say "Paxos is hard to implement correctly"?
The original Paxos paper describes single-decree consensus (agreeing on one value) precisely, but production systems need multi-decree Paxos (an ordered log of values), which the paper only sketches. Teams historically had to invent their own extensions — leader leases, log compaction, membership changes — without an agreed specification, leading to subtly different and sometimes buggy implementations. Google's own engineers wrote the "Paxos Made Live" paper specifically to document the gap between the paper and a working system.
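The single-decree protocol that the paper does specify precisely fits in a few dozen lines when every acceptor lives in one process. Below is a minimal sketch of one Prepare/Accept round. It assumes synchronous in-memory calls and no message loss, and the `Acceptor`/`propose` names and the `reachable` parameter (used to simulate partitions) are hypothetical. The key safety rule it demonstrates is that once a value may have been chosen, a later proposer with a higher ballot adopts that value instead of its own:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class Acceptor:
    promised: int = -1                 # highest ballot promised so far
    accepted_ballot: int = -1          # ballot of the last accepted proposal
    accepted_value: Any = None

    def prepare(self, n: int) -> tuple[int, Any] | None:
        """Phase 1: promise to ignore ballots < n; report any accepted proposal."""
        if n > self.promised:
            self.promised = n
            return (self.accepted_ballot, self.accepted_value)
        return None                    # reject: already promised a higher ballot

    def accept(self, n: int, value: Any) -> bool:
        """Phase 2: accept unless a higher ballot has been promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted_ballot, self.accepted_value = n, value
            return True
        return False


def propose(acceptors: list[Acceptor], n: int, value: Any,
            reachable: list[int] | None = None) -> Any:
    """Run one round with ballot n; return the chosen value, or None on failure."""
    targets = acceptors if reachable is None else [acceptors[i] for i in reachable]
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(n) for a in targets) if p is not None]
    if len(promises) < majority:
        return None
    # V': value of the highest-numbered proposal among the promises, else our own.
    best_ballot, best_value = max(promises, key=lambda p: p[0])
    v = best_value if best_ballot >= 0 else value
    acks = sum(a.accept(n, v) for a in targets)
    return v if acks >= majority else None


accs = [Acceptor() for _ in range(5)]
print(propose(accs, n=1, value="x", reachable=[0, 1, 2]))  # x
print(propose(accs, n=2, value="y", reachable=[2, 3, 4]))  # x: learned via acceptor 2
print(propose(accs, n=3, value="z", reachable=[0, 1]))     # None: no majority
```

The second proposer never talks to acceptors 0 and 1, yet the overlap at acceptor 2 is enough to make it re-propose "x" rather than "y".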
Which real systems use Raft, and which use Paxos?
Raft is used by etcd (and therefore Kubernetes' control plane), Consul, CockroachDB (per-range replication), and TiKV. Paxos and its variants power Google's Chubby lock service, Spanner (via Paxos groups), and Apache ZooKeeper's ZAB protocol (a Paxos-like, leader-based variant built for the same total-order broadcast goal).
Does Raft ever have more than one leader at a time?
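Two nodes can briefly both believe they are leader (for example, an old leader cut off in a minority partition while the majority elects a new one), but never for the same term, and the overlap cannot cause divergence: a leader must see acknowledgements from a majority (quorum) of nodes before committing an entry. A leader isolated in a minority partition cannot reach quorum, so it cannot commit new entries even though it still believes it is the leader, and it steps down as soon as it sees a message carrying a higher term. This prevents the split-brain scenario where two leaders both commit conflicting writes.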
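The quorum rule that stops an isolated leader shows up in the leader's commit-index calculation. Per the Raft paper, the leader advances commitIndex to the highest index N stored on a majority of servers (counting itself), and only when the entry at N comes from its current term. The minimal sketch below assumes a hypothetical `match_index` map from each follower to the highest log index known to be replicated on it:

```python
from __future__ import annotations


def advance_commit_index(commit_index: int, current_term: int,
                         leader_last_index: int, match_index: dict[str, int],
                         log_terms: list[int], cluster_size: int) -> int:
    """Return the new commitIndex for a Raft leader.

    log_terms[i] is the term of the entry at log index i (index 0 is a
    sentinel, so real entries start at 1).
    """
    for n in range(leader_last_index, commit_index, -1):
        # The leader always holds its own entries, so it counts as one replica.
        replicas = 1 + sum(1 for m in match_index.values() if m >= n)
        if replicas > cluster_size // 2 and log_terms[n] == current_term:
            return n
    return commit_index


terms = [0, 1, 1, 2, 2]   # entries 1-4; the last two were written in term 2

# Leader on the majority side of a 5-node cluster.
print(advance_commit_index(0, 2, 4, {"b": 4, "c": 3, "d": 0, "e": 0}, terms, 5))  # 3
# Stale leader in a 2-node minority partition: only "b" acknowledges.
print(advance_commit_index(0, 2, 4, {"b": 4, "c": 0, "d": 0, "e": 0}, terms, 5))  # 0
# New leader in term 3 with no term-3 entry yet: old entries are not counted.
print(advance_commit_index(0, 3, 4, {"b": 4, "c": 4, "d": 0, "e": 0}, terms, 5))  # 0
```

The third call reflects the current-term restriction: entries from earlier terms become committed only indirectly, once an entry from the leader's own term reaches a majority.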
Is Multi-Paxos the same as Raft?
They are close cousins. Multi-Paxos (running repeated rounds of Paxos to agree on a growing log) converges on a similar shape to Raft — a stable leader that replicates a log to followers — but Raft makes that leader role an explicit, named part of the protocol with clear election rules, whereas Multi-Paxos treats a stable leader as an optimisation on top of the more general single-decree protocol.
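The idea of a stable leader as an optimisation becomes concrete in one common way of structuring Multi-Paxos, which has no single canonical specification. A leader runs Phase 1 once for a ballot that covers every log slot. After that it issues only Phase 2 (Accept) for each new slot, until a higher ballot displaces it. The single-process sketch below assumes that structure, uses hypothetical names, and omits gaps in the log, leader leases, and log compaction:

```python
from __future__ import annotations


class MultiAcceptor:
    def __init__(self) -> None:
        self.promised = -1
        self.accepted: dict[int, tuple[int, str]] = {}   # slot -> (ballot, value)

    def prepare(self, n: int) -> dict[int, tuple[int, str]] | None:
        """One Prepare covers every slot: report all accepted (ballot, value) pairs."""
        if n > self.promised:
            self.promised = n
            return dict(self.accepted)
        return None

    def accept(self, n: int, slot: int, value: str) -> bool:
        if n >= self.promised:
            self.promised = n
            self.accepted[slot] = (n, value)
            return True
        return False


class Leader:
    def __init__(self, acceptors: list[MultiAcceptor], ballot: int) -> None:
        self.acceptors, self.ballot = acceptors, ballot
        self.majority = len(acceptors) // 2 + 1
        self.log: dict[int, str] = {}                    # slots known to be chosen

    def elect(self) -> bool:
        """Phase 1 once; then re-propose the highest-ballot value seen per slot."""
        replies = [r for r in (a.prepare(self.ballot) for a in self.acceptors)
                   if r is not None]
        if len(replies) < self.majority:
            return False
        best: dict[int, tuple[int, str]] = {}
        for reply in replies:
            for slot, (b, v) in reply.items():
                if slot not in best or b > best[slot][0]:
                    best[slot] = (b, v)
        return all(self._phase2(slot, v) for slot, (_, v) in sorted(best.items()))

    def _phase2(self, slot: int, value: str) -> bool:
        acks = sum(a.accept(self.ballot, slot, value) for a in self.acceptors)
        if acks >= self.majority:
            self.log[slot] = value
            return True
        return False

    def append(self, value: str) -> bool:
        """Steady state: one Accept round per new entry, no Prepare needed."""
        return self._phase2(max(self.log, default=0) + 1, value)


accs = [MultiAcceptor() for _ in range(3)]
old = Leader(accs, ballot=1)
old.elect()
old.append("set x=1")
old.append("set y=2")
new = Leader(accs, ballot=2)        # e.g. after the old leader is suspected dead
print(new.elect(), new.log)         # True {1: 'set x=1', 2: 'set y=2'}
print(old.append("set z=3"))        # False: acceptors promised ballot 2
```

The deposed leader is not told that it has lost leadership. It simply finds that its Accept messages no longer reach a majority, which mirrors the minority-leader behaviour described for Raft.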
What is a "term" in Raft, and does Paxos have an equivalent?
A Raft term is a monotonically increasing integer that identifies one leader's time in office; every log entry and message carries the current term, so nodes can detect and reject messages from a stale, deposed leader. Paxos has an equivalent concept called a "ballot number" (or proposal number) that plays the same role — ensuring nodes always prefer the highest-numbered proposal they have seen.
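The stale-leader check that terms (and ballot numbers) make possible is a small rule applied to every incoming message, as the Raft paper describes. A node rejects anything from an older term and steps down to follower when it sees a newer one. In the minimal sketch below, the `Node` class and its `handle` method are illustrative and not part of any library:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    current_term: int
    role: str = "follower"            # "follower", "candidate" or "leader"

    def handle(self, msg_term: int) -> bool:
        """Apply Raft's term rules to any RPC or reply; return False if it is stale."""
        if msg_term < self.current_term:
            return False              # sender is a deposed leader or old candidate
        if msg_term > self.current_term:
            self.current_term = msg_term
            self.role = "follower"    # a newer term exists: step down
        return True


old_leader = Node("n1", current_term=3, role="leader")
follower = Node("n2", current_term=4)     # already took part in the term-4 election

print(follower.handle(3))                 # False: n1's term-3 AppendEntries is rejected
print(old_leader.handle(4))               # True: n2's reply carries term 4
print(old_leader.role, old_leader.current_term)   # follower 4
```

A Paxos acceptor applies the same comparison to ballot numbers, so only the naming differs between the two protocols.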
Why does understandability matter for a consensus algorithm?
Consensus code sits underneath the data that a distributed system promises never to lose or corrupt. Diego Ongaro's Raft thesis argued that an algorithm engineers cannot fully reason about is an algorithm they will implement incorrectly under pressure. Raft's explicit state machine (follower/candidate/leader) and strong leader model made it possible for teams without distributed-systems PhDs to build correct implementations, which is a large part of why it has become the default choice for many new systems built since 2014.