Split-Brain & Network Partitions
5 exercises — master network partition vocabulary: split-brain causes and prevention, fencing tokens, lease timeouts, minority partition behaviour, epoch numbers, and ZooKeeper distributed lock patterns.
Split-brain & partition quick reference
- Split-brain — two nodes simultaneously believe they are primary; caused by network partition.
- Prevention: quorum (majority must vote) + fencing tokens + STONITH.
- Fencing token — monotonically increasing epoch; storage rejects writes from lower-token leaders.
- Lease — time-bounded authority grant; expires automatically; follower must wait full timeout before electing new leader.
- Minority partition — cannot form quorum → enters read-only mode (CP choice).
- Epoch / generation — higher epoch supersedes lower; old leader's writes rejected if epoch is stale.
- ZooKeeper ephemeral node — auto-deleted on client crash; crash-safe lock primitive.
- Watch-predecessor pattern — each lock waiter watches only its immediate predecessor; prevents herd effect.
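The watch-predecessor rule can be illustrated without a ZooKeeper connection. The sketch below is only an illustration: it assumes lock waiters are sequential nodes whose names end in a zero-padded counter, and the "lock-" prefix and both function names are hypothetical. It computes which node each waiter would set its watch on.

```python
from typing import List, Optional


def sequence_number(name: str) -> int:
    """Extract the numeric suffix from a name like 'lock-0000000003'."""
    return int(name.rsplit("-", 1)[1])


def node_to_watch(my_node: str, children: List[str]) -> Optional[str]:
    """Return the node this waiter should watch, or None if it holds the lock."""
    ordered = sorted(children, key=sequence_number)
    position = ordered.index(my_node)
    if position == 0:
        # Lowest sequence number: this waiter owns the lock, nothing to watch.
        return None
    # Watch only the immediate predecessor, never the whole list of children.
    return ordered[position - 1]


children = ["lock-0000000002", "lock-0000000001", "lock-0000000003"]
for child in sorted(children, key=sequence_number):
    print(child, "watches", node_to_watch(child, children))
```

Because each waiter watches exactly one node, a lock release wakes a single waiter rather than every client in the queue.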
1 / 5
A site reliability engineer describes an incident: "We had a split-brain scenario in our database cluster. Both nodes thought they were the primary and accepted writes. The result was diverged data that took 4 hours to reconcile."
What causes split-brain and how is it prevented?
Split-brain = two nodes both acting as primary at the same time. Root cause: a network partition in which each side assumes the other has failed. Prevention: quorum plus fencing, so that only one node's writes can succeed.
How split-brain occurs:
Normal:
  Client → Primary (Node A) → Replica (Node B)
Network partition:
  [Node A] ................... [Node B]
  Node B can't see A → "A must be dead" → B elects itself primary
  Node A is still running and accepting writes
Result:
  Client 1 → Node A: writes X=1
  Client 2 → Node B: writes X=2
  Partition heals: X=1 and X=2 in conflict
Prevention 1: Quorum-enforced writes
• In a 3-node cluster, require 2 nodes to acknowledge writes
• During a partition: {A} has 1 node, cannot form a quorum → stops accepting writes
• {B, C} has 2 nodes, forms a quorum → continues operation
• Two disjoint sides cannot both hold more than half the nodes, so only one side can form a majority quorum; split-brain is prevented
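The majority rule in the bullets above can be checked with a few lines of arithmetic. This is a minimal sketch; the function names has_quorum and partition_outcome are made up for illustration and do not come from any particular cluster manager.

```python
from typing import List


def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """True if this side of a partition holds a strict majority of the cluster."""
    return reachable_nodes > cluster_size // 2


def partition_outcome(cluster_size: int, side_sizes: List[int]) -> List[bool]:
    """For each side of a partition, report whether it may keep accepting writes."""
    if sum(side_sizes) > cluster_size:
        raise ValueError("sides contain more nodes than the cluster")
    return [has_quorum(size, cluster_size) for size in side_sizes]


# 3-node cluster split into {A} and {B, C}
print(partition_outcome(3, [1, 2]))  # [False, True]

# 4-node cluster split evenly: neither side has a strict majority
print(partition_outcome(4, [2, 2]))  # [False, False]
```

The second call shows why odd cluster sizes are the usual choice: an even split leaves no side with a majority, so the whole cluster stops accepting writes.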
Prevention 2: Fencing tokens
• When a new leader is elected, it gets a monotonically increasing fencing token (e.g., epoch/generation number)
• All writes to storage include the fencing token
• Storage layer rejects writes from any leader with a lower token than it has seen
• Old primary (unaware it was replaced) attempts writes → storage sees lower token → rejects → old primary cannot cause split-brain damage
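The token check described above fits in a small toy storage class. This is a sketch under assumptions: FencedStore and its write method are hypothetical names, and a real storage service would also need to persist the highest token it has seen.

```python
from typing import Dict


class FencedStore:
    """Toy storage layer that rejects writes carrying a stale fencing token."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data: Dict[str, str] = {}

    def write(self, token: int, key: str, value: str) -> bool:
        """Apply the write unless a higher token has already been seen."""
        if token < self.highest_token:
            # The writer was superseded by a newer leader: refuse the write.
            return False
        self.highest_token = token
        self.data[key] = value
        return True


store = FencedStore()
print(store.write(1, "X", "1"))      # True: old leader, token 1
print(store.write(2, "X", "2"))      # True: new leader, token 2
print(store.write(1, "X", "stale"))  # False: old leader's token is now stale
print(store.data["X"])               # 2
```

The old leader needs no knowledge that it was replaced; the rejection happens entirely on the storage side.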
Prevention 3: STONITH / node fencing
• When the new leader suspects the old leader may still be running, it powers the old node off out-of-band via IPMI, iDRAC, or a switched PDU
• Ensures the old leader is confirmed down before the new leader begins accepting writes
• Used by Pacemaker/Corosync cluster managers
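The important property of STONITH is ordering: fence first, promote second. The sketch below shows only that ordering. The fail_over function and the two callables are hypothetical stand-ins (for example, for an IPMI power-off command and a promotion step); this is not the Pacemaker API.

```python
from typing import Callable, List


def fail_over(fence_old_leader: Callable[[], bool],
              promote: Callable[[], None]) -> bool:
    """Promote the new leader only after the old one is confirmed powered off."""
    if not fence_old_leader():
        # Fencing was not confirmed: promoting now could cause split-brain.
        return False
    promote()
    return True


events: List[str] = []


def fake_fence_ok() -> bool:
    events.append("power off old leader")
    return True


def fake_promote() -> None:
    events.append("promote new leader")


print(fail_over(fake_fence_ok, fake_promote))  # True
print(events)  # ['power off old leader', 'promote new leader']
```

If fencing cannot be confirmed, the sketch refuses to promote, which trades availability for safety.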
Key vocabulary:
• Split-brain — cluster state where two nodes simultaneously believe they are the active primary; causes data divergence
• Network partition — network failure that prevents communication between cluster nodes while they remain individually operational
• Fencing token — monotonically increasing epoch number; storage layer rejects writes from superseded leaders
• STONITH — "Shoot The Other Node In The Head"; proactive node termination to prevent split-brain writes