CAP & PACELC
5 exercises — master CAP and PACELC trade-off vocabulary: CP vs AP systems, PACELC latency/consistency trade-off, Spanner external consistency, tunable consistency, and applying CAP analysis to payment system design.
0 / 5 completed
CAP & PACELC quick reference
- CAP theorem — during partition, choose C (consistency: error rather than stale) or A (availability: respond even if stale). P is mandatory.
- CP examples — etcd, ZooKeeper, RDBMS sync replicas, HBase, CockroachDB.
- AP examples — Cassandra, DynamoDB (eventual), CouchDB, Riak.
- PACELC — extends CAP: if Partition → A/C; Else (healthy) → Latency/Consistency trade-off.
- PA/EL — available under partition, low latency when healthy (Cassandra, DynamoDB eventual).
- PC/EC — consistent under partition, higher latency when healthy (Spanner, etcd).
- Tunable consistency — per-operation consistency levels (Cassandra: ONE/QUORUM/ALL; DynamoDB: ConsistentRead).
- Spanner — CP, external consistency via TrueTime; not a CAP refutation — sacrifices availability under partition.
1 / 5
A system design interviewer asks: "Explain the CAP theorem. Can a distributed system ever provide all three guarantees?"
CAP theorem (Brewer, 2000; formally proven by Gilbert & Lynch, 2002): pick C or A when P occurs — you cannot have all three during a partition.
The three properties:
• Consistency (C) — every read receives the most recent write or an error. (Note: CAP uses a specific definition — equivalent to linearizability, not just "no corruption")
• Availability (A) — every request to a non-failing node receives a non-error response (though it may be stale)
• Partition tolerance (P) — the system continues operating even when messages are lost or delayed between nodes
Why P is mandatory:
Any distributed system operating over a network will experience partitions. Network cables break, switches fail, BGP re-routes, clouds have outages. Choosing "not P" means your system stops when the first partition occurs — not viable for production systems.
The practical trade-off therefore is: during a partition, choose C or A:
Important nuance:
• CAP says nothing about how often to sacrifice C or A — just that under partition you must choose
• Modern systems (Spanner, CockroachDB) are "effectively CA when the network is healthy" and CP when partitioned — the partition case matters
• CAP's "Availability" has a specific formal definition (every non-failing node responds) — differs from "99.99% uptime SLA"
Key vocabulary:
• CAP theorem — a distributed system can guarantee at most two of: Consistency, Availability, Partition-tolerance; during a partition, must choose C or A
• CP system — chooses consistency over availability during partition; returns errors rather than stale data
• AP system — chooses availability over consistency during partition; returns possibly-stale data rather than errors
• Linearizability — the consistency semantics CAP uses; every read sees the latest committed write
The three properties:
• Consistency (C) — every read receives the most recent write or an error. (Note: CAP uses a specific definition — equivalent to linearizability, not just "no corruption")
• Availability (A) — every request to a non-failing node receives a non-error response (though it may be stale)
• Partition tolerance (P) — the system continues operating even when messages are lost or delayed between nodes
Why P is mandatory:
Any distributed system operating over a network will experience partitions. Network cables break, switches fail, BGP re-routes, clouds have outages. Choosing "not P" means your system stops when the first partition occurs — not viable for production systems.
The practical trade-off therefore is: during a partition, choose C or A:
| Choice | Behaviour during partition | Examples |
|---|---|---|
| CP | Return error or wait rather than return stale data | etcd, ZooKeeper, Consul, HBase |
| AP | Return potentially stale data rather than error | Cassandra, CouchDB, Riak, DynamoDB (eventually consistent) |
Important nuance:
• CAP says nothing about how often to sacrifice C or A — just that under partition you must choose
• Modern systems (Spanner, CockroachDB) are "effectively CA when the network is healthy" and CP when partitioned — the partition case matters
• CAP's "Availability" has a specific formal definition (every non-failing node responds) — differs from "99.99% uptime SLA"
Key vocabulary:
• CAP theorem — a distributed system can guarantee at most two of: Consistency, Availability, Partition-tolerance; during a partition, must choose C or A
• CP system — chooses consistency over availability during partition; returns errors rather than stale data
• AP system — chooses availability over consistency during partition; returns possibly-stale data rather than errors
• Linearizability — the consistency semantics CAP uses; every read sees the latest committed write