5 exercises on distributed data terms — consistency, replication, and partitioning.
1 / 5
What does the CAP theorem state?
The CAP theorem says that a distributed data store can simultaneously guarantee at most two of three properties: Consistency (every read sees the latest write), Availability (every request to a non-failing node gets a non-error response), and Partition tolerance (the system keeps working despite dropped or delayed messages between nodes). Since network partitions are unavoidable in real systems, the practical choice during a partition is between consistency and availability. CP systems reject or time out some requests to stay consistent; AP systems keep serving but may return stale data. The theorem frames a fundamental trade-off in distributed database design.
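The CP versus AP choice during a partition can be illustrated with a toy model. This is only a sketch: the `Replica` class, its `mode` flag, and the `partitioned` attribute are hypothetical names invented for this example and do not correspond to any real database API.

```python
class Replica:
    """Toy replica that cannot reach its peers while `partitioned` is True."""

    def __init__(self, mode: str, value: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = value          # last value this replica has seen
        self.partitioned = False

    def read(self) -> str:
        if self.partitioned and self.mode == "CP":
            # CP: refuse rather than risk returning a stale value.
            raise RuntimeError("unavailable during partition")
        # AP (or no partition): answer from local state, which may be stale.
        return self.value


cp, ap = Replica("CP", "v1"), Replica("AP", "v1")
# Assume a newer value "v2" was written on the far side of the partition.
cp.partitioned = ap.partitioned = True

print("AP replica:", ap.read())     # still answers, with the old "v1"
try:
    cp.read()
except RuntimeError as err:
    print("CP replica:", err)       # refuses to answer
```

The AP replica stays available at the cost of serving the old value, while the CP replica gives up availability to avoid that.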
2 / 5
What is a quorum in a replicated database?
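A quorum is the minimum number of replica nodes that must acknowledge a read or write for it to be considered successful. With N replicas, choosing a write quorum W and a read quorum R such that W + R > N guarantees that every read quorum shares at least one replica with every write quorum, so a read contacts at least one replica holding the latest acknowledged write. Overlap alone does not give strong consistency: the reader still needs version numbers or timestamps to pick the newest value, and concurrent writes, partially failed writes, or sloppy quorums can still expose stale or conflicting data. Tuning W and R trades latency and availability against consistency: smaller quorums are faster and more available but risk reading stale data. Quorum-based replication (popularized by Dynamo-style systems) lets operators dial the consistency level per operation.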
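The W + R > N condition can be checked against a brute-force enumeration of all possible quorums. This is a minimal sketch; the function names `quorums_overlap` and `brute_force_overlap` are made up for illustration.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Arithmetic rule: read and write quorums must intersect when w + r > n."""
    return w + r > n


def brute_force_overlap(n: int, w: int, r: int) -> bool:
    """Enumerate every write set and read set and confirm they share a replica."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


for n, w, r in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 2, 3)]:
    # The rule and the enumeration agree on every configuration tried here.
    assert quorums_overlap(n, w, r) == brute_force_overlap(n, w, r)
    print(f"N={n} W={w} R={r} overlap={quorums_overlap(n, w, r)}")
```

With N = 3, the common W = 2, R = 2 setting overlaps, while W = 1, R = 1 does not, which is why the smaller quorums can return stale data.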
3 / 5
What is sharding?
Sharding is horizontal partitioning: the dataset is split into disjoint subsets called shards, typically spread across different nodes, so the system scales beyond a single machine's capacity. A shard key determines which shard a record lives in, via range, hash, or directory-based routing. Good shard keys distribute load evenly and avoid hotspots. The trade-off is that cross-shard queries, joins, and transactions become expensive or require coordination. Rebalancing shards as data grows is one of the harder operational challenges in distributed databases.
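Hash and range routing can each be sketched in a few lines. The function names, the boundary list `RANGE_BOUNDS`, and the sample keys below are assumptions chosen for illustration, not taken from any particular database.

```python
import bisect
import hashlib


def hash_shard_for(key: str, num_shards: int) -> int:
    """Hash routing: a stable digest of the shard key, reduced modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Range routing: sorted upper bounds split the key space into 4 shards.
# Shard 0 holds keys < "g", shard 1 holds "g" <= key < "n", and so on.
RANGE_BOUNDS = ["g", "n", "t"]


def range_shard_for(key: str) -> int:
    """Range routing: find the interval the key falls into."""
    return bisect.bisect_right(RANGE_BOUNDS, key)


for user in ["alice", "mike", "zoe"]:
    print(user, hash_shard_for(user, 4), range_shard_for(user))
```

Range routing keeps neighbouring keys together, which helps range scans but can create hotspots. Hash routing spreads keys more evenly, but with plain modulo hashing a change in `num_shards` remaps most keys, which is part of why rebalancing is hard.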
4 / 5
What is eventual consistency?
Eventual consistency is a weak consistency model in which, given no new updates, all replicas will eventually converge to the same value. Reads may temporarily return stale data because writes propagate asynchronously between nodes. This model trades immediate consistency for higher availability and lower latency, a common choice for AP systems under CAP. To handle conflicting concurrent updates, such systems detect them with techniques like vector clocks and resolve them with strategies like last-write-wins timestamps or CRDTs. Eventual consistency suits use cases (shopping carts, social feeds) where brief staleness is acceptable.
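Convergence under last-write-wins can be sketched as a merge function that gives the same result in any order. The `Versioned` record and the `lww_merge` helper are hypothetical, and the integer timestamps are assumed to be comparable across nodes, which real clocks do not guarantee.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int   # assumed write time, comparable across nodes
    node_id: str     # tie-breaker so equal timestamps resolve identically everywhere


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep the write with the highest (timestamp, node_id) pair."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


writes = [
    Versioned("milk", 10, "a"),
    Versioned("eggs", 12, "b"),
    Versioned("tea", 12, "a"),
]

# Replicas that see the same writes in different orders end up with the same value.
forward = reduce(lww_merge, writes)
backward = reduce(lww_merge, reversed(writes))
assert forward == backward == Versioned("eggs", 12, "b")
```

The merge converges, but the "tea" write with the same timestamp is silently discarded. That data loss is the usual motivation for using vector clocks or CRDTs instead.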
5 / 5
What is a vector clock used for?
A vector clock is a data structure for capturing the causal relationships between events in a distributed system without relying on synchronized physical clocks. Each node maintains a vector of counters, one per node. It increments its own counter on each local event; on receiving a message it takes the element-wise maximum of its vector and the sender's, then increments its own counter. By comparing two vectors you can tell whether one event happened-before another or whether they are concurrent (causally independent). Databases use vector clocks to detect conflicting writes that need reconciliation, distinguishing true concurrency from a simple update sequence.
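The increment, merge, and compare rules can be sketched with plain dictionaries. The function names `tick`, `receive`, and `compare` and the node names "A" and "B" are illustrative assumptions; missing entries are treated as zero.

```python
from typing import Dict

VectorClock = Dict[str, int]


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Local event: increment this node's own counter."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def receive(local: VectorClock, received: VectorClock, node: str) -> VectorClock:
    """Message receipt: element-wise maximum, then count the receive as an event."""
    merged = {
        n: max(local.get(n, 0), received.get(n, 0))
        for n in local.keys() | received.keys()
    }
    return tick(merged, node)


def compare(a: VectorClock, b: VectorClock) -> str:
    """Return 'before', 'after', 'equal', or 'concurrent' for a relative to b."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


a1 = tick({}, "A")           # A writes independently
b1 = tick({}, "B")           # B writes independently
b2 = receive(b1, a1, "B")    # B then receives A's update

assert compare(a1, b1) == "concurrent"   # neither saw the other: a conflict
assert compare(a1, b2) == "before"       # a1 happened-before b2
```

A database would flag the `a1` and `b1` pair for reconciliation, while `b2` simply supersedes `a1`.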