Practise vocabulary for table partitioning strategies, shard key selection, hot spots, and horizontal scaling discussion.
1 / 5
Range partitioning a table by date means:
Range partitioning by date is a common pattern for time-series and transactional data. Query benefit: a filter such as "WHERE created_at >= '2024-01-01'" lets the planner prune all pre-2024 partitions and scan only recent data. It also enables efficient archival: the oldest partition can be dropped without a DELETE scan.
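As a rough illustration of pruning and archival, the sketch below models monthly partitions as date ranges. The partition names, the boundaries, and the helpers `prune` and `drop_oldest` are hypothetical and only mimic what a database planner does internally.

```python
from datetime import date

# Hypothetical monthly partitions: name -> (inclusive start, exclusive end).
PARTITIONS = {
    "orders_2023_11": (date(2023, 11, 1), date(2023, 12, 1)),
    "orders_2023_12": (date(2023, 12, 1), date(2024, 1, 1)),
    "orders_2024_01": (date(2024, 1, 1), date(2024, 2, 1)),
    "orders_2024_02": (date(2024, 2, 1), date(2024, 3, 1)),
}


def prune(partitions, lower_bound):
    """Return the partitions that can hold rows with created_at >= lower_bound."""
    return [name for name, (_, end) in partitions.items() if end > lower_bound]


def drop_oldest(partitions):
    """Remove the partition with the earliest start date and return its name."""
    oldest = min(partitions, key=lambda name: partitions[name][0])
    del partitions[oldest]  # metadata-only operation, no row-by-row DELETE
    return oldest


if __name__ == "__main__":
    print(prune(PARTITIONS, date(2024, 1, 1)))
    print(drop_oldest(dict(PARTITIONS)))
```

Only the partitions whose range can overlap the filter are scanned; the rest are skipped without reading any rows.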
2 / 5
A hot partition (or hot spot) in a sharded database occurs when:
Shard key design is critical: sharding by user_id works well if activity is roughly uniform across users. Sharding by region may create hot spots if 80% of traffic comes from one region. Monotonically increasing IDs (auto-increment) are a classic hot spot under range-based sharding: all new writes go to the 'latest' shard.
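The auto-increment hot spot can be shown with a small simulation. This is a sketch under assumed values: four shards, a hypothetical range size of 1000 IDs per shard, and SHA-256 as a stand-in for whatever hash function a real system would use.

```python
from collections import Counter
import hashlib

NUM_SHARDS = 4
RANGE_SIZE = 1000  # hypothetical: each shard owns a contiguous block of 1000 IDs


def range_shard(row_id):
    """Range-based placement: the last shard absorbs every ID past its block."""
    return min(row_id // RANGE_SIZE, NUM_SHARDS - 1)


def hash_shard(row_id):
    """Hash-based placement: consecutive IDs scatter across shards."""
    digest = hashlib.sha256(str(row_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


new_ids = range(3000, 3500)  # the 500 most recent auto-increment IDs
range_load = Counter(range_shard(i) for i in new_ids)
hash_load = Counter(hash_shard(i) for i in new_ids)

if __name__ == "__main__":
    print("range-based:", dict(range_load))
    print("hash-based: ", dict(hash_load))
```

With range-based placement every new write lands on the last shard, while hashing the same IDs spreads the writes across all four.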
3 / 5
A cross-shard query (scatter-gather) is described as expensive because:
Scatter-gather: the query is broadcast to all shards, each shard executes it, and the results are gathered and merged. Cost scales with shard count: 50 shards = 50 parallel queries + merge overhead. Good shard key design ensures most queries target a single shard (shard-local), avoiding scatter-gather.
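A minimal sketch of the difference between a scatter-gather query and a shard-local lookup, using in-memory lists as stand-ins for shards. The data, the modulo placement rule, and the function names are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards holding (user_id, amount) rows, placed by user_id % 3.
SHARDS = [
    [(3, 15), (6, 30)],
    [(1, 40), (4, 10)],
    [(2, 25), (5, 5)],
]


def query_shard(rows, min_amount):
    """Work done on one shard: filter its local rows."""
    return [row for row in rows if row[1] >= min_amount]


def scatter_gather(min_amount):
    """Filter on a non-key column: every shard must be queried, then merged."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda rows: query_shard(rows, min_amount), SHARDS))
    merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda row: row[1], reverse=True)


def single_shard_lookup(user_id):
    """Filter on the shard key: exactly one shard is touched."""
    rows = SHARDS[user_id % len(SHARDS)]
    return [row for row in rows if row[0] == user_id]


if __name__ == "__main__":
    print(scatter_gather(25))
    print(single_shard_lookup(4))
```

The first call fans out to every shard and pays for the merge; the second resolves the shard from the key and never contacts the others.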
4 / 5
When discussing shard key selection, 'cardinality' refers to:
Cardinality determines distribution granularity: sharding by country (about 200 values) gives you at most about 200 useful shards, and with fewer shards than that, many countries share a shard. Sharding by user_id (millions of values) allows fine-grained distribution. High-cardinality shard keys are generally preferred.
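The effect of cardinality can be checked by counting how many shards actually receive data. The sketch assumes hash-based placement over 1000 shards, with synthetic country names and user IDs standing in for real keys.

```python
import hashlib


def stable_hash(value):
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def shards_used(keys, num_shards):
    """Count the distinct shards that receive at least one key."""
    return len({stable_hash(key) % num_shards for key in keys})


NUM_SHARDS = 1000
countries = [f"country_{i}" for i in range(200)]  # low-cardinality key
user_ids = range(100_000)                         # high-cardinality key

low = shards_used(countries, NUM_SHARDS)
high = shards_used(user_ids, NUM_SHARDS)

if __name__ == "__main__":
    print(f"country key fills {low} of {NUM_SHARDS} shards")
    print(f"user_id key fills {high} of {NUM_SHARDS} shards")
```

However many shards are provisioned, the country key can never occupy more than 200 of them, so the rest sit idle.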
5 / 5
Resharding a database means:
Resharding is one of the most operationally complex database tasks: data must be moved between shards while the system remains live, without data loss or extended downtime. Consistent hashing (often combined with virtual nodes) minimises data movement during resharding. Teams try to avoid resharding by choosing a good initial shard key.
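To see why consistent hashing reduces data movement, the sketch below adds a fifth shard and counts how many keys change owner, compared with plain modulo placement. The `HashRing` class, the shard names, and the choice of 100 virtual nodes per shard are illustrative assumptions, not a production design.

```python
import bisect
import hashlib


def stable_hash(value):
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node is placed at `vnodes` points on the ring.
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect.bisect(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user_{i}" for i in range(10_000)]
old_nodes = ["shard-0", "shard-1", "shard-2", "shard-3"]
new_nodes = old_nodes + ["shard-4"]

old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
moved_ring = sum(old_ring.node_for(k) != new_ring.node_for(k) for k in keys)
moved_modulo = sum(stable_hash(k) % 4 != stable_hash(k) % 5 for k in keys)

if __name__ == "__main__":
    print(f"consistent hashing moved {moved_ring} of {len(keys)} keys")
    print(f"modulo placement moved   {moved_modulo} of {len(keys)} keys")
```

On the ring, the only keys that move are those the new shard takes over; with modulo placement, changing the shard count remaps most keys.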