Learn the vocabulary of splitting one logical table into physical partitions within a single database.
0 / 5 completed
1 / 5
At standup, a dev mentions splitting one logical table into several physical partitions, each holding rows for a specific range of a partition key, like one partition per calendar month, all within the same single database instance. What is this technique called?
Table partitioning splits one logical table into several physical partitions, each holding rows for a specific range or category of a partition key, such as one partition per calendar month, while everything still lives within a single database instance and is queried through the same logical table. Database sharding is a related but distinct technique that splits data across multiple separate database instances or nodes entirely, typically to scale beyond what one instance can handle. This within-a-single-instance scope is exactly what distinguishes partitioning from sharding, even though both techniques divide data along some key.
2 / 5
During a design review, the team partitions a large events table by month specifically so a query filtered to a recent date range only needs to scan that month's partition instead of the entire table's full history. Which capability does this partitioning provide?
This partitioning provides partition pruning, where the query planner recognizes that a query filtered to a recent date range can only possibly match rows in the partitions covering that range, and skips scanning every other partition entirely. Scanning every partition in full regardless of the filter would defeat the entire purpose of partitioning by date in the first place. This pruning is exactly what makes date-range queries against a partitioned table dramatically faster than the same query against one enormous unpartitioned table holding years of history.
3 / 5
In a code review, a dev notices a reporting query filters on a customer ID column, but the events table is partitioned by month, so the query planner still has to scan every single partition since customer ID isn't the partition key. What does this represent?
This is a partitioning-key mismatch, since partition pruning only kicks in when a query's filter aligns with the actual column the table is partitioned by, and a query filtering on customer ID against a table partitioned by month has no way to know in advance which month partitions contain that customer's rows, forcing a full scan across all of them. A hash collision is an unrelated concept from hash tables, not table partitioning. This is exactly why the choice of partition key needs to match the query patterns the table will actually see most often, since partitioning by the wrong dimension provides no benefit for queries filtering on a different one.
4 / 5
An incident report shows a reporting dashboard's queries kept timing out even after the underlying table was partitioned by month, because the dashboard's most common queries actually filtered by customer ID, a dimension the partitioning scheme didn't align with at all. What practice would prevent this?
Choosing a partitioning strategy, or supplementing the existing scheme with a secondary index, aligned with the query patterns actually run against the table addresses the real problem, since partition pruning only helps when a query's filter matches the partition key, and this dashboard's queries clearly didn't. Continuing to partition only by month while the dominant query pattern filters by customer ID leaves those queries scanning every partition regardless, exactly as this incident describes. This alignment between partitioning scheme and actual query patterns is essential for partitioning to deliver its intended performance benefit rather than becoming dead weight.
5 / 5
During a PR review, a teammate asks why the team carefully chooses a partition key based on actual query patterns instead of just partitioning by whatever column seems most natural, like an auto-incrementing ID. What is the reasoning?
Partition pruning only benefits queries that actually filter on the column the table is partitioned by, so partitioning by an auto-incrementing ID when the real workload mostly filters by date or by customer provides little to no performance benefit while still adding all of partitioning's operational overhead, like managing partition boundaries and any necessary maintenance. Choosing the partition key based on the workload's actual, dominant query patterns ensures the pruning benefit is realized where it actually matters. The tradeoff is the upfront analysis effort of understanding real query patterns before committing to a partitioning scheme, since retrofitting a different partition key later is typically a costly, disruptive migration.