Database scaling comparison

Vertical vs horizontal partitioning

When a table gets too big, you split it. The question is which direction you cut: by columns (vertical) or by rows (horizontal — better known as sharding). They solve different problems, and mixing them up is one of the most common mistakes in "how would you scale this database" system-design interviews.

TL;DR

  • Vertical partitioning splits a table by columns — hot, frequently-used columns in one table, cold or large columns in another, joined by a shared key.
  • Horizontal partitioning (sharding) splits a table by rows — each partition/shard holds a subset of rows but every column, usually distributed across separate servers.
  • Different problems solved: vertical partitioning shrinks row size and working-set memory; horizontal partitioning scales total data volume and write throughput across machines.

Side-by-side comparison

AspectVertical partitioningHorizontal partitioning (sharding)
Splits byColumnsRows
Each partition containsAll rows, subset of columnsSubset of rows, all columns
Primary goalShrink row size, isolate rarely-used or large columnsScale total storage and write throughput across machines
Typical triggerWide table with rarely-accessed large columns (blobs, JSON, history)Table too large for one server, or write throughput exceeds one server's capacity
Key conceptShared primary key joining the split tablesShard key (partition key) that decides row placement
Cross-partition joinsCheap — usually within one database instanceExpensive or app-level — often avoided by denormalising
Scales writes across servers?No — all data still lives on one instanceYes — this is the primary reason to shard
Failure risk if done poorlyExtra join overhead if the split boundary is wrongHot shards from a badly chosen shard key
Common real-world useSeparating profile "core fields" from "activity history"Sharding users table by user_id hash across 16 database clusters

Code / schema side-by-side

Vertical partitioning

-- Before: one wide table
CREATE TABLE users (
  id, email, name, status,
  bio_text, avatar_blob, last_login_json
);

-- After: hot columns split from cold ones
CREATE TABLE users_core (
  id, email, name, status
);
CREATE TABLE users_profile_detail (
  user_id REFERENCES users_core(id),
  bio_text, avatar_blob, last_login_json
);

-- Hot-path query only touches the small table:
SELECT id, email, status FROM users_core
WHERE email = 'a@b.com';

Horizontal partitioning (sharding)

// Application-level shard routing
function shardFor(userId) {
  const NUM_SHARDS = 16;
  return hash(userId) % NUM_SHARDS;
}

function getUser(userId) {
  const shard = shardFor(userId);
  const conn = shardConnections[shard];
  return conn.query(
    "SELECT * FROM users WHERE id = $1",
    [userId]
  );
}
// users 1..1M live on shard 0..15, each on
// its own database server -- writes scale
// roughly linearly as shards are added

When to use vertical partitioning

  • Most queries only need a handful of "hot" columns. Splitting off large or rarely-read columns (bios, blobs, JSON history) shrinks the working set that has to fit in cache or memory for common queries.
  • You have a small number of very wide, infrequently accessed columns. Storing them separately avoids paying I/O cost for them on every row scan of the main table.
  • You want to apply different access controls or storage tiers per column group. Sensitive or bulky data can live in a separately secured or cheaper-storage table.
  • Single-server capacity is not yet the bottleneck. Vertical partitioning is a lighter-weight first step before reaching for sharding.

When to use horizontal partitioning (sharding)

  • Total data volume exceeds what one server can hold or serve efficiently. Sharding is the standard answer once vertical scaling (bigger machine) hits diminishing returns.
  • Write throughput exceeds a single primary's capacity. Distributing writes across shards is the main way to scale write-heavy workloads horizontally.
  • Your access pattern maps naturally to a partition key. Multi-tenant SaaS (shard by tenant_id) and user-centric apps (shard by user_id) are classic, low-risk sharding candidates.
  • You've already exhausted vertical partitioning and read replicas. Sharding adds real operational complexity, so it's usually the last lever pulled, not the first.

English phrases engineers use

Vertical partitioning conversations

  • "Let's split out the cold columns into a separate table."
  • "That blob field is bloating every row scan — vertically partition it."
  • "We joined back on the shared key to reassemble the full profile."
  • "The working set now fits in memory after we split the table."

Sharding conversations

  • "We picked user_id as the shard key — good cardinality, even distribution."
  • "Shard 7 is hot — one tenant is generating 40% of the traffic."
  • "This query needs a cross-shard join, which we try to avoid."
  • "We're planning a resharding to double our shard count next quarter."
  • "Over-provision logical shards now so rebalancing is a remap, not a re-split."

Quick decision tree

  • Rarely-used large columns bloating common queries → Vertical partitioning
  • Total data or write volume exceeds one server → Horizontal partitioning (sharding)
  • Need both smaller rows and more write throughput → Vertically partition first, then shard the hot table
  • Access pattern has an obvious high-cardinality key (tenant_id, user_id) → Shard on that key
  • Access pattern has no good shard key (e.g. only a boolean status) → Reconsider — pick a different key or avoid sharding
  • Not yet hitting single-server limits → Neither — read replicas or a bigger instance first

Frequently asked questions

Is horizontal partitioning the same thing as sharding?

Yes, in almost all everyday usage. "Horizontal partitioning" is the formal database term for splitting a table by rows, so each partition holds a subset of the rows but all of the columns. "Sharding" is the same technique applied at a larger scale, usually across separate database instances or servers rather than partitions within one instance. Engineers use the words interchangeably in most conversations.

What is vertical partitioning used for in practice?

Vertical partitioning splits a table by columns rather than rows — for example, keeping frequently-accessed columns (name, status) in one table and rarely-accessed, large columns (full address history, binary blobs, long text fields) in a separate table joined by a shared key. It reduces the amount of data scanned for common queries and can shrink the working set that fits in memory/cache.

Which one solves "my database is too big for one machine"?

Horizontal partitioning (sharding). Splitting rows across multiple database instances lets you scale storage and write throughput roughly linearly by adding more shards. Vertical partitioning does not solve this on its own — you still have all the rows, just spread across narrower tables — though it is often applied first to shrink each row before sharding.