Intermediate Reading #blog #scaling #architecture

Reading Engineering Blog Posts

5 exercises on real-world engineering write-ups — database sharding, scaling lessons, and a monolith-to-microservices migration. Find the main idea, draw inferences, and read the author’s stance.

Reading "lessons learned" posts effectively
  • Thesis lives in the conclusion — look for “If we started over...” or “Would we do it again?”
  • Inference vs. quotation — inference answers are supported by details, not copied word-for-word
  • Idioms in context“bought us time” means delayed, not purchased; test phrases against nearby sentences
  • Cautionary anecdotes — a described mistake usually implies the opposite advice
  • Nuanced stance — “X has real costs” is rarely the same as “X is bad”
0 / 5 completed
1 / 5
Passage: Database Sharding — Engineering Blog Post
Title: How We Cut Our p99 Latency by 60% — Lessons From Sharding Our Database

For three years, our single Postgres primary handled everything. It worked beautifully —
until it didn't. By Q2 our write throughput had tripled, and the p99 latency on our
checkout endpoint crept from 120 ms to over 800 ms during peak hours. Vertical scaling
bought us time, but we were running on the largest instance our provider offered. We had
hit a ceiling.

We considered three options:

  1. Read replicas. These help read-heavy workloads, but our bottleneck was writes.
     Replicas would not have moved the needle.
  2. Caching everything. We already cached aggressively; the remaining traffic was
     genuinely dynamic.
  3. Sharding. Splitting the data across multiple primaries by customer ID.

We chose sharding, fully aware it is the option of last resort. It introduces real
complexity: cross-shard queries become expensive, transactions can no longer span the
whole dataset, and re-sharding later is painful.

The migration took four months. The hardest part was not the code — it was the
backfill. We had to copy two years of historical data into the new shards without
downtime, using a dual-write phase where every write went to both the old and new
systems while we verified consistency.

The result: p99 dropped to 320 ms and write throughput is now effectively unbounded.
Would we do it again? Yes — but only because we genuinely had no alternative. If you
can solve your problem with a read replica or a cache, do that first. Sharding is
powerful, but the operational cost is permanent.
What is the main argument the author makes about sharding?