5 exercises on real-world engineering write-ups — database sharding, scaling lessons, and a monolith-to-microservices migration. Find the main idea, draw inferences, and read the author’s stance.
Reading "lessons learned" posts effectively
Thesis lives in the conclusion — look for “If we started over...” or “Would we do it again?”
Inference vs. quotation — inference answers are supported by details, not copied word-for-word
Idioms in context — “bought us time” means delayed, not purchased; test phrases against nearby sentences
Cautionary anecdotes — a described mistake usually implies the opposite advice
Nuanced stance — “X has real costs” is rarely the same as “X is bad”
0 / 5 completed
1 / 5
Passage: Database Sharding — Engineering Blog Post
Title: How We Cut Our p99 Latency by 60% — Lessons From Sharding Our Database
For three years, our single Postgres primary handled everything. It worked beautifully —
until it didn't. By Q2 our write throughput had tripled, and the p99 latency on our
checkout endpoint crept from 120 ms to over 800 ms during peak hours. Vertical scaling
bought us time, but we were running on the largest instance our provider offered. We had
hit a ceiling.
We considered three options:
1. Read replicas. These help read-heavy workloads, but our bottleneck was writes.
Replicas would not have moved the needle.
2. Caching everything. We already cached aggressively; the remaining traffic was
genuinely dynamic.
3. Sharding. Splitting the data across multiple primaries by customer ID.
We chose sharding, fully aware it is the option of last resort. It introduces real
complexity: cross-shard queries become expensive, transactions can no longer span the
whole dataset, and re-sharding later is painful.
The migration took four months. The hardest part was not the code — it was the
backfill. We had to copy two years of historical data into the new shards without
downtime, using a dual-write phase where every write went to both the old and new
systems while we verified consistency.
The result: p99 dropped to 320 ms and write throughput is now effectively unbounded.
Would we do it again? Yes — but only because we genuinely had no alternative. If you
can solve your problem with a read replica or a cache, do that first. Sharding is
powerful, but the operational cost is permanent.
What is the main argument the author makes about sharding?
Sharding is a powerful last resort.
The author states it twice: "we chose sharding, fully aware it is the option of last resort" and "If you can solve your problem with a read replica or a cache, do that first. Sharding is powerful, but the operational cost is permanent."
Why the other options fail:
Option A reverses the author's stance — they explicitly recommend trying cheaper options first.
Option C overstates it: replicas "would not have moved the needle" here only because the bottleneck was writes, not a general rule.
Option D contradicts the text — they already cached aggressively and still needed sharding.
Reading strategy — find the thesis: In a "lessons learned" post, the main idea usually appears in the conclusion, often signalled by "Would we do it again?" or "If you can...". Read the final paragraph closely; that is where the author distils the takeaway.
2 / 5
Passage: Database Sharding — Engineering Blog Post
Title: How We Cut Our p99 Latency by 60% — Lessons From Sharding Our Database
For three years, our single Postgres primary handled everything. It worked beautifully —
until it didn't. By Q2 our write throughput had tripled, and the p99 latency on our
checkout endpoint crept from 120 ms to over 800 ms during peak hours. Vertical scaling
bought us time, but we were running on the largest instance our provider offered. We had
hit a ceiling.
We considered three options:
1. Read replicas. These help read-heavy workloads, but our bottleneck was writes.
Replicas would not have moved the needle.
2. Caching everything. We already cached aggressively; the remaining traffic was
genuinely dynamic.
3. Sharding. Splitting the data across multiple primaries by customer ID.
We chose sharding, fully aware it is the option of last resort. It introduces real
complexity: cross-shard queries become expensive, transactions can no longer span the
whole dataset, and re-sharding later is painful.
The migration took four months. The hardest part was not the code — it was the
backfill. We had to copy two years of historical data into the new shards without
downtime, using a dual-write phase where every write went to both the old and new
systems while we verified consistency.
The result: p99 dropped to 320 ms and write throughput is now effectively unbounded.
Would we do it again? Yes — but only because we genuinely had no alternative. If you
can solve your problem with a read replica or a cache, do that first. Sharding is
powerful, but the operational cost is permanent.
The author calls the backfill "the hardest part." What can you infer about why it was so difficult?
Migrating live historical data without downtime.
The text says: "We had to copy two years of historical data into the new shards without downtime, using a dual-write phase where every write went to both the old and new systems while we verified consistency."
This is an inference question — the answer is not stated as "the reason it was hard" in one sentence; you assemble it from the details: two years of data + without downtime + verifying consistency all point to the operational difficulty of a live backfill.
Why the distractors fail:
Option A — no language change is mentioned; the author actually says "the hardest part was not the code."
Option C confuses the earlier vertical scaling ceiling with the backfill.
Option D names a real challenge of sharding, but the passage does not tie it to the backfill specifically.
Tip: when a question asks what you can infer, the correct answer is supported by the text but not copied word-for-word as the reason.
3 / 5
Passage: Database Sharding — Engineering Blog Post
Title: How We Cut Our p99 Latency by 60% — Lessons From Sharding Our Database
For three years, our single Postgres primary handled everything. It worked beautifully —
until it didn't. By Q2 our write throughput had tripled, and the p99 latency on our
checkout endpoint crept from 120 ms to over 800 ms during peak hours. Vertical scaling
bought us time, but we were running on the largest instance our provider offered. We had
hit a ceiling.
We considered three options:
1. Read replicas. These help read-heavy workloads, but our bottleneck was writes.
Replicas would not have moved the needle.
2. Caching everything. We already cached aggressively; the remaining traffic was
genuinely dynamic.
3. Sharding. Splitting the data across multiple primaries by customer ID.
We chose sharding, fully aware it is the option of last resort. It introduces real
complexity: cross-shard queries become expensive, transactions can no longer span the
whole dataset, and re-sharding later is painful.
The migration took four months. The hardest part was not the code — it was the
backfill. We had to copy two years of historical data into the new shards without
downtime, using a dual-write phase where every write went to both the old and new
systems while we verified consistency.
The result: p99 dropped to 320 ms and write throughput is now effectively unbounded.
Would we do it again? Yes — but only because we genuinely had no alternative. If you
can solve your problem with a read replica or a cache, do that first. Sharding is
powerful, but the operational cost is permanent.
In context, what does the author mean by "Vertical scaling bought us time"?
"Bought us time" = a temporary reprieve, not a fix.
Vertical scaling means making a single machine bigger (more CPU/RAM), as opposed to horizontal scaling (adding more machines). The idiom "bought us time" means it postponed the problem. The very next sentence confirms this: "but we were running on the largest instance our provider offered. We had hit a ceiling."
Vocabulary in context:
vertical scaling / scaling up — bigger single machine.
horizontal scaling / scaling out — more machines (this is option C, a different concept).
bought us time — bought breathing room; a delay, not a cure.
Option B reads "bought time" literally (purchasing) — a classic trap when an idiom is taken at face value. Always check whether a phrase is idiomatic by testing it against the surrounding sentences.
4 / 5
Passage: Monolith to Microservices — Engineering Blog Post
Title: We Migrated From a Monolith to Microservices. Here's What Nobody Told Us.
The promise of microservices is seductive: independent deployments, team autonomy,
and the ability to scale services individually. After eighteen months on the other
side, we can confirm those benefits are real. But they came at a cost we underestimated.
What we expected and got:
- Teams ship independently. No more coordinated "release trains."
- We scale the payment service separately from the search service.
What surprised us:
- Debugging got dramatically harder. A single user request now touches seven
services. When something breaks, you cannot just read one stack trace — you need
distributed tracing, and you need it from day one. We added it on day ninety,
and those first three months were genuinely painful.
- "Simple" changes now span multiple repositories and require coordinated deploys
anyway — exactly the coupling we were trying to escape.
- Local development became a nightmare. Running the full system on a laptop is no
longer practical.
If we started over, we would not begin with microservices. We would build a
well-structured monolith first, with clear internal module boundaries, and extract
services only when a specific scaling or team-ownership pain forced our hand. The
boundaries you draw on a whiteboard before you understand the domain are almost
always wrong.
What is the author's overall stance on starting a new project with microservices?
Start with a structured monolith; extract services later.
The conclusion is explicit: "If we started over, we would not begin with microservices. We would build a well-structured monolith first... and extract services only when a specific scaling or team-ownership pain forced our hand."
Reading the author's stance — watch for nuance:
This is not "microservices are bad" (option D). The author confirms the benefits are "real".
It is not indifference (option C). They have a clear preferred order.
It is a conditional recommendation: monolith first, then split when forced.
Signal phrase:"If we started over..." introduces the author's considered recommendation with the benefit of hindsight — a reliable place to find the thesis in a retrospective post.
5 / 5
Passage: Monolith to Microservices — Engineering Blog Post
Title: We Migrated From a Monolith to Microservices. Here's What Nobody Told Us.
The promise of microservices is seductive: independent deployments, team autonomy,
and the ability to scale services individually. After eighteen months on the other
side, we can confirm those benefits are real. But they came at a cost we underestimated.
What we expected and got:
- Teams ship independently. No more coordinated "release trains."
- We scale the payment service separately from the search service.
What surprised us:
- Debugging got dramatically harder. A single user request now touches seven
services. When something breaks, you cannot just read one stack trace — you need
distributed tracing, and you need it from day one. We added it on day ninety,
and those first three months were genuinely painful.
- "Simple" changes now span multiple repositories and require coordinated deploys
anyway — exactly the coupling we were trying to escape.
- Local development became a nightmare. Running the full system on a laptop is no
longer practical.
If we started over, we would not begin with microservices. We would build a
well-structured monolith first, with clear internal module boundaries, and extract
services only when a specific scaling or team-ownership pain forced our hand. The
boundaries you draw on a whiteboard before you understand the domain are almost
always wrong.
The author writes that they added distributed tracing "on day ninety." What does the article imply you should do differently?
Adopt distributed tracing from day one.
The text says debugging got harder because a request "touches seven services", and: "you need distributed tracing, and you need it from day one. We added it on day ninety, and those first three months were genuinely painful." The lesson is implied through their regret: starting at day ninety was a mistake; day one is the recommendation.
Inference from a cautionary anecdote: When an author describes a painful delay ("those first three months were genuinely painful"), the implied advice is to avoid that delay. The article does not say "do it on day one" as a command — it shows the cost of not doing so.
Why the distractors fail:
Option A inverts the message — they argue tracing is essential.
Option C restates the exact thing the author says fails in microservices ("you cannot just read one stack trace").