5 exercises — practice "millions of users", "billions of records", "at scale", and converting abstract large numbers into actionable metrics for architecture decisions.
Key scaling vocabulary and conversions
"100B calls/day ÷ 86,400s = ~1.16M RPS average; plan for 2-5x peak = ~2.3-5.8M RPS."
"MAU → DAU → peak concurrent: rough rule: DAU = 10-30% of MAU; peak concurrent = 5-10% of DAU."
"5 billion records — how big per record? 100 bytes = 500GB; 10KB = 50TB. Very different architectures."
"'At scale' is only meaningful with a specific dimension and threshold: 'at 10M rows' or 'at 100K RPS'."
"Capacity plan for peak, not average — peak typically runs 2-5x the daily average RPS."
1 / 5
An architect says "this design works at our current scale but won't work at 10x." What does "10x scale" typically mean in this context?
Option B gives the complete definition of "10x scale." The critical nuances: (1) "scale" refers to load dimensions (users, RPS, data volume, storage), and which dimension matters depends on the bottleneck; (2) failure modes are non-linear: many problems stay invisible at low load, and lock contention, N+1 queries, fan-out amplification, cache stampedes, and distributed coordination cost all worsen faster than linearly as load grows; (3) the word "catastrophically" is important, because scaling failures are often sudden, not gradual. Option A only mentions data size. Option C mentions server power, but horizontal scaling (more servers) is usually preferred over vertical scaling (bigger servers). Option D is the worst-case outcome but not a definition of "10x scale."
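To make "worsens faster than linearly" concrete, here is a minimal sketch of one toy model of coordination cost. It assumes a full mesh in which every node must talk to every other node; the function name `full_mesh_links` and the node counts are illustrative assumptions, not taken from any particular system.

```python
def full_mesh_links(nodes: int) -> int:
    """Count distinct node pairs when every node talks to every other node."""
    return nodes * (nodes - 1) // 2


# Hypothetical cluster sizes: the current cluster and a 10x larger one.
current_links = full_mesh_links(10)
scaled_links = full_mesh_links(100)

# 10x the nodes produces far more than 10x the pairwise links.
growth = scaled_links / current_links
print(f"10 nodes: {current_links} links, 100 nodes: {scaled_links} links")
print(f"growth factor: {growth:.0f}x")
```

Under this assumption, a 10x increase in nodes yields roughly a 100x increase in pairwise links, which is the kind of gap that lets a design work today and fail abruptly later.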
2 / 5
A product announcement says "our platform serves millions of users." Which follow-up question extracts the most useful technical information?
Option B asks the technically meaningful follow-up. The "millions of users" claim is one of the most context-free numbers in tech marketing. The key disambiguation: (1) Monthly Active Users (MAU): users who logged in at least once in 30 days (the lowest bar); (2) Daily Active Users (DAU): users who used the service today; (3) Peak Concurrent Users: the highest number of simultaneously active sessions, which is the actual engineering challenge. The real infrastructure numbers follow from peak concurrency, not total user count. Rule of thumb: DAU is typically 10-30% of MAU, and peak concurrent is roughly 5-10% of DAU. So 1M MAU implies 100,000-300,000 DAU and therefore 5,000-30,000 peak concurrent users, a 6x spread in infrastructure requirements. Option A only asks for a more precise version of the same vague number.
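The rule of thumb can be sketched as a small calculation. This is only an illustration of the ratios quoted above (DAU at 10-30% of MAU, peak concurrent at 5-10% of DAU); the function name and parameter names are hypothetical, and real ratios vary by product.

```python
def peak_concurrent_range(mau, dau_pct=(10, 30), concurrent_pct=(5, 10)):
    """Return (low, high) peak concurrent users for a given MAU.

    Percentages are whole numbers so the arithmetic stays in integers.
    The defaults are the rough rule of thumb, not measured values.
    """
    low = mau * dau_pct[0] * concurrent_pct[0] // 10_000
    high = mau * dau_pct[1] * concurrent_pct[1] // 10_000
    return low, high


low, high = peak_concurrent_range(1_000_000)
print(f"1M MAU -> {low:,} to {high:,} peak concurrent users")
```

The width of the resulting range is the point: the MAU figure alone does not pin down the concurrency the system has to support.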
3 / 5
A database stores "5 billion records." Which phrase correctly contextualises this number for a technical discussion?
Option B provides the technically complete analysis. Five billion records is ambiguous without record size: (1) 100-byte records: 5 billion × 100 bytes = 500GB, which fits on a single high-end server (e.g., 8TB NVMe SSD) and, with appropriate indexes, can be served by a single PostgreSQL instance; (2) 10KB records: 5 billion × 10KB = 50TB, which absolutely requires distributed storage. The 100x difference in storage footprint from a seemingly minor difference in record size is the core lesson. Additionally, query patterns matter enormously: 5B records with narrow, indexed queries are much easier than 5B records requiring full-table scans. Option A makes a universal claim that's wrong for small records. Option C assumes distributed architecture is always necessary for large counts. Option D recommends Spark/Hadoop based on record count alone, but Spark is for processing, not necessarily storage.
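The record-size arithmetic can be checked with a short sketch. It assumes decimal units (1KB = 1,000 bytes), which is what the 500GB and 50TB figures imply, and it counts raw payload only: indexes, replication, and storage-engine overhead are deliberately left out. The helper names are hypothetical.

```python
def raw_storage_bytes(record_count, bytes_per_record):
    """Raw payload size only; ignores indexes, replicas, and overhead."""
    return record_count * bytes_per_record


def format_decimal(n_bytes):
    """Format a byte count using decimal (power-of-1000) units."""
    value = float(n_bytes)
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    for unit in units[:-1]:
        if value < 1000:
            return f"{value:g} {unit}"
        value /= 1000
    return f"{value:g} {units[-1]}"


RECORDS = 5_000_000_000
small = format_decimal(raw_storage_bytes(RECORDS, 100))     # 100-byte records
large = format_decimal(raw_storage_bytes(RECORDS, 10_000))  # 10KB records
print(small, "vs", large)
```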
4 / 5
A team says "we handle 100 billion API calls per day." How do you convert this to a more actionable metric for capacity planning?
Option B is the complete capacity planning conversion. The math: 100,000,000,000 ÷ 86,400 (seconds in a day) = 1,157,407 ≈ 1.16M RPS average. But the key insight Option B adds is that average RPS is not the right metric for capacity planning. Systems must be designed for peak load, which for most consumer services runs 2-5x the daily average (morning commute, lunch break, and post-work hours are common peak periods). So the planning target should be 2.3-5.8M RPS. Option A defers the calculation entirely. Option C makes an arithmetic error: 100B ÷ 86,400 ≈ 1.16M, not 100,000. Option D converts to per-hour (correct) but doesn't go further to RPS, which is what infrastructure capacity requires. Critical principle: capacity plan for peak, not average.
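The conversion above can be written as a short sketch. The 2x and 5x multipliers are the rule-of-thumb peak factors from the explanation, not measured values, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def average_rps(calls_per_day):
    """Average requests per second over a full day."""
    return calls_per_day / SECONDS_PER_DAY


def peak_rps_range(calls_per_day, peak_factors=(2, 5)):
    """Planning range: average RPS scaled by assumed peak multipliers."""
    avg = average_rps(calls_per_day)
    return avg * peak_factors[0], avg * peak_factors[1]


calls = 100_000_000_000  # 100 billion calls per day
avg = average_rps(calls)
peak_low, peak_high = peak_rps_range(calls)
print(f"average: {avg / 1e6:.2f}M RPS")
print(f"plan for: {peak_low / 1e6:.1f}M to {peak_high / 1e6:.1f}M RPS")
```

In practice the multiplier would come from the service's own traffic curve; the fixed range here only stands in for that measurement.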
5 / 5
Which phrase most accurately describes "at scale" in a technical architecture context?
Option B gives the intellectually precise definition. "At scale" is one of the most overused and under-defined phrases in software engineering. The correct view: (1) scale is multi-dimensional — users, data size, RPS, geographic distribution, team size — all are different dimensions of scale, (2) problems are scale-relative — N+1 queries are fine at 1,000 records, slow at 100,000, catastrophic at 100M, (3) the phrase requires a number and a dimension to be useful — "at scale" without "at what scale?" is not actionable. Option A defines scale as millions of users — too narrow (a 10GB dataset might be "at scale" for an in-process cache even if there's only 1 user). Option C makes a universal distributed systems claim — not always true. Option D defines scale by SLA violation — this is a consequence of scale, not the definition.
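The N+1 example can be made concrete with a rough sketch. It assumes one query for the parent rows plus one query per row, and a hypothetical 0.5 ms round trip per query; both the latency constant and the function names are assumptions chosen only for illustration.

```python
ASSUMED_ROUND_TRIP_SECONDS = 0.0005  # hypothetical 0.5 ms per query


def n_plus_one_queries(parent_rows):
    """One query to list the parents, then one query per parent row."""
    return 1 + parent_rows


def estimated_seconds(parent_rows, round_trip=ASSUMED_ROUND_TRIP_SECONDS):
    """Total time if the queries run one after another."""
    return n_plus_one_queries(parent_rows) * round_trip


for rows in (1_000, 100_000, 100_000_000):
    print(f"{rows:>11,} rows -> {estimated_seconds(rows):,.1f} s")
```

Under these assumptions the same access pattern takes about half a second at 1,000 rows, close to a minute at 100,000, and many hours at 100M, which is why "at scale" needs a number attached before it says anything.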