Advanced Reading #scalability #sharding #caching #system-design

Reading Scalability Descriptions

Five exercises on reading scalability architecture descriptions: horizontal vs vertical scaling, stateless design, database sharding, caching TTL trade-offs, load balancer placement, and auto-scaling policies.

Scalability vocabulary quick reference
  • Vertical scaling (scale up) — bigger machine; simple but has a ceiling
  • Horizontal scaling (scale out) — more machines; requires stateless design
  • Sharding — split data across multiple DB instances; improves write throughput, complicates cross-shard queries
  • Cache hit/miss — hit: fast, served from the cache (e.g. Redis); miss: slow (DB fetch + cache store)
  • TTL — balance: long TTL = high hit rate + stale risk; short TTL = fresh data + more misses
  • Thrashing — rapidly alternating between scaling out and scaling in; mitigated by asymmetric evaluation windows (a short window to scale out, a longer one to scale in)
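
The TTL trade-off in the list above can be made concrete with a small simulation. This is a minimal sketch, not a production cache: `TTLCache`, `simulate`, the key `user:1`, and the one-read-per-second workload are all hypothetical names and assumptions chosen for illustration, and a plain dictionary stands in for both Redis and the database.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache-aside helper: an entry expires ttl_seconds after it is stored."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the demo needs no real sleeping
        self.store: Dict[str, Tuple[str, float]] = {}  # key -> (value, expiry)
        self.hits = 0
        self.misses = 0

    def get(self, key: str, load_from_db: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1    # hit: fast path, but the value may be stale
            return entry[0]
        self.misses += 1      # miss: slow path, fetch from DB then store
        value = load_from_db(key)
        self.store[key] = (value, now + self.ttl_seconds)
        return value


def simulate(ttl_seconds: float, reads: int = 10, update_at: int = 3):
    """One read per simulated second; the DB value changes at t=update_at.

    Returns (hits, misses, stale_reads).
    """
    now = [0.0]                      # fake clock, advanced by the loop
    db = {"user:1": "v1"}            # stand-in for the database
    cache = TTLCache(ttl_seconds, clock=lambda: now[0])
    stale = 0
    for t in range(reads):
        now[0] = float(t)
        if t == update_at:
            db["user:1"] = "v2"      # the source of truth changes
        value = cache.get("user:1", lambda k: db[k])
        if value != db["user:1"]:
            stale += 1               # cache served an outdated value
    return cache.hits, cache.misses, stale


print(simulate(ttl_seconds=5))  # (8, 2, 2): many hits, two stale reads
print(simulate(ttl_seconds=1))  # (0, 10, 0): always fresh, every read misses
```

Under these assumptions the longer TTL serves most reads from the cache but returns the old value twice after the update, while the shorter TTL never returns stale data and never gets a hit.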
Exercise 1 / 5

Read this scalability description and answer the question:

Scaling the API Tier

The API tier currently runs on a single server with 16 CPU cores and 64 GB of RAM. During peak traffic, the server reaches 90% CPU utilisation. The team is evaluating two options:

Option A — Vertical scaling: Upgrade the current server to 64 CPU cores and 256 GB of RAM. This requires a maintenance window and a full restart. There is a hard upper limit to how powerful a single machine can be.

Option B — Horizontal scaling: Add more servers of the same size (16 cores, 64 GB each) behind a load balancer. No downtime required — new servers are added to the load balancer pool while the existing server continues handling traffic. The application must be stateless for this to work.
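
The two options can be compared with some rough arithmetic. The sketch below rests on assumptions that are not in the description: load is treated as scaling linearly with core count, CPU is the only bottleneck, the load balancer spreads traffic evenly, and the 60% target utilisation is an arbitrary illustrative figure. The function names are hypothetical.

```python
import math


def utilisation_after_upgrade(current_cores: int, current_util: float,
                              new_cores: int) -> float:
    """Option A: the same peak load spread over a bigger machine."""
    busy_cores = current_cores * current_util  # cores' worth of work at peak
    return busy_cores / new_cores


def servers_needed(current_cores: int, current_util: float,
                   cores_per_server: int, target_util: float) -> int:
    """Option B: identical servers needed to stay at or below target_util."""
    busy_cores = current_cores * current_util
    return math.ceil(busy_cores / (cores_per_server * target_util))


# Figures from the description: 16 cores at 90% CPU during peak traffic.
peak_busy = 16 * 0.90                                # 14.4 cores of work

# Option A: upgrade to 64 cores.
option_a = utilisation_after_upgrade(16, 0.90, 64)   # 14.4 / 64 = 0.225

# Option B: 16-core servers, assumed 60% target utilisation.
option_b = servers_needed(16, 0.90, 16, 0.60)        # ceil(14.4 / 9.6) = 2

print(f"Option A peak utilisation: {option_a:.1%}")
print(f"Option B servers needed:   {option_b}")
```

With these numbers, Option A would bring peak CPU to roughly 22.5%, while Option B would need two servers in total (one added) to stay under the assumed 60% target, leaving each at about 45%. Real systems rarely scale this cleanly, so treat the result as a first estimate only.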

Question: Why must the application be stateless for horizontal scaling to work correctly?