Reading Scalability Descriptions
5 exercises on reading scalability architecture: horizontal vs vertical scaling, stateless design, database sharding, caching TTL trade-offs, load balancer placement, and auto-scaling policies.
Scalability vocabulary quick reference
- Vertical scaling (scale up) — bigger machine; simple but has a ceiling
- Horizontal scaling (scale out) — more machines; requires stateless design
- Sharding — split data across multiple DB instances; improves writes, complicates cross-shard queries
- Cache hit/miss — hit: fast (Redis); miss: slow (DB fetch + cache store)
- TTL — balance: long TTL = high hit rate + stale risk; short TTL = fresh data + more misses
- Thrashing — rapidly scaling in and out; prevented by asymmetric evaluation windows
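The last entry, asymmetric evaluation windows, can be illustrated with a small sketch. The class name, thresholds, and window lengths below are hypothetical values chosen for illustration, not taken from any particular cloud provider. The idea is that scaling out needs only a short run of high readings, while scaling in needs a much longer run of low readings, so a brief dip in load does not immediately remove capacity.

```python
from collections import deque


class AutoScaler:
    """Toy scaling policy: scale out quickly, scale in slowly."""

    def __init__(self, out_threshold=70.0, in_threshold=30.0,
                 out_window=2, in_window=10):
        self.out_threshold = out_threshold
        self.in_threshold = in_threshold
        self.out_window = out_window  # short window: react fast to load
        self.in_window = in_window    # long window: shrink only after sustained calm
        self.samples = deque(maxlen=max(out_window, in_window))

    def decide(self, cpu_percent):
        """Record one CPU sample and return 'scale_out', 'scale_in' or 'hold'."""
        self.samples.append(cpu_percent)
        recent = list(self.samples)
        if len(recent) >= self.out_window and all(
                s > self.out_threshold for s in recent[-self.out_window:]):
            return "scale_out"
        if len(recent) >= self.in_window and all(
                s < self.in_threshold for s in recent[-self.in_window:]):
            return "scale_in"
        return "hold"


# Two hot samples trigger a scale-out; ten calm samples are needed to scale in.
scaler = AutoScaler()
decisions = [scaler.decide(cpu) for cpu in [80, 85] + [20] * 10]
```

With equal windows on both sides, a load that oscillates around the thresholds could add and remove servers on every swing; the longer scale-in window is what damps that.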
1 / 5
Read this scalability description and answer the question:
Scaling the API Tier
The API tier currently runs on a single server with 16 CPU cores and 64 GB of RAM. During peak traffic, the server reaches 90% CPU utilisation. The team is evaluating two options:
Option A — Vertical scaling: Upgrade the current server to 64 CPU cores and 256 GB of RAM. This requires a maintenance window and a full restart. There is a hard upper limit to how powerful a single machine can be.
Option B — Horizontal scaling: Add more servers of the same size (16 cores, 64 GB each) behind a load balancer. No downtime required — new servers are added to the load balancer pool while the existing server continues handling traffic. The application must be stateless for this to work.
Why must the application be stateless for horizontal scaling to work correctly?
Stateless means each request carries all the information needed to process it; the server handling it relies on no local state that another server would lack.
The problem with stateful horizontal scaling:
Imagine a user logs in and the login session is stored in the memory of Server 1. When they make the next request, the load balancer routes it to Server 2. Server 2 has no knowledge of the session — the user is now logged out, or gets an error.
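The scenario above can be reproduced in a few lines. This is a minimal simulation, not a real web server: the Server class, the user name, and the round-robin policy are assumptions made for illustration.

```python
import itertools


class Server:
    def __init__(self, name):
        self.name = name
        self.sessions = {}  # server-local state: the source of the problem

    def login(self, user):
        self.sessions[user] = {"logged_in": True}
        return f"{self.name}: {user} logged in"

    def profile(self, user):
        if user not in self.sessions:
            return f"{self.name}: 401 no session for {user}"
        return f"{self.name}: profile for {user}"


servers = [Server("server-1"), Server("server-2")]
round_robin = itertools.cycle(servers)  # simplest load-balancing policy

first = next(round_robin).login("alice")     # handled by server-1
second = next(round_robin).profile("alice")  # handled by server-2, which has no session
```

Here the second request fails only because of where it was routed, which is exactly the behaviour a stateless design removes.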
How to make an application stateless:
- Store session data in a shared external store (Redis, database) — any server can look up the session
- Use stateless authentication: a signed JWT carries the user's identity and claims in the token itself, so no server-side session is needed
- Store uploaded files in object storage (S3, GCS), not on the local server filesystem
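The first two approaches in the list above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: a plain dict stands in for Redis or a database, and the signed token is a JWT-like stand-in built with HMAC, not the real JWT format. The SECRET value and all function and class names are hypothetical.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret-shared-by-all-servers"  # hypothetical; load from config in practice


def issue_token(claims):
    """Encode claims and append an HMAC-SHA256 signature (JWT-like, not real JWT)."""
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def verify_token(token):
    """Return the claims if the signature is valid, otherwise None."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


class Server:
    def __init__(self, name, session_store):
        self.name = name
        self.session_store = session_store  # shared, not server-local

    def login(self, user):
        self.session_store[user] = {"logged_in": True}
        return issue_token({"sub": user})

    def has_session(self, user):
        return user in self.session_store

    def user_from_token(self, token):
        claims = verify_token(token)
        return claims["sub"] if claims else None


shared_sessions = {}  # stands in for Redis or a database every server can reach
server_1 = Server("server-1", shared_sessions)
server_2 = Server("server-2", shared_sessions)

token = server_1.login("alice")                  # login handled by server-1
store_ok = server_2.has_session("alice")         # server-2 finds the shared session
token_user = server_2.user_from_token(token)     # server-2 verifies the token locally
tampered = server_2.user_from_token(token + "0") # altered signature is rejected
```

Either approach lets any server in the pool handle any request: the shared store moves the state out of the servers, while the signed token moves it into the request.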
- Vertical (scale up) → bigger machine: more CPU, RAM. Simple, but has a physical ceiling and requires downtime; the one machine is also a single point of failure.
- Horizontal (scale out) → more machines: add instances behind a load balancer. No ceiling in theory, no downtime, but requires stateless design.
- stateless → each request is self-contained; the server holds no client-specific memory between requests
- stateful → the server retains state between requests (e.g. in-memory session)
- load balancer → distributes incoming requests across a pool of servers
- maintenance window → a scheduled period of planned downtime for infrastructure changes