Scalability Risk Assessment
5 exercises. Master scalability risk vocabulary for technical due diligence (TDD): horizontal vs vertical scaling, stateful vs stateless services, SPOF identification and communication, bottleneck analysis (N+1 query, connection pool exhaustion, synchronous pipeline), load testing vocabulary (RPS, P99 latency, error rate, soak test), and scalability architecture language (cache hit rate, CDN offload, read replica).
- Scalability ceiling: the load at which architecture fails without redesign. Key probe: "What breaks first at 10x load?"
- Horizontal scaling: add more instances (requires stateless services). Vertical scaling: add more resources (CPU/RAM) to existing instances (hard hardware ceiling, not elastic).
- SPOF (single point of failure): any component without redundancy whose failure causes a total outage. Trace every critical path and ask "if this fails, what happens?"
- Bottleneck types: N+1 query (loop with DB call inside), connection pool exhaustion, synchronous pipeline, missing index, I/O-bound vs CPU-bound.
- Load test vocabulary: RPS (load unit), P99 latency (tail latency), error rate (% 5xx under load). No load tests = unquantified ceiling = risk finding.
- Architecture quality signals: stateless services ✓, high cache hit rate ✓, CDN offload ✓, read replicas ✓, async queue pattern ✓.
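The N+1 query pattern named in the bottleneck list can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module and a hypothetical `users`/`orders` schema; `CountingDB`, `make_conn`, `totals_n_plus_one` and `totals_single_query` are illustrative names, not taken from any assessed codebase.

```python
import sqlite3


def make_conn() -> sqlite3.Connection:
    """Build a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
        INSERT INTO users VALUES (1, 'ana'), (2, 'ben'), (3, 'cy');
        INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
    """)
    return conn


class CountingDB:
    """Thin wrapper that counts how many queries are sent to the database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.count = 0

    def query(self, sql: str, params: tuple = ()) -> list:
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def totals_n_plus_one(db: CountingDB) -> dict:
    """N+1 pattern: one query for the users, then one more query per user."""
    result = {}
    for user_id, name in db.query("SELECT id, name FROM users"):
        rows = db.query(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE user_id = ?",
            (user_id,),
        )
        result[name] = rows[0][0]
    return result


def totals_single_query(db: CountingDB) -> dict:
    """Same result from a single JOIN with GROUP BY."""
    rows = db.query("""
        SELECT u.name, COALESCE(SUM(o.total), 0)
        FROM users u LEFT JOIN orders o ON o.user_id = u.id
        GROUP BY u.id
    """)
    return dict(rows)


if __name__ == "__main__":
    db = CountingDB(make_conn())
    totals_n_plus_one(db)
    print("N+1 version issued", db.count, "queries")
    db.count = 0
    totals_single_query(db)
    print("JOIN version issued", db.count, "queries")
```

The query count of the first version grows with the number of users, while the second stays constant. That difference is why the N+1 pattern is flagged as a bottleneck that worsens under growth.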
A VC partner asks the TDD team: "The company currently has 50,000 active users. Post-acquisition, our growth model projects 500,000 within 18 months. Before we sign, I need to know: can this platform handle that?"
The TDD lead replies: "That's exactly the right question. Let me walk you through how we frame scalability risk, and what the architecture tells us about their ceiling."
How do you define and frame scalability risk in a technical due diligence context?
Scalability dimension assessment:
| Dimension | Healthy indicator | Risk indicator |
|---|---|---|
| Application tier | Stateless; horizontally auto-scalable (Kubernetes/ECS) | Stateful; sticky sessions; manual scaling only |
| Database tier | Read replicas and connection pooling, or a managed cloud DB | Single primary DB, no replicas, high connection usage |
| Caching | Redis/Memcached layer; high cache hit rate (>80%) | No caching layer; all reads hit primary DB directly |
| CDN / static | Static assets served via CDN; origin shielded | All static assets served from application servers |
| Load test evidence | Documented recent load tests at 2–5x current peak | No load tests exist; ceiling unknown |
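
When load test evidence does exist, it is usually reported in the three terms from the vocabulary summary: RPS, P99 latency, and error rate. The sketch below shows one plausible way to derive them from raw request samples. The `Sample` type and `summarize` function are hypothetical names, and the nearest-rank percentile method is an assumption; real load testing tools may compute percentiles differently.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    latency_ms: float  # time taken to serve one request
    status: int        # HTTP status code returned


def summarize(samples: list, duration_s: float) -> dict:
    """Summarize a load test run as RPS, P99 latency, and 5xx error rate."""
    if not samples or duration_s <= 0:
        raise ValueError("need at least one sample and a positive duration")

    latencies = sorted(s.latency_ms for s in samples)
    # Nearest-rank P99: the smallest latency that at least 99% of requests
    # did not exceed. Integer arithmetic computes ceil(0.99 * n) without
    # floating-point rounding.
    rank = -(-99 * len(latencies) // 100)
    p99 = latencies[rank - 1]

    errors = sum(1 for s in samples if 500 <= s.status <= 599)
    return {
        "rps": len(samples) / duration_s,
        "p99_ms": p99,
        "error_rate_pct": 100 * errors / len(samples),
    }


if __name__ == "__main__":
    # Hypothetical run: 200 requests over 10 seconds, 4 of them server errors.
    run = [Sample(latency_ms=i, status=503 if i > 196 else 200)
           for i in range(1, 201)]
    print(summarize(run, duration_s=10))
```

P99 is reported instead of the average because tail latency is what the slowest 1% of users experience, and it typically degrades well before the average does.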
Scalability risk framing phrases:
• "The application tier is stateless and auto-scales via Kubernetes. The scalability risk is concentrated entirely in the database layer — a single PostgreSQL primary with no read replicas and a connection pool of 150."
• "The current architecture has a proven ceiling of approximately 10,000 concurrent users. Beyond that point, the primary database connection pool is exhausted and the application returns errors. This is addressable at the infrastructure level, but requires 6–8 weeks of work."
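
A connection pool ceiling like the one in these phrases can be roughly estimated with Little's Law: average connections in use equals request arrival rate multiplied by the time each request holds a connection. The sketch below is a back-of-envelope aid, not a substitute for a load test. Only the pool size of 150 comes from the example phrase; the 50 ms hold time, the function names, and the assumption that each request holds exactly one connection are hypothetical.

```python
def connections_in_use(requests_per_s: float, hold_time_ms: float) -> float:
    """Little's Law: average busy connections = arrival rate x hold time."""
    return requests_per_s * hold_time_ms / 1000


def max_rps_for_pool(pool_size: int, hold_time_ms: float) -> float:
    """Request rate at which the pool is fully occupied on average."""
    return pool_size * 1000 / hold_time_ms


POOL_SIZE = 150    # from the example framing phrase
HOLD_TIME_MS = 50  # hypothetical: how long one request holds a connection

if __name__ == "__main__":
    for rps in (1000, 2000, 4000):
        busy = connections_in_use(rps, HOLD_TIME_MS)
        state = "exhausted" if busy > POOL_SIZE else "ok"
        print(f"{rps} RPS -> {busy:.0f} connections busy ({state})")
    print("Estimated ceiling:", max_rps_for_pool(POOL_SIZE, HOLD_TIME_MS), "RPS")
```

Under these assumed numbers the pool saturates at about 3,000 requests per second. Slower queries lower that ceiling proportionally, which is why query latency and pool size need to be read together.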
Key vocabulary:
• Horizontal scaling (scale out) — adding more instances of a service to distribute load; requires stateless architecture; elastic, cost-proportional
• Vertical scaling (scale up) — adding more compute resources (CPU/RAM) to existing instances; limited by hardware ceiling; non-elastic
• Scalability ceiling — the load threshold at which the current architecture fails or requires architectural redesign (not just more hardware); the key TDD metric for investment risk
• Shared state bottleneck — a resource (database, cache, queue) that all application instances must access; because it cannot be easily replicated, it constrains horizontal scaling
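
The interaction between horizontal scaling and a shared state bottleneck can be made concrete with simple arithmetic. In the sketch below, adding application instances lowers the load per instance, but the read load reaching the shared primary database depends only on total traffic and the cache hit rate. All function names and traffic figures are hypothetical, and the model assumes every cache miss results in exactly one database read.

```python
def per_instance_rps(total_rps: float, instances: int) -> float:
    """Load each stateless instance carries when traffic is spread evenly."""
    return total_rps / instances


def primary_db_reads_per_s(total_read_rps: float, cache_hit_rate: float) -> float:
    """Reads that miss the cache and reach the shared primary database."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return total_read_rps * (1 - cache_hit_rate)


if __name__ == "__main__":
    total_rps = 10_000  # hypothetical read traffic across the whole platform
    for instances in (4, 8, 16):
        print(
            f"{instances} instances: "
            f"{per_instance_rps(total_rps, instances):.0f} RPS each, "
            f"DB still sees {primary_db_reads_per_s(total_rps, 0.80):.0f} reads/s"
        )
    # Raising the hit rate is what relieves the shared database.
    for hit_rate in (0.0, 0.80, 0.95):
        print(f"hit rate {hit_rate:.0%}: "
              f"{primary_db_reads_per_s(total_rps, hit_rate):.0f} DB reads/s")
```

Scaling out the application tier leaves the database figure unchanged, whereas a higher cache hit rate (or offloading reads to replicas) is what reduces pressure on the shared resource.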