System Design Interview Vocabulary: Scalability, Availability, Consistency, and Partitioning
Master the English vocabulary for system design interviews — scalability patterns, availability trade-offs, consistency models, database partitioning, and distributed systems terminology.
System design interviews are as much a test of communication as they are of technical knowledge. Interviewers evaluate whether you can articulate trade-offs, reason through requirements, and use the correct vocabulary to describe distributed systems. Getting the words wrong — saying “scaling” when you mean “sharding,” or “availability” when you mean “durability” — signals a gap in your understanding. This guide covers the precise vocabulary you need to communicate clearly in system design discussions.
Scalability Vocabulary
Scalability — the ability of a system to handle increased load by adding resources. The fundamental distinction is between horizontal and vertical scaling.
Vertical scaling (scale up) — increasing the capacity of a single machine: more CPU, more RAM, faster storage. “We scaled up the database server from 16 GB to 64 GB of RAM to handle the increased query load.” Vertical scaling has a ceiling — eventually you cannot add more to a single machine.
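Horizontal scaling (scale out) — adding more machines to distribute the load. “We scaled out the API tier from 3 to 12 instances behind the load balancer.” Horizontal scaling is far simpler when services are stateless, because any instance can then serve any request. Stateful components such as databases scale out through replication and sharding instead.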
Stateless — a service that does not store client session information between requests. Any instance can handle any request. “The API must be stateless for horizontal scaling to work — session data goes in Redis, not in memory.”
Load balancer — a component that distributes incoming requests across multiple server instances. “Requests are distributed across the API instances by a Layer 7 load balancer using round-robin.”
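The following minimal Python sketch shows a stateless service behind a round-robin load balancer. It assumes a plain in-process dictionary as a stand-in for a shared store like Redis. `SESSION_STORE`, `handle_request` and `RoundRobinBalancer` are hypothetical names and are not part of any real load balancer's API.

```python
from itertools import cycle

# Hypothetical shared session store standing in for Redis; every instance reads and writes it.
SESSION_STORE: dict[str, dict[str, int]] = {}


def handle_request(instance_name: str, session_id: str) -> str:
    """Stateless handler: per-user state lives in SESSION_STORE, never in the instance."""
    session = SESSION_STORE.setdefault(session_id, {"views": 0})
    session["views"] += 1
    return f"{instance_name} served {session_id} (views={session['views']})"


class RoundRobinBalancer:
    """Hands each incoming request to the next instance in a fixed rotation."""

    def __init__(self, instances: list[str]) -> None:
        if not instances:
            raise ValueError("need at least one instance")
        self._rotation = cycle(instances)

    def route(self, session_id: str) -> str:
        return handle_request(next(self._rotation), session_id)


lb = RoundRobinBalancer(["api-1", "api-2", "api-3"])
for _ in range(4):
    print(lb.route("user-42"))
```

The fourth request wraps back to `api-1` and still sees `views=4`. This works because the count lives in the shared store, not in any instance's memory.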
Throughput — the number of requests a system can handle per unit of time, often measured in requests per second (RPS) or transactions per second (TPS). “Our target is 10,000 RPS at peak throughput.”
Latency — the time from a request being sent to a response being received. Often expressed as percentiles such as p50 (the median), p95, or p99, where pN is the latency that N% of requests complete within. “We need p99 latency under 200 ms — the p99 marks the boundary of the slowest 1% of requests, which is where user experience degrades most.”
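Percentiles are straightforward to compute from raw samples. This sketch uses the nearest-rank method, which is only one of several percentile definitions; monitoring tools may interpolate instead. The 100 latency samples are hypothetical, and two of them are slow.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [40.0] * 98 + [450.0, 900.0]
for p in (50, 95, 99):
    print(f"p{p} = {percentile(latencies_ms, p)} ms")
```

Here p50 and p95 both report 40 ms, but p99 surfaces the 450 ms tail that an average would blur.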
Bottleneck — the component in a system that limits overall performance. “The bottleneck is the database — the API tier has spare capacity but queries are slow.”
Availability and Reliability Vocabulary
Availability — the proportion of time a system is operational and accessible. Usually expressed as a percentage: 99.9% (“three nines”), 99.99% (“four nines”), 99.999% (“five nines”). “We need five nines availability for the payment service — that’s less than 6 minutes of downtime per year.”
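Each availability target implies a yearly downtime budget. You get it by multiplying the allowed unavailability by the number of minutes in a year. This sketch assumes an average 365.25-day year.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumes an average year including leap days


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Maximum yearly downtime permitted by an availability target such as 99.9."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


for label, target in (("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)):
    print(f"{label} ({target}%): {downtime_minutes_per_year(target):.1f} minutes/year")
```

The three targets work out to roughly 526, 53 and 5.3 minutes per year. That is where the “less than 6 minutes” figure for five nines comes from.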
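Reliability — the probability that a system performs its intended function without failure over a given period, often tracked as mean time between failures (MTBF). Availability and reliability are related but distinct. For example, a service that crashes every hour but restarts within a second has high availability but poor reliability.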
Fault tolerance — the ability of a system to continue operating correctly when one or more components fail. “The system is fault-tolerant: if a replica fails, requests are automatically routed to healthy replicas.”
Redundancy — having extra components that can take over if the primary fails. “We have redundancy in the database layer — a primary and two read replicas, with automatic failover.”
Failover — the automatic or manual switching to a backup system when the primary fails. “Failover to the secondary region completed in 90 seconds during the incident.”
Single Point of Failure (SPOF) — a component whose failure would bring down the entire system. “The current architecture has a SPOF — the message queue. If it goes down, the entire order pipeline stops.”
Graceful degradation — the ability of a system to continue providing partial functionality when components fail, rather than failing completely. “If the recommendation service is down, the homepage still loads — it just shows popular items instead of personalised recommendations. That’s graceful degradation.”
Circuit breaker — a pattern that stops a system from repeatedly trying to call a failing service, allowing it time to recover. “We added a circuit breaker to the payments integration — after 5 consecutive failures, the circuit opens and we return a friendly error instead of hammering the payment provider.”
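A circuit breaker is a small state machine with three states:

- Closed: calls pass through.
- Open: calls fail immediately.
- Half-open: after a cooldown, a trial call tests whether the dependency has recovered.

The sketch below is one minimal, single-threaded way to write this in Python. The class names, the 30-second `reset_timeout_s` and the injectable `clock` are illustrative assumptions, not a specific library's API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    """Closed -> open after N consecutive failures; half-open once the timeout elapses."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._consecutive_failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            return "half-open"  # timeout elapsed: let a trial call through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("dependency unavailable, failing fast")
        try:
            result = fn()
        except Exception:
            self._consecutive_failures += 1
            if (self.state == "half-open"
                    or self._consecutive_failures >= self.failure_threshold):
                self._opened_at = self._clock()  # trip (or re-trip) the breaker
            raise
        self._consecutive_failures = 0  # any success closes the circuit
        self._opened_at = None
        return result


def call_payment_provider() -> str:
    # Hypothetical stand-in for a failing downstream call.
    raise ConnectionError("payment provider timed out")


breaker = CircuitBreaker(failure_threshold=5, reset_timeout_s=30.0)
for _ in range(5):
    try:
        breaker.call(call_payment_provider)
    except ConnectionError:
        pass  # in a real handler: return the friendly error here
print(breaker.state)  # open: further calls raise CircuitOpenError without touching the provider
```

Production implementations typically also handle a few things this sketch omits:

- thread safety
- a limit on how many trial calls pass in the half-open state
- counting only certain error types as failures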
Consistency and the CAP Theorem
Consistency — in distributed systems, every read receives the most recent write or an error. (Note: this is different from the C in ACID, which means that transactions preserve the database’s integrity rules.) “Strong consistency ensures every node returns the same data, but it comes at a latency cost.”
The CAP theorem — a distributed data store cannot guarantee all three of Consistency (C), Availability (A), and Partition tolerance (P) at the same time. Network partitions cannot be ruled out in practice, so partition tolerance is not optional. The real trade-off is whether the system gives up consistency or availability while a partition is in progress.
Eventual consistency — a weaker consistency model where, given no new updates, all nodes will eventually return the same data. “DNS is a classic example of eventual consistency — a record change reaches resolvers as their cached copies expire, which can take minutes or hours depending on the TTL, not milliseconds.”
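Eventual consistency needs a rule for reconciling replicas that have seen different writes. The sketch below uses last-write-wins (LWW) with timestamps plus a pull-based sync step. LWW is one common reconciliation strategy, chosen here for illustration; it is not a description of how DNS works. `Replica` and `sync_from` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Replica:
    """One node storing key -> (timestamp, value), reconciled by last-write-wins."""

    name: str
    data: dict[str, tuple[int, str]] = field(default_factory=dict)

    def write(self, key: str, value: str, ts: int) -> None:
        self._apply(key, ts, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[1] if entry else None

    def _apply(self, key: str, ts: int, value: str) -> None:
        current = self.data.get(key)
        # Newer timestamp wins; equal timestamps are broken by value so all replicas agree.
        if current is None or (ts, value) > current:
            self.data[key] = (ts, value)

    def sync_from(self, other: "Replica") -> None:
        """Anti-entropy step: pull every entry from a peer and keep whichever is newer."""
        for key, (ts, value) in other.data.items():
            self._apply(key, ts, value)


a, b = Replica("a"), Replica("b")
a.write("config:theme", "light", ts=1)
b.write("config:theme", "dark", ts=2)  # a newer write lands on a different node
print(a.read("config:theme"), b.read("config:theme"))  # light dark: replicas disagree
a.sync_from(b)
b.sync_from(a)
print(a.read("config:theme"), b.read("config:theme"))  # dark dark: converged
```

Before the sync, the answer depends on which replica you ask. After one round of sync in each direction, both replicas return the newer value. LWW is simple but silently discards the losing write, which is why some systems use version vectors or CRDTs instead.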
Strong consistency — every read reflects the most recent write across all nodes. Typically required where acting on stale data is unacceptable, such as financial transactions. “We require strong consistency for the account balance service — users cannot see stale balances.”
ACID — the properties of reliable database transactions: Atomicity (all or nothing), Consistency (each transaction leaves the data in a valid state that satisfies its constraints), Isolation (transactions do not interfere), Durability (committed data is saved permanently). “We use PostgreSQL because we need ACID guarantees for financial operations.”
BASE — an alternative to ACID for distributed systems: Basically Available, Soft state, Eventually consistent. Often used to describe NoSQL databases. “The analytics pipeline uses a BASE model — we can tolerate slightly stale data in exchange for high write throughput.”
Partitioning and Sharding Vocabulary
Partitioning — dividing data across multiple storage units to improve performance and manageability. Two main types: horizontal partitioning (sharding) and vertical partitioning.
Sharding (horizontal partitioning) — splitting rows of a table across multiple database instances based on a shard key. “We shard user data by user ID — users 1–1,000,000 are on shard A, 1,000,001–2,000,000 on shard B.”
Shard key — the field used to determine which shard a record belongs to. Choosing a bad shard key leads to hotspots. “We chose user ID as the shard key because queries are almost always scoped to a single user.”
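The range-based scheme in the sharding example maps to a sorted list of upper bounds plus a binary search. This is a minimal sketch: `SHARD_UPPER_BOUNDS`, `SHARD_NAMES` and `shard_for_user` are hypothetical. A real deployment would typically keep this mapping in a config or metadata service.

```python
import bisect

# Hypothetical range map mirroring the example: shard A owns IDs up to 1,000,000, shard B up to 2,000,000.
SHARD_UPPER_BOUNDS = [1_000_000, 2_000_000]  # inclusive upper bound per shard, sorted
SHARD_NAMES = ["shard-a", "shard-b"]


def shard_for_user(user_id: int) -> str:
    """Range-based routing: the first shard whose upper bound is >= user_id owns it."""
    if user_id < 1:
        raise ValueError("user IDs start at 1")
    idx = bisect.bisect_left(SHARD_UPPER_BOUNDS, user_id)
    if idx == len(SHARD_UPPER_BOUNDS):
        raise KeyError(f"no shard configured for user {user_id}")
    return SHARD_NAMES[idx]


print(shard_for_user(42), shard_for_user(1_000_001))  # shard-a shard-b
```

Every query for a user carries the user ID, so the router can send it to exactly one shard. That is what makes user ID a good shard key here.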
Hotspot — a shard that receives disproportionately more traffic than others, limiting scalability. “Using a timestamp as the shard key created a hotspot on the latest shard — all writes were going to a single node.”
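The timestamp hotspot can be reproduced in a few lines. This sketch compares two schemes:

- A hypothetical range scheme, where each of four shards owns a 30-day window.
- Hashing an event ID with SHA-256, modulo the shard count.

The window size, shard count and key format are all illustrative assumptions.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4
SECONDS_PER_DAY = 86_400


def range_shard_by_time(ts_seconds: int, start_ts: int, days_per_shard: int = 30) -> int:
    """Range sharding on a timestamp: each shard owns one contiguous time window."""
    window = (ts_seconds - start_ts) // (days_per_shard * SECONDS_PER_DAY)
    return min(window, NUM_SHARDS - 1)  # writes beyond the last window also pile onto the final shard


def hash_shard(key: str) -> int:
    """Hash sharding: a stable hash of the key, modulo the shard count."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


start_ts = 0
now = 100 * SECONDS_PER_DAY  # hypothetical "today": all current writes share one time window
writes = [(now + i, f"event-{i}") for i in range(1_000)]

by_range = Counter(range_shard_by_time(ts, start_ts) for ts, _ in writes)
by_hash = Counter(hash_shard(event_id) for _, event_id in writes)
print("range on timestamp:", dict(by_range))
print("hash on event ID:  ", dict(by_hash))
```

Under the range scheme, every write lands on the last shard. The hash scheme spreads them roughly evenly. The trade-off is that hash routing makes time-range queries touch every shard, and a plain modulo reassigns most keys when the shard count changes.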
Replication — copying data from one database node to others for redundancy and read scaling. “We have one primary and three read replicas — all writes go to the primary, reads are distributed.”
Replication lag — the delay between a write on the primary and its appearance on a replica. “During peak load, replication lag can reach 2–3 seconds — users reading immediately after writing may see stale data.”
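Replication lag follows from asynchronous replication: the primary acknowledges a write before the replicas have applied it. The toy model below assumes a fixed lag and simulated time (`now` is passed in explicitly) so the stale read is deterministic. `PrimaryWithReplica` is a hypothetical class.

```python
from collections import deque
from typing import Optional


class PrimaryWithReplica:
    """Toy asynchronous replication with a fixed lag, driven by simulated time."""

    def __init__(self, lag_s: float) -> None:
        self.lag_s = lag_s
        self.primary: dict[str, str] = {}
        self.replica: dict[str, str] = {}
        self._log: deque = deque()  # (commit_time, key, value) in commit order

    def write(self, now: float, key: str, value: str) -> None:
        self.primary[key] = value  # acknowledged as soon as the primary has it
        self._log.append((now, key, value))

    def read_replica(self, now: float, key: str) -> Optional[str]:
        # Apply every logged write whose lag has elapsed, oldest first.
        while self._log and self._log[0][0] + self.lag_s <= now:
            _, k, v = self._log.popleft()
            self.replica[k] = v
        return self.replica.get(key)


db = PrimaryWithReplica(lag_s=2.5)
db.write(now=0.0, key="profile:42:name", value="Ada")
print(db.read_replica(now=0.5, key="profile:42:name"))  # None: not replicated yet
print(db.read_replica(now=3.0, key="profile:42:name"))  # Ada: lag has elapsed
```

A common mitigation is read-your-writes routing: send a user's reads to the primary for a short window after they write.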
Caching Vocabulary
Cache — a fast, temporary storage layer that holds frequently accessed data to reduce load on slower backends. “We added a Redis cache in front of the database — cache hit rate is 85%, which reduced database load significantly.”
Cache hit / cache miss — a cache hit occurs when requested data is found in the cache; a miss means the cache does not have it and the request goes to the origin. “Our cache miss rate is too high — we need to increase the TTL or cache more aggressively.”
TTL (Time to Live) — the duration a cached item is considered valid before it must be refreshed. “The user profile cache has a TTL of 5 minutes.”
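Caches, hits and misses, and TTLs fit together in the cache-aside pattern. The application checks the cache first. On a miss, it loads from the origin and stores the result with an expiry.

This sketch assumes a single-process dictionary cache and an injectable clock so the TTL can be tested. `TTLCache` and `load_profile` are hypothetical names, not Redis's API.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Cache-aside helper with a fixed per-entry TTL and hit/miss counters."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, load: Callable[[str], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1  # fresh entry: served from the cache
            return entry[1]
        self.misses += 1  # absent or expired: fall through to the origin
        value = load(key)
        self._entries[key] = (now + self.ttl_s, value)
        return value

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


fake_now = [0.0]
db_calls: list[str] = []


def load_profile(user_id: str) -> dict[str, str]:
    db_calls.append(user_id)  # stands in for a slow database query
    return {"id": user_id}


cache = TTLCache(ttl_s=300, clock=lambda: fake_now[0])  # 5-minute TTL, as in the example
for t in (0, 60, 120, 299, 301):
    fake_now[0] = t
    cache.get_or_load("user-42", load_profile)
print(cache.hits, cache.misses, len(db_calls))  # 3 2 2
```

The first read is a miss and the next three are hits. The read at 301 seconds misses again because the entry has expired, which gives a 60% hit rate.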
Eviction policy — the strategy for removing items from a cache when it is full. Common policies: LRU (Least Recently Used), LFU (Least Frequently Used), FIFO (First In First Out). “We use LRU eviction — the least recently accessed items are evicted first when the cache is full.”
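An LRU cache is commonly built from a hash map plus a recency ordering. In Python, `collections.OrderedDict` provides both. Its `move_to_end` and `popitem(last=False)` methods handle the bookkeeping. This is a minimal sketch and is not thread-safe.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key when full."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()  # oldest first, newest last

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")     # "a" becomes the most recently used key
lru.put("c", 3)  # over capacity, so "b" (least recently used) is evicted
print(lru.get("b"), lru.get("a"), lru.get("c"))  # None 1 3
```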
Write-through cache — a cache that writes to both the cache and persistent storage synchronously on every write, acknowledging the write only after both are updated. “A write-through cache keeps the cache consistent with the database but adds write latency.”
Write-back (write-behind) cache — a cache that acknowledges writes immediately and flushes to persistent storage asynchronously. “Write-back improves write throughput but risks data loss if the cache fails before flushing.”
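The two write policies differ in when the persistent store gets written. In this sketch, `Store` is a hypothetical in-memory stand-in for the database. `flush` is called by hand; a real write-back cache would typically flush on a timer or on eviction.

```python
class Store:
    """Hypothetical stand-in for the slow persistent database."""

    def __init__(self) -> None:
        self.rows: dict[str, str] = {}
        self.writes = 0

    def save(self, key: str, value: str) -> None:
        self.writes += 1
        self.rows[key] = value


class WriteThroughCache:
    def __init__(self, store: Store) -> None:
        self.store = store
        self.cache: dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self.store.save(key, value)  # synchronous store write: the extra write latency
        self.cache[key] = value      # cache and store agree before the write is acknowledged


class WriteBackCache:
    def __init__(self, store: Store) -> None:
        self.store = store
        self.cache: dict[str, str] = {}
        self.dirty: set[str] = set()  # keys whose latest value is not yet in the store

    def write(self, key: str, value: str) -> None:
        self.cache[key] = value  # acknowledged after a memory write only
        self.dirty.add(key)

    def flush(self) -> None:
        # A real write-back cache runs this asynchronously; dirty keys are lost if it crashes first.
        for key in sorted(self.dirty):
            self.store.save(key, self.cache[key])
        self.dirty.clear()


wt_store, wb_store = Store(), Store()
wt, wb = WriteThroughCache(wt_store), WriteBackCache(wb_store)
for i in range(3):
    wt.write("counter", str(i))
    wb.write("counter", str(i))
print(wt_store.writes, wb_store.writes)  # 3 0
wb.flush()
print(wb_store.writes, wb_store.rows["counter"])  # 1 2
```

Write-back coalesced three writes into a single store write, which is where its throughput gain comes from. Had the process died before `flush`, all three writes would have been lost.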
Practical Exercises
- Mock interview warm-up: Have a colleague ask “Design a URL shortener” and record yourself for 10 minutes. Review the recording and identify every time you used vocabulary from this article correctly — and any time you should have used a precise term but used a vague description instead.
- Trade-off articulation: For each pair below, write two sentences explaining the trade-off: (a) consistency vs. availability, (b) horizontal vs. vertical scaling, (c) write-through vs. write-back cache.
- Architecture vocabulary mapping: Find a public architecture blog post (AWS, Stripe, Netflix tech blogs are good sources). Identify 10 vocabulary terms from this article in context.
- CAP theorem positioning: For each of these systems, state whether they prioritise CP or AP, and why: (a) a bank’s account balance service, (b) a social media feed, (c) a distributed configuration store.
Practise interview vocabulary and system design discussion with the interview exercises on Coders Lingo.