System Design Vocabulary: 40 Terms Every Senior Engineer Must Know
CAP theorem, CQRS, event sourcing, circuit breaker, saga — the 40 system design terms you must know to discuss distributed systems confidently in interviews and architecture reviews.
System design interviews and architecture discussions share a common language. Knowing the terms precisely — not just vaguely — is what separates junior answers from senior answers. These 40 terms cover the vocabulary you need to discuss distributed systems, databases, reliability, and deployment with confidence.
Distributed System Fundamentals
CAP theorem — During a network partition, a distributed system can preserve either Consistency (every read sees the latest write) or Availability (every request to a non-failed node gets a non-error response), but not both. Partition tolerance (the system keeps operating despite lost or delayed messages between nodes) is not optional in practice because partitions happen, so the real choice is between C and A.
Consistency models — A spectrum from strong consistency (reads always see the latest write) to eventual consistency (reads will eventually see the latest write). Intermediate models include read-your-writes, monotonic reads, and causal consistency.
Eventual consistency — A guarantee that, given no new updates, all replicas will converge to the same value. Used in many large-scale distributed systems where strict consistency is too costly.
Linearizability — The strongest single-object consistency model: each operation appears to take effect atomically at some instant between its invocation and its response, so operations that do not overlap are seen in real-time order. Phrase: “A linearizable system behaves as if there’s a single copy of the data.”
Sharding — Partitioning data across multiple nodes so that each node holds only a subset. Reduces load per node. Phrase: “We shard by user ID — each shard handles one-eighth of the user base.”
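As a rough illustration of the phrase above, the following sketch maps a user ID to one of eight shards with a stable hash. The names `shard_for` and `NUM_SHARDS` are made up for this example, and plain hash-modulo placement is only one of several possible schemes.

```python
import hashlib

NUM_SHARDS = 8  # assumed value, chosen to match the "one-eighth" phrase


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index using a deterministic hash."""
    # Python's built-in hash() is salted per process for strings, so a
    # cryptographic digest is used to get the same answer on every node.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

With modulo placement, changing the shard count remaps most keys, which is one reason consistent hashing is often preferred when shards are added or removed.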
Replication — Copying data to multiple nodes for fault tolerance and read scalability. Types: leader-follower (primary-replica), multi-leader (multi-primary), and leaderless (clients write to and read from several replicas, relying on quorums).
Read replica — A copy of a database that accepts reads only, offloading read traffic from the primary. Phrase: “Analytics queries go to the read replica so they don’t impact write latency on primary.”
Write-ahead log (WAL) — A durability mechanism: before applying a change to its data files, the database appends the change to a log and flushes it to durable storage. After a crash, replaying the log restores every change that was acknowledged.
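A toy sketch of the idea, assuming a JSON-lines file as the log and an in-memory dict as the "database". The class `TinyKV` and the helper `demo_wal_recovery` are hypothetical; real WAL implementations add checksums, checkpoints, and log truncation, all omitted here.

```python
import json
import os
import tempfile


class TinyKV:
    """Key-value store that logs every write before applying it."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.data = {}
        self._recover()
        self._log = open(log_path, "a", encoding="utf-8")

    def _recover(self) -> None:
        # Rebuild in-memory state by replaying the log from the start.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key: str, value: str) -> None:
        # 1. Append to the log and force it to disk.
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        # 2. Only then apply the change to the in-memory state.
        self.data[key] = value

    def close(self) -> None:
        self._log.close()


def demo_wal_recovery() -> dict:
    """Write, drop the in-memory state, then recover it from the log."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        store = TinyKV(path)
        store.put("a", "1")
        store.put("a", "2")
        store.put("b", "3")
        store.close()  # stands in for a crash: the dict is discarded

        recovered = TinyKV(path)
        state = dict(recovered.data)
        recovered.close()
        return state
```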
Coordination and Consensus
Consensus — Agreement among distributed nodes on a single value (e.g. who is the current leader). Algorithms: Raft, Paxos. Phrase: “Raft is used for consensus — a majority of nodes must agree before a value is committed.”
Leader election — The process by which distributed nodes agree on which node is currently the leader, responsible for coordinating writes. Common in databases (primary selection), queues, and coordination services.
Distributed lock — A lock held across multiple nodes to prevent concurrent modifications to a shared resource. More complex than a local mutex because the holder can crash or be cut off by a network failure while still holding the lock, so distributed locks usually carry an expiry (a lease).
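Two-phase commit (2PC) — A protocol for distributed transactions: a coordinator asks all participants to prepare, then tells them to commit only if every participant voted yes, and to abort otherwise. Ensures atomicity but is blocking: if the coordinator crashes after participants have prepared, they must wait, holding locks, until it recovers.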
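The two phases can be sketched in a single process as follows. `Participant` and `two_phase_commit` are illustrative names, and the sketch deliberately leaves out timeouts, durable logging, and coordinator failure, which are the hard parts of a real implementation.

```python
class Participant:
    """A resource manager that votes in phase 1 and obeys in phase 2."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Vote yes only if the local work can definitely be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]

    # Phase 2 (decide): commit only on a unanimous yes, otherwise abort all.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```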
Idempotency — An operation is idempotent if applying it multiple times produces the same result as applying it once. Critical for safe retries. Phrase: “Use an idempotency key in the payment API so retries don’t double-charge.”
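A minimal sketch of the idempotency-key idea from the phrase above. `PaymentService` and its fields are hypothetical, and the dict stands in for a durable store; a production system would need an atomic check-and-insert (for example a unique constraint) rather than this in-memory lookup.

```python
class PaymentService:
    """Charges at most once per idempotency key."""

    def __init__(self) -> None:
        self._results = {}     # idempotency key -> stored response
        self.charges_made = 0  # counts real side effects, for illustration

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A retry with a known key returns the original response
        # instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]

        self.charges_made += 1
        result = {
            "charge_id": f"ch_{self.charges_made}",
            "amount_cents": amount_cents,
        }
        self._results[idempotency_key] = result
        return result
```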
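Exactly-once delivery — A guarantee that each message is processed exactly once, as opposed to at-most-once (a message may be lost) or at-least-once (a message may be delivered again). Because a sender cannot tell a lost message from a lost acknowledgement, systems usually achieve the effect by combining at-least-once delivery with idempotent or deduplicating consumers.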
Patterns and Architecture
CQRS (Command Query Responsibility Segregation) — Separating the write model (commands) from the read model (queries). Write and read sides can use different databases optimised for their respective workloads.
Event sourcing — Storing state as a sequence of immutable events rather than the current value. Current state is derived by replaying events. Phrase: “With event sourcing, you can replay history to see what the account balance was at any point in time.”
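The account-balance phrase above can be sketched like this. The `Event` shape and the function `balance_at` are assumptions made for the example, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int     # position in the event log
    kind: str    # "deposited" or "withdrawn"
    amount: int  # amount in minor units, e.g. cents


def balance_at(events, up_to_seq: int) -> int:
    """Derive the balance by replaying events up to and including up_to_seq."""
    balance = 0
    for event in events:  # events are assumed to be ordered by seq
        if event.seq > up_to_seq:
            break
        if event.kind == "deposited":
            balance += event.amount
        elif event.kind == "withdrawn":
            balance -= event.amount
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return balance


history = [
    Event(1, "deposited", 100),
    Event(2, "withdrawn", 30),
    Event(3, "deposited", 50),
]
```

Replaying only the first two events gives the balance as it stood before the last deposit; replaying all three gives the current balance.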
Saga — A pattern for managing distributed transactions as a sequence of local transactions, each publishing an event or message to trigger the next step. Failures trigger compensating transactions.
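A simplified, in-process sketch of the compensation logic. Real sagas run across services and communicate through events or messages; here plain function calls stand in for those steps, and `run_saga` and the step names are hypothetical.

```python
from typing import Callable, List, Tuple

# (step name, action, compensating action)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, undo completed steps in reverse."""
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Compensate only the steps that actually completed.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return True, log


def _fail() -> None:
    raise RuntimeError("shipping service unavailable")


ok, log = run_saga([
    ("reserve_stock", lambda: None, lambda: None),
    ("charge_card", lambda: None, lambda: None),
    ("ship_order", _fail, lambda: None),
])
```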
Message queue — A durable buffer between producers and consumers, decoupling them in time and throughput. Examples: Kafka, RabbitMQ, SQS. Phrase: “The order service publishes to the queue; the fulfilment service consumes at its own pace.”
Backpressure — A signal from a slow consumer to a fast producer to slow down. Prevents overwhelming the consumer.
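One simple way to produce that signal is a bounded buffer that refuses new items when full. The sketch below uses the standard library's `queue.Queue`; the helper `try_publish` is a made-up name, and rejecting is only one possible response (blocking the producer or shedding load are others).

```python
import queue


def try_publish(buffer: queue.Queue, item) -> bool:
    """Return False instead of growing without bound when the consumer lags."""
    try:
        buffer.put_nowait(item)
        return True
    except queue.Full:
        # The backpressure signal: the producer should slow down or retry.
        return False


buffer = queue.Queue(maxsize=2)
accepted = [try_publish(buffer, i) for i in range(3)]
```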
Bulkhead — Isolating components so a failure in one does not cascade to others, similar to watertight compartments in a ship. Phrase: “We applied the bulkhead pattern: calls to the payment service run in their own thread pool, so a slow payment provider can’t exhaust the threads checkout needs.”
Reliability and Traffic Management
Circuit breaker — A pattern that stops calling a failing service after a threshold of errors, allowing it time to recover. States: closed (normal), open (blocking calls), half-open (testing recovery). Phrase: “The circuit breaker tripped — we’re returning cached data until the upstream recovers.”
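The three states can be sketched as a small state machine. This is a minimal, single-threaded sketch under assumed defaults (a consecutive-failure threshold and a fixed reset timeout); the class and parameter names are illustrative, and the `clock` argument exists only so the behaviour can be tested without sleeping.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None
        self.state = "closed"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; call skipped")
            # Timeout elapsed: let one trial call through.
            self.state = "half-open"
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self.state = "closed"
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # A failed trial call, or too many consecutive failures, opens it.
        if self.state == "half-open" or self._failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()
```

A caller would typically catch `CircuitOpenError` and fall back to something cheap, such as the cached data mentioned in the phrase above.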
Rate limiting — Restricting the number of requests a client can make in a time window. Algorithms: token bucket, leaky bucket, sliding window. Phrase: “The API is rate-limited to 1,000 requests per minute per API key.”
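Of the algorithms listed, the token bucket is probably the easiest to sketch. The class below is a single-threaded illustration with made-up names; the limit in the phrase above would correspond roughly to `TokenBucket(capacity=1000, refill_per_second=1000 / 60)`, kept per API key.

```python
import time


class TokenBucket:
    def __init__(self, capacity: float, refill_per_second: float,
                 clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock  # injectable so tests do not need to sleep
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the request fits, else False."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

The capacity sets how large a burst is tolerated, while the refill rate sets the sustained request rate.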
Load balancing — Distributing incoming traffic across multiple servers. Algorithms: round-robin, least connections, consistent hashing.
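Consistent hashing is the least obvious of the three, so here is a small sketch of a hash ring with virtual nodes. The class `HashRing`, the `vnodes` default, and the node names are assumptions for illustration; the ring is assumed to contain at least one node.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Deterministic 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 100) -> None:
        # Each node is placed at many points to even out the distribution.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point after the key, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring3 = HashRing(["a", "b", "c"])
ring4 = HashRing(["a", "b", "c", "d"])
keys = [f"key-{i}" for i in range(1000)]
moved = [k for k in keys if ring3.node_for(k) != ring4.node_for(k)]
```

Adding node `d` moves only the keys that now belong to `d`; keys on the other nodes stay where they were, which is the property that makes the scheme attractive for caches and sharded stores.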
CDN (Content Delivery Network) — A geographically distributed network of servers that caches static content close to users, reducing latency and origin server load.
Caching layers — Multiple levels of caching: browser cache, CDN edge cache, application-level cache (Redis, Memcached), database query cache.
API gateway — A single entry point for external clients, handling routing, authentication, rate limiting, and protocol translation.
Service mesh — Infrastructure layer for service-to-service communication: mTLS, retries, circuit breaking, observability. Examples: Istio, Linkerd.
Service discovery — The mechanism by which services find each other’s network locations. Examples: Consul, Kubernetes DNS, AWS Cloud Map.
Health check — An endpoint or probe that indicates whether a service instance is healthy and ready to serve traffic.
Deployment and Database
Blue-green deployment — Two environments; switch traffic between them for zero-downtime releases.
Canary deployment — Gradually shift traffic to a new version; roll back if metrics degrade.
Feature flag — A runtime switch to enable or disable features without deploying code.
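Database partitioning — Dividing a table into smaller pieces (partitions) stored separately, typically by range, list, or hash of a partition key. Improves query performance on large tables when queries filter on that key, because the database can skip partitions that cannot match.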
Connection pooling — Reusing database connections rather than opening a new one per request. Reduces connection overhead.
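A minimal sketch of a fixed-size pool built on the standard library's thread-safe `queue.Queue`. `ConnectionPool` and the `factory` callable are hypothetical; real pools also validate, time out, and replace broken connections.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    def __init__(self, factory, size: int) -> None:
        # Open all connections up front and keep them in an idle queue.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while every connection is in use; raises queue.Empty
        # if a timeout is given and expires.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller raised.
            self._idle.put(conn)
```

Usage is `with pool.connection() as conn: ...`, so the connection is returned to the pool when the block exits.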
Index — A data structure that speeds up read queries at the cost of write overhead and storage.
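Query planner — The database component that chooses an execution plan for a query by estimating the cost of alternatives (index scan versus full scan, join order, join algorithm) from table statistics. The result is the cheapest plan it can estimate, not a guaranteed optimum.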
Optimistic locking — Assume no conflict; check at commit time. Efficient for low-contention scenarios.
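A common way to implement the commit-time check is a version number per row. The sketch below uses an in-memory dict in place of a table; `VersionedStore` and `VersionConflict` are illustrative names. In SQL the same idea is often written as an `UPDATE ... WHERE version = :expected` followed by a check of the affected row count.

```python
class VersionConflict(Exception):
    """Raised when the row changed since the caller read it."""


class VersionedStore:
    def __init__(self) -> None:
        self._rows = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); version 0 means the key does not exist."""
        return self._rows.get(key, (0, None))

    def write(self, key, value, expected_version: int) -> int:
        current_version, _ = self._rows.get(key, (0, None))
        # The check at commit time: reject if someone else wrote first.
        if current_version != expected_version:
            raise VersionConflict(
                f"expected v{expected_version}, found v{current_version}"
            )
        new_version = current_version + 1
        self._rows[key] = (new_version, value)
        return new_version
```

On a conflict the caller typically re-reads the row and retries, which is cheap as long as conflicts are rare.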
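Pessimistic locking — Lock the resource when reading, preventing concurrent modification until the lock is released. Safer under high contention, at the cost of reduced concurrency and a risk of deadlock.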
Race condition — A bug where the outcome depends on the timing of concurrent operations.
Deadlock — Two or more transactions each waiting for the other’s lock, resulting in a standstill.
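Databases commonly detect this by looking for a cycle in a "waits-for" graph. The sketch below makes the simplifying assumption that each transaction waits for at most one other; the function name and the transaction labels are made up for the example.

```python
from typing import Dict


def has_deadlock(waits_for: Dict[str, str]) -> bool:
    """Return True if following waits-for edges ever loops back on itself."""
    for start in waits_for:
        seen = set()
        node = start
        # Follow the chain: start waits for X, X waits for Y, and so on.
        while node in waits_for:
            if node in seen:
                return True  # revisited a transaction: a cycle exists
            seen.add(node)
            node = waits_for[node]
    return False
```

Once a cycle is found, a database typically aborts one of the transactions in it so the others can proceed. A common way to avoid the cycle in the first place is to acquire locks in a consistent order.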
Practice: In your next system design discussion or interview, consciously use at least ten of these terms in their correct context. Recording yourself answering a mock system design question and replaying it is one of the most effective ways to spot gaps in your technical vocabulary.