5 exercises on the language of capacity: CPU and memory utilisation, saturating resources, throttling and rate-limiting, contention, capacity planning, and right-sizing.
Key patterns
CPU / memory utilisation (in use) vs. allocation (reserved)
A dashboard review notes: "Average CPU ___ is sitting at 85% across the fleet — we're close to the limit." Which noun is the standard term for how much of a resource is in use?
Utilisation — the fraction of a resource in use (US: utilization):
Utilisation is the proportion of a resource's capacity currently in use. It is usually expressed as a percentage and averaged over a sampling interval.
"CPU utilisation is at 85%", "memory utilisation peaked at 92%", "disk/network utilisation"
Utilisation vs. the distractors — a key distinction:
Allocation — what you reserve/assign (e.g. a pod requests 2 CPUs); you can be allocated a lot but utilise little, which is waste
Saturation — the resource is fully used and work is queueing (utilisation near/at 100% with backlog)
Throughput — work completed per second; latency — time per request
In the USE method (Utilisation, Saturation, Errors), utilisation answers "how busy?" while saturation answers "is there a queue?". The two are related but distinct: a resource can be highly utilised without being saturated, as long as no work is waiting.
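To make the utilisation vs. allocation distinction concrete, here is a minimal Python sketch. The pod, its 2-CPU request, and the usage samples are illustrative assumptions rather than measurements, and `utilisation` and `idle_allocation` are hypothetical helper names.

```python
from statistics import mean


def utilisation(used: float, capacity: float) -> float:
    """Fraction of a resource's capacity that is in use (0.0 to 1.0)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return used / capacity


def idle_allocation(allocated: float, used_samples: list[float]) -> float:
    """Reserved-but-unused amount: what was allocated minus average usage."""
    return max(allocated - mean(used_samples), 0.0)


# Illustrative pod: requests (is allocated) 2 CPUs but uses well under 1 core.
requested_cpus = 2.0
usage_samples = [0.3, 0.5, 0.4, 0.4]  # cores in use at each sample

avg_used = mean(usage_samples)
print(f"utilisation of the allocation: {utilisation(avg_used, requested_cpus):.0%}")
print(f"allocated but idle: {idle_allocation(requested_cpus, usage_samples):.1f} CPUs")
```

With these samples it reports 20% utilisation and 1.6 CPUs allocated but idle, which is the waste the allocation bullet describes.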
A load test summary says: "At 30k RPS we ___ the database connection pool — every connection was busy and requests started queuing." Which verb means "drove a resource to its full capacity"?
Saturate a resource — drive it to full capacity:
To saturate a resource is to use 100% of it so that additional work must wait — the queue builds and latency climbs.
"At peak we saturated the NIC", "the disk is saturated", "we hit saturation on the connection pool"
Noun: saturation — one of the USE method's three signals (Utilisation, Saturation, Errors)
"Beyond saturation, every extra request just increases the queue depth and tail latency."
Utilisation vs. saturation: 100% utilisation alone is not always bad; saturation specifically means demand exceeds capacity, so work backs up. It shows up in metrics such as run-queue length or connection-pool wait time.
Why not the distractors? Provision = set up/allocate resources; deprecate = mark for removal; serialize = convert data to a storable/transmittable form. Only saturate describes pushing a resource to its limit until work queues.
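The gap between 100% utilisation and saturation shows up even in a toy, deterministic queue model. This is a sketch under simplifying assumptions (a fixed number of arrivals and a fixed service capacity per tick), and `simulate` is a hypothetical helper: the backlog only starts growing once arrivals exceed capacity.

```python
def simulate(arrivals_per_tick: int, capacity_per_tick: int, ticks: int) -> list[int]:
    """Return the queue depth after each tick of a simple discrete-time queue.

    Each tick, new work arrives and the server completes up to
    capacity_per_tick items (deterministic model, no randomness).
    """
    queue = 0
    depths = []
    for _ in range(ticks):
        queue += arrivals_per_tick
        queue -= min(queue, capacity_per_tick)  # work completed this tick
        depths.append(queue)
    return depths


# Capacity of 100 items per tick (illustrative numbers).
print(simulate(80, 100, 5))   # under capacity:            [0, 0, 0, 0, 0]
print(simulate(100, 100, 5))  # 100% utilised, no backlog: [0, 0, 0, 0, 0]
print(simulate(120, 100, 5))  # saturated, queue grows:    [20, 40, 60, 80, 100]
```

Real traffic is bursty, so in practice queues and tail latency start building before average utilisation reaches 100%; this model deliberately ignores that.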
An API gateway is configured so that "each client is allowed 100 requests per minute; beyond that we ___ them and return HTTP 429." Which verb describes capping a client's request rate?
Rate-limit — cap how many requests a client may make:
To rate-limit is to restrict the number of requests a client can make in a time window; excess requests are rejected, classically with HTTP 429 Too Many Requests.
"We rate-limit the public API to 100 req/min per key."
Closely related: throttle — to deliberately slow down or restrict throughput ("we throttled background jobs to protect the DB")
Rate-limit vs. throttle: both restrict, but rate-limit usually means a hard request cap (reject over the limit), while throttle often means slowing things down or reducing allotted resources. They overlap and are sometimes used interchangeably.
Why not the distractors? Cache = store responses to avoid recomputation; load-balance = distribute traffic across instances; replicate = copy data/services for redundancy. None of them caps a client's request rate; that is rate-limit.
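A per-key limit like the gateway's 100 requests per minute can be sketched as a fixed-window counter. This is one possible implementation, not how any particular gateway works; the class name, the injectable `clock`, and the key `"api-key-1"` are assumptions for illustration.

```python
import time
from collections.abc import Callable


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per key in each `window_s`-second window.

    Excess requests are rejected with HTTP 429 rather than delayed.
    """

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the example can use a fake clock
        # key -> (window index, requests already counted in that window)
        self._counts: dict[str, tuple[int, int]] = {}

    def check(self, key: str) -> int:
        """Return 200 if the request is allowed, 429 if the key is over its limit."""
        window = int(self.clock() // self.window_s)
        stored_window, count = self._counts.get(key, (window, 0))
        if stored_window != window:  # a new window has started: reset the count
            count = 0
        if count >= self.limit:
            return 429  # Too Many Requests
        self._counts[key] = (window, count + 1)
        return 200


# Usage with a fake clock: 100 requests per minute for one hypothetical API key.
now = [0.0]
limiter = FixedWindowRateLimiter(limit=100, window_s=60, clock=lambda: now[0])
statuses = [limiter.check("api-key-1") for _ in range(101)]
print(statuses.count(200), statuses[-1])  # 100 429
now[0] = 60.0  # the next minute begins
print(limiter.check("api-key-1"))  # 200
```

A fixed window is the simplest scheme, but it lets a client burst up to twice the limit across a window boundary; token-bucket and sliding-window limiters are common alternatives. Delaying excess requests instead of returning 429 would turn this into throttling.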
A post-mortem explains a slowdown: "Two services fought over the same lock — that ___ added 200ms to every write." Which term names competition for a shared resource?
Resource contention — competition for a shared resource:
Contention is when multiple threads, processes, or services compete for the same resource — a lock, a connection-pool slot, CPU, or disk — and must wait their turn, adding latency.
"Lock contention serialised the writes", "contention on the connection pool", "the mutex is hot / highly contended"
Collocations: reduce / avoid / eliminate contention; "shard the data to cut contention"
Common fixes: finer-grained locks, lock-free data structures, sharding, a larger connection pool, and retries with backoff.
Why not the distractors?
Fragmentation — data/free space scattered in non-contiguous chunks (memory/disk)
Redundancy — duplicating components so one can fail without downtime
Idempotency — an operation that can be repeated safely with the same result
When two things "fight over" the same resource, the precise word is contention.
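One of the fixes listed above, sharding a lock, can be sketched as lock striping: instead of one global lock, keys are hashed onto several locks, so threads touching different keys rarely wait on each other. `StripedCounter`, the stripe count, and the `user-N` keys are hypothetical choices for illustration.

```python
import threading
from collections import Counter


class StripedCounter:
    """Per-key counters guarded by several locks ("stripes") instead of one.

    Each key hashes to one stripe, so threads updating different keys
    usually take different locks and contend less than with a single lock.
    """

    def __init__(self, stripes: int = 16) -> None:
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._shards = [Counter() for _ in range(stripes)]

    def _stripe(self, key: str) -> int:
        return hash(key) % len(self._locks)

    def increment(self, key: str) -> None:
        i = self._stripe(key)
        with self._locks[i]:  # only this stripe is held; other stripes stay free
            self._shards[i][key] += 1

    def get(self, key: str) -> int:
        i = self._stripe(key)
        with self._locks[i]:
            return self._shards[i][key]


# Usage: four threads, each incrementing its own hypothetical key.
counter = StripedCounter(stripes=8)


def worker(key: str) -> None:
    for _ in range(10_000):
        counter.increment(key)


threads = [threading.Thread(target=worker, args=(f"user-{n}",)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([counter.get(f"user-{n}") for n in range(4)])  # [10000, 10000, 10000, 10000]
```

In CPython the GIL limits how much this speeds up pure-Python code, so the sketch shows the structure of reducing contention rather than a benchmark.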
A finance-driven review concludes: "These instances are massively over-provisioned; we should ___ them to cut cloud spend without hurting performance." Which verb means "match the instance size to the actual need"?
Right-size — match capacity to real demand:
To right-size (a resource, instance, or cluster) is to adjust its size so it matches actual utilisation — neither over-provisioned (wasting money) nor under-provisioned (risking saturation).
"We right-sized the nodes from 8 vCPU to 2 and cut cost 60%."
Capacity planning is the broader practice: forecasting future load and ensuring enough headroom to meet it without over-spending. Right-sizing is one of its tools, alongside autoscaling and setting sensible CPU/memory requests and limits.
Why not the distractors? Refactor restructures code; defragment reorganises storage; obfuscate obscures code. None concern instance sizing. When you tune capacity to match real need, the verb is right-size.
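One simple way to right-size is to pick the smallest instance size whose peak utilisation would stay under a target that leaves headroom. The sketch below assumes a 70% target, a menu of 1 to 16 vCPU sizes, and a 15% observed peak on an 8 vCPU instance. These are illustrative numbers, and `right_size` is a hypothetical helper.

```python
def right_size(current_vcpu: int, peak_utilisation: float,
               target_utilisation: float, sizes: list[int]) -> int:
    """Return the smallest size whose projected peak utilisation stays within the target.

    peak_utilisation and target_utilisation are fractions (0.15 means 15%).
    The target leaves headroom: 0.70 keeps 30% spare capacity at peak.
    """
    peak_used = current_vcpu * peak_utilisation  # vCPUs actually busy at peak
    for size in sorted(sizes):
        if peak_used <= size * target_utilisation:
            return size
    raise ValueError("no listed size meets the target; consider scaling out")


# Illustrative numbers: an 8 vCPU instance that peaks at 15% CPU.
available = [1, 2, 4, 8, 16]
print(right_size(8, 0.15, 0.70, available))  # 2
```

A fuller version would also weigh memory, I/O, and the growth forecasts from capacity planning, not just peak CPU.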