Vocabulary for API Rate Limiting: 30 Terms Explained in Context

Learn essential English vocabulary for API rate limiting: throttling, token bucket, backoff, 429 responses, quotas, and how to use each term correctly in conversation.

Rate limiting controls how many requests a client can make to an API in a given period of time. Every backend engineer talks about it, but the vocabulary trips up non-native speakers because many terms are metaphors (bucket, leak, burst) or look similar (throttle vs limit vs quota). This guide explains 30 essential terms in context, so you can both understand the docs and speak about rate limiting naturally.


The foundational terms

Term | Meaning | In a sentence
Rate limit | Max requests per time window | “The rate limit is 100 requests per minute.”
Throttle | To deliberately slow down requests | “We throttle clients that exceed the limit.”
Quota | Total allowance over a long period | “Your daily quota is 10,000 calls.”
Burst | A short spike of requests | “The client sent a burst of 500 at once.”
Backoff | Waiting longer between retries | “Use exponential backoff on 429s.”

The key distinction: a rate limit is requests per short window (per second/minute); a quota is the total over a long period (per day/month). Don’t use them interchangeably.

“You’re within your monthly quota, but you hit the per-second rate limit because you sent everything in one burst.”
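
To see why the two words are not interchangeable, here is a minimal sketch of a check that keeps them apart. The `Usage` class, the `check` function, and the numbers are hypothetical names and values chosen for illustration, not taken from any real API.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical per-client counters; the field names are illustrative only."""
    calls_this_second: int
    calls_today: int


def check(usage: Usage, rate_limit_per_second: int, daily_quota: int) -> str:
    """Report which cap, if any, the next request would exceed."""
    if usage.calls_today >= daily_quota:
        return "quota exceeded"      # long-term total used up
    if usage.calls_this_second >= rate_limit_per_second:
        return "rate limited"        # too many requests in the short window
    return "ok"


# Well within the daily quota, but everything arrived in one burst.
print(check(Usage(calls_this_second=50, calls_today=500), 10, 10_000))  # rate limited
```

The two caps are checked separately, which is why a client can be far below one and still hit the other.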


Throttle vs limit vs reject

These three verbs describe different responses to too many requests:

  • Throttle — slow the client down (often by delaying responses)
  • Limit — cap the allowed rate
  • Reject / drop — refuse extra requests outright

“When you exceed the limit, we don’t drop your requests immediately — we throttle you first, then start rejecting if you keep pushing.”

The HTTP status code for “you’ve been rate limited” is 429 Too Many Requests. Engineers say “you got a 429” or “the API 429’d us.”
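
The difference between throttling and rejecting can be sketched in a few lines. This is only an illustration: the `respond` function, its `reject_above` threshold, and the fixed 0.5-second delay are assumptions made up for this example, and real services decide these things in many different ways.

```python
from typing import Tuple


def respond(count: int, limit: int, reject_above: int) -> Tuple[int, float]:
    """Return (HTTP status, artificial delay in seconds) for the count-th request in a window."""
    if count <= limit:
        return 200, 0.0    # within the limit: serve normally
    if count <= reject_above:
        return 200, 0.5    # throttled: still served, but slowed down
    return 429, 0.0        # rejected: 429 Too Many Requests


print(respond(90, limit=100, reject_above=120))   # (200, 0.0)
print(respond(110, limit=100, reject_above=120))  # (200, 0.5)
print(respond(130, limit=100, reject_above=120))  # (429, 0.0)
```

A throttled client still gets an answer, just later; a rejected client gets the 429.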


The algorithms (and their metaphors)

Rate-limiting algorithms borrow vivid metaphors. Knowing the imagery makes the vocabulary stick.

Algorithm | Metaphor | Plain meaning
Token bucket | A bucket fills with tokens; each request spends one | Allows bursts up to the bucket size
Leaky bucket | Requests drip out at a steady rate | Smooths traffic to a constant rate
Fixed window | A counter resets every minute | Simple but allows edge spikes
Sliding window | A rolling time window | Smoother than fixed window
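
The phrase "edge spikes" is easier to remember once you have seen one. The sketch below compares a fixed window with a sliding window, both limited to 5 requests per 60 seconds. The class names and the choice to pass the time in as `now` are assumptions made for this example, so that it runs the same way every time.

```python
from collections import deque


class FixedWindow:
    """Counts requests per window; the counter resets when a new window starts."""

    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.current = None    # index of the window being counted
        self.count = 0

    def allow(self, now: float) -> bool:
        index = int(now // self.window)
        if index != self.current:          # a new window started: reset the counter
            self.current, self.count = index, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindow:
    """Keeps timestamps of accepted requests from the last `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.stamps = deque()

    def allow(self, now: float) -> bool:
        # Forget requests that have fallen out of the rolling window.
        while self.stamps and self.stamps[0] <= now - self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return True
        return False


# Five requests just before the minute boundary, five just after it.
times = [59.0] * 5 + [61.0] * 5
fixed, sliding = FixedWindow(5, 60.0), SlidingWindow(5, 60.0)
print(sum(fixed.allow(t) for t in times))    # 10
print(sum(sliding.allow(t) for t in times))  # 5
```

The fixed window lets all ten requests through within two seconds because the counter resets at the boundary. The sliding window still remembers the first five and accepts only those.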

“We use a token bucket because it tolerates short bursts — the bucket holds 100 tokens, refilling at 10 per second. A leaky bucket would smooth that out but reject bursts.”

Note the verbs: the bucket fills and refills; requests consume or spend tokens; when empty, requests are rejected until it refills.
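
Here is a minimal sketch of a token bucket using the numbers from the quote above (100 tokens, refilling at 10 per second). The `TokenBucket` class and its method names are assumptions for illustration, and the time is passed in as `now` so the example is repeatable. Production limiters add details this sketch leaves out, such as locking and shared storage.

```python
class TokenBucket:
    """Illustrative token bucket: starts full, refills continuously over time."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0):
        self.capacity = capacity          # maximum tokens the bucket holds
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = capacity            # start with a full bucket
        self.last = now                   # time of the previous refill

    def allow(self, now: float) -> bool:
        # Refill for the time that has passed, never beyond capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1              # the request spends one token
            return True
        return False                      # bucket is empty: reject


bucket = TokenBucket(capacity=100, refill_rate=10)
print(sum(bucket.allow(now=0.0) for _ in range(150)))  # 100
print(bucket.allow(now=0.0))                           # False
print(bucket.allow(now=0.5))                           # True
```

A burst of 150 requests at the same instant spends all 100 tokens, so the remaining 50 are rejected. Half a second later the bucket has refilled by 5 tokens and requests are accepted again.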


Retry vocabulary

When a client gets rate-limited, it should retry intelligently. The vocabulary here is precise.

Term | Meaning
Retry | Try the request again
Backoff | Increase the wait between retries
Exponential backoff | Double the wait each time (1s, 2s, 4s…)
Jitter | Random variation added to backoff
Retry-After | A header telling you when to retry

“Respect the Retry-After header. If it’s missing, fall back to exponential backoff with jitter so all clients don’t retry at the same instant — that’s the thundering herd problem.”

Thundering herd describes many clients retrying simultaneously, overwhelming the server again. Jitter spreads those retries out over time, so they no longer arrive together.
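
The retry terms fit together as in the sketch below. The `retry_delay` function and its parameters are hypothetical, and the sketch assumes the Retry-After value has already been parsed into a number of seconds (the header can also carry a date, which is not handled here). The jitter shown is one common variant, sometimes called full jitter, where the wait is a random point between zero and the exponential ceiling.

```python
import random
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[float] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (counting from 0)."""
    if retry_after is not None:
        return retry_after                       # respect the server's Retry-After
    ceiling = min(cap, base * 2 ** attempt)      # exponential backoff: 1s, 2s, 4s, ... up to cap
    return random.uniform(0, ceiling)            # jitter: pick a random wait below the ceiling


for attempt in range(4):
    print(attempt, round(retry_delay(attempt), 2))   # random values, different on each run
print(retry_delay(3, retry_after=7.0))               # 7.0
```

Because each client picks its own random wait, their retries are spread out instead of landing at the same instant.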


Server-side vocabulary

Term | Meaning
Per-key limit | Limit applied per API key
Per-IP limit | Limit applied per IP address
Global limit | A cap on total traffic
Soft limit | A warning threshold
Hard limit | An absolute cap, enforced strictly
Burst allowance | Extra headroom for short spikes

“We enforce a per-key rate limit with a small burst allowance, plus a global hard limit to protect the backend during traffic spikes.”

The contrast between soft limit (warns you) and hard limit (stops you) is worth memorising.
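
Several of these server-side terms can be shown in one small sketch. The `WindowLimiter` class, its parameter names, and the returned strings are all made up for illustration. The sketch counts requests inside a single window and leaves out the window reset to stay short.

```python
from collections import defaultdict


class WindowLimiter:
    """Illustrative limiter for one time window: per-key soft and hard limits plus a global cap."""

    def __init__(self, soft_limit: int, hard_limit: int, global_limit: int):
        self.soft_limit = soft_limit      # per key: warn above this count
        self.hard_limit = hard_limit      # per key: never serve more than this
        self.global_limit = global_limit  # all keys together
        self.per_key = defaultdict(int)
        self.total = 0

    def check(self, api_key: str) -> str:
        if self.total >= self.global_limit:
            return "reject: global limit"
        if self.per_key[api_key] >= self.hard_limit:
            return "reject: per-key hard limit"
        self.per_key[api_key] += 1
        self.total += 1
        if self.per_key[api_key] > self.soft_limit:
            return "allow with warning"   # soft limit crossed: served, but flagged
        return "allow"


limiter = WindowLimiter(soft_limit=2, hard_limit=3, global_limit=4)
for key in ["a", "a", "a", "a", "b", "c"]:
    print(key, limiter.check(key))
```

Key "a" is allowed twice, warned on its third request, and rejected on its fourth. Key "b" is allowed, and that request uses up the global cap, so key "c" is rejected even though it has sent nothing yet.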


Phrases for discussing rate limits in meetings

  • “We’re getting rate limited by the third-party API.”
  • “Let’s back off and retry instead of hammering it.”
  • “We’re hitting the ceiling on the per-key quota.”
  • “Can we request a higher limit from the vendor?”
  • “The client isn’t respecting the Retry-After header.”

“Our integration keeps getting 429’d because we’re not backing off. We’re effectively hammering their API. Let’s add exponential backoff with jitter and respect the Retry-After header.”

The verb hammer (to send too many requests aggressively) is common and useful.


Common mistakes

  1. Confusing quota and rate limit. Quota = long-term total; rate limit = short-term rate. They’re enforced separately.
  2. Saying “limited” when you mean “throttled.” Being throttled means your requests are slowed down; being rejected means they are refused outright. The limit itself is just the cap.
  3. Using “burst” as a verb wrongly. Say “a burst of requests” (noun) or “traffic bursts” — not “we bursted the API.”
  4. Mispronouncing “quota.” It’s /ˈkwoʊtə/ — “KWOH-tuh,” not “koo-OH-ta.”
  5. Misspelling or mispronouncing “exponential.” It’s “exponential backoff,” not “exponentional.” Practise the stress: ex-po-NEN-tial.

A mini-dialogue using the vocabulary

A: “Why are we getting 429s from the payments API?”

B: “We’re sending requests in a burst at the top of every hour. We blow through the token bucket instantly.”

A: “Can we smooth that out?”

B: “Yes — add jitter so we don’t all fire at once, and respect their Retry-After. Long term, we should ask for a higher rate limit, but we’re still well under our daily quota.”


Quick reference glossary

  • Idempotent retry — a retry that is safe to repeat, because the request has the same effect whether it runs once or several times
  • Circuit breaker — stops calling a failing service entirely for a while
  • Rate limiter — the component that enforces the limit
  • Window — the time period the limit applies to
  • Headroom — spare capacity below the limit
  • Cooldown — a forced wait before you can retry

Key takeaways

  • Rate limit = per short window; quota = long-term total. Keep them distinct.
  • Throttle (slow), limit (cap), reject/drop (refuse) describe different responses.
  • Learn the bucket metaphors: token bucket tolerates bursts, leaky bucket smooths traffic.
  • On 429s, use exponential backoff with jitter and respect Retry-After to avoid the thundering herd.

Master these 30 terms and you’ll read rate-limiting docs effortlessly — and sound precise when your team debugs that next wave of 429s.