5 exercises on API rate-limiting algorithms and client behavior.
1 / 5
How does the token bucket algorithm work?
The token bucket algorithm models rate limiting as a bucket that refills with tokens at a fixed rate up to a maximum capacity. Each incoming request must take a token; if the bucket is empty, the request is throttled (rejected, or delayed until a token is available). Because tokens accumulate when traffic is low, the bucket permits short bursts up to its capacity while enforcing a sustained average rate over time. This burst tolerance makes it a common choice for API rate limiting: it absorbs occasional spikes without raising the long-run rate a client is allowed.
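The refill-and-spend behavior described above can be sketched in a few lines of Python. This is a minimal single-threaded illustration, not a production limiter; the class name `TokenBucket`, the `allow` method, and the injectable `clock` parameter are assumptions made for the example.

```python
import time


class TokenBucket:
    """Illustrative token bucket (hypothetical names, not a library API)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum tokens the bucket can hold
        self.clock = clock            # injectable so the sketch is testable
        self.tokens = float(capacity) # start full, so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        # Refill lazily for the time elapsed since the last call,
        # never exceeding the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # empty bucket: the caller should throttle this request
```

With `rate=1.0` and `capacity=3`, three back-to-back requests pass as a burst, the fourth is throttled, and one more request is admitted for each further second that elapses.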
2 / 5
How does the leaky bucket differ from the token bucket?
The leaky bucket models requests as water poured into a bucket with a hole that leaks at a constant rate. Requests queue in the bucket and are processed (leak out) at that fixed rate; if the bucket overflows, excess requests are dropped. Unlike the token bucket, it does not let bursts pass through: it enforces a smooth, constant output rate, shaping bursty input into steady traffic. This makes it well suited to cases where downstream systems need a predictable, even load rather than spikes.
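A queue-based version of this idea might look like the sketch below. It assumes the caller passes the current time explicitly and calls `drain` periodically to release due requests; the names `LeakyBucket`, `offer`, and `drain` are hypothetical and chosen only for illustration.

```python
from collections import deque


class LeakyBucket:
    """Illustrative queue-based leaky bucket (hypothetical names)."""

    def __init__(self, leak_rate, capacity):
        self.leak_rate = leak_rate  # requests released per second
        self.capacity = capacity    # maximum number of queued requests
        self.queue = deque()
        self.last_leak = 0.0        # time up to which leaks are accounted for

    def offer(self, request, now):
        """Queue a request; return False if the bucket overflows."""
        if len(self.queue) >= self.capacity:
            return False  # overflow: the request is dropped
        if not self.queue:
            # Idle time earns no credit: the leak clock restarts when the
            # bucket becomes non-empty, so a burst cannot pass through.
            self.last_leak = now
        self.queue.append(request)
        return True

    def drain(self, now):
        """Return the requests that are due to leak out by time `now`."""
        due = int((now - self.last_leak) * self.leak_rate)
        released = [self.queue.popleft() for _ in range(min(due, len(self.queue)))]
        if due > 0:
            self.last_leak += due / self.leak_rate
        return released
```

The key contrast with a token bucket is in `offer`: an idle period does not build up any allowance, so output never exceeds `leak_rate` regardless of how bursty the input is.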
3 / 5
What is a quota in API rate limiting?
A quota is a cap on the total number of requests (or units of work) a client may make over a longer period — typically a day or a billing month — as opposed to the short-window throttling of a rate limit. Quotas often tie to pricing tiers: a free plan might allow 1,000 calls/day while a paid plan allows millions. They are tracked per API key or account and reset on a schedule. Rate limits protect short-term system stability; quotas govern longer-term fair use and monetization.
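As a rough illustration, a per-key daily quota can be tracked with a counter that resets when the calendar day changes. The sketch below keeps state in memory and resets at midnight UTC; both choices, along with the `DailyQuota` name and the plan table passed to it, are assumptions made for the example.

```python
from datetime import datetime, timezone


class DailyQuota:
    """Illustrative in-memory daily quota per API key (hypothetical names)."""

    def __init__(self, limits):
        self.limits = limits  # plan name -> maximum calls per day
        self.usage = {}       # api_key -> (date, calls made on that date)

    def consume(self, api_key, plan, now=None):
        """Count one call against the key; return False once the quota is spent."""
        now = now or datetime.now(timezone.utc)
        today = now.date()
        day, count = self.usage.get(api_key, (today, 0))
        if day != today:
            count = 0  # scheduled reset: a new day starts a fresh count
        if count >= self.limits[plan]:
            return False
        self.usage[api_key] = (today, count + 1)
        return True


# The free-tier figure mirrors the example in the text; the paid figure
# is a placeholder.
quota = DailyQuota({"free": 1_000, "paid": 5_000_000})
```

A real service would typically keep these counters in shared storage so that every API server sees the same totals.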
4 / 5
What does HTTP status 429 mean?
HTTP 429 Too Many Requests signals that the client has sent too many requests in a given window and has hit a rate limit. A well-behaved API includes a Retry-After header telling the client how long to wait before trying again, given either as a number of seconds or as an HTTP date, and often RateLimit-* headers exposing the limit, remaining allowance, and reset time. Clients should treat 429 as a cue to back off rather than hammering the endpoint, since continued requests waste resources on both sides and, depending on how the server counts them, can prolong the throttling.
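On the client side, the first step after a 429 is usually to work out how long to wait. The helper below is a sketch that turns a Retry-After value into a number of seconds, accepting both the delay-in-seconds form and the HTTP-date form; the function name and the 1-second fallback are assumptions for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None, default=1.0):
    """Convert a Retry-After header value into seconds to wait.

    Falls back to `default` (an arbitrary choice for this sketch) when
    the header is missing or cannot be parsed.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately.
    return max(0.0, (when - now).total_seconds())
```

For instance, `retry_after_seconds("120")` yields `120.0`, while a date-form header is converted relative to the current time.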
5 / 5
What is exponential backoff when a client is throttled?
Exponential backoff is a retry strategy where the delay between attempts grows multiplicatively — say 1s, 2s, 4s, 8s — after each failure or 429 response. This relieves pressure on an overloaded or recovering server instead of hammering it. Adding random jitter to each delay prevents the thundering herd problem, where many clients that failed together retry in perfect sync and re-overwhelm the server. Combined with a maximum retry cap and respect for Retry-After, backoff makes clients resilient and well-behaved.
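The pieces named above (a doubling delay, jitter, a retry cap, and respect for Retry-After) can be combined as in the following sketch. It uses the "full jitter" variant, where each delay is drawn uniformly between zero and the exponential ceiling; that variant is one common option rather than the only one. The `send` callable is hypothetical: it stands in for whatever performs the request and is assumed to return a `(status, retry_after_seconds_or_None)` pair.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Delay before retry number `attempt` (0-based), with full jitter."""
    ceiling = min(cap, base * 2 ** attempt)  # 1s, 2s, 4s, 8s, ... up to cap
    return rng() * ceiling                   # random point in [0, ceiling)


def call_with_backoff(send, max_retries=4, base=1.0, cap=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call `send` until it stops returning 429 or retries run out.

    Returns the last status code seen.
    """
    for attempt in range(max_retries + 1):
        status, retry_after = send()
        if status != 429 or attempt == max_retries:
            return status
        delay = backoff_delay(attempt, base, cap, rng)
        # Never retry sooner than the server asked for.
        sleep(max(delay, retry_after or 0.0))
```

Because each client draws its own random delay, clients that were throttled at the same moment spread their retries out instead of returning in lockstep.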