Rate limit — the maximum number of API requests a client can make in a given time window (e.g., 100 requests per minute).
Quota — a longer-term usage ceiling, often monthly (e.g., 10,000 calls/month on the free tier).
Burst limit — a short-term allowance that lets requests briefly exceed the sustained rate limit before throttling kicks in.
Throttle — to slow down or temporarily block requests that exceed the rate limit; an API that blocks them typically returns HTTP 429 Too Many Requests.
X-RateLimit-Remaining — a response header telling the client how many requests it has left in the current window.
1 / 5
A developer receives HTTP 429 Too Many Requests. What has happened?
HTTP 429 Too Many Requests is the standard status code for throttling. It means the client has sent more requests than the allowed rate permits. The response may include a Retry-After header stating how long to wait before retrying, either as a number of seconds or as an HTTP date. Well-behaved API clients honor that header when it is present and otherwise apply exponential backoff when they receive 429 responses.
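The retry behavior described above might be sketched as follows. This is a minimal illustration, not a definitive client: `send` is a hypothetical callable that returns a `(status_code, headers)` pair, the header lookup assumes a plain dict with the exact key `Retry-After`, and the `base`, `cap`, and `max_attempts` defaults are arbitrary assumptions.

```python
import random
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return the wait in seconds from a Retry-After value, or None if unusable.

    The value may be a delay in seconds ("120") or an HTTP date.
    """
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def call_with_backoff(send, max_attempts=5, base=1.0, cap=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call send() until it returns a non-429 status or attempts run out.

    send is a hypothetical callable returning (status_code, headers_dict).
    """
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status, headers
        if attempt == max_attempts - 1:
            break  # out of attempts: hand the last 429 back to the caller
        wait = parse_retry_after(headers.get("Retry-After"))
        if wait is None:
            # No usable header: exponential backoff with jitter,
            # a random delay of up to base * 2**attempt seconds, capped.
            wait = rng() * min(cap, base * 2 ** attempt)
        sleep(wait)
    return status, headers
```

The `sleep` and `rng` parameters exist only so the pacing can be replaced in tests; a real client would normally leave them at their defaults.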
2 / 5
What is the difference between a rate limit and a quota?
Both rate limits and quotas cap usage, but on different timescales. Rate limits protect infrastructure from sudden traffic spikes (per second or per minute). Quotas enforce business model constraints — for example, the free tier might allow 500 requests/day. When a quota is exhausted, the user typically cannot make more calls until the period resets or they upgrade their plan.
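To make the two timescales concrete, here is a rough sketch in which the same counter class enforces both a per-minute rate limit and a per-day quota. The names `WindowCounter` and `check_request` are hypothetical, and the fixed-window counting is an assumption chosen for brevity; real services may use sliding windows or other schemes.

```python
class WindowCounter:
    """Counts calls in fixed windows of `period` seconds."""

    def __init__(self, limit, period):
        self.limit = limit
        self.period = period
        self.window = None  # index of the window currently being counted
        self.count = 0

    def _roll(self, now):
        # Start a fresh count whenever `now` falls into a new window.
        window = int(now // self.period)
        if window != self.window:
            self.window = window
            self.count = 0

    def has_room(self, now):
        self._roll(now)
        return self.count < self.limit

    def record(self, now):
        self._roll(now)
        self.count += 1


def check_request(rate, quota, now):
    """Return "ok", "rate_limited", or "quota_exhausted" for a request at `now`."""
    if not quota.has_room(now):
        return "quota_exhausted"  # long-term ceiling: wait for reset or upgrade
    if not rate.has_room(now):
        return "rate_limited"     # short-term ceiling: retry shortly
    rate.record(now)
    quota.record(now)
    return "ok"
```

For example, `WindowCounter(100, 60)` would model 100 requests per minute and `WindowCounter(500, 86400)` a 500 requests/day free tier; only accepted requests count against either limit in this sketch.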
3 / 5
An API allows 60 requests/minute sustained but permits up to 200 requests in the first 10 seconds before throttling. The 200-request allowance is called a:
A burst limit accommodates legitimate traffic spikes without throttling immediately. Many APIs implement a token bucket algorithm: tokens accumulate at the sustained rate (e.g., 1 per second), up to a bucket capacity (e.g., 200), and each request spends one token. A burst drains the bucket quickly; once it is empty, further requests are throttled and can proceed only as fast as tokens refill. This balances flexibility for bursty workloads with protection for shared infrastructure.
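The token bucket described above can be sketched in a few lines. This is an illustrative, single-threaded version under stated assumptions: the class name `TokenBucket`, the `cost` parameter, and the injectable `clock` are hypothetical choices, and the bucket is assumed to start full.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate                # sustained rate, in tokens per second
        self.capacity = capacity        # burst size, in tokens
        self.clock = clock
        self.tokens = float(capacity)   # assumption: the bucket starts full
        self.updated = clock()

    def _refill(self):
        # Add the tokens earned since the last check, never exceeding capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def allow(self, cost=1):
        """Spend `cost` tokens and return True, or return False if too few remain."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `TokenBucket(rate=1.0, capacity=200)`, matching the example figures above, a client can send 200 requests at once; after that, each further request has to wait for a token to accumulate. A server would return 429 wherever `allow()` returns False.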
4 / 5
A response header X-RateLimit-Remaining: 23 tells a developer:
X-RateLimit-Remaining is a widely used, de facto convention rather than a formal standard; the IETF draft on rate limit headers proposes standardized fields without the X- prefix. Together with X-RateLimit-Limit (total allowed) and X-RateLimit-Reset (when the window resets, often given as a Unix timestamp, though some APIs send the number of seconds remaining), it lets clients implement adaptive throttling: slowing down proactively before hitting a 429. The Retry-After header complements these by indicating how long to wait once a 429 has been returned.
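One way a client might act on these headers is to spread its remaining budget evenly over the rest of the window. The sketch below is one possible pacing strategy, not a prescribed one: `pacing_delay` is a hypothetical helper, `headers` is assumed to be a plain dict with these exact keys, and X-RateLimit-Reset is assumed to hold a Unix timestamp in seconds, which should be checked against the specific API's documentation.

```python
import time


def pacing_delay(headers, now=None):
    """Return seconds to pause before the next request.

    Assumes X-RateLimit-Reset is a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    try:
        remaining = int(headers["X-RateLimit-Remaining"])
        reset_at = float(headers["X-RateLimit-Reset"])
    except (KeyError, TypeError, ValueError):
        return 0.0  # headers missing or malformed: no pacing information
    seconds_left = max(0.0, reset_at - now)
    if remaining <= 0:
        return seconds_left           # budget spent: wait for the window to reset
    return seconds_left / remaining   # spread the remaining requests evenly
```

For instance, with 23 requests remaining and 46 seconds until the reset, the function suggests pausing 2 seconds between requests, which should keep the client under the limit without ever triggering a 429.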
5 / 5
A SaaS API offers a free tier and a paid tier. A user has exhausted their free monthly quota. What options does the API typically offer?
The user can typically either wait for the quota to reset or upgrade to a paid tier. The reset date depends on the billing cycle, typically the 1st of each month or the anniversary of sign-up, while a tier upgrade immediately gives access to a higher quota. The moment a free user hits their quota is a critical monetization event, and API product teams often design this friction point intentionally to convert users to paid plans. Clear upgrade flows and proactive quota alerts are key conversion tactics.