Vocabulary for API Gateway Patterns: BFF, Rate Limiting and Auth Delegation
Master the English vocabulary for API gateway patterns including Backend for Frontend, rate limiting, auth delegation, and request routing in microservices.
API gateways are the front door of a microservices architecture. Discussing them fluently in English — in design reviews, architecture discussions, or technical interviews — requires precise vocabulary for routing, security, and aggregation patterns.
What Is an API Gateway?
An API gateway is a server that acts as a single entry point for client requests. It handles cross-cutting concerns so individual services do not have to.
Common responsibilities:
- Request routing — directing requests to the correct backend service
- Authentication and authorisation — verifying identity and permissions
- Rate limiting — controlling request volume per client
- Load balancing — distributing requests across service instances
- Protocol translation — converting between REST, gRPC, WebSocket
In sentences
- “The gateway routes requests based on the URL path and HTTP method.”
- “We centralise authentication at the gateway rather than in each service.”
- “The gateway aggregates responses from three microservices into a single payload.”
Backend for Frontend (BFF)
The Backend for Frontend (BFF) pattern creates a dedicated API layer for each type of client (mobile, web, third-party).
“Each frontend has its own BFF that tailors the response shape to its specific needs.”
Why use BFF?
- “The mobile BFF reduces payload size by returning only the fields the app needs.”
- “The web BFF can aggregate data from five services so the browser makes one request.”
- “Different clients have different latency tolerances — the BFF optimises for each.”
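To make the contrast concrete, here is a minimal sketch of a web BFF and a mobile BFF built on the same two upstream calls. The functions `fetch_user` and `fetch_orders`, and all field names, are hypothetical stand-ins for real service calls.

```python
from typing import Any


# Hypothetical upstream calls; a real BFF would make HTTP or gRPC requests here.
def fetch_user(user_id: str) -> dict[str, Any]:
    return {"id": user_id, "name": "Ada", "email": "ada@example.com"}


def fetch_orders(user_id: str) -> list[dict[str, Any]]:
    return [
        {"order_id": "o1", "total": 42.0, "status": "shipped"},
        {"order_id": "o2", "total": 13.5, "status": "pending"},
    ]


def web_bff_profile(user_id: str) -> dict[str, Any]:
    """Web BFF: aggregate full records so the browser makes one request."""
    return {"user": fetch_user(user_id), "orders": fetch_orders(user_id)}


def mobile_bff_profile(user_id: str) -> dict[str, Any]:
    """Mobile BFF: same upstream calls, but only the fields the app needs."""
    user = fetch_user(user_id)
    orders = fetch_orders(user_id)
    return {
        "name": user["name"],
        "order_count": len(orders),
        "latest_status": orders[-1]["status"] if orders else None,
    }
```

In a review you might say: “Both BFFs call the same services, but each one owns its own response shape.”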
Vocabulary
- spin up a BFF — create a dedicated backend layer for a client
- the BFF owns (something) — the BFF is responsible for it, for example “the BFF owns the orchestration of the underlying service calls”
- client-specific API contract — the interface tailored to one frontend’s needs
Rate Limiting
Rate limiting (also called throttling) restricts how many requests a client can make in a given time window.
Types
- Fixed window — “Allows 100 requests per minute. The counter resets at the start of each minute.”
- Sliding window — “Counts requests in a rolling window of the last 60 seconds.”
- Token bucket — “Tokens accumulate at a fixed rate up to a maximum bucket size; each request consumes one.”
- Leaky bucket — “Requests are processed at a fixed rate regardless of burst.”
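Two of these algorithms can be sketched in a few lines. This is an illustrative, single-process sketch rather than a production limiter: the class names, the `clock` parameter (injected so the behaviour can be tested without waiting), and the in-memory storage are all assumptions.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow `limit` requests per client in each fixed window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # (client_id, window index) -> request count.
        # Old windows are never evicted in this sketch.
        self.counts = defaultdict(int)

    def allow(self, client_id):
        window_index = int(self.clock() // self.window)
        key = (client_id, window_index)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


class TokenBucket:
    """Refill at `rate_per_second` up to `capacity`; each request costs one token."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last_refill = clock()

    def allow(self):
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The fixed window resets its counter abruptly at each window boundary, while the token bucket lets a client burst up to `capacity` and then settles to the refill rate.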
In design discussions
- “We enforce a rate limit of 1,000 requests per hour per API key.”
- “Clients that exceed the limit receive a `429 Too Many Requests` response.”
- “We allow bursting up to 200 requests in any 10-second window.”
- “The rate limit is applied per tenant, not globally.”
Response headers for rate limiting
Engineers often discuss these in code reviews:
- `X-RateLimit-Limit` — the maximum requests allowed
- `X-RateLimit-Remaining` — how many requests are left
- `Retry-After` — when the client can try again
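As a rough illustration of how a gateway might populate these headers, the sketch below builds a status code and header set from a request count. The function name and its parameters are hypothetical; `used` is assumed to include the current request, and `Retry-After` is expressed in whole seconds.

```python
import math


def rate_limit_response(limit, used, seconds_until_reset):
    """Return (status_code, headers) for a request under a simple counter limit."""
    remaining = max(0, limit - used)
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
    }
    if used > limit:
        # Over the limit: tell the client how long to wait before retrying.
        headers["Retry-After"] = str(math.ceil(seconds_until_reset))
        return 429, headers
    return 200, headers
```

A reviewer might comment: “We should set `Retry-After` on every 429 so clients can back off instead of retrying immediately.”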
Auth Delegation
Auth delegation means the gateway handles authentication so downstream services can trust the validated identity passed along in headers.
Patterns
- JWT validation at the gateway — “The gateway verifies the JWT signature; services only parse the decoded claims.”
- OAuth 2.0 token introspection — “The gateway calls the auth server to validate the opaque token before forwarding.”
- mTLS (mutual TLS) — “Services authenticate each other using certificates, not tokens.”
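The first pattern, signature verification at the gateway, can be sketched with the standard library alone. This sketch assumes a shared-secret HS256 token purely to stay self-contained; gateways that validate against published keys typically use asymmetric algorithms and a maintained JWT library. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    padding = "=" * (-len(segment) % 4)  # JWT segments omit base64 padding
    return base64.urlsafe_b64decode(segment + padding)


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a token (what the auth server would do); used here for the demo."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes, now=None) -> dict:
    """Gateway side: check the signature and expiry, then return the claims."""
    header_b64, payload_b64, signature_b64 = token.split(".")  # ValueError if malformed
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Once `verify_hs256` succeeds, the gateway can forward the decoded claims, and backend services do not need to repeat the signature check.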
Key phrases
- “The gateway strips the external token and injects an internal service identity header.”
- “Downstream services trust the gateway’s identity assertion.”
- “We use JWKS (JSON Web Key Sets) to validate token signatures without a round-trip to the auth server.”
- “The gateway enforces scope-based authorisation — services see only pre-authorised requests.”
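The “strip and inject” step and the scope check can be sketched as a single header transformation. The header names `X-Internal-User` and `X-Internal-Scopes` are invented for illustration, and the `scope` claim is assumed to be a space-delimited string.

```python
def forward_headers(incoming, claims, required_scope):
    """Return the headers to send to the backend, or None if the scope is missing."""
    scopes = set(claims.get("scope", "").split())
    if required_scope not in scopes:
        return None  # the gateway would answer 403 instead of forwarding

    # Strip the external token, plus any client-supplied copies of the
    # internal headers, so a caller cannot spoof an identity.
    blocked = {"authorization", "x-internal-user", "x-internal-scopes"}
    forwarded = {k: v for k, v in incoming.items() if k.lower() not in blocked}

    # Inject the identity the gateway has already validated.
    forwarded["X-Internal-User"] = claims["sub"]
    forwarded["X-Internal-Scopes"] = " ".join(sorted(scopes))
    return forwarded
```

Dropping client-supplied internal headers before injecting fresh ones is what makes it safe for backend services to trust them.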
Request Routing
Request routing directs incoming traffic to the correct backend service based on rules.
Types of routing
- Path-based routing — `/api/users/*` → User Service
- Header-based routing — `X-Version: v2` → New service version
- Weighted routing — “80% to stable, 20% to canary”
- Content-based routing — “Route requests with `Content-Type: application/xml` to the legacy handler”
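A minimal sketch of how such rules might be evaluated in order is shown below. The `Request` class and the service names returned by `route` are hypothetical, and header lookups are case-sensitive here for brevity, which real gateways usually are not.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    method: str
    path: str
    headers: dict = field(default_factory=dict)


def route(request):
    """Return the name of the backend that should handle the request, or None."""
    # Content-based rule: XML payloads go to the legacy handler.
    if request.headers.get("Content-Type") == "application/xml":
        return "legacy-handler"
    # Path-based rule, refined by a header-based rule for the new version.
    if request.path.startswith("/api/users/"):
        if request.headers.get("X-Version") == "v2":
            return "user-service-v2"
        return "user-service"
    return None  # no rule matched; a gateway would typically respond 404
```

Rule order matters: in this sketch the content-based rule wins over the path-based one, so it is worth stating precedence explicitly in a design review.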
In conversation
- “We route by path prefix — `/payments/` goes to the Payment Service.”
- “The gateway rewrites the path before forwarding to the backend.”
- “Traffic is split 95/5 between the stable and canary releases.”
- “We use host-based routing to serve different tenants from the same gateway.”
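Traffic splitting and path rewriting can both be sketched in a few lines. Hashing a client identifier is one possible way to implement a split, chosen here so that each client consistently lands on the same release; the function names are hypothetical.

```python
import hashlib


def pick_release(client_id: str, canary_percent: int) -> str:
    """Deterministically assign a client to 'canary' or 'stable'."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # a number from 0 to 99
    return "canary" if bucket < canary_percent else "stable"


def rewrite_path(path: str, prefix: str) -> str:
    """Strip a routing prefix before forwarding, e.g. /payments/x becomes /x."""
    if path.startswith(prefix):
        return "/" + path[len(prefix):].lstrip("/")
    return path
```

With `canary_percent=5`, roughly 5% of client identifiers map to the canary, which corresponds to a 95/5 split.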
Common Gateway Vocabulary Table
| Term | Definition |
|---|---|
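| upstream | The backend service the gateway calls (proxy convention) |
| downstream | The client that calls the gateway (proxy convention); in everyday speech, “downstream services” often means the services behind the gateway |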
| passthrough | Forwarding a request with minimal modification |
| circuit breaker | Stops routing to a failing backend after a threshold |
| sticky session | Routes requests from the same client to the same instance |
| egress | Traffic leaving the system |
| ingress | Traffic entering the system |
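Of the terms in the table, the circuit breaker is the one most easily explained with code. The sketch below is a simplified, single-threaded illustration; the class name, the consecutive-failure threshold, and the `clock` parameter are assumptions rather than any particular gateway's implementation.

```python
import time


class CircuitBreaker:
    """Stop routing to a backend after repeated failures; retry after a cool-down."""

    def __init__(self, failure_threshold, reset_after_seconds, clock=time.monotonic):
        self.threshold = failure_threshold
        self.reset_after = reset_after_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (traffic flows)

    def allow_request(self):
        if self.opened_at is None:
            return True
        # After the cool-down, let trial requests through ("half-open").
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # close the circuit again

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()  # open, or re-open after a failed trial
```

In conversation: “The circuit opens after two consecutive failures and goes half-open after thirty seconds.”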
Key Takeaways
- API gateway — single entry point that handles routing, auth, rate limiting, and load balancing.
- BFF — a dedicated gateway layer per client type (mobile, web, third-party).
- Rate limiting — enforced per API key/tenant using fixed window, sliding window, token bucket, or leaky bucket.
- Auth delegation — the gateway validates identity so backend services can trust its identity assertion.
- Request routing — path-based, header-based, weighted, or content-based rules.
- Key verbs: route, enforce, aggregate, validate, inject, strip, rewrite, split.