Vocabulary for API Gateway Patterns: BFF, Rate Limiting and Auth Delegation

Master the English vocabulary for API gateway patterns including Backend for Frontend, rate limiting, auth delegation, and request routing in microservices.

API gateways are the front door of a microservices architecture. Discussing them fluently in English — in design reviews, architecture discussions, or technical interviews — requires precise vocabulary for routing, security, and aggregation patterns.


What Is an API Gateway?

An API gateway is a server that acts as a single entry point for client requests. It handles cross-cutting concerns so individual services do not have to.

Common responsibilities:

  • Request routing — directing requests to the correct backend service
  • Authentication and authorisation — verifying identity and permissions
  • Rate limiting — controlling request volume per client
  • Load balancing — distributing requests across service instances
  • Protocol translation — converting between REST, gRPC, and WebSocket

In sentences

  • “The gateway routes requests based on the URL path and HTTP method.”
  • “We centralise authentication at the gateway rather than in each service.”
  • “The gateway aggregates responses from three microservices into a single payload.”
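
To make the verb "aggregate" concrete, here is a minimal sketch of a gateway handler that calls three services concurrently and merges the results into a single payload. The three fetch functions and their return values are hypothetical stand-ins; a real gateway would make HTTP or gRPC calls instead.

```python
import asyncio

# Hypothetical stand-ins for three backend services (assumed for illustration).
async def fetch_user(user_id: str) -> dict:
    return {"id": user_id, "name": "Ada"}

async def fetch_orders(user_id: str) -> dict:
    return {"orders": [{"id": "o-1", "total": 42}]}

async def fetch_recommendations(user_id: str) -> dict:
    return {"recommended": ["sku-7", "sku-9"]}

async def aggregate_profile(user_id: str) -> dict:
    """Call the three services concurrently and merge them into one payload."""
    user, orders, recs = await asyncio.gather(
        fetch_user(user_id),
        fetch_orders(user_id),
        fetch_recommendations(user_id),
    )
    return {"user": user, **orders, **recs}

payload = asyncio.run(aggregate_profile("u-123"))
```

In a review you might describe this as: "The gateway fans out to three services and aggregates the responses into a single payload."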

Backend for Frontend (BFF)

The Backend for Frontend (BFF) pattern creates a dedicated API layer for each type of client (mobile, web, third-party).

“Each frontend has its own BFF that tailors the response shape to its specific needs.”

Why use BFF?

  • “The mobile BFF reduces payload size by returning only the fields the app needs.”
  • “The web BFF can aggregate data from five services so the browser makes one request.”
  • “Different clients have different latency tolerances — the BFF optimises for each.”

Vocabulary

  • spin up a BFF — create a dedicated backend layer for a client
  • the BFF owns X — the BFF is responsible for X, as in “The BFF owns the orchestration of the underlying service calls.”
  • client-specific API contract — the interface tailored to one frontend’s needs
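
The idea of a client-specific API contract can be sketched as a field filter: each BFF declares which fields its client needs and trims the full record accordingly. The record, the field lists, and the function name below are assumptions made for illustration, not a prescribed design.

```python
# Hypothetical full record, as the underlying services might return it.
FULL_PRODUCT = {
    "id": "p-1",
    "name": "Desk lamp",
    "price": 29.99,
    "description": "A long marketing description...",
    "reviews": [{"rating": 5, "text": "Great"}],
    "image_urls": ["small.jpg", "large.jpg"],
}

# Each BFF declares the fields its client needs (its client-specific contract).
MOBILE_FIELDS = ("id", "name", "price")
WEB_FIELDS = ("id", "name", "price", "description", "reviews", "image_urls")

def shape_response(record: dict, fields: tuple) -> dict:
    """Return only the fields that the given client contract asks for."""
    return {key: record[key] for key in fields if key in record}

mobile_payload = shape_response(FULL_PRODUCT, MOBILE_FIELDS)
web_payload = shape_response(FULL_PRODUCT, WEB_FIELDS)
```

Here the mobile BFF returns three fields and the web BFF returns all six, which is what "tailoring the response shape" means in practice.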

Rate Limiting

Rate limiting (also called throttling) restricts how many requests a client can make in a given time window.

Types

  • Fixed window — “Allows 100 requests per minute. The counter resets at the start of each minute.”
  • Sliding window — “Counts requests in a rolling window of the last 60 seconds.”
  • Token bucket — “Tokens accumulate at a fixed rate up to the bucket’s capacity; each request consumes one, so short bursts are allowed.”
  • Leaky bucket — “Requests are processed at a fixed rate regardless of burst.”
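
Of the four types, the token bucket is the one most often sketched on a whiteboard. The version below is a minimal, single-process illustration; the class name, the parameter names (rate, capacity, clock), and the injectable clock are assumptions chosen to keep the example testable.

```python
import time

class TokenBucket:
    """Tokens refill at `rate` per second, capped at `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so tests can control time
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        """Refill based on elapsed time, then try to consume one token."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The capacity sets the burst size and the rate sets the sustained throughput, which is why engineers say a token bucket "allows bursting".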

In design discussions

  • “We enforce a rate limit of 1,000 requests per hour per API key.”
  • “Clients that exceed the limit receive a 429 Too Many Requests response.”
  • “We allow bursting up to 200 requests in any 10-second window.”
  • “The rate limit is applied per tenant, not globally.”
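
A limit that is "applied per tenant, not globally" simply means the counter is keyed by tenant. The following sketch uses a sliding window of timestamps per tenant and returns the HTTP status the gateway would send. The class and method names are hypothetical, and the state lives in one process; a shared store would be needed across several gateway instances.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests per tenant in any rolling window."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.hits = defaultdict(deque)  # tenant_id -> timestamps of accepted requests

    def check(self, tenant_id: str) -> int:
        """Return 200 if the request is allowed, 429 if the tenant is over the limit."""
        now = self.clock()
        hits = self.hits[tenant_id]
        # Drop timestamps that have fallen out of the rolling window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return 429
        hits.append(now)
        return 200
```

Because each tenant has its own queue, one noisy tenant cannot exhaust another tenant's allowance.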

Response headers for rate limiting

Engineers often discuss these in code reviews:

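  • X-RateLimit-Limit — the maximum number of requests allowed in the current window
  • X-RateLimit-Remaining — how many requests are left in the current window
  • Retry-After — how long the client should wait before retrying, given as a number of seconds or an HTTP date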
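
As a rough illustration of how these headers fit together, the sketch below builds the status code and headers for one request. The function name and its parameters are assumptions. Note that the X-RateLimit-* names are a widespread convention rather than a formal standard, whereas Retry-After is a standard HTTP header.

```python
import math

def limited_response(limit: int, used: int, seconds_until_reset: float):
    """Return (status, headers), given `used` requests already counted in the window."""
    if used < limit:
        status = 200
        remaining = limit - used - 1  # this request consumes one slot
        headers = {}
    else:
        status = 429
        remaining = 0
        # Retry-After as whole seconds, rounded up so the client never retries early.
        headers = {"Retry-After": str(math.ceil(seconds_until_reset))}
    headers["X-RateLimit-Limit"] = str(limit)
    headers["X-RateLimit-Remaining"] = str(remaining)
    return status, headers

ok_status, ok_headers = limited_response(limit=100, used=99, seconds_until_reset=30)
blocked_status, blocked_headers = limited_response(limit=100, used=100, seconds_until_reset=12.3)
```

A typical review comment would be: "Please include Retry-After on the 429 so clients can back off correctly."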

Auth Delegation

Auth delegation means the gateway handles authentication so downstream services can trust the validated identity passed along in headers.

Patterns

  • JWT validation at the gateway — “The gateway verifies the JWT signature; services only parse the decoded claims.”
  • OAuth 2.0 token introspection — “The gateway calls the auth server to validate the opaque token before forwarding.”
  • mTLS (mutual TLS) — “Services authenticate each other using certificates, not tokens.”

Key phrases

  • “The gateway strips the external token and injects an internal service identity header.”
  • “Downstream services trust the gateway’s identity assertion.”
  • “We use JWKS (JSON Web Key Sets) to validate token signatures without a round-trip to the auth server.”
  • “The gateway enforces scope-based authorisation — services see only pre-authorised requests.”
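
The verbs "validate", "strip", and "inject" can be seen together in the sketch below. To stay within the standard library it verifies an HS256 (shared-secret) token instead of fetching public keys from a JWKS endpoint, and it omits expiry and audience checks, so treat it as an illustration only; a real gateway would use a vetted JWT library. The header name X-Internal-User-Id and all function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

def sign_hs256(claims: dict, secret: bytes) -> str:
    """Helper for the example: mint an HS256 token."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url_encode(sig)}"

def verify_hs256(token: str, secret: bytes) -> Optional[dict]:
    """Return the claims if the signature is valid, otherwise None."""
    try:
        header_b64, body_b64, sig_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        if header.get("alg") != "HS256":
            return None
        expected = hmac.new(
            secret, f"{header_b64}.{body_b64}".encode(), hashlib.sha256
        ).digest()
        if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
            return None
        claims = json.loads(b64url_decode(body_b64))
        return claims if isinstance(claims, dict) else None
    except (ValueError, TypeError, AttributeError):
        return None

def forward_headers(incoming: dict, secret: bytes, required_scope: str):
    """Return (status, headers to send to the backend service)."""
    auth = incoming.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return 401, {}
    claims = verify_hs256(auth[len("Bearer "):], secret)
    if claims is None:
        return 401, {}
    if required_scope not in str(claims.get("scope", "")).split():
        return 403, {}
    # Strip the external token, then inject the internal identity header.
    outgoing = {k: v for k, v in incoming.items() if k != "Authorization"}
    outgoing["X-Internal-User-Id"] = str(claims.get("sub", ""))
    return 200, outgoing
```

The backend service never sees the external token; it only reads the injected header, which is what "trusting the gateway's identity assertion" refers to.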

Request Routing

Request routing directs incoming traffic to the correct backend service based on rules.

Types of routing

  • Path-based routing — /api/users/* → User Service
  • Header-based routing — X-Version: v2 → New service version
  • Weighted routing — “80% to stable, 20% to canary”
  • Content-based routing — “Route requests with Content-Type: application/xml to the legacy handler”
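
A routing rule set like the one above can be sketched as a small lookup function. The service names, the rule order, and the "-v2" suffix are assumptions for illustration; real gateways express these rules in configuration rather than code.

```python
from typing import Optional

# Hypothetical path-prefix rules: (prefix, backend service name).
PATH_ROUTES = [
    ("/api/users/", "user-service"),
    ("/payments/", "payment-service"),
]

def route(path: str, headers: dict) -> Optional[str]:
    """Pick a backend using content-based, path-based, then header-based rules."""
    # Content-based: XML requests go to the legacy handler.
    if headers.get("Content-Type") == "application/xml":
        return "legacy-handler"
    for prefix, service in PATH_ROUTES:
        if path.startswith(prefix):
            # Header-based: X-Version: v2 selects the new service version.
            if headers.get("X-Version") == "v2":
                return f"{service}-v2"
            return service
    return None  # no rule matched; a gateway would typically answer 404
```

For example, route("/api/users/42", {}) selects the user service, while the same path with X-Version: v2 selects its newer version.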

In conversation

  • “We route by path prefix — /payments/ goes to the Payment Service.”
  • “The gateway rewrites the path before forwarding to the backend.”
  • “Traffic is split 95/5 between the stable and canary releases.”
  • “We use host-based routing to serve different tenants from the same gateway.”
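
Two of the phrases above, splitting traffic and rewriting the path, can be sketched in a few lines. The function names, the 5% default, and the /v1/ internal prefix are hypothetical; the random generator is passed in so the split can be seeded in tests.

```python
import random

def pick_release(rng: random.Random, canary_percent: float = 5.0) -> str:
    """Choose 'stable' or 'canary' with the given percentage going to canary."""
    weights = [100 - canary_percent, canary_percent]
    return rng.choices(["stable", "canary"], weights=weights)[0]

def rewrite_path(path: str, public_prefix: str, internal_prefix: str) -> str:
    """Swap the public prefix for the prefix the backend expects."""
    if path.startswith(public_prefix):
        return internal_prefix + path[len(public_prefix):]
    return path  # leave unrelated paths untouched

rng = random.Random(0)
picks = [pick_release(rng) for _ in range(10_000)]
backend_path = rewrite_path("/payments/charge/1", "/payments/", "/v1/")
```

Over many requests roughly 5% of the picks land on the canary, and the backend receives /v1/charge/1 instead of the public path.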

Common Gateway Vocabulary Table

Term             Definition
upstream         The backend service the gateway calls
downstream       The client that calls the gateway
passthrough      Forwarding a request with minimal modification
circuit breaker  Stops routing to a failing backend after a threshold
sticky session   Routes requests from the same client to the same instance
egress           Traffic leaving the system
ingress          Traffic entering the system
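
Of the terms in the table, "circuit breaker" is the one that most benefits from an example. The sketch below counts consecutive failures, stops routing once a threshold is reached, and allows a trial request after a cool-down period. The class name, the defaults, and the injectable clock are assumptions, and real implementations usually track more states and metrics.

```python
import time

class CircuitBreaker:
    """Stop routing to a backend after `failure_threshold` consecutive failures."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (traffic flows)

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Once the cool-down has passed, let a trial request through.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open (or re-open) the circuit
```

Engineers say the breaker "trips" or "opens" when the threshold is reached and "closes" again once the backend recovers.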

Key Takeaways

  • API gateway — single entry point that handles routing, auth, rate limiting, and load balancing.
  • BFF — a dedicated gateway layer per client type (mobile, web, third-party).
  • Rate limiting — enforced per API key/tenant using fixed window, sliding window, token bucket, or leaky bucket.
  • Auth delegation — the gateway validates identity so services can trust its identity assertion.
  • Request routing — path-based, header-based, weighted, or content-based rules.
  • Key verbs: route, enforce, aggregate, validate, inject, strip, rewrite, split.