Distributed Rate Limiter
Problem statement
Enforce per-user / per-IP / per-API-key request limits (e.g. 100 req/min) globally, across many stateless API instances and regions.
How it works
Idea: Centralize the "tokens used" count in a fast shared store, or approximate with local counters and accept synchronization tradeoffs.
Analogy: A theme park wristband that only allows 5 rides per day: the gate (API) checks a central register (Redis) so you cannot cheat by entering different gates (servers).
High-level options
| Algorithm | Pros | Cons |
|---|---|---|
| Token bucket | Allows bursts, smooth | Need atomic Lua / RedisCell |
| Fixed window | Simple | Spike at window edges |
| Sliding window log | Accurate | Memory heavy |
| Sliding window counter | Good balance | Slightly more complex |
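To make the "good balance" row concrete, here is a minimal in-process sketch of a sliding window counter in Python. It estimates the rolling count by weighting the previous fixed window by its remaining overlap; the function name, the `counts` dict, and the parameters are illustrative (in the real design the counters live in Redis):

```python
import time

def sliding_window_allow(counts, key, limit, window, now=None):
    """Approximate sliding-window counter: weight the previous fixed
    window by how much of it still overlaps the sliding window."""
    now = time.time() if now is None else now
    cur_win = int(now // window)
    prev = counts.get((key, cur_win - 1), 0)
    cur = counts.get((key, cur_win), 0)
    # Fraction of the previous window still inside the sliding window.
    overlap = 1.0 - (now % window) / window
    estimated = prev * overlap + cur
    if estimated >= limit:
        return False  # over limit -> reject (429)
    counts[(key, cur_win)] = cur + 1
    return True
```

This avoids both the edge spike of a fixed window and the per-request memory of a full log, at the cost of being an estimate.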
High-level architecture
- Edge: AWS API Gateway usage plans, Azure API Management quotas — first line of defense.
- App: Redis with Lua script for atomic INCR + EXPIRE or token bucket.
- Global: Redis Global Datastore or regional limit + sync (weaker global guarantee).
Components explained — this design
| Component | What it is | Why we use it here |
|---|---|---|
| API Gateway throttling | Built-in per-client or per-key rate limits at the edge. | First line of defense against accidental or malicious traffic spikes before it consumes app CPU. |
| Microservice | Your domain API behind the gateway. | Executes business logic after coarse edge limits; may apply finer per-route weights. |
| Rate limit middleware | In-process or sidecar logic calling Redis. | Implements token bucket / sliding window with atomic updates (Lua) — impossible to do correctly with naive read-modify-write across pods. |
| Redis Cluster / KeyDB | In-memory data store with atomic commands and Lua. | Sub-ms increments and expirations; central view of counters across all replicas. KeyDB if you need multi-thread Redis compatibility. |
| Local token bucket (optional) | In-memory approximate limiter per instance. | Reduces Redis chatter for very hot endpoints; trade accuracy for cost/latency. |
Shared definitions: 00-glossary-common-services.md
Low-level design
Redis token bucket (sketch)
Lua ensures read-modify-write is atomic:
- tokens = GET, or default max.
- now = TIME.
- Refill tokens based on elapsed time.
- If tokens >= 1, decrement and allow; else return 429.
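A plain-Python sketch of those steps, for illustration only — in production this logic runs as one Lua script via `EVAL` so Redis executes it atomically; the `store` dict, function name, and parameters here are stand-ins:

```python
import time

def token_bucket_allow(store, key, max_tokens, refill_rate, now=None):
    """Token bucket refill + consume. `store` maps key -> (tokens, last_ts);
    in Redis this state would be a hash mutated inside a single Lua script."""
    now = time.time() if now is None else now
    tokens, last = store.get(key, (float(max_tokens), now))
    # Refill proportionally to elapsed time, capped at the bucket size.
    tokens = min(max_tokens, tokens + (now - last) * refill_rate)
    if tokens >= 1:
        store[key] = (tokens - 1, now)
        return True          # allow
    store[key] = (tokens, now)
    return False             # caller responds 429
```

The whole point of doing this in Lua server-side is that the read, refill, and decrement happen as one atomic operation; the naive GET-then-SET version of this code loses updates under concurrency.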
Service preference: ElastiCache Redis (managed), KeyDB multi-thread alternative.
API Gateway vs app-level
- When API Gateway is enough: uniform limits per API key; no complex rules.
- When app-level: per-endpoint cost weights, dynamic limits by subscription tier.
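One reason app-level limiting earns its keep is cost weighting: expensive endpoints consume more tokens per call than cheap ones. A small sketch, with hypothetical routes and weights:

```python
# Hypothetical per-route costs: a heavy search burns 5 tokens, a read burns 1.
ROUTE_COSTS = {"/search": 5, "/items": 1}

def weighted_consume(tokens_available, route, default_cost=1):
    """Deduct a route-specific cost instead of a flat 1 per request."""
    cost = ROUTE_COSTS.get(route, default_cost)
    if tokens_available >= cost:
        return True, tokens_available - cost
    return False, tokens_available
```

Gateway throttling typically cannot express this; the middleware can, because it knows the route and the caller's subscription tier.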
Multi-region tricky part
Problem: Redis in us-east does not see traffic in eu-west → user doubles quota.
Mitigations:
- Route users to home region (sticky routing) — simplest.
- CRDT / gossip approximate counters — complex.
- Central Redis with cross-region latency cost — acceptable for some APIs.
- Cell architecture — each cell has limiter; product accepts regional cells.
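The "sticky routing" mitigation can be as simple as a deterministic hash from user to home region, so one regional Redis sees all of that user's traffic. A sketch (the region list and function name are examples, not part of the design above):

```python
import hashlib

REGIONS = ["us-east", "eu-west", "ap-south"]  # example region list

def home_region(user_id, regions=REGIONS):
    """Deterministically pin a user to one region so a single
    regional limiter observes all of their requests."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return regions[int.from_bytes(digest[:4], "big") % len(regions)]
```

Note this trades latency for correctness: a user far from their home region pays a routing penalty, but never doubles their quota.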
E2E request flow
Tricky problems and solutions
| Problem | Solution |
|---|---|
| Race without Lua | Lost updates → use Lua or Redisson |
| Thundering herd on expiry | Jitter expiry; stagger windows |
| Clock skew | Prefer Redis TIME or monotonic server clock |
| Fairness vs burst | Token bucket for UX; leaky bucket for strict egress |
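The "jitter expiry" fix from the table is a one-liner worth spelling out: randomize each key's TTL slightly so hot counters don't all reset in the same instant. A sketch with illustrative defaults:

```python
import random

def jittered_expiry(base_seconds, jitter_fraction=0.1, rng=random):
    """Spread key expirations +/- jitter_fraction around the base TTL
    so many counters don't expire (and refill) simultaneously."""
    jitter = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-jitter, jitter)
```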
Caveats
- Exact global limits are hard; be honest in docs (eventual, best-effort).
- DDoS is not solved by app rate limiting alone — need WAF, Shield, CDN.
Azure / GCP mapping
- Redis: Azure Cache for Redis, Memorystore.
- Gateway limits: Azure APIM, Cloud Endpoints / Apigee.