Distributed Rate Limiter
Problem statement
Enforce per-user / per-IP / per-API-key request limits (e.g. 100 req/min) globally, across many stateless API instances and regions.
How it works
Idea: Centralize the "tokens used" count in a fast shared store, or approximate with local counters and accept synchronization tradeoffs.
Analogy: A theme park wristband that only allows 5 rides per day: the gate (API) checks a central register (Redis) so you cannot cheat by entering different gates (servers).
High-level options
| Algorithm | Pros | Cons |
|---|---|---|
| Token bucket | Allows bursts, smooth | Need atomic Lua / RedisCell |
| Fixed window | Simple | Spike at window edges |
| Sliding window log | Accurate | Memory heavy |
| Sliding window counter | Good balance | Slightly more complex |
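To make the "good balance" row concrete, here is a minimal in-process sketch of a sliding window counter in Python. It estimates the rolling count by weighting the previous fixed window by its remaining overlap; the function name, the `counts` dict, and the parameters are illustrative (in the real design the counters live in Redis):

```python
import time

def sliding_window_allow(counts, key, limit, window, now=None):
    """Approximate sliding-window counter: weight the previous fixed
    window by how much of it still overlaps the sliding window."""
    now = time.time() if now is None else now
    cur_win = int(now // window)
    prev = counts.get((key, cur_win - 1), 0)
    cur = counts.get((key, cur_win), 0)
    # Fraction of the previous window still inside the sliding window.
    overlap = 1.0 - (now % window) / window
    estimated = prev * overlap + cur
    if estimated >= limit:
        return False  # over limit -> reject (429)
    counts[(key, cur_win)] = cur + 1
    return True
```

This avoids both the edge spike of a fixed window and the per-request memory of a full log, at the cost of being an estimate.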
High-level architecture
- Edge: AWS API Gateway usage plans, Azure API Management quotas — first line of defense.
- App: Redis with Lua script for atomic INCR + EXPIRE or token bucket.
- Global: Redis Global Datastore or regional limit + sync (weaker global guarantee).
Components explained — this design
| Component | What it is | Why we use it here |
|---|---|---|
| API Gateway throttling | Built-in per-client or per-key rate limits at the edge. | First line of defense against accidental or malicious traffic spikes before it consumes app CPU. |
| Microservice | Your domain API behind the gateway. | Executes business logic after coarse edge limits; may apply finer per-route weights. |
| Rate limit middleware | In-process or sidecar logic calling Redis. | Implements token bucket / sliding window with atomic updates (Lua) — impossible to do correctly with naive read-modify-write across pods. |
| Redis Cluster / KeyDB | In-memory data store with atomic commands and Lua. | Sub-ms increments and expirations; central view of counters across all replicas. KeyDB if you need multi-thread Redis compatibility. |
| Local token bucket (optional) | In-memory approximate limiter per instance. | Reduces Redis chatter for very hot endpoints; trade accuracy for cost/latency. |
Shared definitions: 00-glossary-common-services.md
Low-level design
Redis token bucket (sketch)
Lua ensures read-modify-write is atomic:
- tokens = GET, or default max.
- now = TIME.
- Refill tokens based on elapsed time.
- If tokens >= 1, decrement and allow; else return 429.
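A plain-Python sketch of those steps, for illustration only — in production this logic runs as one Lua script via `EVAL` so Redis executes it atomically; the `store` dict, function name, and parameters here are stand-ins:

```python
import time

def token_bucket_allow(store, key, max_tokens, refill_rate, now=None):
    """Token bucket refill + consume. `store` maps key -> (tokens, last_ts);
    in Redis this state would be a hash mutated inside a single Lua script."""
    now = time.time() if now is None else now
    tokens, last = store.get(key, (float(max_tokens), now))
    # Refill proportionally to elapsed time, capped at the bucket size.
    tokens = min(max_tokens, tokens + (now - last) * refill_rate)
    if tokens >= 1:
        store[key] = (tokens - 1, now)
        return True          # allow
    store[key] = (tokens, now)
    return False             # caller responds 429
```

The whole point of doing this in Lua server-side is that the read, refill, and decrement happen as one atomic operation; the naive GET-then-SET version of this code loses updates under concurrency.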
Service preference: ElastiCache Redis (managed), KeyDB multi-thread alternative.
API Gateway vs app-level
- When API Gateway is enough: uniform limits per API key; no complex rules.
- When app-level: per-endpoint cost weights, dynamic limits by subscription tier.
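One reason app-level limiting earns its keep is cost weighting: expensive endpoints consume more tokens per call than cheap ones. A small sketch, with hypothetical routes and weights:

```python
# Hypothetical per-route costs: a heavy search burns 5 tokens, a read burns 1.
ROUTE_COSTS = {"/search": 5, "/items": 1}

def weighted_consume(tokens_available, route, default_cost=1):
    """Deduct a route-specific cost instead of a flat 1 per request."""
    cost = ROUTE_COSTS.get(route, default_cost)
    if tokens_available >= cost:
        return True, tokens_available - cost
    return False, tokens_available
```

Gateway throttling typically cannot express this; the middleware can, because it knows the route and the caller's subscription tier.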
Multi-region tricky part
Problem: Redis in us-east does not see traffic in eu-west → user doubles quota.
Mitigations:
- Route users to home region (sticky routing) — simplest.
- CRDT / gossip approximate counters — complex.
- Central Redis with cross-region latency cost — acceptable for some APIs.
- Cell architecture — each cell has limiter; product accepts regional cells.
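The "sticky routing" mitigation can be as simple as a deterministic hash from user to home region, so one regional Redis sees all of that user's traffic. A sketch (the region list and function name are examples, not part of the design above):

```python
import hashlib

REGIONS = ["us-east", "eu-west", "ap-south"]  # example region list

def home_region(user_id, regions=REGIONS):
    """Deterministically pin a user to one region so a single
    regional limiter observes all of their requests."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return regions[int.from_bytes(digest[:4], "big") % len(regions)]
```

Note this trades latency for correctness: a user far from their home region pays a routing penalty, but never doubles their quota.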
E2E request flow
Tricky problems and solutions
| Problem | Solution |
|---|---|
| Race without Lua | Lost updates → use Lua or Redisson |
| Thundering herd on expiry | Jitter expiry; stagger windows |
| Clock skew | Prefer Redis TIME or monotonic server clock |
| Fairness vs burst | Token bucket for UX; leaky bucket for strict egress |
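The "jitter expiry" fix from the table is a one-liner worth spelling out: randomize each key's TTL slightly so hot counters don't all reset in the same instant. A sketch with illustrative defaults:

```python
import random

def jittered_expiry(base_seconds, jitter_fraction=0.1, rng=random):
    """Spread key expirations +/- jitter_fraction around the base TTL
    so many counters don't expire (and refill) simultaneously."""
    jitter = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-jitter, jitter)
```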
Caveats
- Exact global limits are hard; be honest in docs (eventual, best-effort).
- DDoS is not solved by app rate limiting alone — need WAF, Shield, CDN.
Azure / GCP mapping
- Redis: Azure Cache for Redis, Memorystore.
- Gateway limits: Azure APIM, Cloud Endpoints / Apigee.