
Distributed Rate Limiter

Problem statement

Enforce per-user / per-IP / per-API-key request limits globally (e.g. 100 req/min) across many stateless API instances and regions.

How it works

Idea: Centralize the "how many tokens have been used" count in a fast shared store, or approximate with local counters plus periodic sync, trading accuracy for latency and cost.

Analogy: a theme park wristband that allows only 5 rides per day: every gate (API instance) checks a central register (Redis), so you cannot cheat by entering through different gates (servers).

High-level options

| Algorithm | Pros | Cons |
| --- | --- | --- |
| Token bucket | Allows bursts, smooth | Needs atomic Lua / RedisCell |
| Fixed window | Simple | Spikes at window edges |
| Sliding window log | Accurate | Memory heavy |
| Sliding window counter | Good balance | Slightly more complex |
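The sliding window counter's "good balance" comes from one cheap formula: weight the previous fixed window's count by how much of it the sliding window still covers, then add the current window's count. A minimal sketch (the function name and parameters are illustrative, not from any library):

```python
def sliding_window_estimate(prev_count, curr_count, elapsed_in_window, window):
    """Estimate requests in the last `window` seconds from two fixed-window
    counters: weight the previous window by the fraction of it still inside
    the sliding window, then add the current window's count."""
    weight = (window - elapsed_in_window) / window
    return prev_count * weight + curr_count

# 30s into a 60s window: half the previous window still counts.
# 90 * 0.5 + 20 = 65 estimated requests in the last minute.
```

This needs only two counters per key (versus one timestamp per request for the sliding window log), which is where the memory saving comes from.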

High-level architecture

  • Edge: AWS API Gateway usage plans, Azure API Management quotas — first line of defense.
  • App: Redis with Lua script for atomic INCR + EXPIRE or token bucket.
  • Global: Redis Global Datastore or regional limit + sync (weaker global guarantee).

Components explained — this design

| Component | What it is | Why we use it here |
| --- | --- | --- |
| API Gateway throttling | Built-in per-client or per-key rate limits at the edge. | First line of defense against accidental or malicious traffic spikes before they consume app CPU. |
| Microservice | Your domain API behind the gateway. | Executes business logic after coarse edge limits; may apply finer per-route weights. |
| Rate limit middleware | In-process or sidecar logic calling Redis. | Implements token bucket / sliding window with atomic updates (Lua) — impossible to do correctly with naive read-modify-write across pods. |
| Redis Cluster / KeyDB | In-memory data store with atomic commands and Lua. | Sub-ms increments and expirations; central view of counters across all replicas. KeyDB if you need multi-threaded Redis compatibility. |
| Local token bucket (optional) | In-memory approximate limiter per instance. | Reduces Redis chatter for very hot endpoints; trades accuracy for cost/latency. |
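The optional local token bucket from the table can be sketched as a small in-process class; since it never talks to Redis it is only approximate across instances. All names here are illustrative (the injectable clock exists purely to make the refill logic testable):

```python
import time

class LocalTokenBucket:
    """Approximate in-process limiter: zero network hops, but each
    instance sees only its own traffic, so pair it with a shared
    Redis-backed limiter when global accuracy matters."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A common pattern is to check the local bucket first and only call Redis when the local check passes, cutting Redis load on hot endpoints.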

Shared definitions: 00-glossary-common-services.md

Low-level design

Redis token bucket (sketch)

Lua ensures read-modify-write is atomic:

  1. tokens = GET or default max.
  2. now = TIME.
  3. Refill tokens based on elapsed time.
  4. If tokens >= 1, decrement and allow; otherwise reject with HTTP 429.
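The four steps above reduce to a pure function. A Python sketch of the logic the Lua script would execute atomically inside Redis (names and the `(tokens, last_timestamp)` state tuple are illustrative; in Redis this state lives in a key and the whole function runs as one EVAL):

```python
def token_bucket_check(state, now, max_tokens, refill_rate):
    """One atomic check, mirroring the Lua steps:
    1) load (tokens, last) or default to a full bucket,
    2) take the current time,
    3) refill by elapsed time, capped at max_tokens,
    4) consume one token and allow, or reject (caller returns 429).
    Returns (allowed, new_state)."""
    tokens, last = state if state is not None else (max_tokens, now)
    tokens = min(max_tokens, tokens + (now - last) * refill_rate)
    if tokens >= 1:
        return True, (tokens - 1, now)
    return False, (tokens, now)
```

Running this read-modify-write as a single Lua script is what prevents two API pods from both reading `tokens = 1` and both allowing a request.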

Service preference: ElastiCache Redis (managed), KeyDB multi-thread alternative.

API Gateway vs app-level

  • When API Gateway is enough: uniform limits per API key; no complex rules.
  • When app-level: per-endpoint cost weights, dynamic limits by subscription tier.
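"Per-endpoint cost weights" usually means consuming more than one token per request, so an expensive route drains the budget faster. A minimal sketch (the route table and weights are invented for illustration):

```python
# Illustrative weights: an expensive search costs 5 tokens, a cheap
# health check costs 1.
ROUTE_COST = {"/search": 5, "/status": 1}

def consume(tokens, route, default_cost=1):
    """Deduct the route's weight from the caller's remaining tokens.
    Returns (allowed, remaining_tokens)."""
    cost = ROUTE_COST.get(route, default_cost)
    if tokens >= cost:
        return True, tokens - cost
    return False, tokens
```

Dynamic limits by subscription tier work the same way: look up `max_tokens` or the refill rate per tier instead of hard-coding them.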

Multi-region tricky part

Problem: Redis in us-east does not see traffic hitting eu-west, so a user whose requests land in both regions can consume roughly double the quota.

Mitigations:

  1. Route users to home region (sticky routing) — simplest.
  2. CRDT / gossip approximate counters — complex.
  3. Central Redis with cross-region latency cost — acceptable for some APIs.
  4. Cell architecture — each cell has limiter; product accepts regional cells.
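Mitigation 1, sticky routing, typically reduces to a stable hash of the user key onto a home region, so one regional Redis sees all of that user's traffic. A minimal sketch (the region list is illustrative):

```python
import hashlib

REGIONS = ["us-east-1", "eu-west-1"]  # illustrative region list

def home_region(user_key):
    """Stable hash: the same user key always maps to the same region,
    regardless of which edge location received the request."""
    digest = hashlib.sha256(user_key.encode()).digest()
    return REGIONS[int.from_bytes(digest[:8], "big") % len(REGIONS)]
```

In practice the routing layer (DNS, gateway, or CDN) applies this mapping; the limiter itself stays single-region.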

E2E request flow

Client → API Gateway edge throttle → rate limit middleware (atomic Redis check) → 200 (proceed to business logic) or 429 (reject).
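The end-to-end flow can be sketched as a chain of checks. All names are illustrative, and the shared Redis counter is stubbed as an in-memory dict to keep the sketch self-contained:

```python
def handle_request(api_key, edge_allowed, counters, limit):
    """Two-stage check: the edge gateway has already applied its coarse
    limit (edge_allowed); the app middleware then applies the fine-grained
    shared counter before business logic runs. Returns an HTTP status."""
    if not edge_allowed:
        return 429  # rejected at the gateway, app CPU untouched
    counters[api_key] = counters.get(api_key, 0) + 1  # atomic in real Redis
    if counters[api_key] > limit:
        return 429  # rejected by the app-level limiter
    return 200      # proceed to business logic
```

The ordering matters: the cheap edge check shields the app, and the shared counter enforces the precise per-key limit across all instances.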

Tricky problems and solutions

| Problem | Solution |
| --- | --- |
| Race without Lua | Lost updates → use Lua or Redisson |
| Thundering herd on expiry | Jitter expiry; stagger windows |
| Clock skew | Prefer Redis TIME or a monotonic server clock |
| Fairness vs burst | Token bucket for UX; leaky bucket for strict egress |
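For the thundering-herd row: jittering the TTL means not every key's window resets in the same instant. A minimal sketch (the 10% jitter fraction is an assumption, and the injectable `rng` exists only for testability):

```python
import random

def jittered_ttl(base_seconds, jitter_fraction=0.1, rng=random.random):
    """Stretch each expiry by a random fraction so counters for many keys
    don't all reset, and refill, at the same moment."""
    return base_seconds * (1 + jitter_fraction * rng())
```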

Caveats

  • Exact global limits are hard; be honest in docs (eventual, best-effort).
  • DDoS is not solved by app rate limiting alone — need WAF, Shield, CDN.

Azure / GCP mapping

  • Redis: Azure Cache for Redis, Memorystore.
  • Gateway limits: Azure APIM, Cloud Endpoints / Apigee.