Ride Hailing (e.g. Uber / Lyft)
Problem statement
Match riders to drivers in real time, estimate ETA and fare, handle surge pricing, payments, ratings, and safety events at city scale.
How it works
- Drivers broadcast GPS periodically → location ingestion updates a geospatial index.
- Rider requests ride → matching service finds nearby available drivers (radius + ETA).
- Dispatch assigns the best driver; both apps get WebSocket updates as the trip state machine advances.
- Trip ends → billing + receipt + analytics.
Analogy: A restaurant pager system: the kitchen (matching) sees who is waiting (riders) and which tables are free (drivers), then buzzes the right pair.
High-level design
Components explained — this design
| Component | What it is | Why we use it here |
|---|---|---|
| API Gateway + WAF | Edge HTTP + firewall (see glossary). | Protects matching endpoints from scrapers; authenticates riders/drivers via JWT. |
| Location ingestion → Kafka | Stream of GPS updates from drivers. | High volume append stream; Kafka gives buffer + replay if downstream geo index lags. |
| Redis GEO / PostGIS | Geospatial indexes for “nearby drivers”. | Redis GEO for sub-second radius queries at very high QPS; PostGIS if you already run PostgreSQL and operate at moderate scale. |
| Matching / Dispatch | Scores candidates by ETA, surge, rating. | Core marketplace brain; isolated service to scale CPU-heavy routing independently. |
| Trip state service | State machine for ride lifecycle. | Clear ownership of transitions; emits events to billing and analytics. |
| PostgreSQL | Relational store for trips, users, and payment references. | ACID for money-adjacent records and referential integrity between trip and payment. |
| Time-series DB | Optimized store for metrics-like location traces. | Optional for heatmaps and fraud investigation without bloating OLTP schema. |
Shared definitions: 00-glossary-common-services.md
Low-level design
Location pipeline
- Mobile → batched HTTPS or MQTT to save battery; AWS IoT Core is an optional managed MQTT broker.
- Stream: Amazon Kinesis / Kafka, partitioned by driver_id so each driver's updates stay ordered.
- Hot geospatial: Redis GEOADD + GEORADIUS (GEOSEARCH on Redis 6.2+, where GEORADIUS is deprecated) for sub-second nearby queries at high QPS.
- Cold path: S3 + Athena for historical heatmaps / city planning.
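The hot-path radius lookup can be illustrated without a Redis server. This is a minimal sketch, assuming an in-memory dict in place of the Redis GEO set and a linear scan in place of its geohash-ordered index. `DriverIndex`, `upsert`, and `nearby` are hypothetical names, not part of any library.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


class DriverIndex:
    """Hypothetical in-memory stand-in for a Redis GEO set keyed by driver_id."""

    def __init__(self):
        self._positions = {}  # driver_id -> (lat, lon)

    def upsert(self, driver_id, lat, lon):
        # Plays the role of GEOADD: the latest position overwrites the previous one.
        self._positions[driver_id] = (lat, lon)

    def nearby(self, lat, lon, radius_km):
        # Plays the role of a radius query; a linear scan is fine for a sketch only.
        hits = []
        for driver_id, (dlat, dlon) in self._positions.items():
            distance = haversine_km(lat, lon, dlat, dlon)
            if distance <= radius_km:
                hits.append((driver_id, distance))
        return sorted(hits, key=lambda pair: pair[1])  # closest first


index = DriverIndex()
index.upsert("d1", 37.7749, -122.4194)  # same point as the rider
index.upsert("d2", 37.7849, -122.4094)  # roughly 1.4 km away
index.upsert("d3", 37.3382, -121.8863)  # San Jose, far outside the radius
nearby_ids = [driver_id for driver_id, _ in index.nearby(37.7749, -122.4194, 3.0)]
```

In production the scan would be replaced by the Redis radius query; the ingestion consumer would call the equivalent of `upsert` for every GPS update it reads from the stream.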
Matching
- Criteria: distance, ETA from Mapbox / Google Routes API or internal Valhalla graph.
- Surge: Redis counter by geohash cell; pricing service reads multiplier.
- Fairness: two-sided marketplace — avoid starvation with aging in match queue.
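The ranking and surge ideas above can be sketched as follows. Everything here is an assumption made for illustration: the weights, the demand/supply ratio rule, the cap of 3.0, and the names `Candidate`, `surge_multiplier`, `score`, and `rank`. A real marketplace would tune or learn these.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    driver_id: str
    eta_s: float   # pickup ETA in seconds, e.g. from a routing API
    rating: float  # 1.0 to 5.0
    idle_s: float  # seconds the driver has waited without an offer


def surge_multiplier(open_requests, available_drivers, cap=3.0):
    """Hypothetical rule: demand/supply ratio in one geohash cell, clamped to [1, cap]."""
    if available_drivers <= 0:
        return cap
    return max(1.0, min(cap, open_requests / available_drivers))


def score(candidate, aging_weight=0.5, rating_weight=30.0):
    """Lower is better: ETA, minus a rating bonus, minus an aging credit for idle time."""
    rating_bonus = rating_weight * (candidate.rating - 4.0)
    aging_credit = aging_weight * candidate.idle_s
    return candidate.eta_s - rating_bonus - aging_credit


def rank(candidates):
    """Return candidates ordered best first."""
    return sorted(candidates, key=score)


candidates = [
    Candidate("a", eta_s=180, rating=4.8, idle_s=0),
    Candidate("b", eta_s=240, rating=4.8, idle_s=300),  # farther away but long idle
    Candidate("c", eta_s=200, rating=4.0, idle_s=0),
]
ranked_ids = [c.driver_id for c in rank(candidates)]
```

The aging credit is what prevents starvation: driver `b` is a minute farther away than `a` but has waited five minutes, so `b` ranks first.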
Trip state machine
States: REQUESTED → MATCHED → EN_ROUTE → IN_PROGRESS → COMPLETED, with CANCELLED as the alternative terminal state.
Use event sourcing in Kafka with a CQRS read model in PostgreSQL for audits.
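A transition table makes the lifecycle explicit and rejects illegal jumps. This is a minimal sketch, assuming cancellation is allowed from every non-terminal state (the notes do not say which states may cancel) and using an in-memory list as a stand-in for the Kafka event log. `Trip`, `ALLOWED`, and `InvalidTransition` are hypothetical names.

```python
from enum import Enum


class TripState(Enum):
    REQUESTED = "REQUESTED"
    MATCHED = "MATCHED"
    EN_ROUTE = "EN_ROUTE"
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"
    CANCELLED = "CANCELLED"


# Assumption: any non-terminal state may move to CANCELLED.
ALLOWED = {
    TripState.REQUESTED: {TripState.MATCHED, TripState.CANCELLED},
    TripState.MATCHED: {TripState.EN_ROUTE, TripState.CANCELLED},
    TripState.EN_ROUTE: {TripState.IN_PROGRESS, TripState.CANCELLED},
    TripState.IN_PROGRESS: {TripState.COMPLETED, TripState.CANCELLED},
    TripState.COMPLETED: set(),
    TripState.CANCELLED: set(),
}


class InvalidTransition(Exception):
    """Raised when a transition is not in the ALLOWED table."""


class Trip:
    def __init__(self, trip_id):
        self.trip_id = trip_id
        self.state = TripState.REQUESTED
        self.events = []  # append-only log; stand-in for a Kafka topic

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise InvalidTransition(f"{self.state.value} -> {new_state.value}")
        # Record the event first, then apply it, so the log is the source of truth.
        self.events.append((self.trip_id, self.state.value, new_state.value))
        self.state = new_state


trip = Trip("trip-1")
trip.transition(TripState.MATCHED)
trip.transition(TripState.EN_ROUTE)
```

Replaying `events` in order rebuilds the current state, which is the property the audit read model relies on.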
Payments
- Stripe Connect for marketplace splits (platform fee + driver payout).
- PCI: never store PAN; use tokenization.
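The split arithmetic is worth doing in integer minor units so the two parts always sum to the fare. This is a sketch of the calculation only, not of any Stripe API; `split_fare` and the 25% fee in the example are hypothetical.

```python
def split_fare(fare_cents, platform_fee_bps):
    """Split a fare into (platform_fee, driver_payout), both in integer cents.

    platform_fee_bps is the fee in basis points (2500 = 25%).
    The fee is rounded half up; the driver receives the remainder,
    so fee + payout always equals the fare exactly.
    """
    if fare_cents < 0:
        raise ValueError("fare_cents must be non-negative")
    if not 0 <= platform_fee_bps <= 10_000:
        raise ValueError("platform_fee_bps must be between 0 and 10000")
    fee = (fare_cents * platform_fee_bps + 5_000) // 10_000
    return fee, fare_cents - fee


fee, payout = split_fare(2350, 2500)  # $23.50 fare, hypothetical 25% platform fee
```

Avoiding floats here sidesteps rounding drift between the platform's ledger and the payout the driver sees.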
Notifications
- SNS + FCM/APNs push the ride offer to the driver; the offer times out after a 10s accept window.
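The 10s accept window can be enforced server-side regardless of when the push arrives. This is a minimal sketch, assuming the dispatcher records when the offer was sent and passes in the current time. `Offer`, `resolve_offer`, and the `"ASSIGNED"` / `"REOFFER"` outcomes are hypothetical names.

```python
from dataclasses import dataclass

ACCEPT_WINDOW_S = 10.0  # accept window from the notes


@dataclass
class Offer:
    trip_id: str
    driver_id: str
    sent_at: float  # seconds on the dispatcher's clock

    def is_expired(self, now):
        return now - self.sent_at > ACCEPT_WINDOW_S


def resolve_offer(offer, accepted, now):
    """Return 'ASSIGNED' for an in-window accept, else 'REOFFER' to try the next driver."""
    if accepted and not offer.is_expired(now):
        return "ASSIGNED"
    # Declined, or accepted too late: the offer goes to the next ranked candidate.
    return "REOFFER"


offer = Offer(trip_id="trip-1", driver_id="d1", sent_at=100.0)
on_time = resolve_offer(offer, accepted=True, now=105.0)
too_late = resolve_offer(offer, accepted=True, now=111.0)
```

Passing `now` in explicitly keeps the rule testable; the server's clock, not the phone's, decides whether an accept counts.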
E2E: request to match
Tricky parts
| Problem | Solution |
|---|---|
| Thundering herd at bar close | Shard surge by geohash; capacity limits on dispatch |
| Split-brain double assign | Optimistic locking on driver row version; idempotent accept |
| Ghost drivers offline but “available” | Heartbeat TTL in Redis; auto-offline |
| Regulatory | Per-city feature flags; data residency (EU trips in EU region) |
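The double-assign fix in the table can be sketched as a compare-and-set on a version number. This is an illustration under stated assumptions: an in-memory dict stands in for the driver table, and `DriverStore`, `read`, and `assign` are hypothetical names. In a real database the check and the write would be one atomic `UPDATE ... WHERE version = ?` statement.

```python
class DriverStore:
    """Hypothetical in-memory stand-in for a driver table with a version column."""

    def __init__(self):
        self._rows = {}  # driver_id -> {"version": int, "trip_id": str or None}

    def add(self, driver_id):
        self._rows[driver_id] = {"version": 0, "trip_id": None}

    def read(self, driver_id):
        return dict(self._rows[driver_id])  # copy, like a snapshot read

    def assign(self, driver_id, trip_id, expected_version):
        """Assign the driver only if nobody changed the row since it was read."""
        row = self._rows[driver_id]
        if row["trip_id"] == trip_id:
            return True  # idempotent: a retried accept for the same trip succeeds
        if row["version"] != expected_version or row["trip_id"] is not None:
            return False  # lost the race; the caller must re-read and re-match
        row["trip_id"] = trip_id
        row["version"] += 1
        return True


store = DriverStore()
store.add("d1")
seen_by_a = store.read("d1")["version"]  # dispatcher A reads version 0
seen_by_b = store.read("d1")["version"]  # dispatcher B reads version 0 too
a_won = store.assign("d1", "trip-A", seen_by_a)
b_won = store.assign("d1", "trip-B", seen_by_b)  # stale version, rejected
a_retry = store.assign("d1", "trip-A", seen_by_a)  # network retry, still fine
```

Only one dispatcher can bump the version from 0 to 1, so the driver ends up on exactly one trip even when both dispatchers acted on the same snapshot.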
Caveats
- Map APIs cost money — cache static road segments; refresh dynamic traffic more often.
- Safety: 911 integration, trip sharing, in-app emergency — separate highly available microservice.
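The map-API caveat amounts to giving static and dynamic data different cache lifetimes. This is a minimal sketch, assuming a 24-hour TTL for road segments and a 60-second TTL for traffic (both numbers are hypothetical) and an injectable clock so the expiry logic can be tested. `TTLCache` is a hypothetical name.

```python
import time

STATIC_TTL_S = 24 * 3600  # hypothetical: road geometry changes rarely
TRAFFIC_TTL_S = 60        # hypothetical: traffic goes stale within a minute


class TTLCache:
    """Tiny per-entry TTL cache; a stand-in for Redis keys with expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def put(self, key, value, ttl_s):
        self._entries[key] = (value, self._clock() + ttl_s)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: the caller should hit the map API again
            return None
        return value


now = [0.0]  # mutable fake clock for the example
cache = TTLCache(clock=lambda: now[0])
cache.put("segment:42:length_m", 120, STATIC_TTL_S)
cache.put("segment:42:speed_factor", 1.4, TRAFFIC_TTL_S)
now[0] = 61.0  # one minute later
stale_traffic = cache.get("segment:42:speed_factor")
cached_length = cache.get("segment:42:length_m")
```

After a minute only the traffic entry needs a paid API call; the road segment keeps being served from cache.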
Azure mapping
- Event Hubs for the location stream; Azure Cache for Redis for geospatial (GEO) queries; Azure Maps for routing.