Recommendation Engine (Homepage / “You may also like”)
Problem statement
Rank millions of items per user in under 100 ms using behavioral signals, handle cold start, and keep results fresh, all without filter bubbles becoming a PR issue.
How it works
- Retrieval: cheap candidate set (hundreds) via embedding ANN, co-visitation, graph walks.
- Ranking: heavier model scores candidates with context features.
- Re-rank: diversity and business rules (e.g., downrank out-of-stock items).
Analogy: a restaurant menu. The appetizer tray (retrieval) brings out eight bites; the chef (ranker) picks the best three given your known allergies; the manager (re-rank) makes sure a vegetarian option stays visible.
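The three stages above can be sketched as a minimal pipeline. Everything here is a placeholder: candidate generation and scoring are stubs standing in for the ANN service and the ranker.

```python
import random

def retrieve(user_id, n=200):
    # Cheap candidate generation: stand-in for ANN / co-visitation / graph walks.
    return [f"item_{i}" for i in range(n)]

def rank(user_id, candidates):
    # Heavier model scoring each candidate; random scores as a placeholder.
    return sorted(candidates, key=lambda item: random.random(), reverse=True)

def rerank(ranked, out_of_stock, k=10):
    # Business rules: push out-of-stock items to the back, then take the top k.
    in_stock = [i for i in ranked if i not in out_of_stock]
    return (in_stock + [i for i in ranked if i in out_of_stock])[:k]

recs = rerank(rank("u1", retrieve("u1")), out_of_stock={"item_3"})
```

The key structural point survives even in a toy: each stage narrows the set (millions, then hundreds, then a page), so the expensive model only ever sees a small candidate list.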
High-level design
(architecture diagram omitted)
Components explained — this design
| Component | What it is | Why we use it here |
|---|---|---|
| Clickstream Kafka | Raw user behavior events. | Durable input to both batch training and nearline features. |
| Flink feature pipeline | Real-time aggregations (CTR windows). | Powers fresh signals beyond nightly batch tables. |
| Feature Store (Redis + Parquet) | Online low-latency features + offline training snapshots. | Training-serving skew reduction by sharing definitions. |
| ANN service (FAISS/ScaNN) | Approximate nearest neighbor retrieval. | Millions of candidate items can’t be scored by heavy ranker; retrieve hundreds fast. |
| Ranker endpoint | XGBoost / neural ranker on candidates. | Adds contextual scoring using dense features. |
Shared definitions: 00-glossary-common-services.md
Low-level design
Feature store
- Online: Redis / DynamoDB for low-latency user features (`last_category`, `ctr_7d_bucket`).
- Offline: Snowflake / BigQuery for training joins.
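A sketch of the online read path, with a plain dict standing in for Redis and a hypothetical `user:{id}:features` key layout; the default values for cold-start users are also illustrative.

```python
# Dict stands in for the Redis online store; keys follow a hypothetical
# "user:{id}:features" layout shared with the offline feature definitions.
ONLINE_STORE = {
    "user:42:features": {"last_category": "shoes", "ctr_7d_bucket": 3},
}

# Cold-start fallbacks keep the ranker's input schema stable.
DEFAULTS = {"last_category": "unknown", "ctr_7d_bucket": 0}

def get_online_features(user_id: int) -> dict:
    # Single low-latency lookup; missing users fall back to defaults.
    stored = ONLINE_STORE.get(f"user:{user_id}:features", {})
    return {**DEFAULTS, **stored}

feats = get_online_features(42)   # known user
cold = get_online_features(999)   # cold-start user gets defaults
```

Sharing the feature definitions (names, defaults, bucketing) between this online path and the offline training join is what reduces training-serving skew.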
ANN index
- FAISS IVF+PQ for memory efficiency; periodic rebuild from nightly embeddings.
- Two-tower model: user tower + item tower cosine similarity.
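A numpy sketch of two-tower retrieval. The embeddings are random placeholders, and exact cosine search stands in for the FAISS IVF+PQ index a production system would use at this scale.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_ITEMS = 64, 10_000

# Item-tower output, L2-normalized so a dot product equals cosine similarity.
item_emb = rng.standard_normal((N_ITEMS, DIM)).astype(np.float32)
item_emb /= np.linalg.norm(item_emb, axis=1, keepdims=True)

def retrieve_topk(user_vec: np.ndarray, k: int = 200) -> np.ndarray:
    # User-tower output scored against all items; exact search here,
    # approximate (IVF+PQ) in production for memory and latency.
    u = user_vec / np.linalg.norm(user_vec)
    scores = item_emb @ u
    return np.argpartition(-scores, k)[:k]  # top-k item ids, unordered

cands = retrieve_topk(rng.standard_normal(DIM).astype(np.float32))
```

Because both towers map into the same normalized space, the nightly rebuild only needs the item tower's outputs; user vectors can be computed at request time.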
Exploration
- Epsilon-greedy or Thompson sampling bandit layer for cold items.
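The epsilon-greedy variant is the simpler of the two and can be sketched in a few lines (slot choice and epsilon value are illustrative); Thompson sampling would instead sample from per-item Beta posteriors over click-through rate.

```python
import random

def epsilon_greedy_slot(ranked, cold_pool, epsilon=0.1):
    # With probability epsilon, swap the last slot for a cold item
    # so new inventory gets impressions and feedback.
    result = list(ranked)
    if cold_pool and random.random() < epsilon:
        result[-1] = random.choice(sorted(cold_pool))
    return result

page = ["a", "b", "c"]
always = epsilon_greedy_slot(page, {"new1"}, epsilon=1.0)  # forced exploration
never = epsilon_greedy_slot(page, {"new1"}, epsilon=0.0)   # pure exploitation
```

Using only the last slot keeps the exploration cost bounded: the user's top results are untouched while cold items still accumulate the clicks needed to enter the main ranker's training data.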
Fairness
- Demographic parity constraints in re-rank; audit dashboards per segment.
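One concrete way to enforce a parity-style constraint in the re-rank stage is a per-segment quota, sketched below; the 20% share, segment labels, and greedy fill are all illustrative, and production constraints are usually richer.

```python
def parity_rerank(ranked, segment_of, min_share=0.2, k=10):
    # First pass: reserve quota slots per segment from its best-ranked items.
    quota = max(1, int(min_share * k))
    picked = []
    for seg in sorted(set(segment_of.values())):
        picked.extend([i for i in ranked if segment_of[i] == seg][:quota])
    # Second pass: fill remaining slots in original score order.
    for item in ranked:
        if len(picked) >= k:
            break
        if item not in picked:
            picked.append(item)
    # Present the final page in original rank order.
    return sorted(picked[:k], key=ranked.index)

ranked = [f"i{j}" for j in range(20)]
segment_of = {f"i{j}": ("A" if j < 15 else "B") for j in range(20)}
page = parity_rerank(ranked, segment_of)
```

With these inputs the unconstrained top 10 would contain no segment-B items; the quota guarantees B at least two slots, which is exactly the kind of per-segment outcome the audit dashboards would track.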
E2E: homepage request
(request-flow diagram omitted)
Tricky parts
| Problem | Solution |
|---|---|
| Filter bubble | Inject exploration + editorial slots |
| Latency SLA | Timeout budget per stage; degrade to popularity baseline |
| Privacy | Federated learning optional; DP noise on sensitive features |
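The latency row above can be sketched as a per-stage budget with a popularity fallback. The budget split and the overrun-after-the-fact check are illustrative; a real serving path would cancel in-flight work rather than measure it afterwards.

```python
import time

# Illustrative split of a 100 ms SLA across stages.
STAGE_BUDGET_MS = {"retrieve": 30, "rank": 50, "rerank": 10}

POPULARITY_BASELINE = ["top_1", "top_2", "top_3"]  # precomputed fallback page

def with_budget(stage, fn, *args):
    # Run the stage; on error or budget overrun, return None so the
    # caller can degrade instead of blowing the end-to-end SLA.
    start = time.monotonic()
    try:
        out = fn(*args)
    except Exception:
        return None
    elapsed_ms = (time.monotonic() - start) * 1000
    return out if elapsed_ms <= STAGE_BUDGET_MS[stage] else None

def homepage(user_id, retrieve, rank_fn):
    cands = with_budget("retrieve", retrieve, user_id)
    if cands is None:
        return POPULARITY_BASELINE  # degrade straight to the baseline
    ranked = with_budget("rank", rank_fn, user_id, cands)
    return ranked if ranked is not None else POPULARITY_BASELINE
```

The design choice worth noting: the fallback is precomputed and requires no model call, so the degraded path is always cheaper than the path that just failed.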
Caveats
- Feedback loops amplify clickbait; mitigate with human eval sets and offline metrics beyond CTR.
- Seasonality: add time-based features and set a regular retrain cadence.
Managed alternatives
- Amazon Personalize, Google Recommendations AI, Azure Personalizer.