URL Shortener (e.g. bit.ly)
Problem statement
Design a service that takes long URLs and returns short aliases. When users visit the short URL, they are redirected (HTTP 302/301) to the original URL. The system must be highly available for reads (redirects), low latency globally, and collision-free for codes.
Stretch goals: custom aliases, analytics (click counts, geography), expiration, abuse prevention.
How it works (plain English)
- User submits https://example.com/very/long/path?....
- Service generates a short code (e.g. 7 characters from [A-Za-z0-9]) or validates a custom alias.
- Store mapping code → long_url (and metadata).
- On GET /{code}, look up the long URL and return 302 (temporary) or 301 (permanent) redirect.
Analogy: A hotel gives you room 204 instead of saying “second floor, east wing, third door past the ice machine.” The number is the short code; the full path is the long URL.
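The steps above can be sketched in a few lines of Python. This is a toy, in-memory illustration only: `InMemoryShortener`, its dict-backed store, and the method names are assumptions made for the example, not part of the design itself.

```python
import secrets
import string
from typing import Dict, Optional, Tuple

ALPHABET = string.ascii_letters + string.digits  # [A-Za-z0-9], 62 symbols


class InMemoryShortener:
    """Toy stand-in for the service; a dict replaces the real datastore."""

    def __init__(self, code_length: int = 7) -> None:
        self.code_length = code_length
        self.mapping: Dict[str, str] = {}  # code -> long_url

    def create(self, long_url: str, custom_alias: Optional[str] = None) -> str:
        if custom_alias is not None:
            # Custom alias: accept it only if nobody holds it yet.
            if custom_alias in self.mapping:
                raise ValueError("alias already taken")
            code = custom_alias
        else:
            # Draw random codes until an unused one turns up.
            while True:
                code = "".join(
                    secrets.choice(ALPHABET) for _ in range(self.code_length)
                )
                if code not in self.mapping:
                    break
        self.mapping[code] = long_url
        return code

    def resolve(self, code: str) -> Tuple[int, Dict[str, str]]:
        """Return (status, headers) for GET /{code}."""
        long_url = self.mapping.get(code)
        if long_url is None:
            return 404, {}
        return 302, {"Location": long_url}
```

A real deployment replaces the dict with the datastore and cache described in the sections that follow; the control flow stays roughly the same.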
Requirements
| Type | Examples |
|---|---|
| Functional | Create short URL, resolve redirect, optional custom slug, optional TTL |
| Non-functional | Read-heavy, low p99 latency, high availability, horizontal scale |
High-level design
Components explained — this design
| Component | What it is | Why we chose it here |
|---|---|---|
| Browser / App | End user or API client. | Issues HTTPS requests; no business logic on device for redirect resolution beyond following redirects. |
| CloudFront / Akamai CDN | Content delivery network: caches responses at edge PoPs close to users; supports TLS, signed URLs. | Redirects are read-heavy and latency-sensitive globally; CDN can cache 301/302 responses for anonymous hits and shields origin from DDoS-scale traffic. |
| AWS WAF / Azure Front Door WAF | Web Application Firewall at the edge: IP reputation, rate rules, managed rule groups (OWASP-style). | Shorteners are abused for phishing; WAF reduces automated create/resolve abuse before it hits your app tier. |
| API Gateway | Managed HTTP front door: routing, JWT validation, usage plans / throttling, optional request validation. | Create-short-URL path is authenticated and needs central rate limits and API keys for partners without reimplementing in every service. |
| Shortener API | Your stateless service: validate URL, allocate code, write metadata. | Holds business rules (blocked domains, custom alias policy) that should not live in the gateway. |
| Redirect service | Often separate slim service or CDN-origin function optimized for GET-by-code only. | Hot path scales independently; you can use smaller images, higher replica count, and different cache policy than write API. |
| DynamoDB / Cassandra | NoSQL stores (DynamoDB is key-value/document, Cassandra is wide-column): partition key lookup by short code, ms-scale reads, horizontal scale. | Access pattern is exactly code → long_url; no joins; DynamoDB global tables help multi-region redirect read latency. |
| Redis / ElastiCache | In-memory cache (Redis protocol). | Cache-aside for hottest codes avoids repeated Dynamo reads; short TTL lets you invalidate malicious targets quickly. |
| SQS or Kinesis | SQS: managed queue of jobs (analytics, audit). Kinesis: ordered stream with replay. | Decouple “URL created / resolved” from slow work (aggregate counts, push to warehouse). SQS is simpler for worker pools; Kinesis if you need ordered per-user analytics or Flink stream processing. |
Shared definitions: 00-glossary-common-services.md
- Writes (create URL): authenticated API path, validation, rate limits.
- Reads (redirect): separate lightweight redirect tier at the edge when possible (cache), else fast KV lookup.
End-to-end flows
Create short URL (write path)
Sequence diagram — components
| Step participant | What it is | Why it appears |
|---|---|---|
| Client | App or CLI creating a short link. | Sends JWT (from Cognito/Auth0) so the create API knows who owns the link for quotas and abuse. |
| API Gateway | Edge HTTP layer (see table above). | Terminates TLS, enforces throttles, may validate JWT before Lambda/container. |
| Shortener service | Core write logic. | Performs a conditional PutItem so an existing code is never overwritten, and uses the Idempotency-Key header so client retries return the same result. |
| DynamoDB | Primary metadata store (see table). | Strong per-item consistency for “does this code exist?” checks via ConditionExpression. |
| SQS | Queue for async work (see glossary). | Buffers analytics jobs durably so they are not lost when the analytics warehouse is slow; a DLQ catches poison payloads. |
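A minimal sketch of the write path in the table above, assuming in-memory stand-ins so it runs anywhere. `FakeTable.put_if_absent` mimics the effect of a conditional write with `attribute_not_exists(pk)`; `FakeTable`, `CodeExists`, `create_short_url`, the `idempotency` dict, and the list used as a queue are all hypothetical names for illustration. In a real service the idempotency record would itself need an atomic conditional write, which this sketch does not model.

```python
import secrets
import string
from typing import Callable, Dict, List

ALPHABET = string.ascii_letters + string.digits


class CodeExists(Exception):
    """Raised by the fake table when the conditional write fails."""


class FakeTable:
    """In-memory stand-in for a table that supports put-if-absent."""

    def __init__(self) -> None:
        self.items: Dict[str, dict] = {}

    def put_if_absent(self, pk: str, item: dict) -> None:
        # Mirrors the effect of ConditionExpression attribute_not_exists(pk).
        if pk in self.items:
            raise CodeExists(pk)
        self.items[pk] = item


def random_code(length: int = 7) -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def create_short_url(
    table: FakeTable,
    idempotency: Dict[str, str],
    queue: List[dict],
    long_url: str,
    owner_id: str,
    idempotency_key: str,
    new_code: Callable[[], str] = random_code,
    max_attempts: int = 5,
) -> str:
    # A retried request with the same key gets the code issued the first time.
    if idempotency_key in idempotency:
        return idempotency[idempotency_key]
    for _ in range(max_attempts):
        code = new_code()
        try:
            table.put_if_absent(code, {"long_url": long_url, "owner_id": owner_id})
        except CodeExists:
            continue  # collision: try another random code
        idempotency[idempotency_key] = code
        # Enqueue the event for async analytics; the caller does not wait on it.
        queue.append({"event": "url_created", "code": code})
        return code
    raise RuntimeError("could not allocate a unique code")
```

Bounding the retries (`max_attempts`) keeps a nearly full keyspace or a broken generator from looping forever; with 7-character random codes, a second attempt should be rare.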
Resolve redirect (read path)
| Participant | What it is | Why it appears |
|---|---|---|
| Browser | Public redirect client. | Usually unauthenticated; must be rate-limited at edge to prevent enumeration. |
| CDN / Edge | CloudFront etc. | May answer 302 without origin if you cache stable redirects (careful with personalized destinations). |
| Redirect service | Read path compute. | On a CDN miss, checks Redis first and falls back to DynamoDB (cache-aside). |
| Redis | Hot cache for code → url. | Cuts p99 and Dynamo RCU cost for viral links. |
| DynamoDB | Authoritative store if cache miss. | Always has final mapping for correctness. |
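The Redis-then-DynamoDB lookup in the table above is the cache-aside pattern. Here is a hedged sketch using in-process stand-ins: `TTLCache` plays the role of Redis and a plain dict plays the authoritative store. Both, along with the `resolve` function name and the 60-second default TTL, are assumptions for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-process stand-in for a cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)


def resolve(
    code: str,
    cache: TTLCache,
    store: Dict[str, str],
    ttl_seconds: float = 60.0,
) -> Optional[str]:
    """Cache-aside: try the cache, fall back to the store, then fill the cache."""
    cached = cache.get(code)
    if cached is not None:
        return cached
    long_url = store.get(code)  # authoritative lookup on a cache miss
    if long_url is not None:
        cache.set(code, long_url, ttl_seconds)
    return long_url
```

The TTL bounds how long a removed or changed mapping can keep being served from cache, which is why a short TTL matters for taking down malicious targets.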
Low-level design (deep dive)
Short code generation
- Preferred: random base62, length 7+ → about 3.5 trillion codes at length 7 (62^7); check uniqueness with a conditional write in DynamoDB (attribute_not_exists(pk)).
- Alternative: Snowflake-like ID encoded in base62 (time-ordered, no DB round-trip for uniqueness if clock+machine ID unique).
- When to use which: Random for simplicity; Snowflake when you want sortable URLs or want to avoid hot partitions on write (with sharded counters).
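Both options above can be sketched briefly. The alphabet ordering and the 41/10/12 bit split for the Snowflake-like ID are assumptions chosen for illustration (the split follows the commonly described Snowflake layout); the function names are hypothetical.

```python
import secrets
import string

# 62 symbols; the ordering is an arbitrary choice for this sketch.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)


def random_code(length: int = 7) -> str:
    """Uniformly random base62 code drawn from a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def base62_encode(n: int) -> str:
    """Encode a non-negative integer in base62, most significant digit first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def snowflake_like_id(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Pack time (high bits), a 10-bit machine ID, and a 12-bit sequence."""
    if not (0 <= machine_id < 1 << 10 and 0 <= sequence < 1 << 12):
        raise ValueError("machine_id or sequence out of range")
    return (timestamp_ms << 22) | (machine_id << 12) | sequence


print(BASE ** 7)  # 3521614606208, about 3.5 trillion
```

One trade-off worth noting: a full 63-bit ID encodes to as many as 11 base62 characters, so Snowflake-style codes tend to be longer than 7-character random ones unless the timestamp is taken relative to a recent custom epoch.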
Storage (AWS-flavored example)
| Concern | Choice | Why |
|---|---|---|
| Primary mapping | DynamoDB pk=code, long_url, ttl, owner_id | Single-digit ms reads at scale, global tables for multi-region |
| Hot read cache | ElastiCache Redis | Sub-ms; cache-aside; short TTL for abusive URL takedown |
| Auth for API | Amazon Cognito or Auth0 | Managed JWT, MFA for org accounts |
| Abuse / bots | AWS WAF + rate limits at API Gateway | Block scrapers creating millions of URLs |
| Analytics | SQS → Lambda → S3 + Athena / ClickHouse | Decouple hot path from analytics |
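As a rough illustration of the primary mapping row, the item might be built as below. DynamoDB's TTL feature expects a Number attribute holding an epoch timestamp in seconds, and expired items are removed in the background rather than at the exact expiry instant, so this sketch also checks expiry on read. `build_item` and `is_expired` are hypothetical helper names; the attribute names follow the table above.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def build_item(
    code: str,
    long_url: str,
    owner_id: str,
    ttl_days: Optional[int] = None,
    now: Optional[int] = None,
) -> dict:
    """Build the item for pk=code; `ttl` is epoch seconds when a TTL is set."""
    now = int(time.time()) if now is None else now
    item = {"pk": code, "long_url": long_url, "owner_id": owner_id}
    if ttl_days is not None:
        item["ttl"] = now + ttl_days * SECONDS_PER_DAY
    return item


def is_expired(item: dict, now: Optional[int] = None) -> bool:
    """Treat an item as gone once its ttl has passed, even if still stored."""
    now = int(time.time()) if now is None else now
    ttl = item.get("ttl")
    return ttl is not None and now >= ttl
```

With this check in the redirect path, a link stops resolving at its expiry time regardless of when the store physically deletes the row.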
CDN for redirects
- CloudFront with Lambda@Edge (optional): block countries, rewrite expired links, A/B landing pages.
- Caveat (302 vs 301): a 301 can be cached aggressively by browsers, which also means repeat visits may never reach your servers or your click analytics; if the destination changes often, prefer 302.
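One way to express the 301/302 trade-off is in the response headers. The sketch below is an assumption-laden example, not a recommendation: the function name and the specific lifetimes (one day for 301, 60 seconds of shared-cache lifetime for 302) are placeholders. `s-maxage` applies to shared caches such as a CDN, while `max-age=0` makes the browser's own copy stale immediately.

```python
from typing import Dict, Tuple


def redirect_response(
    long_url: str,
    permanent: bool = False,
    edge_ttl_seconds: int = 60,
) -> Tuple[int, Dict[str, str]]:
    """Build (status, headers) for a redirect.

    permanent=True  -> 301 with a long public cache lifetime.
    permanent=False -> 302 that a shared cache may hold briefly (s-maxage)
                       while browsers do not reuse it without asking again.
    """
    if permanent:
        return 301, {
            "Location": long_url,
            "Cache-Control": "public, max-age=86400",
        }
    return 302, {
        "Location": long_url,
        "Cache-Control": f"public, max-age=0, s-maxage={edge_ttl_seconds}",
    }
```

Keeping the edge lifetime short means a changed or taken-down destination propagates within roughly that window.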
API shape (sketch)
POST /v1/urls
Idempotency-Key: <uuid>
{ "url": "https://...", "custom_alias": "optional", "ttl_days": 30 }
GET /{code} -> 302 Location (public)
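A sketch of request validation for the create endpoint, using only the standard library. The alias pattern (3 to 32 characters from `[A-Za-z0-9_-]`) and the 365-day TTL cap are assumed policies for illustration, and `validate_create_request` is a hypothetical name; the field names come from the API shape above.

```python
import re
from urllib.parse import urlsplit

ALIAS_RE = re.compile(r"[A-Za-z0-9_-]{3,32}")  # assumed alias policy
MAX_TTL_DAYS = 365  # assumed upper bound


def validate_create_request(body: dict) -> dict:
    """Return a normalized request dict or raise ValueError."""
    url = body.get("url")
    if not isinstance(url, str):
        raise ValueError("url is required")
    parts = urlsplit(url)
    # Only absolute http(s) URLs; this rejects javascript:, data:, etc.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("url must be an absolute http(s) URL")

    alias = body.get("custom_alias")
    if alias is not None:
        if not isinstance(alias, str) or not ALIAS_RE.fullmatch(alias):
            raise ValueError("custom_alias must match [A-Za-z0-9_-]{3,32}")

    ttl_days = body.get("ttl_days")
    if ttl_days is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(ttl_days, bool) or not isinstance(ttl_days, int):
            raise ValueError("ttl_days must be an integer")
        if not 1 <= ttl_days <= MAX_TTL_DAYS:
            raise ValueError(f"ttl_days must be between 1 and {MAX_TTL_DAYS}")

    return {"url": url, "custom_alias": alias, "ttl_days": ttl_days}
```

Scheme and shape checks like these are only a first filter; blocked-domain and reputation checks would sit behind them in the Shortener API.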
Tricky parts and solutions
| Problem | Why tricky | Mitigation |
|---|---|---|
| Collision | Two requests generate the same code | DB conditional put; retry with a new random code |
| Hot keys | Viral short code melts Redis/DB | Edge cache + read replicas; scale the redirect service independently of the write API |
| Phishing | Short links hide malicious targets | URL reputation API (Google Web Risk), blocklist pipeline |
| Enumeration | Attackers scan all codes | Rate limit resolves; CAPTCHA on suspicious patterns; longer codes |
| Custom alias squatting | Users grab brand names | Reserved list, premium tier, dispute process |
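For the enumeration row, "rate limit resolves" is commonly implemented with a token bucket per client. The sketch below is a single-process illustration with assumed names and parameters (`TokenBucketLimiter`, `rate_per_second`, `burst`); a production limiter would usually live at the edge or in a shared store rather than in one process's memory.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-client token bucket: `burst` capacity, refilled at `rate_per_second`."""

    def __init__(
        self,
        rate_per_second: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_second
        self.burst = burst
        self._clock = clock
        # client_id -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        tokens, last = self._buckets.get(client_id, (float(self.burst), now))
        # Refill for the elapsed time, capped at the bucket capacity.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[client_id] = (tokens - 1.0, now)
            return True
        self._buckets[client_id] = (tokens, now)
        return False
```

A client scanning codes sequentially exhausts its burst quickly and is then held to the refill rate, while normal users following a handful of links never notice the limit.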
Caveats
- Strong consistency across regions for “latest URL” is expensive; eventual consistency is usually fine for redirects.
- GDPR: logging IPs for analytics needs retention policy and consent where applicable.
Summary
Optimize read path (CDN + Redis + DynamoDB), keep writes safe (idempotency, validation, abuse controls), and treat security as a first-class concern because short links are a social engineering amplifier.