
URL Shortener (e.g. bit.ly)

Problem statement

Design a service that takes long URLs and returns short aliases. When a user visits the short URL, the service redirects them (HTTP 301 or 302) to the original URL. The system must be highly available for reads (redirects), serve them with low latency worldwide, and never assign the same code to two different URLs.

Stretch goals: custom aliases, analytics (click counts, geography), expiration, abuse prevention.

How it works (plain English)

  1. User submits https://example.com/very/long/path?....
  2. Service generates a short code (e.g. 7 characters from [A-Za-z0-9]) or validates a custom alias.
  3. Store mapping code → long_url (and metadata).
  4. On GET /{code}, look up the long URL and return 302 (temporary) or 301 (permanent) redirect.

Analogy: A hotel gives you room 204 instead of saying “second floor, east wing, third door past the ice machine.” The number is the short code; the full path is the long URL.

Requirements

| Type | Examples |
|---|---|
| Functional | Create short URL, resolve redirect, optional custom slug, optional TTL |
| Non-functional | Read-heavy, low p99 latency, high availability, horizontal scale |

High-level design


Components explained — this design

| Component | What it is | Why we chose it here |
|---|---|---|
| Browser / App | End user or API client. | Issues HTTPS requests; no business logic on device for redirect resolution beyond following redirects. |
| CloudFront / Akamai CDN | Content delivery network: caches responses at edge PoPs close to users; supports TLS, signed URLs. | Redirects are read-heavy and latency-sensitive globally; CDN can cache 301/302 responses for anonymous hits and shields origin from DDoS-scale traffic. |
| AWS WAF / Azure Front Door WAF | Web Application Firewall at the edge: IP reputation, rate rules, managed rule groups (OWASP-style). | Shorteners are abused for phishing; WAF reduces automated create/resolve abuse before it hits your app tier. |
| API Gateway | Managed HTTP front door: routing, JWT validation, usage plans / throttling, optional request validation. | Create-short-URL path is authenticated and needs central rate limits and API keys for partners without reimplementing in every service. |
| Shortener API | Your stateless service: validate URL, allocate code, write metadata. | Holds business rules (blocked domains, custom alias policy) that should not live in the gateway. |
| Redirect service | Often a separate slim service or CDN-origin function optimized for GET-by-code only. | Hot path scales independently; you can use smaller images, higher replica count, and a different cache policy than the write API. |
| DynamoDB / Cassandra | NoSQL stores (DynamoDB is key-value/document, Cassandra is wide-column): partition key lookup by short code, ms-scale reads, horizontal scale. | Access pattern is exactly code → long_url; no joins; DynamoDB global tables help multi-region redirect read latency. |
| Redis / ElastiCache | In-memory cache (Redis protocol). | Cache-aside for hottest codes avoids repeated DynamoDB reads; short TTL lets you invalidate malicious targets quickly. |
| SQS or Kinesis | SQS: managed queue of jobs (analytics, audit). Kinesis: ordered stream with replay. | Decouple “URL created / resolved” from slow work (aggregate counts, push to warehouse). SQS is simpler for worker pools; Kinesis if you need ordered per-user analytics or Flink stream processing. |

Shared definitions: 00-glossary-common-services.md

  • Writes (create URL): authenticated API path, validation, rate limits.
  • Reads (redirect): separate lightweight redirect tier; answer from the edge cache when possible, otherwise from a fast key-value lookup.

End-to-end flows

Create short URL (write path)


Sequence diagram — components

| Step participant | What it is | Why it appears |
|---|---|---|
| Client | App or CLI creating a short link. | Sends JWT (from Cognito/Auth0) so the create API knows who owns the link for quotas and abuse. |
| API Gateway | Edge HTTP layer (see table above). | Terminates TLS, enforces throttles, may validate JWT before Lambda/container. |
| Shortener service | Core write logic. | Performs conditional PutItem for idempotency (Idempotency-Key). |
| DynamoDB | Primary metadata store (see table). | Strong per-item consistency for “does this code exist?” checks via ConditionExpression. |
| SQS | Queue for async work (see glossary). | Never lose analytics jobs if the analytics warehouse is slow; DLQ catches poison payloads. |
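
The write path in the table above can be sketched roughly as follows. This is a minimal in-memory illustration, not the service's actual code: `InMemoryTable`, `create_short_url`, the `owner_id#key` idempotency record, and the injected `new_code` generator are all hypothetical names. `put_if_absent` stands in for a DynamoDB conditional PutItem, and a plain list stands in for SQS.

```python
class InMemoryTable:
    """Stand-in for a key-value table with put-if-absent semantics."""

    def __init__(self):
        self.items = {}

    def put_if_absent(self, key, value):
        # Plays the role of a conditional write: succeed only if the key is new.
        if key in self.items:
            return False
        self.items[key] = value
        return True


def create_short_url(urls, idempotency, queue, owner_id, long_url,
                     idempotency_key, new_code, max_attempts=5):
    """Create a mapping once per (owner, Idempotency-Key) and enqueue one event."""
    idem_pk = f"{owner_id}#{idempotency_key}"

    # A retried request with the same key gets the code issued the first time.
    previous = idempotency.items.get(idem_pk)
    if previous is not None:
        return previous

    for _ in range(max_attempts):
        code = new_code()
        record = {"long_url": long_url, "owner_id": owner_id}
        if urls.put_if_absent(code, record):
            idempotency.put_if_absent(idem_pk, code)
            # Analytics and audit work happens off the request path.
            queue.append({"event": "url_created", "code": code,
                          "owner_id": owner_id})
            return code
        # Code already taken: loop and try another one.

    raise RuntimeError("no free code after retries")
```

In a real deployment the idempotency check and the mapping write would need to be atomic across processes (for example by reserving the idempotency record with its own conditional write first); the sketch ignores that race.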

Resolve redirect (read path)

| Participant | What it is | Why it appears |
|---|---|---|
| Browser | Public redirect client. | Usually unauthenticated; must be rate-limited at edge to prevent enumeration. |
| CDN / Edge | CloudFront etc. | May answer 302 without origin if you cache stable redirects (careful with personalized destinations). |
| Redirect service | Read path compute. | On an origin miss, checks Redis first, then falls back to DynamoDB. |
| Redis | Hot cache for code → url. | Cuts p99 and DynamoDB RCU cost for viral links. |
| DynamoDB | Authoritative store if cache miss. | Always has final mapping for correctness. |
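
The Redis-then-DynamoDB lookup is a standard cache-aside pattern. The sketch below shows the idea with stand-ins only: `TTLCache` is a hypothetical in-process substitute for Redis, a plain dict substitutes for DynamoDB, and the 30-second TTL in the usage is an arbitrary assumption.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a cache with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Expired entries are dropped so the next read goes to the store.
            del self._data[key]
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)


def resolve(code, cache, store):
    """Return (long_url, source) where source says which tier answered."""
    url = cache.get(code)
    if url is not None:
        return url, "cache"
    url = store.get(code)  # authoritative lookup on cache miss
    if url is None:
        return None, "not_found"
    cache.set(code, url)  # populate the cache for later readers
    return url, "store"


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=30)
    store = {"abc1234": "https://example.com/very/long/path"}
    print(resolve("abc1234", cache, store))  # first read comes from the store
    print(resolve("abc1234", cache, store))  # second read comes from the cache
```

A short TTL bounds how long a taken-down destination can keep being served from cache, at the cost of more reads against the store.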

Low-level design (deep dive)

Short code generation

  • Preferred: random base62, length 7+ → 62^7 ≈ 3.5 trillion possible codes at length 7; enforce uniqueness with a conditional write in DynamoDB (attribute_not_exists(pk)).
  • Alternative: Snowflake-like ID encoded in base62 (time-ordered; no DB round-trip for uniqueness, provided each clock + machine ID combination is unique).
  • When to use which: Random for simplicity; Snowflake when you want sortable URLs or want to avoid a hot partition on write (for example, instead of a single counter that would otherwise need sharding).
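
Both options can be sketched in a few lines. The code below is illustrative only: the function names are hypothetical, the dict stands in for the conditional write, and the Snowflake bit layout (41-bit timestamp, 10-bit machine ID, 12-bit sequence) is the commonly cited Twitter layout, assumed here rather than taken from this design.

```python
import secrets
import string

# 62 symbols: 0-9, A-Z, a-z
BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase


def keyspace(length):
    """Number of distinct base62 codes of a given length."""
    return 62 ** length


def random_code(length=7):
    """Random base62 code drawn from a cryptographically strong source."""
    return "".join(secrets.choice(BASE62) for _ in range(length))


def base62_encode(n):
    """Encode a non-negative integer in base62, most significant digit first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return BASE62[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(BASE62[rem])
    return "".join(reversed(digits))


def snowflake_id(timestamp_ms, machine_id, sequence):
    """Pack a Snowflake-style ID (assumed layout: 41 | 10 | 12 bits)."""
    if not (0 <= machine_id < 1024 and 0 <= sequence < 4096):
        raise ValueError("machine_id or sequence out of range")
    return (timestamp_ms << 22) | (machine_id << 12) | sequence


def allocate(store, long_url, gen=random_code, max_attempts=5):
    """Claim a code; the membership check stands in for attribute_not_exists(pk)."""
    for _ in range(max_attempts):
        code = gen()
        if code not in store:
            store[code] = long_url
            return code
    raise RuntimeError("no free code after retries")


if __name__ == "__main__":
    print(keyspace(7))        # 3521614606208
    print(base62_encode(62))  # 10
    print(allocate({}, "https://example.com/very/long/path"))
```

One trade-off worth noting: a 64-bit Snowflake-style ID can need up to 11 base62 characters, so those codes are longer than 7-character random ones.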

Storage (AWS-flavored example)

| Concern | Choice | Why |
|---|---|---|
| Primary mapping | DynamoDB pk=code, long_url, ttl, owner_id | Single-digit ms reads at scale, global tables for multi-region |
| Hot read cache | ElastiCache Redis | Sub-ms; cache-aside; short TTL for abusive URL takedown |
| Auth for API | Amazon Cognito or Auth0 | Managed JWT, MFA for org accounts |
| Abuse / bots | AWS WAF + rate limits at API Gateway | Block scrapers creating millions of URLs |
| Analytics | SQS → Lambda → S3 + Athena / ClickHouse | Decouple hot path from analytics |
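
As a rough illustration of the primary mapping row, the item could be assembled like this. The attribute names follow the table (`pk`, `long_url`, `ttl`, `owner_id`); `created_at`, `build_item`, and `is_expired` are hypothetical additions. The `ttl` value is written as epoch seconds, which is the format DynamoDB's TTL feature expects.

```python
from datetime import datetime, timedelta, timezone


def build_item(code, long_url, owner_id, ttl_days=None, now=None):
    """Assemble the item for the code → long_url mapping."""
    now = now or datetime.now(timezone.utc)
    item = {
        "pk": code,
        "long_url": long_url,
        "owner_id": owner_id,
        "created_at": int(now.timestamp()),
    }
    if ttl_days is not None:
        # Expiry stored as a Unix timestamp in seconds.
        item["ttl"] = int((now + timedelta(days=ttl_days)).timestamp())
    return item


def is_expired(item, now=None):
    """True if the item carries a ttl that is already in the past."""
    now = now or datetime.now(timezone.utc)
    ttl = item.get("ttl")
    return ttl is not None and ttl <= int(now.timestamp())
```

DynamoDB removes expired items in the background rather than at the exact expiry time, so the redirect path should still call something like `is_expired` before answering.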

CDN for redirects

  • CloudFront with Lambda@Edge (optional): block countries, rewrite expired links, A/B landing pages.
  • Caveat (302 vs 301): browsers can cache a 301 aggressively, so a later change to the destination may not be picked up; if the destination changes often, prefer 302.
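
One way to express the 301/302 trade-off is in the response headers the redirect tier emits. The sketch below is an assumption-laden example: `redirect_response` is a hypothetical helper, and the cache lifetimes (one day for 301, 60 seconds at the edge for 302) are placeholder values, not recommendations from this design.

```python
def redirect_response(long_url, permanent=False, edge_ttl_seconds=60):
    """Build (status, headers) for a resolved short code."""
    if long_url is None:
        # Unknown or expired code: do not let caches remember the miss.
        return 404, {"Cache-Control": "no-store"}
    if permanent:
        # 301: browsers and CDNs may keep this for a long time.
        return 301, {
            "Location": long_url,
            "Cache-Control": "public, max-age=86400",
        }
    # 302: browsers revalidate each time (max-age=0), while shared caches
    # such as a CDN may hold the response briefly (s-maxage).
    return 302, {
        "Location": long_url,
        "Cache-Control": f"public, max-age=0, s-maxage={edge_ttl_seconds}",
    }
```

Keeping the browser lifetime at zero for 302 responses means every click still reaches the edge, which matters if click counts are collected there.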

API shape (sketch)

POST /v1/urls
Idempotency-Key: <uuid>

{ "url": "https://...", "custom_alias": "optional", "ttl_days": 30 }

GET /{code}   -> 302 Location (public)
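
A sketch of how the POST body might be validated before any code is allocated. The field names come from the API shape above; the alias pattern (3 to 32 characters from letters, digits, `_`, `-`) and the `ttl_days` range are assumptions made for the example, and `validate_create_request` is a hypothetical helper.

```python
import re
from urllib.parse import urlsplit

# Assumed alias policy for illustration only.
ALIAS_RE = re.compile(r"[A-Za-z0-9_-]{3,32}")
MAX_TTL_DAYS = 3650  # assumed upper bound


def validate_create_request(body):
    """Return a list of error strings; an empty list means the body is valid."""
    errors = []

    url = body.get("url")
    if not isinstance(url, str):
        errors.append("url is required and must be a string")
    else:
        try:
            parts = urlsplit(url)
            ok = parts.scheme in ("http", "https") and bool(parts.hostname)
        except ValueError:
            ok = False  # urlsplit rejects some malformed hosts
        if not ok:
            errors.append("url must be an absolute http(s) URL")

    alias = body.get("custom_alias")
    if alias is not None:
        if not isinstance(alias, str) or not ALIAS_RE.fullmatch(alias):
            errors.append("custom_alias must be 3-32 chars of [A-Za-z0-9_-]")

    ttl = body.get("ttl_days")
    if ttl is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(ttl, bool) or not isinstance(ttl, int) \
                or not 1 <= ttl <= MAX_TTL_DAYS:
            errors.append(f"ttl_days must be an integer in 1..{MAX_TTL_DAYS}")

    return errors
```

Restricting the scheme to http and https rejects inputs such as `javascript:` URLs early; domain blocklists and reputation checks would run after this step.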

Tricky parts and solutions

| Problem | Why tricky | Mitigation |
|---|---|---|
| Collision | Two requests same code | DB conditional put; retry with new random |
| Hot keys | Viral short code melts Redis/DB | Edge cache + read replicas; separate redirect microservice scale |
| Phishing | Short links hide malicious targets | URL reputation API (Google Web Risk), blocklist pipeline |
| Enumeration | Attackers scan all codes | Rate limit resolves; CAPTCHA on suspicious patterns; longer codes |
| Custom alias squatting | Users grab brand names | Reserved list, premium tier, dispute process |
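
For the enumeration row, "rate limit resolves" is often implemented with a token bucket per client. The sketch below shows the mechanism in process; `TokenBucket` is a hypothetical class, the rate and capacity are arbitrary, and in this design the limit would more likely live at the WAF or API Gateway than in application code.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens/second."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so the first burst passes
        self.last = clock()

    def allow(self):
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the time that passed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def allow_resolve(buckets, client_ip, rate_per_sec=5, capacity=20):
    """Keep one bucket per client key (here the IP) and consult it per request."""
    bucket = buckets.get(client_ip)
    if bucket is None:
        bucket = buckets[client_ip] = TokenBucket(rate_per_sec, capacity)
    return bucket.allow()
```

A scanner probing random codes drains its bucket quickly, while an ordinary user clicking a handful of links never notices the limit.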

Caveats

  • Strong consistency across regions for “latest URL” is expensive; eventual consistency is usually fine for redirects.
  • GDPR: logging IPs for analytics needs retention policy and consent where applicable.

Summary

Optimize read path (CDN + Redis + DynamoDB), keep writes safe (idempotency, validation, abuse controls), and treat security as a first-class concern because short links are a social engineering amplifier.