Pastebin-like Service
Problem statement
Design a system where users upload large text snippets and receive a shareable link. Viewers fetch read-only content. A paste can optionally carry a password, an expiration time, and burn-after-read behavior.
How it works
- Upload: client sends text (or multipart file) → service stores blob + metadata → returns id.
- Read: GET /raw/{id} or HTML view; enforce ACL, expiry, rate limits.
Analogy: A library drop box: you leave a manuscript (paste), get a receipt number (id); anyone with the number can read the copy until the librarian shreds it (TTL).
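The upload and read paths above can be sketched with an in-memory store standing in for the blob store and metadata DB. This is a minimal illustration, not the service itself: `PasteStore`, `upload`, `read`, and the injectable `now` parameter are hypothetical names chosen for this example.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paste:
    body: str
    expires_at: Optional[float]  # epoch seconds; None means no expiry


class PasteStore:
    """In-memory stand-in for object storage plus the metadata DB (assumption)."""

    def __init__(self) -> None:
        self._pastes = {}  # paste_id -> Paste

    def upload(self, body: str, ttl_seconds: Optional[float] = None,
               now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        paste_id = secrets.token_urlsafe(8)  # the "receipt number"
        expires_at = None if ttl_seconds is None else now + ttl_seconds
        self._pastes[paste_id] = Paste(body, expires_at)
        return paste_id

    def read(self, paste_id: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        paste = self._pastes.get(paste_id)
        if paste is None:
            return None
        if paste.expires_at is not None and now >= paste.expires_at:
            del self._pastes[paste_id]  # the librarian shreds the copy
            return None
        return paste.body
```

Passing `now` explicitly keeps the expiry check easy to test; a real read path would also apply ACL and rate-limit checks before returning the body.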
High-level design
Components explained — this design
| Component | What it is | Why we use it here |
|---|---|---|
| API Gateway | Managed HTTP entry with auth, throttling, routing. | Single front door for upload/read APIs; offload TLS and abuse throttles from app code. |
| Upload service | Validates size/type, issues presigned URLs, writes metadata. | Keeps heavy multipart logic and virus-scan triggers out of the gateway. |
| Read service | Serves GET for pastes; may stream from object storage. | Can scale read separately from write; apply different cache rules for public pastes. |
| S3 / Azure Blob | Object storage: bytes + metadata, effectively unlimited total capacity and multi-terabyte per-object limits. | Pastes can be large; object store is cheaper and simpler than BLOBs in SQL at scale. |
| PostgreSQL / DynamoDB | SQL vs NoSQL metadata (owner, TTL, visibility, s3_key). | Postgres if you need admin search, joins, reports; Dynamo if access is strictly id → meta and you want global replication with less ops. |
| Pub/Sub or SQS | Async messaging (see glossary). | Decouple upload completion from ClamAV scan, thumbnail, moderation queue so API returns fast. |
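The decoupling in the last row can be illustrated with a local queue standing in for SQS or Pub/Sub. This is a sketch under assumptions: `handle_upload_complete`, `run_worker_once`, and `looks_malicious` are hypothetical names, and the scanner is a trivial placeholder rather than ClamAV.

```python
import queue

scan_jobs = queue.Queue()  # stand-in for SQS / Pub/Sub (assumption)
scan_status = {}           # paste_id -> "pending" | "clean" | "blocked"


def handle_upload_complete(paste_id: str) -> dict:
    """API path: record the job and return immediately, without scanning."""
    scan_status[paste_id] = "pending"
    scan_jobs.put(paste_id)
    return {"id": paste_id, "scan": "pending"}


def looks_malicious(body: str) -> bool:
    # Hypothetical placeholder for a real scanner such as ClamAV.
    return "EICAR" in body


def run_worker_once(bodies: dict) -> None:
    """Worker path: pull one job and record the verdict."""
    paste_id = scan_jobs.get_nowait()
    verdict = "blocked" if looks_malicious(bodies[paste_id]) else "clean"
    scan_status[paste_id] = verdict
    scan_jobs.task_done()
```

The API response never waits on the scan; readers can be shown a "pending" state or denied access until the worker marks the paste clean.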
Shared definitions: 00-glossary-common-services.md
Low-level design
Blob storage
- Preferred: Amazon S3 (or Azure Blob) for payload; cheap, durable, scales to huge pastes.
- Metadata DB: PostgreSQL if you need rich queries (search by user, reports); DynamoDB if access pattern is strictly id → meta.
Paste metadata model
| Field | Purpose |
|---|---|
| paste_id | UUID or short id |
| s3_key | Object key |
| created_at, expires_at | Lifecycle |
| visibility | public / unlisted / private |
| password_hash | Optional bcrypt/argon2 |
| content_type | text/plain, etc. |
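As one possible in-code shape for this model, the fields map onto a small dataclass. The class and helper names (`PasteMeta`, `Visibility`, `new_paste_meta`) and the `pastes/` key prefix are assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Visibility(str, Enum):
    PUBLIC = "public"
    UNLISTED = "unlisted"
    PRIVATE = "private"


@dataclass(frozen=True)
class PasteMeta:
    paste_id: str
    s3_key: str
    created_at: datetime
    expires_at: Optional[datetime]       # None means the paste never expires
    visibility: Visibility
    password_hash: Optional[str] = None  # set only for protected pastes
    content_type: str = "text/plain"

    def is_expired(self, now: datetime) -> bool:
        return self.expires_at is not None and now >= self.expires_at


def new_paste_meta(ttl: Optional[timedelta], visibility: Visibility,
                   now: datetime) -> PasteMeta:
    paste_id = str(uuid.uuid4())
    return PasteMeta(
        paste_id=paste_id,
        s3_key=f"pastes/{paste_id}",  # hypothetical key layout
        created_at=now,
        expires_at=None if ttl is None else now + ttl,
        visibility=visibility,
    )
```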
Security
- Secrets: AWS Secrets Manager for DB creds; KMS envelope encryption for S3 if compliance requires.
- Auth: Cognito for “my pastes”; public reads can be anonymous with rate limit.
- Malware / PII: async Lambda scanning, ClamAV or vendor API; block executables disguised as text.
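The rate limit on anonymous reads could take several forms; a token bucket is one common choice and is used here as an assumption, since the design does not name an algorithm. `TokenBucket` and its parameters are hypothetical, and a production limiter would keep one bucket per client key in a shared store.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate_per_sec` tokens/second."""

    def __init__(self, rate_per_sec: float, capacity: float, now: float) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity  # start full so the first burst is allowed
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A request that gets `False` would typically be answered with HTTP 429.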
Lifecycle
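- S3 Lifecycle rules to transition old objects to Glacier or delete them after the TTL. Rules work at day granularity on prefixes or tags, so the read path should still check expires_at.
- DynamoDB TTL for metadata if using DynamoDB. TTL deletion is asynchronous, so reads should treat items past expires_at as already gone.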
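A hedged sketch of both lifecycle pieces: a rule document in the shape that boto3's `put_bucket_lifecycle_configuration` accepts, and a helper producing the epoch-seconds number that DynamoDB TTL expects. The rule ID, the `pastes/` prefix, and the 90/365-day values are illustrative assumptions.

```python
from datetime import datetime, timezone

# Would be passed as LifecycleConfiguration=LIFECYCLE_CONFIG (values are assumptions).
LIFECYCLE_CONFIG = {
    "Rules": [
        {
            "ID": "archive-then-expire-pastes",
            "Filter": {"Prefix": "pastes/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }
    ]
}


def dynamo_ttl_value(expires_at: datetime) -> int:
    """DynamoDB TTL reads a Number attribute holding Unix epoch seconds."""
    if expires_at.tzinfo is None:
        raise ValueError("expires_at must be timezone-aware")
    return int(expires_at.timestamp())
```

Requiring a timezone-aware datetime avoids silently interpreting a naive value in the server's local time.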
E2E: upload with password
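A minimal sketch of the upload-with-password flow, using dictionaries as stand-ins for S3 and the metadata DB. The metadata model names bcrypt/argon2; this example swaps in `hashlib.scrypt` only because it ships with the Python standard library. The function names and the scrypt parameters are assumptions.

```python
import hashlib
import hmac
import secrets
from typing import Optional

_SCRYPT = {"n": 2**14, "r": 8, "p": 1}  # illustrative cost parameters

blobs = {}  # s3_key -> body (stand-in for S3)
meta = {}   # paste_id -> metadata dict (stand-in for the DB)


def hash_password(password: str) -> str:
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, dklen=32, **_SCRYPT)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    salt_hex, digest_hex = stored.split("$")
    digest = hashlib.scrypt(password.encode(), salt=bytes.fromhex(salt_hex),
                            dklen=32, **_SCRYPT)
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))  # constant-time


def upload(body: str, password: Optional[str] = None) -> str:
    paste_id = secrets.token_urlsafe(8)
    s3_key = f"pastes/{paste_id}"
    blobs[s3_key] = body
    meta[paste_id] = {
        "s3_key": s3_key,
        "password_hash": hash_password(password) if password else None,
    }
    return paste_id


def read(paste_id: str, password: Optional[str] = None) -> Optional[str]:
    m = meta.get(paste_id)
    if m is None:
        return None
    stored = m["password_hash"]
    if stored is not None and (password is None or not verify_password(password, stored)):
        return None  # a real service would answer 401/403 here
    return blobs[m["s3_key"]]
```

Only the salted hash is stored with the metadata; the plaintext password never reaches the database.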
Tricky parts
| Issue | Solution |
|---|---|
| Huge pastes | Stream uploads to S3; enforce a max size; reject decompression bombs |
| Hot viral paste | CDN cache GET /raw/{id} with ETag; separate read replicas |
| Private paste leaked URL | Short-lived signed URLs for S3; no public ACL |
| Burn-after-read | Use a DynamoDB conditional delete or an atomic Redis counter (remaining_views) so two concurrent readers cannot both succeed |
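The burn-after-read row depends on the check and the decrement being a single atomic step. The sketch below imitates that with a lock around an in-memory dictionary; it is not a DynamoDB or Redis client, and `BurnAfterReadStore` and `consume` are hypothetical names.

```python
import threading
from typing import Optional


class BurnAfterReadStore:
    """In-memory model of an atomic check-and-decrement on remaining_views."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items = {}  # paste_id -> (body, remaining_views)

    def put(self, paste_id: str, body: str, remaining_views: int = 1) -> None:
        with self._lock:
            self._items[paste_id] = (body, remaining_views)

    def consume(self, paste_id: str) -> Optional[str]:
        # The lock plays the role of the conditional delete / atomic counter.
        with self._lock:
            item = self._items.get(paste_id)
            if item is None:
                return None
            body, remaining = item
            if remaining <= 1:
                del self._items[paste_id]  # last view: burn the paste
            else:
                self._items[paste_id] = (body, remaining - 1)
            return body
```

Without the atomic step, two readers could both observe `remaining_views == 1` and both receive the content.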
When to use SQL vs Dynamo
- PostgreSQL when you need admin dashboards, search, joins.
- DynamoDB when QPS on key-value path dominates and access is simple.
Caveats
- Caching private pastes at the CDN is dangerous; cache only public pastes, and set explicit Cache-Control headers on every response.
- Content moderation at scale needs human review queues (Amazon Augmented AI / internal tool) fed by async workers.
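For the CDN caveat, one way to make the rule explicit is to derive response headers from the paste's visibility. This is a sketch under assumptions: `cache_headers` and `is_not_modified` are hypothetical helpers, the 300-second max-age is arbitrary, and unlisted pastes are treated as non-cacheable to stay on the safe side.

```python
import hashlib
from typing import Optional


def cache_headers(visibility: str, body: bytes, max_age: int = 300) -> dict:
    """Return cache headers; only public pastes are cacheable by shared caches."""
    if visibility == "public":
        etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
        return {"Cache-Control": f"public, max-age={max_age}", "ETag": etag}
    # Private and unlisted pastes must never be stored by the CDN.
    return {"Cache-Control": "private, no-store"}


def is_not_modified(if_none_match: Optional[str], headers: dict) -> bool:
    """True when the client's If-None-Match matches, so a 304 can be sent."""
    etag = headers.get("ETag")
    return etag is not None and if_none_match == etag
```

A content-derived ETag lets the CDN revalidate a hot public paste without re-downloading the body from the origin.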