Pastebin-like Service

Problem statement

Design a system where users upload large text snippets and receive a shareable link. Viewers fetch the content read-only. Optional features: password protection, expiration, and burn-after-read.

How it works

  1. Upload: client sends text (or multipart file) → service stores blob + metadata → returns id.
  2. Read: GET /raw/{id} or HTML view; enforce ACL, expiry, rate limits.
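The two steps above can be sketched in a few lines. This is a minimal illustration, not the service itself: the `BLOBS` and `META` dicts stand in for object storage and the metadata DB, and `MAX_BYTES`, `upload`, and `read` are hypothetical names chosen for the example. ACL and rate-limit checks are left out.

```python
import secrets
import time
from typing import Optional

# In-memory stand-ins for object storage and the metadata DB (assumptions for this sketch).
BLOBS: dict[str, bytes] = {}
META: dict[str, dict] = {}

MAX_BYTES = 1_000_000  # hypothetical size cap


def upload(text: str, ttl_seconds: Optional[int] = None) -> str:
    """Store the blob and its metadata, then return the paste id."""
    data = text.encode("utf-8")
    if len(data) > MAX_BYTES:
        raise ValueError("paste too large")
    paste_id = secrets.token_urlsafe(8)  # short, hard-to-guess id
    expires_at = (time.time() + ttl_seconds) if ttl_seconds is not None else None
    BLOBS[paste_id] = data
    META[paste_id] = {"expires_at": expires_at}
    return paste_id


def read(paste_id: str) -> Optional[str]:
    """Return the paste text, or None if it is missing or expired."""
    meta = META.get(paste_id)
    if meta is None:
        return None
    expires_at = meta["expires_at"]
    if expires_at is not None and time.time() >= expires_at:
        return None
    return BLOBS[paste_id].decode("utf-8")
```

In a real deployment the blob write and the metadata write go to different systems, so the upload path also has to decide what happens when one succeeds and the other fails.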

Analogy: a library drop box. You leave a manuscript (the paste) and get a receipt number (the id); anyone with the number can read the copy until the librarian shreds it (the TTL).

High-level design

Components explained — this design

| Component | What it is | Why we use it here |
| --- | --- | --- |
| API Gateway | Managed HTTP entry point with auth, throttling, and routing. | Single front door for the upload and read APIs; offloads TLS and abuse throttling from app code. |
| Upload service | Validates size/type, issues presigned URLs, writes metadata. | Keeps heavy multipart logic and virus-scan triggers out of the gateway. |
| Read service | Serves GET for pastes; may stream from object storage. | Reads scale separately from writes; public pastes can get their own cache rules. |
| S3 / Azure Blob | Object storage: bytes + metadata, with multi-terabyte per-object limits and effectively unbounded total capacity. | Pastes can be large; an object store is cheaper and simpler than BLOBs in SQL at scale. |
| PostgreSQL / DynamoDB | SQL vs NoSQL store for metadata (owner, TTL, visibility, s3_key). | Postgres if you need admin search, joins, reports; Dynamo if access is strictly id → meta and you want global replication with less ops. |
| Pub/Sub or SQS | Async messaging (see glossary). | Decouples upload completion from the ClamAV scan, thumbnailing, and the moderation queue so the API returns fast. |

Shared definitions: 00-glossary-common-services.md

Low-level design

Blob storage

  • Preferred: Amazon S3 (or Azure Blob) for payload; cheap, durable, scales to huge pastes.
  • Metadata DB: PostgreSQL if you need rich queries (search by user, reports); DynamoDB if access pattern is strictly id → meta.

Paste metadata model

| Field | Purpose |
| --- | --- |
| paste_id | UUID or short id |
| s3_key | Object key |
| created_at, expires_at | Lifecycle |
| visibility | public / unlisted / private |
| password_hash | Optional bcrypt/argon2 |
| content_type | text/plain, etc. |
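The metadata fields above map naturally onto a small typed record. The sketch below is one possible shape, assuming a UUID-based `paste_id`; the `pastes/{id}` key layout and the `new_paste_meta` helper are hypothetical and only there for illustration.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Visibility(str, Enum):
    PUBLIC = "public"
    UNLISTED = "unlisted"
    PRIVATE = "private"


@dataclass(frozen=True)
class PasteMeta:
    paste_id: str
    s3_key: str
    created_at: datetime
    expires_at: Optional[datetime]  # None means the paste never expires
    visibility: Visibility
    password_hash: Optional[str] = None
    content_type: str = "text/plain"


def new_paste_meta(visibility: Visibility, ttl: Optional[timedelta] = None) -> PasteMeta:
    """Build metadata for a new paste (hypothetical helper)."""
    paste_id = uuid.uuid4().hex
    now = datetime.now(timezone.utc)
    return PasteMeta(
        paste_id=paste_id,
        s3_key=f"pastes/{paste_id}",  # assumed key layout, not from the source
        created_at=now,
        expires_at=(now + ttl) if ttl is not None else None,
        visibility=visibility,
    )
```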

Security

  • Secrets: AWS Secrets Manager for DB creds; KMS envelope encryption for S3 if compliance requires.
  • Auth: Cognito for “my pastes”; public reads can be anonymous with rate limit.
  • Malware / PII: scan asynchronously (e.g. in Lambda) with ClamAV or a vendor API; block executables disguised as text.
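For password-protected pastes, the service stores only a salted hash and compares in constant time on read. The metadata model names bcrypt/argon2; the sketch below swaps in PBKDF2-HMAC-SHA256 purely because it ships with Python's standard library. The `ITERATIONS` value and the `$`-separated storage format are assumptions for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def hash_password(password: str) -> str:
    """Return a self-describing hash string: scheme$iterations$salt$digest."""
    salt = os.urandom(16)  # fresh random salt per paste
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"pbkdf2_sha256${ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _scheme, iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Storing the iteration count alongside the hash lets the work factor be raised later without invalidating existing pastes.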

Lifecycle

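  • S3 Lifecycle rules to transition old objects to Glacier or delete them after the TTL. Lifecycle expiration works at day granularity, so it suits coarse cleanup rather than exact per-paste expiry.
  • DynamoDB TTL for metadata if using Dynamo. TTL deletes run in the background and can lag the expiry time, so the read path should still check expires_at.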
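Since storage-side cleanup is not instantaneous, a read-time expiry check is a reasonable complement to it. The sketch below assumes expiry is kept as whole Unix epoch seconds, which is the format DynamoDB TTL attributes use; the function names are hypothetical.

```python
import time
from typing import Optional


def ttl_epoch(created_at: float, ttl_seconds: int) -> int:
    """Expiry as whole Unix epoch seconds."""
    return int(created_at) + ttl_seconds


def is_expired(expires_at: Optional[int], now: Optional[float] = None) -> bool:
    """True once the expiry time has been reached; None means no expiry."""
    if expires_at is None:
        return False
    current = time.time() if now is None else now
    return current >= expires_at
```

The optional `now` parameter exists only to make the check easy to test with a fixed clock.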

E2E: upload with password

Tricky parts

| Issue | Solution |
| --- | --- |
| Huge pastes | Stream upload to S3; max size; reject decompression bombs |
| Hot viral paste | CDN cache GET /raw/{id} with ETag; separate read replicas |
| Private paste leaked URL | Short-lived signed URLs for S3; no public ACL |
| Burn-after-read | Use Dynamo conditional delete or Redis counter remaining_views |
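The burn-after-read row hinges on making "read and decrement" a single atomic step, so two concurrent viewers cannot both consume the last view. The sketch below shows that property with an in-process lock; it is a stand-in for a Dynamo conditional delete or an atomic Redis decrement, and the `BurnAfterReadStore` class is hypothetical.

```python
import threading
from typing import Optional


class BurnAfterReadStore:
    """In-memory stand-in: each paste carries a remaining_views counter."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pastes: dict[str, tuple[str, int]] = {}  # id -> (text, remaining_views)

    def put(self, paste_id: str, text: str, views: int = 1) -> None:
        with self._lock:
            self._pastes[paste_id] = (text, views)

    def consume(self, paste_id: str) -> Optional[str]:
        """Return the text and use up one view; delete the paste on the last view."""
        with self._lock:  # check and update happen under one lock
            entry = self._pastes.get(paste_id)
            if entry is None:
                return None
            text, remaining = entry
            if remaining <= 1:
                del self._pastes[paste_id]
            else:
                self._pastes[paste_id] = (text, remaining - 1)
            return text
```

A process-local lock only works on a single node; across multiple read-service instances the atomicity has to come from the datastore itself.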

When to use SQL vs Dynamo

  • PostgreSQL when you need admin dashboards, search, joins.
  • DynamoDB when QPS on key-value path dominates and access is simple.

Caveats

  • Caching private pastes at the CDN is dangerous; cache only public pastes, and set explicit Cache-Control headers on every response.
  • Content moderation at scale needs human review queues (Amazon Connect / internal tool) fed by async workers.
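The caching caveat can be made concrete by choosing response headers from the paste's visibility. The sketch below is one conservative policy, assuming only public, password-free pastes are cacheable; the `cache_headers` name, the 300-second `max-age`, and the truncated SHA-256 ETag are illustrative choices, not values from the design.

```python
import hashlib


def cache_headers(visibility: str, body: bytes, has_password: bool = False) -> dict[str, str]:
    """Pick cache headers: shared caching only for public, password-free pastes."""
    if visibility == "public" and not has_password:
        # Content-derived ETag lets the CDN revalidate instead of refetching.
        etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
        return {"Cache-Control": "public, max-age=300", "ETag": etag}
    # Everything else must never be stored by shared caches.
    return {"Cache-Control": "private, no-store"}
```

Unlisted pastes are treated like private ones here; whether they may be cached is a product decision the source does not settle.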