SuryanandHome

Q&A Forum (Stack Overflow–class)

Problem statement

Questions, answers, votes, reputation, tags, search, moderation, and SEO-friendly public pages under massive read traffic and write spikes (contests).

How it works

  • Normalized relational core (users, posts, votes) in PostgreSQL with careful indexing.
  • Search delegated to OpenSearch; cache question pages at CDN for anonymous.

Analogy: Town hall bulletin board with thumbs up stickers (votes) and librarian index cards (tags/search) separate from the original handwritten sheets (posts).

High-level design

Rendering diagram…

Components explained — this design

ComponentWhat it isWhy we use it here
Web tierRenders pages/API.Stateless behind LB; caches public HTML at CDN.
PostgreSQL primary + replicasSource of truth for posts/votes/users.ACID for votes and reputation updates; replicas for read scaling of question pages.
RedisHot question cache + rate limits.Protects DB from homepage reread storms.
OpenSearchFull-text search over posts.Relational DB full-text is weaker at TB scale vs inverted index.
Kafka change captureStreams DB changes to search index.Keeps search eventually consistent without dual writes from app code.
CloudFrontCache anonymous HTML.SEO pages are read-mostly; short TTL on HTML balances freshness vs load.

Shared definitions: 00-glossary-common-services.md

Low-level design

Voting integrity

  • One vote per user enforced with UNIQUE (user_id, post_id) + transaction.
  • Reputation updates async via Kafka consumer to avoid locking hot users.

Anti-cheat

  • Serial downvoting detectionbatch job in Spark / SQL window functions.

Moderation

  • Flag queue in PostgreSQL; Perspective API toxicity scoring; human mod UI.

Ranking

  • Wilson score or StackExchange-style time decay in cron materialized view refreshed incrementally.

E2E: post answer

Rendering diagram…

Tricky parts

ProblemSolution
Hot question meltdownRedis cache fragment; read replicas
Long threads deep nestingFlat list with reply_to_id + pagination
SEO vs personalizationSSR canonical URLs; noindex low-quality

Caveats

  • Gamification exploitsrate limits, audit, rollback tools.
  • Legal: UGC copyright and DMCA takedown workflow.

Azure

  • Azure Database for PostgreSQL Flexible; Azure AI Content Safety.