Real-time Chat (e.g. Slack / WhatsApp-lite)
Problem statement
Design 1:1 and group messaging with online presence, delivery/read receipts, history sync, and mobile push when offline.
How it works
- Active users: maintain WebSocket (or MQTT) connections to a gateway tier.
- Messages: append to durable log (Kafka) + per-conversation store (sharded DB).
- Offline: push notifications (APNs/FCM) + client pulls history on reconnect.
Analogy: Walkie-talkie channel (WebSocket) for live voice, plus voicemail inbox (DB) when the other person is offline.
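The online/offline fork above can be sketched with in-memory stand-ins (`ConnectionRegistry` and the push queue are hypothetical names for illustration, not real APIs):

```python
class ConnectionRegistry:
    """user_id -> live gateway connection; a list stands in for a WebSocket."""
    def __init__(self):
        self._conns = {}

    def connect(self, user_id):
        self._conns[user_id] = []          # pretend socket: captures pushed frames
        return self._conns[user_id]

    def get(self, user_id):
        return self._conns.get(user_id)


def deliver(registry, push_queue, recipient, message):
    """Push over the live socket if the user is online, else enqueue a mobile push."""
    conn = registry.get(recipient)
    if conn is not None:
        conn.append(message)               # server push over the WebSocket
        return "websocket"
    push_queue.append((recipient, message))  # offline: APNs/FCM fan-out
    return "push"
```

On reconnect the client would additionally pull missed history from the conversation store, per the bullet above.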
High-level design
Components explained — this design
| Component | What it is | Why we use it here |
|---|---|---|
| NLB / ALB | Layer 4/7 load balancers distributing TCP/WebSocket connections. | Sticky sessions or least-conn routing to WebSocket gateways; health checks remove unhealthy nodes from rotation. |
| WebSocket gateway | Maintains long-lived connections; frames to/from clients. | Chat needs server push; HTTP polling is wasteful at scale. |
| Presence Redis | Fast key-value with TTL for “user online/offline”. | Presence is ephemeral and high-churn; Redis TTL matches heartbeat semantics cheaply. |
| Kafka | Append-only event log for messages and receipts. | Ordering per conversation partition, replay for new consumers (search indexer), decoupling persistence from gateway CPU. |
| Cassandra / DynamoDB | Wide-column / key-value for message history at scale. | Message tables are append-heavy and partitioned by conversation_id; SQL can become a write bottleneck. |
| S3 | Object storage for attachments/voice notes. | Keeps large blobs out of the DB; lifecycle to cold storage. |
| SNS + APNs/FCM | Mobile push fan-out (see glossary). | Offline users don’t hold WebSockets; push wakes device for new messages. |
Shared definitions: 00-glossary-common-services.md
Low-level design
WebSocket gateway
- Sticky sessions at the load balancer pin a client to the same gateway pod (or use Redis pub/sub for cross-pod fan-out).
- Heartbeat to detect dead TCP; reconnect with exponential backoff on client.
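The client-side reconnect schedule can be sketched as exponential backoff with full jitter (the base/cap/attempt values are illustrative defaults, not part of the design):

```python
import random

def backoff_delays(base=0.5, cap=30.0, attempts=6, rng=random.random):
    """Exponential backoff with full jitter for client reconnects.

    Full jitter (delay = random() * ceiling) spreads reconnect storms
    when a gateway node dies and thousands of clients retry at once.
    """
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng() * ceiling)
    return delays
```

Passing a fixed `rng` makes the schedule deterministic for tests; in production the default `random.random` is used.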
Message ordering
- Per-conversation sequence numbers: keying the Kafka topic by conversation_id routes all of a conversation's messages to a single partition, which guarantees order within the log.
- Clients dedupe by message_id.
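Both ideas can be sketched as below, with CRC32 standing in for Kafka's murmur2 key hash and an in-memory set standing in for client-side dedupe state:

```python
import zlib

def partition_for(conversation_id, num_partitions):
    """All messages keyed by the same conversation_id land on one partition,
    so the log preserves per-conversation order. (Kafka's default partitioner
    uses murmur2 on the key; CRC32 is a stable stand-in here.)"""
    return zlib.crc32(conversation_id.encode()) % num_partitions


class Deduper:
    """Client-side dedupe: at-least-once delivery becomes effectively-once."""
    def __init__(self):
        self._seen = set()

    def accept(self, message_id):
        """Return True the first time a message_id is seen, False on replays."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True
```

A production client would bound the seen-set (e.g. keep only IDs newer than the last acked sequence number) rather than grow it forever.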
Storage
- Hot messages: Redis recent window for speed.
- Cold history: S3 + Parquet for cheap archive, or DB partitions by month.
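The hot window can be sketched as a bounded per-conversation buffer, an in-memory analogue of the Redis LPUSH + LTRIM pattern (the window size is an illustrative default):

```python
from collections import deque

class RecentWindow:
    """Per-conversation hot cache: keeps only the last `size` messages,
    like LPUSH + LTRIM on a Redis list. Older history falls through to
    the cold store (DB partitions or S3/Parquet)."""
    def __init__(self, size=50):
        self._msgs = deque(maxlen=size)   # deque evicts oldest automatically

    def append(self, msg):
        self._msgs.append(msg)

    def latest(self, n):
        """Most recent n messages, oldest first."""
        return list(self._msgs)[-n:]
```

A conversation-open request serves `latest(n)` from this cache and only pages into cold storage when the user scrolls further back.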
Search (Slack-like)
- OpenSearch (Elasticsearch) indexes messages asynchronously from Kafka.
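A sketch of what the async indexer emits: an OpenSearch/Elasticsearch `_bulk` request body (NDJSON: one action line plus one document line per message). The index name and document fields are illustrative assumptions:

```python
import json

def bulk_payload(messages, index="messages"):
    """Build an OpenSearch/Elasticsearch _bulk body from message events
    consumed off Kafka. Each message becomes an action line + a doc line;
    the body must end with a trailing newline per the bulk API."""
    lines = []
    for m in messages:
        lines.append(json.dumps({"index": {"_index": index, "_id": m["message_id"]}}))
        lines.append(json.dumps({
            "conversation_id": m["conversation_id"],
            "sender": m["sender"],
            "text": m["text"],
        }))
    return "\n".join(lines) + "\n"
```

Using message_id as `_id` makes indexing idempotent, so replaying the Kafka topic (e.g. to rebuild the index) is safe.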
Security
- E2E encryption (Signal model) changes design: server stores ciphertext only; key exchange out of band.
- If not E2E: TLS in transit, KMS at rest, field-level encryption for sensitive attachments metadata.
Identity
- Amazon Cognito / Auth0 for JWT; gateway validates JWT per connection upgrade.
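The gateway-side check at connection upgrade can be sketched as below. For brevity this verifies an HS256 token with a shared secret; real Cognito/Auth0 deployments would verify RS256 signatures against the IdP's published JWKS instead:

```python
import base64, hashlib, hmac, json, time

def _b64url_decode(part):
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def verify_jwt_hs256(token, secret, now=None):
    """Validate a JWT at WebSocket upgrade: check the HMAC-SHA256 signature
    and the `exp` claim. Returns the claims dict, or None to reject the
    upgrade with 401."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        return None                        # malformed token
    signing_input = (header_b64 + "." + payload_b64).encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        return None                        # bad signature
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) < (now if now is not None else time.time()):
        return None                        # expired
    return claims
```

Validating once per connection (not per message) keeps the hot path cheap; long-lived connections should be re-checked or closed when the token expires.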
E2E: send message in group
Tricky parts
| Problem | Solution |
|---|---|
| Ordering across servers | Single Kafka partition per conversation |
| Split brain presence | TTL keys in Redis; CRDTs optionally for multi-region merging |
| Large groups | Fan-out on read (pull model) instead of per-member write fan-out; ephemeral messages for mega channels |
| Delivery receipts | Separate small events topic; idempotent upserts |
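The idempotent receipt upsert from the table can be sketched as a monotonic state merge, where a dict stands in for the receipts table (the status ladder sent < delivered < read is an assumption from typical messenger semantics):

```python
RECEIPT_RANK = {"sent": 0, "delivered": 1, "read": 2}

def upsert_receipt(store, message_id, user_id, status):
    """Idempotent, monotonic receipt upsert: replayed or out-of-order events
    from the receipts topic never downgrade a 'read' back to 'delivered'.
    Returns the stored status after the merge."""
    key = (message_id, user_id)
    current = store.get(key)
    if current is None or RECEIPT_RANK[status] > RECEIPT_RANK[current]:
        store[key] = status
    return store[key]
```

Because the merge is take-the-max, consumers of the receipts topic can safely be at-least-once with no dedupe of their own.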
Caveats
- Backpressure: slow clients → bounded per-connection queues; drop or disconnect abusive clients.
- Compliance: FINRA / healthcare may require WORM storage and audit trails.
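The bounded per-connection queue from the backpressure caveat can be sketched as below (the limit and policy names are illustrative, not from the design):

```python
from collections import deque

class BoundedSendQueue:
    """Outbound queue for one WebSocket connection. When a slow client
    falls behind, either drop the oldest frames or flag the connection
    for disconnect so it cannot exhaust gateway memory."""
    def __init__(self, limit=1024, policy="drop_oldest"):
        self._q = deque()
        self.limit = limit
        self.policy = policy
        self.disconnect = False

    def push(self, frame):
        """Returns False when the frame was rejected (disconnect policy)."""
        if len(self._q) >= self.limit:
            if self.policy == "drop_oldest":
                self._q.popleft()          # shed load, keep newest frames
            else:                          # "disconnect": kill abusive client
                self.disconnect = True
                return False
        self._q.append(frame)
        return True

    def frames(self):
        return list(self._q)
```

Drop-oldest suits presence/typing events where only the latest state matters; message frames usually warrant the disconnect policy plus history resync on reconnect.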
Azure equivalents
- SignalR Service for WebSocket scale-out.
- Event Hubs instead of Kafka.
- Notification Hubs for push.