Team Collaboration (Slack-like)
Problem statement
Support workspaces, channels, threads, DMs, search, file uploads, integrations (bots), and enterprise SSO at large-organization scale.
How it works
- Real-time: WebSocket gateway + channel fan-out via pub/sub (Redis or a dedicated messaging service such as NATS).
- Persistence: messages partitioned by channel_id + time; search index built asynchronously.
Analogy: Office building intercom per floor (channel) with recording (history) and archive room (search).
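The channel_id + time partitioning mentioned under "How it works" can be sketched as follows. This is a minimal illustration, not a schema from any particular database: the 7-day bucket width and the names `partition_key` and `buckets_for_range` are assumptions made for the example.

```python
from datetime import datetime, timezone

BUCKET_DAYS = 7  # assumed bucket width; tune so a partition stays bounded in size
SECONDS_PER_DAY = 86_400


def _bucket(ts: datetime) -> int:
    """Map a timestamp to an integer time bucket (whole days since epoch // width)."""
    return int(ts.timestamp()) // SECONDS_PER_DAY // BUCKET_DAYS


def partition_key(channel_id: str, sent_at: datetime) -> tuple[str, int]:
    """Composite partition key: one partition per channel per time bucket."""
    return (channel_id, _bucket(sent_at))


def buckets_for_range(start: datetime, end: datetime) -> list[int]:
    """Buckets a history read has to visit, newest first (chat scrolls backwards)."""
    return list(range(_bucket(end), _bucket(start) - 1, -1))


jan1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
jan2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
jan10 = datetime(2024, 1, 10, tzinfo=timezone.utc)
```

Bucketing by time keeps a busy channel from growing one unbounded partition, while a read of recent history still touches only one or two partitions.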
High-level design
Components explained — this design
| Component | What it is | Why we use it here |
|---|---|---|
| WebSocket gateway fleet | Stateful connection tier. | Push model for chat; scales with connection count, not just RPS. |
| Redis pub/sub or NATS | Cross-gateway fan-out of messages. | If the recipient's socket is on a different gateway (say gateway2) than the sender's, publishing to the channel topic still delivers the message without relying on sticky routing alone. |
| Message service + Cassandra/DynamoDB | Durable chat history partitioned by channel. | Write-heavy, append-only logs suit NoSQL stores that handle wide partitions well. |
| Kafka → OpenSearch | Async indexing for search. | Decouples search lag from message send latency. |
| S3 attachments | File blobs out of DB. | Cost + size management; virus scan async. |
| Okta SAML → Cognito | Enterprise SSO exchanged for JWTs that services validate. | B2B requirement; centralizes group → channel ACL mapping. |
Shared definitions: 00-glossary-common-services.md
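To make the cross-gateway fan-out row concrete, here is a small in-process sketch. `InMemoryBroker` is a hypothetical stand-in for Redis pub/sub or NATS, and `Gateway` models one WebSocket node; neither reflects a real client library API. As a simplification, the sender is not echoed its own message.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str, dict], None]


class InMemoryBroker:
    """Stand-in for Redis pub/sub or NATS: topic -> subscriber callbacks."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in list(self._subs[topic]):
            handler(topic, message)


class Gateway:
    """One gateway node; tracks which users have local sockets per channel."""

    def __init__(self, name: str, broker: InMemoryBroker) -> None:
        self.name = name
        self.broker = broker
        self.local: dict[str, set[str]] = defaultdict(set)
        self.delivered: list[tuple[str, str]] = []  # (user_id, text) pushed to sockets

    def connect(self, user_id: str, channel_id: str) -> None:
        # Subscribe to the channel topic once per gateway, on first local member.
        if not self.local[channel_id]:
            self.broker.subscribe(f"channel:{channel_id}", self._on_message)
        self.local[channel_id].add(user_id)

    def _on_message(self, topic: str, message: dict) -> None:
        channel_id = topic.split(":", 1)[1]
        for user_id in sorted(self.local[channel_id]):
            if user_id != message["sender"]:
                self.delivered.append((user_id, message["text"]))

    def send(self, sender: str, channel_id: str, text: str) -> None:
        # The sending gateway does not need to know where recipients are connected.
        self.broker.publish(f"channel:{channel_id}", {"sender": sender, "text": text})


broker = InMemoryBroker()
gateway1, gateway2 = Gateway("gateway1", broker), Gateway("gateway2", broker)
gateway1.connect("alice", "general")
gateway2.connect("bob", "general")
gateway2.connect("carol", "general")
gateway1.send("alice", "general", "hi")
```

Alice is connected to gateway1, yet Bob and Carol on gateway2 receive the message because both gateways subscribe to the same channel topic.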
Low-level design
Message ordering
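- Per-channel sequence numbers come from a single Kafka partition per channel or a per-channel DB serial. Hot channels can be mitigated by sharding a channel across logical partitions; since ordering is only guaranteed within a partition, reads then have to merge the shards back into one order.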
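A per-channel sequence can be sketched with a single in-memory allocator. This assumes one sequencer owns each channel (the unsharded case); `ChannelSequencer` and `order_for_display` are illustrative names, and a real deployment would back the counter with the Kafka partition offset or a database serial rather than process memory.

```python
import threading
from collections import defaultdict


class ChannelSequencer:
    """Hands out strictly increasing sequence numbers, independently per channel."""

    def __init__(self) -> None:
        self._last: dict[str, int] = defaultdict(int)
        self._lock = threading.Lock()  # serialize concurrent writers

    def assign(self, channel_id: str) -> int:
        with self._lock:
            self._last[channel_id] += 1
            return self._last[channel_id]


def order_for_display(messages: list[dict]) -> list[dict]:
    """Clients sort by seq, so arrival order over the network does not matter."""
    return sorted(messages, key=lambda m: m["seq"])


sequencer = ChannelSequencer()
first = {"text": "hello", "seq": sequencer.assign("C1")}
second = {"text": "world", "seq": sequencer.assign("C1")}
other_channel_seq = sequencer.assign("C2")
ordered = order_for_display([second, first])  # delivered out of order
```

Because each channel has its own counter, a client that sees a gap in `seq` can also tell it missed a message and refetch history.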
Presence & typing
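- Ephemeral keys in Redis with a TTL; each client heartbeat refreshes the key, and an expired key means the user is offline (or has stopped typing).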
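The TTL heartbeat pattern can be illustrated without a Redis server. `PresenceStore` below is a hypothetical in-memory stand-in for keys written with an expiry (in Redis, roughly a `SET` with the `EX` option on each heartbeat); the 30-second TTL and the `presence:` key prefix are assumptions for the example. The clock is injectable so the expiry logic can be exercised deterministically.

```python
import time
from typing import Callable


class PresenceStore:
    """In-memory stand-in for Redis keys that expire after a TTL."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._expires_at: dict[str, float] = {}

    def heartbeat(self, user_id: str) -> None:
        # Each heartbeat rewrites the key and pushes its expiry forward.
        self._expires_at[f"presence:{user_id}"] = self.clock() + self.ttl

    def is_online(self, user_id: str) -> bool:
        key = f"presence:{user_id}"
        expires = self._expires_at.get(key)
        if expires is None or expires <= self.clock():
            self._expires_at.pop(key, None)  # lazily drop the expired key
            return False
        return True


now = [0.0]  # fake clock for the example
store = PresenceStore(ttl_seconds=30.0, clock=lambda: now[0])
store.heartbeat("u1")
online_at_start = store.is_online("u1")
now[0] = 29.0
store.heartbeat("u1")          # refresh just before expiry; new expiry is 59.0
now[0] = 58.0
online_after_refresh = store.is_online("u1")
now[0] = 59.0
online_after_silence = store.is_online("u1")
```

No explicit "went offline" event is required: if heartbeats stop, the key simply ages out.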
Enterprise features
- eDiscovery export to S3 with legal hold flags in PostgreSQL metadata.
- DLP scanning of attachments via Macie / Microsoft Purview connectors.
Integrations
- Outgoing webhooks + incoming slash commands behind signed requests (HMAC).
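Signed requests can be verified with Python's standard `hmac` module. The signing scheme here (HMAC-SHA256 over `timestamp.body`, with a 5-minute replay window) is an assumption chosen for illustration, not the format of any specific platform; `sign` and `verify` are hypothetical helper names.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Hex HMAC-SHA256 over 'timestamp.body' (assumed payload layout)."""
    payload = str(timestamp).encode() + b"." + body
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp: int, body: bytes, signature: str,
           now: Optional[int] = None, max_skew: int = 300) -> bool:
    """Reject stale timestamps (replay protection), then compare signatures."""
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > max_skew:
        return False
    expected = sign(secret, timestamp, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, signature)


secret = b"app-signing-secret"          # hypothetical per-app secret
body = b'{"command": "/deploy", "text": "staging"}'
sent_at = 1_700_000_000
signature = sign(secret, sent_at, body)

valid = verify(secret, sent_at, body, signature, now=sent_at + 10)
tampered = verify(secret, sent_at, body + b" ", signature, now=sent_at + 10)
replayed = verify(secret, sent_at, body, signature, now=sent_at + 3600)
```

Including the timestamp in the signed payload means a captured request cannot be replayed later with a fresh timestamp.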
E2E: post message to channel
Tricky parts
| Problem | Solution |
|---|---|
| @here notification storms | Rate limits, plus a digest mode enabled by enterprise policy |
| Large attachments | Direct-to-S3 presigned uploads |
| Search lag | Near-real-time indexing; fetch highlighting in a separate query |
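One common way to implement the rate limit for @here storms is a token bucket per channel or per user. The sketch below is an assumption about how such a limit could look, not the document's prescribed mechanism; the capacity and refill rate are illustrative values, and the clock is injectable for deterministic checks.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows short bursts up to `capacity`, then refills at a steady rate."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.updated
        # Credit tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


now = [0.0]  # fake clock for the example
bucket = TokenBucket(capacity=2, refill_per_second=0.5, clock=lambda: now[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]  # third @here is throttled
now[0] = 2.0                                               # 2 s * 0.5 = one new token
after_wait = [bucket.allow(), bucket.allow()]
```

A throttled @here does not have to be dropped; it could instead be folded into the digest that the enterprise policy enables.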
Caveats
- Compliance: FINRA retention; EU data residency per workspace home region.
- Bot abuse: grant each app only the minimal OAuth scopes it needs.
Azure
- Azure Communication Services; the Microsoft Graph API for Teams if integrating with Teams rather than competing with it.