Team Collaboration (Slack-like)

Problem statement

Workspaces, channels, threads, DMs, search, file uploads, integrations (bots), and enterprise SSO at large org scale.

How it works

  • Real-time: WebSocket gateway + channel fan-out via pub/sub (Redis / dedicated service mesh).
  • Persistence: messages partitioned by channel_id + time; search is indexed asynchronously.
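
As a rough illustration of partitioning by channel_id + time, the sketch below derives a partition key from a channel ID and a time bucket. The one-day bucket width, the `<channel_id>#<bucket>` key format, and the function name `partition_key` are assumptions made for this example; the design above does not specify them.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 24 * 60 * 60  # assumed bucket width: one day


def partition_key(channel_id: str, sent_at: datetime) -> str:
    """Return a hypothetical partition key of the form '<channel_id>#<bucket>'."""
    if sent_at.tzinfo is None:
        raise ValueError("sent_at must be timezone-aware")
    # Whole days since the Unix epoch; all messages in the same day share a bucket.
    bucket = int(sent_at.timestamp()) // BUCKET_SECONDS
    return f"{channel_id}#{bucket}"


morning = partition_key("C1", datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc))
evening = partition_key("C1", datetime(2024, 1, 1, 23, 59, tzinfo=timezone.utc))
next_day = partition_key("C1", datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc))
```

Here `morning` and `evening` land in the same partition while `next_day` starts a new one. Bounding a partition by time keeps long-lived channels from growing a single partition without limit, and reads of recent history only touch the newest buckets.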

Analogy: Office building intercom per floor (channel) with recording (history) and archive room (search).

High-level design

Components explained — this design

| Component | What it is | Why we use it here |
| --- | --- | --- |
| WebSocket gateway fleet | Stateful connection tier. | Push model for chat; scale with connection counts not just RPS. |
| Redis pub/sub or NATS | Cross-gateway fan-out of messages. | If user A’s socket is on gateway2, publish ensures delivery without sticky-only routing. |
| Message service + Cassandra/Dynamo | Durable chat history partitioned by channel. | Write-heavy append logs benefit from wide-partition NoSQL. |
| Kafka → OpenSearch | Async indexing for search. | Decouples search lag from message send latency. |
| S3 attachments | File blobs out of DB. | Cost + size management; virus scan async. |
| Okta SAML → Cognito | Enterprise SSO into JWT for services. | B2B requirement; centralizes group → channel ACL mapping. |
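
The cross-gateway fan-out can be sketched with an in-memory stand-in for the broker. This is a toy model, not Redis or NATS client code: `InMemoryBroker`, `Gateway`, and the per-user inbox lists (standing in for WebSocket connections) are all hypothetical names introduced for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

Handler = Callable[[str, str], None]


class InMemoryBroker:
    """Stand-in for Redis pub/sub or NATS: topic -> subscriber callbacks."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, payload: str) -> None:
        for handler in list(self._subs[topic]):
            handler(topic, payload)


class Gateway:
    """Hypothetical gateway node; local sockets are modeled as inbox lists."""

    def __init__(self, name: str, broker: InMemoryBroker) -> None:
        self.name = name
        self.broker = broker
        self.members: Dict[str, Set[str]] = defaultdict(set)    # channel -> local users
        self.inboxes: Dict[str, List[str]] = defaultdict(list)  # user -> delivered texts

    def connect(self, user: str, channel: str) -> None:
        # Subscribe to the channel topic once per gateway, on the first local member.
        if not self.members[channel]:
            self.broker.subscribe(channel, self._on_message)
        self.members[channel].add(user)

    def send(self, channel: str, text: str) -> None:
        # Publish rather than writing to local sockets, so other gateways see it too.
        self.broker.publish(channel, text)

    def _on_message(self, channel: str, text: str) -> None:
        for user in self.members[channel]:
            self.inboxes[user].append(text)


broker = InMemoryBroker()
gw1, gw2 = Gateway("gateway1", broker), Gateway("gateway2", broker)
gw1.connect("bob", "general")
gw2.connect("alice", "general")
gw1.send("general", "hello")  # sent via gateway1, delivered on both gateways
```

Because each gateway subscribes per channel rather than per user, the broker only needs to know which gateways care about a channel, and the sender's gateway never has to know where a recipient's socket lives.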

Shared definitions: 00-glossary-common-services.md

Low-level design

Message ordering

  • Per-channel sequence number from a single Kafka partition or a per-channel DB serial. Hot channels are mitigated by sharding the channel across logical partitions.
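
A minimal sketch of the sequencing idea, using an in-memory counter in place of a Kafka offset or DB serial. The `Sequencer` class, the `logical_partition` helper, and the choice of a thread ID as the shard key for hot channels are assumptions for illustration; the notes above do not say which key a hot channel is sharded on.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Tuple


def logical_partition(channel_id: str, shard_key: str, num_partitions: int) -> int:
    """Map (channel, shard_key) to a stable logical partition index."""
    digest = hashlib.sha256(f"{channel_id}:{shard_key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class Sequencer:
    """In-memory stand-in for a per-partition counter (Kafka offset or DB serial)."""

    def __init__(self) -> None:
        self._last: Dict[Tuple[str, int], int] = defaultdict(int)

    def assign(self, channel_id: str, partition: int = 0) -> int:
        # Sequence numbers increase strictly within one (channel, partition) pair.
        key = (channel_id, partition)
        self._last[key] += 1
        return self._last[key]


seq = Sequencer()
first = seq.assign("general")   # 1
second = seq.assign("general")  # 2

# Hot channel: spread writes over 8 assumed logical partitions, keyed by thread.
p = logical_partition("announcements", "thread-42", 8)
hot_first = seq.assign("announcements", p)
```

The trade-off is that once a channel is split, ordering is only guaranteed inside each logical partition, so readers need a tie-breaker (for example a timestamp) to merge partitions into one view.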

Presence & typing

  • Ephemeral keys in Redis with a TTL, refreshed by client heartbeats.
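
The TTL-heartbeat pattern can be sketched without a Redis dependency. The `PresenceStore` class below is a hypothetical in-memory stand-in that mimics a key written with an expiry (in Redis, roughly `SET key value EX ttl` on every heartbeat); the 30-second TTL is an assumed value.

```python
import time
from typing import Callable, Dict


class PresenceStore:
    """In-memory stand-in for Redis keys that expire after a TTL."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._expires_at: Dict[str, float] = {}

    def heartbeat(self, user_id: str) -> None:
        # Each heartbeat rewrites the key and pushes its expiry forward.
        self._expires_at[user_id] = self.clock() + self.ttl

    def is_online(self, user_id: str) -> bool:
        expires = self._expires_at.get(user_id)
        if expires is None:
            return False
        if self.clock() >= expires:
            del self._expires_at[user_id]  # lazily drop the expired key
            return False
        return True
```

A user who stops sending heartbeats (closed laptop, dropped connection) simply ages out after the TTL, so no explicit "went offline" event is needed. The same shape works for typing indicators with a much shorter TTL.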

Enterprise features

  • eDiscovery export to S3 with legal hold flags in PostgreSQL metadata.
  • DLP scanning of attachments via Macie / Microsoft Purview connectors.

Integrations

  • Outgoing webhooks + incoming slash commands behind signed requests (HMAC).
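
One way to check HMAC-signed requests is sketched below with Python's standard `hmac` module. The base-string format (`<timestamp>:<body>`), the five-minute skew window, and the function names `sign` and `verify` are assumptions for this example, not a description of any particular platform's signing scheme.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Hex HMAC-SHA256 over '<timestamp>:<body>' (an assumed base-string format)."""
    message = str(timestamp).encode() + b":" + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp: int, body: bytes, signature: str,
           now: Optional[float] = None, max_skew_seconds: int = 300) -> bool:
    """Reject stale timestamps (replay protection), then compare in constant time."""
    current = time.time() if now is None else now
    if abs(current - timestamp) > max_skew_seconds:
        return False
    expected = sign(secret, timestamp, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())


secret = b"per-app-signing-secret"  # hypothetical shared secret
body = b'{"command": "/deploy", "text": "staging"}'
signature = sign(secret, 1_700_000_000, body)
ok = verify(secret, 1_700_000_000, body, signature, now=1_700_000_010)
```

Including the timestamp in the signed message means a captured request cannot be replayed later with a fresh timestamp, because changing the timestamp invalidates the signature.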

E2E: post message to channel


Tricky parts

| Problem | Solution |
| --- | --- |
| @here notification storms | Rate limits + digest mode enterprise policy |
| Large attachments | Direct-to-S3 presigned uploads |
| Search lag | Near-real-time index; highlighting separate query |
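
The notes do not say which rate-limiting algorithm applies to @here. As one possible shape, the sketch below uses a token bucket, a common choice that allows short bursts while capping the sustained rate. The class name, capacity, and refill rate are illustrative assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at a steady rate."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock  # injectable so tests do not need to sleep
        self.tokens = capacity
        self.updated_at = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.updated_at
        # Add tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical policy: a burst of 2 @here mentions, then 1 per minute, per channel.
here_limiter = TokenBucket(capacity=2, refill_per_second=1 / 60)
```

A caller would typically keep one bucket per workspace and channel. When `allow()` returns `False`, the mention could be folded into a digest instead of triggering an immediate push, which is where the digest-mode policy would plug in.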

Caveats

  • Compliance: FINRA retention; EU data residency per workspace home region.
  • Bot abuse: minimal OAuth scopes per app.

Azure

  • Azure Communication Services; Microsoft Graph API for Teams if integrating with Teams rather than competing with it.