
Video Conferencing (Zoom-class)

Problem statement

Host webinars with 1,000+ participants, including screen sharing, recording, and breakout rooms, with low-latency media served from region-based media servers.

How it works

  • Use a WebRTC mesh only for tiny calls; use an SFU (Selective Forwarding Unit) for large calls. The SFU forwards each stream as-is and does not mix unless mixing is explicitly needed.
  • Signaling over WebSocket; TURN servers for NAT traversal.

Analogy: The SFU is a conference-call operator who passes each audio line through untouched, and only mixes everyone onto one tape if you ask for MCU mode (expensive mixing).
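To make the mesh-versus-SFU trade-off concrete, here is a minimal sketch of the connection arithmetic. The function names are illustrative, and the SFU figure assumes the simplest topology of one connection per participant to a single server.

```python
def mesh_connections(n: int) -> int:
    """Peer connections in a full mesh: one per pair of participants."""
    return n * (n - 1) // 2


def mesh_uplinks_per_client(n: int) -> int:
    """In a mesh, each client uploads its media once per other participant."""
    return max(n - 1, 0)


def sfu_connections(n: int) -> int:
    """With an SFU (assumed: single server), each client holds one connection."""
    return n


if __name__ == "__main__":
    for n in (4, 50, 1000):
        print(
            f"n={n}: mesh={mesh_connections(n)} connections, "
            f"{mesh_uplinks_per_client(n)} uplinks per client; "
            f"sfu={sfu_connections(n)} connections, 1 uplink per client"
        )
```

The per-client uplink count is usually what rules out mesh first: a participant's upstream bandwidth has to carry one copy of their media for every other peer.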

High-level design


Components explained — this design

| Component | What it is | Why we use it here |
| --- | --- | --- |
| Signaling WS cluster | SDP/ICE negotiation channel. | WebRTC still needs signaling even though media is routed peer-to-peer or through the SFU. |
| SFU media servers | Forwards RTP streams without mixing (usually). | Scales participant count better than mesh, which needs O(N²) connections. |
| TURN (coturn / Twilio) | Relays media when UDP is blocked by NAT/firewall. | Completes connectivity; 443/TLS mode for strict networks. |
| Recorder → S3 | Captures mixed or per-track recording. | Compliance / replay; CPU-heavy, so it runs in an isolated autoscaled pool. |
| Room state (Redis) | Ephemeral membership, speaker flags. | Fast updates; TTL cleans up crashed rooms. |

Shared definitions: 00-glossary-common-services.md

Low-level design

Selective forwarding

  • Each client uploads simulcast layers (low/mid/high); the SFU picks a layer for each subscriber based on that subscriber's downstream bandwidth estimate.
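The per-subscriber layer choice can be sketched as below. This is a simplified illustration, not how any particular SFU does it: the layer bitrates and the `headroom` safety factor are assumed values, and a real SFU would also smooth the estimate and wait for a keyframe before switching.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Layer:
    name: str
    bitrate_kbps: int


# Hypothetical bitrates; real values depend on codec and resolution.
LAYERS = (Layer("low", 150), Layer("mid", 500), Layer("high", 1500))


def pick_layer(
    estimate_kbps: float,
    layers: Sequence[Layer] = LAYERS,
    headroom: float = 0.8,
) -> Layer:
    """Return the highest-bitrate layer that fits within the estimate.

    `headroom` (an assumption) reserves part of the estimate for audio,
    retransmissions, and estimate error. If nothing fits, fall back to
    the lowest layer rather than sending nothing.
    """
    budget = estimate_kbps * headroom
    ordered = sorted(layers, key=lambda layer: layer.bitrate_kbps)
    chosen = ordered[0]
    for layer in ordered:
        if layer.bitrate_kbps <= budget:
            chosen = layer
    return chosen


if __name__ == "__main__":
    for estimate in (100, 1000, 1800, 2000):
        print(estimate, "kbps ->", pick_layer(estimate).name)
```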

Recording

  • A compositor service (headless FFmpeg / GStreamer) subscribes to the room like a regular client and writes HLS chunks to S3. Egress costs are heavy.

Scale-out

  • Each room is pinned (sticky affinity) to one SFU cluster shard; route media cross-region only if compliance allows.
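One way to get sticky room-to-shard affinity without a lookup table is rendezvous (highest-random-weight) hashing. The notes do not say which scheme is used, so treat this as one possible sketch; the shard names, region keys, and the `allow_cross_region` flag are all hypothetical.

```python
import hashlib
from typing import Dict, Sequence


def _score(room_id: str, shard: str) -> int:
    """Deterministic pseudo-random weight for a (room, shard) pair."""
    digest = hashlib.sha256(f"{room_id}|{shard}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_shard(room_id: str, shards: Sequence[str]) -> str:
    """Rendezvous hashing: the shard with the highest score owns the room."""
    if not shards:
        raise ValueError("no SFU shards available")
    return max(shards, key=lambda shard: _score(room_id, shard))


def pick_shard_for_room(
    room_id: str,
    home_region: str,
    shards_by_region: Dict[str, Sequence[str]],
    allow_cross_region: bool = False,
) -> str:
    """Prefer a shard in the room's home region.

    Falls back to other regions only when the caller states that
    compliance allows it (hypothetical flag).
    """
    local = shards_by_region.get(home_region, ())
    if local:
        return pick_shard(room_id, local)
    if not allow_cross_region:
        raise LookupError(
            f"no shard in {home_region} and cross-region is not allowed"
        )
    remote = [
        shard
        for region, group in shards_by_region.items()
        if region != home_region
        for shard in group
    ]
    return pick_shard(room_id, remote)
```

With this scheme every signaling node computes the same owner for a room, and removing a shard only moves the rooms that shard owned.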

Security

  • DTLS-SRTP encrypts media in transit; end-to-end encryption is optional (WebRTC insertable streams, experimental).
  • Waiting-room and host-kick state is kept in Redis.
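The waiting-room and kick state can be modeled as a few sets plus an expiry time. The sketch below is an in-memory stand-in for illustration only: the class, method names, and key layout are assumptions, and a real deployment would keep the same sets in Redis with a TTL on the room's keys so that crashed rooms clean themselves up.

```python
import time
from typing import Callable, Set


class RoomState:
    """In-memory stand-in for per-room state (hypothetical layout)."""

    def __init__(
        self,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = clock() + ttl_seconds
        self.waiting: Set[str] = set()
        self.admitted: Set[str] = set()
        self.kicked: Set[str] = set()

    def touch(self) -> None:
        """Heartbeat from the SFU: push the expiry forward."""
        self._expires_at = self._clock() + self._ttl

    def expired(self) -> bool:
        """True once no heartbeat has arrived within the TTL."""
        return self._clock() >= self._expires_at

    def request_join(self, user: str) -> str:
        """Kicked users are refused; everyone else waits for the host."""
        if user in self.kicked:
            return "denied"
        if user in self.admitted:
            return "admitted"
        self.waiting.add(user)
        return "waiting"

    def admit(self, user: str) -> None:
        """Host action: move a user from the waiting room into the call."""
        if user in self.waiting:
            self.waiting.remove(user)
            self.admitted.add(user)

    def kick(self, user: str) -> None:
        """Host action: remove a user and block rejoin for this room."""
        self.waiting.discard(user)
        self.admitted.discard(user)
        self.kicked.add(user)
```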

End-to-end flow: join meeting


Tricky parts

| Problem | Solution |
| --- | --- |
| UDP blocked networks | TURN over TLS 443 |
| Clock sync for recording | NTP + RTCP SR timestamps |
| CPU hotspots | Per-core SFU pinning; autoscale on packet rate metrics |
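For the clock-sync row: an RTCP Sender Report pairs an NTP wall-clock time with the RTP timestamp of the same instant, so a recorder can place any RTP packet on a shared timeline. A minimal sketch of that mapping follows; the function names are illustrative, and the clock rate must match the stream (commonly 90000 for video and 48000 for Opus audio).

```python
NTP_UNIX_EPOCH_OFFSET = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01


def ntp_to_unix(ntp_seconds: int, ntp_fraction: int) -> float:
    """Convert the 64-bit NTP timestamp from an SR to Unix seconds."""
    return (ntp_seconds - NTP_UNIX_EPOCH_OFFSET) + ntp_fraction / 2**32


def rtp_delta(rtp_ts: int, ref_rtp_ts: int) -> int:
    """Signed difference between two 32-bit RTP timestamps.

    RTP timestamps wrap at 2**32, so the difference is interpreted
    modulo 2**32 and mapped into the range [-2**31, 2**31).
    """
    diff = (rtp_ts - ref_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return diff


def rtp_to_wallclock(
    rtp_ts: int, sr_rtp_ts: int, sr_unix_time: float, clock_rate: int
) -> float:
    """Map a packet's RTP timestamp to wall-clock time using the latest SR."""
    return sr_unix_time + rtp_delta(rtp_ts, sr_rtp_ts) / clock_rate
```

Applying the same mapping to each participant's audio and video tracks gives the compositor a common timeline, provided the senders' clocks are themselves NTP-disciplined.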

Caveats

  • Free tier abuse: CAPTCHA on room creation; rate limits on anonymous meetings.
  • Legal: recording consent banners; data localization for government contracts.
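The rate limit on anonymous meeting creation could be implemented in many ways; a token bucket per client is one common choice. The sketch below is illustrative only: the capacity, refill rate, and the idea of keying by client IP are assumptions, and a multi-node deployment would need shared state rather than a local dictionary.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        """Spend one token if available; refill based on elapsed time first."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(
            float(self.capacity), self._tokens + elapsed * self.refill_per_second
        )
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


class RoomCreationLimiter:
    """One bucket per client key (for example an IP address; hypothetical)."""

    def __init__(
        self,
        capacity: int = 3,
        refill_per_second: float = 1 / 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._capacity = capacity
        self._refill = refill_per_second
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_key: str) -> bool:
        """Return True if this client may create another room right now."""
        bucket = self._buckets.get(client_key)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill, self._clock)
            self._buckets[client_key] = bucket
        return bucket.allow()
```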

Managed

  • Amazon Chime SDK, Twilio Programmable Video, Azure Communication Services Calling.