Video Conferencing (Zoom-class)
Problem statement
Host webinars with 1000+ participants, including screen sharing, recording, and breakout rooms, while keeping media latency low by routing through region-based media servers.
How it works
- WebRTC mesh works only for tiny calls; large calls use an SFU (Selective Forwarding Unit), a server that forwards streams without mixing them (unless mixing is needed).
- Signaling runs over WebSocket; TURN servers relay media when NAT or a firewall blocks a direct path.
Analogy: A conference call operator (the SFU) passes audio lines through without mixing everyone into one tape, unless you ask for MCU (Multipoint Control Unit) mode, which does the expensive mixing.
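The mesh-versus-SFU trade-off above comes down to simple counting. The sketch below is illustrative only: the function names are hypothetical, and it assumes every participant publishes one stream and subscribes to everyone else, which is an upper bound for a real SFU.

```python
def mesh_counts(n: int) -> dict:
    """Full mesh: every pair of participants holds one peer connection."""
    return {
        "connections": n * (n - 1) // 2,  # number of unordered pairs
        "uplinks_per_client": n - 1,      # one copy of own media per peer
        "downlinks_per_client": n - 1,
    }


def sfu_counts(n: int) -> dict:
    """SFU: every participant holds a single connection to the server."""
    return {
        "connections": n,
        "uplinks_per_client": 1,          # single upload; the SFU fans it out
        "downlinks_per_client": n - 1,    # upper bound; an SFU may forward fewer
    }


for n in (4, 50, 1000):
    print(n, mesh_counts(n)["connections"], sfu_counts(n)["connections"])
# 4 6 4
# 50 1225 50
# 1000 499500 1000
```

The per-client uplink count is what rules out mesh first: at 50 participants each client would have to encode and upload 49 copies of its own media.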
High-level design
[Diagram: high-level design]
Components explained — this design
| Component | What it is | Why we use it here |
|---|---|---|
| Signaling WS cluster | SDP/ICE negotiation channel. | WebRTC still needs signaling even though media is peer/SFU routed. |
| SFU media servers | Forwards RTP streams without mixing (usually). | Scales participant count better than mesh, whose connection count grows as O(N²). |
| TURN (coturn / Twilio) | Relays media when NAT or a firewall blocks UDP. | Completes connectivity when no direct path exists; TLS on port 443 for strict networks. |
| Recorder → S3 | Captures mixed or per-track recording. | Compliance and replay; CPU-heavy, so it runs in an isolated, autoscaled pool. |
| Room state Redis | Ephemeral membership, speaker flags. | Fast updates; TTL cleans crashed rooms. |
Shared definitions: 00-glossary-common-services.md
Low-level design
Selective forwarding
- Each client uploads simulcast layers (low/mid/high); the SFU picks a layer for each subscriber based on that subscriber's downstream bandwidth estimate.
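A minimal sketch of the per-subscriber layer choice described above. The bitrate ladder, the `headroom` factor, and the names `Layer` and `pick_layer` are assumptions made for illustration; a production SFU would also smooth the estimate and avoid switching layers too often.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    rid: str           # simulcast stream identifier, e.g. "low"
    bitrate_kbps: int  # assumed target bitrate of the layer


# Hypothetical ladder; real values depend on codec and resolution.
LAYERS = [Layer("low", 150), Layer("mid", 500), Layer("high", 1500)]


def pick_layer(estimate_kbps: float, layers=LAYERS, headroom: float = 0.85) -> Layer:
    """Return the highest layer that fits within the bandwidth estimate.

    `headroom` leaves part of the estimate unused for audio and
    retransmissions (an assumed policy, not a standard value).
    """
    budget = estimate_kbps * headroom
    ordered = sorted(layers, key=lambda layer: layer.bitrate_kbps)
    chosen = ordered[0]  # always forward at least the lowest layer
    for layer in ordered:
        if layer.bitrate_kbps <= budget:
            chosen = layer
    return chosen
```

For example, `pick_layer(2000).rid` is `"high"`, while `pick_layer(1600).rid` drops to `"mid"` because the 85% budget (1360 kbps) no longer covers the 1500 kbps layer.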
Recording
- A compositor service (headless FFmpeg / GStreamer) subscribes to the room like a regular client and writes HLS chunks to S3; egress costs are heavy.
Scale-out
- Each room stays pinned (sticky) to one SFU cluster shard; media crosses regions only if compliance allows.
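One way to get sticky room-to-shard affinity without a lookup table is rendezvous (highest-random-weight) hashing. This is a swapped-in technique for illustration, not something the design above prescribes; the shard names and `pick_shard` are hypothetical, and the caller is assumed to pass only shards in regions that compliance permits.

```python
import hashlib
from typing import Sequence


def _score(room_id: str, shard: str) -> int:
    """Deterministic pseudo-random weight for a (shard, room) pair."""
    digest = hashlib.sha256(f"{shard}|{room_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_shard(room_id: str, shards: Sequence[str]) -> str:
    """Pick the shard with the highest score for this room.

    Every signaling node computes the same answer independently, and
    removing a shard only moves the rooms that were pinned to it.
    """
    if not shards:
        raise ValueError("no SFU shards available")
    return max(shards, key=lambda shard: _score(room_id, shard))


shards = ["sfu-eu-1", "sfu-eu-2", "sfu-eu-3"]  # hypothetical shard names
home = pick_shard("room-42", shards)
```

Because the result depends only on the room ID and the shard list, a reconnecting participant lands on the same shard as the rest of the room.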
Security
- DTLS-SRTP encrypts media in transit; end-to-end encryption is optional (WebRTC Insertable Streams, experimental).
- Waiting-room and host-kick state is kept in Redis.
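The waiting-room and kick rules can be sketched as a small state machine. The class below uses in-memory sets as a stand-in for Redis so it runs on its own; the class name, method names, and three-set layout are assumptions, and in Redis each set could map to one set key per room.

```python
class RoomState:
    """In-memory stand-in for Redis-backed room state (hypothetical layout)."""

    def __init__(self, host_id: str):
        self.host_id = host_id
        self.waiting = set()       # users parked in the waiting room
        self.admitted = {host_id}  # users allowed to receive media
        self.kicked = set()        # users barred from rejoining

    def request_join(self, user_id: str) -> str:
        if user_id in self.kicked:
            return "denied"
        if user_id in self.admitted:
            return "admitted"
        self.waiting.add(user_id)
        return "waiting"

    def admit(self, actor_id: str, user_id: str) -> bool:
        # Only the host may admit, and only users who are actually waiting.
        if actor_id != self.host_id or user_id not in self.waiting:
            return False
        self.waiting.discard(user_id)
        self.admitted.add(user_id)
        return True

    def kick(self, actor_id: str, user_id: str) -> bool:
        # Only the host may kick, and the host cannot kick themselves.
        if actor_id != self.host_id or user_id == self.host_id:
            return False
        self.waiting.discard(user_id)
        self.admitted.discard(user_id)
        self.kicked.add(user_id)
        return True
```

The signaling server would consult this state before completing SDP negotiation, so a waiting or kicked user never gets a media session.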
End-to-end flow: join meeting
[Diagram: join-meeting flow]
Tricky parts
| Problem | Solution |
|---|---|
| UDP blocked networks | TURN over TLS 443 |
| Clock sync for recording | NTP-synced senders + RTCP Sender Report (SR) timestamps to align tracks |
| CPU hotspots | Per-core SFU pinning; autoscale on packet rate metrics |
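For the clock-sync row, an RTCP Sender Report pairs an NTP wall-clock time with the RTP timestamp of the same instant, so a recorder can place any later RTP packet on a shared timeline. The sketch below shows that arithmetic under stated assumptions: the function names are hypothetical, and the clock rates in the example (90 kHz video, 48 kHz audio) are common defaults rather than values taken from this design.

```python
NTP_UNIX_OFFSET = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01


def ntp_to_unix(ntp_sec: int, ntp_frac: int) -> float:
    """Convert the 64-bit NTP timestamp of an SR to Unix seconds."""
    return (ntp_sec - NTP_UNIX_OFFSET) + ntp_frac / 2**32


def rtp_to_unix(rtp_ts: int, sr_rtp_ts: int, sr_unix: float, clock_rate: int) -> float:
    """Map an RTP timestamp to wall-clock time using the latest SR.

    RTP timestamps are 32-bit and wrap, so the difference is taken
    modulo 2**32 and then interpreted as a signed offset.
    """
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr_unix + delta / clock_rate


# A video SR says RTP timestamp 1000 corresponds to Unix time 100.5.
sr_unix = ntp_to_unix(NTP_UNIX_OFFSET + 100, 2**31)
video_time = rtp_to_unix(91_000, 1000, sr_unix, 90_000)  # one second later
```

Applying the same mapping to each audio and video track, each with its own SR, gives the compositor comparable timestamps for lip sync.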
Caveats
- Free-tier abuse: CAPTCHA on room creation; rate limits on anonymous meetings.
- Legal: recording consent banners; data localization for government contracts.
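The rate limit on anonymous meetings mentioned above could be implemented as a token bucket keyed per IP address or device. This is one common choice, not something the notes specify; the class name, capacity, and refill rate are assumptions, and the clock is injectable so the behavior can be checked deterministically.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, refilled at `refill_per_sec`."""

    def __init__(self, capacity: int, refill_per_sec: float, now=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._now = now                 # injectable clock for testing
        self._tokens = float(capacity)  # start with a full bucket
        self._last = now()

    def allow(self) -> bool:
        """Consume one token if available; return whether the request passes."""
        current = self._now()
        elapsed = current - self._last
        self._last = current
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_sec)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


# Hypothetical policy: bursts of 2 room creations, then one every 2 seconds.
clock = [0.0]
bucket = TokenBucket(capacity=2, refill_per_sec=0.5, now=lambda: clock[0])
results = [bucket.allow(), bucket.allow(), bucket.allow()]
```

Here `results` is `[True, True, False]`: the first two requests drain the bucket and the third is rejected until time advances.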
Managed
- Amazon Chime SDK, Twilio Programmable Video, Azure Communication Services Calling.