Collaborative Document Editor (Google Docs–class)
Problem statement
Multiple users edit rich text concurrently with low latency, eventual consistency, undo, and offline support.
How it works
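Core idea: Operational Transformation (OT) or CRDTs (Yjs, Automerge). Each keystroke becomes an operation. With OT, the server transforms concurrent ops against each other; with CRDTs, replicas merge ops deterministically in any order. Either way, everyone converges to the same document.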
Analogy: Two people edit the same Wikipedia paragraph, and the wiki engine merges their edits so that neither one is silently overwritten. CRDTs automate that merge mathematically.
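To make the transform step concrete, here is a minimal sketch of OT for two concurrent inserts on plain text. The `Insert` type, the `site` tie-breaker, and the `transform` helper are illustrative assumptions, not the API of any particular OT library; real editors also handle deletes, rich-text attributes, and server-assigned revisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # index in the document where the text goes
    text: str
    site: str  # hypothetical client id, used only to break position ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can run after `against` has already been applied."""
    shifts = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if shifts:
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


base = "helo"
a = Insert(3, "l", "alice")  # alice fixes the typo
b = Insert(4, "!", "bob")    # bob appends punctuation at the same time

# Each side applies its own op first, then the transformed remote op.
doc_alice = apply(apply(base, a), transform(b, a))
doc_bob = apply(apply(base, b), transform(a, b))
print(doc_alice, doc_bob)
```

Both replicas end up with the same text even though they applied the two operations in opposite orders, which is the convergence property the section describes.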
High-level design
Components explained — this design
| Component | What it is | Why we use it here |
|---|---|---|
| Browser editor | Runs CRDT/OT client; captures ops. | Edits apply locally before the server acknowledges them, which keeps typing responsive. |
| WebSocket gateway | Maintains persistent connection per doc/session. | Low-latency op broadcast; horizontal scale via Redis pub/sub between gateway instances. |
| Document service + CRDT engine | Validates and orders / merges operations. | Authority on conflict resolution; persists to log. |
| Kafka op log | Durable ordered stream per document_id partition. | Enables replay for snapshots, new replicas, and audit. |
| Snapshotter → S3 | Periodic compacted state. | Avoids replaying from t=0 on cold start; bounded recovery time. |
| Presence Redis | Ephemeral cursors/selections. | Not worth a durable DB; key TTLs clean up entries after a disconnect. |
Shared definitions: 00-glossary-common-services.md
Low-level design
CRDT vs OT
| Approach | Pros | Cons |
|---|---|---|
| CRDT | Offline-first, simpler server | Larger metadata; garbage collection |
| OT | Smaller messages historically | Central server transform complexity |
Modern default: Yjs CRDT synced through a WebSocket server that acts as the authority, with optional WebRTC peer-to-peer sync.
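The trade-offs in the table can be seen in a toy sequence CRDT. The sketch below is an assumption-laden illustration, not how Yjs or Automerge work internally: each character carries a hypothetical `(fraction, site)` identifier, deletes are kept as tombstones, and merging is a plain set union. A production CRDT uses richer identifiers so that it can always insert between any two neighbours.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Replica:
    site: str
    chars: dict = field(default_factory=dict)     # (position, site) -> character
    tombstones: set = field(default_factory=set)  # ids of deleted characters

    def insert(self, position: Fraction, ch: str):
        char_id = (position, self.site)  # unique as long as a site never reuses a position
        self.chars[char_id] = ch
        return char_id

    def delete(self, char_id) -> None:
        self.tombstones.add(char_id)  # metadata stays until garbage-collected

    def merge(self, other: "Replica") -> None:
        # Union is commutative, associative, and idempotent, so order does not matter.
        self.chars.update(other.chars)
        self.tombstones |= other.tombstones

    def text(self) -> str:
        return "".join(
            self.chars[cid] for cid in sorted(self.chars) if cid not in self.tombstones
        )


alice, bob = Replica("alice"), Replica("bob")
alice.insert(Fraction(1, 4), "c")
t_id = alice.insert(Fraction(3, 4), "t")
bob.merge(alice)                  # both replicas now read "ct"

alice.insert(Fraction(1, 2), "a")  # concurrent edit on alice: "cat"
bob.delete(t_id)                   # concurrent edits on bob: drop "t", add "r"
bob.insert(Fraction(7, 8), "r")

alice.merge(bob)
bob.merge(alice)
print(alice.text(), bob.text())
```

The server never has to transform anything here, which is the "simpler server" benefit. The cost is visible too: every character carries an identifier, and the tombstone set only grows until something compacts it.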
Persistence
- Operation log in Kafka (partition = document_id), which gives a per-document total order when one is needed.
- Periodic snapshots to S3 plus a checkpoint in PostgreSQL for fast cold start.
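The snapshot-plus-log recovery path can be sketched as follows. This is an in-memory stand-in under stated assumptions: a Python list plays the role of one Kafka partition, the `Snapshot` object plays the role of the S3 blob plus its PostgreSQL checkpoint, and the op shape (`type`, `pos`, `text`, `len`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    text: str
    offset: int  # last log offset folded into `text`; -1 means nothing applied yet


def apply_op(text: str, op: dict) -> str:
    if op["type"] == "insert":
        return text[:op["pos"]] + op["text"] + text[op["pos"]:]
    if op["type"] == "delete":
        return text[:op["pos"]] + text[op["pos"] + op["len"]:]
    raise ValueError(f"unknown op type: {op['type']!r}")


def take_snapshot(log: list, upto: int) -> Snapshot:
    """Compact every op up to and including offset `upto` into one state."""
    text = ""
    for op in log[: upto + 1]:
        text = apply_op(text, op)
    return Snapshot(text, upto)


def cold_start(snapshot: Snapshot, log: list) -> str:
    """Load the snapshot, then replay only the ops recorded after its checkpoint."""
    text = snapshot.text
    for op in log[snapshot.offset + 1:]:
        text = apply_op(text, op)
    return text


log = [
    {"type": "insert", "pos": 0, "text": "hello"},
    {"type": "insert", "pos": 5, "text": " world"},
    {"type": "delete", "pos": 0, "len": 1},
    {"type": "insert", "pos": 0, "text": "H"},
]
snap = take_snapshot(log, upto=1)
print(cold_start(snap, log))
```

Recovery cost is then proportional to the number of ops since the last checkpoint rather than to the full history of the document.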
Presence
- Redis pub/sub per document room for cursor positions (ephemeral).
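The ephemeral nature of presence can be illustrated with a small TTL store. This is an in-memory sketch that only mimics the expiry behaviour one would get from Redis key TTLs; the `PresenceStore` class, its method names, and the 10-second TTL are assumptions for illustration, and no Redis client is involved.

```python
import time


class PresenceStore:
    def __init__(self, ttl_seconds: float = 30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # (doc_id, user_id) -> (cursor, expires_at)

    def heartbeat(self, doc_id: str, user_id: str, cursor: int) -> None:
        """Record a cursor position and push the expiry forward."""
        self._entries[(doc_id, user_id)] = (cursor, self._clock() + self._ttl)

    def active(self, doc_id: str) -> dict:
        """Return live cursors for one document, dropping expired entries."""
        now = self._clock()
        self._entries = {k: v for k, v in self._entries.items() if v[1] > now}
        return {
            user: cursor
            for (doc, user), (cursor, _) in self._entries.items()
            if doc == doc_id
        }


now = [0.0]  # fake clock so the example is deterministic
store = PresenceStore(ttl_seconds=10, clock=lambda: now[0])

store.heartbeat("doc-1", "alice", 7)
now[0] = 6.0
store.heartbeat("doc-1", "bob", 42)
before = store.active("doc-1")

now[0] = 11.0  # alice stopped sending heartbeats, so her entry has expired
after = store.active("doc-1")
print(before, after)
```

A client that disconnects simply stops sending heartbeats, and its cursor disappears without any explicit cleanup call.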
AuthZ
- Document ACL in PostgreSQL; Cognito groups map to editor / viewer.
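A permission check along these lines might look like the sketch below. The ACL shape (group name mapped to a role for one document), the group names, and the `can` helper are hypothetical; the real PostgreSQL schema and Cognito claims format are not specified here.

```python
# Hypothetical role-to-permission mapping for the two roles named above.
ROLE_PERMISSIONS = {
    "editor": {"read", "write"},
    "viewer": {"read"},
}


def can(acl: dict, user_groups: list, action: str) -> bool:
    """Return True if any of the user's groups grants `action` on this document.

    `acl` maps a group name to a role for a single document, as it might be
    loaded from an ACL table; unknown groups and roles grant nothing.
    """
    for group in user_groups:
        role = acl.get(group)
        if action in ROLE_PERMISSIONS.get(role, set()):
            return True
    return False


doc_acl = {"docs-editors": "editor", "docs-viewers": "viewer"}

print(can(doc_acl, ["docs-viewers"], "read"))
print(can(doc_acl, ["docs-viewers"], "write"))
print(can(doc_acl, ["docs-viewers", "docs-editors"], "write"))
```

The check belongs on the server side of the WebSocket connection, so that a viewer's socket can receive ops but any op it sends is rejected.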
E2E: two users type concurrently
Tricky parts
| Problem | Solution |
|---|---|
| Large documents | Sub-CRDT per paragraph; lazy load |
| Malicious payload | Schema validation on ops; size caps |
| Undo across peers | Per-client undo stack that reverts only that client's own ops |
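For the malicious-payload row, server-side validation could be sketched as below. The op shape, the field names, and the 10,000-character cap are assumptions chosen for illustration; a real deployment would validate whatever wire format its CRDT or OT library uses.

```python
MAX_INSERT_CHARS = 10_000  # hypothetical cap on a single insert


def _is_index(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def validate_op(op, doc_len: int) -> list:
    """Return a list of problems with `op`; an empty list means it is acceptable."""
    if not isinstance(op, dict):
        return ["op must be an object"]
    errors = []
    pos = op.get("pos")
    if not _is_index(pos) or pos > doc_len:
        errors.append("pos out of range")
    kind = op.get("type")
    if kind == "insert":
        text = op.get("text")
        if not isinstance(text, str) or not text:
            errors.append("insert needs non-empty text")
        elif len(text) > MAX_INSERT_CHARS:
            errors.append("insert too large")
    elif kind == "delete":
        length = op.get("len")
        if not _is_index(length) or length == 0:
            errors.append("delete needs a positive len")
        elif _is_index(pos) and pos + length > doc_len:
            errors.append("delete runs past end of document")
    else:
        errors.append("unknown op type")
    return errors


print(validate_op({"type": "insert", "pos": 2, "text": "hi"}, doc_len=5))
print(validate_op({"type": "delete", "pos": 4, "len": 3}, doc_len=5))
```

Rejecting an op before it reaches the log matters because anything written to the op log will be replayed to every future reader of the document.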
Caveats
- True WYSIWYG with comments and suggestions adds a great deal of complexity; phase these features in.
- Export to Word/PDF is a separate render farm (headless Chromium or Pandoc).
Managed
- Use the Microsoft 365 / Google Workspace APIs when embedding an existing editor rather than building from scratch.