
Collaborative Document Editor (Google Docs–class)

Problem statement

Multiple users edit rich text concurrently with low latency, eventual consistency, undo, and offline support.

How it works

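Core idea: Operational Transformation (OT) or CRDTs (Yjs, Automerge). Each keystroke becomes an operation. With OT, the server transforms concurrent ops against each other; with CRDTs, ops merge deterministically in whatever order they arrive. Either way, every replica converges to the same document.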
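
To make the transform step concrete, here is a minimal sketch of OT for two concurrent inserts on a plain string. The `Insert` type, the `site` tie-breaker, and the function names are assumptions made for illustration; a real rich-text OT engine also handles deletes, formatting, and longer histories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # index in the document where the text goes
    text: str
    site: str  # hypothetical client id, used only to break ties


def apply(doc: str, op: Insert) -> str:
    """Apply an insert to a plain-string document."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` was already applied."""
    shifted = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if shifted:
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


base = "Hello world"
a = Insert(5, ",", "alice")   # Alice types a comma after "Hello"
b = Insert(11, "!", "bob")    # Bob appends "!" at the same time

# Each replica applies its own op first, then the transformed remote op.
alice_view = apply(apply(base, a), transform(b, a))
bob_view = apply(apply(base, b), transform(a, b))
print(alice_view, "|", bob_view)
```

Both replicas end with the same text even though they applied the two ops in opposite orders, which is the convergence property the design relies on.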

Analogy: Two people edit the same Wikipedia paragraph, and the wiki engine merges their edits so that neither is silently overwritten. CRDTs make that merge automatic and deterministic.

High-level design

[Diagram: high-level design; not rendered in this copy]

Components explained — this design

| Component | What it is | Why we use it here |
| --- | --- | --- |
| Browser editor | Runs CRDT/OT client; captures ops. | Responsiveness requires local merges before server ack. |
| WebSocket gateway | Maintains persistent connection per doc/session. | Low-latency op broadcast; horizontal scale via Redis pub/sub between gateway instances. |
| Document service + CRDT engine | Validates and orders / merges operations. | Authority on conflict resolution; persists to log. |
| Kafka op log | Durable ordered stream per document_id partition. | Enables replay for snapshots, new replicas, and audit. |
| Snapshotter → S3 | Periodic compacted state. | Avoids replaying from t=0 on cold start; bounded recovery time. |
| Presence Redis | Ephemeral cursors/selections. | Not worth durable DB; TTL matches disconnect cleanup. |

Shared definitions: 00-glossary-common-services.md

Low-level design

CRDT vs OT

| Approach | Pros | Cons |
| --- | --- | --- |
| CRDT | Offline-first, simpler server | Larger metadata; garbage collection |
| OT | Smaller messages historically | Central server transform complexity |

Modern default: Yjs (CRDT) synced through a WebSocket server that acts as the authority, with optional WebRTC peer-to-peer sync.
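
The CRDT trade-offs can be seen in a toy sequence CRDT. This is a sketch under simplifying assumptions, not how Yjs works internally: each character gets a unique id built from a fractional position, a site name, and a counter, and deletes leave tombstones. The class name `TinyTextCRDT` is hypothetical, and the sketch does not handle every tie case a production CRDT covers (for example, inserting between two characters that were concurrently given the same position).

```python
from fractions import Fraction
from itertools import count


class TinyTextCRDT:
    def __init__(self, site):
        self.site = site
        self._clock = count(1)
        self.chars = {}       # id -> character; id = (position, site, counter)
        self.deleted = set()  # tombstones: ids of deleted characters

    def _visible_ids(self):
        return [i for i in sorted(self.chars) if i not in self.deleted]

    def text(self):
        return "".join(self.chars[i] for i in self._visible_ids())

    def apply(self, op):
        """Apply a local or remote op; order of arrival does not matter."""
        kind, char_id, ch = op
        if kind == "ins":
            self.chars[char_id] = ch
        else:
            self.deleted.add(char_id)

    def insert(self, index, ch):
        ids = self._visible_ids()
        left = ids[index - 1][0] if index > 0 else Fraction(0)
        right = ids[index][0] if index < len(ids) else Fraction(1)
        op = ("ins", ((left + right) / 2, self.site, next(self._clock)), ch)
        self.apply(op)
        return op

    def delete(self, index):
        op = ("del", self._visible_ids()[index], None)
        self.apply(op)
        return op


alice, bob = TinyTextCRDT("alice"), TinyTextCRDT("bob")
for op in (alice.insert(0, "a"), alice.insert(1, "c")):
    bob.apply(op)                      # both replicas now read "ac"

from_alice = [alice.insert(1, "b")]    # concurrent edits, made offline
from_bob = [bob.insert(2, "d"), bob.delete(0)]

for op in reversed(from_bob):          # deliberately delivered out of order
    alice.apply(op)
for op in from_alice:
    bob.apply(op)
print(alice.text(), bob.text())
```

The server never has to transform anything here, which is the "simpler server" advantage. The cost is visible too: every character carries an id, and the `deleted` set only grows until something garbage-collects it.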

Persistence

  • Operation log in Kafka, keyed by document_id so that all ops for a document land in one partition and can be totally ordered.
  • Periodic snapshots to S3 + checkpoint in PostgreSQL for fast cold start.
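
The snapshot-plus-log recovery described above can be sketched with in-memory stand-ins: a Python list plays the role of the document's Kafka partition and a `Snapshot` object plays the role of the S3 object plus its checkpoint row. The op tuple format and all names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    offset: int  # offset of the last log entry already folded into `text`
    text: str


def apply_op(text, op):
    kind, pos, arg = op
    if kind == "ins":            # arg is the inserted string
        return text[:pos] + arg + text[pos:]
    if kind == "del":            # arg is the number of characters removed
        return text[:pos] + text[pos + arg:]
    raise ValueError(f"unknown op kind: {kind!r}")


def recover(snapshot, log):
    """Rebuild the document from a snapshot plus the log entries after it."""
    text = snapshot.text
    replayed = 0
    for offset in range(snapshot.offset + 1, len(log)):
        text = apply_op(text, log[offset])
        replayed += 1
    return Snapshot(len(log) - 1, text), replayed


log = [
    ("ins", 0, "Hello"),
    ("ins", 5, " world"),
    ("del", 0, 1),
    ("ins", 0, "J"),
]

cold, cold_replayed = recover(Snapshot(-1, ""), log)             # from t=0
warm, warm_replayed = recover(Snapshot(1, "Hello world"), log)   # from checkpoint
print(cold.text, cold_replayed, warm_replayed)
```

Both paths produce the same document, but the warm start replays only the entries after the checkpointed offset, which is what bounds recovery time.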

Presence

  • Redis pub/sub channel per document room broadcasts cursor positions; presence is ephemeral and is never written to durable storage.
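
As a rough illustration of ephemeral presence with TTL-based cleanup, the sketch below keeps cursors in memory and expires them when heartbeats stop. It is a stand-in for Redis, not Redis client code; `PresenceRoom`, the 30-second TTL, and the explicit `now` argument (used to keep the example deterministic) are all assumptions.

```python
class PresenceRoom:
    def __init__(self, ttl_seconds=30.0):
        self.ttl = ttl_seconds
        self._cursors = {}  # user_id -> (cursor_index, expires_at)

    def heartbeat(self, user_id, cursor, now):
        """Record the latest cursor and push the expiry forward."""
        self._cursors[user_id] = (cursor, now + self.ttl)

    def active(self, now):
        """Drop expired entries, then return the cursors still alive."""
        self._cursors = {
            user: entry for user, entry in self._cursors.items() if entry[1] > now
        }
        return {user: cursor for user, (cursor, _) in self._cursors.items()}


room = PresenceRoom(ttl_seconds=30)
room.heartbeat("alice", cursor=12, now=0)
room.heartbeat("bob", cursor=40, now=20)

at_25 = room.active(now=25)  # both users have a live heartbeat
at_35 = room.active(now=35)  # alice's entry expired at t=30
print(at_25, at_35)
```

A client that disconnects without saying goodbye simply stops sending heartbeats, and its cursor disappears once the TTL lapses.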

AuthZ

  • Document ACL in PostgreSQL; Amazon Cognito groups map to the editor and viewer roles.
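
One possible shape for the permission check is sketched below. The ACL layout (principals written as `user:<id>` or `group:<name>`), the group names, and the `can` helper are hypothetical; the source only states that ACLs live in PostgreSQL and that groups map to editor and viewer.

```python
from enum import IntEnum


class Role(IntEnum):
    VIEWER = 1
    EDITOR = 2


# Minimum role needed for each action (assumed action names).
REQUIRED = {"read": Role.VIEWER, "write": Role.EDITOR}


def can(acl, doc_id, user_id, groups, action):
    """Return True if any of the caller's principals grants a high enough role."""
    entries = acl.get(doc_id, {})
    principals = [f"user:{user_id}"] + [f"group:{g}" for g in groups]
    granted = max((entries[p] for p in principals if p in entries), default=None)
    return granted is not None and granted >= REQUIRED[action]


# In-memory stand-in for rows of the document ACL table.
acl = {
    "doc-1": {
        "group:editors": Role.EDITOR,
        "group:viewers": Role.VIEWER,
        "user:carol": Role.EDITOR,
    }
}

print(can(acl, "doc-1", "alice", ["editors"], "write"))
print(can(acl, "doc-1", "bob", ["viewers"], "write"))
```

The check belongs on the server side of the WebSocket connection, so that a viewer cannot submit ops by bypassing the editor UI.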

E2E: two users type concurrently

[Diagram: sequence for two users typing concurrently; not rendered in this copy]
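
Based on the components listed earlier, one plausible reading of this flow is: each client applies its op locally, sends it over the WebSocket, the document service appends it to the per-document log, and the gateway acknowledges the sender and relays the op to everyone else in the room. The sketch below simulates only that transport path with in-memory stand-ins. `DocumentRoom` and the message tuples are assumptions, and ops are treated as opaque payloads.

```python
class DocumentRoom:
    """Stand-in for gateway + document service + op log for one document."""

    def __init__(self):
        self.log = []      # stand-in for the document's Kafka partition
        self.inboxes = {}  # session_id -> messages pushed over the WebSocket

    def join(self, session_id):
        self.inboxes[session_id] = []

    def submit(self, sender, op):
        """Order the op by appending it to the log, then ack and broadcast."""
        offset = len(self.log)
        self.log.append((sender, op))
        for session_id, inbox in self.inboxes.items():
            if session_id == sender:
                inbox.append(("ack", offset))
            else:
                inbox.append(("op", offset, op))
        return offset


room = DocumentRoom()
room.join("alice")
room.join("bob")

alice_op = {"type": "ins", "pos": 5, "text": ","}
bob_op = {"type": "ins", "pos": 11, "text": "!"}

# Both users typed at about the same time; the server picks the order.
room.submit("alice", alice_op)
room.submit("bob", bob_op)

print(room.inboxes["alice"])
print(room.inboxes["bob"])
```

Each client sees an acknowledgement for its own op and the peer's op tagged with a log offset. Merging the remote op into local state is then the job of the OT transform or CRDT merge on the client.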

Tricky parts

| Problem | Solution |
| --- | --- |
| Large documents | Sub-CRDT per paragraph; lazy load |
| Malicious payload | Schema validation on ops; size caps |
| Undo across peers | CRDT undo stack per client session |
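
For the malicious-payload row, schema validation with size caps might look like the following sketch. The op shapes, the `MAX_INSERT_CHARS` limit, and the error strings are assumptions for illustration, not a fixed wire format.

```python
MAX_INSERT_CHARS = 10_000  # hypothetical cap on a single insert

# Allowed op types and the exact fields (besides "type") each must carry.
SCHEMAS = {
    "ins": {"pos": int, "text": str},
    "del": {"pos": int, "length": int},
}


def validate_op(op, doc_length):
    """Return a list of problems; an empty list means the op is acceptable."""
    if not isinstance(op, dict):
        return ["op must be an object"]
    op_type = op.get("type")
    schema = SCHEMAS.get(op_type) if isinstance(op_type, str) else None
    if schema is None:
        return ["unknown op type"]

    errors = []
    if set(op) != set(schema) | {"type"}:
        errors.append("unexpected or missing fields")
    for field, expected in schema.items():
        # type(...) is used so that booleans are not accepted as integers.
        if field in op and type(op[field]) is not expected:
            errors.append(f"{field} must be {expected.__name__}")
    if errors:
        return errors

    if not 0 <= op["pos"] <= doc_length:
        errors.append("pos out of range")
    if op["type"] == "ins" and not 0 < len(op["text"]) <= MAX_INSERT_CHARS:
        errors.append("text size out of bounds")
    if op["type"] == "del" and not 0 < op["length"] <= doc_length - op["pos"]:
        errors.append("length out of range")
    return errors


print(validate_op({"type": "ins", "pos": 0, "text": "hi"}, doc_length=5))
print(validate_op({"type": "eval", "code": "rm -rf /"}, doc_length=5))
```

Rejecting unknown fields outright, rather than ignoring them, keeps unexpected data out of the durable op log.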

Caveats

  • True WYSIWYG with comments and suggestions adds substantial complexity; ship those features in phases.
  • Export to Word/PDF is a separate render farm (headless Chromium or Pandoc).

Managed

  • Microsoft 365 / Google Workspace APIs, if embedding an existing editor is an option rather than building from scratch.