
Distributed ID Generator (Snowflake / ULID)

Problem statement

Generate 64-bit (or 128-bit) unique, time-sortable IDs at millions/sec across many servers without coordination hot spots or clock regression disasters.

How it works

Snowflake layout (example 64-bit):

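| 1 sign bit (unused, always 0) | 41 bits timestamp (ms) | 5 bits datacenter | 5 bits worker | 12 bits sequence |
  • Custom epoch (Twitter's starts in November 2010) instead of the Unix epoch, so the 41 timestamp bits cover roughly 69 years from the chosen start date.
  • Sequence increments for each ID issued within the same millisecond (up to 4,096 per worker); on overflow the generator waits for the next millisecond and resets the sequence to 0.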
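The layout above can be sketched as plain bit shifts. This is a minimal illustration, not a reference implementation: the function names `pack_id` and `unpack_id` are made up for this note, and the epoch constant is the commonly cited Twitter Snowflake epoch in milliseconds.

```python
TWITTER_EPOCH_MS = 1288834974657  # commonly cited Twitter Snowflake epoch (Nov 2010)

TIMESTAMP_BITS, DATACENTER_BITS, WORKER_BITS, SEQUENCE_BITS = 41, 5, 5, 12

# Shifts follow the layout: sequence is lowest, then worker, datacenter, timestamp.
WORKER_SHIFT = SEQUENCE_BITS                          # 12
DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS         # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS  # 22


def pack_id(unix_ms, datacenter, worker, sequence, epoch_ms=TWITTER_EPOCH_MS):
    """Combine the four fields into one non-negative 63-bit integer."""
    elapsed = unix_ms - epoch_ms
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp outside the 41-bit range")
    if not 0 <= datacenter < (1 << DATACENTER_BITS):
        raise ValueError("datacenter does not fit in 5 bits")
    if not 0 <= worker < (1 << WORKER_BITS):
        raise ValueError("worker does not fit in 5 bits")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence does not fit in 12 bits")
    return (
        (elapsed << TIMESTAMP_SHIFT)
        | (datacenter << DATACENTER_SHIFT)
        | (worker << WORKER_SHIFT)
        | sequence
    )


def unpack_id(snowflake_id, epoch_ms=TWITTER_EPOCH_MS):
    """Split an ID back into its fields (useful for debugging and log forensics)."""
    return {
        "unix_ms": (snowflake_id >> TIMESTAMP_SHIFT) + epoch_ms,
        "datacenter": (snowflake_id >> DATACENTER_SHIFT) & ((1 << DATACENTER_BITS) - 1),
        "worker": (snowflake_id >> WORKER_SHIFT) & ((1 << WORKER_BITS) - 1),
        "sequence": snowflake_id & ((1 << SEQUENCE_BITS) - 1),
    }
```

Because the timestamp occupies the highest bits, comparing two IDs as integers compares their creation times first, which is what makes them time-sortable.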

Analogy: Ticket numbering machine at deli: date stamp + counter + which register printed — globally unique without asking a central database each time.

High-level design


Components explained — this design

| Component | What it is | Why we use it here |
|---|---|---|
| ID service pods | Stateless servers embedding snowflake algorithm. | No DB round-trip per ID generation at core hot path. |
| Load balancer | L4 distribution across pods. | Even spread; health checks remove bad instances. |
| Worker ID allocator (ZK/Dynamo) | Assigns unique workerId bits safely. | Prevents two machines using same worker bits → collision. |
| In-process generator loop | Batches IDs in memory for callers. | Reduces syscall / lock overhead for ultra-high QPS clients. |

Shared definitions: 00-glossary-common-services.md

Low-level design

Worker ID assignment

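  • A Kubernetes StatefulSet ordinal can serve as the worker ID, but it is fragile if replicas are reshuffled. Better: a lease table in DynamoDB, where each pod claims worker#i with a conditional PutItem (a ConditionExpression using attribute_not_exists on the key), so only one writer can win a given slot.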
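The lease idea can be sketched without any cloud dependency. The class below is a hypothetical in-memory stand-in for the lease table: `LeaseTable`, `acquire`, `renew`, and the TTL-based expiry are assumptions made for illustration. In a real deployment the "claim only if absent" check would have to be an atomic conditional write in the backing store, not a dictionary lookup.

```python
import time


class LeaseTable:
    """In-memory stand-in for a lease table with put-if-absent semantics."""

    def __init__(self, max_workers=32, ttl_s=30.0, clock=time.monotonic):
        self._max_workers = max_workers   # 5 worker bits -> 32 slots
        self._ttl_s = ttl_s               # assumed lease lifetime
        self._clock = clock
        self._leases = {}                 # worker_id -> (owner, expires_at)

    def acquire(self, owner):
        """Claim the first slot that is free or whose lease has expired."""
        now = self._clock()
        for worker_id in range(self._max_workers):
            lease = self._leases.get(worker_id)
            if lease is None or lease[1] <= now:
                self._leases[worker_id] = (owner, now + self._ttl_s)
                return worker_id
        raise RuntimeError("no free worker id")

    def renew(self, worker_id, owner):
        """Extend a lease; False means the caller must stop issuing IDs."""
        now = self._clock()
        lease = self._leases.get(worker_id)
        if lease is None or lease[0] != owner or lease[1] <= now:
            return False
        self._leases[worker_id] = (owner, now + self._ttl_s)
        return True
```

A pod would call `acquire` at startup and `renew` on a timer well inside the TTL. If `renew` ever returns False, the pod should stop generating IDs immediately, because another pod may already hold the same worker bits.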

Clock sync

  • Monitor NTP offset. If the clock moves backward, either wait until it catches up to the last issued timestamp or refuse to issue IDs (panic); failing is better than minting duplicates, for example duplicate payment IDs.
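One way to express the wait-or-fail rule in a generator is sketched below. It assumes the 41/5/5/12 layout from earlier in this note; the class name, the `max_wait_ms` threshold, and the injectable `clock` and `sleep` parameters are illustrative choices, not part of any particular library.

```python
import threading
import time


class ClockMovedBackwards(RuntimeError):
    """Raised when the wall clock regressed further than we are willing to wait."""


class SnowflakeGenerator:
    def __init__(self, datacenter, worker, epoch_ms=1288834974657,
                 max_wait_ms=5, clock=None, sleep=time.sleep):
        self._datacenter = datacenter & 0x1F
        self._worker = worker & 0x1F
        self._epoch_ms = epoch_ms
        self._max_wait_ms = max_wait_ms      # assumed tolerance for small regressions
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._sleep = sleep
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def _wait_until(self, target_ms):
        now = self._clock()
        while now < target_ms:
            self._sleep(0.0005)
            now = self._clock()
        return now

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                drift = self._last_ms - now
                if drift > self._max_wait_ms:
                    # Large regression: fail loudly rather than risk duplicates.
                    raise ClockMovedBackwards(f"clock is {drift} ms behind last ID")
                now = self._wait_until(self._last_ms)   # small regression: wait it out
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # 4,096 IDs already issued this millisecond: roll to the next one.
                    now = self._wait_until(self._last_ms + 1)
            else:
                self._sequence = 0
            self._last_ms = now
            return (((now - self._epoch_ms) << 22) | (self._datacenter << 17)
                    | (self._worker << 12) | self._sequence)
```

The threshold is a trade-off: waiting hides brief NTP corrections from callers, while raising on larger jumps surfaces a broken clock to operators instead of silently stalling.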

Alternatives

| Format | Pros | Cons |
|---|---|---|
| UUIDv4 | Easy, global | Not sortable |
| UUIDv7 | Sortable, standard | 128-bit, wider indexes |
| DB sequence | Strong ordering | DB bottleneck |

Database impact

  • Time-leading IDs are B-tree friendly: new keys land at the right edge of the index, which reduces the page splits and fragmentation that random UUIDv4 inserts cause. Prefer time-leading keys.

E2E: generate batch


Tricky parts

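| Problem | Solution |
|---|---|
| Leap seconds / smear | Google TrueTime is available only in Spanner; elsewhere, prefer a smeared time source and treat any backward step as a clock regression (wait or fail) |
| Multi-region uniqueness | Datacenter bits encode the region |
| Exposure of sequence | Do not use Snowflake IDs where enumeration is a security issue; use an opaque ULID instead |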
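For the ULID option mentioned in the table, a minimal sketch follows. It uses the standard ULID shape (48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford Base32 characters); the function name `new_ulid` and its optional parameters are made up for this note, and the sketch omits the spec's optional monotonic mode for IDs created within the same millisecond.

```python
import os
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Base32 alphabet without I, L, O, U


def new_ulid(unix_ms=None, randomness=None):
    """Return a 26-character ULID string: 10 chars of time, 16 chars of randomness."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    if randomness is None:
        randomness = os.urandom(10)          # 80 bits from the OS CSPRNG
    if not 0 <= unix_ms < (1 << 48):
        raise ValueError("timestamp does not fit in 48 bits")
    if len(randomness) != 10:
        raise ValueError("randomness must be exactly 10 bytes")

    value = (unix_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    for _ in range(26):                      # 26 * 5 = 130 bits; the top 2 are zero
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))
```

The timestamp prefix keeps ULIDs sortable as strings, while the 80 random bits mean a caller who sees one ID cannot derive its neighbours by adding 1, unlike a per-worker sequence counter. The creation time is still visible in the prefix.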

Caveats

  • JavaScript Number precision: integers above 2^53 cannot be represented exactly, so always send IDs as strings in JSON.
  • Clock trust: a container's wall clock can jump after VM migration, so monitor it.
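The precision caveat is easy to reproduce, because a JavaScript Number is an IEEE 754 double, the same type as a Python `float`. The snippet below is a small sketch; `encode_id` and `decode_id` are hypothetical helper names, and the `"id"` field name is an assumption.

```python
import json

big_id = 2**53 + 1  # smallest positive integer a double cannot represent exactly

# Converting to a double silently rounds to the neighbouring value.
assert float(big_id) == float(2**53)


def encode_id(snowflake_id):
    """Serialize the ID as a JSON string so no client parses it as a double."""
    return json.dumps({"id": str(snowflake_id)})


def decode_id(payload):
    """Parse the string form back into an arbitrary-precision integer."""
    return int(json.loads(payload)["id"])


assert decode_id(encode_id(big_id)) == big_id
```

Snowflake IDs with a recent timestamp are far above 2^53, so this applies to essentially every ID the service issues, not only to edge cases.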

Managed

  • KSUID, Sonyflake, or Snowflake-style IDs issued from a database sequence (nextval) are acceptable at moderate scale.