Distributed ID Generator (Snowflake / ULID)

Problem statement

Generate unique, time-sortable 64-bit (or 128-bit) IDs at millions per second across many servers, without coordination hot spots and without issuing duplicates when a clock regresses.

How it works

Snowflake layout (example 64-bit):

| 1 sign | 41 bits timestamp ms | 5 bits datacenter | 5 bits worker | 12 bits sequence |

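The timestamp counts milliseconds from a custom epoch (Twitter's starts in November 2010) rather than from 1970, so the 41 bits cover about 69 years from that epoch.
The sequence increments for each ID issued within the same millisecond (up to 4096 per worker); on overflow the generator waits for the next millisecond.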
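
The layout and sequence rule above can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name `SnowflakeGenerator`, the injectable `clock` parameter, and the choice to raise on a backward clock are assumptions made for the example. The epoch constant is Twitter's published custom epoch in milliseconds.

```python
import threading
import time

EPOCH_MS = 1288834974657  # Twitter's custom epoch (2010-11-04 UTC), in ms
DATACENTER_BITS, WORKER_BITS, SEQUENCE_BITS = 5, 5, 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095

WORKER_SHIFT = SEQUENCE_BITS                                      # 12
DATACENTER_SHIFT = SEQUENCE_BITS + WORKER_BITS                    # 17
TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS + DATACENTER_BITS   # 22


class SnowflakeGenerator:
    """Hypothetical generator for the 1 + 41 + 5 + 5 + 12 bit layout."""

    def __init__(self, datacenter_id, worker_id, clock=None):
        if not 0 <= datacenter_id < (1 << DATACENTER_BITS):
            raise ValueError("datacenter_id does not fit in 5 bits")
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id does not fit in 5 bits")
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        # The clock is injectable so the logic can be tested deterministically.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Assumption for this sketch: fail rather than risk duplicates.
                raise RuntimeError("clock moved backward")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence overflowed: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << TIMESTAMP_SHIFT)
                | (self.datacenter_id << DATACENTER_SHIFT)
                | (self.worker_id << WORKER_SHIFT)
                | self._sequence
            )
```

A caller would construct one generator per process, for example `SnowflakeGenerator(datacenter_id=1, worker_id=2)`, and call `next_id()` on the hot path. The lock keeps the sequence consistent when several threads share the generator.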

Analogy: a ticket machine at a deli that prints a date stamp, a counter, and the number of the register that printed it. Every ticket is globally unique without asking a central database each time.

High-level design

Components explained — this design

Component	What it is	Why we use it here
ID service pods	Stateless servers that embed the Snowflake algorithm.	No database round-trip per ID on the hot path.
Load balancer	L4 distribution across pods.	Even spread; health checks remove bad instances.
Worker ID allocator (ZK/Dynamo)	Assigns unique `workerId` bits safely.	Prevents two machines using same worker bits → collision.
In-process generator loop	Batches IDs in memory for callers.	Reduces syscall / lock overhead for ultra-high QPS clients.

Shared definitions: 00-glossary-common-services.md

Low-level design

Worker ID assignment

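Using the Kubernetes StatefulSet ordinal as the worker ID is simple but fragile if replicas are reshuffled. A safer option is a lease table in DynamoDB: each pod claims a slot worker#i with a conditional PutItem (ConditionExpression attribute_not_exists on the key), so only one pod can hold a given worker ID at a time.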
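
A minimal sketch of the lease-claim loop, assuming an in-memory stand-in for the lease table so the example runs without AWS. `InMemoryLeaseStore`, `put_if_absent`, `claim_worker_id`, and the TTL-based takeover of expired leases are all hypothetical names and choices for illustration. Against DynamoDB, `put_if_absent` would correspond to a conditional PutItem.

```python
import time


class InMemoryLeaseStore:
    """Hypothetical stand-in for a lease table that supports conditional writes."""

    def __init__(self):
        self._items = {}  # key -> (owner, expires_at)

    def put_if_absent(self, key, owner, ttl_s, now):
        """Write the lease only if the key is free or its lease has expired."""
        current = self._items.get(key)
        if current is not None and current[1] > now:
            return False  # someone else holds a live lease
        self._items[key] = (owner, now + ttl_s)
        return True


def claim_worker_id(store, owner, max_workers=32, ttl_s=30.0, now=None):
    """Try slots worker#0 .. worker#(max_workers - 1) and return the first one claimed."""
    now = time.time() if now is None else now
    for i in range(max_workers):
        if store.put_if_absent(f"worker#{i}", owner, ttl_s, now):
            return i
    raise RuntimeError("no free worker id")
```

The default of 32 slots matches the 5 worker bits in the layout. In a real deployment the holder would need to renew its lease before the TTL expires and stop issuing IDs if renewal fails; that part is omitted here.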

Clock sync

Monitor NTP offset. If the clock moves backward, either wait until it catches up to the last issued timestamp or fail fast (panic). Refusing to issue IDs is better than issuing duplicates, for example duplicate payment IDs.
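
The wait-or-fail rule can be sketched as a small helper. The function name `wait_for_clock`, the exception `ClockMovedBackward`, and the 5 ms tolerance are assumptions chosen for illustration; the right threshold depends on how long callers can tolerate blocking.

```python
import time


class ClockMovedBackward(RuntimeError):
    """Raised when the clock is too far behind the last issued timestamp."""


def wait_for_clock(last_ms, clock_ms, max_wait_ms=5, sleep=time.sleep):
    """Return a timestamp >= last_ms, waiting out small regressions.

    last_ms  : timestamp (ms) of the most recently issued ID
    clock_ms : callable returning the current wall-clock time in ms
    """
    now = clock_ms()
    drift = last_ms - now
    if drift <= 0:
        return now  # clock is at or ahead of the last timestamp
    if drift > max_wait_ms:
        # Large regression: fail fast instead of blocking or risking duplicates.
        raise ClockMovedBackward(f"clock is {drift} ms behind the last issued ID")
    while now < last_ms:
        sleep((last_ms - now) / 1000)  # sleep takes seconds
        now = clock_ms()
    return now
```

A generator would call this at the top of its ID loop and let the exception take the pod out of rotation, so the load balancer's health checks route traffic elsewhere.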

Alternatives

Format	Pros	Cons
UUIDv4	Easy, global	Not sortable
UUIDv7	Sortable, standardized (RFC 9562)	128 bits, so indexes are wider
DB sequence	Strong ordering	DB bottleneck

Database impact

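Time-leading IDs are B-tree friendly: new keys land near the right edge of the index, which reduces the page splits and fragmentation that random UUIDv4 inserts cause. Prefer keys that lead with the timestamp.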
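
To illustrate why the key is time-leading, here is a small sketch that packs and unpacks the example layout. Because the timestamp occupies the highest bits, integer order follows creation time regardless of which worker issued the ID. The helper names `compose` and `decompose` are hypothetical.

```python
DATACENTER_BITS, WORKER_BITS, SEQUENCE_BITS = 5, 5, 12
TIMESTAMP_SHIFT = DATACENTER_BITS + WORKER_BITS + SEQUENCE_BITS  # 22


def compose(ms_since_epoch, datacenter_id, worker_id, sequence):
    """Pack the four fields into one integer, timestamp in the high bits."""
    return (
        (ms_since_epoch << TIMESTAMP_SHIFT)
        | (datacenter_id << (WORKER_BITS + SEQUENCE_BITS))
        | (worker_id << SEQUENCE_BITS)
        | sequence
    )


def decompose(snowflake_id):
    """Split an ID back into its fields."""
    return {
        "ms_since_epoch": snowflake_id >> TIMESTAMP_SHIFT,
        "datacenter_id": (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) & 0x1F,
        "worker_id": (snowflake_id >> SEQUENCE_BITS) & 0x1F,
        "sequence": snowflake_id & 0xFFF,
    }


# An ID from a later millisecond sorts after any ID from an earlier one,
# even when the earlier ID has the largest datacenter, worker, and sequence.
earlier = compose(100, 31, 31, 4095)
later = compose(101, 0, 0, 0)
```

Within a single millisecond, order across workers follows the datacenter and worker bits rather than true arrival order, so the IDs are roughly, not strictly, time-ordered.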

E2E: generate batch

Tricky parts

Problem	Solution
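Leap seconds / smear	Google TrueTime is available only inside Spanner; elsewhere use leap-smeared NTP so the clock never steps back, let the sequence bits absorb the slightly longer milliseconds, and treat a stepped leap second as a clock regression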
Multi-region uniqueness	Datacenter bits encode region
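Exposure of sequence	Do not use Snowflake IDs where enumeration is a security issue; use an ID with a random component such as ULID (48-bit timestamp plus 80 random bits), which still reveals creation time but cannot be enumerated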

Caveats

JavaScript Number precision: integers above 2^53 - 1 lose precision in a JavaScript Number, so always send 64-bit IDs as strings in JSON.
Clock trust: a container's wall clock can jump after a VM migration, so monitor it.
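
The precision caveat can be demonstrated in a short sketch. Python floats are IEEE 754 doubles, the same representation as a JavaScript Number, so converting a large ID to `float` shows the loss. The helpers `to_json` and `from_json` and the `id` and `name` fields are hypothetical names for illustration.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1  # largest integer a double represents exactly and safely


def loses_precision(value):
    """True if the integer does not survive a round-trip through a double."""
    return int(float(value)) != value


def to_json(record_id, name):
    """Serialize the ID as a string so any JSON client keeps all 64 bits."""
    return json.dumps({"id": str(record_id), "name": name})


def from_json(payload):
    """Parse the payload and convert the string ID back to an integer."""
    data = json.loads(payload)
    return int(data["id"]), data["name"]
```

With these helpers, `loses_precision(2**53 + 1)` is true, while an ID sent through `to_json` and read back with `from_json` is unchanged.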

Managed

KSUID and Sonyflake libraries, or sequence IDs from the database (nextval), are acceptable at moderate scale.