Leader Election & Singleton Workers
Problem statement
Ensure exactly one active processor for a task (cron aggregation, stream partition consumer leader) among N redundant instances without manual failover.
How it works
- Acquire lease in coordination service; renew periodically; standby instances idle until lease lost.
- Kubernetes Lease API preferred in K8s; else etcd, Consul sessions, DynamoDB conditional writes.
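The acquire/renew/expire cycle can be sketched with an in-memory lease. This is a minimal illustration, not any particular service's API: `LeaseStore`, `try_acquire_or_renew`, and the pod names are hypothetical, and time is passed in explicitly so the run is deterministic. A real coordination service must perform the check-and-set atomically.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: Optional[str] = None
    expires_at: float = 0.0


class LeaseStore:
    """In-memory stand-in for a coordination service (hypothetical, single process)."""

    def __init__(self, duration: float) -> None:
        self.duration = duration
        self.lease = Lease()

    def try_acquire_or_renew(self, candidate: str, now: float) -> bool:
        """Grant the lease if it is free, expired, or already held by candidate."""
        free = self.lease.holder is None or self.lease.expires_at <= now
        if free or self.lease.holder == candidate:
            self.lease = Lease(holder=candidate, expires_at=now + self.duration)
            return True
        return False


store = LeaseStore(duration=15.0)
events = [
    (0.0, "pod-a"),   # pod-a acquires the free lease
    (1.0, "pod-b"),   # pod-b is refused and stays on standby
    (10.0, "pod-a"),  # pod-a renews; the lease now runs to t=25
    (20.0, "pod-b"),  # still refused: the lease is valid until t=25
    (26.0, "pod-b"),  # pod-a stopped renewing (crash); pod-b takes over
]
results = [store.try_acquire_or_renew(who, t) for t, who in events]
print(results)  # [True, False, True, False, True]
```

The standby does nothing except retry; leadership changes only because the holder's renewals stop and the lease ages out.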
Analogy: the walkie-talkie rule. Only one person holds the push-to-talk token; when they release it or stay silent for too long, another may take over.
High-level design
Components explained — this design
| Component | What it is | Why we use it here |
|---|---|---|
| Pods A/B/C | Redundant workers; only one should run cron. | HA without duplicate side effects (double billing). |
| K8s Lease API | Distributed lock object with renewTime. | Native Kubernetes coordination; avoids embedding ZooKeeper for simple cases. |
| Controller loop renew | Periodic patch to extend lease. | If renew stops (crash), another pod acquires lease after leaseDurationSeconds. |
Shared definitions: 00-glossary-common-services.md
Low-level design
Kubernetes native
- `coordination.k8s.io/v1` `Lease` object with `spec.holderIdentity` + `renewTime`.
- Client uses client-go `LeaderElector` with `leaseDuration`, `renewDeadline`, `retryPeriod`.
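The three timing parameters constrain each other. The sketch below mirrors, in Python, the ordering checks that client-go's leader election applies as I understand them (`leaseDuration > renewDeadline > retryPeriod * JitterFactor`, with a jitter factor of 1.2). Treat the constant and the messages as assumptions and verify them against the client-go version in use; `validate_timing` is a hypothetical helper, not part of any library.

```python
from typing import List

# Assumed to match client-go's leaderelection.JitterFactor; verify for your version.
JITTER_FACTOR = 1.2


def validate_timing(lease_duration: float, renew_deadline: float,
                    retry_period: float) -> List[str]:
    """Return the violated constraints; an empty list means the settings are consistent."""
    problems = []
    if min(lease_duration, renew_deadline, retry_period) <= 0:
        problems.append("all durations must be positive")
    if lease_duration <= renew_deadline:
        problems.append("leaseDuration must be greater than renewDeadline")
    if renew_deadline <= JITTER_FACTOR * retry_period:
        problems.append("renewDeadline must be greater than retryPeriod * JitterFactor")
    return problems


print(validate_timing(15.0, 10.0, 2.0))  # consistent settings: []
print(validate_timing(10.0, 10.0, 2.0))  # lease not longer than renew deadline
print(validate_timing(15.0, 2.0, 2.0))   # renew deadline too close to retry period
```

The ordering matters because the leader must give up (at `renewDeadline`) before any standby is allowed to consider the lease expired (at `leaseDuration`).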
DynamoDB lock
- Item `pk=leader/myjob`, attributes `owner`, `leaseUntil`.
- Conditional update `leaseUntil < now OR owner=self` to renew the lease while holding it, or to steal it after expiry.
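A sketch of the request such a conditional update might use. It only builds the keyword arguments for DynamoDB's `UpdateItem`, so it runs without AWS access. The table name `locks`, the helper `build_lock_update`, and the extra `attribute_not_exists(pk)` clause (needed so the very first acquisition succeeds when no item exists yet) are assumptions added for illustration.

```python
from typing import Any, Dict


def build_lock_update(owner: str, now: int, ttl_seconds: int,
                      table: str = "locks", job: str = "leader/myjob") -> Dict[str, Any]:
    """Build keyword arguments for an UpdateItem call that takes or renews the lease."""
    return {
        "TableName": table,  # hypothetical table name
        "Key": {"pk": {"S": job}},
        "UpdateExpression": "SET #owner = :self, #until = :new_until",
        # Succeeds only if no item exists yet, the lease expired, or we already hold it.
        "ConditionExpression": "attribute_not_exists(pk) OR #until < :now OR #owner = :self",
        # Placeholders avoid any clash with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#owner": "owner", "#until": "leaseUntil"},
        "ExpressionAttributeValues": {
            ":self": {"S": owner},
            ":now": {"N": str(now)},
            ":new_until": {"N": str(now + ttl_seconds)},
        },
    }


params = build_lock_update(owner="pod-a", now=1_700_000_000, ttl_seconds=30)
print(params["ConditionExpression"])
```

With boto3 this would be passed as `boto3.client("dynamodb").update_item(**params)`; a failed condition surfaces as a `ConditionalCheckFailedException`, which the caller treats as "someone else is leader". Note that `now` comes from the caller's clock, which is where the skew risk discussed next enters.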
Split-brain risk
- Wall-clock skew can briefly produce two leaders; attach fencing tokens to writes so downstream systems reject the stale leader.
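A minimal sketch of fencing on the downstream side, assuming each lease acquisition hands the new leader a strictly larger integer token (for instance a counter bumped on every handover). `FencedStore` and the token values are hypothetical.

```python
from typing import Dict


class FencedStore:
    """Hypothetical downstream store that rejects writes carrying a stale fencing token."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data: Dict[str, str] = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:
            return False  # a newer leader has already written; refuse the old one
        self.highest_token = token
        self.data[key] = value
        return True


downstream = FencedStore()
ok_old = downstream.write(token=33, key="report", value="from leader A")
ok_new = downstream.write(token=34, key="report", value="from leader B")
ok_stale = downstream.write(token=33, key="report", value="late write from A")
print(ok_old, ok_new, ok_stale, downstream.data["report"])
# True True False from leader B
```

The point of the design is that correctness no longer depends on the old leader noticing it lost the lease: the store enforces ordering on its own.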
E2E: failover
Tricky parts
| Problem | Solution |
|---|---|
| Work duplication during failover | Idempotent processing + external fencing |
| Zombie leader after long GC | Short lease + aggressive renew, with the lease sized above the p99 GC pause; fencing rejects writes from a paused ex-leader |
| Thundering herd on steal | Randomized backoff before acquire attempt |
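The randomized backoff in the last row can be sketched as a "full jitter" delay: each standby waits a uniformly random time up to an exponentially growing ceiling before its next acquire attempt. The function name, `base`, and `cap` values are illustrative assumptions, not settings from this design.

```python
import random
from typing import Optional


def acquire_delay(attempt: int, base: float = 0.5, cap: float = 10.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: uniform delay in [0, min(cap, base * 2**attempt)]."""
    if rng is None:
        rng = random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


rng = random.Random(42)  # seeded only to make the demo repeatable
delays = [acquire_delay(n, rng=rng) for n in range(6)]
for n, d in enumerate(delays):
    print(f"attempt {n}: sleep {d:.2f}s before trying to acquire")
```

Because every standby draws its own delay, they stop hitting the coordination service in the same instant when a lease expires.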
Caveats
- Do not use a database row as a mutex without a TTL: a stuck transaction holds the lock and blocks every other instance.
- The SQS visibility-timeout leader pattern is fragile; prefer Kinesis enhanced fan-out with a single-consumer-per-shard model.
Azure
- Azure Blob leases (breakable); Service Bus sessions for exclusive message processing.