SuryanandHome

Database Sharding Strategy

Problem statement

A single PostgreSQL (or monolithic DB) hits CPU/IO ceiling; you need horizontal partition of data with minimal cross-shard joins and resharding story.

How it works

  • Choose shard key (e.g. tenant_id, user_id) → deterministic routing function shard = hash(key) % N.
  • Each shard is independent DB instance; application router or Vitess/Citus proxy sends queries.

Analogy: City postal codes: mail for 560001 always goes to Bangalore sorting office — workers don’t search all of India for every letter.

High-level design

Rendering diagram…

Components explained — this design

ComponentWhat it isWhy we use it here
Shard router (Vitess/custom)Computes shard from key; proxies SQL.Lets apps mostly write normal SQL while infra handles scatter-gather where unavoidable.
Shard 0..NIndependent DB instances holding subset of data.Horizontal scale when single instance CPU/IO maxed.
Metadata (etcd/ZK)Stores shard map and migrations.Operators need authoritative routing table during resharding.

Shared definitions: 00-glossary-common-services.md

Low-level design

Shard key selection

  • High cardinality + query locality — avoid hot shard (global config table).
  • Tenant isolation SaaS: tenant_id great for noisy neighbor containment.

Resharding

  • Double-write new + old shards during migration; background backfill; flip read traffic; cutover.
  • Consistent hashing with virtual nodes reduces data movement when adding shard.

Cross-shard queries

  • Avoid if possible; else scatter-gather (expensive) or CQRS read models in warehouse.

Sequences / auto-increment

  • Global uniqueness: Snowflake IDs or UUIDv7no cross-shard sequence.

E2E: point read

Rendering diagram…

Tricky parts

ProblemSolution
Join across usersDenormalize or async materialized view
Transactions spanning shardsSaga; 2PC rarely worth it
Rebalancing painUse managed sharding (Citus, Vitess, Dynamo)

Caveats

  • ORM assumptions break — educate team on explicit shard awareness.
  • Backup/restore now N operations — automate with Infrastructure as Code.

Azure / AWS

  • Cosmos DB partition key; Aurora Limitless (preview patterns); Citus on Azure.