System Design Interview Playbook

Problem statement

In 45–60 minutes, communicate a clear design, its tradeoffs, and depth on demand, without drowning in premature microservices or buzzwords you cannot explain.

How the interview “works”

Clarify requirements (functional, scale, consistency, latency).
Sketch high-level boxes + data flow in 5–10 minutes.
Deep dive where the interviewer probes: usually storage, scaling the hot path, and failure modes.

Analogy: an architecture studio critique. The professor cares about load-bearing walls (bottlenecks) and evacuation routes (failure handling), not the color of the tile grout (the logo on the gateway box).

High-level process diagram

[Diagram: Clarify requirements → Back-of-envelope → High-level diagram → Deep dives → Tradeoffs and close]

Low-level checklist (what “good” contains)

Requirements questions

Read vs write ratio? Consistency vs availability priority?
Latency p99 targets? Global or single region?
Compliance (PII, HIPAA, PCI)?

Back-of-envelope

Estimate QPS, storage per day, bandwidth, and fan-out. Round numbers are fine; show your reasoning.
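The estimates above reduce to a few multiplications. The sketch below is one possible way to lay them out; the `Workload` fields, the 3× peak factor, and every input number are illustrative assumptions, not figures from any real system.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    """Hypothetical inputs; every value is an assumption you state aloud."""
    daily_active_users: int
    writes_per_user_per_day: float
    reads_per_write: float      # read:write ratio
    avg_payload_bytes: int
    fanout_per_write: int       # e.g. followers notified per post
    peak_factor: float = 3.0    # assumed peak traffic as a multiple of average


def estimate(w: Workload) -> dict:
    """Return rough capacity numbers derived from the workload assumptions."""
    writes_per_day = w.daily_active_users * w.writes_per_user_per_day
    avg_write_qps = writes_per_day / SECONDS_PER_DAY
    avg_read_qps = avg_write_qps * w.reads_per_write
    return {
        "avg_write_qps": avg_write_qps,
        "avg_read_qps": avg_read_qps,
        "peak_read_qps": avg_read_qps * w.peak_factor,
        "storage_gb_per_day": writes_per_day * w.avg_payload_bytes / 1e9,
        "read_bandwidth_mb_per_s": avg_read_qps * w.avg_payload_bytes / 1e6,
        "fanout_deliveries_per_s": avg_write_qps * w.fanout_per_write,
    }


if __name__ == "__main__":
    workload = Workload(
        daily_active_users=10_000_000,
        writes_per_user_per_day=2,
        reads_per_write=100,
        avg_payload_bytes=1_000,
        fanout_per_write=200,
    )
    for name, value in estimate(workload).items():
        print(f"{name}: {value:,.0f}")
```

In an interview you would do the same arithmetic by hand with rounded inputs (for example, treating a day as roughly 100,000 seconds); the point is that each number traces back to a stated assumption.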

Core building blocks

Concern	Typical tools (pick & justify)
Object blobs	S3 / Azure Blob Storage
Hot KV / cache	Redis / Memcached
OLTP	PostgreSQL / DynamoDB
Search	OpenSearch
Stream	Kafka / Kinesis / Azure Event Hubs
Async work	SQS / Azure Service Bus
Edge	CloudFront / Azure Front Door
Auth	Cognito / Entra ID

Failure modes (always mention)

Partial outages — degrade (read-only mode), circuit breakers, bulkheads.
Duplicate events — idempotency keys, outbox.
Hot keys / partitions — sharding, caching, replication.
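Two of the mitigations above, idempotency keys and circuit breakers, are small enough to sketch. This is a minimal in-memory illustration, not a production implementation: the class names, thresholds, and the injected `clock` parameter are all assumptions made for the example, and neither class is thread-safe.

```python
import time


class IdempotentHandler:
    """Applies each event at most once, keyed by an idempotency key."""

    def __init__(self):
        # In-memory stand-in. A real system would use a durable store with
        # a unique constraint on the key so check-and-store is atomic.
        self._results = {}

    def handle(self, idempotency_key, apply_fn):
        if idempotency_key in self._results:
            # Duplicate delivery: replay the stored result, skip the side effect.
            return self._results[idempotency_key]
        result = apply_fn()
        self._results[idempotency_key] = result
        return result


class CircuitOpenError(Exception):
    """Raised when the circuit is open and no fallback was supplied."""


class CircuitBreaker:
    """Fails fast after repeated errors, then retries after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback=None):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                # Open: do not touch the failing dependency at all.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("circuit open")
            # Cool-down elapsed (half-open): let one trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            half_open_trial_failed = self._opened_at is not None
            if half_open_trial_failed or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            if fallback is not None:
                return fallback()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

The fallback is where degradation lives: for a read path it might return cached data, which is one way to get the read-only mode mentioned above.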

Example pacing (45 minutes)

Mermaid Gantt dateFormat varies by renderer; below is an equivalent flowchart (portable) and a table you can reuse in interviews.

[Flowchart: Clarify → Envelope → Diagram → Depth → Close]

Phase	Minutes (guide)	Goal
Clarify	~0–8	Scope, NFRs, constraints
Envelope	~8–14	Rough capacity sanity
Diagram	~14–24	Boxes + data paths
Depth	~24–40	Storage, scale, failures
Close	~40–45	Recap + open questions

Components explained — this design

Item in diagram	What it is	Why it appears here
Clarify requirements	Interview phase, not a product.	Locking in read/write ratio, consistency, latency, and compliance before drawing boxes reduces the risk of designing the wrong system.
Back-of-envelope	Rough QPS, storage, bandwidth estimates.	Interviewers want quantitative thinking; numbers justify Kafka vs SQS, SQL vs NoSQL, etc.
High-level diagram	First architecture sketch.	Shows you can decompose without diving into premature microservices.
Deep dives	Storage, hot paths, failure modes.	Where the senior-level signal shows: tradeoffs, not buzzwords.
Tradeoffs and close	CAP / cost / ops honesty + summary.	Demonstrates you know nothing is free (e.g. global consistency vs latency).

Shared definitions: 00-glossary-common-services.md

Tricky parts (meta)

Trap	Fix
Buzzword soup	Give every technology you name a one-sentence job
Overfitting CAP	Relate it to a concrete user-visible symptom
Ignoring cost	Mention egress cost ($) and managed vs DIY ops
No numbers	Even rough numbers beat silence

Caveats

Interview ≠ production: present a pragmatic MVP first, and treat “what if scale grows 10×” as a second chapter.
Team skill is a constraint: boring and proven beats novel and risky, unless the startup explicitly wants R&D.

Quick tradeoff cheatsheet

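SQL vs NoSQL: flexible joins and ad hoc queries vs fast access by partition key, which requires knowing the access patterns up front.
Sync vs async: the user waits for the result vs the UI says the work is in progress and updates later.
Strong vs eventual consistency: money needs strong; social like counts can be eventual.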
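The sync vs async row is often easier to explain with a tiny example. The sketch below contrasts the two shapes using only the standard library; `resize_image`, the in-memory `status` dict, and the response dictionaries are hypothetical stand-ins for a real worker, job store, and API response.

```python
import queue
import uuid

jobs = queue.Queue()   # stand-in for a message queue such as SQS
status = {}            # stand-in for a job-status table


def resize_image(name):
    """Hypothetical slow operation; imagine seconds of CPU work here."""
    return f"{name}.thumb"


def handle_sync(name):
    # Synchronous: the caller blocks until the work is finished.
    return {"state": "done", "result": resize_image(name)}


def handle_async(name):
    # Asynchronous: record the job, enqueue it, and acknowledge immediately.
    job_id = str(uuid.uuid4())
    status[job_id] = {"state": "pending"}
    jobs.put((job_id, name))
    return {"state": "accepted", "job_id": job_id}


def run_worker_once():
    # A background worker would loop over this; one step is shown here.
    job_id, name = jobs.get_nowait()
    status[job_id] = {"state": "done", "result": resize_image(name)}


if __name__ == "__main__":
    print(handle_sync("cat.png"))
    ack = handle_async("dog.png")
    print(ack["state"], status[ack["job_id"]])
    run_worker_once()
    print(status[ack["job_id"]])
```

The async path returns sooner but pushes a cost onto the product: the client has to poll or be notified, and the UI needs wording for the pending state.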

Closing template

Summarize: APIs + storage + async path + scaling lever + two failure modes you handle. Invite questions.

You now have 50 companion docs in system-design/ — cross-link topics (e.g. payments + saga + idempotency) when studying for depth.