System Design Interview Playbook

Problem statement

In 45–60 minutes, communicate a clear design, its tradeoffs, and depth on demand, without drowning in premature microservices or buzzwords you cannot give a concrete meaning.

How the interview “works”

  1. Clarify requirements (functional, scale, consistency, latency).
  2. Sketch high-level boxes + data flow in 5–10 minutes.
  3. Deep dive where the interviewer probes: usually storage, scaling the hot path, and failure modes.

Analogy: an architecture studio critique. The professor cares about load-bearing walls (bottlenecks) and evacuation routes (failures), not tile grout color (the logo on the gateway).

High-level process diagram

[Diagram: Clarify requirements → Back-of-envelope → High-level diagram → Deep dives → Tradeoffs and close]

Low-level checklist (what “good” contains)

Requirements questions

  • Read vs write ratio? Consistency vs availability priority?
  • Latency p99 targets? Global or single region?
  • Compliance (PII, HIPAA, PCI)?

Back-of-envelope

  • QPS, storage/day, bandwidth, fan-out — round numbers OK; show reasoning.
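
The arithmetic behind those estimates can be sketched in a few lines. This is a minimal illustration, not a sizing tool: the `Workload` fields, the peak factor, and every input number in the usage example are hypothetical assumptions you would state aloud and adjust with the interviewer.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    """Hypothetical inputs; each value is an assumption to say out loud."""
    daily_active_users: int
    writes_per_user_per_day: float
    reads_per_write: float      # read:write ratio
    bytes_per_write: int        # average payload size
    peak_factor: float = 2.0    # assumed peak traffic as a multiple of average
    fan_out: int = 1            # deliveries triggered per write (e.g. followers)


def estimate(w: Workload) -> dict:
    """Return rough average/peak rates and daily storage for the workload."""
    writes_per_day = w.daily_active_users * w.writes_per_user_per_day
    avg_write_qps = writes_per_day / SECONDS_PER_DAY
    avg_read_qps = avg_write_qps * w.reads_per_write
    return {
        "avg_write_qps": avg_write_qps,
        "avg_read_qps": avg_read_qps,
        "peak_read_qps": avg_read_qps * w.peak_factor,
        "storage_gb_per_day": writes_per_day * w.bytes_per_write / 1e9,
        "read_bandwidth_mb_per_s": avg_read_qps * w.bytes_per_write / 1e6,
        "fan_out_deliveries_per_s": avg_write_qps * w.fan_out,
    }


if __name__ == "__main__":
    # Illustrative numbers only: 10M DAU, 2 writes each, 100 reads per write.
    example = Workload(
        daily_active_users=10_000_000,
        writes_per_user_per_day=2,
        reads_per_write=100,
        bytes_per_write=1_000,
        fan_out=200,
    )
    for name, value in estimate(example).items():
        print(f"{name}: {value:,.1f}")
```

In the interview you would round aggressively (for instance, treating a day as roughly 100,000 seconds) and narrate each step; the point is the chain of reasoning, not the decimals.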

Core building blocks

| Concern | Typical tools (pick & justify) |
| --- | --- |
| Object blobs | S3 / Blob |
| Hot KV / cache | Redis / Memcached |
| OLTP | PostgreSQL / DynamoDB |
| Search | OpenSearch |
| Stream | Kafka / Kinesis / Event Hubs |
| Async work | SQS / Service Bus |
| Edge | CloudFront / Front Door |
| Auth | Cognito / Entra ID |

Failure modes (always mention)

  • Partial outages: degrade (read-only mode), circuit breakers, bulkheads.
  • Duplicate events: idempotency keys, outbox.
  • Hot keys / partitions: sharding, caching, replication.
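
As one illustration of the duplicate-events bullet, here is a minimal sketch of idempotency-key deduplication. The class name `IdempotentProcessor`, the `charge` handler, and the in-memory dict are all hypothetical; in a real system the key store would be durable (for example, a table with a unique constraint written in the same transaction as the side effect), and this single-threaded version does not cover concurrent retries or the outbox pattern.

```python
from typing import Callable, Dict


class IdempotentProcessor:
    """Runs the handler at most once per idempotency key.

    Assumption: the dict stands in for a durable, unique-keyed store.
    """

    def __init__(self, handler: Callable[[dict], str]) -> None:
        self._handler = handler
        self._results: Dict[str, str] = {}

    def process(self, idempotency_key: str, event: dict) -> str:
        # A redelivered event returns the stored result instead of
        # repeating the side effect.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = self._handler(event)
        self._results[idempotency_key] = result
        return result


if __name__ == "__main__":
    ledger = []  # records side effects so duplicates would be visible

    def charge(event: dict) -> str:
        ledger.append(event["amount"])
        return f"charged {event['amount']}"

    processor = IdempotentProcessor(charge)
    first = processor.process("order-123", {"amount": 50})
    retry = processor.process("order-123", {"amount": 50})  # duplicate delivery
    print(first == retry, len(ledger))
```

The retry returns the same response as the first call while the ledger records only one charge, which is the user-visible guarantee worth stating in the interview.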

Example pacing (45 minutes)

Mermaid Gantt `dateFormat` handling varies by renderer, so the pacing is shown below as an equivalent, portable flowchart and as a table you can reuse in interviews.

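[Flowchart: Clarify → Envelope → Diagram → Depth → Close]

| Phase | Minutes (guide) | Goal |
| --- | --- | --- |
| Clarify | ~0–8 | Scope, NFRs, constraints |
| Envelope | ~8–14 | Rough capacity sanity |
| Diagram | ~14–24 | Boxes + data paths |
| Depth | ~24–40 | Storage, scale, failures |
| Close | ~40–45 | Recap + open questions |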

Components explained — this design

| Item in diagram | What it is | Why it appears here |
| --- | --- | --- |
| Clarify requirements | Interview phase, not a product. | You reduce the risk of a wrong design by locking down read/write ratio, consistency, latency, and compliance before drawing boxes. |
| Back-of-envelope | Rough QPS, storage, bandwidth estimates. | Interviewers want quantitative thinking; numbers justify Kafka vs SQS, SQL vs NoSQL, etc. |
| High-level diagram | First architecture sketch. | Shows you can decompose without diving into premature microservices. |
| Deep dives | Storage, hot paths, failure modes. | Where senior signal lives: tradeoffs, not buzzwords. |
| Tradeoffs and close | CAP / cost / ops honesty + summary. | Demonstrates you know nothing is free (e.g. global consistency vs latency). |

Shared definitions: 00-glossary-common-services.md

Tricky parts (meta)

| Trap | Fix |
| --- | --- |
| Buzzword soup | Every technology you name gets a one-sentence job description |
| Overfitting CAP | Relate it to a concrete user-visible symptom |
| Ignoring cost | Mention egress cost ($) and managed vs DIY ops |
| No numbers | Even rough numbers beat silence |

Caveats

  • Interview ≠ production: present a pragmatic MVP first, then “if scale grows 10×” as the second chapter.
  • Team skill is a constraint: boring and proven beats novel and risky, unless the startup explicitly wants R&D.

Quick tradeoff cheatsheet

  • SQL vs NoSQL: need for joins vs a clear partition-key access pattern.
  • Sync vs async: the user waits for the result vs UX copy that sets an “eventually done” expectation.
  • Strong vs eventual consistency: money vs social like counts.

Closing template

Summarize: APIs + storage + async path + scaling lever + two failure modes you handle. Invite questions.


You now have 50 companion docs in system-design/. Cross-link topics (e.g. payments + saga + idempotency) when studying for depth.