Glossary: common building blocks (what each one is and why teams typically pick it)

Use this page when any design doc mentions a service by name. Each individual design also has a “Components explained — this design” table tying choices to that problem.

Messaging & streaming

Service	What it does	Why teams pick it
Amazon SQS	Managed message queue; consumers poll; a visibility timeout hides a message while a worker processes it; messages that repeatedly fail processing can be moved to a dead-letter queue (DLQ).	Decouple producers from consumers, smooth traffic spikes, retry work without losing it. A FIFO variant provides strict ordering at lower throughput.
Amazon SNS	Pub/sub fan-out: one publish → many subscribers (email, Lambda, SQS, HTTP).	Notify many systems at once (e.g. “order placed” → email + search indexer + analytics).
Amazon Kinesis / MSK (Kafka)	Ordered, replayable streams partitioned by key; high throughput append-only log.	Event sourcing, cross-service integration, ordering per entity (e.g. `order_id` partition).
Azure Service Bus	Queues + topics (pub/sub) with sessions, dead-letter, duplicate detection.	Enterprise Azure apps needing FIFO-like sessions and rich broker features.
Azure Event Hubs	High-throughput event ingestion with a Kafka-compatible endpoint; tuned for large telemetry streams.	Device telemetry, clickstream at very high ingress rates.
Google Pub/Sub	Managed async messaging, push/pull subscribers.	GCP-native decoupling similar to SNS+SQS patterns.
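
The visibility timeout and DLQ behaviour described in the SQS row can be sketched with a toy in-memory queue. This is only an illustration of the idea, not the SQS API: `ToyQueue`, `Message`, and `max_receives` are hypothetical names, and time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Message:
    body: str
    receive_count: int = 0
    invisible_until: float = 0.0


@dataclass
class ToyQueue:
    """Hypothetical in-memory stand-in for a queue with a visibility timeout."""
    visibility_timeout: float = 30.0
    max_receives: int = 3  # assumption: receives allowed before dead-lettering
    messages: list = field(default_factory=list)
    dead_letters: list = field(default_factory=list)

    def send(self, body):
        self.messages.append(Message(body))

    def receive(self, now):
        for msg in list(self.messages):
            if msg.invisible_until > now:
                continue  # another worker is still processing it
            if msg.receive_count >= self.max_receives:
                # Too many failed attempts: park it in the DLQ.
                self.messages.remove(msg)
                self.dead_letters.append(msg)
                continue
            msg.receive_count += 1
            msg.invisible_until = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg):
        # A worker deletes the message only after it has finished the job.
        self.messages.remove(msg)


queue = ToyQueue(visibility_timeout=30, max_receives=2)
queue.send("resize image 42")
first = queue.receive(now=0)     # delivered, then hidden for 30 s
hidden = queue.receive(now=10)   # nothing visible while the worker holds it
retry = queue.receive(now=31)    # worker never deleted it, so it reappears
gone = queue.receive(now=62)     # receive limit reached: moved to dead_letters
```

A worker that crashes simply never calls `delete`, so the message becomes visible again after the timeout. That is the mechanism behind "retry work without losing it".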

When SQS vs SNS? SQS: each message is pulled by a single consumer, which may be one of several competing workers. SNS: each message is broadcast to every subscriber. The two are often combined: SNS receives the event and each downstream pipeline subscribes its own SQS queue (“fan-out to queues”).
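
A minimal in-memory sketch of the “fan-out to queues” pattern, assuming a hypothetical `Topic` class and plain `deque` objects standing in for SNS and SQS. It shows the two delivery models side by side: every subscribed queue gets its own copy of an event, while workers sharing one queue split the messages between them.

```python
from collections import deque


class Topic:
    """Hypothetical stand-in for a pub/sub topic."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, queue):
        self.subscribers.append(queue)

    def publish(self, event):
        # Broadcast: every subscribed queue receives its own copy.
        for queue in self.subscribers:
            queue.append(dict(event))


email_queue, search_queue = deque(), deque()
orders = Topic()
orders.subscribe(email_queue)
orders.subscribe(search_queue)

orders.publish({"type": "order_placed", "order_id": "o-1"})
orders.publish({"type": "order_placed", "order_id": "o-2"})

# Competing consumers: two workers drain the same queue, so each
# message is handled by exactly one of them.
workers = ["worker-a", "worker-b"]
handled_by = {}
turn = 0
while email_queue:
    event = email_queue.popleft()
    handled_by[event["order_id"]] = workers[turn % len(workers)]
    turn += 1
```

After this runs, `search_queue` still holds both events for its own pipeline, while the email pipeline has processed each order once.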

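When SQS vs Kafka? SQS: simpler operations, pay-per-request pricing, and no replay (a message is gone once a consumer deletes it, and retention tops out at 14 days). Kafka: replay from any retained offset, ordering within a partition, stream processing (e.g. Flink), and retention configurable in days or longer.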
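
The log-side properties (per-key ordering and replay) can be illustrated with a toy partitioned log. `PartitionedLog` is a hypothetical class, and `zlib.crc32` is used here only as a stable stand-in hash; Kafka's own default partitioner uses a different hash function.

```python
import zlib


class PartitionedLog:
    """Hypothetical append-only log split into partitions by key hash."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # The same key always maps to the same partition, so events
        # for one entity keep their relative order.
        partition = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition, offset):
        # Reading never removes records; a consumer can rewind its
        # offset and read the same records again.
        return self.partitions[partition][offset:]


log = PartitionedLog(num_partitions=4)
for step in ["created", "paid", "shipped"]:
    partition, _ = log.append("order-7", step)
log.append("order-9", "created")

first_pass = log.read(partition, 0)
replayed = log.read(partition, 0)  # same records, nothing was consumed away
```

A queue in the SQS style would have removed each record on acknowledgement, so a new consumer added later could not rebuild state from history.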

Compute & orchestration

Service	What it does	Why teams pick it
AWS Lambda	Run code in managed, short-lived execution environments in response to events; scales to zero; pay per invocation plus duration.	Glue between services (webhook handler, S3 upload post-processing), bursty or low-traffic tasks.
AWS Step Functions	State machine orchestrating Lambda/API calls with retries, parallel, map states.	Sagas, human approval steps, visual workflows; avoids custom orchestration code.
Amazon ECS / EKS	Run containers on AWS, orchestrated by ECS or by Kubernetes (EKS).	Long-running services, GPU inference, predictable performance without Lambda cold starts.
Azure Functions / Durable Functions	Serverless compute; Durable adds orchestration primitives.	Azure equivalent of Lambda + Step Functions patterns.
Google Cloud Run	HTTP services in containers, scale to zero.	Simple container deploy without managing K8s.
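
As an illustration of the “S3 upload post-process” glue role from the Lambda row, here is a minimal handler sketch. It assumes the standard S3 event notification shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`); the function name `handler` and the returned dictionary are arbitrary choices for this example, and the actual post-processing step is left out.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Collect the objects named in an S3 event notification."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (spaces become "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append(f"s3://{bucket}/{key}")
        # Real post-processing (thumbnailing, virus scan, ...) would go here.
    return {"processed": uploaded}


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "uploads"},
                "object": {"key": "photos/my+cat.jpg"}}}
    ]
}
result = handler(sample_event, context=None)
```

Because the handler is a plain function of its input event, it can be exercised locally with a hand-built event like `sample_event` before it is wired to a bucket.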

Data stores

Service	What it does	Why teams pick it
Amazon DynamoDB	Fully managed NoSQL key-value / document store; single-digit-millisecond reads at scale; streams for change capture.	Known access patterns (`pk` + optional `sk`), massive scale, multi-region replication with global tables.
Amazon RDS / Aurora (PostgreSQL)	Managed relational DB with SQL, joins, transactions.	Complex queries, reporting, strong consistency for business invariants (ledger, inventory).
Amazon ElastiCache (Redis)	Managed Redis / Memcached; in-memory, sub-millisecond access.	Cache, rate limiting, sessions, leaderboards (sorted sets), pub/sub for small fan-out.
Amazon S3	Object storage: bytes + metadata; designed for 11 nines of durability; lifecycle rules move data to cheaper tiers.	Files, logs, backups, static assets, data lake landing zone.
Azure Blob Storage / Cosmos DB / Azure Cache for Redis	Blob storage + multi-model global DB + managed Redis cache.	Same roles on Azure.

When DynamoDB vs RDS? DynamoDB when access is key-based and you want scale with little operational work. RDS when you need ad hoc SQL, schema migrations, or constraints.
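
The difference in access style can be sketched with standard-library stand-ins: a hypothetical `KeyValueTable` class that only answers "give me this `pk`, optionally narrowed by an `sk` prefix", next to an in-memory SQLite database answering a question nobody planned for. Neither is the real service API, and the key formats (`CUSTOMER#...`, `ORDER#...`) are illustrative assumptions.

```python
import sqlite3


class KeyValueTable:
    """Hypothetical key-based table: lookups need the partition key."""

    def __init__(self):
        self.rows = {}  # pk -> {sk: item}

    def put(self, pk, sk, item):
        self.rows.setdefault(pk, {})[sk] = item

    def query(self, pk, sk_prefix=""):
        partition = self.rows.get(pk, {})
        return [partition[sk] for sk in sorted(partition)
                if sk.startswith(sk_prefix)]


table = KeyValueTable()
table.put("CUSTOMER#c1", "PROFILE", {"name": "Ada"})
table.put("CUSTOMER#c1", "ORDER#o1", {"total": 20.0})
table.put("CUSTOMER#c1", "ORDER#o2", {"total": 5.0})
orders_for_c1 = table.query("CUSTOMER#c1", sk_prefix="ORDER#")

# Relational side: an ad hoc aggregate over all customers.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer TEXT, order_id TEXT, total REAL)")
db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("c1", "o1", 20.0), ("c1", "o2", 5.0), ("c2", "o3", 12.5)],
)
spend_by_customer = db.execute(
    "SELECT customer, SUM(total) FROM orders "
    "GROUP BY customer ORDER BY SUM(total) DESC"
).fetchall()
```

The key-based table answers its planned query cheaply but has no way to rank customers by spend without scanning everything; the SQL side handles that with one statement.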

Edge, networking & security

Service	What it does	Why teams pick it
Amazon CloudFront	CDN: caches HTTP responses at edge PoPs near users; signed URLs.	Lower latency, offload origin, DDoS absorption with AWS Shield.
AWS WAF	Web Application Firewall: rules on HTTP(S) (IP match, SQLi patterns, rate-based).	Block abuse at edge before it hits your app servers.
Amazon API Gateway	Front door for HTTP APIs: auth, throttling, routing to Lambda/HTTP/VPC.	Centralize cross-cutting API concerns; usage plans for partners.
Amazon Cognito	User pools (signup/login, MFA) + identity pools (temporary AWS creds).	Don’t build auth from scratch; issues JWTs that microservices can verify.
Azure Front Door + APIM + Entra ID	CDN/WAF + API management + enterprise identity.	Same pattern on Microsoft stack.
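
Throttling at the front door is commonly explained with a token bucket: a steady refill rate plus a burst allowance. The sketch below is a generic version of that idea with a hypothetical `TokenBucket` class; it is not the configuration or API of any of the services above, and the clock is passed in so the behaviour is reproducible.

```python
class TokenBucket:
    """Generic token bucket: `rate` tokens per second, up to `burst` stored."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.updated = 0.0

    def allow(self, now):
        # Refill for the time elapsed since the last call, capped at burst.
        elapsed = now - self.updated
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would typically respond with HTTP 429


bucket = TokenBucket(rate=1, burst=2)
decisions = [bucket.allow(now=t) for t in (0.0, 0.0, 0.0, 1.0, 1.0)]
```

With a rate of one request per second and a burst of two, the first two requests at `t=0` pass, the third is rejected, and one more is admitted after a second of refill.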

Search, analytics & observability

Service	What it does	Why teams pick it
OpenSearch / Elasticsearch	Inverted index search + aggregations.	Full-text search, log search.
Amazon Athena / S3 + Parquet	SQL over S3 without loading a warehouse first.	Ad hoc analytics, cheap scans with columnar formats.
Prometheus + Grafana	Metrics time series + dashboards/alerts.	SLOs, infra & app golden signals (latency, traffic, errors, saturation).
OpenTelemetry + Collector	Vendor-neutral traces/metrics/logs export.	Avoid lock-in to one APM vendor; standard context propagation.
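
The “inverted index” in the OpenSearch / Elasticsearch row can be shown in a few lines: map each term to the set of documents containing it, then intersect those sets to answer a multi-term query. This is a bare sketch with hypothetical function names; real engines add analysis (stemming, tokenization rules), relevance scoring, and on-disk structures.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the ids of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


logs = {
    1: "payment timeout error",
    2: "payment accepted",
    3: "disk error",
}
index = build_index(logs)
matches = search(index, "payment error")
```

Looking up a term costs one dictionary access regardless of how many documents exist, which is why this structure suits both full-text search and log search.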

Media & ML

Service	What it does	Why teams pick it
AWS Elemental MediaConvert	Video transcoding to ABR ladders (HLS/DASH).	Broadcast-grade pipelines without running FFmpeg fleets yourself.
Amazon SageMaker	Train + host ML models (notebooks, endpoints).	Managed inference with autoscaling when you outgrow DIY TorchServe.

How to read the per-doc tables

Each numbered design includes “Components explained — this design”: every row ties a named box or integration in that architecture to what it does and why that choice fits that problem. If a term appears only once, the row is the source of truth for that doc; this glossary covers cross-cutting definitions.