Glossary: common building blocks (what each one is and why teams typically pick it)
Use this page when any design doc mentions a service by name. Each individual design also has a “Components explained — this design” table tying choices to that problem.
Messaging & streaming
| Service | What it does | Why teams pick it |
|---|---|---|
| Amazon SQS | Managed queue of messages; consumers poll; visibility timeout hides a message while a worker processes it; failed handling can land in a DLQ. | Decouple producers from consumers, smooth spikes, retry work without losing it. FIFO variant exists for strict ordering (lower throughput). |
| Amazon SNS | Pub/sub fan-out: one publish → many subscribers (email, Lambda, SQS, HTTP). | Notify many systems at once (e.g. “order placed” → email + search indexer + analytics). |
| Amazon Kinesis / MSK (Kafka) | Ordered, replayable streams partitioned by key; high throughput append-only log. | Event sourcing, cross-service integration, ordering per entity (e.g. order_id partition). |
| Azure Service Bus | Queues + topics (pub/sub) with sessions, dead-letter, duplicate detection. | Enterprise Azure apps needing FIFO-like sessions and rich broker features. |
| Azure Event Hubs | Kafka-compatible ingest endpoint; tuned for big telemetry firehose. | Device telemetry, clickstream at very high ingress. |
| Google Pub/Sub | Managed async messaging, push/pull subscribers. | GCP-native decoupling similar to SNS+SQS patterns. |
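
The visibility-timeout and DLQ behaviour described in the SQS row can be sketched with a small in-memory model. This is an illustration only, not the SQS API: `ToyQueue`, `Message`, `max_receives` and the explicit `now` argument are all hypothetical names chosen for the example, and time is passed in by hand so the run is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    body: str
    receive_count: int = 0
    invisible_until: float = 0.0  # hidden from consumers until this time


@dataclass
class ToyQueue:
    """In-memory stand-in for a queue with a visibility timeout and a DLQ."""
    visibility_timeout: float = 30.0
    max_receives: int = 3
    messages: list = field(default_factory=list)
    dead_letters: list = field(default_factory=list)

    def send(self, body: str) -> None:
        self.messages.append(Message(body))

    def receive(self, now: float):
        """Return the first visible message and hide it, or None."""
        for msg in list(self.messages):
            if msg.invisible_until > now:
                continue  # a worker is still inside the visibility window
            if msg.receive_count >= self.max_receives:
                # Too many unacknowledged deliveries: park it in the DLQ.
                self.messages.remove(msg)
                self.dead_letters.append(msg)
                continue
            msg.receive_count += 1
            msg.invisible_until = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg: Message) -> None:
        """Acknowledge successful processing."""
        self.messages.remove(msg)


q = ToyQueue(visibility_timeout=30, max_receives=2)
q.send("resize image 42")
first = q.receive(now=0)    # worker A takes the message
hidden = q.receive(now=10)  # still invisible, so nothing is returned
retry = q.receive(now=31)   # worker A never deleted it, so it is redelivered
final = q.receive(now=62)   # receive limit reached, message moves to the DLQ
```

A worker that finishes in time calls `delete`, and the message never reappears. One that crashes does nothing, and the timeout returns the work to the queue.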
When SQS vs SNS? SQS = each message is pulled and handled by one consumer, even when several consumers compete for work. SNS = each message is broadcast to every subscriber. Often both: SNS receives the event, and each downstream pipeline subscribes its own SQS queue (“fan-out to queues”).
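A minimal sketch of the fan-out-to-queues pattern, using plain in-memory objects rather than the AWS SDK. `ToyTopic`, the subscriber names and the event fields are assumptions made up for the illustration.

```python
from collections import deque


class ToyTopic:
    """Broadcasts each published event to every subscribed queue."""

    def __init__(self):
        self.subscribers = {}

    def subscribe(self, name):
        queue = deque()
        self.subscribers[name] = queue
        return queue

    def publish(self, event):
        for queue in self.subscribers.values():
            queue.append(dict(event))  # each pipeline gets its own copy


topic = ToyTopic()
email_q = topic.subscribe("email")
search_q = topic.subscribe("search-indexer")

topic.publish({"type": "order_placed", "order_id": "o-1"})
topic.publish({"type": "order_placed", "order_id": "o-2"})

# Two competing workers drain the email queue: each message goes to one of them.
worker_a_job = email_q.popleft()
worker_b_job = email_q.popleft()
```

The search indexer's queue still holds both events afterwards, so a slow or failing pipeline does not hold up the others.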
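When SQS vs Kafka? SQS: simpler ops, per-request pricing, and no replay (a message is deleted once it is consumed, and unconsumed messages are kept for at most 14 days). Kafka: replay from any retained offset, ordering within each partition, stream processing (e.g. Flink), and retention you configure in days or longer.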
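The two properties that set a log apart from a queue, per-key ordering and replay, can be sketched as below. This is a toy model, not a Kafka client: `ToyLog` is hypothetical, and `zlib.crc32` is used as a stable stand-in for the partitioner (real clients use their own hash function).

```python
import zlib


class ToyLog:
    """Append-only log split into partitions by a hash of the key."""

    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Stable hash, so a given key always lands on the same partition.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition, from_offset=0):
        # Reading never deletes, so any consumer can replay from any offset.
        return self.partitions[partition][from_offset:]


log = ToyLog()
for step in ["created", "paid", "shipped"]:
    partition, _ = log.append("order-17", step)
log.append("order-99", "created")

first_pass = [v for k, v in log.read(partition) if k == "order-17"]
replayed = [v for k, v in log.read(partition) if k == "order-17"]
```

Events for one `order_id` come back in the order they were written, and a second read returns the same data, which a consumed-and-deleted queue message cannot offer.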
Compute & orchestration
| Service | What it does | Why teams pick it |
|---|---|---|
| AWS Lambda | Run code in managed execution environments in response to events; scales to zero; pay per invoke + duration. | Glue between services (webhook handler, S3 upload post-process), bursty or low-traffic tasks. |
| AWS Step Functions | State machine orchestrating Lambda/API calls with retries, parallel, map states. | Sagas, human approval steps, visual workflows; avoids custom orchestration code. |
| Amazon ECS / EKS | Run containers on AWS, orchestrated by ECS or by Kubernetes (EKS). | Long-running services, GPU inference, predictable performance vs Lambda cold starts. |
| Azure Functions / Durable Functions | Serverless compute; Durable adds orchestration primitives. | Azure equivalent of Lambda + Step Functions patterns. |
| Google Cloud Run | HTTP services in containers, scale to zero. | Simple container deploy without managing K8s. |
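
The retry and saga behaviour that the Step Functions row says teams avoid hand-writing looks roughly like the sketch below. It is plain Python, not Amazon States Language; `run_workflow` and the step names are hypothetical, and backoff delays between attempts are left out to keep the example short.

```python
def run_workflow(steps, max_attempts=3):
    """Run (name, task, compensate) steps in order; undo completed ones on failure."""
    completed = []
    for name, task, compensate in steps:
        for attempt in range(1, max_attempts + 1):
            try:
                task()
                completed.append((name, compensate))
                break
            except Exception:
                if attempt == max_attempts:
                    # Saga-style rollback: compensate in reverse order.
                    for _, undo in reversed(completed):
                        undo()
                    return f"failed at {name}"
    return "succeeded"


log = []
calls = {"charge": 0}


def reserve():
    log.append("reserve")


def release():
    log.append("release")


def charge():
    calls["charge"] += 1
    if calls["charge"] < 2:
        raise RuntimeError("transient payment error")
    log.append("charge")


def refund():
    log.append("refund")


def ship():
    raise RuntimeError("carrier unavailable")


def cancel_shipment():
    log.append("cancel_shipment")


result = run_workflow([
    ("reserve", reserve, release),
    ("charge", charge, refund),
    ("ship", ship, cancel_shipment),
])
```

Here `charge` succeeds on its second attempt, `ship` exhausts its retries, and the earlier steps are compensated in reverse. A managed orchestrator expresses the same control flow as configuration and also persists it.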
Data stores
| Service | What it does | Why teams pick it |
|---|---|---|
| Amazon DynamoDB | Fully managed NoSQL key-value / document; single-digit ms reads at scale; streams for change capture. | Known access patterns (pk + optional sk), massive scale, global tables multi-region. |
| Amazon RDS / Aurora (PostgreSQL) | Managed relational DB with SQL, joins, transactions. | Complex queries, reporting, strong consistency for business invariants (ledger, inventory). |
| Amazon ElastiCache (Redis) | Managed Redis / Memcached; in-memory sub-ms. | Cache, rate limiting, sessions, leaderboards (sorted sets), pub/sub for small fan-out. |
| Amazon S3 | Object storage: bytes + metadata; 11 nines durability; lifecycle to cheaper tiers. | Files, logs, backups, static assets, data lake landing zone. |
| Azure Blob Storage / Cosmos DB / Azure Cache for Redis | Object storage + globally distributed multi-model DB + managed Redis cache. | Same roles on Azure. |
When DynamoDB vs RDS? DynamoDB when every access is key-based and you want scale without running database servers. RDS when you need ad hoc SQL, joins, schema migrations, or constraints enforced by the database.
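The contrast can be sketched with the standard library alone: a dict keyed by (pk, sk) stands in for key-based access, and in-memory SQLite stands in for a relational engine. The table layout, key formats and the `query` helper are assumptions invented for the example, not DynamoDB or RDS APIs.

```python
import sqlite3

# Key-based access: every read names the partition key, optionally a sort-key prefix.
orders_by_key = {
    ("customer#1", "order#2024-01-05"): {"total": 40},
    ("customer#1", "order#2024-02-11"): {"total": 15},
    ("customer#2", "order#2024-01-20"): {"total": 99},
}


def query(pk, sk_prefix=""):
    """Items for one partition key whose sort key starts with a prefix."""
    return [orders_by_key[(p, s)] for (p, s) in sorted(orders_by_key)
            if p == pk and s.startswith(sk_prefix)]


jan_orders = query("customer#1", "order#2024-01")

# Relational access: enforced constraints plus questions nobody planned for.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders ("
    "customer TEXT NOT NULL, placed TEXT NOT NULL, "
    "total INTEGER NOT NULL CHECK (total >= 0))"
)
db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("1", "2024-01-05", 40), ("1", "2024-02-11", 15), ("2", "2024-01-20", 99)],
)
top_spenders = db.execute(
    "SELECT customer, SUM(total) FROM orders "
    "GROUP BY customer ORDER BY SUM(total) DESC"
).fetchall()

try:
    db.execute("INSERT INTO orders VALUES ('3', '2024-03-01', -5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the CHECK constraint refused the negative total
```

The key-based side answers only the questions its keys were designed for. The SQL side answers a new aggregate without a schema change and rejects the invalid row itself.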
Edge, networking & security
| Service | What it does | Why teams pick it |
|---|---|---|
| Amazon CloudFront | CDN: caches HTTP responses at edge PoPs near users; signed URLs. | Lower latency, offload origin, DDoS absorption with AWS Shield. |
| AWS WAF | Web Application Firewall: rules on HTTP(S) (IP match, SQLi patterns, rate-based). | Block abuse at edge before it hits your app servers. |
| Amazon API Gateway | Front door for HTTP APIs: auth, throttling, routing to Lambda/HTTP/VPC. | Centralize cross-cutting API concerns; usage plans for partners. |
| Amazon Cognito | User pools (signup/login, MFA) + identity pools (temporary AWS creds). | Don’t build auth from scratch; JWT for microservices. |
| Azure Front Door + APIM + Entra ID | CDN/WAF + API management + enterprise identity. | Same pattern on Microsoft stack. |
Search, analytics & observability
| Service | What it does | Why teams pick it |
|---|---|---|
| OpenSearch / Elasticsearch | Inverted index search + aggregations. | Full-text search, log search. |
| Amazon Athena / S3 + Parquet | SQL over S3 without loading a warehouse first. | Ad hoc analytics, cheap scans with columnar formats. |
| Prometheus + Grafana | Metrics time series + dashboards/alerts. | SLOs, infra & app golden signals (latency, traffic, errors, saturation). |
| OpenTelemetry + Collector | Vendor-neutral traces/metrics/logs export. | Avoid lock-in to one APM vendor; standard context propagation. |
Media & ML
| Service | What it does | Why teams pick it |
|---|---|---|
| AWS Elemental MediaConvert | Video transcoding to ABR ladders (HLS/DASH). | Broadcast-grade pipelines without running FFmpeg fleets yourself. |
| Amazon SageMaker | Train + host ML models (notebooks, endpoints). | Managed inference with autoscaling when you outgrow DIY TorchServe. |
How to read the per-doc tables
Each numbered design includes “Components explained — this design”: every row ties a named box or integration in that architecture to what it does and why that choice fits that problem. If a term appears only once, the row is the source of truth for that doc; this glossary covers cross-cutting definitions.