Transactional Email at Scale
Problem statement
Send password resets, receipts, and alerts reliably, with templating, bounce and complaint handling, good deliverability, and unsubscribe compliance.
How it works
- Queue outgoing sends; worker renders template + calls SES / SendGrid.
- Webhooks from provider update suppression list (bounces, complaints).
- Dedicated IPs plus a warm-up plan when volume is high.
Analogy: Postal sorting facility: letters (emails) are barcoded (Message-ID), tracked through scans (webhooks), and blacklisted addresses (complaints) are removed from future drops.
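The queue, render, and send steps above can be sketched as a small worker loop. This is a minimal illustration, not the production worker: `EmailJob`, `TEMPLATES`, `SUPPRESSED`, and `process_jobs` are hypothetical names, the queue is an in-memory stand-in for SQS, and `send` is an injected callable that would wrap the SES or SendGrid call.

```python
import queue
from dataclasses import dataclass
from string import Template
from typing import Callable


@dataclass
class EmailJob:
    to: str
    template_id: str
    params: dict


# Hypothetical in-memory stand-ins for the template store and suppression list.
TEMPLATES = {"password_reset": Template("Hi $name, reset your password here: $link")}
SUPPRESSED = {"bounced@example.com"}


def process_jobs(jobs: "queue.Queue[EmailJob]", send: Callable[[str, str], None]) -> list[str]:
    """Drain the queue and return the recipients that were actually sent to."""
    sent = []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return sent
        if job.to.lower() in SUPPRESSED:
            continue  # skip addresses that bounced or complained earlier
        body = TEMPLATES[job.template_id].substitute(job.params)
        send(job.to, body)  # assumption: wraps the provider API call
        sent.append(job.to)
```

Injecting `send` keeps retry and rate-limit logic in one place and lets the loop be exercised without a provider account.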
High-level design
Components explained — this design
| Component | What it is | Why we use it here |
|---|---|---|
| SQS send-queue | Buffer of email jobs. | Smooths spikes (marketing blasts); workers scale on queue depth. |
| Mail worker | Renders template + calls SES API. | Centralizes retry, bounce parsing, rate limit per provider. |
| Templates in S3 | Versioned HTML/text bodies. | Marketing/legal can update copy without redeploying worker code (with review gates). |
| Amazon SES | SMTP/API email sending with reputation metrics. | Cheap at scale vs SMTP appliances; integrates with SNS bounce/complaint topics. |
| SNS webhooks → Lambda | Event-driven updates to suppression lists. | Opt-out takes effect within seconds of a complaint; helps you stay CAN-SPAM compliant. |
| Suppression DynamoDB | Fast lookup “never send to this address”. | O(1) checks before enqueue; TTL for temporary blocks if desired. |
Shared definitions: 00-glossary-common-services.md
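The suppression lookup with optional TTL from the table above could look roughly like this. It is an in-memory sketch standing in for the DynamoDB table; `SuppressionList` and its methods are hypothetical names, and the injected `clock` exists only to make expiry testable.

```python
import time
from typing import Optional


class SuppressionList:
    """In-memory stand-in for the suppression table (hypothetical API)."""

    def __init__(self, clock=time.time):
        self._entries = {}  # address -> expiry timestamp, or None for permanent
        self._clock = clock

    def suppress(self, address: str, ttl_seconds: Optional[float] = None) -> None:
        """Block an address permanently, or for ttl_seconds if given."""
        expires = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._entries[address.strip().lower()] = expires

    def is_suppressed(self, address: str) -> bool:
        key = address.strip().lower()
        if key not in self._entries:
            return False
        expires = self._entries[key]
        if expires is not None and self._clock() >= expires:
            del self._entries[key]  # temporary block has lapsed
            return False
        return True
```

The read path compares the expiry itself rather than trusting deletion. The same habit is useful with DynamoDB TTL, where expired items are removed asynchronously and can still be returned for a while.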
Low-level design
Templates
- MJML → HTML build step in CI; localization JSON per locale.
- Inline CSS for email client quirks; plain text alternative part.
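To illustrate the plain text alternative part, here is a small sketch using Python's standard `email` package. `build_message` is a hypothetical helper, and the HTML body is assumed to come from the MJML build step with CSS already inlined.

```python
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str,
                  text_body: str, html_body: str) -> EmailMessage:
    """Build a multipart/alternative message with text and HTML parts."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(text_body)                      # text/plain part
    msg.add_alternative(html_body, subtype="html")  # turns msg into multipart/alternative
    return msg
```

Clients that cannot or will not render HTML fall back to the first part, so the text body should carry the same essential information (amounts, links) as the HTML one.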
Authentication
- SPF, DKIM, DMARC DNS records mandatory for inbox placement.
- BIMI (optional) displays a brand logo; it requires a DMARC policy at enforcement (p=quarantine or p=reject).
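A quick way to sanity-check whether a domain's DMARC record is at enforcement is to parse its tags. The sketch below is illustrative only: `parse_dmarc` and `enforces_policy` are hypothetical helpers, the record string is assumed to have been fetched from the `_dmarc` TXT record already, and mailbox providers may impose further BIMI requirements that are not checked here.

```python
def parse_dmarc(record: str) -> dict:
    """Parse a DMARC TXT record into a tag -> value dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def enforces_policy(record: str) -> bool:
    """True if the record is DMARC1 with p=quarantine or p=reject."""
    tags = parse_dmarc(record)
    policy = tags.get("p", "").lower()
    return tags.get("v") == "DMARC1" and policy in {"quarantine", "reject"}
```

For example, `enforces_policy("v=DMARC1; p=none")` is false, so a domain still in monitoring mode would not qualify for a logo.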
Throttling
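- SES sending quotas (daily volume and maximum send rate) apply per account per Region; request an increase before volume grows. Sharding across verified domains isolates reputation rather than raising the quota.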
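On the worker side, staying under the provider's send rate is commonly done with a token bucket. The following is a minimal single-process sketch, not a description of how SES enforces its limits; `TokenBucket` is a hypothetical class and the rate and capacity values are placeholders for whatever the account's quota allows.

```python
import time


class TokenBucket:
    """Allow up to `rate_per_sec` sends per second, with bursts up to `capacity`."""

    def __init__(self, rate_per_sec: float, capacity: float, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self._clock = clock
        self._last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; the caller should wait or requeue otherwise."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
        self._last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With several workers, the bucket state would need to live in a shared store, or each worker would take a fixed share of the overall rate.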
Secrets
- SMTP credentials in Secrets Manager; rotate.
E2E: bounce handling
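As a rough sketch of the Lambda side of this flow, the function below pulls the addresses to suppress out of an SNS-delivered SES notification. It assumes the classic SES notification JSON (a `notificationType` field with `bounce` or `complaint` objects); SES event publishing through configuration sets uses `eventType` instead and would need a small change. `addresses_to_suppress` is a hypothetical name, and writing the result to the suppression table is left out.

```python
import json


def addresses_to_suppress(sns_event: dict) -> set:
    """Return lowercased addresses with a hard bounce or a complaint."""
    result = set()
    for record in sns_event.get("Records", []):
        # SNS delivers the SES notification as a JSON string in "Message".
        message = json.loads(record["Sns"]["Message"])
        kind = message.get("notificationType")
        if kind == "Bounce" and message["bounce"].get("bounceType") == "Permanent":
            recipients = message["bounce"].get("bouncedRecipients", [])
        elif kind == "Complaint":
            recipients = message["complaint"].get("complainedRecipients", [])
        else:
            continue  # deliveries and transient bounces are ignored in this sketch
        result.update(r["emailAddress"].lower() for r in recipients)
    return result
```

Transient bounces (full mailbox, temporary DNS failure) are skipped on purpose here; a fuller handler might apply a temporary block to them instead.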
Tricky parts
| Problem | Solution |
|---|---|
| Duplicate sends | Idempotency key on each queue message; drop repeats seen within the dedupe window |
| Wrong “from” domain | Align the custom MAIL FROM domain (SPF) and the DKIM signing domain with the From header domain |
| Embargoed countries | Geo IP block in orchestration layer |
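The duplicate-send row can be illustrated with a small dedupe window keyed on an idempotency key. This is an in-memory sketch under stated assumptions: `DedupeWindow` is a hypothetical class, the key format is up to the producer, and a real deployment would more likely lean on a shared store or the queue's own deduplication (SQS FIFO queues, for instance, deduplicate on `MessageDeduplicationId` over a five-minute interval).

```python
import time


class DedupeWindow:
    """Remember idempotency keys for `window_seconds` and reject repeats."""

    def __init__(self, window_seconds: float, clock=time.monotonic):
        self.window = window_seconds
        self._clock = clock
        self._seen = {}  # idempotency key -> time first seen

    def first_time(self, key: str) -> bool:
        """True if the key has not been seen within the window."""
        now = self._clock()
        # Drop keys older than the window so memory stays bounded.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.window}
        if key in self._seen:
            return False
        self._seen[key] = now
        return True
```

A worker would call `first_time(job_key)` before sending and skip the job when it returns false, so a retried or redelivered queue message does not produce a second email.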
Caveats
- Marketing vs transactional streams: separate IPs/subdomains (mail. vs promo.).
- Gmail promotions tab: not a failure; engagement metrics differ.
Azure
- Azure Communication Services Email; SendGrid on Azure Marketplace.