IoT Telemetry & Device Management

Problem statement

Ingest telemetry from millions of devices reporting at high frequency, and support downlink commands, OTA firmware updates, and rules (e.g., alert if temp > X).

How it works

  • MQTT broker (AWS IoT Core / Azure IoT Hub) for pub/sub per device topic hierarchy.
  • Hot path: Kinesis / Event Hubs → Flink aggregations → storage (TSDB / data lake).
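
The aggregation step in the hot path can be pictured as a tumbling (fixed, non-overlapping) window per device. The sketch below is plain Python rather than Flink, and the 60-second window, the tuple layout, and the function name are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean

WINDOW_SECONDS = 60  # assumed window size, not taken from the design


def tumbling_window_avg(readings, window_seconds=WINDOW_SECONDS):
    """Average each device's values per fixed, non-overlapping window.

    `readings` is an iterable of (device_id, epoch_seconds, value) tuples.
    Returns {(device_id, window_start_seconds): average_value}.
    """
    buckets = defaultdict(list)
    for device_id, ts, value in readings:
        # Align the timestamp down to the start of its window.
        window_start = int(ts // window_seconds) * window_seconds
        buckets[(device_id, window_start)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


readings = [
    ("meter-1", 0, 20.0),
    ("meter-1", 30, 22.0),
    ("meter-1", 61, 30.0),
    ("meter-2", 10, 18.0),
]
aggregates = tumbling_window_avg(readings)
```

A real stream processor would also handle late and out-of-order events and would emit each window as it closes; this batch version only shows the bucketing logic.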

Analogy: Smart electric meters phoning home every minute — the utility needs a call center (broker) that never drops calls and a billing warehouse (data lake) for history.

High-level design

Components explained — this design

Component | What it is | Why we use it here
AWS IoT Core / IoT Hub | MQTT broker + device registry + rules. | Protocol translation, per-device auth, rules without custom broker ops.
Rules engine SQL | Routes messages to Kinesis/S3/Lambda based on topic attrs. | Low-code routing for ops teams; still needs code review for safety.
Kinesis / Event Hubs | Durable high-throughput stream. | Millions msgs/sec ingest with replay for new analytics jobs.
Flink / Kinesis Analytics | Stateful stream processing (windows, joins). | Real-time alerts on sensor thresholds across time windows.
Timestream / ADX | Time-series optimized query store. | Operational dashboards faster than scanning raw Parquet.
S3 archive | Cheap long-term raw retention. | Regulatory retention + ML training exports.

Shared definitions: 00-glossary-common-services.md

Low-level design

Topic naming

  • tenant/{tid}/device/{did}/telemetry → ACL per certificate CN=deviceId.
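
The topic scheme and the CN-based ACL can be illustrated with a small check. This is a plain-Python sketch of the rule, not broker policy syntax; the function names and the regular expression are hypothetical, and a managed broker would normally express the same constraint in its own policy language.

```python
import re

# Matches tenant/{tid}/device/{did}/telemetry and rejects MQTT wildcards (+, #).
TOPIC_PATTERN = re.compile(r"tenant/([^/+#]+)/device/([^/+#]+)/telemetry")


def telemetry_topic(tenant_id: str, device_id: str) -> str:
    """Build the telemetry topic for one device."""
    return f"tenant/{tenant_id}/device/{device_id}/telemetry"


def may_publish(cert_common_name: str, topic: str) -> bool:
    """Allow a publish only when the topic's device segment equals the cert CN."""
    match = TOPIC_PATTERN.fullmatch(topic)
    if match is None:
        return False
    _tenant_id, device_id = match.groups()
    return device_id == cert_common_name
```

For example, a certificate with CN `meter-1` may publish to `tenant/acme/device/meter-1/telemetry` but not to another device's topic or to a wildcard topic.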

Security

  • X.509 certs per device; just-in-time registration (JITR); rotate certs OTA.
  • Private keys in TPM/secure element on device when possible.

Backpressure

  • MQTT QoS 1 can build a backlog: keep the device-side queue bounded and document the drop policy.
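
One way to make the bounded queue and its drop policy concrete is a fixed-size buffer that evicts the oldest reading when full. Drop-oldest is an assumption here (the notes only say the policy should be documented), and the class name is hypothetical.

```python
from collections import deque


class BoundedTelemetryQueue:
    """Device-side buffer that drops the oldest reading when full."""

    def __init__(self, max_items: int):
        if max_items <= 0:
            raise ValueError("max_items must be positive")
        self._items = deque(maxlen=max_items)
        self.dropped = 0  # exposed so the device can report data loss

    def enqueue(self, reading) -> None:
        if len(self._items) == self._items.maxlen:
            self.dropped += 1  # the deque evicts the oldest item on append
        self._items.append(reading)

    def drain(self, batch_size: int) -> list:
        """Remove and return up to batch_size readings, oldest first."""
        batch = []
        while self._items and len(batch) < batch_size:
            batch.append(self._items.popleft())
        return batch
```

Drop-oldest favors fresh readings, which suits dashboards; a billing-style workload might prefer drop-newest or on-device downsampling instead.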

OTA

  • Signed firmware packages in S3; the Jobs API tracks rollout in percentage waves.
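
Percentage waves can be sketched by hashing each device into a stable bucket from 0 to 99 and admitting buckets below the current rollout percentage. This is an illustrative assumption about how waves might be assigned, not a description of how any managed Jobs API selects devices; both function names are hypothetical.

```python
import hashlib


def rollout_bucket(device_id: str, firmware_version: str) -> int:
    """Map a device to a stable bucket in [0, 100) for a given firmware version."""
    digest = hashlib.sha256(f"{firmware_version}:{device_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_wave(device_id: str, firmware_version: str, rollout_percent: int) -> bool:
    """True when the device falls inside the current rollout percentage."""
    return rollout_bucket(device_id, firmware_version) < rollout_percent
```

Because the bucket is deterministic, raising the percentage from 10 to 50 only adds devices and never removes one that already received the update. Including the firmware version in the hash means a different subset goes first on each release.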

E2E: telemetry to alert

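The last hop of the telemetry-to-alert path, evaluating a rule such as "alert if temp > X" against one message, might look like the sketch below. The JSON payload shape with a `temp` field, the threshold value, and the function name are all assumptions for illustration.

```python
import json
from typing import Optional

TEMP_THRESHOLD_C = 30.0  # the "X" in "alert if temp > X"; value is an assumption


def evaluate_message(
    topic: str, payload: bytes, threshold: float = TEMP_THRESHOLD_C
) -> Optional[dict]:
    """Return an alert dict when the reading exceeds the threshold, else None."""
    try:
        reading = json.loads(payload)
    except ValueError:
        return None  # malformed JSON or invalid UTF-8: no alert
    if not isinstance(reading, dict):
        return None
    temp = reading.get("temp")
    if isinstance(temp, bool) or not isinstance(temp, (int, float)):
        return None  # missing or non-numeric temperature
    if temp > threshold:
        return {"topic": topic, "temp": temp, "threshold": threshold}
    return None
```

In the full pipeline this check would typically run on windowed aggregates rather than single readings, so that one noisy sample does not page anyone.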

Tricky parts

Problem | Solution
Clock skew on devices | Server ingest time authoritative; device time metadata
Replay attacks | TLS + nonce in signed payloads
Firmware brick | A/B partitions + hardware rollback pin
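
The replay-attack row can be illustrated with a nonce check on authenticated payloads. In this sketch an HMAC with a shared key stands in for the device's signature, which is an assumption made to stay within the standard library; a certificate-based design would verify an asymmetric signature instead. The names `sign` and `ReplayGuard` are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, nonce: str, body: bytes) -> str:
    """Authenticate nonce and body together so neither can be swapped."""
    return hmac.new(key, nonce.encode() + b"." + body, hashlib.sha256).hexdigest()


class ReplayGuard:
    """Accepts each authenticated (device, nonce) pair at most once."""

    def __init__(self):
        # Unbounded here; a real service would expire old nonces.
        self._seen = set()

    def accept(self, device_id: str, key: bytes, nonce: str, body: bytes, tag: str) -> bool:
        expected = sign(key, nonce, body)
        if not hmac.compare_digest(expected.encode(), tag.encode()):
            return False  # tampered payload or wrong key
        if (device_id, nonce) in self._seen:
            return False  # same nonce seen before: replay
        self._seen.add((device_id, nonce))
        return True
```

The tag is checked before the nonce is recorded, so an attacker cannot burn a device's future nonces by sending garbage.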

Caveats

  • Cost at scale: use rules to downsample before the lake; pay-per-message pricing models hurt naive designs.
  • Privacy: home address inference from device clusters — aggregate geospatially.
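
Geospatial aggregation for the privacy caveat could be as simple as snapping device positions to a coarse grid and publishing only per-cell counts. The 0.1-degree cell size and the minimum-count suppression are assumptions added for illustration; the notes only call for aggregating geospatially.

```python
import math
from collections import Counter


def grid_cell(lat: float, lon: float, cell_degrees: float = 0.1) -> tuple:
    """Return integer grid indices for the cell containing (lat, lon)."""
    return (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))


def aggregate_by_cell(positions, cell_degrees: float = 0.1, min_devices: int = 5) -> dict:
    """Count devices per cell, dropping cells with fewer than min_devices.

    `positions` is an iterable of (lat, lon) pairs. Sparse cells are
    suppressed because a cell with one device still points at one home.
    """
    counts = Counter(grid_cell(lat, lon, cell_degrees) for lat, lon in positions)
    return {cell: n for cell, n in counts.items() if n >= min_devices}
```

Cells are returned as integer indices rather than rounded coordinates, which avoids floating-point keys and makes the output harder to mistake for a precise location.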

Azure

  • Azure IoT Hub device twins; Azure Digital Twins for modeling relationships.