Ambitious Internet of Things applications have one thing in common: they produce massive amounts of data. But how do you design a next-generation IoT backend that meets the business requirements and doesn’t explode as soon as traffic increases? This talk covers how to use MQTT to connect millions of devices with commodity servers and process huge amounts of data. Learn the common design patterns and see the technologies that actually scale. Explore when to use Cassandra, Kafka, Spark, Docker, and other tools, and when to stick with your good ol’ SQL database or enterprise message queue.
5. Introduction
Dominik Obermaier (@dobermai)
• HiveMQ CTO
• Strong background in distributed and large scale systems architecture
• OASIS MQTT TC Member
• Author of “The Technical Foundations of IoT”
• Conference Speaker
• Program committee member for German and international IoT conferences
8. What is different
Small data “packets”, but a lot of them
Large number of concurrently connected devices
Machine communication without human interaction
High degree of automation
Continuous flow of data
10. Dimensions of data
1 msg / minute per device, each 1 kB in size

number of devices   100     1,000   10,000   100,000   1,000,000
GB/hour             0.006   0.06    0.6      6         60
GB/day              0.14    1.4     14       144       1,440
GB/month            4       43      432      4,320     43,200
GB/year             53      526     5,256    52,560    525,600
Mbps                0.01    0.1     1.3      13        133
11. Dimensions of data
1 msg / second per device, each 1 kB in size

number of devices   100     1,000    10,000    100,000     1,000,000
GB/hour             0.36    3.6      36        360         3,600
GB/day              8.64    86.4     864       8,640       86,400
GB/month            259     2,592    25,920    259,200     2,592,000
GB/year             3,154   31,536   315,360   3,153,600   31,536,000
Mbps                0.8     8        80        800         8,000
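The figures on the two "Dimensions of data" slides follow from simple arithmetic. A minimal sketch of that calculation, assuming 1 GB = 1,000,000 kB, a 30-day month and a 365-day year (conventions inferred from the numbers, not stated on the slides); `data_volume` is a hypothetical helper name.

```python
def data_volume(devices, msgs_per_second, msg_size_kb=1.0):
    """Estimate the data volume produced by a fleet of devices.

    Assumes decimal units (1 GB = 1,000,000 kB), a 30-day month
    and a 365-day year.
    """
    kb_per_second = devices * msgs_per_second * msg_size_kb
    gb_per_hour = kb_per_second * 3600 / 1_000_000
    gb_per_day = gb_per_hour * 24
    return {
        "gb_per_hour": gb_per_hour,
        "gb_per_day": gb_per_day,
        "gb_per_month": gb_per_day * 30,
        "gb_per_year": gb_per_day * 365,
        # 1 kB = 8 kilobits, 1,000 kilobits = 1 megabit
        "mbps": kb_per_second * 8 / 1000,
    }


# 100 devices sending one message per second
per_second = data_volume(devices=100, msgs_per_second=1)

# 1,000,000 devices sending one message per minute
per_minute = data_volume(devices=1_000_000, msgs_per_second=1 / 60)
```

Rounded to the precision used on the slides, `per_second` corresponds to the first column of the 1 msg / second table and `per_minute` to the last column of the 1 msg / minute table.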
12. Putting this into perspective
1,000,000 msg/s: 2.5 PB/month
Whole Internet traffic in 1996: 1.9 PB/month
Source: https://en.wikipedia.org/wiki/Internet_traffic
100,000 msg/s: 8.6 TB/day
1 server hard drive per day
13. IOT CHALLENGES
➤ Unreliable communication channels (e.g. mobile)
➤ Constrained Devices
➤ Low Bandwidth and High Latency environments
➤ Bi-directional communication required
➤ Security
➤ Instantaneous data exchange
15. Typical Components
Load Balancer
MQTT Broker
Data Store
Stream Processing
Batch Processing
Queues / Logs
16. High Level IoT Data Processing Architecture
Load Balancing, Data Ingestion, Stream Analytics, Data Storage, Batch Processing
17. Load Balancer
Enables distribution of workloads across multiple computing resources
First line of defense
Central component for horizontal scaling, high availability and resiliency
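One way to picture workload distribution is round-robin selection that skips unhealthy nodes. This is a minimal sketch for illustration only; `RoundRobinBalancer` and the broker names are hypothetical, and real load balancers offer further strategies and active health checks.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in turn, skipping those marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        if not self.healthy:
            raise RuntimeError("no healthy backend available")
        # At most one full round is needed to find a healthy backend.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backend available")


balancer = RoundRobinBalancer(["broker-1", "broker-2", "broker-3"])
first_round = [balancer.next_backend() for _ in range(4)]

# When a node fails, traffic keeps flowing to the remaining nodes.
balancer.mark_down("broker-2")
after_failure = [balancer.next_backend() for _ in range(2)]
```

Adding a node to the list is the horizontal scaling step; skipping failed nodes is what makes the balancer relevant for availability.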
18. MQTT Broker
Central communication software for Internet-connected devices
Distributes raw messages to the backend
Allows bi-directional communication: device-to-cloud and cloud-to-device
19. (Big) Data Store
Multiple data stores may be needed, persistent and ephemeral: data lake, metadata stores, data marts, caches
Raw data as well as refined data may be stored
IoT architectures almost always deal with Big Data problems
20. Queues / Logs
Decouple data ingestion from processing
Add fault tolerance
Persistent and in-memory options
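The decoupling idea can be sketched with an in-memory queue between an ingestion side and a processing side. This is an illustrative example using only the Python standard library; `run_pipeline` is a hypothetical name, and a persistent log such as Kafka would additionally keep the buffered data across a crash, which this in-memory version does not.

```python
import queue
import threading


def run_pipeline(messages, process, max_buffered=1000):
    """Ingest messages into a bounded buffer and process them in a worker."""
    buffer = queue.Queue(maxsize=max_buffered)
    results = []
    stop = object()  # sentinel that tells the worker to finish

    def consumer():
        while True:
            item = buffer.get()
            if item is stop:
                break
            results.append(process(item))

    worker = threading.Thread(target=consumer)
    worker.start()

    # Ingestion side: put() blocks when the buffer is full, so a slow
    # consumer slows the producer down instead of exhausting memory.
    for message in messages:
        buffer.put(message)
    buffer.put(stop)
    worker.join()
    return results


doubled = run_pipeline(range(5), lambda value: value * 2)
```

The producer never calls `process` directly, so ingestion and processing can run at different speeds and be scaled separately.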
24. Stream vs Batch Processing

            Stream (Running Sushi)          Batch (Candy Store)
Supply      virtually endless               limited by available containers
Access      sequential, as items pass by    random access
Selection   continuous                      one-off selection
Source: https://www.mapr.com/blog/data-processing-vocabulary-101-key-terms-you-need-know
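The difference in access pattern can be shown with the same computation done both ways. A minimal sketch, with hypothetical function names: the batch version needs the complete dataset up front, while the streaming version sees each item once, in order, and emits an updated result as items pass by.

```python
def batch_average(readings):
    """Batch: the whole, finite dataset is available before we start."""
    return sum(readings) / len(readings)


def streaming_average(readings):
    """Stream: consume items one by one and yield a running average."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


temperatures = [1, 2, 3, 4]

one_off_result = batch_average(temperatures)
continuous_results = list(streaming_average(iter(temperatures)))
```

The streaming version keeps only a count and a sum, so it also works on a virtually endless input.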
30. Containers
Ideal for cloud environments with multiple services that need to be orchestrated
All containers per runtime share the same kernel
Very easy to implement Infrastructure as Code
35. Resiliency
The ability of a software system as a whole to recover from failures
Failure of individual components is expected; the system as a whole can continue to work (with service degradation) and is able to recover
All software components need to be designed for resiliency. Resilient systems often prefer Availability and Partition Tolerance if possible.
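Continuing with degraded service and recovering later can be sketched as retries with exponential backoff plus a fallback. This is one common pattern, not one prescribed by the slides; `call_with_fallback` and `make_flaky` are hypothetical names, and the delays are arbitrary.

```python
import time


def call_with_fallback(operation, fallback, attempts=3, base_delay=0.1,
                       sleep=time.sleep):
    """Try `operation` a few times, then degrade to `fallback`."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Back off exponentially between attempts (0.1 s, 0.2 s, ...).
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Every attempt failed: keep working with degraded service.
    return fallback()


def make_flaky(failures):
    """Hypothetical dependency that fails `failures` times, then recovers."""
    state = {"remaining": failures}

    def operation():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise ConnectionError("dependency unavailable")
        return "live value"

    return operation


def no_wait(seconds):
    """Stand-in for time.sleep so the example runs instantly."""


recovered = call_with_fallback(make_flaky(2), lambda: "cached value",
                               sleep=no_wait)
degraded = call_with_fallback(make_flaky(5), lambda: "cached value",
                              sleep=no_wait)
```

In the first call the dependency recovers within three attempts; in the second it does not, and the caller continues with the cached value instead of failing.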
37. High Availability
Typically refers to the uptime of a whole system or service
Achieved with redundancy for hardware and software components
All components in a single system must be designed for high availability
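Why every component matters can be seen from the usual availability arithmetic. A short sketch, assuming components fail independently (a simplification) and using made-up example values; the function names are hypothetical.

```python
def serial_availability(*components):
    """All components are required: availabilities multiply."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


def redundant_availability(single, replicas):
    """Any one of `replicas` identical copies is enough to stay up."""
    return 1.0 - (1.0 - single) ** replicas


# Three components in a chain, each 99% available:
chain = serial_availability(0.99, 0.99, 0.99)

# Running one 99% component as a redundant pair:
pair = redundant_availability(0.99, replicas=2)
```

Under these assumptions the chain is only about 97% available, lower than any single component, while the redundant pair reaches roughly 99.99%.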
38. High Level IoT Data Processing Architecture
Load Balancing, Data Ingestion, Stream Analytics, Data Storage, Batch Processing
40. MQTT
Internet of Things Messaging Protocol
Standardized at OASIS
Designed for reliable communication over unreliable networks (e.g. mobile networks)
Publish / Subscribe architecture
Data agnostic
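The publish / subscribe idea can be illustrated with a tiny in-memory broker. This is a sketch of the concept only, not a real MQTT implementation: it matches topic filters using the MQTT wildcards `+` (one level) and `#` (all remaining levels), ignores quality of service, sessions and `$`-prefixed topics, and the names `topic_matches`, `TinyBroker` and all topic strings are hypothetical.

```python
def topic_matches(topic_filter, topic):
    """Return True if an MQTT-style topic filter matches a topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for index, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard matches everything below
        if index >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[index]:
            return False
    return len(filter_levels) == len(topic_levels)


class TinyBroker:
    """Deliver each published message to every matching subscriber."""

    def __init__(self):
        self.subscriptions = []  # list of (topic_filter, callback)

    def subscribe(self, topic_filter, callback):
        self.subscriptions.append((topic_filter, callback))

    def publish(self, topic, payload):
        delivered = 0
        for topic_filter, callback in self.subscriptions:
            if topic_matches(topic_filter, topic):
                callback(topic, payload)
                delivered += 1
        return delivered


broker = TinyBroker()
backend_inbox, device_inbox = [], []

# Device-to-cloud: the backend listens to the telemetry of all devices.
broker.subscribe("devices/+/telemetry",
                 lambda topic, payload: backend_inbox.append((topic, payload)))
# Cloud-to-device: device 42 listens for its own commands.
broker.subscribe("devices/42/commands/#",
                 lambda topic, payload: device_inbox.append((topic, payload)))

broker.publish("devices/42/telemetry", b"21.5")
broker.publish("devices/42/commands/reboot", b"now")
```

Publishers and subscribers never reference each other, only topics. The payloads are opaque bytes, which is what "data agnostic" refers to.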
42. MQTT Use Cases
Push Communication
Reliable Communication over unreliable networks
Constrained Devices
Low Bandwidth and High Latency
Enterprise backends to mobile communication
43. MQTT in IoT architectures
Massively scalable: hundreds of thousands of messages per second
Bi-directional, event-driven push communication
Very low protocol footprint, suitable for millions of devices
50. Apache Spark
Framework for batch processing of large datasets
Supports streaming via micro-batch processing
Can process terabytes of data
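Micro-batch processing treats a stream as a sequence of small batches and runs ordinary batch logic on each one. The sketch below shows the idea in plain Python and does not use the Spark API; it cuts batches by item count for simplicity, whereas a micro-batch engine typically cuts them by time interval. `micro_batches` is a hypothetical name.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Split a (possibly endless) stream into small lists of items."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


sensor_values = [3, 5, 4, 8, 6]

# Apply a batch computation (here: max) to each micro-batch in turn.
maxima = [max(batch) for batch in micro_batches(sensor_values, batch_size=2)]
```

Results arrive with a delay of up to one batch, which is the latency cost of the micro-batch approach.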
51. Akka
Framework for highly concurrent data processing
Provides an actor model
Suited for CPU intensive processing
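In the actor model, state is owned by an actor and changed only in response to messages taken from its mailbox one at a time. The following is a conceptual sketch in Python, not Akka's API: `CounterActor` is a hypothetical example, and because of Python's global interpreter lock it shows the message-passing style rather than parallel CPU work.

```python
import queue
import threading


class CounterActor:
    """Own a private counter and update it only via mailbox messages."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self.total = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, amount):
        """Fire-and-forget: enqueue a message and return immediately."""
        self._mailbox.put(amount)

    def stop(self):
        """Ask the actor to finish after the messages queued so far."""
        self._mailbox.put(None)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is None:
                break
            # Only this thread touches self.total, so no lock is needed.
            self.total += message


actor = CounterActor()
senders = [
    threading.Thread(target=lambda: [actor.tell(1) for _ in range(100)])
    for _ in range(4)
]
for sender in senders:
    sender.start()
for sender in senders:
    sender.join()
actor.stop()
```

Four threads send concurrently, yet the counter stays consistent because messages are handled sequentially inside the actor.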
52. Hadoop
Ecosystem for very large datasets
Distributed Filesystem HDFS
Scales to hundreds of nodes and terabytes of data
53. HiveMQ
MQTT broker built for enterprise applications
Extensive plugin system
Scales to more than 10 million concurrent connections
OSS Community Edition available
Built for High Availability and used by 120+ of the largest IoT deployments in the world
54. HiveMQ MQTT Client
Java-based MQTT library
Developed by HiveMQ and BMW Car-IT
Built for devices and backends
Open Source (Apache 2)
Extremely fast and low overhead
66. “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.”
Leslie Lamport, Turing Award Winner 2013
All components must be resilient