Rolls-Royce Trent 1000
Analytics data collected in
One fan blade manufacturing
-> 0.5 TB of data
Real-time data transmitted
back to RR when planes are
From autonomous mining
trucks to locomotives, they
have sensors monitoring
fuel, idle time, location for
Predictive maintenance has
saved millions, from timely
fuel pump replacement to
adjusting ship hull cleaning
intervals in their marine
● What type of data?
● How fast do you need results?
● How much data to keep?
● Historical, real-time, or predictive?
● Cloud or fog / edge analytics?
● Time-related data
○ Time series processing
■ Energy consumption with time
■ Failure prediction
■ Specialized DBs, e.g. OpenTSDB
● Location data
○ GPS / iBeacons
○ Used in agriculture
■ Detect soil moisture, crop growth
■ Manage irrigation equipment
○ Traffic planning
■ Monitor vehicle speeds and locations for better route suggestions
○ Geospatially optimized processing engines, e.g. GeoTrellis
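As a small illustration of the time series processing listed above, the sketch below smooths energy consumption readings with a rolling mean. It is only a minimal example in plain Python: the function name `rolling_mean`, the window size, and the sample readings are assumptions made for illustration, and a specialized store such as OpenTSDB would handle this at a much larger scale.

```python
from collections import deque
from statistics import mean

def rolling_mean(readings, window=3):
    """Yield (timestamp, mean of the last `window` values) for each reading."""
    recent = deque(maxlen=window)  # keeps only the newest `window` values
    for timestamp, value in readings:
        recent.append(value)
        yield timestamp, mean(recent)

# Hypothetical hourly energy readings: (hour, kWh)
energy_kwh = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 4.0)]
smoothed = list(rolling_mean(energy_kwh, window=2))
print(smoothed)
```

The same windowing idea underlies failure prediction: a smoothed series makes a sustained drift easier to separate from a single noisy reading.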
Do we need the results instantaneously, is a delay of a few seconds
okay, or are results after several minutes or more acceptable?
● The most often used processing mode in IoT
○ Take action immediately when an event occurs at the source
■ Send out alerts from a temperature sensor hitting a limit
■ Show a low tire pressure notification on a car dashboard
● Generating instant alerts and information based on the data sent by
sensors requires stream processing: events are processed one by one in
real time and matched against a predefined set of rules.
○ Apache Storm as a stream processing engine
■ Scalable and fault tolerant
○ For advanced pattern matching, a full-fledged CEP engine can be
used, e.g. WSO2 CEP or Esper
● For generating long-term statistics, a batch processing system can be
used, e.g. Apache Hadoop or Apache Spark
○ Average temperature in a room in the last month
○ Total power usage of the house in the last year
● Interactive analytics with technologies such as Apache Drill and
indexed storage systems such as Couchbase.
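The one-by-one rule matching described for stream processing above can be sketched in a few lines of plain Python. This is not the Apache Storm or CEP engine API; the `Rule` class, the event dictionaries, and the threshold values are hypothetical and only show the shape of the technique.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple

@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # predicate applied to a single event

# Hypothetical rules and limits, chosen only for illustration.
RULES = [
    Rule("high_temperature",
         lambda e: e["type"] == "temperature" and e["value"] > 80.0),
    Rule("low_tire_pressure",
         lambda e: e["type"] == "tire_pressure" and e["value"] < 28.0),
]

def process_stream(events: Iterable[dict],
                   rules=RULES) -> Iterator[Tuple[str, dict]]:
    """Check each incoming event against every rule as soon as it arrives."""
    for event in events:
        for rule in rules:
            if rule.matches(event):
                yield rule.name, event

events = [
    {"type": "temperature", "value": 72.0},
    {"type": "temperature", "value": 85.5},
    {"type": "tire_pressure", "value": 25.0},
]
alerts = [name for name, _ in process_stream(events)]
print(alerts)
```

Because `process_stream` is a generator, each alert is produced as soon as its event is seen, without waiting for the rest of the input.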
● Most often, we may need to mash up batch analytics results with
real-time events
○ Comparing a long-term statistics result with incoming real-time
events for alerts, etc.
● Batch operations can be brought together with an indexing system for
real-time analytics to look up data instantly when required
○ Apache Lucene, WSO2 DAS Analytics / Event Tables
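A rough sketch of combining the two modes: a batch step computes a long-term baseline from historical readings, and a real-time step compares each new event against it. The function names, the sample history, and the "three standard deviations" rule are assumptions for illustration, not something prescribed by the tools named above.

```python
from statistics import mean, stdev

def batch_baseline(history):
    """Batch step: long-term mean and standard deviation of past readings."""
    return mean(history), stdev(history)

def is_anomalous(value, baseline, k=3.0):
    """Real-time step: flag a reading more than k deviations from the mean."""
    avg, sd = baseline
    return abs(value - avg) > k * sd

# Hypothetical room temperatures from last month (batch input).
history = [20.0, 21.0, 19.0, 20.0, 20.0]
baseline = batch_baseline(history)

# Incoming real-time events are compared with the stored baseline.
for reading in (21.0, 25.0):
    print(reading, is_anomalous(reading, baseline))
```

In a real deployment the baseline would typically be written to an indexed store by the batch job, so that the real-time path only performs a lookup.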
● IoT devices generate high volumes and different types of data
● We can process the data as soon as we receive it and then discard it,
or keep it for more detailed processing
● Big Data stores give us the option to store huge amounts of data as
● Purge the raw data once it is no longer required
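One possible way to express the purge step is a simple age-based retention rule, sketched below. The record layout, the function name `purge_older_than`, and the 30-day limit are hypothetical; real Big Data stores usually offer their own retention or TTL settings.

```python
from datetime import datetime, timedelta

def purge_older_than(records, now, max_age):
    """Keep only the records whose timestamp is within `max_age` of `now`."""
    cutoff = now - max_age
    return [r for r in records if r["time"] >= cutoff]

now = datetime(2024, 3, 1)
records = [
    {"time": datetime(2024, 1, 1), "value": 18.5},   # older than 30 days
    {"time": datetime(2024, 2, 20), "value": 19.0},  # still within 30 days
]
kept = purge_older_than(records, now, max_age=timedelta(days=30))
print(len(kept))
```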
● Hindsight can be achieved by processing historical data and
understanding what has happened.
○ Batch processing systems such as Apache Hadoop and Apache
Spark are used in this area
○ Data visualization with dashboards, showing related data together
● Insight is understanding what is happening now
○ Achieved with real-time processing systems
○ Scenario: How are my jet engines performing right now?
● Foresight is predicting what is going to happen
○ Achieved with machine learning systems such as Apache Mahout,
Apache Spark MLlib, Microsoft Azure Machine Learning, WSO2 ML
○ Scenario: Predictive maintenance -> time to change specific parts
in my car, service scheduling on an aeroplane
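To make the predictive maintenance scenario concrete, the sketch below fits a straight line to past wear measurements and estimates how many operating hours remain before a wear limit is reached. This is a deliberately simple stand-in for the machine learning systems named above; the function names, the wear figures, and the threshold are all made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def hours_until_threshold(hours, wear, threshold):
    """Extrapolate the wear trend; None if wear is not increasing."""
    slope, intercept = fit_line(hours, wear)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope  # hour at which limit is hit
    return max(0.0, crossing - hours[-1])

# Hypothetical part wear (mm) measured at given operating hours.
hours = [0, 100, 200, 300]
wear = [0.0, 1.0, 2.0, 3.0]
remaining = hours_until_threshold(hours, wear, threshold=5.0)
print(remaining)
```

A production model would use many more signals and a proper learning library, but the output has the same role: a lead time that can feed service scheduling.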
● IoT naturally creates large amounts of data, so large amounts of
computing resources are required
● The typical scenario of a centralized analytics server for all devices
may not always be feasible
○ Centralized analytics hardware may not be scalable for all the
thousands of devices getting added frequently
○ The network communication will get flooded with analytics chatter
when the device count increases
● Solution: edge analytics, a.k.a. fog analytics
○ Some or most of the required analytics operations are offloaded to
the end device itself or to an immediate gateway. This creates a
scalable infrastructure for device management in the IoT ecosystem.
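A minimal sketch of the offloading idea: instead of sending every raw reading to a central server, the device or gateway reduces a window of readings to one summary message. The function name `summarize_at_edge`, the message fields, and the alert limit are assumptions for illustration only.

```python
from statistics import mean

def summarize_at_edge(readings, alert_limit):
    """Reduce a window of raw readings to a single summary message."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
        "alert": max(readings) > alert_limit,  # decided locally, at the edge
    }

# Hypothetical window of temperature readings collected on the gateway.
raw = [21.0, 22.0, 23.0, 90.0]
message = summarize_at_edge(raw, alert_limit=80.0)
print(message)  # one message is sent upstream instead of four readings
```

Sending one summary per window rather than every reading is what reduces the network chatter mentioned above as the device count grows.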