COSMOS Data Analytics Architecture
1. From Intelligent Transportation in Madrid to Smart Homes
in Taipei: An IoT Data Analytics architecture applicable to
multiple real world use cases
Thursday, 23 June 2016
Adnan Akbar
Institute for Communication Systems (ICS)
5G Innovation Centre (5GIC)
University of Surrey, UK
Adnan.akbar@surrey.ac.uk
Joint work with:
Paula Ta-Shma, IBM Research
Michael Factor, IBM Research
Guy Hadash, IBM Research
Juan Sancho, ATOS
2. What is the Internet of Things?
• “Internet of Things is based on the vision of connecting everyday objects to internet to form a cyber-physical system, where every object will be represented by its virtual representation enabling the control of physical world remotely” (F. Mattern and C. Floerkemeier)
• Connecting Everyday Objects
– Physical things containing chips/sensors
– Capture and communicate all types of data
• Virtual Representation
• Control of Physical World
– interact with other devices, computing systems and the external environment, including people
3. IoT Data Analytics
• More data, more opportunities, but more challenges in analyzing and extracting knowledge from this data
• Which is the right set of tools?
• Which processing model should be used to analyze this data?
• Which analytic methods are available to get more value from this data?
[Diagram: IoT Data]
4. Which processing model to use?
Batch processing vs. event processing (historical vs. real-time)
[Diagram: IoT Data; Batch Processing; Event Processing; Complex Event Processing; Machine Learning; Statistical Methods; Hybrid Solutions]
5. Right combination of tools for IoT data?
A plethora of open source projects for storing and processing big data
[Logos: Swift, Secor, Elasticsearch]
6. Generic IoT Architecture – Data Flow
Ingestion
1. Collect historical time series data
– Collect data from devices
– Aggregate into objects
– Index and/or partition
[Diagram: Secor, IoT, Swift]
7. Generic IoT Architecture – Data Flow
Historical Data Access and Analytics
[Diagram: Secor, Swift]
2. Learn patterns in data
– May be time/location dependent
– Generate thresholds, classifiers etc.
8. Generic IoT Architecture – Data Flow
Real-Time Data Analytics
[Diagram: IoT, Secor, CEP, Swift]
3. Apply what was learned to the real-time data stream
– Take action
9. Proposed Solution: A Lambda Architecture for IoT
1) Ingestion
2) Historical Data Analytics (Batch Processing)
3) Real-time Data Analytics (Event Processing)
A generic IoT Analytics architecture
[Diagram: IoT, CEP, Secor, Swift. Green flows: real time. Purple flows: batch.]
10. Use Case 1: Intelligent Transportation System for Madrid Council
• Problem
• Over 3000 traffic sensors are deployed throughout the city of Madrid
• EMT needs to staff control rooms where employees manually analyze Madrid traffic sensor output. This can be slow and costly.
• Objective
• Improve customer satisfaction and reduce costs by responding more efficiently and quickly to real-time traffic problems
• Approach
• Ingest data from up to 3000 sensors into our architecture, learn patterns from historical data, apply them to real-time data using CEP, and react by alerting drivers, calling emergency vehicles, rerouting buses, modifying traffic lights, etc.
[Figure panels: Today, Tomorrow]
11. IoT Architecture – Madrid Traffic – Ingestion Flow
Aim: Collect historical time series data for analysis
– Continuously collect data from up to 3000 Madrid council traffic sensors via web service
• Data includes traffic speeds and intensities, updated every 5 mins
– Push the messages to Kafka
– Use Secor to aggregate multiple messages into a single Swift object
• According to policy, e.g., every 60 mins
• Possibly partition the data, e.g. according to date
• Convert to Parquet format
• Annotate with metadata, e.g., min/max speed, start/end time
– Index Swift objects according to their metadata using Elasticsearch
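The aggregation and annotation steps above can be sketched in plain Python. This is a minimal illustration and not Secor's implementation: the `Reading` fields and their units, the object naming scheme, and the metadata keys are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    sensor_id: str
    timestamp: int   # seconds since epoch
    speed: float     # assumed unit: km/h
    intensity: int   # assumed unit: vehicles per hour


def aggregate(readings, window_seconds=3600):
    """Group readings into fixed time windows and annotate each group with metadata."""
    windows = {}
    for r in readings:
        start = r.timestamp - (r.timestamp % window_seconds)
        windows.setdefault(start, []).append(r)

    objects = []
    for start in sorted(windows):
        batch = windows[start]
        objects.append({
            # Hypothetical object name; the real naming policy is not given in the slides.
            "name": f"madridtraffic/window_start={start}",
            "metadata": {
                "min_speed": min(r.speed for r in batch),
                "max_speed": max(r.speed for r in batch),
                "start_time": min(r.timestamp for r in batch),
                "end_time": max(r.timestamp for r in batch),
            },
            "records": batch,
        })
    return objects
```

Each returned dictionary corresponds to one stored object: the `metadata` entry is what would be indexed, and `records` is what would be written out in Parquet format.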
[Diagram: Secor, Swift, IoT]
12. IoT Architecture – Madrid Traffic – Data Access
Aim: Access data efficiently and cost-effectively
– Store IoT data in OpenStack Swift object storage
• Open source, low cost deployment, and highly scalable
– Parquet data is accessible via Spark SQL
– Optimized predicate pushdown
• Custom Spark SQL external data source driver
• Uses object metadata indexes
• Searches for Swift objects whose min/max values overlap requested ranges
Get all data for morning traffic:
SELECT codigo, intensidad, velocidad
FROM madridtraffic
WHERE tf >= '08:00:00' AND tf <= '12:00:00'
Brute force method: 13245 Swift requests
Optimized predicate pushdown: 616 Swift requests
Improvement: 21.5 times fewer requests
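The pruning idea behind the optimized pushdown can be sketched as an interval-overlap test on per-object min/max metadata. This is only an illustration under stated assumptions: the dictionary stands in for the metadata index, and the object names and time ranges are hypothetical. The last line reproduces the ratio of the two request counts quoted above.

```python
def overlaps(obj_min, obj_max, query_min, query_max):
    """True when [obj_min, obj_max] intersects [query_min, query_max]."""
    return obj_min <= query_max and obj_max >= query_min


def select_objects(index, query_min, query_max):
    """Return names of objects whose tf range may contain matching rows."""
    return [
        name
        for name, (tf_min, tf_max) in index.items()
        if overlaps(tf_min, tf_max, query_min, query_max)
    ]


# Hypothetical metadata index: object name -> (min tf, max tf).
# Zero-padded HH:MM:SS strings compare correctly as plain strings.
index = {
    "obj-a": ("06:00:00", "07:59:59"),
    "obj-b": ("08:00:00", "09:59:59"),
    "obj-c": ("11:30:00", "12:30:00"),
    "obj-d": ("13:00:00", "14:00:00"),
}

# Only obj-b and obj-c overlap the morning range, so only they need to be fetched.
selected = select_objects(index, "08:00:00", "12:00:00")

# Ratio of the request counts reported on the slide.
improvement = 13245 / 616
```

Objects that pass the test may still contain rows outside the range, so the row-level filter of the query is still applied after the fetch.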
13. IoT Architecture – Madrid Traffic – Machine Learning
Aim: Learn to differentiate between ‘good’ and ‘bad’ traffic
– Depends on context
• Time (morning/evening), Day (weekday/weekend)
• Location
– Use Spark MLlib k-means clustering
– Produce threshold values for real-time decision making
– Re-run algorithm when quality of clusters decreases
• Can use silhouette index to measure quality
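As a rough illustration of the quality check, the silhouette index can be computed directly from its definition. The sketch below is plain Python on scalar values rather than Spark, and the `min_quality` cut-off used to trigger a re-run is a hypothetical value, not one given in the slides.

```python
def silhouette_score(clusters):
    """Mean silhouette coefficient for a list of two or more clusters of scalar values.

    Assumes the values are not all identical (otherwise the ratio is undefined).
    """
    scores = []
    for i, cluster in enumerate(clusters):
        for idx, x in enumerate(cluster):
            if len(cluster) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            others = cluster[:idx] + cluster[idx + 1:]
            # a: mean distance to the other points in the same cluster
            a = sum(abs(x - y) for y in others) / len(others)
            # b: smallest mean distance to the points of any other cluster
            b = min(
                sum(abs(x - y) for y in other) / len(other)
                for j, other in enumerate(clusters)
                if j != i
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def needs_retraining(clusters, min_quality=0.5):
    """Hypothetical policy: re-run clustering when the score drops below min_quality."""
    return silhouette_score(clusters) < min_quality
```

Scores close to 1 indicate well-separated clusters; scores near 0 or below suggest that the current cluster centres no longer describe the data well.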
14. IoT Architecture – Madrid Traffic – Machine Learning
Event Detection:
• Use Spark MLlib k-means clustering to separate data into 2 clusters
• Find the midpoint between the 2 cluster centres
• Use this midpoint to generate the thresholds
• Repeat for each context, e.g. time period (morning, afternoon, evening, night)
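The midpoint rule can be sketched with a small k-means for k = 2 on a single feature. Plain Python is used here as a stand-in for Spark MLlib, and the sample speeds and context names are invented for illustration.

```python
def two_means_1d(values, iterations=100):
    """Lloyd's algorithm for k=2 on scalar values; returns (low_centre, high_centre)."""
    low, high = min(values), max(values)  # initialise the centres at the extremes
    for _ in range(iterations):
        low_pts = [v for v in values if abs(v - low) <= abs(v - high)]
        high_pts = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_pts:
            break  # all values are identical, so there is only one cluster
        new_low = sum(low_pts) / len(low_pts)
        new_high = sum(high_pts) / len(high_pts)
        if (new_low, new_high) == (low, high):
            break  # converged
        low, high = new_low, new_high
    return low, high


def threshold(values):
    """Midpoint between the two cluster centres."""
    low, high = two_means_1d(values)
    return (low + high) / 2


# Hypothetical speed samples per context; one threshold is learned for each.
speeds_by_context = {
    "morning": [10, 12, 14, 50, 52, 54],
    "night": [40, 42, 44, 80, 82, 84],
}
thresholds = {ctx: threshold(v) for ctx, v in speeds_by_context.items()}
```

With these samples, a speed below the learned value for its context would be labelled "bad" traffic and a speed above it "good" traffic.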
Anomaly Detection:
• Use a single cluster and define an anomaly to be further than a certain distance from the cluster centre
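A minimal sketch of the single-cluster rule follows. The slides do not say how the distance limit is chosen, so `max_distance` is left as an explicit, hypothetical parameter; the sample points are invented, and in practice the features would likely need scaling before Euclidean distance is meaningful.

```python
import math
from statistics import mean


def cluster_centre(points):
    """Centre of a single cluster: the per-dimension mean of the points."""
    return tuple(mean(dim) for dim in zip(*points))


def is_anomaly(point, centre, max_distance):
    """True when the point lies further than max_distance from the centre."""
    return math.dist(point, centre) > max_distance


# Hypothetical (speed, intensity) observations for one context.
history = [(50, 100), (52, 110), (48, 90)]
centre = cluster_centre(history)
```

A new observation is then checked with `is_anomaly(observation, centre, max_distance)` as it arrives.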
[Figure: Morning Traffic on Weekdays]
15. IoT Architecture – Madrid Traffic – Real Time Decision Making
Aim: Respond in real time to traffic conditions
– Use Complex Event Processing (CEP) approach
• Rule based
• Process events record by record
• CEP rules are typically defined manually, but in many cases it is difficult to get them right
– We automate this process and make it smart
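To make the rule-based, record-by-record idea concrete, here is a small sketch of one rule whose thresholds are parameters, so that values learned from historical data can be plugged in instead of being hand-tuned. The rule itself (several consecutive low-speed, high-intensity events per sensor), the event fields, and the class name are assumptions for illustration and do not reflect any particular CEP engine.

```python
from collections import defaultdict, deque


class CongestionRule:
    """Fires when the last `window` events of a sensor all breach both thresholds."""

    def __init__(self, speed_threshold, intensity_threshold, window=3):
        self.speed_threshold = speed_threshold
        self.intensity_threshold = intensity_threshold
        self.window = window
        # One bounded history of breach flags per sensor.
        self.recent = defaultdict(lambda: deque(maxlen=window))

    def on_event(self, event):
        """Process one event; return an alert dict when the rule fires, else None."""
        breached = (
            event["speed"] < self.speed_threshold
            and event["intensity"] > self.intensity_threshold
        )
        history = self.recent[event["sensor"]]
        history.append(breached)
        if len(history) == self.window and all(history):
            return {"sensor": event["sensor"], "alert": "congestion"}
        return None
```

When the batch layer produces new thresholds, updating `speed_threshold` and `intensity_threshold` on the rule object is enough to change its behavior for subsequent events.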
[Diagram: CEP, IoT]
Prediction
Proactive approach:
• Use Spark streaming linear regression to predict traffic behavior (e.g. speed, intensity) for the near future
• Apply CEP on predicted data
• Respond proactively to predicted events such as traffic congestion
– e.g. EMT can proactively re-route buses
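The proactive flow can be sketched as "predict, then apply the rule to the prediction". Note the swap: instead of Spark's streaming linear regression, this example fits an ordinary least-squares line over a short window of recent readings and extrapolates it. The function names, the sample values, and the threshold are hypothetical.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values against times; returns (slope, intercept).

    Requires at least two distinct time points.
    """
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean


def predict(times, values, future_time):
    """Extrapolate the fitted line to a future time."""
    slope, intercept = fit_line(times, values)
    return intercept + slope * future_time


def predicted_congestion(times, speeds, future_time, speed_threshold):
    """Apply a threshold rule to the predicted speed instead of the observed one."""
    return predict(times, speeds, future_time) < speed_threshold
```

For example, with speeds falling by 5 per interval from 60, the line extrapolates to 35 two intervals ahead, so a rule with a threshold of 38 would fire before the slowdown is actually observed.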
16. Use Case 2: Taipei Smart Homes
[Figure labels: Smart plugs, Home Gateway]
Real-time monitoring, control, and reporting of home appliance energy usage
• Taipei test scenario comprising 50 volunteer households
• Installed with Smart Energy kit (incl. home gateway, smart plugs, and smart strips)
• Real-time energy usage
Goal: Real-time monitoring of appliances in order to detect anomalies
17. Taipei Smart Homes
• Examples of anomalies
• Short circuit of a device
• Devices being operated at unusual times
• An anomaly at night might not be an anomaly in the daytime
• Same Architecture is used for monitoring Energy data
• Only difference lies in the type of Analytics and Rules
• Historical Data Analytics
• Learn normal patterns from historical data
• Use CEP rules to detect deviations from normal
• Different models for different contexts
• Time of day (morning, afternoon, evening, night)
• Weekday or weekend
• Winter or summer
• Rainy or sunny
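One way to organise "different models for different contexts" is to key each learned limit by a context tuple. The sketch below covers only time of day and weekday/weekend; season and weather would extend the key in the same way. The hour boundaries, the power limits, and the function names are assumptions for illustration.

```python
from datetime import datetime


def context_key(ts):
    """Map a timestamp to a (period, day_type) context. Hour boundaries are assumed."""
    if 6 <= ts.hour < 12:
        period = "morning"
    elif 12 <= ts.hour < 18:
        period = "afternoon"
    elif 18 <= ts.hour < 22:
        period = "evening"
    else:
        period = "night"
    day_type = "weekend" if ts.weekday() >= 5 else "weekday"
    return period, day_type


def is_unusual(power_watts, ts, limits):
    """Compare a reading with the limit learned for its context."""
    return power_watts > limits[context_key(ts)]


# Hypothetical per-context limits, as might be learned from historical data.
limits = {
    ("morning", "weekday"): 1500.0,
    ("night", "weekend"): 50.0,
}
```

The same 800 W reading is then normal on a weekday morning but flagged during a weekend night, matching the point that an anomaly at night might not be an anomaly in the daytime.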
18. Real-Time Anomaly detection using COSMOS Data Analytics Architecture
[Diagram: CEP, Secor, Swift, Node-RED, PC/monitor, istrip, Refrigerator, sensor, Fan / Lighting, real-time warning messages, COSMOS Data Analytics]
19. Our Architecture Applies to Many IoT Use cases
• Healthcare
• Healthcare patient monitoring/alert/response
• Logistics
• Monitoring of sensitive goods
• Social Media
• Event detection when an unusually high number of posts is detected compared to normal behavior
• Insurance
• Driver behavior and location monitoring
• Transportation
• Connected vehicles, engine diagnostics, automated service scheduling
20. COSMOS
Funding: EU FP7 at level of 2PY x 3 years
Started: Sept 2013
Coordinator: ATOS
Technical partners: University of Surrey, IBM, NTUA, Siemens, ATOS
Use Case Partners: Hildebrand/Camden, EMT Madrid Bus Transport/Madrid Council, III Taiwan – Smart
Cities use cases
Project Vision: Enable ‘things’ to interact with each other based on shared experience, trust, reputation etc.
21. Thank you.
Any questions?
For more details, Email: adnan.akbar@surrey.ac.uk
Editor's Notes
I will start with a brief introduction to IoT. I will not go into detail, as I assume everyone here is quite familiar with the term.
You will find many definitions of IoT, but this one is my personal favourite. It has three main parts. The first is connecting everyday objects, where an object is any physical entity: it can be your shoe, your fridge or your bus.
The second is that every object has its own virtual representation, in which its properties are exposed using sensors. And last but not least is control of the physical world. In order to control the physical world, you need to understand the context and meaning of the data measured by these objects.
We have heard over the last 2 days that the number of connected devices is increasing, and so is the data generated by these devices. Data is increasing not only in size but also in complexity, and data is of no value until high-level knowledge is extracted from it in order to control the physical world.
So what really is the Internet of Things? It is made up of physical objects (“things”) that have chips and sensors embedded in them, allowing the sensing, capturing and communication of all types of data. These devices are then linked through both wired and wireless networks to the Internet. Advanced “things” have actuators embedded in them as well, giving them the capability to interact with other devices, computing systems and the external environment, including people.
IoT takes this one step further – Actuation
Quantity of data and quality of solution (actuation)
Sensors have existed for a long time; think how many sensors you need to send a rocket into space. But today this is not rocket science. What is happening is that sensors are becoming commodities, leading to adoption on a massive scale and making new applications possible, e.g. placing large numbers of sensors in agricultural fields to measure soil humidity and nutrient levels.
The advent of IoT has resulted in a trend towards more innovative and automated applications.
Data is increasing not only in size but in complexity as well, and data itself is of no value until high-level knowledge is extracted from it. When we talk about extracting high-level knowledge, there are three main questions surrounding it.
But in IoT, data is generated in the form of real-time events which form complex patterns, where each complex pattern represents a unique event. These unique events must be interpreted with minimal latency in order to apply them to decision making in the context of the current situation. The need for processing, analyzing and inferring from these complex patterns in near real time forms the basis of a research area called Complex Event Processing (CEP) [4]. The research area of CEP includes processing, analyzing and correlating event streams from different data sources to infer more complex events in near real time.
Kafka: In our architecture, we have used Apache Kafka as the message broker for events generated in real time. It is an open source tool for real-time publishing and subscribing of messages or data. It provides a scalable architecture for high-throughput data feeds with very low latency. What makes Kafka unique compared with other available systems is its persistent nature: it holds messages for a set amount of time in the form of a log (an ordered set of messages).
Secor is an open source tool which takes multiple messages from a Kafka topic, aggregates them, and stores them in object storage. Until now it only supported Amazon S3 as its object storage, but we added support for OpenStack Swift as well.
OpenStack Swift:
The OpenStack Object Store project, known as Swift, offers cloud storage software so that data can be stored and retrieved efficiently with a simple API. It's built for scale and optimized for durability, availability, and concurrency across the entire data set. Swift is ideal for storing unstructured IoT data that can grow without bound.
Parquet: Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
And Elasticsearch definitions.
Since it is IoT data, like traffic readings from Madrid, it will have JSON objects, timestamps, IDs, etc.: semi-structured data. In order to use Spark SQL, we want to store it in Parquet format.
Apache Spark: Apache Spark™ is a fast and general engine for large-scale data processing.
Spark SQL is Apache Spark's module for working with structured data.
Object storage (OpenStack Swift) as a long-term repository for IoT data
Scalable and relatively low cost
By adding metadata to describe what is contained in each object and metadata search we can access it efficiently
Databases are often overkill for what is needed by analytics
Secor works according to a defined policy. We can define a size-based policy, creating a new object when the size reaches 1 MB, or alternatively a time-based policy, i.e. creating a new object every 60 mins.
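The policy check itself is simple and can be sketched as a single predicate. This is not Secor's configuration or code: the function and parameter names are hypothetical, 1 MB is taken as 1,000,000 bytes, and combining the size and time limits with "or" is an assumption (the note presents them as alternatives).

```python
def should_roll(size_bytes, age_seconds, max_bytes=1_000_000, max_age_seconds=3600):
    """True when the buffered data should be closed and uploaded as a new object."""
    return size_bytes >= max_bytes or age_seconds >= max_age_seconds
```

A consumer loop would call this after each appended message and start a fresh buffer whenever it returns true.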
This is what the Swift objects look like. Swift actually has a flat namespace, but the object names contain slashes, which is basically how partitioned data looks in systems such as Hive, and this layout is supported by Spark SQL.
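As an illustration of partition-style object names in a flat namespace, the sketch below builds and parses names with `key=value` path segments in the Hive style. The table name, the `dt` partition key, and the `part-` file naming are assumptions; the actual names used in the project are not given in these notes.

```python
from datetime import date


def object_name(table, day, sequence):
    """Build a flat object name whose slashes mimic a partitioned directory layout."""
    return f"{table}/dt={day.isoformat()}/part-{sequence:05d}.parquet"


def partition_of(name):
    """Extract partition key/value pairs from the path segments of an object name."""
    return dict(seg.split("=", 1) for seg in name.split("/") if "=" in seg)


name = object_name("madridtraffic", date(2016, 6, 23), 7)
```

Because the partition value is encoded in the name, a reader can skip whole groups of objects by name prefix when a query filters on the partition key.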
We are using the Parquet data format, which is nice for IoT data: you can do column-based compression, and if you are interested in reading only specific columns, you can do that too. We extended Secor to support converting data to Parquet format. We also extended Secor to allow annotating objects with metadata. In Swift, when you create an object, you can also annotate it with metadata.
Can depend on other elements of context like weather etc.
Note: table is for one location only
The same architecture can be used to detect events, which in this case are good and bad traffic. And the same architecture can be used to detect anomalies, which might be an accident or congestion. For detecting anomalies, we use a single cluster, and if a new point is further than a certain distance from the cluster centre, we classify it as an anomaly.