Presentation I gave at the IBM Big Data Developers meetup group in San Jose, CA.
There is also a video available of this talk at:
https://www.youtube.com/watch?v=TSt49yPBmW0&t=7m59s
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @ IBM Big Data Developers Meetup
1. Big Data Developers - Virdata, Internet of Things #virdata
Big Data & IoT: lessons learned
Big Data Developers Meetup, San Jose, CA - June 5, 2014
#virdata | @nathan_gs
2.
Who is Technicolor?
Domains
● Media Services
● Entertainment Services
● Connected Home
● Emerging Ventures
● Technology & Innovations
Who We Are
Technicolor, a worldwide technology leader in the media and entertainment sector, is at the
forefront of digital innovation. Our world-class research and innovation laboratories and our
creative talent pool enable us to lead the market in delivering advanced services to content
creators and distributors. We also benefit from an extensive intellectual property portfolio
focused on imaging and sound technologies, supporting our thriving licensing business.
3.
Virdata – OUR CORE CLOUD SERVICES
● Device Monitoring
● Device Management
● Big Data Analytics
● Big Data Queries
● Application Monitoring
[Diagram: these services are exposed through the Virdata Cloud APIs, with connections over MQTT]
4.
Virdata - 2 COMPONENTS: A CLOUD & A LIBRARY
★ Elastic and Scalable cutting edge technologies
★ APIs for different types of information/data consumption
★ Cloud agnostic through self-built monitoring tools
★ Running on both public & private cloud infrastructure
★ Bi-directional messaging
★ High-performance broker architecture
★ Lightweight and portable library
★ Multiple programming languages
★ Supports multiple transport protocols
★ Available for all HW and OS
★ Supports any type of data in any format/syntax
★ Payload is compressed and encrypted
5.
Virdata - SERVICE ARCHITECTURE
● Millions of simultaneous persistent bi-directional connections
● Millions of messages per second
● Real-time Complex Event Processing
● Distributed Pub/Sub Messaging
● Historical Data Archiving
● Pre-computed Data
● In-Memory real-time Data
● REST API: Launch Queries - Launch Jobs
● INTEGRATION
● CUSTOMIZATION
● NOC, OPERATIONS, MGMT REPORTS, TRENDS
● ANALYTICS
6.
Virdata - VERTICAL INDUSTRIES
AUTOMOTIVE
● Fleet Management
● Insurance
● Emergency Services
UTILITIES
● Remote Meter Management
● Monitor Energy Consumption
● Optimize Subscription Plan
CONSUMER ELECTRONICS
● Monitoring & Management
● Upsell Services
● Enhanced End User Experience
CUSTOMER CARE
● Monitor Device & Application
● One Button Care
● Call Avoidance
RETAIL
● Geo-location Based Adverts
● Heat Mapping
● Individualized Offering
HEALTH
● Promote Patient Independence
● Time-Series Analysis
● Pro-active Responses
7.
Live Demo
Contact us for a live demo at info@virdata.com or virdata.com.
8.
Connected “Things”
9.
Huge variety in devices and OSs.
10.
Virdata Client Libraries
12.
Northbound and Southbound API
Northbound API = Cloud API
● Messaging API
○ REST
○ PUB/SUB
○ MQTT
○ JMS
● Data Processing API
○ SQL
○ JobAPI
○ Query/REST
Southbound API provided at the device level
13.
Integration of Virdata into IBM BlueMix
Objectives
• Show the strengths of the Virdata Internet of Things platform
  • Scalability to support millions of connected devices
  • Real-time and historical data processing
  • Cloud APIs powering new data-driven services across vertical markets
• Demonstrate the power of the IBM BlueMix solution
  • Rapid development and deployment of new applications
  • Platform as a Service marketplace
• Highlight the value of combining both
  • Internet of Things platform as a service
Use-case
• Virdata provides real-time car data
• App acts upon car trouble codes
• Invokes manufacturer analytics service
• Initiates recommended actions, e.g. through the Maximo workflow service
• Schedules car dealer appointment
• Informs the car driver
14.
Messaging & Broker
15.
Messaging Architecture: Device to Platform
[Diagram: Protocol Adapter (×3), Kafka (×4), Storm (×3) with State, Data Processing API, API]
16.
Messaging Architecture: Device to Device(s)
[Diagram: Protocol Adapter (×3), Kafka (×4), Storm (×3) with State, Data Processing API, API]
17.
Messaging Architecture: Large Fan Out
[Diagram: Protocol Adapter (×3), Kafka (×4), Storm (×3) with State, Data Processing API, API]
18.
Horizontally scalable
… and elastic as well.
Messaging
19.
Persistent connections
Broker
20.
Real-time bidirectional communication
21.
MQTT
Pub/Sub
Protocol Adaptor
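To make the pub/sub idea concrete, here is a minimal in-memory sketch of MQTT-style topic routing, assuming the wildcard rules of the MQTT 3.1.1 specification ('+' matches one topic level, '#' matches all remaining levels). The `InMemoryBroker` class and the `cars/<id>/dtc` topic layout are hypothetical illustrations, not Virdata's protocol adaptor.

```python
from typing import Callable, List, Tuple

Callback = Callable[[str, bytes], None]


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a subscription filter.

    '+' matches exactly one topic level; '#' matches the parent level and
    any number of child levels, and is only valid as the last level.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    # Filters starting with a wildcard do not match topics starting with '$'.
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(filter_levels):
        if level == "#":
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class InMemoryBroker:
    """Toy pub/sub broker: fans each message out to every matching subscriber."""

    def __init__(self) -> None:
        self.subscriptions: List[Tuple[str, Callback]] = []

    def subscribe(self, topic_filter: str, callback: Callback) -> None:
        self.subscriptions.append((topic_filter, callback))

    def publish(self, topic: str, payload: bytes) -> int:
        """Deliver payload to all matching subscribers; return how many got it."""
        delivered = 0
        for topic_filter, callback in self.subscriptions:
            if topic_matches(topic_filter, topic):
                callback(topic, payload)
                delivered += 1
        return delivered


# Hypothetical topic layout: cars/<car id>/dtc carries diagnostic trouble codes.
received: List[Tuple[str, bytes]] = []
broker = InMemoryBroker()
broker.subscribe("cars/+/dtc", lambda t, p: received.append((t, p)))
broker.subscribe("cars/#", lambda t, p: None)
broker.subscribe("meters/+/reading", lambda t, p: None)
delivered = broker.publish("cars/bmw-1/dtc", b"P0300")  # two filters match
```

The publisher never addresses a subscriber directly; the broker decides who receives a message purely from the topic, which is what lets one device message fan out to many consumers.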
22.
MQTT: QoS levels
QoS 0: best effort
QoS 1: at least once
QoS 2: exactly once
Protocol Adaptor
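The difference between the QoS levels shows up when packets or acknowledgements get lost. The simulation below is a simplified sketch: real QoS 2 uses a four-step PUBLISH / PUBREC / PUBREL / PUBCOMP handshake, which is modelled here only as a set of already-seen packet ids on the receiver. The function name and the two loss scenarios are assumptions made for illustration.

```python
from typing import List, Tuple

# (PUBLISH reached the receiver, acknowledgement reached the sender)
Attempt = Tuple[bool, bool]


def times_delivered(qos: int, attempts: List[Attempt]) -> int:
    """Count how often the receiving application sees one message.

    The sender retransmits (with the same packet id) until it receives an
    acknowledgement. QoS 0 never retransmits.
    """
    packet_id = 1
    seen_packet_ids = set()  # simplified stand-in for the QoS 2 handshake state
    delivered = 0
    for publish_arrived, ack_arrived in attempts:
        if publish_arrived:
            if qos < 2:
                delivered += 1  # QoS 0/1: every copy is handed to the application
            elif packet_id not in seen_packet_ids:
                seen_packet_ids.add(packet_id)
                delivered += 1  # QoS 2: duplicates are recognised and dropped
        if qos == 0:
            break  # best effort: no acknowledgement, no retry
        if publish_arrived and ack_arrived:
            break  # sender was acknowledged, stop retransmitting
    return delivered


# The first PUBLISH arrives but its acknowledgement is lost, so the sender retries.
ack_lost = [(True, False), (True, True)]
# The first PUBLISH itself is lost.
publish_lost = [(False, False), (True, True)]

# ack_lost:     QoS 0 -> 1, QoS 1 -> 2 (a duplicate), QoS 2 -> 1
# publish_lost: QoS 0 -> 0 (message gone), QoS 1 -> 1, QoS 2 -> 1
```

QoS 1 is therefore only safe when the consumer is idempotent; QoS 2 pays for its guarantee with extra round trips and per-message state on both sides.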
25.
Message passing
Storm
26.
Stream/Message partitioning, as well as grouping.
Storm
27.
Storm
[Diagram: Storm cluster with Nimbus and Zookeeper; three Worker Nodes, each running a Supervisor and three Executors]
28.
Storm
Tuple
Stream
[Diagram: a TUPLE is a list of fields (Field 1 | Field 2 | Field 3 | Field 4 | Field 5); a STREAM is a sequence of tuples (TUPLE TUPLE TUPLE TUPLE ...)]
29.
Storm
Spout
Bolt
[Diagram: a SPOUT emits streams of tuples (T) to BOLTs; BOLTs emit further streams to other BOLTs, and the last BOLTs feed the API]
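A single-process analogue may help map the vocabulary: below, a spout turns raw input into tuples, one bolt filters the stream and a final bolt aggregates it. This is only a sketch using Python generators with a hypothetical car-telemetry schema; in Storm itself spouts and bolts are classes wired into a topology and executed in parallel across the worker nodes.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class CarEvent(NamedTuple):
    """A tuple: a named list of fields (hypothetical car telemetry)."""
    car: str
    code: str
    speed: int


def car_spout(raw_lines: Iterable[str]) -> Iterator[CarEvent]:
    """Spout: the source of a stream, turning raw input into tuples."""
    for line in raw_lines:
        car, code, speed = line.split(",")
        yield CarEvent(car, code, int(speed))


def trouble_code_bolt(stream: Iterable[CarEvent]) -> Iterator[CarEvent]:
    """Bolt: consumes a stream and emits a new one (here, a filter)."""
    for event in stream:
        if event.code != "OK":
            yield event


def count_bolt(stream: Iterable[CarEvent]) -> Counter:
    """Terminal bolt: aggregates tuples, here trouble codes per car."""
    return Counter(event.car for event in stream)


raw = ["bmw-1,OK,80", "bmw-1,P0300,75", "aston-1,P0171,60", "bmw-1,P0420,70"]
counts = count_bolt(trouble_code_bolt(car_spout(raw)))
# counts: bmw-1 -> 2, aston-1 -> 1
```

Because each stage only sees a stream of tuples, stages can be re-wired or parallelised independently, which is the property Storm exploits when it spreads bolts over executors.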
30.
Storm
Grouping
[Diagram: a Spout (S) sends tuples to several Bolt tasks (B); a GROUPING on each connection decides which task receives each tuple]
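Fields grouping is what makes stateful bolts possible: every tuple with the same value in the grouping field goes to the same bolt task. The sketch below contrasts it with shuffle grouping; choosing a task with `crc32(key) % num_tasks` is an assumption for illustration, not Storm's exact hashing.

```python
import random
import zlib
from typing import List, Sequence


def fields_grouping(key: str, num_tasks: int) -> int:
    """Choose a bolt task from the grouping field: equal keys, same task."""
    # crc32 is stable across processes, unlike Python's salted hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_tasks


def shuffle_grouping(rng: random.Random, num_tasks: int) -> int:
    """Choose a random bolt task: even load, but no key affinity."""
    return rng.randrange(num_tasks)


def route_by_field(keys: Sequence[str], num_tasks: int) -> List[List[str]]:
    """Distribute tuples, identified here by their grouping key, over tasks."""
    tasks: List[List[str]] = [[] for _ in range(num_tasks)]
    for key in keys:
        tasks[fields_grouping(key, num_tasks)].append(key)
    return tasks


stream = ["bmw-1", "aston-1", "bmw-1", "bmw-2", "bmw-1"]
tasks = route_by_field(stream, num_tasks=3)
# All three "bmw-1" tuples end up in one task, which can therefore keep
# per-car state (such as a running count) locally.
```

Shuffle grouping suits stateless work such as parsing; fields grouping is needed whenever a bolt aggregates per key, at the price of possible hot spots when one key dominates.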
32.
Events used to manipulate the master data.
Events: Before
33.
Today, events are the master data.
Events: After
34.
Let’s store everything.
Data System
35.
Data is Immutable.
Data System
36.
Data is Time Based.
Data System
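These ideas (events as master data, immutability, time) fit in a few lines: an append-only log of timestamped facts, from which the current state, or the state at any past moment, is derived by replaying the events. A minimal sketch; the `EventLog` class, device names and firmware versions are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """An immutable, timestamped fact; it is never updated or deleted."""
    timestamp: datetime
    device: str
    firmware: str


class EventLog:
    """Append-only master data set."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def events(self) -> Tuple[Event, ...]:
        # Return an immutable snapshot so callers cannot rewrite history.
        return tuple(self._events)


def firmware_as_of(log: EventLog, moment: datetime) -> Dict[str, str]:
    """Derive device state at any moment by replaying events in time order."""
    state: Dict[str, str] = {}
    for event in sorted(log.events(), key=lambda e: e.timestamp):
        if event.timestamp <= moment:
            state[event.device] = event.firmware  # later facts win
    return state


log = EventLog()
log.append(Event(datetime(2014, 1, 10), "gateway-7", "1.0"))
# An upgrade is a new fact, not an UPDATE of the old one.
log.append(Event(datetime(2014, 5, 2), "gateway-7", "1.1"))
```

With an update-in-place table the 1.0 record would be gone; with the log, "which firmware was running in March?" is just another replay.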
37.
The data you query is often transformed, aggregated, ...
Rarely used in its original form.
Query
38.
Query = function(all data)
Query
39.
Functional computation, based on immutable inputs, is idempotent.
Batch Layer
40.
Query: Number of cars living in each city
Car          | Location | Timestamp
BMW 1        | Antwerp  | 2008-10-11
Aston Martin | Cologne  | 2010-01-23
BMW 2        | Antwerp  | 2012-09-12
BMW 1        | Cologne  | 2014-04-29
Location | Count
Antwerp  | 1
Cologne  | 2
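The query on this slide is small enough to write as a pure function of all data. A minimal sketch, reading "living in" as each car's most recent location (which is what the expected counts imply). Because the function only reads immutable input, running it twice gives the same view.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Tuple

# Master data: (car, location, timestamp), copied from the slide.
ALL_DATA: List[Tuple[str, str, date]] = [
    ("BMW 1", "Antwerp", date(2008, 10, 11)),
    ("Aston Martin", "Cologne", date(2010, 1, 23)),
    ("BMW 2", "Antwerp", date(2012, 9, 12)),
    ("BMW 1", "Cologne", date(2014, 4, 29)),
]


def cars_per_city(all_data: List[Tuple[str, str, date]]) -> Dict[str, int]:
    """Query = function(all data): a pure function of the immutable input."""
    latest: Dict[str, Tuple[date, str]] = {}
    for car, location, ts in all_data:
        # Keep only each car's most recent location.
        if car not in latest or ts > latest[car][0]:
            latest[car] = (ts, location)
    return dict(Counter(location for _, location in latest.values()))


view = cars_per_city(ALL_DATA)
# Antwerp -> 1 (BMW 2), Cologne -> 2 (Aston Martin, BMW 1)
```

BMW 1 appears twice in the master data, yet is counted only once, in Cologne: the raw events are kept, and the "current state" is something the query derives.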
41.
Query
[Diagram: All Data → Precomputed View → Query]
42.
Layered Architecture
Batch Layer
Speed Layer
Serving Layer
43.
Layered Architecture
[Diagram: Incoming Data, Spark, C* (Cassandra), Query]
45.
Batch Layer
[Diagram: Incoming Data, Spark, C*]
46.
Batch Layer
The batch layer can calculate anything, given enough time...
Unrestrained computation.
47.
Keep the data in its original format.
The batch layer stores the data normalized; the generated views are often, if not always, denormalized.
Batch Layer
48.
Horizontally scalable.
Batch Layer
49.
Stores a master copy of the data set
Batch Layer
… append only
50.
High Latency.
Let’s for now pretend the update latency doesn’t matter.
Batch Layer
52.
In-memory storage
Spark
53.
Advanced DAG execution engine
Supports cyclic data flow and in-memory computing.
Spark
54.
Multilanguage support, interactive shells
Scala, Java & Python
Spark
55.
Write programs in terms of transformations on distributed datasets.
RDDs (Resilient Distributed Datasets) are collections of objects, stored in RAM or on disk.
They are built through parallel transformations and are automatically rebuilt on failure.
Spark
56.
map
Spark: API
reduce
57.
Spark: API
map, filter, groupBy, sort, union, join, leftOuterJoin, rightOuterJoin, count, fold, reduceByKey, groupByKey, reduce, cogroup, cross, zip, sample, take, first, partitionBy, mapWith, pipe, save, ...
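Most of these operations have simple local counterparts. The sketch below uses plain Python to show the semantics of `map`, `filter`, `reduce`, `reduceByKey` and `groupByKey` on hypothetical (car, km) pairs; it does not use Spark. In PySpark the per-key total would be written along the lines of `sc.parallelize(trips).reduceByKey(add).collect()`, with the work spread over partitions.

```python
from functools import reduce
from operator import add
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def reduce_by_key(pairs: Iterable[Tuple[K, V]],
                  func: Callable[[V, V], V]) -> Dict[K, V]:
    """Local analogue of RDD.reduceByKey: fold all values of a key with func."""
    result: Dict[K, V] = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


def group_by_key(pairs: Iterable[Tuple[K, V]]) -> Dict[K, List[V]]:
    """Local analogue of RDD.groupByKey: collect all values of a key."""
    result: Dict[K, List[V]] = {}
    for key, value in pairs:
        result.setdefault(key, []).append(value)
    return result


# Hypothetical (car, km driven per trip) pairs.
trips = [("bmw-1", 80), ("aston-1", 60), ("bmw-1", 75), ("bmw-1", 70)]

distances = list(map(lambda trip: trip[1], trips))              # map
long_trips = list(filter(lambda trip: trip[1] >= 75, trips))     # filter
total_km = reduce(add, distances)                                # reduce: 285
km_per_car = reduce_by_key(trips, add)    # bmw-1 -> 225, aston-1 -> 60
trips_per_car = group_by_key(trips)       # bmw-1 -> [80, 75, 70], aston-1 -> [60]
```

On a real cluster `reduceByKey` is usually preferred over `groupByKey` followed by a sum, because it combines values within each partition before shuffling them across the network.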
58.
Spark Ecosystem
[Diagram: Spark, Spark Streaming, Shark / Spark SQL, GraphX, MLlib, Mahout, BlinkDB, Velox and MR v1, running on HDFS / Tachyon and Mesos / YARN]
59.
Every iteration produces the views from scratch.
Batch Layer
60.
Batch View Databases
We need a (read-only) database to store those views.
61.
Example: the automotive market
● Real Time Tracking
● Engine Block Performance
● Fleet Management
● 3rd Party API integration
● Integration with Informix
● Big Data Visualization
● 3rd Party Application Creation
● BlueMix Platform as a Service
● Process Integrations
[Diagram columns: The Open Source Route | Enterprise Integration | Bringing Analytics to the Data]
62.
Batch Layer
[Diagram: a timeline up to Now; older data is absorbed into Batch Views, while the most recent few hours of data are not yet absorbed]
We are not done yet…
64.
Speed Layer
[Diagram: Incoming Data, Spark and two C* stores]
65.
Stream processing.
Speed Layer
66.
Continuous computation.
Speed Layer
67.
Storing a limited window of data.
Compensating for the last few hours of data.
Speed Layer
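A speed layer only has to cover the data the batch layer has not absorbed yet, so it can keep a bounded window. A minimal sketch of an incrementally maintained, time-windowed count; the `WindowedCounter` class, the 3-hour window and the in-order arrival of events are assumptions.

```python
from collections import Counter, deque
from datetime import datetime, timedelta
from typing import Deque, Tuple


class WindowedCounter:
    """Per-key event counts over a sliding time window, updated incrementally."""

    def __init__(self, window: timedelta) -> None:
        self.window = window
        self.events: Deque[Tuple[datetime, str]] = deque()
        self.counts: Counter = Counter()

    def add(self, ts: datetime, key: str) -> None:
        # Assumes events arrive in timestamp order.
        self.events.append((ts, key))
        self.counts[key] += 1
        self.expire(ts)

    def expire(self, now: datetime) -> None:
        """Forget events older than the window; the batch layer covers those."""
        cutoff = now - self.window
        while self.events and self.events[0][0] <= cutoff:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def count(self, key: str) -> int:
        return self.counts[key]


speed = WindowedCounter(window=timedelta(hours=3))
speed.add(datetime(2014, 6, 5, 0, 0), "bmw-1")
speed.add(datetime(2014, 6, 5, 2, 0), "bmw-1")
speed.add(datetime(2014, 6, 5, 4, 0), "bmw-1")  # the 00:00 event expires
```

Memory stays proportional to the window, not to the full history, which is what keeps the real-time path cheap.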
68.
All the complexity is isolated in the Speed Layer.
If anything goes wrong there, it is auto-corrected once the batch layer absorbs the data and the corresponding real-time views are discarded.
Speed Layer
69.
Under a network partition, you have a choice between:
● Availability
○ Queries are eventually consistent
● Consistency
○ Queries are consistent
CAP
[Diagram: CAP triangle with Consistency, Availability and Partition Tolerance]
70.
Eventual accuracy
Some algorithms are hard to implement in real time.
For those cases, the speed layer can estimate the results; the batch layer later replaces the estimates with exact values.
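The slides do not say which estimation techniques Virdata used. One common choice for approximate per-key counts is a Count-Min sketch, shown below as an illustrative assumption: it uses fixed memory and never underestimates a count, although hash collisions can make it overestimate. Deriving the row hashes from SHA-256 with a row salt is also an assumption made for simplicity.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Approximate per-key counts in fixed memory (never undercounts)."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, key: str, row: int) -> int:
        # One hash per row, derived by salting the key with the row number.
        digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key: str) -> int:
        # Every row overcounts by whatever collided with the key,
        # so the smallest cell is the tightest upper bound.
        return min(self.table[row][self._index(key, row)] for row in range(self.depth))


sketch = CountMinSketch(width=64, depth=4)
for i in range(1000):
    sketch.add(f"car-{i % 50}")  # each of the 50 keys is added 20 times
estimate = sketch.estimate("car-7")  # at least 20; more only if every row collides
```

Width controls the size of the error and depth the probability of a bad estimate; once the batch layer has recomputed the exact counts, the sketch for that period can simply be thrown away.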
78.
Serving Layer
[Diagram: Incoming Data, Spark, C* stores, Query]
79.
Serving Layer
Random reads.
80.
This layer queries the batch & real-time views and merges the results.
Serving Layer
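How the views are merged depends on the query. A minimal sketch, assuming the batch view covers everything up to its last run and the real-time view only what arrived since: additive metrics such as counts are summed, while "latest value" metrics let the real-time view win. The function names and view contents are hypothetical.

```python
from collections import Counter
from typing import Dict, Mapping


def merge_counts(batch_view: Mapping[str, int],
                 realtime_view: Mapping[str, int]) -> Counter:
    """Additive metric: total = batch part + real-time part."""
    merged = Counter(batch_view)
    merged.update(realtime_view)  # Counter.update adds rather than replaces
    return merged


def merge_latest(batch_view: Mapping[str, str],
                 realtime_view: Mapping[str, str]) -> Dict[str, str]:
    """'Latest value' metric: the newer real-time value wins."""
    merged = dict(batch_view)
    merged.update(realtime_view)
    return merged


# Hypothetical views for a "messages per car" and a "current location" query.
message_counts = merge_counts({"bmw-1": 120, "aston-1": 40}, {"bmw-1": 3, "bmw-2": 1})
locations = merge_latest({"bmw-1": "Antwerp", "aston-1": "Cologne"}, {"bmw-1": "Cologne"})
```

Metrics that are neither additive nor "latest wins", such as distinct counts, need a mergeable representation (for example a sketch) rather than plain numbers.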
81.
Lambda Architecture
82.
Lambda Architecture
The Lambda Architecture can discard any view, batch
and real-time, and just recreate everything from the
master data.
83.
Mistakes are corrected via recomputation.
Write bad data? Remove the data & recompute.
Bug in view generation? Just recompute the view.
Lambda Architecture
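Recomputation is easy to reason about because views are pure functions of the master data. A minimal sketch with a hypothetical (car, km) schema: a corrupt record leaks into the view, is removed from the master data set, and the view is rebuilt from scratch. The helper names are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, int]  # hypothetical (car, km driven) record


def km_per_car(master: List[Record]) -> Dict[str, int]:
    """View generator: a pure function of the master data set."""
    view: Dict[str, int] = {}
    for car, km in master:
        view[car] = view.get(car, 0) + km
    return view


def remove_and_recompute(master: List[Record],
                         is_bad: Callable[[Record], bool],
                         view_fn: Callable[[List[Record]], Dict[str, int]],
                         ) -> Tuple[List[Record], Dict[str, int]]:
    """Drop bad records from the master data, then rebuild the view from scratch."""
    cleaned = [record for record in master if not is_bad(record)]
    return cleaned, view_fn(cleaned)


master = [("bmw-1", 80), ("bmw-1", -5000), ("aston-1", 60)]  # one corrupt reading
broken_view = km_per_car(master)  # bmw-1 -> -4920: the bad write leaks into the view
master, fixed_view = remove_and_recompute(master, lambda r: r[1] < 0, km_per_car)
```

A bug in the view generator is handled the same way: fix the function and call it again on the unchanged master data.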
84.
Using a new schema?
No problem, keep your data, keep your input F, change your output.
Lambda Architecture
85.
Data storage is highly optimized.
Lambda Architecture
87.
Cloud Agnostic
Control Plane
88.
IBM SoftLayer
Experiences & Observations
1. Smooth migration from SCE 2.2 to SoftLayer in one month's time, including:
■ Development of a SoftLayer-specific expansion of the FOG abstraction layer to accommodate Virdata’s DevOps tooling (CHEF)
■ Complete on-boarding of the Virdata Platform
■ Complete launch of simulation and emulation clusters
■ Very exhaustive and complete API
2. Very constructive and professional support throughout the complete on-boarding process
3. Availability of bare metal seen as a differentiator
89.
Cluster Management & Orchestration
Control Plane
RGOSSIP
90.
Monitoring and Logging
Control Plane
93.
Questions?
@virdata_iot | #virdata
@nathan_gs
94.
Acknowledgements
I would like to thank Nathan Marz for writing a very insightful book, from which the idea of the Lambda Architecture comes.
Lambda: Big Data - Nathan Marz, published by Manning
Lambda, Storm: A real-time architecture using Hadoop & Storm - Nathan Bijnens & Geert Van Landeghem at FOSDEM 2013
Spark: Apache Spark website
Spark: Apache Spark - the light at the end of the tunnel? - Michael Hausenblas, MapR at Data Science Day Berlin 2014
95.
Thank you
virdata.com | +1 (937) 569 4220 | info@virdata.com
#virdata | @virdata_iot
@nathan_gs | nathan.bijnens@virdata.com