This document gives an overview of how to apply big data analytics and machine learning to real-time processing. It discusses how machine learning and big data analytics are used to analyze historical data and build models. These models can then be used in real-time processing, without being rebuilt, to take automated actions on incoming data. The agenda covers machine learning, analysis of historical data, real-time processing, and a live demo.
How to Apply Machine Learning with R, H2O, Apache Spark MLlib or PMML to Real Time Streaming Analytics
1. HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING
Kai Wähner
kwaehner@tibco.com
@KaiWaehner
www.kai-waehner.de
LinkedIn / Xing: Please connect!
5. Key Take-Aways
Insights are hidden in Historical Data on Big Data Platforms
Machine Learning and Big Data Analytics find these Insights by building Analytics Models
Event Processing uses these Models (without Rebuilding) to take Action in Real Time
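The third take-away can be illustrated with a minimal sketch. It assumes the simplest possible "model" (a mean and standard deviation learned from historical readings) and a hypothetical temperature stream; the names `AnomalyModel`, `train`, and `process_stream`, the sample values, and the threshold of 3 standard deviations are illustrative assumptions, not taken from the talk.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class AnomalyModel:
    """Parameters learned once from historical data (hypothetical model)."""
    mu: float
    sigma: float
    threshold: float = 3.0  # assumed cut-off, in standard deviations

    def score(self, value: float) -> float:
        # Distance from the historical mean, in standard deviations.
        return abs(value - self.mu) / self.sigma

    def is_anomaly(self, value: float) -> bool:
        return self.score(value) > self.threshold


def train(history: list[float]) -> AnomalyModel:
    # Batch step: would run offline on the big data platform.
    return AnomalyModel(mu=mean(history), sigma=stdev(history))


def process_stream(model: AnomalyModel, events):
    # Real-time step: the model is only applied, never refitted.
    for value in events:
        if model.is_anomaly(value):
            yield ("ALERT", value)


# Made-up historical sensor readings, for illustration only.
historical_temperatures = [70.1, 69.8, 70.4, 70.0, 69.9, 70.2, 70.3, 69.7]
model = train(historical_temperatures)
alerts = list(process_stream(model, [70.0, 70.5, 75.2, 69.9]))
print(alerts)
```

The point of the split is that `train` and `process_stream` can run in different systems and on different schedules; only the fitted parameters cross the boundary.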
52. Apache Spark – Focus on Analytics
http://aptuz.com/blog/is-apache-spark-going-to-replace-hadoop/
http://fortune.com/2016/09/09/cloudera-spark-mapreduce/
http://www.ebaytechblog.com/2016/05/28/using-spark-to-ignite-data-analytics/
http://www.forbes.com/sites/paulmiller/2016/06/15/ibm-backs-apache-spark-for-big-data-analytics/
“[IBM’s initiatives] include:
• deepening the integration between Apache Spark and existing IBM products like the Watson Health Cloud;
• open sourcing IBM’s existing SystemML machine learning technology;
64. What Streaming Alternative do you need?
Visual IDE (Dev, Test, Debug)
Simulation (Feed Testing, Test Generation)
Live UI (monitoring, proactive interaction)
Maturity (24/7 support, consulting)
Integration (out-of-the-box: ESB, MDM, etc.)
Library (Java, .NET, Python)
Query Language (often similar to SQL)
Scalability (horizontal and vertical, fail over)
Connectivity (technologies, markets, products)
Operators (Filter, Sort, Aggregate)
[Figure: Streaming Concepts, Streaming Frameworks, and Streaming Products placed on a Time to Market axis running from Slow to Fast]
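To make the "Operators (Filter, Sort, Aggregate)" and continuous-query ideas on this slide concrete, here is a minimal sketch of a filter followed by a sliding-window aggregate. It is written in plain Python rather than in any particular streaming product or framework; the class name `SlidingAverage` and its parameters are hypothetical.

```python
from collections import deque


class SlidingAverage:
    """Continuous query: average of the last `size` values that pass a filter."""

    def __init__(self, size: int, min_value: float):
        # A bounded deque evicts the oldest entry automatically.
        self.window = deque(maxlen=size)
        self.min_value = min_value

    def on_event(self, value: float):
        # Filter operator: ignore readings below the minimum.
        if value < self.min_value:
            return None
        # Aggregate operator: mean over the current window.
        self.window.append(value)
        return sum(self.window) / len(self.window)


query = SlidingAverage(size=3, min_value=0.0)
results = [query.on_event(v) for v in [10.0, -1.0, 20.0, 30.0, 40.0]]
print(results)
```

Unlike a database query, the query object stays alive and emits an updated result for every event that arrives.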
69. Real Time Closed Loop: Understand – Anticipate – Act
Streaming Analytics to operationalize insights and patterns in real time without rebuilding the models
[Figure: Stream Processing connected to analytics models from H2O, Open Source R, TERR, Spark MLlib, MATLAB, SAS, PMML]
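As a rough illustration of using an exported model "without rebuilding", the sketch below reads a linear regression from a PMML-style document and applies it to incoming events. The XML is a hand-written, simplified fragment (an assumption: real PMML exports also carry a namespace, a Header, a DataDictionary, and a MiningSchema), the field names and coefficients are made up, and a production setup would more likely rely on a full PMML evaluator than on this hand-rolled parser.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified PMML-style fragment (assumption, not a real export).
PMML_DOC = """
<PMML version="4.2">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="temperature" coefficient="0.2"/>
      <NumericPredictor name="vibration" coefficient="3.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def load_linear_model(pmml_text: str):
    """Read intercept and coefficients from the first RegressionTable."""
    table = ET.fromstring(pmml_text.strip()).find(".//RegressionTable")
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("NumericPredictor")
    }
    return intercept, coefficients


def score(model, event: dict) -> float:
    # Linear model: intercept + sum(coefficient * field value).
    intercept, coefficients = model
    return intercept + sum(c * event[name] for name, c in coefficients.items())


model = load_linear_model(PMML_DOC)  # loaded once, at start-up
events = [
    {"temperature": 10.0, "vibration": 0.5},
    {"temperature": 20.0, "vibration": 1.0},
]
scores = [score(model, e) for e in events]
print(scores)
```

The training tool and the stream processor only share the model file, which is what lets the model be built in one environment and scored in another.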
79. Predictive Maintenance
[Figure: big data architecture diagram. Labels: Operational Analytics (Operations, Live UI, Live Monitoring, Alerts, Manual action and escalation); Streaming Analytics and Action (Stream Processing, Aggregate, Rules, Analytics, Correlate, Continuous query processing); inputs (Sensor Data, Transactions, Message Bus, Machine Data, Social Data); Historical Analysis (Data Sheets, BI, Data Scientists, Cleansed Data, History, Data Discovery, Analytics, Spark, Big Data); integration (Enterprise Service Bus, SOA, Integration Bus, API, Event Server, Data Storage, Internal Data: ERP, MDM, DB, WMS)]
Machine Data (Sensors, Weather Data, …)
Find Insights (Sensor Behaviour, Hardware Issues, …)
Take Action (Stop Machine, Send Mechanic, …)
ERP System (Transaction History, Production Volume)
80. Complete Big Data Architecture
[Figure: big data architecture diagram with the same labels as on slide 79: Operational Analytics, Streaming Analytics and Action, Historical Analysis, Enterprise Service Bus, Event Server, Spark, Big Data]
84. Put model into Action
Upon event trigger, populate Spotfire RCA template; email responsible engineer
85. StreamBase – from Big Data to Fast Data
1. Rules / models pushed from Spotfire
2. Data streams into StreamBase
3. Data evaluated in real-time
4. Spotfire RCA on trigger
Other notifications available
Live view on streaming data
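The four steps on this slide can be sketched generically as follows. This is not the Spotfire or StreamBase API: `EventServer`, `push_rule`, `on_event`, the rule itself, and the event fields are hypothetical stand-ins that only mirror the flow (rule pushed, data streams in, data evaluated, notification triggered).

```python
from typing import Callable

Rule = Callable[[dict], bool]


class EventServer:
    """Hypothetical stand-in for a stream processing engine."""

    def __init__(self, notify: Callable[[str, dict], None]):
        self.rules: dict[str, Rule] = {}
        self.notify = notify

    def push_rule(self, name: str, rule: Rule) -> None:
        # Step 1: rules or models arrive from the analytics tool.
        self.rules[name] = rule

    def on_event(self, event: dict) -> None:
        # Steps 2 and 3: each incoming event is evaluated immediately.
        for name, rule in self.rules.items():
            if rule(event):
                # Step 4: trigger the follow-up (e.g. root cause analysis, email).
                self.notify(name, event)


sent = []  # collects notifications instead of sending real emails
server = EventServer(notify=lambda name, event: sent.append((name, event["machine"])))
server.push_rule("overheating", lambda e: e["temperature"] > 90)

for event in [
    {"machine": "M1", "temperature": 72},
    {"machine": "M2", "temperature": 95},
]:
    server.on_event(event)

print(sent)
```

Because rules are stored by name, a rule can be replaced while events keep flowing, which matches the idea of pushing updated rules or models into a running stream.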
88. Compare Live Data with Historical Data to make Human Decision
Responsible engineer clicks URL to launch Spotfire Root Cause Analysis; diagnose issue
90. Key Take-Aways
Insights are hidden in Historical Data on Big Data Platforms
Machine Learning and Big Data Analytics find these Insights by building Analytics Models
Event Processing uses these Models (without Rebuilding) to take Action in Real Time