© 2015 IBM Corporation
IBM Analytics
Spark Analytics with Informix
Pradeep Natarajan, IBM
@pradeepnatara
Agenda
• Context: Informix / Spark high-level value propositions
• IoT use cases
• Challenges
• Prototype and implementation
• What’s next?
Informix to Spark
Context
Informix for Internet of Things
• Database optimized for environments such as:
  – Low or no database administration
  – Embedded: gateways, routers
  – Very high transaction rates and uptime requirements
• Widely deployed in the retail sector, where its low administration overhead makes it essential for in-store deployments
• Informix supports key Internet of Things solutions:
  – Native support for time-based data: TimeSeries
  – Small footprint
  – Low administration requirements
Apache Spark
• Speed
• Ease of use, unified engine
• Sophisticated analytics
Apache Spark
• Cluster computing framework
• Fast and general engine for large-scale data processing
• In-memory computing
Apache Spark Streaming
• Extends Spark for big data stream processing
[Diagram: raw data stream → distributed stream processing system → processed data]
• Scaling, low latency, recovery
• Integrates batch and interactive processing
Informix to Spark
Use cases
Real-Time Operational Database
Streaming Analytics with Spark
• Applications that drive the business have positioned relational databases at the center of operations.
• To continue their success, businesses need streaming analytics to gain real-time insight into their operations and take action to optimize outcomes.
• Infrequent batch analytics on “stale” data is losing its competitive edge. Demand for real-time analytics is increasing among businesses that want to stay in the lead.
SENSE -> ANALYZE -> ACT
• As data ages, business value diminishes.
• Sense → Analyze → Act in seconds or milliseconds, not days or weeks
[Figure: the Sense → Analyze → Act cycle for batch (days) versus real-time (seconds)]
Increasing demand for real-time analytics
Streaming analytics service sample scenarios
Continuously streaming data from IBM Informix to an analytics platform
• Connected Vehicles, driving behavior matching: detect anomalous driving behavior that causes higher fuel consumption
• Energy & Utilities, power consumption: how is power consumption correlated between houses A, B, C, and D?
• Health Care, heart attack prevention: detect abnormal patterns in ECG series
• Finance, market manipulation detection: detect anomalies by price change rate in a time window (steady price change versus vibration in a short period)
• Cloud Service Operation, server health diagnosis: detect system resource peaks and valleys and correlate them with workload information
• …
Real-time analytics - Industry
• Information technology – systems and network monitoring
• IoT – sensor data analytics and processing
• Financial transactions – authentication, fraud detection, validation
• Inventory control – consumer trends and demands
• Website analytics – ad targeting
• Many others…
Real-time analytics - applications
• Data analyzed as it arrives – data in motion
• Simple: monitoring, alerts/reports, statistics
• Complex: predictive analytics (regression, machine learning, etc.), k-means clustering (classification, anomaly detection)
• Many applications also store events and combine them with later batch processing.
• Immediate actions possible.
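A minimal sketch of the "simple" end of this spectrum: each reading is checked against the mean of a short sliding window as it arrives, and an alert is raised when it deviates too far. The class name, window size, and threshold are illustrative assumptions, not part of the prototype.

```python
from collections import deque
from statistics import mean


class SlidingWindowMonitor:
    """Keep the last `size` readings and flag values far from the window mean."""

    def __init__(self, size=5, threshold=10.0):
        self.window = deque(maxlen=size)  # oldest readings drop off automatically
        self.threshold = threshold        # hypothetical alert threshold

    def observe(self, value):
        """Return (baseline, alert) for one arriving reading."""
        if self.window:
            baseline = mean(self.window)  # mean of the readings seen before this one
            alert = abs(value - baseline) > self.threshold
        else:
            baseline = value              # first reading: nothing to compare against
            alert = False
        self.window.append(value)
        return baseline, alert


def alerts_for(readings, size=5, threshold=10.0):
    """Process readings one at a time and collect the values that raised an alert."""
    monitor = SlidingWindowMonitor(size, threshold)
    return [value for value in readings if monitor.observe(value)[1]]
```

Because the decision is made per reading, the alert can fire immediately instead of waiting for a later batch job.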
Informix to Spark
Challenges
Exploring data and discovering
actionable business insights
• The problem: users often do not know exactly which analytics they want to run
• Difficult to justify the cost and risk of a complex solution without specific business value
• Need to reduce the cost and risk of adding a real-time data analytics pipeline to the application architecture
• Let data scientists explore the data to find useful analytics without interfering with the existing business.
We're running an Informix database. How do we incorporate real-time analytics into our application architecture?
[Diagram: application server and database]
Outdated approach: requires additional complexity, with increased risk and cost.
[Diagram: application server with additional components]
Informix to Spark
Prototype
Implementation
Real-Time Operational Database
Streaming Analytics with Spark
• Newly prototyped feature for the Informix database.
• Enables Informix customers to stream data added to their database in real time via MQTT; an analytics platform such as Apache Spark can then consume it.
Informix MQTT Streamer – enables a real-time analytics pipeline that drastically reduces complexity, cost, and risk
How is it implemented?
• Uses the Informix Virtual-Index Interface (VII)
• VII lets us write user-defined routines (UDRs) that are triggered whenever certain SQL statements are executed
• VII is typically used to create indexes for custom data types. Instead, we use it to write data to a socket during INSERT/UPDATE statements
[Diagram: VII UDR publishes to the MQTT broker]
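The idea of hooking the write path can be modeled in a few lines. This is a conceptual sketch in plain Python, not the Informix VII API: the `Table` class, its hook list, and the message layout are all hypothetical stand-ins for the index routines that fire on INSERT/UPDATE, and a plain list plays the role of the MQTT broker.

```python
class Table:
    """Toy in-memory table that notifies registered hooks on every write."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.hooks = []  # stand-ins for index routines fired on writes

    def create_streaming_hook(self, publish):
        """Register a callable that receives one message per write."""
        self.hooks.append(publish)

    def _notify(self, operation, key, row):
        for publish in self.hooks:
            publish({"table": self.name, "op": operation, "key": key, "row": row})

    def insert(self, key, row):
        self.rows[key] = dict(row)
        self._notify("INSERT", key, dict(row))  # copy so later updates do not alter the message

    def update(self, key, changes):
        self.rows[key].update(changes)
        self._notify("UPDATE", key, dict(self.rows[key]))


published = []  # stand-in for the MQTT broker
table = Table("heartrate")
table.create_streaming_hook(published.append)
table.insert(1, {"bpm": 72})
table.update(1, {"bpm": 75})
```

The application only calls `insert` and `update`; publishing happens as a side effect of the write, which is why no extra component is needed in the application tier.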
Installation and basic usage
• Open source!
• Available on GitHub: https://github.com/IBM-IoT/InformixSparkStreaming
• Run the install script
• Add the streaming index to the columns whose values you want to stream:
  create index stream on table(col1, col2) USING streaming_index;
The Nitty gritty
• A set of custom UDRs installed into Informix converts data into MQTT messages and sends them to a specified address
• Virtual table indexes detect inserts, updates, and deletes as they happen and trigger the messages to be sent
• Once the data is in an MQTT broker, almost anything can consume it
  – MQTT clients are available for most programming languages (including Java, for Apache Spark)
• Spark can analyze the data, compare it to historical data, and use streaming k-means algorithms to detect changes in the data
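To make the streaming k-means idea concrete, here is a small sequential (MacQueen-style) k-means for scalar readings in plain Python. It is a simplified stand-in for illustration only: Spark's MLlib has its own streaming k-means implementation, and nothing here is taken from the prototype. The starting centers and sample readings are made up.

```python
class OnlineKMeans1D:
    """Sequential k-means: update one cluster center per arriving reading."""

    def __init__(self, initial_centers):
        self.centers = list(initial_centers)
        self.counts = [0] * len(self.centers)

    def update(self, x):
        """Assign x to its nearest center, move that center toward x, return its index."""
        i = min(range(len(self.centers)), key=lambda j: abs(x - self.centers[j]))
        self.counts[i] += 1
        # Running-mean update: the center becomes the mean of the readings assigned to it.
        self.centers[i] += (x - self.centers[i]) / self.counts[i]
        return i


model = OnlineKMeans1D([60.0, 100.0])  # hypothetical resting and elevated heart rates
for bpm in [62, 64, 98, 102, 63]:
    model.update(bpm)
```

A center that drifts over time, or a reading that lands far from every center, is the kind of change in the data that the analytics side can act on.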
The Nitty gritty continued
• Once installed, the custom “streaming_index” index type is available for use.
• Running the “create index” command with the “streaming_index” index type runs the code in the custom UDRs that push the data via MQTT.
• From then on, whenever you run an INSERT statement against the columns on which you created the streaming index, the inserted data is automatically published to an MQTT broker.
• See the “IBM Informix Virtual-Index Interface Programmer's Guide” for more details.
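As an illustration of what "published to an MQTT broker" could look like on the wire, the sketch below turns one inserted row into a topic and payload and decodes it again on the consumer side. The JSON encoding and the `informix/<table>` topic scheme are assumptions made for this example; the slides do not describe the prototype's actual message format.

```python
import json


def encode_row(table, columns, values):
    """Build a (topic, payload) pair for one inserted row."""
    if len(columns) != len(values):
        raise ValueError("columns and values must have the same length")
    topic = f"informix/{table}"  # hypothetical topic naming scheme
    payload = json.dumps(dict(zip(columns, values)), sort_keys=True).encode("utf-8")
    return topic, payload


def decode_row(payload):
    """Consumer side: turn the payload bytes back into a dict."""
    return json.loads(payload.decode("utf-8"))


topic, payload = encode_row("heartrate", ["patient_id", "bpm"], [7, 72])
row = decode_row(payload)
```

Any MQTT client library could carry such a payload; the consumer only needs to know the encoding.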
In-depth
• Does the prototype work for temporary tables?
  – There are no specific index-related restrictions on temporary tables
• Do we lock the tables?
  – The VII lengthens the time a lock is held
  – Future item: multiple concurrent writers to a per-table queue, flushed asynchronously by a separate thread
• Would this work across multiple nodes (sharding)?
  – The current prototype delegates this to Spark, where multiple input streams can be merged into one
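The future item above (concurrent writers feeding a queue that a separate thread flushes) can be sketched with Python's standard `queue` and `threading` modules. This shows the general pattern only; the class and function names are hypothetical, and a plain list stands in for the broker.

```python
import queue
import threading


class AsyncPublisher:
    """Writers enqueue messages; one background thread flushes them."""

    _STOP = object()  # sentinel that tells the flusher to exit

    def __init__(self, publish):
        self._queue = queue.Queue()
        self._publish = publish
        self._thread = threading.Thread(target=self._flush_loop, daemon=True)
        self._thread.start()

    def enqueue(self, message):
        """Called on the write path: hands the message off without waiting for I/O."""
        self._queue.put(message)

    def _flush_loop(self):
        while True:
            message = self._queue.get()
            if message is self._STOP:
                break
            self._publish(message)

    def close(self):
        """Flush everything queued so far, then stop the background thread."""
        self._queue.put(self._STOP)
        self._thread.join()


def write_rows(publisher, writer_id, count=3):
    """Simulate one writer inserting `count` rows."""
    for i in range(count):
        publisher.enqueue((writer_id, i))


sent = []  # stand-in for the MQTT broker
publisher = AsyncPublisher(sent.append)
writers = [threading.Thread(target=write_rows, args=(publisher, n)) for n in range(4)]
for w in writers:
    w.start()
for w in writers:
    w.join()
publisher.close()
```

The write path now only pays for a queue insert, so the time a lock is held no longer depends on how quickly the broker accepts the message.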
In-depth
• Installs in seconds
• No need to upgrade the database
• No need to restart the database server
• Can be installed and activated on a live production database!
• Minimal interference with the existing business application
Informix to Spark
Demo
Heart To Spark
• Demonstrates real-time streaming of data from the Informix engine into a message broker for consumption by one or more services
• Simulates IoT data from a heart rate monitor
• Watches for trends in heart rates
  – Poor health or stress can cause a measurable rise in baseline heart rate
• Uses Spark analytics to determine baseline heart rates and plot the trend (heart rate rising, steady, or falling)
• Browser-based graphing tools show a view of the data
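The trend labels in the demo (rising, steady, falling) can be illustrated with a simple comparison of two adjacent windows. The demo itself uses Spark; this plain-Python sketch only shows one plausible way to derive such labels, and the window size and tolerance are assumed values.

```python
from statistics import mean


def classify_trend(readings, window=3, tolerance=2.0):
    """Compare the mean of the latest `window` readings with the mean of the
    `window` readings before them and label the change."""
    if len(readings) < 2 * window:
        return "unknown"  # not enough data for two full windows
    previous = mean(readings[-2 * window:-window])
    latest = mean(readings[-window:])
    delta = latest - previous
    if delta > tolerance:
        return "rising"
    if delta < -tolerance:
        return "falling"
    return "steady"
```

For example, `classify_trend([70, 71, 69, 80, 82, 81])` compares a previous mean of 70 with a latest mean of 81 and reports a rising baseline.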
Demo - Installation
Overview
• IoT devices send data into the Informix server
• Data streams from Informix into an MQTT broker
• From MQTT, data is streamed into Spark for real-time analysis
• Results from both Informix and Spark are available to the end user
Not limited to Apache Spark
• Can be used by any application or platform that can consume TCP socket data:
  – IBM InfoSphere Streams
  – Apache Storm
  – Custom applications (most programming languages have MQTT libraries)
  – Many, many others
Informix to Spark
What’s next?
Endless possibilities
• Check out Apache Spark for more information about analytics and machine learning
  – http://spark.apache.org/
• Learn more about machine learning and its potential
  – https://www.coursera.org/learn/machine-learning
• Contact IBM Informix
Questions?
Pradeep Natarajan
@pradeepnatara
3535

More Related Content

What's hot

Sidecars and a Microservices Mesh
Sidecars and a Microservices MeshSidecars and a Microservices Mesh
Sidecars and a Microservices MeshRed Hat Developers
 
Data in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern ArchitectureData in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern ArchitectureMats Johansson
 
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...DataStax Academy
 
#PCMVision: VMware NSX - Transforming Security
#PCMVision: VMware NSX - Transforming Security#PCMVision: VMware NSX - Transforming Security
#PCMVision: VMware NSX - Transforming SecurityPCM
 
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...Data Con LA
 
Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseDataWorks Summit
 
Choosing the right platform for your Internet -of-Things solution
Choosing the right platform for your Internet -of-Things solutionChoosing the right platform for your Internet -of-Things solution
Choosing the right platform for your Internet -of-Things solutionIBM_Info_Management
 
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...DataWorks Summit
 
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout SoftwareMaking Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout SoftwareData Con LA
 
Io t world_2016_iot_smart_gateways_moe
Io t world_2016_iot_smart_gateways_moeIo t world_2016_iot_smart_gateways_moe
Io t world_2016_iot_smart_gateways_moeShawn Moe
 
Real-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionReal-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Successful AI/ML Projects with End-to-End Cloud Data Engineering
Successful AI/ML Projects with End-to-End Cloud Data EngineeringSuccessful AI/ML Projects with End-to-End Cloud Data Engineering
Successful AI/ML Projects with End-to-End Cloud Data EngineeringDatabricks
 
Transform Your Mainframe Data for the Cloud with Precisely and Apache Kafka
Transform Your Mainframe Data for the Cloud with Precisely and Apache KafkaTransform Your Mainframe Data for the Cloud with Precisely and Apache Kafka
Transform Your Mainframe Data for the Cloud with Precisely and Apache KafkaPrecisely
 
Data Centric Transformation in Telecom
Data Centric Transformation in TelecomData Centric Transformation in Telecom
Data Centric Transformation in TelecomDataWorks Summit
 
Blockchain and Apache NiFi
Blockchain and Apache NiFiBlockchain and Apache NiFi
Blockchain and Apache NiFiTimothy Spann
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Stefan Lipp
 
Lightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidLightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidDataWorks Summit
 

What's hot (20)

Sidecars and a Microservices Mesh
Sidecars and a Microservices MeshSidecars and a Microservices Mesh
Sidecars and a Microservices Mesh
 
Big Data Application Architectures - Fraud Detection
Big Data Application Architectures - Fraud DetectionBig Data Application Architectures - Fraud Detection
Big Data Application Architectures - Fraud Detection
 
Data in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern ArchitectureData in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern Architecture
 
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
 
#PCMVision: VMware NSX - Transforming Security
#PCMVision: VMware NSX - Transforming Security#PCMVision: VMware NSX - Transforming Security
#PCMVision: VMware NSX - Transforming Security
 
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
 
Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data Warehouse
 
Active Learning for Fraud Prevention
Active Learning for Fraud PreventionActive Learning for Fraud Prevention
Active Learning for Fraud Prevention
 
Choosing the right platform for your Internet -of-Things solution
Choosing the right platform for your Internet -of-Things solutionChoosing the right platform for your Internet -of-Things solution
Choosing the right platform for your Internet -of-Things solution
 
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
 
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout SoftwareMaking Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
 
Io t world_2016_iot_smart_gateways_moe
Io t world_2016_iot_smart_gateways_moeIo t world_2016_iot_smart_gateways_moe
Io t world_2016_iot_smart_gateways_moe
 
Real-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionReal-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in Action
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Successful AI/ML Projects with End-to-End Cloud Data Engineering
Successful AI/ML Projects with End-to-End Cloud Data EngineeringSuccessful AI/ML Projects with End-to-End Cloud Data Engineering
Successful AI/ML Projects with End-to-End Cloud Data Engineering
 
Transform Your Mainframe Data for the Cloud with Precisely and Apache Kafka
Transform Your Mainframe Data for the Cloud with Precisely and Apache KafkaTransform Your Mainframe Data for the Cloud with Precisely and Apache Kafka
Transform Your Mainframe Data for the Cloud with Precisely and Apache Kafka
 
Data Centric Transformation in Telecom
Data Centric Transformation in TelecomData Centric Transformation in Telecom
Data Centric Transformation in Telecom
 
Blockchain and Apache NiFi
Blockchain and Apache NiFiBlockchain and Apache NiFi
Blockchain and Apache NiFi
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
 
Lightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidLightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and Druid
 

Viewers also liked

IBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql FeaturesIBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql FeaturesKeshav Murthy
 
IBM Informix dynamic server and websphere MQ integration
IBM Informix dynamic server and websphere MQ  integrationIBM Informix dynamic server and websphere MQ  integration
IBM Informix dynamic server and websphere MQ integrationKeshav Murthy
 
Realtime Reporting using Spark Streaming
Realtime Reporting using Spark StreamingRealtime Reporting using Spark Streaming
Realtime Reporting using Spark StreamingSantosh Sahoo
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingDatabricks
 
700 Queries Per Second with Updates: Spark As A Real-Time Web Service
700 Queries Per Second with Updates: Spark As A Real-Time Web Service700 Queries Per Second with Updates: Spark As A Real-Time Web Service
700 Queries Per Second with Updates: Spark As A Real-Time Web ServiceSpark Summit
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksLegacy Typesafe (now Lightbend)
 

Viewers also liked (6)

IBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql FeaturesIBM Informix dynamic server 11 10 Cheetah Sql Features
IBM Informix dynamic server 11 10 Cheetah Sql Features
 
IBM Informix dynamic server and websphere MQ integration
IBM Informix dynamic server and websphere MQ  integrationIBM Informix dynamic server and websphere MQ  integration
IBM Informix dynamic server and websphere MQ integration
 
Realtime Reporting using Spark Streaming
Realtime Reporting using Spark StreamingRealtime Reporting using Spark Streaming
Realtime Reporting using Spark Streaming
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
 
700 Queries Per Second with Updates: Spark As A Real-Time Web Service
700 Queries Per Second with Updates: Spark As A Real-Time Web Service700 Queries Per Second with Updates: Spark As A Real-Time Web Service
700 Queries Per Second with Updates: Spark As A Real-Time Web Service
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
 

Similar to Spark Analytics with Informix

New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
 New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S... New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...Big Data Spain
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...confluent
 
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...In-Memory Computing Summit
 
Informix IWA: Architectural options
Informix IWA: Architectural optionsInformix IWA: Architectural options
Informix IWA: Architectural optionsKeshav Murthy
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLSingleStore
 
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...AboutYouGmbH
 
Hitachi streaming data platform v8
Hitachi streaming data platform v8Hitachi streaming data platform v8
Hitachi streaming data platform v8Navaid Khan
 
Hitachi Streaming Data Platform_v8
Hitachi Streaming Data Platform_v8Hitachi Streaming Data Platform_v8
Hitachi Streaming Data Platform_v8Navaid Khan
 
Hitachi Streaming Data Platform
Hitachi Streaming Data PlatformHitachi Streaming Data Platform
Hitachi Streaming Data PlatformNavaid Khan
 
3 reasons to pick a time series platform for monitoring dev ops driven contai...
3 reasons to pick a time series platform for monitoring dev ops driven contai...3 reasons to pick a time series platform for monitoring dev ops driven contai...
3 reasons to pick a time series platform for monitoring dev ops driven contai...DevOps.com
 
Actionable Insights - Thompson
Actionable Insights - ThompsonActionable Insights - Thompson
Actionable Insights - ThompsonProlifics
 
The role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial InformaticsThe role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial InformaticsAerospike, Inc.
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduCloudera, Inc.
 
Fin fest 2014 - Internet of Things and APIs
Fin fest 2014 - Internet of Things and APIsFin fest 2014 - Internet of Things and APIs
Fin fest 2014 - Internet of Things and APIsRobert Greiner
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
 
Motadata brochure
Motadata brochureMotadata brochure
Motadata brochureRajDodiya4
 
IoT and the Oil & Gas industry at M2M Oil & Gas 2014 in London
IoT and the Oil & Gas industry at M2M Oil & Gas 2014 in LondonIoT and the Oil & Gas industry at M2M Oil & Gas 2014 in London
IoT and the Oil & Gas industry at M2M Oil & Gas 2014 in LondonEurotech
 

Similar to Spark Analytics with Informix (20)

New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
 New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S... New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
 
Informix IWA: Architectural options
Informix IWA: Architectural optionsInformix IWA: Architectural options
Informix IWA: Architectural options
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQL
 
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
 
Hitachi streaming data platform v8
Hitachi streaming data platform v8Hitachi streaming data platform v8
Hitachi streaming data platform v8
 
Hitachi Streaming Data Platform_v8
Hitachi Streaming Data Platform_v8Hitachi Streaming Data Platform_v8
Hitachi Streaming Data Platform_v8
 
Hitachi Streaming Data Platform
Hitachi Streaming Data PlatformHitachi Streaming Data Platform
Hitachi Streaming Data Platform
 
3 reasons to pick a time series platform for monitoring dev ops driven contai...
3 reasons to pick a time series platform for monitoring dev ops driven contai...3 reasons to pick a time series platform for monitoring dev ops driven contai...
3 reasons to pick a time series platform for monitoring dev ops driven contai...
 
Actionable Insights - Thompson
Actionable Insights - ThompsonActionable Insights - Thompson
Actionable Insights - Thompson
 
The role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial InformaticsThe role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial Informatics
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
 
Fin fest 2014 - Internet of Things and APIs
Fin fest 2014 - Internet of Things and APIsFin fest 2014 - Internet of Things and APIs
Fin fest 2014 - Internet of Things and APIs
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
Motadata brochure
Motadata brochureMotadata brochure
Motadata brochure
 
Salesforce - classification of cloud computing
Salesforce - classification of cloud computingSalesforce - classification of cloud computing
Salesforce - classification of cloud computing
 
IoT and the Oil & Gas industry at M2M Oil & Gas 2014 in London
IoT and the Oil & Gas industry at M2M Oil & Gas 2014 in LondonIoT and the Oil & Gas industry at M2M Oil & Gas 2014 in London
IoT and the Oil & Gas industry at M2M Oil & Gas 2014 in London
 

Recently uploaded

Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service

Spark Analytics with Informix

  • 1. © 2015 IBM Corporation IBM Analytics Spark Analytics with Informix Pradeep Natarajan, IBM @pradeepnatara
  • 2. Agenda
      • Context: Informix / Spark high-level value propositions
      • IoT use cases
      • Challenges
      • Prototype and implementation
      • What’s next?
  • 4. Informix for Internet of Things
      • Optimized database for environments such as:
        – Low or no database administration
        – Embedded: gateways, routers
        – Very high transaction rates and uptime characteristics
      • Widely deployed in the retail sector, where the low administration overhead makes it essential for in-store deployments.
      • Informix supports key Internet-of-Things solutions
        – Native support for time-based data: Timeseries
        – Small footprint
        – Low administration requirements
  • 5. Apache Spark
      • Speed
      • Ease of use, unified engine
      • Sophisticated analytics
  • 6. Apache Spark
      • Cluster computing framework
      • Fast and general engine for large-scale data processing
      • In-memory computing
  • 7. Apache Spark Streaming
      • Extends Spark for big data stream processing
      • [Diagram: ROW DATA STREAM → Distributed Stream Processing System → Processed Data]
      • Scaling, low latency, recovery
      • Integrates batch and interactive processing
  • 9. Real-Time Operational Database Streaming Analytics with Spark
      • Applications that drive business have positioned relational databases at the center of operations.
      • To continue their success, businesses need to use streaming analytics to gain real-time insights into their operations and take actions to optimize outcomes.
      • Infrequent batch analytics on “stale” data loses the competitive edge; demand for real-time analytics is increasing as businesses try to stay in the lead.
  • 10. SENSE -> ANALYZE -> ACT
      • As data ages, business value diminishes.
      • Sense → Analyze → Act in seconds/milliseconds, not days or weeks
      • [Diagram: Sense → Analyze → Act timelines, measured in days for batch and in seconds for real-time]
  • 11. Increasing demand for real-time analytics
      • Continuously streaming data from IBM Informix to an analytics platform
      • Streaming analytics service sample scenarios:
        – Connected Vehicles (driving behavior matching): detect anomalous driving behavior that causes higher fuel consumption
        – Energy & Utilities (power consumption): how is power consumption correlated between houses A, B, C, and D?
        – Health Care (heart attack prevention): detect abnormal patterns in ECG series
        – Finance (market manipulation detection): detect anomalies by price change rate in a time window (steady price change vs. vibration in a short period)
        – Cloud Service Operation (server health diagnosis): detect system resource peaks and valleys and correlate them with workload information
        – …
  • 12. Real-time analytics - Industry
      • Information technology – systems and network monitoring
      • IoT – sensor data analytics and processing
      • Financial transactions – authentication, fraud detection, validation
      • Inventory control – consumer trends and demands
      • Website analytics – ad targeting
      • Many others…
  • 13. Real-time analytics - applications
      • Data analyzed as it arrives – data in motion
      • Simple: monitoring, alerts/reports, statistics
      • Complex: predictive analytics (regressions, machine learning, etc.), k-means clustering (classification, anomaly detection)
      • Many store events as well, to combine with later batch processing.
      • Immediate actions possible.
  • 15. Exploring data and discovering actionable business insights
      • The problem: users often do not know exactly which analytics they want to run
      • Difficult to justify the cost/risk of a complex solution without specific business value
      • Need to reduce the cost/risk of adding a real-time data analytics pipeline to the application architecture
      • Let data scientists explore data to find useful analytics without interfering with the existing business.
  • 16. We're running an Informix database. How do we incorporate real-time analytics into our application architecture?
      • [Diagram: Application Server, Database]
  • 17. Outdated approach: requires additional complexity, with increased risk and cost.
      • [Diagram: Application Server, Additional Component, Additional Component]
  • 19. Real-Time Operational Database Streaming Analytics with Spark
      • Newly prototyped feature for the Informix database.
      • Enables Informix customers to stream data added to their database in real time via MQTT, which can then be consumed by an analytics platform such as Apache Spark.
  • 20. Informix MQTT Streamer – enables a real-time analytics pipeline that drastically reduces complexity, cost and risk
  • 21. How is it implemented?
      • Uses the Informix Virtual-Index Interface (VII)
      • VII allows us to write UDRs that are triggered whenever certain SQL statements are executed
      • This is typically used to create indexes for custom data types. Instead, we use it to write data to a socket during INSERT/UPDATE statements
      • [Diagram: VII UDR publishes to the MQTT broker]
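
As a rough illustration of what "write data to a socket" means for MQTT, the sketch below builds the bytes of an MQTT 3.1.1 PUBLISH packet (QoS 0) in plain Python. It is not the prototype's UDR code, which runs inside Informix. The topic naming `informix/<table>`, the JSON payload layout, and all function names are assumptions made for this example. A real client would also need to send a CONNECT packet before publishing.

```python
import json
import struct


def encode_remaining_length(n: int) -> bytes:
    """Encode the MQTT 'remaining length' field: 7 bits per byte, MSB = continuation."""
    if not 0 <= n <= 268_435_455:
        raise ValueError("remaining length out of range for MQTT")
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n > 0:
            byte |= 0x80  # more length bytes follow
        out.append(byte)
        if n == 0:
            break
    return bytes(out)


def build_publish_packet(topic: str, payload: bytes) -> bytes:
    """Build a QoS 0 PUBLISH packet: fixed header, length-prefixed topic, payload."""
    topic_bytes = topic.encode("utf-8")
    variable_header = struct.pack("!H", len(topic_bytes)) + topic_bytes
    body = variable_header + payload
    # 0x30 = packet type 3 (PUBLISH), flags 0 (no DUP, QoS 0, no RETAIN)
    return b"\x30" + encode_remaining_length(len(body)) + body


def row_to_packet(table: str, operation: str, row: dict) -> bytes:
    """Hypothetical mapping from a changed row to an MQTT message."""
    topic = f"informix/{table}"  # assumed topic scheme
    payload = json.dumps({"op": operation, "row": row}, sort_keys=True).encode("utf-8")
    return build_publish_packet(topic, payload)


# Example: the bytes that would be written to the broker's TCP socket.
packet = row_to_packet("heartrate", "INSERT", {"patient": 7, "bpm": 72})
```

Keeping the packet construction separate from the socket write makes the encoding easy to check without a running broker.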
  • 22. Installation and basic usage
      • Open sourced!
      • Available on GitHub – https://github.com/IBM-IoT/InformixSparkStreaming
      • Run the install script
      • Add the streaming index to the columns whose values you want to stream:
        create index stream on table(col1, col2) USING streaming_index;
  • 23. The Nitty gritty
      • Installed into Informix is a set of custom UDRs that convert data into MQTT messages and send them to a specified address
      • Virtual Table Indexes detect data inserts/updates/deletes as they happen and trigger the messages to be sent
      • Once in an MQTT broker, almost anything can consume it
        – MQTT clients are available for most programming languages (including Java for Apache Spark)
      • Spark can analyze the data, compare it to historical data, and use streaming k-means algorithms to determine changes in the data
  • 24. The Nitty gritty continued
      • Once installed, the custom “streaming_index” index type will be available for use.
      • Running the “create index” command with the “streaming_index” index type runs the code in the custom UDRs that pushes the data via MQTT.
      • Then, whenever you run an INSERT statement on a column that has the streaming index, the inserted data is automatically published to an MQTT broker.
      • See the “IBM Informix Virtual-Index Interface Programmer's Guide” for more details.
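
Slide 23 mentions streaming k-means. The sketch below shows the underlying idea in plain Python: each arriving value is assigned to its nearest center, and that center then moves toward the value as a running mean. This is a simplified one-dimensional stand-in, not Spark's implementation. The class name, the seed centers, and the sample readings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OnlineKMeans1D:
    """Sequential k-means on scalar values (illustrative only)."""

    centers: List[float]
    counts: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.counts:
            self.counts = [0] * len(self.centers)

    def update(self, x: float) -> int:
        """Assign x to the nearest center, update that center, return its index."""
        idx = min(range(len(self.centers)), key=lambda i: abs(self.centers[i] - x))
        self.counts[idx] += 1
        # Running-mean update: the center drifts toward the new observation.
        self.centers[idx] += (x - self.centers[idx]) / self.counts[idx]
        return idx


# Example: two assumed clusters of heart-rate readings, resting and elevated.
model = OnlineKMeans1D(centers=[60.0, 120.0])
assignments = [model.update(x) for x in [62, 58, 118, 122, 60, 120]]
```

Because each update touches a single center, the model can be maintained as messages arrive, without storing the full history.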
  • 25. In-depth
      • Does the prototype work for temp tables?
        – No specific index-related restrictions on temp tables
      • Do we lock the tables?
        – The VII will lengthen the amount of time a lock is held
        – Future item: multiple concurrent writers to a per-table queue, flushed asynchronously by a separate thread
      • Would this work for multiple nodes (sharding)?
        – The current prototype delegates this to Spark, where multiple input streams could be merged into one
  • 26. In-depth
      • Installs in seconds
      • No need to upgrade the database
      • No need to restart the database server
      • Can be installed and activated on a live production database!
      • Minimal interference with the existing business application
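
The future item on slide 25 (concurrent writers feeding a queue that a separate thread flushes) could look roughly like the following. This is a Python sketch of the pattern only, not code from the prototype. `AsyncPublisher` and the `publish` callback are hypothetical names, and a real implementation inside the database server would have different constraints.

```python
import queue
import threading
from typing import Any, Callable


class AsyncPublisher:
    """Writers enqueue messages; one background thread publishes them."""

    _STOP = object()  # sentinel that tells the worker to exit

    def __init__(self, publish: Callable[[Any], None]) -> None:
        self._publish = publish
        self._queue: "queue.Queue[Any]" = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def enqueue(self, message: Any) -> None:
        """Called by writers; returns immediately without waiting for the publish."""
        self._queue.put(message)

    def _drain(self) -> None:
        while True:
            message = self._queue.get()
            if message is self._STOP:
                break
            self._publish(message)

    def close(self) -> None:
        """Flush everything queued so far, then stop the worker."""
        self._queue.put(self._STOP)
        self._worker.join()


# Example: four concurrent writers, one flushing thread.
sent = []
publisher = AsyncPublisher(sent.append)


def write_rows(writer_id: int) -> None:
    for n in range(50):
        publisher.enqueue((writer_id, n))


writers = [threading.Thread(target=write_rows, args=(i,)) for i in range(4)]
for w in writers:
    w.start()
for w in writers:
    w.join()
publisher.close()
```

The point of the pattern is that a writer only pays for the enqueue, so the time spent publishing no longer extends how long the writer holds its lock.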
  • 29. Heart To Spark
      • Demonstration of real-time streaming of data from the Informix engine into a message broker for consumption by one or more services
      • Simulates IoT data from a heart rate monitor
      • Watches for trends in heart rates
        – Poor health/stress can cause a measurable rise in baseline heart rate
      • Uses Spark analytics to determine baseline heart rates and plot the trend (heart rate rising, steady, or falling)
      • Graphing tools in the browser show us a view of the data
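
One simple way to label a trend as rising, steady, or falling is to compare the mean of the most recent window of readings with the mean of the window before it. The sketch below does this in plain Python. It is an assumption about how such a classification could work, not the demo's actual Spark job; the function name, window size, and tolerance are all hypothetical.

```python
from statistics import mean
from typing import Sequence


def classify_trend(readings: Sequence[float], window: int = 5, tolerance: float = 2.0) -> str:
    """Compare the latest window's mean with the previous window's mean.

    Returns "rising", "falling", "steady", or "unknown" when there are
    fewer than two full windows of data.
    """
    if len(readings) < 2 * window:
        return "unknown"
    previous = mean(readings[-2 * window:-window])
    current = mean(readings[-window:])
    delta = current - previous
    if delta > tolerance:
        return "rising"
    if delta < -tolerance:
        return "falling"
    return "steady"


# Example with simulated beats-per-minute readings.
resting_then_stressed = [60, 61, 59, 60, 60, 70, 71, 69, 70, 70]
trend = classify_trend(resting_then_stressed)
```

The tolerance keeps normal beat-to-beat variation from being reported as a change in baseline.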
  • 31. Overview
      • IoT devices send data into the Informix server
      • Data streams from Informix into an MQTT broker
      • From MQTT, data is streamed into Spark for real-time analysis
      • Results from both Informix and Spark are available to the end user
  • 32. Not limited to Apache Spark
      • Can be used by any application/platform that can consume TCP socket data.
        – IBM InfoSphere Streams
        – Apache Storm
        – Custom applications (most programming languages have MQTT libraries)
        – Many, many others.
  • 34. Endless possibilities
      • Check out Apache Spark for more information about analytics and machine learning
        – http://spark.apache.org/
      • Learn more about machine learning and its potential
        – https://www.coursera.org/learn/machine-learning
      • Contact IBM Informix