Informix Spark Streaming is an extension of Informix that allows data to be streamed out of the database as soon as it is inserted, updated, or deleted.
Changes are currently streamed over MQTT v3.1.1; older protocol versions are not supported. The extension can stream data to any MQTT broker, where it can be processed or passed on to subscribing clients for processing.
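To make the version requirement concrete, here is a minimal sketch of how an MQTT v3.1.1 CONNECT packet is laid out on the wire. It is an illustration of the protocol, not code from the extension; the function names and the client id used in the usage note are hypothetical. In v3.1.1 the protocol name is "MQTT" with protocol level 4, whereas the older v3.1 used "MQIsdp" with level 3, which is why a broker can tell the two apart.

```python
import struct


def encode_remaining_length(n: int) -> bytes:
    """Encode MQTT's variable-length 'remaining length' field (7 bits per byte)."""
    if not 0 <= n <= 268_435_455:
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        n, digit = divmod(n, 128)
        if n > 0:
            digit |= 0x80  # continuation bit: more length bytes follow
        out.append(digit)
        if n == 0:
            return bytes(out)


def mqtt_string(s: str) -> bytes:
    """Encode a UTF-8 string with the two-byte big-endian length prefix MQTT uses."""
    data = s.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def build_connect(client_id: str, keep_alive: int = 60) -> bytes:
    """Build a CONNECT packet with a clean session and no credentials or will."""
    variable_header = (
        mqtt_string("MQTT")               # protocol name for v3.1.1
        + bytes([4])                      # protocol level 4 identifies v3.1.1
        + bytes([0x02])                   # connect flags: clean session only
        + struct.pack("!H", keep_alive)   # keep-alive interval in seconds
    )
    body = variable_header + mqtt_string(client_id)
    # 0x10 is the fixed-header byte for the CONNECT packet type.
    return bytes([0x10]) + encode_remaining_length(len(body)) + body
```

For example, `build_connect("informix-demo")` yields a 27-byte packet whose ninth byte (index 8) is the protocol level 4. In practice an MQTT client library would build this packet for you.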
4. Informix for Internet of Things
• Optimized Database for environments, such as:
• Low or no database administration
• Embedded: gateways, routers
• Very high transaction rates and uptime characteristics
• Widely deployed in the retail sector, where the low administration
overhead makes it essential for in-store deployments.
• Informix supports key Internet-of-Things solutions
• Native support for time-based data: Timeseries
• Small footprint
• Low administration requirements
6. Apache Spark
• Cluster computing framework
• Fast and general engine for large-scale data processing
• In-memory computing
7. Apache Spark Streaming
Extends Spark for big data stream processing
Raw data stream → Distributed stream processing system → Processed data
Scaling, low latency, recovery
Integrates batch and interactive processing
9. Real-Time Operational Database
Streaming Analytics with Spark
Applications that drive business have positioned relational
databases at the center of operations.
To continue their success, businesses need to use streaming
analytics to gain real-time insights into their operations and take
actions to optimize outcomes.
Infrequent batch analytics on “stale” data lose the competitive edge.
Increasing demand for real-time analytics to stay in the lead.
10. SENSE -> ANALYZE -> ACT
As data ages, business value diminishes.
Sense → Analyze → Act in seconds or milliseconds, not days
or weeks
[Figure: Sense → Analyze → Act cycle compared for batch (days) and real-time (seconds) processing]
11. Streaming analytics service sample scenarios
Continuously streaming data from IBM Informix to an analytics platform
Increasing demand for real-time analytics
• Connected Vehicles – Driving behavior matching: detect anomalous driving behavior that causes higher fuel consumption
• Energy & Utilities – Power consumption: how is power consumption correlated between houses A, B, C, and D?
• Health Care – Heart attack prevention: detect abnormal patterns in ECG series
• Finance – Market manipulation detection: detect anomalies by price change rate in a time window (steady price change versus vibration in a short period)
• Cloud Service Operation – Server health diagnosis: detect system resource peaks and valleys and correlate them with workload information
…
12. Real-time analytics - Industry
Information technology – Systems & Network monitoring
IoT - sensor data analytics and processing
Financial transactions – authentication, fraud detection,
validation
Inventory control – consumer trends and demands
Website analytics – ad targeting
Many others…
13. Real-time analytics - applications
Data analyzed as it arrives – data in motion
Simple: Monitoring, alerts/reports, statistics
Complex: predictive analytics (regressions,
machine learning, etc.), K-means clustering
(classification, anomaly detection)
Many applications also store events and combine
them with later batch processing.
Immediate actions possible.
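As an illustration of the "simple" end of this spectrum, the sketch below keeps running statistics over values as they arrive and raises an alert when a value deviates strongly from what has been seen so far. It uses Welford's online algorithm for the mean and variance; the class and function names, the threshold, and the warm-up length are all assumptions chosen for the example, not part of the presented system.

```python
import math


class RunningStats:
    """Running mean and sample standard deviation (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def stddev(self):
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def monitor(values, threshold=3.0, warmup=5):
    """Return (index, value) pairs that lie more than `threshold` standard
    deviations from the mean of all previously seen values."""
    stats = RunningStats()
    alerts = []
    for i, x in enumerate(values):
        # Only judge a value once enough history exists to be meaningful.
        if stats.n >= warmup and stats.stddev > 0:
            if abs(x - stats.mean) / stats.stddev > threshold:
                alerts.append((i, x))
        stats.update(x)
    return alerts
```

Because each value is examined once and only three numbers of state are kept, this kind of check can run on data in motion without storing the stream.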
15. Exploring data and discovering
actionable business insights
The problem: users often do not know exactly which
analytics they want to run
Difficult to justify the cost/risk of a complex solution without
specific business value
Need to reduce the cost/risk of adding a real-time data
analytics pipeline to the application architecture
Let data scientists explore data to find useful analytics
without interfering with existing business.
16. We're running an Informix database. How
do we incorporate real-time analytics into our
application architecture?
[Diagram: application server and database]
17. Outdated approach: requires additional complexity,
increasing risk and cost.
[Diagram: application server with two additional components]
19. Real-Time Operational Database
Streaming Analytics with Spark
Newly prototyped feature for the Informix database.
Enables Informix customers to stream data added to their
database in real time via MQTT. The stream can then be
consumed by an analytics platform such as Apache Spark.
20. Informix MQTT Streamer – Enables a real-time
analytics pipeline that drastically
reduces complexity, cost, and risk
21. How is it implemented?
Uses the Informix Virtual-Index Interface (VII)
VII allows us to write user-defined routines (UDRs) that are
triggered whenever certain SQL statements are executed
This is typically used to create indexes for custom
data types. Instead, we use it to write data to a
socket during INSERT/UPDATE statements
[Diagram: VII UDR publishes to MQTT broker]
22. Installation and basic usage
Open source!
Available on GitHub:
https://github.com/IBM-IoT/InformixSparkStreaming
Run the install script
Add the streaming index to the columns whose values
you want to stream
create index stream on table(col1, col2) USING streaming_index;
23. The nitty-gritty
• Installed into Informix is a set of custom UDRs that convert
data into MQTT messages and send them to a specified
address
• Virtual indexes detect data inserts, updates, and deletes as
they happen and trigger the messages to be sent
• Once in an MQTT broker, almost anything can consume it
– MQTT clients are available for most programming languages (including
Java for Apache Spark)
• Spark can analyze the data, compare it to historical data, and
use streaming k-means algorithms to detect changes
in the data
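The streaming k-means idea mentioned above can be sketched in a few lines. This is a simplified per-point (online) variant written for illustration, not Spark MLlib's implementation, which works on mini-batches with a decay factor. The class name, the one-dimensional heart-rate-like values, and the initial centroids are assumptions made for the example.

```python
class OnlineKMeans:
    """Per-point k-means: each arriving point nudges its nearest centroid."""

    def __init__(self, initial_centroids):
        self.centroids = [list(c) for c in initial_centroids]
        self.counts = [0] * len(self.centroids)

    def _nearest(self, point):
        def sq_dist(centroid):
            return sum((p - c) ** 2 for p, c in zip(point, centroid))
        return min(range(len(self.centroids)),
                   key=lambda i: sq_dist(self.centroids[i]))

    def update(self, point):
        """Assign the point to its nearest cluster, move that centroid, return the cluster index."""
        k = self._nearest(point)
        self.counts[k] += 1
        # A 1/count step keeps each centroid equal to the mean of the points
        # assigned to it; the first assigned point replaces the initial guess.
        rate = 1.0 / self.counts[k]
        self.centroids[k] = [c + rate * (p - c)
                             for c, p in zip(self.centroids[k], point)]
        return k


# Hypothetical usage: two clusters seeded near a resting and an elevated reading.
model = OnlineKMeans([[60.0], [120.0]])
labels = [model.update(p) for p in ([58], [62], [118], [122], [60])]
```

A change in the data then shows up as a centroid drifting over time or as points being assigned to a different cluster than before.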
24. The nitty-gritty, continued
Once installed, the custom “streaming_index” index type
will be available for use.
Running the “create index” command with the
“streaming_index” index type registers the custom UDRs
that push the data via MQTT.
From then on, whenever you run an INSERT statement that
touches the columns covered by the streaming index, the
inserted data is automatically published to an
MQTT broker.
See the “IBM Informix Virtual-Index Interface Programmer's
Guide” for more details.
25. In-depth
Does the prototype work for temporary tables?
There are no specific index-related restrictions on temporary tables
Do we lock the tables?
The VII will lengthen the amount of time a lock is held
Future item: multiple concurrent writers to a per-table
queue, flushed asynchronously by a separate thread
Would this work for multi-node (sharded) deployments?
The current prototype delegates this to Spark,
where multiple input streams can be merged into one
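Merging several per-node streams into one can be pictured with the following sketch. It is plain Python rather than Spark code, and it assumes each node emits events as `(timestamp, node, payload)` tuples already ordered by timestamp; the event shape and the function name are hypothetical. In Spark Streaming the comparable step would be a union of the input streams.

```python
import heapq


def merge_streams(*streams):
    """Lazily merge event streams that are each sorted by timestamp
    into a single stream that is also sorted by timestamp."""
    return heapq.merge(*streams, key=lambda event: event[0])


# Hypothetical events from two shards: (timestamp, node, payload)
node_a = [(1, "a", 72), (4, "a", 75)]
node_b = [(2, "b", 64), (3, "b", 66)]
merged = list(merge_streams(node_a, node_b))
```

`heapq.merge` only looks at the head of each input, so it also works with generators that yield events as they arrive.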
26. In-depth
Installs in seconds
No need to upgrade database
No need to restart database server
Can be installed and activated on a live production
database!
Minimal interference with existing business
application
29. Heart To Spark
• Demonstration of real-time streaming of data from
the Informix engine into a message broker for
digestion by one or more services
• Simulates IoT data from a heart rate monitor
• Watches for trends in heart rates
– Poor health/stress can cause a measurable rise in
baseline heart rate
• Uses Spark analytics to determine baseline heart rates
and plots the trend (heart rate rising, steady, or falling)
• Graphing tools in the browser show us a view of the data
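The rising/steady/falling classification can be sketched as a comparison of two adjacent averaging windows. This is a hypothetical illustration, not the demo's actual logic; the function name, window size, and tolerance (in beats per minute) are assumptions.

```python
from statistics import mean


def classify_trend(readings, window=5, tolerance=2.0):
    """Compare the mean of the latest `window` readings with the mean of the
    `window` readings before them and label the baseline trend."""
    if len(readings) < 2 * window:
        raise ValueError("need at least two full windows of readings")
    earlier = mean(readings[-2 * window:-window])
    recent = mean(readings[-window:])
    delta = recent - earlier
    if delta > tolerance:
        return "rising"
    if delta < -tolerance:
        return "falling"
    return "steady"
```

For example, five readings of 60 followed by five readings of 70 are labeled "rising", while small fluctuations around 60 stay within the tolerance and are labeled "steady".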
31. Overview
• IoT devices send data into the Informix server
• Data streams from Informix into an MQTT broker
• From MQTT, data is streamed into Spark for real-time analysis
• Results from both Informix and Spark are available to the end user
32. Not limited to Apache Spark
Can be used by any application/platform that can
consume TCP socket data.
IBM InfoSphere Streams
Apache Storm
Custom applications (most programming
languages have MQTT libraries)
Many, many others.
34. Endless possibilities
Check out Apache Spark for more information
about analytics and machine learning
http://spark.apache.org/
Learn more about Machine Learning and its
potential
https://www.coursera.org/learn/machine-learning
Contact IBM Informix