Financial market prediction has always been one of the hottest topics in data science and machine learning. However, the prediction algorithm is just a small piece of the puzzle. Building a data stream pipeline that constantly combines the latest price information with high-volume historical data is extremely challenging on traditional platforms: it requires a lot of code and careful thought about how to scale or move to the cloud.
This session walks through the architecture and implementation details of an application built on open-source tools that demonstrates how to build a stock prediction solution with almost no source code: only a few lines of R and a UI that consumes data through a RESTful endpoint in real time. The solution leverages in-memory data grid technology for high-speed ingestion, combining real-time data streaming with distributed processing for stock indicator algorithms.
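As one illustration of a stock indicator computed over a live price stream, the sketch below keeps a fixed-size window of recent prices and returns a simple moving average on every tick. It is written in Python for brevity and is not the code used in the session (which relies on R); the `MovingAverage` class and the sample prices are hypothetical.

```python
from collections import deque


class MovingAverage:
    """Simple moving average over the last `window` prices (hypothetical helper)."""

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        self.prices = deque(maxlen=window)  # oldest price drops out automatically
        self.total = 0.0

    def update(self, price):
        # Subtract the price about to be evicted before appending the new one.
        if len(self.prices) == self.prices.maxlen:
            self.total -= self.prices[0]
        self.prices.append(price)
        self.total += price
        return self.total / len(self.prices)


# Feed four sample ticks through a 3-tick window.
sma = MovingAverage(window=3)
averages = [sma.update(p) for p in [10.0, 11.0, 12.0, 13.0]]
```

Keeping a running total means each tick costs constant time regardless of window size, which matters when the indicator runs on every incoming price.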
IMCSummit 2015 - Day 2 Developer Track - Implementing a Highly Scalable In-Memory Stock Prediction System with Apache Geode, R and Spring XD
7. Why so hard?
Hard to add new data sources
Hard to scale
Hard to make it real-time
8. Traditional models are reactive and static
HDFS
Data Lake
Store Analytics
Hard to change
Labor intensive
Inefficient
No real-time information
ETL based
Data-source specific
9. Stream-based, real-time closed-loop analytics are needed
HDFS
Data Lake
Expert System / Machine Learning
In-Memory Real-Time Data
Continuous Learning
Continuous Improvement
Continuous Adapting
Data Stream Pipeline
Multiple Data Sources
Real-Time Processing
Store Everything
10. Info
Analysis
Look at past trends
(for similar input)
Evaluate current input
Score / Predict
Neural Network
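The steps on this slide (look at past trends for similar input, evaluate the current input, score) can be sketched as follows. Note that the slide names a neural network; this sketch swaps in a much simpler nearest-neighbour average purely to show the flow. The `predict_next` function, the feature layout and the sample numbers are all hypothetical.

```python
import math


def predict_next(history, current, k=2):
    """Average the outcomes of the k past inputs closest to `current`.

    `history` is a list of (features, next_price) pairs; both the layout and
    the nearest-neighbour rule are assumptions made for this illustration.
    """
    if not history:
        raise ValueError("history must not be empty")
    # Rank past observations by Euclidean distance to the current input.
    ranked = sorted(history, key=lambda item: math.dist(item[0], current))
    nearest = ranked[:k]
    return sum(outcome for _, outcome in nearest) / len(nearest)


# Each past observation: (two recent prices, the price that followed).
past = [
    ((10.0, 10.5), 11.0),
    ((20.0, 20.5), 21.0),
    ((10.2, 10.4), 10.8),
]
prediction = predict_next(past, current=(10.1, 10.5), k=2)
```

A trained model would replace the distance ranking, but the surrounding loop (fetch similar history, evaluate the new input, emit a score) stays the same.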
How can it be addressed?
18. Ingest Transform Sink
SpringXD
Store / Analyze
Fast Data
Distributed Computing
Predict / Machine Learning
Other Sources and Destinations
JMS
Streaming real-time analytics architecture
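The ingest, transform and sink stages above can be pictured as three composable steps. The following is a minimal Python sketch of that shape using generators; it does not use Spring XD, and the tick format, the `emc` symbol and the dict standing in for the in-memory store are assumptions.

```python
import json


def ingest(lines):
    """Source: parse raw JSON ticks (the tick format is hypothetical)."""
    for line in lines:
        yield json.loads(line)


def transform(ticks):
    """Processor: keep only the fields the store needs, normalising types."""
    for tick in ticks:
        yield {"symbol": tick["symbol"].upper(), "price": float(tick["price"])}


def sink(ticks, store):
    """Sink: write each tick into a dict standing in for the in-memory store."""
    for tick in ticks:
        store.setdefault(tick["symbol"], []).append(tick["price"])
    return store


raw = ['{"symbol": "emc", "price": "25.1"}', '{"symbol": "emc", "price": 25.3}']
store = sink(transform(ingest(raw)), {})
```

Because each stage only consumes an iterator and yields records, a stage can be swapped or a new one inserted without touching the others, which is the property the pipeline architecture relies on.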
20. SpringXD
INGEST / SINK PROCESS ANALYZE
• Little or no coding required
• Dozens of built-in connectors
• Seamless integration with Kafka, Sqoop
• Create new connectors easily using Spring
• Call Spark, Reactor or RxJava
• Built-in configurable filtering, splitting and transformation
• Out-of-box configurable jobs for batch processing
• Import and invoke PMML jobs easily
• Call Python, R, Madlib and other tools
• Built-in configurable counters and gauges
Data Stream Pipelining
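To make the filtering, counter and gauge ideas above concrete, here is a small Python sketch of what such stream metrics compute. It is not Spring XD's implementation or API; the `Gauge` class, the `process` function and the sample ticks are hypothetical.

```python
from collections import Counter


class Gauge:
    """Holds the most recent value and a running mean (hypothetical helper)."""

    def __init__(self):
        self.value = None
        self.count = 0
        self.mean = 0.0

    def set(self, value):
        self.value = value
        self.count += 1
        # Incremental mean avoids storing every observation.
        self.mean += (value - self.mean) / self.count


def process(ticks, min_price=0.0):
    """Filter out invalid ticks, count ticks per symbol, and gauge the price."""
    per_symbol = Counter()
    price_gauge = Gauge()
    for symbol, price in ticks:
        if price <= min_price:  # filter step: drop non-positive prices
            continue
        per_symbol[symbol] += 1
        price_gauge.set(price)
    return per_symbol, price_gauge


counts, gauge = process([("EMC", 25.0), ("VMW", 80.0), ("EMC", -1.0), ("EMC", 27.0)])
```

Both metrics update in constant time per tick, so they can sit on the stream without slowing ingestion.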
24. Partitioned Regions
GemFire Server1 (Region A, Region B): Primary buckets 0 2 4 6, Redundant buckets 1 3 5 7
GemFire Server2 (Region A, Region B): Primary buckets 1 3 5 7, Redundant buckets 0 2 4 6
Primary-to-redundant replication
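The bucket layout on this slide can be sketched in a few lines of Python. This is an illustration only, not how GemFire or Geode routes keys internally: the CRC32 hash, the server names and both functions are assumptions, and the sketch assumes the first Primary/Redundant pair on the slide belongs to Server1.

```python
import zlib

NUM_BUCKETS = 8
SERVERS = ["server1", "server2"]


def bucket_for(key):
    """Map a key to a bucket with a stable hash (CRC32 is an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


def placement(bucket):
    """Return (primary, redundant) servers for a bucket.

    Even buckets are primary on server1 and odd buckets on server2, mirroring
    the slide; the redundant copy always lives on the other server.
    """
    primary = SERVERS[bucket % len(SERVERS)]
    redundant = SERVERS[(bucket + 1) % len(SERVERS)]
    return primary, redundant


layout = {b: placement(b) for b in range(NUM_BUCKETS)}
```

Since every bucket has its redundant copy on a different server from its primary, losing one server still leaves a full copy of the data on the other.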