IMCSummit 2015 - Day 2 Developer Track - Implementing a Highly Scalable In-Memory Stock Prediction System with Apache Geode, R and Spring XD

© Copyright 2014 Pivotal. All rights reserved.
A Stock Prediction System using open-source software
Fred Melo
fmelo@pivotal.io
@fredmelo_br
William Markito
wmarkito@pivotal.io
@william_markito

It's all about DATA
Data Sources
Look for patterns
Prediction

Machine Learning is the answer
Neural Networks
Clustering
Genetic Algorithms

Applying Machine Learning
Train with historical dataset
Apply model to the new input
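
The two steps on this slide (train on history, then apply to new input) can be sketched as below. This is a minimal illustration only: ordinary least-squares linear regression stands in for the neural network the deck actually uses, and the price history is hypothetical.

```python
# Train/apply sketch. Assumption: a least-squares line y = slope * x + intercept
# replaces the deck's neural network; the data below is made up for illustration.

def train(xs, ys):
    """Fit slope and intercept to historical (x, y) pairs by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("need at least two distinct x values")
    slope = cov / var
    return slope, mean_y - slope * mean_x


def apply_model(model, x):
    """Score a new input with a previously trained model."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical historical dataset: day index -> closing price.
model = train([1, 2, 3, 4], [10.0, 12.0, 14.0, 16.0])
print(apply_model(model, 5))  # 18.0
```

Training is the expensive, occasional step; applying the model is cheap and can run on every incoming event.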

Why so hard?
Hard to add new data sources
Hard to scale
Hard to make it real-time

Traditional models are reactive and static
HDFS
Data Lake
Store Analytics
Hard to change
Labor intensive
Inefficient
No real-time information
ETL based
Data-source specific

Stream-based, real-time closed-loop analytics are needed
HDFS
Data Lake
Expert System /
Machine Learning
In-Memory Real-Time Data
Continuous Learning
Continuous Improvement
Continuous Adapting
Data Stream Pipeline
Multiple Data Sources
Real-Time Processing
Store Everything
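
The closed loop on this slide (score in real time, then learn continuously from what actually happened) can be sketched as an online model. This is an assumption-laden sketch, not the deck's implementation: a linear model updated by stochastic gradient descent stands in for the neural network, and the event stream and learning rate are hypothetical.

```python
# Closed-loop sketch. Assumption: an online linear model trained by SGD
# replaces the deck's neural network; events are (features, actual) pairs.

class OnlineModel:
    """Linear model updated one event at a time by stochastic gradient descent."""

    def __init__(self, n_features, learning_rate=0.01):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        return self.bias + sum(w * f for w, f in zip(self.weights, features))

    def learn(self, features, actual):
        """One SGD step on squared error, using the outcome observed after scoring."""
        error = self.predict(features) - actual
        self.bias -= self.learning_rate * error
        self.weights = [w - self.learning_rate * error * f
                        for w, f in zip(self.weights, features)]


def closed_loop(model, events):
    """Score each event in real time, then feed its actual outcome back into the model."""
    errors = []
    for features, actual in events:
        score = model.predict(features)     # real-time prediction
        errors.append(abs(score - actual))  # compare with the outcome once it is known
        model.learn(features, actual)       # continuous learning: close the loop
    return errors


# Hypothetical stream in which the true relationship is actual = 2 * x.
stream = [([1.0], 2.0), ([2.0], 4.0), ([3.0], 6.0)] * 200
model = OnlineModel(n_features=1, learning_rate=0.05)
errors = closed_loop(model, stream)
print(errors[0], errors[-1])
```

Because the model never stops learning, it keeps adapting when the underlying pattern drifts, which a batch model retrained from an ETL'd data lake cannot do between runs.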

How can it be addressed?
Info
Analysis
Look at past trends (for similar input)
Evaluate current input
Score / Predict
Neural Network
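
"Look at past trends for similar input" can be made concrete with a nearest-neighbour lookup. To be clear, k-nearest neighbours is a swapped-in technique used only to illustrate the idea; the deck scores with a neural network. The feature layout and history below are hypothetical.

```python
import math


def knn_predict(history, current, k=3):
    """Average the outcomes of the k past inputs most similar to `current`.

    history: list of (feature_vector, outcome) pairs from earlier periods.
    Similarity is plain Euclidean distance between feature vectors.
    """
    nearest = sorted(history, key=lambda item: math.dist(item[0], current))[:k]
    return sum(outcome for _, outcome in nearest) / len(nearest)


# Hypothetical history: ([price change, volume change], next-day price).
history = [
    ([1.0, 1.0], 10.0),
    ([1.1, 0.9], 12.0),
    ([5.0, 5.0], 50.0),
    ([0.9, 1.2], 11.0),
]
print(knn_predict(history, [1.0, 1.0], k=3))  # 11.0
```

The outlier ([5.0, 5.0], 50.0) is ignored because it is not similar to the current input.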

Info
Analysis
Filter
[ json ]
Neural Network
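
The Filter step on JSON input can be sketched as a generator that drops malformed or irrelevant messages before they reach the model. The message fields `symbol` and `price` are assumptions; the deck does not show its payload format.

```python
import json


def filter_quotes(raw_messages, symbol):
    """Yield parsed quotes for `symbol`, dropping malformed or irrelevant messages.

    Assumed (hypothetical) message shape: {"symbol": "...", "price": <number>}.
    """
    for raw in raw_messages:
        try:
            quote = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed payload: drop it
        if not isinstance(quote, dict):
            continue  # valid JSON but not an object
        price = quote.get("price")
        if quote.get("symbol") == symbol and isinstance(price, (int, float)) and price > 0:
            yield quote


messages = [
    '{"symbol": "TSLA", "price": 250.5}',
    '{"symbol": "AAPL", "price": 120.0}',
    'not json',
    '{"symbol": "TSLA", "price": -1}',
]
print(list(filter_quotes(messages, "TSLA")))
```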

Info
Analysis
Filter Enrich
Neural Network
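
The Enrich step adds information the raw message does not carry. As a hypothetical example, the sketch below attaches a simple moving average of recent prices to each quote; the field names and window size are assumptions.

```python
from collections import deque


def enrich_with_moving_average(quotes, window=3):
    """Attach a simple moving average of the last `window` prices to each quote.

    Field names "price" and "moving_avg" are illustrative assumptions.
    """
    recent = deque(maxlen=window)  # keeps only the most recent `window` prices
    for quote in quotes:
        recent.append(quote["price"])
        enriched = dict(quote)  # copy so the original message is left untouched
        enriched["moving_avg"] = sum(recent) / len(recent)
        yield enriched


quotes = [{"symbol": "TSLA", "price": p} for p in (10, 20, 30, 40)]
enriched = list(enrich_with_moving_average(quotes))
print([q["moving_avg"] for q in enriched])
```

Keeping a bounded window in memory lets enrichment run per event without querying historical storage.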

Info
Analysis
Neural Network
Filter Enrich Transform
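
The Transform step turns an enriched message into the numeric input a neural network expects. A minimal sketch, assuming min-max scaling into [0, 1] with fixed, hypothetical bounds; the deck does not state how its inputs are normalized.

```python
def to_feature_vector(quote, fields=("price", "moving_avg"), low=0.0, high=100.0):
    """Transform an enriched quote into a min-max scaled vector clipped to [0, 1].

    The field list and the [low, high] bounds are illustrative assumptions.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    span = high - low
    return [min(max((quote[f] - low) / span, 0.0), 1.0) for f in fields]


print(to_feature_vector({"price": 25.0, "moving_avg": 50.0}))
```

Scaling inputs to a common range keeps one large-valued feature from dominating the network's weighted sums.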

Info
Analysis
Neural Network

Info
Analysis
Transform
Neural Network

Neural Network
In-Memory Data Grid
Real-time scoring
Train
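
Real-time scoring inside the data grid, with periodic training from the data held in memory, can be sketched as a region that scores every write. `ScoringRegion` and `train_baseline` are hypothetical names, not Geode API; in Geode a comparable hook would plausibly be a `CacheListener` on the region.

```python
class ScoringRegion:
    """Hypothetical in-memory region that scores every put with the current model."""

    def __init__(self, model):
        self.model = model  # any callable: features -> score
        self.entries = {}   # key -> (features, score)

    def put(self, key, features):
        score = self.model(features)  # real-time scoring on the write path
        self.entries[key] = (features, score)
        return score

    def retrain(self, train_fn):
        """Rebuild the model from everything currently held in memory."""
        history = [features for features, _ in self.entries.values()]
        self.model = train_fn(history)


def train_baseline(history):
    """Toy trainer (assumption): score = first feature minus its historical mean."""
    mean_first = sum(f[0] for f in history) / len(history)
    return lambda features: features[0] - mean_first


region = ScoringRegion(model=lambda features: 0.0)
region.put("day-1", [10.0])
region.put("day-2", [20.0])
region.retrain(train_baseline)
print(region.put("day-3", [30.0]))
```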

Neural Network
In-Memory Data Grid
Front-end
Update Push

Streaming real-time analytics architecture
Ingest Transform Sink
SpringXD
Store / Analyze
Fast Data
Distributed Computing
Predict / Machine Learning
Other Sources and Destinations
JMS
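
The ingest, transform, sink shape of this architecture can be sketched with chained Python generators, loosely analogous to the `source | processor | sink` composition of a Spring XD stream definition. The function names and the list used as a store are hypothetical stand-ins for HTTP/JMS sources and the data grid.

```python
def ingest(lines):
    """Source: emit raw messages (an in-memory list stands in for HTTP or JMS)."""
    yield from lines


def transform(messages):
    """Processor: normalize each message."""
    for message in messages:
        yield message.strip().upper()


def sink(messages, store):
    """Sink: write each message to a store (a list stands in for the data grid)."""
    for message in messages:
        store.append(message)
    return store


store = sink(transform(ingest([" tsla ", "aapl"])), [])
print(store)  # ['TSLA', 'AAPL']
```

Because each stage only consumes an iterator and produces one, stages can be swapped or inserted without touching their neighbours.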

Transform Sink
SpringXD
Extensible
Open-Source
Fault-Tolerant
Horizontally Scalable
Cloud-Native
HTTP
Machine Learning
Fast Data
Filter
Predict Sink
HTTP
Split
Dashboard
Push
Demo Architecture

SpringXD: Data Stream Pipelining
INGEST / SINK | PROCESS | ANALYZE
• Little or no coding required
• Dozens of built-in connectors
• Seamless integration with Kafka and Sqoop
• Create new connectors easily using Spring
• Call Spark, Reactor or RxJava
• Built-in configurable filtering, splitting and transformation
• Out-of-the-box configurable jobs for batch processing
• Import and invoke PMML models easily
• Call Python, R, MADlib and other tools
• Built-in configurable counters and gauges
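
The counters and gauges in the list above can be illustrated with a small sketch of a counter and a "rich" gauge that tracks the latest value with running min, max and mean. This shows the idea only; it is not Spring XD's implementation, and the class names are hypothetical.

```python
class Counter:
    """Counts events seen on a stream."""

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1


class RichGauge:
    """Tracks the latest value plus running min, max and mean."""

    def __init__(self):
        self.value = None
        self.min = None
        self.max = None
        self.count = 0
        self.total = 0.0

    def set(self, value):
        self.value = value
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        self.count += 1
        self.total += value

    @property
    def mean(self):
        return self.total / self.count if self.count else None


counter, gauge = Counter(), RichGauge()
for price in (10, 30, 20):  # hypothetical prices flowing through a stream
    counter.increment()
    gauge.set(price)
print(counter.count, gauge.value, gauge.min, gauge.max, gauge.mean)
```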

SpringXD
Scale-Out and HA Architecture
XD admin
XD Nodes | XD Nodes | XD Nodes | XD Nodes
Stream: Ingest → Split → Filter → Transform → Sink
Deployment
Messaging
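
When a stream is scaled out across several XD nodes, a common approach is to partition messages by key so that, for example, all ticks for one symbol reach the same module instance and stateful steps (such as a moving average) stay correct. A minimal sketch, assuming hash partitioning on a hypothetical `symbol` field; it models the idea, not Spring XD's deployment configuration.

```python
import zlib


def partition_for(key, n_partitions):
    """Stable hash partitioning: the same key always maps to the same instance.

    crc32 is used instead of hash() because Python's str hash is randomized per process.
    """
    return zlib.crc32(key.encode("utf-8")) % n_partitions


def route(messages, n_partitions, key_field="symbol"):
    """Distribute messages across instances, preserving per-key order."""
    partitions = [[] for _ in range(n_partitions)]
    for message in messages:
        partitions[partition_for(message[key_field], n_partitions)].append(message)
    return partitions


messages = [{"symbol": "TSLA", "price": p} for p in (1, 2, 3)] + [{"symbol": "AAPL", "price": 4}]
partitions = route(messages, 4)
print([len(p) for p in partitions])
```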

Geode client-server architecture
GemFire Server (Partitioned Region)
GemFire Server (Partitioned Region)
GemFire Locator
GemFire Client (Local Cache, Connection pool)
Send address and load information to locator.
Send, receive cache data.
Receive server events.
Request server information from locator.
Locator responds with least loaded server address.
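
The locator's role in this diagram (servers report load, clients ask which server to use) can be sketched as follows. `Locator` and its methods are hypothetical and only model the behaviour described on the slide; they are not Geode's API. The addresses use Geode's default cache server port, 40404, purely as example values.

```python
class Locator:
    """Hypothetical locator: servers report load, clients get the least loaded server."""

    def __init__(self):
        self.loads = {}  # server address -> most recently reported load

    def report_load(self, address, load):
        """Called by each server to send its address and current load."""
        self.loads[address] = load

    def least_loaded_server(self):
        """Answer a client's request for server information."""
        if not self.loads:
            raise LookupError("no servers registered")
        return min(self.loads, key=self.loads.get)


locator = Locator()
locator.report_load("server-a:40404", 0.7)
locator.report_load("server-b:40404", 0.2)
print(locator.least_loaded_server())  # server-b:40404
```

Because load reports keep arriving, the answer changes as servers become busier, which spreads new client connections across the cluster.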

Partitioned Regions
Primary-to-redundant replication
                 Primary    Redundant
GemFire Server1  0 2 4 6    1 3 5 7
GemFire Server2  1 3 5 7    0 2 4 6
Each server hosts Region A and Region B.
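
The primary/redundant layout in this table can be reproduced with a simple placement rule: keys hash into numbered partitions, and each partition's primary and redundant copies go to different servers round-robin. This is a sketch under that assumption; Geode's real bucket hashing and rebalancing are more involved, and the function names are hypothetical.

```python
import zlib


def bucket_for(key, total_buckets):
    """Map a key to a bucket with a stable hash (assumed scheme, not Geode's exact one)."""
    return zlib.crc32(key.encode("utf-8")) % total_buckets


def assign_buckets(total_buckets, servers):
    """Place each bucket's primary and one redundant copy on different servers."""
    if len(servers) < 2:
        raise ValueError("redundancy needs at least two servers")
    layout = {server: {"primary": [], "redundant": []} for server in servers}
    for bucket in range(total_buckets):
        primary = servers[bucket % len(servers)]
        redundant = servers[(bucket + 1) % len(servers)]
        layout[primary]["primary"].append(bucket)
        layout[redundant]["redundant"].append(bucket)
    return layout


layout = assign_buckets(8, ["Server1", "Server2"])
print(layout)
```

If either server fails, every bucket still has one surviving copy on the other server.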

Event handling
GemFire Server: Region A, with a subscription queue for Client 2
GemFire Client 1: Region A (pool-name=ServerPool), pool "ServerPool" (with or without subscriptions enabled)
GemFire Client 2: Region A (pool-name=ServerPool), pool "ServerPool" (with subscriptions enabled, interest registered in X, receiveValues=true)
Distributed System
1. Client 1 creates or updates entry X.
2. The pool propagates the event to the cache server, where the region is updated.
3. The server distributes the event to its peers and also places it into the subscription queue for Client 2.
4. Client 2 receives the event for X.
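
Interest-based event delivery can be sketched as a server that applies each update and queues it only for clients that registered interest in the key. `CacheServer` and its methods are hypothetical; distribution to server peers is omitted, and this models the flow on the slide rather than Geode's API.

```python
from collections import deque


class CacheServer:
    """Hypothetical server: applies updates and queues them for interested clients."""

    def __init__(self):
        self.region = {}
        self.interest = {}  # client id -> set of keys the client registered interest in
        self.queues = {}    # client id -> subscription queue

    def register_interest(self, client_id, key):
        self.interest.setdefault(client_id, set()).add(key)
        self.queues.setdefault(client_id, deque())

    def put(self, key, value):
        self.region[key] = value  # the region on the server is updated
        for client_id, keys in self.interest.items():
            if key in keys:
                self.queues[client_id].append((key, value))  # queued for this subscriber

    def poll(self, client_id):
        """Deliver the next queued event to a client, or None if there is nothing."""
        queue = self.queues.get(client_id)
        return queue.popleft() if queue else None


server = CacheServer()
server.register_interest("client-2", "X")
server.put("X", 42)  # update arriving from client 1's pool
server.put("Y", 7)   # no client registered interest in Y
print(server.poll("client-2"))  # ('X', 42)
```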

Neural Network
Inputs: price (x), medium avg (x), relative strength (x)
Output: medium avg (x+1)
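
The network's inputs and output can be sketched as feature computation followed by a forward pass. Several assumptions apply: "medium avg" is read as a simple moving average, "relative strength" as a non-smoothed relative strength index, and the one-hidden-layer network uses hypothetical, untrained weights. The deck trains its actual network in R.

```python
import math


def moving_average(prices, window):
    """Simple moving average of the last `window` prices."""
    return sum(prices[-window:]) / window


def relative_strength_index(prices, window):
    """Non-smoothed RSI over the last `window` price changes, in [0, 100]."""
    if len(prices) < window + 1:
        raise ValueError("need at least window + 1 prices")
    changes = [b - a for a, b in zip(prices[-window - 1:-1], prices[-window:])]
    gains = sum(c for c in changes if c > 0)
    losses = -sum(c for c in changes if c < 0)
    if losses == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + gains / losses)


def forward(inputs, hidden_weights, output_weights):
    """One hidden layer with tanh activations and a linear output."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in hidden_weights]
    return sum(w * h for w, h in zip(output_weights, hidden))


prices = [10.0, 11.0, 10.0, 12.0]  # hypothetical closing prices up to day x
features = [prices[-1], moving_average(prices, 3), relative_strength_index(prices, 3)]
# Hypothetical weights; a trained network would estimate medium avg (x+1).
print(forward(features, [[0.01, 0.02, 0.001], [-0.01, 0.03, 0.002]], [5.0, 5.0]))
```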

SpringXD
shell - R Transformer
geode-json client
geode-json client
http-client
http-server
obj-to-json
splitter
splitter
Simulator
tap
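
The splitter, obj-to-json and tap modules in this stream can be sketched with generators. This is a hypothetical model: the batch payload format is assumed, and unlike a Spring XD tap (a separate stream listening to an existing one) this tap copies messages in-line to a side channel.

```python
import json


def splitter(batch_json):
    """Split one JSON array payload into individual objects."""
    yield from json.loads(batch_json)


def obj_to_json(objects):
    """Serialize each object back to a JSON string (keys sorted for stable output)."""
    for obj in objects:
        yield json.dumps(obj, sort_keys=True)


def tap(messages, side_channel):
    """Pass messages through unchanged while copying each one to a side channel."""
    for message in messages:
        side_channel.append(message)
        yield message


batch = '[{"symbol": "TSLA", "price": 250.5}, {"symbol": "AAPL", "price": 120.0}]'
tapped = []
out = list(tap(obj_to_json(splitter(batch)), tapped))
print(out)
```

A tap like this lets a dashboard or archive observe the main stream without changing what flows to the prediction step.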

SpringXD
http://projectgeode.org
http://projects.spring.io/spring-xd
https://registry.hub.docker.com/
http://www.r-project.org
