SlideShare a Scribd company logo
1 of 33
AMIDST Toolbox – Flink Forward 1
A Java Toolbox for Scalable Probabilistic Machine Learning
12-14 SEP 2016, Berlin
Ana M. Martínez
Aalborg University
ana@cs.aau.dk
AMIDST Toolbox – Flink Forward 2
Who
are we?
THE AMIDST CONSORTIUM
3AMIDST Toolbox – Flink Forward
4
Running
Use Case
AMIDST Toolbox – Flink Forward
RUNNING USE CASE
5
Predicting Defaulting Clients
Predicts probability a customer will default within 2
years
AMIDST Toolbox – Flink Forward
RUNNING USE CASE
§  Daily data for millions of clients
§  Tons of missing data.
§  Odd distributions.	
6AMIDST Toolbox – Flink Forward
7
Toolbox
presentation
AMIDST Toolbox – Flink Forward
GENERAL DESCRIPTION
8AMIDST Toolbox – Flink Forward
AMIDST APPROACH
9
Data Knowledge
Openbox Models
Blackbox Inference Engine
(Powered by Flink)
	
AMIDST Toolbox – Flink Forward
10
Main
Features
AMIDST Toolbox – Flink Forward
PGMS
11
Probabilistic graphical models (PGMs)
Specify your model using probabilistic graphical models with
latent variables and temporal dependencies
AMIDST Toolbox – Flink Forward
RUNNING USE CASE
12AMIDST Toolbox – Flink Forward
PGMS
13
Custom Gaussian Mixture Model
Hij defines a local mixture.
Hi defines a global mixture.
AMIDST Toolbox – Flink Forward
PGMS
14
RUNNING CODE EXAMPLE
//Set-up Flink session.

final ExecutionEnvironment env =
ExecutionEnvironment.getExecutionEnvironment();



//Load the data stream

String filename = "hdfs://dataFlink_month0.arff";

DataFlink<DataInstance> data =
DataFlinkLoader.loadDataFromFolder(env, filename,
false);



//Build the model

Model model = new CustomGaussianMixture(data.getAttributes());




AMIDST Toolbox – Flink Forward
SCALABLE INFERENCE
15
Scalable Learning
Perform Bayesian inference on your probabilistic models with
powerful approximate and scalable algorithms.
AMIDST Toolbox – Flink Forward
SCALABLE INFERENCE
16
d-VMP Algorithm - Coded as iterative map-reduce task
A state-of-the-art distributed variational message passing
algorithm.
AMIDST Toolbox – Flink Forward
SCALABLE INFERENCE
17
RUNNING CODE EXAMPLE
AMIDST Toolbox – Flink Forward
//Set-up Flink session.

final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();



//Load the data stream

String filename = “hdfs://dataFlink_month0.arff";

DataFlink<DataInstance> data =
DataFlinkLoader.loadDataFromFolder(env, filename, false);



//Build the model

Model model = new CustomGaussianMixture(data.getAttributes());



//Learn the model

model.updateModel(data); 



DATA STREAMS
18
Data Streams
Update your models when new data is available. This makes our
toolbox appropriate for learning from data streams.
AMIDST Toolbox – Flink Forward
DATA STREAMS
19
//Set-up Flink session.

final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();



//Load the data stream

String filename = “hdfs://dataFlink_month0.arff";

DataFlink<DataInstance> data =
DataFlinkLoader.loadDataFromFolder(env, filename, false);



//Build the model

Model model = new CustomGaussianMixture(data.getAttributes());



//Learn the model

model.updateModel(data); 



//Update your model

for(int i=1; i<12; i++) {

filename = “dataFlink_month"+i+".arff";

data = DataFlinkLoader.loadDataFromFolder(env, filename,false);

model.updateModel(data);



}
RUNNING CODE EXAMPLE
AMIDST Toolbox – Flink Forward
RUNNING USE CASE
20
Predicting Defaulting Clients
§  Old BCC’s models based on logistic regression got an AUC of 0.816.
§  AMIDST’s models gets an AUC of 0.952.
AMIDST Toolbox – Flink Forward
SCALABILITY ANALYSIS
21
Scalability analysis
Use your defined models to process massive data sets in a
distributed computer cluster using Flink.
AMIDST Toolbox – Flink Forward
SCALABILITY ANALYSIS
22
One billion node probabilistic model
Experiment on a Flink cluster with 16 nodes on AWS.
AMIDST Toolbox – Flink Forward
SCALABILITY ANALYSIS
§  Speedup	(with	respect	to	2	nodes)	
AMIDST Toolbox – Flink Forward 23/32
0	
1	
2	
3	
4	
5	
6	
7	
8	
4	nodes	 8	nodes	 16	nodes
MODULAR DESIGN
24
Modular Design
The AMIDST Toolbox has been designed following a modular structure.
This makes easier:
§  The maintenance and enhancement of the software
§  The integration with external software: HUGIN, MOA, Weka, R.
AMIDST Toolbox – Flink Forward
MODULAR DESIGN
25AMIDST Toolbox – Flink Forward
26
Running
Use Case II
AMIDST Toolbox – Flink Forward
CONCEPT DRIFT DETECTION
27
Tracking Concept Drift
Detects changes in customer profiles during Spanish financial crisis
AMIDST Toolbox – Flink Forward
CONCEPT DRIFT DETECTION
28
Hidden Variables are used to capture changes in customer profile
MODEL
AMIDST Toolbox – Flink Forward
CONCEPT DRIFT DETECTION
29
RUNNING CODE
AMIDST Toolbox – Flink Forward
//Set-up Flink session.

final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();



//Load the data stream

String filename = “hdfs://dataFlink_month0.arff";

DataFlink<DataInstance> data =
DataFlinkLoader.loadDataFromFolder(env, filename, false);



//Build the model

Model model = new ConceptDriftDetector(data.getAttributes());



//Learn the model

model.updateModel(data); 



//Update your model

for(int i=1; i<12; i++) {

filename = “dataFlink_month"+i+".arff";

data = DataFlinkLoader.loadDataFromFolder(env, filename,false);

model.updateModel(data);
System.out.println(model.getPosteriorDistribution(“hiddenVar”).
toString());

}
CONCEPT DRIFT DETECTION
30
Hidden Variable Captures Concept Drift
Drift Pattern: Seasonal + Global trend
RESULTS
AMIDST Toolbox – Flink Forward
CONCEPT DRIFT DETECTION
31
Unemployment Rate main driver of Concept Drift
Hidden Variable correlates with unemployment rate (rho = 0.961)
RESULTS
AMIDST Toolbox – Flink Forward
COLLABORATE
32
www.amidsttoolbox.com github.com/amidst/toolbox
Appache
License 2.0
AMIDST Toolbox – Flink Forward
AMIDST Toolbox – Flink Forward 33
Thanks for your
attention
			@	 contact@amidsttoolbox.com
@AmidstToolbox
			
www	 www.amidsttoolbox.com

More Related Content

Viewers also liked

Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkSanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkFlink Forward
 
Eron Wright - Flink Security Enhancements
Eron Wright - Flink Security EnhancementsEron Wright - Flink Security Enhancements
Eron Wright - Flink Security EnhancementsFlink Forward
 
Aljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkAljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkFlink Forward
 
Zoltán Zvara - Advanced visualization of Flink and Spark jobs

Zoltán Zvara - Advanced visualization of Flink and Spark jobs
Zoltán Zvara - Advanced visualization of Flink and Spark jobs

Zoltán Zvara - Advanced visualization of Flink and Spark jobs
Flink Forward
 
Jamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache FlinkJamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache FlinkFlink Forward
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Flink Forward
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamFlink Forward
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Flink Forward
 
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Flink Forward
 
Stephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereStephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereFlink Forward
 
Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink- Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink- Flink Forward
 
Stephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large StateStephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large StateFlink Forward
 
Flink Case Study: Amadeus
Flink Case Study: AmadeusFlink Case Study: Amadeus
Flink Case Study: AmadeusFlink Forward
 
Thomas Lamirault_Mohamed Amine Abdessemed -A brief history of time with Apac...
Thomas Lamirault_Mohamed Amine Abdessemed  -A brief history of time with Apac...Thomas Lamirault_Mohamed Amine Abdessemed  -A brief history of time with Apac...
Thomas Lamirault_Mohamed Amine Abdessemed -A brief history of time with Apac...Flink Forward
 
Analysis of massive data using R (CAEPIA2015)
Analysis of massive data using R (CAEPIA2015)Analysis of massive data using R (CAEPIA2015)
Analysis of massive data using R (CAEPIA2015)AMIDST Toolbox
 
Gábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
Gábor Horváth - Code Generation in Serializers and Comparators of Apache FlinkGábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
Gábor Horváth - Code Generation in Serializers and Comparators of Apache FlinkFlink Forward
 
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Flink Forward
 
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...Flink Forward
 
Gyula Fóra - RBEA- Scalable Real-Time Analytics at King
Gyula Fóra - RBEA- Scalable Real-Time Analytics at KingGyula Fóra - RBEA- Scalable Real-Time Analytics at King
Gyula Fóra - RBEA- Scalable Real-Time Analytics at KingFlink Forward
 
Automatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia KalavriAutomatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia KalavriFlink Forward
 

Viewers also liked (20)

Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkSanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
 
Eron Wright - Flink Security Enhancements
Eron Wright - Flink Security EnhancementsEron Wright - Flink Security Enhancements
Eron Wright - Flink Security Enhancements
 
Aljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkAljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache Flink
 
Zoltán Zvara - Advanced visualization of Flink and Spark jobs

Zoltán Zvara - Advanced visualization of Flink and Spark jobs
Zoltán Zvara - Advanced visualization of Flink and Spark jobs

Zoltán Zvara - Advanced visualization of Flink and Spark jobs

 
Jamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache FlinkJamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache Flink
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
 
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
 
Stephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereStephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink Everywhere
 
Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink- Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink-
 
Stephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large StateStephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large State
 
Flink Case Study: Amadeus
Flink Case Study: AmadeusFlink Case Study: Amadeus
Flink Case Study: Amadeus
 
Thomas Lamirault_Mohamed Amine Abdessemed -A brief history of time with Apac...
Thomas Lamirault_Mohamed Amine Abdessemed  -A brief history of time with Apac...Thomas Lamirault_Mohamed Amine Abdessemed  -A brief history of time with Apac...
Thomas Lamirault_Mohamed Amine Abdessemed -A brief history of time with Apac...
 
Analysis of massive data using R (CAEPIA2015)
Analysis of massive data using R (CAEPIA2015)Analysis of massive data using R (CAEPIA2015)
Analysis of massive data using R (CAEPIA2015)
 
Gábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
Gábor Horváth - Code Generation in Serializers and Comparators of Apache FlinkGábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
Gábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
 
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
 
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...
 
Gyula Fóra - RBEA- Scalable Real-Time Analytics at King
Gyula Fóra - RBEA- Scalable Real-Time Analytics at KingGyula Fóra - RBEA- Scalable Real-Time Analytics at King
Gyula Fóra - RBEA- Scalable Real-Time Analytics at King
 
Automatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia KalavriAutomatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia Kalavri
 

Similar to Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with Flink

Elastic Stack @ Swisscom Application Cloud
Elastic Stack @ Swisscom Application CloudElastic Stack @ Swisscom Application Cloud
Elastic Stack @ Swisscom Application CloudLucas Bremgartner
 
DECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with StormDECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with StormMike Lohmann
 
databricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringdatabricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringMohamed MEJDOUBI
 
Chapter9 network managment-3ed
Chapter9 network managment-3edChapter9 network managment-3ed
Chapter9 network managment-3edKhánh Ghẻ
 
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...Paul Hofmann
 
OpenDaylight app development tutorial
OpenDaylight app development tutorialOpenDaylight app development tutorial
OpenDaylight app development tutorialSDN Hub
 
Projects on Cloud Computing
Projects on Cloud ComputingProjects on Cloud Computing
Projects on Cloud ComputingPhdtopiccom
 
What’s New in Syncsort Ironstream 2.1
What’s New in Syncsort Ironstream 2.1What’s New in Syncsort Ironstream 2.1
What’s New in Syncsort Ironstream 2.1Precisely
 
Dataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayDataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayJosef Adersberger
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Jim Dowling
 
Bringing NetApp Data ONTAP & Apache CloudStack Together
Bringing NetApp Data ONTAP & Apache CloudStack TogetherBringing NetApp Data ONTAP & Apache CloudStack Together
Bringing NetApp Data ONTAP & Apache CloudStack TogetherDavid La Motta
 
DDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin MeetingDDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin MeetingJaime Martin Losa
 
Get the most out of Oracle Data Guard - OOW version
Get the most out of Oracle Data Guard - OOW versionGet the most out of Oracle Data Guard - OOW version
Get the most out of Oracle Data Guard - OOW versionLudovico Caldara
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultDataWorks Summit
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform Seldon
 
Computernetworkingkurosech9 091011003335-phpapp01
Computernetworkingkurosech9 091011003335-phpapp01Computernetworkingkurosech9 091011003335-phpapp01
Computernetworkingkurosech9 091011003335-phpapp01AislanSoares
 
How to use source control with apex?
How to use source control with apex?How to use source control with apex?
How to use source control with apex?Oliver Lemm
 
Open Cloud Storage @ OpenStack Summit Paris
Open Cloud Storage @ OpenStack Summit ParisOpen Cloud Storage @ OpenStack Summit Paris
Open Cloud Storage @ OpenStack Summit Parisit-novum
 

Similar to Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with Flink (20)

Elastic Stack @ Swisscom Application Cloud
Elastic Stack @ Swisscom Application CloudElastic Stack @ Swisscom Application Cloud
Elastic Stack @ Swisscom Application Cloud
 
DECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with StormDECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with Storm
 
Smartblitzmerker
SmartblitzmerkerSmartblitzmerker
Smartblitzmerker
 
databricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringdatabricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineering
 
Chapter9 network managment-3ed
Chapter9 network managment-3edChapter9 network managment-3ed
Chapter9 network managment-3ed
 
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
 
OpenDaylight app development tutorial
OpenDaylight app development tutorialOpenDaylight app development tutorial
OpenDaylight app development tutorial
 
Projects on Cloud Computing
Projects on Cloud ComputingProjects on Cloud Computing
Projects on Cloud Computing
 
What’s New in Syncsort Ironstream 2.1
What’s New in Syncsort Ironstream 2.1What’s New in Syncsort Ironstream 2.1
What’s New in Syncsort Ironstream 2.1
 
Dataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayDataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice Way
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning
 
Bringing NetApp Data ONTAP & Apache CloudStack Together
Bringing NetApp Data ONTAP & Apache CloudStack TogetherBringing NetApp Data ONTAP & Apache CloudStack Together
Bringing NetApp Data ONTAP & Apache CloudStack Together
 
DDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin MeetingDDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin Meeting
 
Get the most out of Oracle Data Guard - OOW version
Get the most out of Oracle Data Guard - OOW versionGet the most out of Oracle Data Guard - OOW version
Get the most out of Oracle Data Guard - OOW version
 
E-GEN iCAN
E-GEN iCANE-GEN iCAN
E-GEN iCAN
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at Renault
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
 
Computernetworkingkurosech9 091011003335-phpapp01
Computernetworkingkurosech9 091011003335-phpapp01Computernetworkingkurosech9 091011003335-phpapp01
Computernetworkingkurosech9 091011003335-phpapp01
 
How to use source control with apex?
How to use source control with apex?How to use source control with apex?
How to use source control with apex?
 
Open Cloud Storage @ OpenStack Summit Paris
Open Cloud Storage @ OpenStack Summit ParisOpen Cloud Storage @ OpenStack Summit Paris
Open Cloud Storage @ OpenStack Summit Paris
 

More from Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 

More from Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 

Recently uploaded

Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
办理(UC毕业证书)堪培拉大学毕业证成绩单原版一比一
办理(UC毕业证书)堪培拉大学毕业证成绩单原版一比一办理(UC毕业证书)堪培拉大学毕业证成绩单原版一比一
办理(UC毕业证书)堪培拉大学毕业证成绩单原版一比一z xss
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
SWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptxSWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptxviniciusperissetr
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 

Recently uploaded (20)

Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
办美国加州大学伯克利分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
办理(UC毕业证书)堪培拉大学毕业证成绩单原版一比一
办理(UC毕业证书)堪培拉大学毕业证成绩单原版一比一办理(UC毕业证书)堪培拉大学毕业证成绩单原版一比一
办理(UC毕业证书)堪培拉大学毕业证成绩单原版一比一
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
SWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptxSWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptx
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 

Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with Flink

  • 1. AMIDST Toolbox – Flink Forward 1 A Java Toolbox for Scalable Probabilistic Machine Learning 12-14 SEP 2016, Berlin Ana M. Martínez Aalborg University ana@cs.aau.dk
  • 2. AMIDST Toolbox – Flink Forward 2 Who are we?
  • 3. THE AMIDST CONSORTIUM 3AMIDST Toolbox – Flink Forward
  • 5. RUNNING USE CASE 5 Predicting Defaulting Clients Predicts probability a customer will default within 2 years AMIDST Toolbox – Flink Forward
  • 6. RUNNING USE CASE §  Daily data for millions of clients §  Tons of missing data. §  Odd distributions. 6AMIDST Toolbox – Flink Forward
  • 9. AMIDST APPROACH 9 Data Knowledge Openbox Models Blackbox Inference Engine (Powered by Flink) AMIDST Toolbox – Flink Forward
  • 11. PGMS 11 Probabilistic graphical models (PGMs) Specify your model using probabilistic graphical models with latent variables and temporal dependencies AMIDST Toolbox – Flink Forward
  • 12. RUNNING USE CASE 12AMIDST Toolbox – Flink Forward
  • 13. PGMS 13 Custom Gaussian Mixture Model Hij defines a local mixture. Hi defines a global mixture. AMIDST Toolbox – Flink Forward
  • 14. PGMS 14 RUNNING CODE EXAMPLE //Set-up Flink session.
 final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
 
 //Load the data stream
 String filename = "hdfs://dataFlink_month0.arff";
 DataFlink<DataInstance> data = DataFlinkLoader.loadDataFromFolder(env, filename, false);
 
 //Build the model
 Model model = new CustomGaussianMixture(data.getAttributes()); 
 
 AMIDST Toolbox – Flink Forward
  • 15. SCALABLE INFERENCE 15 Scalable Learning Perform Bayesian inference on your probabilistic models with powerful approximate and scalable algorithms. AMIDST Toolbox – Flink Forward
  • 16. SCALABLE INFERENCE 16 d-VMP Algorithm - Coded as iterative map-reduce task A state-of-the-art distributed variational message passing algorithm. AMIDST Toolbox – Flink Forward
  • 17. SCALABLE INFERENCE 17 RUNNING CODE EXAMPLE AMIDST Toolbox – Flink Forward //Set-up Flink session.
 final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
 
 //Load the data stream
 String filename = “hdfs://dataFlink_month0.arff";
 DataFlink<DataInstance> data = DataFlinkLoader.loadDataFromFolder(env, filename, false);
 
 //Build the model
 Model model = new CustomGaussianMixture(data.getAttributes());
 
 //Learn the model
 model.updateModel(data); 
 

  • 18. DATA STREAMS 18 Data Streams Update your models when new data is available. This makes our toolbox appropriate for learning from data streams. AMIDST Toolbox – Flink Forward
  • 19. DATA STREAMS 19 //Set-up Flink session.
 final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
 
 //Load the data stream
 String filename = “hdfs://dataFlink_month0.arff";
 DataFlink<DataInstance> data = DataFlinkLoader.loadDataFromFolder(env, filename, false);
 
 //Build the model
 Model model = new CustomGaussianMixture(data.getAttributes());
 
 //Learn the model
 model.updateModel(data); 
 
 //Update your model
 for(int i=1; i<12; i++) {
 filename = “dataFlink_month"+i+".arff";
 data = DataFlinkLoader.loadDataFromFolder(env, filename,false);
 model.updateModel(data);
 
 } RUNNING CODE EXAMPLE AMIDST Toolbox – Flink Forward
  • 20. RUNNING USE CASE 20 Predicting Defaulting Clients §  Old BCC’s models based on logistic regression got an AUC of 0.816. §  AMIDST’s models gets an AUC of 0.952. AMIDST Toolbox – Flink Forward
  • 21. SCALABILITY ANALYSIS 21 Scalability analysis Use your defined models to process massive data sets in a distributed computer cluster using Flink. AMIDST Toolbox – Flink Forward
  • 22. SCALABILITY ANALYSIS 22 One billion node probabilistic model Experiment on a Flink cluster with 16 nodes on AWS. AMIDST Toolbox – Flink Forward
  • 23. SCALABILITY ANALYSIS §  Speedup (with respect to 2 nodes) AMIDST Toolbox – Flink Forward 23/32 0 1 2 3 4 5 6 7 8 4 nodes 8 nodes 16 nodes
  • 24. MODULAR DESIGN 24 Modular Design The AMIDST Toolbox has been designed following a modular structure. This makes easier: §  The maintenance and enhancement of the software §  The integration with external software: HUGIN, MOA, Weka, R. AMIDST Toolbox – Flink Forward
  • 25. MODULAR DESIGN 25AMIDST Toolbox – Flink Forward
  • 26. 26 Running Use Case II AMIDST Toolbox – Flink Forward
  • 27. CONCEPT DRIFT DETECTION 27 Tracking Concept Drift Detects changes in customer profiles during Spanish financial crisis AMIDST Toolbox – Flink Forward
  • 28. CONCEPT DRIFT DETECTION 28 Hidden Variables are used to capture changes in customer profile MODEL AMIDST Toolbox – Flink Forward
  • 29. CONCEPT DRIFT DETECTION 29 RUNNING CODE AMIDST Toolbox – Flink Forward //Set-up Flink session.
 final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
 
 //Load the data stream
 String filename = “hdfs://dataFlink_month0.arff";
 DataFlink<DataInstance> data = DataFlinkLoader.loadDataFromFolder(env, filename, false);
 
 //Build the model
 Model model = new ConceptDriftDetector(data.getAttributes());
 
 //Learn the model
 model.updateModel(data); 
 
 //Update your model
 for(int i=1; i<12; i++) {
 filename = “dataFlink_month"+i+".arff";
 data = DataFlinkLoader.loadDataFromFolder(env, filename,false);
 model.updateModel(data); System.out.println(model.getPosteriorDistribution(“hiddenVar”). toString());
 }
  • 30. CONCEPT DRIFT DETECTION 30 Hidden Variable Captures Concept Drift Drift Pattern: Seasonal + Global trend RESULTS AMIDST Toolbox – Flink Forward
  • 31. CONCEPT DRIFT DETECTION 31 Unemployment Rate main driver of Concept Drift Hidden Variable correlates with unemployment rate (rho = 0.961) RESULTS AMIDST Toolbox – Flink Forward
  • 33. AMIDST Toolbox – Flink Forward 33 Thanks for your attention @ contact@amidsttoolbox.com @AmidstToolbox www www.amidsttoolbox.com