SlideShare a Scribd company logo
1 of 65
Download to read offline
Real-Time Big Data
Stream Analytics
Albert Bifet @abifet
Business Applications of Social Network Analysis (BASNA)
Shenzhen, 14 December 2014
Real time analytics
Real time analytics
Companies need to know what is happening
right now to react and anticipate to detect new
business opportunities through Social Networks
and Real-time Analytics.
Outline
Big Data Stream Analytics
MOA: Massive Online Analysis
SAMOA: Scalable Advanced Massive Online Analysis
Outline
Big Data Stream Analytics
MOA: Massive Online Analysis
SAMOA: Scalable Advanced Massive Online Analysis
Data Streams
Data Streams
• Sequence is potentially infinite
• High amount of data: sublinear space
• High speed of arrival: sublinear time per example
• Once an element from a data stream has been processed
it is discarded or archived
Big Data & Real Time
Data Streams
Approximation algorithms
• Small error rate with high probability
• An algorithm (ε,δ)−approximates F if it outputs ˜F for
which Pr[| ˜F −F| > εF] < δ.
Big Data & Real Time
8 Bits Counter
1 0 1 0 1 0 1 0
What is the largest number we can
store in 8 bits?
8 Bits Counter
What is the largest number we can
store in 8 bits?
8 Bits Counter
0 20 40 60 80 100
0
20
40
60
80
100
x
f (x) = log(1+x)/log(2)
f (0) = 0,f (1) = 1
8 Bits Counter
0 2 4 6 8 10
0
2
4
6
8
10
x
f (x) = log(1+x)/log(2)
f (0) = 0,f (1) = 1
8 Bits Counter
0 2 4 6 8 10
0
2
4
6
8
10
x
f (x) = log(1+x/30)/log(1+1/30)
f (0) = 0,f (1) = 1
8 Bits Counter
0 20 40 60 80 100
0
20
40
60
80
100
x
f (x) = log(1+x/30)/log(1+1/30)
f (0) = 0,f (1) = 1
8 Bits Counter
MORRIS APPROXIMATE COUNTING ALGORITHM
1 Init counter c ← 0
2 for every event in the stream
3 do rand = random number between 0 and 1
4 if rand < p
5 then c ← c +1
What is the largest number we can
store in 8 bits?
8 Bits Counter
MORRIS APPROXIMATE COUNTING ALGORITHM
1 Init counter c ← 0
2 for every event in the stream
3 do rand = random number between 0 and 1
4 if rand < p
5 then c ← c +1
With p = 1/2 we can store 2×256
with standard deviation σ = n/2
8 Bits Counter
MORRIS APPROXIMATE COUNTING ALGORITHM
1 Init counter c ← 0
2 for every event in the stream
3 do rand = random number between 0 and 1
4 if rand < p
5 then c ← c +1
With p = 2−c then E[2c] = n +2 with
variance σ2 = n(n +1)/2
8 Bits Counter
MORRIS APPROXIMATE COUNTING ALGORITHM
1 Init counter c ← 0
2 for every event in the stream
3 do rand = random number between 0 and 1
4 if rand < p
5 then c ← c +1
If p = b−c then E[bc] = n(b −1)+b,
σ2 = (b −1)n(n +1)/2
Data Streams
Example
Puzzle: Finding Missing Numbers
• Let π be a permutation of {1,...,n}.
• Let π−1 be π with one element
missing.
• π−1[i] arrives in increasing order
Task: Determine the missing number
Data Streams
Example
Puzzle: Finding Missing Numbers
• Let π be a permutation of {1,...,n}.
• Let π−1 be π with one element
missing.
• π−1[i] arrives in increasing order
Task: Determine the missing number
Use a n-bit
vector to
memorize all the
numbers (O(n)
space)
Data Streams
Example
Puzzle: Finding Missing Numbers
• Let π be a permutation of {1,...,n}.
• Let π−1 be π with one element
missing.
• π−1[i] arrives in increasing order
Task: Determine the missing number
Data Streams:
O(log(n)) space.
Data Streams
Example
Puzzle: Finding Missing Numbers
• Let π be a permutation of {1,...,n}.
• Let π−1 be π with one element
missing.
• π−1[i] arrives in increasing order
Task: Determine the missing number
Data Streams:
O(log(n)) space.
Store
n(n +1)
2
−∑
j≤i
π−1[j].
Min-wise independent hashing
Similarity: Jaccard Coefficient of A and B
=|A ∩B|/|A ∪B|
Min-wise independent hashing
(Broder et al., 2000)
Let h : Items → [0,1] be a random hash function.
Find the 4 items bought by either Alice or Bob with the
smallest hash values:
1 Find the 4 smallest hash values for Alice, and the 4 for
Bob.
2 Aggregate the results.
3 Estimate the similarity.
Example on estimating the Jaccard similarity in
data streams.
Min-wise independent hashing
1 Find the 4 smallest hash values for Alice, and
the 4 for Bob.
Min-wise independent hashing
2 Aggregate the results
3 Estimate the similarity.
• Among the 4 items with the smallest hash values, only
milk is bought by Alice and Bob, thus the estimated
similarity is 1/4
Example on estimating the Jaccard similarity in
data streams.
Stream Learning of Influence probabilities
Learning influence strength
along links in social networks
• Real-time interactions
• Small amount of time and memory
• Only one pass
Learning influence strength
along links in social networks
Social Graph and Log of Propagation
Learning influence probabilities
• Approximate solutions
• Superlinear space
• Linear space
• Sublinear space
• Landmark and sliding
window
STRIP: Stream Learning of Influence
probabilities
Learning influence probabilities
• The problem of learning the probabilities was first
considered in Goyal et al. WSDM’10.
• In particular the following definition was shown to give a
good prediction of propagation:
Let Au2v(t) be the number of actions that propagated from
user u to user v within time t and Au|v the total number of
actions performed by either u or v. Then
pu,v = Au2v(t)/Au|v
Similar to Jaccard coefficient for estimating the similarity
between sets A and B = |A ∩B|/|A ∪B|.
Outline
Big Data Stream Analytics
MOA: Massive Online Analysis
SAMOA: Scalable Advanced Massive Online Analysis
What is MOA?
{M}assive {O}nline {A}nalysis is a framework for online
learning from data streams.
• It is related to WEKA
• It includes a collection of offline and online as well as
tools for evaluation: classification, regression, clustering,
outlier detection, concept drift detection and
recommender systems
• Easy to extend
• Easy to design and run experiments
History - timeline
WEKA
• 1993 - WEKA : project starts (Ian Witten)
• Mid 1999 - WEKA 3 (100% Java) released
MOA
• Nov 2007 - First public release of MOA: Richard Kirkby,
Geoff Holmes and Bernhard Pfahringer
• 2009 - MOA Concept Drift Classification
• 2010 - MOA Clustering
• 2011 - MOA Graph Mining, Multi-label classification,
Twitter Reader, Active Learning
• 2014 - MOA Outliers, and Recommender System
• 2014 - MOA Concept Drift
WEKA
• Waikato Environment for Knowledge Analysis
• Collection of state-of-the-art machine learning
algorithms and data processing tools implemented in
Java
• Released under the GPL
• Support for the whole process of experimental data
mining
• Preparation of input data
• Statistical evaluation of learning schemes
• Visualization of input data and the result of learning
• Used for education, research and applications
• Complements “Data Mining” by Witten & Frank & Hall
WEKA: the bird
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
Classification Experimental
Setting
Classification Experimental
Setting
Evaluation procedures for Data
Streams
• Holdout
• Interleaved Test-Then-Train
or Prequential
Classification Experimental
Setting
Data Sources
• Random Tree Generator
• Random RBF Generator
• LED Generator
• Waveform Generator
• Hyperplane
• SEA Generator
• STAGGER Generator
Classification Experimental
Setting
Classifiers
• Naive Bayes
• Decision stumps
• Hoeffding Tree
• Hoeffding Option Tree
• Bagging and Boosting
• Random Forests
• ADWIN Bagging and
Leveraging Bagging
• Adaptive Rules
• Logistic Regression, SGD
RAM-Hours
RAM-Hour
Every GB of RAM deployed for 1 hour
Cloud Computing Rental Cost Options
Clustering Experimental Setting
Clustering Experimental Setting
Internal measures External measures
Gamma Rand statistic
C Index Jaccard coefficient
Point-Biserial Folkes and Mallow Index
Log Likelihood Hubert Γ statistics
Dunn’s Index Minkowski score
Tau Purity
Tau A van Dongen criterion
Tau C V-measure
Somer’s Gamma Completeness
Ratio of Repetition Homogeneity
Modified Ratio of Repetition Variation of information
Adjusted Ratio of Clustering Mutual information
Fagan’s Index Class-based entropy
Deviation Index Cluster-based entropy
Z-Score Index Precision
D Index Recall
Silhouette coefficient F-measure
Clustering Experimental Setting
Clusterers
• StreamKM++
• CluStream
• ClusTree
• Den-Stream
• D-Stream
• CobWeb
Web
http://www.moa.cms.waikato.ac.nz
Easy Design of a MOA classifier
• void resetLearningImpl ()
• void trainOnInstanceImpl (Instance inst)
• double[] getVotesForInstance (Instance i)
Easy Design of a MOA clusterer
• void resetLearningImpl ()
• void trainOnInstanceImpl (Instance inst)
• Clustering getClusteringResult()
Extensions of MOA
• Multi-label Classification
• Active Learning
• Regression
• Closed Frequent Graph Mining
• Twitter Sentiment Analysis
Outline
Big Data Stream Analytics
MOA: Massive Online Analysis
SAMOA: Scalable Advanced Massive Online Analysis
SAMOA
SAMOA is distributed streaming machine
learning (ML) framework that contains a
programing abstraction for distributed
streaming ML algorithms.
Hadoop
Hadoop architecture
Apache S4
Apache Storm
Stream, Spout, Bolt, Topology
SAMOA
SAMOA
samoa-SPE
SAMOA
Algorithm and API
SPE-adapter
S4 Storm other SPEs
ML-adapter
MOA
Other ML
frameworks
samoa-S4 samoa-storm samoa-other-SPEs
SAMOA
SAMOA	
  SA
SAMOA ML Developer API
Processing Item
Processor
Stream
SAMOA ML Developer API
Web
http://samoa-project.net/
The Team
Summary
1 To improve benefits, companies need to satisfy
costumers. To do this they need to know
• what potential costumers needs
• social networks
• real-time analytics
2 MOA deals with evolving data streams
3 SAMOA is distributed streaming machine learning
Thanks!
@abifet
http://moa.cms.waikato.ac.nz/
@moadatamining
http://samoa-project.net/
@samoa project

More Related Content

What's hot

MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 Albert Bifet
 
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data StreamsMining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data StreamsAlbert Bifet
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsAlbert Bifet
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAAlbert Bifet
 
Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream miningAlbert Bifet
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsAlbert Bifet
 
Moa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsMoa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsAlbert Bifet
 
Streaming Algorithms
Streaming AlgorithmsStreaming Algorithms
Streaming AlgorithmsJoe Kelley
 
Formalization and Preliminary Evaluation of a Pipeline for Text Extraction Fr...
Formalization and Preliminary Evaluation of a Pipeline for Text Extraction Fr...Formalization and Preliminary Evaluation of a Pipeline for Text Extraction Fr...
Formalization and Preliminary Evaluation of a Pipeline for Text Extraction Fr...Ansgar Scherp
 
Speaker Diarization
Speaker DiarizationSpeaker Diarization
Speaker DiarizationHONGJOO LEE
 
Mining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open DataMining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open DataAnsgar Scherp
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataAlbert Bifet
 
SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud
SchemEX - Creating the Yellow Pages for the Linked Open Data CloudSchemEX - Creating the Yellow Pages for the Linked Open Data Cloud
SchemEX - Creating the Yellow Pages for the Linked Open Data CloudAnsgar Scherp
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLFlink Forward
 
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Andrii Gakhov
 
Data Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RData Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RRadek Maciaszek
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themAlbert Bifet
 
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache FlinkAlbert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache FlinkFlink Forward
 
Sea Amsterdam 2014 November 19
Sea Amsterdam 2014 November 19Sea Amsterdam 2014 November 19
Sea Amsterdam 2014 November 19GoDataDriven
 

What's hot (20)

MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016
 
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data StreamsMining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data Streams
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOA
 
Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream mining
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data Streams
 
Moa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsMoa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data Streams
 
Streaming Algorithms
Streaming AlgorithmsStreaming Algorithms
Streaming Algorithms
 
Formalization and Preliminary Evaluation of a Pipeline for Text Extraction Fr...
Formalization and Preliminary Evaluation of a Pipeline for Text Extraction Fr...Formalization and Preliminary Evaluation of a Pipeline for Text Extraction Fr...
Formalization and Preliminary Evaluation of a Pipeline for Text Extraction Fr...
 
Speaker Diarization
Speaker DiarizationSpeaker Diarization
Speaker Diarization
 
Mining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open DataMining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open Data
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming Data
 
SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud
SchemEX - Creating the Yellow Pages for the Linked Open Data CloudSchemEX - Creating the Yellow Pages for the Linked Open Data Cloud
SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
 
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
 
Data Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RData Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and R
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid them
 
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache FlinkAlbert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet – Apache Samoa: Mining Big Data Streams with Apache Flink
 
Sea Amsterdam 2014 November 19
Sea Amsterdam 2014 November 19Sea Amsterdam 2014 November 19
Sea Amsterdam 2014 November 19
 

Viewers also liked

Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streamsKrish_ver2
 
Toward Semantic Data Stream - Technologies and Applications
Toward Semantic Data Stream - Technologies and ApplicationsToward Semantic Data Stream - Technologies and Applications
Toward Semantic Data Stream - Technologies and ApplicationsRaja Chiky
 
Semantic Discovery and Integration of Urban Data Streams
Semantic Discovery and Integration of Urban Data StreamsSemantic Discovery and Integration of Urban Data Streams
Semantic Discovery and Integration of Urban Data StreamsAli Intizar
 
Survey of Real-time Processing Systems for Big Data
Survey of Real-time Processing Systems for Big DataSurvey of Real-time Processing Systems for Big Data
Survey of Real-time Processing Systems for Big DataLuiz Henrique Zambom Santana
 
Research Seminar Presentation - A framework for partitioning and execution of...
Research Seminar Presentation - A framework for partitioning and execution of...Research Seminar Presentation - A framework for partitioning and execution of...
Research Seminar Presentation - A framework for partitioning and execution of...malinga2009
 
Data Mining of Informational Stream in Social Networks
Data Mining of Informational Stream in Social Networks   Data Mining of Informational Stream in Social Networks
Data Mining of Informational Stream in Social Networks Bohdan Pavlyshenko
 
Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsAntonio Severien
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkAlbert Bifet
 
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...EuroIoTa
 
Using Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with HadoopUsing Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with HadoopJames Kebinger
 
Bigdata analytics-twitter
Bigdata analytics-twitterBigdata analytics-twitter
Bigdata analytics-twitterdfilppi
 
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Rich Heimann
 
Data analytics for monitoring IoT infrastructures by G.Madhusudan, Orange Labs
Data analytics for monitoring IoT infrastructures by G.Madhusudan, Orange LabsData analytics for monitoring IoT infrastructures by G.Madhusudan, Orange Labs
Data analytics for monitoring IoT infrastructures by G.Madhusudan, Orange LabsEuroIoTa
 
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...EuroIoTa
 
Real Time Analytics for Big Data - A twitter inspired case study
Real Time Analytics for Big Data - A twitter inspired case studyReal Time Analytics for Big Data - A twitter inspired case study
Real Time Analytics for Big Data - A twitter inspired case studyUri Cohen
 

Viewers also liked (19)

Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
Toward Semantic Data Stream - Technologies and Applications
Toward Semantic Data Stream - Technologies and ApplicationsToward Semantic Data Stream - Technologies and Applications
Toward Semantic Data Stream - Technologies and Applications
 
Semantic Discovery and Integration of Urban Data Streams
Semantic Discovery and Integration of Urban Data StreamsSemantic Discovery and Integration of Urban Data Streams
Semantic Discovery and Integration of Urban Data Streams
 
Survey of Real-time Processing Systems for Big Data
Survey of Real-time Processing Systems for Big DataSurvey of Real-time Processing Systems for Big Data
Survey of Real-time Processing Systems for Big Data
 
Research Seminar Presentation - A framework for partitioning and execution of...
Research Seminar Presentation - A framework for partitioning and execution of...Research Seminar Presentation - A framework for partitioning and execution of...
Research Seminar Presentation - A framework for partitioning and execution of...
 
Data Mining of Informational Stream in Social Networks
Data Mining of Informational Stream in Social Networks   Data Mining of Informational Stream in Social Networks
Data Mining of Informational Stream in Social Networks
 
Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data Streams
 
Cerrera DINWC2015
Cerrera DINWC2015Cerrera DINWC2015
Cerrera DINWC2015
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache Flink
 
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
Variable neighborhood Prediction of temporal collective profiles by Keun-Woo ...
 
The Big 5- Twitter
The Big 5- TwitterThe Big 5- Twitter
The Big 5- Twitter
 
Using Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with HadoopUsing Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with Hadoop
 
Bigdata analytics-twitter
Bigdata analytics-twitterBigdata analytics-twitter
Bigdata analytics-twitter
 
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
 
Data analytics for monitoring IoT infrastructures by G.Madhusudan, Orange Labs
Data analytics for monitoring IoT infrastructures by G.Madhusudan, Orange LabsData analytics for monitoring IoT infrastructures by G.Madhusudan, Orange Labs
Data analytics for monitoring IoT infrastructures by G.Madhusudan, Orange Labs
 
EU FP7 CityPulse Project
EU FP7 CityPulse ProjectEU FP7 CityPulse Project
EU FP7 CityPulse Project
 
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...
OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, Universit...
 
Real Time Analytics for Big Data - A twitter inspired case study
Real Time Analytics for Big Data - A twitter inspired case studyReal Time Analytics for Big Data - A twitter inspired case study
Real Time Analytics for Big Data - A twitter inspired case study
 

Similar to Real-Time Big Data Stream Analytics

Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsDebasish Ghosh
 
Ted Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SFTed Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SFMLconf
 
Data streaming fundamentals- EUDAT Summer School (Giuseppe Fiameni, CINECA)
Data streaming fundamentals- EUDAT Summer School (Giuseppe Fiameni, CINECA)Data streaming fundamentals- EUDAT Summer School (Giuseppe Fiameni, CINECA)
Data streaming fundamentals- EUDAT Summer School (Giuseppe Fiameni, CINECA)EUDAT
 
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler..."Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...Dataconomy Media
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Codemotion
 
IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079ibankuk
 
Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphsStanka Dalekova
 
Practical deep learning for computer vision
Practical deep learning for computer visionPractical deep learning for computer vision
Practical deep learning for computer visionEran Shlomo
 
"Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler...
"Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler..."Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler...
"Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler...Dataconomy Media
 
Using Graph Analysis and Fraud Detection in the Fintech Industry
Using Graph Analysis and Fraud Detection in the Fintech IndustryUsing Graph Analysis and Fraud Detection in the Fintech Industry
Using Graph Analysis and Fraud Detection in the Fintech IndustryStanka Dalekova
 
Using Graph Analysis and Fraud Detection in the Fintech Industry
Using Graph Analysis and Fraud Detection in the Fintech IndustryUsing Graph Analysis and Fraud Detection in the Fintech Industry
Using Graph Analysis and Fraud Detection in the Fintech IndustryStanka Dalekova
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)University of Washington
 
How Does Math Matter in Data Science
How Does Math Matter in Data ScienceHow Does Math Matter in Data Science
How Does Math Matter in Data ScienceMutia Ulfi
 
Mining Large-Scale Temporal Dynamics with Hadoop
Mining Large-Scale Temporal Dynamics with HadoopMining Large-Scale Temporal Dynamics with Hadoop
Mining Large-Scale Temporal Dynamics with HadoopDataWorks Summit
 

Similar to Real-Time Big Data Stream Analytics (20)

Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming Applications
 
Ted Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SFTed Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SF
 
Data Science At Zillow
Data Science At ZillowData Science At Zillow
Data Science At Zillow
 
Big data
Big dataBig data
Big data
 
Data streaming fundamentals- EUDAT Summer School (Giuseppe Fiameni, CINECA)
Data streaming fundamentals- EUDAT Summer School (Giuseppe Fiameni, CINECA)Data streaming fundamentals- EUDAT Summer School (Giuseppe Fiameni, CINECA)
Data streaming fundamentals- EUDAT Summer School (Giuseppe Fiameni, CINECA)
 
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler..."Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...
"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler...
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
 
IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079
 
Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphs
 
Practical deep learning for computer vision
Practical deep learning for computer visionPractical deep learning for computer vision
Practical deep learning for computer vision
 
"Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler...
"Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler..."Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler...
"Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler...
 
Using Graph Analysis and Fraud Detection in the Fintech Industry
Using Graph Analysis and Fraud Detection in the Fintech IndustryUsing Graph Analysis and Fraud Detection in the Fintech Industry
Using Graph Analysis and Fraud Detection in the Fintech Industry
 
Using Graph Analysis and Fraud Detection in the Fintech Industry
Using Graph Analysis and Fraud Detection in the Fintech IndustryUsing Graph Analysis and Fraud Detection in the Fintech Industry
Using Graph Analysis and Fraud Detection in the Fintech Industry
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Hadoop PDF
Hadoop PDFHadoop PDF
Hadoop PDF
 
How Does Math Matter in Data Science
How Does Math Matter in Data ScienceHow Does Math Matter in Data Science
How Does Math Matter in Data Science
 
Mining Large-Scale Temporal Dynamics with Hadoop
Mining Large-Scale Temporal Dynamics with HadoopMining Large-Scale Temporal Dynamics with Hadoop
Mining Large-Scale Temporal Dynamics with Hadoop
 
Big data
Big dataBig data
Big data
 

More from Albert Bifet

Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataAlbert Bifet
 
Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labelsAlbert Bifet
 
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsPAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsAlbert Bifet
 
MOA : Massive Online Analysis
MOA : Massive Online AnalysisMOA : Massive Online Analysis
MOA : Massive Online AnalysisAlbert Bifet
 
New ensemble methods for evolving data streams
New ensemble methods for evolving data streamsNew ensemble methods for evolving data streams
New ensemble methods for evolving data streamsAlbert Bifet
 
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Albert Bifet
 
Adaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAdaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAlbert Bifet
 
Adaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAdaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAlbert Bifet
 
Mining Implications from Lattices of Closed Trees
Mining Implications from Lattices of Closed TreesMining Implications from Lattices of Closed Trees
Mining Implications from Lattices of Closed TreesAlbert Bifet
 
Kalman Filters and Adaptive Windows for Learning in Data Streams
Kalman Filters and Adaptive Windows for Learning in Data StreamsKalman Filters and Adaptive Windows for Learning in Data Streams
Kalman Filters and Adaptive Windows for Learning in Data StreamsAlbert Bifet
 

More from Albert Bifet (10)

Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labels
 
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsPAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
 
MOA : Massive Online Analysis
MOA : Massive Online AnalysisMOA : Massive Online Analysis
MOA : Massive Online Analysis
 
New ensemble methods for evolving data streams
New ensemble methods for evolving data streamsNew ensemble methods for evolving data streams
New ensemble methods for evolving data streams
 
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
 
Adaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAdaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data Streams
 
Adaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAdaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent Patterns
 
Mining Implications from Lattices of Closed Trees
Mining Implications from Lattices of Closed TreesMining Implications from Lattices of Closed Trees
Mining Implications from Lattices of Closed Trees
 
Kalman Filters and Adaptive Windows for Learning in Data Streams
Kalman Filters and Adaptive Windows for Learning in Data StreamsKalman Filters and Adaptive Windows for Learning in Data Streams
Kalman Filters and Adaptive Windows for Learning in Data Streams
 

Recently uploaded

Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfryanfarris8
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
ManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...kalichargn70th171
 
Pharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodologyPharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodologyAnusha Are
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 

Recently uploaded (20)

Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
ManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide Deck
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
Pharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodologyPharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodology
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 

Real-Time Big Data Stream Analytics

  • 1. Real-Time Big Data Stream Analytics Albert Bifet @abifet Business Applications of Social Network Analysis (BASNA) Shenzhen, 14 December 2014
  • 4. Companies need to know what is happening right now to react and anticipate to detect new business opportunities through Social Networks and Real-time Analytics.
  • 5. Outline Big Data Stream Analytics MOA: Massive Online Analysis SAMOA: Scalable Advanced Massive Online Analysis
  • 6. Outline Big Data Stream Analytics MOA: Massive Online Analysis SAMOA: Scalable Advanced Massive Online Analysis
  • 7. Data Streams Data Streams • Sequence is potentially infinite • High amount of data: sublinear space • High speed of arrival: sublinear time per example • Once an element from a data stream has been processed it is discarded or archived Big Data & Real Time
  • 8. Data Streams Approximation algorithms • Small error rate with high probability • An algorithm (ε,δ)−approximates F if it outputs ˜F for which Pr[| ˜F −F| > εF] < δ. Big Data & Real Time
  • 9. 8 Bits Counter 1 0 1 0 1 0 1 0 What is the largest number we can store in 8 bits?
  • 10. 8 Bits Counter What is the largest number we can store in 8 bits?
  • 11. 8 Bits Counter 0 20 40 60 80 100 0 20 40 60 80 100 x f (x) = log(1+x)/log(2) f (0) = 0,f (1) = 1
  • 12. 8 Bits Counter 0 2 4 6 8 10 0 2 4 6 8 10 x f (x) = log(1+x)/log(2) f (0) = 0,f (1) = 1
  • 13. 8 Bits Counter 0 2 4 6 8 10 0 2 4 6 8 10 x f (x) = log(1+x/30)/log(1+1/30) f (0) = 0,f (1) = 1
  • 14. 8 Bits Counter 0 20 40 60 80 100 0 20 40 60 80 100 x f (x) = log(1+x/30)/log(1+1/30) f (0) = 0,f (1) = 1
  • 15. 8 Bits Counter MORRIS APPROXIMATE COUNTING ALGORITHM 1 Init counter c ← 0 2 for every event in the stream 3 do rand = random number between 0 and 1 4 if rand < p 5 then c ← c +1 What is the largest number we can store in 8 bits?
  • 16. 8 Bits Counter MORRIS APPROXIMATE COUNTING ALGORITHM 1 Init counter c ← 0 2 for every event in the stream 3 do rand = random number between 0 and 1 4 if rand < p 5 then c ← c +1 With p = 1/2 we can store 2×256 with standard deviation σ = n/2
  • 17. 8 Bits Counter MORRIS APPROXIMATE COUNTING ALGORITHM 1 Init counter c ← 0 2 for every event in the stream 3 do rand = random number between 0 and 1 4 if rand < p 5 then c ← c +1 With p = 2−c then E[2c] = n +2 with variance σ2 = n(n +1)/2
  • 18. 8 Bits Counter MORRIS APPROXIMATE COUNTING ALGORITHM 1 Init counter c ← 0 2 for every event in the stream 3 do rand = random number between 0 and 1 4 if rand < p 5 then c ← c +1 If p = b−c then E[bc] = n(b −1)+b, σ2 = (b −1)n(n +1)/2
  • 19. Data Streams Example Puzzle: Finding Missing Numbers • Let π be a permutation of {1,...,n}. • Let π−1 be π with one element missing. • π−1[i] arrives in increasing order Task: Determine the missing number
  • 20. Data Streams Example Puzzle: Finding Missing Numbers • Let π be a permutation of {1,...,n}. • Let π−1 be π with one element missing. • π−1[i] arrives in increasing order Task: Determine the missing number Use a n-bit vector to memorize all the numbers (O(n) space)
  • 21. Data Streams Example Puzzle: Finding Missing Numbers • Let π be a permutation of {1,...,n}. • Let π−1 be π with one element missing. • π−1[i] arrives in increasing order Task: Determine the missing number Data Streams: O(log(n)) space.
  • 22. Data Streams Example Puzzle: Finding Missing Numbers • Let π be a permutation of {1,...,n}. • Let π−1 be π with one element missing. • π−1[i] arrives in increasing order Task: Determine the missing number Data Streams: O(log(n)) space. Store n(n +1) 2 −∑ j≤i π−1[j].
  • 23. Min-wise independent hashing Similarity: Jaccard Coefficient of A and B =|A ∩B|/|A ∪B|
  • 24. Min-wise independent hashing (Broder et al., 2000) Let h : Items → [0,1] be a random hash function. Find the 4 items bought by either Alice or Bob with the smallest hash values: 1 Find the 4 smallest hash values for Alice, and the 4 for Bob. 2 Aggregate the results. 3 Estimate the similarity. Example on estimating the Jaccard similarity in data streams.
  • 25. Min-wise independent hashing 1 Find the 4 smallest hash values for Alice, and the 4 for Bob.
  • 26. Min-wise independent hashing 2 Aggregate the results 3 Estimate the similarity. • Among the 4 items with the smallest hash values, only milk is bought by Alice and Bob, thus the estimated similarity is 1/4 Example on estimating the Jaccard similarity in data streams.
  • 27. Stream Learning of Influence probabilities
  • 28. Learning influence strength along links in social networks • Real-time interactions • Small amount of time and memory • Only one pass
  • 29. Learning influence strength along links in social networks Social Graph and Log of Propagation
  • 30. Learning influence probabilities • Approximate solutions • Superlinear space • Linear space • Sublinear space • Landmark and sliding window STRIP: Stream Learning of Influence probabilities
  • 31. Learning influence probabilities • The problem of learning the probabilities was first considered in Goyal et al. WSDM’10. • In particular the following definition was shown to give a good prediction of propagation: Let Au2v(t) be the number of actions that propagated from user u to user v within time t and Au|v the total number of actions performed by either u or v. Then pu,v = Au2v(t)/Au|v Similar to Jaccard coefficient for estimating the similarity between sets A and B = |A ∩B|/|A ∪B|.
  • 32. Outline Big Data Stream Analytics MOA: Massive Online Analysis SAMOA: Scalable Advanced Massive Online Analysis
  • 33. What is MOA? {M}assive {O}nline {A}nalysis is a framework for online learning from data streams. • It is related to WEKA • It includes a collection of offline and online as well as tools for evaluation: classification, regression, clustering, outlier detection, concept drift detection and recommender systems • Easy to extend • Easy to design and run experiments
  • 34. History - timeline WEKA • 1993 - WEKA : project starts (Ian Witten) • Mid 1999 - WEKA 3 (100% Java) released MOA • Nov 2007 - First public release of MOA: Richard Kirkby, Geoff Holmes and Bernhard Pfahringer • 2009 - MOA Concept Drift Classification • 2010 - MOA Clustering • 2011 - MOA Graph Mining, Multi-label classification, Twitter Reader, Active Learning • 2014 - MOA Outliers, and Recommender System • 2014 - MOA Concept Drift
  • 35. WEKA • Waikato Environment for Knowledge Analysis • Collection of state-of-the-art machine learning algorithms and data processing tools implemented in Java • Released under the GPL • Support for the whole process of experimental data mining • Preparation of input data • Statistical evaluation of learning schemes • Visualization of input data and the result of learning • Used for education, research and applications • Complements “Data Mining” by Witten & Frank & Hall
  • 37. MOA: the bird The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct.
  • 38. MOA: the bird The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct.
  • 39. MOA: the bird The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct.
  • 41. Classification Experimental Setting Evaluation procedures for Data Streams • Holdout • Interleaved Test-Then-Train or Prequential
  • 42. Classification Experimental Setting Data Sources • Random Tree Generator • Random RBF Generator • LED Generator • Waveform Generator • Hyperplane • SEA Generator • STAGGER Generator
  • 43. Classification Experimental Setting Classifiers • Naive Bayes • Decision stumps • Hoeffding Tree • Hoeffding Option Tree • Bagging and Boosting • Random Forests • ADWIN Bagging and Leveraging Bagging • Adaptive Rules • Logistic Regression, SGD
  • 44. RAM-Hours RAM-Hour Every GB of RAM deployed for 1 hour Cloud Computing Rental Cost Options
  • 46. Clustering Experimental Setting Internal measures External measures Gamma Rand statistic C Index Jaccard coefficient Point-Biserial Folkes and Mallow Index Log Likelihood Hubert Γ statistics Dunn’s Index Minkowski score Tau Purity Tau A van Dongen criterion Tau C V-measure Somer’s Gamma Completeness Ratio of Repetition Homogeneity Modified Ratio of Repetition Variation of information Adjusted Ratio of Clustering Mutual information Fagan’s Index Class-based entropy Deviation Index Cluster-based entropy Z-Score Index Precision D Index Recall Silhouette coefficient F-measure
  • 47. Clustering Experimental Setting Clusterers • StreamKM++ • CluStream • ClusTree • Den-Stream • D-Stream • CobWeb
  • 49. Easy Design of a MOA classifier • void resetLearningImpl () • void trainOnInstanceImpl (Instance inst) • double[] getVotesForInstance (Instance i)
  • 50. Easy Design of a MOA clusterer • void resetLearningImpl () • void trainOnInstanceImpl (Instance inst) • Clustering getClusteringResult()
  • 51. Extensions of MOA • Multi-label Classification • Active Learning • Regression • Closed Frequent Graph Mining • Twitter Sentiment Analysis
  • 52. Outline Big Data Stream Analytics MOA: Massive Online Analysis SAMOA: Scalable Advanced Massive Online Analysis
  • 53. SAMOA SAMOA is distributed streaming machine learning (ML) framework that contains a programing abstraction for distributed streaming ML algorithms.
  • 56. Apache Storm Stream, Spout, Bolt, Topology
  • 57. SAMOA
  • 58. SAMOA samoa-SPE SAMOA Algorithm and API SPE-adapter S4 Storm other SPEs ML-adapter MOA Other ML frameworks samoa-S4 samoa-storm samoa-other-SPEs
  • 60. SAMOA ML Developer API Processing Item Processor Stream
  • 64. Summary 1 To improve benefits, companies need to satisfy costumers. To do this they need to know • what potential costumers needs • social networks • real-time analytics 2 MOA deals with evolving data streams 3 SAMOA is distributed streaming machine learning