Building a Large-Scale, Adaptive
Recommendation Engine with Apache
Flink and Spark
Zoltán Zvara
zoltan.zvara@ilab.sztaki.hu
Gábor Hermann
ghermann@ilab.sztaki.hu
This project has received funding from the European Union’s Horizon 2020
research and innovation program under grant agreement No 688191.
About us
• Institute for Computer Science and Control, Hungarian Academy of
Sciences (MTA SZTAKI)
• Informatics Laboratory
• „Big Data – Momentum” research group
• „Data Mining and Search” research group
• Research group with strong industry ties
• Ericsson, Rovio, Portugal Telekom, etc.
Agenda
1. Recommendation systems and matrix factorization
2. Batch vs. online
3. Matrix factorization
   1. Online
   2. Batch + online
4. Solution in Spark & Flink
5. Conclusions
Recommendation systems
Recommendation with matrix factorization
[Figure: rating matrix 𝑅 with one row per user (Zoltán, Gábor) and one column per movie (Rogue One, Interstellar); for example, Zoltán rated Rogue One with 5 stars. 𝑅 is approximated by the product of a user factor matrix 𝑈 and an item factor matrix 𝐼: 𝑈 ∙ 𝐼 ≈ 𝑅. Each user has a user vector and each item an item vector; their components are latent factors such as level of action, level of drama, and an "X factor".]
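$$
\min_{u_*, i_*} \sum_{(p,q) \in \kappa_R} \left( r_{pq} - \mu - b_p - b_q - u_p i_q \right)^2
+ \lambda \sum_{p \in \kappa_U} \left( \lVert u_p \rVert^2 + b_p^2 \right)
+ \lambda \sum_{q \in \kappa_I} \left( \lVert i_q \rVert^2 + b_q^2 \right)
$$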
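To make the objective more concrete, here is a minimal Scala sketch that evaluates it for a given set of factors. It assumes that 𝜇 is a global average rating and that b_p and b_q are per-user and per-item biases, which is the usual reading of this form but is not stated on the slide. The names `Model`, `predict`, and `objective` are hypothetical and are not taken from the authors' code.

```scala
object MfObjective {
  type Vec = Vector[Double]

  def dot(a: Vec, b: Vec): Double = a.zip(b).map { case (x, y) => x * y }.sum

  /** Hypothetical container for the parameters that appear in the objective. */
  final case class Model(
      mu: Double,                    // assumed: global average rating
      userBias: Map[String, Double], // b_p (assumed: per-user bias)
      itemBias: Map[String, Double], // b_q (assumed: per-item bias)
      userVec: Map[String, Vec],     // u_p
      itemVec: Map[String, Vec])     // i_q

  /** Predicted rating: mu + b_p + b_q + u_p . i_q */
  def predict(m: Model, p: String, q: String): Double =
    m.mu + m.userBias(p) + m.itemBias(q) + dot(m.userVec(p), m.itemVec(q))

  /** Squared error over the known ratings plus L2 regularization weighted by lambda. */
  def objective(m: Model, ratings: Map[(String, String), Double], lambda: Double): Double = {
    val error = ratings.toSeq.map { case ((p, q), r) =>
      val e = r - predict(m, p, q)
      e * e
    }.sum
    val userReg = m.userVec.toSeq.map { case (p, u) =>
      dot(u, u) + m.userBias(p) * m.userBias(p)
    }.sum
    val itemReg = m.itemVec.toSeq.map { case (q, i) =>
      dot(i, i) + m.itemBias(q) * m.itemBias(q)
    }.sum
    error + lambda * userReg + lambda * itemReg
  }
}
```

Training searches for the vectors and biases that make this value small; the regularization terms penalize large factors so that the model does not simply memorize the known ratings.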
[Figure: the same factorization with one unknown entry of 𝑅 marked "?". Would Gábor like Interstellar? Gábor's user vector (5, 4, -4) and Interstellar's item vector (3, 2, 5) are highlighted, and the missing rating is filled in with 3.]
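The prediction in the figure is a dot product of the two highlighted vectors. A minimal sketch using the numbers from the slide (the object and function names are illustrative, and biases are left out):

```scala
object PredictRating {
  /** Dot product of a user vector and an item vector of equal length. */
  def predict(user: Seq[Double], item: Seq[Double]): Double = {
    require(user.length == item.length, "vectors must have the same number of latent factors")
    user.zip(item).map { case (u, i) => u * i }.sum
  }

  def main(args: Array[String]): Unit = {
    val gabor = Seq(5.0, 4.0, -4.0)       // Gábor's user vector from the slide
    val interstellar = Seq(3.0, 2.0, 5.0) // Interstellar's item vector from the slide
    // 5*3 + 4*2 + (-4)*5 = 15 + 8 - 20 = 3
    println(predict(gabor, interstellar))
  }
}
```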
Batch training
[Figure: rating events [user; item; time; rating] are collected in PERSISTENT STORAGE; the user vectors and item vectors are then computed from the whole stored rating matrix 𝑅.]
Online training
[Figure: rating events [user; item; time; rating] arrive as a stream (2, 5, 4, 2, 4); each incoming rating is added to 𝑅 and immediately updates the affected user vector and item vector, e.g. user vector (5, -6, -1) → (5, -6, -2) and item vector (3, 2, 5) → (3, 2, 6).]
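The slides do not spell out the update rule applied to each incoming rating. Assuming plain stochastic gradient descent on the regularized squared error (biases omitted), a single online update could be sketched as follows; `learningRate` and `lambda` are illustrative hyperparameters.

```scala
object OnlineSgd {
  type Vec = Vector[Double]

  def dot(a: Vec, b: Vec): Double = a.zip(b).map { case (x, y) => x * y }.sum

  /** One SGD step for a single rating.
    * Both new vectors are computed from the old ones, then returned as (user, item). */
  def step(user: Vec, item: Vec, rating: Double,
           learningRate: Double, lambda: Double): (Vec, Vec) = {
    val err = rating - dot(user, item) // prediction error for this rating
    val newUser = user.zip(item).map { case (u, i) => u + learningRate * (err * i - lambda * u) }
    val newItem = item.zip(user).map { case (i, u) => i + learningRate * (err * u - lambda * i) }
    (newUser, newItem)
  }
}
```

Only the one user vector and the one item vector involved in the rating change, which is what makes per-event updates cheap compared with retraining on the whole matrix.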
Batch + online combination
But how to scale?
• Spotify streamed 20 billion hours of music in 2015
• YouTube: over a billion users, billions of video views every day
• Use distributed data-analytics frameworks
• How can we combine batch + online?
Apache Spark vs. Apache Flink
Distributed online matrix factorization
[Figure: the user vectors and item vectors are distributed; rating events [user; item; time; rating] arrive as a stream. For each rating, the user vector and the item vector involved:]
• need to co-locate
• then update
• send updates
[Figure: process two ratings in parallel]
• Concurrent modification
• Similar problem with batch SGD
• Distributed SGD
(Gemulla et al. 2011)
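One way to avoid concurrent modification, roughly in the spirit of the distributed SGD cited above, is to split users and items into d blocks each and to process in parallel only rating blocks that share no user block and no item block. The sketch below is an illustration under that assumption; `stratum`, `blockOf`, and `schedule` are hypothetical names, and the modulo partitioner is a placeholder.

```scala
object DsgdStrata {
  /** Block coordinates (userBlock, itemBlock) of stratum s in a d x d blocking.
    * Blocks within one stratum share no user block and no item block. */
  def stratum(s: Int, d: Int): Seq[(Int, Int)] =
    (0 until d).map(b => (b, (b + s) % d))

  /** Hypothetical partitioner: maps an integer id to one of d blocks. */
  def blockOf(id: Int, d: Int): Int = java.lang.Math.floorMod(id, d)

  /** Groups ratings (user, item, rating) by stratum and block:
    * result(s)(b) holds the ratings of the b-th block of stratum s.
    * The blocks of one stratum can be processed in parallel without conflicts. */
  def schedule(ratings: Seq[(Int, Int, Double)], d: Int): Seq[Seq[Seq[(Int, Int, Double)]]] = {
    val byBlock = ratings.groupBy { case (u, i, _) => (blockOf(u, d), blockOf(i, d)) }
    (0 until d).map(s => stratum(s, d).map(block => byBlock.getOrElse(block, Seq.empty)))
  }
}
```

With d = 3, stratum 1 consists of the blocks (0, 1), (1, 2), and (2, 0): three workers can update them at the same time because no two of them touch the same user vector or item vector.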
Online MF in Spark
• updateStateByKey?
• Use batch DSGD for online updates! (discussion issue SPARK-6407)

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

// we have our input
val ratings: DStream[Rating] = ...

// need to represent factor matrices
var users: RDD[(UserId, Vector)] = ...
var items: RDD[(ItemId, Vector)] = ...

// would like to have output like this
val updateStream: DStream[Either[(UserId, Vector), (ItemId, Vector)]] =
  // use transform to allow RDD operations
  ratings.transform { (rs: RDD[Rating]) =>
    // compute updates
    val updates = batchDSGD(rs, users, items)
    // apply updates to get updated matrices
    users = applyUserUpdates(users, updates)
    items = applyItemUpdates(items, updates)
    updates
  }
Online MF in Spark
• Performance decreases over time
• Problem: tracking lineage graph
• Solution: use checkpointing
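The effect of checkpointing can be illustrated with a toy model. The sketch below is not Spark code: `Lineage`, `Derived`, and `run` are hypothetical stand-ins that only count how many transformations separate the current factor matrices from materialized data. In Spark itself, the corresponding calls are `SparkContext.setCheckpointDir` and `RDD.checkpoint()`.

```scala
object LineageToy {
  /** Toy stand-in for an RDD's lineage: number of transformations back to stored data. */
  sealed trait Lineage { def depth: Int }
  case object Materialized extends Lineage { val depth: Int = 0 }
  final case class Derived(parent: Lineage) extends Lineage {
    val depth: Int = parent.depth + 1
  }

  /** Applies `batches` mini-batch updates and truncates the lineage every
    * `checkpointEvery` batches (0 disables checkpointing).
    * Returns the lineage depth after each batch. */
  def run(batches: Int, checkpointEvery: Int): Seq[Int] = {
    var factors: Lineage = Materialized
    (1 to batches).map { n =>
      factors = Derived(factors) // corresponds to users = applyUserUpdates(users, updates)
      if (checkpointEvery > 0 && n % checkpointEvery == 0) factors = Materialized
      factors.depth
    }
  }
}
```

Without checkpointing the depth grows with every mini-batch, which matches the observed slowdown; with periodic checkpointing it stays bounded by the checkpoint interval.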
Online MF in Flink
[Figure: dataflow with two operators, one holding the user vectors and one holding the item vectors]
• long-running operators with state
• backward edge in dataflow (stream loop)
1. rating event
2. rating event & user vector
3. apply update
4. user vector update
WARNING!
Loops API (iterative streams) not mature enough yet,
but there is ongoing effort
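The four steps can be mimicked in plain Scala without any Flink dependency. The sketch below is a single-threaded toy, not the Flink Loops API: `UserOperator`, `ItemOperator`, and `process` are hypothetical names, new vectors start from an assumed `init` value, and the update is unregularized SGD chosen only for illustration.

```scala
import scala.collection.mutable

object StreamLoopToy {
  type Vec = Vector[Double]
  final case class Rating(user: String, item: String, value: Double)

  private def dot(a: Vec, b: Vec): Double = a.zip(b).map { case (x, y) => x * y }.sum

  /** Long-running operator holding the user vectors as state. */
  final class UserOperator(init: Vec) {
    private val state = mutable.Map.empty[String, Vec]
    // Steps 1 -> 2: attach the current user vector to the rating event.
    def onRating(r: Rating): (Rating, Vec) = (r, state.getOrElseUpdate(r.user, init))
    // Step 4: store the user vector update that arrives on the backward edge.
    def onUserUpdate(user: String, v: Vec): Unit = state.update(user, v)
    def vector(user: String): Option[Vec] = state.get(user)
  }

  /** Long-running operator holding the item vectors as state. */
  final class ItemOperator(init: Vec, learningRate: Double) {
    private val state = mutable.Map.empty[String, Vec]
    // Step 3: apply the update to the item vector and emit the new user vector.
    def onRatingWithUser(r: Rating, u: Vec): (String, Vec) = {
      val i = state.getOrElseUpdate(r.item, init)
      val err = r.value - dot(u, i)
      state.update(r.item, i.zip(u).map { case (x, y) => x + learningRate * err * y })
      (r.user, u.zip(i).map { case (x, y) => x + learningRate * err * y })
    }
    def vector(item: String): Option[Vec] = state.get(item)
  }

  /** Drives one rating event through the loop: steps 1 to 4. */
  def process(users: UserOperator, items: ItemOperator, r: Rating): Unit = {
    val (event, userVec) = users.onRating(r)
    val (userId, newUserVec) = items.onRatingWithUser(event, userVec)
    users.onUserUpdate(userId, newUserVec)
  }
}
```

In a real deployment the two operators run on different nodes and many events are in flight at once, so a user vector can be read again before its update has travelled back along the loop.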
Online MF: Spark vs. Flink
Combining batch + online in Spark
• Easy: can run batch training periodically on whole dataset
Combining batch + online in Flink
• Combining Flink Batch API with Streaming API
   • Could only do it with an external system
• Batch with Streaming API
   • Feasible!
   • Asynchronous training (Schelter et al. 2014)
• Batch + online
   • Both with Streaming API
   • Share matrices in common state
   • Parameter Server approach
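The slide only names the parameter server approach. As a rough sketch of the idea, a shared store could expose pull and push operations that both the batch and the online training use. `Server`, `pull`, `push`, and the string keys below are hypothetical and do not reflect the authors' implementation.

```scala
import scala.collection.mutable

object ParameterServerToy {
  type Vec = Vector[Double]

  /** Single shared store of factor vectors, keyed by e.g. "user:42" or "item:7". */
  final class Server(dim: Int) {
    private val params = mutable.Map.empty[String, Vec]

    /** Returns the current vector, or a zero vector for an unseen key. */
    def pull(key: String): Vec = synchronized {
      params.getOrElse(key, Vector.fill(dim)(0.0))
    }

    /** Adds a delta computed by any worker (batch or online) to the stored vector. */
    def push(key: String, delta: Vec): Unit = synchronized {
      val current = params.getOrElse(key, Vector.fill(dim)(0.0))
      params.update(key, current.zip(delta).map { case (c, d) => c + d })
    }
  }
}
```

Because workers send deltas rather than whole vectors, updates from the batch job and from the online job to the same key accumulate instead of overwriting each other.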
Lessons learned
Aspect | Flink | Spark
Implementation | More complex solution, harder to implement | Easier to use: could use batch for streaming
Generality | Can express finer grained updates | Updates limited by mini-batch
Code stability | Some parts are not mature enough (e.g. Loops API) | More mature
Performance | Optimal for online learning, can perform well on batch | Not always optimal for online learning (e.g. online MF)
Handling data skew | Currently hard to relocate long-running operators | Periodic scheduling enables easier modification of partitioning
Machine learning | Non-complete ML library and other efforts for ML in Flink | Spark MLlib is mature and used in production
Thank you for your attention
Zoltán Zvara
zoltan.zvara@ilab.sztaki.hu
Gábor Hermann
ghermann@ilab.sztaki.hu
Source code:
https://github.com/gaborhermann/large-scale-recommendation
Measurements
Batch + online combination
• Last.fm dataset with 30M music listening events
• Weekly batch training
• Evaluation: weekly average, computed on every incoming listening event
• Around 45,000 users
Online MF: Spark vs. Flink
• Last.fm dataset with 30M music listening events, read from 12 Kafka partitions
• Spark batch duration: 5 sec
• Time of processing X ratings
• DSGD algorithm
• Using 6 nodes, 4 cores each
• Spark 2.1.0, Flink 1.2.0
Batch on Flink Streaming
• Movielens 1M movie rating dataset
• Using 6 nodes, 4 cores each

More Related Content

What's hot

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...confluent
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Flink Forward
 
Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Flink Forward
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Zalando Technology
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Yael Garten
 
KSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache KafkaKSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache Kafkaconfluent
 
Clickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek VavrusaClickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek VavrusaAltinity Ltd
 
Deploying Flink on Kubernetes - David Anderson
 Deploying Flink on Kubernetes - David Anderson Deploying Flink on Kubernetes - David Anderson
Deploying Flink on Kubernetes - David AndersonVerverica
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Matching the Scale at Tinder with Kafka
Matching the Scale at Tinder with Kafka Matching the Scale at Tinder with Kafka
Matching the Scale at Tinder with Kafka confluent
 
Tutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsTutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsYONG ZHENG
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Traversing Graph Databases with Gremlin
Traversing Graph Databases with GremlinTraversing Graph Databases with Gremlin
Traversing Graph Databases with GremlinMarko Rodriguez
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesFlink Forward
 

What's hot (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
 
Apache flink
Apache flinkApache flink
Apache flink
 
KSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache KafkaKSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache Kafka
 
Clickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek VavrusaClickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek Vavrusa
 
Deploying Flink on Kubernetes - David Anderson
 Deploying Flink on Kubernetes - David Anderson Deploying Flink on Kubernetes - David Anderson
Deploying Flink on Kubernetes - David Anderson
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Matching the Scale at Tinder with Kafka
Matching the Scale at Tinder with Kafka Matching the Scale at Tinder with Kafka
Matching the Scale at Tinder with Kafka
 
Tutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsTutorial: Context In Recommender Systems
Tutorial: Context In Recommender Systems
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Traversing Graph Databases with Gremlin
Traversing Graph Databases with GremlinTraversing Graph Databases with Gremlin
Traversing Graph Databases with Gremlin
 
User behavior analytics
User behavior analyticsUser behavior analytics
User behavior analytics
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 

Viewers also liked

Scala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsScala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsChris Johnson
 
Sparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya HristakevaSparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya HristakevaSpark Summit
 
Food Recommendation System Using Clustering Analysis for Diabetic patients
Food Recommendation System Using Clustering Analysis for Diabetic patientsFood Recommendation System Using Clustering Analysis for Diabetic patients
Food Recommendation System Using Clustering Analysis for Diabetic patientsMaiyaporn Phanich
 
iRIS: A Large-Scale Food and Recipe Recommendation System Using Spark-(Joohyu...
iRIS: A Large-Scale Food and Recipe Recommendation System Using Spark-(Joohyu...iRIS: A Large-Scale Food and Recipe Recommendation System Using Spark-(Joohyu...
iRIS: A Large-Scale Food and Recipe Recommendation System Using Spark-(Joohyu...Spark Summit
 
Apache Spark Performance Observations
Apache Spark Performance ObservationsApache Spark Performance Observations
Apache Spark Performance ObservationsAdam Roberts
 
Comparing topic models for a movie recommendation system webist2014
Comparing topic models for a movie recommendation system webist2014Comparing topic models for a movie recommendation system webist2014
Comparing topic models for a movie recommendation system webist2014Laura Po
 
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...Edureka!
 
Movie recommendation system using Apache Mahout and Facebook APIs
Movie recommendation system using Apache Mahout and Facebook APIsMovie recommendation system using Apache Mahout and Facebook APIs
Movie recommendation system using Apache Mahout and Facebook APIsSmitha Mysore Lokesh
 
Developing a Movie recommendation Engine with Spark
Developing a Movie recommendation Engine with SparkDeveloping a Movie recommendation Engine with Spark
Developing a Movie recommendation Engine with SparkEdureka!
 
How to Build a Recommendation Engine on Spark
How to Build a Recommendation Engine on SparkHow to Build a Recommendation Engine on Spark
How to Build a Recommendation Engine on SparkCaserta
 
Collaborative Filtering with Spark
Collaborative Filtering with SparkCollaborative Filtering with Spark
Collaborative Filtering with SparkChris Johnson
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Xavier Amatriain
 

Viewers also liked (12)

Scala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsScala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music Recommendations
 
Sparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya HristakevaSparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya Hristakeva
 
Food Recommendation System Using Clustering Analysis for Diabetic patients
Food Recommendation System Using Clustering Analysis for Diabetic patientsFood Recommendation System Using Clustering Analysis for Diabetic patients
Food Recommendation System Using Clustering Analysis for Diabetic patients
 
iRIS: A Large-Scale Food and Recipe Recommendation System Using Spark-(Joohyu...
iRIS: A Large-Scale Food and Recipe Recommendation System Using Spark-(Joohyu...iRIS: A Large-Scale Food and Recipe Recommendation System Using Spark-(Joohyu...
iRIS: A Large-Scale Food and Recipe Recommendation System Using Spark-(Joohyu...
 
Apache Spark Performance Observations
Apache Spark Performance ObservationsApache Spark Performance Observations
Apache Spark Performance Observations
 
Comparing topic models for a movie recommendation system webist2014
Comparing topic models for a movie recommendation system webist2014Comparing topic models for a movie recommendation system webist2014
Comparing topic models for a movie recommendation system webist2014
 
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...
 
Movie recommendation system using Apache Mahout and Facebook APIs
Movie recommendation system using Apache Mahout and Facebook APIsMovie recommendation system using Apache Mahout and Facebook APIs
Movie recommendation system using Apache Mahout and Facebook APIs
 
Developing a Movie recommendation Engine with Spark
Developing a Movie recommendation Engine with SparkDeveloping a Movie recommendation Engine with Spark
Developing a Movie recommendation Engine with Spark
 
How to Build a Recommendation Engine on Spark
How to Build a Recommendation Engine on SparkHow to Build a Recommendation Engine on Spark
How to Build a Recommendation Engine on Spark
 
Collaborative Filtering with Spark
Collaborative Filtering with SparkCollaborative Filtering with Spark
Collaborative Filtering with Spark
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
 

Similar to Building Large-Scale Recommendation Engines with Apache Flink & Spark

PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...predictionio
 
Flink Forward Berlin 2017: Daniel Berecz, Gabor Hermann - Parameter Server on...
Flink Forward Berlin 2017: Daniel Berecz, Gabor Hermann - Parameter Server on...Flink Forward Berlin 2017: Daniel Berecz, Gabor Hermann - Parameter Server on...
Flink Forward Berlin 2017: Daniel Berecz, Gabor Hermann - Parameter Server on...Flink Forward
 
ML Zoomcamp 1.8 - Linear Algebra Refresher
ML Zoomcamp 1.8 - Linear Algebra RefresherML Zoomcamp 1.8 - Linear Algebra Refresher
ML Zoomcamp 1.8 - Linear Algebra RefresherAlexey Grigorev
 
Dataflows: The abstraction that powers Big Data by Raul Castro Fernandez at ...
 Dataflows: The abstraction that powers Big Data by Raul Castro Fernandez at ... Dataflows: The abstraction that powers Big Data by Raul Castro Fernandez at ...
Dataflows: The abstraction that powers Big Data by Raul Castro Fernandez at ...Big Data Spain
 
Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solrTrey Grainger
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrlucenerevolution
 
Building Large-Scale Recommendation Engines with Apache Flink & Spark

  • 1. Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and Spark Zoltán Zvara zoltan.zvara@ilab.sztaki.hu Gábor Hermann ghermann@ilab.sztaki.hu This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 688191.
  • 2. About us • Institute for Computer Science and Control, Hungarian Academy of Sciences (MTA SZTAKI) • Informatics Laboratory • "Big Data – Momentum" research group • "Data Mining and Search" research group • Research group with strong industry ties • Ericsson, Rovio, Portugal Telecom, etc.
  • 3. Agenda 1. Recommendation systems and matrix factorization 2. Batch vs. online 3. Matrix factorization 1. Online 2. Batch + online 4. Solution in Spark & Flink 5. Conclusions
  • 6. Recommendation with matrix factorization. [Figure: rating matrix 𝑅 for the users Zoltán and Gábor and the movies Rogue One and Interstellar.] Zoltán rated Rogue One with 5 stars.
  • 7. Recommendation with matrix factorization: 𝑈 ∙ 𝐼 ≈ 𝑅. [Figure: 𝑅 approximated by the product of the user vectors (𝑈) and the item vectors (𝐼).] Latent factors: level of action, level of drama, X factor. Zoltán rated Rogue One with 5 stars.
  • 8. Recommendation with matrix factorization: 𝑈 ∙ 𝐼 ≈ 𝑅. [Figure: as on slide 7.] Objective: $\min_{u_*, i_*} \sum_{(p,q) \in \kappa_R} \left( r_{pq} - \mu - b_p - b_q - u_p i_q \right)^2 + \lambda \sum_{p \in \kappa_U} \left( \lVert u_p \rVert^2 + b_p^2 \right) + \lambda \sum_{q \in \kappa_I} \left( \lVert i_q \rVert^2 + b_q^2 \right)$. Zoltán rated Rogue One with 5 stars.
  • 9-10. Recommendation with matrix factorization: 𝑈 ∙ 𝐼 ≈ 𝑅. [Figure: as on slide 7, with one entry of 𝑅 marked "?".] Zoltán rated Rogue One with 5 stars. Would Gábor like Interstellar?
  • 11. Recommendation with matrix factorization. Would Gábor like Interstellar? [Figure: the user vector (5, 4, -4) and the item vector (3, 2, 5) are highlighted.]
  • 12. Recommendation with matrix factorization. Would Gábor like Interstellar? The product of the two highlighted vectors is 5·3 + 4·2 + (-4)·5 = 3.
  • 13. Recommendation with matrix factorization. Would Gábor like Interstellar? [Figure: the "?" entry of 𝑅 is filled in with the predicted rating 3.]
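
The prediction on slides 11-13 is a plain dot product. A minimal sketch in Scala, assuming (as the arithmetic on the slides suggests) that (5, 4, -4) is Gábor's user vector and (3, 2, 5) is Interstellar's item vector. The object and function names are illustrative and are not taken from the talk's source code.

```scala
object MfPredict {
  /** Predicted rating: dot product of a user vector and an item vector. */
  def predict(user: Vector[Double], item: Vector[Double]): Double = {
    require(user.length == item.length,
      "user and item vectors need the same number of latent factors")
    user.zip(item).map { case (u, i) => u * i }.sum
  }

  // Values read off slide 11 (assumed to belong to Gábor and Interstellar).
  val gabor: Vector[Double] = Vector(5.0, 4.0, -4.0)
  val interstellar: Vector[Double] = Vector(3.0, 2.0, 5.0)

  // 5*3 + 4*2 + (-4)*5 = 3.0, the value shown on slides 12-13.
  val predicted: Double = predict(gabor, interstellar)
}
```

Each latent factor contributes one term to the sum, so a strong negative weight on a single factor (here the third one) can pull an otherwise high prediction down.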
  • 14-15. Batch training. [Figure: records of the form [user; item; time; rating] are read from persistent storage into the rating matrix 𝑅.]
  • 16. Batch training. [Figure: as on slide 14, now with the user vectors and item vectors filled in.]
  • 17. Online training. [Figure: a stream of [user; item; time; rating] events arrives next to 𝑅, the user vectors and the item vectors.]
  • 18. Online training. [Figure: after the first event, one user vector changes from (5, -6, -1) to (5, -6, -2) and one item vector from (3, 2, 5) to (3, 2, 6).]
  • 19. Online training. [Figure: after the next build, the same vectors read (4, -5, -1) and (1, 3, 5).]
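
One way to read the online-training slides is as a stochastic gradient descent step per rating event. The sketch below assumes the bias-free form of the objective on slide 8 (no 𝜇 or 𝑏 terms). The names learningRate and lambda, like everything else in the block, are hypothetical, and the update rule used in the talk may differ.

```scala
object OnlineSgd {
  type Vec = Vector[Double]

  def dot(a: Vec, b: Vec): Double = a.zip(b).map { case (x, y) => x * y }.sum

  /** One SGD step for a single rating event.
    * Both new vectors are computed from the OLD user and item vectors.
    */
  def step(user: Vec, item: Vec, rating: Double,
           learningRate: Double, lambda: Double): (Vec, Vec) = {
    val err = rating - dot(user, item) // prediction error for this event
    val newUser = user.zip(item).map { case (u, i) =>
      u + learningRate * (err * i - lambda * u)
    }
    val newItem = item.zip(user).map { case (i, u) =>
      i + learningRate * (err * u - lambda * i)
    }
    (newUser, newItem)
  }
}
```

Because only the two vectors named in the event are touched, the cost of a step depends on the number of latent factors, not on the size of 𝑅.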
  • 20. Batch + online combination
  • 21. But how to scale? • Spotify streamed 20 billion hours of music in 2015 • YouTube has over a billion users and billions of video views every day • Use distributed data-analytics frameworks • How can we combine batch + online?
  • 22. Apache Spark vs. Apache Flink
  • 23-24. Distributed online matrix factorization. [Figure: 𝑅 with the user vectors and item vectors; a [user; item; time; rating] event arrives.]
  • 25. Distributed online matrix factorization. The user vector and the item vector of the incoming rating need to be co-located.
  • 26. Distributed online matrix factorization. Need to co-locate, then update.
  • 27. Distributed online matrix factorization. Need to co-locate, then update, then send updates.
  • 28-29. Distributed online matrix factorization. Process two ratings in parallel.
  • 30. Distributed online matrix factorization. Process two ratings in parallel. • Concurrent modification • Similar problem with batch SGD • Distributed SGD (Gemulla et al. 2011)
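
Slide 30 points to distributed SGD (Gemulla et al. 2011) for the concurrent-modification problem. As commonly described, that scheme splits users and items into k blocks each and, in every sub-epoch, lets the workers process only rating blocks that share no user block and no item block, so no vector is modified by two workers at once. Below is a minimal sketch of one such schedule (a diagonal shift), with hypothetical names and no Spark or Flink dependency. It is not taken from the talk's source code.

```scala
object DsgdSchedule {
  /** Block index of a user or item id when ids are split into k blocks. */
  def blockOf(id: Int, k: Int): Int = Math.floorMod(id, k)

  /** (userBlock, itemBlock) pairs that k workers process in sub-epoch s.
    * Worker w gets user block w and item block (w + s) mod k, so within
    * one sub-epoch every user block and every item block is used once.
    */
  def stratum(k: Int, s: Int): Seq[(Int, Int)] =
    (0 until k).map(w => (w, (w + s) % k))

  /** All k sub-epochs; together they cover each of the k * k blocks once. */
  def epoch(k: Int): Seq[Seq[(Int, Int)]] =
    (0 until k).map(s => stratum(k, s))
}
```

With k = 3, sub-epoch 1 assigns the blocks (0, 1), (1, 2) and (2, 0): three workers run in parallel, and no two of them read or write the same user or item vectors.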
  • 31. Online MF in Spark. We have our input: val ratings: DStream[Rating] = ...
  • 32. Online MF in Spark. We would like to have output like this: val updateStream: DStream[Either[(UserId, Vector), (ItemId, Vector)]]
  • 33. Online MF in Spark. updateStateByKey?
  • 34. Online MF in Spark. updateStateByKey? Use batch DSGD for online updates! (discussion issue SPARK-6407)
  • 35. Online MF in Spark. Need to represent factor matrices: var users: RDD[(UserId, Vector)] = ... and var items: RDD[(ItemId, Vector)] = ...
  • 36. Online MF in Spark. Use transform to allow RDD operations: ratings.transform { (rs: RDD[Rating]) => ... }
  • 37. Online MF in Spark. Compute updates: val updates = batchDSGD(rs, users, items)
  • 38. Online MF in Spark. Apply updates to get updated matrices. Complete code of the slide (types and helper functions are not shown on the slide):
        // we have our input
        val ratings: DStream[Rating] = ...
        // need to represent factor matrices
        var users: RDD[(UserId, Vector)] = ...
        var items: RDD[(ItemId, Vector)] = ...
        // would like to have output like this
        val updateStream: DStream[Either[(UserId, Vector), (ItemId, Vector)]] =
          // use transform to allow RDD operations
          ratings.transform { (rs: RDD[Rating]) =>
            // compute updates
            val updates = batchDSGD(rs, users, items)
            // apply updates to get updated matrices
            users = applyUserUpdates(users, updates)
            items = applyItemUpdates(items, updates)
            updates
          }
  • 39. Online MF in Spark • Performance degrades over time
  • 40-41. Online MF in Spark • Performance degrades over time • Problem: tracking lineage graph • Solution: use checkpointing
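
The slides name lineage tracking as the cause of the slowdown and checkpointing as the fix. The toy model below is plain Scala, not the Spark API, and ToyRdd and run are hypothetical names. It only illustrates the idea: every mini-batch derives the factor matrix from its predecessor, so the chain of ancestors keeps growing unless it is cut periodically. In Spark itself the cut would be made with RDD checkpointing, which this sketch does not call.

```scala
import scala.annotation.tailrec

object LineageSketch {
  /** Toy stand-in for a dataset that remembers what it was derived from. */
  final case class ToyRdd(values: Map[Int, Double], parent: Option[ToyRdd]) {
    /** Number of ancestors that would have to be tracked or replayed. */
    def depth: Int = {
      @tailrec def loop(d: ToyRdd, acc: Int): Int = d.parent match {
        case Some(p) => loop(p, acc + 1)
        case None    => acc
      }
      loop(this, 0)
    }
    /** Derive a new dataset from this one, as each mini-batch update does. */
    def updated(changes: Map[Int, Double]): ToyRdd =
      ToyRdd(values ++ changes, Some(this))
    /** Keep the data, forget the ancestry. */
    def checkpointed: ToyRdd = copy(parent = None)
  }

  /** Runs `batches` updates; returns the longest lineage seen.
    * checkpointEvery = 0 disables checkpointing.
    */
  def run(batches: Int, checkpointEvery: Int): Int = {
    var users = ToyRdd(Map.empty, None)
    var maxDepth = 0
    for (b <- 1 to batches) {
      users = users.updated(Map(b -> 1.0))
      maxDepth = math.max(maxDepth, users.depth)
      if (checkpointEvery > 0 && b % checkpointEvery == 0) users = users.checkpointed
    }
    maxDepth
  }
}
```

In this model, 100 batches without checkpointing end with a lineage of depth 100, while cutting every 10 batches keeps the depth at 10 or below.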
  • 42. Online MF in Flink. User vectors and item vectors live in long-running operators with state.
  • 43. Online MF in Flink. Long-running operators with state, plus a backward edge in the dataflow (stream loop).
  • 44. Online MF in Flink. Step 1: rating event.
  • 45-46. Online MF in Flink. Step 2: rating event & user vector.
  • 47. Online MF in Flink. Step 3: apply update.
  • 48. Online MF in Flink. Step 4: user vector update.
  • 49. Online MF in Flink. WARNING! The Loops API (iterative streams) is not mature enough yet, but there is ongoing effort.
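
The four steps above can be mimicked in a single process to make the message flow concrete. The sketch below assumes that step 1 reaches the operator holding user vectors, that step 3 happens in the operator holding item vectors, and that step 4 travels back along the stream loop. It uses no Flink API, runs synchronously (unlike a real streaming job), and all names and the regularization-free update are hypothetical.

```scala
import scala.collection.mutable

object StreamLoopSketch {
  type Vec = Vector[Double]
  final case class Rating(user: Int, item: Int, value: Double)

  private def dot(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  /** Stateful operator holding user vectors. */
  final class UserOperator(init: Int => Vec) {
    private val state = mutable.Map.empty[Int, Vec]
    /** Steps 1-2: receive a rating event, attach the user's vector. */
    def onRating(r: Rating): (Rating, Vec) =
      (r, state.getOrElseUpdate(r.user, init(r.user)))
    /** Step 4: store the user vector update arriving on the back edge. */
    def onUpdate(user: Int, v: Vec): Unit = state.update(user, v)
    def vector(user: Int): Option[Vec] = state.get(user)
  }

  /** Stateful operator holding item vectors. */
  final class ItemOperator(init: Int => Vec, learningRate: Double) {
    private val state = mutable.Map.empty[Int, Vec]
    /** Step 3: apply the update; returns the user vector update to send back. */
    def onRatingWithUser(r: Rating, u: Vec): (Int, Vec) = {
      val i = state.getOrElseUpdate(r.item, init(r.item))
      val err = r.value - dot(u, i)
      state.update(r.item, i.zip(u).map { case (iv, uv) => iv + learningRate * err * uv })
      (r.user, u.zip(i).map { case (uv, iv) => uv + learningRate * err * iv })
    }
    def vector(item: Int): Option[Vec] = state.get(item)
  }

  /** Pushes one rating event through steps 1-4. */
  def process(r: Rating, users: UserOperator, items: ItemOperator): Unit = {
    val (event, userVec) = users.onRating(r)
    val (userId, updated) = items.onRatingWithUser(event, userVec)
    users.onUpdate(userId, updated)
  }
}
```

In a real streaming job, further ratings for the same user can arrive between step 2 and step 4, which is the concurrent-modification issue raised on slide 30.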
  • 50. Online MF: Spark vs. Flink
  • 51. Combining batch + online in Spark • Easy: can run batch training periodically on whole dataset
  • 52. Combining batch + online in Flink • Combining Flink Batch API with Streaming API: could only do it with an external system
  • 53. Combining batch + online in Flink • Combining Flink Batch API with Streaming API: could only do it with an external system • Batch with Streaming API: feasible! Asynchronous training (Schelter et al. 2014)
  • 54. Combining batch + online in Flink • Combining Flink Batch API with Streaming API: could only do it with an external system • Batch with Streaming API: feasible! Asynchronous training (Schelter et al. 2014) • Batch + online, both with Streaming API: share matrices in common state (Parameter Server approach)
  • 56-61. Lessons learned (comparison table, built up one row per slide)
      • Implementation. Flink: more complex solution, harder to implement. Spark: easier to use, could use batch for streaming.
      • Generality. Flink: can express finer grained updates. Spark: updates limited by mini-batch.
      • Code stability. Flink: some parts are not mature enough (e.g. Loops API). Spark: more mature.
      • Performance. Flink: optimal for online learning, can perform well on batch. Spark: not always optimal for online learning (e.g. online MF).
      • Handling data skew. Flink: currently hard to relocate long-running operators. Spark: periodic scheduling enables easier modification of partitioning.
      • Machine learning. Flink: non-complete ML library and other efforts for ML in Flink. Spark: Spark MLlib is mature and used in production.
  • 62. Thank you for your attention Zoltán Zvara zoltan.zvara@ilab.sztaki.hu Gábor Hermann ghermann@ilab.sztaki.hu Source code: https://github.com/gaborhermann/large-scale-recommendation
  • 64. Batch + online combination • Last.fm dataset with 30M music listening events • Weekly batch training • Evaluation: weekly average, computed on every incoming listening event • Around 45,000 users
  • 65. Online MF: Spark vs. Flink • Last.fm dataset with 30M music listening events, read from 12 Kafka partitions • Spark batch duration: 5 sec • Measured: time to process X ratings • DSGD algorithm • Using 6 nodes, 4 cores each • Spark 2.1.0, Flink 1.2.0
  • 66. Batch on Flink Streaming • Movielens 1M movie rating dataset • Using 6 nodes, 4 cores each
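
The message flow on slide 49 (rating event, user vector lookup, update on the item side, user vector write-back) can be illustrated with a small single-process sketch. This is not the authors' Flink implementation: the class name `OnlineMF`, the learning rate, the regularization constant, and the random initialization are all assumptions made for illustration, and the bias terms of the objective are left out to keep the example short. It performs one plain SGD step per incoming rating.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class OnlineMF:
    """Toy stand-in for the user-vector and item-vector state (hypothetical names)."""

    def __init__(self, num_factors=3, learning_rate=0.05, reg=0.01, seed=42):
        self.k = num_factors
        self.lr = learning_rate      # assumed value, not taken from the slides
        self.reg = reg               # assumed value, not taken from the slides
        self.rng = random.Random(seed)
        self.user_vectors = {}
        self.item_vectors = {}

    def _init_vector(self):
        # Small random initialization (an assumption; the slides do not specify one).
        return [self.rng.uniform(-0.1, 0.1) for _ in range(self.k)]

    def predict(self, user, item):
        u = self.user_vectors.get(user)
        i = self.item_vectors.get(item)
        if u is None or i is None:
            return 0.0
        return dot(u, i)

    def on_rating(self, user, item, rating):
        """Process one rating event and return the prediction error before the update."""
        # Steps 1-2: the rating event picks up the current user vector.
        u = self.user_vectors.setdefault(user, self._init_vector())
        # Step 3: on the item side, compute the error and apply the SGD update.
        i = self.item_vectors.setdefault(item, self._init_vector())
        err = rating - dot(u, i)
        new_u = [uf + self.lr * (err * itf - self.reg * uf) for uf, itf in zip(u, i)]
        new_i = [itf + self.lr * (err * uf - self.reg * itf) for uf, itf in zip(u, i)]
        self.item_vectors[item] = new_i
        # Step 4: the updated user vector is written back to the user side.
        self.user_vectors[user] = new_u
        return err
```

In a distributed setting, step 4 is the message that travels back against the direction of the stream, which is presumably why the slide points at the Loops API (iterative streams).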

Editor's Notes

  1-3. Say that we focus on comparing the two systems for this use case.
  4. Ratings in a sparse matrix
  5. Story: it turned out that combining the two is worthwhile. Message: batch + online is better than batch alone or online alone. DCG (Discounted Cumulative Gain) measures ranking quality; higher is better. https://en.wikipedia.org/wiki/Discounted_cumulative_gain
  6. Sources: Spotify 2015 data: https://techcrunch.com/2015/12/01/spotify-claims-streaming-music-throne-worldwide-but-pandora-is-still-top-service-in-u-s/?ncid=rss#.uuccs9:VA8w ; YouTube statistics: https://www.youtube.com/yt/press/en-GB/statistics.html
  7-8. Vs. mini-batch. Send records without global synchronization.
  9-14. TODO: animate over 4 slides.
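
The evaluation described on slide 64 and in note 5 (DCG computed on every incoming listening event, then averaged per week) could look roughly like the sketch below. Several things here are assumptions rather than details from the talk: each event is treated as having exactly one relevant item, the cutoff `k=100` is arbitrary, weeks are derived by integer division of a timestamp in seconds, and `rank_fn` is a hypothetical callback that returns the model's current ranked item list for a user.

```python
import math
from collections import defaultdict

SECONDS_PER_WEEK = 7 * 24 * 3600


def dcg_at_k(ranked_items, relevant_item, k=100):
    """DCG@k with a single relevant item: 1 / log2(rank + 1) if it is in the top k, else 0."""
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == relevant_item:
            return 1.0 / math.log2(rank + 1)
    return 0.0


def weekly_average_dcg(events, rank_fn, k=100):
    """Average DCG@k per week.

    events: iterable of (timestamp_seconds, user, item) tuples.
    rank_fn: hypothetical callback, rank_fn(user) -> ranked list of items.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, user, item in events:
        week = ts // SECONDS_PER_WEEK
        # Score the event against the ranking produced before the model sees it.
        sums[week] += dcg_at_k(rank_fn(user), item, k)
        counts[week] += 1
    return {week: sums[week] / counts[week] for week in sorted(sums)}
```

In an online setting the model would be updated with each event only after that event has been scored, so that every listening is evaluated on a model that has not yet seen it.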