Flux Capacitor AI: Bringing AI Back to the Future!
advancedspark.com
Who Am I?
Streaming Data Engineer
Netflix OSS Committer
Data Solutions Engineer
Apache Contributor
Principal Data Solutions Engineer
IBM Technology Center
Meetup Organizer
Advanced Apache Meetup
Book Author
Advanced .
Due 2016
Advanced Apache Spark Meetup
http://advancedspark.com
Meetup Metrics
Top 10 Most-active Spark Meetup!
3200+ Members in just 9 mos!!
3700+ Docker downloads (demos)
Meetup Mission
Code deep-dive into Spark and related open source projects
Surface key patterns and idioms
Focus on distributed systems, scale, and performance
Live, Interactive Demo!
Audience Participation Required!!
Cell Phone Compatible!!!
demo.advancedspark.com
http://demo.advancedspark.com
[Architecture diagram labels: End User, ElasticSearch, Spark ML, Data Scientist, Kafka, Spark Streaming, Cassandra, Redis, Zeppelin, iPython]
Presentation Outline
① Scaling
② Similarities
③ Recommendations
④ Approximations
⑤ Netflix Recommendations
Scaling with Parallelism
[Figure labels: Peter; O(log n); Worker Nodes]
Parallelism with Composability
Worker 1 Worker 2
Max (a max b max c max d) == (a max b) max (c max d)
Set Union (a U b U c U d) == (a U b) U (c U d)
Addition (a + b + c + d) == (a + b) + (c + d)
Multiply (a * b * c * d) == (a * b) * (c * d)
What about Division and Average?
Collect at Driver
What about Division?
Division (a / b / c / d) != (a / b) / (c / d)
(3 / 4 / 7 / 8) != (3 / 4) / (7 / 8)
(((3 / 4) / 7) / 8) != ((3 * 8) / (4 * 7))
0.0134 != 0.857
What were the Egyptians thinking?!
Not Composable
“Divide like
an Egyptian”
What about Average?
Overall AVG (values, counts)
(3, 1) + (5, 1) + (5, 1) + (7, 1) == (3 + 5 + 5 + 7) / (1 + 1 + 1 + 1) == 20 / 4 == 5
Pairwise AVG
(3 + 5) / 2 + (5 + 7) / 2 == 8 / 2 + 12 / 2 == 20 / 2 == 10 != 5
Divide, Add, Divide?
Not Composable
Single-Node Divide at the End?
Doesn’t need to be Composable!
AVG (3, 5, 5, 7) == 5
Add, Add, Add?
Composable!
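A minimal sketch of the "Add, Add, Add, then divide once" idea, assuming each worker reduces its values to a (sum, count) pair. The function names `combine` and `partial` and the two-worker split are illustrative choices, not taken from the slides.

```python
from functools import reduce

def combine(a, b):
    """Associative merge of two (sum, count) partial aggregates."""
    return (a[0] + b[0], a[1] + b[1])

def partial(values):
    """Reduce one worker's values to a single (sum, count) pair."""
    return reduce(combine, ((v, 1) for v in values))

# Hypothetical split of the slide's values across two workers.
worker1 = partial([3, 5])
worker2 = partial([5, 7])

# Merging partials is composable; the single divide happens at the end.
total_sum, total_count = combine(worker1, worker2)
average = total_sum / total_count

# Dividing on each worker first and then adding is not composable.
naive = (sum([3, 5]) / 2) + (sum([5, 7]) / 2)
```

Here `average` matches the overall AVG of 5, while `naive` reproduces the incorrect pairwise result of 10.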
Presentation Outline
① Scaling
② Similarities
③ Recommendations
④ Approximations
⑤ Netflix Recommendations
Similarities
Euclidean Similarity
Exists in Euclidean, flat space
Based on Euclidean distance
Linear measure
Bias towards magnitude
Cosine Similarity
Angular measure
Adjusts for Euclidean magnitude bias
Normalize to unit vectors in all dimensions
Used with real-valued vectors (versus binary)
org.jblas.DoubleMatrix
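To illustrate the magnitude bias, here is a small pure-Python sketch (not the jblas API) comparing Euclidean distance and cosine similarity. The two user vectors are made-up data: same tastes, ten times the activity.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance; grows with differences in magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical rating-weight vectors pointing in the same direction.
light_user = [1.0, 2.0, 0.5]
heavy_user = [10.0, 20.0, 5.0]

distance = euclidean_distance(light_user, heavy_user)
similarity = cosine_similarity(light_user, heavy_user)
```

The Euclidean distance is large even though the cosine similarity is (up to rounding) 1.0.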
Jaccard Similarity
Set similarity measurement
Set intersection / set union
Bias towards popularity
Works with binary vectors
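As a sketch of "set intersection / set union", assuming items are described by tag sets. The tags below are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)

# Hypothetical tag sets.
top_gun = {"action", "military", "aviation", "1980s"}
taps = {"drama", "military", "1980s"}

score = jaccard(top_gun, taps)  # 2 shared tags out of 5 distinct tags
```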
Log Likelihood Similarity
Adjusts for popularity bias
Netflix “Shawshank” problem
Word Similarity
Edit Distance
Misspellings and autocorrect
Word2Vec
Similar words are defined by similar contexts in vector space
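Edit distance can be sketched with the classic Levenshtein dynamic program. The slides do not say which edit-distance variant is meant, so this one (unit-cost insert, delete, substitute) is an assumption.

```python
def edit_distance(s, t):
    """Levenshtein distance between strings s and t."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete from s
                            curr[j - 1] + 1,      # insert into s
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]
```

A misspelling such as "netflx" is one edit away from "netflix", which is what makes this measure useful for autocorrect candidates.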
Demo!
Find Synonyms with Word2Vec
Find Synonyms using Word2Vec
Document Similarity
TF/IDF
Term Freq / Inverse Document Freq
Used by most search engines
Doc2Vec
Similar documents are determined by similar contexts
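A rough sketch of TF/IDF weighting, assuming whitespace tokenization, relative term frequency, and an unsmoothed `log(N / df)` inverse document frequency. Many variants exist, and search engines typically use smoothed forms. The documents are invented.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical three-document corpus.
docs = ["the pilot flies", "the lawyer argues", "the pilot lands"]
weights = tf_idf(docs)
```

A term that appears in every document ("the") gets weight zero, while a rarer term ("lawyer") outweighs a more common one ("pilot").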
Bonus! Text Rank Document Summary
Text Rank (aka Sentence Rank)
Surface summary sentences
TF/IDF + Similarity Graph + PageRank
Most similar sentence to all other sentences
TF/IDF + Similarity Graph
Most influential sentences
PageRank
Similarity Pathways (Recommendations)
Best recommendations for 2 (or more) people
“You like Mad Max. I like Message in a Bottle.
We might like a movie similar to both.”
Item-to-Item Similarity Graph + Dijkstra Heaviest Path
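One way to read "heaviest path" is the path that maximizes the product of edge similarities. Under that assumption, and with similarities in (0, 1], Dijkstra can be run on the cost `-log(similarity)`. The graph and its weights below are invented for illustration and are not from the demo.

```python
import heapq
import math

def heaviest_path(graph, source, target):
    """graph: {node: {neighbor: similarity in (0, 1]}}.
    Returns (path, product of similarities), or (None, 0.0) if unreachable."""
    best = {source: 0.0}   # lowest known sum of -log(similarity)
    parent = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nbr, sim in graph.get(node, {}).items():
            new_cost = cost - math.log(sim)
            if new_cost < best.get(nbr, math.inf):
                best[nbr] = new_cost
                parent[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if target not in best:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    path.reverse()
    return path, math.exp(-best[target])

# Hypothetical item-to-item similarity graph (weights are made up).
graph = {
    "Mad Max": {"The Road Warrior": 0.8, "Cast Away": 0.2},
    "The Road Warrior": {"Mad Max": 0.8, "Waterworld": 0.5},
    "Waterworld": {"The Road Warrior": 0.5, "Message in a Bottle": 0.4},
    "Cast Away": {"Mad Max": 0.2, "Message in a Bottle": 0.5},
    "Message in a Bottle": {"Waterworld": 0.4, "Cast Away": 0.5},
}
path, score = heaviest_path(graph, "Mad Max", "Message in a Bottle")
```

Movies in the middle of the returned path are candidates that are similar to both endpoints.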
Demo!
Similarity Pathway for Movie Recommendations
Load Movies with Tags into DataFrame
My Choice
Their Choice
Item-to-Item Tag Jaccard Similarity
Based on Tags
Calculate Jaccard Similarity
(Tag Set Similarity)
Must be Above the Given
Jaccard Similarity Threshold
Item-to-Item Tag Similarity Graph
Edge Weights == Jaccard Similarity (Based on Tag Sets)
Use Dijkstra to Find Heaviest Pathway
Calculating Exact Similarity
Brute-Force Similarity
Cartesian Product
O(n^2) shuffle and compute
aka All-pairs, Pair-wise, Similarity Join
Calculating Approximate Similarity
Goal: Reduce Shuffle
Approximate Similarity
Sampling
Bucketing or Clustering
Ignore low-similarity probability
Locality Sensitive Hashing
Twitter Algebird MinHash
Bucket By Genre
Presentation Outline
① Scaling
② Similarities
③ Recommendations
④ Approximations
⑤ Netflix Recommendations
Recommendations
Basic Terminology
User: User seeking recommendations
Item: Item being recommended
Explicit User Feedback: user knows they are rating or liking, can choose to dislike
Implicit User Feedback: user not explicitly aware, cannot dislike (click, hover, etc)
Instances: Rows of user feedback/input data
Overfitting: Training a model too closely to the training data & hyperparameters
Hold Out Split: Holding out some of the instances to avoid overfitting
Features: Columns of instance rows (of feedback/input data)
Cold Start Problem: Not enough data to personalize (new)
Hyperparameter: Model-specific config knobs for tuning (tree depth, iterations)
Model Evaluation: Compare predictions to actual values of hold out split
Feature Engineering: Modify, reduce, combine features
Loss Function: Function we’re trying to minimize such as least-squared error for Linear Regression
Cross Entropy: Loss function used for classification algorithms such as Logistic Regression
Optimizer: Technique to optimize loss function such as Stochastic Gradient Descent (SGD)
Stochastic Gradient Descent (SGD)
Optimizes Loss Function
Least Squared Error b/w predicted and actual value
Cross Entropy Log Likelihood b/w predicted and actual probability
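A minimal sketch of SGD minimizing squared error for a one-feature linear model. The learning rate, epoch count, seed, and the noise-free toy data (y = 2x + 1) are all assumptions chosen for illustration.

```python
import random

def sgd_linear_fit(points, learning_rate=0.05, epochs=500, seed=42):
    """Fit y ~ w * x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(points)
    for _ in range(epochs):
        rng.shuffle(data)              # "stochastic": visit samples in random order
        for x, y in data:
            error = (w * x + b) - y    # predicted minus actual
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b

# Toy data generated from y = 2x + 1 (hypothetical).
points = [(x, 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
w, b = sgd_linear_fit(points)
```

Each update uses a single example, which is what distinguishes SGD from batch gradient descent.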
Features
Binary: True or False
Numeric Discrete: Integers
Numeric: Real Values
Binning: Convert Continuous into Discrete (Time of Day->Morning, Afternoon)
Categorical Ordinal: Size (Small->Medium->Large), Ratings (1->5)
Categorical Nominal: Independent, Favorite Sports Teams, Dating Spots
Temporal: Time-based, Time of Day, Binge Viewing
Text: Movie Titles, Genres, Tags, Reviews (Tokenize, Stop Words, Stemming)
Media: Images, Audio, Video
Geographic: (Longitude, Latitude), Geohash
Latent: Hidden Features within Data (Collaborative Filtering)
Derived: Age of Movie, Duration of User Subscription
Feature Engineering
Dimension Reduction
Reduce number of features in feature space
Principal Component Analysis (PCA)
Find principal features that best describe data variance
Peel dimensional layers back
One-Hot Encoding
Convert nominal categorical feature values into 0’s and 1’s
Remove any numerical relationship between categories
Bears    -> 1   -->   Bears    -> [1.0, 0.0, 0.0]
49’ers   -> 2   -->   49’ers   -> [0.0, 1.0, 0.0]
Steelers -> 3   -->   Steelers -> [0.0, 0.0, 1.0]
Convert Each Item
to Binary Vector
with Single 1.0 Column
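The mapping above can be sketched in a few lines of plain Python. This is not Spark's encoder; the helper name `one_hot_encode` is made up.

```python
def one_hot_encode(values):
    """Map each distinct category to a binary vector with a single 1.0."""
    categories = []
    for value in values:
        if value not in categories:   # keep first-seen order
            categories.append(value)
    size = len(categories)
    return {
        category: [1.0 if i == index else 0.0 for i in range(size)]
        for index, category in enumerate(categories)
    }

teams = ["Bears", "49'ers", "Steelers"]
encoded = one_hot_encode(teams)
```

Because every vector is equally far from every other, no team appears numerically "greater" than another.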
Feature Normalization & Standardization
Goal
Scale features to standard size
Prevent boundless features
Helps avoid overfitting
Required by many ML algos
Normalize Features
Calculate L1 (or L2, etc) norm, then divide each element by it
Standardize Features
Apply standard normal transformation (mean->0, stddev->1)
org.apache.spark.ml.feature.[Normalizer, StandardScaler]
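A plain-Python sketch of the two transformations, not the Spark classes. It assumes the L2 norm for normalization and the population standard deviation for standardization; library implementations may use the sample standard deviation instead.

```python
import math

def l2_normalize(vector):
    """Scale a feature vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)

def standardize(column):
    """Shift a feature column to mean 0 and scale to standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / std for x in column] if std else [0.0] * n

unit = l2_normalize([3.0, 4.0])                                     # norm is 5
scaled = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])      # mean 5, stddev 2
```

Normalization works per row (one vector at a time), while standardization works per column (one feature across all rows).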
http://www.mathsisfun.com/data/standard-normal-distribution.html
Non-Personalized Recommendations
Cold Start Problem
“Cold Start” problem
New user, don’t know their preferences, must show something!
Movies with highest-rated actors
Top K aggregations
Facebook social graph
Friend-based recommendations
Most desirable singles
PageRank of likes and dislikes
Demo!
GraphFrame PageRank
Example: Dating Site “Like” Graph
PageRank of Top Influencers
Personalized Recommendations
Demo!
Personalized PageRank
Personalized PageRank: Outbound Links
0.15 = (1 - 0.85 “Damping Factor”)
85% Probability: Choose Among Outbound Network
15% Probability: Choose Self or Random
85% Among Outbound Network
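A small power-iteration sketch of personalized PageRank with a 0.85 damping factor. It is not the GraphFrames implementation. Two simplifying assumptions: the 15% jump always returns to the source user, and a node with no outbound links also sends its remaining 85% back to the source. The "like" graph is invented.

```python
def personalized_pagerank(graph, source, damping=0.85, iterations=50):
    """graph: {node: [outbound neighbors]}; every neighbor must also be a key."""
    nodes = list(graph)
    rank = {n: (1.0 if n == source else 0.0) for n in nodes}
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for node in nodes:
            out = graph[node]
            if out:
                # 85%: spread evenly among the outbound network.
                share = damping * rank[node] / len(out)
                for nbr in out:
                    nxt[nbr] += share
            else:
                # No outbound network: assume the walk restarts at the source.
                nxt[source] += damping * rank[node]
            # 15%: jump back to the source (the "personalized" part).
            nxt[source] += (1.0 - damping) * rank[node]
        rank = nxt
    return rank

# Hypothetical dating-site "like" graph.
likes = {"alice": ["bob", "carol"], "bob": ["carol"], "carol": [], "dave": ["alice"]}
ranks = personalized_pagerank(likes, "alice")
```

Nodes that the source cannot reach (here "dave") keep a rank of zero, which is the difference from global PageRank.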
Personalized PageRank: No Outbound
0.15 = (1 - 0.85 “Damping Factor”)
85% Probability: Choose Among Outbound Network
15% Probability: Choose Self or Random
85% Among No Outbound Network!!
User-to-User Clustering
User Similarity
Time-based
Pattern of viewing (binge or casual)
Time of viewing (am or pm)
Ratings-based
Content ratings or number of views
Average rating relative to others (critical or lenient)
Search-based
Search terms
Item-to-Item Clustering
Item Similarity
Profile text (TF/IDF, Word2Vec, NLP)
Categories, tags, interests (Jaccard Similarity, LSH)
Images, facial structures (Neural Nets, Eigenfaces)
Dating Site Example…
Cluster Similar Eigen-faces, Cluster Similar Profiles, Cluster Similar Categories
Bonus: NLP Conversation Starter Bot
“If your responses to my generic opening
lines are positive, I may read your profile.”
Spark ML, Stanford CoreNLP,
TF/IDF, DecisionTrees, Sentiment
http://crockpotveggies.com/2015/02/09/automating-tinder-with-eigenfaces.html
Bonus: Demo!
Spark + Stanford CoreNLP Sentiment Analysis
Bonus: Top 100 Country Song Sentiment
Bonus: Surprising Results…?!
Item-to-Item Based Recommendations
Based on Metadata: Genre, Description, Cast, City
Demo!
Item-to-Item-based Recommendations
One-Hot Encoding + K-Means Clustering
One-Hot Encode Tag Feature Vectors
Cluster Movie Tag Feature Vectors
Hyperparameter Tuning (K Clusters?)
Analyze Movie Tag Clusters
User-to-Item Collaborative Filtering
Matrix Factorization
① Factor the large matrix (left) into 2 smaller matrices (right)
② Lower-rank matrices approximate original when multiplied
③ Fill in the missing values of the large matrix
④ Surface k (rank) latent features from user-item interactions
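The four steps above can be sketched with a tiny factorization trained by SGD. Note the swap: the demos in this deck use ALS, whereas this sketch uses plain SGD because it fits in a few lines. The ratings matrix, held-out cells, and hyperparameters (`rank`, `lr`, `reg`, `epochs`) are all assumptions.

```python
import math
import random

def predict(U, V, user, item):
    """Predicted rating = dot product of user and item latent factors."""
    return sum(uf * vf for uf, vf in zip(U[user], V[item]))

def factorize(ratings, n_users, n_items, rank=2, lr=0.02, reg=0.02,
              epochs=2000, seed=0):
    """ratings: list of (user, item, rating). Returns factor matrices (U, V)."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.5, 0.5) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(U, V, u, i)
            for f in range(rank):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V

# Hypothetical 4x4 ratings: users 0-1 like items 0-1, users 2-3 like items 2-3.
full = [[5, 5, 1, 1], [5, 5, 1, 1], [1, 1, 5, 5], [1, 1, 5, 5]]
held_out = {(0, 1), (3, 2)}   # treated as the "missing values" to fill in
ratings = [(u, i, float(full[u][i]))
           for u in range(4) for i in range(4) if (u, i) not in held_out]

U, V = factorize(ratings, n_users=4, n_items=4)
train_rmse = math.sqrt(
    sum((r - predict(U, V, u, i)) ** 2 for u, i, r in ratings) / len(ratings))
filled = {cell: predict(U, V, *cell) for cell in held_out}
```

The rows of `U` and `V` are the latent features. Multiplying them back together approximates the observed ratings and supplies estimates for the two held-out cells.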
Item-to-Item Collaborative Filtering
Famous Amazon Paper circa 2003
Problem
As users grew, user-to-item collaborative filtering didn’t scale
Solution
Item-to-item similarity, nearest neighbors
Offline (Batch)
Generate itemId->List[userId] vectors
Online (Real-time)
From cart, recommend nearest-neighbors in vector space
Demo!
Collaborative Filtering-based Recommendations
Fitting the Matrix Factorization Model
Show ItemFactors Matrix from ALS
Show UserFactors Matrix from ALS
Generating Individual Recommendations
Generating Batch Recommendations
Clustering + Collaborative Filtering Recs
Cluster matrix output from Matrix Factorization
Latent features derived from user-item interaction
Item-to-Item Similarity
Cluster item-factor matrix->
User-to-User Similarity
<-Cluster user-factor matrix
Demo!
Clustering + Collaborative Filtering-based Recommendations
Show ItemFactors Matrix from ALS
Convert to Item Factors -> mllib.Vector
Required by K-Means Clustering Algorithm
Fit and Evaluate K-Means Cluster Model
Measures Closeness Of Points Within Clusters
K = 5 Clusters
Netflix Genres and Clusters
Typical Genres
Documentary, Romance, Comedy, Horror, Action, Adventure
Latent (Hidden) Clusters
Emotionally-Independent Dramas for Hopeless Romantics
Witty Dysfunctional-Family TV Animated Comedies
Romantic Crime Movies based on Classic Literature
Latin American Forbidden-Love Movies
Critically-acclaimed Emotional Drug Movie
Cerebral Military Movie based on Real Life
Sentimental Movies about Horses for Ages 11-12
Gory Canadian Revenge Movies
Raunchy Mad Scientist Comedy
Presentation Outline
① Scaling
② Similarities
③ Recommendations
④ Approximations
⑤ Netflix Recommendations
When to Approximate?
Memory or time constrained queries
Relative vs. exact counts are OK (approx # errors after a release)
Using machine learning or graph algos
Inherently probabilistic and approximate
Streaming aggregations
Inherently sloppy collection (exactly once?)
Approximate as much as you can get away with!
Ask for forgiveness later!!
When NOT to Approximate?
If you’ve ever heard the term…
“Sarbanes-Oxley”
…at the office.
A Few Good Algorithms
You can’t handle
the approximate!
Common to These Algos & Data Structs
Low, fixed size in memory
Store large amount of data
Known error bounds
Tunable tradeoff between size and error
Less memory than Java/Scala collections
Rely on multiple hash functions or operations
Size of hash range defines error
Bloom Filter
Set.contains(key): Boolean
“Hash Multiple Times and Flip the Bits Wherever You Land”
Bloom Filter
Approximate Set.contains(key)
No means No, Yes means Maybe
Elements can only be added
Never updated or removed
Bloom Filter in Action
set(key) contains(key): Boolean
Images by @avibryant
Set.contains(key): TRUE -> maybe contains (other key hashes may overlap)
Set.contains(key): FALSE -> definitely does not contain (no key flipped all bits)
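"Hash multiple times and flip the bits wherever you land" can be sketched as follows. This is an illustrative toy, not Algebird's BloomFilter. The bit-array size, the number of hashes, and the use of seeded SHA-256 as the hash family are assumptions.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        """One bit position per hash function (seeded SHA-256 here)."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True   # bits are only ever set, never cleared

    def might_contain(self, key):
        """False means definitely absent; True means maybe present."""
        return all(self.bits[pos] for pos in self._positions(key))

seen = BloomFilter()
seen.add("Top Gun")
seen.add("Taps")
```

Because bits are never cleared, elements cannot be removed, and overlapping bit positions from other keys are what produce false positives.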
CountMin Sketch
Frequency Count and TopK
“Hash Multiple Times and Add 1 Wherever You Land”
CountMin Sketch (CMS)
Approximate frequency count and TopK for key
i.e. “Heavy Hitters” on Twitter
Matei Zaharia Martin Odersky Donald Trump
CountMin Sketch In Action (TopK Count)
Use Case: TopK movies using total views
add(Top Gun, 2)
add(A Few Good Men, 1)
add(Taps, 1)
getCount(Top Gun): Long
Multiple hash functions (1 hash function per row)
Binary hash output (1 element per column)
x 2 occurrences of “Top Gun” for slightly additional complexity
Overlap: Top Gun, A Few Good Men
Find minimum of all rows
Can overestimate, but never underestimate
Images derived from @avibryant
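A toy CountMin Sketch following the "one hash function per row, take the minimum across rows" description. It is a sketch, not Algebird's implementation. The depth, width, and seeded SHA-256 hashing are assumptions.

```python
import hashlib

class CountMinSketch:
    def __init__(self, depth=4, width=256):
        self.depth = depth      # number of rows, one hash function each
        self.width = width      # number of counters per row
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, row, key):
        digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).hexdigest()
        return int(digest, 16) % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._column(row, key)] += count

    def get_count(self, key):
        """Minimum across rows: may overestimate, never underestimates."""
        return min(self.table[row][self._column(row, key)]
                   for row in range(self.depth))

views = CountMinSketch()
views.add("Top Gun", 2)
views.add("A Few Good Men", 1)
views.add("Taps", 1)
```

Collisions can only add to a counter, so every row is an upper bound on the true count and the minimum is the tightest of those bounds.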
HyperLogLog
Count Distinct
“Hash Multiple Times and Uniformly Distribute Where You Land”
HyperLogLog (HLL)
Approximate count distinct
Slight twist
Special hash function creates uniform distribution
Hash subsets of data with single, special hash func
Error estimate
14 bits for size of range
m = 2^14 = 16,384 hash slots
error = 1.04/(sqrt(16,384)) = .81%
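The error arithmetic above can be checked directly. The helper name `hll_standard_error` is made up; the formula is the one on the slide.

```python
import math

def hll_standard_error(precision_bits):
    """Standard error 1.04 / sqrt(m) for m = 2**precision_bits hash slots."""
    slots = 2 ** precision_bits
    return 1.04 / math.sqrt(slots)

error_14_bits = hll_standard_error(14)   # m = 16,384 slots
error_10_bits = hll_standard_error(10)   # fewer slots, larger error
```

This is the tunable tradeoff between size and error: each extra 2 bits of precision quadruples the slot count and halves the error.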
Not many of these
HyperLogLog In Action (Count Distinct)
Use Case: Number of distinct users who view a movie
[Figure: hashed user IDs spread across hash ranges (0 to 16, 0 to 32) for “Top Gun: Hour 1”, “Top Gun: Hour 2”, and the merged “Top Gun: Hour 1 + 2”]
Uniform distribution: estimate distinct # of users by inspecting just the beginning
Combine across different scales
Locality Sensitive Hashing
Set Similarity
“Pre-process Items into Buckets, Compare Within Buckets”
Locality Sensitive Hashing (LSH)
Approximate set similarity
Pre-process m rows into b buckets
b << m; b = buckets, m = rows
Hash items multiple times
** Similar items hash to overlapping buckets
** Hash designed to cluster similar items
Compare just contents of buckets
Much smaller cartesian compare
** Compare in parallel !!
Avoids huge cartesian all-pairs compare
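One common way to realize this for set similarity is MinHash signatures split into bands. The slides name MinHash elsewhere but do not show this exact scheme, so treat the banding layout, the hash family (seeded SHA-256), and the tag sets as assumptions.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def minhash_signature(items, num_hashes):
    """For each seeded hash function, keep the minimum hash over the set.
    Assumes a non-empty set of items."""
    return [
        min(int(hashlib.sha256(f"{seed}:{item}".encode("utf-8")).hexdigest(), 16)
            for item in items)
        for seed in range(num_hashes)
    ]

def candidate_pairs(sets_by_id, bands=10, rows=2):
    """Bucket items by each band of their signature; pair up bucket-mates only."""
    buckets = defaultdict(set)
    for key, items in sets_by_id.items():
        signature = minhash_signature(items, bands * rows)
        for band in range(bands):
            chunk = tuple(signature[band * rows:(band + 1) * rows])
            buckets[(band, chunk)].add(key)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs

# Hypothetical movie tag sets.
movies = {
    "A": {"action", "military", "aviation"},
    "B": {"action", "military", "aviation"},
    "C": {"romance", "paris"},
}
pairs = candidate_pairs(movies)
```

Only the candidate pairs need an exact similarity check, which replaces the all-pairs cartesian compare with a much smaller one per bucket.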
Chapter 3: LSH
DIMSUM
Set Similarity
“Pre-process and ignore data that is unlikely to be similar.”
DIMSUM
“Dimension Independent Matrix Square Using MR”
Remove vectors with low probability of similarity
RowMatrix.columnSimilarities(threshold)
Twitter DIMSUM Case Study
40% efficiency gain over brute-force Cosine Sim
Common Tools to Approximate
Twitter Algebird: Composable Library
Redis: Distributed Cache
Apache Spark: Big Data Processing
Twitter Algebird
Algebraic Fundamentals
Parallel
Associative
Composable
Examples
Min, Max, Avg
BloomFilter (Set.contains(key))
HyperLogLog (Count Distinct)
CountMin Sketch (TopK Count)
Redis
Implementation of HyperLogLog (Count Distinct)
12KB per item count
2^64 max # of items
0.81% error
Add user views for given movie
PFADD TopGun_Hour1_HLL user1001 user2009 user3005
PFADD TopGun_Hour1_HLL user3003 user1001
Get distinct count (cardinality) of set
PFCOUNT TopGun_Hour1_HLL
Returns: 4 (distinct users viewed this movie)
Union 2 HyperLogLog Data Structures
PFMERGE TopGun_Hour1_HLL TopGun_Hour2_HLL
ignore duplicates
Tunable
Approximations in Spark Libraries
Spark Core
countByKeyApprox(timeout: Long, confidence: Double)
PartialResult
Spark SQL
approxCountDistinct(column: Column, targetResidual: Float)
approxQuantile(column: Column, quantiles: Seq[Float], targetResidual: Float)
Spark ML
Stratified sampling
sampleByKey(fractions: Map[K, Double])
DIMSUM sampling
Probabilistic sampling reduces amount of shuffle
RowMatrix.columnSimilarities(threshold: Double)
Demo!
Exact Count vs. Approximate HLL and CMS Count
HashSet vs. HyperLogLog (Memory)
HashSet vs. CountMin Sketch (Memory)
Demo!
Exact Similarity vs. Approximate LSH Similarity
Brute Force Cartesian All Pair Similarity
47 seconds
Locality Sensitive Hash All Pair Similarity
6 seconds
Many More Demos!
or
Download Docker Clone on Github
http://advancedspark.com
Presentation Outline
① Scaling
② Similarities
③ Recommendations
④ Approximations
⑤ Netflix Recommendations
Netflix Recommendations
From Ratings to Real-time
Netflix Has a Lot of Data
Netflix has a lot of data about a lot of users and a lot of movies.
Netflix can use this data to buy new movies.
Netflix is global.
Netflix can use this data to choose original programming.
Netflix knows that a lot of people like politics and Kevin Spacey.
My favorite movie: “Harold and Kumar Go to White Castle”
The UK doesn’t have White Castle.
Renamed my favourite movie to: “Harold and Kumar Get the Munchies”
This broke my unit tests!
Summary: Buy NFLX Stock!
Netflix Data Pipeline - Then
v1.0
v2.0
Netflix Data Pipeline – Now (Keystone)
v3.0
9 million events per second
22 GB per second!!
EC2 D2XL
Disk: 6 TB, 475 MB/s
RAM: 30 G
Network: 700 Mbps
Auto-scaling, Fault tolerance
A/B Tests, Trending Now
SAMZA
Splits high and normal priority
Netflix Recommendation Data Pipeline
Throw away batch user factors (U)
Keep batch video factors (V)
Netflix Trending Now (Time-based Recs)
Uses Spark Streaming
Personalized to user (viewing history, past ratings)
Learns and adapts to events (Valentine’s Day)
“VHS”
Number of Plays
Number of Impressions
Calculate Take Rate
Bonus: Pandora Time-based Recs
Work Days
Play familiar music
User is less likely to accept new music
Evenings and Weekends
Play new music
More likely to accept new music
$1 Million Netflix Prize (2006-2009)
Goal
Improve movie predictions by 10% (Root Mean Sq Error)
Test data withheld to calculate RMSE upon submission
5-star Ratings Dataset
(userId, movieId, rating, timestamp)
Winning algorithm(s)
10.06% improvement (RMSE)
Ensemble of 500+ ML models combined with GBDTs
Computationally impractical
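For reference, RMSE and relative improvement can be computed as below. The sample predictions and RMSE values are invented and are not Netflix Prize data.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def improvement(baseline_rmse, new_rmse):
    """Relative RMSE reduction; 0.10 corresponds to the 10% prize target."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Hypothetical held-out ratings and predictions.
actual = [4.0, 3.0, 5.0, 1.0]
predicted = [3.0, 3.0, 4.0, 1.0]
error = rmse(predicted, actual)   # sqrt((1 + 0 + 1 + 0) / 4)
gain = improvement(1.0, 0.9)      # made-up baseline and candidate RMSE
```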
Secrets to the Winning Algorithms
Adjust for the following human bias…
① Alice effect: user rates lower than avg
② Inception effect: movie rated higher than avg
③ Overall mean rating of a movie
④ Number of people who have rated a movie
⑤ Number of days since user’s first rating
⑥ Number of days since movie’s first rating
⑦ Mood, time of day, day of week, season, weather
Netflix Common ML Algorithms
Logistic Regression
Linear Regression
Gradient Boosted Decision Trees
Random Forest
Matrix Factorization
SVD
Restricted Boltzmann Machines
Deep Neural Nets
Markov Models
LDA
Clustering
Ensembles!
Netflix Genres and Clusters
Typical Genres
Documentaries, Romance Comedies, Horror, Action, Adventure
Latent (Hidden) Clusters
Emotionally-Independent Dramas for Hopeless Romantics
Witty Dysfunctional-Family TV Animated Comedies
Romantic Crime Movies based on Classic Literature
Latin American Forbidden-Love Movies
Critically-acclaimed Emotional Drug Movie
Cerebral Military Movie based on Real Life
Sentimental Movies about Horses for Ages 11-12
Gory Canadian Revenge Movies
Raunchy Mad Scientist Comedy
Netflix Social Integration
Post to Facebook after movie start (5 mins)
Recommend to new users based on friends
Helps with Cold Start problem
Netflix Search
No results? No problem… Show similar results!
Utilize extensive DVD Catalog
Metadata search (ElasticSearch)
Named entity recognition (NLP)
Empty searches are opportunity!
Explicit feedback for future recommendations
Content to buy and produce!
Netflix A/B Tests
Users tend to click on images featuring…
Faces with strong emotional expressions
Villains over heroes
Small number of cast members
Netflix Recommendation Serving Layer
Use Case: Recommendation service depends on EVCache
Problem: EVCache cluster goes down or becomes latent!?
Answer: github.com/Netflix/Hystrix Circuit Breaker!
Circuit States
Closed: Service OK
Open: Service DOWN
Fallback to Static
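The closed/open/fallback behavior can be sketched in a few lines. This is not the Hystrix API (Hystrix is a Java library). The class name, failure threshold, and stand-in functions are hypothetical, and the half-open/retry-after-timeout behavior of real circuit breakers is left out.

```python
class CircuitBreaker:
    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.failures = 0   # consecutive failures seen so far

    @property
    def is_open(self):
        """Open circuit: the service is treated as DOWN."""
        return self.failures >= self.failure_threshold

    def call(self, primary, fallback):
        if self.is_open:
            return fallback()        # skip the failing service entirely
        try:
            result = primary()
        except Exception:
            self.failures += 1
            return fallback()
        self.failures = 0            # success keeps the circuit closed
        return result

attempts = []

def cache_lookup():
    """Stand-in for a cache call that is currently failing."""
    attempts.append("cache")
    raise ConnectionError("cache unavailable")

def static_recommendations():
    """Stand-in for a precomputed, non-personalized fallback list."""
    return ["popular-1", "popular-2"]

breaker = CircuitBreaker(failure_threshold=2)
results = [breaker.call(cache_lookup, static_recommendations) for _ in range(4)]
```

After two failures the circuit opens, so the last two calls return the static fallback without touching the cache at all.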
Why Higher Average Ratings 2004+?
2004, Netflix noticed higher ratings on average
Some possible reasons why…
① Significant UI improvements deployed
② New recommendation engine deployed
③
Thank You, Everyone!!
Chris Fregly @cfregly
Research Scientist @ Flux Capacitor AI
San Francisco, California, USA
http://fluxcapacitor.com
Sign up for the Meetup and Book
Contribute to GitHub Repo
Run all Demos using Docker
Find me on LinkedIn, Twitter, GitHub, Email, Fax
Image derived from http://www.duchess-france.org/

More Related Content

What's hot

Melbourne Spark Meetup Dec 09 2015
Melbourne Spark Meetup Dec 09 2015Melbourne Spark Meetup Dec 09 2015
Melbourne Spark Meetup Dec 09 2015Chris Fregly
 
Spark, Similarity, Approximations, NLP, Recommendations - Boulder Denver Spar...
Spark, Similarity, Approximations, NLP, Recommendations - Boulder Denver Spar...Spark, Similarity, Approximations, NLP, Recommendations - Boulder Denver Spar...
Spark, Similarity, Approximations, NLP, Recommendations - Boulder Denver Spar...Chris Fregly
 
Advanced Analytics and Recommendations with Apache Spark - Spark Maryland/DC ...
Advanced Analytics and Recommendations with Apache Spark - Spark Maryland/DC ...Advanced Analytics and Recommendations with Apache Spark - Spark Maryland/DC ...
Advanced Analytics and Recommendations with Apache Spark - Spark Maryland/DC ...Chris Fregly
 
Istanbul Spark Meetup Nov 28 2015
Istanbul Spark Meetup Nov 28 2015Istanbul Spark Meetup Nov 28 2015
Istanbul Spark Meetup Nov 28 2015Chris Fregly
 
Advanced Apache Spark Meetup Approximations and Probabilistic Data Structures...
Advanced Apache Spark Meetup Approximations and Probabilistic Data Structures...Advanced Apache Spark Meetup Approximations and Probabilistic Data Structures...
Advanced Apache Spark Meetup Approximations and Probabilistic Data Structures...Chris Fregly
 
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Chris Fregly
 
Stockholm Spark Meetup Nov 23 2015 Spark After Dark 1.5
Stockholm Spark Meetup Nov 23 2015 Spark After Dark 1.5Stockholm Spark Meetup Nov 23 2015 Spark After Dark 1.5
Stockholm Spark Meetup Nov 23 2015 Spark After Dark 1.5Chris Fregly
 
Dallas DFW Data Science Meetup Jan 21 2016
Dallas DFW Data Science Meetup Jan 21 2016Dallas DFW Data Science Meetup Jan 21 2016
Dallas DFW Data Science Meetup Jan 21 2016Chris Fregly
 
Barcelona Spain Apache Spark Meetup Oct 20, 2015: Spark Streaming, Kafka, MLl...
Barcelona Spain Apache Spark Meetup Oct 20, 2015: Spark Streaming, Kafka, MLl...Barcelona Spain Apache Spark Meetup Oct 20, 2015: Spark Streaming, Kafka, MLl...
Barcelona Spain Apache Spark Meetup Oct 20, 2015: Spark Streaming, Kafka, MLl...Chris Fregly
 
Advanced Apache Spark Meetup Spark and Elasticsearch 02-15-2016
Advanced Apache Spark Meetup Spark and Elasticsearch 02-15-2016Advanced Apache Spark Meetup Spark and Elasticsearch 02-15-2016
Advanced Apache Spark Meetup Spark and Elasticsearch 02-15-2016Chris Fregly
 
Spark Summit East NYC Meetup 02-16-2016
Spark Summit East NYC Meetup 02-16-2016  Spark Summit East NYC Meetup 02-16-2016
Spark Summit East NYC Meetup 02-16-2016 Chris Fregly
 
Chicago Spark Meetup 03 01 2016 - Spark and Recommendations
Chicago Spark Meetup 03 01 2016 - Spark and RecommendationsChicago Spark Meetup 03 01 2016 - Spark and Recommendations
Chicago Spark Meetup 03 01 2016 - Spark and RecommendationsChris Fregly
 
Zurich, Berlin, Vienna Spark and Big Data Meetup Nov 02 2015
Zurich, Berlin, Vienna Spark and Big Data Meetup Nov 02 2015Zurich, Berlin, Vienna Spark and Big Data Meetup Nov 02 2015
Zurich, Berlin, Vienna Spark and Big Data Meetup Nov 02 2015Chris Fregly
 
Helsinki Spark Meetup Nov 20 2015
Helsinki Spark Meetup Nov 20 2015Helsinki Spark Meetup Nov 20 2015
Helsinki Spark Meetup Nov 20 2015Chris Fregly
 
Brussels Spark Meetup Oct 30, 2015: Spark After Dark 1.5:  Real-time, Advanc...
Brussels Spark Meetup Oct 30, 2015:  Spark After Dark 1.5:  Real-time, Advanc...Brussels Spark Meetup Oct 30, 2015:  Spark After Dark 1.5:  Real-time, Advanc...
Brussels Spark Meetup Oct 30, 2015: Spark After Dark 1.5:  Real-time, Advanc...Chris Fregly
 
Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...
Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...
Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...Chris Fregly
 
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...Chris Fregly
 
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...Athens Big Data
 
Scotland Data Science Meetup Oct 13, 2015: Spark SQL, DataFrames, Catalyst, ...
Scotland Data Science Meetup Oct 13, 2015:  Spark SQL, DataFrames, Catalyst, ...Scotland Data Science Meetup Oct 13, 2015:  Spark SQL, DataFrames, Catalyst, ...
Scotland Data Science Meetup Oct 13, 2015: Spark SQL, DataFrames, Catalyst, ...Chris Fregly
 
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...Chris Fregly
 

What's hot (20)

Melbourne Spark Meetup Dec 09 2015
Melbourne Spark Meetup Dec 09 2015Melbourne Spark Meetup Dec 09 2015
Melbourne Spark Meetup Dec 09 2015
 
Spark, Similarity, Approximations, NLP, Recommendations - Boulder Denver Spar...
Spark, Similarity, Approximations, NLP, Recommendations - Boulder Denver Spar...Spark, Similarity, Approximations, NLP, Recommendations - Boulder Denver Spar...
Spark, Similarity, Approximations, NLP, Recommendations - Boulder Denver Spar...
 
Advanced Analytics and Recommendations with Apache Spark - Spark Maryland/DC ...
Advanced Analytics and Recommendations with Apache Spark - Spark Maryland/DC ...Advanced Analytics and Recommendations with Apache Spark - Spark Maryland/DC ...
Advanced Analytics and Recommendations with Apache Spark - Spark Maryland/DC ...
 
Istanbul Spark Meetup Nov 28 2015
Istanbul Spark Meetup Nov 28 2015Istanbul Spark Meetup Nov 28 2015
Istanbul Spark Meetup Nov 28 2015
 
Advanced Apache Spark Meetup Approximations and Probabilistic Data Structures...
Advanced Apache Spark Meetup Approximations and Probabilistic Data Structures...Advanced Apache Spark Meetup Approximations and Probabilistic Data Structures...
Advanced Apache Spark Meetup Approximations and Probabilistic Data Structures...
 
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
 
Stockholm Spark Meetup Nov 23 2015 Spark After Dark 1.5
Stockholm Spark Meetup Nov 23 2015 Spark After Dark 1.5Stockholm Spark Meetup Nov 23 2015 Spark After Dark 1.5
Stockholm Spark Meetup Nov 23 2015 Spark After Dark 1.5
 
Dallas DFW Data Science Meetup Jan 21 2016
Dallas DFW Data Science Meetup Jan 21 2016Dallas DFW Data Science Meetup Jan 21 2016
Dallas DFW Data Science Meetup Jan 21 2016
 
Barcelona Spain Apache Spark Meetup Oct 20, 2015: Spark Streaming, Kafka, MLl...
Barcelona Spain Apache Spark Meetup Oct 20, 2015: Spark Streaming, Kafka, MLl...Barcelona Spain Apache Spark Meetup Oct 20, 2015: Spark Streaming, Kafka, MLl...
Barcelona Spain Apache Spark Meetup Oct 20, 2015: Spark Streaming, Kafka, MLl...
 
Advanced Apache Spark Meetup Spark and Elasticsearch 02-15-2016
Advanced Apache Spark Meetup Spark and Elasticsearch 02-15-2016Advanced Apache Spark Meetup Spark and Elasticsearch 02-15-2016
Advanced Apache Spark Meetup Spark and Elasticsearch 02-15-2016
 
Spark Summit East NYC Meetup 02-16-2016
Spark Summit East NYC Meetup 02-16-2016  Spark Summit East NYC Meetup 02-16-2016
Spark Summit East NYC Meetup 02-16-2016
 
Chicago Spark Meetup 03 01 2016 - Spark and Recommendations
Chicago Spark Meetup 03 01 2016 - Spark and RecommendationsChicago Spark Meetup 03 01 2016 - Spark and Recommendations
Chicago Spark Meetup 03 01 2016 - Spark and Recommendations
 
Zurich, Berlin, Vienna Spark and Big Data Meetup Nov 02 2015
Zurich, Berlin, Vienna Spark and Big Data Meetup Nov 02 2015Zurich, Berlin, Vienna Spark and Big Data Meetup Nov 02 2015
Zurich, Berlin, Vienna Spark and Big Data Meetup Nov 02 2015
 
Helsinki Spark Meetup Nov 20 2015
Helsinki Spark Meetup Nov 20 2015Helsinki Spark Meetup Nov 20 2015
Helsinki Spark Meetup Nov 20 2015
 
Brussels Spark Meetup Oct 30, 2015: Spark After Dark 1.5:  Real-time, Advanc...
Brussels Spark Meetup Oct 30, 2015:  Spark After Dark 1.5:  Real-time, Advanc...Brussels Spark Meetup Oct 30, 2015:  Spark After Dark 1.5:  Real-time, Advanc...
Brussels Spark Meetup Oct 30, 2015: Spark After Dark 1.5:  Real-time, Advanc...
 
Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...
Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...
Advanced Apache Spark Meetup Data Sources API Cassandra Spark Connector Spark...
 
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...
 
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
 
Scotland Data Science Meetup Oct 13, 2015: Spark SQL, DataFrames, Catalyst, ...
Scotland Data Science Meetup Oct 13, 2015:  Spark SQL, DataFrames, Catalyst, ...Scotland Data Science Meetup Oct 13, 2015:  Spark SQL, DataFrames, Catalyst, ...
Scotland Data Science Meetup Oct 13, 2015: Spark SQL, DataFrames, Catalyst, ...
 
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
 

Viewers also liked

Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...
Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...
Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...Chris Fregly
 
TensorFlow 深度學習快速上手班--電腦視覺應用
TensorFlow 深度學習快速上手班--電腦視覺應用TensorFlow 深度學習快速上手班--電腦視覺應用
TensorFlow 深度學習快速上手班--電腦視覺應用Mark Chang
 
High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup...
High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup...High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup...
High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup...Chris Fregly
 
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...Chris Fregly
 
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...Chris Fregly
 
陸永祥/全球網路攝影機帶來的機會與挑戰
陸永祥/全球網路攝影機帶來的機會與挑戰陸永祥/全球網路攝影機帶來的機會與挑戰
陸永祥/全球網路攝影機帶來的機會與挑戰台灣資料科學年會
 
The Genome Assembly Problem
The Genome Assembly ProblemThe Genome Assembly Problem
The Genome Assembly ProblemMark Chang
 
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...Sri Ambati
 
Machine Learning Essentials (dsth Meetup#3)
Machine Learning Essentials (dsth Meetup#3)Machine Learning Essentials (dsth Meetup#3)
Machine Learning Essentials (dsth Meetup#3)Data Science Thailand
 
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...Chris Fregly
 
Machine Learning Preliminaries and Math Refresher
Machine Learning Preliminaries and Math RefresherMachine Learning Preliminaries and Math Refresher
Machine Learning Preliminaries and Math Refresherbutest
 
Big Data Spain - Nov 17 2016 - Madrid Continuously Deploy Spark ML and Tensor...
Big Data Spain - Nov 17 2016 - Madrid Continuously Deploy Spark ML and Tensor...Big Data Spain - Nov 17 2016 - Madrid Continuously Deploy Spark ML and Tensor...
Big Data Spain - Nov 17 2016 - Madrid Continuously Deploy Spark ML and Tensor...Chris Fregly
 
Machine Learning without the Math: An overview of Machine Learning
Machine Learning without the Math: An overview of Machine LearningMachine Learning without the Math: An overview of Machine Learning
Machine Learning without the Math: An overview of Machine LearningArshad Ahmed
 
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Alex Pinto
 
高嘉良/Open Innovation as Strategic Plan
高嘉良/Open Innovation as Strategic Plan高嘉良/Open Innovation as Strategic Plan
高嘉良/Open Innovation as Strategic Plan台灣資料科學年會
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial NetworksMark Chang
 
DRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive WriterDRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive WriterMark Chang
 
NTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANsNTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANsMark Chang
 

Viewers also liked (20)

Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...
Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...
Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...
 
TensorFlow 深度學習快速上手班--電腦視覺應用
TensorFlow 深度學習快速上手班--電腦視覺應用TensorFlow 深度學習快速上手班--電腦視覺應用
TensorFlow 深度學習快速上手班--電腦視覺應用
 
High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup...
High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup...High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup...
High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup...
 
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
 
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...
 
陸永祥/全球網路攝影機帶來的機會與挑戰
陸永祥/全球網路攝影機帶來的機會與挑戰陸永祥/全球網路攝影機帶來的機會與挑戰
陸永祥/全球網路攝影機帶來的機會與挑戰
 
The Genome Assembly Problem
The Genome Assembly ProblemThe Genome Assembly Problem
The Genome Assembly Problem
 
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
 
Machine Learning Essentials (dsth Meetup#3)
Machine Learning Essentials (dsth Meetup#3)Machine Learning Essentials (dsth Meetup#3)
Machine Learning Essentials (dsth Meetup#3)
 
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...
 
Machine Learning Preliminaries and Math Refresher
Machine Learning Preliminaries and Math RefresherMachine Learning Preliminaries and Math Refresher
Machine Learning Preliminaries and Math Refresher
 
Big Data Spain - Nov 17 2016 - Madrid Continuously Deploy Spark ML and Tensor...
Big Data Spain - Nov 17 2016 - Madrid Continuously Deploy Spark ML and Tensor...Big Data Spain - Nov 17 2016 - Madrid Continuously Deploy Spark ML and Tensor...
Big Data Spain - Nov 17 2016 - Madrid Continuously Deploy Spark ML and Tensor...
 
Machine Learning without the Math: An overview of Machine Learning
Machine Learning without the Math: An overview of Machine LearningMachine Learning without the Math: An overview of Machine Learning
Machine Learning without the Math: An overview of Machine Learning
 
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
 
02 math essentials
02 math essentials02 math essentials
02 math essentials
 
高嘉良/Open Innovation as Strategic Plan
高嘉良/Open Innovation as Strategic Plan高嘉良/Open Innovation as Strategic Plan
高嘉良/Open Innovation as Strategic Plan
 
[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial Networks
 
DRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive WriterDRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive Writer
 
NTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANsNTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANs
 

Similar to Bringing AI Back to the Future with Flux Capacitor

Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...Data Con LA
 
Atlanta Spark User Meetup 09 22 2016
Atlanta Spark User Meetup 09 22 2016Atlanta Spark User Meetup 09 22 2016
Atlanta Spark User Meetup 09 22 2016Chris Fregly
 
When OLAP Meets Real-Time, What Happens in eBay?
When OLAP Meets Real-Time, What Happens in eBay?When OLAP Meets Real-Time, What Happens in eBay?
When OLAP Meets Real-Time, What Happens in eBay?DataWorks Summit
 
Drilling the Async Library
Drilling the Async LibraryDrilling the Async Library
Drilling the Async LibraryKnoldus Inc.
 
Android RenderScript on LLVM
Android RenderScript on LLVMAndroid RenderScript on LLVM
Android RenderScript on LLVMJohn Lee
 
Denker - Pharo: Present and Future - 2009-07-14
Denker - Pharo: Present and Future - 2009-07-14Denker - Pharo: Present and Future - 2009-07-14
Denker - Pharo: Present and Future - 2009-07-14CHOOSE
 
Talk: The Present and Future of Pharo
Talk: The Present and Future of PharoTalk: The Present and Future of Pharo
Talk: The Present and Future of PharoMarcus Denker
 
KFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AIKFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AIAnimesh Singh
 
AST for JavaScript developers
AST for JavaScript developersAST for JavaScript developers
AST for JavaScript developersBohdan Liashenko
 
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...Chris Fregly
 
Continuous Integration and Deployment Best Practices on AWS
Continuous Integration and Deployment Best Practices on AWSContinuous Integration and Deployment Best Practices on AWS
Continuous Integration and Deployment Best Practices on AWSDanilo Poccia
 
ABD322_Implementing a Flight Simulator Interface Using AI, Virtual Reality, a...
ABD322_Implementing a Flight Simulator Interface Using AI, Virtual Reality, a...ABD322_Implementing a Flight Simulator Interface Using AI, Virtual Reality, a...
ABD322_Implementing a Flight Simulator Interface Using AI, Virtual Reality, a...Amazon Web Services
 
Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0Makoto Yui
 
[Tokyo Scala User Group] Akka Streams & Reactive Streams (0.7)
[Tokyo Scala User Group] Akka Streams & Reactive Streams (0.7)[Tokyo Scala User Group] Akka Streams & Reactive Streams (0.7)
[Tokyo Scala User Group] Akka Streams & Reactive Streams (0.7)Konrad Malawski
 
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Big Data Spain
 
Introducing Arc: A Common Intermediate Language for Unified Batch and Stream...
Introducing Arc:  A Common Intermediate Language for Unified Batch and Stream...Introducing Arc:  A Common Intermediate Language for Unified Batch and Stream...
Introducing Arc: A Common Intermediate Language for Unified Batch and Stream...Flink Forward
 
A Physical Units Library for the Next C++
A Physical Units Library for the Next C++A Physical Units Library for the Next C++
A Physical Units Library for the Next C++Mateusz Pusz
 

Similar to Bringing AI Back to the Future with Flux Capacitor (20)

Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
 
Atlanta Spark User Meetup 09 22 2016
Atlanta Spark User Meetup 09 22 2016Atlanta Spark User Meetup 09 22 2016
Atlanta Spark User Meetup 09 22 2016
 
When OLAP Meets Real-Time, What Happens in eBay?
When OLAP Meets Real-Time, What Happens in eBay?When OLAP Meets Real-Time, What Happens in eBay?
When OLAP Meets Real-Time, What Happens in eBay?
 
Drilling the Async Library
Drilling the Async LibraryDrilling the Async Library
Drilling the Async Library
 
Android RenderScript on LLVM
Android RenderScript on LLVMAndroid RenderScript on LLVM
Android RenderScript on LLVM
 
Denker - Pharo: Present and Future - 2009-07-14
Denker - Pharo: Present and Future - 2009-07-14Denker - Pharo: Present and Future - 2009-07-14
Denker - Pharo: Present and Future - 2009-07-14
 
Talk: The Present and Future of Pharo
Talk: The Present and Future of PharoTalk: The Present and Future of Pharo
Talk: The Present and Future of Pharo
 
KFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AIKFServing Payload Logging for Trusted AI
KFServing Payload Logging for Trusted AI
 
AST for JavaScript developers
AST for JavaScript developersAST for JavaScript developers
AST for JavaScript developers
 
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...
 
Continuous Integration and Deployment Best Practices on AWS
Continuous Integration and Deployment Best Practices on AWSContinuous Integration and Deployment Best Practices on AWS
Continuous Integration and Deployment Best Practices on AWS
 
ABD322_Implementing a Flight Simulator Interface Using AI, Virtual Reality, a...
ABD322_Implementing a Flight Simulator Interface Using AI, Virtual Reality, a...ABD322_Implementing a Flight Simulator Interface Using AI, Virtual Reality, a...
ABD322_Implementing a Flight Simulator Interface Using AI, Virtual Reality, a...
 
Let's Get to the Rapids
Let's Get to the RapidsLet's Get to the Rapids
Let's Get to the Rapids
 
Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0
 
[Tokyo Scala User Group] Akka Streams & Reactive Streams (0.7)
[Tokyo Scala User Group] Akka Streams & Reactive Streams (0.7)[Tokyo Scala User Group] Akka Streams & Reactive Streams (0.7)
[Tokyo Scala User Group] Akka Streams & Reactive Streams (0.7)
 
Spark streaming + kafka 0.10
Spark streaming + kafka 0.10Spark streaming + kafka 0.10
Spark streaming + kafka 0.10
 
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
 
3 D Visual Avis Project
3 D Visual Avis Project3 D Visual Avis Project
3 D Visual Avis Project
 
Introducing Arc: A Common Intermediate Language for Unified Batch and Stream...
Introducing Arc:  A Common Intermediate Language for Unified Batch and Stream...Introducing Arc:  A Common Intermediate Language for Unified Batch and Stream...
Introducing Arc: A Common Intermediate Language for Unified Batch and Stream...
 
A Physical Units Library for the Next C++
A Physical Units Library for the Next C++A Physical Units Library for the Next C++
A Physical Units Library for the Next C++
 

More from Chris Fregly

AWS reInvent 2022 reCap AI/ML and Data
AWS reInvent 2022 reCap AI/ML and DataAWS reInvent 2022 reCap AI/ML and Data
AWS reInvent 2022 reCap AI/ML and DataChris Fregly
 
Pandas on AWS - Let me count the ways.pdf
Pandas on AWS - Let me count the ways.pdfPandas on AWS - Let me count the ways.pdf
Pandas on AWS - Let me count the ways.pdfChris Fregly
 
Ray AI Runtime (AIR) on AWS - Data Science On AWS Meetup
Ray AI Runtime (AIR) on AWS - Data Science On AWS MeetupRay AI Runtime (AIR) on AWS - Data Science On AWS Meetup
Ray AI Runtime (AIR) on AWS - Data Science On AWS MeetupChris Fregly
 
Smokey and the Multi-Armed Bandit Featuring BERT Reynolds Updated
Smokey and the Multi-Armed Bandit Featuring BERT Reynolds UpdatedSmokey and the Multi-Armed Bandit Featuring BERT Reynolds Updated
Smokey and the Multi-Armed Bandit Featuring BERT Reynolds UpdatedChris Fregly
 
Amazon reInvent 2020 Recap: AI and Machine Learning
Amazon reInvent 2020 Recap:  AI and Machine LearningAmazon reInvent 2020 Recap:  AI and Machine Learning
Amazon reInvent 2020 Recap: AI and Machine LearningChris Fregly
 
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod...
Waking the Data Scientist at 2am:  Detect Model Degradation on Production Mod...Waking the Data Scientist at 2am:  Detect Model Degradation on Production Mod...
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod...Chris Fregly
 
Quantum Computing with Amazon Braket
Quantum Computing with Amazon BraketQuantum Computing with Amazon Braket
Quantum Computing with Amazon BraketChris Fregly
 
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-PersonChris Fregly
 
AWS Re:Invent 2019 Re:Cap
AWS Re:Invent 2019 Re:CapAWS Re:Invent 2019 Re:Cap
AWS Re:Invent 2019 Re:CapChris Fregly
 
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
Bringing AI Back to the Future with Flux Capacitor

  • 1. Flux Capacitor AI: Bringing AI Back to the Future! advancedspark.com
  • 2. Who Am I? Streaming Data Engineer, Netflix OSS Committer. Data Solutions Engineer, Apache Contributor. Principal Data Solutions Engineer, IBM Technology Center. Meetup Organizer, Advanced Apache Meetup. Book Author, Advanced . Due 2016.
  • 3. Advanced Apache Spark Meetup (http://advancedspark.com). Meetup Metrics: Top 10 Most-active Spark Meetup! 3200+ Members in just 9 mos!! 3700+ Docker downloads (demos). Meetup Mission: Code deep-dive into Spark and related open source projects; Surface key patterns and idioms; Focus on distributed systems, scale, and performance.
  • 4. Live, Interactive Demo! Audience Participation Required!! Cell Phone Compatible!!! demo.advancedspark.com
  • 5. http://demo.advancedspark.com [Architecture diagram labels: End User, ElasticSearch, Spark ML, Data Scientist; Kafka, Spark Streaming, Cassandra, Redis, Zeppelin, iPython]
  • 6. Presentation Outline: ① Scaling ② Similarities ③ Recommendations ④ Approximations ⑤ Netflix Recommendations
  • 7. Scaling with Parallelism [Figure labels: Peter; O(log n); Worker Nodes]
  • 8. Parallelism with Composability (Worker 1, Worker 2). Max: (a max b max c max d) == (a max b) max (c max d). Set Union: (a U b U c U d) == (a U b) U (c U d). Addition: (a + b + c + d) == (a + b) + (c + d). Multiply: (a * b * c * d) == (a * b) * (c * d). Collect at Driver. What about Division and Average?
  • 9. What about Division? Division: (a / b / c / d) != (a / b) / (c / d). For example, (3 / 4 / 7 / 8) != (3 / 4) / (7 / 8), because (((3 / 4) / 7) / 8) != ((3 * 8) / (4 * 7)), that is, 0.0134 != 0.857. Not Composable. “Divide like an Egyptian”: What were the Egyptians thinking?!
  • 10. What about Average? Overall AVG over (value, count) pairs (3, 1), (5, 1), (5, 1), (7, 1): (3 + 5 + 5 + 7) / (1 + 1 + 1 + 1) == 20 / 4 == 5. Add, Add, Add? Composable! Single-Node Divide at the End? Doesn’t need to be Composable! AVG (3, 5, 5, 7) == 5. Pairwise AVG: (3 + 5) / 2 + (5 + 7) / 2 == 8 / 2 + 12 / 2 == 20 / 2 == 10 != 5. Divide, Add, Divide? Not Composable.
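The composable form of the average on slide 10 can be sketched in a few lines. This is a minimal illustration, not the deck's demo code: the function names and the split of the values across two workers are assumptions made for the example. Each worker only adds (sum, count) pairs, and the single division happens once at the end.

```python
from functools import reduce

def to_pair(value):
    # Represent each value as a (sum, count) pair, as on the slide: (3, 1), (5, 1), ...
    return (value, 1)

def combine(a, b):
    # Adding pairs is associative, so partial results can be merged in any grouping.
    return (a[0] + b[0], a[1] + b[1])

def partition_sum_count(values):
    # What one worker would compute over its own partition (hypothetical helper).
    return reduce(combine, map(to_pair, values), (0, 0))

# Assumed split of the slide's values (3, 5, 5, 7) across two workers.
worker_1 = partition_sum_count([3, 5])
worker_2 = partition_sum_count([5, 7])

# Merge the partial pairs, then divide exactly once on a single node.
total, count = combine(worker_1, worker_2)
average = total / count

# The non-composable version from the slide: divide, add, divide.
pairwise = (3 + 5) / 2 + (5 + 7) / 2

print(average, pairwise)
```

The design choice is to postpone the only non-associative step (division) until all partial results have been collected.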
  • 11. Presentation Outline: ① Scaling ② Similarities ③ Recommendations ④ Approximations ⑤ Netflix Recommendations
  • 12. Similarities
  • 13. Euclidean Similarity: Exists in Euclidean, flat space. Based on Euclidean distance. Linear measure. Bias towards magnitude.
  • 14. Cosine Similarity: Angular measure. Adjusts for Euclidean magnitude bias. Normalize to unit vectors in all dimensions. Used with real-valued vectors (versus binary). org.jblas.DoubleMatrix
  • 15. Jaccard Similarity: Set similarity measurement. Set intersection / set union. Bias towards popularity. Works with binary vectors.
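The three measures on slides 13 to 15 can be compared side by side. The sketch below uses plain Python rather than the org.jblas.DoubleMatrix class the slide mentions; the vectors and tag sets are made-up inputs chosen to show the magnitude bias of Euclidean distance.

```python
import math

def euclidean_distance(a, b):
    # Straight-line distance; grows with vector magnitude.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Angle-based measure; dividing by the norms removes the magnitude bias.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def jaccard_similarity(s, t):
    # Set intersection divided by set union.
    s, t = set(s), set(t)
    if not s and not t:
        return 1.0
    return len(s & t) / len(s | t)

# Hypothetical inputs: same direction, very different magnitude.
light = [1.0, 2.0]
heavy = [10.0, 20.0]
distance = euclidean_distance(light, heavy)
cosine = cosine_similarity(light, heavy)

# Hypothetical tag sets sharing two of four distinct tags.
jaccard = jaccard_similarity({"action", "sci-fi", "desert"},
                             {"action", "sci-fi", "romance"})

print(distance, cosine, jaccard)
```

The two vectors are far apart by Euclidean distance yet point in the same direction, which is why the cosine measure treats them as essentially identical.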
  • 16. Log Likelihood Similarity: Adjusts for popularity bias. Netflix “Shawshank” problem.
  • 17. Word Similarity. Edit Distance: Misspellings and autocorrect. Word2Vec: Similar words are defined by similar contexts in vector space. [Figure labels: English, Spanish]
  • 18. Demo! Find Synonyms with Word2Vec
  • 19. Find Synonyms using Word2Vec
  • 20. Document Similarity. TF/IDF (Term Freq / Inverse Document Freq): Used by most search engines. Doc2Vec: Similar documents are determined by similar contexts.
  • 21. Bonus! Text Rank Document Summary. Text Rank (aka Sentence Rank): Surface summary sentences; TF/IDF + Similarity Graph + PageRank. Most similar sentence to all other sentences: TF/IDF + Similarity Graph. Most influential sentences: PageRank.
  • 22. Similarity Pathways (Recommendations): Best recommendations for 2 (or more) people. “You like Mad Max. I like Message in a Bottle. We might like a movie similar to both.” Item-to-Item Similarity Graph + Dijkstra Heaviest Path.
  • 23. Demo! Similarity Pathway for Movie Recommendations
  • 24. Load Movies with Tags into DataFrame (My Choice, Their Choice)
  • 25. Item-to-Item Tag Jaccard Similarity Based on Tags: Calculate Jaccard Similarity (Tag Set Similarity). Must be Above the Given Jaccard Similarity Threshold.
  • 26. Item-to-Item Tag Similarity Graph: Edge Weights == Jaccard Similarity (Based on Tag Sets)
  • 27. Use Dijkstra to Find Heaviest Pathway
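The slides do not show how the "heaviest" pathway is defined, so the sketch below makes an explicit assumption: the heaviest path is the one that maximizes the product of the Jaccard edge weights. Under that assumption, each similarity in (0, 1] can be turned into a non-negative cost of -log(similarity), and ordinary shortest-path Dijkstra then finds the heaviest path. The graph, the intermediate movie names, and all weights are hypothetical.

```python
import heapq
import math

def heaviest_path(graph, source, target):
    """graph: dict of node -> dict of neighbor -> similarity in (0, 1]."""
    best_cost = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    visited = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, similarity in graph.get(node, {}).items():
            # -log turns "maximize the product" into "minimize the sum".
            new_cost = cost - math.log(similarity)
            if new_cost < best_cost.get(neighbor, math.inf):
                best_cost[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best_cost:
        return None, 0.0
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return path, math.exp(-best_cost[target])

# Hypothetical undirected similarity graph (weights are invented for illustration).
graph = {
    "Mad Max": {"Movie A": 0.9, "Movie C": 0.4},
    "Movie A": {"Mad Max": 0.9, "Movie B": 0.5},
    "Movie B": {"Movie A": 0.5, "Message in a Bottle": 0.7},
    "Movie C": {"Mad Max": 0.4, "Message in a Bottle": 0.6},
    "Message in a Bottle": {"Movie B": 0.7, "Movie C": 0.6},
}

path, score = heaviest_path(graph, "Mad Max", "Message in a Bottle")
print(path, score)
```

Movies in the middle of the returned path are candidates that are similar to both endpoints. A different definition of "heaviest" (for example, the sum of weights) would need a different transformation.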
  • 28. Calculating Exact Similarity. Brute-Force Similarity: Cartesian Product; O(n^2) shuffle and compute; aka All-pairs, Pair-wise, Similarity Join.
  • 29. Calculating Approximate Similarity. Goal: Reduce Shuffle. Approximate Similarity: Sampling; Bucketing or Clustering (Bucket By Genre); Ignore low-similarity probability. Locality Sensitive Hashing: Twitter Algebird MinHash.
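To make the MinHash idea on slide 29 concrete, here is a small pure-Python sketch. It does not use Twitter Algebird; the hashing scheme (SHA-256 salted with a seed), the signature length of 256, and the tag sets are all assumptions chosen for the illustration. The fraction of matching signature positions estimates the Jaccard similarity without comparing the full sets.

```python
import hashlib

def stable_hash(seed, item):
    # Deterministic hash (unlike built-in hash(), which is salted per process).
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def minhash_signature(items, num_hashes=256):
    # One minimum per seeded hash function.
    return [min(stable_hash(seed, item) for item in items)
            for seed in range(num_hashes)]

def estimate_jaccard(sig_a, sig_b):
    # Two sets share a minimum with probability equal to their Jaccard similarity.
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)

# Hypothetical tag sets with 30 shared tags out of 90 distinct tags.
tags_a = {f"tag{i}" for i in range(0, 60)}
tags_b = {f"tag{i}" for i in range(30, 90)}

exact = len(tags_a & tags_b) / len(tags_a | tags_b)
sig_a = minhash_signature(tags_a)
sig_b = minhash_signature(tags_b)
estimate = estimate_jaccard(sig_a, sig_b)

print(exact, estimate)
```

In a locality sensitive hashing scheme, the signatures would additionally be split into bands so that only items sharing a band are compared, which is what reduces the shuffle.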
  • 30. Presentation Outline: ① Scaling ② Similarities ③ Recommendations ④ Approximations ⑤ Netflix Recommendations
  • 31. Recommendations
  • 32. Basic Terminology. User: User seeking recommendations. Item: Item being recommended. Explicit User Feedback: user knows they are rating or liking, can choose to dislike. Implicit User Feedback: user not explicitly aware, cannot dislike (click, hover, etc). Instances: Rows of user feedback/input data. Overfitting: Training a model too closely to the training data & hyperparameters. Hold Out Split: Holding out some of the instances to avoid overfitting. Features: Columns of instance rows (of feedback/input data). Cold Start Problem: Not enough data to personalize (new). Hyperparameter: Model-specific config knobs for tuning (tree depth, iterations). Model Evaluation: Compare predictions to actual values of hold out split. Feature Engineering: Modify, reduce, combine features. Loss Function: Function we’re trying to minimize, such as least-squared error for Linear Regression. Cross Entropy: Loss function used for classification algorithms such as Logistic Regression. Optimizer: Technique to optimize loss function, such as Stochastic Gradient Descent (SGD).
  • 33. Stochastic Gradient Descent (SGD): Optimizes Loss Function. Least Squared Error: between predicted and actual value. Cross Entropy: Log Likelihood between predicted and actual probability. [Figure labels: 2-Dimensional, 3-Dimensional]
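As a rough illustration of slide 33, the sketch below runs stochastic gradient descent on a squared-error loss for a one-feature linear model. The data (points on the line y = 2x + 1), the learning rate, the epoch count, and the seed are all assumptions for the example, not values from the talk.

```python
import random

def sgd_linear_regression(points, learning_rate=0.05, epochs=500, seed=42):
    rng = random.Random(seed)
    weight, bias = 0.0, 0.0
    data = list(points)
    for _ in range(epochs):
        # "Stochastic": visit the instances one at a time in random order.
        rng.shuffle(data)
        for x, y in data:
            error = (weight * x + bias) - y
            # Gradient of 0.5 * error^2 with respect to weight and bias.
            weight -= learning_rate * error * x
            bias -= learning_rate * error
    return weight, bias

# Hypothetical noise-free training data: y = 2x + 1 for x in [0, 1].
points = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(11)]
weight, bias = sgd_linear_regression(points)

print(weight, bias)
```

Swapping the squared-error gradient for the cross-entropy gradient would give the classification case the slide mentions; the update loop itself stays the same.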
  • 34. Features. Binary: True or False. Numeric Discrete: Integers. Numeric: Real Values. Binning: Convert Continuous into Discrete (Time of Day->Morning, Afternoon). Categorical Ordinal: Size (Small->Medium->Large), Ratings (1->5). Categorical Nominal: Independent, Favorite Sports Teams, Dating Spots. Temporal: Time-based, Time of Day, Binge Viewing. Text: Movie Titles, Genres, Tags, Reviews (Tokenize, Stop Words, Stemming). Media: Images, Audio, Video. Geographic: (Longitude, Latitude), Geohash. Latent: Hidden Features within Data (Collaborative Filtering). Derived: Age of Movie, Duration of User Subscription.
  • 35. Feature Engineering. Dimension Reduction: Reduce number of features in feature space. Principal Component Analysis (PCA): Find principal features that best describe data variance; Peel dimensional layers back. One-Hot Encoding: Convert nominal categorical feature values into 0’s and 1’s; Remove any numerical relationship between categories; Convert Each Item to Binary Vector with Single 1.0 Column. Bears -> 1 --> Bears -> [1.0, 0.0, 0.0]; 49’ers -> 2 --> 49’ers -> [0.0, 1.0, 0.0]; Steelers -> 3 --> Steelers -> [0.0, 0.0, 1.0].
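The one-hot example on slide 35 can be reproduced with a short sketch. The helper names are hypothetical, and the team names are written without the apostrophe for simplicity; the column order simply follows the order in which categories are first seen.

```python
def fit_one_hot(categories):
    # Assign each distinct category its own column, in first-seen order.
    index = {}
    for category in categories:
        if category not in index:
            index[category] = len(index)
    return index

def one_hot(category, index):
    # Binary vector with a single 1.0 in the category's column.
    vector = [0.0] * len(index)
    vector[index[category]] = 1.0
    return vector

teams = ["Bears", "49ers", "Steelers"]
index = fit_one_hot(teams)
encoded = {team: one_hot(team, index) for team in teams}

print(encoded)
```

Unlike the integer codes 1, 2, 3, the resulting vectors are all equally far apart, so no ordering between teams is implied.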
  • 36. Feature Normalization & Standardization. Goal: Scale features to standard size; Prevent boundless features; Helps avoid overfitting; Required by many ML algos. Normalize Features: Calculate L1 (or L2, etc) norm, then divide each element by it. Standardize Features: Apply standard normal transformation (mean->0, stddev->1). org.apache.spark.ml.feature.[Normalizer, StandardScaler] http://www.mathsisfun.com/data/standard-normal-distribution.html
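The two transformations on slide 36 differ in what they operate on: normalization rescales one vector by its norm, while standardization rescales one feature column by its mean and standard deviation. The sketch below is plain Python, not the Spark Normalizer or StandardScaler classes, and it assumes the population standard deviation; a library may use the sample version instead. The input values are illustrative.

```python
import math

def l2_normalize(vector):
    # Divide each element by the vector's L2 norm so the result has length 1.
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)

def standardize(column):
    # Shift to mean 0 and scale to standard deviation 1 (population stddev assumed).
    mean = sum(column) / len(column)
    variance = sum((x - mean) ** 2 for x in column) / len(column)
    stddev = math.sqrt(variance)
    if not stddev:
        return [0.0] * len(column)
    return [(x - mean) / stddev for x in column]

unit = l2_normalize([3.0, 4.0])
scaled = standardize([3.0, 5.0, 5.0, 7.0])

print(unit, scaled)
```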
  • 37. Non-Personalized Recommendations
  • 38. Cold Start Problem. “Cold Start” problem: New user, don’t know their preferences, must show something! Movies with highest-rated actors: Top K aggregations. Facebook social graph: Friend-based recommendations. Most desirable singles: PageRank of likes and dislikes.
  • 39. Demo! GraphFrame PageRank
  • 40. Example: Dating Site “Like” Graph
  • 41. PageRank of Top Influencers
  • 42. Personalized Recommendations
  • 43. Demo! Personalized PageRank
  • 44. Personalized PageRank: Outbound Links. 0.15 = (1 - 0.85 “Damping Factor”). 85% Probability: Choose Among Outbound Network. 15% Probability: Choose Self or Random. [Figure label: 85% Among Outbound Network]
  • 45. Personalized PageRank: No Outbound. 0.15 = (1 - 0.85 “Damping Factor”). 85% Probability: Choose Among Outbound Network. 15% Probability: Choose Self or Random. [Figure label: 85% Among No Outbound Network!!]
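Slides 44 and 45 describe the random-walk view of personalized PageRank. The sketch below is one possible reading, not the GraphFrame implementation used in the demo: with probability 0.85 the walker follows an outbound link, with probability 0.15 it jumps back to the source user ("self"), and a node with no outbound links sends all of its rank back to the source. The like graph and user names are hypothetical.

```python
def personalized_pagerank(graph, source, damping=0.85, iterations=100):
    """graph: dict of node -> list of nodes it links to (likes)."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    # All rank starts at the source user.
    rank = {node: (1.0 if node == source else 0.0) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: 0.0 for node in nodes}
        for node in nodes:
            outbound = graph.get(node, [])
            if outbound:
                # 85%: spread rank evenly among the outbound network.
                share = damping * rank[node] / len(outbound)
                for target in outbound:
                    new_rank[target] += share
                # 15%: jump back to the source (assumed teleport target).
                new_rank[source] += (1.0 - damping) * rank[node]
            else:
                # No outbound links: assume all rank returns to the source.
                new_rank[source] += rank[node]
        rank = new_rank
    return rank

# Hypothetical "like" graph for a dating site.
likes = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": [],
    "dave": ["alice"],
}
rank = personalized_pagerank(likes, "alice")

print(sorted(rank.items(), key=lambda item: -item[1]))
```

Because every node hands on exactly the rank it holds, the ranks keep summing to 1, and users unreachable from the source (here, dave) end up with zero rank.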
  • 46. Flux Capacitor AI Bringing AI Back to the Future!Bringing AI Back to the Future!Flux Capacitor AI User-to-User Clustering User Similarity Time-based Pattern of viewing (binge or casual) Time of viewing (am or pm) Ratings-based Content ratings or number of views Average rating relative to others (critical or lenient) Search-based Search terms 46
  • 47. Flux Capacitor AI Bringing AI Back to the Future!Bringing AI Back to the Future!Flux Capacitor AI Item-to-Item Clustering Item Similarity Profile text (TF/IDF, Word2Vec, NLP) Categories, tags, interests (Jaccard Similarity, LSH) Images, facial structures (Neural Nets, Eigenfaces) Dating Site Example… 47 Cluster Similar Eigen-facesCluster Similar Profiles Cluster Similar Categories
  • 48. Flux Capacitor AI Bringing AI Back to the Future!Bringing AI Back to the Future!Flux Capacitor AI Bonus: NLP Conversation Starter Bot 48 “If your responses to my generic opening lines are positive, I may read your profile.” Spark ML, Stanford CoreNLP, TF/IDF, DecisionTrees, Sentiment http://crockpotveggies.com/2015/02/09/automating-tinder-with-eigenfaces.html
  • 49. Flux Capacitor AI Bringing AI Back to the Future!Bringing AI Back to the Future!Flux Capacitor AI Bonus: Demo! Spark + Stanford CoreNLP Sentiment Analysis 49
  • 50. Flux Capacitor AI Bringing AI Back to the Future!Bringing AI Back to the Future!Flux Capacitor AI Bonus: Top 100 Country Song Sentiment 50
  • 51. Flux Capacitor AI Bringing AI Back to the Future!Bringing AI Back to the Future!Flux Capacitor AI Bonus: Surprising Results…?! 51
  • 52. Flux Capacitor AI Bringing AI Back to the Future!Bringing AI Back to the Future!Flux Capacitor AI Item-to-Item Based Recommendations Based on Metadata: Genre, Description, Cast, City 52
  • 53. Flux Capacitor AI Bringing AI Back to the Future!Bringing AI Back to the Future!Flux Capacitor AI Demo! Item-to-Item-based Recommendations One-Hot Encoding + K-Means Clustering 53
  • 54. Flux Capacitor AI Bringing AI Back to the Future!Bringing AI Back to the Future!Flux Capacitor AI One-Hot Encode Tag Feature Vectors 54
  • 55. Cluster Movie Tag Feature Vectors. Hyperparameter tuning (K clusters?)
  • 56. Analyze Movie Tag Clusters
  • 57. User-to-Item Collaborative Filtering: Matrix Factorization. ① Factor the large matrix (left) into 2 smaller matrices (right). ② Lower-rank matrices approximate the original when multiplied. ③ Fill in the missing values of the large matrix. ④ Surface k (rank) latent features from user-item interactions.
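The four steps above can be written compactly. The notation below is an assumption for illustration (the slide shows only a picture): R is the m-by-n user-item matrix, U and V are the two smaller factor matrices of rank k, Ω is the set of observed entries, and λ is a regularization weight that the slide does not mention.

```latex
R \approx U V^{\top}, \qquad
R \in \mathbb{R}^{m \times n},\;
U \in \mathbb{R}^{m \times k},\;
V \in \mathbb{R}^{n \times k},\;
k \ll \min(m, n)

\hat{r}_{ui} = \mathbf{u}_u^{\top} \mathbf{v}_i

\min_{U, V} \sum_{(u,i) \in \Omega}
  \left( r_{ui} - \mathbf{u}_u^{\top} \mathbf{v}_i \right)^2
  + \lambda \left( \lVert \mathbf{u}_u \rVert^2 + \lVert \mathbf{v}_i \rVert^2 \right)
```

The predicted value for a missing entry is the dot product of the user's row of U and the item's row of V, which is how the factorization fills in the blanks of the large matrix.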
  • 58. Item-to-Item Collaborative Filtering (famous Amazon paper, circa 2003). Problem: as users grew, user-to-item collaborative filtering didn’t scale. Solution: item-to-item similarity, nearest neighbors. Offline (batch): generate itemId->List[userId] vectors. Online (real-time): from cart, recommend nearest neighbors in vector space.
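A minimal sketch of the offline and online halves described above, assuming cosine similarity over binary "who bought this" vectors (the slide says only "nearest neighbors in vector space"). The function names and the tiny purchase log are hypothetical.

```python
import math
from collections import defaultdict


def build_item_vectors(purchases):
    """Offline (batch): map each itemId to the set of userIds who bought it."""
    item_users = defaultdict(set)
    for user_id, item_id in purchases:
        item_users[item_id].add(user_id)
    return item_users


def cosine(users_a, users_b):
    """Cosine similarity of two binary vectors represented as sets."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / math.sqrt(len(users_a) * len(users_b))


def recommend_from_cart(item_users, cart, k=2):
    """Online (real-time): rank items not in the cart by similarity to any cart item."""
    scores = {}
    for candidate, users in item_users.items():
        if candidate in cart:
            continue
        scores[candidate] = max(
            (cosine(users, item_users[c]) for c in cart if c in item_users),
            default=0.0,
        )
    # Highest score first; break ties by item id for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


# Hypothetical purchase log: (userId, itemId)
log = [("u1", "A"), ("u1", "B"), ("u2", "A"), ("u2", "B"),
       ("u3", "A"), ("u3", "C"), ("u4", "D")]
vectors = build_item_vectors(log)
```

With this log, `recommend_from_cart(vectors, ["A"])` ranks B ahead of C because two of A's three buyers also bought B and only one bought C.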
  • 59. Demo! Collaborative Filtering-based Recommendations
  • 60. Fitting the Matrix Factorization Model
  • 61. Show ItemFactors Matrix from ALS
  • 62. Show UserFactors Matrix from ALS
  • 63. Generating Individual Recommendations
  • 64. Generating Batch Recommendations
  • 65. Clustering + Collaborative Filtering Recs: cluster the matrix output from Matrix Factorization; latent features are derived from user-item interactions. Item-to-item similarity: cluster the item-factor matrix. User-to-user similarity: cluster the user-factor matrix.
  • 66. Demo! Clustering + Collaborative Filtering-based Recommendations
  • 67. Show ItemFactors Matrix from ALS
  • 68. Convert Item Factors -> mllib.Vector (required by the K-Means clustering algorithm)
  • 69. Fit and Evaluate K-Means Cluster Model. K = 5 clusters. The evaluation measures closeness of points within clusters.
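The slide does not name the evaluation metric. A common choice for "closeness of points within clusters" is the within-set sum of squared errors (WSSSE), and the sketch below assumes that is what is meant. It scores a set of already-fitted centers rather than running K-Means itself; the function names and sample points are illustrative.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def wssse(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Hypothetical 2-D factors: two tight groups far apart.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
two_centers = [(0, 1), (10, 1)]
one_center = [(5, 1)]
```

Lower is tighter. Because WSSSE keeps shrinking as K grows, a tuning loop usually looks for the K where the improvement levels off rather than for the minimum.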
  • 70. Netflix Genres and Clusters. Typical genres: Documentary, Romance, Comedy, Horror, Action, Adventure. Latent (hidden) clusters: Emotionally-Independent Dramas for Hopeless Romantics; Witty Dysfunctional-Family TV Animated Comedies; Romantic Crime Movies based on Classic Literature; Latin American Forbidden-Love Movies; Critically-acclaimed Emotional Drug Movie; Cerebral Military Movie based on Real Life; Sentimental Movies about Horses for Ages 11-12; Gory Canadian Revenge Movies; Raunchy Mad Scientist Comedy.
  • 71. Presentation Outline: ① Scaling ② Similarities ③ Recommendations ④ Approximations ⑤ Netflix Recommendations
  • 72. When to Approximate? Memory- or time-constrained queries: relative vs. exact counts are OK (approx # errors after a release). Using machine learning or graph algos: inherently probabilistic and approximate. Streaming aggregations: inherently sloppy collection (exactly once?). Approximate as much as you can get away with! Ask for forgiveness later!!
  • 73. When NOT to Approximate? If you’ve ever heard the term “Sarbanes-Oxley” at the office.
  • 74. A Few Good Algorithms. “You can’t handle the approximate!”
  • 75. Common to These Algos & Data Structs: low, fixed size in memory; store large amounts of data; known error bounds; tunable tradeoff between size and error; less memory than Java/Scala collections; rely on multiple hash functions or operations; size of hash range defines error.
  • 76. Bloom Filter: Set.contains(key): Boolean. “Hash Multiple Times and Flip the Bits Wherever You Land”
  • 77. Bloom Filter: approximate Set.contains(key). No means No, Yes means Maybe. Elements can only be added, never updated or removed.
  • 78. Bloom Filter in Action: set(key), contains(key): Boolean. Set.contains(key): TRUE -> maybe contains (other key hashes may overlap). Set.contains(key): FALSE -> definitely does not contain (no key flipped all bits). Images by @avibryant.
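"Hash multiple times and flip the bits wherever you land" can be sketched in a few lines. This is an illustrative toy, not Algebird's BloomFilter: the class name, the default sizes, and the use of seeded SHA-256 as the family of hash functions are all assumptions.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array; add-only; 'no' is certain, 'yes' means maybe."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive several hash functions by prefixing the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True  # flip the bit wherever we land

    def might_contain(self, key):
        # False if any bit is still unset: no key ever flipped all of them.
        return all(self.bits[pos] for pos in self._positions(key))
```

There is deliberately no `remove`: clearing a bit could erase evidence of another key that hashed to the same position, which is why elements can only be added.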
  • 79. CountMin Sketch: Frequency Count and TopK. “Hash Multiple Times and Add 1 Wherever You Land”
  • 80. CountMin Sketch (CMS): approximate frequency count and TopK for a key, e.g. “Heavy Hitters” on Twitter: Matei Zaharia, Martin Odersky, Donald Trump.
  • 81. CountMin Sketch In Action (TopK Count). Use case: TopK movies using total views. Multiple hash functions (1 hash function per row); binary hash output (1 element per column). Example: add(Top Gun, 2), add(A Few Good Men, 1), add(Taps, 1); the 2 occurrences of “Top Gun” add slightly more complexity. getCount(Top Gun): Long finds the minimum across all rows. Where keys overlap in a cell (e.g. Top Gun and A Few Good Men), the count can overestimate, but never underestimate. Images derived from @avibryant.
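A minimal sketch of the add and getCount operations above, assuming one seeded SHA-256 hash per row. The class name, the width and depth defaults, and the method names `add` and `get_count` are illustrative and do not reproduce any library's API.

```python
import hashlib


class CountMinSketch:
    """depth rows x width columns of counters; one hash function per row."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, row, key):
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, count=1):
        # Add the count wherever each row's hash lands.
        for row in range(self.depth):
            self.table[row][self._column(row, key)] += count

    def get_count(self, key):
        # Collisions only ever inflate a cell, so the minimum across rows
        # is the tightest estimate and is never below the true count.
        return min(self.table[row][self._column(row, key)]
                   for row in range(self.depth))


views = CountMinSketch()
views.add("Top Gun", 2)
views.add("A Few Good Men", 1)
views.add("Taps", 1)
```

Widening the table lowers the chance of overlap and deepening it makes a bad estimate less likely, which is the size-versus-error tradeoff listed earlier.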
  • 82. HyperLogLog: Count Distinct. “Hash Multiple Times and Uniformly Distribute Where You Land”
  • 83. HyperLogLog (HLL): approximate count distinct. Slight twist: a special hash function creates a uniform distribution; hash subsets of data with a single, special hash func. Error estimate: 14 bits for size of range, m = 2^14 = 16,384 hash slots, error = 1.04/sqrt(16,384) = 0.81%. Figure callout: “Not many of these”.
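The error figure on this slide follows directly from the number of hash slots m. Worked out step by step:

```latex
m = 2^{14} = 16{,}384, \qquad \sqrt{m} = 2^{7} = 128

\text{error} \approx \frac{1.04}{\sqrt{m}} = \frac{1.04}{128} = 0.008125 \approx 0.81\%
```

Because the error scales with the inverse square root of m, halving it requires four times as many slots.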
  • 84. HyperLogLog In Action (Count Distinct). Use case: number of distinct users who view a movie. [Figure: user ids (user 1001, user 2001, user 3001, …, User 8002) hashed uniformly onto number lines from 0 to 16 and 0 to 32 for “Top Gun: Hour 1”, “Top Gun: Hour 2”, and the combined “Top Gun: Hour 1 + 2”.] Uniform distribution: estimate the distinct number of users by inspecting just the beginning. Combine across different scales.
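As a rough sketch of the count-distinct and combine steps, the toy HyperLogLog below uses 2^10 registers and a 64-bit slice of SHA-1 as its hash. Those choices, the class and method names, and the simplified small-range correction are assumptions for illustration; production implementations (Redis, Algebird) differ in detail.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HLL: the first p hash bits pick a register, the rest feed a rank."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p                      # number of registers (hash slots)
        self.registers = [0] * self.m

    def add(self, item):
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)             # top p bits choose the register
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        est = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * self.m and zeros:    # small-range (linear counting) fix
            est = self.m * math.log(self.m / zeros)
        return round(est)

    def merge(self, other):
        """Union of two sketches: element-wise max of the registers."""
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b)
                            for a, b in zip(self.registers, other.registers)]
        return merged
```

Merging is what makes "Hour 1 + 2" cheap: users seen in both hours land in the same register with the same rank, so the element-wise max ignores the duplicates without revisiting the raw data.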
  • 85. Locality Sensitive Hashing: Set Similarity. “Pre-process Items into Buckets, Compare Within Buckets”
  • 86. Locality Sensitive Hashing (LSH): approximate set similarity. Pre-process m rows into b buckets (b << m; b = buckets, m = rows). Hash items multiple times: similar items hash to overlapping buckets, because the hash is designed to cluster similar items. Compare just the contents of buckets: a much smaller cartesian compare, done in parallel!! Avoids a huge cartesian all-pairs compare. (Chapter 3: LSH)
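The slide does not say which hash family the demo uses. One standard way to get "similar items hash to overlapping buckets" for sets is MinHash signatures split into bands, and the sketch below assumes that approach. Function names, the band and hash counts, and the sample tag sets are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def minhash_signature(items, num_hashes):
    """One minimum per seeded hash function; similar sets share many minima."""
    return [
        min(int(hashlib.md5(f"{seed}:{x}".encode()).hexdigest(), 16)
            for x in items)
        for seed in range(num_hashes)
    ]


def lsh_candidate_pairs(sets, num_hashes=20, bands=10):
    """Bucket each set once per band; only sets sharing a bucket get compared."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for name, items in sets.items():
        sig = minhash_signature(items, num_hashes)
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(name)

    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


# Hypothetical tag sets for three movies.
tag_sets = {
    "a": {"action", "jets", "navy"},
    "b": {"action", "jets", "navy"},
    "c": {"romance", "horses"},
}
candidates = lsh_candidate_pairs(tag_sets)
```

Only the candidate pairs need an exact similarity check afterwards, which is where the saving over the all-pairs compare comes from. More bands with fewer rows each catch less-similar pairs at the cost of more candidates.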
  • 87. DIMSUM: Set Similarity. “Pre-process and ignore data that is unlikely to be similar.”
  • 88. DIMSUM: “Dimension Independent Matrix Square Using MR”. Remove vectors with low probability of similarity: RowMatrix.columnSimilarities(threshold). Twitter DIMSUM case study: 40% efficiency gain over brute-force cosine similarity.
  • 89. Common Tools to Approximate: Twitter Algebird (composable library), Redis (distributed cache), Apache Spark (big data processing).
  • 90. Twitter Algebird. Algebraic fundamentals: parallel, associative, composable. Examples: Min, Max, Avg; BloomFilter (Set.contains(key)); HyperLogLog (Count Distinct); CountMin Sketch (TopK Count).
  • 91. Redis Implementation of HyperLogLog (Count Distinct): 12KB per item count; 2^64 max # of items; 0.81% error. Add user views for a given movie: PFADD TopGun_Hour1_HLL user1001 user2009 user3005, then PFADD TopGun_Hour1_HLL user3003 user1001. Get distinct count (cardinality) of set: PFCOUNT TopGun_Hour1_HLL returns 4 (distinct users viewed this movie). Union 2 HyperLogLog data structures: PFMERGE TopGun_Hour1_HLL TopGun_Hour2_HLL. Slide callouts: “ignore duplicates” (user1001 is added twice), “Tunable”.
  • 92. Approximations in Spark Libraries. Spark Core: countByKeyApprox(timeout: Long, confidence: Double) returns PartialResult. Spark SQL: approxCountDistinct(column: Column, targetResidual: Float); approxQuantile(column: Column, quantiles: Seq[Float], targetResidual: Float). Spark ML: stratified sampling with sampleByKey(fractions: Map[K, Double]); DIMSUM sampling (probabilistic sampling reduces the amount of shuffle) with RowMatrix.columnSimilarities(threshold: Double).
  • 93. Demo! Exact Count vs. Approximate HLL and CMS Count
  • 94. HashSet vs. HyperLogLog (Memory)
  • 95. HashSet vs. CountMin Sketch (Memory)
  • 96. Demo! Exact Similarity vs. Approximate LSH Similarity
  • 97. Brute Force Cartesian All Pair Similarity: 47 seconds
  • 98. Locality Sensitive Hash All Pair Similarity: 6 seconds
  • 99. Many More Demos! Download Docker or clone on Github: http://advancedspark.com
  • 100. Presentation Outline: ① Scaling ② Similarities ③ Recommendations ④ Approximations ⑤ Netflix Recommendations
  • 101. Netflix Recommendations: From Ratings to Real-time
  • 102. Netflix Has a Lot of Data. Netflix has a lot of data about a lot of users and a lot of movies. Netflix can use this data to buy new movies. Netflix is global. Netflix can use this data to choose original programming. Netflix knows that a lot of people like politics and Kevin Spacey. Summary: Buy NFLX Stock! Aside: my favorite movie is “Harold and Kumar Go to White Castle”. The UK doesn’t have White Castle, so my favourite movie was renamed “Harold and Kumar Get the Munchies”. This broke my unit tests!
  • 103. Netflix Data Pipeline - Then (v1.0, v2.0)
  • 104. Netflix Data Pipeline - Now (Keystone, v3.0): 9 million events per second, 22 GB per second!! EC2 D2XL: disk 6 TB at 475 MB/s, RAM 30 GB, network 700 Mbps. Auto-scaling, fault tolerance. Samza splits high and normal priority. A/B Tests, Trending Now.
  • 105. Netflix Recommendation Data Pipeline: throw away batch user factors (U); keep batch video factors (V).
  • 106. Netflix Trending Now (Time-based Recs): uses Spark Streaming; personalized to user (viewing history, past ratings); learns and adapts to events (Valentine’s Day). Calculate take rate from number of plays and number of impressions. Slide callout: “VHS”.
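The slide names the two inputs but not the formula. Assuming take rate means plays divided by impressions, a minimal sketch of ranking titles by it looks like this; the function names and the sample counts are hypothetical.

```python
def take_rate(plays, impressions):
    """Fraction of impressions that led to a play; 0.0 if never shown."""
    return plays / impressions if impressions else 0.0


def rank_by_take_rate(stats):
    """stats: {title: (plays, impressions)} -> titles, highest take rate first."""
    return sorted(stats, key=lambda title: take_rate(*stats[title]), reverse=True)


# Hypothetical counts for one time window.
window = {"A": (10, 100), "B": (30, 60), "C": (0, 0)}
```

In a streaming job the two counters would be updated per window, so a title's rank can rise and fall with events rather than with its all-time popularity.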
  • 107. Bonus: Pandora Time-based Recs. Work days: play familiar music, since the user is less likely to accept new music. Evenings and weekends: play new music, since the user is more likely to accept new music.
  • 108. $1 Million Netflix Prize (2006-2009). Goal: improve movie predictions by 10% (Root Mean Squared Error); test data withheld to calculate RMSE upon submission. 5-star ratings dataset: (userId, movieId, rating, timestamp). Winning algorithm(s): 10.06% improvement (RMSE); ensemble of 500+ ML models combined with GBDTs; computationally impractical.
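For reference, the metric and the improvement target can be written as follows. The symbols are assumptions for illustration: T is the withheld test set, r is the actual rating, r-hat is the predicted rating, and "baseline" is the system being improved upon.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} \left( \hat{r}_{ui} - r_{ui} \right)^2}

\text{improvement} =
  \frac{\mathrm{RMSE}_{\text{baseline}} - \mathrm{RMSE}_{\text{model}}}
       {\mathrm{RMSE}_{\text{baseline}}} \;\geq\; 0.10
```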
  • 109. Secrets to the Winning Algorithms. Adjust for the following human biases: ① Alice effect: user rates lower than avg ② Inception effect: movie rated higher than avg ③ Overall mean rating of a movie ④ Number of people who have rated a movie ⑤ Number of days since user’s first rating ⑥ Number of days since movie’s first rating ⑦ Mood, time of day, day of week, season, weather
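The first two effects are often handled with a simple baseline predictor: global mean plus a per-user offset plus a per-movie offset. The sketch below assumes that formulation and a plain averaging fit; it is not the prize-winning method, and the function names and four sample ratings are hypothetical.

```python
from collections import defaultdict


def fit_baseline(ratings):
    """ratings: list of (user, movie, rating). Returns (mu, user_bias, item_bias)."""
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # User bias: how far this user's ratings sit from the global mean.
    by_user = defaultdict(list)
    for user, _, r in ratings:
        by_user[user].append(r - mu)
    user_bias = {u: sum(d) / len(d) for u, d in by_user.items()}

    # Item bias: what is left after removing the mean and the user's bias.
    by_item = defaultdict(list)
    for user, movie, r in ratings:
        by_item[movie].append(r - mu - user_bias[user])
    item_bias = {m: sum(d) / len(d) for m, d in by_item.items()}
    return mu, user_bias, item_bias


def predict(mu, user_bias, item_bias, user, movie):
    """Baseline estimate; unseen users or movies contribute no offset."""
    return mu + user_bias.get(user, 0.0) + item_bias.get(movie, 0.0)


# Hypothetical ratings: alice rates low, Inception is rated high.
data = [("alice", "Inception", 4), ("alice", "Taps", 2),
        ("bob", "Inception", 5), ("bob", "Taps", 5)]
mu, user_bias, item_bias = fit_baseline(data)
```

Subtracting this baseline before fitting a factorization model lets the latent factors concentrate on taste rather than on who is a harsh rater or which movie is broadly liked.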
  • 110. Netflix Common ML Algorithms: Logistic Regression, Linear Regression, Gradient Boosted Decision Trees, Random Forest, Matrix Factorization, SVD, Restricted Boltzmann Machines, Deep Neural Nets, Markov Models, LDA, Clustering. Ensembles!
  • 111. Netflix Genres and Clusters. Typical genres: Documentaries, Romance, Comedies, Horror, Action, Adventure. Latent (hidden) clusters: Emotionally-Independent Dramas for Hopeless Romantics; Witty Dysfunctional-Family TV Animated Comedies; Romantic Crime Movies based on Classic Literature; Latin American Forbidden-Love Movies; Critically-acclaimed Emotional Drug Movie; Cerebral Military Movie based on Real Life; Sentimental Movies about Horses for Ages 11-12; Gory Canadian Revenge Movies; Raunchy Mad Scientist Comedy.
  • 112. Netflix Social Integration: post to Facebook after movie start (5 mins); recommend to new users based on friends; helps with the Cold Start problem.
  • 113. Netflix Search. No results? No problem… Show similar results! Utilize extensive DVD catalog; metadata search (ElasticSearch); named entity recognition (NLP). Empty searches are an opportunity! Explicit feedback for future recommendations; content to buy and produce!
  • 114. Netflix A/B Tests. Users tend to click on images featuring: faces with strong emotional expressions; villains over heroes; a small number of cast members.
  • 115. Netflix Recommendation Serving Layer. Use case: recommendation service depends on EVCache. Problem: EVCache cluster goes down or becomes latent!? Answer: github.com/Netflix/Hystrix Circuit Breaker! Circuit states: Closed (service OK); Open (service DOWN, fall back to static).
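Hystrix is a Java library, and the sketch below does not use or imitate its API. It is a minimal, language-neutral illustration of the two states on the slide, with one added assumption: after a timeout the breaker lets a single trial call through, so it can close again once the cache recovers. The class name, thresholds, and injectable clock are all hypothetical.

```python
import time


class CircuitBreaker:
    """CLOSED: call the service. OPEN: skip it and serve the fallback."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    @property
    def state(self):
        return "OPEN" if self.opened_at is not None else "CLOSED"

    def call(self, primary, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()          # still open: do not touch the service
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # trip (or re-trip) the breaker
            return fallback()
        # Success: reset and close the circuit.
        self.failures = 0
        self.opened_at = None
        return result
```

Here `primary` would be the EVCache-backed lookup and `fallback` a function returning a static recommendation list, so users see something sensible while the cache is down and the struggling cluster is spared further traffic.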
  • 116. Why Higher Average Ratings 2004+? In 2004, Netflix noticed higher ratings on average. Some possible reasons why… ① Significant UI improvements deployed ② New recommendation engine deployed ③
  • 117. Thank You, Everyone!! Chris Fregly @cfregly, Research Scientist @ Flux Capacitor AI, San Francisco, California, USA. http://fluxcapacitor.com. Sign up for the Meetup and Book; contribute to Github Repo; run all Demos using Docker. Find me: LinkedIn, Twitter, Github, Email, Fax. Image derived from http://www.duchess-france.org/
  • 118. Flux Capacitor AI: Bringing AI Back to the Future!