1. 1
Learning to Personalize
Justin Basilico
Page Algorithms Engineering
September 19, 2014
@JustinBasilico
ATL 2014
2. 2
Interested in high-quality recommendations
Proxy question: accuracy in predicted ratings, measured by root mean squared error (RMSE; formula below)
Improve RMSE by 10% = $1 million prize!
Data size: 100M ratings (back then “almost massive”)
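For reference (not shown on the slide), RMSE over a test set T of (user, video) pairs with true ratings r_uv and predicted ratings r̂_uv is:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(u,v)\in T} \left( r_{uv} - \hat{r}_{uv} \right)^2 }
```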
7. 7
Everything is a Recommendation
Rows
Ranking
Over 75% of what people watch comes from our recommendations
Recommendations are driven by Machine Learning
8. 8
Top Picks
Personalization awareness
Diversity
9. 9
Personalized genres
Genres focused on user interest
Derived from tag combinations
Provide context and evidence
How are they generated?
Implicit: Based on recent plays, ratings & other interactions (sketch below)
Explicit: Taste preferences
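A minimal sketch of the implicit path, not Netflix's actual algorithm: score candidate tag-combination genres by how much their defining tags overlap with the tags of videos the member recently played. The names `recent_plays`, `video_tags`, and `candidate_genres` are hypothetical inputs.

```python
from collections import Counter

def score_genres(recent_plays, video_tags, candidate_genres):
    """Score candidate tag-combination genres against a member's recent plays.

    recent_plays: list of recently watched video ids (hypothetical input)
    video_tags: dict video_id -> set of tags describing that video
    candidate_genres: dict genre_name -> set of tags defining the genre
    """
    # Count how often each tag appears across the member's recent plays.
    tag_counts = Counter(tag for vid in recent_plays for tag in video_tags.get(vid, ()))

    scores = {}
    for genre, tags in candidate_genres.items():
        # A genre scores highly when the member keeps interacting with its defining tags.
        scores[genre] = sum(tag_counts[t] for t in tags)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Example: a member who recently watched two gritty crime dramas
video_tags = {
    "v1": {"crime", "gritty", "drama"},
    "v2": {"crime", "thriller", "gritty"},
}
genres = {
    "Gritty Crime Dramas": {"gritty", "crime", "drama"},
    "Feel-good Comedies": {"comedy", "feel-good"},
}
print(score_genres(["v1", "v2"], video_tags, genres))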
10. 10
Similarity
Recommend videos similar to one you’ve liked (similarity sketch below)
“Because you watched” rows
Pivots
Video information page
In response to user actions (search, list add, …)
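A minimal video-video similarity sketch, assuming each video is represented by a feature or tag vector; cosine similarity is one common choice, not necessarily the one used here, and `features` and `because_you_watched` are hypothetical names.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def because_you_watched(seed_id, features, k=5):
    """Return the k videos most similar to the seed video.

    features: dict video_id -> np.ndarray feature vector (hypothetical input)
    """
    seed = features[seed_id]
    scored = [(vid, cosine_similarity(seed, vec))
              for vid, vec in features.items() if vid != seed_id]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:k]
```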
11. 11
Support for Recommendations
Behavioral Support
Social Support
16. 16
Rating Prediction
Based on the first-year Progress Prize
Top 2 algorithms:
Matrix Factorization (SVD++) (see the sketch below)
Restricted Boltzmann Machines (RBM)
Ensemble: linear blend
[Diagram: the rating matrix R (Users × Videos, 99% sparse) is approximated as U × V, with U of size Users × d and V of size d × Videos]
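A minimal sketch of plain matrix factorization trained with stochastic gradient descent; the slide references the richer SVD++ variant (biases, implicit feedback), which this omits. All parameter values are illustrative.

```python
import numpy as np

def factorize(ratings, n_users, n_videos, d=20, lr=0.01, reg=0.05, epochs=20):
    """Learn R ~ U @ V.T from observed (user, video, rating) triples via SGD."""
    rng = np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((n_users, d))
    V = 0.1 * rng.standard_normal((n_videos, d))
    for _ in range(epochs):
        for u, v, r in ratings:
            pu, qv = U[u].copy(), V[v].copy()
            err = r - pu @ qv                    # error on this observed rating
            U[u] += lr * (err * qv - reg * pu)   # gradient steps with L2 regularization
            V[v] += lr * (err * pu - reg * qv)
    return U, V

# Example: train on a few observed ratings, then predict a missing one
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0)]
U, V = factorize(ratings, n_users=2, n_videos=3)
print(U[0] @ V[2])  # predicted rating of user 0 for video 2
```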
17. 17
Ranking by ratings
[Example: a row of titles with average ratings 4.7, 4.6, 4.5, …]
Niche titles
High average ratings… by those who would watch it
23. 23
Page-level algorithmic challenge
10,000s of possible rows, but only 10-40 rows on the page
Variable number of possible videos per row (up to thousands)
1 personalized page per device
24. 24
Balancing a Personalized Page
Accurate vs. Diverse
Discovery vs. Continuation
Depth vs. Coverage
Freshness vs. Stability
Recommendations vs. Tasks
26. 26
Building a page algorithmically
Approaches
Template: Non-personalized layout
Row-independent: Greedy rank rows by f(r | u, c)
Stage-wise: Pick next rows by f(r | u, c, p1:n), given rows p1:n already placed (see the sketch below)
Page-wise: Total page fitness f(p | u, c)
Obey constraints
Certain rows may be required (Continue Watching and My List)
Filter, de-duplicate
Format for device
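A minimal sketch of the row-independent and stage-wise approaches, assuming a scoring function for f(r | u, c) and, for the stage-wise case, a similarity penalty against rows already on the page as one way to make the score depend on p1:n. The function names and the diversity penalty are hypothetical.

```python
def greedy_rows(candidates, score, n_rows):
    """Row-independent: rank candidate rows by f(r | u, c) and take the top n."""
    return sorted(candidates, key=score, reverse=True)[:n_rows]

def stagewise_rows(candidates, score, similarity, n_rows, diversity_weight=0.5):
    """Stage-wise: pick the next row by f(r | u, c, p1:n), here approximated as the
    base score minus a penalty for similarity to rows already placed on the page."""
    page, remaining = [], list(candidates)
    while remaining and len(page) < n_rows:
        def adjusted(row):
            penalty = max((similarity(row, placed) for placed in page), default=0.0)
            return score(row) - diversity_weight * penalty
        best = max(remaining, key=adjusted)
        page.append(best)
        remaining.remove(best)
    return page
```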
27. 27
Row Features
Quality of items
Features of items
Quality of evidence
User-row interactions
Item/row metadata
Recency
Item-row affinity
Row length
Position on page
Title
Diversity
Similarity
Freshness
…
28. 28
Page-level Metrics
How do you measure the quality of the homepage?
Ease of discovery, Diversity, Novelty, …
Challenges:
Position effects
Row-video generalization
2D versions of ranking quality metrics
Example: Recall @ row-by-column (sketch below)
[Plot: Recall as a function of row position (rows 0–30)]
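A minimal sketch of one 2D recall metric, assuming the page is a list of rows (each a list of video ids) and `relevant` is the set of videos the member actually went on to play; the exact definition behind the slide's plot may differ.

```python
def recall_at(page, relevant, max_row, max_col):
    """Fraction of relevant videos appearing within the first max_row rows
    and first max_col columns of the page."""
    if not relevant:
        return 0.0
    shown = {vid for row in page[:max_row] for vid in row[:max_col]}
    return len(shown & relevant) / len(relevant)

# Example: recall within the first 2 rows and 3 columns
page = [["a", "b", "c", "d"], ["e", "f", "g"], ["h", "i"]]
print(recall_at(page, relevant={"b", "g", "h"}, max_row=2, max_col=3))  # 2/3
```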
30. 30
Three levels of Learning Distribution/Parallelization
1. For each subset of the population (e.g. region)
Want independently trained and tuned models
2. For each combination of hyperparameters (sketch below)
Simple: Grid search
Better: Bayesian optimization using Gaussian processes
3. For each subset of the training data
Distribute over machines (e.g. ADMM)
Multi-core parallelism (e.g. Hogwild!)
Or… use GPUs
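A minimal sketch of level 2 using the simple option, grid search, where each hyperparameter combination is an independent training job that can run in parallel; the slides suggest Bayesian optimization with Gaussian processes as the better alternative. `train_and_evaluate` is a hypothetical stand-in for a full training run.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def train_and_evaluate(params):
    """Hypothetical stand-in: train a model with these hyperparameters and
    return its validation error (here just a placeholder objective)."""
    lr, reg = params["lr"], params["reg"]
    return (lr - 0.01) ** 2 + (reg - 0.1) ** 2

def grid_search(grid, max_workers=4):
    """Level 2: each hyperparameter combination is an independent job,
    so jobs can be spread over cores or machines."""
    combos = [dict(zip(grid, values)) for values in product(*grid.values())]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        errors = list(pool.map(train_and_evaluate, combos))
    return min(zip(errors, combos), key=lambda pair: pair[0])

if __name__ == "__main__":
    grid = {"lr": [0.001, 0.01, 0.1], "reg": [0.01, 0.1, 1.0]}
    print(grid_search(grid))
```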
31. 31
Example: Training Neural Networks
Level 1: Machines in different AWS regions
Level 2: Machines in the same AWS region
Spearmint or MOE for parameter optimization
Condor, StarCluster, Mesos, etc. for coordination
Level 3: Highly optimized, parallel CUDA code on GPUs