Recent Trends in Personalization: A Netflix Perspective

Recent Trends in
Personalization:
A Netflix Perspective
Justin Basilico
ICML 2019 Adaptive & Multi-Task Learning Workshop
2019-06-15
@JustinBasilico

Help members find content
to watch and enjoy to maximize
member satisfaction and retention

Ordering of videos is personalized
From what we recommend
Ranking

Selection and placement of rows is personalized
... to how we construct a pageRows

Personalized images
... to what images to select

... to reaching out to our members

Everything is a recommendation!
Over 80% of what
people watch
comes from our
recommendations
Overview in [Gomez-Uribe & Hunt, 2016]

○ Every person is unique with a variety of interests
○ Help people find what they want when they’re not sure what they want
○ Large datasets but small data per user
… and potentially biased by the output of your system
○ Cold-start problems on all sides
○ Non-stationary, context-dependent, mood-dependent
○ More than just accuracy: Diversity, novelty, freshness, fairness, ...
○ ...
No, personalization is hard!

Some recent trends in approaching these challenges:
1. Deep Learning
2. Causality
3. Bandits & Reinforcement Learning
4. Fairness
5. Experience Personalization
Trending Now

Trend 1: Deep Learning in
Recommendations

What~2012 ~2017
Deep Learning becomes popular in
Machine Learning
Deep Learning becomes popular in
Recommender Systems
What took so long?

Traditional Recommendations
Collaborative Filtering:
Recommend items that
similar users have chosen
0 1 0 1 0
0 0 1 1 0
1 0 0 1 1
0 1 0 0 0
0 0 0 0 1
Users
Items

U≈R
V
A Matrix Factorization view
2

U
A Feed-Forward Network view
V
2

U
A (deeper) feed-forward view
V
Mean
squared loss?

… isn’t always the best
U
V
Mean squared
loss
?

V
… but opens up many possibilities
Softmax
Avg / Stack/
Sequence
DNN / RNN / CNN
Input
interactions
(X)
(X)
p(Y)
2018-12-2319:32:10
2018-12-2412:05:53
2019-01-0215:40:22

Sequence prediction
● Treat recommendations as a
sequence classification problem
○ Input: sequence of user actions
○ Output: next action
● E.g. Gru4Rec [Hidasi et. al., 2016]
○ Input: sequence of items in a sessions
○ Output: next item in the session
● Also co-evolution: [Wu et al.,
2017], [Dai et al., 2017]

Leveraging other data
● Example: YouTube Recommender
[Covington et. al., 2016]
● Two stage ranker: candidate
generation (shrinking set of items
to rank) and ranking (classifying
actual impressions)
● Two feed-forward, fully
connected, networks with
hundreds of features

Contextual sequence data
2017-12-10 15:40:22
2017-12-23 19:32:10
2017-12-24 12:05:53
2017-12-27 22:40:22
2017-12-29 19:39:36
2017-12-30 20:42:13
Context ItemSequence
per user
?
Time

Time-sensitive sequence prediction
● Proper modeling of time and system dynamics is critical
○ Recommendations are actions at a moment in time
● Experiment on a Netflix internal dataset
○ Input: Sequence of past plays and time context
■ Discrete time: Day-of-week (Mon, Tue, …) & Hour-of-day
■ Continuous time (aka timestamp)
○ Label: Predict next play (temporal split data)

From Correlation to Causation
● Most recommendation
algorithms are correlational
○ Some early recommendation
algorithms literally computed
correlations between users
and items
● Did you watch a movie
because you liked it? Or
because we showed it to
you? Or both? p(Y|X) → p(Y|X, do(R))
(from http://www.tylervigen.com/spurious-correlations)

Feedback loops
Impression bias
inflates plays
Leads to inflated
item popularity
More plays
More
impressions
Oscillations in
distribution of genre
recommendations
Feedback loops can cause biases to be
reinforced by the recommendation system!
[Chaney et al., 2018]: simulations showing that this can reduce the
usefulness of the system

Closed Loop
Training
Data
Watches Model
Recs

Closed Loop
Training
Data
Watches Model
Recs
Danger Zone

Closed Loop
Training
Data
Watches Model
Recs
Danger Zone
Search
Training
Data
Watches Model
Recs
Open Loop

Debiasing Recommendations
● IPS Estimator for MF [Schnabel et al., 2016]
○ Train a debiasing model and reweight the data
● Causal Embeddings [Bonner & Vasile, 2018]
○ Jointly learn debiasing model and task model
○ Regularize the two towards each other
● Doubly-Robust MF [Wang et al., 2019]

Trend 3: Bandits &
Reinforcement Learning in
Recommendations

● Uncertainty around user interests and new items
● Sparse and indirect feedback
● Changing trends
● Break feedback loops
● Want to explore to learn
Why contextual bandits for recommendations?
▶Early news example: [Li et al., 2010]

Bart [McInerney et al., 2018]
● Bandit selecting both items and explanations for
Spotify homepage
● Factorization Machine with epsilon-greedy explore
over personalized candidate set
● Counterfactual risk minimization to train the bandit

Artwork Personalization as
Contextual Bandit
● Environment: Netflix homepage
● Context: Member, device, page, etc.
● Learner: Artwork selector for a show
● Action: Display specific image for show
● Reward: Member has positive engagement
Artwork Selector
▶

Offline Replay Results
● Bandit finds good images
● Personalization is better
● Artwork variety matters
● Personalization wiggles
around best images
Lift in Replay in the various algorithms as
compared to the Random baseline
More info in our blog post

Going Long-Term
● Want to maximize long-term user satisfaction and retention
● Involves many user visits, recommendation actions and delayed reward
● … sounds like Reinforcement Learning

● High-dimensional action space: Recommending a single item is O(|C|);
typically want to do ranking or page construction, which is combinatorial
● High-dimensional state space: Users are represented in the state, along
with the relevant history
● Off-policy training: Need to learn from existing system actions
● Concurrency: Don’t observe full trajectories, need to learn simultaneously
from many interactions
● Changing action space: New actions (items) become available and need to
be cold-started.
● No good simulator: Requires knowing feedback for user on recommended
items
Challenges of Reinforcement Learning for
Recommendations

List-wise [Zhao et al., 2017] or Page-wise recommendation [Zhao et al. 2018]
based on [Dulac-Arnold et al., 2016]
Embeddings for actions

● Generator to choose user action from recommendation
● Reward trained like a discriminator
● LSTM or Position-Weight architecture
● Learning over sets via cascading Deep Q Networks
○ Different Q function per position
GAN-inspired as a user simulator
[Chen et al., 2019]

● Train candidate generator using
REINFORCE
● Exploration done using softmax with
temperature
● Off-policy correction with adaptation for
top-k recommendations
● Trust region policy optimization to keep
close to logging policy
Policy Gradient for YouTube
Recommendations [Chen et al., 2019]

Personalization has a big impact in people’s lives
How do we make sure that it is fair?

Calibrated Recommendations [Steck, 2018]
● Fairness as matching distribution of user interests
● Accuracy as an objective can lead to unbalanced predictions
● Simple example:
● Many recommendation algorithms exhibit this behavior of exaggerating the
dominant interests and crowd out less frequent ones
30 action70 romance
30% action70% romance
User:
Expectation:
100% romanceReality: Maximizes accuracy

- Genre-distribution of each item is given:
- Genre-distribution of user’s play history:
… add prior for other genres:
- Genre-distribution of recommended list:
(for diversity)
(or other categorization)
Calibration Metric

Calibration Results (MovieLens 20M)
Baseline model (wMF):
Many users receive
uncalibrated rec’s
After reranking:
Rec’s are much more
calibrated (smaller )
Userdensity
More calibrated (KL divergence)
Submodular
Reranker:

Fairness through Pairwise Comparisons
[Beutel et al., 2019]
● Recommendations are fair if likelihood of clicked item being ranked above
an unclicked item is the same across two groups
○ Intra-group pairwise accuracy - Restrict to pairs within group
○ Inter-group pairwise accuracy - Restrict to pairs between groups
● Training: Add pairwise regularizer based on randomized data to collect
fairness feedback

Trend 5:
Experience Personalization

Personalizing how we recommend
(not just what we recommend…)
● Algorithm level: Ideal balance of diversity, popularity,
novelty, freshness, etc. may depend on the person
● Display level: How you present items or explain
recommendations can also be personalized
● Interaction level: Balancing the needs of lean-back
users and power users

Page/Slate Optimization
● Select multiple actions that go together and
receive feedback on group
● Personalizing based on within-session
browsing behavior [Wu et al., 2015]
● Off-policy evaluation for slates
[Swaminathan, et al., 2016]
● Slate optimization as VAE [Jiang et al., 2019]
● Marginal posterior sampling for slate bandits
[Dimakopoulou et al., 2019]

More dimensions to personalize
Rows
Trailer
Evidence
Synopsis
Image
Row Title
Metadata
Ranking

Rating Ranking Pages
4.7
Experience
Evolution of our Personalization Approach

Potential Connections with
Multi-Task / Meta Learning?

Applications as tasks
● Many related personalization tasks in a
recommender system
● Examples:
○ [Zhao et al., 2015] - Outputs for different tasks
○ [Bansal et al., 2016] - Jointly learn to recommend
and predict metadata for items
○ [Ma et al., 2018] - Jointly learn watch and enjoy
○ [Lu et al., 2018] - Jointly learn for rating prediction
and explanation
○ [Hadash et al., 2018] - Jointly learn ranking and
rating prediction
User
history
Ranking
Page
Rating
Explanation
Search
Image
Context ...

Other views
● Users-as-tasks: Treat each user as a task and learn from others users
○ Example: [Ning & Karapis, 2010] finds similar users and does
support vector regression
● Items-as-tasks: Treat each item as a separate model to learn
● Contexts-as-tasks: Treat different contexts (time, device, region, …)
as separate tasks
● Domains-as-tasks: Leverage representations of users in one domain
to help in another (e.g. different kinds of items, different genres)
○ Example: [Li et al., 2009] on movies <-> books

1. Deep Learning
2. Causality
3. Bandits & Reinforcement Learning
4. Fairness
5. Experience Personalization
6. Multi-task & Meta Learning?
Lots of opportunity for Machine Learning
in Personalization

Thank you
Questions?
@JustinBasilico Yes, we’re hiring...
Justin Basilico

Recent Trends in Personalization: A Netflix Perspective

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Recent Trends in Personalization: A Netflix Perspective

Similar to Recent Trends in Personalization: A Netflix Perspective (20)

More from Justin Basilico

More from Justin Basilico (7)

Recently uploaded

Recently uploaded (20)

Recent Trends in Personalization: A Netflix Perspective