Artwork Personalization at Netflix
Justin Basilico
QCon SF 2018, 2018-11-05
@JustinBasilico
Which artwork to show?
A good image is...
1. Representative
2. Informative
3. Engaging
4. Differential
A good image is also...
5. Personal
Intuition: Preferences in cast members
Intuition: Preferences in genre
Choose artwork so that members understand whether they will likely enjoy a title, in order to maximize satisfaction and retention
Challenges in Artwork
Personalization
Everything is a Recommendation
Over 80% of what
people watch
comes from our
recommendations
Rankings
Rows
Attribution
Pick
only one
Was it the recommendation or artwork?
Or both?
▶
Change Effects
Which one caused the play?
Is change confusing?
Day 1 Day 2
▶
● Creatives select the images that are available
● But algorithms must still be robust
Adding meaning and avoiding clickbait
Scale
Over 20M RPS
for images at
peak
Traditional Recommendations
Collaborative Filtering:
Recommend items that
similar users have chosen
(Illustration: binary user × item play matrix)
Members can only play images we choose
Need
something
more
Bandit
Not that kind
of Bandit
Image from Wikimedia Commons
Multi-Armed Bandits (MAB)
● Multiple slot machines, each with an unknown reward distribution
● A gambler can play one arm at a time
● Which machine to play to maximize
reward?
Bandit Algorithms Setting
Each round:
● Learner chooses an action
● Environment provides a real-valued reward for action
● Learner updates to maximize the cumulative reward
Learner
(Policy)
Environment
Action
Reward
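To make the loop concrete, here is a minimal sketch of the round-by-round interaction; the Policy/Environment interfaces and Bernoulli rewards are assumptions for illustration, not the production setup.

```python
import random

class BernoulliEnvironment:
    """Toy environment: each action (image) has an unknown take probability."""
    def __init__(self, true_take_fractions):
        self.p = true_take_fractions  # e.g. {"image_a": 0.12, "image_b": 0.08}

    def reward(self, action):
        return 1 if random.random() < self.p[action] else 0

def run_bandit(policy, env, rounds=10000):
    """Each round: learner chooses an action, observes a reward, updates."""
    total = 0
    for _ in range(rounds):
        action = policy.choose()
        r = env.reward(action)
        policy.update(action, r)
        total += r
    return total
```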
Artwork Optimization as Bandit
● Environment: Netflix homepage
● Learner: Artwork selector for a show
● Action: Display specific image for show
● Reward: Member has positive engagement
Artwork Selector
▶
● What images should creatives provide?
○ Variety of image designs
○ Thematic and visual differences
● How many images?
○ Creating each image has a cost
○ Diminishing returns
Images as Actions
● What is a good outcome?
✓ Watching and enjoying the content
● What is a bad outcome?
✖ No engagement
✖ Abandoning or not enjoying the
content
Designing Rewards
Metric: Take Fraction
▶
Example: Altered Carbon
Take Fraction: 1/3
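As a formula (matching the example above, where one of three impressions led to a play):

$$\text{Take Fraction} = \frac{\#\,\text{plays}}{\#\,\text{impressions}} = \frac{1}{3}$$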
Minimizing Regret
● What is the best that a bandit can do?
○ Always choose optimal action
● Regret: Difference between optimal
action and chosen action
● To maximize reward, minimize the
cumulative regret
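Written out, with $\mu(a)$ the expected reward of action $a$, $a^*$ the optimal action, and $a_t$ the action chosen at round $t$:

$$\text{Regret}_T = \sum_{t=1}^{T} \bigl(\mu(a^*) - \mu(a_t)\bigr)$$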
Bandit Example
1 0 1 0 ?
0 0 ?
0 1 0 ?
Actions / Historical rewards
Bandit Example
1 0 1 0 ?
0 0 ?
0 1 0 ?
Actions / Historical rewards
Choose
image
Bandit Example
1 0 1 0 ?
0 0 ?
0 1 0 ?
Actions / Historical rewards
2/4
0/2
1/3
Observed
Take
Fraction
Overall: 3/9
Strategy
Maximization: show the current best image
vs.
Exploration: try another image to learn if it is actually better
Principles of Exploration
● Gather information to make the best overall decision in the long run
● Best long-term strategy may involve short-term
sacrifices
Common strategies
1. Naive Exploration
2. Optimism in the Face of Uncertainty
3. Probability Matching
Naive Exploration: 𝝐-greedy
● Idea: Add noise to the greedy policy
● Algorithm:
○ With probability 𝝐
■ Choose one action uniformly at random
○ Otherwise
■ Choose the action with the best reward so far
● Pros: Simple
● Cons: Regret is unbounded
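A minimal 𝝐-greedy sketch over a fixed image pool, following the interface from the loop above (class and variable names are illustrative):

```python
import random

class EpsilonGreedy:
    def __init__(self, actions, epsilon=0.1):
        self.epsilon = epsilon
        self.impressions = {a: 0 for a in actions}
        self.plays = {a: 0 for a in actions}

    def choose(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.impressions))  # explore uniformly
        # exploit: best observed take fraction so far
        return max(self.impressions,
                   key=lambda a: self.plays[a] / self.impressions[a]
                   if self.impressions[a] else 0.0)

    def update(self, action, reward):
        self.impressions[action] += 1
        self.plays[action] += reward
```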
Epsilon-Greedy Example
1 0 1 0 ?
0 0 ?
0 1 0 ?
2/4
(greedy)
0/2
1/3
Observed
Reward
Epsilon-Greedy Example
1 0 1 0 ?
0 0 ?
0 1 0 ?
Selection probabilities: 1 - 2𝝐/3 for the greedy action (2/4), 𝝐/3 for each of the other two
Epsilon-Greedy Example
1 0 1 0
0 0 0
0 1 0
2/4
(greedy)
0/3
1/3
Observed
Reward
Optimism: Upper Confidence Bound (UCB)
● Idea: Prefer actions with uncertain values
● Approach:
○ Compute confidence interval of observed rewards
for each action
○ Choose action a with the highest 𝛂-percentile
○ Observe reward and update confidence interval
for a
● Pros: Theoretical regret minimization properties
● Cons: Needs to update quickly from observed rewards
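A minimal Bayesian-UCB sketch that keeps a Beta posterior per image and picks the highest upper-percentile value; scipy is assumed and the percentile choice is illustrative.

```python
from scipy.stats import beta

class BayesUCB:
    def __init__(self, actions, percentile=0.95):
        self.percentile = percentile
        self.wins = {a: 0 for a in actions}    # impressions with a play
        self.losses = {a: 0 for a in actions}  # impressions without a play

    def choose(self):
        # Upper confidence value = high percentile of Beta(1 + wins, 1 + losses)
        return max(self.wins, key=lambda a: beta.ppf(
            self.percentile, 1 + self.wins[a], 1 + self.losses[a]))

    def update(self, action, reward):
        if reward > 0:
            self.wins[action] += 1
        else:
            self.losses[action] += 1
```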
Beta-Bernoulli Distribution
Image from Wikipedia
Beta Bernoulli
Pr(1) = p
Pr(0) = 1 - p
Prior
Bandit Example with Beta-Bernoulli
Observed Take Fractions: A: 2/4, B: 0/2, C: 1/3
Prior: 𝛽(1, 1); prior + observed plays = posterior
Posteriors: A: 𝛽(3, 3), B: 𝛽(1, 3), C: 𝛽(2, 3)
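The conjugate update behind these numbers: with a $\mathrm{Beta}(\alpha, \beta)$ prior and $s$ plays observed in $n$ impressions,

$$\mathrm{Beta}(\alpha, \beta) \rightarrow \mathrm{Beta}(\alpha + s,\; \beta + n - s), \qquad \text{e.g. } \mathrm{Beta}(1, 1) \text{ with } 2/4 \rightarrow \mathrm{Beta}(3, 3).$$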
Bayesian UCB Example
1 0 1 1 ?
0 0 ?
0 1 0 ?
Reward 95% confidence intervals: [0.15, 0.85], [0.07, 0.81], [0.01, 0.71]
Bayesian UCB Example
1 0 1 1 0
0 0
0 1 0
Reward 95% confidence intervals: [0.12, 0.78], [0.07, 0.81], [0.01, 0.71]
Probabilistic: Thompson Sampling
● Idea: Select actions according to the probability that they are the best
● Approach:
○ Keep a distribution over model parameters for each action
○ Sample estimated reward value for each action
○ Choose action a with maximum sampled value
○ Observe reward for action a and update its parameter distribution
● Pros: Randomness continues to explore even without model updates
● Cons: Hard to compute probabilities of actions
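A minimal Thompson sampling sketch with a Beta-Bernoulli model per image (same interface as the earlier sketches; names are illustrative):

```python
import random

class ThompsonSampling:
    def __init__(self, actions):
        self.alpha = {a: 1 for a in actions}  # Beta prior: Beta(1, 1)
        self.beta = {a: 1 for a in actions}

    def choose(self):
        # Sample a take-fraction estimate per image from its Beta posterior,
        # then show the image with the highest sampled value.
        samples = {a: random.betavariate(self.alpha[a], self.beta[a])
                   for a in self.alpha}
        return max(samples, key=samples.get)

    def update(self, action, reward):
        if reward > 0:
            self.alpha[action] += 1
        else:
            self.beta[action] += 1
```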
Thompson Sampling Example
1 0 1 0 ?
0 0 ?
0 1 0 ?
Distributions (density plots omitted): 𝛽(3, 3) for the 2/4 action, 𝛽(1, 3) for the 0/2 action, 𝛽(2, 3) for the 1/3 action
Thompson Sampling Example
1 0 1 0 ?
0 0 ?
0 1 0 ?
Sampled values from each distribution: 0.38, 0.59, 0.18; the action with the highest sample (0.59) is shown
Thompson Sampling Example
1 0 1 0
0 0
0 1 0 1
Updated distributions (density plots omitted): 𝛽(3, 3), 𝛽(1, 3), 𝛽(3, 3); the chosen action's posterior moves from 𝛽(2, 3) to 𝛽(3, 3) after observing a play
Many Variants of Bandits
● Standard setting: Stochastic and stationary
● Drifting: Reward values change over time
● Adversarial: No assumptions on how rewards are generated
● Continuous action space
● Infinite set of actions
● Varying set of actions over time
● ...
What about personalization?
Contextual Bandits
● Let’s make this harder!
● Slot machines where payout depends on
context
● E.g. time of day, blinking light on slot
machine, ...
Contextual Bandit
Learner
(Policy)
Environment
Action
Reward
Context
Each round:
● Environment provides context (feature) vector
● Learner chooses an action for context
● Environment provides a real-valued reward for action in context
● Learner updates to maximize the cumulative reward
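A sketch of the contextual loop, extending the earlier interface so the policy sees a feature vector before acting (the context/reward methods are assumptions for illustration):

```python
def run_contextual_bandit(policy, env, rounds=10000):
    total = 0
    for _ in range(rounds):
        x = env.context()              # environment provides context features
        action = policy.choose(x)      # learner chooses an action for this context
        r = env.reward(x, action)      # real-valued reward for action in context
        policy.update(x, action, r)    # learner updates to maximize cumulative reward
        total += r
    return total
```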
Supervised Learning vs. Contextual Bandits
Supervised Learning: Input: Features (x ∈ ℝᵈ); Output: Predicted label; Feedback: Actual label (y)
Contextual Bandits: Input: Context (x ∈ ℝᵈ); Output: Action (a = 𝜋(x)); Feedback: Reward (r ∈ ℝ)
Supervised Learning vs. Contextual Bandits
Example: Chihuahua images from ImageNet. In supervised learning the feedback is the actual label ("Dog") for every prediction; in the bandit setting the feedback is only a reward for the prediction that was made (e.g. "Cat" → 0, "Seal" → 0, "Fox" → 0, "Dog" → ✓), so the learner never observes what would have happened for the actions it did not take (???).
Artwork Personalization
as Contextual Bandit
● Context: Member, device, page, etc.
Artwork Selector
Epsilon-Greedy Example
Choose: with probability 1 - 𝝐, the personalized image; with probability 𝝐, an image at random
Greedy Policy Example
● Learn a supervised regression model per image to predict reward
● Pick image with highest predicted reward
(Diagram: member context → features → per-image models 1-4 → arg max → winning image from the image pool)
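A minimal sketch of this greedy policy with one incrementally trained regression model per image; scikit-learn's SGDRegressor stands in for whatever model is actually used, and the feature handling is illustrative.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

class GreedyPerImagePolicy:
    def __init__(self, image_ids, n_features):
        # One incremental regression model per candidate image
        self.models = {img: SGDRegressor() for img in image_ids}
        for m in self.models.values():
            m.partial_fit(np.zeros((1, n_features)), [0.0])  # initialize weights

    def choose(self, x):
        # Predict reward for every image in the pool and take the arg max
        scores = {img: m.predict(x.reshape(1, -1))[0]
                  for img, m in self.models.items()}
        return max(scores, key=scores.get)

    def update(self, x, image_id, reward):
        self.models[image_id].partial_fit(x.reshape(1, -1), [reward])
```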
LinUCB Example (Li et al., 2010)
● Linear model to calculate uncertainty in reward estimate
● Choose image with highest 𝛂-percentile predicted reward value
(Diagram: member context → features → per-image models 1-4 → arg max → winning image from the image pool)
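A per-image LinUCB sketch (the disjoint-model variant from Li et al., 2010); the exploration parameter alpha and the names are assumptions.

```python
import numpy as np

class LinUCBArm:
    def __init__(self, n_features, alpha=1.0):
        self.alpha = alpha
        self.A = np.eye(n_features)    # ridge-regression design matrix
        self.b = np.zeros(n_features)  # reward-weighted feature sum

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b         # current coefficient estimate
        # mean prediction plus an exploration bonus from the estimate's uncertainty
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

class LinUCBPolicy:
    def __init__(self, image_ids, n_features, alpha=1.0):
        self.arms = {img: LinUCBArm(n_features, alpha) for img in image_ids}

    def choose(self, x):
        return max(self.arms, key=lambda img: self.arms[img].ucb(x))

    def update(self, x, image_id, reward):
        self.arms[image_id].update(x, reward)
```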
Thompson Sampling Example (Chapelle & Li, 2011)
● Learn distribution over model parameters (e.g. Bayesian regression)
● Sample a model, evaluate features, take the arg max
(Diagram: member context → features → sampled models 1-4 → arg max → winning image from the image pool)
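A minimal contextual Thompson sampling sketch using Bayesian linear regression per image: sample a weight vector from the posterior, score the context, take the arg max. The noise variance and the prior are assumptions.

```python
import numpy as np

class BayesLinRegArm:
    """Gaussian posterior over linear-model weights for one image."""
    def __init__(self, n_features, noise_var=1.0):
        self.noise_var = noise_var
        self.precision = np.eye(n_features)  # prior precision (identity)
        self.b = np.zeros(n_features)        # accumulated reward-weighted features

    def sample_weights(self):
        cov = np.linalg.inv(self.precision)
        mean = cov @ self.b / self.noise_var
        return np.random.multivariate_normal(mean, cov)

    def update(self, x, reward):
        self.precision += np.outer(x, x) / self.noise_var
        self.b += reward * x

class LinearThompsonPolicy:
    def __init__(self, image_ids, n_features):
        self.arms = {img: BayesLinRegArm(n_features) for img in image_ids}

    def choose(self, x):
        # Sample one model per image, evaluate the context, take the arg max
        return max(self.arms, key=lambda img: self.arms[img].sample_weights() @ x)

    def update(self, x, image_id, reward):
        self.arms[image_id].update(x, reward)
```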
Offline Metric: Replay (Li et al., 2011)
Compare logged actions from explored traffic with the actions the model would have assigned; keep only the impressions where they match and compute the take fraction on those.
Example: Offline Take Fraction: 2/3
Replay
● Pros
○ Unbiased metric when using logged probabilities
○ Easy to compute
○ Rewards observed are real
● Cons
○ Requires a lot of data
○ High variance if there are few matches
■ Techniques like Doubly-Robust estimation (Dudik, Langford
& Li, 2011) can help
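A minimal replay sketch over logged exploration data; it ignores propensity weighting (i.e. assumes uniform logged exploration) and the record format is an assumption.

```python
def replay_take_fraction(logged_events, policy):
    """Keep only impressions where the policy's choice matches the logged action,
    then average the observed rewards over those matches."""
    matches, reward_sum = 0, 0.0
    for event in logged_events:  # each event: {"context": x, "action": a, "reward": r}
        if policy.choose(event["context"]) == event["action"]:
            matches += 1
            reward_sum += event["reward"]
    return reward_sum / matches if matches else float("nan")
```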
Offline Replay Results
● Bandit finds good images
● Personalization is better
● Artwork variety matters
● Personalization wiggles
around best images
Lift in Replay for the various algorithms compared to the Random baseline
Bandits in the Real
World
A/B Testing Bandit Algorithms
● Getting started
○ Need data to learn
○ Warm-starting via batch learning from existing data
● Closing the feedback loop
○ Only exposing the bandit to its own output
● Algorithm performance depends on data volume
○ Need to be able to test bandits at large scale, head-to-head
Starting the Loop
(Diagram: an explore policy serves actions to users; action, context, and reward are joined, logged, and written to a data store; batch training on that data publishes the first model)
Completing the Loop
(Diagram: the published model serves actions to users; the same join, logging, and data-store path continues, with batch training and incremental updates publishing refreshed models)
● Need to serve an image for any title in the catalog
○ Calls from homepage, search, galleries, etc.
○ > 20M RPS at peak
● Existing UI code was written assuming image lookup is fast
○ In-memory map of video ID to URL
○ Want to insert a machine-learned model
○ Don't want a big rewrite across all UI code
Scale Challenges
Live Compute vs. Online Precompute
● Live Compute: synchronous computation to choose the image for a title in response to a member request
● Online Precompute: asynchronous computation to choose the image for a title before the request, with the result stored in a cache
Live Compute
Pros:
● Access to the freshest data
● Knowledge of full context
● Compute only what is necessary
Cons:
● Strict Service Level Agreements
○ Must respond quickly in all cases
○ Requires high availability
● Restricted to simple algorithms

Online Precompute
Pros:
● Can handle large data
● Can run moderate-complexity algorithms
● Can average computational cost across users
● Change from actions
Cons:
● Has some delay
● Done in event context
● Extra compute for users and items not served

See techblog for more details
System Architecture
(Diagram: UI image requests are served at the Edge from EVCache; the Personalized Image Precompute service runs the bandit model, writes selections to EVCache, and emits precompute logs; precompute logs plus play and impression logs are aggregated by ETL and feed model training, which publishes the bandit model back to precompute)
Precompute & Image Lookup
● Precompute
○ Run bandit for each title on each profile to
choose personalized image
○ Store the title to image mapping in EVCache
● Image Lookup
○ Pull profile’s image mapping from EVCache
once per request
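A highly simplified sketch of this precompute-then-lookup pattern; a plain dict stands in for EVCache, and the function and parameter names are illustrative, not the actual Netflix client.

```python
# Precompute (asynchronous): run the bandit per profile and store title -> image
def precompute_profile(profile_id, titles, policy, feature_fn, cache):
    mapping = {}
    for title_id in titles:
        x = feature_fn(profile_id, title_id)
        mapping[title_id] = policy.choose(x)  # personalized image for this title
    cache[profile_id] = mapping               # one cache entry per profile

# Image lookup (request path): one cache read per request, then in-memory lookups
def images_for_request(profile_id, title_ids, cache, default_image):
    mapping = cache.get(profile_id, {})
    return {t: mapping.get(t, default_image) for t in title_ids}
```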
Logging & Reward
● Precompute Logging
○ Selected image
○ Exploration Probability
○ Candidate pool
○ Snapshot facts for feature generation
● Reward Logging
○ Image rendered in UI & if played
○ Precompute ID
Image via YouTube
Feature Generation & Training
● Join rewards with snapshotted facts
● Generate features using DeLorean
○ Feature encoders are shared online and offline
● Train the model using Spark
● Publish model to production
DeLorean image by JMortonPhoto.com & OtoGodfrey.com
Monitoring and Resiliency
● Track the quality of the model
○ Compare prediction to actual behavior
○ Online equivalents of offline metrics
● Reserve a fraction of data for a simple policy (e.g. 𝝐-greedy) to sanity check bandits
Graceful Degradation
● Missing images greatly degrade the member experience
● Try to serve the best image possible
Fallback chain: Personalized Selection → Unpersonalized Fallback → Default Image (when all else fails)
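A sketch of that fallback chain (function and parameter names are illustrative):

```python
def image_for_title(profile_id, title_id, cache, unpersonalized, default_image):
    # 1. Personalized selection from the precomputed per-profile cache
    personalized = cache.get(profile_id, {}).get(title_id)
    if personalized:
        return personalized
    # 2. Unpersonalized fallback (e.g. the overall best image for the title)
    fallback = unpersonalized.get(title_id)
    if fallback:
        return fallback
    # 3. Default image when all else fails
    return default_image
```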
Does it work?
Online results
● A/B test: It works!
● Rolled out to our >130M member base
● Most beneficial for lesser known titles
● Competition between titles for attention leads to compression of offline metrics
More details in our blog post
Future Work
More dimensions to personalize
Rows
Trailer
Evidence
Synopsis
Image
Row Title
Metadata
Ranking
Automatic image selection
● Generating new artwork is costly and time consuming
● Can we predict performance from raw image?
Artwork selection orchestration
● Neighboring image selection influences result
Example: Stand-up comedy
Row A (microphones)
Row B (more variety)
Long-term Reward: Road to
Reinforcement Learning
● RL involves multiple actions and delayed reward
● Useful for maximizing long-term user joy?
Thank you
@JustinBasilico