Déjà Vu: The Importance of Time and Causality in Recommender Systems

Justin Basilico & Yves Raimond
August 29, 2017
@JustinBasilico @moustaki

But first…
Goodbye [image] & Cinematch
Hello [image] & % Match
Why?
● +200% ratings volume
● Clear link to personalization

Image from Domiriel (cc by-nc)

Recommendations are actions at a moment in time
● This moment can be controlled by the user
○ Visit time
○ Session length
● … or influenced by the system
○ Notifications, emails
● And must choose an action
○ … that has consequences
● Time and causality are critical aspects in any recommender system
○ Data collection
○ Experiment design (offline & online)
○ Algorithm & objective design
○ System design

Data collection
[Timeline figure with markers: Observed labels, Training time, Serving time, Serving input data collected]

Violation of the space-time continuum!
[Timeline figure with markers: Observed labels, Training input data collected, Training time, Serving time, Serving input data collected]

Data collection
[Timeline figure with markers: Observed labels, Training input data collected, Training time, Serving time, Serving input data collected]

Time machines
[Timeline figure with markers: Observed labels, Training input data collected, Training time, Serving time, Serving input data collected]
Distributed Time Travel for Feature Generation
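One way to read the "time machine" idea is as a point-in-time feature lookup: features are snapshotted with a timestamp, and each training example is paired only with the snapshot that existed when the recommendation was served. The sketch below is a minimal illustration under that assumption; the `snapshots` store, the `plays_7d` feature, and `features_as_of` are hypothetical names, not the system described in the talk.

```python
from bisect import bisect_right

# Hypothetical per-user feature snapshots: (snapshot_time, features), sorted by time.
snapshots = {
    "u1": [(10, {"plays_7d": 2}), (20, {"plays_7d": 5}), (30, {"plays_7d": 9})],
}

def features_as_of(user, event_time):
    """Return the latest snapshot taken at or before event_time, or None."""
    history = snapshots.get(user, [])
    times = [t for t, _ in history]
    idx = bisect_right(times, event_time)  # number of snapshots with time <= event_time
    return history[idx - 1][1] if idx > 0 else None

# An impression served at time 25 is paired with the time-20 snapshot,
# never the time-30 one, which would leak future data into training.
train_row = features_as_of("u1", 25)
```

Looking features up by serving time, rather than recomputing them at training time, keeps training inputs consistent with what the model would actually have seen.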

Experiment design
● Be careful when splitting the dataset
○ Don’t overfit the past
○ Predict the future
● Rule of thumb: Split across what you need to generalize over
○ Time!
○ Users or Items?
● May need to train/test at multiple distinct time points to see generalization across time (e.g. [Lathia et al., 2009])
● Simulate system behaviors (e.g. training and publishing delays) in evaluation pipeline
○ Helps capture trade-off between accuracy and responsiveness
[Figure: Train and Test periods along a Time axis]
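The splitting advice above can be sketched as a time-based split evaluated at several cutoffs. This is a minimal illustration; the event schema (`user`, `item`, `ts`) and the function names are assumptions made for the example.

```python
def temporal_split(events, cutoff):
    """Train on events strictly before cutoff, test on events at or after it."""
    train = [e for e in events if e["ts"] < cutoff]
    test = [e for e in events if e["ts"] >= cutoff]
    return train, test

def rolling_splits(events, cutoffs):
    """Yield (cutoff, train, test) for each cutoff to check generalization across time."""
    for cutoff in cutoffs:
        yield (cutoff, *temporal_split(events, cutoff))

# Hypothetical interaction log with integer timestamps.
events = [
    {"user": "u1", "item": "a", "ts": 1},
    {"user": "u2", "item": "b", "ts": 2},
    {"user": "u1", "item": "c", "ts": 3},
    {"user": "u3", "item": "a", "ts": 4},
]
splits = list(rolling_splits(events, cutoffs=[2, 3, 4]))
```

A training or publishing delay could be simulated in the same spirit by leaving a gap between the last training timestamp and the first test timestamp.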

Time-aware recommendation algorithms

Users changing over time
Nonstationarity

Items changing over time
[Figure: popularity vs. time, showing learned item bias and actual item popularity, with markers for "Item launch" and "Item becomes available"]

Some approaches to modeling time
● Aggregation
○ Decay functions (e.g. [Ding, Li, 2005])
○ Buckets (e.g. [Zimdars, Chickering, Meek, 2001])
● Extrapolation (e.g. [Koren, 2009])
● Sequences
○ Markov (e.g. [Rendle et al., 2010])
○ Last N (e.g. [Shani, Heckerman, Brafman, 2005])
○ RNNs (e.g. [Hidasi et al., 2015])
● Features
○ Discretized (e.g. [Baltrunas, Amatriain, 2009])
○ Continuous (e.g. example age in [Covington et al., 2016])
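As an illustration of the decay-function style of aggregation, the sketch below weights each past event by an exponential with a chosen half-life. Exponential weighting is one common choice and is assumed here for the example; it is not necessarily the exact form used in the cited work, and `decayed_count` is a hypothetical helper.

```python
import math

def decayed_count(event_times, now, half_life):
    """Sum of per-event weights that halve every `half_life` time units.

    Events after `now` are ignored so the aggregate never uses future data.
    """
    rate = math.log(2) / half_life
    return sum(math.exp(-rate * (now - t)) for t in event_times if t <= now)

# Three plays at days 0, 5, and 10, evaluated at day 10 with a 5-day half-life:
# weights are 0.25, 0.5, and 1.0.
score = decayed_count([0, 5, 10], now=10, half_life=5)
```

A shorter half-life makes the aggregate more responsive to recent behavior, at the cost of discarding older signal faster.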

Time as context
● Generalizing to future behaviors through temporal extrapolation
● Time exhibits many periodicities
○ Daily
○ Weekly
○ Seasonally
○ … and even longer: Olympics, elections, etc.
● Additional periodic time context features can be added or extracted
[Figure: Experiment on a Netflix internal dataset]
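One common way to expose daily and weekly periodicity to a model is a sine/cosine encoding, so that 23:59 and 00:00 end up close together in feature space. The sketch below assumes that encoding; the feature names and `periodic_features` are illustrative, not taken from the talk.

```python
import math
from datetime import datetime

def periodic_features(ts):
    """Encode hour-of-day and day-of-week as points on a unit circle."""
    hour_angle = 2 * math.pi * (ts.hour + ts.minute / 60) / 24
    dow_angle = 2 * math.pi * ts.weekday() / 7
    return {
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "dow_sin": math.sin(dow_angle),
        "dow_cos": math.cos(dow_angle),
    }

late = periodic_features(datetime(2017, 8, 29, 23, 59))
midnight = periodic_features(datetime(2017, 8, 30, 0, 0))
```

Longer cycles (seasonal, multi-year events) could be handled with the same pattern using a different period, or with explicit event indicators.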

Minimizing interaction time
● Recommendation systems are a means to an end
○ Reward = enjoyment - interaction cost
○ Enjoyment integrated over time (e.g. goodness * length of view)
○ Interaction cost integrated over time
○ Don’t waste your users’ time
○ Magnitudes of enjoyment and cost may be user-specific
● Maximize enjoyment of the selected item while minimizing the time it takes to find the item
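The reward above can be written down directly. The sketch below uses the slide's example of enjoyment as goodness times length of view, and assumes, purely for illustration, that interaction cost is linear in browsing time with a per-user `cost_per_minute`.

```python
def session_reward(goodness, view_minutes, browse_minutes, cost_per_minute=1.0):
    """Reward = enjoyment - interaction cost, both accumulated over time.

    `cost_per_minute` is a hypothetical user-specific scale for interaction cost.
    """
    enjoyment = goodness * view_minutes
    interaction_cost = cost_per_minute * browse_minutes
    return enjoyment - interaction_cost

# Same title and enjoyment, but found after 2 minutes instead of 15.
quick = session_reward(goodness=0.8, view_minutes=100, browse_minutes=2)
slow = session_reward(goodness=0.8, view_minutes=100, browse_minutes=15)
```

Under this formulation, two recommenders that surface the same item are not equivalent if one makes the user search longer for it.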

Hangul alphabet: 3 syllables, but requires 7 (2 + 3 + 2) interactions

With a model optimized to minimize interaction time: one interaction

Time-aware recommender systems

Algorithms changing
[Figure: Algorithms A, B, and C, each moving through the stages Idea, Offline experimentation, Online experimentation (A/B), Rollout]
Assumes stationarity! A change in other parts of the system might invalidate previous (offline or online) results. Holdback A/B tests as part of rollout can help.

UX changing over time
[Figure: UI elements, including % Match]

Feedback loops
[Figure: cycle with the steps "Impression bias inflates plays", "Leads to inflated item popularity", "More plays", "More impressions"]
[Figure: Oscillations in distribution of genre recommendations]
Feedback loops can cause biases to be reinforced by the recommendation system!

Closed Loop
[Figure: loop through Training Data, Model, Recs, and Watches, labeled "Danger Zone"]
Open Loop
[Figure: loop through Training Data, Model, Recs, and Watches, with Search added]

Open vs. Closed Loops
[Based on Steck, 2013 with system as selector]
[Figure: decomposition into the terms "Watch when rec", "Probability of rec", "Watch when not rec", and "Probability of not rec"]
Closed loop: 0
Open loop: > 0
We have control over this

Controlled stochasticity
● Maintain some controlled exploration to break the feedback loop and handle non-stationarities
● Explore with ε-greedy, Thompson Sampling, etc.
● Control to avoid significantly degrading user experience
● Log as much as possible
○ Include counterfactuals: what maximal action the system wanted to take (e.g. [Bottou et al., 2013])
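A minimal ε-greedy sketch that also logs what the slide asks for: the probability of the chosen action (its propensity) and the greedy action the system would otherwise have taken. The scores, item names, and the uniform exploration distribution are assumptions made for the example.

```python
import random

def epsilon_greedy(scores, epsilon, rng):
    """Return (chosen_item, propensity, greedy_item) for logging."""
    items = list(scores)
    greedy = max(scores, key=scores.get)
    if rng.random() < epsilon:
        chosen = rng.choice(items)  # explore: uniform over all items
    else:
        chosen = greedy             # exploit: top-scored item
    # Probability this policy picks `chosen`: the uniform exploration mass,
    # plus the exploitation mass when `chosen` is the greedy item.
    propensity = epsilon / len(items) + ((1 - epsilon) if chosen == greedy else 0.0)
    return chosen, propensity, greedy

rng = random.Random(0)  # seeded only to make the sketch reproducible
scores = {"a": 0.9, "b": 0.5, "c": 0.1}
log = [epsilon_greedy(scores, epsilon=0.1, rng=rng) for _ in range(1000)]
```

Keeping ε small limits the impact on user experience, while the logged propensities make the explore traffic usable for offline evaluation later.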

Replay Metrics
[Figure: observed reward under the existing recommendation algorithm (with stochasticity), compared with observed reward under the new recommendation algorithm]
[Li et al., 2011; Dudik, Langford, Li, 2014]
Simulate online metrics, offline!
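A simple form of replay evaluation keeps only the logged events where the new algorithm would have chosen the same action as the one actually shown, then averages their observed rewards. The sketch below assumes the logged actions were chosen uniformly at random, which is what makes the plain average meaningful; the log format and `replay_value` are hypothetical.

```python
def replay_value(logged, policy):
    """Average observed reward over events where `policy` matches the logged action.

    `logged` holds (context, action, reward) tuples; returns None if nothing matches.
    """
    matched = [reward for context, action, reward in logged if policy(context) == action]
    return sum(matched) / len(matched) if matched else None

# Hypothetical log collected with uniformly random actions.
logged = [("u1", "a", 1), ("u1", "b", 0), ("u2", "a", 0), ("u2", "b", 1)]

personalized = lambda context: "a" if context == "u1" else "b"
always_a = lambda context: "a"

personalized_value = replay_value(logged, personalized)
always_a_value = replay_value(logged, always_a)
```

When the logging policy is not uniform, the matched events need to be reweighted by their logged propensities instead of averaged directly.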

Causality
● Stochasticity opens the door to using causal inference
● Inverse Propensity Weighting
○ Reduce production bias by reweighting train and test data
○ Know probability of user receiving an impression
○ Doesn’t handle simultaneity and other endogeneity
● Covariate shift
○ Use explore data to estimate bias in other data
○ Use all data to train
● Instrumental variables for more general settings
[Schnabel et al., 2016; Liang, Charlin, Blei, 2016; Smola, 2011; Sugiyama, Kawanabe, 2012]
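Inverse propensity weighting can be sketched as an estimator that divides each matching reward by the probability with which the logging system showed that action. This is the basic textbook form for a deterministic target policy, given as an illustration; the log format and `ips_estimate` are assumptions, and the cited papers develop more refined variants.

```python
def ips_estimate(logged, policy):
    """Inverse-propensity-scored average reward of a deterministic policy.

    `logged` holds (context, action, reward, propensity) tuples, where
    propensity is the logged probability of showing `action` in `context`.
    """
    total = 0.0
    for context, action, reward, propensity in logged:
        if policy(context) == action:
            total += reward / propensity  # upweight rarely shown actions
    return total / len(logged)

# Hypothetical log where each action was shown with probability 0.5.
logged = [
    ("u", "a", 1.0, 0.5),
    ("u", "b", 0.0, 0.5),
    ("u", "a", 1.0, 0.5),
    ("u", "b", 1.0, 0.5),
]
value_a = ips_estimate(logged, lambda context: "a")
value_b = ips_estimate(logged, lambda context: "b")
```

Very small propensities produce large weights and high variance, which is one reason to control how exploration is distributed.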

Incrementality
● Most recommendation (and ML) models are correlational
○ These items are correlated with these types of users
● But we seek causal actions
○ Showing this item is rewarding for this user
● Our recommendation action should have an incremental effect in reward: E[r(a)] - E[r(∅)]
○ Application-dependent choice of ∅
○ Sometimes it may be better not to provide a recommendation that simply maximizes p(v_i | u)
○ May provide less obvious recommendations
[Figure: p(v_i | ∅) compared with p(v_i | a); the difference is the incremental effect]
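The difference between ranking by play probability and ranking by incremental effect can be shown with two items. The probabilities below are made-up numbers for illustration only: item `x` is likely to be played whether or not it is recommended, while item `y` benefits much more from being recommended.

```python
# Hypothetical play probabilities with the recommendation ("rec", i.e. action a)
# and with the baseline ("no_rec", i.e. the null action).
play_prob = {
    "x": {"rec": 0.90, "no_rec": 0.88},
    "y": {"rec": 0.40, "no_rec": 0.10},
}

def uplift(item):
    """Incremental effect of recommending `item`: p(v | a) - p(v | null action)."""
    return play_prob[item]["rec"] - play_prob[item]["no_rec"]

# Ranking by raw play probability picks the item the user would watch anyway.
by_probability = max(play_prob, key=lambda item: play_prob[item]["rec"])
# Ranking by incremental effect picks the item where the recommendation matters.
by_uplift = max(play_prob, key=uplift)
```

In practice both probabilities have to be estimated, and the baseline depends on the application-specific choice of the null action.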

A/B Testing
● Gold standard of causality
○ Random assignment
○ Measured across time
○ Incremental benefit of treatment
● Causality safety net?
○ Hard to test with full feedback loop effects
○ An algorithm may behave differently when training off its own data
○ Holdback tests
[Figure: Metrics over Time for A (Control) and B (Treatment); is the difference significant?]
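For a binary metric such as retention, the "Significant?" question is often approached with a two-proportion z-test. The sketch below is a generic textbook version, not the analysis used in the talk; the counts are invented, and it assumes the pooled rate is strictly between 0 and 1 so the standard error is nonzero.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two_sided_p_value) for the difference in rates, B minus A."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: control converts 500/10000, treatment 600/10000.
z, p_value = two_proportion_z_test(500, 10000, 600, 10000)
```

A significant result here still reflects only the conditions during the test window, which is why the slides pair A/B tests with holdback tests after rollout.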

Takeaways
● After users and items, time is usually the next most important factor in recommendation systems
○ Model it as such
○ Evaluate it as such
○ Make it central to your system and infrastructure
● Recommender systems act in a causal loop
○ Influenced by themselves and others
○ Be thoughtful about feedback effects

Thank you.
@JustinBasilico @moustaki
Justin Basilico & Yves Raimond
Yes, we’re hiring...
