
- 1. Introduction to Uplift Modelling: an online gaming application
- 2. A few words about me • Senior Data Scientist at Dataiku (worked on churn prediction, fraud detection, bot detection, recommender systems, graph analytics, smart cities, …) • Occasional Kaggle competitor • Mostly code in Python and SQL • Twitter: @prrgutierrez
- 3. Plan • Introduction / client situation • Uplift use case examples • Uplift modelling • Uplift evaluation & results
- 4. Client situation • French online gaming company (RPG) • A lot of users are leaving • Let's build a churn prediction model! • Target: no return within 14 or 28 days (14 days absent → ~80% chance of not coming back; 28 days absent → ~90%) • Features: • Connection features: • Time played in the last 1, 7, 15, 30, … days • Time since last connection • Connection frequency • Days of week / hours of day played • Equivalent features for payments and subscriptions • Age, sex, country • Number of accounts, is a bot, … • No in-game features (no data)
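The 14/28-day target above can be sketched as a simple labelling function. A minimal sketch in Python; the user names, dates, and the snapshot date are made up for illustration, not the client's actual data:

```python
from datetime import date

# Hypothetical sketch of the churn target: a user is labelled a churner
# if they have not connected within the last `window` days before the
# snapshot date (slide: 14 missing days ~ 80% chance of no return).
CHURN_WINDOW_DAYS = 14

def churn_label(last_seen: date, snapshot: date, window: int = CHURN_WINDOW_DAYS) -> int:
    """Return 1 (churn) if the user has been absent at least `window` days."""
    return int((snapshot - last_seen).days >= window)

users = {
    "alice": date(2015, 3, 1),   # 29 days absent -> churner
    "bob": date(2015, 3, 28),    # 2 days absent -> active
}
snapshot = date(2015, 3, 30)
labels = {u: churn_label(d, snapshot) for u, d in users.items()}
```

A second pass with `window=28` would give the stricter 28-day target mentioned on the slide.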
- 5. Client situation • Model results: • AUC 0.88 • Very stable model • Marketing actions: • 7 different actions based on customer segmentation (offers, promotions, …) • A/B test → −5% churn among people contacted by email • Going further: • Feature engineering: guilds, close network, in-game actions, … • Study long-term churn …
- 6. Client situation • But wait! • Strong hypothesis: target the people who are most likely to churn
- 7. Client situation • But wait! • Strong hypothesis: target the people who are most likely to churn • What is the gain per person for an action? • $c$: cost of the action • $v_i$: value of the customer • $X$: independent variables • $T$: "treated" population, $C$: "control" population • $Y = 1$ if the customer churns, $0$ otherwise • Value with action: $E_T(V_i) = v_i\,(1 - P_T(Y=1|X)) - c$ • Value without action: $E_C(V_i) = v_i\,(1 - P_C(Y=1|X))$ • Gain (if $v_i$ is independent of treatment): $E(G_i) = v_i\,(P_C(Y=1|X) - P_T(Y=1|X)) - c$
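The gain formula on this slide translates directly into code. A minimal sketch; the customer value, churn probabilities, and action cost below are illustrative numbers, not the client's figures:

```python
def expected_gain(v_i: float, p_churn_control: float,
                  p_churn_treated: float, cost: float) -> float:
    """E(G_i) = v_i * (P_C(Y=1|X) - P_T(Y=1|X)) - c  (slide 7)."""
    return v_i * (p_churn_control - p_churn_treated) - cost

# A customer worth 100 whose churn probability drops from 0.60 (no action)
# to 0.45 (action), with an action costing 5: gain = 100 * 0.15 - 5 = 10.
g = expected_gain(100.0, 0.60, 0.45, 5.0)
```

Note the gain is negative whenever the uplift $P_C - P_T$ is smaller than $c / v_i$, which is exactly why targeting by churn probability alone is not enough.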
- 8. Client situation • But wait! • Strong hypothesis: target the people who are most likely to churn • What is the gain per person for an action? $E(G_i) = v_i\,(P_C(Y=1|X) - P_T(Y=1|X)) - c$ • Objective: maximize this gain • Targeting highly probable churners minimizes $P_T(Y=1|X)$, but not the difference! → Uplift model • Intuitive examples: • $P_C(Y=1) < P_T(Y=1)$: the action is expected to make the situation worse. Spam? • $P_C(Y=1) \approx P_T(Y=1)$: the user does not care, or is already lost
- 9. Uplift • Model the effect of the action • 4 groups of customers / patients: • 1. Responded because of the action (the people we want) • 2. Responded, but would have responded anyway (unnecessary cost) • 3. Did not respond, and the action had no impact (unnecessary cost) • 4. Did not respond because the action had a negative impact (actively harmful) • Incomplete knowledge
- 10. Uplift Examples • Healthcare: • A typical medical trial: • the treatment group gets the treatment • the control group gets a placebo (or another treatment) • a statistical test shows whether the treatment is better than the placebo • With uplift modelling we can find out for whom the treatment works best • Personalized medicine • Ex: what is the gain in survival probability? → a classification/uplift problem
- 11. Uplift Examples • Churn: • e-gaming • another example: Coyote • Retail: • compare coupon campaigns
- 12. Uplift Examples • Mailing: Hillstrom challenge • 2 campaigns: • one email targeting men • one email targeting women • Question: who are the people to target / who have the best response rate?
- 13. Uplift Examples • Common pattern: • Experiment or A/B test → treatment and control groups • Warning: the control group can easily be biased: • targeting the most probable churners while the control is everyone else • calling only the people who come to a shop • Limited experimental trials → no bandit algorithms (once a medical trial is done, you don't continue the "exploration") → feedback is relatively large and discrete in time.
- 14. Uplift modelling • Three main methods : • Two models approach • Class variable modification • Modification of existing machine learning models
- 15. Uplift modelling: Two-model approach • Build a model on the treatment group to get $P_T(Y|X)$ • Build a model on the control group to get $P_C(Y|X)$ • Set: $P = P_T(Y|X) - P_C(Y|X)$
- 16. Uplift modelling: Two-model approach • Advantages: • Standard ML models can be used • In theory, two good estimators → a good uplift model • Works well in practice • Generalizes easily to regression and multi-treatment settings • Drawbacks: • The difference of two estimators is probably not the best estimator of the difference • The two classifiers can ignore the weaker uplift signal (since it's not their target) • Algorithms focusing on estimating the difference directly should perform better
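As a toy illustration of the two-model approach, the sketch below stands in a per-segment frequency estimator for a real classifier (an assumption made for brevity; in practice any standard ML model would be fit separately on each group). Here $Y=1$ is a conversion, so we score by $P = P_T - P_C$ as on slide 15:

```python
from collections import defaultdict

def fit_rate_model(rows):
    """Toy 'classifier': estimate P(Y=1 | segment) by empirical frequency."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [positives, total]
    for x, y in rows:
        counts[x][0] += y
        counts[x][1] += 1
    return {x: pos / tot for x, (pos, tot) in counts.items()}

# Made-up (segment, outcome) data for each experimental group.
treated = [("young", 1), ("young", 1), ("old", 0), ("old", 0)]
control = [("young", 1), ("young", 0), ("old", 0), ("old", 1)]

p_t = fit_rate_model(treated)   # model of P_T(Y=1|X)
p_c = fit_rate_model(control)   # model of P_C(Y=1|X)

# Uplift score per segment: P = P_T(Y=1|X) - P_C(Y=1|X)
uplift = {x: p_t[x] - p_c[x] for x in p_t}
```

The "young" segment gets a positive score (the action helps) while "old" gets a negative one (the action appears to hurt), which is exactly the signal a single churn model cannot see.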
- 17. Uplift modelling: Class variable modification • Introduced in Jaskowski & Jaroszewicz 2012 • Allows any classifier to be adapted to uplift modelling • Let $G \in \{T, C\}$ denote the group membership (treatment or control) • Define the new target variable: $Z = 1$ if ($G = T$ and $Y = 1$), $Z = 1$ if ($G = C$ and $Y = 0$), $Z = 0$ otherwise • This corresponds to flipping the target in the control dataset.
- 18. Uplift modelling: Class variable modification • Why does it work? • By design (A/B test warning!), $G$ should be independent from $X$, thus: $P(Z=1|X) = P_T(Y=1|X)\,P(G=T|X) + P_C(Y=0|X)\,P(G=C|X) = P_T(Y=1|X)\,P(G=T) + P_C(Y=0|X)\,P(G=C)$ • Possibly after reweighting the datasets, we have $P(G=T) = P(G=C) = 1/2$, thus: $2P(Z=1|X) = P_T(Y=1|X) + P_C(Y=0|X)$
- 19. Uplift modelling: Class variable modification • Why does it work? • $2P(Z=1|X) = P_T(Y=1|X) + P_C(Y=0|X) = P_T(Y=1|X) + 1 - P_C(Y=1|X)$ • Thus $P = 2P(Z=1|X) - 1$ • And sorting by $P(Z=1|X)$ is the same as sorting by $P$
- 20. Uplift modelling: Class variable modification • Summary: • Flip the class in the control dataset • Concatenate the treatment and control datasets • Build a classifier • Target the users with the highest probability • Advantages: • Any classifier can be used • Directly predicts uplift (and not each class separately) • A single model on a larger dataset (instead of two small ones) • Drawbacks: • Complex decision surface → the model can perform poorly • Interpretation: what does AUC mean in this case?
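The class-variable transformation of slides 17-19 is a few lines of code; a minimal sketch on toy data (the four example rows are made up to cover every case):

```python
def transform(y: int, group: str) -> int:
    """Z = 1 if (treated and Y=1) or (control and Y=0), else 0.
    This is the label flip of Jaskowski & Jaroszewicz 2012."""
    return int(y == 1) if group == "T" else int(y == 0)

rows = [
    (1, "T"),  # responded under treatment     -> Z = 1
    (0, "T"),  # no response under treatment   -> Z = 0
    (1, "C"),  # responded without treatment   -> Z = 0 (flipped)
    (0, "C"),  # no response without treatment -> Z = 1 (flipped)
]
z = [transform(y, g) for y, g in rows]

# Any classifier is then trained on the concatenated data with target Z;
# its output converts to an uplift score via P = 2 * P(Z=1|X) - 1.
def uplift_from_z(p_z: float) -> float:
    return 2 * p_z - 1
```

A predicted $P(Z=1|X) = 0.5$ maps to zero uplift, which matches the intuition: the model cannot tell the flipped controls from the treated.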
- 21. Uplift modelling: Other methods • Based on decision trees: • Rzepakowski & Jaroszewicz 2012: a new decision tree split criterion based on information theory • Soltys, Rzepakowski & Jaroszewicz 2013: ensemble methods for uplift modelling (out of today's scope)
- 22. Evaluation • We used: • The two-model approach → AUC? Not very informative. • The one-model approach (class modification) → does AUC mean anything? • How can we evaluate / compare them? • Cross-validation: • 4 datasets: treatment/control × train/test • Problem: • We don't have a clear 0/1 target. • We would need to know, for each customer: • the response to treatment • the response to control → not possible
- 23. Evaluation • Gain for a group of customers: • Gain for the 10% highest-scoring customers = % of successes for the top 10% treated customers − % of successes for the top 10% control customers • Uplift curve?: • Difference between two lift curves • Interpretation: net gain in success rate if a given percentage of the population is treated • Problem: no theoretical maximum • Problem 2: weird behaviour for two "wizard" models.
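The top-10% gain above can be computed directly. A toy sketch; the scores and outcomes are made up, and "success" means $Y=1$:

```python
def top_fraction_gain(treated, control, frac=0.10):
    """Success rate of the top-`frac` treated customers minus the
    success rate of the top-`frac` control customers (slide 23)."""
    def top_rate(rows):
        rows = sorted(rows, key=lambda r: -r[0])   # sort by score, descending
        k = max(1, int(len(rows) * frac))          # size of the top slice
        return sum(y for _, y in rows[:k]) / k
    return top_rate(treated) - top_rate(control)

# 10 treated and 10 control customers as (score, outcome) pairs:
treated = [(s / 10, int(s >= 9)) for s in range(10)]  # only the top-scored converts
control = [(s / 10, 0) for s in range(10)]            # nobody converts untreated
gain = top_fraction_gain(treated, control)
```

With these toy numbers the top decile of each group is a single customer, so the gain is simply 1.0 − 0.0; real data would need the full curves below.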
- 24. Evaluation: Qini • Qini measure: • Similar to Gini (area under the lift curve). Lift curve ↔ Qini curve • Parametric curve, defined when taking the first $t$ observations by: $f(t) = Y_T(t) - Y_C(t) \cdot N_T(t)/N_C(t)$ • $Y_T(t)$ is the number of 1s seen in the treated observations • $Y_C(t)$ is the number of 1s seen in the control observations • $N_T(t)$ is the number of treated observations • $N_C(t)$ is the number of control observations • Balanced setting: $f(t) = Y_T(t) - Y_C(t)$
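A minimal sketch of the Qini curve $f(t)$ as defined above, assuming each customer carries an uplift score, a group flag, and an outcome (all toy data):

```python
def qini_curve(scored):
    """Compute f(t) = Y_T(t) - Y_C(t) * N_T(t)/N_C(t) for t = 1..n,
    walking down the customers sorted by decreasing uplift score.
    `scored` is a list of (score, group 'T'/'C', outcome) triples."""
    scored = sorted(scored, key=lambda r: -r[0])
    yt = yc = nt = nc = 0
    curve = []
    for _, group, y in scored:
        if group == "T":
            nt += 1
            yt += y
        else:
            nc += 1
            yc += y
        # Until a control observation is seen, the scaling is undefined;
        # fall back to the raw treated count.
        curve.append(yt - yc * nt / nc if nc else float(yt))
    return curve

# Balanced toy data: alternating treated/control down the ranking.
scored = [(0.9, "T", 1), (0.8, "C", 0), (0.7, "T", 1),
          (0.6, "C", 1), (0.5, "T", 0), (0.4, "C", 0)]
curve = qini_curve(scored)
```

The final point of the curve is the overall treatment effect (the slide's "random model" endpoint); the area between the curve and the straight line to that endpoint gives the Qini measure.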
- 25. Evaluation: Qini • Personal intuition: • We can't know everything: • for the treated who convert and the non-treated who don't convert, what would have happened? • But we don't want to see: • treated not converting • non-treated converting (in our top list) • In the first $t$ observations we want to minimize: $N_T(t) - Y_T(t) + Y_C(t)$ • Very similar to a lift taking into account only negative examples.
- 26. Evaluation: Qini • [Qini curve plot] $f(t) = Y_T(t) - Y_C(t)$
- 27. Evaluation: Qini • Best model: • takes first all the positives in the treatment group, and last all the positives in the control group • No theoretical best model: • it depends on the possibility of a negative effect • Displayed here assuming no negative effect • Random model: • corresponds to the global effect of the treatment • Hillstrom dataset: • for women, the models are comparable and useful • for men, there are no clear individuals to target
- 28. Evaluation: Qini • [Qini curve plot] $f(t) = Y_T(t) - Y_C(t)$
- 29. Evaluation: Qini • Back to our study: • Class modification performs best • The two-model approach performs poorly • A/B test failure: • the control dataset is way too small! • The class modification model is very close to the lift • The two-model approach is only slightly better than random → need to redo the A/B test.
- 30. Conclusion • Uplift: • Surprisingly little literature / few examples • The theory is rather easy to test: • two models • class modification • The intuition and evaluation are not easy to grasp • On the client side: • I don't lose hope that we'll do the A/B test again • A good lead for selecting the best offer for a customer
- 31. A few references • Data : • Churn in gaming : WOWAH dataset (blog post to come) • Uplift for healthcare : Colon Dataset • Uplift in mailing : Hillstrom data challenge • Uplift in General : Simulated data : (blog post to come)
- 32. A few references • Application • Uplift modeling for clinical trial data (Jaskowski, Jaroszewicz) • Uplift Modeling in Direct Marketing (Rzepakowski, Jaroszewicz)
- 33. A few references • Modeling techniques : • Rzepakowski Jaroszewicz 2011 (decision trees) • Soltys Rzepakowski Jaroszewicz 2013 (ensemble for uplift) • Jaskowski Jaroszewicz 2012 (Class modification model)
- 34. A few references • Evaluation • Using Control Groups to Target on Predicted Lift (Radcliﬀe) • Testing a New Metric for Uplift Models (Mesalles Naranjo)
- 35. Thank you for your attention !
