How to correctly estimate the effect of online advertisement(About Double Machine Learning)

•

1 like•1,656 views

Yusuke Kaneko

TokyoR 第71回でのLTスライドです．発表中も少し触れましたが、今回使用したCriteoのデータセットは高次元データとは言い難いのでややタイトル詐欺感はあります

Data & Analytics

How to correctly estimate
the effect of online
advertisement
Introduction of Double Machine Lerning

About me
● Yusuke Kaneko(@coldstart_p)
● CyberAgent, Inc.
● Studying: Econometrics, Causal Inference

Main Theme
How to estimate correctly treatment effects in
high-dimensional data(such as advertising data)

Original
1. Chernozhukov, Victor, et al. "Double/debiased/neyman machine learning of
treatment effects." American Economic Review 107.5 (2017): 261-65.
2. Belloni, A., Chernozhukov, V., Fernández‐Val, I., & Hansen, C. (2017).
Program evaluation and causal inference with high‐dimensional data.
Econometrica, 85(1), 233-298.
3. Chernozhukov, Victor, et al. "Double/debiased machine learning for treatment
and structural parameters." The Econometrics Journal 21.1 (2018): C1-C68.

Problem
● Advertisement data has a lot of variables. (= High-Dimensional Data)
example: Type of Advertisement , Time , History-data
→ Under such circumstances, it is known that low-dimensional parameters such
as treatment effects can not be estimated well by Machine Learning Methods
(The model and figure are on the slide later)
● Motivation: How to estimate treatment effects correctly ?
→ Use Double Machine Learning(Double ML) !
=( FLEXIBLE and SIMPLE Method)
● Double ML Code are on github(https://github.com/VC2015/DMLonGitHub/)
…(Double ML package does not exist yet. )

Data
● Criteo Uplift Modeling Dataset(EC Site AD Large Dataset for causal
inference)
Recently
Published
Dataset!

Data
● Features are as follows
● Averege Visit Rate and Treatment Ratio
is on the right

Model
● Partially Linear Model
Y: Outcome Variable(Whether user visited = visit)
D: Treatment Variable (Whether or not an ad has been delivered = treatment)
Z: Vector of covariates(such as history data = f0 f1 ...)
θ0: “lift” parameter(Treatment Effect parameter)
● D is expressed by the following equation
D is conditionally exogenous

Bad ML Estimation
● Predict Y using D and Z, then obtain
(Example)
1. Run Random Forest and get
2. and get
3. Repeat 1. and 2. until convergence

Bad ML Estimation
● Prediction performance is Good, but the distribution of looks like
below.
(from paper)

Double ML
1. Predict Y and D using Z by
obtained using ML methods(Lasso , Random Forest etc... )
2. Residualize
3. Regress and get

Double ML
● Prediction performance is Good, and the distribution of looks like
below.
(from paper)

Sample Splitting
● In Double ML, Sample Splitting is one key Ingredient.
→ Convergence rate improves with sample splitting(Please refer to the papers
for the theoretical content)
(from paper)

Double ML in R
● All the code for the simulation is on the github

Simulation
● Since Criteo Dataset is too large , I made overall sampling first (about 2%)
● Introduce Sampling Bias( Since we are using data coming from a randomized
experiment)

Naive ATE Method
● Unbiased ATE(Average Treatment Effect) estimation is possible with this
technique if there is no sampling bias

Simulation
● Average treatment effect without sampling bias on the left
● Average treatment effect with sampling bias on the right
Highly Biased

Simulation
● Use approach: Double ML with Lasso(You can use RandomForest instead of
Lasso)
→ Counter approach: Naive ATE, Single equation Lasso , Usual Lasso , Lasso
with Propensity Score Weighting

Simulation
● Result(with Lasso with Propensity Score Weighting) Lasso with PSW‘s
performance is
very bad , so
remove this

Simulation
● Result(without Lasso with Propensity Score Weighting)
Double ML’s
performance is
very good!

Conclusion
● Double ML is Flexible and Simple Method for causal inference
● Double ML’s performance seems very good with real data

What's hot

[1312.5602] Playing Atari with Deep Reinforcement LearningSeung Jae Lee

A brief overview of Reinforcement Learning applied to gamesThomas da Silva Paula

An introduction to deep reinforcement learningBig Data Colombia

RecSysOps: Best Practices for Operating a Large-Scale Recommender SystemEhsan38

How Lazada ranks products to improve customer experience and conversionEugene Yan Ziyou

Intro to Deep Reinforcement LearningKhaled Saleh

多腕バンディット問題: 定式化と応用 (第13回ステアラボ人工知能セミナー)STAIR Lab, Chiba Institute of Technology

[DL輪読会]GANとエネルギーベースモデルDeep Learning JP

Neural Processes FamilyKota Matsui

Unified Approach to Interpret Machine Learning Model: SHAP + LIMEDatabricks

IIBMP2016 深層生成モデルによる表現学習Preferred Networks

Reinforcement Learning 8: Planning and Learning with Tabular MethodsSeung Jae Lee

機械学習で嘘をつく話Satoshi Hara

2018年01月27日 TensorBoardによる学習の可視化aitc_jp

Sequential Decision Making in RecommendationsJaya Kawale

Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionPreferred Networks

Come-Closer-Diffuse-Faster Accelerating Conditional Diffusion Models for Inve...Chung Hyung Jin

[DL輪読会]Reward Augmented Maximum Likelihood for Neural Structured PredictionDeep Learning JP

Icml2018読み会_overview&GANsKentaro Tachibana

Uplift Modeling Workshopodsc

What's hot (20)

[1312.5602] Playing Atari with Deep Reinforcement Learning

A brief overview of Reinforcement Learning applied to games

An introduction to deep reinforcement learning

RecSysOps: Best Practices for Operating a Large-Scale Recommender System

How Lazada ranks products to improve customer experience and conversion

Intro to Deep Reinforcement Learning

多腕バンディット問題: 定式化と応用 (第13回ステアラボ人工知能セミナー)

[DL輪読会]GANとエネルギーベースモデル

Neural Processes Family

Unified Approach to Interpret Machine Learning Model: SHAP + LIME

IIBMP2016 深層生成モデルによる表現学習

Reinforcement Learning 8: Planning and Learning with Tabular Methods

機械学習で嘘をつく話

2018年01月27日 TensorBoardによる学習の可視化

Sequential Decision Making in Recommendations

Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution

Come-Closer-Diffuse-Faster Accelerating Conditional Diffusion Models for Inve...

[DL輪読会]Reward Augmented Maximum Likelihood for Neural Structured Prediction

Icml2018読み会_overview&GANs

Uplift Modeling Workshop

Similar to How to correctly estimate the effect of online advertisement(About Double Machine Learning)

Faster and cheaper, smart ab experiments - public ver.Marsan Ma

IntroductioneditedMefratechnologies

Nbe rcausalpredictionv111 lecture2NBER

PREDICT 422 - Module 1.pptxVikramKumar790542

DissertationMefratechnologies

IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESVikash Kumar

Adapting neural networks for the estimation of treatment effectsViswanath Gangavaram

AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfMargiShah29

Housing price predictionAbhimanyu Dwivedi

Experimental designs and data analysis in the field of Agronomy science by ma...Manoj Sharma

Setting up an A/B-testing frameworkAgnes van Belle

Internship project report,Predictive ModellingAmit Kumar

housepriceprediction-180915174356.pdfVinayShekarReddy

Machine Learning.pptxNitinSharma134320

Data Science - Part V - Decision Trees & Random Forests Derek Kane

Probability distribution Function & Decision Trees in machine learningSadia Zafar

Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...INFOGAIN PUBLICATION

Classification of Headache Disorder Using Random Forest Algorithm.pptxAlemu Gudeta

debatrim_report (1)Debatri Mitra

How ml can improve purchase conversionsSudeep Shukla

Similar to How to correctly estimate the effect of online advertisement(About Double Machine Learning) (20)

Faster and cheaper, smart ab experiments - public ver.

Introductionedited

Nbe rcausalpredictionv111 lecture2

PREDICT 422 - Module 1.pptx

Dissertation

IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES

Adapting neural networks for the estimation of treatment effects

AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf

Housing price prediction

Experimental designs and data analysis in the field of Agronomy science by ma...

Setting up an A/B-testing framework

Internship project report,Predictive Modelling

housepriceprediction-180915174356.pdf

Machine Learning.pptx

Data Science - Part V - Decision Trees & Random Forests

Probability distribution Function & Decision Trees in machine learning

Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...

Classification of Headache Disorder Using Random Forest Algorithm.pptx

debatrim_report (1)

How ml can improve purchase conversions

Recently uploaded

Capstone Project on IBM Data Analytics ProgramMoniSankarHazra

Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann

Week-01-2.ppt BBB human Computer interactionfulawalesam

Invezz.com - Grow your wealth with trading signalsInvezz1

CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE9953056974 Low Rate Call Girls In Saket, Delhi NCR

Smarteg dropshipping via API with DroFx.pptxolyaivanovalion

VidaXL dropshipping via API with DroFx.pptxolyaivanovalion

Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlkumarajju5765

Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten

Mature dropshipping via API with DroFx.pptxolyaivanovalion

Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls

BabyOno dropshipping via API with DroFx.pptxolyaivanovalion

(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls

Introduction-to-Machine-Learning (1).pptxfirstjob4

Edukaciniai dropshipping via API with DroFxolyaivanovalion

Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls

Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45

Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71

CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion

Recently uploaded (20)

Capstone Project on IBM Data Analytics Program

Generative AI on Enterprise Cloud with NiFi and Milvus

Week-01-2.ppt BBB human Computer interaction

Invezz.com - Grow your wealth with trading signals

CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE

Smarteg dropshipping via API with DroFx.pptx

VidaXL dropshipping via API with DroFx.pptx

Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl

Log Analysis using OSSEC sasoasasasas.pptx

Mature dropshipping via API with DroFx.pptx

Best VIP Call Girls Noida Sector 22 Call Me: 8448380779

BabyOno dropshipping via API with DroFx.pptx

(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...

Introduction-to-Machine-Learning (1).pptx

Edukaciniai dropshipping via API with DroFx

Best VIP Call Girls Noida Sector 39 Call Me: 8448380779

Determinants of health, dimensions of health, positive health and spectrum of...

Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha

CebaBaby dropshipping via API with DroFX.pptx

How to correctly estimate the effect of online advertisement(About Double Machine Learning)

1. How to correctly estimate the effect of online advertisement Introduction of Double Machine Lerning

2. About me ● Yusuke Kaneko(@coldstart_p) ● CyberAgent, Inc. ● Studying: Econometrics, Causal Inference

3. Main Theme How to estimate correctly treatment effects in high-dimensional data(such as advertising data)

4. Original 1. Chernozhukov, Victor, et al. "Double/debiased/neyman machine learning of treatment effects." American Economic Review 107.5 (2017): 261-65. 2. Belloni, A., Chernozhukov, V., Fernández‐Val, I., & Hansen, C. (2017). Program evaluation and causal inference with high‐dimensional data. Econometrica, 85(1), 233-298. 3. Chernozhukov, Victor, et al. "Double/debiased machine learning for treatment and structural parameters." The Econometrics Journal 21.1 (2018): C1-C68.

5. Problem ● Advertisement data has a lot of variables. (= High-Dimensional Data) example: Type of Advertisement , Time , History-data → Under such circumstances, it is known that low-dimensional parameters such as treatment effects can not be estimated well by Machine Learning Methods (The model and figure are on the slide later) ● Motivation: How to estimate treatment effects correctly ? → Use Double Machine Learning(Double ML) ! =( FLEXIBLE and SIMPLE Method) ● Double ML Code are on github(https://github.com/VC2015/DMLonGitHub/) …(Double ML package does not exist yet. )

6. Data ● Criteo Uplift Modeling Dataset(EC Site AD Large Dataset for causal inference) Recently Published Dataset!

7. Data ● Features are as follows ● Averege Visit Rate and Treatment Ratio is on the right

8. Model ● Partially Linear Model Y: Outcome Variable(Whether user visited = visit) D: Treatment Variable (Whether or not an ad has been delivered = treatment) Z: Vector of covariates(such as history data = f0 f1 ...) θ0: “lift” parameter(Treatment Effect parameter) ● D is expressed by the following equation D is conditionally exogenous

9. Bad ML Estimation ● Predict Y using D and Z, then obtain (Example) 1. Run Random Forest and get 2. and get 3. Repeat 1. and 2. until convergence

10. Bad ML Estimation ● Prediction performance is Good, but the distribution of looks like below. (from paper)

11. Double ML 1. Predict Y and D using Z by obtained using ML methods(Lasso , Random Forest etc... ) 2. Residualize 3. Regress and get

12. Double ML ● Prediction performance is Good, and the distribution of looks like below. (from paper)

13. Sample Splitting ● In Double ML, Sample Splitting is one key Ingredient. → Convergence rate improves with sample splitting(Please refer to the papers for the theoretical content) (from paper)

14. Double ML in R ● All the code for the simulation is on the github

15. Simulation ● Since Criteo Dataset is too large , I made overall sampling first (about 2%) ● Introduce Sampling Bias( Since we are using data coming from a randomized experiment)

16. Naive ATE Method ● Unbiased ATE(Average Treatment Effect) estimation is possible with this technique if there is no sampling bias

17. Simulation ● Average treatment effect without sampling bias on the left ● Average treatment effect with sampling bias on the right Highly Biased

18. Simulation ● Use approach: Double ML with Lasso(You can use RandomForest instead of Lasso) → Counter approach: Naive ATE, Single equation Lasso , Usual Lasso , Lasso with Propensity Score Weighting

19. Simulation ● Result(with Lasso with Propensity Score Weighting) Lasso with PSW‘s performance is very bad , so remove this

20. Simulation ● Result(without Lasso with Propensity Score Weighting) Double ML’s performance is very good!

21. Conclusion ● Double ML is Flexible and Simple Method for causal inference ● Double ML’s performance seems very good with real data

How to correctly estimate the effect of online advertisement(About Double Machine Learning)

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to How to correctly estimate the effect of online advertisement(About Double Machine Learning)

Similar to How to correctly estimate the effect of online advertisement(About Double Machine Learning) (20)

More from Yusuke Kaneko

More from Yusuke Kaneko (7)

Recently uploaded

Recently uploaded (20)

How to correctly estimate the effect of online advertisement(About Double Machine Learning)