- Listen to ~100 billion ad opportunities daily
- Respond with optimal bids within milliseconds
- Petabytes of data (ad impressions, visits, clicks, conversions)
Predicting user response to ads is a Machine-Learning problem.
but quantifying the impact of ad exposure is a Measurement problem.
Spark: existing vs simulated data
Most Spark applications process existing big data-sets.
Today we’re talking about analyzing simulated big data
Key Conceptual Take-aways
- Issues in Ad lift measurement
  - Proper definition
  - Confidence bounds
- Bayesian Methods for Ad Lift Confidence Bounds
  - Gibbs Sampling (MCMC – Markov Chain Monte Carlo)
- Using Spark for:
  - Monte Carlo sampling for confidence bounds
  - Monte Carlo simulations
Application context: ad impact measurement
- Advertisers want to know the impact of showing ads to users.
Measuring Ad Impact: Two Approaches
- Observational studies:
  - Compare users who happen to be exposed vs. not exposed
  - Bias is a big issue
- Randomized tests:
  - Randomly expose the test group, compare with control (un-exposed)
Ideal Randomized Test
Ideal Randomized Test: Ad lift
Ad Lift: Response Rates
If we see $k = 200$ conversions out of $N = 10{,}000$ users,
what is a good estimate for the response rate?
Estimated response rate: $\hat{R} = k/N = 200/10{,}000 = 2\%$ ...
But how confident are we?
Response Rate 90% Confidence Bounds
$P(R > \hat{R} \mid r = q_5) = 5\%$
$P(R < \hat{R} \mid r = q_{95}) = 5\%$
Response-Rate Confidence Bounds
How to find $(q_5, q_{95})$?
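Before the Bayesian route, the two tail conditions can be solved head-on. The sketch below is not from the talk: it assumes conversions are binomial, reads the conditions as "at least / at most as many conversions as observed", and finds each bound by root finding. The name `tail_bounds` is hypothetical.

```python
from scipy.optimize import brentq
from scipy.stats import binom

def tail_bounds(N, k, alpha=0.05):
    """Solve the two tail conditions for (q5, q95) under a binomial model.

    Assumes 0 < k < N so that each root is bracketed.
    """
    # q5: true rate at which seeing k or more conversions has probability alpha
    q_lo = brentq(lambda r: binom.sf(k - 1, N, r) - alpha, 1e-12, k / N)
    # q95: true rate at which seeing k or fewer conversions has probability alpha
    q_hi = brentq(lambda r: binom.cdf(k, N, r) - alpha, k / N, 1 - 1e-12)
    return q_lo, q_hi

q5, q95 = tail_bounds(10_000, 200)
print(f"q5 = {q5:.4f}, q95 = {q95:.4f}")
```

This needs a numerical solver per bound, which is one reason sampling from a posterior is the more convenient tool once the quantity of interest is a ratio such as lift.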
Response-Rate: Bayesian Confidence Bounds
Randomly generate response rates that are consistent with the data.
(Sample rates from posterior distribution given data.)
Find the (0.05, 0.95) quantiles of these rates.
Response-Rate: Bayesian Confidence Bounds
- Assume an unknown true rate $r$, with a prior distribution $p(r)$
  - assume $p(r) = \mathrm{Beta}(1, 1) = \mathrm{Unif}(0, 1)$
- Sample from the posterior distribution of the rate $r$
  - conditional on the observed data ($k$ conversions out of $N$)
$$
\begin{aligned}
P(r \mid k) &\propto P(k \mid r) \cdot p(r) \\
            &\propto r^{k} (1 - r)^{N-k} \cdot \mathrm{Beta}(1, 1) \\
            &\propto r^{(k+1)-1} (1 - r)^{(N-k+1)-1} \\
            &\propto \mathrm{Beta}(k + 1,\, N - k + 1)
\end{aligned}
$$
- Compute the (0.05, 0.95) quantiles of the generated rates.
Response-Rate: Bayesian Confidence Bounds
A simple form of Gibbs Sampling (more later):
- sample $M$ values of $r$ from the posterior
  $P(r \mid k) \sim \mathrm{Beta}(k + 1,\, N - k + 1)$.
- compute the (0.05, 0.95) quantiles

from numpy.random import beta
from scipy.stats.mstats import mquantiles

def conf(N, k, samples=500):
    # Draw posterior samples: r | k ~ Beta(k + 1, N - k + 1)
    rates = beta(k + 1, N - k + 1, samples)
    # The 5% and 95% quantiles bound the 90% region
    return mquantiles(rates, prob=[0.05, 0.95])
Response Rates: Example
If we see $k = 200$ conversions out of $N = 10{,}000$ users,
what is a good estimate for the response rate?
Estimated response rate: $\hat{R} = k/N = 200/10{,}000 = 2\%$ ...
$\Rightarrow$ 90% confidence region: (1.8%, 2.2%)
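The region quoted above can be cross-checked without sampling, because the posterior under a uniform prior is a Beta distribution with known quantiles. This is a small sketch rather than the talk's code; `posterior_bounds` is a hypothetical name, and it swaps Monte Carlo draws for SciPy's exact quantile function.

```python
from scipy.stats import beta

def posterior_bounds(N, k, level=0.90):
    """Exact quantiles of the Beta(k + 1, N - k + 1) posterior (uniform prior)."""
    tail = (1.0 - level) / 2.0
    posterior = beta(k + 1, N - k + 1)
    return posterior.ppf(tail), posterior.ppf(1.0 - tail)

low, high = posterior_bounds(10_000, 200)
print(f"90% region: ({low:.1%}, {high:.1%})")
```

Sampling remains the more general tool: it extends unchanged to quantities such as lift, whose posterior has no convenient closed form.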
We’ve talked about Response Rates...
now let’s consider Ad Lift.
Ad Lift: Simple Example
- control: 10,000 users, 200 conversions
- test: 100,000 users, 2,200 conversions
Observed response rates:
- control: $\hat{R}_c = 200/10{,}000 = 2\%$
- test: $\hat{R}_t = 2{,}200/100{,}000 = 2.2\%$
Estimated lift: $\hat{L} = 2.2/2 - 1 = 10\%$
This is a great lift!
Not so fast! Is this a reliable estimate?
Could the true lift $\ell$ be 0%, or even negative?
Ad Lift: Bayesian Confidence Bounds
Sampling approach:
Observed data: control: $(k_c, N_c)$, test: $(k_t, N_t)$
1. Repeat $M$ times:
   - draw control response rate $r_c$ from the posterior
     $P(r_c \mid k_c) \sim \mathrm{Beta}(k_c + 1,\, N_c - k_c + 1)$.
   - draw test response rate $r_t$ from the posterior
     $P(r_t \mid k_t) \sim \mathrm{Beta}(k_t + 1,\, N_t - k_t + 1)$.
   - compute lift $L = r_t / r_c - 1$
2. Compute the (0.05, 0.95) quantiles of the set of $M$ lifts $\{L\}$.
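The two steps above can be sketched in a few lines of NumPy. The function name `lift_bounds`, the default of 100,000 draws, and the fixed seed are choices made for this illustration, not values from the talk; the draws are vectorized instead of looped.

```python
import numpy as np

def lift_bounds(k_c, n_c, k_t, n_t, m=100_000, seed=0):
    """Monte Carlo 90% bounds on lift from independent Beta posteriors."""
    rng = np.random.default_rng(seed)
    r_c = rng.beta(k_c + 1, n_c - k_c + 1, size=m)  # control rates
    r_t = rng.beta(k_t + 1, n_t - k_t + 1, size=m)  # test rates
    lifts = r_t / r_c - 1.0
    return np.quantile(lifts, [0.05, 0.95])

low, high = lift_bounds(k_c=200, n_c=10_000, k_t=2_200, n_t=100_000)
print(f"90% interval for lift: ({low:.1%}, {high:.1%})")
```

The endpoints move slightly with the seed and with M. Most of the width comes from the smaller control group, whose rate sits in the denominator.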
Ad Lift: Bayesian Confidence Intervals
- control: $N_c = 10{,}000$ users, $k_c = 200$ conversions
- test: $N_t = 100{,}000$ users, $k_t = 2{,}200$ conversions
Observed response rates:
- control: $\hat{R}_c = 200/10{,}000 = 2\%$
- test: $\hat{R}_t = 2{,}200/100{,}000 = 2.2\%$
Estimated lift: $\hat{L} = 2.2/2 - 1 = 10\%$
90% confidence interval: $(-2.7\%,\ 23.6\%)$
Complication 1: Auction win-bias
Ideal Randomized Test
Bids on control users are wasted!
A Less Wasteful Randomized Test
A Less Wasteful Randomized Test: Win-bias
Cannot simply compare Test Winners (tw) and Control (c):
- test-winners selection bias: “win bias”
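A toy simulation makes the selection effect concrete. Everything in it is an assumption for illustration: each user gets a hidden score `v`, the unexposed response rate rises with `v`, and the bidder wins a test user's auction with probability `v`. None of these modelling choices come from the talk.

```python
import numpy as np

def naive_lift(n=1_000_000, true_lift=0.10, seed=0):
    """Compare exposed test winners with all control users (biased on purpose)."""
    rng = np.random.default_rng(seed)

    def baseline(v):
        # Assumed: unexposed response rate rises with the user's score v
        return 0.01 + 0.04 * v

    # Control: nobody is exposed, everybody is counted
    v_c = rng.random(n)
    conv_c = rng.random(n) < baseline(v_c)

    # Test: the bidder wins with probability v, so winners skew high-value
    v_t = rng.random(n)
    won = rng.random(n) < v_t
    conv_tw = rng.random(won.sum()) < baseline(v_t[won]) * (1 + true_lift)

    return conv_tw.mean() / conv_c.mean() - 1.0

print(f"true lift = 10.0%, naive estimate = {naive_lift():.1%}")
print(f"true lift =  0.0%, naive estimate = {naive_lift(true_lift=0.0):.1%}")
```

Under these assumptions the naive comparison reports a sizeable lift even when the ad has no effect, because winners would have responded more often anyway.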
Ad Lift: Proper Definition
Ad Lift Estimation
Main ideas:
- observe test-losers response rate $R_{tL}$
- observe test win-rate $w$
- we show one can estimate
  $$R^0_{tw} = \frac{R_c - (1 - w)\, R_{tL}}{w}$$
- compute lift $L = R^1_{tw} / R^0_{tw} - 1$
- similar to Treatment Effect Under Non-compliance in clinical trials.
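The estimate above can be written as a small helper. The reading of the symbols is an assumption, since the slides that define them did not survive extraction: $R^1_{tw}$ is taken as the observed (exposed) response rate of test winners and $R^0_{tw}$ as the rate those same users would have shown without exposure. On that reading the formula follows from treating control as a mixture, $R_c = w\,R^0_{tw} + (1 - w)\,R_{tL}$, with losers responding the same in both groups. The function name and the example rates are made up.

```python
def corrected_lift(r_c, r_tl, r_tw_exposed, w):
    """Win-bias-corrected lift from observed rates and the test win-rate w."""
    if not 0.0 < w <= 1.0:
        raise ValueError("win-rate w must be in (0, 1]")
    # Unexposed rate of would-be winners, backed out of the control mixture
    r_tw_unexposed = (r_c - (1.0 - w) * r_tl) / w
    if r_tw_unexposed <= 0.0:
        raise ValueError("estimated unexposed winner rate is not positive")
    return r_tw_exposed / r_tw_unexposed - 1.0

# Made-up rates: control 2%, test losers 1%, exposed test winners 3.3%, w = 50%
lift = corrected_lift(r_c=0.02, r_tl=0.01, r_tw_exposed=0.033, w=0.5)
print(f"corrected lift = {lift:.1%}")
```

With noisy data the backed-out rate can come out zero or negative, so the helper raises instead of returning a meaningless ratio.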
Ad Lift Estimation
How to compute the 90% confidence interval for L?
Ad Lift: Confidence Intervals with Gibbs sampler
Bayesian approach (details omitted, see Chickering/Pearl 1997):
- Assume a random parameter vector $\theta$ consisting of:
  - user latent (potential) behaviors
  - their probabilities
- Set up a prior distribution $\theta \sim p(\theta)$ (Dirichlet)
- Sample $M$ values of the unknown $\theta$ from the posterior: Gibbs Sampler
  $P(\theta \mid \text{Data}) \propto P(\text{Data} \mid \theta) \cdot p(\theta)$
- For each sampled $\theta$, compute the lift $L$ using the estimate above
- Compute the (0.05, 0.95) quantiles of the sampled $L$ values
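The sampler itself is not shown in the talk. As a rough illustration of the mechanics only, here is a Gibbs sampler for a much simpler stand-in model, not the Chickering/Pearl model. Assumptions: each control user is a hidden would-win or would-lose user; would-lose users respond at the same rate in test and control; all rates get Beta(1, 1) priors (the two-category case of a Dirichlet). The function name, starting values, and counts are hypothetical.

```python
import numpy as np

def gibbs_lift(n_tw, k_tw, n_tl, k_tl, n_c, k_c, m=3000, burn=500, seed=0):
    """Gibbs sampler for a simplified win-bias model; returns sampled lifts."""
    rng = np.random.default_rng(seed)
    w, r_l, r_w0 = 0.5, 0.5, 0.5  # arbitrary starting values
    lifts = []
    for i in range(m + burn):
        # Step 1: impute how many control users are latent would-win users,
        # separately among control converters and non-converters.
        p_conv = w * r_w0 / (w * r_w0 + (1 - w) * r_l)
        p_none = w * (1 - r_w0) / (w * (1 - r_w0) + (1 - w) * (1 - r_l))
        cw_conv = rng.binomial(k_c, p_conv)
        cw_none = rng.binomial(n_c - k_c, p_none)
        cl_conv, cl_none = k_c - cw_conv, (n_c - k_c) - cw_none
        # Step 2: draw each parameter from its conditional Beta posterior.
        w = rng.beta(1 + n_tw + cw_conv + cw_none, 1 + n_tl + cl_conv + cl_none)
        r_l = rng.beta(1 + k_tl + cl_conv, 1 + (n_tl - k_tl) + cl_none)
        r_w0 = rng.beta(1 + cw_conv, 1 + cw_none)    # winners, unexposed
        r_w1 = rng.beta(1 + k_tw, 1 + n_tw - k_tw)   # winners, exposed
        if i >= burn:
            lifts.append(r_w1 / r_w0 - 1.0)
    return np.array(lifts)

# Made-up counts: 40% win-rate, 1% loser rate, 3.3% exposed-winner rate
lifts = gibbs_lift(n_tw=40_000, k_tw=1_320, n_tl=60_000, k_tl=600,
                   n_c=50_000, k_c=900)
low, high = np.quantile(lifts, [0.05, 0.95])
print(f"90% interval for lift: ({low:.1%}, {high:.1%})")
```

The alternation is the essential Gibbs pattern: fill in the unobserved user types given the current parameters, then redraw the parameters given the filled-in types.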
Ad Lift: Confidence Intervals
Gibbs sampler convergence may depend on the prior distribution:
- start with multiple (say 100) priors
- run them all in parallel using Spark.
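One way this fan-out might look in PySpark is sketched below. It is an assumption-laden illustration, not the speakers' code: `run_chain` is a cheap stand-in for a full Gibbs run (it samples lift directly under a Beta(a, b) prior), the prior grid in `make_tasks` is invented, and `sc` is assumed to be an existing `SparkContext`.

```python
import numpy as np

def run_chain(task):
    """Sample lifts under one prior; a stand-in for one full Gibbs run."""
    (a, b), seed = task
    rng = np.random.default_rng(seed)
    # Example data from the talk: control 200/10,000, test 2,200/100,000
    r_c = rng.beta(200 + a, 9_800 + b, size=5_000)
    r_t = rng.beta(2_200 + a, 97_800 + b, size=5_000)
    low, high = np.quantile(r_t / r_c - 1.0, [0.05, 0.95])
    return (a, b), float(low), float(high)

def make_tasks(n_priors=100, seed=0):
    """Hypothetical prior grid: Beta(a, b) with a, b drawn from [0.5, 5)."""
    rng = np.random.default_rng(seed)
    params = rng.uniform(0.5, 5.0, size=(n_priors, 2))
    return [((float(a), float(b)), i) for i, (a, b) in enumerate(params)]

def run_all(sc, tasks):
    """Run one task per partition; `sc` is an existing SparkContext."""
    return sc.parallelize(tasks, numSlices=len(tasks)).map(run_chain).collect()

# Usage, given a live SparkContext `sc` (for example inside the pyspark shell):
#   results = run_all(sc, make_tasks())
```

Each task carries its own seed, so chains are reproducible and do not share random state across executors.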
Uses of Monte Carlo Simulations
- confidence intervals
- determine “sufficient” population sizes for reliably estimating
  - response rates
  - lift
- understand the effect of complex phenomena
- validate/verify analytical formulas
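As a sketch of the population-size use case: simulate many experiments at each candidate size and see how tightly the observed response rate clusters around the truth. The 2% rate, the candidate sizes, the 10% target, and both function names are assumptions chosen for illustration.

```python
import numpy as np

def relative_width(n, rate=0.02, trials=10_000, seed=0):
    """Width of the central 90% range of simulated rates, relative to the true rate."""
    rng = np.random.default_rng(seed)
    observed = rng.binomial(n, rate, size=trials) / n
    low, high = np.quantile(observed, [0.05, 0.95])
    return (high - low) / rate

def sufficient_size(target=0.10, candidates=(10_000, 100_000, 1_000_000, 10_000_000)):
    """Smallest candidate population whose relative width meets the target."""
    for n in candidates:
        if relative_width(n) <= target:
            return n
    return None

for n in (10_000, 100_000, 1_000_000):
    print(n, f"{relative_width(n):.1%}")
```

The same loop works for lift by simulating a test and a control group per trial and taking quantiles of the simulated lift instead of the rate.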
Complication 2: Control contamination due to users with multiple cookies
Control Contamination due to Multiple Cookies
Cookie-Contamination Questions
- How does cookie contamination affect measured lift?
- Does the cookie-distribution matter?
  - everyone has k cookies vs. an average of k cookies
- What is the influence of the control percentage?
- Simulations are the best way to understand this
Simulations for cookie-contamination
- A scenario is a combination of parameters:
  - M = # trials for this scenario, usually 10K-1M
  - n = # users, typically 10K-10M
  - p = control percentage (usually 10-50%)
  - k = cookie-distribution, expressed as 1 : 100, or 1 : 70, 3 : 30
  - r = (un-contaminated) control user response rate
  - a = true lift, i.e. exposed user response rate = $r \cdot (1 + a)$.
- A scenario file specifies a scenario in each row.
  - could be thousands of scenarios
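A single scenario might be simulated along the following lines. The contamination mechanics are assumptions made for this sketch, not the speakers' model: each cookie is assigned to control independently with probability p, a user is exposed if any of their cookies lands in test, and each conversion is credited to one of the user's cookies at random. The cookie distribution is read as "number of cookies : share of users" and passed as a dict, so `1 : 70, 3 : 30` becomes `{1: 70, 3: 30}`.

```python
import numpy as np

def measured_lift(n, p, cookie_dist, r, a, rng):
    """One trial: cookie-level measured lift when users may hold several cookies."""
    sizes = np.array(list(cookie_dist.keys()))
    probs = np.array(list(cookie_dist.values()), dtype=float)
    counts = rng.choice(sizes, size=n, p=probs / probs.sum())  # cookies per user
    owner = np.repeat(np.arange(n), counts)                    # user of each cookie
    in_control = rng.random(owner.size) < p                    # per-cookie assignment
    # Assumed: a user sees ads if at least one of their cookies is in test
    test_cookies = np.bincount(owner, weights=(~in_control).astype(float), minlength=n)
    exposed = test_cookies > 0
    converted = rng.random(n) < np.where(exposed, r * (1 + a), r)
    # Assumed: each conversion is credited to one of the user's cookies at random
    first = np.cumsum(counts) - counts
    credited = first + rng.integers(0, counts)
    credit_control = in_control[credited[converted]]
    rate_c = credit_control.sum() / in_control.sum()
    rate_t = (~credit_control).sum() / (~in_control).sum()
    return rate_t / rate_c - 1.0

def run_scenario(m, n, p, cookie_dist, r, a, seed=0):
    """Average measured lift over m independent trials of one scenario."""
    rng = np.random.default_rng(seed)
    return float(np.mean([measured_lift(n, p, cookie_dist, r, a, rng) for _ in range(m)]))

clean = run_scenario(m=5, n=200_000, p=0.5, cookie_dist={1: 1.0}, r=0.05, a=0.5)
dirty = run_scenario(m=5, n=200_000, p=0.5, cookie_dist={3: 1.0}, r=0.05, a=0.5)
print(f"true lift 50.0%: one cookie {clean:.1%}, three cookies {dirty:.1%}")
```

Under these assumptions, control cookies that belong to exposed users pull the control rate up, so the measured lift falls well below the true lift once users hold several cookies. Each row of a scenario file would map to one `run_scenario` call, which is what makes the workload easy to distribute.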
Scenario Simulations in Spark
Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Recently uploaded

Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlkumarajju5765
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 

Recently uploaded (20)

Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 

Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha

  • 5.
      • listen to ~100 billion ad opportunities daily
      • respond with optimal bids within milliseconds
      • petabytes of data (ad impressions, visits, clicks, conversions)
  • 7. Predicting user response to ads is a machine-learning problem, but quantifying the impact of ad exposure is a measurement problem.
  • 9. Spark: existing vs simulated data
      Most Spark applications process existing big data sets.
      Today we’re talking about analyzing simulated big data.
  • 17. Key Conceptual Take-aways
      • Issues in ad lift measurement
          • Proper definition
          • Confidence bounds
      • Bayesian methods for ad lift confidence bounds
          • Gibbs sampling (MCMC: Markov Chain Monte Carlo)
      • Using Spark for:
          • Monte Carlo sampling for confidence bounds
          • Monte Carlo simulations
  • 18. Application context: ad impact measurement
      • Advertisers want to know the impact of showing ads to users.
  • 23. Measuring Ad Impact: Two Approaches
      • Observational studies:
          • Compare users who happen to be exposed vs not exposed
          • Bias is a big issue
      • Randomized tests:
          • Randomly assign users to test (exposed), compare with control (unexposed)
  • 31. Ad Lift: Response Rates
      If we see $k = 200$ conversions out of $N = 10{,}000$ users, what is a good estimate for the response rate?
      Estimated response rate: $\hat{R} = k/N = 200/10{,}000 = 2\%$ ...
      But how confident are we?
  • 34. Response Rate 90% Confidence Bounds
      $P(R > \hat{R} \mid r = q_5) = 5\%$
      $P(R < \hat{R} \mid r = q_{95}) = 5\%$
  • 38. Response-Rate Confidence Bounds
      How to find $(q_5, q_{95})$?
  • 42. Response-Rate: Bayesian Confidence Bounds
      Randomly generate response rates that are consistent with the data.
      (Sample rates from the posterior distribution given the data.)
      Find the (0.05, 0.95) quantiles of these rates.
  • 48. Response-Rate: Bayesian Confidence Bounds
      • Assume an unknown true rate $r$, with a prior distribution $p(r)$
          • assume $p(r) = \mathrm{Beta}(1, 1) = \mathrm{Unif}(0, 1)$
      • Sample from the posterior distribution of the rate $r$
          • conditional on the observed data ($k$ conversions out of $N$)
            $$P(r \mid k) \propto P(k \mid r) \cdot p(r)$$
            $$\propto r^{k} (1 - r)^{N-k} \cdot \mathrm{Beta}(1, 1)$$
            $$\propto r^{(k+1)-1} (1 - r)^{(N-k+1)-1}$$
            $$\propto \mathrm{Beta}(k + 1, N - k + 1)$$
      • Compute (0.05, 0.95) quantiles from the generated rates.
  • 50. Response-Rate: Bayesian Confidence Bounds
      A simple form of Gibbs sampling (more later):
      • sample $M$ values of $r$ from the posterior $P(r \mid k) \sim \mathrm{Beta}(k + 1, N - k + 1)$
      • compute (0.05, 0.95) quantiles

          from numpy.random import beta
          from scipy.stats.mstats import mquantiles

          def conf(N, k, samples=500):
              # draw posterior samples of the rate from Beta(k+1, N-k+1)
              rates = beta(k + 1, N - k + 1, samples)
              # the 5% and 95% quantiles give the 90% bounds
              return mquantiles(rates, prob=[0.05, 0.95])
  • 57. Response Rates: Example
      If we see $k = 200$ conversions out of $N = 10{,}000$ users, what is a good estimate for the response rate?
      Estimated response rate: $\hat{R} = k/N = 200/10{,}000 = 2\%$
      $\Rightarrow$ 90% confidence region: (1.8%, 2.2%)
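As a cross-check on the interval quoted above, the same bounds can be read directly off the Beta posterior instead of sampling from it. This is a minimal sketch, assuming SciPy is installed and the uniform Beta(1, 1) prior from the earlier slides; the function name `posterior_bounds` is hypothetical and not from the talk.

```python
from scipy.stats import beta


def posterior_bounds(N, k, lower=0.05, upper=0.95):
    """Quantiles of the Beta(k+1, N-k+1) posterior for the response rate."""
    posterior = beta(k + 1, N - k + 1)  # assumes a uniform Beta(1, 1) prior
    return posterior.ppf(lower), posterior.ppf(upper)


lo, hi = posterior_bounds(N=10_000, k=200)
print(f"90% bounds: ({lo:.2%}, {hi:.2%})")
```

Using the exact quantile function removes the sampling noise of the Monte Carlo version, which is handy when validating a sampler on a case with a known closed form.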
  • 58. We’ve talked about response rates... now let’s consider ad lift.
  • 62. Ad Lift: Simple Example
      • control: 10,000 users, 200 conversions
      • test: 100,000 users, 2,200 conversions
      Observed response rates:
      • control: $\hat{R}_c = 200/10{,}000 = 2\%$
      • test: $\hat{R}_t = 2{,}200/100{,}000 = 2.2\%$
      Estimated lift: $\hat{L} = 2.2/2 - 1 = 10\%$
      This is a great lift!
      Not so fast! Is this a reliable estimate?
      Could the true lift $\ell$ be 0%, or even negative?
  • 67. Ad Lift: Bayesian Confidence Bounds
      Sampling approach. Observed data: control $(k_c, N_c)$, test $(k_t, N_t)$.
      1. Repeat $M$ times:
          • draw control response rate $r_c$ from posterior $P(r_c \mid k_c) \sim \mathrm{Beta}(k_c + 1, N_c - k_c + 1)$
          • draw test response rate $r_t$ from posterior $P(r_t \mid k_t) \sim \mathrm{Beta}(k_t + 1, N_t - k_t + 1)$
          • compute lift $L = r_t / r_c - 1$
      2. Compute (0.05, 0.95) quantiles of the set of $M$ lifts $\{L\}$.
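The two-step sampling procedure above can be sketched in a few lines of NumPy. This is an illustrative version rather than the speakers' code: the function name `lift_bounds`, the default `M`, and the fixed seed are assumptions, and the inputs are the control/test counts from the simple example.

```python
import numpy as np


def lift_bounds(k_c, N_c, k_t, N_t, M=100_000, seed=0):
    """90% Bayesian bounds on lift from M paired posterior draws."""
    rng = np.random.default_rng(seed)
    # Step 1: posterior draws under a uniform Beta(1, 1) prior for each group
    r_c = rng.beta(k_c + 1, N_c - k_c + 1, size=M)
    r_t = rng.beta(k_t + 1, N_t - k_t + 1, size=M)
    lifts = r_t / r_c - 1.0
    # Step 2: the (0.05, 0.95) quantiles of the M sampled lifts
    return np.quantile(lifts, [0.05, 0.95])


# Counts from the simple example: control 200/10,000, test 2,200/100,000
low, high = lift_bounds(k_c=200, N_c=10_000, k_t=2_200, N_t=100_000)
print(f"90% interval for lift: ({low:.1%}, {high:.1%})")
```

Drawing both rates as vectors avoids an explicit loop over the M repetitions; the interval is wide mainly because the control group is small.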
  • 69. Ad Lift: Bayesian Confidence Intervals
      • control: $N_c = 10{,}000$ users, $k_c = 200$ conversions
      • test: $N_t = 100{,}000$ users, $k_t = 2{,}200$ conversions
      Observed response rates:
      • control: $\hat{R}_c = 200/10{,}000 = 2\%$
      • test: $\hat{R}_t = 2{,}200/100{,}000 = 2.2\%$
      Estimated lift: $\hat{L} = 2.2/2 - 1 = 10\%$
      90% confidence interval: $(-2.7\%, 23.6\%)$
  • 75. Ideal Randomized Test
      Bids on control users are wasted!
  • 78. A Less Wasteful Randomized Test: Win-bias
      Cannot simply compare Test Winners (tw) and Control (c):
      • test-winners selection bias: “win bias”
  • 79. Ad Lift: Proper Definition
  • 89. Ad Lift Estimation
      Main ideas:
      • observe test-losers response rate $R_{tL}$
      • observe test win-rate $w$
      • we show one can estimate $R^0_{tw} = \dfrac{R_c - (1 - w) R_{tL}}{w}$
      • compute lift $L = R^1_{tw} / R^0_{tw} - 1$
      • similar to Treatment Effect Under Non-compliance in clinical trials.
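One way to read the estimator above: if randomization makes control and test comparable, the control rate is a mixture of would-be winners and would-be losers, $R_c = w R^0_{tw} + (1 - w) R_{tL}$, and solving for $R^0_{tw}$ gives the formula on the slide. The sketch below just evaluates that arithmetic; the function names and the input numbers are hypothetical and not taken from the talk.

```python
def unexposed_winner_rate(R_c, R_tL, w):
    """Estimate R0_tw = (R_c - (1 - w) * R_tL) / w."""
    if not 0 < w <= 1:
        raise ValueError("win-rate w must be in (0, 1]")
    return (R_c - (1 - w) * R_tL) / w


def win_adjusted_lift(R1_tw, R_c, R_tL, w):
    """Lift L = R1_tw / R0_tw - 1 among test winners."""
    return R1_tw / unexposed_winner_rate(R_c, R_tL, w) - 1


# Hypothetical inputs for illustration only
R_c, R_tL, w, R1_tw = 0.020, 0.015, 0.40, 0.033
R0_tw = unexposed_winner_rate(R_c, R_tL, w)
lift = win_adjusted_lift(R1_tw, R_c, R_tL, w)
print(f"R0_tw = {R0_tw:.4f}, lift = {lift:.1%}")
```

With these inputs, comparing winners directly against control ($0.033 / 0.020 - 1$) would overstate the lift, which is the win bias the adjusted baseline is meant to remove.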
  • 90. Ad Lift Estimation
      How to compute the 90% confidence interval for $L$?
  • 99. Ad Lift: Confidence Intervals with Gibbs sampler
      Bayesian approach (details omitted, see Chickering/Pearl 1997):
      • Assume a random parameter vector $\theta$ consisting of:
          • user latent (potential) behaviors
          • their probabilities
      • Set up prior distribution on $\theta \sim p(\theta)$ (Dirichlet)
      • Sample $M$ values of unknown $\theta$ from posterior (Gibbs sampler): $P(\theta \mid \text{Data}) \propto P(\text{Data} \mid \theta) \cdot p(\theta)$
      • For each sampled $\theta$ compute lift $L$ using the lift definition above
      • Compute (0.05, 0.95) quantiles of sampled $L$ values
  • 101. Ad Lift: Confidence Intervals
      Gibbs sampler convergence may depend on the prior distribution:
      • start with multiple (say 100) priors
      • run them all in parallel using Spark.
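The "many priors in parallel" pattern can be sketched without the full Gibbs sampler. The example below is a deliberate simplification: it swaps in the conjugate Beta model from the earlier slides (not the Chickering/Pearl latent-behavior model), uses a hypothetical grid of 100 Beta(a, b) priors, and runs locally with Python's built-in `map`. The commented line shows how the same function could be handed to Spark, assuming an existing `SparkContext` named `sc`.

```python
import numpy as np


def lift_bounds_for_prior(task):
    """Lift bounds under one Beta(a, b) prior; returns (a, b, low, high)."""
    (a, b), seed = task
    # Example counts from the slides; M is an arbitrary choice for this sketch
    k_c, N_c, k_t, N_t, M = 200, 10_000, 2_200, 100_000, 20_000
    rng = np.random.default_rng(seed)
    r_c = rng.beta(k_c + a, N_c - k_c + b, size=M)
    r_t = rng.beta(k_t + a, N_t - k_t + b, size=M)
    low, high = np.quantile(r_t / r_c - 1.0, [0.05, 0.95])
    return float(a), float(b), float(low), float(high)


# 100 hypothetical priors: a 10 x 10 grid of Beta shape parameters
priors = [(a, b) for a in np.linspace(0.5, 5, 10) for b in np.linspace(0.5, 5, 10)]
tasks = [(prior, seed) for seed, prior in enumerate(priors)]

# Local run. With a SparkContext `sc`, the same work could be distributed as:
#   results = sc.parallelize(tasks).map(lift_bounds_for_prior).collect()
results = list(map(lift_bounds_for_prior, tasks))
lows = [r[2] for r in results]
print(f"lower bound ranges over ({min(lows):.1%}, {max(lows):.1%}) across priors")
```

Each task carries its own seed so the runs are independent and reproducible regardless of which worker executes them.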
  • 107. Uses of Monte Carlo Simulations
      • confidence intervals
      • determine “sufficient” population sizes for reliably estimating
          • response rates
          • lift
      • understand effect of complex phenomena
      • validate/verify analytical formulas
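To make the "sufficient population size" use concrete, here is a small simulation sketch for response rates: it simulates conversion counts for a given population size, computes the 90% Beta-posterior interval for each simulated count, and reports the average interval width. The function names, the 5% relative-width threshold, and the candidate sizes are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import beta


def mean_interval_width(N, true_rate, trials=2_000, seed=0):
    """Average width of the 90% posterior interval over simulated datasets."""
    rng = np.random.default_rng(seed)
    k = rng.binomial(N, true_rate, size=trials)  # simulated conversion counts
    low = beta.ppf(0.05, k + 1, N - k + 1)
    high = beta.ppf(0.95, k + 1, N - k + 1)
    return float(np.mean(high - low))


def smallest_sufficient_N(true_rate, max_rel_width, candidates):
    """First candidate N whose mean width is at most max_rel_width * true_rate."""
    for N in sorted(candidates):
        if mean_interval_width(N, true_rate) <= max_rel_width * true_rate:
            return N
    return None


sizes = [10_000, 100_000, 1_000_000, 10_000_000]
for N in sizes:
    print(N, f"{mean_interval_width(N, 0.02):.4%}")
print("smallest sufficient N:", smallest_sufficient_N(0.02, 0.05, sizes))
```

The same loop structure applies to lift: replace the per-dataset interval with the lift interval and the criterion with whatever precision the advertiser needs.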
  • 108. Complication 2: Control contamination due to users with multiple cookies
  • 110. Control Contamination due to Multiple Cookies
  • 119. Cookie-Contamination Questions
      • How does cookie contamination affect measured lift?
      • Does the cookie distribution matter?
          • everyone has k cookies vs an average of k cookies
      • What is the influence of the control percentage?
      • Simulations are the best way to understand this
  • 120. Simulations for cookie-contamination
      • A scenario is a combination of parameters:
          • M = # trials for this scenario, usually 10K-1M
          • n = # users, typically 10K-10M
          • p = control percentage (usually 10-50%)
          • k = cookie distribution, expressed as 1 : 100, or 1 : 70, 3 : 30
          • r = (uncontaminated) control user response rate
          • a = true lift, i.e. exposed user response rate = $r \cdot (1 + a)$
      • A scenario file specifies a scenario in each row.
          • could be thousands of scenarios
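A single trial of one scenario might look like the sketch below. The contamination model here is an assumption, since the transcript does not spell it out: each cookie is independently assigned to control with probability p, a user counts as exposed if any of their cookies lands in test, exposed users respond at rate r(1 + a), and every cookie inherits its owner's conversion. The cookie distribution is read as "cookies per user : percent of users", so `{1: 70, 3: 30}` stands for "1 : 70, 3 : 30". The function name `simulate_measured_lift` is hypothetical.

```python
import numpy as np


def simulate_measured_lift(n, p, cookie_dist, r, a, rng):
    """One trial: measured lift when users may hold several cookies."""
    counts = np.array(list(cookie_dist.keys()))
    weights = np.array(list(cookie_dist.values()), dtype=float)
    cookies_per_user = rng.choice(counts, size=n, p=weights / weights.sum())
    owner = np.repeat(np.arange(n), cookies_per_user)  # user index of each cookie
    in_control = rng.random(owner.size) < p            # cookie-level assignment
    # Assumption: a user is exposed if at least one of their cookies is in test
    exposed = np.bincount(owner[~in_control], minlength=n) > 0
    converted = rng.random(n) < np.where(exposed, r * (1 + a), r)
    cookie_converted = converted[owner]                # cookies inherit the outcome
    control_rate = cookie_converted[in_control].mean()
    test_rate = cookie_converted[~in_control].mean()
    return test_rate / control_rate - 1


rng = np.random.default_rng(0)
for dist in ({1: 100}, {1: 70, 3: 30}, {3: 100}):
    lift = simulate_measured_lift(n=200_000, p=0.5, cookie_dist=dist,
                                  r=0.05, a=0.5, rng=rng)
    print(dist, f"measured lift = {lift:.1%}")
```

Under these assumptions, control cookies owned by exposed users raise the control response rate, so the measured lift falls below the true lift a as users hold more cookies. Repeating the trial M times per scenario row, with the rows spread across a cluster, is the kind of workload the slides assign to Spark.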