IMPLEMENTING AND ANALYZING
ONLINE EXPERIMENTS
SEAN J. TAYLOR
28 JUL 2015
MULTITHREADED DATA
WHO AM I?
• Core Data Science Team at Facebook
• PhD from NYU in Information Systems
• Four academic papers employing online field
experiments
• Teach and consult on experimental design at
Facebook
http://seanjtaylor.com

http://github.com/seanjtaylor

http://facebook.com/seanjtaylor

@seanjtaylor
I ASSUME YOU KNOW
• Why causality matters
• A little bit of Python and R
• Basic statistics + linear regression
SIMPLEST POSSIBLE
EXPERIMENT
user_id version spent
123 B $10
596 A $0
456 A $4
991 B $9
def get_version(user_id):
    if user_id % 2:
        return 'A'
    else:
        return 'B'
> t.test(c(0, 4), c(10, 9))

	Welch Two Sample t-test

data:  c(0, 4) and c(10, 9)
t = -3.638, df = 1.1245, p-value = 0.1487
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -27.74338  12.74338
sample estimates:
mean of x mean of y
      2.0       9.5
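The t statistic and degrees of freedom reported by t.test can be reproduced by hand. The following is a minimal Python sketch using only the standard library; the helper name `welch_t` is introduced here for illustration and is not part of the talk.

```python
import math
import statistics


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    vx = statistics.variance(x) / len(x)  # squared standard error of mean(x)
    vy = statistics.variance(y) / len(y)  # squared standard error of mean(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Version A spent $0 and $4; version B spent $10 and $9.
t, df = welch_t([0, 4], [10, 9])
print(round(t, 3), round(df, 4))
```

With only two users per arm the degrees of freedom are barely above 1, which is why the confidence interval is so wide.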
FIN
COMMON PROBLEMS
• Type I errors from measuring too many effects
• Type II and M errors from lack of power
• Repeated use of the same population (“pollution”)
• Type I errors from violation of the i.i.d. assumption
• Composing many changes into one experiment
POWER
OR
THE ONLY WAY TO
TRULY FAIL AT AN
EXPERIMENT
OR
THE SIZE OF YOUR
CONFIDENCE
INTERVALS
ERRORS
• Type I: Thinking your
metric changed when it
didn’t. We usually bound
this at 1 or 5%.
• Type II: Thinking your
metric didn’t change
when it did. You can
control this through
better planning.
HOW TO MAKE TYPE I ERRORS
Measure a ton of metrics: $ Spent, Time Spent, Survey Satisfaction
Find a subgroup it works on: Male <25, Female <25, Male >=25, Female >=25
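To see how quickly this recipe produces false positives, here is a small Python sketch of the chance of at least one spurious "significant" result. It assumes the tests are independent, which real metrics and subgroups usually are not, so treat the number as a rough guide; the function name is made up for this example.

```python
def familywise_error_rate(alpha, n_tests):
    """P(at least one false positive) across n_tests independent null tests."""
    return 1 - (1 - alpha) ** n_tests


# 3 metrics x 4 subgroups = 12 looks at the data, each tested at alpha = 0.05
n_tests = 3 * 4
print(round(familywise_error_rate(0.05, n_tests), 3))
```

Under these assumptions, roughly 46% of experiments with no true effect would still show at least one significant metric-subgroup combination.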
AVOID TYPE II ERRORS WITH
POWER
1. Use enough subjects in your experiment.
2. Test a reasonably strong treatment.

Remember: you care about the difference.
POWER ANALYSIS
First step in designing an
experiment is to determine how
much data you’ll need to learn the
answer to your question.
Process:
• set the smallest effect size you’d
like to detect.
• simulate your experiment 200
times at various sample sizes
• count the number of simulated
experiments where you correctly
reject the null of effect=0.
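The process above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the talk's own code: outcomes are drawn from a normal distribution with a guessed standard deviation, significance uses a normal approximation (|z| > 1.96), and the names `is_significant` and `simulated_power` are hypothetical.

```python
import math
import random
import statistics


def is_significant(control, treated, z_crit=1.96):
    """Two-sided test of a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(control) / len(control)
                   + statistics.variance(treated) / len(treated))
    diff = statistics.mean(treated) - statistics.mean(control)
    return abs(diff / se) > z_crit


def simulated_power(effect, n_per_arm, sd=1.0, n_sims=200, seed=0):
    """Fraction of simulated experiments that reject the null of effect = 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        treated = [rng.gauss(effect, sd) for _ in range(n_per_arm)]
        rejections += is_significant(control, treated)
    return rejections / n_sims


# Smallest effect we care about: 0.2 standard deviations.
power_by_n = {n: simulated_power(effect=0.2, n_per_arm=n)
              for n in (50, 100, 200, 400)}
for n, power in power_by_n.items():
    print(n, power)
```

Reading down the output shows roughly how many subjects per arm are needed before most simulated experiments detect the effect.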
TYPE M ERRORS
• Magnitude error:
reporting an effect size
which is too large
• happens when your
experiment is
underpowered AND
you only report the
significant results
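A short simulation makes the magnitude error concrete. The sketch below assumes normally distributed outcomes and a normal-approximation test; `exaggeration_ratio` is a name invented for this example. It keeps only the significant results and compares their average size to the true effect.

```python
import math
import random
import statistics


def exaggeration_ratio(true_effect, n_per_arm, sd=1.0, n_sims=500, seed=1):
    """Mean |estimate| among significant results, divided by the true effect."""
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        treated = [rng.gauss(true_effect, sd) for _ in range(n_per_arm)]
        estimate = statistics.mean(treated) - statistics.mean(control)
        se = math.sqrt(statistics.variance(control) / n_per_arm
                       + statistics.variance(treated) / n_per_arm)
        if abs(estimate / se) > 1.96:  # keep only the results that get reported
            significant.append(abs(estimate))
    return statistics.mean(significant) / true_effect


underpowered = exaggeration_ratio(true_effect=0.2, n_per_arm=20)
well_powered = exaggeration_ratio(true_effect=0.2, n_per_arm=500)
print(underpowered, well_powered)
```

In the underpowered case an estimate has to be several times the true effect just to clear the significance threshold, so the reported effects are inflated; with adequate power the ratio is close to 1.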
IMPLEMENTATION
PLANOUT: KEY IDEAS
• an experiment is just a pseudo-random mapping from
(user, context) → parameters, and is serializable.
• persistent randomizations implemented through hash
functions, salts make experiments orthogonal
• always log exposures (parameters assignment) to
improve precision, provide randomization check
• namespaces create ability to do sequential
experiments on new blocks of users
https://facebook.github.io/planout/
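The idea of persistent, salted randomization can be illustrated with a few lines of Python. This is a sketch of the general technique, not PlanOut's actual implementation; the function `uniform_choice` and the salt strings are assumptions made for the example.

```python
import hashlib


def uniform_choice(choices, unit, salt):
    """Deterministically map (salt, unit) to one of `choices` via a hash."""
    digest = hashlib.sha1(f"{salt}.{unit}".encode("utf-8")).hexdigest()
    return choices[int(digest[:15], 16) % len(choices)]


# Same user and same salt give the same answer on every call,
# so no assignment table needs to be stored.
text = uniform_choice(["Buy now!", "Buy later!"], unit=212, salt="button_text")
# A different salt gives a separate, effectively independent randomization.
color = uniform_choice(["blue", "orange"], unit=212, salt="button_color")
print(text, color)
```

Because the assignment is a pure function of the salt and the unit, any server can recompute it, and changing the salt reshuffles users for the next experiment.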
A/B TESTING IN PLANOUT
from planout.ops.random import *
from planout.experiment import SimpleExperiment

class ButtonCopyExperiment(SimpleExperiment):
    def assign(self, params, user_id):
        # `params` is always the first argument.
        params.button_text = UniformChoice(
            choices=["Buy now!", "Buy later!"],
            unit=user_id
        )

# Later in your production code:
from myexperiments import ButtonCopyExperiment

e = ButtonCopyExperiment(user_id=212)
print(e.get('button_text'))

# Event later:
e = ButtonCopyExperiment(user_id=212)
e.log_event('purchase', {'amount': 9.43})
PLANOUT LOGS → DATA
{"inputs": {"user_id": 212},
 "name": "ButtonCopyExperiment",
 "checksum": "646e69a5", "params":
 {"button_text": "Buy later!"},
 "time": 1437952369, "salt":
 "ButtonCopyExperiment", "event":
 "exposure"}

{"inputs": {"user_id": 212},
 "name": "ButtonCopyExperiment",
 "checksum": "646e69a5", "params":
 {"button_text": "Buy later!"},
 "time": 1437952369, "extra_data":
 {"amount": 9.43}, "salt":
 "ButtonCopyExperiment", "event":
 "purchase"}
  
Exposures
user_id button_text
123 Buy later!
596 Buy later!
456 Buy now!
991 Buy later!

Purchases
user_id amount
123 $12
596 $9
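Turning the JSON log lines into the two tables is mostly a matter of splitting on the "event" field. A minimal Python sketch, using the two log records shown on this slide (the variable names and the choice to treat non-purchasers as spending 0 are assumptions for illustration):

```python
import json

log_lines = [
    '{"inputs": {"user_id": 212}, "name": "ButtonCopyExperiment", '
    '"checksum": "646e69a5", "params": {"button_text": "Buy later!"}, '
    '"time": 1437952369, "salt": "ButtonCopyExperiment", "event": "exposure"}',
    '{"inputs": {"user_id": 212}, "name": "ButtonCopyExperiment", '
    '"checksum": "646e69a5", "params": {"button_text": "Buy later!"}, '
    '"time": 1437952369, "extra_data": {"amount": 9.43}, '
    '"salt": "ButtonCopyExperiment", "event": "purchase"}',
]

exposures, purchases = {}, []
for line in log_lines:
    record = json.loads(line)
    user_id = record["inputs"]["user_id"]
    if record["event"] == "exposure":
        # One row per user is enough: the assignment is persistent.
        exposures[user_id] = record["params"]["button_text"]
    elif record["event"] == "purchase":
        purchases.append((user_id, record["extra_data"]["amount"]))

# Join purchases onto exposures; exposed users with no purchase spent 0.
spent = {user_id: 0.0 for user_id in exposures}
for user_id, amount in purchases:
    if user_id in spent:
        spent[user_id] += amount
print(exposures, spent)
```

Starting from the exposure table keeps users who saw the button but never purchased, which a purchases-only table would silently drop.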
ADVANCED DESIGN 1:
FACTORIAL DESIGN
• Can use conditional logic as well as other random
assignment operators: 

RandomInteger, RandomFloat, WeightedChoice, Sample.
class FactorialExperiment(SimpleExperiment):
    def assign(self, params, user_id):
        params.button_text = UniformChoice(
            choices=["Buy now!", "Buy later!"],
            unit=user_id
        )
        params.button_color = UniformChoice(
            choices=["blue", "orange"],
            unit=user_id
        )
ADVANCED DESIGN 2:
INCREMENTAL CHANGES
## We're going to try two different button redesigns.
from planout.experiment import SimpleExperiment
from planout.namespace import SimpleNamespace

class FirstExperiment(SimpleExperiment):
    def assign(self, params, user_id):
        # ... set some params
        pass

class SecondExperiment(SimpleExperiment):
    def assign(self, params, user_id):
        # ... set some params differently
        pass

class ButtonNamespace(SimpleNamespace):
    def setup(self):
        self.name = 'button_experiment_sequence'
        self.primary_unit = 'user_id'
        self.num_segments = 1000

    def setup_experiments(self):
        # allocate and deallocate experiments here
        # First gets 100 out of 1000 segments.
        self.add_experiment('first', FirstExperiment, 100)
        self.add_experiment('second', SecondExperiment, 100)
ADVANCED DESIGN 3:
WITHIN-SUBJECTS
Previous experiments persistently assigned same
treatment to user, but unit of analysis can be more
complex:
class DiscountExperiment(SimpleExperiment):
    def assign(self, params, user_id, item_id):
        params.discount = BernoulliTrial(p=0.1, unit=[user_id, item_id])
        if params.discount:
            params.discount_amount = RandomInteger(
                min=5, max=15, unit=user_id
            )
        else:
            params.discount_amount = 0

e = DiscountExperiment(user_id=212, item_id=2)
print(e.get('discount_amount'))
ANALYSIS
THE IDEAL DATA SET
Subject / User  Gender  Age  Button Size  Button Text  Spent  Bounce
Erin            F       22   Large        Buy Now!     $20    0
Ashley          F       29   Large        Buy Later!   $4     0
Gary            M       34   Small        Buy Now!     $0     1
Leo             M       18   Large        Buy Now!     $0     1
Ed              M       46   Small        Buy Later!   $9     0
Sam             M       25   Small        Buy Now!     $5     0

Rows: Independent Observations
Gender, Age: Pre-experiment Covariates
Button Size, Button Text: Randomly Assigned
Spent, Bounce: Metrics
SIMPLEST CASE: OLS
> summary(lm(spent ~ button.size, data = df))

Call:
lm(formula = spent ~ button.size, data = df)

Residuals:
    1     2     3     4     5     6
 10.0  -0.5  -4.5 -10.0   4.5   0.5

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)            10.000      5.489   1.822    0.143
factor(button.size)s   -5.500      6.722  -0.818    0.459

Residual standard error: 7.762 on 4 degrees of freedom
Multiple R-squared:  0.1434,  Adjusted R-squared:  -0.07079
F-statistic: 0.6694 on 1 and 4 DF,  p-value: 0.4592
DATA REDUCTION
Subject Xi Di Yi
Evan M 0 1
Ashley F 0 1
Greg M 1 0
Leena F 1 0
Ema F 0 0
Seamus M 1 1
X D Y Cases
M 0 1 1
M 1 1 1
F 0 1 1
F 1 1 0
M 0 0 0
M 1 0 1
F 0 0 1
F 1 0 1
N rows → (# treatments × # groups × # outcomes) rows
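The reduction in the tables above can be sketched in Python: collapse the subject-level rows to one row per distinct (X, D, Y) combination with a count, then use the counts as weights. The helper name `weighted_mean_outcome` is made up for this example; the data are the six subjects from the slide.

```python
from collections import Counter

# (subject, X, D, Y) rows from the subject-level table
rows = [
    ("Evan", "M", 0, 1), ("Ashley", "F", 0, 1), ("Greg", "M", 1, 0),
    ("Leena", "F", 1, 0), ("Ema", "F", 0, 0), ("Seamus", "M", 1, 1),
]

# Collapse to one row per distinct (X, D, Y) combination, with a count.
cases = Counter((x, d, y) for _, x, d, y in rows)


def weighted_mean_outcome(cases, treatment):
    """Mean of Y among units with D == treatment, using counts as weights."""
    total = sum(n * y for (_, d, y), n in cases.items() if d == treatment)
    weight = sum(n for (_, d, _), n in cases.items() if d == treatment)
    return total / weight


effect = weighted_mean_outcome(cases, 1) - weighted_mean_outcome(cases, 0)
print(dict(cases), effect)
```

The weighted difference in means matches what the unreduced rows would give, but the reduced table never has more rows than the number of distinct combinations, however many subjects there are.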
source('css_stats.R')
reduced <- df %>%
  mutate(rounded.spent = round(spent, 0)) %>%
  group_by(button.size, rounded.spent) %>%
  summarise(n = n())

> lm(rounded.spent ~ button.size, data = reduced, weights = n) %>%
+   coeftest(vcov = sandwich.lm)

t test of coefficients:

             Estimate Std. Error t value  Pr(>|t|)
(Intercept)   7.43137    0.45162 16.4548 7.522e-14 ***
button.sizes -2.45178    0.59032 -4.1533 0.0004149 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
DATA REDUCTION +
WEIGHTED OLS
FACTORIAL DESIGNS
• Identify two types of effects: marginal and
interactions. Need to fix one group as the baseline.
> coeftest(lm(spent ~ button.size * button.text, data = df))

t test of coefficients:

                          Estimate Std. Error t value  Pr(>|t|)
(Intercept)                6.79643    0.62998 10.7884 < 2.2e-16 ***
button.sizes              -2.43253    0.86673 -2.8066  0.006064 **
button.textn               2.11611    0.86673  2.4415  0.016458 *
button.sizes:button.textn -2.57660    1.27584 -2.0195  0.046219 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
USING COVARIATES TO GAIN
PRECISION
• With simple random assignment, using covariates
is not necessary.
• However, you can improve precision of ATE
estimates if covariates explain a lot of variation in
the potential outcomes.
• Can be added to a linear model and SEs should
get smaller if they are helpful.
NON-IID DATA
• Repeated observations of
the same user are not
independent.
• Ditto if you're
experimenting on certain
items only.
• If you ignore dependent
data, the true confidence
intervals are larger than
you think.
Subject / User Item Button Size Spent
Erin Shirt Large $20
Erin Socks Large $4
Erin Pants Large $0
Leo Shirt Large $0
Ed Shirt Small $9
Ed Socks Small $5
THE BOOTSTRAP
All Your Data
1. Generate random sub-samples: R1, R2, …, R500
2. Compute statistics or estimate model parameters: s1, s2, …, s500
3. Get a distribution over statistic of interest (e.g. the treatment effect)
[Histogram: Statistic (x-axis) vs. Count (y-axis)]
- take mean
- CIs == 95% quantiles
- SEs == standard deviation
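The bootstrap recipe can be sketched in plain Python. The data below are simulated purely for illustration, each arm is resampled with replacement separately (a choice made here to keep arm sizes fixed), and `bootstrap_effect` is a hypothetical helper rather than a function from the talk.

```python
import random
import statistics


def bootstrap_effect(control, treated, n_replicates=500, seed=0):
    """Bootstrap the difference in means; returns (mean, (lo, hi), se)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_replicates):
        # Resample each arm with replacement, keeping the original sizes.
        c = rng.choices(control, k=len(control))
        t = rng.choices(treated, k=len(treated))
        stats.append(statistics.mean(t) - statistics.mean(c))
    stats.sort()
    lo = stats[int(0.025 * n_replicates)]      # 2.5% quantile
    hi = stats[int(0.975 * n_replicates) - 1]  # 97.5% quantile
    return statistics.mean(stats), (lo, hi), statistics.stdev(stats)


# Simulated outcomes (made up for this example): true effect of 1.0
data_rng = random.Random(42)
control = [data_rng.gauss(5.0, 1.0) for _ in range(200)]
treated = [data_rng.gauss(6.0, 1.0) for _ in range(200)]
mean, (lo, hi), se = bootstrap_effect(control, treated)
print(mean, lo, hi, se)
```

The same loop works for any statistic: swap the difference in means for a regression coefficient and the quantiles and standard deviation of the replicates still give the interval and standard error.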
USER AND USER-ITEM
BOOTSTRAPS
source('css_stats.R')
library(broom) ## for extracting model coefficients

fitter <- function(.data) {
    lm(summary ~ opposed, data = .data, weights = .weights) %>%
    tidy
}

iid.replicates    <- iid.bootstrap(df, fitter, .combine = bind_rows)
oneway.replicates <- clustered.bootstrap(df, c('user_id'), fitter, .combine = bind_rows)
twoway.replicates <- clustered.bootstrap(df, c('user_id', 'item_id'), fitter, .combine = bind_rows)

> head(iid.replicates)
          term   estimate  std.error statistic      p.value
1  (Intercept)  0.4700000 0.04795154  9.801561 6.296919e-18
2 button.sizes -0.2200000 0.08003333 -2.748855 6.695621e-03
3  (Intercept)  0.4250000 0.05307832  8.007036 5.768641e-13
4 button.sizes -0.1750000 0.08456729 -2.069358 4.049329e-02
5  (Intercept)  0.4137931 0.05141050  8.048805 4.118301e-13
6 button.sizes -0.1429598 0.08621804 -1.658119 9.965016e-02
DEPENDENCE CHANGES
CONFIDENCE INTERVALS
DATA REDUCTION WITH
DEPENDENT DATA
Subject Di Yij
Evan 1 1
Evan 1 0
Ashley 0 1
Ashley 0 1
Ashley 0 1
Greg 1 0
Leena 1 0
Leena 1 1
Ema 0 0
Seamus 1 1
1. Create bootstrap replicates: R1, R2, R3
2. Reduce the replicates as if they’re i.i.d.: r1, r2, r3
3. Compute statistics on reduced data: s1, s2, s3
THANKS!
HERE ARE SOME RESOURCES:
• Me: http://seanjtaylor.com
• These slides: http://www.slideshare.net/seanjtaylor/implementing-and-analyzing-online-experiments
• Full Featured Tutorial: http://eytan.github.io/www-15-tutorial/
• “Field Experiments” by Gerber and Green

More Related Content

What's hot

Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningPruet Boonma
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learningTamir Taha
 
Azure ML - November 2014
Azure ML - November 2014 Azure ML - November 2014
Azure ML - November 2014 David Green
 
H2O World - Ensembles with Erin LeDell
H2O World - Ensembles with Erin LeDellH2O World - Ensembles with Erin LeDell
H2O World - Ensembles with Erin LeDellSri Ambati
 
Building Random Forest at Scale
Building Random Forest at ScaleBuilding Random Forest at Scale
Building Random Forest at ScaleSri Ambati
 
Hadoop and Machine Learning
Hadoop and Machine LearningHadoop and Machine Learning
Hadoop and Machine Learningjoshwills
 
Introduction to Machine learning
Introduction to Machine learningIntroduction to Machine learning
Introduction to Machine learningKnoldus Inc.
 
An Introduction to Simulation in the Social Sciences
An Introduction to Simulation in the Social SciencesAn Introduction to Simulation in the Social Sciences
An Introduction to Simulation in the Social Sciencesfsmart01
 
A Scalable, High-performance Algorithm for Hybrid Job Recommendations
A Scalable, High-performance Algorithm for Hybrid Job RecommendationsA Scalable, High-performance Algorithm for Hybrid Job Recommendations
A Scalable, High-performance Algorithm for Hybrid Job RecommendationsToon De Pessemier
 
Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013MLconf
 
Machine Learning With R
Machine Learning With RMachine Learning With R
Machine Learning With RDavid Chiu
 
Workshop - Introduction to Machine Learning with R
Workshop - Introduction to Machine Learning with RWorkshop - Introduction to Machine Learning with R
Workshop - Introduction to Machine Learning with RShirin Elsinghorst
 
Machine Learning: Understanding the Invisible Force Changing Our World
Machine Learning: Understanding the Invisible Force Changing Our WorldMachine Learning: Understanding the Invisible Force Changing Our World
Machine Learning: Understanding the Invisible Force Changing Our WorldKen Tabor
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupDoug Needham
 
Simple math for anomaly detection toufic boubez - metafor software - monito...
Simple math for anomaly detection   toufic boubez - metafor software - monito...Simple math for anomaly detection   toufic boubez - metafor software - monito...
Simple math for anomaly detection toufic boubez - metafor software - monito...tboubez
 

What's hot (20)

Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
Azure ML - November 2014
Azure ML - November 2014 Azure ML - November 2014
Azure ML - November 2014
 
H2O World - Ensembles with Erin LeDell
H2O World - Ensembles with Erin LeDellH2O World - Ensembles with Erin LeDell
H2O World - Ensembles with Erin LeDell
 
Building Random Forest at Scale
Building Random Forest at ScaleBuilding Random Forest at Scale
Building Random Forest at Scale
 
Hadoop and Machine Learning
Hadoop and Machine LearningHadoop and Machine Learning
Hadoop and Machine Learning
 
Introduction to Machine learning
Introduction to Machine learningIntroduction to Machine learning
Introduction to Machine learning
 
An Introduction to Simulation in the Social Sciences
An Introduction to Simulation in the Social SciencesAn Introduction to Simulation in the Social Sciences
An Introduction to Simulation in the Social Sciences
 
A Scalable, High-performance Algorithm for Hybrid Job Recommendations
A Scalable, High-performance Algorithm for Hybrid Job RecommendationsA Scalable, High-performance Algorithm for Hybrid Job Recommendations
A Scalable, High-performance Algorithm for Hybrid Job Recommendations
 
Machine learning yearning
Machine learning yearningMachine learning yearning
Machine learning yearning
 
Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013Ted Willke, Intel Labs MLconf 2013
Ted Willke, Intel Labs MLconf 2013
 
Machine Learning With R
Machine Learning With RMachine Learning With R
Machine Learning With R
 
Workshop - Introduction to Machine Learning with R
Workshop - Introduction to Machine Learning with RWorkshop - Introduction to Machine Learning with R
Workshop - Introduction to Machine Learning with R
 
presentationIDC - 14MAY2015
presentationIDC - 14MAY2015presentationIDC - 14MAY2015
presentationIDC - 14MAY2015
 
Machine Learning: Understanding the Invisible Force Changing Our World
Machine Learning: Understanding the Invisible Force Changing Our WorldMachine Learning: Understanding the Invisible Force Changing Our World
Machine Learning: Understanding the Invisible Force Changing Our World
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup Group
 
Simple math for anomaly detection toufic boubez - metafor software - monito...
Simple math for anomaly detection   toufic boubez - metafor software - monito...Simple math for anomaly detection   toufic boubez - metafor software - monito...
Simple math for anomaly detection toufic boubez - metafor software - monito...
 
Machine Learning for Dummies
Machine Learning for DummiesMachine Learning for Dummies
Machine Learning for Dummies
 
DCSM report2
DCSM report2DCSM report2
DCSM report2
 

Viewers also liked

Learning About At-Risk Veterans Using 
Online Network Surveys
Learning About At-Risk Veterans Using 
Online Network SurveysLearning About At-Risk Veterans Using 
Online Network Surveys
Learning About At-Risk Veterans Using 
Online Network SurveysSean Taylor
 
Putting the Magic in Data Science
Putting the Magic in Data SciencePutting the Magic in Data Science
Putting the Magic in Data ScienceSean Taylor
 
Big Data in Banking (Data Science Thailand Meetup #2)
Big Data in Banking (Data Science Thailand Meetup #2)Big Data in Banking (Data Science Thailand Meetup #2)
Big Data in Banking (Data Science Thailand Meetup #2)Data Science Thailand
 
SQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialSQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialDaniel Abadi
 
Politica Actual Colombiana
Politica Actual ColombianaPolitica Actual Colombiana
Politica Actual ColombianaDaniel G.
 

Viewers also liked (7)

Learning About At-Risk Veterans Using 
Online Network Surveys
Learning About At-Risk Veterans Using 
Online Network SurveysLearning About At-Risk Veterans Using 
Online Network Surveys
Learning About At-Risk Veterans Using 
Online Network Surveys
 
Jsm big-data
Jsm big-dataJsm big-data
Jsm big-data
 
Putting the Magic in Data Science
Putting the Magic in Data SciencePutting the Magic in Data Science
Putting the Magic in Data Science
 
Data Science 101
Data Science 101Data Science 101
Data Science 101
 
Big Data in Banking (Data Science Thailand Meetup #2)
Big Data in Banking (Data Science Thailand Meetup #2)Big Data in Banking (Data Science Thailand Meetup #2)
Big Data in Banking (Data Science Thailand Meetup #2)
 
SQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialSQL-on-Hadoop Tutorial
SQL-on-Hadoop Tutorial
 
Politica Actual Colombiana
Politica Actual ColombianaPolitica Actual Colombiana
Politica Actual Colombiana
 

Similar to Implementing and analyzing online experiments

Agile experiments in Machine Learning with F#
Agile experiments in Machine Learning with F#Agile experiments in Machine Learning with F#
Agile experiments in Machine Learning with F#J On The Beach
 
MT_01_unittest_python.pdf
MT_01_unittest_python.pdfMT_01_unittest_python.pdf
MT_01_unittest_python.pdfHans Jones
 
Learning Predictive Modeling with TSA and Kaggle
Learning Predictive Modeling with TSA and KaggleLearning Predictive Modeling with TSA and Kaggle
Learning Predictive Modeling with TSA and KaggleYvonne K. Matos
 
Agile Experiments in Machine Learning
Agile Experiments in Machine LearningAgile Experiments in Machine Learning
Agile Experiments in Machine Learningmathias-brandewinder
 
Insurance Optimization
Insurance OptimizationInsurance Optimization
Insurance OptimizationAlbert Chu
 
Machine learning and_nlp
Machine learning and_nlpMachine learning and_nlp
Machine learning and_nlpankit_ppt
 
Usability testing
Usability testingUsability testing
Usability testinggamelanYK
 
Solution manual for design and analysis of experiments 9th edition douglas ...
Solution manual for design and analysis of experiments 9th edition   douglas ...Solution manual for design and analysis of experiments 9th edition   douglas ...
Solution manual for design and analysis of experiments 9th edition douglas ...Salehkhanovic
 
Simple rules for building robust machine learning models
Simple rules for building robust machine learning modelsSimple rules for building robust machine learning models
Simple rules for building robust machine learning modelsKyriakos Chatzidimitriou
 
Machine learning in php
Machine learning in phpMachine learning in php
Machine learning in phpDamien Seguy
 
Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments wit...
Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments wit...Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments wit...
Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments wit...Databricks
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Gabriel Moreira
 
Streaming Algorithms
Streaming AlgorithmsStreaming Algorithms
Streaming AlgorithmsJoe Kelley
 
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...PyData
 
Artificial Neural Networks Workshop
Artificial Neural Networks WorkshopArtificial Neural Networks Workshop
Artificial Neural Networks WorkshopYakup Görür
 
04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptxShree Shree
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnDataRobot
 

Similar to Implementing and analyzing online experiments (20)

wk5ppt2_Iris
wk5ppt2_Iriswk5ppt2_Iris
wk5ppt2_Iris
 
Agile experiments in Machine Learning with F#
Agile experiments in Machine Learning with F#Agile experiments in Machine Learning with F#
Agile experiments in Machine Learning with F#
 
MT_01_unittest_python.pdf
MT_01_unittest_python.pdfMT_01_unittest_python.pdf
MT_01_unittest_python.pdf
 
Learning Predictive Modeling with TSA and Kaggle
Learning Predictive Modeling with TSA and KaggleLearning Predictive Modeling with TSA and Kaggle
Learning Predictive Modeling with TSA and Kaggle
 
Agile Experiments in Machine Learning
Agile Experiments in Machine LearningAgile Experiments in Machine Learning
Agile Experiments in Machine Learning
 
Insurance Optimization
Insurance OptimizationInsurance Optimization
Insurance Optimization
 
Machine learning and_nlp
Machine learning and_nlpMachine learning and_nlp
Machine learning and_nlp
 
Usability testing
Usability testingUsability testing
Usability testing
 
Solution manual for design and analysis of experiments 9th edition douglas ...
Solution manual for design and analysis of experiments 9th edition   douglas ...Solution manual for design and analysis of experiments 9th edition   douglas ...
Solution manual for design and analysis of experiments 9th edition douglas ...
 
Simple rules for building robust machine learning models
Simple rules for building robust machine learning modelsSimple rules for building robust machine learning models
Simple rules for building robust machine learning models
 
Machine learning in php
Machine learning in phpMachine learning in php
Machine learning in php
 
Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments wit...
Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments wit...Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments wit...
Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments wit...
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017
 
Streaming Algorithms
Streaming AlgorithmsStreaming Algorithms
Streaming Algorithms
 
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
 
Artificial Neural Networks Workshop
Artificial Neural Networks WorkshopArtificial Neural Networks Workshop
Artificial Neural Networks Workshop
 
04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learn
 
Tugas quiz SPSS
Tugas quiz SPSSTugas quiz SPSS
Tugas quiz SPSS
 
CSL0777-L07.pptx
CSL0777-L07.pptxCSL0777-L07.pptx
CSL0777-L07.pptx
 

Recently uploaded

专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Implementing and analyzing online experiments

  • 1. IMPLEMENTING AND ANALYZING ONLINE EXPERIMENTS SEAN J. TAYLOR 28 JUL 2015 MULTITHREADED DATA
  • 2. WHO AM I?
    • Core Data Science Team at Facebook
    • PhD from NYU in Information Systems
    • Four academic papers employing online field experiments
    • Teach and consult on experimental design at Facebook
    http://seanjtaylor.com
    http://github.com/seanjtaylor
    http://facebook.com/seanjtaylor
    @seanjtaylor
  • 3. I ASSUME YOU KNOW • Why causality matters • A little bit of Python and R • Basic statistics + linear regression
  • 4. SIMPLEST POSSIBLE EXPERIMENT

    user_id  version  spent
    123      B        $10
    596      A        $0
    456      A        $4
    991      B        $9

    def get_version(user_id):
        # Even user IDs see version A, odd user IDs see version B.
        if user_id % 2 == 0:
            return 'A'
        else:
            return 'B'

    > t.test(c(0, 4), c(10, 9))

        Welch Two Sample t-test

    data:  c(0, 4) and c(10, 9)
    t = -3.638, df = 1.1245, p-value = 0.1487
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -27.74338  12.74338
    sample estimates:
    mean of x mean of y
          2.0       9.5
  • 5. FIN
  • 6. COMMON PROBLEMS • Type I errors from measuring too many effects • Type II and M errors from lack of power • Repeated use of the same population (“pollution”) • Type I errors from violation of the i.i.d. assumption • Composing many changes into one experiment
  • 7. POWER OR THE ONLY WAY TO TRULY FAIL AT AN EXPERIMENT OR THE SIZE OF YOUR CONFIDENCE INTERVALS
  • 8. ERRORS • Type I: Thinking your metric changed when it didn’t. We usually bound this at 1 or 5%. • Type II: Thinking your metric didn’t change when it did. You can control this through better planning.
  • 9. HOW TO MAKE TYPE I ERRORS • Measure a ton of metrics: $ Spent, Time Spent, Survey Satisfaction • Find a subgroup it works on: Male <25, Female <25, Male >=25, Female >=25
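A quick simulation makes this slide concrete. The sketch below is not from the talk: it assumes an A/A test (no true effect anywhere), treats the 3 metrics × 4 subgroups as 12 independent comparisons, and uses a two-sided z-test at the 5% level. All function names are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance

def p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def any_false_positive(rng, n_comparisons, n_per_arm=50, alpha=0.05):
    """Run one A/A experiment; report whether any comparison looks significant."""
    for _ in range(n_comparisons):
        a = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        b = [rng.gauss(0, 1) for _ in range(n_per_arm)]  # same distribution: no effect
        if p_value(a, b) < alpha:
            return True
    return False

def false_positive_rate(n_comparisons, n_sims=500, seed=0):
    """Share of simulated A/A experiments with at least one 'significant' result."""
    rng = random.Random(seed)
    hits = sum(any_false_positive(rng, n_comparisons) for _ in range(n_sims))
    return hits / n_sims

one = false_positive_rate(1)    # a single pre-specified metric
many = false_positive_rate(12)  # 3 metrics x 4 subgroups
print(f"1 comparison: {one:.2f}, 12 comparisons: {many:.2f}")
```

With independent comparisons the chance of at least one false positive is 1 − 0.95^12, roughly 0.46, so the simulated rate for 12 comparisons should land far above the nominal 5%.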
  • 10. AVOID TYPE II ERRORS WITH POWER 1. Use enough subjects in your experiment. 2. Test a reasonably strong treatment.
 Remember: you care about the difference.
  • 11. POWER ANALYSIS First step in designing an experiment is to determine how much data you’ll need to learn the answer to your question. Process: • set the smallest effect size you’d like to detect. • simulate your experiment 200 times at various sample sizes • count the number of simulated experiments where you correctly reject the null of effect=0.
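The process on this slide can be sketched in a few lines. This is a minimal illustration under assumptions that are not in the talk: outcomes are normal with standard deviation 1, the smallest effect of interest is 0.2 standard deviations, and the test is a two-sided z-test at the 5% level. The helper names `rejects_null` and `power` are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance

def rejects_null(rng, n_per_arm, effect, sd=1.0, alpha=0.05):
    """Simulate one experiment and test the null hypothesis of effect = 0."""
    control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
    treated = [rng.gauss(effect, sd) for _ in range(n_per_arm)]
    se = (variance(control) / n_per_arm + variance(treated) / n_per_arm) ** 0.5
    z = (mean(treated) - mean(control)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p < alpha

def power(n_per_arm, effect, n_sims=200, seed=0):
    """Share of simulated experiments that correctly reject the null."""
    rng = random.Random(seed)
    return sum(rejects_null(rng, n_per_arm, effect) for _ in range(n_sims)) / n_sims

smallest_effect = 0.2  # smallest effect worth detecting, in standard deviations
power_by_n = {n: power(n, smallest_effect) for n in (50, 200, 800)}
for n, pw in power_by_n.items():
    print(f"n per arm = {n:4d}  power = {pw:.2f}")
```

Reading the output from top to bottom shows how quickly power grows with sample size; the smallest sample size that clears the power you want (80% is a common target) is the one to plan for.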
  • 12. TYPE M ERRORS • Magnitude error: reporting an effect size which is too large • happens when your experiment is underpowered AND you only report the significant results
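The magnitude error is easy to reproduce by simulation. The sketch below assumes a small true effect (0.1 standard deviations), an underpowered design (50 subjects per arm), and a two-sided z-test; none of these numbers come from the talk, and `estimate_and_p` is a hypothetical helper.

```python
import random
from statistics import NormalDist, mean, variance

def estimate_and_p(rng, n_per_arm, effect):
    """Return (estimated effect, two-sided p-value) for one simulated experiment."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    treated = [rng.gauss(effect, 1.0) for _ in range(n_per_arm)]
    diff = mean(treated) - mean(control)
    se = (variance(control) / n_per_arm + variance(treated) / n_per_arm) ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return diff, p

rng = random.Random(1)
true_effect = 0.1
results = [estimate_and_p(rng, n_per_arm=50, effect=true_effect) for _ in range(2000)]

all_estimates = [d for d, _ in results]
significant = [d for d, p in results if p < 0.05]  # what gets reported

print(f"true effect:                 {true_effect:.2f}")
print(f"mean of all estimates:       {mean(all_estimates):.2f}")
print(f"mean |estimate| if p < 0.05: {mean(abs(d) for d in significant):.2f}")
```

Averaged over every run the estimate is unbiased, but the runs that pass the significance filter overstate the effect several times over, because only unusually large estimates can clear the threshold at this sample size.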
  • 14. PLANOUT: KEY IDEAS • an experiment is just a pseudo-random mapping from (user, context) → parameters, and is serializable. • persistent randomizations are implemented through hash functions; salts make experiments orthogonal • always log exposures (parameter assignments) to improve precision and provide a randomization check • namespaces create the ability to run sequential experiments on new blocks of users https://facebook.github.io/planout/
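The hashing idea can be illustrated without PlanOut itself. The sketch below is not PlanOut's implementation and its exact hashing scheme may differ; it only assumes that assignment is a deterministic hash of a salt plus the unit ID. The function `assign` and the salt names are hypothetical.

```python
import hashlib

def assign(unit_id, salt, choices):
    """Deterministically map a unit to one of `choices` by hashing salt + unit."""
    digest = hashlib.sha1(f"{salt}.{unit_id}".encode("utf-8")).hexdigest()
    return choices[int(digest[:15], 16) % len(choices)]

users = range(10_000)
text = {u: assign(u, "button_text", ["Buy now!", "Buy later!"]) for u in users}
color = {u: assign(u, "button_color", ["blue", "orange"]) for u in users}

# Persistent: the same user always gets the same value, with nothing stored.
assert assign(212, "button_text", ["Buy now!", "Buy later!"]) == text[212]

# Orthogonal: different salts give (approximately) independent assignments,
# so each of the four text/color cells holds about a quarter of the users.
cells = {}
for u in users:
    cells[(text[u], color[u])] = cells.get((text[u], color[u]), 0) + 1
for cell, count in sorted(cells.items()):
    print(cell, count)
```

Because assignment is a pure function of the salt and the unit, any server can recompute it on demand, and reusing a salt across two parameters would make their assignments perfectly correlated rather than orthogonal.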
  • 15. A/B TESTING IN PLANOUT

    from planout.ops.random import *
    from planout.experiment import SimpleExperiment

    class ButtonCopyExperiment(SimpleExperiment):
        def assign(self, params, user_id):
            # `params` is always the first argument.
            params.button_text = UniformChoice(
                choices=["Buy now!", "Buy later!"],
                unit=user_id
            )

    # Later in your production code:
    from myexperiments import ButtonCopyExperiment

    e = ButtonCopyExperiment(user_id=212)
    print(e.get('button_text'))

    # Even later:
    e = ButtonCopyExperiment(user_id=212)
    e.log_event('purchase', {'amount': 9.43})
  • 16. PLANOUT LOGS → DATA

    {"inputs": {"user_id": 212},
     "name": "ButtonCopyExperiment",
     "checksum": "646e69a5", "params":
     {"button_text": "Buy later!"},
     "time": 1437952369, "salt":
     "ButtonCopyExperiment", "event":
     "exposure"}

    {"inputs": {"user_id": 212},
     "name": "ButtonCopyExperiment",
     "checksum": "646e69a5", "params":
     {"button_text": "Buy later!"},
     "time": 1437952369, "extra_data":
     {"amount": 9.43}, "salt":
     "ButtonCopyExperiment", "event":
     "purchase"}

    Exposures
    user_id  button_text
    123      Buy later!
    596      Buy later!
    456      Buy now!
    991      Buy later!

    Purchases
    user_id  amount
    123      $12
    596      $9
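Turning log lines into the two tables is a small parsing job. The sketch below reuses the two log records shown on the slide and relies only on the fields visible there; the join rule (exposed users with no purchase count as spending 0) is an assumption about how the analysis table would be built, not something the slide states.

```python
import json

log_lines = [
    '{"inputs": {"user_id": 212}, "name": "ButtonCopyExperiment", '
    '"checksum": "646e69a5", "params": {"button_text": "Buy later!"}, '
    '"time": 1437952369, "salt": "ButtonCopyExperiment", "event": "exposure"}',
    '{"inputs": {"user_id": 212}, "name": "ButtonCopyExperiment", '
    '"checksum": "646e69a5", "params": {"button_text": "Buy later!"}, '
    '"time": 1437952369, "extra_data": {"amount": 9.43}, '
    '"salt": "ButtonCopyExperiment", "event": "purchase"}',
]

exposures = {}   # user_id -> assigned button_text
purchases = []   # (user_id, amount) rows

for line in log_lines:
    record = json.loads(line)
    user_id = record["inputs"]["user_id"]
    if record["event"] == "exposure":
        exposures[user_id] = record["params"]["button_text"]
    elif record["event"] == "purchase":
        purchases.append((user_id, record["extra_data"]["amount"]))

# Join: total spend per exposed user; exposed users with no purchase spent 0.
spent = {user_id: 0.0 for user_id in exposures}
for user_id, amount in purchases:
    if user_id in spent:
        spent[user_id] += amount

for user_id, button_text in exposures.items():
    print(user_id, button_text, spent[user_id])
```

Starting from the exposure table keeps non-purchasers in the analysis with an outcome of 0 and drops purchases from users who were never exposed.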
  • 17. ADVANCED DESIGN 1: FACTORIAL DESIGN
    • Can use conditional logic as well as other random assignment operators: RandomInteger, RandomFloat, WeightedChoice, Sample.

    class FactorialExperiment(SimpleExperiment):
        def assign(self, params, user_id):
            params.button_text = UniformChoice(
                choices=["Buy now!", "Buy later!"],
                unit=user_id
            )
            params.button_color = UniformChoice(
                choices=["blue", "orange"],
                unit=user_id
            )
  • 18. ADVANCED DESIGN 2: INCREMENTAL CHANGES

    ## We're going to try two different button redesigns.
    from planout.experiment import SimpleExperiment
    from planout.namespace import SimpleNamespace

    class FirstExperiment(SimpleExperiment):
        def assign(self, params, user_id):
            # ... set some params
            pass

    class SecondExperiment(SimpleExperiment):
        def assign(self, params, user_id):
            # ... set some params differently
            pass

    class ButtonNamespace(SimpleNamespace):
        def setup(self):
            self.name = 'button_experiment_sequence'
            self.primary_unit = 'user_id'
            self.num_segments = 1000

        def setup_experiments(self):
            # allocate and deallocate experiments here
            # First gets 100 out of 1000 segments.
            self.add_experiment('first', FirstExperiment, 100)
            self.add_experiment('second', SecondExperiment, 100)
  • 19. ADVANCED DESIGN 3: WITHIN-SUBJECTS
    The previous experiments persistently assigned the same treatment to each user, but the unit of analysis can be more complex:

    class DiscountExperiment(SimpleExperiment):
        def assign(self, params, user_id, item_id):
            # Randomize at the (user, item) level: 10% of pairs get a discount.
            params.discount = BernoulliTrial(p=0.1, unit=[user_id, item_id])
            if params.discount:
                # The discount size is randomized per user.
                params.discount_amount = RandomInteger(
                    min=5, max=15, unit=user_id
                )
            else:
                params.discount_amount = 0

    e = DiscountExperiment(user_id=212, item_id=2)
    print(e.get('discount_amount'))
  • 21. THE IDEAL DATA SET

    Subject / User  Gender  Age  Button Size  Button Text  Spent  Bounce
    Erin            F       22   Large        Buy Now!     $20    0
    Ashley          F       29   Large        Buy Later!   $4     0
    Gary            M       34   Small        Buy Now!     $0     1
    Leo             M       18   Large        Buy Now!     $0     1
    Ed              M       46   Small        Buy Later!   $9     0
    Sam             M       25   Small        Buy Now!     $5     0

    Rows: Independent Observations
    Gender, Age: Pre-experiment Covariates
    Button Size, Button Text: Randomly Assigned
    Spent, Bounce: Metrics
  • 22. SIMPLEST CASE: OLS

    > summary(lm(spent ~ button.size, data = df))

    Call:
    lm(formula = spent ~ button.size, data = df)

    Residuals:
        1     2     3     4     5     6
     10.0  -0.5  -4.5 -10.0   4.5   0.5

    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
    (Intercept)            10.000      5.489   1.822    0.143
    factor(button.size)s   -5.500      6.722  -0.818    0.459

    Residual standard error: 7.762 on 4 degrees of freedom
    Multiple R-squared:  0.1434,  Adjusted R-squared:  -0.07079
    F-statistic: 0.6694 on 1 and 4 DF,  p-value: 0.4592
  • 23. DATA REDUCTION

    Raw data (N rows):
    Subject  Xi  Di  Yi
    Evan     M   0   1
    Ashley   F   0   1
    Greg     M   1   0
    Leena    F   1   0
    Ema      F   0   0
    Seamus   M   1   1

    Reduced data:
    X  D  Y  Cases
    M  0  1  1
    M  1  1  1
    F  0  1  1
    F  1  1  0
    M  0  0  0
    M  1  0  1
    F  0  0  1
    F  1  0  1

    N rows reduce to at most (# treatments × # groups × # outcomes) rows.
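The reduction is a group-by with a count, and simple estimates can be computed from the counts alone. The sketch below uses the six subjects from the slide; the helper `mean_outcome` is hypothetical, and combinations with zero cases are simply absent from the counter rather than listed with a 0.

```python
from collections import Counter

# (subject, X covariate, D treatment, Y outcome), as in the slide's table.
rows = [
    ("Evan", "M", 0, 1), ("Ashley", "F", 0, 1), ("Greg", "M", 1, 0),
    ("Leena", "F", 1, 0), ("Ema", "F", 0, 0), ("Seamus", "M", 1, 1),
]

# Reduce: one row per distinct (X, D, Y) combination, with a count of cases.
reduced = Counter((x, d, y) for _, x, d, y in rows)

def mean_outcome(treatment):
    """Mean of Y in one arm, computed from the reduced counts only."""
    total = sum(n * y for (_, d, y), n in reduced.items() if d == treatment)
    cases = sum(n for (_, d, _), n in reduced.items() if d == treatment)
    return total / cases

ate = mean_outcome(1) - mean_outcome(0)
print(dict(reduced))
print(f"difference in means from reduced data: {ate:.3f}")
```

The weighted calculation on the reduced table gives the same difference in means as the raw rows would, which is why a weighted regression on the counts can stand in for a regression on the full data.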
  • 24. DATA REDUCTION + WEIGHTED OLS

    source('css_stats.R')
    reduced <- df %>%
      mutate(rounded.spent = round(spent, 0)) %>%
      group_by(button.size, rounded.spent) %>%
      summarise(n = n())

    > lm(rounded.spent ~ button.size, data = reduced, weights = n) %>%
    +   coeftest(vcov = sandwich.lm)

    t test of coefficients:

                 Estimate Std. Error t value  Pr(>|t|)
    (Intercept)   7.43137    0.45162 16.4548 7.522e-14 ***
    button.sizes -2.45178    0.59032 -4.1533 0.0004149 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • 25. FACTORIAL DESIGNS
    • Identify two types of effects: marginal and interactions. Need to fix one group as the baseline.

    > coeftest(lm(spent ~ button.size * button.text, data = df))

    t test of coefficients:

                              Estimate Std. Error t value  Pr(>|t|)
    (Intercept)                6.79643    0.62998 10.7884 < 2.2e-16 ***
    button.sizes              -2.43253    0.86673 -2.8066  0.006064 **
    button.textn               2.11611    0.86673  2.4415  0.016458 *
    button.sizes:button.textn -2.57660    1.27584 -2.0195  0.046219 *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • 26. USING COVARIATES TO GAIN PRECISION • With simple random assignment, using covariates is not necessary. • However, you can improve precision of ATE estimates if covariates explain a lot of variation in the potential outcomes. • Can be added to a linear model and SEs should get smaller if they are helpful.
  • 27. NON-IID DATA
    • Repeated observations of the same user are not independent.
    • Ditto if you're experimenting on certain items only.
    • If you ignore dependent data, the true confidence intervals are larger than you think.

    Subject / User  Item   Button Size  Spent
    Erin            Shirt  Large        $20
    Erin            Socks  Large        $4
    Erin            Pants  Large        $0
    Leo             Shirt  Large        $0
    Ed              Shirt  Small        $9
    Ed              Socks  Small        $5
  • 28. THE BOOTSTRAP
    1. All Your Data → generate random sub-samples R1, R2, …, R500
    2. Compute statistics or estimate model parameters on each: s1, s2, …, s500
    3. Get a distribution over the statistic of interest (e.g. the treatment effect) [histogram: Statistic vs. Count]
       - take mean
       - CIs == 95% quantiles
       - SEs == standard deviation
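The three steps map directly onto code. The sketch below is a standard nonparametric bootstrap (rows resampled with replacement) on synthetic data; the data-generating process, the 500 replicates, and the helper `treatment_effect` are assumptions made for illustration.

```python
import random
from statistics import mean, stdev, quantiles

def treatment_effect(sample):
    """Difference in mean outcome between treated (d=1) and control (d=0) rows."""
    treated = [y for d, y in sample if d == 1]
    control = [y for d, y in sample if d == 0]
    return mean(treated) - mean(control)

rng = random.Random(0)
# Synthetic data for illustration: (treatment indicator, outcome), true effect 1.0.
data = [(d, rng.gauss(5.0 + d, 2.0)) for d in [0, 1] * 250]

replicates = []
for _ in range(500):
    resample = rng.choices(data, k=len(data))  # draw rows with replacement
    replicates.append(treatment_effect(resample))

cuts = quantiles(replicates, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
print(f"estimate: {mean(replicates):.2f}")
print(f"SE:       {stdev(replicates):.2f}")
print(f"95% CI:   [{cuts[0]:.2f}, {cuts[-1]:.2f}]")
```

The same loop works for any statistic: swap `treatment_effect` for a function that fits a model and returns a coefficient, and the quantiles of the replicates give its interval.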
  • 29. USER AND USER-ITEM BOOTSTRAPS

    source('css_stats.R')
    library(broom)  ## for extracting model coefficients

    fitter <- function(.data) {
        lm(summary ~ opposed, data = .data, weights = .weights) %>%
        tidy
    }

    iid.replicates    <- iid.bootstrap(df, fitter, .combine = bind_rows)
    oneway.replicates <- clustered.bootstrap(df, c('user_id'), fitter, .combine = bind_rows)
    twoway.replicates <- clustered.bootstrap(df, c('user_id', 'item_id'), fitter, .combine = bind_rows)

    > head(iid.replicates)
              term   estimate  std.error statistic      p.value
    1  (Intercept)  0.4700000 0.04795154  9.801561 6.296919e-18
    2 button.sizes -0.2200000 0.08003333 -2.748855 6.695621e-03
    3  (Intercept)  0.4250000 0.05307832  8.007036 5.768641e-13
    4 button.sizes -0.1750000 0.08456729 -2.069358 4.049329e-02
    5  (Intercept)  0.4137931 0.05141050  8.048805 4.118301e-13
    6 button.sizes -0.1429598 0.08621804 -1.658119 9.965016e-02
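To see why the unit of resampling matters, the sketch below compares an i.i.d. bootstrap with a one-way, user-level bootstrap on synthetic data. It is not the implementation in `css_stats.R`, which is not shown in the slides; the data-generating process (a shared per-user shock) and all helper names are assumptions for illustration.

```python
import random
from collections import defaultdict
from statistics import mean, stdev

rng = random.Random(0)

# Synthetic data for illustration: 200 users, 10 correlated observations each.
# Treatment is assigned per user; a shared user-level shock creates dependence.
rows = []  # (user_id, treated, outcome)
for user_id in range(200):
    treated = user_id % 2
    user_shock = rng.gauss(0.0, 2.0)
    for _ in range(10):
        rows.append((user_id, treated, 5.0 + user_shock + rng.gauss(0.0, 1.0)))

def effect(sample):
    """Difference in mean outcome between treated and control rows."""
    t = [y for _, d, y in sample if d == 1]
    c = [y for _, d, y in sample if d == 0]
    return mean(t) - mean(c)

by_user = defaultdict(list)
for row in rows:
    by_user[row[0]].append(row)
users = list(by_user)

def iid_resample():
    return rng.choices(rows, k=len(rows))  # resample individual rows

def user_resample():
    drawn = rng.choices(users, k=len(users))  # resample whole users
    return [row for u in drawn for row in by_user[u]]

def bootstrap_se(resample, n_reps=300):
    return stdev([effect(resample()) for _ in range(n_reps)])

iid_se = bootstrap_se(iid_resample)
user_se = bootstrap_se(user_resample)
print(f"i.i.d. bootstrap SE:     {iid_se:.3f}")
print(f"user-level bootstrap SE: {user_se:.3f}")
```

Resampling rows treats the 2,000 observations as independent and reports a standard error that is too small; resampling users keeps each user's observations together and recovers the larger, honest uncertainty.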
  • 31. DATA REDUCTION WITH DEPENDENT DATA

    Subject  Di  Yij
    Evan     1   1
    Evan     1   0
    Ashley   0   1
    Ashley   0   1
    Ashley   0   1
    Greg     1   0
    Leena    1   0
    Leena    1   1
    Ema      0   0
    Seamus   1   1

    1. Create bootstrap replicates: R1, R2, R3
    2. Reduce the replicates as if they're i.i.d.: r1, r2, r3
    3. Compute statistics on reduced data: s1, s2, s3
  • 32. THANKS! HERE ARE SOME RESOURCES:
    • Me: http://seanjtaylor.com
    • These slides: http://www.slideshare.net/seanjtaylor/implementing-and-analyzing-online-experiments
    • Full Featured Tutorial: http://eytan.github.io/www-15-tutorial/
    • “Field Experiments” by Gerber and Green