A/B Testing: Avoiding
Common Pitfalls
Danielle Jabin

March 5, 2014

Make all the world’s music
available instantly to everyone,
wherever and whenever they
want it

As of March 5, 2014

Over 24 million active users

As of March 5, 2014

Access to more than 20 million
songs

As of March 5, 2014

But can we make it even
easier?

We can try…
…with A/B testing!

So…what’s an A/B test?

[Figure: Control and variant A]

Pitfall #1: Not limiting
your error rate

Source: assets.20bits.com/20081027/normal-curve-small.png

What if I flip a coin 100 times and get 51 heads?

What if I flip a coin 100 times and get 5 heads?

The p-value measures how likely it
is to obtain a value at least as
extreme as the one observed, under
a given distribution
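As a rough illustration of the two coin-flip questions, the sketch below computes an exact two-sided p-value from the binomial distribution. The function name `binomial_two_sided_p_value` and the choice to treat "at least as far from the expected count" as "extreme" are assumptions made for this example, not something the slides specify.

```python
from math import comb


def binomial_two_sided_p_value(heads, flips, p_heads=0.5):
    """Probability, if the coin really lands heads with probability p_heads,
    of a head count at least as far from the expected count as `heads`."""
    expected = flips * p_heads
    observed_gap = abs(heads - expected)
    total = 0.0
    for k in range(flips + 1):
        # Sum the probability of every outcome that is as extreme or more extreme.
        if abs(k - expected) >= observed_gap:
            total += comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)
    return total


p_51 = binomial_two_sided_p_value(51, 100)  # 51 heads out of 100
p_5 = binomial_two_sided_p_value(5, 100)    # 5 heads out of 100
print(f"51 heads: p = {p_51:.3f}")
print(f" 5 heads: p = {p_5:.3g}")
```

Under this definition, 51 heads gives a p-value above 0.9, so it is entirely consistent with a fair coin, while 5 heads gives a p-value that is vanishingly small.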

If there is a low likelihood that a
change is due to chance alone,
we call our results statistically
significant

What if I flip a coin 100 times and get 5 heads?

Statistical significance is measured against a threshold, alpha
● alpha levels of 5% and 1% are most commonly used
– Alternatively: P(significant) = .05 or .01 when there is no real effect

Each alpha has a corresponding Z-score

alpha    Z-score (two-sided test)
.10      1.65
.05      1.96
.01      2.58

The Z-score tells us how far a
particular value is from the
mean, in standard deviations (and
what the corresponding likelihood is)
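To make this concrete for the coin-flip example, the sketch below converts a head count into a Z-score and then into a two-sided likelihood using the normal approximation. The function names are hypothetical, and the approximation (mean n·p, standard deviation √(n·p·(1−p))) is an assumption of this example rather than a formula taken from the slides.

```python
from math import erfc, sqrt


def coin_z_score(heads, flips, p_heads=0.5):
    """Distance of the observed head count from the expected count,
    in standard deviations (normal approximation to the binomial)."""
    mean = flips * p_heads
    sd = sqrt(flips * p_heads * (1 - p_heads))
    return (heads - mean) / sd


def two_sided_p_value(z):
    """Probability of a standard normal value at least |z| from zero.
    erfc keeps precision for very large |z|."""
    return erfc(abs(z) / sqrt(2))


for heads in (51, 5):
    z = coin_z_score(heads, 100)
    print(f"{heads} heads: z = {z:.1f}, p = {two_sided_p_value(z):.3g}")
```

For 100 flips of a fair coin the standard deviation is 5 heads, so 51 heads sits 0.2 standard deviations above the mean and 5 heads sits 9 standard deviations below it.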

Source: assets.20bits.com/20081027/normal-curve-small.png

Compute the Z-score at the end of the test

Standard deviation (σ) tells us
how spread out the numbers
are

To lock in error rates before you
start, fix your sample size

What should my sample size be?
● To lock in error rates before you start a test, fix your sample size

$$ n = \frac{2\sigma^2 \left(Z_\beta + Z_{\alpha/2}\right)^2}{\text{difference}^2} $$

– n: sample size in each group (assumes equal sized groups)
– σ: standard deviation of the outcome variable
– Z_β: represents the desired power (typically .84 for 80% power)
– Z_{α/2}: represents the desired level of statistical significance (typically 1.96)
– difference: effect size (the difference in means)

Source: www.stanford.edu/~kcobb/hrp259/lecture11.ppt
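The sample size formula can be sketched in a few lines. The function names `sample_size_per_group` and `z_values` are hypothetical, and deriving the Z-scores from `statistics.NormalDist` instead of a lookup table is a choice made for this example.

```python
from statistics import NormalDist


def z_values(alpha, power):
    """Two-sided Z-score for alpha and the Z-score for the desired power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return z_alpha, z_beta


def sample_size_per_group(sigma, difference, z_alpha, z_beta):
    """n = 2 * sigma^2 * (Z_beta + Z_alpha/2)^2 / difference^2, per group."""
    return 2 * sigma**2 * (z_beta + z_alpha) ** 2 / difference**2


# Rounded table values: alpha = .05 (1.96) and 80% power (.84).
n_rounded = sample_size_per_group(sigma=10, difference=1, z_alpha=1.96, z_beta=0.84)

# Unrounded quantiles for the same alpha and power.
z_alpha, z_beta = z_values(alpha=0.05, power=0.80)
n_exact = sample_size_per_group(sigma=10, difference=1, z_alpha=z_alpha, z_beta=z_beta)

print(round(n_rounded), round(n_exact))
```

With the rounded Z-scores 1.96 and .84, σ = 10 and a difference of 1, the formula works out to 2 · 100 · 2.8² = 1568 per group; unrounded quantiles give a slightly larger number.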

Recap: running an A/B test
● Compute your sample size
– Using alpha, beta, standard deviation of your metric, and effect size
● Run your test! But stop once you’ve reached the fixed sample size stopping point
● Compute your z-score and compare it with the z-score for the chosen alpha level
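The last step of the recap might look like the sketch below for a metric compared by its mean. The deck does not preserve its own Z-score formula here, so the unpooled two-sample statistic, the function names, and the toy data are all assumptions of this example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_score_for_means(control, variant):
    """Difference in sample means divided by its standard error.
    Assumes both groups are large and have some variation."""
    standard_error = sqrt(
        stdev(control) ** 2 / len(control) + stdev(variant) ** 2 / len(variant)
    )
    return (mean(variant) - mean(control)) / standard_error


def is_significant(z, alpha=0.05):
    """Compare |z| with the two-sided Z-score for the chosen alpha."""
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


# Toy data, only evaluated once the fixed sample size has been reached.
control = [1, 2, 3, 4, 5] * 20
variant = [x + 3 for x in control]
z = z_score_for_means(control, variant)
print(f"z = {z:.1f}, significant: {is_significant(z)}")
```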

[Figure: Control and variant A]

Resulting Z-score?

33.3
Pitfall #2: Stopping your
test before the fixed
sample size stopping
point

Sample size for varying alpha levels
● With σ = 10, difference in means = 1

Two-sided test              Sample size
alpha = .10, power = .80    1230
alpha = .05, power = .80    1568
alpha = .01, power = .80    2339

Let’s see some numbers
● 1,000 experiments with 200,000 fake participants divided randomly into two groups both receiving the exact same version, A, with a 3% conversion rate

                            Stop at first point of significance    Ended as significant
90% significance reached    654 of 1,000                           100 of 1,000
95% significance reached    427 of 1,000                           49 of 1,000
99% significance reached    146 of 1,000                           14 of 1,000

Source: destack.home.xs4all.nl/projects/significance/
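A much smaller version of this kind of A/A simulation can be sketched as follows. It is not the code behind the cited numbers: the experiment count, group size, peeking interval, seed, and the pooled two-proportion Z-score are all assumptions chosen so the example runs quickly.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Pooled Z-score for the difference between two conversion rates."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # no conversions (or all conversions) so far
    return (conv_b / n_b - conv_a / n_a) / se


def simulate_peeking(experiments=200, per_group=2000, peek_every=100,
                     rate=0.03, alpha=0.05, seed=0):
    """Run A/A experiments and return two false-positive rates:
    (stopping at the first significant peek, checking only at the end)."""
    if per_group % peek_every != 0:
        raise ValueError("per_group must be a multiple of peek_every")
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    stopped_early = 0
    significant_at_end = 0
    for _ in range(experiments):
        conv_a = conv_b = 0
        ever_significant = False
        z = 0.0
        for n in range(1, per_group + 1):
            # Both groups get the same version, so both convert at `rate`.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n % peek_every == 0:
                z = two_proportion_z(conv_a, n, conv_b, n)
                if abs(z) > z_crit:
                    ever_significant = True
        stopped_early += ever_significant
        significant_at_end += abs(z) > z_crit  # z from the final peek
    return stopped_early / experiments, significant_at_end / experiments


peeking_rate, end_rate = simulate_peeking()
print(f"stop at first significance: {peeking_rate:.1%}, fixed sample size: {end_rate:.1%}")
```

Because the final check is also one of the peeks, the stop-at-first-significance rate can never be lower than the fixed-sample-size rate; the simulation shows how much higher it gets.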

Remedies
● Don’t peek
● Okay, maybe you can peek, but don’t stop or make a decision before you reach the fixed
sample size stopping point
● Sequential sampling

[Figure: Control, variant A, and variant B]

Pitfall #3: Making
multiple comparisons in
one test

A test can be one of two things: significant or not significant
● P(significant) + P(not significant) = 1
● Let’s take an alpha of .05 and assume there is no real difference between the versions
– P(significant) = .05
– P(not significant) = 1 – P(significant) = 1 - .05 = .95

What about for two comparisons?
● P(at least 1 significant) = 1 - P(none of the 2 are significant)
● P(none of the 2 are significant) = P(not significant)*P(not significant) = .95*.95 = .9025 (assuming the two comparisons are independent)
● P(at least 1 significant) = 1 - .9025 = .0975
● That’s almost 2x (1.95x, to be precise) your .05 significance rate!

And it just gets worse…

                 P(at least 1 significant)    An increase of…
5 variations     1 – (1-.05)^5 = .23          4.6x
10 variations    1 – (1-.05)^10 = .40         8x
20 variations    1 – (1-.05)^20 = .64         12.8x
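The arithmetic behind these numbers is short enough to check directly. The function name `familywise_error_rate` is hypothetical, and the calculation assumes the comparisons are independent.

```python
def familywise_error_rate(comparisons, alpha=0.05):
    """P(at least 1 significant) across independent comparisons
    when there is no real difference in any of them."""
    return 1 - (1 - alpha) ** comparisons


for comparisons in (1, 2, 5, 10, 20):
    rate = familywise_error_rate(comparisons)
    print(f"{comparisons:>2} comparisons: {rate:.4f} ({rate / 0.05:.2f}x alpha)")
```

Dividing the unrounded rates by .05 gives multipliers slightly different from ones computed on the rounded two-decimal rates.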

How can we remedy this?
● Bonferroni correction
– Divide P(significant), your alpha, by the number of variations you are testing, n
– alpha/n becomes the new level of statistical significance

So what about two comparisons now?
● Our new P(significant) = .05/2 = .025
● Our new P(not significant) = 1 - .025 = .975
● P(at least 1 significant) = 1 - P(none of the 2 are significant)
● P(none of the 2 are significant) = P(not significant)*P(not significant) = .975*.975 = .9506
● P(at least 1 significant) = 1 - .9506 = .0494

P(significant) stays under .05

                 Corrected alpha    P(at least 1 significant)
5 variations     .05/5 = .01        1 – (1-.01)^5 = .049
10 variations    .05/10 = .005      1 – (1-.005)^10 = .049
20 variations    .05/20 = .0025     1 – (1-.0025)^20 = .049

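A minimal sketch of applying the Bonferroni correction in practice is shown below. The function names and the list of p-values are hypothetical; the independence assumption behind `familywise_error_rate` is the same one used in the slide arithmetic.

```python
def bonferroni_alpha(alpha, comparisons):
    """Per-comparison significance level after the Bonferroni correction."""
    return alpha / comparisons


def familywise_error_rate(per_comparison_alpha, comparisons):
    """P(at least 1 significant) across independent comparisons."""
    return 1 - (1 - per_comparison_alpha) ** comparisons


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that clears the corrected threshold alpha / n."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]


for n in (2, 5, 10, 20):
    corrected = bonferroni_alpha(0.05, n)
    print(f"{n:>2} variations: alpha = {corrected:.4f}, "
          f"P(at least 1 significant) = {familywise_error_rate(corrected, n):.4f}")

# Three made-up p-values: only the second clears .05 / 3.
print(significant_after_bonferroni([0.04, 0.004, 0.03]))
```

The correction is deliberately conservative: two of the made-up p-values would pass an uncorrected .05 threshold but not the corrected one.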
Questions?
Appendix

A/B test steps:
1. Decide what to test
2. Determine a metric to test
3. Formulate your hypothesis
   a. Select an effect size threshold: what change of the metric would make a rollout worthwhile?
4. Calculate sample size (your stopping point)
   a. Decide your Type I (alpha) and Type II (beta) error levels and the corresponding z-scores
   b. Determine the standard deviation of your metric
5. Run your test! But stop once you’ve reached the fixed sample size stopping point
6. Compute your z-score and compare it with the z-score for your chosen alpha level

Type I and Type II error
● Type I error: incorrectly reject a true null hypothesis
– alpha
● Type II error: fail to reject a false null hypothesis
– beta
– Power: 1 - beta

Z-score reference table

alpha    One-sided test    Two-sided test
.10      1.28              1.65
.05      1.65              1.96
.01      2.33              2.58
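For alpha levels that are not in the table, the same Z-scores can be derived from the standard normal distribution. This is a small sketch using Python's standard library; the function name `critical_z` is hypothetical.

```python
from statistics import NormalDist


def critical_z(alpha, two_sided=True):
    """Z-score that leaves `alpha` in the tail(s) of a standard normal.
    A two-sided test splits alpha evenly between both tails."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: one-sided {critical_z(alpha, two_sided=False):.3f}, "
          f"two-sided {critical_z(alpha):.3f}")
```

The one-sided value for a given alpha equals the two-sided value for twice that alpha, which is why 1.65 appears in both columns.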

Z-score for proportions (e.g. conversion)
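
For conversion rates, one commonly used form is the pooled two-proportion Z statistic, sketched below. It is offered as an assumption about the kind of formula meant here, not as the deck's own; the symbols $x_A, x_B$ (conversions), $n_A, n_B$ (group sizes), $p_A = x_A/n_A$, $p_B = x_B/n_B$ and $\hat{p}$ (pooled rate) are names introduced for this example.

$$ z = \frac{p_B - p_A}{\sqrt{\hat{p}\,(1-\hat{p})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}, \qquad \hat{p} = \frac{x_A + x_B}{n_A + n_B} $$

As with the Z-score for means, the result is compared with the Z-score for the chosen alpha level once the fixed sample size has been reached.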

A/B Testing Pitfalls and Lessons Learned at Spotify

  • 1. A/B Testing: Avoiding Common Pitfalls Danielle Jabin March 5, 2014
  • 2. Make all the world’s music available instantly to everyone, wherever and whenever they want it
  • 3. As of March 5, 2014
  • 4. Over 24 million active users As of March 5, 2014
  • 5. Access to more than 20 million songs As of March 5, 2014
  • 7. But can we make it even easier?
  • 8. We can try… …with A/B testing!
  • 11. Pitfall #1: Not limiting your error rate
  • 13. What if I flip a coin 100 times and get 51 heads?
  • 14. What if I flip a coin 100 times and get 5 heads?
  • 16. The likelihood of obtaining a certain value under a given distribution is measured by its p-value
  • 17. If there is a low likelihood that a change is due to chance alone, we call our results statistically significant
  • 18. What if I flip a coin 100 times and get 5 heads?
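The coin-flip questions above can be answered numerically. The following is a minimal sketch, assuming a fair coin and an exact two-sided binomial test; the function name `two_sided_binomial_p` is hypothetical and not taken from the slides.

```python
from math import comb

def two_sided_binomial_p(heads: int, flips: int, p: float = 0.5) -> float:
    """Exact two-sided p-value: total probability of all outcomes
    that are at most as likely as the observed number of heads."""
    pmf = [comb(flips, k) * p**k * (1 - p) ** (flips - k) for k in range(flips + 1)]
    observed = pmf[heads]
    # Small tolerance so outcomes with equal probability are counted despite rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))

print(two_sided_binomial_p(51, 100))  # large: 51 heads is unremarkable for a fair coin
print(two_sided_binomial_p(5, 100))   # tiny: 5 heads is very unlikely by chance alone
```

Under these assumptions, 51 heads gives a p-value far above any common alpha, while 5 heads gives one far below it.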
  • 19. Statistical significance is measured by alpha
      ● alpha levels of 5% and 1% are most commonly used
          – Alternatively: P(significant) = .05 or .01
  • 20. Each alpha has a corresponding Z-score (two-sided test)
      alpha   Z-score
      .10     1.65
      .05     1.96
      .01     2.58
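The Z-scores in this table can be derived from the standard normal distribution rather than looked up. This is a small sketch using Python's standard library; the helper name `critical_z` is an assumption made for illustration.

```python
from statistics import NormalDist

def critical_z(alpha: float, two_sided: bool = True) -> float:
    """Z-score whose tail area (split over both tails if two-sided) equals alpha."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(critical_z(alpha), 3))
```

The exact value for alpha = .10 is about 1.645, which the slide rounds up to 1.65.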
  • 21. The Z-score tells us how far a particular value is from the mean (and what the corresponding likelihood is)
  • 23. Compute the Z-score at the end of the test
  • 24. Standard deviation (σ) tells us how spread out the numbers are
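As an illustration of how the standard deviation feeds into the Z-score, here is a hedged sketch for a difference in means between two groups. The formula (difference in sample means divided by its standard error) is a standard choice but is an assumption here, since the slide's own formula is not in the transcript; the function name and sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def z_score_means(control: list[float], treatment: list[float]) -> float:
    """Difference in sample means divided by its standard error."""
    se = sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
    return (mean(treatment) - mean(control)) / se

# Hypothetical metric values, for illustration only.
control = [1, 2, 3, 4]
treatment = [3, 4, 5, 6]
print(z_score_means(control, treatment))
```

The resulting value is then compared with the Z-score for the chosen alpha level.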
  • 26. To lock in error rates before you start, fix your sample size
  • 27. What should my sample size be?
      ● To lock in error rates before you start a test, fix your sample size:
        $n = \frac{2\sigma^2 (Z_\beta + Z_{\alpha/2})^2}{\text{difference}^2}$
          – $n$: sample size in each group (assumes equal sized groups)
          – $\sigma$: standard deviation of the outcome variable
          – $Z_\beta$: represents the desired power (typically .84 for 80% power)
          – $Z_{\alpha/2}$: represents the desired level of statistical significance (typically 1.96)
          – difference: effect size (the difference in means)
      Source: www.stanford.edu/~kcobb/hrp259/lecture11.ppt
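The sample size formula can be sketched in code as follows. The function names are hypothetical. The first takes rounded Z-scores as the slides do; the second derives them from alpha and power, which gives a slightly larger result because the Z-scores are not rounded.

```python
from statistics import NormalDist

def n_per_group(sigma: float, difference: float, z_alpha: float, z_beta: float) -> float:
    """n = 2 * sigma^2 * (z_beta + z_alpha)^2 / difference^2, per group."""
    return 2 * sigma**2 * (z_beta + z_alpha) ** 2 / difference**2

def n_per_group_from_rates(sigma: float, difference: float,
                           alpha: float = 0.05, power: float = 0.80) -> float:
    """Same formula, with Z-scores computed for a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return n_per_group(sigma, difference, z_alpha, z_beta)

# sigma = 10, difference in means = 1, rounded Z-scores 1.96 and .84
print(round(n_per_group(10, 1, 1.96, 0.84)))
print(n_per_group_from_rates(10, 1))
```

In practice the result is rounded up to a whole number of participants per group.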
  • 28. Recap: running an A/B test
      ● Compute your sample size
          – Using alpha, beta, standard deviation of your metric, and effect size
      ● Run your test! But stop once you’ve reached the fixed sample size stopping point
      ● Compute your z-score and compare it with the z-score for the chosen alpha level
  • 32. Pitfall #2: Stopping your test before the fixed sample size stopping point
  • 33. Sample size for varying alpha levels
      ● With σ = 10, difference in means = 1, two-sided test
      alpha   power                sample size
      .10     .80 (beta = .20)     1230
      .05     .80 (beta = .20)     1568
      .01     .80 (beta = .20)     2339
  • 34. Let’s see some numbers
      ● 1,000 experiments with 200,000 fake participants divided randomly into two groups, both receiving the exact same version, A, with a 3% conversion rate
                                            90% significance reached   95% significance reached   99% significance reached
      Stop at first point of significance   654 of 1,000               427 of 1,000               146 of 1,000
      Ended as significant                  100 of 1,000               49 of 1,000                14 of 1,000
      Source: destack.home.xs4all.nl/projects/significance/
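The effect of stopping at the first significant result can be reproduced with a small A/A simulation. This is a scaled-down sketch, not the simulation behind the numbers above: the experiment count, group size, number of checks, and the pooled two-proportion Z-score are all assumptions chosen so that it runs quickly.

```python
import random
from math import sqrt

def z_two_proportions(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Pooled two-proportion Z-score; returns 0.0 when there are no conversions yet."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (conv_a / n_a - conv_b / n_b) / se

def simulate_aa(experiments=200, per_group=2000, checks=20,
                rate=0.03, z_crit=1.96, seed=0):
    """Count A/A experiments that were significant at any check vs. at the end."""
    rng = random.Random(seed)
    step = per_group // checks
    stopped_early = ended_significant = 0
    for _ in range(experiments):
        conv_a = conv_b = 0
        ever_significant = False
        z = 0.0
        for check in range(1, checks + 1):
            # Both groups receive the same version, so any difference is chance.
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            n = check * step
            z = z_two_proportions(conv_a, n, conv_b, n)
            if abs(z) >= z_crit:
                ever_significant = True
        stopped_early += ever_significant
        ended_significant += abs(z) >= z_crit
    return stopped_early, ended_significant

early, final = simulate_aa()
print(f"significant at some check: {early}, significant at the end: {final}")
```

Because the final check is one of the peeks, the first count can never be smaller than the second; the gap between them is the extra false positive rate introduced by peeking.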
  • 35. Remedies
      ● Don’t peek
      ● Okay, maybe you can peek, but don’t stop or make a decision before you reach the fixed sample size stopping point
      ● Sequential sampling
  • 37. Pitfall #3: Making multiple comparisons in one test
  • 38. A test can be one of two things: significant or not significant
      ● P(significant) + P(not significant) = 1
      ● Let’s take an alpha of .05
          – P(significant) = .05
          – P(not significant) = 1 - P(significant) = 1 - .05 = .95
  • 39. What about for two comparisons?
      ● P(at least 1 significant) = 1 - P(none of the 2 are significant)
      ● P(none of the 2 are significant) = P(not significant)*P(not significant) = .95*.95 = .9025
      ● P(at least 1 significant) = 1 - .9025 = .0975
  • 40. What about for two comparisons?
      ● That’s almost 2x (1.95x, to be precise) your .05 significance rate!
  • 41. And it just gets worse…
                      P(at least 1 significant)   An increase of…
      5 variations    1 - (1-.05)^5 = .23         4.6x
      10 variations   1 - (1-.05)^10 = .40        8x
      20 variations   1 - (1-.05)^20 = .64        12.8x
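The growth in false positives can be computed directly. This is a minimal sketch that, like the slides, assumes the comparisons are independent; the function name is hypothetical.

```python
def familywise_error_rate(alpha: float, comparisons: int) -> float:
    """P(at least 1 significant) = 1 - (1 - alpha)^comparisons."""
    return 1 - (1 - alpha) ** comparisons

for k in (1, 2, 5, 10, 20):
    rate = familywise_error_rate(0.05, k)
    print(f"{k} comparisons: {rate:.4f}")
```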
  • 42. How can we remedy this?
      ● Bonferroni correction
          – Divide P(significant), your alpha, by the number of variations you are testing, n
          – alpha/n becomes the new level of statistical significance
  • 43. So what about two comparisons now?
      ● Our new P(significant) = .05/2 = .025
      ● Our new P(not significant) = 1 - .025 = .975
      ● P(at least 1 significant) = 1 - P(none of the 2 are significant)
      ● P(none of the 2 are significant) = P(not significant)*P(not significant) = .975*.975 = .9506
      ● P(at least 1 significant) = 1 - .9506 = .0494
  • 44. P(significant) stays under .05
                      Corrected alpha    P(at least 1 significant)
      5 variations    .05/5 = .01        1 - (1-.01)^5 = .049
      10 variations   .05/10 = .005      1 - (1-.005)^10 = .049
      20 variations   .05/20 = .0025     1 - (1-.0025)^20 = .049
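A short sketch of the Bonferroni correction, again assuming independent comparisons as the slides do. The function names are hypothetical.

```python
def bonferroni_alpha(alpha: float, comparisons: int) -> float:
    """Corrected per-comparison significance level: alpha / n."""
    return alpha / comparisons

def familywise_error_rate(alpha: float, comparisons: int) -> float:
    """P(at least 1 significant) = 1 - (1 - alpha)^comparisons."""
    return 1 - (1 - alpha) ** comparisons

for k in (2, 5, 10, 20):
    corrected = bonferroni_alpha(0.05, k)
    overall = familywise_error_rate(corrected, k)
    print(f"{k} variations: corrected alpha = {corrected}, overall rate = {overall:.3f}")
```

Each individual comparison is then judged against the corrected alpha rather than the original .05.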
  • 47. A/B test steps:
      1. Decide what to test
      2. Determine a metric to test
      3. Formulate your hypothesis
          1. Select an effect size threshold: what change of the metric would make a rollout worthwhile?
      4. Calculate sample size (your stopping point)
          1. Decide your Type I (alpha) and Type II (beta) error levels and the corresponding z-scores
          2. Determine the standard deviation of your metric
      5. Run your test! But stop once you’ve reached the fixed sample size stopping point
      6. Compute your z-score and compare it with the z-score for your chosen alpha level
  • 48. Type I and Type II error
      ● Type I error: incorrectly reject a true null hypothesis
          – alpha
      ● Type II error: fail to reject a false null hypothesis
          – beta
          – Power: 1 - beta
  • 49. Z-score reference table
      alpha   One-sided test   Two-sided test
      .10     1.28             1.65
      .05     1.65             1.96
      .01     2.33             2.58
  • 50. Z-score for proportions (e.g. conversion)
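The formula on this slide did not survive in the transcript. As a hedged stand-in, here is the commonly used pooled two-proportion Z-score; it is an assumption that the slide used this form, and the function name and conversion counts below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_score_proportions(conversions_a: int, n_a: int,
                        conversions_b: int, n_b: int) -> float:
    """Pooled two-proportion Z-score for the difference p_b - p_a."""
    p_a = conversions_a / n_a
    p_b = conversions_b / n_b
    pooled = (conversions_a + conversions_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical counts: 3.0% vs 3.6% conversion with 10,000 users per group.
z = z_score_proportions(300, 10_000, 360, 10_000)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(z, p_value)
```

With these made-up counts the Z-score falls between 1.96 and 2.58, so the difference would count as significant at alpha = .05 but not at alpha = .01 in a two-sided test.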