Ever felt that expecting a personalized user experience is a bit like tossing a wish to a genie, uncertain if it'll truly come to life? You're not alone.
Many platforms are in a wrestling match with the intricacies of A/B testing, turning the journey to user engagement into a bit of a head-scratcher.
Join us in this session as we zoom into the specifics of building and implementing an A/B testing ecosystem for personalized meditation recommendations. Our guest, Rohan will dive deep into key metrics, testing strategies, and top-notch practices to roll out personalized recommendations that truly enhance user experience and engagement.
5. 5
Proprietary, Confidential, & Thoughtful
History
● Launched in 2020, the content customizer started to
countering the declining retention rate.
● Every headspace user was getting the same editorial content
that used to refresh everyday.
● Headspace had over 1000+ unique content and more than 2
million B2C subscribers
6. 6
Proprietary, Confidential, & Thoughtful
Hypothesis
Personalized recommendations will help members find
more relevant content, thereby increasing their engagement
with Headspace
● Content Customizer (a.k.a CC) is our recommendation
system that serves personalized recommendations
● It uses historical user-item interaction data to generate
recommendation everyday
Proprietary, Confidential, Thoughtful
8. Proprietary, Confidential, & Thoughtful
Components of Modeling
● Data
○ B2C and B2B Subscribers
○ Feature Store
■ User tenure, demographics, language, time of the day etc.
■ Item type, playtime duration, content creation date etc.
○ IOS and Android
● Evaluation
○ Offline: NDGC@K , HITRate@K, Precision@K, etc.
○ Online: A/B Testing
● Inference and Post processing
○ Nightly batched training
○ Payload to prediction service
○ Fix sequence, remove popular and recently completed
Proprietary, Confidential, & Thoughtful
8
9. 9
Proprietary, Confidential, & Thoughtful
Modeling Pipeline
● Deployed for both B2C and B2B members
● Deployed to both iOS and Android platform
Data Pull Model Training
Generate
Prediction
Post
Processing
Send Data to
the Prediction
Service
10. 10
Proprietary, Confidential, & Thoughtful
Version Name Model Type
LightFM model using user and item
features
Feature Based Model
Matrix Factorization model using
historical content interaction
NCF Model using user and item
features
Feature Based Model
Deep learning model that helps to
predict the nonlinear latent factors
and users explicit pos/neg feedback
TiSASRec Model using historical
sequences
Sequence Based Model
Transformer model to capture
sequences of content completes
History of Models
11. Proprietary, Confidential, & Thoughtful
LightFM Model
● Reasons
○ Matrix Factorization
○ Implicit and Explicit Feedback
○ GPU Optimization
○ Highly scalable
● Result
○ Offline: Precision@k and Recall@k
○ Online: No statistically significant lift (content
start/complete) in A/B test
Kula, Maciej. "Metadata embeddings for user and item
cold-start recommendations." arXiv preprint
arXiv:1507.08439 (2015)
11
Image Credit: Google Developer
12. Proprietary, Confidential, & Thoughtful
Neural Collaborative Filtering Model
● Reasons
○ Non-linearity
○ User-item features
○ Sparsity
● Result
○ Offline: Precision@k and Recall@k
○ Online: Over 2% statistically significant lift (content
start/complete) in A/B test
He, Xiangnan, et al. "Neural collaborative filtering." Proceedings of the 26th international
conference on world wide web. 2017.
12
13. Proprietary, Confidential, & Thoughtful
Time Interval Aware Self-Attention for Sequential Recommendation
(TiSASRec)
● Reasons
○ Sequential Data
○ Temporal Dynamics
○ Transformer Architecture
● Result
○ Offline: NDCG@k and HitRate@k
○ Online: Over 4.68% statistically significant lift
(content play) in A/B test
Li, Jiacheng, Yujie Wang, and Julian McAuley. "Time interval aware self-attention for sequential recommendation." Proceedings of the
13th international conference on web search and data mining. 2020.
13
16. 16
Proprietary, Confidential, & Thoughtful
Introduction
● “If you canʼt measure it, you canʼt improve it.” — Peter Drucker.
● Online Controlled Experiments, a.k.a A/B testing, are the gold
standard for estimating causality with high probability.
● A data-driven decision-making culture helps estimate the
measurement uncertainty to refute the null hypothesis based on
experimental data.
● Furthermore, a random assignment of the users into a
control-treatment group allows us to safely ignore the unobserved
factors and model the parameters as random variables
Proprietary, Confidential, & Thoughtful
17. 17
Proprietary, Confidential, & Thoughtful
Experiment Design
● Recommendation surface area
○ Todayʼs tab
○ Hero and Recommended tab on
Meditate/Sleep/Move/Focus
● Platform Selection
○ iOS
○ Android
● Audience Selection
○ User type: B2C, B2B
○ Subscription type: Paid, Free trials
○ Language: English, German, French
○ Region: North America, International
● Driver Metrics and Business KPI
○ Metrics: Content Start, Content Complete, Content Play,
Content start to complete ratio
○ KPIs: DAU, WAU, MAU, Paid conversion, Churn Rate
Proprietary, Confidential, & Thoughtful
19. 19
Proprietary, Confidential, & Thoughtful
Experiment Design
● Sample Size and Experiment Duration:
○ Baseline conversion rate
○ Minimum detectable lift
○ Use standard setting with alpha = 0.05 and beta =
20% for 1 sided test
○ Classical test:
https://www.statsig.com/calculator
○ Sequence test:
https://docs.statsig.com/experiments-plus/pow
er-analysis
Proprietary, Confidential, & Thoughtful
20. 20
Proprietary, Confidential, & Thoughtful
Experiment Design
● Important details
○ Prefer A/B test over Multivariate test
○ Use 1 or 2 driver metrics
○ Use guardrail metrics for rollback
○ Avoid early peeking
○ Use confidence intervals over p-values
Proprietary, Confidential, & Thoughtful
21. 21
Proprietary, Confidential, & Thoughtful
Experiment Architecture
● Content Customizer Model: A Machine Learning model
that generates personalized recommendations for every
eligible user. It sends its prediction to the prediction
service using offline inference.
● Prediction Service: A microservice that sends
predictions to the layout service asynchronously. It is
required to update the layout service after
post-processing of the content.
● Layout Service: A bundle of services that creates a
layout for the client for respective tabs. It receives
requests from the client and serves appropriate content
to the layout.
● A/B Testing Tool: A/B testing platform that helps to
administer the online controlled experiment. It
randomly buckets the eligible users into control and
treatment.
Proprietary, Confidential, & Thoughtful
A/B Testing
Tool
22. 22
Proprietary, Confidential, & Thoughtful
Monitoring and Evaluation
● Confidence Interval is much more reliable
indicator of effectiveness than sole p-value
○ p = 0.06 (this)
○ -9% to 20% (v/s this)
● Always use practical significance threshold
along with statistical significance threshold
○ Statistical Significance is a theoretical
indicator of the impact
○ Practical significance is a practical
indicator of the impact
Proprietary, Confidential, & Thoughtful
23. Proprietary, Confidential, & Thoughtful
Learnings
● Feature Space
○ User features are not rich enough to understand the user behavior
■ Subscription mismatch
■ Sparse interaction
■ Privacy and Compliance
○ Item features are fragmented
■ Many features are stored in multiple system
■ Overlapping content type
● UI Rigidity
○ Fluid design for multiple screens
○ High performing static content
○ Single model everywhere
○ Explicit Feedback
Proprietary, Confidential, & Thoughtful
23
High Performing Content
Static Content
24. 24
Proprietary, Confidential, & Thoughtful
Opportunities
● Client Reform
○ The new shelves based design helps to surface context based
recommendations
○ Provide separate screen to present notified recommendations
○ Search based recommendations
● Modeling
○ Long term value (Align with the productʼs vision)
○ Multi-modal recommendation (text, image, audio)
○ Personalized shelves ranking
○ Causal AI to get explainability
○ Optimize novelty, diversity and relevance
Proprietary, Confidential, & Thoughtful
Personalized
Personalized
24
25. 25
Proprietary, Confidential, & Thoughtful
Opportunities
● Robust E2E System
○ Impression data
○ Analyze data drift, model drift and concept drift
○ Smart caching
○ Real time inference with continuous learning
○ Interconnected recommendation (email, push, notification)
Proprietary, Confidential, & Thoughtful
Personalized
Personalized
25