Leveraging an in-house modeling
framework for fun and profit
Mike Skarlinski & Brian Graham
{michael.skarlinski, brian.graham}@weightwatchers.com
June 2019
Outline
• Introduction: data science at WW – the new Weight Watchers
• Problem: scalable, simple modeling and recommendation systems with a small team
• Solution: design and benefits of building a framework
• Implementation: Examples of deployed recommenders
WW is a data-driven application to help members
on their wellness journeys
Member Social Network
Activity & Food tracking
Weight progress & goals
Recipe & food database
As a new team, we are tasked with building a
foundation of data products
Social Network: Connect
Growth
WW Program
Infrastructure
Churn model
Return model
LTV models
Single Member View
Recipe recommender
Similar recipes
Composite foods ontology
Personalized feed
Groups search
Who to follow
APIs
Primrose
Data science team’s success hinges on effectively
sharing work and knowledge
Team: Brian Graham, Reka Daniel-Weiner, Yameng (Eliza) Zhang, Kevin Zecchini, Carl Anderson, Michael (Mike) Skarlinski
Open positions: 3 (hint hint)
Timeline: May 2018, Jan. 2019, Feb. 2019, Mar. 2019, ..., Dec. 2019
How can we build software that helps us grow and develop as a team?
WW recommender and modeling
challenges
Taking stock of our own challenges at WW
What would make a good recommender system at WW?
Slow serialization, but our medium data can be kept in RAM...
No live features, but we know Docker, k8s...
Easy onboarding: mono repo with config as code...
We built a framework to solve our challenges and
enforce our design decisions
(Open source coming soon!!!!!)
Primrose: a framework for simple, quick
modeling deployments
Primrose has features to address each design
consideration
Data science: Python in-memory DAG runner, with no serialization between nodes of the DAG.
Infrastructure: the DAG is defined with a configuration-as-code approach, with one container for all models.
People: abstract ML and data manipulation operations; data scientists can easily extend the framework.
Primrose: (Production In-Memory Solution) framework for solving
WW’s most common use cases, caching batched predictions with
machine-learning engineering baked-in.
Primrose jobs are executed as Directed Acyclic
Graphs (DAGs) in Python
Flexibility: any number of operations
allowed in a single DAG, across any
Python library
Data and functions are passed between
nodes in an object that understands how
to extract the correct data for each node
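A minimal sketch of the idea on this slide, not Primrose's actual implementation: nodes run in dependency order inside one Python process, and each node's output stays in memory for the nodes downstream of it. The function name run_dag, the nodes/edges layout, and the toy nodes are assumptions made for illustration. It relies on graphlib from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter


def run_dag(nodes, edges):
    """Run callables in dependency order, keeping every result in memory.

    nodes: dict mapping a node name to a callable that takes a dict of
           upstream results and returns this node's result.
    edges: list of (upstream_name, downstream_name) pairs.
    """
    # Map each node to the set of nodes it depends on.
    deps = {name: set() for name in nodes}
    for upstream, downstream in edges:
        deps[downstream].add(upstream)

    results = {}
    # static_order() yields every node after all of its dependencies.
    for name in TopologicalSorter(deps).static_order():
        upstream_results = {dep: results[dep] for dep in deps[name]}
        results[name] = nodes[name](upstream_results)
    return results


# Hypothetical three-node DAG: read -> sort -> top.
nodes = {
    "read": lambda upstream: [3, 1, 2],
    "sort": lambda upstream: sorted(upstream["read"]),
    "top": lambda upstream: upstream["sort"][-1],
}
edges = [("read", "sort"), ("sort", "top")]
results = run_dag(nodes, edges)
```

Nothing is written to disk or pickled between nodes here, which is the property the slide attributes to the in-memory runner. A cycle in the edges makes TopologicalSorter raise graphlib.CycleError.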
DAGs are composed of implementation-agnostic,
extensible nodes for data science
Data scientists can write any class that
matches the abstract interface &
incorporate in their DAGs
Data scientists can write individual nodes using
any Python framework or library they choose
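One way to picture "any class that matches the abstract interface" is an abstract base class with a single required method. This is a hedged sketch: AbstractNode, its run method, and the LowercaseNames example are hypothetical names, not Primrose's real interface.

```python
from abc import ABC, abstractmethod


class AbstractNode(ABC):
    """Hypothetical node interface: every node has a name and a run method."""

    def __init__(self, name, config=None):
        self.name = name
        self.config = config or {}

    @abstractmethod
    def run(self, data):
        """Take the shared data dict and return an updated dict."""


class LowercaseNames(AbstractNode):
    """Example node written by a data scientist; any library could be used here."""

    def run(self, data):
        key = self.config.get("key", "names")
        updated = dict(data)  # copy so the upstream dict is not mutated
        updated[key] = [name.lower() for name in data[key]]
        return updated


node = LowercaseNames("clean_names")
result = node.run({"names": ["Fish Tacos", "Cobb Salad"]})
```

Because run is abstract, a subclass that forgets to implement it cannot be instantiated, so a DAG runner can rely on every node exposing the same entry point.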
Primrose is run like an ETL pipeline in a single
Docker container for each configuration
For simpler deployments: Primrose uses a
“configuration as code” approach
Object configuration and DAG structure
are built in a configuration JSON file
Primrose validates the configuration
and instantiates the correct classes at
runtime
Different outputs and results for each
DAG
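The "validate the configuration, then instantiate the correct classes" step can be sketched with a class registry. The JSON schema below (a "nodes" mapping with a "class" key, plus an "edges" list) and the names CsvReader, TopNFilter, NODE_REGISTRY, and build_nodes are all assumptions for illustration; the slides do not show Primrose's real configuration format.

```python
import json


class CsvReader:
    """Hypothetical node class; it only stores its settings."""

    def __init__(self, path):
        self.path = path


class TopNFilter:
    """Hypothetical node class; it only stores its settings."""

    def __init__(self, n=4):
        self.n = n


# Only classes listed here may be named in a configuration file.
NODE_REGISTRY = {"CsvReader": CsvReader, "TopNFilter": TopNFilter}


def build_nodes(config_text):
    """Validate a JSON DAG description and instantiate its node objects."""
    config = json.loads(config_text)
    nodes = {}
    for name, spec in config["nodes"].items():
        class_name = spec.get("class")
        if class_name not in NODE_REGISTRY:
            raise ValueError(f"node {name!r}: unknown class {class_name!r}")
        params = {key: value for key, value in spec.items() if key != "class"}
        nodes[name] = NODE_REGISTRY[class_name](**params)
    # Every edge must connect two nodes that were actually defined.
    for upstream, downstream in config.get("edges", []):
        for endpoint in (upstream, downstream):
            if endpoint not in nodes:
                raise ValueError(f"edge references undefined node {endpoint!r}")
    return nodes


CONFIG = """
{
  "nodes": {
    "read_recipes": {"class": "CsvReader", "path": "recipes.csv"},
    "keep_top": {"class": "TopNFilter", "n": 4}
  },
  "edges": [["read_recipes", "keep_top"]]
}
"""
nodes = build_nodes(CONFIG)
```

With this shape, a new model is a new JSON file rather than new deployment code, which is the benefit the slide claims for configuration as code.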
Recipe recommender DAG JSON
Churn Model DAG JSON
Connect Feed DAG JSON
Primrose container
Success, fame, money...
The framework has helped our team grow
and develop production models
Deployed 3 production
models and 3 production
recommenders
Onboarded 6 members in less
than a year, everyone is working
in the framework!
We’re going to open-source Primrose!!! Keep on the lookout or contact us!
WW Recommender Examples
Food is at the core of our product
We know you and meet you where you are.
coffee
croissant
fish tacos
apple
cobb salad
pasta with red sauce
ice cream
Personalize your
experience using your data
Recipe Recommendations
Similar Recipes
Dinner Recommendations
Similar Recipes Flow
US WW Recipes
Similar Ingredients
Similar Names
Filters
dietary
course
cuisine
main ingredient
document = ingredient list or name string
lemmatize, tokenize, TF-IDF
Cosine similarity
Rank
*Only recipes with images*
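The tokenize, TF-IDF, cosine similarity, rank steps above can be sketched in plain Python. The deck's production DAG uses NLTK and scikit-learn; this dependency-free version is a stand-in that skips lemmatization, splits on whitespace, and uses a smoothed IDF. The function names and the three toy recipes are assumptions for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(documents, index):
    """Rank all other documents by similarity to documents[index]."""
    vectors = tfidf_vectors(documents)
    scores = [
        (cosine(vectors[index], vector), i)
        for i, vector in enumerate(vectors)
        if i != index
    ]
    return [i for _, i in sorted(scores, reverse=True)]


# Each document is an ingredient list, as on the slide.
recipes = [
    "chicken garlic lemon olive oil",
    "chicken garlic lime cilantro",
    "flour sugar butter chocolate",
]
ranking = most_similar(recipes, 0)
```

The same functions apply unchanged when the document is a recipe name string instead of an ingredient list; filters such as "only recipes with images" would be applied to the ranked output.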
Business Logic (filters)
Productionalize in Primrose DAG
Google BigQuery Data lake Reader
NLTK + Custom Lemmatization
Sklearn TF-IDF + cosine similarity
Write to GCS Bucket and Google MemoryStore
Success!
logging.info('Your newbie DS has written production quality code.')
Dinner Recommendations Flow
US WW Recipes
Similar Ingredients
Similar Names
Business Logic
Eligible Members
2 weeks of tracking history
Tracked >= 1 recipe
US members
[Figure: potential recs for each tracked recipe (most similar, 2nd most similar), with some candidates crossed out; n = 4 recommendations]
Productionalizing is easier the second time
Same BQ reader class,
different SQL input file
New postprocess class to sort, filter and interleave potential recommendations
Success!
logging.warning('Data Scientist is developing software engineering skills.')
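A hedged sketch of what a postprocess step that filters and interleaves potential recommendations might look like. The slides do not give the exact rule, so this assumes a round-robin over tracked recipes (all most-similar candidates first, then all 2nd most similar), skipping duplicates and recipes the member already tracked, and stopping at n = 4. The function name and the sample data are hypothetical.

```python
from itertools import zip_longest


def interleave_recommendations(similar_by_tracked, already_tracked, n=4):
    """Round-robin over ranked candidate lists, dropping repeats.

    similar_by_tracked: dict mapping a tracked recipe to its similar
                        recipes, ordered from most to least similar.
    already_tracked:    recipes the member has tracked (never recommended).
    """
    recommendations = []
    seen = set(already_tracked)
    # Each rank_group holds the k-th most similar recipe for every tracked recipe.
    for rank_group in zip_longest(*similar_by_tracked.values()):
        for recipe in rank_group:
            if recipe is None or recipe in seen:
                continue
            seen.add(recipe)
            recommendations.append(recipe)
            if len(recommendations) == n:
                return recommendations
    return recommendations


similar = {
    "fish tacos": ["shrimp tacos", "fish burrito", "ceviche"],
    "cobb salad": ["chef salad", "fish tacos", "caesar salad"],
    "pasta with red sauce": ["shrimp tacos", "lasagna"],
}
recs = interleave_recommendations(similar, already_tracked=similar.keys(), n=4)
# recs == ["shrimp tacos", "chef salad", "fish burrito", "lasagna"]
```

Interleaving keeps one heavily tracked recipe from filling all four slots with near-duplicates of itself.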
Final Deployment Architecture
Datalake (BigQuery)
Primrose containers: Dinner Recs and Similar Recipes, each refreshed daily
Redis cache (MemoryStore)
Recipe Recs micro-service container (Flask API) with an endpoint
Clients: Android, iOS, Web
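The deployment pattern of daily batch jobs writing to a cache that an API reads from can be sketched without any services running. InMemoryCache below is a stand-in for the Redis cache, and the key format, function names, and member IDs are assumptions for illustration; the real service is described only as a Flask API in front of MemoryStore.

```python
import json


class InMemoryCache:
    """Stand-in for a Redis-style key-value store (hypothetical)."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)


def write_batch(cache, recs_by_member):
    """Batch job side: cache one JSON-encoded list per member."""
    for member_id, recs in recs_by_member.items():
        cache.set(f"dinner_recs:{member_id}", json.dumps(recs))


def get_recommendations(cache, member_id, default=()):
    """API side: a single key lookup, with a fallback on a cache miss."""
    raw = cache.get(f"dinner_recs:{member_id}")
    return json.loads(raw) if raw is not None else list(default)


cache = InMemoryCache()
write_batch(cache, {"member-1": ["shrimp tacos", "chef salad"]})
hit = get_recommendations(cache, "member-1")
miss = get_recommendations(cache, "member-2")
```

Because predictions are precomputed, the request path does no model inference, only a lookup, which fits the "caching batched predictions" description earlier in the deck.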
Q & A
Open sourcing Primrose here soon:
https://github.com/ww-tech
Tech blog
https://medium.com/ww-tech-blog
Leveraging an in-house modeling framework for fun and profit

  • 1. Leveraging an in-house modeling framework for fun and profit Mike Skarlinski & Brian Graham {michael.skarlinski, brian.graham}@weightwatchers.com June 2019
  • 2. Outline • Introduction: data science at WW – the new Weight Watchers • Problem: scalable, simple modeling and recommendation systems with a small team • Solution: design and benefits of building a framework • Implementation: Examples of deployed recommenders
  • 3.
  • 4. WW is a data driven application to help members on their wellness journeys Member Social Network Activity & Food tracking Weight progress & goals Recipe & food database
  • 5. As a new team, we are tasked with building a foundation of data products Social Network: Connect Growth WW Program Infra- structure Churn model Return model LTV models Single Member View Recipe recommender Similar recipes Composite foods ontology Personalized feed Groups search Who to follow APIs Primrose
  • 6. Data science team’s success hinges on effectively sharing work and knowledge. Team: Brian Graham, Reka Daniel-Weiner, Yameng (Eliza) Zhang, Kevin Zecchini, Carl Anderson, Michael (Mike) Skarlinski, plus three open positions (hint hint). Timeline markers: May 2018, Jan. 2019, Feb. 2019, Mar. 2019, ..., Dec. 2019. How can we build software that helps us grow and develop as a team?
  • 7. WW recommender and modeling challenges
  • 8. Taking stock of our own challenges at WW What would make a good recommender system at WW? Slow serialization but our medium data can be kept in RAM... No live features but we know Docker, k8s... Easy onboarding mono repo with config as code...
  • 9. We built a framework to solve our challenges and enforce our design decisions (Open source coming soon!!!!!)
  • 10. Primrose: a framework for simple, quick modeling deployments
  • 11. Primrose has features to address each design consideration Python in-memory DAG runner, with no serialization between nodes of the DAG. DAG is defined as configuration-as-code approach -- one container for all models Abstract ML and data manipulation operations, data scientists can easily extend the framework Data science Infrastructure People Primrose: (Production In-Memory Solution) framework for solving WW’s most common use cases, caching batched predictions with machine-learning engineering baked-in.
  • 12. Primrose jobs are executed as Directed Acyclic Graphs (DAG)s in python Flexibility: any number of operations allowed in a single DAG, across any python library Data and functions are passed between nodes in an object that understands how to extract the correct data for each node
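The execution model on this slide can be illustrated with a small sketch. It is not Primrose's actual code: `DataObject`, `run_dag`, and the three toy nodes are hypothetical names chosen for illustration, and the only real library call is the standard-library `graphlib.TopologicalSorter` (Python 3.9+). It shows nodes running in dependency order inside one process, with a shared object handing each node only its upstream outputs, so nothing is serialized between steps.

```python
from graphlib import TopologicalSorter


class DataObject:
    """Hypothetical in-memory container: node name -> that node's output."""

    def __init__(self):
        self._store = {}

    def put(self, node_name, value):
        self._store[node_name] = value

    def inputs_for(self, upstream_names):
        # Hand back only the outputs the requesting node depends on.
        return {name: self._store[name] for name in upstream_names}


def run_dag(nodes, edges):
    """Run every node once, in dependency order, in a single process.

    nodes: node name -> callable taking a dict of upstream outputs
    edges: node name -> set of upstream node names
    """
    data = DataObject()
    graph = {name: edges.get(name, set()) for name in nodes}
    for name in TopologicalSorter(graph).static_order():
        data.put(name, nodes[name](data.inputs_for(graph[name])))
    return data


# Toy DAG: read -> sort -> top
nodes = {
    "read": lambda inputs: [3, 1, 2],
    "sort": lambda inputs: sorted(inputs["read"]),
    "top": lambda inputs: inputs["sort"][-1],
}
edges = {"sort": {"read"}, "top": {"sort"}}

result = run_dag(nodes, edges)
print(result.inputs_for(["top"]))  # {'top': 3}
```

Because outputs stay in the same Python process, a node can pass along any object (a DataFrame, a fitted model, a function) without a serialization step.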
  • 13. DAGs are composed of implementation agnostic, extensible nodes for data science Data scientists can write any class that matches the abstract interface & incorporate in their DAGs Data scientists can write individual nodes using any Python framework or library they choose
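A rough sketch of what "any class that matches the abstract interface" could look like. The interface below (`AbstractNode` with a single `run` method) is an assumption made for illustration; the slides do not show Primrose's real base classes or method names.

```python
from abc import ABC, abstractmethod


class AbstractNode(ABC):
    """Hypothetical node interface; the real Primrose interface may differ."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def run(self, inputs: dict) -> dict:
        """Take upstream data, return this node's output."""


class LowercaseNames(AbstractNode):
    def run(self, inputs):
        return {"names": [n.lower() for n in inputs["names"]]}


class DropDuplicates(AbstractNode):
    def run(self, inputs):
        # dict.fromkeys keeps first occurrences and preserves order.
        return {"names": list(dict.fromkeys(inputs["names"]))}


# Any class implementing run() can be chained, whatever library it uses inside.
pipeline = [LowercaseNames("lower"), DropDuplicates("dedupe")]
payload = {"names": ["Fish Tacos", "fish tacos", "Cobb Salad"]}
for node in pipeline:
    payload = node.run(payload)

print(payload)  # {'names': ['fish tacos', 'cobb salad']}
```

The abstract base class makes the contract explicit: a subclass that forgets to implement `run` cannot be instantiated, so the mistake surfaces before a job is deployed.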
  • 14. Primrose is run like an ETL pipeline in a single docker container for each configuration
  • 15. For simpler deployments: Primrose uses a “configuration as code” approach. Object configuration and DAG structure are built in a configuration JSON. Primrose validates the configuration and instantiates the correct classes at runtime. Different outputs and results for each DAG: Recipe recommender DAG JSON, Churn Model DAG JSON, Connect Feed DAG JSON, all run by the same Primrose container. Success, fame, money...
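The validate-then-instantiate step might be sketched as follows. The JSON layout (`nodes`, `class`, `params`, `destinations`), the `REGISTRY` dict, and the two toy classes are all assumptions for illustration, not Primrose's real schema.

```python
import json


class ListReader:
    def __init__(self, values):
        self.values = values

    def run(self, _):
        return list(self.values)


class Scaler:
    def __init__(self, factor):
        self.factor = factor

    def run(self, data):
        return [x * self.factor for x in data]


# Hypothetical registry: class names allowed in a configuration file.
REGISTRY = {"ListReader": ListReader, "Scaler": Scaler}

CONFIG = """
{
  "nodes": {
    "read":  {"class": "ListReader", "params": {"values": [1, 2, 3]},
              "destinations": ["scale"]},
    "scale": {"class": "Scaler", "params": {"factor": 10},
              "destinations": []}
  }
}
"""


def validate(config):
    """Fail fast on unknown classes or edges pointing at missing nodes."""
    nodes = config["nodes"]
    for name, spec in nodes.items():
        if spec["class"] not in REGISTRY:
            raise ValueError(f"{name}: unknown class {spec['class']!r}")
        for dest in spec["destinations"]:
            if dest not in nodes:
                raise ValueError(f"{name}: unknown destination {dest!r}")


def build(config):
    """Validate, then instantiate one object per configured node."""
    validate(config)
    return {
        name: REGISTRY[spec["class"]](**spec["params"])
        for name, spec in config["nodes"].items()
    }


instances = build(json.loads(CONFIG))
output = instances["scale"].run(instances["read"].run(None))
print(output)  # [10, 20, 30]
```

With this shape, shipping a new model means adding a JSON file rather than a new container image, which is the point the slide makes with one container serving several DAG configurations.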
  • 16. The framework has helped our team grow and develop production models. Deployed 3 production models and 3 production recommenders. Onboarded 6 members in less than a year, and everyone is working in the framework! We’re going to open-source Primrose!!! Keep on the lookout or contact us!
  • 18. Food is at the core of our product
  • 19. We know you and meet you where you are. coffee croissant fish tacos apple cobb salad pasta with red sauce ice cream Personalize your experience using your data
  • 20. Recipe Recommendations Similar Recipes Dinner Recommendations
  • 21. Similar Recipes Flow US WW Recipes Similar Ingredients Similar Names Filters dietary course cuisine main ingredient document = ingredient list or name string lemmatize, tokenize, TF-IDF Cosine similarity Rank *Only recipes with images*
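The ranking step in this flow (tokenize, TF-IDF, cosine similarity, rank) can be sketched in plain Python. This is a dependency-free stand-in, not the team's code: the production DAG described in the deck uses NLTK lemmatization and scikit-learn, both replaced here by a simple lowercase split and a hand-written TF-IDF with the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1`. The recipe data and function names are made up for illustration, and the business filters are omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Stand-in for lemmatize + tokenize: lowercase and split on commas/spaces.
    return text.lower().replace(",", " ").split()


def tfidf_vectors(documents):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def most_similar(recipes, query_id, top_n=2):
    """Rank every other recipe by cosine similarity to the query recipe."""
    ids = list(recipes)
    vectors = dict(zip(ids, tfidf_vectors([recipes[i] for i in ids])))
    scores = [(other, cosine(vectors[query_id], vectors[other]))
              for other in ids if other != query_id]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Illustrative ingredient lists (each "document" is one recipe's ingredients).
recipes = {
    "fish_tacos": "cod, tortilla, lime, cabbage, cilantro",
    "shrimp_tacos": "shrimp, tortilla, lime, cabbage, avocado",
    "cobb_salad": "lettuce, egg, bacon, avocado, chicken",
    "apple_pie": "apple, flour, butter, sugar, cinnamon",
}
ranked = most_similar(recipes, "fish_tacos")
print(ranked[0][0])  # shrimp_tacos
```

The same function works on name strings instead of ingredient lists, which matches the slide's "document = ingredient list or name string".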
  • 22. Productionalize in Primrose DAG: Google BigQuery data lake reader, NLTK + custom lemmatization, Sklearn TF-IDF + cosine similarity, business logic (filters), write to GCS bucket and Google MemoryStore. Success! logging.info('Your newbie DS has written production quality code.')
  • 25. Dinner Recommendations Flow US WW Recipes Similar Ingredients Similar Names Business Logic Eligible Members 2 weeks of tracking history Tracked >= 1 recipe US members Potential Recs tracked most similar X XX X 2nd most sim. n = 4 recommendations
  • 26. Productionalizing is easier the second time. Same BQ reader class, different SQL input file. New postprocess class to sort, filter and interleave potential recommendations. Success! logging.warning('Data Scientist is developing software engineering skills.')
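One possible reading of the "sort, filter and interleave" postprocess step, sketched under assumptions: each tracked recipe contributes a ranked list of similar recipes, candidates are taken round-robin (every top match first, then every second-best match), recipes the member already tracked are filtered out, and the result is capped at n = 4. The function name, the round-robin order, and the filter rule are guesses for illustration, not the team's implementation.

```python
from itertools import zip_longest


def interleave_recommendations(tracked, similar, n=4):
    """Pick up to n recommendations for one member.

    tracked: recipe ids the member tracked recently
    similar: recipe id -> list of similar recipe ids, best match first
    """
    ranked_lists = [similar.get(recipe, []) for recipe in tracked]
    already_tracked = set(tracked)
    picks = []
    # Tier 1 holds each tracked recipe's best match, tier 2 the second best...
    for tier in zip_longest(*ranked_lists):
        for candidate in tier:
            if (candidate is None              # this list ran out
                    or candidate in already_tracked
                    or candidate in picks):    # no duplicate recommendations
                continue
            picks.append(candidate)
            if len(picks) == n:
                return picks
    return picks


# Illustrative data only.
tracked = ["a", "b", "c"]
similar = {"a": ["d", "e"], "b": ["d", "f"], "c": ["a", "g"]}
recs = interleave_recommendations(tracked, similar)
print(recs)  # ['d', 'e', 'f', 'g']
```

Interleaving by tier keeps one heavily tracked recipe from dominating the list: every tracked recipe gets a chance to contribute its best match before any second-best matches are used.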
  • 27. Final Deployment Architecture: the BigQuery data lake feeds two Primrose containers (Dinner Recs and Similar Recipes), each refreshed daily. Both write to a Redis cache (MemoryStore). A Recipe Recs micro-service container (Flask API) exposes an endpoint to the clients: Android, iOS, Web.
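The split between a daily batch job and a thin serving layer could look roughly like this. `InMemoryCache` is a dict-backed stand-in for the Redis cache so the sketch runs without a server, and the key format, function names, and member ids are hypothetical. In the deployed system the lookup would sit behind the Flask endpoint.

```python
import json


class InMemoryCache:
    """Stand-in for a key-value cache offering string set/get."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def refresh_cache(cache, batch_predictions):
    """Batch side: write one JSON-encoded list of recipe ids per member."""
    for member_id, recipe_ids in batch_predictions.items():
        cache.set(f"dinner_recs:{member_id}", json.dumps(recipe_ids))


def get_recommendations(cache, member_id):
    """Serving side: a cache lookup, with no model code on the request path."""
    raw = cache.get(f"dinner_recs:{member_id}")
    return json.loads(raw) if raw is not None else []


cache = InMemoryCache()
refresh_cache(cache, {"member-1": ["fish_tacos", "cobb_salad"]})
print(get_recommendations(cache, "member-1"))  # ['fish_tacos', 'cobb_salad']
print(get_recommendations(cache, "member-2"))  # []
```

Keeping the model out of the request path means the API only needs to stay up and read keys, while retraining and scoring happen on the daily refresh.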
  • 28. Q & A Open sourcing primrose here soon: https://github.com/ww-tech Tech blog https://medium.com/ww-tech-blog