Talk given by Mike Skarlinski and Brian Graham of the WW (the new Weight Watchers) data science team at the 5th NYC RecSys meetup, June 20, 2019, hosted at WW HQ
1. Leveraging an in-house modeling framework for fun and profit
Mike Skarlinski & Brian Graham
{michael.skarlinski, brian.graham}@weightwatchers.com
June 2019
2. Outline
• Introduction: data science at WW – the new Weight Watchers
• Problem: scalable, simple modeling and recommendation systems with a small team
• Solution: design and benefits of building a framework
• Implementation: examples of deployed recommenders
4. WW is a data-driven application to help members on their wellness journeys
• Member social network
• Activity & food tracking
• Weight progress & goals
• Recipe & food database
5. As a new team, we are tasked with building a foundation of data products
Product areas: Social Network (Connect), Growth, WW Program, Infrastructure
Data products: churn model, return model, LTV models, Single Member View, recipe recommender, similar recipes, composite foods ontology, personalized feed, groups search, who to follow, APIs, Primrose
6. The data science team’s success hinges on sharing work and knowledge effectively
Team timeline (May 2018, Jan. 2019, Feb. 2019, Mar. 2019, Dec. 2019): Brian Graham, Reka Daniel-Weiner, Yameng (Eliza) Zhang, Kevin Zecchini, Carl Anderson, Michael (Mike) Skarlinski, plus open positions (hint hint)
How can we build software that helps us grow and develop as a team?
8. Taking stock of our own challenges at WW
What would make a good recommender system at WW?
• Slow serialization, but our medium data can be kept in RAM
• No live features, but we know Docker and k8s
• Easy onboarding: a mono repo with configuration as code
9. We built a framework to solve our challenges and enforce our design decisions
(Open source coming soon!!!!!)
11. Primrose has features to address each design consideration: data science, infrastructure, people
• A Python in-memory DAG runner, with no serialization between nodes of the DAG
• DAGs defined with a configuration-as-code approach: one container for all models
• Abstract ML and data-manipulation operations that data scientists can easily extend
Primrose (Production In-Memory Solution): a framework for solving WW’s most common use cases (caching batched predictions), with machine-learning engineering baked in.
12. Primrose jobs are executed as directed acyclic graphs (DAGs) in Python
• Flexibility: any number of operations is allowed in a single DAG, using any Python library
• Data and functions are passed between nodes in an object that understands how to extract the correct data for each node
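The in-memory DAG idea above can be sketched in a few lines. This is a minimal illustration, not Primrose's code: `DataObject` and `run_dag` are hypothetical names, and nodes are plain callables. It runs nodes in topological order with the standard library's `graphlib.TopologicalSorter`, and each node receives only its upstream outputs, which are passed by reference, so nothing is serialized between nodes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, List


class DataObject:
    """Hypothetical in-memory store for node outputs, keyed by node name.

    Nothing is serialized between nodes: a downstream node receives a
    reference to the very object its upstream node returned.
    """

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def add(self, node_name: str, value: Any) -> None:
        self._store[node_name] = value

    def get(self, node_name: str) -> Any:
        return self._store[node_name]

    def get_upstream(self, upstreams: List[str]) -> Dict[str, Any]:
        # Hand each node only the outputs of the nodes it depends on.
        return {name: self._store[name] for name in upstreams}


def run_dag(
    nodes: Dict[str, Callable[[Dict[str, Any]], Any]],
    edges: Dict[str, List[str]],
) -> DataObject:
    """Run every node once, upstream nodes first.

    `edges` maps a node name to the names of its upstream nodes; a cycle
    raises graphlib.CycleError before any node runs.
    """
    graph = {name: edges.get(name, []) for name in nodes}
    data = DataObject()
    for name in TopologicalSorter(graph).static_order():
        data.add(name, nodes[name](data.get_upstream(graph[name])))
    return data


# Usage: a reader feeding a reducer.
nodes = {
    "reader": lambda _: [3, 1, 2],
    "top": lambda inputs: max(inputs["reader"]),
}
result = run_dag(nodes, {"top": ["reader"]})
print(result.get("top"))  # 3
```

Keeping every output in one process is what makes the "no serialization" property possible; the trade-off is that the whole job has to fit in RAM, which matches the "medium data" assumption in the challenges slide.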
13. DAGs are composed of implementation-agnostic, extensible nodes for data science
• Data scientists can write any class that matches the abstract interface and incorporate it in their DAGs
• Data scientists can write individual nodes using any Python framework or library they choose
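An "abstract interface" of this kind is commonly expressed with Python's `abc` module. The sketch below is an assumption about the shape of such an interface, not Primrose's actual base class: `AbstractNode`, its `run` method and the `config` dict are hypothetical. Any subclass that implements `run` can be dropped into a DAG, and Python refuses to instantiate a class that forgets to.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class AbstractNode(ABC):
    """Hypothetical base class: a DAG node is anything that implements run()."""

    def __init__(self, name: str, config: Dict[str, Any]) -> None:
        self.name = name
        self.config = config

    @abstractmethod
    def run(self, inputs: Dict[str, Any]) -> Any:
        """Take the upstream nodes' outputs and return this node's output."""


class LowercaseNames(AbstractNode):
    """Example user-written node; run() may call any Python library."""

    def run(self, inputs: Dict[str, Any]) -> Any:
        field = self.config.get("field", "name")
        return [row[field].lower() for row in inputs["reader"]]


node = LowercaseNames("lowercase", {"field": "name"})
print(node.run({"reader": [{"name": "Fish Tacos"}, {"name": "Cobb Salad"}]}))
# ['fish tacos', 'cobb salad']
```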
14. Primrose is run like an ETL pipeline, in a single Docker container for each configuration
15. For simpler deployments, Primrose uses a “configuration as code” approach
• Object configuration and DAG structure are built in a configuration JSON file
• Primrose validates the configuration and instantiates the correct classes at runtime
• Different outputs and results for each DAG
Recipe recommender DAG JSON, Churn model DAG JSON, Connect feed DAG JSON → Primrose container → success, fame, money...
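Validating a JSON configuration and instantiating classes at runtime usually comes down to a name-to-class registry. The sketch below assumes a config schema (`nodes`, `class`, `params`, `upstream`) and two node classes (`SqlReader`, `CacheWriter`) that are illustrative only; Primrose's real configuration format is not shown in the talk.

```python
import json
from typing import Any, Dict, Type

# Hypothetical registry mapping class names used in the JSON to Python classes.
REGISTRY: Dict[str, Type] = {}


def register(cls: Type) -> Type:
    REGISTRY[cls.__name__] = cls
    return cls


@register
class SqlReader:
    def __init__(self, query_file: str) -> None:
        self.query_file = query_file


@register
class CacheWriter:
    def __init__(self, key_prefix: str) -> None:
        self.key_prefix = key_prefix


def build_dag(config_text: str) -> Dict[str, Any]:
    """Validate a JSON DAG config, then instantiate each node's class."""
    nodes = json.loads(config_text)["nodes"]
    instances = {}
    for name, spec in nodes.items():
        class_name = spec["class"]
        if class_name not in REGISTRY:
            raise ValueError(f"node {name!r}: unknown class {class_name!r}")
        for upstream in spec.get("upstream", []):
            if upstream not in nodes:
                raise ValueError(f"node {name!r}: unknown upstream {upstream!r}")
        # Constructor arguments come straight from the config's params.
        instances[name] = REGISTRY[class_name](**spec.get("params", {}))
    return instances


CONFIG = """
{
  "nodes": {
    "read_recipes": {"class": "SqlReader", "params": {"query_file": "recipes.sql"}},
    "write_cache": {
      "class": "CacheWriter",
      "params": {"key_prefix": "similar_recipes:"},
      "upstream": ["read_recipes"]
    }
  }
}
"""
dag = build_dag(CONFIG)
print(type(dag["write_cache"]).__name__)  # CacheWriter
```

Because the DAG lives in data rather than code, a new model can reuse an existing class with different parameters (for example, the same reader pointed at another SQL file) without touching Python.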
16. The framework has helped our team grow and develop production models
• Deployed 3 production models and 3 production recommenders
• Onboarded 6 members in less than a year; everyone is working in the framework!
We’re going to open-source Primrose! Keep a lookout or contact us!
19. We know you and meet you where you are.
coffee, croissant, fish tacos, apple, Cobb salad, pasta with red sauce, ice cream
Personalize your experience using your data
21. Similar Recipes Flow
• Input: US WW recipes
• Similar ingredients and similar names: document = ingredient list or name string; lemmatize, tokenize, TF-IDF; cosine similarity; rank
• Filters: dietary, course, cuisine, main ingredient
*Only recipes with images*
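The similarity step can be illustrated without any dependencies. The talk's pipeline used NLTK for lemmatization and scikit-learn for TF-IDF; this sketch swaps in a plain regex tokenizer (no lemmatization) and a hand-rolled TF-IDF using the smoothed IDF, ln((1 + n) / (1 + df)) + 1, with L2-normalized vectors, so the dot product of two vectors is their cosine similarity. The recipe fields and the filter rule (same course, has an image) are hypothetical stand-ins for WW's business logic.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphabetic runs; a crude stand-in for NLTK lemmatization.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """L2-normalized TF-IDF vectors with smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def most_similar(recipes: List[dict], query_idx: int, k: int = 2,
                 field: str = "ingredients") -> List[str]:
    """Rank other recipes by similarity on `field`, keeping only filtered ones."""
    vectors = tfidf_vectors([r[field] for r in recipes])
    query = recipes[query_idx]
    scored = []
    for i, r in enumerate(recipes):
        # Hypothetical business-logic filter: same course and has an image.
        if i == query_idx or not r["has_image"] or r["course"] != query["course"]:
            continue
        scored.append((cosine(vectors[query_idx], vectors[i]), r["name"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


recipes = [
    {"name": "Chicken tacos", "ingredients": "chicken, tortilla, salsa, lime", "course": "dinner", "has_image": True},
    {"name": "Fish tacos", "ingredients": "fish, tortilla, salsa, cabbage", "course": "dinner", "has_image": True},
    {"name": "Chicken salad", "ingredients": "chicken, lettuce, tomato", "course": "dinner", "has_image": True},
    {"name": "Salsa dip", "ingredients": "salsa, tortilla chips", "course": "snack", "has_image": True},
    {"name": "Lime chicken", "ingredients": "chicken, lime", "course": "dinner", "has_image": False},
]
print(most_similar(recipes, 0))  # ['Fish tacos', 'Chicken salad']
```

Passing `field="name"` ranks by recipe names instead of ingredient lists, mirroring the two "similar" branches on the slide.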
22. Productionalize in Primrose DAG
• Google BigQuery data lake reader
• NLTK + custom lemmatization
• scikit-learn TF-IDF + cosine similarity
• Business logic (filters)
• Write to GCS bucket and Google Cloud Memorystore
Success!
logging.info('Your newbie DS has written production-quality code.')
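The last step caches batched predictions so that serving becomes a single key lookup. In this sketch a plain `dict` stands in for Memorystore (a managed Redis); with a real Redis client the assignment would become a `set(key, value)` call instead. The key prefix and the JSON encoding of each recommendation list are assumptions for illustration.

```python
import json
from typing import Dict, List, MutableMapping


def cache_batch_predictions(
    predictions: Dict[str, List[str]],
    store: MutableMapping[str, str],
    key_prefix: str = "similar_recipes:",  # hypothetical key scheme
) -> int:
    """Write one JSON-encoded recommendation list per item; return entries written."""
    for item_id, recs in predictions.items():
        store[key_prefix + item_id] = json.dumps(recs)
    return len(predictions)


store: Dict[str, str] = {}  # stand-in for a Redis/Memorystore connection
written = cache_batch_predictions({"r1": ["r7", "r3"], "r2": ["r1"]}, store)
print(written, json.loads(store["similar_recipes:r1"]))  # 2 ['r7', 'r3']
```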
25. Dinner Recommendations Flow
• Recipe similarity: US WW recipes, similar ingredients and similar names, then business logic
• Eligible members: 2 weeks of tracking history, tracked >= 1 recipe, US members
• Potential recs: the most similar and 2nd most similar recipes to each tracked recipe
• n = 4 recommendations
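The eligibility rules translate directly into a filter. This sketch reads "2 weeks of tracking history" as "first tracked at least two weeks ago"; it could equally mean a two-week lookback window, which the slide does not settle. The `Member` fields are hypothetical; in the talk these rules would live in the SQL fed to the BigQuery reader.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class Member:
    member_id: str
    country: str
    first_tracked: date   # hypothetical: first day of the member's tracking history
    recipes_tracked: int


def eligible_members(members: List[Member], today: date) -> List[str]:
    """Return IDs of members who pass all three eligibility rules."""
    cutoff = today - timedelta(weeks=2)
    return [
        m.member_id
        for m in members
        if m.country == "US"              # US members only
        and m.first_tracked <= cutoff     # at least 2 weeks of tracking history
        and m.recipes_tracked >= 1        # tracked at least one recipe
    ]


members = [
    Member("a", "US", date(2019, 6, 1), 3),
    Member("b", "US", date(2019, 6, 15), 5),   # history too short
    Member("c", "CA", date(2019, 1, 1), 2),    # not a US member
    Member("d", "US", date(2019, 1, 1), 0),    # no recipes tracked
]
print(eligible_members(members, today=date(2019, 6, 20)))  # ['a']
```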
26. Productionalizing is easier the second time
Same BQ reader class, different SQL input file
New postprocess class to sort, filter and interleave potential recommendations
Success!
logging.warning('Data scientist is developing software engineering skills.')
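A postprocess step that sorts, filters and interleaves potential recommendations could look like the class below. It is a sketch under assumptions, not the deployed class: `InterleavePostprocess` and its inputs are hypothetical. Each tracked recipe's candidates are sorted by similarity score, candidates the member already tracked (or that were already picked) are filtered out, and the lists are interleaved round-robin (every tracked recipe's most similar candidate first, then every 2nd most similar one) until n = 4 recommendations are collected.

```python
from typing import Dict, List, Tuple


class InterleavePostprocess:
    """Hypothetical postprocess step: sort, filter, then interleave candidates."""

    def __init__(self, n: int = 4) -> None:
        self.n = n

    def run(self, tracked: List[str],
            similar: Dict[str, List[Tuple[str, float]]]) -> List[str]:
        # Sort: each tracked recipe's candidates by similarity, highest first.
        ranked = {
            t: [rid for rid, _ in sorted(similar.get(t, []), key=lambda p: p[1], reverse=True)]
            for t in tracked
        }
        recs: List[str] = []
        seen = set(tracked)  # filter: never recommend an already-tracked recipe
        depth = max((len(c) for c in ranked.values()), default=0)
        # Interleave: rank 0 = most similar, rank 1 = 2nd most similar, ...
        for rank in range(depth):
            for t in tracked:
                candidates = ranked[t]
                if rank < len(candidates) and candidates[rank] not in seen:
                    recs.append(candidates[rank])
                    seen.add(candidates[rank])
                    if len(recs) == self.n:
                        return recs
        return recs


similar = {
    "a": [("x", 0.5), ("y", 0.9), ("b", 0.95)],
    "b": [("z", 0.8), ("y", 0.7), ("w", 0.1)],
}
print(InterleavePostprocess(n=4).run(["a", "b"], similar))  # ['z', 'y', 'x', 'w']
```

Round-robin interleaving keeps the recommendations varied: no single tracked recipe can fill all four slots while another tracked recipe still has an unused close match.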