In this presentation you will learn:
- The definition of Machine Learning and the different types of problems it can solve
- A framework to decide whether your specific problem could or should be solved with Machine Learning
- The role a Product Manager plays in each part of the Machine Learning lifecycle
11. Agenda
1. Overview: What is ML?
2. To ML or NOT to ML: When should I use it?
3. Let's do ML: What is the ML lifecycle?
4. Communication: How should I partner with ML scientists?
14. What is ML?
Classical Programming: Rules + Data → Answers
Machine Learning: Data + Answers → Rules
Pipeline: Problem → Data → Algorithm → Model → Output
"The field of study that gives computers the ability to learn without being explicitly programmed"
— Arthur Samuel, pioneer of AI research
15. ML and Statistics
ML optimizes for predictive performance, while statistics places importance on interpretability and parsimony/simplicity.

Statistics | Simply Put | ML
Dependent/Response/Output Variable | The thing you're trying to predict | Label or Target
Independent/Explanatory/Input Variable | The data that help you make predictions | Feature
Data Transformation | Reshaping data to get more value out of it | Feature Engineering
Variable/Subset Selection | Using the most valuable data | Feature Selection
16. What is ML?
Supervised Learning
- Regression (Quantity): Linear, Ridge, Lasso
- Classification (Category): Trees, SVM, KNN
Unsupervised Learning
- K-Means, PCA, Collaborative Filtering
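As a toy illustration of supervised classification (not from the deck), the sketch below is a minimal 1-nearest-neighbor classifier: it predicts a label for a new point by copying the label of the closest labeled example. The points and labels are made up for illustration.

```python
# Minimal 1-nearest-neighbor classifier: supervised classification
# predicts a category for a new point from labeled examples.
def nearest_neighbor(train, new_point):
    """train: list of ((x, y), label) pairs; returns the label of the closest point."""
    def sq_dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    closest = min(train, key=lambda pair: sq_dist(pair[0], new_point))
    return closest[1]

labeled = [((1, 1), "cat"), ((1, 2), "cat"), ((8, 8), "dog"), ((9, 8), "dog")]
print(nearest_neighbor(labeled, (2, 1)))  # → cat
```

The labeled examples play the role of the "answers" the deck says supervised learning needs.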
19. To ML when your problem…
- Handles very complex logic
- Scales up fast
- Adapts in real time
- Requires specialized personalization
…and has existing examples of actual answers
20–25. Sample ML problems

Problem type | Description | Example
Ranking | Helping users find the most relevant thing | Ranking algorithm within Amazon Search
Recommendation | Giving users the thing they may be most interested in | Recommendations from Netflix; room suggestions from Google Calendar
Classification | Figuring out what kind of thing something is | Product classification for the Amazon catalog (e.g. High-Low Dress, Straight Dress, Striped Skirt, Graphic Shirt)
Regression | Predicting a numerical value of a thing | Predicting sales for specific Amazon products (seasonality, out of stock, promotions)
Clustering | Putting similar things together | Related news from Google Search
Anomaly | Finding uncommon things | Fruit freshness, before vs. after (Good, Damage, Serious Damage, Decay)
26. To ML when your data…
1. Can be used: Available, Accessible
2. Should be used: Secure, Respects privacy
3. Is high quality: Fresh, Unbiased, Relevant, Representative
27. NOT to ML when your problem…
- Can be solved by simple rules
- Does not adapt to new data
- Requires full interpretability
- Requires 100% accuracy
28. NOT to ML when your data…
1. Cannot be used: Unavailable, Inaccessible
2. Should not be used: Unsecure, Privacy concerns
3. Is low quality: Stale, Biased, Irrelevant, Scarce or Incomplete
29. Exercise: To ML or Not To ML
A. What apparel items should be protected by copyright laws?
B. Which resumes should we prioritize to interview for our candidate pipeline?
C. What products should be exclusively sold to Hispanics in the US?
D. Which sellers have the greatest revenue potential?
E. Where should Amazon build HQ2?
F. Which search queries should we scope for the Amazon Fresh store?
34. Formulate the problem
People | Processes | Tools & Systems
1 PROBLEM → 2 DATA → 3 FEATURES → 4 MODEL
- What is the problem to solve?
- What is the measurable goal?
- What do you want to predict?
35. Select and preprocess data
1 PROBLEM → 2 DATA → 3 FEATURES → 4 MODEL
Selecting: Available, Missing, Discarding
Preprocessing: Formatting, Cleaning, Sampling
36. Feature engineering
1 PROBLEM → 2 DATA → 3 FEATURES → 4 MODEL
- Feature: an individual measurable property or characteristic of the phenomenon being observed
- Goal: use domain and data knowledge to develop relevant features from existing raw features in the data, to increase the predictive power of ML
Techniques: Scaling | Decomposition | Aggregation
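Two of the techniques named above can be sketched in a few lines of Python (an illustration, not from the deck): min-max scaling squeezes a numeric feature into [0, 1], and aggregation rolls raw events up into a per-entity feature such as orders per customer. The customer IDs are hypothetical.

```python
# Scaling: min-max scale a numeric feature into the range [0, 1].
def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Aggregation: roll raw order events up into a per-customer order count.
def orders_per_customer(order_customer_ids):
    counts = {}
    for customer_id in order_customer_ids:
        counts[customer_id] = counts.get(customer_id, 0) + 1
    return counts

print(min_max_scale([10, 20, 30]))           # → [0.0, 0.5, 1.0]
print(orders_per_customer(["A", "B", "A"]))  # → {'A': 2, 'B': 1}
```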
37. Train, test and tune models
1 PROBLEM → 2 DATA → 3 FEATURES → 4 MODEL
Data Set → split into Training Data and Test Data → Model Training → ML Model
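The split above can be sketched in plain Python (the 80/20 ratio is an assumption; the deck doesn't prescribe one): shuffle the dataset, then hold out a fraction as test data the model never sees during training.

```python
import random

# Hold out a test set: shuffle, then split off a fraction for testing.
def train_test_split(data, test_fraction=0.2, seed=42):
    shuffled = data[:]                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(10)))
print(len(train), len(test))  # → 8 2
```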
38. Productionize
Integrate the ML solution with existing software, and keep it running successfully over time.
- Deployment environment
- Data storage
- Monitoring and maintenance
- Security and privacy
Great ML problems sometimes cannot be productionized due to high implementation costs or an inability to be tested in practice.
39. Product Manager role in Machine Learning
ML Lifecycle:
1. Formulate problem
2. Select and preprocess data
3. Feature engineering
4. Train, test, and tune models
41. Formulate the problem
PM ROLE
To formulate the problem, ask the following questions:
- What is the problem?
- What is the measurable goal?
- What do you want to predict?
Note: the type of problem you solve defines the algorithm to use (e.g., clustering → k-means).
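To make the clustering → k-means mapping concrete, here is a toy one-dimensional k-means step (an illustration, not from the deck): assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. Real k-means repeats this until the centroids stop moving.

```python
# One k-means iteration in 1-D: assign points to the nearest centroid,
# then update each centroid to the mean of its assigned points.
def kmeans_step(points, centroids):
    clusters = {c: [] for c in centroids}
    for p in points:
        nearest = min(centroids, key=lambda c: abs(c - p))
        clusters[nearest].append(p)
    return [sum(members) / len(members) for members in clusters.values() if members]

points = [1.0, 1.5, 2.0, 10.0, 11.0, 12.0]
print(kmeans_step(points, [2.0, 10.0]))  # → [1.5, 11.0]
```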
42. Problem: You have not used ML before
PM ROLE
To formulate the problem, answer the same questions:
- What is the problem? Each week, the New Seller Success team onboards hundreds of new Sellers, and this group is expected to grow X% YoY. Personalized coaching time, however, doesn't scale. As such, the team needed a way to accurately predict top performers to double down on.
- What is the measurable goal? Increase revenue growth for coached (vs. non-coached) Sellers by X% at the end of six months.
- What do you want to predict? The top 5% of net new Sellers six months after their launch.
43. Problem: You are already using ML
PM ROLE
To formulate the problem, answer the same questions:
- What is the problem? Units per order for category X in the US have remained flat YoY, and engagement has declined as measured by purchase-week frequency.
- What is the measurable goal? Increase unit order rate for category X in the US by +X% within the next X months without affecting revenue.
- What do you want to predict? Category X products that are more likely to be added to a customer's cart, based on the items already in the cart.
46. Selecting data
PM ROLE
Select the right datasets (Public, Custom, Internal) for the right purposes:
- Train and tune models
- Replace flawed or outdated data
- Measure success
47. Preprocessing data: Formatting
PM ROLE
Format your data consistently, so you can work with it.

Data Type | Possible Values | Example Usage
Binary | 0, 1 (arbitrary labels) | binary outcome ("yes/no", "true/false", "success/failure", etc.)
Categorical or nominal | 1, 2, ..., K (arbitrary labels) | categorical outcome (specific blood type, political party, word, etc.)
Ordinal | integer or real number (arbitrary scale) | relative score, significant only for creating a ranking
Binomial | 0, 1, ..., N | number of successes (e.g. yes votes) out of N possible
Count | nonnegative integers (0, 1, ...) | number of items (telephone calls, people, molecules, etc.) in a given interval/area
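One practical consequence of the table above (my illustration, not from the deck): categorical values carry no order, so rather than mapping them to arbitrary integers they are often one-hot encoded, one indicator per category.

```python
# One-hot encode a categorical value: one indicator column per category.
def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]

blood_types = ["A", "B", "AB", "O"]
print(one_hot("AB", blood_types))  # → [0, 0, 1, 0]
```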
49. Preprocessing data: Cleaning
PM ROLE
Cleaning means removing or fixing missing data.
[Example table: "iphone case" search sessions with columns Keywords, Recognized Session?, Is Prime?, Customer ID, Device, # Searches, $ — many rows are missing a Customer ID, Device, or $ value.]
Strategies for missing values:
- Deletion: drop rows with missing values
- Dummy substitution: fill with a placeholder (e.g. $0)
- Mean substitution: fill with the column mean
- Frequent substitution: fill with the most common value (e.g. Mobile)
- Lookup substitution: fill from another data source
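Mean substitution, one of the strategies above, can be sketched in a few lines of Python (my illustration; `None` stands in for a missing cell):

```python
# Mean substitution: replace missing entries (None) in a numeric column
# with the mean of the values that are present.
def fill_with_mean(column):
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]

print(fill_with_mean([3, None, 10, None, 2]))  # → [3, 5.0, 10, 5.0, 2]
```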
50. Preprocessing data: Sampling
PM ROLE
Sampling chooses representative data to solve your problem.
Issues: Seasonality, Trends, Leakage, Biases
Strategies: Random, Stratified
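Stratified sampling can be sketched as follows (my illustration; the `device` field and the 10% rate are hypothetical): sample each group at the same rate, so the sample keeps the population's proportions instead of drifting by chance.

```python
import random

# Stratified sampling: sample each group (stratum) at the same rate so
# the sample preserves the population's group proportions.
def stratified_sample(rows, key, fraction, seed=0):
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    rng = random.Random(seed)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

rows = [{"device": "Mobile"}] * 80 + [{"device": "Desktop"}] * 20
picked = stratified_sample(rows, "device", 0.1)
print(len(picked))  # → 10
```

A purely random 10% sample of these 100 rows could easily over- or under-represent Desktop; the stratified version always keeps the 80/20 ratio.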
51. Preprocessing data: Unintended bias
PM ROLE
Examples where biased samples cause unintended harm: deciding where to offer Prime Free Same-Day Delivery; auto-labeling images.
52. Preprocessing data: Labeling
PM ROLE
Labeling is tagging or classifying your data.
Approaches: Automated | Manual
Guarding against biases: Plurality, Auditors, Gold Standards, Incentives, Metrics
54. Feature engineering
PM ROLE
Feature engineering develops relevant features from existing raw features.

ML | Statistics | Simply Put
Label or Target | Dependent/Response/Output Variable | The thing you're trying to predict
Feature | Independent/Explanatory/Input Variable | The data that help you make predictions
Feature Engineering | Data Transformation | Reshaping data to get more value
Feature Selection | Variable/Subset Selection | Using the most valuable data
56. Train, test and tune models
PM ROLE
Models must be trained, tested, and tuned.
Data Set → split into Training Data and Test Data → Model Training → ML Model
57–58. How do you evaluate the model?
Regression (Continuous)
• Root-mean-squared error
• R-squared
Classification (Categorical)
• Accuracy
• Precision and recall
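The two regression metrics can be computed directly (my sketch; the actual/predicted values are made-up examples): RMSE is the square root of the mean squared error, and R-squared is one minus the ratio of residual to total variation.

```python
# Root-mean-squared error: typical size of a prediction error.
def rmse(actual, predicted):
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted))
            / len(actual)) ** 0.5

# R-squared: fraction of variation in the actuals explained by the model.
def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual, predicted = [2.0, 4.0, 6.0], [2.0, 4.0, 8.0]
print(round(rmse(actual, predicted), 3))  # → 1.155
print(r_squared(actual, predicted))       # → 0.5
```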
59–62. Precision and Recall
Confusion matrix (true state vs. prediction), using a cancer test as the example:

True State \ Prediction | Cancer | No Cancer
Cancer | True Positive (TP) | False Negative (FN)
No Cancer | False Positive (FP) | True Negative (TN)

Precision (Quality) = TP / (TP + FP) — correct positive predictions out of all positive predictions. What proportion of positive identifications was actually correct?
Recall (Quantity) = TP / (TP + FN) — correct positive predictions out of all actual positive cases. What proportion of actual positives was identified correctly?
Precision and recall trade off against each other: pushing a model toward 100% on one typically pulls the other down.
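The two formulas translate directly into code (my sketch; the counts are made-up examples):

```python
# Precision: of everything predicted positive, how much was right?
def precision(tp, fp):
    return tp / (tp + fp)

# Recall: of everything actually positive, how much did we catch?
def recall(tp, fn):
    return tp / (tp + fn)

# e.g. 8 true positives, 2 false positives, 4 false negatives
print(precision(8, 2))  # → 0.8
print(round(recall(8, 4), 3))  # → 0.667
```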
64. How can I best partner with scientists?
Science roles: ML Scientist, Applied Scientist, Research Scientist, Data Scientist
Engineering and program roles: Business Intelligence Engineer, Data Engineer, Software Engineer, Dev Manager, Technical Program Manager
65. How can I best partner with scientists?
Treat your ML project as a partnership
“A PM from an ML project I worked on basically threw the requirements over
the fence to me and was mostly unavailable. To meet timelines, I kept
moving forward. Unfortunately, the deliverable at the end of the three-month
project, though aligned with initial business requirements, was not what the
PM wanted and didn’t meet the need. The model never made it into
production and we really didn’t gain any learnings.”
66. How can I best partner with scientists?
Treat your ML project as a partnership
Have a clear problem, hypothesis and
success metric
“PMs who come prepared with a clear, preferably data-driven, problem and
hypothesis will have a much more productive discussion with me than otherwise.
The problem definition need not be perfect, but I do want to understand what’s
been tried, why it isn’t working and what we’re aiming for.”
67. How can I best partner with scientists?
Be willing to make tradeoffs
Treat your ML project as a partnership
Have a clear problem, hypothesis and
success metric
68. How can I best partner with scientists?
Be willing to make tradeoffs
• Time vs Quality
• White Box vs Black Box
• False Positives vs False Negatives
• Go vs No-Go Metrics
69. How can I best partner with scientists?
• Help get data and explain it
• Scientists are not Software Engineers
• ML creates tech debt
• Be considerate of scientist time and momentum
71. www.productschool.com
Part-time Product Management, Coding, Data, Digital
Marketing and Blockchain courses in San Francisco, Silicon
Valley, New York, Santa Monica, Los Angeles, Austin, Boston,
Boulder, Chicago, Denver, Orange County, Seattle, Bellevue,
Toronto, London and Online