In this presentation you will learn:
- The definition of Machine Learning and the different types of problems it can solve
- A framework to decide whether your specific problem could or should be solved with Machine Learning
- The role a Product Manager plays in each part of the Machine Learning lifecycle
11. Agenda
1. Overview: What is ML?
2. To ML or NOT to ML: When should I use it?
3. Let's do ML: What is the ML lifecycle?
4. Communication: How should I partner with ML scientists?
14. What is ML?
Classical Programming: Rules + Data → Answers
Machine Learning: Data + Answers → Rules
Pipeline: Problem → Data → Algorithm → Model → Output
"The field of study that gives computers the ability to learn without being explicitly programmed"
— Arthur Samuel, pioneer of AI research
15. ML and Statistics
ML optimizes for predictive performance, while statistics places importance on interpretability and parsimony/simplicity.

Statistics | Simply Put | ML
Dependent/Response/Output Variable | The thing you're trying to predict | Label or Target
Independent/Explanatory/Input Variable | The data that help you make predictions | Feature
Data Transformation | Reshaping data to get more value out of it | Feature Engineering
Variable/Subset Selection | Using the most valuable data | Feature Selection
16. What is ML?
Supervised Learning
- Regression (Quantity): Linear, Ridge, Lasso
- Classification (Category): Trees, SVM, KNN
Unsupervised Learning
- K-Means, PCA, Collaborative Filtering
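As a toy illustration of supervised classification (not from the deck), the sketch below is a minimal 1-nearest-neighbor classifier: it predicts a label for a new point by copying the label of the closest labeled example. The points and labels are made up for illustration.

```python
# Minimal 1-nearest-neighbor classifier: supervised classification
# predicts a category for a new point from labeled examples.
def nearest_neighbor(train, new_point):
    """train: list of ((x, y), label) pairs; returns the label of the closest point."""
    def sq_dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    closest = min(train, key=lambda pair: sq_dist(pair[0], new_point))
    return closest[1]

labeled = [((1, 1), "cat"), ((1, 2), "cat"), ((8, 8), "dog"), ((9, 8), "dog")]
print(nearest_neighbor(labeled, (2, 1)))  # → cat
```

The labeled examples play the role of the "answers" the deck says supervised learning needs.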
19. To ML when your problem…
- Handles very complex logic
- Scales up fast
- Adapts in real time
- Requires specialized personalization
…and has existing examples of actual answers
20–25. Sample ML problems

Problem type | Description | Example
Ranking | Helping users find the most relevant thing | Ranking algorithm within Amazon Search
Recommendation | Giving users the thing they may be most interested in | Recommendations from Netflix; room suggestions from Google Calendar
Classification | Figuring out what kind of thing something is | Product classification for the Amazon catalog (e.g. High-Low Dress, Straight Dress, Striped Skirt, Graphic Shirt)
Regression | Predicting a numerical value of a thing | Predicting sales for specific Amazon products (seasonality, out of stock, promotions)
Clustering | Putting similar things together | Related news from Google Search
Anomaly | Finding uncommon things | Fruit freshness, before vs. after (Good, Damage, Serious Damage, Decay)
26. To ML when your data…
1. Can be used: Available, Accessible
2. Should be used: Secure, Respects privacy
3. Is high quality: Fresh, Unbiased, Relevant, Representative
27. NOT to ML when your problem…
- Can be solved by simple rules
- Does not adapt to new data
- Requires full interpretability
- Requires 100% accuracy
28. NOT to ML when your data…
1. Cannot be used: Unavailable, Inaccessible
2. Should not be used: Unsecure, Privacy concerns
3. Is low quality: Stale, Biased, Irrelevant, Scarce or Incomplete
29. Exercise: To ML or Not To ML
A. What apparel items should be protected by copyright laws?
B. Which resumes should we prioritize to interview for our candidate pipeline?
C. What products should be exclusively sold to Hispanics in the US?
D. Which sellers have the greatest revenue potential?
E. Where should Amazon build HQ2?
F. Which search queries should we scope for the Amazon Fresh store?
34. Formulate the problem
People | Processes | Tools & Systems
1 PROBLEM → 2 DATA → 3 FEATURES → 4 MODEL
- What is the problem to solve?
- What is the measurable goal?
- What do you want to predict?
35. Select and preprocess data
1 PROBLEM → 2 DATA → 3 FEATURES → 4 MODEL
Selecting: Available, Missing, Discarding
Preprocessing: Formatting, Cleaning, Sampling
36. Feature engineering
1 PROBLEM → 2 DATA → 3 FEATURES → 4 MODEL
- Feature: an individual measurable property or characteristic of the phenomenon being observed
- Goal: use domain and data knowledge to develop relevant features from existing raw features in the data, to increase the predictive power of ML
Techniques: Scaling | Decomposition | Aggregation
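Two of the techniques named above can be sketched in a few lines of Python (an illustration, not from the deck): min-max scaling squeezes a numeric feature into [0, 1], and aggregation rolls raw events up into a per-entity feature such as orders per customer. The customer IDs are hypothetical.

```python
# Scaling: min-max scale a numeric feature into the range [0, 1].
def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Aggregation: roll raw order events up into a per-customer order count.
def orders_per_customer(order_customer_ids):
    counts = {}
    for customer_id in order_customer_ids:
        counts[customer_id] = counts.get(customer_id, 0) + 1
    return counts

print(min_max_scale([10, 20, 30]))           # → [0.0, 0.5, 1.0]
print(orders_per_customer(["A", "B", "A"]))  # → {'A': 2, 'B': 1}
```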
37. Train, test and tune models
1 PROBLEM → 2 DATA → 3 FEATURES → 4 MODEL
Data Set → split into Training Data and Test Data → Model Training → ML Model
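The split above can be sketched in plain Python (the 80/20 ratio is an assumption; the deck doesn't prescribe one): shuffle the dataset, then hold out a fraction as test data the model never sees during training.

```python
import random

# Hold out a test set: shuffle, then split off a fraction for testing.
def train_test_split(data, test_fraction=0.2, seed=42):
    shuffled = data[:]                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(10)))
print(len(train), len(test))  # → 8 2
```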
38. Productionize
Integrate the ML solution with existing software, and keep it running successfully over time.
- Deployment environment
- Data storage
- Monitoring and maintenance
- Security and privacy
Great ML problems sometimes cannot be productionized due to high implementation costs or an inability to be tested in practice.
39. Product Manager role in Machine Learning
ML Lifecycle:
1. Formulate problem
2. Select and preprocess data
3. Feature engineering
4. Train, test, and tune models
41. Formulate the problem
PM ROLE
To formulate the problem, ask the following questions:
- What is the problem?
- What is the measurable goal?
- What do you want to predict?
Note: the type of problem you solve defines the algorithm to use (e.g., clustering → k-means).
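To make the clustering → k-means mapping concrete, here is a toy one-dimensional k-means step (an illustration, not from the deck): assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. Real k-means repeats this until the centroids stop moving.

```python
# One k-means iteration in 1-D: assign points to the nearest centroid,
# then update each centroid to the mean of its assigned points.
def kmeans_step(points, centroids):
    clusters = {c: [] for c in centroids}
    for p in points:
        nearest = min(centroids, key=lambda c: abs(c - p))
        clusters[nearest].append(p)
    return [sum(members) / len(members) for members in clusters.values() if members]

points = [1.0, 1.5, 2.0, 10.0, 11.0, 12.0]
print(kmeans_step(points, [2.0, 10.0]))  # → [1.5, 11.0]
```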
42. Problem: You have not used ML before
PM ROLE
To formulate the problem, answer the same questions:
- What is the problem? Each week, the New Seller Success team onboards hundreds of new Sellers, and this group is expected to grow X% YoY. Personalized coaching time, however, doesn't scale. As such, the team needed a way to accurately predict top performers to double down on.
- What is the measurable goal? Increase revenue growth for coached (vs. non-coached) Sellers by X% at the end of six months.
- What do you want to predict? The top 5% of net new Sellers six months after their launch.
43. Problem: You are already using ML
PM ROLE
To formulate the problem, answer the same questions:
- What is the problem? Units per order for category X in the US have remained flat YoY, and engagement has declined as measured by purchase-week frequency.
- What is the measurable goal? Increase unit order rate for category X in the US by +X% within the next X months without affecting revenue.
- What do you want to predict? Category X products that are more likely to be added to a customer's cart, based on the items already in the cart.
46. Selecting data
PM ROLE
Select the right datasets (Public, Custom, Internal) for the right purposes:
- Train and tune models
- Replace flawed or outdated data
- Measure success
47. Preprocessing data: Formatting
PM ROLE
Format your data consistently, so you can work with it.

Data Type | Possible Values | Example Usage
Binary | 0, 1 (arbitrary labels) | binary outcome ("yes/no", "true/false", "success/failure", etc.)
Categorical or nominal | 1, 2, ..., K (arbitrary labels) | categorical outcome (specific blood type, political party, word, etc.)
Ordinal | integer or real number (arbitrary scale) | relative score, significant only for creating a ranking
Binomial | 0, 1, ..., N | number of successes (e.g. yes votes) out of N possible
Count | nonnegative integers (0, 1, ...) | number of items (telephone calls, people, molecules, etc.) in a given interval/area
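One practical consequence of the table above (my illustration, not from the deck): categorical values carry no order, so rather than mapping them to arbitrary integers they are often one-hot encoded, one indicator per category.

```python
# One-hot encode a categorical value: one indicator column per category.
def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]

blood_types = ["A", "B", "AB", "O"]
print(one_hot("AB", blood_types))  # → [0, 0, 1, 0]
```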
49. Preprocessing data: Cleaning
PM ROLE
Cleaning means removing or fixing missing data.
[Example table: "iphone case" search sessions with columns Keywords, Recognized Session?, Is Prime?, Customer ID, Device, # Searches, $ — many rows are missing a Customer ID, Device, or $ value.]
Strategies for missing values:
- Deletion: drop rows with missing values
- Dummy substitution: fill with a placeholder (e.g. $0)
- Mean substitution: fill with the column mean
- Frequent substitution: fill with the most common value (e.g. Mobile)
- Lookup substitution: fill from another data source
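Mean substitution, one of the strategies above, can be sketched in a few lines of Python (my illustration; `None` stands in for a missing cell):

```python
# Mean substitution: replace missing entries (None) in a numeric column
# with the mean of the values that are present.
def fill_with_mean(column):
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]

print(fill_with_mean([3, None, 10, None, 2]))  # → [3, 5.0, 10, 5.0, 2]
```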
50. Preprocessing data: Sampling
PM ROLE
Sampling chooses representative data to solve your problem.
Issues: Seasonality, Trends, Leakage, Biases
Strategies: Random, Stratified
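Stratified sampling can be sketched as follows (my illustration; the `device` field and the 10% rate are hypothetical): sample each group at the same rate, so the sample keeps the population's proportions instead of drifting by chance.

```python
import random

# Stratified sampling: sample each group (stratum) at the same rate so
# the sample preserves the population's group proportions.
def stratified_sample(rows, key, fraction, seed=0):
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    rng = random.Random(seed)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

rows = [{"device": "Mobile"}] * 80 + [{"device": "Desktop"}] * 20
picked = stratified_sample(rows, "device", 0.1)
print(len(picked))  # → 10
```

A purely random 10% sample of these 100 rows could easily over- or under-represent Desktop; the stratified version always keeps the 80/20 ratio.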
51. Preprocessing data: Unintended bias
PM ROLE
Examples where biased samples cause unintended harm: deciding where to offer Prime Free Same-Day Delivery; auto-labeling images.
52. Preprocessing data: Labeling
PM ROLE
Labeling is tagging or classifying your data.
Approaches: Automated | Manual
Guarding against biases: Plurality, Auditors, Gold Standards, Incentives, Metrics
54. Feature engineering
PM ROLE
Feature engineering develops relevant features from existing raw features.

ML | Statistics | Simply Put
Label or Target | Dependent/Response/Output Variable | The thing you're trying to predict
Feature | Independent/Explanatory/Input Variable | The data that help you make predictions
Feature Engineering | Data Transformation | Reshaping data to get more value
Feature Selection | Variable/Subset Selection | Using the most valuable data
56. Train, test and tune models
PM ROLE
Models must be trained, tested, and tuned.
Data Set → split into Training Data and Test Data → Model Training → ML Model
57–58. How do you evaluate the model?
Regression (Continuous)
• Root-mean-squared error
• R-squared
Classification (Categorical)
• Accuracy
• Precision and recall
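The two regression metrics can be computed directly (my sketch; the actual/predicted values are made-up examples): RMSE is the square root of the mean squared error, and R-squared is one minus the ratio of residual to total variation.

```python
# Root-mean-squared error: typical size of a prediction error.
def rmse(actual, predicted):
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted))
            / len(actual)) ** 0.5

# R-squared: fraction of variation in the actuals explained by the model.
def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual, predicted = [2.0, 4.0, 6.0], [2.0, 4.0, 8.0]
print(round(rmse(actual, predicted), 3))  # → 1.155
print(r_squared(actual, predicted))       # → 0.5
```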
59–62. Precision and Recall
Confusion matrix (true state vs. prediction), using a cancer test as the example:

True State \ Prediction | Cancer | No Cancer
Cancer | True Positive (TP) | False Negative (FN)
No Cancer | False Positive (FP) | True Negative (TN)

Precision (Quality) = TP / (TP + FP) — correct positive predictions out of all positive predictions. What proportion of positive identifications was actually correct?
Recall (Quantity) = TP / (TP + FN) — correct positive predictions out of all actual positive cases. What proportion of actual positives was identified correctly?
Precision and recall trade off against each other: pushing a model toward 100% on one typically pulls the other down.
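The two formulas translate directly into code (my sketch; the counts are made-up examples):

```python
# Precision: of everything predicted positive, how much was right?
def precision(tp, fp):
    return tp / (tp + fp)

# Recall: of everything actually positive, how much did we catch?
def recall(tp, fn):
    return tp / (tp + fn)

# e.g. 8 true positives, 2 false positives, 4 false negatives
print(precision(8, 2))  # → 0.8
print(round(recall(8, 4), 3))  # → 0.667
```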
64. How can I best partner with scientists?
Science roles: ML Scientist, Applied Scientist, Research Scientist, Data Scientist
Engineering and program roles: Business Intelligence Engineer, Data Engineer, Software Engineer, Dev Manager, Technical Program Manager
65. How can I best partner with scientists?
Treat your ML project as a partnership
“A PM from an ML project I worked on basically threw the requirements over
the fence to me and was mostly unavailable. To meet timelines, I kept
moving forward. Unfortunately, the deliverable at the end of the three-month
project, though aligned with initial business requirements, was not what the
PM wanted and didn’t meet the need. The model never made it into
production and we really didn’t gain any learnings.”
66. How can I best partner with scientists?
Treat your ML project as a partnership
Have a clear problem, hypothesis and
success metric
“PMs who come prepared with a clear, preferably data-driven, problem and
hypothesis will have a much more productive discussion with me than otherwise.
The problem definition need not be perfect, but I do want to understand what’s
been tried, why it isn’t working and what we’re aiming for.”
67. How can I best partner with scientists?
Be willing to make tradeoffs
Treat your ML project as a partnership
Have a clear problem, hypothesis and
success metric
68. How can I best partner with scientists?
Be willing to make tradeoffs
• Time vs Quality
• White Box vs Black Box
• False Positives vs False Negatives
• Go vs No-Go Metrics
69. How can I best partner with scientists?
• Help get data and explain it
• Scientists are not Software Engineers
• ML creates tech debt
• Be considerate of scientist time and momentum
71. www.productschool.com
Part-time Product Management, Coding, Data, Digital
Marketing and Blockchain courses in San Francisco, Silicon
Valley, New York, Santa Monica, Los Angeles, Austin, Boston,
Boulder, Chicago, Denver, Orange County, Seattle, Bellevue,
Toronto, London and Online