Machine learning has become an important tool in the modern software toolbox, and high-performing organizations increasingly rely on data science and machine learning as a core part of their business. eBay introduced machine learning to its commerce search ranking and drove double-digit increases in revenue. Stitch Fix built a multibillion-dollar clothing retail business in the US by combining the best of machines with the best of humans. And WeWork is bringing machine-learned approaches to the physical office environment all around the world. In all cases, algorithmic techniques started simple and slowly became more sophisticated over time. This talk will use these examples to derive an agile approach to machine learning, and will explore that approach across several different dimensions. We will set the stage by outlining the kinds of problems that are most amenable to machine-learned approaches, as well as describing some important prerequisites, including investments in data quality, a robust data pipeline, and experimental discipline. Next, we will choose the right (algorithmic) tool for the right job, and suggest how to incrementally evolve the algorithmic approaches we bring to bear. Most fancy cutting-edge recommender systems in the real world, for example, started out with simple rules-based techniques or basic regression. Finally, we will integrate machine learning into the broader product development process, and see how it can help us accelerate business results.
6. Problem: Evaluating Success

Overall Evaluation Criterion (OEC)
• aka “Optimization Function” or “One Metric That Matters” (see the sketch below)
• Discussing and agreeing on this metric is itself valuable
• Only very few metrics, preferably one

Aligned to Business Value
• E.g., actions vs. click rate
• E.g., long-term customer value vs. short-term revenue
• “Pirate metrics” (AARRR): Acquisition, Activation, Retention, Revenue, Referral

Valid and Measurable
• Validated by data science, not solely chosen by product / business
• Look for predictive leading indicators
• Avoid lagging indicators and vanity metrics
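A minimal sketch of what a single OEC might look like in code. The session fields and weights here are invented, not from the talk; the real value is the team debating and fixing one formula up front:

```python
# A hypothetical Overall Evaluation Criterion: score an experiment arm
# by long-term value (purchases plus retention), not raw clicks.
# Metric names and weights are assumptions, not from the talk.
def oec(sessions):
    purchases = sum(s["purchases"] for s in sessions)
    repeat_visits = sum(s["returned_within_30d"] for s in sessions)
    # Weight retention alongside revenue-driving actions (weights assumed)
    return (1.0 * purchases + 0.5 * repeat_visits) / len(sessions)

arm = [{"purchases": 1, "returned_within_30d": 1},
       {"purchases": 0, "returned_within_30d": 0}]
print(oec(arm))  # one number to compare across experiment arms
```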
10. Data: Characterizing Your Data

Big but Shallow
• Many events, only predictive in aggregate
• E.g., web search queries, ecommerce clickstream, Netflix viewing metrics

Small but Deep
• Few events, each of which is significant
• E.g., ecommerce purchases, WeWork event attendance
12. Data: Better Data

Clean Data (see the pandas sketch below)
• Free of missing and partial data
• Properly and consistently formatted

Aggregated Data
• Consolidated into a single (logical) location so it can be processed and analyzed
• Joined together (“enriched”) with other data sources

Labeled Data
• Tagged by humans with one or more labels
• Required to train supervised models
• Complicated and expensive at scale
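A minimal pandas sketch of the clean and aggregate steps above, on an invented clickstream; the tables, column names, and the `users` source are all hypothetical:

```python
import pandas as pd

# Invented clickstream with typical problems: missing fields and an
# improperly formatted timestamp.
clicks = pd.DataFrame({
    "user_id": [1, 2, 2, None],
    "ts": ["2016-03-01", "2016-03-02", "not-a-date", "2016-03-03"],
    "item_id": ["A1", "A2", "A3", None],
})

# Clean: parse timestamps (bad values become NaT), then drop rows
# missing any required field.
clicks["ts"] = pd.to_datetime(clicks["ts"], errors="coerce")
clicks = clicks.dropna(subset=["user_id", "item_id", "ts"])

# Aggregate / enrich: join with another (hypothetical) data source.
users = pd.DataFrame({"user_id": [1.0, 2.0], "segment": ["new", "returning"]})
enriched = clicks.merge(users, on="user_id", how="left")
print(enriched)
```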
13. Data: Better Data (cont.)

More Data
• More potentially useful attributes
• More data sources
• Longer retention

Timely Data
• Data pipeline to automate collection and aggregation
• Move from large batch to mini-batch to streaming data (see the sketch below)
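A rough sketch of the batch-to-mini-batch move; `fetch_events` and `aggregate` are hypothetical stand-ins for a real pipeline (e.g., a queue consumer feeding a warehouse load):

```python
import time

# Instead of one nightly batch, poll for new events on a short interval
# and aggregate incrementally; shrinking the interval approaches streaming.
def run_mini_batches(fetch_events, aggregate, interval_s=60):
    while True:
        events = fetch_events()   # new events since the last poll
        if events:
            aggregate(events)     # incremental aggregation
        time.sleep(interval_s)
```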
14. “Data preparation accounts for about 80% of the work of data scientists.” – CrowdFlower survey, 2016
https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#2d58f4ab6f63
16. Algorithms: Algorithmic Evolution

Rules and Heuristics
• Encode expert knowledge
• Simple set of imperative if-then-else statements
• Brittle and primitive
• Surprisingly effective

Simple Algorithms
• Regression
• Decision trees / forests
• Collaborative filtering
• May be all you need (see the sketch below)

Advanced Techniques
• Iterative optimization / dynamic programming
• Neural nets
• Deep learning
• Only when absolutely required
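A minimal sketch of the first two stages on an invented “will this user click?” problem; the features, thresholds, and training data are all hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stage 1: rules and heuristics, a simple set of imperative if-then-else
# statements encoding expert knowledge. Brittle, but a strong baseline.
def rule_based_score(price, rating):
    if rating >= 4.5 and price < 20:
        return 1.0   # promote cheap, well-rated items
    if rating < 3.0:
        return 0.0   # demote poorly rated items
    return 0.5

# Stage 2: a simple algorithm, logistic regression learned from
# (hypothetical) historical click data. Often all you need.
X = np.array([[15, 4.8], [80, 4.9], [25, 2.1], [10, 3.9]])  # [price, rating]
y = np.array([1, 0, 0, 1])                                  # clicked?
model = LogisticRegression().fit(X, y)
print(model.predict_proba([[18, 4.6]])[:, 1])  # P(click) for a new item
```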
17. Algorithms: Algorithmic Evolution (cont.)

Portfolio / Ensemble Approaches
• Many real-world problems are best solved through a combination of several algorithms
• E.g., the Netflix Prize (blending sketch below)
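A minimal blending sketch in the spirit of the Netflix Prize ensembles; the data, the two models, and the blend weights are all hypothetical (in practice the weights are fit on held-out data):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.random((200, 5))                                   # invented features
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(0, 0.1, 200)

# Fit several different algorithms, then combine their predictions.
models = [Ridge().fit(X, y), RandomForestRegressor(n_estimators=50).fit(X, y)]
weights = [0.4, 0.6]                                       # assumed blend weights

blended = sum(w * m.predict(X) for w, m in zip(weights, models))
```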
20. Algorithms: Scaling Algorithm Development

Interpretability / Explainability
• Many common algorithms are highly accurate, but difficult to interpret
• The model can make a decision, but we cannot “explain” its decision (see the sketch below)
• Particularly important in the context of system bias
• (+) Decision trees / forests, linear regression
• (-) Neural nets, deep learning

DevOps for Data Science
• Enable data scientists to be self-sufficient in experimenting, building, training, and deploying
• End-to-end responsibility for models in production
• Write models, deploy models, monitor model performance

Algorithm Platform
• Platform-as-a-service for data scientists
• Programming model that matches the workflow of a data scientist
• Abstract away infrastructure and other details
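A small sketch of the contrast, using scikit-learn on invented data: a shallow decision tree can print its own decision logic, while a neural net trained on the same data offers no comparable readout. The feature names are hypothetical:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

X = np.array([[23, 0], [45, 1], [31, 1], [52, 0], [29, 0], [60, 1]])
y = np.array([0, 1, 1, 1, 0, 1])
feature_names = ["age", "is_returning"]  # hypothetical features

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
# The tree's decision can be read off directly, branch by branch:
print(export_text(tree, feature_names=feature_names))
```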
21. Algorithms: Algorithm Platform-as-a-Service

• Data scientists spin up their own resources
• Both ad-hoc execution and repeatable pipelines
• Data science-friendly programming model exposes ETL and matrix transforms
• Abstracts away storage (S3), computation (Docker and ECS), and the model-building pipeline (Spark)
23. “It doesn’t matter how beautiful your theory is. It doesn’t matter how smart you are. If it doesn’t agree with experiment, it’s wrong.” – Richard Feynman
24. Experimental Discipline: Designing and Running Experiments

1. State Your Hypothesis
• What metrics do you expect to move, and why?
• Understand your baseline

2. Design a Real A | B Test
• Sample size based on effect size (see the sketch below)
• Separate control and treatment groups, and test for bias
• Split traffic between control and treatment

3. Obsessively Log and Measure
• Understand customer and system behavior
• Understand why this experiment worked or did not
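A minimal sketch of “sample size based on effect size” using statsmodels; the baseline rate and the smallest lift worth detecting are assumptions the experimenter supplies:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.10    # current conversion rate (assumed)
target = 0.11      # smallest lift worth detecting (assumed)

# Convert the two rates into an effect size, then solve for the
# per-group sample size at conventional alpha and power.
effect = proportion_effectsize(target, baseline)
n = NormalIndPower().solve_power(effect_size=effect, alpha=0.05,
                                 power=0.8, alternative="two-sided")
print(f"~{n:.0f} users needed per group")
```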
25. Experimental Discipline: Designing and Running Experiments (cont.)

4. Listen to the Data
• Data trumps hope and intuition
• Develop insights for the next experiment

5. Rinse and Repeat
• This is a journey, not a single step
26. Experimental Discipline: Listen to the Data

• 1/3 of ideas were positive and statistically significant
• 1/3 of ideas were flat: no statistically significant difference
• 1/3 of ideas were negative and statistically significant
Source: https://exp-platform.com/experiments-at-microsoft/
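A minimal sketch of classifying one experiment result as positive, flat, or negative with a two-proportion z-test; all counts are invented:

```python
from statsmodels.stats.proportion import proportions_ztest

conversions = [480, 520]   # control, treatment successes (hypothetical)
samples = [10000, 10000]   # users per group (hypothetical)

# Test whether the treatment's conversion rate differs from control's.
z, p = proportions_ztest(conversions, samples)
print(f"z={z:.2f}, p={p:.3f}")  # p >= 0.05 here: a "flat" result
```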
27. “Being wrong isn’t a bad thing, like they teach you in school. It is an opportunity to learn something.” – Richard Feynman
28. Experimental Discipline: Continuous Delivery

Repeatable Deployment Pipeline
• Low-risk, push-button deployment
• Rapid release cadence
• Rapid rollback and recovery

Smaller Units of Work
• Faster to repair
• Easier to understand
• Simpler to diagnose

Enables Experimentation
• Changes can be rolled out and rolled back
• Learnings can be applied in the next experiment
29. Experimental Discipline: Feature Flags

Enable / Disable Feature via Configuration
• Flag controls whether a feature is “on” for a particular set of users
• Independently discovered at eBay, Yahoo, and Google
• Decouples feature delivery from code delivery (see the sketch below)

Makes Speed Safe
• Develop / test / verify in production
• Rapid on or off for any reason

Enables Experimentation
• Overall experiment controlled by a feature flag
• Control vs. treatment
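A minimal feature-flag sketch. The flag store, names, and bucketing scheme are hypothetical; the point is the decoupling above, where configuration rather than a code deploy turns a feature on for a stable subset of users:

```python
import hashlib

FLAGS = {"new_ranking": {"enabled": True, "rollout_pct": 10}}  # from config, not code

def is_enabled(flag_name, user_id):
    flag = FLAGS.get(flag_name)
    if not flag or not flag["enabled"]:
        return False
    # Hash the user id into a stable bucket 0-99 so each user
    # consistently sees either control or treatment.
    digest = hashlib.md5(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < flag["rollout_pct"]

if is_enabled("new_ranking", user_id=42):
    pass  # treatment: new ranking function
else:
    pass  # control: existing behavior
```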
30. Machine-Learned Ranking

● Ranking function for search results
  ○ From a small number of hand-tuned factors to thousands of factors
● Incremental experimentation
  ○ Predictive models: query → view, view → purchase, etc.
  ○ Hundreds of parallel A | B tests
  ○ Full year of steady, incremental improvements
● Result: 2% increase in eBay revenue (~$120M / year)
31. Site Speed

● Reduce user-experienced latency for search results
● Iterative process
  ○ Implement a potential improvement
  ○ Release to the site in an A | B test
  ○ Monitor metrics – time to first byte, time to click, click rate, purchase rate
● Result: 2% increase in eBay revenue (~$120M / year)
36. WeWork Revenue Optimization: Occupancy Predictor

• Get the predicted opening occupancy based on the recommended 1-Click price
• Adjust the price to see how occupancy will change (toy sketch below)
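The deck does not describe WeWork’s actual model, so this is only a toy illustration of the idea: fit occupancy as a function of price on invented data, then query the curve at candidate prices:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

price = np.array([[400], [450], [500], [550], [600]])  # $/desk/month (invented)
occupancy = np.array([0.95, 0.90, 0.82, 0.71, 0.60])   # observed fill rate (invented)

# Fit the price-to-occupancy relationship, then ask "what if" at new prices.
model = LinearRegression().fit(price, occupancy)
for p in (480, 520):
    print(p, round(model.predict([[p]])[0], 2))  # predicted occupancy at price p
```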
38. WeWork Revenue Optimization: Office Attribute-Based Pricing

• Calculate and recommend premiums and discounts for key office attributes
• E.g., corner offices (premium), offices with high-quality views (premium)
39. WeWork Revenue Optimization: Inventory Management

• Fully optimize inventory usage by leveraging demand and profitability predictions
• Example: recommend alternative usage for unoccupied spaces
42. Takeaways: An Agile Approach to Machine Learning

1. Drive from Business Needs
• Identify and frame a clear business problem that matters to customers or the business
• Define clear metric(s) for success

2. Start Small
• Single problem
• Solve the problem end-to-end
• Show business results

3. Data Matters
• Data collection and storage
• Data cleanliness and preparation
• Reliable, accurate, timely data pipeline
• Better data beats a better model (!)
43. Takeaways: An Agile Approach to Machine Learning (cont.)

4. A | B Testing Discipline
• Start with a hypothesis
• Design an experiment
• Separate control and experiment group(s)
• Measure the business metric for A vs. B
• Learn and decide

5. Iteratively Refine the Model
• Simple model / no model
• Rules and heuristics
• Gradually increase sophistication with more data and more experience

6. Iteratively Expand Applications
• Find broader applicability across the business
• Apply to more and more problems
• Move “upstream” in the development process
44. Takeaways: An Agile Approach to Machine Learning (cont.)

7. Data-Driven Culture
• Make decisions with data instead of guesswork and intuition
• Avoid HiPPO decision-making
• Can be threatening to designers, product managers, and decision-makers

8. Machine Learning is not Magic
• A set of tools in our toolbox
• Sometimes valuable and useful
• Not a panacea
• Not a substitute for thinking