3. Our Mission
“To share and grow the world’s knowledge”
• Millions of questions & answers
• Millions of users
• Thousands of topics
• ...
10. Implicit vs. Explicit
• Many have acknowledged that implicit feedback is more useful
• Is implicit feedback really always more useful?
• If so, why?
11. Implicit vs. Explicit
• Implicit data is (usually):
  • Denser, and available for all users
  • More representative of user behavior (vs. user reflection)
  • More closely related to the final objective function
  • Better correlated with AB test results
• E.g. rating vs. watching
12. Implicit vs. Explicit
• However, direct implicit feedback does not always correlate well with long-term retention
  • E.g. clickbait
• Solution: combine different forms of implicit and explicit feedback to better represent the long-term goal
14. Training a model
• A model will learn according to:
  • Training data (e.g. implicit and explicit)
  • Target function (e.g. probability of a user reading an answer)
  • Metric (e.g. precision vs. recall)
• Example 1 (made up):
  • Optimize the probability of a user going to the cinema to watch a movie and rate it “highly”, using purchase history and previous ratings. Use NDCG of the ranking as the final metric, counting only movies rated 4 or higher as positives.
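The metric in Example 1 can be sketched as follows. This is a minimal illustration, not the deck's own code: the function name `ndcg_at_k`, the cutoff `k`, and the sample ratings are assumptions, and relevance is binarized so that a rating of 4 or higher counts as a positive.

```python
import math

def ndcg_at_k(ranked_ratings, k, min_positive=4):
    """NDCG@k with binary relevance: a rating >= min_positive is a positive.

    ranked_ratings holds the true ratings of the items in the order the
    model ranked them (index 0 is the top of the list).
    """
    gains = [1.0 if r >= min_positive else 0.0 for r in ranked_ratings]

    def dcg(g):
        # Discounted cumulative gain over the first k positions.
        return sum(x / math.log2(pos + 2) for pos, x in enumerate(g[:k]))

    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

perfect = ndcg_at_k([5, 4, 2, 1], k=4)   # positives ranked first
mixed = ndcg_at_k([2, 5, 1, 4], k=4)     # positives pushed down the list
```

A ranking that places every highly rated movie first scores 1.0, and any ranking that pushes positives down scores lower.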
15. Example 2 - Quora’s feed
• Training data = implicit + explicit
• Target function: value of showing a story to a user ~ weighted sum of actions: $v = \sum_a v_a \, \mathbb{1}\{y_a = 1\}$
  • Predict probabilities for each action, then compute the expected value: $v_{pred} = E[V \mid x] = \sum_a v_a \, p(a \mid x)$
• Metric: any ranking metric
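The expected-value target above can be sketched in a few lines. The action names, the weights in `ACTION_VALUES`, and the predicted probabilities are all made up for illustration; the slide does not give the actual weights or models.

```python
# Hypothetical action weights v_a (assumed values, for illustration only).
ACTION_VALUES = {"click": 1.0, "upvote": 5.0, "share": 10.0, "downvote": -8.0}

def expected_value(action_probs, action_values=ACTION_VALUES):
    """v_pred = sum over actions a of v_a * p(a | x) for one (user, story) pair."""
    return sum(action_values[a] * p for a, p in action_probs.items())

def rank_stories(candidates):
    """Return story ids sorted by predicted value, highest first."""
    return sorted(candidates, key=lambda s: expected_value(candidates[s]), reverse=True)

# Predicted p(a | x) per story, as if produced by one model per action.
candidates = {
    "story_a": {"click": 0.30, "upvote": 0.02, "share": 0.00, "downvote": 0.01},
    "story_b": {"click": 0.10, "upvote": 0.08, "share": 0.01, "downvote": 0.00},
}
ranking = rank_stories(candidates)
```

With these numbers, `story_a` attracts more clicks but `story_b` has the higher expected value, so it ranks first. This is how a weighted sum can demote clickbait-like content.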
17. Supervised/Unsupervised Learning
• Unsupervised learning as dimensionality reduction
• Unsupervised learning as feature engineering
• The “magic” behind combining unsupervised/supervised learning
  • E.g. 1: clustering + kNN
  • E.g. 2: Matrix Factorization (MF)
    • MF can be interpreted as
      • Unsupervised:
        • Dimensionality reduction a la PCA
        • Clustering (e.g. NMF)
      • Supervised:
        • Labeled targets ~ regression
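Both readings of MF show up in a small sketch. Fitting the observed ratings by SGD is a regression on labeled targets, while the learned factor matrices `P` and `Q` are low-dimensional representations of users and items. The function names, hyperparameters, and toy ratings below are assumptions chosen for illustration.

```python
import math
import random

def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item latent factors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))

def rmse(ratings, P, Q):
    """Root mean squared error over the observed (user, item, rating) triples."""
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Regularized matrix factorization trained with plain SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)          # supervised signal
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Toy data: users 0-1 like items 0-1, users 2-3 like items 2-3.
ratings = [(0, 0, 5), (0, 1, 5), (0, 2, 1), (1, 0, 5), (1, 3, 1),
           (2, 2, 5), (2, 3, 5), (2, 0, 1), (3, 2, 5), (3, 1, 1)]

P0, Q0 = factorize(ratings, 4, 4, epochs=0)   # untrained factors
P, Q = factorize(ratings, 4, 4)               # trained factors
```

Each row of `P` is a 2-dimensional embedding of a user, and it can be reused as a feature in another model.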
18. Supervised/Unsupervised Learning
• One of the “tricks” in Deep Learning is how it combines unsupervised/supervised learning
  • E.g. stacked autoencoders
  • E.g. training of convolutional nets
20. Ensembles
• The Netflix Prize was won by an ensemble
  • Initially BellKor was using GBDTs
  • BigChaos introduced an ANN-based ensemble
• Most practical applications of ML run an ensemble
  • Why wouldn’t you?
  • At least as good as the best of your methods
  • Can add completely different approaches (e.g. CF and content-based)
  • You can use many different models at the ensemble layer: LR, GBDTs, RFs, ANNs...
21. Ensembles & Feature Engineering
• Ensembles are the way to turn any model into a feature!
• E.g. don’t know if the way to go is to use Factorization Machines, Tensor Factorization, or RNNs?
  • Treat each model as a “feature”
  • Feed them into an ensemble
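A minimal sketch of treating models as features: each row below holds the scores of two hypothetical base models for one example, and a small logistic regression acts as the ensemble layer. The scores, labels, and hyperparameters are assumptions. In practice the ensemble layer could equally be a GBDT or a neural network, trained on held-out base-model predictions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ensemble_predict(w, b, scores):
    """Ensemble output for one example given its base-model scores."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, scores)))

def fit_logistic_ensemble(model_scores, labels, lr=0.5, epochs=2000):
    """Logistic regression by batch gradient descent on base-model scores."""
    n, n_models = len(labels), len(model_scores[0])
    w, b = [0.0] * n_models, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_models, 0.0
        for x, y in zip(model_scores, labels):
            err = ensemble_predict(w, b, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Columns: [score of hypothetical model A, score of hypothetical model B].
model_scores = [[0.9, 0.4], [0.8, 0.6], [0.7, 0.5],
                [0.3, 0.5], [0.2, 0.6], [0.1, 0.4]]
labels = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic_ensemble(model_scores, labels)
p_high = ensemble_predict(w, b, [0.85, 0.5])
p_low = ensemble_predict(w, b, [0.15, 0.5])
```

Here model A is informative and model B is not, and the ensemble layer learns to lean on A without anyone deciding that in advance.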
24. Outputs will be inputs
• Ensembles turn any model into a feature
  • That’s great!
  • That can be a mess!
• Make sure the output of your model is ready to accept data dependencies
  • E.g. can you easily change the distribution of the value without affecting all other models depending on it?
• Avoid feedback loops
• Can you treat your ML infrastructure as you would your software one?
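One possible way to make an output safer to depend on, sketched under the assumption that downstream models only need a relative score, is to publish a percentile against a reference sample instead of the raw value. The class `PercentileOutput` and the sample scores are hypothetical and not something the slides prescribe.

```python
import bisect

class PercentileOutput:
    """Maps a raw model score to its percentile within a reference sample."""

    def __init__(self, reference_scores):
        self._sorted = sorted(reference_scores)

    def __call__(self, raw_score):
        # Fraction of reference scores that are <= raw_score.
        rank = bisect.bisect_right(self._sorted, raw_score)
        return rank / len(self._sorted)

# Version 1 of a model scores in [0, 1]; version 2 rescales scores to [0, 100].
v1_output = PercentileOutput([0.1, 0.2, 0.4, 0.8])
v2_output = PercentileOutput([10, 20, 40, 80])

same_item_v1 = v1_output(0.4)
same_item_v2 = v2_output(40)
```

Consumers of the percentile see the same value before and after the rescaling, so the producing model can change its raw distribution without retraining everything downstream.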
25. ML vs Software
• Can you treat your ML infrastructure as you would your software one?
  • Yes and no
• You should apply best Software Engineering practices (e.g. encapsulation, abstraction, cohesion, low coupling…)
• However, Design Patterns for Machine Learning software are not well known/documented
27. Feature Engineering
• Main properties of a well-behaved ML feature
  • Reusable
  • Transformable
  • Interpretable
  • Reliable
• Reusability: You should be able to reuse features in different models, applications, and teams
• Transformability: Besides directly reusing a feature, it should be easy to use a transformation of it (e.g. log(f), max(f), $\sum_t f_t$ over a time window…)
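The transformations named above can be sketched as small reusable helpers. The function names and the sample upvote series are assumptions. `log1p` is used instead of a plain `log` so that zero counts stay finite, which is a choice made here rather than one stated on the slide.

```python
import math

def log_transform(value):
    """log(1 + f): compresses heavy-tailed counts and is defined at zero."""
    return math.log1p(value)

def windowed_sum(events, now, window):
    """Sum of f_t over events with now - window < t <= now."""
    return sum(v for t, v in events if now - window < t <= now)

def windowed_max(events, now, window):
    """max(f) over the same window; 0.0 when the window is empty."""
    values = [v for t, v in events if now - window < t <= now]
    return max(values) if values else 0.0

# (day, upvotes received that day) for a hypothetical answer.
upvotes = [(1, 3), (2, 0), (5, 7), (6, 1), (7, 4)]

recent_total = windowed_sum(upvotes, now=7, window=3)   # days 5, 6, 7
recent_peak = windowed_max(upvotes, now=7, window=3)
```

Keeping the raw per-day series as the stored feature and deriving these views on demand lets different models pick the transformation they need.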
28. Feature Engineering
• Main properties of a well-behaved ML feature
  • Reusable
  • Transformable
  • Interpretable
  • Reliable
• Interpretability: In order to do any of the previous, you need to be able to understand the meaning of features and interpret their values.
• Reliability: It should be easy to monitor and detect bugs/issues in features
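A minimal sketch of what monitoring a feature might look like, assuming a trusted reference sample exists for comparison. The function `feature_health`, its thresholds, and the sample values are hypothetical. Real monitoring would likely track more statistics over time.

```python
import statistics

def feature_health(reference, current, max_missing_rate=0.05, max_z=3.0):
    """Return a list of alerts for one feature; an empty list means healthy."""
    if not current:
        return ["no values received"]
    alerts = []
    present = [v for v in current if v is not None]
    missing_rate = 1.0 - len(present) / len(current)
    if missing_rate > max_missing_rate:
        alerts.append(f"missing rate {missing_rate:.0%} exceeds {max_missing_rate:.0%}")
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.stdev(reference)
    if present and ref_std > 0:
        # z-score of the current batch mean against the reference mean.
        z = abs(statistics.fmean(present) - ref_mean) / (ref_std / len(present) ** 0.5)
        if z > max_z:
            alerts.append(f"mean shifted (z = {z:.1f})")
    return alerts

reference = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
healthy = feature_health(reference, [10, 9, 11, 10, 10])
broken = feature_health(reference, [0, 0, None, 0, 0])   # e.g. an upstream bug
```

The healthy batch produces no alerts, while the broken batch trips both the missing-value check and the mean-shift check.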
29. Feature Engineering Example - Quora Answer Ranking
What is a good Quora answer?
โข truthful
โข reusable
โข provides explanation
โข well formatted
โข ...
30. Feature Engineering Example - Quora Answer Ranking
How are those dimensions translated into features?
• Features that relate to the answer quality itself
• Interaction features (upvotes/downvotes, clicks, comments…)
• User features (e.g. expertise in topic)
32. Machine Learning Infrastructure
• Whenever you develop any ML infrastructure, you need to target two different modes:
  • Mode 1: ML experimentation
    • Flexibility
    • Easy-to-use
    • Reusability
  • Mode 2: ML production
    • All of the above + performance & scalability
• Ideally you want the two modes to be as similar as possible
• How to combine them?
33. Machine Learning Infrastructure: Experimentation & Production
• Option 1:
  • Favor experimentation and only invest in productionizing once something shows results
  • E.g. have ML researchers use R and then ask Engineers to implement things in production when they work
• Option 2:
  • Favor production and have “researchers” struggle to figure out how to run experiments
  • E.g. implement highly optimized C++ code and have ML researchers experiment only through data available in logs/DB
35. Machine Learning Infrastructure: Experimentation & Production
• Good intermediate options:
  • Have ML “researchers” experiment on IPython Notebooks using Python tools (scikit-learn, Theano…). Use the same tools in production whenever possible, and implement optimized versions only when needed.
  • Implement abstraction layers on top of optimized implementations so they can be accessed from regular/friendly experimentation tools
37. Model debuggability
• Value of a model = value it brings to the product
• Product owners/stakeholders have expectations on the product
• It is important to be able to answer why something failed
• Bridge the gap between product design and ML algos
• Model debuggability is so important it can determine:
  • Particular model to use
  • Features to rely on
  • Implementation of tools
40. Distributing ML
• Most of what people do in practice can fit into a multi-core machine
  • Smart data sampling
  • Offline schemes
  • Efficient parallel code
• Dangers of “easy” distributed approaches such as Hadoop/Spark
• Do you care about costs? How about latencies?
41. Distributing ML
• Example of optimizing computations to fit them into one machine
  • Spark implementation: 6 hours, 15 machines
  • Developer time: 4 days
  • C++ implementation: 10 minutes, 1 machine
• Most practical applications of Big Data can fit into a (multicore) implementation
43. Data Scientists and ML Engineers
• We all know the definition of a Data Scientist
• Where do Data Scientists fit in an organization?
  • Many companies are struggling with this
• Valuable to have strong DS who can bring value from the data
• Strong DS with solid engineering skills are unicorns, and finding them is not scalable
  • DS need engineers to bring things to production
  • Engineers have too much on their plate to be willing to “productionize” cool DS projects
44. The data-driven ML innovation funnel
[Figure: funnel with stages Data Research, ML Exploration - Product Design, AB Testing]
45. Data Scientists and ML Engineers
• Solution:
  • (1) Define different parts of the innovation funnel
    • Part 1. Data research & hypothesis building -> Data Science
    • Part 2. ML solution building & implementation -> ML Engineering
    • Part 3. Online experimentation, AB testing analysis -> Data Science
  • (2) Broaden the definition of ML Engineers to range from coding experts with high-level ML knowledge to ML experts with good software skills
[Figure: funnel with stages Data Research (Data Science), ML Solution (ML Engineering), AB Testing (Data Science)]
47.
• Make sure you teach your model what you want it to learn
• Ensembles and the combination of supervised/unsupervised techniques are key in many ML applications
• It is important to focus on feature engineering
• Be thoughtful about
  • your ML infrastructure/tools
  • organizing your teams