Learning machine learning with Yellowbrick
The Model Selection Triple
Arun Kumar http://bit.ly/2abVNrI
Feature Analysis
Algorithm Selection
Hyperparameter Tuning
Feature Analysis
Use radviz or parallel
coordinates to look for
class separability
Yellowbrick Feature Visualizers
● Based on a spring tension minimization algorithm.
● Features equally spaced on a unit
circle, instances dropped into circle.
● Features pull instances towards
their position on the circle in
proportion to their normalized
numerical value for that instance.
● Classification coloring based on
labels in data.
Radial Visualization
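A minimal sketch of the corresponding code, assuming X and y are already loaded (the class names here are hypothetical):

from yellowbrick.features import RadViz

# Instantiate with the class labels used to color instances
visualizer = RadViz(classes=["unoccupied", "occupied"])
visualizer.fit(X, y)       # fit the visualizer to the data
visualizer.transform(X)    # draw the instances
visualizer.poof()          # poof() was renamed show() in Yellowbrick 1.0+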
Before and after
standardization
Parallel Coordinates
● Visualize clusters in data.
● Points represented as connected
line segments.
● Each vertical line represents one
attribute (x-axis units not
meaningful).
● One set of connected line segments
represents one instance.
● Points that tend to cluster will
appear closer together.
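A sketch of the visualizer, assuming X, y, and hypothetical class names; the normalize option produces the standardized view referenced above:

from yellowbrick.features import ParallelCoordinates

# normalize="standard" rescales each feature (the "after" view)
visualizer = ParallelCoordinates(classes=["unoccupied", "occupied"],
                                 normalize="standard")
visualizer.fit_transform(X, y)
visualizer.poof()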
Use Rank2D for pairwise feature analysis to find strong correlations (potential collinearity?)
Rank2D
● Feature engineering requires
understanding of the relationships
between features
● Visualize pairwise relationships as
a heatmap
● Pearson shows us strong
correlations, potential collinearity
● Covariance helps us understand
the sequence of relationships
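A sketch, assuming a feature matrix X; switch the algorithm argument to "covariance" for the covariance ranking:

from yellowbrick.features import Rank2D

# Pairwise Pearson correlations rendered as a heatmap
visualizer = Rank2D(algorithm="pearson")
visualizer.fit(X, y)
visualizer.transform(X)
visualizer.poof()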
PCA Projection Plots
● Uses PCA to decompose high
dimensional data into two or three
dimensions
● Each instance plotted in a scatter
plot.
● Projected dataset can be analyzed along axes of principal variation
● Can be interpreted to determine if
spherical distance metrics can be
utilized.
PCA Projection Plots
Can also plot in 3D to visualize more
components & get a better sense of
distribution in high dimensions
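Yellowbrick ships a PCA projection visualizer; the underlying idea is easy to sketch directly with scikit-learn (a hand-rolled stand-in, not the Yellowbrick API):

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Scale first so no single feature dominates the components
X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# Color by class label to inspect separability along the projection
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
plt.xlabel("principal component 1")
plt.ylabel("principal component 2")
plt.show()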
Visualize top tokens,
document distribution &
part-of-speech tagging
Feature Visualizers for Text
How do I select the right
features?
Feature Importance Plot
● Need to select the minimum
required features to produce a
valid model.
● The more features a model
contains, the more complex it is
(sparse data, errors due to
variance).
● This visualizer ranks and plots
underlying impact of features
relative to each other.
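A sketch with a tree-based model; note the FeatureImportances import path has moved between Yellowbrick releases (recent versions expose it under yellowbrick.model_selection):

from sklearn.ensemble import RandomForestClassifier
from yellowbrick.model_selection import FeatureImportances

# Ranks features by the fitted model's feature_importances_
visualizer = FeatureImportances(RandomForestClassifier(n_estimators=100))
visualizer.fit(X, y)
visualizer.poof()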
Recursive Feature Elimination
● Recursive feature elimination fits a
model and removes the weakest
feature(s) until the specified
number is reached.
● Features are ranked by internal
model’s coef_ or
feature_importances_
● Attempts to eliminate
dependencies and collinearity that
may exist in the model.
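A sketch of Yellowbrick's RFECV visualizer, which cross-validates each feature subset (the estimator and scoring choices here are illustrative):

from sklearn.linear_model import LogisticRegression
from yellowbrick.model_selection import RFECV

# Drops the weakest feature(s) each round, scoring every subset by CV
visualizer = RFECV(LogisticRegression(), cv=5, scoring="f1_weighted")
visualizer.fit(X, y)
visualizer.poof()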
Model Evaluation
Evaluating Classifiers
● How well did predicted values match actual labeled values?
● In a 2-class problem, there are two ways to be “right”:
○ Classifier correctly identifies cases (aka “True Positives”)
○ Classifier correctly identifies non-cases (aka “True Negatives”)
● ...and two ways to be “wrong”:
○ Classifier incorrectly identifies a non-case as a case (aka “False Positive” or
“Type I Error”)
○ Classifier incorrectly identifies a case as a non-case (aka “False Negative”
or “Type II Error”)
Metrics for Classification

Metric           | Measures                                                          | In Scikit-learn
Precision        | How many selected are relevant?                                   | from sklearn.metrics import precision_score
Recall           | How many relevant were selected?                                  | from sklearn.metrics import recall_score
F1               | Weighted average of precision & recall                            | from sklearn.metrics import f1_score
Confusion Matrix | True positives, true negatives, false positives, false negatives | from sklearn.metrics import confusion_matrix
ROC              | True positive rate vs. false positive rate, as classification threshold varies | from sklearn.metrics import roc_curve
AUC              | Aggregate accuracy, as classification threshold varies            | from sklearn.metrics import auc
accuracy = (true positives + true negatives) / total
precision = true positives / (true positives + false positives)
recall = true positives / (true positives + false negatives)
F1 score = 2 * ((precision * recall) / (precision + recall))
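A toy check of the formulas with made-up confusion matrix counts:

# Hypothetical counts from a 2-class confusion matrix
tp, tn, fp, fn = 80, 90, 10, 20

accuracy = (tp + tn) / (tp + tn + fp + fn)            # 0.85
precision = tp / (tp + fp)                            # ~0.889
recall = tp / (tp + fn)                               # 0.80
f1 = 2 * (precision * recall) / (precision + recall)  # ~0.842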
Visualize accuracy and begin to diagnose problems
Yellowbrick Score Visualizers
Classification Report
from sklearn.metrics import classification_report as cr
print(cr(y, yhat, target_names=target_names))
● includes same basic info as confusion matrix
● 3 different evaluation metrics: precision, recall, F1 score
● includes class labels for interpretability
Classification Heatmaps
Precision: of those labelled edible, how many actually were?
Recall: how many of the poisonous ones did our model find?
Is it better to have false positives here, or here?
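The Yellowbrick version of this heatmap, sketched for the mushroom example above (the estimator choice is arbitrary):

from sklearn.linear_model import LogisticRegression
from yellowbrick.classifier import ClassificationReport

# Precision, recall, and F1 per class as a color-coded heatmap
visualizer = ClassificationReport(LogisticRegression(),
                                  classes=["edible", "poisonous"])
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.poof()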
ROC-AUC
from sklearn.metrics import roc_curve, auc
fpr, tpr, thresholds = roc_curve(y, yhat)
roc_auc = auc(fpr, tpr)
Visualize tradeoff between classifier's sensitivity (how well it finds true
positives) and specificity (how well it avoids false positives)
● horizontal line along the top -> perfect classifier
● pulling a lot toward the upper left corner -> good accuracy
● exactly aligned with the diagonal -> coin toss
Getting more right comes at the
expense of getting more wrong
ROC-AUC
ROC-AUC for Multiclass Classification
ROC curves are typically used in
binary classification, but
Yellowbrick allows for multiclass
classification evaluation by
binarizing output (per-class) or
using one-vs-rest (micro score)
or one-vs-all (macro score)
strategies of classification.
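A sketch of the Yellowbrick visualizer, which draws per-class curves plus the micro and macro averages (class_names is assumed to be your list of label names):

from sklearn.linear_model import LogisticRegression
from yellowbrick.classifier import ROCAUC

visualizer = ROCAUC(LogisticRegression(), classes=class_names)
visualizer.fit(X_train, y_train)    # fit the model
visualizer.score(X_test, y_test)    # draw ROC curves from test data
visualizer.poof()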
Confusion Matrix
● takes as an argument actual values
and predicted values generated by
the fitted model
● outputs a confusion matrix
from sklearn.metrics import confusion_matrix
I have a lot of classes; how does my model perform on each?
Do I care about certain classes more than others?
Confusion Matrix
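A sketch of the Yellowbrick wrapper (class_names assumed, estimator arbitrary):

from sklearn.linear_model import LogisticRegression
from yellowbrick.classifier import ConfusionMatrix

cm = ConfusionMatrix(LogisticRegression(), classes=class_names)
cm.fit(X_train, y_train)
cm.score(X_test, y_test)   # populates the matrix from test predictions
cm.poof()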
Class Prediction Error Plot
Similar to confusion matrix, but sometimes more interpretable
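A sketch (estimator and labels are illustrative):

from sklearn.ensemble import RandomForestClassifier
from yellowbrick.classifier import ClassPredictionError

# A stacked bar per actual class, segmented by predicted class
visualizer = ClassPredictionError(RandomForestClassifier(),
                                  classes=class_names)
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.poof()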
Discrimination Threshold Visualizer
* for binary classification only
● Probability or score at
which positive class is
chosen over negative.
● Generally set to 50%
● Can be adjusted to
increase/decrease
sensitivity to false
positives or other
application factors
● Cases that require special
treatment?
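A sketch for a binary classifier; Yellowbrick runs its own internal trials across thresholds:

from sklearn.linear_model import LogisticRegression
from yellowbrick.classifier import DiscriminationThreshold

# Sweeps the threshold, plotting precision, recall, F1, and queue rate
visualizer = DiscriminationThreshold(LogisticRegression())
visualizer.fit(X, y)
visualizer.poof()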
Evaluating Regressors
● How well does the model describe the training data?
● How well does the model predict out-of-sample data?
○ Goodness-of-fit
○ Randomness of residuals
○ Prediction error
Metrics for Regression
Metric                                    | Measures                                                                          | In Scikit-learn
Mean Square Error (MSE, RMSE)             | distance between predicted values and actual values (more sensitive to outliers) | from sklearn.metrics import mean_squared_error
Absolute Error (MAE, RAE)                 | distance between predicted values and actual values (less sensitive to outliers) | from sklearn.metrics import mean_absolute_error, median_absolute_error
Coefficient of Determination (R-Squared)  | % of variance explained by the regression; how well future samples are likely to be predicted by the model | from sklearn.metrics import r2_score
Visualize the distribution of error to diagnose heteroscedasticity
Yellowbrick Score Visualizers
Prediction Error Plots
from sklearn.model_selection import cross_val_predict
● Cross-validation is a way of measuring model
performance.
● Divide data into training and test splits; fit model on
training, predict on test.
● Use cross_val_predict to visualize prediction errors as a
scatterplot of the predicted and actual values.
Prediction Error Plots
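A hand-rolled sketch of the same idea with scikit-learn (Yellowbrick's PredictionError visualizer, shown in the Quickstart later, wraps this):

from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_predict
import matplotlib.pyplot as plt

# Out-of-fold predictions for every instance
yhat = cross_val_predict(Lasso(), X, y, cv=5)

plt.scatter(y, yhat)       # perfect predictions fall on the 45 degree line
plt.xlabel("actual")
plt.ylabel("predicted")
plt.show()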
Plotting Residuals
● Standardized y-axis
● Model prediction on x-axis.
● Model accuracy on y-axis; distance from line at 0
indicates how good/bad the prediction was for
that value.
● Check whether residuals are consistent with
random error; data points should appear evenly
dispersed around the plotted line. Should not be
able to predict error.
● Visualize train and test data with different colors.
Plotting Residuals
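A sketch with Yellowbrick's ResidualsPlot (the estimator choice is arbitrary):

from sklearn.linear_model import Ridge
from yellowbrick.regressor import ResidualsPlot

visualizer = ResidualsPlot(Ridge())
visualizer.fit(X_train, y_train)    # residuals for the training data
visualizer.score(X_test, y_test)    # residuals for the test data
visualizer.poof()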
Metrics for Clustering ... Maybe?
● Silhouette scores
● Elbow curves
Why is my F1/R2 so low?
● What to do with a low-accuracy classifier?
● Check for class imbalance.
● Visual cue that we might
try stratified sampling,
oversampling, or getting
more data.
Class Balance
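A sketch; in recent Yellowbrick releases ClassBalance is a target visualizer fit on the labels alone (older versions wrapped a classifier, so the import path varies):

from yellowbrick.target import ClassBalance

# One bar per class; large disparities suggest resampling strategies
visualizer = ClassBalance(labels=class_names)
visualizer.fit(y)
visualizer.poof()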
Cross Validation Scores
● Real world data are often distributed somewhat unevenly; the fitted model is likely to perform better on some sections of the data than others.
● See cross-validated scores as a
bar chart (one bar for each
fold) with average score across
all folds plotted as dotted line.
● Explore variations in
performance using different
cross validation strategies.
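A sketch (the estimator, fold count, and scoring metric are illustrative):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from yellowbrick.model_selection import CVScores

# One bar per fold; the dotted line marks the mean score
cv = StratifiedKFold(n_splits=12)
visualizer = CVScores(LogisticRegression(), cv=cv, scoring="f1_weighted")
visualizer.fit(X, y)
visualizer.poof()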
Learning Curve
● Relationship of the training score
vs. the cross validated test score
for an estimator.
● Do we need more data? If the
scores converge together, then
probably not. If the training score
is much higher than the validation
score, then yes.
● Is the estimator more sensitive to
error due to variance or error due
to bias?
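A sketch (the training-size grid and estimator are illustrative):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from yellowbrick.model_selection import LearningCurve

# Train vs. cross-validated test score over growing training sizes
sizes = np.linspace(0.3, 1.0, 10)
visualizer = LearningCurve(RandomForestClassifier(), cv=5,
                           train_sizes=sizes, scoring="f1_weighted")
visualizer.fit(X, y)
visualizer.poof()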
Validation Curve
● Plot the influence of a single
hyperparameter on the training
and test data.
● Is the estimator under- or over-fitting for some hyperparameter values?
For SVC, gamma is the coefficient of the RBF kernel. The larger gamma is, the tighter the support vector is around single points (i.e. overfitting). Here, around gamma=0.1, the SVC memorizes the data.
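A sketch of that SVC/gamma example (the parameter range is illustrative):

import numpy as np
from sklearn.svm import SVC
from yellowbrick.model_selection import ValidationCurve

# Sweep gamma on a log scale and plot train vs. test scores
visualizer = ValidationCurve(SVC(), param_name="gamma",
                             param_range=np.logspace(-6, -1, 10), cv=5)
visualizer.fit(X, y)
visualizer.poof()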
Hyperparameter Tuning
Hyperparameters
● When we call fit() on an estimator, it learns the parameters of the algorithm
that make it fit the data best.
● However, some parameters are not directly learned within an estimator.
These are the ones we provide when we instantiate the estimator.
○ alpha for LASSO or Ridge
○ C, kernel, and gamma for SVC
● These parameters are often referred to as hyperparameters.
Examples:
● Alpha/penalty for regularization
● Kernel function in support vector machine
● Leaves or depth of a decision tree
● Neighbors used in a nearest neighbor classifier
● Clusters in a k-means clustering
Hyperparameters
How to pick the best hyperparameters?
● Use the defaults
● Pick randomly
● Search parameter space for the best score
(e.g. grid search)
… Except that hyperparameter space is large and grid search is slow if you don’t already know what you’re looking for.
Hyperparameters
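The search itself is plain scikit-learn; a minimal sketch (the grid values are made up):

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Cross-validates every combination in the grid
params = {"C": [0.1, 1, 10], "gamma": [0.001, 0.01, 0.1]}
search = GridSearchCV(SVC(), params, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)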
How do I tune this model?
Should I use Lasso,
Ridge, or ElasticNet?
Is regularization even working?
More alpha => less complexity
Increased bias, but reduced variance
Alpha selection with Yellowbrick
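A sketch; AlphaSelection wraps one of scikit-learn's *CV regularization models:

from sklearn.linear_model import LassoCV
from yellowbrick.regressor import AlphaSelection

# Plots cross-validated error across the alphas LassoCV tries
visualizer = AlphaSelection(LassoCV())
visualizer.fit(X, y)
visualizer.poof()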
● How many clusters do
you see?
● How do you pick an initial value for k in k-means clustering?
● How do you know
whether to increase or
decrease k?
● Is partitive clustering
the right choice?
What’s the right k?
higher silhouette scores mean denser, more separated clusters
The elbow
shows the
best value
of k…
Or suggests
a different
algorithm
K-selection with Yellowbrick
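Sketches of both visualizers (the k ranges are illustrative):

from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer, SilhouetteVisualizer

# Elbow: fit KMeans for k = 4..11 and plot the distortion score
elbow = KElbowVisualizer(KMeans(), k=(4, 12))
elbow.fit(X)
elbow.poof()

# Silhouette: per-sample silhouette coefficients for one candidate k
silhouette = SilhouetteVisualizer(KMeans(n_clusters=6))
silhouette.fit(X)
silhouette.poof()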
Manifold Visualization
● Embed instances described
by many dimensions into 2.
● Look for latent structures in
the data, noise, separability.
● Is it possible to create a
decision space in the data?
● Unlike PCA or SVD, manifolds use nearest neighbors, so they can capture non-linear structures.
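A sketch; the manifold argument picks the embedding algorithm (t-SNE here):

from yellowbrick.features import Manifold

# Nearest-neighbor embedding of the instances into 2D
visualizer = Manifold(manifold="tsne")
visualizer.fit_transform(X, y)
visualizer.poof()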
Using Yellowbrick
Install:
$ pip install yellowbrick
Upgrade:
$ pip install -U yellowbrick
Anaconda:
$ conda install -c districtdatalabs yellowbrick
Quickstart
# Import the estimator
from sklearn.linear_model import Lasso
# Instantiate the estimator
model = Lasso()
# Fit the data to the estimator
model.fit(X_train, y_train)
# Generate a prediction
model.predict(X_test)
Scikit-Learn Estimator Interface
# Import the model and visualizer
from sklearn.linear_model import Lasso
from yellowbrick.regressor import PredictionError
# Instantiate the visualizer
visualizer = PredictionError(Lasso())
# Fit
visualizer.fit(X_train, y_train)
# Score and visualize
visualizer.score(X_test, y_test)
visualizer.poof()
Yellowbrick Visualizer Interface
The main API implemented by
Scikit-Learn is that of the
estimator. An estimator is any
object that learns from data;
it may be a classification,
regression or clustering algorithm,
or a transformer that
extracts/filters useful features
from raw data.
class Estimator(object):

    def fit(self, X, y=None):
        """
        Fits estimator to data.
        """
        # set state of self
        return self

    def predict(self, X):
        """
        Predict response of X
        """
        # compute predictions pred
        return pred
Scikit-learn Estimators
Transformers are special
cases of Estimators --
instead of making
predictions, they transform
the input dataset X to a new
dataset X′.
class Transformer(Estimator):

    def transform(self, X):
        """
        Transforms the input data.
        """
        # transform X to X_prime
        return X_prime
Scikit-learn Transformers
A visualizer is an estimator that
produces visualizations based
on data rather than new
datasets or predictions.
Visualizers are intended to work
in concert with Transformers
and Estimators to shed light
onto the modeling process.
class Visualizer(Estimator):

    def draw(self):
        """
        Draw the data
        """
        self.ax.plot()

    def finalize(self):
        """
        Complete the figure
        """
        self.ax.set_title()

    def poof(self):
        """
        Show the figure
        """
        plt.show()
Yellowbrick Visualizers
Contributing
Yellowbrick is an open source project that is supported by
a community who will gratefully and humbly accept any
contributions you might make to the project.
Large or small, any contribution makes a big difference;
and if you’ve never contributed to an open source project
before, we hope you will start with Yellowbrick!
Please star Yellowbrick on GitHub!
github.com/DistrictDataLabs/yellowbrick

Editor’s Notes
1. The model selection triple. Arun Kumar did a survey of the analytical process; he's going to crop up in a bit in a more interesting way. This feels right to me, and hopefully you see something similar. Machine learning is about learning from example, and works on instances (examples). Analysts typically use an iterative, exploratory process. Cite: http://pages.cs.wisc.edu/~arun/vision/SIGMODRecord15.pdf
2. Visit the docs! http://www.scikit-yb.org/en/develop/index.html
3. For classification: potentially we want to see if there is good separability. Are some features more predictive than others?
4. We can see that the co2 values for the two classes are intertwined. We get a sense that something like a decision tree will have a hard time with this. Perhaps Gaussian instead? It will be able to use probabilities to describe the spread of those co2 values.
5. Feature engineering requires understanding of the relationships between features. Visualize pairwise relationships as a heatmap. Pearson shows us strong correlations => potential collinearity. Covariance helps us understand the sequence of relationships.
6. Uses PCA to decompose high dimensional data into two or three dimensions. Each instance is plotted in a scatter plot. The projected dataset can be analyzed along axes of principal variation, and can be interpreted to determine if spherical distance metrics can be utilized. Can also be plotted in three dimensions to attempt to visualize more components and get a better sense of the distribution in high dimensions.
7. Frequency distribution: top 50 tokens. Stochastic Neighbor Embedding: decomposition, then projection into a 2D scatterplot. Visual part-of-speech tagging.
8. The feature engineering process involves selecting the minimum required features to produce a valid model, because the more features a model contains, the more complex it is (and the more sparse the data), and therefore the more sensitive the model is to errors due to variance. A common approach to eliminating features is to describe their relative importance to a model, then eliminate weak features or combinations of features and re-evaluate to see if the model fares better during cross-validation. Many model forms describe the underlying impact of features relative to each other. This visualizer uses this attribute to rank and plot relative importances.
9. Recursive feature elimination (RFE) is a feature selection method that fits a model and removes the weakest feature (or features) until the specified number of features is reached. Features are ranked by the model's coef_ or feature_importances_ attributes, and by recursively eliminating a small number of features per loop, RFE attempts to eliminate dependencies and collinearity that may exist in the model. RFE requires a specified number of features to keep; however, it is often not known in advance how many features are valid. To find the optimal number of features, cross-validation is used with RFE to score different feature subsets and select the best scoring collection of features. The RFECV visualizer plots the number of features in the model along with their cross-validated test score and variability, and visualizes the selected number of features.
10. https://en.wikipedia.org/wiki/Precision_and_recall
11. Receiver operating characteristics / area under curve. Classification report heatmap: quickly identify strengths & weaknesses of the model; F1 vs. Type I & Type II error. Visual confusion matrix: misclassification on a per-class basis.
12. The class prediction error chart provides a way to quickly understand how good your classifier is at predicting the right classes.
13. A visualization of precision, recall, F1 score, and queue rate with respect to the discrimination threshold of a binary classifier. The discrimination threshold is the probability or score at which the positive class is chosen over the negative class. Generally, this is set to 50%, but the threshold can be adjusted to increase or decrease the sensitivity to false positives or to other application factors. One common use is to determine cases that require special treatment. For example, a fraud prevention application might use a classification algorithm to determine if a transaction is likely fraudulent and needs to be investigated in detail. Spam/not spam. Precision: an increase in precision is a reduction in the number of false positives; this metric should be optimized when the cost of special treatment is high (e.g. wasted time in fraud prevention or missing an important email). Recall: an increase in recall decreases the likelihood that the positive class is missed; this metric should be optimized when it is vital to catch the case even at the cost of more false positives (e.g. SPAM v. VIRUS). F1 Score: the F1 score is the harmonic mean between precision and recall. The fbeta parameter determines the relative weight of precision and recall when computing this metric, by default set to 1 or F1. Optimizing this metric produces the best balance between precision and recall. Queue Rate: the "queue" is the spam folder or the inbox of the fraud investigation desk. This metric describes the percentage of instances that must be reviewed. If review has a high cost (e.g. fraud prevention) then this must be minimized with respect to business requirements; if it doesn't (e.g. spam filter), this could be optimized to ensure the inbox stays clean.
14. Where/why/how is the model performing well or badly? Prediction error plot: the 45 degree line is the theoretical perfect. Residuals plot: the 0 line is no error. See change in the amount of variance between x and y, or along the x axis => heteroscedasticity.
15. Can we quickly detect class imbalance issues? Stratified sampling, oversampling, and getting more data are tricks that will help us balance. But supervised methods can mask training data; simple graphs like these give us an at-a-glance reference. As this gets into multiclass problems, domination could be harder to see and really affect modeling.
16. A learning curve shows the relationship of the training score vs. the cross validated test score for an estimator with a varying number of training samples. This visualization is typically used to show two things: how much the estimator benefits from more data (e.g. do we have "enough data", or will the estimator get better if used in an online fashion), and whether the estimator is more sensitive to error due to variance vs. error due to bias. If the training and cross validation scores converge together as more data is added (shown in the left figure), then the model will probably not benefit from more data. If the training score is much greater than the validation score (as shown in the right figure), then the model probably requires more training examples in order to generalize more effectively.
17. Plot the influence of a single hyperparameter on the training and test data to determine if the estimator is underfitting or overfitting for some hyperparameter values. For a support vector classifier, gamma is the coefficient of the RBF kernel. It controls the influence of a single example. The larger gamma is, the tighter the support vector is around single points (overfitting the model). In this visualization we see a definite inflection point around gamma=0.1. At this point the training score climbs rapidly as the SVC memorizes the data, while the cross-validation score begins to decrease as the model cannot generalize to unseen data.
18. Which regularization technique to use? Lasso (L1), Ridge (L2), or ElasticNet (L1+L2). Regularization uses a norm to penalize complexity at a rate, alpha. The higher the alpha, the more the regularization. Complexity minimization increases bias in the model, but reduces variance. Goal: select the smallest alpha such that error is minimized. Visualize the tradeoff. Surprising to see: higher alpha increasing error, alpha jumping around, etc. Embed R2, MSE, etc. into the graph as a quick reference.
19. The Manifold visualizer provides high dimensional visualization using manifold learning to embed instances described by many dimensions into 2, thus allowing the creation of a scatter plot that shows latent structures in data. Unlike decomposition methods such as PCA and SVD, manifolds generally use nearest-neighbors approaches to embedding, allowing them to capture non-linear structures that would be otherwise lost. The projections that are produced can then be analyzed for noise or separability to determine if it is possible to create a decision space in the data.
20. Estimators learn from data. They have fit and predict methods.
21. Transformers transform data. They have a transform method.
22. Visualizers can be estimators or transformers. They generally have draw, finalize, and poof methods.