SlideShare a Scribd company logo
1 of 81
Legal Notices and Disclaimers
This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES,
EXPRESS OR IMPLIED, IN THIS SUMMARY.
Intel technologies’ features and benefits depend on system configuration and may require
enabled hardware, software or service activation. Performance varies depending on system
configuration. Check with your system manufacturer or retailer or learn more at intel.com.
This sample source code is released under the Intel Sample Source Code License
Agreement.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2018, Intel Corporation. All rights reserved.
2
Predict
Data with
correct
answers
Model
Predicted
Answers
New data
without
answers
Trained
Model
Trained
Model
Fit
What is Supervised Learning?
Predict
Image Data
(Known)
Classify image or
identify if object is
in image
Image Data
(Unknown)
Fit
What is Supervised Learning?
?
Model
Trained
Model
Trained
Model
Predict
Predict if there is
a person in this
photo (yes | no)
Fit
Classification: Answers are Categories
Person
?
Model
Trained
Model
Trained
Model
Predict
Predict what time of
day it is
(Morning, Afternoon, Evening)
Fit
Classification: Answers are Categories
Morning Afternoon Evening
?
Model
Trained
Model
Trained
Model
Predict
diagnosis
( healthy| disease)
Classification: Answers are Categories
Healthy Disease
Predict
Fit
?
Model
Trained
Model
Trained
Model
Classification: Answers are Categories
Predict
Fit
?
6
3 2
4
5 7 8
1
Labeled Image
Unlabeled Image
Label
Unknown
Image
Model
Trained
Model
Trained
Model
Example Each data point
(one row)
Target Predicted property
(column to predict)
Feature A property of the point used for
prediction
(non-target columns in the model)
Label Target / category of the point
(value of target column)
Learning Terminology
Feature 1 Feature 2 Target
Example 1 Label 1
Example2 Label 2
Predict
Model
Predict if there is
a person in this
photo (yes | no)
Model
Model
Fit
Let's Revisit Predicting a Person
Person
?
Example 1, 2, 3
Target Yes, Yes, No
Feature Skin-like color
People-like shapes
Label Yes
No
Learning Terminology
Skin-Like Color People-Like
Shapes
Target
Example 1 Yes
Example 2 Yes
Example 3 No
Features and Labels
In this case we have:
Two features
• Skin-like color
• People-like shapes
Two labels
• yes
• no 0 10 20
Target
Label
Label
Example 1
Example 2
Example 3
yes
no
# Rectangular Shapes Detected
#PixelsinSkinHue
Features and Labels
In this case we have:
Two features
• Skin-like color
• People-like shapes
Two labels
• Yes
• No
0 10 20
# Rectangular Shapes Detected
Features and Labels
In this case we have:
Two features
• Skin-like color
• People-like shapes
Two labels
• Yes
• No
New Example
known: #nodes
predict: yes | no
0 10 20
# Rectangular Shapes Detected
# Rectangular Shapes Detected
0 10 20
#PixelsinSkinHue
60
40
20
Features and Labels
Features and Labels
In this case we have:
Two features
• Skin-like color
• People-like shapes
Two labels
• Yes
• No 0 10 20
60
40
20
# Rectangular Shapes Detected
#PixelsinSkinHue
Features and Labels
In this case we have:
Two features
• Skin-like color
• People-like shapes
Two labels
• Yes
• No 0 10 20
60
40
New Example
known: #skin color, shapes
predict: yes | no # Rectangular Shapes Detected
#PixelsinSkinHue
Predict
Model
Is this city Pittsburgh?
What time of day is it?
(Morning, Afternoon, Evening)
Model
Model
Fit
Let's Predict City and Time of Day
Morning Afternoon Evening
?
Degree of Illumination
0 10 20
Corners
60
40
20
Features and Labels
In this case we have:
Two features
• Illumination
• # corners detected against
background
Three labels
• Morning
• Afternoon
• Evening
0 10 20
60
40
20
New Example
known: #illumination, corners
predict:
morning| afternoon | evening
Degree of Illumination
0 10 20
Corners
60
40
20
Features and Labels
In this case we have:
Two features
• Illumination
• # corners detected against
background
Three labels
• Morning
• Afternoon
• Evening
Evaluation Techniques
Last week we looked at evaluation metrics; now we'll look at effectively estimating these
for a learner:
• Train test
• Cross validation
• LOO (leave one out)
Training and Tests
Fold 1
Fold 2
Image # Feature 1 Feature 2 Feature 3 Feature 4 Feature 5 Target
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Fold 3
Training and Test Sets
Training set (actual)
• Fit the model
Test set (predicted)
• Measure performance
Predict y with model
Compare with actual y
Measure error
Training and Test Sets
From simple workflow in notebook:
import sklearn.model_selection as skms
(data_train, data_tst, tgts_train, tgts_tst) = skms.train_test_split(data, tgts, test_size=.2)
from sklearn import neighbors
# create and fit modelknn_classifier = neighbors.KNeighborsClassifier(n_neighbors=3)
knn_classifier.fit(data_train, tgts_train)
predicted = knn_classifier.predict(data_tst)
Accuracy = skms.accuracy(predicted, tgts_test)
Cross-Validation
Folds do not overlap
Each fold is the same size
Every example is in a fold
Fold 1
Fold 2
Image # Feature 1 Feature 2 Feature 3 Feature 4 Feature 5 Target
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Fold 3
Cross-Validation
Best fit for Fold 1
What about Folds 2 and 3?
Training Test
Leave One Out Cross-Validation
Removes one datapoint from the dataset at
a time.
Train from the N-1 datapoints
• N = number of datapoints
• In this example, N = 24
This is repeated for each datapoint in the
set.
You can also think of the as N-fold cross
validation
• That is, 24 datapoints is 24-folds
Image # Feature 1 Feature 2 Feature 3 Feature 4 Feature 5 Target
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Learning Methods
Supervised learning techniques
• K nearest neighbor (KNN)
• Support vector machines (SVM)
Unsupervised learning techniques
• Principal components analysis (PCA)
• Clustering
K Nearest Neighbor (KNN)
KNN
• Memorize the dataset
For prediction, find example most like me
• Without preprocessing, fit is instant – but prediction takes more work.
• With preprocessing, spend some up-front cost to organize the data, then prediction
is (relatively) faster.
• Requires significant memory because it saves some form of the entire dataset.
K Nearest Neighbor (KNN)
K= # nearest neighbors used for predictor
0 10 20
Feature
Predict Node
Nearest Neighbor
1
2
K = 3
3
KNN
K = 3
Predict the node through
its nearest neighbors
0 10 20
60
40
20
Predict Node
# Rectangular Shapes Detected
#PixelsinSkinHue
KNN
Use KNN to make a decision boundary
If the node falls to the right
• Decision = male (red)
If node falls to the left
• Decision = female (blue)
Note false positive and negatives.
0 10
.
40
20
Predict Node
# Rectangular Shapes Detected
#PixelsinSkinHue
KNN
Multiclass decision boundaries break the data into three or more subsets.
Scaling is critical for determining the best decision boundary.
• If data is scaled too closely together in one dimension, examples may be too
similar distance-wise compared to actual class similarity.
The number of nearest neighbors used for analysis is also critical.
• Too small a K will give too wiggly a border.
Example of overfitting or following noise
• Too big K will give too flat a border.
Example of underfitting ignoring signal
Support Vector Machines
Used for discriminative classification
Divides classes of data into groups by drawing a border between them.
• Unlike KNN, it goes straight to the decision boundary
• Remembers boundary, throws out data
Is a maximum margin estimator.
• The border between classes that maximizes the distance to examples from that
class
SVM: Linear Boundaries
There are many (infinite) possibilities.
0 10
40
20
SVM: Linear Boundaries
So, let's use a margin around the linear
boundary.
• Maximum-margin is SVM's preferred line
The examples on the margin lines are
support vectors.
Now, the only important examples to
keep track of are the support vectors.
For new data, see which side of the line it
falls on.
0 10
40
20
SVM: Kernel
How would you separate this data?
Not with a line.
Need to get fancy.
Use linear algebra tricks to rewrite this data in
more complicated terms:
• Polynomials of the inputs
• Distances from one input to another
• These are called kernels
Non-Linear Data
You must pick the right projection to separate the classes.
Kernel transformation computes the basis function for every example.
• Can be too cumbersome when you get large datasets
• Instead, we can use the kernel trick
Implicitly fit kernel-transformed examples
Does not build explicit (expensive) table
Non-Linear Data
Principal Components Analysis (PCA)
What are we trying to achieve?
• Improve clustering
• Improve classification
• Dealing with sparse features
• Visualizing high-dimensional 2D or 3D data
• Minimal loss data compression
PCA: 2D to 1D
You can weight your existing dimensions (2D) to give you 1D data.
• Data as a point-cloud
• Fit ellipse
• Find perpendicular axes
Directions of maximum variation
• Describe data in terms of these directions
Orient the data in a new way
PCA: 2D to 1D
You can weight your existing dimensions (2D) to give you 1D data.
PCA: 2D to 1D
PCA: 2D to 1D
This may clarify or simplify the boundary between classes….
PCA: 3D to 2D
PCA: 3D to 2D
PCA: 3D to 2D
Math speak:
• The new axes are eigenvectors
• The new axes are calculated from
covariance of the matrix features
via a singular value decomposition
Eigenfaces: A PCA Example
Original data
foriginal dimensional plot
Eigenfaces: A PCA Example
PCA data
New freduced dimensional plot
• Removed some number of features from our representation
• Redescribe original faces in terms of these primitive faces
that act like axes
• Similar to redescribing images in terms of circles (which we
saw with Fourier transforms)
PCA
When choosing dimensions for your feature extraction:
• Think about combining features to reduce dimensionality.
That is, instead of analyzing pixel intensity at each color (RGB), can you analyze pixel
intensity (grayscale)?
• You will need to perform hypothesis-driven trials and measure your model’s
performance.
Performing PCA Using scikit-learn*
from sklearn.decomposition import PCA
reducer = PCA( n_components = 20 )
reduced_X = reducer.fit_transform(X)
# “model” could be any trained model
model.fit(reduced_X,Y)
Performing PCA Using scikit-learn*
When you need to predict:
reduced_X_new = reducer.transform(X_new)
model.predict(reduced_X_new)
# “model” is the same model from the prior slide; could be any model.
Predict
Unlabeled
Data
(no answers)
Algorithm
Place of data in
new structure
New
Unlabeled
Data
Supervised
Model
Structure
Fit
What is Unsupervised Learning?
Unsupervised Learning
Find structure in unlabeled
data.
How?
Use K-means
0 10 20
60
40
20
# Rectangular Shapes Detected
#PixelsinSkinHue
K-Means Algorithms
K = 2 (find two clusters)
Randomly assign two cluster
centers.
# Rectangular Shapes Detected
#PixelsinSkinHue
K-Means Algorithms
Each point belongs to the closest
center.
# Rectangular Shapes Detected
#PixelsinSkinHue
K-Means Algorithms
Move each center to the cluster’s
mean.
# Rectangular Shapes Detected
#PixelsinSkinHue
K-Means Algorithms
Each point belongs to the new
cluster’s mean.
# Rectangular Shapes Detected
#PixelsinSkinHue
K-Means Algorithms
Move each center to the new
cluster’s mean.
# Rectangular Shapes Detected
#PixelsinSkinHue
K-Means Algorithms
Each point belongs to the new
cluster’s mean.
The points don’t change anymore.
They are converged!
# Rectangular Shapes Detected
#PixelsinSkinHue
K-Means Algorithms
K = 3 (find three clusters)
Randomly assign three cluster
centers.
# Rectangular Shapes Detected
#PixelsinSkinHue
K-Means Algorithms
Each point belongs to the closest
center.
# Rectangular Shapes Detected
#PixelsinSkinHue
K-Means Algorithms
Move each center to the cluster’s
mean.
# Rectangular Shapes Detected
#PixelsinSkinHue
K-Means Algorithms
Each point belongs to the new
cluster’s mean.
The points don’t change anymore.
They are converged!
# Rectangular Shapes Detected
#PixelsinSkinHue
Other scikit-learn* Tools
GridSearch
• Specify a space of parameters and evaluate model(s) at those params.
Pipelines
• Create a built/fit/predictable sequence of steps.
• Fitting w.ill fit each of the components in turn
Bag-of-Words Learning
Image classification and recognition
Complicated architecture
Builds a vocabulary for an image based on its features.
BOW Process: Step 1
Take known image and find descriptors of keypoints.
Make a table with all information about keypoints.
• Number of columns are fixed between images.
Columns are how we describe a keypoint
• Number of rows vary between images.
Rows represent number of keypoints
BOW Process: Step 1
Take known image and find descriptors of keypoints.
• Creating local words
Description of Keypoint (columns; fixed #)
#ofKeypoints(rows;varies)
BOW Process: Step1
Repeat for each image in your dataset. Description of Keypoint (columns; fixed #)
#ofKeypoints(rows;varies)
BOW Process: Step 2
Combine ALL individual descriptors.
• Local vocabularies
• Put the rows in space and group clusters of data
Group similar descriptors together
• # of clusters is the number of global words to be used for further analysis (current global
vocabulary: 1, 2, 3, 4, 5, 6, 7).
BOW Process: Step 2
Convert all local words to same number of global words.
Description of Keypoint (local)
#ofKeypoints(Local)
7
6
5
3
6
1
Represents ONE and only ONE row
But it represents the ENTIRE row
BOW Process: Step 3
Redescribe known images in terms of the global vocabulary.
Generate histograms with counts of global words.
And likewise for other colors, empty cells are zero.
1 1 2
g1 g2 g3 g4 g5 g6 g7
Image1
Image2
Image3
Image4
Image5
Image6
Airplane
Airplane
Zebra
Zebra
Piano
Piano
7
7
5
3
BOW Process: Step 4
Build an SVM model predicting from histograms (represented in rows)  object class
Airplane Airplane Zebra Zebra Piano Piano
BOW Process: Step 5
With a new example:
• Describe with regional vocabulary
• Convert to global vocabulary
Use the regional word most similar to the global word
• Create histogram representation
• Feed to SVM to make prediction about class
Imagine the brown zebra example without a known label.
BOW Details
Local vocabularies come from feature descriptors
• SIFT, ORB, and so on
Global vocabulary comes from clustering the local vocabularies
• Convert regional to global by looking up cluster (global term) for a local descriptor
• K-means clustering

More Related Content

What's hot

Ot regularization and_gradient_descent
Ot regularization and_gradient_descentOt regularization and_gradient_descent
Ot regularization and_gradient_descentankit_ppt
 
08 neural networks
08 neural networks08 neural networks
08 neural networksankit_ppt
 
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hakky St
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine LearningPranav Challa
 
Ml5 svm and-kernels
Ml5 svm and-kernelsMl5 svm and-kernels
Ml5 svm and-kernelsankit_ppt
 
Decision Trees
Decision TreesDecision Trees
Decision TreesStudent
 
Support Vector Machines- SVM
Support Vector Machines- SVMSupport Vector Machines- SVM
Support Vector Machines- SVMCarlo Carandang
 
07 dimensionality reduction
07 dimensionality reduction07 dimensionality reduction
07 dimensionality reductionMarco Quartulli
 
Ml2 train test-splits_validation_linear_regression
Ml2 train test-splits_validation_linear_regressionMl2 train test-splits_validation_linear_regression
Ml2 train test-splits_validation_linear_regressionankit_ppt
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Simplilearn
 
Machine Learning using Support Vector Machine
Machine Learning using Support Vector MachineMachine Learning using Support Vector Machine
Machine Learning using Support Vector MachineMohsin Ul Haq
 
Data Science - Part IX - Support Vector Machine
Data Science - Part IX -  Support Vector MachineData Science - Part IX -  Support Vector Machine
Data Science - Part IX - Support Vector MachineDerek Kane
 
Support Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using WekaSupport Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using WekaMacha Pujitha
 
Curse of dimensionality
Curse of dimensionalityCurse of dimensionality
Curse of dimensionalityNikhil Sharma
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality ReductionSaad Elbeleidy
 

What's hot (20)

Ot regularization and_gradient_descent
Ot regularization and_gradient_descentOt regularization and_gradient_descent
Ot regularization and_gradient_descent
 
08 neural networks
08 neural networks08 neural networks
08 neural networks
 
Ml7 bagging
Ml7 baggingMl7 bagging
Ml7 bagging
 
PCA and SVD in brief
PCA and SVD in briefPCA and SVD in brief
PCA and SVD in brief
 
Matrix Factorization
Matrix FactorizationMatrix Factorization
Matrix Factorization
 
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learning
 
Ml5 svm and-kernels
Ml5 svm and-kernelsMl5 svm and-kernels
Ml5 svm and-kernels
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Machine Learning Basics
Machine Learning BasicsMachine Learning Basics
Machine Learning Basics
 
Support Vector Machines- SVM
Support Vector Machines- SVMSupport Vector Machines- SVM
Support Vector Machines- SVM
 
07 dimensionality reduction
07 dimensionality reduction07 dimensionality reduction
07 dimensionality reduction
 
Ml2 train test-splits_validation_linear_regression
Ml2 train test-splits_validation_linear_regressionMl2 train test-splits_validation_linear_regression
Ml2 train test-splits_validation_linear_regression
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
 
Machine Learning using Support Vector Machine
Machine Learning using Support Vector MachineMachine Learning using Support Vector Machine
Machine Learning using Support Vector Machine
 
Data Science - Part IX - Support Vector Machine
Data Science - Part IX -  Support Vector MachineData Science - Part IX -  Support Vector Machine
Data Science - Part IX - Support Vector Machine
 
Support Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using WekaSupport Vector Machine and Implementation using Weka
Support Vector Machine and Implementation using Weka
 
SVM
SVMSVM
SVM
 
Curse of dimensionality
Curse of dimensionalityCurse of dimensionality
Curse of dimensionality
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 

Similar to 07 learning

Introduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesIntroduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesSơn Còm Nhom
 
Application of Machine Learning in Agriculture
Application of Machine  Learning in AgricultureApplication of Machine  Learning in Agriculture
Application of Machine Learning in AgricultureAman Vasisht
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learningAmAn Singh
 
Decision trees
Decision treesDecision trees
Decision treesNcib Lotfi
 
BSSML16 L3. Clusters and Anomaly Detection
BSSML16 L3. Clusters and Anomaly DetectionBSSML16 L3. Clusters and Anomaly Detection
BSSML16 L3. Clusters and Anomaly DetectionBigML, Inc
 
Machine Learning in e commerce - Reboot
Machine Learning in e commerce - RebootMachine Learning in e commerce - Reboot
Machine Learning in e commerce - RebootMarion DE SOUSA
 
Practical Data Science: Data Modelling and Presentation
Practical Data Science: Data Modelling and PresentationPractical Data Science: Data Modelling and Presentation
Practical Data Science: Data Modelling and PresentationHariniMS1
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyAlon Bochman, CFA
 
Image-Based E-Commerce Product Discovery: A Deep Learning Case Study - Denis ...
Image-Based E-Commerce Product Discovery: A Deep Learning Case Study - Denis ...Image-Based E-Commerce Product Discovery: A Deep Learning Case Study - Denis ...
Image-Based E-Commerce Product Discovery: A Deep Learning Case Study - Denis ...Lucidworks
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionTe-Yen Liu
 
Decision tree induction
Decision tree inductionDecision tree induction
Decision tree inductionthamizh arasi
 
background.pptx
background.pptxbackground.pptx
background.pptxKabileshCm
 
House Sale Price Prediction
House Sale Price PredictionHouse Sale Price Prediction
House Sale Price Predictionsriram30691
 
Ml1 introduction to-supervised_learning_and_k_nearest_neighbors
Ml1 introduction to-supervised_learning_and_k_nearest_neighborsMl1 introduction to-supervised_learning_and_k_nearest_neighbors
Ml1 introduction to-supervised_learning_and_k_nearest_neighborsankit_ppt
 
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Simplilearn
 

Similar to 07 learning (20)

Introduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesIntroduction to Datamining Concept and Techniques
Introduction to Datamining Concept and Techniques
 
Application of Machine Learning in Agriculture
Application of Machine  Learning in AgricultureApplication of Machine  Learning in Agriculture
Application of Machine Learning in Agriculture
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
 
Decision trees
Decision treesDecision trees
Decision trees
 
LR2. Summary Day 2
LR2. Summary Day 2LR2. Summary Day 2
LR2. Summary Day 2
 
BSSML16 L3. Clusters and Anomaly Detection
BSSML16 L3. Clusters and Anomaly DetectionBSSML16 L3. Clusters and Anomaly Detection
BSSML16 L3. Clusters and Anomaly Detection
 
Machine Learning in e commerce - Reboot
Machine Learning in e commerce - RebootMachine Learning in e commerce - Reboot
Machine Learning in e commerce - Reboot
 
forest-cover-type
forest-cover-typeforest-cover-type
forest-cover-type
 
ML-Unit-4.pdf
ML-Unit-4.pdfML-Unit-4.pdf
ML-Unit-4.pdf
 
PCA.pptx
PCA.pptxPCA.pptx
PCA.pptx
 
Practical Data Science: Data Modelling and Presentation
Practical Data Science: Data Modelling and PresentationPractical Data Science: Data Modelling and Presentation
Practical Data Science: Data Modelling and Presentation
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case Study
 
Image-Based E-Commerce Product Discovery: A Deep Learning Case Study - Denis ...
Image-Based E-Commerce Product Discovery: A Deep Learning Case Study - Denis ...Image-Based E-Commerce Product Discovery: A Deep Learning Case Study - Denis ...
Image-Based E-Commerce Product Discovery: A Deep Learning Case Study - Denis ...
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis Introduction
 
Decision tree induction
Decision tree inductionDecision tree induction
Decision tree induction
 
PPT s10-machine vision-s2
PPT s10-machine vision-s2PPT s10-machine vision-s2
PPT s10-machine vision-s2
 
background.pptx
background.pptxbackground.pptx
background.pptx
 
House Sale Price Prediction
House Sale Price PredictionHouse Sale Price Prediction
House Sale Price Prediction
 
Ml1 introduction to-supervised_learning_and_k_nearest_neighbors
Ml1 introduction to-supervised_learning_and_k_nearest_neighborsMl1 introduction to-supervised_learning_and_k_nearest_neighbors
Ml1 introduction to-supervised_learning_and_k_nearest_neighbors
 
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
 

More from ankit_ppt

Deep learning summary
Deep learning summaryDeep learning summary
Deep learning summaryankit_ppt
 
01 foundations
01 foundations01 foundations
01 foundationsankit_ppt
 
Text similarity measures
Text similarity measuresText similarity measures
Text similarity measuresankit_ppt
 
Text generation and_advanced_topics
Text generation and_advanced_topicsText generation and_advanced_topics
Text generation and_advanced_topicsankit_ppt
 
Nlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniquesNlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniquesankit_ppt
 
Latent dirichlet allocation_and_topic_modeling
Latent dirichlet allocation_and_topic_modelingLatent dirichlet allocation_and_topic_modeling
Latent dirichlet allocation_and_topic_modelingankit_ppt
 
Intro to nlp
Intro to nlpIntro to nlp
Intro to nlpankit_ppt
 
Ml9 introduction to-unsupervised_learning_and_clustering_methods
Ml9 introduction to-unsupervised_learning_and_clustering_methodsMl9 introduction to-unsupervised_learning_and_clustering_methods
Ml9 introduction to-unsupervised_learning_and_clustering_methodsankit_ppt
 
Ml8 boosting and-stacking
Ml8 boosting and-stackingMl8 boosting and-stacking
Ml8 boosting and-stackingankit_ppt
 
Ml6 decision trees
Ml6 decision treesMl6 decision trees
Ml6 decision treesankit_ppt
 
Ml4 naive bayes
Ml4 naive bayesMl4 naive bayes
Ml4 naive bayesankit_ppt
 
Lesson 3 ai in the enterprise
Lesson 3   ai in the enterpriseLesson 3   ai in the enterprise
Lesson 3 ai in the enterpriseankit_ppt
 
Ml3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metricsMl3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metricsankit_ppt
 
Lesson 2 ai in industry
Lesson 2   ai in industryLesson 2   ai in industry
Lesson 2 ai in industryankit_ppt
 
Lesson 5 arima
Lesson 5 arimaLesson 5 arima
Lesson 5 arimaankit_ppt
 

More from ankit_ppt (16)

Deep learning summary
Deep learning summaryDeep learning summary
Deep learning summary
 
01 foundations
01 foundations01 foundations
01 foundations
 
Word2 vec
Word2 vecWord2 vec
Word2 vec
 
Text similarity measures
Text similarity measuresText similarity measures
Text similarity measures
 
Text generation and_advanced_topics
Text generation and_advanced_topicsText generation and_advanced_topics
Text generation and_advanced_topics
 
Nlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniquesNlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniques
 
Latent dirichlet allocation_and_topic_modeling
Latent dirichlet allocation_and_topic_modelingLatent dirichlet allocation_and_topic_modeling
Latent dirichlet allocation_and_topic_modeling
 
Intro to nlp
Intro to nlpIntro to nlp
Intro to nlp
 
Ml9 introduction to-unsupervised_learning_and_clustering_methods
Ml9 introduction to-unsupervised_learning_and_clustering_methodsMl9 introduction to-unsupervised_learning_and_clustering_methods
Ml9 introduction to-unsupervised_learning_and_clustering_methods
 
Ml8 boosting and-stacking
Ml8 boosting and-stackingMl8 boosting and-stacking
Ml8 boosting and-stacking
 
Ml6 decision trees
Ml6 decision treesMl6 decision trees
Ml6 decision trees
 
Ml4 naive bayes
Ml4 naive bayesMl4 naive bayes
Ml4 naive bayes
 
Lesson 3 ai in the enterprise
Lesson 3   ai in the enterpriseLesson 3   ai in the enterprise
Lesson 3 ai in the enterprise
 
Ml3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metricsMl3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metrics
 
Lesson 2 ai in industry
Lesson 2   ai in industryLesson 2   ai in industry
Lesson 2 ai in industry
 
Lesson 5 arima
Lesson 5 arimaLesson 5 arima
Lesson 5 arima
 

Recently uploaded

"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"mphochane1998
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationBhangaleSonal
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadhamedmustafa094
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxJuliansyahHarahap1
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...Amil baba
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilVinayVitekari
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesMayuraD1
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxmaisarahman1
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiessarkmank1
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdfKamal Acharya
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxSCMS School of Architecture
 
Verification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxVerification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxchumtiyababu
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityMorshed Ahmed Rahath
 

Recently uploaded (20)

"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech Civil
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Verification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxVerification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptx
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 

07 learning

  • 1.
  • 2. Legal Notices and Disclaimers This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer or learn more at intel.com. This sample source code is released under the Intel Sample Source Code License Agreement. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Copyright © 2018, Intel Corporation. All rights reserved. 2
  • 3.
  • 5. Predict Image Data (Known) Classify image or identify if object is in image Image Data (Unknown) Fit What is Supervised Learning? ? Model Trained Model Trained Model
  • 6. Predict Predict if there is a person in this photo (yes | no) Fit Classification: Answers are Categories Person ? Model Trained Model Trained Model
  • 7. Predict Predict what time of day it is (Morning, Afternoon, Evening) Fit Classification: Answers are Categories Morning Afternoon Evening ? Model Trained Model Trained Model
  • 8. Predict diagnosis ( healthy| disease) Classification: Answers are Categories Healthy Disease Predict Fit ? Model Trained Model Trained Model
  • 9. Classification: Answers are Categories Predict Fit ? 6 3 2 4 5 7 8 1 Labeled Image Unlabeled Image Label Unknown Image Model Trained Model Trained Model
  • 10. Example Each data point (one row) Target Predicted property (column to predict) Feature A property of the point used for prediction (non-target columns in the model) Label Target / category of the point (value of target column) Learning Terminology Feature 1 Feature 2 Target Example 1 Label 1 Example2 Label 2
  • 11. Predict Model Predict if there is a person in this photo (yes | no) Model Model Fit Let's Revisit Predicting a Person Person ?
  • 12. Example 1, 2, 3 Target Yes, Yes, No Feature Skin-like color People-like shapes Label Yes No Learning Terminology Skin-Like Color People-Like Shapes Target Example 1 Yes Example 2 Yes Example 3 No
  • 13. Features and Labels In this case we have: Two features • Skin-like color • People-like shapes Two labels • yes • no 0 10 20 Target Label Label Example 1 Example 2 Example 3 yes no # Rectangular Shapes Detected #PixelsinSkinHue
  • 14. Features and Labels In this case we have: Two features • Skin-like color • People-like shapes Two labels • Yes • No 0 10 20 # Rectangular Shapes Detected
  • 15. Features and Labels In this case we have: Two features • Skin-like color • People-like shapes Two labels • Yes • No New Example known: #nodes predict: yes | no 0 10 20 # Rectangular Shapes Detected
  • 16. # Rectangular Shapes Detected 0 10 20 #PixelsinSkinHue 60 40 20 Features and Labels
  • 17. Features and Labels In this case we have: Two features • Skin-like color • People-like shapes Two labels • Yes • No 0 10 20 60 40 20 # Rectangular Shapes Detected #PixelsinSkinHue
  • 18. Features and Labels In this case we have: Two features • Skin-like color • People-like shapes Two labels • Yes • No 0 10 20 60 40 New Example known: #skin color, shapes predict: yes | no # Rectangular Shapes Detected #PixelsinSkinHue
  • 19. Predict Model Is this city Pittsburgh? What time of day is it? (Morning, Afternoon, Evening) Model Model Fit Let's Predict City and Time of Day Morning Afternoon Evening ?
  • 20. Degree of Illumination 0 10 20 Corners 60 40 20 Features and Labels In this case we have: Two features • Illumination • # corners detected against background Three labels • Morning • Afternoon • Evening
  • 21. 0 10 20 60 40 20 New Example known: #illumination, corners predict: morning| afternoon | evening Degree of Illumination 0 10 20 Corners 60 40 20 Features and Labels In this case we have: Two features • Illumination • # corners detected against background Three labels • Morning • Afternoon • Evening
  • 22.
  • 23. Evaluation Techniques Last week we looked at evaluation metrics; now we'll look at effectively estimating these for a learner: • Train test • Cross validation • LOO (leave one out)
  • 24. Training and Tests Fold 1 Fold 2 Image # Feature 1 Feature 2 Feature 3 Feature 4 Feature 5 Target 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Fold 3
  • 25. Training and Test Sets Training set (actual) • Fit the model Test set (predicted) • Measure performance Predict y with model Compare with actual y Measure error
  • 26. Training and Test Sets From simple workflow in notebook: import sklearn.model_selection as skms (data_train, data_tst, tgts_train, tgts_tst) = skms.train_test_split(data, tgts, test_size=.2) from sklearn import neighbors # create and fit modelknn_classifier = neighbors.KNeighborsClassifier(n_neighbors=3) knn_classifier.fit(data_train, tgts_train) predicted = knn_classifier.predict(data_tst) Accuracy = skms.accuracy(predicted, tgts_test)
  • 27. Cross-Validation Folds do not overlap Each fold is the same size Every example is in a fold Fold 1 Fold 2 Image # Feature 1 Feature 2 Feature 3 Feature 4 Feature 5 Target 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Fold 3
  • 28. Cross-Validation Best fit for Fold 1 What about Folds 2 and 3? Training Test
  • 29. Leave One Out Cross-Validation Removes one datapoint from the dataset at a time. Train from the N-1 datapoints • N = number of datapoints • In this example, N = 24 This is repeated for each datapoint in the set. You can also think of the as N-fold cross validation • That is, 24 datapoints is 24-folds Image # Feature 1 Feature 2 Feature 3 Feature 4 Feature 5 Target 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
  • 30.
  • 31. Learning Methods Supervised learning techniques • K nearest neighbor (KNN) • Support vector machines (SVM) Unsupervised learning techniques • Principal components analysis (PCA) • Clustering
  • 32. K Nearest Neighbor (KNN) KNN • Memorize the dataset For prediction, find example most like me • Without preprocessing, fit is instant – but prediction takes more work. • With preprocessing, spend some up-front cost to organize the data, then prediction is (relatively) faster. • Requires significant memory because it saves some form of the entire dataset.
  • 33. K Nearest Neighbor (KNN) K= # nearest neighbors used for predictor 0 10 20 Feature Predict Node Nearest Neighbor 1 2 K = 3 3
  • 34. KNN K = 3 Predict the node through its nearest neighbors 0 10 20 60 40 20 Predict Node # Rectangular Shapes Detected #PixelsinSkinHue
  • 35. KNN Use KNN to make a decision boundary If the node falls to the right • Decision = male (red) If node falls to the left • Decision = female (blue) Note false positive and negatives. 0 10 . 40 20 Predict Node # Rectangular Shapes Detected #PixelsinSkinHue
  • 36. KNN Multiclass decision boundaries break the data into three or more subsets. Scaling is critical for determining the best decision boundary. • If data is scaled too closely together in one dimension, examples may be too similar distance-wise compared to actual class similarity. The number of nearest neighbors used for analysis is also critical. • Too small a K will give too wiggly a border. Example of overfitting or following noise • Too big K will give too flat a border. Example of underfitting ignoring signal
  • 37. Support Vector Machines Used for discriminative classification Divides classes of data into groups by drawing a border between them. • Unlike KNN, it goes straight to the decision boundary • Remembers boundary, throws out data Is a maximum margin estimator. • The border between classes that maximizes the distance to examples from that class
  • 38. SVM: Linear Boundaries There are many (infinite) possibilities. 0 10 40 20
  • 39. SVM: Linear Boundaries So, let's use a margin around the linear boundary. • Maximum-margin is SVM's preferred line The examples on the margin lines are support vectors. Now, the only important examples to keep track of are the support vectors. For new data, see which side of the line it falls on. 0 10 40 20
  • 40. SVM: Kernel How would you separate this data? Not with a line. Need to get fancy. Use linear algebra tricks to rewrite this data in more complicated terms: • Polynomials of the inputs • Distances from one input to another • These are called kernels
  • 41. Non-Linear Data You must pick the right projection to separate the classes. Kernel transformation computes the basis function for every example. • Can be too cumbersome when you get large datasets • Instead, we can use the kernel trick Implicitly fit kernel-transformed examples Does not build explicit (expensive) table
  • 43.
  • 44. Principal Components Analysis (PCA) What are we trying to achieve? • Improve clustering • Improve classification • Dealing with sparse features • Visualizing high-dimensional 2D or 3D data • Minimal loss data compression
  • 45. PCA: 2D to 1D You can weight your existing dimensions (2D) to give you 1D data. • Data as a point-cloud • Fit ellipse • Find perpendicular axes Directions of maximum variation • Describe data in terms of these directions Orient the data in a new way
  • 46. PCA: 2D to 1D You can weight your existing dimensions (2D) to give you 1D data.
  • 48. PCA: 2D to 1D This may clarify or simplify the boundary between classes….
  • 51. PCA: 3D to 2D Math speak: • The new axes are eigenvectors • The new axes are calculated from covariance of the matrix features via a singular value decomposition
  • 52. Eigenfaces: A PCA Example Original data foriginal dimensional plot
  • 53. Eigenfaces: A PCA Example PCA data New freduced dimensional plot • Removed some number of features from our representation • Redescribe original faces in terms of these primitive faces that act like axes • Similar to redescribing images in terms of circles (which we saw with Fourier transforms)
  • 54. PCA When choosing dimensions for your feature extraction: • Think about combining features to reduce dimensionality. That is, instead of analyzing pixel intensity at each color (RGB), can you analyze pixel intensity (grayscale)? • You will need to perform hypothesis-driven trials and measure your model’s performance.
  • 55. Performing PCA Using scikit-learn* from sklearn.decomposition import PCA reducer = PCA( n_components = 20 ) reduced_X = reducer.fit_transform(X) # “model” could be any trained model model.fit(reduced_X,Y)
  • 56. Performing PCA Using scikit-learn* When you need to predict: reduced_X_new = reducer.transform(X_new) model.predict(reduced_X_new) # “model” is the same model from the prior slide; could be any model.
  • 57.
  • 58. Predict Unlabeled Data (no answers) Algorithm Place of data in new structure New Unlabeled Data Supervised Model Structure Fit What is Unsupervised Learning?
  • 59. Unsupervised Learning Find structure in unlabeled data. How? Use K-means 0 10 20 60 40 20 # Rectangular Shapes Detected #PixelsinSkinHue
  • 60. K-Means Algorithms K = 2 (find two clusters) Randomly assign two cluster centers. # Rectangular Shapes Detected #PixelsinSkinHue
  • 61. K-Means Algorithms Each point belongs to the closest center. # Rectangular Shapes Detected #PixelsinSkinHue
  • 62. K-Means Algorithms Move each center to the cluster’s mean. # Rectangular Shapes Detected #PixelsinSkinHue
  • 63. K-Means Algorithms Each point belongs to the new cluster’s mean. # Rectangular Shapes Detected #PixelsinSkinHue
  • 64. K-Means Algorithms Move each center to the new cluster’s mean. # Rectangular Shapes Detected #PixelsinSkinHue
  • 65. K-Means Algorithms Each point belongs to the new cluster’s mean. The points don’t change anymore. They are converged! # Rectangular Shapes Detected #PixelsinSkinHue
  • 66. K-Means Algorithms K = 3 (find three clusters) Randomly assign three cluster centers. # Rectangular Shapes Detected #PixelsinSkinHue
  • 67. K-Means Algorithms Each point belongs to the closest center. # Rectangular Shapes Detected #PixelsinSkinHue
  • 68. K-Means Algorithms Move each center to the cluster’s mean. # Rectangular Shapes Detected #PixelsinSkinHue
  • 69. K-Means Algorithms Each point belongs to the new cluster’s mean. The points don’t change anymore. They are converged! # Rectangular Shapes Detected #PixelsinSkinHue
  • 70. Other scikit-learn* Tools GridSearch • Specify a space of parameters and evaluate model(s) at those params. Pipelines • Create a built/fit/predictable sequence of steps. • Fitting w.ill fit each of the components in turn
  • 71.
  • 72. Bag-of-Words Learning Image classification and recognition Complicated architecture Builds a vocabulary for an image based on its features.
  • 73. BOW Process: Step 1 Take known image and find descriptors of keypoints. Make a table with all information about keypoints. • Number of columns are fixed between images. Columns are how we describe a keypoint • Number of rows vary between images. Rows represent number of keypoints
  • 74. BOW Process: Step 1 Take known image and find descriptors of keypoints. • Creating local words Description of Keypoint (columns; fixed #) #ofKeypoints(rows;varies)
  • 75. BOW Process: Step1 Repeat for each image in your dataset. Description of Keypoint (columns; fixed #) #ofKeypoints(rows;varies)
  • 76. BOW Process: Step 2 Combine ALL individual descriptors. • Local vocabularies • Put the rows in space and group clusters of data Group similar descriptors together • # of clusters is the number of global words to be used for further analysis (current global vocabulary: 1, 2, 3, 4, 5, 6, 7).
  • 77. BOW Process: Step 2 Convert all local words to same number of global words. Description of Keypoint (local) #ofKeypoints(Local) 7 6 5 3 6 1 Represents ONE and only ONE row But it represents the ENTIRE row
  • 78. BOW Process: Step 3 Redescribe known images in terms of the global vocabulary. Generate histograms with counts of global words. And likewise for other colors, empty cells are zero. 1 1 2 g1 g2 g3 g4 g5 g6 g7 Image1 Image2 Image3 Image4 Image5 Image6 Airplane Airplane Zebra Zebra Piano Piano 7 7 5 3
  • 79. BOW Process: Step 4 Build an SVM model predicting from histograms (represented in rows)  object class Airplane Airplane Zebra Zebra Piano Piano
  • 80. BOW Process: Step 5 With a new example: • Describe with regional vocabulary • Convert to global vocabulary Use the regional word most similar to the global word • Create histogram representation • Feed to SVM to make prediction about class Imagine the brown zebra example without a known label.
  • 81. BOW Details Local vocabularies come from feature descriptors • SIFT, ORB, and so on Global vocabulary comes from clustering the local vocabularies • Convert regional to global by looking up cluster (global term) for a local descriptor • K-means clustering

Editor's Notes

  1. Image url?
  2. Images are all from wikimedia commons.
  3. Image from: https://upload.wikimedia.org/wikipedia/commons/a/a4/High_Resolution_FMRI_of_the_Human_Brain.gif
  4. Note: These are conceptual examples and not actual examples.
  5. Images are all from Wikimedia.
  6. [see: model_selection_ii.pptx]
  7. Note: Red and blue represent two classes here; not necessarily “yes” and “no”.
  8. Note: the decision boundary is a hypothetical scenario where we have test examples from *everywhere* in the input space. The boundary is the result of classifying each of those input examples.
  9. https://jakevdp.github.io/PythonDataScienceHandbook/05.07-support-vector-machines.html
  10. In some high-D space, that ellipse is “just” a plane-like thing (flat boundary).
  11. This is a “bad” line because the sum of (normal/perpendicular) distances from points to line is very big!
  12. The “right” PCA line will minimize the orthogonal (normal/perpendicular) distances from the examples to their mapping on the line.
  13. Note: Red points represent data that is fit to the new 2D structure.
  14. Examples are in notebook.
  15. https://kushalvyas.github.io/BOV.html
  16. Image url?
  17. Emphasize that browns are all airplanes (different tone is different specific image). Color is class. Tone is instance of class. - Image url?
  18. Global vocabulary is (1) concise (only small N # of terms), and (2) comparable between different images [we still have to *use* that vocabulary to describe our images somehow [convert many-row description to single row description].