k-Nearest Neighbor
&
Instance-based Learning
Instance-based learning
• Instance-based learning is often termed lazy learning, as there is typically no
“transformation” of training instances into more general “statements”
• Instance-based learning methods simply store the training examples.
• Generalizing beyond these examples is postponed until a new instance must be classified.
• Each time a new query instance is encountered, its relationship to the previously stored
examples is examined in order to assign a target function value for the new instance.
• Instance-based methods are sometimes referred to as "lazy" learning methods because
they delay processing until a new instance must be classified.
Instance-based learning
• One disadvantage of instance-based approaches is that the cost of classifying new instances
can be high.
• This is due to the fact that nearly all computation takes place at classification time rather
than when the training examples are first encountered.
k-Nearest Neighbor
• Pros:
• High accuracy,
• insensitive to outliers,
• no assumptions about data
• Cons:
• Computationally expensive,
• requires a lot of memory
• Works with:
• Numeric values,
• nominal values
k-NEAREST Neighbor LEARNING
• This algorithm assumes all instances correspond to points in the n-dimensional space ℝⁿ.
• The nearest neighbors of an instance are defined in terms of the standard Euclidean
distance.
• More precisely, let an arbitrary instance x be described by the feature vector
x = ( a1(x), a2(x), ..., an(x) ),
where ar(x) denotes the value of the rth attribute of instance x. Then the distance between two
instances xi and xj is defined to be d(xi, xj), where
d(xi, xj) = sqrt( Σ r=1..n ( ar(xi) − ar(xj) )² )
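As a concrete illustration of this distance, here is a minimal Python sketch; the function name and the example vectors are illustrative, not part of the slides:

```python
import math

def euclidean_distance(xi, xj):
    """Standard Euclidean distance between two feature vectors
    xi = (a1(xi), ..., an(xi)) and xj = (a1(xj), ..., an(xj))."""
    assert len(xi) == len(xj)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

# Example: two 3-dimensional instances at distance 5.
print(euclidean_distance((1.0, 2.0, 3.0), (4.0, 6.0, 3.0)))  # 5.0
```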
Similarity and Dissimilarity
• Similarity
• Numerical measure of how alike two data objects are.
• Is higher when objects are more alike.
• Often falls in the range [0,1]:
• Examples: Cosine, Jaccard, Tanimoto,
• Dissimilarity
• Numerical measure of how different two data objects are
• Lower when objects are more alike
• Minimum dissimilarity is often 0
• Upper limit varies
• Proximity refers to a similarity or dissimilarity
Src: “Introduction to Data Mining” by Vipin Kumar et al
Distance Metric
• Distance d (p, q) between two points p and q is a dissimilarity measure if it
satisfies:
1. Positive definiteness:
d (p, q) ≥ 0 for all p and q, and
d (p, q) = 0 only if p = q.
2. Symmetry: d (p, q) = d (q, p) for all p and q.
3. Triangle Inequality:
d (p, r) ≤ d (p, q) + d (q, r) for all points p, q, and r.
• Examples:
• Euclidean distance
• Minkowski distance
• Mahalanobis distance
Src: “Introduction to Data Mining” by Vipin Kumar et al
General approach to kNN
1. Collect: Any method.
2. Prepare: Numeric values are needed for a distance calculation. A
structured data format is best.
3. Analyze: Any method.
4. Train: Does not apply to the kNN algorithm.
5. Test: Calculate the error rate.
6. Use: This application needs to get some input data and output structured
numeric values. Next, the application runs the kNN algorithm on this
input data and determines which class the input data should belong to.
The application then takes some action on the calculated class.
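The "Use" step (distance computation, neighbour search, majority vote) can be sketched in a few lines of Python; the function name and the toy data below are illustrative, not a reference implementation:

```python
from collections import Counter
import math

def knn_classify(query, train_X, train_y, k=3):
    """Classify `query` by majority vote among its k nearest training examples."""
    # 1. Compute the distance from the query to every training record.
    distances = [(math.dist(query, x), label) for x, label in zip(train_X, train_y)]
    # 2. Sort by distance and keep the k closest neighbours.
    neighbours = sorted(distances, key=lambda d: d[0])[:k]
    # 3. Majority vote over the neighbours' class labels.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two classes in 2-D.
train_X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9)]
train_y = ["red", "red", "blue", "blue"]
print(knn_classify((1.1, 0.9), train_X, train_y, k=3))  # -> "red"
```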
k-NEAREST Neighbor LEARNING
Instance-Based Learning
• Idea:
• Similar examples have similar label.
• Classify new examples like similar training examples.
• Algorithm:
• Given some new example x for which we need to predict its class y
• Find most similar training examples
• Classify x “like” these most similar examples
• Questions:
• How to determine similarity?
• How many similar training examples to consider?
• How to resolve inconsistencies among the training examples?
Summary: k-NN
• The simplest machine learning algorithm.
• No explicit training.
• A non-parametric method: no parameter to be optimised.
• The algorithm relies on a distance measure.
Think: what could be the possible effect of the number of training samples used in k-NN?
1-Nearest Neighbor
• One of the simplest of all machine learning classifiers
• Simple idea: label a new point the same as the closest known point
Label it red.
Distance Metrics
• Different metrics can change the decision surface
• Standard Euclidean distance metric:
• Two-dimensional: Dist(a, b) = sqrt((a1 – b1)² + (a2 – b2)²)
• Multivariate: Dist(a, b) = sqrt(∑ (ai – bi)²)
(Figure: two decision surfaces compared, one under Dist(a, b) = (a1 – b1)² + (a2 – b2)² and one under Dist(a, b) = (a1 – b1)² + (3a2 – 3b2)², i.e. with the second attribute rescaled by a factor of 3.)
Adapted from “Instance-Based Learning”
lecture slides by Andrew Moore, CMU.
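To see why the two metrics in the captions give different decision surfaces, here is a tiny sketch, with made-up points, showing that rescaling one attribute can change which stored point is nearest:

```python
import math

def nearest(query, points):
    """Return the point in `points` closest to `query` under Euclidean distance."""
    return min(points, key=lambda p: math.dist(query, p))

points = [(0.0, 1.0), (2.0, 0.0)]
query = (0.0, 0.0)

# Unscaled metric: (0, 1) is nearest (distance 1 vs 2).
print(nearest(query, points))

# Rescale the second attribute by 3, as in the second caption:
scale = lambda p: (p[0], 3 * p[1])
# Now (2, 0) becomes the nearest point (distance 2 vs 3), so the decision changes.
print(nearest(scale(query), [scale(p) for p in points]))
```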
k – Nearest Neighbor
• Generalizes 1-NN to smooth away noise in the labels
• A new point is now assigned the most frequent label of its k nearest
neighbors
Label it red, when k = 3
Label it blue, when k = 7
Selecting the Number of Neighbors
• Increase k:
• Makes KNN less sensitive to noise
• Decrease k:
• Allows capturing finer structure of space
• Pick k not too large, but not too small (depends
on data)
Neighbour Number k
• Hyper-parameter: neighbour number k.
• The process of determining the neighbour number k is called
hyper-parameter selection or model selection.
• In binary classification, what can happen when k is set as an
even number?
We will talk more on hyper-parameter
selection in the next lecture.
Think: what could be the possible effect of the neighbour number?
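As a preview of hyper-parameter selection, one simple illustrative recipe is to score a few candidate values of k on held-out validation data and keep the best; a minimal sketch, assuming a classifier function such as the knn_classify sketch shown earlier:

```python
def choose_k(classify, train_X, train_y, val_X, val_y, candidates=(1, 3, 5, 7, 9)):
    """Pick k from `candidates` by accuracy on a held-out validation set.

    `classify(query, train_X, train_y, k)` can be any k-NN classifier, e.g. the
    knn_classify sketch above. Odd candidates avoid voting ties in binary
    classification.
    """
    best_k, best_acc = candidates[0], -1.0
    for k in candidates:
        preds = [classify(x, train_X, train_y, k=k) for x in val_X]
        acc = sum(p == y for p, y in zip(preds, val_y)) / len(val_y)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k
```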
Effect of Training Samples
• Small number of training samples:
• insufficient information
• Large number of training samples:
• more information
• but, time and memory cost consuming (distance calculation, sorting)
• Noisy training samples:
• overfitting
Curse-of-Dimensionality
• Prediction accuracy can quickly degrade as the number of attributes grows.
• Irrelevant attributes easily “swamp” information from relevant attributes
• With many irrelevant attributes, the similarity/distance measure becomes less
reliable
• Remedy
• Try to remove irrelevant attributes in pre-processing step
• Weight attributes differently
• Increase k (but not too much)
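A minimal sketch of the "weight attributes differently" remedy, using a weighted Euclidean distance; the weights and data points are illustrative (in practice the weights might come from a feature-relevance estimate):

```python
import math

def weighted_euclidean(xi, xj, weights):
    """Euclidean distance with a per-attribute weight; a weight of 0
    effectively removes an irrelevant attribute from the comparison."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, xi, xj)))

# Two instances that agree on the relevant first attribute but differ
# wildly on a noisy, irrelevant second attribute.
a, b = (1.0, 100.0), (1.0, -50.0)
print(weighted_euclidean(a, b, weights=(1.0, 1.0)))  # 150.0, dominated by the noise
print(weighted_euclidean(a, b, weights=(1.0, 0.0)))  # 0.0, irrelevant attribute ignored
```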
k NEAREST NEIGHBOR
• Requires 3 things:
• Feature space (training data)
• Distance metric
• to compute distance between
records
• The value of k
• the number of nearest
neighbors to retrieve from which
to get majority class
• To classify an unknown record:
• Compute distance to other training
records
• Identify k nearest neighbors
• Use class labels of nearest
neighbors to determine the
class label of unknown record
K-Nearest Neighbor
• Features
• All instances correspond to points in an n-dimensional Euclidean
space
• Classification is delayed till a new instance arrives
• Classification done by comparing feature vectors of the different
points
• Target function may be discrete or real-valued
Summary: Neighbour Number k
• Hyper-parameter: neighbour number k.
• In binary classification, k needs to be an uneven number.
• Small k: we may model noise!
• Large k: neighbours will include too many samples from other
classes.
Example
Consider the following database defining a concept
Ex. No.   Size     Colour   Shape    Concept Satisfied
1         medium   blue     brick    yes
2         small    red      wedge    no
3         small    red      sphere   yes
4         large    red      wedge    no
5         large    green    pillar   yes
6         large    red      pillar   no
7         large    green    pillar   yes
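As an illustration of how k-NN could classify against this table, the sketch below uses the Hamming distance over the categorical attributes (introduced later in these slides) and a hypothetical test instance that does not come from the original exercise:

```python
from collections import Counter

# Training examples from the table above: (Size, Colour, Shape) -> Satisfied
train = [
    (("medium", "blue",  "brick"),  "yes"),
    (("small",  "red",   "wedge"),  "no"),
    (("small",  "red",   "sphere"), "yes"),
    (("large",  "red",   "wedge"),  "no"),
    (("large",  "green", "pillar"), "yes"),
    (("large",  "red",   "pillar"), "no"),
    (("large",  "green", "pillar"), "yes"),
]

def hamming(x, y):
    """Number of attributes on which the two instances disagree."""
    return sum(a != b for a, b in zip(x, y))

def knn_predict(query, train, k=3):
    neighbours = sorted(train, key=lambda ex: hamming(query, ex[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Hypothetical query: examples 3, 4 and 6 are the nearest (distance 1),
# so the majority vote is "no".
print(knn_predict(("large", "red", "sphere"), train, k=3))  # -> "no"
```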
1-Nearest Neighbor
3-Nearest Neighbor
Example
• We can now use the training set to classify an unknown case (Age = 48 and
Loan = $142,000) using Euclidean distance. If K = 1, then the nearest neighbor is
the last case in the training set, with Default = Y.
• With K = 3, there are two Default = Y and one Default = N among the three closest neighbors.
The prediction for the unknown case is again Default = Y.
In Class Exercise
• Tissue Paper Quality Classification
• Test instance: (3, 7)
In Class Exercise
• KNN Solved Example to predict Sugar of Diabetic Patient
given BMI and Age
• Test Example BMI=43.6, Age=40, Sugar=?
Hamming Distance
• Let's learn with the help of an example:
• Restaurant A sells burgers with optional flavors: Pepper, Ginger, and Chilly.
• Using Hamming distance, show how majority voting among the k nearest
neighbors would classify the query.
• The query (test instance) is:
• Pepper: False, Ginger: True, Chilly: True
Hamming Distance
• The training examples contain three attributes, Pepper, Ginger,
and Chilly. Each of these attributes takes either True or False as
the attribute values. Liked is the target that takes either True or
False as the value.
• In the k-nearest neighbors algorithm, we first calculate the
distance between the new example and the training examples.
Using this distance, we find the k nearest neighbors among the
training examples.
• To calculate the distance, the attribute values must be real
numbers. But in our case, the dataset contains categorical
values. Hence we use the Hamming distance measure to find
the distance between the new example and the training examples.
Hamming Distance
• Let x1 and x2 be the attribute values of two instances.
• Then, under the Hamming distance, if the categorical values are the
same (i.e., x1 matches x2), the distance is 0; otherwise it is 1.
• For example,
• If the value of x1 is blue and x2 is also blue then the distance
between x1 and x2 is 0.
• If the value of x1 is blue and x2 is red then the distance
between x1 and x2 is 1.
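A minimal sketch of this matching rule applied to the burger-flavour query above; the two training rows here are hypothetical placeholders, since the actual training table appears only as a slide image:

```python
def hamming(x, y):
    """Per-attribute comparison: 0 if the values match, 1 otherwise, summed
    over all attributes."""
    return sum(int(a != b) for a, b in zip(x, y))

# Attribute order: (Pepper, Ginger, Chilly)
query = (False, True, True)

# Hypothetical training rows (the real ones are on the slide image): (attributes, Liked)
row_liked     = ((False, True, False), True)
row_not_liked = ((True,  False, True), False)

print(hamming(query, row_liked[0]))      # 1 -> the closer neighbour
print(hamming(query, row_not_liked[0]))  # 2
```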
KNN Example
• New examples:
• Example 1 (great, no, no, normal, no)
• Example 2 (mediocre, yes, no, normal, no)
     Food (3)   Chat (2)   Fast (2)   Price (3)   Bar (2)   BigTip
1    great      yes        yes        normal      no        yes
2    great      no         yes        normal      no        yes
3    mediocre   yes        no         high        no        no
4    great      yes        yes        normal      yes       yes
Similarity metric: number of matching attributes (k = 2)
• Example 1: most similar is row 2 (4 matches, 1 mismatch) → yes; second most similar is row 1 (3 matches, 2 mismatches) → yes. Prediction: Yes.
• Example 2: most similar is row 3 (4 matches, 1 mismatch) → no; second most similar is row 1 (3 matches, 2 mismatches) → yes. Prediction: Yes / No (tie).
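The neighbour counts above can be reproduced with a short sketch; attribute order follows the table, and the class tie for Example 2 is left visible rather than resolved:

```python
# Training rows from the table: (Food, Chat, Fast, Price, Bar) -> BigTip
train = [
    (("great",    "yes", "yes", "normal", "no"),  "yes"),
    (("great",    "no",  "yes", "normal", "no"),  "yes"),
    (("mediocre", "yes", "no",  "high",   "no"),  "no"),
    (("great",    "yes", "yes", "normal", "yes"), "yes"),
]

def matches(x, y):
    """Similarity = number of attributes on which the two examples agree."""
    return sum(a == b for a, b in zip(x, y))

def top_k(query, train, k=2):
    """Return the k most similar training rows (ties keep table order)."""
    return sorted(train, key=lambda ex: matches(query, ex[0]), reverse=True)[:k]

example1 = ("great",    "no",  "no", "normal", "no")
example2 = ("mediocre", "yes", "no", "normal", "no")

print(top_k(example1, train))  # rows 2 and 1 -> labels yes, yes  => Yes
print(top_k(example2, train))  # rows 3 and 1 -> labels no, yes   => tie (Yes/No)
```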
Classification Performance Measures
• Classification accuracy can be unreliable when assessing
unbalanced data.
• For example, if there are 95 samples from class A and only 5
from class B in the data set, a particular classifier might classify
all the observations as class A.
• Confusion matrix and other classification measures provide
more information on the classification results.
• A self-learning practice based on the next slide.
You will practise it in lab ex2.
Classification Performance Measures
• Confusion matrix is a table with two rows and
two columns that reports the number of false
positives (FP), false negatives (FN), true
positives (TP), and true negatives (TN). In
multi-class classification, a confusion matrix
can be computed for each class.
• Other classification measures:
• Sensitivity (recall) = TP / (the number of real positives) = TP / (TP + FN)
• Specificity = TN / (the number of real negatives) = TN / (TN + FP)
• Precision = TP / (TP + FP)
• F1-score = 2 / (1/precision + 1/recall) = (2 × precision × recall) / (precision + recall)
• Overall classification accuracy = (TP + TN) / (total number of samples) = (TP + TN) / (TP + FP + TN + FN)
(Figure: confusion matrix for the “cat” class.)
Self-learning material:
https://en.wikipedia.org/wiki/Confusion_matrix
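As a self-check, all of these measures can be computed directly from the four counts; a minimal sketch with arbitrary illustrative counts (not from the slides):

```python
def classification_measures(tp, fp, tn, fn):
    """Derive the standard measures from confusion-matrix counts."""
    recall      = tp / (tp + fn)              # sensitivity
    specificity = tn / (tn + fp)
    precision   = tp / (tp + fp)
    f1          = 2 * precision * recall / (precision + recall)
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    return {"recall": recall, "specificity": specificity,
            "precision": precision, "f1": f1, "accuracy": accuracy}

# Arbitrary illustrative counts:
print(classification_measures(tp=40, fp=10, tn=45, fn=5))
```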
Summary
• In this chapter, we have learned the simplest machine learning
algorithm k-NN.
• non-parametric
• no training
• We have also studied the behavior of the k-NN algorithm, e.g.,
how/why its performance changes over varying numbers of
neighbors and training samples.
• In the next chapter, we will talk about the simplest parametric
model in machine learning: a linear model trained using the least
squares approach.
You are asked to use the k-NN classifier for face recognition and observe its behaviour in lab ex2.