SlideShare a Scribd company logo
1 of 23
K-NEAREST NEIGHBOR CLASSIFIER
Ajay Krishna Teja Kavuri
ajkavuri@mix.wvu.edu
OUTLINE
• BACKGROUND
• DEFINITION
• K-NN IN ACTION
• K-NN PROPERTIES
• REMARKS
BACKGROUND
“Classification is a data mining technique used to predict group
membership for data instances.”
• The group membership is utilized in for the prediction of the
future data sets.
ORIGINS OF K-NN
• Nearest Neighbors have been used in statistical estimation and
pattern recognition already in the beginning of 1970’s (non-
parametric techniques).
• The method prevailed in several disciplines and still it is one
of the top 10 Data Mining algorithm.
MOST CITED PAPERS
K-NN has several variations that came out of optimizations
through research. Following are most cited publications:
• Approximate nearest neighbors: towards removing the curse of dimensionality
Piotr Indyk, Rajeev Motwani
• Nearest neighbor queries
Nick Roussopoulos, Stephen Kelley, Frédéric Vincent
• Machine learning in automated text categorization
Fabrizio Sebastiani
IN A SENTENCE K-NN IS…..
• It’s how people judge by observing our peers.
• We tend to move with people of
similar attributes so does data.
DEFINITION
• K-Nearest Neighbor is considered a lazy learning algorithm
that classifies data sets based on their similarity with
neighbors.
• “K” stands for number of data set items
that are considered for the classification.
Ex: Image shows classification for different k-values.
TECHNICALLY…..
• For the given attributes A={X1, X2….. XD} Where D is the
dimension of the data, we need to predict the corresponding
classification group G={Y1,Y2…Yn} using the proximity
metric over K items in D dimension that defines the closeness
of association such that X € RD and Yp € G.
THAT IS….
• Attribute A={Color, Outline, Dot}
• Classification Group,
G={triangle, square}
• D=3, we are free to choose K value.
Attributes A
C
l
a
s
s
i
f
i
c
a
t
i
o
n
G
r
o
u
p
PROXIMITY METRIC
• Definition: Also termed as “Similarity Measure” quantifies the
association among different items.
• Following is a table of measures for different data items:
Similarity Measure Data Format
Contingency Table, Jaccard coefficient, Distance Measure Binary
Z-Score, Min-Max Normalization, Distance Measures Numeric
Cosine Similarity, Dot Product Vectors
PROXIMITY METRIC
• For the numeric data let us consider some distance measures:
– Manhattan Distance:
– Ex: Given X = {1,2} & Y = {2,5}
Manhattan Distance = dist(X,Y) = |1-2|+|2-5|
= 1+3
= 4
PROXIMITY METRIC
- Euclidean Distance:
- Ex: Given X = {-2,2} & Y = {2,5}
Euclidean Distance = dist(X,Y) = [ (-2-2)^2 + (2-5)^2 ]^(1/2)
= dist(X,Y) = (16 + 9)^(1/2)
= dist(X,Y) = 5
K-NN IN ACTION
• Consider the following data:
A={weight,color}
G={Apple(A), Banana(B)}
• We need to predict the type of a
fruit with:
weight = 378
color = red
SOME PROCESSING….
• Assign color codes to convert into numerical data:
• Let’s label Apple as “A” and
Banana as “B”
PLOTTING
• Using K=3,
Our result will be,
AS K VARIES….
• Clearly, K has an impact on the classification.
Can you guess?
K-NN LIVE!!
• http://www.ai.mit.edu/courses/6.034b/KNN.html
K-NN VARIATIONS
• Weighted K-NN: Takes the weights associated with each
attribute. This can give priority among attributes.
Ex: For the data,
Weight:
Probability:
Where,
Above is the resulting dataset
K-NN VARIATIONS
• (K-l)-NN: Reduce complexity by having a threshold on the
majority. We could restrict the associations through (K-l)-NN.
Ex: Decide if majority is over a given
threshold l. Otherwise reject.
Here, K=5 and l=4. As there is no
majority with count>4. We reject
to classify the element.
K-NN PROPERTIES
• K-NN is a lazy algorithm
• The processing defers with respect to K value.
• Result is generated after analysis of stored data.
• It neglects any intermediate values.
REMARKS: FIRST THE GOOD
Advantages
• Can be applied to the data from any distribution
for example, data does not have to be separable with a linear
boundary
• Very simple and intuitive
• Good classification if the number of samples is large enough
NOW THE BAD….
Disadvantages
• Dependent on K Value
• Test stage is computationally expensive
• No training stage, all the work is done during the test stage
• This is actually the opposite of what we want. Usually we can
afford training step to take a long time, but we want fast test step
• Need large number of samples for accuracy
THANK YOU

More Related Content

What's hot

Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsMd. Main Uddin Rony
 
Feature selection concepts and methods
Feature selection concepts and methodsFeature selection concepts and methods
Feature selection concepts and methodsReza Ramezani
 
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...Simplilearn
 
Machine Learning With Logistic Regression
Machine Learning  With Logistic RegressionMachine Learning  With Logistic Regression
Machine Learning With Logistic RegressionKnoldus Inc.
 
Presentation on unsupervised learning
Presentation on unsupervised learning Presentation on unsupervised learning
Presentation on unsupervised learning ANKUSH PAL
 
Decision Trees
Decision TreesDecision Trees
Decision TreesStudent
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierNeha Kulkarni
 
Introduction to Machine Learning Classifiers
Introduction to Machine Learning ClassifiersIntroduction to Machine Learning Classifiers
Introduction to Machine Learning ClassifiersFunctional Imperative
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree LearningMilind Gokhale
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Md. Main Uddin Rony
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisJaclyn Kokx
 
backpropagation in neural networks
backpropagation in neural networksbackpropagation in neural networks
backpropagation in neural networksAkash Goel
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)Pravinkumar Landge
 
K nearest neighbor
K nearest neighborK nearest neighbor
K nearest neighborUjjawal
 
Decision tree in artificial intelligence
Decision tree in artificial intelligenceDecision tree in artificial intelligence
Decision tree in artificial intelligenceMdAlAmin187
 

What's hot (20)

Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
Feature selection concepts and methods
Feature selection concepts and methodsFeature selection concepts and methods
Feature selection concepts and methods
 
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
K Means Clustering Algorithm | K Means Clustering Example | Machine Learning ...
 
Machine Learning With Logistic Regression
Machine Learning  With Logistic RegressionMachine Learning  With Logistic Regression
Machine Learning With Logistic Regression
 
Presentation on unsupervised learning
Presentation on unsupervised learning Presentation on unsupervised learning
Presentation on unsupervised learning
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
 
Introduction to Machine Learning Classifiers
Introduction to Machine Learning ClassifiersIntroduction to Machine Learning Classifiers
Introduction to Machine Learning Classifiers
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant Analysis
 
Machine learning clustering
Machine learning clusteringMachine learning clustering
Machine learning clustering
 
backpropagation in neural networks
backpropagation in neural networksbackpropagation in neural networks
backpropagation in neural networks
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
K nearest neighbor
K nearest neighborK nearest neighbor
K nearest neighbor
 
Decision tree in artificial intelligence
Decision tree in artificial intelligenceDecision tree in artificial intelligence
Decision tree in artificial intelligence
 
Decision tree
Decision treeDecision tree
Decision tree
 

Viewers also liked

Kato Mivule: An Investigation of Data Privacy and Utility Preservation Using ...
Kato Mivule: An Investigation of Data Privacy and Utility Preservation Using ...Kato Mivule: An Investigation of Data Privacy and Utility Preservation Using ...
Kato Mivule: An Investigation of Data Privacy and Utility Preservation Using ...Kato Mivule
 
Nearest Neighbor Algorithm Zaffar Ahmed
Nearest Neighbor Algorithm  Zaffar AhmedNearest Neighbor Algorithm  Zaffar Ahmed
Nearest Neighbor Algorithm Zaffar AhmedZaffar Ahmed Shaikh
 

Viewers also liked (7)

Algorithme knn
Algorithme knnAlgorithme knn
Algorithme knn
 
Knn
KnnKnn
Knn
 
Kato Mivule: An Investigation of Data Privacy and Utility Preservation Using ...
Kato Mivule: An Investigation of Data Privacy and Utility Preservation Using ...Kato Mivule: An Investigation of Data Privacy and Utility Preservation Using ...
Kato Mivule: An Investigation of Data Privacy and Utility Preservation Using ...
 
Machine learning clisification algorthims
Machine learning clisification algorthimsMachine learning clisification algorthims
Machine learning clisification algorthims
 
Knn
KnnKnn
Knn
 
Nearest Neighbor Algorithm Zaffar Ahmed
Nearest Neighbor Algorithm  Zaffar AhmedNearest Neighbor Algorithm  Zaffar Ahmed
Nearest Neighbor Algorithm Zaffar Ahmed
 
ML KNN-ALGORITHM
ML KNN-ALGORITHMML KNN-ALGORITHM
ML KNN-ALGORITHM
 

Similar to KNN

KNN Algorithm using C++
KNN Algorithm using C++KNN Algorithm using C++
KNN Algorithm using C++Afraz Khan
 
Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...
Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...
Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...Maninda Edirisooriya
 
Machine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersMachine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersAlbert Y. C. Chen
 
UNIT 3: Data Warehousing and Data Mining
UNIT 3: Data Warehousing and Data MiningUNIT 3: Data Warehousing and Data Mining
UNIT 3: Data Warehousing and Data MiningNandakumar P
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.pptvikassingh569137
 
Knn Algorithm presentation
Knn Algorithm presentationKnn Algorithm presentation
Knn Algorithm presentationRishavSharma112
 
ML SFCSE.pptx
ML SFCSE.pptxML SFCSE.pptx
ML SFCSE.pptxNIKHILGR3
 
Sampling and Data_Update.ppt
Sampling and Data_Update.pptSampling and Data_Update.ppt
Sampling and Data_Update.pptMdShohelRana69
 
Classification_Algorithms_Student_Data_Presentation
Classification_Algorithms_Student_Data_PresentationClassification_Algorithms_Student_Data_Presentation
Classification_Algorithms_Student_Data_PresentationMadeleine Organ
 
algoritma klastering.pdf
algoritma klastering.pdfalgoritma klastering.pdf
algoritma klastering.pdfbintis1
 
How Does Math Matter in Data Science
How Does Math Matter in Data ScienceHow Does Math Matter in Data Science
How Does Math Matter in Data ScienceMutia Ulfi
 
MachineLearning.pptx
MachineLearning.pptxMachineLearning.pptx
MachineLearning.pptxBangtangurl
 

Similar to KNN (20)

KNN Algorithm using C++
KNN Algorithm using C++KNN Algorithm using C++
KNN Algorithm using C++
 
Knn 160904075605-converted
Knn 160904075605-convertedKnn 160904075605-converted
Knn 160904075605-converted
 
Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...
Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...
Lecture 11 - KNN and Clustering, a lecture in subject module Statistical & Ma...
 
K nearest neighbours
K nearest neighboursK nearest neighbours
K nearest neighbours
 
Fa18_P2.pptx
Fa18_P2.pptxFa18_P2.pptx
Fa18_P2.pptx
 
Machine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersMachine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional Managers
 
UNIT 3: Data Warehousing and Data Mining
UNIT 3: Data Warehousing and Data MiningUNIT 3: Data Warehousing and Data Mining
UNIT 3: Data Warehousing and Data Mining
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
 
Machine Learning with R
Machine Learning with RMachine Learning with R
Machine Learning with R
 
KNN presentation.pdf
KNN presentation.pdfKNN presentation.pdf
KNN presentation.pdf
 
Knn Algorithm presentation
Knn Algorithm presentationKnn Algorithm presentation
Knn Algorithm presentation
 
ML SFCSE.pptx
ML SFCSE.pptxML SFCSE.pptx
ML SFCSE.pptx
 
Mini_Project
Mini_ProjectMini_Project
Mini_Project
 
Clustering: A Scikit Learn Tutorial
Clustering: A Scikit Learn TutorialClustering: A Scikit Learn Tutorial
Clustering: A Scikit Learn Tutorial
 
Sampling and Data_Update.ppt
Sampling and Data_Update.pptSampling and Data_Update.ppt
Sampling and Data_Update.ppt
 
K-Nearest Neighbor(KNN)
K-Nearest Neighbor(KNN)K-Nearest Neighbor(KNN)
K-Nearest Neighbor(KNN)
 
Classification_Algorithms_Student_Data_Presentation
Classification_Algorithms_Student_Data_PresentationClassification_Algorithms_Student_Data_Presentation
Classification_Algorithms_Student_Data_Presentation
 
algoritma klastering.pdf
algoritma klastering.pdfalgoritma klastering.pdf
algoritma klastering.pdf
 
How Does Math Matter in Data Science
How Does Math Matter in Data ScienceHow Does Math Matter in Data Science
How Does Math Matter in Data Science
 
MachineLearning.pptx
MachineLearning.pptxMachineLearning.pptx
MachineLearning.pptx
 

Recently uploaded

USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinojohnmickonozaleda
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 

Recently uploaded (20)

YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 

KNN

  • 1. K-NEAREST NEIGHBOR CLASSIFIER Ajay Krishna Teja Kavuri ajkavuri@mix.wvu.edu
  • 2. OUTLINE • BACKGROUND • DEFINITION • K-NN IN ACTION • K-NN PROPERTIES • REMARKS
  • 3. BACKGROUND “Classification is a data mining technique used to predict group membership for data instances.” • The group membership is utilized in for the prediction of the future data sets.
  • 4. ORIGINS OF K-NN • Nearest Neighbors have been used in statistical estimation and pattern recognition already in the beginning of 1970’s (non- parametric techniques). • The method prevailed in several disciplines and still it is one of the top 10 Data Mining algorithm.
  • 5. MOST CITED PAPERS K-NN has several variations that came out of optimizations through research. Following are most cited publications: • Approximate nearest neighbors: towards removing the curse of dimensionality Piotr Indyk, Rajeev Motwani • Nearest neighbor queries Nick Roussopoulos, Stephen Kelley, Frédéric Vincent • Machine learning in automated text categorization Fabrizio Sebastiani
  • 6. IN A SENTENCE K-NN IS….. • It’s how people judge by observing our peers. • We tend to move with people of similar attributes so does data.
  • 7. DEFINITION • K-Nearest Neighbor is considered a lazy learning algorithm that classifies data sets based on their similarity with neighbors. • “K” stands for number of data set items that are considered for the classification. Ex: Image shows classification for different k-values.
  • 8. TECHNICALLY….. • For the given attributes A={X1, X2….. XD} Where D is the dimension of the data, we need to predict the corresponding classification group G={Y1,Y2…Yn} using the proximity metric over K items in D dimension that defines the closeness of association such that X € RD and Yp € G.
  • 9. THAT IS…. • Attribute A={Color, Outline, Dot} • Classification Group, G={triangle, square} • D=3, we are free to choose K value. Attributes A C l a s s i f i c a t i o n G r o u p
  • 10. PROXIMITY METRIC • Definition: Also termed as “Similarity Measure” quantifies the association among different items. • Following is a table of measures for different data items: Similarity Measure Data Format Contingency Table, Jaccard coefficient, Distance Measure Binary Z-Score, Min-Max Normalization, Distance Measures Numeric Cosine Similarity, Dot Product Vectors
  • 11. PROXIMITY METRIC • For the numeric data let us consider some distance measures: – Manhattan Distance: – Ex: Given X = {1,2} & Y = {2,5} Manhattan Distance = dist(X,Y) = |1-2|+|2-5| = 1+3 = 4
  • 12. PROXIMITY METRIC - Euclidean Distance: - Ex: Given X = {-2,2} & Y = {2,5} Euclidean Distance = dist(X,Y) = [ (-2-2)^2 + (2-5)^2 ]^(1/2) = dist(X,Y) = (16 + 9)^(1/2) = dist(X,Y) = 5
  • 13. K-NN IN ACTION • Consider the following data: A={weight,color} G={Apple(A), Banana(B)} • We need to predict the type of a fruit with: weight = 378 color = red
  • 14. SOME PROCESSING…. • Assign color codes to convert into numerical data: • Let’s label Apple as “A” and Banana as “B”
  • 15. PLOTTING • Using K=3, Our result will be,
  • 16. AS K VARIES…. • Clearly, K has an impact on the classification. Can you guess?
  • 18. K-NN VARIATIONS • Weighted K-NN: Takes the weights associated with each attribute. This can give priority among attributes. Ex: For the data, Weight: Probability: Where, Above is the resulting dataset
  • 19. K-NN VARIATIONS • (K-l)-NN: Reduce complexity by having a threshold on the majority. We could restrict the associations through (K-l)-NN. Ex: Decide if majority is over a given threshold l. Otherwise reject. Here, K=5 and l=4. As there is no majority with count>4. We reject to classify the element.
  • 20. K-NN PROPERTIES • K-NN is a lazy algorithm • The processing defers with respect to K value. • Result is generated after analysis of stored data. • It neglects any intermediate values.
  • 21. REMARKS: FIRST THE GOOD Advantages • Can be applied to the data from any distribution for example, data does not have to be separable with a linear boundary • Very simple and intuitive • Good classification if the number of samples is large enough
  • 22. NOW THE BAD…. Disadvantages • Dependent on K Value • Test stage is computationally expensive • No training stage, all the work is done during the test stage • This is actually the opposite of what we want. Usually we can afford training step to take a long time, but we want fast test step • Need large number of samples for accuracy