KNN Classifier Explained
1. K Nearest Neighbour Classifier
● Tejas Bubane (I-05)
● Shriyansh Jain (H-43)
● Mitesh Butala (J-15)
● Gaurav Jagtap (H-42)
Project Guide: Asst. Prof. P.A. Bailke
VIT, Pune
2. TF-IDF Values
● Term Frequency (TF): Importance of the term within that document – raw
frequency
i.e. TF(d, t) = number of occurrences of term t in document d
● Inverse Document Frequency (IDF): Importance of the term across the corpus
IDF(t) = log(D / dt)
where, D = total number of documents
dt = number of documents in which the term t occurs
If a word occurs in many documents it is less useful, so its IDF value is low (and vice-versa)
● TF-IDF(d,t) = TF(d,t) × IDF(t)
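The formulas above can be sketched directly in Python; a minimal example (the toy corpus below is made up for illustration):

```python
import math

def tf(term, doc):
    # Raw frequency: number of occurrences of the term in the document
    return doc.count(term)

def idf(term, docs):
    # D = total number of documents, dt = documents containing the term
    D = len(docs)
    dt = sum(1 for d in docs if term in d)
    return math.log(D / dt)

def tf_idf(term, doc, docs):
    # TF-IDF(d, t) = TF(d, t) * IDF(t)
    return tf(term, doc) * idf(term, docs)

# Toy corpus: each document is a list of tokens
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
]
```

Note that a term occurring in every document (like "the" here) gets IDF = log(3/3) = 0, so its TF-IDF weight vanishes, matching the intuition on the slide.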
3. KNN - Introduction
● Learning by analogy – comparison with similar items from training set
● Training tuples described by n attributes – each document represents a
point in n dimensional space
● Closeness is defined in terms of a distance metric,
e.g. Euclidean distance, cosine similarity, Manhattan distance
● Cos θ = 1, i.e. angle = 0°: documents are similar
● Cos θ = 0, i.e. angle = 90°: documents are not similar
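Cosine similarity compares the angle between two document vectors rather than their magnitudes; a minimal sketch:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Parallel vectors -> cos = 1 (similar); orthogonal -> cos = 0 (not similar)
```

In practice the vector components would be the TF-IDF weights of each vocabulary term in the two documents.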
4. KNN Algorithm
● Compute the cosine similarity of the query document with each document
in the training set
● Find the k documents that are closest / nearest to the query document
● Class of query is the class of majority of the nearest neighbours
(classes of each document in the training set are known)
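The three steps above can be sketched as follows (the training vectors and class labels are invented for illustration):

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def knn_classify(query_vec, training, k=3):
    # training: list of (vector, class_label) pairs with known classes
    # 1. Score every training document against the query (higher = closer)
    scored = sorted(training,
                    key=lambda item: cosine_similarity(query_vec, item[0]),
                    reverse=True)
    # 2. Keep the k nearest neighbours
    top_k = [label for _, label in scored[:k]]
    # 3. Class of the query is the majority class among those neighbours
    return Counter(top_k).most_common(1)[0][0]

# Hypothetical 2-dimensional document vectors with made-up labels
training = [([1.0, 0.0], "sports"),
            ([0.9, 0.1], "sports"),
            ([0.0, 1.0], "politics")]
```

Real document vectors would have one dimension per vocabulary term, but the majority-vote logic is identical.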
5. Further Analysis of Classification
● Lazy Learner : Defers computation until a query is provided
e.g. KNN (calculates TF-IDF values and distances only after receiving the query)
● Eager Learner : Builds its model from the training data before any query is received
e.g. ANN (adjusts weights before receiving the query)
● Supervised Learning : Labelled training data
eg. Classification
● Unsupervised Learning : Find hidden structure in unlabelled data
e.g. Clustering
● KNN is a Supervised Learning algorithm and follows the Lazy Learning approach
6. Scaling KNN
● Vocabulary – Set of all words occurring in all documents
● Large data set – drastic increase in vocabulary size – difficult to handle
● Feature Selection – measure the relation between terms in the vocabulary and the classes
Remove words that are weakly related (below a threshold) to every class
This reduces the vocabulary to a manageable size
● e.g. Chi-square test