Machine Learning Basics
Classification and Clustering
Humberto Marchezi
hcmarchezi@gmail.com
November 2015
Definitions
Machine learning combines pattern recognition, artificial
intelligence and a bit of data mining
It solves a given task without being explicitly programmed to do
so; instead, it makes predictions from provided data
Machine learning algorithms can be divided into 3 categories:
Supervised learning
Unsupervised learning
Reinforcement learning
Problem types
Classification
Regression
Clustering
etc.
Algorithms
Supervised Learning
Naive Bayesian Classifier
Linear/Polynomial/Logistic/Multinomial Regression
Neural Networks
etc.
Unsupervised Learning
K-means / K-medoids
Principal Component Analysis
Gaussian Distribution (Anomaly Detection)
etc.
Naive Bayes Classifier
Classify information based on probabilistic model score
Score for a category $C_k$ with features $f_1, f_2, \ldots, f_n$:
$$p(C_k \mid f_1, f_2, \ldots, f_n) = \frac{P(C_k)\, p(f_1 \mid C_k)\, p(f_2 \mid C_k) \cdots p(f_n \mid C_k)}{p(f_1)\, p(f_2) \cdots p(f_n)}$$
For a text classifier, the features are the individual words of
the sentence (bag-of-words model)
Also known as the multinomial naive Bayes classifier
Naive Bayes Classifier
Concrete Example
Ingredients:
  2 tbsp salt
  lemon
Instructions:
  Cut lemon
  Pour salt
Naive Bayes Classifier
Concrete Example
Ingredients
  word      occurrences
  2         1
  tbsp      1
  salt      1
  lemon     1
  total     4
  examples  2
Instructions
  word      occurrences
  cut       1
  lemon     1
  pour      1
  salt      1
  total     4
  examples  2
Global
  word      occurrences
  2         1
  tbsp      1
  salt      2
  lemon     2
  cut       1
  pour      1
  total     8
  examples  4
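The occurrence tables above can be rebuilt by counting tokens. The following is a minimal sketch, assuming lowercase whitespace tokenization (the slides do not state how words are split); the names `training` and `count_words` are illustrative only.

```python
from collections import Counter

# Training examples from the slide, grouped by category.
training = {
    "Ingredients": ["2 tbsp salt", "lemon"],
    "Instructions": ["Cut lemon", "Pour salt"],
}

def count_words(sentences):
    """Count lowercase, whitespace-separated tokens (an assumed tokenization)."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts

per_category = {name: count_words(s) for name, s in training.items()}

# The global table is the sum of the per-category tables.
global_counts = Counter()
for counts in per_category.values():
    global_counts.update(counts)

for name, counts in per_category.items():
    print(name, dict(counts), "total:", sum(counts.values()))
print("Global", dict(global_counts), "total:", sum(global_counts.values()))
```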
Naive Bayes Classifier
Concrete Example
Ingredients (prior 1/2)
  word   probability
  2      1/4
  tbsp   1/4
  salt   1/4
  lemon  1/4
Instructions (prior 1/2)
  word   probability
  cut    1/4
  lemon  1/4
  pour   1/4
  salt   1/4
Global
  word   probability
  2      1/8
  tbsp   1/8
  salt   2/8
  lemon  2/8
  cut    1/8
  pour   1/8
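Each probability above is a count divided by the relevant total: examples per category for the priors, word occurrences for the rest. A small sketch using exact fractions follows; the variable names are assumptions, and note that `Fraction` reduces 2/8 to 1/4 when printed.

```python
from fractions import Fraction

# Word counts and example counts copied from the occurrence tables.
counts = {
    "Ingredients": {"2": 1, "tbsp": 1, "salt": 1, "lemon": 1},
    "Instructions": {"cut": 1, "lemon": 1, "pour": 1, "salt": 1},
}
examples = {"Ingredients": 2, "Instructions": 2}

def to_probabilities(word_counts):
    """Divide each word count by the total number of occurrences."""
    total = sum(word_counts.values())
    return {word: Fraction(n, total) for word, n in word_counts.items()}

# Prior of a category: its share of the training examples.
total_examples = sum(examples.values())
priors = {cat: Fraction(n, total_examples) for cat, n in examples.items()}

# Per-category word probabilities p(word | category).
likelihoods = {cat: to_probabilities(wc) for cat, wc in counts.items()}

# Global word probabilities p(word), from the summed counts.
global_counts = {}
for word_counts in counts.values():
    for word, n in word_counts.items():
        global_counts[word] = global_counts.get(word, 0) + n
evidence = to_probabilities(global_counts)

print(priors)
print(evidence)
```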
Naive Bayes Classifier
Concrete Example
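Query '1 tbsp salt'
Ingredients (I)
$$p(I \mid 1, \text{tbsp}, \text{salt}) = \frac{P(I)\, p(1 \mid I)\, p(\text{tbsp} \mid I)\, p(\text{salt} \mid I)}{p(1)\, p(\text{tbsp})\, p(\text{salt})} = \frac{0.5 \times 0.0001 \times 0.25 \times 0.25}{0.0001 \times 0.125 \times 0.25} = 1$$
Instructions (D)
$$p(D \mid 1, \text{tbsp}, \text{salt}) = \frac{P(D)\, p(1 \mid D)\, p(\text{tbsp} \mid D)\, p(\text{salt} \mid D)}{p(1)\, p(\text{tbsp})\, p(\text{salt})} = \frac{0.5 \times 0.0001 \times 0.0001 \times 0.25}{0.0001 \times 0.125 \times 0.25} = 0.0004$$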
Result: Ingredients (since it has the higher score)
Note: 0.0001 is the probability assigned to an unknown word (it
cannot be zero, otherwise one unseen word would make the score
zero or undefined)
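The two scores above can be checked with a few lines of code. This is a sketch of the slide's scoring rule only, with the probability tables hard-coded and the 0.0001 fallback for unknown words taken from the note; `score` and `classify` are hypothetical helper names.

```python
UNKNOWN = 0.0001  # probability assigned to a word not in a table (from the slide)

priors = {"Ingredients": 0.5, "Instructions": 0.5}
likelihoods = {
    "Ingredients": {"2": 0.25, "tbsp": 0.25, "salt": 0.25, "lemon": 0.25},
    "Instructions": {"cut": 0.25, "lemon": 0.25, "pour": 0.25, "salt": 0.25},
}
evidence = {"2": 0.125, "tbsp": 0.125, "salt": 0.25,
            "lemon": 0.25, "cut": 0.125, "pour": 0.125}

def score(category, words):
    """P(C) * prod p(w|C) divided by prod p(w), as in the slide's formula."""
    numerator = priors[category]
    denominator = 1.0
    for word in words:
        numerator *= likelihoods[category].get(word, UNKNOWN)
        denominator *= evidence.get(word, UNKNOWN)
    return numerator / denominator

def classify(sentence):
    """Return the best-scoring category and all scores for a sentence."""
    words = sentence.lower().split()
    scores = {category: score(category, words) for category in priors}
    return max(scores, key=scores.get), scores

label, scores = classify("1 tbsp salt")
print(label)  # Ingredients
for category, value in scores.items():
    print(f"{category}: {value:.4f}")  # Ingredients: 1.0000, Instructions: 0.0004
```

With longer sentences the products become very small, so practical implementations usually sum logarithms instead of multiplying probabilities.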
Naive Bayes Classifier
Examples
Classify email as spam or not spam
Document type classification
Document sections classification
Image Classification
K-Means
Unsupervised learning algorithm to identify clusters
Find clusters for unlabeled data
Algorithm (k-means):
  Choose K examples as initial centroids
  While centroids move
    1) For each example x_i, choose the closest centroid K_i and store the distance c_i
    2) Recalculate each centroid K_i as the mean of the examples in its cluster
  end
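The loop above can be sketched in plain Python as follows. Assumptions not in the slides: Euclidean distance, points given as tuples, an empty cluster keeps its previous centroid, and a `max_iter` safety cap on the loop.

```python
import math
import random

def closest(point, centroids):
    """Return (index, distance) of the centroid nearest to point."""
    distances = [math.dist(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def kmeans(points, k, rng, max_iter=100):
    centroids = rng.sample(points, k)  # choose K examples as initial centroids
    assigned = []
    for _ in range(max_iter):  # assumed cap; normally exits when centroids stop moving
        # 1) assign each x_i to its closest centroid and store the distance c_i
        assigned = [closest(p, centroids) for p in points]
        # 2) recompute each centroid as the mean of its cluster
        updated = []
        for j in range(k):
            members = [p for p, (idx, _) in zip(points, assigned) if idx == j]
            # Assumption: an empty cluster keeps its previous centroid.
            updated.append(mean(members) if members else centroids[j])
        if updated == centroids:  # centroids no longer move
            break
        centroids = updated
    return centroids, assigned

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assigned = kmeans(points, 2, random.Random(0))
print(sorted(centroids))
print([index for index, _ in assigned])
```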
K-Means
K-means example steps to converge to final solution
Figure: taken from https://en.wikipedia.org/wiki/File:K_Means_Example_Step_2.svg
K-Means
How to avoid sub-optimal results?
Figure: generated from http://www.naftaliharris.com/blog/visualizing-k-means-clustering/
K-Means
How to avoid sub-optimal results?
Algorithm (k-means with random restarts):
  Repeat N times
    Randomly choose K examples as initial centroids
    While centroids move
      1) For each example x_i, choose the closest centroid K_i and store the distance c_i
      2) Recalculate each centroid K_i as the mean of the examples in its cluster
    end
    Calculate the result cost (average distance of the examples to their centroids)
    If the result cost is lower than the best so far, keep this result
  end (repeat)
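A sketch of the restart strategy above: run k-means several times from different random initial centroids and keep the run with the lowest cost. The helper names, the fixed seed and the `max_iter` cap are assumptions for illustration.

```python
import math
import random

def run_kmeans(points, k, rng, max_iter=100):
    """One k-means run; returns (centroids, average distance to closest centroid)."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Mean of each cluster; an empty cluster keeps its centroid (assumption).
        updated = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    cost = sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)
    return centroids, cost

def best_of(points, k, restarts, seed=0):
    """Repeat k-means `restarts` times and keep the lowest-cost result."""
    rng = random.Random(seed)
    best_centroids, best_cost = None, math.inf
    for _ in range(restarts):
        centroids, cost = run_kmeans(points, k, rng)
        if cost < best_cost:
            best_centroids, best_cost = centroids, cost
    return best_centroids, best_cost

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]
single = best_of(points, 3, restarts=1)
best = best_of(points, 3, restarts=10)
print("single run cost:", round(single[1], 3))
print("best of 10 cost:", round(best[1], 3))
```

Because both calls start from the same seed, the first of the ten runs is the single run, so the best-of-10 cost can never be higher than the single-run cost.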
K-Means
Elbow Method - How to identify the number of clusters?
Figure: K-means elbow method
Figures: solutions for k=1, k=2, k=3, k=4 and k=5
Figure: cluster costs
K-Means
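Elbow Method - How to identify the number of clusters?
Elbow method:
  Repeat for K = 1, 2, 3, ..., K_max
    Run K-Means
    Compute the average cost for K clusters, $\frac{1}{n}\sum_{i=1}^{n} c_i$ (or, simplifying, just $\sum_{i=1}^{n} c_i$), where $n$ is the number of examples and $c_i$ is the distance of example $x_i$ to its centroid
  end (repeat)
  Plot the cost for each K and choose the K located at the "elbow"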
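The elbow procedure can be sketched as a loop over K that records the average cost and prints a rough text plot. Assumptions for illustration: a compact k-means helper, a few restarts per K to reduce the effect of bad initial centroids, and a toy data set with three groups.

```python
import math
import random

def kmeans_cost(points, k, rng, max_iter=100):
    """Run k-means once; return the average distance of examples to their centroid."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Mean of each cluster; an empty cluster keeps its centroid (assumption).
        updated = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)

def elbow_costs(points, max_k, restarts=5, seed=0):
    """Average cost for K = 1..max_k, keeping the best of a few restarts per K."""
    rng = random.Random(seed)
    return {
        k: min(kmeans_cost(points, k, rng) for _ in range(restarts))
        for k in range(1, max_k + 1)
    }

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]
costs = elbow_costs(points, 5)
for k, cost in costs.items():
    # One '#' per 0.25 units of cost gives a crude bar chart.
    print(f"K={k}  cost={cost:5.2f}  " + "#" * round(cost * 4))
```

The K to pick is the one after which the bars stop shrinking noticeably.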
K-Means
Elbow Method - How to identify the number of clusters?
Figure: K-means elbow method
It is not always possible to find an elbow (e.g. when the examples are evenly distributed)
Best practice: tie the number of clusters to a business meaning
K-Means
Examples
Figure: customer segmentation with k-means
Figure: identifying related news and articles
Figure: image color reduction - http://opencv-python-tutroals.readthedocs.org/en/latest/_images/oc_color_quantization.jpg
References and Resources
1. Coursera Machine Learning
   https://www.coursera.org/learn/machine-learning
2. Naive Bayes Classifier - Wikipedia
   https://en.wikipedia.org/wiki/Naive_Bayes_classifier
3. K-Means Clustering - Wikipedia
   https://en.wikipedia.org/wiki/K-means_clustering
4. Visualizing K-Means Clustering
   http://www.naftaliharris.com/blog/visualizing-k-means-clustering/
5. Naive Bayes for Image Processing
   http://www.cs.ubc.ca/~lowe/papers/12mccannCVPR.pdf
6. Document Clustering with K-Means
   http://www.codeproject.com/Articles/439890/Text-Documents-Clustering-using-K-Means-Algorithm
