Prof. Pier Luca Lanzi
Other Classification Methods
Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
Prof. Pier Luca Lanzi
Study Material
• Mining of Massive Datasets Section 12.3, 12.4
• Data Mining and Analysis: Fundamental Concepts and Algorithms
- Chapter 18 & 19
• Chapter 4 of Data Mining, Fourth Edition: Practical Machine
Learning Tools and Techniques - Ian H. Witten et al.
2
Prof. Pier Luca Lanzi
Naïve Bayes
3
Prof. Pier Luca Lanzi
Bayes Probability
• Conditional Probability:
• Bayes theorem,
• A priori probability of C, P(C), is the probability of event before
evidence is seen
• A posteriori probability of C, P(C|A), is the probability of event
after evidence is seen
4
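The two formulas on this slide were lost in the export; in the notation used here (class C, evidence A), the standard statements they refer to are:
\[
P(C \mid A) = \frac{P(A \cap C)}{P(A)} \quad\text{(conditional probability)}
\qquad
P(C \mid A) = \frac{P(A \mid C)\,P(C)}{P(A)} \quad\text{(Bayes theorem)}
\]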
Prof. Pier Luca Lanzi
Naïve Bayes Classifiers
• What’s the probability of the class given an example?
• An example is represented as a tuple of attributes
<e1, …, en>
• Given the event H (identifying the class value for the instance), we
are looking for the class with the highest probability given the evidence E
5
Prof. Pier Luca Lanzi
Naïve Bayes Classifiers
• Given the class H and the example E described
by n attributes, Bayes Theorem says that
• Naïve Bayes classifiers assume that attributes are statistically
independent
• Thus, evidence splits into parts that are independent
6
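The formulas on this slide were also images; under the independence assumption just stated, Bayes theorem for a class H and evidence E = <e1, …, en> factors as:
\[
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}
= \frac{P(e_1 \mid H)\,P(e_2 \mid H)\cdots P(e_n \mid H)\,P(H)}{P(E)}
\]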
Prof. Pier Luca Lanzi
Naïve Bayes for Classification
• Training
§Computes the class probability P(H) and the conditional
probability P(ei|H) for each attribute value ei and each class
value H
• Testing
§Given an example E, computes the class as
7
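Since P(E) is the same for every class, testing reduces to picking the class H that maximizes P(e1|H)···P(en|H)·P(H). A minimal Python sketch of this training/testing scheme on nominal attributes (function names and the toy data are illustrative, not from the slides; the zero-frequency problem discussed later is ignored here):

# Minimal Naive Bayes on nominal attributes (illustrative sketch).
from collections import Counter, defaultdict

def train(examples, labels):
    class_counts = Counter(labels)                 # counts for P(H)
    cond_counts = defaultdict(Counter)             # counts for P(ei | H)
    for x, y in zip(examples, labels):
        for i, v in enumerate(x):
            cond_counts[(i, y)][v] += 1
    return class_counts, cond_counts

def classify(x, class_counts, cond_counts):
    n = sum(class_counts.values())
    best, best_score = None, -1.0
    for h, count_h in class_counts.items():
        score = count_h / n                              # P(H)
        for i, v in enumerate(x):
            score *= cond_counts[(i, h)][v] / count_h    # P(ei | H)
        if score > best_score:
            best, best_score = h, score
    return best

examples = [("sunny", "hot"), ("overcast", "mild"), ("rainy", "cool")]
labels = ["no", "yes", "yes"]
model = train(examples, labels)
print(classify(("overcast", "cool"), *model))      # prints "yes"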
Prof. Pier Luca Lanzi
Naïve Bayes Classifier
• It is the “opposite” of 1R as it uses all the attributes
• Two assumptions
§Attributes are equally important
§Attributes are statistically independent
• Statistically independent means that knowing the value of one
attribute says nothing about the value of another (if the class is
known)
• Independence assumption is almost never correct! But the
scheme works well in practice
8
Prof. Pier Luca Lanzi
The Weather Dataset
Outlook Temp Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
9
Prof. Pier Luca Lanzi
Computing Probabilities 10
Outlook Temperature Humidity Windy Play
Yes No Yes No Yes No Yes No Yes No
Sunny 2 3 Hot 2 2 High 3 4 False 6 2 9 5
Overcast 4 0 Mild 4 2 Normal 6 1 True 3 3
Rainy 3 2 Cool 3 1
Sunny 2/9 3/5 Hot 2/9 2/5 High 3/9 4/5 False 6/9 2/5 9/14 5/14
Overcast 4/9 0/5 Mild 4/9 2/5 Normal 6/9 1/5 True 3/9 3/5
Rainy 3/9 2/5 Cool 3/9 1/5
Outlook Temp. Humidity Windy Play
Sunny Cool High True ?
What is the assigned class?
Prof. Pier Luca Lanzi
Using the Probabilities for
Classification
• Conversion into a probability by normalization:
11
Outlook Temp. Humidity Windy Play
Sunny Cool High True ?
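Answering the question on the previous slide with the fractions from the table (the computation itself was an image on the original slide):
Likelihood of “yes” = 2/9 x 3/9 x 3/9 x 3/9 x 9/14 ≈ 0.0053
Likelihood of “no” = 3/5 x 1/5 x 4/5 x 3/5 x 5/14 ≈ 0.0206
P(“yes”) = 0.0053 / (0.0053 + 0.0206) ≈ 20.5%
P(“no”) = 0.0206 / (0.0053 + 0.0206) ≈ 79.5%
So the new day is classified as “no”.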
Prof. Pier Luca Lanzi
The “Zero-Frequency Problem”
• What if an attribute value does not occur with every class value? (for
instance, “Humidity = high” for class “yes”)
• The corresponding probability will be zero,
• A posteriori probability will also be zero!
(No matter how likely the other values are!)
• The typical remedy is to add 1 to the count for every attribute value-
class combination (Laplace estimator)
• The probabilities will never be zero! (also: stabilizes probability
estimates)
12
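A compact statement of the Laplace estimator described above, for an attribute with k possible values:
\[
P(e_i = v \mid H) = \frac{\operatorname{count}(v, H) + 1}{\operatorname{count}(H) + k}
\]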
Prof. Pier Luca Lanzi
Modified Probability Estimates
• Sometimes, adding a constant different from one might be more
appropriate; for example, we can add the same μ/3 to each
count
• Or different weights to each count
• Weights don’t need to be equal (but they must sum to 1)
13
Sunny Overcast Rainy
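For the Outlook attribute (whose counts for “yes” are 2, 4, and 3 out of 9), using a constant μ and weights p1 + p2 + p3 = 1 (μ/3 each when the weights are equal), the modified estimates take the standard form:
\[
P(\text{Sunny} \mid \text{yes}) = \frac{2 + \mu p_1}{9 + \mu},\quad
P(\text{Overcast} \mid \text{yes}) = \frac{4 + \mu p_2}{9 + \mu},\quad
P(\text{Rainy} \mid \text{yes}) = \frac{3 + \mu p_3}{9 + \mu}
\]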
Prof. Pier Luca Lanzi
Missing Values
• Training
§Instance is not included in frequency count for attribute value-
class combination
• Testing
§The attribute will be omitted from calculation
14
Outlook Temperature Humidity Windy Play
? Cool High True ?
Likelihood of “yes” = 3/9 x 3/9 x 3/9 x 9/14 = 0.0238
Likelihood of “no” = 1/5 x 4/5 x 3/5 x 5/14 = 0.0343
P(“yes”) = 0.0238 / (0.0238 + 0.0343) = 41%
P(“no”) = 0.0343 / (0.0238 + 0.0343) = 59%
Prof. Pier Luca Lanzi
Example of Naïve Bayes using KNIME
Prof. Pier Luca Lanzi
Example of Naïve Bayes using KNIME
Prof. Pier Luca Lanzi
Another Version of the Weather Data
@relation weather.symbolic
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool, freeze}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
...
17
Prof. Pier Luca Lanzi
Example of Naïve Bayes with Laplace Estimator
Prof. Pier Luca Lanzi
Numeric Attributes
• We assume that the attributes have a normal or Gaussian probability
distribution (given the class)
• The probability density function for the normal distribution is defined by two
parameters, the mean and the standard deviation
• Sample mean,
• Standard deviation,
• Then the density function f(x) is
20
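The three formulas on this slide were images; the standard definitions they refer to are:
\[
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \mu)^2}
\qquad
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
\]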
Prof. Pier Luca Lanzi
Statistics for Weather data
with numeric temperature
21
Outlook Temperature Humidity Windy Play
Yes No Yes No Yes No Yes No Yes No
Sunny 2 3 64, 68, 65, 71, 65, 70, 70, 85, False 6 2 9 5
Overcast 4 0 69, 70, 72, 80, 70, 75, 90, 91, True 3 3
Rainy 3 2 72, … 85, … 80, … 95, …
Sunny 2/9 3/5 μ =73 μ =75 μ=79 μ=86 False 6/9 2/5 9/14 5/14
Overcast 4/9 0/5 σ =6.2 σ =7.9 σ=10.2 σ=9.7 True 3/9 3/5
Rainy 3/9 2/5
Prof. Pier Luca Lanzi
Classifying a new day
• A new day,
• Missing values during training are not included in calculation of
mean and standard deviation
22
Outlook Temperature Humidity Windy Play
Sunny 66 90 true ?
Likelihood of “yes” = 2/9 x 0.0340 x 0.0221 x 3/9 x 9/14 = 0.000036
Likelihood of “no” = 3/5 x 0.0291 x 0.0380 x 3/5 x 5/14 = 0.000136
P(“yes”) = 0.000036 / (0.000036 + 0.000136) = 20.9%
P(“no”) = 0.000136 / (0.000036 + 0.000136) = 79.1%
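The factors 0.0340 and 0.0221 are the Gaussian densities of the new day’s temperature and humidity under the “yes” statistics; for example, with μ = 73 and σ = 6.2:
\[
f(\text{temperature} = 66 \mid \text{yes})
= \frac{1}{\sqrt{2\pi}\cdot 6.2}\, e^{-\frac{(66-73)^2}{2\cdot 6.2^2}} \approx 0.0340
\]
The humidity factor 0.0221 is obtained in the same way with μ = 79 and σ = 10.2.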
Prof. Pier Luca Lanzi
Example of Naïve Bayes with nominal and continuous attributes using KNIME
23
Prof. Pier Luca Lanzi
Discussion
• Naïve Bayes works surprisingly well, even if independence assumption is clearly
violated
• Why? Because classification doesn’t require accurate probability estimates as
long as maximum probability is assigned to correct class
• However, adding too many redundant attributes will cause problems (e.g.
identical attributes)
• Also, many numeric attributes are not normally distributed
26
Prof. Pier Luca Lanzi
Nearest Neighbors
(Instance-Based Learning)
27
Prof. Pier Luca Lanzi
is this an apple?
Prof. Pier Luca Lanzi
Nearest Neighbors
• To decide the label for an unseen example, consider the k
examples from the training data that are most similar to the
unknown one
• Classify the unknown example using the most frequent class
29
Prof. Pier Luca Lanzi
Nearest Neighbor in Action
• If k=5, these 5 fruits are the most similar ones
to the unclassified example
• Since the majority of the fruits are labeled as positive examples,
we decide that the unknown fruit is an apple
30
Prof. Pier Luca Lanzi
Instance-Based Learning in General
• Store the training records only, no model is computed
• Use the training records to predict an unknown class label
31
Att1 … Attn Class
Att1 … Attn
Prof. Pier Luca Lanzi
Instance-Based Methods
• They are the simplest form of learning
• The training dataset is searched for the instances that are most similar
to the unlabeled instance
• The training dataset is the model itself; it is the knowledge
• The similarity function defines what’s “learned”
• They implement a “lazy evaluation” (or lazy learning) scheme, nothing
happens until a new unlabeled instance must be classified
• Known methods include “Rote Learning”, “Case Base Reasoning”, “k-
Nearest Neighbors”
32
Prof. Pier Luca Lanzi
K-Nearest-Neighbors Classifier
• Three elements
§ The training dataset
§ Similarity function
(or distance metric)
§ The value of k, the number of
nearest neighbors to retrieve
• Classification
§ Compute distance to other
training records
§ Identify the k nearest neighbors
§ Use class labels of nearest
neighbors to determine the
class label of unknown record
(e.g., by taking majority vote)
33
Unknown record
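A minimal Python sketch of the classification steps listed above (brute-force distances and a majority vote; the toy data is illustrative, not from the slides):

import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=5):
    dists = np.linalg.norm(X_train - x, axis=1)       # distance to every training record
    nearest = np.argsort(dists)[:k]                   # indices of the k nearest neighbors
    votes = Counter(y_train[i] for i in nearest)      # majority vote on their class labels
    return votes.most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [7.5, 8.2]])
y_train = ["apple", "apple", "pear", "pear"]
print(knn_classify(X_train, y_train, np.array([1.1, 1.0]), k=3))   # prints "apple"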
Prof. Pier Luca Lanzi
How Many Neighbors? 34
X X X
(a) 1-nearest neighbor (b) 2-nearest neighbor (c) 3-nearest neighbor
The k-nearest neighbors of a record x are the data points that have
the k smallest distances to x
Prof. Pier Luca Lanzi
How many neighbors?
If k is too small, classification
might be sensitive to noise points
If k is too large, neighborhood may
include quite dissimilar examples
Prof. Pier Luca Lanzi
kNN with k=1
Prof. Pier Luca Lanzi
kNN with k=3
Prof. Pier Luca Lanzi
kNN with k=1 & Crossvalidation
Prof. Pier Luca Lanzi
kNN with k=3 & Crossvalidation
Prof. Pier Luca Lanzi
What similarity measures?
Prof. Pier Luca Lanzi
What Similarity Measure?
• Euclidean distance is the typical function used to compute the
similarity between two examples
• To determine the class from nearest neighbor list
§Take the majority vote of class labels among the k neighbors
§Or weight the vote according to distance (e.g., w = 1/d²)
• Another popular metric is city-block (Manhattan) metric,
distance is the sum of absolute differences
41
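For examples x = <x1, …, xn> and y = <y1, …, yn>, the two distances mentioned above are:
\[
d_{\text{Euclidean}}(x, y) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2}
\qquad
d_{\text{Manhattan}}(x, y) = \sum_{i=1}^{n}\lvert x_i - y_i\rvert
\]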
Prof. Pier Luca Lanzi
Normalization and Other Issues
• Different attributes are measured on different scales
• Attributes might need to be normalized:
vi is the actual value of attribute i
ai is the normalized value
• For nominal attributes, the distance either 0 or 1
• Missing values are usually assumed to be maximally distant
(given normalized attributes)
42
Min-max normalization:
\[
a_i = \frac{v_i - \min v_i}{\max v_i - \min v_i}
\]
Standardization:
\[
a_i = \frac{v_i - \operatorname{Avg}(v_i)}{\operatorname{StDev}(v_i)}
\]
Prof. Pier Luca Lanzi
Finding Nearest Neighbors Efficiently
• Basic Approach
§Linear scan of the data
§Classification time is proportional to the product of the
number of instances in the training and test sets (and the
number of attributes)
• Nearest-neighbor search can be sped up by using
§KD-Trees
§Ball-Trees
§…
43
Prof. Pier Luca Lanzi
KD-Trees
Split the space hierarchically using
a tree generated from the data
To find the neighbor of a specific example,
navigate the tree using the example
Prof. Pier Luca Lanzi
Example of KD-tree. The tree works like a decision tree and splits the space in a hierarchy. When
the class of one example is needed, the tree is navigated until a leaf is reached and only the
nearby areas are actually checked by backtracking on the tree.
Prof. Pier Luca Lanzi
More on kD-trees
• Complexity depends on depth of the tree, given by the logarithm
of number of nodes for a balanced tree
• Amount of backtracking required depends on quality of tree
(“square” vs. “skinny” nodes)
• How to build a good tree?
§Need to find good split point and split direction
§Possible split direction: direction with greatest variance
§Possible split point: median value along that direction
• Using value closest to mean (rather than median) can be better if
data is skewed
• Can apply this split selection strategy recursively just like in the
case of decision tree learning
46
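A minimal sketch of kd-tree-based neighbor search using SciPy (this assumes scipy is available; the data here is random toy data, not from the slides):

import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
X_train = rng.random((10000, 3))      # training instances
tree = KDTree(X_train)                # build the kd-tree once

query = rng.random(3)
dist, idx = tree.query(query, k=5)    # distances and indices of the 5 nearest neighbors
print(idx)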
Prof. Pier Luca Lanzi
Ball Trees
• Corners in high-dimensional space may mean that the query ball intersects
with many regions, which is a potential limitation of KD-Trees
• Note that there is no need to make sure that regions do not
overlap, so they do not need to be hyperrectangles
• Can use hyperspheres (balls) instead of hyperrectangles
• A ball tree organizes the data into a tree of k-dimensional
hyperspheres
• Balls may allow for a better fit to the data and thus more efficient
search
47
Prof. Pier Luca Lanzi
Example of Ball Tree.
Prof. Pier Luca Lanzi
k-Nearest Neighbor Regression
• The regression methods we previously discussed are parametric
§ They assume an approximation function parametrized by a weight vector
§ The regression algorithm searches for the best weight vector
§ E.g., linear regression assumes that the target is computed as a linear
combination of the weight vector w and the feature vector h(x)
• Nearest Neighbor regression fits each point locally
• Basic Algorithm
§ Given a dataset (x1,y1) … (xN,yN)
§ Given a query point xq
§ The value yq associated to xq is computed as a local interpolation of the
targets associated to neighbor points
• Prediction can use the plain average of the k nearest targets or a weighted
average of the same targets
• Prediction can use kernel functions that take the distance as input and return a
weight (e.g., a Gaussian function of the distance)
49
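A minimal Python sketch of the basic algorithm above with Gaussian-kernel distance weighting (the bandwidth and the toy data are illustrative assumptions):

import numpy as np

def knn_regress(X_train, y_train, x_q, k=5, bandwidth=1.0):
    dists = np.linalg.norm(X_train - x_q, axis=1)
    nearest = np.argsort(dists)[:k]                              # the k nearest neighbors
    w = np.exp(-dists[nearest] ** 2 / (2 * bandwidth ** 2))      # Gaussian function of the distance
    return np.sum(w * y_train[nearest]) / np.sum(w)              # weighted average of their targets

X_train = np.linspace(0, 10, 50).reshape(-1, 1)
y_train = np.sin(X_train).ravel()
print(knn_regress(X_train, y_train, np.array([3.0]), k=5))       # local estimate of sin(3.0)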
Prof. Pier Luca Lanzi
Discussion
• K-Nearest Neighbor is often very accurate but slow, since the simple
version scans the entire training data to derive a prediction
• Assumes all attributes are equally important, thus may need
attribute selection or weights
• Possible remedies against noisy instances:
§Take a majority vote over the k nearest neighbors
§Remove noisy instances from dataset (difficult!)
• Statisticians have used k-NN since the early 1950s;
if n → ∞ and k/n → 0, the error approaches the theoretical minimum
• Data Structures
§kD-trees can become inefficient when the number of
attributes is too large
§Ball trees may help
50
Prof. Pier Luca Lanzi
Bayesian Belief Networks
Prof. Pier Luca Lanzi
Bayesian Belief Networks
• The conditional independence assumption,
§makes computation possible
§yields optimal classifiers when satisfied
§but is seldom satisfied in practice, as attributes (variables) are
often correlated
• Bayesian Belief Networks (BBN) allow us to specify which pairs of
attributes are conditionally independent
• They provide a graphical representation of probabilistic
relationships among a set of random variables
52
Prof. Pier Luca Lanzi
Bayesian Belief Networks
• Describe the probability distribution governing a set of variables
by specifying
§Conditional independence assumptions that apply on subsets
of the variables
§A set of conditional probabilities
• Two key elements
§A directed acyclic graph, encoding the dependence relationships
among variables
§A probability table associating each node with its immediate
parent nodes
53
Prof. Pier Luca Lanzi
• Variables A and B are independent, but each one of them has an
influence on a third variable
• A is conditionally independent of both B and D, given C
• The configuration of the typical naïve Bayes classifier
Examples of Bayesian Belief
Networks
54
Prof. Pier Luca Lanzi
An Example of Bayesian Belief
Network
55
Prof. Pier Luca Lanzi
Probability Tables
• The network topology imposes conditions regarding the variable
conditional independence
• Each node is associated with a probability table
§If a node X does not have any parents, then
the table contains only the prior probability P(X)
§If a node X has only one any parent Y, then
the table contains only the conditional probability P(X|Y)
§If a node X has multiple parents, Y1, …, Yk the
the table contains the conditional probability P(X|Y1…Yk)
56
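Putting the two elements together, a Bayesian belief network encodes the joint distribution of the variables as the product of the local conditional probabilities (a standard identity, not shown explicitly on the slide):
\[
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \operatorname{Parents}(X_i)\bigr)
\]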
Prof. Pier Luca Lanzi
Bayesian Belief Networks
• In general, inference in a Bayesian network is NP-complete, but there are
approximate methods, e.g., Monte Carlo sampling
57
(figure: network with nodes marked as evidence/observed, unobserved, and to be predicted)
Prof. Pier Luca Lanzi
http://www.cs.waikato.ac.nz/~remco/weka.bn.pdf
Prof. Pier Luca Lanzi
Bayesian Belief Networks
• A Bayesian belief network allows a subset of the variables to be
conditionally independent
• A graphical model of causal relationships
• Several cases of learning Bayesian belief networks
§Given both the network structure and all the variables,
learning is easy
§Given the network structure but only some of the variables
§When the network structure is not known in advance
60
Prof. Pier Luca Lanzi
Bayesian Belief Networks
• BBN provides an approach for capturing prior knowledge of a
particular domain using a graphical model
• Building the network is time consuming, but
adding variables to a network is straightforward
• They can encode causal dependencies among variables
• They are well suited to dealing with incomplete data
• They are robust to overfitting
61
Prof. Pier Luca Lanzi
Support Vector Machines
Prof. Pier Luca Lanzi
Many decision boundaries can separate these two classes
Which one should we choose?
Class 1
Class 2
Prof. Pier Luca Lanzi
Class 1
Class 2
Class 1
Class 2
Class 1
Class 2
Class 1
Class 2
Prof. Pier Luca Lanzi
(figure: separating hyperplane with margin γ)
Prof. Pier Luca Lanzi
(figure: margin γ with points labeled A, B, C)
SVMs work by searching for the hyperplane that
maximizes the margin, i.e., the largest γ such that
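The condition itself did not survive the export; the standard formulation it most likely refers to requires every training example (xi, yi), with yi ∈ {+1, −1}, to lie at distance at least γ from the hyperplane:
\[
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge \gamma \quad \text{for all } i, \qquad \lVert\mathbf{w}\rVert = 1
\]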
Prof. Pier Luca Lanzi
The search for the separating hyperplane can be reduced
to the search for the d+1 points (on the + and − planes) that
uniquely identify the hyperplane we are looking for
Prof. Pier Luca Lanzi
Regression
(figure: regression function f(x) over x, with error tolerance e)
87
Prof. Pier Luca Lanzi
Regression
(figure: regression after mapping through φ, axes φ(x) and φ(f(x)), with tolerance ε)
88