Statistical Classification:
A review on some techniques

           Bamparopoulos Giorgos
       Master in Web Science, Department of
  Mathematics, Aristotle University of Thessaloniki
What is Pattern Recognition?
   The study of how machines can observe the environment,
       learn to distinguish patterns of interest from their
        background, and
       make sound and reasonable decisions about the
        categories of the patterns.

   A pattern is an object, process or event that can
    be given a name.
   A pattern class (or category) is a set of patterns
    sharing common attributes and usually originating
    from the same source.
In machine learning, pattern recognition is the assignment of
a label to a given input value. An example of pattern
recognition is classification. However, pattern recognition is a
more general problem that encompasses other types of output
as well.

Other examples are regression, which assigns a real-valued
output to each input; sequence labeling, which assigns a class
to each member of a sequence of values (for example, part of
speech tagging, which assigns a part of speech to each word in
an input sentence); and parsing, which assigns a parse tree to
an input sentence, describing the syntactic structure of the
                                             - Wikipedia
Pattern recognition system
   Classification mode:
      test pattern -> Preprocessing -> Feature Measurement -> Classification
   Training mode:
      training pattern -> Preprocessing -> Feature Extraction/Selection -> Learning
Supervised vs Unsupervised Learning

   Supervised learning (classification)
       The training data are accompanied by labels
        indicating the class of the observations
       New data is classified based on the training set

   Unsupervised learning (clustering)
       The class labels of the training data are unknown
       Given a set of measurements or observations, the aim
        is to establish the existence of classes or clusters
        in the data
A classification problem occurs when an object needs to be
assigned to a predefined group or class based on a
number of observed attributes related to that object.

The individual observations are analyzed into a set of
quantifiable properties, known as explanatory variables,
features, etc. These properties may variously be
categorical, ordinal, integer-valued or real-valued.

During classification, given objects are assigned to
prescribed classes. A classifier is a mathematical function,
implemented by a classification algorithm, that maps input
data to a category.
Application Domains (1/3)
   Computer vision
       Medical imaging and medical image analysis
       Optical character recognition
       Video tracking
   Drug discovery and development
       Toxicogenomics
       Quantitative structure-activity relationship
   Geostatistics
   Speech recognition
Application Domains (2/3)
   Handwriting recognition
   Biometric identification
   Biological classification
   Statistical natural language processing
   Document classification
   Internet search engines
   Credit scoring
Application Domains (3/3)
   Figure panels: Digit recognition; Automated protein …; Phoneme recognition
   [Waibel, Hanzawa, Hinton, Shikano, Lang 1989]
Example of Classification 1/2
   Input: Spam          Output: Binary
   Input: Character     Output: Multi-Class

[thanks to Ben Taskar for slide!]
Example of Classification 2/2
   Figure: input/output example with a 3D object as input

[thanks to Ben Taskar for slide!]
   Neural networks
   Quadratic classifiers
   Naive Bayes classifier
   Kernel estimation and K-nearest neighbor algorithms
   Decision trees, decision lists
   Support vector machines
   Maximum entropy classifier
Linear Classifier
Figure: Katydids and Grasshoppers separated by a straight line (R.A. Fisher)
   If previously unseen instance above the line
      then class is Katydid
   else
      class is Grasshopper
Eamonn Keogh, Professor Computer Science & Engineering Department, University of California - Riverside
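The "above the line" rule can be sketched in a few lines of Python. This is a minimal illustration, not the classifier from the slide: the line (here y = x), the feature values and the function names are hypothetical. The score w·x + b is positive above the line and negative below it, and the same code works unchanged in n dimensions, where the line becomes a hyperplane.

```python
def linear_score(point, w, b):
    """Signed score w . point + b; positive means 'above' the boundary."""
    return sum(wi * xi for wi, xi in zip(w, point)) + b


def classify_insect(point, w, b):
    """Hypothetical version of the slide's rule:
    above the line -> Katydid, otherwise -> Grasshopper."""
    return "Katydid" if linear_score(point, w, b) > 0 else "Grasshopper"


# Hypothetical boundary y = x, written as -1*x + 1*y + 0 = 0.
w, b = (-1.0, 1.0), 0.0
print(classify_insect((2.0, 7.0), w, b))  # Katydid (point above y = x)
print(classify_insect((8.0, 3.0), w, b))  # Grasshopper (point below y = x)
```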
Higher Dimensional Spaces
 … we can visualize it as being an n-dimensional …
If we did not have the 3rd dimension…
We can no longer get perfect accuracy with the
simple linear classifier.

We could try to solve this problem by using a simple
quadratic classifier or a simple cubic classifier.
Naive Bayesian Classifier
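For reference, the standard naive Bayes decision rule, in notation introduced here ($C_k$ for class $k$, $x_1, \dots, x_d$ for the $d$ features), assumes that the features are conditionally independent given the class, so the class-conditional density factorizes into one term per feature:

$$\hat{C}(x) = \arg\max_k \; P(C_k) \prod_{j=1}^{d} p(x_j \mid C_k)$$

In practice the product is evaluated as a sum of logarithms to avoid numerical underflow.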
Linear Discriminant Analysis
     Assumes that the conditional class densities are
      (multivariate) Gaussian
     Assumes equal covariance for every class

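  A sample is then assigned to the class whose
  discriminant function is largest for that sample.

Classification rule:
  $$\hat{G}(x) = \arg\max_k \delta_k(x), \qquad \delta_k(x) = x^T\hat{\Sigma}^{-1}\hat{\mu}_k - \tfrac{1}{2}\hat{\mu}_k^T\hat{\Sigma}^{-1}\hat{\mu}_k + \log\hat{\pi}_k$$

  Covariance matrix (pooled over the classes):
  $$\hat{\Sigma} = \sum_{k=1}^{K}\sum_{g_i = k} \frac{(x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T}{N - K}$$

  where $g_i$ is the class of sample $x_i$, $\hat{\mu}_k$ and $\hat{\pi}_k = N_k/N$ are the
  sample mean and the prior of class k (with $N_k$ samples), N is the total number
  of samples and K is the number of classes.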
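Quadratic Discriminant Analysis
   Class conditional probability densities are allowed to
    have different covariance matrices

   The class decision boundaries are therefore not linear
    but quadratic

Classification rule:
  $$\hat{G}(x) = \arg\max_k \delta_k(x), \qquad \delta_k(x) = -\tfrac{1}{2}\log|\hat{\Sigma}_k| - \tfrac{1}{2}(x - \hat{\mu}_k)^T\hat{\Sigma}_k^{-1}(x - \hat{\mu}_k) + \log\hat{\pi}_k$$
  where $\hat{\Sigma}_k$ is the covariance matrix estimated from the samples of class k alone.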
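A minimal pure-Python sketch of QDA, assuming the standard quadratic discriminant $\delta_k(x) = -\tfrac{1}{2}\log|\hat{\Sigma}_k| - \tfrac{1}{2}(x-\hat{\mu}_k)^T\hat{\Sigma}_k^{-1}(x-\hat{\mu}_k) + \log\hat{\pi}_k$ with one covariance matrix per class. Both $\hat{\Sigma}_k^{-1}$ and $\log|\hat{\Sigma}_k|$ exist only when every class covariance is nonsingular, so the sketch raises an error otherwise. The helper names and toy data are hypothetical, and this is not the MATLAB implementation used in the experiments.

```python
import math


def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, mu):
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n, d = len(rows), len(mu)
    return [[sum((r[a] - mu[a]) * (r[c] - mu[c]) for r in rows) / (n - 1)
             for c in range(d)] for a in range(d)]


def solve_and_logdet(A, b, tol=1e-12):
    """Gauss-Jordan elimination: return (x with A x = b, log|det A|).
    Raises ValueError if A is (numerically) singular."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    logdet = 0.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < tol:
            raise ValueError("covariance matrix is singular")
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        logdet += math.log(abs(p))  # |det| is the product of the pivots
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)], logdet


def fit_qda(X, y):
    """Per class: (mean, covariance, prior estimated as class frequency)."""
    model = {}
    for k in sorted(set(y)):
        rows = [x for x, g in zip(X, y) if g == k]
        mu = mean(rows)
        model[k] = (mu, covariance(rows, mu), len(rows) / len(X))
    return model


def qda_predict(model, x):
    def delta(k):
        mu, S, prior = model[k]
        diff = [xi - mi for xi, mi in zip(x, mu)]
        z, logdet = solve_and_logdet(S, diff)  # z = S^{-1} (x - mu)
        maha = sum(di * zi for di, zi in zip(diff, z))
        return -0.5 * logdet - 0.5 * maha + math.log(prior)
    return max(model, key=delta)


X = [[0, 0.1], [0.1, 0], [-0.1, 0], [0, -0.1], [0.05, 0.05],  # tight class 0
     [3, 0], [-3, 0], [0, 3], [0, -3], [2, 2]]                # spread-out class 1
y = [0] * 5 + [1] * 5
model = fit_qda(X, y)
print(qda_predict(model, [0, 0]), qda_predict(model, [2.5, -2.5]))  # 0 1
```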
Problems in Learning

   Dimensionality
     The number of features is too large relative
      to the number of training samples
   Classifier complexity
     The number of unknown parameters
      associated with the classifier is large
   Overtraining
     The classifier is optimized too intensively on the training set
Dimensionality Reduction

   Feature Extraction
       Create new features based on the
        original feature set
       Transforms are usually involved

   Feature Selection
       Select the best subset from a given
        feature set.
Overfitting and underfitting

   Figure panels: underfitting, good fit, overfitting
Kernel Density Estimation
Kernel density estimation (KDE) is a non-parametric way
to estimate the probability density function of a random variable.

Let (x1, x2, …, xn) be a sample drawn from some distribution with
an unknown density ƒ. Its kernel density estimator is

$$\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$

where K(•) is the kernel function, which integrates to one, and
h > 0 is a smoothing parameter called the bandwidth.

    Figure: some kernel functions
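A minimal sketch of the kernel density estimator $\hat{f}_h(x) = \frac{1}{nh}\sum_i K((x - x_i)/h)$ in plain Python, with three common kernels: the normal kernel and the triangle and Epanechnikov kernels in their standard forms supported on [-1, 1]. The sample and bandwidth are hypothetical, and MATLAB's kernel density routines may scale the kernels differently, so this illustrates the formula rather than reproducing the toolbox.

```python
import math


# Kernel functions K(u); each integrates to one over the real line.
def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def triangle(u):
    return max(0.0, 1.0 - abs(u))


def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde(sample, h, kernel=gaussian):
    """Return the function f_hat(x) = 1/(n h) * sum_i K((x - x_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(sample)

    def f_hat(x):
        return sum(kernel((x - xi) / h) for xi in sample) / (n * h)
    return f_hat


sample = [1.0, 2.0, 2.5, 4.0]
f = kde(sample, h=0.5, kernel=epanechnikov)
# Crude check: a Riemann sum of f_hat over [-2, 7] should be close to 1.
step = 0.001
print(sum(f(-2 + i * step) for i in range(9000)) * step)
```

A larger h gives a smoother but more biased estimate; a smaller h follows the sample more closely and becomes spiky.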
Glass Identification dataset
This data is available from the UCI Machine Learning Repository.

Murphy, P.M., Aha, D.W. (1994). UCI Repository of machine learning
databases. Irvine, CA: University of California, Department of
Information and Computer Science.

From USA Forensic Science Service.

 Creator: B. German -- Central Research Establishment Home
  Office Forensic Science Service Aldermaston, Reading, Berkshire
  RG7 4PN
 Donor: Vina Spiehler, Ph.D., DABFT Diagnostic Products
  Corporation (213) 776-0180 (ext 3014)
 Date: September, 1987
Glass Identification dataset
   The dataset has 214 instances
   6 types of glass, defined in terms of
    their oxide content (e.g. Na, Fe, K)
   The study of classification of types of
    glass was motivated by criminological
    investigation. At the scene of the
    crime, the glass left can be used as
    evidence...if it is correctly identified!

   id   Type of glass (class attribute)
   1    building windows float processed
   2    building windows non float processed
   3    vehicle windows float processed
   4    containers
   5    tableware
   6    headlamps

Attribute information:
   1   RI: refractive index
   2   Na: Sodium
   3   Mg: Magnesium
   4   Al: Aluminum
   5   Si: Silicon
   6   K: Potassium
   7   Ca: Calcium
   8   Ba: Barium
   9   Fe: Iron
   *Unit of measurement (attributes 2-9): weight percent in corresponding oxide
Scatter plots of features

The scatter plots
depict the relationship
between features
grouped by the type
of glass.

   X-axis:
       Refractive index
       Sodium
       Magnesium
   Y-axis:
       Iron
       Barium
       Calcium
Scatter plots of features

   X-axis:
       Refractive index
       Sodium
       Magnesium
   Y-axis:
       Potassium
       Silicon
       Aluminum
Scatter plots of features

   X-axis:
       Aluminum
       Silicon
       Potassium
   Y-axis:
       Iron
       Barium
       Calcium
Scatter plot and histograms
Linear Discriminant Analysis
First, we classify the data using the default linear discriminant
analysis (LDA).

In the scatter plot above, we draw an X through the
misclassified observations.
Confusion Matrix
   A confusion matrix contains information about known
    class labels and predicted class labels.
   The (i,j) element in the confusion matrix is the number
    of samples whose predicted class is i and whose
    known class label is class j.
   The diagonal elements represent correctly classified
    observations.

   Rows: predicted class 1-6; columns: known class 1-6
   46   16    3    0    1    0
   14   41    3    2    1    1
   10   12   11    0    0    1
    0    4    0   10    0    2
    0    3    0    0    7    1
    0    0    0    1    0   24

The misclassification error (the proportion of misclassified
observations) on the training set is 35.05%.
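The definitions of the confusion matrix and the misclassification error translate directly into code. In this minimal sketch classes are numbered from 0 (Python convention) rather than 1, rows index the predicted class and columns the known class, and the function names are ours. Applied to the LDA training-set matrix from this slide, it reproduces the reported 35.05% (75 of 214 observations off the diagonal).

```python
def confusion_matrix(predicted, known, n_classes):
    """C[i][j] = number of samples predicted as class i whose known class is j
    (classes numbered 0 .. n_classes - 1)."""
    C = [[0] * n_classes for _ in range(n_classes)]
    for p, k in zip(predicted, known):
        C[p][k] += 1
    return C


def misclassification_error(C):
    """Proportion of samples off the diagonal, i.e. misclassified."""
    total = sum(sum(row) for row in C)
    correct = sum(C[i][i] for i in range(len(C)))
    return (total - correct) / total


# LDA training-set confusion matrix (rows: predicted, columns: known).
lda_C = [
    [46, 16, 3, 0, 1, 0],
    [14, 41, 3, 2, 1, 1],
    [10, 12, 11, 0, 0, 1],
    [0, 4, 0, 10, 0, 2],
    [0, 3, 0, 0, 7, 1],
    [0, 0, 0, 1, 0, 24],
]
print(f"{100 * misclassification_error(lda_C):.2f}%")  # 35.05%
```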
Visualization of the regions
The function has separated the plane into regions
divided by lines, and assigned different regions to
different classes (glass types). One way to visualize these
regions is to create a grid of (x,y) values and apply the
classification function to that grid.
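A minimal sketch of the grid approach: evaluate a classification function at every point of a regular grid and inspect the labels. The two-class rule `y > x` is a hypothetical stand-in for a trained classifier restricted to two features; in a plot, each label would be drawn as a colored cell to show the regions.

```python
def classify_grid(classify, x_range, y_range, steps):
    """Apply classify(x, y) at every point of a steps x steps grid.
    Returns one row of labels per y value, from bottom to top."""
    (x0, x1), (y0, y1) = x_range, y_range
    xs = [x0 + i * (x1 - x0) / (steps - 1) for i in range(steps)]
    ys = [y0 + j * (y1 - y0) / (steps - 1) for j in range(steps)]
    return [[classify(x, y) for x in xs] for y in ys]


def rule(x, y):
    """Hypothetical two-class rule standing in for a fitted classifier."""
    return 1 if y > x else 0


grid = classify_grid(rule, (0.0, 1.0), (0.0, 1.0), 5)
for row in reversed(grid):  # print the top row first, as in a plot
    print("".join(str(label) for label in row))
# prints 11110, 11100, 11000, 10000, 00000 (one per line)
```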
Generalization error
Generalization error is the expected prediction error on
independent data, i.e. data not used for training.
Cross-validation is a statistical method for estimating the
generalization error of classification algorithms.

In k-fold cross-validation the data is first partitioned into
k equally (or nearly equally) sized segments or folds.
Subsequently k iterations of training and validation are
performed such that within each iteration a different fold
of the data is held out for validation while the remaining
k-1 folds are used for learning. Here we use 10-fold
cross-validation.

     The LDA cross-validation error is 40.19%
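A minimal sketch of k-fold cross-validation in plain Python. The fold assignment is a simple shuffled split (not stratified by class), and `fit`/`predict` stand for any classifier; the nearest-mean classifier below is a hypothetical stand-in, not one of the methods in this review. MATLAB's cross-validation utilities may partition the data differently, so this sketch is not expected to reproduce the reported errors.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validation_error(X, y, fit, predict, k=10, seed=0):
    """Fraction of samples misclassified when each fold is predicted by a
    model trained on the other k-1 folds."""
    errors = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        errors += sum(predict(model, X[i]) != y[i] for i in fold)
    return errors / len(X)


def fit_nearest_mean(X, y):
    """Hypothetical stand-in classifier: one mean per class (1-D data)."""
    groups = {}
    for x, g in zip(X, y):
        groups.setdefault(g, []).append(x)
    return {g: sum(v) / len(v) for g, v in groups.items()}


def predict_nearest_mean(model, x):
    return min(model, key=lambda g: abs(x - model[g]))


X = [0.1, 0.2, 0.3, 0.4, 0.5, 5.1, 5.2, 5.3, 5.4, 5.5]
y = [0] * 5 + [1] * 5
print(cross_validation_error(X, y, fit_nearest_mean, predict_nearest_mean, k=5))  # 0.0
```

With 214 samples and k = 10, this split gives four folds of 22 samples and six of 21.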
Quadratic discriminant analysis
    The covariance matrix of some classes in the training set
    was not positive definite. To solve that problem, we
    applied quadratic discriminant analysis (QDA) without
    features 6, 8 and 9 (K, Ba and Fe).

        Confusion Matrix
   54   42    1    0    0    0
    6   26    0    0    1    0
   10    7   16    0    0    0
    0    1    0    9    0    0
    0    0    0    0   12    0
    0    0    0    0    0   29

The misclassification error is 31.78% and the
cross-validation error is 42.52%.
Mean of variables
   The table gives the mean of each variable (feature)
    for each group.
   The plot depicts the means in three dimensions.

Class                        RI           Na           Mg           Al           Si           K            Ca           Ba         Fe
building windows float       1.518718286  13.24228571  3.552428571  1.163857143  72.61914286  0.447428571  8.797285714  0.012714   0.057
building windows non float   1.518618553  13.11171053  3.002105263  1.408157895  72.59802632  0.521052632  9.073684211  0.050263   0.079737
vehicle windows              1.517963529  13.43705882  3.543529412  1.201176471  72.40470588  0.406470588  8.782941176  0.008824   0.057059
containers                   1.518927692  12.82769231  0.773846154  2.033846154  72.36615385  1.47         10.12384615  0.187692   0.060769
tableware                    1.517455556  14.64666667  1.305555556  1.366666667  73.20666667  0            9.356666667  0          0
headlamps                    1.517116207  14.44206897  0.538275862  2.122758621  72.96586207  0.325172414  8.49137931   1.04       0.013448

(RI: refractive index; Na, Mg, Al, Si, K, Ca, Ba, Fe: weight percent in the corresponding oxide)
Variance of variables
   The table gives the variance of each variable (feature)
    for each group separately.
   The plot depicts the means in three dimensions.

Class                        RI         Na         Mg         Al         Si         K          Ca         Ba         Fe
building windows float       5.14E-06   0.249302   0.06103    0.074615   0.324312   0.046173   0.330403   0.007029   0.007934
building windows non float   1.45E-05   0.441108   1.477833   0.101341   0.525005   0.045679   3.692682   0.131291   0.011328
vehicle windows              3.67E-06   0.256935   0.026499   0.120749   0.262426   0.052849   0.144485   0.001324   0.011635
containers                   1.12E-05   0.603786   0.998292   0.481526   1.644342   4.574017   4.768942   0.369969   0.024208
tableware                    9.71E-06   1.1751     1.203703   0.327025   1.16525    0          2.10235    0          0
headlamps                    6.48E-06   0.471088   1.249215   0.196006   0.884039   0.446883   0.947712   0.442679   0.000888
Naive Bayes Classifier
    We use the Gaussian distribution for features
    1, 2, 3, 4, 5 and 7 and kernel density estimation for
    features 6, 8 and 9.

            Confusion Matrix
   12    2    0    0    5    3
   50   56    7    1    0    0
    5   12   10    0    0    0
    1    0    0   28    1    4
    7    0    0    0    7    0
    1    0    0    0    0    2

The misclassification error is 46.26% and the
cross-validation error is 57.01%.

If we assume that the prior probabilities are equal for all
classes, the errors are 60.75% and 64.49% respectively.
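A minimal sketch of the mixed naive Bayes model described on this slide: Gaussian densities for some features and a kernel density estimate (normal kernel) for the others. Feature indices in the code are 0-based, so the slide's features 6, 8 and 9 would be `kde_features={5, 7, 8}`. The bandwidth `h`, the helper names and the toy data are hypothetical, and MATLAB chooses its own bandwidth, so this illustrates the technique rather than reproducing the results above. Note that the Gaussian branch needs a nonzero within-class variance, whereas a kernel density estimate stays defined even for a feature that is constant within a class.

```python
import math


def gaussian_pdf(x, mu, var):
    """Normal density with mean mu and variance var (var must be > 0)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def kde_pdf(x, sample, h):
    """Normal-kernel density estimate at x with bandwidth h."""
    total = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)
    return total / (len(sample) * h * math.sqrt(2 * math.pi))


def fit_naive_bayes(X, y, kde_features, h=0.1, equal_priors=False):
    """Per class and feature, keep (mean, variance) for a Gaussian density,
    or the raw values for a kernel density estimate (features in kde_features)."""
    classes = sorted(set(y))
    params = {}
    for c in classes:
        rows = [x for x, g in zip(X, y) if g == c]
        prior = 1 / len(classes) if equal_priors else len(rows) / len(X)
        feats = []
        for j, col in enumerate(zip(*rows)):
            if j in kde_features:
                feats.append(("kde", list(col)))
            else:
                mu = sum(col) / len(col)
                var = sum((v - mu) ** 2 for v in col) / (len(col) - 1)
                feats.append(("gauss", (mu, var)))
        params[c] = (prior, feats)
    return {"h": h, "params": params}


def predict_naive_bayes(model, x):
    """Pick the class maximizing log P(c) + sum_j log p(x_j | c)."""
    h = model["h"]

    def log_posterior(c):
        prior, feats = model["params"][c]
        total = math.log(prior)
        for xj, (kind, p) in zip(x, feats):
            dens = kde_pdf(xj, p, h) if kind == "kde" else gaussian_pdf(xj, *p)
            total += math.log(max(dens, 1e-300))  # guard against log(0)
        return total
    return max(model["params"], key=log_posterior)


X = [[1.0, 0.0], [1.2, 0.0], [0.9, 0.1], [3.0, 1.0], [3.1, 1.0], [2.9, 0.9]]
y = [0, 0, 0, 1, 1, 1]
model = fit_naive_bayes(X, y, kde_features={1}, h=0.2)
print(predict_naive_bayes(model, [1.05, 0.05]))  # 0
```

Passing `equal_priors=True` corresponds to the equal-prior variant whose errors are quoted on each naive Bayes slide.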
Naive Bayes Classifier
    If we use kernel density estimation with a normal
    kernel function, we have:

          Confusion Matrix
   60   17    3    0    0    0
    4   44    0    0    1    0
    6   11   14    0    0    1
    0    1    0    9    0    0
    0    3    0    0   12    0
    0    0    0    0    0   28

The misclassification error is 21.96% and the
cross-validation error is 39.72%.

If we assume that the prior probabilities are equal for all
classes, the errors are 29.44% and 45.33% respectively.
Naive Bayes Classifier
    If we use kernel density estimation with a triangle
    kernel function, we have:

          Confusion Matrix
   61   16    2    0    0    0
    4   44    0    0    1    0
    5   12   15    0    0    1
    0    1    0    9    0    0
    0    3    0    0   12    0
    0    0    0    0    0   28

The misclassification error is 21.03% and the
cross-validation error is 42.06%.

If we assume that the prior probabilities are equal for all
classes, the errors are 29.44% and 45.33% respectively.
Naive Bayes Classifier
    If we use kernel density estimation with an
    Epanechnikov kernel function, we have:

          Confusion Matrix
   61   16    3    0    0    0
    4   44    0    1    0    0
    5   12   14    0    0    1
    0    4    0   12    0    0
    0    0    0    0    9    0
    0    0    0    0    0   28

The misclassification error is 21.5% and the
cross-validation error is 42.52%.

If we assume that the prior probabilities are equal for all
classes, the errors are 30.37% and 44.39% respectively.
Neural Networks
          Biological neural networks are made up of real biological
          neurons that are connected or functionally related in a nervous
          system. In the field of neuroscience, they are often identified as
          groups of neurons that perform a specific physiological function in
          laboratory analysis.
Artificial Neural Network
   An Artificial Neural Network (ANN), usually called neural
    network (NN), is a mathematical model or computational
    model that is inspired by the structure and/or functional
    aspects of biological neural networks.
   A neural network consists of an interconnected group
    of artificial neurons, and it processes information using
    a connectionist approach to computation.

                                                      - Wikipedia
Multiple Layers of Neurons
A network can have several layers. Each layer has a
weight matrix W, a bias vector b, and an output vector a.

   Neural Network Toolbox™ User’s Guide, Mark Hudson Beale, Martin T. Hagan, Howard B. Demuth
Neural network for Classification
In this classification problem, we use a two-layer
feed-forward network, with 10 neurons in the hidden layer.
Transfer functions
  The hyperbolic tangent sigmoid transfer function (tansig) is
  used in both hidden and output neurons:

  $$\mathrm{tansig}(n) = \frac{2}{1 + e^{-2n}} - 1$$

o This is mathematically equivalent to tanh(n). It differs in
  that it runs faster than the MATLAB implementation of tanh,
  but the results can have very small numerical differences.
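A short sketch of the network's forward pass, assuming the layer equation a = f(Wp + b) and tansig in both layers. The weights and sizes below are hypothetical (2 inputs, 2 hidden neurons, 1 output instead of the 9-10-6 network used here), and the function names are ours, not toolbox functions.

```python
import math


def tansig(n):
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2n)) - 1, equal to tanh(n)."""
    n = max(n, -350.0)  # keep exp(-2n) from overflowing for very negative n
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def layer(W, b, p, f=tansig):
    """One layer: a = f(W p + b), with W stored as one row per neuron."""
    return [f(sum(w * x for w, x in zip(row, p)) + bi) for row, bi in zip(W, b)]


def forward(p, W1, b1, W2, b2):
    """Two-layer feed-forward network with tansig in the hidden and output layers."""
    return layer(W2, b2, layer(W1, b1, p))


# Hypothetical weights: 2 inputs, 2 hidden neurons, 1 output neuron.
W1, b1 = [[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]
W2, b2 = [[1.0, -1.0]], [0.0]
print(forward([1.0, 2.0], W1, b1, W2, b2))
```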
Initializing weights and bias
   We initialized the weights and biases in each layer with the
    initnw function from the Neural Network Toolbox in MATLAB.

   This function initializes a layer's weights and biases
    according to the Nguyen-Widrow initialization algorithm.

   This algorithm chooses values in order to distribute the
    active region of each neuron in the layer approximately
    evenly across the layer's input space.

    The values contain a degree of randomness, so they
    are not the same each time this function is called.
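A minimal sketch of the classical Nguyen-Widrow rule as it is usually stated in textbooks, assuming inputs scaled to [-1, 1]: draw random weights, rescale each hidden neuron's weight vector to length β = 0.7·H^(1/n) (H hidden neurons, n inputs), and draw the biases uniformly from [-β, β]. MATLAB's initnw follows the same idea but differs in details, so this is not a reimplementation of initnw; the function name and `seed` argument are ours. Fixing the seed makes the randomness reproducible.

```python
import math
import random


def nguyen_widrow(n_inputs, n_hidden, seed=None):
    """Classical Nguyen-Widrow initialization for a tansig hidden layer,
    assuming inputs already scaled to [-1, 1]. Returns (W, b)."""
    rng = random.Random(seed)
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)
    W, b = [], []
    for _ in range(n_hidden):
        w = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        norm = math.sqrt(sum(v * v for v in w))
        W.append([beta * v / norm for v in w])  # each weight row has length beta
        b.append(rng.uniform(-beta, beta))      # biases spread over [-beta, beta]
    return W, b


# 9 glass features as inputs, 10 hidden neurons, as in this review.
W, b = nguyen_widrow(n_inputs=9, n_hidden=10, seed=1)
```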
Performance Function
Training function
We trained the network using scaled conjugate gradient
backpropagation.

The scaled conjugate gradient algorithm is based on
conjugate directions, but it does not perform a line search
at each iteration (Moller, Neural Networks, Vol. 6, 1993, pp. 525-533).
Error Back propagation Algorithm
The gradients are computed through the backpropagation algorithm.

                                       Thanks Sargur Srihari for the slide
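A minimal sketch of error backpropagation for a two-layer tanh network, written for a single training sample and the loss E = ½ Σ_k (a_k - t_k)², which differs from a mean squared error only by a constant factor. It computes the gradients only; a training algorithm such as scaled conjugate gradient would then use them to update the weights. All names and numbers are hypothetical.

```python
import math


def forward(p, W1, b1, W2, b2):
    """Return hidden outputs a1 and network outputs a2 (tanh in both layers)."""
    a1 = [math.tanh(sum(w * x for w, x in zip(row, p)) + bj) for row, bj in zip(W1, b1)]
    a2 = [math.tanh(sum(w * h for w, h in zip(row, a1)) + bk) for row, bk in zip(W2, b2)]
    return a1, a2


def loss(p, t, W1, b1, W2, b2):
    _, a2 = forward(p, W1, b1, W2, b2)
    return 0.5 * sum((ak - tk) ** 2 for ak, tk in zip(a2, t))


def backprop(p, t, W1, b1, W2, b2):
    """Gradients of E = 0.5 * sum_k (a2_k - t_k)^2 w.r.t. W1, b1, W2, b2."""
    a1, a2 = forward(p, W1, b1, W2, b2)
    # Output sensitivities dE/dn2 = (a2 - t) * tanh'(n2), with tanh' = 1 - a^2.
    d2 = [(ak - tk) * (1 - ak * ak) for ak, tk in zip(a2, t)]
    # Propagate the sensitivities back through W2 to the hidden layer.
    d1 = [(1 - a1[j] ** 2) * sum(W2[k][j] * d2[k] for k in range(len(d2)))
          for j in range(len(a1))]
    gW2 = [[dk * aj for aj in a1] for dk in d2]
    gW1 = [[dj * pi for pi in p] for dj in d1]
    return gW1, d1, gW2, d2  # bias gradients equal the sensitivities


p, t = [0.5, -1.0], [0.5, -0.5]
W1, b1 = [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6]], [0.0, 0.1, -0.1]
W2, b2 = [[0.2, -0.1, 0.3], [0.4, 0.5, -0.2]], [0.05, -0.05]
gW1, gb1, gW2, gb2 = backprop(p, t, W1, b1, W2, b2)
```

A finite-difference check of a few entries against `loss` is a cheap way to validate such gradient code.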
Confusion Matrices

We observe that the
misclassification error is
around 31%-32% for the training,
validation and test sets.

The total misclassification
error is 31.8%.
Performance function
The plot below depicts the mean squared error over 31
epochs (training iterations):
Training state
The plot below depicts the gradient and the number of
validation failures during training:
Receiver operating characteristic
   The receiver operating characteristic (ROC) curve is a
    tool used to check the quality of classifiers.

   For each class of a classifier, the roc function applies
    threshold values across the interval [0,1] to the outputs.

   For each threshold, two values are calculated:
       True Positive Ratio: the number of samples with
        target 1 whose output is greater than or equal to the
        threshold, divided by the number of one targets
       False Positive Ratio: the number of samples with
        target 0 whose output is greater than or equal to the
        threshold, divided by the number of zero targets
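A minimal sketch of how one ROC curve is computed for a single class from classifier outputs and 0/1 targets, treating a sample as a predicted positive when its output is at least the threshold. The outputs, targets and threshold grid are hypothetical, and the function name is ours.

```python
def roc_points(outputs, targets, thresholds):
    """For each threshold return (FPR, TPR); targets are 0/1 for one class."""
    n_pos = sum(1 for t in targets if t == 1)
    n_neg = len(targets) - n_pos
    points = []
    for th in thresholds:
        tp = sum(1 for o, t in zip(outputs, targets) if o >= th and t == 1)
        fp = sum(1 for o, t in zip(outputs, targets) if o >= th and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


outputs = [0.9, 0.8, 0.4, 0.35, 0.1]
targets = [1, 1, 0, 1, 0]
thresholds = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
for fpr, tpr in roc_points(outputs, targets, thresholds):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

At threshold 0 every sample is called positive, giving the point (1, 1); at threshold 1 none is, giving (0, 0). A good classifier reaches a high TPR while the FPR is still near 0.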
Receiver operating characteristic

   These plots depict the
    receiver      operating
    characteristic for each
    output class. The
    more each curve hugs
    the left and top edges
    of the plot, the better
    the classification.
Weights and biases in the first layer
Neurons      1        2        3        4        5         6        7        8        9
   1      0,359192 -0,94229 -0,17764 -0,04082 0,400314 -0,4872 -0,16613 0,275041 -0,25531

   2      0,887809 0,593089 0,932094 -1,77481 1,246858 -0,53313 -0,15056 -0,28295 -0,00608    -1,03526

   3      -0,32364 -0,96883 0,404728 -0,32628 -0,09994 -0,50305 0,458974 0,645598 -1,11714    1,24571

   4      0,447813 -0,29893 0,406556 0,678079 -0,37813 -1,02435 -1,14411 -0,05205 0,81408     -0,02091

   5      -0,46423 -1,72538 -0,09891 -0,70205 -0,28577 0,704576 0,829762 -0,35652 -0,04445
   6      0,785984 0,215749 -0,79605 -0,8742   -0,2018 0,149792 1,380153 0,967953 0,663471
   7      1,300987 -0,72761 1,17204 0,313455 -0,62038 0,317305 0,929915 -1,13674 -0,21046     1,07551

   8      -1,07973 1,031171 0,357152 -0,1526   0,68631   -0,5282 -0,22145 -0,74771 0,143525
   9      -0,60609 0,272917 -0,49096 0,306454 0,732686 0,494037 0,698083 0,554006 0,358042
Weights and biases in second layer

Neurons      1        2         3         4        5        6        7        8        9
   1      0,882965 1,837766   0,0633   0,099547 0,949198 -0,25449 0,382952 -0,06175 -0,65366

   2      -0,01837 -0,67212 0,203406 0,566755 0,77496 1,235017 0,936339 0,488274 1,107461

   3      0,211355 -0,33971 0,030057 -0,64298 -0,05428 0,815069 -0,29038   0,82432 0,691158    -0,18307

   4      0,891952 -1,50686 -1,66037 -0,92405 0,931156 -0,86609 0,371743 -0,34958 0,383325     -0,33859

   5      0,849883 0,136988 -0,69641 -0,52966 0,018272 -0,51882 -0,16588 0,254206 0,968375     0,954107

   6      0,241114 -0,18111 -0,30394 0,186851 -1,10375 -0,28263 -1,56852 -0,19618 1,182049
Conclusion (1/2)
One major limitation of the statistical models is that
they work well only when the underlying
assumptions are satisfied.

The effectiveness of these methods depends on the
various assumptions or conditions under which the
models are developed.

On the other hand, neural networks are data-driven,
self-adaptive methods in that they can adjust
themselves to the data without any explicit
specification of functional or distributional form for
the underlying model.
Conclusion (2/2)
On this dataset, the neural network appears to be more
accurate than linear discriminant analysis, quadratic
discriminant analysis and naive Bayes.

Although the training-set misclassification error of some
traditional classification methods was near 21%, the
cross-validation error was over 40% for the majority of
them (39.72%-57.01% across all of them). These results
imply that, for this dataset, these methods tend to overfit
the data.

For the neural network, the misclassification error on the
independent test sample was 31.3%, lower than the
cross-validation error of any of the statistical methods.

   M. F. Moller, A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning,
    Neural Networks, Vol. 6, pp. 525-533, 1993
   G. P. Zhang, Neural Networks for Classification: A Survey, IEEE Transactions on
    Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 30, No. 4
   Hagan, Demuth, and Beale, Neural Network Toolbox™ User’s Guide, 2012
   P. M. Murphy, D. W. Aha, UCI Repository of Machine Learning Databases,
    Irvine, CA: University of California, Department of Information and
    Computer Science, 1994
   I. Antoniou, Statistical Models of Networks 1, Master in Web Science, Aristotle
    University of Thessaloniki, 2012
   I. Antoniou, Statistical Models of Networks 2, Master in Web Science, Aristotle
    University of Thessaloniki, 2012

All computations were performed in MATLAB. The following toolboxes were used:
    Neural Network Toolbox™
    Statistics Toolbox™

Thank you for your attention
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Models
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Models
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Models
Mahout part2
Mahout part2Mahout part2
Mahout part2
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
Analysis of Classification Algorithm in Data Mining
Analysis of Classification Algorithm in Data MiningAnalysis of Classification Algorithm in Data Mining
Analysis of Classification Algorithm in Data Mining
5. Machine Learning.pptx
5.  Machine Learning.pptx5.  Machine Learning.pptx
5. Machine Learning.pptx
Deep learning from a novice perspective
Deep learning from a novice perspectiveDeep learning from a novice perspective
Deep learning from a novice perspective

Recently uploaded

unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2

Recently uploaded (20)

unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo

Statistical classification: A review on some techniques

  • 1. Statistical Classification: A review on some techniques Bamparopoulos Giorgos Master in Web Science, Department of Mathematics, Aristotle University of Thessaloniki
  • 2. What is Pattern Recognition? The study of how machines can observe the environment, learn to distinguish patterns of interest from their background, and make sound and reasonable decisions about the categories of the patterns. A pattern is an object, process or event that can be given a name. A pattern class (or category) is a set of patterns sharing common attributes and usually originating from the same source.
  • 3. Definition. In machine learning, pattern recognition is the assignment of a label to a given input value. An example of pattern recognition is classification. However, pattern recognition is a more general problem that encompasses other types of output as well. Other examples are regression, which assigns a real-valued output to each input; sequence labeling, which assigns a class to each member of a sequence of values (for example, part-of-speech tagging, which assigns a part of speech to each word in an input sentence); and parsing, which assigns a parse tree to an input sentence, describing the syntactic structure of the sentence. - Wikipedia
  • 4. Pattern recognition system (diagram). Training mode: training patterns from the training set pass through preprocessing, feature extraction/selection and learning. Classification mode: test patterns from the test set pass through preprocessing, feature measurement and classification.
  • 5. Supervised vs Unsupervised Learning. Supervised learning (classification): the training data are accompanied by labels indicating the class of the observations, and new data are classified based on the training set. Unsupervised learning (clustering): the class labels of the training data are unknown; given a set of measurements or observations, the aim is to establish the existence of classes or clusters in the data.
  • 6. Classification. A classification problem occurs when an object needs to be assigned to a predefined group or class based on a number of observed attributes of that object. The individual observations are analyzed into a set of quantifiable properties, known as explanatory variables or features. These properties may be categorical, ordinal, integer-valued or real-valued. During classification, given objects are assigned to prescribed classes. A classifier is a mathematical function, implemented by a classification algorithm, that maps input data to a category.
  • 7. Application Domains (1/3): computer vision; medical imaging and medical image analysis; optical character recognition; video tracking; drug discovery and development; toxicogenomics; quantitative structure-activity relationship; geostatistics; speech recognition.
  • 8. Application Domains (2/3): handwriting recognition; biometric identification; biological classification; statistical natural language processing; document classification; Internet search engines; credit scoring.
  • 9. Application Domains (3/3) (figure examples): digit recognition; object recognition; automated protein classification; phoneme recognition [Waibel, Hanazawa, Hinton, Shikano, Lang 1989].
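  • 10. Example of Classification 1/2 (figure). Spam filtering is a binary problem (input: a message such as "!!!!$$$!!!!"; output: spam or not spam). Character recognition is a multi-class problem (input: an image of a character; output: its label, e.g. "C"). [Thanks to Ben Taskar for the slide.]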
  • 11. Example of Classification 2/2 (figure). Handwriting recognition (input: an image of a handwritten word; output: the word, e.g. "brace"); 3D object recognition. [Thanks to Ben Taskar for the slide.]
  • 12. Classifiers: neural networks; quadratic classifiers; naive Bayes classifier; kernel estimation and k-nearest neighbor algorithms; decision trees, decision lists; support vector machines; maximum entropy classifier.
  • 13. Linear Classifier (R.A. Fisher, 1890-1962). Figure: Katydids and Grasshoppers plotted on two features (both axes from 1 to 10) and separated by a straight line. If a previously unseen instance lies above the line, its class is Katydid; otherwise its class is Grasshopper. Eamonn Keogh, Professor, Computer Science & Engineering Department, University of California - Riverside
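The "above the line" rule can be sketched in a few lines of Python. The boundary used here (the line y = x, encoded as weights w = (-1, 1) and bias b = 0) is a hypothetical choice, not the line from Keogh's figure; `linear_score` and `classify_insect` are illustrative names.

```python
from typing import Sequence

def linear_score(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Return w . x + b; its sign tells which side of the boundary x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify_insect(x: Sequence[float],
                    w: Sequence[float] = (-1.0, 1.0),
                    b: float = 0.0) -> str:
    """Katydid if x lies above the line, Grasshopper otherwise.

    With the hypothetical defaults w = (-1, 1), b = 0 the boundary is the
    line y = x, so a point is "above" it when y - x > 0.
    """
    return "Katydid" if linear_score(x, w, b) > 0 else "Grasshopper"

print(classify_insect((3.0, 7.0)))  # Katydid (above y = x)
print(classify_insect((8.0, 2.0)))  # Grasshopper (below y = x)
```

Because `linear_score` takes vectors of any length, the same test works unchanged with n features, where the boundary becomes a hyperplane.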
  • 14. Higher Dimensional Spaces … we can visualize it as being an (n-1)-dimensional hyperplane in an n-dimensional feature space. Eamonn Keogh, Professor, Computer Science & Engineering Department, University of California - Riverside
  • 15. If we did not have the 3rd dimension… Eamonn Keogh, Professor Computer Science & Engineering Department, University of California - Riverside
  • 16. We can no longer get perfect accuracy with the simple linear classifier. We could try to solve this problem by using a simple quadratic or cubic classifier. Eamonn Keogh, Professor, Computer Science & Engineering Department, University of California - Riverside
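  • 20. Linear Discriminant Analysis. LDA assumes that the class-conditional densities are (multivariate) Gaussian and that every class has the same covariance matrix. A sample is assigned to the class whose discriminant function is largest for that sample. Classification rule: $\hat{G}(x) = \arg\max_k \delta_k(x)$, with $\delta_k(x) = x^T \hat{\Sigma}^{-1} \hat{\mu}_k - \tfrac{1}{2} \hat{\mu}_k^T \hat{\Sigma}^{-1} \hat{\mu}_k + \log \hat{\pi}_k$, where $\hat{\mu}_k$ is the mean and $\hat{\pi}_k$ the proportion of class $k$. Pooled covariance matrix: $\hat{\Sigma} = \sum_{k=1}^{K} \sum_{g_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T / (N - K)$, where $N$ is the number of samples, $K$ the number of classes and $g_i$ the class of sample $x_i$.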
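The slides used MATLAB's Statistics Toolbox; the following is not that code but a minimal NumPy sketch of the LDA rule above, assuming numeric feature rows and at least one sample per class. `lda_fit` and `lda_predict` are hypothetical names.

```python
import numpy as np

def lda_fit(X, y):
    """Estimate class means, priors and the pooled covariance (divisor N - K)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)                      # sorted class labels
    N, K = X.shape[0], classes.size
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    priors = np.array([np.mean(y == k) for k in classes])
    # Subtract each sample's own class mean, then pool the scatter over classes.
    centered = X - means[np.searchsorted(classes, y)]
    sigma = centered.T @ centered / (N - K)
    return classes, means, priors, sigma

def lda_predict(X, classes, means, priors, sigma):
    """Assign each row of X to the class with the largest delta_k(x)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    sigma_inv = np.linalg.inv(sigma)
    # x^T S^-1 mu_k for every sample and class: shape (n_samples, K)
    linear_term = X @ sigma_inv @ means.T
    # -1/2 mu_k^T S^-1 mu_k + log pi_k: one constant per class
    const = -0.5 * np.sum((means @ sigma_inv) * means, axis=1) + np.log(priors)
    return classes[np.argmax(linear_term + const, axis=1)]

X = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [6, 5], [5, 6], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = lda_fit(X, y)
print(lda_predict([[0.2, 0.3], [5.5, 5.8]], *model))
```

Only the linear term depends on x, which is why the boundary between any two classes is a hyperplane.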
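  • 21. Quadratic Discriminant Analysis. The class-conditional probability densities are allowed to have different covariance matrices, so the class decision boundaries are not linear but quadratic. Classification rule: $\hat{G}(x) = \arg\max_k \delta_k(x)$, with $\delta_k(x) = -\tfrac{1}{2} \log |\hat{\Sigma}_k| - \tfrac{1}{2} (x - \hat{\mu}_k)^T \hat{\Sigma}_k^{-1} (x - \hat{\mu}_k) + \log \hat{\pi}_k$, where $\hat{\Sigma}_k$ is the covariance matrix of class $k$.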
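A minimal NumPy sketch of the QDA rule above, assuming at least two features and enough samples per class for a non-singular covariance. `qda_fit` and `qda_predict` are hypothetical names, not MATLAB's `classify`. The explicit check shows why a singular class covariance, such as the one reported on the glass data, breaks QDA.

```python
import numpy as np

def qda_fit(X, y):
    """Per-class mean, covariance and prior (class proportion)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    params = []
    for k in classes:
        Xk = X[y == k]
        mu = Xk.mean(axis=0)
        sigma = np.cov(Xk, rowvar=False)        # unbiased per-class covariance
        params.append((mu, sigma, Xk.shape[0] / X.shape[0]))
    return classes, params

def qda_predict(X, classes, params):
    """Assign each row of X to the class with the largest delta_k(x)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    scores = []
    for mu, sigma, prior in params:
        sign, logdet = np.linalg.slogdet(sigma)
        if sign <= 0:
            # A singular covariance (e.g. a feature constant within a class) has
            # determinant 0, so log|Sigma_k| and Sigma_k^-1 are undefined.
            raise np.linalg.LinAlgError("class covariance is not positive definite")
        diff = X - mu
        mahalanobis = np.sum((diff @ np.linalg.inv(sigma)) * diff, axis=1)
        scores.append(-0.5 * logdet - 0.5 * mahalanobis + np.log(prior))
    return classes[np.argmax(np.column_stack(scores), axis=1)]

X = [[0, 0], [1, 0], [0, 1], [1, 1], [4, 4], [8, 4], [4, 8], [8, 8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
classes, params = qda_fit(X, y)
print(qda_predict([[0.5, 0.5], [6, 6]], classes, params))
```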
  • 22. Problems in Learning. Dimensionality: the number of features is too large relative to the number of training samples. Classifier complexity: the number of unknown parameters associated with the classifier is large. Overtraining: the classifier is too intensively optimized on the training set.
  • 23. Dimensionality Reduction. Feature extraction: create new features based on the original feature set (transforms are usually involved). Feature selection: select the best subset from a given feature set.
  • 24. Overfitting and underfitting (figure panels: underfitting, good fit, overfitting)
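  • 25. Kernel Density Estimation. Kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable. Let $(x_1, x_2, \dots, x_n)$ be a sample drawn from some distribution with an unknown density $f$. Its kernel density estimator is $\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$, where $K(\cdot)$ is a kernel function that integrates to one and $h > 0$ is a smoothing parameter called the bandwidth. Some kernel functions: normal, $K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$; triangle, $K(u) = 1 - |u|$ for $|u| \le 1$ and 0 otherwise; Epanechnikov, $K(u) = \frac{3}{4}(1 - u^2)$ for $|u| \le 1$ and 0 otherwise.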
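The estimator and the three kernels above translate directly into Python. This is a minimal one-dimensional sketch with a hand-picked bandwidth; the sample values are hypothetical, and bandwidth selection (which MATLAB does automatically) is not shown.

```python
import math
from typing import Callable, Sequence

def gaussian(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def triangle(u: float) -> float:
    return max(0.0, 1.0 - abs(u))

def epanechnikov(u: float) -> float:
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kde(x: float, sample: Sequence[float], h: float,
        kernel: Callable[[float], float] = gaussian) -> float:
    """f_h(x) = 1/(n h) * sum_i K((x - x_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    return sum(kernel((x - xi) / h) for xi in sample) / (len(sample) * h)

sample = [1.2, 1.9, 2.1, 3.4]           # hypothetical observations
for k in (gaussian, triangle, epanechnikov):
    print(k.__name__, kde(2.0, sample, h=0.5, kernel=k))
```

Each kernel integrates to one, so the estimate does too; a larger h gives a smoother but more biased density.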
  • 26. Glass Identification dataset. This data is available from the UCI Machine Learning Repository: Murphy, P.M., Aha, D.W. (1994). UCI Repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science. Origin: USA Forensic Science Service. Sources: Creator: B. German, Central Research Establishment, Home Office Forensic Science Service, Aldermaston, Reading, Berkshire RG7 4PN. Donor: Vina Spiehler, Ph.D., DABFT, Diagnostic Products Corporation, (213) 776-0180 (ext 3014). Date: September 1987.
  • 27. Glass Identification dataset. The dataset has 214 instances and 6 types of glass, defined in terms of their oxide content (e.g. Na, Fe, K). The study of classification of types of glass was motivated by criminological investigation: glass left at the scene of a crime can be used as evidence, if it is correctly identified. Attribute information (besides an id number): 1 RI: refractive index; 2 Na: sodium; 3 Mg: magnesium; 4 Al: aluminum; 5 Si: silicon; 6 K: potassium; 7 Ca: calcium; 8 Ba: barium; 9 Fe: iron (unit of measurement for attributes 2-9: weight percent in the corresponding oxide). Type of glass (class attribute): 1 building windows float processed; 2 building windows non float processed; 3 vehicle windows float processed; 4 containers; 5 tableware; 6 headlamps.
  • 28. Scatter plots of features. The scatter plots depict the relationship between features, grouped by type of glass. X-axis: refractive index, sodium, magnesium. Y-axis: iron, barium, calcium.
  • 29. Scatter plots of features. X-axis: refractive index, sodium, magnesium. Y-axis: potassium, silicon, aluminum.
  • 30. Scatter plots of features. X-axis: aluminum, silicon, potassium. Y-axis: iron, barium, calcium.
  • 31. Scatter plot and histograms
  • 32. Linear Discriminant Analysis. We first classify the data using the default linear discriminant analysis (LDA). In the scatter plot, misclassified observations are marked with an X.
  • 33. Confusion Matrix. A confusion matrix contains information about known and predicted class labels. The (i,j) element is the number of samples whose predicted class is i and whose known class is j; the diagonal elements represent correctly classified observations. LDA confusion matrix on the training set (rows: predicted class, columns: known class):
      46  16   3   0   1   0
      14  41   3   2   1   1
      10  12  11   0   0   1
       0   4   0  10   0   2
       0   3   0   0   7   1
       0   0   0   1   0  24
    The misclassification error (the proportion of misclassified observations) on the training set is 35.05%.
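The misclassification error follows directly from the matrix: it is the share of counts off the diagonal. The sketch below recomputes it from the LDA matrix on this slide and, since columns are known classes, also recovers the class sizes as column sums. The function names are illustrative.

```python
def misclassification_error(confusion):
    """Share of off-diagonal counts: (total - correct) / total."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total

def known_class_sizes(confusion):
    """Column sums, because rows are predicted classes and columns known classes."""
    return [sum(col) for col in zip(*confusion)]

lda_train = [
    [46, 16,  3,  0, 1,  0],
    [14, 41,  3,  2, 1,  1],
    [10, 12, 11,  0, 0,  1],
    [ 0,  4,  0, 10, 0,  2],
    [ 0,  3,  0,  0, 7,  1],
    [ 0,  0,  0,  1, 0, 24],
]
print(f"{100 * misclassification_error(lda_train):.2f}%")  # 35.05%
print(known_class_sizes(lda_train))                       # [70, 76, 17, 13, 9, 29]
```

139 of the 214 samples lie on the diagonal, so 75/214 are misclassified.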
  • 34. Visualization of the regions. The classification function has separated the plane into regions divided by lines and assigned different regions to different glass types. One way to visualize these regions is to create a grid of (x,y) values and apply the classification function to that grid.
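The grid idea can be sketched in plain Python: sample evenly spaced (x, y) points and record the predicted label at each. The two-region `toy` classifier is a hypothetical stand-in for a fitted model; in practice each label would be drawn as a colored cell.

```python
from typing import Callable, Hashable, List, Tuple

def classify_grid(classify: Callable[[float, float], Hashable],
                  x_range: Tuple[float, float],
                  y_range: Tuple[float, float],
                  steps: int) -> List[List[Hashable]]:
    """Evaluate classify(x, y) on a steps x steps grid; row j holds y = ys[j]."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    (x0, x1), (y0, y1) = x_range, y_range
    xs = [x0 + (x1 - x0) * i / (steps - 1) for i in range(steps)]
    ys = [y0 + (y1 - y0) * j / (steps - 1) for j in range(steps)]
    return [[classify(x, y) for x in xs] for y in ys]

# Hypothetical two-region classifier standing in for a fitted model.
toy = lambda x, y: "A" if x + y < 1.0 else "B"
for row in reversed(classify_grid(toy, (0.0, 1.0), (0.0, 1.0), 5)):
    print("".join(row))    # top row printed first, as on a plot
```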
  • 35. Generalization error. The generalization error is the expected prediction error on an independent test set. Cross-validation is a statistical method for estimating the generalization error of classification algorithms. In k-fold cross-validation, the data are first partitioned into k equally (or nearly equally) sized segments, or folds. Then k iterations of training and validation are performed such that in each iteration a different fold is held out for validation while the remaining k-1 folds are used for learning. Here we use 10-fold cross-validation. The LDA cross-validation error is 40.19%.
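A minimal sketch of the k-fold procedure described above. It uses a plain shuffled split (no stratification, which MATLAB may apply), and `fit` / `predict` are hypothetical callables standing in for any classifier.

```python
import random
from typing import Callable, List, Sequence

def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)   # first n % k folds get one extra
        folds.append(idx[start:start + size])
        start += size
    return folds

def cross_validation_error(X: Sequence, y: Sequence, fit: Callable, predict: Callable,
                           k: int = 10, seed: int = 0) -> float:
    """Fraction of samples misclassified when each fold is predicted by a
    model trained on the other k - 1 folds."""
    errors = 0
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        predictions = predict(model, [X[i] for i in fold])
        errors += sum(p != y[i] for p, i in zip(predictions, fold))
    return errors / len(y)

folds = k_fold_indices(214, 10)       # 214 glass samples -> four folds of 22, six of 21
print([len(f) for f in folds])
```

Every sample is predicted exactly once by a model that never saw it, which is why this error is a fairer estimate than the training error.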
  • 36. Quadratic discriminant analysis. The covariance matrix of some classes in the training set was not positive definite, so we applied quadratic discriminant analysis (QDA) without features 6, 8 and 9. Confusion matrix (rows: predicted class, columns: known class):
      54  42   1   0   0   0
       6  26   0   0   1   0
      10   7  16   0   0   0
       0   1   0   9   0   0
       0   0   0   0  12   0
       0   0   0   0   0  29
    The misclassification error is 31.78% and the cross-validation error is 42.52%.
  • 37. Mean of variables. The table depicts the mean of each variable (feature) for each glass type separately; the plot depicts the means in three dimensions. Columns: RI = refractive index, Na = sodium, Mg = magnesium, Al = aluminum, Si = silicon, K = potassium, Ca = calcium, Ba = barium, Fe = iron.
      building windows float: RI 1.518718286; Na 13.24228571; Mg 3.552428571; Al 1.163857143; Si 72.61914286; K 0.447428571; Ca 8.797285714; Ba 0.012714; Fe 0.057
      building windows non float: RI 1.518618553; Na 13.11171053; Mg 3.002105263; Al 1.408157895; Si 72.59802632; K 0.521052632; Ca 9.073684211; Ba 0.050263; Fe 0.079737
      vehicle windows float: RI 1.517963529; Na 13.43705882; Mg 3.543529412; Al 1.201176471; Si 72.40470588; K 0.406470588; Ca 8.782941176; Ba 0.008824; Fe 0.057059
      containers: RI 1.518927692; Na 12.82769231; Mg 0.773846154; Al 2.033846154; Si 72.36615385; K 1.47; Ca 10.12384615; Ba 0.187692; Fe 0.060769
      tableware: RI 1.517455556; Na 14.64666667; Mg 1.305555556; Al 1.366666667; Si 73.20666667; K 0; Ca 9.356666667; Ba 0; Fe 0
      headlamps: RI 1.517116207; Na 14.44206897; Mg 0.538275862; Al 2.122758621; Si 72.96586207; K 0.325172414; Ca 8.49137931; Ba 1.04; Fe 0.013448
  • 38. Variance of variables. The table depicts the variance of each variable (feature) for each glass type separately; the plot depicts them in three dimensions. Columns as on the previous slide.
      building windows float: RI 5.14E-06; Na 0.249302; Mg 0.06103; Al 0.074615; Si 0.324312; K 0.046173; Ca 0.330403; Ba 0.007029; Fe 0.007934
      building windows non float: RI 1.45E-05; Na 0.441108; Mg 1.477833; Al 0.101341; Si 0.525005; K 0.045679; Ca 3.692682; Ba 0.131291; Fe 0.011328
      vehicle windows float: RI 3.67E-06; Na 0.256935; Mg 0.026499; Al 0.120749; Si 0.262426; K 0.052849; Ca 0.144485; Ba 0.001324; Fe 0.011635
      containers: RI 1.12E-05; Na 0.603786; Mg 0.998292; Al 0.481526; Si 1.644342; K 4.574017; Ca 4.768942; Ba 0.369969; Fe 0.024208
      tableware: RI 9.71E-06; Na 1.1751; Mg 1.203703; Al 0.327025; Si 1.16525; K 0; Ca 2.10235; Ba 0; Fe 0
      headlamps: RI 6.48E-06; Na 0.471088; Mg 1.249215; Al 0.196006; Si 0.884039; K 0.446883; Ca 0.947712; Ba 0.442679; Fe 0.000888
  • 39. Naive Bayes Classifier. We use the Gaussian distribution for features 1, 2, 3, 4, 5 and 7 and kernel density estimation for features 6, 8 and 9. Confusion matrix (rows: predicted class, columns: known class):
      12   2   0   0   5   3
      50  56   7   1   0   0
       5  12  10   0   0   0
       1   0   0  28   1   4
       7   0   0   0   7   0
       1   0   0   0   0   2
    The misclassification error is 46.26% and the cross-validation error is 57.01%. If we assume that the prior probabilities are equal for all classes, the errors are 60.75% and 64.49% respectively.
  • 40. Naive Bayes Classifier. Using kernel density estimation with the normal kernel function, we have the following confusion matrix (rows: predicted class, columns: known class):
      60  17   3   0   0   0
       4  44   0   0   1   0
       6  11  14   0   0   1
       0   1   0   9   0   0
       0   3   0   0  12   0
       0   0   0   0   0  28
    The misclassification error is 21.96% and the cross-validation error is 39.72%. If we assume that the prior probabilities are equal for all classes, the errors are 29.44% and 45.33% respectively.
  • 41. Naive Bayes Classifier. Using kernel density estimation with the triangle kernel function, we have the following confusion matrix (rows: predicted class, columns: known class):
      61  16   2   0   0   0
       4  44   0   0   1   0
       5  12  15   0   0   1
       0   1   0   9   0   0
       0   3   0   0  12   0
       0   0   0   0   0  28
    The misclassification error is 21.03% and the cross-validation error is 42.06%. If we assume that the prior probabilities are equal for all classes, the errors are 29.44% and 45.33% respectively.
  • 42. Naive Bayes Classifier. Using kernel density estimation with the Epanechnikov kernel function, we have the following confusion matrix (rows: predicted class, columns: known class):
      61  16   3   0   0   0
       4  44   0   1   0   0
       5  12  14   0   0   1
       0   4   0  12   0   0
       0   0   0   0   9   0
       0   0   0   0   0  28
    The misclassification error is 21.5% and the cross-validation error is 42.52%. If we assume that the prior probabilities are equal for all classes, the errors are 30.37% and 44.39% respectively.
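The naive Bayes variants on slides 39-42 share one structure: a class prior times a product of per-feature densities, each either a fitted Gaussian or a kernel density estimate. This is a minimal sketch, not the MATLAB code. It assumes a Gaussian kernel with a fixed, hypothetical bandwidth h (MATLAB selects it automatically), 0-indexed features, at least two samples per class, and non-zero variance for every Gaussian feature. `nb_fit` and `nb_predict` are hypothetical names.

```python
import math
from collections import defaultdict

def gauss_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def kde_pdf(x, sample, h):
    """Gaussian-kernel density estimate with bandwidth h."""
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / (
        len(sample) * h * math.sqrt(2 * math.pi))

def nb_fit(X, y, kde_features=frozenset(), h=0.5, equal_priors=False):
    """Fit per-class, per-feature densities; features are 0-indexed here."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        densities = []
        for j, col in enumerate(zip(*rows)):
            if j in kde_features:
                densities.append(("kde", list(col)))
            else:
                mu = sum(col) / len(col)
                var = sum((v - mu) ** 2 for v in col) / (len(col) - 1)
                densities.append(("gauss", (mu, var)))   # requires var > 0
        prior = 1 / len(by_class) if equal_priors else len(rows) / len(y)
        model[label] = (prior, densities)
    return model, h

def nb_predict(fitted, x):
    """Class maximizing log prior + sum of per-feature log densities."""
    model, h = fitted
    best, best_score = None, -math.inf
    for label, (prior, densities) in model.items():
        score = math.log(prior)
        for xj, (kind, params) in zip(x, densities):
            pdf = gauss_pdf(xj, *params) if kind == "gauss" else kde_pdf(xj, params, h)
            score += math.log(max(pdf, 1e-300))       # avoid log(0)
        if score > best_score:
            best, best_score = label, score
    return best

X = [[0.0, 1.0], [0.2, 1.1], [0.1, 0.9], [5.0, 3.0], [5.2, 3.1], [4.9, 2.9]]
y = ["a", "a", "a", "b", "b", "b"]
fitted = nb_fit(X, y, kde_features={1})
print(nb_predict(fitted, [0.1, 1.0]), nb_predict(fitted, [5.0, 3.0]))
```

The variance requirement is one reason a kernel estimate suits features such as K, Ba and Fe, which are constant (zero) within the tableware class. The triangle and Epanechnikov variants would only swap the kernel inside `kde_pdf`.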
  • 43. Neural Networks. Biological neural networks are made up of real biological neurons that are connected or functionally related in a nervous system. In the field of neuroscience, they are often identified as groups of neurons that perform a specific physiological function in laboratory analysis.
  • 44. Artificial Neural Network. An artificial neural network (ANN), usually called a neural network (NN), is a mathematical or computational model that is inspired by the structure and/or functional aspects of biological neural networks. A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation. - Wikipedia
  • 45. Multiple Layers of Neurons. A network can have several layers. Each layer has a weight matrix W, a bias vector b, and an output vector a. (Neural Network Toolbox™ User’s Guide, Mark Hudson Beale, Martin T. Hagan, Howard B. Demuth)
  • 46. Neural network for Classification. For this classification problem we use a two-layer feed-forward network with 10 neurons in the hidden layer.
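Each layer computes a = f(Wp + b), and the layers are chained. The sketch below runs that forward pass for the 9-input, 10-hidden-neuron network described above. The 6 output neurons (one per glass type), the tanh transfer function and the uniform random weights are assumptions for illustration only, not the trained weights reported later.

```python
import math
import random

def layer(W, b, p, f=math.tanh):
    """One layer: a = f(W p + b); W holds one row of weights per neuron."""
    return [f(sum(w_ij * p_j for w_ij, p_j in zip(row, p)) + b_i)
            for row, b_i in zip(W, b)]

def forward(params, p):
    """Feed the input through each (W, b) layer in turn."""
    a = p
    for W, b in params:
        a = layer(W, b, a)
    return a

def random_params(sizes, seed=0):
    """Placeholder uniform(-0.5, 0.5) initialization (not Nguyen-Widrow)."""
    rng = random.Random(seed)
    return [([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
             [rng.uniform(-0.5, 0.5) for _ in range(n_out)])
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

# 9 glass features -> 10 hidden neurons -> 6 outputs (one per glass type, assumed)
params = random_params([9, 10, 6])
outputs = forward(params, [0.0] * 9)
predicted_class = max(range(len(outputs)), key=outputs.__getitem__)
print(outputs, predicted_class)
```

The predicted class is taken as the output neuron with the largest activation.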
  • 47. Transfer functions. The hyperbolic tangent sigmoid transfer function, $\mathrm{tansig}(n) = \frac{2}{1 + e^{-2n}} - 1$, is used in both the hidden and the output neurons. It is mathematically equivalent to tanh(n); it differs in that it runs faster than the MATLAB implementation of tanh, but the results can have very small numerical differences.
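The claimed equivalence can be checked numerically. This Python version of the tansig formula is an illustration, not MATLAB's implementation; it evaluates negative inputs through odd symmetry so that exp(-2n) cannot overflow.

```python
import math

def tansig(n: float) -> float:
    """tansig(n) = 2 / (1 + exp(-2n)) - 1, which equals tanh(n)."""
    if n < 0:
        # tansig is odd; evaluating at -n avoids overflow of exp(-2n) for large negative n
        return -tansig(-n)
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0

for n in (-2.0, 0.0, 0.5, 3.0):
    print(n, tansig(n), math.tanh(n))
```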
  • 48. Initializing weights and bias. We initialized the weights and biases in each layer with the initnw function from the Neural Network Toolbox in Matlab. This function initializes a layer's weights and biases according to the Nguyen-Widrow initialization algorithm, which chooses values so that the active region of each neuron in the layer is distributed approximately evenly across the layer's input space. The values contain a degree of randomness, so they are not the same each time the function is called.
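For intuition, here is the classic textbook form of Nguyen-Widrow initialization for one hidden layer. It is not a reimplementation of MATLAB's `initnw`, which also accounts for the actual input ranges and the transfer function's active region; this sketch assumes inputs scaled to roughly [-1, 1], and `nguyen_widrow` is a hypothetical name.

```python
import math
import random

def nguyen_widrow(n_inputs, n_hidden, seed=None):
    """Classic Nguyen-Widrow initialization for one hidden layer.

    Each neuron's weight vector is rescaled to length
    beta = 0.7 * n_hidden ** (1 / n_inputs) and its bias is drawn uniformly
    from [-beta, beta], spreading the neurons' active regions over the inputs.
    """
    rng = random.Random(seed)
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)
    W = []
    for _ in range(n_hidden):
        row = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        norm = math.sqrt(sum(w * w for w in row))
        W.append([beta * w / norm for w in row])   # rescale to ||w_i|| = beta
    b = [rng.uniform(-beta, beta) for _ in range(n_hidden)]
    return W, b

W, b = nguyen_widrow(n_inputs=9, n_hidden=10, seed=1)
```

Passing a different seed (or none) gives different values on each call, matching the randomness noted on the slide.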
  • 50. Training function. We trained the network using scaled conjugate gradient backpropagation. The scaled conjugate gradient algorithm is based on conjugate directions, but it does not perform a line search at each iteration (Moller, Neural Networks, Vol. 6, 1993, pp. 525-533).
  • 51. Error Backpropagation Algorithm. The gradients are computed through a backpropagation process. Thanks to Sargur Srihari for the slide.
  • 52. Error Backpropagation Algorithm. Thanks to Sargur Srihari for the slide.
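Srihari's figures on slides 51-52 did not survive extraction. As a substitute, here is a minimal sketch of backpropagation for a two-layer tanh network under the squared error E = ½ Σ (a₂ − t)² for one training pair. The error function and the toy weights in the check are assumptions, not the toolbox's MSE setup. The deltas flow from the output layer back through W2, and the bias gradients equal the deltas.

```python
import math

def forward(W1, b1, W2, b2, x):
    """Return hidden and output activations of a two-layer tanh network."""
    a1 = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    a2 = [math.tanh(sum(w * aj for w, aj in zip(row, a1)) + b) for row, b in zip(W2, b2)]
    return a1, a2

def loss(W1, b1, W2, b2, x, t):
    """E = 1/2 * sum_k (a2_k - t_k)^2 for a single training pair (x, t)."""
    _, a2 = forward(W1, b1, W2, b2, x)
    return 0.5 * sum((a - tk) ** 2 for a, tk in zip(a2, t))

def backprop(W1, b1, W2, b2, x, t):
    """Gradients of E with respect to W1, b1, W2, b2 via the chain rule."""
    a1, a2 = forward(W1, b1, W2, b2, x)
    # Output deltas dE/dn2_k = (a2_k - t_k) * tanh'(n2_k), with tanh' = 1 - tanh^2
    d2 = [(a - tk) * (1.0 - a * a) for a, tk in zip(a2, t)]
    # Hidden deltas: propagate the output deltas back through W2
    d1 = [(1.0 - a1[j] ** 2) * sum(d2[k] * W2[k][j] for k in range(len(d2)))
          for j in range(len(a1))]
    grad_W2 = [[dk * aj for aj in a1] for dk in d2]
    grad_W1 = [[dj * xi for xi in x] for dj in d1]
    return grad_W1, d1, grad_W2, d2    # bias gradients equal the deltas
```

A finite-difference comparison, nudging one weight by ±ε and differencing `loss`, is a quick way to confirm such gradients before handing them to an optimizer like scaled conjugate gradient.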
  • 53. Confusion Matrices. We observe that the misclassification error is around 31%-32% for the training, validation and test samples. The total misclassification error is 31.8%.
  • 54. Performance function. The plot below depicts the mean squared error over 31 epochs (training iterations).
  • 55. Training state. The plot below depicts the gradient and the validation failures (consecutive epochs in which the validation performance did not improve).
  • 56. Receiver operating characteristic  The receiver operating characteristic (roc) is a metric used to check the quality of classifiers.  For each class of a classifier, roc applies threshold values across the interval [0,1] to outputs.  For each threshold, two values are calculated:  True Positive Ratio (the number of outputs greater or equal to the threshold, divided by the number of one targets)  False Positive Ratio (the number of outputs less than the threshold, divided by the number of zero targets)
  • 57. Receiver operating characteristic  These plots depict the receiver operating characteristic for each output class. The more each curve hugs the left and top edges of the plot, the better the classification.
  • 58. Weights and biases in the first layer Biases Neurons 1 2 3 4 5 6 7 8 9 -2,09671 1 0,359192 -0,94229 -0,17764 -0,04082 0,400314 -0,4872 -0,16613 0,275041 -0,25531 2 0,887809 0,593089 0,932094 -1,77481 1,246858 -0,53313 -0,15056 -0,28295 -0,00608 -1,03526 3 -0,32364 -0,96883 0,404728 -0,32628 -0,09994 -0,50305 0,458974 0,645598 -1,11714 1,24571 4 0,447813 -0,29893 0,406556 0,678079 -0,37813 -1,02435 -1,14411 -0,05205 0,81408 -0,02091 5 -0,46423 -1,72538 -0,09891 -0,70205 -0,28577 0,704576 0,829762 -0,35652 -0,04445 0,669005 6 0,785984 0,215749 -0,79605 -0,8742 -0,2018 0,149792 1,380153 0,967953 0,663471 -0,59566 7 1,300987 -0,72761 1,17204 0,313455 -0,62038 0,317305 0,929915 -1,13674 -0,21046 1,07551 8 -1,07973 1,031171 0,357152 -0,1526 0,68631 -0,5282 -0,22145 -0,74771 0,143525 -1,36921 9 -0,60609 0,272917 -0,49096 0,306454 0,732686 0,494037 0,698083 0,554006 0,358042 -2,36606
  • 59. Weights and biases in second layer Neurons 1 2 3 4 5 6 7 8 9 Biases 1 0,882965 1,837766 0,0633 0,099547 0,949198 -0,25449 0,382952 -0,06175 -0,65366 -1,92628 2 -0,01837 -0,67212 0,203406 0,566755 0,77496 1,235017 0,936339 0,488274 1,107461 0,387852 3 0,211355 -0,33971 0,030057 -0,64298 -0,05428 0,815069 -0,29038 0,82432 0,691158 -0,18307 4 0,891952 -1,50686 -1,66037 -0,92405 0,931156 -0,86609 0,371743 -0,34958 0,383325 -0,33859 5 0,849883 0,136988 -0,69641 -0,52966 0,018272 -0,51882 -0,16588 0,254206 0,968375 0,954107 1,219512 6 0,241114 -0,18111 -0,30394 0,186851 -1,10375 -0,28263 -1,56852 -0,19618 1,182049
  • 60. Conclusion (1/2). One major limitation of statistical models is that they work well only when the underlying assumptions are satisfied: the effectiveness of these methods depends on the assumptions or conditions under which the models are developed. Neural networks, on the other hand, are data-driven, self-adaptive methods: they can adjust themselves to the data without any explicit specification of the functional or distributional form of the underlying model.
  • 61. Conclusion (2/2). On this dataset, neural networks seem to be significantly more accurate than linear discriminant analysis, quadratic discriminant analysis and naive Bayes. Although the training misclassification error of some traditional classification methods was near 21%, the cross-validation error was over 40% for most of them. These results imply that, for this dataset, these methods tend to overfit the data. For the neural network, the misclassification error on the independent test sample was 31.3%.
  • 62. References
      M. F. Moller, A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning, Neural Networks, Vol. 6, pp. 525-533, 1993.
      G. P. Zhang, Neural Networks for Classification: A Survey, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 30, No. 4, 2000.
      Hagan, Demuth, and Beale, Neural Network Toolbox™ User’s Guide, 2012.
      Murphy, P.M., Aha, D.W. (1994). UCI Repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science.
    Lectures:
      I. Antoniou, Statistical Models of Networks 1, Master in Web Science, Aristotle University of Thessaloniki, 2012.
      I. Antoniou, Statistical Models of Networks 2, Master in Web Science, Aristotle University of Thessaloniki, 2012.
    All computations were performed in Matlab, using the Neural Network Toolbox™ and the Statistics Toolbox™.
  • 63. Questions? Thank you for your attention!