SlideShare a Scribd company logo
1 of 30
Introduction to Classification
Machine Learning and Data Mining (Unit 10)




Prof. Pier Luca Lanzi
References                                    2



  Jiawei Han and Micheline Kamber, "Data Mining, : Concepts
  and Techniques", The Morgan Kaufmann Series in Data
  Management Systems (Second Edition).
  Tom M. Mitchell. “Machine Learning” McGraw Hill 1997.
  Pang-Ning Tan, Michael Steinbach, Vipin Kumar,
  “Introduction to Data Mining”, Addison Wesley.




                  Prof. Pier Luca Lanzi
What is an apple?                           3




                    Prof. Pier Luca Lanzi
Are these apples?                           4




                    Prof. Pier Luca Lanzi
Supervised vs. Unsupervised Learning           5



  Unsupervised learning (clustering)
    The class labels of training data is unknown
    Given a set of measurements, observations, etc. with the
    aim of establishing the existence of classes or clusters in
    the data
  Supervised learning (classification)
    Supervision: The training data (observations,
    measurements, etc.) are accompanied by labels indicating
    the class of the observations
    New data is classified based on the training set




                  Prof. Pier Luca Lanzi
The contact lenses data                                                   6


     Age         Spectacle prescription      Astigmatism   Tear production rate   Recommended
                                                                                      lenses
    Young              Myope                      No            Reduced                None
    Young              Myope                      No             Normal                Soft
    Young              Myope                      Yes           Reduced                None
    Young              Myope                      Yes            Normal                Hard
    Young           Hypermetrope                  No            Reduced                None
    Young           Hypermetrope                  No             Normal                Soft
    Young           Hypermetrope                  Yes           Reduced                None
    Young           Hypermetrope                  Yes            Normal                hard
Pre-presbyopic         Myope                      No            Reduced                None
Pre-presbyopic         Myope                      No             Normal                Soft
Pre-presbyopic         Myope                      Yes           Reduced                None
Pre-presbyopic         Myope                      Yes            Normal                Hard
Pre-presbyopic      Hypermetrope                  No            Reduced                None
Pre-presbyopic      Hypermetrope                  No             Normal                Soft
Pre-presbyopic      Hypermetrope                  Yes           Reduced                None
Pre-presbyopic      Hypermetrope                  Yes            Normal                None
  Presbyopic           Myope                      No            Reduced                None
  Presbyopic           Myope                      No             Normal                None
  Presbyopic           Myope                      Yes           Reduced                None
  Presbyopic           Myope                      Yes            Normal                Hard
  Presbyopic        Hypermetrope                  No            Reduced                None
  Presbyopic        Hypermetrope                  No             Normal                Soft
  Presbyopic        Hypermetrope                  Yes           Reduced                None
  Presbyopic        Hypermetrope                  Yes            Normal                None
                                Prof. Pier Luca Lanzi
A model extracted from                               7
the contact lenses data

  If tear production rate = reduced then recommendation = none
  If age = young and astigmatic = no
     and tear production rate = normal then recommendation = soft
  If age = pre-presbyopic and astigmatic = no
     and tear production rate = normal then recommendation = soft
  If age = presbyopic and spectacle prescription = myope
     and astigmatic = no then recommendation = none
  If spectacle prescription = hypermetrope and astigmatic = no
     and tear production rate = normal then recommendation = soft
  If spectacle prescription = myope and astigmatic = yes
     and tear production rate = normal then recommendation = hard
  If age young and astigmatic = yes
     and tear production rate = normal then recommendation = hard
  If age = pre-presbyopic
     and spectacle prescription = hypermetrope
     and astigmatic = yes then recommendation = none
  If age = presbyopic and spectacle prescription = hypermetrope
     and astigmatic = yes then recommendation = none




                     Prof. Pier Luca Lanzi
Predicting CPU performance                                              8



      209 different computer configurations

       Cycle time   Main memory         Cache            Channels           Performance
          (ns)          (Kb)             (Kb)
         MYCT       MMIN    MMAX         CACH      CHMIN      CHMAX            PRP
  1       125       256      6000         256       16          128            198
  2       29        8000    32000          32        8          32             269
 …
208       480       512      8000          32        0              0           67
209       480       1000     4000          0         0              0           45




      Amodel to predict the performance
      PRP = -55.9 + 0.0489 MYCT + 0.0153 MMIN + 0.0056 MMAX
            + 0.6410 CACH - 0.2700 CHMIN + 1.480 CHMAX


                           Prof. Pier Luca Lanzi
Classification vs. Prediction                     9



  Classification
     predicts categorical class labels (discrete or nominal)
     classifies data (constructs a model) based on the training
     set and the values (class labels) in a classifying attribute
     and uses it in classifying new data

  Prediction
     models continuous-valued functions, i.e., predicts
     unknown or missing values

  Applications
    Credit approval
    Target marketing
    Medical diagnosis
    Fraud detection

                   Prof. Pier Luca Lanzi
What is classification?                         10



  It is a two-step Process
  Model construction
      Given a set of data representing examples of
      a target concept, build a model to “explain” the concept
  Model usage
      The classification model is used for classifying
      future or unknown cases
      Estimate accuracy of the model




                   Prof. Pier Luca Lanzi
Classification: Model Construction             11




                                              Classification
                                               Algorithms
              Training
               Data


                                                Classifier
NAM E   RANK           YEARS TENURED
                                                (Model)
M ike   Assistant Prof   3      no
M ary   Assistant Prof   7      yes
Bill    Professor        2      yes
Jim     Associate Prof   7      yes
                                           IF rank = ‘professor’
Dave    Assistant Prof   6      no
                                           OR years > 6
Anne    Associate Prof   3      no
                                           THEN tenured = ‘yes’
                   Prof. Pier Luca Lanzi
Classification: Model Usage                                  12




                                             Classifier


                 Testing
                                                            Unseen Data
                  Data

                                                          (Jeff, Professor, 4)
NAM E      RANK           YEARS TENURED
                                                          Tenured?
Tom        Assistant Prof   2      no
M erlisa   Associate Prof   7      no
George     Professor        5      yes
Joseph     Assistant Prof   7      yes
                     Prof. Pier Luca Lanzi
Illustrating Classification Task                                           13




                Attrib1    Attrib2   Attrib3   Class
          Tid
                                               No
          1     Yes       Large      125K
                                               No
          2     No        Medium     100K
                                               No
          3     No        Small      70K
                                               No
          4     Yes       Medium     120K
                                               Yes
          5     No        Large      95K
                                               No
          6     No        Medium     60K
                                                                   Learn
                                               No
          7     Yes       Large      220K

                                                                   Model
                                               Yes
          8     No        Small      85K
                                               No
          9     No        Medium     75K
                                               Yes
          10    No        Small      90K
     10




                                                                   Apply
                                                                   Model
                Attrib1    Attrib2   Attrib3   Class
          Tid
                                               ?
          11    No        Small      55K
                                               ?
          12    Yes       Medium     80K
                                               ?
          13    Yes       Large      110K
                                               ?
          14    No        Small      95K
                                               ?
          15    No        Large      67K
     10




                                           Prof. Pier Luca Lanzi
Evaluating Classification Methods              14



  Accuracy
     classifier accuracy: predicting class label
     predictor accuracy: guessing value of predicted attributes
  Speed
     time to construct the model (training time)
     time to use the model (classification/prediction time)
  Robustness: handling noise and missing values
  Scalability: efficiency in disk-resident databases
  Interpretability
     understanding and insight provided by the model
  Other measures, e.g., goodness of rules, such as decision
  tree size or compactness of classification rules




                   Prof. Pier Luca Lanzi
Machine Learning perspective                   15
on classification

  Classification algorithms are methods of supervised Learning

  In Supervised Learning
     The experience E consists of a set of examples of a target
     concept that have been prepared by a supervisor
     The task T consists of finding an hypothesis that
     accurately explains the target concept
     The performance P depends on how accurately the
     hypothesis h explains the examples in E




                  Prof. Pier Luca Lanzi
Machine Learning perspective                   16
on classification

  Let us define the problem domain as the set of instance X

  For instance, X contains different fruits

  We define a concept over X as a function c which maps
  elements of X into a range D

                                  c:X→ D

  The range D represents the type of concept that is going to
  be analyzed

  For instance, c: X → {apple, not_an_apple}




                   Prof. Pier Luca Lanzi
Machine Learning perspective                         17
on classification

  Experience E is a set of <x,d> pairs, with x∈X and d∈D.
  The task T consists of finding an hypothesis h to explain E:

                              ∀x∈X h(x)=c(x)

  The set H of all the possible hypotheses h that can be used to
  explain c it is called the hypothesis space

  The goodness of an hypothesis h can be evaluated as the
  percentage of examples that are correctly explained by h

                P(h) = | {x| x∈X e h(x)=c(x)}| / |X|




                     Prof. Pier Luca Lanzi
Examples                                         18



  Concept Learning
  when D={0,1}

  Supervised classification
  when D consists of a finite number of labels

  Prediction
  when D is a subset of Rn




                   Prof. Pier Luca Lanzi
Machine Learning perspective                  19
on classification

  Supervised learning algorithms, given the examples in E,
  search the hypotheses space H for the hypothesis h that best
  explains the examples in E
  Learning is viewed as a search in the hypotheses space




                  Prof. Pier Luca Lanzi
Searching for the hypothesis                    20



  The type of hypothesis required influences the search
  algorithm

  The more complex the representation
  the more complex the search algorithm

  Many algorithms assume that it is possible to define a partial
  ordering over the hypothesis space

  The hypothesis space can be searched using either a general
  to specific or a specific-to-general strategy




                   Prof. Pier Luca Lanzi
Exploring the Hypothesis Space                21



  General to Specific
    Start with the most general hypothesis and then go on
    through specialization steps

  Specific to General
    Start with the set of the most specific hypothesis and then
    go on through generalization steps




                  Prof. Pier Luca Lanzi
Inductive Bias                                   22



  Set of assumptions that together with the training data
  deductively justify the classification assigned by the learner
  to future instances




                   Prof. Pier Luca Lanzi
Inductive Bias                                   23



  Set of assumptions that together with the training data
  deductively justify the classification assigned by the learner
  to future instances

  There can be a number of hypotheses consistent with
  training data

  Each learning algorithm has an inductive bias that imposes a
  preference on the space of all possible hypotheses




                   Prof. Pier Luca Lanzi
Types of Inductive Bias                          24



  Syntactic Bias, depends on the language used to represent
  hypotheses

  Semantic Bias, depends on the heuristics used to filter
  hypotheses

  Preference Bias, depends on the ability to rank and compare
  hypotheses

  Restriction Bias, depends on the ability to restrict the search
  space




                   Prof. Pier Luca Lanzi
Why looking for h?                              25



  Inductive Learning Hypothesis: any hypothesis (h) found to
  approximate the target function (c) over a sufficiently large
  set of training examples will also approximate the target
  function (c) well over other unobserved examples.




                   Prof. Pier Luca Lanzi
Trainining and testing                          26



  Training: the hypothesis h is developed to explain the
  examples in Etrain
  Testing: the hypothesis h is evaluated (verified) with respect
  to the previously unseen examples in Etest




                   Prof. Pier Luca Lanzi
Generalization and Overfitting                    27



  The hypothesis h is developed based on a set of training
  examples Etrain
  The underlying hypothesis is that if h explains Etrain then it
  can also be used to explain other examples in Etest not
  previously used to develop h




                    Prof. Pier Luca Lanzi
Generalization and Overfitting                       28



  When h explains “well” both Etrain and Etest we say that h is general
  and that the method used to develop h has adequately generalized
  When h explains Etrain but not Etest we say that the method used to
  develop h has overfitted
  We have overfitting when the hypothesis h explains Etrain too
  accurately so that h is not general enough to be applied outside
  Etrain




                     Prof. Pier Luca Lanzi
What are the general issues                 29
for classification in Machine Learning?

  Type of training experience
     Direct or indirect?
     Supervised or not?
  Type of target function and performance
  Type of search algorithm
  Type of representation of the solution
  Type of Inductive bias




                  Prof. Pier Luca Lanzi
Summary                                         30



 Classification is a two-step process involving the building,
 the testing, and the usage of the classification model
 Major issues for Data Mining include:
    The type of input data
    The representation used for the model
    The generalization performance on unseen data
 In Machine Learning, classification is viewed as
 an instance of supervised learning
 The focus is on the search process aimed at finding the
 classifier (the hypothesis) that best explains the data
 Major issues for Machine Learning include:
    The type of input experience
    The search algorithm
    The inductive biases


                  Prof. Pier Luca Lanzi

More Related Content

What's hot

Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models Chia-Wen Cheng
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement LearningUsman Qayyum
 
Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...
Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...
Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...Edureka!
 
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...Simplilearn
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer VisionSungjoon Choi
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningRahul Jain
 
SVM Tutorial
SVM TutorialSVM Tutorial
SVM Tutorialbutest
 
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Simplilearn
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detectionBrodmann17
 
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Simplilearn
 
Multi-Armed Bandit and Applications
Multi-Armed Bandit and ApplicationsMulti-Armed Bandit and Applications
Multi-Armed Bandit and ApplicationsSangwoo Mo
 
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...Edureka!
 
Support vector machine
Support vector machineSupport vector machine
Support vector machineRishabh Gupta
 
Introduction to-machine-learning
Introduction to-machine-learningIntroduction to-machine-learning
Introduction to-machine-learningBabu Priyavrat
 
Actor critic algorithm
Actor critic algorithmActor critic algorithm
Actor critic algorithmJie-Han Chen
 
Matrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsMatrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsLei Guo
 
Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes ClassifierYiqun Hu
 
Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes ClassifierArunabha Saha
 
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems -  ACM RecSys 2013 tutorialLearning to Rank for Recommender Systems -  ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorialAlexandros Karatzoglou
 

What's hot (20)

Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...
Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...
Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...
 
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer Vision
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
SVM Tutorial
SVM TutorialSVM Tutorial
SVM Tutorial
 
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
 
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
 
Multi-Armed Bandit and Applications
Multi-Armed Bandit and ApplicationsMulti-Armed Bandit and Applications
Multi-Armed Bandit and Applications
 
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
 
Introduction to-machine-learning
Introduction to-machine-learningIntroduction to-machine-learning
Introduction to-machine-learning
 
Actor critic algorithm
Actor critic algorithmActor critic algorithm
Actor critic algorithm
 
Matrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsMatrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender Systems
 
Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes Classifier
 
Xgboost
XgboostXgboost
Xgboost
 
Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes Classifier
 
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems -  ACM RecSys 2013 tutorialLearning to Rank for Recommender Systems -  ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
 

More from Pier Luca Lanzi

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i VideogiochiPier Luca Lanzi
 
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiPier Luca Lanzi
 
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomePier Luca Lanzi
 
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Pier Luca Lanzi
 
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...Pier Luca Lanzi
 
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaPier Luca Lanzi
 
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Pier Luca Lanzi
 
DMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationDMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationPier Luca Lanzi
 
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationPier Luca Lanzi
 
DMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningDMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningPier Luca Lanzi
 
DMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningDMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningPier Luca Lanzi
 
DMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesDMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesPier Luca Lanzi
 
DMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationDMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationPier Luca Lanzi
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringPier Luca Lanzi
 
DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringPier Luca Lanzi
 
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringPier Luca Lanzi
 
DMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringDMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringPier Luca Lanzi
 
DMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesDMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesPier Luca Lanzi
 
DMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsDMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsPier Luca Lanzi
 
DMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesDMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesPier Luca Lanzi
 

More from Pier Luca Lanzi (20)

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi
 
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei Videogiochi
 
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning Welcome
 
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018
 
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
 
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di apertura
 
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018
 
DMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationDMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparation
 
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data exploration
 
DMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningDMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph mining
 
DMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningDMTM Lecture 17 Text mining
DMTM Lecture 17 Text mining
 
DMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesDMTM Lecture 16 Association rules
DMTM Lecture 16 Association rules
 
DMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationDMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluation
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clustering
 
DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clustering
 
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clustering
 
DMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringDMTM Lecture 11 Clustering
DMTM Lecture 11 Clustering
 
DMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesDMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensembles
 
DMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsDMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethods
 
DMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesDMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rules
 

Recently uploaded

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Machine Learning and Data Mining: 10 Introduction to Classification

  • 1. Introduction to Classification Machine Learning and Data Mining (Unit 10) Prof. Pier Luca Lanzi
  • 2. References 2 Jiawei Han and Micheline Kamber, "Data Mining, : Concepts and Techniques", The Morgan Kaufmann Series in Data Management Systems (Second Edition). Tom M. Mitchell. “Machine Learning” McGraw Hill 1997. Pang-Ning Tan, Michael Steinbach, Vipin Kumar, “Introduction to Data Mining”, Addison Wesley. Prof. Pier Luca Lanzi
  • 3. What is an apple? 3 Prof. Pier Luca Lanzi
  • 4. Are these apples? 4 Prof. Pier Luca Lanzi
  • 5. Supervised vs. Unsupervised Learning 5 Unsupervised learning (clustering) The class labels of training data is unknown Given a set of measurements, observations, etc. with the aim of establishing the existence of classes or clusters in the data Supervised learning (classification) Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations New data is classified based on the training set Prof. Pier Luca Lanzi
  • 6. The contact lenses data 6 Age Spectacle prescription Astigmatism Tear production rate Recommended lenses Young Myope No Reduced None Young Myope No Normal Soft Young Myope Yes Reduced None Young Myope Yes Normal Hard Young Hypermetrope No Reduced None Young Hypermetrope No Normal Soft Young Hypermetrope Yes Reduced None Young Hypermetrope Yes Normal hard Pre-presbyopic Myope No Reduced None Pre-presbyopic Myope No Normal Soft Pre-presbyopic Myope Yes Reduced None Pre-presbyopic Myope Yes Normal Hard Pre-presbyopic Hypermetrope No Reduced None Pre-presbyopic Hypermetrope No Normal Soft Pre-presbyopic Hypermetrope Yes Reduced None Pre-presbyopic Hypermetrope Yes Normal None Presbyopic Myope No Reduced None Presbyopic Myope No Normal None Presbyopic Myope Yes Reduced None Presbyopic Myope Yes Normal Hard Presbyopic Hypermetrope No Reduced None Presbyopic Hypermetrope No Normal Soft Presbyopic Hypermetrope Yes Reduced None Presbyopic Hypermetrope Yes Normal None Prof. Pier Luca Lanzi
  • 7. A model extracted from 7 the contact lenses data If tear production rate = reduced then recommendation = none If age = young and astigmatic = no and tear production rate = normal then recommendation = soft If age = pre-presbyopic and astigmatic = no and tear production rate = normal then recommendation = soft If age = presbyopic and spectacle prescription = myope and astigmatic = no then recommendation = none If spectacle prescription = hypermetrope and astigmatic = no and tear production rate = normal then recommendation = soft If spectacle prescription = myope and astigmatic = yes and tear production rate = normal then recommendation = hard If age young and astigmatic = yes and tear production rate = normal then recommendation = hard If age = pre-presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none If age = presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none Prof. Pier Luca Lanzi
  • 8. Predicting CPU performance 8 209 different computer configurations Cycle time Main memory Cache Channels Performance (ns) (Kb) (Kb) MYCT MMIN MMAX CACH CHMIN CHMAX PRP 1 125 256 6000 256 16 128 198 2 29 8000 32000 32 8 32 269 … 208 480 512 8000 32 0 0 67 209 480 1000 4000 0 0 0 45 Amodel to predict the performance PRP = -55.9 + 0.0489 MYCT + 0.0153 MMIN + 0.0056 MMAX + 0.6410 CACH - 0.2700 CHMIN + 1.480 CHMAX Prof. Pier Luca Lanzi
  • 9. Classification vs. Prediction 9 Classification predicts categorical class labels (discrete or nominal) classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute and uses it in classifying new data Prediction models continuous-valued functions, i.e., predicts unknown or missing values Applications Credit approval Target marketing Medical diagnosis Fraud detection Prof. Pier Luca Lanzi
  • 10. What is classification? 10 It is a two-step Process Model construction Given a set of data representing examples of a target concept, build a model to “explain” the concept Model usage The classification model is used for classifying future or unknown cases Estimate accuracy of the model Prof. Pier Luca Lanzi
  • 11. Classification: Model Construction 11 Classification Algorithms Training Data Classifier NAM E RANK YEARS TENURED (Model) M ike Assistant Prof 3 no M ary Assistant Prof 7 yes Bill Professor 2 yes Jim Associate Prof 7 yes IF rank = ‘professor’ Dave Assistant Prof 6 no OR years > 6 Anne Associate Prof 3 no THEN tenured = ‘yes’ Prof. Pier Luca Lanzi
  • 12. Classification: Model Usage 12 Classifier Testing Unseen Data Data (Jeff, Professor, 4) NAM E RANK YEARS TENURED Tenured? Tom Assistant Prof 2 no M erlisa Associate Prof 7 no George Professor 5 yes Joseph Assistant Prof 7 yes Prof. Pier Luca Lanzi
  • 13. Illustrating Classification Task 13 Attrib1 Attrib2 Attrib3 Class Tid No 1 Yes Large 125K No 2 No Medium 100K No 3 No Small 70K No 4 Yes Medium 120K Yes 5 No Large 95K No 6 No Medium 60K Learn No 7 Yes Large 220K Model Yes 8 No Small 85K No 9 No Medium 75K Yes 10 No Small 90K 10 Apply Model Attrib1 Attrib2 Attrib3 Class Tid ? 11 No Small 55K ? 12 Yes Medium 80K ? 13 Yes Large 110K ? 14 No Small 95K ? 15 No Large 67K 10 Prof. Pier Luca Lanzi
  • 14. Evaluating Classification Methods 14 Accuracy classifier accuracy: predicting class label predictor accuracy: guessing value of predicted attributes Speed time to construct the model (training time) time to use the model (classification/prediction time) Robustness: handling noise and missing values Scalability: efficiency in disk-resident databases Interpretability understanding and insight provided by the model Other measures, e.g., goodness of rules, such as decision tree size or compactness of classification rules Prof. Pier Luca Lanzi
  • 15. Machine Learning perspective 15 on classification Classification algorithms are methods of supervised Learning In Supervised Learning The experience E consists of a set of examples of a target concept that have been prepared by a supervisor The task T consists of finding an hypothesis that accurately explains the target concept The performance P depends on how accurately the hypothesis h explains the examples in E Prof. Pier Luca Lanzi
  • 16. Machine Learning perspective 16 on classification Let us define the problem domain as the set of instance X For instance, X contains different fruits We define a concept over X as a function c which maps elements of X into a range D c:X→ D The range D represents the type of concept that is going to be analyzed For instance, c: X → {apple, not_an_apple} Prof. Pier Luca Lanzi
  • 17. Machine Learning perspective 17 on classification Experience E is a set of <x,d> pairs, with x∈X and d∈D. The task T consists of finding an hypothesis h to explain E: ∀x∈X h(x)=c(x) The set H of all the possible hypotheses h that can be used to explain c it is called the hypothesis space The goodness of an hypothesis h can be evaluated as the percentage of examples that are correctly explained by h P(h) = | {x| x∈X e h(x)=c(x)}| / |X| Prof. Pier Luca Lanzi
  • 18. Examples 18 Concept Learning when D={0,1} Supervised classification when D consists of a finite number of labels Prediction when D is a subset of Rn Prof. Pier Luca Lanzi
  • 19. Machine Learning perspective 19 on classification Supervised learning algorithms, given the examples in E, search the hypotheses space H for the hypothesis h that best explains the examples in E Learning is viewed as a search in the hypotheses space Prof. Pier Luca Lanzi
  • 20. Searching for the hypothesis 20 The type of hypothesis required influences the search algorithm The more complex the representation the more complex the search algorithm Many algorithms assume that it is possible to define a partial ordering over the hypothesis space The hypothesis space can be searched using either a general to specific or a specific-to-general strategy Prof. Pier Luca Lanzi
  • 21. Exploring the Hypothesis Space 21 General to Specific Start with the most general hypothesis and then go on through specialization steps Specific to General Start with the set of the most specific hypothesis and then go on through generalization steps Prof. Pier Luca Lanzi
  • 22. Inductive Bias 22 Set of assumptions that together with the training data deductively justify the classification assigned by the learner to future instances Prof. Pier Luca Lanzi
  • 23. Inductive Bias 23 Set of assumptions that together with the training data deductively justify the classification assigned by the learner to future instances There can be a number of hypotheses consistent with training data Each learning algorithm has an inductive bias that imposes a preference on the space of all possible hypotheses Prof. Pier Luca Lanzi
  • 24. Types of Inductive Bias 24 Syntactic Bias, depends on the language used to represent hypotheses Semantic Bias, depends on the heuristics used to filter hypotheses Preference Bias, depends on the ability to rank and compare hypotheses Restriction Bias, depends on the ability to restrict the search space Prof. Pier Luca Lanzi
  • 25. Why looking for h? 25 Inductive Learning Hypothesis: any hypothesis (h) found to approximate the target function (c) over a sufficiently large set of training examples will also approximate the target function (c) well over other unobserved examples. Prof. Pier Luca Lanzi
  • 26. Trainining and testing 26 Training: the hypothesis h is developed to explain the examples in Etrain Testing: the hypothesis h is evaluated (verified) with respect to the previously unseen examples in Etest Prof. Pier Luca Lanzi
  • 27. Generalization and Overfitting 27 The hypothesis h is developed based on a set of training examples Etrain The underlying hypothesis is that if h explains Etrain then it can also be used to explain other examples in Etest not previously used to develop h Prof. Pier Luca Lanzi
  • 28. Generalization and Overfitting 28 When h explains “well” both Etrain and Etest we say that h is general and that the method used to develop h has adequately generalized When h explains Etrain but not Etest we say that the method used to develop h has overfitted We have overfitting when the hypothesis h explains Etrain too accurately so that h is not general enough to be applied outside Etrain Prof. Pier Luca Lanzi
  • 29. What are the general issues 29 for classification in Machine Learning? Type of training experience Direct or indirect? Supervised or not? Type of target function and performance Type of search algorithm Type of representation of the solution Type of Inductive bias Prof. Pier Luca Lanzi
  • 30. Summary 30 Classification is a two-step process involving the building, the testing, and the usage of the classification model Major issues for Data Mining include: The type of input data The representation used for the model The generalization performance on unseen data In Machine Learning, classification is viewed as an instance of supervised learning The focus is on the search process aimed at finding the classifier (the hypothesis) that best explains the data Major issues for Machine Learning include: The type of input experience The search algorithm The inductive biases Prof. Pier Luca Lanzi