Introduction to Machine
       Learning
                  Lecture 5


               Albert Orriols i Puig
              aorriols@salle.url.edu

     Artificial Intelligence – Machine Learning
         Enginyeria i Arquitectura La Salle
                Universitat Ramon Llull
Recap of Lecture 4

       The input consists of examples described by
       different characteristics




                                                           Slide 2
Artificial Intelligence                 Machine Learning
Recap of Lecture 4
        Different problems in machine learning
                Classification: Find the class to which a new instance belongs
                          E.g.: Find whether a new patient has cancer or not

                Numeric prediction: A variation of classification in which the output
                consists of numeric classes
                          E.g.: Find the frequency of cancerous cells found

                Regression: Find a function that fits your examples
                          E.g.: Find a function that controls your chain process

                Association: Find associations among your problem attributes or
                variables
                          E.g.: Find relations such as: a patient with high blood pressure is
                          more likely to have heart-attack disease

                Clustering: Process to cluster/group the instances into classes
                          E.g.: Group clients whose purchases are similar


                                                                                           Slide 3
Artificial Intelligence                         Machine Learning
Today's Agenda

        Reviewing the Goal of Data Classification
        What's a Decision Tree
        How to build Decision Trees:
                ID3
                From ID3 to C4.5
        Run C4.5 on Weka




                                                      Slide 4
Artificial Intelligence            Machine Learning
The Goal of Data Classification

                  Data Set               Classification Model         How?




       The classification model can be implemented in several ways:
               โ€ข Rules
               โ€ข Decision trees
               โ€ข Mathematical formulae




                                                                             Slide 5
Artificial Intelligence                    Machine Learning
The Goal of Data Classification
        Data can have complex structures
        We will accept the following type of data:
                Data described by features which have a single measurement
                Features can be
                          Nominal: @attribute color {green, orange, red}
                          Continuous: @attribute length real [0,10]
                Data can have unknown values
                          I could have lost – or never have measured – the attribute of
                          a particular example




                                                                                   Slide 6
Artificial Intelligence                      Machine Learning
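For concreteness (this example is not in the original deck), such data might be declared in Weka's ARFF format as in the hypothetical sketch below, where '?' marks an unknown value:

    % Hypothetical ARFF file with one nominal and one continuous feature
    @relation fruit
    @attribute color {green, orange, red}
    @attribute length real
    @attribute class {ripe, unripe}
    @data
    green, 4.2, unripe
    red, ?, ripe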
Classifying Plants
        Let's classify different plants in three classes:
                Iris-setosa, iris-virginica, and iris-versicolor




                Weka format
                Dataset publicly available at the UCI repository

                                                                   Slide 7
Artificial Intelligence                    Machine Learning
Classifying Plants
        Decision tree from this output



                                                   Internal node: test of the
                                                   value of a given attribute
                                                   Branch: value of an
                                                   attribute
                                                   Leaf: predicted class




        How could I automatically generate these types of trees?


                                                                    Slide 8
Artificial Intelligence         Machine Learning
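To illustrate these three elements (an added sketch, not part of the deck), a tree of this kind might print as follows; the attribute names match the iris data, but the tests and thresholds are hypothetical:

    petalwidth <= 0.6 : Iris-setosa            (branch ending in a leaf: predicted class)
    petalwidth > 0.6                           (internal node: test on petalwidth)
    |   petallength <= 4.9 : Iris-versicolor
    |   petallength > 4.9  : Iris-virginica

Each test is an internal node, each outcome of the test is a branch, and each ': class' ending is a leaf.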
Types of DT Builders
        So, where is the trick?
                Choose the attribute to test in each internal node and choose the
                best partition
        Several algorithms are able to generate decision trees:
                Naรฏve Bayes Tree
                Random forests
                CART (classification and regression tree)
                ID3
                C4.5
        We are going to start from ID3 and finish with C4.5
                C4.5 is the most influential decision tree builder algorithm


                                                                              Slide 9
Artificial Intelligence                Machine Learning
ID3
        Ross Quinlan started with
                ID3 (Quinlan, 1979)
                C4.5 (Quinlan, 1993)


        Some assumptions in the basic algorithm
                All attributes are nominal
                We do not have unknown values


        With these assumptions, Quinlan designed a heuristic
        approach to infer decision trees from labeled data



                                                           Slide 10
Artificial Intelligence                 Machine Learning
Description of ID3
ID3(D, Target, Atts)              (Mitchell, 1997)
returns: a decision tree that correctly classifies the given examples

variables
D: Training set of examples
Target: Attribute whose value is to be predicted by the tree
Atts: List of other attributes that may be tested by the learned decision tree

create a Root node for the tree
if D are all positive then Root ← +
else if D are all negative then Root ← –
else if Atts = ∅ then          Root ← most common value of Target in D
else
        A ← the best decision attribute from Atts
        root ← A
        for each possible value vi of A
              add a new tree branch with A = vi
              Dvi ← subset of D that have value vi for A
              if Dvi = ∅ then add leaf ← most common value of Target in D
              else add the subtree ID3(Dvi, Target, Atts − {A})


                                                                                 Slide 11
Artificial Intelligence                    Machine Learning
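As a rough translation of the pseudocode above into runnable form (not from the deck), here is a minimal Python sketch of ID3 for nominal attributes; best_attribute is a hypothetical helper that picks the attribute with the highest information gain, sketched further below:

    from collections import Counter

    def id3(D, target, atts):
        # D: list of dict examples; target: class key; atts: candidate attribute keys
        labels = [d[target] for d in D]
        if len(set(labels)) == 1:        # all examples share one class: leaf
            return labels[0]
        if not atts:                     # no attributes left: majority class
            return Counter(labels).most_common(1)[0][0]
        a = best_attribute(D, target, atts)   # hypothetical: argmax of information gain
        tree = {a: {}}
        for v in set(d[a] for d in D):
            Dv = [d for d in D if d[a] == v]
            # Dv is never empty here since v is drawn from D itself; for an unseen
            # value, ID3 would fall back to the majority class of D
            tree[a][v] = id3(Dv, target, [x for x in atts if x != a])
        return tree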
Description of ID3
Slides 12–16 repeat the pseudocode of Slide 11 while incrementally drawing the tree it builds: starting from D = {d1, d2, ..., dL} and Atts = {a1, a2, ..., ak}, the root tests the chosen attribute ai; branches ai = v1, ..., ai = vn partition D into subsets D', D'', ..., and each subset is recursed on with the reduced attribute list Atts − {ai}.

                                                                                 Slides 12–16
Artificial Intelligence                                      Machine Learning
Which Attribute Should I Select First?

        Which is the best choice?
                We have 29 positive examples and 35 negative ones
                Should I use attribute 1 or attribute 2 in this iteration of the
                node?




                                                                                   Slide 17
Artificial Intelligence                  Machine Learning
Letโ€™s Rely on Information Theory
        Use the concept of Entropy
            Characterizes the impurity of an
            arbitrary collection of examples
            Given S:


                    Entropy(S) = − p+ log2 p+ − p− log2 p−

                  where p+ and p- are the proportion of
                  positive/negative examples in S.


            Extension to c classes:

                     Entropy(S) = − Σ_{i=1..c} pi log2 pi

                                                                             Examples:
                                                                                p+ = 0 → entropy = 0
                                                                                p+ = 1 → entropy = 0
                                                                                p+ = 0.5, p− = 0.5 → entropy = 1 (maximum)
                                                                                p+ = 9/14, p− = 5/14 → entropy =
                                                                                −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940


                                                                                                           Slide 18
Artificial Intelligence                                  Machine Learning
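To reproduce the example values above, here is a small Python sketch of the c-class entropy formula (an addition, not part of the deck):

    import math

    def entropy(class_counts):
        # Entropy of a collection given its per-class counts, e.g. [9, 5]
        total = sum(class_counts)
        ps = [c / total for c in class_counts if c > 0]   # 0 * log2(0) taken as 0
        return -sum(p * math.log2(p) for p in ps)

    print(entropy([9, 5]))   # ≈ 0.940, as in the example above
    print(entropy([7, 7]))   # 1.0, the two-class maximum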
Entropy
        What does this measure mean?
                Entropy is the minimum number of bits needed to encode the
                classification of a member of S randomly drawn.
                           p+ = 1: the receiver knows the class, no message is sent, Entropy = 0.
                           p+ = 0.5: 1 bit is needed.

                 An optimal-length code assigns −log2 p bits to a message having probability p
                 The idea is to assign shorter codes to the more probable
                 messages and longer codes to the less likely ones.
                 Thus, the expected number of bits to encode + or − for a random
                 member of S is:

                              Entropy(S) = p+ (− log2 p+) + p− (− log2 p−)
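                 For instance (an added worked example), with p+ = 0.8 and p− = 0.2 the
                 expected message length is 0.8 · (−log2 0.8) + 0.2 · (−log2 0.2)
                 ≈ 0.8 · 0.32 + 0.2 · 2.32 ≈ 0.72 bits, between the extremes of 0 and 1 bit above.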




                                                                                       Slide 19
Artificial Intelligence                           Machine Learning
Information Gain

      Measures the expected reduction in entropy caused by partitioning the
      examples according to the given attribute
      Gain(S,A): the number of bits saved when encoding the target value of an
      arbitrary member of S, knowing the value of attribute A.
      Expected reduction in entropy caused by knowing the value of A.
                           Gain(S, A) = Entropy(S) − Σ_{v ∈ values(A)} (|Sv| / |S|) · Entropy(Sv)


           where values(A) is the set of all possible values for A, and Sv is the
      subset of S for which attribute A has value v




                                                                                                  Slide 20
Artificial Intelligence                                  Machine Learning
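Using the entropy helper from the earlier sketch, information gain might be computed as follows (hypothetical code, with examples as dicts and a nominal attribute a); the best_attribute placeholder in the ID3 sketch would then simply take the argmax of this function over the remaining attributes:

    from collections import Counter

    def information_gain(D, target, a):
        # Expected entropy reduction from partitioning examples D on attribute a
        def H(examples):
            return entropy(list(Counter(d[target] for d in examples).values()))
        gain = H(D)
        for v in set(d[a] for d in D):
            Sv = [d for d in D if d[a] == v]
            gain -= len(Sv) / len(D) * H(Sv)
        return gain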
Remember the Example?
        Which is the best choice?
                We have 29 positive examples and 35 negative ones
                Should I use attribute 1 or attribute 2 in this iteration of the
                node?




                           Gain(A1) = 0.993 − 26/64 · 0.70 − 36/64 · 0.74 = 0.292
                           Gain(A2) = 0.993 − 51/64 · 0.93 − 13/64 · 0.61 = 0.128



                                                                                   Slide 21
Artificial Intelligence                       Machine Learning
Yet Another Example
        The textbook example




                                                  Slide 22
Artificial Intelligence        Machine Learning
Yet Another Example




                                                Slide 23
Artificial Intelligence      Machine Learning
Hypothesis Search Space




                                             Slide 24
Artificial Intelligence   Machine Learning
To Sum Up
        ID3 is a strong system that
                Uses hill-climbing search based on the information gain
                measure to search through the space of decision trees
                Outputs a single hypothesis.
                Never backtracks. It converges to locally optimal solutions.
                Uses all training examples at each step, contrary to methods
                that make decisions incrementally.
                Uses statistical properties of all examples: the search is less
                sensitive to errors in individual training examples.
                Can handle noisy data by modifying its termination criterion to
                accept hypotheses that imperfectly fit the data.




                                                                               Slide 25
Artificial Intelligence                 Machine Learning
Next Class


        From ID3 to C4.5. C4.5 extends ID3 and enables the
        system to:
                Be more robust in the presence of noise, avoiding overfitting
                Deal with continuous attributes
                Deal with missing data
                Convert trees to rules




                                                                           Slide 26
Artificial Intelligence                  Machine Learning
Introduction to Machine
       Learning
                  Lecture 5

               Albert Orriols i Puig
              aorriols@salle.url.edu

     Artificial Intelligence – Machine Learning
         Enginyeria i Arquitectura La Salle
                Universitat Ramon Llull
