ID3 and C4.5 are algorithms developed by Ross Quinlan to generate decision trees, typically used in the machine learning and natural language processing domains. This presentation gives an overview of both algorithms with illustrated examples.
3. Introduction-Machine Learning
Machine learning is a branch of artificial intelligence that concerns the construction and study of systems that can learn from data.
Machine learning and data mining:
- Machine learning focuses on prediction, based on known properties learned from the training data.
- Data mining focuses on the discovery of previously unknown properties in the data.
- The two fields overlap.
4. What is a Decision Tree?
A decision tree is a tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a decision.
In the tree:
- Root node: an attribute
- Edges: attribute values
- Leaf node: the output (a class or decision)
5. Introduction
ID3 (Iterative Dichotomiser 3) is an algorithm invented by Ross Quinlan that generates a decision tree from a dataset using Shannon entropy. It is typically used in the machine learning and natural language processing domains.
6. ID3 basics
ID3 employs top-down induction of decision trees (a greedy algorithm).
Attribute selection is the fundamental step in constructing a decision tree: at each step, we select which attribute becomes the next node of the tree.
Two measures, entropy and information gain, are used to perform attribute selection.
7. Entropy
Entropy H(S) is a measure of the amount of uncertainty in the (data) set S: the more uniform the class distribution, the higher the entropy, and the more information we can gain by resolving it with a split.
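For reference, these are the standard definitions that the worked computations later in the deck rely on (here c ranges over the classes, p(c) is the proportion of examples in S belonging to class c, and S_v is the subset of S for which attribute A takes value v):

H(S) = -\sum_{c} p(c) \log_2 p(c)

\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} H(S_v)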
11. Example-Dataset Elements
Outlook   Temperature  Humidity  Wind    Play ball
Sunny     Hot          High      Weak    No
Sunny     Hot          High      Strong  No
Overcast  Hot          High      Weak    Yes
Rain      Mild         High      Weak    Yes
Rain      Cool         Normal    Weak    Yes
Rain      Cool         Normal    Strong  No
Overcast  Cool         Normal    Strong  Yes
Sunny     Mild         High      Weak    No
Sunny     Cool         Normal    Weak    Yes
Rain      Mild         Normal    Weak    Yes
Sunny     Mild         Normal    Strong  Yes
Overcast  Mild         High      Strong  Yes
Overcast  Hot          Normal    Weak    Yes
Rain      Mild         High      Strong  No

Total: 14 records (9 Yes, 5 No). All the records in the table together are referred to as the collection S.
12. Example-Dataset Elements
The same 14-record table, annotated: Outlook, Temperature, Humidity, and Wind are the attributes; Play ball is the class (C), or classifier. Because we decide whether or not to play ball based on Outlook, Temperature, Humidity, and Wind, Play ball is the classifier used to make the decision.
13. ID3 Algorithm
1. Compute the entropy of the whole collection:
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
2. Compute the information gain for each attribute. For Wind: Weak = 8 examples (6+, 2-), Strong = 6 examples (3+, 3-):
Entropy(S_Weak) = -(6/8) log2(6/8) - (2/8) log2(2/8) = 0.811
Entropy(S_Strong) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1
Gain(S, Wind) = Entropy(S) - (8/14) Entropy(S_Weak) - (6/14) Entropy(S_Strong)
             = 0.940 - (8/14)(0.811) - (6/14)(1) = 0.048
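As a sanity check, here is a minimal Python sketch of these computations over the 14-record table. The helper names entropy and info_gain are illustrative, not from the slides; note that exact arithmetic gives 0.247 and 0.152 where the slides truncate to 0.246 and 0.151.

from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def info_gain(examples, attribute, labels):
    """Information gain from splitting `examples` on `attribute`."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(ex[attribute] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# The 14 records from the table (Outlook, Temperature, Humidity, Wind -> Play ball)
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
attrs = ("Outlook", "Temperature", "Humidity", "Wind")
examples = [dict(zip(attrs, r[:4])) for r in rows]
labels = [r[4] for r in rows]

print(round(entropy(labels), 3))   # 0.94
for a in attrs:
    print(a, round(info_gain(examples, a, labels), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048
# (the slides truncate Outlook and Humidity to 0.246 and 0.151)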
14. ID3 Algorithm
3. Select the attribute with the maximum information gain for splitting:
Gain(S, Wind) = 0.048
Gain(S, Humidity) = 0.151
Gain(S, Temperature) = 0.029
Gain(S, Outlook) = 0.246
Outlook has the highest information gain, so it becomes the root node.
15. ID3 Algorithm
4. Apply ID3 recursively to each child node of this root, until a leaf node or a node with entropy = 0 is reached.
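Putting the four steps together, here is a compact recursive sketch of ID3 in Python, reusing the entropy/info_gain helpers and the examples/labels/attrs data from the sketch above; the nested-dict tree representation is an illustrative choice, not prescribed by the slides.

from collections import Counter

def id3(examples, labels, attributes):
    """Return a decision tree as nested dicts; leaves are class labels."""
    if len(set(labels)) == 1:          # pure node: entropy is 0
        return labels[0]
    if not attributes:                 # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    # split on the attribute with maximum information gain
    best = max(attributes, key=lambda a: info_gain(examples, a, labels))
    tree = {best: {}}
    for value in set(ex[best] for ex in examples):
        idx = [i for i, ex in enumerate(examples) if ex[best] == value]
        tree[best][value] = id3([examples[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != best])
    return tree

print(id3(examples, labels, list(attrs)))
# Outlook at the root; the Sunny branch splits on Humidity,
# the Rain branch splits on Wind, and Overcast is a pure "Yes" leaf.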
16. C4.5
C4.5 is an extension of Quinlan's earlier ID3 algorithm. Among its improvements:
- handling both continuous and discrete attributes;
- handling training data with missing attribute values;
- pruning trees after creation.
17. Continuous-valued attributes
[The same 14-record table, with Humidity now recorded as a continuous value (0.49, 0.59, 0.68, 0.72, 0.74, 0.77, 0.80, 0.84, 0.86, 0.87, 0.89, 0.90, 0.91, 0.93) instead of High/Normal.]
18. Continuous-valued attributes
To handle a continuous attribute such as Humidity:
1. Sort the numeric attribute values.
2. Identify adjacent examples that differ in their target classification to pick candidate thresholds.

For example, over one sorted run of Humidity values:

Humidity:  0.68  0.72  0.87  0.90  0.91
Play ball: Yes   Yes   No    No    No

The classification changes between 0.72 and 0.87, so the candidate threshold is the midpoint: Humidity > (0.72 + 0.87)/2, i.e. Humidity > 0.795.
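A small, self-contained Python sketch of this candidate-threshold search (the values are the sorted run shown above; the function name is illustrative):

def candidate_thresholds(values, labels):
    """Midpoints between adjacent sorted values whose class labels differ."""
    pairs = sorted(zip(values, labels))
    return [round((a + b) / 2, 3)
            for (a, la), (b, lb) in zip(pairs, pairs[1:])
            if la != lb]

humidity = [0.68, 0.72, 0.87, 0.90, 0.91]
play = ["Yes", "Yes", "No", "No", "No"]
print(candidate_thresholds(humidity, play))   # [0.795]

Each candidate threshold would then be scored by information gain, like any discrete split.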
20. Overfitting
[Figure: three fits labeled "Under fitting", "Just right", and "Over fitting".]
Overfitting: if we have too many attributes (features), the learned hypothesis may fit the training set very well but fail to generalize to new examples (e.g., fail to predict price on new examples).
23. Why overfitting happens?
- Presence of error (noise) in the training examples (an issue in machine learning in general).
- Small numbers of examples associated with a leaf node.
24. Reduce Overfitting
- Stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data (difficult).
- Allow the tree to overfit the data, and then post-prune it.
25. Rule post-pruning
Convert the learned tree into one rule per root-to-leaf path (P = positive, N = negative):
(Outlook = Sunny ∧ Humidity = Normal) → P
(Outlook = Sunny ∧ Humidity = High) → N
(Outlook = Overcast) → P
(Outlook = Rain ∧ Wind = Strong) → N
(Outlook = Rain ∧ Wind = Weak) → P
28. Rule post-pruning
Validation set: save a portion of the data for validation, splitting the data into a training set, a validation set, and a test set.
Let s be the validation-set performance with the subtree kept at a node, and t the validation-set performance with a leaf in place of the subtree. If s <= t, prune the subtree.
Reduced-error pruning (Quinlan 1987) prunes whole subtrees this way.
Rule post-pruning (Quinlan 1993) can remove smaller elements than whole subtrees, with improved readability.
…
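A rough, simplified sketch of this pruning criterion in Python, reusing the nested-dict trees from the ID3 sketch above. Two simplifications are assumptions for illustration: every node is scored against the whole validation set rather than only the examples that reach it, and the replacement leaf uses the validation majority class rather than the training majority at that node.

from collections import Counter

def classify(tree, example):
    """Walk the nested-dict tree until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][example[attr]]
    return tree

def prune(tree, val_examples, val_labels):
    """Replace a subtree with a majority-class leaf when validation
    performance does not get worse (s <= t)."""
    if not isinstance(tree, dict):
        return tree
    attr = next(iter(tree))
    for value, subtree in tree[attr].items():
        tree[attr][value] = prune(subtree, val_examples, val_labels)
    def accuracy(t):
        return sum(classify(t, ex) == lab
                   for ex, lab in zip(val_examples, val_labels)) / len(val_labels)
    leaf = Counter(val_labels).most_common(1)[0][0]   # simplification, see above
    s, t = accuracy(tree), accuracy(leaf)             # s: keep subtree, t: leaf
    return leaf if s <= t else tree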
29. Missing information
Example: missing information in mammography data ("?" marks a missing attribute value)

BI-RAD  Age  Shape  Margin  Density  Class
4       48   4      5       ?        1
5       67   3      5       3        1
5       57   4      4       3        1
5       60   ?      5       1        1
4       53   ?      4       3        1
4       28   1      1       3        0
4       70   ?      2       3        0
2       66   1      1       ?        0
5       63   3      ?       3        0
4       78   1      1       1        0
30. Missing information-according to most common
Fill in the missing values according to the most common value (given the class):

BI-RAD  Age  Shape  Margin  Density  Class
4       48   4      5       3        1
5       67   3      5       3        1
5       57   4      4       3        1
5       60   4      5       1        1
4       53   4      4       3        1
4       28   1      1       3        0
4       70   1      2       3        0
2       66   1      1       3        0
5       63   3      ?       3        0
4       78   1      1       1        0
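A minimal sketch of this per-class most-common fill in Python (the "?" marker and the column layout follow the table above; the function name is illustrative):

from collections import Counter

def fill_most_common(rows, class_col, missing="?"):
    """Replace each missing entry with the most common value of that
    column among rows of the same class."""
    filled = [list(r) for r in rows]
    for col in range(len(rows[0])):
        if col == class_col:
            continue
        # most common non-missing value in this column, per class
        by_class = {}
        for r in rows:
            if r[col] != missing:
                by_class.setdefault(r[class_col], Counter())[r[col]] += 1
        for r in filled:
            if r[col] == missing:
                r[col] = by_class[r[class_col]].most_common(1)[0][0]
    return filled

rows = [["4", "48", "4", "5", "?", "1"],    # first record of the table
        ["5", "67", "3", "5", "3", "1"],
        ["4", "28", "1", "1", "3", "0"]]
print(fill_most_common(rows, class_col=5)[0])  # ['4', '48', '4', '5', '3', '1']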
32. Summary
ID3 and C4.5 are used to generate decision trees; both were developed by Ross Quinlan and are typically used in the machine learning and natural language processing domains.
They use the entropy of an attribute, picking the attribute with the highest reduction in entropy (the highest information gain) to determine which attribute the data should first be split on; then, through a series of recursive calls that compute the entropy of each node, the process continues until all leaf nodes are pure.
Editor's Notes
To minimize the decision-tree depth, when we traverse the tree path we need to select the optimal attribute for splitting the tree node; it follows that the attribute with the greatest entropy reduction is the best choice.