Classifier Ensembles
Machine Learning and Data Mining (Unit 16)




Prof. Pier Luca Lanzi
References                                     2



  Jiawei Han and Micheline Kamber, "Data Mining: Concepts
  and Techniques", The Morgan Kaufmann Series in Data
  Management Systems (Second Edition)
  Tom M. Mitchell. “Machine Learning” McGraw Hill 1997
  Pang-Ning Tan, Michael Steinbach, Vipin Kumar,
  “Introduction to Data Mining”, Addison Wesley
  Ian H. Witten, Eibe Frank. “Data Mining: Practical Machine
  Learning Tools and Techniques with Java Implementations”
  2nd Edition




Outline                                   3



  What is the general idea?

  Which ensemble methods?
    Bagging
    Boosting
    Random Forest




Ensemble Methods                                  4



  Construct a set of classifiers from the training data

  Predict class label of previously unseen records by
  aggregating predictions made by multiple classifiers




What is the General Idea?                5




Building model ensembles                       6



  Basic idea
     Build different “experts”, let them vote

  Advantage
    Often improves predictive performance

  Disadvantage
     Usually produces output that is very hard to analyze

  However, there are approaches that aim to produce a single
  comprehensible structure




Why does it work?                                  7



  Suppose there are 25 base classifiers

  Each classifier has error rate, ε = 0.35

  Assume classifiers are independent

  Probability that the ensemble classifier makes a wrong
  prediction:

                  Σ_{i=13}^{25} (25 choose i) ε^i (1 − ε)^{25−i} ≈ 0.06
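
  As a quick sanity check (not part of the original slides), a few lines of
  Python reproduce the computation: the majority vote is wrong only when 13 or
  more of the 25 independent base classifiers are wrong.

from math import comb

eps = 0.35   # error rate of each of the 25 independent base classifiers
p_wrong = sum(comb(25, i) * eps**i * (1 - eps)**(25 - i) for i in range(13, 26))
print(f"P(majority vote is wrong) = {p_wrong:.3f}")   # ~ 0.06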




How to generate an ensemble?               8



  Bootstrap Aggregating (Bagging)

  Boosting

  Random Forests




What is Bagging? (Bootstrap Aggregation)                             9

  Analogy: Diagnosis based on multiple doctors’ majority vote

  Training
     Given a set D of d tuples, at each iteration i, a training set Di of
     d tuples is sampled with replacement from D (i.e., bootstrap)
     A classifier model Mi is learned for each training set Di

  Classification: classify an unknown sample X
     Each classifier Mi returns its class prediction
     The bagged classifier M* counts the votes and
     assigns the class with the most votes to X

  Prediction: can be applied to the prediction of continuous values by
  taking the average value of each prediction for a given test tuple




What is Bagging?                                  10



  Combining predictions by voting/averaging
    Simplest way
    Each model receives equal weight

  “Idealized” version:
     Sample several training sets of size n
      (instead of just having one training set of size n)
     Build a classifier for each training set
     Combine the classifiers’ predictions




More on bagging                                11



  Bagging works because it reduces variance
  by voting/averaging
     Note: in some pathological hypothetical situations the
     overall error might increase
     Usually, the more classifiers the better

  Problem: we only have one dataset!
  Solution: generate new ones of size n
  by Bootstrap, i.e., sampling from it with replacement

  Can help a lot if data is noisy

  Can also be applied to numeric prediction
    Aside: bias-variance decomposition originally only known
    for numeric prediction



Bagging classifiers                         12




  Model generation
  Let n be the number of instances in the training data
  For each of t iterations:
         Sample n instances from training set
                 (with replacement)
         Apply learning algorithm to the sample
         Store resulting model




  Classification
  For each of the t models:
         Predict class of instance using model
  Return class that is predicted most often
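
  The pseudocode above translates almost line for line into Python. The sketch
  below is only illustrative: the function names are ours, and it assumes NumPy
  arrays and scikit-learn decision trees as the base learner.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, t=25, seed=0):
    """Model generation: t models, each trained on n instances sampled with replacement."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(t):
        idx = rng.integers(0, n, size=n)                 # bootstrap sample of size n
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Classification: return the class predicted most often by the t models."""
    votes = np.array([m.predict(X) for m in models])     # shape (t, n_samples)
    out = []
    for col in votes.T:                                  # majority vote per instance
        labels, counts = np.unique(col, return_counts=True)
        out.append(labels[counts.argmax()])
    return np.array(out)

  For numeric prediction, the same loop applies with a regression learner and
  the majority vote replaced by the average of the t predictions.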




Bias-variance decomposition                      13



  Used to analyze how much selection of any specific training
  set affects performance

  Assume infinitely many classifiers,
  built from different training sets of size n

  For any learning scheme,
     Bias = expected error of the combined
     classifier on new data
     Variance = expected error due to
     the particular training set used

  Total expected error ≈ bias + variance
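
  The decomposition can be estimated empirically by simulating the "infinitely
  many" training sets. The sketch below is illustrative only: the sin(x)
  regression problem, the number of repetitions, and the use of scikit-learn
  regression trees are all assumptions, not part of the original material.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def training_set(n=100):
    x = rng.uniform(-3, 3, size=(n, 1))
    y = np.sin(x).ravel() + rng.normal(scale=0.3, size=n)
    return x, y

x_test = np.linspace(-3, 3, 200).reshape(-1, 1)
f_true = np.sin(x_test).ravel()                 # noise-free target on a test grid

preds = []
for _ in range(300):                            # many training sets of the same size n
    x_tr, y_tr = training_set()
    preds.append(DecisionTreeRegressor(max_depth=4).fit(x_tr, y_tr).predict(x_test))
preds = np.array(preds)

combined = preds.mean(axis=0)                   # the "combined" (averaged) predictor
bias2 = np.mean((combined - f_true) ** 2)       # expected error of the combined predictor
variance = np.mean(preds.var(axis=0))           # error due to the particular training set
print(f"bias^2 = {bias2:.3f}, variance = {variance:.3f}, total = {bias2 + variance:.3f}")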




When does Bagging work?                           14



  Learning algorithm is unstable, if small changes to the
  training set cause large changes in the learned classifier

  If the learning algorithm is unstable, then
  Bagging almost always improves performance

  Bagging stable classifiers is not a good idea

  Which ones are unstable?
    Neural nets, decision trees, regression trees, linear
    regression

  Which ones are stable?
    K-nearest neighbors


Why does Bagging work?                                   15



  Let T = {(x_n, y_n)} be the set of training data containing N examples

  Let {T_k} be a sequence of training sets, each containing N examples
  independently sampled from T; for instance, each T_k can be generated
  using the bootstrap

  Let P be the underlying distribution of T

  Bagging replaces the prediction of the single model φ(x,T) with the
  majority of the predictions given by the models {φ(x,T_k)}

                          φ_A(x,P) = E_T(φ(x,T_k))

  The algorithm is unstable if perturbing the learning set can cause
  significant changes in the predictor constructed

  If so, Bagging can improve the accuracy of the predictor



Why does Bagging work?                                     16



  It is possible to prove that

                   (y − E_T(φ(x,T)))^2 ≤ E_T((y − φ(x,T))^2)

  Thus, Bagging produces a smaller error

  How much smaller the error is depends on how unequal the two sides
  of the following inequality are:

                       (E_T(φ(x,T)))^2 ≤ E_T(φ(x,T)^2)

  If the algorithm is stable, the two sides will be nearly equal;
  the more highly variable the φ(x,T) are, the more improvement
  the aggregation produces

  However, φ_A always improves on φ
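
  The inequality is just (E[Z])^2 ≤ E[Z^2] applied to Z = φ(x,T). A tiny
  numeric check (with a made-up distribution of predictions, purely for
  illustration) shows that the gap, and hence the gain from aggregating,
  grows with the variability of φ(x,T):

import numpy as np

rng = np.random.default_rng(1)
y = 2.5                                          # true value at some fixed x
for spread in (0.1, 1.0, 3.0):                   # variability of phi(x, T) across training sets
    phi = rng.normal(loc=2.0, scale=spread, size=100_000)
    print(f"spread={spread}: (y - E[phi])^2 = {(y - phi.mean())**2:.3f}  "
          f"E[(y - phi)^2] = {np.mean((y - phi)**2):.3f}")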



Bagging with costs                             17



  Bagging unpruned decision trees is known to produce good
  probability estimates
     Where, instead of voting, the individual classifiers'
     probability estimates are averaged
     Note: this can also improve the success rate

  Can use this with minimum-expected cost approach
  for learning problems with costs

  Problem: not interpretable
     MetaCost re-labels training data using bagging with costs
     and then builds single tree




Randomization                                 18



  Can randomize learning algorithm instead of input

  Some algorithms already have a random component:
  e.g. initial weights in neural net

  Most algorithms can be randomized, e.g. greedy algorithms:
     Pick from the N best options at random instead of always
     picking the best option
    E.g.: attribute selection in decision trees

  More generally applicable than bagging:
  e.g. random subsets in nearest-neighbor scheme

  Can be combined with bagging


What is Boosting?                                     19



  Analogy: Consult several doctors, based on a combination of
  weighted diagnoses—weight assigned based on the previous
  diagnosis accuracy
  How does boosting work?
     Weights are assigned to each training tuple
     A series of k classifiers is iteratively learned
     After a classifier Mi is learned, the weights are updated to allow
     the subsequent classifier, Mi+1, to pay more attention to the
     training tuples that were misclassified by Mi
     The final M* combines the votes of each individual classifier,
     where the weight of each classifier's vote is a function of its
     accuracy
  The boosting algorithm can be extended for the prediction of
  continuous values
  Comparing with bagging: boosting tends to achieve greater
  accuracy, but it also risks overfitting the model to misclassified data



What is the Basic Idea?                              20



  Suppose there are just 5 training examples {1,2,3,4,5}

  Initially each example has a 0.2 (1/5) probability of being sampled

  1st round of boosting samples (with replacement) 5 examples:
  {2, 4, 4, 3, 2} and builds a classifier from them

  Suppose examples 2, 3, 5 are correctly predicted by this classifier,
  and examples 1, 4 are wrongly predicted:
     Weight of examples 1 and 4 is increased,
     Weight of examples 2, 3, 5 is decreased

  2nd round of boosting samples again 5 examples, but now
  examples 1 and 4 are more likely to be sampled

  And so on … until some convergence is achieved
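
  A minimal sketch of one such round in Python (the doubling and halving
  factors below are only illustrative; AdaBoost's exact update appears on
  the next slides):

import numpy as np

rng = np.random.default_rng(0)
n = 5
weights = np.full(n, 1.0 / n)                  # each example starts with probability 1/5

sampled = rng.choice(n, size=n, replace=True, p=weights)   # e.g. the 1st round's sample
# ... build a classifier on X[sampled], y[sampled]; suppose it misclassifies
# examples 0 and 3 (examples 1 and 4 in the slide's 1-based numbering):
wrong = np.array([True, False, False, True, False])
weights[wrong] *= 2.0                          # increase the weight of misclassified examples
weights[~wrong] *= 0.5                         # decrease the weight of correctly classified ones
weights /= weights.sum()                       # renormalize: the next round samples from these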

Boosting                                           21



  Also uses voting/averaging

  Weights models according to performance

  Iterative: new models are influenced
  by the performance of previously built ones
      Encourage new model to become an “expert” for instances
      misclassified by earlier models
      Intuitive justification: models should be experts that
      complement each other

  Several variants
     Boosting by sampling,
     the weights are used to sample the data for training
     Boosting by weighting,
     the weights are used by the learning algorithm


AdaBoost.M1                                 22



 Model generation
 Assign equal weight to each training instance
 For t iterations:
   Apply learning algorithm to weighted dataset,
              store resulting model
   Compute model’s error e on weighted dataset
   If e = 0 or e ≥ 0.5:
     Terminate model generation
   For each instance in dataset:
     If classified correctly by model:
        Multiply instance’s weight by e/(1-e)
   Normalize weight of all instances



 Classification
 Assign weight = 0 to all classes
 For each of the t (or less) models:
        For the class this model predicts
               add –log e/(1-e) to this class’s weight
 Return class with highest weight
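
  A Python sketch that mirrors the pseudocode above (illustrative only: the
  function names are ours, and decision stumps from scikit-learn stand in for
  any base learner that accepts instance weights):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1_fit(X, y, t=20):
    """Model generation: t rounds over a re-weighted dataset."""
    n = len(X)
    w = np.full(n, 1.0 / n)                       # equal weight to each training instance
    models, errors = [], []
    for _ in range(t):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)          # apply learning algorithm to weighted dataset
        wrong = stump.predict(X) != y
        e = w[wrong].sum()                        # model's error e on the weighted dataset
        if e == 0 or e >= 0.5:
            break                                 # terminate model generation
        models.append(stump)
        errors.append(e)
        w[~wrong] *= e / (1 - e)                  # shrink weights of correctly classified instances
        w /= w.sum()                              # normalize weights of all instances
    return models, errors

def adaboost_m1_predict(models, errors, X):
    """Classification: each model votes for its class with weight -log(e / (1 - e))."""
    classes = models[0].classes_
    scores = np.zeros((len(X), len(classes)))     # weight = 0 for all classes
    for model, e in zip(models, errors):
        vote = -np.log(e / (1 - e))
        pred = model.predict(X)
        for k, c in enumerate(classes):
            scores[pred == c, k] += vote
    return classes[scores.argmax(axis=1)]         # class with highest weight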

Example: AdaBoost                                 23



  Base classifiers: C1, C2, …, CT

  Error rate:

        ε_i = (1/N) Σ_{j=1}^{N} w_j δ(C_i(x_j) ≠ y_j)

  Importance of a classifier:

        α_i = (1/2) ln((1 − ε_i) / ε_i)



Example: AdaBoost                                        24



  Weight update:

        w_i^{(j+1)} = (w_i^{(j)} / Z_j) × exp(−α_j)   if C_j(x_i) = y_i
        w_i^{(j+1)} = (w_i^{(j)} / Z_j) × exp(+α_j)   if C_j(x_i) ≠ y_i

        where Z_j is the normalization factor

  If any intermediate round produces an error rate higher than
  50%, the weights are reverted back to 1/n and the
  resampling procedure is repeated

  Classification:

        C*(x) = arg max_y Σ_{j=1}^{T} α_j δ(C_j(x) = y)
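
  The same two formulas written as small functions (an illustrative sketch;
  the function names and the tiny numeric example are ours):

import numpy as np

def classifier_importance(eps):
    """alpha_i = 0.5 * ln((1 - eps_i) / eps_i): large for small error, zero at eps = 0.5."""
    return 0.5 * np.log((1 - eps) / eps)

def update_weights(w, alpha, correct):
    """Multiply by exp(-alpha) where C_j was right and exp(+alpha) where it was wrong,
    then divide by the normalization factor Z_j so the weights sum to one again."""
    w = w * np.exp(np.where(correct, -alpha, alpha))
    return w / w.sum()

# one round: weights of 5 examples after a classifier with weighted error 0.2
w = np.full(5, 0.2)
correct = np.array([True, True, False, True, True])
print(update_weights(w, classifier_importance(0.2), correct))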

Illustrating AdaBoost                              25




             [Figure: the training data points, each assigned an equal initial weight]




Illustrating AdaBoost                    26




Adaboost (Freund and Schapire, 1997)                        27



   Given a set of d class-labeled tuples, (X1, y1), …, (Xd, yd)
   Initially, all the weights of tuples are set the same (1/d)
   Generate k classifiers in k rounds. At round i,
        Tuples from D are sampled (with replacement) to form a
        training set Di of the same size
        Each tuple’s chance of being selected is based on its weight
        A classification model Mi is derived from Di
        Its error rate is calculated using Di as a test set
        If a tuple is misclassified, its weight is increased, o.w. it is
        decreased
   Error rate: err(Xj) is the misclassification error of tuple Xj.
   Classifier Mi error rate is the sum of the weights of the
   misclassified tuples:
                       error(M_i) = Σ_{j=1}^{d} w_j × err(X_j)

    The weight of classifier Mi’s vote is

                       log((1 − error(M_i)) / error(M_i))
What is a Random Forest?                         28



  Random forests (RF) are a combination of tree predictors

  Each tree depends on the values of a random vector sampled
  independently and with the same distribution for all trees in
  the forest

  The generalization error of a forest of tree classifiers depends
  on the strength of the individual trees in the forest and the
  correlation between them

  Using a random selection of features to split each node yields
  error rates that compare favorably to Adaboost, and are
  more robust with respect to noise




How do Random Forests Work?                             29




D = training set                            F = set of attribute tests
k = number of trees in the forest           n = number of tests tried at each node

for i = 1 to k do:
   build data set Di by sampling with replacement from D
   learn tree Ti (e.g., with Tilde) from Di:
      at each node:
         choose the best split from a random subset of F of size n
      allow aggregates and refinement of aggregates in tests


make predictions according to the majority vote of the set of k trees
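
An illustrative Python version of the loop, assuming propositional data and
scikit-learn trees, so that max_features plays the role of the random subset
of F of size n; the "aggregates and refinement of aggregates" step is specific
to relational learners such as Tilde and is not reproduced here:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest_fit(X, y, k=100, n_tests=None, seed=0):
    """Build k trees, each from a bootstrap sample Di; at every node each tree only
    considers a random subset of n_tests attributes when choosing the split."""
    rng = np.random.default_rng(seed)
    n = len(X)
    n_tests = n_tests or max(1, int(np.sqrt(X.shape[1])))   # a common choice, assumed here
    trees = []
    for i in range(k):
        idx = rng.integers(0, n, size=n)                    # Di: sample with replacement from D
        tree = DecisionTreeClassifier(max_features=n_tests, random_state=i)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def random_forest_predict(trees, X):
    """Majority vote of the set of k trees."""
    votes = np.array([t.predict(X) for t in trees])         # shape (k, n_samples)
    out = []
    for col in votes.T:
        labels, counts = np.unique(col, return_counts=True)
        out.append(labels[counts.argmax()])
    return np.array(out)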




Random Forests                                 30



  Ensemble method tailored for decision tree classifiers

  Creates k decision trees, where each tree is independently
  generated based on random decisions

  Bagging using decision trees can be seen as a special case of
  random forests where the random decisions are the random
  creations of the bootstrap samples




Two examples of random decisions in decision forests                     31

  At each internal tree node, randomly select F attributes, and
  evaluate just those attributes to choose the partitioning attribute
      Tends to produce trees larger than trees where all attributes are
      considered for selection at each node, but different classes will
      be eventually assigned to different leaf nodes, anyway
      Saves processing time in the construction of each individual
      tree, since just a subset of attributes is considered at each
      internal node

  At each internal tree node, evaluate the quality of all possible
  partitioning attributes, but randomly select one of the F best
  attributes to label that node (based on InfoGain, etc.)
      Unlike the previous approach, does not save processing time




Properties of Random Forests                         32



  Easy to use ("off-the-shelf"), only 2 parameters
  (number of trees, % of variables considered for each split)
  Very high accuracy
  No overfitting when a large number of trees is selected (choose it high)
  Insensitive to the choice of split % (~20%)
  Returns an estimate of variable importance
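
  In practice the two parameters map directly onto the arguments of an
  off-the-shelf implementation. A sketch with scikit-learn (the synthetic
  dataset and the specific values are only for illustration):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# the two parameters of the slide: number of trees and % of variables tried per split (~20%)
rf = RandomForestClassifier(n_estimators=500, max_features=0.2, random_state=0)
rf.fit(X_tr, y_tr)
print("test accuracy:", rf.score(X_te, y_te))
print("variable importance (first 5):", rf.feature_importances_[:5])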




Out of the bag                                       33



  For every tree grown, about one-third of the cases are out-of-bag
  (out of the bootstrap sample). Abbreviated oob.

  Put these oob cases down the corresponding tree and get response
  estimates for them.

  For each case n, average (or take a plurality vote over) the response
  estimates from all the trees for which n was oob, to get a test-set
  estimate ŷn for yn.

  Averaging the loss over all n gives the test-set
  estimate of the prediction error.

  The only adjustable parameter in RF is m,
  the number of variables considered at each split.

  The default value for m is M, but RF is not sensitive to the value of
  m over a wide range.
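
  With scikit-learn, the oob estimate described above is available directly;
  a minimal sketch on synthetic data (illustrative only):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# oob_score=True: each case is evaluated only by the trees for which it was out-of-bag
rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
rf.fit(X, y)
print("oob estimate of accuracy:", rf.oob_score_)   # 1 - oob estimate of the prediction error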


Variable Importance                                 34



  Because of the need to know which variables are important in the
  classification, RF has three different ways of looking at variable
  importance

  Measure 1
     To estimate the importance of the mth variable, in the oob
     cases for the kth tree, randomly permute all values of the mth
     variable
     Put these altered oob x-values down the tree and get
     classifications.
     Proceed as though computing a new internal error rate.
     The amount by which this new error exceeds the original test
     set error is defined as the importance of the mth variable.
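
  Measure 1 is what is now usually called permutation importance. A simplified
  sketch (for brevity it permutes the mth variable on a held-out test set and
  scores the whole forest, rather than using the per-tree oob cases described
  above):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

rng = np.random.default_rng(0)
base_error = 1.0 - rf.score(X_te, y_te)               # error before any permutation
for m in range(X.shape[1]):
    X_perm = X_te.copy()
    X_perm[:, m] = rng.permutation(X_perm[:, m])      # randomly permute all values of variable m
    importance = (1.0 - rf.score(X_perm, y_te)) - base_error
    print(f"variable {m}: importance = {importance:+.3f}")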




Variable Importance                                  35



  For the nth case in the data, its margin at the end of a run is the
  proportion of votes for its true class minus the maximum of the
  proportion of votes for each of the other classes

  The 2nd measure of importance of the mth variable is the average
  lowering of the margin across all cases when the mth variable is
  randomly permuted as in method 1

  The third measure is the count of how many margins are lowered
  minus the number of margins raised




Summary of Random Forests                     36



  Random forests are an effective tool in prediction.
  Forests give results competitive with boosting and adaptive
  bagging, yet do not progressively change the training set.
  Random inputs and random features produce good results in
  classification, less so in regression.
  For larger data sets, we can gain accuracy by combining
  random features with boosting.




Summary                                        37



 Ensembles in general improve predictive accuracy
     Good results reported for most application domains,
     unlike algorithm variations, whose success is more
     dependent on the application domain/dataset

 Improvement in accuracy, but interpretability decreases
   Much more difficult for the user to interpret an ensemble
   of classification models than a single classification model

 Diversity of the base classifiers in the ensemble is important
    Trade-off between each base classifier’s error and
    diversity
    Maximizing classifier diversity tends to increase the error
    of each individual base classifier


