1
A Survey on
Boosting Algorithms for
Supervised Learning
Artur Ferreira
supervisor Prof. Mário Figueiredo
26 November 2007
2
Summary
1. Ensembles of classifiers and boosting
2. Boosting: origins and features
3. The AdaBoost algorithm and its variants
 Supervised Learning (SL)
 Semi-Supervised Learning (SSL)
4. Experimental results (binary classification)
 Comparison with other techniques
5. Concluding remarks (and ongoing work)
3
1. Ensembles of classifiers
 Motivation for ensembles of classifiers
 Instead of learning a single complex classifier, learn
several simple classifiers
 Combine the output of the simple classifiers to
produce the classification decision
 For instance, instead of training a large neural
network (NN), train several smaller NNs and combine
their individual outputs in order to produce the final
output
4
1. Ensembles of classifiers
 H1(x),...,HM(x) are the weak learners
 The output classification decision is given by
H(x)
5
2. Boosting: origins and features
 Boosting builds an ensemble of classifiers
 Combine simple classifiers to build a robust classifier
 Each classifier has an associated contribution
6
2. The origins of Boosting
 Boosting is related to other techniques such as
Bootstrap and Bagging
7
2. Bootstrap: features
 General purpose sample-based statistical method
 Assesses the statistical accuracy of some estimate S(Z), over a
training set Z={z1, z2,…, zN}, with zi=(xi,yi)
 Draws randomly, with replacement, from the set of data points
 Uses B versions of the training set
 To check the accuracy, the measure is applied over the B
sampled versions of the training set
8
2. Bootstrap: graphical idea
9
2. Bootstrap: algorithm
Input:
• Training set Z={z1,z2,…,zN}, with zi=(xi,yi).
• B, number of sampled versions of the training set.
Output:
• S(Z), statistical estimate and its accuracy.
Step 1
for n=1 to B
a) Draw, with replacement, L < N samples from the training set Z,
obtaining the nth sample Z*n.
b) For each sample Z*n, estimate the statistic S(Z*n).
end
Step 2
Produce the bootstrap estimate S(Z), using S(Z*n) with n={1,…,B}.
Step 3
Compute the accuracy of the estimate, using the variance or some other
criterion.
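A minimal Python sketch of this procedure (function and variable names are illustrative; the slide draws L < N samples per version, whereas the classic bootstrap uses L = N):

```python
import numpy as np

def bootstrap(Z, statistic, B=100, L=None, seed=0):
    """Bootstrap estimate of a statistic S(Z) and its accuracy (variance).

    Z         : array with the N training points z_i
    statistic : callable mapping a sample of Z to a scalar, the S(.) above
    B         : number of sampled versions of the training set
    L         : samples drawn per version (slide uses L < N; classic bootstrap uses N)
    """
    rng = np.random.default_rng(seed)
    N = len(Z)
    L = N if L is None else L
    S = []
    for _ in range(B):
        idx = rng.integers(0, N, size=L)     # draw with replacement -> Z*_n
        S.append(statistic(Z[idx]))          # S(Z*_n)
    S = np.asarray(S)
    return S.mean(), S.var(ddof=1)           # bootstrap estimate and its variance

# Example: accuracy of the sample mean of 1-D data
data = np.random.default_rng(1).normal(size=50)
estimate, variance = bootstrap(data, np.mean, B=200)
```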
10
2. Bagging: features
 Proposed by Breiman, 1996
 Consists of Bootstrap aggregation
 Training set Z={z1, z2,…, zN}, with zi=(xi,yi) for
which we intend to fit a model f(x)
 Obtain a prediction of f(x) at input x
 For each bootstrap sample b we fit our model,
giving a prediction fb(x).
 The Bagging estimate is the average of the
individual estimates
11
2. Bagging: algorithm (classification)
Input:
• Training set Z={z1,z2,…,zN}, with zi=(xi,yi).
• B, number of sampled versions of the training set.
Output:
• H(x), a classifier suited for the training set.
Step 1
for n=1 to B
a) Draw, with replacement, L < N samples from the training set Z,
obtaining the nth sample Z*n.
b) For each sample Z*n, learn classifier Hn.
end
Step 2
Produce the final classifier as a vote of Hn with n={1,…,B}.
H(x) = \operatorname{sgn}\left( \sum_{n=1}^{B} H_n(x) \right)
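A minimal Python sketch of bagging for binary classification, assuming labels in {-1, +1}; the shallow decision tree is an illustrative choice of base classifier:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, B=25, L=None, seed=0):
    """Train B classifiers on bootstrap samples of (X, y); labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    N = len(X)
    L = N if L is None else L            # slide draws L < N; classic bagging uses L = N
    ensemble = []
    for _ in range(B):
        idx = rng.integers(0, N, size=L)      # sample with replacement -> Z*_n
        ensemble.append(DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx]))
    return ensemble

def bagging_predict(ensemble, X):
    """H(x) = sgn( sum_n H_n(x) ): an unweighted vote of the B classifiers."""
    return np.sign(sum(h.predict(X) for h in ensemble))
```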
12
2. Boosting: features
 Freund and Schapire, 1989
 Similar to Bagging, in the sense that it uses
several versions of the training set
 Uses 3 subsets of the training set
 Differs from Bagging, in the sense that it does not
allow replacement
 The final classification is done by a majority
vote (3 classifiers)
13
2. Boosting: algorithm (classification)
Input:
• Training set Z={z1,z2,…,zN}, with zi=(xi,yi).
Output:
• H(x), a classifier suited for the training set.
Step 1 - Draw, without replacement, L < N samples from the training set Z,
obtaining Z*1; train weak learner H1 on Z*1
Step 2 - Select L2 < N samples from Z with half of the samples
misclassified by H1 to obtain Z*2; train weak learner H2 on it.
Step 3 - Select all samples from Z that H1 and H2 disagree on; train weak
learner H3, using these samples.
Step 4 - Produce the final classifier as a vote of the three weak learners
H(x) = \operatorname{sgn}\left( \sum_{n=1}^{3} H_n(x) \right)
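A rough Python sketch of this three-classifier scheme, assuming labels in {-1, +1} and that H1 misclassifies at least L2/2 training samples; decision stumps stand in for the weak learners:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boosting_1989_fit(X, y, L, L2, seed=0):
    """Sketch of the original (1989) boosting scheme above; labels in {-1, +1}.
    L and L2 are the sizes of the first two training subsets."""
    rng = np.random.default_rng(seed)
    N = len(X)

    # Step 1: train H1 on L samples drawn without replacement
    i1 = rng.choice(N, size=L, replace=False)
    H1 = DecisionTreeClassifier(max_depth=1).fit(X[i1], y[i1])

    # Step 2: Z*2 contains (roughly) half samples misclassified by H1, half correct
    p1 = H1.predict(X)
    wrong, right = np.flatnonzero(p1 != y), np.flatnonzero(p1 == y)
    i2 = np.concatenate([rng.choice(wrong, size=L2 // 2, replace=False),
                         rng.choice(right, size=L2 // 2, replace=False)])
    H2 = DecisionTreeClassifier(max_depth=1).fit(X[i2], y[i2])

    # Step 3: train H3 on the samples where H1 and H2 disagree
    i3 = np.flatnonzero(H2.predict(X) != p1)
    H3 = DecisionTreeClassifier(max_depth=1).fit(X[i3], y[i3])
    return [H1, H2, H3]

def boosting_1989_predict(learners, X):
    # Step 4: majority vote of the three weak learners
    return np.sign(sum(h.predict(X) for h in learners))
```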
14
2. Boosting: graphical idea
15
3. AdaBoost Algorithm
 Adaptive Boosting
 By Freund and Schapire, 1996
 Instead of sampling, reweight the
data
 Consecutive classifiers are
learned on weighted versions
of the data
 The entire training set is
considered in order to learn
each classifier
16
3. AdaBoost Algorithm
 Consecutive classifiers are
learned on weighted versions of
the data
 The weight is given by the
(mis)classification of the
previous classifiers
 The next classifier focuses on the
most difficult patterns
 The algorithm stops when:
 the error rate of a classifier is >0.5
 it reaches M classifiers
17
3. AdaBoost Algorithm
1. Initialize the weights w_i = 1/N, with i={1,...,N}
2. For m=1 to M
a) Fit a classifier to data using weights
b) Compute the error rate of the classifier
c) Compute the contribution
d) Update the weights
3. Output the ensemble
AdaBoost
Input: Training set (x,y) N = # input patterns M = # classifiers
Output: The ensemble given by
18
3. AdaBoost Algorithm: detail 1
1. Initialize the weights
2. For m=1 to M
a) Fit a classifier to data using current weights
b) Compute the error rate of the current classifier
c) Compute the weighted contribution of the classifier
d) Update the weight of each
input pattern
3. Output the ensemble
19
3. AdaBoost Algorithm: detail 2
1. Initialize the weights
2. For m=1 to M
a) Fit a classifier to data using current weights
b) Compute the error rate of the current classifier
c) Compute the weighted contribution of the classifier
d) Update the weight of each input pattern
3. Output the ensemble
Indicator function: +1 if y_i and H_m(x_i) are different (the weight increases),
-1 if they are equal (the weight decreases)
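A minimal Python sketch of discrete AdaBoost following these steps; the slide's exact formulas were shown graphically, so the common choices err_m = Σ_i w_i I(y_i ≠ H_m(x_i)) / Σ_i w_i and α_m = 0.5 log((1 − err_m)/err_m) are assumed here, with decision stumps as weak learners:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=50):
    """Discrete AdaBoost sketch; labels y in {-1, +1}, decision stumps as weak learners."""
    N = len(X)
    w = np.full(N, 1.0 / N)                  # 1. initialize the weights
    learners, alphas = [], []
    for _ in range(M):                       # 2. for m = 1..M
        h = DecisionTreeClassifier(max_depth=1)
        h.fit(X, y, sample_weight=w)         # a) fit a classifier using the current weights
        pred = h.predict(X)
        err = w[pred != y].sum() / w.sum()   # b) weighted error rate of the classifier
        if err >= 0.5:                       # stop if no better than random guessing
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # c) contribution of the classifier
        w = w * np.exp(-alpha * y * pred)    # d) reweight: misclassified patterns gain weight
        w = w / w.sum()
        learners.append(h)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(learners, alphas, X):
    """3. H(x) = sgn( sum_m alpha_m H_m(x) ): a weighted vote of the weak learners."""
    return np.sign(sum(a * h.predict(X) for h, a in zip(learners, alphas)))
```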
20
3. AdaBoost Algorithm
 Comparison of Boosting (1989) and AdaBoost (1996)
Feature             | Boosting (1989)                      | AdaBoost (1996)
Data processing     | Random sampling without replacement  | Weighting (no sampling)
Num. of classifiers | Three                                | Up to M
Decision            | Majority vote                        | Weighted vote
21
3. AdaBoost Algorithm: variants
 Variants for SL
 Binary case
 Since 1999,
several variants
have been
proposed
22
3. AdaBoost Algorithm: variants
 Variants for SL
 Multiclass case
 Since 1997,
several variants
have been
proposed
23
3. AdaBoost Algorithm: variants
 Variants for SSL
 Since 2001, only three variants have been proposed
 The key issue is how to handle unlabeled data
 usually there is some loss (or contrast) function
 this is my current topic of research and maybe the title of
my next talk...!
24
3. AdaBoost variants
 Variants for SL to consider:
 Real AdaBoost (1999)
 Gentle Boost (2000)
 Modest AdaBoost (2005)
 Reweight Boost (2007)
1. Initialize the weights
2. For m=1 to M
a) Fit a classifier to data using current weights
b) Compute the error rate of the current classifier
c) Compute the weighted contribution of the classifier
d) Update the weight of each input pattern
3. Output the ensemble
Usually, variants
change only these
steps
3. AdaBoost variants: Real AdaBoost
25
Input:
• Training set Z={z1,z2,…,zN}, with zi=(xi,yi).
• M, the maximum number of classifiers.
Output:
• H(x), a classifier suited for the training set.
Step 1
Initialize the weights wi = 1/N, i={1,...,N}
Step 2
for m=1 to M
a) Fit the class probability estimate pm(x)=Pw(y=1|x), using weights wi on
the training data.
b) Set H_m(x) = 0.5 \log \frac{p_m(x)}{1 - p_m(x)}.
c) Set weights w_i \leftarrow w_i \exp\left(-y_i H_m(x_i)\right) and renormalize.
Step 3
Produce the final classifier H(x) = \operatorname{sgn}\left( \sum_{n=1}^{M} H_n(x) \right).
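A minimal Python sketch of Real AdaBoost as laid out above; a small decision tree provides the class-probability estimate p_m(x), and the probability clipping is an added numerical safeguard:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def real_adaboost_fit(X, y, M=50, eps=1e-10):
    """Real AdaBoost sketch; labels in {-1, +1}."""
    N = len(X)
    w = np.full(N, 1.0 / N)
    learners = []
    for _ in range(M):
        # a) fit the class-probability estimate p_m(x) = P_w(y = 1 | x) with weights w
        tree = DecisionTreeClassifier(max_depth=2).fit(X, y, sample_weight=w)
        p = np.clip(tree.predict_proba(X)[:, list(tree.classes_).index(1)], eps, 1 - eps)
        # b) H_m(x) = 0.5 * log( p_m(x) / (1 - p_m(x)) )
        Hm = 0.5 * np.log(p / (1 - p))
        # c) reweight and renormalize
        w = w * np.exp(-y * Hm)
        w = w / w.sum()
        learners.append(tree)
    return learners

def real_adaboost_predict(learners, X, eps=1e-10):
    """H(x) = sgn( sum_m H_m(x) )."""
    F = np.zeros(len(X))
    for tree in learners:
        p = np.clip(tree.predict_proba(X)[:, list(tree.classes_).index(1)], eps, 1 - eps)
        F += 0.5 * np.log(p / (1 - p))
    return np.sign(F)
```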
AdaBoost variants
 Real AdaBoost (RA) differs from AdaBoost in steps 2a)
and 2b)
 In RA, these steps consist of calculating the probability
that a given pattern belongs to a class.
 The AdaBoost algorithm classifies the input patterns and
calculates the weighted amount of error.
 RA performs exact optimization with respect to Hm
 Gentle AdaBoost (GA) improves it, using weighted
least-squares regression
 Usually, GA produces a more reliable and stable
ensemble.
26
3. AdaBoost variants: Gentle AdaBoost
27
Input:
• Training set Z={z1,z2,…,zN}, with zi=(xi,yi).
• M, the maximum number of classifiers.
Output:
• H(x), a classifier suited for the training set.
Step 1
Initialize the weights wi = 1/N, i={1,...,N}
Step 2
for m=1 to M
a) Train Hm(x) by weighted least squares of xi to yi, with weights wi
b) Set H(x) \leftarrow H(x) + H_m(x).
c) Set weights w_i \leftarrow w_i \exp\left(-y_i H_m(x_i)\right) and renormalize.
Step 3
Produce the final classifier H(x) = \operatorname{sgn}\left( \sum_{n=1}^{M} H_n(x) \right).
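A minimal Python sketch of Gentle AdaBoost as laid out above; a shallow regression tree stands in for the weighted least-squares fit of x_i to y_i:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gentle_adaboost_fit(X, y, M=50):
    """Gentle AdaBoost sketch; labels in {-1, +1}."""
    N = len(X)
    w = np.full(N, 1.0 / N)
    learners = []
    for _ in range(M):
        # a) weighted least-squares fit of y_i on x_i with weights w_i
        h = DecisionTreeRegressor(max_depth=2).fit(X, y, sample_weight=w)
        Hm = h.predict(X)
        # c) reweight and renormalize
        w = w * np.exp(-y * Hm)
        w = w / w.sum()
        learners.append(h)
    return learners

def gentle_adaboost_predict(learners, X):
    """H(x) = sgn( sum_m H_m(x) ), i.e. the additive update H <- H + H_m followed by the sign."""
    return np.sign(sum(h.predict(X) for h in learners))
```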
AdaBoost variants
 Modest AdaBoost (MA) improves on Gentle AdaBoost
(GA) by using an “inverted” distribution
 this distribution gives higher weights to samples that are
already correctly classified by earlier classifiers
 Usually MA attains ensembles with less generalization
error than GA
 But MA typically has higher training error than GA
28
\bar{w}_i = 1 - w_i
3. AdaBoost variants: Modest AdaBoost
29
Input:....
Output:… The same as before
Step 1 - Initialize the weights wi = 1/N, i={1,...,N}
Step 2
for m=1 to M
a) Train Hm(x) by weighted least squares of xi to yi, with weights wi
b) Compute the inverted distribution \bar{w}_i = 1 - w_i and renormalize.
c) Compute
P_m^{+1} = P_w\left(y = +1 \wedge H_m(x)\right), \qquad \bar{P}_m^{+1} = P_{\bar{w}}\left(y = +1 \wedge H_m(x)\right),
P_m^{-1} = P_w\left(y = -1 \wedge H_m(x)\right), \qquad \bar{P}_m^{-1} = P_{\bar{w}}\left(y = -1 \wedge H_m(x)\right)
d) Set H_m(x) = P_m^{+1}\left(1 - \bar{P}_m^{+1}\right) - P_m^{-1}\left(1 - \bar{P}_m^{-1}\right)
e) Update w_i \leftarrow w_i \exp\left(-y_i H_m(x_i)\right)
Step 3
Produce the final classifier H(x) = \operatorname{sgn}\left( \sum_{n=1}^{M} H_n(x) \right).
The P_m terms measure how good the current classifier is at predicting the class labels; the \bar{P}_m terms measure how well the current classifier performs on the data previously (correctly) classified by earlier classifiers.
3. AdaBoost variants: Modest AdaBoost
30
 The update on step 2d) is
H_m(x) = P_m^{+1}\left(1 - \bar{P}_m^{+1}\right) - P_m^{-1}\left(1 - \bar{P}_m^{-1}\right)
 Decreases the contribution of a given classifier if it
works “too well” on the previously correctly classified
data, with high margin
 Increases the contribution of the classifiers that have
high certainty on the current classification
 The algorithm is named “Modest” because the classifiers
tend to work in “their domain”
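A rough Python sketch of Modest AdaBoost following the steps of the previous slide; as an assumption of this sketch, the four weighted "probabilities" are estimated per leaf of a small regression tree, which is one way to realize P_w(y = ±1 ∧ H_m(x)):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def modest_adaboost_fit(X, y, M=50):
    """Modest AdaBoost sketch; labels in {-1, +1}, a small regression tree as weak learner."""
    N = len(X)
    w = np.full(N, 1.0 / N)
    ensemble = []
    for _ in range(M):
        # a) weighted least-squares fit of the weak learner
        tree = DecisionTreeRegressor(max_depth=2).fit(X, y, sample_weight=w)
        leaf = tree.apply(X)                       # region (leaf) of each pattern
        # b) inverted distribution, renormalized
        w_bar = 1.0 - w
        w_bar = w_bar / w_bar.sum()
        # c), d) per leaf: H_m = P^{+1}(1 - Pbar^{+1}) - P^{-1}(1 - Pbar^{-1})
        h_leaf = {}
        for l in np.unique(leaf):
            in_l = (leaf == l)
            P_pos, P_neg = w[in_l & (y == 1)].sum(), w[in_l & (y == -1)].sum()
            Pb_pos, Pb_neg = w_bar[in_l & (y == 1)].sum(), w_bar[in_l & (y == -1)].sum()
            h_leaf[l] = P_pos * (1 - Pb_pos) - P_neg * (1 - Pb_neg)
        Hm = np.array([h_leaf[l] for l in leaf])
        # e) reweight and renormalize
        w = w * np.exp(-y * Hm)
        w = w / w.sum()
        ensemble.append((tree, h_leaf))
    return ensemble

def modest_adaboost_predict(ensemble, X):
    """H(x) = sgn( sum_m H_m(x) ); unseen leaves contribute zero."""
    F = np.zeros(len(X))
    for tree, h_leaf in ensemble:
        F += np.array([h_leaf.get(l, 0.0) for l in tree.apply(X)])
    return np.sign(F)
```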
3. AdaBoost variants: Reweight Boost
31
Input:....
r (>0), the number of previous classifiers
Output:… The same as before
Step 1 - Initialize the weights wi = 1/N, i={1,...,N}
Step 2
for m=1 to M
a) Fit a classifier Hm(x) to the data with weights wi
b) Get combined classifier Hrm(x) from the previous r classifiers
c) Compute error
d) Compute the contribution
e) Update the weights
Step 3
Produce the final classifier.
32
4. Experimental results
 Motivated by the claim
 “AdaBoost with trees is the best off-the-shelf classifier”
 and by this proven result
 AdaBoost performs quite well when each classifier is just a little better
than random guessing (Friedman et al., 2001)
 We carried out experiments on synthetic and real data using the following
weak classifiers:
 Generative (2 Gaussians)
 RBF (Radial Basis Function) unit with 2 Gaussians and one output layer
33
4. Generative Weak Learner (AG)
 The generative weak learner has two multivariate Gaussian
functions G0 and G1 (one per class)
 p0 and p1 are the (sample) probabilities of each class
 Gaussians are learned by weighted mean and covariance estimation
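A minimal Python sketch of this generative weak learner, using labels in {-1, +1} for consistency with the other sketches; the small regularization term added to the covariance is an assumption for numerical stability:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_generative_weak_learner(X, y, w):
    """One weighted Gaussian per class plus the weighted class probabilities."""
    params = {}
    for c in (-1, 1):
        mask = (y == c)
        wc = w[mask] / w[mask].sum()
        mu = wc @ X[mask]                                # weighted mean
        diff = X[mask] - mu
        cov = (wc[:, None] * diff).T @ diff              # weighted covariance
        cov += 1e-6 * np.eye(X.shape[1])                 # regularization (assumption)
        params[c] = (w[mask].sum() / w.sum(), mu, cov)   # (class probability, mean, covariance)
    return params

def predict_generative(params, X):
    """Classify by the larger of p_c * G_c(x) over the two classes."""
    scores = {c: p * multivariate_normal.pdf(X, mean=mu, cov=cov)
              for c, (p, mu, cov) in params.items()}
    return np.where(scores[1] >= scores[-1], 1, -1)
```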
34
4. RBF Weak Learner (ARBF)
 The RBF (Radial Basis Function) weak learner has
 two multivariate Gaussian functions (one per class)
 one output layer
 The logistic function is applied on the output layer
35
4. RBF Weak Learner (ARBF)
 Training of the RBF classifier is carried out by a one-
stage (global) algorithm*
 The mean and covariance matrix of each Gaussian are learned by the EM
algorithm for mixtures of Gaussians (generative approach)
 Simultaneously, the output layer coefficients are learned by logistic
regression (discriminative approach), via the Newton-Raphson step
* A. Ferreira and M. Figueiredo, Hybrid generative/discriminative training of RBF Networks, ESANN 2006
(E-step and M-step update equations shown graphically on the original slide)
36
4. Experimental results – synthetic 1
 Synthetic 1 dataset: two-dimensional; 100 train patterns (50 +
50); 400 test patterns (200 + 200)
 Using up to M=10 classifiers
 Test set error rate: AG (M=10): 6.5%; ARBF (M=3): 1%
AG ARBF
37
4. Experimental results – synthetic 1
 Synthetic 1 dataset: two-dimensional; 100 train patterns (50 +
50); 400 test patterns (200 + 200)
 Using up to M=10 classifiers
 Training and test set error rate, as a function of M
• ARBF obtains better results, but is
unable to achieve zero errors on the
test set
• For both algorithms, after a given
value of M, there is no decrease in the
error
38
4. Experimental results – synthetic 2
AG ARBF
 Synthetic 2 dataset: two-dimensional; 180 train patterns (90 +
90); 360 test patterns (180 + 180)
 Using up to M=15 classifiers
 Test set error rate: AG (M=15): 0.28%; ARBF (M=15): 0.83%
39
4. Experimental results – synthetic 2
 Synthetic 2 dataset: two-dimensional; 180 train patterns (90 +
90); 360 test patterns (180 + 180)
 Using up to M=15 classifiers
 Training and test set error rate, as a function of M
• In this case, AG achieves better results
• It is, however, unable to achieve zero errors on the test set
• ARBF levels off at M=6
• AG levels off at M=12
40
4. Experimental results – standard data sets
Dataset | Dimension | Train            | Test
Kwok    | 2         | 200 (1) + 300 (0)| 4080 (1) + 6120 (0)
• On the test set, AG is slightly better
• The same happens on the training set, but
with a larger difference
• Both algorithms level off at M=4, on the
training and test sets
41
4. Experimental results – standard data sets
Dataset | Dimension | Train           | Test
Crabs   | 5         | 40 (1) + 40 (0) | 60 (1) + 60 (0)
• AG is better than ARBF
• With M=24, AG attains an error
rate of 1.7% on the test set
• ARBF levels off at M=12
42
4. Experimental results – standard data sets
Dataset       | Dim | AG (AdaBoost Generative) M / % | ATR (AdaBoost Trees, Real) M / % | ATM (AdaBoost Trees, Modest) M / % | SVM %
Ripley        | 2   | 4 / 10.3  | 15 / 15.5 | 15 / 10.1 | 10.6
Kwok          | 2   | 4 / 17.5  | 80 / 13.6 | 80 / 12.5 | 11.7
Crabs         | 5   | 25 / 1.7  | 24 / 45.8 | 24 / 47.5 | 3.3
Phoneme       | 5   | 3 / 20.9  | 3 / 22.1  | 3 / 22.1  | 15.4
Pima          | 7   | 3 / 23.5  | 3 / 23.2  | 3 / 23.2  | 19.7
Abalone       | 8   | 3 / 24.5  | 3 / 24.7  | 3 / 24.7  | 20.9
Contraceptive | 9   | 3 / 36.7  | 3 / 34.4  | 3 / 34.4  | 28.6
TicTacToe     | 9   | 4 / 24.5  | 4 / 28.2  | 3 / 33.2  | 1.7
M – the number of weak learners
Blue (in the original slide) – the best test set error rate
Underline (in the original slide) – the best boosting algorithm
43
4. Experimental results
Detector           | AdaBoost Variant  | Weak Learner
Viola-Jones (2001) | Discrete AdaBoost | Stump
Float Boost (2004) | Float Boost       | 1D Histograms
KLBoost (2003)     | KLBoost           | 1D Histograms
Schneiderman       | Real AdaBoost     | One group of n-D Histograms
 Application to face detection
44
5. Concluding remarks
 Adaptive Boosting is a field of intensive research
 The main focus of research is on SSL
 It seems to be “easy” to accommodate unlabeled data
 Our tests, on SL, showed that:
 Boosting of generative and RBF classifiers achieves performance close to
boosting of trees, on standard datasets
 Good performance with low training complexity, suited for embedded
systems
 AG and ARBF obtain results close to SVM, using a small number of weak
learners
 Typically, (the simpler) AG attains better results than ARBF
 For high-dimensional datasets (with small training sets) it may be necessary
to use diagonal covariances
45
5. Concluding remarks: ongoing work
 For SL:
 Establish (one more!) variant of AdaBoost, using the logistic
function
 The logistic function gives us a degree of confidence on the
classification
 For SSL:
 Study and minor modifications of existing boosting
algorithms
 MixtBoost, SSMarginBoost, SemiBoost
 Extend our AdaBoost variant, proposed for SL, to SSL
 using the logistic function properties to accommodate for unlabeled data
46
5. Concluding remarks: ongoing work
 Logistic Function
l(a) = \frac{1}{1 + \exp(-a)}