3. Concept: Classifier
Training procedure
Give +ve and -ve examples to the system; the system will then learn to classify an unknown input. E.g. give pictures of faces (+ve examples) and non-faces (-ve examples) to train the system.
Detection procedure
Input an unknown sample (e.g. an image); the system will tell you whether it is a face or not.
[Figure: example face and non-face images.]
4. (Updated)!! First let us learn what a weak classifier h( ) is
A line v = mu + c, or v − mu = c:
• m, c are used to define the line (gradient m, intercept c).
• Any point in the gray area satisfies v − mu < c.
• Any point in the white area satisfies v − mu > c.
[Figure: the u-v plane with the line v = mu + c of gradient m through the origin region; the side where v − mu < c is gray, the side where v − mu > c is white.]
Case 1: If a point x = [u, v] is in the "gray" area, then h(x) = 1, otherwise −1. It can be written as:
h(x) = +1 if v − mu < c, −1 otherwise, where m, c are given constants.
Case 2: If a point x = [u, v] is in the "white" area, then h(x) = 1, otherwise −1. It can be written as:
h(x) = +1 if v − mu > c, −1 otherwise, where m, c are given constants.
At time t, combine case 1 and case 2 together to become one equation, and use the polarity p_t to control which case you want to use:
h_t(x) = +1 if p_t f_t(x) < p_t θ_t, −1 otherwise,
where polarity p_t ∈ {+1 or −1}, f_t is the function f_t(x = [u, v]) = v − mu, and θ_t = c is the threshold; m, c are constants, u, v are variables.
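To make the combined equation concrete, here is a minimal Python sketch of this line-based weak classifier; the function name weak_classify and the example values of m, c, p are illustrative assumptions, not part of the slides.

```python
# Line-based weak classifier h_t(x): the line v = m*u + c splits the
# plane, and the polarity p in {+1, -1} selects which side is +1.
def weak_classify(u, v, m, c, p):
    f = v - m * u                         # f_t(x = [u, v]) = v - mu
    return 1 if p * f < p * c else -1     # threshold theta_t = c

# Illustrative check with the line v = u (m = 1, c = 0), polarity +1:
print(weak_classify(0.0, -1.0, m=1.0, c=0.0, p=+1))  # below the line -> +1
print(weak_classify(0.0, +1.0, m=1.0, c=0.0, p=+1))  # above the line -> -1
```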
5. AdaBoost: Adaptive Boosting
Instead of resampling, AdaBoost uses training-set re-weighting: each training sample carries a weight that determines its probability of being selected for a training set.
AdaBoost is an algorithm for constructing a "strong" classifier as a linear combination of "simple" "weak" classifiers.
The final classification is based on a weighted vote of the weak classifiers.
6. Concept
Weak learners are drawn from the family of lines.
A weak learner h with p(error) = 0.5 is at chance.
Each data point has a class label, y_t = +1 or −1, and a weight, w_t = 1.
7. Concept
This one seems to be the best.
Each data point has a class label, y_t = +1 or −1, and a weight, w_t = 1.
This is a 'weak classifier': it performs slightly better than chance.
8. Concept
We set a new problem for which the previous weak classifier performs at chance again.
Each data point has a class label, y_t = +1 or −1, and we update the weights:
w_t ← w_t exp{−y_t H_t}
9.-11. Concept (repeated over further rounds)
Each round, we again set a new problem for which the previous weak classifiers perform at chance, and we update the weights in the same way:
w_t ← w_t exp{−y_t H_t}
Each data point keeps its class label, y_t = +1 or −1.
12. Concept
The strong (non-linear) classifier is built as the combination of all the weak (linear) classifiers.
[Figure: four weak classifiers f1, f2, f3, f4 jointly partitioning the plane.]
13. An example to show how AdaBoost works
Training: present ten samples to the system: [x_i = {u_i, v_i}, y_i = '+' or '−'];
5 +ve (blue, diamond) samples and 5 −ve (red, circle) samples. Train up the system.
Detection: give an input x_j = (1.5, 3.4); the system will tell you whether it is '+' or '−'. E.g. face or non-face.
Example: u = weight, v = height; the classification task is suitability to play in the basketball team.
[Figure: the ten training samples plotted against the u-axis and v-axis, e.g. [x_i = {−0.48, 0}, y_i = '+'], [x_i = {−0.2, −0.5}, y_i = '+'].]
14. AdaBoost concept
Objective: train a classifier to classify an unknown input to see if it is a circle or a square.
Training data: 6 squares, 5 circles.
Using this training data, how do we make a classifier? One axis-parallel weak classifier cannot achieve 100% classification: e.g. h1( ), h2( ), h3( ) all fail. That means no matter how you place the decision line (horizontally or vertically), you cannot get a 100% classification result. You may try it yourself!
The solution is a complex strong classifier H_complex( ). Such a strong classifier should work, but how can we find it?
ANSWER: combine many weak classifiers to achieve it.
15. How? Each classifier may not be perfect, but each can achieve an over-50% correct rate.
Combine the weak classifiers h1( ), h2( ), ..., h7( ) to form the final strong classifier:
H(x) = sign( Σ_{t=1..T} α_t h_t(x) )
with a weight α_i for each weak classifier, i = 1, 2, ..., 7.
[Figure: the seven weak classifiers h1( )-h7( ) and the combined classification result.]
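As a concrete illustration of this weighted vote, here is a minimal Python sketch; the name strong_classify and the argument layout are assumptions for illustration.

```python
# H(x) = sign( sum_t alpha_t * h_t(x) ): each weak classifier h_t votes,
# weighted by its confidence alpha_t; the sign of the total decides.
def strong_classify(x, weak_classifiers, alphas):
    vote = sum(a * h(x) for h, a in zip(weak_classifiers, alphas))
    return 1 if vote >= 0 else -1
```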
16. AdaBoost Algorithm
Given: (x_1, y_1), ..., (x_n, y_n), where x_i ∈ X, y_i ∈ {−1, +1}.
Initialization: initialize the (weight) distribution D_1(i) = 1/n, such that n = M + L, where M = number of positive (+1) examples and L = number of negative (−1) examples.
Main training loop. For t = 1, ..., T:
{ Step 1a: Find the classifier h_t : X → {−1, +1} that minimizes the error with respect to the distribution D_t; that means h_t = argmin_{h_j} ε_j, where
error ε_j = Σ_{i=1..n} D_t(i) · I(h_j(x_i) ≠ y_i), with I = 1 if h_j(x_i) ≠ y_i (classified incorrectly), 0 otherwise.
Step 1b: checking step (prerequisite): ε_t < 0.5 (an error smaller than 0.5 is ok), otherwise stop.
Step 2: α_t = (1/2) ln((1 − ε_t) / ε_t), the weight (or confidence value).
Step 3: D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t (see the next slide for an explanation).
Step 4: Current total cascaded-classifier error E_t = (1/n) Σ_{i=1..n} E_t^i, while ε_t is the current classifier error; E_t^i is defined as follows:
If x_i is correctly classified by the current cascaded classifier, i.e. sign(Σ_{j=1..t} α_j h_j(x_i)) = y_i, hence error E_t^i = 0.
If x_i is incorrectly classified by the current cascaded classifier, i.e. sign(Σ_{j=1..t} α_j h_j(x_i)) ≠ y_i, hence error E_t^i = 1.
If E_t = 0 then T = t, break; }
The output: the final strong classifier H(x) = sign( Σ_{t=1..T} α_t h_t(x) ).
(Initialization, main training loop, and the final strong classifier: see enlarged versions in the following slides.)
17. The normalization factor Z_t
Z_t normalizes D so that D_{t+1} is a probability distribution:
Z_t = correct_weight + incorrect_weight
= Σ_{i: correctly classified} D_t(i) e^{−α_t} + Σ_{i: incorrectly classified} D_t(i) e^{+α_t}
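The whole loop above fits in a short Python sketch. This assumes the weak-classifier pool is given as plain functions h(x) → {−1, +1} (e.g. the axis-parallel stumps used later in these slides); the names adaboost_train, candidates, etc. are illustrative, not from the slides.

```python
import math

def adaboost_train(X, y, candidates, T):
    """X: list of samples; y: labels in {-1, +1};
    candidates: pool of weak classifiers h(x) -> {-1, +1}."""
    n = len(X)
    D = [1.0 / n] * n                          # Initialization: D_1(i) = 1/n
    alphas, chosen = [], []
    for t in range(T):
        # Step 1a: pick the h_t that minimizes the weighted error over D_t
        def weighted_error(h):
            return sum(D[i] for i in range(n) if h(X[i]) != y[i])
        h_t = min(candidates, key=weighted_error)
        eps = weighted_error(h_t)
        if eps >= 0.5:                         # Step 1b: must beat chance
            break
        # Step 2: confidence weight (the max() guards log(1/0) when eps = 0)
        alpha = 0.5 * math.log((1 - eps) / max(eps, 1e-12))
        alphas.append(alpha)
        chosen.append(h_t)
        # Step 3: re-weight every sample, then normalize by Z_t
        D = [D[i] * math.exp(-alpha * y[i] * h_t(X[i])) for i in range(n)]
        Z = sum(D)
        D = [d / Z for d in D]
        # Step 4: stop once the cascaded classifier gets every sample right
        def H(x):
            return 1 if sum(a * h(x) for a, h in zip(alphas, chosen)) >= 0 else -1
        if all(H(X[i]) == y[i] for i in range(n)):
            break
    return alphas, chosen
```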
18. Main loop (steps 1, 2, 3)
For t = 1, ..., T:
{ Step 1a: Find the classifier h_t : X → {−1, +1} that minimizes the error with respect to the distribution D_t; that means h_t = argmin_{h_j} ε_j, where
error ε_j = Σ_{i=1..n} D_t(i) · I(h_j(x_i) ≠ y_i), with I = 1 if h_j(x_i) ≠ y_i (classified incorrectly), 0 otherwise.
Step 1b: checking step (prerequisite): ε_t < 0.5 (an error smaller than 0.5 is ok), otherwise stop.
Step 2: α_t = (1/2) ln((1 − ε_t) / ε_t), the weight (or confidence value).
Step 3: D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t (see the next slide for an explanation).
19. Main loop (step 4)
Step 4: Current total cascaded-classifier error E_t = (1/n) Σ_{i=1..n} E_t^i, where E_t^i is defined as follows:
If x_i is correctly classified by the current cascaded classifier, i.e. sign(Σ_{j=1..t} α_j h_j(x_i)) = y_i, hence error E_t^i = 0.
If x_i is incorrectly classified by the current cascaded classifier, i.e. sign(Σ_{j=1..t} α_j h_j(x_i)) ≠ y_i, hence error E_t^i = 1.
If E_t = 0 then T = t, break; }
The final strong classifier: H(x) = sign( Σ_{t=1..T} α_t h_t(x) ).
20. AdaBoost chooses this weight-update function deliberately
Recall Step 3: D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t.
Because α_t > 0 whenever ε_t < 0.5:
• when a training sample is correctly classified (y_i h_t(x_i) = +1), its weight is multiplied by e^{−α_t} < 1, so the weight decreases;
• when a training sample is incorrectly classified (y_i h_t(x_i) = −1), its weight is multiplied by e^{+α_t} > 1, so the weight increases.
Note: the normalization factor Z_t in step 3 makes D_{t+1} a probability distribution:
Z_t = correct_weight + incorrect_weight = Σ_{i: correctly classified} D_t(i) e^{−α_t} + Σ_{i: incorrectly classified} D_t(i) e^{+α_t}
21. Note: stopping criterion of the main loop
The main loop stops when all training data are correctly classified by the cascaded classifier up to stage t:
If E_t = 0 then T = t, break; }
where the current total cascaded-classifier error E_t = (1/n) Σ_{i=1..n} E_t^i, and E_t^i = 0 if x_i is correctly classified by the current cascaded classifier (sign(Σ_{j=1..t} α_j h_j(x_i)) = y_i), E_t^i = 1 if incorrectly classified (sign(Σ_{j=1..t} α_j h_j(x_i)) ≠ y_i).
22. D_t(i) = weight
D_t(i) is the probability distribution value of the i-th training sample at time t, i = 1, 2, ..., n. It shows how much you trust this sample.
At t = 1, all samples are treated the same, with equal weight: D_{t=1}(i) is the same for all i.
At t > 1, D_{t>1}(i) will be modified, as we will see later.
23. An example to show how AdaBoost works
Training: present ten samples to the system: [x_i = {u_i, v_i}, y_i = '+' or '−'];
5 +ve (blue, diamond) samples and 5 −ve (red, circle) samples. Train up the classification system.
Detection example: give an input x_j = (1.5, 3.4); the system will tell you whether it is '+' or '−'. E.g. face or non-face.
Example: you may treat u = weight, v = height; the classification task is suitability to play in the basketball team.
[Figure: the ten training samples on the u-axis and v-axis, e.g. [x_i = {−0.48, 0}, y_i = '+'], [x_i = {−0.2, −0.5}, y_i = '+'].]
24. Initialization
M = 5 +ve (blue, diamond) samples; L = 5 −ve (red, circle) samples; n = M + L = 10.
Initialize the weights: D_{t=1}(i) = 1/10 for all i = 1, 2, ..., 10.
So D_1(1) = 0.1, D_1(2) = 0.1, ..., D_1(10) = 0.1.
(Recall: Given (x_1, y_1), ..., (x_n, y_n) where x_i ∈ X, y_i ∈ {−1, +1}, initialize D_1(i) = 1/n, such that n = M + L; M positive examples, L negative examples.)
25. Select h( ): for simplicity in implementation, we use the axis-parallel weak classifier
Recall: h_t(x) = +1 if p_t f_t(x) < p_t θ_t, −1 otherwise,
where polarity p_t ∈ {+1 or −1}, θ_t = c is the threshold, and f_t is the function f_t(x = [u, v], m, c) = v − mu; m, c are constants, u, v are variables.
Axis-parallel weak classifiers:
• h_a(x) is a line of gradient m = 0 (horizontal line); the position of the line can be controlled by v_0.
• h_b(x) is a line of gradient m = ∞ (vertical line); the position of the line can be controlled by u_0.
[Figure: a horizontal separator h_a(x) at v = v_0 and a vertical separator h_b(x) at u = u_0 in the u-v plane.]
26. Steps 1a, 1b
Assume h( ) can only be a horizontal or vertical separator (axis-parallel weak classifier).
There are still many ways to set h( ); here, if this h_q( ) is selected, there will be 3 incorrectly classified training samples. See the 3 circled training samples.
We can go through all the h( )s and select the best one, with the least misclassification (see the following 2 slides).
{ Step 1a: Find the classifier h_t : X → {−1, +1} that minimizes the error with respect to the distribution D_t; that means h_t = argmin_{h_q} ε_q.
Step 1b: checking step (prerequisite): ε_t < 0.5 (an error smaller than 0.5 is ok), otherwise stop. }
[Figure: a candidate h_q( ), with the 3 training samples incorrectly classified by h_q( ) circled.]
27. Example: training (example slides from [Smyth 2007]); classify the ten red (circle) / blue (diamond) dots
Step 1a:
h_i(x) = +1 if p u < p u_i, −1 otherwise, with polarity p ∈ {+1, −1};
(x's v, v_i) is not used because h_i is parallel to the vertical axis.
Initialize: D_n(t=1) = 1/10.
You may choose one of the following axis-parallel (vertical-line) classifiers; the vertical dotted lines are possible choices:
h_{i=1}(x), ..., h_{i=4}(x), ..., h_{i=9}(x) at positions u_1, u_2, ..., u_9 along the u-axis.
There are 9 × 2 choices here: h_{i=1,2,3,...,9} (polarity +1) and h'_{i=1,2,3,...,9} (polarity −1).
28. Example: training (example slides from [Smyth 2007]); classify the ten red (circle) / blue (diamond) dots
Step 1a:
h_j(x) = +1 if p v < p v_j, −1 otherwise, with polarity p ∈ {+1, −1};
(x's u, u_j) is not used because h_j is parallel to the horizontal axis.
Initialize: D_n(t=1) = 1/10.
You may choose one of the following axis-parallel (horizontal-line) classifiers; the horizontal dotted lines are possible choices:
h_{j=1}(x), h_{j=2}(x), ..., h_{j=4}(x), ..., h_{j=9}(x) at positions v_1, v_2, ..., v_9 along the v-axis.
There are 9 × 2 choices here: h_{j=1,2,3,...,9} (polarity +1) and h'_{j=1,2,3,...,9} (polarity −1).
All together, including the previous slide: 36 choices.
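A small Python sketch of enumerating these 36 candidates; make_stump and the cut positions are illustrative assumptions (in the slides, the dotted lines lie between consecutive sample coordinates).

```python
# Build the 9 x 2 vertical plus 9 x 2 horizontal candidate stumps.
def make_stump(axis, threshold, polarity):
    # axis 0 = u (vertical separating line), axis 1 = v (horizontal line)
    def h(x):
        return 1 if polarity * x[axis] < polarity * threshold else -1
    return h

u_cuts = [0.1 * k for k in range(1, 10)]   # u_1..u_9 (illustrative positions)
v_cuts = [0.1 * k for k in range(1, 10)]   # v_1..v_9 (illustrative positions)
candidates = [make_stump(axis, t, p)
              for axis, cuts in ((0, u_cuts), (1, v_cuts))
              for t in cuts
              for p in (+1, -1)]
print(len(candidates))                     # 36
```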
29. Step 1b: find and check the error of the weak classifier h( )
To evaluate how successful your selected weak classifier h( ) is, we can evaluate its error rate:
ε_t = misclassification probability of h( ) = Σ_{i=1..n} D_t(i) · I(h_t(x_i) ≠ y_i), where I = 1 if h_t(x_i) ≠ y_i (classified incorrectly), 0 otherwise.
Checking: if ε_t ≥ 0.5 (something is wrong), stop the training, because by definition a weak classifier should be slightly better than a random choice (probability = 0.5).
So if ε_t ≥ 0.5, your h( ) is a bad choice; redesign another h″( ) and do the training based on the new h″( ).
(Step 1b, checking step, prerequisite: ε_t < 0.5, otherwise stop.)
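In code, this check is a one-line sum over the weight distribution (a sketch; the names are illustrative):

```python
# eps_t: total weight D(i) of the samples that h misclassifies.
def weighted_error(h, X, y, D):
    return sum(d for x_i, y_i, d in zip(X, y, D) if h(x_i) != y_i)

# Usage: if weighted_error(h, X, y, D) >= 0.5, reject this h( ) and
# pick (or design) another weak classifier.
```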
30. Assume h( ) can only be a horizontal or vertical separator
How many different classifiers are available?
If h_j( ) is selected as shown, circle the misclassified training samples. Find ε( ) to see the misclassification probability when the probability distribution (D) is the same for each sample.
Find the h( ) with minimum error.
(Step 1a: find the classifier h_t : X → {−1, +1} that minimizes the error with respect to the distribution D_t. Step 1b: checking step, prerequisite: ε_t < 0.5, otherwise stop.)
[Figure: a selected h_j( ) with its misclassified training samples circled.]
31. Result of step 2 at t = 1
[Figure: the selected classifier h_{t=1}(x), with the samples incorrectly classified by h_{t=1}(x) circled.]
32. Step 2 at t = 1 (refer to the previous slide)
Use ε_{t=1} = 0.3, because 3 samples are incorrectly classified:
ε_t = Σ_{i=1..n} D_t(i) · I(h_t(x_i) ≠ y_i) = 0.1 + 0.1 + 0.1 = 0.3, where I = 1 if h_t(x_i) ≠ y_i, 0 otherwise.
Step 2: α_t = (1/2) ln((1 − ε_t) / ε_t), where ε_t is the weighted error rate of classifier h_t.
So α_{t=1} = (1/2) ln((1 − 0.3) / 0.3) = 0.424.
The proof can be found at http://vision.ucsd.edu/~bbabenko/data/boosting_note.pdf. Also see the appendix.
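A quick numerical check of this arithmetic in plain Python, using only the slide's numbers:

```python
import math
eps = 0.3                                  # 0.1 + 0.1 + 0.1: three samples wrong
alpha = 0.5 * math.log((1 - eps) / eps)    # Step 2
print(round(alpha, 3))                     # 0.424
```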
33. Step 3 at t = 1: update D_t to D_{t+1}
Update the weight D_t(i) for each training sample i:
Step 3: D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t,
where Z_t is a normalization factor, so that D_{t+1} is a distribution (a probability function).
The proof can be found at http://vision.ucsd.edu/~bbabenko/data/boosting_note.pdf. Also see the appendix.
34. Step 3: find first Z_t (the normalization factor). Note that D_{t=1} = 0.1 and α_{t=1} = 0.424
Z_t = correct_weight + incorrect_weight
= Σ_{i: y_i = h(x_i)} D_t(i) e^{−α_t} + Σ_{i: y_i ≠ h(x_i)} D_t(i) e^{+α_t}
(Correctly classified: y_i h(x_i) = +1, so put −α_t in the exponent; incorrectly classified: y_i h(x_i) = −1, so put +α_t in the exponent.)
Currently t = 1 and D_{t=1}(i) = 0.1 for all i; there are 7 correctly classified and 3 incorrectly classified samples, so:
Z_{t=1} = 0.1 × 7 × e^{−0.424} + 0.1 × 3 × e^{+0.424}
= 0.1 × 7 × 0.65 + 0.1 × 3 × 1.52
= 0.455 + 0.456 = 0.911
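The same computation in Python. Note the slide rounds e^{−0.424} and e^{+0.424} to 0.65 and 1.52, which is why it reports 0.911; the unrounded total is about 0.917.

```python
import math
alpha, d = 0.424, 0.1          # alpha from Step 2; every weight is 0.1 at t = 1
Z = 7 * d * math.exp(-alpha) + 3 * d * math.exp(alpha)
print(round(Z, 3))             # 0.917 (0.911 on the slide, after rounding)
```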
35. Step 3: example: update D_t to D_{t+1}
If a sample is correctly classified, its weight D_{t+1} will decrease, and vice versa. Since Z_{t=1} = 0.911, e^{−0.42} = 0.65 and e^{+0.42} = 1.52:
D_{t+1, correct}(i) = D_t(i) e^{−α_t} / Z_t = 0.1 × e^{−0.42} / 0.911 = 0.1 × 0.65 / 0.911 = 0.0714 (decrease)
D_{t+1, incorrect}(i) = D_t(i) e^{+α_t} / Z_t = 0.1 × e^{+0.42} / 0.911 = 0.1 × 1.52 / 0.911 = 0.167 (increase)
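Checking the updated weights in Python (using the slide's rounded Z for comparison); up to rounding, the ten new weights sum to 1, confirming that D_{t+1} is again a distribution.

```python
import math
alpha, Z = 0.424, 0.911                       # the slide's values
d_correct = 0.1 * math.exp(-alpha) / Z        # ~0.0719 (slide: 0.0714)
d_wrong = 0.1 * math.exp(alpha) / Z           # ~0.168  (slide: 0.167)
print(round(7 * d_correct + 3 * d_wrong, 3))  # 1.006: ~1, the gap is only Z's rounding
```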
36. Now run the main training loop a second time (t = 2)
From t = 1, the updated weights are:
D_{t+1, correct}(i) = 0.1 × e^{−0.42} / 0.911 = 0.0714
D_{t+1, incorrect}(i) = 0.1 × e^{+0.42} / 0.911 = 0.167
[Figure: the training samples re-plotted with these new weights.]
37. Now run the main training loop a second time (t = 2), and then a third time (t = 3)
[Figure: the final classifier formed by combining the three weak classifiers.]
38. Combined classifier for t = 1, 2, 3
Combine to form the classifier (one more step may be needed for the final classifier):
H(x) = sign( Σ_{t=1..T} α_t h_t(x) )
H_{t=1..3}(x) = sign( 0.424 · h_1(x) + α_2 h_2(x) + α_3 h_3(x) )
Exercise: work out α_2 and α_3.
[Figure: the three weak classifiers h_{t=1}( ), h_{t=2}( ), h_{t=3}( ) and the decision regions 1, 2, 3.]
47. Example
• A feature's value is calculated as the difference between the sums of the pixels within the white and black rectangle regions:
f_i = Sum(r_{i,white}) − Sum(r_{i,black})
h_i(x) = +1 if f_i > threshold, −1 if f_i ≤ threshold
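A minimal Python sketch of evaluating such a rectangle feature on an image stored as a list of pixel rows; the two-rectangle layout and the threshold are illustrative assumptions (real detectors use an integral image to make each rectangle sum O(1)).

```python
def rect_sum(img, top, left, h, w):
    # Direct sum of the pixel values inside one rectangle.
    return sum(img[r][c] for r in range(top, top + h)
                         for c in range(left, left + w))

def haar_feature(img, white, black):
    # f_i = Sum(r_i,white) - Sum(r_i,black); white/black = (top, left, h, w)
    return rect_sum(img, *white) - rect_sum(img, *black)

def weak_classify(img, white, black, threshold):
    f = haar_feature(img, white, black)
    return 1 if f > threshold else -1      # h_i(x) as defined above
```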