AdaBoost is an ensemble learning algorithm that combines multiple weak learners into a single strong learner. It works in rounds, increasing the relative weight of the examples that earlier rounds misclassified. Each weak learner is trained on the reweighted data and only needs to be slightly better than random guessing, that is, its weighted error must be below 1/2. In each round AdaBoost computes the weak learner's weighted error rate and derives a vote weight from it; the final strong learner combines the predictions of all weak learners by a weighted majority vote. The algorithm stops when the error rate stops decreasing or the maximum number of rounds is reached.
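3. • Classes: 1 and 0
• N training examples
• Training examples $(X_i, c_i)$ $(i = 1, \dots, N)$; for example, example C:
$X_C = (\text{No}, \text{Yes}, \text{Yes}, \text{Yes})$, $c_C = 1$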
4. • R: number of rounds
• $w_i^1$: initial weight of example $i$
• With N examples, $w_i^1 = 1/N$ $(i = 1, \dots, N)$
• With 10 examples, $w_i^1 = 1/10$
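5. • For t = 1, ..., R:
1. Normalize the weights. The distribution over the examples at round $t$ is
$$p_i^t = \frac{w_i^t}{\sum_{i=1}^{N} w_i^t}$$
• At t = 1, $p_i^1 = w_i^1 = 1/10$, because $\sum_{i=1}^{N} w_i^1 = 1$
2. Call WeakLearner to obtain a hypothesis $h_t$ whose weighted error $\varepsilon_t$ is below 1/2:
$$\varepsilon_t = \sum_{i=1}^{N} p_i^t \, |h_t(X_i) - c_i| < 1/2$$
where
$$|h_t(X_i) - c_i| = \begin{cases} 0, & h_t(X_i) = c_i \\ 1, & \text{otherwise} \end{cases}$$
Continue with Step 3.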
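6. • Example for t = 1: the hypothesis $h_1$ returned by WeakLearner
• IDs A to F: $h_1(X_i) = 1$
• IDs G, H, I, J: $h_1(X_i) = 0$
• E and F are misclassified: $h_1(X_i) = 1$ but $c_i = 0$
• 2 of the 10 examples are misclassified and $p_i^1 = 1/10$, so
$$\varepsilon_1 = 1/10 \times 2 = 1/5 < 1/2$$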
T2
WeakLearner
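The normalization and weighted-error steps can be checked with a few lines of Python. This is only a sketch: the dictionaries `c` and `h1` are a hypothetical encoding of the 10-example illustration, and the true classes of A to D (1) and G to J (0) are assumptions chosen so that E and F are the only two mistakes.

```python
ids = list("ABCDEFGHIJ")

# Assumed true classes: A-D are 1, E-J are 0 (so h_1 is wrong on E and F only).
c = {i: 1 if i in "ABCD" else 0 for i in ids}
# h_1 predicts 1 for A-F and 0 for G-J.
h1 = {i: 1 if i in "ABCDEF" else 0 for i in ids}

# Initial weights w_i^1 = 1/N.
w = {i: 1.0 / len(ids) for i in ids}

# Step 1: normalise the weights into a distribution p_i^t.
total = sum(w.values())
p = {i: w[i] / total for i in ids}

# Step 2: weighted error eps_t = sum_i p_i^t * |h_t(X_i) - c_i|.
eps1 = sum(p[i] * abs(h1[i] - c[i]) for i in ids)

print(eps1)
```

Up to floating-point rounding, `eps1` should come out as 1/5, matching the hand calculation.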
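7. 3. Update the weights $w_i^{t+1}$ using $\beta_t$:
$$\beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t}$$
Since $0 \le \varepsilon_t < 1/2$, it follows that $0 \le \beta_t < 1$.
• Update with $\beta_t$:
$$w_i^{t+1} = w_i^t \, \beta_t^{\,1 - |h_t(X_i) - c_i|}$$
• The smaller $\varepsilon_t$ is, the smaller $\beta_t$ is
• Examples that the WeakLearner hypothesis $h_t$ classifies correctly have their weights multiplied by $\beta_t < 1$; misclassified examples keep their weights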
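8. • Example for t = 1:
$$\varepsilon_1 = 0.2, \qquad \beta_1 = \frac{\varepsilon_1}{1 - \varepsilon_1} = 0.2/0.8 = 0.25$$
• A to D and G to J are classified correctly by $h_t$; E and F are misclassified
$$w_A^2 = w_B^2 = w_C^2 = w_D^2 = w_G^2 = w_H^2 = w_I^2 = w_J^2 = 1/10 \times \beta_1^{1} = 0.025$$
$$w_E^2 = w_F^2 = 1/10 \times \beta_1^{0} = 0.1$$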
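The weight update for this example can be reproduced numerically. The sketch below is illustrative only: `update_weights` is a hypothetical helper name, and the true classes (A to D are 1, E to J are 0) are assumed so that E and F are the two misclassified examples.

```python
ids = list("ABCDEFGHIJ")
c = {i: 1 if i in "ABCD" else 0 for i in ids}      # assumed true classes
h1 = {i: 1 if i in "ABCDEF" else 0 for i in ids}   # h_1: 1 for A-F, 0 for G-J


def update_weights(w, h, c, eps):
    """Step 3: w_i^{t+1} = w_i^t * beta^(1 - |h(X_i) - c_i|), beta = eps / (1 - eps)."""
    beta = eps / (1.0 - eps)
    new_w = {i: w[i] * beta ** (1 - abs(h[i] - c[i])) for i in w}
    return new_w, beta


w1 = {i: 0.1 for i in ids}                  # w_i^1 = 1/10
w2, beta1 = update_weights(w1, h1, c, eps=0.2)

# Distribution used in round 2 (Step 1 of the next iteration).
total = sum(w2.values())
p2 = {i: w2[i] / total for i in ids}

print(beta1, w2["A"], w2["E"], p2["A"], p2["E"])
```

After normalization E and F together hold half of the distribution, so the next weak learner is pushed towards the two examples that $h_1$ got wrong.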
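16. AdaBoost: error bound
• The error $\varepsilon$ of the final hypothesis $h_f$:
$$\varepsilon = \sum_{\{i \mid h_f(X_i) \neq y_i\}} D(i)$$
• $D(i)$: the initial distribution over the examples
• After R rounds, with $\varepsilon_t$ the error at round $t$:
$$\varepsilon \le 2^R \prod_{t=1}^{R} \sqrt{\varepsilon_t (1 - \varepsilon_t)}$$
• Because $\varepsilon_t < 1/2$ at every round $t$:
$$2\sqrt{\varepsilon_t (1 - \varepsilon_t)} < 1$$
• Each additional call to WeakLearner therefore shrinks the bound on $\varepsilon$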
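To get a feel for how quickly the bound shrinks, it can be evaluated for a given sequence of per-round errors. The function name `adaboost_error_bound` and the sample error values are assumptions made for illustration.

```python
import math


def adaboost_error_bound(eps_list):
    """Return prod_t 2*sqrt(eps_t*(1 - eps_t)), i.e. 2^R * prod_t sqrt(eps_t*(1 - eps_t))."""
    bound = 1.0
    for eps in eps_list:
        if not 0.0 <= eps < 0.5:
            raise ValueError("each round needs a weighted error in [0, 1/2)")
        bound *= 2.0 * math.sqrt(eps * (1.0 - eps))
    return bound


one_round = adaboost_error_bound([0.2])        # 2 * sqrt(0.16)
five_rounds = adaboost_error_bound([0.2] * 5)  # the same factor applied five times
print(one_round, five_rounds)
```

With $\varepsilon_t = 0.2$ every round contributes a factor of $2\sqrt{0.16} = 0.8$, so five rounds give a bound of about $0.8^5 \approx 0.33$.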
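17. • Lemma 5.1
• After R rounds, the weights $w_i^{R+1}$ satisfy
$$\sum_{i=1}^{N} w_i^{R+1} \ge \varepsilon \left( \prod_{t=1}^{R} \beta_t \right)^{1/2}$$
• Proof
• Restricting the sum to the misclassified examples cannot increase it:
$$\sum_{i=1}^{N} w_i^{R+1} \ge \sum_{\{i \mid h_f(X_i) \neq y_i\}} w_i^{R+1}$$
• Applying the update rule $w_i^{t+1} = w_i^t \, \beta_t^{\,1 - |h_t(X_i) - y_i|}$ for $t = 1, \dots, R$, starting from $w_i^1 = D(i)$:
$$\sum_{\{i \mid h_f(X_i) \neq y_i\}} w_i^{R+1} = \sum_{\{i \mid h_f(X_i) \neq y_i\}} D(i) \prod_{t=1}^{R} \beta_t^{\,1 - |h_t(X_i) - y_i|}$$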
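18. • For every example that $h_f$ misclassifies after R rounds ($h_f(X_i) \neq y_i$):
$$\prod_{t=1}^{R} \beta_t^{\,1 - |h_t(X_i) - y_i|} \ge \left( \prod_{t=1}^{R} \beta_t \right)^{1/2}$$
• $h_f$ can be wrong in two ways: $h_f(X_i) = 1$ with $y_i = 0$, or $h_f(X_i) = 0$ with $y_i = 1$
• Case $h_f(X_i) = 1$, $y_i = 0$. By the definition of $h_f$, $h_f(X_i) = 1$ means
$$\sum_{t=1}^{R} (-\log \beta_t)\, h_t(X_i) \ge \frac{1}{2} \sum_{t=1}^{R} (-\log \beta_t)$$
Adding $\sum_{t=1}^{R} \log \beta_t$ to both sides gives
$$\sum_{t=1}^{R} (\log \beta_t)(1 - h_t(X_i)) \ge \frac{1}{2} \sum_{t=1}^{R} \log \beta_t$$
Since $y_i = 0$, $1 - h_t(X_i) = 1 - |h_t(X_i) - y_i|$, so exponentiating both sides yields
$$\prod_{t=1}^{R} \beta_t^{\,1 - |h_t(X_i) - y_i|} \ge \left( \prod_{t=1}^{R} \beta_t \right)^{1/2}$$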
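19. • Case $h_f(X_i) = 0$, $y_i = 1$. By the definition of $h_f$, $h_f(X_i) = 0$ means
$$\sum_{t=1}^{R} (-\log \beta_t)\, h_t(X_i) < \frac{1}{2} \sum_{t=1}^{R} (-\log \beta_t)$$
Since $y_i = 1$, $h_t(X_i) = 1 - |h_t(X_i) - y_i|$. Multiplying by $-1$ and exponentiating both sides yields
$$\prod_{t=1}^{R} \beta_t^{\,1 - |h_t(X_i) - y_i|} > \left( \prod_{t=1}^{R} \beta_t \right)^{1/2}$$
• So in both cases ($h_f(X_i) = 1$, $y_i = 0$ and $h_f(X_i) = 0$, $y_i = 1$):
$$\prod_{t=1}^{R} \beta_t^{\,1 - |h_t(X_i) - y_i|} \ge \left( \prod_{t=1}^{R} \beta_t \right)^{1/2}$$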
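21. • Hence the sum over the misclassified examples is at least
$$\sum_{\{i \mid h_f(X_i) \neq y_i\}} D(i) \left( \prod_{t=1}^{R} \beta_t \right)^{1/2} = \varepsilon \cdot \left( \prod_{t=1}^{R} \beta_t \right)^{1/2}$$
• Lemma 5.2
$$\sum_{i=1}^{N} w_i^{t+1} \le \left( \sum_{i=1}^{N} w_i^t \right) \times 2\varepsilon_t$$
• Auxiliary inequality: for $\alpha \ge 0$ and $r \in \{0, 1\}$,
$$\alpha^r \le 1 - (1 - \alpha) r$$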
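22. $$\begin{aligned}
\sum_{i=1}^{N} w_i^{t+1} &= \sum_{i=1}^{N} w_i^t \, \beta_t^{\,1 - |h_t(X_i) - y_i|} \\
&\le \sum_{i=1}^{N} w_i^t \left( 1 - (1 - \beta_t)(1 - |h_t(X_i) - y_i|) \right) \\
&= \sum_{i=1}^{N} w_i^t - (1 - \beta_t) \left( \sum_{i=1}^{N} w_i^t - \sum_{i=1}^{N} w_i^t \, |h_t(X_i) - y_i| \right) \\
&= \sum_{i=1}^{N} w_i^t - (1 - \beta_t) \left( \sum_{i=1}^{N} w_i^t - \varepsilon_t \sum_{i=1}^{N} w_i^t \right) \\
&= \sum_{i=1}^{N} w_i^t - (1 - \beta_t) \sum_{i=1}^{N} w_i^t \, (1 - \varepsilon_t) \\
&= \left( \sum_{i=1}^{N} w_i^t \right) \times \left( 1 - (1 - \beta_t)(1 - \varepsilon_t) \right)
\end{aligned}$$
Substituting $\beta_t = \varepsilon_t / (1 - \varepsilon_t)$ gives $(1 - \beta_t)(1 - \varepsilon_t) = 1 - 2\varepsilon_t$, so the last line equals
$$\left( \sum_{i=1}^{N} w_i^t \right) \times 2\varepsilon_t$$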
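The derivation can be sanity-checked numerically. The sketch below draws arbitrary positive weights, labels and predictions (all hypothetical test data, generated with a fixed seed), applies one update step and compares the new total weight with $2\varepsilon_t$ times the old total. Because $|h_t(X_i) - y_i|$ only takes the values 0 and 1, the auxiliary inequality holds with equality here, so the two quantities should agree up to rounding.

```python
import random

random.seed(0)
N = 20
w = [random.random() + 0.01 for _ in range(N)]   # arbitrary positive weights w_i^t
y = [random.randint(0, 1) for _ in range(N)]     # arbitrary labels
h = [random.randint(0, 1) for _ in range(N)]     # arbitrary predictions of h_t


def weighted_error(w, h, y):
    """eps_t = sum_i p_i |h_i - y_i| with p_i = w_i / sum(w)."""
    total = sum(w)
    return sum(wi * abs(hi - yi) for wi, hi, yi in zip(w, h, y)) / total


eps = weighted_error(w, h, y)
if eps >= 0.5:
    # Flip the hypothesis so that it is at least as good as random guessing.
    h = [1 - hi for hi in h]
    eps = weighted_error(w, h, y)

beta = eps / (1.0 - eps)
w_next = [wi * beta ** (1 - abs(hi - yi)) for wi, hi, yi in zip(w, h, y)]

lhs = sum(w_next)          # sum_i w_i^{t+1}
rhs = 2.0 * eps * sum(w)   # (sum_i w_i^t) * 2 * eps_t
print(lhs, rhs)
```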
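23. • Theorem 5.3
• Suppose that at each round $t$ WeakLearner returns a hypothesis with error $\varepsilon_t$. Then the error $\varepsilon$ of the final hypothesis $h_f$ satisfies
$$\varepsilon \le 2^R \prod_{t=1}^{R} \sqrt{\varepsilon_t (1 - \varepsilon_t)}$$
• Proof: by Lemma 5.1,
$$\varepsilon \left( \prod_{t=1}^{R} \beta_t \right)^{1/2} \le \sum_{i=1}^{N} w_i^{R+1}$$
By Lemma 5.2, applied first for $t = R$ and then for $t = R - 1, R - 2, \dots, 1$, and using $\sum_{i=1}^{N} w_i^1 = 1$:
$$\sum_{i=1}^{N} w_i^{R+1} \le \left( \sum_{i=1}^{N} w_i^{R} \right) \times 2\varepsilon_R \le \dots \le \left( \sum_{i=1}^{N} w_i^{1} \right) \prod_{t=1}^{R} 2\varepsilon_t = \prod_{t=1}^{R} 2\varepsilon_t$$
Combining the two inequalities and substituting $\beta_t = \varepsilon_t / (1 - \varepsilon_t)$:
$$\varepsilon \le \prod_{t=1}^{R} 2\varepsilon_t \times \left( \prod_{t=1}^{R} \beta_t \right)^{-1/2} = \prod_{t=1}^{R} 2\sqrt{\varepsilon_t (1 - \varepsilon_t)} = 2^R \prod_{t=1}^{R} \sqrt{\varepsilon_t (1 - \varepsilon_t)}$$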
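The whole procedure and the bound can be tried end to end on a toy problem. The following is a minimal sketch under several assumptions that are not part of the slides: the weak learner is a one-dimensional threshold stump, the data set `xs`, `ys` is made up, and the final hypothesis is taken to predict 1 when $\sum_t (-\log \beta_t) h_t(x) \ge \frac{1}{2} \sum_t (-\log \beta_t)$, the weighted vote used in the proof of Lemma 5.1.

```python
import math


def stump_predict(x, theta, polarity):
    """Threshold stump: predicts 1 on one side of theta and 0 on the other."""
    return 1 if polarity * (x - theta) > 0 else 0


def weak_learner(xs, ys, p):
    """Return (eps, theta, polarity) of the stump with the lowest weighted error."""
    best = None
    thetas = [x - 0.5 for x in xs] + [max(xs) + 0.5]
    for theta in thetas:
        for polarity in (1, -1):
            eps = sum(pi * abs(stump_predict(x, theta, polarity) - y)
                      for pi, x, y in zip(p, xs, ys))
            if best is None or eps < best[0]:
                best = (eps, theta, polarity)
    return best


def adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n              # w_i^1 = D(i) = 1/N
    model, eps_history = [], []    # model holds (-log beta_t, theta, polarity)
    for _ in range(rounds):
        total = sum(w)
        p = [wi / total for wi in w]                     # Step 1: normalise
        eps, theta, polarity = weak_learner(xs, ys, p)   # Step 2: weak hypothesis
        if eps >= 0.5:
            break                  # weak learner is no better than chance
        eps = max(eps, 1e-12)      # guard added in this sketch to avoid log(0)
        beta = eps / (1.0 - eps)
        # Step 3: shrink the weights of correctly classified examples.
        w = [wi * beta ** (1 - abs(stump_predict(x, theta, polarity) - y))
             for wi, x, y in zip(w, xs, ys)]
        model.append((-math.log(beta), theta, polarity))
        eps_history.append(eps)
    return model, eps_history


def predict(model, x):
    """Weighted majority vote of the stored stumps."""
    vote = sum(a * stump_predict(x, th, pol) for a, th, pol in model)
    return 1 if vote >= 0.5 * sum(a for a, _, _ in model) else 0


xs = list(range(10))
ys = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0]   # not separable by a single stump
model, eps_history = adaboost(xs, ys, rounds=10)
train_error = sum(predict(model, x) != y for x, y in zip(xs, ys)) / len(xs)
bound = math.prod(2.0 * math.sqrt(e * (1.0 - e)) for e in eps_history)
print(train_error, bound)
```

The training error is measured under the uniform initial distribution $D(i) = 1/N$, so by Theorem 5.3 it should never exceed `bound`, whatever stumps the weak learner happens to pick.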