Combining weak classifiers into a strong one

Published in: Engineering
1. AdaBoost Classifier (Hank, 2013/06/26)
2. Contents
    • Concept
    • Code Tracing
    • Haar + AdaBoost
    • Notice
    • Usage
    • Appendix
3. Adaboost v.2c: Concept - Classifier
    • Training procedure: give positive (+ve) and negative (-ve) examples to the system, and the system learns to classify an unknown input. E.g., give pictures of faces (+ve examples) and non-faces (-ve examples) to train the system.
    • Detection procedure: input an unknown (e.g., an image), and the system tells you whether it is a face or not.
    [Figure: face / non-face]
4. (Updated) First, let us learn what a weak classifier h( ) is
    • The line v = mu + c (or v - mu = c) is defined by m (gradient) and c.
    • Any point in the gray area satisfies v - mu < c.
    • Any point in the white area satisfies v - mu > c.
    • Case 1: if a point x = [u, v] is in the "gray" area, then h(x) = 1, otherwise h(x) = -1. It can be written as:
      $$h(x) = \begin{cases} 1 & \text{if } v - mu < c \\ -1 & \text{otherwise} \end{cases} \quad \text{where } m, c \text{ are given constants}$$
    • Case 2: if a point x = [u, v] is in the "white" area, then h(x) = 1, otherwise h(x) = -1. It can be written as:
      $$h(x) = \begin{cases} 1 & \text{if } -(v - mu) < -c \\ -1 & \text{otherwise} \end{cases} \quad \text{where } m, c \text{ are given constants}$$
    • At time t, combine case 1 and case 2 into one equation, and use the polarity $p_t$ to control which case you want to use:
      $$h_t(x) = \begin{cases} 1 & \text{if } p_t f(x) < p_t c \\ -1 & \text{otherwise} \end{cases} \qquad (i)$$
      where the polarity $p_t \in \{1, -1\}$, f is the function $f(x = [u, v]) = v - mu$, m and c are constants, and u and v are variables. Equation (i) then becomes:
      $$h_t(x) = \begin{cases} 1 & \text{if } p_t (v - mu) < p_t c \\ -1 & \text{otherwise} \end{cases}$$
    [Figure: the line v = mu + c in the (u, v) plane, with gradient m and v-intercept c; regions v - mu < c and v - mu > c]
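The combined rule (i) can be sketched in a few lines of Python. This is only an illustration: the function name `weak_classify`, the line v = u, and the two test points are assumptions made for the example and do not come from the slides.

```python
def weak_classify(u, v, m, c, p):
    """Equation (i): return +1 if p*(v - m*u) < p*c, otherwise -1.

    m and c define the line v = m*u + c; p is the polarity (+1 or -1).
    With p = +1 the side v - m*u < c is labelled +1 (case 1);
    with p = -1 the side v - m*u > c is labelled +1 (case 2).
    """
    f = v - m * u
    return 1 if p * f < p * c else -1


# Hypothetical line v = 1*u + 0 and two hypothetical test points (u, v).
below = (2.0, 1.0)   # v - m*u = -1, i.e. v - m*u < c
above = (1.0, 3.0)   # v - m*u = +2, i.e. v - m*u > c

print(weak_classify(*below, m=1.0, c=0.0, p=+1))  # case 1: +1
print(weak_classify(*above, m=1.0, c=0.0, p=+1))  # case 1: -1
print(weak_classify(*above, m=1.0, c=0.0, p=-1))  # case 2: +1
```

Flipping the polarity swaps which side of the same line is labelled +1, so one formula covers both cases.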
5. Adaboost - Adaptive Boosting
    • Instead of resampling, it uses training set re-weighting.
    • Each training sample has a weight that determines the probability of it being selected for a training set.
    • AdaBoost is an algorithm for constructing a "strong" classifier as a linear combination of "simple", "weak" classifiers.
    • The final classification is based on a weighted vote of the weak classifiers.
6. Concept
    • Weak learners come from the family of lines.
    • Each data point has a class label $y_t \in \{+1, -1\}$ (drawn with two different markers) and a weight $w_t = 1$.
    • A line h with p(error) = 0.5 is at chance.
7. Concept
    • Each data point has a class label $y_t \in \{+1, -1\}$ and a weight $w_t = 1$.
    • This one seems to be the best.
    • This is a 'weak classifier': it performs slightly better than chance.
8-11. Concept (the same text is shown on four successive slides)
    • Each data point has a class label $y_t \in \{+1, -1\}$.
    • We update the weights: $w_t \leftarrow w_t \exp\{-y_t H_t\}$.
    • We set a new problem for which the previous weak classifier performs at chance again.
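The phrase "performs at chance again" can be checked numerically. The sketch below assumes that the classifier output is scaled as H = alpha * h with alpha = 0.5 * ln((1 - eps) / eps), the weight defined later in the algorithm slides. The labels and predictions are made-up toy data, not the data in the figures.

```python
import math

# Hypothetical toy data: labels y and weak-classifier outputs h, both +1/-1.
y = [+1, +1, +1, -1, -1, -1, -1, +1, -1, +1]
h = [+1, +1, -1, -1, -1, +1, -1, +1, -1, -1]   # 3 of 10 are wrong
w = [1.0] * len(y)                             # every point starts with weight 1


def weighted_error(w, y, h):
    """Fraction of the total weight that sits on misclassified points."""
    wrong = sum(wi for wi, yi, hi in zip(w, y, h) if yi != hi)
    return wrong / sum(w)


eps = weighted_error(w, y, h)
alpha = 0.5 * math.log((1 - eps) / eps)        # assumed scaling of the vote

# Weight update from the slide: w <- w * exp(-y * H), with H = alpha * h.
w_new = [wi * math.exp(-yi * alpha * hi) for wi, yi, hi in zip(w, y, h)]

print(weighted_error(w, y, h))      # error before re-weighting
print(weighted_error(w_new, y, h))  # error of the same classifier afterwards
```

After the update, the misclassified points carry half of the total weight, so the same weak classifier is no better than a coin flip on the re-weighted problem and the next round has to find a different line.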
12. Concept
    • The strong (non-linear) classifier is built as the combination of all the weak (linear) classifiers.
    [Figure: weak classifiers f1, f2, f3, f4]
13. An example to show how Adaboost works
    • Training
        • Present ten samples to the system: [x_i = {u_i, v_i}, y_i = {'+' or '-'}]
        • 5 +ve (blue, diamond) samples
        • 5 -ve (red, circle) samples
        • Train up the system
    • Detection
        • Give an input x_j = (1.5, 3.4)
        • The system will tell you whether it is '+' or '-', e.g., face or non-face.
    • Example
        • u = weight, v = height
        • Classification: suitability to play in the basketball team.
    [Figure: samples plotted on the u-axis and v-axis, e.g. [x_i = {-0.48, 0}, y_i = '+'] and [x_i = {-0.2, -0.5}, y_i = '+']]
14. Adaboost concept
    • Objective: train a classifier to classify an unknown input to see if it is a circle or a square.
    • Training data: 6 squares, 5 circles.
    • Using this training data, how do we make a classifier?
    • One axis-parallel weak classifier cannot achieve 100% classification. E.g., h1( ), h2( ), h3( ) all fail. That means no matter how you place the decision line (horizontally or vertically), you cannot get a 100% classification result. You may try it yourself!
    • The solution is a complex classifier H_complex( ). Such a strong classifier should work, but how can we find it?
    • ANSWER: combine many weak classifiers to achieve it.
15. How? Each classifier may not be perfect, but each can achieve a correct rate of over 50%.
    • Combine h1( ), h2( ), ..., h7( ) to form the final strong classifier, which gives the classification result:
      $$H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$$
    • $\alpha_i$ is the weight for each weak classifier, for i = 1, 2, ..., 7.
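A minimal sketch of the weighted vote H(x) = sign(sum of alpha_t * h_t(x)) is given here. The three weak classifiers and their weights are invented for illustration; in AdaBoost the weights come out of the training loop.

```python
def sign(z):
    """Return +1 for positive z, -1 for negative z, and 0 for exactly zero."""
    return (z > 0) - (z < 0)


def strong_classify(x, weak_classifiers, alphas):
    """H(x) = sign(sum_t alpha_t * h_t(x))."""
    return sign(sum(a * h(x) for h, a in zip(weak_classifiers, alphas)))


# Hypothetical axis-parallel weak classifiers on x = (u, v), with made-up weights.
h1 = lambda x: 1 if x[0] < 2.0 else -1
h2 = lambda x: 1 if x[1] < 1.0 else -1
h3 = lambda x: 1 if x[0] > 0.5 else -1
alphas = [0.4, 0.6, 0.9]

print(strong_classify((1.0, 3.0), [h1, h2, h3], alphas))  # 0.4 - 0.6 + 0.9 > 0, so +1
print(strong_classify((0.0, 3.0), [h1, h2, h3], alphas))  # 0.4 - 0.6 - 0.9 < 0, so -1
```

No single classifier decides the result; a confident classifier (large alpha) can outvote several weaker ones.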
16. Adaboost Algorithm (see enlarged versions in the following slides)
    Initialization
    • Given: $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i \in X$, $y_i \in Y = \{-1, +1\}$.
    • Initialize the distribution (weight) $D_{t=1}(i) = 1/n$, such that $n = M + L$; M = number of positive (+1) examples; L = number of negative (-1) examples.
    Main training loop
    For $t = 1, \ldots, T$ {
    • Step 1a: find the classifier $h_t : X \to \{-1, +1\}$ that minimizes the error with respect to $D_t$; that means $h_t = \arg\min_{h_q} \varepsilon_q$.
    • Step 1b: error $\varepsilon_t = \sum_{i=1}^{n} D_t(i)\, I(h_t(x_i) \neq y_i)$, where $I(h_t(x_i) \neq y_i) = 1$ if $h_t(x_i) \neq y_i$ (classified incorrectly), and 0 otherwise.
    • Checking step: prerequisite $\varepsilon_t < 0.5$ (an error smaller than 0.5 is ok); otherwise stop.
    • Step 2: $\alpha_t = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t}$, the weight (or confidence value).
    • Step 3: $D_{t+1}(i) = \dfrac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}$; see the later slide on $Z_t$ for explanation.
    • Step 4: current total cascaded classifier error $CE_t = \frac{1}{n} \sum_{i=1}^{n} E_t(i)$, where the current classifier error $E_t(i)$ is defined as follows:
        • If $x_i$ is correctly classified by the current cascaded classifier, i.e., $y_i = \mathrm{sign}\left(\sum_{j=1}^{t} \alpha_j h_j(x_i)\right)$, then $E_t(i) = 0$.
        • If $x_i$ is incorrectly classified by the current cascaded classifier, i.e., $y_i \neq \mathrm{sign}\left(\sum_{j=1}^{t} \alpha_j h_j(x_i)\right)$, then $E_t(i) = 1$.
        • If $CE_t = 0$, then $T = t$; break.
    }
    The final strong classifier
    • The output is $o(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$, and the final strong classifier is
      $$H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$$
    Note on step 3
    • $Z_t$ is a normalization factor, so $D_{t+1}$ is a probability distribution:
      $$Z_t = \text{correct\_weight} + \text{incorrect\_weight} = \sum_{i\,:\,\text{correctly classified}} D_t(i)\, e^{-\alpha_t} + \sum_{i\,:\,\text{incorrectly classified}} D_t(i)\, e^{\alpha_t}$$
17. Initialization
    • Given: $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i \in X$, $y_i \in Y = \{-1, +1\}$.
    • Initialize the distribution (weight) $D_{t=1}(i) = 1/n$, such that $n = M + L$.
    • M = number of positive (+1) examples; L = number of negative (-1) examples.
18. Main loop (steps 1, 2, 3)
    For $t = 1, \ldots, T$ {
    • Step 1a: find the classifier $h_t : X \to \{-1, +1\}$ that minimizes the error with respect to $D_t$; that means $h_t = \arg\min_{h_q} \varepsilon_q$.
    • Step 1b: error $\varepsilon_t = \sum_{i=1}^{n} D_t(i)\, I(h_t(x_i) \neq y_i)$, where $I(h_t(x_i) \neq y_i) = 1$ if $h_t(x_i) \neq y_i$ (classified incorrectly), and 0 otherwise.
    • Checking step: prerequisite $\varepsilon_t < 0.5$ (an error smaller than 0.5 is ok); otherwise stop.
    • Step 2: $\alpha_t = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t}$, the weight (or confidence value).
    • Step 3: $D_{t+1}(i) = \dfrac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}$; see the later slide on $Z_t$ for explanation.
19. Main loop (step 4)
    • Step 4: current total cascaded classifier error $CE_t = \frac{1}{n} \sum_{i=1}^{n} E_t(i)$, where the current classifier error $E_t(i)$ is defined as follows:
        • If $x_i$ is correctly classified by the current cascaded classifier, i.e., $y_i = \mathrm{sign}\left(\sum_{j=1}^{t} \alpha_j h_j(x_i)\right)$, then $E_t(i) = 0$.
        • If $x_i$ is incorrectly classified by the current cascaded classifier, i.e., $y_i \neq \mathrm{sign}\left(\sum_{j=1}^{t} \alpha_j h_j(x_i)\right)$, then $E_t(i) = 1$.
        • If $CE_t = 0$, then $T = t$; break.
    }
    • The output is $o(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$, and the final strong classifier is
      $$H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$$
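Steps 1 to 4 can be put together in a short Python sketch. It is one possible reading of the algorithm, not the author's implementation. Assumptions: weak classifiers are axis-parallel stumps with thresholds midway between neighbouring sample coordinates, ties in step 1a go to the first candidate, and a tiny lower bound on the error (not in the slides) avoids a division by zero when a stump is perfect. The sample data are made up.

```python
import math


def stump(axis, thr, p):
    """Axis-parallel weak classifier: +1 if p*x[axis] < p*thr, otherwise -1."""
    return lambda x: 1 if p * x[axis] < p * thr else -1


def candidates(xs):
    """All stumps with thresholds midway between neighbouring coordinates."""
    out = []
    for axis in (0, 1):
        vals = sorted({x[axis] for x in xs})
        for a, b in zip(vals, vals[1:]):
            for p in (+1, -1):
                out.append(stump(axis, (a + b) / 2, p))
    return out


def predict(x, hs, alphas):
    """Cascaded classifier: sign of the weighted vote (a zero sum maps to -1)."""
    s = sum(a * h(x) for h, a in zip(hs, alphas))
    return 1 if s > 0 else -1


def adaboost(xs, ys, T):
    n = len(xs)
    D = [1.0 / n] * n                              # initialization: D_1(i) = 1/n
    hs, alphas = [], []
    for _ in range(T):
        def err(h):                                # step 1b: weighted error under D
            return sum(d for d, x, y in zip(D, xs, ys) if h(x) != y)
        h = min(candidates(xs), key=err)           # step 1a: best candidate
        eps = err(h)
        if eps >= 0.5:                             # checking step
            break
        eps = max(eps, 1e-10)                      # guard against log(1/0); not in the slides
        alpha = 0.5 * math.log((1 - eps) / eps)    # step 2
        D = [d * math.exp(-alpha * y * h(x)) for d, x, y in zip(D, xs, ys)]
        Z = sum(D)                                 # step 3: normalization factor
        D = [d / Z for d in D]
        hs.append(h)
        alphas.append(alpha)
        if all(predict(x, hs, alphas) == y for x, y in zip(xs, ys)):
            break                                  # step 4: CE_t = 0
    return hs, alphas


# Hypothetical training set: (u, v) points with +1/-1 labels.
xs = [(1, 5), (2, 2), (3, 6), (4, 1), (5, 4), (6, 7), (7, 3), (8, 8), (2, 7), (6, 2)]
ys = [+1, +1, +1, +1, -1, -1, -1, -1, +1, -1]
hs, alphas = adaboost(xs, ys, T=10)
correct = sum(predict(x, hs, alphas) == y for x, y in zip(xs, ys))
print(len(hs), "weak classifiers,", correct, "of", len(xs), "training samples correct")
```

The search in step 1a is brute force over all candidates, which is fine for a handful of samples; a practical trainer would use a smarter search.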
20. Note: normalization factor $Z_t$ in step 3
    • Recall step 3: $D_{t+1}(i) = \dfrac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}$, where $Z_t$ is a normalization factor, so $D_{t+1}$ becomes a probability distribution.
    • $$Z_t = \text{correct\_weight} + \text{incorrect\_weight} = \sum_{i\,:\,\text{correctly classified}} D_t(i)\, e^{-\alpha_t} + \sum_{i\,:\,\text{incorrectly classified}} D_t(i)\, e^{\alpha_t}$$
    • AdaBoost chooses this weight update function, $D_{t+1}(i) \propto D_t(i) \exp(-\alpha_t y_i h_t(x_i))$, deliberately, because:
        • when a training sample is correctly classified, its weight decreases;
        • when a training sample is incorrectly classified, its weight increases.
21. Note: stopping criterion of the main loop
    • The main loop stops when all training data are correctly classified by the cascaded classifier up to stage t.
    • Step 4: current total cascaded classifier error $CE_t = \frac{1}{n} \sum_{i=1}^{n} E_t(i)$, where the current classifier error $E_t(i)$ is defined as follows:
        • If $x_i$ is correctly classified by the current cascaded classifier, i.e., $y_i = \mathrm{sign}\left(\sum_{j=1}^{t} \alpha_j h_j(x_i)\right)$, then $E_t(i) = 0$.
        • If $x_i$ is incorrectly classified by the current cascaded classifier, i.e., $y_i \neq \mathrm{sign}\left(\sum_{j=1}^{t} \alpha_j h_j(x_i)\right)$, then $E_t(i) = 1$.
        • If $CE_t = 0$, then $T = t$; break.
22. $D_t(i)$ = weight
    • $D_t(i)$ = probability distribution (weight) of the i-th training sample at time t, for i = 1, 2, ..., n.
    • It shows how much you trust this sample.
    • At t = 1, all samples have equal weight: $D_{t=1}(i)$ is the same for all i.
    • At t > 1, $D_{t>1}(i)$ will be modified, as we will see later.
23. An example to show how Adaboost works
    • Training
        • Present ten samples to the system: [x_i = {u_i, v_i}, y_i = {'+' or '-'}]
        • 5 +ve (blue, diamond) samples
        • 5 -ve (red, circle) samples
        • Train up the classification system.
    • Detection example
        • Give an input x_j = (1.5, 3.4)
        • The system will tell you whether it is '+' or '-', e.g., face or non-face.
    • Example
        • You may treat u = weight, v = height
        • Classification task: suitability to play in the basketball team.
    [Figure: samples plotted on the u-axis and v-axis, e.g. [x_i = {-0.48, 0}, y_i = '+'] and [x_i = {-0.2, -0.5}, y_i = '+']]
24. Initialization
    • Given: $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i \in X$, $y_i \in Y = \{-1, +1\}$.
    • Initialize $D_{t=1}(i) = 1/n$, such that $n = M + L$; M positive examples; L negative examples.
    • M = 5 +ve (blue, diamond) samples
    • L = 5 -ve (red, circle) samples
    • n = M + L = 10
    • Initialize the weight $D_{t=1}(i) = 1/10$ for all i = 1, 2, ..., 10.
    • So $D_1(1) = 0.1$, $D_1(2) = 0.1$, ..., $D_1(10) = 0.1$.
25. Select h( ): for simplicity of implementation, we use the axis-parallel weak classifier
    • Recall:
      $$h_t(x) = \begin{cases} 1 & \text{if } p_t f_t(x) < p_t \theta_t \\ -1 & \text{otherwise} \end{cases}$$
      where the polarity $p_t \in \{1, -1\}$, $\theta_t$ is the threshold, and $f_t(u, v)$ is the function of the line $v - mu = c$; m and c are constants, and u and v are variables.
    • Axis-parallel weak classifier:
        • $f = v$ is a line of gradient m = 0 (horizontal line); the position of the line can be controlled by $v_0$; or
        • $f = u$ is a line of gradient m = ∞ (vertical line); the position of the line can be controlled by $u_0$.
    [Figure: axis-parallel classifiers h_a(x) and h_b(x), with line positions u_0 and v_0]
26. Step 1a, 1b
    • Assume h( ) can only be a horizontal or vertical separator (axis-parallel weak classifier).
    • There are still many ways to set h( ). Here, if this h_q( ) is selected, there will be 3 incorrectly classified training samples (see the 3 circled training samples).
    • We can go through all h( )s and select the best one, with the least misclassification (see the following 2 slides).
    • Step 1a: find the classifier $h_t : X \to \{-1, +1\}$ that minimizes the error with respect to $D_t$; that means $h_t = \arg\min_{h_q} \varepsilon_q$.
    • Step 1b: checking step: prerequisite $\varepsilon_t < 0.5$ (an error smaller than 0.5 is ok); otherwise stop.
    [Figure: h_q( ) and the samples incorrectly classified by h_q( )]
27. Example: training example slides from [Smyth 2007]; classify the ten red (circle) / blue (diamond) dots
    • Initialize: $D_{t=1}(i) = 1/10$
    • Step 1a: you may choose one of the following axis-parallel (vertical line) classifiers:
      $$h_i(x) = \begin{cases} 1 & \text{if } p\,u < p\,u_i \\ -1 & \text{otherwise} \end{cases}$$
      where the polarity $p \in \{1, -1\}$ and x = (u, v); v is not used because $h_i(x)$ is parallel to the vertical axis.
    • The vertical dotted lines at $u_1, u_2, \ldots, u_9$ are the possible choices $h_{i=1}(x), \ldots, h_{i=9}(x)$.
    • There are 9 x 2 choices here: $h_i$, i = 1, 2, ..., 9 (polarity +1) and $h'_i$, i = 1, 2, ..., 9 (polarity -1).
    [Figure: ten dots on the u-axis and v-axis with nine vertical dotted lines]
28. Example: training example slides from [Smyth 2007]; classify the ten red (circle) / blue (diamond) dots
    • Initialize: $D_{t=1}(i) = 1/10$
    • Step 1a: you may choose one of the following axis-parallel (horizontal line) classifiers:
      $$h_j(x) = \begin{cases} 1 & \text{if } p\,v < p\,v_j \\ -1 & \text{otherwise} \end{cases}$$
      where the polarity $p \in \{1, -1\}$ and x = (u, v); u is not used because $h_j(x)$ is parallel to the horizontal axis.
    • The horizontal dotted lines at $v_1, v_2, \ldots, v_9$ are the possible choices $h_{j=1}(x), \ldots, h_{j=9}(x)$.
    • There are 9 x 2 choices here: $h_j$, j = 1, 2, ..., 9 (polarity +1) and $h'_j$, j = 1, 2, ..., 9 (polarity -1).
    • Altogether, including the previous slide, there are 36 choices.
    [Figure: ten dots on the u-axis and v-axis with nine horizontal dotted lines]
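The count of 36 candidates can be reproduced with a small enumeration. The sketch assumes that each dotted line sits midway between two neighbouring sample coordinates, which the slides do not state explicitly, and it uses ten made-up points with distinct u and v values.

```python
def axis_parallel_candidates(points):
    """Return (axis, threshold, polarity) triples for axis-parallel classifiers.

    A threshold is placed midway between each pair of neighbouring
    coordinates, so n distinct values on an axis give n - 1 thresholds.
    """
    out = []
    for axis in (0, 1):            # 0: vertical lines (split on u), 1: horizontal lines (split on v)
        vals = sorted({p[axis] for p in points})
        for lo, hi in zip(vals, vals[1:]):
            for polarity in (+1, -1):
                out.append((axis, (lo + hi) / 2, polarity))
    return out


# Ten hypothetical points with distinct u and distinct v coordinates.
points = [(i, (3 * i) % 10) for i in range(10)]
cands = axis_parallel_candidates(points)
print(len(cands))  # 9 thresholds x 2 polarities x 2 axes = 36
```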
29. Step 1b: find and check the error of the weak classifier h( )
    • To evaluate how successful your selected weak classifier h( ) is, we can evaluate its error rate.
    • $\varepsilon_t$ = misclassification probability of h( ):
      $$\varepsilon_t = \sum_{i=1}^{n} D_t(i)\, I(h_t(x_i) \neq y_i), \quad I(h_t(x_i) \neq y_i) = \begin{cases} 1 & \text{if } h_t(x_i) \neq y_i \text{ (incorrectly classified)} \\ 0 & \text{otherwise} \end{cases}$$
    • Checking step: prerequisite $\varepsilon_t < 0.5$. If $\varepsilon_t \geq 0.5$ (something is wrong), stop the training.
    • This is because, by definition, a weak classifier should be slightly better than a random choice (probability = 0.5).
    • So if $\varepsilon_t \geq 0.5$, your h( ) is a bad choice; design another h''( ) and do the training based on the new h''( ).
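As a rough illustration of step 1b, the weighted error is the sum of the weights of the misclassified samples. The labels and predictions below are hypothetical and only chosen so that three of ten samples are wrong.

```python
def weighted_error(D, predictions, labels):
    """epsilon_t = sum_i D_t(i) * I(h_t(x_i) != y_i)."""
    return sum(d for d, h, y in zip(D, predictions, labels) if h != y)


labels      = [+1, +1, +1, +1, +1, -1, -1, -1, -1, -1]
predictions = [+1, +1, -1, -1, -1, -1, -1, -1, -1, -1]   # hypothetical: 3 mistakes
D = [0.1] * 10                                           # equal weights at t = 1

eps = weighted_error(D, predictions, labels)
# Checking step: a weak classifier must be better than chance.
print(eps, "ok" if eps < 0.5 else "stop: not better than chance")
```

With equal weights, the weighted error is just the fraction of mistakes; once the weights change in later rounds, mistakes on heavily weighted samples cost more.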
30. Exercise
    • Assume h( ) can only be a horizontal or vertical separator.
    • How many different classifiers are available?
    • If h_j( ) is selected as shown, circle the misclassified training samples. Find $\varepsilon$ to see the misclassification probability if the probability distribution (D) for each sample is the same.
    • Find the h( ) with minimum error.
    • Step 1a: find the classifier $h_t : X \to \{-1, +1\}$ that minimizes the error with respect to $D_t$.
    • Step 1b: checking step: prerequisite $\varepsilon_t < 0.5$; otherwise stop.
    [Figure: h_j( )]
31. Result of step2 at t=1
    [Figure: the classifier $h_{t=1}(x)$ and the samples incorrectly classified by $h_{t=1}(x)$]
32. Step 2 at t=1 (refer to the previous slide)
    • Using $\varepsilon_{t=1} = 0.3$, because 3 samples are incorrectly classified:
      $$\varepsilon_t = \sum_{i=1}^{n} D_t(i)\, I(h_t(x_i) \neq y_i), \quad I(h_t(x_i) \neq y_i) = \begin{cases} 1 & \text{if } h_t(x_i) \neq y_i \\ 0 & \text{otherwise} \end{cases}$$
      so $\varepsilon_{t=1} = 0.1 + 0.1 + 0.1 = 0.3$.
    • Step 2: $\alpha_t = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t}$, where $\varepsilon_t$ is the weighted error rate of classifier $h_t$; so
      $$\alpha_{t=1} = \frac{1}{2} \ln \frac{1 - 0.3}{0.3} = 0.424$$
    • The proof can be found at http://vision.ucsd.edu/~bbabenko/data/boosting_note.pdf. Also see the appendix.
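The value 0.424 is easy to reproduce. The helper name `alpha_from_error` is made up for this sketch; the range check mirrors the checking step (error below 0.5) and additionally rejects an error of exactly 0, where the logarithm is undefined.

```python
import math


def alpha_from_error(eps):
    """alpha_t = 0.5 * ln((1 - eps_t) / eps_t), defined here for 0 < eps_t < 0.5."""
    if not 0.0 < eps < 0.5:
        raise ValueError("weak classifier must satisfy 0 < eps < 0.5")
    return 0.5 * math.log((1.0 - eps) / eps)


print(round(alpha_from_error(0.3), 3))   # 0.424
```

The smaller the error, the larger alpha becomes, so more accurate weak classifiers get a bigger say in the final vote.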
33. Step 3 at t=1: update $D_t$ to $D_{t+1}$
    • Update the weight $D_t(i)$ for each training sample i:
      $$D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}$$
      where $Z_t$ is a normalization factor, so $D_{t+1}$ is a distribution (probability function).
    • The proof can be found at http://vision.ucsd.edu/~bbabenko/data/boosting_note.pdf. Also see the appendix.
34. Step 3: first find Z (the normalization factor)
    • Note: currently t = 1, $D_{t=1}(i) = 0.1$ for all i, and $\alpha_{t=1} = 0.424$; 7 samples are correctly classified and 3 are incorrectly classified.
    • $$Z_t = \text{correct\_weight} + \text{incorrect\_weight} = \sum_{i\,:\,y_i = h_t(x_i)} D_t(i)\, e^{-\alpha_t y_i h_t(x_i)} + \sum_{i\,:\,y_i \neq h_t(x_i)} D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}$$
    • Correctly classified: $y_i = h_t(x_i)$, so $y_i h_t(x_i) = 1$; put this into the first sum.
    • Incorrectly classified: $y_i \neq h_t(x_i)$, so $y_i h_t(x_i) = -1$; put this into the second sum.
    • $$Z_t = \sum_{i\,:\,\text{correct}} D_t(i)\, e^{-\alpha_t} + \sum_{i\,:\,\text{incorrect}} D_t(i)\, e^{\alpha_t} = 0.1 \times 7 \times e^{-0.424} + 0.1 \times 3 \times e^{0.424}$$
      $$= 0.1 \times 7 \times 0.65 + 0.1 \times 3 \times 1.52 = 0.455 + 0.456 = 0.911$$
    • The exponentials are rounded to 0.65 and 1.52 here; without this rounding, $Z_t \approx 0.917$.
35. Step 3: example: update $D_t$ to $D_{t+1}$
    • If a sample is correctly classified, its weight $D_{t+1}$ will decrease, and vice versa.
    • With $D_t(i) = 0.1$ and $\alpha_t = 0.42$:
      $$D_{t+1}(i)_{\text{correct}} = \frac{D_t(i)}{Z_t} e^{-\alpha_t} = \frac{0.1}{Z_t} e^{-0.42} = \frac{0.1}{Z_t} \times 0.65$$
      $$D_{t+1}(i)_{\text{incorrect}} = \frac{D_t(i)}{Z_t} e^{\alpha_t} = \frac{0.1}{Z_t} e^{0.42} = \frac{0.1}{Z_t} \times 1.52$$
    • Since $Z_t = 0.911$:
      $$\text{decrease: } D_{t+1}(i)_{\text{correct}} = \frac{0.1}{0.911} \times 0.65 = 0.0714$$
      $$\text{increase: } D_{t+1}(i)_{\text{incorrect}} = \frac{0.1}{0.911} \times 1.52 = 0.167$$
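The numbers in step 3 can be re-derived without intermediate rounding. The sketch below uses the values from the example (ten samples of weight 0.1, seven correct and three incorrect, error 0.3); the ordering of the samples in the list is arbitrary.

```python
import math

eps = 0.3
alpha = 0.5 * math.log((1 - eps) / eps)       # step 2, about 0.424
D = [0.1] * 10
correct = [True] * 7 + [False] * 3            # 7 correct, 3 incorrect

# y_i * h_t(x_i) is +1 for a correct sample and -1 for an incorrect one.
unnormalized = [d * math.exp(-alpha if ok else alpha) for d, ok in zip(D, correct)]
Z = sum(unnormalized)                         # normalization factor
D_next = [w / Z for w in unnormalized]

print(round(Z, 4))            # 0.9165
print(round(D_next[0], 4))    # 0.0714 (a correctly classified sample)
print(round(D_next[-1], 4))   # 0.1667 (an incorrectly classified sample)
```

The slide's Z = 0.911 comes from rounding the two exponentials to 0.65 and 1.52 before adding; the updated weights agree with the slide to the digits shown, and the new weights again sum to 1.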
36. Now run the main training loop a second time (t = 2)
    • The weights carried over from t = 1 are:
      $$D_{t+1}(i)_{\text{correct}} = \frac{0.1}{0.911} \times 0.65 = 0.0714, \qquad D_{t+1}(i)_{\text{incorrect}} = \frac{0.1}{0.911} \times 1.52 = 0.167$$
37. Now run the main training loop a second time (t = 2), and then a third time (t = 3)
    • The final classifier is obtained by combining the three weak classifiers.
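38. Combined classifier for t = 1, 2, 3
    • Combine $h_{t=1}( )$, $h_{t=2}( )$, $h_{t=3}( )$, with weights $\alpha_1$, $\alpha_2$, $\alpha_3$, to form the classifier:
      $$H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right) = \mathrm{sign}\left(0.424\, h_1(x) + \alpha_2 h_2(x) + \alpha_3 h_3(x)\right)$$
    • Exercise: work out the remaining weights ($\alpha_2$ and $\alpha_3$ in the formula above).
    • One more step may be needed for the final classifier.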
39. Code trace
40. [Figure: annotated code flow with numbered callouts, highlighting the for loop over numStages]
41. CvCascadeBoost::train
        update_weights( 0 );
        do {
            CvCascadeBoostTree* tree = new CvCascadeBoostTree;
            if( !tree->train( data, subsample_mask, this ) ) {
                delete tree;
                continue;
            }
            cvSeqPush( weak, &tree );
            update_weights( tree );
            trim_weights();
        } while( !isErrDesired() && (weak->total < params.weak_count) );
    • Annotations on update_weights:
        • weak_eval[i] = f(x_i) in [-1, 1]
        • w_i *= exp(-y_i * f(x_i))
42. Trace code
    • Main related files
        • traincascade.cpp
        • classifier.train
    • Main boosting algorithm
        • CvCascadeClassifier::train (file: CascadeClassifier.cpp); you only need to look at the for loop over numStages inside it.
            1. updateTrainingSet
                1. Takes only the samples that failed in the previous stages (predict = 1).
            2. fillPassedSamples
                1. imgReader.getPos and imgReader.getNeg behave somewhat differently.
                2. Uses CvCascadeBoost::predict (boost.cpp) to select the samples to add for the stage (when the stage is 0, all are taken: predict(i) = 1).
                    1. acceptanceRatio = negCount / negConsumed
            3. Each stage computes tempLeafFARate; if it is already smaller than requiredLeafFARate, training ends.
            4. CvCascadeBoost::train (file: boost.cpp)
                1. A new CvCascadeBoostTrainData is created (new) at this point.
                2. update_weights: if no tree exists yet, the weight of each tree is updated at this point.
                3. featureEvaluator can be freely swapped for another evaluator, e.g. HaarEvaluator.
43. Usage
    • Pre-processing
        • opencv_createsamples.exe
    • Training
        • opencv_traincascade.exe -featureType HAAR -data classifier/ -vec positive.vec -bg negative.dat -w 30 -h 30 -numPos 696 -numNeg 545 -numStages 16
    • Parameters
        • maxFalseAlarm: the maximum tolerable false alarm rate; this parameter affects the stopping condition of each stage.
        • requiredLeafFARate = pow(maxFalseAlarm, numStages) / max_depth
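To get a feel for how small requiredLeafFARate becomes, the formula from the slide can be evaluated directly. The parameter values below (0.5, 16 stages, depth 1) are hypothetical settings picked for the example, not values taken from the slides, and the function is a plain restatement of the formula rather than OpenCV code.

```python
def required_leaf_fa_rate(max_false_alarm, num_stages, max_depth):
    """requiredLeafFARate = pow(maxFalseAlarm, numStages) / max_depth."""
    return pow(max_false_alarm, num_stages) / max_depth


# Hypothetical settings: 0.5 false alarm rate per stage, 16 stages, depth-1 trees.
print(required_leaf_fa_rate(0.5, 16, 1))   # 1.52587890625e-05
```

Because the per-stage rate is raised to the power of the stage count, adding stages tightens the overall target exponentially.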
44. Usage
        # pre-processing
        # resize images in directory, you need to have the ImageMagick utility
        ################# 1. collect file names #############################
        # notice: a. negative image size should be larger than positive ones
        find ./dataset/positive/resize/ -name '*.jpg' > temp.dat
        find ./dataset/negative/ -name '*.jpg' > negative.dat
        # append " 1 0 0 30 30" (object count and bounding box) to the end of every line
        sed 's/$/ 1 0 0 30 30/' temp.dat > positive.dat
        rm temp.dat
        ################# 2. create samples #################################
        ./opencv_createsamples.exe -info positive.dat -vec positive.vec -w 30 -h 30 -show
        ################## 3. train samples #################################
        ./opencv_traincascade.exe -featureType HAAR -data classifier -vec positive.vec -bg negative.dat -w 30 -h 30 -numPos 100 -numNeg 300 -numStages 18
45. Usage
    • Detection
        • Windows-based
            • haarClassifier.load
            • haarClassifier.detectMultiScale(procImg, resultRect, 1.1, 3, 0, cvSize(12, 12), cvSize(80, 80));
        • Detect on your own
            • haarClassifier.load
            • haarClassifier.featureEvaluator->setImage( scaledImage, originalWindowSize )
            • haarClassifier.runAt(evaluator, Point(0, 0), gypWeight);
    • Notes
        • Infinite loop in CvCascadeClassifier::fillPassedSamples
        • Solution:
            • Add more samples
            • Reduce the number of stages
46. Appendix - Haar-like Features
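47. Example
    • A feature's value is calculated as the difference between the sums of the pixels within the white and black rectangle regions:
      $$f_i = \mathrm{Sum}(r_{i,\text{white}}) - \mathrm{Sum}(r_{i,\text{black}})$$
    • The corresponding weak classifier is:
      $$h_i(x) = \begin{cases} 1 & \text{if } f_i > \text{threshold} \\ -1 & \text{if } f_i < \text{threshold} \end{cases}$$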
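A Haar-like feature and its weak classifier can be sketched with plain Python lists. The function names, the two-rectangle layout (white left half, black right half), the 4x4 image, and the threshold are all assumptions made for this illustration. Real implementations use an integral image to get each rectangle sum in constant time, which is omitted here for brevity.

```python
def rect_sum(img, top, left, height, width):
    """Sum of the pixel values inside a rectangle of a list-of-lists image."""
    return sum(sum(row[left:left + width]) for row in img[top:top + height])


def haar_two_rect(img, top, left, height, width):
    """Two-rectangle feature: Sum(white left half) - Sum(black right half)."""
    half = width // 2
    white = rect_sum(img, top, left, height, half)
    black = rect_sum(img, top, left + half, height, half)
    return white - black


def haar_weak_classifier(f, threshold):
    """h(x) = 1 if f > threshold, otherwise -1 (a tie is treated as -1 here)."""
    return 1 if f > threshold else -1


# Hypothetical 4x4 image: bright left half, dark right half.
img = [[9, 9, 1, 1]] * 4
f = haar_two_rect(img, 0, 0, 4, 4)
print(f, haar_weak_classifier(f, threshold=10))   # 64 1
```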
48. Reference
    • http://docs.opencv.org/doc/user_guide/ug_traincascade.html