Comparison Study
of Decision Tree Ensembles
for Regression
SEONHO PARK
Objectives
• Empirical study of Ensemble trees for regression problems
• To verify their performance and time efficiency
• Candidates from open source
• Scikit-Learn
• BaggingRegressor
• RandomForestRegressor
• ExtraTreesRegressor
• AdaBoostRegressor
• GradientBoostingRegressor
• XGBoost
• XGBRegressor
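
A minimal sketch of how the six candidates can be instantiated with their default parameters (the study keeps all parameters at defaults, per the comparison-method slide); assumes the scikit-learn and xgboost packages are installed:

```python
# Candidate regressors, all with default parameters (assumption: the
# scikit-learn and xgboost packages are available).
from sklearn.ensemble import (AdaBoostRegressor, BaggingRegressor,
                              ExtraTreesRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from xgboost import XGBRegressor

candidates = {
    "Bagging": BaggingRegressor(),
    "RandomForest": RandomForestRegressor(),
    "ExtraTrees": ExtraTreesRegressor(),
    "AdaBoost": AdaBoostRegressor(),
    "GradientBoosting": GradientBoostingRegressor(),
    "XGBoost": XGBRegressor(),
}
```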
Decision Tree
[Figure: an example decision tree with splits x₂ > 2.5 and x₁ > 3.0 (N/Y branches) and the induced partition of the (x₁, x₂) feature space]
• Expressed as a recursive partition of the feature space
• Used for both classification and regression
• Building blocks: nodes, leaves
• Node splits the instance space into two or more sub-spaces according to a certain
discrete function of the input feature values
Decision Tree Inducers
• How is a decision tree generated?
• An inducer is defined by its rules for splitting and pruning nodes
• Decision tree inducers:
  ID3 (Quinlan, 1986), C4.5 (Quinlan, 1993), CART (Breiman et al., 1984)
• CART is the most general and popular
CART
• CART stands for Classification and Regression Trees
• Has the ability to generate regression trees
• Classification: minimization of misclassification costs
• Regression: the cost is the least-squares error between target values and predicted values
• Maximization of the change of the impurity function:
  $x_j^R = \operatorname*{arg\,max}_{x_j} \left[\, i(t_p) - P_l\, i(t_l) - P_r\, i(t_r) \,\right]$
• For regression (a brute-force sketch of this split search follows below),
  $x_j^R = \operatorname*{arg\,min}_{x_j} \left[\, \operatorname{Var}(Y_l) + \operatorname{Var}(Y_r) \,\right]$
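
A brute-force sketch of the regression split rule above. It scores a split by the unweighted sum Var(Y_l) + Var(Y_r) exactly as written on the slide; production CART implementations typically weight each child's variance by its sample count:

```python
import numpy as np

def best_split(X, y):
    """Exhaustive CART-style regression split: argmin Var(Y_l) + Var(Y_r)."""
    best_j, best_t, best_score = None, None, np.inf
    for j in range(X.shape[1]):                      # each candidate feature
        for t in np.unique(X[:, j])[:-1]:            # each candidate threshold
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            score = left.var() + right.var()         # impurity after the split
            if score < best_score:
                best_j, best_t, best_score = j, t, score
    return best_j, best_t

# Toy usage on clearly separable data:
X = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 5.0], [5.0, 4.0]])
y = np.array([0.1, 0.2, 1.1, 1.2])
print(best_split(X, y))  # splits on feature 0 at threshold 2.0
```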
CART
• Pruning
• Stop splitting when a node contains fewer than a minimum number of points $N_{min}$
Figure: Roman Timofeev, Classification and Regression Trees Theory and Applications (2004)
Decision Tree Pros And Cons
• Advantages
• Explicability: easy to understand and interpret (white boxes)
• Makes minimal assumptions
• Requires little data preparation
• Addresses nonlinearity in an intuitive manner
• Can handle both nominal and numerical features
• Performs well with large datasets
• Disadvantages
• Heuristics such as the greedy algorithm → locally optimal decisions at each node
• Instability and overfitting – not robust to noise (outliers)
Ensemble Methods
• Ensemble tree methods can be classified into two types: bagging and boosting
• Bagging Methods: Tree Bagging, Random Forest, Extra Trees
• Boosting Methods: AdaBoost, Gradient Boosting
Figure: http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_iris.html
Averaging Methods
• Tree Bagging (L. Breiman, 1996)
• What is Bagging?
• BAGGING is an abbreviation of Bootstrap AGGregatING
• Bagging: samples are drawn with replacement
• Drawn as random subsets of the features → ‘Random Subspaces’ (T. Ho, 1998)
• Drawn as random subsets of both samples and features → ‘Random Patches’ (G. Louppe and P. Geurts, 2012)
• Random Forest (L. Breiman, 2001)
• Tree Bagging + splits chosen among a random subset of the features
• Extra Trees (Extremely Randomized Trees) (P. Geurts et al., 2006)
• Random Forest + extra randomization: split thresholds at nodes are drawn at random
(a scikit-learn sketch of these sampling schemes follows below)
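
In scikit-learn these sampling schemes map directly onto `BaggingRegressor` parameters; a minimal sketch (the fractions 0.5 are illustrative, not values from the study):

```python
from sklearn.ensemble import BaggingRegressor

# Plain tree bagging: bootstrap samples, all features.
bagging = BaggingRegressor(n_estimators=100, bootstrap=True)

# 'Random Subspaces': random feature subsets, no sample resampling.
subspaces = BaggingRegressor(n_estimators=100, bootstrap=False, max_features=0.5)

# 'Random Patches': random subsets of both samples and features.
patches = BaggingRegressor(n_estimators=100, max_samples=0.5, max_features=0.5)
```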
Boosting Methods – AdaBoost
• AdaBoost (Y. Freund, and R. Schapire, 1995)
• AdaBoost is abbreviation for ‘Adaptive Boosting’
• Sequential decision-making method
• Boosted model in the form:
  $H(x) = \sum_{t=1}^{T} \rho_t h_t(x)$
  where $h_t(x)$ is the hypothesis of the weak learner, $\rho_t$ its weight, and $H(x)$ the hypothesis of the strong learner
Figure: Schapire and Freund, Boosting: Foundations and Algorithms (2012)
Boosting Methods – AdaBoost
• Suppose you are given $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ and the task is to fit a model $H(x)$. A friend wants to help and gives you a model $H$. You check the model and find it is good but not perfect. There are some mistakes: $H(x_1) = 0.8$, $H(x_2) = 1.4$, …, while $y_1 = 0.9$, $y_2 = 1.3$, … How can you improve this model?
• Rules
• Use the friend's model $H$ without any modification
• You may add an additional model $h$ to improve the prediction, so the new prediction will be $H + h$:
  $H(x) = \sum_{t=1}^{T} \rho_t h_t(x), \qquad H_T(x) = H_{T-1}(x) + \rho_T h_T(x)$
Boosting Methods – AdaBoost
• Wish to improve the model such that:
  $H(x_1) + h(x_1) = y_1$
  $H(x_2) + h(x_2) = y_2$
  $\dots$
  $H(x_n) + h(x_n) = y_n$
• Equivalently, the new learner $h$ should satisfy:
  $h(x_1) = y_1 - H(x_1)$
  $h(x_2) = y_2 - H(x_2)$
  $\dots$
  $h(x_n) = y_n - H(x_n)$
• Fit a weak learner $h$ to the data $(x_1, y_1 - H(x_1)), (x_2, y_2 - H(x_2)), \dots, (x_n, y_n - H(x_n))$ – the residuals (see the sketch below)
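
A minimal sketch of this residual-fitting loop: a forward stagewise fit with a fixed shrinkage factor standing in for the weights $\rho_t$. This is a simplification for illustration, not scikit-learn's AdaBoost.R2:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_stagewise(X, y, n_rounds=50, shrinkage=0.1):
    """Repeatedly fit a weak learner h to the current residuals y - H."""
    H = np.zeros(len(y))                 # current strong-model predictions
    learners = []
    for _ in range(n_rounds):
        h = DecisionTreeRegressor(max_depth=3).fit(X, y - H)  # fit residuals
        H += shrinkage * h.predict(X)    # H <- H + rho * h
        learners.append(h)
    return learners, H
```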
Boosting Methods – Gradient Boosting
• AdaBoost: updates with the loss-function residual, which converges to 0:
  $y - H = 0$
• In scikit-learn, the AdaBoost.R2 algorithm* is implemented
• Gradient Boosting (L. Breiman, 1997): updates with the negative gradients of the loss function, which converge to 0:
  $-\dfrac{\partial L}{\partial H} = 0$
*Drucker, H., Improving Regressors using Boosting Techniques (1997)
*Drucker,H., Improving Regressors using Boosting Techniques (1997)
Boosting Methods – Gradient Boosting
• Loss function $L(y, H)$
• First-order optimality:
  $\dfrac{\partial L(y_i, H_i)}{\partial H_i} = 0, \quad \forall i = 1, \dots, n$
• If the loss function is the squared loss,
  $L(y, H) = \dfrac{1}{2}(y - H)^2$
• then the negative gradients can be interpreted as residuals (a numerical check follows below):
  $-\dfrac{\partial L(y_i, H_i)}{\partial H_i} = y_i - H_i, \quad \forall i = 1, \dots, n$
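
A quick numerical check of this identity, reusing the toy values from the AdaBoost example (an illustration, not part of the original deck):

```python
import numpy as np

y = np.array([0.9, 1.3])                     # targets
H = np.array([0.8, 1.4])                     # current predictions
loss = lambda H: 0.5 * (y - H) ** 2          # squared loss, elementwise

eps = 1e-6
grad = (loss(H + eps) - loss(H - eps)) / (2 * eps)  # numerical dL/dH
print(np.allclose(-grad, y - H))             # True: negative gradient = residual
```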
Boosting Methods – Gradient Boosting
• The squared loss is not adequate for handling outliers → overfitting
• Other loss functions (a scikit-learn sketch follows below):
• Absolute loss:
  $L(y, H) = \left| y - H \right|$
• Huber loss:
  $L(y, H) = \begin{cases} \frac{1}{2}(y - H)^2 & \text{if } |y - H| \le \delta, \\ \delta \left( |y - H| - \delta/2 \right) & \text{otherwise} \end{cases}$
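
In scikit-learn these robust alternatives are selected through the `loss` parameter of `GradientBoostingRegressor`; a hedged sketch (loss names as in recent releases; older versions used 'ls'/'lad'):

```python
from sklearn.ensemble import GradientBoostingRegressor

gbr_sq = GradientBoostingRegressor(loss="squared_error")        # default
gbr_abs = GradientBoostingRegressor(loss="absolute_error")      # absolute loss
gbr_huber = GradientBoostingRegressor(loss="huber", alpha=0.9)  # alpha: quantile controlling delta
```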
XGBoost
• Among the 29 Kaggle challenge winning solutions during 2015*:
• 17 used XGBoost (gradient boosting trees)
  (8 solely used XGBoost, 9 used XGBoost + deep neural nets)
• 11 used deep neural nets
  (2 solely used, 9 combined with XGBoost)
• In KDDCup 2015, ensemble trees were used by every winning team in the top 10
*Tianqi Chen, XGBoost: A Scalable Tree Boosting System (2016)
Ensemble Method Pros and Cons
• Advantages
• Averaging reduces overfitting
• Fast and scalable → can handle large-scale data
• Almost works ‘out-of-the-box’
• Disadvantages
• Boosting can still overfit
• Ad hoc heuristics
• Does not provide a probabilistic framework (confidence intervals, posterior distributions)
Empirical Test Suites
• Diabetes1)
• Concrete Slump Test2)
• Machine CPU1)
• Body Fat3)
• Yacht Hydrodynamics2)
• Chemical4)
• Boston Housing5)
• Istanbul stock exchange2)
• Concrete compressive strength2)
• Engine4)
• Airfoil Self-Noise2)
• Wine Quality (Red) 2)
• Pumadyn (32) 1)
• Pumadyn (8) 1)
• Bank (8) 1)
• Bank (32) 1)
• Wine Quality (White) 2)
• Computer Activity6)
• Computer Activity_small6)
• Kinematics of Robot Arm1)
• Combined Cycle Power Plant2)
• California Housing7)
• Friedman8)
1)http://www.dcc.fc.up.pt/~ltorgo/
2)https://archive.ics.uci.edu/ml/datasets/
3)http://www.people.vcu.edu/~rjohnson/bios546/programs/
4)MATLAB neural fitting toolbox
5)https://rpubs.com/yroy/Boston
6)http://www.cs.toronto.edu/~delve/
7)http://www.cs.cmu.edu/afs/cs/academic/class/15381-s07/www/hw6/cal_housing.arff
8)http://tunedit.org/repo/UCI/numeric/fried.arff
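
Two of the benchmark sets ship with scikit-learn; a sketch of loading them (the remaining sets come from the repository URLs above):

```python
from sklearn.datasets import load_diabetes, fetch_california_housing

X_diab, y_diab = load_diabetes(return_X_y=True)
X_cal, y_cal = fetch_california_housing(return_X_y=True)  # downloads on first call
print(X_diab.shape, X_cal.shape)
```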
Description of Comparison Methods
• Corrected t-test*:
  $t_{corr} = \dfrac{\mu_d}{\sqrt{\left(\dfrac{1}{N_s} + \dfrac{n_T}{n_L}\right)\sigma_d^2}}, \qquad \mu_d = \dfrac{\sum_{i=1}^{N_s} d_i}{N_s}, \qquad \sigma_d^2 = \dfrac{\sum_{i=1}^{N_s} (d_i - \mu_d)^2}{N_s - 1}$
  where $d_i = e_i^A - e_i^B$ denotes the difference between the errors of algorithms A and B on repetition $i$
• The data set is divided into a learning sample of a given size $n_L$ and a test sample of size $n_T$
• $t_{corr}$ is assumed to follow a Student distribution with $N_s - 1$ d.o.f.
• We used a 95% confidence interval (5% type 1 error rate) to test the hypothesis (a sketch follows below)
• In this task, we repeated the experiment 30 times independently ($N_s$ is 30)
• Parameters of the ensemble trees are left at their defaults
*Nadeau, C., Bengio, Y., Inference for the generalization error (2003)
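
A minimal sketch of the corrected resampled t-test above, assuming paired per-repetition error estimates for two algorithms (scipy is used only for the Student-t p-value):

```python
import numpy as np
from scipy import stats

def corrected_t_test(err_a, err_b, n_train, n_test, alpha=0.05):
    """Nadeau-Bengio corrected resampled t-test on paired errors."""
    d = np.asarray(err_a) - np.asarray(err_b)   # d_i = e_i^A - e_i^B
    n_s = len(d)                                # number of repetitions (30 here)
    mu, var = d.mean(), d.var(ddof=1)
    t = mu / np.sqrt((1.0 / n_s + n_test / n_train) * var)
    p = 2 * stats.t.sf(abs(t), df=n_s - 1)      # two-sided p-value
    return t, p < alpha                         # statistic, 'significant?' flag
```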
Empirical Test Results
• Accuracy: $R^2 = 1 - \dfrac{\sum_{i=1}^{N_s} (y_i - \tilde{y}_i)^2}{\sum_{i=1}^{N_s} (y_i - \bar{y})^2}$, where $\tilde{y}_i$ is the predicted value and $\bar{y}$ the mean of the targets (a scikit-learn sketch follows after the table)
• GradientBoosting > XGBoost > ExtraTrees > Bagging > RandomForest > AdaBoost

Win/Draw/Loss records comparing the algorithm in the column versus the algorithm in the row:

                  | Bagging | Random Forest | Extra Trees | AdaBoost | Gradient Boosting | XGBoost
Bagging           | -       | 0/27/0        | 10/16/1     | 0/8/19   | 11/9/7            | 7/13/7
Random Forest     | 0/27/0  | -             | 7/19/1      | 0/8/19   | 11/9/7            | 8/12/7
Extra Trees       | 1/16/10 | 1/19/7        | -           | 0/7/20   | 8/12/7            | 7/13/7
AdaBoost          | 19/8/0  | 7/9/11        | 20/7/0      | -        | 20/6/1            | 19/8/0
Gradient Boosting | 7/9/11  | 7/12/8        | 7/12/0      | 1/6/20   | -                 | 1/24/2
XGBoost           | 7/13/7  | 7/12/8        | 7/13/7      | 0/8/19   | 2/24/1            | -
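
The R² above is what `sklearn.metrics.r2_score` computes; a toy sketch:

```python
from sklearn.metrics import r2_score

y_true = [0.9, 1.3, 2.1]          # toy targets
y_pred = [0.8, 1.4, 2.0]          # toy predictions
print(r2_score(y_true, y_pred))   # 1 - SS_res / SS_tot
```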
Empirical Test Results
• Accuracy: $R^2$ per dataset (shown as a figure in the original slides)
Empirical Test Results
• Computational Cost (an illustrative timing sketch follows after the table)
• ExtraTrees > XGBoost > RandomForest > Bagging > GradientBoosting > AdaBoost

Win/Draw/Loss records comparing the algorithm in the column versus the algorithm in the row:

                  | Bagging | Random Forest | Extra Trees | AdaBoost | Gradient Boosting | XGBoost
Bagging           | -       | 11/13/3       | 20/7/0      | 0/4/23   | 7/3/17            | 11/14/2
Random Forest     | 3/13/11 | -             | 24/3/0      | 0/2/25   | 3/7/17            | 10/15/2
Extra Trees       | 0/7/20  | 0/3/24        | -           | 0/0/27   | 0/0/27            | 2/23/2
AdaBoost          | 23/4/0  | 25/2/0        | 27/0/0      | -        | 24/3/0            | 21/4/2
Gradient Boosting | 17/3/7  | 17/7/3        | 27/0/0      | 0/3/24   | -                 | 18/7/2
XGBoost           | 2/14/11 | 2/15/10       | 2/23/2      | 2/4/21   | 2/7/18            | -
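
An illustrative wall-clock timing sketch on synthetic data (the study's own protocol of 30 repetitions per dataset is described on the methods slide; this is not that protocol):

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X, y = rng.random((1000, 8)), rng.random(1000)

start = time.perf_counter()
RandomForestRegressor().fit(X, y)
print(f"fit time: {time.perf_counter() - start:.3f} s")
```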
Empirical Test Results
• Computational Cost per dataset (shown as a figure in the original slides)