L1-based compression of random forest model 
Arnaud Joly, François Schnitzler, Pierre Geurts, Louis Wehenkel
a.joly@ulg.ac.be - www.ajoly.org
Department of EE and CS & GIGA-Research
High dimensional supervised learning applications 
- 3D image segmentation: x = original image, y = segmented image
- Genomics: x = DNA sequence, y = phenotype
- Electrical grid: x = system state, y = stability
From $10^5$ to $10^9$ dimensions.
Tree based ensemble methods 
From a dataset of input-output pairs $\{(x_i, y_i)\}_{i=1}^{n} \subset \mathcal{X} \times \mathcal{Y}$, we
approximate $f : \mathcal{X} \to \mathcal{Y}$ by learning an ensemble of $M$ decision trees.
[Figure: a decision tree with Yes/No test nodes and leaves Label 1, Label 2, Label 3.]
The estimator $\hat{f}$ of $f$ is obtained by averaging the predictions of the
ensemble of trees.
High model complexity → Large memory requirement
The complexity of tree-based methods is measured by the number of
internal nodes and increases with
- the ensemble size $M$;
- the number of samples $n$ in the dataset.
The variance of individual trees increases with the dimension $p$ of the
original feature space → $M(p)$ should increase with $p$ to yield near-optimal
accuracy.
Complexity grows as $n M(p)$ → may require a huge amount of storage.
Memory limitation will be an issue in high-dimensional problems.
L1-based compression of random forest model (I) 
We first learn an ensemble of $M$ extremely randomized trees (Geurts et al.,
2006) ...
[Figure: example ensemble of three trees with nodes labelled (m,l): 1,1 to 1,7; 2,1 to 2,7; 3,1 to 3,5.]
... and associate to each node an indicator function $1_{m,l}(x)$ which is equal
to 1 if the sample $(x, y)$ reaches the $l$-th node of the $m$-th tree, 0 otherwise.
L1-based compression of random forest model (II) 
The node indicator functions $1_{m,l}(x)$ may be used to lift the input space $\mathcal{X}$
towards its induced feature space $\mathcal{Z}$:
$$z(x) = \left(1_{1,1}(x), \ldots, 1_{1,N_1}(x), \ldots, 1_{M,1}(x), \ldots, 1_{M,N_M}(x)\right).$$
[Figure: example for one sample $x_s$ on the three-tree ensemble (nodes 1,1 to 3,5).]
L1-based compression of random forest model (III) 
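A variable selection method (regularization with L1-norm) is applied on the
induced space $\mathcal{Z}$ to compress the tree ensemble using the solution of
$$\left(\beta_j^*(t)\right)_{j=0}^{q} = \arg\min_{\beta} \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{q} \beta_j z_j(x_i)\Big)^2 \quad \text{s.t.} \quad \sum_{j=1}^{q} |\beta_j| \le t.$$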
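The slides do not say which solver is used for this problem. As one possible sketch, the code below runs cyclic coordinate descent on the penalised (Lagrangian) form, $\tfrac{1}{2}\sum_i (y_i - \beta_0 - \sum_j \beta_j z_{ij})^2 + \lambda \sum_j |\beta_j|$, which yields the same solutions as the constrained form for a matching pair $(\lambda, t)$. The function names, the toy matrix `Z` and the penalty values are assumptions for illustration.

```python
def soft_threshold(value, lam):
    """Shrink value towards zero by lam (the L1 proximal step)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_cd(Z, y, lam, n_iter=200):
    """Coordinate descent for the penalised Lasso with a free intercept."""
    n, q = len(Z), len(Z[0])
    beta = [0.0] * q
    b0 = sum(y) / n
    for _ in range(n_iter):
        for j in range(q):
            col = [Z[i][j] for i in range(n)]
            norm = sum(c * c for c in col)
            if norm == 0.0:
                continue  # a column of zeros carries no information
            # Correlation of column j with the residual that ignores beta_j.
            rho = sum(
                col[i] * (y[i] - b0 - sum(beta[k] * Z[i][k] for k in range(q) if k != j))
                for i in range(n)
            )
            beta[j] = soft_threshold(rho, lam) / norm
        # The intercept is not penalised: set it to the mean residual.
        b0 = sum(y[i] - sum(beta[k] * Z[i][k] for k in range(q)) for i in range(n)) / n
    return b0, beta

# Toy indicator matrix: 4 samples, q = 2 node indicators.
Z = [[1, 0], [0, 1], [1, 0], [0, 1]]
y = [1.0, 3.0, 1.0, 3.0]
print(lasso_cd(Z, y, lam=0.5))    # shrunken but non-zero coefficients
print(lasso_cd(Z, y, lam=100.0))  # a large penalty zeroes every coefficient
```

A small $t$ in the constrained form plays the role of a large `lam` here: more coefficients are driven exactly to zero.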
Pruning: a test node is deleted if all its descendants (including the test
node itself) correspond to $\beta_j^*(t) = 0$.
Overall assessment on 3 datasets 
Dataset      Error (ET)   Error (rET)   Complexity (ET)   Complexity (rET)   ET/rET
Friedman1    0.19587      0.18593       29900             885                34
Two-norm     0.04177      0.06707       4878              540                9
SEFTi        0.86159      0.84131       39436             2055               19
ET: Extra trees; rET: L1-based compression of ET.
Table: Parameters of the Extra-Tree method: $M = 100$, $K = p$, $n_{\min} = 1$ on
Friedman1 and Two-norm, $n_{\min} = 10$ on SEFTi.
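The ET/rET column is the ratio of the two complexity columns, rounded to the nearest integer. The short check below recomputes it from the table values; the variable names are only illustrative.

```python
# (ET complexity, rET complexity) as reported in the table.
complexity = {
    "Friedman1": (29900, 885),
    "Two-norm": (4878, 540),
    "SEFTi": (39436, 2055),
}

ratios = {name: et / ret for name, (et, ret) in complexity.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} -> {round(ratio)}")
# Friedman1: 33.79 -> 34
# Two-norm: 9.03 -> 9
# SEFTi: 19.19 -> 19
```

On Friedman1 and SEFTi the compressed model also has a slightly lower error, whereas on Two-norm the error rises from 0.04177 to 0.06707.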
An increase of $t$ decreases the error of rET until $t = 3$ with drastic pruning
[Figure: (a) Estimated risk and (b) Complexity. Friedman1: $M = 100$, $K = p = 10$ and $n_{\min} = 1$.]
Managing complexity in the extra tree method 
Bound M 
Restrict the size M of the tree based ensemble. 
Pre-pruning 
Pre-pruning reduces the complexity of tree-based methods by imposing a
condition to split a node, e.g.
- a minimum number of samples $n_{\min}$ in order to split,
- a minimum decrease of an impurity measure,
- ...
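The effect of the first pre-pruning condition can be illustrated with a toy tree grower. This is a sketch under simplifying assumptions, not the Extra-Trees algorithm itself: it uses a single input variable, draws the cut-point uniformly between the minimum and maximum of the node, and ignores the outputs. The function name, the data and the seeds are hypothetical.

```python
import random

def count_test_nodes(values, nmin, rng):
    """Grow one randomised 1-D tree and return its number of test nodes.

    A node is split only if it holds at least nmin samples (pre-pruning).
    """
    lo, hi = min(values), max(values)
    if len(values) < nmin or lo == hi:
        return 0  # too few samples, or nothing left to separate
    cut = rng.uniform(lo, hi)
    left = [v for v in values if v < cut]
    right = [v for v in values if v >= cut]
    if not left or not right:
        return 0  # degenerate cut-point, keep the node as a leaf
    return 1 + count_test_nodes(left, nmin, rng) + count_test_nodes(right, nmin, rng)

data_rng = random.Random(0)
values = [data_rng.random() for _ in range(200)]

# Same data, increasingly strict stopping condition.
counts = {
    nmin: count_test_nodes(values, nmin, random.Random(1))
    for nmin in (1, 10, 50, 201)
}
print(counts)
```

With `nmin` larger than the sample size the root is never split, so the tree has no test node at all. With `nmin = 1` the tree is grown until the samples are separated.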

The accuracy and complexity of an rET model do not depend on $n_{\min}$, for $n_{\min}$ small enough ($n_{\min}$ up to about 10)
[Figure: (c) Estimated risk and (d) Complexity. Friedman1: $M = 100$, $K = p = 10$ and $t = t_{cv}$.]
After variance reduction has stabilized ($M \simeq 100$), further increasing $M$ keeps enhancing the accuracy of the rET model without increasing complexity
[Figure: (e) Estimated risk and (f) Complexity. Friedman1: $n_{\min} = 10$, $K = p = 10$ and $t = t_{cv}$.]
Conclusion & perspectives
1. Drastic pruning while preserving accuracy.
2. Strong compressibility of the tree ensemble suggests that it could be possible to design novel algorithms suited to very high dimensional input spaces.
3. Future research will target a similar compression ratio without using the complete set of node indicator functions of the forest model.
Overall assessment on 3 datasets
Dataset      Error (ET)   Error (rET)   Error (Lasso)   Complexity (ET)   Complexity (rET)   Complexity (Lasso)
Friedman1    0.19587      0.18593       0.282441        29900             885                4
Two-norm     0.04177      0.06707       0.033500        4878              540                20
SEFTi        0.86159      0.84131       0.988031        39436             2055               14
ET: Extra trees; rET: L1-based compression of ET.
Table: Overall assessment (parameters of the Extra-Tree method: $M = 100$, $K = p$, $n_{\min} = 1$ on Friedman1 and Two-norm, $n_{\min} = 10$ on SEFTi).