SlideShare a Scribd company logo
1 of 14
Download to read offline
Bias-variance decomposition 
in Random Forests 
Gilles Louppe @glouppe 
Paris ML S02E04, December 8, 2014 
1 / 14
Motivation 
In supervised learning, combining the predictions of several 
randomized models often achieves better results than a single 
non-randomized model. 
Why ? 
2 / 14
Supervised learning 
 The inputs are random variables X = X1, ..., Xp ; 
 The output is a random variable Y . 
 Data comes as a
nite learning set 
L = f(xi , yi )ji = 0, . . . ,N - 1g, 
where xi 2 X = X1  ...  Xp and yi 2 Y are randomly drawn 
from PX,Y . 
 The goal is to
nd a model 'L : X7! Y minimizing 
Err ('L) = EX,Y fL(Y ,'L(X))g. 
3 / 14
Performance evaluation 
Classi
cation 
 Symbolic output (e.g., Y = fyes, nog) 
 Zero-one loss 
L(Y ,'L(X)) = 1(Y6= 'L(X)) 
Regression 
 Numerical output (e.g., Y = R) 
 Squared error loss 
L(Y ,'L(X)) = (Y - 'L(X))2 
4 / 14
Decision trees 
0.7 
0.5 
X1 
X2 
t5 t3 t4 
푡2 
푡1 
푋1 ≤ 0.7 
푡3 
푡4 푡5 
풙 
푝(푌 = 푐|푋 = 풙) 
Split node 
≤  Leaf node 
푋2 ≤ 0.5 ≤  
t 2 ' : nodes of the tree ' 
Xt : split variable at t 
vt 2 R : split threshold at t 
'(x) = arg maxc2Y p(Y = cjX = x) 
5 / 14
Bias-variance decomposition in regression 
Theorem. For the squared error loss, the bias-variance 
decomposition of the expected generalization error at X = x is 
ELfErr ('L(x))g = noise(x) + bias2(x) + var(x) 
where 
noise(x) = Err ('B(x)), 
bias2(x) = ('B(x) - ELf'L(x)g)2, 
var(x) = ELf(ELf'L(x)g - 'L(x))2g. 
6 / 14
Bias-variance decomposition 
7 / 14
Diagnosing the generalization error of a decision tree 
 (Residual error : Lowest achievable error, independent of 'L.) 
 Bias : Decision trees usually have low bias. 
 Variance : They often suer from high variance. 
 Solution : Combine the predictions of several randomized trees 
into a single model. 
8 / 14
Random forests 
풙 
휑1 휑푀 
푝휑1 (푌 = 푐|푋 = 풙) 
… 
푝휑푚 (푌 = 푐|푋 = 풙) 
Σ 
푝휓(푌 = 푐|푋 = 풙) 
Randomization 
 Bootstrap samples  Random selection of K 6 p split variables g Random Forests  Random selection of the threshold g Extra-Trees 
9 / 14
Bias-variance decomposition (cont.) 
Theorem. For the squared error loss, the bias-variance 
decomposition of the expected generalization error 
ELfErr ( L,1,...,M (x))g at X = x of an ensemble of M 
randomized models 'L,m is 
ELfErr ( L,1,...,M (x))g = noise(x) + bias2(x) + var(x), 
where 
noise(x) = Err ('B(x)), 
bias2(x) = ('B(x) - EL,f'L,(x)g)2, 
var(x) = (x)2 
L,(x) + 
1 - (x) 
M 
2 
L,(x). 
and where (x) is the Pearson correlation coecient between the 
predictions of two randomized trees built on the same learning set. 
10 / 14
Interpretation of (x) (Louppe, 2014) 
Theorem. (x) = 
VLfEjLf'L,(x)gg 
VLfEjLf'L,(x)gg+ELfVjLf'L,(x)gg 
In other words, it is the ratio between 
 the variance due to the learning set and 
 the total variance, accounting for random eects due to both 
the learning set and the random perburbations. 
(x) ! 1 when variance is mostly due to the learning set ; 
(x) ! 0 when variance is mostly due to the random 
perturbations ; 
(x)  0. 
11 / 14

More Related Content

What's hot

Knn Algorithm presentation
Knn Algorithm presentationKnn Algorithm presentation
Knn Algorithm presentationRishavSharma112
 
Gram-Schmidt process linear algbera
Gram-Schmidt process linear algberaGram-Schmidt process linear algbera
Gram-Schmidt process linear algberaPulakKundu1
 
Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd...
Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd...Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd...
Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd...nszakir
 
Mean Squared Error (MSE) of an Estimator
Mean Squared Error (MSE) of an EstimatorMean Squared Error (MSE) of an Estimator
Mean Squared Error (MSE) of an EstimatorSuruchi Somwanshi
 
Probability Density Functions
Probability Density FunctionsProbability Density Functions
Probability Density Functionsguestb86588
 
Lecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation MaximizationLecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation Maximizationbutest
 
Solution of non-linear equations
Solution of non-linear equationsSolution of non-linear equations
Solution of non-linear equationsZunAib Ali
 
Health care analytics
Health care analyticsHealth care analytics
Health care analyticsGaurav Dubey
 
Electromagnetic analysis using FDTD method, Biological Effects Of EM Spectrum
Electromagnetic analysis using FDTD method, Biological Effects Of EM SpectrumElectromagnetic analysis using FDTD method, Biological Effects Of EM Spectrum
Electromagnetic analysis using FDTD method, Biological Effects Of EM SpectrumSuleyman Demirel University
 
Bernoullis Random Variables And Binomial Distribution
Bernoullis Random Variables And Binomial DistributionBernoullis Random Variables And Binomial Distribution
Bernoullis Random Variables And Binomial Distributionmathscontent
 
Calculating a two sample z test by hand
Calculating a two sample z test by handCalculating a two sample z test by hand
Calculating a two sample z test by handKen Plummer
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-LearningKuppusamy P
 
Logistic Regression in Case-Control Study
Logistic Regression in Case-Control StudyLogistic Regression in Case-Control Study
Logistic Regression in Case-Control StudySatish Gupta
 

What's hot (20)

Knn Algorithm presentation
Knn Algorithm presentationKnn Algorithm presentation
Knn Algorithm presentation
 
Gram-Schmidt process linear algbera
Gram-Schmidt process linear algberaGram-Schmidt process linear algbera
Gram-Schmidt process linear algbera
 
Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd...
Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd...Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd...
Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd...
 
Mean Squared Error (MSE) of an Estimator
Mean Squared Error (MSE) of an EstimatorMean Squared Error (MSE) of an Estimator
Mean Squared Error (MSE) of an Estimator
 
Bernoulli distribution
Bernoulli distributionBernoulli distribution
Bernoulli distribution
 
Probability Density Functions
Probability Density FunctionsProbability Density Functions
Probability Density Functions
 
Lecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation MaximizationLecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation Maximization
 
Solution of non-linear equations
Solution of non-linear equationsSolution of non-linear equations
Solution of non-linear equations
 
Bayesian network
Bayesian networkBayesian network
Bayesian network
 
PSO
PSOPSO
PSO
 
Health care analytics
Health care analyticsHealth care analytics
Health care analytics
 
Electromagnetic analysis using FDTD method, Biological Effects Of EM Spectrum
Electromagnetic analysis using FDTD method, Biological Effects Of EM SpectrumElectromagnetic analysis using FDTD method, Biological Effects Of EM Spectrum
Electromagnetic analysis using FDTD method, Biological Effects Of EM Spectrum
 
Bernoullis Random Variables And Binomial Distribution
Bernoullis Random Variables And Binomial DistributionBernoullis Random Variables And Binomial Distribution
Bernoullis Random Variables And Binomial Distribution
 
An Overview of Simple Linear Regression
An Overview of Simple Linear RegressionAn Overview of Simple Linear Regression
An Overview of Simple Linear Regression
 
Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
 
Parameter estimation
Parameter estimationParameter estimation
Parameter estimation
 
Calculating a two sample z test by hand
Calculating a two sample z test by handCalculating a two sample z test by hand
Calculating a two sample z test by hand
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-Learning
 
Multiple Linear Regression
Multiple Linear Regression Multiple Linear Regression
Multiple Linear Regression
 
Logistic Regression in Case-Control Study
Logistic Regression in Case-Control StudyLogistic Regression in Case-Control Study
Logistic Regression in Case-Control Study
 

Similar to Bias-variance decomposition in Random Forests

Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsGilles Louppe
 
The dual geometry of Shannon information
The dual geometry of Shannon informationThe dual geometry of Shannon information
The dual geometry of Shannon informationFrank Nielsen
 
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...Frank Nielsen
 
Numarical values
Numarical valuesNumarical values
Numarical valuesAmanSaeed11
 
Numarical values highlighted
Numarical values highlightedNumarical values highlighted
Numarical values highlightedAmanSaeed11
 
IVR - Chapter 1 - Introduction
IVR - Chapter 1 - IntroductionIVR - Chapter 1 - Introduction
IVR - Chapter 1 - IntroductionCharles Deledalle
 
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...Christian Robert
 
Understanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to PracticeUnderstanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to PracticeGilles Louppe
 
Density theorems for Euclidean point configurations
Density theorems for Euclidean point configurationsDensity theorems for Euclidean point configurations
Density theorems for Euclidean point configurationsVjekoslavKovac1
 
Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...
Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...
Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...Frank Nielsen
 
02-Random Variables.ppt
02-Random Variables.ppt02-Random Variables.ppt
02-Random Variables.pptAkliluAyele3
 
Kernels and Support Vector Machines
Kernels and Support Vector  MachinesKernels and Support Vector  Machines
Kernels and Support Vector MachinesEdgar Marca
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodFrank Nielsen
 

Similar to Bias-variance decomposition in Random Forests (20)

Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptions
 
1 - Linear Regression
1 - Linear Regression1 - Linear Regression
1 - Linear Regression
 
Estimationtheory2
Estimationtheory2Estimationtheory2
Estimationtheory2
 
The dual geometry of Shannon information
The dual geometry of Shannon informationThe dual geometry of Shannon information
The dual geometry of Shannon information
 
ma112011id535
ma112011id535ma112011id535
ma112011id535
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-...
 
Numarical values
Numarical valuesNumarical values
Numarical values
 
Numarical values highlighted
Numarical values highlightedNumarical values highlighted
Numarical values highlighted
 
IVR - Chapter 1 - Introduction
IVR - Chapter 1 - IntroductionIVR - Chapter 1 - Introduction
IVR - Chapter 1 - Introduction
 
Distributions
DistributionsDistributions
Distributions
 
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
 
QMC: Operator Splitting Workshop, Progressive Decoupling of Linkages in Optim...
QMC: Operator Splitting Workshop, Progressive Decoupling of Linkages in Optim...QMC: Operator Splitting Workshop, Progressive Decoupling of Linkages in Optim...
QMC: Operator Splitting Workshop, Progressive Decoupling of Linkages in Optim...
 
Understanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to PracticeUnderstanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to Practice
 
Density theorems for Euclidean point configurations
Density theorems for Euclidean point configurationsDensity theorems for Euclidean point configurations
Density theorems for Euclidean point configurations
 
Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...
Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...
Voronoi diagrams in information geometry:  Statistical Voronoi diagrams and ...
 
02-Random Variables.ppt
02-Random Variables.ppt02-Random Variables.ppt
02-Random Variables.ppt
 
Kernels and Support Vector Machines
Kernels and Support Vector  MachinesKernels and Support Vector  Machines
Kernels and Support Vector Machines
 
Lecture notes
Lecture notes Lecture notes
Lecture notes
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihood
 

Recently uploaded

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Recently uploaded (20)

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Bias-variance decomposition in Random Forests

  • 1. Bias-variance decomposition in Random Forests Gilles Louppe @glouppe Paris ML S02E04, December 8, 2014 1 / 14
  • 2. Motivation In supervised learning, combining the predictions of several randomized models often achieves better results than a single non-randomized model. Why ? 2 / 14
  • 3. Supervised learning The inputs are random variables X = X1, ..., Xp ; The output is a random variable Y . Data comes as a
  • 4. nite learning set L = f(xi , yi )ji = 0, . . . ,N - 1g, where xi 2 X = X1 ... Xp and yi 2 Y are randomly drawn from PX,Y . The goal is to
  • 5. nd a model 'L : X7! Y minimizing Err ('L) = EX,Y fL(Y ,'L(X))g. 3 / 14
  • 7. cation Symbolic output (e.g., Y = fyes, nog) Zero-one loss L(Y ,'L(X)) = 1(Y6= 'L(X)) Regression Numerical output (e.g., Y = R) Squared error loss L(Y ,'L(X)) = (Y - 'L(X))2 4 / 14
  • 8. Decision trees 0.7 0.5 X1 X2 t5 t3 t4 푡2 푡1 푋1 ≤ 0.7 푡3 푡4 푡5 풙 푝(푌 = 푐|푋 = 풙) Split node ≤ Leaf node 푋2 ≤ 0.5 ≤ t 2 ' : nodes of the tree ' Xt : split variable at t vt 2 R : split threshold at t '(x) = arg maxc2Y p(Y = cjX = x) 5 / 14
  • 9. Bias-variance decomposition in regression Theorem. For the squared error loss, the bias-variance decomposition of the expected generalization error at X = x is ELfErr ('L(x))g = noise(x) + bias2(x) + var(x) where noise(x) = Err ('B(x)), bias2(x) = ('B(x) - ELf'L(x)g)2, var(x) = ELf(ELf'L(x)g - 'L(x))2g. 6 / 14
  • 11. Diagnosing the generalization error of a decision tree (Residual error : Lowest achievable error, independent of 'L.) Bias : Decision trees usually have low bias. Variance : They often suer from high variance. Solution : Combine the predictions of several randomized trees into a single model. 8 / 14
  • 12. Random forests 풙 휑1 휑푀 푝휑1 (푌 = 푐|푋 = 풙) … 푝휑푚 (푌 = 푐|푋 = 풙) Σ 푝휓(푌 = 푐|푋 = 풙) Randomization Bootstrap samples Random selection of K 6 p split variables g Random Forests Random selection of the threshold g Extra-Trees 9 / 14
  • 13. Bias-variance decomposition (cont.) Theorem. For the squared error loss, the bias-variance decomposition of the expected generalization error ELfErr ( L,1,...,M (x))g at X = x of an ensemble of M randomized models 'L,m is ELfErr ( L,1,...,M (x))g = noise(x) + bias2(x) + var(x), where noise(x) = Err ('B(x)), bias2(x) = ('B(x) - EL,f'L,(x)g)2, var(x) = (x)2 L,(x) + 1 - (x) M 2 L,(x). and where (x) is the Pearson correlation coecient between the predictions of two randomized trees built on the same learning set. 10 / 14
  • 14. Interpretation of (x) (Louppe, 2014) Theorem. (x) = VLfEjLf'L,(x)gg VLfEjLf'L,(x)gg+ELfVjLf'L,(x)gg In other words, it is the ratio between the variance due to the learning set and the total variance, accounting for random eects due to both the learning set and the random perburbations. (x) ! 1 when variance is mostly due to the learning set ; (x) ! 0 when variance is mostly due to the random perturbations ; (x) 0. 11 / 14
  • 15. Diagnosing the generalization error of random forests Bias : Identical to the bias of a single randomized tree. Variance : var(x) = (x)2 L,(x) + 1-(x) M 2 L,(x) As M ! 1, var(x) ! (x)2 L,(x) The stronger the randomization, (x) ! 0, var(x) ! 0. The weaker the randomization, (x) ! 1, var(x) ! 2 L,(x) Bias-variance trade-o. Randomization increases bias but makes it possible to reduce the variance of the corresponding ensemble model through averaging. The crux of the problem is to
  • 16. nd the right trade-o. Tips : tune max features in Random Forests. 12 / 14
  • 18. cation Theorem. For the zero-one loss and binary classi
  • 19. cation, the expected generalization error ELfErr ('L(x))g at X = x decomposes as follows : ELfErr ('L(x))g = P('B(x)6= Y ) + ( 0.5 - ELfbp p bp L(Y = 'B(x))g VLfL(Y = 'B(x))g )(2P('B(x) = Y ) - 1) For ELfbp L(Y = 'B(x)g 0.5, VLfbp L(Y = 'B(x))g ! 0 makes ! 0 and the expected generalization error tends to the error of the Bayes model. Conversely, for ELfbp L(Y = 'B(x)g 0.5, VLfbp L(Y = 'B(x))g ! 0 makes ! 1 and the error is maximal. 13 / 14