Prof. Pier Luca Lanzi
Regression
Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
Simple Linear Regression
It all starts from some data …
The training data points
Can we predict the value of y from x?
Training data points with a simple linear regression model
But the model we built will not be used on the same data …
The model learned from the training data, applied to new data
regression = model building + model usage
Prof. Pier Luca Lanzi
How Do We Evaluate a Regression Model?
• Given N examples, pairs (xi, yi), linear regression computes a model y = w0 + w1 x
• So that for each point, the prediction is ŷi = w0 + w1 xi
• We evaluate the model by computing the Residual Sum of Squares (RSS): RSS(w0, w1) = Σi (yi − (w0 + w1 xi))²
The goal of linear regression is thus to find the weights that minimize RSS
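A minimal sketch of these two steps in Python, using synthetic data as a hypothetical stand-in for the training points above: fit y = w0 + w1 x by least squares, then evaluate the fit with RSS.

```python
import numpy as np

# Synthetic data: a hypothetical stand-in for the training points above
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=50)

# Closed-form least-squares estimates for the model y = w0 + w1 * x
w1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
w0 = y.mean() - w1 * x.mean()

# Residual Sum of Squares: RSS(w0, w1) = sum_i (y_i - (w0 + w1 * x_i))^2
rss = np.sum((y - (w0 + w1 * x)) ** 2)
print(f"w0 = {w0:.3f}, w1 = {w1:.3f}, RSS = {rss:.3f}")
```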
How Do We Compute the Best Weights?
• Approach 1
§Set the gradient of RSS(w0, w1) to zero and solve, but the approach is infeasible in practice
• Approach 2
§Apply gradient descent (sketched below)
§If η is large, we make large steps but might not converge; if η is small, we might be very slow. Typically, η adapts over time, e.g., η(t) = α/t or η(t) = α/sqrt(t)
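A minimal gradient-descent sketch for simple linear regression, assuming the decaying step size η(t) = α/sqrt(t) mentioned above; averaging the gradient over the N points is an extra choice made here for numerical stability.

```python
import numpy as np

def fit_gradient_descent(x, y, alpha=0.1, n_iters=2000):
    """Fit y = w0 + w1*x by gradient descent on RSS, with eta(t) = alpha/sqrt(t)."""
    n = len(x)
    w0, w1 = 0.0, 0.0
    for t in range(1, n_iters + 1):
        residuals = y - (w0 + w1 * x)
        grad_w0 = -2.0 * np.sum(residuals) / n      # dRSS/dw0, averaged over N
        grad_w1 = -2.0 * np.sum(residuals * x) / n  # dRSS/dw1, averaged over N
        eta = alpha / np.sqrt(t)                    # decaying step size
        w0 -= eta * grad_w0
        w1 -= eta * grad_w1
    return w0, w1
```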
The RSS Gradient
• For simple linear regression, the gradient of RSS has only two components, one for w0 and one for w1:
∂RSS/∂w0 = −2 Σi (yi − (w0 + w1 xi))
∂RSS/∂w1 = −2 Σi (yi − (w0 + w1 xi)) xi
Multiple Linear Regression
Input variable: LSTAT - % lower status of the population
Output variable: MEDV - Median value of owner-occupied homes in $1000's
Can we predict the property value using other variables?
Multiple Linear Regression
• Given a set of examples associating LSTATi values to MEDVi values, simple linear regression finds a function f(.) such that MEDVi = f(LSTATi) + εi
• where εi is the error to be minimized
• Typically, we assume a model and fit the model to the data
• With a linear model, we assume that f(.) is computed as f(x) = w0 + w1 x
• A polynomial model would fit the data points with a function of the form f(x) = w0 + w1 x + w2 x² + … + wp x^p (see the sketch below)
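As an illustration of fitting linear and polynomial models, a sketch using NumPy's polyfit; lstat and medv are hypothetical array names standing in for the two housing columns above.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least-squares fit of f(x) = w0 + w1*x + ... + wp*x^p."""
    w = np.polyfit(x, y, degree)     # coefficients, highest degree first
    y_hat = np.polyval(w, x)         # model predictions
    rss = np.sum((y - y_hat) ** 2)   # residual sum of squares
    return w, rss

# Usage (assuming lstat and medv are loaded as 1-D NumPy arrays):
# w_linear, rss_linear = fit_polynomial(lstat, medv, 1)
# w_cubic, rss_cubic = fit_polynomial(lstat, medv, 3)
```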
Multiple Linear Regression
• Given D input variables, multiple linear regression assumes that the output y can be computed as y = w0 + w1 x1 + w2 x2 + … + wD xD
• The model cost is computed using the residual sum of squares, RSS(w), as RSS(w) = Σi (yi − ŷi)², where ŷi is the model prediction for xi
Coefficient of Determination R²
• Total sum of squares: TSS = Σi (yi − ȳ)²
• Coefficient of determination: R² = 1 − RSS/TSS
• R² measures how well the regression line approximates the real data points. When R² is 1, the regression line perfectly fits the data.
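A direct translation of the two formulas above into Python (assuming y and the predictions y_hat are NumPy arrays):

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: R^2 = 1 - RSS/TSS."""
    rss = np.sum((y - y_hat) ** 2)        # residual sum of squares
    tss = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    return 1.0 - rss / tss
```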
Multiple Linear Regression: General Formulation
• In general, given a set of input variables x, a set of N examples (xi, yi), and a set of D features hj computed from the input variables xi, multiple linear regression assumes a model y = Σj wj hj(x) + ε (typically h0(x) = 1, so that w0 acts as the intercept)
• hj(.) identifies a variable derived from the original inputs
• hj(.) could be the squared value of an existing variable, a trigonometric function, the age given the date of birth, etc.
Multiple Linear Regression: General Formulation
• Multiple linear regression aims at minimizing RSS(w) = Σi (yi − Σj wj hj(xi))²
• For this purpose, it can apply gradient descent to update the weights as wj ← wj + 2η Σi hj(xi)(yi − ŷi(w))
Multiple Linear Regression with Gradient Descent
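A minimal sketch of what such an implementation could look like; H is assumed to be the N × D feature matrix, with a column of ones for the intercept.

```python
import numpy as np

def fit_multiple_linear_regression(H, y, eta=1e-3, n_iters=5000):
    """Batch gradient descent on RSS(w) for the model y ~ H @ w.

    H: N x D feature matrix whose columns are the features h_j(x),
    including a column of ones for the intercept w0."""
    n, d = H.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        residuals = y - H @ w
        grad = -2.0 * (H.T @ residuals) / n   # RSS gradient, averaged over N
        w -= eta * grad
    return w
```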
Model Evaluation
Model Evaluation
• Models should be evaluated using data that have not been used
to build the model itself
• For example, would it be feasible to evaluate students using exactly the same problems solved in class?
• The available data must be split between training and test
• Training data will be used to build the model
• Test data will be used to evaluate the model performance
Holdout Evaluation
• Reserves a certain amount for testing and uses the remainder for
training
§Too small training sets might result in poor weight estimation
§Too small test sets might result in a poor estimation of future
performance
• Typically,
§Reserve ½ for training and ½ for testing
§Reserve 2/3 for training and 1/3 for testing (see the sketch below)
• For small or “unbalanced” datasets, samples might not be
representative
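A holdout split is one call in scikit-learn; here X and y are hypothetical arrays standing in for a real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# X, y: hypothetical data standing in for the housing features and MEDV target
X = np.random.rand(506, 3)
y = np.random.rand(506)

# Reserve 2/3 for training and 1/3 for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=42)
print(len(X_train), len(X_test))  # roughly 2/3 and 1/3 of the rows
```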
Given the original dataset, we split the data into 2/3 train and 1/3 test and then apply multiple linear regression using polynomials of increasing degree. The plots show how RSS and R² vary.
Holdout Evaluation using the Housing Data
• Given the original dataset, we split the data into 2/3 train and 1/3
test and then apply multiple linear regression using polynomials of
increasing degree.
• RSS initially decreases as polynomials better approximate the data, but then higher-degree polynomials overfit
• The same is shown by the R² statistic
Cross-Validation
• First step
§Data is split into k subsets of equal size
• Second step
§Each subset in turn is used for testing and the remainder for training
• This is called k-fold cross-validation and avoids overlapping test sets
• Often the subsets are stratified before cross-validation is performed
• The error estimates are averaged to yield an overall error estimate
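A sketch of ten-fold cross-validation with scikit-learn; the data and the R² scoring choice are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# X, y: hypothetical data, as in the holdout example above
X = np.random.rand(506, 3)
y = np.random.rand(506)

# 10-fold cross-validation: each fold is used once for testing,
# and the ten R^2 scores are averaged into one overall estimate
scores = cross_val_score(LinearRegression(), X, y, cv=10, scoring="r2")
print(f"mean R^2 over 10 folds: {scores.mean():.3f}")
```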
Ten-fold Cross-Validation
[Diagram: the data is split into ten folds numbered 1 to 10; in each round, one fold is held out for testing and the remaining nine are used for training, producing the performance estimates p1, p2, …, p10]
The final performance is computed as the average of the pi
Cross-Validation
• Standard method for evaluation: stratified ten-fold cross-validation
• Why ten? Extensive experiments have shown that this is the best choice to get an accurate estimate
• Stratification reduces the estimate's variance
• Even better: repeated stratified cross-validation
• E.g., ten-fold cross-validation is repeated ten times and results are averaged (reduces the variance)
• Other approaches also appear to be robust, e.g., 5×2 cross-validation
As we increase the degree of the fitting polynomial, the cross-validation error starts to increase, because the model starts to overfit! Best performance is obtained with the 5th-degree polynomial.
Fitting using the 5th degree polynomial
Overfitting
What is Overfitting?
Very good performance on the training set
Terrible performance on the test set
In regression, overfitting is often associated with large weight estimates
Add to the usual cost (RSS) a term that penalizes large weights to avoid overfitting
Total cost = Measure of Fit + Magnitude of Coefficients
Ridge Regression (L2 Regularization)
• Minimizes the cost function Cost(w) = RSS(w) + α Σj wj²
• If α is zero, the cost is exactly the same as before; if α is infinite, then the only solution corresponds to having all the weights equal to 0
• In the gradient descent algorithm, the update for weight j becomes wj ← (1 − 2ηα) wj + 2η Σi hj(xi)(yi − ŷi(w)) (sketched below)
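A sketch of ridge regression on polynomial features with scikit-learn, whose Ridge estimator also names the regularization strength alpha; the degree and the value 0.1 are illustrative.

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Degree-10 polynomial features with an L2 penalty on the weights
ridge = make_pipeline(PolynomialFeatures(degree=10), Ridge(alpha=0.1))
# ridge.fit(x.reshape(-1, 1), y)   # x, y: 1-D example data, assumed given
```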
Lasso Regression (L1 Regularization)
• Minimizes the cost function Cost(w) = RSS(w) + α Σj |wj|
• If α is zero, the cost is exactly the same as before; if α is infinite, then the only solution corresponds to having all the weights equal to 0
• In gradient descent, using the subgradient of |wj|, weight j is modified as wj ← wj + 2η Σi hj(xi)(yi − ŷi(w)) − ηα sign(wj) (sketched below)
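The corresponding Lasso sketch; after fitting, many coefficients come out exactly zero, the sparsity discussed below.

```python
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# L1 penalty; many fitted weights end up exactly zero (sparse solution)
lasso = make_pipeline(PolynomialFeatures(degree=10),
                      Lasso(alpha=0.01, max_iter=50_000))
# lasso.fit(x.reshape(-1, 1), y)            # x, y: example data, assumed given
# print(lasso.named_steps["lasso"].coef_)   # inspect which weights were zeroed
```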
Example
Simple example data generated using a trigonometric function.
Applying multiple linear regression
Absolute value of the largest weight computed using linear regression with polynomials of increasing degree
Ridge Regression
Absolute value of the largest weight computed using ridge regression with polynomials of increasing degree
Applying ridge regression with polynomials of increasing degrees
Computed weight values when applying ridge regression with a polynomial of degree 10 and different values of α
Lasso
Absolute value of the largest weight computed using Lasso with polynomials of increasing degree
Computed weight values when applying Lasso with a polynomial of degree 10 and different values of α
Lasso tends to zero out less important features and produces sparser solutions
Basically, by penalizing large weights it also performs feature selection
Choosing α
Available Data: three ways to split it
• Training (model building) + Testing (model evaluation)
• Training (model building) + Validation (select α) + Testing (model evaluation)
• Training & α selection (select the α with the smallest cross-validation error, then train) + Testing (model evaluation)
Selecting the Best α
• To select the best value of α, we cannot use the test set since it is going to be used for evaluating the final model (which uses α)
• We need to reserve part of the training data to evaluate possible candidate values of α and to select the best one
• If we have enough data, we can extract a validation set from the training data which will be used to select α
• If we don't have enough data, we should select α by applying k-fold cross-validation over the training data, choosing the α corresponding to the lowest average cost over the k folds (see the sketch below)
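A sketch of the k-fold procedure from the last bullet; X_train and y_train are assumed given and the candidate grid is illustrative (scikit-learn's LassoCV automates the same search).

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

# Candidate regularization strengths (an illustrative grid)
candidate_alphas = [1.0, 0.1, 0.01, 0.001]

def cv_mse(alpha):
    """Average 10-fold cross-validated MSE on the training data."""
    scores = cross_val_score(Lasso(alpha=alpha, max_iter=50_000),
                             X_train, y_train, cv=10,
                             scoring="neg_mean_squared_error")
    return -scores.mean()

# Pick the alpha with the lowest average cross-validated error
best_alpha = min(candidate_alphas, key=cv_mse)
print("best alpha:", best_alpha)
```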
Applying Lasso with an α of 0.01 and polynomials of different degrees
Applying Lasso with different values of α – Best α is 0.01
Summary
Linear Regression Algorithms
• The goal is to minimize the residual sum of squares (RSS)
• Exact methods
§Compute the set of weights that minimizes RSS
• Gradient descent (batch and stochastic)
§Start with a random set of weights and update them based on
the direction that minimizes RSS
• Ridge regression/Lasso
§Compute the cost also using the magnitude of the coefficients
§The larger the coefficients, the more likely we are to be overfitting
Assignments
• Check the Python notebooks discussing simple and multiple linear
regression, Lasso and Ridge Regression