Avoid Overfitting with Regularization
By
Ahmed Fawzy Gad
Information Technology (IT) Department
Faculty of Computers and Information (FCI)
Menoufia University
Egypt
ahmed.fawzy@ci.menofia.edu.eg
14-Jan-2018
Have you ever created a machine learning model that fits the training samples perfectly but gives very bad predictions
on unseen samples? Have you ever wondered why this happens? This article explains overfitting, which is one of the reasons
for poor predictions on unseen samples. A regularization technique is then presented in simple steps, using a regression
example, to make it clear how to avoid overfitting.
The focus of machine learning (ML) is to train an algorithm with training data in order to create a model that is able to make
correct predictions for unseen data (test data). To create a classifier, for example, a human expert starts by
collecting the data required to train the ML algorithm. The human is responsible for finding the best types of features to
represent each class, that is, features capable of discriminating between the different classes. Such features are used to train
the ML algorithm. Suppose we are to build an ML model that classifies images as containing cats or not using the following
training data.
The first question we have to answer is “what are the best features to use?”. This is a critical question in ML: the better
the features, the better the predictions the trained ML model makes, and vice versa. Let us try to visualize such images
and extract some features that are representative of cats. Some of the representative features may be the existence of
two dark eye pupils and two ears with a diagonal direction. Assume that we extracted such features, somehow, from
the above training images and created a trained ML model. Such a model can work with a wide range of cat images
because the features used exist in most cats. We can test the model using some unseen data such as the following.
Assume that the classification accuracy on the test data is x%.
One may want to increase the classification accuracy. The first thing to think of is using more features than the two
used previously, because the more discriminative features we use, the better the accuracy. By inspecting the
training data again, we can find more features, such as the overall image color, as all training cat samples are white, and the
eye iris color, as all training samples have yellow irises. The feature vector will have the 4 features shown below. They
will be used to retrain the ML model.
Features: Dark Eye Pupils | Diagonal Ears | White Cat Color | Yellow Eye Irises
After creating the trained model, the next step is to test it. The expected result after using the new feature vector is that the
classification accuracy will decrease to less than x%. But why? The cause of the accuracy drop is using some features that
exist in the training data but do not exist generally in all cat images. All training images used have a white image color
and yellow eye irises, but these properties do not generalize to all cats.
In the testing data, some cats have a black or yellow color rather than the white used in training, and some cats do not
have yellow irises.
Our case, in which the features used are powerful for the training samples but very poor for the testing samples, is known
as overfitting. The model is trained with some features that are exclusive to the training data and do not exist in the testing
data.
The goal of the previous discussion is to make the idea of overfitting simple through a high-level example. To get into the details,
it is preferable to work with a simpler example. That is why the rest of the discussion will be based on a regression example.
Understand Regularization based on a Regression Example
Assume we want to create a regression model that fits the data shown below. We can use polynomial regression.
The simplest model that we can start with is the linear model with a first-degree polynomial equation:
$$y_1 = f_1(x) = \Theta_1 x + \Theta_0$$
Where Θ0 and Θ1 are the model parameters and x is the only feature used.
The plot of the previous model is shown below:
Based on a loss function such as the one shown below, we can conclude that the model is not fitting the data well.
$$L = \frac{\sum_{i=0}^{N} |f_1(x_i) - d_i|}{N}$$
Where f1(x_i) is the predicted output for sample i and d_i is the desired output for the same sample.
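The loss above can be sketched in a few lines of Python. This is only an illustration: the function names, the parameter values, and the sample data are hypothetical, and np.mean divides by the number of samples actually passed in.

```python
import numpy as np

def mean_absolute_loss(predicted, desired):
    """Average absolute difference between predicted and desired outputs."""
    predicted = np.asarray(predicted, dtype=float)
    desired = np.asarray(desired, dtype=float)
    return np.mean(np.abs(predicted - desired))

def f1(x, theta0, theta1):
    """First-degree model: y1 = theta1 * x + theta0."""
    return theta1 * np.asarray(x, dtype=float) + theta0

# Hypothetical samples, used only for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0])
d = np.array([1.0, 2.0, 5.0, 10.0])  # desired outputs (here d = x^2 + 1)

# Evaluate the loss of one candidate line, y1 = 3x + 0.
loss = mean_absolute_loss(f1(x, theta0=0.0, theta1=3.0), d)
print(loss)  # 1.0: every prediction is off by exactly 1
```

A lower value means the line passes closer to the desired outputs on average.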
The model is too simple and many of its predictions are not accurate. For this reason, we should create a more
complex model that can fit the data well. To do so, we can increase the degree of the equation from one to two.
It will be as follows:
$$y_2 = f_2(x) = \Theta_2 x^2 + \Theta_1 x + \Theta_0$$
By raising the same feature x to the power 2 (x^2), we created a new feature, so we capture not only the
linear properties of the data but also some non-linear properties. The graph of the new model will be as follows:
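To make the idea of "creating a new feature from x" concrete, the following sketch builds the feature columns 1, x, and x^2 with NumPy. The helper name polynomial_features and the parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def polynomial_features(x, degree):
    """Return the columns [1, x, x^2, ..., x^degree] for a 1-D input x."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=degree + 1, increasing=True)

x = np.array([1.0, 2.0, 3.0])
X2 = polynomial_features(x, degree=2)
print(X2)
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]

# With parameters ordered as [theta0, theta1, theta2], the second-degree
# model is a plain matrix-vector product over these features.
theta = np.array([1.0, 0.0, 2.0])  # hypothetical values: y2 = 2x^2 + 1
y2 = X2 @ theta
print(y2)  # [ 3.  9. 19.]
```

The model stays linear in its parameters; only the feature set grows.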
The graph shows that the second-degree polynomial fits the data better than the first-degree one. But the quadratic
equation still does not fit some of the data samples well. This is why we can create a more complex model of the third degree
with the following equation:
$$y_3 = f_3(x) = \Theta_3 x^3 + \Theta_2 x^2 + \Theta_1 x + \Theta_0$$
The graph will be as follows:
Note that the model fits the data better after adding a new feature that captures the third-degree properties of the
data. To fit the data even better, we can increase the degree of the equation to four, as in the
following equation:
$$y_4 = f_4(x) = \Theta_4 x^4 + \Theta_3 x^3 + \Theta_2 x^2 + \Theta_1 x + \Theta_0$$
The graph will be as follows:
It seems that the higher the degree of the polynomial equation, the better it fits the data. But there are some important
questions to be answered. If increasing the degree of the polynomial equation by adding new features enhances the
results, why not use a very high degree, such as the 100th degree? What is the best degree to use for a problem?
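The trend described above can be checked numerically. The sketch below fits polynomials of degree 1 to 4 to synthetic data and reports the training error of each. Two assumptions to note: the data are made up for illustration, and the fit uses ordinary least squares (np.linalg.lstsq), which minimizes the squared error rather than the absolute error used in the loss function earlier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data (hypothetical): a quadratic trend plus noise.
x = np.linspace(-1.0, 1.0, 20)
d = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.2, size=x.size)

def fit_polynomial(x, d, degree):
    """Least-squares parameters [theta0, ..., theta_degree]."""
    X = np.vander(x, N=degree + 1, increasing=True)
    theta, *_ = np.linalg.lstsq(X, d, rcond=None)
    return theta

def training_mse(x, d, theta):
    """Mean squared error of the fitted polynomial on the given samples."""
    X = np.vander(x, N=theta.size, increasing=True)
    return np.mean((X @ theta - d) ** 2)

train_errors = [training_mse(x, d, fit_polynomial(x, d, k)) for k in range(1, 5)]
for k, err in zip(range(1, 5), train_errors):
    print(f"degree {k}: training MSE = {err:.4f}")
```

Because each higher-degree model contains the lower-degree ones as special cases, the training error can only stay the same or go down as the degree grows. This says nothing yet about unseen data.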
Model Capacity/Complexity
There is a term called model capacity or complexity. Model capacity/complexity refers to the level of variation that the
model can work with. The higher the capacity the more variation the model can cope with. The first model y1 is said to be
of a small capacity compared to y4. In our case, the capacity increases by increasing the polynomial degree.
Certainly, the higher the degree of the polynomial equation, the better it fits the training data. But remember that increasing
the polynomial degree increases the complexity of the model. Using a model with a capacity higher than required may
lead to overfitting. The model becomes very complex and fits the training data very well, but unfortunately it is very
weak for unseen data. The goal of ML is to create a model that is robust not only with the training data but also with
unseen data samples.
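A small experiment can illustrate this gap between seen and unseen data. In the sketch below, all data are synthetic and hypothetical: 8 noisy training samples come from a straight line, and a degree-7 polynomial (which has enough capacity to pass through all 8 points) is compared with a degree-1 model on separate test samples. As before, the fit uses least squares, which is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_function(x):
    # Hypothetical underlying relationship (a straight line).
    return 1.0 + 2.0 * x

x_train = np.linspace(-1.0, 1.0, 8)
d_train = true_function(x_train) + rng.normal(scale=0.3, size=x_train.size)
x_test = np.linspace(-0.9, 0.9, 50)
d_test = true_function(x_test) + rng.normal(scale=0.3, size=x_test.size)

def fit(x, d, degree):
    """Least-squares polynomial parameters [theta0, ..., theta_degree]."""
    X = np.vander(x, N=degree + 1, increasing=True)
    theta, *_ = np.linalg.lstsq(X, d, rcond=None)
    return theta

def mse(x, d, theta):
    """Mean squared error of the polynomial on the given samples."""
    X = np.vander(x, N=theta.size, increasing=True)
    return np.mean((X @ theta - d) ** 2)

results = {}
for degree in (1, 7):
    theta = fit(x_train, d_train, degree)
    results[degree] = (mse(x_train, d_train, theta), mse(x_test, d_test, theta))
    print(f"degree {degree}: train MSE = {results[degree][0]:.2e}, "
          f"test MSE = {results[degree][1]:.2e}")
```

The degree-7 model drives the training error to essentially zero because it reproduces the noise in the 8 training points. Its test error is far above its training error, which is the signature of overfitting.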
The model of the fourth degree (y4) is very complex. Yes, it fits the seen data well, but it will not fit unseen data well. In this
case, the newly used feature in y4, which is x^4, captures more details than required. Because that new feature makes the
model too complex, we should get rid of it.
In this example, we actually know which feature to remove. So, we can remove it and return to the previous model
of the third degree (Θ3x^3 + Θ2x^2 + Θ1x + Θ0). But in actual work, we do not know which features to remove.
Moreover, assume that the new feature is not too bad and we do not want to completely remove it and just want to
penalize it. What should we do?
Looking back at the loss function, its only goal is to minimize/penalize the prediction error. We can set a new objective:
to minimize/penalize the effect of the new feature x^4 as much as possible. After modifying the loss function to penalize
x^4, it will be as follows:
$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + \Theta_4 x^4}{N}$$
Our objective now is to minimize the loss function. We are now interested in minimizing the term Θ4x^4. To minimize
Θ4x^4 we should minimize Θ4, as it is the only free parameter we can change. If the feature is a very bad one and we
want to remove it completely, we can set Θ4 to zero, as shown below:
$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + 0 \cdot x^4}{N}$$
By removing it, we go back to the third-degree polynomial equation (y3). y3 does not fit the seen data perfectly as in y4
but generally, it will have a better performance for unseen data than y4.
But in case x^4 is a relatively good feature and we just want to penalize it rather than remove it completely, we can set Θ4
to a value close to zero but not zero (say 0.1), as shown next. By doing that, we limit the effect of x^4. As a result, the
new model will not be as complex as before.
$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + 0.1 \cdot x^4}{N}$$
Going back to y2, it is simpler than y3 and can work well with both seen and unseen data samples. So, we
should remove the new feature used in y3, which is x^3, or just penalize it if it does relatively well. We can modify the loss
function to do that.
$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + 0.1 \cdot x^4 + \Theta_3 x^3}{N}$$
$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + 0.1 \cdot x^4 + 0.04 \cdot x^3}{N}$$
Regularization
Note that we actually knew that y2 is the best model to fit the data because the data graph is available for us. It is a very
simple task that we can solve manually. But if such information is not available for us and as the number of samples and
data complexity increases, we will not be able to reach such conclusions easily. There must be something automatic to
tell us which degree will fit the data and tell us which features to penalize to get the best predictions for unseen data. This
is regularization.
Regularization helps us select the model complexity that fits the data. It is useful for automatically penalizing features that
make the model too complex. Remember that regularization is useful when the features are not bad and help us get
relatively good predictions, so we just need to penalize them rather than remove them completely. Regularization penalizes all used
features, not a selected subset. Previously, we penalized just two features, x^4 and x^3, not all features. That is not the case
with regularization.
Using regularization, a new term is added to the loss function to penalize the features so the loss function will be as
follows:
$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + \sum_{j=1}^{N} \lambda \Theta_j}{N}$$
It can also be written as follows after moving λ outside the summation:
$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + \lambda \sum_{j=1}^{N} \Theta_j}{N}$$
The newly added term $\lambda \sum_{j=1}^{N} \Theta_j$ is used to penalize the features to control the level of model complexity. Our
goal before adding the regularization term was to minimize the prediction error as much as possible. Now our goal is to
minimize the error while being careful not to make the model too complex, thereby avoiding overfitting.
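As a minimal sketch, the regularized loss can be written as a small Python function. The function name, parameter values, and sample data are hypothetical. One deliberate difference is an assumption of this sketch: the penalty sums the squares of Θj (as in ridge regression) instead of the raw values, because a plain sum could be made arbitrarily negative by negative parameters. The penalty runs over every parameter except Θ0.

```python
import numpy as np

def regularized_loss(theta, x, d, lam):
    """(sum of absolute errors + lam * sum of theta_j^2 for j >= 1) / sample count.

    theta is ordered as [theta0, theta1, ..., theta_k].
    """
    theta = np.asarray(theta, dtype=float)
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    X = np.vander(x, N=theta.size, increasing=True)  # columns 1, x, x^2, ...
    error_term = np.sum(np.abs(X @ theta - d))
    penalty_term = lam * np.sum(theta[1:] ** 2)      # theta0 is not penalized
    return (error_term + penalty_term) / x.size

# Hypothetical data that the model y = x^2 + x + 1 fits exactly.
x = np.array([0.0, 1.0, 2.0])
d = np.array([1.0, 3.0, 7.0])
theta = np.array([1.0, 1.0, 1.0])

loss_plain = regularized_loss(theta, x, d, lam=0.0)  # error term only
loss_reg = regularized_loss(theta, x, d, lam=0.5)    # adds 0.5 * (1 + 1) / 3
print(loss_plain, loss_reg)
```

Even with a perfect fit, the regularized loss is above zero, so the optimizer is pushed toward smaller parameters unless they reduce the error enough to pay for themselves.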
There is a regularization parameter called lambda (λ) which controls how strongly the features are penalized. It is a hyperparameter
with no fixed value; its value varies based on the task at hand. As its value increases, the features are penalized more
heavily and, as a result, the model becomes simpler. When its value decreases, the features are penalized less
and thus the model complexity increases. A value of zero means no penalization of the features at all.
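The effect of λ can be observed with a short experiment. The sketch below is not the article's exact loss: it assumes ridge regression (squared error plus λ times the sum of squared Θj, with Θ0 left unpenalized) because that variant has a closed-form solution. The data and the helper name ridge_fit are hypothetical.

```python
import numpy as np

def ridge_fit(x, d, degree, lam):
    """Closed-form ridge solution for a polynomial model; theta0 is unpenalized."""
    X = np.vander(x, N=degree + 1, increasing=True)
    penalty = lam * np.eye(degree + 1)
    penalty[0, 0] = 0.0  # do not penalize the bias theta0
    return np.linalg.solve(X.T @ X + penalty, X.T @ d)

rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 15)
d = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.2, size=x.size)

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
norms = []
for lam in lambdas:
    theta = ridge_fit(x, d, degree=4, lam=lam)
    norms.append(np.linalg.norm(theta[1:]))  # size of the penalized parameters
    print(f"lambda = {lam:6.1f}: ||theta[1:]|| = {norms[-1]:.4f}")
```

The magnitude of the penalized parameters shrinks as λ grows, which is what makes the fitted curve smoother and the model simpler.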
When λ is zero, the values of Θj are not penalized at all, as shown in the next equations. This is because setting λ
to zero removes the regularization term and leaves just the error term. So, our objective goes back to
just minimizing the error to be close to zero. When error minimization is the only objective, the model may overfit.
$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + 0 \cdot \sum_{j=1}^{N} \Theta_j}{N}$$
$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + 0}{N}$$
$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i|}{N}$$
But when the value of the penalization parameter λ is very high (say 10^9), the parameters Θj must be penalized very
heavily in order to keep the loss at its minimum value. As a result, the parameters Θj will be driven to zero, and
the model (y4) will have its Θj pruned as shown below.
$$y_4 = f_4(x) = \Theta_4 x^4 + \Theta_3 x^3 + \Theta_2 x^2 + \Theta_1 x + \Theta_0$$
$$y_4 = 0 \cdot x^4 + 0 \cdot x^3 + 0 \cdot x^2 + 0 \cdot x + \Theta_0$$
$$y_4 = \Theta_0$$
Please note that the regularization term starts its index j from 1, not zero. We use the regularization term to
penalize features (x^j). Because Θ0 has no associated feature, there is no reason to penalize it. In such a case, the
model will be y4 = Θ0 with the following graph:
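The collapse to a constant model can be reproduced numerically. The sketch below again assumes a ridge-style penalty (squared error plus λ times the sum of squared Θj, with Θ0 unpenalized), which is a substitute for the article's loss chosen because it can be solved in closed form. The data and the helper name are hypothetical.

```python
import numpy as np

def ridge_fit(x, d, degree, lam):
    """Closed-form ridge solution for a polynomial model; theta0 is unpenalized."""
    X = np.vander(x, N=degree + 1, increasing=True)
    penalty = lam * np.eye(degree + 1)
    penalty[0, 0] = 0.0  # the index j starts from 1, so theta0 is left alone
    return np.linalg.solve(X.T @ X + penalty, X.T @ d)

rng = np.random.default_rng(3)
x = np.linspace(-1.0, 1.0, 15)
d = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.2, size=x.size)

theta = ridge_fit(x, d, degree=4, lam=1e9)
print("theta1..theta4:", theta[1:])  # all very close to zero
print("theta0:", theta[0], "mean of d:", d.mean())
```

With the other parameters pruned, the best remaining constant under a squared-error loss is the mean of the desired outputs, so the fitted model is a horizontal line at that value.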

More Related Content

What's hot

Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Simplilearn
 
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer PerceptronsESCOM
 
Gradient Boosted trees
Gradient Boosted treesGradient Boosted trees
Gradient Boosted treesNihar Ranjan
 
ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)Sanjay Saha
 
Machine Learning lecture6(regularization)
Machine Learning lecture6(regularization)Machine Learning lecture6(regularization)
Machine Learning lecture6(regularization)cairo university
 
Artificial Neural Networks - ANN
Artificial Neural Networks - ANNArtificial Neural Networks - ANN
Artificial Neural Networks - ANNMohamed Talaat
 
Semi-Supervised Learning
Semi-Supervised LearningSemi-Supervised Learning
Semi-Supervised LearningLukas Tencer
 
Bayesian Deep Learning
Bayesian Deep LearningBayesian Deep Learning
Bayesian Deep LearningRayKim51
 
Bayesian classification
Bayesian classificationBayesian classification
Bayesian classificationManu Chandel
 
Deep Learning With Neural Networks
Deep Learning With Neural NetworksDeep Learning With Neural Networks
Deep Learning With Neural NetworksAniket Maurya
 
Machine Learning-Linear regression
Machine Learning-Linear regressionMachine Learning-Linear regression
Machine Learning-Linear regressionkishanthkumaar
 
Perceptron (neural network)
Perceptron (neural network)Perceptron (neural network)
Perceptron (neural network)EdutechLearners
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learningKien Le
 
07 regularization
07 regularization07 regularization
07 regularizationRonald Teo
 
Introduction to Statistical Machine Learning
Introduction to Statistical Machine LearningIntroduction to Statistical Machine Learning
Introduction to Statistical Machine Learningmahutte
 

What's hot (20)

Regularization
RegularizationRegularization
Regularization
 
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
 
Perceptron
PerceptronPerceptron
Perceptron
 
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer Perceptrons
 
Gradient Boosted trees
Gradient Boosted treesGradient Boosted trees
Gradient Boosted trees
 
ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)ResNet basics (Deep Residual Network for Image Recognition)
ResNet basics (Deep Residual Network for Image Recognition)
 
Machine Learning lecture6(regularization)
Machine Learning lecture6(regularization)Machine Learning lecture6(regularization)
Machine Learning lecture6(regularization)
 
Artificial Neural Networks - ANN
Artificial Neural Networks - ANNArtificial Neural Networks - ANN
Artificial Neural Networks - ANN
 
Semi-Supervised Learning
Semi-Supervised LearningSemi-Supervised Learning
Semi-Supervised Learning
 
Bayesian Deep Learning
Bayesian Deep LearningBayesian Deep Learning
Bayesian Deep Learning
 
Bayesian classification
Bayesian classificationBayesian classification
Bayesian classification
 
Deep Learning With Neural Networks
Deep Learning With Neural NetworksDeep Learning With Neural Networks
Deep Learning With Neural Networks
 
Machine Learning-Linear regression
Machine Learning-Linear regressionMachine Learning-Linear regression
Machine Learning-Linear regression
 
Perceptron (neural network)
Perceptron (neural network)Perceptron (neural network)
Perceptron (neural network)
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learning
 
AlexNet
AlexNetAlexNet
AlexNet
 
AlexNet.pptx
AlexNet.pptxAlexNet.pptx
AlexNet.pptx
 
07 regularization
07 regularization07 regularization
07 regularization
 
Isolation Forest
Isolation ForestIsolation Forest
Isolation Forest
 
Introduction to Statistical Machine Learning
Introduction to Statistical Machine LearningIntroduction to Statistical Machine Learning
Introduction to Statistical Machine Learning
 

Similar to Avoid Overfitting with Regularization

Chapter3 hundred page machine learning
Chapter3 hundred page machine learningChapter3 hundred page machine learning
Chapter3 hundred page machine learningmustafa sarac
 
Module 2: Machine Learning Deep Dive
Module 2:  Machine Learning Deep DiveModule 2:  Machine Learning Deep Dive
Module 2: Machine Learning Deep DiveSara Hooker
 
Regularization_BY_MOHAMED_ESSAM.pptx
Regularization_BY_MOHAMED_ESSAM.pptxRegularization_BY_MOHAMED_ESSAM.pptx
Regularization_BY_MOHAMED_ESSAM.pptxMohamed Essam
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .pptbutest
 
Ai_Project_report
Ai_Project_reportAi_Project_report
Ai_Project_reportRavi Gupta
 
Forms of learning in ai
Forms of learning in aiForms of learning in ai
Forms of learning in aiRobert Antony
 
Machine learning-in-details-with-out-python-code
Machine learning-in-details-with-out-python-codeMachine learning-in-details-with-out-python-code
Machine learning-in-details-with-out-python-codeOsama Ghandour Geris
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithmsArunangsu Sahu
 
Getting_Started_with_DL_in_Keras.pptx
Getting_Started_with_DL_in_Keras.pptxGetting_Started_with_DL_in_Keras.pptx
Getting_Started_with_DL_in_Keras.pptxMohamed Essam
 
Lect 8 learning types (M.L.).pdf
Lect 8 learning types (M.L.).pdfLect 8 learning types (M.L.).pdf
Lect 8 learning types (M.L.).pdfHassanElalfy4
 
Sample_Subjective_Questions_Answers (1).pdf
Sample_Subjective_Questions_Answers (1).pdfSample_Subjective_Questions_Answers (1).pdf
Sample_Subjective_Questions_Answers (1).pdfAaryanArora10
 
Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)Hayim Makabee
 
House Price Estimation as a Function Fitting Problem with using ANN Approach
House Price Estimation as a Function Fitting Problem with using ANN ApproachHouse Price Estimation as a Function Fitting Problem with using ANN Approach
House Price Estimation as a Function Fitting Problem with using ANN ApproachYusuf Uzun
 
01 Notes Introduction Analysis of Algorithms Notes
01 Notes Introduction Analysis of Algorithms Notes01 Notes Introduction Analysis of Algorithms Notes
01 Notes Introduction Analysis of Algorithms NotesAndres Mendez-Vazquez
 

Similar to Avoid Overfitting with Regularization (20)

Chapter3 hundred page machine learning
Chapter3 hundred page machine learningChapter3 hundred page machine learning
Chapter3 hundred page machine learning
 
Module 2: Machine Learning Deep Dive
Module 2:  Machine Learning Deep DiveModule 2:  Machine Learning Deep Dive
Module 2: Machine Learning Deep Dive
 
Regularization_BY_MOHAMED_ESSAM.pptx
Regularization_BY_MOHAMED_ESSAM.pptxRegularization_BY_MOHAMED_ESSAM.pptx
Regularization_BY_MOHAMED_ESSAM.pptx
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .ppt
 
Ai_Project_report
Ai_Project_reportAi_Project_report
Ai_Project_report
 
Explore ml day 2
Explore ml day 2Explore ml day 2
Explore ml day 2
 
Forms of learning in ai
Forms of learning in aiForms of learning in ai
Forms of learning in ai
 
Algorithm
AlgorithmAlgorithm
Algorithm
 
Machine learning-in-details-with-out-python-code
Machine learning-in-details-with-out-python-codeMachine learning-in-details-with-out-python-code
Machine learning-in-details-with-out-python-code
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithms
 
Getting_Started_with_DL_in_Keras.pptx
Getting_Started_with_DL_in_Keras.pptxGetting_Started_with_DL_in_Keras.pptx
Getting_Started_with_DL_in_Keras.pptx
 
Lect 8 learning types (M.L.).pdf
Lect 8 learning types (M.L.).pdfLect 8 learning types (M.L.).pdf
Lect 8 learning types (M.L.).pdf
 
Sample_Subjective_Questions_Answers (1).pdf
Sample_Subjective_Questions_Answers (1).pdfSample_Subjective_Questions_Answers (1).pdf
Sample_Subjective_Questions_Answers (1).pdf
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)
 
House Price Estimation as a Function Fitting Problem with using ANN Approach
House Price Estimation as a Function Fitting Problem with using ANN ApproachHouse Price Estimation as a Function Fitting Problem with using ANN Approach
House Price Estimation as a Function Fitting Problem with using ANN Approach
 
ppt
pptppt
ppt
 
ppt
pptppt
ppt
 
01 Notes Introduction Analysis of Algorithms Notes
01 Notes Introduction Analysis of Algorithms Notes01 Notes Introduction Analysis of Algorithms Notes
01 Notes Introduction Analysis of Algorithms Notes
 
Regresión
RegresiónRegresión
Regresión
 

More from Ahmed Gad

ICEIT'20 Cython for Speeding-up Genetic Algorithm
ICEIT'20 Cython for Speeding-up Genetic AlgorithmICEIT'20 Cython for Speeding-up Genetic Algorithm
ICEIT'20 Cython for Speeding-up Genetic AlgorithmAhmed Gad
 
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...Ahmed Gad
 
Python for Computer Vision - Revision 2nd Edition
Python for Computer Vision - Revision 2nd EditionPython for Computer Vision - Revision 2nd Edition
Python for Computer Vision - Revision 2nd EditionAhmed Gad
 
Multi-Objective Optimization using Non-Dominated Sorting Genetic Algorithm wi...
Multi-Objective Optimization using Non-Dominated Sorting Genetic Algorithm wi...Multi-Objective Optimization using Non-Dominated Sorting Genetic Algorithm wi...
Multi-Objective Optimization using Non-Dominated Sorting Genetic Algorithm wi...Ahmed Gad
 
M.Sc. Thesis - Automatic People Counting in Crowded Scenes
M.Sc. Thesis - Automatic People Counting in Crowded ScenesM.Sc. Thesis - Automatic People Counting in Crowded Scenes
M.Sc. Thesis - Automatic People Counting in Crowded ScenesAhmed Gad
 
Derivation of Convolutional Neural Network from Fully Connected Network Step-...
Derivation of Convolutional Neural Network from Fully Connected Network Step-...Derivation of Convolutional Neural Network from Fully Connected Network Step-...
Derivation of Convolutional Neural Network from Fully Connected Network Step-...Ahmed Gad
 
Introduction to Optimization with Genetic Algorithm (GA)
Introduction to Optimization with Genetic Algorithm (GA)Introduction to Optimization with Genetic Algorithm (GA)
Introduction to Optimization with Genetic Algorithm (GA)Ahmed Gad
 
Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...
Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...
Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...Ahmed Gad
 
Genetic Algorithm (GA) Optimization - Step-by-Step Example
Genetic Algorithm (GA) Optimization - Step-by-Step ExampleGenetic Algorithm (GA) Optimization - Step-by-Step Example
Genetic Algorithm (GA) Optimization - Step-by-Step ExampleAhmed Gad
 
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression AnalysisICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression AnalysisAhmed Gad
 
Backpropagation: Understanding How to Update ANNs Weights Step-by-Step
Backpropagation: Understanding How to Update ANNs Weights Step-by-StepBackpropagation: Understanding How to Update ANNs Weights Step-by-Step
Backpropagation: Understanding How to Update ANNs Weights Step-by-StepAhmed Gad
 
Computer Vision: Correlation, Convolution, and Gradient
Computer Vision: Correlation, Convolution, and GradientComputer Vision: Correlation, Convolution, and Gradient
Computer Vision: Correlation, Convolution, and GradientAhmed Gad
 
Python for Computer Vision - Revision
Python for Computer Vision - RevisionPython for Computer Vision - Revision
Python for Computer Vision - RevisionAhmed Gad
 
Anime Studio Pro 10 Tutorial as Part of Multimedia Course
Anime Studio Pro 10 Tutorial as Part of Multimedia CourseAnime Studio Pro 10 Tutorial as Part of Multimedia Course
Anime Studio Pro 10 Tutorial as Part of Multimedia CourseAhmed Gad
 
Brief Introduction to Deep Learning + Solving XOR using ANNs
Brief Introduction to Deep Learning + Solving XOR using ANNsBrief Introduction to Deep Learning + Solving XOR using ANNs
Brief Introduction to Deep Learning + Solving XOR using ANNsAhmed Gad
 
Operations in Digital Image Processing + Convolution by Example
Operations in Digital Image Processing + Convolution by ExampleOperations in Digital Image Processing + Convolution by Example
Operations in Digital Image Processing + Convolution by ExampleAhmed Gad
 
MATLAB Code + Description : Real-Time Object Motion Detection and Tracking
MATLAB Code + Description : Real-Time Object Motion Detection and TrackingMATLAB Code + Description : Real-Time Object Motion Detection and Tracking
MATLAB Code + Description : Real-Time Object Motion Detection and TrackingAhmed Gad
 
MATLAB Code + Description : Very Simple Automatic English Optical Character R...
MATLAB Code + Description : Very Simple Automatic English Optical Character R...MATLAB Code + Description : Very Simple Automatic English Optical Character R...
MATLAB Code + Description : Very Simple Automatic English Optical Character R...Ahmed Gad
 
Graduation Project - Face Login : A Robust Face Identification System for Sec...
Graduation Project - Face Login : A Robust Face Identification System for Sec...Graduation Project - Face Login : A Robust Face Identification System for Sec...
Graduation Project - Face Login : A Robust Face Identification System for Sec...Ahmed Gad
 
Introduction to MATrices LABoratory (MATLAB) as Part of Digital Signal Proces...
Introduction to MATrices LABoratory (MATLAB) as Part of Digital Signal Proces...Introduction to MATrices LABoratory (MATLAB) as Part of Digital Signal Proces...
Introduction to MATrices LABoratory (MATLAB) as Part of Digital Signal Proces...Ahmed Gad
 

More from Ahmed Gad (20)

ICEIT'20 Cython for Speeding-up Genetic Algorithm
ICEIT'20 Cython for Speeding-up Genetic AlgorithmICEIT'20 Cython for Speeding-up Genetic Algorithm
ICEIT'20 Cython for Speeding-up Genetic Algorithm
 
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
 
Python for Computer Vision - Revision 2nd Edition
Python for Computer Vision - Revision 2nd EditionPython for Computer Vision - Revision 2nd Edition
Python for Computer Vision - Revision 2nd Edition
 
Multi-Objective Optimization using Non-Dominated Sorting Genetic Algorithm wi...
Multi-Objective Optimization using Non-Dominated Sorting Genetic Algorithm wi...Multi-Objective Optimization using Non-Dominated Sorting Genetic Algorithm wi...
Multi-Objective Optimization using Non-Dominated Sorting Genetic Algorithm wi...
 
M.Sc. Thesis - Automatic People Counting in Crowded Scenes
M.Sc. Thesis - Automatic People Counting in Crowded ScenesM.Sc. Thesis - Automatic People Counting in Crowded Scenes
M.Sc. Thesis - Automatic People Counting in Crowded Scenes
 
Derivation of Convolutional Neural Network from Fully Connected Network Step-...
Derivation of Convolutional Neural Network from Fully Connected Network Step-...Derivation of Convolutional Neural Network from Fully Connected Network Step-...
Derivation of Convolutional Neural Network from Fully Connected Network Step-...
 
Introduction to Optimization with Genetic Algorithm (GA)
Introduction to Optimization with Genetic Algorithm (GA)Introduction to Optimization with Genetic Algorithm (GA)
Introduction to Optimization with Genetic Algorithm (GA)
 
Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...
Avoid Overfitting with Regularization

  • 1. Avoid Overfitting with Regularization. By Ahmed Fawzy Gad, Information Technology (IT) Department, Faculty of Computers and Information (FCI), Menoufia University, Egypt. ahmed.fawzy@ci.menofia.edu.eg. 14-Jan-2018.
  • 2. Have you ever created a machine learning model that is perfect for the training samples but gives very bad predictions for unseen samples? Have you ever wondered why this happens? This article explains overfitting, which is one of the reasons for poor predictions on unseen samples. A regularization technique based on regression is then presented in simple steps to make it clear how to avoid overfitting.

The focus of machine learning (ML) is to train an algorithm with training data in order to create a model that makes correct predictions for unseen data (test data). To create a classifier, for example, a human expert starts by collecting the data required to train the ML algorithm. The expert is responsible for finding the best types of features to represent each class, that is, features capable of discriminating between the different classes. Such features are used to train the ML algorithm. Suppose we are to build an ML model that classifies images as containing cats or not using the following training data.

The first question we have to answer is "what are the best features to use?". This is a critical question in ML: the better the features, the better the predictions the trained model makes, and vice versa. Let us try to visualize such images and extract some features that are representative of cats. Some of the representative features may be the existence of two dark eye pupils and two ears with a diagonal direction. Assume that we extracted such features, somehow, from the above training images and that a trained ML model is created. Such a model can work with a wide range of cat images because the features used exist in most cats. We can test the model using some unseen data such as the following. Assume that the classification accuracy on the test data is x%.

One may want to increase the classification accuracy. The first thing to think of is using more features than the two used previously.
This is because the more discriminative features we use, the better the accuracy is expected to be. By inspecting the training data again, we can find more features, such as the overall image color, since all training cat samples are white, and the eye iris color, since all training samples have yellow irises. The feature vector will have the 4 features shown below. They will be used to retrain the ML model.

Feature: Dark Eye Pupils | Diagonal Ears | White Cat Color | Yellow Eye Irises

After creating the trained model, the next step is to test it. Contrary to what one might hope, the classification accuracy with the new feature vector can be expected to drop below x%. But why? The cause of the accuracy drop is the use of features that exist in the training data but do not exist generally in all cat images. All training images used have a white color and yellow eye irises, but these properties do not generalize to all cats. In the testing data, some cats are black or yellow rather than white as in the training data, and some cats do not have yellow irises.
  • 3. Our case, in which the features used are powerful for the training samples but very poor for the testing samples, is known as overfitting. The model is trained with some features that are exclusive to the training data and do not exist in the testing data.

The goal of the previous discussion was to make the idea of overfitting simple through a high-level example. To get into the details, it is preferable to work with a simpler example. That is why the rest of the discussion is based on a regression example.

Understand Regularization based on a Regression Example

Assume we want to create a regression model that fits the data shown below. We can use polynomial regression. The simplest model that we can start with is the linear model with a first-degree polynomial equation:

$$y_1 = f_1(x) = \Theta_1 x + \Theta_0$$

where $\Theta_0$ and $\Theta_1$ are the model parameters and $x$ is the only feature used. The plot of the previous model is shown below:

Based on a loss function such as the one shown below, we can conclude that the model does not fit the data well.

$$L = \frac{\sum_{i=0}^{N} |f_1(x_i) - d_i|}{N}$$

where $f_1(x_i)$ is the predicted output for sample $i$ and $d_i$ is the desired output for the same sample. The model is too simple and many of its predictions are not accurate. For this reason, we should create a more complex model that can fit the data well, and we can do so by increasing the degree of the equation from one to two. It will be as follows:

$$y_2 = f_2(x) = \Theta_2 x^2 + \Theta_1 x + \Theta_0$$

By raising the same feature $x$ to the power 2 ($x^2$), we created a new feature, and we now capture not only the linear properties of the data but also some non-linear properties. The graph of the new model will be as follows:
  • 4. The graph shows that the second-degree polynomial fits the data better than the first-degree one. However, the quadratic equation still does not fit some of the data samples well. This is why we can create a more complex model of the third degree with the following equation:

$$y_3 = f_3(x) = \Theta_3 x^3 + \Theta_2 x^2 + \Theta_1 x + \Theta_0$$

The graph will be as follows:

The model fits the data better after adding a new feature that captures the third-degree properties of the data. To fit the data even better, we can increase the degree of the equation to four, as in the following equation:

$$y_4 = f_4(x) = \Theta_4 x^4 + \Theta_3 x^3 + \Theta_2 x^2 + \Theta_1 x + \Theta_0$$

The graph will be as follows:

It seems that the higher the degree of the polynomial equation, the better it fits the data. But some important questions remain. If increasing the degree of the polynomial equation by adding new features enhances the results, why not use a very high degree such as the 100th? What is the best degree to use for a problem?

Model Capacity/Complexity

There is a term called model capacity or complexity. Model capacity/complexity refers to the level of variation that the model can work with. The higher the capacity, the more variation the model can cope with. The first model $y_1$ is said to be of small capacity compared to $y_4$. In our case, the capacity increases with the polynomial degree.

The higher the degree of the polynomial equation, the more closely it fits the training data. But remember that increasing the polynomial degree increases the complexity of the model. Using a model with a capacity higher than required may lead to overfitting. The model becomes very complex and fits the training data very well, but unfortunately it is very weak on unseen data. The goal of ML is to create a model that is robust not only on the training data but also on unseen data samples.

The model of the fourth degree ($y_4$) is very complex.
Yes, it fits the seen data well, but it will not fit unseen data. In this case, the newly used feature in $y_4$, which is $x^4$, captures more details than required. Because that new feature makes the model too complex, we should get rid of it. In this example, we actually know which feature to remove. So, we can remove it and return to the previous model of the third degree ($\Theta_3 x^3 + \Theta_2 x^2 + \Theta_1 x + \Theta_0$). But in actual work, we do not know which features to remove.
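As a rough illustration of the capacity trade-off described above, the following Python sketch fits polynomials of degree one to four and evaluates the article's loss on seen and unseen samples. The data, the train/test split, and the helper names `mean_abs_error` and `fit_polynomials` are assumptions made for illustration only. Note also that `np.polyfit` minimizes the squared error, not the absolute error used in the text, so it is a stand-in for the fitting step rather than the author's procedure.

```python
import numpy as np

def mean_abs_error(theta, x, d):
    """Loss from the text: average absolute difference between prediction and target."""
    return float(np.mean(np.abs(np.polyval(theta, x) - d)))

def fit_polynomials(x_train, d_train, degrees=(1, 2, 3, 4)):
    """Least-squares fit per degree; returns {degree: coefficients, highest power first}."""
    return {deg: np.polyfit(x_train, d_train, deg) for deg in degrees}

# Hypothetical data: a noisy quadratic trend (an assumption, not the article's data).
rng = np.random.default_rng(0)
x_all = np.linspace(-3.0, 3.0, 40)
d_all = 0.5 * x_all**2 - x_all + 2.0 + rng.normal(scale=1.0, size=x_all.size)

# Alternate samples between a "seen" (training) set and an "unseen" (test) set.
x_train, d_train = x_all[::2], d_all[::2]
x_test, d_test = x_all[1::2], d_all[1::2]

models = fit_polynomials(x_train, d_train)
for deg, theta in models.items():
    seen = mean_abs_error(theta, x_train, d_train)
    unseen = mean_abs_error(theta, x_test, d_test)
    print(f"degree {deg}: loss on seen data = {seen:.3f}, on unseen data = {unseen:.3f}")
```

Comparing the two columns of the printout for each degree is one simple way to check whether extra capacity is still helping on unseen samples or only on the seen ones.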
  • 5. Moreover, assume that the new feature is not too bad and that we do not want to remove it completely but just to penalize it. What should we do?

Looking back at the loss function, its only goal is to minimize the prediction error. We can set a new objective: to minimize the effect of the new feature $x^4$ as much as possible. After modifying the loss function to penalize $x^4$, it will be as follows:

$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + \Theta_4 x^4}{N}$$

Our objective now is to minimize the loss function, and here we are interested in minimizing the term $\Theta_4 x^4$. To minimize $\Theta_4 x^4$ we should minimize $\Theta_4$, as it is the only free parameter we can change. We can set it to zero if we want to remove that feature completely because it is a very bad one, as shown below:

$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + 0 \cdot x^4}{N}$$

By removing it, we go back to the third-degree polynomial equation ($y_3$). $y_3$ does not fit the seen data as perfectly as $y_4$ does, but generally it will perform better on unseen data than $y_4$.

But if $x^4$ is a relatively good feature and we just want to penalize it rather than remove it completely, we can set $\Theta_4$ to a value close to zero but not zero (say 0.1), as shown next. By doing that, we limit the effect of $x^4$. As a result, the new model will not be as complex as before.

$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + 0.1 \cdot x^4}{N}$$

Going back to $y_2$, it seems simpler than $y_3$ and can work well with both seen and unseen data samples. So, we should remove the new feature used in $y_3$, which is $x^3$, or just penalize it if it does relatively well. We can modify the loss function to do that.

$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + 0.1 \cdot x^4 + \Theta_3 x^3}{N}$$

$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + 0.1 \cdot x^4 + 0.04 \cdot x^3}{N}$$

Regularization

Note that we actually knew that $y_2$ is the best model to fit the data because the data graph is available to us. It is a very simple task that we can solve manually.
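The manual choices discussed above (setting Θ4 to 0 to remove the feature, or to a small value such as 0.1 to weaken it) can be sketched in Python as follows. The parameter values and the helper name `set_coefficient` are hypothetical and chosen only for illustration; coefficients are stored highest power first so that they can be passed to `np.polyval`.

```python
import numpy as np

def set_coefficient(theta, power, value):
    """Return a copy of theta (highest power first) with the coefficient of x**power replaced."""
    theta = np.array(theta, dtype=float)  # np.array copies, so the input is left unchanged
    theta[len(theta) - 1 - power] = value
    return theta

# Hypothetical fourth-degree parameters [Θ4, Θ3, Θ2, Θ1, Θ0]; the values are made up.
theta_y4 = np.array([0.8, -0.5, 1.2, 0.3, 2.0])
x = np.linspace(-2.0, 2.0, 5)

removed = set_coefficient(theta_y4, power=4, value=0.0)    # Θ4 = 0: feature x^4 removed
penalized = set_coefficient(theta_y4, power=4, value=0.1)  # Θ4 = 0.1: feature kept but weakened

# With Θ4 = 0 the model coincides with the third-degree polynomial built from Θ3..Θ0.
same_as_cubic = np.allclose(np.polyval(removed, x), np.polyval(theta_y4[1:], x))

print("original :", np.polyval(theta_y4, x))
print("penalized:", np.polyval(penalized, x))
print("removed  :", np.polyval(removed, x), "| equals cubic model:", same_as_cubic)
```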
However, when the data cannot be plotted and inspected like this, and as the number of samples and the data complexity increase, we will not be able to reach such conclusions easily. There must be something automatic that tells us which degree will fit the data and which features to penalize to get the best predictions for unseen data. This is regularization.

Regularization helps us select the model complexity that fits the data. It is useful for automatically penalizing features that make the model too complex. Remember that regularization is useful when the features are not bad and help us, at least relatively, to get good predictions, so that we just need to penalize them rather than remove them completely. Regularization penalizes all used features, not a selected subset. Previously, we penalized just two features, $x^4$ and $x^3$, not all features. That is not the case with regularization.

Using regularization, a new term is added to the loss function to penalize the features, so the loss function will be as follows:

$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + \sum_{j=1}^{N} \lambda \Theta_j}{N}$$

It can also be written as follows after moving $\lambda$ outside the summation:
  • 6. $$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + \lambda \sum_{j=1}^{N} \Theta_j}{N}$$

The newly added term $\lambda \sum_{j=1}^{N} \Theta_j$ is used to penalize the features in order to control the level of model complexity. Before adding the regularization term, our goal was to minimize the prediction error as much as possible. Now our goal is to minimize the error while taking care not to make the model too complex, and thus to avoid overfitting.

There is a regularization parameter called lambda ($\lambda$) which controls how strongly the features are penalized. It is a hyperparameter with no fixed value; its value depends on the task at hand. As its value increases, the features are penalized more heavily and, as a result, the model becomes simpler. As its value decreases, the features are penalized less and thus the model complexity increases. A value of zero means no penalization of the features at all.

When $\lambda$ is zero, the values of $\Theta_j$ are not penalized at all, as shown in the next equations. This is because setting $\lambda$ to zero removes the regularization term and leaves just the error term. So, our objective goes back to just minimizing the error to be close to zero. When error minimization is the only objective, the model may overfit.

$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + 0 \cdot \sum_{j=1}^{N} \Theta_j}{N}$$

$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i| + 0}{N}$$

$$L_{new} = \frac{\sum_{i=0}^{N} |f_4(x_i) - d_i|}{N}$$

But when the value of the penalization parameter $\lambda$ is very high (say $10^9$), the parameters $\Theta_j$ must be penalized very heavily in order to keep the loss at its minimum value. As a result, the parameters $\Theta_j$ will be zeros, and the model ($y_4$) will have its $\Theta_j$ pruned as shown below.

$$y_4 = f_4(x) = \Theta_4 x^4 + \Theta_3 x^3 + \Theta_2 x^2 + \Theta_1 x + \Theta_0$$

$$y_4 = 0 \cdot x^4 + 0 \cdot x^3 + 0 \cdot x^2 + 0 \cdot x + \Theta_0$$

$$y_4 = \Theta_0$$

Please note that the regularization term starts its index $j$ from 1, not zero. We use the regularization term to penalize features ($x^j$). Because $\Theta_0$ has no associated feature, there is no reason to penalize it.
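The effect of λ described above can be reproduced numerically with a short Python sketch. To keep it closed-form and easy to verify, the sketch swaps in a squared-error loss with a squared (ridge, L2) penalty on Θ1..Θ4 instead of the absolute-error loss and plain parameter sum used in the text; the data and the helper names `design_matrix` and `fit_ridge` are likewise assumptions for illustration.

```python
import numpy as np

def design_matrix(x, degree):
    """Columns [1, x, x**2, ..., x**degree], so column j pairs with parameter Θj."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(x, d, degree, lam):
    """Minimize sum((XΘ - d)**2) + lam * sum(Θj**2 for j >= 1) in closed form."""
    X = design_matrix(x, degree)
    penalty = np.eye(degree + 1)
    penalty[0, 0] = 0.0  # Θ0 has no associated feature, so it is not penalized
    return np.linalg.solve(X.T @ X + lam * penalty, X.T @ d)

# Hypothetical data: a noisy quadratic trend (an assumption, not the article's data).
rng = np.random.default_rng(1)
x = np.linspace(-3.0, 3.0, 20)
d = 0.5 * x**2 - x + 2.0 + rng.normal(scale=1.0, size=x.size)

theta_free = fit_ridge(x, d, degree=4, lam=0.0)   # λ = 0: only the error term remains
theta_heavy = fit_ridge(x, d, degree=4, lam=1e9)  # very large λ: Θ1..Θ4 pushed toward zero

print("lambda = 0   ->", np.round(theta_free, 4))
print("lambda = 1e9 ->", np.round(theta_heavy, 4))
print("mean of targets:", round(float(d.mean()), 4))
```

With λ = 0 the result matches an ordinary least-squares fit, while with λ = 10^9 the parameters Θ1 to Θ4 become negligibly small and Θ0 settles close to the mean of the targets, which corresponds to the constant model y4 = Θ0.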
When $\lambda$ is very large, the model is therefore $y_4 = \Theta_0$, a horizontal line, with the following graph: