Methods of Optimization in Machine Learning
Presented By: Aayush Srivastava & Divyank Saxena
KnolX Etiquettes
Lack of etiquette and manners is a huge turn-off.
Punctuality
Join the session 5 minutes prior to the session start time. We start on time and conclude on time!
Feedback
Make sure to submit constructive feedback for all sessions, as it is very helpful for the presenter.
Silent Mode
Keep your mobile devices in silent mode; feel free to step out of the session if you need to attend an urgent call.
Avoid Disturbance
Avoid unwanted chit-chat during the session.
Our Agenda
01 What is Optimization in Machine Learning
02 What is Gradient Descent
03 What is Stochastic Gradient Descent
04 What is Minibatch Stochastic Gradient
05 What is Adam Optimization
06 Demo
What is Optimization in ML
● Optimization in Machine Learning is a technique used to find the best set of parameters for a given
model to minimize a loss function and improve its performance. It is an essential step in the training
process of a machine learning model.
● The goal of optimization is to find the best weights and biases for the model, so that it can make
accurate predictions.
● Optimization is used in machine learning because models typically have many parameters, and finding
the best values for those parameters can be a challenging task.
● With optimization techniques, the model can automatically search for the best parameters, rather than
relying on manual tuning by the user.
What is Cost Function
● A cost function measures the error between predictions and their actual values across the whole dataset.
● Minimizing the cost function helps the learning algorithm find the optimal set of parameters, such as
weights and biases, that produce the best predictions.
● A cost function is a measure of how wrong the model is in estimating the relationship between X (input) and Y (output). For a hypothesis h_θ and m samples it can be written as
J(θ) = 1/(2m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i))^2
- m is the number of samples
- the sum runs from i = 1 to m
- each term is the hypothesis value h_θ(x^(i)) minus the actual value y^(i), squared.
What is Cost Function
● Let’s run through the calculation for best_fit_1.
1. The hypothesis is 0.50. This is the h_θ(x^(i)) part: what we think is the correct value.
2. The actual value for the sample data is 1.00, so we are left with (0.50 − 1.00)^2, which is 0.25.
3. Let’s add this result to an array called results and do the same for all three points.
4. results = [0.25, 2.25, 4.00]
5. Finally, we add them all up and multiply by 1/(2m) = 1/6 (since m = 3). We get the cost for best_fit_1 = 1.083.
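As a sanity check, here is a minimal Python sketch of this cost computation. The first pair (prediction 0.50, actual 1.00) comes from the slide; the other two pairs are hypothetical values chosen only so that the squared errors match the results array above.

```python
def cost(predictions, actuals):
    """Cost J = 1/(2m) * sum((h(x) - y)^2) over all m samples."""
    m = len(predictions)
    return sum((h - y) ** 2 for h, y in zip(predictions, actuals)) / (2 * m)

# First pair is from the slide; the remaining two are assumed,
# picked so the squared errors are [0.25, 2.25, 4.00].
predictions = [0.50, 1.00, 1.50]  # hypothesis values for best_fit_1
actuals = [1.00, 2.50, 3.50]      # observed y values (hypothetical)

print(cost(predictions, actuals))  # 1.0833...
```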
What is Cost Function
● COST: best_fit_1: 1.083
best_fit_2: 0.083
best_fit_3: 0.25
● A low cost represents a smaller difference between predictions and actual values.
What is Loss Function
● A loss function, also known as an objective function, is a mathematical measure of how well a model is able
to make predictions that match the true values.
● A loss function measures the error between a single prediction and the corresponding actual value.
● Loss and cost functions are methods of measuring the error in machine learning predictions. Loss
functions measure the error per observation, whilst cost functions measure the error over all
observations.
Types:
1. Mean Squared Error (MSE): This loss function measures the average squared difference between the predicted values and the true values.
2. Mean Absolute Error (MAE): This loss function measures the average absolute difference between the predicted values and the true values.
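Both losses are straightforward to compute; a minimal sketch, assuming plain Python lists of predictions and true values:

```python
def mse(predictions, actuals):
    """Mean squared error: average squared difference."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(predictions)

def mae(predictions, actuals):
    """Mean absolute error: average absolute difference."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(predictions)

print(mse([0.5, 1.0], [1.0, 2.5]))  # 1.25
print(mae([0.5, 1.0], [1.0, 2.5]))  # 1.0
```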
What is Gradient Descent
● Gradient, in plain terms, means the slope or slant of a surface. So gradient descent literally means descending a slope to reach the lowest point on that surface.
● Gradient descent enables a model to learn the gradient, or direction, that the model should take in order to reduce errors (differences between actual y and predicted y).
● It is an algorithm that iteratively tries to find a minimum of a function.
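A minimal sketch of the idea, minimizing the linear-regression cost above for a one-parameter model h(x) = θ·x; the data and starting point are hypothetical.

```python
# Gradient descent for h(x) = theta * x on hypothetical data.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.5, 3.5]

theta = 0.0
learning_rate = 0.1

for step in range(100):
    m = len(xs)
    # dJ/dtheta for J = 1/(2m) * sum((theta*x - y)^2)
    grad = sum((theta * x - y) * x for x, y in zip(xs, ys)) / m
    theta -= learning_rate * grad  # step against the gradient

print(theta)  # converges near the least-squares slope (~1.18)
```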
What is Learning Rate
● Learning Rate:
The learning rate is a hyperparameter in machine learning that determines the step size at which the
optimization algorithm updates the model's parameters. It is used to control the speed at which the
model learns.
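Continuing the hypothetical example above, the learning rate simply scales each update, and choosing it badly shows up immediately:

```python
xs, ys = [1.0, 2.0, 3.0], [1.0, 2.5, 3.5]

def run(learning_rate, steps=50):
    theta = 0.0
    for _ in range(steps):
        grad = sum((theta * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        theta -= learning_rate * grad
    return theta

print(run(0.01))  # too small: converges slowly, still short of ~1.18
print(run(0.1))   # reasonable: close to the optimum
print(run(0.5))   # too large: updates overshoot and diverge
```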
Limitation of Gradient Descent
● Gradient Descent has some limitations and drawbacks that can affect its performance and efficiency.
● Local Minima: Gradient Descent can get stuck in a local minimum, which may not be the global
minimum, and therefore, the optimization will not produce the best result.
● Vanishing gradient: When training deep neural networks, the gradients can become very small,
leading to the vanishing gradient problem, which can slow down or prevent convergence.
What is Stochastic Gradient Descent
● Stochastic Gradient Descent (SGD) is a variant of the Gradient Descent optimization algorithm that updates the parameters of a model in a more efficient and faster way.
● “Stochastic” in plain terms means “random”.
● In SGD, at each step, the algorithm calculates the gradient for one observation picked at random, instead of calculating the gradient for the entire dataset.
● So, if a dataset contains 1000 rows, SGD will update the model parameters 1000 times in one complete cycle over the dataset, instead of one time as in Gradient Descent.
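A minimal sketch of SGD on the same hypothetical one-parameter problem, with one parameter update per randomly chosen observation:

```python
import random

# Hypothetical data for h(x) = theta * x, as in the earlier sketches.
data = [(1.0, 1.0), (2.0, 2.5), (3.0, 3.5)]

theta = 0.0
learning_rate = 0.1

for epoch in range(50):
    random.shuffle(data)      # visit observations in random order
    for x, y in data:         # one parameter update per observation
        grad = (theta * x - y) * x
        theta -= learning_rate * grad

print(theta)  # hovers near the optimum (~1.18) rather than settling exactly
```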
What is Stochastic Gradient Descent
● In the left diagram of the picture, SGD takes one Gradient Descent step per training example; in the right diagram, GD takes one step per pass over the entire training set.
● This represents a significant performance improvement when the dataset contains millions of observations.
What is Stochastic Gradient Descent
Advantages of Stochastic Gradient Descent
● It is easier to fit into memory, since the network processes a single training sample at a time
● For larger datasets it can converge faster, as it updates the parameters more frequently
● Due to the frequent updates, the steps taken towards the minima of the loss function oscillate, which can help the optimizer get out of local minima of the loss function
What is Minibatch Stochastic Gradient
● So far we have encountered two extremes in the approach to gradient-based learning:
● Gradient Descent uses the full dataset to compute gradients and update parameters, one pass at a time. Conversely, Stochastic Gradient Descent processes one training example at a time to make progress. Each of them has its own drawbacks.
● Gradient descent is not particularly data efficient whenever the data is very similar. Stochastic gradient descent is not particularly computationally efficient, since CPUs and GPUs cannot exploit the full power of vectorization.
● This suggests that there might be something in between, and in fact that is what we have been using so far in the examples we discussed.
What is Minibatch Stochastic Gradient
● Mini-batch Gradient Descent is the cross-over between GD and SGD. In this approach, instead of iterating through the entire dataset or one observation at a time, we split the dataset into small subsets (batches) and compute the gradients for each batch.
● Steps involved in mini-batch stochastic gradient (see the sketch below):
1. Pick a mini-batch
2. Feed it to the neural network
3. Calculate the mean gradient of the mini-batch
4. Use the mean gradient from step 3 to update the weights
5. Repeat steps 1–4 for all the mini-batches created
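A minimal sketch of those five steps on the same hypothetical one-parameter problem, with a batch size of 2:

```python
import random

# Hypothetical data for h(x) = theta * x, as in the earlier sketches.
data = [(1.0, 1.0), (2.0, 2.5), (3.0, 3.5), (4.0, 4.8)]
batch_size = 2

theta = 0.0
learning_rate = 0.05

for epoch in range(100):
    random.shuffle(data)
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]           # 1. pick a mini-batch
        grads = [(theta * x - y) * x for x, y in batch]  # 2-3. per-sample gradients
        mean_grad = sum(grads) / len(grads)              # 3. mean gradient of the batch
        theta -= learning_rate * mean_grad               # 4. update the weights
        # 5. the inner loop repeats for the remaining mini-batches

print(theta)  # close to the least-squares slope for this data (~1.19)
```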
What is Minibatch Stochastic Gradient
● Minibatch stochastic gradient descent is able to trade off convergence speed and computational efficiency. A minibatch size of 10 is more efficient than stochastic gradient descent; a minibatch size of 100 can even outperform GD in terms of runtime.
Advantages and Disadvantages
Advantages of Mini-Batch Gradient Descent:
● Reduces the variance of the parameter updates, which leads to more stable convergence
● Speeds up learning
● Helps estimate the approximate location of the actual minimum
Disadvantages of Mini-Batch Gradient Descent:
● Loss is computed for each mini-batch, so the total loss needs to be accumulated across all mini-batches
What is Adam Optimizer
The Adam optimization algorithm is an extension to stochastic gradient descent that has recently seen broader adoption for deep learning applications in computer vision and natural language processing.
The method is efficient when working with large problems involving a lot of data or parameters.
Adam is an adaptive learning rate method, which means it computes individual learning rates for different parameters. Its name is derived from adaptive moment estimation.
How Adam Optimizer Works
The method computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients.
The Adam optimizer combines two gradient descent methodologies:
1. Momentum: accelerates the gradient descent algorithm by taking into consideration the ‘exponentially weighted average’ of the gradients. Using averages makes the algorithm converge towards the minima at a faster pace.
2. Root Mean Square Propagation (RMSP): maintains per-parameter learning rates that are adapted based on the average of recent magnitudes of the gradients for the weight (i.e. how quickly it is changing). This means the algorithm does well on online and non-stationary problems (e.g. noisy gradients).
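Putting the two together, here is a minimal sketch of the standard Adam update for a single parameter, with bias-corrected first and second moment estimates; the hyperparameter defaults below are the commonly used ones, and the example objective is hypothetical.

```python
import math

def adam(grad_fn, theta=0.0, lr=0.001, beta1=0.9, beta2=0.999,
         eps=1e-8, steps=1000):
    """Adam for one parameter; grad_fn returns dJ/dtheta."""
    m, v = 0.0, 0.0                              # 1st and 2nd moment estimates
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g          # momentum (1st moment)
        v = beta2 * v + (1 - beta2) * g * g      # RMSP (2nd moment)
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Hypothetical example: minimize J(theta) = (theta - 3)^2.
print(adam(lambda th: 2 * (th - 3), lr=0.1))  # approaches 3
```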
Benefits of Adam Optimizer
A list of attractive benefits of using Adam:
● Straightforward to implement.
● Computationally efficient.
● Low memory requirements.
● Well suited for problems that are large in terms of data and/or parameters.
● Appropriate for problems with very noisy and/or sparse gradients.
● Hyper-parameters have intuitive interpretation and typically require little tuning.
Demo
Thank You !
Get in touch with us:
Lorem Studio, Lord Building
D4456, LA, USA