Francesco Gadaleta, Chief Data Officer at Abe AI, takes a deep dive into the secret behind deep learning: function optimization. Watch this video as he goes over the most used optimization techniques for artificial intelligence and deep learning technologies.
Read the full post here: http://bit.ly/2m12Nxd
4. FUNCTION OPTIMIZATION
Minimizing or maximizing a function,
e.g. the difference between the predicted
value and the true value.
This function is usually referred to as the
loss function.
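As a minimal illustration (not taken from the deck itself), a mean squared error loss measures that difference between predictions and true values:

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Mean squared error: average squared difference between
    predicted and true values."""
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.8])
print(mse_loss(y_pred, y_true))  # small value means a good fit
```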
5. THE CORE OF DEEP NEURAL NETWORKS
[Diagram: inputs x1, x2, x3 and a bias unit b = +1 flow through weight
matrices W1 and W2 with biases b1 and b2; each layer acts as a logistic
regression]
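A minimal sketch (my own illustration, not code from the talk) of the logistic-regression unit that such a network stacks, assuming a sigmoid non-linearity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_layer(x, W, b):
    """One layer: affine transform followed by a sigmoid non-linearity."""
    return sigmoid(W @ x + b)

# Toy forward pass through two stacked logistic layers
x = np.array([0.5, -1.2, 3.0])               # inputs x1, x2, x3
W1, b1 = np.random.randn(4, 3), np.zeros(4)  # first layer weights and bias
W2, b2 = np.random.randn(1, 4), np.zeros(1)  # second layer weights and bias
hidden = logistic_layer(x, W1, b1)
output = logistic_layer(hidden, W2, b2)
```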
6. THE GRADIENT AND
GRADIENT DESCENT METHODS
The gradient is a vector-valued multivariable generalization of
the derivative.
Like the derivative, the gradient represents the slope of the
tangent of the graph of the function.
[Figure: gradient 3D plot]
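As a quick illustration (added here, not in the slides), the gradient of f(x, y) = x² + y² is (2x, 2y), which can be checked numerically with finite differences:

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 + y**2

def numerical_gradient(f, p, eps=1e-6):
    """Approximate each partial derivative with central finite differences."""
    grad = np.zeros_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = eps
        grad[i] = (f(p + step) - f(p - step)) / (2 * eps)
    return grad

print(numerical_gradient(f, np.array([1.0, 2.0])))  # ~ [2.0, 4.0]
```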
7. TYPES OF OPTIMIZATION
First-order methods minimize or maximize the loss function
using its gradient.
Second-order methods minimize or maximize the loss function
using its second derivative (the Hessian), which is very costly to compute.
L-BFGS (Limited-memory Broyden–Fletcher–Goldfarb–Shanno) uses an
approximation of the Hessian instead.
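As a hedged sketch (assuming SciPy, which the deck does not prescribe), L-BFGS is available through scipy.optimize.minimize:

```python
import numpy as np
from scipy.optimize import minimize

TARGET = np.array([1.0, -2.0])

def loss(w):
    """A simple convex quadratic loss as a stand-in for a real model loss."""
    return np.sum((w - TARGET) ** 2)

def grad(w):
    return 2 * (w - TARGET)

# L-BFGS-B builds a limited-memory approximation of the Hessian internally
result = minimize(loss, x0=np.zeros(2), jac=grad, method="L-BFGS-B")
print(result.x)  # ~ [1.0, -2.0]
```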
8. CONVEX FUNCTIONS
Computers are very good at minimizing only a specific family of
functions (convex functions)
[Figure: convex function vs. non-convex function]
Definition: a function is convex if the line segment between any two
points on its graph lies on or above the graph.
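In formula form (the standard definition, spelled out here since the slide only shows it as an image):

$$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y), \quad \text{for all } x, y \text{ and } \lambda \in [0, 1].$$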
9. THE CORE OF NEURAL NETWORKS
Layers:
● Output: predicts the supervised target
● Hidden: learns abstract representations
● Input: raw sensory inputs
10. THE CORE OF NEURAL NETWORKS
[Diagram: a stack of logistic-regression layers trained with SGD
(Stochastic Gradient Descent), with backpropagation computing the
gradients at each layer]
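A hedged sketch (mine, not the speaker's code) of backpropagation through two logistic layers, with an SGD update at each layer; it assumes a sigmoid output and a log loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: one example with 3 features, binary target
x = np.array([0.5, -1.2, 3.0])
y = 1.0
lr = 0.1

# Two logistic-regression layers
W1, b1 = np.random.randn(4, 3) * 0.1, np.zeros(4)
W2, b2 = np.random.randn(1, 4) * 0.1, np.zeros(1)

# Forward pass
h = sigmoid(W1 @ x + b1)
p = sigmoid(W2 @ h + b2)

# Backward pass (chain rule, layer by layer) for the log loss
dz2 = p - y                        # gradient at the output pre-activation
dW2 = np.outer(dz2, h)
db2 = dz2
dz1 = (W2.T @ dz2) * h * (1 - h)   # propagate through the hidden layer
dW1 = np.outer(dz1, x)
db1 = dz1

# SGD update at each layer
W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1
```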
11. GRADIENT DESCENT
LEADING TOWARDS THE MINIMUM
● Follow the negative gradient
● Tune the parameters to minimize the
loss function
● The update is defined by a direction and a learning rate
(see the sketch below)
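A minimal gradient-descent sketch (an illustration of the update rule θ ← θ − η ∇L on a toy loss, not the speaker's code):

```python
import numpy as np

TARGET = np.array([3.0, -1.0])

def grad_loss(theta):
    """Gradient of the toy quadratic loss L(theta) = ||theta - TARGET||^2."""
    return 2 * (theta - TARGET)

theta = np.zeros(2)   # parameters to tune
lr = 0.1              # learning rate (step size)

for _ in range(100):
    theta -= lr * grad_loss(theta)   # step in the negative gradient direction

print(theta)  # converges towards [3.0, -1.0]
```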
12. GRADIENT DESCENT METHODS
MOMENTUM
● SGD has trouble around local optima and in ravines,
where the surface curves much more steeply in one dimension
than in another
● Momentum accelerates SGD in the relevant
direction and reduces oscillations in the other
directions (see the sketch below)
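A hedged momentum sketch (illustrative hyperparameters): the update accumulates a velocity v ← γv + η∇L and steps θ ← θ − v:

```python
import numpy as np

def grad_loss(theta):
    """Toy ravine-shaped loss: much steeper in the second coordinate."""
    return np.array([2 * theta[0], 50 * theta[1]])

theta = np.array([5.0, 5.0])
velocity = np.zeros(2)
lr, gamma = 0.01, 0.9    # learning rate and momentum coefficient

for _ in range(200):
    velocity = gamma * velocity + lr * grad_loss(theta)
    theta -= velocity     # accumulated velocity dampens the oscillations

print(theta)  # approaches the minimum at [0, 0]
```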
13. GRADIENT DESCENT METHODS
ADAGRAD
● Adaptive Gradient adapts the learning rate
to the parameters
● Larger updates for parameters tied to infrequent features
● Smaller updates for frequent ones (see the sketch below)
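An Adagrad sketch (illustration only): each parameter's learning rate is divided by the square root of its accumulated squared gradients:

```python
import numpy as np

def grad_loss(theta):
    """Toy gradient where the first parameter gets much larger gradients."""
    return np.array([10 * theta[0], 0.1 * theta[1]])

theta = np.array([1.0, 1.0])
grad_squared_sum = np.zeros(2)   # per-parameter accumulator
lr, eps = 0.5, 1e-8

for _ in range(100):
    g = grad_loss(theta)
    grad_squared_sum += g ** 2
    # Parameters with a history of small gradients keep larger effective steps
    theta -= lr * g / (np.sqrt(grad_squared_sum) + eps)

print(theta)  # both parameters move towards the minimum at [0, 0]
```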
14. GRADIENT DESCENT METHODS
ADAM
● Best of both worlds
● Adaptive learning rates for each parameter
● Adam keeps an exponentially decaying average of past
squared gradients and, in a similar fashion to momentum,
of past gradients
● It thereby estimates the first moment (the mean) and the
second moment (the uncentered variance) of the gradients
(see the sketch below)
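A hedged Adam sketch following the published update rule, with the commonly used default hyperparameters chosen here for illustration:

```python
import numpy as np

TARGET = np.array([3.0, -1.0])

def grad_loss(theta):
    return 2 * (theta - TARGET)   # gradient of a toy quadratic loss

theta = np.zeros(2)
m = np.zeros(2)                   # first moment (mean of gradients)
v = np.zeros(2)                   # second moment (uncentered variance)
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 501):
    g = grad_loss(theta)
    m = beta1 * m + (1 - beta1) * g          # decaying average of gradients
    v = beta2 * v + (1 - beta2) * g ** 2     # decaying average of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(theta)  # close to [3.0, -1.0]
```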
15. GRADIENT DESCENT METHODS
WHAT IS THE BEST OPTIMIZER?
● Input data is sparse: consider one of the adaptive
learning-rate methods. There is no need to tune the learning
rate to achieve the best results with the default values.
● Off-the-shelf SGD: good and reliable for simple networks.
In general SGD will get to the minimum, even though it might
struggle a bit near saddle points, taking longer to converge.
● Complex and deep nets: choose one of the adaptive
learning-rate methods for faster convergence.
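As a usage-level illustration (assuming PyTorch, which the talk does not mandate), switching between these strategies is typically a one-line change:

```python
import torch

model = torch.nn.Linear(10, 1)   # stand-in for a real network

# Off-the-shelf SGD (optionally with momentum) for simple networks
sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# An adaptive learning-rate method for sparse data or complex, deep nets
adam = torch.optim.Adam(model.parameters(), lr=0.001)
```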