Presented by Jason Tsai
Neural Network Design: Chapter 9 Performance Optimization
Outline

➢ Steepest descent (approximation with 1st-order Taylor series expansion):
$$\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k\,\mathbf{g}_k$$

➢ Newton's method (approximation with 2nd-order Taylor series expansion):
$$\mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{A}_k^{-1}\,\mathbf{g}_k$$

➢ Conjugate gradient:
$$\mathbf{p}_k = -\mathbf{g}_k + \beta_k\,\mathbf{p}_{k-1}$$
Taylor Series Expansion
$$F(x) = F(x^*) + \left.\frac{dF(x)}{dx}\right|_{x=x^*}(x-x^*) + \frac{1}{2}\left.\frac{d^2F(x)}{dx^2}\right|_{x=x^*}(x-x^*)^2 + \cdots + \frac{1}{n!}\left.\frac{d^nF(x)}{dx^n}\right|_{x=x^*}(x-x^*)^n + \cdots$$
Vector Case
$$F(\mathbf{x}) = F(x_1, x_2, \dots, x_n)$$

$$\begin{aligned}
F(\mathbf{x}) = F(\mathbf{x}^*) &+ \left.\frac{\partial F(\mathbf{x})}{\partial x_1}\right|_{\mathbf{x}=\mathbf{x}^*}(x_1-x_1^*) + \left.\frac{\partial F(\mathbf{x})}{\partial x_2}\right|_{\mathbf{x}=\mathbf{x}^*}(x_2-x_2^*) + \cdots + \left.\frac{\partial F(\mathbf{x})}{\partial x_n}\right|_{\mathbf{x}=\mathbf{x}^*}(x_n-x_n^*) \\
&+ \frac{1}{2}\left.\frac{\partial^2 F(\mathbf{x})}{\partial x_1^2}\right|_{\mathbf{x}=\mathbf{x}^*}(x_1-x_1^*)^2 + \frac{1}{2}\left.\frac{\partial^2 F(\mathbf{x})}{\partial x_1\,\partial x_2}\right|_{\mathbf{x}=\mathbf{x}^*}(x_1-x_1^*)(x_2-x_2^*) + \cdots
\end{aligned}$$
Matrix Form
$$F(\mathbf{x}) = F(\mathbf{x}^*) + \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}^*}(\mathbf{x}-\mathbf{x}^*) + \frac{1}{2}(\mathbf{x}-\mathbf{x}^*)^T\,\nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}^*}\,(\mathbf{x}-\mathbf{x}^*) + \cdots$$

Gradient:
$$\nabla F(\mathbf{x}) = \begin{bmatrix} \partial F/\partial x_1 \\ \partial F/\partial x_2 \\ \vdots \\ \partial F/\partial x_n \end{bmatrix}$$

Hessian:
$$\nabla^2 F(\mathbf{x}) = \begin{bmatrix}
\partial^2 F/\partial x_1^2 & \partial^2 F/\partial x_1\partial x_2 & \cdots & \partial^2 F/\partial x_1\partial x_n \\
\partial^2 F/\partial x_2\partial x_1 & \partial^2 F/\partial x_2^2 & \cdots & \partial^2 F/\partial x_2\partial x_n \\
\vdots & \vdots & & \vdots \\
\partial^2 F/\partial x_n\partial x_1 & \partial^2 F/\partial x_n\partial x_2 & \cdots & \partial^2 F/\partial x_n^2
\end{bmatrix}$$
Basic Optimization Algorithm
$$\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k\,\mathbf{p}_k \qquad\text{or}\qquad \Delta\mathbf{x}_k = (\mathbf{x}_{k+1} - \mathbf{x}_k) = \alpha_k\,\mathbf{p}_k$$

where $\mathbf{p}_k$ is the search direction and $\alpha_k$ is the learning rate.
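Every algorithm in this chapter follows this update rule and differs only in how it chooses the search direction $\mathbf{p}_k$ and the learning rate $\alpha_k$. A minimal sketch in Python with NumPy (the function and parameter names are illustrative, not from the slides):

```python
import numpy as np

def optimize(x0, direction_fn, alpha_fn, num_steps):
    """Generic iterative optimizer: x_{k+1} = x_k + alpha_k * p_k.

    direction_fn(x) returns the search direction p_k;
    alpha_fn(x, p) returns the learning rate alpha_k.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        p = direction_fn(x)      # choose a search direction
        alpha = alpha_fn(x, p)   # choose a step size
        x = x + alpha * p        # take the step
    return x
```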
Steepest Descent
Choose the next step so that the function decreases:
$$F(\mathbf{x}_{k+1}) < F(\mathbf{x}_k)$$

For small changes in $\mathbf{x}$ we can approximate $F(\mathbf{x})$:
$$F(\mathbf{x}_{k+1}) = F(\mathbf{x}_k + \Delta\mathbf{x}_k) \approx F(\mathbf{x}_k) + \mathbf{g}_k^T\,\Delta\mathbf{x}_k, \qquad\text{where } \mathbf{g}_k \equiv \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}$$

If we want the function to decrease:
$$\mathbf{g}_k^T\,\Delta\mathbf{x}_k = \alpha_k\,\mathbf{g}_k^T\,\mathbf{p}_k < 0$$

We can maximize the decrease by choosing:
$$\mathbf{p}_k = -\mathbf{g}_k \qquad\Rightarrow\qquad \mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k\,\mathbf{g}_k$$
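A minimal steepest-descent loop, assuming a fixed learning rate and a user-supplied gradient function (a sketch; names are illustrative):

```python
import numpy as np

def steepest_descent(grad_fn, x0, alpha=0.1, num_steps=100):
    """Steepest descent with a fixed learning rate:
    x_{k+1} = x_k - alpha * g_k, where g_k = grad F(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        g = grad_fn(x)       # gradient at the current point
        x = x - alpha * g    # step in the negative gradient direction
    return x
```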
Example
$$F(\mathbf{x}) = x_1^2 + 2x_1x_2 + 2x_2^2 + x_1, \qquad \mathbf{x}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$$

$$\nabla F(\mathbf{x}) = \begin{bmatrix} \partial F/\partial x_1 \\ \partial F/\partial x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}, \qquad \mathbf{g}_0 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$$

With $\alpha = 0.1$:

$$\mathbf{x}_1 = \mathbf{x}_0 - \alpha\,\mathbf{g}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.1\begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix}$$

$$\mathbf{x}_2 = \mathbf{x}_1 - \alpha\,\mathbf{g}_1 = \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix} - 0.1\begin{bmatrix} 1.8 \\ 1.2 \end{bmatrix} = \begin{bmatrix} 0.02 \\ 0.08 \end{bmatrix}$$
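The first two iterations can be checked numerically; a quick sketch reproducing the hand computation above:

```python
import numpy as np

def grad_F(x):
    """Gradient of F(x) = x1^2 + 2*x1*x2 + 2*x2^2 + x1."""
    return np.array([2*x[0] + 2*x[1] + 1, 2*x[0] + 4*x[1]])

x = np.array([0.5, 0.5])
alpha = 0.1
for k in range(2):
    x = x - alpha * grad_F(x)
    print(f"x_{k+1} = {x}")   # prints [0.2 0.2], then [0.02 0.08]
```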
Plot

[Contour plot of $F(\mathbf{x})$ over $-2 \le x_1, x_2 \le 2$ showing the steepest-descent trajectory.]
Stable Learning Rates (Quadratic)
For a quadratic function
$$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{d}^T\mathbf{x} + c, \qquad \nabla F(\mathbf{x}) = \mathbf{A}\mathbf{x} + \mathbf{d},$$
the steepest-descent iteration becomes
$$\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha\,\mathbf{g}_k = \mathbf{x}_k - \alpha(\mathbf{A}\mathbf{x}_k + \mathbf{d}) \qquad\Rightarrow\qquad \mathbf{x}_{k+1} = [\mathbf{I} - \alpha\mathbf{A}]\,\mathbf{x}_k - \alpha\,\mathbf{d}$$

Stability is determined by the eigenvalues of $[\mathbf{I} - \alpha\mathbf{A}]$. If $\lambda_i$ and $\mathbf{z}_i$ are the eigenvalues and eigenvectors of $\mathbf{A}$, then
$$[\mathbf{I} - \alpha\mathbf{A}]\,\mathbf{z}_i = \mathbf{z}_i - \alpha\mathbf{A}\mathbf{z}_i = \mathbf{z}_i - \alpha\lambda_i\mathbf{z}_i = (1 - \alpha\lambda_i)\,\mathbf{z}_i,$$
so the eigenvalues of $[\mathbf{I} - \alpha\mathbf{A}]$ are $(1 - \alpha\lambda_i)$. Stability requirement:
$$|1 - \alpha\lambda_i| < 1 \qquad\Rightarrow\qquad \alpha < \frac{2}{\lambda_i} \qquad\Rightarrow\qquad \alpha < \frac{2}{\lambda_{\max}}$$
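This bound is easy to compute; a sketch using NumPy's symmetric eigenvalue routine (the function name is mine):

```python
import numpy as np

def max_stable_alpha(A):
    """Largest stable learning rate for steepest descent on a
    quadratic with Hessian A: alpha < 2 / lambda_max."""
    eigenvalues = np.linalg.eigvalsh(A)   # A is symmetric
    return 2.0 / eigenvalues.max()

A = np.array([[2.0, 2.0], [2.0, 4.0]])
print(max_stable_alpha(A))   # approx 0.382, matching the example below
```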
Example
$$\mathbf{A} = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}, \qquad \left\{\lambda_1 = 0.764,\ \mathbf{z}_1 = \begin{bmatrix} 0.851 \\ -0.526 \end{bmatrix}\right\}, \qquad \left\{\lambda_2 = 5.24,\ \mathbf{z}_2 = \begin{bmatrix} 0.526 \\ 0.851 \end{bmatrix}\right\}$$

$$\alpha < \frac{2}{\lambda_{\max}} = \frac{2}{5.24} = 0.38$$

[Contour plots of the steepest-descent trajectory for $\alpha = 0.37$ (convergent) and $\alpha = 0.39$ (divergent).]
Minimizing Along a Line
Choose $\alpha_k$ to minimize $F(\mathbf{x}_k + \alpha_k\mathbf{p}_k)$. Setting the derivative to zero:
$$\frac{d}{d\alpha_k}F(\mathbf{x}_k + \alpha_k\mathbf{p}_k) = \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}_k}\,\mathbf{p}_k + \alpha_k\,\mathbf{p}_k^T\,\nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}\,\mathbf{p}_k = 0$$

$$\alpha_k = -\frac{\nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}_k}\,\mathbf{p}_k}{\mathbf{p}_k^T\,\nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}\,\mathbf{p}_k} = -\frac{\mathbf{g}_k^T\,\mathbf{p}_k}{\mathbf{p}_k^T\,\mathbf{A}_k\,\mathbf{p}_k}, \qquad\text{where } \mathbf{A}_k \equiv \nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}$$
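For a quadratic with Hessian $\mathbf{A}$ this step size is exact; a sketch that reproduces the $\alpha_0 = 0.2$ of the next example:

```python
import numpy as np

def exact_line_search_alpha(g, p, A):
    """Exact minimizing step along direction p for a quadratic
    with Hessian A: alpha = -(g^T p) / (p^T A p)."""
    return -(g @ p) / (p @ A @ p)

A = np.array([[2.0, 2.0], [2.0, 4.0]])
g0 = np.array([3.0, 3.0])
p0 = -g0
print(exact_line_search_alpha(g0, p0, A))   # 0.2
```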
Example
$$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 1 & 0 \end{bmatrix}\mathbf{x}, \qquad \mathbf{x}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$$

$$\nabla F(\mathbf{x}) = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}, \qquad \mathbf{p}_0 = -\mathbf{g}_0 = -\nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} -3 \\ -3 \end{bmatrix}$$

$$\alpha_0 = -\frac{\begin{bmatrix} 3 & 3 \end{bmatrix}\begin{bmatrix} -3 \\ -3 \end{bmatrix}}{\begin{bmatrix} -3 & -3 \end{bmatrix}\begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} -3 \\ -3 \end{bmatrix}} = 0.2, \qquad \mathbf{x}_1 = \mathbf{x}_0 - \alpha_0\,\mathbf{g}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.2\begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix}$$
Plot

Successive steps are orthogonal. If $\alpha_k$ minimizes $F$ along the line, then
$$\frac{d}{d\alpha_k}F(\mathbf{x}_k + \alpha_k\mathbf{p}_k) = \frac{d}{d\alpha_k}F(\mathbf{x}_{k+1}) = \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}_{k+1}}\,\frac{d}{d\alpha_k}[\mathbf{x}_k + \alpha_k\mathbf{p}_k] = \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}_{k+1}}\,\mathbf{p}_k = \mathbf{g}_{k+1}^T\,\mathbf{p}_k = 0$$

[Contour plot over $-2 \le x_1, x_2 \le 2$ showing the zig-zag steepest-descent trajectory; axes $x_1$, $x_2$.]
Newton’s Method
$$F(\mathbf{x}_{k+1}) = F(\mathbf{x}_k + \Delta\mathbf{x}_k) \approx F(\mathbf{x}_k) + \mathbf{g}_k^T\,\Delta\mathbf{x}_k + \frac{1}{2}\Delta\mathbf{x}_k^T\,\mathbf{A}_k\,\Delta\mathbf{x}_k$$

Take the gradient of this second-order approximation and set it equal to zero to find the stationary point:
$$\mathbf{g}_k + \mathbf{A}_k\,\Delta\mathbf{x}_k = \mathbf{0} \qquad\Rightarrow\qquad \Delta\mathbf{x}_k = -\mathbf{A}_k^{-1}\,\mathbf{g}_k \qquad\Rightarrow\qquad \mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{A}_k^{-1}\,\mathbf{g}_k$$
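A minimal Newton iteration, assuming gradient and Hessian functions are supplied; solving the linear system rather than forming the inverse explicitly is standard numerical practice (my choice, not from the slides):

```python
import numpy as np

def newton_method(grad_fn, hess_fn, x0, num_steps=10):
    """Newton's method: x_{k+1} = x_k - A_k^{-1} g_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        g = grad_fn(x)                 # gradient g_k
        A = hess_fn(x)                 # Hessian A_k
        x = x - np.linalg.solve(A, g)  # Newton step without explicit inverse
    return x
```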
Example
$$F(\mathbf{x}) = x_1^2 + 2x_1x_2 + 2x_2^2 + x_1, \qquad \mathbf{x}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$$

$$\nabla F(\mathbf{x}) = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}, \qquad \mathbf{g}_0 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}, \qquad \mathbf{A} = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}$$

$$\mathbf{x}_1 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}^{-1}\begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - \begin{bmatrix} 1 & -0.5 \\ -0.5 & 0.5 \end{bmatrix}\begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - \begin{bmatrix} 1.5 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 0.5 \end{bmatrix}$$

Since $F(\mathbf{x})$ is quadratic, the second-order approximation is exact and Newton's method locates the minimum in a single step.
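A quick numerical check of the single Newton step:

```python
import numpy as np

A = np.array([[2.0, 2.0], [2.0, 4.0]])   # Hessian of F
x0 = np.array([0.5, 0.5])
g0 = np.array([2*x0[0] + 2*x0[1] + 1, 2*x0[0] + 4*x0[1]])  # [3, 3]
x1 = x0 - np.linalg.solve(A, g0)
print(x1)   # [-1.   0.5]
```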
Plot

[Contour plot of $F(\mathbf{x})$ showing the single Newton step from $\mathbf{x}_0 = (0.5, 0.5)$ to the minimum at $(-1, 0.5)$.]
Non-Quadratic Example

$$F(\mathbf{x}) = (x_2 - x_1)^4 + 8x_1x_2 - x_1 + x_2 + 3$$

Stationary points:
$$\mathbf{x}^1 = \begin{bmatrix} -0.42 \\ 0.42 \end{bmatrix}\ \text{(local minimum)}, \qquad \mathbf{x}^2 = \begin{bmatrix} -0.13 \\ 0.13 \end{bmatrix}\ \text{(saddle point)}, \qquad \mathbf{x}^3 = \begin{bmatrix} 0.55 \\ -0.55 \end{bmatrix}\ \text{(global minimum)}$$

[Contour plots of $F(\mathbf{x})$ and its second-order approximation $F_2(\mathbf{x})$ over $-2 \le x_1, x_2 \le 2$.]
Different Initial Conditions

[Contour plots of $F(\mathbf{x})$ and $F_2(\mathbf{x})$ with Newton trajectories from several different initial conditions; depending on the starting point, the method converges to different stationary points.]
Problems with Newton's Method

Pros:
✔ Newton's method has quadratic convergence, and when it does converge, it generally converges much faster than steepest descent.

Cons:
✗ It may converge to saddle points, oscillate, or diverge.
✗ It requires the computation and storage of the Hessian matrix and its inverse.
Conjugate Vectors
$$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{d}^T\mathbf{x} + c$$

A set of vectors $\{\mathbf{p}_k\}$ is mutually conjugate with respect to a positive definite Hessian matrix $\mathbf{A}$ if
$$\mathbf{p}_k^T\,\mathbf{A}\,\mathbf{p}_j = 0, \qquad k \neq j.$$

One set of conjugate vectors consists of the eigenvectors of $\mathbf{A}$:
$$\mathbf{z}_k^T\,\mathbf{A}\,\mathbf{z}_j = \lambda_j\,\mathbf{z}_k^T\,\mathbf{z}_j = 0, \qquad k \neq j$$
(the eigenvectors of symmetric matrices are orthogonal).
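A quick numerical check of conjugacy for the eigenvectors of the running example's Hessian (a sketch):

```python
import numpy as np

A = np.array([[2.0, 2.0], [2.0, 4.0]])
_, Z = np.linalg.eigh(A)        # columns of Z are orthonormal eigenvectors
z1, z2 = Z[:, 0], Z[:, 1]
print(z1 @ A @ z2)              # approx 0: z1 and z2 are conjugate w.r.t. A
```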
For Quadratic Functions
$$\nabla F(\mathbf{x}) = \mathbf{A}\mathbf{x} + \mathbf{d}, \qquad \nabla^2 F(\mathbf{x}) = \mathbf{A}$$

The change in the gradient at iteration $k$ is
$$\Delta\mathbf{g}_k = \mathbf{g}_{k+1} - \mathbf{g}_k = (\mathbf{A}\mathbf{x}_{k+1} + \mathbf{d}) - (\mathbf{A}\mathbf{x}_k + \mathbf{d}) = \mathbf{A}\,\Delta\mathbf{x}_k, \qquad\text{where } \Delta\mathbf{x}_k = (\mathbf{x}_{k+1} - \mathbf{x}_k) = \alpha_k\,\mathbf{p}_k$$

The conjugacy conditions can be rewritten as
$$\alpha_k\,\mathbf{p}_k^T\,\mathbf{A}\,\mathbf{p}_j = \Delta\mathbf{x}_k^T\,\mathbf{A}\,\mathbf{p}_j = \Delta\mathbf{g}_k^T\,\mathbf{p}_j = 0, \qquad k \neq j.$$

This does not require knowledge of the Hessian matrix.
Forming Conjugate Directions
Choose the initial search direction as the negative of the gradient:
$$\mathbf{p}_0 = -\mathbf{g}_0$$

Choose subsequent search directions to be conjugate:
$$\mathbf{p}_k = -\mathbf{g}_k + \beta_k\,\mathbf{p}_{k-1}$$

where
$$\beta_k = \frac{\Delta\mathbf{g}_{k-1}^T\,\mathbf{g}_k}{\Delta\mathbf{g}_{k-1}^T\,\mathbf{p}_{k-1}} \quad\text{or}\quad \beta_k = \frac{\mathbf{g}_k^T\,\mathbf{g}_k}{\mathbf{g}_{k-1}^T\,\mathbf{g}_{k-1}} \quad\text{or}\quad \beta_k = \frac{\Delta\mathbf{g}_{k-1}^T\,\mathbf{g}_k}{\mathbf{g}_{k-1}^T\,\mathbf{g}_{k-1}}$$

(These are the choices of Hestenes and Stiefel, Fletcher and Reeves, and Polak and Ribière, respectively.)
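The three $\beta_k$ formulas in code, assuming the gradients are NumPy vectors (a sketch; the function names are mine):

```python
import numpy as np

def beta_hestenes_stiefel(g, g_prev, p_prev):
    dg = g - g_prev                      # delta g_{k-1}
    return (dg @ g) / (dg @ p_prev)

def beta_fletcher_reeves(g, g_prev):
    return (g @ g) / (g_prev @ g_prev)

def beta_polak_ribiere(g, g_prev):
    dg = g - g_prev
    return (dg @ g) / (g_prev @ g_prev)
```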
Conjugate Gradient Algorithm
• The first search direction is the negative of the gradient: $\mathbf{p}_0 = -\mathbf{g}_0$.
• Select the learning rate $\alpha_k$ to minimize along the line. For quadratic functions:
$$\alpha_k = -\frac{\nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}_k}\,\mathbf{p}_k}{\mathbf{p}_k^T\,\nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}\,\mathbf{p}_k} = -\frac{\mathbf{g}_k^T\,\mathbf{p}_k}{\mathbf{p}_k^T\,\mathbf{A}_k\,\mathbf{p}_k}$$
• Select the next search direction using $\mathbf{p}_k = -\mathbf{g}_k + \beta_k\,\mathbf{p}_{k-1}$.
• If the algorithm has not converged, return to the second step.
• A quadratic function will be minimized in $n$ steps (see the sketch after this list).
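A minimal conjugate-gradient sketch for quadratics $F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{d}^T\mathbf{x} + c$, using the Fletcher and Reeves form of $\beta_k$ and the exact line search above (names are illustrative):

```python
import numpy as np

def conjugate_gradient_quadratic(A, d, x0):
    """Minimize F(x) = 0.5*x^T A x + d^T x + c for symmetric
    positive definite A; converges in at most n steps."""
    x = np.asarray(x0, dtype=float)
    g = A @ x + d                        # gradient at x0
    p = -g                               # first direction: negative gradient
    for _ in range(len(x)):
        alpha = -(g @ p) / (p @ A @ p)   # exact line minimization
        x = x + alpha * p                # take the step
        g_new = A @ x + d
        beta = (g_new @ g_new) / (g @ g) # Fletcher-Reeves beta
        p = -g_new + beta * p            # next conjugate direction
        g = g_new
    return x

# Running example: converges to the minimum in n = 2 steps.
A = np.array([[2.0, 2.0], [2.0, 4.0]])
d = np.array([1.0, 0.0])
print(conjugate_gradient_quadratic(A, d, [0.5, 0.5]))   # [-1.   0.5]
```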
Example
$$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}\mathbf{x} + \begin{bmatrix} 1 & 0 \end{bmatrix}\mathbf{x}, \qquad \mathbf{x}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$$

$$\nabla F(\mathbf{x}) = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}, \qquad \mathbf{p}_0 = -\mathbf{g}_0 = -\nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} -3 \\ -3 \end{bmatrix}$$

$$\alpha_0 = -\frac{\begin{bmatrix} 3 & 3 \end{bmatrix}\begin{bmatrix} -3 \\ -3 \end{bmatrix}}{\begin{bmatrix} -3 & -3 \end{bmatrix}\begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} -3 \\ -3 \end{bmatrix}} = 0.2, \qquad \mathbf{x}_1 = \mathbf{x}_0 - \alpha_0\,\mathbf{g}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.2\begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix}$$

(This first step is identical to the steepest-descent step with exact line minimization.)
Example (continued)
$$\mathbf{g}_1 = \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_1} = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.6 \\ -0.6 \end{bmatrix}$$

$$\beta_1 = \frac{\mathbf{g}_1^T\,\mathbf{g}_1}{\mathbf{g}_0^T\,\mathbf{g}_0} = \frac{\begin{bmatrix} 0.6 & -0.6 \end{bmatrix}\begin{bmatrix} 0.6 \\ -0.6 \end{bmatrix}}{\begin{bmatrix} 3 & 3 \end{bmatrix}\begin{bmatrix} 3 \\ 3 \end{bmatrix}} = \frac{0.72}{18} = 0.04$$

$$\mathbf{p}_1 = -\mathbf{g}_1 + \beta_1\,\mathbf{p}_0 = \begin{bmatrix} -0.6 \\ 0.6 \end{bmatrix} + 0.04\begin{bmatrix} -3 \\ -3 \end{bmatrix} = \begin{bmatrix} -0.72 \\ 0.48 \end{bmatrix}$$

$$\alpha_1 = -\frac{\begin{bmatrix} 0.6 & -0.6 \end{bmatrix}\begin{bmatrix} -0.72 \\ 0.48 \end{bmatrix}}{\begin{bmatrix} -0.72 & 0.48 \end{bmatrix}\begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} -0.72 \\ 0.48 \end{bmatrix}} = -\frac{-0.72}{0.576} = 1.25$$
Plots

$$\mathbf{x}_2 = \mathbf{x}_1 + \alpha_1\,\mathbf{p}_1 = \begin{bmatrix} -0.1 \\ -0.1 \end{bmatrix} + 1.25\begin{bmatrix} -0.72 \\ 0.48 \end{bmatrix} = \begin{bmatrix} -1 \\ 0.5 \end{bmatrix}$$

[Contour plots over $-2 \le x_1, x_2 \le 2$: conjugate gradient reaches the minimum in two steps; steepest descent is shown for comparison.]