Neural ODE
Natan Katz
Natan.katz@gmail.com
Lecture Outline
• Why do we care about ODEs?
• What is an ODE?
• Neural ODE: history
• Neural ODE: the NeurIPS paper
Why do we care?
• NeurIPS 2018 best paper awards
• Over 4500 papers were submitted
• One of the 4 best papers:
Neural ODE (Ricky T. Q. Chen, Rubanova, Bettencourt, Duvenaud)
A new use of a mathematical tool and a new approach in DL:
1. Viewing a network as a continuous entity
2. Viewing the hidden layer as a function of time rather than a set of
discrete entities
What are Differential Equations?
• Equations of the form
F(X, C) = 0
C is a vector of constants (e.g. weights).
F is a sufficiently ("generously") differentiable function.
(So far, this is no more complicated than a quadratic equation.)
X is the variable of F, and it contains derivatives.
Derivatives of what?!
Classes of Differential Equations
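1. Autonomous ODE: $\dot{x} = f(x)$
2. Non-autonomous ODE: $\dot{x} = f(x, t)$
3. PDE: $\frac{\partial u}{\partial x} + \frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} - g(x) = 0$
4. SDE: $dx = f(x)\,dt + dW$, where W is a Wiener process (Brownian motion)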
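To make the SDE class concrete, here is a minimal Euler-Maruyama sketch for $dx = f(x)\,dt + dW$. The drift f(x) = -x (the Ornstein-Uhlenbeck process, whose stationary variance is 1/2), the step size, horizon and number of paths are illustrative assumptions, not values from the slides.

```python
import numpy as np

def euler_maruyama(f, x0, T=5.0, dt=0.01, n_paths=10_000, seed=0):
    """Simulate dx = f(x) dt + dW for many independent paths.

    Each step adds the drift f(x) * dt plus a Brownian increment
    dW ~ N(0, dt), drawn as a normal with standard deviation sqrt(dt).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, x0, dtype=float)
    for _ in range(int(round(T / dt))):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x + f(x) * dt + dW
    return x

# Illustrative drift f(x) = -x (Ornstein-Uhlenbeck); its stationary variance is 1/2.
samples = euler_maruyama(lambda x: -x, x0=0.0)
print(samples.mean(), samples.var())  # roughly 0 and 0.5
```

Unlike an ODE, each run gives a different trajectory, so statements are made about the distribution of x (here its mean and variance) rather than about a single path.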
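PDE: Real-Life Examples
Poisson equation: $\Delta u = f$
u is the potential of a vector field and f is the "source function"
(e.g. mass density or electric charge density)
Burgers' equation:
$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = \mu\frac{\partial^2 u}{\partial x^2}$
u is the fluid velocity and μ the diffusion (viscosity) coefficient. The inviscid case μ = 0 is often used to study shock waves.
And the coolest girl in the hood, Navier-Stokes:
$\frac{\partial u}{\partial t} + (u \cdot \nabla) u = -\frac{\nabla p}{\rho} + \mu\,\Delta u + f(x, t)$
u is the fluid velocity, p the pressure, ρ the density and μ the (kinematic) viscosity.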
Example: Black & Scholes
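Stock price:
$dS = \mu S\,dt + \sigma S\,dW$
Derivative price (using Itô's lemma):
$dV = \left(\mu S\frac{\partial V}{\partial S} + \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2\frac{\partial^2 V}{\partial S^2}\right)dt + \sigma S\frac{\partial V}{\partial S}\,dW$
We wish to have a portfolio with 1 derivative (option) and δ stocks:
$P = V + \delta S$
$dP = \left(\mu S\frac{\partial V}{\partial S} + \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2\frac{\partial^2 V}{\partial S^2} + \delta\mu S\right)dt + \left(\sigma S\frac{\partial V}{\partial S} + \delta\sigma S\right)dW$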
Black & Scholes
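Let's get rid of the randomness by choosing
$\delta = -\frac{\partial V}{\partial S}$
so the dW term cancels and
$P = V - S\frac{\partial V}{\partial S}, \quad dP = \left(\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2\frac{\partial^2 V}{\partial S^2}\right)dt$
We assume no arbitrage: a riskless portfolio must earn the same as money put in the bank at the risk-free rate r, so
$dP = rP\,dt$
Which leads to the PDE
$\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2\frac{\partial^2 V}{\partial S^2} + rS\frac{\partial V}{\partial S} - rV = 0$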
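As a numerical sanity check of the Black-Scholes PDE derived above, the sketch below takes the standard closed-form price of a European call (a textbook result that is not derived on these slides), differentiates it with central finite differences, and plugs it into the PDE. The strike K = 100, rate r = 5%, volatility σ = 20% and maturity T = 1 are arbitrary illustrative values.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, t, K=100.0, r=0.05, sigma=0.2, T=1.0):
    """Closed-form Black-Scholes price of a European call at time t < T."""
    tau = T - t
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def bs_pde_residual(S, t, r=0.05, sigma=0.2, hS=1e-2, ht=1e-4):
    """V_t + 0.5 sigma^2 S^2 V_SS + r S V_S - r V, using central differences."""
    V = bs_call(S, t, r=r, sigma=sigma)
    V_up = bs_call(S + hS, t, r=r, sigma=sigma)
    V_dn = bs_call(S - hS, t, r=r, sigma=sigma)
    V_t = (bs_call(S, t + ht, r=r, sigma=sigma) - bs_call(S, t - ht, r=r, sigma=sigma)) / (2 * ht)
    V_S = (V_up - V_dn) / (2 * hS)
    V_SS = (V_up - 2 * V + V_dn) / hS ** 2
    return V_t + 0.5 * sigma ** 2 * S ** 2 * V_SS + r * S * V_S - r * V

print(bs_call(100.0, 0.0))              # about 10.45 for S = K = 100
for S in (80.0, 100.0, 120.0):
    print(S, bs_pde_residual(S, 0.5))   # close to zero
```

The residual is zero up to finite-difference error for every S, which is what "V satisfies the PDE" means; the PDE alone does not pin down V, the payoff at maturity (a terminal condition) does.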
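ODE: Basic Terminology
$\dot{x} = f(x)$ or $\dot{x} = f(x, t)$
Initial condition
Given the equation $\dot{x} = f(x)$, we add the initial condition (i.c.) $x(0) = c$.
Example:
$\dot{x} = x$. Separating variables and integrating both sides, we get
$x(t) = a e^{t}$
We need the i.c. to determine a: here $a = x(0) = c$.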
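A minimal sketch, assuming plain forward Euler (the simplest solver, not something these slides prescribe), shows how the initial condition fixes a and how a numerical solution approaches $c e^{t}$. For this equation the Euler iterate after n steps on [0, 1] is exactly $c(1 + 1/n)^n$, so the error shrinks roughly like 1/n.

```python
import math

def euler(f, x0, t_end, n_steps):
    """Forward Euler: x_{k+1} = x_k + h * f(x_k), with h = t_end / n_steps."""
    h = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        x = x + h * f(x)
    return x

c = 2.0                      # initial condition x(0) = c, so a = c
exact = c * math.exp(1.0)    # x(1) = c * e
for n in (10, 100, 1000):
    approx = euler(lambda x: x, c, 1.0, n)
    print(n, approx, abs(approx - exact))
```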
ODE: Basic Terminology
• Solutions never intersect: for an autonomous system with Lipschitz f, uniqueness means two distinct trajectories cannot pass through the same point
• In most cases we cannot solve the equation analytically
Instead, we study the flow patterns in the state space:
ω-limit set: the set of points a trajectory accumulates on as time goes to +∞
α-limit set: the set of points a trajectory accumulates on as time goes to −∞
• Elements we may find: fixed points, closed orbits (e.g. limit cycles),
strange attractors
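ODE: Terminology
Attractors
A point or compact set that attracts all initial conditions in a neighborhood of it
Fixed point
A point $x^*$ with $f(x^*) = 0$, namely a point where the flow "rests"
Stability
A fixed point is (Lyapunov) stable if, for every ε-neighborhood of it, trajectories starting close enough never leave that ε-neighborhood.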
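Determine stability
Autonomous systems
• Linearization: if all eigenvalues of the Jacobian at the fixed point have non-zero real parts, the fixed point is stable when all real parts are negative and unstable when any is positive
• Lyapunov functions
• Dulac's theorem (rules out closed orbits in simply connected planar regions)
Non-autonomous systems
• Lyapunov exponents
Bifurcations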
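A minimal sketch of the Jacobian-eigenvalue test for autonomous systems. The damped pendulum ($\dot\theta = \omega$, $\dot\omega = -\sin\theta - \gamma\omega$) and the damping value γ = 0.5 are illustrative choices, and the Jacobian is approximated by central finite differences rather than derived symbolically.

```python
import numpy as np

def jacobian(f, x, eps=1e-6):
    """Central finite-difference Jacobian of f: R^n -> R^n at x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

def classify_fixed_point(f, x_star):
    """Linearization test: decisive only when no eigenvalue has zero real part."""
    re = np.linalg.eigvals(jacobian(f, x_star)).real
    if np.any(np.isclose(re, 0.0, atol=1e-8)):
        return "non-hyperbolic: linearization is inconclusive"
    return "stable" if np.all(re < 0) else "unstable"

gamma = 0.5  # damping coefficient (illustrative)

def pendulum(x):
    theta, omega = x
    return np.array([omega, -np.sin(theta) - gamma * omega])

print(classify_fixed_point(pendulum, [0.0, 0.0]))    # stable (hanging down)
print(classify_fixed_point(pendulum, [np.pi, 0.0]))  # unstable (saddle, pointing up)
```

When the test returns "non-hyperbolic", one falls back to the other tools listed above, such as a Lyapunov function.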
Further Reading
• Nonautonomous Dynamical Systems, Kloeden & Rasmussen
• Ordinary Differential Equations, Jack Hale
• Navier-Stokes: several books, and papers by Edriss Titi
• Theory and Applications of Stochastic Differential Equations, Zeev Schuss
• Books on the heat equation
DE & DL
• Consider ResNet
Every layer t satisfies:
$h_{t+1} = h_t + \delta t\, f(h_t, \theta)$
Haber & Ruthotto (2017); Yiping Lu, Zhong et al.
Letting the step δt → 0 (the continuous limit), we obtain the ODE:
$\dot{h} = f(h, \theta)$
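The limit above can be illustrated numerically. This sketch assumes a single shared tanh layer as f(h, θ) (real ResNets use different weights per block) and treats a stack of n residual blocks with δt = T/n as forward Euler; as n grows, the output approaches the ODE solution h(T). The weights and dimension are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W = rng.normal(scale=0.5, size=(d, d))  # hypothetical shared weights (part of θ)
b = rng.normal(scale=0.1, size=d)       # hypothetical bias (part of θ)

def f(h):
    """Residual branch f(h, θ): one tanh layer, shared by every block."""
    return np.tanh(W @ h + b)

def resnet_forward(h0, n_layers, T=1.0):
    """n_layers residual blocks h_{t+1} = h_t + δt f(h_t, θ) with δt = T / n_layers."""
    dt = T / n_layers
    h = h0.copy()
    for _ in range(n_layers):
        h = h + dt * f(h)
    return h

h0 = rng.normal(size=d)
h_T = resnet_forward(h0, 20_000)  # very small δt: a stand-in for the ODE solution h(T)
errors = {n: np.linalg.norm(resnet_forward(h0, n) - h_T) for n in (2, 8, 32, 128)}
print(errors)  # shrinks roughly like 1/n as the stack approaches the ODE
```

Read in reverse, this is the Neural ODE idea: instead of fixing the number of layers, hand $\dot h = f(h, \theta)$ to an ODE solver and let it choose the steps.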
What does it mean?
Neural ODE: Chen, Rubanova et al.
One of the best paper award winners at NeurIPS 2018
What does it contain?
• Using an ODE solver as a neural network block
• A backpropagation algorithm for ODE solvers
• A comparison of this method on supervised learning
• A generative process for time series
• Continuous normalizing flows
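A backpropagation algorithm for ODE solvers
• There are several methods to solve ODEs, such as Euler and
Runge-Kutta. Backpropagating directly through their steps is costly: every intermediate
step must be stored for the gradient computation, so memory grows with the number of steps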
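To make that cost concrete, this sketch (assuming the running example $\dot x = x$, x(0) = 1 on [0, 1]) counts evaluations of f for forward Euler and classical RK4 at 10 steps each. Each evaluation is an operation that naive backpropagation through the solver would have to record.

```python
import math

class CountingRHS:
    """Right-hand side f(t, x) = x that counts how often it is evaluated."""
    def __init__(self):
        self.calls = 0

    def __call__(self, t, x):
        self.calls += 1
        return x

def euler_step(f, t, x, h):
    return x + h * f(t, x)

def rk4_step(f, t, x, h):
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h / 2 * k1)
    k3 = f(t + h / 2, x + h / 2 * k2)
    k4 = f(t + h, x + h * k3)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(step, f, x0, t_end, n_steps):
    h = t_end / n_steps
    t, x = 0.0, x0
    for _ in range(n_steps):
        x = step(f, t, x, h)
        t += h
    return x

n = 10
f_euler, f_rk4 = CountingRHS(), CountingRHS()
err_euler = abs(integrate(euler_step, f_euler, 1.0, 1.0, n) - math.e)
err_rk4 = abs(integrate(rk4_step, f_rk4, 1.0, 1.0, n) - math.e)
print(err_euler, f_euler.calls)  # error about 0.12 with 10 evaluations
print(err_rk4, f_rk4.calls)      # error about 2e-6 with 40 evaluations
```

Higher-order solvers buy accuracy with more evaluations per step, and adaptive solvers may take many steps, which is why storing the whole computation graph becomes expensive.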
Adjoint Method
$\min_\theta F(z, \theta) = \int_0^T f(z, t, \theta)\,dt$
subject to
$g(z(0), \theta) = 0$
$h(z, \dot{z}, t, \theta) = 0$
Note: g and h together define an initial value problem
Adjoint Method (cont.)
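So what do they do in the paper?
$\dot{z} = f(z, t, \theta)$
We assume a loss L such that
$L(z(T)) = L\left(z(0) + \int_0^T f(z, t, \theta)\,dt\right)$, which is ODE-solver friendly
We define the adjoint $a(t) = \frac{\partial L}{\partial z(t)}$; at the final time it is
$a(T) = \frac{\partial L}{\partial z(T)}$
What is actually z(T)?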
Adjoint Method (cont.)
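We simply solve three equations together, backwards in time from T to 0:
$\dot{a}(t) = -a(t)^\top \frac{\partial f(z(t), t, \theta)}{\partial z}$
$\frac{dL}{d\theta} = -\int_T^0 a(t)^\top \frac{\partial f(z(t), t, \theta)}{\partial \theta}\,dt$
$\dot{z} = f(z, t, \theta)$
With the initial conditions a(T), z(T), and 0 for the accumulated $\frac{dL}{d\theta}$
PyTorch implementation: github.com/rtqichen/torchdiffeq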
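A minimal scalar sketch of the three equations above, assuming a toy model $\dot z = \theta z$ with loss $L = z(T)^2/2$, chosen only because the exact gradient $dL/d\theta = T\,z(T)^2$ is known. It uses a hand-written RK4 integrator, not torchdiffeq; only z(T) is kept from the forward pass, and the backward pass re-solves z alongside a(t) and the accumulated gradient.

```python
import numpy as np

def rk4(rhs, y0, t0, t1, n_steps=1000):
    """Integrate dy/dt = rhs(t, y) from t0 to t1 with classical RK4 (t1 < t0 runs backwards)."""
    h = (t1 - t0) / n_steps
    t, y = t0, np.asarray(y0, dtype=float)
    for _ in range(n_steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, y + h / 2 * k1)
        k3 = rhs(t + h / 2, y + h / 2 * k2)
        k4 = rhs(t + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

# Toy model (assumption): dz/dt = f(z, t, θ) = θ z, loss L = z(T)^2 / 2.
def f(z, t, theta):
    return theta * z

def df_dz(z, t, theta):
    return theta

def df_dtheta(z, t, theta):
    return z

def loss_and_grad_adjoint(z0, theta, T=1.0):
    # Forward pass: keep only z(T), no intermediate solver states.
    zT = rk4(lambda t, y: np.array([f(y[0], t, theta)]), [z0], 0.0, T)[0]
    L = 0.5 * zT ** 2
    aT = zT  # a(T) = ∂L/∂z(T) for L = z(T)^2 / 2

    def augmented(t, y):
        z, a, _ = y
        return np.array([
            f(z, t, theta),               # re-solve z(t) backwards
            -a * df_dz(z, t, theta),      # da/dt = -a ∂f/∂z
            -a * df_dtheta(z, t, theta),  # integrand of dL/dθ = -∫_T^0 a ∂f/∂θ dt
        ])

    # Backward pass from T to 0 with initial conditions [z(T), a(T), 0].
    _, _, dL_dtheta = rk4(augmented, [zT, aT, 0.0], T, 0.0)
    return L, dL_dtheta

z0, theta, T = 1.5, 0.7, 1.0
L, g = loss_and_grad_adjoint(z0, theta, T)
g_exact = T * (z0 * np.exp(theta * T)) ** 2  # z(T) = z0 e^{θT}, so dL/dθ = T z(T)^2
print(g, g_exact)
```

Memory stays constant in the number of solver steps because the backward pass recomputes z(t) instead of storing it; the price is a second ODE solve.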
Comparison of this method on supervised learning
They compared on MNIST:
1. ResNet
2. ODE-Net
3. RK-Net (backpropagating directly through a Runge-Kutta solver)
The test error is nearly the same, while ResNet uses more parameters.
(The ODE-Net has about as many parameters as a one-hidden-layer MLP with 300 units.)
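Continuous Normalizing Flow (CNF)
• Normalizing flows map a simple base distribution (e.g. Gaussian or exponential)
into a more complicated distribution through a sequence of invertible maps
$f_1, f_2, f_3, \ldots, f_k$
The main difficulty is the change of variables:
$z_1 = f(z_0) \Rightarrow \log p(z_1) = \log p(z_0) - \log\left|\det \frac{\partial f}{\partial z_0}\right|$
Calculating determinants is "costly": $O(d^3)$ for a general d-dimensional Jacobian.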
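A small check of the change-of-variables formula, assuming an affine map $z_1 = A z_0 + b$ applied to a standard Gaussian (A and b are random illustrative values). For this map the pushforward is $\mathcal{N}(b, AA^\top)$, so the formula can be compared with the Gaussian density directly. `np.linalg.slogdet` computes log|det A|, the O(d³) step that makes general flows costly.

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)

def log_std_normal(z):
    """log N(z; 0, I)."""
    return -0.5 * (z @ z) - 0.5 * z.size * LOG_2PI

def log_mvn(x, mean, cov):
    """log N(x; mean, cov), computed directly from the Gaussian density."""
    diff = x - mean
    _, logdet_cov = np.linalg.slogdet(cov)
    return -0.5 * diff @ np.linalg.solve(cov, diff) - 0.5 * (x.size * LOG_2PI + logdet_cov)

rng = np.random.default_rng(0)
d = 3
A = rng.normal(size=(d, d)) + 3.0 * np.eye(d)  # hypothetical invertible linear map
b = rng.normal(size=d)

z0 = rng.normal(size=d)                  # sample from the base N(0, I)
z1 = A @ z0 + b                          # z1 = f(z0)
_, log_abs_det = np.linalg.slogdet(A)    # log|det ∂f/∂z0|: the O(d^3) step
logp_z1 = log_std_normal(z0) - log_abs_det

print(logp_z1, log_mvn(z1, b, A @ A.T))  # pushforward of N(0, I) is N(b, A A^T)
```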
CNF
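ODE solution:
We assume a continuous sequence of maps, $\dot{z} = f(z(t), t)$, which gives
$\frac{\partial \log p(z(t))}{\partial t} = -\operatorname{tr}\left(\frac{\partial f}{\partial z(t)}\right)$
Traces are cheaper to compute than determinants, and the trace is linear, which lets us
use sums of functions as the dynamics as well: the trace of a sum is the sum of the traces.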
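A sketch of the instantaneous change of variables, assuming hypothetical 2-D dynamics $f(z) = \tanh(Wz + b)$ with illustrative weights. The log-density change is integrated alongside z using only the trace of ∂f/∂z, and is then compared with log|det| of a finite-difference Jacobian of the whole flow map, which is what the discrete formula would need.

```python
import numpy as np

# Hypothetical dynamics f(z) = tanh(W z + b) in 2-D (weights are illustrative).
W = np.array([[0.5, -1.0], [1.0, 0.3]])
b = np.array([0.1, -0.2])

def f(z):
    return np.tanh(W @ z + b)

def trace_jacobian(z):
    """tr(∂f/∂z): the Jacobian is diag(1 - tanh^2(Wz + b)) @ W, so only W's diagonal matters."""
    a = W @ z + b
    return np.sum((1.0 - np.tanh(a) ** 2) * np.diag(W))

def cnf_forward(z0, T=1.0, n_steps=500):
    """RK4 on the augmented state [z, Δlog p] with d(Δlog p)/dt = -tr(∂f/∂z)."""
    def rhs(state):
        z = state[:2]
        return np.concatenate([f(z), [-trace_jacobian(z)]])

    state = np.concatenate([np.asarray(z0, dtype=float), [0.0]])
    h = T / n_steps
    for _ in range(n_steps):
        k1 = rhs(state)
        k2 = rhs(state + h / 2 * k1)
        k3 = rhs(state + h / 2 * k2)
        k4 = rhs(state + h * k3)
        state = state + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return state[:2], state[2]

def flow_log_abs_det(z0, eps=1e-5):
    """log|det ∂z(T)/∂z(0)| from a finite-difference Jacobian of the whole flow map."""
    J = np.zeros((2, 2))
    for j in range(2):
        e = np.zeros(2)
        e[j] = eps
        J[:, j] = (cnf_forward(z0 + e)[0] - cnf_forward(z0 - e)[0]) / (2 * eps)
    return np.linalg.slogdet(J)[1]

z0 = np.array([0.3, -0.7])
zT, delta_logp = cnf_forward(z0)
print(delta_logp, -flow_log_abs_det(z0))  # the two should agree
```

Here the trace is cheap because the Jacobian has a simple closed form; for a general neural network f, computing the exact trace still needs d backward passes.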
Generative Tools
• The main motivation: irregularly sampled data such as traffic or medical
records, i.e. data observed at discrete times although we expect a continuous-time
process to govern it.
• The latent ODE model is trained as a VAE to generate data.
For observations $x_1, x_2, \ldots, x_m$ and latents $z_1, z_2, \ldots, z_m$:
$z_0 \sim p(z)$
$z_1, z_2, \ldots, z_m = \text{ODESolve}(z_0, f, \theta, t_1, t_2, \ldots, t_m)$
$x_i \sim p(x \mid z_i, \theta_x)$
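A minimal sketch of this generative process. Everything here is an assumption for illustration: linear latent dynamics $f(z) = \theta z$ (in the paper f is a neural network), a linear-Gaussian decoder with parameters θ_x, random irregular observation times, and a hand-written RK4 routine standing in for ODESolve.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, obs_dim = 2, 3

# Hypothetical parameters: θ for the latent dynamics, θ_x for the decoder.
theta = np.array([[-0.1, -1.0], [1.0, -0.1]])      # f(z) = θ z: a decaying spiral
theta_x = rng.normal(size=(obs_dim, latent_dim))   # linear decoder weights
obs_noise = 0.05                                   # decoder standard deviation (assumed)

def f(z, t):
    return theta @ z

def ode_solve(z0, times, steps_per_unit=200):
    """RK4 from t = 0 through sorted (possibly irregular) times; returns z at each time."""
    zs, z, t = [], z0.astype(float), 0.0
    for t_next in times:
        n = max(1, int(np.ceil((t_next - t) * steps_per_unit)))
        h = (t_next - t) / n
        for _ in range(n):
            k1 = f(z, t)
            k2 = f(z + h / 2 * k1, t + h / 2)
            k3 = f(z + h / 2 * k2, t + h / 2)
            k4 = f(z + h * k3, t + h)
            z = z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
            t += h
        zs.append(z.copy())
    return np.stack(zs)

times = np.sort(rng.uniform(0.0, 10.0, size=7))    # irregular observation times
z0 = rng.normal(size=latent_dim)                   # z0 ~ p(z) = N(0, I)
z_path = ode_solve(z0, times)                      # z_1, ..., z_m
noise = obs_noise * rng.normal(size=(len(times), obs_dim))
x = z_path @ theta_x.T + noise                     # x_i ~ p(x | z_i, θ_x)
print(times.round(2))
print(x.round(3))
```

Because the latent state is defined at every t, nothing changes if the observation times are unevenly spaced, which is the point of the motivation above.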
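Generative (cont.)
In more detail:
1. Feed $x_1, x_2, \ldots, x_m$ into an RNN
2. Compute the distribution parameters λ (e.g. mean & std) from its hidden states
3. Sample $z_0 \sim q(z_0 \mid \lambda, x_1, \ldots, x_m)$
4. Run the ODE solver from $z_0$ and construct the trajectory up to $t_k$
5. Decode $x' \sim p(x' \mid z_{t_k}, \theta_x)$
6. Compute the ELBO (reconstruction term minus the KL divergence), estimated with the sampled $z_0$:
$\log p(x' \mid z_{t_k}, \theta_x) + \log p(z_0) - \log q(z_0 \mid \lambda, x_1, \ldots, x_m)$, with $p(z_0) = \mathcal{N}(0, I)$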
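A sketch of step 6 with diagonal Gaussians. The encoder output λ = (mu, std), the observations, and `decode` are hypothetical placeholders; in the latent ODE, `decode` would run the ODE solver from z0 and then the decoder. The last lines check that averaging $\log p(z_0) - \log q(z_0)$ over many samples recovers minus the closed-form KL divergence, which is why the last two terms act as the KL part of the ELBO.

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)

def log_normal(x, mean, std):
    """Sum over the last axis of independent Gaussian log-densities log N(x; mean, std^2)."""
    return np.sum(-0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * LOG_2PI, axis=-1)

def elbo_one_sample(x_obs, decode, mu, std, rng, obs_std=0.1):
    """One-sample estimate of log p(x'|z) + log p(z0) - log q(z0|λ, x)."""
    z0 = mu + std * rng.normal(size=mu.shape)                 # reparameterized z0 ~ q(z0 | λ)
    log_lik = np.sum(log_normal(x_obs, decode(z0), obs_std))  # log p(x' | z, θ_x)
    log_prior = log_normal(z0, 0.0, 1.0)                      # log p(z0), p = N(0, I)
    log_q = log_normal(z0, mu, std)                           # log q(z0 | λ, x_1..x_m)
    return log_lik + log_prior - log_q

def kl_to_std_normal(mu, std):
    """Closed-form KL(N(mu, diag(std^2)) || N(0, I))."""
    return np.sum(0.5 * (std ** 2 + mu ** 2 - 1.0) - np.log(std))

def decode(z0):
    """Hypothetical stand-in for ODESolve + decoder: repeat z0 at 4 time points."""
    return np.tile(z0, (4, 1))

rng = np.random.default_rng(0)
mu, std = np.array([0.5, -0.3]), np.array([0.8, 1.2])  # λ, as if produced by the encoder
x_obs = rng.normal(size=(4, 2))                         # hypothetical observations
print(elbo_one_sample(x_obs, decode, mu, std, rng))

# Averaged over many z0 samples, log p(z0) - log q(z0) estimates -KL(q || p).
z = mu + std * rng.normal(size=(200_000, 2))
print(np.mean(log_normal(z, 0.0, 1.0) - log_normal(z, mu, std)), -kl_to_std_normal(mu, std))
```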
Thanks!!!
