Control as Inference (Reinforcement Learning and Bayesian Statistics)

Outline
‣ Probabilistic modeling and Bayesian inference
‣ Reinforcement learning basics (MDP)
‣ Control as Inference
‣ RL in partially observable environments (POMDP) and model-based RL
Observed data: x1, …, xN ∼ p(X)
Model the data-generating distribution p(X) with a parameterized distribution p(X ∣ θ).
Example (Bernoulli): p(X = k ∣ θ) = μθ^k (1 − μθ)^(1−k) for k ∈ {0, 1}, i.e. X = 1 with probability μθ and X = 0 with probability 1 − μθ.
How should μθ be set? Fixing it by hand (e.g., 0.5) is arbitrary.
➡ Learn the parameters of p(X ∣ θ) from the observed data.
[Figure: plate-notation graphical models: observed x (and y) with parameter θ, and a latent-variable model with latent z, each replicated N times]
Regression with a DNN: p(Y ∣ X, θ) = Normal(fθ(X), Σ), where fθ is a deep neural network.
Classification with a DNN: p(Y = k ∣ X, θ) = exp(fθ(X)[k]) / Σ_{k′=1}^{K} exp(fθ(X)[k′]), a softmax over the K outputs of fθ.
Latent-variable model (as in the VAE): introduce a latent variable Z and model p(X, Z ∣ θ) = p(Z ∣ θ) p(X ∣ Z, θ), with parameters θ and one latent z per observation.
Maximum Likelihood Estimation (MLE)
θ̂ = argmax_θ ∏_{i=1}^{N} p(X = xi ∣ θ)
Maximum a Posteriori Estimation (MAP)
Like MLE, but with a prior p(θ) over the parameters:
θ̂ = argmax_θ p(θ ∣ X = x1, …, xN) = argmax_θ p(θ) ∏_{i=1}^{N} p(X = xi ∣ θ)
With a flat prior p(θ) = const., MAP reduces to MLE.
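As a concrete illustration (not from the slides), here is a minimal numpy sketch of the MLE and MAP estimates for the Bernoulli model, assuming a Beta(a, b) prior on μθ; the data and prior values are made up:

```python
import numpy as np

# Coin-flip data x_i in {0,1}; assumed Beta(a,b) prior on mu (the slides do not fix a prior)
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])
a, b = 2.0, 2.0

mu_mle = x.mean()                                   # argmax_mu prod_i p(x_i | mu)
mu_map = (x.sum() + a - 1) / (len(x) + a + b - 2)   # argmax_mu p(mu) prod_i p(x_i | mu)

print(mu_mle, mu_map)   # with a = b = 1 (flat prior) the two estimates coincide
```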
Bayesian Inference
Rather than a single point estimate of θ, use the whole posterior for prediction:
p(X ∣ x1, …, xN) = 𝔼_{p(θ ∣ X = x1, …, xN)}[ p(X ∣ θ) ]
[Figure: MLE/MAP return a single point estimate of θ (a minimum of −log p(x, θ)), while Bayesian inference keeps the full posterior p(θ ∣ x)]
Computing the posterior p(θ ∣ X = x1, …, xN) exactly is generally intractable.
(Notation: x1, …, xN is abbreviated as x, the joint is p(X, θ) = p(θ) p(X ∣ θ), and p(θ ∣ X = x) is written p(θ ∣ x).)
Two families of approximations:
‣ sampling-based approximation (MCMC)
‣ approximating p(θ ∣ x) with a parametric distribution qϕ(θ) (variational inference)
Variational Inference
Fit qϕ(θ) to the posterior by minimizing the Kullback–Leibler divergence between qϕ(θ) and p(θ ∣ x):
p(θ ∣ x) ≈ q̂ϕ(θ) = argmin_{qϕ} KL(qϕ(θ) ∥ p(θ ∣ x))
e.g., qϕ(θ) = Normal(μϕ, diag(σϕ²)) with variational parameters ϕ = {μϕ, σϕ²}.
Variational Inference (deriving the objective)
KL(qϕ(θ) ∥ p(θ ∣ x)) = ∫ qϕ(θ) log [ qϕ(θ) / p(θ ∣ x) ] dθ = 𝔼_{qϕ}[ log ( qϕ(θ) / p(x, θ) ) ] + log p(x)
Since log p(x) does not depend on qϕ, minimizing the KL is equivalent to maximizing the lower bound ℒϕ(x) = −𝔼_{qϕ}[ log ( qϕ(θ) / p(x, θ) ) ], which satisfies ℒϕ(x) ≤ log p(x); equivalently, minimize −ℒϕ(x).
Reparameterization Gradient
We need ∇ϕ ℒϕ(x), but the expectation is taken under qϕ itself:
∇ϕ ℒϕ(x) = −∇ϕ 𝔼_{qϕ}[ log ( qϕ(θ) / p(x, θ) ) ]
With qϕ(θ) = Normal(μϕ, diag(σϕ²)), rewrite the expectation over parameter-free noise:
𝔼_{qϕ}[ log ( qϕ(θ) / p(x, θ) ) ] = 𝔼_{p(ϵ)}[ log ( qϕ(θ) / p(x, θ) ) ]|_{θ = f(ϵ, ϕ)},  where p(ϵ) = Normal(0, I) and f(ϵ, ϕ) = μϕ + σϕ ⊙ ϵ.
Reparameterization Gradient (Monte Carlo estimate)
∇ϕ 𝔼_{qϕ}[ log ( qϕ(θ) / p(x, θ) ) ] = 𝔼_{p(ϵ)}[ ∇ϕ log ( qϕ(θ) / p(x, θ) ) |_{θ = f(ϵ, ϕ)} ]
  ≈ (1/L) Σ_{l=1}^{L} ∇ϕ log ( qϕ(θ) / p(x, θ) ) |_{θ = f(ϵ^(l), ϕ)},  with ϵ^(1), ⋯, ϵ^(L) ∼ p(ϵ)
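A minimal PyTorch sketch of this Monte Carlo estimator, assuming a toy model p(θ) = Normal(0, 1), p(x ∣ θ) = Normal(θ, 1) and L = 64 samples (all assumptions for illustration, not from the slides):

```python
import torch

# Toy model (assumed for illustration): p(theta) = N(0,1), p(x|theta) = N(theta,1)
x = torch.tensor([0.5, 1.2, -0.3])

# Variational parameters phi = (mu, log_sigma) of q_phi(theta) = N(mu, sigma^2)
mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)

L = 64                                       # number of Monte Carlo samples
eps = torch.randn(L, 1)                      # eps ~ N(0, I)
theta = mu + log_sigma.exp() * eps           # theta = f(eps, phi), reparameterized

q = torch.distributions.Normal(mu, log_sigma.exp())
prior = torch.distributions.Normal(0.0, 1.0)
lik = torch.distributions.Normal(theta, 1.0)

# log q_phi(theta) - log p(x, theta), averaged over samples (= negative ELBO estimate)
log_q = q.log_prob(theta).squeeze(-1)
log_joint = prior.log_prob(theta).squeeze(-1) + lik.log_prob(x).sum(dim=-1)
neg_elbo = (log_q - log_joint).mean()

neg_elbo.backward()                          # gradients flow through theta = f(eps, phi)
print(mu.grad, log_sigma.grad)
```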
Notes on the reparameterization gradient:
1. Other gradient estimators exist; see http://blog.shakirm.com/2015/10/machine-learning-trick-of-the-day-4-reparameterisation-tricks/
2. It applies when a sample from qϕ can be written as a deterministic, differentiable function f of ϕ and parameter-free noise.
MLE/MAP as degenerate variational inference
MAP corresponds to variational inference with a point-mass (Dirac delta) variational distribution, the zero-variance limit of a Gaussian:
qϕ(θ) = δ(θ − μϕ) = lim_{σ²→0} Normal(μϕ, diag(σ²))
Adding a flat prior p(θ) = const. recovers MLE.
(Dirac delta δ(x): https://commons.wikimedia.org/wiki/File:Dirac_distribution_PDF.png)
Amortized Variational Inference
For a model with one latent variable zi per data point, plain variational inference uses qϕ(Z1:N) = ∏_{i=1}^{N} qϕi(Zi), i.e. a separate set of variational parameters ϕi for each of the N latents zi, so the number of parameters grows with N.
Amortized Variational Inference
Instead, share a single parameter set ϕ and let an inference network fϕ map each observation xi to the variational distribution over its latent:
qϕ(Z1:N) = ∏_{i=1}^{N} qϕ(Zi ∣ fϕ(xi))
Amortized Variational Inference with a DNN
qϕ(Z) = ∏_{i=1}^{N} Normal(μϕ(xi), diag(σϕ²(xi))), where μϕ and σϕ² are outputs of a deep neural network.
Variational Autoencoder (VAE)
Both the generative model and the amortized variational distribution are DNNs, giving an autoencoder-like structure:
p(X ∣ z, θ) = ∏_{i=1}^{N} Normal(μθ(zi), diag(σθ²(zi)))
qϕ(Z) = ∏_{i=1}^{N} Normal(μϕ(xi), diag(σϕ²(xi)))
where μθ, σθ² (decoder) and μϕ, σϕ² (encoder) are DNNs; qϕ plays the role of the encoder.
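A minimal PyTorch sketch of this setup; the MLP architectures, data dimensions, and single-sample ELBO estimate are assumptions for illustration, not from the slides:

```python
import torch
import torch.nn as nn

class GaussianVAE(nn.Module):
    """Minimal VAE with Gaussian encoder q_phi(z|x) and Gaussian decoder p_theta(x|z)."""
    def __init__(self, x_dim=16, z_dim=2, hidden=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2 * z_dim))   # -> mu_phi, log sigma_phi
        self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2 * x_dim))   # -> mu_theta, log sigma_theta

    def elbo(self, x):
        mu_q, log_sig_q = self.enc(x).chunk(2, dim=-1)
        # reparameterized sample z = mu + sigma * eps
        z = mu_q + log_sig_q.exp() * torch.randn_like(mu_q)
        mu_p, log_sig_p = self.dec(z).chunk(2, dim=-1)

        log_px_z = torch.distributions.Normal(mu_p, log_sig_p.exp()).log_prob(x).sum(-1)
        log_pz = torch.distributions.Normal(0.0, 1.0).log_prob(z).sum(-1)
        log_qz_x = torch.distributions.Normal(mu_q, log_sig_q.exp()).log_prob(z).sum(-1)
        return (log_px_z + log_pz - log_qz_x).mean()   # Monte Carlo ELBO

vae = GaussianVAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
x = torch.randn(32, 16)           # placeholder batch
loss = -vae.elbo(x)               # maximize ELBO = minimize -ELBO
opt.zero_grad(); loss.backward(); opt.step()
```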
Sampling-based approximation: Markov Chain Monte Carlo (MCMC)
Approximate the posterior p(θ ∣ x) by samples:
p(θ ∣ x) ≈ (1/T) Σ_{t=1}^{T} δ(θ − θ^(t)),  θ^(1), …, θ^(T) ∼ p(θ ∣ x)
Markov Chain Monte Carlo (MCMC)
1. Initialize θ^(0).
2. Draw the next sample from a transition distribution: θ^(t+1) ∼ p(θ′ ∣ θ = θ^(t)).
3. Repeat step 2 and collect the T samples {θ^(1), …, θ^(T)}.
The transition distribution is chosen so that the chain's stationary distribution is the target posterior.
Langevin Dynamics
An MCMC method whose transition follows the gradient of the log joint, plus Gaussian noise:
p_β(θ′ ∣ θ) = Normal( θ + η (∂/∂θ) log p(x, θ), 2ηβ⁻¹ I )
As η → 0 the stationary distribution is p_β(θ ∣ x) = (p(θ ∣ x))^β, so with β = 1 the chain samples from the posterior p(θ ∣ x).
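A numpy sketch of this transition applied to a toy conjugate model p(θ) = Normal(0, 1), p(x ∣ θ) = Normal(θ, 1); the model, step size η, and chain length are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=20)      # toy data

def grad_log_joint(theta):
    # d/dtheta [ log N(theta; 0,1) + sum_i log N(x_i; theta,1) ]
    return -theta + np.sum(x - theta)

eta, beta, T = 1e-3, 1.0, 5000
theta = 0.0
samples = []
for t in range(T):
    noise = rng.normal(0.0, np.sqrt(2 * eta / beta))
    theta = theta + eta * grad_log_joint(theta) + noise   # Langevin transition
    samples.append(theta)

# The true posterior is N(sum(x)/(N+1), 1/(N+1)); the chain concentrates around its mean
print(np.mean(samples[1000:]), np.sum(x) / (len(x) + 1))
```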
Langevin Dynamics (illustration: https://upload.wikimedia.org/wikipedia/commons/0/0d/First_passage_time_in_double_well_potential_under_langevin_dynamics.gif)
The chain performs noisy gradient descent on the energy −log p(x, θ).
Connection to MLE/MAP: as β → ∞ the transition becomes deterministic,
lim_{β→∞} p_β(θ′ ∣ θ) = δ( θ′ − ( θ + η (∂/∂θ) log p(x, θ) ) ),
i.e. pure gradient ascent on log p(x, θ), which converges to a MAP estimate; with a flat prior p(θ) = const. this is MLE.
Control as Inference
(POMDP)
Reinforcement learning setup
At state st, the agent's policy π selects an action at; the environment returns the next state st+1 and the reward r(st, at).
Goal: find a policy π that maximizes the return Σ_{t=1}^{∞} r(st, at).
Action-Value Function (Q-function)
The return obtained by taking action at in state st and following π thereafter:
Q^π(st, at) = r(st, at) + 𝔼_π[ Σ_{k=1}^{∞} r(st+k, at+k) ]
Optimal Action-Value Function (Optimal Q-function)
The return obtained by taking at in st and acting optimally thereafter:
Q*(st, at) = r(st, at) + max_a Σ_{k=1}^{∞} r(st+k, at+k) = max_π Q^π(st, at)
(State) Value Function
The expected return from state st under policy π:
V^π(st) = 𝔼_π[ Σ_{k=0}^{∞} r(st+k, at+k) ] = 𝔼_π[ Q^π(st, at) ]
Optimal (State) Value Function
The return from state st when acting optimally:
V*(st) = max_a Σ_{k=0}^{∞} r(st+k, at+k) = max_π V^π(st) = max_a Q*(st, at)
Bellman Equation
Q^π(st, at) = r(st, at) + V^π(st+1)
V^π(st) = 𝔼_π[ r(st, at) ] + V^π(st+1)
(Transitions are treated as deterministic here; otherwise an expectation over st+1 is required.)
Bellman Optimality Equation
Q*(st, at) = r(st, at) + V*(st+1)
V*(st) = max_a [ r(st, a) + V*(st+1) ]
Value-based methods: Q-learning
Learn Q by repeatedly applying the Bellman optimality backup to observed transitions, and act greedily with respect to it:
Q(st, at) ← Q(st, at) + η [ r(st, at) + max_a Q(st+1, a) − Q(st, at) ]
π(s) = argmax_a Q(s, a)
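A minimal tabular sketch of this update on a toy deterministic chain MDP; the environment, ε-greedy exploration, and hyperparameters are assumptions for illustration:

```python
import numpy as np

# Toy chain MDP (assumed for illustration): states 0..4, actions {0: left, 1: right},
# reward 1 for reaching state 4, which ends the episode.
n_states, n_actions = 5, 2
def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    done = s_next == n_states - 1
    return s_next, r, done

Q = np.zeros((n_states, n_actions))
eta, eps = 0.1, 0.1
rng = np.random.default_rng(0)

for episode in range(500):
    s, done = 0, False
    while not done:
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward r + max_a' Q(s',a')
        target = r + (0.0 if done else np.max(Q[s_next]))
        Q[s, a] += eta * (target - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))   # greedy policy: prefers action 1 (right) in states 0..3
```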
Q-learning + Function Approximation
When the state space is large or continuous, replace the Q table by a function approximator (e.g., a linear model, or a DNN as in DQN) Qθ and minimize the squared Bellman error:
θ ← θ − η ∇θ 𝔼[ ( r(st, at) + max_a Qθ(st+1, a) − Qθ(st, at) )² ]
(The target term containing Qθ(st+1, ⋅) is treated as a constant when differentiating.)
Policy-based methods: Policy Gradient (REINFORCE)
Parameterize the policy directly with a DNN, e.g. a Gaussian policy
πϕ(a ∣ s) = Normal(μϕ(s), diag(σϕ²(s))),
where μϕ and σϕ² are DNN outputs.
Policy Gradient (REINFORCE)
Update the policy parameters ϕ directly by gradient ascent on the expected return (no value-function parameters θ are needed):
ϕ ← ϕ + η ∇ϕ 𝔼_{πϕ}[ Σ_{t=1}^{T} r(st, at) ]
∇ϕ 𝔼_{πϕ}[ Σ_{t=1}^{T} r(st, at) ] = 𝔼_{πϕ}[ ( Σ_{t=1}^{T} r(st, at) ) ( Σ_{t=1}^{T} ∇ϕ log πϕ(at ∣ st) ) ]
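A minimal PyTorch sketch of this estimator with a Gaussian policy; the placeholder environment `env_step`, rollout length, and absence of a baseline are assumptions for illustration:

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 2))  # outputs (mu, log_sigma)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def env_step(s, a):
    """Placeholder dynamics and reward; replace with a real environment."""
    s_next = s + 0.1 * a * torch.ones_like(s)
    return s_next, -(s_next ** 2).sum()

s = torch.zeros(3)
log_probs, rewards = [], []
for t in range(20):                           # one rollout of length T = 20
    mu, log_sigma = policy(s).chunk(2, dim=-1)
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    a = dist.sample()                         # sampled action (no gradient through a)
    log_probs.append(dist.log_prob(a).sum())
    s, r = env_step(s, a)
    rewards.append(r)

# REINFORCE: grad = E[ (sum_t r_t) * (sum_t grad log pi(a_t|s_t)) ]
returns = torch.stack(rewards).sum().detach()
loss = -returns * torch.stack(log_probs).sum()
opt.zero_grad(); loss.backward(); opt.step()
```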
Actor-Critic
Learn both a policy πϕ (actor) and a Q-function Q^{πϕ}_θ for that policy (critic):
ϕ ← ϕ + ηϕ ∇ϕ 𝔼_{πϕ}[ Q^{πϕ}_θ(s, a) ]
θ ← θ − ηθ ∇θ 𝔼[ ( r(st, at) + V^{πϕ}_θ(st+1) − Q^{πϕ}_θ(st, at) )² ]
where V^{πϕ}_θ(s) = 𝔼_{πϕ}[ Q^{πϕ}_θ(s, a) ].
Value-based vs policy-based
‣ Value-based: learn a Q-function and act greedily on it (Q-learning; e.g., QT-Opt is a Q-learning-based method).
‣ Policy-based: learn the policy itself, via policy gradient or Actor-Critic.
On-policy vs Off-policy
‣ On-policy: updates need data collected by the current policy (e.g., policy gradient / REINFORCE).
‣ Off-policy: data collected by other or older policies can be reused (e.g., Q-learning).
Maximum Entropy Reinforcement Learning (MERL)
Add the policy entropy to the objective:
maximize Σ_{t=1}^{∞} [ r(st, at) + ℋ(π(at ∣ st)) ]
Soft Actor-Critic
Actor-Critic for the maximum-entropy objective:
ϕ ← ϕ + ηϕ ∇ϕ 𝔼_{πϕ}[ Q^{πϕ}_θ(s, a) − log πϕ(a ∣ s) ]
θ ← θ − ηθ ∇θ 𝔼[ ( r(st, at) + V^{πϕ}_θ(st+1) − Q^{πϕ}_θ(st, at) )² ]
where V^{πϕ}_θ(s) = 𝔼_{πϕ}[ Q^{πϕ}_θ(s, a) − log πϕ(a ∣ s) ]
https://arxiv.org/abs/1801.01290
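A minimal PyTorch sketch of these updates; the network sizes, the single-sample estimate of the value expectation, and the omission of common stabilizers (target networks, twin critics, entropy temperature) are assumptions for illustration:

```python
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))      # Q_theta(s, a)
pi_net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2))     # -> (mu, log_sigma)
q_opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)
pi_opt = torch.optim.Adam(pi_net.parameters(), lr=3e-4)

def policy(s):
    mu, log_sigma = pi_net(s).chunk(2, dim=-1)
    return torch.distributions.Normal(mu, log_sigma.exp())

# A batch of transitions (s, a, r, s'): random placeholders standing in for a replay buffer
s, a = torch.randn(32, 3), torch.randn(32, 1)
r, s_next = torch.randn(32, 1), torch.randn(32, 3)

# Soft value of s': V(s') = E_pi[ Q(s', a') - log pi(a'|s') ], one-sample estimate
with torch.no_grad():
    dist = policy(s_next)
    a_next = dist.sample()
    v_next = q_net(torch.cat([s_next, a_next], dim=-1)) - dist.log_prob(a_next).sum(-1, keepdim=True)
    target = r + v_next

# Critic loss: (r + V(s') - Q(s,a))^2
q_loss = ((target - q_net(torch.cat([s, a], dim=-1))) ** 2).mean()
q_opt.zero_grad(); q_loss.backward(); q_opt.step()

# Actor: maximize E_pi[ Q(s,a) - log pi(a|s) ] with reparameterized actions
dist = policy(s)
a_rep = dist.rsample()
pi_loss = (dist.log_prob(a_rep).sum(-1, keepdim=True)
           - q_net(torch.cat([s, a_rep], dim=-1))).mean()
pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
```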
Soft Actor-Critic: why the entropy term helps
In plain Actor-Critic the actor objective 𝔼_{πϕ}[ Q^{πϕ}_θ(st, at) ] is an expectation under the current policy πϕ, so the update is on-policy.
In SAC the actor objective 𝔼_{πϕ}[ Q^{πϕ}_θ(s, a) − log πϕ(a ∣ s) ] is, up to a constant, a per-state KL divergence to π̂(a ∣ s) ∝ exp(Q^{πϕ}_θ(s, a)):
KL(πϕ ∥ π̂) = −𝔼_{πϕ}[ Q^{πϕ}_θ(s, a) − log πϕ(a ∣ s) ] + log ∫ exp(Q^{πϕ}_θ(s, a)) da
➡ This objective can be evaluated at states drawn from a replay buffer, which is what makes SAC off-policy.
Summary: value-based methods learn Q, policy-based methods learn π directly, and Actor-Critic learns both.
Control as Inference
(POMDP)
Control as Inference
Markov Decision Process (MDP)
[Figure: graphical model of an MDP: states st, st+1, …, actions at, at+1, …, rewards rt, rt+1, …]
Markov Decision Process (MDP) + Optimality Variables
[Figure: the MDP graphical model with a binary optimality variable ot, ot+1, … attached to each state-action pair]
Optimality Variable
‣ A binary variable attached to each state-action pair (s, a): O = 1 means the step was "optimal", O = 0 that it was not.
Its probability is tied to the reward: p(O = 1 ∣ s, a) ∝ exp(r(s, a)).
Two inference problems
1. Infer the trajectory posterior p(s1:T, a1:T ∣ O1:T = 1): which trajectories s1:T, a1:T are consistent with being optimal?
2. Infer the policy p(at ∣ st, O≥t = 1): which action to take now, given that the present and future steps are optimal?
➡ Reinforcement learning becomes computing p(s1:T, a1:T ∣ O1:T = 1) or p(at ∣ st, O≥t = 1), i.e. probabilistic inference.
(Below, the event Ot = 1 is abbreviated as ot.)
By Bayes' rule, p(at ∣ st, o≥t) ∝ p(at ∣ st) p(o≥t ∣ st, at); with a uniform action prior p(at ∣ st), p(at ∣ st, o≥t) ∝ p(o≥t ∣ st, at).
Define Q*(st, at) = log p(o≥t ∣ st, at) and V*(st) = log p(o≥t ∣ st). Then
Q*(st, at) = log p(ot ∣ st, at) + log p(o≥t+1 ∣ st, at)
           = r(st, at) + log ∫ p(st+1 ∣ st, at) p(o≥t+1 ∣ st+1) dst+1
           = r(st, at) + log 𝔼_{p(st+1 ∣ st, at)}[ exp(V*(st+1)) ]
The soft Bellman equation
Q*(st, at) = r(st, at) + log 𝔼_{p(st+1 ∣ st, at)}[ exp(V*(st+1)) ]
For deterministic dynamics p(st+1 ∣ st, at) = δ(st+1 − f(st, at)) this reduces to Q*(st, at) = r(st, at) + V*(st+1).
V*(s) = log ∫ exp(Q*(s, a)) da ≠ max_a Q*(s, a)
i.e. the state value is a soft maximum (log-sum-exp) over actions rather than a hard max.
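A small numerical illustration (not from the slides) of the soft maximum over a discrete action set, with arbitrary Q-values:

```python
import numpy as np

# Assumed Q-values for 4 discrete actions in some state
q = np.array([1.0, 2.0, 2.2, -0.5])

soft_v = np.log(np.sum(np.exp(q)))   # V*(s) = log sum_a exp(Q*(s,a))
hard_v = np.max(q)                   # max_a Q*(s,a)

print(soft_v, hard_v)  # soft_v > hard_v; they get close only when one action dominates
```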
Summary so far: RL corresponds to inferring p(s1:T, a1:T ∣ o1:T) or p(at ∣ st, o≥t), with
Q*(st, at) = log p(o≥t ∣ st, at),  V*(st) = log p(o≥t ∣ st),
Q*(st, at) = r(st, at) + log 𝔼_{p(st+1 ∣ st, at)}[ exp(V*(st+1)) ]
The two inference problems, revisited:
1. the trajectory posterior p(s1:T, a1:T ∣ o1:T) over s1:T, a1:T
2. the policy posterior p(at ∣ st, o≥t)
➡ First, problem 1.
Problem 1: variational inference over trajectories
The trajectory posterior factorizes as p(s1:T, a1:T ∣ o1:T) ∝ p(s1) ∏_{t=1}^{T} p(st+1 ∣ st, at) exp(r(st, at)).
Approximate it with a variational distribution that keeps the true dynamics and replaces the optimality factors with a learned policy:
qϕ(s1:T, a1:T) = p(s1) ∏_{t=1}^{T} p(st+1 ∣ st, at) πϕ(at ∣ st),
with πϕ(a ∣ s) = Normal(μϕ(s), diag(σϕ²(s))), where μϕ and σϕ² are DNNs with parameters ϕ that take s as input.
Minimize the KL divergence between qϕ(s1:T, a1:T) and p(s1:T, a1:T ∣ o1:T):
KL(qϕ(s1:T, a1:T) ∥ p(s1:T, a1:T ∣ o1:T)) = 𝔼_{qϕ}[ log ( qϕ(s1:T, a1:T) / p(s1:T, a1:T ∣ o1:T) ) ]
  = 𝔼_{qϕ}[ Σ_{t=1}^{T} ( log πϕ(at ∣ st) − r(st, at) ) ] + log p(o1:T)
Gradient of the expected return under qϕ (the dynamics terms do not depend on ϕ):
∇ϕ 𝔼_{qϕ}[ Σ_{t=1}^{T} r(st, at) ] = 𝔼_{qϕ}[ ( Σ_{t=1}^{T} r(st, at) ) ∇ϕ log qϕ(s1:T, a1:T) ]
  = 𝔼_{qϕ}[ ( Σ_{t=1}^{T} r(st, at) ) ( Σ_{t=1}^{T} ∇ϕ log πϕ(at ∣ st) ) ]
which is exactly the REINFORCE policy gradient.
➡ Minimizing the KL, i.e. minimizing 𝔼_{qϕ}[ Σ_{t=1}^{T} ( log πϕ(at ∣ st) − r(st, at) ) ], is maximum entropy reinforcement learning: maximize expected reward plus policy entropy.
Now problem 2:
1. the trajectory posterior p(s1:T, a1:T ∣ o1:T) over s1:T, a1:T (done)
2. the policy posterior p(at ∣ st, o≥t)
Problem 2: variational inference for the policy posterior
The policy posterior is p(at ∣ st, o≥t) ∝ exp(Q*(st, at)), so given Q*:
‣ discrete actions: p(at ∣ st, o≥t) = exp(Q*(st, at)) / Σ_{a∈A} exp(Q*(st, a))
‣ continuous actions: p(at ∣ st, o≥t) = exp(Q*(st, at)) / ∫ exp(Q*(st, a)) da
➡ In the continuous case the normalizer is intractable, so approximate p(at ∣ st, o≥t) with πϕ(a ∣ s) = Normal(μϕ(s), diag(σϕ²(s))), where μϕ and σϕ² are DNNs with parameters ϕ that take s as input.
Minimize the KL divergence between πϕ(at ∣ st) and p(at ∣ st, o≥t):
KL(πϕ(at ∣ st) ∥ p(at ∣ st, o≥t)) = 𝔼_{πϕ}[ log ( πϕ(at ∣ st) / p(at ∣ st, o≥t) ) ]
  = 𝔼_{πϕ}[ log πϕ(at ∣ st) − Q*(st, at) ] + V*(st)
This objective requires Q*(st, at), which satisfies
Q*(st, at) = r(st, at) + log 𝔼_{p(st+1 ∣ st, at)}[ exp(V*(st+1)) ],  V*(s) = log ∫ exp(Q*(s, a)) da,
but V* contains an intractable integral over actions. Two workarounds:
1. Soft Q-learning: estimate V* by importance sampling with the current policy:
V*(s) = log 𝔼_{πϕ}[ exp(Q*(s, a)) / πϕ(a ∣ s) ] ≈ log (1/L) Σ_{l=1}^{L} exp(Q*(s, a^(l))) / πϕ(a^(l) ∣ s),  a^(1), …, a^(L) ∼ πϕ(a ∣ s)
The estimate recovers V*(s) as L → ∞.
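A numpy sketch of this importance-sampled estimate, assuming a toy quadratic Q* and a Gaussian proposal policy (both are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def q_star(a):
    return -0.5 * (a - 1.0) ** 2          # assumed soft Q-function; true V* = log ∫ exp(Q*) da

mu, sigma, L = 0.0, 2.0, 100000           # proposal policy pi_phi(a|s) = N(mu, sigma^2)
a = rng.normal(mu, sigma, size=L)
log_pi = -0.5 * ((a - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

# V*(s) ≈ log (1/L) sum_l exp(Q*(a_l)) / pi_phi(a_l|s)
v_est = np.log(np.mean(np.exp(q_star(a) - log_pi)))

# Analytic value: log ∫ exp(-0.5 (a-1)^2) da = 0.5 * log(2π)
print(v_est, 0.5 * np.log(2 * np.pi))
```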
2. Replace Q* and V* by the (soft) values of the current policy, which lower-bound them:
Q*(st, at) = r(st, at) + log 𝔼_{p(st+1 ∣ st, at)}[ exp(V*(st+1)) ] ≥ r(st, at) + log 𝔼_{p(st+1 ∣ st, at)}[ exp(V^{πϕ}(st+1)) ] = Q^{πϕ}(st, at)
V*(s) = log 𝔼_{πϕ}[ exp(Q*(s, a)) / πϕ(a ∣ s) ] ≥ 𝔼_{πϕ}[ Q*(s, a) − log πϕ(a ∣ s) ] ≥ 𝔼_{πϕ}[ Q^{πϕ}(s, a) − log πϕ(a ∣ s) ] = V^{πϕ}(s)
(the first inequality is Jensen's inequality)
➡ This is Soft Actor-Critic: use Q^{πϕ} and V^{πϕ} in place of Q* and V*. The bounds are tight (Q^{πϕ} = Q*, V^{πϕ} = V*) when πϕ(at ∣ st) = p(at ∣ st, o≥t).
Critic: approximate Q^{πϕ} with a DNN Q^{πϕ}_θ and fit it by minimizing the soft Bellman residual:
θ ← θ − ηθ ∇θ 𝔼[ ( r(st, at) + V^{πϕ}_θ(st+1) − Q^{πϕ}_θ(st, at) )² ]
where V^{πϕ}_θ(s) = 𝔼_{πϕ}[ Q^{πϕ}_θ(s, a) − log πϕ(a ∣ s) ] is computed from Q^{πϕ}_θ and the current policy πϕ.
Actor (Soft Actor-Critic): update πϕ(at ∣ st) toward π̂(a ∣ s) ∝ exp(Q^{πϕ}_θ(s, a)) by minimizing
KL(πϕ(at ∣ st) ∥ π̂(at ∣ st)) = 𝔼_{πϕ}[ log πϕ(at ∣ st) − Q^{πϕ}_θ(st, at) ] + log ∫ exp(Q^{πϕ}_θ(s, a)) da
where the last term does not depend on ϕ.
Soft Actor-Critic (SAC) is off-policy
Both updates are expectations over single transitions (st, at, rt, st+1).
➡ On-policy methods need fresh rollouts from the current policy.
➡ Off-policy methods such as SAC can reuse transitions (st, at, rt, st+1) stored in a replay buffer, regardless of which policy collected them.
Control as Inference
(POMDP)
So far we have assumed an MDP: the state s is fully observed.
Example: CartPole is an MDP when its 4-dimensional state (cart position, cart velocity, pole angle, pole angular velocity) is observed, which is what a DQN agent typically uses.
What if the agent only receives partial observations (e.g., raw images)?
➡ Partially Observable Markov Decision Process (POMDP)
Partially Observable Markov Decision Process (POMDP)
[Figure: POMDP graphical model: latent states st, st+1, …, observations xt, xt+1, …, actions at, at+1, …, rewards rt, rt+1, …]
POMDP + Optimality Variables
[Figure: the same model augmented with optimality variables ot, ot+1, …]
Control as inference in a POMDP
The policy p(at ∣ st, o≥t) cannot be used directly because st is not observed; the state itself must be inferred from observations via p(st ∣ xt, st−1, at−1).
Combine the two into a posterior over past states and the current action:
p(s≤t, at ∣ x≤t, a<t, o≥t) = p(at ∣ st, o≥t) p(s1 ∣ x1) ∏_{τ=1}^{t−1} p(sτ+1 ∣ xτ+1, sτ, aτ)
and approximate it with a variational distribution of the same form:
qϕ(s≤t, at ∣ x≤t, a<t) = πϕ(at ∣ st) qϕ(s1 ∣ x1) ∏_{τ=1}^{t−1} qϕ(sτ+1 ∣ xτ+1, sτ, aτ)
Minimize the KL divergence:
KL(qϕ(s≤t, at ∣ x≤t, a<t) ∥ p(s≤t, at ∣ x≤t, a<t, o≥t)) = 𝔼_{qϕ}[ log ( qϕ(s≤t, at ∣ x≤t, a<t) / p(s≤t, at ∣ x≤t, a<t, o≥t) ) ]
  = 𝔼_{qϕ}[ log πϕ(at ∣ st) + log ( qϕ(s1 ∣ x1) / p(x1, s1) ) + Σ_{τ=1}^{t−1} log ( qϕ(sτ+1 ∣ xτ+1, sτ, aτ) / p(xτ+1, sτ+1 ∣ sτ, aτ) ) − Q*(st, at) ] + log p(x≤t ∣ a<t) + V*(st)
The bracketed expectation equals −ℒϕ(x≤t, a<t, o≥t), the negative of an evidence lower bound.
Learning the model as well: give the generative model parameters ψ and minimize the same KL divergence:
KL(qϕ(s≤t, at ∣ x≤t, a<t) ∥ pψ(s≤t, at ∣ x≤t, a<t, o≥t)) = 𝔼_{qϕ}[ log ( qϕ(s≤t, at ∣ x≤t, a<t) / pψ(s≤t, at ∣ x≤t, a<t, o≥t) ) ]
  = 𝔼_{qϕ}[ log πϕ(at ∣ st) + log ( qϕ(s1 ∣ x1) / pψ(x1, s1) ) + Σ_{τ=1}^{t−1} log ( qϕ(sτ+1 ∣ xτ+1, sτ, aτ) / pψ(xτ+1, sτ+1 ∣ sτ, aτ) ) − Q*(st, at) ] + log pψ(x≤t ∣ a<t) + V*(st)
➡ The bracketed expectation equals −ℒϕ,ψ(x≤t, a<t, o≥t), a lower bound that involves both the policy and the model.
Since the KL is non-negative, log pψ(x≤t ∣ a<t) + V*(st) ≥ ℒϕ,ψ(x≤t, a<t, o≥t), with equality when qϕ(s≤t, at ∣ x≤t, a<t) = pψ(s≤t, at ∣ x≤t, a<t, o≥t).
In that case argmax_ψ ℒϕ,ψ(x≤t, a<t, o≥t) = argmax_ψ pψ(x≤t ∣ a<t), so ψ can be learned by maximizing ℒϕ,ψ(x≤t, a<t, o≥t).
As in SAC, replace Q* and V* by the values of the current policy, approximated by a critic:
Q*(st, at) ≥ r(st, at) + log 𝔼_{p(st+1 ∣ st, at)}[ exp(V^{πϕ}(st+1)) ] = Q^{πϕ}(st, at) ≈ Q^{πϕ}_θ(st, at)
V*(s) ≥ 𝔼_{πϕ}[ Q^{πϕ}(s, a) − log πϕ(a ∣ s) ] = V^{πϕ}(s) ≈ V^{πϕ}_θ(s)
Stochastic Latent Actor-Critic (SLAC)
θ̂ = argmin_θ 𝔼[ ( r(st, at) + V^{πϕ}_θ(st+1) − Q^{πϕ}_θ(st, at) )² ]
ϕ̂, ψ̂ = argmax_{ϕ,ψ} ℒϕ,ψ(x≤t, a<t, o≥t)
Summary: Stochastic Latent Actor-Critic (SLAC) extends SAC to the POMDP setting. A single variational treatment couples
‣ the policy p(at ∣ st, o≥t),
‣ state inference p(st ∣ xt, st−1, at−1), and
‣ the latent dynamics/observation model pψ(xt+1, st+1 ∣ st, at).
➡ Control as Inference lets control, state estimation, and model learning (a Bayesian RL flavor) be handled uniformly as inference in one probabilistic model of the POMDP.
Control as Inference
(POMDP)
Learning the latent model pψ(xt+1, st+1 ∣ st, at) of a POMDP is, roughly, what model-based reinforcement learning does (typically together with a learned reward model).
Model-based RL loop:
1. Collect data with the current policy π, giving a dataset D = {x1, a1, r1, …, xT, aT, rT}.
2. Fit the model pψ(x1:T, r1:T ∣ a1:T) to D.
3. Improve the policy π using the learned model.
Repeat steps 1-3. (https://arxiv.org/abs/1903.00374)
Step 2 of the loop: fit the model pψ(x1:T, r1:T ∣ a1:T) to the collected data D.
Partially Observable Markov Decision Process
[Figure: latent states st, st+1, observations xt, xt+1, actions at, at+1, rewards rt, rt+1]
Evidence lower bound for the sequence model:
log pψ(x1:T, r1:T ∣ a1:T) = log ∫ pψ(s1) ∏_{t=1}^{T} pψ(st+1 ∣ st, at) pψ(rt ∣ st, at) pψ(xt ∣ st) ds1:T
  = log 𝔼_{qϕ}[ ( pψ(s1) / qϕ(s1 ∣ x1) ) ∏_{t=1}^{T} pψ(st+1 ∣ st, at) pψ(rt ∣ st, at) pψ(xt ∣ st) / qϕ(st+1 ∣ xt+1, rt, st, at) ]
  ≥ 𝔼_{qϕ}[ log ( pψ(s1) / qϕ(s1 ∣ x1) ) + Σ_{t=1}^{T} log ( pψ(st+1 ∣ st, at) pψ(rt ∣ st, at) pψ(xt ∣ st) / qϕ(st+1 ∣ xt+1, rt, st, at) ) ]  (Jensen's inequality)
  = ℒϕ,ψ(x1:T, r1:T, a1:T)
So log pψ(x1:T, r1:T ∣ a1:T) ≥ ℒϕ,ψ(x1:T, r1:T, a1:T), with equality when qϕ(s1:T ∣ x1:T, r1:T, a1:T) = pψ(s1:T ∣ x1:T, r1:T, a1:T); then argmax_ψ ℒϕ,ψ(x1:T, r1:T, a1:T) = argmax_ψ pψ(x1:T, r1:T ∣ a1:T), so the model parameters ψ are trained by maximizing ℒϕ,ψ(x1:T, r1:T, a1:T).
Step 3 of the loop: improve the policy π using the learned model pψ. Three approaches follow. (https://arxiv.org/abs/1903.00374)
1. Planning with the model (Model Predictive Control, MPC)
At every time step:
1. Sample K candidate action sequences a^(1)_{t:T}, a^(2)_{t:T}, ⋯, a^(K)_{t:T}.
2. Score each by its expected return under the learned model: R(a^(k)_{t:T}) = 𝔼_{pψ}[ Σ_{τ=t}^{T} rψ(sτ, a^(k)_τ) ].
3. Execute the first action of the best sequence: at = a^(k̂)_t, where k̂ = argmax_k R(a^(k)_{t:T}).
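A numpy sketch of this procedure with randomly sampled candidate sequences; the placeholder dynamics `model_step`, reward `reward_fn`, horizon, and K are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def model_step(s, a):
    """Placeholder learned dynamics p_psi; replace with the trained model."""
    return s + 0.1 * a

def reward_fn(s, a):
    """Placeholder learned reward r_psi."""
    return -np.sum(s ** 2, axis=-1)

def mpc_action(s0, horizon=10, K=256, action_dim=1):
    # 1. sample K candidate action sequences
    actions = rng.uniform(-1.0, 1.0, size=(K, horizon, action_dim))
    s = np.repeat(s0[None, :], K, axis=0)
    returns = np.zeros(K)
    # 2. roll each sequence through the model and accumulate predicted reward
    for t in range(horizon):
        returns += reward_fn(s, actions[:, t])
        s = model_step(s, actions[:, t])
    # 3. execute only the first action of the best sequence
    return actions[np.argmax(returns), 0]

print(mpc_action(np.array([1.0, -0.5])))
```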
1. Model Predictive Control, MPC (continued): how to generate the candidate action sequences
• Random-sample Shooting (RS): sample them at random, as in the MPC procedure above.
• Cross Entropy Method (CEM): repeatedly refit the sampling distribution to the best-scoring (elite) sequences and resample.
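A numpy sketch of CEM over action sequences under the same kind of placeholder model; the iteration count, elite fraction, and Gaussian sampling distribution are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_return(s0, actions, model_step, reward_fn):
    """Expected return of a batch of action sequences under the learned model."""
    s = np.repeat(s0[None, :], actions.shape[0], axis=0)
    ret = np.zeros(actions.shape[0])
    for t in range(actions.shape[1]):
        ret += reward_fn(s, actions[:, t])
        s = model_step(s, actions[:, t])
    return ret

def cem_plan(s0, model_step, reward_fn, horizon=10, K=256, n_elite=25, iters=5, action_dim=1):
    mu = np.zeros((horizon, action_dim))
    sigma = np.ones((horizon, action_dim))
    for _ in range(iters):
        actions = mu + sigma * rng.standard_normal((K, horizon, action_dim))
        returns = rollout_return(s0, actions, model_step, reward_fn)
        elite = actions[np.argsort(returns)[-n_elite:]]           # best n_elite sequences
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6  # refit the sampler
    return mu[0]   # first action of the final mean sequence
```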
2. Learn a policy on rollouts imagined with the model:
ϕ ← ϕ + η ∇ϕ 𝔼_{pψ, πϕ}[ Σ_{t=1}^{T} rψ(st, at) ]
2. If the learned reward rψ and dynamics are differentiable, the gradient can be backpropagated through the imagined rollout via reparameterization:
∇ϕ 𝔼_{pψ, πϕ}[ Σ_{t=1}^{T} rψ(st, at) ] = 𝔼_{p(ϵ)}[ Σ_{t=1}^{T} ∇ϕ rψ( st = fψ(st−1, at−1, ϵ), at = fϕ(st, ϵ) ) ]
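A minimal PyTorch sketch of backpropagating through an imagined rollout; the placeholder dynamics and reward networks, rollout length, and deterministic transition are assumptions for illustration:

```python
import torch
import torch.nn as nn

state_dim, action_dim, H = 4, 1, 15
dynamics = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(),
                         nn.Linear(64, state_dim))          # placeholder f_psi
reward = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(),
                       nn.Linear(64, 1))                    # placeholder r_psi
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                       nn.Linear(64, 2 * action_dim))       # -> (mu, log_sigma)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

s = torch.zeros(32, state_dim)       # batch of imagined start states
total_reward = 0.0
for t in range(H):
    mu, log_sigma = policy(s).chunk(2, dim=-1)
    a = mu + log_sigma.exp() * torch.randn_like(mu)          # reparameterized action
    sa = torch.cat([s, a], dim=-1)
    total_reward = total_reward + reward(sa).mean()
    s = dynamics(sa)                                         # imagined next state

# gradient flows through rewards, dynamics, and actions back to the policy parameters
(-total_reward).backward()
opt.step(); opt.zero_grad()
```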
2. Alternatively, the score-function (REINFORCE) estimator can be used on imagined rollouts:
∇ϕ 𝔼_{pψ, πϕ}[ Σ_{t=1}^{T} rψ(st, at) ] = 𝔼_{pψ, πϕ}[ ( Σ_{t=1}^{T} rψ(st, at) ) ( Σ_{t=1}^{T} ∇ϕ log πϕ(at ∣ st) ) ]
3. Actor-Critic on imagined rollouts
ϕ ← ϕ + ηϕ ∇ϕ 𝔼_{pψ, πϕ}[ V^{πϕ}_θ(s) ]
θ ← θ − ηθ ∇θ 𝔼_{pψ, πϕ}[ ( rψ(st, at) + V^{πϕ}_θ(st+1) − Q^{πϕ}_θ(st, at) )² ]
where V^{πϕ}_θ(s) = 𝔼_{πϕ}[ Q^{πϕ}_θ(s, a) ]
World Models [Ha and Schmidhuber, 2018]
Model: VAE + MDN-RNN; the controller is optimized with CMA-ES.
https://www.slideshare.net/masa_s/ss-97848402
https://arxiv.org/abs/1803.10122
https://worldmodels.github.io/
PlaNet [Hafner et al., 2019]
Model: Recurrent State Space Model (RSSM); planning with CEM; evaluated on DeepMind Control Suite.
https://arxiv.org/abs/1811.04551
https://planetrl.github.io/
Gaussian State Space Model
The latent dynamics are Gaussian with mean and variance produced by a DNN:
pψ(st+1 ∣ st, at) = Normal(μψ(st, at), diag(σψ²(st, at)))
[Figure: latent states st, st+1, observations xt, xt+1, actions at, at+1, rewards rt, rt+1]
Recurrent State Space Model (RSSM)
Split the latent state into a deterministic part h, carried by an RNN such as an LSTM, and a stochastic part z:
ht+1 = fψ(ht, zt, at)
pψ(zt ∣ ht) = Normal(μψ(ht), diag(σψ²(ht)))
where fψ is the RNN transition.
[Figure: RSSM graphical model with deterministic states ht, ht+1, stochastic latents zt, zt+1, observations xt, xt+1, actions, and rewards (Hafner et al., 2019)]
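A minimal PyTorch sketch of one RSSM prior step; the layer sizes and GRU cell are assumptions for illustration, and the full model's posterior, decoder, and reward heads are omitted:

```python
import torch
import torch.nn as nn

class RSSMPrior(nn.Module):
    """One-step RSSM prior: h_{t+1} = f_psi(h_t, z_t, a_t), p_psi(z_{t+1} | h_{t+1})."""
    def __init__(self, z_dim=8, a_dim=2, h_dim=64):
        super().__init__()
        self.rnn = nn.GRUCell(z_dim + a_dim, h_dim)          # deterministic path f_psi
        self.prior = nn.Sequential(nn.Linear(h_dim, 64), nn.ELU(),
                                   nn.Linear(64, 2 * z_dim))  # -> (mu_psi, log sigma_psi)

    def forward(self, h, z, a):
        h_next = self.rnn(torch.cat([z, a], dim=-1), h)
        mu, log_sigma = self.prior(h_next).chunk(2, dim=-1)
        z_next = mu + log_sigma.exp() * torch.randn_like(mu)  # sample z_{t+1} ~ p_psi(z|h)
        return h_next, z_next

rssm = RSSMPrior()
h = torch.zeros(16, 64)            # batch of deterministic states
z = torch.zeros(16, 8)
a = torch.zeros(16, 2)
h, z = rssm(h, z, a)               # one imagined step
```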
Dreamer
Uses the same world model as PlaNet (RSSM), but replaces CEM planning with an Actor-Critic trained on imagined rollouts, using λ-returns as value targets.
https://arxiv.org/abs/1912.01603
https://ai.googleblog.com/2020/03/introducing-dreamer-scalable.html
Value targets: starting from the recursion Vπ(st) = 𝔼_π[ r(st, at) ] + Vπ(st+1), define the n-step value estimate
Vπ_n(st) = 𝔼_π[ Σ_{k=0}^{n−1} r(st+k, at+k) ] + Vπ(st+n)
Rather than committing to a single n, average over all n with exponentially decaying weights (the λ-return):
V̄π(st, λ) = (1 − λ) Σ_{n=1}^{∞} λ^(n−1) Vπ_n(st)
Dreamer trains the critic toward the λ-return:
θ ← θ − ηθ ∇θ 𝔼_{pψ, πϕ}[ ( V^{πϕ}_θ(st) − V̄π(st, λ) )² ]
With an imagination horizon H, the λ-return is truncated and bootstrapped from the last value estimate:
V̄π(st, λ) ≈ (1 − λ) Σ_{n=1}^{H−1} λ^(n−1) Vπ_n(st) + λ^(H−1) Vπ_H(st)
(λ and H are hyperparameters: λ interpolates between one-step and Monte Carlo targets, and H is how far the model is rolled out.)
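A numpy sketch of this truncated λ-return for a single imagined trajectory; the reward and value arrays are placeholders, and no discount factor is used, matching the slides:

```python
import numpy as np

def lambda_return(rewards, values, lam=0.95):
    """Truncated lambda-return V̄(s_t, λ) for t = 0, computed from an imagined rollout.

    rewards[k] = r(s_{t+k}, a_{t+k}) for k = 0..H-1, values[k] = V(s_{t+k}) for k = 0..H.
    """
    H = len(rewards)
    # n-step estimates V_n = sum_{k=0}^{n-1} r_{t+k} + V(s_{t+n}),  n = 1..H
    v_n = np.array([np.sum(rewards[:n]) + values[n] for n in range(1, H + 1)])
    weights = (1 - lam) * lam ** np.arange(H - 1)          # weights for n = 1..H-1
    return np.sum(weights * v_n[:-1]) + lam ** (H - 1) * v_n[-1]

rewards = np.array([1.0, 0.5, 0.2])          # placeholder imagined rewards
values = np.array([0.0, 0.9, 0.7, 0.6])      # placeholder critic values V(s_t..s_{t+H})
print(lambda_return(rewards, values))
```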
Summary: model-based RL is, roughly, learning and inference in a POMDP (plus planning), with every component implemented by DNNs.
Further reading (in Japanese):
https://www.kspub.co.jp/book/detail/1538320.html
https://www.kspub.co.jp/book/detail/5168707.html
https://www.coronasha.co.jp/np/isbn/9784339024623/
References: Control as Inference
‣ Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review. https://arxiv.org/abs/1805.00909
‣ UC Berkeley Deep RL course (Lecture 14). http://rail.eecs.berkeley.edu/deeprlcourse-fa19/
‣ Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. https://arxiv.org/abs/1801.01290
‣ Reinforcement Learning with Deep Energy-Based Policies. https://arxiv.org/abs/1702.08165
‣ Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model. https://arxiv.org/abs/1907.00953

References: Model-based RL
‣ World Models. https://arxiv.org/abs/1803.10122
‣ Learning Latent Dynamics for Planning from Pixels (PlaNet). https://arxiv.org/abs/1811.04551
‣ Dream to Control: Learning Behaviors by Latent Imagination (Dreamer). https://arxiv.org/abs/1912.01603
