確率的推論と行動選択 (Probabilistic Inference and Action Selection)
Masahiro Suzuki
2020/11/02
• control as inference, active inference
• Christopher L Buckley
• On the Relationship Between Active Inference and Control as Inference [Millidge+ 20]: control as inference, active inference
• Active inference: demystified and compared [Sajid+ 20]: active inference
• Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review [Levine 18]: control as inference
• Reinforcement Learning as Iterative and Amortised Inference [Millidge+ 20]: control as inference, amortized inference
• What does the free energy principle tell us about the brain? [Gershman 19]: active inference
• Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning [Tang+ 20]: control as inference, Variational RL
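MDP
• Markov decision process (MDP)
• Defined by states, actions, and a state transition probability.
• At time t the agent in state $s_t \in \mathcal{S}$ takes action $a_t \in \mathcal{A}$, and at time t + 1 it moves to state $s_{t+1} \sim p(s_{t+1} \mid s_t, a_t)$.
[Figure: graphical model of the MDP over states $s_{t-1}, s_t, s_{t+1}$ and actions $a_{t-1}, a_t, a_{t+1}$]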
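POMDP
• Extends the MDP with observations.
• In a POMDP (partially observable MDP) the agent cannot see the state s directly. It only receives an observation o, which is generated from s by $p(o \mid s)$.
[Figure: graphical model of the POMDP with states $s_{t-1}, s_t, s_{t+1}$, actions $a_{t-1}, a_t, a_{t+1}$, and observations $o_{t-1}, o_t, o_{t+1}$]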
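• MDP policy: a policy $p(a \mid s)$ generates a trajectory of length T, $\tau = (s_1, a_1, \ldots, s_T, a_T)$.
• At each step the agent receives a reward $r(s_t, a_t)$.
• Goal: find the optimal policy $p_{\mathrm{opt}}(a \mid s)$ that maximizes the expected return $\mathbb{E}_{p(\tau)}\left[\sum_{t=1}^{T} r(s_t, a_t)\right]$.
$$
p(\tau) = p(s_{1:T}, a_{1:T}) = \prod_{t=1}^{T} p(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1})
$$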
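The factorization of $p(\tau)$ can be made concrete by brute force. The sketch below assumes a hypothetical two-state, two-action MDP; the transition table, reward, and policy are invented for illustration and do not come from the slides. It enumerates every trajectory of length T, weights each one by $p(\tau)$ as factorized above, and computes the expected return. The initial-state distribution $p(s_1)$ takes the role of $p(s_1 \mid s_0, a_0)$.

```python
import itertools

# Hypothetical MDP for illustration only: 2 states, 2 actions.
STATES = (0, 1)
ACTIONS = (0, 1)
P_S1 = {0: 1.0, 1: 0.0}  # initial state distribution p(s_1)
# Transition probabilities p(s' | s, a); action 1 tends to lead to state 1.
P = {
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.2, 1: 0.8},
    (1, 0): {0: 0.5, 1: 0.5},
    (1, 1): {0: 0.1, 1: 0.9},
}


def reward(s, a):
    """r(s_t, a_t): being in state 1 is rewarded, regardless of the action."""
    return 1.0 if s == 1 else 0.0


def policy(a, s, p_act1):
    """p(a | s): a state-independent policy that picks action 1 with probability p_act1."""
    return p_act1 if a == 1 else 1.0 - p_act1


def trajectory_prob(tau, p_act1):
    """p(tau) = prod_t p(a_t | s_t) p(s_t | s_{t-1}, a_{t-1}), using p(s_1) at t = 1."""
    prob, prev = 1.0, None
    for s, a in tau:
        prob *= P_S1[s] if prev is None else P[prev][s]
        prob *= policy(a, s, p_act1)
        prev = (s, a)
    return prob


def expected_return(T, p_act1):
    """Return (E_{p(tau)}[sum_t r(s_t, a_t)], total probability mass) by enumeration."""
    total, mass = 0.0, 0.0
    steps = list(itertools.product(STATES, ACTIONS))  # all (s_t, a_t) pairs
    for tau in itertools.product(steps, repeat=T):
        w = trajectory_prob(tau, p_act1)
        mass += w
        total += w * sum(reward(s, a) for s, a in tau)
    return total, mass


for p in (0.1, 0.9):
    ret, mass = expected_return(T=3, p_act1=p)
    print(f"p(a=1)={p}: expected return={ret:.3f}, total probability={mass:.6f}")
```

The total probability mass equals 1, which confirms that the product factorization defines a proper distribution over trajectories. The policy that prefers action 1 reaches the rewarded state more often, so it attains a higher expected return.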
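• Plan
• In active inference, a plan (a fixed sequence of actions) $\pi = [a_1, \ldots, a_T]$ is often used instead of a state-dependent policy.
• A trajectory of length T is then $\tau = (s_{1:T}, \pi)$, and the plan $\pi$ itself has a distribution $p(\pi)$.
$$
p(\tau) = p(\pi)\, p(s_{1:T} \mid \pi) = p(\pi) \prod_{t=1}^{T} p(s_t \mid s_{t-1}, \pi)
$$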
How can preferences be represented?
1. Control as inference (also called RL as inference or planning as inference); Variational RL
2. Active inference
Control as Inference and Variational RL
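• Introduce an optimality variable $\mathcal{O}_t \in \{0, 1\}$.
• $\mathcal{O}_t = 1$ means that taking action $a_t$ in state $s_t$ at time t is optimal.
• The optimality variable is defined through the reward r:
$$
p(\mathcal{O}_t = 1 \mid s_t, a_t) := \exp\left(r(s_t, a_t)\right)
$$
For the right-hand side to be a valid probability, the reward must satisfy $r(s_t, a_t) \le 0$.
[Figure: graphical model in which each optimality variable $\mathcal{O}_t$ depends on $s_t$ and $a_t$, shown for $t-1$, $t$, $t+1$]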
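• Optimal trajectory distribution
• The likelihood of optimality given a trajectory, $p(\mathcal{O}_{1:T} \mid \tau)$, factorizes over time steps:
$$
p(\mathcal{O}_{1:T} \mid \tau) = \prod_{t=1}^{T} p(\mathcal{O}_t \mid s_t, a_t) = \prod_{t=1}^{T} \exp\left(r(s_t, a_t)\right)
$$
• The optimal trajectory distribution is the posterior over trajectories given optimality at every step:
$$
p(\tau \mid \mathcal{O}_{1:T}) = \frac{p(\mathcal{O}_{1:T} \mid \tau)\, p(\tau)}{p(\mathcal{O}_{1:T})}, \qquad p_{\mathrm{opt}}(\tau) = p(\tau \mid \mathcal{O}_{1:T})
$$
※ $p(\mathcal{O}_{1:T} = 1)$ is written as $p(\mathcal{O}_{1:T})$.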
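The posterior can be illustrated on a small, hypothetical set of trajectories. Each trajectory is summarized by its prior probability $p(\tau)$ and its return $R(\tau) = \sum_t r(s_t, a_t)$; the numbers are made up. Because $p(\mathcal{O}_{1:T} \mid \tau) = \exp(R(\tau))$, the optimal trajectory distribution is a prior-weighted softmax of the returns. This sketch works in log space for numerical stability.

```python
import math


def optimal_trajectory_posterior(prior, returns):
    """p(tau | O_{1:T}) = exp(R(tau)) p(tau) / sum_tau' exp(R(tau')) p(tau').

    prior[i]   = p(tau_i) > 0
    returns[i] = R(tau_i) = sum_t r(s_t, a_t), assumed <= 0 so that exp(R) is a probability.
    Returns (posterior list, log p(O_{1:T})).
    """
    logits = [math.log(p) + R for p, R in zip(prior, returns)]
    m = max(logits)                               # shift for a stable log-sum-exp
    unnorm = [math.exp(l - m) for l in logits]
    z = sum(unnorm)
    posterior = [u / z for u in unnorm]
    log_evidence = m + math.log(z)                # log sum_tau p(O | tau) p(tau)
    return posterior, log_evidence


# Four hypothetical trajectories with a uniform prior and non-positive returns.
prior = [0.25, 0.25, 0.25, 0.25]
returns = [-3.0, -2.0, -1.0, 0.0]
posterior, log_ev = optimal_trajectory_posterior(prior, returns)
print([round(p, 3) for p in posterior], round(log_ev, 3))
```

Under a uniform prior, higher-return trajectories receive exponentially more posterior mass. Shifting all returns by a constant leaves the posterior unchanged, but it does change $\log p(\mathcal{O}_{1:T})$.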
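• The posterior $p(\tau \mid \mathcal{O}_{1:T}) \propto p(\mathcal{O}_{1:T} \mid \tau)\, p(\tau)$ is generally intractable, so it is approximated with variational inference.
• Introduce a variational distribution $q(\tau)$ and find the $q(\tau)$ closest to the posterior:
$$
\hat{q} = \arg\min_{q} D_{\mathrm{KL}}\left[q(\tau) \,\|\, p(\tau \mid \mathcal{O}_{1:T})\right]
$$
[Figure: graphical model $\tau \to \mathcal{O}_{1:T}$ with prior $p(\tau)$, likelihood $p(\mathcal{O}_{1:T} \mid \tau)$, and approximate posterior $p(\tau \mid \mathcal{O}_{1:T}) \approx q(\tau)$]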
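ELBO
• Minimizing this KL divergence is equivalent to maximizing the evidence lower bound (ELBO). This holds because $\log p(\mathcal{O}_{1:T}) = L(q) + D_{\mathrm{KL}}\left[q(\tau) \,\|\, p(\tau \mid \mathcal{O}_{1:T})\right]$ and the left-hand side does not depend on q.
• The ELBO consists of the expected return and a KL term between $q(\tau)$ and the prior $p(\tau)$:
$$
\begin{aligned}
\log p(\mathcal{O}_{1:T}) &= \log \int p(\mathcal{O}_{1:T}, \tau)\, d\tau = \log \mathbb{E}_{q(\tau)}\left[\frac{p(\mathcal{O}_{1:T}, \tau)}{q(\tau)}\right] \\
&\ge \mathbb{E}_{q(\tau)}\left[\log p(\mathcal{O}_{1:T} \mid \tau) + \log p(\tau) - \log q(\tau)\right] \\
&= \mathbb{E}_{q(\tau)}\left[\sum_{t=1}^{T} r(s_t, a_t)\right] - D_{\mathrm{KL}}\left[q(\tau) \,\|\, p(\tau)\right] =: L(q)
\end{aligned}
$$
[Figure: graphical model $\tau \to \mathcal{O}_{1:T}$ with prior $p(\tau)$, likelihood $p(\mathcal{O}_{1:T} \mid \tau)$, and $p(\tau \mid \mathcal{O}_{1:T}) \approx q(\tau)$]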
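The bound can be checked numerically on a small, finite, hypothetical set of trajectories, each summarized by a prior probability and a return. This sketch evaluates $L(q) = \mathbb{E}_q[R(\tau)] - D_{\mathrm{KL}}[q(\tau) \| p(\tau)]$ for a few choices of q. No choice exceeds $\log p(\mathcal{O}_{1:T})$, and the bound is tight at the posterior.

```python
import math


def elbo(q, prior, returns):
    """L(q) = E_q[R(tau)] - KL[q(tau) || p(tau)] over a finite set of trajectories."""
    value = 0.0
    for qi, pi, R in zip(q, prior, returns):
        if qi > 0.0:  # terms with q(tau) = 0 contribute nothing (0 log 0 = 0)
            value += qi * (R - math.log(qi / pi))
    return value


def log_evidence(prior, returns):
    """log p(O_{1:T}) = log sum_tau p(tau) exp(R(tau))."""
    return math.log(sum(p * math.exp(R) for p, R in zip(prior, returns)))


prior = [1 / 3, 1 / 3, 1 / 3]            # hypothetical p(tau)
returns = [-2.0, -1.0, 0.0]              # hypothetical R(tau)
log_ev = log_evidence(prior, returns)

weights = [p * math.exp(R) for p, R in zip(prior, returns)]
posterior = [w / sum(weights) for w in weights]      # q(tau) = p(tau | O_{1:T})

for name, q in [("uniform", prior), ("greedy", [0.0, 0.0, 1.0]), ("posterior", posterior)]:
    print(f"{name:9s} L(q) = {elbo(q, prior, returns):.4f}  <=  log p(O) = {log_ev:.4f}")
```

The greedy q puts all of its mass on the best trajectory, which gives a high expected return but pays a large KL penalty. The posterior balances the two terms exactly.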
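1. Control as inference (CAI)
• Assume a uniform prior policy $p(a_t \mid s_t) = \frac{1}{|\mathcal{A}|}$.
• Parameterize the variational policy as $q_\phi(a_t \mid s_t)$ with parameters $\phi$.
• Assume the variational dynamics equal the true dynamics, $q(s_t \mid s_{t-1}, a_{t-1}) = p(s_t \mid s_{t-1}, a_{t-1})$.
$$
q_\phi(\tau) := \prod_{t=1}^{T} q_\phi(a_t \mid s_t)\, q(s_t \mid s_{t-1}, a_{t-1}) = \prod_{t=1}^{T} q_\phi(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1})
$$
$$
p(\tau) := \prod_{t=1}^{T} p(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1}) = \left(\frac{1}{|\mathcal{A}|}\right)^{T} \prod_{t=1}^{T} p(s_t \mid s_{t-1}, a_{t-1})
$$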
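1. Control as inference (CAI)
• Substituting these distributions into the ELBO gives a maximum-entropy RL objective.
• Because the prior policy is uniform, the KL term reduces to the negative policy entropy plus a constant.
$$
\begin{aligned}
L(\phi) &= \mathbb{E}_{q_\phi(\tau)}\left[\sum_{t=1}^{T} r(s_t, a_t)\right] - D_{\mathrm{KL}}\left[q_\phi(\tau) \,\|\, p(\tau)\right] \\
&= \mathbb{E}_{q_\phi(\tau)}\left[\sum_{t=1}^{T} \left(r(s_t, a_t) - \log q_\phi(a_t \mid s_t)\right)\right] - T \log |\mathcal{A}| \\
&= \mathbb{E}_{q_\phi(\tau)}\left[\sum_{t=1}^{T} \left(r(s_t, a_t) + \mathcal{H}\left(q_\phi(a_t \mid s_t)\right)\right)\right] - T \log |\mathcal{A}|
\end{aligned}
$$
$$
J(\phi) := \mathbb{E}_{q_\phi(\tau)}\left[\sum_{t=1}^{T} \left(r(s_t, a_t) + \mathcal{H}\left(q_\phi(a_t \mid s_t)\right)\right)\right]
$$
Since $L(\phi) = J(\phi) - T \log |\mathcal{A}|$, maximizing the ELBO is equivalent to maximizing $J(\phi)$.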
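With a single state and a single step (T = 1), the relation between the ELBO and the maximum-entropy objective can be checked directly. This sketch uses made-up rewards for three actions. It evaluates L and J for several policies, confirms that $L = J - \log|\mathcal{A}|$, and shows that the maximizer of J is the softmax policy $q(a) \propto \exp(r(a))$. That maximizer attains the value $\log \sum_a \exp(r(a))$.

```python
import math


def entropy(q):
    return -sum(x * math.log(x) for x in q if x > 0.0)


def J(q, r):
    """Maximum-entropy objective for one state and one step: E_q[r(a)] + H(q)."""
    return sum(x * ri for x, ri in zip(q, r)) + entropy(q)


def L(q, r):
    """ELBO with the uniform prior policy p(a) = 1/|A|: E_q[r(a)] - KL[q || uniform]."""
    n = len(r)
    kl = sum(x * math.log(x * n) for x in q if x > 0.0)
    return sum(x * ri for x, ri in zip(q, r)) - kl


def soft_optimal_policy(r):
    """q*(a) proportional to exp(r(a)), computed with a numerically stable softmax."""
    m = max(r)
    w = [math.exp(ri - m) for ri in r]
    z = sum(w)
    return [wi / z for wi in w]


r = [1.0, 0.5, -1.0]                      # hypothetical rewards for three actions
q_star = soft_optimal_policy(r)
log_sum_exp = math.log(sum(math.exp(ri) for ri in r))
for name, q in [("uniform", [1 / 3] * 3), ("greedy", [1.0, 0.0, 0.0]), ("soft-optimal", q_star)]:
    print(f"{name:12s} J = {J(q, r):.4f}, L = {L(q, r):.4f}")
```

The greedy policy collects the most reward but has zero entropy, so the soft-optimal policy beats it on J.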
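Soft Actor-Critic
• Soft Actor-Critic (SAC) [Haarnoja+ 17, 18]
• Maximizes the ELBO above off-policy.
• Introduces a (soft) Q-function.
• The Q-function serves as the critic and the policy as the actor.
• Control as Inference: https://deeplearning.jp/reinforcement_cource-2020s/
• Control as Inference: https://www.slideshare.net/DeepLearningJP2016/dlcontrol-as-inference-201266247
$$
Q_\theta(s_t, a_t) = r(s_t, a_t) + \mathbb{E}_{p(s_{t+1} \mid s_t, a_t)}\left[V(s_{t+1})\right]
$$
The Q-function $Q_\theta(s_t, a_t)$ and the policy $q_\phi(a_t \mid s_t)$ are trained with the following objectives:
$$
J^{q}_{t}(\phi) = \mathbb{E}_{q_\phi(a_t \mid s_t)\, p(s_t)}\left[\log q_\phi(a_t \mid s_t) - Q_\theta(s_t, a_t)\right]
$$
$$
J^{Q}_{t}(\theta) = \mathbb{E}_{q_\phi(a_t \mid s_t)\, p(s_t)}\left[\left(r(s_t, a_t) + \mathbb{E}_{p(s_{t+1} \mid s_t, a_t)}\left[\bar{V}_\theta(s_{t+1})\right] - Q_\theta(s_t, a_t)\right)^{2}\right]
$$
$$
V_\theta(s_{t+1}) = \mathbb{E}_{q_\phi(a_{t+1} \mid s_{t+1})}\left[Q_\theta(s_{t+1}, a_{t+1}) - \log q_\phi(a_{t+1} \mid s_{t+1})\right]
$$
Here $\bar{V}_\theta$ denotes the soft value computed with slowly updated target parameters.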
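The following is a minimal sketch of the SAC quantities above for a discrete action set, where expectations over $q_\phi$ reduce to exact weighted sums. The simplification is an assumption, not the original method. The original SAC targets continuous actions, samples actions with the reparameterization trick, uses neural-network approximators, and averages over a replay buffer. Here, the Q-values and policy logits for a single state are plain lists. The parameter `gamma` defaults to 1 to match the undiscounted equations on the slide; in practice a discount γ < 1 is common.

```python
import math


def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def soft_value(q_next, logits_next):
    """V(s') = E_{q_phi(a'|s')}[Q(s', a') - log q_phi(a'|s')] for a discrete action set."""
    logp = log_softmax(logits_next)
    return sum(math.exp(lp) * (qv - lp) for lp, qv in zip(logp, q_next))


def critic_loss(q_sa, r, q_next_target, logits_next, gamma=1.0):
    """Squared soft Bellman error (r + gamma * V_bar(s') - Q(s, a))^2 for one transition.

    q_next_target are target-network Q-values at s'; together with the policy they define V_bar(s')."""
    target = r + gamma * soft_value(q_next_target, logits_next)
    return (target - q_sa) ** 2


def actor_loss(q_s, logits):
    """J_q(phi) = E_{q_phi(a|s)}[log q_phi(a|s) - Q(s, a)] at one state (to be minimized)."""
    logp = log_softmax(logits)
    return sum(math.exp(lp) * (lp - qv) for lp, qv in zip(logp, q_s))


q_s = [1.0, 2.0, 0.5]                     # hypothetical Q(s, a) for three actions
print("actor loss, uniform policy:     ", actor_loss(q_s, [0.0, 0.0, 0.0]))
print("actor loss, policy = softmax(Q):", actor_loss(q_s, q_s))
```

Setting the logits equal to Q gives $q_\phi(a \mid s) \propto \exp(Q(s, a))$. This minimizes the actor loss, at the value $-\log \sum_a \exp Q(s, a)$, which is the sense in which the actor tracks the soft-optimal policy of the critic.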
POMDP
• Control as inference for POMDPs
• VAE
• SLAC [Lee+ 19]
• RNN
• [Han+ 19]
• RNN, VRNN [Chung+ 16]
• variational recurrent model (VRM)
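CAI
• CAI for planning: optimize a distribution over plans $\pi$ with mirror descent [Bubeck 14]
=> Variational Inference Model Predictive Control (VI-MPC) [Okada+ 19]
$$
\mathcal{W}(\pi) = \mathbb{E}_{q(\tau)}\left[p(\mathcal{O}_{1:T} \mid \tau)\right], \qquad p(\mathcal{O}_{1:T} \mid \tau) := f\left(r(\tau)\right)
$$
$$
q^{(i+1)}(\pi) \leftarrow \frac{q^{(i)}(\pi) \cdot \mathcal{W}(\pi) \cdot q^{(i)}(\pi)}{\mathbb{E}_{q^{(i)}(\pi)}\left[\mathcal{W}(\pi) \cdot q^{(i)}(\pi)\right]}
$$
[Okada+ 19]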
Control as inference
• CAI
• SAC, VI-MPC
• amortized inference [Kingma+ 13]
• [Millidge+ 20]
• amortized
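2. Variational RL
• In CAI the prior policy is fixed, and the ELBO is maximized only with respect to q.
• Variational RL also parameterizes the prior policy $p(a_t \mid s_t)$ with $\theta$ and maximizes the ELBO with respect to both q and $\theta$.
$$
p_\theta(\tau) := \prod_{t=1}^{T} p_\theta(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1})
$$
$$
L(\theta, q) = \mathbb{E}_{q(\tau)}\left[\sum_{t=1}^{T} r(s_t, a_t)\right] - D_{\mathrm{KL}}\left[q(\tau) \,\|\, p_\theta(\tau)\right]
$$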
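EM algorithm
• The ELBO $L(\theta, q)$ is maximized with the EM algorithm, alternating M-steps and E-steps.
• E-step: fix $\theta = \theta_{\mathrm{old}}$ and maximize the ELBO with respect to q. The optimum is the posterior:
$$
q(\tau) = p_{\theta_{\mathrm{old}}}(\tau \mid \mathcal{O}_{1:T}) = \frac{p(\mathcal{O}_{1:T} \mid \tau)\, p_{\theta_{\mathrm{old}}}(\tau)}{\sum_{\tau} p(\mathcal{O}_{1:T} \mid \tau)\, p_{\theta_{\mathrm{old}}}(\tau)}
$$
• M-step: fix q and maximize the ELBO with respect to $\theta$. The dynamics terms do not depend on $\theta$ and drop out:
$$
\hat{\theta} = \arg\max_{\theta} \mathbb{E}_{q(\tau)}\left[\log p_\theta(\tau)\right] = \arg\max_{\theta} \mathbb{E}_{q(\tau)}\left[\sum_{t=1}^{T} \log p_\theta(a_t \mid s_t)\right]
$$
• Examples: MPO [Abdolmaleki+ 18], V-MPO [Song+ 19]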
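The EM iteration can be illustrated on a hypothetical one-state, one-step problem (a bandit) with a categorical prior policy $p_\theta(a) = \theta_a$. The E-step computes the posterior $q(a) \propto p_{\theta_{\mathrm{old}}}(a) \exp(r(a))$. Maximizing $\mathbb{E}_q[\log p_\theta(a)]$ over an unrestricted categorical distribution simply sets θ = q. This sketch relies on those simplifying assumptions and is not MPO or V-MPO itself; it shows the prior policy concentrating on the best action as EM iterates.

```python
import math


def e_step(theta_old, r):
    """q(a) proportional to p_theta_old(a) * exp(r(a)): the posterior given O = 1."""
    logits = [math.log(p) + ri for p, ri in zip(theta_old, r)]
    m = max(logits)
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    return [wi / z for wi in w]


def m_step(q):
    """argmax_theta E_q[log p_theta(a)]; for an unrestricted categorical the maximizer is theta = q."""
    return list(q)


def variational_rl_em(r, iters):
    theta = [1.0 / len(r)] * len(r)            # start from the uniform policy
    for _ in range(iters):
        theta = m_step(e_step(theta, r))
    return theta


r = [1.0, 0.0, -1.0]                           # hypothetical rewards
for k in (1, 3, 10):
    print(k, [round(p, 4) for p in variational_rl_em(r, k)])
```

After k iterations, θ ∝ exp(k·r), so in this setting EM behaves like a temperature that is lowered step by step. Practical methods such as MPO limit how far each update can move (for example with KL constraints), which this sketch omits.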
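MPO E-step
• Maximum a posteriori Policy Optimization (MPO) [Abdolmaleki+ 18]
• In the E-step, the optimality likelihood is expressed through a Q-function.
• The Q-function is learned off-policy.
• MPO (DL reading-group slides): https://www.slideshare.net/DeepLearningJP2016/dlhyper-parameter-agnostic-methods-in-reinforcement-learning
The E-step uses the previous policy $p_{\theta_{\mathrm{old}}}(a_t \mid s_t)$ and its Q-function estimate $\hat{Q}_{\theta_{\mathrm{old}}}(s_t, a_t)$:
$$
q(\tau) = \prod_{t=1}^{T} q(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1})
$$
$$
q(a_t \mid s_t) \propto p_{\theta_{\mathrm{old}}}(a_t \mid s_t) \exp\left(\frac{\hat{Q}_{\theta_{\mathrm{old}}}(s_t, a_t)}{\eta}\right)
$$
where $\eta > 0$ is a temperature parameter.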
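The E-step can be carried out on actions sampled from the old policy. The sampling then supplies the factor $p_{\theta_{\mathrm{old}}}(a \mid s)$, and each sample is weighted by $\exp(\hat{Q}(s, a)/\eta)$. This sketch makes several assumptions:

- The old policy is a hypothetical 1-D Gaussian N(0, 1).
- A made-up quadratic function stands in for $\hat{Q}$.
- η is held fixed. MPO itself tunes η through a KL-constrained dual and adds KL constraints in the M-step; both are omitted here.

The M-step shown is a weighted maximum-likelihood Gaussian fit.

```python
import math
import random


def e_step_weights(q_values, eta):
    """Non-parametric E-step on samples a_i ~ p_theta_old(a|s):
    q(a_i|s) proportional to exp(Q(s, a_i) / eta), normalized over the samples."""
    m = max(q_values)
    w = [math.exp((qv - m) / eta) for qv in q_values]
    z = sum(w)
    return [wi / z for wi in w]


def weighted_gaussian_fit(actions, weights):
    """M-step sketch: weighted maximum-likelihood mean and variance of a 1-D Gaussian policy."""
    mean = sum(w * a for w, a in zip(weights, actions))
    var = sum(w * (a - mean) ** 2 for w, a in zip(weights, actions))
    return mean, var


def hypothetical_q(a):
    """Stand-in for Q_hat(s, a) at a fixed state; highest at a = 1.0 (illustration only)."""
    return -(a - 1.0) ** 2


rng = random.Random(0)
actions = [rng.gauss(0.0, 1.0) for _ in range(500)]   # samples from p_theta_old = N(0, 1)
q_vals = [hypothetical_q(a) for a in actions]
fits = {}
for eta in (0.1, 1.0, 10.0):
    fits[eta] = weighted_gaussian_fit(actions, e_step_weights(q_vals, eta))
    mean, var = fits[eta]
    print(f"eta={eta:5.1f}: new mean={mean:.2f}, new std={math.sqrt(var):.2f}")
```

A smaller η sharpens the weights and moves the new policy further toward high-Q actions. A larger η keeps the new policy close to the old one.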
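Control as inference vs. Variational RL
• Control as inference: the prior $p(\tau)$ is fixed and only q is optimized.
• Variational RL: the prior $p_\theta(\tau)$ is also learned through $\theta$.
[Figure: left, CAI graphical model $\tau \to \mathcal{O}_{1:T}$ with prior $p(\tau)$, likelihood $p(\mathcal{O}_{1:T} \mid \tau)$, and $p(\tau \mid \mathcal{O}_{1:T}) \approx q(\tau)$; right, Variational RL with learnable prior $p_\theta(\tau)$, parameter $\theta$, likelihood $p(\mathcal{O}_{1:T} \mid \tau)$, and $p_\theta(\tau \mid \mathcal{O}_{1:T}) \approx q(\tau)$]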
Control as inference and Variational RL
Active inference
• Friston
※ ver.3: https://www.slideshare.net/masatoshiyoshida/ss-238982118
• unconscious inference
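Cause → effect
Inference (perception)
• The environment generates an observation o from a hidden state s, and the agent infers s from o.
$$
p(o, s) = p(o \mid s)\, p(s)
$$
$$
p(s \mid o) = \frac{p(s)\, p(o \mid s)}{\sum_{s} p(s)\, p(o \mid s)}
$$
[Figure: in the environment, the state s generates the observation o; the agent's internal model (world model) infers the state s from o]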
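• Observing o updates the belief about the state s.
• Bayesian surprise
• Active learning
The Bayesian surprise of an observation o after taking action a is
$$
u(o) = D_{\mathrm{KL}}\left[p(s \mid o, a) \,\|\, p(s \mid a)\right]
$$
Active learning chooses the action a that maximizes the expected Bayesian surprise $I(a)$, which is the mutual information between s and o under action a:
$$
I(a) := \sum_{o} p(o \mid a)\, D_{\mathrm{KL}}\left[p(s \mid o, a) \,\|\, p(s \mid a)\right] = \mathbb{E}_{p(o \mid a)}\left[u(o)\right]
$$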
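The following is a small numerical check that the expected Bayesian surprise $I(a)$ equals the mutual information between s and o under action a. The prior $p(s \mid a)$ and the likelihoods $p(o \mid s, a)$ are hypothetical. One action yields an informative observation and the other an uninformative one, so active learning would prefer the first.

```python
import math


def kl(p, q):
    """KL[p || q] for discrete distributions (terms with p = 0 contribute 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def expected_bayesian_surprise(prior_s, lik):
    """I(a) = sum_o p(o|a) KL[p(s|o,a) || p(s|a)].
    prior_s[s] = p(s|a); lik[s][o] = p(o|s,a)."""
    n_s, n_o = len(prior_s), len(lik[0])
    p_o = [sum(prior_s[s] * lik[s][o] for s in range(n_s)) for o in range(n_o)]
    total = 0.0
    for o in range(n_o):
        if p_o[o] == 0.0:
            continue
        posterior = [prior_s[s] * lik[s][o] / p_o[o] for s in range(n_s)]  # Bayes' rule
        total += p_o[o] * kl(posterior, prior_s)                           # p(o|a) * u(o)
    return total


def mutual_information(prior_s, lik):
    """I(s; o) = sum_{s,o} p(s, o) log [p(s, o) / (p(s) p(o))], computed independently."""
    n_s, n_o = len(prior_s), len(lik[0])
    p_o = [sum(prior_s[s] * lik[s][o] for s in range(n_s)) for o in range(n_o)]
    mi = 0.0
    for s in range(n_s):
        for o in range(n_o):
            joint = prior_s[s] * lik[s][o]
            if joint > 0.0:
                mi += joint * math.log(joint / (prior_s[s] * p_o[o]))
    return mi


prior_s = [0.5, 0.3, 0.2]                                          # p(s | a), hypothetical
informative = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]  # p(o | s, a_1)
uninformative = [[1 / 3] * 3 for _ in range(3)]                    # p(o | s, a_2)
for name, lik in [("a_1", informative), ("a_2", uninformative)]:
    print(name, round(expected_bayesian_surprise(prior_s, lik), 4),
          round(mutual_information(prior_s, lik), 4))
```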
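• This extends to a sequence of observations $o_{1:T}$ under a plan $\pi = [a_1, \ldots, a_T]$:
$$
U(o_{1:T}) = \sum_{t=1}^{T} u(o_t)
$$
$$
I(\pi) = \mathbb{E}_{p(o_{1:T} \mid \pi)}\left[U(o_{1:T})\right] = \sum_{o_{1:T}} p(o_{1:T} \mid \pi)\, U(o_{1:T})
$$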
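• Perception can also be cast as variational inference, approximating the posterior over s with q(s).
• The ELBO on $\log p(o)$:
$$
\log p(o) \ge \mathbb{E}_{q(s)}\left[\log \frac{p(o, s)}{q(s)}\right]
$$
• The negative ELBO is called the variational free energy:
$$
F(o, q) := -\mathbb{E}_{q(s)}\left[\log \frac{p(o, s)}{q(s)}\right]
$$
• Free energy principle
• F is an upper bound on the surprise $-\log p(o)$.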
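• The free energy decomposes into the surprise and the KL divergence from the true posterior:
$$
F(o, q) = -\log p(o) + D_{\mathrm{KL}}\left[q(s) \,\|\, p(s \mid o)\right]
$$
• 1: Because the KL term is nonnegative, F upper-bounds the surprise $-\log p(o)$ of the observation o.
• 2: For a fixed o, minimizing F with respect to q brings q(s) close to the posterior $p(s \mid o)$ (perception as inference).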
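The decomposition can be verified on a hypothetical three-state model with a single observed o. This sketch evaluates $F(o, q)$ from its definition for several choices of q and compares it with $-\log p(o) + D_{\mathrm{KL}}[q(s) \| p(s \mid o)]$. F never falls below the surprise and equals it exactly at the posterior.

```python
import math


def free_energy(q, prior_s, lik_o):
    """F(o, q) = -E_q[log p(o, s) - log q(s)] with p(o, s) = p(o|s) p(s).
    prior_s[s] = p(s); lik_o[s] = p(o|s) for the observed o (assumed > 0 where q > 0)."""
    return -sum(qs * (math.log(lik_o[s] * prior_s[s]) - math.log(qs))
                for s, qs in enumerate(q) if qs > 0.0)


def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


prior_s = [0.5, 0.3, 0.2]        # hypothetical p(s)
lik_o = [0.9, 0.2, 0.1]          # hypothetical p(o|s) for the observed o
p_o = sum(p * l for p, l in zip(prior_s, lik_o))
posterior = [p * l / p_o for p, l in zip(prior_s, lik_o)]
surprise = -math.log(p_o)

for name, q in [("prior", prior_s), ("uniform", [1 / 3] * 3), ("posterior", posterior)]:
    F = free_energy(q, prior_s, lik_o)
    print(f"{name:9s} F = {F:.4f}, -log p(o) + KL = {surprise + kl(q, posterior):.4f}")
```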
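• Free energy in a POMDP under a plan $\pi = [a_1, \ldots, a_T]$:
$$
p(o_{1:T}, s_{1:T} \mid \pi) = \prod_{t=1}^{T} p(o_t \mid s_t)\, p(s_t \mid s_{t-1}, \pi)
$$
$$
q(s_{1:T} \mid \pi) = \prod_{t=1}^{T} q(s_t \mid \pi)
$$
$$
F(o_{1:T}, \pi) = -\mathbb{E}_{q(s_{1:T} \mid \pi)}\left[\log \frac{p(o_{1:T}, s_{1:T} \mid \pi)}{q(s_{1:T} \mid \pi)}\right]
$$
[Figure: POMDP graphical model with states $s_{t-1}, s_t, s_{t+1}$, actions $a_{t-1}, a_t, a_{t+1}$, observations $o_{t-1}, o_t, o_{t+1}$, and the plan $\pi$]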
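• Future observations are not yet available, so the free energy is averaged over the predicted observations.
• This is the expected free energy:
$$
\begin{aligned}
G(\pi) &:= \mathbb{E}_{p(o_{1:T} \mid s_{1:T}, \pi)}\left[F(o_{1:T}, \pi)\right] \\
&= -\mathbb{E}_{p(o_{1:T} \mid s_{1:T}, \pi)}\, \mathbb{E}_{q(s_{1:T} \mid \pi)}\left[\log \frac{p(o_{1:T}, s_{1:T} \mid \pi)}{q(s_{1:T} \mid \pi)}\right] \\
&= -\mathbb{E}_{q(o_{1:T}, s_{1:T} \mid \pi)}\left[\log \frac{p(o_{1:T}, s_{1:T} \mid \pi)}{q(s_{1:T} \mid \pi)}\right]
\end{aligned}
$$
where $q(o_{1:T}, s_{1:T} \mid \pi) := p(o_{1:T} \mid s_{1:T}, \pi)\, q(s_{1:T} \mid \pi)$.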
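Active inference
• Active inference (AIF) selects the plan that minimizes the expected free energy.
• Consider the expected free energy $G_t$ at each time t, with the approximation $q(s_t \mid o_t, \pi) \approx p(s_t \mid o_t, \pi)$:
$$
\begin{aligned}
G_t(\pi) &= -\mathbb{E}_{q(o_t, s_t \mid \pi)}\left[\log \frac{p(o_t, s_t \mid \pi)}{q(s_t \mid \pi)}\right] \\
&\approx -\mathbb{E}_{q(o_t, s_t \mid \pi)}\left[\log \frac{p(o_t \mid \pi)\, q(s_t \mid o_t, \pi)}{q(s_t \mid \pi)}\right] \\
&= -\mathbb{E}_{q(o_t, s_t \mid \pi)}\left[\log p(o_t \mid \pi)\right] - \mathbb{E}_{q(o_t \mid \pi)}\left[D_{\mathrm{KL}}\left[q(s_t \mid o_t, \pi) \,\|\, q(s_t \mid \pi)\right]\right]
\end{aligned}
$$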
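Active inference
• Assuming q = p, the second term becomes the expected Bayesian surprise $I(\pi)$ from active learning:
$$
\begin{aligned}
G_t(\pi) &= -\mathbb{E}_{q(o_t, s_t \mid \pi)}\left[\log p(o_t \mid \pi)\right] - \mathbb{E}_{q(o_t \mid \pi)}\left[D_{\mathrm{KL}}\left[q(s_t \mid o_t, \pi) \,\|\, q(s_t \mid \pi)\right]\right] \\
&= -\mathbb{E}_{p(o_t, s_t \mid \pi)}\left[\log p(o_t \mid \pi)\right] - \mathbb{E}_{p(o_t \mid \pi)}\left[D_{\mathrm{KL}}\left[p(s_t \mid o_t, \pi) \,\|\, p(s_t \mid \pi)\right]\right] \\
&= \mathbb{E}_{p(s_t \mid \pi)}\left[\mathcal{H}\left(p(o_t \mid \pi)\right)\right] - I(\pi)
\end{aligned}
$$
• Minimizing $G_t$ therefore includes maximizing the expected Bayesian surprise, so the active-learning objective appears inside active inference.
※ $p(s_t \mid s_{t-1}, \pi)$ is written as $p(s_t \mid \pi)$.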
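With q = p, the per-step expected free energy reduces to an information-theoretic quantity that can be checked numerically. Two facts are useful here. First, $\mathbb{E}_{p(s_t \mid \pi)}[\mathcal{H}(p(o_t \mid \pi))]$ is simply $\mathcal{H}(p(o_t \mid \pi))$, because that entropy does not depend on $s_t$. Second, $\mathcal{H}(o) - I(s; o)$ is the conditional entropy $\mathbb{E}_{p(s)}[\mathcal{H}(p(o \mid s))]$. This sketch uses a hypothetical $p(s_t \mid \pi)$ and $p(o_t \mid s_t)$. It computes $G_t$ both from its definition and from the decomposed form on this slide.

```python
import math


def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0.0)


def efe_definition(p_s, lik):
    """G_t = -E_{p(o,s)}[log p(o, s) / p(s)] = -E_{p(o,s)}[log p(o|s)] when q = p.
    p_s[s] = p(s_t|pi); lik[s][o] = p(o_t|s_t)."""
    return -sum(p_s[s] * lik[s][o] * math.log(lik[s][o])
                for s in range(len(p_s)) for o in range(len(lik[s]))
                if p_s[s] * lik[s][o] > 0.0)


def efe_decomposition(p_s, lik):
    """H(p(o_t|pi)) - I, with I the expected Bayesian surprise (mutual information of s_t and o_t)."""
    n_s, n_o = len(p_s), len(lik[0])
    p_o = [sum(p_s[s] * lik[s][o] for s in range(n_s)) for o in range(n_o)]
    mi = 0.0
    for s in range(n_s):
        for o in range(n_o):
            joint = p_s[s] * lik[s][o]
            if joint > 0.0:
                mi += joint * math.log(joint / (p_s[s] * p_o[o]))
    return entropy(p_o) - mi


p_s = [0.6, 0.4]                              # hypothetical p(s_t | pi)
lik = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]      # hypothetical p(o_t | s_t)
print(efe_definition(p_s, lik), efe_decomposition(p_s, lik))
```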
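Active inference
• The negative expected free energy splits into two terms:
• Term 1: extrinsic value
• Term 2: Bayesian surprise, i.e. intrinsic value
$$
-G_t(\pi) = \mathbb{E}_{q(o_t, s_t \mid \pi)}\left[\log p(o_t \mid \pi)\right] + \mathbb{E}_{q(o_t \mid \pi)}\left[D_{\mathrm{KL}}\left[q(s_t \mid o_t, \pi) \,\|\, q(s_t \mid \pi)\right]\right]
$$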
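Active inference
• The preference over observations is encoded as a target distribution defined from the reward [Gershman 19]:
$$
\tilde{p}(o_{1:T}) = \exp\left(r(o_{1:T})\right)
$$
※ $\tilde{p}$: the preference distribution over observations.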
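Plans can then be scored by $-G_t(\pi)$ = extrinsic value + intrinsic value, with the preference entering the extrinsic term as $\log \tilde{p}(o)$. This sketch makes three assumptions:

- A hypothetical model with two states and two observations.
- A reward $r(o)$ defined over observations.
- $\exp(r(o))$ is normalized into a distribution $\tilde{p}(o)$; the slide writes $\tilde{p} = \exp(r)$ without a normalizer.

The two candidate plans predict the same observation distribution, so their extrinsic values tie. The plan whose observations reveal the state wins through the intrinsic term.

```python
import math


def softmax(x):
    m = max(x)
    w = [math.exp(v - m) for v in x]
    z = sum(w)
    return [v / z for v in w]


def neg_expected_free_energy(q_s, lik, log_pref):
    """-G_t(pi) = E_{q(o|pi)}[log p~(o)]                  (extrinsic value)
                + E_{q(o|pi)} KL[q(s|o,pi) || q(s|pi)]   (intrinsic value)
    q_s[s] = q(s_t|pi); lik[s][o] = p(o_t|s_t) under pi; log_pref[o] = log p~(o_t)."""
    n_s, n_o = len(q_s), len(lik[0])
    q_o = [sum(q_s[s] * lik[s][o] for s in range(n_s)) for o in range(n_o)]
    extrinsic = sum(q_o[o] * log_pref[o] for o in range(n_o) if q_o[o] > 0.0)
    intrinsic = 0.0
    for o in range(n_o):
        if q_o[o] == 0.0:
            continue
        post = [q_s[s] * lik[s][o] / q_o[o] for s in range(n_s)]
        intrinsic += q_o[o] * sum(p * math.log(p / q_s[s]) for s, p in enumerate(post) if p > 0.0)
    return extrinsic + intrinsic


r_o = [0.0, 1.0]                                     # hypothetical reward over observations
log_pref = [math.log(p) for p in softmax(r_o)]       # log p~(o), p~(o) proportional to exp(r(o))
plans = {
    "idle":    ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]]),   # observations carry no information
    "observe": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]]),   # observations reveal the state
}
scores = {name: neg_expected_free_energy(q_s, lik, log_pref) for name, (q_s, lik) in plans.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

The score gap between the two plans equals the mutual information between state and observation under the informative plan. If the plans instead differed in how likely they were to produce the preferred observation, the extrinsic term would also contribute.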
Control as inference and active inference
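Active inference
• Active inference (AIF) with actions, following [Millidge+ 20]
• The preference-weighted generative model and the variational distribution are:
$$
\tilde{p}(s_t, o_t, a_t) = p(s_t \mid o_t, a_t)\, p(a_t \mid s_t)\, \tilde{p}(o_t \mid a_t) \approx q(s_t \mid o_t, a_t)\, p(a_t \mid s_t)\, \tilde{p}(o_t \mid a_t)
$$
$$
q_\phi(s_t, a_t) = q_\phi(a_t \mid s_t)\, q(s_t)
$$
• The negative expected free energy at time t, $-G_t(\phi)$, is:
$$
\begin{aligned}
-G_t(\phi) &= \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log \frac{\tilde{p}(s_t, o_t, a_t)}{q_\phi(s_t, a_t)}\right] \\
&\approx \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log \tilde{p}(o_t \mid a_t) + \log p(a_t \mid s_t) + \log q(s_t \mid o_t, a_t) - \log q_\phi(a_t \mid s_t) - \log q(s_t)\right] \\
&= \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log \tilde{p}(o_t \mid a_t)\right] - \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log q_\phi(a_t \mid s_t) - \log p(a_t \mid s_t)\right] + \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log q(s_t \mid o_t, a_t) - \log q(s_t)\right] \\
&\approx \mathbb{E}_{q(o_t \mid a_t)}\left[\log \tilde{p}(o_t \mid a_t)\right] - \mathbb{E}_{q(s_t)}\left[D_{\mathrm{KL}}\left(q_\phi(a_t \mid s_t) \,\|\, p(a_t \mid s_t)\right)\right] + \mathbb{E}_{q(o_t, a_t \mid s_t)}\left[D_{\mathrm{KL}}\left(q(s_t \mid o_t, a_t) \,\|\, q(s_t \mid a_t)\right)\right] \\
&= \mathbb{E}_{q(o_t \mid a_t)}\left[\log \tilde{p}(o_t \mid a_t)\right] + \mathbb{E}_{q(s_t)}\left[\mathcal{H}\left(q_\phi(a_t \mid s_t)\right)\right] + \mathbb{E}_{q(o_t, a_t \mid s_t)}\left[D_{\mathrm{KL}}\left(q(s_t \mid o_t, a_t) \,\|\, q(s_t \mid a_t)\right)\right] - \log |\mathcal{A}|
\end{aligned}
$$
The last line uses the uniform prior policy $p(a_t \mid s_t) = \frac{1}{|\mathcal{A}|}$.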
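AIF vs. CAI
• CAI objective:
$$
\mathbb{E}_{q(s_t, a_t)}\left[\log p(\mathcal{O}_t \mid s_t, a_t)\right] + \mathbb{E}_{q(s_t)}\left[\mathcal{H}\left(q_\phi(a_t \mid s_t)\right)\right]
$$
• AIF objective:
$$
\mathbb{E}_{q(o_t \mid a_t)}\left[\log \tilde{p}(o_t \mid a_t)\right] + \mathbb{E}_{q(s_t)}\left[\mathcal{H}\left(q_\phi(a_t \mid s_t)\right)\right] + \mathbb{E}_{q(o_t, a_t \mid s_t)}\left[D_{\mathrm{KL}}\left(q(s_t \mid o_t, a_t) \,\|\, q(s_t \mid a_t)\right)\right]
$$
• Term 1: CAI uses the optimality likelihood $p(\mathcal{O}_t \mid s_t, a_t)$, whereas AIF uses a preference $\tilde{p}(o_t \mid a_t)$ over observations.
• Term 2: the policy-entropy term is shared.
• Term 3: AIF has an additional expected information-gain term about $s_t$ that CAI lacks.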
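Likelihood-AIF
• A variant of AIF that is closer to CAI: Likelihood-AIF
• Replace the preference $\tilde{p}(o_t)$ with a likelihood $\tilde{p}(o_t \mid s_t)$.
• Assume $q(s_t) = p(s_t)$ and $p(a_t \mid s_t) = \frac{1}{|\mathcal{A}|}$ in $-G_t$.
$$
\begin{aligned}
-G_t(\phi) &= \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log \frac{\tilde{p}(s_t, o_t, a_t)}{q_\phi(s_t, a_t)}\right] \\
&= \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log \tilde{p}(o_t \mid s_t) + \log p(s_t) + \log p(a_t \mid s_t) - \log q_\phi(a_t \mid s_t) - \log q(s_t)\right] \\
&= \mathbb{E}_{q_\phi(s_t, a_t)}\left[\log \tilde{p}(o_t \mid s_t)\right] - D_{\mathrm{KL}}\left(q(s_t) \,\|\, p(s_t)\right) - \mathbb{E}_{q(s_t)}\left[D_{\mathrm{KL}}\left(q_\phi(a_t \mid s_t) \,\|\, p(a_t \mid s_t)\right)\right]
\end{aligned}
$$
Under these assumptions the state KL term vanishes, and the action KL term becomes the negative policy entropy plus a constant:
$$
-G_t(\phi) = \mathbb{E}_{q_\phi(s_t, a_t)}\left[\log \tilde{p}(o_t \mid s_t)\right] + \mathbb{E}_{q(s_t)}\left[\mathcal{H}\left(q_\phi(a_t \mid s_t)\right)\right] - \log |\mathcal{A}|
$$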
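Likelihood-AIF vs. CAI
• CAI objective:
$$
\mathbb{E}_{q_\phi(s_t, a_t)}\left[\log p(\mathcal{O}_t \mid s_t, a_t)\right] + \mathbb{E}_{q(s_t)}\left[\mathcal{H}\left(q_\phi(a_t \mid s_t)\right)\right]
$$
• Likelihood-AIF objective:
$$
\mathbb{E}_{q_\phi(s_t, a_t)}\left[\log \tilde{p}(o_t \mid s_t)\right] + \mathbb{E}_{q(s_t)}\left[\mathcal{H}\left(q_\phi(a_t \mid s_t)\right)\right]
$$
• Setting $\log \tilde{p}(o_t \mid s_t) = \log p(\mathcal{O}_t \mid s_t, a_t)$ makes the two objectives identical.
• AIF is formulated for POMDPs, whereas CAI is formulated for MDPs.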
CAI vs. AIF
• CAI
• AIF
1. Control as inference
• Amortized inference
• Variational RL
2. Active inference

Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 

Probabilistic Inference and Action Selection

  • 2. Control as inference and active inference (Christopher L Buckley)
    References:
    - On the Relationship Between Active Inference and Control as Inference [Millidge+ 20]: control as inference, active inference
    - Active inference: demystified and compared [Sajid+ 20]: active inference
    - Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review [Levine 18]: control as inference
    - Reinforcement Learning as Iterative and Amortised Inference [Millidge+ 20]: control as inference, amortized inference
    - What does the free energy principle tell us about the brain? [Gershman 19]: active inference
    - Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning [Tang+ 20]: control as inference, variational RL
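  • 3. MDP: at time $t$ the agent is in state $s_t \in \mathcal{S}$ and takes action $a_t \in \mathcal{A}$. The state at $t+1$ is drawn from the state transition probability $p(s_{t+1} \mid s_t, a_t)$.
    (Graphical model: states $s_{t-1} \to s_t \to s_{t+1}$ with actions $a_{t-1}, a_t, a_{t+1}$.)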
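  • 4. POMDP: an MDP in which the state $s$ is not observed directly. Instead, the agent receives an observation $o$ generated from $s$ by $p(o \mid s)$.
    (Graphical model: hidden states $s_{t-1}, s_t, s_{t+1}$, actions $a_{t-1}, a_t, a_{t+1}$, observations $o_{t-1}, o_t, o_{t+1}$.)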
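  • 5. Reinforcement learning in an MDP: a policy $p(a \mid s)$ generates a trajectory of length $T$, $\tau = (s_1, a_1, \dots, s_T, a_T)$, with distribution
    $p(\tau) = p(s_{1:T}, a_{1:T}) = \prod_{t=1}^{T} p(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1})$ (with $p(s_1 \mid s_0, a_0) := p(s_1)$).
    Given a reward $r(s_t, a_t)$, the goal is the optimal policy $p_{\mathrm{opt}}(a \mid s)$, which maximizes the expected return $\mathbb{E}_{p(\tau)}\left[\sum_{t=1}^{T} r(s_t, a_t)\right]$.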
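  • 6. Plans: in active inference, the agent selects a plan instead of a state-dependent policy. A plan is a fixed action sequence $\pi = [a_1, \dots, a_T]$. A trajectory of length $T$ is then $\tau = (s_{1:T}, \pi)$, with
    $p(\tau) = p(\pi)\, p(s_{1:T} \mid \pi) = p(\pi) \prod_{t=1}^{T} p(s_t \mid s_{t-1}, \pi)$.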
  • 7. How can preferences (goals) be expressed as probabilistic inference? This deck covers two approaches:
    1. Control as inference (also called RL as inference or planning as inference) and Variational RL
    2. Active inference
  • 8. Control as Inference and Variational RL
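  • 9. Optimality variable: introduce a binary variable $\mathcal{O}_t \in \{0, 1\}$. The value $\mathcal{O}_t = 1$ means that taking action $a_t$ in state $s_t$ at time $t$ is optimal. Its probability is defined through the reward $r$:
    $p(\mathcal{O}_t = 1 \mid s_t, a_t) := \exp(r(s_t, a_t))$
    This is a valid probability only if $r(s_t, a_t) \le 0$. Bounded rewards can be shifted by a constant to ensure this.
    (Graphical model: each pair $s_t, a_t$ emits $\mathcal{O}_t$, shown for $t-1, t, t+1$.)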
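  • 10. Optimal trajectory distribution: the likelihood of optimality along a trajectory factorizes over time,
    $p(\mathcal{O}_{1:T} \mid \tau) = \prod_{t=1}^{T} p(\mathcal{O}_t \mid s_t, a_t) = \prod_{t=1}^{T} \exp(r(s_t, a_t)) = \exp\left(\sum_{t=1}^{T} r(s_t, a_t)\right)$.
    Bayes' rule then gives the optimal trajectory distribution
    $p_{\mathrm{opt}}(\tau) = p(\tau \mid \mathcal{O}_{1:T}) = \dfrac{p(\mathcal{O}_{1:T} \mid \tau)\, p(\tau)}{p(\mathcal{O}_{1:T})}$.
    ※ $p(\mathcal{O}_{1:T} = 1)$ is abbreviated as $p(\mathcal{O}_{1:T})$.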
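For a small discrete MDP, the optimal trajectory distribution can be computed exactly by enumerating every trajectory. The following is a minimal plain-Python sketch. The two-state MDP, its rewards and the uniform prior policy are hypothetical values chosen for illustration and are not taken from the slides. The rewards are kept non-positive so that $\exp(r)$ is a probability.

```python
import itertools
import math

# Hypothetical toy MDP: 2 states, 2 actions, horizon T = 2 (illustrative numbers only).
STATES = (0, 1)
ACTIONS = (0, 1)
T = 2
P0 = {0: 1.0, 1: 0.0}  # initial state distribution p(s_1)
P = {  # transition probabilities p(s_{t+1} | s_t, a_t)
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.2, 1: 0.8},
    (1, 0): {0: 0.5, 1: 0.5},
    (1, 1): {0: 0.1, 1: 0.9},
}
# Rewards are <= 0 so that exp(r) is a valid probability p(O_t = 1 | s_t, a_t).
R = {(0, 0): -1.0, (0, 1): -0.5, (1, 0): -0.2, (1, 1): 0.0}


def trajectories():
    """Enumerate every trajectory tau = ((s_1, a_1), ..., (s_T, a_T))."""
    for states in itertools.product(STATES, repeat=T):
        for actions in itertools.product(ACTIONS, repeat=T):
            yield tuple(zip(states, actions))


def prior_prob(tau):
    """p(tau) = p(s_1) * prod_t p(a_t | s_t) p(s_{t+1} | s_t, a_t), with uniform p(a_t | s_t)."""
    prob = P0[tau[0][0]]
    for t, (s, a) in enumerate(tau):
        prob *= 1.0 / len(ACTIONS)
        if t + 1 < len(tau):
            prob *= P[(s, a)][tau[t + 1][0]]
    return prob


def optimality_likelihood(tau):
    """p(O_{1:T} | tau) = prod_t exp(r(s_t, a_t)) = exp(sum_t r(s_t, a_t))."""
    return math.exp(sum(R[step] for step in tau))


def optimal_trajectory_distribution():
    """Return p(tau | O_{1:T}) for every tau, together with the evidence p(O_{1:T})."""
    joint = {tau: optimality_likelihood(tau) * prior_prob(tau) for tau in trajectories()}
    evidence = sum(joint.values())
    return {tau: w / evidence for tau, w in joint.items()}, evidence


posterior, evidence = optimal_trajectory_distribution()
most_likely = max(posterior, key=posterior.get)
```

The number of trajectories grows as $(|\mathcal{S}||\mathcal{A}|)^T$. This is why exact enumeration only works for toy problems and why slide 11 turns to a variational approximation.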
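  • 11. The posterior $p(\tau \mid \mathcal{O}_{1:T}) \propto p(\mathcal{O}_{1:T} \mid \tau)\, p(\tau)$ is generally intractable. It is therefore approximated by a variational distribution $q(\tau)$, chosen as
    $\hat{q} = \arg\min_q D_{\mathrm{KL}}[q(\tau) \,\|\, p(\tau \mid \mathcal{O}_{1:T})]$.
    (Graphical model: prior $p(\tau)$ over trajectories $\tau$, likelihood $p(\mathcal{O}_{1:T} \mid \tau)$, approximate posterior $p(\tau \mid \mathcal{O}_{1:T}) \approx q(\tau)$.)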
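  • 12. ELBO: the KL divergence to the posterior cannot be evaluated directly. Instead, we maximize the evidence lower bound (ELBO) on $\log p(\mathcal{O}_{1:T})$, which depends only on $q(\tau)$ and the prior $p(\tau)$. By Jensen's inequality,
    $\log p(\mathcal{O}_{1:T}) = \log \int p(\mathcal{O}_{1:T}, \tau)\, d\tau = \log \mathbb{E}_{q(\tau)}\left[\dfrac{p(\mathcal{O}_{1:T}, \tau)}{q(\tau)}\right] \ge \mathbb{E}_{q(\tau)}\left[\log p(\mathcal{O}_{1:T} \mid \tau) + \log p(\tau) - \log q(\tau)\right] = \mathbb{E}_{q(\tau)}\left[\sum_{t=1}^{T} r(s_t, a_t)\right] - D_{\mathrm{KL}}[q(\tau) \,\|\, p(\tau)] =: L(q)$.
    The gap is $\log p(\mathcal{O}_{1:T}) - L(q) = D_{\mathrm{KL}}[q(\tau) \,\|\, p(\tau \mid \mathcal{O}_{1:T})]$. Maximizing the ELBO is therefore equivalent to the KL minimization on slide 11.
    (Graphical model as on slide 11.)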
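The bound and its gap can be checked numerically on a handful of discrete trajectories. In this sketch, the four prior probabilities $p(\tau)$ and returns $R(\tau) = \sum_t r(s_t, a_t)$ are hypothetical numbers. It computes $\log p(\mathcal{O}_{1:T})$, evaluates $L(q)$ at the exact posterior and at a random $q$, and confirms two facts: the gap vanishes at the posterior, and elsewhere it equals the KL divergence to the posterior.

```python
import math
import random


def log_evidence(prior, returns):
    """log p(O_{1:T}) = log sum_tau p(tau) exp(R(tau))."""
    return math.log(sum(p * math.exp(r) for p, r in zip(prior, returns)))


def elbo(q, prior, returns):
    """L(q) = E_q[R(tau)] - KL[q(tau) || p(tau)]; entries with q(tau) = 0 contribute 0."""
    expected_return = sum(qi * r for qi, r in zip(q, returns))
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, prior) if qi > 0)
    return expected_return - kl


def exact_posterior(prior, returns):
    """p(tau | O_{1:T}) proportional to p(tau) exp(R(tau))."""
    weights = [p * math.exp(r) for p, r in zip(prior, returns)]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical example: four trajectories with prior p(tau) and returns R(tau) <= 0.
prior = [0.4, 0.3, 0.2, 0.1]
returns = [-2.0, -1.0, -0.5, 0.0]
log_z = log_evidence(prior, returns)

post = exact_posterior(prior, returns)
gap_at_posterior = log_z - elbo(post, prior, returns)

rng = random.Random(0)
raw = [rng.random() for _ in prior]
total = sum(raw)
q_random = [x / total for x in raw]
gap_random = log_z - elbo(q_random, prior, returns)
kl_to_posterior = sum(qi * math.log(qi / pi) for qi, pi in zip(q_random, post))
```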
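  • 13. 1. Control as inference (CAI): fix the prior policy to be uniform, $p(a_t \mid s_t) = 1/|\mathcal{A}|$. Model the approximate posterior with a policy $q_\phi(a_t \mid s_t)$ with parameters $\phi$, and keep the environment dynamics unchanged:
    $q_\phi(\tau) := \prod_{t=1}^{T} q_\phi(a_t \mid s_t)\, q(s_t \mid s_{t-1}, a_{t-1}) = \prod_{t=1}^{T} q_\phi(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1})$
    $p(\tau) := \prod_{t=1}^{T} p(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1}) = \left(\dfrac{1}{|\mathcal{A}|}\right)^{T} \prod_{t=1}^{T} p(s_t \mid s_{t-1}, a_{t-1})$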
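  • 14. 1. Substituting these into the ELBO, the dynamics terms cancel inside the KL divergence. The uniform prior contributes only the constant $T \log |\mathcal{A}|$:
    $L(\phi) = \mathbb{E}_{q_\phi(\tau)}\left[\sum_{t=1}^{T} r(s_t, a_t)\right] - D_{\mathrm{KL}}[q_\phi(\tau) \,\|\, p(\tau)] = \mathbb{E}_{q_\phi(\tau)}\left[\sum_{t=1}^{T} \left(r(s_t, a_t) - \log q_\phi(a_t \mid s_t)\right)\right] - T \log |\mathcal{A}| = J(\phi) - T \log |\mathcal{A}|$
    $J(\phi) := \mathbb{E}_{q_\phi(\tau)}\left[\sum_{t=1}^{T} \left(r(s_t, a_t) + \mathcal{H}(q_\phi(\cdot \mid s_t))\right)\right]$
    The constant does not depend on $\phi$, so maximizing the ELBO is equivalent to maximizing $J(\phi)$. This is the maximum-entropy RL objective: expected return plus policy entropy.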
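With horizon $T = 1$ and a single state, $J(\phi)$ reduces to $\mathbb{E}_q[r] + \mathcal{H}(q)$. Its maximizer is the Boltzmann (softmax) policy $q^*(a \mid s) \propto \exp(r(s, a))$, and the optimal value is $\log \sum_a \exp(r(s, a))$. The sketch below checks this numerically for three hypothetical rewards. It illustrates only the one-step case and does not solve the multi-step problem.

```python
import math


def soft_objective(q, rewards):
    """One-step J(q) = E_q[r] + H(q) for a single state s."""
    expected = sum(qi * r for qi, r in zip(q, rewards))
    entropy = -sum(qi * math.log(qi) for qi in q if qi > 0)
    return expected + entropy


def boltzmann_policy(rewards):
    """q*(a | s) proportional to exp(r(s, a)), computed stably by subtracting the max."""
    m = max(rewards)
    weights = [math.exp(r - m) for r in rewards]
    z = sum(weights)
    return [w / z for w in weights]


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


rewards = [-1.0, -0.3, -2.0]  # hypothetical r(s, a) for three actions in one state
q_star = boltzmann_policy(rewards)
greedy = [0.0, 1.0, 0.0]      # deterministic policy on the highest-reward action
uniform = [1.0 / 3] * 3
```

The greedy policy gets the highest expected reward but zero entropy, and the uniform policy gets maximal entropy but a lower expected reward. The softmax policy scores higher than both on $J$.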
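  • 15. Soft Actor-Critic: Soft Actor-Critic (SAC) [Haarnoja+ 17, 18] maximizes this objective off-policy. It learns a soft Q-function $Q_\theta(s_t, a_t)$ (the critic) and uses it to train the policy $q_\phi(a_t \mid s_t)$ (the actor).
    Soft Q-function: $Q_\theta(s_t, a_t) = r(s_t, a_t) + \mathbb{E}_{p(s_{t+1} \mid s_t, a_t)}[V_\theta(s_{t+1})]$, with $V_\theta(s_{t+1}) = \mathbb{E}_{q_\phi(a_{t+1} \mid s_{t+1})}[Q_\theta(s_{t+1}, a_{t+1}) - \log q_\phi(a_{t+1} \mid s_{t+1})]$
    Critic loss: $J^{Q}_t(\theta) = \mathbb{E}_{q_\phi(a_t \mid s_t)\, p(s_t)}\left[\left(r(s_t, a_t) + \mathbb{E}_{p(s_{t+1} \mid s_t, a_t)}[\bar{V}_\theta(s_{t+1})] - Q_\theta(s_t, a_t)\right)^2\right]$, where $\bar{V}_\theta$ is a target value computed with slowly updated parameters
    Actor loss: $J^{q}_t(\phi) = \mathbb{E}_{q_\phi(a_t \mid s_t)\, p(s_t)}\left[\log q_\phi(a_t \mid s_t) - Q_\theta(s_t, a_t)\right]$
    More on control as inference: https://deeplearning.jp/reinforcement_cource-2020s/ and https://www.slideshare.net/DeepLearningJP2016/dlcontrol-as-inference-201266247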
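When the dynamics are known and the MDP is tiny, the soft Bellman recursion behind SAC's critic can be computed exactly with a backward pass over a finite horizon. The following is a tabular sketch under that assumption. It uses a hypothetical two-state MDP and has no function approximation, no replay buffer and no target network, so it shows only the $Q$/$V$ recursion and is not SAC itself. At each step, the policy is set to the minimizer of the actor loss $J^q_t$, which is $q(a \mid s) \propto \exp Q(s, a)$. This choice makes $V(s) = \log \sum_a \exp Q(s, a)$.

```python
import math

# Hypothetical 2-state, 2-action MDP with known dynamics and horizon T = 3.
STATES = (0, 1)
ACTIONS = (0, 1)
T = 3
P = {  # p(s_{t+1} | s_t, a_t)
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.2, 1: 0.8},
    (1, 0): {0: 0.5, 1: 0.5},
    (1, 1): {0: 0.1, 1: 0.9},
}
R = {(0, 0): -1.0, (0, 1): -0.5, (1, 0): -0.2, (1, 1): 0.0}


def soft_policy(q_row):
    """q(a | s) proportional to exp(Q(s, a)): minimizes E_q[log q(a | s) - Q(s, a)]."""
    m = max(q_row.values())
    weights = {a: math.exp(v - m) for a, v in q_row.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}


def soft_value(q_row, policy):
    """V(s) = E_{q(a|s)}[Q(s, a) - log q(a | s)]."""
    return sum(policy[a] * (q_row[a] - math.log(policy[a])) for a in q_row)


def soft_backward_pass():
    """Compute Q_t and q_t for t = T, ..., 1 (with V_{T+1} = 0) and return V_1."""
    v_next = {s: 0.0 for s in STATES}
    qs, policies = [], []
    for _ in range(T):
        q = {s: {a: R[(s, a)] + sum(p * v_next[s2] for s2, p in P[(s, a)].items())
                 for a in ACTIONS}
             for s in STATES}
        pi = {s: soft_policy(q[s]) for s in STATES}
        v_next = {s: soft_value(q[s], pi[s]) for s in STATES}
        qs.append(q)
        policies.append(pi)
    qs.reverse()        # index 0 now corresponds to t = 1
    policies.reverse()
    return qs, policies, v_next  # after the loop, v_next holds V_1


Q, policies, V1 = soft_backward_pass()
```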
  • 16. POMDPs: control as inference can be extended to POMDPs by learning a latent state with a VAE-style model.
    - SLAC [Lee+ 19]
    - [Han+ 19]: uses an RNN-based VRNN [Chung+ 16] as a variational recurrent model (VRM)
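  • 17. Optimizing CAI with mirror descent [Bubeck, 14] yields Variational Inference Model Predictive Control (VI-MPC) [Okada+ 19]. For plans $\pi$, define the weight
    $\mathcal{W}(\pi) = \mathbb{E}_{q(\tau)}[p(\mathcal{O}_{1:T} \mid \tau)]$, with $p(\mathcal{O}_{1:T} \mid \tau) := f(r(\tau))$ for a transform $f$ of the return.
    The plan distribution is then updated iteratively [Okada+ 19]:
    $q^{(i+1)}(\pi) \leftarrow q^{(i)}(\pi) \cdot \dfrac{\mathcal{W}(\pi) \cdot q^{(i)}(\pi)}{\mathbb{E}_{q^{(i)}(\pi)}\left[\mathcal{W}(\pi) \cdot q^{(i)}(\pi)\right]}$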
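  • 18. Control as inference: SAC and VI-MPC differ in how inference is carried out. SAC learns a policy network that is reused across states (amortized inference [Kingma+ 13]), whereas VI-MPC re-optimizes the plan distribution iteratively. [Millidge+ 20] analyzes RL methods in terms of this distinction between amortized and iterative inference.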
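  • 19. 2. Variational RL: in CAI, the prior policy $p(a_t \mid s_t)$ is fixed (uniform) and only $q$ is optimized. Variational RL also parameterizes the prior with $\theta$ and maximizes the ELBO with respect to both $q$ and $\theta$:
    $p_\theta(\tau) := \prod_{t=1}^{T} p_\theta(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1})$
    $L(\theta, q) = \mathbb{E}_{q(\tau)}\left[\sum_{t=1}^{T} r(s_t, a_t)\right] - D_{\mathrm{KL}}[q(\tau) \,\|\, p_\theta(\tau)]$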
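  • 20. EM algorithm: the ELBO is maximized by alternating two steps, as in MPO [Abdolmaleki+ 18] and V-MPO [Song+ 19].
    - E-step: fix $\theta = \theta_{\mathrm{old}}$ and maximize the ELBO with respect to $q$. The optimum is the posterior
      $q(\tau) = p_{\theta_{\mathrm{old}}}(\tau \mid \mathcal{O}_{1:T}) = \dfrac{p(\mathcal{O}_{1:T} \mid \tau)\, p_{\theta_{\mathrm{old}}}(\tau)}{\sum_{\tau} p(\mathcal{O}_{1:T} \mid \tau)\, p_{\theta_{\mathrm{old}}}(\tau)}$
    - M-step: fix $q$ and maximize the ELBO with respect to $\theta$:
      $\hat{\theta} = \arg\max_\theta \mathbb{E}_{q(\tau)}[\log p_\theta(\tau)] = \arg\max_\theta \mathbb{E}_{q(\tau)}\left[\sum_{t=1}^{T} \log p_\theta(a_t \mid s_t)\right]$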
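  • 21. MPO E-step: Maximum a posteriori Policy Optimization (MPO) [Abdolmaleki+ 18] restricts $q$ to the form
    $q(\tau) = \prod_{t=1}^{T} q(a_t \mid s_t)\, p(s_t \mid s_{t-1}, a_{t-1})$.
    In the E-step, it reweights the previous policy $p_{\theta_{\mathrm{old}}}(a_t \mid s_t)$ using an estimated Q-function $\hat{Q}_{\theta_{\mathrm{old}}}(s_t, a_t)$:
    $q(a_t \mid s_t) \propto p_{\theta_{\mathrm{old}}}(a_t \mid s_t) \exp\left(\dfrac{\hat{Q}_{\theta_{\mathrm{old}}}(s_t, a_t)}{\eta}\right)$, with temperature $\eta > 0$.
    The Q-function is learned off-policy. MPO is covered in more detail in the DL reading-group slides: https://www.slideshare.net/DeepLearningJP2016/dlhyper-parameter-agnostic-methods-in-reinforcement-learning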
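The E-step reweighting can be sketched for a single state with discrete actions. This is a simplified illustration, not MPO as published. The temperature $\eta$ is fixed by hand instead of being optimized, there is no KL trust region on the M-step, and the Q-estimates are hypothetical constants. For a tabular categorical policy, the M-step $\arg\max_\theta \mathbb{E}_q[\log p_\theta(a \mid s)]$ is solved exactly by $p_\theta = q$, so repeated EM iterations simply apply the E-step again.

```python
import math


def mpo_e_step(old_policy, q_values, eta):
    """q(a|s) proportional to p_old(a|s) * exp(Q(s, a) / eta), for one state and discrete actions."""
    if eta <= 0:
        raise ValueError("eta must be positive")
    logits = [math.log(p) + q / eta for p, q in zip(old_policy, q_values)]
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    return [w / z for w in weights]


def em_iterations(old_policy, q_values, eta, n_iters):
    """Alternate the E-step with an exact M-step for a tabular categorical policy.

    For a categorical p_theta(a|s) with one free parameter per action,
    argmax_theta E_q[log p_theta(a|s)] is attained at p_theta = q.
    Q-values are held fixed here; in MPO they are re-estimated off-policy.
    """
    policy = list(old_policy)
    for _ in range(n_iters):
        policy = mpo_e_step(policy, q_values, eta)
    return policy


q_hat = [1.0, 2.0, 0.5]  # hypothetical Q-estimates for three actions in one state
uniform = [1.0 / 3] * 3
one_step = mpo_e_step(uniform, q_hat, eta=1.0)
after_em = em_iterations(uniform, q_hat, eta=1.0, n_iters=50)
```

A small $\eta$ makes a single E-step nearly greedy with respect to $\hat{Q}$, while a large $\eta$ keeps $q$ close to the previous policy. This is the role of the temperature in the slide's formula.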
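  • 22. Control as inference vs. Variational RL (graphical models over $\tau \to \mathcal{O}_{1:T}$):
    - Control as inference: fixed prior $p(\tau)$, likelihood $p(\mathcal{O}_{1:T} \mid \tau)$, approximate posterior $p(\tau \mid \mathcal{O}_{1:T}) \approx q(\tau)$
    - Variational RL: prior $p_\theta(\tau)$ learned through $\theta$, likelihood $p(\mathcal{O}_{1:T} \mid \tau)$, approximate posterior $p_\theta(\tau \mid \mathcal{O}_{1:T}) \approx q(\tau)$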
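  • 26. Generative model and inference: the agent's internal model (world model) describes how hidden states $s$ of the environment generate observations $o$. Inference maps observations back to states.
    Generation: $p(o, s) = p(o \mid s)\, p(s)$
    Inference: $p(s \mid o) = \dfrac{p(s)\, p(o \mid s)}{\sum_{s} p(s)\, p(o \mid s)}$
    (Figure: environment → observation $o$ → internal model (world model) → inferred state $s$.)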
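Both directions can be written out for discrete states and observations. In this small sketch, the prior $p(s)$ and the observation model $p(o \mid s)$ are hypothetical two-by-two tables, and `likelihood[s][o]` stores $p(o \mid s)$.

```python
def generate_joint(prior, likelihood):
    """p(o, s) = p(o | s) p(s); likelihood[s][o] = p(o | s). Returns joint[s][o]."""
    return [[p_s * p_o for p_o in row] for p_s, row in zip(prior, likelihood)]


def infer_state(prior, likelihood, o):
    """p(s | o) = p(s) p(o | s) / sum_s p(s) p(o | s)."""
    unnormalized = [p_s * row[o] for p_s, row in zip(prior, likelihood)]
    evidence = sum(unnormalized)
    if evidence == 0:
        raise ValueError("observation has zero probability under the model")
    return [u / evidence for u in unnormalized]


prior = [0.7, 0.3]                     # hypothetical p(s) over two hidden states
likelihood = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical p(o | s) over two observations
joint = generate_joint(prior, likelihood)
posterior_after_o1 = infer_state(prior, likelihood, o=1)
```

Observing $o = 1$ shifts belief toward $s = 1$: the posterior is $0.24/0.31 \approx 0.77$, compared with the prior $0.3$.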
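  • 27. Bayesian surprise and active learning: after taking action $a$ and observing $o$, the Bayesian surprise measures how much the belief about $s$ changed:
    $u(o) = D_{\mathrm{KL}}[p(s \mid o, a) \,\|\, p(s \mid a)]$
    In active learning, the agent chooses the action $a$ that maximizes the expected Bayesian surprise $I(a)$, also called the information gain:
    $I(a) := \sum_{o} p(o \mid a)\, D_{\mathrm{KL}}[p(s \mid o, a) \,\|\, p(s \mid a)] = \mathbb{E}_{p(o \mid a)}[u(o)]$
    $I(a)$ equals the mutual information between $s$ and $o$ under action $a$.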
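The following sketch computes $u(o)$ and $I(a)$ for discrete variables and compares two hypothetical actions. The first, "look", yields observations correlated with the hidden state. The second, "wait", yields observations independent of it. The test also checks that $I(a)$ equals the mutual information $\mathcal{H}(s) - \mathbb{E}_{o}[\mathcal{H}(s \mid o)]$.

```python
import math


def kl(p, q):
    """KL[p || q] for discrete distributions (terms with p = 0 contribute 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def posterior_and_evidence(prior, likelihood, o):
    """Return (p(s | o, a), p(o | a)); prior[s] = p(s | a), likelihood[s][o] = p(o | s, a)."""
    joint = [p_s * row[o] for p_s, row in zip(prior, likelihood)]
    p_o = sum(joint)
    return ([j / p_o for j in joint] if p_o > 0 else None), p_o


def bayesian_surprise(prior, likelihood, o):
    """u(o) = KL[p(s | o, a) || p(s | a)]; assumes p(o | a) > 0."""
    post, _ = posterior_and_evidence(prior, likelihood, o)
    return kl(post, prior)


def information_gain(prior, likelihood):
    """I(a) = sum_o p(o | a) u(o), skipping observations with p(o | a) = 0."""
    gain = 0.0
    for o in range(len(likelihood[0])):
        _, p_o = posterior_and_evidence(prior, likelihood, o)
        if p_o > 0:
            gain += p_o * bayesian_surprise(prior, likelihood, o)
    return gain


belief = [0.5, 0.5]              # hypothetical p(s | a), the same for both actions
look = [[0.9, 0.1], [0.1, 0.9]]  # informative action: o matches s with probability 0.9
wait = [[0.5, 0.5], [0.5, 0.5]]  # uninformative action: o is independent of s
best_action = max({"look": look, "wait": wait}.items(),
                  key=lambda kv: information_gain(belief, kv[1]))[0]
```

For a plan, the $I(\pi)$ of slide 28 is the expectation of a sum of such surprises over time steps.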
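  • 28. Extension to plans: for a plan $\pi = [a_1, \dots, a_T]$ with observations $o_{1:T}$, accumulate the surprise over time, $U(o_{1:T}) = \sum_{t=1}^{T} u(o_t)$, and take its expectation:
    $I(\pi) = \mathbb{E}_{p(o_{1:T} \mid \pi)}[U(o_{1:T})] = \sum_{o_{1:T}} p(o_{1:T} \mid \pi)\, U(o_{1:T})$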
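  • 29. Variational free energy: for an approximate posterior $q(s)$, the ELBO bounds the log evidence,
    $\log p(o) \ge \mathbb{E}_{q(s)}\left[\log \dfrac{p(o, s)}{q(s)}\right]$.
    Its negative is the variational free energy,
    $F(o, q) := -\mathbb{E}_{q(s)}\left[\log \dfrac{p(o, s)}{q(s)}\right] \ge -\log p(o)$.
    According to the free energy principle, agents minimize $F$, which is an upper bound on the surprisal $-\log p(o)$.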
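  • 30. The free energy decomposes as
    $F(o, q) = -\log p(o) + D_{\mathrm{KL}}[q(s) \,\|\, p(s \mid o)]$.
    1. The first term, the surprisal $-\log p(o)$, does not depend on $q$. It can be reduced only by changing the observations $o$ themselves (action).
    2. Minimizing $F$ with respect to $q$ drives the second term toward zero, so $q(s)$ approaches the posterior $p(s \mid o)$ (perception).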
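The decomposition can be checked numerically for a two-state model. In this sketch, the prior $p(s)$, the likelihood of the received observation $p(o \mid s)$ and the arbitrary $q(s)$ are hypothetical values. The code evaluates $F$ from its definition on slide 29 and compares it with the surprisal plus the KL term.

```python
import math


def free_energy(q, prior, lik_o):
    """F(o, q) = -E_q[log p(o, s) - log q(s)].

    prior[s] = p(s) and lik_o[s] = p(o | s) for the observed o; assumes
    p(o, s) > 0 wherever q(s) > 0.
    """
    return -sum(qs * (math.log(ps * lo) - math.log(qs))
                for qs, ps, lo in zip(q, prior, lik_o) if qs > 0)


def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


prior = [0.7, 0.3]   # hypothetical p(s)
lik_o = [0.1, 0.8]   # hypothetical p(o | s) for the observation actually received
evidence = sum(ps * lo for ps, lo in zip(prior, lik_o))           # p(o)
exact_post = [ps * lo / evidence for ps, lo in zip(prior, lik_o)]  # p(s | o)

q = [0.5, 0.5]       # an arbitrary approximate posterior q(s)
F = free_energy(q, prior, lik_o)
F_at_posterior = free_energy(exact_post, prior, lik_o)
```

When $q$ is the exact posterior, $F$ reaches its minimum, which is the surprisal $-\log p(o)$.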
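  • 31. Free energy in a POMDP: for a plan $\pi = [a_1, \dots, a_T]$, the generative model and a factorized (mean-field) approximate posterior are
    $p(o_{1:T}, s_{1:T} \mid \pi) = \prod_{t=1}^{T} p(o_t \mid s_t)\, p(s_t \mid s_{t-1}, \pi)$,  $q(s_{1:T} \mid \pi) = \prod_{t=1}^{T} q(s_t \mid \pi)$.
    The free energy of the plan is
    $F(o_{1:T}, \pi) = -\mathbb{E}_{q(s_{1:T} \mid \pi)}\left[\log \dfrac{p(o_{1:T}, s_{1:T} \mid \pi)}{q(s_{1:T} \mid \pi)}\right]$.
    (Graphical model: the POMDP of slide 4, with actions given by $\pi$.)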
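  • 32. Expected free energy: future observations are not yet available, so the free energy is averaged over the observations that the model predicts:
    $G(\pi) := \mathbb{E}_{p(o_{1:T} \mid s_{1:T}, \pi)}[F(o_{1:T}, \pi)] = -\mathbb{E}_{p(o_{1:T} \mid s_{1:T}, \pi)} \mathbb{E}_{q(s_{1:T} \mid \pi)}\left[\log \dfrac{p(o_{1:T}, s_{1:T} \mid \pi)}{q(s_{1:T} \mid \pi)}\right] = -\mathbb{E}_{q(o_{1:T}, s_{1:T} \mid \pi)}\left[\log \dfrac{p(o_{1:T}, s_{1:T} \mid \pi)}{q(s_{1:T} \mid \pi)}\right]$,
    where $q(o_{1:T}, s_{1:T} \mid \pi) := p(o_{1:T} \mid s_{1:T}, \pi)\, q(s_{1:T} \mid \pi)$.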
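  • 33. Active inference (AIF) selects plans that minimize the expected free energy. At time step $t$, with the approximation $p(s_t \mid o_t, \pi) \approx q(s_t \mid o_t, \pi)$:
    $G_t(\pi) = -\mathbb{E}_{q(o_t, s_t \mid \pi)}\left[\log \dfrac{p(o_t, s_t \mid \pi)}{q(s_t \mid \pi)}\right] \approx -\mathbb{E}_{q(o_t, s_t \mid \pi)}\left[\log \dfrac{p(o_t \mid \pi)\, q(s_t \mid o_t, \pi)}{q(s_t \mid \pi)}\right] = -\mathbb{E}_{q(o_t, s_t \mid \pi)}[\log p(o_t \mid \pi)] - \mathbb{E}_{q(o_t \mid \pi)}\left[D_{\mathrm{KL}}[q(s_t \mid o_t, \pi) \,\|\, q(s_t \mid \pi)]\right]$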
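  • 34. Active inference: if the approximate posterior is exact ($q = p$), the expected free energy becomes
    $G_t(\pi) = -\mathbb{E}_{p(o_t, s_t \mid \pi)}[\log p(o_t \mid \pi)] - \mathbb{E}_{p(o_t \mid \pi)}\left[D_{\mathrm{KL}}[p(s_t \mid o_t, \pi) \,\|\, p(s_t \mid \pi)]\right] = \mathcal{H}(p(o_t \mid \pi)) - I_t(\pi) = \mathbb{E}_{p(s_t \mid \pi)}[\mathcal{H}(p(o_t \mid s_t))]$.
    This is the entropy of the predicted observations minus $I_t(\pi)$, the expected Bayesian surprise of slides 27 and 28 at time $t$. Minimizing $G_t$ therefore rewards information gain, traded off against the entropy of the predicted observations.
    ※ $p(s_t \mid s_{t-1}, \pi)$ is written as $p(s_t \mid \pi)$.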
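The three forms of $G_t$ above can be checked numerically for discrete states and observations, under the slide's assumption $q = p$. In this sketch, the predicted state distribution $p(s_t \mid \pi)$ and the observation model $p(o_t \mid s_t)$ are hypothetical. No preference distribution is included, so this is only the $q = p$ identity and not a full active-inference planner.

```python
import math


def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)


def expected_free_energy(p_s, lik):
    """G_t = -E_{p(o,s)}[log p(o, s) / p(s)] with q = p; p_s[s] = p(s_t | pi), lik[s][o] = p(o_t | s_t)."""
    g = 0.0
    for ps, row in zip(p_s, lik):
        for lo in row:
            joint = ps * lo
            if joint > 0:
                g -= joint * math.log(joint / ps)
    return g


def entropy_minus_information_gain(p_s, lik):
    """H(p(o_t | pi)) - I_t(pi), with I_t = E_{p(o)}[KL[p(s | o) || p(s)]]."""
    n_obs = len(lik[0])
    p_o = [sum(ps * row[o] for ps, row in zip(p_s, lik)) for o in range(n_obs)]
    info = 0.0
    for o in range(n_obs):
        if p_o[o] > 0:
            post = [ps * row[o] / p_o[o] for ps, row in zip(p_s, lik)]
            info += p_o[o] * sum(q * math.log(q / ps) for q, ps in zip(post, p_s) if q > 0)
    return entropy(p_o) - info


p_s = [0.6, 0.4]                # hypothetical predicted state distribution p(s_t | pi)
lik = [[0.8, 0.2], [0.3, 0.7]]  # hypothetical observation model p(o_t | s_t)
G = expected_free_energy(p_s, lik)
G_decomposed = entropy_minus_information_gain(p_s, lik)
ambiguity = sum(ps * entropy(row) for ps, row in zip(p_s, lik))  # E_{p(s)}[H(p(o | s))]
```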
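  • 35. Active inference: the negative expected free energy has two terms,
    $-G_t(\pi) = \mathbb{E}_{q(o_t, s_t \mid \pi)}[\log p(o_t \mid \pi)] + \mathbb{E}_{q(o_t \mid \pi)}\left[D_{\mathrm{KL}}[q(s_t \mid o_t, \pi) \,\|\, q(s_t \mid \pi)]\right]$
    1. The first term is the extrinsic value: how probable the predicted observations are under $p(o_t \mid \pi)$, which active inference uses to encode preferences.
    2. The second term is the intrinsic value: the expected Bayesian surprise (information gain).
    => Minimizing $G_t$ balances reaching preferred observations against gathering information.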
  • 37. Control as inference and active inference
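  • 38. Active inference with a policy: following [Millidge+ 20], active inference (AIF) can be written with a state-dependent action policy. It uses a biased generative model $\tilde{p}$, whose distribution over observations encodes preferences:
    $\tilde{p}(s_t, o_t, a_t) = p(s_t \mid o_t, a_t)\, p(a_t \mid s_t)\, \tilde{p}(o_t \mid a_t) \approx q(s_t \mid o_t, a_t)\, p(a_t \mid s_t)\, \tilde{p}(o_t \mid a_t)$,  $q_\phi(s_t, a_t) = q_\phi(a_t \mid s_t)\, q(s_t)$
    The negative expected free energy at time $t$ is then
    $-G_t(\phi) = \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log \dfrac{\tilde{p}(s_t, o_t, a_t)}{q_\phi(s_t, a_t)}\right]$
    $\approx \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log \tilde{p}(o_t \mid a_t) + \log p(a_t \mid s_t) + \log q(s_t \mid o_t, a_t) - \log q_\phi(a_t \mid s_t) - \log q(s_t)\right]$
    $= \mathbb{E}_{q_\phi(o_t, s_t, a_t)}[\log \tilde{p}(o_t \mid a_t)] - \mathbb{E}_{q_\phi(o_t, s_t, a_t)}[\log q_\phi(a_t \mid s_t) - \log p(a_t \mid s_t)] + \mathbb{E}_{q_\phi(o_t, s_t, a_t)}[\log q(s_t \mid o_t, a_t) - \log q(s_t)]$
    $\approx \mathbb{E}_{q(o_t \mid a_t)}[\log \tilde{p}(o_t \mid a_t)] - \mathbb{E}_{q(s_t)}\left[D_{\mathrm{KL}}(q_\phi(a_t \mid s_t) \,\|\, p(a_t \mid s_t))\right] + \mathbb{E}_{q(o_t, a_t \mid s_t)}\left[D_{\mathrm{KL}}(q(s_t \mid o_t, a_t) \,\|\, q(s_t \mid a_t))\right]$
    $= \mathbb{E}_{q(o_t \mid a_t)}[\log \tilde{p}(o_t \mid a_t)] + \mathbb{E}_{q(s_t)}\left[\mathcal{H}(q_\phi(a_t \mid s_t))\right] + \mathbb{E}_{q(o_t, a_t \mid s_t)}\left[D_{\mathrm{KL}}(q(s_t \mid o_t, a_t) \,\|\, q(s_t \mid a_t))\right] - \log |\mathcal{A}|$
    The last step uses the uniform prior $p(a_t \mid s_t) = 1/|\mathcal{A}|$.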
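  • 39. Comparing AIF and CAI: the per-step objectives are
    CAI: $\mathbb{E}_{q(s_t, a_t)}[\log p(\mathcal{O}_t \mid s_t, a_t)] + \mathbb{E}_{q(s_t)}[\mathcal{H}(q_\phi(a_t \mid s_t))]$
    AIF: $\mathbb{E}_{q(o_t \mid a_t)}[\log \tilde{p}(o_t \mid a_t)] + \mathbb{E}_{q(s_t)}[\mathcal{H}(q_\phi(a_t \mid s_t))] + \mathbb{E}_{q(o_t, a_t \mid s_t)}\left[D_{\mathrm{KL}}(q(s_t \mid o_t, a_t) \,\|\, q(s_t \mid a_t))\right]$
    - Both objectives share the action-entropy term.
    - Difference 1: CAI scores states and actions with the optimality likelihood $p(\mathcal{O}_t \mid s_t, a_t)$, whereas AIF scores predicted observations with the biased prior $\tilde{p}(o_t \mid a_t)$.
    - Difference 2: AIF has a third term, the expected information gain, which CAI lacks.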
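  • 40. Likelihood-AIF: to bring AIF closer to CAI, Likelihood-AIF encodes preferences in a biased likelihood $\tilde{p}(o_t \mid s_t)$ instead of the biased prior $\tilde{p}(o_t)$:
    $-G_t(\phi) = \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log \dfrac{\tilde{p}(s_t, o_t, a_t)}{q_\phi(s_t, a_t)}\right] = \mathbb{E}_{q_\phi(o_t, s_t, a_t)}\left[\log \tilde{p}(o_t \mid s_t) + \log p(s_t) + \log p(a_t \mid s_t) - \log q_\phi(a_t \mid s_t) - \log q(s_t)\right]$
    $= \mathbb{E}_{q_\phi(s_t, a_t)}[\log \tilde{p}(o_t \mid s_t)] - D_{\mathrm{KL}}(q(s_t) \,\|\, p(s_t)) - \mathbb{E}_{q(s_t)}\left[D_{\mathrm{KL}}(q_\phi(a_t \mid s_t) \,\|\, p(a_t \mid s_t))\right]$
    Assuming $q(s_t) = p(s_t)$ and $p(a_t \mid s_t) = 1/|\mathcal{A}|$, this reduces, up to the constant $-\log |\mathcal{A}|$, to
    $-G_t(\phi) = \mathbb{E}_{q_\phi(s_t, a_t)}[\log \tilde{p}(o_t \mid s_t)] + \mathbb{E}_{q(s_t)}[\mathcal{H}(q_\phi(a_t \mid s_t))]$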
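  • 41. Likelihood-AIF and CAI: the per-step objectives are
    CAI: $\mathbb{E}_{q_\phi(s_t, a_t)}[\log p(\mathcal{O}_t \mid s_t, a_t)] + \mathbb{E}_{q(s_t)}[\mathcal{H}(q_\phi(a_t \mid s_t))]$
    Likelihood-AIF: $\mathbb{E}_{q_\phi(s_t, a_t)}[\log \tilde{p}(o_t \mid s_t)] + \mathbb{E}_{q(s_t)}[\mathcal{H}(q_\phi(a_t \mid s_t))]$
    AIF is formulated for POMDPs and CAI for MDPs. In the MDP setting, and with $\log \tilde{p}(o_t \mid s_t) = \log p(\mathcal{O}_t \mid s_t, a_t)$, the two objectives coincide.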
  • 42. CAI vs. AIF
  • 43. Summary
    1. Control as inference: amortized inference; Variational RL
    2. Active inference