Q-Prop
$$
S_{t+1} \sim P(s' \mid S_t, A_t), \qquad r_{t+1} = r(S_t, A_t, S_{t+1}), \qquad A_t \sim \pi(a \mid S_t)
$$
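Here is a minimal sketch of sampling one trajectory from these three relations. It assumes a hypothetical 3-state, 2-action tabular MDP. The transition table `P`, the policy table `policy`, and the `reward` function are all made up for illustration.

```python
import numpy as np

# Hypothetical toy MDP: 3 states, 2 actions (all numbers are illustrative assumptions).
# P[s, a] is the distribution over next states, i.e. P(s' | s, a).
P = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]],
    [[0.0, 0.9, 0.1], [0.5, 0.0, 0.5]],
    [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
])
# policy[s] is the action distribution pi(a | s).
policy = np.array([[0.5, 0.5], [0.2, 0.8], [0.7, 0.3]])

def reward(s, a, s_next):
    """Illustrative reward r(S_t, A_t, S_{t+1}): +1 for landing in state 2."""
    return 1.0 if s_next == 2 else 0.0

def rollout(s0, horizon, rng):
    """Sample S_t, A_t and r_{t+1} for t = 0 .. horizon-1."""
    states, actions, rewards = [], [], []
    s = s0
    for _ in range(horizon):
        a = rng.choice(2, p=policy[s])        # A_t ~ pi(a | S_t)
        s_next = rng.choice(3, p=P[s, a])     # S_{t+1} ~ P(s' | S_t, A_t)
        states.append(s)
        actions.append(a)
        rewards.append(reward(s, a, s_next))  # r_{t+1} = r(S_t, A_t, S_{t+1})
        s = s_next
    return states, actions, rewards

rng = np.random.default_rng(0)
states, actions, rewards = rollout(0, horizon=10, rng=rng)
```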
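$$
J = \mathbb{E}_{\pi}\left[\sum_{\tau=0}^{\infty} \gamma^{\tau} r_{\tau}\right], \qquad \pi^* = \arg\max_{\pi} J
$$

For a policy $\pi_\theta$ with parameters $\theta$, $J$ is maximized by following its gradient $\nabla_\theta J$.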
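As an illustration, $J$ can be estimated by averaging truncated discounted returns over sampled trajectories. This is a minimal sketch that assumes the reward sequences have already been collected. The helper names are hypothetical.

```python
import numpy as np

def discounted_return(rewards, gamma):
    """sum_{tau>=0} gamma^tau * r_tau for one finite (truncated) trajectory."""
    rewards = np.asarray(rewards, dtype=float)
    return float(np.sum(gamma ** np.arange(len(rewards)) * rewards))

def estimate_J(reward_sequences, gamma):
    """Monte Carlo estimate of J: the mean discounted return over sampled trajectories."""
    return float(np.mean([discounted_return(r, gamma) for r in reward_sequences]))
```

For example, `discounted_return([1, 1, 1], 0.5)` is $1 + 0.5 + 0.25 = 1.75$.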
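Likelihood-ratio (stochastic) policy gradient:

$$
\nabla_\theta J = \mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\, Q_t\right]
$$

Deterministic policy gradient:

$$
\nabla_\theta J = \mathbb{E}_{s \sim \rho}\left[\nabla_a Q^{\mu}(s, a)\big|_{a=\mu_\theta(s)}\, \nabla_\theta \mu_\theta(s)\right]
$$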
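$$
\begin{aligned}
\nabla_\theta J &= \nabla_\theta \mathbb{E}_{\pi_\theta}\left[\sum_{\tau=0}^{\infty} \gamma^{\tau} r_{\tau}\right] \\
&= \nabla_\theta \mathbb{E}_{s_0 \sim \rho,\, s' \sim p}\left[\prod_{t \ge 0} \pi_\theta(a_t \mid s_t) \sum_{\tau=0}^{\infty} \gamma^{\tau} r_{\tau}\right] \\
&= \mathbb{E}_{s_0 \sim \rho,\, s' \sim p}\left[\nabla_\theta \prod_{t \ge 0} \pi_\theta(a_t \mid s_t) \sum_{\tau=0}^{\infty} \gamma^{\tau} r_{\tau}\right] \\
&= \mathbb{E}_{s \sim \rho}\left[\prod_{t \ge 0} \pi_\theta(a_t \mid s_t)\, \frac{\nabla_\theta \prod_{t \ge 0} \pi_\theta(a_t \mid s_t)}{\prod_{t \ge 0} \pi_\theta(a_t \mid s_t)} \sum_{\tau=0}^{\infty} \gamma^{\tau} r_{\tau}\right] \\
&= \mathbb{E}_{s \sim \rho}\left[\prod_{t \ge 0} \pi_\theta(a_t \mid s_t) \sum_{t \ge 0} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \sum_{\tau=0}^{\infty} \gamma^{\tau} r_{\tau}\right] \\
&= \mathbb{E}_{\pi_\theta}\left[\sum_{t \ge 0} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \sum_{\tau=t}^{\infty} \gamma^{\tau} r_{\tau}\right]
\end{aligned}
$$

Multiplying and dividing by the trajectory probability turns the gradient of the product into a sum of per-step log-probability gradients. The last step drops the rewards with $\tau < t$. Action $a_t$ cannot affect earlier rewards, so those terms have zero expectation.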
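The final line can be read as a single-trajectory estimator. The sketch below assumes a hypothetical scalar linear-Gaussian policy $a \sim \mathcal{N}(\theta s, \sigma^2)$, for which $\nabla_\theta \log \pi_\theta(a \mid s) = (a - \theta s)\,s/\sigma^2$. As in the derivation, the discount stays $\gamma^\tau$ rather than being re-based at $t$.

```python
import numpy as np

def score_linear_gaussian(theta, s, a, sigma):
    """grad_theta log N(a; theta * s, sigma^2) for scalar theta, s, a (assumed policy form)."""
    return (a - theta * s) * s / sigma ** 2

def reinforce_gradient(theta, states, actions, rewards, gamma, sigma):
    """Single-trajectory estimate of
    sum_t grad_theta log pi(a_t | s_t) * sum_{tau >= t} gamma^tau r_tau."""
    rewards = np.asarray(rewards, dtype=float)
    discounted = gamma ** np.arange(len(rewards)) * rewards
    # Reverse cumulative sum: entry t holds sum_{tau >= t} gamma^tau r_tau.
    reward_to_go = np.cumsum(discounted[::-1])[::-1]
    scores = np.array([score_linear_gaussian(theta, s, a, sigma)
                       for s, a in zip(states, actions)])
    return float(np.sum(scores * reward_to_go))
```

Averaging this quantity over many sampled trajectories gives the Monte Carlo estimate of $\nabla_\theta J$.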
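Score-function (likelihood-ratio) identity behind this derivation:

$$
\nabla_\theta \mathbb{E}_{x \sim p_\theta}\left[f(x)\right] = \mathbb{E}_{x \sim p_\theta}\left[\nabla_\theta \log p_\theta(x)\, f(x)\right]
$$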
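Here is a quick Monte Carlo check of this identity. It assumes $p_\theta = \mathcal{N}(\theta, 1)$ and $f(x) = x^2$, both chosen only for illustration, so the exact gradient is $2\theta$.

```python
import numpy as np

def score_function_gradient(theta, f, n_samples, rng):
    """Monte Carlo estimate of grad_theta E_{x ~ N(theta, 1)}[f(x)]
    as E[grad_theta log p_theta(x) * f(x)], where grad_theta log p_theta(x) = x - theta."""
    x = rng.normal(loc=theta, scale=1.0, size=n_samples)
    return float(np.mean((x - theta) * f(x)))

rng = np.random.default_rng(0)
theta = 1.5
estimate = score_function_gradient(theta, lambda x: x ** 2, 200_000, rng)
exact = 2 * theta  # d/dtheta of E[x^2] = theta^2 + 1
```

The estimator is unbiased but noisy: even this one-dimensional case needs many samples. That noise is what the control variates below are meant to reduce.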
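$$
\begin{aligned}
J &= \mathbb{E}_{s \sim \rho}\left[Q^{\mu_\theta}(s, \mu_\theta(s))\right] \\
\nabla_\theta J &= \mathbb{E}_{s \sim \rho}\left[\nabla_\theta Q^{\mu}(s, \mu_\theta(s))\right] \\
&= \mathbb{E}_{s \sim \rho}\left[\nabla_a Q^{\mu}(s, a)\big|_{a=\mu_\theta(s)}\, \nabla_\theta \mu_\theta(s)\right]
\end{aligned}
$$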
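Here is a sketch of the deterministic chain rule. It assumes a known critic $Q(s, a) = -(a - 2s)^2$ and a linear policy $\mu_\theta(s) = \theta s$. These are illustrative choices, not taken from the slides. The chain-rule gradient should match a finite-difference derivative of $J$ computed on the same sampled states.

```python
import numpy as np

# Illustrative assumptions: known critic Q(s, a) = -(a - 2 s)^2, linear policy mu_theta(s) = theta * s.
def Q(s, a):
    return -(a - 2.0 * s) ** 2

def dQ_da(s, a):
    return -2.0 * (a - 2.0 * s)

def mu(theta, s):
    return theta * s

def dpg_gradient(theta, states):
    """E_s[ grad_a Q(s, a)|_{a = mu_theta(s)} * grad_theta mu_theta(s) ], with grad_theta mu = s."""
    return float(np.mean(dQ_da(states, mu(theta, states)) * states))

def J(theta, states):
    """E_s[ Q(s, mu_theta(s)) ] over the sampled states."""
    return float(np.mean(Q(states, mu(theta, states))))

rng = np.random.default_rng(0)
states = rng.normal(size=1000)  # samples from an assumed state distribution rho
theta, eps = 0.5, 1e-5
finite_diff = (J(theta + eps, states) - J(theta - eps, states)) / (2 * eps)
```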
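First-order Taylor expansion of $f$ around $\bar a_t = \mu_\theta(s_t)$, the mean action of $\pi_\theta(\cdot \mid s_t)$, used as a control variate:

$$
\bar f(s_t, a_t) = f(s_t, \bar a_t) + \nabla_a f(s_t, a)\big|_{a=\bar a_t}\,(a_t - \bar a_t)
$$

$$
\begin{aligned}
\nabla_\theta J &= \mathbb{E}_{\rho,\pi}\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\left(Q(s_t, a_t) - \bar f(s_t, a_t)\right)\right] + \mathbb{E}_{\rho,\pi}\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \bar f(s_t, a_t)\right] \\
&= \mathbb{E}_{\rho,\pi}\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\left(Q(s_t, a_t) - \bar f(s_t, a_t)\right)\right] + \mathbb{E}_{\rho,\pi}\left[\nabla_a f(s_t, a)\big|_{a=\bar a_t}\, \nabla_\theta \mu_\theta(s_t)\right]
\end{aligned}
$$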
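Here is a numerical sketch of why the split works. It uses a single state and a Gaussian policy $\mathcal{N}(\mu, \sigma^2)$ with $\theta = \mu$, so $\nabla_\theta \mu_\theta = 1$. It also assumes $Q(a) = f(a) = \sin a$. Both estimators target $\nabla_\mu \mathbb{E}[\sin a] = \cos(\mu)\,e^{-\sigma^2/2}$, but the residual likelihood-ratio terms have far smaller variance.

```python
import numpy as np

def q_fn(a):
    # Illustrative "true" action value for a single state (an assumption).
    return np.sin(a)

def f(a):
    # Control-variate function; deliberately equal to q_fn here.
    return np.sin(a)

def df_da(a):
    return np.cos(a)

rng = np.random.default_rng(0)
mu, sigma, n = 0.3, 0.5, 200_000
a = rng.normal(mu, sigma, size=n)
score = (a - mu) / sigma ** 2                    # grad_mu log N(a; mu, sigma^2)
a_bar = mu                                       # expansion point: the policy mean
f_bar = f(a_bar) + df_da(a_bar) * (a - a_bar)    # first-order Taylor expansion of f

raw_terms = score * q_fn(a)                      # plain likelihood-ratio terms
cv_terms = score * (q_fn(a) - f_bar)             # residual likelihood-ratio terms
raw_grad = raw_terms.mean()
cv_grad = cv_terms.mean() + df_da(a_bar)         # + analytic term (grad_theta mu = 1 here)
exact = np.cos(mu) * np.exp(-sigma ** 2 / 2)     # d/dmu E[sin(a)]
```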
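Taking $f = Q_w$, an off-policy critic:

$$
\nabla_\theta J = \mathbb{E}_{\rho,\pi}\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\left(Q(s_t, a_t) - \bar Q_w(s_t, a_t)\right)\right] + \mathbb{E}_{\rho,\pi}\left[\nabla_a Q_w(s_t, a)\big|_{a=\bar a_t}\, \nabla_\theta \mu_\theta(s_t)\right]
$$

and, with advantages in place of action values:

$$
\nabla_\theta J = \mathbb{E}_{\rho,\pi}\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\left(A(s_t, a_t) - \bar A_w(s_t, a_t)\right)\right] + \mathbb{E}_{\rho,\pi}\left[\nabla_a Q_w(s_t, a)\big|_{a=\bar a_t}\, \nabla_\theta \mu_\theta(s_t)\right]
$$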
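The centered Taylor term $\bar A_w$ reduces to a linear function of the action:

$$
\begin{aligned}
\bar A_w(s_t, a_t) &= \bar Q_w(s_t, a_t) - \mathbb{E}_{\pi}\left[\bar Q_w(s_t, a_t)\right] \\
&= Q_w(s_t, \mu_\theta(s_t)) + \nabla_a Q_w(s_t, a)\big|_{a=\mu_\theta(s_t)}\left(a_t - \mu_\theta(s_t)\right) \\
&\quad - \mathbb{E}_{\pi}\left[Q_w(s_t, \mu_\theta(s_t)) + \nabla_a Q_w(s_t, a)\big|_{a=\mu_\theta(s_t)}\left(a_t - \mu_\theta(s_t)\right)\right] \\
&= \nabla_a Q_w(s_t, a)\big|_{a=\mu_\theta(s_t)}\left(a_t - \mu_\theta(s_t)\right)
\end{aligned}
$$

The last step uses $\mathbb{E}_{\pi}[a_t] = \mu_\theta(s_t)$. The advantage $A(s_t, a_t)$ itself can be estimated with the TD error $r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$.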
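Both quantities are cheap to compute from a batch. This is a minimal sketch with hypothetical helper names. It assumes `values` holds $V(s_0), \dots, V(s_T)$, one entry more than `rewards`, and it uses an assumed critic slope of 2.0 in the usage lines.

```python
import numpy as np

def td_errors(rewards, values, gamma):
    """delta_t = r_{t+1} + gamma * V(s_{t+1}) - V(s_t); rewards[t] holds r_{t+1}."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    return rewards + gamma * values[1:] - values[:-1]

def centered_linear_advantage(dq_da_at_mean, actions, mean_action):
    """A_bar_w(s_t, a_t) = grad_a Q_w(s_t, a)|_{a = mu_theta(s_t)} * (a_t - mu_theta(s_t))."""
    return dq_da_at_mean * (np.asarray(actions, dtype=float) - mean_action)

rng = np.random.default_rng(1)
actions = rng.normal(0.3, 0.5, size=100_000)               # a_t ~ pi, mean mu_theta(s_t) = 0.3
a_bar_w = centered_linear_advantage(2.0, actions, 0.3)     # assumed grad_a Q_w = 2.0
```

Because $\bar A_w$ is centered, its batch mean should sit near zero.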
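Control variates: for an estimator $m$ and a control variate $t$ with known mean $\tau = \mathbb{E}[t]$ (here $\tau$ is not a time index),

$$
m^* = m - \eta\,(t - \tau), \qquad \mathbb{E}[m^*] = \mathbb{E}[m]
$$

$$
\mathrm{Var}[m^*] = \mathrm{Var}[m] - 2\eta\,\mathrm{Cov}[m, t] + \eta^2\,\mathrm{Var}[t], \qquad \eta^* = \frac{\mathrm{Cov}[m, t]}{\mathrm{Var}[t]}
$$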
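Here is a small standalone example of the optimal control variate. It assumes $m = e^u$ and $t = u$ with $u \sim \mathcal{U}(0, 1)$, so that $\tau = 1/2$ and $\mathbb{E}[m] = e - 1$. This is a textbook setup, not one taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=100_000)
m = np.exp(u)     # quantity whose mean we want: E[m] = e - 1
t = u             # control variate, correlated with m
tau = 0.5         # known mean E[t] for u ~ Uniform(0, 1)

eta_star = np.cov(m, t)[0, 1] / np.var(t, ddof=1)   # Cov[m, t] / Var[t], estimated from samples
m_star = m - eta_star * (t - tau)                   # same mean, lower variance
```

Estimating $\eta^*$ from the same samples introduces a small bias. Here that bias is negligible next to the drop in variance, which is roughly 60-fold.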
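Scaling the control variate by a state-dependent coefficient $\eta(s_t)$ keeps the estimator unbiased, and the same variance argument gives the optimal coefficient:

$$
\nabla_\theta J = \mathbb{E}_{\rho,\pi}\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\left(A(s_t, a_t) - \eta(s_t)\,\bar A_w(s_t, a_t)\right)\right] + \mathbb{E}_{\rho,\pi}\left[\eta(s_t)\,\nabla_a Q_w(s_t, a)\big|_{a=\bar a_t}\, \nabla_\theta \mu_\theta(s_t)\right]
$$

$$
\mathrm{Var}\left[A - \eta \bar A_w\right] = \mathrm{Var}[A] - 2\eta\,\mathrm{Cov}(A, \bar A_w) + \eta^2\,\mathrm{Var}(\bar A_w), \qquad \eta^* = \frac{\mathrm{Cov}(A, \bar A_w)}{\mathrm{Var}(\bar A_w)}
$$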
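This sketch puts the pieces together under several simplifying assumptions:

- a scalar linear-Gaussian policy;
- a synthetic one-step problem with $Q(s, a) = -(a - 2s)^2$;
- a critic $Q_w$ equal to the true $Q$;
- a single batch-level $\eta$ in place of a per-state $\eta(s_t)$.

All function and variable names are hypothetical.

```python
import numpy as np

def qprop_gradient(theta, states, actions, advantages, dq_da_at_mean, sigma):
    """Adaptive control-variate gradient for a scalar policy a ~ N(theta * s, sigma^2),
    so mu_theta(s) = theta * s and grad_theta mu = s.
    Uses one batch-level eta (a simplification of the per-state eta(s_t))."""
    mean_actions = theta * states
    score = (actions - mean_actions) * states / sigma ** 2   # grad_theta log pi(a | s)
    a_bar_w = dq_da_at_mean * (actions - mean_actions)       # centered linear advantage
    eta = np.cov(advantages, a_bar_w)[0, 1] / np.var(a_bar_w, ddof=1)
    residual = np.mean(score * (advantages - eta * a_bar_w))
    analytic = np.mean(eta * dq_da_at_mean * states)         # eta * grad_a Q_w * grad_theta mu
    return float(residual + analytic), float(eta)

# Synthetic one-step problem (illustrative assumption); the critic is taken to be exactly Q.
rng = np.random.default_rng(0)
theta, sigma, n = 0.5, 0.3, 100_000
states = rng.normal(1.0, 0.2, size=n)
actions = rng.normal(theta * states, sigma)
q = -(actions - 2 * states) ** 2
v = -(((theta - 2) * states) ** 2 + sigma ** 2)   # V(s) = E_a[Q(s, a)]
advantages = q - v
dq_da_at_mean = -2 * (theta * states - 2 * states)

grad, eta = qprop_gradient(theta, states, actions, advantages, dq_da_at_mean, sigma)
exact = -2 * (theta - 2) * np.mean(states ** 2)   # d/dtheta of J over the sampled states
```

With a perfect critic, the estimated $\eta$ comes out near 1. With a poor critic, the covariance shrinks and so does $\eta$, and the estimator falls back toward the plain likelihood-ratio gradient.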