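A Markov decision process (MDP) is the tuple $M = \{S, A, p_T, p_0, g\}$. State transitions are Markov:

$$\Pr\{S_{t+1} = s' \mid A_t = a, S_t = s, \dots\} = \Pr\{S_{t+1} = s' \mid A_t = a, S_t = s\} =: p_T(s' \mid s, a), \qquad \Pr(S_0 = s) =: p_0(s).$$

A Markov policy $\pi \in \Pi^M$ chooses the action from the current state only:

$$\Pr(A_t = a \mid S_t = s, \dots) = \Pr(A_t = a \mid S_t = s) =: \pi(a \mid s).$$

The value function $V^\pi$ is the expected discounted return:

$$V^\pi(s) := \mathbb{E}^\pi[C_0 \mid S_0 = s], \qquad C_t := \sum_{i=0}^{\infty} \gamma^i g(A_{t+i}, S_{t+i}), \qquad \gamma \in [0, 1).$$

The objective $f(\pi)$ averages the value over the initial-state distribution:

$$f(\pi) := \sum_{s \in S} p_0(s) V^\pi(s).$$

The goal is to find a $\pi \in \Pi^M$ that maximizes $f(\pi)$ for the given $M$.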
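The definitions of $C_0$ and $f(\pi)$ can be made concrete with a small sketch. It assumes states are indexed $0, \dots, n-1$, that rewards $g(A_i, S_i)$ along one trajectory are given as a finite list (so the infinite sum is truncated), and that $V^\pi$ is already known as a list; the function names are hypothetical.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Truncated C_0 = sum_i gamma^i * g(A_i, S_i) over a finite reward list."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    total = 0.0
    for i, r in enumerate(rewards):
        total += gamma ** i * r
    return total


def objective(p0: Sequence[float], v_pi: Sequence[float]) -> float:
    """f(pi) = sum_s p0(s) * V^pi(s) for states indexed 0..n-1."""
    return sum(p * v for p, v in zip(p0, v_pi))
```

For example, `discounted_return([1.0, 1.0, 1.0], 0.5)` returns `1.75` ($1 + 0.5 + 0.25$), and `objective([0.5, 0.5], [2.0, 4.0])` returns `3.0`.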
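Bellman equation for $V^\pi$. Splitting off the first reward and conditioning on the first action and the next state gives

$$
\begin{aligned}
V^\pi(s) &= \mathbb{E}^\pi[C_0 \mid S_0 = s] \\
&= \mathbb{E}^\pi[g(A_0, S_0) + \gamma C_1 \mid S_0 = s] \\
&= \sum_{a \in A} \pi(a \mid s)\, g(a, s) + \gamma \sum_{a \in A} \sum_{s' \in S} \pi(a \mid s)\, p_T(s' \mid s, a)\, \mathbb{E}^\pi[C_1 \mid S_1 = s'] \\
&= \sum_{a \in A} \pi(a \mid s) \Big( g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a)\, V^\pi(s') \Big), \qquad \forall s \in S.
\end{aligned}
$$

The last step uses $\mathbb{E}^\pi[C_1 \mid S_1 = s'] = V^\pi(s')$, which holds because both the transitions and the policy are Markov and stationary.

The optimal value function $V^*$ is

$$V^*(s) := \max_{(\pi_0, \pi_1, \dots)} V^{(\pi_0, \pi_1, \dots)}(s),$$

and it satisfies the Bellman optimality equation:

$$
\begin{aligned}
V^*(s) &= \max_{(\pi_0, \pi_1, \dots)} \mathbb{E}^{(\pi_0, \pi_1, \dots)}[g(A_0, S_0) + \gamma C_1 \mid S_0 = s] \\
&= \max_{\pi_0} \mathbb{E}^{\pi_0}\Big[ g(A_0, S_0) + \gamma \max_{(\pi_1, \pi_2, \dots)} \mathbb{E}^{(\pi_1, \pi_2, \dots)}[C_1 \mid S_1 \sim p_T(\cdot \mid S_0, A_0)] \,\Big|\, S_0 = s \Big] \\
&= \max_{\pi_0} \sum_{a \in A} \pi_0(a \mid s) \Big( g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a)\, V^*(s') \Big) \\
&= \max_{a \in A} \Big( g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a)\, V^*(s') \Big), \qquad \forall s \in S.
\end{aligned}
$$

The last equality holds because a weighted average over $a$ with weights $\pi_0(\cdot \mid s)$ is maximized by putting all the weight on a maximizing action.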
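For a finite MDP the Bellman equation for $V^\pi$ is linear, $V^\pi = r_\pi + \gamma P_\pi V^\pi$ with $r_\pi(s) = \sum_a \pi(a \mid s) g(a, s)$ and $P_\pi(s, s') = \sum_a \pi(a \mid s) p_T(s' \mid s, a)$, so it can be solved exactly. The sketch below does this with Gauss-Jordan elimination in plain Python. The array layout (`p_T[s][a][s2]`, `g[s][a]` for $g(a, s)$, `pi[s][a]`) and the name `evaluate_policy` are assumptions for illustration.

```python
from typing import List

Vector = List[float]


def evaluate_policy(p_T: List[List[Vector]], g: List[Vector],
                    pi: List[Vector], gamma: float) -> Vector:
    """Solve V = r_pi + gamma * P_pi V exactly.

    Assumed layout: p_T[s][a][s2] = p_T(s2 | s, a), g[s][a] = g(a, s),
    pi[s][a] = pi(a | s); states and actions are indexed from 0.
    """
    n = len(p_T)
    actions = range(len(g[0]))
    # r_pi(s) = sum_a pi(a|s) g(a, s)
    r = [sum(pi[s][a] * g[s][a] for a in actions) for s in range(n)]
    # P_pi(s, s2) = sum_a pi(a|s) p_T(s2 | s, a)
    P = [[sum(pi[s][a] * p_T[s][a][s2] for a in actions) for s2 in range(n)]
         for s in range(n)]
    # Augmented matrix [I - gamma * P_pi | r_pi]; it is strictly diagonally
    # dominant (hence nonsingular) because gamma < 1 and P_pi is stochastic.
    M = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)] + [r[i]]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(n):
            if i != col:
                factor = M[i][col] / M[col][col]
                for j in range(col, n + 1):
                    M[i][j] -= factor * M[col][j]
    return [M[i][n] / M[i][i] for i in range(n)]
```

With a single action, state 0 paying reward 1 and looping to itself, and state 1 paying 0 and moving to state 0, `gamma = 0.5` gives $V^\pi = (2, 1)$. A direct solve is fine for small $|S|$; the iterative alternative follows from the operator view below.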
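The Bellman operators map a function $V : S \to \mathbb{R}$ to a new function on $S$:

$$B^\pi(V) := \sum_{a \in A} \pi(a \mid \cdot) \Big( g(a, \cdot) + \gamma \sum_{s' \in S} p_T(s' \mid \cdot, a)\, V(s') \Big),$$

$$B^*(V) := \max_{a \in A} \Big\{ g(a, \cdot) + \gamma \sum_{s' \in S} p_T(s' \mid \cdot, a)\, V(s') \Big\}.$$

With these, both Bellman equations take the form $V = B(V)$ with $B \in \{B^*, B^\pi\}$.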
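Both operators share the inner term $g(a, s) + \gamma \sum_{s'} p_T(s' \mid s, a) V(s')$, so a sketch can compute it once as a table `q[s][a]` and then either average it under $\pi$ ($B^\pi$) or maximize it over $a$ ($B^*$). The list-of-lists layout and the function names are assumptions for illustration.

```python
from typing import List

Vector = List[float]


def q_values(v: Vector, p_T: List[List[Vector]], g: List[Vector],
             gamma: float) -> List[Vector]:
    """q[s][a] = g(a, s) + gamma * sum_{s2} p_T(s2 | s, a) * v(s2)."""
    n = len(v)
    return [[g[s][a] + gamma * sum(p_T[s][a][t] * v[t] for t in range(n))
             for a in range(len(g[s]))]
            for s in range(n)]


def bellman_pi(v: Vector, p_T: List[List[Vector]], g: List[Vector],
               pi: List[Vector], gamma: float) -> Vector:
    """B^pi(v)(s) = sum_a pi(a|s) * q[s][a]."""
    q = q_values(v, p_T, g, gamma)
    return [sum(pi[s][a] * q[s][a] for a in range(len(q[s]))) for s in range(len(v))]


def bellman_opt(v: Vector, p_T: List[List[Vector]], g: List[Vector],
                gamma: float) -> Vector:
    """B*(v)(s) = max_a q[s][a]."""
    return [max(row) for row in q_values(v, p_T, g, gamma)]
```

On a two-state example where action 0 always leads to state 0 and action 1 to state 1, with rewards `g = [[0.0, 1.0], [2.0, 0.0]]`, applying `bellman_opt` to the zero function gives `[1.0, 2.0]`, and `bellman_pi` with the uniform policy gives `[0.5, 1.0]`.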
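For $v, v' : S \to \mathbb{R}$, define

$$v \le v' \iff v(s) \le v'(s), \ \forall s \in S, \qquad \|v - v'\| := \max_{s \in S} |v(s) - v'(s)|.$$

Both $B = B^*$ and $B = B^\pi$ satisfy:

1. Monotonicity: $v \le v' \Rightarrow B(v) \le B(v')$.
2. Constant shift: $B(v + c) = B(v) + \gamma c, \ \forall c \in \mathbb{R}$.
3. Contraction: $\|B(v) - B(v')\| \le \gamma \|v - v'\|$.
4. The equation $v^* = B(v^*)$ has a unique solution $v^*$, and $\lim_{k \to \infty} B^k(v_0) = v^*$ for every $v_0 : S \to \mathbb{R}$.

Proof of 1 (for $B^*$). If $v \le v'$, then, since $p_T(s' \mid s, a) \ge 0$,

$$B^*(v)(s) = \max_{a \in A} \Big\{ g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a)\, v(s') \Big\} \le \max_{a \in A} \Big\{ g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a)\, v'(s') \Big\} = B^*(v')(s), \qquad \forall s \in S.$$

The same argument applies to $B^\pi$.

Proof of 2 (for $B^*$). Using $\sum_{s'} p_T(s' \mid s, a) = 1$,

$$
\begin{aligned}
B^*(v + c)(s) &= \max_{a \in A} \Big\{ g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a) \big( v(s') + c \big) \Big\} \\
&= \max_{a \in A} \Big\{ g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a)\, v(s') \Big\} + \gamma c \\
&= B^*(v)(s) + \gamma c, \qquad \forall s \in S.
\end{aligned}
$$

The same argument applies to $B^\pi$, using also $\sum_a \pi(a \mid s) = 1$.

Proof of 3. By the definition of the norm, $v' - \|v - v'\| \le v \le v' + \|v - v'\|$. Applying 1 and then 2:

$$B(v') - \gamma \|v' - v\| \le B(v) \le B(v') + \gamma \|v' - v\| \ \Rightarrow\ \|B(v') - B(v)\| \le \gamma \|v - v'\|.$$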
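The three proved properties (monotonicity, constant shift, contraction) can be spot-checked numerically for $B^*$. The sketch below draws random finite MDPs and random value functions and returns `False` if any property is violated beyond a floating-point tolerance. The MDP sizes, sampling ranges, seed, and tolerance are arbitrary assumptions; a passing check is evidence, not a proof.

```python
import random
from typing import List

Vector = List[float]


def bellman_opt(v: Vector, p_T, g, gamma: float) -> Vector:
    """B*(v)(s) = max_a { g(a, s) + gamma * sum_{s2} p_T(s2|s,a) v(s2) }."""
    n = len(v)
    return [max(g[s][a] + gamma * sum(p_T[s][a][t] * v[t] for t in range(n))
                for a in range(len(g[s])))
            for s in range(n)]


def sup_norm(u: Vector, w: Vector) -> float:
    return max(abs(x - y) for x, y in zip(u, w))


def random_mdp(n: int, m: int, rng: random.Random):
    """Random p_T[s][a][s2] (rows normalized to sum to 1) and g[s][a]."""
    p_T = []
    for _ in range(n):
        rows = []
        for _ in range(m):
            w = [rng.random() + 1e-3 for _ in range(n)]
            total = sum(w)
            rows.append([x / total for x in w])
        p_T.append(rows)
    g = [[rng.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(n)]
    return p_T, g


def check_properties(trials=200, n=4, m=3, gamma=0.9, seed=0, tol=1e-9) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        p_T, g = random_mdp(n, m, rng)
        v = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        w = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        c = rng.uniform(-5.0, 5.0)
        Bv, Bw = bellman_opt(v, p_T, g, gamma), bellman_opt(w, p_T, g, gamma)
        # Monotonicity: v_up >= v pointwise, so B*(v) <= B*(v_up).
        v_up = [x + rng.random() for x in v]
        Bv_up = bellman_opt(v_up, p_T, g, gamma)
        if any(a > b + tol for a, b in zip(Bv, Bv_up)):
            return False
        # Constant shift: B*(v + c) = B*(v) + gamma * c.
        Bvc = bellman_opt([x + c for x in v], p_T, g, gamma)
        if any(abs(a - (b + gamma * c)) > tol for a, b in zip(Bvc, Bv)):
            return False
        # Contraction: ||B*(v) - B*(w)|| <= gamma * ||v - w||.
        if sup_norm(Bv, Bw) > gamma * sup_norm(v, w) + tol:
            return False
    return True
```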
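Proof of the fixed-point property. For any $v, v'$, the triangle inequality and the contraction property give

$$
\begin{aligned}
\|v - v'\| &\le \|B(v) - B(v')\| + \|v - B(v)\| + \|v' - B(v')\| \\
&\le \gamma \|v - v'\| + \|v - B(v)\| + \|v' - B(v')\| \\
\Rightarrow\ \|v - v'\| &\le \frac{\|v - B(v)\| + \|v' - B(v')\|}{1 - \gamma}.
\end{aligned}
$$

Let $v_k := B^k(v_0)$. Applying this bound with $v = v_n$ and $v' = v_m$, and using $B(v_n) = B^n(v_1)$,

$$
\|v_n - v_m\| \le \frac{\|B^n(v_0) - B^n(v_1)\| + \|B^m(v_0) - B^m(v_1)\|}{1 - \gamma}
\le \frac{\gamma^n \|v_0 - v_1\| + \gamma^m \|v_0 - v_1\|}{1 - \gamma}
= \frac{\gamma^n + \gamma^m}{1 - \gamma} \|v_0 - v_1\|,
$$

so $\lim_{n, m \to \infty} \|v_n - v_m\| = 0$. Hence $(v_k)$ is a Cauchy sequence in $\mathbb{R}^{|S|}$, which is complete, so it converges to some $v^*$; since $B$ is a contraction it is continuous, and $v^* = \lim_k B(v_k) = B(v^*)$. Applying the bound with $v = v_n$ and $v' = v^*$ (so that $\|v^* - B(v^*)\| = 0$),

$$\|v_n - v^*\| \le \frac{\|B^n(v_0) - B^n(v_1)\|}{1 - \gamma} \le \frac{\gamma^n}{1 - \gamma} \|v_0 - v_1\|,$$

hence $\lim_{n \to \infty} \|v_n - v^*\| = 0$. If $v^{**}$ is another fixed point, the same bound gives $\|v^* - v^{**}\| \le 0$, so the fixed point is unique.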
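The a-priori bound $\|v_n - v^*\| \le \frac{\gamma^n}{1 - \gamma} \|v_0 - v_1\|$ from this proof can be checked on a concrete MDP. The sketch iterates $B^*$ and approximates $v^*$ by a late iterate $v_{n_{\text{ref}}}$; the horizon `n_ref = 2000`, the tolerance, and all function names are assumptions for illustration.

```python
from typing import List

Vector = List[float]


def bellman_opt(v: Vector, p_T, g, gamma: float) -> Vector:
    """B*(v)(s) = max_a { g(a, s) + gamma * sum_{s2} p_T(s2|s,a) v(s2) }."""
    n = len(v)
    return [max(g[s][a] + gamma * sum(p_T[s][a][t] * v[t] for t in range(n))
                for a in range(len(g[s])))
            for s in range(n)]


def sup_norm(u: Vector, w: Vector) -> float:
    return max(abs(x - y) for x, y in zip(u, w))


def iterates(v0: Vector, p_T, g, gamma: float, steps: int) -> List[Vector]:
    """Return [v_0, v_1, ..., v_steps] with v_k = B*^k(v_0)."""
    vs = [list(v0)]
    for _ in range(steps):
        vs.append(bellman_opt(vs[-1], p_T, g, gamma))
    return vs


def check_a_priori_bound(v0: Vector, p_T, g, gamma: float,
                         n_max: int = 50, n_ref: int = 2000,
                         tol: float = 1e-9) -> bool:
    """Check ||v_n - v*|| <= gamma^n / (1 - gamma) * ||v_0 - v_1|| for n <= n_max.

    v* is approximated by v_{n_ref} (an assumed 'large enough' horizon).
    """
    vs = iterates(v0, p_T, g, gamma, n_ref)
    v_star = vs[-1]
    d01 = sup_norm(vs[0], vs[1])
    return all(sup_norm(vs[n], v_star) <= gamma ** n / (1 - gamma) * d01 + tol
               for n in range(n_max + 1))
```

On the two-state example with `p_T = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]`, `g = [[0.0, 1.0], [2.0, 0.0]]` and `gamma = 0.9`, the iterates approach $v^* = (2.8/0.19,\ 2.9/0.19)$ from any starting point, as the convergence result predicts.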
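Recall $B^*(V) := \max_{a \in A} \big\{ g(a, \cdot) + \gamma \sum_{s' \in S} p_T(s' \mid \cdot, a)\, V(s') \big\}$. The deterministic policy $\pi_d^*$ acts greedily with respect to $V^*$:

$$\pi_d^*(s) := \arg\max_{a \in A} \Big\{ g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a)\, V^*(s') \Big\}.$$

Since $\lim_{k \to \infty} B^k(v_0) = v^*$ for every $v_0 : S \to \mathbb{R}$, $V^*$ can be approximated by applying $B^*$ repeatedly (value iteration):

- Input: an MDP $M = \{S, A, p_T, p_0, g\}$ and a tolerance $\varepsilon \in (0, \infty)$.
- Output: $v' : S \to \mathbb{R}$ and a policy $\pi^d_{v'} : S \to A$.

1. Initialize $v' : S \to \mathbb{R}$ (by the convergence result above, any starting function works).
2. Repeat: set $v = v'$, then $v' = \max_{a \in A} \big\{ g(a, \cdot) + \gamma \sum_{s' \in S} p_T(s' \mid \cdot, a)\, v(s') \big\}$, until $\|v - v'\| < \varepsilon$.
3. Return $v'$ and the greedy policy, which approximates $\pi_d^*$:

$$\pi^d_{v'}(s) := \arg\max_{a \in A} \Big\{ g(a, s) + \gamma \sum_{s' \in S} p_T(s' \mid s, a)\, v'(s') \Big\}.$$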
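The value-iteration steps above can be sketched in plain Python as follows. It assumes the same list layout as before (`p_T[s][a][s2]`, `g[s][a]` for $g(a, s)$), starts from the zero function unless `v_init` is given, and breaks ties in the $\arg\max$ toward the lowest action index. The `max_iter` cap is a safety guard added here, not part of the algorithm on the slide; all names are hypothetical.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def q_values(v: Vector, p_T, g, gamma: float) -> List[Vector]:
    """q[s][a] = g(a, s) + gamma * sum_{s2} p_T(s2|s,a) * v(s2)."""
    n = len(v)
    return [[g[s][a] + gamma * sum(p_T[s][a][t] * v[t] for t in range(n))
             for a in range(len(g[s]))]
            for s in range(n)]


def value_iteration(p_T, g, gamma: float, eps: float,
                    v_init: Optional[Vector] = None,
                    max_iter: int = 100_000) -> Tuple[Vector, List[int]]:
    n = len(p_T)
    # Step 1: initialize v' (zero function by default; any start converges).
    v_new = list(v_init) if v_init is not None else [0.0] * n
    # Step 2: v = v', v' = B*(v) until ||v - v'|| < eps (max_iter is a safety cap).
    for _ in range(max_iter):
        v = v_new
        v_new = [max(row) for row in q_values(v, p_T, g, gamma)]
        if max(abs(x - y) for x, y in zip(v, v_new)) < eps:
            break
    # Step 3: greedy policy pi^d_{v'}(s) = argmax_a q[s][a]; ties -> lowest index.
    q = q_values(v_new, p_T, g, gamma)
    policy = [max(range(len(row)), key=row.__getitem__) for row in q]
    return v_new, policy
```

On the two-state example (`g = [[0.0, 1.0], [2.0, 0.0]]`, action 0 leads to state 0, action 1 to state 1, `gamma = 0.9`), the greedy policy alternates between the states: `policy == [1, 0]`. When the loop stops, combining $v' = B^*(v)$ with the bound $\|v - v'\| \le \frac{\|v - B(v)\| + \|v' - B(v')\|}{1 - \gamma}$ proved above gives $\|v' - v^*\| \le \frac{\gamma}{1 - \gamma} \varepsilon$.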


 

Reinforcement Learning Study Group Materials (Session 3)
