Duality between OOP and RL
Kwanghee Choi
Local Optima 2019
Contents
I. OOP Perspective
A. Characteristics of Objects
B. Good Objects
C. Object State
D. Object Behavior
II. RL Perspective
A. Agent and Environment
B. Reward and Action
C. History and State
D. Markov Property
III. Dual Perspective
A. Feedback Loop with Messages
B. States
C. Humankind Behind the Duality
I. OOP Perspective
Reference
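- Facts and Misconceptions of Object-Orientation: Object-Orientation from the
Perspective of Roles, Responsibilities, and Collaboration (객체지향의 사실과 오해; 조영호 (Cho Young-ho), 2015)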
https://wikibook.co.kr/object-orientation/
- Summary of the book above (Kwanghee Choi, 2019)
https://juice500ml.github.io/software_design/2019/02/16/The-Essence-of-Object-Orientation.html
- Note: The following content depends heavily on both references.
A. Characteristics of Objects
● Real-world objects are passive. Software objects are active.
They can do far more than real-world objects.
They act as if they were living beings. (Anthropomorphism)
● Real-world objects are just metaphors for software objects,
minimizing the representational gap.
● Humans think and decide autonomously.
Objects encapsulate states and behaviors to act autonomously.
● Humans make promises to collaborate for a common goal.
Objects message each other to collaborate for a single functionality.
B. Good Objects
● An object should be able to cooperate via messages, like an open port.
● An object should be autonomous, with its own principles and control.
● To ensure openness and autonomy, an object has
behavior (how the object collaborates with other objects)
and state (the data inside the object that its behaviors need).
● OO is not about classes. It is about autonomous objects messaging each
other. It is about maintaining collaborations between roles with
responsibilities. Classes are just tools to implement those.
C. Object State
● State is the total information that the object has at a specific time.
● State is an abstraction of all the previous behaviors that reduces the
complexity of the real world.
● An object has, and should have, full control over its own state; hence its
autonomy. State and behavior are bound into one unit: an object.
D. Object Behavior
● Behavior is what an object does in response to incoming messages.
● Behavior changes state (side effect), and behavior depends on the state.
● Behavior is the only way for an object to participate in collaborations.
● State Encapsulation: Only behaviors are visible, states are invisible (from
the outside). The only way to manipulate its states is via behaviors.
● As the object becomes more autonomous, it gets more intelligent.
In other words, collaboration gets more flexible and concise.
● A query reads the state of the object (read, getter);
a command changes the state of the object (write, setter).
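The bullets above can be sketched in Python. This is only an illustration, not code from the references: the `Account` class and its method names are hypothetical, and Python hides state by naming convention (a leading underscore) rather than by enforcement.

```python
class Account:
    """A hypothetical object whose state is reachable only through behaviors."""

    def __init__(self, balance=0):
        self._balance = balance  # state: hidden from collaborators by convention

    def balance(self):
        """Query (read): report the state without changing it."""
        return self._balance

    def deposit(self, amount):
        """Command (write): change the state as a side effect."""
        self._balance += amount

    def withdraw(self, amount):
        """Command whose outcome depends on the current state."""
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount


account = Account(balance=10)
account.deposit(5)
account.withdraw(12)
print(account.balance())  # 3
```

Collaborators only send `deposit`, `withdraw`, and `balance` messages; whether a withdrawal succeeds is decided by the object itself from its own state.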
II. RL Perspective
Reference
- UCL COMPGI13 Reinforcement Learning (David Silver, 2015)
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
- Reinforcement Learning: An Introduction, 2nd edition (Richard S. Sutton
and Andrew G. Barto, 2018)
http://incompleteideas.net/book/the-book-2nd.html
- Note: The following content depends heavily on both references.
A. Agent and Environment
● Reinforcement Learning trains the agent to choose actions
that maximize the reward received from the environment.
● At each step $t$,
the agent executes action $A_t$
and receives observation $O_t$ and reward $R_t$;
the environment receives $A_t$
and emits $O_{t+1}$ and $R_{t+1}$.
● The agent’s actions affect the environment,
and therefore affect the subsequent data it receives.
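The interaction loop described above can be sketched as follows. The `Corridor` environment, the `AlwaysRight` agent, and the `reset`/`step`/`act` method names are all hypothetical choices for illustration; no learning takes place here, only the exchange of $A_t$, $O_{t+1}$, and $R_{t+1}$.

```python
class Corridor:
    """Hypothetical environment: cells 0..length-1, with the goal at the right end."""

    def __init__(self, length=5):
        self._length = length
        self._position = 0

    def reset(self):
        self._position = 0
        return self._position  # first observation

    def step(self, action):
        """Receive action A_t (-1 or +1) and emit (O_{t+1}, R_{t+1})."""
        self._position = min(max(self._position + action, 0), self._length - 1)
        reward = 1.0 if self._position == self._length - 1 else 0.0
        return self._position, reward


class AlwaysRight:
    """Hypothetical agent with a fixed policy."""

    def act(self, observation, reward):
        return +1


def run_episode(env, agent, max_steps=10):
    observation, reward = env.reset(), 0.0
    trajectory = []
    for _ in range(max_steps):
        action = agent.act(observation, reward)   # agent executes A_t
        observation, reward = env.step(action)    # environment emits O_{t+1}, R_{t+1}
        trajectory.append((action, observation, reward))
        if reward > 0:
            break
    return trajectory


trajectory = run_episode(Corridor(length=5), AlwaysRight())
print(trajectory[-1])  # (1, 4, 1.0)
```

Each action moves the agent, so the action chosen at one step changes the observation and reward it receives at the next.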
B. Reward and Action
● A reward $R_t$ is a scalar feedback signal,
which indicates how well the agent is doing at step $t$.
● Sequential Decision Making is selecting actions
to maximize total future reward.
● Actions may have long-term consequences, and rewards may be delayed.
● It may be better to sacrifice short-term reward to gain more long-term reward.
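A toy example can make the trade-off concrete. The task below is invented for illustration: `"work"` pays the current wage immediately, while `"study"` pays nothing now but raises every later wage by 1. The sketch finds the best plan by brute force over all action sequences, which is feasible only for tiny horizons.

```python
from itertools import product


def total_reward(actions):
    """Total reward of a plan in the hypothetical work/study task."""
    wage, total = 1, 0
    for action in actions:
        if action == "work":
            total += wage      # immediate reward
        else:                  # "study": no reward now, higher wage later
            wage += 1
    return total


def best_plan(horizon):
    """Enumerate every action sequence and keep the one with the highest total reward."""
    return max(product(["work", "study"], repeat=horizon), key=total_reward)


greedy = ("work",) * 5
print(total_reward(greedy))        # 5
print(best_plan(5))                # ('study', 'study', 'work', 'work', 'work')
print(total_reward(best_plan(5)))  # 9
```

The greedy plan collects a reward at every step yet ends with 5, while the best plan earns nothing for two steps and ends with 9.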
C. History and State
● History $H_t$ is all observable variables up to time $t$,
i.e. the sequence of observations, actions, and rewards up to time $t$:
$H_t = O_1, R_1, A_1, \dots, O_{t-1}, R_{t-1}, A_{t-1}, O_t, R_t$
● State $S_t$ is a function of the history, i.e. a summary of it: $S_t = f(H_t)$.
● State is the information used to determine what to do.
● Depending on the history/state, the agent selects actions,
and the environment selects observations and rewards.
● The environment and the agent each keep their own state:
the environment state $S_t^e$ and the agent state $S_t^a$.
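As a small sketch of $S_t = f(H_t)$, the history can be stored as a list of (observation, reward, action) triples and summarized by different functions. The two functions below are hypothetical examples of $f$, chosen only to show that many summaries of the same history are possible.

```python
from typing import List, Optional, Tuple

# One entry per time step: (observation O_i, reward R_i, action A_i).
Step = Tuple[int, float, Optional[int]]


def last_observation(history: List[Step]) -> int:
    """One possible f(H_t): the state is just the most recent observation O_t."""
    return history[-1][0]


def reward_so_far(history: List[Step]) -> float:
    """Another possible f(H_t): the state is the running total of rewards."""
    return sum(reward for _, reward, _ in history)


# H_3 = O_1, R_1, A_1, O_2, R_2, A_2, O_3, R_3 (A_3 has not been chosen yet).
history = [(0, 0.0, +1), (1, 0.0, +1), (2, 1.0, None)]
print(last_observation(history))  # 2
print(reward_so_far(history))     # 1.0
```

Which summary is adequate depends on what the agent needs in order to decide what to do next.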
D. Markov Property
● A state $S_t$ is Markov iff $P(S_{t+1} \mid S_t) = P(S_{t+1} \mid S_1, S_2, \dots, S_t)$; in other words,
the future does not depend on the past given the present.
● The state is a sufficient statistic of the future,
which captures all relevant information from the history.
Therefore once the state is known, the history may be thrown away.
● Full Observability is achieved when the agent directly observes
the environment state ($O_t = S_t^e = S_t^a$).
● Under full observability, the problem is formally a Markov Decision Process (MDP);
without it, the problem is a partially observable MDP (POMDP).
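A deterministic toy case shows why the choice of state matters for the Markov property. The dynamics below (a ball moving at constant velocity) are an assumption made for illustration: a state that keeps only the position cannot predict the next position, while a state that also keeps the velocity can.

```python
def next_position(position, velocity):
    """Hypothetical dynamics: the ball keeps moving at constant velocity."""
    return position + velocity


def position_only(past):
    """Candidate state 1: the latest position."""
    return past[-1]


def position_and_velocity(past):
    """Candidate state 2: the latest position and the latest change in position."""
    return (past[-1], past[-1] - past[-2])


past_a = [0, 1, 2]  # moving right, currently at 2
past_b = [4, 3, 2]  # moving left, currently at 2

# Same position-only state, different futures: that state is not Markov.
print(position_only(past_a) == position_only(past_b))  # True
print(next_position(*position_and_velocity(past_a)))   # 3
print(next_position(*position_and_velocity(past_b)))   # 1
```

With position and velocity together, the rest of the past adds nothing to the prediction, so the history can be discarded.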
III. Dual Perspective
A. Feedback Loop with Messages
● Agent and environment are two objects affecting each other,
alternating between being caller and callee.
● A message is the only way for the caller to manipulate the callee.
Therefore, an action is the only way for the agent to steer the
environment toward returning the maximum reward.
Conversely, observation and reward are the only way for the environment to
influence the agent.
● Only the observation and the reward are visible to the agent.
The environment state has to be deduced from them.
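The dual view can be sketched with two objects that exchange only messages. Everything here is hypothetical: the `SecretNumber` environment hides its state, and the `BisectingAgent` deduces it from observations alone. For brevity a small driver function relays the messages instead of the two objects calling each other directly.

```python
class SecretNumber:
    """Environment object: its state (the secret) is private."""

    def __init__(self, secret):
        self._secret = secret

    def step(self, guess):
        """Action message from the agent; replies with (observation, reward)."""
        if guess == self._secret:
            return "correct", 1.0
        return ("higher" if guess < self._secret else "lower"), 0.0


class BisectingAgent:
    """Agent object: its state is an interval that must contain the secret."""

    def __init__(self, low, high):
        self._low, self._high = low, high

    def act(self):
        return (self._low + self._high) // 2

    def observe(self, guess, observation):
        """Observation message from the environment: narrow the interval."""
        if observation == "higher":
            self._low = guess + 1
        elif observation == "lower":
            self._high = guess - 1
        else:
            self._low = self._high = guess


def play(env, agent, max_steps=20):
    """Relay messages between the two objects until the reward arrives."""
    for step in range(1, max_steps + 1):
        guess = agent.act()
        observation, reward = env.step(guess)
        agent.observe(guess, observation)
        if reward > 0:
            return guess, step
    return None, max_steps


print(play(SecretNumber(37), BisectingAgent(0, 100)))  # (37, 3)
```

The agent never reads `_secret`; it recovers the environment state purely from the replies to its own actions.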
B. States
● State determines the action of the agent.
● State is the summary of the previous interaction history,
or abstraction of all the previous behaviors.
● If the state fails to summarize the history, it loses the Markov Property,
and the object’s behavior ends up depending on information outside its own knowledge.
C. Humankind Behind the Duality
● OOP builds on the innate human ability to see the world as
a set of independent and perceivable objects.
● RL is an idealized computational model of
humans learning from interactions with the environment.