The document discusses reinforcement learning and its application to training an AI agent to play Super Mario Bros. It begins by explaining how animals and humans learn through reinforcement and punishment in their environments based on the rewards and consequences of their actions. It then provides an overview of reinforcement learning, including key concepts like the Markov decision process, exploration versus exploitation, and using a replay memory buffer to train a deep learning model. It concludes by describing how the Super Mario Bros environment can be set up and used with a reinforcement learning agent, including defining the state and action spaces, rewards and penalties, and the process of minimizing the loss to optimize the agent's behavior.
6. ALL ANIMALS HAVE THE ABILITY TO LEARN
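- All animals have the ability to learn.
- Even Caenorhabditis elegans, a nematode with only about 300 neurons, can learn.
- Head withdrawal reflex: a reflexive response triggered by the judgment that a dangerous object may be present.
- When the head of C. elegans is touched, it moves backward a certain distance.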
HOW ANIMALS LEARN
8. LAW OF EFFECT
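- Edward Thorndike (1898)
- Law of effect: if the outcome of a behavior is satisfying, the behavior is repeated next time.
Conversely, if the outcome is not satisfying, the behavior is not repeated.
- Reinforcement: a stimulus that causes the preceding behavior to be repeated
- Punishment: a stimulus that causes the preceding behavior to be avoided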
19. LEARNING
- Reinforcement learning selects the action that maximizes reward.
- The learner tries various actions and finds the action that yields the highest reward.
- The selected action can affect not only the immediate reward but also the next situation and the rewards that follow.
[Diagram: an Action produces an immediate Reward and a change in the situation, which lead to a future situation and a future Reward]
REINFORCEMENT LEARNING
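The trial-and-error idea on this slide can be sketched with a tiny epsilon-greedy learner. This is only an illustration, not the method used later in the deck: the three-action setup, the reward means, and the names `run_bandit` and `epsilon` are assumptions made for the example.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Estimate the reward of each action by trial and error (epsilon-greedy)."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions       # how often each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the highest estimated reward.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Hypothetical environment: noisy reward around the action's true mean.
        reward = rng.gauss(true_means[action], 0.1)

        # Incremental average update for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


estimates, counts = run_bandit([0.2, 1.0, 0.5])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated rewards:", estimates, "best action:", best_action)
```

This sketch only covers the immediate reward; it does not model the effect of an action on later situations, which the last bullet above describes.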
33. WORLDS & LEVELS (WORLD 1~4)
SUPERMARIO WITH R.L
[Screenshots: World 1, World 2, World 3, World 4]
env = gym_super_mario_bros.make('SuperMarioBros-<world>-<level>-v<version>')
34. WORLDS & LEVELS (WORLD 5~8)
[Screenshots: World 5, World 6, World 7, World 8]
35. ALL WORLDS AND LEVELS
[Figure: screenshots of worlds 1 to 8]
36. ALL WORLDS AND LEVELS
[Figure: grid of stage screenshots, labeled by world (1 to 8) and level (1 to 4)]
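The `SuperMarioBros-<world>-<level>-v<version>` template used in this deck can be filled in programmatically. The sketch below builds one ID per stage, assuming 8 worlds of 4 levels each as the grid suggests. The helper name `mario_env_id` and the default `version=0` are assumptions for illustration, not part of the library.

```python
from itertools import product

WORLDS = range(1, 9)  # worlds 1..8
LEVELS = range(1, 5)  # levels 1..4 within each world


def mario_env_id(world, level, version=0):
    """Fill in the 'SuperMarioBros-<world>-<level>-v<version>' template."""
    if world not in WORLDS or level not in LEVELS:
        raise ValueError(f"no such stage: world {world}, level {level}")
    return f"SuperMarioBros-{world}-{level}-v{version}"


# One ID per stage: 8 worlds x 4 levels.
all_stage_ids = [mario_env_id(w, l) for w, l in product(WORLDS, LEVELS)]
print(len(all_stage_ids), all_stage_ids[0], all_stage_ids[-1])
```

Each resulting string can then be passed to `gym_super_mario_bros.make(...)` as shown on the slides.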
37. WORLDS & LEVELS
env = gym_super_mario_bros.make('SuperMarioBros-<world>-<level>-v<version>')
[Screenshots: Version 1, Version 2, Version 3, Version 4]
73. *REFERENCE: TERMS AND SYMBOLS
- Time step : t
- State : s
- Action : a
- Reward : r
- Set of states : S
- Set of actions : A
- Set of rewards : R
- Start state : S0
- Transition function : P(s′, r ∣ s, a)
- Discount factor : γ
- Policy : π
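As one example of how these symbols combine, the discount factor γ weights a reward r received k time steps in the future by γ^k. The sketch below computes that discounted sum for a list of rewards. It uses the standard textbook definition rather than anything specific to these slides, and the function name `discounted_return` is an assumption.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum over k of gamma**k * rewards[k], accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Rewards at t, t+1, t+2 with gamma = 0.5: 1.0 + 0.5 * 0.0 + 0.25 * 2.0
example = discounted_return([1.0, 0.0, 2.0], gamma=0.5)
print(example)
```

A smaller γ makes the learner favor immediate rewards, while a γ close to 1 gives future rewards nearly the same weight as immediate ones.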
74. REFERENCES
1. Habituation: The Birth of Intelligence
2. Law of effect: The Birth of Intelligence, p. 171
3. Thorndike, E. L. (1905). The elements of psychology. New York: A. G. Seiler.
4. Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Monographs: General and Applied, 2(4), i-109.
5. SuperMario environment: https://github.com/Kautenja/gym-super-mario-bros
6. http://faculty.coe.uh.edu/smcneil/cuin6373/idhistory/thorndike_extra.html