SlideShare a Scribd company logo
1 of 27
Download to read offline
Language Understanding for
Text-based Games using
Deep Reinforcement Learning
Karthik Narasimhan, Tejas D Kulkarni, Regina Barzilay
Presented by Mark Chang
Outline
● Introduction
● Background
– Reinforcement Learning
– Q-Learning
– Deep Q-Network
– Long Short-Term Memory
● Deep Reinforcement Learning Framework
– Game Representation
– Representation Generator
– Action Scorer
– Parameter Learning
● Experiment & Result
● Conclusion
Introduction
● Learning the control policies for text-based
games.
● A deep reinforcement learning framework
– learn state representations (semantics of text)
– learn action policies (rules of game)
Introduction
● MUD( Multi-User Dungeon ) Game
– A text-based multiplayer real-time virtual world.
– Player read descriptions of the world state.
– Player take actions by natural language
commands.
You are standing very close to the bridge’s eastern foundation.
If you go east you will be back on solid ground.
The bridge sways in the wind.
> Go east
Background
● Reinforcement Learning
● Q-Learning
● Deep Q-Network
● Long Short-Term Memory
Reinforcement Learning
Environmental State
ActionObservation
Reward
Agent State
● Rules of the game are
unknown.
● No supervisor, only a
reward signal.
● Feedback is delayed.
● Agent’s actions affect
the subsequent data it
receives.
Reinforcement Learning
● Markov Decision Process
– is a finite set of states
– is a finite set of actions
– is a state transition probability matrix
– is a reward function
– is a discount factor
● Return : total discounted reward from time-step t.
–
● Policy : Agent Behavior
– Deterministic policy:
– Stochastic policy:
● Value function: Expected return starting from state s.
– state-value function:
– action-value function:
Reinforcement Learning
Reinforcement Learning
● Bellman Equation for Action-Value Function
Q-Learning
● A model-free, off-policy technique to learn an
optimal Q(s, a) for the agent.
● model-free: using sampling instead
of complete model
● off-policy: learning optimal policy
independently of agent's actions
● : Estimation of
● : Step Size (Learning Rate)
● How to choose in ?
● - Greedy Exploration
– ensuring continual exploration
– With probability 1 - choose the greedy action
– With probability choose an action at random
Q-Learning
Deep Q-Network
● Problem with :
– too many states/actions to store in memory
– too slow to individually learn their value
● Solution :
– Approximate using a neural network
Q-Network :
Deep Q-Network
● Loss Function:
reduce the discrepancy between
– the predicted value :
– the expected Q-value :
● Gradient of Loss Function:
– previous iteration is fixed
Long Short-Term Memory
● recurrent neural networks with memory
● able to connect and recognize long-range
patterns between words in text
Long Short-Term Memory
● Write:
● Forget:
● Read:
Deep Reinforcement Learning
Framework
Representation
Generator
Action
Scorer
Game Representation
● Represented by the tuple
● : the set of all possible game states.
● : all commands (action-object pairs).
● : stochastic transition function
between game states.
● : reward function.
● : converts game state into a textual
description , since the game state is hidden.
Representation Generator
● Reads displayed raw text
● converts it to a vector representation
● LSTM :
● mean pooling layer :
Action Scorer
● multi-layered neural network
● given the current state representation ,
produces scores for actions.
● only consider commands with one action and
object :
Parameter Learning
● learning the parameters by using
stochastic gradient descent with RMSprop
● keep track of the agent’s previous experiences
in a memory
● sample a random transition from ,
and updating the parameters.
● prioritized sampling to the transition with
reward
Parameter Learning
Experiment & Result
● Two World:
– Home world : hand-cracted simple testcase
– Fantasy World : real MUD game testcase
● Evaluation:
– the cumulative reward
– the fraction of quests completed by the agent
● Baseline:
– Random agent
– Bag-of-words
– bigram
Experiment & Result
● Home world :
– Mimic a typical house
with 4 rooms.
– Each room contains a
representative object.
– Text : Description of
the character's state
and that the quest.
● Example:
quest:
Not you are sleepy
now but you are
hungry now.
command:
go to kitchen
eat apple
Experiment & Result
Experiment & Result
Experiment & Result
Conclusion
● a deep reinforcement learning framework
● jointly learn state representations and action
policies using game rewards as feedback.
● mapping text descriptions into vectors that
capture the semantics of the game states.

More Related Content

Similar to Language Understanding for Text-based Games using Deep Reinforcement Learning

Rl chapter 1 introduction
Rl chapter 1 introductionRl chapter 1 introduction
Rl chapter 1 introductionConnorShorten2
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningDongHyun Kwak
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningNAVER Engineering
 
Active Learning on Question Answering with Dialogues
 Active Learning on Question Answering with Dialogues Active Learning on Question Answering with Dialogues
Active Learning on Question Answering with DialoguesJinho Choi
 
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017MLconf
 
Knn Algorithm presentation
Knn Algorithm presentationKnn Algorithm presentation
Knn Algorithm presentationRishavSharma112
 
Demystifying deep reinforement learning
Demystifying deep reinforement learningDemystifying deep reinforement learning
Demystifying deep reinforement learning재연 윤
 
Deep learning crash course
Deep learning crash courseDeep learning crash course
Deep learning crash courseVishwas N
 
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...Sharath TS
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningDing Li
 
Review :: Demystifying deep reinforcement learning (written by Tambet Matiisen)
Review :: Demystifying deep reinforcement learning (written by Tambet Matiisen)Review :: Demystifying deep reinforcement learning (written by Tambet Matiisen)
Review :: Demystifying deep reinforcement learning (written by Tambet Matiisen)Hogeon Seo
 
Deep Learning Tutorial
Deep Learning Tutorial Deep Learning Tutorial
Deep Learning Tutorial Ligeng Zhu
 
Reinfrocement Learning
Reinfrocement LearningReinfrocement Learning
Reinfrocement LearningNatan Katz
 
Introduction to reinforcement learning
Introduction to reinforcement learningIntroduction to reinforcement learning
Introduction to reinforcement learningMarsan Ma
 
TensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and TricksTensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and TricksBen Ball
 
Reinforcement Learning 1. Introduction
Reinforcement Learning 1. IntroductionReinforcement Learning 1. Introduction
Reinforcement Learning 1. IntroductionSeung Jae Lee
 
Head First Reinforcement Learning
Head First Reinforcement LearningHead First Reinforcement Learning
Head First Reinforcement Learningazzeddine chenine
 

Similar to Language Understanding for Text-based Games using Deep Reinforcement Learning (20)

Rl chapter 1 introduction
Rl chapter 1 introductionRl chapter 1 introduction
Rl chapter 1 introduction
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement Learning
 
Finalver
FinalverFinalver
Finalver
 
Active Learning on Question Answering with Dialogues
 Active Learning on Question Answering with Dialogues Active Learning on Question Answering with Dialogues
Active Learning on Question Answering with Dialogues
 
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
 
Knn Algorithm presentation
Knn Algorithm presentationKnn Algorithm presentation
Knn Algorithm presentation
 
Machine learning
Machine learningMachine learning
Machine learning
 
Demystifying deep reinforement learning
Demystifying deep reinforement learningDemystifying deep reinforement learning
Demystifying deep reinforement learning
 
Deep learning crash course
Deep learning crash courseDeep learning crash course
Deep learning crash course
 
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Review :: Demystifying deep reinforcement learning (written by Tambet Matiisen)
Review :: Demystifying deep reinforcement learning (written by Tambet Matiisen)Review :: Demystifying deep reinforcement learning (written by Tambet Matiisen)
Review :: Demystifying deep reinforcement learning (written by Tambet Matiisen)
 
Deep Learning Tutorial
Deep Learning Tutorial Deep Learning Tutorial
Deep Learning Tutorial
 
Reinfrocement Learning
Reinfrocement LearningReinfrocement Learning
Reinfrocement Learning
 
Introduction to reinforcement learning
Introduction to reinforcement learningIntroduction to reinforcement learning
Introduction to reinforcement learning
 
Entity2rec recsys
Entity2rec recsysEntity2rec recsys
Entity2rec recsys
 
TensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and TricksTensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and Tricks
 
Reinforcement Learning 1. Introduction
Reinforcement Learning 1. IntroductionReinforcement Learning 1. Introduction
Reinforcement Learning 1. Introduction
 
Head First Reinforcement Learning
Head First Reinforcement LearningHead First Reinforcement Learning
Head First Reinforcement Learning
 

More from Mark Chang

Modeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential EquationModeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential EquationMark Chang
 
Modeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential EquationModeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential EquationMark Chang
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the WeightsMark Chang
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the WeightsMark Chang
 
PAC Bayesian for Deep Learning
PAC Bayesian for Deep LearningPAC Bayesian for Deep Learning
PAC Bayesian for Deep LearningMark Chang
 
PAC-Bayesian Bound for Deep Learning
PAC-Bayesian Bound for Deep LearningPAC-Bayesian Bound for Deep Learning
PAC-Bayesian Bound for Deep LearningMark Chang
 
Domain Adaptation
Domain AdaptationDomain Adaptation
Domain AdaptationMark Chang
 
NTU ML TENSORFLOW
NTU ML TENSORFLOWNTU ML TENSORFLOW
NTU ML TENSORFLOWMark Chang
 
NTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANsNTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANsMark Chang
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial NetworksMark Chang
 
Applied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural NetworksApplied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural NetworksMark Chang
 
The Genome Assembly Problem
The Genome Assembly ProblemThe Genome Assembly Problem
The Genome Assembly ProblemMark Chang
 
DRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive WriterDRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive WriterMark Chang
 
淺談深度學習
淺談深度學習淺談深度學習
淺談深度學習Mark Chang
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational AutoencoderMark Chang
 
TensorFlow 深度學習快速上手班--深度學習
 TensorFlow 深度學習快速上手班--深度學習 TensorFlow 深度學習快速上手班--深度學習
TensorFlow 深度學習快速上手班--深度學習Mark Chang
 
TensorFlow 深度學習快速上手班--電腦視覺應用
TensorFlow 深度學習快速上手班--電腦視覺應用TensorFlow 深度學習快速上手班--電腦視覺應用
TensorFlow 深度學習快速上手班--電腦視覺應用Mark Chang
 
TensorFlow 深度學習快速上手班--自然語言處理應用
TensorFlow 深度學習快速上手班--自然語言處理應用TensorFlow 深度學習快速上手班--自然語言處理應用
TensorFlow 深度學習快速上手班--自然語言處理應用Mark Chang
 
TensorFlow 深度學習快速上手班--機器學習
TensorFlow 深度學習快速上手班--機器學習TensorFlow 深度學習快速上手班--機器學習
TensorFlow 深度學習快速上手班--機器學習Mark Chang
 
Computational Linguistics week 10
 Computational Linguistics week 10 Computational Linguistics week 10
Computational Linguistics week 10Mark Chang
 

More from Mark Chang (20)

Modeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential EquationModeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential Equation
 
Modeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential EquationModeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential Equation
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the Weights
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the Weights
 
PAC Bayesian for Deep Learning
PAC Bayesian for Deep LearningPAC Bayesian for Deep Learning
PAC Bayesian for Deep Learning
 
PAC-Bayesian Bound for Deep Learning
PAC-Bayesian Bound for Deep LearningPAC-Bayesian Bound for Deep Learning
PAC-Bayesian Bound for Deep Learning
 
Domain Adaptation
Domain AdaptationDomain Adaptation
Domain Adaptation
 
NTU ML TENSORFLOW
NTU ML TENSORFLOWNTU ML TENSORFLOW
NTU ML TENSORFLOW
 
NTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANsNTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANs
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial Networks
 
Applied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural NetworksApplied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural Networks
 
The Genome Assembly Problem
The Genome Assembly ProblemThe Genome Assembly Problem
The Genome Assembly Problem
 
DRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive WriterDRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive Writer
 
淺談深度學習
淺談深度學習淺談深度學習
淺談深度學習
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational Autoencoder
 
TensorFlow 深度學習快速上手班--深度學習
 TensorFlow 深度學習快速上手班--深度學習 TensorFlow 深度學習快速上手班--深度學習
TensorFlow 深度學習快速上手班--深度學習
 
TensorFlow 深度學習快速上手班--電腦視覺應用
TensorFlow 深度學習快速上手班--電腦視覺應用TensorFlow 深度學習快速上手班--電腦視覺應用
TensorFlow 深度學習快速上手班--電腦視覺應用
 
TensorFlow 深度學習快速上手班--自然語言處理應用
TensorFlow 深度學習快速上手班--自然語言處理應用TensorFlow 深度學習快速上手班--自然語言處理應用
TensorFlow 深度學習快速上手班--自然語言處理應用
 
TensorFlow 深度學習快速上手班--機器學習
TensorFlow 深度學習快速上手班--機器學習TensorFlow 深度學習快速上手班--機器學習
TensorFlow 深度學習快速上手班--機器學習
 
Computational Linguistics week 10
 Computational Linguistics week 10 Computational Linguistics week 10
Computational Linguistics week 10
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 

Recently uploaded (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Language Understanding for Text-based Games using Deep Reinforcement Learning

  • 1. Language Understanding for Text-based Games using Deep Reinforcement Learning Karthik Narasimhan, Tejas D Kulkarni, Regina Barzilay Presented by Mark Chang
  • 2. Outline ● Introduction ● Background – Reinforcement Learning – Q-Learning – Deep Q-Network – Long Short-Term Memory ● Deep Reinforcement Learning Framework – Game Representation – Representation Generator – Action Scorer – Parameter Learning ● Experiment & Result ● Conclusion
  • 3. Introduction ● Learning the control policies for text-based games. ● A deep reinforcement learning framework – learn state representations (semantics of text) – learn action policies (rules of game)
  • 4. Introduction ● MUD( Multi-User Dungeon ) Game – A text-based multiplayer real-time virtual world. – Player read descriptions of the world state. – Player take actions by natural language commands. You are standing very close to the bridge’s eastern foundation. If you go east you will be back on solid ground. The bridge sways in the wind. > Go east
  • 5. Background ● Reinforcement Learning ● Q-Learning ● Deep Q-Network ● Long Short-Term Memory
  • 6. Reinforcement Learning Environmental State ActionObservation Reward Agent State ● Rules of the game are unknown. ● No supervisor, only a reward signal. ● Feedback is delayed. ● Agent’s actions affect the subsequent data it receives.
  • 7. Reinforcement Learning ● Markov Decision Process – is a finite set of states – is a finite set of actions – is a state transition probability matrix – is a reward function – is a discount factor
  • 8. ● Return : total discounted reward from time-step t. – ● Policy : Agent Behavior – Deterministic policy: – Stochastic policy: ● Value function: Expected return starting from state s. – state-value function: – action-value function: Reinforcement Learning
  • 9. Reinforcement Learning ● Bellman Equation for Action-Value Function
  • 10. Q-Learning ● A model-free, off-policy technique to learn an optimal Q(s, a) for the agent. ● model-free: using sampling instead of complete model ● off-policy: learning optimal policy independently of agent's actions ● : Estimation of ● : Step Size (Learning Rate)
  • 11. ● How to choose in ? ● - Greedy Exploration – ensuring continual exploration – With probability 1 - choose the greedy action – With probability choose an action at random Q-Learning
  • 12. Deep Q-Network ● Problem with : – too many states/actions to store in memory – too slow to individually learn their value ● Solution : – Approximate using a neural network Q-Network :
  • 13. Deep Q-Network ● Loss Function: reduce the discrepancy between – the predicted value : – the expected Q-value : ● Gradient of Loss Function: – previous iteration is fixed
  • 14. Long Short-Term Memory ● recurrent neural networks with memory ● able to connect and recognize long-range patterns between words in text
  • 15. Long Short-Term Memory ● Write: ● Forget: ● Read:
  • 17. Game Representation ● Represented by the tuple ● : the set of all possible game states. ● : all commands (action-object pairs). ● : stochastic transition function between game states. ● : reward function. ● : converts game state into a textual description , since the game state is hidden.
  • 18. Representation Generator ● Reads displayed raw text ● converts it to a vector representation ● LSTM : ● mean pooling layer :
  • 19. Action Scorer ● multi-layered neural network ● given the current state representation , produces scores for actions. ● only consider commands with one action and object :
  • 20. Parameter Learning ● learning the parameters by using stochastic gradient descent with RMSprop ● keep track of the agent’s previous experiences in a memory ● sample a random transition from , and updating the parameters. ● prioritized sampling to the transition with reward
  • 22. Experiment & Result ● Two World: – Home world : hand-cracted simple testcase – Fantasy World : real MUD game testcase ● Evaluation: – the cumulative reward – the fraction of quests completed by the agent ● Baseline: – Random agent – Bag-of-words – bigram
  • 23. Experiment & Result ● Home world : – Mimic a typical house with 4 rooms. – Each room contains a representative object. – Text : Description of the character's state and that the quest. ● Example: quest: Not you are sleepy now but you are hungry now. command: go to kitchen eat apple
  • 27. Conclusion ● a deep reinforcement learning framework ● jointly learn state representations and action policies using game rewards as feedback. ● mapping text descriptions into vectors that capture the semantics of the game states.