SlideShare a Scribd company logo
1 of 87
Download to read offline
2
Outline
 Introduction
 Background Knowledge
 Neural Network Basics
 Reinforcement Learning
 Modular Dialogue System
 Spoken/Natural Language Understanding (SLU/NLU)
 Dialogue Management
 Dialogue State Tracking (DST)
 Dialogue Policy Optimization
 Natural Language Generation (NLG)
 Recent Trends and Challenges
 End-to-End Neural Dialogue System
 Multimodality
 Dialogue Breath
 Dialogue Depth
2
Introduction
Introduction
3
4
Language Empowering Intelligent Assistant
Apple Siri (2011) Google Now (2012)
Facebook M & Bot (2015) Google Home (2016)
Microsoft Cortana (2014)
Amazon Alexa/Echo (2014)
Google Assistant (2016)
5
Why We Need?
 Get things done
 E.g. set up alarm/reminder, take note
 Easy access to structured data, services and apps
 E.g. find docs/photos/restaurants
 Assist your daily schedule and routine
 E.g. commute alerts to/from work
 Be more productive in managing your work and personal life
5
6
Why Natural Language?
 Global Digital Statistics (2015 January)
6
Global Population
7.21B
Active Internet Users
3.01B
Active Social
Media Accounts
2.08B
Active Unique
Mobile Users
3.65B
The more natural and convenient input of devices evolves towards speech.
7
Spoken Dialogue System (SDS)
 Spoken dialogue systems are intelligent agents that are able to help users
finish tasks more efficiently via spoken interactions.
 Spoken dialogue systems are being incorporated into various devices
(smart-phones, smart TVs, in-car navigating system, etc).
7
JARVIS – Iron Man’s Personal Assistant Baymax – Personal Healthcare Companion
Good dialogue systems assist users to access information conveniently
and finish tasks efficiently.
8
App  Bot
 A bot is responsible for a “single” domain, similar to an app
8
Users can initiate dialogues instead of following the GUI design
9
GUI v.s. CUI (Conversational UI)
9
https://github.com/enginebai/Movie-lol-android
10
GUI v.s. CUI (Conversational UI)
Website/APP’s GUI Msg’s CUI
Situation Navigation, no specific goal Searching, with specific goal
Information
Quantity
More Less
Information
Precision
Low High
Display Structured Non-structured
Interface Graphics Language
Manipulation Click mainly use texts or speech as input
Learning Need time to learn and adapt No need to learn
Entrance App download
Incorporated in any msg-based
interface
Flexibility Low, like machine manipulation High, like converse with a human
10
Two Branches of Bots
 Personal assistant, helps
users achieve a certain task
 Combination of rules and
statistical components
 POMDP for spoken dialog systems
(Williams and Young, 2007)
 End-to-end trainable task-oriented
dialogue system (Wen et al., 2016)
 End-to-end reinforcement learning
dialogue system (Li et al., 2017; Zhao
and Eskenazi, 2016)
 No specific goal, focus on natural
responses
 Using variants of seq2seq model
 A neural conversation model (Vinyals and
Le, 2015)
 Reinforcement learning for dialogue
generation (Li et al., 2016)
 Conversational contextual cues for
response ranking (AI-Rfou et al., 2016)
11
Task-Oriented Bot Chit-Chat Bot
12
Task-Oriented Dialogue System (Young, 2000)
12
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Action /
Knowledge Providers
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
13
Interaction Example
13
User
Intelligent
Agent Q: How does a dialogue system process this
request?
Good Taiwanese eating places include Din Tai
Fung, Boiling Point, etc. What do you want to
choose? I can help you go there.
find a good eating place for taiwanese food
14
Task-Oriented Dialogue System (Young, 2000)
14
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Action /
Knowledge Providers
15
1. Domain Identification
Requires Predefined Domain Ontology
15
find a good eating place for taiwanese food
User
Organized Domain Knowledge (Database)Intelligent
Agent
Restaurant DB Taxi DB Movie DB
Classification!
16
2. Intent Detection
Requires Predefined Schema
16
find a good eating place for taiwanese food
User
Intelligent
Agent
Restaurant DB
FIND_RESTAURANT
FIND_PRICE
FIND_TYPE
:
Classification!
17
3. Slot Filling
Requires Predefined Schema
find a good eating place for taiwanese food
User
Intelligent
Agent
17
Restaurant DB
Restaurant Rating Type
Rest 1 good Taiwanese
Rest 2 bad Thai
: : :
FIND_RESTAURANT
rating=“good”
type=“taiwanese”
SELECT restaurant {
rest.rating=“good”
rest.type=“taiwanese”
}Semantic Frame Sequence Labeling
O O B-rating O O O B-type O
18
Task-Oriented Dialogue System (Young, 2000)
18
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Action /
Knowledge Providers
19
State Tracking
Requires Hand-Crafted States
User
Intelligent
Agent
find a good eating place for taiwanese food
19
location rating type
loc, rating
rating,
type
loc,
type
all
i want it near to my office
NULL
20
State Tracking
Requires Hand-Crafted States
User
Intelligent
Agent
find a good eating place for taiwanese food
20
location rating type
loc, rating
rating,
type
loc,
type
all
i want it near to my office
NULL
21
State Tracking
Handling Errors and Confidence
User
Intelligent
Agent
find a good eating place for taixxxx food
21
FIND_RESTAURANT
rating=“good”
type=“taiwanese”
FIND_RESTAURANT
rating=“good”
type=“thai”
FIND_RESTAURANT
rating=“good”
location rating type
loc, rating
rating,
type
loc,
type
all
NULL
?
?
rating=“good”
, type=“thai”
rating=“good”,
type=“taiwanese”
?
?
22
Dialogue Policy for Agent Action
 Inform(location=“Taipei 101”)
 “The nearest one is at Taipei 101”
 Request(location)
 “Where is your home?”
 Confirm(type=“taiwanese”)
 “Did you want Taiwanese food?”
22
23
Task-Oriented Dialogue System (Young, 2000)
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text Input
Are there any action movies to see this weekend?
Speech Signal
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Backend Action /
Knowledge Providers
Natural Language
Generation (NLG)Text response
Where are you located?
24
Output / Natural Language Generation
 Goal: generate natural language or GUI given the selected dialogue
action for interactions
 Inform(location=“Taipei 101”)
 “The nearest one is at Taipei 101” v.s.
 Request(location)
 “Where is your home?” v.s.
 Confirm(type=“taiwanese”)
 “Did you want Taiwanese food?” v.s.
24
Background Knowledge25
Neural Network Basics
Reinforcement Learning
26
Outline
 Introduction
 Background Knowledge
 Neural Network Basics
 Reinforcement Learning
 Modular Dialogue System
 Spoken/Natural Language Understanding (SLU/NLU)
 Dialogue Management
 Dialogue State Tracking (DST)
 Dialogue Policy Optimization
 Natural Language Generation (NLG)
 Recent Trends and Challenges
 End-to-End Neural Dialogue System
 Multimodality
 Dialogue Breath
 Dialogue Depth
26
27
Program for Solving Tasks
 Task: predicting positive or negative given a product review
27
“I love this product!” “It’s a little expensive.”“It claims too much.”
+ ?-
program.py
Some tasks are complex, and we don’t know how to write a program to solve them.
if input contains “love”, “like”, etc.
output = positive
if input contains “too much”, “bad”, etc.
output = negative
program.py program.py
“I love this product!” “It’s a little expensive.”“It claims too much.”
+ ?-
f f f
Given a large amount of data, the machine learns what the function f should be.
28
Learning ≈ Looking for a Function
 Speech Recognition
 Image Recognition
 Go Playing
 Chat Bot
 f
 f
 f
 f
cat
“你好 (Hello) ”
5-5 (next move)
“Where is HKUST?” “The address is…”
29
Machine Learning
29
Machine
Learning
Unsupervised
Learning
Supervised
Learning
Reinforcement
Learning
Deep learning is a type of machine learning approaches, called “neural networks”.
30
A Single Neuron
z
1w
2w
Nw
…
1x
2x
Nx

b
 z
 z
zbias
y
  z
e
z 


1
1

Sigmoid function
Activation function
1
w, b are the parameters of this neuron
30
31
A Single Neuron
z
1w
2w
Nw
…1x
2x
Nx

b
bias
y
1





5.0"2"
5.0"2"
ynot
yis
A single neuron can only handle binary classification
31
MN
RRf :
32
A Layer of Neurons
 Handwriting digit classification
MN
RRf :
A layer of neurons can handle multiple possible output,
and the result depends on the max one
…
1x
2x
Nx

1
 1y

…
…
“1” or not
“2” or not
“3” or not
2y
3y
10 neurons/10 classes
Which
one is
max?
33
Deep Neural Networks (DNN)
 Fully connected feedforward network
1x
2x
……
Layer 1
……
1y
2y
……
Layer 2
……
Layer L
……
……
……
Input Output
MyNx
vector
x
vector
y
Deep NN: multiple hidden layers
MN
RRf :
34
Recurrent Neural Network (RNN)
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
: tanh, ReLU
time
RNN can learn accumulated sequential information (time-series)
35
Outline
 Introduction
 Background Knowledge
 Neural Network Basics
 Reinforcement Learning
 Modular Dialogue System
 Spoken/Natural Language Understanding (SLU/NLU)
 Dialogue Management
 Dialogue State Tracking (DST)
 Dialogue Policy Optimization
 Natural Language Generation (NLG)
 Recent Trends and Challenges
 End-to-End Neural Dialogue System
 Multimodality
 Dialogue Breath
 Dialogue Depth
35
36
Reinforcement Learning
 RL is a general purpose framework for decision making
 RL is for an agent with the capacity to act
 Each action influences the agent’s future state
 Success is measured by a scalar reward signal
 Goal: select actions to maximize future reward
Scenario of Reinforcement Learning
Agent learns to take actions to maximize expected reward.
37
Environment
Observation ot Action at
Reward rt
If win, reward = 1
If loss, reward = -1
Otherwise, reward = 0
Next Move
38
Supervised v.s. Reinforcement
 Supervised
 Reinforcement
38
Hello 
Agent
……
Agent
……. ……. ……
Bad
“Hello” Say “Hi”
“Bye bye” Say “Good bye”
Learning from teacher
Learning from critics
39
Sequential Decision Making
 Goal: select actions to maximize total future reward
 Actions may have long-term consequences
 Reward may be delayed
 It may be better to sacrifice immediate reward to gain more
long-term reward
39
40
Deep Reinforcement Learning
Environment
Observation Action
Reward
Function
Input
Function
Output
Used to pick the
best function
…
…
…
DNN
Modular Dialogue System41
42
Task-Oriented Dialogue System (Young, 2000)
42
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking (DST)
• Dialogue Policy
Natural Language
Generation (NLG)
Hypothesis
are there any action movies to
see this weekend
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
Text response
Where are you located?
Text Input
Are there any action movies to see this weekend?
Speech Signal
Backend Action /
Knowledge Providers
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
43
Outline
 Introduction
 Background Knowledge
 Neural Network Basics
 Reinforcement Learning
 Modular Dialogue System
 Spoken/Natural Language Understanding (SLU/NLU)
 Dialogue Management
 Dialogue State Tracking (DST)
 Dialogue Policy Optimization
 Natural Language Generation (NLG)
 Recent Trends and Challenges
 End-to-End Neural Dialogue System
 Multimodality
 Dialogue Breath
 Dialogue Depth
43
44
Language Understanding (LU)
 Pipelined
44
1. Domain
Classification
2. Intent
Classification
3. Slot Filling
LU – Domain/Intent Classification
• Given a collection of utterances ui with labels ci, D=
{(u1,c1),…,(un,cn)} where ci ∊ C, train a model to estimate labels
for new utterances uk.
Mainly viewed as an utterance classification task
45
find me a cheap taiwanese restaurant in oakland
Movies
Restaurants
Sports
Weather
Music
…
Find_movie
Buy_tickets
Find_restaurant
Book_table
Find_lyrics
…
46
Deep Neural Networks for Domain/Intent
Classification (Ravuri and Stolcke, 2015)
 RNN and LSTMs for utterance classification
46
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/RNNLM_addressee.pdf
Intent decision after reading all words performs better
LU – Slot Filling
47
flights from Boston to New York today
O O B-city O B-city I-city O
O O B-dept O B-arrival I-arrival B-date
As a sequence
tagging task
• Given a collection tagged word sequences,
S={((w1,1,w1,2,…, w1,n1), (t1,1,t1,2,…,t1,n1)),
((w2,1,w2,2,…,w2,n2), (t2,1,t2,2,…,t2,n2)) …}
where ti ∊ M, the goal is to estimate tags for a new word
sequence.
flights from Boston to New York today
Entity Tag
Slot Tag
48
Recurrent Neural Nets for Slot Tagging – I
(Yao et al, 2013; Mesnil et al, 2015)
 Variations:
a. RNNs with LSTM cells
b. Input, sliding window of n-grams
c. Bi-directional LSTMs
𝑤0 𝑤1 𝑤2 𝑤 𝑛
ℎ0
𝑓
ℎ1
𝑓
ℎ2
𝑓
ℎ 𝑛
𝑓
ℎ0
𝑏
ℎ1
𝑏
ℎ2
𝑏 ℎ 𝑛
𝑏
𝑦0 𝑦1 𝑦2 𝑦𝑛
(b) LSTM-LA (c) bLSTM
𝑦0 𝑦1 𝑦2 𝑦𝑛
𝑤0 𝑤1 𝑤2 𝑤 𝑛
ℎ0 ℎ1 ℎ2 ℎ 𝑛
(a) LSTM
𝑦0 𝑦1 𝑦2 𝑦𝑛
𝑤0 𝑤1 𝑤2 𝑤 𝑛
ℎ0 ℎ1 ℎ2 ℎ 𝑛
http://131.107.65.14/en-us/um/people/gzweig/Pubs/Interspeech2013RNNLU.pdf; http://dl.acm.org/citation.cfm?id=2876380
49
Recurrent Neural Nets for Slot Tagging – II
(Kurata et al., 2016; Simonnet et al., 2015)
 Encoder-decoder networks
 Leverages sentence level
information
 Attention-based encoder-
decoder
 Use of attention (as in MT)
in the encoder-decoder
network
 Attention is estimated using
a feed-forward network with
input: ht and st at time t
𝑦0 𝑦1 𝑦2 𝑦𝑛
𝑤 𝑛 𝑤2 𝑤1 𝑤0
ℎ 𝑛 ℎ2 ℎ1 ℎ0
𝑤0 𝑤1 𝑤2 𝑤 𝑛
𝑦0 𝑦1 𝑦2 𝑦𝑛
𝑤0 𝑤1 𝑤2 𝑤 𝑛
ℎ0 ℎ1 ℎ2 ℎ 𝑛
𝑠0 𝑠1 𝑠2 𝑠 𝑛
ci
ℎ0 ℎ 𝑛…
http://www.aclweb.org/anthology/D16-1223
ht-
1
ht+
1
ht
W W W W
taiwanese
B-type
U
food
U
please
U
V
O
V
O
V
hT+1
EOS
U
FIND_RES
T
V
Slot Filling Intent
Prediction
Joint Semantic Frame Parsing
Sequence-
based
(Hakkani-Tur
et al., 2016)
• Slot filling and
intent prediction
in the same
output sequence
Parallel
(Liu and
Lane, 2016)
• Intent prediction
and slot filling
are performed
in two branches
50 https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_MultiJoint.pdf; https://arxiv.org/abs/1609.01454
51
Outline
 Introduction
 Background Knowledge
 Neural Network Basics
 Reinforcement Learning
 Modular Dialogue System
 Spoken/Natural Language Understanding (SLU/NLU)
 Dialogue Management
 Dialogue State Tracking (DST)
 Dialogue Policy Optimization
 Natural Language Generation (NLG)
 Recent Trends and Challenges
 End-to-End Neural Dialogue System
 Multimodality
 Dialogue Breath
 Dialogue Depth
51
52
Elements of Dialogue Management
52(Figure from Gašić)
Dialogue State Tracking
53
Dialogue State Tracking (DST)
 Maintain a probabilistic distribution instead of a 1-best
prediction for better robustness
53Slide credited by Sungjin Lee
Incorrect
for both!
54
Dialogue State Tracking (DST)
 Maintain a probabilistic distribution instead of a 1-best
prediction for better robustness to SLU errors or
ambiguous input
54
How can I help you?
Book a table at Sumiko for 5
How many people?
3
Slot Value
# people 5 (0.5)
time 5 (0.5)
Slot Value
# people 3 (0.8)
time 5 (0.8)
55
Neural Belief Tracker (Henderson et al., 2013;
Henderson et al., 2014; Mrkšić et al., 2015; Mrkšić et al., 2016)
55(Figure from Wen et al, 2016)
http://www.anthology.aclweb.org/W/W13/W13-4073.pdf; https://arxiv.org/abs/1506.07190; https://arxiv.org/abs/1606.03777
56
Outline
 Introduction
 Background Knowledge
 Neural Network Basics
 Reinforcement Learning
 Modular Dialogue System
 Spoken/Natural Language Understanding (SLU/NLU)
 Dialogue Management
 Dialogue State Tracking (DST)
 Dialogue Policy Optimization
 Natural Language Generation (NLG)
 Recent Trends and Challenges
 End-to-End Neural Dialogue System
 Multimodality
 Dialogue Breath
 Dialogue Depth
56
57
Elements of Dialogue Management
57(Figure from Gašić)
Dialogue Policy Optimization
58
Dialogue Policy Optimization
 Dialogue management in a RL framework
58
U s e r
Reward R Observation OAction A
Environment
Agent
Natural Language Generation Language Understanding
Dialogue Manager
Slides credited by Pei-Hao Su
Optimized dialogue policy selects the best action that can maximize the future reward.
Correct rewards are a crucial factor in dialogue policy training
59
Dialogue Reinforcement Learning Signal
Typical reward function
 -1 for per turn penalty
 Large reward at completion if successful
Typically requires domain knowledge
✔ Simulated user
✔ Paid users (Amazon Mechanical Turk)
✖ Real users
|||
…
﹅
59
The user simulator is usually required for
dialogue system training before deployment
61
Deep Q-Networks for Policy (Li et al., 2017)
 Deep RL for training DM
 Input: current semantic frame observation, database
returned results
 Output: system action
61
Semantic Frame
request_movie
genre=action, date=this weekend
System Action/Policy
request_location
DQN-based
Dialogue
Management
(DM)
Simulated User
Backend DB
https://arxiv.org/abs/1703.01008
62
Outline
 Introduction
 Background Knowledge
 Neural Network Basics
 Reinforcement Learning
 Modular Dialogue System
 Spoken/Natural Language Understanding (SLU/NLU)
 Dialogue Management
 Dialogue State Tracking (DST)
 Dialogue Policy Optimization
 Natural Language Generation (NLG)
 Recent Trends and Challenges
 End-to-End Neural Dialogue System
 Multimodality
 Dialogue Breath
 Dialogue Depth
62
63
Natural Language Generation (NLG)
 Mapping semantic frame into natural language
inform(name=Seven_Days, foodtype=Chinese)
Seven Days is a nice Chinese restaurant
63
64
Template-Based NLG
 Define a set of rules to map frames to NL
64
Pros: simple, error-free, easy to control
Cons: time-consuming, poor scalability
Semantic Frame Natural Language
confirm() “Please tell me more about the product your are
looking for.”
confirm(area=$V) “Do you want somewhere in the $V?”
confirm(food=$V) “Do you want a $V restaurant?”
confirm(food=$V,area=$W) “Do you want a $V restaurant in the $W.”
65
RNN Language Generator (Wen et al., 2015)
<BOS> SLOT_NAME serves SLOT_FOOD .
<BOS> EAT serves British .
delexicalisation
Inform(name=EAT, food=British)
0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0… …
dialog act 1-hot
representation
SLOT_NAME serves SLOT_FOOD . <EOS>
65
End-to-End Learning for Dialogues
Multimodality
Dialogue Breath
Dialogue Depth
Recent Trends and Challenges66
67
Outline
 Introduction
 Background Knowledge
 Neural Network Basics
 Reinforcement Learning
 Modular Dialogue System
 Spoken/Natural Language Understanding (SLU/NLU)
 Dialogue Management
 Dialogue State Tracking (DST)
 Dialogue Policy Optimization
 Natural Language Generation (NLG)
 Recent Trends and Challenges
 End-to-End Neural Dialogue System
 Multimodality
 Dialogue Breath
 Dialogue Depth
67
68
E2E Joint NLU and DM (Yang et al., 2017)
 Idea: errors from DM can be propagated to NLU
for better robustness
DM
68
Model DM NLU
Baseline (CRF+SVMs) 7.7 33.1
Pipeline-BLSTM 12.0 36.4
JointModel 22.8 37.4
Both DM and NLU
performance is improved
https://arxiv.org/abs/1612.00913
69
0 0 0 … 0 1
Database Operator
Copy
field
…
Database
Sevendays
CurryPrince
Nirala
RoyalStandard
LittleSeuol
DB pointer
Can I have korean
Korean
0.7
British
0.2
French
0.1
…
Belief Tracker
Intent Network
Can I have <v.food>
E2E Supervised Dialogue System (Wen et al., 2016)
Generation Network
<v.name> serves great <v.food> .
Policy Network
69
zt
pt
xt
MySQL query:
“Select * where
food=Korean”
qt
https://arxiv.org/abs/1604.04562
70
E2E RL-Based Info-Bot (Dhingra et al., 2016)
Movie=?; Actor=Bill Murray; Release Year=1993
Find me the Bill Murray’s movie.
I think it came out in 1993.
When was it released?
Groundhog Day is a Bill Murray
movie which came out in 1993.
KB-InfoBot
User
(Groundhog Day, actor, Bill Murray)
(Groundhog Day, release year, 1993)
(Australia, actor, Nicole Kidman)
(Mad Max: Fury Road, release year, 2015)
Knowledge Base (head, relation, tail)
70Idea: differentiable database for propagating the gradients
https://arxiv.org/abs/1609.00777
71
E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)
wi
B-
type
wi
+1
wi+2
O O
EOS
<intent
>
wi
B-
type
wi
+1
wi+2
O O
EOS
<intent
>
Semantic Frame
request_movie
genre=action,
date=this weekend
System Action /
Policy
request_location
User Dialogue Action
Inform(location=San Francisco)
Time t-1
wi
<slot>
wi
+1
wi+2
O O
EOS
<intent>
Language Understanding (LU)
Time t-2
Time t
Dialogue
Management
(DM)
w0 w1 w2
Natural Language Generation (NLG)
EOS
User
Goal
User Agenda Modeling
User Simulator
End-to-End Neural Dialogue System
Text Input
Are there any action movies
to see this weekend?
Idea: supervised learning for each component and reinforcement
learning for end-to-end training the neural dialogue system
71
https://arxiv.org/abs/1703.01008
72
E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)
 User goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle.
RULE BASED SYSTEM
User: Can I get 2 tickets for the witch?
Agent: Which theater do you want?
User: Regal meridian 16
Agent: What time would you like to see it?
User: 9:30 pm
Agent: What date would you like to see it?
User: Tomorrow!
Agent: How many people are you?
User: Just two adults.
Agent: What is your location?
User: Seattle!
Agent: What kind of video format do you like?
User: Oh, any kind is okay.
Agent: Do you have any price constraint?
User: No, I don’t care.
Agent: Great - I was able to purchase 2 tickets for
you to see the witch tomorrow at regal meridian 16
theater in Seattle at 9:30 pm.
User: Thank you.
Agent: Thank you! 72
The system can learn how to efficiently
interact with users for task completion
REINFORCEMENT LEARNING SYSTEM
User: Can I get 2 tickets for the witch?
Agent: What date would you like to see it?
User: Tomorrow
Agent: Which theater do you prefer?
User: Regal meridian 16
Agent: What is your location?
User: Seattle
Agent: What time would you like to see it?
User: 9:30 pm.
Agent: Great - I was able to purchase 2 tickets for
you to see the witch tomorrow at regal meridian
16 theater in Seattle at 9:30 pm.
User: Thanks.
Agent: Thanks!
https://arxiv.org/abs/1703.01008
73
Outline
 Introduction
 Background Knowledge
 Neural Network Basics
 Reinforcement Learning
 Modular Dialogue System
 Spoken/Natural Language Understanding (SLU/NLU)
 Dialogue Management
 Dialogue State Tracking (DST)
 Dialogue Policy Optimization
 Natural Language Generation (NLG)
 Recent Trends and Challenges
 End-to-End Neural Dialogue System
 Multimodality
 Dialogue Breath
 Dialogue Depth
73
74
Brain Signal for Understanding
74
 Misunderstanding detection by brain signal
 Green: listen to the correct answer
 Red: listen to the wrong answer
http://dl.acm.org/citation.cfm?id=2388695
Detecting misunderstanding via brain signal in order to correct the understanding results
75
Video for Intent Understanding
75
Proactive (from camera)
I want to see a movie on TV!
Intent: turn_on_tv
Sir, may I turn on the TV for you?
Proactively understanding user intent to initiate the dialogues.
76
App Behavior for Understanding
76
 Task: user intent prediction
 Challenge: language ambiguity
 User preference
 Some people prefer “Message” to “Email”
 Some people prefer “Outlook” to “Gmail”
 App-level contexts
 “Message” is more likely to follow “Camera”
 “Email” is more likely to follow “Excel”
send to vivian
v.s.
Email? Message?
Communication
Considering behavioral patterns in history to model understanding for intent prediction.
77
Evolution Roadmap
77
Dialogue breadth (coverage)
Dialoguedepth(complexity)
What is influenza?
I’ve got a cold what do I do?
Tell me a joke.
I feel sad…
78
Outline
 Introduction
 Background Knowledge
 Neural Network Basics
 Reinforcement Learning
 Modular Dialogue System
 Spoken/Natural Language Understanding (SLU/NLU)
 Dialogue Management
 Dialogue State Tracking (DST)
 Dialogue Policy Optimization
 Natural Language Generation (NLG)
 Recent Trends and Challenges
 End-to-End Neural Dialogue System
 Multimodality
 Dialogue Breath
 Dialogue Depth
78
79
Evolution Roadmap
79
Single
domain
systems
Extended
systems
Multi-
domain
systems
Open
domain
systems
Dialogue breadth (coverage)
Dialoguedepth(complexity)
What is influenza?
I’ve got a cold what do I do?
Tell me a joke.
I feel sad…
80
Intent Expansion (Chen et al., 2016)
 Transfer dialogue acts across domains
 Dialogue acts are similar for multiple domains
 Learning new intents by information from other domains
CDSSM
New Intent
Intent Representation
1
2
K
:
Embedding
Generation
K+1
K+2<change_calender>
Training Data
<change_note>
“adjust my note”
:
<change_setting>
“volume turn down”
The dialogue act representations can be automatically learned for other domains
http://ieeexplore.ieee.org/abstract/document/7472838/
postpone my meeting to five pm
81
Policy for Domain Adaptation (Gašić et al., 2015)
 Bayesian committee machine (BCM) enables estimated
Q-function to share knowledge across domains
QRDR
QHDH
QL DL
Committee Model
The policy from a new domain can be boosted by the committee policy
http://ieeexplore.ieee.org/abstract/document/7404871/
82
Outline
 Introduction
 Background Knowledge
 Neural Network Basics
 Reinforcement Learning
 Modular Dialogue System
 Spoken/Natural Language Understanding (SLU/NLU)
 Dialogue Management
 Dialogue State Tracking (DST)
 Dialogue Policy Optimization
 Natural Language Generation (NLG)
 Recent Trends and Challenges
 End-to-End Neural Dialogue System
 Multimodality
 Dialogue Breath
 Dialogue Depth
82
83
Evolution Roadmap
83
Knowledge based system
Common sense system
Empathetic systems
Dialogue breadth (coverage)
Dialoguedepth(complexity)
What is influenza?
I’ve got a cold what do I do?
Tell me a joke.
I feel sad…
84
High-Level Intention for Dialogue Planning
(Sun et al., 2016; Sun et al., 2016)
 High-level intention may span several domains
Schedule a lunch with Vivian.
find restaurant check location contact play music
What kind of restaurants do you prefer?
The distance is …
Should I send the restaurant information to Vivian?
Users can interact via high-level descriptions and the
system learns how to plan the dialogues
http://dl.acm.org/citation.cfm?id=2856818; http://www.lrec-conf.org/proceedings/lrec2016/pdf/75_Paper.pdf
85
Empathy in Dialogue System (Fung et al., 2016)
 Embed an empathy module
 Recognize emotion using multimodality
 Generate emotion-aware responses
85Emotion Recognizer
vision
speech
text
https://arxiv.org/abs/1605.04072
Conclusion86
87
Concluding Remarks
87
Human-Robot interfaces is a hot topic but several components must be integrated!
Most state-of-the-art technologies are based on DNN
• Requires huge amounts of labeled data
• Several frameworks/models are available
Fast domain adaptation with scarse data + re-use of rules/knowledge
Handling reasoning
Data collection and analysis from un-structured data
Complex-cascade systems requires high accuracy for working good as a whole
THANKS FOR ATTENTION!
Q & A
We thanks Tsung-Hsien Wen, Pei-Hao Su, Li Deng, Sungjin Lee,
Milica Gašić, Lihong Li for sharing their slides

More Related Content

What's hot

Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless DatabasesDan Gunter
 
Reinforcement Learning Q-Learning
Reinforcement Learning   Q-Learning Reinforcement Learning   Q-Learning
Reinforcement Learning Q-Learning Melaku Eneayehu
 
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Simplilearn
 
인공지능 슈퍼마리오의 거의 모든 것( Pycon 2018 정원석)
인공지능 슈퍼마리오의 거의 모든 것( Pycon 2018 정원석)인공지능 슈퍼마리오의 거의 모든 것( Pycon 2018 정원석)
인공지능 슈퍼마리오의 거의 모든 것( Pycon 2018 정원석)wonseok jung
 
hands on machine learning Chapter 4 model training
hands on machine learning Chapter 4 model traininghands on machine learning Chapter 4 model training
hands on machine learning Chapter 4 model trainingJaey Jeong
 
Planning and Learning with Tabular Methods
Planning and Learning with Tabular MethodsPlanning and Learning with Tabular Methods
Planning and Learning with Tabular MethodsDongmin Lee
 
Data mining techniques unit III
Data mining techniques unit IIIData mining techniques unit III
Data mining techniques unit IIImalathieswaran29
 
알아두면 쓸데있는 신기한 강화학습 NAVER 2017
알아두면 쓸데있는 신기한 강화학습 NAVER 2017알아두면 쓸데있는 신기한 강화학습 NAVER 2017
알아두면 쓸데있는 신기한 강화학습 NAVER 2017Taehoon Kim
 
Managing and Deploying High Performance Computing Clusters using Windows HPC ...
Managing and Deploying High Performance Computing Clusters using Windows HPC ...Managing and Deploying High Performance Computing Clusters using Windows HPC ...
Managing and Deploying High Performance Computing Clusters using Windows HPC ...Saptak Sen
 
Mining high speed data streams: Hoeffding and VFDT
Mining high speed data streams: Hoeffding and VFDTMining high speed data streams: Hoeffding and VFDT
Mining high speed data streams: Hoeffding and VFDTDavide Gallitelli
 
An Introduction to Optimal Transport
An Introduction to Optimal TransportAn Introduction to Optimal Transport
An Introduction to Optimal TransportGabriel Peyré
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningDongHyun Kwak
 
Testing Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnitTesting Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnitEric Wendelin
 
Validating credit cards on mobile using deep learning
Validating credit cards on mobile using deep learningValidating credit cards on mobile using deep learning
Validating credit cards on mobile using deep learningDataWorks Summit
 
AI - Introduction to Bellman Equations
AI - Introduction to Bellman EquationsAI - Introduction to Bellman Equations
AI - Introduction to Bellman EquationsAndrew Ferlitsch
 
Derivation of cost function for logistic regression
Derivation of cost function for logistic regressionDerivation of cost function for logistic regression
Derivation of cost function for logistic regressionhyunsung lee
 
Introduction To Multilevel Association Rule And Its Methods
Introduction To Multilevel Association Rule And Its MethodsIntroduction To Multilevel Association Rule And Its Methods
Introduction To Multilevel Association Rule And Its MethodsIJSRD
 
Unit Test and TDD
Unit Test and TDDUnit Test and TDD
Unit Test and TDDViet Tran
 

What's hot (20)

Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
Reinforcement Learning Q-Learning
Reinforcement Learning   Q-Learning Reinforcement Learning   Q-Learning
Reinforcement Learning Q-Learning
 
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
 
인공지능 슈퍼마리오의 거의 모든 것( Pycon 2018 정원석)
인공지능 슈퍼마리오의 거의 모든 것( Pycon 2018 정원석)인공지능 슈퍼마리오의 거의 모든 것( Pycon 2018 정원석)
인공지능 슈퍼마리오의 거의 모든 것( Pycon 2018 정원석)
 
hands on machine learning Chapter 4 model training
hands on machine learning Chapter 4 model traininghands on machine learning Chapter 4 model training
hands on machine learning Chapter 4 model training
 
Planning and Learning with Tabular Methods
Planning and Learning with Tabular MethodsPlanning and Learning with Tabular Methods
Planning and Learning with Tabular Methods
 
Data mining techniques unit III
Data mining techniques unit IIIData mining techniques unit III
Data mining techniques unit III
 
알아두면 쓸데있는 신기한 강화학습 NAVER 2017
알아두면 쓸데있는 신기한 강화학습 NAVER 2017알아두면 쓸데있는 신기한 강화학습 NAVER 2017
알아두면 쓸데있는 신기한 강화학습 NAVER 2017
 
Managing and Deploying High Performance Computing Clusters using Windows HPC ...
Managing and Deploying High Performance Computing Clusters using Windows HPC ...Managing and Deploying High Performance Computing Clusters using Windows HPC ...
Managing and Deploying High Performance Computing Clusters using Windows HPC ...
 
Mining high speed data streams: Hoeffding and VFDT
Mining high speed data streams: Hoeffding and VFDTMining high speed data streams: Hoeffding and VFDT
Mining high speed data streams: Hoeffding and VFDT
 
An Introduction to Optimal Transport
An Introduction to Optimal TransportAn Introduction to Optimal Transport
An Introduction to Optimal Transport
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Testing Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnitTesting Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnit
 
Validating credit cards on mobile using deep learning
Validating credit cards on mobile using deep learningValidating credit cards on mobile using deep learning
Validating credit cards on mobile using deep learning
 
AI - Introduction to Bellman Equations
AI - Introduction to Bellman EquationsAI - Introduction to Bellman Equations
AI - Introduction to Bellman Equations
 
Derivation of cost function for logistic regression
Derivation of cost function for logistic regressionDerivation of cost function for logistic regression
Derivation of cost function for logistic regression
 
Support Vector machine
Support Vector machineSupport Vector machine
Support Vector machine
 
Introduction To Multilevel Association Rule And Its Methods
Introduction To Multilevel Association Rule And Its MethodsIntroduction To Multilevel Association Rule And Its Methods
Introduction To Multilevel Association Rule And Its Methods
 
Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
 
Unit Test and TDD
Unit Test and TDDUnit Test and TDD
Unit Test and TDD
 

Similar to Deep Learning for Dialogue Systems

Deep Learning for Dialogue Modeling - NTHU
Deep Learning for Dialogue Modeling - NTHUDeep Learning for Dialogue Modeling - NTHU
Deep Learning for Dialogue Modeling - NTHUYun-Nung (Vivian) Chen
 
[系列活動] 一天搞懂對話機器人
[系列活動] 一天搞懂對話機器人[系列活動] 一天搞懂對話機器人
[系列活動] 一天搞懂對話機器人台灣資料科學年會
 
One Day for Bot 一天搞懂聊天機器人
One Day for Bot 一天搞懂聊天機器人One Day for Bot 一天搞懂聊天機器人
One Day for Bot 一天搞懂聊天機器人Yun-Nung (Vivian) Chen
 
2017 Tutorial - Deep Learning for Dialogue Systems
2017 Tutorial - Deep Learning for Dialogue Systems2017 Tutorial - Deep Learning for Dialogue Systems
2017 Tutorial - Deep Learning for Dialogue SystemsMLReview
 
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)AI Frontiers
 
Statistical Learning from Dialogues for Intelligent Assistants
Statistical Learning from Dialogues for Intelligent AssistantsStatistical Learning from Dialogues for Intelligent Assistants
Statistical Learning from Dialogues for Intelligent AssistantsYun-Nung (Vivian) Chen
 
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli..."Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...Yun-Nung (Vivian) Chen
 
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...Yun-Nung (Vivian) Chen
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processingpunedevscom
 
Identify Development Pains and Resolve Them with Idea Flow
Identify Development Pains and Resolve Them with Idea FlowIdentify Development Pains and Resolve Them with Idea Flow
Identify Development Pains and Resolve Them with Idea FlowTechWell
 
VOC real world enterprise needs
VOC real world enterprise needsVOC real world enterprise needs
VOC real world enterprise needsIvan Berlocher
 
End-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
End-to-End Joint Learning of Natural Language Understanding and Dialogue ManagerEnd-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
End-to-End Joint Learning of Natural Language Understanding and Dialogue ManagerYun-Nung (Vivian) Chen
 
Language Empowering Intelligent Assistants (CHT)
Language Empowering Intelligent Assistants (CHT)Language Empowering Intelligent Assistants (CHT)
Language Empowering Intelligent Assistants (CHT)Yun-Nung (Vivian) Chen
 
[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systemsQi He
 
Yun-Nung (Vivian) Chen - 2015 - Unsupervised Learning and Modeling of Knowled...
Yun-Nung (Vivian) Chen - 2015 - Unsupervised Learning and Modeling of Knowled...Yun-Nung (Vivian) Chen - 2015 - Unsupervised Learning and Modeling of Knowled...
Yun-Nung (Vivian) Chen - 2015 - Unsupervised Learning and Modeling of Knowled...Association for Computational Linguistics
 
IRJET- Querying Database using Natural Language Interface
IRJET-  	  Querying Database using Natural Language InterfaceIRJET-  	  Querying Database using Natural Language Interface
IRJET- Querying Database using Natural Language InterfaceIRJET Journal
 
KiwiPyCon 2014 - NLP with Python tutorial
KiwiPyCon 2014 - NLP with Python tutorialKiwiPyCon 2014 - NLP with Python tutorial
KiwiPyCon 2014 - NLP with Python tutorialAlyona Medelyan
 
Towards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken ContentTowards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken ContentNVIDIA Taiwan
 

Similar to Deep Learning for Dialogue Systems (20)

Deep Learning for Dialogue Modeling - NTHU
Deep Learning for Dialogue Modeling - NTHUDeep Learning for Dialogue Modeling - NTHU
Deep Learning for Dialogue Modeling - NTHU
 
[系列活動] 一天搞懂對話機器人
[系列活動] 一天搞懂對話機器人[系列活動] 一天搞懂對話機器人
[系列活動] 一天搞懂對話機器人
 
One Day for Bot 一天搞懂聊天機器人
One Day for Bot 一天搞懂聊天機器人One Day for Bot 一天搞懂聊天機器人
One Day for Bot 一天搞懂聊天機器人
 
2017 Tutorial - Deep Learning for Dialogue Systems
2017 Tutorial - Deep Learning for Dialogue Systems2017 Tutorial - Deep Learning for Dialogue Systems
2017 Tutorial - Deep Learning for Dialogue Systems
 
Chatbot的智慧與靈魂
Chatbot的智慧與靈魂Chatbot的智慧與靈魂
Chatbot的智慧與靈魂
 
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
 
Statistical Learning from Dialogues for Intelligent Assistants
Statistical Learning from Dialogues for Intelligent AssistantsStatistical Learning from Dialogues for Intelligent Assistants
Statistical Learning from Dialogues for Intelligent Assistants
 
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli..."Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...
 
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Identify Development Pains and Resolve Them with Idea Flow
Identify Development Pains and Resolve Them with Idea FlowIdentify Development Pains and Resolve Them with Idea Flow
Identify Development Pains and Resolve Them with Idea Flow
 
DeepPavlov 2019
DeepPavlov 2019DeepPavlov 2019
DeepPavlov 2019
 
VOC real world enterprise needs
VOC real world enterprise needsVOC real world enterprise needs
VOC real world enterprise needs
 
End-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
End-to-End Joint Learning of Natural Language Understanding and Dialogue ManagerEnd-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
End-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
 
Language Empowering Intelligent Assistants (CHT)
Language Empowering Intelligent Assistants (CHT)Language Empowering Intelligent Assistants (CHT)
Language Empowering Intelligent Assistants (CHT)
 
[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems
 
Yun-Nung (Vivian) Chen - 2015 - Unsupervised Learning and Modeling of Knowled...
Yun-Nung (Vivian) Chen - 2015 - Unsupervised Learning and Modeling of Knowled...Yun-Nung (Vivian) Chen - 2015 - Unsupervised Learning and Modeling of Knowled...
Yun-Nung (Vivian) Chen - 2015 - Unsupervised Learning and Modeling of Knowled...
 
IRJET- Querying Database using Natural Language Interface
IRJET-  	  Querying Database using Natural Language InterfaceIRJET-  	  Querying Database using Natural Language Interface
IRJET- Querying Database using Natural Language Interface
 
KiwiPyCon 2014 - NLP with Python tutorial
KiwiPyCon 2014 - NLP with Python tutorialKiwiPyCon 2014 - NLP with Python tutorial
KiwiPyCon 2014 - NLP with Python tutorial
 
Towards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken ContentTowards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken Content
 

More from Yun-Nung (Vivian) Chen

End-to-End Task-Completion Neural Dialogue Systems
End-to-End Task-Completion Neural Dialogue SystemsEnd-to-End Task-Completion Neural Dialogue Systems
End-to-End Task-Completion Neural Dialogue SystemsYun-Nung (Vivian) Chen
 
How the Context Matters Language and Interaction in Dialogues
How the Context Matters Language and Interaction in DialoguesHow the Context Matters Language and Interaction in Dialogues
How the Context Matters Language and Interaction in DialoguesYun-Nung (Vivian) Chen
 
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...Yun-Nung (Vivian) Chen
 
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...Yun-Nung (Vivian) Chen
 
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...Yun-Nung (Vivian) Chen
 
Automatic Key Term Extraction from Spoken Course Lectures
Automatic Key Term Extraction from Spoken Course LecturesAutomatic Key Term Extraction from Spoken Course Lectures
Automatic Key Term Extraction from Spoken Course LecturesYun-Nung (Vivian) Chen
 
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...Yun-Nung (Vivian) Chen
 
Automatic Key Term Extraction and Summarization from Spoken Course Lectures
Automatic Key Term Extraction and Summarization from Spoken Course LecturesAutomatic Key Term Extraction and Summarization from Spoken Course Lectures
Automatic Key Term Extraction and Summarization from Spoken Course LecturesYun-Nung (Vivian) Chen
 
Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...
Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...
Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...Yun-Nung (Vivian) Chen
 
An Intelligent Assistant for High-Level Task Understanding
An Intelligent Assistant for High-Level Task UnderstandingAn Intelligent Assistant for High-Level Task Understanding
An Intelligent Assistant for High-Level Task UnderstandingYun-Nung (Vivian) Chen
 
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...Yun-Nung (Vivian) Chen
 

More from Yun-Nung (Vivian) Chen (11)

End-to-End Task-Completion Neural Dialogue Systems
End-to-End Task-Completion Neural Dialogue SystemsEnd-to-End Task-Completion Neural Dialogue Systems
End-to-End Task-Completion Neural Dialogue Systems
 
How the Context Matters Language and Interaction in Dialogues
How the Context Matters Language and Interaction in DialoguesHow the Context Matters Language and Interaction in Dialogues
How the Context Matters Language and Interaction in Dialogues
 
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...
 
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
 
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...
 
Automatic Key Term Extraction from Spoken Course Lectures
Automatic Key Term Extraction from Spoken Course LecturesAutomatic Key Term Extraction from Spoken Course Lectures
Automatic Key Term Extraction from Spoken Course Lectures
 
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...
 
Automatic Key Term Extraction and Summarization from Spoken Course Lectures
Automatic Key Term Extraction and Summarization from Spoken Course LecturesAutomatic Key Term Extraction and Summarization from Spoken Course Lectures
Automatic Key Term Extraction and Summarization from Spoken Course Lectures
 
Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...
Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...
Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...
 
An Intelligent Assistant for High-Level Task Understanding
An Intelligent Assistant for High-Level Task UnderstandingAn Intelligent Assistant for High-Level Task Understanding
An Intelligent Assistant for High-Level Task Understanding
 
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

Deep Learning for Dialogue Systems

  • 1.
  • 2. 2 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG)  Recent Trends and Challenges  End-to-End Neural Dialogue System  Multimodality  Dialogue Breath  Dialogue Depth 2
  • 4. 4 Language Empowering Intelligent Assistant Apple Siri (2011) Google Now (2012) Facebook M & Bot (2015) Google Home (2016) Microsoft Cortana (2014) Amazon Alexa/Echo (2014) Google Assistant (2016)
  • 5. 5 Why We Need?  Get things done  E.g. set up alarm/reminder, take note  Easy access to structured data, services and apps  E.g. find docs/photos/restaurants  Assist your daily schedule and routine  E.g. commute alerts to/from work  Be more productive in managing your work and personal life 5
  • 6. 6 Why Natural Language?  Global Digital Statistics (2015 January) 6 Global Population 7.21B Active Internet Users 3.01B Active Social Media Accounts 2.08B Active Unique Mobile Users 3.65B The more natural and convenient input of devices evolves towards speech.
  • 7. 7 Spoken Dialogue System (SDS)  Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions.  Spoken dialogue systems are being incorporated into various devices (smart-phones, smart TVs, in-car navigating system, etc). 7 JARVIS – Iron Man’s Personal Assistant Baymax – Personal Healthcare Companion Good dialogue systems assist users to access information conveniently and finish tasks efficiently.
  • 8. 8 App  Bot  A bot is responsible for a “single” domain, similar to an app 8 Users can initiate dialogues instead of following the GUI design
  • 9. 9 GUI v.s. CUI (Conversational UI) 9 https://github.com/enginebai/Movie-lol-android
  • 10. 10 GUI v.s. CUI (Conversational UI) Website/APP’s GUI Msg’s CUI Situation Navigation, no specific goal Searching, with specific goal Information Quantity More Less Information Precision Low High Display Structured Non-structured Interface Graphics Language Manipulation Click mainly use texts or speech as input Learning Need time to learn and adapt No need to learn Entrance App download Incorporated in any msg-based interface Flexibility Low, like machine manipulation High, like converse with a human 10
  • 11. Two Branches of Bots  Personal assistant, helps users achieve a certain task  Combination of rules and statistical components  POMDP for spoken dialog systems (Williams and Young, 2007)  End-to-end trainable task-oriented dialogue system (Wen et al., 2016)  End-to-end reinforcement learning dialogue system (Li et al., 2017; Zhao and Eskenazi, 2016)  No specific goal, focus on natural responses  Using variants of seq2seq model  A neural conversation model (Vinyals and Le, 2015)  Reinforcement learning for dialogue generation (Li et al., 2016)  Conversational contextual cues for response ranking (AI-Rfou et al., 2016) 11 Task-Oriented Bot Chit-Chat Bot
  • 12. 12 Task-Oriented Dialogue System (Young, 2000) 12 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  • 13. 13 Interaction Example 13 User Intelligent Agent Q: How does a dialogue system process this request? Good Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there. find a good eating place for taiwanese food
  • 14. 14 Task-Oriented Dialogue System (Young, 2000) 14 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers
  • 15. 15 1. Domain Identification Requires Predefined Domain Ontology 15 find a good eating place for taiwanese food User Organized Domain Knowledge (Database)Intelligent Agent Restaurant DB Taxi DB Movie DB Classification!
  • 16. 16 2. Intent Detection Requires Predefined Schema 16 find a good eating place for taiwanese food User Intelligent Agent Restaurant DB FIND_RESTAURANT FIND_PRICE FIND_TYPE : Classification!
  • 17. 17 3. Slot Filling Requires Predefined Schema find a good eating place for taiwanese food User Intelligent Agent 17 Restaurant DB Restaurant Rating Type Rest 1 good Taiwanese Rest 2 bad Thai : : : FIND_RESTAURANT rating=“good” type=“taiwanese” SELECT restaurant { rest.rating=“good” rest.type=“taiwanese” }Semantic Frame Sequence Labeling O O B-rating O O O B-type O
  • 18. 18 Task-Oriented Dialogue System (Young, 2000) 18 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers
  • 19. 19 State Tracking Requires Hand-Crafted States User Intelligent Agent find a good eating place for taiwanese food 19 location rating type loc, rating rating, type loc, type all i want it near to my office NULL
  • 20. 20 State Tracking Requires Hand-Crafted States User Intelligent Agent find a good eating place for taiwanese food 20 location rating type loc, rating rating, type loc, type all i want it near to my office NULL
  • 21. 21 State Tracking Handling Errors and Confidence User Intelligent Agent find a good eating place for taixxxx food 21 FIND_RESTAURANT rating=“good” type=“taiwanese” FIND_RESTAURANT rating=“good” type=“thai” FIND_RESTAURANT rating=“good” location rating type loc, rating rating, type loc, type all NULL ? ? rating=“good” , type=“thai” rating=“good”, type=“taiwanese” ? ?
  • 22. 22 Dialogue Policy for Agent Action  Inform(location=“Taipei 101”)  “The nearest one is at Taipei 101”  Request(location)  “Where is your home?”  Confirm(type=“taiwanese”)  “Did you want Taiwanese food?” 22
  • 23. 23 Task-Oriented Dialogue System (Young, 2000) Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text Input Are there any action movies to see this weekend? Speech Signal Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Backend Action / Knowledge Providers Natural Language Generation (NLG)Text response Where are you located?
  • 24. 24 Output / Natural Language Generation  Goal: generate natural language or GUI given the selected dialogue action for interactions  Inform(location=“Taipei 101”)  “The nearest one is at Taipei 101” v.s.  Request(location)  “Where is your home?” v.s.  Confirm(type=“taiwanese”)  “Did you want Taiwanese food?” v.s. 24
  • 25. Background Knowledge25 Neural Network Basics Reinforcement Learning
  • 26. 26 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG)  Recent Trends and Challenges  End-to-End Neural Dialogue System  Multimodality  Dialogue Breath  Dialogue Depth 26
  • 27. 27 Program for Solving Tasks  Task: predicting positive or negative given a product review 27 “I love this product!” “It’s a little expensive.”“It claims too much.” + ?- program.py Some tasks are complex, and we don’t know how to write a program to solve them. if input contains “love”, “like”, etc. output = positive if input contains “too much”, “bad”, etc. output = negative program.py program.py “I love this product!” “It’s a little expensive.”“It claims too much.” + ?- f f f Given a large amount of data, the machine learns what the function f should be.
  • 28. 28 Learning ≈ Looking for a Function  Speech Recognition  Image Recognition  Go Playing  Chat Bot  f  f  f  f cat “你好 (Hello) ” 5-5 (next move) “Where is HKUST?” “The address is…”
  • 30. 30 A Single Neuron z 1w 2w Nw … 1x 2x Nx  b  z  z zbias y   z e z    1 1  Sigmoid function Activation function 1 w, b are the parameters of this neuron 30
  • 31. 31 A Single Neuron z 1w 2w Nw …1x 2x Nx  b bias y 1      5.0"2" 5.0"2" ynot yis A single neuron can only handle binary classification 31 MN RRf :
  • 32. 32 A Layer of Neurons  Handwriting digit classification MN RRf : A layer of neurons can handle multiple possible output, and the result depends on the max one … 1x 2x Nx  1  1y  … … “1” or not “2” or not “3” or not 2y 3y 10 neurons/10 classes Which one is max?
  • 33. 33 Deep Neural Networks (DNN)  Fully connected feedforward network 1x 2x …… Layer 1 …… 1y 2y …… Layer 2 …… Layer L …… …… …… Input Output MyNx vector x vector y Deep NN: multiple hidden layers MN RRf :
  • 34. 34 Recurrent Neural Network (RNN) http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/ : tanh, ReLU time RNN can learn accumulated sequential information (time-series)
  • 35. 35 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG)  Recent Trends and Challenges  End-to-End Neural Dialogue System  Multimodality  Dialogue Breath  Dialogue Depth 35
  • 36. 36 Reinforcement Learning  RL is a general purpose framework for decision making  RL is for an agent with the capacity to act  Each action influences the agent’s future state  Success is measured by a scalar reward signal  Goal: select actions to maximize future reward
  • 37. Scenario of Reinforcement Learning Agent learns to take actions to maximize expected reward. 37 Environment Observation ot Action at Reward rt If win, reward = 1 If loss, reward = -1 Otherwise, reward = 0 Next Move
  • 38. 38 Supervised v.s. Reinforcement  Supervised  Reinforcement 38 Hello  Agent …… Agent ……. ……. …… Bad “Hello” Say “Hi” “Bye bye” Say “Good bye” Learning from teacher Learning from critics
  • 39. 39 Sequential Decision Making  Goal: select actions to maximize total future reward  Actions may have long-term consequences  Reward may be delayed  It may be better to sacrifice immediate reward to gain more long-term reward 39
  • 40. 40 Deep Reinforcement Learning Environment Observation Action Reward Function Input Function Output Used to pick the best function … … … DNN
  • 42. 42 Task-Oriented Dialogue System (Young, 2000) 42 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking (DST) • Dialogue Policy Natural Language Generation (NLG) Hypothesis are there any action movies to see this weekend Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location Text response Where are you located? Text Input Are there any action movies to see this weekend? Speech Signal Backend Action / Knowledge Providers http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
  • 43. 43 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG)  Recent Trends and Challenges  End-to-End Neural Dialogue System  Multimodality  Dialogue Breath  Dialogue Depth 43
  • 44. 44 Language Understanding (LU)  Pipelined 44 1. Domain Classification 2. Intent Classification 3. Slot Filling
  • 45. LU – Domain/Intent Classification • Given a collection of utterances ui with labels ci, D= {(u1,c1),…,(un,cn)} where ci ∊ C, train a model to estimate labels for new utterances uk. Mainly viewed as an utterance classification task 45 find me a cheap taiwanese restaurant in oakland Movies Restaurants Sports Weather Music … Find_movie Buy_tickets Find_restaurant Book_table Find_lyrics …
  • 46. 46 Deep Neural Networks for Domain/Intent Classification (Ravuri and Stolcke, 2015)  RNN and LSTMs for utterance classification 46 https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/RNNLM_addressee.pdf Intent decision after reading all words performs better
  • 47. LU – Slot Filling 47 flights from Boston to New York today O O B-city O B-city I-city O O O B-dept O B-arrival I-arrival B-date As a sequence tagging task • Given a collection tagged word sequences, S={((w1,1,w1,2,…, w1,n1), (t1,1,t1,2,…,t1,n1)), ((w2,1,w2,2,…,w2,n2), (t2,1,t2,2,…,t2,n2)) …} where ti ∊ M, the goal is to estimate tags for a new word sequence. flights from Boston to New York today Entity Tag Slot Tag
  • 48. 48 Recurrent Neural Nets for Slot Tagging – I (Yao et al, 2013; Mesnil et al, 2015)  Variations: a. RNNs with LSTM cells b. Input, sliding window of n-grams c. Bi-directional LSTMs 𝑤0 𝑤1 𝑤2 𝑤 𝑛 ℎ0 𝑓 ℎ1 𝑓 ℎ2 𝑓 ℎ 𝑛 𝑓 ℎ0 𝑏 ℎ1 𝑏 ℎ2 𝑏 ℎ 𝑛 𝑏 𝑦0 𝑦1 𝑦2 𝑦𝑛 (b) LSTM-LA (c) bLSTM 𝑦0 𝑦1 𝑦2 𝑦𝑛 𝑤0 𝑤1 𝑤2 𝑤 𝑛 ℎ0 ℎ1 ℎ2 ℎ 𝑛 (a) LSTM 𝑦0 𝑦1 𝑦2 𝑦𝑛 𝑤0 𝑤1 𝑤2 𝑤 𝑛 ℎ0 ℎ1 ℎ2 ℎ 𝑛 http://131.107.65.14/en-us/um/people/gzweig/Pubs/Interspeech2013RNNLU.pdf; http://dl.acm.org/citation.cfm?id=2876380
  • 49. 49 Recurrent Neural Nets for Slot Tagging – II (Kurata et al., 2016; Simonnet et al., 2015)  Encoder-decoder networks  Leverages sentence level information  Attention-based encoder- decoder  Use of attention (as in MT) in the encoder-decoder network  Attention is estimated using a feed-forward network with input: ht and st at time t 𝑦0 𝑦1 𝑦2 𝑦𝑛 𝑤 𝑛 𝑤2 𝑤1 𝑤0 ℎ 𝑛 ℎ2 ℎ1 ℎ0 𝑤0 𝑤1 𝑤2 𝑤 𝑛 𝑦0 𝑦1 𝑦2 𝑦𝑛 𝑤0 𝑤1 𝑤2 𝑤 𝑛 ℎ0 ℎ1 ℎ2 ℎ 𝑛 𝑠0 𝑠1 𝑠2 𝑠 𝑛 ci ℎ0 ℎ 𝑛… http://www.aclweb.org/anthology/D16-1223
  • 50. ht- 1 ht+ 1 ht W W W W taiwanese B-type U food U please U V O V O V hT+1 EOS U FIND_RES T V Slot Filling Intent Prediction Joint Semantic Frame Parsing Sequence- based (Hakkani-Tur et al., 2016) • Slot filling and intent prediction in the same output sequence Parallel (Liu and Lane, 2016) • Intent prediction and slot filling are performed in two branches 50 https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_MultiJoint.pdf; https://arxiv.org/abs/1609.01454
  • 51. 51 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG)  Recent Trends and Challenges  End-to-End Neural Dialogue System  Multimodality  Dialogue Breath  Dialogue Depth 51
  • 52. 52 Elements of Dialogue Management 52(Figure from Gašić) Dialogue State Tracking
  • 53. 53 Dialogue State Tracking (DST)  Maintain a probabilistic distribution instead of a 1-best prediction for better robustness 53Slide credited by Sungjin Lee Incorrect for both!
  • 54. 54 Dialogue State Tracking (DST)  Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input 54 How can I help you? Book a table at Sumiko for 5 How many people? 3 Slot Value # people 5 (0.5) time 5 (0.5) Slot Value # people 3 (0.8) time 5 (0.8)
  • 55. 55 Neural Belief Tracker (Henderson et al., 2013; Henderson et al., 2014; Mrkšić et al., 2015; Mrkšić et al., 2016) 55(Figure from Wen et al, 2016) http://www.anthology.aclweb.org/W/W13/W13-4073.pdf; https://arxiv.org/abs/1506.07190; https://arxiv.org/abs/1606.03777
  • 56. 56 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG)  Recent Trends and Challenges  End-to-End Neural Dialogue System  Multimodality  Dialogue Breath  Dialogue Depth 56
  • 57. 57 Elements of Dialogue Management 57(Figure from Gašić) Dialogue Policy Optimization
  • 58. 58 Dialogue Policy Optimization  Dialogue management in a RL framework 58 U s e r Reward R Observation OAction A Environment Agent Natural Language Generation Language Understanding Dialogue Manager Slides credited by Pei-Hao Su Optimized dialogue policy selects the best action that can maximize the future reward. Correct rewards are a crucial factor in dialogue policy training
  • 59. 59 Dialogue Reinforcement Learning Signal Typical reward function  -1 for per turn penalty  Large reward at completion if successful Typically requires domain knowledge ✔ Simulated user ✔ Paid users (Amazon Mechanical Turk) ✖ Real users ||| … ﹅ 59 The user simulator is usually required for dialogue system training before deployment
  • 60. 61 Deep Q-Networks for Policy (Li et al., 2017)  Deep RL for training DM  Input: current semantic frame observation, database returned results  Output: system action 61 Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location DQN-based Dialogue Management (DM) Simulated User Backend DB https://arxiv.org/abs/1703.01008
  • 61. 62 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG)  Recent Trends and Challenges  End-to-End Neural Dialogue System  Multimodality  Dialogue Breath  Dialogue Depth 62
  • 62. 63 Natural Language Generation (NLG)  Mapping semantic frame into natural language inform(name=Seven_Days, foodtype=Chinese) Seven Days is a nice Chinese restaurant 63
  • 63. 64 Template-Based NLG  Define a set of rules to map frames to NL 64 Pros: simple, error-free, easy to control Cons: time-consuming, poor scalability Semantic Frame Natural Language confirm() “Please tell me more about the product your are looking for.” confirm(area=$V) “Do you want somewhere in the $V?” confirm(food=$V) “Do you want a $V restaurant?” confirm(food=$V,area=$W) “Do you want a $V restaurant in the $W.”
  • 64. 65 RNN Language Generator (Wen et al., 2015) <BOS> SLOT_NAME serves SLOT_FOOD . <BOS> EAT serves British . delexicalisation Inform(name=EAT, food=British) 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0… … dialog act 1-hot representation SLOT_NAME serves SLOT_FOOD . <EOS> 65
  • 65. End-to-End Learning for Dialogues Multimodality Dialogue Breath Dialogue Depth Recent Trends and Challenges66
  • 66. 67 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG)  Recent Trends and Challenges  End-to-End Neural Dialogue System  Multimodality  Dialogue Breath  Dialogue Depth 67
  • 67. 68 E2E Joint NLU and DM (Yang et al., 2017)  Idea: errors from DM can be propagated to NLU for better robustness DM 68 Model DM NLU Baseline (CRF+SVMs) 7.7 33.1 Pipeline-BLSTM 12.0 36.4 JointModel 22.8 37.4 Both DM and NLU performance is improved https://arxiv.org/abs/1612.00913
  • 68. 69 0 0 0 … 0 1 Database Operator Copy field … Database Sevendays CurryPrince Nirala RoyalStandard LittleSeuol DB pointer Can I have korean Korean 0.7 British 0.2 French 0.1 … Belief Tracker Intent Network Can I have <v.food> E2E Supervised Dialogue System (Wen et al., 2016) Generation Network <v.name> serves great <v.food> . Policy Network 69 zt pt xt MySQL query: “Select * where food=Korean” qt https://arxiv.org/abs/1604.04562
  • 69. 70 E2E RL-Based Info-Bot (Dhingra et al., 2016) Movie=?; Actor=Bill Murray; Release Year=1993 Find me the Bill Murray’s movie. I think it came out in 1993. When was it released? Groundhog Day is a Bill Murray movie which came out in 1993. KB-InfoBot User (Groundhog Day, actor, Bill Murray) (Groundhog Day, release year, 1993) (Australia, actor, Nicole Kidman) (Mad Max: Fury Road, release year, 2015) Knowledge Base (head, relation, tail) 70Idea: differentiable database for propagating the gradients https://arxiv.org/abs/1609.00777
  • 70. 71 E2E Task-Completion Bot (TC-Bot) (Li et al., 2017) wi B- type wi +1 wi+2 O O EOS <intent > wi B- type wi +1 wi+2 O O EOS <intent > Semantic Frame request_movie genre=action, date=this weekend System Action / Policy request_location User Dialogue Action Inform(location=San Francisco) Time t-1 wi <slot> wi +1 wi+2 O O EOS <intent> Language Understanding (LU) Time t-2 Time t Dialogue Management (DM) w0 w1 w2 Natural Language Generation (NLG) EOS User Goal User Agenda Modeling User Simulator End-to-End Neural Dialogue System Text Input Are there any action movies to see this weekend? Idea: supervised learning for each component and reinforcement learning for end-to-end training the neural dialogue system 71 https://arxiv.org/abs/1703.01008
  • 71. 72 E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)  User goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle. RULE BASED SYSTEM User: Can I get 2 tickets for the witch? Agent: Which theater do you want? User: Regal meridian 16 Agent: What time would you like to see it? User: 9:30 pm Agent: What date would you like to see it? User: Tomorrow! Agent: How many people are you? User: Just two adults. Agent: What is your location? User: Seattle! Agent: What kind of video format do you like? User: Oh, any kind is okay. Agent: Do you have any price constraint? User: No, I don’t care. Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm. User: Thank you. Agent: Thank you! 72 The system can learn how to efficiently interact with users for task completion REINFORCEMENT LEARNING SYSTEM User: Can I get 2 tickets for the witch? Agent: What date would you like to see it? User: Tomorrow Agent: Which theater do you prefer? User: Regal meridian 16 Agent: What is your location? User: Seattle Agent: What time would you like to see it? User: 9:30 pm. Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm. User: Thanks. Agent: Thanks! https://arxiv.org/abs/1703.01008
  • 72. 73 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG)  Recent Trends and Challenges  End-to-End Neural Dialogue System  Multimodality  Dialogue Breath  Dialogue Depth 73
  • 73. 74 Brain Signal for Understanding 74  Misunderstanding detection by brain signal  Green: listen to the correct answer  Red: listen to the wrong answer http://dl.acm.org/citation.cfm?id=2388695 Detecting misunderstanding via brain signal in order to correct the understanding results
  • 74. 75 Video for Intent Understanding 75 Proactive (from camera) I want to see a movie on TV! Intent: turn_on_tv Sir, may I turn on the TV for you? Proactively understanding user intent to initiate the dialogues.
  • 75. 76 App Behavior for Understanding 76  Task: user intent prediction  Challenge: language ambiguity  User preference  Some people prefer “Message” to “Email”  Some people prefer “Outlook” to “Gmail”  App-level contexts  “Message” is more likely to follow “Camera”  “Email” is more likely to follow “Excel” send to vivian v.s. Email? Message? Communication Considering behavioral patterns in history to model understanding for intent prediction.
  • 76. 77 Evolution Roadmap 77 Dialogue breadth (coverage) Dialoguedepth(complexity) What is influenza? I’ve got a cold what do I do? Tell me a joke. I feel sad…
  • 77. 78 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG)  Recent Trends and Challenges  End-to-End Neural Dialogue System  Multimodality  Dialogue Breath  Dialogue Depth 78
  • 78. 79 Evolution Roadmap 79 Single domain systems Extended systems Multi- domain systems Open domain systems Dialogue breadth (coverage) Dialoguedepth(complexity) What is influenza? I’ve got a cold what do I do? Tell me a joke. I feel sad…
  • 79. 80 Intent Expansion (Chen et al., 2016)  Transfer dialogue acts across domains  Dialogue acts are similar for multiple domains  Learning new intents by information from other domains CDSSM New Intent Intent Representation 1 2 K : Embedding Generation K+1 K+2<change_calender> Training Data <change_note> “adjust my note” : <change_setting> “volume turn down” The dialogue act representations can be automatically learned for other domains http://ieeexplore.ieee.org/abstract/document/7472838/ postpone my meeting to five pm
  • 80. 81 Policy for Domain Adaptation (Gašić et al., 2015)  Bayesian committee machine (BCM) enables estimated Q-function to share knowledge across domains QRDR QHDH QL DL Committee Model The policy from a new domain can be boosted by the committee policy http://ieeexplore.ieee.org/abstract/document/7404871/
  • 81. 82 Outline  Introduction  Background Knowledge  Neural Network Basics  Reinforcement Learning  Modular Dialogue System  Spoken/Natural Language Understanding (SLU/NLU)  Dialogue Management  Dialogue State Tracking (DST)  Dialogue Policy Optimization  Natural Language Generation (NLG)  Recent Trends and Challenges  End-to-End Neural Dialogue System  Multimodality  Dialogue Breath  Dialogue Depth 82
  • 82. 83 Evolution Roadmap 83 Knowledge based system Common sense system Empathetic systems Dialogue breadth (coverage) Dialoguedepth(complexity) What is influenza? I’ve got a cold what do I do? Tell me a joke. I feel sad…
  • 83. 84 High-Level Intention for Dialogue Planning (Sun et al., 2016; Sun et al., 2016)  High-level intention may span several domains Schedule a lunch with Vivian. find restaurant check location contact play music What kind of restaurants do you prefer? The distance is … Should I send the restaurant information to Vivian? Users can interact via high-level descriptions and the system learns how to plan the dialogues http://dl.acm.org/citation.cfm?id=2856818; http://www.lrec-conf.org/proceedings/lrec2016/pdf/75_Paper.pdf
  • 84. 85 Empathy in Dialogue System (Fung et al., 2016)  Embed an empathy module  Recognize emotion using multimodality  Generate emotion-aware responses 85Emotion Recognizer vision speech text https://arxiv.org/abs/1605.04072
  • 86. 87 Concluding Remarks 87 Human-Robot interfaces is a hot topic but several components must be integrated! Most state-of-the-art technologies are based on DNN • Requires huge amounts of labeled data • Several frameworks/models are available Fast domain adaptation with scarse data + re-use of rules/knowledge Handling reasoning Data collection and analysis from un-structured data Complex-cascade systems requires high accuracy for working good as a whole
  • 87. THANKS FOR ATTENTION! Q & A We thanks Tsung-Hsien Wen, Pei-Hao Su, Li Deng, Sungjin Lee, Milica Gašić, Lihong Li for sharing their slides