A Graph-based Cross-lingual Projection Approach for Spoken Language Understanding Portability to a New Language
Seokhwan Kim
Human Language Technology Department, Institute for Infocomm Research, Singapore
Introduction
Statistical approaches to SLU require a sufficient number of
training examples to obtain good results
Cross-lingual SLU using SMT technologies can improve the
portability of SLU to a new language
Previous work on cross-lingual SLU has focused on filtering
out or correcting noisy translations as post-processing
We propose a graph-based projection approach to improve
robustness to translation errors in cross-lingual SLU
Cross-lingual SLU Using SMT
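[Figure: TrainOnTarget pipeline. Training: the dataset in Ls is translated by SMT from Ls to Lt, and the translated dataset in Lt is used to train the SLU model in Lt. Test: user input in Lt is passed to the SLU model in Lt.]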
Annotations for a given word sequence x = {x1, ..., xn}
NE: an NE tag sequence y = {y1, ..., yn}
DA: a class variable z
Example
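xs: Show me flights to New York on Nov 18th
ys: o o o o to.city-b to.city-i o month-b day-b
zs: show_flight
xt: (Korean translation of xs, nine tokens; the characters are not recoverable from the source)
yt: to.city-b, month-b and day-b on the corresponding words; o on the remaining six words
zt: show_flight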
Direct Projection
The simplest way of projection
It propagates annotations using only the word alignments
It considers only the translation of each single utterance
It is performed in a single pass
The results of direct projection can be unreliable because of
erroneous translations and word alignments
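The single-pass behaviour described above can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `direct_projection`, the alignment pairs, and the rule of keeping the first tag when two source words align to the same target word are all assumptions made for the example.

```python
from typing import List, Tuple


def direct_projection(
    src_tags: List[str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Copy NE tags from source words to their aligned target words.

    alignment holds (source_index, target_index) pairs, e.g. as produced
    by a word aligner. Unaligned target words keep the tag "o".
    """
    tgt_tags = ["o"] * tgt_len
    for s, t in sorted(alignment):
        # Assumption: the first non-"o" tag to reach a target word wins.
        if src_tags[s] != "o" and tgt_tags[t] == "o":
            tgt_tags[t] = src_tags[s]
    return tgt_tags


# "Show me flights to New York on Nov 18th"
src_tags = ["o", "o", "o", "o", "to.city-b", "to.city-i", "o", "month-b", "day-b"]
# Hypothetical word alignment to a nine-token target utterance.
alignment = [(4, 0), (5, 0), (3, 1), (7, 2), (8, 3), (2, 4), (0, 7), (1, 8)]
projected = direct_projection(src_tags, alignment, tgt_len=9)
# The DA label needs no alignment: zt is simply copied from zs.
```

Because each utterance is handled in isolation, one wrong alignment pair directly yields one wrong target tag, which is the weakness noted above.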
Graph-based Projection
Graph Construction for NE
Nodes
All trigrams in the dataset
Edges
Monolingual: $w(v_i, v_j) = \mathrm{sim}_{\mathrm{cosine}}(f(v_i), f(v_j)) = \frac{f(v_i) \cdot f(v_j)}{|f(v_i)|\,|f(v_j)|}$
Bilingual: $w(v_s^k, v_t^l) = \frac{\mathrm{count}(v_s^k, v_t^l)}{\sum_{v_t^m} \mathrm{count}(v_s^k, v_t^m)}$
Initial values
Based on the manual annotations of NE in Ls
[Figure: NE graph over source trigram nodes vs and target trigram nodes vt]
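The two edge weights of the NE graph can be computed along the following lines. This is a sketch under stated assumptions: the poster does not say what the feature vector f(v) contains, so it is represented here as a sparse dictionary of hypothetical features, and `aligned_pairs` stands for trigram pairs collected from word-aligned translations. The helper names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def cosine(f_i: Dict[str, float], f_j: Dict[str, float]) -> float:
    """Monolingual weight: cosine similarity of two sparse feature vectors."""
    dot = sum(val * f_j.get(key, 0.0) for key, val in f_i.items())
    norm_i = math.sqrt(sum(v * v for v in f_i.values()))
    norm_j = math.sqrt(sum(v * v for v in f_j.values()))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0
    return dot / (norm_i * norm_j)


def bilingual_weights(
    aligned_pairs: Iterable[Tuple[Hashable, Hashable]],
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Bilingual weight: count(vs, vt) normalised over all vt aligned to vs."""
    counts = Counter(aligned_pairs)
    totals: Counter = Counter()
    for (vs, _vt), c in counts.items():
        totals[vs] += c
    return {(vs, vt): c / totals[vs] for (vs, vt), c in counts.items()}


# Hypothetical data: source trigram "A" aligns twice with "X" and once with "Y".
weights = bilingual_weights([("A", "X"), ("A", "X"), ("A", "Y"), ("B", "Y")])
```

For each source trigram, the bilingual weights over its aligned target trigrams sum to 1, so they can be read as relative alignment frequencies.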
Graph Construction for DA
Nodes
Utterance nodes U = {u1, · · · , um}
Trigram nodes V
Edges
The edge between ui and vj has a binary weight indicating whether vj occurs in ui
Initial values
Based on the manual annotations of DA in Ls
[Figure: DA graph over source utterance nodes us, source trigram nodes vs, target trigram nodes vt, and target utterance nodes ut]
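The utterance-to-trigram edges of the DA graph could be built as in the sketch below. It covers only the binary membership edges stated above. The sentence-boundary padding symbols and the function names are assumptions, since the poster does not specify how trigrams are extracted.

```python
from typing import Dict, List, Sequence, Tuple

Trigram = Tuple[str, str, str]


def trigrams(words: Sequence[str]) -> List[Trigram]:
    """Word trigrams of an utterance, padded with boundary symbols (assumed)."""
    padded = ["<s>"] + list(words) + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


def build_da_edges(utterances: Dict[str, Sequence[str]]) -> Dict[Tuple[str, Trigram], int]:
    """Binary edges: weight 1 between utterance ui and every trigram vj it contains."""
    edges: Dict[Tuple[str, Trigram], int] = {}
    for u_id, words in utterances.items():
        for v in set(trigrams(words)):
            edges[(u_id, v)] = 1
    return edges


da_edges = build_da_edges({"u1": ["show", "me", "flights"]})
```

Pairs that are absent from the dictionary correspond to weight 0, so only existing edges need to be stored.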
Label Propagation
A graph-based semi-supervised learning algorithm
It induces labels for all of the unlabeled nodes on the graph
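To make the idea concrete, here is a simplified iterative label propagation in which seed nodes stay clamped to their manual labels and every other node takes the weighted average of its neighbours' label distributions. It is a generic textbook-style variant written for illustration. It is not the algorithm or API of any particular toolkit, and the function name, iteration count, and toy graph are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple

Dist = Dict[str, float]


def label_propagation(
    edges: Dict[Tuple[Hashable, Hashable], float],
    seeds: Dict[Hashable, Dist],
    iterations: int = 10,
) -> Dict[Hashable, Dist]:
    """Spread label distributions from seed nodes over an undirected graph."""
    neighbors: Dict[Hashable, Dict[Hashable, float]] = defaultdict(dict)
    for (u, v), w in edges.items():
        neighbors[u][v] = w
        neighbors[v][u] = w

    labels = {node: dict(dist) for node, dist in seeds.items()}
    for _ in range(iterations):
        # Seeds are clamped: they keep their manual annotation every round.
        updated = {node: dict(dist) for node, dist in seeds.items()}
        for node, nbrs in neighbors.items():
            if node in seeds:
                continue
            scores: Dict[str, float] = defaultdict(float)
            for nbr, w in nbrs.items():
                for label, p in labels.get(nbr, {}).items():
                    scores[label] += w * p
            total = sum(scores.values())
            if total > 0:
                updated[node] = {l: s / total for l, s in scores.items()}
        labels = updated
    return labels


# Toy graph: target node "t" is linked to two labelled source nodes.
toy_edges = {("s1", "t"): 0.75, ("s2", "t"): 0.25}
toy_seeds = {"s1": {"to.city-b": 1.0}, "s2": {"o": 1.0}}
induced = label_propagation(toy_edges, toy_seeds)
```

In the toy graph, node "t" ends up with a distribution weighted toward the label of its more strongly connected neighbour. This shows how evidence from several translations can be pooled rather than relying on one alignment.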
Experimental Settings
Data
3,351 pairs of bi-utterances in English and Korean
Manually annotated with 30 DA classes and 30 NE classes
Toolkits
Moses and SRILM for SMT
Junto toolkit for graph-based projection
Maximum Entropy for DA identification
Conditional Random Fields for NE recognition
Measures
5-fold cross-validation against the manual annotations in Lt
Precision/recall/F-measure for NE recognition
Accuracy for DA identification
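For reference, precision, recall, and F-measure can be computed as sketched below. The exact-match criterion over (start, end, type) spans is an assumption, as the poster does not state how NE matches are counted. Scores averaged over folds need not satisfy the F-measure formula exactly.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start index, end index, NE type), assumed format


def precision_recall_f(
    gold: Iterable[Span], predicted: Iterable[Span]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall and balanced F-measure over NE spans."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    p = correct / len(pred_set) if pred_set else 0.0
    r = correct / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f


# Hypothetical gold and predicted spans for one utterance.
gold_spans = [(4, 6, "to.city"), (7, 8, "month"), (8, 9, "day")]
pred_spans = [(4, 6, "to.city"), (7, 8, "day")]
p, r, f = precision_recall_f(gold_spans, pred_spans)
```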
Experimental results
NE (P: precision, R: recall, F: F-measure)
                Korean→English         English→Korean
                P      R      F        P      R      F
Supervised      97.6   95.4   96.4     97.1   96.9   97.0
TestOnSource    45.2   16.4   24.0     63.8   19.9   30.3
Direct          43.1   11.9   18.7     50.9   14.8   23.0
Graph-based     50.7   39.8   44.6     67.2   43.4   52.7
DA, accuracy (%)
                Korean→English   English→Korean
Supervised      87.7             83.3
TestOnSource    58.9             70.2
Direct          56.5             69.6
Graph-based     63.5             74.3
Conclusion
This paper presented a graph-based projection approach for
cross-lingual SLU using SMT
Our approach runs a label propagation algorithm on a
graph defined over the translations of the
entire dataset
The feasibility of our approach was demonstrated with English
and Korean SLU models
Experimental results show that our graph-based projection
improves cross-lingual SLU performance over
previous approaches
1 Fusionopolis Way, #21-01 Connexis (South Tower), Singapore 138632 Email: kims@i2r.a-star.edu.sg WWW: http://hlt.i2r.a-star.edu.sg/