A Vietnamese Language Model
Based on Recurrent Neural Network
Viet-Trung Tran∗, Kiem-Hieu Nguyen∗, Duc-Hanh Bui†
Hanoi University of Science and Technology
∗Email: {trungtv,hieunk}@soict.hust.edu.vn
†Email: hanhbd.hedspi@gmail.com
Abstract—Language modeling plays a critical role in many natural language processing (NLP) tasks such as text prediction, machine translation and speech recognition. Traditional statistical language models (e.g. n-gram models) can only offer words that have been seen before and cannot capture long word contexts. Neural language models provide a promising solution to overcome these shortcomings of statistical language models. This paper investigates Recurrent Neural Network (RNN) language models for Vietnamese at the character and syllable levels. Experiments were conducted on a large dataset of 24M syllables, constructed from 1,500 movie subtitles. The experimental results show that our RNN-based language models yield reasonable performance on the movie subtitle dataset. Concretely, our models outperform n-gram language models in terms of perplexity score.
I. INTRODUCTION
One of the fundamental tasks in NLP is statistical language modeling (SLM), formalized as inferring a probability distribution over word sequences. SLM has many applications such as machine translation, speech recognition and text prediction [1], [2].
N-gram models have been the de facto standard for language modeling [3]. An n-gram model predicts the next word using the (n − 1) preceding words in the context, p(w|w0...wn−2). Such probabilities are estimated from the occurrence frequencies of word sequences of length n in a large text collection. At n = 2, 3, 4, or 5, the model captures frequent word patterns as long as enough training text is provided. However, as n increases, it faces the curse of dimensionality, which raises two issues:
• Firstly, given a context, it can only predict a word that has been “seen” with that context in the training texts.
• Secondly, it can only “see” a few words back, i.e. the (n − 1) preceding words. A longer context, such as previous sentences, cannot be taken into account when predicting the next word. This is not, intuitively, the way humans learn and capture context.
Neural language models, i.e. neural-network-based language models, have successfully overcome these two shortcomings [4], [5].
For the first shortcoming, similarly to an n-gram model, a feed-forward neural network (FFNN) predicts the next word using a context of the preceding n − 1 words [4]. The idea is that the distributed word representations learned by the network can capture word similarity. For instance, by capturing the similarity between ‘con chó’ (dog) and ‘con mèo’ (cat), ‘căn phòng’ (room) and ‘phòng ngủ’ (bedroom), ‘chạy’ (running) and ‘đi’ (walking), and so on, the model could generate the sentence “Một con chó đang chạy trong căn phòng” (A dog was running in a room) given the sentence “Con mèo đang đi trong phòng ngủ” (The cat is walking in the bedroom). Their model reportedly outperforms n-gram language models on benchmark datasets.
For the second shortcoming, although the advantage of recurrent neural networks (RNNs) over feed-forward neural networks (FFNNs) is still a subject of debate [6], RNNs explicitly allow learning from arbitrarily long contexts through a recurrent mechanism within the network [5], [7]. Such long-term context is particularly useful when learning with the long short-term memory architecture [8]. Experimental results reported in [5] show that an RNN-based language model outperforms its FFNN-based counterpart.
One may argue for the n-gram approach for its simplicity and efficiency. But as hardware keeps evolving and efficient learning algorithms for neural networks such as [9] are proposed, neural language modeling is promising. In another direction, [10] proposes a multi-modality framework for jointly modeling images and texts in a unified neural language model.
In this work, we experiment with RNNs for Vietnamese language modeling. Our contributions are summarized as follows:
• Building an RNN-based syllable-level language model for Vietnamese.
• Building an RNN-based character-level language model for Vietnamese.
• Running extensive experiments on a 24M-syllable dataset constructed from 1,500 movie subtitles.
The rest of the paper is organized into four further sections: Section II discusses our main contributions of building RNN-based language models for Vietnamese; Section III describes our test-bed and experimental results. Related work is introduced in Section IV. We conclude the paper with final remarks in Section V.
II. RNN-BASED LANGUAGE MODELING FOR VIETNAMESE
In this section, we first present RNN-based language modeling (Section II-A). We then briefly introduce some important characteristics of Vietnamese (Section II-B). Finally, we propose RNN-based language models at both the character and syllable levels for Vietnamese (Sections II-C and II-D).
A. RNN
Given a training sequence (x1, ..., xT ), an RNN computes the output vectors (o1, ..., oT ) and applies the softmax function to obtain the predictive distribution P(xt+1|x1...xt); training maximizes the total log probability of the observed sequence under this distribution. In a simple configuration, our RNN is constructed with three layers: an input layer x, a hidden layer s and an output layer o. At time t, the input, output and hidden state of the network are denoted x(t), o(t), and s(t) respectively. As our network is recurrent, the previous hidden state s(t − 1) is fed back into the hidden layer at time t. Mathematically, these layers are computed as follows:
s(t) = f(Ux(t) + Ws(t − 1)) (1)
o(t) = softmax(V s(t)) (2)
U, W, and V denote the parameters of our RNN to be learned. f is an activation function such as sigmoid, ReLU, or tanh. Our network and its unfolding in time are shown in Figure 1.
Figure 1. An RNN and its unfolding in time of the computation
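To make the recurrence concrete, the following is a minimal NumPy sketch of Equations (1) and (2); the parameter shapes follow the column-vector convention of the equations, and the use of tanh as f is our own illustrative choice, not fixed by the paper.

import numpy as np

def softmax(z):
    # Numerically stable softmax over the vocabulary dimension.
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(xs, U, W, V):
    """Run the simple RNN of Equations (1)-(2) over an input sequence.
    xs: list of one-hot input vectors x(1), ..., x(T)
    U: input-to-hidden weights, W: hidden-to-hidden, V: hidden-to-output
    Returns the predictive distributions o(1), ..., o(T)."""
    s = np.zeros(W.shape[0])            # s(0), the initial hidden state
    outputs = []
    for x in xs:
        s = np.tanh(U @ x + W @ s)      # Equation (1): s(t) = f(Ux(t) + Ws(t-1))
        outputs.append(softmax(V @ s))  # Equation (2): o(t) = softmax(Vs(t))
    return outputs

Training then adjusts U, W, and V to maximize the log probability of each observed next symbol under o(t), typically by backpropagation through time.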
An RNN has the following characteristics:
• The hidden state s(t) can be seen as the memory of the network: s(t) captures information from all previous time steps. In practice, due to the complexity of the training algorithm, the computation of s(t) only takes into account a limited number of recent steps.
• An RNN differs from other types of neural networks in that it uses the same parameters across all steps. More precisely, (U, V, W) are shared and are independent of the inputs. This reduces the number of parameters to learn.
As discussed in [11], training an RNN with gradient descent is practically difficult: it is hard for an RNN to learn long-term dependencies because the gradient blows up or vanishes exponentially over time. This is called the vanishing (or exploding) gradient problem. To address this problem, the hidden layer of our RNN is implemented with the so-called long short-term memory (LSTM) layout.
LSTM cells (in the hidden layer) are designed to be capable of learning long-term dependencies (Figure 2). They were first introduced in [12]. Each LSTM cell consists of three gates that control the information flowing into the cell and from the cell output back to the network. The forget gate decides whether the cell state at time (t − 1) will be filtered out or not. The input gate regulates the inputs. Similarly, the output gate activates or deactivates its outputs to the network (Figure 2).
Figure 2. An LSTM cell
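As a hedged illustration of this gating mechanism, the sketch below implements one step of a standard LSTM cell following the common formulation of [12]; the exact variant and parameter shapes used in our implementation may differ.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, Wf, Wi, Wo, Wc, bf, bi, bo, bc):
    """One LSTM step. x: current input, h_prev: previous cell output,
    c_prev: previous cell state. Each W* acts on [h_prev; x]."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(Wf @ z + bf)          # forget gate: filter the old cell state or not
    i = sigmoid(Wi @ z + bi)          # input gate: regulate the new candidate input
    o = sigmoid(Wo @ z + bo)          # output gate: expose (or not) the cell state
    c_tilde = np.tanh(Wc @ z + bc)    # candidate cell state
    c = f * c_prev + i * c_tilde      # new cell state
    h = o * np.tanh(c)                # new output fed back to the network
    return h, c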
Besides powering an RNN with LSTM cells, RNN layers can also be stacked to form a deeper neural network. In this setting, the output of the first layer becomes the input of the second, and so on, as depicted in Figure 3. By going deeper, an RNN can capture more context and thus better model the sequence distribution [13]. Of course, this increases the number of model parameters and the training time, and requires more training data.
Figure 3. Stacking multiple networks
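As a sketch of stacking (reusing the softmax helper and NumPy import from the earlier RNN code, and the same illustrative assumptions), the hidden-state sequence of one layer simply becomes the input sequence of the next:

def stacked_rnn_forward(xs, layers, V):
    """layers: list of (U_k, W_k) weight pairs, one per stacked RNN layer.
    The hidden-state sequence of layer k becomes the input sequence of
    layer k+1; only the top layer's states are projected to the vocabulary
    via V and softmax."""
    seq = xs
    for U, W in layers:
        s = np.zeros(W.shape[0])
        hidden = []
        for x in seq:
            s = np.tanh(U @ x + W @ s)
            hidden.append(s)
        seq = hidden                  # feed layer k's outputs into layer k+1
    return [softmax(V @ s) for s in seq]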
B. Characteristics of Vietnamese language
In this section, we briefly describe some important characteristics of Vietnamese with regard to the sub-word-level language modeling task.
Vietnamese script is written in Latin characters (‘kí tự’ or ‘chữ’ in Vietnamese). There are 29 characters in the Vietnamese alphabet, including 12 vowels and 17 consonants. There are five accent marks in Vietnamese. An accent mark can be attached to a vowel to modify its pronunciation tone, resulting in 60 additional ‘vowel’ variants (Table I [14]).
Characters are grouped into syllables (‘âm tiết’ in Vietnamese). Syllables, in turn, are grouped into words, which are the smallest meaningful units in a language. Words can be composed of a single syllable, e.g. ‘mua’ (to buy), or multiple syllables, e.g. ‘nghiên cứu’ (research). Typically, words are composed of two syllables, accounting for 70% of the entries in a general-domain Vietnamese dictionary [15].

Initial char  Variants
a  [a, à, á, ả, ã, ạ, ă, ằ, ắ, ẳ, ẵ, ặ, â, ầ, ấ, ẩ, ẫ, ậ]
e  [e, è, é, ẻ, ẽ, ẹ, ê, ề, ế, ể, ễ, ệ]
o  [o, ò, ó, ỏ, õ, ọ, ô, ồ, ố, ổ, ỗ, ộ, ơ, ờ, ớ, ở, ỡ, ợ]
u  [u, ù, ú, ủ, ũ, ụ, ư, ừ, ứ, ử, ữ, ự]
i  [i, ì, í, ỉ, ĩ, ị]
y  [y, ỳ, ý, ỷ, ỹ, ỵ]
Table I
VOWEL CHARACTERS AND VARIANTS IN THE VIETNAMESE ALPHABET.
Due to the complexity of Vietnamese morphology and lexicon, word segmentation is a fundamental yet challenging task in Vietnamese language processing [16]. In this paper, we leave word-level language modeling for further investigation. We instead focus on sub-word levels, i.e. the character and syllable levels.
C. Character-Level Language Model
At the character level, our objective is to predict the next character given a sequence of previous characters. The input X is thus a character sequence, and x(t) ∈ X is the character at time t. Since the number of characters (the char-level vocabulary size Vchar) is small, we directly use one-hot encoding: each x(t) is a vector of size Vchar with the component corresponding to the character itself set to one and all other components set to zero.
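A minimal sketch of this one-hot encoding is given below; the character vocabulary shown is only an illustrative fragment, not the real character set used in our experiments.

import numpy as np

# Illustrative fragment of a character vocabulary; the real one covers the
# full Vietnamese alphabet with accented variants, punctuation, digits, etc.
chars = ['START', ' ', 'a', 'à', 'b', 'c', 'ô']
V_char = len(chars)
CHAR_TO_INDEX = {c: i for i, c in enumerate(chars)}

def one_hot(ch):
    """Return the one-hot vector of size V_char for character ch."""
    x = np.zeros(V_char)
    x[CHAR_TO_INDEX[ch]] = 1.0
    return x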
The output o(t) at each time step t is likewise a vector of size Vchar. Specifically, each component of o(t) indicates the probability of generating that character given the previous context from 0 to t. Given h_len as the size of the hidden layer s (Section II-A), the domains of our learning parameters are:

x(t) ∈ R^Vchar, o(t) ∈ R^Vchar, s(t) ∈ R^h_len,
U ∈ R^(h_len × Vchar), V ∈ R^(Vchar × h_len), W ∈ R^(h_len × h_len)    (3)
D. Syllable-Level Language Model
At the syllable level, our objective is to predict the next syllable given a sequence of previous syllables. Therefore, x(t) is the one-hot encoding of the syllable at time t. We observe that the number of Vietnamese syllables, Vsyl, is about 7,000. Given Vsyl and h_len as the sizes of the syllable vocabulary and the hidden layer s, respectively, the domain space of our learning parameters is the same as in Equation 3, except with a much bigger Vsyl.
III. EXPERIMENTS
A. Datasets
This work focuses on conversational data instead of general texts (e.g. from news articles). Therefore, we manually collected movie subtitles from 1,500 movies available on the Internet. The subtitles were contributed by volunteer translators, who translated them from the original languages into Vietnamese. For instance, the three sequences {“How many bags can you carry?”, “If you don’t think for yourself”, “think for your mother!”} were extracted from the part shown in Figure 4.
132
00:07:33,920 --> 00:07:35,560
Chú có thể vác được bao nhiêu bao cơ chứ?
133
00:07:35,680 --> 00:07:36,760
Nếu chú không nghĩ đến bản thân...
134
00:07:36,880 --> 00:07:39,280
...thì cũng phải nghĩ đến mẹ ở nhà!
Figure 4. A sample original movie subtitle file with triples of (frame number, frame offset, and subtitle).
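A hedged sketch of how such subtitle blocks can be split into (frame number, frame offset, subtitle text) triples follows; the exact preprocessing code is not described in the paper, so the regular expression below is only an assumption based on the SRT-like format shown in Figure 4.

import re

BLOCK_RE = re.compile(
    r"(\d+)\s*\n"                       # frame number, e.g. 132
    r"([\d:,]+\s*-->\s*[\d:,]+)\s*\n"   # frame offset, e.g. 00:07:33,920 --> 00:07:35,560
    r"(.+?)(?:\n\s*\n|\Z)",             # subtitle text, possibly spanning several lines
    re.DOTALL)

def parse_subtitle(raw_text):
    """Yield (frame_number, frame_offset, subtitle_text) triples from a subtitle file."""
    for num, offset, text in BLOCK_RE.findall(raw_text):
        yield int(num), offset.strip(), " ".join(text.split())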
We picked movie subtitles for two reasons. First, they are widely and publicly available on the Internet. Second, the language itself is close to conversational Vietnamese and, being generated by thousands of contributors, barely reflects any specific writing style.
The following steps were carried out to clean the data:
• Tokenizing: Each sequence is tokenized into tokens using
a regular expression processor. A token could be a
syllable, a foreign word, or a punctuation mark.
• Detecting and removing incomplete translations: We could not expect high commitment from volunteer translators, and in fact we found many subtitles that were only partially translated. Fortunately, most of the original subtitles were in English, and such untranslated parts obviously generate noise when learning a language model for Vietnamese. In order to detect and remove such subtitles, we use Princeton WordNet [17] to extract an English vocabulary. We then scan each subtitle for tokens found in this vocabulary; if at least half of all tokens match this criterion, the subtitle is considered untranslated. We also used the langdetect package (https://pypi.python.org/pypi/langdetect) to filter out sentences that are not in Vietnamese. A sketch of this filter is given after this list. Manual inspection of a random subset of the removed subtitles shows that this heuristic filter works sufficiently well for our purpose.
• Normalizing foreign words: Tokens found in the English vocabulary were all replaced by the special value ‘OOV’.
• Adding a start symbol: Each sequence in our dataset is a subtitle displayed in a movie frame. As is common in conversational data, and particularly in movie subtitle data, a sequence is not necessarily a complete sentence or syntactic phrase in the linguistic sense. Unlike general texts, this characteristic makes language modeling challenging. To signify the beginning of a sequence, we added a ‘START’ symbol at the front.
• Punctuation: We keep all punctuation as in the original subtitles.
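The sketch below illustrates the untranslated-subtitle heuristic and the langdetect filter mentioned above. The use of NLTK to access WordNet, the helper names, and the exact 0.5 threshold are assumptions matching the description rather than the original implementation.

from langdetect import detect
from nltk.corpus import wordnet

# English vocabulary extracted from Princeton WordNet [17].
ENGLISH_VOCAB = {lemma.lower() for synset in wordnet.all_synsets()
                 for lemma in synset.lemma_names()}

def is_untranslated(tokens):
    """A subtitle is considered untranslated if at least half of its tokens
    appear in the English vocabulary."""
    if not tokens:
        return False
    english = sum(1 for t in tokens if t.lower() in ENGLISH_VOCAB)
    return english / len(tokens) >= 0.5

def is_vietnamese(sentence):
    """Keep only sentences whose detected language is Vietnamese."""
    try:
        return detect(sentence) == 'vi'
    except Exception:
        return False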
Data for syllable models: Since the inputs to our RNNs are vectors, not strings, we create a mapping between syllables and indices through two tables, INDEX_TO_SYLLABLE and SYLLABLE_TO_INDEX. For example, the syllable ‘anh’ has index 1043, which holds an entry in each of the two mapping tables. A training sample x looks like [0, 56, 78, 454], where 0 is the index of ‘START’.
Data for char models: Similarly to the syllable models, a mapping between characters and indices has to be created using two mapping tables, INDEX_TO_CHAR and CHAR_TO_INDEX. Again, 0 is the index of ‘START’.
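A minimal sketch of how these mapping tables and training samples can be built follows; the concrete index values (such as 1043 for ‘anh’) depend on the corpus and are illustrative only.

def build_vocab(sequences):
    """Build INDEX_TO_SYLLABLE and SYLLABLE_TO_INDEX from tokenized sequences,
    reserving index 0 for the 'START' symbol."""
    INDEX_TO_SYLLABLE = ['START'] + sorted({tok for seq in sequences for tok in seq})
    SYLLABLE_TO_INDEX = {syl: i for i, syl in enumerate(INDEX_TO_SYLLABLE)}
    return INDEX_TO_SYLLABLE, SYLLABLE_TO_INDEX

def encode(sequence, SYLLABLE_TO_INDEX):
    """Turn a tokenized subtitle into a training sample such as [0, 56, 78, 454]."""
    return [SYLLABLE_TO_INDEX['START']] + [SYLLABLE_TO_INDEX[tok] for tok in sequence]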
B. Evaluation
We built the following neural language models on the TensorFlow framework [18], varying the sequence length over the values 10, 20, and 40:
• Char-1-layer-40-seq: A simple char-level model with only one hidden layer and a sequence length of 40.
• Char-3-layer-40-seq: A more complex char-level model with three hidden layers and a sequence length of 40.
• Syl-1-layer-[10,20,40]-seq: Three variants of a simple syllable-level model with one hidden layer and sequence lengths of 10, 20, and 40, respectively.
• Syl-3-layer-[10,20,40]-seq: Three variants of a deep syllable-level model with three hidden layers and sequence lengths of 10, 20, and 40, respectively.
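For illustration, a sketch of such a syllable-level configuration is given below using the modern tf.keras API as a stand-in for the TensorFlow code used in our experiments; the hidden size h_len = 256, the optimizer, and the loss setup are assumptions, while the sequence length and the number of LSTM layers correspond to the configurations listed above.

import tensorflow as tf

V_syl, h_len, seq_len = 7000, 256, 20   # h_len = 256 is an assumed hidden size

def build_model(num_layers=1):
    """Sketch of a Syl-<num_layers>-layer-20-seq configuration: one-hot syllable
    inputs, stacked LSTM layers, and a softmax output over the syllable vocabulary."""
    inputs = tf.keras.Input(shape=(seq_len, V_syl))       # sequence of one-hot syllables
    x = inputs
    for i in range(num_layers):
        # All but the last LSTM return the full sequence so layers can be stacked.
        x = tf.keras.layers.LSTM(h_len, return_sequences=(i < num_layers - 1))(x)
    outputs = tf.keras.layers.Dense(V_syl, activation='softmax')(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    return model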
BASELINE: We used a bi-gram language model as the baseline. Note that this is not the most accurate n-gram model; we selected it because it is easy to implement, parameter-free, and a relatively strong baseline. To handle unknown words, we used add-one Laplace smoothing.
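A minimal sketch of such a bi-gram baseline with add-one smoothing (our own illustration of the standard formulation, not the exact baseline code) is:

from collections import Counter

def train_bigram(sequences):
    """Count unigrams and bigrams over tokenized training sequences."""
    unigrams, bigrams = Counter(), Counter()
    for seq in sequences:
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, V):
    """Add-one (Laplace) smoothed probability P(w | w_prev); V is the vocabulary size."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)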
Model | PPL
Char-1-layer-40-seq | 116
Char-3-layer-40-seq | 109
Syl-1-layer-10-seq | 85.6
Syl-3-layer-10-seq | 89.1
Syl-1-layer-20-seq | 83.8
Syl-3-layer-20-seq | 95.5
Syl-1-layer-40-seq | 99.7
Syl-3-layer-40-seq | 95.7
Bi-gram | 145
Table II
PERPLEXITY PER SYLLABLE OF OUR MODELS.
We measured the perplexity, which is the inverse probability of the input sequence normalized by its length. The perplexity is computed as in Equation 4:

PPL = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i | w_1 ... w_{i-1})}}    (4)

where N is the number of words and P(wi|w1...wi−1) is the probability of word wi given the previous sequence w1...wi−1.
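For numerical stability, the N-th root in Equation (4) is usually computed in log space; a minimal sketch:

import math

def perplexity(probabilities):
    """Compute PPL from the model probabilities P(w_i | w_1 ... w_{i-1}),
    one value per predicted token, following Equation (4) in log space."""
    N = len(probabilities)
    log_sum = sum(math.log(p) for p in probabilities)
    return math.exp(-log_sum / N)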
As shown in Table II, all neural models significantly outperform the bi-gram baseline. Our best model, the syllable-based one-layer model trained with an RNN sequence length of 20, obtained a perplexity of 83.8. Note that the RNN sequence length signifies the maximum size of the unfolded recurrent network, i.e. the largest context used to generate a new character/syllable. The results show that increasing the context size to 40 in the syllable models does not help reduce the overall perplexity. This could be due to the nature of our training corpus: it contains mostly short conversational sentences. We leave the investigation of this hypothesis for future work.
Intuitively, the character language models in our settings are not better than the syllable models, as they have to predict multiple characters in order to form a single syllable. Stacking a 3-layer RNN increases the number of learning parameters, thus giving the models more expressive power to predict new words. However, in the cases of Syl-3-layer-[10,20]-seq, the larger number of learning parameters hurts the perplexity scores.
We show below in Figure 5 several predictions from our best-performing syllable model, i.e. Syl-1-layer-20-seq in Table II.
START cô
Suggestions 1: ấy, 2: OOV, 3:
——————————————
START cô ấy
Suggestions 1: , 2: bị, 3: đã
——————————————
START cô ấy là
Suggestions 1: OOV, 2: một, 3: người
——————————————
START cô ấy là bác
Suggestions 1: sĩ, 2: sỹ, 3:
——————————————
START cô ấy là bác sĩ
Suggestions 1: , 2: OOV, 3: của
——————————————
START cô ấy là bác sĩ tâm
Suggestions 1: thần, 2: lý, 3: lí
——————————————
START cô ấy là bác sĩ tâm lý
Suggestions 1: , 2: của, 3: OOV
Figure 5. A sample syllable prediction suggesting the 3 options with the highest probabilities for the sentence "cô ấy là bác sĩ tâm lý" (she is a psychiatrist).
IV. RELATED WORK
This section discusses related work on language modeling for Vietnamese. Language modeling is an important NLP task. However, to our knowledge, only a handful of works focus on this task for Vietnamese [19], [20], [21].
In a pioneering work on language modeling for automatic speech recognition in Vietnamese, [19] applies n-gram statistical language modeling to a corpus of 10M sentences collected from Web documents. They report a perplexity per word of 252 for the best configuration. A priori and a posteriori methods for computing perplexity of n-gram language models for Vietnamese are also discussed in [20].
Perhaps closest to our direction is the work presented in [21]. They investigate neural language models at the syllable level. Experiments were conducted on a corpus of 76k Vietnamese sentences. As reported, a feed-forward neural language model achieves a perplexity of 121.
Word-level RNN language models can also be built for Vietnamese, provided word-segmented texts are available. Both word- and syllable-level language models have the drawback of handling new words: they need to be retrained whenever new words are added to the vocabulary. A character-level language model, on the other hand, has no problem with new words, since words are constructed from a fixed set of predefined characters. Hence, a character-level language model not only predicts the next word, but can also complete the current word. We also plan to investigate embedding vectors trained from a large text collection, to better capture word semantics and to reduce the vector size.
V. CONCLUSIONS
This paper presents neural language models for Vietnamese. We investigate RNNs at both the character and syllable levels. Experimental results show that the best configuration yields a perplexity of 83.8, which is reasonable performance for a language model. All of our models outperform the n-gram language model in our experiments.
For future work, we plan to investigate RNNs at the word level. We anticipate that embedding representations using techniques like Convolutional Neural Networks will also improve both the accuracy and the efficiency of our approach.
ACKNOWLEDGMENT
This work is supported by Hanoi University of Science and Technology (HUST) under project No. T2016-PC-047.
REFERENCES
[1] R. Rosenfeld, “Two Decades Of Statistical Language Modeling: Where
Do We Go From Here?” in Proceedings of the IEEE, vol. 88, 2000.
[Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=
10.1.1.29.463
[2] K. Yao, G. Zweig, M. Hwang, Y. Shi, and D. Yu, “Recurrent neural networks for language understanding,” in INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013, 2013, pp. 2524–2528. [Online]. Available: http://www.isca-speech.org/archive/interspeech_2013/i13_2524.html
[3] S. F. Chen and J. Goodman, “An empirical study of smoothing
techniques for language modeling,” in Proceedings of the 34th
Annual Meeting on Association for Computational Linguistics, ser.
ACL ’96. Stroudsburg, PA, USA: Association for Computational
Linguistics, 1996, pp. 310–318. [Online]. Available: http://dx.doi.org/
10.3115/981863.981904
[4] Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin, “A neural
probabilistic language model,” J. Mach. Learn. Res., vol. 3, pp.
1137–1155, Mar. 2003. [Online]. Available: http://dl.acm.org/citation.
cfm?id=944919.944966
[5] T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, “Recurrent neural network based language model,” in INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010, 2010, pp. 1045–1048.
[6] M. Sundermeyer, I. Oparin, J.-L. Gauvain, B. Freiberg, R. Schlüter, and H. Ney, “Comparison of feedforward and recurrent neural network language models,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, Vancouver, Canada, May 2013, pp. 8430–8434.
[7] T. Mikolov and G. Zweig, “Context dependent recurrent neural network
language model,” in 2012 IEEE Spoken Language Technology Workshop
(SLT), Miami, FL, USA, December 2-5, 2012, 2012, pp. 234–239.
[Online]. Available: http://dx.doi.org/10.1109/SLT.2012.6424228
[8] M. Sundermeyer, R. Schlüter, and H. Ney, “LSTM neural networks for language modeling,” in Interspeech, Portland, OR, USA, Sep. 2012, pp. 194–197.
[9] F. Morin and Y. Bengio, “Hierarchical probabilistic neural network
language model,” in Proceedings of the Tenth International Workshop
on Artificial Intelligence and Statistics, AISTATS 2005, Bridgetown,
Barbados, January 6-8, 2005, 2005. [Online]. Available: http:
//www.gatsby.ucl.ac.uk/aistats/fullpapers/208.pdf
[10] R. Kiros, R. Salakhutdinov, and R. S. Zemel, “Multimodal neural
language models,” in Proceedings of the 31th International Conference
on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014,
2014, pp. 595–603. [Online]. Available: http://jmlr.org/proceedings/
papers/v32/kiros14.html
[11] Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependen-
cies with gradient descent is difficult,” IEEE Transactions on Neural
Networks, vol. 5, no. 2, pp. 157–166, 1994.
[12] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural
Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997. [Online]. Available:
http://dx.doi.org/10.1162/neco.1997.9.8.1735
[13] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, “How to construct
deep recurrent neural networks.” CoRR, vol. abs/1312.6026, 2013.
[14] K. Nguyen and C. Ock, “Diacritics restoration in vietnamese: Letter
based vs. syllable based model,” in PRICAI 2010: Trends in Artificial
Intelligence, 11th Pacific Rim International Conference on Artificial
Intelligence, Daegu, Korea, August 30-September 2, 2010. Proceedings,
ser. Lecture Notes in Computer Science, B. Zhang and M. A. Orgun,
Eds., vol. 6230. Springer, 2010, pp. 631–636.
[15] D. Dinh, P. Pham, and Q. Ngo, “Some lexical issues in building
electronic vietnamese dictionary,” in Proceedings of PAPILLON-2003
Workshop on Multilingual Lexical Databases, 2003.
[16] H. P. Le, N. T. M. Huyền, A. Roussanaly, and H. T. Vinh, “A hybrid approach to word segmentation of Vietnamese texts,” in Language and Automata Theory and Applications, Second International Conference, LATA 2008, Tarragona, Spain, March 13-19, 2008. Revised Papers, ser. Lecture Notes in Computer Science, C. Martín-Vide, F. Otto, and H. Fernau, Eds., vol. 5196. Springer, 2008, pp. 240–249.
[17] G. A. Miller, “Wordnet: A lexical database for english,” Commun.
ACM, vol. 38, no. 11, pp. 39–41, Nov. 1995. [Online]. Available:
http://doi.acm.org/10.1145/219717.219748
[18] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S.
Corrado, A. Davis, J. Dean, M. Devin et al., “Tensorflow: Large-scale
machine learning on heterogeneous distributed systems,” arXiv preprint
arXiv:1603.04467, 2016.
[19] V. B. Le, B. Bigi, L. Besacier, and E. Castelli, “Using the web for fast
language model construction in minority languages,” in 8th European
Conference on Speech Communication and Technology, EUROSPEECH
2003 - INTERSPEECH 2003, Geneva, Switzerland, September 1-4,
2003. ISCA, 2003.
[20] L. Q. Ha, T. T. T. Van, H. T. Long, N. H. Tinh, N. N. Tham,
and L. T. Ngoc, “A Posteriori individual word language models
for vietnamese language,” IJCLCLP, vol. 15, no. 2, 2010. [Online].
Available: http://www.aclclp.org.tw/clclp/v15n2/v15n2a2.pdf
[21] A. Gandhe, F. Metze, and I. R. Lane, “Neural network language models
for low resource languages,” in INTERSPEECH 2014, 15th Annual
Conference of the International Speech Communication Association,
Singapore, September 14-18, 2014, H. Li, H. M. Meng, B. Ma, E. Chng,
and L. Xie, Eds. ISCA, 2014, pp. 2615–2619.

 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 

Recently uploaded (20)

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 

A Vietnamese Language Model Based on Recurrent Neural Network

based counterpart.
One may argue for the n-gram approach for its simplicity and efficiency. But as hardware keeps evolving and efficient learning algorithms for neural networks such as [9] are proposed, neural language modeling is promising. In another direction, [10] proposes a multi-modality framework for jointly modeling images and texts in a unified neural language model.

In this work, we experiment with RNNs for Vietnamese language modeling. Our contributions are summarized as follows:

• Building an RNN-based syllable-level language model for Vietnamese.
• Building an RNN-based character-level language model for Vietnamese.
• Running extensive experiments on a 24M-syllable dataset constructed from 1,500 movie subtitles.

The rest of the paper is organized into four sections: Section II discusses our main contributions of building RNN-based language models for Vietnamese; Section III records our test-bed and the results of the experiments; related work is introduced in Section IV; and we conclude the paper with final remarks in Section V.

II. RNN-BASED LANGUAGE MODELING FOR VIETNAMESE

In this section, we first present RNN-based language modeling (Section II-A). We then briefly introduce some important characteristics of Vietnamese (Section II-B). Finally, we propose RNN-based language models at both character and syllable levels for Vietnamese (Sections II-C and II-D).
A. RNN

Given a training sequence (x1, ..., xT), an RNN computes the output vectors (o1, ..., oT) and applies the softmax function to obtain the predictive distribution P(x_{t+1} | x_0 x_1 ... x_t); training maximizes the total log probability of the training sequence under these distributions. In a simple configuration, our RNN is constructed with three layers: an input layer x, a hidden layer s and an output layer o. At time t, the input, output and hidden state of the network are denoted as x(t), o(t), and s(t), respectively. As the network is recurrent, the previous hidden state s(t − 1) is fed back into the hidden layer at time t. Mathematically, these layers are computed as follows:

s(t) = f(U x(t) + W s(t − 1))   (1)
o(t) = softmax(V s(t))          (2)

U, W, V denote the parameters of our RNN to be learned. f is an activation function such as sigmoid, ReLU, or tanh. Our network and its unfolding in time are shown in Figure 1.

Figure 1. An RNN and its unfolding in time of the computation

An RNN has the following characteristics:

• The hidden state s(t) can be seen as the memory of the network: s(t) captures information from all previous time steps. In practice, due to the complexity of the training algorithm, the computation of s(t) only takes into account a limited number of recent steps.
• An RNN differs from other types of neural network in that it uses the same parameters across all steps. More precisely, (U, V, W) are shared and are independent of the inputs. This reduces the number of parameters to learn.

As discussed in [11], training an RNN with gradient descent is practically difficult: the network struggles to learn long-term dependencies because the gradient blows up or vanishes exponentially over time. This is called the vanishing gradient problem. To address this problem, the hidden layer of our RNN is implemented with a so-called long short-term memory (LSTM) layout. LSTM cells (in the hidden layer) are designed to be capable of learning long-term dependencies (Figure 2). LSTM was first introduced in [12]. Each LSTM cell consists of three gates that control the information flowing into the cell and from the cell output back to the network. The forget gate decides whether the cell state at time (t − 1) will be filtered out or not. The input gate regulates the inputs. Similarly, the output gate activates or deactivates its outputs to the network (Figure 2).

Figure 2. An LSTM cell

Besides powering an RNN with LSTM cells, RNN layers can also be stacked to form a deeper neural network. In this setting, the output of the first layer becomes the input of the second, and so on, as depicted in Figure 3. By going deeper, the RNN can capture more context and thus better model the sequence distribution [13]. Of course, this also increases the number of model parameters and the training time, and requires more training data.

Figure 3. Stacking multiple networks
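To make Equations (1) and (2) concrete, the following is a minimal NumPy sketch of one forward pass through a plain (non-LSTM) recurrent layer. The vocabulary size, hidden size and random initialization are illustrative assumptions, and the sketch is not the TensorFlow training code used in our experiments.

import numpy as np

V_char, h_len = 100, 128                    # assumed vocabulary and hidden-layer sizes
rng = np.random.default_rng(0)
U = rng.normal(0.0, 0.01, (V_char, h_len))  # input-to-hidden weights
W = rng.normal(0.0, 0.01, (h_len, h_len))   # hidden-to-hidden (recurrent) weights
V = rng.normal(0.0, 0.01, (h_len, V_char))  # hidden-to-output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(one_hot_sequence):
    """One forward pass: Eq. (1) then Eq. (2) at every time step t."""
    s = np.zeros(h_len)                     # initial hidden state s(0)
    outputs = []
    for x_t in one_hot_sequence:            # x_t is a one-hot row vector of size V_char
        s = np.tanh(x_t @ U + s @ W)        # Eq. (1) with f = tanh, row-vector form
        outputs.append(softmax(s @ V))      # Eq. (2): distribution over the next symbol
    return outputs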
B. Characteristics of Vietnamese language

In this section, we briefly describe some important characteristics of Vietnamese with regard to sub-word-level language modeling.

Vietnamese script is written in Latin characters ('kí tự' or 'chữ' in Vietnamese). There are 29 characters in the Vietnamese alphabet, including 12 vowels and 17 consonants. There are five accent marks in Vietnamese. An accent mark can be attached to a vowel to modify its tone, resulting in 60 additional 'vowel' variants (Table I [14]).

Table I. Vowel characters and variants in the Vietnamese alphabet.
Initial char | Variants
a | [a, à, á, ả, ã, ạ, ă, ằ, ắ, ẳ, ẵ, ặ, â, ầ, ấ, ẩ, ẫ, ậ]
e | [e, è, é, ẻ, ẽ, ẹ, ê, ề, ế, ể, ễ, ệ]
o | [o, ò, ó, ỏ, õ, ọ, ô, ồ, ố, ổ, ỗ, ộ, ơ, ờ, ớ, ở, ỡ, ợ]
u | [u, ù, ú, ủ, ũ, ụ, ư, ừ, ứ, ử, ữ, ự]
i | [i, ì, í, ỉ, ĩ, ị]
y | [y, ỳ, ý, ỷ, ỹ, ỵ]

Characters are grouped into syllables ('âm tiết' in Vietnamese). Syllables, in turn, are grouped into words, which are the smallest meaningful units of the language. Words can be composed of a single syllable, e.g. 'mua' (to buy), or of multiple syllables, e.g. 'nghiên cứu' (research). Typically, words are composed of two syllables; such words account for 70% of the entries in a general-domain Vietnamese dictionary [15].

Due to the complicated origins of Vietnamese morphology and lexicon, word segmentation is a fundamental yet challenging task in Vietnamese language processing [16]. In this paper, we leave word-level language modeling for further investigation and instead focus on the sub-word levels, i.e. the character and syllable levels.
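As a small illustration of the two sub-word granularities modeled in the next subsections, the sketch below splits a Vietnamese sequence (the sample sentence used later in Figure 5) into syllable tokens and character tokens; this is only an illustration, since the actual pipeline uses a regular-expression tokenizer (Section III).

# Written Vietnamese separates syllables by spaces, so a whitespace
# split is a rough syllable-level tokenizer for clean text.
sentence = "cô ấy là bác sĩ tâm lý"      # "she is a psychiatrist"

syllable_tokens = sentence.split()       # ['cô', 'ấy', 'là', 'bác', 'sĩ', 'tâm', 'lý']
character_tokens = list(sentence)        # ['c', 'ô', ' ', 'ấ', 'y', ...]

print(len(syllable_tokens))              # 7 syllables
print(len(character_tokens))             # 22 characters (including spaces, NFC-normalized)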
C. Character-Level Language Model

At the character level, our objective is to predict the next character given a sequence of previous characters. The input X is thus a character sequence, and x(t) ∈ X is the character at time t. Since the number of characters (the character-level vocabulary size V_char) is small, we directly use a one-hot encoding: each x(t) is a vector of size V_char in which the component corresponding to the character itself is one and all other components are zero. The output o(t) at each time step t is likewise a vector of size V_char; each component of o(t) gives the probability of generating that character given the previous context from 0 to t. With h_len denoting the size of the hidden layer s (Section II-A), the domain spaces of our learning parameters are:

x(t) ∈ R^(V_char), o(t) ∈ R^(V_char), s(t) ∈ R^(h_len),
U ∈ R^(V_char × h_len), V ∈ R^(h_len × V_char), W ∈ R^(h_len × h_len)   (3)

D. Syllable-Level Language Model

At the syllable level, our objective is to predict the next syllable given a sequence of previous syllables. Therefore, x(t) is the one-hot encoding of the syllable at time t. We observe that the number of all Vietnamese syllables, V_syl, is about 7,000. With V_syl and h_len as the sizes of the syllable vocabulary and the hidden layer s, respectively, the domain space of our learning parameters is the same as in Equation (3), except with the much larger V_syl in place of V_char.
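Below is a minimal sketch, with a toy vocabulary, of the one-hot encoding and the index mappings that both the character-level and syllable-level models rely on; the actual tables are built from the corpus described in Section III, where a syllable such as 'anh' receives a fixed index.

import numpy as np

# Toy syllable vocabulary; the real one contains roughly 7,000 Vietnamese
# syllables plus the special 'START' and 'OOV' symbols.
vocab = ['START', 'OOV', 'cô', 'ấy', 'là', 'bác', 'sĩ']
INDEX_TO_SYLLABLE = dict(enumerate(vocab))
SYLLABLE_TO_INDEX = {syl: idx for idx, syl in INDEX_TO_SYLLABLE.items()}
V_syl = len(vocab)

def one_hot(index, size=V_syl):
    """x(t): a vector of zeros with a single one at the token's index."""
    x = np.zeros(size)
    x[index] = 1.0
    return x

# A training sample is a sequence of indices, always starting with 'START' (index 0).
sample = [SYLLABLE_TO_INDEX[s] for s in ['START', 'cô', 'ấy', 'là']]
inputs = [one_hot(i) for i in sample]    # one-hot vectors fed to the RNN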
III. EXPERIMENTS

A. Datasets

This work focuses on conversational data rather than general texts (e.g. from news articles). We therefore manually collected subtitles of 1,500 movies available on the Internet. The subtitles were contributed by volunteer translators, who translated them from the original languages into Vietnamese. For instance, the three sequences {"How many bags can you carry?", "If you don't think for yourself", "think for your mother!"} were extracted from the part shown in Figure 4.

132
00:07:33,920 --> 00:07:35,560
Chú có thể vác được bao nhiêu bao cơ chứ?

133
00:07:35,680 --> 00:07:36,760
Nếu chú không nghĩ đến bản thân...

134
00:07:36,880 --> 00:07:39,280
...thì cũng phải nghĩ đến mẹ ở nhà!

Figure 4. A sample from an original movie subtitle file, with triples of (frame number, frame offset, subtitle).

We picked movie subtitles for two reasons. First, they are widely and publicly available on the Internet. Second, the language itself is close to conversational Vietnamese and, being generated by thousands of contributors, barely reflects any specific writing style.

The following steps were applied to clean the data:

• Tokenizing: Each sequence is tokenized using a regular expression processor. A token could be a syllable, a foreign word, or a punctuation mark.
• Detecting and removing incomplete translations: We could not expect high commitment from volunteer translators, and in fact we found many subtitles that were only partially translated. Fortunately, most of the original subtitles were in English. Such untranslated text obviously generates noise when learning a language model for Vietnamese. To detect and remove these subtitles, we use Princeton WordNet [17] to extract an English vocabulary and then scan each subtitle for tokens found in that vocabulary. If at least half of all tokens match this criterion, the subtitle is considered untranslated. We also used the langdetect package(1) to filter out sentences that are not in Vietnamese. Manual inspection of a random subset of the removed subtitles shows that this heuristic filter works sufficiently well for our purpose.
• Normalizing foreign words: Tokens detected by the English vocabulary were all assigned the value 'OOV'.
• Adding a start symbol: Each sequence in our dataset is a subtitle displayed in a movie frame. As is common in conversational data, and particularly in movie subtitle data, a sequence is not necessarily a complete sentence or syntactic phrase. Unlike general texts, this characteristic is challenging for language modeling. To signify the beginning of a sequence, we prepend a 'START' symbol.
• Punctuation: We keep all punctuation as in the original subtitles.

(1) https://pypi.python.org/pypi/langdetect
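The following is a minimal sketch of the tokenizer and the untranslated-subtitle filter described above. The English vocabulary here is a small placeholder for the one extracted from WordNet, the regular expression and the 50% threshold follow the description above, and the helper names are illustrative rather than the exact implementation.

import re
from langdetect import detect

ENGLISH_VOCAB = {"how", "many", "bags", "can", "you", "carry"}   # placeholder for the WordNet vocabulary

TOKEN_RE = re.compile(r"\w+|[^\w\s]", re.UNICODE)

def tokenize(sequence):
    """Split a subtitle line into syllables, foreign words, and punctuation marks."""
    return TOKEN_RE.findall(sequence.lower())

def is_untranslated(subtitle_lines):
    """Flag a subtitle whose tokens are at least half English vocabulary words."""
    tokens = [t for line in subtitle_lines for t in tokenize(line)]
    if not tokens:
        return True
    english = sum(1 for t in tokens if t in ENGLISH_VOCAB)
    return english >= len(tokens) / 2

def is_vietnamese(line):
    """Secondary language-identification filter using langdetect."""
    try:
        return detect(line) == "vi"
    except Exception:          # langdetect raises on empty or undetectable input
        return False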
Data for syllable models: Since the inputs to our RNNs are vectors, not strings, we create a mapping between syllables and indices via two tables, INDEX_TO_SYLLABLE and SYLLABLE_TO_INDEX. For example, the syllable 'anh' has index 1043, and the pair holds one entry in each of the two mapping tables. A training sample x looks like [0, 56, 78, 454], where 0 is the index of 'START'.

Data for char models: Similarly to the syllable models, a mapping between characters and indices is created using two mapping tables, INDEX_TO_CHAR and CHAR_TO_INDEX. Again, 0 is the index of 'START'.

B. Evaluation

We built four neural language models on the TensorFlow framework [18], with three values (10, 20, 40) of the sequence-length parameter:

• Char-1-layer-40-seq: a simple char-level model with only one hidden layer and a sequence length of 40.
• Char-3-layer-40-seq: a more complex char-level model with three hidden layers and a sequence length of 40.
• Syl-1-layer-[10,20,40]-seq: three variants of a simple syllable-level model with one hidden layer and sequence lengths of 10, 20, and 40, respectively.
• Syl-3-layer-[10,20,40]-seq: three variants of a deep syllable-level model with three hidden layers and sequence lengths of 10, 20, and 40, respectively.

BASELINE: We used a bi-gram language model as the baseline(2). To handle unknown words, we used add-one Laplace smoothing.

Table II. Perplexity per syllable of our models.
Model                  PPL
Char-1-layer-40-seq    116
Char-3-layer-40-seq    109
Syl-1-layer-10-seq     85.6
Syl-3-layer-10-seq     89.1
Syl-1-layer-20-seq     83.8
Syl-3-layer-20-seq     95.5
Syl-1-layer-40-seq     99.7
Syl-3-layer-40-seq     95.7
Bi-gram                145

We measured perplexity, which is the inverse probability of the input sequence normalized by its length. The perplexity is computed as in Equation (4):

PPL = ( prod_{i=1..N} 1 / P(w_i | w_1 ... w_{i−1}) )^(1/N)   (4)

where N is the number of words and P(w_i | w_1 ... w_{i−1}) is the probability of word w_i given the previous sequence w_1 ... w_{i−1}.

(2) Note that this is not the most accurate n-gram model. However, we selected it because it is easy to implement, parameter-free, and a relatively strong baseline.

As shown in Table II, all neural models significantly outperform the bi-gram baseline. Our best model obtained a perplexity of 83.8: syllable-based, with one hidden layer, and trained with an RNN sequence length of 20. Note that the RNN sequence length signifies the maximum size of the unfolded recurrent network, i.e. the largest context used to generate a new character or syllable. The results show that increasing the context size to 40 in the syllable models does not help reduce overall perplexity. This could be due to the nature of our training corpus: it contains mostly short conversational sentences. We leave the investigation of this hypothesis as future work. Intuitively, the character language models in our setting are not better than the syllable models, as they have to first predict multiple characters in order to form a syllable. Stacking three RNN layers increases the number of learning parameters, giving the models more expressive power. However, in the cases of Syl-3-layer-[10,20]-seq, the larger number of learning parameters hurts the perplexity scores.
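As an illustration of Equation (4), the following is a minimal sketch of computing perplexity from per-token model probabilities; the probability values and the interface are assumptions for illustration, not output of our trained models.

import math

def perplexity(token_probs):
    """Equation (4): the geometric mean of inverse per-token probabilities.

    token_probs[i] is P(w_i | w_1 ... w_{i-1}) assigned by a language model.
    Summing log-probabilities avoids numerical underflow on long sequences.
    """
    n = len(token_probs)
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / n)

# Illustrative probabilities for a 4-token sequence (not real model output).
print(perplexity([0.2, 0.05, 0.1, 0.3]))   # ~7.6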
Figure 5 below shows several predictions from our best-performing syllable model, i.e. Syl-1-layer-20-seq in Table II.

START cô
Suggestions 1: ấy, 2: OOV, 3:
------------------------------
START cô ấy
Suggestions 1: , 2: bị, 3: đã
------------------------------
START cô ấy là
Suggestions 1: OOV, 2: một, 3: người
------------------------------
START cô ấy là bác
Suggestions 1: sĩ, 2: sỹ, 3:
------------------------------
START cô ấy là bác sĩ
Suggestions 1: , 2: OOV, 3: của
------------------------------
START cô ấy là bác sĩ tâm
Suggestions 1: thần, 2: lý, 3: lí
------------------------------
START cô ấy là bác sĩ tâm lý
Suggestions 1: , 2: của, 3: OOV

Figure 5. A sample syllable prediction suggesting the 3 most probable options at each step for the sentence "cô ấy là bác sĩ tâm lý" (she is a psychiatrist).

IV. RELATED WORK

This section discusses related work on language modeling for Vietnamese. Language modeling is an important NLP task; however, to our knowledge, only a handful of works focus on this task for Vietnamese [19], [20], [21].

In a pioneering work on language modeling for automatic speech recognition in Vietnamese, [19] applies n-gram statistical language modeling to a corpus of 10M sentences collected from Web documents. They report a perplexity per word of 252 with the best configuration.
A priori and a posteriori methods for calculating perplexity over n-gram language models for Vietnamese are also discussed in [20].

Perhaps closest to our direction is the work presented in [21]. They investigate a neural language model at the syllable level. Experiments were conducted on a corpus of 76k Vietnamese sentences. As reported, a feed-forward neural language model achieves a perplexity of 121.

Word-level RNN language models can also be built for Vietnamese, provided word-segmented texts are available. Both word-level and syllable-level language models have the drawback of handling new words: they need to be retrained whenever new words are added to the vocabulary. A character-level language model, on the other hand, has no problem with new words, since words are constructed from a predefined set of characters. Hence, a character-level language model not only predicts the next word but can also complete the current word. We also plan to investigate embedding vectors trained from a large text collection to better capture word semantics and to reduce the input vector size.

V. CONCLUSIONS

This paper presents a neural language model for Vietnamese. We investigate RNNs at both character and syllable levels. Experimental results show that the best configuration yields a perplexity of 83.8, which is a reasonable performance for a language model. All of our models outperform the n-gram language model in the experiments. For future work, we plan to investigate RNNs at the word level. We anticipate that embedding representations using techniques like convolutional neural networks will also improve both the accuracy and the efficiency of our approach.

ACKNOWLEDGMENT

This work is supported by Hanoi University of Science and Technology (HUST) under project No. T2016-PC-047.

REFERENCES

[1] R. Rosenfeld, "Two decades of statistical language modeling: Where do we go from here?" in Proceedings of the IEEE, vol. 88, 2000. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.29.463
[2] K. Yao, G. Zweig, M. Hwang, Y. Shi, and D. Yu, "Recurrent neural networks for language understanding," in INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013, pp. 2524–2528. [Online]. Available: http://www.isca-speech.org/archive/interspeech_2013/i13_2524.html
[3] S. F. Chen and J. Goodman, "An empirical study of smoothing techniques for language modeling," in Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, ser. ACL '96. Stroudsburg, PA, USA: Association for Computational Linguistics, 1996, pp. 310–318. [Online]. Available: http://dx.doi.org/10.3115/981863.981904
[4] Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin, "A neural probabilistic language model," J. Mach. Learn. Res., vol. 3, pp. 1137–1155, Mar. 2003. [Online]. Available: http://dl.acm.org/citation.cfm?id=944919.944966
[5] T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, "Recurrent neural network based language model," in INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010, pp. 1045–1048.
[6] M. Sundermeyer, I. Oparin, J.-L. Gauvain, B. Freiberg, R. Schlüter, and H. Ney, "Comparison of feedforward and recurrent neural network language models," in IEEE International Conference on Acoustics, Speech, and Signal Processing, Vancouver, Canada, May 2013, pp. 8430–8434.
[7] T. Mikolov and G. Zweig, "Context dependent recurrent neural network language model," in 2012 IEEE Spoken Language Technology Workshop (SLT), Miami, FL, USA, December 2-5, 2012, pp. 234–239. [Online]. Available: http://dx.doi.org/10.1109/SLT.2012.6424228
[8] M. Sundermeyer, R. Schlüter, and H. Ney, "LSTM neural networks for language modeling," in Interspeech, Portland, OR, USA, Sep. 2012, pp. 194–197.
[9] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, AISTATS 2005, Bridgetown, Barbados, January 6-8, 2005. [Online]. Available: http://www.gatsby.ucl.ac.uk/aistats/fullpapers/208.pdf
[10] R. Kiros, R. Salakhutdinov, and R. S. Zemel, "Multimodal neural language models," in Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, pp. 595–603. [Online]. Available: http://jmlr.org/proceedings/papers/v32/kiros14.html
[11] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157–166, 1994.
[12] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997. [Online]. Available: http://dx.doi.org/10.1162/neco.1997.9.8.1735
[13] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, "How to construct deep recurrent neural networks," CoRR, vol. abs/1312.6026, 2013.
[14] K. Nguyen and C. Ock, "Diacritics restoration in Vietnamese: Letter based vs. syllable based model," in PRICAI 2010: Trends in Artificial Intelligence, 11th Pacific Rim International Conference on Artificial Intelligence, Daegu, Korea, August 30-September 2, 2010, Proceedings, ser. Lecture Notes in Computer Science, B. Zhang and M. A. Orgun, Eds., vol. 6230. Springer, 2010, pp. 631–636.
[15] D. Dinh, P. Pham, and Q. Ngo, "Some lexical issues in building electronic Vietnamese dictionary," in Proceedings of the PAPILLON-2003 Workshop on Multilingual Lexical Databases, 2003.
[16] H. P. Le, N. T. M. Huyền, A. Roussanaly, and H. T. Vinh, "A hybrid approach to word segmentation of Vietnamese texts," in Language and Automata Theory and Applications, Second International Conference, LATA 2008, Tarragona, Spain, March 13-19, 2008, Revised Papers, ser. Lecture Notes in Computer Science, C. Martín-Vide, F. Otto, and H. Fernau, Eds., vol. 5196. Springer, 2008, pp. 240–249.
[17] G. A. Miller, "WordNet: A lexical database for English," Commun. ACM, vol. 38, no. 11, pp. 39–41, Nov. 1995. [Online]. Available: http://doi.acm.org/10.1145/219717.219748
[18] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin et al., "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," arXiv preprint arXiv:1603.04467, 2016.
[19] V. B. Le, B. Bigi, L. Besacier, and E. Castelli, "Using the web for fast language model construction in minority languages," in 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - INTERSPEECH 2003, Geneva, Switzerland, September 1-4, 2003. ISCA, 2003.
[20] L. Q. Ha, T. T. T. Van, H. T. Long, N. H. Tinh, N. N. Tham, and L. T. Ngoc, "A posteriori individual word language models for Vietnamese language," IJCLCLP, vol. 15, no. 2, 2010. [Online]. Available: http://www.aclclp.org.tw/clclp/v15n2/v15n2a2.pdf
[21] A. Gandhe, F. Metze, and I. R. Lane, "Neural network language models for low resource languages," in INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014, H. Li, H. M. Meng, B. Ma, E. Chng, and L. Xie, Eds. ISCA, 2014, pp. 2615–2619.