Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
A Vietnamese Language Model
Based on
Recurrent Neural Network
Viet-Trung Tran, Kiem-Hieu Nguyen, Duc-Hanh Bui
Hanoi Univer...
Outline
Statistical language model
Current state of the art
RNN for Vietnamese language model
Experimental results
Conclus...
Statistical language
model
A probability distribution of word sequence
E.g. “go to the airport”
? = P(“airport”|“go to the...
Challenges
Meaningful
grammatically correct
understandable
Context-aware
E.g. I am from Vietnam. My mother-tongue is Vietn...
Common approach
N-gram language model
Katz's back-off: estimates the conditional
probability of a word given its history in...
N-gram language model
Only see a few words back
Only predict words seen in the same context
6
Friday, October 7, 16
Deep learning for NLP
Word embedding
(SOCHER ET AL. (2013A))
MIKOLOV ET AL. (2013B).
7
Friday, October 7, 16
Recurrent neural
network for text
8
INPUT : GO TO THE
OUTPUT : TO THE SCHOOL
PROBABILITY (SCHOOL | GO TO THE)
Friday, Octo...
RNN vs. N-gram
Foldable word context vs. fix n-gam context
Personalization through continuous learning
More meaningful text...
RNN for Vietnamese
language model
Character level language model
{previous characters} -> next characters
Syllable level l...
LSTM cell
SOURCE: HTTP://COLAH.GITHUB.IO/POSTS/2015-08-
UNDERSTANDING-LSTMS/
11
Friday, October 7, 16
Stacking multiple layers
12
Friday, October 7, 16
Experiments
1,500 MOVIES - 2.056.308 SENTENCES
13
Friday, October 7, 16
Experimental results
14
Friday, October 7, 16
15
Friday, October 7, 16
Conclusion
First neural language model for Vietnamese
Largest experimental dataset
Future work
Word embedding
Neural net c...
Thank you for your
attention
17
Friday, October 7, 16
Conversational
Chú hoài linh đẹp trai. Chú hoài linh
Chào buổi sáng
chị hát hay wa!! nghe thick a.
chị khởi my ơi e rất la...
lịch sử ghi nhớ năm 1979
tại hội nghị, đồng chí Phạm Ngọc Thủy Võ Văn
Kiệt
tại hội nghị, đồng chí Hồ Chí Minh nói
tại hội ...
Upcoming SlideShare
Loading in …5
×

A Vietnamese Language Model Based on Recurrent Neural Network

Language modeling plays a critical role in many
natural language processing (NLP) tasks such as text prediction,
machine translation and speech recognition. Traditional
statistical language models (e.g. n-gram models) can only offer
words that have been seen before and can not capture long word
context. Neural language model provides a promising solution to
surpass this shortcoming of statistical language model. This paper
investigates Recurrent Neural Networks (RNNs) language model
for Vietnamese, at character and syllable-levels. Experiments
were conducted on a large dataset of 24M syllables, constructed
from 1,500 movie subtitles. The experimental results show that
our RNN-based language models yield reasonable performance
on the movie subtitle dataset. Concretely, our models outperform
n-gram language models in term of perplexity score.

A Vietnamese Language Model Based on Recurrent Neural Network

  1. 1. A Vietnamese Language Model Based on Recurrent Neural Network Viet-Trung Tran, Kiem-Hieu Nguyen, Duc-Hanh Bui Hanoi University of Science and Technology 1Friday, October 7, 16
  2. 2. Outline Statistical language model Current state of the art RNN for Vietnamese language model Experimental results Conclusion 2 Friday, October 7, 16
  3. 3. Statistical language model A probability distribution of word sequence E.g. “go to the airport” ? = P(“airport”|“go to the”) Applications: Spelling checkers, smart keyboards Enhance speed recognition/machine translation LABAN KEY 3 Friday, October 7, 16
  4. 4. Challenges Meaningful grammatically correct understandable Context-aware E.g. I am from Vietnam. My mother-tongue is Vietnamese Out of vocabulary Slang, abbreviations, etc. 4 Friday, October 7, 16
  5. 5. Common approach N-gram language model Katz's back-off: estimates the conditional probability of a word given its history in the n-gram When trigram unavailable -> back-off to bi-gram -> uni-gram SOURCE: HTTPS://EN.WIKIPEDIA.ORG/WIKI/KATZ%27S_BACK-OFF_MODEL 5 Friday, October 7, 16
  6. 6. N-gram language model Only see a few words back Only predict words seen in the same context 6 Friday, October 7, 16
  7. 7. Deep learning for NLP Word embedding (SOCHER ET AL. (2013A)) MIKOLOV ET AL. (2013B). 7 Friday, October 7, 16
  8. 8. Recurrent neural network for text 8 INPUT : GO TO THE OUTPUT : TO THE SCHOOL PROBABILITY (SCHOOL | GO TO THE) Friday, October 7, 16
  9. 9. RNN vs. N-gram Foldable word context vs. fix n-gam context Personalization through continuous learning More meaningful text suggestions Naturally support phrase, terms suggestions 9 Friday, October 7, 16
  10. 10. RNN for Vietnamese language model Character level language model {previous characters} -> next characters Syllable level language model {previous syllables} -> next syllables 10 Friday, October 7, 16
  11. 11. LSTM cell SOURCE: HTTP://COLAH.GITHUB.IO/POSTS/2015-08- UNDERSTANDING-LSTMS/ 11 Friday, October 7, 16
  12. 12. Stacking multiple layers 12 Friday, October 7, 16
  13. 13. Experiments 1,500 MOVIES - 2.056.308 SENTENCES 13 Friday, October 7, 16
  14. 14. Experimental results 14 Friday, October 7, 16
  15. 15. 15 Friday, October 7, 16
  16. 16. Conclusion First neural language model for Vietnamese Largest experimental dataset Future work Word embedding Neural net compression Conversational neural machine translation 16 Friday, October 7, 16
  17. 17. Thank you for your attention 17 Friday, October 7, 16
  18. 18. Conversational Chú hoài linh đẹp trai. Chú hoài linh Chào buổi sáng chị hát hay wa!! nghe thick a. chị khởi my ơi e rất la hâm mộ chú hoài linh thật đẹp zai và chú Trấn thành đẹp qá 18 Friday, October 7, 16
  19. 19. lịch sử ghi nhớ năm 1979 tại hội nghị, đồng chí Phạm Ngọc Thủy Võ Văn Kiệt tại hội nghị, đồng chí Hồ Chí Minh nói tại hội nghị, đồng chí Võ Nguyên Giáp và đồng chí Hồ Chí Minh đã ngồi ở tại đại hội Đảng lần thứ nhất vào năm 1945, Ngay từ những ngày đầu, Đúng như nhận xét của Giáo sư Nguyễn Văn Linh 19 Friday, October 7, 16

×