Understanding NLP and
Applying It in Practice with TensorFlow
WRITTEN BY SeungWooKim
tmddno1@gmail.com
Currently AI TFT Leader, POSCO IT Division
Development lead for the AI project support framework, POSCO IT Division
Development lead for the POSCO AI chatbot pilot service
In-house instructor for Big Data & AI, POSCO ICT
B.S. in Computer Engineering, Sungkyunkwan University
tmddno1@gmail.com
1. Docker environment for this lecture
https://github.com/TensorMSA/skp_edu_docker
2. Lecture source code
git clone https://github.com/TensorMSA/tensormsa_jupyter.git
Lecture Goals
"I want to serve pizza ordering through a chatbot messenger...
What data should I collect, which neural networks should I use, and how should I
design the architecture to reach that goal?"
Given a natural language processing problem like the one above, the goal is to gain
the insight to approach it from the perspective of data and deep learning.
[Next session]
A session on applying and combining the building blocks covered this time
at the application level, from an architecture point of view.
1.NLP & Deep Learning
2.Language Analysis Process
2-1.Voice Recognition
2-2.Lexical Analysis
2-2-1.Lexical Analysis Basic Process
2-2-2.Deep Learning on Lexical Analysis
2-2-3.Prerequisite Knowledge
2-2-4.BiLstmCrf for Named Entity Recognition
2-3.Syntactic Analysis
2-3-1.Dependency Parsing
2-3-2.Google SyntaxNet with Docker
2-4.Semantic Analysis
2-4-1.Semantic Role Labeling
2-4-2.Char CNN for Sentence Classification
2-5.Discourse Analysis
2-5-1.RNN for Understanding the Global Conversation
3.Language Generation
3-1.Basic Seq2Seq
3-2.Other types of Seq2Seq (Attention, Pointer)
4.Tips
4-1.Hyper Parameter Random Search
4-2.Genetic Algorithm for Hyper Parameter Search
4-3.Auto Hyper Parameter Search with Multi GPU Server
1.NLP & Deep Learning
NLP and Deep Learning
Today’s Focus
As in other fields such as image processing,
deep learning shows strong performance here,
but given the nature of the field it cannot
replace everything 100%.
Understanding the existing research areas is still important.
https://www.slideshare.net/ssuser06e0c5/ss-64417928
What’s NLP (Natural Language Process) ?
Let’s find out with examples
NLP Applications
Mostly Solved / Making Good Progress / Still Really Hard
Spam Detection
Text Categorization
Part of Speech Tagging
Named Entity Recognition
Information Extraction
Sentiment Analysis
Coreference Resolution
Word Sense Disambiguation
Syntactic Parsing
Machine Translation
Semantic Search
Question & Answer
Textual Inference
Summarization
Discourse & Dialog
NLP Applications
Text Categorization
Text classification assigns one or more classes to a document according to its content. Classes are
selected from a previously established taxonomy (a hierarchy of categories or classes).
Spam Detection
Spam detection is also a text classification problem.
Part of Speech Tagging
Part-of-speech tagging, also called grammatical tagging or word-category disambiguation, is the process
of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both
its definition and its context.
NLP Applications
Low Level Information Extraction
NLP Applications
Information Extraction on Broader view
https://www.google.co.kr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0ahUKEwievZKlmMzVAhVCgrwKHbM_D88QFggyMAE&url=https%3A%2F%2Fweb.stanford.edu%2Fclass%2Fcs124%2Flec%2FInformation_Extraction_and_Named_Entity_Recognition.pptx&usg=AFQjCNFUT9ZjvrDrxF9su0J9KiWobVP4Kg
(Diagram components: rule-based extraction, named entity recognition, syntactic analysis, relation search, and ontology, feeding into information extraction)
NLP Applications
Coreference Resolution
I did not vote for the Donald Trump because I think he is too reckless
Coreference resolution is the task of finding all expressions that refer to the same entity in a
text. It is an important step for a lot of higher level NLP tasks that involve natural language
understanding such as document summarization, question answering, and information
extraction.
Deep Reinforcement Learning for Mention-Ranking Coreference Models
Improving Coreference Resolution by Learning Entity-Level Distributed Representations
https://medium.com/huggingface/state-of-the-art-neural-coreference-resolution-for-chatbots-3302365dcf30
NLP Applications
Word Sense Disambiguation
[Example]
1. a type of fish
2. tones of low frequency
and the sentences:
1. I went fishing for some sea bass.
2. The bass line of the song is too weak.
http://www.cs.cornell.edu/courses/cs4740/2014sp/lectures/wsd-1.pdf
Supervised approach: labeled data examples
Semi-supervised approach: ontology-based
NLP Applications
Syntactic Parsing
Syntactic parsing finds the structural relationships between the words in a sentence.
https://web.stanford.edu/~jurafsky/slp3/12.pdf
NLP Applications
Machine Translation
Machine translation (MT) is automated translation. It is the process by which computer software is
used to translate a text from one natural language (such as English) to another (such as Spanish).
NLP Applications
Semantic Search
Semantic search seeks to improve search accuracy by understanding a searcher’s intent through
contextual meaning.
Question and Answer
Able to answer questions in natural language based on Knowledge data (usually ontology)
ex) Best example is IBM Watson
Textual Inference
Recognize, generate, or extract pairs <T,H> of natural language
expressions, such that a human who reads (and trusts) T would infer that H is most likely also true.
Summarization
Extracting the interesting parts of a text and creating a summary from them,
allowing rephrasing so that the summary is more grammatically correct.
Discourse & Dialog
Carrying on a conversation while understanding the whole dialog history and the semantic intent of the speaker.
Level of NLP
○ Pragmatics : use of language
○ Semantics : meaning of words & sentences
○ (Surface) Syntax : phrases & sentences
○ Morphology : morphemes, words
○ Phonology : phonemes (abstract units of speech sound)
○ Phonetics : phones (acoustic units of speech sound)
(Diagram annotations, from low to high level: speech and words → word formation → word order → word & sentence meaning → dialog intent & context)
2.Language Analysis Process
Language Analysis
Spoken Utterance → Speech Recognition → Written Utterance
→ Lexical Analysis : word structure (morphemes, words)
→ Syntactic Analysis : sentence structure (sentences)
→ Semantic Analysis : meaning of words & sentences
→ Discourse Analysis : relationships between sentences (context beyond the sentence)
2.Language Analysis Process
2-1.Voice Recognition
2-2.Lexical Analysis
2-3.Syntactic Analysis
2-4.Semantic Analysis
2-5.Discourse Analysis
Language Analysis - Speech Recognition
AI speaker (Alexa) / Alexa microphone system
Language Analysis - Speech Recognition
Deep Learning for classification / Hidden Markov Model for the language model
2.Language Analysis Process
2-1.Voice Recognition
2-2.Lexical Analysis
2-2-1.Lexical Analysis Basic Process
2-2-2.Deep Learning on Lexical Analysis
2-2-3.Prerequisite Knowledge
2-2-4.BiLstmCrf for Named Entity Recognition
Language Analysis - Lexical Analysis
Main steps in lexical analysis:
Sentence Splitting → Tokenizing → Morphological Analysis → Part of Speech Tagging
Lexical Analysis - Sentence Splitting & Tokenizing
What if there is no newline character ('\n')? Where is the end-of-sentence (EOS) point?
What if the sentence is not properly separated into words by spaces? (a small NLTK sketch follows)
[Examples]
[Problems]
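As a rough illustration of the problems above (not from the original notebook), a minimal sentence splitting and tokenizing sketch with NLTK; it assumes nltk is installed and the 'punkt' model can be downloaded.

import nltk
nltk.download("punkt", quiet=True)          # sentence-boundary model

text = ("Mr. Smith bought cheapsite.com for 1.5 million dollars. "
        "He paid a lot for it. Did he mind?")
for sent in nltk.sent_tokenize(text):       # EOS detection is harder than splitting on '.'
    print(nltk.word_tokenize(sent))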
Language Analysis - Lexical Analysis - Morphological
Word / Stemming / Lemmatization
Love Lov Love
Loves Lov Love
Loved Lov Love
Loving Lov Love
Innovation Innovat Innovation
Innovations Innovat Innovation
Innovate Innovat Innovate
Innovates Innovat Innovate
Innovative Innovat Innovative
Morphing examples: stemming & lemmatization
Morphology is the process of finding morphemes, the smallest "meaningful units" in a language (carrying
lexical meaning or a grammatical function), along with other information-bearing features such as the stem.
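As a hedged illustration, a minimal stemming-vs-lemmatization sketch with NLTK's PorterStemmer and WordNetLemmatizer; the table above may come from a different stemmer, so the exact outputs can differ.

import nltk
nltk.download("wordnet", quiet=True)        # lemmatization needs the WordNet corpus
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
for word in ["loves", "loved", "loving", "innovation", "innovations"]:
    print(word, "->", stemmer.stem(word), "/", lemmatizer.lemmatize(word, pos="v"))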
Language Analysis - Lexical Analysis - Part of Speech Tagging
Ambiguity
“that” can be a subordinating conjunction or a relative pronoun
- The fact that/IN you’re here
- A man that/WDT I know
“Around” can be a preposition, particle, or adverb
- I bought it at the shop around/IN the corner.
- I never got around/RP to getting a car.
- A new Toyota Prius costs around/RB $25K.
Degree of ambiguity (in Brown corpus)
- 11.5% of word types (40% of word tokens) are ambiguous
# of Tags 1 2 3 4 5 6 7
# of Words 35340 3760 264 61 12 2 1
※ The ambiguity problem is even more serious in Korean.
Part-of-speech tagging is one of the most important text analysis tasks: it classifies words into their
parts of speech and labels them according to a tagset, the collection of tags used for POS tagging.
Parts of speech are also known as word classes or lexical categories.
Language Analysis - Lexical Analysis - Implementation
Hannanum Kkma Komoran Mecab Twitter
하늘 / N 하늘 / NNG 하늘 / NNG 하늘 / NNG 하늘 / Noun
을 / J 을 / JKO 을 / JKO 을 / JKO 을 / Josa
나 / N 날 / VV 나 / NP 나 / NP 나 / Noun
는 / J 는 / ETD 는 / JX 는 / JX 는 / Josa
자동차 / N 자동차 / NNG 자동차 / NNG 자동차 / NNG 자동차 / Noun
Analysis result comparison / Library performance comparison
Language Analysis - Lexical Analysis - Implementation
[Code]
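A minimal sketch of the comparison above using KoNLPy (an assumption about the notebook's contents, not a copy of it; it assumes konlpy is installed, and the Twitter tagger was later renamed Okt in newer konlpy versions).

from konlpy.tag import Hannanum, Kkma, Komoran, Twitter

sentence = "하늘을 나는 자동차"
for name, tagger in [("Hannanum", Hannanum()), ("Kkma", Kkma()),
                     ("Komoran", Komoran()), ("Twitter", Twitter())]:
    print(name, tagger.pos(sentence))      # morpheme / POS-tag pairs, as in the table above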
Language Analysis - Lexical Analysis - Implementation
[Code]
Language Analysis - Lexical Analysis - Implementation
[Code]
2.Language Analysis Process
2-1.Voice Recognition
2-2.Lexical Analysis
2-2-1.Lexical Analysis Basic Process
2-2-2.Deep Learning on Lexical Analysis
2-2-3.Prerequisite Knowledge
2-2-4.BiLstmCrf for Named Entity Recognition
Language Analysis - Lexical Analysis
[Deep Learning - Sequence Labeling - BiLSTM-CRF]
(1) Word Segmentation
(2) POS Tagging
(3) Chunking
(4) Clause Identification
(5) Named Entity Recognition
(6) Semantic Role Labeling
(7) Information Extraction
What's sequence labeling / What we can do with sequence labeling
Language Analysis - Lexical Analysis
[Deep Learning - Sequence Labeling - BiLSTM-CRF]
Word POS Chunk NE
West NNP B-NP B-MISC
Indian NNP I-NP I-MISC
all-around NN I-NP O
Phil NNP I-NP B-PER
Simons NNP I-NP I-PER
took VBD B-VP O
four CD B-NP O
for IN B-PP O
38 CD B-NP O
on IN B-PP O
Friday NNP B-NP O
IOB data set example
POS tag meanings:
https://docs.google.com/spreadsheet/ccc?key=0ApcJghR6UMXxdEdURGY2YzIwb3dSZ290RFpSaUkzZ0E&usp=sharing
Chunk tag meanings:
B : begin of chunk
I : continuation of chunk
E : end of chunk
NP : noun phrase
VP : verb phrase
NER BIO tag meanings:
B : start of a new chunk
I : word inside a chunk
O : outside any chunk
Language Analysis - Lexical Analysis - Sequence Labeling
[Deep Learning - BiLSTM-CRF]
BiLSTM-CRF Description
Before we talk about BiLSTM-CRF, which is a really important algorithm for sequence labelling,
let's briefly go over the background knowledge we need.
2.Language Analysis Process
2-1.Voice Recognition
2-2.Lexical Analysis
2-2-1.Lexical Analysis Basic Process
2-2-2.Deep Learning on Lexical Analysis
2-2-3. Prerequisite Knowledge
2-2-4.BiLstmCrf for Named Entity Recognition
Language Analysis - Lexical Analysis - Check Prerequisite
[Those will be needed to understand what I am trying to explain]
Concept of perceptron
& Deep Neural Network
Concept of SoftMax
DNN & Matrix
Gradient Descent & Back Propagation
Activation Functions
Language Analysis - Brief Explanation
# tf Graph input
x = tf.placeholder("float", [None, 784])
y = tf.placeholder("float", [None, 10])
# Store layers weight & bias
weights = {
'h1': tf.Variable(tf.random_normal([784, 256])),
'h2': tf.Variable(tf.random_normal([256, 256])),
'out': tf.Variable(tf.random_normal([256, 10]))
}
biases = {
'b1': tf.Variable(tf.random_normal([256])),
'b2': tf.Variable(tf.random_normal([256])),
'out': tf.Variable(tf.random_normal([10]))
}
# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
layer_1 = tf.nn.relu(layer_1)
# Hidden layer with RELU activation
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
layer_2 = tf.nn.relu(layer_2)
# Output layer with linear activation, followed by softmax
pred = tf.matmul(layer_2, weights['out']) + biases['out']
hypothesis = tf.nn.softmax(pred)
# Define loss and optimizer (cross entropy over the softmax output)
learning_rate = 0.001  # assumed value; not shown on the slide
cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(hypothesis), reduction_indices=1))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
(Diagram: a fully connected 784-256-256-10 network; each layer computes Y = Activation(W*x + b), the output goes through softmax, and the error is cross-entropy.)
Language Analysis - Lexical Analysis - Check Prerequisite
[Those will be needed to understand what I am trying to explain]
Dynamic RNN / BiDirectional LSTM
Word Embedding / Recurrent Neural Network / LSTM (Long Short Term Memory)
Language Analysis - Brief Explanation
START 오늘 날씨 는 ? PAD PAD END
START 오늘 날씨 는 어때 ? PAD END
START 오늘 비가 오 려 나 ? END
In the case of long sentences, the vanishing gradient problem happens.
Data of varying lengths also causes wasted computation on padding.
Here we have the concept of a dynamic RNN.
A bidirectional LSTM additionally learns the given data in the backward direction.
(Diagram: the LSTM cell state with its forget, update, and output gates)
https://brunch.co.kr/@chris-song/9
https://blog.altoros.com/the-magic-behind-google-translate-sequence-to-sequence-models-and-tensorflow.html
Language Analysis - Word embedding
What is word embedding?
A way of representing the units that make up text (phonemes, syllables, words, sentences, documents) as numeric vectors.
Advantages: dimensionality reduction, representation of semantic similarity
Disadvantages: handling homonyms; weak training signal when data is scarce
Language Analysis - Word embedding - OneHot Encoding
Concept of OneHot Encoding
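A minimal one-hot encoding sketch with NumPy; the vocabulary here is only an illustrative example.

import numpy as np

vocab = ["김승우", "전화번호", "이메일", "검색"]        # illustrative vocabulary
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    vec = np.zeros(len(vocab), dtype=np.float32)
    vec[word_to_idx[word]] = 1.0                       # a single 1 at the word's index
    return vec

print(one_hot("전화번호"))    # [0. 1. 0. 0.]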
Language Analysis - Word embedding - Word2Vec
https://www.tensorflow.org/tutorials/word2vec
http://w.elnn.kr/search/
Concept of Word2Vector
Word2Vector Demo Site
Language Analysis - Word embedding - Word2Vec
C-BOW
Original text : the quick brown fox jumped over the lazy dog
Data set (window size : 1) : ([brown, jumped], fox)
(Diagram: vocabulary-sized input and output layers with a hidden layer of the embedding size; the context words predict the center word)
Language Analysis - Word embedding - Word2Vec
Original text : the quick brown fox jumped over the lazy dog
Data set (window size : 1) : (fox, brown), (fox, jumped)
(Diagram: vocabulary-sized input and output layers with a hidden layer of the embedding size; the center word predicts its context words)
Skip-Gram
Language Analysis - Word embedding - Doc2Vec
(1) PV-DM  (2) PV-DBOW
(3) DM + DBOW (vector concat)
(4) AVG(TF-IDF * W2V)
the quick brown fox jumped over the lazy dog
PV-DBOW training pairs: (paragraph, the), (paragraph, quick), (paragraph, brown), (paragraph, fox), (paragraph, jumped), ...
PV-DM training pairs: ([paragraph, quick, brown, fox, jumped], over), ([paragraph, quick, brown, fox, jumped, over], the)
(4) multiplies each word vector by its TF-IDF weight and averages the results: tfidf(t,d,D) = tf(t,d) x idf(t,D)
Language Analysis - Word embedding - TF-IDF
https://thinkwarelab.wordpress.com/2016/11/14/ir-tf-idf-%EC%97%90-%EB%8C%80%ED%95%B4-%EC%95%8C%EC%95%84%EB%B4%85%EC%8B%9C%EB%8B%A4/
http://www.popit.kr/bm25-elasticsearch-5-0%EC%97%90%EC%84%9C-%EA%B2%80%EC%83%89%ED%95%98%EB%8A%94-%EC%83%88%EB%A1%9C%EC%9A%B4-%EB%B0%A9%EB%B2%95/
Not exactly a word embedding, but used in NLP with deep learning pretty often:
- document similarity
- word importance within a document
- used in search engines (like Elasticsearch, though it uses BM25 now)
(a small scikit-learn sketch follows)
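A minimal TF-IDF sketch with scikit-learn's TfidfVectorizer, on a toy corpus, only to show the tf(t,d) x idf(t,D) weighting in practice.

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the quick brown fox", "the lazy dog", "the quick dog"]
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)       # sparse matrix of shape (n_docs, n_terms)

print(vectorizer.get_feature_names())        # the learned vocabulary
print(tfidf.toarray().round(2))              # rarer terms get higher weights than "the"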
Language Analysis - Word embedding - Char Embedding
- Several ways to embed characters as vectors, using 안녕하세요 as the example:
1. Syllable-level one-hot encoding : each syllable (가, 나, 다, 라, ...) gets its own one-hot vector
2. Romanized one-hot encoding : 안녕하세요 → An Neung Ha Se Yo, then each Latin letter (a, b, c, ...) gets a one-hot vector
3. Jamo-level one-hot encoding : 안녕하세요 → (ㅇ ㅏ ㄴ)(ㄴ ㅕ ㅇ) ..., then each jamo (ㄱ, ㄴ, ㄷ, ...) gets a one-hot vector
Language Analysis - Word embedding - Word+Char
the quick brown fox jumped over the lazy dog
(Diagram: the word "fox" represented as a Word2Vec vector, e.g. [0.2 0.1 0.4 0.21 0 0 0], concatenated with one-hot encodings of its characters f, o, x)
1. Word2Vec-style embeddings represent semantic relatedness well.
2. One-hot encodings give a strong, clean signal that is effective for training.
3. Word-level embeddings memorize words well.
4. Character-level embeddings handle unseen (untrained) words well.
Language Analysis - Word embedding - NGram
In the case of Word2Vec, the model can only represent words it was trained on:
words that do not exactly match the pretrained dictionary come back as "UNKNOWN".
So FastText (by Facebook) uses character n-grams in its word embedding algorithm.
Comparing 에어컨 (air conditioner) with 에어조단 (Air Jordan):
에어컨
['$$에', '$에어', '에어컨', '어컨$', '컨$$'] => 5
에어조단
['$$에', '$에어', '에어조', '어조단', '조단$', '단$$'] => 6
Matches
['$$에', '$에어'] => 2
Score
2 matching trigrams out of 9 distinct trigrams overall => 0.2222
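A minimal sketch of the character-trigram overlap score computed above ('$' is the padding character, as in the example).

def char_ngrams(word, n=3, pad="$"):
    padded = pad * (n - 1) + word + pad * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

a, b = char_ngrams("에어컨"), char_ngrams("에어조단")
print(sorted(a), len(a))                 # the 5 padded trigrams of 에어컨
print(sorted(b), len(b))                 # the 6 padded trigrams of 에어조단
print(len(a & b) / float(len(a | b)))    # 2 shared / 9 distinct trigrams ≈ 0.2222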
http://dataaspirant.com/2015/04/11/five-most-popular-similarity-measures-implementation-in-python/
Language Analysis - Word embedding - vector distance
Cosine Similarity
from math import sqrt

def square_rooted(x):
    return round(sqrt(sum([a * a for a in x])), 3)

def cosine_similarity(x, y):
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = square_rooted(x) * square_rooted(y)
    return round(numerator / float(denominator), 3)

print(cosine_similarity([3, 45, 7, 2], [2, 54, 13, 15]))
Language Analysis - Word embedding - Implementation
OneHot Encoding : simple test code showing the concept of one-hot encoding
http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/
[Code]
Language Analysis - Word embedding - Implementation
Word2Vector : Using Gensim word2vec package
http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/
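A minimal gensim Word2Vec sketch on a toy corpus; an assumption about what the notebook does, not a copy of it (parameter names follow the gensim 2.x/3.x API, where newer versions rename size to vector_size).

from gensim.models import Word2Vec

sentences = [["the", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"],
             ["the", "quick", "dog", "jumped", "over", "the", "fox"]]
model = Word2Vec(sentences, size=50, window=1, min_count=1, sg=1)   # sg=1: skip-gram, sg=0: CBOW

print(model.wv["fox"][:5])                  # first 5 values of the learned 50-dim vector
print(model.wv.most_similar("fox", topn=3))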
Language Analysis - Word embedding - Implementation
FastText : Facebook fastText with the gensim wrapper
http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/
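A hedged sketch of the same idea with gensim's own FastText model (assuming a gensim version whose FastText implementation accepts these parameter names; gensim also offers a wrapper around Facebook's fastText binary, which the notebook may use instead). The point is that an unseen word still gets a vector from its character n-grams.

from gensim.models import FastText

sentences = [["에어컨", "가격", "검색"], ["에어컨", "필터", "교체"]]
model = FastText(sentences, size=50, window=2, min_count=1, min_n=2, max_n=4)

print("에어조단" in model.wv.vocab)   # False: never seen during training
print(model.wv["에어조단"][:5])       # still gets a vector, built from shared character n-grams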
Language Analysis - Word embedding - Implementation
FastText : it is possible to load pretrained vectors and do fine-tuning on them
http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/
https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
Language Analysis - Word embedding - Implementation
N-grams are simply all combinations of adjacent words or letters of length n that you can
find in your source text.
Language Analysis - Word embedding - Implementation
For word2vec training on large datasets, GPU acceleration is needed.
You can also consider using TensorFlow or Keras to train the model.
https://github.com/SimonPavlik/word2vec-keras-in-gensim/blob/keras106/word2veckeras/word2veckeras.py
https://github.com/tensorflow/models/blob/master/tutorials/embedding/word2vec.py
2.Language Analysis Process
2-1.Voice Recognition
2-2.Lexical Analysis
2-2-1.Lexical Analysis Basic Process
2-2-2.Deep Learning on Lexical Analysis
2-2-3. Other prerequisite Knowledge
2-2-4.BiLstmCrf for Named Entity Recognition
Language Analysis - Lexical Analysis - Sequence Labeling
[Deep Learning - BiLSTM-CRF]
BiLSTM-CRF Description
http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/sequence_tagging/
Language Analysis - Lexical Analysis - Sequence Labeling
[Deep Learning - BiLSTM-CRF]
김승우 B-PERSON
전화번호 B-TARGET
검색 O
김승우 B-PERSON
이메일 B-TARGET
검색 O
김승우 B-PERSON
이미지 B-TARGET
검색 O
IOB Data
김승우 전화번호 검색
김승우 이메일 검색
김승우 이미지 검색
Plain Data
Lexical analysis pipeline: sentence splitting → tokenizing → morphological analysis → part-of-speech tagging
(Diagram: words such as 김승우, 전화번호, 이메일, 검색 are mapped to Word2Vec vectors or one-hot encodings, labels such as B-PERSON and B-TARGET to a label index list, and characters such as 김, 승, 우 to a character index list)
Language Analysis - Lexical Analysis - Sequence Labeling
[Deep Learning - BiLSTM-CRF]
(Diagram: the word index list for 김승우, 전화번호, 이메일, 검색, the label index list for B-PERSON, B-TARGET, and the character index list for 김, 승, 우)
[Code]
Language Analysis - Lexical Analysis - Sequence Labeling
[Deep Learning - BiLSTM-CRF]
(Diagram: the word vectors for 김승우, 전화번호, 이메일 are concatenated with the character vectors for 김, 승, 우 into one concatenated input vector)
[Code]
Language Analysis - Lexical Analysis - Sequence Labeling
[Deep Learning - BiLSTM-CRF]
(Diagram: the concatenated vectors for 김승우 전화번호 이메일 검색 go through a BiLSTM and a fully connected layer to predict the tags B-PERSON, B-TARGET, ...)
[Code]
Language Analysis - Lexical Analysis - Sequence Labeling
[Deep Learning - BiLSTM-CRF]
Conditional Random Field vs. Softmax
[Code]
Language Analysis - Lexical Analysis - Sequence Labeling
[Deep Learning - BiLSTM-CRF]
http://people.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf
Probabilistic model for sequence data segmentation and labeling
https://www.slideshare.net/kanimozhiu/tdm-probabilistic-models-part-2
The first method (softmax) makes local choices. In other words, even if we capture some information from the
context in our hidden states h thanks to the bi-LSTM, the tagging decision is still local. We don't make use of the
neighboring tagging decisions. For instance, in "New York", the fact that we are tagging "York" as a
location should help us decide that "New" corresponds to the beginning of a location. Given a
sequence of words w1, ..., wm, a sequence of score vectors s1, ..., sm and a
sequence of tags y1, ..., ym, a linear-chain CRF defines a global score s ∈ R.
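A minimal TensorFlow 1.x sketch of that idea, using the tf.contrib.crf API of that era (an illustration, not the notebook's exact code); `scores` stands in for the fully connected output of the BiLSTM.

import tensorflow as tf

num_tags = 5
scores = tf.placeholder(tf.float32, [None, None, num_tags])   # per-token unary scores from the BiLSTM + FC layer
tags = tf.placeholder(tf.int32, [None, None])                  # gold tag ids
lengths = tf.placeholder(tf.int32, [None])                     # true (unpadded) sentence lengths

# the CRF learns tag-transition scores and normalizes over whole tag sequences
log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(scores, tags, lengths)
loss = tf.reduce_mean(-log_likelihood)
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)

# at prediction time the globally best tag sequence is recovered with Viterbi decoding
pred_tags, _ = tf.contrib.crf.crf_decode(scores, transition_params, lengths)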
Language Analysis - Lexical Analysis - Sequence Labeling
[Deep Learning - BiLSTM-CRF]
Gradient
Descent
Momentum
NAG
Adagrad
Adadelta
Rmsprop
Adam
[Code]
Language Analysis - Lexical Analysis - Sequence Labeling
[Deep Learning - BiLSTM-CRF]
https://arxiv.org/pdf/1705.08292.pdf
"Solutions found with gradient descent (GD) or stochastic gradient descent (SGD) generalize
far better than solutions found with adaptive methods (e.g. AdaGrad, RMSProp, and Adam)."
The Marginal Value of Adaptive Gradient Methods in Machine Learning. Ashia C. Wilson, Rebecca Roelofs,
Mitchell Stern, Nathan Srebro, and Benjamin Recht. University of California, Berkeley / Toyota
Technological Institute at Chicago, May 24, 2017
There is no optimizer that is best for all cases!
When should you use an adaptive optimizer?
If the input embedding vectors are sparse, it's better to use an adaptive optimizer.
Language Analysis - Lexical Analysis - Sequence Labeling
[Deep Learning - BiLSTM-CRF]
Real-project BiLSTM results vs. the sample code's prediction test results
Test data not included in the training set
is still predicted well.
http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/sequence_tagging/
2.Language Analysis Process
2-1.Voice Recognition
2-2.Lexical Analysis
2-3.Syntactic Analysis
2-3-1.Dependency Parsing
2-3-2.Google SyntaxNet with Docker
Language Analysis - Syntactic Analysis
Syntactic parsing means decomposing a sentence into its constituents and analyzing the
hierarchical relationships between them to determine the structure of the sentence.
Graph-Based Models / Transition-Based Models
CYK-style parsing, MST-finding algorithms, projective & non-projective models
Language Analysis - Syntactic Analysis
Transition-Based Models
Sentence W
Repeat until all words have their head:
- Select two target words in the data structure (one dependent candidate & one head candidate)
- Deterministically predict the next parsing action with the parsing model
- Modify the structure according to the parsing action
(Diagram: configurations C0 → C1 → C2 → ... → Cm via transitions t1, t2, ..., tm, yielding the dependency tree; an oracle (classifier) predicts the best transition at each step)
Language Analysis - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Language Analysis - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Assume that we are given an oracle :
- for any non-terminal configuration, it can predict the correct transition
(for deterministic parsing)
- That is, it takes two words & magically gives us the dependency
relation between them if one exists
Language Analysis - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Shift :
Move Economic from buffer B to stack S
Language Analysis - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Left-arc :
Add left-arc (news, Economic, amod) to arc set A
Remove Economic from stack (since it now has head in A)
Language Analysis - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Shift :
Move news from buffer B to stack S
Language Analysis - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Left-arc :
Add left-arc (had, news, nsubj) to A
Remove news from stack (since it now has head in A)
Language Analysis - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Right-arc :
Add right-arc (ROOT, had, root) to A
keep had in stack : because it can have other dependents on the right
Language Analysis - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Left-arc :
Add left-arc (effect, little, amod) to A
Remove little from stack (since it now has head in A)
Language Analysis - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Right-arc :
Add right-arc (had, effect, dobj) to A
Keep effect in stack : because it can have other dependents on right
Language Analysis - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Right-arc :
Add right-arc (effect, on, prep) to A
Keep on in stack : because it can have other dependents on the right
Language Analysis - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Shift :
Move financial from buffer B to stack S
Language Analysis - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Left-arc :
Add left-arc (markets, financial, amod) to A
Remove financial from stack (since it now has head in A)
Language Analysis - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Right-arc :
Add right-arc (on, markets, pmod) to A
Keep markets in stack : because it can have other dependents on the right
Language Analysis - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Reduce :
Remove markets, on, effect from stack (since they already have head in A)
※ All decisions like right-arc, left-arc, reduce, shift will be made by oracle
Language Analysis - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Right-arc :
Add right-arc (had, period, p) to A
Keep period in stack
Done !
2.Language Analysis Process
2-1.Voice Recognition
2-2.Lexical Analysis
2-3.Syntactic Analysis
2-3-1.Dependency Parsing
2-3-2.Google SyntaxNet with Docker
Language Analysis - Syntactic Analysis - Syntax Net
We show this layout in the schematic below: the state of the system (a stack and a buffer, visualized
below for both the POS and the dependency parsing task) is used to extract sparse features, which
are fed into the network in groups. We show only a small subset of the features to simplify the
presentation in the schematic
Google SyntaxNet with Deep Learning - Pos Tagging
http://cs.stanford.edu/people/danqi/papers/emnlp2014.pdf
Language Analysis - Syntactic Analysis - Syntax Net
Google SyntaxNet with Deep Learning - A Fast and Accurate Dependency Parser using Neural Networks
https://arxiv.org/pdf/1603.06042.pdf
1 I _ PRP PRP _ 2 nsubj _ _
2 knew _ VBD VBD _ 0 ROOT _ _
3 I _ PRP PRP _ 5 nsubj _ _
4 could _ MD MD _ 5 aux _ _
5 do _ VB VB _ 2 ccomp _ _
6 it _ PRP PRP _ 5 dobj _ _
7 properly _ RB RB _ 5 advmod _ _
8 if _ IN IN _ 9 mark _ _
9 given _ VBN VBN _ 5 advcl _ _
10 the _ DT DT _ 12 det _ _
11 right _ JJ JJ _ 12 amod _ _
12 kind _ NN NN _ 9 dobj _ _
13 of _ IN IN _ 12 prep _ _
14 support _ NN NN _ 13 pobj _ _
15 . _ . . _ 2 punct _ _
Feature groups: 18 word units from (1),(2),(3), 18 POS-tag units from (1),(2),(3), and 12 arc-label units from (2),(3)
(1) The top 3 words on the stack and buffer: s1, s2, s3, b1, b2, b3; => 6
(2) The first and second leftmost / rightmost children of the top two words
on the stack: lc1(si), rc1(si), lc2(si), rc2(si), i = 1, 2. => 8
(3) The leftmost of leftmost / rightmost of rightmost children of the top two
words on the stack: lc1(lc1(si)), rc1(rc1(si)), i = 1, 2. => 4
Language Analysis - Syntactic Analysis - Syntax Net
Google SyntaxNet with Deep Learning - Local Parser
1. SHIFT: Push another word onto the top of the stack, i.e. shifting one token from the buffer to
the stack.
2. LEFT_ARC: Pop the top two words from the stack. Attach the second to the first, creating an
arc pointing to the left. Push the first word back on the stack.
3. RIGHT_ARC: Pop the top two words from the stack. Attach the second to the first, creating an
arc pointing to the right. Push the second word back on the stack.
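A toy, illustrative sketch of keeping the (stack, buffer, arc set) state and applying arc-standard transitions like the ones above; this is not the SyntaxNet implementation, just the bookkeeping, and the example action sequence is made up.

def shift(stack, buffer, arcs):
    stack.append(buffer.pop(0))                      # move the next buffer word onto the stack

def left_arc(stack, buffer, arcs):
    arcs.append((stack[-1], stack[-2]))              # top word becomes the head of the second word
    del stack[-2]                                    # the new dependent leaves the stack

def right_arc(stack, buffer, arcs):
    arcs.append((stack[-2], stack[-1]))              # second word becomes the head of the top word
    stack.pop()                                      # the new dependent leaves the stack

stack, buffer, arcs = ["ROOT"], ["I", "knew", "it"], []
for action in [shift, shift, left_arc, shift, right_arc, right_arc]:
    action(stack, buffer, arcs)
print(arcs)   # [('knew', 'I'), ('knew', 'it'), ('ROOT', 'knew')]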
Language Analysis - Syntactic Analysis - Syntax Net
As we describe in the paper, there are several problems with the locally normalized models we just
trained. The most important is the label-bias problem: the model doesn't learn what a good parse
looks like, only what action to take given a history of gold decisions. This is because the scores are
normalized locally using a softmax for each decision.
Google SyntaxNet with Deep Learning - Global Training
Language Analysis - Syntactic Analysis - Syntax Net
What’s Beam Search Algorithm on RNN ?
https://www.youtube.com/watch?v=UXW6Cs82UKo
Instead of keeping only the single best candidate at every step, follow several candidates to the end and choose
the one whose total score is maximal. Trying every case would make the algorithm far too heavy, so at each
step only the best few candidates are kept and the others are removed (pruning). This is how a more globally optimal prediction is found.
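A minimal, model-agnostic beam search sketch; `toy_step` is a hypothetical stand-in for a model that scores the next token given the current prefix.

import math

def beam_search(step_log_probs, vocab, max_len, beam_size=3):
    beams = [([], 0.0)]                                   # (token sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, logp in step_log_probs(seq, vocab):  # expand every surviving beam
                candidates.append((seq + [tok], score + logp))
        # pruning: keep only the best `beam_size` partial sequences
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_size]
    return beams[0]                                       # best sequence and its total score

def toy_step(seq, vocab):
    # hypothetical stand-in: a real model would score the next token given the prefix `seq`
    return [(tok, math.log(1.0 / len(vocab))) for tok in vocab]

print(beam_search(toy_step, ["a", "b", "c"], max_len=4))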
Language Analysis - Syntactic Analysis - Syntax Net
http://universaldependencies.org/
Google SyntaxNet does not support Korean as a default language.
But as we can see below, we can train the model with the Sejong corpus data,
though we have to convert the format into one SyntaxNet understands.
Google SyntaxNet with Deep Learning - How about Korean
Language Analysis - Syntactic Analysis - Syntax Net
Demo Site (we also use samples on this site)
http://sejongpsg.ddns.net/syntaxnet/psg_tree.htm
SyntaxNet Korean with Docker (we pretrained on the Korean corpus and set up a web server for the service)
https://github.com/TensorMSA/tensormsa_syntax_docker
Google SyntaxNet with Deep Learning - Test it by yourself
2.Language Analysis Process
2-1.Voice Recognition
2-2.Lexical Analysis
2-3.Syntactic Analysis
2-4.Semantic Analysis
2-4-1.Semantic Role Labeling
2-4-2.Char CNN for Sentence Classification
2-5.Discourse Analysis
Sentential semantics
- Semantic role labeling (SRL)
- Phrase similarity (=paraphrase)
- Sentence classification, sentence emotion analysis, etc.
Language Analysis - Semantic Analysis
What is semantics in the study of language?
Three perspectives on meaning
- Lexical semantics : individual words
- Sentential semantics : individual sentences
- Discourse or Pragmatics : longer piece of text or conversation
NLP Tasks for Semantics
Language Analysis - Semantic Analysis - SRL
What is Semantic Role Labeling (SRL)
SRL = Semantic roles express the abstract role that arguments of a predicate
can take in the event.
The police arrested the suspect in the park last night
Agent predicate Theme Location Time
Who did what to whom where when
Can we figure out that these sentences have the same meaning?
Can we figure out that bought, sold, and purchase are used in sentences with the same meaning?
XYZ corporation bought the stock.
They sold the stock to XYZ corporation.
The stock was bought by XYZ corporation.
The purchase of the stock by XYZ corporation.
Language Analysis - Semantic Analysis - SRL
Common Semantic Role Labeling Architecture
http://naacl2013.naacl.org/Documents/semantic-role-labeling-part-1-naacl-2013-tutorial.pdf
(Pipeline diagram: syntactic parse → prune constituents → argument identification (candidates) → argument classification (arguments) → structural inference (semantic roles), each stage driven by an ML model)
Step-1 Candidate Selection
- Parse the sentence
- Prune/filter the parse tree
(eliminate some tree constituents to speed up the execution)
Step-2 Argument Identification
- A binary classification of each node as Argument or NONE
- Local scoring
Step-3 Argument Classification
- A multi class (one-of-N) classification of all the argument candidates
- Global /joint scoring
Language Analysis - Semantic Analysis - SRL
Exceptions to the Standard Architecture
1. Specialized parsing for SRL
- Syntactic parser trained to predict argument
candidates
- Semantic parsing = parsing + SRL
- SRL based on dependency parsing
2. Sequential labeling (instead of tree traversing)
- Motivated by Lack of full parse trees
Language Analysis - Semantic Analysis - SRL
Semantic Role Labeling Applications
Information : Anna is a friend of mine.
http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/neo4j/neo4j_basic.ipynb
(Graph: a Person node "You" linked to a Person node "Anna" by a FRIEND relation)
Neo4j insert query (driver setup added so the snippet runs; the bolt URL and credentials are placeholder assumptions)

from neo4j.v1 import GraphDatabase, basic_auth

driver = GraphDatabase.driver("bolt://localhost:7687", auth=basic_auth("neo4j", "neo4j"))
session = driver.session()

session.run("MATCH (you:Person {name:'You'}) "
            "FOREACH (name in ['Anna'] |"
            " CREATE (you)-[:FRIEND]->(:Person {name:name}))")

result = session.run("MATCH (you {name:'You'})-[:FRIEND]->(yourFriends) "
                     "RETURN you, yourFriends")

Neo4j Jupyter example & visualization
2.Language Analysis Process
2-1.Voice Recognition
2-2.Lexical Analysis
2-3.Syntactic Analysis
2-4.Semantic Analysis
2-4-1.Semantic Role Labeling
2-4-2.Char CNN for Sentence Classification
2-5.Discourse Analysis
Language Analysis - Semantic Analysis - Text Classification
Can we figure out whether these sentences are positive or negative?
돈이 아깝지 않다 (It was worth the money: positive)
다시는 오지 않을 거야 (I will never come back: negative)
음식이 정말 맛이 없다 (The food really tastes bad: negative)
이 식당은 정말 맛있다 (This restaurant is really good: positive)
Analyzing positive vs. negative with a sentiment dictionary:
the word "않다" (not) is usually a negative signal, but?
돈이 아깝지 않다 => positive
다시는 오지 않을 거야 => negative
There are many ways of doing text classification:
traditional rule-based approaches, machine learning (logistic regression & SVM),
and deep learning (CharCNN, RNN, etc.)
Language Analysis - Semantic Analysis - Text Classification
Language Analysis - Semantic Analysis - Char CNN
http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb
The deep learning method CharCNN can be a solution for this kind of problem.
Language Analysis - Semantic Analysis - Char CNN
http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb
Preparing the data for embedding is pretty similar to other neural networks.
1. Word embedding & one-hot did not show much of a difference.
2. Personally, I prefer to concatenate char one-hot + word2vec.
Example sentence with padding: 오늘 / 메뉴 / 는 / 뭐 / 지? / PAD / PAD
1. Need to define a maximum sentence length
2. Need padding, like other NLP neural networks (see the sketch below)
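A minimal padding sketch matching the PAD example above (the maximum length and the PAD symbol are illustrative choices).

def pad_tokens(tokens, max_len, pad="PAD"):
    tokens = tokens[:max_len]                      # truncate sentences that are too long
    return tokens + [pad] * (max_len - len(tokens))

print(pad_tokens(["오늘", "메뉴", "는", "뭐", "지?"], max_len=7))
# ['오늘', '메뉴', '는', '뭐', '지?', 'PAD', 'PAD']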
Language Analysis - Semantic Analysis - Char CNN
http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb
Using multiple convolution filter sizes (see the sketch below)
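A minimal TensorFlow 1.x sketch of convolutions with several filter widths followed by max-over-time pooling; the layer sizes are illustrative assumptions, not the notebook's exact values.

import tensorflow as tf

max_len, embed_dim, num_filters = 20, 50, 32
x = tf.placeholder(tf.float32, [None, max_len, embed_dim])      # embedded sentence
x_4d = tf.expand_dims(x, -1)                                    # [batch, max_len, embed_dim, 1]

pooled = []
for width in [2, 3, 4]:                                         # several filter sizes in parallel
    conv = tf.layers.conv2d(x_4d, num_filters, [width, embed_dim], activation=tf.nn.relu)
    pool = tf.reduce_max(conv, axis=1)                          # max-over-time pooling
    pooled.append(tf.reshape(pool, [-1, num_filters]))

features = tf.concat(pooled, axis=1)                            # [batch, 3 * num_filters], fed to the FC layer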
Language Analysis - Semantic Analysis - Char CNN
http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb
The other steps are the same (fully connected > softmax > loss > optimizer).
Language Analysis - Semantic Analysis - Char CNN
http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb
You can see that the Char CNN distinguishes the two sentences.
2.Language Analysis Process
2-1.Voice Recognition
2-2.Lexical Analysis
2-3.Syntactic Analysis
2-4.Semantic Analysis
2-5.Discourse Analysis
2-5-1.RNN for Understanding the Global Conversation
2-5-2.Memory Network for global context
Language Analysis - Dialogue Understand
https://research.fb.com/publications
Getting to a natural language dialogue state with a chatbot remains
a challenge and will require a number of research breakthroughs. At
FAIR we have chosen to tackle the problem from both ends:
general AI and reasoning by machines through communication as
well as conducting research grounded in current dialog systems,
using lessons learned from exposing actual chatbots to people.
The attempt to understand and interpret dialogue is not a new one.
As far back as 20 years, there were several efforts to build a machine
a person could talk to and teach how to have a conversation. These
incorporated technology and engineering, but were single purposed
with a very narrow focus, using pre-programmed scripted responses.
Thanks to progress in machine learning, particularly in the last few
years, having AI agents being able to converse with people in natural
language has become a more realistic endeavor that is garnering
attention from both the research community and industry.
However, most of today’s dialogue systems continue to be scripted:
their natural language understanding module may be based on
machine learning, but what they execute or answer is in general
dictated by if/then statements or rules engines. While they are
improvement on what existed decades ago, it is in large part due to
the large databases of content used to create and script their
responses.
Amazing free papers!! read it right now!
Discourse Analysis with RNN
In a conversation the topic changes often, so keeping track of the topic of the conversation is important.
Hi
Hi
What can you do?
These are the features I support: XX
I'd like to find someone.
Who should I look for?
Find the group leader of 홍길동, the team leader in charge of IT in the steelmaking department at 포항.
Searching with location: 포항, department: steelmaking, job: IT, name: 홍길동, title: team leader, target: group leader.
Send a text saying "let's have lunch tomorrow."
Sending the message "Let's have lunch tomorrow."
No, never mind. Thanks. Let's just go out to eat.
Resetting the conversation.
State : initial state
State : help state
State : person search state
State : send a text to the person just found
State : initial state
Dialogue State Tracking Challenge and Accepted papers
Discourse Analysis with RNN
http://www.phontron.com/paper/yoshino16iwsds.pdf
http://www.colips.org/workshop/dstc4/papers.html
* Dialogue State Tracking using Long Short Term Memory Neural Networks
Koichiro Yoshino, Takuya Hiraoka, Graham Neubig and Satoshi Nakamura
Let's predict the intent of each sentence in the conversation.
The basic idea is to keep the RNN state and to continue the prediction from that point.
(Diagram: along the timeline, each utterance is embedded with Doc2Vec and fed to an LSTM that predicts an intent at every turn)
Dialogue state tracking with LSTM
The key point of this code is using the RNN state vector as memory (a minimal sketch follows).
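A minimal TensorFlow 1.x sketch of that idea: the LSTM's final state from one turn is fed back as the initial state of the next turn, so the intent prediction can depend on the whole conversation so far. The sizes and the intent head are illustrative assumptions, not the notebook's exact code.

import tensorflow as tf

hidden_size, embed_dim, num_intents = 64, 50, 10
turn = tf.placeholder(tf.float32, [None, None, embed_dim])     # one embedded utterance per turn
prev_c = tf.placeholder(tf.float32, [None, hidden_size])       # LSTM state carried over from the last turn
prev_h = tf.placeholder(tf.float32, [None, hidden_size])
init_state = tf.nn.rnn_cell.LSTMStateTuple(prev_c, prev_h)

cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)
outputs, final_state = tf.nn.dynamic_rnn(cell, turn, initial_state=init_state)
intent_logits = tf.layers.dense(outputs[:, -1, :], num_intents)    # intent of the current turn
# at run time: feed zero states for the first turn, then feed `final_state` back in for turn t+1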
Discourse Analysis with RNN
http://localhost:8888/tree/chap05_nlp/state_tracking
2.Language Analysis Process
2-1.Voice Recognition
2-2.Lexical Analysis
2-3.Syntactic Analysis
2-4.Semantic Analysis
2-5.Discourse Analysis
2-5-1.RNN for Understanding the Global Conversation
2-5-2.Memory Network for global context
The goal of dialogue understanding and memory networks..
Memory Network for Dialogue understand
https://arxiv.org/pdf/1503.08895v4.pdf
Here is the network architecture of the end-to-end memory network.
Memory Network for Dialogue understand
https://yerevann.github.io/2016/02/05/implementing-dynamic-memory-networks/
https://www.slideshare.net/mobile/carpedm20/ss-63116251
(1) Feed data (“Sentences”, “Question”, “Target”)
Memory Network for Dialogue understand
Convert word indices to embedding vectors (training the embedding matrices A, B, C)
Memory Network for Dialogue understand
(Diagram: each embedding matrix has shape [vocab size, embedding dim]; the memory holds mem-size embedded sentences)
Multiply embedding A of the given context sentences with the input question embedding (embedding B,
which is not defined separately in this code). ※ This is for the first layer; for later layers the question input is the output of layer t-1.
(Diagram: the inner product of each memory vector with the question vector)
Set embedding C (in the code it is B); this is also a trained target variable.
Memory Network for Dialogue understand
Multiply embedding C (in the code it is B) by the softmax attention result.
Memory Network for Dialogue understand
At the end, combine the question with the output of the memory network once more.
Memory Network for Dialogue understand
Stack more memory layers (hops).
Memory Network for Dialogue understand
Memory Network for Dialogue understand
Set a fully connected layer and calculate the error with softmax cross-entropy (a minimal sketch of one hop follows).
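A minimal TensorFlow 1.x sketch of one memory hop, following the end-to-end memory network equations; the dimensions are illustrative and the actual notebook code differs in details such as which embedding matrices are shared.

import tensorflow as tf

vocab_size, embed_dim, mem_size, sent_len = 200, 32, 10, 6
stories = tf.placeholder(tf.int32, [None, mem_size, sent_len])    # context sentences (word ids)
question = tf.placeholder(tf.int32, [None, sent_len])             # question (word ids)
answer = tf.placeholder(tf.int32, [None])                         # target word id

A = tf.Variable(tf.random_normal([vocab_size, embed_dim]))        # input memory embedding
B = tf.Variable(tf.random_normal([vocab_size, embed_dim]))        # question embedding
C = tf.Variable(tf.random_normal([vocab_size, embed_dim]))        # output memory embedding

m = tf.reduce_sum(tf.nn.embedding_lookup(A, stories), axis=2)     # [batch, mem_size, dim]
u = tf.reduce_sum(tf.nn.embedding_lookup(B, question), axis=1)    # [batch, dim]
c = tf.reduce_sum(tf.nn.embedding_lookup(C, stories), axis=2)     # [batch, mem_size, dim]

p = tf.nn.softmax(tf.reduce_sum(m * tf.expand_dims(u, 1), axis=2))    # attention over memories
o = tf.reduce_sum(c * tf.expand_dims(p, 2), axis=1)                   # weighted sum of output memories
logits = tf.layers.dense(o + u, vocab_size)                           # fully connected answer layer
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=answer, logits=logits))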
Memory Network for Dialogue understand
In the provided code I removed 90% of the data set because we are using a CPU for this class,
so the results may be poor.
Memory Network for Dialogue understand
bAbI test results (comparing DMN & MemNN)
https://research.fb.com/downloads/babi/
https://yerevann.github.io/2016/02/05/implementing-dynamic-memory-networks/
https://github.com/YerevaNN/Dynamic-memory-networks-in-Theano
Dynamic Memory Networks Episodic Memory
Memory Network for Dialogue understand
Other types of memory networks ..
1.NLP & Deep Learning
2.Language Analysis Process
3.Language Generation
3-1.Basic Seq2Seq
3-2.Other types of Seq2Seq (Attention, Pointer)
Response Generator - Seq2Seq Model
The Seq2Seq model can be applied to any case where both the input and the output are sequence data,
such as machine translation, summarization, and simple question answering, and with a simple trick it can also be used to generate responses.
- Input : 딥 러닝 재미 즐거운 일
- Output : 딥 러닝은 재미있고 즐거운 일이다
https://arxiv.org/pdf/1406.1078.pdf
https://www.slideshare.net/KeonKim/attention-mechanisms-with-tensorflow
Attention Mechanism Pointer Network
https://medium.com/@devnag/pointer-networks-in-tensorflow-
with-sample-code-14645063f264
Variants of Seq2Seq...
Response Generator - Seq2Seq Model
※ Details are omitted here; they will be covered in depth in the next lecture.
http://localhost:8888/tree/chap05_nlp/attention_seq2seq
In the end, natural language processing is a huge combination of "existing NLP algorithms",
"deep learning algorithms", and all kinds of "software architecture".
Conclusion
Existing NLP theory
Deep learning theory
Software architecture
Conclusion
Let's connect everything we have discussed so far into one example.
(Architecture diagram, ingestion side: web documents → web crawler → lexical analysis → syntactic analysis → semantic analysis → dialogue analysis → information → manual filtering → ontology)
(Architecture diagram, serving side: IN → web server → lexical analysis → syntactic analysis → semantic analysis → dialogue analysis → response generation → OUT)
4.Tips
4-1.Hyper Parameter Random Search
4-2.Genetic Algorithm
4-3.Using multiple GPU Server
Hyper Parameter Optimization
(Diagram: sets of graph flows sampled from a hyperparameter range, explored by hyperparameter random search and approximated by a genetic algorithm)
Explanation of the genetic algorithm for hyperparameter search
Hyper Parameter Optimization
Explanation of hyperparameter random search
http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
In this more challenging optimization problem random search is still effective, but not
superior as it was in the case of neural network optimization. Comparing to the 3-layer DBN results
in Larochelle et al. (2007), random search found a better model than the manual search in one data
set (convex), an equally good model in four (mnist basic, mnist rotated, rectangles, and rectangles
images), and an inferior model in three (mnist background images, mnist background random,
mnist rotated background images).
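A minimal random-search sketch over a hyperparameter range; `train_and_score` is a hypothetical stand-in for training a model with the sampled hyperparameters and returning its validation score, and the ranges are illustrative.

import random

def sample_params():
    # one random draw from the search space (ranges are illustrative)
    return {"learning_rate": 10 ** random.uniform(-4, -1),      # log-uniform between 1e-4 and 1e-1
            "hidden_size": random.choice([64, 128, 256, 512]),
            "dropout": random.uniform(0.1, 0.7)}

best = None
for _ in range(20):                      # 20 random trials instead of a full grid
    params = sample_params()
    score = train_and_score(params)      # hypothetical: train the model and return validation accuracy
    if best is None or score > best[0]:
        best = (score, params)
print(best)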
Hyper Parameter Optimization
[1 layer] Grid vs. Random | [3 layers] Grid+Manual vs. Random
Hyper Parameter Optimization
Genetic Algorithm on Hyper parameter optimization (Approximation)
https://blog.coast.ai/lets-evolve-a-neural-network-with-a-genetic-algorithm-code-included-8809bece164
Let’s say it takes five minutes to train and evaluate a network on your dataset. And let’s say we have four parameters with
five possible settings each. To try them all would take (5**4) * 5 minutes, or 3,125 minutes, or about 52 hours.
Now let's say we use a genetic algorithm to evolve 10 generations with a population of 20 (more on what this means
below), with a plan to keep the top 25% plus a few more, so ~8 per generation. This means that in our first generation we
score 20 networks (20 * 5 = 100 minutes). Every generation after that only requires around 12 runs, since we don't have
to score the ones we keep. That's 100 + (9 generations * 5 minutes * 12 networks) = 640 minutes, or about 11 hours.
https://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol1/hmw/article1.html
Use multi-GPU cluster servers for hyperparameter random search.
Hyper Parameter Optimization
Let's see how hyperparameter optimization with a genetic algorithm works.
http://localhost:8888/tree/chap05_nlp/automl
Goals for the next lecture
This lecture was meant to help you understand the data and the models needed
to apply deep learning from an NLP perspective.
In the next session we will put these building blocks together and cover
how to apply and use them from an architecture perspective.
Thank you.
 
Graph neural network 2부 recommendation 개요
Graph neural network  2부  recommendation 개요Graph neural network  2부  recommendation 개요
Graph neural network 2부 recommendation 개요seungwoo kim
 
Graph Neural Network 1부
Graph Neural Network 1부Graph Neural Network 1부
Graph Neural Network 1부seungwoo kim
 
Enhancing VAEs for collaborative filtering : flexible priors & gating mechanisms
Enhancing VAEs for collaborative filtering : flexible priors & gating mechanismsEnhancing VAEs for collaborative filtering : flexible priors & gating mechanisms
Enhancing VAEs for collaborative filtering : flexible priors & gating mechanismsseungwoo kim
 
Deep neural networks for You-Tube recommendations
Deep neural networks for You-Tube recommendationsDeep neural networks for You-Tube recommendations
Deep neural networks for You-Tube recommendationsseungwoo kim
 
XAI recent researches
XAI recent researchesXAI recent researches
XAI recent researchesseungwoo kim
 
Siamese neural networks+Bert
Siamese neural networks+BertSiamese neural networks+Bert
Siamese neural networks+Bertseungwoo kim
 
MRC recent trend_ppt
MRC recent trend_pptMRC recent trend_ppt
MRC recent trend_pptseungwoo kim
 

More from seungwoo kim (10)

Graph neural network #2-2 (heterogeneous graph transformer)
Graph neural network #2-2 (heterogeneous graph transformer)Graph neural network #2-2 (heterogeneous graph transformer)
Graph neural network #2-2 (heterogeneous graph transformer)
 
Graph Neural Network #2-1 (PinSage)
Graph Neural Network #2-1 (PinSage)Graph Neural Network #2-1 (PinSage)
Graph Neural Network #2-1 (PinSage)
 
Graph neural network 2부 recommendation 개요
Graph neural network  2부  recommendation 개요Graph neural network  2부  recommendation 개요
Graph neural network 2부 recommendation 개요
 
Graph Neural Network 1부
Graph Neural Network 1부Graph Neural Network 1부
Graph Neural Network 1부
 
Enhancing VAEs for collaborative filtering : flexible priors & gating mechanisms
Enhancing VAEs for collaborative filtering : flexible priors & gating mechanismsEnhancing VAEs for collaborative filtering : flexible priors & gating mechanisms
Enhancing VAEs for collaborative filtering : flexible priors & gating mechanisms
 
Deep neural networks for You-Tube recommendations
Deep neural networks for You-Tube recommendationsDeep neural networks for You-Tube recommendations
Deep neural networks for You-Tube recommendations
 
XAI recent researches
XAI recent researchesXAI recent researches
XAI recent researches
 
Albert
AlbertAlbert
Albert
 
Siamese neural networks+Bert
Siamese neural networks+BertSiamese neural networks+Bert
Siamese neural networks+Bert
 
MRC recent trend_ppt
MRC recent trend_pptMRC recent trend_ppt
MRC recent trend_ppt
 

Recently uploaded

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Recently uploaded (20)

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

NLP Deep Learning with Tensorflow

  • 1. NLP 에 대한 이해와 Tensorflow 를 활용한 실무 적용 WRITTEN BY SeungWooKim tmddno1@gmail.com
  • 2. 현 POSCO IT 사업부 - AI TFT 리더 POSCO IT 사업부 AI 프로젝트 지원 FrameWork 개발 리더 POSCO AI Chat Bot 시범 서비스 개발 리더 POSCO ICT BigData & AI 사내 강사 성균관대학교 컴퓨터 공학 전공 tmddno1@gmail.com
  • 3. 1. 강의 도커 환경 https://github.com/TensorMSA/skp_edu_docker 2. 강의 소스 코드 git clone https://github.com/TensorMSA/tensormsa_jupyter.git
  • 4. 강의 목표 "피자 주문을 ChatBot Messenger 를 통해서 서비스 하고 싶다.. 어떤 데이터를 수집하고, 어떤 신경망을 사용하고, 어떻게 아키택쳐를 구성해야 목표를 달성 할 수 있을까?" 예를 들어 위와 같이 자연어 처리와 관련된 어떤 문제가 주어졌을 때 데이터와 딥러닝 관점에서 문제를 접근 할 수 있는 통찰력 획득 [다음 세션] 이번 시간에 배운 재료를 아키택쳐 관점에서의 어플리케이션 레벨에서 적용하고 응용하는 방법에 대한 세션
  • 5. 1.NLP & Deep Learning 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-2-1.Lexical Analysis Basic Process 2-2-2.Deep Learning on Lexical Analysis 2-2-3.Prerequisite Knowledge 2-2-4.BiLstmCrf for Named Entity Recognition 2-3.Syntactic Analysis ㅛ 2-3-1.Dependency Parsing 2-3-2.Google SyntaxNet with Docker 2-4.Semantic Analysis 2-4-1.Semantic Role Labeling 2-4-2.Char CNN for Sentence Classification 2-5.Discourse Analysis 2-5-1.RNN for understand global Conversation
  • 6. 3.Language Generation 3-1.Basic Seq2Seq 3-2.Other types of Seq2Seq (Attention, Pointer) 4.Tips 4-1.Hyper Parameter Random Search 4-2.Genetic Algorithm for Hyper Parameter Search 4-3.Auto Hyper Parameter Search with Multi GPU Server
  • 7. 1.NLP & Deep Learning
  • 8. NLP and Deep Learning Today’s Focus 이미지등 다른 분야와 마찬가지로 DL 이 좋은 성능을 보여주지만, 분야의 특성상 100% DL 로 대체될 수는 없다. 기존 연구 분야에 대한 이해 중요 https://www.slideshare.net/ssuser06e0c5/ss-64417928
  • 9. What’s NLP (Natural Language Process) ? Let’s find out with examples
  • 10. NLP Applications Mostly Solved Making Good Progress Still Really Hard Spam Detection (스팸분석) Text Categorization (텍스트 분류) Part of Speech Tagging (단어 분석) Named Entity Recognition (의미 구분 분석) Information Extraction (정보 추출) Sentiment Analysis (감정분석) Coreference Resolution (같은 단어 복수 참조) Word Sense Disambiguation (복수 의미 분류) Syntactic Parsing (구문해석) Machine Translation (기계번역) Semantic Search (의미 분석 검색) Question & Answer (질의 응답) Textual inference (문장 추론) Summarization (텍스트 요약) Discourse & Dialog (대화 & 토론)
  • 11. NLP Applications Text Categorization Text Classification assigns one or more classes to a document according to their content. Classes are selected from a previously established taxonomy (a hierarchy of catergories or classes). Spam Detection Spam Detection is also the part of Text Classification problem. Part of Speech grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context
  • 12. NLP Applications Low Level Information Extraction
  • 13. NLP Applications Information Extraction on Broader view https://www.google.co.kr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0ahUKEwievZKlmMzVAhVCgrwKHbM_D88QFggyMAE&url=https%3A %2F%2Fweb.stanford.edu%2Fclass%2Fcs124%2Flec%2FInformation_Extraction_and_Named_Entity_Recognition.pptx&usg=AFQjCNFUT9ZjvrDrx F9su0J9KiWobVP4Kg Rule Based Extraction Named Entity recognition Syntax Anal Relation Search Ontology Information Extraction
  • 14. NLP ApplicationsNLP Applications Coreference Resolution I did not vote for the Donald Trump because I think he is too reckless Coreference resolution is the task of finding all expressions that refer to the same entity in a text. It is an important step for a lot of higher level NLP tasks that involve natural language understanding such as document summarization, question answering, and information extraction. Deep Reinforcement Learning for Mention-Ranking Coreference Models Improving Coreference Resolution by Learning Entity-Level Distributed Representations https://medium.com/huggingface/state-of-the-art-neural-coreference-resolution-for-chatbots-3302365dcf30
  • 15. NLP Applications Word Sense Disambiguation [Example] the word "bass" can mean 1. a type of fish or 2. tones of low frequency; compare the sentences: 1. I went fishing for some sea bass. 2. The bass line of the song is too weak. http://www.cs.cornell.edu/courses/cs4740/2014sp/lectures/wsd-1.pdf Approaches: supervised (labeled example data), semi-supervised, ontology based
  • 16. NLP Applications Syntactic Parsing Syntactic parsing finds the structural relationships between words in a sentence https://web.stanford.edu/~jurafsky/slp3/12.pdf
  • 17. NLP Applications Machine Translation Machine translation (MT) is automated translation. It is the process by which computer software is used to translate a text from one natural language (such as English) to another (such as Spanish).
  • 18. NLP Applications Semantic Search Semantic search seeks to improve search accuracy by understanding a searcher’s intent through contextual meaning. Question and Answer Able to answer questions in natural language based on knowledge data (usually an ontology); the best-known example is IBM Watson. Textual Inference Recognize, generate, or extract pairs <T,H> of natural language expressions, such that a human who reads (and trusts) T would infer that H is most likely also true. Summarization Extracting interesting parts of the text, creating a summary from them, and allowing rephrasings to make the summary more grammatically correct. Discourse & Dialog Carrying on a conversation while understanding the whole history of the dialog and the semantic meaning of the speaker.
  • 19. Level of NLP ○ pragmatics : use of language ○ Semantics : meaning of words & sentences ○ (Surface) Syntax : Phrase & Sentence ○ Morphology : morpheme, word ○ Phonology : phoneme (abstract unit of speech sound) ○ Phonetics : phone (acoustic unit of speech sound) 음성과 단어 단어의 구성 단어의 순서 단어&문장 의미 대화의도 & 맥락 High Low
  • 21. Spoken Utterance Lexical (어휘) Analysis : Word Structure Speech Recognition Written Utterance Syntactic (구문) Analysis : Sentence Structure Morphemes, Word Semantic (의미) Analysis : Meaning of Words & Sentence Sentence Discourse (대화) Analysis : Relationship between sentence Context beyond Sentence Language Analysis
  • 22. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-3.Syntactic Analysis 2-4.Semantic Analysis 2-5.Discourse Analysis
  • 23. Language Analysis - Speech Recognition AI Speaker Alexa Alexa Microphone System
  • 24. Language Analysis - Speech Recognition Deep Learning for Classification Hidden Markov Model for Language Model
  • 25. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-2-1.Lexical Analysis Basic Process 2-2-2.Deep Learning on Lexical Analysis 2-2-3.Prerequisite Knowledge 2-2-4.BiLstmCrf for Named Entity Recognition
  • 26. Language Analysis - Lexical Analysis Main Factors in Lexical Analysis: Sentence Splitting, Tokenizing, Morphological Analysis, Part of Speech Tagging
  • 27. Lexical Analysis - Sentence Splitting & Tokenizing What if there is no line break character ('\n')? Where is the EOS (end-of-sentence) point? What if the sentence is not properly separated into words by spaces? [Examples] [Problems]
  • 28. Language Analysis - Lexical Analysis - Morphological Morphology is the process of finding morphemes, the smallest "meaningful units" of a language (carrying lexical meaning or grammatical function), and other features such as the stem. Stemming & lemmatization examples (word / stem / lemma): Love / Lov / Love, Loves / Lov / Love, Loved / Lov / Love, Loving / Lov / Love, Innovation / Innovat / Innovation, Innovations / Innovat / Innovation, Innovate / Innovat / Innovate, Innovates / Innovat / Innovate, Innovative / Innovat / Innovative
  • 29. Language Analysis - Lexical Analysis - Part of Speech Tagging Ambiguity “that” can be a subordinating conjunction or a relative pronoun - The fact that/IN you’re here - A man that/WDT I know “Around” can be a preposition, particle, or adverb - I bought it at the shop around/IN the corner. - I never got around/RP to getting a car. - A new Toyota Prius costs around/RB $25K. Degree of ambiguity (in Brown corpus) - 11.5% of word types (40% of word tokens) are ambiguous # of Tags 1 2 3 4 5 6 7 # of Words 35340 3760 264 61 12 2 1 The ambiguity problem is much more serious in Korean. Part-of-speech tagging is one of the most important text analysis tasks: it classifies words into their parts of speech and labels them according to a tagset, which is the collection of tags used for POS tagging. Parts of speech are also known as word classes or lexical categories.
  • 30. Language Analysis - Lexical Analysis - Implementation Hannanum Kkma Komoran Mecab Twitter 하늘 / N 하늘 / NNG 하늘 / NNG 하늘 / NNG 하늘 / Noun 을 / J 을 / JKO 을 / JKO 을 / JKO 을 / Josa 나 / N 날 / VV 나 / NP 나 / NP 나 / Noun 는 / J 는 / ETD 는 / JX 는 / JX 는 / Josa 자동차 / N 자동차 / NNG 자동차 / NNG 자동차 / NNG 자동차 / Noun Anal Result Comparison Library Performance Comparison
  • 31. Language Analysis - Lexical Analysis - Implementation [Code]
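The [Code] markers above refer to screenshots in the original deck. As a stand-in, here is a minimal sketch of the same idea using the konlpy package (assuming konlpy and the individual analyzer backends are installed), comparing some of the taggers from the table on slide 30:

# Minimal sketch (not the course notebook): comparing Korean morphological
# analyzers with the konlpy package.
from konlpy.tag import Kkma, Komoran, Okt  # Okt was formerly called Twitter

sentence = "하늘을 나는 자동차"

for name, tagger in [("Kkma", Kkma()), ("Komoran", Komoran()), ("Okt", Okt())]:
    # pos() returns a list of (morpheme, POS tag) pairs
    print(name, tagger.pos(sentence))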
  • 32. Language Analysis - Lexical Analysis - Implementation [Code]
  • 33. Language Analysis - Lexical Analysis - Implementation [Code]
  • 34. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-2-1.Lexical Analysis Basic Process 2-2-2.Deep Learning on Lexical Analysis 2-2-3.Prerequisite Knowledge 2-2-4.BiLstmCrf for Named Entity Recognition
  • 35. Language Analysis - Lexical Analysis [Deep Learning - Sequence Labeling - BiLSTM-CRF] (1) Word Segmentation (2) POS Tagging (3) Chunking (4) Clause Identification (5) Named Entity Recognition (6) Semantic Role Labeling (7) Information Extraction What we can do with sequence labeling What’s sequence labeling
  • 36. Language Analysis - Lexical Analysis [Deep Learning - Sequence Labeling - BiLSTM-CRF] Word POS Chunk NE West NNP B-NP B-MISC Indian NNP I-NP I-MISC all-around NN I-NP O Phil NNP I-NP B-PER Simons NNP I-NP I-PER took VBD B-VP O four CD B-NP O for IN B-PP O 38 CD B-NP O on IN B-PP O Friday NNP B-NP O iob data set example POS Tag 의미 https://docs.google.com/spreadsheet/ccc?key=0ApcJghR6UMXxdEdURGY2YzIwb3dSZ290RFpSaUkzZ0E&usp=sharing Chunk Tag 의미 B : Begin of Chunk I : Continuation of Chunk E: End of Chunk NP : Noun VP : Verb NER BIO Tag 의미 B : Start with new Chunk I : word inside Chunk O: Outside of Chunk
  • 37. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] BiLSTM-CRF Description Before we talk about BiLSTM-CRF, which is a really important algorithm for sequence labelling, let's briefly cover the necessary background knowledge.
  • 38. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-2-1.Lexical Analysis Basic Process 2-2-2.Deep Learning on Lexical Analysis 2-2-3. Prerequisite Knowledge 2-2-4.BiLstmCrf for Named Entity Recognition
  • 39. Language Analysis - Lexical Analysis - Check Prerequisite [Those will be needed to understand what I am trying to explain] Concept of perceptron & Deep Neural Network Concept of SoftMax DNN & Matrix Gradient Descent Back Propagation Activation Functions
  • 40. Language Analysis - Brief Explanation (diagram: a 784-256-256-10 fully connected network, Y = Activation(W*x + b) at each layer, softmax output, cross-entropy error)

import tensorflow as tf

learning_rate = 0.001  # example value

# tf Graph input
x = tf.placeholder("float", [None, 784])
y = tf.placeholder("float", [None, 10])

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([784, 256])),
    'h2': tf.Variable(tf.random_normal([256, 256])),
    'out': tf.Variable(tf.random_normal([256, 10]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([256])),
    'b2': tf.Variable(tf.random_normal([256])),
    'out': tf.Variable(tf.random_normal([10]))
}

# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
layer_1 = tf.nn.relu(layer_1)
# Hidden layer with RELU activation
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
layer_2 = tf.nn.relu(layer_2)
# Output layer with linear activation, then softmax
pred = tf.matmul(layer_2, weights['out']) + biases['out']
hypothesis = tf.nn.softmax(pred)

# Define loss (cross entropy) and optimizer
cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(hypothesis), reduction_indices=1))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
  • 41. Language Analysis - Lexical Analysis - Check Prerequisite [Those will be needed to understand what I am trying to explain] Dynamic RNN BiDirectional LSTM Word EmbeddingRecurrent Neural Network LSTM (Long Short Term Memory)
  • 42. Language Analysis - Brief Explanation Padded input examples: START 오늘 날씨 는 ? PAD PAD END / START 오늘 날씨 는 어때 ? PAD END / START 오늘 비가 오 려 나 ? END. For long sentences the vanishing gradient problem happens, and padding variable-length data to a fixed length wastes computing power; this is where the concept of Dynamic RNN comes in. A Bidirectional LSTM additionally learns the given data in the backward direction. A Long Short Term Memory cell maintains a cell state with forget, update, and output gates. https://brunch.co.kr/@chris-song/9 https://blog.altoros.com/the-magic-behind-google-translate-sequence-to-sequence-models-and-tensorflow.html
  • 43. Language Analysis - Word embedding Word Embedding 이란 ? 텍스트를 구성하는 하나의 음소, 음절, 단어, 문장, 문서 단위를 수치화하여 표현하는 방법의 일종 장점 : 차원의 축소 , 의미적 유사성의 표현 단점 : 동음이의어 처리, 데이터 적을 경우 신경망 훈련시 신호 강도
  • 44. Language Analysis - Word embedding - OneHot Encoding Concept of OneHot Encoding
  • 45. Language Analysis - Word embedding - Word2Vec https://www.tensorflow.org/tutorials/word2vec http://w.elnn.kr/search/ Concept of Word2Vector Word2Vector Demo Site
  • 46. Language Analysis - Word embedding - Word2Vec CBOW Original text: "the quick brown fox jumped over the lazy dog"; with window size 1 this yields the training pair ([brown, jumped], fox). The context words (input) are projected through a hidden layer (hidden size) to predict the center word over the vocabulary (output, vocab size).
  • 47. Language Analysis - Word embedding - Word2Vec Skip-Gram Original text: "the quick brown fox jumped over the lazy dog"; with window size 1 this yields the training pairs (fox, brown), (fox, jumped). The center word (input) is projected through a hidden layer (hidden size) to predict each context word over the vocabulary (output, vocab size).
  • 48. Language Analysis - Word embedding - Doc2Vec (1)PV-DM (2)PV-DBOW (3)DM + DBOW (Vector Concat) W2V W2V W2V (4)AVG(TF-IDF * W2V) the quick brown fox jumped over the lazy dog (paragraph, the) (paragraph, quick) (paragraph, brown) (paragraph, fox) (paragraph, jumped) . ([paragraph, quick, brown, fox, juped], over) ([paragraph, quick, brown, fox, juped,over],the) vector vector vector TF-IDF TF-IDF TF-IDF X X X vector AVG
  • 49. tfidf(t,d,D) = tf(t,d) x idf(t,D) Language Analysis - Word embedding - TF-IDF https://thinkwarelab.wordpress.com/2016/11/14/ir-tf-idf-%EC%97%90-%EB%8C%80%ED%95%B4-%EC%95%8C%EC%95%84%EB%B4%85%EC%8B%9C%EB%8B%A4/ http://www.popit.kr/bm25-elasticsearch-5-0%EC%97%90%EC%84%9C-%EA%B2%80%EC%83%89%ED%95%98%EB%8A%94-%EC%83%88%EB%A1%9C%EC%9A%B4-%EB%B0%A9%EB%B2%95/ Not exactly word embedding but used on nlp with deep learning pretty often - Document similarity - Words importance on document - Used on search engine (like elasticsearch though it use BM25 for now)
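A small illustration of the formula tfidf(t,d,D) = tf(t,d) x idf(t,D) on a toy corpus; the exact weighting convention is an assumption here, since real implementations (sklearn, Elasticsearch/BM25) use slightly different smoothed variants:

# Toy illustration of tfidf(t, d, D) = tf(t, d) * idf(t, D).
import math
from collections import Counter

docs = [
    "the quick brown fox".split(),
    "the lazy dog".split(),
    "the quick dog".split(),
]

def tf(term, doc):
    return Counter(doc)[term] / len(doc)

def idf(term, docs):
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / (1 + df))  # +1 smoothing to avoid division by zero

def tfidf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

print(tfidf("quick", docs[0], docs))  # appears in 2 of 3 docs -> low weight
print(tfidf("fox", docs[0], docs))    # appears in 1 of 3 docs -> higher weight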
  • 50. Language Analysis - Word embedding - Char Embedding - Introduce several ways to embed a char as a vector, e.g. for 안 녕 하 세 요: (1) syllable-level one-hot over a syllable vocabulary (가, 나, 다, 라, 마, 바, 사, 아, 자, ...), (2) romanized character one-hot (An Neung Ha Se Yo over a, b, c, d, e, f, g, h, i, ...), (3) jamo-level one-hot after decomposing each syllable ((ㅇ ㅏ ㄴ) (ㄴ ㅕ ㅇ) ... over ㄱ, ㄴ, ㄷ, ㄹ, ㅁ, ㅂ, ㅅ, ㅇ, ㅈ, ...).
  • 51. Language Analysis - Word embedding - Word+Char (diagram: the word "fox" from "the quick brown fox jumped over the lazy dog" represented both as a Word2Vec vector and as one-hot encodings of the characters f, o, x) 1. Word2Vec-style embeddings express semantic similarity well 2. One-hot encoding gives a strong signal that is effective for training 3. Word-level embedding memorizes words well 4. Char-level embedding is useful for handling words not seen in training
  • 52. Language Analysis - Word embedding - NGram Word2Vec can only represent words it was trained on; words that do not exactly match the pretrained dictionary come back as “UNKNOWN”. So FastText (by Facebook) uses character n-grams in its word embedding algorithm. Comparing 에어컨 ~ 에어조단: 에어컨 ['$$에', '$에어', '에어컨', '어컨$', '컨$$'] => 5, 에어조단 ['$$에', '$에어', '에어조', '어조단', '조단$', '단$$'] => 6, matching ['$$에', '$에어'] => 2. Score: 2 matches / 9 distinct n-grams overall => 0.2222
  • 53. http://dataaspirant.com/2015/04/11/five-most-popular-similarity-measures-implementation-in-python/ Language Analysis - Word embedding - vector distance Cosine Similarity

from math import sqrt

def square_rooted(x):
    return round(sqrt(sum([a * a for a in x])), 3)

def cosine_similarity(x, y):
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = square_rooted(x) * square_rooted(y)
    return round(numerator / float(denominator), 3)

print(cosine_similarity([3, 45, 7, 2], [2, 54, 13, 15]))
  • 54. Language Analysis - Word embedding - Implementation OneHot Encoding : simple test code showing the concept of one-hot encoding http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/ [Code]
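Not the course notebook itself, but a minimal sketch of the one-hot encoding concept with numpy:

# Minimal sketch of one-hot encoding a small vocabulary.
import numpy as np

vocab = ["김승우", "전화번호", "이메일", "검색"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    vec = np.zeros(len(vocab), dtype=np.float32)
    vec[word_to_idx[word]] = 1.0
    return vec

print(one_hot("전화번호"))  # [0. 1. 0. 0.]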
  • 55. Language Analysis - Word embedding - Implementation Word2Vector : Using Gensim word2vec package http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/
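A minimal gensim Word2Vec sketch of the same idea (not the course notebook; parameter names assume gensim 4.x, where older versions used size instead of vector_size):

# Minimal gensim Word2Vec sketch on a toy corpus.
from gensim.models import Word2Vec

sentences = [
    ["김승우", "전화번호", "검색"],
    ["김승우", "이메일", "검색"],
    ["홍길동", "이미지", "검색"],
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1: skip-gram
print(model.wv.most_similar("검색", topn=3))
print(model.wv["김승우"][:5])  # first few dimensions of the learned word vector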
  • 56. Language Analysis - Word embedding - Implementation FastText : FaceBook fasttext with gensim wrapper http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/
  • 57. Language Analysis - Word embedding - Implementation FastText : Possible to use pretrained vectors and do fine-tuning on them http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/ https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
  • 58. Language Analysis - Word embedding - Implementation N-grams are simply all combinations of adjacent words or letters of length n that you can find in your source text.
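A sketch of the character n-gram overlap idea behind the 에어컨 / 에어조단 comparison on slide 52 (an illustration of the subword-matching idea, not FastText's actual scoring):

# Character n-grams with boundary padding, then a simple overlap score.
def char_ngrams(word, n=3, pad="$"):
    padded = pad * (n - 1) + word + pad * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

a = char_ngrams("에어컨")    # ['$$에', '$에어', '에어컨', '어컨$', '컨$$']
b = char_ngrams("에어조단")  # ['$$에', '$에어', '에어조', '어조단', '조단$', '단$$']

common = set(a) & set(b)
score = len(common) / len(set(a) | set(b))
print(common, round(score, 4))  # {'$$에', '$에어'} 0.2222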
  • 59. Language Analysis - Word embedding - Implementation For word2vec training on a large dataset, GPU acceleration is needed. You can also consider using TensorFlow or Keras to train the model. https://github.com/SimonPavlik/word2vec-keras-in-gensim/blob/keras106/word2veckeras/word2veckeras.py https://github.com/tensorflow/models/blob/master/tutorials/embedding/word2vec.py
  • 60. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-2-1.Lexical Analysis Basic Process 2-2-2.Deep Learning on Lexical Analysis 2-2-3. Other prerequisite Knowledge 2-2-4.BiLstmCrf for Named Entity Recognition
  • 61. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] BiLSTM-CRF Description http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/sequence_tagging/
  • 62. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] 김승우 B-PERSON 전화번호 B-TARGET 검색 O 김승우 B-PERSON 이메일 B-TARGET 검색 O 김승우 B-PERSON 이미지 B-TARGET 검색 O IOB Data 김승우 전화번호 검색 김승우 이메일 검색 김승우 이미지 검색 Plain Data Sentence Splitting Token Morphing Part of Speech Tagging Lexical Analysis Word2Vector OneHot Encoding 1 0 0 0 0 1 0 0 0 0 1 0 김승우 전화번호 이메일 검색 B-PERSON B-TARGET 김 우 승 Index List
  • 63. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] 김승우 전화번호 이메일 검색 B-PERSON B-TARGET 김 우 승 Index List [Code]
  • 64. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] 김 우 승 김승우 전화번호 이메일 Concat Vector [Code]
  • 65. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] Concat Vector 김승우 전화번호 이메일 검색 B-PERSONB-TARGET BiLstm Fully Connected Layer B-? B-? B-? [Code]
  • 66. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] Conditional Random Field Soft Max [Code]
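The notebook code behind the [Code] screenshots above is not reproduced here; the following is a minimal BiLSTM-CRF tagging sketch in the TensorFlow 1.x contrib API, where tensor names and sizes are illustrative assumptions:

# Minimal BiLSTM-CRF sketch (TensorFlow 1.x contrib API), illustrative only.
import tensorflow as tf

num_tags, hidden = 10, 100
word_emb = tf.placeholder(tf.float32, [None, None, 300])   # [batch, time, embedding]
tags     = tf.placeholder(tf.int32,   [None, None])        # gold tag ids
seq_len  = tf.placeholder(tf.int32,   [None])

cell_fw = tf.contrib.rnn.LSTMCell(hidden)
cell_bw = tf.contrib.rnn.LSTMCell(hidden)
(out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
    cell_fw, cell_bw, word_emb, sequence_length=seq_len, dtype=tf.float32)
output = tf.concat([out_fw, out_bw], axis=-1)               # [batch, time, 2*hidden]
logits = tf.layers.dense(output, num_tags)                  # per-token tag scores

# CRF layer: scores the whole tag sequence instead of each token independently
log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
    logits, tags, seq_len)
loss = tf.reduce_mean(-log_likelihood)
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)

# At prediction time, tf.contrib.crf.viterbi_decode(score, transition_params)
# decodes the best tag sequence per sentence using the learned transitions.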
  • 67. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] http://people.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf Probabilistic model for sequence data segmentation and labeling https://www.slideshare.net/kanimozhiu/tdm-probabilistic-models-part-2 The first method (a softmax per token) makes local choices. In other words, even if we capture some information from the context in the hidden state h thanks to the bi-LSTM, the tagging decision is still local: we don’t make use of the neighboring tagging decisions. For instance, in “New York”, the fact that we are tagging York as a location should help us decide that New corresponds to the beginning of a location. Given a sequence of words w1,…,wm, a sequence of score vectors s1,…,sm and a sequence of tags y1,…,ym, a linear-chain CRF defines a global score s ∈ R.
  • 68. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] Gradient Descent Momentum NAG Adagrad Adadelta Rmsprop Adam [Code]
  • 69. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] https://arxiv.org/pdf/1705.08292.pdf "Solutions found with gradient descent (GD) or stochastic gradient descent (SGD) generalize much better than solutions found with adaptive methods (e.g. AdaGrad, RMSprop, and Adam)." The Marginal Value of Adaptive Gradient Methods in Machine Learning, Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, and Benjamin Recht (University of California, Berkeley / Toyota Technological Institute at Chicago), May 24, 2017. There is no optimizer that is best for all cases! When to use an adaptive optimizer? If the input embedding vectors are sparse, it’s better to use an adaptive optimizer!
  • 70. Language Analysis - Lexical Analysis - Sequence Labeling [Deep Learning - BiLSTM-CRF] Real project BiLSTM result: the sample code predicts well even on test data not included in the train set. http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/sequence_tagging/
  • 71. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-3.Syntactic Analysis 2-3-1.Dependency Parsing 2-3-2.Google SyntaxNet with Docker
  • 72. Language Analysis - Syntactic Analysis 구문 분석(構文分析, 문화어: 구문해석, 문장해석)은 문장을 그것을 이루고 있는 구성 성분으로 분해하고 그들 사이의 위계 관계를 분석하여 문장의 구조를 결정하는 것을 말한다. Graph-Based Models Transition-Based Models CYK Style Parsing MST finding Algorithm Projective & Non Projective Model
  • 73. Language Analysis - Syntactic Analysis Transition-Based Models Sentence W Repeat until all words have their head - Select two target words in data structure (One dependent & one head candidate) - Deterministically predict next parsing action from parsing model - Modify structure according parsing action C0 -> C1 -> C2 -> ……..C8 -> C9 -> C10 -> .… -> Cm D-tree t1 t2 t3 t8 t9 t10 tm Oracle (Classifier) Predict the best transition
  • 74. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System
  • 75. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Assume that we are given an oracle : - for any non-terminal configuration, it can predict the correct transition (for deterministic parsing) - That is, it takes two words & magically gives us the dependency relation between them if one exists
  • 76. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Shift : Move Economic from buffer B to stack S
  • 77. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Left-arc : Add left-arc (news, Economic, amod) to arc set A Remove Economic from stack (since it now has head in A)
  • 78. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Shift : Move news from buffer B to stack S
  • 79. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Left-arc : Add left-arc (had, news, nsubj) to A Remove news from stack (since it now has head in A)
  • 80. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Right-arc : Add right-arc (ROOT, had, root) to A keep had in stack : because it can have other dependents on the right
  • 81. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Left-arc : Add left-arc (effect, little, amod) to A Remove little from stack (since it now has head in A)
  • 82. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Right-arc : Add right-arc (had, effect, dobj) to A Keep effect in stack : because it can have other dependents on right
  • 83. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Right-arc : Add right-arc (effect, on, prep) to A Keep on in stack : because it can have other dependents on the right
  • 84. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Shift : Move financial from buffer B to stack S
  • 85. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Left-arc : Add left-arc (market, financial, amod) to A Remove financial from stack (since it now has head in A)
  • 86. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Right-arc : Add right-arc (on, markets, pmod) to A Keep markets in stack : because it can have other dependents on the right
  • 87. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Reduce : Remove markets, on, effect from stack (since they already have head in A) ※ All decisions like right-arc, left-arc, reduce, shift will be made by oracle
  • 88. Language Analysis - Syntactic Analysis Transition-Based Models - Arc Eager Transition System Right-arc : Add right-arc (had, period, p) to A Keep period in stack Done !
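To make the walkthrough above concrete, here is a toy replay of the arc-eager transitions for the same sentence. The oracle is replaced by a hard-coded transition list, which is purely an illustration; a real parser predicts each transition with a classifier:

# Toy arc-eager transition system replaying the slides' example.
def arc_eager(words, transitions):
    stack, buffer, arcs = ["ROOT"], list(words), []
    for action, label in transitions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT_ARC":            # buffer front becomes head of stack top
            arcs.append((buffer[0], stack.pop(), label))
        elif action == "RIGHT_ARC":           # stack top becomes head of buffer front
            arcs.append((stack[-1], buffer[0], label))
            stack.append(buffer.pop(0))
        elif action == "REDUCE":              # stack top already has a head
            stack.pop()
    return arcs

words = ["Economic", "news", "had", "little", "effect", "on", "financial", "markets", "."]
transitions = [
    ("SHIFT", None), ("LEFT_ARC", "amod"),         # Economic <- news
    ("SHIFT", None), ("LEFT_ARC", "nsubj"),        # news <- had
    ("RIGHT_ARC", "root"),                         # ROOT -> had
    ("SHIFT", None), ("LEFT_ARC", "amod"),         # little <- effect
    ("RIGHT_ARC", "dobj"), ("RIGHT_ARC", "prep"),  # had -> effect -> on
    ("SHIFT", None), ("LEFT_ARC", "amod"),         # financial <- markets
    ("RIGHT_ARC", "pmod"),                         # on -> markets
    ("REDUCE", None), ("REDUCE", None), ("REDUCE", None),
    ("RIGHT_ARC", "p"),                            # had -> .
]
for head, dep, label in arc_eager(words, transitions):
    print(head, "->", dep, label)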
  • 89. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-3.Syntactic Analysis 2-3-1.Dependency Parsing 2-3-2.Google SyntaxNet with Docker
  • 90. Language Analysis - Syntactic Analysis - Syntax Net We show this layout in the schematic below: the state of the system (a stack and a buffer, visualized below for both the POS and the dependency parsing task) is used to extract sparse features, which are fed into the network in groups. We show only a small subset of the features to simplify the presentation in the schematic Google SyntaxNet with Deep Learning - Pos Tagging http://cs.stanford.edu/people/danqi/papers/emnlp2014.pdf
  • 91. Language Analysis - Syntactic Analysis - Syntax Net Google SyntaxNet with Deep Learning - A Fast and Accurate Dependency Parser using Neural Networks https://arxiv.org/pdf/1603.06042.pdf 1 2 3 1 I _ PRP PRP _ 2 nsubj _ _ 2 knew _ VBD VBD _ 0 ROOT _ _ 3 I _ PRP PRP _ 5 nsubj _ _ 4 could _ MD MD _ 5 aux _ _ 5 do _ VB VB _ 2 ccomp _ _ 6 it _ PRP PRP _ 5 dobj _ _ 7 properly _ RB RB _ 5 advmod _ _ 8 if _ IN IN _ 9 mark _ _ 9 given _ VBN VBN _ 5 advcl _ _ 10 the _ DT DT _ 12 det _ _ 11 right _ JJ JJ _ 12 amod _ _ 12 kind _ NN NN _ 9 dobj _ _ 13 of _ IN IN _ 12 prep _ _ 14 support _ NN NN _ 13 pobj _ _ 15 . _ . . _ 2 punct _ _ 18 units (1),(2),(3) 18 units (1),(2),(3) 12 units (2),(3) (1) The top 3 words on the stack and buffer: s1, s2, s3, b1, b2, b3; => 6 (2) The first and second leftmost / rightmost children of the top two words on the stack: lc1(si), rc1(si), lc2(si), rc2(si), i = 1, 2. => 8 (3) The leftmost of leftmost / rightmost of rightmost children of the top two words on the stack: lc1(lc1(si)), rc1(rc1(si)), i = 1, 2. => 4
  • 92. Language Analysis - Syntactic Analysis - Syntax Net Google SyntaxNet with Deep Learning - Local Parser 1. SHIFT: Push another word onto the top of the stack, i.e. shifting one token from the buffer to the stack. 2. LEFT_ARC: Pop the top two words from the stack. Attach the second to the first, creating an arc pointing to the left. Push the first word back on the stack. 3. RIGHT_ARC: Pop the top two words from the stack. Attach the second to the first, creating an arc point to the right. Push the second word back on the stack.
  • 93. Language Analysis - Syntactic Analysis - Syntax Net As we describe in the paper, there are several problems with the locally normalized models we just trained. The most important is the label-bias problem: the model doesn't learn what a good parse looks like, only what action to take given a history of gold decisions. This is because the scores are normalized locally using a softmax for each decision. Google SyntaxNet with Deep Learning - Global Training
  • 94. Language Analysis - Syntactic Analysis - Syntax Net What is the beam search algorithm on an RNN? https://www.youtube.com/watch?v=UXW6Cs82UKo Instead of taking only the single best choice at every iteration, follow several candidate paths to the end and choose the one whose total score is maximum. Trying every possible path would be far too heavy, so only the best few hypotheses are kept at every step and the others are removed (pruning). The goal is to find a globally best prediction rather than a locally greedy one.
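A tiny beam search sketch over per-step probabilities, showing the keep-the-best-few-and-prune idea (the scores and beam size below are made up for illustration):

# Beam search over a short sequence of per-step probability tables.
import math

def beam_search(step_probs, beam_size=2):
    # step_probs[t][token] = probability of emitting `token` at step t
    beams = [([], 0.0)]                        # (sequence, cumulative log prob)
    for probs in step_probs:
        candidates = []
        for seq, score in beams:
            for token, p in probs.items():
                candidates.append((seq + [token], score + math.log(p)))
        # prune: keep only the best `beam_size` hypotheses
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:beam_size]
    return beams

steps = [
    {"SHIFT": 0.6, "LEFT_ARC": 0.4},
    {"SHIFT": 0.3, "RIGHT_ARC": 0.7},
    {"REDUCE": 0.5, "SHIFT": 0.5},
]
print(beam_search(steps, beam_size=2))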
  • 95. Language Analysis - Syntactic Analysis - Syntax Net http://universaldependencies.org/ Google SyntaxNet does not support Korean as a default language. But as we can see below, we can train the model with Sejong corpus data, though we have to convert the format into one SyntaxNet understands. Google SyntaxNet with Deep Learning - How about Korean
  • 96. Language Analysis - Syntactic Analysis - Syntax Net Demo Site (we also use samples on this site) http://sejongpsg.ddns.net/syntaxnet/psg_tree.htm SyntaxNet Korean with Docker (We pretrained Korean corpus and set up webserver for service) https://github.com/TensorMSA/tensormsa_syntax_docker Google SyntaxNet with Deep Learning - Test it by yourself
  • 97. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-3.Syntactic Analysis 2-4.Semantic Analysis 2-4-1.Semantic Role Labeling 2-4-2.Char CNN for Sentence Classification 2-5.Discourse Analysis
  • 98. Sentential semantics - Semantic role labeling (SRL) - Phrase similarity (=paraphrase) - Sentence Classification, Sentence Emotion Analysis and etc Language Analysis - Semantic Analysis What is Semantic in study of language Three perspectives on meaning - Lexical semantics : individual words - Sentential semantics : individual sentences - Discourse or Pragmatics : longer piece of text or conversation NLP Tasks for Semantics
  • 99. Language Analysis - Semantic Analysis - SRL What is Semantic Role Labeling (SRL) SRL = Semantic roles express the abstract role that arguments of a predicate can take in the event. The police arrested the suspect in the park last night Agent predicate Theme Location Time Who did what to whom where when Can we figure out that these sentences have the same meaning? Can we figure out that bought, sold, and purchase are being used in sentences with the same meaning? XYZ corporation bought the stock. They sold the stock to XYZ corporation. The stock was bought by XYZ corporation. The purchase of the stock by XYZ corporation.
  • 100. Language Analysis - Semantic Analysis - SRL Common Semantic Role Labeling Architecture http://naacl2013.naacl.org/Documents/semantic-role-labeling-part-1-naacl-2013-tutorial.pdf Syntatic Parse Argument Identification Argument Classification Structural Inference Prune Constituents Candidates Semantic roles Arguments Step-1 Candidate Selection - Parse the sentence - Prune/filter the parse tree (eliminate some tree constituents to speed up the execution) Step-2 Argument Identification - A binary classification of each node as Argument or NONE - Local scoring Step-3 Argument Classification - A multi class (one-of-N) classification of all the argument candidates - Global /joint scoring ML ML ML
  • 101. Language Analysis - Semantic Analysis - SRL Exceptions to the Standard Architecture 1. Specialized parsing for SRL - Syntactic parser trained to predict argument candidates - Semantic parsing = parsing + SRL - SRL based on dependency parsing 2. Sequential labeling (instead of tree traversing) - Motivated by Lack of full parse trees
  • 102. Language Analysis - Semantic Analysis - SRL Semantic Role Labeling Applications Information : "Anna is a friend of mine." (Name - Relation - Name) http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/neo4j/neo4j_basic.ipynb Neo4j insert query, Neo4j Jupyter example & visualization:

session.run("MATCH (you:Person {name:'You'}) "
            "FOREACH (name in ['Anna'] | "
            "  CREATE (you)-[:FRIEND]->(:Person {name:name}))")
result = session.run("MATCH (you {name:'You'})-[:FRIEND]->(yourFriends) "
                     "RETURN you, yourFriends")
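For reference, a self-contained sketch of the same insert/read pattern with the official Neo4j Python driver; the connection URI and credentials are placeholder assumptions:

# Storing and reading a simple relation with the Neo4j Python driver.
from neo4j import GraphDatabase  # older driver versions: from neo4j.v1 import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    session.run(
        "MERGE (you:Person {name:'You'}) "
        "MERGE (anna:Person {name:'Anna'}) "
        "MERGE (you)-[:FRIEND]->(anna)")
    result = session.run(
        "MATCH (you:Person {name:'You'})-[:FRIEND]->(friend) RETURN friend.name AS name")
    for record in result:
        print(record["name"])  # Anna

driver.close()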
  • 103. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-3.Syntactic Analysis 2-4.Semantic Analysis 2-4-1.Semantic Role Labeling 2-4-2.Char CNN for Sentence Classification 2-5.Discourse Analysis
  • 104. Language Analysis - Semantic Analysis - Text Classification Can we figure out that these sentences are positive or negative? 돈이 아깝지 않다 (긍정) 다시는 오지 않을 거야 (부정) 음식이 정말 맛이 없다 (부정) 이 식당은 정말 맛있다 (긍정) Analysis negative and positive with dictionary word “않다” is usually negative but ? 돈이 아깝지 않다 => Positive 다시는 오지 않을 거야 => Negative
  • 105. There are many ways of doing text classification.. Traditional Rule based Machine Learning - Logistic & SVM Deep Learning - CharCNN, RNN, Etc.. Language Analysis - Semantic Analysis - Text Classification
  • 106. Language Analysis - Semantic Analysis - Char CNN http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb Deep Learning Method CharCNN can be a solution for this kind of problem. 1 2 3
  • 107. Language Analysis - Semantic Analysis - Char CNN http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb Preparing data for embedding is pretty similar to other neural networks. 1. Word embedding & one-hot didn’t show that much difference. 2. Personally, I prefer to concat char one-hot + word2vec. Example input: 오늘 메뉴 는 뭐 지? PAD PAD 1. Need to define a sentence max length 2. Need padding, like other NLP neural networks
  • 108. Language Analysis - Semantic Analysis - Char CNN http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb Using Multi Convolution Filter Size
  • 109. Language Analysis - Semantic Analysis - Char CNN http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb Other steps are same (fully connected > softmax > loss> optimizer)
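A minimal multi-filter-size text CNN sketch in TensorFlow 1.x covering the convolution, pooling, fully connected, and loss steps described above; the filter sizes and counts are assumptions, not the notebook's actual values:

# Multi-filter-size text CNN sketch (TensorFlow 1.x), illustrative only.
import tensorflow as tf

max_len, emb_dim, num_classes = 30, 128, 2
inputs = tf.placeholder(tf.float32, [None, max_len, emb_dim])  # embedded chars/words
labels = tf.placeholder(tf.int32, [None])

pooled = []
for kernel_size in [2, 3, 4]:                      # several window widths in parallel
    conv = tf.layers.conv1d(inputs, filters=64, kernel_size=kernel_size,
                            activation=tf.nn.relu)
    pooled.append(tf.reduce_max(conv, axis=1))     # max-over-time pooling
features = tf.concat(pooled, axis=1)               # [batch, 3 * 64]

logits = tf.layers.dense(features, num_classes)    # fully connected layer
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)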
  • 110. Language Analysis - Semantic Analysis - Char CNN http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb You can see Char CNN can distinguish two sentences
  • 111. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-3.Syntactic Analysis 2-4.Semantic Analysis 2-5.Discourse Analysis 2-5-1.RNN for understand global Conversation 2-5-2.Memory Network for global context
  • 112. Language Analysis - Dialogue Understand https://research.fb.com/publications Getting to a natural language dialogue state with a chatbot remains a challenge and will require a number of research breakthroughs. At FAIR we have chosen to tackle the problem from both ends: general AI and reasoning by machines through communication as well as conducting research grounded in current dialog systems, using lessons learned from exposing actual chatbots to people. The attempt to understand and interpret dialogue is not a new one. As far back as 20 years, there were several efforts to build a machine a person could talk to and teach how to have a conversation. These incorporated technology and engineering, but were single purposed with a very narrow focus, using pre-programmed scripted responses. Thanks to progress in machine learning, particularly in the last few years, having AI agents being able to converse with people in natural language has become a more realistic endeavor that is garnering attention from both the research community and industry. However, most of today’s dialogue systems continue to be scripted: their natural language understanding module may be based on machine learning, but what they execute or answer is in general dictated by if/then statements or rules engines. While they are improvement on what existed decades ago, it is in large part due to the large databases of content used to create and script their responses. Amazing free papers!! read it right now!
  • 113. Discourse Analysis with RNN In a conversation the topic changes often, so keeping track of the topic of the conversation is important. 안녕 안녕 넌 뭐할줄 아니? 기능은 XX 가 있어요 사람 좀 찾아볼까해 누구를 찾아드려요? 포항 제강부 IT담당 홍길동 팀장의 그룹장을 좀 찾아줘 (지역:포항), 부서(제강부),업무 (IT), 이름 (홍길동), 직급(팀장), 상위자(그룹장) 을 검색합니다. 내일 점심 먹자고 문자 보내줘 “내일 점심 먹자고” 로 전송합니다. 아냐. 수고했어. 나가서 먹지 대화를 초기화 합니다. State : 초기 상태 State : 도움말 상태 State : 사람 찾기 상태 State : 조회한 사람에 문자 보내기 State : 초기 상태
  • 114. Dialogue State Tracking Challenge and Accepted papers Discourse Analysis with RNN http://www.phontron.com/paper/yoshino16iwsds.pdf http://www.colips.org/workshop/dstc4/papers.html * Dialogue State Tracking using Long Short Term Memory Neural Networks Koichiro Yoshino, Takuya Hiraoka, Graham Neubig and Satoshi Nakamura
  • 115. Let’s predict the intent of each sentence in the conversation. The basic idea is to keep the RNN state info and continue prediction from that point. (diagram: Doc2Vec vectors of each utterance fed along the timeline into an LSTM that emits an intent per turn, i.e. dialogue state tracking with LSTM)
  • 116. The key point of this code is using the RNN state vector as memory, as sketched below. Discourse Analysis with RNN http://localhost:8888/tree/chap05_nlp/state_tracking
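A sketch of the state-as-memory idea: run the LSTM one turn at a time and feed the returned state back in as the initial state of the next turn (TensorFlow 1.x, with made-up dimensions and random stand-in inputs instead of real Doc2Vec features):

# Carrying the LSTM state across dialogue turns (illustrative sketch).
import numpy as np
import tensorflow as tf

hidden, emb_dim, num_intents = 64, 100, 5
utterance = tf.placeholder(tf.float32, [1, None, emb_dim])   # one turn of features
prev_c = tf.placeholder(tf.float32, [1, hidden])
prev_h = tf.placeholder(tf.float32, [1, hidden])

cell = tf.nn.rnn_cell.LSTMCell(hidden)
init_state = tf.nn.rnn_cell.LSTMStateTuple(prev_c, prev_h)
outputs, new_state = tf.nn.dynamic_rnn(cell, utterance, initial_state=init_state)
intent_logits = tf.layers.dense(outputs[:, -1, :], num_intents)

# Random stand-in for three turns of pre-embedded utterances.
dialogue_turns = [np.random.rand(1, 4, emb_dim).astype(np.float32) for _ in range(3)]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    c, h = np.zeros((1, hidden)), np.zeros((1, hidden))
    for turn in dialogue_turns:
        logits, (c, h) = sess.run([intent_logits, new_state],
                                  {utterance: turn, prev_c: c, prev_h: h})
        print(logits)  # intent scores for this turn, conditioned on the history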
  • 117. 2.Language Analysis Process 2-1.Voice Recognition 2-2.Lexical Analysis 2-3.Syntactic Analysis 2-4.Semantic Analysis 2-5.Discourse Analysis 2-5-1.RNN for understand global Conversation 2-5-2.Memory Network for global context
  • 118. Goal of dialogue understanding and the memory network. Memory Network for Dialogue understand https://arxiv.org/pdf/1503.08895v4.pdf
  • 119. Here is the network architecture of end2end memory network Memory Network for Dialogue understand https://yerevann.github.io/2016/02/05/implementing-dynamic-memory-networks/ https://www.slideshare.net/mobile/carpedm20/ss-63116251
  • 120. (1) Feed data (“Sentences”, “Question”, “Target”) Memory Network for Dialogue understand 1 2 3
  • 121. Convert word index to embedding vector (Training target vector A,B,C) Memory Network for Dialogue understand 1 3 Vocab Size 2 Dim Size vocab size Mem Size
  • 122. Embedding A of the given context sentences is multiplied with the input question embedding (embedding B, which is not explicitly defined in this code). ※ This is the input for the first layer; for later layers it would be the output of layer t-1. Memory Network for Dialogue understand
 
  • 123. Set embedding C (in the code it is called B); this is also a trainable target variable. Memory Network for Dialogue understand
 
  • 124. Embedding C (in the code, B) is multiplied by the softmax result. Memory Network for Dialogue understand
  • 125. For the last multiply question and output of memory network again Memory Network for Dialogue understand
  • 126. stack more memory layers Memory Network for Dialogue understand
  • 127. Memory Network for Dialogue understand Set fully connected layer and calculate error with softmax cross entropy
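Putting steps 120-127 together, here is a single-hop end-to-end memory network sketch; the actual notebook stacks several hops and ties embeddings across layers, and the names A, B, C, W follow the slides rather than the notebook's variable names:

# Single-hop end-to-end memory network sketch (TensorFlow 1.x), illustrative only.
import tensorflow as tf

vocab, dim, mem_size, sent_len = 100, 20, 10, 6
story    = tf.placeholder(tf.int32, [None, mem_size, sent_len])  # context sentences
question = tf.placeholder(tf.int32, [None, sent_len])
answer   = tf.placeholder(tf.int32, [None])

A = tf.Variable(tf.random_normal([vocab, dim]))   # memory (input) embedding
B = tf.Variable(tf.random_normal([vocab, dim]))   # question embedding
C = tf.Variable(tf.random_normal([vocab, dim]))   # memory (output) embedding
W = tf.Variable(tf.random_normal([dim, vocab]))   # final projection

m = tf.reduce_sum(tf.nn.embedding_lookup(A, story), 2)          # [batch, mem, dim]
u = tf.reduce_sum(tf.nn.embedding_lookup(B, question), 1)       # [batch, dim]
p = tf.nn.softmax(tf.reduce_sum(m * tf.expand_dims(u, 1), 2))   # attention over memories
c = tf.reduce_sum(tf.nn.embedding_lookup(C, story), 2)          # [batch, mem, dim]
o = tf.reduce_sum(c * tf.expand_dims(p, 2), 1)                  # weighted memory output

logits = tf.matmul(u + o, W)                                    # answer prediction
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=answer, logits=logits))
train_op = tf.train.AdamOptimizer(0.01).minimize(loss)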
  • 128. Memory Network for Dialogue understand In the given code I removed 90% of the data set because we are using a CPU for this class, so the results may be poor.
  • 129. Memory Network for Dialogue understand bAbi Test Results .. (comparing DMN & MemNN ) https://research.fb.com/downloads/babi/
  • 131. 1.NLP & Deep Learning 2.Language Analysis Process 3.Language Generation 3-1.Basic Seq2Seq 3-2.Other types of Seq2Seq (Attention, Pointer)
  • 132. Response Generator - Seq2Seq Model Seq2Seq 모델은 기계번역, 요약, 간단한 질답 등 말 그대로 Input 과 Output 이 모두 Sequence Data 인 다양한 케이스에 적용이 가능하며, 이를 간단한 트릭을 적용하여 답변을 생성하는 용도로 사용할 수 있다. - Input : 딥 러닝 재미 즐거운 일 - Output : 딥 러닝은 재미있고 즐거운 일이다 https://arxiv.org/pdf/1406.1078.pdf https://www.slideshare.net/KeonKim/attention-mechanisms-with-tensorflow
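A minimal encoder-decoder training graph sketch in the TensorFlow 1.x contrib.seq2seq API; vocabulary size and dimensions are assumptions, and the attention/pointer variants mentioned on the next slide are left out:

# Basic seq2seq (encoder-decoder) training graph, illustrative only.
import tensorflow as tf

vocab, emb_dim, hidden = 1000, 128, 256
enc_inputs  = tf.placeholder(tf.int32, [None, None])   # e.g. "딥 러닝 재미 즐거운 일"
dec_inputs  = tf.placeholder(tf.int32, [None, None])   # decoder input (shifted target)
dec_targets = tf.placeholder(tf.int32, [None, None])   # "딥 러닝은 재미있고 즐거운 일이다"
enc_len = tf.placeholder(tf.int32, [None])
dec_len = tf.placeholder(tf.int32, [None])

embedding = tf.Variable(tf.random_uniform([vocab, emb_dim], -1.0, 1.0))
enc_emb = tf.nn.embedding_lookup(embedding, enc_inputs)
dec_emb = tf.nn.embedding_lookup(embedding, dec_inputs)

# Encoder: compress the input sequence into a final state
enc_cell = tf.contrib.rnn.LSTMCell(hidden)
_, enc_state = tf.nn.dynamic_rnn(enc_cell, enc_emb,
                                 sequence_length=enc_len, dtype=tf.float32)

# Decoder: generate the output sequence starting from the encoder state
dec_cell = tf.contrib.rnn.LSTMCell(hidden)
helper = tf.contrib.seq2seq.TrainingHelper(dec_emb, dec_len)
decoder = tf.contrib.seq2seq.BasicDecoder(
    dec_cell, helper, enc_state, output_layer=tf.layers.Dense(vocab))
outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder)

loss = tf.contrib.seq2seq.sequence_loss(
    outputs.rnn_output, dec_targets,
    weights=tf.sequence_mask(dec_len, maxlen=tf.shape(dec_targets)[1], dtype=tf.float32))
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)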
  • 133. Attention Mechanism Pointer Network https://medium.com/@devnag/pointer-networks-in-tensorflow- with-sample-code-14645063f264 Seq2Seq 의 변형된 형태들… Response Generator - Seq2Seq Model ※ 다음 강의에서 자세히 진행할 예정인 내용으로 상세 내용 생략 http://localhost:8888/tree/chap05_nlp/attention_seq2seq
  • 134. 결국 Natural Language Process 는 "기존 자연어 처리 알고리즘", "Deep Learning" Algorithm” 그리고 각종 “Software Architecture” 의 거대한 Combination Conclusion 기존 자연어 처리 이론 Deep Learning Theory Software Architecture
  • 135. Conclusion 지금까지 이야기한 내용들을 연결하여 하나의 예를 만들어 보자 Web Document Web Crawler Lexical (어휘) Analysis Syntactic (구문) Analysis Semantic (의미) Analysis Ontology Man Filtering information Dialogue (구문) Analysis information Lexical (어휘) Analysis Syntactic (구문) Analysis Semantic (의미) Analysis Dialogue (구문) Analysis Web Server Response Generation IN OUT
  • 136. 4.Tips 4-1.Hyper Parameter Random Search 4-2.Genetic Algorithm 4-3.Using multiple GPU Server
  • 137. Hyper Parameter Optimization Set of graph flow Set of graph flow Set of graph flow Hyper Parm Range ~ Hyper Parameter Random Search Genetic Algorithm Approximation Hyper Parameter 서치를 위한 Genetic Algorithm 에 대한 설명 1 2 3
  • 138. Hyper Parameter Optimization Explanation of Hyper Parameter Random Search http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf "In this more challenging optimization problem random search is still effective, but not superior as it was in the case of neural network optimization. Comparing to the 3-layer DBN results in Larochelle et al. (2007), random search found a better model than the manual search in one data set (convex), an equally good model in four (mnist basic, mnist rotated, rectangles, and rectangles images), and an inferior model in three (mnist background images, mnist background random, mnist rotated background images)."
  • 139. Hyper Parameter Optimization [1Layer] - Grid vs Random [3Layer] - Grid+Manual vs Random
  • 140. Hyper Parameter Optimization Genetic Algorithm on Hyper parameter optimization (Approximation) https://blog.coast.ai/lets-evolve-a-neural-network-with-a-genetic-algorithm-code-included-8809bece164 Let’s say it takes five minutes to train and evaluate a network on your dataset. And let’s say we have four parameters with five possible settings each. To try them all would take (5**4) * 5 minutes, or 3,125 minutes, or about 52 hours. Now let’s say we use a genetic algorithm to evolve 10 generations with a population of 20 (more on what this means below), with a plan to keep the top 25% plus a few more, so ~8 per generation. This means that in our first generation we score 20 networks (20 * 5 = 100 minutes). Every generation after that only requires around 12 runs, since we don’t have to score the ones we keep. That’s 100 + (9 generations * 5 minutes * 12 networks) = 640 minutes, or about 11 hours. https://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol1/hmw/article1.html This can be combined with multi-GPU cluster servers and hyper parameter random search; a toy sketch of the evolutionary loop follows below.
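The toy evolutionary loop referenced above, roughly matching the slide's arithmetic (population 20, keep about the top 25%); train_and_score() is a random stand-in assumption in place of real network training:

# Toy genetic-algorithm hyperparameter search, illustrative only.
import random

space = {
    "lr":      [0.1, 0.01, 0.001, 0.0001, 0.00001],
    "hidden":  [64, 128, 256, 512, 1024],
    "layers":  [1, 2, 3, 4, 5],
    "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
}

def random_individual():
    return {k: random.choice(v) for k, v in space.items()}

def train_and_score(params):
    # placeholder fitness; in practice: build, train, and evaluate a network
    return random.random()

def evolve(population, keep=5):
    scored = sorted(population, key=train_and_score, reverse=True)
    parents = scored[:keep]
    children = []
    while len(parents) + len(children) < len(population):
        mom, dad = random.sample(parents, 2)
        child = {k: random.choice([mom[k], dad[k]]) for k in space}   # crossover
        if random.random() < 0.2:                                     # mutation
            gene = random.choice(list(space))
            child[gene] = random.choice(space[gene])
        children.append(child)
    return parents + children

population = [random_individual() for _ in range(20)]
for generation in range(10):
    population = evolve(population)
print(population[0])  # best surviving hyperparameter set of the final generation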
  • 141. Hyper Parameter Optimization Let’s see how hyperparameter optimization with a genetic algorithm works. http://localhost:8888/tree/chap05_nlp/automl
  • 142. 다음 강의 목표 NLP 관점에서 Deep Learning 을 적용하기 위한 데이터와 모델에 대한 이해를 돕기위한 강의를 진행하였습니다. 다음 시간에는 이러한 재료들을 모아서 아키택쳐 관점에서 응용하고 활용하기 위한 방법들에 대해서 강의하고자 합니다. 감사합니다.