Warnikchow - Babeltop 1901

Human Interface Laboratory
A Shallow Talk on the State of Deep NLP
2019. 1. 26
Won Ik Cho

Contents
• About the speaker
• Recent trends in NLP
• A task proposer's rationalization
• The present and future of Korean NLP
• Takeaways

About the speaker
• Won Ik Cho (조원익)
 B.S. in EE/Mathematics (SNU, '10~'14)
 Ph.D. student (SNU INMC, '14~)
• Academic background
 Interested in mathematics >> EE!
 Double major?
• Math is very difficult
• Circuits do not suit me
 Early years in a speech processing lab
• Source separation
• Voice activity & endpoint detection
• Automatic music composition
 Currently studying computational linguistics

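Recent trends in NLP
• Memories of winter 2017

Recent trends in NLP
• NLP up to 2018?

Recent trends in NLP
• Academic expansion through arXiv posting

Recent trends in NLP
• Growing market demand

Recent trends in NLP
• Not quite there yet in Asia...?

Recent trends in NLP
• Considerable progress on individual tasks as well

Recent trends in NLP
• Skyrocketing conference submissions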

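Recent trends in NLP
• The difference in perspective between NLP and CL (computational linguistics)
 NLP: the algorithm is the main focus. Which task can be solved with it?

Recent trends in NLP
 CL: Is the task definition sound? How should linguistic properties be reflected in the experiments?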

Recent trends in NLP
 Two reviews of the same paper
• "This paper introduces "Effective Discourse Components" (EDCs) which are
meant to represent the illocutionary acts associated with sentence
units. They introduce and annotate a 3-way schema using effective
components for common ground, to do list, and question set actions. The
authors define these labels as ones that can be identified on corpora
without punctuation, context, or intonation."
• "The authors propose new categories to classify the true intent on
dialogues, in particular they propose the concept Effective Discourse
Component. In this proposal the categories are based on the expected
response from the receiver of a utterance: answer, action or neither. The
authors propose an evaluation of the classification performance on the
new set of classification based on two corpus: Cornell movie dialogue
and several dialogue systems corpus."

Recent trends in NLP
• Nevertheless, the two seem to be gradually converging?

Recent trends in NLP
• Diverse views on deep NLP
(but still in awe of BERT)

A task proposer's rationalization
• Tasks
 Morphology
• Word segmentation, morphological analysis ...
 Syntax-semantics
• Constituency parsing, semantic role labeling, POS tagging ...
 Semantics
• Sentence classification (sentiment, intention, etc.)
• Question answering, machine translation, summarization ...
 Pragmatics
• Dialog act tagging, dialog management ...
 Phonetics
• Speech recognition, multimodal speech understanding

• SOTA

• My research
 A new definition of intention?
• Taking rhetoricalness and discourse components into account
– 3i4k https://github.com/warnikchow/3i4k
 A new approach to intent, argument, and slot-filling?
• A paraphrasing-based approach
– sae4k https://github.com/warnikchow/sae4k
 If it is hard to build an algorithm that performs better on an existing task,
why not create a task that achieves the goal more effectively?

• My research
 Another rationalization:
NLP cannot ignore its language-dependent aspects!
The same holds for speech (unlike vision)
• Could we build theories, features, models, and datasets better suited to
processing Korean?

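The present and future of Korean NLP
• "Hangul" (한글) and "Korean" (한국어)
 Every Hangul Day it is stressed that the two are different concepts, but what if
Korean today were not written in Hangul? (e.g., in the Latin alphabet or Chinese characters)
 Do the various tasks that exist for English also exist for Korean?
• For POS tagging, the task itself differs
– Morphological analysis precedes POS tagging
– The notion of word spacing is used more often than word segmentation
• Approach at the eojeol/morpheme level, or at the character/alphabet level?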
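The eojeol versus character/alphabet question can be made concrete with a minimal Python sketch. It is not from the talk: it assumes "alphabet" refers to jamo (the consonant and vowel letters inside each Hangul syllable block), the function names are hypothetical, and morpheme-level tokens are left out because they require a morphological analyzer. The jamo split relies only on the Unicode layout of precomposed Hangul syllables (U+AC00 to U+D7A3).

```python
# Illustrative sketch (not from the talk): three granularities for Korean text.
HANGUL_BASE = 0xAC00     # first precomposed syllable
HANGUL_LAST = 0xD7A3     # last precomposed syllable
N_JUNG, N_JONG = 21, 28  # vowels; final consonants (index 0 = no final)


def eojeol_tokens(text):
    """Split on whitespace: each space-delimited unit is one eojeol."""
    return text.split()


def syllable_tokens(text):
    """Character-level units: every non-space character."""
    return [ch for ch in text if not ch.isspace()]


def to_jamo(syllable):
    """Decompose one precomposed Hangul syllable into conjoining jamo.

    Returns (initial, vowel) or (initial, vowel, final); any other
    character is returned unchanged as a 1-tuple.
    """
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        return (syllable,)
    index = code - HANGUL_BASE
    cho, rest = divmod(index, N_JUNG * N_JONG)
    jung, jong = divmod(rest, N_JONG)
    jamo = (chr(0x1100 + cho), chr(0x1161 + jung))
    if jong:  # index 0 means the syllable has no final consonant
        jamo += (chr(0x11A7 + jong),)
    return jamo


def jamo_tokens(text):
    """Alphabet-level units: jamo of every non-space character, in order."""
    return [j for ch in syllable_tokens(text) for j in to_jamo(ch)]


sentence = "한국어 처리"
print(eojeol_tokens(sentence))     # 2 eojeols
print(syllable_tokens(sentence))   # 5 syllable characters
print(len(jamo_tokens(sentence)))  # number of jamo
```

For precomposed Hangul syllables, `unicodedata.normalize("NFD", text)` from the standard library produces the same conjoining-jamo sequence, so the arithmetic above is mainly useful for seeing how the initial, vowel, and final are packed into one code point.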

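 Semantic/pragmatic tasks are even harder to approach
• Partly, of course, because of the linguistic structure
– scrambling language, agglutinative, wh-in-situ ...
• For a native speaker, they are easier to understand than English tasks
• But it is also a fact that Korean processing is hell
– handling noisy user-generated text with initial-consonant shorthand (초성체), abbreviations, and non-standard words ...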

 Because it is hard, many areas are still unexplored!
• Especially the closer a task is to a linguistic one?
• Combining with advanced syntax/semantics/pragmatics/neurolinguistics?
• One could propose a task that exists for English but lacks data in Korean
– e.g., the recent KorQuAD? https://korquad.github.io/
• Or propose a new Korean-specific method for solving an existing task
– e.g., word spacing: pykospacing, 띄쓰봇 (ttuyssubot) https://github.com/warnikchow/ttuyssubot

Reference (order of appearance)
• Yoav Shoham, Raymond Perrault, Erik Brynjolfsson, Jack Clark, James Manyika, Juan Carlos
Niebles, Terah Lyons, John Etchemendy, Barbara Grosz and Zoe Bauer, "The AI Index 2018
Annual Report", AI Index Steering Committee, Human-Centered AI Initiative, Stanford University,
Stanford, CA, December 2018.
• ACL Conference acceptance rates https://aclweb.org/aclwiki/Conference_acceptance_rates
• Mikolov, Tomas, et al. "Distributed representations of words and phrases and their
compositionality." Advances in neural information processing systems. 2013.
• Sutskever, Ilya, et al. "Sequence to sequence learning with neural networks." Advances in neural
information processing systems. 2014.
• Cho, Kyunghyun, et al. "Learning Phrase Representations using RNN Encoder-Decoder for
Statistical Machine Translation." arXiv preprint arXiv:1406.1078. 2014.
• Kim, Yoon. "Convolutional neural networks for sentence classification." arXiv preprint
arXiv:1408.5882. 2014.
• Lin, Zhouhan, et al. "A structured self-attentive sentence embedding." arXiv preprint
arXiv:1703.03130. 2017.
• Transformer, BERT, LISA (Google it for further information!)

Thank you!