NLP: a peek into a day
of a computational linguist
Mariana Romanyshyn
Grammarly, Inc.
1. NLP applications in our world
2. What computational linguists do
3. Language levels
4. A closer look at part-of-speech tagging
5. A closer look at syntactic parsing
6. Let’s build something: error correction
Contents
Disclaimer
1. NLP applications in our world
What NLP applications do you know?
• Analysis
• Transformation
• Misc
Types of NLP Applications
ANALYSIS
Spam Filtering
…
Types of NLP Applications
ANALYSIS
Spam Filtering
Search Engines
…
Types of NLP Applications
ANALYSIS
Spam Filtering
Search Engines
Sentiment Analysis
…
Types of NLP Applications
Sentiment maps
It tastes amazing!
It tastes horrible!
It tastes normal.
ABC tastes much better than DEF.
It tastes like beer!
It tastes interesting!
It tastes like my mom said it would!
If it was served with milk, it would taste great!
Sentiment Analysis
“That young girl is one of the least benightedly unintelligent organic
life forms [that] it has been my profound lack of pleasure not to be
able to avoid meeting.”
— Douglas Adams
Terminal cases
Sentiment Analysis
ANALYSIS
Spam Filtering
Search Engines
Sentiment Analysis
Sarcasm Detection
…
Types of NLP Applications
Quite interesting
ANALYSIS
Spam Filtering
Search Engines
Sentiment Analysis
Sarcasm Detection
Essay Grading
…
Types of NLP Applications
ANALYSIS
Spam Filtering
Search Engines
Sentiment Analysis
Sarcasm Detection
Essay Grading
Good/Evil Characters
…
Types of NLP Applications
TRANSFORMATION
Machine Translation
…
Types of NLP Applications
Transformations in MT
TRANSFORMATION
Machine Translation
Error Correction
…
Types of NLP Applications
GEC (grammatical error correction) should be smart
TRANSFORMATION
Machine Translation
Error Correction
Speech to Text / Text to Speech
…
Types of NLP Applications
TRANSFORMATION
Machine Translation
Error Correction
Speech to Text / Text to Speech
Question Answering
...
Types of NLP Applications
TRANSFORMATION
Machine Translation
Error Correction
Speech to Text / Text to Speech
Question Answering
Text Summarization
...
Types of NLP Applications
MISC
News reports generation
…
Types of NLP Applications
MISC
News reports generation
Conversational Agents
…
Types of NLP Applications
“I remember the first time we loaded these data sources into Siri.
I typed ‘start over’ into the system, and Siri came back saying,
‘Looking for businesses named “Over” in Start, Louisiana.’”
— Adam Cheyer
Siri
The story of Tay
MISC
News reports generation
Conversational Agents
Language learning
…
Types of NLP Applications
Duolingo
Duolingo
MISC
News & weather reports generation
Conversational Agents
Language learning
Story Cloze Task
…
Types of NLP Applications
Tom and Sheryl have been together for two years. One day, they
went to a carnival. Tom won Sheryl several stuffed bears. When
they reached the Ferris wheel, he got down on one knee.
Which ending is more probable?
• Tom asked Sheryl to marry him.
• He wiped mud off of his boot.
Story Cloze
2. What computational linguists do
Just FYI
3. Language levels
“Noam-enclature” and structural linguistics
Language Levels
1) Language has a structure
2) Language is a system of signs
Units of language levels
Written text
?
Written text
Paragraph
Sentence
Word
Morpheme
Letter
Units of language levels
Splitting problems
How do we split...
• text into paragraphs?
bullet points, word wrapping
• paragraphs into sentences?
Dr. Jones lectures at U.C.L.A.
• sentences into words?
computer-aided, the d.t.s, San Francisco, 3$B deal
• words into morphemes?
misadventure
mislead
mistake - ?
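To make the sentence-splitting problem concrete, here is a small Python sketch (an illustration, not code from the talk's repository). A naive splitter that breaks after every '.', '!' or '?' followed by whitespace cuts "Dr. Jones" in two; the second version consults a hypothetical, deliberately tiny abbreviation list and treats dotted initials such as U.C.L.A. as non-final.

```python
import re

# Hypothetical, deliberately incomplete abbreviation list (an assumption of this sketch).
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Ms.", "Prof.", "e.g.", "i.e."}
# Two or more letter-dot pairs, e.g. "U.C.L.A."
DOTTED_INITIALS = re.compile(r"(?:[A-Za-z]\.){2,}")


def naive_split(text):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Whitespace-token splitter that never ends a sentence on a known
    abbreviation or on dotted initials."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        ends_sentence = token.endswith((".", "!", "?"))
        if ends_sentence and (token in abbreviations or DOTTED_INITIALS.fullmatch(token)):
            ends_sentence = False  # looks like an abbreviation, keep going
        if ends_sentence:
            sentences.append(" ".join(current))
            current = []
    if current:  # whatever is left forms the last sentence
        sentences.append(" ".join(current))
    return sentences
```

naive_split("Dr. Jones lectures at U.C.L.A.") returns ['Dr.', 'Jones lectures at U.C.L.A.'], while split_sentences keeps the sentence whole. The fix has a price: "He works at U.C.L.A. She does not." also comes back as one sentence, because a final abbreviation looks exactly like a non-final one.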
Features
Quantitative features:
• number of sentences, words, words per sentence, etc.
• size and arrangement of paragraphs
• word length
• word position in a sentence
• number of syllables in a word
• ratio of vowels to consonants
• depth of the word in the dependency tree of the sentence
• number of word senses
• ngrams
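Several of the quantitative features above can be computed directly. A minimal Python sketch, assuming sentences are already split, words are whitespace tokens (so word length includes attached punctuation), and only a, e, i, o, u count as vowels; all three are simplifications chosen for brevity.

```python
VOWELS = set("aeiouAEIOU")  # simplification: 'y' always counts as a consonant


def quantitative_features(sentences):
    """A few surface features for a list of already-split sentence strings."""
    words = [w for sentence in sentences for w in sentence.split()]
    letters = [ch for w in words for ch in w if ch.isalpha()]
    vowels = sum(1 for ch in letters if ch in VOWELS)
    consonants = len(letters) - vowels
    return {
        "num_sentences": len(sentences),
        "num_words": len(words),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        # word length includes attached punctuation such as "amazing!"
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "vowel_consonant_ratio": vowels / consonants if consonants else 0.0,
    }


features = quantitative_features(["It tastes amazing!", "It tastes horrible!"])
```

For the two example sentences this reports 2 sentences, 6 words, 3.0 words per sentence and an average word length of 5.5.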
Ngrams
Sequences of elements and their frequencies:
• unigrams, bigrams, 3-grams, 4-grams, … n-grams
• at different language levels
– token ngrams:
• ("handsome", "man"): 160,000; ("pretty", "man"): 5,000
– character ngrams
• “st”: 14,000; “ct”: 4,000; “str”: 1,500; “ctr”: 50; “stra”: 400; “ctra”: 0
• adding grammar
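Counting n-grams takes only a few lines of Python. In this sketch the same helper produces token, character, and POS-tag n-grams; the inputs are toy strings, so the counts illustrate the mechanics rather than the corpus frequencies on the slide.

```python
from collections import Counter


def ngrams(sequence, n):
    """All contiguous n-grams of a sequence (a token list or a string), as tuples."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


# Token n-grams over a toy sentence.
tokens = "a handsome man and a pretty man".split()
token_bigrams = Counter(ngrams(tokens, 2))

# Character n-grams: join each tuple of characters back into a string.
char_trigrams = Counter("".join(g) for g in ngrams("street and structure", 3))

# "Adding grammar": the same helper works on POS-tag sequences.
tag_trigrams = Counter(ngrams(["DT", "JJ", "NN", "CC", "DT", "JJ", "NN"], 3))
```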
Features
Grammatical features:
• POS tag
• morphemes: affixes, roots, endings
• constituency spans
• dependency relations
• coreference
• grammatical characteristics of various parts of speech:
– countability of nouns
– tense of verbs
– degree of comparison of adjectives
– pronoun type
– connector type
Features
Spelling features:
• capitalized word?
• hyphenated word?
• compound word?
Lexical-semantic features:
• WordNet
• VerbNet
• dictionaries and thesauri
• word embeddings
• modality of verbs
4. A closer look at part-of-speech tagging
Goal: categorize words by their functions.
English:
• notional: noun, verb, adjective, adverb, pronoun (?), numeral (?)
• functional: determiner, preposition, conjunction, particle, and
interjection
POS: recap
Wow, two hungry cats chased down the mouse to the corner
and quickly ate it!
POS: practice
All you need is love . Love is all
at the way you love me all the time
. And never mind that noise you heard .
fire and of things that will bite , yeah
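було так давно , коли в руках тримаю цей ("it was so long ago, when in my hands I hold this")
Просто налийте трохи коли на пошкоджену ділянку . ("Just pour a little cola on the damaged area.")
ударом . Я хочу мати всьо , і всьо на ("...by a blow. I want to have everything, and everything on")
а на полі спозаранку мати жито жала , та ("and in the field at dawn, mother was reaping rye, and")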
POS: more practice
Time flies like an arrow.
I saw her duck with a telescope.
She is calculating.
We watched an Indian dance.
They can fish.
More lies ahead...
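Це мало мало значення. ("This had little significance.")
Коло друзів та незнайомців. ("A circle of friends and strangers." or "Near friends and strangers.")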
POS: impossible cases
Time flies[Verb/Noun] like[Preposition/Verb] an arrow.
I saw her duck[Verb/Noun] with a telescope.
She is calculating[Verb/Adjective].
We watched an Indian[Adjective/Noun] dance.
They can[Modal Verb/Verb] fish[Verb/Noun].
More lies[Verb/Noun] ahead...
Це мало[Verb/Adverb] мало[Verb/Adverb] значення.
Коло[Noun/Preposition] друзів та незнайомців.
POS: impossible cases
What POS should gotta be?
I gotta tell you something.
I’ve gotta fix that thingy for her, Jack.
So, she gotta this gorgeous dress.
So, she gotta gun.
POS: disputable cases
What POS should gotta be?
I gotta[modal verb] tell you something.
I’ve gotta[verb, 3rd form] fix that thingy for her, Jack.
So, she gotta[verb, 2nd form] this gorgeous dress.
So, she gotta[verb, 2nd form] gun.
POS: disputable cases
If you don’t know,
how would the machine know?
So, what do we do?
Penn Treebank tagset:
• noun: NN, NNS, NNP, NNPS
• verb: VB, VBP, VBZ, VBG, VBD, VBN, MD
• adjective: JJ, JJR, JJS
• adverb: RB, RBR, RBS
• preposition and sub. conjunction: IN
• pronoun: PRP, PRP$
• determiner: DT
• numeral: CD
• particle: RP, TO
• interjection: UH
• coord. conjunction: CC
• wh-words: WDT, WP, WP$, WRB
• more: PDT, POS, SYM, FW, EX, LS, $, |,|, |.|, |:|, |''|, |``|, -RRB-, -LRB-
POS: tagsets
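The grouping above can be kept as a lookup table from fine-grained tags to coarse parts of speech. This Python sketch copies only the slide's groups; tags from the "more" line (PDT, POS, SYM, punctuation, and so on) map to "other", which is a choice made for the sketch.

```python
# Coarse groups copied from the slide; the "more" tags are deliberately left out.
COARSE_GROUPS = {
    "noun": ["NN", "NNS", "NNP", "NNPS"],
    "verb": ["VB", "VBP", "VBZ", "VBG", "VBD", "VBN", "MD"],
    "adjective": ["JJ", "JJR", "JJS"],
    "adverb": ["RB", "RBR", "RBS"],
    "preposition/subordinating conjunction": ["IN"],
    "pronoun": ["PRP", "PRP$"],
    "determiner": ["DT"],
    "numeral": ["CD"],
    "particle": ["RP", "TO"],
    "interjection": ["UH"],
    "coordinating conjunction": ["CC"],
    "wh-word": ["WDT", "WP", "WP$", "WRB"],
}

# Invert the table once so that every lookup is a single dict access.
TAG_TO_COARSE = {tag: group for group, tags in COARSE_GROUPS.items() for tag in tags}


def coarse_pos(tag):
    """Map a Penn Treebank tag to the slide's coarse category, or 'other'."""
    return TAG_TO_COARSE.get(tag, "other")
```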
Very_RB peculiar_JJ retribution_NN indeed_RB seems_VBZ to_TO overtake_VB such_JJ jokers_NNS ._.
Have_VBP you_PRP ever_RB heard_VBN of_IN Thuggee_NNP ?_.
Sort_NN of_IN remorseless_JJ ,_, is_VBZ n't_RB it_PRP ?_.
In_IN short_JJ ,_, and_CC to_TO borrow_VB an_DT arboreal_JJ phrase_NN ,_, slash_VB timber_NN ._.
As_IN you_PRP can_MD count_VB on_IN me_PRP to_TO do_VB the_DT same_JJ ._.
Compassionately_RB yours_PRP ,_, S.J._NNP Perelman_NNP
We_PRP caught_VBD the_DT early_JJ train_NN to_IN New_NNP York_NNP ._.
Petite_JJ ,_, lovely_JJ Yvette_NNP Chadroe_NNP plays_VBZ the_DT nymphomaniac_NN engagingly_RB ._.
He_PRP looked_VBD so_RB comfortable_JJ being_VBG straight_JJ ._.
They_PRP wanted_VBD to_TO touch_VB the_DT mystery_NN ._.
...
POS: corpora
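Corpora like the one above store one sentence per line in word_TAG format. A minimal Python reader, assuming the tag is whatever follows the last underscore (so S.J._NNP and ._. are both handled) and that words themselves contain no whitespace:

```python
from collections import Counter


def parse_tagged_line(line):
    """Turn 'word_TAG word_TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for item in line.split():
        word, sep, tag = item.rpartition("_")  # split on the LAST underscore
        if not sep or not word:
            raise ValueError(f"token without a word_TAG shape: {item!r}")
        pairs.append((word, tag))
    return pairs


def tag_counts(lines):
    """Frequency of each POS tag over a corpus of tagged lines."""
    return Counter(tag for line in lines for _, tag in parse_tagged_line(line))
```

Counts like these are the raw material for the probabilities used by the taggers on the following slides.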
• Use a classifier to tag each word independently
• Features
– left/right context: words, POS tags, words + POS tags
– probability of word + POS tag
– additional:
• possible tags for the word
• morphological characteristics (tense, plurality, degree of comparison)
• the word’s spelling (suffixes, capitalization, hyphenation)
Input: Chewie[NNP] ,[,] we[PRP] 're[VBP] home[NN/RB] - ? .[.]
Output: RB
POS: Classification
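The feature list above maps naturally onto one feature dictionary per token. This Python sketch is hypothetical (the feature names and the optional tag dictionary are made up for illustration); it reads the tags already predicted for the words on the left, as a left-to-right classifier would, and dictionaries like these could then be vectorized and passed to any standard classifier.

```python
def word_features(tokens, i, left_tags, tag_dict=None):
    """Features for tokens[i]; left_tags holds the tags already assigned to tokens[:i]."""
    word = tokens[i]
    prev_tag = left_tags[i - 1] if i > 0 else "<S>"
    return {
        # left/right context: words, tags, words + tags
        "word": word.lower(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<S>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "</S>",
        "prev_tag": prev_tag,
        "prev_tag+word": prev_tag + "|" + word.lower(),
        # possible tags for the word, from an optional (hypothetical) tag dictionary
        "possible_tags": "/".join(sorted(tag_dict.get(word, []))) if tag_dict else "",
        # spelling: suffix, capitalization, hyphenation, digits
        "suffix3": word[-3:].lower(),
        "capitalized": word[:1].isupper(),
        "hyphenated": "-" in word,
        "has_digit": any(ch.isdigit() for ch in word),
    }


tokens = ["Chewie", ",", "we", "'re", "home", "."]
home = word_features(tokens, 4, ["NNP", ",", "PRP", "VBP"], {"home": ["RB", "NN"]})
```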
• Map the sentence to the most probable POS tag sequence
• Features
– left/right context: words, POS tags, words + POS tags
– probability of word + POS tag
– additional:
• possible tags for the word
• morphological characteristics (tense, plurality, degree of comparison)
• the word’s spelling (suffixes, capitalization, hyphenation)
Input: Chewie , we 're home .
Output: NNP , PRP VBP RB .
POS: Sequence Labelling
Notation:
• V - vocabulary
• T - POS tags
• x - sentence (observation)
• y - tag sequence (states)
• S - all sentence/tag-sequence pairs {x1 . . . xn, y1 . . . yn}
– n > 0
– xi ∈ V
– yi ∈ T
Hidden Markov Models
S - all sentence/tag-sequence pairs {x1 . . . xn, y1 . . . yn}
x: Chewie , we 're home .
y: NNP , PRP VBP RB .
NN , PRP VBP RB .
NNP , PRP VBP NN .
NN , PRP VBP NN .
…
Aim: for a given sentence x1 . . . xn, find the tag sequence y1 . . . yn that gives the pair {x1 . . . xn, y1 . . . yn} the highest probability.
Hidden Markov Models
• Markov Assumption: "The future is independent of the past
given the present."
– Trigram HMM: each state depends only on the previous two states
in the sequence
• Independence assumption:
– each observation xi depends only on its own state yi, independent of
all other observations and states
HMM: assumptions
• q(s|u, v) - the probability of tag s after the tags (u, v)
– s, u, v ∈ T
• e(x|s) - the probability that state s emits observation x
– x ∈ V, s ∈ T
Trigram HMM: parameters
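Put together, the two parameter types give the standard trigram HMM joint probability. The block below uses the notation of these slides, with y_{-1} = y_0 = <S> and y_{n+1} = </S> as in the padded example, and the relative-frequency estimates that the worked example multiplies out:

```latex
p(x_1 \dots x_n,\, y_1 \dots y_{n+1})
  = \prod_{i=1}^{n+1} q(y_i \mid y_{i-2}, y_{i-1}) \, \prod_{i=1}^{n} e(x_i \mid y_i),
  \qquad y_{-1} = y_0 = \langle S \rangle, \quad y_{n+1} = \langle /S \rangle

q(s \mid u, v) = \frac{c(u, v, s)}{c(u, v)},
  \qquad
e(x \mid s) = \frac{c(s \to x)}{c(s)}
```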
For example
x: Chewie , we 're home .
y: NNP , PRP VBP RB .
How do we get p(x, y)?
p(x, y) = c(NNP,|,|,PRP)/c(NNP,|,|) * c(|,|,PRP, VBP)/c(|,|,PRP) *
c(PRP, VBP,RB)/c(PRP,VBP) * c(VBP,RB,|.|)/c(VBP,RB) *
c(NNP->Chewie)/c(NNP) * c(|,|->,)/c(|,|) *
c(PRP->we)/c(PRP) * c(VBP->'re)/c(VBP) *
c(RB->home)/c(RB) * c(|.|->.)/c(|.|)
One thing missing
x: Chewie , we 're home .
y: <S> <S> NNP , PRP VBP RB . </S>
p(x, y) = c(NNP,|,|,PRP)/c(NNP,|,|) * c(|,|,PRP, VBP)/c(|,|,PRP) *
c(PRP, VBP,RB)/c(PRP,VBP) * c(VBP,RB,|.|)/c(VBP,RB) *
c(<S>,<S>,NNP)/c(<S>,<S>) * c(<S>,NNP,|,|)/c(<S>,NNP) *
c(RB,|.|,</S>)/c(RB,|.|) *
c(NNP->Chewie)/c(NNP) * c(|,|->,)/c(|,|) *
c(PRP->we)/c(PRP) * c(VBP->'re)/c(VBP) *
c(RB->home)/c(RB) * c(|.|->.)/c(|.|)
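The same computation in Python: a sketch that estimates q and e by relative frequency (no smoothing) from a tiny two-sentence corpus invented for illustration, then multiplies the factors for one sentence/tag-sequence pair. Here c(u, v) counts a tag pair only where it is the context of a trigram, so the q values for a fixed context sum to 1.

```python
from collections import Counter

START, STOP = "<S>", "</S>"


def train_counts(tagged_sentences):
    """Count tag trigrams, their two-tag contexts, tags, and (tag, word) emissions."""
    trigrams, contexts, tags, emissions = Counter(), Counter(), Counter(), Counter()
    for sentence in tagged_sentences:
        padded = [START, START] + [tag for _, tag in sentence] + [STOP]
        for i in range(2, len(padded)):
            trigrams[tuple(padded[i - 2:i + 1])] += 1  # c(u, v, s)
            contexts[tuple(padded[i - 2:i])] += 1      # c(u, v)
        for word, tag in sentence:
            tags[tag] += 1                              # c(s)
            emissions[(tag, word)] += 1                 # c(s -> x)
    return trigrams, contexts, tags, emissions


def joint_probability(words, tag_sequence, counts):
    """p(x, y): product of q(s | u, v) and e(x | s), both estimated by relative frequency."""
    trigrams, contexts, tags, emissions = counts
    padded = [START, START] + list(tag_sequence) + [STOP]
    p = 1.0
    for i in range(2, len(padded)):
        context = tuple(padded[i - 2:i])
        if contexts[context] == 0:
            return 0.0  # unseen context: treat the sequence as impossible
        p *= trigrams[context + (padded[i],)] / contexts[context]
    for word, tag in zip(words, tag_sequence):
        if tags[tag] == 0:
            return 0.0
        p *= emissions[(tag, word)] / tags[tag]
    return p


corpus = [
    [("Chewie", "NNP"), (",", ","), ("we", "PRP"), ("'re", "VBP"), ("home", "RB"), (".", ".")],
    [("we", "PRP"), ("'re", "VBP"), ("home", "NN"), (".", ".")],
]
counts = train_counts(corpus)
x = ["Chewie", ",", "we", "'re", "home", "."]
```

Only two factors differ from 1 here: c(<S>,<S>,NNP)/c(<S>,<S>) = 1/2 and c(PRP,VBP,RB)/c(PRP,VBP) = 1/2, so p(x, y) = 0.25. Tagging Chewie as NN gives 0 because that trigram never occurs in the toy corpus, which is exactly the zero-probability issue that smoothing addresses.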
Enumerating all possible tag sequences is not feasible: there are T^n of them.
E.g., 44 tags and a 6-token sentence give 44^6 = 7,256,313,856 tag sequences.
Ideas:
• use dynamic programming (the Viterbi algorithm)
• limit the number of candidates with a dictionary
HMM: problem 1
HMM: the Viterbi algorithm
Idea: remember decisions on the way (keep only the best path for each pair of last two tags): n*T^3.
x: Chewie , we 're home .
y: <S> <S> NN , RB NNP VBP . </S>
NNP , CD WP VB .
NNS , EX PRP$ RB .
NNPS , CC VBP NN .
JJ , IN PRP JJ .
JJR , NNP JJS TO .
RRB , PRP RBS RP .
VBZ , LS CD IN .
...
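A compact trigram Viterbi decoder in Python, written as a sketch of the textbook algorithm rather than the talk's own implementation. It takes q and e as plain functions (so any estimates, smoothed or not, can be plugged in), works in log space to avoid underflow, and accepts an optional tag dictionary, which is the "limit the number of candidates" idea. The three nested loops over tags give the n*T^3 cost. The toy model at the bottom is invented for illustration.

```python
import math

START, STOP = "<S>", "</S>"


def _log(p):
    """log with log(0) = -inf, so impossible paths can never win."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(words, tagset, q, e, tag_dict=None):
    """Most probable tag sequence under a trigram HMM.

    q(s, u, v) returns q(s | u, v); e(x, s) returns e(x | s).
    tag_dict (optional) maps a word to the only tags it may take.
    """
    if not words:
        return []

    def candidates(i):
        if i < 0:
            return [START]  # positions -1 and -2 hold the start padding
        if tag_dict is not None and words[i] in tag_dict:
            return tag_dict[words[i]]
        return tagset

    pi = {(START, START): 0.0}  # best log-probability of a prefix ending in tags (u, v)
    backpointers = []           # backpointers[k][(u, v)]: best tag at position k - 2
    for k, word in enumerate(words):
        new_pi, bp = {}, {}
        for u in candidates(k - 1):
            for v in candidates(k):
                best_w, best = None, -math.inf
                for w in candidates(k - 2):
                    score = pi.get((w, u), -math.inf) + _log(q(v, w, u)) + _log(e(word, v))
                    if best_w is None or score > best:
                        best_w, best = w, score
                new_pi[(u, v)], bp[(u, v)] = best, best_w
        pi = new_pi
        backpointers.append(bp)

    # close the sequence with the transition into </S>
    (u, v), _ = max(((pair, score + _log(q(STOP, *pair))) for pair, score in pi.items()),
                    key=lambda item: item[1])
    tags = [u, v]
    for k in range(len(words) - 1, 1, -1):  # follow the backpointers from the end
        tags.insert(0, backpointers[k][(tags[0], tags[1])])
    return tags[-len(words):]


# Toy model for illustration: uniform transitions, hand-picked emissions.
TAGS = ["PRP", "MD", "VB", "NN"]
EMISSIONS = {("they", "PRP"): 1.0, ("can", "MD"): 0.6, ("can", "VB"): 0.4,
             ("fish", "VB"): 0.7, ("fish", "NN"): 0.3}


def uniform_q(s, u, v):
    return 0.5


def toy_e(x, s):
    return EMISSIONS.get((x, s), 0.0)
```

With uniform_q the decoder simply follows the emissions and returns PRP MD VB for "they can fish"; with a q that strongly prefers NN after MD, the same sentence comes out as PRP MD NN, showing transitions overriding emissions.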
HMM: with dictionary
Idea: use a tag dictionary so each word keeps only its known tags: n*8^3. (Worst case is still n*T^3.)
x: Chewie , we 're home .
y: <S> <S> NNP , PRP VBP VB . </S>
NN VBP
RB
NN
Zero probabilities can occur because of out-of-vocabulary (OOV) or rare words.
Idea: use smoothing!
• add-1: pretend you saw each word one more time
(P.S. It’s usually a horrible choice, but we’ll use it today. Don’t tell anyone.)
• Good-Turing: reallocate the probability of n-grams that occur
r+1 times to the n-grams that occur r times
• Kneser-Ney: discount observed counts and, when the bigram count is near 0, rely on a unigram distribution that favors words seen after many different words
• ...
HMM: problem 2
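Add-1 smoothing for the emission probabilities, as a Python sketch: every (tag, word) count gets +1, and the denominator grows by the vocabulary size plus one shared "unknown word" type, so each e(.|s) still sums to 1. Lumping all unseen words into one type is an assumption of this sketch; real taggers usually model unknown words more carefully, e.g. with suffix features.

```python
from collections import Counter


def add_one_emissions(tagged_sentences):
    """Return e(word, tag) estimated with add-1 (Laplace) smoothing."""
    emissions, tag_counts, vocabulary = Counter(), Counter(), set()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            emissions[(tag, word)] += 1
            tag_counts[tag] += 1
            vocabulary.add(word)
    # +1 adds one shared unknown-word type, so e(. | tag) sums to 1
    # over the vocabulary plus that type
    size = len(vocabulary) + 1

    def e(word, tag):
        return (emissions[(tag, word)] + 1) / (tag_counts[tag] + size)

    return e


emission = add_one_emissions([[("we", "PRP"), ("'re", "VBP"), ("home", "RB")]])
```

An unseen word such as "Chewie" now gets a small non-zero probability instead of wiping out the whole path.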
Implementation
https://github.com/mariana-scorp/one-day-with-cling
Conclusion
“Data is ten times more powerful
than algorithms.”
— Peter Norvig
The Unreasonable Effectiveness of Data
http://youtu.be/yvDCzhbjYWs
5. A closer look at syntactic parsing
Goal: categorize sentence parts by their functions and define dependencies.
Sentence:
• main clause
• subordinate clause
Clause:
• subject
• predicate
• direct/indirect/prepositional object
• modifier
• complement
Syntax: recap
Sentence:
If you want to receive e-mails about my upcoming shows, then please
give me money so I can buy a computer.
Syntax: practice
Sentence:
If you want to receive e-mails about my upcoming shows, then please
give me money so I can buy a computer.
Clauses:
• [[you] want [to receive [e-mails about my upcoming shows]]]
• [please give [me] [money]]
• [[I] can buy [a computer]]
Syntax: practice
Identify the subject:
• The walrus and the carpenter were walking close at hand.
• The greatest trick the devil ever pulled was convincing the world he
didn't exist.
• What we've got here is a failure to communicate.
• Actually being funny is mostly telling the truth about things.
• To be idle is a short road to death, and to be diligent is a way of life.
• Sitting in a tree at the bottom of the garden was a huge black bird
with long blue tail feathers.
Syntax: the subject
Identify the role of the infinitive:
• The two politicians failed [to communicate].
• What we've got here is a failure [to communicate].
• [To be idle] is a short road to death, and [to be diligent] is a way of
life.
• [To become extroverted], you need to go out and socialize.
• You have [to be able [to actually quote the line]] for it [to be a
memorable quote].
Syntax: the infinitives
How do we formalize the syntactic structure?
Answer:
Types:
• constituency tree
– every token is a part of some phrase constituent (parent node)
– includes terminal and non-terminal nodes
– shows relations among the constituents
• dependency tree
– for every token, there is one node
– includes only terminal nodes
– shows relations among words
Syntactic Trees (or Parse Trees)
If you want to receive e-mails about my upcoming shows, then please give me
money so I can buy a computer.
Constituency Tree
Constituency Treebank
(TOP (S (NP (ADJP (RB Very) (JJ peculiar)) (NN retribution)) (ADVP (RB indeed)) (VP (VBZ seems) (S (VP (TO to) (VP (VB overtake) (NP (JJ such) (NNS jokers)))))) (. .)))
(TOP (SQ (VBP Have) (NP (PRP you)) (ADVP (RB ever)) (VP (VBN heard) (PP (IN of) (NP (NNP Thuggee)))) (. ?)))
(TOP (UCP (ADJP (ADVP (NN Sort) (IN of)) (JJ remorseless)) (, ,) (SQ (VBZ is) (RB n't) (NP (PRP it))) (. ?)))
(TOP (SBAR (IN As) (S (NP (PRP you)) (VP (MD can) (VP (VB count) (PP (IN on) (NP (PRP me))) (S (VP (TO to) (VP (VB do) (NP (DT the) (JJ same)))))))) (. .)))
(TOP (FRAG (ADJP (RB Compassionately) (PRP yours)) (, ,) (NP (NNP S.J.) (NNP Perelman))))
(TOP (S (NP (PRP We)) (VP (VBD caught) (NP (NP (DT the) (JJ early) (NN train)) (PP (IN to) (NP (NNP New) (NNP York))))) (. .)))
(TOP (S (NP (JJ Petite) (, ,) (JJ lovely) (NNP Yvette) (NNP Chadroe)) (VP (VBZ plays) (NP (DT the) (NN nymphomaniac)) (ADVP (RB engagingly))) (. .)))
...
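The bracketed treebank format is easy to read programmatically. Libraries such as NLTK ship a ready-made reader (Tree.fromstring); the Python sketch below is self-contained instead. It parses one tree into (label, children) tuples, reads off the word/POS pairs, and lists labeled constituent spans, the kind of "spans of nodes" a constituency parser can use as features.

```python
import re


def parse_tree(text):
    """Parse '(LABEL child child ...)' into nested (label, children) tuples; words are str."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label, pos = tokens[pos + 1], pos + 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    return parse_node()


def is_preterminal(node):
    """A POS node: exactly one child, and that child is a word."""
    return len(node[1]) == 1 and isinstance(node[1][0], str)


def tagged_words(node):
    """(word, POS) pairs read off the preterminals, left to right."""
    if is_preterminal(node):
        return [(node[1][0], node[0])]
    return [pair for child in node[1] for pair in tagged_words(child)]


def spans(node, start=0):
    """Labeled phrase spans (label, start, end), end exclusive; returns (spans, end)."""
    if is_preterminal(node):
        return [], start + 1
    result, end = [], start
    for child in node[1]:
        child_spans, end = spans(child, end)
        result.extend(child_spans)
    result.append((node[0], start, end))
    return result, end


tree = parse_tree("(TOP (S (NP (PRP We)) (VP (VBD caught) (NP (NP (DT the) (JJ early) "
                  "(NN train)) (PP (IN to) (NP (NNP New) (NNP York))))) (. .)))")
phrase_spans, length = spans(tree)
```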
Penn Treebank tagset:
• top level: TOP
• sentence: S, SBAR, SQ, SBARQ, SINV
• fragment: FRAG
• noun phrase: NP
• verb phrase: VP
• prepositional phrase: PP
• adjectival phrase: ADJP
• adverbial phrase: ADVP
• compound conjunction: CONJP
• wh-phrases: WHNP, WHPP, WHADJP, WHADVP
• more: LST, PRT, INTJ, NAC, PRN, QP, RRC, UCP, X
Constituency Labels
• Algorithms:
– top-down
– chart
– bottom-up
• Features include:
– grammar (a.k.a. transitions)
– spans of nodes
– labels
– left context, right context, or both
– split point, etc.
• Weights are trained on the treebank.
Constituency Parsing
Shift-reduce constituency parsing
• Data
– queue: the words of the sentence
– stack: partially completed trees
• Actions
– shift: move the word from the queue onto the stack
– reduce: add a new label on top of the first n constituents on
the stack
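To see how shift and reduce build a tree, here is a Python sketch that replays a given action sequence. In a real parser a trained model chooses each action from features; here the sequence is supplied by hand (a hypothetical oracle), and the queue holds POS-tagged words as (tag, [word]) preterminals.

```python
def replay_shift_reduce(preterminals, actions):
    """Apply ('shift',) and ('reduce', label, n) actions; return the single finished tree."""
    queue = list(preterminals)
    stack = []
    for action in actions:
        if action[0] == "shift":
            if not queue:
                raise ValueError("shift: the queue is empty")
            stack.append(queue.pop(0))  # next word moves onto the stack
        elif action[0] == "reduce":
            _, label, n = action
            if not 1 <= n <= len(stack):
                raise ValueError(f"reduce: cannot group {n} items from a stack of {len(stack)}")
            children = stack[-n:]
            del stack[-n:]
            stack.append((label, children))  # new labeled node over the n topmost items
        else:
            raise ValueError(f"unknown action: {action!r}")
    if queue or len(stack) != 1:
        raise ValueError("actions did not produce a single complete tree")
    return stack[0]


# Hypothetical oracle actions for "We caught fish".
WORDS = [("PRP", ["We"]), ("VBD", ["caught"]), ("NNS", ["fish"])]
ACTIONS = [("shift",), ("reduce", "NP", 1), ("shift",), ("shift",),
           ("reduce", "NP", 1), ("reduce", "VP", 2), ("reduce", "S", 2)]
tree = replay_shift_reduce(WORDS, ACTIONS)
```

The result corresponds to (S (NP (PRP We)) (VP (VBD caught) (NP (NNS fish)))).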
Demo
Syntax: impossible cases
Most cats and dogs with fleas live in the neighbourhood.
Wanted: a nurse for a baby about twenty years old.
I shot an elephant in my pajamas.
I once saw a deer riding my bicycle.
I’m glad I’m a man, and so is Lola.
Dependency labels (an English, ClearNLP-style label set, as used e.g. by spaCy; not the Universal Dependencies v2 inventory):
• subject: NSUBJ, NSUBJPASS, CSUBJ, CSUBJPASS
• object: DATIVE, DOBJ, AGENT, OPRD
• complement: ACOMP, CCOMP, XCOMP, PCOMP
• auxiliary: AUX, AUXPASS
• clausal modifier: ACL, ADVCL, RELCL
• other modifiers: ADVMOD, NPADVMOD, AMOD, COMPOUND, NEG, NUMMOD, QUANTMOD
• determiner: DET, PREDET
• apposition: APPOS
• coordinating conjunction and conjunct: CC, CONJ
• prepositional modifier and its object: PREP, POBJ
• more: POSS, CASE, DEP, EXPL, INTJ, MARK, PRECONJ, PRT, PUNCT, PARATAXIS
Dependency Relations
Dependency Tree
If you want to receive e-mails about my upcoming shows, then please
give me money so I can buy a computer.
• Graph-Based Parsing
– find the highest-scoring tree in a complete graph over the words
– slower, but tends to do better on long-distance dependencies
– e.g., MSTParser
• Transition-Based Parsing
– apply transition actions one by one
– faster, but tends to do better on short-distance than on long-distance dependencies
– e.g., MaltParser, the Stanford Parser, ZPar
Algorithms
Graph-Based Parsing
• Data
– queue: the words of the sentence
– stack: partially completed trees
• Actions:
– shift: move the next word from the queue onto the stack
– reduce: pop the stack, removing only its top item, as long as that
item has a head
– right-arc: create a dependency arc from the word on top of the stack
(head) to the next token in the queue (dependent), then push that token
onto the stack
– left-arc: create a dependency arc from the next token in the queue
(head) to the word on top of the stack (dependent), then pop the stack
Transition-Based Parsing
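The four actions above match what is usually called the arc-eager transition system. A Python sketch that replays a hand-written action sequence and records each word's head; an artificial ROOT (index 0) sits on the stack at the start, arcs are unlabeled, and in a real parser a trained classifier would choose each action from features of the stack and queue.

```python
def arc_eager(num_words, actions):
    """Replay arc-eager actions over words 1..num_words; return {dependent: head}.

    Index 0 is an artificial ROOT that starts on the stack.
    """
    stack, queue, heads = [0], list(range(1, num_words + 1)), {}
    for action in actions:
        if action == "shift":
            stack.append(queue.pop(0))
        elif action == "reduce":
            if stack[-1] not in heads:
                raise ValueError("reduce: the top of the stack has no head yet")
            stack.pop()
        elif action == "right-arc":
            dependent = queue.pop(0)       # next token becomes a dependent of the stack top
            heads[dependent] = stack[-1]
            stack.append(dependent)        # pushed so it can take its own dependents
        elif action == "left-arc":
            if stack[-1] == 0 or stack[-1] in heads:
                raise ValueError("left-arc: the top of the stack is ROOT or already has a head")
            heads[stack.pop()] = queue[0]  # stack top becomes a dependent of the next token
        else:
            raise ValueError(f"unknown action: {action!r}")
    return heads
```

For "We eat pizza with anchovy" (words 1 to 5), the sequence shift, left-arc, right-arc, right-arc, right-arc, right-arc attaches "with" to "pizza", while inserting a reduce before the fourth right-arc attaches it to "eat" instead; that single choice is the attachment ambiguity from the impossible-cases slides.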
Demo
Features
Implementation
https://github.com/mariana-scorp/one-day-with-cling
Conclusion
Syntax: impossible cases
We eat pizza with anchovy.
Syntax: impossible cases
Насильство твій макіяж не приховає! ("Your makeup will not hide violence!" or "Violence will not hide your makeup!")
6. Let’s build something: error correction
We likes pizza with anchovy.
Children like and cherishes her kindness and cooking skills.
Some is watching the way she knits and loving it.
Colorless green ideas sleeps furiously.
Barry and Mary, whom I met at the New Year 's party, is just
the cutest people.
There is two cats and a dog.
Subject-verb disagreement
Text processing: tokenization, POS tagging, syntactic parsing, etc.
Detection: find a VBZ
Rules: if the verb has nsubj relation and the subject does not
have a conjunct, we should correct it…
Correction: use a dictionary of transformations
Rule-based Toy Solution
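A toy version of the rule-based pipeline in Python. To stay self-contained it skips tokenization, tagging and parsing and takes sentences that are already parsed, as hypothetical (word, POS tag, dependency label, head index) tuples with -1 for the root; the transformation dictionary is a tiny stand-in. The rule used here (a VBZ verb whose nsubj is a plural noun, a non-third-person-singular pronoun, or has a conjunct becomes VBP) is one simple reading of the slide's rule, not the talk's exact logic.

```python
# Hypothetical, tiny VBZ -> VBP transformation dictionary.
VBZ_TO_VBP = {"is": "are", "has": "have", "likes": "like", "sleeps": "sleep", "cherishes": "cherish"}
NON_THIRD_SINGULAR = {"i", "we", "you", "they"}


def correct_agreement(tokens):
    """tokens: (word, tag, dep, head) tuples from some parser; returns the corrected words."""
    words = [word for word, _, _, _ in tokens]
    for i, (word, tag, _, _) in enumerate(tokens):
        replacement = VBZ_TO_VBP.get(word.lower())
        if tag != "VBZ" or replacement is None:
            continue  # detection: only VBZ verbs we know how to transform
        for j, (subj, subj_tag, dep, head) in enumerate(tokens):
            if head != i or dep != "nsubj":
                continue
            has_conjunct = any(h == j and d == "conj" for _, _, d, h in tokens)
            if subj_tag in ("NNS", "NNPS") or subj.lower() in NON_THIRD_SINGULAR or has_conjunct:
                words[i] = replacement.capitalize() if word[0].isupper() else replacement
    return words


# Hand-annotated example parses (assumed, not produced by a real parser here).
WE_LIKES = [("We", "PRP", "nsubj", 1), ("likes", "VBZ", "ROOT", -1), ("pizza", "NN", "dobj", 1),
            ("with", "IN", "prep", 2), ("anchovy", "NN", "pobj", 3), (".", ".", "punct", 1)]
BARRY_AND_MARY = [("Barry", "NNP", "nsubj", 3), ("and", "CC", "cc", 0), ("Mary", "NNP", "conj", 0),
                  ("is", "VBZ", "ROOT", -1), ("cute", "JJ", "acomp", 3), (".", ".", "punct", 3)]
SHE_LIKES = [("She", "PRP", "nsubj", 1), ("likes", "VBZ", "ROOT", -1),
             ("pizza", "NN", "dobj", 1), (".", ".", "punct", 1)]
```

Sentences like "There is two cats and a dog." are out of reach for this rule: the grammatical subject is the expletive "There", and agreement depends on the noun phrase after the verb.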
Text processing: tokenization, POS tagging, syntactic parsing, etc.
Detection: find a VBZ
Classifier + features: POS tag of the subject, does the subject
have a conjunct...
Correction: use a dictionary of transformations
ML-based Toy Solution
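For the ML-based variant, the same kind of parsed input can be turned into feature dictionaries for a classifier that decides whether a VBZ should become VBP. A Python sketch with hypothetical feature names and the same assumed (word, tag, dep, head) format; dictionaries like these could be vectorized with, for example, scikit-learn's DictVectorizer before training any standard classifier.

```python
def agreement_features(tokens, verb_index):
    """Features for deciding whether tokens[verb_index] (a VBZ) should become VBP.

    tokens: hypothetical (word, tag, dep, head) tuples from a parser; head -1 is the root.
    """
    word, tag, _, _ = tokens[verb_index]
    features = {"verb": word.lower(), "verb_tag": tag}
    subjects = [j for j, (_, _, dep, head) in enumerate(tokens)
                if head == verb_index and dep == "nsubj"]
    features["has_subject"] = bool(subjects)
    if subjects:
        s = subjects[0]
        features["subject_word"] = tokens[s][0].lower()
        features["subject_tag"] = tokens[s][1]
        features["subject_has_conjunct"] = any(
            head == s and dep == "conj" for _, _, dep, head in tokens)
    return features


# Hand-annotated example parse (assumed, not produced by a real parser here).
BARRY_AND_MARY = [("Barry", "NNP", "nsubj", 3), ("and", "CC", "cc", 0), ("Mary", "NNP", "conj", 0),
                  ("is", "VBZ", "ROOT", -1), ("cute", "JJ", "acomp", 3), (".", ".", "punct", 3)]
example = agreement_features(BARRY_AND_MARY, 3)
```

Unlike the hand-written rule, the classifier learns from data how much weight each feature deserves, for instance that a conjunct usually signals a plural subject.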
Implementation
github.com/mariana-scorp/one-day-with-cling
Presenter:
Mariana Romanyshyn
mariana.romanyshyn@grammarly.com
With the help of:
Oksana Kunikevych
oksana.kunikevych@grammarly.com
Khrystyna Skopyk
khrystyna.skopyk@grammarly.com
Tetiana Myronivska
tetiana.myronivska@grammarly.com
Tetiana Turchyn
tetiana.turchyn@grammarly.com
Contact us
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Recently uploaded (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

NLP: a peek into a day of a computational linguist

  • 1. NLP: a peek into a day of a computational linguist. Mariana Romanyshyn, Grammarly, Inc.
  • 2. Contents: 1. NLP applications in our world; 2. What computational linguists do; 3. Language levels; 4. A closer look at part-of-speech tagging; 5. A closer look at syntactic parsing; 6. Let’s build something: error correction
  • 4. 1. NLP applications in our world
  • 5. What NLP applications do you know?
  • 6. Types of NLP Applications: Analysis, Transformation, Misc
  • 9. Types of NLP Applications. ANALYSIS: Spam Filtering, Search Engines, Sentiment Analysis, …
  • 12. Sentiment Analysis:
    – It tastes amazing!
    – It tastes horrible!
    – It tastes normal.
    – ABC tastes much better than DEF.
    – It tastes like beer!
    – It tastes interesting!
    – It tastes like my mom said it would!
    – If it was served with milk, it would taste great!
  • 13. Terminal cases: “That young girl is one of the least benightedly unintelligent organic life forms [that] it has been my profound lack of pleasure not to be able to avoid meeting.” (Douglas Adams)
  • 19. Types of NLP Applications. ANALYSIS: Spam Filtering, Search Engines, Sentiment Analysis, Sarcasm Detection, Essay Grading, Good/Evil Characters, …
  • 23. GEC (grammatical error correction) should be smart.
  • 26. Types of NLP Applications. TRANSFORMATION: Machine Translation, Error Correction, Speech to Text / Text to Speech, Question Answering, Text Summarization, …
  • 28. Types of NLP Applications. MISC: News reports generation, Conversational Agents, …
  • 29. Siri: “I remember the first time we loaded these data sources into Siri. I typed ‘start over’ into the system, and Siri came back saying, ‘Looking for businesses named “Over” in Start, Louisiana.’” (Adam Cheyer)
  • 34. Types of NLP Applications. MISC: News & weather reports generation, Conversational Agents, Language learning, Story Cloze Task, …
  • 35. Story Cloze: Tom and Sheryl have been together for two years. One day, they went to a carnival. Tom won Sheryl several stuffed bears. When they reached the Ferris wheel, he got down on one knee. Which ending is more probable?
    – Tom asked Sheryl to marry him.
    – He wiped mud off of his boot.
  • 36. 2. What computational linguists do
  • 41. Language Levels: “Noam-enclature” and structural linguistics. 1) Language has a structure. 2) Language is a system of signs.
  • 42. Units of language levels. Written text: ?
  • 48. Splitting problems. How do we split...
    – text into paragraphs? bullet points, word wrapping
    – paragraphs into sentences? Dr. Jones lectures at U.C.L.A.
    – sentences into words? computer-aided, the d.t.s, San Francisco, 3$B deal
    – words into morphemes? misadventure, mislead, mistake (?)
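The sentence-splitting problem above can be made concrete with a minimal sketch (not taken from the talk): a naive regular-expression splitter breaks after every period, and a hypothetical abbreviation list fixes "Dr." but still cannot tell whether "U.C.L.A." ends the sentence.

```python
import re


def naive_split(paragraph):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


# Hypothetical, deliberately tiny abbreviation list.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "e.g.", "i.e."}


def split_with_abbreviations(paragraph):
    """Same idea, but never end a sentence on a known abbreviation."""
    sentences, current = [], []
    for token in paragraph.split():
        current.append(token)
        if token[-1] in ".!?" and token not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


text = "Dr. Jones lectures at U.C.L.A. He is popular."
print(naive_split(text))               # ['Dr.', 'Jones lectures at U.C.L.A.', 'He is popular.']
print(split_with_abbreviations(text))  # ['Dr. Jones lectures at U.C.L.A.', 'He is popular.']
```

The second splitter still fails on "Dr. Jones lectures at U.C.L.A. every Monday.", splitting it after "U.C.L.A.". This is why real tokenizers learn sentence boundaries from data instead of relying on a fixed list.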
  • 49. Features. Quantitative features:
    – number of sentences, words, words per sentence, etc.
    – size and arrangement of paragraphs
    – word length
    – word position in a sentence
    – number of syllables in a word
    – ratio of vowels to consonants
    – depth of the word in the dependency tree of the sentence
    – number of word senses
    – n-grams
  • 50. Ngrams: sequences of elements and their frequencies.
    – unigrams, bigrams, 3-grams, 4-grams, … n-grams
    – at different language levels:
      token n-grams: ("handsome", "man"): 160,000; ("pretty", "man"): 5,000
      character n-grams: "st": 14,000; "ct": 4,000; "str": 1,500; "ctr": 50; "stra": 400; "ctra": 0
    – adding grammar
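Counting token and character n-grams takes only a few lines. The sketch below uses a made-up two-sentence corpus; the counts on the slide come from a much larger corpus.

```python
from collections import Counter


def token_ngrams(tokens, n):
    """All contiguous n-token windows, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(word, n):
    """All contiguous n-character substrings of a word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


# Toy corpus, made up for illustration only.
corpus = ["a handsome man met a pretty woman", "a handsome man smiled"]

bigram_counts = Counter(bg for sent in corpus for bg in token_ngrams(sent.split(), 2))
char_counts = Counter(g for sent in corpus for w in sent.split() for g in char_ngrams(w, 2))

print(bigram_counts[("handsome", "man")])  # 2
print(bigram_counts[("pretty", "man")])    # 0 (Counter returns 0 for unseen keys)
```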
  • 51. Features. Grammatical features:
    – POS tag
    – morphemes: affixes, roots, endings
    – constituency spans
    – dependency relations
    – coreference
    – grammatical characteristics of various parts of speech: countability of nouns, tense of verbs, degree of comparison of adjectives, pronoun type, connector type
  • 52. Features.
    – Spelling features: capitalized word? hyphenated word? compound word?
    – Lexical-semantic features: WordNet, VerbNet, dictionaries and thesauri, word embeddings, modality of verbs
  • 53. 4. A closer look at part-of-speech tagging
  • 54. POS: recap. Goal: categorize words by their functions. English:
    – notional: noun, verb, adjective, adverb, pronoun (?), numeral (?)
    – functional: determiner, preposition, conjunction, particle, interjection
  • 55. POS: practice. “Wow, two hungry cats chased down the mouse to the corner and quickly ate it!”
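  • 56. POS: more practice.
    – All you need is love . Love is all at the way you love me all the time . And never mind that noise you heard . fire and of things that will bite , yeah
    – було так давно , коли в руках тримаю цей ("it was so long ago , when in my hands I hold this")
    – Просто налийте трохи коли на пошкоджену ділянку . ("Just pour a little cola on the damaged area .")
    – ударом . Я хочу мати всьо , і всьо на ("with a blow . I want to have everything , and everything on")
    – а на полі спозаранку мати жито жала , та ("and in the field at dawn mother was reaping rye , and")
    (коли can mean "when" or "cola"; мати can mean "to have" or "mother".)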
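  • 57. POS: impossible cases.
    – Time flies like an arrow.
    – I saw her duck with a telescope.
    – She is calculating.
    – We watched an Indian dance.
    – They can fish.
    – More lies ahead...
    – Це мало мало значення. ("It had little significance": мало is both "had" and "little")
    – Коло друзів та незнайомців. ("A circle of friends and strangers" or "Near friends and strangers")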
  • 58. POS: impossible cases.
    – Time flies[Verb/Noun] like[Preposition/Verb] an arrow.
    – I saw her duck[Verb/Noun] with a telescope.
    – She is calculating[Verb/Adjective].
    – We watched an Indian[Adjective/Noun] dance.
    – They can[Modal Verb/Verb] fish[Verb/Noun].
    – More lies[Verb/Noun] ahead...
    – Це мало[Verb/Adverb] мало[Verb/Adverb] значення.
    – Коло[Noun/Preposition] друзів та незнайомців.
  • 60. POS: disputable cases. What POS should “gotta” be?
    – I gotta tell you something.
    – I’ve gotta fix that thingy for her, Jack.
    – So, she gotta this gorgeous dress.
    – So, she gotta gun.
  • 61. POS: disputable cases. What POS should “gotta” be?
    – I gotta[modal verb] tell you something.
    – I’ve gotta[verb, 3rd form] fix that thingy for her, Jack.
    – So, she gotta[verb, 2nd form] this gorgeous dress.
    – So, she gotta[verb, 2nd form] gun.
  • 62. If you don’t know, how would the machine know?
  • 63. So, what do we do?
  • 64. POS: tagsets. Penn Treebank tagset:
    – noun: NN, NNS, NNP, NNPS
    – verb: VB, VBP, VBZ, VBG, VBD, VBN, MD
    – adjective: JJ, JJR, JJS
    – adverb: RB, RBR, RBS
    – preposition and subordinating conjunction: IN
    – pronoun: PRP, PRP$
    – determiner: DT
    – numeral: CD
    – particle: RP, TO
    – interjection: UH
    – coordinating conjunction: CC
    – wh-words: WDT, WP, WP$, WRB
    – more: PDT, POS, SYM, FW, EX, LS, $, |,|, |.|, |:|, |''|, |``|, -RRB-, -LRB-
  • 65. POS: corpora. A sample of a tagged corpus in word_TAG format: Very_RB peculiar_JJ retribution_NN indeed_RB seems_VBZ to_TO overtake_VB such_JJ jokers_NNS ._. Have_VBP you_PRP ever_RB heard_VBN of_IN Thuggee_NNP ?_. Sort_NN of_IN remorseless_JJ ,_, is_VBZ n't_RB it_PRP ?_. In_IN short_JJ ,_, and_CC to_TO borrow_VB an_DT arboreal_JJ phrase_NN ,_, slash_VB timber_NN ._. As_IN you_PRP can_MD count_VB on_IN me_PRP to_TO do_VB the_DT same_JJ ._. Compassionately_RB yours_PRP ,_, S.J._NNP Perelman_NNP We_PRP caught_VBD the_DT early_JJ train_NN to_IN New_NNP York_NNP ._. Petite_JJ ,_, lovely_JJ Yvette_NNP Chadroe_NNP plays_VBZ the_DT nymphomaniac_NN engagingly_RB ._. He_PRP looked_VBD so_RB comfortable_JJ being_VBG straight_JJ ._. They_PRP wanted_VBD to_TO touch_VB the_DT mystery_NN ._. ...
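Reading a corpus in this word_TAG format is the first step toward any of the taggers below. Here is a minimal sketch. It splits on the last underscore, which is an assumption that holds for the sample above: tags such as ".", "," and "PRP$" contain no underscore.

```python
from collections import Counter


def read_tagged(line):
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs.

    rpartition splits on the LAST underscore, so '._.' becomes ('.', '.').
    """
    pairs = []
    for token in line.split():
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs


line = "He_PRP looked_VBD so_RB comfortable_JJ being_VBG straight_JJ ._."
pairs = read_tagged(line)
tag_counts = Counter(tag for _, tag in pairs)

print(pairs[:2])         # [('He', 'PRP'), ('looked', 'VBD')]
print(tag_counts["JJ"])  # 2
```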
  • 66. POS: Classification. Use a classifier to tag each word independently.
    – Features: left/right context (words, POS tags, words + POS tags); probability of word + POS tag; additional: possible tags for the word, morphological characteristics (tense, plurality, degree of comparison), the word’s spelling (suffixes, capitalization, hyphenation)
    – Input: Chewie[NNP] ,[,] we[PRP] 're[VBP] home[NN/RB] - ? .[.]
    – Output: RB
  • 67. POS: Sequence Labelling. Map the sentence to the most probable POS tag sequence.
    – Features: left/right context (words, POS tags, words + POS tags); probability of word + POS tag; additional: possible tags for the word, morphological characteristics (tense, plurality, degree of comparison), the word’s spelling (suffixes, capitalization, hyphenation)
    – Input: Chewie , we 're home .
    – Output: NNP , PRP VBP RB .
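The classification setup on slide 66 can be sketched as a feature function that turns one word and its context into a dictionary. A classifier would consume that dictionary, for example through scikit-learn's `DictVectorizer` followed by `LogisticRegression`. The feature names and the tag dictionary below are hypothetical. Only some of the slide's features are shown; the "probability of word + POS tag" and morphological features are left out.

```python
def word_features(words, tags, i, tag_dict):
    """Features for classifying words[i]; tags of the other words are given."""
    w = words[i]
    prev_w = words[i - 1] if i > 0 else "<S>"
    next_w = words[i + 1] if i + 1 < len(words) else "</S>"
    prev_t = tags[i - 1] if i > 0 else "<S>"
    next_t = tags[i + 1] if i + 1 < len(tags) else "</S>"
    return {
        # left/right context: words, POS tags, words + POS tags
        "prev_word": prev_w.lower(),
        "next_word": next_w.lower(),
        "prev_tag": prev_t,
        "next_tag": next_t,
        "prev_word+tag": f"{prev_w.lower()}/{prev_t}",
        # possible tags for the word, looked up in a dictionary
        "possible_tags": "/".join(sorted(tag_dict.get(w.lower(), ()))),
        # the word's spelling
        "suffix3": w[-3:].lower(),
        "capitalized": w[:1].isupper(),
        "hyphenated": "-" in w,
    }


words = ["Chewie", ",", "we", "'re", "home", "."]
tags = ["NNP", ",", "PRP", "VBP", None, "."]  # None marks the word to classify
tag_dict = {"home": {"NN", "RB"}}              # hypothetical tag dictionary
print(word_features(words, tags, 4, tag_dict))
```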
  • 68. Hidden Markov Models. Notation:
    – V: vocabulary
    – T: POS tags
    – x: sentence (observations)
    – y: tag sequence (states)
    – S: all sentence/tag-sequence pairs $\langle x_1 \dots x_n,\ y_1 \dots y_n \rangle$, where $n > 0$, $x_i \in V$, $y_i \in T$
  • 69. Hidden Markov Models. S contains all sentence/tag-sequence pairs, e.g. for one sentence:
    – x: Chewie , we 're home .
    – y: NNP , PRP VBP RB . | NN , PRP VBP RB . | NNP , PRP VBP NN . | NN , PRP VBP NN . | …
    – Aim: for the given sentence $x_1 \dots x_n$, find the tag sequence $y_1 \dots y_n$ that maximizes $p(x_1 \dots x_n, y_1 \dots y_n)$.
  • 70. HMM: assumptions.
    – Markov assumption: “The future is independent of the past given the present.” In a trigram HMM, each state (tag) depends only on the previous two states in the sequence.
    – Independence assumption: the observation $x_i$ (the word) depends only on its state $y_i$ (its tag), independent of the previous observations and states.
  • 72. Trigram HMM: parameters.
    – q(s | u, v): the probability of tag s after the tags (u, v), where s, u, v ∈ T
    – e(x | s): the probability of observation x paired with state s, where x ∈ V, s ∈ T
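For reference, the two parameter types combine into the usual trigram HMM joint probability. The sequence is padded with two start tags and one end tag, and both parameters are estimated as count ratios. This matches the c(…)/c(…) factors in the worked example of this section; the notation $c(s \to x)$ stands for the slides' c(s->x).

```latex
p(x_1 \dots x_n,\ y_1 \dots y_{n+1})
  = \prod_{i=1}^{n+1} q(y_i \mid y_{i-2}, y_{i-1}) \prod_{i=1}^{n} e(x_i \mid y_i),
\qquad y_{-1} = y_0 = \texttt{<S>},\quad y_{n+1} = \texttt{</S>}

q(s \mid u, v) = \frac{c(u, v, s)}{c(u, v)},
\qquad
e(x \mid s) = \frac{c(s \to x)}{c(s)}
```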
  • 74. For example:
    – x: Chewie , we 're home .
    – y: NNP , PRP VBP RB .
    How do we get p(x, y)?
  • 75. For example, with x: Chewie , we 're home . and y: NNP , PRP VBP RB .
    p(x, y) = c(NNP,|,|,PRP)/c(NNP,|,|) * c(|,|,PRP,VBP)/c(|,|,PRP) * c(PRP,VBP,RB)/c(PRP,VBP) * c(VBP,RB,|.|)/c(VBP,RB)
            * c(NNP->Chewie)/c(NNP) * c(|,|->,)/c(|,|) * c(PRP->we)/c(PRP) * c(VBP->'re)/c(VBP) * c(RB->home)/c(RB) * c(|.|->.)/c(|.|)
    (The first line holds the tag transition factors q; the second holds the word emission factors e.)
  • 76. One thing missing:
    – x: Chewie , we 're home .
    – y: <S> <S> NNP , PRP VBP RB . </S>
    p(x, y) = c(<S>,<S>,NNP)/c(<S>,<S>) * c(<S>,NNP,|,|)/c(<S>,NNP) * c(NNP,|,|,PRP)/c(NNP,|,|) * c(|,|,PRP,VBP)/c(|,|,PRP) * c(PRP,VBP,RB)/c(PRP,VBP) * c(VBP,RB,|.|)/c(VBP,RB) * c(RB,|.|,</S>)/c(RB,|.|)
            * c(NNP->Chewie)/c(NNP) * c(|,|->,)/c(|,|) * c(PRP->we)/c(PRP) * c(VBP->'re)/c(VBP) * c(RB->home)/c(RB) * c(|.|->.)/c(|.|)
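The padded formula above can be checked by computing it directly from counts. The sketch below gathers the counts from a three-sentence toy corpus, which is made up for illustration. It then multiplies the same q and e ratios, returning 0 when a count in a denominator is missing.

```python
from collections import Counter


def train_counts(tagged_sentences):
    """Count tag trigrams, their (u, v) contexts, tags and tag->word emissions."""
    trigram, context, tag_count, emission = Counter(), Counter(), Counter(), Counter()
    for sentence in tagged_sentences:
        tags = ["<S>", "<S>"] + [tag for _, tag in sentence] + ["</S>"]
        for i in range(2, len(tags)):
            trigram[tuple(tags[i - 2:i + 1])] += 1
            context[tuple(tags[i - 2:i])] += 1
        for word, tag in sentence:
            emission[(tag, word)] += 1
            tag_count[tag] += 1
    return trigram, context, tag_count, emission


def joint_probability(sentence, counts):
    """p(x, y) as a product of count-ratio q(s | u, v) and e(x | s) factors."""
    trigram, context, tag_count, emission = counts
    tags = ["<S>", "<S>"] + [tag for _, tag in sentence] + ["</S>"]
    p = 1.0
    for i in range(2, len(tags)):
        u, v, s = tags[i - 2:i + 1]
        if context[(u, v)] == 0:
            return 0.0                                 # unseen context: treat as zero
        p *= trigram[(u, v, s)] / context[(u, v)]      # q(s | u, v)
    for word, tag in sentence:
        if tag_count[tag] == 0:
            return 0.0                                 # unseen tag: treat as zero
        p *= emission[(tag, word)] / tag_count[tag]    # e(x | s)
    return p


# Toy training corpus (made up for illustration).
corpus = [
    [("Chewie", "NNP"), (",", ","), ("we", "PRP"), ("'re", "VBP"), ("home", "RB"), (".", ".")],
    [("we", "PRP"), ("'re", "VBP"), ("home", "RB"), (".", ".")],
    [("we", "PRP"), ("love", "VBP"), ("home", "NN"), (".", ".")],
]
counts = train_counts(corpus)
print(joint_probability(corpus[0], counts))  # 4/27, about 0.148
```

On this corpus, tagging "home" as RB scores 4/27, twice the 2/27 of the NN alternative. A word never seen in training, such as a new name, gets probability 0. That is problem 2 on slide 80.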
  • 77. HMM: problem 1. Enumerating all possible tag sequences is not feasible: there are |T|^n of them. E.g., 44 tags and a 6-token sentence give 44^6 = 7,256,313,856 tag sequences. Ideas:
    – use dynamic programming (the Viterbi algorithm)
    – limit the number of candidates with a dictionary
  • 78. HMM: the Viterbi algorithm. Idea: remember decisions on the way, which costs n·|T|^3 for a trigram HMM.
    [Figure: the tag lattice for x = “Chewie , we 're home .”, padded with <S> <S> and </S>, listing many candidate tags (NN, NNP, NNS, NNPS, JJ, RB, VBP, PRP, …) at each position.]
  • 79. HMM: with dictionary. Idea: use a dictionary of possible tags for each word, which reduces the cost to n·8^3. (The worst case is still n·|T|^3.)
    [Figure: the same lattice keeping only the dictionary tags for each word (NNP, NN, PRP, VBP, VB, RB, …).]
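Both ideas, dynamic programming and dictionary pruning, fit in one short function. The following is a sketch of Viterbi decoding for a trigram HMM, not the talk's implementation. The probability tables `Q` and `E`, the 0.001 floor for unseen trigrams, and `TAG_DICT` are hypothetical numbers chosen for the demo. Each position keeps the best score for every pair of last two tags, so the loops cost n·|T|^3 without a dictionary.

```python
def viterbi(words, tags, q, e, tag_dict=None):
    """Most probable tag sequence for `words` under a trigram HMM.

    q(s, u, v) -> probability of tag s after tags (u, v)
    e(x, s)    -> probability of word x given tag s
    tag_dict   -> optional {word: set of tags}; limits the candidates per word
    """
    if not words:
        return [], 0.0
    n = len(words)

    def candidates(k):
        if k < 0:
            return ["<S>"]                      # padding before the sentence
        if tag_dict is not None and words[k] in tag_dict:
            return sorted(tag_dict[words[k]])   # dictionary pruning
        return tags                             # no pruning: every tag

    # pi[(k, u, v)]: best score of a tag prefix ending with u at k-1 and v at k
    pi = {(-1, "<S>", "<S>"): 1.0}
    back = {}
    for k in range(n):
        for u in candidates(k - 1):
            for v in candidates(k):
                best_w, best = None, -1.0
                for w in candidates(k - 2):
                    score = pi[(k - 1, w, u)] * q(v, w, u) * e(words[k], v)
                    if score > best:
                        best_w, best = w, score
                pi[(k, u, v)] = best
                back[(k, u, v)] = best_w

    # Close the sentence with </S> and pick the best final pair of tags.
    best_pair, best = None, -1.0
    for u in candidates(n - 2):
        for v in candidates(n - 1):
            score = pi[(n - 1, u, v)] * q("</S>", u, v)
            if score > best:
                best_pair, best = (u, v), score

    # Follow the back-pointers from the end of the sentence to the start.
    result = list(best_pair) if n > 1 else [best_pair[1]]
    for k in range(n - 1, 1, -1):
        result.insert(0, back[(k, result[0], result[1])])
    return result, best


TAGS = ["NN", "NNP", "PRP", "VBP", "RB", "."]
Q = {  # hypothetical q(s | u, v), keyed (u, v, s)
    ("<S>", "<S>", "PRP"): 0.5, ("<S>", "PRP", "VBP"): 0.6,
    ("PRP", "VBP", "RB"): 0.3, ("PRP", "VBP", "NN"): 0.1,
    ("VBP", "RB", "."): 0.5, ("VBP", "NN", "."): 0.5,
    ("RB", ".", "</S>"): 1.0, ("NN", ".", "</S>"): 1.0,
}
E = {  # hypothetical e(x | s), keyed (s, x)
    ("PRP", "we"): 1.0, ("VBP", "'re"): 1.0,
    ("NN", "home"): 0.6, ("RB", "home"): 0.4, (".", "."): 1.0,
}
TAG_DICT = {"we": {"PRP"}, "'re": {"VBP"}, "home": {"NN", "RB"}, ".": {"."}}


def q(s, u, v):
    return Q.get((u, v, s), 0.001)  # small floor for unseen trigrams


def e(x, s):
    return E.get((s, x), 0.0)


words = ["we", "'re", "home", "."]
print(viterbi(words, TAGS, q, e)[0])            # ['PRP', 'VBP', 'RB', '.']
print(viterbi(words, TAGS, q, e, TAG_DICT)[0])  # same answer, fewer candidates
```

The emission table alone prefers NN for "home" (0.6 against 0.4). The trigram context after "we 're" still makes RB win, with a score of 0.018 against 0.009.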
  • 80. HMM: problem 2. Zero probabilities can occur because of out-of-vocabulary (OOV) or rare words. Idea: use smoothing!
    – add-1 (Laplace): pretend you saw every n-gram (or word/tag pair) one more time than you did. (P.S. It’s usually a horrible choice, but we’ll use it today. Don’t tell anyone.)
    – Good-Turing: reallocate the probability mass of n-grams seen r+1 times to n-grams seen r times, so unseen n-grams get the mass of those seen once
    – Kneser-Ney: discount the bigram counts and interpolate with a unigram “continuation” probability (based on how many different words a word follows), which matters most when the bigram count is near 0
    – ...
  • 82. Conclusion: “Data is ten times more powerful than algorithms.” (Peter Norvig, The Unreasonable Effectiveness of Data, http://youtu.be/yvDCzhbjYWs)
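Add-1 smoothing of the emission table, the variant slide 80 says we will use today, looks like this in a minimal sketch. Mapping unknown words to a reserved "<UNK>" vocabulary entry is an assumption added here. The slide does not say how OOV words are handled.

```python
from collections import Counter


def add_one_emission(emission_counts, tag_counts, vocabulary):
    """Return e(x | s) smoothed with add-1: (c(s -> x) + 1) / (c(s) + |V|)."""
    size = len(vocabulary)

    def e(word, tag):
        if word not in vocabulary:
            word = "<UNK>"  # hypothetical convention for out-of-vocabulary words
        return (emission_counts[(tag, word)] + 1) / (tag_counts[tag] + size)

    return e


emission_counts = Counter({("RB", "home"): 2, ("NN", "home"): 1})
tag_counts = Counter({"RB": 2, "NN": 1})
vocabulary = {"home", "Chewie", "<UNK>"}

e = add_one_emission(emission_counts, tag_counts, vocabulary)
print(e("home", "RB"))  # (2 + 1) / (2 + 3) = 0.6
print(e("Han", "RB"))   # an unseen word now gets 1 / 5 instead of 0
```

For each tag, the smoothed probabilities over the vocabulary still sum to 1. Every word gains one pseudo-count, and the denominator grows by |V|.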
  • 83. 5. A closer look at syntactic parsing
  • 84. Syntax: recap. Goal: categorize sentence parts by their functions and define dependencies.
    – Sentence: main clause, subordinate clause
    – Clause: subject, predicate, direct/indirect/prepositional object, modifier, complement
  • 85. Syntax: practice. Sentence: “If you want to receive e-mails about my upcoming shows, then please give me money so I can buy a computer.”
  • 86. Syntax: practice. Sentence: “If you want to receive e-mails about my upcoming shows, then please give me money so I can buy a computer.” Clauses:
    – [[you] want [to receive [e-mails about my upcoming shows]]]
    – [please give [me] [money]]
    – [[I] can buy [a computer]]
  • 87. Syntax: the subject. Identify the subject:
    – The walrus and the carpenter were walking close at hand.
    – The greatest trick the devil ever pulled was convincing the world he didn't exist.
    – What we've got here is a failure to communicate.
    – Actually being funny is mostly telling the truth about things.
    – To be idle is a short road to death, and to be diligent is a way of life.
    – Sitting in a tree at the bottom of the garden was a huge black bird with long blue tail feathers.
  • 90. Syntax: the infinitives. Identify the role of the infinitive:
    – The two politicians failed [to communicate].
    – What we've got here is a failure [to communicate].
    – [To be idle] is a short road to death, and [to be diligent] is a way of life.
    – [To become extroverted], you need to go out and socialize.
    – You have [to be able [to actually quote the line]] for it [to be a memorable quote].
  • 91. How do we formalize the syntactic structure?
  • 93. Syntactic Trees (or Parse Trees). Types:
    – constituency tree: every token is a part of some phrase constituent (parent node); includes terminal and non-terminal nodes; shows relations among the constituents
    – dependency tree: there is exactly one node for every token; includes only terminal nodes; shows relations among words
  • 94. Constituency Tree for: “If you want to receive e-mails about my upcoming shows, then please give me money so I can buy a computer.”
  • 96. Constituency Treebank (sample, one tree per line):
    (TOP (S (NP (ADJP (RB Very) (JJ peculiar)) (NN retribution)) (ADVP (RB indeed)) (VP (VBZ seems) (S (VP (TO to) (VP (VB overtake) (NP (JJ such) (NNS jokers)))))) (. .)))
    (TOP (SQ (VBP Have) (NP (PRP you)) (ADVP (RB ever)) (VP (VBN heard) (PP (IN of) (NP (NNP Thuggee)))) (. ?)))
    (TOP (UCP (ADJP (ADVP (NN Sort) (IN of)) (JJ remorseless)) (, ,) (SQ (VBZ is) (RB n't) (NP (PRP it))) (. ?)))
    (TOP (SBAR (IN As) (S (NP (PRP you)) (VP (MD can) (VP (VB count) (PP (IN on) (NP (PRP me))) (S (VP (TO to) (VP (VB do) (NP (DT the) (JJ same)))))))) (. .)))
    (TOP (FRAG (ADJP (RB Compassionately) (PRP yours)) (, ,) (NP (NNP S.J.) (NNP Perelman))))
    (TOP (S (NP (PRP We)) (VP (VBD caught) (NP (NP (DT the) (JJ early) (NN train)) (PP (IN to) (NP (NNP New) (NNP York))))) (. .)))
    (TOP (S (NP (JJ Petite) (, ,) (JJ lovely) (NNP Yvette) (NNP Chadroe)) (VP (VBZ plays) (NP (DT the) (NN nymphomaniac)) (ADVP (RB engagingly))) (. .)))
    ...
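To use a bracketed treebank like this one, you first read each tree into a data structure. From there you can extract the POS-tagged words and the grammar rules (productions), which are among the parsing features listed on slide 98. The sketch below is a minimal reader written with the standard library only. It assumes well-formed Penn-style brackets with no empty root label. NLTK's `Tree.fromstring` offers the same reading step as a library call.

```python
import re


def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples; leaves are str."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label, pos = tokens[pos + 1], pos + 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def is_preterminal(tree):
    label, children = tree
    return len(children) == 1 and isinstance(children[0], str)


def tagged_words(tree):
    """(word, POS tag) pairs read off the preterminal nodes."""
    if is_preterminal(tree):
        return [(tree[1][0], tree[0])]
    return [pair for child in tree[1] for pair in tagged_words(child)]


def productions(tree):
    """Grammar rules such as ('S', ('NP', 'VP', '.')), in pre-order."""
    if is_preterminal(tree):
        return []
    label, children = tree
    rule = (label, tuple(child[0] for child in children))
    return [rule] + [r for child in children for r in productions(child)]


TREE = ("(TOP (S (NP (PRP We)) (VP (VBD caught) (NP (NP (DT the) (JJ early) (NN train)) "
        "(PP (IN to) (NP (NNP New) (NNP York))))) (. .)))")
t = parse_tree(TREE)
print(tagged_words(t)[:3])  # [('We', 'PRP'), ('caught', 'VBD'), ('the', 'DT')]
print(productions(t)[:2])   # [('TOP', ('S',)), ('S', ('NP', 'VP', '.'))]
```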
  • 97. Constituency Labels. Penn Treebank tagset:
    – top level: TOP
    – sentence: S, SBAR, SQ, SBARQ, SINV
    – fragment: FRAG
    – noun phrase: NP
    – verb phrase: VP
    – prepositional phrase: PP
    – adjectival phrase: ADJP
    – adverbial phrase: ADVP
    – compound conjunction: CONJP
    – wh-phrases: WHNP, WHPP, WHADJP, WHADVP
    – more: LST, PRT, INTJ, NAC, PRN, QP, RRC, UCP, X
  • 98. Constituency Parsing.
    – Algorithms: top-down, chart, bottom-up
    – Features include: grammar (a.k.a. transitions), spans of nodes, labels, left, right, and two-sided context, split point, etc.
    – Weights are trained on the treebank.
  • 99. Shift-reduce constituency parsing.
    – Data: the queue holds the words of the sentence; the stack holds partially completed trees.
    – Actions: shift moves the next word from the queue onto the stack; reduce combines the top n constituents on the stack under a new label.
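Here is the shift-reduce machinery from slide 99 as a minimal sketch. It replays a hand-written action sequence. In a real parser, a classifier trained on the treebank would pick each action from features like those on slide 98. The ("reduce", label, n) action format is an illustrative choice, and POS preterminals are left out to keep the trees short.

```python
def shift_reduce(words, actions):
    """Replay shift/reduce actions and return the final stack.

    actions: "shift", or ("reduce", label, n) to put `label` on top of the top n items.
    """
    queue, stack = list(words), []
    for action in actions:
        if action == "shift":
            if not queue:
                raise ValueError("shift: the queue is empty")
            stack.append(queue.pop(0))
        else:
            _, label, n = action
            if not 1 <= n <= len(stack):
                raise ValueError(f"reduce: cannot combine {n} items")
            children = stack[-n:]
            del stack[-n:]
            stack.append((label, children))
    return stack


actions = ["shift", ("reduce", "NP", 1), "shift", "shift", "shift",
           ("reduce", "NP", 2), ("reduce", "VP", 2), ("reduce", "S", 2)]
print(shift_reduce(["We", "caught", "the", "train"], actions))
# [('S', [('NP', ['We']), ('VP', ['caught', ('NP', ['the', 'train'])])])]
```

Parsing succeeds when the queue is empty and a single tree is left on the stack.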
  • 101. Syntax: impossible cases. Most cats and dogs with fleas live in the neighbourhood.
  • 103. Syntax: impossible cases. Wanted: a nurse for a baby about twenty years old.
  • 105. Syntax: impossible cases. I shot an elephant in my pajamas.
  • 107. Syntax: impossible cases. I once saw a deer riding my bicycle.
  • 109. Syntax: impossible cases. I’m glad I’m a man, and so is Lola.
  • 112. Dependency Relations (this label set follows the ClearNLP/Stanford-style English scheme used, e.g., by spaCy, rather than Universal Dependencies proper):
    – subject: NSUBJ, NSUBJPASS, CSUBJ, CSUBJPASS
    – object: DATIVE, DOBJ, AGENT, OPRD
    – complement: ACOMP, CCOMP, XCOMP, PCOMP
    – auxiliary: AUX, AUXPASS
    – clausal modifier: ACL, ADVCL, RELCL
    – other modifiers: ADVMOD, NPADVMOD, AMOD, COMPOUND, NEG, NUMMOD, QUANTMOD
    – determiner: DET, PREDET
    – apposition: APPOS
    – coordinating conjunction and conjunct: CC, CONJ
    – prepositional modifier and its object: PREP, POBJ
    – more: POSS, CASE, DEP, EXPL, INTJ, MARK, PRECONJ, PRT, PUNCT, PARATAXIS
  • 113. Dependency Tree for: “If you want to receive e-mails about my upcoming shows, then please give me money so I can buy a computer.”
  • 115. Algorithms • Graph-Based Parsing – score every possible head-dependent arc and find the highest-scoring tree in the complete graph – slower, but performs better on long-distance dependencies – e.g., MSTParser • Transition-Based Parsing – apply transition actions one by one – faster, and performs better on short-distance dependencies – e.g., MaltParser, the Stanford Parser, ZPar
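The graph-based idea (score every candidate arc, then pick the tree with the highest total score) can be sketched by brute force for very short sentences. This is only an illustration. It enumerates every head assignment, which is exponential in sentence length, whereas MSTParser finds the maximum spanning tree efficiently (for non-projective trees, with the Chu-Liu/Edmonds algorithm). The arc scores below are made-up numbers standing in for a trained scoring model, and the function names `is_tree` and `best_tree` are hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple


def is_tree(heads: Tuple[int, ...]) -> bool:
    """heads[d - 1] is the head of word d (words are 1..n, 0 is ROOT).

    A valid tree has exactly one word attached to ROOT, and following
    head links from any word must reach ROOT without running into a cycle.
    """
    if sum(1 for h in heads if h == 0) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False      # cycle
            seen.add(node)
            node = heads[node - 1]
    return True


def best_tree(scores: List[List[float]]) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Exhaustively find the highest-scoring dependency tree.

    scores[h][d] is the score of an arc from head h to dependent d
    (index 0 is ROOT; column 0 is unused because ROOT has no head).
    """
    n = len(scores) - 1
    # every word may take any other node (including ROOT) as its head
    choices = [[h for h in range(n + 1) if h != d] for d in range(1, n + 1)]
    best_heads, best_total = None, float("-inf")
    for heads in product(*choices):
        if not is_tree(heads):
            continue
        total = sum(scores[h][d] for d, h in enumerate(heads, start=1))
        if total > best_total:
            best_heads, best_total = heads, total
    return best_heads, best_total


if __name__ == "__main__":
    # words: 1 = "We", 2 = "eat", 3 = "pizza"; toy scores, 1.0 for every arc by default
    scores = [[1.0] * 4 for _ in range(4)]
    scores[0][2] = 10.0   # ROOT -> eat
    scores[2][1] = 8.0    # eat -> We
    scores[2][3] = 8.0    # eat -> pizza
    print(best_tree(scores))   # ((2, 0, 2), 26.0)
```

Because the search sees the whole graph at once, a long arc is as easy to choose as a short one. The price is that the search, here brute force, is the expensive part.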
  • 117. Transition-Based Parsing (the arc-eager system) • Data – queue: the words of the sentence – stack: partially completed trees • Actions: – shift: move the next word from the queue onto the stack – reduce: pop the stack, removing only its top item, as long as that item already has a head – right-arc: create a dependency arc from the word on top of the stack (head) to the next token in the queue (dependent), then push that token onto the stack – left-arc: create a dependency arc from the next token in the queue (head) to the word on top of the stack (dependent), then pop the stack; allowed only if that word has no head yet
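A transition-based parser learns to predict these actions from training data produced by an oracle that replays a gold tree. Below is a minimal sketch of such a static oracle in Python. It assumes a projective gold tree given as a `dependent -> head` map, with an artificial ROOT (index 0) that starts on the stack. The decision order and the name `arc_eager_oracle` are illustrative assumptions, not MaltParser's code.

```python
from typing import Dict, List, Tuple


def arc_eager_oracle(gold_heads: Dict[int, int], n: int) -> Tuple[List[str], Dict[int, int]]:
    """Replay a projective gold tree with arc-eager transitions.

    gold_heads maps each word 1..n to its head (0 = ROOT).
    Returns the transition sequence and the arcs it builds (dependent -> head).
    """
    stack: List[int] = [0]                    # ROOT at the bottom
    queue: List[int] = list(range(1, n + 1))  # words not yet processed
    heads: Dict[int, int] = {}
    transitions: List[str] = []
    while queue:
        s, b = stack[-1], queue[0]
        if s != 0 and gold_heads[s] == b:
            # LEFT-ARC: b is the head of s; s is complete, so pop it
            if s in heads:
                raise ValueError("LEFT-ARC needs a stack top without a head")
            heads[s] = b
            stack.pop()
            transitions.append("LEFT-ARC")
        elif gold_heads[b] == s:
            # RIGHT-ARC: s is the head of b; push b so it can collect its own dependents
            heads[b] = s
            stack.append(queue.pop(0))
            transitions.append("RIGHT-ARC")
        elif any(gold_heads.get(k) == b or gold_heads[b] == k for k in stack[:-1]):
            # REDUCE: b still needs an arc to a word below s, so s must be popped first
            if s not in heads:
                raise ValueError("REDUCE needs a stack top that already has a head")
            stack.pop()
            transitions.append("REDUCE")
        else:
            # SHIFT: nothing to attach yet; move b onto the stack
            stack.append(queue.pop(0))
            transitions.append("SHIFT")
    return transitions, heads


if __name__ == "__main__":
    # "We eat pizza with anchovy": 1=We 2=eat 3=pizza 4=with 5=anchovy
    pizza_with = {1: 2, 2: 0, 3: 2, 4: 3, 5: 4}   # "with anchovy" modifies "pizza"
    eat_with = {1: 2, 2: 0, 3: 2, 4: 2, 5: 4}     # "with anchovy" modifies "eat"
    print(arc_eager_oracle(pizza_with, 5)[0])
    # ['SHIFT', 'LEFT-ARC', 'RIGHT-ARC', 'RIGHT-ARC', 'RIGHT-ARC', 'RIGHT-ARC']
    print(arc_eager_oracle(eat_with, 5)[0])
    # ['SHIFT', 'LEFT-ARC', 'RIGHT-ARC', 'RIGHT-ARC', 'REDUCE', 'RIGHT-ARC', 'RIGHT-ARC']
```

The two attachments of "with anchovy" need different transition sequences. Only the second uses REDUCE, which pops "pizza" so that "with" can attach to "eat". A parser that greedily picks one action at a time has to make that choice locally.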
  • 122–123. Syntax: impossible cases We eat pizza with anchovy.
  • 124–125. Syntax: impossible cases Насильство твій макіяж не приховає! (Ukrainian: "Your makeup will not hide violence!"; the free word order equally allows the reading "Violence will not hide your makeup!")
  • 126. 6. Let’s build something: error correction
  • 127. Subject-verb disagreement We likes pizza with anchovy. Children like and cherishes her kindness and cooking skills. Some is watching the way she knits and loving it. Colorless green ideas sleeps furiously. Barry and Mary, whom I met at the New Year's party, is just the cutest people. There is two cats and a dog.
  • 128. Rule-based Toy Solution Text processing: tokenization, POS tagging, syntactic parsing, etc. Detection: find a VBZ (third-person singular present verb). Rules: if the verb has an nsubj relation and the subject does not have a conjunct, we should correct it… Correction: use a dictionary of transformations.
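A toy version of this pipeline might look as follows in Python. To stay self-contained, it takes an already tagged and parsed sentence, written by hand here; in practice it would come from a POS tagger and a dependency parser. It uses one simplified, assumed rule: flag a VBZ verb whose `nsubj` is a plural noun (NNS/NNPS), the pronoun *we*, *you* or *they*, or a subject with a `conj` child. The `Token` type, this rule and the tiny `TO_NON_3SG` dictionary are hypothetical illustrations. They do not reproduce the full rule set, which the slide leaves open.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    text: str
    tag: str    # Penn Treebank POS tag, e.g. VBZ = verb, 3rd person singular present
    dep: str    # dependency label
    head: int   # index of the head token; the root points to itself


# Hypothetical, tiny dictionary of transformations (VBZ -> non-3rd-person form).
TO_NON_3SG = {"is": "are", "has": "have", "does": "do",
              "likes": "like", "cherishes": "cherish", "sleeps": "sleep"}
NON_3SG_PRONOUNS = {"we", "you", "they"}


def children(tokens: List[Token], i: int, dep: str) -> List[int]:
    """Indices of the tokens attached to token i with the given label."""
    return [j for j, t in enumerate(tokens) if t.head == i and j != i and t.dep == dep]


def takes_non_3sg_verb(tokens: List[Token], subj: int) -> bool:
    t = tokens[subj]
    return (t.tag in {"NNS", "NNPS"}
            or t.text.lower() in NON_3SG_PRONOUNS
            or bool(children(tokens, subj, "conj")))   # e.g. "Barry and Mary"


def find_corrections(tokens: List[Token]) -> List[Tuple[int, str, Optional[str]]]:
    """Return (index, verb, suggestion) triples; the suggestion is None
    if the dictionary has no entry for the verb."""
    fixes = []
    for i, tok in enumerate(tokens):
        if tok.tag != "VBZ":                          # detection
            continue
        for subj in children(tokens, i, "nsubj"):     # rule
            if takes_non_3sg_verb(tokens, subj):
                # correction via the dictionary of transformations
                fixes.append((i, tok.text, TO_NON_3SG.get(tok.text.lower())))
    return fixes


if __name__ == "__main__":
    # "We likes pizza with anchovy." with a hand-written parse
    sent = [Token("We", "PRP", "nsubj", 1), Token("likes", "VBZ", "ROOT", 1),
            Token("pizza", "NN", "dobj", 1), Token("with", "IN", "prep", 2),
            Token("anchovy", "NN", "pobj", 3), Token(".", ".", "punct", 1)]
    print(find_corrections(sent))   # [(1, 'likes', 'like')]
```

A rule like this only inspects `nsubj`, so it stays silent on "There is two cats and a dog". There, "There" is an expletive and the real subject follows the verb, which would need a separate rule.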
  • 129. ML-based Toy Solution Text processing: tokenization, POS tagging, syntactic parsing, etc. Detection: find a VBZ. Classifier + features: the POS tag of the subject, whether the subject has a conjunct... Correction: use a dictionary of transformations.
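The same pipeline with a learned decision instead of a hand-written rule could be sketched as follows. Assumptions: the features are the subject's POS tag, whether it has a conjunct and, for pronouns only, the pronoun itself. The classifier is a plain perceptron, swapped in for illustration (any classifier would do). The six training sentences are made-up toy examples with hand-written parses, far too few for a real model. Correction would then reuse the dictionary of transformations.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (text, Penn Treebank tag, dependency label, index of head); the root points to itself
Token = Tuple[str, str, str, int]


def extract_features(tokens: List[Token], verb: int) -> Set[str]:
    """Features of the nsubj of the verb at index `verb`."""
    feats = {"bias"}
    for j, (text, tag, dep, head) in enumerate(tokens):
        if j != verb and head == verb and dep == "nsubj":
            feats.add("subj_tag=" + tag)
            if tag == "PRP":                       # pronouns: keep the word itself
                feats.add("subj_word=" + text.lower())
            if any(t[2] == "conj" and t[3] == j for t in tokens):
                feats.add("subj_has_conj")
    return feats


def predict(weights: Dict[str, int], feats: Set[str]) -> int:
    """1 = the VBZ verb disagrees with its subject, 0 = it is fine."""
    return 1 if sum(weights.get(f, 0) for f in feats) > 0 else 0


def train_perceptron(data: List[Tuple[Set[str], int]], epochs: int = 10) -> Dict[str, int]:
    weights: Dict[str, int] = defaultdict(int)
    for _ in range(epochs):
        mistakes = 0
        for feats, label in data:
            guess = predict(weights, feats)
            if guess != label:                     # update only on errors
                mistakes += 1
                for f in feats:
                    weights[f] += label - guess
        if mistakes == 0:                          # training data fully separated
            break
    return dict(weights)


def vbz_examples(sentences: Iterable[Tuple[List[Token], int]]):
    """Detection step: one example per VBZ verb."""
    for tokens, label in sentences:
        for i, tok in enumerate(tokens):
            if tok[1] == "VBZ":
                yield extract_features(tokens, i), label


# Made-up toy training data; label 1 = subject-verb disagreement
TRAIN = [
    ([("We", "PRP", "nsubj", 1), ("likes", "VBZ", "ROOT", 1)], 1),
    ([("She", "PRP", "nsubj", 1), ("likes", "VBZ", "ROOT", 1)], 0),
    ([("Ideas", "NNS", "nsubj", 1), ("sleeps", "VBZ", "ROOT", 1)], 1),
    ([("Idea", "NN", "nsubj", 1), ("sleeps", "VBZ", "ROOT", 1)], 0),
    ([("Barry", "NNP", "nsubj", 3), ("and", "CC", "cc", 0),
      ("Mary", "NNP", "conj", 0), ("is", "VBZ", "ROOT", 3)], 1),
    ([("Mary", "NNP", "nsubj", 1), ("is", "VBZ", "ROOT", 1)], 0),
]

if __name__ == "__main__":
    weights = train_perceptron(list(vbz_examples(TRAIN)))
    tom_and_ann_is = [("Tom", "NNP", "nsubj", 3), ("and", "CC", "cc", 0),
                      ("Ann", "NNP", "conj", 0), ("is", "VBZ", "ROOT", 3)]
    print(predict(weights, extract_features(tom_and_ann_is, 3)))   # 1
```

Unlike the rule-based version, nobody writes down that a conjunct makes the subject plural. The weight on `subj_has_conj` is learned from the labelled examples, so coverage depends on what the training data contains.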
  • 131. Contact us Presenter: Mariana Romanyshyn (mariana.romanyshyn@grammarly.com) With the help of: Oksana Kunikevych (oksana.kunikevych@grammarly.com), Khrystyna Skopyk (khrystyna.skopyk@grammarly.com), Tetiana Myronivska (tetiana.myronivska@grammarly.com), Tetiana Turchyn (tetiana.turchyn@grammarly.com)