NLP: a peek into a day of a computational linguist
1. NLP: a peek into a day of a computational linguist
Mariana Romanyshyn
Grammarly, Inc.
2. 1. NLP applications in our world
2. What computational linguists do
3. Language levels
4. A closer look at part-of-speech tagging
5. A closer look at syntactic parsing
6. Let’s build something: error correction
Contents
12.
It tastes amazing!
It tastes horrible!
It tastes normal.
ABC tastes much better than DEF.
It tastes like beer!
It tastes interesting!
It tastes like my mom said it would!
If it was served with milk, it would taste great!
Sentiment Analysis
13.
“That young girl is one of the least benightedly unintelligent organic
life forms [that] it has been my profound lack of pleasure not to be
able to avoid meeting.”
— Douglas Adams
Terminal cases
29. “I remember the first time we loaded these data sources into Siri.
I typed ‘start over’ into the system, and Siri came back saying,
‘Looking for businesses named “Over” in Start, Louisiana.’”
— Adam Cheyer
Siri
34. MISC
News & weather report generation
Conversational Agents
Language learning
Story Cloze Task
…
Types of NLP Applications
35. Tom and Sheryl have been together for two years. One day, they
went to a carnival. Tom won Sheryl several stuffed bears. When
they reached the Ferris wheel, he got down on one knee.
Which ending is more probable?
• Tom asked Sheryl to marry him.
• He wiped mud off of his boot.
Story Cloze
48.
Splitting problems
How do we split...
• text into paragraphs?
bullet points, word wrapping
• a paragraph into sentences?
Dr. Jones lectures at U.C.L.A.
• a sentence into words?
computer-aided, the d.t.s, San Francisco, $3B deal
• a word into morphemes?
misadventure
mislead
mistake - ?
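A minimal sketch of why a naive splitting rule fails on abbreviations like the one above (Python; the regex and the sample text are only illustrative):

import re

text = "Dr. Jones lectures at U.C.L.A. He is a great speaker."

# Naive rule: a sentence ends at a period followed by whitespace.
naive_sentences = re.split(r"(?<=\.)\s+", text)
print(naive_sentences)
# ['Dr.', 'Jones lectures at U.C.L.A.', 'He is a great speaker.']
# "Dr." and "U.C.L.A." are wrongly treated as sentence boundaries.

# Real splitters use abbreviation lists or trained models
# (e.g., NLTK's Punkt or spaCy's sentencizer) rather than a single rule.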
49.
Features
Quantitative features:
• number of sentences, words, words per sentence, etc.
• size and arrangement of paragraphs
• word length
• word position in a sentence
• number of syllables in a word
• ratio of vowels vs consonants
• depth of the word in the dependency tree of the sentence
• number of word senses
• n-grams
50.
Ngrams
Sequences of elements and their frequencies:
• unigrams, bigrams, 3-grams, 4-grams, … n-grams
• at different language levels
– token ngrams:
• ("handsome”, ”man"): 160,000 ("pretty”, ”man"): 5,000
– character ngrams
• “st”: 14,000; “ct”: 4,000; “str”: 1,500; “ctr”: 50; “stra”: 400; “ctra”: 0
• adding grammar
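A minimal sketch of collecting token and character n-grams with their frequencies (Python; the toy sentence is only for illustration, while counts like 160,000 on the slide would come from a large corpus):

from collections import Counter

def ngrams(seq, n):
    # All contiguous n-grams of a sequence of tokens or characters.
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

tokens = "to be or not to be , that is the question".split()
print(Counter(ngrams(tokens, 2)).most_common(2))
# [(('to', 'be'), 2), (('be', 'or'), 1)]

# The same function works at the character level:
print(Counter(ngrams("straight", 2)))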
51.
Features
Grammatical features:
• POS tag
• morphemes: affixes, roots, endings
• constituency spans
• dependency relations
• coreference
• grammatical characteristics of various parts of speech:
– countability of nouns
– tense of verbs
– degree of comparison of adjectives
– pronoun type
– connector type
52.
Features
Spelling features:
• capitalized word?
• hyphenated word?
• compound word?
Lexical-semantic features:
• WordNet
• VerbNet
• dictionaries and thesauri
• word embeddings
• modality of verbs
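As an illustration of how some of the quantitative and spelling features above can be computed for a single word, here is a small sketch (Python); the feature names and the toy sentence are invented for this example:

VOWELS = set("aeiouyAEIOUY")

def word_features(word, tokens):
    # Toy quantitative + spelling features for one word of a tokenized sentence.
    vowels = sum(ch in VOWELS for ch in word)
    consonants = sum(ch.isalpha() and ch not in VOWELS for ch in word)
    return {
        "length": len(word),
        "position": tokens.index(word),
        "vowel_consonant_ratio": vowels / max(consonants, 1),
        "is_capitalized": word[:1].isupper(),
        "is_hyphenated": "-" in word,
    }

tokens = "Two hungry cats chased the computer-aided mouse".split()
print(word_features("computer-aided", tokens))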
54. Goal: categorize words by their functions.
English:
• notional: noun, verb, adjective, adverb, pronoun (?), numeral (?)
• functional: determiner, preposition, conjunction, particle, and
interjection
POS: recap
55. Wow, two hungry cats chased down the mouse to the corner
and quickly ate it!
POS: practice
56. All you need is love . Love is all
at the way you love me all the time
. And never mind that noise you heard .
fire and of things that will bite , yeah
було так давно , коли в руках тримаю цей (≈ “it was so long ago, when I hold this in my hands”; коли = “when”)
Просто налийте трохи коли на пошкоджену ділянку . (≈ “Just pour some cola on the damaged spot”; коли = “cola”)
ударом . Я хочу мати всьо , і всьо на (≈ “…I want to have everything, and everything…”; мати = “to have”)
а на полі спозаранку мати жито жала , та (≈ “…mother was reaping rye in the field at dawn…”; мати = “mother”)
POS: more practice
57. Time flies like an arrow.
I saw her duck with a telescope.
She is calculating.
We watched an Indian dance.
They can fish.
More lies ahead...
Це мало мало значення.
Коло друзів та незнайомців.
POS: impossible cases
58. Time flies[Verb/Noun] like[Preposition/Verb] an arrow.
I saw her duck[Verb/Noun] with a telescope.
She is calculating[Verb/Adjective].
We watched an Indian[Adjective/Noun] dance.
They can[Modal Verb/Verb] fish[Verb/Noun].
More lies[Verb/Noun] ahead...
Це мало[Verb/Adverb] мало[Verb/Adverb] значення. (≈ “It mattered little.”)
Коло[Noun/Preposition] друзів та незнайомців. (≈ “A circle of friends and strangers” or “Around friends and strangers”)
POS: impossible cases
60. What POS should gotta be?
I gotta tell you something.
I’ve gotta fix that thingy for her, Jack.
So, she gotta this gorgeous dress.
So, she gotta gun.
POS: disputable cases
61. What POS should gotta be?
I gotta[modal verb] tell you something.
I’ve gotta[verb, 3rd form] fix that thingy for her, Jack.
So, she gotta[verb, 2nd form] this gorgeous dress.
So, she gotta[verb, 2nd form] gun.
POS: disputable cases
66. • Use a classifier to tag each word independently
• Features
– left/right context: words, POS tags, words + POS tags
– probability of word + POS tag
– additional:
• possible tags for the word
• morphological characteristics (tense, plurality, degree of comparison)
• the word’s spelling (suffixes, capitalization, hyphenation)
Input: Chewie[NNP] ,[,] we[PRP] 're[VBP] home[NN or RB?] .[.]
Output: RB
POS: Classification
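A sketch of the kind of context-window features such a classifier could use (Python; the exact feature set here is illustrative, not the one from the talk):

def tagging_features(tokens, i, left_tags):
    # Features for classifying the POS tag of tokens[i],
    # given the tags already assigned to the words on its left.
    word = tokens[i]
    return {
        "word": word.lower(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<S>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "</S>",
        "prev_tag": left_tags[-1] if left_tags else "<S>",
        "suffix3": word[-3:].lower(),
        "is_capitalized": word[:1].isupper(),
        "is_hyphenated": "-" in word,
    }

tokens = "Chewie , we 're home .".split()
print(tagging_features(tokens, 4, ["NNP", ",", "PRP", "VBP"]))
# The resulting dictionaries can be fed to any standard classifier.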
67. • Map the sentence to the most probable POS tag sequence
• Features
– left/right context: words, POS tags, words + POS tags
– probability of word + POS tag
– additional:
• possible tags for the word
• morphological characteristics (tense, plurality, degree of comparison)
• the word’s spelling (suffixes, capitalization, hyphenation)
Input: Chewie , we 're home .
Output: NNP , PRP VBP RB .
POS: Sequence Labelling
68. Notation:
• V - vocabulary
• T - POS tags
• x - sentence (observation)
• y - tag sequence (states)
• S - all sentence/tag-sequence pairs {x1 . . . xn, y1 . . . yn}
– n > 0
– xi ∈ V
– yi ∈ T
Hidden Markov Models
69. S - all sentence/tag-sequence pairs {x1 . . . xn, y1 . . . yn}
x: Chewie , we 're home .
y: NNP , PRP VBP RB .
NN , PRP VBP RB .
NNP , PRP VBP NN .
NN , PRP VBP NN .
…
Aim: find {x1 . . . xn, y1 . . . yn} with the highest probability.
Hidden Markov Models
70. • Markov Assumption: "The future is independent of the past
given the present."
– Trigram HMM: each state depends only on the previous two states
in the sequence
• Independence assumption:
– the observation xi depends only on the value of yi, independent of the
previous observations and states
HMM: assumptions
71. S - all sentence/tag-sequence pairs {x1 . . . xn, y1 . . . yn}
x: Chewie , we 're home .
y: NNP , PRP VBP RB .
NN , PRP VBP RB .
NNP , PRP VBP NN .
NN , PRP VBP NN .
...
HMM: assumptions
72. • q(s|u, v) - the probability of tag s after the tags (u, v)
– s, u, v ∈ T
• e(x|s) - the probability of observation x paired with state s
– x ∈ V, s ∈ T
Trigram HMM: parameters
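Under the two assumptions above, the joint probability of a sentence and its tag sequence factorizes into exactly these parameters; in LaTeX notation (the standard trigram-HMM factorization, with the tag sequence padded as y_{-1} = y_0 = <S> and y_{n+1} = </S>):

p(x_1 \dots x_n, y_1 \dots y_{n+1}) = \prod_{i=1}^{n+1} q(y_i \mid y_{i-2}, y_{i-1}) \cdot \prod_{i=1}^{n} e(x_i \mid y_i)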
77. Enumerating all possible tag sequences is not feasible: there are T^n of them.
E.g.:
44 tags, 6-token sentence: 44^6 = 7,256,313,856 tag sequences
Ideas:
• use dynamic programming (the Viterbi algorithm)
• limit the number of candidates with a dictionary
HMM: problem 1
78.
HMM: the Viterbi algorithm
Idea: remember decisions on the way (n · T^3).
x: Chewie , we 're home .
y: candidate tags per token (the sequence is padded with <S> <S> … </S>):
Chewie: NN, NNP, NNS, NNPS, JJ, JJR, -RRB-, VBZ, …
,: ,
we: RB, CD, EX, CC, IN, NNP, PRP, LS, …
're: NNP, WP, PRP$, VBP, PRP, JJS, RBS, CD, …
home: VBP, VB, RB, NN, JJ, TO, RP, IN, …
.: .
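A minimal Viterbi decoder sketch (Python). For brevity it uses a bigram HMM rather than the trigram model from the slides, and the probabilities below are invented for the toy example:

import math

def viterbi(tokens, tags, q, e):
    # Bigram-HMM Viterbi: q[(prev_tag, tag)] are transition probabilities,
    # e[(word, tag)] are emission probabilities; missing entries count as 0.
    def logp(p):
        return math.log(p) if p > 0 else float("-inf")

    best = [{t: logp(q.get(("<S>", t), 0)) + logp(e.get((tokens[0], t), 0))
             for t in tags}]                     # best[i][t]: best log-prob ending in t
    back = [{}]                                  # back[i][t]: best previous tag
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for t in tags:
            scores = {p: best[i - 1][p] + logp(q.get((p, t), 0))
                         + logp(e.get((tokens[i], t), 0)) for p in tags}
            back[i][t] = max(scores, key=scores.get)
            best[i][t] = scores[back[i][t]]

    path = [max(best[-1], key=best[-1].get)]     # follow backpointers from the end
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

q = {("<S>", "PRP"): 0.4, ("PRP", "VBP"): 0.5, ("VBP", "RB"): 0.4, ("VBP", "NN"): 0.2}
e = {("we", "PRP"): 0.9, ("'re", "VBP"): 0.8, ("home", "RB"): 0.4, ("home", "NN"): 0.5}
print(viterbi("we 're home".split(), ["PRP", "VBP", "RB", "NN"], q, e))
# ['PRP', 'VBP', 'RB']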
79.
HMM: with dictionary
Idea: use a dictionary (n · 8^3; the worst case is still n · T^3).
x: Chewie , we 're home .
y: candidate tags per token (restricted by the dictionary; padded with <S> <S> … </S>):
Chewie: NNP, NN
,: ,
we: PRP
're: VBP
home: VB, VBP, RB, NN
.: .
80. Zero probabilities can occur because of OOV or rare words.
Idea: use smoothing!
• add-1: pretend you saw each word one more time
(P.S. It’s usually a horrible choice, but we’ll use it today. Don’t tell anyone.)
• Good-Turing: reallocate the probability of n-grams that occur
r+1 times to the n-grams that occur r times
• Kneser-Ney: when the bigram count is near 0, rely on unigram
• ...
HMM: problem 2
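As a concrete illustration, add-1 smoothing applied to the trigram transition probabilities (with c(·) denoting counts over the training data and |T| the size of the tag set) gives, in LaTeX notation:

q_{add1}(s \mid u, v) = \frac{c(u, v, s) + 1}{c(u, v) + |T|}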
84. Goal: categorize sentence parts by their functions and define dependencies.
Sentence:
• main clause
• subordinate clause
Clause:
• subject
• predicate
• direct/indirect/prepositional object
• modifier
• complement
Syntax: recap
85. Sentence:
If you want to receive e-mails about my upcoming shows, then please
give me money so I can buy a computer.
Syntax: practice
86. Sentence:
If you want to receive e-mails about my upcoming shows, then please
give me money so I can buy a computer.
Clauses:
• [[you] want [to receive [e-mails about my upcoming shows]]]
• [please give [me] [money]]
• [[I] can buy [a computer]]
Syntax: practice
87. Identify the subject:
• The walrus and the carpenter were walking close at hand.
• The greatest trick the devil ever pulled was convincing the world he
didn't exist.
• What we've got here is a failure to communicate.
• Actually being funny is mostly telling the truth about things.
• To be idle is a short road to death, and to be diligent is a way of life.
• Sitting in a tree at the bottom of the garden was a huge black bird
with long blue tail feathers.
Syntax: the subject
90. Identify the role of the infinitive:
• The two politicians failed [to communicate].
• What we've got here is a failure [to communicate].
• [To be idle] is a short road to death, and [to be diligent] is a way of
life.
• [To become extroverted], you need to go out and socialize.
• You have [to be able [to actually quote the line]] for it [to be a
memorable quote].
Syntax: the infinitives
93. Types:
• constituency tree
– every token is a part of some phrase constituent (parent node)
– includes terminal and non-terminal nodes
– shows relations among the constituents
• dependency tree
– for every token, there is one node
– includes only terminal nodes
– shows relations among words
Syntactic Trees (or Parse Trees)
94. If you want to receive e-mails about my upcoming shows, then please give me
money so I can buy a computer.
Constituency Tree
98. • Algorithms:
– top-down
– chart
– bottom-up
• Features include:
– grammar (a.k.a. transitions)
– spans of nodes
– labels
– right, left, or both-side context
– split point, etc.
• Weights are trained on the treebank.
Constituency Parsing
99.
Shift-reduce constituency parsing
• Data
– queue: the words of the sentence
– stack: partially completed trees
• Actions
– shift: move the word from the queue onto the stack
– reduce: add a new label on top of the first n constituents on
the stack
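A toy sketch of these two actions in Python; the sentence, the constituent labels, and the "gold" action sequence are made up for illustration, whereas a real parser would predict every action with a trained model:

from collections import deque

def shift(stack, queue):
    # Move the next word from the queue onto the stack.
    stack.append(queue.popleft())

def reduce_(stack, label, n):
    # Wrap the top n items of the stack into a new constituent.
    children = [stack.pop() for _ in range(n)][::-1]
    stack.append((label, children))

stack, queue = [], deque("we 're home".split())
shift(stack, queue)
reduce_(stack, "NP", 1)
shift(stack, queue)
shift(stack, queue)
reduce_(stack, "NP", 1)
reduce_(stack, "VP", 2)
reduce_(stack, "S", 2)
print(stack[0])
# ('S', [('NP', ['we']), ('VP', ["'re", ('NP', ['home'])])])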
111. Types:
• constituency tree
– every token is a part of some phrase constituent (parent node)
– includes terminal and non-terminal nodes
– shows relations among the constituents
• dependency tree
– for every token, there is one node
– includes only terminal nodes
– shows relations among words
Syntactic Trees (or Parse Trees)
113.
Dependency Tree
If you want to receive e-mails about my upcoming shows, then please
give me money so I can buy a computer.
115. • Graph-Based Parsing
– find the highest score tree from a complete graph
– slow, but performs better on long-distance dependencies
– e.g., MSTParser
• Transition-Based Parsing
– apply transition actions one by one
– faster, but performs well mainly on short-distance dependencies
– e.g., MaltParser, the Stanford Parser, ZPAR
Algorithms
117. • Data
– queue: the words of the sentence
– stack: partially completed trees
• Actions:
– shift: move the word from the queue onto the stack
– reduce: pop the stack, removing only its top item, as long as that
item has a head
– right-arc: create a right dependency arc between the word on top of
the stack and the next token in the queue
– left-arc: create a left dependency arc between the word on top of
the stack and the next token in the queue
Transition-Based Parsing
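A small sketch that replays such a transition sequence (Python, roughly the arc-eager system described above); the hand-written action sequence stands in for the actions a trained parser would predict:

from collections import deque

def parse(tokens, actions):
    # Arcs are (head, dependent) pairs; ROOT is an artificial root node.
    stack, buffer, arcs = ["ROOT"], deque(tokens), []
    for action in actions:
        if action == "shift":
            stack.append(buffer.popleft())
        elif action == "left-arc":        # next buffer token governs the stack top
            arcs.append((buffer[0], stack.pop()))
        elif action == "right-arc":       # stack top governs the next buffer token
            arcs.append((stack[-1], buffer[0]))
            stack.append(buffer.popleft())
        elif action == "reduce":          # stack top already has a head
            stack.pop()
    return arcs

actions = ["shift", "left-arc", "right-arc", "right-arc", "reduce", "right-arc"]
print(parse("we 're home .".split(), actions))
# [("'re", 'we'), ('ROOT', "'re"), ("'re", 'home'), ("'re", '.')]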
127. We likes pizza with anchovy.
Children like and cherishes her kindness and cooking skills.
Some is watching the way she knits and loving it.
Colorless green ideas sleeps furiously.
Barry and Mary, whom I met at the New Year 's party, is just
the cutest people.
There is two cats and a dog.
Subject-verb disagreement
128. Text processing: tokenization, POS tagging, syntactic parsing, etc.
Detection: find a VBZ
Rules: if the verb has an nsubj relation and the subject does not
have a conjunct, we should correct it…
Correction: use a dictionary of transformations
Rule-based Toy Solution
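A sketch of this rule-based pipeline with spaCy (assuming the en_core_web_sm model is installed; the transformation dictionary and the agreement rule are simplified toy versions, not the ones from the talk):

import spacy

nlp = spacy.load("en_core_web_sm")      # tokenization, POS tagging, parsing

# Toy transformation dictionary: singular verb form -> plural verb form.
TRANSFORM = {"likes": "like", "is": "are", "cherishes": "cherish", "sleeps": "sleep"}

def correct(sentence):
    doc = nlp(sentence)
    out = [t.text for t in doc]
    for token in doc:
        if token.tag_ != "VBZ":                          # detection: find a VBZ
            continue
        subjects = [c for c in token.children if c.dep_ == "nsubj"]
        if not subjects:
            continue
        subj = subjects[0]
        plural_subject = (subj.tag_ in ("NNS", "NNPS")
                          or subj.text.lower() in ("we", "they", "you", "i")
                          or any(c.dep_ == "conj" for c in subj.children))
        if plural_subject:                               # rule fired -> correct the verb
            out[token.i] = TRANSFORM.get(token.text.lower(), token.text)
    return " ".join(out)

print(correct("We likes pizza with anchovy."))
# -> 'We like pizza with anchovy .'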
129. Text processing: tokenization, POS tagging, syntactic parsing, etc.
Detection: find a VBZ
Classifier + features: POS tag of the subject, does the subject
have a conjunct...
Correction: use a dictionary of transformations
ML-based Toy Solution
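And a correspondingly tiny sketch of the ML variant (scikit-learn; the handful of hand-written feature dictionaries and labels stand in for features extracted from a large annotated corpus):

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# 1 = the VBZ verb disagrees with its subject, 0 = the sentence is fine.
X = [
    {"subj_tag": "PRP", "subj_lower": "we",    "subj_has_conj": False},  # We likes...
    {"subj_tag": "NN",  "subj_lower": "cat",   "subj_has_conj": False},  # The cat likes...
    {"subj_tag": "NNS", "subj_lower": "ideas", "subj_has_conj": False},  # ...ideas sleeps...
    {"subj_tag": "NNP", "subj_lower": "barry", "subj_has_conj": True},   # Barry and Mary... is...
]
y = [1, 0, 1, 1]

model = make_pipeline(DictVectorizer(), LogisticRegression())
model.fit(X, y)

# Score a new subject-verb pair ("There is two cats and a dog."):
print(model.predict([{"subj_tag": "NNS", "subj_lower": "cats", "subj_has_conj": True}]))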