NLP Training – Session 3
Dr. Alexandra M. Liguori
Incubio – The Big Data Academy
Barcelona, April 22, 2015
Outline
1. Clarification about corpus
2. Recap: Typical NLP tasks
3. Automatic Question Answering
4. Reference resolution
5. Named Entity Recognition (NER)
6. Keyword / topic / information extraction
Corpus
Definition
Corpus = a large, structured set of texts.
In NLP, two types of corpora are used:
Training corpus ↔ used to build the list of rules or to gather the statistical data
Test corpus ↔ used to test the results found with the training corpus
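The training/test split above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the fraction, seed, and toy documents are arbitrary choices:

```python
import random

def split_corpus(texts, train_fraction=0.8, seed=42):
    """Shuffle a list of texts and split it into a training and a test corpus."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = texts[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

texts = [f"document {i}" for i in range(10)]
train, test = split_corpus(texts)
print(len(train), len(test))  # 8 2
```

Shuffling before splitting avoids accidentally putting all texts of one kind into the test corpus.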
Typical NLP tasks: Basic and simpler tasks
Tokenization ↔ RegEx
Sentence splitting ↔ RegEx
POS-tagging ↔ POS-tagging algorithms and tag sets
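Regex-based tokenization and sentence splitting can be sketched as follows. This is deliberately naive (real tokenizers handle abbreviations, numbers, quotes, etc.); the patterns are illustrative assumptions:

```python
import re

def split_sentences(text):
    # Split after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Runs of word characters, and each punctuation mark as its own token.
    return re.findall(r"\w+|[^\w\s]", sentence, re.UNICODE)

text = "Pedro made a cake. It was delicious!"
sents = split_sentences(text)
print(sents)               # ['Pedro made a cake.', 'It was delicious!']
print(tokenize(sents[0]))  # ['Pedro', 'made', 'a', 'cake', '.']
```

Note how a sentence like "Dr. Smith arrived." would already break this splitter, which is why sentence splitting is listed as a task of its own.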
Typical NLP tasks: Complex tasks
Lemmatization or stemming ↔ implementations of the Porter stemmer (e.g. in Java), the Stanford NLP tools, GATE, ...
Syntactic parsing ↔ the Earley algorithm, the CYK algorithm, the GHR algorithm, the Stanford Parser (a Java implementation of a probabilistic algorithm)
Question answering, topic extraction, NER, semantic analysis, ... ↔ ad hoc tools, e.g. dictionaries, ontologies, Frames, GATE, NLTK, the Lappin & Leass algorithm, ...
Question Answering
Video on Bush Jr. and Condoleezza Rice, based on Abbott and Costello's "Who's on First?" sketch
Simple Question Answering
ELIZA
User_1: Men are all alike.
ELIZA_1: IN WHAT WAY
User_2: They're always bugging us about something or other.
ELIZA_2: CAN YOU THINK OF A SPECIFIC EXAMPLE
User_3: Well, my boyfriend made me come here.
ELIZA_3: YOUR BOYFRIEND MADE YOU COME HERE
User_4: Yes, he says I'm depressed much of the time.
ELIZA_4: I AM SORRY TO HEAR YOU ARE DEPRESSED.
ELIZA
Regular expression substitutions
Change all instances of my to YOUR, of I'm to YOU ARE, etc.; e.g.:
1. User_3: Well, my boyfriend made me come here.
   ELIZA_3: YOUR BOYFRIEND MADE YOU COME HERE
2. User_4: ... I'm depressed ... .
   ELIZA_4: ... YOU ARE DEPRESSED.
ELIZA
Regular expression substitutions
Match relevant patterns in the input → create an appropriate output; e.g.:
1. s/.* YOU ARE (depressed|sad) .*/I AM SORRY TO HEAR YOU ARE \1/
2. s/.* YOU ARE (depressed|sad) .*/WHY DO YOU THINK YOU ARE \1/
3. s/.* all .*/IN WHAT WAY/
4. s/.* always .*/CAN YOU THINK OF A SPECIFIC EXAMPLE/
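The two ELIZA mechanisms above (pronoun rewriting, then pattern → response rules) can be sketched in Python. This is a toy responder built only from the rules shown on the slides, not Weizenbaum's full program:

```python
import re

# 1) pronoun substitutions applied to the user's input
PRONOUN_SUBS = [(r"\bmy\b", "YOUR"), (r"\bI'm\b", "YOU ARE"), (r"\bme\b", "YOU")]
# 2) pattern -> response rules, tried in order
RULES = [
    (r".*YOU ARE (depressed|sad).*", r"I AM SORRY TO HEAR YOU ARE \1"),
    (r".*\ball\b.*", "IN WHAT WAY"),
    (r".*\balways\b.*", "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
]

def eliza(utterance):
    text = utterance
    for pat, repl in PRONOUN_SUBS:
        text = re.sub(pat, repl, text)
    for pat, repl in RULES:
        if re.match(pat, text):
            return re.sub(pat, repl, text)
    return text.upper()  # fallback: echo the rewritten input

print(eliza("Men are all alike."))                    # IN WHAT WAY
print(eliza("Well, my boyfriend made me come here.")) # WELL, YOUR BOYFRIEND MADE YOU COME HERE.
```

The fallback branch reproduces exchanges like ELIZA_3 on the earlier slide: when no rule fires, the pronoun-rewritten input is simply echoed back.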
Quizlyse Example
1) Input
An affirmative sentence, e.g.:
Cristiano chuta el balon.
2) Intermediate output
Parsed text:
Cristiano/NPMS000 chuta/VMIS3S0 el/DI0MS0 balon/NCMS000 ./.
Cristiano/SUBJ chuta/VERB [el balon]/DIRECT-OBJ ./.
Quizlyse Example
3) Substitutions
Match relevant patterns in the input → create an appropriate output; e.g.:
1. s/.* (NPMS000) (VMIS3S0) (DI0MS0 NCMS000) .*/Qué \2 \1?/
2. SUBJ VERB DIRECT-OBJ → Qué VERB SUBJ?
4) Final Output
An automatically generated question as output; e.g.:
Qué chuta Cristiano?
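The substitution step can be sketched in Python over the tagged string from the intermediate output. This is a toy for exactly one tag pattern (the assumed EAGLES-style tags from the slide), not the real Quizlyse pipeline:

```python
import re

# POS-tagged sentence from the intermediate output on the slide
TAGGED = "Cristiano/NPMS000 chuta/VMIS3S0 el/DI0MS0 balon/NCMS000 ./."

def generate_question(tagged):
    # Capture the subject word and the verb word; require the determiner
    # and common-noun tags of the direct object to be present.
    m = re.match(r"(\w+)/NPMS000 (\w+)/VMIS3S0 \w+/DI0MS0 \w+/NCMS000", tagged)
    if not m:
        return None
    subj, verb = m.group(1), m.group(2)
    # SUBJ VERB DIRECT-OBJ  ->  Qué VERB SUBJ ?
    return f"Qué {verb} {subj}?"

print(generate_question(TAGGED))  # Qué chuta Cristiano?
```

Capturing the words rather than the tags is what lets the rule `Qué \2 \1?` produce a concrete question instead of a tag sequence.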
Reference resolution
Discourse
Gracie: Oh yeah... And then Mr. and Mrs. Jones were having matrimonial trouble, and my brother was hired to watch Mrs. Jones.
George: Well, I imagine she was a very attractive woman.
Gracie: She was, and my brother watched her day and night for six months.
George: Well, what happened?
Gracie: She finally got a divorce.
George: Mrs. Jones?
Gracie: No, my brother's wife.

Jordi se fue al restaurante de Xavi para comer pescado. Este estaba fresco y le gustó.
(Jordi went to Xavi's restaurant to eat fish. It was fresh and he liked it.)
Reference resolution
1. Reference phenomena
2. Constraints on coreference
3. Preferences in pronoun interpretation
4. Example of an algorithm for pronoun resolution
Reference resolution
Reference phenomena
1. Indefinite noun phrases ↔ Pedro comió unos pasteles ayer.
2. Definite noun phrases ↔ Pedro comió unos pasteles ayer. Los pasteles eran muy dulces.
3. Pronouns ↔ Ayer Pedro comió unos pasteles que eran muy dulces.
4. Demonstratives ↔ Pedro hizo unos pasteles: estos son de chocolate, aquellos son de almendra.
5. Anaphora with uno/una/unos/unas ↔ Ayer Pedro hizo una tarta. Hoy quiero hacer una yo también.
Reference resolution
Constraints on coreference
1. Number agreement ↔ Los pasteles que comí ayer los hizo Ana. / Los pasteles que comí ayer lo hizo Ana.
2. Person and case agreement ↔ Ana y Carmen hicieron unos pasteles. Les gustan.
3. Gender agreement ↔ La tarta que comí ayer la hizo Ana. / La tarta que comí ayer lo hizo Ana.
4. Syntactic constraints ↔ Ana se hizo una tarta. / Ana le hizo una tarta.
5. Selectional restrictions ↔ Ana puso el pastel en el horno. Es redondo.
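Agreement constraints like the ones above act as a filter on antecedent candidates. A minimal sketch, with a tiny hand-made lexicon assumed for illustration:

```python
# Toy antecedent candidates with gender/number features (assumed lexicon).
CANDIDATES = [
    {"word": "pasteles", "gender": "m", "number": "pl"},
    {"word": "tarta",    "gender": "f", "number": "sg"},
    {"word": "Ana",      "gender": "f", "number": "sg"},
]

def compatible(pronoun, candidates):
    """Keep only the candidates that agree with the pronoun in gender and number."""
    return [c["word"] for c in candidates
            if c["gender"] == pronoun["gender"] and c["number"] == pronoun["number"]]

la = {"word": "la", "gender": "f", "number": "sg"}  # feminine singular clitic
print(compatible(la, CANDIDATES))  # ['tarta', 'Ana']
```

The constraints rule candidates out; the preferences in the next section rank the candidates that survive.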
Reference resolution
Preferences in pronoun interpretation
1. Recency ↔ Pedro hizo un pastel. Juan hizo una tarta. A Ana le gusta.
2. Grammatical role ↔ Pedro hizo un pastel con Juan. Él se lo comió todo. / Juan hizo un pastel con Pedro. Él se lo comió todo.
3. Repeated mention ↔ Anne needed a car to drive to her new job. She decided she wanted something roomy. Carol went to the Honda dealership with her. She bought a Civic.
4. Parallelism ↔ Pedro llamó a Juan por la mañana. Carlos le llamó por la tarde.
5. Verb semantics ↔ Pedro hizo un pastel para Juan. Le gustan los dulces. / Pedro pidió un pastel a Juan. Le gustan los dulces.
Reference resolution
Algorithm for pronoun resolution (Lappin & Leass, 1994)
1. Divide the discourse into sentences and analyze one sentence at a time ↔ sentence splitting, tokenization
2. Parse the 1st sentence and identify the nouns and pronouns ↔ POS-tagging
3. Assign weights to all nouns and pronouns ↔ Lappin & Leass weights
4. Resolve each pronoun to the noun with the highest weight; if there are no pronouns, simply divide all weights by 2 ↔ Lappin & Leass algorithm
5. Proceed to the 2nd sentence and repeat all the steps above, adding up the weights along the way
Algorithm for pronoun resolution
Weighting scheme ↔ recency and syntactic preferences (Lappin & Leass, 1994):
1. Sentence recency ↔ 100
   e.g. any mention in the current sentence
2. Subject emphasis ↔ 80
   e.g. El pastel está en la mesa de la cocina.
3. Existential emphasis ↔ 70
   e.g. Hay un pastel en la mesa de la cocina.
4. Direct object emphasis ↔ 50
   e.g. Ana hizo un pastel ayer.
5. Indirect object emphasis ↔ 40
   e.g. Ana regaló el pastel a Carmen.
6. Non-adverbial emphasis ↔ 50
   e.g. Ana puso un poco de chocolate en el pastel.
7. Head noun emphasis ↔ 80
   e.g. El libro de recetas para el pastel de chocolate está en la mesa de la cocina.
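The weighting scheme can be written down directly as a lookup table: each mention scores the sum of the factors that apply to it. A minimal sketch using the slide's weights (the factor names are my own labels):

```python
# Lappin & Leass salience factors and weights, as listed on the slide.
WEIGHTS = {
    "recency": 100, "subject": 80, "existential": 70,
    "direct_object": 50, "indirect_object": 40,
    "non_adverbial": 50, "head_noun": 80,
}

def salience(factors):
    """Sum the weights of the factors that apply to one mention."""
    return sum(WEIGHTS[f] for f in factors)

# "Pedro se comió una tarta de chocolate."
pedro = salience(["recency", "subject", "non_adverbial", "head_noun"])
tarta = salience(["recency", "direct_object", "non_adverbial", "head_noun"])
print(pedro, tarta)          # 310 280
print(pedro / 2, tarta / 2)  # 155.0 140.0 after the end-of-sentence halving
```

These are exactly the totals that appear in the worked example that follows.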
Algorithm for reference resolution
Example discourse
Pedro se comió una tarta de chocolate. Él se la había pedido a Juan. Le gustan los dulces.
(Pedro ate a chocolate cake. He had asked Juan for it. He likes sweets.)
Step 1
1. Take the first sentence: Pedro se comió una tarta de chocolate.
2. Parse this first sentence → parsing result:
Pedro/NP000P0 se/PP3CN000 comió/VMIS3S0 una/DI0FS0 tarta/NCFS000 de/SPS00 chocolate/NCMS000 ./.
Pedro/SUBJ [se comió]/VERB [una tarta]/OBJ de chocolate/COMPL ./.
3. Calculate the weights for all nouns and pronouns appearing in this first sentence:
Algorithm for reference resolution
Weights for the nouns and pronouns from the first sentence:
(PRO)NOUNS   Rec.  Subj.  Exist.  Obj.  Ind.-Obj.  Non-Adv.  Head N  TOT.
Pedro         100    80      0      0       0         50        80    310
tarta         100     0      0     50       0         50        80    280
chocolate     100     0      0      0       0          0        80    180
No pronouns whose reference needs to be resolved → divide all the totals by 2:
(PRO)NOUNS  TOT.
Pedro       310/2 = 155
tarta       280/2 = 140
chocolate   180/2 = 90
Algorithm for reference resolution
Example discourse
Pedro se comió una tarta de chocolate. Él se la había pedido a Juan. Le gustan los dulces.
Step 2
1. Take the second sentence: Él se la había pedido a Juan.
2. Parse this second sentence → parsing result:
Él/PP3MS000 se/PP3CN000 la/PP3FSA00 había/VAII3S0 pedido/VMP00SM a/SPS00 Juan/NP000P0 ./.
Él/SUBJ [se la había pedido]/VERB [a Juan]/IND-OBJ ./.
3. Calculate the weights for all new nouns and pronouns appearing in this second sentence:
Algorithm for reference resolution
Weights for the nouns and pronouns from the second sentence:
(PRO)NOUNS  Rec.  Subj.  Exist.  Obj.  Ind.-Obj.  Non-Adv.  Head N  TOT.
Él           100    80      0      0       0         50        80    310
la           100     0      0     50       0         50        80    280
Juan         100     0      0      0      40         50        80    270
The two pronouns Él and la have to be resolved to nouns from the first sentence.
Algorithm for reference resolution
Results from the first two sentences:
(PRO)NOUNS  TOT.
Pedro       310/2 = 155
tarta       280/2 = 140
chocolate   180/2 = 90
(PRO)NOUNS  TOT.
Él          310
la          280
Juan        270
1. The pronoun la is resolved to the noun tarta because of gender constraints (la and tarta are the only feminine forms here).
2. The pronoun Él is resolved to the noun from the previous sentence with the highest weight, i.e. Pedro.
Algorithm for reference resolution
Combined results of reference resolution from the first two sentences:
(PRO)NOUNS   TOT.
Pedro + Él   (155+310)/2 = 232.5
tarta + la   (140+280)/2 = 210
chocolate    180/2 = 90
Juan         270/2 = 135
Algorithm for reference resolution
Example discourse
Pedro se comió una tarta de chocolate. Él se la había pedido a Juan. Le gustan los dulces.
Step 3
1. Take the third sentence: Le gustan los dulces.
2. Parse this third sentence → parsing result:
Le/PP3CSD00 gustan/VMII3P0 los/DA0MP0 dulces/NCMP000 ./.
Le/IND-OBJ gustan/VERB [los dulces]/SUBJ ./.
3. Calculate the weights for all new nouns and pronouns appearing in this third sentence:
Algorithm for reference resolution
Weights for the nouns and pronouns from the third sentence:
(PRO)NOUNS  Rec.  Subj.  Exist.  Obj.  Ind.-Obj.  Non-Adv.  Head N  TOT.
Le           100     0      0      0      40         50        80    270
dulces       100    80      0      0       0         50        80    310
Only the pronoun Le needs to be resolved to a previous noun...
Algorithm for reference resolution
Combined results of reference resolution from the first two sentences:
NOUNS        TOT.
Pedro + Él   (155+310)/2 = 232.5
tarta + la   (140+280)/2 = 210
chocolate    180/2 = 90
Juan         270/2 = 135
The singular pronoun Le (which can refer to a masculine or a feminine antecedent) is compatible with all the singular masculine and feminine nouns: Pedro, Juan, tarta, or chocolate.
→ We resolve Le to the previous noun with the highest weight, i.e. Pedro.
→ Reference resolution is complete!
The Lappin & Leass algorithm reaches nearly 90% accuracy.
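The worked example above can be sketched as one small loop: carry the weights forward, halve them after every sentence, and resolve each pronoun to the compatible antecedent with the highest weight. This is a toy model, not the full Lappin & Leass algorithm; the per-mention weights and gender filters are hard-coded from the slides' tables:

```python
def resolve(discourse):
    """discourse: list of sentences; each sentence is a list of
    (mention, weight, is_pronoun, antecedent_filter) tuples."""
    salience = {}   # noun -> accumulated weight
    links = {}      # pronoun -> resolved antecedent
    for sentence in discourse:
        for mention, weight, is_pronoun, ok in sentence:
            if is_pronoun and salience:
                # pick the compatible antecedent with the highest weight
                candidates = [n for n in salience if ok(n)]
                best = max(candidates, key=salience.get)
                links[mention] = best
                salience[best] += weight   # pronoun adds to its antecedent
            else:
                salience[mention] = weight
        # recency decay: halve every weight at the end of the sentence
        salience = {n: w / 2 for n, w in salience.items()}
    return links

feminine = {"tarta"}
s1 = [("Pedro", 310, False, None), ("tarta", 280, False, None),
      ("chocolate", 180, False, None)]
s2 = [("Él", 310, True, lambda n: n not in feminine),
      ("la", 280, True, lambda n: n in feminine),
      ("Juan", 270, False, None)]
s3 = [("Le", 270, True, lambda n: True)]
print(resolve([s1, s2, s3]))  # {'Él': 'Pedro', 'la': 'tarta', 'Le': 'Pedro'}
```

Note that this toy halves every weight after each sentence; it reproduces the slides' resolutions (Él → Pedro, la → tarta, Le → Pedro) even though some intermediate totals differ.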
NER
Named Entity Recognition
Can be broken down into two distinct problems:
1. detection of names
2. classification of the names by the type of entity to which they refer → 4 standard types:
   1. person (e.g. "Carol", "Tom Hanks", "Pedro", "Juan Carlos I", etc.)
   2. organization (e.g. "WWF", "IBM", "El Mundo", etc.)
   3. location (e.g. "Madrid", "Washington D.C.", "L.A.", "Barcelona", etc.)
   4. other (e.g. "Hotel Catalunya", etc.)
NER
Tools for Named Entity Recognition
GATE: for English, Spanish, and many more languages, via a graphical interface and a Java API (developed at the University of Sheffield, UK)
https://gate.ac.uk/
NETagger: Java-based Illinois Named Entity Recognition (developed by the Cognitive Computation Group at the University of Illinois at Urbana-Champaign)
http://cogcomp.cs.illinois.edu/page/software_view/NETagger
OpenNLP: rule-based and statistical Named Entity Recognition (developed by Apache)
http://opennlp.apache.org/index.html
Stanford CoreNLP: Java-based CRF Named Entity Recognition (developed by the Stanford Natural Language Processing Group)
http://nlp.stanford.edu/software/CRF-NER.shtml
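The two NER subproblems (detection, then classification) can be illustrated with a toy gazetteer lookup. This is a deliberately crude stand-in for the real tools above: the capitalization heuristic over-detects sentence-initial words, and the dictionary is invented for the example:

```python
import re

# Tiny hand-made gazetteer mapping names to entity types (illustrative only).
GAZETTEER = {
    "Tom Hanks": "person", "Pedro": "person",
    "IBM": "organization", "El Mundo": "organization",
    "Madrid": "location", "Barcelona": "location",
}

def toy_ner(text):
    # 1) detection: maximal runs of capitalized words
    candidates = re.findall(r"(?:[A-Z]\w*)(?: [A-Z]\w*)*", text)
    # 2) classification: look each candidate up; unknown names fall into "other"
    return [(c, GAZETTEER.get(c, "other")) for c in candidates]

print(toy_ner("Tom Hanks visited IBM in Madrid."))
# [('Tom Hanks', 'person'), ('IBM', 'organization'), ('Madrid', 'location')]
```

Statistical systems like the CRF-based Stanford NER replace both the brittle detection pattern and the fixed dictionary with features learned from an annotated training corpus.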
Keyword / topic / information extraction
Tools
Keyword extraction: e.g.
1. GATE (ANNIE tool) for English, Spanish, and many more languages, via a graphical interface and a Java API
   https://gate.ac.uk/
   → simply using JAPE files for the LUs (lexical units)
2. the pattern.vector module from CLiPS, in Python
   http://www.clips.ua.ac.be/pages/luceneapi_node/pattern.vector
Topic / information extraction: e.g. GATE (ANNIE tool) for English, Spanish, and many more languages, via a graphical interface and a Java API
→ using JAPE files for the LUs, FEs (frame elements), and FRAMES
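Keyword extraction in its simplest form is just frequency counting over content words. A minimal sketch (the stopword list is an assumption; real tools such as pattern.vector weight terms with TF-IDF against a reference corpus instead):

```python
import re
from collections import Counter

# Tiny illustrative stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "was"}

def keywords(text, n=3):
    """Return the n most frequent non-stopword tokens."""
    tokens = re.findall(r"[a-zá-ú]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [w for w, _ in counts.most_common(n)]

doc = ("The corpus is a large and structured set of texts. "
       "A training corpus is used to build rules; a test corpus "
       "is used to test the rules.")
print(keywords(doc, 1))  # ['corpus']
```

Raw frequency already surfaces "corpus" here, but it also rewards generic words; that is the gap TF-IDF-style weighting closes.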