NLP Training – Session 3
Dr. Alexandra M. Liguori
Incubio – The Big Data Academy
Barcelona, April 22, 2015
Dr. Alexandra M. Liguori NLP Training – Session 3
Welcome back!!!
Outline
1 Clarification about corpus
2 Recap: Typical NLP tasks
3 Automatic Question Answering
4 Reference resolution
5 Named Entity Recognition (NER)
6 Keyword / topic / information extraction
NLP: Ambiguities and Solutions
Corpus
Definition
Corpus = Large and structured set of texts.
NLP
Two types of corpora:
Training corpus ↔ to make the list of rules or to get the
statistical data
Test corpus ↔ to test the results found with the training
corpus
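A minimal sketch of how a corpus might be divided into the two parts, assuming the corpus is simply a list of texts held in memory. The function name split_corpus, the 20% test fraction, and the fixed seed are illustrative assumptions and are not specified in the slides.

```python
import random

def split_corpus(texts, test_fraction=0.2, seed=0):
    """Shuffle a list of texts and split it into (training, test) lists."""
    shuffled = list(texts)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical toy corpus of ten documents.
corpus = [f"document {i}" for i in range(10)]
train, test = split_corpus(corpus)
print(len(train), "training texts,", len(test), "test texts")
```

Rules or statistics would be derived from train only, and test would be kept aside until evaluation.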
Typical NLP tasks: Basic and simpler tasks
Tokenization ↔ RegEx
Sentence splitting ↔ RegEx
POS-tagging ↔ POS-tagging algorithms and tag sets
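As a rough illustration of the RegEx approach to the first two tasks, the sketch below splits a text into sentences and each sentence into tokens. Both patterns and the function names are assumptions made for this example. A production splitter would also need to handle abbreviations such as "Mrs." followed by a capitalized name.

```python
import re

def split_sentences(text):
    # Split on whitespace that follows ., ! or ? and precedes an uppercase
    # letter or an opening Spanish punctuation mark.
    return re.split(r"(?<=[.!?])\s+(?=[A-ZÁÉÍÓÚÑ¿¡])", text.strip())

def tokenize(sentence):
    # A token is either a run of word characters or one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", sentence)

text = ("Pedro se comió una tarta de chocolate. "
        "Él se la había pedido a Juan. Le gustan los dulces.")
for sentence in split_sentences(text):
    print(tokenize(sentence))
```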
Typical NLP tasks: Complex tasks
Lemmatization or Stemming ↔ Implementations of the Porter Stemmer (e.g. in Java), Stanford NLP tool, GATE, ...
Syntactic parsing ↔ Earley algorithm, CYK algorithm, GHR algorithm, Stanford Parser (Java implementation of a probabilistic algorithm)
Question answering, Topic extraction, NER, Semantic analysis, ... ↔ Ad hoc tools, e.g. dictionaries, ontologies, Frames, GATE, NLTK, Lappin & Leass algorithm, ...
Question Answering
Video on Bush Jr. and Condoleezza Rice from Who’s on first
Simple Question Answering
ELIZA
User_1: Men are all alike.
ELIZA_1: IN WHAT WAY
User_2: They’re always bugging us about something or other.
ELIZA_2: CAN YOU THINK OF A SPECIFIC EXAMPLE
User_3: Well, my boyfriend made me come here.
ELIZA_3: YOUR BOYFRIEND MADE YOU COME HERE
User_4: Yes, he says I’m depressed much of the time.
ELIZA_4: I AM SORRY TO HEAR YOU ARE DEPRESSED.
ELIZA
Regular expression substitutions
change all instances of my to YOUR, and I’m to YOU ARE,
etc., e.g.:
1 User_3: Well, my boyfriend made me come here.
ELIZA_3: YOUR BOYFRIEND MADE YOU COME HERE
2 User_4: ... I’m depressed ... .
ELIZA_4: ... YOU ARE DEPRESSED.
ELIZA
Regular expression substitutions
relevant patterns in the input → create an appropriate
output; e.g.:
1 s/.* YOU ARE (depressed|sad) .*/I AM SORRY TO HEAR YOU ARE \1/
2 s/.* YOU ARE (depressed|sad) .*/WHY DO YOU THINK YOU ARE \1/
3 s/.* all .*/IN WHAT WAY/
4 s/.* always .*/CAN YOU THINK OF A SPECIFIC EXAMPLE/
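The two kinds of substitution can be combined into a small responder. This is a minimal sketch, not the original ELIZA program: the "me" swap, the padding with spaces, and the fallback that drops a leading interjection are assumptions added so that the four turns of the sample dialogue can be reproduced.

```python
import re

# Stage 1: first-person forms become second-person forms.
# The slides name "my" and "I'm"; the "me" swap is an assumption.
PRONOUN_SWAPS = [
    (r"\bI['’]m\b", "YOU ARE"),
    (r"\bmy\b", "YOUR"),
    (r"\bme\b", "YOU"),
]

# Stage 2: ordered response rules; the first matching rule wins.
RESPONSE_RULES = [
    (r".* YOU ARE (depressed|sad) .*", r"I AM SORRY TO HEAR YOU ARE \1"),
    (r".* all .*", "IN WHAT WAY"),
    (r".* always .*", "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
]

def respond(utterance):
    text = utterance
    for pattern, replacement in PRONOUN_SWAPS:
        text = re.sub(pattern, replacement, text)
    # Pad with spaces so that patterns of the form " word " can also
    # match a word at the start or end of the utterance.
    padded = f" {text} "
    for pattern, template in RESPONSE_RULES:
        match = re.fullmatch(pattern, padded)
        if match:
            return match.expand(template).upper()
    # Hypothetical fallback: drop a leading interjection, echo the rest.
    echoed = re.sub(r"^(?:Well|Yes|Oh),\s*", "", text)
    return echoed.rstrip(" .!?").upper()

for line in ["Men are all alike.",
             "Well, my boyfriend made me come here.",
             "Yes, he says I'm depressed much of the time."]:
    print(respond(line))
```

Because the rules are tried in order, only the first of the two "YOU ARE (depressed|sad)" rules is included here. A fuller system would need some way to choose among competing responses.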
Quizlyse Example
1) Input
Affirmative sentence, e.g.
Cristiano chuta el balón. (Cristiano kicks the ball.)
2) Intermediate output
Parsed text:
Cristiano/NPMS000 chuta/VMIS3S0 el/DI0MS0 balón/NCMS000 ./.
Cristiano/SUBJ chuta/VERB [el balón]/DIRECT-OBJ ./.
Quizlyse Example
3) Substitutions
Relevant patterns in the input → create an appropriate output;
e.g.:
1 s/.* (NPMS000) (VMIS3S0) (DI0MS0 NCMS000) .*/Qué \2 \1 ?/
2 SUBJ VERB DIRECT-OBJ → Qué VERB SUBJ ?
4) Final Output
Automatically generated question as output; e.g.:
Qué chuta Cristiano? (What does Cristiano kick?)
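The substitution step can be sketched as a single regular expression over the tagged text. This is an illustrative reconstruction and not the Quizlyse implementation: the pattern name, the function name, and the word/TAG input format with single spaces are assumptions based on the parsed text shown in the example.

```python
import re

# Tag sequence from the example: proper noun, verb, determiner, common noun.
# Each group captures the word that carries the tag.
SVO_PATTERN = re.compile(
    r"(\S+)/NPMS000 (\S+)/VMIS3S0 (\S+)/DI0MS0 (\S+)/NCMS000"
)

def make_question(tagged_sentence):
    """Return a 'Qué VERB SUBJ?' question, or None if the pattern is absent."""
    match = SVO_PATTERN.search(tagged_sentence)
    if match is None:
        return None
    subject, verb = match.group(1), match.group(2)
    # The direct object (groups 3 and 4) is what the question asks about,
    # so it is left out of the output.
    return f"Qué {verb} {subject}?"

tagged = "Cristiano/NPMS000 chuta/VMIS3S0 el/DI0MS0 balon/NCMS000 ./."
print(make_question(tagged))
```

A pattern tied to exact tags covers only one sentence shape. Matching on the role labels (SUBJ, VERB, DIRECT-OBJ) would generalize to other genders and numbers.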
Reference resolution
Discourse
Gracie: Oh yeah... And then Mr. and Mrs. Jones were having
matrimonial trouble, and my brother was hired to watch Mrs. Jones.
George: Well, I imagine she was a very attractive woman.
Gracie: She was, and my brother watched her day and night for six
months.
George: Well, what happened?
Gracie: She finally got a divorce.
George: Mrs. Jones?
Gracie: No, my brother’s wife.
Jordi se fue al restaurante de Xavi para comer pescado. Este
estaba fresco y le gustó. (Jordi went to Xavi's restaurant to eat fish. It
was fresh and he liked it.)
Reference resolution
1 Reference phenomena
2 Constraints on coreference
3 Preferences in pronoun interpretation
4 Example of algorithm for pronoun resolution
Reference resolution
Reference phenomena
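1 Indefinite noun phrases ↔ Pedro comió unos pasteles
ayer. (Pedro ate some pastries yesterday.)
2 Definite noun phrases ↔ Pedro comió unos pasteles
ayer. Los pasteles eran muy dulces. (Pedro ate some pastries
yesterday. The pastries were very sweet.)
3 Pronouns ↔ Ayer Pedro comió unos pasteles que eran
muy dulces. (Yesterday Pedro ate some pastries that were very sweet.)
4 Demonstratives ↔ Pedro hizo unos pasteles: estos son
de chocolate, aquellos son de almendra. (Pedro made some pastries:
these are chocolate, those are almond.)
5 Anaphora with uno/una/unos/unas ↔ Ayer Pedro hizo
una tarta. Hoy quiero hacer una yo también. (Yesterday Pedro made
a cake. Today I want to make one too.)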
Reference resolution
Constraints on coreference
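1 Number agreement ↔ Los pasteles que comí ayer los
hizo Ana. / Los pasteles que comí ayer lo hizo Ana.
(The pastries I ate yesterday, Ana made them. / ... Ana made it.
Singular lo cannot refer to plural pasteles.)
2 Person and case agreement ↔ Ana y Carmen hicieron
unos pasteles. Les gustan. (Ana and Carmen made some pastries.
They like them.)
3 Gender agreement ↔ La tarta que comí ayer la hizo Ana.
/ La tarta que comí ayer lo hizo Ana. (The cake I ate yesterday,
Ana made it. Masculine lo cannot refer to feminine tarta.)
4 Syntactic constraints ↔ Ana se hizo una tarta. / Ana le
hizo una tarta. (Ana made herself a cake. / Ana made him or her
a cake. Reflexive se must refer to Ana; le cannot.)
5 Selectional restrictions ↔ Ana puso el pastel en el
horno. Es redondo. (Ana put the pastry in the oven. It is round.)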
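The number and gender constraints can be read as a filter that removes impossible antecedents before any preference is applied. The sketch below assumes hand-assigned features; the Mention class, the feature codes, and the function names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    text: str
    gender: str  # "m", "f", or "c" (common: compatible with both)
    number: str  # "s" (singular) or "p" (plural)

def agrees(pronoun, candidate):
    """True if the candidate matches the pronoun in number and gender."""
    same_number = pronoun.number == candidate.number
    same_gender = (
        "c" in (pronoun.gender, candidate.gender)
        or pronoun.gender == candidate.gender
    )
    return same_number and same_gender

def filter_candidates(pronoun, candidates):
    return [c for c in candidates if agrees(pronoun, c)]

# Features assigned by hand for this example.
candidates = [Mention("pasteles", "m", "p"), Mention("tarta", "f", "s")]
for pronoun in [Mention("los", "m", "p"), Mention("la", "f", "s"),
                Mention("lo", "m", "s")]:
    names = [c.text for c in filter_candidates(pronoun, candidates)]
    print(pronoun.text, "->", names)
```

Syntactic constraints and selectional restrictions need a parse and lexical knowledge, so they are left out of this sketch.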
Reference resolution
Preferences in pronoun interpretation
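1 Recency ↔ Pedro hizo un pastel. Juan hizo una tarta. A
Ana le gusta. (Pedro made a pastry. Juan made a cake. Ana likes it.)
2 Grammatical role ↔ Pedro hizo un pastel con Juan. Él se
lo comió todo. / Juan hizo un pastel con Pedro. Él se lo
comió todo. (Pedro made a pastry with Juan. He ate it all. / Juan
made a pastry with Pedro. He ate it all.)
3 Repeated mention ↔ Anne needed a car to drive to her
new job. She decided she wanted something roomy. Carol
went to the Honda dealership with her. She bought a Civic.
4 Parallelism ↔ Pedro llamó a Juan por la mañana. Carlos le
llamó por la tarde. (Pedro called Juan in the morning. Carlos
called him in the afternoon.)
5 Verb semantics ↔ Pedro hizo un pastel para Juan. Le
gustan los dulces. / Pedro pidió un pastel a Juan. Le
gustan los dulces. (Pedro made a pastry for Juan. He likes sweets. /
Pedro asked Juan for a pastry. He likes sweets.)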
Reference resolution
Algorithm for pronoun resolution (Lappin & Leass, 1994)
1 Divide discourse into sentences and analyze one sentence at a time ↔ Sentence splitting, Tokenization
2 Parse 1st sentence and identify nouns and pronouns ↔ POS-tagging
3 Assign weights to all nouns and pronouns ↔ Lappin & Leass weights
4 Reference pronoun to noun with highest weight, otherwise, if there are no pronouns, divide all weights by 2 ↔ Lappin & Leass algorithm
5 Proceed to 2nd sentence and repeat all steps as above, adding all the weights along the way
Algorithm for pronoun resolution
Weighting scheme ↔ recency and syntactic preferences (Lappin
& Leass, 1994):
1 Sentence recency ↔ 100
2 Subject emphasis ↔ 80
e.g. El pastel está en la mesa de la cocina. (The pastry is on the kitchen table.)
3 Existential emphasis ↔ 70
e.g. Hay un pastel en la mesa de la cocina. (There is a pastry on the kitchen table.)
4 Direct object emphasis ↔ 50
e.g. Ana hizo un pastel ayer. (Ana made a pastry yesterday.)
5 Indirect object emphasis ↔ 40
e.g. Ana regaló el pastel a Carmen. (Ana gave the pastry to Carmen.)
6 Non-adverbial emphasis ↔ 50
e.g. Ana puso un poco de chocolate en el pastel. (Ana put a little chocolate on the pastry.)
7 Head noun emphasis ↔ 80
e.g. El libro de recetas para el pastel de chocolate está en la
mesa de la cocina. (The recipe book for the chocolate pastry is on the kitchen table.)
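The weighting scheme amounts to a lookup table plus a sum over the factors that apply to a mention. The sketch below uses the seven weights listed above; the dictionary keys and the function name are illustrative, and the factor lists are entered by hand for the sentence "Pedro se comió una tarta de chocolate." rather than derived from a parser.

```python
# Salience weights from the weighting scheme (Lappin & Leass, 1994).
WEIGHTS = {
    "recency": 100,
    "subject": 80,
    "existential": 70,
    "direct_object": 50,
    "indirect_object": 40,
    "non_adverbial": 50,
    "head_noun": 80,
}

def salience(factors):
    """Sum the weights of all factors that apply to one mention."""
    return sum(WEIGHTS[factor] for factor in factors)

# Hand-assigned factors for "Pedro se comió una tarta de chocolate."
mentions = {
    "Pedro": ["recency", "subject", "non_adverbial", "head_noun"],
    "tarta": ["recency", "direct_object", "non_adverbial", "head_noun"],
    "chocolate": ["recency", "head_noun"],
}
for name, factors in mentions.items():
    print(name, salience(factors))
```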
Algorithm for reference resolution
Example discourse
Pedro se comió una tarta de chocolate. Él se la había pedido a
Juan. Le gustan los dulces.
(Pedro ate a chocolate cake. He had asked Juan for it. He likes sweets.)
Step 1
1 Take first sentence: Pedro se comió una tarta de chocolate.
2 Parse this first sentence → parsing result:
Pedro/NP000P0 se/PP3CN000 comió/VMIS3S0
una/DI0FS0 tarta/NCFS000 de/SPS00
chocolate/NCMS000 ./.
Pedro/SUBJ [se comió]/VERB [una tarta]/OBJ
de chocolate/COMPL ./.
3 Calculate weights for all nouns and pronouns appearing in this
first sentence:
Algorithm for reference resolution
Weights for the nouns and pronouns from the first
sentence:
(PRO)NOUNS  Rec.  Subj.  Exist.  Obj.  Ind.-Obj.  Non-Adv.  Head N  TOT.
Pedro       100   80     0       0     0          50        80      310
tarta       100   0      0       50    0          50        80      280
chocolate   100   0      0       0     0          0         80      180
No pronouns whose reference needs to be resolved →
divide all the results by 2:
(PRO)NOUNS  TOT.
Pedro       310/2 = 155
tarta       280/2 = 140
chocolate   180/2 = 90
Algorithm for reference resolution
Example discourse
Pedro se comió una tarta de chocolate. Él se la había pedido a
Juan. Le gustan los dulces.
Step 2
1 Take second sentence: Él se la había pedido a Juan.
2 Parse this second sentence → parsing result:
Él/PP3MS000 se/PP3CN000 la/PP3FSA00 había/VAII3S0
pedido/VMP00SM a/SPS00 Juan/NP000P0 ./.
Él/SUBJ se/IND-OBJ la/OBJ [había pedido]/VERB [a Juan]/IND-OBJ ./.
3 Calculate weights for all new nouns and pronouns appearing in
this second sentence:
Algorithm for reference resolution
Weights for the nouns and pronouns from the second
sentence:
(PRO)NOUNS  Rec.  Subj.  Exist.  Obj.  Ind.-Obj.  Non-Adv.  Head N  TOT.
Él          100   80     0       0     0          50        80      310
la          100   0      0       50    0          50        80      280
Juan        100   0      0       0     40         50        80      270
The two pronouns Él and la must be resolved to nouns
from the first sentence
Algorithm for reference resolution
Results from first two sentences:
(PRO)NOUNS  TOT.
Pedro       310/2 = 155
tarta       280/2 = 140
chocolate   180/2 = 90
(PRO)NOUNS  TOT.
Él          310
la          280
Juan        270
1 the pronoun la is resolved to the noun tarta because of gender
constraints (tarta is the only feminine noun here)
2 the pronoun Él is resolved to the masculine noun from the previous
sentence with the highest value, i.e. Pedro
Algorithm for reference resolution
Combined results of reference from first two sentences:
(PRO)NOUNS  TOT.
Pedro + Él  (155+310)/2 = 232.5
tarta + la  (140+280)/2 = 210
chocolate   90/2 = 45
Juan        270/2 = 135
Algorithm for reference resolution
Example discourse
Pedro se comió una tarta de chocolate. Él se la había pedido a
Juan. Le gustan los dulces.
Step 3
1 Take third sentence: Le gustan los dulces.
2 Parse this third sentence → parsing result:
Le/PP3CSD00 gustan/VMII3P0 los/DA0MP0
dulces/NCMP000 ./.
Le/IND-OBJ gustan/VERB [los dulces]/SUBJ ./.
3 Calculate weights for all new nouns and pronouns appearing in
this third sentence:
Algorithm for reference resolution
Weights for the nouns and pronouns from the third
sentence:
(PRO)NOUNS  Rec.  Subj.  Exist.  Obj.  Ind.-Obj.  Non-Adv.  Head N  TOT.
Le          100   0      0       0     40         50        80      270
dulces      100   80     0       0     0          50        80      310
Only the pronoun Le needs to be resolved to a
previous noun...
Algorithm for reference resolution
Combined results of reference from the first two sentences:
NOUNS        TOT.
Pedro + Él   (155+310)/2 = 232.5
tarta + la   (140+280)/2 = 210
chocolate    180/2 = 90
Juan         220/2 = 110
The singular pronoun Le is unmarked for gender, so it could refer
to any singular noun, masculine or feminine: Pedro, Juan, tarta,
or chocolate.
We resolve Le to the previous noun with the highest weight, i.e.
Pedro.
Reference resolution is complete!
The Lappin & Leass algorithm has nearly 90% accuracy.
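The final step can be sketched in a few lines of Python: halve the accumulated weights when moving to the third sentence, keep only the candidates that agree with the pronoun, and pick the one with the highest weight. This is a simplified illustration, not the full Lappin & Leass procedure; the `features` table and the helper names `halve`, `agrees`, and `resolve` are assumptions made for this example.

```python
# Salience accumulated over the first two sentences (totals from the slide).
accumulated = {
    "Pedro": 155 + 310,      # Pedro + Él
    "tarta": 140 + 280,      # tarta + la
    "chocolate": 180,
    "Juan": 220,
}

# Hypothetical agreement features; gender None means "unmarked".
features = {
    "Pedro": {"number": "sg", "gender": "m"},
    "tarta": {"number": "sg", "gender": "f"},
    "chocolate": {"number": "sg", "gender": "m"},
    "Juan": {"number": "sg", "gender": "m"},
}
le = {"number": "sg", "gender": None}

def halve(salience):
    """Halve every weight when moving on to the next sentence."""
    return {noun: weight / 2 for noun, weight in salience.items()}

def agrees(pronoun, noun):
    """Check number agreement and, if the pronoun is marked, gender agreement."""
    same_number = pronoun["number"] == noun["number"]
    same_gender = pronoun["gender"] is None or pronoun["gender"] == noun["gender"]
    return same_number and same_gender

def resolve(pronoun, salience):
    """Return the agreeing candidate with the highest salience."""
    candidates = [n for n in salience if agrees(pronoun, features[n])]
    return max(candidates, key=salience.get)

decayed = halve(accumulated)
antecedent = resolve(le, decayed)
print(decayed)
print(antecedent)  # Pedro
```

Because Le is unmarked for gender, all four nouns remain candidates, and the choice is decided by salience alone.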
NER
Named Entity Recognition
Can be broken down into two distinct problems:
1 detection of names
2 classification of the names by the type of entity to which
they refer → 4 standard types:
1 person (e.g. "Carol", "Tom Hanks", "Pedro", "Juan Carlos I",
etc.)
2 organization (e.g. "WWF", "IBM", "El Mundo", etc.)
3 location (e.g. "Madrid", "Washington D.C.", "L.A.",
"Barcelona", etc.)
4 other (e.g. "Hotel Catalunya", etc.)
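The two sub-problems can be illustrated with a toy Python sketch: a regular expression detects runs of capitalized words, and a small hand-made gazetteer classifies them, falling back to "other". Both the regex heuristic and the `GAZETTEER` dictionary are assumptions for illustration only; real NER tools rely on much richer rules or statistical models.

```python
import re

# Hypothetical gazetteer mapping known names to entity types.
GAZETTEER = {
    "Pedro": "person",
    "Tom Hanks": "person",
    "IBM": "organization",
    "Barcelona": "location",
}

# Step 1 heuristic: a name is a run of one or more capitalized words.
NAME_PATTERN = re.compile(r"\b[A-Z]\w*(?:\s+[A-Z]\w*)*")

def detect_names(text):
    """Step 1: detect candidate names in the text."""
    return NAME_PATTERN.findall(text)

def classify_name(name):
    """Step 2: classify a name by entity type, defaulting to 'other'."""
    return GAZETTEER.get(name, "other")

text = "Pedro met Tom Hanks at Hotel Catalunya in Barcelona, near the IBM office."
entities = [(name, classify_name(name)) for name in detect_names(text)]
print(entities)
```

A capitalization heuristic like this over-generates (any sentence-initial word is capitalized) and misses abbreviations such as "L.A.", which is one reason dedicated tools are used in practice.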
NER
Tools for Named Entity Recognition
GATE: for English, Spanish, and many more languages, via graphical
interface and Java API (developed at the University of
Sheffield, UK)
https://gate.ac.uk/
NETagger: Java-based Illinois Named Entity Recognition
(developed by the Cognitive Computation Group at the University of
Illinois at Urbana-Champaign)
http://cogcomp.cs.illinois.edu/page/software_view/NETagger
OpenNLP: rule-based and statistical Named Entity Recognition
(developed by Apache)
http://opennlp.apache.org/index.html
Stanford CoreNLP: Java-based CRF Named Entity Recognition
(developed by the Stanford Natural Language Processing Group)
http://nlp.stanford.edu/software/CRF-NER.shtml
Keyword / topic / information extraction
Tools
Keyword extraction: e.g.
1 GATE (ANNIE tool) for English, Spanish, and many more languages,
via graphical interface and Java API
https://gate.ac.uk/
→ simply using JAPE files for the LUs (lexical units)
2 pattern.vector module from CLiPS in Python
http://www.clips.ua.ac.be/pages/luceneapi_node/pattern.vector
Topic / information extraction: e.g. GATE (ANNIE tool)
for English, Spanish, and many more languages, via graphical
interface and Java API
→ using JAPE files for the LUs, FEs (frame elements), and FRAMES
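To give a feel for statistical keyword extraction, here is a minimal TF-IDF sketch using only the Python standard library. It does not use the API of GATE or of pattern.vector, and it is a different technique from the rule-based JAPE approach; the function names and the three example documents are assumptions made for this illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def tfidf_keywords(docs, index, top_n=2):
    """Return the top_n words of docs[index] ranked by TF-IDF."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each word occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    target = tokenized[index]
    tf = Counter(target)
    scores = {
        word: (count / len(target)) * math.log(len(docs) / df[word])
        for word, count in tf.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]

docs = [
    "Pedro ate a chocolate cake. The chocolate cake was sweet.",
    "Juan ate a salad.",
    "The restaurant was open.",
]
keywords = tfidf_keywords(docs, 0)
print(keywords)  # ['cake', 'chocolate']
```

Words that are frequent in the target document but rare in the rest of the collection score highest, which is why "cake" and "chocolate" outrank common words such as "the" or "ate".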
What next?
Another practical session on GATE this summer?
  • 1. NLP Training – Session 3 Dr. Alexandra M. Liguori Incubio – The Big Data Academy Barcelona, April 22, 2015 Dr. Alexandra M. Liguori NLP Training – Session 3
  • 2. Welcome back!!! Dr. Alexandra M. Liguori NLP Training – Session 3
  • 3. Outline 1 Clarification about corpus 2 Recap: Typical NLP tasks 3 Automatic Question Answering 4 Reference resolution 5 Named Entity Recognition (NER) 6 Keyword / topic / information extraction Dr. Alexandra M. Liguori NLP Training – Session 3
  • 4. NLP: Ambiguities and Solutions Dr. Alexandra M. Liguori NLP Training – Session 3
  • 5. NLP: Ambiguities and Solutions Dr. Alexandra M. Liguori NLP Training – Session 3
  • 6. Corpus Definition Corpus = Large and structured set of texts. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 7. Corpus Definition Corpus = Large and structured set of texts. NLP Two types of corpora: Training corpus ↔ to make the list of rules or to get the statistical data Test corpus ↔ to test the results found with the training corpus Dr. Alexandra M. Liguori NLP Training – Session 3
  • 8. Typical NLP tasks: Basic and simpler tasks Dr. Alexandra M. Liguori NLP Training – Session 3
  • 9. Typical NLP tasks: Basic and simpler tasks Tokenization Dr. Alexandra M. Liguori NLP Training – Session 3
  • 10. Typical NLP tasks: Basic and simpler tasks Tokenization RegEx Dr. Alexandra M. Liguori NLP Training – Session 3
  • 11. Typical NLP tasks: Basic and simpler tasks Tokenization RegEx Sentence splitting Dr. Alexandra M. Liguori NLP Training – Session 3
  • 12. Typical NLP tasks: Basic and simpler tasks Tokenization RegEx Sentence splitting RegEx Dr. Alexandra M. Liguori NLP Training – Session 3
  • 13. Typical NLP tasks: Basic and simpler tasks Tokenization RegEx Sentence splitting RegEx POS-tagging Dr. Alexandra M. Liguori NLP Training – Session 3
  • 14. Typical NLP tasks: Basic and simpler tasks Tokenization RegEx Sentence splitting RegEx POS-tagging POS-tagging algorithms and tag sets Dr. Alexandra M. Liguori NLP Training – Session 3
  • 15. Typical NLP tasks: Complex tasks Dr. Alexandra M. Liguori NLP Training – Session 3
  • 16. Typical NLP tasks: Complex tasks Lemmatization or Stemming Dr. Alexandra M. Liguori NLP Training – Session 3
  • 17. Typical NLP tasks: Complex tasks Lemmatization or Stemming Implementations of Porter Stemmer (e.g. in Java), Stanford NLP tool, GATE, ... Dr. Alexandra M. Liguori NLP Training – Session 3
  • 18. Typical NLP tasks: Complex tasks Lemmatization or Stemming Implementations of Porter Stemmer (e.g. in Java), Stanford NLP tool, GATE, ... Syntactic parsing Dr. Alexandra M. Liguori NLP Training – Session 3
  • 19. Typical NLP tasks: Complex tasks Lemmatization or Stemming Implementations of Porter Stemmer (e.g. in Java), Stanford NLP tool, GATE, ... Syntactic parsing Early algorithm, CYK algorithm, GHR algorithm, Stanford Parser (Java implementation of probabilistic algorithm) Dr. Alexandra M. Liguori NLP Training – Session 3
  • 20. Typical NLP tasks: Complex tasks Lemmatization or Stemming Implementations of Porter Stemmer (e.g. in Java), Stanford NLP tool, GATE, ... Syntactic parsing Early algorithm, CYK algorithm, GHR algorithm, Stanford Parser (Java implementation of probabilistic algorithm) Question answering Topic extraction NER Semantic analysis ... Dr. Alexandra M. Liguori NLP Training – Session 3
  • 21. Typical NLP tasks: Complex tasks Lemmatization or Stemming Implementations of Porter Stemmer (e.g. in Java), Stanford NLP tool, GATE, ... Syntactic parsing Early algorithm, CYK algorithm, GHR algorithm, Stanford Parser (Java implementation of probabilistic algorithm) Question answering Topic extraction NER Semantic analysis ... Ad hoc tools, e.g. dictionaries, ontologies, Frames, GATE, NLTK, Lappin & Leass algorithm... Dr. Alexandra M. Liguori NLP Training – Session 3
  • 22. Question Answering Video on Bush Jr. and Condoleezza Rice from Who’s on first Dr. Alexandra M. Liguori NLP Training – Session 3
  • 23. Question Answering Dr. Alexandra M. Liguori NLP Training – Session 3
  • 24. Simple Question Answering ELIZA User_1: Men are all alike. ELIZA_1: IN WHAT WAY User_2: They’re always bugging us about something or other.. ELIZA_2: CAN YOU THINK OF A SPECIFIC EXAMPLE User_3: Well, my boyfriend made me come here. ELIZA_3: YOUR BOYFRIEND MADE YOU COME HERE User_4: Yes, he says I’m depressed much of the time. ELIZA_4: I AM SORRY TO HEAR YOU ARE DEPRESSED. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 25. ELIZA Regular expression substitutions change all instances of my to YOUR, and I’m to YOU ARE, etc., e.g.: Dr. Alexandra M. Liguori NLP Training – Session 3
  • 26. ELIZA Regular expression substitutions change all instances of my to YOUR, and I’m to YOU ARE, etc., e.g.: 1 User_3: Well, my boyfriend made me come here. ELIZA_3: YOUR BOYFRIEND MADE YOU COME HERE Dr. Alexandra M. Liguori NLP Training – Session 3
  • 27. ELIZA Regular expression substitutions change all instances of my to YOUR, and I’m to YOU ARE, etc., e.g.: 1 User_3: Well, my boyfriend made me come here. ELIZA_3: YOUR BOYFRIEND MADE YOU COME HERE 2 User_4: ... I’m depressed ... . ELIZA_4: ... YOU ARE DEPRESSED. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 28. ELIZA Regular expression substitutions relevant patterns in the input → creat an appropriate output; e.g.: Dr. Alexandra M. Liguori NLP Training – Session 3
  • 29. ELIZA Regular expression substitutions relevant patterns in the input → creat an appropriate output; e.g.: 1 s/.* YOU ARE (depressed | sad) .*/I AM SORRY TO HEAR YOU ARE 1 / Dr. Alexandra M. Liguori NLP Training – Session 3
  • 30. ELIZA Regular expression substitutions relevant patterns in the input → creat an appropriate output; e.g.: 1 s/.* YOU ARE (depressed | sad) .*/I AM SORRY TO HEAR YOU ARE 1 / 2 s/.* YOU ARE (depressed | sad) .*/WHY DO YOU THINK YOU ARE 1 / Dr. Alexandra M. Liguori NLP Training – Session 3
  • 31. ELIZA Regular expression substitutions relevant patterns in the input → creat an appropriate output; e.g.: 1 s/.* YOU ARE (depressed | sad) .*/I AM SORRY TO HEAR YOU ARE 1 / 2 s/.* YOU ARE (depressed | sad) .*/WHY DO YOU THINK YOU ARE 1 / 3 s/.* all .*/IN WHAT WAY/ Dr. Alexandra M. Liguori NLP Training – Session 3
  • 32. ELIZA Regular expression substitutions relevant patterns in the input → creat an appropriate output; e.g.: 1 s/.* YOU ARE (depressed | sad) .*/I AM SORRY TO HEAR YOU ARE 1 / 2 s/.* YOU ARE (depressed | sad) .*/WHY DO YOU THINK YOU ARE 1 / 3 s/.* all .*/IN WHAT WAY/ 4 s/.* always .*/CAN YOU THINK OF A SPECIFIC EXAMPLE/ Dr. Alexandra M. Liguori NLP Training – Session 3
  • 33. Quizlyse Example Dr. Alexandra M. Liguori NLP Training – Session 3
  • 34. Quizlyse Example Dr. Alexandra M. Liguori NLP Training – Session 3
  • 35. Quizlyse Example 1) Input Affirmative sentence, e.g. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 36. Quizlyse Example 1) Input Affirmative sentence, e.g. Cristiano chuta el balon. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 37. Quizlyse Example 1) Input Affirmative sentence, e.g. Cristiano chuta el balon. 2) Intermediate output Parsed text: Dr. Alexandra M. Liguori NLP Training – Session 3
  • 38. Quizlyse Example 1) Input Affirmative sentence, e.g. Cristiano chuta el balon. 2) Intermediate output Parsed text: Cristiano/NPMS000 chuta/VMIS3S0 el/DI0MS0 balon/NCMS000 ./. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 39. Quizlyse Example 1) Input Affirmative sentence, e.g. Cristiano chuta el balon. 2) Intermediate output Parsed text: Cristiano/NPMS000 chuta/VMIS3S0 el/DI0MS0 balon/NCMS000 ./. Cristiano/SUBJ chuta/VERB [el balon]/DIRECT-OBJ ./. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 40. Quizlyse Example Dr. Alexandra M. Liguori NLP Training – Session 3
  • 41. Quizlyse Example 3) Substitutions Relevant patterns in the input → create an appropriate output; e.g.: Dr. Alexandra M. Liguori NLP Training – Session 3
  • 42. Quizlyse Example 3) Substitutions Relevant patterns in the input → create an appropriate output; e.g.: 1 s/.* (NPMS000) (VMIS3S0) (DI0MS0 NCMS000) . */Qué 2 1 ? / Dr. Alexandra M. Liguori NLP Training – Session 3
  • 43. Quizlyse Example 3) Substitutions Relevant patterns in the input → create an appropriate output; e.g.: 1 s/.* (NPMS000) (VMIS3S0) (DI0MS0 NCMS000) . */Qué 2 1 ? / 2 SUBJ VERB DIRECT-OBJ → Qué VERB SUBJ ? Dr. Alexandra M. Liguori NLP Training – Session 3
  • 44. Quizlyse Example 3) Substitutions Relevant patterns in the input → create an appropriate output; e.g.: 1 s/.* (NPMS000) (VMIS3S0) (DI0MS0 NCMS000) . */Qué 2 1 ? / 2 SUBJ VERB DIRECT-OBJ → Qué VERB SUBJ ? 4) Final Output Automatically generated question as output; e.g.: Dr. Alexandra M. Liguori NLP Training – Session 3
  • 45. Quizlyse Example 3) Substitutions Relevant patterns in the input → create an appropriate output; e.g.: 1 s/.* (NPMS000) (VMIS3S0) (DI0MS0 NCMS000) . */Qué 2 1 ? / 2 SUBJ VERB DIRECT-OBJ → Qué VERB SUBJ ? 4) Final Output Automatically generated question as output; e.g.: Qué chuta Cristiano? Dr. Alexandra M. Liguori NLP Training – Session 3
  • 46. Reference resolution Discourse Dr. Alexandra M. Liguori NLP Training – Session 3
  • 47. Reference resolution Discourse Gracie: Oh yeah... And then Mr. and Mrs. Jones were having matrimonial trouble, and my brother was hired to watch Mrs. Jones. George: Well, I imagine she was a very attractive woman. Gracie: She was, and my brother watched her day and night for six months. George: Well, what happened? Gracie: She finally got a divorce. George: Mrs. Jones? Gracie: No, my brother’s wife. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 48. Reference resolution Discourse Gracie: Oh yeah... And then Mr. and Mrs. Jones were having matrimonial trouble, and my brother was hired to watch Mrs. Jones. George: Well, I imagine she was a very attractive woman. Gracie: She was, and my brother watched her day and night for six months. George: Well, what happened? Gracie: She finally got a divorce. George: Mrs. Jones? Gracie: No, my brother’s wife. Jordi se fué al restaurante de Xavi para comer pescado. Este estaba fresco y le gustó. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 49. Reference resolution Dr. Alexandra M. Liguori NLP Training – Session 3
  • 50. Reference resolution 1 Reference phenomena Dr. Alexandra M. Liguori NLP Training – Session 3
  • 51. Reference resolution 1 Reference phenomena 2 Constraints on coreference Dr. Alexandra M. Liguori NLP Training – Session 3
  • 52. Reference resolution 1 Reference phenomena 2 Constraints on coreference 3 Preferences in pronoun interpretation Dr. Alexandra M. Liguori NLP Training – Session 3
  • 53. Reference resolution 1 Reference phenomena 2 Constraints on coreference 3 Preferences in pronoun interpretation 4 Example of algorithm for pronoun resolution Dr. Alexandra M. Liguori NLP Training – Session 3
  • 54. Reference resolution Reference phenomena Dr. Alexandra M. Liguori NLP Training – Session 3
  • 55. Reference resolution Reference phenomena 1 Indefinite noun phrases ↔ Pedro comió unos pasteles ayer. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 56. Reference resolution Reference phenomena 1 Indefinite noun phrases ↔ Pedro comió unos pasteles ayer. 2 Definite noun phrases ↔ Pedro comió unos pasteles ayer. Los pasteles eran muy dulces. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 57. Reference resolution Reference phenomena 1 Indefinite noun phrases ↔ Pedro comió unos pasteles ayer. 2 Definite noun phrases ↔ Pedro comió unos pasteles ayer. Los pasteles eran muy dulces. 3 Pronouns ↔ Ayer Pedro comió unos pasteles que eran muy dulces. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 58. Reference resolution Reference phenomena 1 Indefinite noun phrases ↔ Pedro comió unos pasteles ayer. 2 Definite noun phrases ↔ Pedro comió unos pasteles ayer. Los pasteles eran muy dulces. 3 Pronouns ↔ Ayer Pedro comió unos pasteles que eran muy dulces. 4 Demonstratives ↔ Pedro hizo unos pasteles: estos son de chocolate, aquellos son de almendra. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 59. Reference resolution Reference phenomena 1 Indefinite noun phrases ↔ Pedro comió unos pasteles ayer. 2 Definite noun phrases ↔ Pedro comió unos pasteles ayer. Los pasteles eran muy dulces. 3 Pronouns ↔ Ayer Pedro comió unos pasteles que eran muy dulces. 4 Demonstratives ↔ Pedro hizo unos pasteles: estos son de chocolate, aquellos son de almendra. 5 Anaphora con uno/una/unos/unas ↔ Ayer Pedro hizo una tarta. Hoy quiero hacer una yo también. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 60. Reference resolution Constraints on coreference Dr. Alexandra M. Liguori NLP Training – Session 3
  • 61. Reference resolution Constraints on coreference 1 Number agreement ↔ Los pasteles que comí ayer los hizo Ana. / Los pasteles que comí ayer lo hizo Ana. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 62. Reference resolution Constraints on coreference 1 Number agreement ↔ Los pasteles que comí ayer los hizo Ana. / Los pasteles que comí ayer lo hizo Ana. 2 Person and case agreement ↔ Ana y Carmen hicieron unos pastels. Les gustan. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 63. Reference resolution Constraints on coreference 1 Number agreement ↔ Los pasteles que comí ayer los hizo Ana. / Los pasteles que comí ayer lo hizo Ana. 2 Person and case agreement ↔ Ana y Carmen hicieron unos pastels. Les gustan. 3 Gender agreement ↔ La tarta que comí ayer la hizo Ana. / La tarta que comí ayer lo hizo Ana. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 64. Reference resolution Constraints on coreference 1 Number agreement ↔ Los pasteles que comí ayer los hizo Ana. / Los pasteles que comí ayer lo hizo Ana. 2 Person and case agreement ↔ Ana y Carmen hicieron unos pastels. Les gustan. 3 Gender agreement ↔ La tarta que comí ayer la hizo Ana. / La tarta que comí ayer lo hizo Ana. 4 Syntactic constraints ↔ Ana se hizo una tarta. / Ana le hizo una tarta. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 65. Reference resolution Constraints on coreference 1 Number agreement ↔ Los pasteles que comí ayer los hizo Ana. / Los pasteles que comí ayer lo hizo Ana. 2 Person and case agreement ↔ Ana y Carmen hicieron unos pastels. Les gustan. 3 Gender agreement ↔ La tarta que comí ayer la hizo Ana. / La tarta que comí ayer lo hizo Ana. 4 Syntactic constraints ↔ Ana se hizo una tarta. / Ana le hizo una tarta. 5 Selectional restrictions ↔ Ana puso el pastel en el horno. Es redondo. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 66. Reference resolution Preferences in pronoun interpretation Dr. Alexandra M. Liguori NLP Training – Session 3
  • 67. Reference resolution Preferences in pronoun interpretation 1 Recency ↔ Pedro hizo un pastel. Juan hizo una tarta. A Ana le gusta. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 68. Reference resolution Preferences in pronoun interpretation 1 Recency ↔ Pedro hizo un pastel. Juan hizo una tarta. A Ana le gusta. 2 Grammatical role ↔ Pedro hizo un pastel con Juan. Él se lo comió todo. / Juan hizo un pastel con Pedro. Él se lo comió todo. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 69. Reference resolution Preferences in pronoun interpretation 1 Recency ↔ Pedro hizo un pastel. Juan hizo una tarta. A Ana le gusta. 2 Grammatical role ↔ Pedro hizo un pastel con Juan. Él se lo comió todo. / Juan hizo un pastel con Pedro. Él se lo comió todo. 3 Repeated mention ↔ Anne needed a car to drive to her new job. She decided she wanted something roomy. Carol went to the Honda dealership with her. She bought a Civic. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 70. Reference resolution Preferences in pronoun interpretation 1 Recency ↔ Pedro hizo un pastel. Juan hizo una tarta. A Ana le gusta. 2 Grammatical role ↔ Pedro hizo un pastel con Juan. Él se lo comió todo. / Juan hizo un pastel con Pedro. Él se lo comió todo. 3 Repeated mention ↔ Anne needed a car to drive to her new job. She decided she wanted something roomy. Carol went to the Honda dealership with her. She bought a Civic. 4 Parallelism ↔ Pedro llamó Juan por la mañana. Carlos le llamó por la tarde. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 71. Reference resolution Preferences in pronoun interpretation 1 Recency ↔ Pedro hizo un pastel. Juan hizo una tarta. A Ana le gusta. 2 Grammatical role ↔ Pedro hizo un pastel con Juan. Él se lo comió todo. / Juan hizo un pastel con Pedro. Él se lo comió todo. 3 Repeated mention ↔ Anne needed a car to drive to her new job. She decided she wanted something roomy. Carol went to the Honda dealership with her. She bought a Civic. 4 Parallelism ↔ Pedro llamó Juan por la mañana. Carlos le llamó por la tarde. 5 Verb semantics ↔ Pedro hizo un pastel para Juan. Le gustan los dulces. / Pedro pidió un pastel a Juan. Le gustan los dulces. Dr. Alexandra M. Liguori NLP Training – Session 3
  • 72. Reference resolution Algorithm for pronoun resolution (Lappin & Leass, 1994) Dr. Alexandra M. Liguori NLP Training – Session 3
  • 73. Reference resolution Algorithm for pronoun resolution (Lappin & Leass, 1994) Divide discourse into sentences and analyze one sentence at a time Dr. Alexandra M. Liguori NLP Training – Session 3
  • 74. Reference resolution Algorithm for pronoun resolution (Lappin & Leass, 1994) Divide discourse into sentences and analyze one sentence at a time Sentence splitting Tokenization Dr. Alexandra M. Liguori NLP Training – Session 3
  • 75. Reference resolution Algorithm for pronoun resolution (Lappin & Leass, 1994) Divide discourse into sentences and analyze one sentence at a time Sentence splitting Tokenization Parse 1st sentence and identify nouns and pronouns Dr. Alexandra M. Liguori NLP Training – Session 3
  • 76. Reference resolution Algorithm for pronoun resolution (Lappin & Leass, 1994) Divide discourse into sentences and analyze one sentence at a time Sentence splitting Tokenization Parse 1st sentence and identify nouns and pronouns POS-tagging Dr. Alexandra M. Liguori NLP Training – Session 3
  • 77. Reference resolution Algorithm for pronoun resolution (Lappin & Leass, 1994) Divide discourse into sentences and analyze one sentence at a time Sentence splitting Tokenization Parse 1st sentence and identify nouns and pronouns POS-tagging Assign weights to all nouns and pronouns Dr. Alexandra M. Liguori NLP Training – Session 3
  • 78. Reference resolution Algorithm for pronoun resolution (Lappin & Leass, 1994) Divide discourse into sentences and analyze one sentence at a time Sentence splitting Tokenization Parse 1st sentence and identify nouns and pronouns POS-tagging Assign weights to all nouns and pronouns Lappin & Leass weights Dr. Alexandra M. Liguori NLP Training – Session 3
  • 79. Reference resolution Algorithm for pronoun resolution (Lappin & Leass, 1994) Divide discourse into sentences and analyze one sentence at a time Sentence splitting Tokenization Parse 1st sentence and identify nouns and pronouns POS-tagging Assign weights to all nouns and pronouns Lappin & Leass weights Reference pronoun to noun with highest weight, otherwise, if there are no pronouns, divide all weights by 2 Dr. Alexandra M. Liguori NLP Training – Session 3
  • 80. Reference resolution Algorithm for pronoun resolution (Lappin & Leass, 1994) Divide discourse into sentences and analyze one sentence at a time Sentence splitting Tokenization Parse 1st sentence and identify nouns and pronouns POS-tagging Assign weights to all nouns and pronouns Lappin & Leass weights Reference pronoun to noun with highest weight, otherwise, if there are no pronouns, divide all weights by 2 Lappin & Leass algorithm Dr. Alexandra M. Liguori NLP Training – Session 3
  • 81. Reference resolution Algorithm for pronoun resolution (Lappin & Leass, 1994) Divide discourse into sentences and analyze one sentence at a time Sentence splitting Tokenization Parse 1st sentence and identify nouns and pronouns POS-tagging Assign weights to all nouns and pronouns Lappin & Leass weights Reference pronoun to noun with highest weight, otherwise, if there are no pronouns, divide all weights by 2 Lappin & Leass algorithm Proceed to 2nd sentence and repeat all steps as above, adding all the weights along the way Dr. Alexandra M. Liguori NLP Training – Session 3
  • 89. Algorithm for pronoun resolution. Weighting scheme ↔ recency and syntactic preferences (Lappin & Leass, 1994):
      1 Sentence recency ↔ 100
      2 Subject emphasis ↔ 80, e.g. El pastel está en la mesa de la cocina. ("The cake is on the kitchen table.")
      3 Existential emphasis ↔ 70, e.g. Hay un pastel en la mesa de la cocina. ("There is a cake on the kitchen table.")
      4 Direct object emphasis ↔ 50, e.g. Ana hizo un pastel ayer. ("Ana made a cake yesterday.")
      5 Indirect object emphasis ↔ 40, e.g. Ana regaló el pastel a Carmen. ("Ana gave the cake to Carmen.")
      6 Non-adverbial emphasis ↔ 50, e.g. Ana puso un poco de chocolate en el pastel. ("Ana put a little chocolate on the cake.")
      7 Head noun emphasis ↔ 80, e.g. El libro de recetas para el pastel de chocolate está en la mesa de la cocina. ("The recipe book for the chocolate cake is on the kitchen table.")
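The weighting scheme amounts to a lookup table plus a sum. The sketch below assumes that a parser has already decided which factors apply to a mention; the factor names and the `salience` function are hypothetical labels chosen for illustration.

```python
# Weights from the Lappin & Leass (1994) scheme listed above.
WEIGHTS = {
    "recency": 100,
    "subject": 80,
    "existential": 70,
    "direct_object": 50,
    "indirect_object": 40,
    "non_adverbial": 50,
    "head_noun": 80,
}

def salience(factors):
    # Total weight of a mention = sum of the weights of the factors it satisfies.
    return sum(WEIGHTS[f] for f in factors)

# A subject noun in the current sentence, not inside an adverbial phrase,
# and head of its noun phrase:
subject_score = salience({"recency", "subject", "non_adverbial", "head_noun"})
```

Here `subject_score` is 100 + 80 + 50 + 80 = 310.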
  • 93. Algorithm for reference resolution. Example discourse: Pedro se comió una tarta de chocolate. Él se la había pedido a Juan. Le gustan los dulces. ("Pedro ate a chocolate cake. He had asked Juan for it. He likes sweets.")
    Step 1
      1 Take the first sentence: Pedro se comió una tarta de chocolate.
      2 Parse this first sentence → parsing result:
          Pedro/NP000P0 se/PP3CN000 comió/VMIS3S0 una/DI0FS0 tarta/NCFS000 de/SPS00 chocolate/NCMS000 ./.
          Pedro/SUBJ [se comió]/VERB [una tarta]/OBJ de chocolate/COMPL ./.
      3 Calculate weights for all nouns and pronouns appearing in this first sentence:
  • 95. Algorithm for reference resolution. Weights for the nouns and pronouns from the first sentence:

      (PRO)NOUNS   Rec.  Subj.  Exist.  Obj.  Ind.-Obj.  Non-Adv.  Head N  TOT.
      Pedro        100   80     0       0     0          50        80      310
      tarta        100   0      0       50    0          50        80      280
      chocolate    100   0      0       0     0          0         80      180

    No pronouns whose reference needs to be resolved → divide all the results by 2:

      (PRO)NOUNS   TOT.
      Pedro        310/2 = 155
      tarta        280/2 = 140
      chocolate    180/2 = 90
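The halving step can be written as a one-line transformation over the score table. A minimal sketch, where `decay` is a hypothetical helper name and the dictionary simply restates the totals from the table above:

```python
def decay(scores, factor=2):
    # Older mentions lose weight every time the algorithm moves on.
    return {mention: weight / factor for mention, weight in scores.items()}

first_sentence = {"Pedro": 310, "tarta": 280, "chocolate": 180}
after_first = decay(first_sentence)
```

Dividing by 2 is what makes recency matter: a noun from an earlier sentence has to have been much more prominent to beat one from the current sentence.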
  • 99. Algorithm for reference resolution. Step 2
      1 Take the second sentence: Él se la había pedido a Juan.
      2 Parse this second sentence → parsing result (syntactic functions):
          Él/SUBJ se la/OBJ [había pedido]/VERB [a Juan]/IND-OBJ ./.
      3 Calculate weights for all new nouns and pronouns appearing in this second sentence:
  • 101. Algorithm for reference resolution. Weights for the nouns and pronouns from the second sentence:

      (PRO)NOUNS   Rec.  Subj.  Exist.  Obj.  Ind.-Obj.  Non-Adv.  Head N  TOT.
      Él           100   80     0       0     0          50        80      310
      la           100   0      0       50    0          50        80      280
      Juan         100   0      0       0     40         0         80      220

    The two pronouns Él and la have to be referred to nouns from the first sentence.
  • 104. Algorithm for reference resolution. Results from the first two sentences:

      First sentence (already halved):
      (PRO)NOUNS   TOT.
      Pedro        310/2 = 155
      tarta        280/2 = 140
      chocolate    180/2 = 90

      Second sentence:
      (PRO)NOUNS   TOT.
      Él           310
      la           280
      Juan         220

      1 The pronoun la is referred to the noun tarta because of gender constraints (tarta is the only feminine candidate).
      2 The pronoun Él is referred to the noun from the previous sentence with the highest weight, i.e. Pedro.
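The two decisions above combine an agreement filter with a maximum over weights. The sketch below is an illustration under stated assumptions: the `Mention` class, its gender and number codes, and the functions `agrees` and `resolve` are hypothetical names, and the gender and number values are entered by hand rather than read from a tagger.

```python
from dataclasses import dataclass

@dataclass
class Mention:
    text: str
    gender: str    # "m", "f", or "any" for pronouns that do not mark gender
    number: str    # "sg" or "pl"
    weight: float

def agrees(pronoun, candidate):
    # A candidate survives only if gender and number are compatible.
    gender_ok = pronoun.gender == "any" or pronoun.gender == candidate.gender
    return gender_ok and pronoun.number == candidate.number

def resolve(pronoun, candidates):
    # Among compatible candidates, pick the one with the highest weight.
    compatible = [c for c in candidates if agrees(pronoun, c)]
    return max(compatible, key=lambda c: c.weight) if compatible else None

candidates = [
    Mention("Pedro", "m", "sg", 155),
    Mention("tarta", "f", "sg", 140),
    Mention("chocolate", "m", "sg", 90),
]
la = Mention("la", "f", "sg", 280)
el = Mention("Él", "m", "sg", 310)
```

With these inputs `resolve(la, candidates)` returns the mention for tarta and `resolve(el, candidates)` the mention for Pedro. A pronoun that does not mark gender would be given gender "any", so that only the weights decide.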
  • 105. Algorithm for reference resolution. Combined results of reference from the first two sentences (the weights of a pronoun and its noun are added, then all weights are divided by 2):

      (PRO)NOUNS   TOT.
      Pedro + Él   (155+310)/2 = 232.5
      tarta + la   (140+280)/2 = 210
      chocolate    90/2 = 45
      Juan         220/2 = 110
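The combination step can be sketched as follows. This is one possible reading of the slides, not a reference implementation: `merge_and_decay` and its arguments are hypothetical names, and the sketch assumes that every entry is halved, including nouns that were not mentioned again.

```python
def merge_and_decay(previous, current, links, factor=2):
    # previous: weights carried over from earlier sentences (already halved)
    # current:  weights of the mentions in the sentence just processed
    # links:    resolved pronoun -> antecedent noun
    combined = dict(previous)
    for mention, weight in current.items():
        target = links.get(mention, mention)  # unresolved mentions keep their own entry
        combined[target] = combined.get(target, 0) + weight
    return {mention: weight / factor for mention, weight in combined.items()}

previous = {"Pedro": 155, "tarta": 140, "chocolate": 90}
current = {"Él": 310, "la": 280, "Juan": 220}
links = {"Él": "Pedro", "la": "tarta"}
combined = merge_and_decay(previous, current, links)
```

For Pedro this gives (155 + 310) / 2 = 232.5 and for tarta (140 + 280) / 2 = 210, so a noun that keeps being referred to stays ahead of newly introduced ones.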
  • 109. Algorithm for reference resolution. Step 3
      1 Take the third sentence: Le gustan los dulces.
      2 Parse this third sentence → parsing result:
          Le/PP3CSD00 gustan/VMIP3P0 los/DA0MP0 dulces/NCMP000 ./.
          Le/IND-OBJ gustan/VERB [los dulces]/SUBJ ./.
      3 Calculate weights for all new nouns and pronouns appearing in this third sentence:
  • 111. Algorithm for reference resolution. Weights for the nouns and pronouns from the third sentence:

      (PRO)NOUNS   Rec.  Subj.  Exist.  Obj.  Ind.-Obj.  Non-Adv.  Head N  TOT.
      Le           100   0      0       0     40         50        80      270
      dulces       100   80     0       0     0          50        80      310

    Only the pronoun Le needs to be referred to a previous noun.
  • 116. Algorithm for reference resolution. Combined results of reference from the first two sentences:

      NOUNS        TOT.
      Pedro + Él   (155+310)/2 = 232.5
      tarta + la   (140+280)/2 = 210
      chocolate    90/2 = 45
      Juan         220/2 = 110

    The pronoun Le is singular and does not mark gender, so it could refer to any singular noun, masculine or feminine: Pedro, Juan, tarta, or chocolate.
    → We refer Le to the previous noun with the highest weight, i.e. Pedro.
    → Referencing is completed!
    The Lappin & Leass algorithm has nearly 90% accuracy (the original 1994 paper reports 86%).
  • 124. NER: Named Entity Recognition. Can be broken down into two distinct problems:
      1 detection of names
      2 classification of the names by the type of entity to which they refer → 4 standard types:
          1 person (e.g. "Carol", "Tom Hanks", "Pedro", "Juan Carlos I", etc.)
          2 organization (e.g. "WWF", "IBM", "El Mundo", etc.)
          3 location (e.g. "Madrid", "Washington D.C.", "L.A.", "Barcelona", etc.)
          4 other (e.g. "Hotel Catalunya", etc.)
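The two sub-problems can be illustrated with a toy recognizer: a regular expression detects runs of capitalized words, and a small hand-made gazetteer classifies them. This is only a sketch of the detection / classification split, not how GATE, OpenNLP, or the Stanford tagger work; `GAZETTEER`, `NAME`, and `recognize` are hypothetical names, and the detector would wrongly pick up an ordinary capitalized word at the start of a sentence.

```python
import re

# Tiny hand-made gazetteer; real systems use large lists or trained models.
GAZETTEER = {
    "Tom Hanks": "person",
    "Pedro": "person",
    "IBM": "organization",
    "WWF": "organization",
    "Madrid": "location",
    "Barcelona": "location",
}

# Detection: one or more consecutive words starting with an ASCII capital.
NAME = re.compile(r"\b[A-Z]\w*(?:\s+[A-Z]\w*)*")

def recognize(text):
    entities = []
    for match in NAME.finditer(text):
        name = match.group()
        # Classification: look the name up, fall back to "other".
        entities.append((name, GAZETTEER.get(name, "other")))
    return entities

found = recognize("Tom Hanks visited IBM in Madrid and stayed at Hotel Catalunya.")
```

Names that are detected but missing from the gazetteer fall into the fourth type, "other".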
  • 125. NER: tools for Named Entity Recognition
      GATE: for English, Spanish, and many more, via graphical interface and Java API (developed at the University of Sheffield, UK) https://gate.ac.uk/
      NETagger: Java-based Illinois Named Entity Recognition (developed by the Cognitive Computation Group at the University of Illinois at Urbana-Champaign) http://cogcomp.cs.illinois.edu/page/software_view/NETagger
      OpenNLP: rule-based and statistical Named Entity Recognition (developed by Apache) http://opennlp.apache.org/index.html
      Stanford CoreNLP: Java-based CRF Named Entity Recognition (developed by the Stanford Natural Language Processing Group) http://nlp.stanford.edu/software/CRF-NER.shtml
  • 131. Keyword / topic / information extraction: tools
      Keyword extraction, e.g.:
          1 GATE (ANNIE tool) for English, Spanish, and many more, via graphical interface and Java API https://gate.ac.uk/ → simply using JAPE files for the LUs
          2 pattern.vector module from CLiPS in Python http://www.clips.ua.ac.be/pages/luceneapi_node/pattern.vector
      Topic / information extraction, e.g.:
          GATE (ANNIE tool) for English, Spanish, and many more, via graphical interface and Java API → using JAPE files for the LUs, FEs, and FRAMES
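One common statistical approach to keyword extraction is TF-IDF weighting, which ranks a word highly when it is frequent in one document and rare in the others. The sketch below uses only the Python standard library and deliberately does not call the pattern.vector or GATE APIs; `tfidf_keywords` and its parameters are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def tfidf_keywords(documents, index, top_n=3):
    # Rank the words of documents[index] by term frequency * inverse document frequency.
    tokenized = [tokenize(d) for d in documents]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each word once per document
    term_freq = Counter(tokenized[index])
    total = len(tokenized[index])
    scores = {
        term: (count / total) * math.log(len(documents) / doc_freq[term])
        for term, count in term_freq.items()
    }
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]

docs = [
    "the cake is on the table",
    "the book is on the shelf",
    "the cake cake recipe",
]
keywords = tfidf_keywords(docs, 2, top_n=2)
```

A word such as "the" that occurs in every document gets an inverse document frequency of log(1) = 0, so it never ranks as a keyword.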
  • 132. What next? Another practical session on GATE this summer?