Entity Linking (at SEA)
David Graus, University of Amsterdam
Photo by TRPultz (Creative Commons Attribution 3.0 Unported License)
Entity Linking at SEA
Search Engines Amsterdam, 27 June ’14
Today’s talk
Ò What?
Ò Why?
Ò How?
Ò Etc.
Entity Linking?
Ò Link mentions of entities (in text) to their
referent entities (in a KB)
Ò Example:



“During Tank Johnson’s tumultuous tenure with
the Bears, incidents with guns got him arrested,
jailed and suspended, and his close friend was
shot and killed in front of him after an altercation
at a Chicago bar.”
[Figure: the entity mention “Tank” (query q, in document r) could refer to either TANK (VEHICLE) or TANK JOHNSON in the knowledge base (KB).]
Entity Search Outline
Ò What?
Ò Why?
Ò How?
Ò Etc.
Social Media Monitoring
Entity Search Outline
Ò What?
Ò Why?
Ò How?
Ò Etc.
The Semanticizer
Ò Open source framework (https://github.com/semanticize/semanticizer/)
Ò Links to Wikipedia
Ò Entity = Wikipedia Page
Ò “Lexical matching” approach
Ò no NER, information extraction
http://semanticize.uva.nl/
Lexical matching
Ò Construct “entity dictionaries”
Ò By taking entity Titles
!
!
Ò Anchors
!
!
Ò Redirect pages
n-gram -> entity
Ò Kendrick Lamar
Ò K-Dot
Ò Kendrick
Ò K. Dot
Ò Kendrick Duckworth
Ò Kendrick Lamar'
Ò Kendrick Lamar's
Ò K Dot
Ò Kendrick Lama
Ò Kendrick Lamarr
Ò Kendrick Llama
Ò The Jig Is Up (Dump'n)
Ò For an input sentence s;
!
!
!
!
Ò Retrieve) all possible entity candidates
“Eminem Thinks Kendrick Lamar’s
good kid, m.A.A.d. city Was ‘Genius’”
16
Start linking!
http://en.wikipedia.org/wiki/Eminem
http://en.wikipedia.org/wiki/Good_(economics)
http://en.wikipedia.org/wiki/Lamar_County,_Alabama
http://en.wikipedia.org/wiki/Lamar_County,_Mississippi
http://en.wikipedia.org/wiki/Lamar_Advertising_Company
http://en.wikipedia.org/wiki/Kendrick,_Idaho
http://en.wikipedia.org/wiki/Good_Kid_Maad_City
http://en.wikipedia.org/wiki/Kendrick_Lamar
http://en.wikipedia.org/wiki/Kendrick_School
http://en.wikipedia.org/wiki/Lamar_Cardinals_basketball
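Candidate retrieval by lexical matching can be sketched as: enumerate the n-grams of the sentence and look each one up in the entity dictionary. The whitespace tokenizer, the `max_n` limit, and the toy dictionary below are assumptions; a real system would normalize punctuation more carefully.

```python
def ngrams(tokens, max_n=3):
    """Yield every contiguous n-gram of up to max_n tokens as a string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def retrieve_candidates(sentence, entity_dictionary, max_n=3):
    """Return {n-gram: candidate entities} for every n-gram found in the dictionary."""
    tokens = sentence.lower().split()  # naive whitespace tokenization (assumption)
    candidates = {}
    for ngram in ngrams(tokens, max_n):
        if ngram in entity_dictionary:
            candidates[ngram] = entity_dictionary[ngram]
    return candidates

# Toy dictionary (hypothetical); keys are lower-cased surface forms.
toy_dictionary = {
    "eminem": {"Eminem"},
    "kendrick": {"Kendrick,_Idaho", "Kendrick_Lamar", "Kendrick_School"},
    "kendrick lamar's": {"Kendrick_Lamar"},
    "good": {"Good_(economics)"},
}

sentence = "Eminem Thinks Kendrick Lamar's good kid, m.A.A.d. city Was 'Genius'"
candidates = retrieve_candidates(sentence, toy_dictionary)
for ngram, entities in candidates.items():
    print(ngram, "->", sorted(entities))
```

Because nothing is filtered at this stage, clearly wrong candidates such as Good_(economics) are retrieved too; ranking is left to the next step.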
Ranking entity candidates
Ò “Prior probabilities”
Ò link probability
Ò commonness
Ò sense probability
1. Link Probability
Ò “Kendrick Lamar” occurs 698x on Wikipedia
Ò as hyperlink: 501x
Ò no hyperlink: 197x
!
!
Ò “Kendrick” occurs 5.037x on Wikipedia
Ò as hyperlink: 24x
Ò no hyperlink: 5.014x
!
24
5.037
= 0,005
!
501
698
= 0,718
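The arithmetic above, as a small sketch: link probability is the number of times an n-gram appears as a hyperlink divided by its total number of occurrences. The function name and the zero-occurrence guard are assumptions; the counts are the ones from the slide.

```python
def link_probability(times_as_link, total_occurrences):
    """Fraction of an n-gram's occurrences in which it is used as a hyperlink."""
    if total_occurrences == 0:
        return 0.0  # unseen n-gram: treat as never linked (assumption)
    return times_as_link / total_occurrences

lp_kendrick_lamar = link_probability(501, 698)
lp_kendrick = link_probability(24, 5037)
print(round(lp_kendrick_lamar, 3))  # 0.718
print(round(lp_kendrick, 3))        # 0.005
```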
2. “Commonness”
Ò “Kendrick” is used to refer to:
Ò Kendrick,_Idaho
Ò Kendrick,_Oklahoma
Ò T._D._Kendrick
Ò Kendrick_School
Ò John_Kendrick_(American_sea_captain)
Ò Kendrick_Lamar
Ò Francis_Kenrick
Ò Kendrick
Ò Howie Kendrick
!
8
3
3
2
2
2
2
1
1
!
/ 24
/ 24
/ 24
/ 24
/ 24
/ 24
/ 24
/ 24
/ 24
!
= 0,333
= 0,125
= 0,125
= 0.083
= 0.083
= 0.083
= 0.083
= 0.042
= 0.042
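A sketch of the commonness computation using the link counts from the slide: each target's share of all links made with the anchor "Kendrick", sorted so that the most common sense comes first. The function name and the ranked-list return shape are assumptions.

```python
def commonness(link_counts):
    """Return (entity, commonness) pairs for one n-gram, most common target first.

    link_counts: {entity: number of times the n-gram links to that entity}
    """
    total = sum(link_counts.values())
    scores = {entity: count / total for entity, count in link_counts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Link targets of the anchor "Kendrick", as listed on the slide.
kendrick_links = {
    "Kendrick,_Idaho": 8,
    "Kendrick,_Oklahoma": 3,
    "T._D._Kendrick": 3,
    "Kendrick_School": 2,
    "John_Kendrick_(American_sea_captain)": 2,
    "Kendrick_Lamar": 2,
    "Francis_Kenrick": 2,
    "Kendrick": 1,
    "Howie Kendrick": 1,
}

ranked = commonness(kendrick_links)
for entity, score in ranked[:3]:
    print(entity, round(score, 3))
```

Ranking by commonness alone would link "Kendrick" to Kendrick,_Idaho, which shows why a short, ambiguous mention needs more than a prior.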
3. Sense Probability
Ò no. of times n-gram links to entity
Ò over all occurrences of n-gram
!
2
5.037
= 0,0004Kendrick -> Kendrick_Lamar =
Kendrick Lamar -> Kendrick_Lamar =
!
500
698
= 0,716
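Given these definitions, the three priors are related: sense probability equals link probability times commonness, because the number of hyperlinked occurrences cancels out. A short check with the "Kendrick" counts from the slides (the function and variable names are assumptions):

```python
from math import isclose

def priors(ngram_total, ngram_as_link, links_to_entity):
    """Return (link probability, commonness, sense probability) for one n-gram/entity pair."""
    link_probability = ngram_as_link / ngram_total
    commonness = links_to_entity / ngram_as_link
    sense_probability = links_to_entity / ngram_total
    return link_probability, commonness, sense_probability

# "Kendrick" -> Kendrick_Lamar: 5,037 occurrences, 24 as hyperlink, 2 pointing to Kendrick_Lamar.
lp, cm, sp = priors(ngram_total=5037, ngram_as_link=24, links_to_entity=2)
assert isclose(lp * cm, sp)
print(round(sp, 4))  # 0.0004
```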
Ranking by prior probability
Works quite well most of the time!
High accuracy reported on naive linking using only “popularity ranking” [1]
[1] Heng Ji, Ralph Grishman, “Knowledge Base Population: Successful Approaches and Challenges”, ACL 2011
Beyond ranking: supervised linking
Ò Entity linking as binary classification
!
Ò Input:
Ò sentence s + set of target entities E
“Eminem Thinks Kendrick Lamar’s
good kid, m.A.A.d. city Was ‘Genius’”
http://en.wikipedia.org/wiki/Eminem
http://en.wikipedia.org/wiki/Good_(economics)
http://en.wikipedia.org/wiki/Lamar_County,_Alabama
http://en.wikipedia.org/wiki/Lamar_County,_Mississippi
http://en.wikipedia.org/wiki/Lamar_Advertising_Company
http://en.wikipedia.org/wiki/Kendrick,_Idaho
http://en.wikipedia.org/wiki/Good_Kid_Maad_City
http://en.wikipedia.org/wiki/Kendrick_Lamar
Ò Given a new sentence, for each candidate entity
e output probability of belonging to class:
Ò positive (= target), or
Ò negative (= no target)
Features
Ò Local:
Ò link each entity mention separately
Ò Global:
Ò link all mentions in a document simultaneously,
to arrive at a coherent set of entities
Global features
“[Eminem] Thinks [Kendrick Lamar]’s
[good kid, m.A.A.d. city] Was ‘Genius’”
http://en.wikipedia.org/wiki/Eminem
http://en.wikipedia.org/wiki/Good_(economics)
http://en.wikipedia.org/wiki/Lamar_County,_Alabama
http://en.wikipedia.org/wiki/Lamar_County,_Mississippi
http://en.wikipedia.org/wiki/Lamar_Advertising_Company
http://en.wikipedia.org/wiki/Kendrick,_Idaho
http://en.wikipedia.org/wiki/Good_Kid_Maad_City
http://en.wikipedia.org/wiki/Kendrick_Lamar
“Relatedness”
Source:
Milne, D. and Witten, I.H. (2008) An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In WIKIAI'08.
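A sketch of a link-based relatedness measure in the spirit of the cited paper, as it is commonly stated: a Normalized-Google-Distance-style score over the sets of articles that link to each entity, turned into a similarity by subtracting it from 1. The function name, the clamping at 0, and the toy inlink sets are assumptions, not taken from the slide.

```python
import math

def relatedness(inlinks_a, inlinks_b, total_articles):
    """Link-overlap relatedness in [0, 1]; higher means more shared inlinks."""
    a, b = set(inlinks_a), set(inlinks_b)
    overlap = len(a & b)
    if overlap == 0:
        return 0.0  # no shared inlinks: treat as unrelated
    distance = (math.log(max(len(a), len(b))) - math.log(overlap)) / (
        math.log(total_articles) - math.log(min(len(a), len(b)))
    )
    return max(0.0, 1.0 - distance)

# Toy inlink sets (hypothetical article ids), in a KB of 1,000 articles.
inlinks_e1 = {1, 2, 3, 4}
inlinks_e2 = {3, 4, 5}
inlinks_e3 = {7, 8}

score_related = relatedness(inlinks_e1, inlinks_e2, total_articles=1000)
score_unrelated = relatedness(inlinks_e1, inlinks_e3, total_articles=1000)
print(round(score_related, 3), score_unrelated)
```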
Features: Semanticizer
Ò Local:
Ò n-gram
Ò KB
Ò n-gram+KB
Ò Text similarity
Ò Global:
Ò Finding “related entities”
Local features: n-gram/KB
Ò n-gram features:
Ò link probability
Ò length of n-gram
Ò number of entity titles that contain n-gram
Ò entity features:
Ò entity’s number of inlinks
Ò entity’s number of outlinks
Ò number of redirect pages referring to entity
Ò n-gram+entity features:
Ò commonness
Ò sense probability
Ò edit distance between n-gram and entity title
Ò does n-gram contain entity title?
Ò does entity title contain n-gram?
Ò does title equal n-gram?
Ò TF of n-gram in entity document
Local features: Text similarity
Ò Similarity between input sentence s
!
!
!
and entity candidate document (Wikipedia page)
!
Ò Kendrick_Lamar 0.4215
Ò Kendrick,_Idaho 0.1599
“Eminem Thinks Kendrick Lamar’s
good kid, m.A.A.d. city Was ‘Genius’”
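The slide does not say which similarity measure produced these scores. As one plausible stand-in, the sketch below uses cosine similarity over raw term counts; the measure, the tokenizer, and the toy entity texts are all assumptions, so the numbers will not match the slide.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

sentence = "eminem thinks kendrick lamar's good kid maad city was genius"
# Toy stand-ins for the Wikipedia pages (hypothetical text).
lamar_page = "kendrick lamar is a rapper whose album good kid maad city was praised by eminem"
idaho_page = "kendrick is a city in latah county idaho"

sim_lamar = cosine_similarity(sentence, lamar_page)
sim_idaho = cosine_similarity(sentence, idaho_page)
print(round(sim_lamar, 3), round(sim_idaho, 3))
```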
Global features
[Figure, built up over several steps: query q in document r; candidate entities e1 and e2; entities connected to the candidates by outlinks (e3, e6) and inlinks (e4, e5, e7); anchor texts of these entities (Anchor 1A to Anchor 5C); Anchor 1A, Anchor 4B, and Anchor 5C are marked as “Support”.]
But
Ò Too slow in real life
Ò Solution: 

set of linked entities (inlinks / outlinks) as “virtual
document”
Related entity document
["Entertainment Weekly", "Compton, California", "California", "Rapping",
"songwriter", "Hip hop music", "Top Dawg Entertainment", "Aftermath
Entertainment", "Interscope Records", "Black Hippy", "Dr. Dre", "The Game
(rapper)", "Jay Rock", "J. Cole", "Hip hop music", "recording artist", "Compton,
California", "Carson, California", "Top Dawg Entertainment", "Aftermath
Entertainment", "Interscope Records", "West Coast hip hop", "Supergroup
(music)", "Black Hippy", "rapper", "Schoolboy Q", "Jay Rock", "Ab-Soul", "Overly
Dedicated", "independent album", "Section.80", "iTunes Store", "Major record
label", "Dr. Dre", "Game (rapper)", "Drake (entertainer)", "Young Jeezy", "Talib
Kweli", "Busta Rhymes", "E-40", "Warren G", …]
Ò Similarity between sentence s and virtual
document as related entity approximation
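One way to picture the virtual-document idea: pool the titles of a candidate's linked entities into a bag of tokens, then score how much of the sentence that bag covers. The overlap score, the tokenizer, the example sentence, and the short link lists are assumptions for illustration; the slides do not specify the similarity function.

```python
def virtual_document(linked_entity_titles):
    """Flatten the titles of a candidate's inlink/outlink entities into a list of tokens."""
    tokens = []
    for title in linked_entity_titles:
        tokens.extend(title.lower().replace(",", " ").split())
    return tokens

def overlap_score(sentence, document_tokens):
    """Fraction of distinct sentence tokens that also occur in the virtual document."""
    sentence_tokens = set(sentence.lower().split())
    if not sentence_tokens:
        return 0.0
    return len(sentence_tokens & set(document_tokens)) / len(sentence_tokens)

# Short link lists: the first is a subset of the slide's list, the second is hypothetical.
lamar_links = ["Dr. Dre", "Aftermath Entertainment", "Hip hop music", "Compton, California"]
idaho_links = ["Latah County, Idaho", "Idaho"]

sentence = "Dr. Dre signed Kendrick to Aftermath"  # hypothetical input sentence
score_lamar = overlap_score(sentence, virtual_document(lamar_links))
score_idaho = overlap_score(sentence, virtual_document(idaho_links))
print(score_lamar, score_idaho)
```

Unlike the graph-based global features, this needs only one comparison per candidate, which is the speed-up the virtual document is meant to buy.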
Supervised Linking
Ò Feature vector for each sentence-entity pair
Ò Train a Random Forest classifier
Local vs. global
Ò Hybrid > Local | Global
Ò Local & Global > Hybrid
Ò Approaches are complementary
Ò Global preferred for highly ambiguous entity
mentions (i.e., short ones)
Etc…
Ò Open challenges:
Ò out of KB entities
Ò Knowledge Base Creation
Thanks!
David Graus
d.p.graus@uva.nl

David Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27th

  • 1. Entity Linking (at SEA) David Graus, University of Amsterdam Photo by TRPultz (Creative Commons Attribution 3.0 Unported License)
  • 2. Entity Linking at SEA Search Engines Amsterdam, 27 June ’14 2 Today’s talk Ò What? Ò Why? Ò How? Ò Etc.
  • 3. Entity Linking at SEA Search Engines Amsterdam, 27 June ’14 3 Entity Linking? Ò Link mentions of entities (in text) to their referent entities (in a KB) Ò Example:
 
 “During Tank Johnson’s tumultuous tenure with the Bears, incidents with guns got him arrested, jailed and suspended, and his close friend was shot and killed in front of him after an altercation at a Chicago bar.”
  • 4. Entity Linking at SEA Search Engines Amsterdam, 27 June ’14 4 Entity Linking? Ò Link mentions of entities (in text) to their referent entities (in a KB) Ò Example:
 
 “During Tank Johnson’s tumultuous tenure with the Bears, incidents with guns got him arrested, jailed and suspended, and his close friend was shot and killed in front of him after an altercation at a Chicago bar.”
  • 5. Entity Linking at SEA Search Engines Amsterdam, 27 June ’14 5 Entity Mention: Tank TANK (VEHICLE) Knowledge Base (KB) Document r TANK query q ? ? TANK JOHNSON
  • 6. Entity Linking at SEA Search Engines Amsterdam, 27 June ’14 6 Entity Search Outline Ò What? Ò Why? Ò How? Ò Etc.
  • 7. Entity Linking at SEA Search Engines Amsterdam, 27 June ’14 7
  • 8. Entity Linking at SEA Search Engines Amsterdam, 27 June ’14 8 Social Media Monitoring
  • 9. Entity Linking at SEA Search Engines Amsterdam, 27 June ’14 9
  • 10. Entity Linking at SEA Search Engines Amsterdam, 27 June ’14 10 Entity Search Outline Ò What? Ò Why? Ò How? Ò Etc.
  • 11. Entity Linking at SEA Search Engines Amsterdam, 27 June ’14 11
  • 12. Entity Linking at SEA Search Engines Amsterdam, 27 June ’14 12 The Semanticizer Ò Open source framework (https://github.com/semanticize/semanticizer/) Ò Links to Wikipedia Ò Entity = Wikipedia Page Ò “Lexical matching” approach Ò no NER, information extraction http://semanticize.uva.nl/
  • 13. Entity Linking at SEA Search Engines Amsterdam, 27 June ’14 13 Lexical matching Ò Construct “entity dictionaries” Ò By taking entity Titles ! ! Ò Anchors ! ! Ò Redirect pages
  • 14. Entity Linking at SEA Search Engines Amsterdam, 27 June ’14 14 n-gram -> entity Ò Kendrick Lamar Ò K-Dot Ò Kendrick Ò K. Dot Ò Kendrick Duckworth Ò Kendrick Lamar' Ò Kendrick Lamar's Ò K Dot Ò Kendrick Lama Ò Kendrick Lamarr Ò Kendrick Llama Ò The Jig Is Up (Dump'n)
  • 15. Start linking!
      - For an input sentence s:
        “Eminem Thinks Kendrick Lamar’s good kid, m.A.A.d. city Was ‘Genius’”
      - Retrieve all possible entity candidates
  • 16. Start linking!
      - For an input sentence s:
        “Eminem Thinks Kendrick Lamar’s good kid, m.A.A.d. city Was ‘Genius’”
      - Retrieve all possible entity candidates:
          http://en.wikipedia.org/wiki/Eminem
          http://en.wikipedia.org/wiki/Good_(economics)
          http://en.wikipedia.org/wiki/Lamar_County,_Alabama
          http://en.wikipedia.org/wiki/Lamar_County,_Mississippi
          http://en.wikipedia.org/wiki/Lamar_Advertising_Company
          http://en.wikipedia.org/wiki/Kendrick,_Idaho
          http://en.wikipedia.org/wiki/Good_Kid_Maad_City
          http://en.wikipedia.org/wiki/Kendrick_Lamar
          http://en.wikipedia.org/wiki/Kendrick_School
          http://en.wikipedia.org/wiki/Lamar_Cardinals_basketball
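Candidate retrieval can be sketched as a dictionary lookup over every word n-gram of the sentence. This is a simplified illustration: the whitespace tokenizer, the `max_n` limit, the function names and the toy dictionary are assumptions, and a real system would need to handle punctuation such as the possessive in “Lamar’s”.

```python
def ngrams(tokens, max_n=3):
    """Yield every contiguous word n-gram of up to max_n tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def retrieve_candidates(sentence, entity_dict, max_n=3):
    """Return {n-gram: candidate entities} for every n-gram found in the dictionary."""
    tokens = sentence.lower().split()
    return {
        gram: entity_dict[gram]
        for gram in ngrams(tokens, max_n)
        if gram in entity_dict
    }


# Toy dictionary, made up for illustration.
toy_dict = {
    "eminem": {"Eminem"},
    "kendrick": {"Kendrick_Lamar", "Kendrick,_Idaho", "Kendrick_School"},
    "good": {"Good_(economics)"},
}
candidates = retrieve_candidates("Eminem thinks Kendrick is good", toy_dict)
```

Because every matching n-gram contributes all of its entities, the candidate set is deliberately over-complete; ranking and classification prune it afterwards.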
  • 17. Ranking entity candidates
      - “Prior probabilities”:
          - link probability
          - commonness
          - sense probability
  • 18. 1. Link Probability
      - “Kendrick Lamar” occurs 698x on Wikipedia
          - as hyperlink: 501x
          - no hyperlink: 197x
          - link probability: 501 / 698 = 0.718
      - “Kendrick” occurs 5,037x on Wikipedia
          - as hyperlink: 24x
          - no hyperlink: 5,014x
          - link probability: 24 / 5,037 = 0.005
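The link probability on this slide is a plain ratio of counts. A small sketch using the slide's own numbers; the function name and the zero-count guard are illustrative assumptions.

```python
def link_probability(times_linked, times_occurring):
    """Fraction of an n-gram's occurrences that appear as hyperlink anchor text."""
    if times_occurring == 0:
        return 0.0
    return times_linked / times_occurring


# Counts taken from the slide.
lp_kendrick_lamar = link_probability(501, 698)
lp_kendrick = link_probability(24, 5037)
```

The full name is linked in most of its occurrences, while the bare first name almost never is, so link probability mainly signals whether an n-gram is worth linking at all.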
  • 19. 2. “Commonness”
      - “Kendrick” is used to refer to:
          Entity                                 Links   Commonness
          Kendrick,_Idaho                        8       8 / 24 = 0.333
          Kendrick,_Oklahoma                     3       3 / 24 = 0.125
          T._D._Kendrick                         3       3 / 24 = 0.125
          Kendrick_School                        2       2 / 24 = 0.083
          John_Kendrick_(American_sea_captain)   2       2 / 24 = 0.083
          Kendrick_Lamar                         2       2 / 24 = 0.083
          Francis_Kenrick                        2       2 / 24 = 0.083
          Kendrick                               1       1 / 24 = 0.042
          Howie_Kendrick                         1       1 / 24 = 0.042
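Commonness normalises the link counts of one anchor text over all of its targets. A short sketch with the counts from this slide; the function name and data layout are illustrative assumptions.

```python
# Number of times the anchor "Kendrick" links to each entity (from the slide).
kendrick_links = {
    "Kendrick,_Idaho": 8,
    "Kendrick,_Oklahoma": 3,
    "T._D._Kendrick": 3,
    "Kendrick_School": 2,
    "John_Kendrick_(American_sea_captain)": 2,
    "Kendrick_Lamar": 2,
    "Francis_Kenrick": 2,
    "Kendrick": 1,
    "Howie_Kendrick": 1,
}


def commonness(link_counts):
    """Share of an anchor's links that point to each target entity."""
    total = sum(link_counts.values())
    return {entity: count / total for entity, count in link_counts.items()}


scores = commonness(kendrick_links)
best = max(scores, key=scores.get)
```

Ranking by commonness alone would link “Kendrick” to Kendrick,_Idaho, which is why context features are needed for a sentence about music.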
  • 20. 3. Sense Probability
      - No. of times n-gram links to entity, over all occurrences of n-gram
      - Kendrick -> Kendrick_Lamar = 2 / 5,037 = 0.0004
      - Kendrick Lamar -> Kendrick_Lamar = 500 / 698 = 0.716
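A worked check of the sense probability, using the counts from the slides. With these definitions, sense probability equals link probability times commonness, since the number of linked occurrences cancels out. The function name is an illustrative assumption.

```python
def sense_probability(links_to_entity, times_occurring):
    """Links from an n-gram to one entity, over all occurrences of the n-gram."""
    if times_occurring == 0:
        return 0.0
    return links_to_entity / times_occurring


# Counts taken from the slides.
sp_kendrick = sense_probability(2, 5037)        # Kendrick -> Kendrick_Lamar
sp_kendrick_lamar = sense_probability(500, 698)  # Kendrick Lamar -> Kendrick_Lamar

# Same value via link probability (24 / 5037) times commonness (2 / 24).
via_product = (24 / 5037) * (2 / 24)
```

In effect, sense probability folds “is this n-gram usually a link?” and “which entity does it usually link to?” into a single prior.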
  • 21. Ranking by prior probability
      - Works quite well most of the time!
      - High accuracy reported for naive linking using only “popularity ranking” [1]
      - [1] Heng Ji, Ralph Grishman, “Knowledge Base Population: Successful Approaches and Challenges”, ACL 2011
  • 22. Beyond ranking: supervised linking
      - Entity linking as binary classification
      - Input: sentence s + set of target entities E
  • 23. Beyond ranking: supervised linking
      “Eminem Thinks Kendrick Lamar’s good kid, m.A.A.d. city Was ‘Genius’”
          http://en.wikipedia.org/wiki/Eminem
          http://en.wikipedia.org/wiki/Good_(economics)
          http://en.wikipedia.org/wiki/Lamar_County,_Alabama
          http://en.wikipedia.org/wiki/Lamar_County,_Mississippi
          http://en.wikipedia.org/wiki/Lamar_Advertising_Company
          http://en.wikipedia.org/wiki/Kendrick,_Idaho
          http://en.wikipedia.org/wiki/Good_Kid_Maad_City
          http://en.wikipedia.org/wiki/Kendrick_Lamar
  • 24. Beyond ranking: supervised linking
      - Given a new sentence, for each candidate entity e, output the probability of belonging to class:
          - positive (= target), or
          - negative (= no target)
  • 25. Features
      - Local: link each entity mention separately
      - Global: link all mentions in a document simultaneously, to arrive at a coherent set of entities
  • 26. Global features
      “Eminem Thinks Kendrick Lamar’s good kid, m.A.A.d. city Was ‘Genius’”
  • 27. Global features
      “[Eminem] Thinks [Kendrick Lamar]’s [good kid, m.A.A.d. city] Was ‘Genius’”
          http://en.wikipedia.org/wiki/Eminem
          http://en.wikipedia.org/wiki/Good_(economics)
          http://en.wikipedia.org/wiki/Lamar_County,_Alabama
          http://en.wikipedia.org/wiki/Lamar_County,_Mississippi
          http://en.wikipedia.org/wiki/Lamar_Advertising_Company
          http://en.wikipedia.org/wiki/Kendrick,_Idaho
          http://en.wikipedia.org/wiki/Good_Kid_Maad_City
          http://en.wikipedia.org/wiki/Kendrick_Lamar
  • 28. “Relatedness”
      Source: Milne, D. and Witten, I.H. (2008). An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In WIKIAI'08.
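A sketch of the link-based relatedness measure as it is commonly stated for the cited paper: a normalised distance over the sets of articles that link to each entity, turned into a similarity by subtracting it from 1. The exact form, the clamping to zero and the toy inlink sets are assumptions here and should be checked against the paper.

```python
import math


def relatedness(inlinks_a, inlinks_b, total_articles):
    """Link-based relatedness of two entities, given the sets of articles linking to each."""
    common = inlinks_a & inlinks_b
    if not common:
        return 0.0
    larger = max(len(inlinks_a), len(inlinks_b))
    smaller = min(len(inlinks_a), len(inlinks_b))
    if total_articles <= smaller:
        raise ValueError("total_articles must exceed the size of both inlink sets")
    distance = (math.log(larger) - math.log(len(common))) / (
        math.log(total_articles) - math.log(smaller)
    )
    # Clamp so that weakly overlapping pairs do not go negative.
    return max(0.0, 1.0 - distance)


# Toy inlink sets (article ids), made up for illustration.
score = relatedness({1, 2, 3, 4}, {3, 4, 5}, total_articles=1000)
```

Entities that share many incoming links score close to 1, entities with no shared inlinks score 0.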
  • 29. Features: Semanticizer
      - Local:
          - n-gram
          - KB
          - n-gram+KB
          - Text similarity
      - Global:
          - Finding “related entities”
  • 30. Local features: n-gram/KB
      - n-gram features:
          - link probability
          - length of n-gram
          - number of entity titles that contain n-gram
      - entity features:
          - entity’s number of inlinks
          - entity’s number of outlinks
          - number of redirect pages referring to entity
      - n-gram+entity features:
          - commonness
          - sense probability
          - edit distance between n-gram and entity title
          - does n-gram contain entity title?
          - does entity title contain n-gram?
          - does title equal n-gram?
          - TF of n-gram in entity document
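The string-based features in this list are cheap to compute. A minimal sketch, assuming case-insensitive comparison and n-gram length measured in words; both choices, and all names below, are illustrative rather than taken from the Semanticizer.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def string_features(ngram, title):
    """String-based features for one (n-gram, entity title) pair."""
    ngram, title = ngram.lower(), title.lower()
    return {
        "ngram_length": len(ngram.split()),
        "edit_distance": edit_distance(ngram, title),
        "ngram_contains_title": title in ngram,
        "title_contains_ngram": ngram in title,
        "title_equals_ngram": ngram == title,
    }


features = string_features("Kendrick", "Kendrick Lamar")
```

For this pair the title contains the n-gram but does not equal it, and six insertions separate the two strings.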
  • 31. Local features: Text similarity
      - Similarity between input sentence s
        “Eminem Thinks Kendrick Lamar’s good kid, m.A.A.d. city Was ‘Genius’”
        and entity candidate document (Wikipedia page):
          - Kendrick_Lamar: 0.4215
          - Kendrick,_Idaho: 0.1599
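The slide does not say which similarity measure produced these scores. As one plausible stand-in, the sketch below uses cosine similarity over raw term counts; the measure, the whitespace tokenizer and the toy texts are all assumptions, so the numbers will not match the slide.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    dot = sum(counts_a[term] * counts_b[term] for term in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy texts, made up for illustration (not real Wikipedia content).
sentence = "eminem thinks kendrick lamar is a genius rapper"
doc_lamar = "kendrick lamar is an american rapper from compton"
doc_idaho = "kendrick is a city in idaho"

sim_lamar = cosine_similarity(sentence, doc_lamar)
sim_idaho = cosine_similarity(sentence, doc_idaho)
```

As on the slide, the candidate whose page shares more vocabulary with the sentence receives the higher score.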
  • 32. Global features [Figure, built up in steps: query q in Document r; candidate entities e1 and e2; their inlinks and outlinks (e3 to e7); the anchors of those entities (Anchor 1 to Anchor 5C); and the anchors marked as support (1A, 5C, 4B)]
  • 33. But
      - Too slow in real life
      - Solution: set of linked entities (inlinks / outlinks) as “virtual document”
  • 34. Related entity document
      ["Entertainment Weekly", "Compton, California", "California", "Rapping", "songwriter", "Hip hop music", "Top Dawg Entertainment", "Aftermath Entertainment", "Interscope Records", "Black Hippy", "Dr. Dre", "The Game (rapper)", "Jay Rock", "J. Cole", "Hip hop music", "recording artist", "Compton, California", "Carson, California", "Top Dawg Entertainment", "Aftermath Entertainment", "Interscope Records", "West Coast hip hop", "Supergroup (music)", "Black Hippy", "rapper", "Schoolboy Q", "Jay Rock", "Ab-Soul", "Overly Dedicated", "independent album", "Section.80", "iTunes Store", "Major record label", "Dr. Dre", "Game (rapper)", "Drake (entertainer)", "Young Jeezy", "Talib Kweli", "Busta Rhymes", "E-40", "Warren G", …]
  • 35. Similarity between sentence s and the virtual document, as an approximation of related entities
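A rough sketch of the virtual-document idea: collect the titles of a candidate's inlinks and outlinks and score the sentence against them. The token-overlap score is a simple stand-in chosen for illustration (the slides do not name the similarity function), and the link tables are toy data.

```python
def virtual_document(entity, inlinks, outlinks):
    """Titles of all entities that link to, or are linked from, `entity`."""
    return inlinks.get(entity, []) + outlinks.get(entity, [])


def overlap_score(sentence, related_titles):
    """Fraction of sentence tokens that also occur in the related-entity titles."""
    sentence_tokens = sentence.lower().split()
    if not sentence_tokens:
        return 0.0
    title_tokens = {
        token for title in related_titles for token in title.lower().split()
    }
    hits = sum(1 for token in sentence_tokens if token in title_tokens)
    return hits / len(sentence_tokens)


# Toy link tables, made up for illustration.
inlinks = {"Kendrick_Lamar": ["Dr. Dre", "Black Hippy"], "Kendrick,_Idaho": ["Idaho"]}
outlinks = {"Kendrick_Lamar": ["Hip hop music", "Compton, California"]}

sentence = "dr. dre praises kendrick"
score_lamar = overlap_score(sentence, virtual_document("Kendrick_Lamar", inlinks, outlinks))
score_idaho = overlap_score(sentence, virtual_document("Kendrick,_Idaho", inlinks, outlinks))
```

The point of the approximation is that a single text comparison per candidate replaces the expensive traversal of inlinks, outlinks and their anchors.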
  • 36. Supervised Linking
      - Feature vector for each sentence-entity pair
      - Train a Random Forest classifier
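A sketch of the data preparation step only: turning labelled sentence-entity pairs into a feature matrix and a label vector. The feature subset, the labels and the helper name are illustrative assumptions; the resulting lists have the row-per-example shape that a Random Forest implementation such as scikit-learn's `RandomForestClassifier` expects in `fit(X, y)`.

```python
FEATURE_NAMES = ["link_probability", "commonness", "sense_probability"]


def training_rows(examples, feature_names):
    """Turn labelled sentence-entity pairs into (X, y) lists for a classifier."""
    X, y = [], []
    for example in examples:
        X.append([float(example["features"][name]) for name in feature_names])
        y.append(1 if example["is_target"] else 0)
    return X, y


# Toy examples for the n-gram "Kendrick"; the labels are made up for illustration.
examples = [
    {
        "entity": "Kendrick_Lamar",
        "is_target": True,
        "features": {"link_probability": 0.005, "commonness": 0.083, "sense_probability": 0.0004},
    },
    {
        "entity": "Kendrick,_Idaho",
        "is_target": False,
        "features": {"link_probability": 0.005, "commonness": 0.333, "sense_probability": 0.0016},
    },
]

X, y = training_rows(examples, FEATURE_NAMES)
```

Keeping the feature order in one list (`FEATURE_NAMES`) ensures that training and prediction build their vectors identically.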
  • 37. Local vs. global
      - Hybrid > Local | Global
      - Local & Global > Hybrid
      - Approaches are complementary
      - Global preferred for highly ambiguous entity mentions (i.e., short ones)
  • 38. Etc…
      - Open challenges:
          - Out-of-KB entities
          - Knowledge Base creation
  • 39. Thanks!
      David Graus, d.p.graus@uva.nl