SlideShare a Scribd company logo
1 of 20
Dynamic Collective Entity
Representations for Entity Ranking
David Graus, Manos Tsagkias, Wouter Weerkamp, Edgar Meij, Maarten de Rijke
2
3
4
Entity search?
 Index = Knowledge Base (= Wikipedia)
 Documents = Entities
 “Real world entities” have a single representation
(in KB)
5
Representation is not static
 People talk about entities all the time
 Associations between words and entities change
over time
6
Example 1: News events
7
Example 2: Social media chatter
8
Dynamic Collective Entity
Representations
 Use “collective intelligence” to mine entity
descriptions to enrich representation.
 Is like document expansion (add terms found
through explicit links)
 Is not query expansion (terms found through
predicted links)
9
Advantages
 Cheap: Change document in index, leverage tried &
tested retrieval algorithms
 Free “smoothing”: (e.g., tweets) may capture ‘newly
evolving’ word associations (Ferguson shooting) and
incorporate out-of-document terms
 “move relevant documents closer to queries” (= close
the gap between searcher vocabulary & docs in index)
10
Haven’t we seen this before?
 Anchors & queries in particular have been shown to
improve retrieval [1]
 Tweets have been shown to be similar to anchors [2]
 Social tags, same [3]
 But:
 in batch (i.e., add data, see how it affects retrieval)
 single source
[1] T. Westerveld, W. Kraaij, and D. Hiemstra. Retrieving web pages using content, links, urls and anchors. TREC 2001
[2] G. Mishne and J. Lin. Twanchor text: A preliminary study of the value of tweets as anchor text. SIGIR ’12
[3] C.-J. Lee and W. B. Croft. Incorporating social anchors for ad hoc retrieval. OAIR ’13
11
Description sourcesAnthropornis nordenskjoeldi
Anthropornis
Nordenskjoeld's Giant Penguin
Eocene
Oligocene
Animal
Chordate
Aves
Sphenisciformes
Spheniscidae
...
emperor penguin
Nordenskjoeld's Giant Penguin
Anthropornis nordenskjoeldi
Nordenskjoeld's giant penguin
Anthropornis
Eocene birds
Oligocene birds
Extinct penguins
Oligocene extinctions
Bird genera
KB Anchors
KB Categories
KB Redirects
KB Links
Anthropornis nordenskjoeldi
Anthropornis nordenskjoeldi
Web Anchors
megafauna
Tags
Tweets
biggest penguin
anthropornis
extinct penguin
prehistoric birds
Queries
12
Challenge
 Heterogeneity
1. Description sources
2. Entities
 Dynamic nature
 Content changes over time
13
Method: Adaptive ranking
 Supervised single-field weighting model
 Features:
 field similarity: retrieval score per field.
 field “importance”: length, novel terms, etc.
 entity “importance”: time since last update.
 (Re-)learn optimal weights from clicks
14
Experimental setup
1. Data:
 MSN Query log (62,841 queries + clicks (on entities))
 Each query is treated as a time unit
 For each query:
 Produce ranking
 Observe click
 Evaluate ranking (MAP/P@1)
 Expand entities (w/ dynamic descriptions)
 [re-train ranker]
15
Main results
 Comparing effectiveness of diff. description
sources
 Comparing adaptive vs. non-adaptive ranker
performance
16
Description sources
MAP
No. of queries
17
Feature weights over time
Relativefeatureimportance
No. of queries
18
Non-adaptive vs. adaptive ranking
19
In summary
 Expanding entity representations with different
sources enables better matching of queries to
entities
 As new content comes in, it is beneficial to retrain
the ranker
 Informing ranker of “expansion state” further
improves performance
20
Thank you
 (Also, thank you WSDM & SIGIR travel grants)

More Related Content

What's hot

Knowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, BonnKnowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, BonnTodd Vision
 
Role of Amyloid Burden in cognitive decline
Role of Amyloid Burden in cognitive decline Role of Amyloid Burden in cognitive decline
Role of Amyloid Burden in cognitive decline Ravi Madduri
 
Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck Todd Vision
 
Class intro cm7-referencinginternet_2011
Class intro cm7-referencinginternet_2011Class intro cm7-referencinginternet_2011
Class intro cm7-referencinginternet_2011Penn State University
 
Research data and scholarly publications: going from casual acquaintances to ...
Research data and scholarly publications: going from casual acquaintances to ...Research data and scholarly publications: going from casual acquaintances to ...
Research data and scholarly publications: going from casual acquaintances to ...Todd Vision
 
The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...Todd Vision
 
Globus Genomics: Democratizing NGS Analysis
Globus Genomics: Democratizing NGS AnalysisGlobus Genomics: Democratizing NGS Analysis
Globus Genomics: Democratizing NGS AnalysisRavi Madduri
 
W3C HCLS Scientific Discourse Task Autumn 2010
W3C HCLS Scientific Discourse Task Autumn 2010W3C HCLS Scientific Discourse Task Autumn 2010
W3C HCLS Scientific Discourse Task Autumn 2010tim.clark
 

What's hot (8)

Knowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, BonnKnowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, Bonn
 
Role of Amyloid Burden in cognitive decline
Role of Amyloid Burden in cognitive decline Role of Amyloid Burden in cognitive decline
Role of Amyloid Burden in cognitive decline
 
Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck
 
Class intro cm7-referencinginternet_2011
Class intro cm7-referencinginternet_2011Class intro cm7-referencinginternet_2011
Class intro cm7-referencinginternet_2011
 
Research data and scholarly publications: going from casual acquaintances to ...
Research data and scholarly publications: going from casual acquaintances to ...Research data and scholarly publications: going from casual acquaintances to ...
Research data and scholarly publications: going from casual acquaintances to ...
 
The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...
 
Globus Genomics: Democratizing NGS Analysis
Globus Genomics: Democratizing NGS AnalysisGlobus Genomics: Democratizing NGS Analysis
Globus Genomics: Democratizing NGS Analysis
 
W3C HCLS Scientific Discourse Task Autumn 2010
W3C HCLS Scientific Discourse Task Autumn 2010W3C HCLS Scientific Discourse Task Autumn 2010
W3C HCLS Scientific Discourse Task Autumn 2010
 

Viewers also liked

yourHistory - entity linking for a personalized timeline of historic events
yourHistory - entity linking for a personalized timeline of historic eventsyourHistory - entity linking for a personalized timeline of historic events
yourHistory - entity linking for a personalized timeline of historic eventsDavid Graus
 
Understanding Email Traffic
Understanding Email TrafficUnderstanding Email Traffic
Understanding Email TrafficDavid Graus
 
Instance Matching
Instance Matching Instance Matching
Instance Matching Robert Isele
 
Dynamic Collective Entity Representations for Entity Ranking
Dynamic Collective Entity Representations for Entity RankingDynamic Collective Entity Representations for Entity Ranking
Dynamic Collective Entity Representations for Entity RankingDavid Graus
 
Understanding Email Traffic (talk @ E-Discovery NL Symposium)
Understanding Email Traffic (talk @ E-Discovery NL Symposium)Understanding Email Traffic (talk @ E-Discovery NL Symposium)
Understanding Email Traffic (talk @ E-Discovery NL Symposium)David Graus
 
David Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27th
David Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27thDavid Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27th
David Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27thDavid Graus
 
Analyzing and Predicting Task Reminders
Analyzing and Predicting Task RemindersAnalyzing and Predicting Task Reminders
Analyzing and Predicting Task RemindersDavid Graus
 
Big Data & Machine Learning - Mogelijkheden & Valkuilen
Big Data & Machine Learning - Mogelijkheden & ValkuilenBig Data & Machine Learning - Mogelijkheden & Valkuilen
Big Data & Machine Learning - Mogelijkheden & ValkuilenDavid Graus
 

Viewers also liked (8)

yourHistory - entity linking for a personalized timeline of historic events
yourHistory - entity linking for a personalized timeline of historic eventsyourHistory - entity linking for a personalized timeline of historic events
yourHistory - entity linking for a personalized timeline of historic events
 
Understanding Email Traffic
Understanding Email TrafficUnderstanding Email Traffic
Understanding Email Traffic
 
Instance Matching
Instance Matching Instance Matching
Instance Matching
 
Dynamic Collective Entity Representations for Entity Ranking
Dynamic Collective Entity Representations for Entity RankingDynamic Collective Entity Representations for Entity Ranking
Dynamic Collective Entity Representations for Entity Ranking
 
Understanding Email Traffic (talk @ E-Discovery NL Symposium)
Understanding Email Traffic (talk @ E-Discovery NL Symposium)Understanding Email Traffic (talk @ E-Discovery NL Symposium)
Understanding Email Traffic (talk @ E-Discovery NL Symposium)
 
David Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27th
David Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27thDavid Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27th
David Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27th
 
Analyzing and Predicting Task Reminders
Analyzing and Predicting Task RemindersAnalyzing and Predicting Task Reminders
Analyzing and Predicting Task Reminders
 
Big Data & Machine Learning - Mogelijkheden & Valkuilen
Big Data & Machine Learning - Mogelijkheden & ValkuilenBig Data & Machine Learning - Mogelijkheden & Valkuilen
Big Data & Machine Learning - Mogelijkheden & Valkuilen
 

Similar to Dynamic Collective Entity Representations for Entity Ranking

Information retrieval and extraction
Information retrieval and extractionInformation retrieval and extraction
Information retrieval and extractionAnkit Sharma
 
December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...
December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...
December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...DeVonne Parks, CEM
 
Scott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data delugeScott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data delugeGigaScience, BGI Hong Kong
 
Research Data Sharing LERU
Research Data Sharing LERU Research Data Sharing LERU
Research Data Sharing LERU LIBER Europe
 
UKSG webinar: Quo vadis? Getting there with linked data with Gordon Dunsire
UKSG webinar: Quo vadis? Getting there with linked data with Gordon DunsireUKSG webinar: Quo vadis? Getting there with linked data with Gordon Dunsire
UKSG webinar: Quo vadis? Getting there with linked data with Gordon DunsireUKSG: connecting the knowledge community
 
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...andrea huang
 
Toward a news data science
Toward a news data scienceToward a news data science
Toward a news data scienceDaemin Park
 
Interpretation, Context, and Metadata: Examples from Open Context
Interpretation, Context, and Metadata: Examples from Open ContextInterpretation, Context, and Metadata: Examples from Open Context
Interpretation, Context, and Metadata: Examples from Open ContextEric Kansa
 
Material Cultures2010 Alexandre Monnin
Material Cultures2010 Alexandre MonninMaterial Cultures2010 Alexandre Monnin
Material Cultures2010 Alexandre MonninAlexandre Monnin
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasAngelo Salatino
 
Donat Agosti & Norman F. Johnson - Copyright: the new taxonomic impediment
Donat Agosti & Norman F. Johnson - Copyright: the new taxonomic impedimentDonat Agosti & Norman F. Johnson - Copyright: the new taxonomic impediment
Donat Agosti & Norman F. Johnson - Copyright: the new taxonomic impedimentICZN
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasAngelo Salatino
 
Multi-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation RecommendationMulti-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation Recommendationkrisztianbalog
 
ContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data SeminarContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data SeminarJenny Molloy
 
Visual Analysis of Concept Change and Information Diffusion
Visual Analysis of Concept Change and Information DiffusionVisual Analysis of Concept Change and Information Diffusion
Visual Analysis of Concept Change and Information Diffusioninscit2006
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"GigaScience, BGI Hong Kong
 

Similar to Dynamic Collective Entity Representations for Entity Ranking (20)

Shorthouse
ShorthouseShorthouse
Shorthouse
 
Information retrieval and extraction
Information retrieval and extractionInformation retrieval and extraction
Information retrieval and extraction
 
December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...
December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...
December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...
 
Data Publishing in Archaeozoology
Data Publishing in ArchaeozoologyData Publishing in Archaeozoology
Data Publishing in Archaeozoology
 
Scott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data delugeScott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data deluge
 
Research Data Sharing LERU
Research Data Sharing LERU Research Data Sharing LERU
Research Data Sharing LERU
 
UKSG webinar: Quo vadis? Getting there with linked data with Gordon Dunsire
UKSG webinar: Quo vadis? Getting there with linked data with Gordon DunsireUKSG webinar: Quo vadis? Getting there with linked data with Gordon Dunsire
UKSG webinar: Quo vadis? Getting there with linked data with Gordon Dunsire
 
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
 
Toward a news data science
Toward a news data scienceToward a news data science
Toward a news data science
 
Interpretation, Context, and Metadata: Examples from Open Context
Interpretation, Context, and Metadata: Examples from Open ContextInterpretation, Context, and Metadata: Examples from Open Context
Interpretation, Context, and Metadata: Examples from Open Context
 
Material Cultures2010 Alexandre Monnin
Material Cultures2010 Alexandre MonninMaterial Cultures2010 Alexandre Monnin
Material Cultures2010 Alexandre Monnin
 
Resources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the WebResources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the Web
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
Donat Agosti & Norman F. Johnson - Copyright: the new taxonomic impediment
Donat Agosti & Norman F. Johnson - Copyright: the new taxonomic impedimentDonat Agosti & Norman F. Johnson - Copyright: the new taxonomic impediment
Donat Agosti & Norman F. Johnson - Copyright: the new taxonomic impediment
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
Multi-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation RecommendationMulti-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation Recommendation
 
ContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data SeminarContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data Seminar
 
Our World is Socio-technical
Our World is Socio-technicalOur World is Socio-technical
Our World is Socio-technical
 
Visual Analysis of Concept Change and Information Diffusion
Visual Analysis of Concept Change and Information DiffusionVisual Analysis of Concept Change and Information Diffusion
Visual Analysis of Concept Change and Information Diffusion
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"
 

More from David Graus

Pragmatic ethical and fair AI for data scientists
Pragmatic ethical and fair AI for data scientistsPragmatic ethical and fair AI for data scientists
Pragmatic ethical and fair AI for data scientistsDavid Graus
 
Bias in Recommendations
Bias in RecommendationsBias in Recommendations
Bias in RecommendationsDavid Graus
 
RecSys in the Media Industry: Relevance, Recency, Popularity, and Diversity.
RecSys in the Media Industry: Relevance, Recency, Popularity, and Diversity.RecSys in the Media Industry: Relevance, Recency, Popularity, and Diversity.
RecSys in the Media Industry: Relevance, Recency, Popularity, and Diversity.David Graus
 
CAT/AI: Computer Assisted Translation 
Assessment for Impact
CAT/AI: Computer Assisted Translation 
Assessment for ImpactCAT/AI: Computer Assisted Translation 
Assessment for Impact
CAT/AI: Computer Assisted Translation 
Assessment for ImpactDavid Graus
 
Opening the Black Box of User Profiles in Content-based Recommender Systems
Opening the Black Box of User Profiles in Content-based Recommender SystemsOpening the Black Box of User Profiles in Content-based Recommender Systems
Opening the Black Box of User Profiles in Content-based Recommender SystemsDavid Graus
 
Zoeken, vinden, en aanbevelen: personalisatie vs. privacy
Zoeken, vinden, en aanbevelen: personalisatie vs. privacyZoeken, vinden, en aanbevelen: personalisatie vs. privacy
Zoeken, vinden, en aanbevelen: personalisatie vs. privacyDavid Graus
 
Layman's Talk: Entities of Interest --- Discovery in Digital Traces
Layman's Talk: Entities of Interest --- Discovery in Digital TracesLayman's Talk: Entities of Interest --- Discovery in Digital Traces
Layman's Talk: Entities of Interest --- Discovery in Digital TracesDavid Graus
 
Financial News Mining @ PyData Amsterdam
Financial News Mining @ PyData AmsterdamFinancial News Mining @ PyData Amsterdam
Financial News Mining @ PyData AmsterdamDavid Graus
 
De Macht van Data --- Hoe algoritmen ons leven vormgeven
De Macht van Data --- Hoe algoritmen ons leven vormgevenDe Macht van Data --- Hoe algoritmen ons leven vormgeven
De Macht van Data --- Hoe algoritmen ons leven vormgevenDavid Graus
 
Financial News Mining @ FD Mediagroep/Company.info
Financial News Mining @ FD Mediagroep/Company.infoFinancial News Mining @ FD Mediagroep/Company.info
Financial News Mining @ FD Mediagroep/Company.infoDavid Graus
 
Generating Pseudo-ground Truth for Detecting New Concepts in Social Streams
Generating Pseudo-ground Truth for Detecting New Concepts in Social StreamsGenerating Pseudo-ground Truth for Detecting New Concepts in Social Streams
Generating Pseudo-ground Truth for Detecting New Concepts in Social StreamsDavid Graus
 
Semantic Search in E-Discovery
Semantic Search in E-DiscoverySemantic Search in E-Discovery
Semantic Search in E-DiscoveryDavid Graus
 
Semantic Annotation of the Cyttron Database
Semantic Annotation of the Cyttron DatabaseSemantic Annotation of the Cyttron Database
Semantic Annotation of the Cyttron DatabaseDavid Graus
 
Semantic annotation, clustering and visualization
Semantic annotation, clustering and visualizationSemantic annotation, clustering and visualization
Semantic annotation, clustering and visualizationDavid Graus
 

More from David Graus (14)

Pragmatic ethical and fair AI for data scientists
Pragmatic ethical and fair AI for data scientistsPragmatic ethical and fair AI for data scientists
Pragmatic ethical and fair AI for data scientists
 
Bias in Recommendations
Bias in RecommendationsBias in Recommendations
Bias in Recommendations
 
RecSys in the Media Industry: Relevance, Recency, Popularity, and Diversity.
RecSys in the Media Industry: Relevance, Recency, Popularity, and Diversity.RecSys in the Media Industry: Relevance, Recency, Popularity, and Diversity.
RecSys in the Media Industry: Relevance, Recency, Popularity, and Diversity.
 
CAT/AI: Computer Assisted Translation 
Assessment for Impact
CAT/AI: Computer Assisted Translation 
Assessment for ImpactCAT/AI: Computer Assisted Translation 
Assessment for Impact
CAT/AI: Computer Assisted Translation 
Assessment for Impact
 
Opening the Black Box of User Profiles in Content-based Recommender Systems
Opening the Black Box of User Profiles in Content-based Recommender SystemsOpening the Black Box of User Profiles in Content-based Recommender Systems
Opening the Black Box of User Profiles in Content-based Recommender Systems
 
Zoeken, vinden, en aanbevelen: personalisatie vs. privacy
Zoeken, vinden, en aanbevelen: personalisatie vs. privacyZoeken, vinden, en aanbevelen: personalisatie vs. privacy
Zoeken, vinden, en aanbevelen: personalisatie vs. privacy
 
Layman's Talk: Entities of Interest --- Discovery in Digital Traces
Layman's Talk: Entities of Interest --- Discovery in Digital TracesLayman's Talk: Entities of Interest --- Discovery in Digital Traces
Layman's Talk: Entities of Interest --- Discovery in Digital Traces
 
Financial News Mining @ PyData Amsterdam
Financial News Mining @ PyData AmsterdamFinancial News Mining @ PyData Amsterdam
Financial News Mining @ PyData Amsterdam
 
De Macht van Data --- Hoe algoritmen ons leven vormgeven
De Macht van Data --- Hoe algoritmen ons leven vormgevenDe Macht van Data --- Hoe algoritmen ons leven vormgeven
De Macht van Data --- Hoe algoritmen ons leven vormgeven
 
Financial News Mining @ FD Mediagroep/Company.info
Financial News Mining @ FD Mediagroep/Company.infoFinancial News Mining @ FD Mediagroep/Company.info
Financial News Mining @ FD Mediagroep/Company.info
 
Generating Pseudo-ground Truth for Detecting New Concepts in Social Streams
Generating Pseudo-ground Truth for Detecting New Concepts in Social StreamsGenerating Pseudo-ground Truth for Detecting New Concepts in Social Streams
Generating Pseudo-ground Truth for Detecting New Concepts in Social Streams
 
Semantic Search in E-Discovery
Semantic Search in E-DiscoverySemantic Search in E-Discovery
Semantic Search in E-Discovery
 
Semantic Annotation of the Cyttron Database
Semantic Annotation of the Cyttron DatabaseSemantic Annotation of the Cyttron Database
Semantic Annotation of the Cyttron Database
 
Semantic annotation, clustering and visualization
Semantic annotation, clustering and visualizationSemantic annotation, clustering and visualization
Semantic annotation, clustering and visualization
 

Recently uploaded

Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomyDrAnita Sharma
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024Jene van der Heide
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXDole Philippines School
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalMAESTRELLAMesa2
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubaikojalkojal131
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsSérgio Sacani
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》rnrncn29
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 

Recently uploaded (20)

Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomy
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and Vertical
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive stars
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 

Dynamic Collective Entity Representations for Entity Ranking

  • 1. Dynamic Collective Entity Representations for Entity Ranking David Graus, Manos Tsagkias, Wouter Weerkamp, Edgar Meij, Maarten de Rijke
  • 2. 2
  • 3. 3
  • 4. 4 Entity search?  Index = Knowledge Base (= Wikipedia)  Documents = Entities  “Real world entities” have a single representation (in KB)
  • 5. 5 Representation is not static  People talk about entities all the time  Associations between words and entities change over time
  • 7. 7 Example 2: Social media chatter
  • 8. 8 Dynamic Collective Entity Representations  Use “collective intelligence” to mine entity descriptions to enrich representation.  Is like document expansion (add terms found through explicit links)  Is not query expansion (terms found through predicted links)
  • 9. 9 Advantages  Cheap: Change document in index, leverage tried & tested retrieval algorithms  Free “smoothing”: (e.g., tweets) may capture ‘newly evolving’ word associations (Ferguson shooting) and incorporate out-of-document terms  “move relevant documents closer to queries” (= close the gap between searcher vocabulary & docs in index)
  • 10. 10 Haven’t we seen this before?  Anchors & queries in particular have been shown to improve retrieval [1]  Tweets have been shown to be similar to anchors [2]  Social tags, same [3]  But:  in batch (i.e., add data, see how it affects retrieval)  single source [1] T. Westerveld, W. Kraaij, and D. Hiemstra. Retrieving web pages using content, links, urls and anchors. TREC 2001 [2] G. Mishne and J. Lin. Twanchor text: A preliminary study of the value of tweets as anchor text. SIGIR ’12 [3] C.-J. Lee and W. B. Croft. Incorporating social anchors for ad hoc retrieval. OAIR ’13
  • 11. 11 Description sourcesAnthropornis nordenskjoeldi Anthropornis Nordenskjoeld's Giant Penguin Eocene Oligocene Animal Chordate Aves Sphenisciformes Spheniscidae ... emperor penguin Nordenskjoeld's Giant Penguin Anthropornis nordenskjoeldi Nordenskjoeld's giant penguin Anthropornis Eocene birds Oligocene birds Extinct penguins Oligocene extinctions Bird genera KB Anchors KB Categories KB Redirects KB Links Anthropornis nordenskjoeldi Anthropornis nordenskjoeldi Web Anchors megafauna Tags Tweets biggest penguin anthropornis extinct penguin prehistoric birds Queries
  • 12. 12 Challenge  Heterogeneity 1. Description sources 2. Entities  Dynamic nature  Content changes over time
  • 13. 13 Method: Adaptive ranking  Supervised single-field weighting model  Features:  field similarity: retrieval score per field.  field “importance”: length, novel terms, etc.  entity “importance”: time since last update.  (Re-)learn optimal weights from clicks
  • 14. 14 Experimental setup 1. Data:  MSN Query log (62,841 queries + clicks (on entities))  Each query is treated as a time unit  For each query:  Produce ranking  Observe click  Evaluate ranking (MAP/P@1)  Expand entities (w/ dynamic descriptions)  [re-train ranker]
  • 15. 15 Main results  Comparing effectiveness of diff. description sources  Comparing adaptive vs. non-adaptive ranker performance
  • 17. 17 Feature weights over time Relativefeatureimportance No. of queries
  • 19. 19 In summary  Expanding entity representations with different sources enables better matching of queries to entities  As new content comes in, it is beneficial to retrain the ranker  Informing ranker of “expansion state” further improves performance
  • 20. 20 Thank you  (Also, thank you WSDM & SIGIR travel grants)

Editor's Notes

  1. first entities & structure, i get to show the mandatory entity search example
  2. you are not interested in documents but in things: person/artist kendrick lamar referring to him w/ his former stage name
  3. so it is like web search, but the units of retrieval are real life entities, so we can collect data for them
  4. This is what we try to leverage in this work
  5. July 31st, after August 7th -> Added content, new words associations
  6. this looks a bit extreme, because there’s swearing but there’s a serious intuition here; vocabulary gap (formal KB, informal chatter)
  7. our method aims to leverage this enrich representation + close the gap
  8. of collective int/descr sources
  9. we look at a scenario where the expansions come in a streaming manner
  10. Fielded document representation
  11. You could do vanilla retrieval. But two challenges arise; description sources differ along several dimensions (e.g., volume, quality, novelty) head entities are likely to receive a larger number of external descriptions than tail entities. content changes over time, so expansions may accumulate and “swamp” the representation
  12. Our solution is to dynamically learn how to combine fields into single representation, Features (more detail in paper); field similarity features (per field) = query–field similarity scores. field importance features (per field) to inform the ranker of the status of the field at that time (i.e., more and novel content) entity importance (to favor “recently” updated entities) (what about experimental setup?)
  13. Took all queries that yield Wiki clicks. Top-k retrieval, extract features Allows to track performance over time
  14. in this talk I focus on the contribution of sources and adaptive vs. static ranker
  15. 1. Each source contributes to better ranking; Tags/web anchors do best, tweets are significantly > KB 2. Dynamic sources have higher “learning rates” (suggests that newly incoming data is successfully incorporated) 3. Tags starts under web but approaches it; new tags improve [NEXT] To see the effect of incoming data, feature weights
  16. - Static go down, dynamic go up (suggests retraining is important w/ dynamic expansions) - Tweets marginally, but as we know KB+Tweets > KB, the tweets do help - Not shown; static expansions stay roughly the same [NEXT] Increasing field weight + increased performance suggests retraining is needed, next;
  17. 1. [LEFT] Lower performance overall (more data w/o more training queries) 2. [LEFT] Dynamic ones higher slopes; so newly incoming data does help even in static 3. [RIGHT] same patterns but tags+web do comparatively better (because of swamping?) [END] higher performance: retraining increases ranker’s ability in optimally combining descriptions into a single representation
  18. More data helps, but to optimally benefit you need to inform your ranker