London Information Retrieval Meetup
Latest Updates
Anna Ruggero, R&D Software Engineer
11th February 2020
Entity Search on Virtual Documents
Created with Graph Embeddings
Sease
Search Services
● Open Source Enthusiasts
● Apache Lucene/Solr experts
● Community Contributors
● Active Researchers
● Hot Trends: Learning To Rank, Document Similarity, Search Quality Evaluation, Relevancy Tuning
Who I am
● R&D Search Software Engineer
● Master's Degree in Computer Science Engineering
● Big Data, Information Retrieval
● Organist, Music lover
Overview
● Background
● Solution Design
● Evaluation
● Discussion and Conclusion
Entity Search
● Entity: a uniquely identifiable object or thing characterized by its name(s), type(s) and relationships to other entities.
[Figure: the entity <dbr:Michael_Schumacher>, with name "Michael Schumacher", type "Racing Driver" and birth date 1969-01-03]
What is Entity search?
● Entity search is the search paradigm of organizing and accessing information centered around entities and their attributes and relationships.
● Focus: the task of ad-hoc entity retrieval. It aims to answer information needs related to a particular entity, expressed in unconstrained natural language and resolved using a collection of structured data.
Entities Representation
How can we represent and manage this type of information?
RDF
State of the Art
[Figure: an entity described by RDF triples and accessed through text retrieval or SPARQL]
<Michael_Schumacher> <name> <Michael>
<Michael_Schumacher> <type> <RacingDriver>
<Michael_Schumacher> <BirthDate> <1969-01-03>
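The three triples above can be held as plain (subject, predicate, object) tuples. The sketch below is a minimal illustration, not code from the talk: it mimics the structured access that SPARQL offers with a triple pattern in which `None` plays the role of a variable. The names `TRIPLES` and `match` are hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# The example triples from the slide, as (subject, predicate, object).
TRIPLES: List[Triple] = [
    ("Michael_Schumacher", "name", "Michael"),
    ("Michael_Schumacher", "type", "RacingDriver"),
    ("Michael_Schumacher", "BirthDate", "1969-01-03"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every non-None position.

    None acts like a SPARQL variable: it matches anything.
    """
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly: SELECT ?s WHERE { ?s <type> <RacingDriver> }
drivers = [s for s, _, _ in match(TRIPLES, p="type", o="RacingDriver")]
print(drivers)  # ['Michael_Schumacher']
```

A pattern needs exact predicate and object names, while text retrieval accepts free-form queries. Ad-hoc entity retrieval targets the second case.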
State of the Art
Document creation:
● Structure-based approach: fielded representation with a field associated with each predicate or class of predicates.
● Text-based approach: triple concatenation in a bag-of-words approach.
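The two document-creation styles can be sketched on the Michael Schumacher triples. This is an illustrative sketch under assumptions: the function names are hypothetical, and the exact text each approach puts into a document (here, whole triples for the bag of words) is a guess.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

triples: List[Triple] = [
    ("Michael_Schumacher", "name", "Michael"),
    ("Michael_Schumacher", "type", "RacingDriver"),
    ("Michael_Schumacher", "BirthDate", "1969-01-03"),
]


def fielded_document(entity: str, triples: List[Triple]) -> Dict[str, List[str]]:
    """Structure-based: one field per predicate, holding its object values."""
    doc: Dict[str, List[str]] = defaultdict(list)
    for s, p, o in triples:
        if s == entity:
            doc[p].append(o)
    return dict(doc)


def bag_of_words_document(entity: str, triples: List[Triple]) -> str:
    """Text-based: concatenate every triple of the entity into one string."""
    return " ".join(f"{s} {p} {o}" for s, p, o in triples if s == entity)


fielded = fielded_document("Michael_Schumacher", triples)
flat = bag_of_words_document("Michael_Schumacher", triples)
print(fielded)
print(flat)
```

The fielded form keeps predicates separate so that fields can be weighted differently at query time. The flat form loses that structure but works with any text index.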
What’s new
What do we want to introduce with our approach?
Compared with the state of the art, our approach aims to create documents in which entities are described by information that also takes into account the context in which they are located.
[Figure: example entities and their context: Tree, Hyde Park, Squirrel, Nut]
Graph Embeddings
● Graph embedding: a method that yields a numerical vector representation of the nodes and edges of a graph. It represents the topology and structure of the graph through a vector or a set of vectors.
● Graph embeddings obtain these representations through the use of a neural network.
Graph Embeddings
[Figure: Word2Vec skip-gram model]
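Skip-gram trains a model to predict the tokens around a center token, so its training data is a list of (center, context) pairs drawn from a sliding window. The sketch below builds only those pairs (the neural network is not shown). The helper name `skipgram_pairs` is hypothetical.

```python
from typing import List, Tuple


def skipgram_pairs(sequence: List[str], window: int) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every position in the sequence."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, sequence[j]))
    return pairs


pairs = skipgram_pairs(["a", "b", "c", "d"], window=1)
print(pairs)
# [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b'), ('c', 'd'), ('d', 'c')]
```

For graphs, the sequence is a walk over nodes instead of a sentence of words.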
Graph Embeddings
Node2Vec is based on Word2Vec and also aims to capture relationships such as homophily and structural roles, through walk generation based on BFS and DFS strategies.
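The BFS/DFS trade-off in Node2Vec comes from a second-order random walk controlled by the return parameter p and the in-out parameter q. The sketch below shows that walk for an unweighted, undirected graph. It is a minimal illustration, not the implementation used in the talk: the function names are hypothetical and the edges of the toy graph are invented.

```python
import random
from typing import Dict, List, Optional, Set

Graph = Dict[str, Set[str]]  # undirected, unweighted adjacency sets


def step_weights(graph: Graph, prev: Optional[str], cur: str,
                 p: float, q: float) -> Dict[str, float]:
    """Unnormalized node2vec transition weights out of `cur`."""
    weights = {}
    for nxt in sorted(graph[cur]):
        if prev is None:
            weights[nxt] = 1.0        # first step: uniform choice
        elif nxt == prev:
            weights[nxt] = 1.0 / p    # go back to the previous node
        elif nxt in graph[prev]:
            weights[nxt] = 1.0        # stay close to prev (BFS-like)
        else:
            weights[nxt] = 1.0 / q    # move away from prev (DFS-like)
    return weights


def biased_walk(graph: Graph, start: str, walk_length: int,
                p: float, q: float, rng: random.Random) -> List[str]:
    """Generate one walk of up to `walk_length` nodes from `start`."""
    walk = [start]
    prev = None
    while len(walk) < walk_length:
        cur = walk[-1]
        weights = step_weights(graph, prev, cur, p, q)
        if not weights:  # dead end: node without neighbours
            break
        nodes = list(weights)
        nxt = rng.choices(nodes, weights=[weights[n] for n in nodes], k=1)[0]
        prev = cur
        walk.append(nxt)
    return walk


# Toy graph; the edges are invented for illustration.
graph: Graph = {
    "Hyde_Park": {"Tree", "Squirrel"},
    "Tree": {"Hyde_Park", "Squirrel"},
    "Squirrel": {"Hyde_Park", "Tree", "Nut"},
    "Nut": {"Squirrel"},
}
weights = step_weights(graph, prev="Tree", cur="Squirrel", p=2.0, q=0.5)
walk = biased_walk(graph, "Hyde_Park", 5, p=2.0, q=0.5, rng=random.Random(0))
print(weights)  # {'Hyde_Park': 1.0, 'Nut': 2.0, 'Tree': 0.5}
```

With q below 1 the walk prefers nodes far from where it came from (DFS-like, homophily). With q above 1 it stays near the previous node (BFS-like, structural roles).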
Solution Design
[Figure: pipeline of the solution. RDF triples → Entities representation (Node2Vec) → Entity embeddings → Clustering → Embeddings clusters → Documents creation (from the embeddings clusters and the RDF triples) → Documents → Ranking system → Ranking list of documents → Entity retrieval → Ranking list of entities]
DBpedia
Experimental Setup
Entities Representations
● We obtain the entity embeddings by applying Node2Vec with these parameters:
  ● Embedding dimension
  ● Walk length
  ● Number of walks
  ● p (return parameter)
  ● q (in-out parameter)
  ● Workers
  ● Window size
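The parameters listed above can be collected in one configuration object. The sketch below is only an illustration of what each one controls. The class name `Node2VecParams` is hypothetical, and every value is a placeholder, not a setting reported in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node2VecParams:
    """Node2Vec hyperparameters; all default values are placeholders."""
    dimensions: int = 128   # embedding dimension: length of each entity vector
    walk_length: int = 80   # number of nodes visited by each random walk
    num_walks: int = 10     # walks started from every node
    p: float = 1.0          # return parameter: high p discourages stepping back
    q: float = 1.0          # in-out parameter: q < 1 favours DFS-like exploration
    workers: int = 4        # parallel workers used for walk generation/training
    window: int = 10        # skip-gram context window over each walk

    def corpus_size(self, num_nodes: int) -> int:
        """Number of walks ("sentences") generated for a graph of num_nodes nodes."""
        return self.num_walks * num_nodes


params = Node2VecParams()
print(params.corpus_size(1000))  # 10000
```

Walk length and number of walks set the size of the training corpus, p and q shape each walk, and the window size is passed to the skip-gram model.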
Clustering
● Idea: create documents that each contain several related and similar entities.
● We execute K-MeansSort on the entity embeddings.
● Pros:
  ● Easy to initialize
  ● Obtains good results
  ● Highly scalable
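The clustering step groups entity embeddings that lie close together. The sketch below shows plain Lloyd's k-means in pure Python as a stand-in. It is not the K-MeansSort variant named above, and the toy two-dimensional "embeddings" are invented.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iterations: int = 20,
           seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns the cluster index of each point."""
    rng = random.Random(seed)
    # Initialise the centroids with k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Invented toy embeddings: two well-separated groups.
embeddings = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
labels = kmeans(embeddings, k=2)
print(labels)
```

Each resulting label identifies the cluster, and therefore the virtual document, that an entity will belong to.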
Document Creation
We associate a document with each cluster.
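One way to build a document per cluster is to concatenate the triples of all entities assigned to it. The slide does not say what text goes into each document, so the sketch below is an assumption. The function name, the cluster labels and the example triples are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def build_virtual_documents(labels: Dict[str, int],
                            triples: List[Triple]) -> Dict[int, str]:
    """Concatenate, per cluster, the triples whose subject belongs to it."""
    parts: Dict[int, List[str]] = defaultdict(list)
    for s, p, o in triples:
        if s in labels:  # skip subjects that were never clustered
            parts[labels[s]].append(f"{s} {p} {o}")
    return {cluster: " ".join(texts) for cluster, texts in parts.items()}


# Invented example: entity -> cluster index, plus a few triples.
labels = {"Squirrel": 0, "Nut": 0, "Michael_Schumacher": 1}
triples = [
    ("Squirrel", "eats", "Nut"),
    ("Nut", "type", "Food"),
    ("Michael_Schumacher", "type", "RacingDriver"),
]
docs = build_virtual_documents(labels, triples)
print(docs[0])  # Squirrel eats Nut Nut type Food
```

Because a document mixes the text of several related entities, a query can match it through the context of an entity and not only through the entity's own triples.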
Ranking System
● We apply a classic text retrieval technique to our virtual documents in order to obtain a ranked list of clusters/documents.
● We use the BM25 model.
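BM25 scores a document by summing, over the query terms, an inverse document frequency weight multiplied by a saturated and length-normalized term frequency. The sketch below is a compact pure-Python version under assumptions: the values `k1=1.2` and `b=0.75` and the smoothed IDF are common choices, not settings reported in the talk, and the toy documents are invented.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_scores(query: List[str], docs: Dict[str, List[str]],
                k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    """Score every tokenized document against the tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        # Length normalization: long documents need more occurrences.
        norm = k1 * (1 - b + b * len(tokens) / avg_len)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores[doc_id] = score
    return scores


# Invented virtual documents, one per cluster.
docs = {
    "cluster_0": "squirrel eats nut nut type food".split(),
    "cluster_1": "michael_schumacher type racingdriver".split(),
}
scores = bm25_scores(["squirrel", "nut"], docs)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['cluster_0', 'cluster_1']
```

The output is a ranked list of clusters/documents. Turning it into a ranked list of entities is a separate step.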
Entity Retrieval
● The user wants a list of entities, not a list of clusters.
● We implement two sets of systems that create a ranked list of entities starting from the ranked list of clusters/documents:
  ● Combination systems: basic approach
  ● Fusion systems: basic approach + state of the art
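The slides do not spell out how the two families of systems work, so the sketch below is only one plausible reading. A combination-style step expands the ranked clusters into entities, taking a fixed number from each. A fusion-style step interleaves that list with a baseline entity ranking, for example one produced by BM25 over per-entity documents. All function names, the interleaving rule and the data are hypothetical.

```python
from itertools import zip_longest
from typing import Dict, List


def entities_from_clusters(ranked_clusters: List[str],
                           members: Dict[str, List[str]],
                           per_cluster: int) -> List[str]:
    """Walk the clusters in rank order and take up to per_cluster entities each."""
    ranked: List[str] = []
    for cluster in ranked_clusters:
        for entity in members[cluster][:per_cluster]:
            if entity not in ranked:
                ranked.append(entity)
    return ranked


def interleave(first: List[str], second: List[str]) -> List[str]:
    """Round-robin merge of two rankings, dropping duplicates."""
    fused: List[str] = []
    for a, b in zip_longest(first, second):
        for entity in (a, b):
            if entity is not None and entity not in fused:
                fused.append(entity)
    return fused


# Invented data: two ranked clusters and a baseline entity ranking.
members = {"c0": ["Squirrel", "Nut", "Tree"], "c1": ["Hyde_Park"]}
cluster_based = entities_from_clusters(["c0", "c1"], members, per_cluster=2)
fused = interleave(cluster_based, ["Nut", "Oak"])
print(cluster_based)  # ['Squirrel', 'Nut', 'Hyde_Park']
print(fused)          # ['Squirrel', 'Nut', 'Oak', 'Hyde_Park']
```

The `per_cluster` cut-off corresponds to the number of entities taken from each cluster for the final list.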
[Figure: Combination System]
[Figure: Fusion System]
Evaluation
● Quantitative evaluation: general considerations based on average measures.
● Qualitative (i.e. topic-based) evaluation: specific considerations. We look for correlations between topics and system effectiveness.
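Average measures are computed per topic and then averaged over all topics. The slides do not name the measures used, so the sketch below assumes two common ones, precision at k and mean average precision. The runs and relevance judgments are invented.

```python
from typing import Dict, List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for entity in ranking[:k] if entity in relevant) / k


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at the rank of each relevant result."""
    hits, total = 0, 0.0
    for rank, entity in enumerate(ranking, start=1):
        if entity in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Average of the per-topic average precision."""
    return sum(average_precision(runs[t], qrels[t]) for t in qrels) / len(qrels)


# Invented runs (topic -> ranked entities) and judgments (topic -> relevant set).
runs = {"t1": ["a", "b", "c"], "t2": ["x", "y"]}
qrels = {"t1": {"a", "c"}, "t2": {"y"}}
print(precision_at_k(runs["t1"], qrels["t1"], k=2))  # 0.5
print(round(mean_average_precision(runs, qrels), 4))  # 0.6667
```

The per-topic values (here `average_precision` for `t1` and `t2`) are what a topic-based analysis inspects one by one.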
Quantitative Evaluation
[Figures: quantitative evaluation results]
Qualitative Evaluation
● The topic is: "Chefs that have a TV show in Food Network."
● We retrieve many relevant entities in top positions.
● We retrieve entities that BM25 does not find.
Discussion and Conclusion
● In fusion systems we exploit the positive aspects of both methods: cluster-based and classic.
● The cluster construction process and the choice of the number of entities to insert into the final list are fundamental to obtaining good performance in the retrieval phase.
● Performance is penalized in the evaluation phase because of the way the test collection is built.
● Our approach turns out to be promising because it succeeds in finding new relevant entities with respect to the state of the art.
Thank you for your attention

Entity Search on Virtual Documents Created with Graph Embeddings

  • 1. London Information Retrieval Meetup Latest Updates Anna Ruggero, R&D Software Engineer 11th February 2020 Entity Search on Virtual Documents Created with Graph Embeddings
  • 2. London Information Retrieval Meetup Sease Search Services ● Open Source Enthusiasts ● Apache Lucene/Solr experts ● Community Contributors ● Active Researchers ● Hot Trends : Learning To Rank, Document Similarity, Search Quality Evaluation, Relevancy Tuning
  • 3. London Information Retrieval Meetup Who I am ! R&D Search Software Engineer ! Master Degree in Computer Science Engineering ! Big Data, Information Retrieval ! Organist, Music lover
  • 4. London Information Retrieval Meetup Background Solution Design Evaluation Discussion and Conclusion Overview
  • 5. London Information Retrieval Meetup Background Solution Design Evaluation Discussion and Conclusion Overview
  • 6. London Information Retrieval Meetup Entity Search ! Entity: an entity is a uniquely identifiable object or thing characterized by its name(s), type(s) and relationships to other entities.
  • 7. London Information Retrieval Meetup <dbr:Michael_Schumacher> Michael Schumacher Racing Driver 1969-01-03 Entity Search
  • 8. London Information Retrieval Meetup What is Entity search? is the search paradigm of organizing and accessing information centered around entities and their attributes and relationships. Focus: Task of ad-hoc entity retrieval it aims to answer information needs relayed to a particular entity expressed in unconstrained natural language and resolved using a collection of structured data. Entity Search
  • 9. London Information Retrieval Meetup How can we represent and manage these type of information Entities Representation
  • 11. London Information Retrieval Meetup State of the Art Text Retrieval SPARQL < Michael_Schumacher ><name><Michael> <Michael_Schumacher><type><RacingDriver> <Michael_Schumacher><BirthDate><1969-01-03> Entity
  • 12. London Information Retrieval Meetup ! Structured-based approach ! Fielded representation with a field associated to each predicate or class of predicates. State of the Art Document creation: ! Text-based approach ! Triples concatenation in a Bag-Of-Words approach.
  • 13. London Information Retrieval Meetup What do we want to introduce with our approach Our approach, with respect to the state of the art, aims to create documents that involve entities defined by information that consider also the context in which they are collocated. TreeHyde Park Squirrel Nut What’s new
  • 14. London Information Retrieval Meetup Graph Embeddings ! Graph embedding: is a method that allows us to obtain a numerical vector representation of nodes and edges of a graph. It represents the topology and structure of graph through vectors or set of vectors. ! Graph embeddings obtain these representation through the use of a neural network.
  • 15. London Information Retrieval Meetup Word2Vec skip-gram model Graph Embeddings
  • 16. London Information Retrieval Meetup Node2Vec is based on word2vec and wants to identify also relationships as homophily and structural roles through walk generation based on BFS and DFS strategies. Graph Embeddings
  • 17. London Information Retrieval Meetup Background Solution Design Evaluation Discussion and Conclusion Overview
  • 18. London Information Retrieval Meetup Solution Design Ranking list of documents Embeddings clusters RDF triples Entities representation Clustering Documents creation Ranking system Entity retrieval Ranking list of documents Entity embeddings Documents Ranking list of entities Embeddings clusters RDF triples Node2Vec
  • 19. London Information Retrieval Meetup Ranking list of documents Embeddings clusters RDF triples Entities representation Clustering Documents creation Ranking system Entity retrieval Ranking list of documents Entity embeddings Documents Ranking list of entities Embeddings clusters RDF triples Node2Vec Solution Design
  • 21. London Information Retrieval Meetup Experimental Setup
  • 22. London Information Retrieval Meetup Entities Representations ! We obtain entities embedding through the application of Node2Vec setting these parameters: ! Embedding dimension ! Walk length ! Number of walks ! P ! Q ! Workers ! Window size
  • 24. Clustering. Idea: create documents that each contain several related, similar entities. We run K-MeansSort on the entity embeddings. Pros: easy to initialize, obtains good results, highly scalable.
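As an illustration of clustering entity embeddings, the sketch below implements a plain Lloyd-style k-means in pure Python. It is not the K-MeansSort variant named on the slide, and the function names, the random initialization and the two-dimensional toy vectors are assumptions chosen only to keep the example small.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means: returns (cluster index per vector, list of centroids)."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct input vectors picked at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return assignment, centroids


if __name__ == "__main__":
    # Hypothetical 2-dimensional "embeddings" forming two obvious groups.
    embeddings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    labels, _ = kmeans(embeddings, k=2)
    print(labels)
```

Each resulting cluster label identifies the group of entities that would share one virtual document.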
  • 26. Document Creation: we associate a document with each cluster.
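The slide does not say what text goes into each document, so the following is only a sketch under a stated assumption: every entity comes with some text (for example, labels taken from its RDF triples), and the document of a cluster is the concatenation of the texts of its entities. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def build_virtual_documents(entity_text, assignment):
    """Group entities by cluster and join their texts into one document per cluster.

    entity_text: entity id -> text describing the entity (assumed to exist)
    assignment:  entity id -> cluster id
    Returns (cluster id -> document text, cluster id -> sorted entity ids).
    """
    members = defaultdict(list)
    for entity, cluster_id in assignment.items():
        members[cluster_id].append(entity)
    documents = {}
    sorted_members = {}
    for cluster_id, entities in members.items():
        ordered = sorted(entities)
        sorted_members[cluster_id] = ordered
        documents[cluster_id] = " ".join(entity_text[e] for e in ordered)
    return documents, sorted_members


if __name__ == "__main__":
    # Hypothetical entities and cluster labels.
    texts = {
        "Michael_Schumacher": "michael schumacher racing driver",
        "Hyde_Park": "hyde park london",
        "Squirrel": "squirrel nut tree",
    }
    clusters = {"Michael_Schumacher": 0, "Hyde_Park": 1, "Squirrel": 1}
    docs, members = build_virtual_documents(texts, clusters)
    print(docs)
```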
  • 28. Ranking System. We want to apply a classic text retrieval technique to our virtual documents in order to obtain a ranked list of clusters/documents. We use the BM25 model.
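For readers unfamiliar with BM25, here is a minimal, self-contained sketch of BM25 scoring over a handful of virtual documents. It assumes whitespace tokenization, the common defaults k1 = 1.2 and b = 0.75, and the non-negative IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)); the talk does not state which implementation or parameters were used, and the sample documents are hypothetical.

```python
import math
from collections import Counter


def bm25_rank(query, documents, k1=1.2, b=0.75):
    """Rank documents (id -> text) for a query; returns [(id, score), ...] best first."""
    tokenised = {doc_id: text.lower().split() for doc_id, text in documents.items()}
    n_docs = len(tokenised)
    avg_len = sum(len(tokens) for tokens in tokenised.values()) / n_docs

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenised.values():
        doc_freq.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenised.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical virtual documents, one per cluster.
    virtual_documents = {
        "cluster_0": "michael schumacher racing driver formula one",
        "cluster_1": "hyde park tree squirrel nut london",
    }
    for doc_id, score in bm25_rank("racing driver", virtual_documents):
        print(doc_id, round(score, 3))
```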
  • 30. Entity Retrieval. The user wants a list of entities, not a list of clusters. We implement two sets of systems that create a ranked list of entities starting from the ranked list of clusters/documents. Combination systems: basic approach. Fusion systems: basic approach + state of the art.
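The slides do not spell out how the combination and fusion systems work, so the sketch below is purely illustrative and should not be read as the authors' method. It shows one simple way to turn a ranked list of clusters into a ranked list of entities (walk the clusters in rank order and emit their members up to a limit), and one well-known way to merge that list with a baseline ranking, reciprocal rank fusion, which is swapped in here as a stand-in. All names and data are hypothetical.

```python
def expand_clusters(ranked_clusters, cluster_members, limit):
    """Emit entities cluster by cluster, in cluster rank order, without duplicates."""
    seen = set()
    entities = []
    for cluster_id in ranked_clusters:
        for entity in cluster_members.get(cluster_id, []):
            if len(entities) >= limit:
                return entities
            if entity not in seen:
                seen.add(entity)
                entities.append(entity)
    return entities


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists: each entity scores the sum of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, entity in enumerate(ranking, start=1):
            scores[entity] = scores.get(entity, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda entity: (-scores[entity], entity))


if __name__ == "__main__":
    # Hypothetical cluster ranking, cluster contents and baseline entity ranking.
    members = {"c0": ["a", "b"], "c1": ["c", "a"]}
    from_clusters = expand_clusters(["c1", "c0"], members, limit=3)
    baseline = ["a", "d"]
    print(from_clusters)
    print(reciprocal_rank_fusion([from_clusters, baseline]))
```

The `limit` argument corresponds to the number of entities inserted into the final list, a choice the conclusions single out as important for retrieval quality.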
  • 31. Combination System
  • 32. Fusion System
  • 33. Overview: Background, Solution Design, Evaluation, Discussion and Conclusion
  • 34. Evaluation. Quantitative evaluation: general considerations based on average measures. Qualitative (i.e. topic-based) evaluation: specific considerations; we look for correlations between topics and system effectiveness.
  • 35-38. Quantitative Evaluation (result figures)
  • 39. Qualitative Evaluation. The topic is: "Chefs that have a TV show in Food Network." We retrieve many relevant entities in the top positions. We retrieve entities that BM25 does not find.
  • 40. Overview: Background, Solution Design, Evaluation, Discussion and Conclusion
  • 41. Discussion and Conclusion. In fusion systems we exploit the strengths of both methods: cluster-based and classic. The cluster construction process and the choice of the number of entities to insert into the final list are fundamental to obtaining good performance in the retrieval phase. Performance is penalized in the evaluation phase because of the way the test collection is built. Our approach turns out to be promising because it succeeds in finding new relevant entities with respect to the state of the art.
  • 42. Thank you for your attention