The document summarizes Anna Ruggero's presentation on entity search using graph embeddings. The presentation introduced an approach to entity search that creates virtual documents combining related entities based on their embeddings. Entities from DBpedia were represented as nodes in a graph, and Node2Vec was used to generate their embeddings. The embeddings were clustered to form documents, which were ranked using BM25. Systems that combined or fused the document rankings with entity rankings were evaluated on entity search tasks and found relevant entities that traditional methods missed.
Entity Search on Virtual Documents Created with Graph Embeddings
1. London Information Retrieval Meetup
Latest Updates
Anna Ruggero, R&D Software Engineer
11th February 2020
Entity Search on Virtual Documents
Created with Graph Embeddings
2. London Information Retrieval Meetup
Sease
Search Services
● Open Source Enthusiasts
● Apache Lucene/Solr experts
● Community Contributors
● Active Researchers
● Hot Trends : Learning To Rank, Document Similarity,
Search Quality Evaluation, Relevancy Tuning
3. London Information Retrieval Meetup
Who I am
● R&D Search Software Engineer
● Master's Degree in Computer Science Engineering
● Big Data, Information Retrieval
● Organist, Music lover
6. London Information Retrieval Meetup
Entity Search
● Entity: a uniquely identifiable object or thing characterized by its name(s), type(s) and relationships to other entities.
7. London Information Retrieval Meetup
Entity Search
[Figure: the entity <dbr:Michael_Schumacher> with its attributes: Michael Schumacher, Racing Driver, 1969-01-03]
8. London Information Retrieval Meetup
Entity Search
What is entity search?
Entity search is the search paradigm of organizing and accessing information centered around entities and their attributes and relationships.
Focus: the task of ad-hoc entity retrieval
It aims to answer information needs related to a particular entity, expressed in unconstrained natural language and resolved using a collection of structured data.
9. London Information Retrieval Meetup
Entities Representation
How can we represent and manage this type of information?
11. London Information Retrieval Meetup
State of the Art
[Diagram labels: Text Retrieval, SPARQL, Entity]
<Michael_Schumacher><name><Michael>
<Michael_Schumacher><type><RacingDriver>
<Michael_Schumacher><BirthDate><1969-01-03>
12. London Information Retrieval Meetup
State of the Art
Document creation:
● Text-based approach: triples concatenation in a Bag-Of-Words approach.
● Structure-based approach: fielded representation with a field associated with each predicate or class of predicates.
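The two document-creation styles can be illustrated with a minimal sketch built on the Michael_Schumacher triples shown earlier. The function names are hypothetical, and concatenating only the object values is an assumption: the slides do not say which parts of each triple end up in the text.

```python
from collections import defaultdict

# Triples from the State of the Art slide, as (subject, predicate, object).
TRIPLES = [
    ("Michael_Schumacher", "name", "Michael"),
    ("Michael_Schumacher", "type", "RacingDriver"),
    ("Michael_Schumacher", "BirthDate", "1969-01-03"),
]


def text_based_document(triples, entity):
    """Text-based approach: concatenate the objects of an entity's triples."""
    return " ".join(obj for subj, _, obj in triples if subj == entity)


def fielded_document(triples, entity):
    """Structure-based approach: one field per predicate."""
    fields = defaultdict(list)
    for subj, pred, obj in triples:
        if subj == entity:
            fields[pred].append(obj)
    return dict(fields)


flat = text_based_document(TRIPLES, "Michael_Schumacher")
fielded = fielded_document(TRIPLES, "Michael_Schumacher")
```

The flat form loses the predicate that each value came from, while the fielded form keeps it so that a ranking function can weight fields differently.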
13. London Information Retrieval Meetup
What’s new
What do we want to introduce with our approach?
Compared with the state of the art, our approach aims to create documents in which entities are described by information that also takes into account the context in which they are located.
[Figure: example of entities in a shared context: Hyde Park, Tree, Squirrel, Nut]
14. London Information Retrieval Meetup
Graph Embeddings
● Graph embedding: a method that yields a numerical vector representation of the nodes and edges of a graph. It represents the topology and structure of the graph through vectors or sets of vectors.
● Graph embeddings obtain these representations through the use of a neural network.
16. London Information Retrieval Meetup
Graph Embeddings
Node2Vec is based on word2vec and also aims to capture relationships such as homophily and structural roles by generating walks based on BFS and DFS strategies.
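The walk generation can be sketched as a second-order biased random walk. This is a simplified illustration, not the implementation used in the presentation. The toy graph reuses the Hyde Park, Tree, Squirrel and Nut entities from the earlier slide, but its edges are invented for the example.

```python
import random


def node2vec_walk(graph, start, walk_length, p, q, rng):
    """One biased random walk over an undirected adjacency dict."""
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbours = sorted(graph[current])
        if not neighbours:
            break
        if len(walk) == 1:
            # First step: no previous node yet, so pick uniformly.
            walk.append(rng.choice(neighbours))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / p)  # step back to the previous node
            elif candidate in graph[previous]:
                weights.append(1.0)      # stay near the previous node (BFS-like)
            else:
                weights.append(1.0 / q)  # move further away (DFS-like)
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Hypothetical edges between the entities of the Hyde Park example.
GRAPH = {
    "Hyde_Park": {"Tree", "Squirrel"},
    "Tree": {"Hyde_Park", "Squirrel", "Nut"},
    "Squirrel": {"Hyde_Park", "Tree", "Nut"},
    "Nut": {"Tree", "Squirrel"},
}

walk = node2vec_walk(GRAPH, "Hyde_Park", walk_length=6, p=1.0, q=0.5,
                     rng=random.Random(0))
```

A small q favours outward, DFS-like exploration, while a large q keeps the walk close to its starting neighbourhood. The resulting walks are then fed to a word2vec-style model as if they were sentences.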
18. London Information Retrieval Meetup
Solution Design
[Pipeline diagram: RDF triples → Entities representation (Node2Vec) → Entity embeddings → Clustering → Embeddings clusters → Documents creation (from embeddings clusters and RDF triples) → Documents → Ranking system → Ranking list of documents → Entity retrieval → Ranking list of entities]
22. London Information Retrieval Meetup
Entities Representations
● We obtain entity embeddings by applying Node2Vec with these parameters:
● Embedding dimension
● Walk length
● Number of walks
● P (return parameter)
● Q (in-out parameter)
● Workers
● Window size
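As a rough illustration of what these parameters control, the sketch below groups them in a hypothetical `Node2VecParams` container and shows how the window size turns a walk into training pairs. The numeric values are placeholders, not the settings used in the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node2VecParams:
    dimensions: int   # embedding dimension
    walk_length: int  # number of nodes in each walk
    num_walks: int    # walks started from each node
    p: float          # return parameter
    q: float          # in-out parameter
    workers: int      # parallel workers
    window: int       # skip-gram context window size


def skipgram_pairs(walk, window):
    """(target, context) pairs a word2vec-style model would train on."""
    pairs = []
    for i, target in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        pairs.extend((target, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Placeholder values chosen only to make the example run.
params = Node2VecParams(dimensions=64, walk_length=10, num_walks=5,
                        p=1.0, q=1.0, workers=1, window=2)
pairs = skipgram_pairs(["a", "b", "c", "d"], params.window)
```

With a window of 2, node "a" is paired with "b" and "c" but not with "d", so a larger window lets more distant nodes on the same walk influence each other's embeddings.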
24. London Information Retrieval Meetup
Clustering
● Idea: create documents that each contain several related and similar entities.
● We execute K-MeansSort on the entity embeddings.
● Pros:
● Easy to initialize
● Obtains good results
● Highly scalable.
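The clustering step can be sketched with plain Lloyd's k-means in pure Python. This is a stand-in for illustration and not the K-MeansSort variant named on the slide. The two-dimensional embeddings are invented values.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(embeddings, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns {entity: cluster index}."""
    names = sorted(embeddings)
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen entities.
    centroids = [list(embeddings[n]) for n in rng.sample(names, k)]
    assignment = {}
    for _ in range(iterations):
        # Assign each entity to its nearest centroid.
        assignment = {
            n: min(range(k),
                   key=lambda c: squared_distance(embeddings[n], centroids[c]))
            for n in names
        }
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [embeddings[n] for n in names if assignment[n] == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return assignment


# Hypothetical embeddings: two well-separated groups of entities.
EMBEDDINGS = {
    "Tree": [0.9, 0.1],
    "Squirrel": [0.8, 0.2],
    "Nut": [0.95, 0.05],
    "Racing_Driver": [0.1, 0.9],
    "Formula_One": [0.2, 0.8],
}

clusters = kmeans(EMBEDDINGS, k=2)
```

Each resulting cluster then becomes one virtual document that groups the entities assigned to it.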
28. London Information Retrieval Meetup
Ranking System
● We want to apply a classic text retrieval technique to our virtual documents in order to obtain a ranked list of clusters/documents.
● We use the BM25 model.
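A compact BM25 scorer over virtual documents might look like the sketch below. The whitespace tokenisation, the k1 and b values (commonly used defaults) and the two toy cluster texts are assumptions, not details from the presentation.

```python
import math
from collections import Counter


def bm25_rank(documents, query, k1=1.2, b=0.75):
    """Rank {doc_id: text} against a query with Okapi BM25."""
    tokenised = {d: text.lower().split() for d, text in documents.items()}
    n_docs = len(tokenised)
    avg_len = sum(len(t) for t in tokenised.values()) / n_docs
    # Number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenised.values()
                       for term in set(tokens))
    scores = {}
    for doc_id, tokens in tokenised.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5)
                           / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical virtual documents, one per cluster.
DOCUMENTS = {
    "cluster_0": "tree squirrel nut hyde park",
    "cluster_1": "racing driver formula one michael schumacher",
}

ranking = bm25_rank(DOCUMENTS, "racing driver")
```

The output is a ranked list of cluster identifiers with their scores, which the entity retrieval step then has to turn into a ranked list of entities.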
30. London Information Retrieval Meetup
Entity Retrieval
● The user wants a list of entities, not a list of clusters.
● We implement two sets of systems whose aim is to create a ranked list of entities starting from the ranked list of clusters/documents:
● Combination systems: basic approach
● Fusion systems: basic approach + state-of-the-art
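The slides do not spell out the combination or fusion rules, so the sketch below uses two hypothetical stand-ins. `entities_from_clusters` lists the members of each cluster in cluster-rank order. `reciprocal_rank_fusion` merges that list with a baseline entity ranking using reciprocal rank fusion, a generic fusion technique chosen here only for illustration.

```python
def entities_from_clusters(ranked_clusters, cluster_members, limit):
    """Expand a ranked list of clusters into a ranked list of entities."""
    ranked = []
    for cluster_id in ranked_clusters:
        for entity in cluster_members[cluster_id]:
            if entity not in ranked:
                ranked.append(entity)
    return ranked[:limit]


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several entity rankings; a higher summed 1/(k + rank) wins."""
    scores = {}
    for ranking in rankings:
        for rank, entity in enumerate(ranking, start=1):
            scores[entity] = scores.get(entity, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda e: (-scores[e], e))


# Hypothetical data: cluster contents and a baseline entity ranking.
cluster_members = {"c1": ["A", "B"], "c0": ["C"]}
cluster_based = entities_from_clusters(["c1", "c0"], cluster_members, limit=3)
baseline = ["B", "D"]
fused = reciprocal_rank_fusion([cluster_based, baseline])
```

In this toy case the fused list contains entity "C", which only the cluster-based ranking found, and entity "D", which only the baseline found.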
34. London Information Retrieval Meetup
Evaluation
● Quantitative evaluation: general considerations based on average measures.
● Qualitative (i.e. topic-based) evaluation: specific considerations. We look for correlations between topics and system effectiveness.
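The difference between the two views can be sketched with average precision. The slides do not name the measures used, so average precision and its mean over topics are assumptions, and the runs and relevance judgements are invented.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant entities."""
    hits, total = 0, 0.0
    for rank, entity in enumerate(ranked, start=1):
        if entity in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def evaluate(runs, qrels):
    """Return per-topic scores (qualitative view) and their mean (quantitative view)."""
    per_topic = {topic: average_precision(runs[topic], qrels[topic])
                 for topic in qrels}
    return per_topic, sum(per_topic.values()) / len(per_topic)


# Hypothetical ranked lists and relevance judgements for two topics.
runs = {"t1": ["A", "X", "B"], "t2": ["X", "Y"]}
qrels = {"t1": {"A", "B"}, "t2": {"Y"}}

per_topic, mean_ap = evaluate(runs, qrels)
```

The mean hides how a system behaves on individual topics, which is why the per-topic scores are inspected separately.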
39. London Information Retrieval Meetup
Qualitative Evaluation
● The topic is: "Chefs that have a TV show in Food Network."
● We retrieve many relevant entities in top positions.
● We retrieve entities that BM25 does not find.
41. London Information Retrieval Meetup
Discussion and Conclusion
● In fusion systems we exploit the positive aspects of both methods: cluster-based and classic.
● The cluster construction process and the choice of the number of entities to insert into the final list are fundamental to obtaining good performance in the retrieval phase.
● Performance is penalized in the evaluation phase because of the way the test collection is built.
● Our approach turns out to be promising because it succeeds in finding new relevant entities with respect to the state of the art.