Entity Search on Virtual Documents Created with Graph Embeddings

London Information Retrieval Meetup
Latest Updates
Anna Ruggero, R&D Software Engineer
11th February 2020

Sease
Search Services
● Open Source Enthusiasts
● Apache Lucene/Solr experts
● Community Contributors
● Active Researchers
● Hot Trends: Learning To Rank, Document Similarity,
Search Quality Evaluation, Relevancy Tuning

Who I am
● R&D Search Software Engineer
● Master's Degree in Computer Science Engineering
● Big Data, Information Retrieval
● Organist, Music lover

Overview
Background
Solution Design
Evaluation
Discussion and Conclusion

Entity Search
● Entity: a uniquely identifiable object or thing
characterized by its name(s), type(s), and relationships to
other entities.

Entity Search
[Figure: the entity <dbr:Michael_Schumacher>, with name Michael Schumacher, type Racing Driver, and birth date 1969-01-03]

Entity Search
What is entity search?
It is the search paradigm of organizing and
accessing information centered around entities and
their attributes and relationships.
Focus: the task of ad-hoc entity retrieval.
It aims to answer information needs related to
a particular entity, expressed in unconstrained
natural language and resolved using a collection of
structured data.

Entities Representation
How can we represent and
manage this type of
information?

RDF

State of the Art
[Figure: Text Retrieval and SPARQL, shown alongside the RDF triples of an entity]
<Michael_Schumacher> <name> <Michael>
<Michael_Schumacher> <type> <RacingDriver>
<Michael_Schumacher> <BirthDate> <1969-01-03>

State of the Art
Document creation:
● Structure-based approach
  ● Fielded representation with a field associated with each
  predicate or class of predicates.
● Text-based approach
  ● Concatenation of triples in a bag-of-words approach.
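
As a minimal sketch of the two document-creation styles named above, the following Python builds both a fielded document and a bag-of-words document from the Michael_Schumacher triples shown earlier. The function names and the choice to include predicate names in the flat text are assumptions made for illustration, not the exact method of any cited system.

```python
from collections import defaultdict

# Triples from the slides, as (subject, predicate, object) tuples.
triples = [
    ("Michael_Schumacher", "name", "Michael"),
    ("Michael_Schumacher", "type", "RacingDriver"),
    ("Michael_Schumacher", "BirthDate", "1969-01-03"),
]

def fielded_document(triples, entity):
    """Structure-based: one field per predicate, holding that predicate's objects."""
    fields = defaultdict(list)
    for subject, predicate, obj in triples:
        if subject == entity:
            fields[predicate].append(obj)
    return dict(fields)

def bag_of_words_document(triples, entity):
    """Text-based: concatenate predicate and object terms into one flat text.

    Keeping the predicate names in the text is an assumption of this sketch.
    """
    terms = []
    for subject, predicate, obj in triples:
        if subject == entity:
            terms.extend([predicate, obj])
    return " ".join(terms)

fielded = fielded_document(triples, "Michael_Schumacher")
flat = bag_of_words_document(triples, "Michael_Schumacher")
print(fielded)
print(flat)
```

The fielded form keeps the predicate structure available for per-field weighting, while the flat form discards it and can be indexed as ordinary text.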

What's new
What do we want to introduce with
our approach?
Compared with the state of the art, our
approach aims to create documents in
which entities are described by information
that also takes into account the context in
which they are located.
[Figure: example graph with the entities Tree, Hyde Park, Squirrel, and Nut]

Graph Embeddings
● Graph embedding: a method that allows us to obtain a
numerical vector representation of the nodes and edges of a
graph. It represents the topology and structure of the graph
through vectors or sets of vectors.
● Graph embedding methods such as Node2Vec obtain these
representations through the use of a neural network.

Graph Embeddings
Word2Vec skip-gram model
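
The skip-gram model is trained on (center, context) pairs taken from a sliding window over a sequence. The sketch below only generates those training pairs; it does not train a network. The toy sequence and the function name are hypothetical.

```python
def skipgram_pairs(sequence, window):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs

# Toy sequence (hypothetical); on a graph this would be one random walk.
walk = ["Squirrel", "Tree", "Hyde_Park"]
pairs = skipgram_pairs(walk, window=1)
print(pairs)
```

On a graph, the sequences are walks over nodes rather than sentences of words, which is what lets the same model produce node embeddings.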

Graph Embeddings
Node2Vec builds on Word2Vec and also aims to capture
relationships such as homophily and structural roles, through
walk generation based on BFS and DFS strategies.
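
A minimal sketch of the biased walk idea, assuming an undirected, unweighted graph stored as an adjacency dictionary. The return parameter p and the in-out parameter q weight the next step: a low q favours moving away from the previous node (DFS-like), a high q favours staying near it (BFS-like). The toy graph edges are hypothetical, and this is an illustration rather than the reference Node2Vec implementation.

```python
import random

def biased_walk(graph, start, walk_length, p, q, rng):
    """One second-order random walk in the style of Node2Vec.

    graph: dict mapping each node to a list of neighbours.
    p: return parameter; q: in-out parameter.
    """
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbours = graph[current]
        if not neighbours:
            break
        if len(walk) == 1:
            # First step: no previous node yet, so choose uniformly.
            walk.append(rng.choice(neighbours))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / p)   # step back to the previous node
            elif candidate in graph[previous]:
                weights.append(1.0)       # stay close to the previous node
            else:
                weights.append(1.0 / q)   # move farther away
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk

# Hypothetical toy graph; the edges are assumptions made for this sketch.
graph = {
    "Hyde_Park": ["Tree", "Squirrel"],
    "Tree": ["Hyde_Park", "Squirrel", "Nut"],
    "Squirrel": ["Hyde_Park", "Tree", "Nut"],
    "Nut": ["Tree", "Squirrel"],
}
rng = random.Random(0)
walk = biased_walk(graph, "Hyde_Park", walk_length=6, p=1.0, q=0.5, rng=rng)
print(walk)
```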

Solution Design
[Figure: pipeline of the proposed system]
RDF triples → Entities representation (Node2Vec) → Entity embeddings → Clustering → Embeddings clusters → Documents creation → Documents → Ranking system → Ranked list of documents → Entity retrieval → Ranked list of entities


DBpedia

Experimental Setup

Entities Representations
● We obtain entity embeddings by applying Node2Vec, setting
these parameters:
  ● Embedding dimension
  ● Walk length
  ● Number of walks
  ● P (return parameter)
  ● Q (in-out parameter)
  ● Workers
  ● Window size
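
The parameters listed above can be collected in one configuration object, as in the sketch below. The slides do not state the values that were used, so every value here is a placeholder, and the class and field names are hypothetical rather than the API of a specific Node2Vec library.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Node2VecConfig:
    dimensions: int    # embedding dimension
    walk_length: int   # number of nodes in each walk
    num_walks: int     # walks started from each node
    p: float           # return parameter
    q: float           # in-out parameter
    workers: int       # parallel workers
    window: int        # skip-gram context window size

def walks_in_corpus(config, num_nodes):
    """Number of walks generated when every node starts `num_walks` walks."""
    return config.num_walks * num_nodes

# Placeholder values only; the values used in the experiments are not given.
config = Node2VecConfig(
    dimensions=64, walk_length=20, num_walks=10,
    p=1.0, q=1.0, workers=4, window=5,
)
print(asdict(config))
print(walks_in_corpus(config, num_nodes=1000))
```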

Clustering
● Idea: create documents that each contain several related and
similar entities.
● We run K-MeansSort on the entity embeddings.
● Pros:
  ● Easy to initialize
  ● Obtains good results
  ● Highly scalable
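
To illustrate clustering on entity embeddings, the sketch below uses plain Lloyd-style k-means in pure Python. This is a stand-in chosen for brevity, not the K-MeansSort variant named above, and the two-dimensional embeddings and entity names are hypothetical toy data.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd-style k-means; returns one cluster index per input vector."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment

# Hypothetical toy embeddings: two drivers and two parks.
embeddings = {
    "driver_a": [0.9, 0.1],
    "driver_b": [0.8, 0.2],
    "park_a": [0.1, 0.9],
    "park_b": [0.2, 0.8],
}
names = list(embeddings)
labels = kmeans([embeddings[n] for n in names], k=2)
cluster_of = dict(zip(names, labels))
print(cluster_of)
```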

Document Creation
We associate a document with each cluster.
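
One way to picture a virtual document is as the concatenation of the text of every entity in a cluster. The slides do not specify how the document text is assembled, so the concatenation, the function name, and the toy data below are all assumptions.

```python
from collections import defaultdict

def build_virtual_documents(cluster_of, entity_text):
    """Build one text document per cluster by joining its entities' text."""
    grouped = defaultdict(list)
    for entity in sorted(cluster_of):  # sorted for a deterministic order
        grouped[cluster_of[entity]].append(entity_text[entity])
    return {cluster: " ".join(texts) for cluster, texts in grouped.items()}

# Hypothetical cluster assignment and per-entity text.
cluster_of = {"Michael_Schumacher": 0, "Squirrel": 1, "Nut": 1}
entity_text = {
    "Michael_Schumacher": "Michael RacingDriver 1969-01-03",
    "Squirrel": "Squirrel Tree",
    "Nut": "Nut Tree",
}
documents = build_virtual_documents(cluster_of, entity_text)
print(documents)
```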

Ranking System
● We want to apply classic text retrieval techniques to
our virtual documents in order to obtain a ranked list of
clusters/documents.
● We use the BM25 model.
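
A compact sketch of BM25 scoring over a handful of virtual documents. It assumes whitespace tokenization, the commonly used defaults k1 = 1.2 and b = 0.75, and a non-negative IDF variant; the settings used in the experiments are not stated in the slides, and the toy documents are hypothetical.

```python
import math
from collections import Counter

def bm25_rank(query, documents, k1=1.2, b=0.75):
    """Return (doc_id, score) pairs sorted by descending BM25 score."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in documents.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical virtual documents, one per cluster.
documents = {
    "cluster_0": "michael schumacher racing driver",
    "cluster_1": "hyde park tree squirrel nut",
}
ranking = bm25_rank("racing driver", documents)
print(ranking)
```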

Entity Retrieval
● The user wants a list of entities, not a list of clusters.
● We implement two sets of systems whose aim is to create a ranked
list of entities starting from the ranked list of clusters/documents:
  ● Combination systems: basic approach
  ● Fusion systems: basic approach + state of the art
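
The slides do not detail how either family of systems works, so the sketch below is only one plausible reading. It expands ranked clusters into entities in rank order, then merges that list with a baseline entity ranking using reciprocal rank fusion, a standard fusion method swapped in here for illustration. All names, the constant k = 60, and the toy data are assumptions.

```python
def expand_clusters(ranked_clusters, members, limit):
    """Walk clusters in rank order and emit their entities, up to `limit`."""
    ranked = []
    for cluster in ranked_clusters:
        for entity in members[cluster]:
            if len(ranked) >= limit:
                return ranked
            if entity not in ranked:
                ranked.append(entity)
    return ranked

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several rankings; ties are broken by entity name for determinism."""
    scores = {}
    for ranking in rankings:
        for rank, entity in enumerate(ranking, start=1):
            scores[entity] = scores.get(entity, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda entity: (-scores[entity], entity))

# Hypothetical toy data: two ranked clusters and a baseline entity ranking.
ranked_clusters = ["c2", "c1"]
members = {"c1": ["e1", "e2"], "c2": ["e3", "e4"]}
cluster_based = expand_clusters(ranked_clusters, members, limit=3)
baseline = ["e1", "e5", "e3"]
fused = reciprocal_rank_fusion([cluster_based, baseline])
print(cluster_based)
print(fused)
```

The `limit` argument corresponds to the number of entities inserted into the final list, which is a tunable choice in this kind of system.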

Combination System

Fusion System

Overview
Background
Solution Design
Evaluation
Discussion and Conclusion

Evaluation
● Quantitative evaluation: general considerations based on
average measures.
● Qualitative (i.e. topic-based) evaluation: specific
considerations. We look for correlations between topics and
system effectiveness.
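
As an example of an average measure, the sketch below computes average precision per topic and its mean over topics (MAP). The slides do not say which measures were used, so MAP is an assumption, and the runs and relevance judgments are hypothetical toy data.

```python
def average_precision(ranking, relevant):
    """Average of the precision values at the rank of each relevant entity."""
    hits = 0
    total = 0.0
    for rank, entity in enumerate(ranking, start=1):
        if entity in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs, qrels):
    """Mean of the per-topic average precision over all judged topics."""
    return sum(average_precision(runs[t], qrels[t]) for t in qrels) / len(qrels)

# Hypothetical runs and relevance judgments for two topics.
runs = {"t1": ["e1", "e2", "e3"], "t2": ["e4", "e5"]}
qrels = {"t1": {"e1", "e3"}, "t2": {"e5"}}
per_topic = {t: average_precision(runs[t], qrels[t]) for t in qrels}
overall = mean_average_precision(runs, qrels)
print(per_topic)
print(overall)
```

The per-topic values support the topic-based view, while their mean gives the single averaged figure used for general comparison.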

Quantitative Evaluation

Qualitative Evaluation
● The topic is:
  ● “Chefs that have a TV
  show in Food Network.”
● We retrieve many relevant
entities in top positions.
● We retrieve entities that
BM25 does not find.

Discussion and Conclusion
● In fusion systems we exploit the positive aspects of both
methods: cluster-based and classic.
● The cluster construction process and the choice of the
number of entities to insert into the final list are
fundamental to obtaining good performance in the retrieval
phase.
● Performance is penalized in the evaluation phase because of
the way the test collection is built.
● Our approach turns out to be promising because it succeeds
in finding new relevant entities compared with the state of
the art.

Thank you for your attention
