Ontologies and their use in
Information Retrieval
Mauro Dragoni
Fondazione Bruno Kessler (FBK), Shape and Evolve Living Knowledge Unit (SHELL)
https://shell.fbk.eu/index.php/Mauro_Dragoni - dragoni@fbk.eu
KEYSTONE Training School, Malta
July 20th, 2015
Outline
1. On your marks and get set…
2. A general approach: pros and cons of concept-based structured
representations
3. Ontology-based IR platforms
4. Behind the lines
a) Cross-language Information Retrieval
b) Ontology Matching
Before we start…
 What is an ontology?
 What is a machine-readable dictionary?
 What about ambiguity?
 Terms vs. concepts, is everything clear?
What is an ontology?
 “the branch of philosophy which deals with the nature and the organization
of reality”
 “an ontology is an explicit specification of a conceptualization”
[Gruber1993]
 conceptualization: abstract model of the world
 explicit specification: model described by using unambiguous language
 domain ontology
 upper ontology
 example: DOLCE [Guarino2002]
Ontology Components
 Classes: entities describing the common characteristics of objects (for example:
“Agricultural Method”).
 Individuals: entities that are instances of classes (for example “Multi Crops
Farming” is an instance of “Agricultural Method”).
 Properties: binary relations between entities (for example “IsAffectedBy”).
 Attributes (or DataType Properties): characteristics that qualify individuals
(for example “Has Name”).
Hierarchies
 Concepts can be organized in subsumption hierarchies
 Meaning: every instance of a sub-concept is also an instance of the super-concept
 Examples:
 “Intensive Farming” is-a “Agricultural Method”
 “Agricultural Method” is-a “Method”
 Concept hierarchies are generally represented by using tree structures
Attributes and Properties
 Properties: binary relations between classes
 Domain and co-domain: classes to which individuals need to belong to be in
relation
 Example: “Agriculture” <isAffectedBy> “Agriculture Pollution”
 Attributes: binary relations between an individual and values (not other
entities)
 Domain: class to which the attribute is applied
 Co-domain: the type of the value (for example “String”)
 Properties and Attributes can be organized in hierarchies.
Steps for building an ontology
 To identify the classes of the domain.
 To organize them in a hierarchy.
 To define properties and attributes.
 To define individuals, if any.
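As a rough illustration of these steps, here is a minimal sketch using Python and the rdflib library; the namespace, class, and property names are illustrative, borrowed from the examples on these slides rather than from any published vocabulary.

```python
# Minimal sketch of the steps above using rdflib (namespace and names are
# illustrative, not taken from any real vocabulary).
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS, XSD

EX = Namespace("http://example.org/agri#")
g = Graph()
g.bind("ex", EX)

# 1. Classes of the domain
g.add((EX.Method, RDF.type, OWL.Class))
g.add((EX.AgriculturalMethod, RDF.type, OWL.Class))

# 2. Hierarchy: "Agricultural Method" is-a "Method"
g.add((EX.AgriculturalMethod, RDFS.subClassOf, EX.Method))

# 3. Properties and attributes
g.add((EX.isAffectedBy, RDF.type, OWL.ObjectProperty))
g.add((EX.hasName, RDF.type, OWL.DatatypeProperty))
g.add((EX.hasName, RDFS.range, XSD.string))

# 4. Individuals: "Multi Crops Farming" is an instance of "Agricultural Method"
g.add((EX.MultiCropsFarming, RDF.type, EX.AgriculturalMethod))
g.add((EX.MultiCropsFarming, EX.hasName, Literal("Multi Crops Farming")))

print(g.serialize(format="turtle"))
```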
Why are ontologies useful?
 Ontologies provide:
 a common dictionary of terms;
 a shared and formal interpretation of the domain.
 Ontologies make it possible to:
 solve ambiguities;
 share knowledge (not only between humans, but also between machines);
 use automatic reasoning techniques.
Use of ontologies in IR
 Exploit metadata
 Entity linking
 “which president …” → “Barack Obama is-a President”
 Extraction of triples from text
 applying NLP parsers for extracting dependencies
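As a toy illustration of what a triple looks like (a real system would rely on an NLP dependency parser, as noted above; here a single hand-written pattern handles "X is a Y"):

```python
# Naive illustration of triple extraction from text: one hard-coded "is a"
# pattern stands in for a full dependency-based extractor.
import re

def extract_is_a(sentence):
    match = re.match(r"(?P<subj>.+?) is an? (?P<obj>.+)", sentence.strip().rstrip("."))
    if match:
        return (match.group("subj"), "is-a", match.group("obj"))
    return None

print(extract_is_a("Barack Obama is a President."))   # ('Barack Obama', 'is-a', 'President')
```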
What is a thesaurus?
 A “coarse” version of ontologies
 Generally, 3 kinds of relations are represented:
 hierarchical (generalization/specialization)
 equivalence (synonymity)
 associative (other kind of relationships)
 Extensively used in query expansion approaches [Bhogal2007,
Grootjen2006,Qiu1993,Mandala2000]
Machine-readable dictionaries
 A dictionary in an electronic form.
 The power of an MRD lies in its word senses. [Kilgariff1997, Lakoff1987, Ruhl1989]
 Identity of meaning: synonyms [Gove1973]
 Inclusion of meaning: hyponymy or hyperonymy; troponymy [Cruse1986,
Green2002, Fellbaum1998]
 transitive relationship
 Part-whole meaning: meronymy (has part), holonymy (part of)
[Green2002, Cruse1986, Evens1986]
 Opposite meaning: antonymy
and now…
… let’s see how we can exploit this within
an information retrieval system…
Motivations and Challenges
 Considering how information is usually represented and classified.
 Documents and Queries are represented using terms.
 Indexing:
 terms are extracted from each document;
 term frequency within each document is computed (TF);
 inverse document frequency over the entire index is computed (IDF).
 Searching:
 the vector space model is used to compute the similarity between documents and queries;
 queries are generally expanded to increase the recall of the system.
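For reference, a minimal sketch of this term-based pipeline (toy documents, plain TF-IDF weighting and cosine similarity; no stemming, stopword removal, or query expansion):

```python
# Minimal term-based indexing/search sketch over toy data.
import math
from collections import Counter

docs = {
    "d1": "intensive farming is an agricultural method",
    "d2": "organic farming reduces agriculture pollution",
}

def tf(text):
    terms = text.lower().split()
    counts = Counter(terms)
    return {t: c / len(terms) for t, c in counts.items()}

def idf(tfs):
    n = len(tfs)
    df = Counter(t for vec in tfs.values() for t in vec)   # document frequency per term
    return {t: math.log(n / df[t]) + 1.0 for t in df}

def cosine(a, b):
    num = sum(a[t] * b.get(t, 0.0) for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

tfs = {d: tf(text) for d, text in docs.items()}
weights = idf(tfs)
index = {d: {t: w * weights[t] for t, w in vec.items()} for d, vec in tfs.items()}

query = tf("farming method")
qvec = {t: w * weights.get(t, 0.0) for t, w in query.items()}
print(sorted(index, key=lambda d: cosine(qvec, index[d]), reverse=True))   # ['d1', 'd2']
```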
Drawbacks of the
Term-Based representation – 1/2
 The “semantic connections” between terms in documents and queries are
not considered.
 Different vector positions may be allocated to the synonyms of the same
term:
 the importance of a given concept is distributed among different vector components;
 information loss.
Drawbacks of the
Term-Based representation – 2/2
 The query expansion has to be used carefully.
 It is easier to increase the recall of a system than its precision.
Which is better? [Abdelali2007]
 In the worst case, the size of a document vector could be close to the
number of terms used in the repository:
 in general, the number of concepts is lower than the number of words;
 the time needed to compare documents is therefore higher.
Intuition Behind
 Using concepts to represent the terms contained in documents and
queries. [Dragoni2012b]
1. Documents and Queries may be represented in the same way.
2. The issue related to how many and which terms have to be used for query
expansion is not considered.
3. The size of a concept vector is generally smaller than the size of a term vector.
 IMPORTANT: This is not a query expansion technique !!!
a first simple example …
 a closed vocabulary:
a first simple example …
 a closed vocabulary:
 how to compute concept weights?
a first simple example …
 how is each concept of the vocabulary weighted?
 suppose we have the document “xxyyyz”
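The closed vocabulary itself appears only in the slide figure, so the sketch below assumes a hypothetical mapping in which terms x and y belong to concept C1 and z to concept C2, and weights a concept by the normalized sum of the frequencies of its terms:

```python
# Illustrative concept weighting for the document "xxyyyz", assuming a
# hypothetical closed vocabulary: terms x, y -> concept C1; term z -> C2.
from collections import Counter

term_to_concepts = {"x": ["C1"], "y": ["C1"], "z": ["C2"]}

def concept_vector(text):
    term_freq = Counter(text)                 # here each character plays the role of a term
    weights = Counter()
    for term, freq in term_freq.items():
        for concept in term_to_concepts.get(term, []):
            weights[concept] += freq          # concept weight = sum of its terms' frequencies
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

print(concept_vector("xxyyyz"))               # {'C1': 0.833..., 'C2': 0.166...}
```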
a first simple example …
… that we evaluated
 Experiments on the MuchMore Collection (http://muchmore.dfki.de)
 The collection contains numerous medical terms.
 The term-based representation has an advantage over the semantic representation.
 Experiments on the TREC Ad-Hoc Collection:
 Results have been compared with the IR systems presented at the TREC-7 and TREC-8 conferences.
 Only the systems that implement a semantic representation of queries have been considered.
 Over dozens of runs, the three systems that perform best at recall 0.0 have been chosen. [Spink2006]
MuchMore Collection
System P@5 P@10 P@15 P@30 MAP
Term-Based 0.544 0.480 0.405 0.273 0.449
Synset-Based 0.648 0.484 0.403 0.309 0.459
Conceptual Indexing 0.770 0.735 0.690 0.523 0.449
Ontology Indexing 0.784 0.765 0.728 0.594 0.477
TREC-7
System P@5 P@10 P@15 P@30 MAP
Term-Based 0.444 0.414 0.375 0.348 0.199
AT&T Labs 1 0.644 0.558 0.499 0.419 0.296
AT&T Labs 2 0.644 0.558 0.497 0.413 0.294
City University, Sheffield, Microsoft 0.572 0.542 0.507 0.412 0.288
Ontology Indexing 0.656 0.588 0.501 0.397 0.309
TREC-8
System P@5 P@10 P@15 P@30 MAP
Term-Based 0.476 0.436 0.389 0.362 0.243
IBM Watson 0.588 0.504 0.472 0.410 0.301
Microsoft Research 0.580 0.550 0.499 0.425 0.317
TwentyOne 0.500 0.454 0.433 0.368 0.292
Ontology Indexing 0.616 0.572 0.485 0.415 0.315
Some considerations
 Two drawbacks have been identified:
 The absence of some terms from the ontology (in particular terms related to specific domains like biomedicine, mechanics, business, etc.) may affect the final retrieval result.
 a more complete knowledge base is needed.
 Term ambiguity. By using a Word Sense Disambiguation approach, concepts
associated with incorrect senses would be discarded or weighted less.
 a Word Sense Disambiguation algorithm is required: but it has to be used carefully.
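As an illustration of what such a disambiguation step could look like (not the algorithm referred to in these slides), a simplified Lesk-style sketch that picks the sense whose gloss overlaps most with the context:

```python
# Simplified Lesk-style word sense disambiguation: pick the sense whose gloss
# shares most words with the context (toy glosses, not a real dictionary).
def lesk(word, context, senses):
    ctx = set(context.lower().split())
    best, best_overlap = None, -1
    for sense, gloss in senses[word].items():
        overlap = len(ctx & set(gloss.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

senses = {
    "bank": {
        "bank#finance": "an institution that accepts deposits and lends money",
        "bank#river": "sloping land beside a body of water",
    }
}
print(lesk("bank", "the fisherman sat on the bank of the river", senses))   # bank#river
```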
A few words on disambiguation
Checkpoint 1
 the use of machine-readable dictionaries is suitable for implementing a
first semantic engine
 but if we use ontologies we have much more information
 properties
 attributes
 the problem is: how can we exploit all this information?
Ontology enhanced IR
 Enrichment of documents (and queries) with information coming from
semantic resources
 information expansion: adding synonyms, antonyms, … not new but still helpful
 annotations: relation or association between a semantic entity and a document
 Most information expansion systems are based on WordNet and Roget’s Thesaurus
 Systems using annotations are interfaced with the Linked Open Data
cloud, and mainly with Freebase and Wikipedia
Classification of
Semantic IR approaches
Criterion | Approaches
Semantic knowledge representation | • Statistical [Deerwester1990] • Linguistic conceptualization [Gonzalo1998, Mandala1998, Giunchiglia2009] • Ontology-based [Guha2003, Popov2004]
Scope | • Web search [Finin2005, Fernandez2008] • Limited domain repositories [Popov2004] • Desktop search [Chirita2005]
Query | • Keyword query [Guha2003] • Natural language query [Lopez2009] • Controlled natural language query [Bernstein2006, Cohen2003] • Structured query based on ontology query language [notes]
Content retrieved | • Data retrieval • Information retrieval
Content ranking | • No ranking • Keyword-based ranking [Guha2003] • Semantic-based ranking [Stojanovic2003]
Limitation of Semantic
IR approaches – 1/2
Criterion | Limitation | IR / Semantic
Semantic knowledge representation | No exploitation of the full potential of an ontological language, beyond those that could be reduced to conventional classification schemes. | x / (Partially)
Scope | No scalability to large and heterogeneous repositories of documents. | x
Goal | Boolean retrieval models where the Information Retrieval problem is reduced to a data retrieval task. | x
Query | Limited usability | x
Limitation of Semantic
IR approaches – 2/2
Criterion | Limitation | IR / Semantic
Content retrieved | Focus on textual content: no management of different formats (multimedia) | (Partially) / (Partially)
Content ranking | Lack of a semantic ranking criterion; the ranking (if provided) relies on keyword-based approaches. | x / x
Coverage | Knowledge incompleteness. [Croft1986] | (Partially) / x
Evaluation | Lack of standard evaluation frameworks. [Giunchiglia2009] | x
A basic ontology-based IR model
[architecture diagram; components: User, SPARQL Editor, SPARQL Query, Query Processing, Searching, Indexing, Ranking, Semantic Entities, Semantic Knowledge (ontology + KB), Document Corpus, Semantic Index (weighted annotations), Unsorted Documents, Ranked Documents]
Basic ontology-based IR model - Limits
 Heterogeneity
 a single ontology (or even a set of them) cannot cover all possible domains
 Scalability
 imagine annotating the Web by using all the knowledge bases currently available
 a final solution does not exist… but nice and practical approaches can be used
 Usability
 try to think… are all the people you know able to write queries in SPARQL?
Extended ontology-based IR model
[architecture diagram; components: User, Nat. Lang. Interface, Natural Lang. Query, Query Processing, Searching, Indexing, Ranking, Semantic Entities, Preprocessed Semantic Knowledge, Semantic Web, Unstructured Web contents, Semantic Index (weighted annotations), Unsorted Documents, Ranked Documents]
Evaluation Results
 Mean Average Precision: Semantic System 0.16, Lucene 0.1, TREC Automatic 0.2
 Prec@10: Semantic System 0.37, Lucene 0.25, TREC Automatic 0.30
A focus on the indexing procedure
 Challenge: to link semantic knowledge with documents and queries in an
efficient and effective way:
 document corpus and semantic knowledge should remain decoupled;
 annotations have to be provided in a flexible and scalable way.
 Annotations can be provided in two ways:
 by applying an information extraction technique based on pure NLP
approaches;
 by applying a contextual semantic information approach.
Annotator Requirements
 Identification of the entities within the documents
 conceptually, it is not so different from a traditional IR indexing process
 Ontologies must not be touched (decoupling)
 Should be open-domain
 Scalability-friendly:
 indexing of ontologies;
 indexing of documents;
 an interesting alternative: usage of non-embedded annotations
Natural Language Processing Annotation
<html>
<body>
<p>Schizophrenia patients whose medication couldn’t stop
the imaginary voices in their heads</p>
</body>
</html>
HTML
Parser
Schizophrenia patients whose medication
couldn’t stop the imaginary voices in their
heads
NLP
Tools
<document><p><s>
<w c=“w” pos=“NNP” stem=“Schizophrenia”>Schizophrenia</w>
<w c=“w” pos=“NN$” stem=“patient”>patients</w>
<w c=“w” pos=“WP$”>whose</w>
<w c=“w” pos=“NN” stem=“medication”>medications</w>
<w c=“w” pos=“MD”>could</w><w c=“w” pos=“RB”>not</w>
<w c=“w” pos=“VB” stem=“stop”>stop</w>
<w c=“w” pos=“DT”>the</w>
<w c=“w” pos=“JJ”>imaginary</w>
<w c=“w” pos=“NN$” stem=“voice”>voices</w>
<w c=“w” pos=“IN”>in</w>
<w c=“w” pos=“PRP$”>their</w>
<w c=“w” pos=“NN$” stem=“head”>heads</w>
</s></p></document>
Token
Filter
schizophrenia
patient
medication
stop
voice
head
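A rough, self-contained approximation of this pipeline in Python; a real system would use a proper HTML parser, POS tagger, and lemmatizer, whereas here the stopword list and stem lookup are hard-coded to reproduce the example above:

```python
# Rough approximation of the pipeline above: strip HTML, tokenize, drop
# stopwords, and reduce tokens to a base form (toy stemming via a lookup).
import re

html = ("<html><body><p>Schizophrenia patients whose medication couldn't stop "
        "the imaginary voices in their heads</p></body></html>")

def strip_html(source):
    return re.sub(r"<[^>]+>", " ", source)

# "imaginary" is dropped here to mirror the slide's output (adjectives filtered)
STOPWORDS = {"whose", "could", "not", "couldn't", "the", "in", "their", "imaginary"}
STEMS = {"patients": "patient", "voices": "voice", "heads": "head"}

def token_filter(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    return [STEMS.get(t, t) for t in tokens if t not in STOPWORDS]

print(token_filter(strip_html(html)))
# ['schizophrenia', 'patient', 'medication', 'stop', 'voice', 'head']
```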
Natural Language Processing Annotation
schizophrenia
patient
medication
stop
voice
head
Index
Searcher
Frequency
Counter
Annotation
Creator
Keyword Ontology Entities
schizophrenia E1
patient E4, E5
… …
head E2, E8
Ontology Entity Document Frequencies
E1 D1(1), D4(2)
… …
E8 D1(1), D5(4), D6(3)
Ontology Entity Document Weight
E1 D1 0.9
… …
E8 D1 0.3
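A sketch of how the keyword-to-entity table and the frequency counter can be turned into weighted annotations; the weighting formula (a simple normalized frequency) is an assumption, as the slides do not state the one actually used:

```python
# Sketch of turning keyword hits into weighted entity annotations, following
# the tables above (keyword -> entities, entity -> per-document frequencies).
from collections import defaultdict

keyword_to_entities = {"schizophrenia": ["E1"], "patient": ["E4", "E5"], "head": ["E2", "E8"]}
doc_tokens = {"D1": ["schizophrenia", "patient", "head", "head"]}

entity_freq = defaultdict(lambda: defaultdict(int))
for doc, tokens in doc_tokens.items():
    for token in tokens:
        for entity in keyword_to_entities.get(token, []):
            entity_freq[entity][doc] += 1

annotations = {}
for entity, per_doc in entity_freq.items():
    for doc, freq in per_doc.items():
        # assumed weighting: frequency normalized by document length, in [0, 1]
        annotations[(entity, doc)] = freq / len(doc_tokens[doc])

print(annotations)   # e.g. {('E1', 'D1'): 0.25, ..., ('E8', 'D1'): 0.5}
```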
Contextual Semantic Annotation
Ontology
Selection of the
semantic context
Selection of
contextualized terms in
the document index
Search of terms in
the document index
Selection of a
semantic entity
E1: Individual Maradona;
Labels: {“Maradona”,
“Diego Maradona”,
“Pelusa”}
Keyword Documents
Maradona D1, D2, D87
Pelusa D95, D140
football_player D87, D61, D44, D1
Argentina D43, D32, D2
E34 = Class: football_player
Labels:{“football player”}
E22 = Individual: Argentina
Labels:{“Argentina”}
Potential documents to annotate
{D1, D2, D87, D95, D140}
Contextualized documents
{D1, D2, D32, D43, D44, D61, D87}
Contextual Semantic Annotation
Potential documents to annotate
{D1, D2, D87, D95, D140}
Contextualized documents
{D1, D2, D32, D43, D44, D61, D87}
Selection of semantic
contextualized documents
Documents to annotate
{D1, D2, D87, D95, D140}
Creation of annotations
Ontology Entity Document Weight
E1 D1 0.5
E1 D2 0.2
E1 D87 0.67
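A sketch of the contextual annotation step using the example data above; the final weighting (a base score for a label match plus a bonus when the document also appears in the contextualized set) is an assumed heuristic, not the formula behind the weights shown in the slide:

```python
# Sketch of contextual semantic annotation with the Maradona example:
# documents matching the entity labels are kept, and their weight reflects
# whether they also match the entity's semantic context (assumed heuristic).
entity_labels = {"E1": ["Maradona", "Diego Maradona", "Pelusa"]}
context_keywords = ["football_player", "Argentina"]          # ontological context of E1

inverted = {
    "Maradona": {"D1", "D2", "D87"},
    "Pelusa": {"D95", "D140"},
    "football_player": {"D87", "D61", "D44", "D1"},
    "Argentina": {"D43", "D32", "D2"},
}

potential = set().union(*(inverted.get(label, set()) for label in entity_labels["E1"]))
contextual = set().union(*(inverted.get(k, set()) for k in context_keywords))

annotations = {}
for doc in potential:
    # assumed weighting: 0.5 for the label match, +0.5 if the document also
    # contains contextual terms of the entity
    annotations[("E1", doc)] = 0.5 + (0.5 if doc in contextual else 0.0)

print(sorted(annotations.items()))
```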
An idea for aggregating rankings
 Multi-dimensional aggregation criteria
 Document score is computed from different perspectives (criteria)
 Assignment of priorities to criteria
 Compute criteria weights
 Weight of criteria with low priority depends on the score of criteria with high
priority
 Aggregate criteria scores [Dragoni2012]
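A minimal sketch of such a priority-driven aggregation, in the spirit of [Dragoni2012] but not reproducing its exact formula: the weight available to a lower-priority criterion shrinks when higher-priority criteria score poorly.

```python
# Sketch of prioritized aggregation: criteria are ordered by priority and the
# weight given to a lower-priority criterion is damped by the scores obtained
# on the higher-priority ones (illustrative scheme, not the exact formula).
def aggregate(scores_by_priority):
    """scores_by_priority: scores in [0, 1], ordered from highest to lowest priority."""
    total, carry = 0.0, 1.0
    for score in scores_by_priority:
        total += carry * score
        carry *= score            # low scores on important criteria shrink later weights
    return total / len(scores_by_priority)

# a document scored from three perspectives, most important first
print(aggregate([0.9, 0.6, 0.8]))   # all criteria contribute, led by the high-priority one
print(aggregate([0.2, 0.6, 0.8]))   # a weak high-priority score damps the lower-priority ones
```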
Querying and Ranking
 Queries are transformed by mapping terms to ontology entities
 Contextual disambiguation is very important
 simple example: “Rock musicians Britain”
 Ranking: two options
 to evaluate only the “matches” between detected entities
 to aggregate (in your own way) the rankings produced by using only the entities, only the query terms, and/or both of them (see the sketch below)
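A sketch of the second option: merging the entity-based ranking with the term-based one through a simple linear combination (the mixing weight alpha is arbitrary):

```python
# Sketch of the second ranking option: combine the scores produced by entity
# matches with the plain keyword scores.
def combine(entity_scores, term_scores, alpha=0.6):
    docs = set(entity_scores) | set(term_scores)
    return sorted(
        ((alpha * entity_scores.get(d, 0.0) + (1 - alpha) * term_scores.get(d, 0.0), d)
         for d in docs),
        reverse=True,
    )

# e.g. for "Rock musicians Britain": one score list from matched entities,
# one from plain keyword retrieval (toy values)
entity_scores = {"D1": 0.8, "D2": 0.3}
term_scores = {"D1": 0.4, "D3": 0.7}
print(combine(entity_scores, term_scores))
```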
Use of multiple ontologies
 What we need: an Ontology Gateway
 Tasks of an ontology gateway:
 collect available semantic content;
 store the semantic content efficiently in order to ease its access;
 implement an approach for the “selection” of the content
 Most important ontology gateways online:
 Swoogle [Ding2004,Brin1998]
 Watson [Aquin2007,Aquin2007b]
 WebCORE [Fernandez2006,Fernandez2007]
Use of multiple ontologies - opportunities
 Recall improvement:
 Ontology 1 focused on entities → stress on the identification of semantic entities within the document
 Ontology 2 focused on properties → stress on the identification of relationships between entities in the document
 precision should also increase, but some drops are possible.
 Supporting multiple perspectives:
 analysis of each entity from different points of view
Use of multiple ontologies - challenges
 To figure out how to use them:
 it is necessary to formally represent the relationships between the ontologies
and the techniques used for extracting information from them;
 example: you may have ontologies describing the same domain by using
different structures!!!
 To find suitable ontologies and mappings:
 again: more than one ontology describing the same domain;
 it is not good practice to select only one → build mappings!!!
A use case
 Information system containing products technical data
 users look for something that satisfies their needs
 engineers want to exploit information for creating new product variants
 Ontologies focused on particular aspects of products
 product conceptualizations are separated
A multi-ontology approach
Checkpoint 2
 Annotation of documents is more important than the querying of the
repositories… why?
 differences in the amount of content
 once we have decided how to annotate documents, queries should be
annotated by using the same procedure in order to homogenize the process
 Challenges in building knowledge bases
 Ranking… play with it and “stress your creativity”
Ontologies and IR – 2 use cases
 Demonstrate the usefulness of semantic approaches used in combination
with traditional IR techniques.
 Show how IR and Semantics may help each other
 Two scenarios:
 Cross-language information retrieval [Dragoni2014]
 Ontology matching [Dragoni2015]
 Sentiment analysis
Cross-Language Information Retrieval
Background - Challenges
 Out-of-Vocabulary issue
 improve the corpora used for training the machine translation model.
 usage of domain information for increasing the coverage of the
dictionaries.
 Usage of semantic artifacts for structuring the representation of
(multilingual) documents.
 GOAL: to integrate domain-specific semantic knowledge within a CLIR system and evaluate its effectiveness
Our Scenario
 Use case: the agricultural domain
 Knowledge resources: Agrovoc and Organic.Lingua ontologies
 3 components used in the proposed approach:
 Annotator
 Indexer
 Retriever
Annotation Process – Step 1
[diagram: multilingual resources with labels in en, es, it, de, fr, …]
 Document content is used as the query.
 Among the candidate results, only “exact matches” are considered.
Annotation Process – Step 2
Approach – Annotation Stats
Domain Ontology | Number of Concepts | Manual Annotations | Automatic Annotations
Agrovoc (AV) | 32061 | 0 | 133596 (5834 distinct concepts used)
Organic.Lingua (OL) | 291 | 27871 (264 distinct concepts used) | 16434 (208 distinct concepts used)
Approach - Index
 Given a document:
 Text and annotations are extracted.
 The context of each concept is retrieved from the ontologies.
 Each contextual concept is indexed with a weight proportional to its semantic distance from the semantic annotation.
 Structure of each index record:
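A sketch of such an index record; the concept identifiers are illustrative, and the decay 1/(1+d) applied to contextual concepts is an assumption, since the slides do not give the actual weighting function:

```python
# Sketch of indexing a document's annotations together with their ontological
# context; contextual concepts get a weight that decays with their distance
# from the annotated concept (assumed decay, illustrative concept IDs).
ontology_context = {
    # concept -> list of (contextual concept, semantic distance)
    "agrovoc:IntensiveFarming": [("agrovoc:AgriculturalMethod", 1), ("agrovoc:Agriculture", 2)],
}

def index_record(doc_id, text, annotations):
    concepts = {}
    for concept in annotations:
        concepts[concept] = 1.0                         # directly annotated concept
        for ctx, distance in ontology_context.get(concept, []):
            concepts[ctx] = max(concepts.get(ctx, 0.0), 1.0 / (1 + distance))
    return {"id": doc_id, "text": text, "concepts": concepts}

print(index_record("d42", "intensive farming ...", ["agrovoc:IntensiveFarming"]))
```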
Approach - Retriever
 Three retrieval configurations available:
 Only translations: query terms are translated by using machine
translation services.
 Semantic expansion by exploiting the domain ontology: query terms are matched with ontology concepts; if an exact match exists, the query is expanded by using the URI of the concept and the URIs of the contextual ones.
 Ontology matching only: terms not having an exact match with
ontology concepts are discarded.
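A sketch of the three configurations as a single dispatch function; translate(), ontology_lookup(), and search() are placeholder callables, not real services:

```python
# Sketch of the three retrieval configurations described above.
def retrieve(query_terms, mode, translate, ontology_lookup, search):
    """translate(), ontology_lookup(), and search() are placeholder callables."""
    if mode == "translation_only":
        return search(translate(query_terms))
    expanded = []
    for term in query_terms:
        # exact-match concept URI plus the URIs of its contextual concepts
        expanded.extend(ontology_lookup(term))
    if mode == "semantic_expansion":
        return search(translate(query_terms) + expanded)
    if mode == "ontology_only":
        return search(expanded)   # query terms without an exact match are discarded
    raise ValueError(f"unknown mode: {mode}")
```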
Evaluation - Setup
 Collection of 13,000 multilingual documents.
 48 queries originally provided in English and manually translated into 12 languages under the supervision of both domain and language experts.
 Gold standard manually built by the domain experts.
 MAP, Prec@5, Prec@10, Prec@20, Recall have been used.
Results - 1
Avg. MAP Prec@5 Prec@10 Prec@20 Avg. Rec.
BASELINE 0.554 0.617 0.545 0.465 0.920
Auto: AV 3.24% 3.11% 5.04% 3.81% 2.52%
Auto: OL 2.31% 1.91% 2.88% 2.98% 0.77%
Auto: AV+OL 3.13% 2.95% 4.63% 3.86% 2.53%
Auto+Man: OL 1.65% 3.40% 3.95% 4.48% 1.37%
Auto+Man: AV+OL 4.38% 5.96% 7.18% 6.07% 2.97%
Auto+Man*2: OL 1.00% 3.30% 4.02% 3.27% 1.36%
Auto+Man*2: AV+OL 3.29% 4.86% 6.73% 6.03% 2.97%
Results - 2
Ontology | Query Cov. | Avg. MAP | Prec@5 | Prec@10 | Prec@20 | Avg. Rec.
AV | 39.3 (9 langs) | 0.137 | 0.189 | 0.191 | 0.179 | 0.552
OL | 15.7 (10 langs) | 0.260 | 0.359 | 0.319 | 0.322 | 0.635
AV + OL | 33.3 (12 langs) | 0.173 | 0.247 | 0.226 | 0.221 | 0.586
Ontology Matching
 Given two thesauri/ontologies/vocabularies, find alignments between entities
 Formally a “match” may be represented with the following 5-tuple:
‹ id, e1, e2, R, c ›
 Extensive literature about matching approaches (early ‘80s)
Motivations
 Need: a system, for experts, able to suggest possible matches between
concepts
 Exploit multilinguality… why?
 it reduces ambiguity: the probability, for two different concepts, of having the same label across several languages is very low.
 term translations have been adapted to the domain: the experts in charge of translations put a lot of their cultural heritage into choosing the right terms for each concept.
The Proposed Approach - 1
 Inspired by information retrieval techniques
 Built on top of the Lucene search engine
 For each element of the thesaurus a structured multilingual representation
is built:
 An index for each thesaurus is built
[prefLabel] "Food chains"@en
[prefLabel] "Catene alimentari"@it
[altLabel] "Food distributions"@en
[altLabel] "Reti alimentari"@it
label-en: “food chain”
label-en: “food distribution”
label-it: “catena alimentare”
label-it: “rete alimentare”
The Proposed Approach - 2
 How are matches suggested?
 source and target thesauri are chosen
 for each concept, a query is performed from the source to the target thesaurus
 the standard Lucene scoring formula is used for computing the ranking
 for each query, a ranking of 5 suggestions is provided to the user
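A minimal sketch of this IR-style matcher: target concepts are indexed by language-tagged labels, the source concept's labels act as the query, and the top-5 ranked targets are suggested; a simple label-overlap count stands in for Lucene's scoring formula:

```python
# Sketch of the IR-style matcher: target concepts are indexed by language-
# tagged labels; a source concept queries that index and the top-k ranked
# concepts are suggested. Overlap scoring stands in for Lucene's formula.
def score(source_labels, target_labels):
    hits = 0
    for lang, labels in source_labels.items():
        hits += len({l.lower() for l in labels} & {l.lower() for l in target_labels.get(lang, [])})
    return hits

def suggest(source_concept, target_index, k=5):
    ranked = sorted(target_index.items(), key=lambda kv: score(source_concept, kv[1]), reverse=True)
    return [concept for concept, _ in ranked[:k]]

source = {"en": ["food chain", "food distribution"], "it": ["catena alimentare", "rete alimentare"]}
target_index = {
    "t:FoodChain": {"en": ["food chain"], "it": ["catena alimentare"]},
    "t:Distribution": {"en": ["distribution"], "it": ["distribuzione"]},
}
print(suggest(source, target_index))   # 't:FoodChain' ranked first
```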
Evaluation Set-Up
 2 contexts:
 six multilingual thesauri (3 medical domain, 3 agricultural domain)
 adapted Multifarm benchmark
 2 tasks:
 matching system (only the first suggestion is considered)
 suggestion system
Results - 1
Mapping Set # of Mappings Prec@1 Prec@3 Prec@5 Recall
Eurovoc → Agrovoc 1297 0.816 0.931 0.967 0.874
Agrovoc → Eurovoc 1297 0.906 0.969 0.988 0.695
Avg. 0.861 0.950 0.978 0.785
Gemet → Agrovoc 1181 0.909 0.964 0.983 0.546
Agrovoc → Gemet 1181 0.943 0.981 0.994 0.740
Avg. 0.926 0.973 0.989 0.643
MDR → MeSH 6061 0.776 0.914 0.956 0.807
MeSH → MDR 6061 0.716 0.888 0.939 0.789
Avg. 0.746 0.901 0.948 0.798
MDR → SNOMED 19971 0.621 0.826 0.908 0.559
SNOMED → MDR 19971 0.556 0.760 0.855 0.519
Avg. 0.589 0.793 0.882 0.539
MeSH → SNOMED 26634 0.690 0.871 0.931 0.660
SNOMED → MeSH 26634 0.657 0.835 0.908 0.564
Avg. 0.674 0.853 0.920 0.612
Results obtained by the proposed system on the domain-specific thesauri
Results - 2
Mapping Set IRBOM WeSeE (2012) RiMOM (2013) YAM++ (2013) YAM++ (2012) AUTOMSv2 (2012)
Agrovoc → Eurovoc 0.821 0.785 0.628 0.615 0.615 0.599
Gemet → Agrovoc 0.759 0.726 0.548 0.579 0.579 0.485
MDR → MeSH 0.771 0.749 0.611 0.613 0.613 0.536
MDR → SNOMED 0.563 0.624 0.495 0.473 0.473 0.405
MeSH → SNOMED 0.642 0.631 0.457 0.458 0.458 0.497
Results obtained by all systems on the domain-specific thesauri
Results - 3
System Name Precision Recall F-Measure
IRBOM 0.68 0.43 0.53
WeSeE (2012) 0.61 0.32 0.41
RiMOM (2013) 0.52 0.13 0.21
YAM++ (2013) 0.51 0.36 0.40
YAM++ (2012) 0.50 0.36 0.40
AUTOMSv2 (2012) 0.49 0.10 0.36
Results obtained by all systems on the adapted Multifarm Benchmark
So… at the end…
 The use of ontologies in IR is still a controversial topic
 Personal opinion: combining structured and unstructured representations seems to be the most suitable solution
 Pay attention to the kind of queries performed by users
 Aggregation of results
 Be brave… try to work with triples!!!!
Mauro Dragoni
https://shell.fbk.eu/index.php/Mauro_Dragoni
dragoni@fbk.eu

More Related Content

What's hot

Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantini
maxfalc
 
Aggregation for searching complex information spaces
Aggregation for searching complex information spacesAggregation for searching complex information spaces
Aggregation for searching complex information spaces
Mounia Lalmas-Roelleke
 

What's hot (20)

NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
 
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
 
Linked library data
Linked library dataLinked library data
Linked library data
 
Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept AnalysisExtracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
 
Text mining
Text miningText mining
Text mining
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information Retrieval
 
Linked data as a library data platform
Linked data as a library data platformLinked data as a library data platform
Linked data as a library data platform
 
Text mining
Text miningText mining
Text mining
 
LOTUS: Adaptive Text Search for Big Linked Data
LOTUS: Adaptive Text Search for Big Linked DataLOTUS: Adaptive Text Search for Big Linked Data
LOTUS: Adaptive Text Search for Big Linked Data
 
Linking library data
Linking library dataLinking library data
Linking library data
 
Ziegler Open Data in Special Collections Libraries
Ziegler Open Data in Special Collections LibrariesZiegler Open Data in Special Collections Libraries
Ziegler Open Data in Special Collections Libraries
 
4.4 text mining
4.4 text mining4.4 text mining
4.4 text mining
 
Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantini
 
Lotus: Linked Open Text UnleaShed - ISWC COLD '15
Lotus: Linked Open Text UnleaShed - ISWC COLD '15Lotus: Linked Open Text UnleaShed - ISWC COLD '15
Lotus: Linked Open Text UnleaShed - ISWC COLD '15
 
Knowledge graphs on the Web
Knowledge graphs on the WebKnowledge graphs on the Web
Knowledge graphs on the Web
 
Information Retrieval
Information RetrievalInformation Retrieval
Information Retrieval
 
Aggregation for searching complex information spaces
Aggregation for searching complex information spacesAggregation for searching complex information spaces
Aggregation for searching complex information spaces
 
Omitola birmingham cityuniv
Omitola birmingham cityunivOmitola birmingham cityuniv
Omitola birmingham cityuniv
 
Doing Clever Things with the Semantic Web
Doing Clever Things with the Semantic WebDoing Clever Things with the Semantic Web
Doing Clever Things with the Semantic Web
 
Keystone summer school 2015 paolo-missier-provenance
Keystone summer school 2015 paolo-missier-provenanceKeystone summer school 2015 paolo-missier-provenance
Keystone summer school 2015 paolo-missier-provenance
 

Viewers also liked

Recommendation system based on adaptive ontological graphs and weighted ranking
Recommendation system based on adaptive ontological graphs and weighted rankingRecommendation system based on adaptive ontological graphs and weighted ranking
Recommendation system based on adaptive ontological graphs and weighted ranking
vikramadityajakkula
 
Semantic Search Engines
Semantic Search EnginesSemantic Search Engines
Semantic Search Engines
Atul Shridhar
 
Intriduction to Ontotext's KIM platform
Intriduction to Ontotext's KIM platformIntriduction to Ontotext's KIM platform
Intriduction to Ontotext's KIM platform
toncho11
 

Viewers also liked (20)

1st KeyStone Summer School - Hackathon Challenge
1st KeyStone Summer School - Hackathon Challenge1st KeyStone Summer School - Hackathon Challenge
1st KeyStone Summer School - Hackathon Challenge
 
Search, Exploration and Analytics of Evolving Data
Search, Exploration and Analytics of Evolving DataSearch, Exploration and Analytics of Evolving Data
Search, Exploration and Analytics of Evolving Data
 
Curse of Dimensionality and Big Data
Curse of Dimensionality and Big DataCurse of Dimensionality and Big Data
Curse of Dimensionality and Big Data
 
Aggregating Multiple Dimensions for Computing Document Relevance
Aggregating Multiple Dimensions for Computing Document RelevanceAggregating Multiple Dimensions for Computing Document Relevance
Aggregating Multiple Dimensions for Computing Document Relevance
 
Research on ontology based information retrieval techniques
Research on ontology based information retrieval techniquesResearch on ontology based information retrieval techniques
Research on ontology based information retrieval techniques
 
Recommendation system based on adaptive ontological graphs and weighted ranking
Recommendation system based on adaptive ontological graphs and weighted rankingRecommendation system based on adaptive ontological graphs and weighted ranking
Recommendation system based on adaptive ontological graphs and weighted ranking
 
School intro
School introSchool intro
School intro
 
Information Retrieval Evaluation
Information Retrieval EvaluationInformation Retrieval Evaluation
Information Retrieval Evaluation
 
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog TextTwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
 
Ontological approach for improving semantic web search results
Ontological approach for improving semantic web search resultsOntological approach for improving semantic web search results
Ontological approach for improving semantic web search results
 
Semantic Search Engines
Semantic Search EnginesSemantic Search Engines
Semantic Search Engines
 
Adding Semantic Edge to Your Content – From Authoring to Delivery
Adding Semantic Edge to Your Content – From Authoring to DeliveryAdding Semantic Edge to Your Content – From Authoring to Delivery
Adding Semantic Edge to Your Content – From Authoring to Delivery
 
Intriduction to Ontotext's KIM platform
Intriduction to Ontotext's KIM platformIntriduction to Ontotext's KIM platform
Intriduction to Ontotext's KIM platform
 
A Taxonomy of Semantic Web data Retrieval Techniques
A Taxonomy of Semantic Web data Retrieval TechniquesA Taxonomy of Semantic Web data Retrieval Techniques
A Taxonomy of Semantic Web data Retrieval Techniques
 
Ir 01
Ir   01Ir   01
Ir 01
 
In Search of a Semantic Book Search Engine: Are We There Yet?
In Search of a Semantic Book Search Engine: Are We There Yet?In Search of a Semantic Book Search Engine: Are We There Yet?
In Search of a Semantic Book Search Engine: Are We There Yet?
 
Semantics And Search
Semantics And SearchSemantics And Search
Semantics And Search
 
Semantic data mining: an ontology based approach
Semantic data mining: an ontology based approachSemantic data mining: an ontology based approach
Semantic data mining: an ontology based approach
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATE
 
Semantic security framework and context-aware role-based access control ontol...
Semantic security framework and context-aware role-based access control ontol...Semantic security framework and context-aware role-based access control ontol...
Semantic security framework and context-aware role-based access control ontol...
 

Similar to Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval

Association Rule Mining Based Extraction of Semantic Relations Using Markov L...
Association Rule Mining Based Extraction of Semantic Relations Using Markov L...Association Rule Mining Based Extraction of Semantic Relations Using Markov L...
Association Rule Mining Based Extraction of Semantic Relations Using Markov L...
IJwest
 
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
cscpconf
 
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using OntologiesESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
eswcsummerschool
 
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...
IJwest
 
Ontology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval SystemOntology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval System
IJTET Journal
 

Similar to Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval (20)

The basics of ontologies
The basics of ontologiesThe basics of ontologies
The basics of ontologies
 
Ontology
OntologyOntology
Ontology
 
SWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professionalSWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professional
 
IRJET - Deep Collaborrative Filtering with Aspect Information
IRJET - Deep Collaborrative Filtering with Aspect InformationIRJET - Deep Collaborrative Filtering with Aspect Information
IRJET - Deep Collaborrative Filtering with Aspect Information
 
Association Rule Mining Based Extraction of Semantic Relations Using Markov L...
Association Rule Mining Based Extraction of Semantic Relations Using Markov L...Association Rule Mining Based Extraction of Semantic Relations Using Markov L...
Association Rule Mining Based Extraction of Semantic Relations Using Markov L...
 
The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...
 
Association Rule Mining Based Extraction of Semantic Relations Using Markov ...
Association Rule Mining Based Extraction of  Semantic Relations Using Markov ...Association Rule Mining Based Extraction of  Semantic Relations Using Markov ...
Association Rule Mining Based Extraction of Semantic Relations Using Markov ...
 
0810ijdms02
0810ijdms020810ijdms02
0810ijdms02
 
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
 
The Revolution Of Cloud Computing
The Revolution Of Cloud ComputingThe Revolution Of Cloud Computing
The Revolution Of Cloud Computing
 
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVALONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
 
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using OntologiesESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
 
Tools for Ontology Building from Texts: Analysis and Improvement of the Resul...
Tools for Ontology Building from Texts: Analysis and Improvement of the Resul...Tools for Ontology Building from Texts: Analysis and Improvement of the Resul...
Tools for Ontology Building from Texts: Analysis and Improvement of the Resul...
 
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...
 
Enhancing Semantic Mining
Enhancing Semantic MiningEnhancing Semantic Mining
Enhancing Semantic Mining
 
Ontology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval SystemOntology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval System
 
A category theoretic model of rdf ontology
A category theoretic model of rdf ontologyA category theoretic model of rdf ontology
A category theoretic model of rdf ontology
 
Classification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningClassification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern Mining
 
Semantic Web: Technolgies and Applications for Real-World
Semantic Web: Technolgies and Applications for Real-WorldSemantic Web: Technolgies and Applications for Real-World
Semantic Web: Technolgies and Applications for Real-World
 
Ontologies
OntologiesOntologies
Ontologies
 

More from Mauro Dragoni

Keynote given at ISWC 2019 Semantic Management for Healthcare Workshop
Keynote given at ISWC 2019 Semantic Management for Healthcare WorkshopKeynote given at ISWC 2019 Semantic Management for Healthcare Workshop
Keynote given at ISWC 2019 Semantic Management for Healthcare Workshop
Mauro Dragoni
 

More from Mauro Dragoni (8)

Keynote given at ISWC 2019 Semantic Management for Healthcare Workshop
Keynote given at ISWC 2019 Semantic Management for Healthcare WorkshopKeynote given at ISWC 2019 Semantic Management for Healthcare Workshop
Keynote given at ISWC 2019 Semantic Management for Healthcare Workshop
 
Exploiting Multilinguality For Creating Mappings Between Thesauri
Exploiting Multilinguality For Creating Mappings Between ThesauriExploiting Multilinguality For Creating Mappings Between Thesauri
Exploiting Multilinguality For Creating Mappings Between Thesauri
 
Semantic-based Process Analysis
Semantic-based Process AnalysisSemantic-based Process Analysis
Semantic-based Process Analysis
 
Authoring OWL 2 ontologies with the TEX-OWL syntax
Authoring OWL 2 ontologies with the TEX-OWL syntaxAuthoring OWL 2 ontologies with the TEX-OWL syntax
Authoring OWL 2 ontologies with the TEX-OWL syntax
 
A Fuzzy Approach For Multi-Domain Sentiment Analysis
A Fuzzy Approach For Multi-Domain Sentiment AnalysisA Fuzzy Approach For Multi-Domain Sentiment Analysis
A Fuzzy Approach For Multi-Domain Sentiment Analysis
 
Using Semantic and Domain-based Information in CLIR Systems
Using Semantic and Domain-based Information in CLIR SystemsUsing Semantic and Domain-based Information in CLIR Systems
Using Semantic and Domain-based Information in CLIR Systems
 
Multilingual Knowledge Organization Systems Management: Best Practices
Multilingual Knowledge Organization Systems Management: Best PracticesMultilingual Knowledge Organization Systems Management: Best Practices
Multilingual Knowledge Organization Systems Management: Best Practices
 
Collaborative Modeling of Processes and Ontologies with MoKi
Collaborative Modeling of Processes and Ontologies with MoKiCollaborative Modeling of Processes and Ontologies with MoKi
Collaborative Modeling of Processes and Ontologies with MoKi
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval

  • 1. Ontologies and their use in Information Retrieval Mauro Dragoni Fondazione Bruno Kessler (FBK), Shape and Evolve Living Knowledge Unit (SHELL) https://shell.fbk.eu/index.php/Mauro_Dragoni - dragoni@fbk.eu KEYSTONE Training School, Malta July, 20th 2015
  • 2. Outline 1. On your marks and get set… 2. A general approach: pros and cons of concept-based structured representations 3. Ontology-based IR platforms 4. Behind the lines a) Cross-language Information Retrieval b) Ontology Matching
  • 3. Before to start…  What is an ontology?  What is a machine-readable dictionary?  What about ambiguity?  Terms vs. concepts, is everything clear?
  • 4. What is an ontology?  “the branch of philosophy which deals with the nature and the organization of reality”  “an ontology is an explicit specification of a conceptualization” [Gruber1993]  conceptualization: abstract model of the world  explicit specification: model described by using unambiguous language  domain ontology  upper ontology  example: DOLCE [Guarino2002]
  • 5. Ontology Components  Classes: entities describing objects common characteristics (for example: “Agricultural Method”).  Individuals: entities that are instances of classes (for example “Multi Crops Farming” is an instance of “Agricultural Method”).  Properties: binary relations between entities (for example “IsAffectedBy”).  Attributes (or DataType Properties): characteristics that qualify individuals (for example “Has Name”).
  • 6. Hierarchies  Concepts can be organized in subsumptions hierarchies  Meaning: every sub-concepts is also a super-concept  Examples:  “Intensive Farming” is-a “Agricultural Method”  “Agricultural Method” is-a “Method”  Concept hierarchies are generally represented by using tree structures
  • 7. Attributes and Properties  Properties: binary relations between classes  Domain and co-domain: classes to which individuals need to belong to be in relation  Example: “Agriculture” <isAffectedBy> “Agriculture Pollution”  Attributes: binary relations between an individual and values (not other entities)  Domain: class to which the attribute is applied  Co-domain: the type of the value (for example “String”)  Properties and Attributes can be organized in hierarchies.
  • 8. Steps for building an ontology  To identify the classes of the domain.  To organize them in a hierarchy.  To define properties and attributes.  To define individuals, if there are.
  • 9. Why ontologies are useful?  Ontologies provide:  common dictionary of terms;  a shared and formal interpretation of the domain.  Ontologies permit to:  solve ambiguities;  share knowledge (not only between humans, but also between machines);  use automatic reasoning techniques.
  • 10. Use of ontologies in IR  Exploit metadata  Entity linking  “which president …”  “Barack Obama is-a President”  Extraction of triples from text  applying NLP parsers for extracting dependencies
  • 11. What is an thesaurus?  A “coarse” version of ontologies  Generally, 3 kinds of relations are represented:  hierarchical (generalization/specialization)  equivalence (synonymity)  associative (other kind of relationships)  Extensive tool used for query expansion approaches [Bhogal2007, Grootjen2006,Qiu1993,Mandala2000]
  • 12. Machine-readable dictionaries  A dictionary in an electronic form.  The power of MRD is characterized by word senses. [Kilgariff1997, Lakoff1987, Ruhl1989]  Identity of meaning: synonyms [Gove1973]  Inclusion of meaning: hyponymy or hyperonymy; troponymy [Cruse1986, Green2002, Fellbaum1998]  transitive relationship  Part-whole meaning: meronymy (has part), holonymy (part of) [Green2002, Cruse1986, Evens1986]  Opposite meaning: antonymy
  • 13. and now… … let’s see how we can exploit this within an information retrieval system…
  • 14. Motivations and Challenges  Considering how information is usually represented and classified.  Documents and Queries are represented using terms.  Indexing:  terms are extracted from each document;  terms frequency of each document is computed (TF);  terms frequency over the entire index is computed (IDF).  Searching:  the vector space model is used to computed the similarity between documents and queries;  queries are generally expanded to increase the recall of the system.
  • 15. Drawbacks of the Term-Based representation – 1/2  The “semantic connections” between terms in documents and queries are not considered.  Different vector positions may be allocated to the synonyms of the same term:  the importance of a determinate concept is distributed among different vector components;  information loss.
  • 16. Drawbacks of the Term-Based representation – 2/2  The query expansion has to be used carefully.  It is more easy to increase the recall of a system with respect to its precision. Which is better? [Abdelali2007]  In the worst case, the size of a document vector could be close to the number of terms used in the repository:  in general, the number of concepts is less than the number of words;  the time needed to compare documents is higher;
  • 17. Intuition Behind  Using concepts to represent the terms contained in documents and queries. [Dragoni2012b] 1. Documents and Queries may be represented in the same way. 2. The issue related to how many and which terms have to be used for query expansion is not considered. 3. The size of a concept vector is generally smaller than the size of a term vector.  IMPORTANT: This is not a query expansion technique !!!
  • 18. a first simple example …  a close vocabulary:
  • 19. a first simple example …  a close vocabulary:
  • 20.  how to compute concept weights? a first simple example …
  • 21.  how is weighted each concept of the vocabulary?  suppose to have the document “xxyyyz” a first simple example …
  • 22. … that we evaluated  Experiments on the MuchMore Collection (http://muchmore.dfki.de)  The collection contains numerous medical terms.  The term-based representations is advantaged over the semantic representation.  Experiments on the TREC Ad-Hoc Collection:  Results have been compared with the IRS presented at TREC-7 and TREC-8 conference  Only the systems that implements a semantic representation of queries have been considered.  Over dozens of runs, the three systems that performs better at recall 0.0 have been chosen. [Spink2006]
  • 23. MuchMore Collection System P@5 P@10 P@15 P@30 MAP Term-Based 0.544 0.480 0.405 0.273 0.449 Synset-Based 0.648 0.484 0.403 0.309 0.459 Conceptual Indexing 0.770 0.735 0.690 0.523 0.449 Ontology Indexing 0.784 0.765 0.728 0.594 0.477
  • 24. TREC-7 System P@5 P@10 P@15 P@30 MAP Term-Based 0.444 0.414 0.375 0.348 0.199 AT&T Labs 1 0.644 0.558 0.499 0.419 0.296 AT&T Labs 2 0.644 0.558 0.497 0.413 0.294 City University, Sheffield, Microsoft 0.572 0.542 0.507 0.412 0.288 Ontology Indexing 0.656 0.588 0.501 0.397 0.309
  • 25. TREC-7 System P@5 P@10 P@15 P@30 MAP Term-Based 0.444 0.414 0.375 0.348 0.199 AT&T Labs 1 0.644 0.558 0.499 0.419 0.296 AT&T Labs 2 0.644 0.558 0.497 0.413 0.294 City University, Sheffield, Microsoft 0.572 0.542 0.507 0.412 0.288 Ontology Indexing 0.656 0.588 0.501 0.397 0.309
  • 26. TREC-8 System P@5 P@10 P@15 P@30 MAP Term-Based 0.476 0.436 0.389 0.362 0.243 IBM Watson 0.588 0.504 0.472 0.410 0.301 Microsoft Research 0.580 0.550 0.499 0.425 0.317 TwentyOne 0.500 0.454 0.433 0.368 0.292 Ontology Indexing 0.616 0.572 0.485 0.415 0.315
  • 27. TREC-8 System P@5 P@10 P@15 P@30 MAP Term-Based 0.476 0.436 0.389 0.362 0.243 IBM Watson 0.588 0.504 0.472 0.410 0.301 Microsoft Research 0.580 0.550 0.499 0.425 0.317 TwentyOne 0.500 0.454 0.433 0.368 0.292 Ontology Indexing 0.616 0.572 0.485 0.415 0.315
  • 28. Some considerations  Two drawbacks have been identified:  The absence of some terms in the ontology, (in particular terms related to specific domains like biomedical, mechanical, business, etc.), may affects the final retrieval result.  a more complete knowledge base is needed.  Term ambiguity. By using a Word Sense Disambiguation approach, concepts associated with incorrect senses would be discarded or weighted less.  a Word Sense Disambiguation algorithm is required: but it has to be used carefully.
  • 29. Few words on disambiguation
  • 30. Few words on disambiguation
  • 31. Checkpoint 1  the use of machine-readable dictionaries is suitable for implementing a first semantic engine  but if we use ontologies we have more and more information  properties  attributes  the problem is: how can we exploit all these information?
  • 32. Ontology enhanced IR  Enrichment of documents (and queries) with information coming from semantic resources  information expansion: adding synonyms, antonyms, … not new but still helpful  annotations: relation or association between a semantic entity and a document  Most of the information expansion systems are based on WordNet and the Roget’s Thesaurus  Systems using annotations are interfaced with the Linked Open Data cloud, and mainly with Freebase and Wikipedia
  • 33. Classification of Semantic IR approaches Criterion Approaches Semantic knowledge representation • Statistical [Deerwester1990] • Linguistic conceptualization [Gonzalo1998, Mandala1998,Giunchiglia2009] • Ontology-based [Guha2003,Popov2004] Scope • Web search [Finin2005,Fernandez2008] • Limited domain repositories [Popov2004] • Desktop search [Chirita2005] Query • Keyword query [Guha2003] • Natural language query [Lopez2009] • Controlled natural language query [Bernstein2006, Cohen2003] • Structured query based on ontology query language [notes] Content retrieved • Data retrieval • Information retrieval Content ranking • No ranking • Keyword-based ranking [Guha2003] • Semantic-based ranking [Stojanovic2003]
  • 34. Limitation of Semantic IR approaches – 1/2 Criterion Limitation IR Semantic Semantic knowledge representation • No exploitation of the full potential of an ontological language, beyond those that could be reduced to conventional classification schemes. x (Partially) Scope • No scalability to large and heterogeneous repositories of documents. x Goal • Boolean retrieval models where the Information Retrieval problem is reduced to a data retrieval task. x Query • Limited usability x
  • 35. Limitation of Semantic IR approaches – 2/2 Criterion Limitation IR Semantic Content retrieved • Focus on textual content: no management of different formats (multimedia) (Partially) (Partially) Content ranking • Lack of semantic ranking criterion. The ranking (if provided relies on keyword-based approaches. x x Coverage • Knowledge incompleteness. [Croft1986] (Partially) x Evaluation • Lack of standard evaluation frameworks. [Giunchiglia2009] x
  • 36. A basic ontology-based IR model SPARQL Editor SPARQL Query Query Processing Searching Indexing Ranking Semantic Entities Semantic Knowledge (ontology + KB) Document Corpus Ranked Documents Semantic Index (weighted annotations) User Unsorted Documents
  • 37. Basic ontology-based IR model - Limits  Heterogeneity  a single ontology (or even a set of them) cannot cover all possible domains  Scalability  imagine annotating the whole Web by using all the knowledge bases currently available  a final solution does not exist… but nice and practical approaches can be used  Usability  think about it… are all the people you know able to write queries in SPARQL?
  • 38. Extended ontology-based IR model  [Architecture diagram: the user poses a natural language query through a natural language interface; query processing, searching, and ranking operate over a semantic index of weighted annotations, built by indexing unstructured Web contents against preprocessed semantic knowledge and semantic entities drawn from the Semantic Web; unsorted documents are ranked and returned to the user as ranked documents.]
  • 39. Evaluation Results
  Metric | Semantic System | Lucene | TREC Automatic
  Mean Average Precision | 0.16 | 0.1 | 0.2
  Prec@10 | 0.37 | 0.25 | 0.30
  • 40. A focus on the indexing procedure  Challenge: to link semantic knowledge with documents and queries in an efficient and effective way:  the document corpus and the semantic knowledge should remain decoupled;  annotations have to be provided in a flexible and scalable way.  Annotations can be provided in two ways:  by applying an information extraction technique based on pure NLP approaches;  by applying a contextual semantic information approach.
  • 41. Annotator Requirements  Identification of the entities within the documents  conceptually, it is not so different from a traditional IR indexing process  Ontologies must not be touched (decoupling)  Should be open-domain  Scalability-friendly:  indexing of ontologies;  indexing of documents;  an interesting alternative: usage of non-embedded annotations
  • 42. Natural Language Processing Annotation (see the sketch below)
  Input (HTML): <html> <body> <p>Schizophrenia patients whose medication couldn’t stop the imaginary voices in their heads</p> </body> </html>
  HTML Parser → plain text: Schizophrenia patients whose medication couldn’t stop the imaginary voices in their heads
  NLP Tools → tagged output: <document><p><s> <w c=“w” pos=“NNP” stem=“Schizophrenia”>Schizophrenia</w> <w c=“w” pos=“NN$” stem=“patient”>patients</w> <w c=“w” pos=“WP$”>whose</w> <w c=“w” pos=“NN” stem=“medication”>medication</w> <w c=“w” pos=“MD”>could</w> <w c=“w” pos=“RB”>not</w> <w c=“w” pos=“VB” stem=“stop”>stop</w> <w c=“w” pos=“DT”>the</w> <w c=“w” pos=“JJ”>imaginary</w> <w c=“w” pos=“NN$” stem=“voice”>voices</w> <w c=“w” pos=“IN”>in</w> <w c=“w” pos=“PRP$”>their</w> <w c=“w” pos=“NN$” stem=“head”>heads</w> </s></p></document>
  Token Filter → keywords: schizophrenia patient medication stop voice head
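A minimal sketch of the parse/tag/stem/filter chain shown above, assuming NLTK as the NLP toolkit (the slides do not name the tools actually used); it approximates, but does not exactly reproduce, the Token Filter output.

```python
# Hedged sketch of the annotation pre-processing chain: tokenize, POS-tag, stem,
# and keep only content words. Requires the NLTK data packages
# 'punkt', 'averaged_perceptron_tagger', and 'stopwords'.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

def content_terms(text):
    stemmer = PorterStemmer()
    stops = set(stopwords.words("english"))
    tokens = nltk.word_tokenize(text.lower())
    tagged = nltk.pos_tag(tokens)
    keep = ("NN", "VB")  # keep nouns and verbs, as in the example output above
    return [stemmer.stem(word) for word, pos in tagged
            if pos.startswith(keep) and word.isalpha() and word not in stops]

print(content_terms("Schizophrenia patients whose medication couldn't "
                    "stop the imaginary voices in their heads"))
```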
  • 43. Natural Language Processing Annotation (continued; see the sketch below)
  Keywords: schizophrenia patient medication stop voice head → Index Searcher → Frequency Counter → Annotation Creator
  Keyword → Ontology Entities: schizophrenia → E1; patient → E4, E5; …; head → E2, E8
  Ontology Entity → Document Frequencies: E1 → D1(1), D4(2); …; E8 → D1(1), D5(4), D6(3)
  Ontology Entity → Document → Weight: E1, D1, 0.9; …; E8, D1, 0.3
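A minimal sketch of the frequency counting and annotation creation step, assuming a pre-built keyword-to-entity lookup (taken from the slide's example) and a simple relative-frequency weight; the actual weighting formula is not specified in the slides.

```python
from collections import defaultdict

# Hypothetical keyword -> ontology-entity lookup (values follow the slide's example).
KEYWORD_TO_ENTITIES = {"schizophrenia": ["E1"], "patient": ["E4", "E5"], "head": ["E2", "E8"]}

def annotate(doc_id, keywords, annotations):
    """Count entity occurrences in one document and turn the counts into weights."""
    counts = defaultdict(int)
    for keyword in keywords:
        for entity in KEYWORD_TO_ENTITIES.get(keyword, []):
            counts[entity] += 1
    total = sum(counts.values()) or 1
    for entity, count in counts.items():
        # Assumption: the weight is the entity's relative frequency in the document.
        annotations[entity][doc_id] = round(count / total, 2)

annotations = defaultdict(dict)   # entity -> {document -> weight}
annotate("D1", ["schizophrenia", "patient", "medication", "stop", "voice", "head"], annotations)
print(dict(annotations))
```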
  • 44. Contextual Semantic Annotation
  Steps: selection of a semantic entity from the ontology → selection of its semantic context → search of the entity’s terms in the document index → selection of contextualized terms in the document index
  Selected entity: E1 = Individual: Maradona; Labels: {“Maradona”, “Diego Maradona”, “Pelusa”}
  Semantic context: E34 = Class: football_player, Labels: {“football player”}; E22 = Individual: Argentina, Labels: {“Argentina”}
  Document index (keyword → documents): Maradona → D1, D2, D87; Pelusa → D95, D140; football_player → D87, D61, D44, D1; Argentina → D43, D32, D2
  Potential documents to annotate: {D1, D2, D87, D95, D140}
  Contextualized documents: {D1, D2, D32, D43, D44, D61, D87}
  • 45. Contextual Semantic Annotation (continued; see the sketch below)
  Potential documents to annotate: {D1, D2, D87, D95, D140}
  Contextualized documents: {D1, D2, D32, D43, D44, D61, D87}
  Selection of semantic contextualized documents → documents to annotate: {D1, D2, D87, D95, D140}
  Creation of annotations (Ontology Entity, Document, Weight): E1, D1, 0.5; E1, D2, 0.2; E1, D87, 0.67
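A minimal sketch of the selection step, using the slide's example data; reading the selection as an intersection between candidate and contextualized documents is an assumption, though it is consistent with the weight table above.

```python
# Hedged sketch of contextual semantic annotation (data taken from the slides).
index = {
    "Maradona": {"D1", "D2", "D87"},
    "Pelusa": {"D95", "D140"},
    "football_player": {"D87", "D61", "D44", "D1"},
    "Argentina": {"D43", "D32", "D2"},
}
entity_labels = ["Maradona", "Pelusa"]            # labels of individual E1
context_terms = ["football_player", "Argentina"]  # labels of E34 and E22

candidates = set().union(*(index[label] for label in entity_labels))
contextualised = set().union(*(index[term] for term in context_terms))

# Assumption: annotate only the candidates that are supported by the semantic context.
to_annotate = candidates & contextualised
print(sorted(to_annotate))  # consistent with the weight table: D1, D2, D87
```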
  • 46. An idea for aggregating rankings  Multi-dimensional aggregation criteria  The document score is computed from different perspectives (criteria)  Assignment of priorities to criteria  Computation of criteria weights: the weight of a low-priority criterion depends on the scores of the higher-priority criteria  Aggregation of criteria scores [Dragoni2012] (a sketch follows below)
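One possible reading of the priority-driven aggregation, sketched below: the contribution of a lower-priority criterion is scaled by how well the higher-priority criteria already scored the document. This is an illustrative interpretation, not the exact formulation of [Dragoni2012].

```python
# Hedged sketch of priority-driven score aggregation.
def aggregate(scores, base_weights):
    """scores/base_weights are ordered from the highest- to the lowest-priority criterion."""
    total, carry = 0.0, 1.0
    for score, weight in zip(scores, base_weights):
        total += carry * weight * score
        carry *= score   # assumption: a low high-priority score dampens later criteria
    return total

# Example: entity-match criterion (priority 1), keyword criterion (priority 2).
print(aggregate([0.9, 0.6], [0.7, 0.3]))   # high-priority criterion well satisfied
print(aggregate([0.2, 0.9], [0.7, 0.3]))   # high-priority criterion poorly satisfied
```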
  • 47. Querying and Ranking  Queries are transformed by mapping terms to ontology entities  Contextual disambiguation is very important  simple example: “Rock musicians Britain”  Ranking: two options  evaluate only the “matches” between the detected entities  aggregate (as you prefer) the rankings produced by using only the entities, only the query terms, and/or both of them
  • 48. Use of multiple ontologies  What we need: an Ontology Gateway  Tasks of an ontology gateway:  collect the available semantic content;  store the semantic content efficiently in order to ease its access;  implement an approach for the “selection” of the content  The most important ontology gateways online:  Swoogle [Ding2004,Brin1998]  Watson [Aquin2007,Aquin2007b]  WebCORE [Fernandez2006,Fernandez2007]
  • 49. Use of multiple ontologies - opportunities  Recall improvement:  Ontology 1 focused on entities  stress on the identification of semantic entities within the document  Ontology 2 focused on properties  stress on the identification of relationships between entities in the document  precision should also increase, but some drops are possible.  Supporting multiple perspectives:  analysis of each entity from different points of view
  • 50. Use of multiple ontologies - challenges  To figure out how to use them:  it is necessary to formally represent the relationships between the ontologies and the techniques used for extracting information from them;  example: you may have ontologies describing the same domain with different structures!!!  To find suitable ontologies and mappings:  again: more than one ontology may describe the same domain;  it is not good practice to select only one: build mappings!!!
  • 51. A use case  Information system containing product technical data  users look for something that satisfies their needs  engineers want to exploit the information for creating new product variants  Ontologies focused on particular aspects of products  product conceptualizations are kept separated
  • 57. Checkpoint 2  Annotating documents is more important than querying the repositories… why?  differences in the amount of content  once we have decided how to annotate documents, queries should be annotated by using the same procedure in order to homogenize the process  Challenges in building knowledge bases  Ranking… play with different strategies and “stress your creativity”
  • 58. Ontologies and IR – 2 use cases  Demonstrate the usefulness of semantic approaches used in combination with traditional IR techniques.  Show how IR and Semantics may help each other  Two scenarios:  Cross-language information retrieval [Dragoni2014]  Ontology matching [Dragoni2015]  Sentiment analysis
  • 59. Cross-Language Information Retrieval Background - Challenges  Out-of-Vocabulary issue  improve the corpora used for training the machine translation model.  use domain information to increase the coverage of the dictionaries.  Usage of semantic artifacts for structuring the representation of (multilingual) documents.  GOAL: to integrate domain-specific semantic knowledge within a CLIR system and evaluate its effectiveness
  • 60. Our Scenario  Use case: the agricultural domain  Knowledge resources: Agrovoc and Organic.Lingua ontologies  3 components used in the proposed approach:  Annotator  Indexer  Retriever
  • 61. Annotation Process – Step 1  [Diagram: multilingual resources in en, es, it, de, fr, … feeding the annotation process]
  • 62. Annotation Process – Step 2  [Diagram: multilingual resources in en, es, it, de, fr, …]  The document content is used as a query.  Among the candidate results, only “exact matches” are considered.
  • 63. Approach – Annotation Stats
  Domain Ontology | Number of Concepts | Manual Annotations | Automatic Annotations
  Agrovoc (AV) | 32061 | 0 | 133596 (5834 distinct concepts used)
  Organic.Lingua (OL) | 291 | 27871 (264 distinct concepts used) | 16434 (208 distinct concepts used)
  • 64. Approach - Index  Given a document:  text and annotations are extracted.  The context of each concept is retrieved from the ontologies.  Each contextual concept is indexed with a weight proportional to its semantic distance from the semantic annotation.  Structure of each index record: (a sketch follows below)
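Since the picture of the index record did not survive extraction, here is a hedged sketch of what such a record might contain; the field names, the hypothetical concept URIs, and the distance-based decay of contextual weights are assumptions, not the original design.

```python
from dataclasses import dataclass, field

@dataclass
class IndexRecord:
    doc_id: str
    text: str                                        # extracted plain text
    annotations: dict = field(default_factory=dict)  # concept URI -> weight of direct annotation
    context: dict = field(default_factory=dict)      # contextual concept URI -> weight

def add_context(record, concept_uri, neighbours, decay=0.5):
    """Add contextual concepts, weighting them by an assumed decay with semantic distance."""
    base = record.annotations.get(concept_uri, 1.0)
    for uri, distance in neighbours:
        record.context[uri] = base * (decay ** distance)

# Hypothetical concept URIs, used only for illustration.
rec = IndexRecord("D1", "…", {"onto:OrganicFarming": 1.0})
add_context(rec, "onto:OrganicFarming", [("onto:AgriculturalMethod", 1), ("onto:Method", 2)])
print(rec.context)
```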
  • 65. Approach - Retriever  Three retrieval configurations are available:  Only translations: query terms are translated by using machine translation services.  Semantic expansion by exploiting the domain ontology: query terms are matched with ontology concepts; if an exact match exists, the query is expanded with the URI of the concept and the URIs of the contextual ones (see the sketch below).  Ontology matching only: terms without an exact match with ontology concepts are discarded.
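A minimal sketch of the second configuration (semantic expansion); the label-to-URI lookup, the context table, and the URIs themselves are illustrative assumptions.

```python
# Hedged sketch of query expansion with ontology URIs.
LABEL_TO_URI = {"organic farming": "onto:OrganicFarming"}
CONTEXT = {"onto:OrganicFarming": ["onto:AgriculturalMethod", "onto:Fertiliser"]}

def expand_query(terms):
    expanded = list(terms)
    for term in terms:
        uri = LABEL_TO_URI.get(term)
        if uri:                                    # exact match with an ontology concept
            expanded.append(uri)                   # URI of the matched concept
            expanded.extend(CONTEXT.get(uri, []))  # URIs of the contextual concepts
    return expanded

print(expand_query(["organic farming", "pesticides"]))
```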
  • 66. Evaluation - Setup  Collection of 13,000 multilingual documents.  48 queries originally provided in English and manually translated in 12 languages under the supervision of both domain and language experts.  Gold standard manually built by the domain experts.  MAP, Prec@5, Prec@10, Prec@20, Recall have been used.
  • 67. Results - 1
  Configuration | Avg. MAP | Prec@5 | Prec@10 | Prec@20 | Avg. Rec.
  BASELINE | 0.554 | 0.617 | 0.545 | 0.465 | 0.920
  Auto: AV | 3.24% | 3.11% | 5.04% | 3.81% | 2.52%
  Auto: OL | 2.31% | 1.91% | 2.88% | 2.98% | 0.77%
  Auto: AV+OL | 3.13% | 2.95% | 4.63% | 3.86% | 2.53%
  Auto+Man: OL | 1.65% | 3.40% | 3.95% | 4.48% | 1.37%
  Auto+Man: AV+OL | 4.38% | 5.96% | 7.18% | 6.07% | 2.97%
  Auto+Man*2: OL | 1.00% | 3.30% | 4.02% | 3.27% | 1.36%
  Auto+Man*2: AV+OL | 3.29% | 4.86% | 6.73% | 6.03% | 2.97%
  • 68. Results - 2
  Ontology | Query Cov. | Avg. MAP | Prec@5 | Prec@10 | Prec@20 | Avg. Rec.
  AV | 39.3 (9 langs) | 0.137 | 0.189 | 0.191 | 0.179 | 0.552
  OL | 15.7 (10 langs) | 0.260 | 0.359 | 0.319 | 0.322 | 0.635
  AV + OL | 33.3 (12 langs) | 0.173 | 0.247 | 0.226 | 0.221 | 0.586
  • 69. Ontology Matching  Given two thesauri/ontologies/vocabularies, find alignments between their entities  Formally, a “match” may be represented with the following 5-tuple: ‹ id, e1, e2, R, c ›, where id is a unique identifier, e1 and e2 are the matched entities, R is the relation holding between them (e.g. equivalence or subsumption), and c is a confidence value (see the sketch below)  Extensive literature about matching approaches (since the early ‘80s)
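For concreteness, the 5-tuple can be represented as a small record type; the field names beyond the tuple itself are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Match:
    id: str            # unique identifier of the correspondence
    e1: str            # entity from the source ontology/thesaurus
    e2: str            # entity from the target ontology/thesaurus
    relation: str      # R, e.g. equivalence or subsumption
    confidence: float  # c, typically in [0, 1]

# Hypothetical entities, used only for illustration.
m = Match("m1", "sourceVoc:FoodChain", "targetVoc:FoodChain", "equivalence", 0.93)
print(m)
```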
  • 70. Motivations  Need: a system, for experts, able to suggest possible matches between concepts  Why exploit multilinguality?  it helps reduce ambiguity: the probability, for two different concepts, of having the same label across several languages is very low.  term translations have been adapted to the domain: the experts in charge of the translations put a lot of their cultural heritage into choosing the right terms for each concept.
  • 71. The Proposed Approach - 1 (see the sketch below)
  Inspired by information retrieval techniques
  Built on top of the Lucene search engine
  For each element of the thesaurus a structured multilingual representation is built, e.g.:
  [prefLabel] "Food chains"@en, [prefLabel] "Catene alimentari"@it, [altLabel] "Food distributions"@en, [altLabel] "Reti alimentari"@it
  → label-en: “food chain”, “food distribution”; label-it: “catena alimentare”, “rete alimentare”
  An index for each thesaurus is built
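A minimal sketch of how the field-per-language representation could be built from SKOS-style labels; the data structures are assumptions, and the real system additionally normalises the labels ("food chains" → "food chain") and indexes them with Lucene.

```python
# Hedged sketch: build a field-per-language document from SKOS-like labels.
labels = [
    ("prefLabel", "Food chains", "en"),
    ("prefLabel", "Catene alimentari", "it"),
    ("altLabel", "Food distributions", "en"),
    ("altLabel", "Reti alimentari", "it"),
]

concept_doc = {}
for _, text, lang in labels:
    # Assumption: real labels would also be lemmatised before indexing.
    concept_doc.setdefault(f"label-{lang}", []).append(text.lower())

print(concept_doc)
```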
  • 72. The Proposed Approach - 2  How are matches suggested?  the source and target thesauri are chosen  for each concept, a query is performed from the source to the target thesaurus  the standard Lucene scoring formula is used for computing the ranking  for each query, a ranking of 5 suggestions is provided to the user (see the sketch below)
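A minimal sketch of the suggestion step; token overlap is used here only as a stand-in for Lucene's scoring formula, and the index data structures are assumptions.

```python
# Hedged sketch: suggest the top-5 target concepts for one source concept.
def tokens(concept_doc):
    """Collect the tokens of all label fields of a field-per-language document."""
    return {tok for values in concept_doc.values() for label in values for tok in label.split()}

def suggest(source_doc, target_index, k=5):
    source_tokens = tokens(source_doc)
    scored = [(len(source_tokens & tokens(doc)), concept_id)
              for concept_id, doc in target_index.items()]
    return [concept_id for score, concept_id in sorted(scored, reverse=True)[:k] if score > 0]

# target_index is assumed to map concept ids to documents built as on slide 71.
```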
  • 73. Evaluation Set-Up  2 contexts:  six multilingual thesauri (3 from the medical domain, 3 from the agricultural domain)  the adapted MultiFarm benchmark  2 tasks:  matching system (only the first suggestion is considered)  suggestion system
  • 74. Results - 1
  Mapping Set | # of Mappings | Prec@1 | Prec@3 | Prec@5 | Recall
  Eurovoc → Agrovoc | 1297 | 0.816 | 0.931 | 0.967 | 0.874
  Agrovoc → Eurovoc | 1297 | 0.906 | 0.969 | 0.988 | 0.695
  Avg. | | 0.861 | 0.950 | 0.978 | 0.785
  Gemet → Agrovoc | 1181 | 0.909 | 0.964 | 0.983 | 0.546
  Agrovoc → Gemet | 1181 | 0.943 | 0.981 | 0.994 | 0.740
  Avg. | | 0.926 | 0.973 | 0.989 | 0.643
  MDR → MeSH | 6061 | 0.776 | 0.914 | 0.956 | 0.807
  MeSH → MDR | 6061 | 0.716 | 0.888 | 0.939 | 0.789
  Avg. | | 0.746 | 0.901 | 0.948 | 0.798
  MDR → SNOMED | 19971 | 0.621 | 0.826 | 0.908 | 0.559
  SNOMED → MDR | 19971 | 0.556 | 0.760 | 0.855 | 0.519
  Avg. | | 0.589 | 0.793 | 0.882 | 0.539
  MeSH → SNOMED | 26634 | 0.690 | 0.871 | 0.931 | 0.660
  SNOMED → MeSH | 26634 | 0.657 | 0.835 | 0.908 | 0.564
  Avg. | | 0.674 | 0.853 | 0.920 | 0.612
  Results obtained by the proposed system on the domain-specific thesauri
  • 75. Results - 2
  Mapping Set | IRBOM | WeSeE (2012) | RiMOM (2013) | YAM++ (2013) | YAM++ (2012) | AUTOMSv2 (2012)
  Agrovoc → Eurovoc | 0.821 | 0.785 | 0.628 | 0.615 | 0.615 | 0.599
  Gemet → Agrovoc | 0.759 | 0.726 | 0.548 | 0.579 | 0.579 | 0.485
  MDR → MeSH | 0.771 | 0.749 | 0.611 | 0.613 | 0.613 | 0.536
  MDR → SNOMED | 0.563 | 0.624 | 0.495 | 0.473 | 0.473 | 0.405
  MeSH → SNOMED | 0.642 | 0.631 | 0.457 | 0.458 | 0.458 | 0.497
  Results obtained by all systems on the domain-specific thesauri
  • 76. Results - 3
  System Name | Precision | Recall | F-Measure
  IRBOM | 0.68 | 0.43 | 0.53
  WeSeE (2012) | 0.61 | 0.32 | 0.41
  RiMOM (2013) | 0.52 | 0.13 | 0.21
  YAM++ (2013) | 0.51 | 0.36 | 0.40
  YAM++ (2012) | 0.50 | 0.36 | 0.40
  AUTOMSv2 (2012) | 0.49 | 0.10 | 0.36
  Results obtained by all systems on the adapted MultiFarm benchmark
  • 77. So… at the end…  The use of ontologies in IR is still a controversial topic  Personal opinion: combining structured and unstructured representations seems to be the most suitable solution  Pay attention to the kinds of queries performed by users  Aggregation of results  Be brave… try to work with triples!!!!

Editor's Notes

  1. [Mandala2000] Rila Mandala, Takenobu Tokunaga, Hozumi Tanaka: Query expansion using heterogeneous thesauri. Inf. Process. Manage. (IPM) 36(3):361-378 (2000)
  2. [Ruhl1989] C. Ruhl. On Monosemy: A Study in Linguistic Semantics. State University of New York Press, Albany, NY, 1989.
  [Gove1973] P.B. Gove. Webster’s New Dictionary of Synonyms. G. & C. Merriam Company, Springfield, MA, 1973.
  [Cruse1986] A.D. Cruse. Lexical Semantics. Cambridge University Press, 1986.
  [Green2002] R. Green, C.A. Bean, and S.H. Myaeng. The Semantics of Relationships: An Interdisciplinary Perspective. Cambridge University Press, 2002.
  [Fellbaum1998] C. Fellbaum, editor. WordNet: An Electronic Lexical Database. MIT Press, 1998.
  [Evens1986] M.W. Evens. Relational Models of the Lexicon. Cambridge University Press, 1986.
  3. http://www.w3.org/Submission/RDQL/ http://www.w3.org/TR/rdf-sparql-query/
  4. [Yates1999] R. Baeza-Yates, B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, Harlow, UK, 1999. [Fernandez2011]
  5. [Aquin2007] [Aquin2007b] [Castells2007]