© Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Question Answering over
Linked Data & Big Data
André Freitas
LdB – Semantics: Theory, Technologies and Applications
Bari, September 2013
Goal of this Talk
 Understand the changes in the database landscape
toward more heterogeneous data scenarios.
 Understand how Question Answering (QA) fits into
this new scenario.
 Give the fundamental pointers to develop your own
QA system from the state-of-the-art.
 Coverage over depth.
Outline
 Motivation & Context
 Big Data & QA
 Are NLIs useful for Databases?
 The Anatomy of a QA System
 Evaluation of QA Systems
 QA over Linked Data
 Treo: Detailed Case Study
 Do-it-yourself (DIY): Core Resources
 QA over Linked Data: Roadmaps
 From QA to Semantic Applications
(Deriving Patterns)
 Take-away Message
Motivation &
Context
Why Question Answering?
 Humans have built-in
natural language
communication
capabilities.
 Very natural way for
humans to communicate
information needs.
What is Question Answering?
 A research field on its own.
 Empirical bias: Focus on the development of
automatic systems to answer questions.
 Multidisciplinary:
 Natural Language Processing
 Information Retrieval
 Knowledge Representation
 Databases
 Linguistics
 Artificial Intelligence
 Software Engineering
 ...
What is Question Answering?
Question: Who is the daughter of Bill Clinton married to?
QA System (over Knowledge Bases)
Answer: Marc Mezvinsky
QA vs IR
 Keyword Search:
 The user still carries most of the effort of interpreting the data.
 Satisfying information needs may depend on multiple search
operations.
 Answer-driven information access.
 Input: Keyword search
– Typically specification of simpler information needs.
 Output: documents, data.
 QA:
 Delegates more ‘interpretation effort’ to the machines.
 Query-driven information access.
 Input: natural language query
– Specification of complex information needs.
 Output: direct answer.
QA vs Databases
 Structured Queries:
 A priori user effort in understanding the schemas behind
databases.
 Effort in mastering the syntax of a query language.
 Satisfying information needs may depend on multiple
querying operations.
 Input: Structured query
 Output: data records, aggregations, etc
 QA:
 Delegates more ‘interpretation effort’ to the machines.
 Input: natural language query
 Output: direct answer
When to use?
 Keyword search:
 Simple information needs.
 Predictable search behavior.
 Vocabulary redundancy (large document collections, Web)
 Structured queries:
 Precision/recall guarantees.
 Small & centralized schemas.
 More data volume/less semantic heterogeneity.
 QA:
 Specification of complex information needs.
 More automated semantic interpretation.
QA: Vision
QA: Reality (FB Graph Search)
QA: Reality (Watson)
QA: Reality (Siri)
To Summarize
 QA is usually associated with the delegation of
more of the ‘interpretation effort’ to the machines.
 QA supports the specification of more complex
information needs.
 QA, Information Retrieval and Databases are
complementary.
Big Data & QA
Big Data
 Big Data: More complete data-based picture of the
world.
From Rigid Schema to Schemaless
circa 2000: 10s-100s attributes
circa 2013: 1,000s-1,000,000s attributes
 Heterogeneous, complex and large-scale
databases.
 Very large and dynamic “schemas”.
Multiple Interpretations
 Multiple perspectives (conceptualizations) of the reality.
 Ambiguity, vagueness, inconsistency.
Big Data & Dataspaces
 Franklin et al. (2005): From Databases to Dataspaces.
 Helland (2011): If You Have Too Much Data, then
“Good Enough” Is Good Enough.
 Fundamental trends:
 Co-existence of heterogeneous data.
 Semantic best-effort queries.
 Pay-as-you-go data integration.
 Co-existent query/search services.
Big Data & NoSQL
 From Relational to NoSQL Databases.
 NoSQL (Not only SQL).
 Four trends (Emil Eifrem)
 Trend 1: Data set size.
 Trend 2: Connectedness.
 Trends 3 & 4: Semi-structured data & decentralized
architecture.
Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009)
Trend 1: Data size
Gantz & Reinsel, The Digital Universe in 2020 (2012).
Trend 2: Connectedness
Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009).
Trends 3 & 4: Semi-structured data
 Individualization of content.
 Decentralization of content generation.
Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009)
Emerging NoSQL Platforms
 Key-value stores
 Data model: collection of key-value (K/V) pairs.
 Example: Voldemort.
 Bigtable clones
 Data model: Bigtable.
 Example: HBase, Hypertable.
 Document databases
 Data model: collections of K/V collections.
 Example: MongoDB, CouchDB.
 Graph databases
 Data model: nodes, edges.
 Example: AllegroGraph, VertexDB, Neo4j, Semantic Web DBs.
Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009)
NoSQL Databases
[Figure: NoSQL database families plotted by data size against data complexity: key-value stores, Bigtable clones, document databases, graph databases.]
Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009).
Big Data
 Volume
 Velocity
 Variety: the most interesting but usually neglected
dimension.
Big Data & Linked Data
 Linked Data is Big Data (Variety)
 Entity-Attribute-Value (EAV) Data Model.
 Entities:
 Identifiers
 Attributes
 Attribute Values
 Linked Data as a special type of the EAV Data Model:
 EAV/CR: Classes and Relations.
 URIs as a superkey.
 Dereferenceable URIs.
 Standards-based (RDF/HTTP).
 EAV as an anti-pattern.
Idehen, Understanding Linked Data via EAV Model (2010).
Big Data & Linked Data
Big Data: Structured queries
 Big Problem: Structured queries are still the primary way to query
databases.
Big Data: Structured queries
10s-100s attributes
10^3-10^6 attributes
Query construction size x schema size
Query/Search Spectrum
Adapted from Kauffman et al (2009)
Structured
queries
Vocabulary Problem for Databases
Who is the daughter of Bill Clinton married to?
Schema-agnostic queries
Semantic Gap
Possible representations
Vocabulary Problem for Databases
Who is the daughter of Bill Clinton married to ?
Semantic Gap
Lexical-level
Abstraction-level
Structural-level
Query:
Data
Popescu et al. (2003): Semantic tractability
Big Data & QAs
QA Big Data
semantic gap,
vocabulary gap
QA Big Data
distributional
semantics
To Summarize
 Schema size and heterogeneity represent a
fundamental shift for databases.
 Addressing the associated data management
challenges (especially querying) depends on the
development of principled semantic models for
databases.
 QA/Natural Language Interfaces (NLIs) as schema-
agnostic query mechanisms.
Are NLIs useful for
Databases?
NLI & QAs
 Natural Language Interfaces (NLI)
 Input: Natural language queries
 Output: Either processed or unprocessed results
– Processed: Direct answers.
– Unprocessed: Database records, text snippets, documents.
NLI
QA
 Kaufmann & Bernstein (2007).
 User study based on 4 different types of
interfaces:
1. NLP-Reduce: Simple keyword-based NLI
– Domain-independent
– Bag of Words
– WordNet-based query expansion
2. Querix: More complex NLI
– Domain-independent
– Query parsing
– WordNet-based query expansion
– Clarification dialogs
3. Ginseng: Guided input NLI
– Grammar-based suggestions
4. Semantic Crystal: Visual query interface
– Graphically displayable and clickable
Comparative study
NLP-Reduce
Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual
End-users? (2007).
Querix
Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual
End-users? (2007).
Ginseng
Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual
End-users? (2007).
Semantic Crystal
Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual
End-users? (2007).
 48 subjects (different areas and ages).
 4 questions.
 Query metrics + System Usability Scale (SUS).
 Tang & Mooney Dataset.
User study
Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual
End-users? (2007).
Brooke, SUS - A quick and dirty usability scale.
 Querix: Full English question input was judged to
be the most useful and best-liked query interface.
 Users appreciated the “freedom” given by full
natural language queries.
 Possible interpretations:
 No need to convert natural language to keyword search.
 More complete expression of information needs.
 Limitations:
 Number of queries
 Database size
Results
The Anatomy of a QA
System
Basic Concepts & Taxonomy
 Categorization of questions and answers.
 Important for:
 Understanding the challenges before attacking
the problem.
 Scoping the system.
 Based on:
 Chin-Yew Lin: Question Answering.
 Farah Benamara: Question Answering Systems: State of
the Art and Future Directions.
Terminology: Question Phrase
 The part of the question that says what is
being asked:
 Wh-words:
– who, what, which, when, where, why, and how
 Wh-words + nouns, adjectives or adverbs:
– “which party …”, “which actress …”, “how long …”,
“how tall …”.
Terminology: Question Type
 Useful for distinguishing different processing
strategies
 FACTOID: “Who is the wife of Barack Obama?”
 LIST: “Give me all cities in the US with less than 10000
inhabitants.”
 DEFINITION: “Who was Tom Jobim?”
 RELATIONSHIP: “What is the connection between Barack
Obama and Indonesia?”
 SUPERLATIVE: “What is the highest mountain?”
 YES-NO: “Was Margaret Thatcher a chemist?”
 OPINION: “What do most Americans think of gun control?”
 CAUSE & EFFECT: “Why did the revenue of IBM drop?”
Terminology: Answer Type
 The class of object sought by the question
 Person (from “Who …”)
 Place (from “Where …”)
 Process & Method (from “How …”)
 Date (from “When …”)
 Number (from “How many …”)
 Explanation & Justification (from “Why …”)
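The mapping from question phrase to answer type listed above can be sketched as a small rule table. This is only an illustration: the rule list, its ordering and the function name `answer_type` are assumptions, and a real system would need many more rules (for example for "which" + noun phrases).

```python
# Hypothetical rule table: question-phrase prefix -> answer type.
# "how many" must be tested before the more general "how".
ANSWER_TYPE_RULES = [
    ("how many", "Number"),
    ("who", "Person"),
    ("where", "Place"),
    ("when", "Date"),
    ("why", "Explanation & Justification"),
    ("how", "Process & Method"),
]


def answer_type(question: str) -> str:
    """Return the expected answer type, or 'Unknown' if no rule fires."""
    q = question.strip().lower()
    for prefix, a_type in ANSWER_TYPE_RULES:
        if q.startswith(prefix):
            return a_type
    return "Unknown"


print(answer_type("Who is the wife of Barack Obama?"))  # Person
```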
Terminology: Question Focus & Topic
 Question focus is the property or entity that is
being sought by the question
 “In which city was Barack Obama born?”
 “What is the population of Galway?”
 Question topic: What the question is generally about:
 “What is the height of Mount Everest?” (geography, mountains)
 “Which organ is affected by Ménière’s disease?” (medicine)
Terminology: Data Source Type
 Structure level:
 Structured data (databases)
 Semi-structured data (e.g. comment field in
databases, XML)
 Free text
 Data source:
 Single (Centralized)
 Multiple
 Web-scale
Terminology: Domain Type
 Domain Scope:
 Open Domain
 Domain specific
 Data Type:
 Text
 Image
 Sound
 Video
 Multi-modal QA
Terminology: Answer Format
 Long answers
 Definition/justification based.
 Short answers
 Phrases.
 Exact answers
 Named entities, numbers, aggregate, yes/no
Answer Quality Criteria
 Relevance: The degree to which the answer addresses
the user's information needs.
 Correctness: The degree to which the answer is
factually correct.
 Conciseness: The answer should not contain
irrelevant information.
 Completeness: The answer should be complete.
 Simplicity: The answer should be easy for the
data consumer to interpret.
 Justification: Sufficient context should be provided
to help the data consumer judge the correctness of
the answer.
Answer Assessment
 Right: The answer is correct and complete.
 Inexact: The answer is incomplete or incorrect.
 Unsupported: The answer does not have an
appropriate justification.
 Wrong: The answer is not appropriate for the
question.
Answer Processing
 Simple Extraction: cut and paste of snippets from
the original document(s) / records from the data.
 Combination: Combines excerpts from multiple
sentences, documents / multiple data records,
databases.
 Summarization: Synthesis from large texts / data
collections.
 Operational/functional: Depends on the application
of functional operators.
 Reasoning: Depends on an inference process.
Complexity of the QA Task
 Semantic Tractability (Popescu et al., 2003):
Vocabulary distance between the query and the
answer.
 Answer Locality (Webber et al., 2003): Whether
answer fragments are distributed across different
document fragments/documents or datasets/dataset
records.
 Derivability (Webber et al., 2003): Whether the
answer is explicit or implicit, i.e. the level of
reasoning needed to derive it.
 Semantic Complexity: Level of ambiguity and
discourse/data heterogeneity.
Main Components
Question → Question Analysis → keyword query + query features → Search (over Documents/Datasets) → data records/passages → Answer Extraction → Answers
Main Components
 Question Analysis: Includes question parsing,
extraction of core features (NER, answer type, etc).
 Search (i.e. passage retrieval): Pre-selection of
fragments of text (sentences, paragraphs, documents)
and data (records, datasets) which may contain the
answer.
 Answer Extraction: Processing of the answer based
on the passages.
Evaluation of
QA Systems
Evaluation Campaigns
 Test Collection
 Questions
 Datasets
 Answers (Gold-standard)
 Evaluation Measures
Recall
 Measures how complete the answer set is.
 The fraction of relevant instances that are retrieved.
 Which are the Jovian planets in the Solar System?
 Returned Answers:
– Mercury
– Jupiter
– Saturn
 Gold-standard:
– Jupiter
– Saturn
– Neptune
– Uranus
Recall = 2/4 = 0.5
Precision
 Measures how accurate the answer set is.
 The fraction of retrieved instances that are relevant.
 Which are the Jovian planets in the Solar System?
 Returned Answers:
– Mercury
– Jupiter
– Saturn
 Gold-standard:
– Jupiter
– Saturn
– Neptune
– Uranus
Precision = 2/3 = 0.67
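Both measures can be checked on the Jovian planets example with a few lines of Python. This is a minimal sketch that treats the returned answers and the gold standard as sets; the function name `precision_recall` is an assumption made for illustration.

```python
def precision_recall(returned, gold):
    """Set-based precision and recall of a returned answer list."""
    returned, gold = set(returned), set(gold)
    hits = len(returned & gold)  # answers that are both returned and correct
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


returned = ["Mercury", "Jupiter", "Saturn"]
gold = ["Jupiter", "Saturn", "Neptune", "Uranus"]
p, r = precision_recall(returned, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```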
Mean Reciprocal Rank (MRR)
 Measures the ranking quality.
 The reciprocal rank of a query is 1/r, where r is the rank at which
the system returns the first relevant entity; MRR is the mean of
this value over all queries.
 Which are the Jovian planets in the Solar System?
 Returned Answers:
– Mercury
– Jupiter
– Saturn
 Gold-standard:
– Jupiter
– Saturn
– Neptune
– Uranus
rr = 1/2 = 0.5
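A minimal sketch of the computation, assuming each query is given as a ranked answer list plus its gold-standard set (the function names are chosen here for illustration):

```python
def reciprocal_rank(ranked, gold):
    """1/r for the first relevant item at rank r, or 0.0 if none is returned."""
    gold = set(gold)
    for rank, item in enumerate(ranked, start=1):
        if item in gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Mean of the reciprocal rank over (ranked, gold) pairs, one pair per query."""
    return sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / len(runs)


gold = ["Jupiter", "Saturn", "Neptune", "Uranus"]
print(reciprocal_rank(["Mercury", "Jupiter", "Saturn"], gold))  # 0.5
```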
Mean Average Interpolated Precision
(MAiP)
 Computes the Interpolated-Precision at a set of n standard recall
levels (1%, 10%, 20%, etc).
 Average-Interpolated-Precision (AiP) is a single-valued measure
that reflects the performance of a search engine over all the
relevant results.
 Mean-Average-Interpolated-Precision (MAiP) is the mean of AiP over
all queries, reflecting the performance of a system over all the results.
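The following sketch shows one way to compute AiP for a single query. It assumes the common definition of interpolated precision (the highest precision at any rank whose recall is at least the given level) and, as a simplification, uses 11 recall levels from 0.0 to 1.0; evaluation campaigns may use a different set of levels.

```python
def average_interpolated_precision(ranked, gold, levels=None):
    """Average of the interpolated precision over a set of recall levels."""
    gold = set(gold)
    if not gold:
        return 0.0
    if levels is None:
        levels = [i / 10 for i in range(11)]  # assumed: 0.0, 0.1, ..., 1.0

    # (recall, precision) observed after each rank position.
    points, hits = [], 0
    for rank, item in enumerate(ranked, start=1):
        if item in gold:
            hits += 1
        points.append((hits / len(gold), hits / rank))

    def interpolated(level):
        # Best precision at any rank whose recall reaches the level.
        return max((p for r, p in points if r >= level), default=0.0)

    return sum(interpolated(level) for level in levels) / len(levels)


gold = ["Jupiter", "Saturn", "Neptune", "Uranus"]
print(round(average_interpolated_precision(["Mercury", "Jupiter", "Saturn"], gold), 4))
```

MAiP is then the mean of this value over all queries in the test collection.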
Normalized Discounted
Cumulative Gain (NDCG)
 Discounted-Cumulative-Gain (DCG) uses a graded relevance scale
to measure the gain of a system based on the positions of the
relevant entities in the result set.
 This measure gives a lower gain to relevant entities returned at
lower ranks than to those returned at higher ranks.
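A sketch of one common formulation, in which the graded relevance at rank i is discounted by log2(i + 1). The exact discount and the way the ideal ranking is built vary between campaigns; here, as a simplifying assumption, the ideal ranking is just the returned gains sorted in descending order.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of a list of graded relevance scores."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG normalized by the DCG of the best possible ordering of the same gains."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


print(ndcg([3, 2, 1]))  # 1.0: the result list is already ideally ordered
print(ndcg([0, 1]) < 1.0)  # True: the relevant result is returned at rank 2
```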
Test Collections
 Question Answering over Linked Data (QALD-CLEF)
 INEX Linked Data Track
 BioASQ
 SemSearch
QALD
QALD-1, ESWC (2011)
 Datasets:
 DBpedia 3.6 (RDF)
 MusicBrainz (RDF)
 Tasks:
 Training questions: 50 questions for each dataset
 Test questions: 50 questions for each dataset
http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/index.php?x=challenge&q=1
QALD-1, ESWC (2011)
 Which presidents were born in 1945?
 Who developed the video game World of Warcraft?
 List all episodes of the first season of the HBO television series The
Sopranos!
 Who produced the most films?
 Which mountains are higher than the Nanga Parbat?
 Give me all actors starring in Batman Begins.
 Which software has been developed by organizations founded in
California?
 Which companies work in the aerospace industry as well as on nuclear
reactor technology?
 Is Christian Bale starring in Batman Begins?
 Give me the websites of companies with more than 500000 employees.
 Which cities have more than 2 million inhabitants?
QALD-2, ESWC (2012)
 Datasets:
 DBpedia 3.7 (RDF)
 MusicBrainz (RDF)
 Tasks:
 Training questions: 100 questions for each dataset
 Test questions: 50 questions for each dataset
http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/index.php?x=challenge&q=1
QALD-3, CLEF (2013)
 Datasets:
 DBpedia 3.7 (RDF)
 MusicBrainz (RDF)
 Tasks:
 Multilingual QA
– Given an RDF dataset and a natural language question or set
of keywords in one of six languages (English, Spanish,
German, Italian, French, Dutch), either return the correct
answers, or a SPARQL query that retrieves these answers.
 Ontology Lexicalization
http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/index.php?x=task1&q=3
INEX Linked Data Track (2013)
 Focuses on the combination of textual and structured data.
 Datasets:
 English Wikipedia (MediaWiki XML Format)
 DBpedia 3.8 & YAGO2 (RDF)
 Links among the Wikipedia, DBpedia 3.8, and YAGO2 URIs.
 Tasks:
 Ad-hoc Task: return a ranked list of results in response to a search
topic that is formulated as a keyword query (144 search topics).
 Jeopardy Task: Investigate retrieval techniques over a set of natural-
language Jeopardy clues (105 search topics – 74 (2012) + 31
(2013)).
https://inex.mmci.uni-saarland.de/tracks/lod/
SemSearch Challenge
 Focuses on entity search over Linked Datasets.
 Datasets:
 Sample of Linked Data crawled from publicly available
sources (based on the Billion Triple Challenge 2009).
 Tasks:
 Entity Search: Queries that refer to one particular entity. A small
sample of Yahoo! Search queries.
 List Search: The goal of this track is to select objects that match
particular criteria. These queries have been hand-written by
the organizing committee.
http://semsearch.yahoo.com/datasets.php#
 List Search queries:
 republics of the former Yugoslavia
 ten ancient Greek city kingdoms of Cyprus
 the four of the companions of the prophet
 Japanese-born players who have played in MLB
 where the British monarch is also head of state
 nations where Portuguese is an official language
 bishops who sat in the House of Lords
 Apollo astronauts who walked on the Moon
 Entity Search queries:
 1978 cj5 jeep
 employment agencies w. 14th street
 nyc zip code
 waterville Maine
 LOS ANGELES CALIFORNIA
 ibm
 KARL BENZ
 MIT
BioASQ
 Datasets:
 PubMed documents
 Tasks:
 1a: Large-Scale Online Biomedical Semantic Indexing
– Automatic annotation of PubMed documents.
– Training data is provided.
 1b: Introductory Biomedical Semantic QA
– 300 questions and related material (concepts, triples and
golden answers).
Baseline
Balog & Neumayer, A Test Collection for Entity Search in DBpedia (2013).
Going Deeper
 Metrics, Statistics, Tests - Tetsuya Sakai
 http://www.promise-noe.eu/documents/10156/26e7f254-1feb-41
 Building test Collections (IR Evaluation - Ian Soboroff)
 http://www.promise-noe.eu/documents/10156/951b6dfb-a404-46
QA over
Linked Data
The Semantic Web Vision
2001:
Software which is able to
understand meaning (intelligent,
flexible)
Leveraging the Web for
information scale
The Semantic Web Vision
 What was the plan to
achieve it?
 Build a Semantic Web
Stack
 Which covers both
representation and
reasoning
Reality Check
 Adoption:
 No significant data
growth
 Ontologies are not
straightforward to
build:
 People are not
familiar with the
tools and principles
 Difficult to keep
consistency at Web scale
 Scalability
Reasoning
 Problems:
 Consistency
 Scalability
Logic World Web World
Linked Data
 The Web as a Huge Database
 Fundamental step for data
creation
2006:
No Reasoning, no Fun?
 Where are the intelligence and
flexibility?
 We will be back to this point
in a minute
From which university did the wife of
Barack Obama graduate?
Consuming/Using Linked Data
 With Linked Data we are still in the DB world
 (but slightly worse)
Linked Data
 Data Model Features:
 Graph-based data model
 Extensible schema
 Entity-centric data integration
 Specific Features:
 Designed over open Web standards
 Based on the Web infrastructure (HTTP, URIs)
Linked Data
 Positives:
 Solid adoption in the Open Data context
(eGovernment, eScience, etc,...)
 Existing data is relevant (you can build real
applications)
 Negatives:
 Data consumption is a problem
 Data generation beyond databases
mapping/triplification is also a problem
 Still far from the Semantic Web vision
QA for Linked Data
 Addresses practical problems of data
accessibility in a data heterogeneity
scenario.
 A fundamental part of the original
Semantic Web vision.
QA4LD Requirements
 High usability:
 Supporting natural language queries.
 High expressivity:
 Path, conjunctions, disjunctions, aggregations, conditions.
 Accurate & comprehensive semantic matching:
 High precision and recall.
 Low maintenance effort:
 Easily transportable across datasets from different domains
(minimum adaptation effort/low adaptation time).
 Low query execution time:
 Suitable for interactive querying.
 High scalability:
 Scalable to a large number of datasets (Organization-scale, Web-
scale).
Exemplar QA Systems
 AquaLog & PowerAqua (Lopez et al. 2006)
 ORAKEL (Cimiano et al. 2007)
 QuestIO & FREyA (Damljanovic et al. 2010)
 Treo (Freitas et al. 2011, 2012)
PowerAqua (Lopez et al. 2006)
 Key contribution: semantic similarity
mapping.
 Terminological Matching:
 WordNet-based
 Ontology Based
 String similarity
 Sense-based similarity matcher
 Evaluation: QALD (2011).
 Extends the AquaLog system.
WordNet
Sense-based similarity matcher
 Two words are strongly similar if any of the following
holds:
 1. They have a synset in common (e.g. “human” and “person”)
 2. A word is a hypernym/hyponym in the taxonomy of the
other word.
 3. There exists an allowable “is-a” path connecting a synset
associated with each word.
 4. If any of the previous cases is true and the definition
(gloss) of one of the synsets of the word (or its direct
hypernyms/hyponyms) includes the other word as one of its
synonyms, we say that they are highly similar.
Lopez et al. 2006
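The first three rules can be sketched over a toy taxonomy. Everything in this block is an illustrative assumption rather than PowerAqua's implementation: the tiny `SYNSETS` and `HYPERNYM` tables stand in for WordNet, an "allowable" is-a path is interpreted as a shared ancestor within `max_depth` steps, and rule 4 (gloss matching) is left out.

```python
# Toy lexicon standing in for WordNet (hypothetical data).
SYNSETS = {
    "human": {"person.n.01"},
    "person": {"person.n.01"},
    "actor": {"actor.n.01"},
    "singer": {"singer.n.01"},
    "car": {"car.n.01"},
}
HYPERNYM = {  # synset -> direct hypernym
    "actor.n.01": "person.n.01",
    "singer.n.01": "person.n.01",
    "person.n.01": "organism.n.01",
    "car.n.01": "vehicle.n.01",
}


def ancestors(synset, max_depth=3):
    """Hypernyms of a synset, following is-a links for at most max_depth steps."""
    found = set()
    while synset in HYPERNYM and len(found) < max_depth:
        synset = HYPERNYM[synset]
        found.add(synset)
    return found


def strongly_similar(w1, w2, max_depth=3):
    s1, s2 = SYNSETS.get(w1, set()), SYNSETS.get(w2, set())
    if s1 & s2:  # rule 1: a synset in common
        return True
    up1 = set().union(*[ancestors(s, max_depth) for s in s1])
    up2 = set().union(*[ancestors(s, max_depth) for s in s2])
    if (s1 & up2) or (s2 & up1):  # rule 2: hypernym/hyponym of the other word
        return True
    return bool(up1 & up2)  # rule 3 (assumed reading): shared is-a ancestor


print(strongly_similar("human", "person"))  # True
print(strongly_similar("actor", "car"))     # False
```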
ORAKEL (Cimiano et al. 2007)
 Key contribution:
 FrameMapper lexicon engineering approach.
 Parsing strategy: Logical Description Grammars (LDGs).
– LDG is inspired by Lexicalized Tree Adjoining Grammars (LTAGs).
– An important characteristic of these trees is that they encapsulate
all syntactic/semantic arguments of a word.
 Terminological Matching:
 FrameMapper
 Evaluation:
 Dataset: domain-specific
 Dimensions: Relevance, Performance, lexicon construction
ORAKEL (Cimiano et al. 2007)
 More sophisticated parsing strategy: Logical
Description Grammars (LDGs).
 LDG is inspired by Lexicalized Tree Adjoining
Grammars (LTAGs).
 An important characteristic of these trees is that
they encapsulate all syntactic/semantic
arguments of a word.
FREyA (Damljanovic et al. 2010)
 Key contribution: user interaction (dialogs)
for terminological matching.
 Terminological Matching:
 WordNet & Cyc (synonyms)
 String similarity (Monge-Elkan + Soundex)
 User feedback
 Extends the QuestIO system.
 Evaluation: Mooney (2010), QALD (2011)
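As an illustration of the string similarity component, the sketch below implements the Monge-Elkan scheme: each token of the first string is matched to its most similar token in the second, and the best scores are averaged. The inner token similarity is an assumption (Python's `difflib` ratio is used for convenience), and the Soundex part is not shown.

```python
from difflib import SequenceMatcher


def token_similarity(a: str, b: str) -> float:
    """Inner similarity between two tokens (assumed: difflib ratio in [0, 1])."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def monge_elkan(a: str, b: str) -> float:
    """Average, over tokens of a, of the best match among the tokens of b."""
    tokens_a, tokens_b = a.split(), b.split()
    if not tokens_a or not tokens_b:
        return 0.0
    best = [max(token_similarity(x, y) for y in tokens_b) for x in tokens_a]
    return sum(best) / len(best)


print(monge_elkan("Bill Clinton", "Clinton Bill"))  # 1.0: token order is ignored
```

Note that the measure is asymmetric: `monge_elkan(a, b)` and `monge_elkan(b, a)` can differ when the strings have different numbers of tokens.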
[Figure: FREyA architecture, showing an ontology-based gazetteer (GATE), synonym lookup (WordNet + Cyc), query analysis, the ontology, and SPARQL generation.]
Other QA Approaches
 Unger et al. (2012), Template-based
Question Answering over RDF Data.
 Cabrio et al. (2012), QAKIS: An Open
Domain QA System based on Relational
Patterns.
Treo: Detailed Case
Study
Treo (Irish): Direction, path
Treo (Freitas et al. 2011, 2012)
 Key contribution:
 Distributional semantic relatedness matching model
 Distributional relational space.
 Terminological Matching:
 Explicit Semantic Analysis (ESA)
 WordNet
 String similarity + node cardinality
 Evaluation: QALD (2011)
Statistical
analysis
Linked Data
Query Pre-Processing
(Question Analysis)
 Transform natural language queries into
triple patterns
“Who is the daughter of Bill Clinton married to?”
 Step 1: POS Tagging
 Who/WP
 is/VBZ
 the/DT
 daughter/NN
 of/IN
 Bill/NNP
 Clinton/NNP
 married/VBN
 to/TO
 ?/.
 Step 2: Core Entity Recognition
 Rules-based: POS Tag + TF/IDF
Who is the daughter of Bill Clinton married to?
(PROBABLY AN INSTANCE)
 Step 3: Determine answer type
 Rules-based.
Who is the daughter of Bill Clinton married to?
(PERSON)
 Step 4: Dependency parsing
 dep(married-8, Who-1)
 auxpass(married-8, is-2)
 det(daughter-4, the-3)
 nsubjpass(married-8, daughter-4)
 prep(daughter-4, of-5)
 nn(Clinton-7, Bill-6)
 pobj(of-5, Clinton-7)
 root(ROOT-0, married-8)
 xcomp(married-8, to-9)
 Step 5: Determine Partial Ordered Dependency
Structure (PODS)
 Rules based.
– Remove stop words.
– Merge words into entities.
– Reorder structure from core entity position.
Bill Clinton daughter married to
(INSTANCE)
Person
ANSWER
TYPE
QUESTION FOCUS
Digital Enterprise Research Institute www.deri.ie
Query Pre-Processing
(Question Analysis)
 Step 5: Determine Partial Ordered Dependency
Structure (PODS)
 Rules based.
– Remove stop words.
– Merge words into entities.
– Reorder structure from core entity position.
Bill Clinton daughter married to
(INSTANCE)
Person
(PREDICATE) (PREDICATE) Query Features
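The three PODS rules can be sketched over the POS-tagged question from Step 1. This is a simplification under stated assumptions: the stop-word list is hypothetical, the merge rules (consecutive NNP tokens, and VBN followed by TO) are chosen only to cover this example, and the reordering ignores the dependency parse and simply moves the core entity to the front.

```python
# POS-tagged question from Step 1.
TAGGED = [
    ("Who", "WP"), ("is", "VBZ"), ("the", "DT"), ("daughter", "NN"),
    ("of", "IN"), ("Bill", "NNP"), ("Clinton", "NNP"),
    ("married", "VBN"), ("to", "TO"), ("?", "."),
]

# Hypothetical stop-word list, just large enough for this question.
STOP_WORDS = {"who", "is", "the", "of", "?"}


def merge_terms(tagged):
    """Merge NNP+NNP into one entity and VBN+TO into one predicate term."""
    merged = []
    for word, tag in tagged:
        prev_tag = merged[-1][1] if merged else None
        if (tag == "NNP" and prev_tag == "NNP") or (tag == "TO" and prev_tag == "VBN"):
            merged[-1] = (merged[-1][0] + " " + word, prev_tag)
        else:
            merged.append((word, tag))
    return merged


def build_pods(tagged, core_entity):
    """Remove stop words, merge terms, and start the sequence at the core entity."""
    terms = [w for w, _ in merge_terms(tagged) if w.lower() not in STOP_WORDS]
    rest = [t for t in terms if t != core_entity]
    return [core_entity] + rest


print(build_pods(TAGGED, "Bill Clinton"))
```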
Query Planning
 Map query features into a query plan.
 A query plan contains a sequence of:
 Search operations.
 Navigation operations.
Query Features: (INSTANCE) (PREDICATE) (PREDICATE)
Query Plan:
 (1) INSTANCE SEARCH (Bill Clinton)
 (2) DISAMBIGUATE ENTITY TYPE
 (3) GENERATE ENTITY FACETS
 (4) p1 <- SEARCH RELATED PREDICATE (Bill Clinton, daughter)
 (5) e1 <- GET ASSOCIATED ENTITIES (Bill Clinton, p1)
 (6) p2 <- SEARCH RELATED PREDICATE (e1, married to)
 (7) e2 <- GET ASSOCIATED ENTITIES (e1, p2)
 (8) POST PROCESS (Bill Clinton, e1, p1, e2, p2)
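The mapping from query features to a plan can be sketched as follows. The function name `make_query_plan` and the string representation of operations are assumptions; the sketch only covers plans that start at an instance and continue with predicates, as in the example above.

```python
def make_query_plan(features):
    """Turn [(term, kind), ...] into an ordered list of plan steps.

    Assumes the first feature is an INSTANCE and all others are PREDICATEs.
    """
    (pivot, pivot_kind), rest = features[0], features[1:]
    if pivot_kind != "INSTANCE":
        raise ValueError("this sketch only handles plans that start at an instance")
    plan = [
        f"INSTANCE SEARCH ({pivot})",
        "DISAMBIGUATE ENTITY TYPE",
        "GENERATE ENTITY FACETS",
    ]
    current, bound = pivot, [pivot]
    for i, (term, kind) in enumerate(rest, start=1):
        if kind != "PREDICATE":
            raise ValueError(f"unsupported feature kind: {kind}")
        # One search operation followed by one navigation operation per predicate.
        plan.append(f"p{i} <- SEARCH RELATED PREDICATE ({current}, {term})")
        plan.append(f"e{i} <- GET ASSOCIATED ENTITIES ({current}, p{i})")
        current = f"e{i}"
        bound += [f"e{i}", f"p{i}"]
    plan.append(f"POST PROCESS ({', '.join(bound)})")
    return plan


FEATURES = [("Bill Clinton", "INSTANCE"), ("daughter", "PREDICATE"), ("married to", "PREDICATE")]
for step_number, step in enumerate(make_query_plan(FEATURES), start=1):
    print(f"({step_number}) {step}")
```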
Core Entity Search
 Entity Index:
 Construction of entity index (instances, classes and complex
classes).
 Extract terms from URIs and index the terms using an inverted
index.
 Search instances by keywords
 Entity Search (Instance Example):
 Input: Keyword search (Bill Clinton).
 Ranking by: string similarity and entity cardinality.
 Output: List of URIs.
 Core rationale:
 Prioritize the matching of less ambiguous/polysemic entities.
 Prioritize more popular entities.
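A minimal sketch of the entity index described above: terms are extracted from URIs into an inverted index, and keyword matches are ranked by string similarity with entity cardinality as a tie-breaker. The class name `EntityIndex`, the way the two ranking signals are combined, and the cardinality numbers are all assumptions for illustration; the slides do not specify them.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher


def uri_terms(uri):
    """Extract lowercase terms from the local name of a URI."""
    local_name = uri.rstrip("/").rsplit("/", 1)[-1]
    return [t.lower() for t in re.findall(r"[A-Za-z][a-z]*|\d+", local_name)]


class EntityIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of URIs
        self.cardinality = {}              # URI -> popularity proxy

    def add(self, uri, cardinality):
        self.cardinality[uri] = cardinality
        for term in uri_terms(uri):
            self.postings[term].add(uri)

    def search(self, keywords):
        query_terms = keywords.lower().split()
        candidates = set()
        for term in query_terms:
            candidates |= self.postings.get(term, set())

        def score(uri):
            label = " ".join(uri_terms(uri))
            similarity = SequenceMatcher(None, " ".join(query_terms), label).ratio()
            # Assumed combination: similarity first, cardinality as tie-breaker.
            return (similarity, self.cardinality[uri])

        return sorted(candidates, key=score, reverse=True)


index = EntityIndex()
# Cardinality values below are made up for the example.
index.add("http://dbpedia.org/resource/Bill_Clinton", 500)
index.add("http://dbpedia.org/resource/Bill_Gates", 400)
index.add("http://dbpedia.org/resource/Clinton,_Iowa", 80)
print(index.search("Bill Clinton"))
```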
Core Entity Search
Query: Bill Clinton daughter married to Person
Linked Data: :Bill_Clinton
Entity Search: "Bill Clinton" maps to :Bill_Clinton
Core Entity Search
 Entity Disambiguation:
 Get entity types.
 Compute distributional semantic relatedness between entity
types:
– Explicit Semantic Analysis (ESA).
– semantic relatedness (Bill Clinton, President) = 0.11
– semantic relatedness (Bill Clinton, Politician) = 0.052
– semantic relatedness (Bill Clinton, Actor) = 0.011
 Generate Entity Types facets.
http://vmdeb14.deri.ie:8080/esa/service?task=esa&term1=bill%20clinton&term2=politician
Distributional Semantic Search
Query: Bill Clinton daughter married to Person
Linked Data (associated triples of the pivot entity :Bill_Clinton):
:Bill_Clinton :child :Chelsea_Clinton
:Bill_Clinton :religion :Baptists
:Bill_Clinton :almaMater :Yale_Law_School
...
Distributional Semantic Search
Which properties are semantically related to ‘daughter’?
sem_rel(daughter, child) = 0.054
sem_rel(daughter, religion) = 0.004
sem_rel(daughter, alma mater) = 0.001
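Selecting the predicate for a query term then reduces to ranking the pivot entity's predicates by relatedness and dropping weak matches. In the sketch below the scores are illustrative values in the range shown on the slide, and the threshold of 0.01 and the function name `rank_predicates` are assumptions.

```python
# Illustrative relatedness scores (not computed here).
SCORES = {
    ("daughter", "child"): 0.054,
    ("daughter", "religion"): 0.004,
    ("daughter", "alma mater"): 0.001,
}


def lookup_relatedness(term_a, term_b):
    """Stand-in for a distributional relatedness service such as ESA."""
    return SCORES.get((term_a, term_b), 0.0)


def rank_predicates(query_term, predicates, relatedness, threshold=0.01):
    """Return predicates scoring at least `threshold`, best first."""
    scored = [(relatedness(query_term, p), p) for p in predicates]
    kept = [(score, p) for score, p in scored if score >= threshold]
    return [p for score, p in sorted(kept, key=lambda sp: sp[0], reverse=True)]


print(rank_predicates("daughter", ["child", "religion", "alma mater"], lookup_relatedness))
```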
Distributional Semantic Search
Query: Bill Clinton daughter married to Person
Selected triple: :Bill_Clinton :child :Chelsea_Clinton
Distributional Semantic Relatedness
 Computation of a measure of “semantic proximity”
between two terms.
 Allows a semantic approximate matching between
query terms and dataset terms.
 It supports a commonsense reasoning-like behavior
based on the knowledge embedded in the corpus.
Distributional Semantic Search
 Use distributional semantics to semantically match
query terms to predicates and classes.
 Distributional principle: Words that co-occur together
tend to have related meaning.
 Allows the creation of a comprehensive semantic model from
unstructured text.
 Based on statistical patterns over large amounts of text.
 No human annotations.
 Distributional semantics can be used to compute a
semantic relatedness measure between two words.
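As a toy illustration of the idea, the sketch below represents each word as a TF-IDF weighted vector over the documents it occurs in (in the spirit of ESA, where the documents would be Wikipedia articles) and uses the cosine between two vectors as the relatedness score. The four-sentence corpus and all function names are assumptions; a real model is built from a large corpus.

```python
import math
from collections import Counter

# Tiny stand-in corpus; titles play the role of ESA "concepts".
ARTICLES = {
    "Family": "a daughter is a female child of her parents and a son is a male child",
    "Parenting": "parents raise a child and a daughter or son grows up",
    "University": "an alma mater is the university a student attended",
    "Religion": "a religion is a system of faith and worship",
}


def build_concept_vectors(articles):
    """Return {term: {article title: tf-idf weight}}."""
    term_counts = {title: Counter(text.lower().split()) for title, text in articles.items()}
    document_frequency = Counter(t for counts in term_counts.values() for t in counts)
    total = len(articles)
    vectors = {}
    for title, counts in term_counts.items():
        for term, count in counts.items():
            idf = math.log(total / document_frequency[term])
            if idf > 0:  # terms occurring in every article carry no signal
                vectors.setdefault(term, {})[title] = count * idf
    return vectors


def relatedness(vectors, term_a, term_b):
    """Cosine similarity between the concept vectors of two terms."""
    va, vb = vectors.get(term_a, {}), vectors.get(term_b, {})
    dot = sum(weight * vb.get(concept, 0.0) for concept, weight in va.items())
    norm = math.sqrt(sum(w * w for w in va.values())) * math.sqrt(sum(w * w for w in vb.values()))
    return dot / norm if norm else 0.0


VECTORS = build_concept_vectors(ARTICLES)
print(relatedness(VECTORS, "daughter", "child"))
print(relatedness(VECTORS, "daughter", "university"))
```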
Distributional Semantic Search
Query: Bill Clinton daughter married to Person
New pivot entity: :Chelsea_Clinton (reached via :Bill_Clinton :child)
Distributional Semantic Search
Query: Bill Clinton daughter married to Person
Linked Data:
:Bill_Clinton :child :Chelsea_Clinton
:Chelsea_Clinton :spouse :Marc_Mezvinsky
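Once the predicates have been selected, answering the question is a two-hop navigation from the pivot entity. A minimal sketch over an in-memory triple list, with hypothetical helper names `associated_entities` and `navigate`:

```python
# The triples shown on the slides, as (subject, predicate, object).
TRIPLES = [
    (":Bill_Clinton", ":child", ":Chelsea_Clinton"),
    (":Bill_Clinton", ":religion", ":Baptists"),
    (":Bill_Clinton", ":almaMater", ":Yale_Law_School"),
    (":Chelsea_Clinton", ":spouse", ":Marc_Mezvinsky"),
]


def associated_entities(triples, pivot, predicate):
    """Objects reachable from `pivot` through `predicate`."""
    return [o for s, p, o in triples if s == pivot and p == predicate]


def navigate(triples, pivot, predicates):
    """Follow the predicates in order, using each result set as the next pivots."""
    frontier = [pivot]
    for predicate in predicates:
        frontier = [o for entity in frontier
                    for o in associated_entities(triples, entity, predicate)]
    return frontier


print(navigate(TRIPLES, ":Bill_Clinton", [":child", ":spouse"]))
```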
Results
Post-Processing
Second Query Example
What is the highest mountain?
Query Features: mountain (CLASS), highest (OPERATOR)
PODS: mountain - highest
Entity Search
Query: Mountain highest
Linked Data: :Mountain (PIVOT ENTITY), linked through :typeOf
Extensional Expansion
Instances of the pivot entity :Mountain:
:Everest :typeOf :Mountain
:K2 :typeOf :Mountain
...
Distributional Semantic Matching
Candidate predicates of the instances (:Everest, :K2, ...):
:elevation, :location, ..., :deathPlaceOf
Get all numerical values
:Everest :elevation 8848 m
:K2 :elevation 8611 m
Apply operator functional definition
SORT and TOP_MOST applied to the :elevation values (8848 m, 8611 m)
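The operator step can be sketched as a lookup from the query term to a function over the collected numerical values. The `OPERATORS` table and the function name `top_most` are assumptions; only the SORT / TOP_MOST behaviour and the two elevation values come from the slides.

```python
# Numerical values collected in the previous step (elevation in metres).
ELEVATIONS = {":Everest": 8848.0, ":K2": 8611.0}


def top_most(values, n=1):
    """SORT in descending order, then keep the n largest (TOP_MOST)."""
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical mapping from operator terms in the query to functions.
OPERATORS = {"highest": top_most}

print(OPERATORS["highest"](ELEVATIONS))
```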
Results
From Exact to Approximate
 Semantic approximation in databases (as in any IR
system): semantic best-effort.
 Need some level of user disambiguation,
refinement and feedback.
 As we move in the direction of semantic systems
we should expect the need for principled dialog
mechanisms (as in human communication).
 Pull the user interaction back into the system.
User Feedback
Treo Architecture
Core Elements of Treo
 Hybrid model: database/IR/QA.
 Ranked query results.
 A distributional VSM for representing and semantically
processing relational data was formulated: Ƭ-Space.
 Similar in motivation to Cohen’s predication space.
 Distributional semantic relatedness as a primitive
operation.
Ƭ-Space
[Diagram labels: ESA, EAV, vector field]
Simple Queries (Video)
More Complex Queries (Video)
Vocabulary Search (Video)
Treo Answers Jeopardy Queries (Video)
Video Links:
 Introducing Treo: Talk to your Data
 http://www.youtube.com/watch?v=Zor2X0uoKsM
 Treo: Do it your own (DIY) Jeopardy Question
Answering Engine
 http://www.youtube.com/watch?v=Vqh0r8GxYe8
 Treo: Semantic Search over Schema & Vocabularies
 http://www.youtube.com/watch?v=HCBwSV1mTdY
Evaluation: Relevance
Avg. Precision   Avg. Recall   MRR    % of queries answered
0.62             0.81          0.49   80%
 Test Collection: QALD 2011.
 DBpedia 3.7 + YAGO.
 102 natural language queries.
 Dataset: 45,767 predicates, 5,556,492 classes and 9,434,677 instances.
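For reference, the three relevance measures can be computed as in the sketch below. It assumes that "Avg. Precision" and "Avg. Recall" mean per-query precision and recall averaged over all queries, which the slides do not state explicitly; the two example runs are made up.

```python
def precision_recall(returned, relevant):
    """Set-based precision and recall for one query."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0 if none is returned."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs):
    """runs: list of (ranked results, set of relevant answers), one per query."""
    count = len(runs)
    pairs = [precision_recall(ranked, gold) for ranked, gold in runs]
    return {
        "avg_precision": sum(p for p, _ in pairs) / count,
        "avg_recall": sum(r for _, r in pairs) / count,
        "mrr": sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / count,
    }


# Made-up runs for two queries.
RUNS = [
    (["a", "b"], {"a"}),
    (["x", "y"], {"y", "z"}),
]
print(evaluate(RUNS))
```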
Evaluation: Treo Evolution
Version  Avg. Precision  Avg. Recall  MRR   % of queries answered  QA     Query exec. time  # of Queries
0.1      0.39            0.45         0.42  56%                    No QA  > 2 min           50
0.2      0.48            0.49         0.52  58%                    No QA  > 2 min           50
0.3      0.62            0.81         0.49  80%                    QA     8,530 s           >102
Evaluation: Terminological Matching
Avg. Precision@5  Avg. Precision@10  MRR    % of queries answered
0.732             0.646              0.646  92.25%

Approach                      % of queries answered
ESA                           92.25%
String matching               45.77%
String matching + WordNet QE  52.48%
Performance & Maintainability
Do-it-yourself (DIY):
Core Resources
Corpora: Wikipedia
 High domain coverage:
 ~95% of Jeopardy! Answers.
 ~98% of TREC answers.
 Wikipedia is entity-centric.
 Curated link structure.
 Where to use:
 Construction of distributional semantic models.
 As a commonsense KB
 Complementary tools:
 Wikipedia Miner
Linked Datasets
 DBpedia: Instances and data.
 YAGO: Classes and instances.
 Freebase: Instances.
 CIA Factbook: Data.
<http://dbpedia.org/class/yago/19th-centuryPresidentsOfTheUnitedStates>
<http://dbpedia.org/class/yago/4th-centuryBCGreekPeople>
<http://dbpedia.org/class/yago/OrganizationsEstablishedIn1918>
<http://dbpedia.org/class/yago/PeopleFromToronto>
<http://dbpedia.org/class/yago/JewishAtheists>
<http://dbpedia.org/class/yago/CountriesOfTheMediterraneanSea>
<http://dbpedia.org/class/yago/TennisPlayersAtThe1996SummerOlympics>
<http://dbpedia.org/class/yago/OlympicGoldMedalistsForTheUnitedStates>
<http://dbpedia.org/class/yago/WorldNo.1TennisPlayers>
<http://dbpedia.org/class/yago/PeopleAssociatedWithTheUniversityOfZurich>
<http://dbpedia.org/class/yago/SwissImmigrantsToTheUnitedStates>
<http://dbpedia.org/class/yago/1979VideoGames>
 Where to use:
 As a commonsense KB.
Dictionaries
 WordNet: Large Lexical Database
 Wiktionary: Dictionary
 Where to use:
 Query expansion.
 Semantic similarity.
 Semantic relatedness.
 Word sense disambiguation.
 Complementary tools:
 WordNet::Similarity
Parsers
 Stanford Parser: POS-Tagger, Syntactic &
Dependency Trees.
 Where to use:
 Question Analysis.
Named Entity Recognition/Resolution
 Stanford NER.
 DBpedia Spotlight
 Treo NER.
 Where to use:
 Question Analysis
Search Engines
 Lucene & Solr: Advanced search engine
framework.
 Treo Ƭ-Space: Distributional semantic search
over structured data.
 Treo Entity Search: Semantic search engine for
retrieving individual entities and vocabulary
elements.
Distributional Semantics
 E2SA: High-performance Explicit Semantic
Analysis (ESA) framework (based on NoSQL).
 Semantic Vectors.
 Where to use:
 Semantic relatedness & similarity.
 Word Sense Disambiguation.
Linked Data Extraction
 Fred: Represents the extracted data using
ontology patterns.
 Graphia: Extracts contextualized Linked Data
graphs.
QA over Linked Data: Roadmaps
Research Topics/Opportunities
 #1: Merge Linked Data extraction into QA4LD.
Motivation
Extraction Examples
Research Topics/Opportunities
 #1: Merge Linked Data extraction into QA4LD.
 #2: Explore complex dialog/context in QA4LD tasks.
 #3: Advance the use of distributional semantic models on
QA.
 #4: Provide the incentives for the advancement of open
robust software resources and high-quality data
(altmetrics.org).
 #5: Multilingual QA.
 #6: Creation of multi-sourced test collections.
 #7: Integration of reasoning (deductive, inductive,
counterfactual, abductive ...) on QA approaches and test
collections.
 #8: Development of QA approaches/tasks which use both
structured and unstructured data.
From QA to Semantic
Applications
(Deriving Patterns)
Semantic Application Patterns
 Derived from the experience developing Treo.
 Not restricted to QA over Linked Data.
 The following list is not intended to be complete.
Semantic Application Patterns
 Pattern #1: Maximize the amount of knowledge in
your semantic application.
 Meaning interpretation depends on knowledge.
 Using LOD: DBpedia, Freebase, YAGO can give you
a very comprehensive set of instances and their
types.
 Wikipedia can provide you a comprehensive
distributional semantic model.
Semantic Application Patterns
 Pattern #2: Allow your databases to grow.
 Dynamic schema.
 Entity-centric data integration.
Semantic Application Patterns
 Pattern #3: Once the database grows in complexity
use semantic search instead of structured queries.
 Instances can be used as pivot entities to reduce
the search space
 They are easier to search.
 Higher specificity and lower vocabulary variation.
Semantic Application Patterns
 Pattern #4: Use distributional semantics and
semantic relatedness for a robust semantic
matching.
 Distributional semantics allows your application to
digest (and make use of) large amounts of
unstructured information.
 Multilingual solution.
 Can be complemented with WordNet.
Semantic Application Patterns
 Pattern #5: POS-Tags, Syntactic Parsing + Rules will
go a long way to help your application to interpret
natural language queries and sentences.
 Use them to explore the regularities in natural
language.
 Define a scope for natural language processing for
your application (restrict by domain, syntactic
complexity).
 These tools are easy to use and quite robust
(*English).
Semantic Application Patterns
 Pattern #6: Provide a user dialog mechanism in the
application
 Improve the semantic model with user feedback.
 Record the user feedback.
Take-away Message
 Big Data/complex dataspaces demand new principled
semantic approaches to cope with the scale and
heterogeneity of data.
 Information systems in the future will depend on semantic
technologies. Be the first to develop.
 Part of the Semantic Web/AI vision can be addressed today
with a multi-disciplinary perspective:
 Linked Data, IR and NLP
 You can build your own IBM Watson-like application.
 Both data and tools are available and ready to use: the main
barrier is the mindset.
 Huge opportunity for new solutions.
References
[1] Eifrem, A NOSQL Overview And The Benefits Of Graph Database, 2009.
[2] Idehen, Understanding Linked Data via EAV Model, 2010.
[3] Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual End-users?, 2007.
[4] Chin-Yew Lin, Question Answering.
[5] Farah Benamara, Question Answering Systems: State of the Art and Future Directions.
[6] Freitas et al., Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches and Trends, 2012.
[7] Freitas et al., A Distributional Structured Semantic Space for Querying RDF Graph Data, 2012.
[8] Freitas et al., A Distributional Approach for Terminological Semantic Search on the Linked Data Web, 2012.
[9] Freitas et al., A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs from Wikipedia, 2012.
[10] Freitas et al., Answering Natural Language Queries over Linked Data Graphs: A Distributional Semantics Approach, 2013.
[11] Freitas et al., Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches and Trends, 2012.
[12] Cimiano et al., Towards portable natural language interfaces to knowledge bases, 2008.
[13] Lopez et al., PowerAqua: Fishing the Semantic Web, 2006.
[14] Damljanovic et al., Natural Language Interfaces to Ontologies: Combining Syntactic Analysis and Ontology-based Lookup through the User Interaction, 2010.
[16] Unger et al., Template-based Question Answering over RDF Data, 2012.
[17] Cabrio et al., QAKiS: an Open Domain QA System based on Relational Patterns, 2012.
[18] Kaufmann & Bernstein, How Useful Are Natural Language Interfaces to the Semantic Web for Casual End-Users?, 2007.
[19] Popescu et al., Towards a theory of natural language interfaces to databases, 2003.
Contact
http://andrefreitas.org
andre.freitas@deri.org

More Related Content

What's hot

Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Jeff Z. Pan
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge GraphsJeff Z. Pan
 
Linked Open Data to support content based Recommender Systems
Linked Open Data to support content based Recommender SystemsLinked Open Data to support content based Recommender Systems
Linked Open Data to support content based Recommender SystemsVito Ostuni
 
Different Semantic Perspectives for Question Answering Systems
Different Semantic Perspectives for Question Answering SystemsDifferent Semantic Perspectives for Question Answering Systems
Different Semantic Perspectives for Question Answering SystemsAndre Freitas
 
ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"Fabien Gandon
 
Semantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationSemantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationGong Cheng
 
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeAn Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeTraian Rebedea
 
Deep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profilesDeep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profilesTraian Rebedea
 
Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge
Schema-Agnostic Queries (SAQ-2015): Semantic Web ChallengeSchema-Agnostic Queries (SAQ-2015): Semantic Web Challenge
Schema-Agnostic Queries (SAQ-2015): Semantic Web ChallengeAndre Freitas
 
RDF: Resource Description Failures?
RDF: Resource Description Failures?RDF: Resource Description Failures?
RDF: Resource Description Failures?Robert Sanderson
 
WISS QA Do it yourself Question answering over Linked Data
WISS QA Do it yourself Question answering over Linked DataWISS QA Do it yourself Question answering over Linked Data
WISS QA Do it yourself Question answering over Linked DataAndre Freitas
 
Towards Linked Ontologies and Data on the Semantic Web
Towards Linked Ontologies and Data on the Semantic WebTowards Linked Ontologies and Data on the Semantic Web
Towards Linked Ontologies and Data on the Semantic WebJie Bao
 
Crowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentCrowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentMaribel Acosta Deibe
 
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...Cataldo Musto
 
ESWC2015 opening ceremony
ESWC2015 opening ceremonyESWC2015 opening ceremony
ESWC2015 opening ceremonyFabien Gandon
 
SSSW 2013 - Feeding Recommender Systems with Linked Open Data
SSSW 2013 - Feeding Recommender Systems with Linked Open DataSSSW 2013 - Feeding Recommender Systems with Linked Open Data
SSSW 2013 - Feeding Recommender Systems with Linked Open DataPolytechnic University of Bari
 
Lecture: Semantic Word Clouds
Lecture: Semantic Word CloudsLecture: Semantic Word Clouds
Lecture: Semantic Word CloudsMarina Santini
 
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...Polytechnic University of Bari
 
Knowledge graphs on the Web
Knowledge graphs on the WebKnowledge graphs on the Web
Knowledge graphs on the WebArmin Haller
 

What's hot (20)

Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge Graphs
 
Linked Open Data to support content based Recommender Systems
Linked Open Data to support content based Recommender SystemsLinked Open Data to support content based Recommender Systems
Linked Open Data to support content based Recommender Systems
 
Different Semantic Perspectives for Question Answering Systems
Different Semantic Perspectives for Question Answering SystemsDifferent Semantic Perspectives for Question Answering Systems
Different Semantic Perspectives for Question Answering Systems
 
ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"
 
Semantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationSemantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and Summarization
 
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeAn Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
 
Deep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profilesDeep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profiles
 
Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge
Schema-Agnostic Queries (SAQ-2015): Semantic Web ChallengeSchema-Agnostic Queries (SAQ-2015): Semantic Web Challenge
Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge
 
RDF: Resource Description Failures?
RDF: Resource Description Failures?RDF: Resource Description Failures?
RDF: Resource Description Failures?
 
WISS QA Do it yourself Question answering over Linked Data
WISS QA Do it yourself Question answering over Linked DataWISS QA Do it yourself Question answering over Linked Data
WISS QA Do it yourself Question answering over Linked Data
 
Towards Linked Ontologies and Data on the Semantic Web
Towards Linked Ontologies and Data on the Semantic WebTowards Linked Ontologies and Data on the Semantic Web
Towards Linked Ontologies and Data on the Semantic Web
 
Crowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentCrowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality Assessment
 
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...
 
ESWC2015 opening ceremony
ESWC2015 opening ceremonyESWC2015 opening ceremony
ESWC2015 opening ceremony
 
SSSW 2013 - Feeding Recommender Systems with Linked Open Data
SSSW 2013 - Feeding Recommender Systems with Linked Open DataSSSW 2013 - Feeding Recommender Systems with Linked Open Data
SSSW 2013 - Feeding Recommender Systems with Linked Open Data
 
Lecture: Semantic Word Clouds
Lecture: Semantic Word CloudsLecture: Semantic Word Clouds
Lecture: Semantic Word Clouds
 
Recommender Systems and Linked Open Data
Recommender Systems and Linked Open DataRecommender Systems and Linked Open Data
Recommender Systems and Linked Open Data
 
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
 
Knowledge graphs on the Web
Knowledge graphs on the WebKnowledge graphs on the Web
Knowledge graphs on the Web
 

Viewers also liked

Deep Learning Models for Question Answering
Deep Learning Models for Question AnsweringDeep Learning Models for Question Answering
Deep Learning Models for Question AnsweringSujit Pal
 
Ranking algorithms
Ranking algorithmsRanking algorithms
Ranking algorithmsAnkit Raj
 
Comparative study of different ranking algorithms adopted by search engine
Comparative study of  different ranking algorithms adopted by search engineComparative study of  different ranking algorithms adopted by search engine
Comparative study of different ranking algorithms adopted by search engineEchelon Institute of Technology
 
Fostering Serendipity through Big Linked Data
Fostering Serendipity through Big Linked DataFostering Serendipity through Big Linked Data
Fostering Serendipity through Big Linked DataMuhammad Saleem
 
Adding Semantics to Social Software Engineering (by Steffen Lohmann & Thomas ...
Adding Semantics to Social Software Engineering (by Steffen Lohmann & Thomas ...Adding Semantics to Social Software Engineering (by Steffen Lohmann & Thomas ...
Adding Semantics to Social Software Engineering (by Steffen Lohmann & Thomas ...Wolfgang Reinhardt
 
위키데이터 개론
위키데이터 개론위키데이터 개론
위키데이터 개론Jeongmin KIM
 
page ranking algorithm
page ranking algorithmpage ranking algorithm
page ranking algorithmJaved Khan
 
Representing Texts as contextualized Entity Centric Linked Data Graphs
Representing Texts as contextualized Entity Centric Linked Data GraphsRepresenting Texts as contextualized Entity Centric Linked Data Graphs
Representing Texts as contextualized Entity Centric Linked Data GraphsAndre Freitas
 
WiSS Challenge - Day 2
WiSS Challenge - Day 2WiSS Challenge - Day 2
WiSS Challenge - Day 2Andre Freitas
 
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary StudyOn the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study Andre Freitas
 
Disambiguating Polysemous Queries For Document Retrieval
Disambiguating Polysemous Queries For Document RetrievalDisambiguating Polysemous Queries For Document Retrieval
Disambiguating Polysemous Queries For Document RetrievalMadhusudan Daad
 
Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributio...
Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributio...Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributio...
Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributio...Andre Freitas
 
Can Deep Learning Techniques Improve Entity Linking?
Can Deep Learning Techniques Improve Entity Linking?Can Deep Learning Techniques Improve Entity Linking?
Can Deep Learning Techniques Improve Entity Linking?Julien PLU
 
Ontology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific LiteratureOntology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific LiteratureeXascale Infolab
 
Semantic Relation Classification: Task Formalisation and Refinement
Semantic Relation Classification: Task Formalisation and RefinementSemantic Relation Classification: Task Formalisation and Refinement
Semantic Relation Classification: Task Formalisation and RefinementAndre Freitas
 
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...Giannis Tsakonas
 
Enhancing Entity Linking by Combining NER Models
Enhancing Entity Linking by Combining NER ModelsEnhancing Entity Linking by Combining NER Models
Enhancing Entity Linking by Combining NER ModelsJulien PLU
 
SPARQL - Basic and Federated Queries
SPARQL - Basic and Federated QueriesSPARQL - Basic and Federated Queries
SPARQL - Basic and Federated QueriesKnud Möller
 
Understanding Queries through Entities
Understanding Queries through EntitiesUnderstanding Queries through Entities
Understanding Queries through EntitiesPeter Mika
 
Semantic Perspectives for Contemporary Question Answering Systems
Semantic Perspectives for Contemporary Question Answering SystemsSemantic Perspectives for Contemporary Question Answering Systems
Semantic Perspectives for Contemporary Question Answering SystemsAndre Freitas
 

Introduction to question answering for linked data & big data

  • 1. Question Answering over Linked Data & Big Data
      - André Freitas
      - LdB – Semantics: Theory, Technologies and Applications, Bari, September 2013
      - Digital Enterprise Research Institute, www.deri.ie
      - © Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
  • 2. Goal of this Talk
      - Understand the changes in the database landscape towards more heterogeneous data scenarios.
      - Understand how Question Answering (QA) fits into this new scenario.
      - Give the fundamental pointers for developing your own QA system from the state of the art.
      - Coverage over depth.
  • 3. Outline
      - Motivation & Context
      - Big Data & QA
      - Are NLIs useful for Databases?
      - The Anatomy of a QA System
      - Evaluation of QA Systems
      - QA over Linked Data
      - Treo: Detailed Case Study
      - Do-it-yourself (DIY): Core Resources
      - QA over Linked Data: Roadmaps
      - From QA to Semantic Applications (Deriving Patterns)
      - Take-away Message
  • 4. Motivation & Context
  • 5. Why Question Answering?
      - Humans have built-in natural language communication capabilities.
      - Natural language is a very natural way for humans to communicate information needs.
  • 6. What is Question Answering?
      - A research field in its own right.
      - Empirical bias: focus on the development of automatic systems to answer questions.
      - Multidisciplinary: Natural Language Processing, Information Retrieval, Knowledge Representation, Databases, Linguistics, Artificial Intelligence, Software Engineering, ...
  • 7. What is Question Answering?
      - (Diagram: a question is sent to a QA system backed by knowledge bases, which returns an answer.)
      - Question: Who is the daughter of Bill Clinton married to?
      - Answer: Marc Mezvinsky
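The question-to-answer flow on this slide can be sketched with a toy knowledge base. This is a minimal illustration only, not the method of any system discussed in the talk: the triples, the predicate names `child` and `spouse`, and the helper `follow_path` are all assumptions introduced here, and the mapping from the question to a path of predicates is done by hand rather than by a QA system.

```python
# Toy knowledge base: (subject, predicate, object) triples.
# Predicate names are hypothetical, chosen for illustration only.
TRIPLES = [
    ("Bill Clinton", "child", "Chelsea Clinton"),
    ("Chelsea Clinton", "spouse", "Marc Mezvinsky"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def follow_path(start, predicates, triples=TRIPLES):
    """Follow a chain of predicates from `start` and return the entities reached."""
    frontier = [start]
    for predicate in predicates:
        frontier = [o for node in frontier for o in objects(node, predicate, triples)]
    return frontier


# "Who is the daughter of Bill Clinton married to?" mapped by hand
# to a two-hop path over the toy predicates.
answer = follow_path("Bill Clinton", ["child", "spouse"])
print(answer)
```

The hard part of QA is exactly the step that is hand-written here: deciding that "daughter of" and "married to" correspond to particular predicates in a particular knowledge base.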
  • 8. QA vs IR
      - Keyword search:
          - The user still carries most of the effort of interpreting the data.
          - Satisfying information needs may depend on multiple search operations.
          - Answer-driven information access.
          - Input: keyword search (typically the specification of simpler information needs).
          - Output: documents, data.
      - QA:
          - Delegates more of the 'interpretation effort' to the machines.
          - Query-driven information access.
          - Input: natural language query (specification of complex information needs).
          - Output: direct answer.
  • 9. QA vs Databases
      - Structured queries:
          - A priori user effort in understanding the schemas behind the databases.
          - Effort in mastering the syntax of a query language.
          - Satisfying information needs may depend on multiple querying operations.
          - Input: structured query.
          - Output: data records, aggregations, etc.
      - QA:
          - Delegates more of the 'interpretation effort' to the machines.
          - Input: natural language query.
          - Output: direct answer.
  • 10. Digital Enterprise Research Institute www.deri.ie When to use?  Keyword search:  Simple information needs.  Predictable search behavior.  Vocabulary redundancy (large document collections, Web)  Structured queries:  Precision/recall guarantees.  Small & centralized schemas.  More data volume/less semantic heterogeneity.  QA:  Specification of complex information needs.  More automated semantic interpretation. 10
  • 11. Digital Enterprise Research Institute www.deri.ie QA: Vision 11
  • 12. Digital Enterprise Research Institute www.deri.ie 12 QA: Reality (FB Graph Search)
  • 13. Digital Enterprise Research Institute www.deri.ie 13 QA: Reality (Watson)
  • 14. Digital Enterprise Research Institute www.deri.ie 14 QA: Reality (Watson)
  • 15. Digital Enterprise Research Institute www.deri.ie 15 QA: Reality (Siri)
  • 16. Digital Enterprise Research Institute www.deri.ie To Summarize  QA is usually associated with the delegation of more of the ‘interpretation effort’ to the machines.  QA supports the specification of more complex information needs.  QA, Information Retrieval and Databases are complementary.
  • 17. Digital Enterprise Research Institute www.deri.ie Big Data & QA 17
  • 18. Digital Enterprise Research Institute www.deri.ie Big Data  Big Data: More complete data-based picture of the world. 18
  • 19. Digital Enterprise Research Institute www.deri.ie From Rigid Schema to Schemaless  circa 2000: 10s-100s attributes.  circa 2013: 1,000s-1,000,000s attributes.  Heterogeneous, complex and large-scale databases.  Very large and dynamic “schemas”.
  • 20. Digital Enterprise Research Institute www.deri.ie Multiple Interpretations  Multiple perspectives (conceptualizations) of the reality.  Ambiguity, vagueness, inconsistency.
  • 21. Digital Enterprise Research Institute www.deri.ie Big Data & Dataspaces  Franklin et al. (2005): From Databases to Dataspaces.  Helland (2011): If You Have Too Much Data, then “Good Enough” Is Good Enough.  Fundamental trends:  Co-existence of heterogeneous data.  Semantic best-effort queries.  Pay-as-you-go data integration.  Co-existent query/search services.
  • 22. Digital Enterprise Research Institute www.deri.ie Big Data & NoSQL  From Relational to NoSQL Databases.  NoSQL (Not only SQL).  Four trends (Emil Eifrem)  Trend 1: Data set size.  Trend 2: Connectedness.  Trends 3 & 4: Semi-structured data & decentralized architecture. 22 Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009)
  • 23. Digital Enterprise Research Institute www.deri.ie Trend 1: Data size 23 Gantz & Reinsel, The Digital Universe in 2020 (2012).
  • 24. Digital Enterprise Research Institute www.deri.ie Trend 2: Connectedness 24 Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009).
  • 25. Digital Enterprise Research Institute www.deri.ie Trends 3 & 4: Semi-structured data  Individualization of content.  Decentralization of content generation. 25 Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009)
  • 26. Digital Enterprise Research Institute www.deri.ie Emerging NoSQL Platforms  Key-value stores  Data model: collection of key-value (K/V) pairs.  Example: Voldemort.  Bigtable clones  Data model: Bigtable.  Example: HBase, Hypertable.  Document databases  Data Model: Collections of K/V collections.  Example: MongoDB, CouchDB.  Graph databases  Data Model: nodes, edges.  Example: AllegroGraph, VertexDB, Neo4j, Semantic Web DBs. Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009)
  • 27. Digital Enterprise Research Institute www.deri.ie NoSQL Databases 27 Size Complexity Key-value stores Bigtable clones Document databases Graph databases Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009).
  • 28. Digital Enterprise Research Institute www.deri.ie Big Data  Volume  Velocity  Variety 28 - The most interesting but usually neglected dimension.
  • 29. Digital Enterprise Research Institute www.deri.ie Big Data & Linked Data  Linked Data is Big Data (Variety)  Entity-Attribute-Value (EAV) Data Model.  Entities:  Identifiers  Attributes  Attribute Values  Linked Data as a special type of the EAV Data Model:  EAV/CR: Classes and Relations.  URIs as a superkey.  Dereferenceable URIs.  Standards-based (RDF/HTTP).  EAV as an anti-pattern. Idehen, Understanding Linked Data via EAV Model (2010).
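As a rough illustration of the Entity-Attribute-Value view on this slide, the sketch below stores a few facts as (entity, attribute, value) triples identified by URI-like names. The prefixed identifiers (`dbr:`, `dbo:`), the `dbo:spouse` attribute and the helper `values_of` are illustrative assumptions, not content taken from the talk or from a real dataset dump.

```python
# Toy EAV store: each fact is an (entity, attribute, value) triple.
# All identifiers below are illustrative assumptions.
triples = [
    ("dbr:Bill_Clinton", "dbo:child", "dbr:Chelsea_Clinton"),
    ("dbr:Bill_Clinton", "dbo:almaMater", "dbr:Yale_Law_School"),
    ("dbr:Chelsea_Clinton", "dbo:spouse", "dbr:Marc_Mezvinsky"),
]


def values_of(entity, attribute):
    """Return every value recorded for the given (entity, attribute) pair."""
    return [v for e, a, v in triples if e == entity and a == attribute]


# Follow two attributes in sequence: child, then spouse.
daughters = values_of("dbr:Bill_Clinton", "dbo:child")
spouses = [s for d in daughters for s in values_of(d, "dbo:spouse")]
print(spouses)
```

Because every fact has the same three-part shape, new attributes can be added without changing a schema, which is the flexibility (and the querying difficulty) the slide points at.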
  • 30. Digital Enterprise Research Institute www.deri.ie Big Data & Linked Data 30
  • 31. Digital Enterprise Research Institute www.deri.ie Big Data: Structured queries 31  Big Problem: Structured queries are still the primary way to query databases.
  • 32. Digital Enterprise Research Institute www.deri.ie Big Data: Structured queries 10s-100s attributes vs. 10^3-10^6 attributes. Query construction size x schema size
  • 33. Digital Enterprise Research Institute www.deri.ie Query/Search Spectrum Adapted from Kaufmann et al. (2009) Structured queries
  • 34. Digital Enterprise Research Institute www.deri.ie Vocabulary Problem for Databases Who is the daughter of Bill Clinton married to? Schema-agnostic queries Semantic Gap Possible representations
  • 35. Digital Enterprise Research Institute www.deri.ie Vocabulary Problem for Databases Who is the daughter of Bill Clinton married to ? Semantic Gap Lexical-level Abstraction-level Structural-level 35
  • 36. Digital Enterprise Research Institute www.deri.ie Vocabulary Problem for Databases Who is the daughter of Bill Clinton married to ? Semantic Gap Lexical-level Abstraction-level Structural-level Query: Data 36
  • 37. Digital Enterprise Research Institute www.deri.ie Vocabulary Problem for Databases Who is the daughter of Bill Clinton married to ? Semantic Gap Lexical-level Abstraction-level Structural-level Query: Data 37 Popescu et al. (2003): Semantic tractability
  • 38. Digital Enterprise Research Institute www.deri.ie 38 Big Data & QAs QA Big Data semantic gap, vocabulary gap QA Big Data distributional semantics
  • 39. Digital Enterprise Research Institute www.deri.ie To Summarize  Schema size and heterogeneity represent a fundamental shift for databases.  Addressing the associated data management challenges (especially querying) depends on the development of principled semantic models for databases.  QA/Natural Language Interfaces (NLIs) as schema-agnostic query mechanisms.
  • 40. Digital Enterprise Research Institute www.deri.ie Are NLIs useful for Databases? 40
  • 41. Digital Enterprise Research Institute www.deri.ie 41 NLI & QAs  Natural Language Interfaces (NLI)  Input: Natural language queries  Output: Either processed or unprocessed queries – Processed: Direct answers. – Unprocessed: Database records, text snippets, documents. NLI QA
  • 42. Digital Enterprise Research Institute www.deri.ie 42  Kaufmann & Bernstein (2007).  User study based on 4 different types of interfaces: 1. NLP-Reduce: Simple keyword-based NLI – Domain-independent – Bag of Words – WordNet-based query expansion 2. Querix: More complex NLI – Domain-independent – Query parsing – WordNet-based query expansion – Clarification dialogs Comparative study
  • 43. Digital Enterprise Research Institute www.deri.ie 43  Kaufmann & Bernstein (2007).  User study based on 4 different types of interfaces: 3. Ginseng: Guided input NLI – Grammar-based suggestions 4. Semantic Crystal: Visual query interface – Graphically displayable and clickable Comparative study
  • 44. Digital Enterprise Research Institute www.deri.ie 44 NLP-Reduce Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual End-users? (2007).
  • 45. Digital Enterprise Research Institute www.deri.ie 45 Querix Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual End-users? (2007).
  • 46. Digital Enterprise Research Institute www.deri.ie 46 Ginseng Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual End-users? (2007).
  • 47. Digital Enterprise Research Institute www.deri.ie 47 Semantic Crystal Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual End-users? (2007).
  • 48. Digital Enterprise Research Institute www.deri.ie 48  48 subjects (different areas and ages).  4 questions.  Query metrics + System Usability Scale (SUS).  Tang & Mooney Dataset. User study Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual End-users? (2007).
  • 49. Digital Enterprise Research Institute www.deri.ie 49  48 subjects (different areas and ages).  4 questions.  Query metrics + System Usability Scale (SUS).  Tang & Mooney Dataset. User study Brooke, SUS - A quick and dirty usability scale.
  • 50. Digital Enterprise Research Institute www.deri.ie 50  Querix: Full English question input was judged to be the most useful and best-liked query interface.  Users appreciated the “freedom” given by full natural language queries.  Possible interpretations:  No need to convert natural language to keyword search.  More complete expression of information needs.  Limitations:  Number of queries  Database size Results
  • 51. Digital Enterprise Research Institute www.deri.ie The Anatomy of a QA System 51
  • 52. Digital Enterprise Research Institute www.deri.ie Basic Concepts & Taxonomy  Categorization of questions and answers.  Important for:  Understanding the challenges before attacking the problem.  Scoping the system.  Based on:  Chin-Yew Lin: Question Answering.  Farah Benamara: Question Answering Systems: State of the Art and Future Directions. 52
  • 53. Digital Enterprise Research Institute www.deri.ie Terminology: Question Phrase  The part of the question that says what is being asked:  Wh-words: – who, what, which, when, where, why, and how  Wh-words + nouns, adjectives or adverbs: – “which party …”, “which actress …”, “how long …”, “how tall …”.
  • 54. Digital Enterprise Research Institute www.deri.ie Terminology: Question Type  Useful for distinguishing different processing strategies  FACTOID: “Who is the wife of Barack Obama?”  LIST: “Give me all cities in the US with less than 10000 inhabitants.”  DEFINITION: “Who was Tom Jobim?”  RELATIONSHIP: “What is the connection between Barack Obama and Indonesia?”  SUPERLATIVE: “What is the highest mountain?”  YES-NO: “Was Margaret Thatcher a chemist?”  OPINION: “What do most Americans think of gun control?”  CAUSE & EFFECT: “Why did the revenue of IBM drop?” 54
  • 55. Digital Enterprise Research Institute www.deri.ie Terminology: Answer Type  The class of object sought by the question  Person (from “Who …”)  Place (from “Where …”)  Process & Method (from “How …”)  Date (from “When …”)  Number (from “How many …”)  Explanation & Justification (from “Why …”) 55
  • 56. Digital Enterprise Research Institute www.deri.ie Terminology: Question Focus & Topic  Question focus: the property or entity that is being sought by the question:  “In which city was Barack Obama born?”  “What is the population of Galway?”  Question topic: what the question is generally about:  “What is the height of Mount Everest?” (geography, mountains)  “Which organ is affected by Meniere’s disease?” (medicine)
  • 57. Digital Enterprise Research Institute www.deri.ie Terminology: Data Source Type  Structure level:  Structured data (databases)  Semi-structured data (e.g. comment field in databases, XML)  Free text  Data source:  Single (Centralized)  Multiple  Web-scale 57
  • 58. Digital Enterprise Research Institute www.deri.ie Terminology: Domain Type  Domain Scope:  Open Domain  Domain specific  Data Type:  Text  Image  Sound  Video  Multi-modal QA 58
  • 59. Digital Enterprise Research Institute www.deri.ie Terminology: Answer Format  Long answers  Definition/justification based.  Short answers  Phrases.  Exact answers  Named entities, numbers, aggregate, yes/no 59
  • 60. Digital Enterprise Research Institute www.deri.ie Answer Quality Criteria  Relevance: The degree to which the answer addresses the user’s information needs.  Correctness: The degree to which the answer is factually correct.  Conciseness: The answer should not contain irrelevant information.  Completeness: The answer should be complete.  Simplicity: The answer should be simple for the data consumer to interpret.  Justification: Sufficient context should be provided to support the data consumer in determining the query correctness.
  • 61. Digital Enterprise Research Institute www.deri.ie Answer Assessment  Right: The answer is correct and complete.  Inexact: The answer is incomplete or incorrect.  Unsupported: The answer does not have an appropriate justification.  Wrong: The answer is not appropriate for the question.
  • 62. Digital Enterprise Research Institute www.deri.ie Answer Processing  Simple Extraction: cut and paste of snippets from the original document(s) / records from the data.  Combination: Combines excerpts from multiple sentences, documents / multiple data records, databases.  Summarization: Synthesis from large texts / data collections.  Operational/functional: Depends on the application of functional operators.  Reasoning: Depends on an inference process.
  • 63. Digital Enterprise Research Institute www.deri.ie Complexity of the QA Task  Semantic Tractability (Popescu et al., 2003): Vocabulary distance between the query and the answer.  Answer Locality (Webber et al., 2003): Whether answer fragments are distributed across different document fragments/documents or datasets/dataset records.  Derivability (Webber et al., 2003): Whether the answer is explicit or implicit, i.e. the level of reasoning required to derive it.  Semantic Complexity: Level of ambiguity and discourse/data heterogeneity.
  • 64. Digital Enterprise Research Institute www.deri.ie Main Components 64 Question Analysis Search Answer Extraction Question Keyword query Query features Data records/ Passages Answers Documents/ Datasets
  • 65. Digital Enterprise Research Institute www.deri.ie Main Components  Question Analysis: Includes question parsing, extraction of core features (NER, answer type, etc).  Search (i.e. passage retrieval): Pre-selection of fragments of text (sentences, paragraphs, documents) and data (records, datasets) which may contain the answer.  Answer Extraction: Processing of the answer based on the passages. 65
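The three components above can be sketched as a chain of functions. This is a deliberately naive illustration under stated assumptions: the stop-word list, the `QueryFeatures` class, the keyword-overlap search and the passage-level "answer" are all hypothetical simplifications, and a real system would use proper parsing, NER and answer extraction.

```python
from dataclasses import dataclass

# Hypothetical, hand-picked stop words for this toy example only.
STOP_WORDS = {"who", "is", "the", "of", "to", "what", "in"}


@dataclass
class QueryFeatures:
    keywords: list
    answer_type: str


def analyze_question(question):
    """Question Analysis: extract keywords and a coarse answer type."""
    tokens = question.rstrip("?").split()
    answer_type = "PERSON" if tokens[0].lower() == "who" else "OTHER"
    keywords = [t for t in tokens if t.lower() not in STOP_WORDS]
    return QueryFeatures(keywords, answer_type)


def search(features, passages):
    """Search: pre-select passages that mention at least one keyword."""
    return [p for p in passages
            if any(k.lower() in p.lower() for k in features.keywords)]


def extract_answer(features, candidates):
    """Answer Extraction (placeholder): pick the passage with most keyword hits."""
    def overlap(passage):
        return sum(k.lower() in passage.lower() for k in features.keywords)
    return max(candidates, key=overlap) if candidates else None


passages = [
    "Chelsea Clinton is married to Marc Mezvinsky.",
    "Galway is a city in Ireland.",
]
features = analyze_question("Who is the daughter of Bill Clinton married to?")
answer = extract_answer(features, search(features, passages))
print(features.answer_type, answer)
```

The point of the sketch is the data flow (question, then query features, then candidate passages, then answer), not the quality of any individual step.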
  • 66. Digital Enterprise Research Institute www.deri.ie Evaluation of QA Systems 66
  • 67. Digital Enterprise Research Institute www.deri.ie Evaluation Campaigns  Test Collection  Questions  Datasets  Answers (Gold-standard)  Evaluation Measures
  • 68. Digital Enterprise Research Institute www.deri.ie Recall  Measures how complete the answer set is.  The fraction of relevant instances that are retrieved.  Which are the Jovian planets in the Solar System?  Returned Answers: – Mercury – Jupiter – Saturn  Gold-standard: – Jupiter – Saturn – Neptune – Uranus Recall = 2/4 = 0.5
  • 69. Digital Enterprise Research Institute www.deri.ie Precision  Measures how accurate the answer set is.  The fraction of retrieved instances that are relevant.  Which are the Jovian planets in the Solar System?  Returned Answers: – Mercury – Jupiter – Saturn  Gold-standard: – Jupiter – Saturn – Neptune – Uranus Precision = 2/3 = 0.67
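The Jovian-planets example from the Recall and Precision slides can be checked with a few lines of code. The function name `precision_recall` is an illustrative choice; the answer lists are the ones given on the slides.

```python
returned = ["Mercury", "Jupiter", "Saturn"]
gold = {"Jupiter", "Saturn", "Neptune", "Uranus"}


def precision_recall(returned, gold):
    """Precision = hits / returned answers; recall = hits / gold answers."""
    hits = [a for a in returned if a in gold]
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall


precision, recall = precision_recall(returned, gold)
print(round(precision, 2), recall)
```

Two of the three returned answers are correct (precision 2/3), and two of the four gold answers were found (recall 2/4).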
  • 70. Digital Enterprise Research Institute www.deri.ie Mean Reciprocal Rank (MRR)  Measures the ranking quality.  The Reciprocal-Rank (RR) of a query is 1/r, where r is the rank at which the system returns the first relevant entity. MRR is the mean of the reciprocal ranks over a set of queries.  Which are the Jovian planets in the Solar System?  Returned Answers: – Mercury – Jupiter – Saturn  Gold-standard: – Jupiter – Saturn – Neptune – Uranus rr = 1/2 = 0.5
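A minimal sketch of the reciprocal rank computation for the example on this slide, with the mean over several queries added for completeness. The second query in `runs` is a made-up illustration, not data from the talk.

```python
def reciprocal_rank(ranked, gold):
    """Return 1/r for the first relevant answer at rank r, or 0.0 if none."""
    for rank, answer in enumerate(ranked, start=1):
        if answer in gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over (ranked_answers, gold_set) pairs."""
    return sum(reciprocal_rank(r, g) for r, g in runs) / len(runs)


gold = {"Jupiter", "Saturn", "Neptune", "Uranus"}
rr = reciprocal_rank(["Mercury", "Jupiter", "Saturn"], gold)

# Hypothetical second query whose first answer is already relevant.
runs = [(["Mercury", "Jupiter", "Saturn"], gold), (["Jupiter"], gold)]
mrr = mean_reciprocal_rank(runs)
print(rr, mrr)
```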
  • 71. Digital Enterprise Research Institute www.deri.ie Mean Average Interpolated Precision (MAiP)  Computes the Interpolated-Precision at a set of n standard recall levels (1%, 10%, 20%, etc.).  Average-Interpolated-Precision (AiP) is a single-valued measure that reflects the performance of a search engine over all the relevant results of a query.  Mean-Average-Interpolated-Precision (MAiP) is the mean of AiP over all the queries, reflecting the overall performance of a system.
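The sketch below shows one common way to compute interpolated precision: at each recall level, take the best precision reached at that recall or higher. The choice of 11 recall levels (0.0 to 1.0 in steps of 0.1) is an assumption made for brevity; evaluation campaigns fix their own set of levels.

```python
def interpolated_precisions(ranked, gold, levels):
    """Interpolated precision at each recall level: max precision at recall >= level."""
    points = []  # (recall, precision) after each rank
    hits = 0
    for rank, answer in enumerate(ranked, start=1):
        if answer in gold:
            hits += 1
        points.append((hits / len(gold), hits / rank))
    result = []
    for level in levels:
        reachable = [p for r, p in points if r >= level]
        result.append(max(reachable) if reachable else 0.0)
    return result


def average_interpolated_precision(ranked, gold, levels):
    """AiP: mean of the interpolated precisions over the recall levels."""
    values = interpolated_precisions(ranked, gold, levels)
    return sum(values) / len(values)


LEVELS = [i / 10 for i in range(11)]  # assumed recall levels
ranked = ["Mercury", "Jupiter", "Saturn"]
gold = {"Jupiter", "Saturn", "Neptune", "Uranus"}
aip = average_interpolated_precision(ranked, gold, LEVELS)
print(round(aip, 4))
```

MAiP would then be the mean of `aip` over all queries in the test collection.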
  • 72. Digital Enterprise Research Institute www.deri.ie Normalized Discounted Cumulative Gain (NDCG)  Discounted-Cumulative-Gain (DCG) uses a graded relevance scale to measure the gain of a system based on the positions of the relevant entities in the result set.  This measure gives a lower gain to relevant entities returned at lower ranks than to those returned at higher ranks.
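A small sketch of DCG and its normalized form, assuming the widely used logarithmic discount (gain at position i divided by log2(i + 1)). The slide does not state the exact discount function, so that choice is an assumption, as is normalizing against the ideal ordering of the same relevance grades.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with a log2(position + 1) discount."""
    return sum(rel / math.log2(pos + 1)
               for pos, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering of the same grades."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Binary grades for the returned list Mercury, Jupiter, Saturn.
score = ndcg([0, 1, 1])
print(round(score, 4))
```

A relevant entity at rank 1 contributes its full gain, while the same entity at rank 3 contributes only half of it, which is the behaviour described on the slide.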
  • 73. Digital Enterprise Research Institute www.deri.ie Test Collections  Question Answering over Linked Data (QALD-CLEF)  INEX Linked Data Track  BioASQ  SemSearch
  • 74. Digital Enterprise Research Institute www.deri.ie QALD
  • 75. Digital Enterprise Research Institute www.deri.ie QALD-1, ESWC (2011)  Datasets:  DBpedia 3.6 (RDF)  MusicBrainz (RDF)  Tasks:  Training questions: 50 questions for each dataset  Test questions: 50 questions for each dataset http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/index.php?x=challenge&q=1
  • 76. Digital Enterprise Research Institute www.deri.ie QALD-1, ESWC (2011)
  • 77. Digital Enterprise Research Institute www.deri.ie QALD-1, ESWC (2011)  Which presidents were born in 1945?  Who developed the video game World of Warcraft?  List all episodes of the first season of the HBO television series The Sopranos!  Who produced the most films?  Which mountains are higher than the Nanga Parbat?  Give me all actors starring in Batman Begins.  Which software has been developed by organizations founded in California?  Which companies work in the aerospace industry as well as on nuclear reactor technology?  Is Christian Bale starring in Batman Begins?  Give me the websites of companies with more than 500000 employees.  Which cities have more than 2 million inhabitants?
  • 78. Digital Enterprise Research Institute www.deri.ie QALD-2, ESWC (2012)  Datasets:  DBpedia 3.7 (RDF)  MusicBrainz (RDF)  Tasks:  Training questions: 100 questions for each dataset  Test questions: 50 questions for each dataset http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/index.php?x=challenge&q=1
  • 79. Digital Enterprise Research Institute www.deri.ie QALD-3, CLEF (2013)  Datasets:  DBpedia 3.7 (RDF)  MusicBrainz (RDF)  Tasks:  Multilingual QA – Given an RDF dataset and a natural language question or set of keywords in one of six languages (English, Spanish, German, Italian, French, Dutch), either return the correct answers, or a SPARQL query that retrieves these answers.  Ontology Lexicalization http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/index.php?x=task1&q=3
  • 80. Digital Enterprise Research Institute www.deri.ie INEX Linked Data Track (2013)  Focuses on the combination of textual and structured data.  Datasets:  English Wikipedia (MediaWiki XML Format)  DBpedia 3.8 & YAGO2 (RDF)  Links among the Wikipedia, DBpedia 3.8, and YAGO2 URIs.  Tasks:  Ad-hoc Task: return a ranked list of results in response to a search topic that is formulated as a keyword query (144 search topics).  Jeopardy Task: Investigate retrieval techniques over a set of natural-language Jeopardy clues (105 search topics – 74 (2012) + 31 (2013)). https://inex.mmci.uni-saarland.de/tracks/lod/
  • 81. Digital Enterprise Research Institute www.deri.ie INEX Linked Data Track (2013)
  • 82. Digital Enterprise Research Institute www.deri.ie SemSearch Challenge  Focuses on entity search over Linked Datasets.  Datasets:  Sample of Linked Data crawled from publicly available sources (based on the Billion Triple Challenge 2009).  Tasks:  Entity Search: Queries that refer to one particular entity. Tiny sample of Yahoo! Search queries.  List Search: The goal of this track is to select objects that match particular criteria. These queries have been hand-written by the organizing committee. http://semsearch.yahoo.com/datasets.php#
  • 83. Digital Enterprise Research Institute www.deri.ie SemSearch Challenge  List Search queries:  republics of the former Yugoslavia  ten ancient Greek city kingdoms of Cyprus  the four of the companions of the prophet  Japanese-born players who have played in MLB  where the British monarch is also head of state  nations where Portuguese is an official language  bishops who sat in the House of Lords  Apollo astronauts who walked on the Moon
  • 84. Digital Enterprise Research Institute www.deri.ie SemSearch Challenge  Entity Search queries:  1978 cj5 jeep  employment agencies w. 14th street  nyc zip code  waterville Maine  LOS ANGELES CALIFORNIA  ibm  KARL BENZ  MIT
  • 85. Digital Enterprise Research Institute www.deri.ie BioASQ  Datasets:  PubMed documents  Tasks:  1a: Large-Scale Online Biomedical Semantic Indexing – Automatic annotation of PubMed documents. – Training data is provided.  1b: Introductory Biomedical Semantic QA – 300 questions and related material (concepts, triples and golden answers).
  • 86. Digital Enterprise Research Institute www.deri.ie Baseline Balog & Neumayer, A Test Collection for Entity Search in DBpedia (2013).
  • 87. Digital Enterprise Research Institute www.deri.ie Going Deeper  Metrics, Statistics, Tests - Tetsuya Sakai  http://www.promise-noe.eu/documents/10156/26e7f254-1feb-41  Building test Collections (IR Evaluation - Ian Soboroff)  http://www.promise-noe.eu/documents/10156/951b6dfb-a404-46
  • 88. Digital Enterprise Research Institute www.deri.ie QA over Linked Data 88
  • 89. Digital Enterprise Research Institute www.deri.ie The Semantic Web Vision 2001: Software which is able to understand meaning (intelligent, flexible) Leveraging the Web for information scale
  • 90. Digital Enterprise Research Institute www.deri.ie The Semantic Web Vision  What was the plan to achieve it?  Build a Semantic Web Stack  Which covers both representation and reasoning
  • 91. Digital Enterprise Research Institute www.deri.ie Reality Check  Adoption:  No significant data growth  Ontologies are not straightforward to build:  People are not familiar with the tools and principles  Difficult to keep consistency at Web scale  Scalability
  • 92. Digital Enterprise Research Institute www.deri.ie Reasoning  Problems:  Consistency  Scalability Logic World Web World
  • 93. Digital Enterprise Research Institute www.deri.ie Linked Data  The Web as a Huge Database  Fundamental step for data creation 2006:
  • 94. Digital Enterprise Research Institute www.deri.ie No Reasoning, no Fun?  Where are the intelligence and flexibility?  We will be back to this point in a minute
  • 95. Digital Enterprise Research Institute www.deri.ie From which university did the wife of Barack Obama graduate? Consuming/Using Linked Data  With Linked Data we are still in the DB world
  • 96. Digital Enterprise Research Institute www.deri.ie Consuming/Using Linked Data  With Linked Data we are still in the DB world  (but slightly worse)
  • 97. Digital Enterprise Research Institute www.deri.ie Linked Data  Data Model Features:  Graph-based data model  Extensible schema  Entity-centric data integration  Specific Features:  Designed over open Web standards  Based on the Web infrastructure (HTTP, URIs)
  • 98. Digital Enterprise Research Institute www.deri.ie Linked Data  Positives:  Solid adoption in the Open Data context (eGovernment, eScience, etc.)  Existing data is relevant (you can build real applications)  Negatives:  Data consumption is a problem  Data generation beyond database mapping/triplification is also a problem  Still far from the Semantic Web vision
  • 99. Digital Enterprise Research Institute www.deri.ie QA for Linked Data  Addresses practical problems of data accessibility in a data heterogeneity scenario.  A fundamental part of the original Semantic Web vision. 99
  • 100. Digital Enterprise Research Institute www.deri.ie QA4LD Requirements  High usability:  Supporting natural language queries.  High expressivity:  Paths, conjunctions, disjunctions, aggregations, conditions.  Accurate & comprehensive semantic matching:  High precision and recall.  Low maintenance effort:  Easily transportable across datasets from different domains (minimum adaptation effort/low adaptation time).  Low query execution time:  Suitable for interactive querying.  High scalability:  Scalable to a large number of datasets (Organization-scale, Web-scale).
  • 101. Digital Enterprise Research Institute www.deri.ie Exemplar QA Systems  AquaLog & PowerAqua (Lopez et al. 2006)  ORAKEL (Cimiano et al. 2007)  QuestIO & FREyA (Damljanovic et al. 2010)  Treo (Freitas et al. 2011, 2012)
  • 102. Digital Enterprise Research Institute www.deri.ie PowerAqua (Lopez et al. 2006)  Key contribution: semantic similarity mapping.  Terminological Matching:  WordNet-based  Ontology Based  String similarity  Sense-based similarity matcher  Evaluation: QALD (2011).  Extends the AquaLog system. 102
  • 103. Digital Enterprise Research Institute www.deri.ie WordNet 103
  • 104. Digital Enterprise Research Institute www.deri.ie Sense-based similarity matcher  Two words are strongly similar if any of the following holds:  1. They have a synset in common (e.g. “human” and “person”).  2. One word is a hypernym/hyponym in the taxonomy of the other word.  3. There exists an allowable “is-a” path connecting a synset associated with each word.  4. If any of the previous cases is true and the definition (gloss) of one of the synsets of the word (or its direct hypernyms/hyponyms) includes the other word as one of its synonyms, we say that they are highly similar. Lopez et al. 2006
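Rules 1 to 3 can be sketched over a tiny hand-written taxonomy. Everything in the example is an assumption made for illustration: the synset identifiers, the `SYNSETS` and `HYPERNYM` tables, and the decision to treat any ancestor chain as an "allowable" is-a path. A real matcher would query WordNet and restrict which paths count; rule 4 (gloss matching) is not covered.

```python
# Hypothetical mini-taxonomy standing in for WordNet.
SYNSETS = {
    "human": {"s_person"},
    "person": {"s_person"},
    "actress": {"s_actress"},
    "city": {"s_city"},
}
HYPERNYM = {  # child synset -> parent synset
    "s_actress": "s_performer",
    "s_performer": "s_person",
    "s_person": "s_organism",
    "s_city": "s_location",
}


def ancestors(synset):
    """Follow the is-a links upward and return the chain of hypernyms."""
    chain = []
    while synset in HYPERNYM:
        synset = HYPERNYM[synset]
        chain.append(synset)
    return chain


def strongly_similar(word1, word2):
    """Rule 1: shared synset. Rules 2-3: one synset is an ancestor of the other."""
    s1, s2 = SYNSETS.get(word1, set()), SYNSETS.get(word2, set())
    if s1 & s2:
        return True
    return any(a in ancestors(b) or b in ancestors(a) for a in s1 for b in s2)


print(strongly_similar("human", "person"),
      strongly_similar("actress", "person"),
      strongly_similar("city", "person"))
```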
  • 105. Digital Enterprise Research Institute www.deri.ie PowerAqua (Lopez et al. 2006) 105 Lopez et al. 2006
  • 106. Digital Enterprise Research Institute www.deri.ie ORAKEL (Cimiano et al. 2007)  Key contribution:  FrameMapper lexicon engineering approach.  Parsing strategy: Logical Description Grammars (LDGs). – LDG is inspired by Lexicalized Tree Adjoining Grammars (LTAGs). – An important characteristic of these trees is that they encapsulate all syntactic/semantic arguments of a word.  Terminological Matching:  FrameMapper  Evaluation:  Dataset: domain-specific  Dimensions: Relevance, Performance, lexicon construction 106
  • 107. Digital Enterprise Research Institute www.deri.ie ORAKEL (Cimiano et al. 2007)  More sophisticated parsing strategy: Logical Description Grammars (LDGs).  LDG is inspired by Lexicalized Tree Adjoining Grammars (LTAGs).  An important characteristic of these trees is that they encapsulate all syntactic/semantic arguments of a word. 107
  • 108. Digital Enterprise Research Institute www.deri.ie ORAKEL (Cimiano et al. 2007) 108
  • 109. Digital Enterprise Research Institute www.deri.ie ORAKEL (Cimiano et al. 2007) 109
  • 110. Digital Enterprise Research Institute www.deri.ie FREyA (Damljanovic et al. 2010)  Key contribution: user interaction (dialogs) for terminological matching.  Terminological Matching:  WordNet & Cyc (synonyms)  String similarity (Monge-Elkan + Soundex)  User feedback  Extends the QuestIO system.  Evaluation: Mooney (2010), QALD (2011)
  • 111. Digital Enterprise Research Institute www.deri.ie FREyA (Damljanovic et al. 2010) Ontology-based Gazetteer (GATE) SPARQL Generation Ontology Query Analysis Synonym (WordNet + Cyc)
  • 112. Digital Enterprise Research Institute www.deri.ie Other QA Approaches  Unger et al. (2012), Template-based Question Answering over RDF Data.  Cabrio et al. (2012), QAKIS: An Open Domain QA System based on Relational Patterns. 112
  • 113. Digital Enterprise Research Institute www.deri.ie Treo: Detailed Case Study 113
  • 114. Digital Enterprise Research Institute www.deri.ie Treo (Irish): Direction, path
  • 115. Digital Enterprise Research Institute www.deri.ie Treo (Freitas et al. 2011, 2012)  Key contribution:  Distributional semantic relatedness matching model  Distributional relational space.  Terminological Matching:  Explicit Semantic Analysis (ESA)  WordNet  String similarity + node cardinality  Evaluation: QALD (2011) 115
  • 116. Digital Enterprise Research Institute www.deri.ie Statistical analysis Linked Data
  • 117. Digital Enterprise Research Institute www.deri.ie Query Pre-Processing (Question Analysis)  Transform natural language queries into triple patterns “Who is the daughter of Bill Clinton married to?”
  • 118. Digital Enterprise Research Institute www.deri.ie Query Pre-Processing (Question Analysis)  Step 1: POS Tagging  Who/WP  is/VBZ  the/DT  daughter/NN  of/IN  Bill/NNP  Clinton/NNP  married/VBN  to/TO  ?/.
  • 119. Digital Enterprise Research Institute www.deri.ie Query Pre-Processing (Question Analysis)  Step 2: Core Entity Recognition  Rules-based: POS Tag + TF/IDF Who is the daughter of Bill Clinton married to? (PROBABLY AN INSTANCE)
  • 120. Digital Enterprise Research Institute www.deri.ie Query Pre-Processing (Question Analysis)  Step 3: Determine answer type  Rules-based. Who is the daughter of Bill Clinton married to? (PERSON)
  • 121. Digital Enterprise Research Institute www.deri.ie Query Pre-Processing (Question Analysis)  Step 4: Dependency parsing  dep(married-8, Who-1)  auxpass(married-8, is-2)  det(daughter-4, the-3)  nsubjpass(married-8, daughter-4)  prep(daughter-4, of-5)  nn(Clinton-7, Bill-6)  pobj(of-5, Clinton-7)  root(ROOT-0, married-8)  xcomp(married-8, to-9)
  • 122. Digital Enterprise Research Institute www.deri.ie Query Pre-Processing (Question Analysis)  Step 5: Determine Partial Ordered Dependency Structure (PODS)  Rules based. – Remove stop words. – Merge words into entities. – Reorder structure from core entity position. Bill Clinton daughter married to (INSTANCE) Person ANSWER TYPE QUESTION FOCUS
  • 123. Digital Enterprise Research Institute www.deri.ie Query Pre-Processing (Question Analysis)  Step 5: Determine Partial Ordered Dependency Structure (PODS)  Rules based. – Remove stop words. – Merge words into entities. – Reorder structure from core entity position. Bill Clinton daughter married to (INSTANCE) Person (PREDICATE) (PREDICATE) Query Features
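The PODS rules on these slides (remove stop words, merge words into entities, reorder from the core entity) can be approximated over the POS-tagged tokens from Step 1. This is a simplified sketch, not Treo's actual rule set: it ignores the dependency parse from Step 4, and the `DROP_TAGS` set and the merging heuristics are assumptions chosen so that the running example works.

```python
# POS-tagged question, as shown in Step 1.
TAGGED = [("Who", "WP"), ("is", "VBZ"), ("the", "DT"), ("daughter", "NN"),
          ("of", "IN"), ("Bill", "NNP"), ("Clinton", "NNP"),
          ("married", "VBN"), ("to", "TO"), ("?", ".")]

# Assumed stop-word tags for this toy example.
DROP_TAGS = {"WP", "VBZ", "DT", "IN", "."}


def build_pods(tagged):
    """Return [(text, role)] with the core entity (INSTANCE) placed first."""
    units, prev_tag = [], None
    for word, tag in tagged:
        if tag == "NNP" and prev_tag == "NNP":
            # Merge consecutive proper nouns into one entity.
            units[-1] = (units[-1][0] + " " + word, "INSTANCE")
        elif tag == "NNP":
            units.append((word, "INSTANCE"))
        elif tag == "TO":
            # Attach the particle to a directly preceding kept verb ("married to").
            if prev_tag and prev_tag.startswith("VB") and prev_tag not in DROP_TAGS:
                units[-1] = (units[-1][0] + " " + word, "PREDICATE")
        elif tag not in DROP_TAGS:
            units.append((word, "PREDICATE"))
        prev_tag = tag
    core = next((u for u in units if u[1] == "INSTANCE"), None)
    if core is None:
        return units
    return [core] + [u for u in units if u is not core]


pods = build_pods(TAGGED)
print(pods)
```

The result mirrors the structure shown on the slide: the instance "Bill Clinton" followed by the predicates "daughter" and "married to".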
  • 124. Digital Enterprise Research Institute www.deri.ie Query Planning  Map query features into a query plan.  A query plan contains a sequence of:  Search operations.  Navigation operations. (INSTANCE) (PREDICATE) (PREDICATE) Query Features  (1) INSTANCE SEARCH (Bill Clinton)  (2) DISAMBIGUATE ENTITY TYPE  (3) GENERATE ENTITY FACETS  (4) p1 <- SEARCH RELATED PREDICATE (Bill Clinton, daughter)  (5) e1 <- GET ASSOCIATED ENTITIES (Bill Clinton, p1)  (6) p2 <- SEARCH RELATED PREDICATE (e1, married to)  (7) e2 <- GET ASSOCIATED ENTITIES (e1, p2)  (8) POST PROCESS (Bill Clinton, e1, p1, e2, p2) Query Plan
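The mapping from query features to the eight-step plan can be sketched as a function that emits one search step and one navigation step per predicate. The tuple encoding of operations and the function name `build_query_plan` are illustrative assumptions; only the operation names and their order come from the slide.

```python
def build_query_plan(instance, predicates):
    """Expand (instance, [predicates]) into an ordered list of plan operations."""
    plan = [
        ("INSTANCE_SEARCH", instance),
        ("DISAMBIGUATE_ENTITY_TYPE",),
        ("GENERATE_ENTITY_FACETS",),
    ]
    current = instance          # entity (or variable) the next step starts from
    bindings = [instance]       # everything handed to post-processing
    for i, predicate in enumerate(predicates, start=1):
        p_var, e_var = f"p{i}", f"e{i}"
        plan.append(("SEARCH_RELATED_PREDICATE", current, predicate, p_var))
        plan.append(("GET_ASSOCIATED_ENTITIES", current, p_var, e_var))
        bindings += [e_var, p_var]
        current = e_var
    plan.append(("POST_PROCESS", *bindings))
    return plan


plan = build_query_plan("Bill Clinton", ["daughter", "married to"])
for step_number, operation in enumerate(plan, start=1):
    print(step_number, operation)
```

Each predicate in the query features adds a search/navigation pair, so longer questions simply yield longer plans.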
  • 125. Digital Enterprise Research Institute www.deri.ie Core Entity Search  Entity Index:  Construction of entity index (instances, classes and complex classes).  Extract terms from URIs and index the terms using an inverted index.  Search instances by keywords  Entity Search (Instance Example):  Input: Keyword search (Bill Clinton).  Ranking by: string similarity and entity cardinality.  Output: List of URIs.  Core rationale:  Prioritize the matching of less ambiguous/polysemic entities.  Prioritize more popular entities.
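A minimal sketch of the entity index and search described above: terms are extracted from URIs into an inverted index, and candidates are ranked by string similarity combined with cardinality. The URIs, the cardinality numbers and the scoring formula (similarity times the log of the cardinality) are assumptions for illustration; the slide only says that both signals are used.

```python
import math
import re
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical entities with made-up cardinalities (e.g. number of triples).
ENTITIES = {
    ":Bill_Clinton": 500,
    ":Hillary_Clinton": 450,
    ":Clinton,_Iowa": 40,
    ":Bill_Gates": 480,
}


def label(uri):
    """Turn a URI local name into a lowercase label."""
    return uri.lstrip(":").replace("_", " ").lower()


def terms(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(entities):
    """Inverted index: term -> set of URIs whose label contains the term."""
    index = defaultdict(set)
    for uri in entities:
        for term in terms(label(uri)):
            index[term].add(uri)
    return index


def search(query, index, entities):
    """Return candidate URIs ranked by similarity * log(1 + cardinality)."""
    candidates = set()
    for term in terms(query):
        candidates |= index.get(term, set())

    def score(uri):
        similarity = SequenceMatcher(None, query.lower(), label(uri)).ratio()
        return similarity * math.log(1 + entities[uri])

    return sorted(candidates, key=score, reverse=True)


index = build_index(ENTITIES)
results = search("Bill Clinton", index, ENTITIES)
print(results[0])
```

The exact-match label wins here, while the cardinality term keeps a popular entity ahead of an obscure one when string similarities are close.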
  • 126. Digital Enterprise Research Institute www.deri.ie Core Entity Search Bill Clinton daughter married to Person :Bill_Clinton Query: Linked Data: Entity Search
  • 127. Digital Enterprise Research Institute www.deri.ie Core Entity Search  Entity Disambiguation:  Get entity types.  Compute distributional semantic relatedness between entity types: – Explicit Semantic Analysis (ESA). – semantic relatedness (Bill Clinton, President) = 0.11 – semantic relatedness (Bill Clinton, Politician)= 0.052 – semantic relatedness (Bill Clinton, Actor) = 0.011  Generate Entity Types facets. http://vmdeb14.deri.ie:8080/esa/service?task=esa&term1=bill%20clinton&term2=politician
  • 128. Digital Enterprise Research Institute www.deri.ie Distributional Semantic Search Bill Clinton daughter married to Person :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child :Baptists :religion :Yale_Law_School :almaMater ... (PIVOT ENTITY) (ASSOCIATED TRIPLES)
  • 129. Digital Enterprise Research Institute www.deri.ie Distributional Semantic Search Bill Clinton daughter married to Person :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child :Baptists :religion :Yale_Law_School :almaMater ... sem_rel(daughter,child)=0.054 sem_rel(daughter,religion)=0.004 sem_rel(daughter,alma mater)=0.001 Which properties are semantically related to ‘daughter’?
  • 130. Digital Enterprise Research Institute www.deri.ie Distributional Semantic Search Bill Clinton daughter married to Person :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child
  • 131. Digital Enterprise Research Institute www.deri.ie Distributional Semantic Relatedness  Computation of a measure of “semantic proximity” between two terms.  Allows a semantic approximate matching between query terms and dataset terms.  It supports a commonsense reasoning-like behavior based on the knowledge embedded in the corpus.
  • 132. Digital Enterprise Research Institute www.deri.ie Distributional Semantic Search  Use distributional semantics to semantically match query terms to predicates and classes.  Distributional principle: Words that co-occur together tend to have related meaning.  Allows the creation of a comprehensive semantic model from unstructured text.  Based on statistical patterns over large amounts of text.  No human annotations.  Distributional semantics can be used to compute a semantic relatedness measure between two words.
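To make the distributional principle concrete, here is a toy relatedness measure in the spirit of Explicit Semantic Analysis: each term is represented by its TF-IDF weights over a set of concept documents, and relatedness is the cosine between two such vectors. This is only a sketch under strong assumptions. The four "concept documents" are invented, real ESA uses Wikipedia articles as concepts, and the function names are hypothetical.

```python
import math
from collections import Counter


def build_concept_vectors(documents):
    """Map each term to a sparse vector of TF-IDF weights over concept ids."""
    n_docs = len(documents)
    term_counts = {
        concept: Counter(text.lower().split()) for concept, text in documents.items()
    }

    # Document frequency: in how many concepts does each term occur?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    vectors = {}
    for concept, counts in term_counts.items():
        for term, tf in counts.items():
            idf = math.log(n_docs / doc_freq[term])
            vectors.setdefault(term, {})[concept] = tf * idf
    return vectors


def relatedness(vectors, a, b):
    """Cosine similarity between the concept vectors of two terms."""
    va, vb = vectors.get(a, {}), vectors.get(b, {})
    dot = sum(w * vb.get(concept, 0.0) for concept, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented miniature corpus; each entry stands in for one concept article.
CONCEPTS = {
    "Family": "daughter child son parent family",
    "Clinton family": "daughter child president family",
    "Religion": "religion church faith belief",
    "University": "university school degree alma mater",
}

vectors = build_concept_vectors(CONCEPTS)
score_child = relatedness(vectors, "daughter", "child")
score_religion = relatedness(vectors, "daughter", "religion")
```

With this toy corpus "daughter" and "child" occur in the same concepts and therefore score higher than "daughter" and "religion", which share none. No annotation is involved; the scores come only from co-occurrence statistics.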
  • 133. Digital Enterprise Research Institute www.deri.ie Distributional Semantic Search Bill Clinton daughter married to Person :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child (PIVOT ENTITY)
  • 134. Digital Enterprise Research Institute www.deri.ie Distributional Semantic Search Bill Clinton daughter married to Person :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child :Marc_Mezvinsky :spouse
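The navigation shown across these slides (start at the pivot entity, pick the property most related to the next query term, follow it, repeat) can be sketched over a handful of triples. Everything below is illustrative: the triple list is a tiny hand-written excerpt, the relatedness table mixes values read from the slides with invented ones (the entries for "religion" and "married to" are assumptions), and the threshold of 0.01 is arbitrary.

```python
import re

TRIPLES = [
    (":Bill_Clinton", ":child", ":Chelsea_Clinton"),
    (":Bill_Clinton", ":religion", ":Baptists"),
    (":Bill_Clinton", ":almaMater", ":Yale_Law_School"),
    (":Chelsea_Clinton", ":spouse", ":Marc_Mezvinsky"),
]

# Lookup table standing in for a distributional relatedness service.
SCORES = {
    ("daughter", "child"): 0.054,
    ("daughter", "religion"): 0.004,    # assumed value
    ("daughter", "alma mater"): 0.001,
    ("married to", "spouse"): 0.05,     # assumed value
}


def property_label(prop):
    """Turn a property name such as ':almaMater' into 'alma mater'."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", prop.lstrip(":")).lower()


def best_property(entity, query_term, threshold=0.01):
    """Return the property of `entity` most related to `query_term`, or None."""
    candidates = {p for s, p, _ in TRIPLES if s == entity}
    scored = [
        (SCORES.get((query_term, property_label(p)), 0.0), p) for p in candidates
    ]
    scored = [item for item in scored if item[0] >= threshold]
    return max(scored)[1] if scored else None


def answer(pivot, query_terms):
    """Follow one best-matching property per query term, starting at the pivot."""
    frontier = [pivot]
    for term in query_terms:
        next_frontier = []
        for entity in frontier:
            prop = best_property(entity, term)
            if prop is not None:
                next_frontier += [o for s, p, o in TRIPLES if s == entity and p == prop]
        frontier = next_frontier
    return frontier


result = answer(":Bill_Clinton", ["daughter", "married to"])
```

Using each resolved entity as the next pivot keeps the search space small: only the properties attached to the current entity have to be compared against the query term.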
  • 135. Digital Enterprise Research Institute www.deri.ie Semantic Relatedness  Computation of a measure of “semantic proximity” between two terms  Allows a semantic approximate matching between query terms and dataset terms  It supports a reasoning-like behavior based on the knowledge embedded in the corpus
  • 136. Digital Enterprise Research Institute www.deri.ie Distributional Semantic Search  Use distributional semantics to semantically match query terms to predicates and classes  Distributional principle: Words that co-occur together tend to have related meaning  Allows the creation of a comprehensive semantic model from unstructured text  Based on statistical patterns over large amounts of text  No human annotations  Distributional semantics can be used to compute a semantic relatedness measure between two words
  • 137. Digital Enterprise Research Institute www.deri.ie Results
  • 138. Digital Enterprise Research Institute www.deri.ie Post-Processing
  • 139. Digital Enterprise Research Institute www.deri.ie Second Query Example What is the highest mountain? (CLASS) (OPERATOR) Query Features mountain - highest PODS
  • 140. Digital Enterprise Research Institute www.deri.ie Entity Search Mountain highest :Mountain Query: Linked Data: :typeOf (PIVOT ENTITY)
  • 141. Digital Enterprise Research Institute www.deri.ie Extensional Expansion Mountain highest :Mountain Query: Linked Data: :Everest :typeOf (PIVOT ENTITY) :K2:typeOf ...
  • 142. Digital Enterprise Research Institute www.deri.ie Distributional Semantic Matching Mountain highest :Mountain Query: Linked Data: :Everest :typeOf (PIVOT ENTITY) :K2:typeOf ... :elevation :location ... :deathPlaceOf
  • 143. Digital Enterprise Research Institute www.deri.ie Get all numerical values Mountain highest :Mountain Query: Linked Data: :Everest :typeOf (PIVOT ENTITY) :K2:typeOf ... :elevation :elevation 8848 m 8611 m
  • 144. Digital Enterprise Research Institute www.deri.ie Apply operator functional definition Mountain highest :Mountain Query: Linked Data: :Everest :typeOf (PIVOT ENTITY) :K2:typeOf ... :elevation :elevation 8848 m 8611 m SORT TOP_MOST
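The last step of the second example, applying the functional definition of the operator "highest" to the collected numerical values, amounts to a sort followed by taking the top item. A minimal sketch follows; the elevation values are the two shown on the slide, while the `apply_operator` name and the extra "lowest" operator are assumptions added for illustration.

```python
# Elevation values (in metres) as shown on the slide.
ELEVATION_M = {":Everest": 8848, ":K2": 8611}

# Operator -> sort direction. "lowest" is a hypothetical extra entry.
OPERATOR_DESCENDING = {"highest": True, "lowest": False}


def apply_operator(operator, values):
    """SORT the (entity, value) pairs and return the TOP_MOST one."""
    ranked = sorted(
        values.items(),
        key=lambda item: item[1],
        reverse=OPERATOR_DESCENDING[operator],
    )
    return ranked[0]


top = apply_operator("highest", ELEVATION_M)
```

Keeping the operator definitions in a table separates the linguistic mapping (which word triggers which operation) from the data access that collected the values.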
  • 145. Digital Enterprise Research Institute www.deri.ie Results
  • 146. Digital Enterprise Research Institute www.deri.ie From Exact to Approximate  Semantic approximation in databases (as in any IR system): semantic best-effort.  Need some level of user disambiguation, refinement and feedback.  As we move in the direction of semantic systems we should expect the need for principled dialog mechanisms (like in human communication).  Pull the user interaction back into the system.
  • 147. Digital Enterprise Research Institute www.deri.ie From Exact to Approximate
  • 148. Digital Enterprise Research Institute www.deri.ie User Feedback
  • 149. Digital Enterprise Research Institute www.deri.ie Treo Architecture
  • 150. Digital Enterprise Research Institute www.deri.ie Core Elements of Treo  Hybrid model: database/IR/QA.  Ranked query results.  A distributional VSM for representing and semantically processing relational data was formulated: Ƭ-Space.  Similar in motivation to Cohen’s predication space.  Distributional semantic relatedness as a primitive operation.
  • 151. Digital Enterprise Research Institute www.deri.ie Ƭ-Space ESA EAV vector field
  • 152. Digital Enterprise Research Institute www.deri.ie Simple Queries (Video)
  • 153. Digital Enterprise Research Institute www.deri.ie More Complex Queries (Video)
  • 154. Digital Enterprise Research Institute www.deri.ie Vocabulary Search (Video)
  • 155. Digital Enterprise Research Institute www.deri.ie Treo Answers Jeopardy Queries (Video)
  • 156. Digital Enterprise Research Institute www.deri.ie Video Links:  Introducing Treo: Talk to your Data  http://www.youtube.com/watch?v=Zor2X0uoKsM  Treo: Do it your own (DIY) Jeopardy Question Answering Engine  http://www.youtube.com/watch?v=Vqh0r8GxYe8  Treo: Semantic Search over Schema & Vocabularies  http://www.youtube.com/watch?v=HCBwSV1mTdY
  • 157. Digital Enterprise Research Institute www.deri.ie Evaluation: Relevance  Avg. Precision: 0.62; Avg. Recall: 0.81; MRR: 0.49; % of queries answered: 80%.  Test Collection: QALD 2011.  DBpedia 3.7 + YAGO.  102 natural language queries.  Dataset: 45,767 predicates, 5,556,492 classes and 9,434,677 instances.
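For readers who want to compute the same kind of figures for their own system, the sketch below shows one common way to obtain them. It assumes that "Avg. Precision" and "Avg. Recall" mean per-query precision and recall averaged over all queries, and that MRR is the mean reciprocal rank of the first correct answer; the slide does not spell out the exact definitions, and the two sample queries are invented.

```python
def precision_recall(returned, relevant):
    """Set-based precision and recall for one query."""
    returned_set = set(returned)
    hits = len(returned_set & relevant)
    precision = hits / len(returned_set) if returned_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def reciprocal_rank(returned, relevant):
    """1 / rank of the first relevant result, or 0 if none is returned."""
    for rank, item in enumerate(returned, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs):
    """runs: list of (ranked results, set of gold answers), one pair per query."""
    n = len(runs)
    pr = [precision_recall(returned, gold) for returned, gold in runs]
    return {
        "avg_precision": sum(p for p, _ in pr) / n,
        "avg_recall": sum(r for _, r in pr) / n,
        "mrr": sum(reciprocal_rank(returned, gold) for returned, gold in runs) / n,
    }


# Two invented queries: ranked system output and the gold answer set.
RUNS = [
    (["a", "b"], {"a"}),
    (["x", "y", "z"], {"y", "w"}),
]
metrics = evaluate(RUNS)
```

For the sample data the first query has precision 0.5 and recall 1.0, the second precision 1/3 and recall 0.5, so the averages are about 0.417 and 0.75, and the MRR is 0.75.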
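  • 158. Digital Enterprise Research Institute www.deri.ie Evaluation: Treo Evolution  Version 0.1: Avg. Precision 0.39; Avg. Recall 0.45; MRR 0.42; % of queries answered 56%; QA: No QA; Query exec. time > 2 min; # of Queries 50.  Version 0.2: Avg. Precision 0.48; Avg. Recall 0.49; MRR 0.52; % of queries answered 58%; QA: No QA; Query exec. time > 2 min; # of Queries 50.  Version 0.3: Avg. Precision 0.62; Avg. Recall 0.81; MRR 0.49; % of queries answered 80%; QA: QA; Query exec. time 8,530 s; # of Queries >102.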
  • 159. Digital Enterprise Research Institute www.deri.ie Evaluation: Terminological Matching  Avg. Precision@5: 0.732; Avg. Precision@10: 0.646; MRR: 0.646; % of queries answered: 92.25%.  % of queries answered by approach: ESA 92.25%; String matching 45.77%; String matching + WordNet QE 52.48%.
  • 160. Digital Enterprise Research Institute www.deri.ie Performance & Maintainability
  • 161. Digital Enterprise Research Institute www.deri.ie Do-it-yourself (DIY): Core Resources
  • 162. Digital Enterprise Research Institute www.deri.ie Corpora: Wikipedia  High domain coverage:  ~95% of Jeopardy! Answers.  ~98% of TREC answers.  Wikipedia is entity-centric.  Curated link structure.  Where to use:  Construction of distributional semantic models.  As a commonsense KB  Complementary tools:  Wikipedia Miner
  • 163. Digital Enterprise Research Institute www.deri.ie Linked Datasets  DBpedia: Instances and data.  YAGO: Classes and instances.  Freebase: Instances.  CIA Factbook: Data. <http://dbpedia.org/class/yago/19th-centuryPresidentsOfTheUnitedStates> <http://dbpedia.org/class/yago/4th-centuryBCGreekPeople> <http://dbpedia.org/class/yago/OrganizationsEstablishedIn1918> <http://dbpedia.org/class/yago/PeopleFromToronto> <http://dbpedia.org/class/yago/JewishAtheists> <http://dbpedia.org/class/yago/CountriesOfTheMediterraneanSea> <http://dbpedia.org/class/yago/TennisPlayersAtThe1996SummerOlympics> <http://dbpedia.org/class/yago/OlympicGoldMedalistsForTheUnitedStates> <http://dbpedia.org/class/yago/WorldNo.1TennisPlayers> <http://dbpedia.org/class/yago/PeopleAssociatedWithTheUniversityOfZurich> <http://dbpedia.org/class/yago/SwissImmigrantsToTheUnitedStates> <http://dbpedia.org/class/yago/1979VideoGames>
  • 164. Digital Enterprise Research Institute www.deri.ie Linked Datasets  DBpedia: Instances and data.  YAGO: Classes and instances.  Freebase: Instances.  CIA Factbook: Data.  Where to use:  As a commonsense KB.
  • 165. Digital Enterprise Research Institute www.deri.ie Dictionaries  WordNet: Large Lexical Database  Wiktionary: Dictionary  Where to use:  Query expansion.  Semantic similarity.  Semantic relatedness.  Word sense disambiguation.  Complementary tools:  WordNet::Similarity
  • 166. Digital Enterprise Research Institute www.deri.ie Parsers  Stanford Parser: POS-Tagger, Syntactic & Dependency Trees.  Where to use:  Question Analysis.
  • 167. Digital Enterprise Research Institute www.deri.ie Named Entity Recognition/Resolution  Stanford NER.  DBpedia Spotlight  Treo NER.  Where to use:  Question Analysis
  • 168. Digital Enterprise Research Institute www.deri.ie Search Engines  Lucene & Solr: Advanced search engine framework.  Treo Ƭ-Space: Distributional semantic search over structured data.  Treo Entity Search: Semantic search engine for retrieving individual entities and vocabulary elements.
  • 169. Digital Enterprise Research Institute www.deri.ie Distributional Semantics  E2SA: High-performance Explicit Semantic Analysis (ESA) framework (based on NoSQL).  Semantic Vectors.  Where to use:  Semantic relatedness & similarity.  Word Sense Disambiguation.
  • 170. Digital Enterprise Research Institute www.deri.ie Linked Data Extraction  Fred: Represents the extracted data using ontology patterns.  Graphia: Extracts contextualized Linked Data graphs.
  • 171. Digital Enterprise Research Institute www.deri.ie QA over Linked Data: Roadmaps
  • 172. Digital Enterprise Research Institute www.deri.ie Research Topics/Opportunities  #1: Merge Linked Data extraction into QA4LD.
  • 173. Digital Enterprise Research Institute www.deri.ie Motivation
  • 174. Digital Enterprise Research Institute www.deri.ie Motivation
  • 175. Digital Enterprise Research Institute www.deri.ie Motivation
  • 176. Digital Enterprise Research Institute www.deri.ie Extraction Examples
  • 177. Digital Enterprise Research Institute www.deri.ie Extraction Examples
  • 178. Digital Enterprise Research Institute www.deri.ie Extraction Examples
  • 179. Digital Enterprise Research Institute www.deri.ie Extraction Examples
  • 180. Digital Enterprise Research Institute www.deri.ie Extraction Examples
  • 181. Digital Enterprise Research Institute www.deri.ie Research Topics/Opportunities  #1: Merge Linked Data extraction into QA4LD.  #2: Explore complex dialog/context in QA4LD tasks.  #3: Advance the use of distributional semantic models on QA.  #4: Provide the incentives for the advancement of open robust software resources and high quality data. (altimetrics.org).  #5: Multilingual QA.  #6: Creation of multi-sourced test collections.  #7: Integration of reasoning (deductive, inductive, counterfactual, abductive ...) on QA approaches and test collections.  #8: Development of QA approaches/tasks which use both structured and unstructured data.
  • 182. Digital Enterprise Research Institute www.deri.ie From QA to Semantic Applications (Deriving Patterns)
  • 183. Digital Enterprise Research Institute www.deri.ie Semantic Application Patterns  Derived from the experience developing Treo.  Not restricted to QA over Linked Data.  The following list is not intended to be complete.
  • 184. Digital Enterprise Research Institute www.deri.ie Semantic Application Patterns  Pattern #1: Maximize the amount of knowledge in your semantic application.  Meaning interpretation depends on knowledge.  Using LOD: DBpedia, Freebase, YAGO can give you a very comprehensive set of instances and their types.  Wikipedia can provide you a comprehensive distributional semantic model.
  • 185. Digital Enterprise Research Institute www.deri.ie Semantic Application Patterns  Pattern #2: Allow your databases to grow.  Dynamic schema.  Entity-centric data integration.
  • 186. Digital Enterprise Research Institute www.deri.ie Semantic Application Patterns  Pattern #3: Once the database grows in complexity use semantic search instead of structured queries.  Instances can be used as pivot entities to reduce the search space  They are easier to search.  Higher specificity and lower vocabulary variation.
  • 187. Digital Enterprise Research Institute www.deri.ie Semantic Application Patterns  Pattern #4: Use distributional semantics and semantic relatedness for a robust semantic matching.  Distributional semantics allows your application to digest (and make use of) large amounts of unstructured information.  Multilingual solution.  Can be complemented with WordNet.
  • 188. Digital Enterprise Research Institute www.deri.ie Semantic Application Patterns  Pattern #5: POS-Tags, Syntactic Parsing + Rules will go a long way to help your application to interpret natural language queries and sentences.  Use them to explore the regularities in natural language.  Define a scope for natural language processing for your application (restrict by domain, syntactic complexity).  These tools are easy to use and quite robust (*English).
  • 189. Digital Enterprise Research Institute www.deri.ie Semantic Application Patterns  Pattern #6: Provide a user dialog mechanism in the application  Improve the semantic model with user feedback.  Record the user feedback.
  • 190. Digital Enterprise Research Institute www.deri.ie Take-away Message  Big Data/complex dataspaces demand new principled semantic approaches to cope with the scale and heterogeneity of data.  Information systems in the future will depend on semantic technologies. Be the first to develop.  Part of the Semantic Web/AI vision can be addressed today with a multi-disciplinary perspective:  Linked Data, IR and NLP  You can build your own IBM Watson-like application.  Both data and tools are available and ready to use: the main barrier is the mindset.  Huge opportunity for new solutions.
  • 191. Digital Enterprise Research Institute www.deri.ie References [1] Eifrem, A NOSQL Overview And The Benefits Of Graph Database, 2009. [2] Idehen, Understanding Linked Data via EAV Model, 2010. [3] Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual End-users?, 2007. [4] Chin-Yew Lin, Question Answering. [5] Farah Benamara, Question Answering Systems: State of the Art and Future Directions. [6] Freitas et al., Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches and Trends, 2012. [7] Freitas et al., A Distributional Structured Semantic Space for Querying RDF Graph Data, 2012. [8] Freitas et al., A Distributional Approach for Terminological Semantic Search on the Linked Data Web, 2012. [9] Freitas et al., A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs from Wikipedia, 2012. [10] Freitas et al., Answering Natural Language Queries over Linked Data Graphs: A Distributional Semantics Approach, 2013. [11] Freitas et al., Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches and Trends, 2012.
  • 192. Digital Enterprise Research Institute www.deri.ie References [12] Cimiano et al., Towards portable natural language interfaces to knowledge bases, 2008. [13] Lopez et al., PowerAqua: fishing the semantic web, 2006. [14] Damljanovic et al., Natural Language Interfaces to Ontologies: Combining Syntactic Analysis and Ontology-based Lookup through the User Interaction, 2010. [16] Unger et al., Template-based Question Answering over RDF Data, 2012. [17] Cabrio et al., QAKiS: an Open Domain QA System based on Relational Patterns, 2012. [18] How Useful Are Natural Language Interfaces to the Semantic Web for Casual End-Users?, 2007. [19] Popescu et al., Towards a theory of natural language interfaces to databases, 2003.
  • 193. Digital Enterprise Research Institute www.deri.ie Contact http://andrefreitas.org andre.freitas@deri.org
