WISS QA Do it yourself Question answering over Linked Data

NLP & Semantic Computing Group
WISS Challenge
Do-it-yourself Question Answering over Linked Data
Andre Freitas

Challenge Description
 Create a Question Answering (QA) system over
DBpedia (and maybe part of Wikipedia text
data).
 Evaluate the QA system using the latest
Question Answering over Linked Data (QALD)
test collection.

Simple Queries (Video)

More Complex Queries (Video)

Why should I participate?
 Very intense and solid learning experience.
 It will help you consolidate and make concrete
the concepts you saw in the talks.
 If you are starting in the field, it will give you the
basic artefacts to experiment with QA.

Approach
 Participants will be split into groups.
 Each group will develop a component of the QA
system.
 Group shuffling at the end will help everybody
to be aware of different components of the
system.
 You can bring your own code. You can suggest
variations over a theme.
 This is a hands-on session! Thou shalt code.

Guidelines
 Having a decent QA system by the end of the week is
a very challenging task.
 Don’t be afraid to ask and to make mistakes.
 Ethical project commitment: if you start, you
should finish.
 Do not hesitate to contact me anytime:
 andrenfreitas@gmail.com
 skype: andre.freitas5

System Components
 Question Analysis
 Entity Search
 Semantic Matching
 Query Generation
 Query Execution
 Answer Ranking & Generation
 Graph Extraction
 QA Pipeline, Web Interface / REST API
 Evaluation

Question Analysis
 Identifies linguistic regularities in the question
and isolates its main features.
 Use of basic NLP tools (e.g. syntactic parsing,
NER …).
 Understand what is expressed in a query and
how to harvest this information.

Question Analysis
POS Tagging
 - Who/WP
 - is/VBZ
 - the/DT
 - daughter/NN
 - of/IN
 - Bill/NNP
 - Clinton/NNP
 - married/VBN
 - to/TO
 - ?/.

Dependency parsing
 - dep(married-8, Who-1)
 - auxpass(married-8, is-2)
 - det(daughter-4, the-3)
 - nsubjpass(married-8, daughter-4)
 - prep(daughter-4, of-5)
 - nn(Clinton-7, Bill-6)
 - pobj(of-5, Clinton-7)
 - root(ROOT-0, married-8)
 - xcomp(married-8, to-9)

Question Analysis
Question segmentation and candidate type identification.
Who is the daughter of Bill Clinton married to?
 - daughter of (PROPERTY)
 - Bill Clinton (INSTANCE)
 - married to (PROPERTY)

Question Analysis
Determine answer type.
Who is the daughter of Bill Clinton married to?
(PERSON)
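
A minimal sketch of answer type detection, assuming a simple lookup on the question word is enough for a first version. The mapping table and the function name `detect_answer_type` are illustrative assumptions, not part of the challenge material.

```python
import re

# Hypothetical mapping from question words to coarse answer types.
WH_ANSWER_TYPES = {
    "who": "PERSON",
    "whom": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
}


def detect_answer_type(question: str) -> str:
    """Return a coarse answer type based on the first words of the question."""
    tokens = re.findall(r"[a-z]+", question.lower())
    if not tokens:
        return "UNKNOWN"
    # "How many ..." and "How much ..." usually ask for a quantity.
    if tokens[0] == "how" and len(tokens) > 1 and tokens[1] in {"many", "much"}:
        return "NUMBER"
    return WH_ANSWER_TYPES.get(tokens[0], "UNKNOWN")


print(detect_answer_type("Who is the daughter of Bill Clinton married to?"))
```

Questions that start with "Which" or "Give me" need the head noun rather than the question word, so a fuller version would probably fall back to the parse.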

Question Analysis
 Input: Natural language question.
 Output: Parsed question.
 Candidate entities and associated types.
 Candidate relations between entities.
 Lexical answer type.
 Candidate database operations.

Entity Search
 Matches query terms to dataset entities.
 Indexing and search time performance matter.
 Need to support semantic approximations.
 E.g. coping with different lexical expressions
and abstraction levels.
 Will use thesauri and distributional semantics
based approaches for semantic matching.

Entity Search
 Query terms:
daughter of | Bill Clinton | married to
 Dataset entities:
child of | Bill Clinton | spouse of

Entity Search
 Input: query terms.
 Output: corresponding database entities.
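
One possible shape for this component, sketched under strong simplifying assumptions: a tiny hand-written synonym table stands in for a thesaurus such as WordNet, and string similarity from the standard library stands in for a real semantic matcher. The vocabulary, the table, and the function names are all hypothetical.

```python
from difflib import SequenceMatcher

# Toy vocabulary: label -> identifier (illustrative only).
VOCABULARY = {
    "child": "dbo:child",
    "spouse": "dbo:spouse",
    "Bill Clinton": "dbr:Bill_Clinton",
}

# Hand-written stand-in for a thesaurus lookup.
SYNONYMS = {
    "daughter": ["child"],
    "son": ["child"],
    "married": ["spouse"],
    "wife": ["spouse"],
    "husband": ["spouse"],
}

STOPWORDS = {"of", "to", "the"}


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def search_entity(term, threshold=0.8):
    """Return (identifier, score) for the best match, or None."""
    words = [w for w in term.split() if w.lower() not in STOPWORDS]
    # Candidates: the content words of the term plus their synonyms.
    candidates = [" ".join(words)]
    for word in words:
        candidates.extend(SYNONYMS.get(word.lower(), []))
    best = None
    for candidate in candidates:
        for label, identifier in VOCABULARY.items():
            score = similarity(candidate, label)
            if score >= threshold and (best is None or score > best[1]):
                best = (identifier, score)
    return best


for term in ["daughter of", "Bill Clinton", "married to"]:
    print(term, "->", search_entity(term))
```

The threshold of 0.8 is an arbitrary starting point; it trades tolerance for typos against false matches and would need tuning on real data.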

Query Generation
 Transforms the natural language query into a
query in a logical form.
 Involves the interface between natural language
and knowledge representation / logical models.
 Relation identification / extraction.

Query Generation
child of Bill Clinton spouse of
SELECT ?y WHERE
{
:Bill_Clinton :child ?x .
?x :spouse ?y .
?y :type :Person .
}

Query Generation
 Input: outputs from the question analysis and
entity search.
 Output: Possible SPARQL queries.
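
As a rough sketch of this step, the code below chains the properties found by entity search into triple patterns and serialises them as a SPARQL string. It assumes the question reduces to a simple path starting at one instance; the helper names and the generated variable names (`?v1`, `?v2`) are illustrative.

```python
# Prefixes commonly used with DBpedia.
PREFIXES = {
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
}


def chain_patterns(instance, properties):
    """Build a path: instance -p1-> ?v1 -p2-> ?v2 ..."""
    patterns, subject = [], instance
    for i, prop in enumerate(properties, start=1):
        obj = f"?v{i}"
        patterns.append((subject, prop, obj))
        subject = obj
    return patterns


def build_sparql(patterns, target):
    """Serialise triple patterns into a SELECT query string."""
    lines = [f"PREFIX {p}: <{iri}>" for p, iri in sorted(PREFIXES.items())]
    lines.append(f"SELECT DISTINCT {target} WHERE {{")
    lines.extend(f"  {s} {p} {o} ." for s, p, o in patterns)
    lines.append("}")
    return "\n".join(lines)


patterns = chain_patterns("dbr:Bill_Clinton", ["dbo:child", "dbo:spouse"])
query = build_sparql(patterns, target=patterns[-1][2])
print(query)
```

Since the component outputs possible queries, a fuller version would generate one query per combination of candidate entities and leave the choice to the ranking step.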

Query Execution
 Input: Possible SPARQL queries.
 Output: Result sets.

Answer Ranking & Generation
 Ranking models and heuristic models for
classifying the answers in relation to a question.
 Transforms results from triple format into
natural language.

Chelsea Clinton’s spouse is Marc Mezvinsky

 Input: SPARQL result sets, lexical answer type.
 Output: Ranked answers in a natural language
format.
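
A minimal sketch of the verbalisation part, assuming a fixed sentence template and labels derived from the last segment of each IRI. Real labels would more likely come from `rdfs:label`; the function names here are hypothetical.

```python
def label_from_iri(iri):
    """Derive a readable label from the last segment of an IRI."""
    local = iri.rstrip("/").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
    return local.replace("_", " ")


def verbalize(subject_iri, property_iri, answer_iris):
    """Turn one (subject, property, answers) result into a sentence."""
    subject = label_from_iri(subject_iri)
    prop = label_from_iri(property_iri)
    answers = [label_from_iri(a) for a in answer_iris]
    if not answers:
        return f"No {prop} was found for {subject}."
    if len(answers) == 1:
        return f"{subject}'s {prop} is {answers[0]}."
    joined = ", ".join(answers[:-1]) + " and " + answers[-1]
    return f"{subject}'s {prop}: {joined}."


print(verbalize(
    "http://dbpedia.org/resource/Chelsea_Clinton",
    "http://dbpedia.org/ontology/spouse",
    ["http://dbpedia.org/resource/Marc_Mezvinsky"],
))
```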

Graph Extraction
 Extract entities and relations from Wikipedia
text.
 Preserve contextual information.
 Persist them as RDF graphs.
 Focus on fact extraction.

On July 31, 2010, Chelsea Clinton
married investment banker Marc
Mezvinsky in Rhinebeck, New York.
Extracted graph:
Chelsea Clinton --married to--> Marc Mezvinsky
 - time: 31.07.2010
 - place: Rhinebeck, New York
Marc Mezvinsky --type--> Investment Banker
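
One way to persist such a relation together with its context is to introduce a node for the relation itself and attach subject, object, time and place to it. The sketch below emits N-Triples lines by hand; the `http://example.org/` namespace, the property names and the event identifier are hypothetical, and an RDF library would normally replace the string formatting.

```python
EX = "http://example.org/"  # hypothetical namespace for illustration


def iri(local):
    """Wrap a local name as an N-Triples IRI."""
    return f"<{EX}{local.replace(' ', '_')}>"


def literal(text):
    """Wrap text as a plain N-Triples literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def relation_to_ntriples(event_id, subject, predicate, obj, context):
    """Serialise one extracted relation and its context around an event node."""
    event = iri(event_id)
    triples = [
        (event, iri("subject"), iri(subject)),
        (event, iri("predicate"), literal(predicate)),
        (event, iri("object"), iri(obj)),
    ]
    for key, value in context.items():
        triples.append((event, iri(key), literal(value)))
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = relation_to_ntriples(
    "marriage1", "Chelsea Clinton", "married to", "Marc Mezvinsky",
    {"time": "2010-07-31", "place": "Rhinebeck, New York"},
)
print("\n".join(lines))
```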

QA Pipeline & UI
 Integration of the QA components.
 Development of the Web interface for the QA
system.
 Exploration of simple user feedback
mechanisms (e.g. entity disambiguation).

Evaluation
 Automatic evaluation for the QA system using
the Question Answering over Linked Data Test
Collection (QALD-4).

System Components: Groups
 Question Analysis
 Entity Search
 Semantic Matching
 Query Generation
 Query Execution
 Answer Ranking & Generation
 Graph Extraction
 QA Pipeline, Web Interface / REST API
 Evaluation

Coding Proficiency
 Entity Search (1)
 UI & QA Pipeline (2)
 Question Analysis (3)
 Graph Extraction (4)
 Query Execution / Answer Ranking &
Generation (5)
 Query Generation (6)
 Evaluation (7)

Focused Practical Session
 NLP Tools (Syntactic Parsing, SRL, NER,
Relation Extraction).
 Semantic Matching (WordNet, Distributional
Models).
 Semantic Web / Linked Data (Entity Linking,
SPARQL).
 Other?

Question Analysis: First task
 Using rules and regular expressions over POS
Tags.
 Detect the lexical answer type of the
example questions.
 Segment the question into a set of candidate
terms.
 Use Stanford CoreNLP or NLTK.
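
The segmentation step could start from something like the sketch below, which runs regular expressions over a `word/TAG` string. The tagged example is the one from the POS tagging slide; in practice the tags would come from Stanford CoreNLP or NLTK. The three patterns and their labels are illustrative assumptions and will not cover every question.

```python
import re

TAGGED = ("Who/WP is/VBZ the/DT daughter/NN of/IN Bill/NNP Clinton/NNP "
          "married/VBN to/TO ?/.")


def tok(tag_regex):
    """Regex for one whole word/TAG token whose tag matches tag_regex."""
    return rf"(?:(?<!\S)\S+/(?:{tag_regex})(?!\S)\s?)"


PATTERNS = [
    # Runs of proper nouns, e.g. "Bill Clinton".
    ("INSTANCE", rf"{tok('NNPS?')}+"),
    # Common nouns with an optional preposition, e.g. "daughter of".
    ("PROPERTY", rf"{tok('NNS?')}+{tok('IN')}?"),
    # Non-finite verb with an optional particle, e.g. "married to".
    ("PROPERTY", rf"{tok('VB[DNG]')}{tok('TO|IN')}?"),
]


def segment(tagged):
    """Return (text, label) segments in question order."""
    segments = []
    for label, pattern in PATTERNS:
        for match in re.finditer(pattern, tagged):
            words = [t.rsplit("/", 1)[0] for t in match.group().split()]
            segments.append((match.start(), " ".join(words), label))
    return [(text, label) for _, text, label in sorted(segments)]


print(segment(TAGGED))
```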

Entity Search: First task
 Index the DBpedia graph using Lucene.
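
Lucene itself is a Java library, so the sketch below is not Lucene code. It is a plain Python stand-in that shows the underlying idea, an inverted index from label tokens to entity identifiers, on three hand-picked labels. All names are illustrative.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split a label or query into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(labels):
    """Map each token to the set of entity identifiers whose label contains it."""
    index = defaultdict(set)
    for identifier, label in labels.items():
        for token in tokenize(label):
            index[token].add(identifier)
    return index


def search(index, query):
    """Rank entities by the number of query tokens found in their label."""
    scores = defaultdict(int)
    for token in set(tokenize(query)):
        for identifier in index.get(token, ()):
            scores[identifier] += 1
    return sorted(scores, key=lambda identifier: (-scores[identifier], identifier))


LABELS = {
    "dbr:Bill_Clinton": "Bill Clinton",
    "dbr:Hillary_Clinton": "Hillary Clinton",
    "dbr:Chelsea_Clinton": "Chelsea Clinton",
}
index = build_index(LABELS)
print(search(index, "Bill Clinton"))
```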

Query Generation: First task
 Based on entity candidates and Stanford
dependencies or C-structures.
 Build a triple-like representation of the query.

Query Execution & Answer
Generation: First task
 Build an interface for the public DBpedia
SPARQL Endpoint.
 Build a simple answer verbalizer from the
SPARQL result set to a more natural language
format.
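
A possible starting point for the endpoint interface, using only the standard library. It assumes the public endpoint is reachable at `https://dbpedia.org/sparql` and accepts a `format` parameter for JSON results; both should be checked against the current endpoint documentation. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint URL


def build_request_url(query, endpoint=DBPEDIA_ENDPOINT):
    """Encode the query as an HTTP GET URL."""
    params = urllib.parse.urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    return f"{endpoint}?{params}"


def parse_results(payload):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


def run_query(query, timeout=30):
    """Send the query to the endpoint (needs network access)."""
    request = urllib.request.Request(
        build_request_url(query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_results(response.read().decode("utf-8"))
```

Keeping URL construction and result parsing separate from the network call makes both testable offline, which helps when the public endpoint is slow or unavailable.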

Graph Extraction: First task
 Using OpenIE, extract relations from the
Wikipedia articles Barack Obama, Paris,
Jupiter.

Evaluation: First task
 Using the latest QALD version, build a tool to
calculate precision, recall and F1-measure for
the example queries.
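
The core of such a tool could look like the sketch below, which compares the gold answer set with the system answer set for one question and then averages over questions. How empty answer sets are scored here (as zero) is an assumption; the official QALD evaluation conventions should be checked before comparing numbers.

```python
def precision_recall_f1(gold, system):
    """Compute precision, recall and F1 for one question."""
    gold, system = set(gold), set(system)
    correct = len(gold & system)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_average(results):
    """Average (precision, recall, F1) tuples over all questions."""
    n = len(results)
    if n == 0:
        return (0.0, 0.0, 0.0)
    return tuple(sum(r[i] for r in results) / n for i in range(3))


per_question = [
    precision_recall_f1({"a", "b"}, {"a", "c"}),
    precision_recall_f1({"x"}, {"x"}),
]
print(macro_average(per_question))
```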

QA Pipeline & UI: First task
 Build the initial pipeline and the stubs for the
components of the QA system.
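
A sketch of what the initial pipeline and stubs might look like, assuming each component reads and updates one shared state object. The class name `QAState`, its fields and the stage functions are hypothetical placeholders for the groups' real components.

```python
from dataclasses import dataclass, field


@dataclass
class QAState:
    """Data passed from one pipeline stage to the next."""
    question: str
    terms: list = field(default_factory=list)
    entities: dict = field(default_factory=dict)
    queries: list = field(default_factory=list)
    results: list = field(default_factory=list)
    answer: str = ""


def question_analysis(state):
    # Stub: treat the whole question as a single candidate term.
    state.terms = [state.question.rstrip("?").strip()]
    return state


def entity_search(state):
    # Stub: no dataset lookup yet, so every term stays unresolved.
    state.entities = {term: None for term in state.terms}
    return state


def query_generation(state):
    # Stub: no SPARQL queries are generated yet.
    state.queries = []
    return state


def query_execution(state):
    # Stub: nothing is sent to an endpoint yet.
    state.results = []
    return state


def answer_generation(state):
    # Stub: join raw results, or report that nothing was found.
    state.answer = ", ".join(state.results) or "No answer found."
    return state


PIPELINE = [question_analysis, entity_search, query_generation,
            query_execution, answer_generation]


def run_pipeline(question, stages=PIPELINE):
    """Run every stage in order and return the final state."""
    state = QAState(question)
    for stage in stages:
        state = stage(state)
    return state


print(run_pipeline("Who is the daughter of Bill Clinton married to?").answer)
```

Because every stage has the same signature, a group can swap its stub for a real implementation without touching the rest of the pipeline.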
