WISS QA Do it yourself Question answering over Linked Data
1. NLP & Semantic Computing Group
WISS Challenge
Do-it-yourself Question Answering over Linked Data
Andre Freitas
Challenge Description
Create a Question Answering (QA) system over
DBpedia (and possibly part of the Wikipedia text
data).
Evaluate the QA system using the latest
Question Answering over Linked Data (QALD)
test collection.
More Complex Queries (Video)
Why should I participate?
A very intense and solid learning experience.
It will help you consolidate and make concrete
the concepts you saw in the talks.
If you are starting in the field, it will give you the
basic artefacts to experiment with QA.
Approach
Participants will be split into groups.
Each group will develop a component of the QA
system.
Group shuffling at the end will help everybody
become familiar with the different components of the
system.
You can bring your own code. You can suggest
variations over a theme.
This is a hands-on session! Thou shalt code.
Guidelines
Having a decent QA system by the end of the week is
a very challenging task.
Don’t be afraid to ask and to make mistakes.
Ethical project commitment: if you started then you
should finish.
Do not hesitate to contact me anytime:
andrenfreitas@gmail.com
skype: andre.freitas5
System Components
[Diagram: QA system components]
Question Analysis
Entity Search
Semantic Matching
Query Generation
Query Execution
Answer Ranking & Generation
Graph Extraction
Evaluation
QA Pipeline
Web Interface / REST API
Question Analysis
Identifies linguistic regularities in the question
and singles out its main features.
Use of basic NLP tools (e.g. syntactic parsing,
NER …).
Understand what is expressed in a query and
how to harvest this information.
Question Analysis
Question segmentation and candidate type identification.
Who is the daughter of Bill Clinton married to?
daughter of (PROPERTY), Bill Clinton (INSTANCE), married to (PROPERTY)
Question Analysis
Determine answer type.
Who is the daughter of Bill Clinton married to?
Answer type: (PERSON)
Question Analysis
Input: Natural language question.
Output: Parsed question.
Candidate entities and associated types.
Candidate relations between entities.
Lexical answer type.
Candidate database operations.
Entity Search
Matches query terms to dataset entities.
Indexing and search time performance matters.
Needs to support semantic approximations,
e.g. coping with different lexical expressions
and abstraction levels.
Will use thesauri and distributional-semantics-based
approaches for semantic matching.
Entity Search
Query terms -> dataset entities:
daughter of -> child of
Bill Clinton -> Bill Clinton
married to -> spouse of
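The term-to-entity mapping above can be sketched with a small synonym lookup. This is only an illustration: the thesaurus, the label table and the function names are hypothetical stand-ins for a real index over DBpedia labels and a real lexical resource.

```python
import re

# Hypothetical mini-thesaurus mapping question words to dataset vocabulary.
THESAURUS = {
    "daughter": ["child"],
    "son": ["child"],
    "married": ["spouse"],
    "wife": ["spouse"],
    "husband": ["spouse"],
}

# Hypothetical label table standing in for an index over DBpedia labels.
ENTITY_LABELS = {
    "child": "property",
    "spouse": "property",
    "Bill Clinton": "instance",
}

STOPWORDS = {"of", "to", "the", "is", "who"}


def normalize(term):
    """Lowercase a term, split it into words and drop stopwords."""
    words = re.findall(r"\w+", term.lower())
    return [w for w in words if w not in STOPWORDS]


def search_entities(term):
    """Return (label, kind) pairs whose label matches the term or a synonym."""
    words = normalize(term)
    candidates = set(words)
    for word in words:
        candidates.update(THESAURUS.get(word, []))
    phrase = " ".join(words)
    hits = []
    for label, kind in ENTITY_LABELS.items():
        label_norm = " ".join(normalize(label))
        if label_norm == phrase or label_norm in candidates:
            hits.append((label, kind))
    return hits


for term in ["daughter of", "Bill Clinton", "married to"]:
    print(term, "->", search_entities(term))
```

A dictionary lookup only covers exact synonyms; the distributional approaches mentioned earlier would replace the thesaurus step with a similarity score.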
Query Generation
Transforms the natural language query into a
query in a logical form.
Involves the interface between natural language
and knowledge representation / logical models.
Relation identification / extraction.
Query Generation
child of Bill Clinton spouse of
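PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?y WHERE
{
  dbr:Bill_Clinton dbo:child ?x .   # Bill Clinton's child
  ?x dbo:spouse ?y .                # that child's spouse
  ?y a dbo:Person .                 # the answer must be a person
}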
Query Generation
Input: outputs from the question analysis and
entity search.
Output: Possible SPARQL queries.
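As a rough sketch of this step, a query string can be assembled from a list of triple patterns. The function name `build_select` and the pattern format are assumptions made for illustration; the `dbr:` and `dbo:` prefixes are the usual DBpedia resource and ontology namespaces.

```python
# Prefixes for DBpedia resources and ontology terms.
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
}


def build_select(target, patterns, prefixes=PREFIXES):
    """Build a SELECT query string from (subject, predicate, object) patterns.

    Terms are assumed to be already written as SPARQL variables ("?x")
    or prefixed names ("dbo:child"); no escaping is attempted here.
    """
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    lines.append(f"SELECT DISTINCT {target} WHERE {{")
    for s, p, o in patterns:
        lines.append(f"  {s} {p} {o} .")
    lines.append("}")
    return "\n".join(lines)


patterns = [
    ("dbr:Bill_Clinton", "dbo:child", "?x"),
    ("?x", "dbo:spouse", "?y"),
]
query = build_select("?y", patterns)
print(query)
```

Producing several candidate pattern lists (one per entity interpretation) and calling the builder on each would yield the set of possible queries.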
Query Execution
Input: Possible SPARQL queries.
Output: Result sets.
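A minimal sketch of query execution against the public DBpedia endpoint, using only the Python standard library. The helper names are hypothetical; the request follows the standard SPARQL protocol (a `query` parameter plus an `Accept` header asking for JSON results), and only SELECT results are handled.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"  # public DBpedia SPARQL endpoint


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request asking for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_results(payload):
    """Flatten a SPARQL JSON result document into a list of {var: value} rows."""
    rows = []
    for binding in payload["results"]["bindings"]:
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


def run_query(query, timeout=30):
    """Send the query over the network and return the parsed rows."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as resp:
        return parse_results(json.load(resp))
```

Calling `run_query(...)` needs network access; `parse_results` can be exercised offline on a saved response.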
Answer Ranking & Generation
Ranking and heuristic models for scoring
candidate answers against the question.
Transforms results from triple format into a natural
language form.
Answer Ranking & Generation
Chelsea Clinton’s spouse is Marc Mezvinsky
Answer Ranking & Generation
Input: SPARQL result sets, lexical answer type.
Output: Ranked answers in a natural language
format.
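One simple way to approach this step is a type-based ordering followed by a template verbalizer. The sketch below assumes candidates arrive as dictionaries with an `iri` and a list of `types`; that shape, the function names and the sentence template are illustrative choices, not a prescribed design.

```python
def label_from_iri(iri):
    """Turn an IRI such as .../Marc_Mezvinsky into 'Marc Mezvinsky'."""
    local = iri.rstrip("/").rsplit("/", 1)[-1]
    return local.replace("_", " ")


def rank_answers(candidates, answer_type):
    """Put candidates whose types include the expected answer type first."""
    # sorted() is stable, so the original order is kept within each group.
    return sorted(candidates, key=lambda c: answer_type not in c["types"])


def verbalize(subject, relation, answers):
    """Render answer IRIs as a short natural language sentence."""
    if not answers:
        return f"No answer found for {subject}'s {relation}."
    names = [label_from_iri(a) for a in answers]
    if len(names) == 1:
        return f"{subject}'s {relation} is {names[0]}."
    return f"{subject}'s {relation}: {', '.join(names)}."


candidates = [
    {"iri": "http://dbpedia.org/resource/Rhinebeck,_New_York", "types": ["Place"]},
    {"iri": "http://dbpedia.org/resource/Marc_Mezvinsky", "types": ["Person"]},
]
best = rank_answers(candidates, "Person")[0]
print(verbalize("Chelsea Clinton", "spouse", [best["iri"]]))
```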
Graph Extraction
Extract entities and relations from Wikipedia
text.
Preserving contextual information.
Persist them as RDF graphs.
Focus on fact extraction.
Graph Extraction
On July 31, 2010, Chelsea Clinton
married investment banker Marc
Mezvinsky in Rhinebeck, New York.
Extracted graph:
Chelsea Clinton -- married to --> Marc Mezvinsky
married to -- time --> 31.07.2010
married to -- place --> Rhinebeck, New York
Marc Mezvinsky -- type --> Investment Banker
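To keep the time and place attached to the relation rather than to either person, one option is to represent the fact as an event node. The sketch below assumes that modelling choice; the predicate names, the `http://example.org/` namespace and the all-literal objects are placeholders, not a fixed vocabulary.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary


def event_to_triples(event_id, subject, relation, obj, context):
    """Represent one extracted fact and its context as (s, p, o) triples."""
    triples = [
        (event_id, "subject", subject),
        (event_id, "relation", relation),
        (event_id, "object", obj),
    ]
    # Contextual information (time, place, ...) hangs off the event node.
    for key, value in sorted(context.items()):
        triples.append((event_id, key, value))
    return triples


def iri(name):
    """Wrap a local name as an IRI in the placeholder namespace."""
    return f"<{BASE}{quote(name.replace(' ', '_'))}>"


def literal(text):
    """Write a plain string literal with minimal escaping."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize triples as N-Triples lines, with every object as a literal."""
    return "\n".join(f"{iri(s)} {iri(p)} {literal(o)} ." for s, p, o in triples)


triples = event_to_triples(
    "event1", "Chelsea Clinton", "married to", "Marc Mezvinsky",
    {"time": "2010-07-31", "place": "Rhinebeck, New York"},
)
print(to_ntriples(triples))
```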
QA Pipeline & UI
Integration of the QA components.
Development of the Web interface for the QA
system.
Exploration of simple user feedback
mechanisms (e.g. entity disambiguation).
Evaluation
Automatic evaluation for the QA system using
the Question Answering over Linked Data Test
Collection (QALD-4).
System Components: Groups
[Diagram: QA system components, one per group]
Question Analysis, Entity Search, Semantic Matching, Query Generation, Query Execution,
Answer Ranking & Generation, Graph Extraction, Evaluation, QA Pipeline, Web Interface / REST API
Question Analysis: First task
Using rules and regular expressions over POS
Tags.
Detect the lexical answer type of the
example questions.
Segment the question into a set of candidate
terms.
Use Stanford CoreNLP or NLTK.
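A minimal sketch of this first task, assuming the question has already been tagged with Penn Treebank POS tags (for example by CoreNLP or NLTK). The tags below are written by hand, and the rules, the auxiliary list and the one-letter tag classes are illustrative assumptions rather than a complete grammar.

```python
import re

# Rules mapping the start of a question to a lexical answer type.
ANSWER_TYPE_RULES = [
    (re.compile(r"^who\b", re.I), "PERSON"),
    (re.compile(r"^where\b", re.I), "PLACE"),
    (re.compile(r"^when\b", re.I), "DATE"),
    (re.compile(r"^how (many|much)\b", re.I), "NUMBER"),
]

AUXILIARIES = {"is", "are", "was", "were", "be", "do", "does", "did", "has", "have"}

# Proper-noun runs are instances; noun/verb runs plus an optional
# preposition are candidate properties.
SEGMENT = re.compile(r"(?P<INSTANCE>P+)|(?P<PROPERTY>[NV]+I?)")


def answer_type(question):
    """Return the answer type of the first matching rule, else UNKNOWN."""
    for pattern, label in ANSWER_TYPE_RULES:
        if pattern.search(question.strip()):
            return label
    return "UNKNOWN"


def tag_class(word, tag):
    """Collapse a Penn Treebank tag into a one-letter class."""
    if tag.startswith("NNP"):
        return "P"
    if tag.startswith("NN"):
        return "N"
    if tag.startswith("VB") and word.lower() not in AUXILIARIES:
        return "V"
    if tag in ("IN", "TO"):
        return "I"
    return "O"


def segment(tagged):
    """Run the regular expression over the tag classes and return segments."""
    classes = "".join(tag_class(w, t) for w, t in tagged)
    segments = []
    for m in SEGMENT.finditer(classes):
        words = [w for w, _ in tagged[m.start():m.end()]]
        segments.append((" ".join(words), m.lastgroup))
    return segments


# Hand-written tags for the running example.
TAGGED = [("Who", "WP"), ("is", "VBZ"), ("the", "DT"), ("daughter", "NN"),
          ("of", "IN"), ("Bill", "NNP"), ("Clinton", "NNP"),
          ("married", "VBN"), ("to", "TO"), ("?", ".")]

print(answer_type("Who is the daughter of Bill Clinton married to?"))
print(segment(TAGGED))
```

Encoding each tag as one character keeps match offsets aligned with token positions, so the matched span can be sliced straight out of the token list.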
Entity Search: First task
Index the DBpedia graph using Lucene.
Query Generation: First task
Based on the entity candidates and Stanford
dependencies or constituency structures (C-structures),
build a triple-like representation of the query.
Query Execution & Answer Generation: First task
Build an interface for the public DBpedia
SPARQL Endpoint.
Build a simple answer verbalizer that turns the
SPARQL result set into a more natural language
format.
Graph Extraction: First task
Using OpenIE, extract relations from the
Wikipedia articles on Barack Obama, Paris, and
Jupiter.
Evaluation: First task
Using the latest QALD version, build a tool to
calculate precision, recall and F1-measure for
the example queries.
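The per-question measures can be sketched as set comparisons between gold and system answers, then averaged over questions. The handling of empty answer sets below is an assumption; the official QALD evaluation conventions should be checked before comparing scores.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 for one question's answer sets."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    # Assumption: an empty system answer or empty gold set scores 0.
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_average(results):
    """Average (precision, recall, F1) triples over all questions."""
    n = len(results)
    return tuple(sum(r[i] for r in results) / n for i in range(3))


per_question = [
    precision_recall_f1({"a", "b"}, {"a", "c"}),
    precision_recall_f1({"a"}, {"a"}),
    precision_recall_f1({"a"}, set()),
]
print(macro_average(per_question))
```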
QA Pipeline & UI: First task
Build the initial pipeline and the stubs for the
components of the QA system.
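An initial pipeline with stubs might look like the sketch below. The class, the stub functions and the data passed between stages are all hypothetical; the point is only that each group can replace one stub with a real component without touching the rest.

```python
class QAPipeline:
    """Chain the QA components; each stage is a plain callable."""

    def __init__(self, analyze, search, generate, execute, rank):
        self.analyze = analyze
        self.search = search
        self.generate = generate
        self.execute = execute
        self.rank = rank

    def answer(self, question):
        parsed = self.analyze(question)
        entities = self.search(parsed)
        queries = self.generate(parsed, entities)
        result_sets = [self.execute(q) for q in queries]
        return self.rank(result_sets, parsed)


# Stubs returning canned values until the real components exist.
def analyze_stub(question):
    return {"question": question, "terms": [], "answer_type": "UNKNOWN"}


def search_stub(parsed):
    return []


def generate_stub(parsed, entities):
    return ["SELECT ?y WHERE { ?x ?p ?y } LIMIT 1"]


def execute_stub(query):
    return [{"y": "stub answer"}]


def rank_stub(result_sets, parsed):
    return [row["y"] for rows in result_sets for row in rows]


pipeline = QAPipeline(analyze_stub, search_stub, generate_stub,
                      execute_stub, rank_stub)
print(pipeline.answer("Who is the daughter of Bill Clinton married to?"))
```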