Tutorial on "Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge Graphs" presented at the 4th Joint International Conference on Semantic Technologies (JIST2014)
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge Graphs
1. JIST2014 Tutorial on Linked Data and Knowledge Graphs - Constructing and Understanding Knowledge Graphs
Presenter: Jeff Z. Pan (University of Aberdeen)
Contributors: Honghan Wu (University of Aberdeen), Yuan Ren (University of Aberdeen), Panos Alexopoulos (iSOCO)
2. Jeff Z. Pan (University of Aberdeen)
Agenda – PART I: LINKED DATA & KNOWLEDGE GRAPHS
1:00pm – 1:20pm  Overview & Applications
1:20pm – 1:35pm  Example Linked Data Knowledge Repositories
1:35pm – 1:45pm  The Current Status of Linked Data: the Good, the Bad and the Ugly
1:45pm – 2:00pm  Research Challenges
3. Agenda – PART II: METHODS & TECHNIQUES
2:00pm – 3:05pm  Constructing Knowledge Graphs
2:30pm – 2:45pm  Coffee Break
3:05pm – 3:40pm  Understanding Knowledge Graphs
3:40pm – 3:45pm  Outlook
4. PART I: LINKED DATA & KNOWLEDGE GRAPHS
• Overview
• Applications
• Linked Data Knowledge Repositories
• Knowledge Graph on Linked Data
• Research Challenges
5. Knowledge
• What is knowledge?
  • Something that is known
  • Structured information
  • About certain aspects of the (real) world
6. Semantic Networks
A semantic network is a graph structure for representing knowledge as patterns of interconnected nodes and arcs:
• nodes represent objects, concepts, or situations, and
• arcs represent relationships.
7. RDF: a Standard for Directed Labelled Graph KBs on the Web
• RDF is
  • a modern version of the semantic network, with formal syntax and semantics
  • a standard model for data interchange on the Web
• RDF statements are subject-property-value triples:
  [my-chair colour tan .]
  [my-chair rdf:type chair .]
  [chair rdfs:subClassOf furniture .]
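To make the triple model concrete, here is a minimal sketch in plain Python (no RDF library) holding the slide's three statements as subject-property-value tuples and matching them against a simple pattern, with `None` acting as a wildcard. This is an illustration of the data model only, not how real RDF stores work.

```python
# The slide's triples as subject-property-value tuples.
triples = [
    ("my-chair", "colour", "tan"),
    ("my-chair", "rdf:type", "chair"),
    ("chair", "rdfs:subClassOf", "furniture"),
]

def match(pattern, data=triples):
    """Return all triples matching the pattern; None matches anything."""
    return [t for t in data
            if all(p is None or p == v for p, v in zip(pattern, t))]

# Everything asserted about my-chair:
print(match(("my-chair", None, None)))
# [('my-chair', 'colour', 'tan'), ('my-chair', 'rdf:type', 'chair')]
```

Triple patterns of exactly this shape are the building blocks of SPARQL queries, discussed later in the tutorial.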
8. Linked Data and Knowledge Graphs
• Linked Data refers to (RDF) data published on the web
  • with its meaning explicitly defined using ontological (OWL) vocabulary
  • which can be inter-linked with external datasets
• A knowledge graph is a set of interconnected typed entities and their attributes
9. Knowledge Graph (KG) Services and Related Research Problems
• KG construction: how do we construct high-quality knowledge graphs?
  • Knowledge acquisition
  • Knowledge evaluation
• KG understanding: how do we make it easier to access and reuse knowledge?
  • for end users
  • for data engineers
• KG reasoning: how do we bridge the gap between the vocabulary used in the graphs and that used in queries?
  • Scalability
  • Efficiency
10. APPLICATIONS OF KNOWLEDGE GRAPHS
Summary of entities, faceted facts, from best to list, entity associations, structured queries, and question answering
11. ENTITY UNDERSTANDING: THINGS, NOT STRINGS
12. What is it? (Entity Understanding)
13. FACETED FACT: GETTING THE VALUE OF SOME ATTRIBUTE
14. What is the time there? (Faceted Fact)
15. FROM BEST TO LIST: NOT ONLY THE BEST
16. Give a List instead of the Best
17. ENTITY ASSOCIATION: SHOW THE CONNECTIONS
18. How are they connected? (Entity Association)
Gong Cheng, Yanan Zhang, and Yuzhong Qu. Explass: Exploring Associations between Entities via Top-K Ontological Patterns and Facets. In Proc. of ISWC 2014, pp. 422–437.
http://ws.nju.edu.cn/explass/
19. STRUCTURED QUERIES: EVEN WHEN THE INPUTS ARE KEYWORDS
20. From Keywords to Structured Queries
Wang, Haofen, Kang Zhang, Qiaoling Liu, Thanh Tran, and Yong Yu. Q2Semantic: A lightweight keyword interface to semantic search. In Proc. of ESWC 2008, pp. 584–598.
“Capin SVG”: find specifications about “SVG” whose author’s name is “Capin”
21. QUESTION ANSWERING: COMPUTE ANSWERS WITH THE KG
22. Question Answering
Christina Unger, Lorenz Bühmann, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, and Philipp Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st International Conference on World Wide Web, pp. 639–648. ACM, 2012.
“films starring Brad Pitt”
23. SAMPLE LINKED DATA KNOWLEDGE REPOSITORIES
DBpedia, Wikidata, GoodRelations
24. DBpedia
• A crowd-sourced community effort to extract structured information from Wikipedia
  • allows users to ask structured queries against Wikipedia
  • and to link other datasets on the Web to Wikipedia data
25. DBpedia – the Content
Entities and their attributes from Wikipedia: infobox templates, categorisation information, images, geo-coordinates, etc.
Classification schemas:
• Wikipedia categories are represented using the SKOS vocabulary and DCMI terms.
• The YAGO classification is derived from the Wikipedia category system using WordNet.
• WordNet synset links were generated by manually relating Wikipedia infobox templates and WordNet synsets.
The DBpedia 2014 release consists of 3 billion RDF triples.
26. DBpedia – Services
• SPARQL endpoint: http://dbpedia.org/sparql
• Query builders (e.g. the Leipzig query builder at http://querybuilder.dbpedia.org)
• Public faceted web service interface
• Dump downloads
  • DBpedia dumps in 125 languages at the DBpedia download server
  • DBpedia Ontology
27. DBpedia – Use Cases
• Nucleus for the Web of Data
• Revolutionise access to Wikipedia information:
  “Give me all cities in New Jersey with more than 10,000 inhabitants”
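The slide's example question maps naturally to a SPARQL query against the DBpedia endpoint. The sketch below only builds the query string; the `dbo:` property names (`dbo:isPartOf`, `dbo:populationTotal`) are assumptions about the DBpedia ontology and may differ from the actual release, so treat this as an illustration of the shape of such a query rather than a working recipe.

```python
# Build a SPARQL query for "cities in <region> with population above <n>".
# The dbo: property names are assumed, not verified against DBpedia.
def cities_query(region_uri, min_population):
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city WHERE {{
  ?city a dbo:City ;
        dbo:isPartOf <{region_uri}> ;
        dbo:populationTotal ?pop .
  FILTER (?pop > {min_population})
}}""".strip()

q = cities_query("http://dbpedia.org/resource/New_Jersey", 10000)
print(q)
```

Such a query string could then be POSTed to http://dbpedia.org/sparql to get the answer set.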
28. Wikidata
• A collaboratively edited knowledge base operated by the Wikimedia Foundation
• Can be read and edited by both humans and machines
• Acts as central storage for the structured data of its Wikimedia sister projects, including Wikipedia, Wikivoyage, Wikisource, and others
29. Wikidata – the Content
Wikidata is document-oriented, focused around topics (items).
• Information is added to items by creating statements (key-value pairs)
30. Wikidata – to the Linked Data Web (1)
Fredo Erxleben, Michael Günther, Markus Krötzsch, Julian Mendez, and Denny Vrandečić. Introducing Wikidata to the Linked Data Web. In Proc. of ISWC 2014, pp. 50–65.
Exporting statements as triples:
• Faithful representations: with additional qualifiers and references
• Simplified representations: without additional qualifiers and references
31. Wikidata – to the Linked Data Web (2)
Fredo Erxleben, Michael Günther, Markus Krötzsch, Julian Mendez, and Denny Vrandečić. Introducing Wikidata to the Linked Data Web. In Proc. of ISWC 2014, pp. 50–65.
Extracting schema information from Wikidata:
• instance of (P31) → rdf:type and subclass of (P279) → rdfs:subClassOf
• constraints on the use of properties → OWL axioms
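The first mapping on the slide is mechanical, and a toy version fits in a few lines. The sketch below assumes Wikidata statements arrive as simple (item, property, value) tuples, which is a simplification of the real export format; the item IDs in the example (Q42, Q5) are illustrative.

```python
# Map Wikidata's schema-level properties to their RDF(S) counterparts,
# as described on the slide; all other properties pass through unchanged.
PROPERTY_MAP = {"P31": "rdf:type", "P279": "rdfs:subClassOf"}

def statements_to_triples(statements):
    return [(s, PROPERTY_MAP.get(p, p), o) for s, p, o in statements]

stmts = [("Q42", "P31", "Q5"),        # "instance of" becomes rdf:type
         ("Q5", "P279", "Q215627")]   # "subclass of" becomes rdfs:subClassOf
print(statements_to_triples(stmts))
```

The paper's full export also handles qualifiers and references, which this sketch deliberately omits.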
32. Wikidata – Use Cases & Data Access
Use cases:
• Information about the sources helps support the notion of verifiability
• Collecting structured data: allows easy reuse of that data
• Support for Wikimedia projects: reducing the workload in Wikipedia and increasing its quality
• Support well beyond that: everyone can use Wikidata
Accessing the data:
• MediaWiki Lua Scribunto interface
• Wikibase/API
• RDF dump: http://tools.wmflabs.org/wikidata-exports/rdf/exports/20141013/
33. GoodRelations
GoodRelations is a lightweight ontology for annotating offerings and other aspects of e-commerce on the Web.
[Slide credit: Martin Hepp]
34. GoodRelations – Use Cases
[Slide credit: Martin Hepp]
35. GoodRelations – Use Cases (2)
• Rich snippets: search engines use your markup to augment the preview of your site. Google, Bing, Yahoo, and Yandex will improve the rendering of your page directly in the search results.
• Targeted searching: matches the profile and preferences of the person behind the query.
36. GoodRelations – Who Is Using It
• Search engines, and 10,000+ small and large shops
• Publishers
• Software: OpenLink (Virtuoso)
37. CURRENT STATUS OF ONLINE LINKED DATA
The good, the bad and the ugly
38. The Good: a Flexible Linked Data Eco-system
• RDF / OWL
  • Flexible schema setting: schemaless -> simple schema -> rich schema
  • Universal unique IDs for data entities: URIs
• Ontology mapping
  • Shared vocabularies
  • Schema mapping
• Data linkage
  • Instance mapping
• Querying and reasoning techniques
  • SPARQL entailment regimes
  • Distributed SPARQL endpoints
39. The Good
• A flexible linked data eco-system
• Facilities for sharing and linking knowledge in an open environment
• Knowledge representation at various levels of expressive power
• Services, tools, and approaches for knowledge generation, understanding, and consumption
• Interlinked knowledge repositories across various domains
40. The Bad
• Knowledge quality (errors, provenance, quantifiers, freshness, …)
• Data protection (licensing, access control)
• Data business models
41. The Ugly
• Linked data excels at knowledge representation
  • but a large number of datasets are missing schema information
• RDF is a triple-based model
  • but it is hard and time-consuming (even for Semantic Web geeks) to understand an RDF knowledge repository
42. RESEARCH CHALLENGES
43. Research Challenges
• KG construction
  • Ontology / schema construction
  • Data lifting
  • Quality evaluation
• Understanding KGs
  • User understanding
  • Data understanding
• Dynamic knowledge in KGs
  • Stream data / prediction
  • Belief revision
• Intelligent services for KGs
  • Ontology reasoning (see my tutorial at ISWC 2014)
  • Problem solving / workflow
44. Challenges in Automatic Construction
• Incompleteness of data: is the constructed schema generic enough to accommodate new data?
• Inconsistency of data: what if data conflict with each other?
E.g. birthdates of people: some people may not have a birthdate asserted in the dataset; should the schema specify that each person has a birthdate? Some people may have different birthdates asserted in different datasets; should the schema specify that the birthdate is unique?
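The birthdate example can be made concrete with a small check that is not from the tutorial itself but illustrates the inconsistency problem: merge the assertions from several datasets and flag any person with more than one distinct birthdate. All names and dates below are made up.

```python
from collections import defaultdict

def birthdate_report(datasets):
    """Collect birthdates per person across datasets; flag conflicts."""
    dates = defaultdict(set)
    for dataset in datasets:
        for person, date in dataset.items():
            dates[person].add(date)
    return {person: ("conflict" if len(seen) > 1 else "ok")
            for person, seen in dates.items()}

ds1 = {"alice": "1980-01-01", "bob": "1975-05-05"}
ds2 = {"alice": "1980-01-02"}        # disagrees with ds1 about alice
print(birthdate_report([ds1, ds2]))  # {'alice': 'conflict', 'bob': 'ok'}
```

Whether the schema should then enforce uniqueness (e.g. via an OWL functional property) is exactly the design question the slide raises; a functional property would make the merged data above logically inconsistent rather than merely suspicious.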
45. Challenges in Manual Construction
• Expertise of ontology engineers: do the engineers have sufficient understanding and experience of ontology technologies (RDF(S), OWL, SPARQL, RIF, etc.)?
• Workload of ontology engineers: how much time does it take to manually construct a large ontology? E.g. SNOMED CT has about 400,000 concepts.
• Collaboration: when multiple ontology engineers work together, how do we make sure they have a consistent understanding of the ontology?
46. Challenges for both Automatic and Manual Construction
• Requirements and evaluation: how do we specify the requirements of ontology construction and test whether they have been fulfilled?
• Expressiveness vs. efficiency: which knowledge representation should we use? Is it sufficient to describe the domain? Are efficient reasoning and query answering mechanisms and systems available?
• Ontology reuse: do we have to construct everything from scratch? Is there an ontology available that partially covers the domain?
47. Challenges in Data Lifting
Data lifting enriches unstructured data with structural annotations, thereby extracting the entities and their relations and properties for the knowledge graph.
Key challenges:
• Entity identification: certain entities can be hard to identify, e.g. movie titles
• AVP (attribute-value pair) identification: an entity, attribute and its value may be scattered across the text or dataset, making it hard to establish the relation
48. Challenges in Entity Identification
• There are different ways to identify entities: e.g. “The President of the U.S.” and “Barack Obama”
• The same name can refer to different entities
• People may use acronyms or abbreviations for entities: e.g. “K-Drive” is the acronym for the “Knowledge-Driven Data Exploitation” project, not the drive labelled K in my computer
• Natural language text may have typos, and values may use different notations
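One small piece of this problem, resolving known surface forms (aliases, acronyms) to canonical entities, can be sketched as a lookup table. This toy example is not a real disambiguation system: it ignores context entirely, which is precisely why the evidence-model approach later in the tutorial is needed. All entries are illustrative.

```python
# Toy alias table mapping lowercased surface forms to canonical entity IDs.
ALIASES = {
    "the president of the u.s.": "Barack_Obama",   # true at the time of this 2014 tutorial
    "barack obama": "Barack_Obama",
    "k-drive": "Knowledge-driven_Data_Exploitation",
}

def resolve(mention):
    """Return the canonical entity for a mention, or None if unknown."""
    return ALIASES.get(mention.strip().lower())

print(resolve("K-Drive"))        # Knowledge-driven_Data_Exploitation
print(resolve("unknown thing"))  # None: needs contextual disambiguation
```

Unknown or ambiguous mentions fall through to `None`; a real system must then consult context, which is where the contextual hypothesis and evidence models discussed later come in.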
49. Challenges in Data Understanding
• Users are unfamiliar with the content of knowledge graphs:
  • What is the vocabulary?
  • What is described by the knowledge graph?
  • How is the content organised?
  • How is it connected to the other datasets I have?
• Users do not know how to exploit the knowledge graph:
  • Which queries can I ask this knowledge graph?
  • Which queries can be answered with this knowledge graph?
50. Challenges in Knowledge Dynamics
• Validity of knowledge: is a piece of information permanent or temporary?
• Representation: e.g. how to represent the temporal dependency of knowledge, e.g. “George W. Bush was the president of the U.S. until Barack Obama became the president.”
• Updating the knowledge graph: when and how do we retract a previously unknown mistake from the knowledge graph? Which knowledge should become obsolete after the current update?
• Querying: how to query with respect to the temporal properties of knowledge, e.g. “Who was the last president of the U.S.?”
• Predicting the dynamics: which change is likely to occur given the history of the knowledge graph?
51. Challenges in Intelligent Services
The large amount of information and its interconnections in a knowledge graph can be used to provide intelligent services; e.g. reasoning can be used to discover hidden relations in a knowledge graph.
Key challenges:
• Efficiency of the services: knowledge graphs are usually accessed by multiple users in real time, so efficiency is crucial to the quality of service.
• Scalability of the services: knowledge graphs are usually large-scale, while even basic reasoning services, e.g. transitive closure, can consume large amounts of time and resources.
52. Agenda – PART II: METHODS & TECHNIQUES
2:00pm – 3:05pm  Constructing Knowledge Graphs
2:30pm – 2:45pm  Coffee Break
3:05pm – 3:40pm  Understanding Knowledge Graphs
3:40pm – 3:45pm  Outlook
53. CONSTRUCTING KNOWLEDGE GRAPHS
• Test-driven ontology construction
  • Methodology
  • A Protégé plug-in
• Handling entity disambiguation
  • Approach
  • Some evaluation results
• Bridging requirements and authoring tests
  • Competency questions as informal requirement specifications
  • Some evaluation results
54. Uschold & King’s (1995) Methodology for Ontology Construction
• Key steps: capturing, coding, integrating and evaluating/testing
• Ontology evaluation/testing:
  • make a technical judgment of the ontologies
  • with respect to a frame of reference
• A frame of reference can be:
  • requirement specifications
  • competency questions
  • or the real world
55. Ontologies and Tests
• Uschold & King’s methodology
  • Test the ontology after the axioms are written
• Test-driven ontology authoring
  • Write authoring tests before writing axioms
  • Writing authoring tests before the axioms takes no more effort than writing them afterwards
  • Forces authors to think about requirements before writing axioms
  • Writing authoring tests first helps authors detect and remove errors sooner
  • Helps authors understand how good an existing/reused ontology is
56. Gruninger & Fox’s (1995) Methodology
Key steps:
1. Motivating scenarios
2. Informal competency questions
3. FOL terminology (classes, properties, objects)
4. Formal competency questions (2 -> 4?)
5. FOL axioms
6. Completeness theorems (defining the conditions under which the solutions to the questions are complete)
57. The METHONTOLOGY (2003) Methodology
• Key steps:
  1. specification of requirements
  2. terminology with tabular and/or graph notations
  3. formalisation with a logic-based ontology language
  4. maintenance (including evaluation/testing)
• Ontology evaluation/testing:
  • checking consistency, completeness, redundancy
58. The DKAP (2007) Methodology
• Key steps:
  1. determine the domain and scope
  2. check the availability of existing ontologies
  3. collect and analyse data for knowledge extraction
  4. develop an initial ontology
  5. refine and validate the ontology
• Ontology validation/testing:
  • consistency and accuracy checking
59. Limitations of Existing Methodologies
• Methodology level: lack of detail about the transitions
  • from requirements to tests
  • from requirements to terminology
  • from terminology to axioms
• Tool level: lack of tools to guide the above transitions
60. An Approach to Test-Driven Ontology Authoring (presented in an invited talk at BMIR, Stanford University, June 2013)
• An ontology contains not only OWL files, but also a test suite
• A test suite contains a set of tests as SPARQL 1.1 queries
  • not all requirements can be represented in SPARQL 1.1, though
• Ontology reuse
  • check the associated test suite before reusing an ontology, to better understand the original intention
• Collaborative ontology authoring
  • all authors agree upon a common test suite
  • each author can have an extra test suite locally
61. Authoring Tests
A test suite contains a set of tests (Test 1, Test 2, …). Each test pairs a SPARQL 1.1 query with its expected results; a reasoner evaluates the query against the ontology to produce the actual results, and comparing expected with actual results yields pass or fail.
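The pass/fail loop the slide describes can be sketched in a few lines. The query engine is stubbed out here (a real implementation would evaluate SPARQL 1.1 over the ontology via a reasoner such as TrOWL); the test names and entity names are invented for illustration.

```python
# Run each authoring test: compare the engine's actual results
# against the expected results recorded with the test.
def run_suite(tests, run_query):
    results = {}
    for name, (query, expected) in tests.items():
        actual = run_query(query)
        results[name] = "pass" if set(actual) == set(expected) else "fail"
    return results

# Stub standing in for SPARQL evaluation over the ontology:
def toy_engine(query):
    return ["MargheritaPizza"] if "CheeseTopping" in query else []

tests = {"cheesy-pizzas": ("SELECT ?p ... CheeseTopping ...",
                           ["MargheritaPizza"])}
print(run_suite(tests, toy_engine))   # {'cheesy-pizzas': 'pass'}
```

Comparing result sets rather than lists reflects that SPARQL SELECT results are unordered unless an ORDER BY is given.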
62. A Protégé Plug-in for Authoring Tests (based on the TrOWL reasoner)
63. Loading the Manifest File
• A manifest file specifies the queries and expected results
• Run the reasoner to get the results for each test
• Click on a test to show its expected and actual results
64. Computing Justifications for Errors Related to Failed Tests
• with the justification plug-in (and reasoners such as TrOWL)
65. Modifying the Ontology
• so that CheeseTopping is no longer disjoint with VegetableTopping
66. Key Issues (to be revisited after the Entity Disambiguation part)
• Understanding the intention of ontology authors
• How to generate authoring tests?
• How to judge the quality of the authoring tests?
67. Entity Recognition and Disambiguation
• Challenges revisited:
  • There are different ways to identify entities: e.g. “The President of the U.S.” and “Barack Obama”
  • The same name can refer to different entities
• Contextual hypothesis, used in many existing approaches:
  • terms with similar meanings are often used in similar contexts
  • The role of these contexts is typically played by already-annotated documents (e.g. Wikipedia articles), which are used to train term classifiers
68. Alternative Context: the Evidence Model
• Idea: identify semantic entities that may serve as disambiguation evidence for the scenario’s target entities
69. Evidence Model Construction (Manual)
• Identify the target concepts whose instances we wish to disambiguate (e.g. locations)
• Determine the related concepts whose instances may serve as contextual disambiguation evidence
  • For example, in texts that describe historical events, some concepts whose instances may act as location evidence are related locations, historical events, and historical groups and persons
• Identify, for each pair of evidence and target concepts, the relation paths that link them
70. Evidence-Target Paths
71. Term Extraction (Automatic)
Extraction is performed with Knowledge Tagger (from iSOCO), based on GATE.
72. Evaluation Results: Football Match Scenario
• 50 texts describing football matches
• E.g. “It's the 70th minute of the game and after a magnificent pass by Pedro, Messi managed to beat Claudio Bravo. Barcelona now leads 1-0 against Real.”
73. Evaluation Results: Military Conflict Scenario
• 50 historical texts describing military conflicts
• E.g. “The Siege of Augusta was a significant battle of the American Revolution. Fought for control of Fort Cornwallis, a British fort near Augusta, the battle was a major victory for the Patriot forces of Lighthorse Harry Lee and a stunning reverse to the British and Loyalist forces in the South.”
74. Future Work
• Fully automated construction of the disambiguation evidence model
  • The challenge here is how to automatically identify the text’s domain/topic
• Combination with statistical methods for cases where the available domain semantic information is incomplete
  • The challenge here is how to select the optimal ratio of ontological to statistical evidence
• Development of a tool to enable users to dynamically build such models out of existing semantic data and use them for disambiguation purposes
75. Issues in Test-Driven Ontology Authoring
1. How to generate tests
2. How to judge the quality of tests
  • why they are relevant
  • how to provide the correct expected answers
76. Requirement-Driven?
• How about starting from requirements instead of tests?
Ontology Authoring Requirements -> Ontology Authoring Tests -> Test Results
77. Requirement-Driven Ontology Authoring [Ren et al., 2014]
• Key questions
  • RQ1: what forms of requirements should we consider?
  • RQ2: how do we generate authoring tests from requirements?
78. Competency Questions
• Questions that people expect the constructed ontologies to answer
• Useful for novice users
  • in natural language
  • about domain knowledge
  • require little understanding of ontology technologies
• A typical CQ: Which pizza has some cheese topping?
79. RQ1: What Forms of Requirements Should We Consider?
Competency questions (CQs) can be regarded as functional requirements of the ontology.
RQ1': How are CQs formulated?
80. Key Idea 1: Identification of CQ Patterns
• A typical CQ: Which pizza has some cheese topping?
• Hypothesis: CQs usually have clear syntactic patterns
• Features and elements can be extracted
  • Features: the type of question; the binary predicate
  • Elements: class expression CE1, object property expression OPE, class expression CE2
  • giving the pattern: [CE1] [OPE] [CE2]
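To show what "clear syntactic pattern" means in practice, here is a rough sketch that recognises one archetype, "Which [CE1] [OPE] some [CE2]?", with a single regular expression. This is illustrative only; the actual work derives a whole catalogue of archetypes, and real CQs need more robust parsing than one regex.

```python
import re

# One illustrative CQ archetype: "Which [CE1] [OPE] some [CE2]?"
PATTERN = re.compile(r"^Which (\w+) (\w+) some (.+?)\?$", re.IGNORECASE)

def parse_cq(cq):
    """Extract the CE1/OPE/CE2 elements from a CQ, or None if no match."""
    m = PATTERN.match(cq)
    if not m:
        return None
    ce1, ope, ce2 = m.groups()
    return {"CE1": ce1, "OPE": ope, "CE2": ce2}

print(parse_cq("Which pizza has some cheese topping?"))
# {'CE1': 'pizza', 'OPE': 'has', 'CE2': 'cheese topping'}
```

The extracted elements are exactly what the presupposition-based authoring tests later in the tutorial operate on.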
81. Result 1: A Feature-based Framework for CQ Formulation
• Based on CQs collected from the Software Ontology Project (75 CQs) and the Manchester OWL workshops (70 CQs)
• Primary features -> CQ archetypes
• Secondary features -> CQ subtypes
Features and their values:
• Question type: Selection, Boolean, Counting
• Element visibility: Explicit, Implicit
• Predicate arity: Unary, Binary, N-ary
• Relation type: Object, Datatype
• Modifier: Quantity, Numeric
• Domain-independent element: Spatial, Temporal
• Question polarity: Positive, Negative
82. Result 2: Archetypes of CQ Patterns
83. Answerability of CQs
• A typical CQ: Which pizza has some cheese topping?
• Existing work has focused on answering CQs directly
  • But is the answer meaningful?
• The ability to answer CQs meaningfully can be regarded as a functional requirement of the ontology
• What if the answer is an empty set? Possible scenarios:
  • Pizza does not exist
  • Cheese topping does not exist
  • Pizzas are not allowed to have cheese topping
  • The ontology has not been populated with any cheesy pizza yet
  • …
84. RQ2: How Do We Generate Authoring Tests from Requirements?
RQ2': How can we automatically test whether a CQ can be meaningfully answered?
85. Key Idea 2: Presuppositions of CQs
• A typical CQ: Which pizza has some cheese topping?
• A CQ comes with certain presuppositions
  • some conditions the speakers assume to be met
• A CQ can be meaningfully answered only when its presuppositions are satisfied:
  • The classes Pizza and CheeseTopping should occur in the ontology
  • The property has(Topping) should occur in the ontology
  • The ontology should allow Pizza to have CheeseTopping
  • The ontology should also allow Pizza to not have CheeseTopping
86. CQs and Authoring Tests
• A typical CQ: Which pizza has some cheese topping? (pattern: [CE1] [OPE] [CE2])
• The satisfiability of CQ presuppositions can be verified by authoring tests generated from its features and elements:
  • The classes Pizza and CheeseTopping should occur in the ontology
    • [CE1] and [CE2] should both occur in the class vocabulary
  • The property has(Topping) should occur in the ontology
    • [OPE] should occur in the property vocabulary
  • The ontology should allow Pizza to have CheeseTopping
    • [CE1 and OPE some CE2] should be satisfiable
  • The ontology should also allow Pizza to not have CheeseTopping
    • [CE1 and not (OPE some CE2)] should be satisfiable
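The vocabulary-occurrence presuppositions are mechanical to check once the CQ's elements are known; the two satisfiability presuppositions need an OWL reasoner and are only named here. The sketch below is an assumed shape for such a generator, not the paper's implementation, and the vocabulary sets are illustrative.

```python
# Generate the mechanically checkable presupposition tests for a CQ
# whose elements (CE1, OPE, CE2) have already been extracted.
def presupposition_tests(ce1, ope, ce2, classes, properties):
    return {
        f"'{ce1}' occurs in the class vocabulary": ce1 in classes,
        f"'{ce2}' occurs in the class vocabulary": ce2 in classes,
        f"'{ope}' occurs in the property vocabulary": ope in properties,
    }
    # The remaining two presuppositions, satisfiability of
    # [CE1 and OPE some CE2] and [CE1 and not (OPE some CE2)],
    # would be delegated to an OWL reasoner.

onto_classes = {"Pizza", "CheeseTopping"}
onto_props = {"hasTopping"}
print(presupposition_tests("Pizza", "hasTopping", "CheeseTopping",
                           onto_classes, onto_props))
```

A failing vocabulary check already tells the author something concrete: the CQ mentions a term the ontology does not yet define.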
87. Result 3: Associating Presuppositions with Features
• The features in a CQ are associated with the presuppositions of the CQ
• An example for the question type feature: Selection (“Which pizza contains pork?”), Boolean (“Can pizza contain pork?”) and Counting (“How many pizzas contain pork?”) questions all presuppose
  • the occurrence of “Pizza”, “Pork”, “contains”
  • that some pizza can contain pork
  • that some pizza can contain no pork
88. Result 4: Formal Authoring Tests
• All the testing can be automated
89. The WhatIf Gadget
Components: class hierarchy, verbaliser, competency questions, user/system dialogue history, user input.
90. Input (Manchester Syntax)
1. The user selects a speech act by clicking or selecting a shortcut.
2. We need to evaluate their usefulness.
3. Examples:
  • Class: Pizza SubClassOf: Food
  • Class: Fruit DisjointWith: Pizza
91. Input (OWL Simplified English)
1. A set of restricted natural language patterns.
2. The system recognises the speech act.
3. Capable of accepting competency questions.
4. Examples:
  • Which pizza has topping a tomato topping?
  • An apple is a fruit.
92. Modelling User Goals (1)
1. Users can import or write their own CQs in OWL Simplified English.
2. Based on the inserted CQ, a list of authoring tests (ATs) is generated.
3. A tree structure displays these CQs and ATs.
4. The system constantly monitors these CQs and ATs. Any change in the satisfiability of an AT:
  a. is reported by changing the icon of the AT in the tree; red/green respectively represent fail/pass for each AT
  b. is reported in the “history” pane
93. Modelling User Goals (2)
• A hierarchical CQ + AT representation; icons represent the satisfiability state
• Written feedback is presented to the user in the “history” pane
94. Further Challenges
• Maintaining a continuous and meaningful interaction with the user
• Generating a coherent and comprehensive set of entailments in response to What-If questions
  • Content selection
  • Grouping and aggregation
  • Ordering
95. UNDERSTANDING KNOWLEDGE GRAPHS
• Data understanding
  • Data summarisation
  • Query generation
96. Data Understanding: a Core Activity in Data Exploitation
• Traditional focus in semantic web research: data understanding for machines and programs
• More important: data understanding for humans
  • humans are the ultimate owners and consumers of data
  • in systems such as knowledge graphs, Watson, Siri, etc.
  • to help human users understand the contents, implications and applications of data
• More than HCI: we want interesting and insightful data!
97. Semantic Datasets Are HARD to Understand
• Non-expert users might not be familiar with RDF, OWL and SPARQL
  • RDF(S) has 6 core documents
  • OWL 2 has 6 core documents
  • SPARQL 1.1 has 11 core documents
• Users are unfamiliar with datasets
  • that are too large to explore
  • that are external to their organisation
  • …
• It is HARD for novice users to construct queries
98. Challenges of Data Understanding
• Expressing needs (keywords/SPARQL)
• Describing datasets
• Retrieving only the relevant parts
  • 9.96% SPARQL / 8.19% DUMP
99. Solution: Summary-based Profiling for Linked Data
• Key idea: building-block-based information space modelling
  • Decomposing & constructing
100. The Philosophy of Interpreting Information
• Task: explain the data to human users
• Entity-centric
101. Entity-centric View of RDF Data
• Entity Description Block
102. From Concrete to Abstract
• Entity Description Pattern (EDP)
103. Data Summarisation – the EDP Graph
• Reveals the schema-level information
  • What concepts are there (nodes), and how are they related to each other (edges)?
• Discloses the individual-level distribution
  • Statistics attached to nodes and edges
• Example: the Jamendo dataset
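The core of the summarisation idea can be sketched as follows: abstract each entity to the set of classes and properties it uses (its entity description pattern), so that entities sharing the same pattern collapse into a single summary node. This is a simplified reading of the approach; the example triples are invented, loosely in the style of a music dataset like Jamendo.

```python
from collections import defaultdict

def edps(triples):
    """Group entities by their entity description pattern (EDP):
    the set of classes and properties each entity uses."""
    desc = defaultdict(lambda: {"classes": set(), "properties": set()})
    for s, p, o in triples:
        if p == "rdf:type":
            desc[s]["classes"].add(o)
        else:
            desc[s]["properties"].add(p)
    groups = defaultdict(list)
    for entity, d in desc.items():
        groups[(frozenset(d["classes"]), frozenset(d["properties"]))].append(entity)
    return dict(groups)

data = [("track1", "rdf:type", "mo:Track"), ("track1", "dc:title", "Song A"),
        ("track2", "rdf:type", "mo:Track"), ("track2", "dc:title", "Song B")]
print(len(edps(data)))   # 1: both tracks collapse into a single EDP
```

Attaching counts to each group (how many entities share the pattern) gives the individual-level statistics the slide mentions.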
104. Understanding Data Redundancy [Wu et al., 2014]
105. Related Paper at JIST2014
• Graph Pattern based RDF Data Compression. Jeff Z. Pan, Jose Manuel Gomez-Perez, Yuan Ren, Honghan Wu, Haofen Wang and Man Zhu. (Monday afternoon)
106. Understanding How Data Can Be Used
• Given a knowledge graph, generate candidate insightful queries
  • Manual vs. automatic generation
  • Generation based on the schema vs. the actual data
  • With or without user interference
• Our aim: automatic generation based on data, without user interference
  • Most friendly to new, novice users
  • Complementary to inference (which relies heavily on the schema)
107. Candidate Insightful Queries [Pan et al., 2013]
• Graph patterns are summarisations that represent many subsets of the RDF graph
• Pattern structure
  • Structured knowledge that is difficult to express with a schema
  • Such as stars, chains, trees, loops
• Correspondences between multiple graph patterns
  • Strongly corresponding patterns (large overlap)
  • Weakly corresponding patterns (little overlap)
  • Exceptions
108. Query Generation Framework
1. Data summarisation
  • significantly decreases the search space for rule mining
2. Data analytics
  • first-order inductive learning
  • association rule mining
3. Query generation
  • exploiting the relations between queries and rules
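An illustrative sketch of how a mined rule can turn into an insightful query, using made-up data: suppose mining finds the rule "entities that teach also work somewhere". Its confidence measures how often the rule holds, and the entities violating it (the exceptions) are themselves query-worthy. This is a toy illustration of the idea, not the paper's algorithm.

```python
# Confidence of the rule "has property a => has property b",
# together with the exceptions that violate it.
def rule_confidence(triples, a, b):
    with_a = {s for s, p, _ in triples if p == a}
    with_b = {s for s, p, _ in triples if p == b}
    return len(with_a & with_b) / len(with_a), with_a - with_b

data = [("x", "teaches", "c1"), ("x", "worksFor", "u1"),
        ("y", "teaches", "c2"), ("y", "worksFor", "u2"),
        ("z", "teaches", "c3")]            # z teaches but works for no one
conf, exceptions = rule_confidence(data, "teaches", "worksFor")
print(conf, exceptions)                    # confidence 2/3, exceptions {'z'}
```

A high-confidence rule with a small exception set suggests the query "who teaches but works for no one?", which is exactly the kind of candidate insightful query the framework aims to surface.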
109. Evaluation
110. Another Example
• Given the university data set in LUBM, the following two queries have the same results (when no reasoning is applied)
111. Summary and Future Work
• Take-home message
  • Data summarisation and data analytics technologies not only help people find answers, but also help people ask questions!
• Future work
  • Integrate with application scenario background knowledge
  • Integrate with reasoning
  • Integrate with user preferences
112. OUTLOOK
Outlook for knowledge graphs, from the application point of view
113. Outlook
What knowledge graphs are good at:
• Maintaining factual knowledge in a structured manner and answering queries about it
What knowledge graphs still need:
• “How to …” knowledge in addition to “What is …” knowledge
• Operations associated with the entities
114. JIST2014 Tutorial on Constructing and Understanding Knowledge Graphs
Thank you! Questions?