Tutorial on "Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge Graphs" presented at the 4th Joint International Conference on Semantic Technologies (JIST2014)
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge Graphs
1. JIST2014 Tutorial on Linked Data and Knowledge Graphs - Constructing and Understanding Knowledge Graphs
Presenter: Jeff Z. Pan (University of Aberdeen)
Contributors: Honghan Wu (University of Aberdeen), Yuan Ren (University of Aberdeen), Panos Alexopoulos (iSOCO)
2. Jeff Z. Pan (University of Aberdeen)
Agenda – PART I: LINKED DATA & KNOWLEDGE GRAPHS
1:00pm – 1:20pm  Overview & Applications
1:20pm – 1:35pm  Example Linked Data Knowledge Repositories
1:35pm – 1:45pm  The Current Status of Linked Data: the Good, the Bad and the Ugly
1:45pm – 2:00pm  Research Challenges
3. Agenda – PART II: METHODS & TECHNIQUES
2:00pm – 3:05pm  Constructing Knowledge Graphs
2:30pm – 2:45pm  Coffee Break
3:05pm – 3:40pm  Understanding Knowledge Graphs
3:40pm – 3:45pm  Outlook
4. PART I: LINKED DATA & KNOWLEDGE GRAPHS
• Overview
• Applications
• Linked Data Knowledge Repositories
• Knowledge Graph on Linked Data
• Research Challenges
5. Knowledge
• What is knowledge?
  • Something that is known
  • Structured information
  • About certain aspects of the (real) world
6. Semantic Networks
A semantic network is a graph structure for representing knowledge as patterns of interconnected nodes and arcs:
• nodes represent objects, concepts, or situations, and
• arcs represent relationships.
7. RDF: a Standard for Directed Labelled Graph KBs on the Web
• RDF is
  • a modern version of the semantic network, with formal syntax and semantics
  • a standard model for data interchange on the Web
• RDF statements are subject-property-value triples:
  [my-chair colour tan .]
  [my-chair rdf:type chair .]
  [chair rdfs:subClassOf furniture .]
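To make the triple model concrete, here is a minimal sketch in plain Python (no RDF library) holding the slide's three statements as subject-property-value tuples and matching them against a simple pattern, with `None` acting as a wildcard. This is an illustration of the data model only, not how real RDF stores work.

```python
# The slide's triples as subject-property-value tuples.
triples = [
    ("my-chair", "colour", "tan"),
    ("my-chair", "rdf:type", "chair"),
    ("chair", "rdfs:subClassOf", "furniture"),
]

def match(pattern, data=triples):
    """Return all triples matching the pattern; None matches anything."""
    return [t for t in data
            if all(p is None or p == v for p, v in zip(pattern, t))]

# Everything asserted about my-chair:
print(match(("my-chair", None, None)))
# [('my-chair', 'colour', 'tan'), ('my-chair', 'rdf:type', 'chair')]
```

Triple patterns of exactly this shape are the building blocks of SPARQL queries, discussed later in the tutorial.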
8. Linked Data and Knowledge Graphs
• Linked Data refers to (RDF) data published on the web
  • with its meaning explicitly defined using ontological (OWL) vocabulary
  • which can be inter-linked with external datasets
• A knowledge graph is a set of interconnected typed entities and their attributes
9. Knowledge Graph (KG) Services and Related Research Problems
• KG construction: how do we construct high-quality knowledge graphs?
  • Knowledge acquisition
  • Knowledge evaluation
• KG understanding: how do we make it easier to access and reuse knowledge?
  • for end users
  • for data engineers
• KG reasoning: how do we bridge the gap between the vocabulary used in the graphs and that used in queries?
  • Scalability
  • Efficiency
10. APPLICATIONS OF KNOWLEDGE GRAPHS
Summary of entities, faceted facts, from best to list, entity associations, structured queries, and question answering
11. ENTITY UNDERSTANDING: THINGS, NOT STRINGS
12. What is it? (Entity Understanding)
13. FACETED FACT: GETTING THE VALUE OF SOME ATTRIBUTE
14. What is the time there? (Faceted Fact)
15. FROM BEST TO LIST: NOT ONLY THE BEST
16. Give a List instead of the Best
17. ENTITY ASSOCIATION: SHOW THE CONNECTIONS
18. How are they connected? (Entity Association)
Gong Cheng, Yanan Zhang, and Yuzhong Qu. Explass: Exploring Associations between Entities via Top-K Ontological Patterns and Facets. In Proc. of ISWC 2014, pp. 422–437.
http://ws.nju.edu.cn/explass/
19. STRUCTURED QUERIES: EVEN WHEN THE INPUTS ARE KEYWORDS
20. From Keywords to Structured Queries
Wang, Haofen, Kang Zhang, Qiaoling Liu, Thanh Tran, and Yong Yu. Q2Semantic: A lightweight keyword interface to semantic search. In Proc. of ESWC 2008, pp. 584–598.
“Capin SVG”: find specifications about “SVG” whose author’s name is “Capin”
21. QUESTION ANSWERING: COMPUTE ANSWERS WITH THE KG
22. Question Answering
Christina Unger, Lorenz Bühmann, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, and Philipp Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st International Conference on World Wide Web, pp. 639–648. ACM, 2012.
“films starring Brad Pitt”
23. SAMPLE LINKED DATA KNOWLEDGE REPOSITORIES
DBpedia, Wikidata, GoodRelations
24. DBpedia
• A crowd-sourced community effort to extract structured information from Wikipedia
  • allows users to ask structured queries against Wikipedia
  • and to link other datasets on the Web to Wikipedia data
25. DBpedia – the Content
Entities and their attributes from Wikipedia: infobox templates, categorisation information, images, geo-coordinates, etc.
Classification schemas:
• Wikipedia categories are represented using the SKOS vocabulary and DCMI terms.
• The YAGO classification is derived from the Wikipedia category system using WordNet.
• WordNet synset links were generated by manually relating Wikipedia infobox templates and WordNet synsets.
The DBpedia 2014 release consists of 3 billion RDF triples.
26. DBpedia – Services
• SPARQL endpoint: http://dbpedia.org/sparql
• Query builders (e.g. the Leipzig query builder at http://querybuilder.dbpedia.org)
• Public faceted web service interface
• Dump downloads
  • DBpedia dumps in 125 languages at the DBpedia download server
  • DBpedia Ontology
27. DBpedia – Use Cases
• Nucleus for the Web of Data
• Revolutionise access to Wikipedia information:
  “Give me all cities in New Jersey with more than 10,000 inhabitants”
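The slide's example question maps naturally to a SPARQL query against the DBpedia endpoint. The sketch below only builds the query string; the `dbo:` property names (`dbo:isPartOf`, `dbo:populationTotal`) are assumptions about the DBpedia ontology and may differ from the actual release, so treat this as an illustration of the shape of such a query rather than a working recipe.

```python
# Build a SPARQL query for "cities in <region> with population above <n>".
# The dbo: property names are assumed, not verified against DBpedia.
def cities_query(region_uri, min_population):
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city WHERE {{
  ?city a dbo:City ;
        dbo:isPartOf <{region_uri}> ;
        dbo:populationTotal ?pop .
  FILTER (?pop > {min_population})
}}""".strip()

q = cities_query("http://dbpedia.org/resource/New_Jersey", 10000)
print(q)
```

Such a query string could then be POSTed to http://dbpedia.org/sparql to get the answer set.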
28. Wikidata
• A collaboratively edited knowledge base operated by the Wikimedia Foundation
• Can be read and edited by both humans and machines
• Acts as central storage for the structured data of its Wikimedia sister projects, including Wikipedia, Wikivoyage, Wikisource, and others
29. Wikidata – the Content
Wikidata is document-oriented, focused around topics (items).
• Information is added to items by creating statements (key-value pairs)
30. Wikidata – to the Linked Data Web (1)
Fredo Erxleben, Michael Günther, Markus Krötzsch, Julian Mendez, and Denny Vrandečić. Introducing Wikidata to the Linked Data Web. In Proc. of ISWC 2014, pp. 50–65.
Exporting statements as triples:
• Faithful representations: with additional qualifiers and references
• Simplified representations: without additional qualifiers and references
31. Wikidata – to the Linked Data Web (2)
Fredo Erxleben, Michael Günther, Markus Krötzsch, Julian Mendez, and Denny Vrandečić. Introducing Wikidata to the Linked Data Web. In Proc. of ISWC 2014, pp. 50–65.
Extracting schema information from Wikidata:
• instance of (P31) → rdf:type and subclass of (P279) → rdfs:subClassOf
• constraints on the use of properties → OWL axioms
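The first mapping on the slide is mechanical, and a toy version fits in a few lines. The sketch below assumes Wikidata statements arrive as simple (item, property, value) tuples, which is a simplification of the real export format; the item IDs in the example (Q42, Q5) are illustrative.

```python
# Map Wikidata's schema-level properties to their RDF(S) counterparts,
# as described on the slide; all other properties pass through unchanged.
PROPERTY_MAP = {"P31": "rdf:type", "P279": "rdfs:subClassOf"}

def statements_to_triples(statements):
    return [(s, PROPERTY_MAP.get(p, p), o) for s, p, o in statements]

stmts = [("Q42", "P31", "Q5"),        # "instance of" becomes rdf:type
         ("Q5", "P279", "Q215627")]   # "subclass of" becomes rdfs:subClassOf
print(statements_to_triples(stmts))
```

The paper's full export also handles qualifiers and references, which this sketch deliberately omits.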
32. Wikidata – Use Cases & Data Access
Use cases:
• Information about the sources helps support the notion of verifiability
• Collecting structured data: allows easy reuse of that data
• Support for Wikimedia projects: reducing the workload in Wikipedia and increasing its quality
• Support well beyond that: everyone can use Wikidata
Accessing the data:
• MediaWiki Lua Scribunto interface
• Wikibase/API
• RDF dump: http://tools.wmflabs.org/wikidata-exports/rdf/exports/20141013/
33. GoodRelations
GoodRelations is a lightweight ontology for annotating offerings and other aspects of e-commerce on the Web.
[Slide credit: Martin Hepp]
34. GoodRelations – Use Cases
[Slide credit: Martin Hepp]
35. GoodRelations – Use Cases (2)
• Rich snippets: search engines use your markup to augment the preview of your site. Google, Bing, Yahoo, and Yandex will improve the rendering of your page directly in the search results.
• Targeted searching: matches the profile and preferences of the person behind the query.
36. GoodRelations – Who Is Using It
• Search engines, and 10,000+ small and large shops
• Publishers
• Software: OpenLink (Virtuoso)
37. CURRENT STATUS OF ONLINE LINKED DATA
The good, the bad and the ugly
38. The Good: a Flexible Linked Data Eco-system
• RDF / OWL
  • Flexible schema setting: schemaless -> simple schema -> rich schema
  • Universal unique IDs for data entities: URIs
• Ontology mapping
  • Shared vocabularies
  • Schema mapping
• Data linkage
  • Instance mapping
• Querying and reasoning techniques
  • SPARQL entailment regimes
  • Distributed SPARQL endpoints
39. The Good
• A flexible linked data eco-system
• Facilities for sharing and linking knowledge in an open environment
• Knowledge representation at various levels of expressive power
• Services, tools, and approaches for knowledge generation, understanding, and consumption
• Interlinked knowledge repositories across various domains
40. The Bad
• Knowledge quality (errors, provenance, quantifiers, freshness, …)
• Data protection (licensing, access control)
• Data business models
41. The Ugly
• Linked data excels at knowledge representation
  • but a large number of datasets are missing schema information
• RDF is a triple-based model
  • but it is hard and time-consuming (even for Semantic Web geeks) to understand an RDF knowledge repository
42. RESEARCH CHALLENGES
43. Research Challenges
• KG construction
  • Ontology / schema construction
  • Data lifting
  • Quality evaluation
• Understanding KGs
  • User understanding
  • Data understanding
• Dynamic knowledge in KGs
  • Stream data / prediction
  • Belief revision
• Intelligent services for KGs
  • Ontology reasoning (see my tutorial at ISWC 2014)
  • Problem solving / workflow
44. Challenges in Automatic Construction
• Incompleteness of data: is the constructed schema generic enough to accommodate new data?
• Inconsistency of data: what if data conflict with each other?
E.g. birthdates of people: some people may not have a birthdate asserted in the dataset; should the schema specify that each person has a birthdate? Some people may have different birthdates asserted in different datasets; should the schema specify that the birthdate is unique?
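The birthdate example can be made concrete with a small check that is not from the tutorial itself but illustrates the inconsistency problem: merge the assertions from several datasets and flag any person with more than one distinct birthdate. All names and dates below are made up.

```python
from collections import defaultdict

def birthdate_report(datasets):
    """Collect birthdates per person across datasets; flag conflicts."""
    dates = defaultdict(set)
    for dataset in datasets:
        for person, date in dataset.items():
            dates[person].add(date)
    return {person: ("conflict" if len(seen) > 1 else "ok")
            for person, seen in dates.items()}

ds1 = {"alice": "1980-01-01", "bob": "1975-05-05"}
ds2 = {"alice": "1980-01-02"}        # disagrees with ds1 about alice
print(birthdate_report([ds1, ds2]))  # {'alice': 'conflict', 'bob': 'ok'}
```

Whether the schema should then enforce uniqueness (e.g. via an OWL functional property) is exactly the design question the slide raises; a functional property would make the merged data above logically inconsistent rather than merely suspicious.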
45. Challenges in Manual Construction
• Expertise of ontology engineers: do the engineers have sufficient understanding and experience of ontology technologies (RDF(S), OWL, SPARQL, RIF, etc.)?
• Workload of ontology engineers: how much time does it take to manually construct a large ontology? E.g. SNOMED CT has about 400,000 concepts.
• Collaboration: when multiple ontology engineers work together, how do we make sure they have a consistent understanding of the ontology?
46. Challenges for both Automatic and Manual Construction
• Requirements and evaluation: how do we specify the requirements of ontology construction and test whether they have been fulfilled?
• Expressiveness vs. efficiency: which knowledge representation should we use? Is it sufficient to describe the domain? Are efficient reasoning and query answering mechanisms and systems available?
• Ontology reuse: do we have to construct everything from scratch? Is there an ontology available that partially covers the domain?
47. Challenges in Data Lifting
Data lifting enriches unstructured data with structural annotations, thereby extracting the entities and their relations and properties for the knowledge graph.
Key challenges:
• Entity identification: certain entities can be hard to identify, e.g. movie titles
• AVP (attribute-value pair) identification: an entity, attribute and its value may be scattered across the text or dataset, making it hard to establish the relation
48. Challenges in Entity Identification
• There are different ways to identify entities: e.g. “The President of the U.S.” and “Barack Obama”
• The same name can refer to different entities
• People may use acronyms or abbreviations for entities: e.g. “K-Drive” is the acronym for the “Knowledge-Driven Data Exploitation” project, not the drive labelled K in my computer
• Natural language text may have typos, and values may use different notations
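One small piece of this problem, resolving known surface forms (aliases, acronyms) to canonical entities, can be sketched as a lookup table. This toy example is not a real disambiguation system: it ignores context entirely, which is precisely why the evidence-model approach later in the tutorial is needed. All entries are illustrative.

```python
# Toy alias table mapping lowercased surface forms to canonical entity IDs.
ALIASES = {
    "the president of the u.s.": "Barack_Obama",   # true at the time of this 2014 tutorial
    "barack obama": "Barack_Obama",
    "k-drive": "Knowledge-driven_Data_Exploitation",
}

def resolve(mention):
    """Return the canonical entity for a mention, or None if unknown."""
    return ALIASES.get(mention.strip().lower())

print(resolve("K-Drive"))        # Knowledge-driven_Data_Exploitation
print(resolve("unknown thing"))  # None: needs contextual disambiguation
```

Unknown or ambiguous mentions fall through to `None`; a real system must then consult context, which is where the contextual hypothesis and evidence models discussed later come in.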
49. Challenges in Data Understanding
• Users are unfamiliar with the content of knowledge graphs:
  • What is the vocabulary?
  • What is described by the knowledge graph?
  • How is the content organised?
  • How is it connected to the other datasets I have?
• Users do not know how to exploit the knowledge graph:
  • Which queries can I ask this knowledge graph?
  • Which queries can be answered with this knowledge graph?
50. Challenges in Knowledge Dynamics
• Validity of knowledge: is a piece of information permanent or temporary?
• Representation: e.g. how to represent the temporal dependency of knowledge, e.g. “George W. Bush was the president of the U.S. until Barack Obama became the president.”
• Updating the knowledge graph: when and how do we retract a previously unknown mistake from the knowledge graph? Which knowledge should become obsolete after the current update?
• Querying: how to query with respect to the temporal properties of knowledge, e.g. “Who was the last president of the U.S.?”
• Predicting the dynamics: which change is likely to occur given the history of the knowledge graph?
51. Challenges in Intelligent Services
The large amount of information and its interconnections in a knowledge graph can be used to provide intelligent services; e.g. reasoning can be used to discover hidden relations in a knowledge graph.
Key challenges:
• Efficiency of the services: knowledge graphs are usually accessed by multiple users in real time, so efficiency is crucial to the quality of service.
• Scalability of the services: knowledge graphs are usually large-scale, while even basic reasoning services, e.g. transitive closure, can consume large amounts of time and resources.
52. Agenda – PART II: METHODS & TECHNIQUES
2:00pm – 3:05pm  Constructing Knowledge Graphs
2:30pm – 2:45pm  Coffee Break
3:05pm – 3:40pm  Understanding Knowledge Graphs
3:40pm – 3:45pm  Outlook
53. CONSTRUCTING KNOWLEDGE GRAPHS
• Test-driven ontology construction
  • Methodology
  • A Protégé plug-in
• Handling entity disambiguation
  • Approach
  • Some evaluation results
• Bridging requirements and authoring tests
  • Competency questions as informal requirement specifications
  • Some evaluation results
54. Uschold & King’s (1995) Methodology for Ontology Construction
• Key steps: capturing, coding, integrating and evaluating/testing
• Ontology evaluation/testing:
  • make a technical judgment of the ontologies
  • with respect to a frame of reference
• A frame of reference can be:
  • requirement specifications
  • competency questions
  • or the real world
55. Ontologies and Tests
• Uschold & King’s methodology
  • Test the ontology after the axioms are written
• Test-driven ontology authoring
  • Write authoring tests before writing axioms
  • Writing authoring tests before the axioms takes no more effort than writing them afterwards
  • Forces authors to think about requirements before writing axioms
  • Writing authoring tests first helps authors detect and remove errors sooner
  • Helps authors understand how good an existing/reused ontology is
56. Gruninger & Fox’s (1995) Methodology
Key steps:
1. Motivating scenarios
2. Informal competency questions
3. FOL terminology (classes, properties, objects)
4. Formal competency questions (2 -> 4?)
5. FOL axioms
6. Completeness theorems (defining the conditions under which the solutions to the questions are complete)
57. The METHONTOLOGY (2003) Methodology
• Key steps:
  1. specification of requirements
  2. terminology with tabular and/or graph notations
  3. formalisation with a logic-based ontology language
  4. maintenance (including evaluation/testing)
• Ontology evaluation/testing:
  • checking consistency, completeness, redundancy
58. The DKAP (2007) Methodology
• Key steps:
  1. determine the domain and scope
  2. check the availability of existing ontologies
  3. collect and analyse data for knowledge extraction
  4. develop an initial ontology
  5. refine and validate the ontology
• Ontology validation/testing:
  • consistency and accuracy checking
59. Limitations of Existing Methodologies
• Methodology level: lack of detail about the transitions
  • from requirements to tests
  • from requirements to terminology
  • from terminology to axioms
• Tool level: lack of tools to guide the above transitions
60. An Approach to Test-Driven Ontology Authoring (presented in an invited talk at BMIR, Stanford University, June 2013)
• An ontology contains not only OWL files, but also a test suite
• A test suite contains a set of tests as SPARQL 1.1 queries
  • not all requirements can be represented in SPARQL 1.1, though
• Ontology reuse
  • check the associated test suite before reusing an ontology, to better understand the original intention
• Collaborative ontology authoring
  • all authors agree upon a common test suite
  • each author can have an extra test suite locally
61. Authoring Tests
A test suite contains a set of tests (Test 1, Test 2, …). Each test pairs a SPARQL 1.1 query with its expected results; a reasoner evaluates the query against the ontology to produce the actual results, and comparing expected with actual results yields pass or fail.
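The pass/fail loop the slide describes can be sketched in a few lines. The query engine is stubbed out here (a real implementation would evaluate SPARQL 1.1 over the ontology via a reasoner such as TrOWL); the test names and entity names are invented for illustration.

```python
# Run each authoring test: compare the engine's actual results
# against the expected results recorded with the test.
def run_suite(tests, run_query):
    results = {}
    for name, (query, expected) in tests.items():
        actual = run_query(query)
        results[name] = "pass" if set(actual) == set(expected) else "fail"
    return results

# Stub standing in for SPARQL evaluation over the ontology:
def toy_engine(query):
    return ["MargheritaPizza"] if "CheeseTopping" in query else []

tests = {"cheesy-pizzas": ("SELECT ?p ... CheeseTopping ...",
                           ["MargheritaPizza"])}
print(run_suite(tests, toy_engine))   # {'cheesy-pizzas': 'pass'}
```

Comparing result sets rather than lists reflects that SPARQL SELECT results are unordered unless an ORDER BY is given.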
62. A Protégé Plug-in for Authoring Tests (based on the TrOWL reasoner)
63. Loading the Manifest File
• A manifest file specifies the queries and expected results
• Run the reasoner to get the results for each test
• Click on a test to show its expected and actual results
64. Computing Justifications for Errors Related to Failed Tests
• with the justification plug-in (and reasoners such as TrOWL)
65. Modifying the Ontology
• so that CheeseTopping is no longer disjoint with VegetableTopping
66. Key Issues (to be revisited after the Entity Disambiguation part)
• Understanding the intention of ontology authors
• How to generate authoring tests?
• How to judge the quality of the authoring tests?
67. Entity Recognition and Disambiguation
• Challenges revisited:
  • There are different ways to identify entities: e.g. “The President of the U.S.” and “Barack Obama”
  • The same name can refer to different entities
• Contextual hypothesis, used in many existing approaches:
  • terms with similar meanings are often used in similar contexts
  • The role of these contexts is typically played by already-annotated documents (e.g. Wikipedia articles), which are used to train term classifiers
68. Alternative Context: the Evidence Model
• Idea: identify semantic entities that may serve as disambiguation evidence for the scenario’s target entities
69. Evidence Model Construction (Manual)
• Identify the target concepts whose instances we wish to disambiguate (e.g. locations)
• Determine the related concepts whose instances may serve as contextual disambiguation evidence
  • For example, in texts that describe historical events, some concepts whose instances may act as location evidence are related locations, historical events, and historical groups and persons
• Identify, for each pair of evidence and target concepts, the relation paths that link them
70. Evidence-Target Paths
71. Term Extraction (Automatic)
Extraction is performed with Knowledge Tagger (from iSOCO), based on GATE.
72. Evaluation Results: Football Match Scenario
• 50 texts describing football matches
• E.g. “It's the 70th minute of the game and after a magnificent pass by Pedro, Messi managed to beat Claudio Bravo. Barcelona now leads 1-0 against Real.”
73. Evaluation Results: Military Conflict Scenario
• 50 historical texts describing military conflicts
• E.g. “The Siege of Augusta was a significant battle of the American Revolution. Fought for control of Fort Cornwallis, a British fort near Augusta, the battle was a major victory for the Patriot forces of Lighthorse Harry Lee and a stunning reverse to the British and Loyalist forces in the South.”
74. Future Work
• Fully automated construction of the disambiguation evidence model
  • The challenge here is how to automatically identify the text’s domain/topic
• Combination with statistical methods for cases where the available domain semantic information is incomplete
  • The challenge here is how to select the optimal ratio of ontological to statistical evidence
• Development of a tool to enable users to dynamically build such models out of existing semantic data and use them for disambiguation purposes
75. Issues in Test-Driven Ontology Authoring
1. How to generate tests
2. How to judge the quality of tests
  • why they are relevant
  • how to provide the correct expected answers
76. Requirement-Driven?
• How about starting from requirements instead of tests?
Ontology Authoring Requirements -> Ontology Authoring Tests -> Test Results
77. Requirement-Driven Ontology Authoring [Ren et al., 2014]
• Key questions
  • RQ1: what forms of requirements should we consider?
  • RQ2: how do we generate authoring tests from requirements?
78. Competency Questions
• Questions that people expect the constructed ontologies to answer
• Useful for novice users
  • in natural language
  • about domain knowledge
  • require little understanding of ontology technologies
• A typical CQ: Which pizza has some cheese topping?
79. RQ1: What Forms of Requirements Should We Consider?
Competency questions (CQs) can be regarded as functional requirements of the ontology.
RQ1': How are CQs formulated?
80. Key Idea 1: Identification of CQ Patterns
• A typical CQ: Which pizza has some cheese topping?
• Hypothesis: CQs usually have clear syntactic patterns
• Features and elements can be extracted
  • Features: the type of question; the binary predicate
  • Elements: class expression CE1, object property expression OPE, class expression CE2
  • giving the pattern: [CE1] [OPE] [CE2]
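To show what "clear syntactic pattern" means in practice, here is a rough sketch that recognises one archetype, "Which [CE1] [OPE] some [CE2]?", with a single regular expression. This is illustrative only; the actual work derives a whole catalogue of archetypes, and real CQs need more robust parsing than one regex.

```python
import re

# One illustrative CQ archetype: "Which [CE1] [OPE] some [CE2]?"
PATTERN = re.compile(r"^Which (\w+) (\w+) some (.+?)\?$", re.IGNORECASE)

def parse_cq(cq):
    """Extract the CE1/OPE/CE2 elements from a CQ, or None if no match."""
    m = PATTERN.match(cq)
    if not m:
        return None
    ce1, ope, ce2 = m.groups()
    return {"CE1": ce1, "OPE": ope, "CE2": ce2}

print(parse_cq("Which pizza has some cheese topping?"))
# {'CE1': 'pizza', 'OPE': 'has', 'CE2': 'cheese topping'}
```

The extracted elements are exactly what the presupposition-based authoring tests later in the tutorial operate on.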
81. Result 1: A Feature-based Framework for CQ Formulation
• Based on CQs collected from the Software Ontology Project (75 CQs) and the Manchester OWL workshops (70 CQs)
• Primary features -> CQ archetypes
• Secondary features -> CQ subtypes
Features and their values:
• Question type: Selection, Boolean, Counting
• Element visibility: Explicit, Implicit
• Predicate arity: Unary, Binary, N-ary
• Relation type: Object, Datatype
• Modifier: Quantity, Numeric
• Domain-independent element: Spatial, Temporal
• Question polarity: Positive, Negative
82. Result 2: Archetypes of CQ Patterns
83. Answerability of CQs
• A typical CQ: Which pizza has some cheese topping?
• Existing work has focused on answering CQs directly
  • But is the answer meaningful?
• The ability to answer CQs meaningfully can be regarded as a functional requirement of the ontology
• What if the answer is an empty set? Possible scenarios:
  • Pizza does not exist
  • Cheese topping does not exist
  • Pizzas are not allowed to have cheese topping
  • The ontology has not been populated with any cheesy pizza yet
  • …
84. RQ2: How Do We Generate Authoring Tests from Requirements?
RQ2': How can we automatically test whether a CQ can be meaningfully answered?
85. Key Idea 2: Presuppositions of CQs
• A typical CQ: Which pizza has some cheese topping?
• A CQ comes with certain presuppositions
  • some conditions the speakers assume to be met
• A CQ can be meaningfully answered only when its presuppositions are satisfied:
  • The classes Pizza and CheeseTopping should occur in the ontology
  • The property has(Topping) should occur in the ontology
  • The ontology should allow Pizza to have CheeseTopping
  • The ontology should also allow Pizza to not have CheeseTopping
86. CQs and Authoring Tests
• A typical CQ: Which pizza has some cheese topping? (pattern: [CE1] [OPE] [CE2])
• The satisfiability of CQ presuppositions can be verified by authoring tests generated from its features and elements:
  • The classes Pizza and CheeseTopping should occur in the ontology
    • [CE1] and [CE2] should both occur in the class vocabulary
  • The property has(Topping) should occur in the ontology
    • [OPE] should occur in the property vocabulary
  • The ontology should allow Pizza to have CheeseTopping
    • [CE1 and OPE some CE2] should be satisfiable
  • The ontology should also allow Pizza to not have CheeseTopping
    • [CE1 and not (OPE some CE2)] should be satisfiable
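The vocabulary-occurrence presuppositions are mechanical to check once the CQ's elements are known; the two satisfiability presuppositions need an OWL reasoner and are only named here. The sketch below is an assumed shape for such a generator, not the paper's implementation, and the vocabulary sets are illustrative.

```python
# Generate the mechanically checkable presupposition tests for a CQ
# whose elements (CE1, OPE, CE2) have already been extracted.
def presupposition_tests(ce1, ope, ce2, classes, properties):
    return {
        f"'{ce1}' occurs in the class vocabulary": ce1 in classes,
        f"'{ce2}' occurs in the class vocabulary": ce2 in classes,
        f"'{ope}' occurs in the property vocabulary": ope in properties,
    }
    # The remaining two presuppositions, satisfiability of
    # [CE1 and OPE some CE2] and [CE1 and not (OPE some CE2)],
    # would be delegated to an OWL reasoner.

onto_classes = {"Pizza", "CheeseTopping"}
onto_props = {"hasTopping"}
print(presupposition_tests("Pizza", "hasTopping", "CheeseTopping",
                           onto_classes, onto_props))
```

A failing vocabulary check already tells the author something concrete: the CQ mentions a term the ontology does not yet define.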
87. Result 3: Associating Presuppositions with Features
• The features in a CQ are associated with the presuppositions of the CQ
• An example for the question type feature: Selection (“Which pizza contains pork?”), Boolean (“Can pizza contain pork?”) and Counting (“How many pizzas contain pork?”) questions all presuppose
  • the occurrence of “Pizza”, “Pork”, “contains”
  • that some pizza can contain pork
  • that some pizza can contain no pork
88. Result 4: Formal Authoring Tests
• All the testing can be automated
89. The WhatIf Gadget
Components: class hierarchy, verbaliser, competency questions, user/system dialogue history, user input.
90. Input (Manchester Syntax)
1. The user selects a speech act by clicking or selecting a shortcut.
2. We need to evaluate their usefulness.
3. Examples:
  • Class: Pizza SubClassOf: Food
  • Class: Fruit DisjointWith: Pizza
91. Input (OWL Simplified English)
1. A set of restricted natural language patterns.
2. The system recognises the speech act.
3. Capable of accepting competency questions.
4. Examples:
  • Which pizza has topping a tomato topping?
  • An apple is a fruit.
92. Modelling User Goals (1)
1. Users can import or write their own CQs in OWL Simplified English.
2. Based on the inserted CQ, a list of authoring tests (ATs) is generated.
3. A tree structure displays these CQs and ATs.
4. The system constantly monitors these CQs and ATs. Any change in the satisfiability of an AT:
  a. is reported by changing the icon of the AT in the tree; red/green respectively represent fail/pass for each AT
  b. is reported in the “history” pane
93. Modelling User Goals (2)
• A hierarchical CQ + AT representation; icons represent the satisfiability state
• Written feedback is presented to the user in the “history” pane
94. Further Challenges
• Maintaining a continuous and meaningful interaction with the user
• Generating a coherent and comprehensive set of entailments in response to What-If questions
  • Content selection
  • Grouping and aggregation
  • Ordering
95. UNDERSTANDING KNOWLEDGE GRAPHS
• Data understanding
  • Data summarisation
  • Query generation
96. Data Understanding: a Core Activity in Data Exploitation
• Traditional focus in semantic web research: data understanding for machines and programs
• More important: data understanding for humans
  • humans are the ultimate owners and consumers of data
  • in systems such as knowledge graphs, Watson, Siri, etc.
  • to help human users understand the contents, implications and applications of data
• More than HCI: we want interesting and insightful data!
97. Semantic Datasets Are HARD to Understand
• Non-expert users might not be familiar with RDF, OWL and SPARQL
  • RDF(S) has 6 core documents
  • OWL 2 has 6 core documents
  • SPARQL 1.1 has 11 core documents
• Users are unfamiliar with datasets
  • that are too large to explore
  • that are external to their organisation
  • …
• It is HARD for novice users to construct queries
98. Challenges of Data Understanding
• Expressing needs (keywords/SPARQL)
• Describing datasets
• Retrieving only the relevant parts
  • 9.96% SPARQL / 8.19% DUMP
99. Solution: Summary-based Profiling for Linked Data
• Key idea: building-block-based information space modelling
  • Decomposing & constructing
100. The Philosophy of Interpreting Information
• Task: explain the data to human users
• Entity-centric
101. Entity-centric View of RDF Data
• Entity Description Block
102. From Concrete to Abstract
• Entity Description Pattern (EDP)
103. Data Summarisation – the EDP Graph
• Reveals the schema-level information
  • What concepts are there (nodes), and how are they related to each other (edges)?
• Discloses the individual-level distribution
  • Statistics attached to nodes and edges
• Example: the Jamendo dataset
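The core of the summarisation idea can be sketched as follows: abstract each entity to the set of classes and properties it uses (its entity description pattern), so that entities sharing the same pattern collapse into a single summary node. This is a simplified reading of the approach; the example triples are invented, loosely in the style of a music dataset like Jamendo.

```python
from collections import defaultdict

def edps(triples):
    """Group entities by their entity description pattern (EDP):
    the set of classes and properties each entity uses."""
    desc = defaultdict(lambda: {"classes": set(), "properties": set()})
    for s, p, o in triples:
        if p == "rdf:type":
            desc[s]["classes"].add(o)
        else:
            desc[s]["properties"].add(p)
    groups = defaultdict(list)
    for entity, d in desc.items():
        groups[(frozenset(d["classes"]), frozenset(d["properties"]))].append(entity)
    return dict(groups)

data = [("track1", "rdf:type", "mo:Track"), ("track1", "dc:title", "Song A"),
        ("track2", "rdf:type", "mo:Track"), ("track2", "dc:title", "Song B")]
print(len(edps(data)))   # 1: both tracks collapse into a single EDP
```

Attaching counts to each group (how many entities share the pattern) gives the individual-level statistics the slide mentions.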
104. Understanding Data Redundancy [Wu et al., 2014]
105. Related Paper at JIST2014
• Graph Pattern based RDF Data Compression. Jeff Z. Pan, Jose Manuel Gomez-Perez, Yuan Ren, Honghan Wu, Haofen Wang and Man Zhu. (Monday afternoon)
106. Understanding How Data Can Be Used
• Given a knowledge graph, generate candidate insightful queries
  • Manual vs. automatic generation
  • Generation based on the schema vs. the actual data
  • With or without user interference
• Our aim: automatic generation based on data, without user interference
  • Most friendly to new, novice users
  • Complementary to inference (which relies heavily on the schema)
107. Candidate Insightful Queries [Pan et al., 2013]
• Graph patterns are summarisations that represent many subsets of the RDF graph
• Pattern structure
  • Structured knowledge that is difficult to express with a schema
  • Such as stars, chains, trees, loops
• Correspondences between multiple graph patterns
  • Strongly corresponding patterns (large overlap)
  • Weakly corresponding patterns (little overlap)
  • Exceptions
108. Query Generation Framework
1. Data summarisation
  • significantly decreases the search space for rule mining
2. Data analytics
  • first-order inductive learning
  • association rule mining
3. Query generation
  • exploiting the relations between queries and rules
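An illustrative sketch of how a mined rule can turn into an insightful query, using made-up data: suppose mining finds the rule "entities that teach also work somewhere". Its confidence measures how often the rule holds, and the entities violating it (the exceptions) are themselves query-worthy. This is a toy illustration of the idea, not the paper's algorithm.

```python
# Confidence of the rule "has property a => has property b",
# together with the exceptions that violate it.
def rule_confidence(triples, a, b):
    with_a = {s for s, p, _ in triples if p == a}
    with_b = {s for s, p, _ in triples if p == b}
    return len(with_a & with_b) / len(with_a), with_a - with_b

data = [("x", "teaches", "c1"), ("x", "worksFor", "u1"),
        ("y", "teaches", "c2"), ("y", "worksFor", "u2"),
        ("z", "teaches", "c3")]            # z teaches but works for no one
conf, exceptions = rule_confidence(data, "teaches", "worksFor")
print(conf, exceptions)                    # confidence 2/3, exceptions {'z'}
```

A high-confidence rule with a small exception set suggests the query "who teaches but works for no one?", which is exactly the kind of candidate insightful query the framework aims to surface.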
109. Evaluation
110. Another Example
• Given the university data set in LUBM, the following two queries have the same results (when no reasoning is applied)
111. Summary and Future Work
• Take-home message
  • Data summarisation and data analytics technologies not only help people find answers, but also help people ask questions!
• Future work
  • Integrate with application scenario background knowledge
  • Integrate with reasoning
  • Integrate with user preferences
112. OUTLOOK
Outlook for knowledge graphs, from the application point of view
113. Outlook
What knowledge graphs are good at:
• Maintaining factual knowledge in a structured manner and answering queries about it
What knowledge graphs still need:
• “How to …” knowledge in addition to “What is …” knowledge
• Operations associated with the entities
114. JIST2014 Tutorial on Constructing and Understanding Knowledge Graphs
Thank you! Questions?