Presentation given by Maria Theodoridou of FORTH-ICS at the ARIADNE winter school on experiments carried out within ARIADNE to improve the interoperability and reusability of archaeological datasets. The CIDOC CRM, with a set of extensions, has been used as the reference model within ARIADNE.
Maria Theodoridou Semantic Integration Experiments
1. ARIADNE is funded by the European Commission's Seventh Framework Programme
Semantic Integration experiments
Improving Interoperability and Reusability
Unlocking the Potential of Digital Archaeological Data
Florence, 15 December 2016
Maria Theodoridou, FORTH-ICS, Greece
2. The challenge
Build an Integrated Knowledge Repository and support innovative reasoning on archaeological datasets (relating and combining data) preserving the original meaning and the perspective of the different data providers.
Two main pillars:
• a global, extensible schema in the form of a formal ontology that allows for integration without loss of meaning.
  – ARIADNE Reference Model = CIDOC CRM + Extension Suite
• Common vocabularies/terminologies
  – Use of well established standard terminologies
  – Getty AAT
  – Nomisma.org
4. Case Studies
• Numismatics
  – traditional science with experience and initiatives in standardization, so it was chosen as a very good starting point for item-level integration
  – Nomisma.org serves as an authoritative resource
• Wood/Dendrochronology
  – integration of information from diverse datasets and (via NLP) archaeological reports in different languages
  – Getty AAT serves as an authoritative resource
• Sculptures
  – data integration of sources from various disciplines, including sculpture information and its archaeological context
  – focuses on the provenance of information according to bibliographic references, which leads to advanced literature research
5. Numismatics Case Study
Extracts of 5 diverse databases & datasets:
• OEAW: dFMRO coin archive, 72 records
• COINS Project: SAR Archive, 627 records
• COINS Project: FWM Archive
• iDAI Coins Pergamon, 517 records
• CulturaItalia: MuseiD-Italia, 25562 records
• NLP data from Heslington East Excavation Archive, 37 records
• ACDM records
7. Wood/Dendrochronology Case Study
• Extracts of 5 archaeological datasets, output from NLP on 25 grey literature reports
• Multilingual - English, Dutch and Swedish data
• Data integration via CIDOC CRM and Getty AAT
• 1.09 million RDF triples
• 23,594 records
• 37,935 objects
• Demonstration query builder for easier cross-search and browse of integrated datasets
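The role of a shared vocabulary in the multilingual integration listed above can be illustrated with a minimal sketch. It assumes that each provider records materials as free-text terms in its own language and that these terms are mapped to one shared concept. The concept identifiers, record ids and term table below are hypothetical placeholders, not real Getty AAT identifiers or project data.

```python
# Hypothetical concept identifiers standing in for Getty AAT concept URIs.
OAK = "aat:oak-wood"    # placeholder, not a real AAT identifier
PINE = "aat:pine-wood"  # placeholder, not a real AAT identifier

# Local material terms as they might appear in English, Dutch and Swedish data.
TERM_TO_CONCEPT = {
    "oak": OAK, "eik": OAK, "ek": OAK,
    "pine": PINE, "den": PINE, "tall": PINE,
}

def normalise(term):
    """Map a local term to a shared concept, or None if the term is unknown."""
    return TERM_TO_CONCEPT.get(term.strip().lower())

# Illustrative records from three imaginary providers.
records = [
    {"id": "uk-1", "material": "Oak"},
    {"id": "nl-7", "material": "eik"},
    {"id": "se-3", "material": "Ek "},
    {"id": "se-4", "material": "tall"},
]

def find_by_concept(records, concept):
    """Return the ids of all records whose material maps to the given concept."""
    return [r["id"] for r in records if normalise(r["material"]) == concept]

print(find_by_concept(records, OAK))
```

Searching by concept rather than by string is what lets one query reach English, Dutch and Swedish records at once.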
8. Wood/Dendrochronology Case Study
[Diagram: data integration workflow. Labels: Archaeological datasets (DCCD; ADS, DANS, SND; VAG cruck; NMS; VAG dendro; UNID); Grey literature; NLP; XML; tabular records; Getty AAT (RDF); Cleansing + Normalisation (OpenRefine); Transformation (STELETO); Transformation (XSLT); Direct import; RDF triple store; SPARQL queries; Demonstration application: Query Builder]
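The transformation step in this workflow, turning cleaned tabular records into RDF-style triples, can be sketched as follows. This is not STELETO or the project's actual mapping. The namespace, column names and `aat:` values are hypothetical, and the CIDOC CRM class and property names are written as prefixed strings for illustration only.

```python
import csv
import io

BASE = "http://example.org/wood/"  # hypothetical namespace for this sketch

# Assumed column-to-property template; the real mappings are not shown in the slides.
COLUMN_TO_PROPERTY = {
    "object_type": "crm:P2_has_type",
    "material": "crm:P45_consists_of",
}

def record_to_triples(row):
    """Turn one cleaned tabular record into (subject, predicate, object) triples."""
    subject = BASE + row["id"].strip()
    triples = [(subject, "rdf:type", "crm:E22_Man-Made_Object")]
    for column, predicate in COLUMN_TO_PROPERTY.items():
        value = (row.get(column) or "").strip()
        if value:  # skip empty cells instead of emitting blank objects
            triples.append((subject, predicate, value))
    return triples

# Illustrative input: two rows, the second with no material recorded.
TABLE = """id,object_type,material
t-001,aat:cruck,aat:oak
t-002,aat:beam,
"""

rows = list(csv.DictReader(io.StringIO(TABLE)))
triples = [t for row in rows for t in record_to_triples(row)]
for triple in triples:
    print(triple)
```

Keeping the mapping in a small template table, separate from the loop, means a new dataset only needs a new template rather than new code.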
9. Sculptures Case Study
• Extracts of 5 diverse databases & datasets:
  – Archaeological object database: Arachne
  – Field research databases: Athenian Agora, iDAI.field
  – Museum data: British Museum
  – Research data: Oxford Roman Economy Project
• Data integration via CIDOC CRM and controlled vocabularies: Getty AAT, Wikidata, Zenon, iDAI.gazetteer
• 5.44 million triples
• 58,343 records
11. Integration & Interoperability
[Diagram: ARIADNE integration architecture. Labels: Content Providers; Provider dataset descriptions; Provider records; NLP; NLP records; ARIADNE aggregation infrastructure; ACDM records; Catalog; Browse the Catalog; ARIADNE portal; X3ML Mapping Framework; mapping provider dataset records to CIDOC CRM; mapping ACDM records to CIDOC CRM; Integrated Knowledge Repository; Integrated Browse/Query Interface]
12. Integrated Knowledge Repository
Experimental integrated knowledge repository
• Numismatics Case Study: 1.2M triples
• Wood/Dendrochronology Case Study: 1.5M triples
• Sculptures Case Study: 5.5M triples
• AAT thesaurus: 4.4M triples
Total ~ 13M triples
Contains different levels of information:
• Item specific information
• Document research data
• NLP data
• Catalog information
Technologies used:
http://www.metaphacts.com/
https://www.blazegraph.com/
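As a quick check of the total on this slide, the four component figures can be summed. The numbers are taken from the slide and nothing else is assumed.

```python
# Triple counts in millions, as listed on the slide.
TRIPLES_MILLIONS = {
    "Numismatics Case Study": 1.2,
    "Wood/Dendrochronology Case Study": 1.5,
    "Sculptures Case Study": 5.5,
    "AAT thesaurus": 4.4,
}

total = sum(TRIPLES_MILLIONS.values())
print(f"{total:.1f}M triples, rounded to {round(total)}M")
```

The components add up to 12.6M, consistent with the stated total of roughly 13M triples.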
13. Research questions
• Query mechanisms support innovative reasoning on archaeological datasets
• Query power lies in relating and combining
  – data from different providers, preserving the original meaning and their perspective
  – data from grey literature reports
  – item level with catalog info on archaeological datasets
14. Research questions
• Find all bronze coins (item level info, retrieves datasets from multiple providers)
• Find the publishers of all collections that contain coins (catalog info)
• Find all datasets and grey literature reports that contain bronze antoninianus (item level, NLP data and catalog info)
[Figure: query results, labelled SAR records, NLP record, CulturaItalia records, DAI record, OEAW records, Catalog info]
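How questions of this kind combine item-level and catalog information can be sketched with a toy in-memory triple list and simple pattern matching. This stands in for SPARQL queries against the repository and is not the project's actual query code. All identifiers, the `ex:` properties and the publisher names are hypothetical.

```python
# Toy triples; every identifier and value here is an illustrative placeholder.
TRIPLES = [
    # Item-level information from three imaginary provider extracts.
    ("sar:coin1", "crm:P2_has_type", "type:coin"),
    ("sar:coin1", "crm:P45_consists_of", "material:bronze"),
    ("sar:coin1", "ex:part_of_dataset", "dataset:SAR"),
    ("oeaw:coin9", "crm:P2_has_type", "type:coin"),
    ("oeaw:coin9", "crm:P45_consists_of", "material:silver"),
    ("oeaw:coin9", "ex:part_of_dataset", "dataset:dFMRO"),
    ("dai:coin4", "crm:P2_has_type", "type:coin"),
    ("dai:coin4", "crm:P45_consists_of", "material:bronze"),
    ("dai:coin4", "ex:part_of_dataset", "dataset:iDAI"),
    # Catalog-level information about the datasets.
    ("dataset:SAR", "ex:publisher", "Publisher A"),
    ("dataset:dFMRO", "ex:publisher", "Publisher B"),
    ("dataset:iDAI", "ex:publisher", "Publisher C"),
]

def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in TRIPLES
        if (s is None or s == ts) and (p is None or p == tp) and (o is None or o == to)
    ]

def bronze_coins():
    """Item-level question: objects typed as coin that consist of bronze."""
    coins = {s for s, _, _ in match(p="crm:P2_has_type", o="type:coin")}
    bronze = {s for s, _, _ in match(p="crm:P45_consists_of", o="material:bronze")}
    return sorted(coins & bronze)

def publishers_of(items):
    """Catalog question: publishers of the datasets that hold the given items."""
    datasets = {o for item in items for _, _, o in match(s=item, p="ex:part_of_dataset")}
    return sorted(o for d in datasets for _, _, o in match(s=d, p="ex:publisher"))

print(bronze_coins())
print(publishers_of(bronze_coins()))
```

Because items and datasets sit in the same triple list, one query can move from an object to its dataset and on to the dataset's publisher, which is the combination the slide describes.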
15. Contributing partners
Achille Felicetti, PIN
Carlo Meghini, CNR-ISTI
Philipp Gerth, DAI
Ceri Binding, USW
Douglas Tudhope, USW
Andreas Vlachidis, USW
Nadezhda Kecheva, NIAM-BAS
Sara di Giorgio, ICCU
Edeltraud Aspoeck, OEAW
Anja Masur, OEAW
16. ARIADNE is a project funded by the European Commission under the Community’s Seventh Framework Programme, contract no. FP7-INFRASTRUCTURES-2012-1-313193.
The views and opinions expressed in this presentation are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission.
Thank you