This presentation by Achille Felicetti of PIN at the ARIADNE winter school describes the approach adopted in ARIADNE for the semantic integration of archaeological information. It outlines the challenges of integrating archaeological datasets that were created in various countries, with different research objectives, and with implicit knowledge built into the structure of the data. The CIDOC-CRM ontology is introduced and the benefits of using it as a reference framework for semantic integration are discussed.
Achille Felicetti - ARIADNE Semantic Integration of Archaeological Information
1. ARIADNE is funded by the European Commission's Seventh Framework Programme
ARIADNE: Semantic Integration of Archaeological Information
Achille Felicetti
VAST-LAB - PIN, Università degli Studi di Firenze, Italy
2. Integrating archaeological data
Museum DB: Items Table
ID     Item       Room   Showcase
35     Amphora    3      2
24     Coin       8      15
18     ...        ...    ...
Excavation DB: Artifacts Table
ID     Artifact   SU
1020   Coin       12
1021   ...        ...
1022   Amphora    13
Different archives, different data structures. Is integration possible?
3. Integrating archaeological data
The Museum DB Items table and the Excavation DB Artifacts table from slide 2 are shown again, with their columns annotated by two shared concepts: Object and Place.
4. Implicit knowledge: semantic relations
The two tables from slide 2 are shown again, with the relations implied by their structure made explicit:
Museum DB: Object - Stored In - Place
Excavation DB: Object - Found In - Place
5. Temporal entities
ID     Artifact   SU    Date   Period
1020   Coin       12    1981   V B.C.
1021   ...        ...
1022   Amphora    13    1974   III B.C.
Relations to Time: Object - Created, Object - Found
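The idea behind slides 2 to 5 can be sketched in a few lines of Python. This is only an illustration: the relation names (`stored_in`, `found_in`, `found_at_time`, `created_at_time`) and the `museum:` / `excavation:` prefixes are assumptions made for the example, not ARIADNE or CIDOC CRM identifiers, and the Date and Period columns are read here as the find date and the creation period.

```python
# Two source tables with different structures (values taken from the slides).
museum_items = [
    {"ID": 35, "Item": "Amphora", "Room": 3, "Showcase": 2},
    {"ID": 24, "Item": "Coin", "Room": 8, "Showcase": 15},
]
excavation_artifacts = [
    {"ID": 1020, "Artifact": "Coin", "SU": 12, "Date": 1981, "Period": "V B.C."},
    {"ID": 1022, "Artifact": "Amphora", "SU": 13, "Date": 1974, "Period": "III B.C."},
]


def museum_to_statements(row):
    """Make the implicit 'stored in' relation of the museum table explicit."""
    obj = f"museum:{row['ID']}"
    place = f"room {row['Room']}, showcase {row['Showcase']}"
    return [(obj, "is_a", row["Item"]), (obj, "stored_in", place)]


def excavation_to_statements(row):
    """Make the implicit 'found in' and time relations of the excavation table explicit."""
    obj = f"excavation:{row['ID']}"
    return [
        (obj, "is_a", row["Artifact"]),
        (obj, "found_in", f"SU {row['SU']}"),
        (obj, "found_at_time", str(row["Date"])),
        (obj, "created_at_time", row["Period"]),
    ]


statements = []
for row in museum_items:
    statements += museum_to_statements(row)
for row in excavation_artifacts:
    statements += excavation_to_statements(row)

# A single question now spans both archives.
coins = sorted(s for s, p, o in statements if p == "is_a" and o == "Coin")
print(coins)
```

Once both tables are expressed as subject, relation, object statements, the differing column layouts no longer matter to the query.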
6. The CIDOC CRM model
• The CIDOC Conceptual Reference Model
– A collaboration with the International Council of Museums
– An ontology of 92 classes and 142 properties for culture and more
– With the capacity to explain hundreds of (meta)data formats
– Accepted by ISO TC46 in September 2000
– International standard since 2006 - ISO 21127:2006
– To be revised 2014 (minor extensions)
• Serving as:
– Intellectual guide to create schemata, formats, profiles
– A language for analysis of existing sources for integration/mediation
– Identify elements with common meaning
– Transportation format for data integration / migration / Internet
7. What is a formal ontology made of?
• System of classes and relations that should describe some domain of discourse
– It entails no specific encoding
• Any ontological specification should contain at least:
• Scope: a definition of the intended field of discourse/reality that the formal ontology should cover
– e.g. Car Manufacturing, Cultural Heritage, Fashion
• Classes: universals meant to represent some set of entities in the world of discourse, that have a distinct, identifiable behaviour and identity
• Properties: the relations that exist between classes in the ontology. The relations formally define the possible propositions that can be made of instances of classes
8. Reading a formal ontology
• Formal ontologies are arranged hierarchically.
• The highest classes are the most abstract and define, with their properties, the highest levels of discourse within a domain.
• Scan for classes and relations that seem relevant to what you want to describe. Are they adequate?
• If you adopt a formal ontology, then the world you want to describe should largely be expressible under its high level terms and have specific enough terms to support minimally ambiguous discourse.
9. Anatomy of a Class
• The Label: arbitrary but identifying
• Subclass/Superclass: Place in IsA Hierarchy
• The Scope Note: gives the meaning, the intension. First thing to check!
• The Examples: help to verify… do others think/do it like you do?
• The Properties/Relations: more verification of appropriateness.
• How does it relate to other concepts? Is this how my concept behaves?
10. Anatomy of a Property
• The Label: arbitrary but identifying
• The Domain: the set of classes from which the property can originate
• The Range: the set of classes to which the property can join the domain class
• Superproperty/subproperty: Place in IsA Hierarchy
• The Scope Note: gives the meaning, the intension. First thing to check!
• The Examples: help to verify… do others think/do it like you do?
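The roles of domain, range and the IsA hierarchy can be illustrated with a small Python sketch. It uses a handful of CIDOC CRM class labels that appear in this presentation; the `Property` type and the helper functions are hypothetical names introduced for the example and are not part of any CIDOC CRM tooling.

```python
from dataclasses import dataclass
from typing import Optional

# Tiny IsA hierarchy: child class -> parent class. Only a small subset of
# CIDOC CRM is listed here; the real model is much larger.
SUPERCLASS = {
    "E9 Move": "E7 Activity",
    "E7 Activity": "E5 Event",
    "E5 Event": "E4 Period",
    "E4 Period": "E2 Temporal Entity",
    "E19 Physical Object": "E18 Physical Thing",
}


@dataclass(frozen=True)
class Property:
    label: str
    domain: str  # class from which the property can originate
    range_: str  # class to which the property can point


def is_a(cls: Optional[str], ancestor: str) -> bool:
    """True if cls equals ancestor or inherits from it through the IsA chain."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUPERCLASS.get(cls)
    return False


def allows(prop: Property, subject_class: str, object_class: str) -> bool:
    """A statement is admissible if both ends fit the property's domain and range."""
    return is_a(subject_class, prop.domain) and is_a(object_class, prop.range_)


took_place_at = Property("P7 took place at", "E4 Period", "E53 Place")

print(allows(took_place_at, "E9 Move", "E53 Place"))              # a Move is a Period
print(allows(took_place_at, "E19 Physical Object", "E53 Place"))  # an object is not
```

Because subclasses inherit the properties of their superclasses, a property declared once on a high-level class is available to every more specific class below it.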
11. The CIDOC CRM model
• Identification of real world items by real world names
• Observation and Classification of real world items
• Part-decomposition and structural properties of Conceptual and Physical Objects, Periods, Actors, Places and Times
• Participation of persistent items in temporal entities.
– Creates a notion of history: "world-lines" meeting in space-time.
• Location of periods in space-time and physical objects in space.
• Influence of objects on activities and products and vice-versa.
• Reference of information objects to any real-world item.
Official Documentation: http://cidoc-crm.org/official_release_cidoc.html
12. CIDOC CRM: top level classes
[Diagram] Classes: E39 Actors, E55 Types, E28 Conceptual Objects, E18 Physical Thing, E2 Temporal Entities, E41 Appellations, E53 Places, E52 Time-Spans
Relation labels: participate in; affect or / refer to; refer to / refine; refer to / identify; location; at; within
14. E2 Temporal Entities
E2 Temporal Entity
E5 Event E63 Beginning of Existence
E7 Activity
E69 Death
E6 Destruction
E87 Curation Activity
E83 Type Creation
E13 Attribute Assignment
E86 Leaving
E80 Part Removal
E79 Part Addition
Generalization
E64 End of Existence
E10 Transfer of Custody
E15 Identifier Assignment
E4 Period
E3 Condition State
E68 Dissolution
E81 Transformation
E67 Birth
E66 Formation
E65 Creation
E11 Modification
E9 Move
E8 Acquisition
E85 Joining
E12 Production
E17 Type Assignment
E14 Condition Assessment
E16 Measurement
15. CIDOC CRM approach
[Diagram] Classes: E18 Physical Thing, E19 Physical Object, E53 Place, E7 Activity, E9 Move, E4 Period, E55 Type
Properties:
P53 has former or current location (is former or current location of)
P54 has current permanent location (is current permanent location of)
P55 has current location (currently holds)
P25 moved (moved by)
P26 moved to (was destination of)
P27 moved from (was origin of)
P20 had specific purpose (was purpose of)
P21 had general purpose (was purpose of)
P7 took place at (witnessed)
The figure also annotates the properties with cardinalities (0,n; 1,n; 0,1).
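One way to read this diagram is that location knowledge can be recorded as Move events rather than as a single "location" field. The Python sketch below illustrates that reading under stated assumptions: the `Move` type, the `year` ordering key and the sample data (in particular the second move) are hypothetical and are not taken from the presentation or from the CIDOC CRM specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Move:
    moved: str       # P25 moved: the physical object
    moved_from: str  # P27 moved from: the origin place
    moved_to: str    # P26 moved to: the destination place
    year: int        # hypothetical ordering key for the example


# Hypothetical history of the coin with ID 1020.
moves = [
    Move("coin 1020", "SU 12", "excavation store", 1981),
    Move("coin 1020", "excavation store", "museum room 8", 1985),
]


def former_or_current_locations(obj: str, moves: List[Move]) -> List[str]:
    """Every place the object is known to have been, in chronological order."""
    places: List[str] = []
    for move in sorted(moves, key=lambda m: m.year):
        if move.moved != obj:
            continue
        for place in (move.moved_from, move.moved_to):
            if place not in places:
                places.append(place)
    return places


def current_location(obj: str, moves: List[Move]) -> Optional[str]:
    """Destination of the most recent move, or None if no move is recorded."""
    relevant = [m for m in moves if m.moved == obj]
    if not relevant:
        return None
    return max(relevant, key=lambda m: m.year).moved_to


print(former_or_current_locations("coin 1020", moves))
print(current_location("coin 1020", moves))
```

Recording events keeps the history of the object, whereas a single location column would keep only its latest state.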
17. Mapping to CIDOC CRM
– Data Description
– Common Language
Archaeological Object
E22 Man-made Object
Excavation/Survey
E7 Activity
P24B changed ownership through
DSCU, DSCS: Finding Place
E53 Place
P7 took place at
DSCF, DSCA, RCGA,:
Excavation responsibles
E39 Actor
DSCT, RCGE:
Motivation
E17 Activity
P14 carried out by
P17 was motivated by
SCAN: Excavation Name
E41 Appellation
P57 is identified by
DSCD RCGD:
Excavation Date
E52 Time Span
P4 has time-span
DSCM, RCGM: Method
E55 Type
P32 used general technique
[Open Vocabulary]
"Stratigraphic"
"Open Area"
...
[Open Vocabulary]
"Rescue Archaeology"
"Photo Interpretation"
...
TCL:
Type = "Finding"
NCUN, DSCI: Identifiers
E42 Identifier
P1 is identified by
[DSC Authority File]
OBJECT FINDING
E8 Acquisition
P117 occurs during
20. ACDM to CIDOC CRM: Data conversion
• All ACDM information in ARIADNE Catalog
• Exported from ARIADNE Registry
• ACDM/XML format
• Uploaded in 3M as source data
21. ACDM to CIDOC CRM: Data conversion
• Schema-to-Schema mapping applied
• 3M Transformation Engine
• CIDOC CRM encoding
• CIDOC CRM/RDF Format
• Ready for PARTHENOS
22. Integration & Interoperability
[Architecture diagram] Labels in the figure: Content Providers, Metadata Repository, CIDOC CRM, XML, OAI-PMH, RDF
Integration Layer
– Common semantic representation (mapping)
– Data transparency
– Data peculiarity preserved by the system
24. ARIADNE Reference Model v1.0
• CRMinf v0.7: who said that?
– from data to knowledge
– integrating data with their scholarly justification
– being validated with scholarly annotations
• CRMsci v1.2.2: a Scientific Observation model
– generalizes over INSPIRE, OBOE, SEEK, Darwin Core
– generalizes concepts of units of matter and their "(physical) genesis"
– introduces concept of observation and data evaluation
– validated in archaeology, biodiversity and geology
• CRMba v1.3: buildings archaeology
– introduces concepts of buildings
– Will be integrated with CRMarchaeo
• CRMarchaeo v1.2.1: an Excavation model
– introduces concepts of stratigraphy and excavation
– being validated by archaeological records
• CRMgeo v1.2: a Spatiotemporal model
– integrates CRM with OGC standards
– a complete model of phenomena occupying spacetime (consistent with modern physics)
– integrates geometry- and semantics-derived topological relations
– core concepts being integrated into CRM
• CRMdig v3.2: a model of Digitization processes
– validated in European & US projects, to be adapted to CRMsci
25. Item Level Integration
• Goal: Aggregation and integration of a set of diverse datasets to prove that it is possible to create a rich common repository at a data item level
• Use Case: Integration of heterogeneous datasets containing information about coins
• Involved partners: CNR, FORTH, PIN, DAI
26. Case Studies
Ø Numismatics
• a traditional science with experience and initiatives in standardization, so it was chosen as a very good starting point for item-level integration
• Nomisma.org serves as an authoritative resource
Ø Wood/Dendrochronology
• integration of information from diverse datasets and (via NLP) archaeological reports in different languages
• Getty AAT serves as an authoritative resource
Ø Sculptures
• data integration of sources from various disciplines, including sculpture information and its archaeological context
• focuses on the provenance of information according to bibliographic references, which leads to advanced literature research
27. Numismatics Case Study
Extracts of 5 diverse databases & datasets:
Ø OEAW: dFMRO coin archive, 72 records
Ø COINS Project: SAR Archive, 627 records
Ø COINS Project: FWM Archive
Ø iDAI Coins Pergamon, 517 records
Ø CulturaItalia: MuseiD-Italia, 25562 records
Ø NLP data from Heslington East Excavation Archive, 37 records
Ø ACDM records
28. Research questions
• Origin - Where does this coin come from?
• Tracking - How did it arrive here?
• Chronology - First/last appearance
• Practical/symbolic value, incidents - Why is it deposited here?
• Political message - Why was it produced (i.e. "minted")?
• Economic stability, power - Why was it widely used / not used?
• Statistics - Material versus nominal value
29. Research questions
Several queries are trivial to answer against each dataset separately, but they become important when they can be answered by the aggregated repository:
• Find coins minted in the same place/area or by the same authority
• Find coins produced in the same period or time span (typically the same century or half/quarter century)
• Find coins having common shape/iconography/inscriptions
• Find coins made of a specific material
31. Wood/Dendrochronology Case Study
• Extracts of 5 archaeological datasets, output from NLP on 25 grey literature reports
• Multilingual - English, Dutch and Swedish data
• Data integration via CIDOC CRM and Getty AAT
• 1.09 million RDF triples
• 23,594 records
• 37,935 objects
• Demonstration query builder for easier cross-search and browse of integrated datasets
32. Wood/Dendrochronology Case Study
[Workflow diagram] Labels in the figure:
– Sources: Archaeological datasets (ADS, DANS, SND; DCCD, VAG cruck, NMS, VAG dendro, UNID), Grey literature, Getty AAT (RDF)
– Intermediate formats: tabular records, XML
– Steps: NLP, Direct import, Cleansing + Normalisation (OpenRefine), Transformation (STELETO), Transformation (XSLT)
– Target: RDF triple store, accessed through SPARQL queries by the demonstration application (Query Builder)
33. Sculptures Case Study
• Extracts of 5 diverse databases & datasets:
– Archaeological object database: Arachne
– Field research databases: Athenian Agora, iDAI.field
– Museum data: British Museum
– Research data: Oxford Roman Economy Project
• Data integration via CIDOC CRM and controlled vocabularies: Getty AAT, Wikidata, Zenon, iDAI.gazetteer
• 5.44 million triples
• 58,343 records
37. Mapping provider records to CRM
Initial DAI record (partial)
<ROW MODID="9" RECORDID="310">
  …
  <PS_MuenzeID>103361</PS_MuenzeID>
  <Erhaltung_Durchmesser>18,50</Erhaltung_Durchmesser>
  <Erhaltung_Gewicht>5,7</Erhaltung_Gewicht>
  <Erhaltung_Prozent>100</Erhaltung_Prozent>
  <Erhaltung_Staerke>4,85</Erhaltung_Staerke>
  <Funddatum>18.9.2010</Funddatum>
  <Grobdatierung>hellenistisch</Grobdatierung>
  <Metall>Bronze</Metall>
  …
</ROW>
Transformed DAI record (partial)
<crm:E22_Man-Made_Object rdf:about="https://www.dainst.org/COIN/103361">
  <crm:P50_has_current_keeper rdf:resource="https://www.dainst.org/"/>
  <crm:P108i_was_produced_by>
    <crm:E12_Production rdf:about="urn:uuid:eb011a7f-b5f1-4aab-9bd8-4a07f4f008ea"/>
  </crm:P108i_was_produced_by>
  <crm:P43_has_dimension>
    <crm:E54_Dimension rdf:about="urn:uuid:…-c6f3d6989c9d">
      <crm:P90_has_value>5,7</crm:P90_has_value>
      <crm:P91_has_unit rdf:resource="…/measurement units/gr"/>
      <crm:P2_has_type rdf:resource="…/dimensions/weight"/>
      <rdfs:label>weight of coin 103361</rdfs:label>
    </crm:E54_Dimension>
  </crm:P43_has_dimension>
  <crm:P45_consists_of rdf:resource="https://www.dainst.org/material/Bronze"/>
</crm:E22_Man-Made_Object>
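The kind of transformation shown on this slide can be approximated with the Python standard library. The sketch below is not the 3M engine used in ARIADNE. It is a hand-written illustration that covers only three fields of the DAI record, and the `/weight` identifier for the dimension node is a hypothetical stand-in for the UUID used on the slide. Converting the decimal comma to a float is also a choice made for the example; the slide keeps the value as "5,7".

```python
import xml.etree.ElementTree as ET

# Shortened copy of the DAI record from the slide.
ROW = """<ROW MODID="9" RECORDID="310">
  <PS_MuenzeID>103361</PS_MuenzeID>
  <Erhaltung_Gewicht>5,7</Erhaltung_Gewicht>
  <Metall>Bronze</Metall>
</ROW>"""

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
BASE = "https://www.dainst.org/"


def row_to_triples(xml_text):
    """Map one provider row to (subject, predicate, object) triples."""
    row = ET.fromstring(xml_text)
    coin = f"{BASE}COIN/{row.findtext('PS_MuenzeID')}"
    triples = [
        (coin, "rdf:type", CRM + "E22_Man-Made_Object"),
        (coin, CRM + "P50_has_current_keeper", BASE),
        (coin, CRM + "P45_consists_of", f"{BASE}material/{row.findtext('Metall')}"),
    ]
    weight = row.findtext("Erhaltung_Gewicht")
    if weight is not None:
        dimension = f"{coin}/weight"  # hypothetical identifier for the example
        value = float(weight.replace(",", "."))  # "5,7" uses a decimal comma
        triples += [
            (coin, CRM + "P43_has_dimension", dimension),
            (dimension, "rdf:type", CRM + "E54_Dimension"),
            (dimension, CRM + "P90_has_value", value),
        ]
    return triples


triples = row_to_triples(ROW)
for triple in triples:
    print(triple)
```

Each source column becomes one or more statements about the coin, which is what allows records from differently structured databases to end up in the same graph.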
38. Integrated Knowledge Repository
Experimental integrated knowledge repository in Blazegraph
Ø Numismatics Case Study: 1.2M triples
Ø Wood/dendro Case Study: 1.5M triples
Ø Sculptures Case Study: 5.5M triples
Ø AAT: 4.4M triples
Total ~ 13M triples
Contains different levels of information:
Ø Item specific information
Ø Document research data
Ø NLP data
Ø Catalog information
39. Research questions
Ø Query mechanisms support innovative reasoning on archaeological datasets
Ø Query power lies in relating and combining
– data from different providers, preserving the original meaning and their perspective
– item level with catalog info on archaeological datasets
40. Research questions
Ø Find all bronze coins (item level info, retrieves datasets from multiple providers)
Ø Find the publishers of all collections that contain coins (catalog info)
Ø Find all datasets and grey literature reports that contain bronze antoninianus (item level, NLP data and catalog info)
[Figure labels: SAR records, NLP record, CulturaItalia records, DAI record, OEAW records, Catalog info]
41. Integrated Repository
Experimental integrated repository in Blazegraph
Ø dFMRO: 72 records (all Roman coins)
Ø SAR: 627 records (all Roman coins)
Ø Pergamon: 517 records (12 Roman coins – 1 empty record)
Ø MuseiD-Italia: 2 records
Ø NLP data from Heslington East Excavation Archive: 37 records
Ø ACDM: 2 records (OEAW, Heslington)
42. Terminology
Ø Provider specific terminology
Ø ARIADNE specific terminology
Ø Getty AAT
Ø Nomisma.org
• Nomisma.org is the standard that nearly everyone in numismatics refers to
• Normalized vocabulary with references, e.g. to Getty AAT
• Ontology, which is used for data integration of coin databases
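The role of a normalized vocabulary can be illustrated with a small lookup table. Everything in this Python sketch is hypothetical: the provider names, the local terms and the concept labels are invented for the example, and a real integration would point to Getty AAT or Nomisma.org identifiers rather than to plain strings.

```python
from collections import defaultdict

# Hypothetical mapping from (provider, local term) to a shared concept label.
SHARED_CONCEPT = {
    ("provider A", "Bronze"): "bronze",
    ("provider B", "bronzo"): "bronze",
    ("provider C", "AR"): "silver",
}

# Hypothetical item records using provider specific terminology.
records = [
    {"provider": "provider A", "id": "a1", "material": "Bronze"},
    {"provider": "provider B", "id": "b7", "material": "bronzo"},
    {"provider": "provider C", "id": "c3", "material": "AR"},
    {"provider": "provider C", "id": "c4", "material": "unknown alloy"},
]


def group_by_concept(records):
    """Group record ids under the shared concept for their material."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["provider"], rec["material"])
        # Keep the provider's own term when no shared concept is known,
        # so local peculiarities are not silently lost.
        concept = SHARED_CONCEPT.get(key, rec["material"])
        groups[concept].append(rec["id"])
    return dict(groups)


by_material = group_by_concept(records)
print(by_material["bronze"])
```

With the shared concept in place, a single query for bronze items returns records from providers that use different local terms.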
43. Research questions
Different levels of information:
• Item specific info
• Catalog info
Query power lies in combining item level with catalog info:
• Find all bronze antoninianus coins (item level info, retrieves datasets from multiple providers)
• Find the publishers of all collections that contain coins (catalog info)
• Find the publishers of all collections that contain bronze antoninianus (item level and catalog info)
44. Queries
Query to find the contributor of a coin (produced with NLP) through the catalog
SELECT ?thing ?contributor
WHERE {
  ?thing <http://www.cidoc-crm.org/cidoc-crm/P67i_is_referred_to_by> ?s1 .
  ?s1 <http://www.cidoc-crm.org/cidoc-crm/P148i_is_component_of> ?d1 .
  ?d1 <http://www.cidoc-crm.org/cidoc-crm/P148i_is_component_of> ?d2 .
  ?catalog <http://www.cidoc-crm.org/cidoc-crm/P129_is_about> ?d2 .
  ?catalog <http://www.cidoc-crm.org/cidoc-crm/P94i_was_created_by> ?creation .
  ?creation <http://www.cidoc-crm.org/cidoc-crm/P11_had_participant> ?contributor .
}
45. Queries
Query to find the owner of a coin through the catalog
SELECT ?thing ?owner
WHERE {
  ?thing <http://www.cidoc-crm.org/cidoc-crm/P67i_is_referred_to_by> ?param .
  ?param <http://www.cidoc-crm.org/cidoc-crm/P52_has_current_owner> ?owner .
}
http://www.oeaw.ac.at/COIN/626
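To make the shape of this query concrete, the same two-step pattern can be evaluated over a handful of in-memory triples in plain Python. The sample data is hypothetical apart from the coin URI and the two property names, and the `ex:` identifiers are placeholders. A real deployment would send the SPARQL query to the triple store instead.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"

# Hypothetical sample graph; only the coin URI and the property names
# come from the slide.
triples = {
    ("http://www.oeaw.ac.at/COIN/626", CRM + "P67i_is_referred_to_by", "ex:record-626"),
    ("ex:record-626", CRM + "P52_has_current_owner", "ex:owner-A"),
    ("ex:coin-without-record", CRM + "P45_consists_of", "ex:bronze"),
}


def find_owners(triples):
    """Join the two patterns: ?thing P67i ?param . ?param P52 ?owner ."""
    referred = [(s, o) for s, p, o in triples if p == CRM + "P67i_is_referred_to_by"]
    owned = [(s, o) for s, p, o in triples if p == CRM + "P52_has_current_owner"]
    return sorted(
        (thing, owner)
        for thing, param in referred
        for subject, owner in owned
        if subject == param  # the shared variable ?param links both patterns
    )


owners = find_owners(triples)
print(owners)
```

The shared variable is what connects item level information (the coin) to catalog level information (the record that refers to it and names an owner).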
46. Repository and Services Architecture
[Architecture diagram] Labels in the figure:
– Layers: Content Providers, Integration & Interoperability, Integration Services, ARIADNE Portal
– Stores and models: Semantic Repository, Registry, ACDM, CIDOC CRM, Vocabularies
– Formats and protocols: CRM/RDF, OAI-PMH, XML
– Inputs: Data + Metadata (four sources)
– Services: Metadata Enhancement, Configuration & Management, Browse/Query Interfaces, Resource Discovery, Preview, Preservation, Data Access (SPARQL, REST), Archive Discovery, Digital Asset Management, Dataset Discovery, Vocabularies Management
– Outputs: WEB, LOD
47. Final considerations
• Very advanced stage of development
– End of the project
• ARIADNE main goal
– "Integration of existing archaeological research data infrastructure through new and powerful technologies" (ARIADNE DoW)
• "From differences results the most beautiful harmony" (Heraclitus of Ephesus)
48. ARIADNE is a project funded by the European Commission under the Community's Seventh Framework Programme, contract no. FP7-INFRASTRUCTURES-2012-1-313193. The views and opinions expressed in this presentation are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission.
Thank you …