To be useful, Linked Open Data requires shared identities and the reuse of their identifiers (URIs). This presentation argues that exact identity matching is both theoretically and practically impossible, and proposes some practical considerations for how to create an actual web of data.
Presented as an invited seminar at UC Berkeley, February 24th, 2017.
@azaroth42 rsanderson@getty.edu
IIIF: Interoperability
Every Identity, Its Ontology
4. Linked Open Data’s Potential
Linked Open Data achieves its potential when institutions:
• link outside of their own data (⭐⭐⭐⭐⭐),
• trust other organizations to manage, publish and maintain data which they use
5. Linked Open Data’s Challenges
Commonly cited:
• Amount of data to transform
• Data is mostly “strings”, not “things”
• Cost of new management system
• Cost of new business workflows
• Difficulty of data enrichment
• Institutional reluctance to embrace change, trust, imperfection
6. Identity
We need to understand the entity before we can reuse its identifier!
Questions:
1. What constitutes “identity”?
2. How does one describe entities?
3. How does one discover identifiers?
7. LOD Identity Fundamentals
• Open World Assumption
  • What is not stated is unknown, not false
  • No single agent or observer has complete knowledge in a distributed system
• Identifier space is infinite
  • No formal character limit for IRIs
  • Even practical limit is very large (65536 ^ length)
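To give a feel for the "65536 ^ length" figure, here is a minimal Python sketch. It assumes, as the slide does, 65536 possible symbols per position; the function name `identifier_space` is made up for this illustration.

```python
ALPHABET_SIZE = 65536  # symbols per position, the figure used on the slide


def identifier_space(length: int) -> int:
    """Number of distinct identifiers of exactly `length` symbols."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return ALPHABET_SIZE ** length


# Show how quickly the space grows by counting decimal digits.
for n in (1, 2, 8, 64):
    print(f"length {n}: {len(str(identifier_space(n)))} decimal digits")
```

Python integers are arbitrary precision, so the numbers are exact. The space is finite for any fixed length, but with no formal length limit it is unbounded.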
8. LOD Identity Fundamentals
• IRIs are globally unique
• IRIs used for identifying entities and relationships
• No identity for instance of a relationship
• Only one contextual identity (named graph) per statement, with inconsistent use
• Anyone may make assertions about any entity
10. … some Philosophy
http://www.getty.edu/art/collection/objects/249050/
• RDF falls in Plato’s “Universals” space
• Same relationship had by many entities
• No relationship instances
• Fictional entities and relationships ok
12. Open World Ramifications
If we know that a owl:sameAs b
And discover that a property x
Then we know that b property x
The rule is an effect of identity; it doesn’t help us determine identity.
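The substitution rule can be sketched in a few lines of Python. This is a minimal illustration, not a reasoner: triples are plain tuples, the function name `apply_same_as` is hypothetical, and only the one-directional rule as written on the slide is applied (the symmetry and transitivity of owl:sameAs are deliberately left out).

```python
SAME_AS = "owl:sameAs"


def apply_same_as(triples):
    """Return the triples plus those entailed by: a sameAs b, a p x => b p x."""
    result = set(triples)
    changed = True
    while changed:  # repeat until no new triple is produced
        changed = False
        same_pairs = [(s, o) for (s, p, o) in result if p == SAME_AS]
        for a, b in same_pairs:
            for (s, p, o) in list(result):
                if s == a and p != SAME_AS:
                    new_triple = (b, p, o)
                    if new_triple not in result:
                        result.add(new_triple)
                        changed = True
    return result


# Hypothetical data: the sameAs link must already be asserted as input.
known = {("ex:a", SAME_AS, "ex:b"), ("ex:a", "ex:property", "ex:x")}
for triple in sorted(apply_same_as(known)):
    print(triple)
```

The function only consumes an existing owl:sameAs statement; nothing in it can tell us whether that statement should have been asserted in the first place.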
14. Open World Ramifications (1)
Uh-oh!
• There are infinite (potential) properties
• We cannot compute indiscernibility as the for loop on the properties would run forever
len(Ψ) = ∞
Indiscernibility: (∀ P ∈ Ψ)(P(a) = P(b)) → a = b
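As a rough Python sketch of the "for loop on the properties": the check below works over any iterable of properties. The records, the helper `key_property`, and the generator `endless_properties` are all hypothetical, and missing keys are read as `None`, which is a closed-world simplification made only to keep the example short.

```python
from itertools import count


def indiscernible(a, b, properties):
    """True if every property yields the same value for a and b.

    Terminates only if `properties` is finite or a difference is found.
    """
    return all(prop(a) == prop(b) for prop in properties)


def key_property(key):
    """Build a property function that reads one key from a record."""
    return lambda entity: entity.get(key)


record_a = {"name": "Alice", "role": "author"}
record_b = {"name": "Alice", "role": "mathematician"}

# A finite set of properties: the loop finishes and reports a match.
finite_properties = [key_property("name")]
print(indiscernible(record_a, record_b, finite_properties))


def endless_properties():
    """An unbounded stream of properties, standing in for len(Ψ) = ∞."""
    yield key_property("role")
    for i in count():
        yield key_property(f"p{i}")


# all() stops at the first difference, so this call returns.
print(indiscernible(record_a, record_b, endless_properties()))
```

With an unbounded stream a difference can be found in finite time, but agreement can never be confirmed: had the two records agreed on every property, the second call would not return.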
15. Open World Ramifications (2)
Uh-oh!!
• There are infinite (potential) properties
• [Imagine the loop could run in zero time]
• Any different property would prevent identity
• The likelihood of encountering indiscernibles is 1/∞ … or 0
16. Open World Ramifications (3)
Uh-oh!!!
• There are infinite (potential) properties
• Any property not asserted is just not known locally and could be known elsewhere
• To compute, you need complete knowledge of an infinite set of instances and infinite properties, and zero cost comparison.
18. Escaping the Infinite Loop?
Still need the big-triplestore-in-the-sky with all assertions from all publishers.
Answer: Google can do it!
Google, will you run a big triplestore for us?
24. More Properties, More Identity?
Identity is a relationship that admits of degree:
• Less than 100% identity is resemblance
• The more resemblance, the more certain the identity relation
skos:exactMatch
• “high degree of confidence that the concepts can be used interchangeably across a wide range of applications”
skos:closeMatch
• “sufficiently similar that they can be used interchangeably in some applications”
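One way to picture identity as a matter of degree is to map a resemblance score onto the two SKOS predicates. The sketch below is only an illustration: the function name and both threshold values are assumptions, not figures from the talk or from the SKOS specification.

```python
def match_predicate(score, exact_threshold=0.9, close_threshold=0.6):
    """Pick a SKOS mapping predicate for a resemblance score in [0, 1].

    The thresholds are hypothetical and would need tuning per dataset.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= exact_threshold:
        return "skos:exactMatch"
    if score >= close_threshold:
        return "skos:closeMatch"
    return None  # not enough resemblance to assert any match


for score in (0.95, 0.7, 0.1):
    print(score, match_predicate(score))
```

Note that no score, however high, returns owl:sameAs: the strongest claim the sketch makes is a high-confidence match.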
26. Resemblance?
• Given “sufficient resemblance”, we can conclude identity for practical purposes
• Resemblance is via shared properties
• To compute resemblance, we must understand the properties shared by candidate entities
• Properties are given as predicates in LOD
• Need for shared ontology?
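A minimal sketch of resemblance via shared properties, using Jaccard similarity over predicate/value pairs. Jaccard is a stand-in chosen for this example, not a measure named in the talk, and the records and predicate names are hypothetical.

```python
def resemblance(a, b):
    """Jaccard similarity of two records' (predicate, value) pairs."""
    pairs_a = set(a.items())
    pairs_b = set(b.items())
    union = pairs_a | pairs_b
    if not union:
        return 0.0  # two empty descriptions tell us nothing
    return len(pairs_a & pairs_b) / len(union)


record_1 = {"ex:name": "Alice", "ex:born": "1832"}
record_2 = {"ex:name": "Alice", "ex:born": "1832", "ex:died": "1898"}
# Same person described with a different, unaligned ontology.
record_3 = {"other:label": "Alice"}

print(resemblance(record_1, record_2))  # two of three pairs are shared
print(resemblance(record_1, record_3))  # no predicate in common
```

The third record shows why a shared ontology matters: the value "Alice" appears in both descriptions, but because the predicates differ the comparison finds nothing in common.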
27. Porridge Too Hot? Too Cold?
Too few properties:
• Sufficiency of resemblance impossible
Too many properties:
• Amount of information overwhelming
• More likely to run into incompatible properties
28. _:Porridge crm:P51_has_former_or_current_owner _:Papa Bear?
Understanding can then be increased by not only looking at the one entity, but where it fits within the graph of connected entities.
Now you have many resemblance problems.
29. Graph Isomorphism
Graphs unlikely to have the same shape, even with a shared ontology.
Different organizations:
• know different information
• are from different domains
• have different foci
• have different contexts for the work
31. Every Identity, Its Ontology
In the absence of continuous community pressure, demonstration of value, and in-house expertise, even well-intentioned organizations will create their own identities and ontologies for describing entities.
32. Example: Lewis Carroll
Cultural Heritage Sector
• Getty ULAN
• Library of Congress NAF
• Bibliothèque nationale de France
• Deutsche Nationalbibliothek
• British Library
• ISNI
• VIAF
• SNAC
• …
Industry
• MusicBrainz (LinkedBrainz)
• IMDb (LinkedMDB)
• DBpedia
• Wikidata
• Google / Freebase
• Genealogics
• Quora
• ReadSocial
• …
34. Perfect is the Enemy of the Good
We could stop requiring perfection in our use of others’ data:
• skos:exactMatch, not owl:sameAs
• Data that is good enough
• And contribute improvements!
• Persistence, not Permanence
• Target is Comprehension, not Inference
35. Sufficiency of Resemblance
We could publish a set of rules per class by which sufficiency of resemblance can be determined:
• Which properties must overlap?
• Which properties must be exactly the same?
• Which properties can be ignored?
• Which relationships must match?
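A per-class rule set of this kind could be sketched as follows. Everything here is hypothetical: the `ResemblanceRule` structure, the property names, and the design choice that any shared property which is neither listed nor ignored must still overlap. Records map each property to a set of values, and "relationships must match" is not modelled separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResemblanceRule:
    """Hypothetical per-class rule for sufficiency of resemblance."""
    must_equal: frozenset = frozenset()    # value sets must be identical
    must_overlap: frozenset = frozenset()  # value sets must share a value
    ignore: frozenset = frozenset()        # never compared


def sufficiently_resembles(a, b, rule):
    """Apply one rule to two records of the form {property: set of values}."""
    for prop in rule.must_equal:
        if prop not in a or prop not in b or a[prop] != b[prop]:
            return False
    # Assumption: shared properties that are not ignored must also overlap.
    shared = (a.keys() & b.keys()) - rule.ignore - rule.must_equal
    for prop in set(rule.must_overlap) | shared:
        if not (a.get(prop, set()) & b.get(prop, set())):
            return False
    return True


PERSON_RULE = ResemblanceRule(
    must_equal=frozenset({"birthDate"}),
    must_overlap=frozenset({"name"}),
    ignore=frozenset({"note"}),
)

person_1 = {"birthDate": {"1832-01-27"},
            "name": {"Lewis Carroll", "Charles Dodgson"},
            "note": {"record from source one"}}
person_2 = {"birthDate": {"1832-01-27"},
            "name": {"Charles Dodgson"},
            "note": {"record from source two"}}

print(sufficiently_resembles(person_1, person_2, PERSON_RULE))
```

Publishing the rule as data rather than burying it in code is the point of the slide: consumers can see, per class, which differences a publisher treats as disqualifying and which it does not.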
36. Resemblance Services
We could publish services to make it easier to discover and reconcile our identities:
• Auto-complete / type-ahead
• OpenRefine reconciliation
• Embeddable widgets
37. Shared Infrastructure
We could contribute to shared infrastructure for discovery and change management:
• Shared infrastructure, decentralized publication
• Notifications when data changes
• Notifications when identities are used
• With links back from the identity
• Separate publishing / discovery concerns
45. Linked Open Usable Data!
• Strict identity matching is impossible
• Target is skos:exactMatch, not owl:sameAs
• Shared ontologies are more important than precision
• Target is comprehension, not inference
• Build services & infrastructure to enable reconciliation
• Target audience of LOD is Developers
Pick Usable not Perfect!