Europeana is a digital platform containing over 58 million digitized cultural heritage objects from 3,700 institutions across 44 countries. The document discusses Europeana's efforts to improve semantic interoperability between these diverse datasets by developing the Europeana Data Model, enriching metadata by linking to external vocabularies, and building an Entity Collection and API to provide centralized access to contextual information about places, people, concepts, and organizations. The goal is to enable richer discovery, exploration, and reuse of Europeana's cultural heritage data on the web.
1. Semantic Interoperability at Europeana
Antoine Isaac
with slides from Hugo Manguinhas, Valentine Charles, Nuno Freire, Juliane Stiller
Workshop on Semantic Interoperability for Multilingual DSIs
Brussels, 18 October 2018
2. Europeana is a rather big digital culture effort
CC BY-SA
58 million digitized objects, from 3,700 institutions in 44 countries
3. The data Europeana holds
● Descriptive and technical metadata
● Thumbnails
As a rule, content is still served by our data partners
● Some content for specific projects:
○ newspaper text and images
○ user-generated content (Europeana 1914-1918)
4. A network of data partners
● Data providers: cultural heritage institutions providing content and metadata to Europeana
● "Intermediate" aggregators: organizations or projects gathering metadata and content for institutions from a specific country or sector, or on a specific domain (music, archaeology, theater…), and making it available to Europeana and other data consumers
5. Quality issues
6. Europeana is diverse
58 million digitized objects, from 3,700 institutions in 44 countries
● Many different themes and types of objects: books, newspapers, journals, letters, diaries, archival papers, paintings, maps, drawings, photographs, music, spoken word, radio broadcasts, film, newsreels, television, fashion, sculpture, 3D objects, and more
● Libraries, archives, and museums have different ways to describe objects. Even within a sector, big differences can be observed
● Heterogeneity makes quality issues worse
7. Multilingualism
● Officially we get metadata in 44 languages
● But more languages are used in individual metadata fields
8. Work by Péter Kiraly (Göttingen Research alliance)
http://144.76.218.178/europeana-qa/languages.php?collectionId=all&field=aggregated
9. Work by Péter Kiraly (Göttingen Research alliance)
http://144.76.218.178/europeana-qa/languages.php?collectionId=all&field=aggregated
10. Multilingualism
● Officially we get metadata in 44 languages
● But more languages are used in individual metadata fields
• Over 400 language codes
• E.g., 6 values in x-aramaic-latn, which is not a valid code, by the way
• But the most common case is lack of language information!
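The kind of tally behind these observations can be sketched as follows. This is a minimal illustration, not the analysis used for the slides: the set of known codes is a small assumed subset of ISO 639-1, and the sample values are hypothetical (only the tag x-aramaic-latn comes from the slide).

```python
from collections import Counter

# Illustrative subset of ISO 639-1 codes (assumption); a real check would
# load the full code list or a language-tag registry.
KNOWN_CODES = {"de", "en", "es", "fr", "it", "nl", "pl"}


def classify_language_tag(tag):
    """Return 'missing', 'known' or 'unrecognized' for one language tag."""
    if tag is None or not tag.strip():
        return "missing"
    # Compare only the primary subtag, so that "nl-NL" counts as "nl".
    primary = tag.strip().lower().split("-")[0]
    return "known" if primary in KNOWN_CODES else "unrecognized"


def language_report(values):
    """Tally tag categories over (text, language_tag) pairs."""
    return Counter(classify_language_tag(tag) for _text, tag in values)


# Hypothetical field values; only "x-aramaic-latn" is taken from the slide.
sample_values = [
    ("Clavecin", "fr"),
    ("hourglasses", "en"),
    ("uurglazen", "nl-NL"),
    ("(some value)", "x-aramaic-latn"),
    ("Tournoi royal de motos", None),
    ("Arrival of a Portuguese ship", ""),
]

report = language_report(sample_values)
print(dict(report))
```

Counting "missing" separately from "unrecognized" keeps the most common case (no language information at all) visible instead of folding it into the invalid codes.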
11. How to make these data work together?
1. Data Modeling for interoperability and richer data
France, Public Domain
1932, National Library of France
Agence de presse Mondial Photo-Presse.
Tournoi royal de motos à Londres : changement d'une roue de side-car en marche
12. Metadata conversion flows
● Mappings of metadata: the metadata comes to Europeana after one or two (expert-crafted) mappings to "interoperability formats".
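One such mapping step could look roughly like the sketch below. The local field names on the left are invented for illustration, and the real mappings are crafted by experts per data source; the target names are common DC and EDM properties.

```python
# Hypothetical crosswalk from a provider's local field names to DC/EDM
# property names. The local names on the left are invented for illustration.
FIELD_MAPPING = {
    "titel": "dc:title",
    "maker": "dc:creator",
    "datering": "dcterms:created",
    "objecttype": "dc:type",
    "instelling": "edm:dataProvider",
}


def map_record(source_record, mapping=FIELD_MAPPING):
    """Apply the crosswalk; return (mapped_record, unmapped_field_names)."""
    mapped, unmapped = {}, []
    for field, value in source_record.items():
        target = mapping.get(field)
        if target is None:
            unmapped.append(field)  # left for an expert to review
            continue
        # Target properties are repeatable, so collect values in lists.
        values = value if isinstance(value, list) else [value]
        mapped.setdefault(target, []).extend(values)
    return mapped, unmapped


source = {"titel": "Hourglass", "objecttype": ["hourglasses"], "inventaris": "401058"}
mapped, unmapped = map_record(source)
print(mapped)
print(unmapped)
```

Returning the unmapped fields explicitly is a design choice for the sketch: it makes visible what a mapping silently drops, which is where an expert would look first.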
13. Following the Linked Open Data principles
http://vimeo.com/36752317
14. Data on the Web Best Practices Working Group
https://www.w3.org/2013/dwbp/
• To develop the open data ecosystem, facilitating better communication between developers and publishers;
• To provide guidance to publishers, promoting the re-use of data;
• To foster trust in the data among developers
15. BP 15: Reuse vocabularies, preferably standardized ones
• Use terms from shared vocabularies, preferably standardized ones
• Check that classes, properties, terms, elements or attributes used to represent a dataset do not replicate those defined by vocabularies used for other datasets.
• Or if you have to replicate, indicate mappings clearly
16. BP 16: Choose the right formalization level
• Accept that precise specs can enable automated reasoning but that complex vocabularies require more effort to produce and hamper reuse of data
• Minimize ontological commitment of your vocabulary – or seek to minimize the commitment of others’ vocabularies
• Check examples of “softer” specs, e.g. Schema.org or SKOS
17. The Europeana Data Model (EDM)
An RDF-based model that reuses many vocabularies:
• DC
• SKOS
• OAI-ORE
• Web Annotation
• RDA
• FOAF
• WGS84
• ccRel
• ODRL/POE
• CIDOC CRM
• EBUcore
• DOAP
• SVCS
• DCAT
• ADMS…
W3C Data on the Web BP – Data Vocabularies
Complete list of elements at https://github.com/europeana/corelib/wiki/EDMObjectTemplatesEuropeana
18. A basic EDM example
Clavecin, Bartolomeo Cristofori
Cite de la Musique, MIMO - Musical Instruments Museums Online | CC BY-NC-SA
Europeana Data Model example
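Since the slide's diagram is not reproduced here, the following is a minimal sketch of what a basic EDM description for this object might contain, written out as N-Triples with plain Python. The example.org identifiers are hypothetical, and reading the caption as title "Clavecin" and creator "Bartolomeo Cristofori" is an assumption; the class and property names (edm:ProvidedCHO, ore:Aggregation, edm:aggregatedCHO, edm:dataProvider, edm:provider) are part of EDM.

```python
EDM = "http://www.europeana.eu/schemas/edm/"
ORE = "http://www.openarchives.org/ore/terms/"
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical identifiers, used only for illustration.
CHO = "http://example.org/item/clavecin"
AGG = "http://example.org/aggregation/clavecin"


def uri(value):
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def literal(text, lang=None):
    """Format a plain or language-tagged literal as an N-Triples term."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


triples = [
    # The cultural heritage object itself
    (uri(CHO), uri(RDF_TYPE), uri(EDM + "ProvidedCHO")),
    (uri(CHO), uri(DC + "title"), literal("Clavecin", "fr")),
    (uri(CHO), uri(DC + "creator"), literal("Bartolomeo Cristofori")),
    # The aggregation ties the object to its provider information
    (uri(AGG), uri(RDF_TYPE), uri(ORE + "Aggregation")),
    (uri(AGG), uri(EDM + "aggregatedCHO"), uri(CHO)),
    (uri(AGG), uri(EDM + "dataProvider"), literal("Cite de la Musique")),
    (uri(AGG), uri(EDM + "provider"),
     literal("MIMO - Musical Instruments Museums Online")),
]

ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
print(ntriples)
```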
19. A community-driven model
• Involving experts from libraries, archives, museums and academics
• Adopting a collaborative, softer form of standardization
• Documenting the base model, referring to community extensions
http://pro.europeana.eu/europeana-tech
Europeana Assembly General Meeting, Rijksmuseum, Amsterdam, 2015
20. Different semantic grains
EDM enables specialization of classes and properties. This allows partners to define extensions answering the needs of specific communities.
Extension in DM2E project (Digital Manuscripts to Europeana)
http://onto.dm2e.eu/schemas/dm2e
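The practical benefit of specialization is that fine-grained data can still be read at the coarser grain of the base model. A minimal sketch of that roll-up, assuming a sub-property table in the spirit of rdfs:subPropertyOf; the "ext:" property names are invented and are not taken from the DM2E schema.

```python
# Hypothetical extension: specialized properties declared as
# sub-properties of more generic ones (names under "ext:" are invented).
SUB_PROPERTY_OF = {
    "ext:scribe": "ext:contributorToManuscript",
    "ext:contributorToManuscript": "dc:contributor",
    "ext:writtenAt": "dcterms:spatial",
}


def generalize(prop, hierarchy=SUB_PROPERTY_OF):
    """Follow sub-property links up to the most generic property."""
    seen = set()
    while prop in hierarchy and prop not in seen:
        seen.add(prop)  # guard against accidental cycles
        prop = hierarchy[prop]
    return prop


def to_base_model(statements):
    """Rewrite (subject, property, object) statements at the generic grain."""
    return [(s, generalize(p), o) for s, p, o in statements]


fine_grained = [
    ("item:1", "ext:scribe", "agent:42"),
    ("item:1", "dc:title", "A manuscript"),
]
print(to_base_model(fine_grained))
```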
21. (Some of) what it takes
• Re-using is easier when one has a cool-headed approach to semantics
• Flexibility is required: we sometimes changed definitions because we had some semantic overcommitment
22. Semantic interoperability also requires a general effort on quality
We have set up a Data Quality Committee working on recommendations for the community on:
○ Mandatory metadata elements for ingestion of EDM data
○ Metadata checking and normalization
○ Meaningful metadata values (in the context of use)
○ Coordination with other quality-related initiatives
http://pro.europeana.eu/get-involved/europeana-tech/data-quality-committee
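The first two items, mandatory elements and checking/normalization, lend themselves to a small sketch. The rule set below is an assumption made for illustration; the committee's actual recommendations are documented at the URL above and may differ.

```python
# Assumed rule set for illustration; the committee's actual
# recommendations may differ.
REQUIRED_ALL = ["edm:dataProvider", "edm:provider", "edm:rights", "edm:type"]
REQUIRED_ANY = [
    ("dc:title", "dc:description"),
    ("edm:isShownAt", "edm:isShownBy"),
]


def has_value(record, prop):
    """True if the property has at least one non-blank value."""
    return any(str(v).strip() for v in record.get(prop, []))


def check_record(record):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for prop in REQUIRED_ALL:
        if not has_value(record, prop):
            problems.append(f"missing {prop}")
    for group in REQUIRED_ANY:
        if not any(has_value(record, prop) for prop in group):
            problems.append("missing one of " + " / ".join(group))
    return problems


def normalize(record):
    """Collapse whitespace and drop empty or duplicate values, keeping order."""
    cleaned = {}
    for prop, values in record.items():
        kept = []
        for value in values:
            value = " ".join(str(value).split())
            if value and value not in kept:
                kept.append(value)
        if kept:
            cleaned[prop] = kept
    return cleaned


record = {"dc:title": [" Hourglass ", "Hourglass"], "edm:type": ["IMAGE"], "edm:rights": [""]}
clean = normalize(record)
problems = check_record(clean)
print(clean)
print(problems)
```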
23. How to make these data work together?
2. Enriching metadata
France, Public Domain
1914, National Library of France
Agence de presse Meurisse
Concours de cycles nautiques sur le lac d’Enghien : Berregent piloté par Austerling
24. Europeana Linked Data Strategy
Our lines of work
• The Europeana Data Model (EDM) offers a base for linking metadata
• Aims at providing data as resources (with URIs!), not only strings
• Enables the development of a multilingual data environment
• We apply automatic enrichment to link object metadata to reference datasets
• We encourage data providers to contribute their own links to vocabularies
Designing a Multilingual Knowledge Graph as a Service for Cultural Heritage
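To make "automatic enrichment" concrete, here is a minimal sketch of label-based linking from a metadata string to a reference resource. It is not Europeana's enrichment pipeline: the reference data and its example.org URIs are hypothetical, and the matching rule (exact, case-insensitive, language-aware, unambiguous matches only) is one assumed policy among many.

```python
# Tiny hypothetical reference dataset: URI -> {language: [labels]}.
# The URIs are placeholders, not real identifiers.
REFERENCE = {
    "http://example.org/place/paris": {
        "fr": ["Paris"], "en": ["Paris"], "nl": ["Parijs"],
    },
    "http://example.org/concept/harpsichord": {
        "fr": ["clavecin"], "en": ["harpsichord"], "nl": ["klavecimbel"],
    },
}


def build_index(reference):
    """Index (language, normalized label) -> set of candidate URIs."""
    index = {}
    for entity_uri, labels_by_lang in reference.items():
        for lang, labels in labels_by_lang.items():
            for label in labels:
                index.setdefault((lang, label.casefold()), set()).add(entity_uri)
    return index


def enrich(value, lang, index):
    """Return the matching URI, or None when there is no unambiguous match."""
    if not lang:
        return None  # without a language tag, matching is too ambiguous
    candidates = index.get((lang, value.strip().casefold()), set())
    return next(iter(candidates)) if len(candidates) == 1 else None


index = build_index(REFERENCE)
print(enrich("Clavecin", "fr", index))
print(enrich("Clavecin", None, index))
```

Refusing to match values without a language tag is deliberately conservative here; it trades coverage for fewer wrong links.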
25. [Figure with callouts: Thumbnail; Descriptive Metadata; Link to data provider; Rights]
27. Warning: multilingual enrichment is not easy
Poisonous India or the Importance of a Semantic and Multilingual Enrichment Strategy
Marlies Olensky, Juliane Stiller, Evelyn Dröge, MTSR 2012
http://link.springer.com/chapter/10.1007%2F978-3-642-35233-1_25
28. Building a network of contextual information
Europeana grows a “Semantic Layer” linking to contextual resources (e.g. concepts, persons, places).
Diagram by Stefan Gradmann
29. Contextual entities in EDM
Representing (real-world) entities related to an object as fully fledged resources, not just strings
● edm:Agent: foaf:name, skos:altLabel, rdaGr2:biographicalInformation, rdaGr2:dateOfBirth
● skos:Concept: skos:prefLabel, skos:altLabel, skos:broader, skos:related, skos:definition…
● edm:TimeSpan: skos:prefLabel, dcterms:isPartOf, edm:begin, edm:end…
● edm:Place: wgs84_pos:lat, wgs84_pos:long, skos:prefLabel, skos:note, dcterms:isPartOf…
30. Example: a concept from a specialized thesaurus (MIMO)
Clavecin, Bartolomeo Cristofori
Cite de la Musique, MIMO - Musical Instruments Museums Online | CC BY-NC-SA
31. Example: an AAT concept in EDM
edm:ProvidedCHO urn:imss:instrument:401058 (Hourglass)
  dc:type: skos:Concept http://vocab.getty.edu/aat/300198626
    skos:prefLabel: hourglasses@en
    skos:prefLabel: uurglazen@nl
    skos:prefLabel: reloj de las horas@es
    skos:broader: http://vocab.getty.edu/aat/300206197 (= sandglasses)
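Once a concept is a resource with language-tagged labels, picking a display label per user language becomes a lookup. A minimal sketch using the values from this slide; the Concept class and its fallback order are assumptions for illustration, not an EDM or Europeana API.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Illustrative container for a skos:Concept (not an official API)."""
    uri: str
    pref_labels: dict = field(default_factory=dict)  # language -> label
    broader: list = field(default_factory=list)      # URIs of broader concepts

    def label(self, preferred_languages, default_language="en"):
        """Pick the first available label, falling back to the URI."""
        for lang in list(preferred_languages) + [default_language]:
            if lang in self.pref_labels:
                return self.pref_labels[lang]
        return self.uri


# Values taken from the slide.
hourglasses = Concept(
    uri="http://vocab.getty.edu/aat/300198626",
    pref_labels={
        "en": "hourglasses",
        "nl": "uurglazen",
        "es": "reloj de las horas",
    },
    broader=["http://vocab.getty.edu/aat/300206197"],
)

print(hourglasses.label(["nl"]))
print(hourglasses.label(["de"]))
```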
32. Europeana Linked Data Strategy
LOD Vocabularies currently recognized by Europeana in providers' metadata
Designing a Multilingual Knowledge Graph as a Service for Cultural Heritage
Vocabulary | URL
MIMO Concepts | http://www.mimo-db.eu/
MIMO Instrument makers | http://www.mimo-db.eu/
The Getty - Art & Architecture Thesaurus (AAT) | http://vocab.getty.edu/
The Getty - Union List of Artist Names (ULAN) | http://vocab.getty.edu/
Virtual International Authority File (VIAF) | http://viaf.org/viaf/
Geonames | http://sws.geonames.org/
IconClass | http://iconclass.org/
Gemeinsame Normdatei (GND) | http://d-nb.info/gnd
Israel Museum Jerusalem Concepts | http://www.imj.org.il/imagine/thesaurus/objects/
Partage Plus concepts | http://partage.vocnet.org/
data.europeana.eu WWI Concepts from Library of Congress Subject Headings (LCSH) | http://data.europeana.eu/concept/loc
Europeana Sounds Genres | http://data.europeana.eu/concept/soundgenres/
EAGLE Material & Object Type | http://www.eagle-network.eu/voc/
DISMARC Formats & Genres | http://purl.org/dismarc/ns/
UDC | http://udcdata.info/rdf/
UNESCO Thesaurus | http://vocabularies.unesco.org/thesaurus/
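Recognizing one of these vocabularies in a provider's metadata can be approximated by a URI prefix check. The sketch below assumes prefix matching is how recognition works, which the slide does not state, and uses only a subset of the URLs listed above.

```python
# URL prefixes copied from the list above (a subset, for brevity).
RECOGNIZED_VOCABULARIES = {
    "http://vocab.getty.edu/": "The Getty (AAT, ULAN)",
    "http://viaf.org/viaf/": "VIAF",
    "http://sws.geonames.org/": "Geonames",
    "http://iconclass.org/": "IconClass",
    "http://d-nb.info/gnd": "GND",
    "http://vocabularies.unesco.org/thesaurus/": "UNESCO Thesaurus",
}


def recognize(value):
    """Return the vocabulary name if the value is a recognized URI, else None."""
    for prefix, name in RECOGNIZED_VOCABULARIES.items():
        if value.startswith(prefix):
            return name
    return None


print(recognize("http://vocab.getty.edu/aat/300198626"))
print(recognize("hourglasses"))
```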
33. Europeana Linked Data Strategy
Our lines of work
• The Europeana Data Model (EDM) offers a base for linking metadata
• We apply automatic enrichment to link object metadata to reference datasets
• We encourage data providers to contribute their own links to vocabularies
• We encourage alignment activities between domain vocabularies
Designing a Multilingual Knowledge Graph as a Service for Cultural Heritage
34. Encouraging (semi-)automatic vocabulary alignment
http://cultuurlink.beeldengeluid.nl
35. The Europeana Entity Collection and API
Netherlands, Public Domain
1660 - 1625, Rijksmuseum
Anonymous
Arrival of a Portuguese ship
36. Europeana Linked Data Strategy
A strategy for Entities
We are building an "Entity Collection"
• A service that acts as a centralized point of reference and access to data about contextual entities: places, agents (persons and organizations), concepts...
• Caching and curating data from the wider Linked Open Data cloud
• A sort of Europeana "knowledge graph" with an API
• A service that can be re-used by everyone in our community
37. Use cases for the Entity Collection (1/2)
Improve user experience on Europeana services
● Findability: users can search with and for people, places and subjects, not only objects, in many more languages and with less ambiguity
● Contextualization: users see contextual information about cultural heritage objects. Entity Pages group and present all assertions about an entity
● Exploration: browsing along relationships between objects and entities, and between entities
Semantic auto-completion
Entity Pages
Entity-based facets
Europeana Food & Drink Project
38. Use cases for the Entity Collection (2/2)
Crowdsourcing
● Objects can be annotated with references to entities of their context
Automatic enrichment of providers' metadata
● A controlled vocabulary to help recognize references to entities
Republication for reuse
● Entities can be republished as an open source to the community
Semantic and Metadata annotations
Pundit Annotation Client from Digital Manuscripts to Europeana (DM2E)
39. Data currently in the Entity Collection
Mostly corresponding to a selection made for Europeana's Semantic Enrichment
• Places: a subset of Geonames, corresponding to places which are part of European countries and of some specific feature classes.
• Agents: a subset of DBpedia corresponding to most of the instances of dbp:Artist with some exceptions, and integrated from 49 DBpedia language editions.
• Concepts: a subset of DBpedia corresponding to a selection of concepts matching the needs of Europeana Collections (e.g., WWI battles); Europeana Sounds music genres (obtained from Wikidata); Photo Consortium's photography vocabulary
• Organizations: extracted from Europeana's CRM and aligned to Wikidata when possible
216,302 resources
1,572 resources
165,005 resources
1,077 resources
40. The Entity Collection
Contribution to multilingual coverage
Entities effectively used to enrich Europeana Objects
Entities present in the Entity Collection
41. Selecting data sources
An intellectual effort by data experts, leveraging the following criteria:
• Availability and access: open license, published on the web as linked data
• Granularity, size and coverage: multilingual data, helping to answer key user needs for Europeana's CH collections. Datasets that are too generic or too large can create too much ambiguity for the simple processes we have (e.g. enrichment)
• Quality: intrinsic aspects like correctness of representation (data structures)
• Connectivity: good data sources are well-connected internally and externally to other datasets
42. The Entity Collection and API
DBpedia resource for “Mozart” in our data
Coreference links to 6 other datasets (e.g. Freebase, Wikidata)
Inter-linking information
Preferred labels for 48 languages
43. Entity API - suggest method
/entities/suggest.json?text=neo
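A client would call this method by appending the path to the API's base URL. The sketch below only builds the request URL and makes no network call. The base URL is a placeholder, and the `wskey` and `language` parameters are assumptions; only the path and the `text` parameter come from the slide, so the Entity API documentation should be checked for the actual values.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder base URL (assumption); the real host is not given on the slide.
BASE_URL = "https://api.example.org"
SUGGEST_PATH = "/entities/suggest.json"  # path as shown on the slide


def suggest_url(text, api_key=None, language=None):
    """Build the URL for a suggest request (no request is sent)."""
    params = {"text": text}
    if language:
        params["language"] = language  # hypothetical parameter
    if api_key:
        params["wskey"] = api_key  # assumed name of the API-key parameter
    return f"{BASE_URL}{SUGGEST_PATH}?{urlencode(params)}"


url = suggest_url("neo")
print(url)

# Non-ASCII input is percent-encoded and survives a round trip.
print(parse_qs(urlsplit(suggest_url("néo", language="fr")).query))
```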
44. The Europeana Entity Collection – Where we stand
• We've made enough progress to release a first version of the Entity Collection and its API, used in Europeana's production services.
• But there are still challenges and decisions ahead to ensure consistency and relevance over time:
• Expand data coverage with new data sources for, e.g., events
• Employ the EC to better enrich Europeana object metadata
• Enhance discoverability, especially for search engines, e.g. via Schema.org publication
45. antoine.isaac@europeana.eu
@antoine_isaac