Presentation for the 3rd Workshop on Humanities in the Semantic Web (WHiSe), co-located with the 15th Extended Semantic Web Conference (ESWC 2020)
June 2, 2020, online
http://whise.cc/2020/
Europeana as a Linked Data (Quality) case
1. Europeana as a Linked Data (Quality) case
Antoine Isaac
with slides from Hugo Manguinhas, Valentine Charles, Juliane Stiller,
Mónica Marrero and other colleagues
2. Outline
CC BY-SA
• Brief intro to Europeana
• Metadata quality challenges
• Using Linked Data technology to make data richer
• Encouraging data enhancements across the board
• How all this fits research-related efforts
3. Who is Europeana?
● A non-profit foundation
● A community of 2400 experts in digital heritage: the
Europeana Network
● A mission: improve access to Europe's digital cultural
heritage
4. What is Europeana?
● The European Commission's digital platform for cultural
heritage
● Providing access to over 58M objects from over 3500
museums, libraries, archives
6. What is Europeana?
● An Open Data platform providing several services
● Europeana portal: https://europeana.eu
● Europeana APIs: https://pro.europeana.eu/resources/apis
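As a rough illustration of what using the APIs can look like, the sketch below only builds a search request URL and sends nothing. The endpoint and the parameter names (`wskey`, `query`, `rows`) are assumptions to be checked against the API documentation linked above, and `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import urlencode

# Assumption: endpoint and parameter names should be verified against
# the documentation at https://pro.europeana.eu/resources/apis
SEARCH_ENDPOINT = "https://api.europeana.eu/record/v2/search.json"


def build_search_url(query, api_key, rows=5):
    """Return a search request URL; no HTTP request is made here."""
    params = {"wskey": api_key, "query": query, "rows": rows}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# "YOUR_API_KEY" is a placeholder, not a working key.
url = build_search_url("Mozart", api_key="YOUR_API_KEY")
print(url)
```

The resulting URL could then be fetched with any HTTP client once a real key is available.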
7. How does it work?
France, Public Domain
1914, National Library of France
Agence de presse Meurisse
Concours de cycles nautiques sur le lac
d’Enghien : Berregent piloté par Austerling
8. What’s inside Europeana?
● Descriptive and technical metadata: title, creator, subject,
rights…
● Editorial content like virtual exhibitions
● (recently started) user-generated metadata, incl.
transcriptions, semantic annotations
● Thumbnails
As a rule, digitized content is served on our partners’ websites,
except for some specific projects:
● Newspapers
● WWI user-generated content
9. Data flow in Europeana’s network
Data providers: cultural institutions that provide metadata and links to
digitized content
Aggregators: organizations or projects
that gather data from a specific country
or domain (music, fashion,
archaeology…)
10. Data Quality Issues in Cultural Heritage
Caveat: some examples have already been cleaned
France, Public Domain
1932, National Library of France
Agence de presse Mondial Photo-Presse.
Tournoi royal de motos à Londres : changement d'une roue de side-car en marche
12. Heterogeneity
58M objects, from 3,500 institutions
● Many different themes and types of objects
Books, newspapers, letters, diaries, archival papers, paintings, maps, drawings, photographs,
music, spoken word, radio broadcasts, film, newsreels, fashion, sculpture, 3D objects, and
more
● Libraries, archives, museums have different ways to describe objects.
Even within a sector, big differences can be observed
13. Multilingualism
58M objects, from 44 countries
● Officially we get metadata in 38 languages
● But there are more languages used in individual metadata
fields
• Over 400 language codes
e.g., 6 values in x-aramaic-latn - not a valid code by the way
• The most common case is lack of language information!
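The kind of audit behind these observations can be sketched as follows. This is a minimal illustration, not Europeana's actual tooling: the sample values are hypothetical, and the pattern is only a rough shape check (a 2- or 3-letter primary subtag), not full language-tag validation.

```python
import re
from collections import Counter

# Rough shape check only: a 2- or 3-letter primary subtag, optionally
# followed by further subtags. This is NOT full BCP 47 validation.
LANG_SHAPE = re.compile(r"^[a-z]{2,3}(-[a-z0-9]{1,8})*$", re.IGNORECASE)


def audit_language_tags(values):
    """Count (text, lang) pairs as 'missing', 'ok' or 'suspicious'."""
    report = Counter()
    for _text, lang in values:
        if not lang:
            report["missing"] += 1
        elif LANG_SHAPE.match(lang):
            report["ok"] += 1
        else:
            report["suspicious"] += 1
    return report


# Hypothetical sample of (value, language tag) pairs.
sample = [
    ("Mona Lisa", "en"),
    ("La Joconde", "fr"),
    ("untitled", None),
    ("some text", "x-aramaic-latn"),
]
report = audit_language_tags(sample)
print(dict(report))
```

A real check would compare tags against a language registry; the heuristic above only separates missing tags from tags with an unexpected shape.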
15. How to get more homogeneous, richer & multilingual data?
France, Public Domain
1914, National Library of France
Agence de presse Meurisse
Concours de cycles nautiques sur le lac
d’Enghien : Berregent piloté par Austerling
16. Data modeling for interoperability and richer metadata
● Like many aggregators, we ask our providers to give metadata using
one metadata model: the Europeana Data Model (EDM)
● But we cannot do whatever we like: we do not operate in isolation!
● Our approach must be
○ easy and rewarding for our partners
○ based on community-agreed best practices
17. A community sport
• Involving (technical) experts from libraries, archives, museums and
academics – the EuropeanaTech community
• Adopting a collaborative, softer form of standardization
http://pro.europeana.eu/europeana-tech
Europeana Assembly General Meeting, Rijksmuseum,
Amsterdam, 2015
18. Prior to EDM: flat metadata records
● No links between objects and persons, places…
● Mixing data about the real object and its digital content
● This caused a lot of mapping quality problems
19. Following Best Practices, such as the Linked Open Data principles
http://vimeo.com/36752317
20. Massive re-use of vocabularies in EDM
Plus
• Web Annotation
• RDA
• WGS84
• EBUcore
• ccRel
• ODRL
• DOAP
• SVCS
• DCAT
• ADMS
…
(sometimes only for one property!)
http://pro.europeana.eu/edm-documentation
[Figure: EDM in Linked Open Vocabularies (LOV); recoverable labels: OAI-ORE, FOAF]
21. Data modeling for interoperability and richer metadata
Clavecin, Bartolomeo Cristofori
Cite de la Musique,
MIMO - Musical Instruments Museums Online|CC BY-NC-SA
http://pro.europeana.eu/edm-documentation
22. Enriching metadata
• EDM gives a base for (linking to) multilingual, semantic metadata
• data as resources with web URIs, not only strings
• We encourage data providers to contribute their own links/data to
local or external vocabularies
https://pro.europeana.eu/page/europeana-semantic-enrichment
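The difference between "strings" and "resources with web URIs" can be sketched with plain dictionaries. The records, field names and the `link_ratio` helper below are hypothetical illustrations and do not reproduce the EDM structure.

```python
# Hypothetical records; field names and values are illustrative only.
flat_record = {"creator": "Mozart", "place": "Salzburg"}
linked_record = {
    "creator": {"uri": "http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart"},
    "place": "Salzburg",  # still a plain string
}


def is_resource(value):
    """True if the value references a web resource rather than a string."""
    return isinstance(value, dict) and str(value.get("uri", "")).startswith(
        ("http://", "https://")
    )


def link_ratio(record):
    """Share of fields whose value is a resource link (0.0 if empty)."""
    if not record:
        return 0.0
    return sum(is_resource(v) for v in record.values()) / len(record)


print(link_ratio(flat_record))
print(link_ratio(linked_record))
```

A measure of this kind is one simple way to track how much of a dataset is linked rather than literal.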
23. LOD vocabularies currently recognized by Europeana in providers' metadata
Vocabulary | URL
MIMO Concepts | http://www.mimo-db.eu/
MIMO Instrument makers | http://www.mimo-db.eu/
The Getty - Art & Architecture Thesaurus (AAT) | http://vocab.getty.edu/
The Getty - Union List of Artist Names (ULAN) | http://vocab.getty.edu/
Virtual International Authority File (VIAF) | http://viaf.org/viaf/
Geonames | http://sws.geonames.org/
IconClass | http://iconclass.org/
Gemeinsame Normdatei (GND) | http://d-nb.info/gnd
Israel Museum Jerusalem Concepts | http://www.imj.org.il/imagine/thesaurus/objects/
Partage Plus concepts | http://partage.vocnet.org/
data.europeana.eu WWI Concepts from Library of Congress Subject Headings (LCSH) | http://data.europeana.eu/concept/loc
Europeana Sounds Genres | http://data.europeana.eu/concept/soundgenres/
EAGLE Material & Object Type | http://www.eagle-network.eu/voc/
DISMARC Formats & Genres | http://purl.org/dismarc/ns/
UDC | http://udcdata.info/rdf/
UNESCO Thesaurus | http://vocabularies.unesco.org/thesaurus/
YSO General Finnish Ontology | https://finto.fi/yso/en/
https://pro.europeana.eu/page/europeana-semantic-enrichment
24. Enriching metadata
• We are going to further develop crowdsourcing/"nichesourcing" of
metadata
• In parallel, we apply automatic enrichment to link object metadata to
reference datasets for places, persons, concepts
https://pro.europeana.eu/page/europeana-semantic-enrichment
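A very naive form of automatic enrichment can be sketched as label matching against a reference dataset. This is an illustration under stated assumptions, not Europeana's enrichment process: the `REFERENCE` data and `example.org` URIs are hypothetical, and ambiguous labels are simply left unlinked.

```python
from collections import defaultdict

# Hypothetical mini reference dataset: URI -> labels by language.
REFERENCE = {
    "http://example.org/place/paris": {"en": "Paris", "fr": "Paris"},
    "http://example.org/place/vienna": {"en": "Vienna", "de": "Wien"},
    "http://example.org/agent/paris": {"en": "Paris"},
}


def build_label_index(reference):
    """Map each case-folded label to the set of URIs that carry it."""
    index = defaultdict(set)
    for uri, labels in reference.items():
        for label in labels.values():
            index[label.casefold()].add(uri)
    return index


def enrich(value, index):
    """Return the single matching URI, or None if no match or ambiguous."""
    candidates = index.get(value.strip().casefold(), set())
    return next(iter(candidates)) if len(candidates) == 1 else None


index = build_label_index(REFERENCE)
print(enrich("Wien", index))      # one candidate: linked
print(enrich("Paris", index))     # two candidates: left unlinked
print(enrich("Atlantis", index))  # no candidate: left unlinked
```

Refusing to link ambiguous labels trades coverage for precision, which matters when the reference dataset is large or generic.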
26. Enriching metadata – Contextual Entities
We are building an "Entity Collection"
• Centralized point of reference and access to data about contextual
entities: places, agents (persons and organizations), concepts...
• Caching and curating data from the wider Linked Open Data cloud
• A sort of Europeana knowledge graph
• With a dedicated API
https://pro.europeana.eu/page/entity#entity-collection
27. Data currently in the Entity Collection
• Places
a subset of Geonames, corresponding to places which are part of
European countries and of some specific feature classes.
• Agents
a subset of DBpedia corresponding to most of the instances of
dbp:Artist with some exceptions, and integrated from 49 DBpedia
language editions.
• Concepts
a subset of DBpedia and Wikidata corresponding to a selection of
concepts matching our needs, e.g., WWI battles, music genres
(Europeana Sounds aggregator) and a photography vocabulary
(Europeana Photography aggregator)
• Organizations
Extracted from Europeana's CRM and aligned to Wikidata when
possible
[Resource counts shown on the slide, in extraction order; their pairing with the entity types above is not recoverable: 216,302 · 1,572 · 165,005 · 1,077]
https://pro.europeana.eu/page/entity#entity-collection
28. Selecting data sources
• Availability and access: open license, published as linked data
• Granularity, size and coverage: multilingual data, with a rather generic
scope. But datasets that are too generic or too large can create too much
ambiguity for the simple processes we have (e.g., enrichment)
• Quality: intrinsic aspects like correctness of representation
• Connectivity: good data sources are well-connected internally and
externally to other datasets
29. An example
DBpedia resource for “Mozart” in the Entity Collection
Coreference links to 6 other
datasets
(e.g. Freebase, Wikidata)
Inter-linking information
Preferred labels for 48
languages
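How an application might use such an entity can be sketched as a label lookup with language fallback. The structure below is a hypothetical, heavily trimmed description, not the Entity Collection's actual response format; the labels and the coreference link are illustrative.

```python
# Hypothetical, trimmed entity description; not the actual API format.
entity = {
    "id": "http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart",
    "prefLabel": {
        "en": "Wolfgang Amadeus Mozart",
        "lv": "Volfgangs Amadejs Mocarts",  # illustrative label
    },
    "sameAs": ["http://www.wikidata.org/entity/Q254"],  # illustrative link
}


def pick_label(entity, preferred_languages, fallback="en"):
    """Return the first available preferred label, else the fallback one."""
    labels = entity.get("prefLabel", {})
    for lang in preferred_languages:
        if lang in labels:
            return labels[lang]
    return labels.get(fallback)


print(pick_label(entity, ["lv", "en"]))
print(pick_label(entity, ["fi"]))  # no Finnish label here: falls back to "en"
```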
33. Multilingual enrichment is not easy!
Poisonous India or the Importance of a Semantic and Multilingual Enrichment Strategy
Marlies Olensky, Juliane Stiller, Evelyn Dröge, MTSR 2012
http://link.springer.com/chapter/10.1007%2F978-3-642-35233-1_25
34. Encouraging everyone on the way to improve their data
University Of Edinburgh, CC BY
Roslin Glass Slides, creator unknown
Photograph of two men step cutting on the ice face of
the Tasman Glacier, New Zealand in the late 19th or
early 20th century.
35. Challenges for working on quality improvement
● Methodological frameworks are not easy to apply
● Getting stakeholders interested is hard for us
● Communication lines are rather long
● It’s a sensitive area
● It’s hard to get users involved
36. A general effort on quality
We have set up a Data Quality Committee to analyze
quality issues and make recommendations to the
Europeana community about:
○ Mandatory metadata elements
○ Metadata checking and normalization
○ Multilingualism
…
http://pro.europeana.eu/get-involved/europeana-tech/data-quality-committee
39. Europeana Publishing Framework: Metadata
● Language attributes → happy users (using the Europeana portal in their native language)
● Links to vocabularies → context (for users browsing the Europeana portal by persons, places, or concepts)
● Enabling elements → visibility (collections being findable along various dimensions: by subject, type, creator, date)
40. A community sport, again!
• Involving (technical) experts from libraries, archives, museums and
academics – the EuropeanaTech community
• Adopting a collaborative, softer form of standardization
http://pro.europeana.eu/europeana-tech
Europeana Assembly General Meeting, Rijksmuseum,
Amsterdam, 2015
41. Europeana and the Research community
France, Public Domain
1932, National Library of France
Agence de presse Mondial Photo-Presse.
Tournoi royal de motos à Londres : changement d'une roue de side-car en marche
43. Europeana & CLARIN
• 180K Europeana sources loaded into CLARIN’s Virtual Language Observatory (VLO);
Europeana is now the largest provider of individual metadata records in the VLO
• Selection based on quality, accessibility, processability and reusability
• Full case study at https://bit.ly/2J5w8jc
• Challenge for SW (not new!): generic & rich models/formats vs.
community-specific & easier to consume ones
Building partnerships with research infrastructures
Europeana Research
44. Semantic Web technology can help too, here
Europeana is involved in initiatives that can help bridge gaps
● International Image Interoperability Framework (IIIF): https://iiif.io
○ Not only images: representation of document structures, (linking to) metadata, etc.
○ With a strong focus on research cases (manuscripts, newspapers)
○ Cf. https://www.slideshare.net/antoineisaac/iiif-and-the-europeana-mission
● Linked Art: https://linked.art
○ Shared model based on LOD to describe art
○ Re-using a (LOUD) subset of CIDOC CRM
45. Semantic Web technology can help too, here
● SW approaches make it possible to create links between
underlying models and vocabularies
○ W3C Web Annotation, CIDOC CRM, EDM
○ Vocabularies expressed using SKOS
● Heavy reliance on JSON-LD
● Importance of data patterns
○ Linked Open Usable Data (LOUD) - Rob Sanderson (Getty)
○ See for example “The Importance of being LOUD”
https://www.slideshare.net/azaroth42/the-importance-of-being-loud
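To make the combination of SKOS and JSON-LD more concrete, here is a minimal sketch of a concept description built and serialized with the standard library. The `example.org` URIs and the labels are hypothetical; only the SKOS namespace and the JSON-LD keywords (`@context`, `@id`, `@type`, `@value`, `@language`) come from the respective specifications.

```python
import json

# Hypothetical concept; URIs under example.org are placeholders.
concept = {
    "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
    "@id": "http://example.org/concept/harpsichord",
    "@type": "skos:Concept",
    "skos:prefLabel": [
        {"@value": "harpsichord", "@language": "en"},
        {"@value": "clavecin", "@language": "fr"},
    ],
    "skos:exactMatch": {"@id": "http://example.org/other-vocabulary/1234"},
}


def labels_by_language(node):
    """Collect preferred labels into a {language: label} dictionary."""
    return {
        label["@language"]: label["@value"]
        for label in node.get("skos:prefLabel", [])
    }


print(json.dumps(concept, indent=2, ensure_ascii=False))
print(labels_by_language(concept))
```

Because the description is ordinary JSON, it stays easy to consume for developers who do not use an RDF toolkit, which is part of the LOUD argument.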
48. Helping FAIRification of Cultural Data
University Of Edinburgh, CC BY
Roslin Glass Slides, creator unknown
Photograph of two men step cutting on the ice face of
the Tasman Glacier, New Zealand in the late 19th or
early 20th century.
49. How do Europeana's data and services meet the FAIR requirements?
Findable
● The Europeana aggregation network partially homogenizes its data via
a shared data model
● Providers and Europeana seek to enrich the data with multilingual,
semantic resources
● We promote persistent identifiers and links across them
● Europeana provides a search engine
● Data is made findable through other platforms (e.g., CLARIN)
https://pro.europeana.eu/post/europeana-and-the-fair-principles-for-research-data
50. How do Europeana's data and services meet the FAIR requirements?
Accessible
● Data is published as (Linked Data) web resources
● Freely available, standard web APIs
Interoperable
● Europeana uses a community-based model
● Following best practices, such as mixing and re-using existing data
models and vocabularies
● We promote more open and richer content access protocols (IIIF)
https://pro.europeana.eu/post/europeana-and-the-fair-principles-for-research-data
51. How do Europeana's data and services meet the FAIR requirements?
Re-usable
● The conditions for re-using digitized content are made clear, using
shared vocabularies (Creative Commons, RightsStatements.org)
● Metadata is fully open – CC0
● Data model seeks to bridge with other communities’ models, such as
W3C Web Annotation, Schema.org
https://pro.europeana.eu/post/europeana-and-the-fair-principles-for-research-data
52. Data on the Web Best Practices Working Group
• Active in 2014-2016
• To develop the open data ecosystem, facilitating better
communication between developers and publishers
• To provide guidance to publishers, promoting the re-use
of data
• To foster trust in the data among developers
• Linked Data, but not only!
https://www.w3.org/2013/dwbp/
53. Best Practice 15: Reuse vocabularies, preferably standardized ones
Data on the Web Best Practices W3C Recommendation
• Use terms from shared vocabularies, preferably standardized ones
• Check that classes, properties, terms, elements or attributes used to
represent a dataset do not replicate those defined by vocabularies used for
other datasets.
• e.g. using the Linked Open Vocabularies repository
• Or if you have to replicate, indicate mappings clearly
54. Best Practice 16: Choose the right formalization level
Data on the Web Best Practices W3C Recommendation
• Accept that (OWL) semantics establish precise specs and can enable
automated reasoning but that complex vocabularies require more effort to
produce and hamper reuse of data
• Minimize ontological commitment of your vocabulary – or seek to minimize the
commitment of others’ vocabularies
• Check that inference does not produce too many statements that are
unnecessary for target applications
• Check examples of “softer” specs, e.g. Schema.org or SKOS
55. Is it perfect?
No. In particular we would always like to get more input
from users and researchers (the perspective is very CH-focused).
But we’re working on it and we hope the situation is better
than if we had not done anything!
Has Semantic Web technology helped?
YES
56. Want to engage?
Do you want to hear more about these issues? Check the upcoming “Enriching
research – enriching metadata” webinars
https://www.raa.se/in-english/events-seminars-and-cultural-experiences/workshop-on-digitised-collections-enriching-research-enriching-metadata/
Europeana Research has a grants programme to fund events that bring
together cultural heritage and researchers. Check future calls!
https://pro.europeana.eu/page/grants-programme
Join the Europeana Network
and (one of) its communities!
https://pro.europeana.eu
antoine.isaac@europeana.eu
@antoine_isaac