Presentation for the paper "Designing a multilingual knowledge graph as service for cultural heritage" at the DCMI2018 conference https://www.dublincore.org/conferences/2018/abstracts/#559
Designing a multilingual knowledge graph - DCMI2018
1. Designing a Multilingual Knowledge Graph as a Service for Cultural Heritage
Some Challenges and Solutions
Valentine Charles, Hugo Manguinhas, Antoine Isaac - Europeana Foundation
Nuno Freire - INESC-ID
Sergiu Gordea - AIT
DCMI Conference 2018
2. What is Europeana?
CC BY-SA
We aggregate metadata:
• From all EU countries
• ~3,700 galleries, libraries,
archives and museums
• More than 58M objects
• In more than 40 languages
• A huge number of references to places, agents, concepts, time periods
Europeana aggregation infrastructure
Europeana| CC BY-SA
The Platform for Europe’s Digital Cultural Heritage
Designing a Multilingual Knowledge Graph as a Service for Cultural Heritage
3. Europeana Linked Data Strategy
Our lines of work
• The Europeana Data Model (EDM) offers a base for linking
metadata
• We apply automatic enrichment to link object metadata to
reference datasets
• We encourage data providers to contribute their own links to
vocabularies
• We encourage alignment activities between domain
vocabularies
4. Europeana Linked Data Strategy
LOD Vocabularies currently recognized by Europeana in providers'
metadata
Vocabulary | URL
MIMO Concepts | http://www.mimo-db.eu/
MIMO Instrument makers | http://www.mimo-db.eu/
The Getty - Art & Architecture Thesaurus (AAT) | http://vocab.getty.edu/
The Getty - Union List of Artist Names (ULAN) | http://vocab.getty.edu/
Virtual International Authority File (VIAF) | http://viaf.org/viaf/
Geonames | http://sws.geonames.org/
IconClass | http://iconclass.org/
Gemeinsame Normdatei (GND) | http://d-nb.info/gnd
Israel Museum Jerusalem Concepts | http://www.imj.org.il/imagine/thesaurus/objects/
Partage Plus concepts | http://partage.vocnet.org/
data.europeana.eu WWI Concepts from Library of Congress Subject Headings (LCSH) | http://data.europeana.eu/concept/loc
Europeana Sounds Genres | http://data.europeana.eu/concept/soundgenres/
EAGLE Material & Object Type | http://www.eagle-network.eu/voc/
DISMARC Formats & Genres | http://purl.org/dismarc/ns/
UDC | http://udcdata.info/rdf/
UNESCO Thesaurus | http://vocabularies.unesco.org/thesaurus/
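As an illustration of what "recognizing" a vocabulary in providers' metadata could look like, here is a minimal Python sketch that matches a URI against a few of the URL prefixes listed above. The prefix-matching logic and the function name `recognize_vocabulary` are assumptions made for this example, not a description of Europeana's actual implementation.

```python
from typing import Optional

# A few URL prefixes copied from the vocabulary list above.
# AAT and ULAN share the same prefix in that list, so they are grouped here.
VOCABULARY_PREFIXES = {
    "http://vocab.getty.edu/": "The Getty vocabularies (AAT, ULAN)",
    "http://viaf.org/viaf/": "VIAF",
    "http://sws.geonames.org/": "Geonames",
    "http://iconclass.org/": "IconClass",
    "http://d-nb.info/gnd": "GND",
    "http://udcdata.info/rdf/": "UDC",
}


def recognize_vocabulary(uri: str) -> Optional[str]:
    """Return the name of the vocabulary whose prefix matches the URI, if any.

    Hypothetical helper: simple string-prefix matching is an assumption.
    """
    for prefix, name in VOCABULARY_PREFIXES.items():
        if uri.startswith(prefix):
            return name
    return None
```

A value in a provider's record that starts with one of these prefixes would then be treated as a link to that vocabulary; anything else returns `None`.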
5. Europeana Linked Data Strategy
A strategy for Entities
We are building an "Entity Collection"
• A service that acts as a centralized point of reference and access to
data about contextual entities: places, agents (persons and
organizations), concepts...
• Caching and curating data from the wider Linked Open Data cloud
• A sort of Europeana "knowledge graph" with an API
• A service that can be re-used by everyone in our community
6. Use cases for the Entity Collection (1/2)
Improve user experience on Europeana services
● Findability: users can search with and for people, places and subjects, not only objects. In many
more languages, and with less ambiguity
● Contextualization: users see contextual information about cultural heritage objects. Entity Pages
group and present all assertions about an entity
● Exploration: Browsing along relationships between objects and entities and between entities
[Screenshots: semantic auto-completion, Entity Pages, entity-based facets, Europeana Food & Drink Project]
7. Use cases for the Entity Collection (2/2)
Crowdsourcing
● Objects can be annotated with references to
entities of their context
Automatic enrichment of providers' metadata
● A controlled vocabulary to help recognize references to entities
Republication for Re-use
● Entities can be republished as an open data source for the community
[Screenshots: semantic and metadata annotations; Pundit Annotation Client from Digital Manuscripts to Europeana (DM2E)]
8. Related work
• Knowledge graph creation and maintenance: Google's Knowledge
Graph, DBpedia, Wikidata, BabelNet, VIAF, Entity Facts, SNAC,
Europeana Food and Drink
• (Vocabulary) web services: STW, DigitalNZ's Concept API...
• W3C's best practices for publishing (linked) data on the Web
• Data alignment tools and methods
• Semantic discovery services: Worldcat Identities, etc.
9. In this presentation
• Challenges, decisions and results for
• Building a knowledge graph for Europeana
• Accessing it for exploitation
• Disclaimers:
• The focus is operational; this is not groundbreaking research!
Sometimes we will state the obvious...
• Europeana's Entity Collection is still work in progress!
10. The things we'll be talking about
Entity Collection processes in Europeana
11. Building a knowledge graph for Europeana
[Image: Agence de presse Meurisse, "Concours de cycles nautiques sur le lac d’Enghien : Berregent piloté par Austerling", 1914, National Library of France. France, Public Domain]
12. Selecting data sources
An intellectual effort by data experts, leveraging the following criteria:
• Availability and access: open license, published on the web as linked
data
• Granularity and coverage: similar or complementary sets of entities,
multilingual data, helping to answer key user needs for Europeana's CH
collections. Too generic datasets can create too much ambiguity for the
simple processes we have (e.g. enrichment)
• Size: larger vocabularies are useful, but sometimes create too much
ambiguity
• Quality: intrinsic aspects like correctness of representation (data
structures)
• Connectivity: good data sources are well-connected internally and
externally to other datasets
An approach based on pivot (e.g. DBpedia, Wikidata) and specialized data
sources (e.g. AAT) is likely to work well if links can be made between them
13. Statement selection and mappings
Conceptually distinct steps, which may be grouped at implementation time:
• Selection of entities within one dataset
• E.g. selecting artists (excluding pop stars) from DBpedia
• Mappings between model of data source and model of KG
• Made easier if both use standards, e.g. SKOS or FOAF - both re-used
by the Europeana Data Model
• Selection of relevant statements
• E.g. filtering unwanted languages
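The mapping and statement-selection steps above can be sketched in a few lines of Python. Everything here is illustrative: the `Statement` structure, the `PROPERTY_MAPPING` table (mapping `rdfs:label` to `skos:prefLabel`) and the function name are assumptions for the example, not the mappings actually used for the Entity Collection.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class Statement(NamedTuple):
    prop: str             # property name in the source model
    value: str
    lang: Optional[str]   # None for values without a language tag


# Hypothetical mapping from source-model properties to KG-model properties.
PROPERTY_MAPPING: Dict[str, str] = {"rdfs:label": "skos:prefLabel"}


def map_and_select(statements: List[Statement],
                   wanted_langs: Set[str]) -> List[Statement]:
    """Keep only mapped properties and wanted languages, renaming properties."""
    selected = []
    for st in statements:
        target_prop = PROPERTY_MAPPING.get(st.prop)
        if target_prop is None:
            continue  # property has no mapping to the KG model: drop it
        if st.lang is not None and st.lang not in wanted_langs:
            continue  # unwanted language: filter the statement out
        selected.append(Statement(target_prop, st.value, st.lang))
    return selected
```

Grouping the two steps in one pass mirrors the remark that conceptually distinct steps may be combined at implementation time.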
14. Example: integrating DBpedia resources
15. Data integration and reconciliation
• Importing entities requires integrating old statements with new ones when entities are recognized to be the same (via available sameAs-like equivalence statements)
• Several options for data integration on entities:
a. unification: lumping all statements together - possibly leading to inconsistencies or violated cardinality constraints
b. first come, first served: only adding statements from more recent sources when possible, i.e. when this does not violate cardinality constraints
c. most representative: possibly replacing old statements by new ones if they come from a "preferred" source
d. differentiated most representative: grouping data sources in sets that have different levels of preference, so as to apply c between these sets. b and c may be applied within the sets.
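To make options a, b and c more concrete, here is a minimal Python sketch. It assumes an entity is a dictionary from property names to lists of values and that cardinality constraints are given as a maximum number of values per property; both the data structure and the function names are assumptions for illustration, and option c is implemented under one possible reading (a preferred source replaces values property by property).

```python
from typing import Dict, List

Statements = Dict[str, List[str]]


def _copy(statements: Statements) -> Statements:
    return {prop: list(values) for prop, values in statements.items()}


def unify(old: Statements, new: Statements) -> Statements:
    """Option a: lump everything together (may exceed cardinality limits)."""
    merged = _copy(old)
    for prop, values in new.items():
        bucket = merged.setdefault(prop, [])
        for value in values:
            if value not in bucket:
                bucket.append(value)
    return merged


def first_come_first_served(old: Statements, new: Statements,
                            max_cardinality: Dict[str, int]) -> Statements:
    """Option b: add new values only while the cardinality limit allows it."""
    merged = _copy(old)
    for prop, values in new.items():
        bucket = merged.setdefault(prop, [])
        limit = max_cardinality.get(prop)  # None means unconstrained
        for value in values:
            if value in bucket:
                continue
            if limit is not None and len(bucket) >= limit:
                break
            bucket.append(value)
    return merged


def most_representative(old: Statements, new: Statements,
                        new_is_preferred: bool) -> Statements:
    """Option c: values from a preferred source replace the old ones."""
    merged = _copy(old)
    for prop, values in new.items():
        if new_is_preferred or prop not in merged:
            merged[prop] = list(values)
    return merged
```

Option d could then be built by applying `most_representative` between groups of sources and one of the other functions within each group.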
16. Alignment and curation
• Alignment aims at recognizing more equivalent entities in the data
sources to be integrated
• It can use automatic or semi-automatic tools like Wikidata Mix'n'Match or CultuurLink, both of which have been tested in Europeana's context
• It requires a lot of effort/expertise at the scale we're considering!
• Curators should be able to edit data to maintain integrity in the Entity
Collection or elsewhere downstream
• Removing, editing, adding statements or deprecating entities, e.g. to
prevent ambiguities that lead to wrong enrichment of object
metadata
17. Data currently in the Entity Collection
Mostly corresponding to a selection made for Europeana's
Semantic Enrichment
• Places
a subset of Geonames, corresponding to places which are part of
European countries and of some specific feature classes.
• Agents
a subset of DBpedia corresponding to most of the instances of dbp:Artist
with some exceptions, and integrated from 49 DBpedia language editions.
• Concepts
a subset of DBpedia corresponding to a selection of concepts matching the
needs from Europeana Collections (e.g., WWI battles).
Europeana Sounds music genres (obtained from Wikidata)
Photo Consortium's photography vocabulary
• Organizations
Extracted from Europeana's CRM and aligned to Wikidata when possible
216,302 resources
1,572 resources
165,005 resources
1,077 resources
18. Multilingual coverage of the Entity Collection
And its contribution to automatic enrichment
[Chart comparing two series: entities present in the Entity Collection, and entities effectively used to enrich Europeana objects]
https://docs.google.com/document/d/1Nek_SPDtIR3waYwwdgRHY1YiTCj3SK8Eh9xYpnSkFkY
19. Accessing and exploiting the data in the Entity Collection
[Image: Agence de presse Mondial Photo-Presse, "Tournoi royal de motos à Londres : changement d'une roue de side-car en marche", 1932, National Library of France. France, Public Domain]
20. Entity URIs
• Step 0 for publishing our Linked Data on the web: minting URIs for our entities
• It's always a difficult choice between length, ease of persistence, human-friendliness...
• We've looked at best practices and consulted our community
• Chosen pattern is http://data.europeana.eu/{entity_class}/{scheme}/{localID}
• Where localID is a sequential identifier
• For example, the agent Leonardo da Vinci: http://data.europeana.eu/agent/base/146741
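The URI pattern above is simple enough to build and parse mechanically. The following Python sketch shows one way to do so; the helper names and the regular expression are assumptions for illustration, and the sketch takes localID to be numeric because the slide describes it as a sequential identifier.

```python
import re
from typing import NamedTuple

BASE = "http://data.europeana.eu"

# Pattern from the slide: http://data.europeana.eu/{entity_class}/{scheme}/{localID}
_URI_RE = re.compile(
    r"^http://data\.europeana\.eu/"
    r"(?P<entity_class>[^/]+)/(?P<scheme>[^/]+)/(?P<local_id>\d+)$"
)


class EntityUri(NamedTuple):
    entity_class: str
    scheme: str
    local_id: int


def build_entity_uri(entity_class: str, scheme: str, local_id: int) -> str:
    """Assemble an entity URI following the chosen pattern."""
    return f"{BASE}/{entity_class}/{scheme}/{local_id}"


def parse_entity_uri(uri: str) -> EntityUri:
    """Split an entity URI into its components, or raise ValueError."""
    match = _URI_RE.match(uri)
    if match is None:
        raise ValueError(f"not an entity URI: {uri}")
    return EntityUri(match["entity_class"], match["scheme"],
                     int(match["local_id"]))
```

For the Leonardo da Vinci example, `build_entity_uri("agent", "base", 146741)` yields the URI shown on the slide.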
21. The Entity API
• Still in alpha state
• Looking at best practices for Linked Data and JSON-LD
• Available documentation for this API at:
https://pro.europeana.eu/resources/apis/entity
22. The Entity API - entity look-up/resolving
• Linked Data content negotiation at data.europeana.eu (for JSON-LD and HTML)
• API call for JSON-LD via entity identifiers
• https://www.europeana.eu/api/entities/[entity_class]/base/[ID].jsonld
• resolve method for getting data on an Entity from an external URI (one that appears in a sameAs-like equivalence statement)
• https://www.europeana.eu/api/entities/resolve?uri=[URI]
• We created a specific JSON-LD context to make the data easier to consume for web developers, e.g. hiding RDF namespace abbreviations
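As a small illustration of the two call patterns above, the Python sketch below only constructs the request URLs and does not perform any network call. The function names are hypothetical, and percent-encoding the external URI in the `uri` parameter is an assumption about how the resolve method expects its input. Since the API is described as being in alpha state, the patterns may have changed.

```python
from urllib.parse import urlencode

API_BASE = "https://www.europeana.eu/api/entities"


def lookup_url(entity_class: str, entity_id: int) -> str:
    """URL for the JSON-LD look-up of an entity by its identifier."""
    return f"{API_BASE}/{entity_class}/base/{entity_id}.jsonld"


def resolve_url(external_uri: str) -> str:
    """URL for resolving an external URI to an entity.

    Assumption: the external URI is percent-encoded as a query parameter.
    """
    return f"{API_BASE}/resolve?" + urlencode({"uri": external_uri})
```

For example, `lookup_url("agent", 146741)` gives the look-up URL for the Leonardo da Vinci entity used earlier in the deck.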
23. The Entity Collection
DBpedia resource for “Mozart” in our data
• Coreference links to 6 other datasets (e.g. Freebase, Wikidata)
• Inter-linking information… still need to switch references to link to Europeana Entities
• Preferred labels for 48 languages
24. The Entity API - discovery
• Suggestion of entities for a typed string, used by the auto-complete function in Europeana's search box
• https://www.europeana.eu/api/entities/suggest
• For example: https://www.europeana.eu/api/entities/suggest?wskey=apidemo&text=leo&type=agent
• Ranking of suggestions is based on:
• Europeana relevance: number of Europeana objects whose
description contains one of the entity's labels
• Popularity as computed in the Wikidata pagerank (Diefenbach &
Thalhammer, 2018)
• Coming soon: general entity search, based on free (Solr-style) querying on
metadata fields for entities
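The suggest call and the two ranking signals can be sketched as follows in Python. The URL builder follows the example above. The ranking part is purely illustrative: the slides do not say how Europeana relevance and Wikidata pagerank are combined, so the product used in `score` is an assumption, as are the `Candidate` structure and the function names.

```python
from typing import List, NamedTuple
from urllib.parse import urlencode

SUGGEST_ENDPOINT = "https://www.europeana.eu/api/entities/suggest"


def suggest_url(text: str, entity_type: str, wskey: str = "apidemo") -> str:
    """Build a suggest request URL with the parameters from the example."""
    params = {"wskey": wskey, "text": text, "type": entity_type}
    return SUGGEST_ENDPOINT + "?" + urlencode(params)


class Candidate(NamedTuple):
    label: str
    europeana_hits: int   # objects whose description contains a label
    pagerank: float       # Wikidata pagerank value


def score(candidate: Candidate) -> float:
    """Hypothetical combination of the two signals (simple product)."""
    return candidate.europeana_hits * candidate.pagerank


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates from highest to lowest combined score."""
    return sorted(candidates, key=score, reverse=True)
```

Any monotonic combination of the two signals would fit the description on the slide; the product is chosen here only because it is the shortest to write.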
25. Entity API - suggest method
/entities/suggest.json?text=neo
26. Conclusions
• We've made enough progress to release a first version of the
Entity Collection and its API, used in Europeana's production
services.
• But there are still challenges to address and decisions to make to ensure
consistency and relevance over time:
• Expand data coverage (and test extensibility) with new data
sources for, e.g., events
• Continue elaborating and testing data integration and filtration
strategies
• Employ the EC to better enrich Europeana object metadata
• Enhance discoverability, especially for search engines, e.g. via
Schema.org publication