Over the past five years, the number of contextual entities in Europeana’s metadata has grown considerably. These entities are provided as references in the metadata delivered to Europeana, or selected by Europeana’s automatic semantic enrichment. Pursuing their efforts towards the creation of a semantic network around cultural heritage objects, Europeana and its partner providers and aggregators are investigating ways to better exchange vocabulary data and manage co-references/alignments between vocabularies. In this presentation we will explore the potential of tools such as OpenSkos and CultuurLink for supporting the building of networked references.
Presented at the 6th DBpedia Community Meeting in The Hague 2016, see http://wiki.dbpedia.org/meetings/TheHague2016
Building an ecosystem of networked references
1. Building an ecosystem of networked references
Hugo Manguinhas | DBpedia Community Meeting 2016
2. Europeana has many data challenges
CC BY-SA
We aggregate very heterogeneous metadata:
• More than 48M objects
• 3,500 galleries, libraries, archives and museums
• 50 languages
• From all EU countries
• Level of quality varies greatly
• Huge amount of references to places, agents, concepts, time
3. Europeana Linked Data Strategy
Motivation
• Improve user experience
• support better ways of searching and navigating through the
collections, eliminating ambiguity and clarifying the meaning of
descriptions
• better adapt to the language of the user
• by improving the interlinking of data
• brings more context to the objects
• alleviates polysemy issues
• better language coverage
• Contribute to building a web of data ('knowledge graph') that
third parties can use to improve their users' experience
4. Europeana Linked Data Strategy
Our efforts and lines of work
• Europeana Data Model (EDM) offers a base for linking
data
• We apply an enrichment strategy to link source data to
reference data, incl. DBpedia
• Encourage data providers to contribute their own
vocabularies so that we can benefit from data links made
at data providers’ level
• We encourage alignment activities between domain
vocabularies
5. Europeana Linked Data Strategy
Vocabularies currently provided to Europeana
6. Europeana Linked Data Strategy
Europeana also hosts vocabularies
7. Europeana Linked Data Strategy
A strategy for Entities
• As a cornerstone for our strategy we are building an
"Entity Collection"
• A service that acts as a centralized point of reference and
access to data about contextual entities
• Caching and curating data from the wider Linked Open Data
cloud
• A sort of Europeana "knowledge graph"
8. The Entity Collection
Use Cases
Europeana Collections Portal
● Findability: users can look for entities, not
only records (Entity-Based Search)
● Understandability: Entity Pages group and
present all assertions about an entity
● Exploration: Navigation along relationships
becomes possible
Crowdsourcing
● Objects can be annotated with references to
entities
● A controlled vocabulary for client applications
Enrichment of Provider’s Data
● A controlled vocabulary to help identify
named references to entities
Republication for Re-use
● Entities can be republished as an open
source to the community
9. The Entity Collection
Why DBpedia?
• It offers labels in about 124 languages through all its
language editions, of which 48 match the languages that
Europeana supports
• It gives fairly complete and accurate descriptive metadata
about entities
• Works great as a “pivot” vocabulary, providing further links to
other vocabularies such as Wikidata and Freebase
10. The Entity Collection
Integrating DBpedia resources
Source: RDF dumps of 48 Language Editions (~3.6GB/GZ), http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/
Pipeline: RDF dumps → Triple Store → SPARQL → MongoDB
• Dumps were carefully selected and downloaded for all languages that
Europeana supports, then loaded into the triple store
• SPARQL queries select DBpedia resources for each EDM Contextual Class
• Each DBpedia resource is converted to EDM using XSLT and further
filtered
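The selection step can be sketched as follows. This is only an illustration, not the queries Europeana actually ran: the mapping from EDM contextual classes to DBpedia ontology classes (`CLASS_MAP`) and the helper name `build_selection_query` are assumptions introduced here, since the slides do not give the real selection criteria.

```python
# Hypothetical mapping from EDM contextual classes to DBpedia ontology classes.
# The slides do not state the real selection criteria; these are assumptions.
CLASS_MAP = {
    "edm:Agent": "dbo:Person",
    "edm:Place": "dbo:Place",
}

PREFIXES = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)


def build_selection_query(edm_class: str, limit: int = 100) -> str:
    """Return a SPARQL query selecting labelled resources for one EDM class."""
    dbo_class = CLASS_MAP[edm_class]
    return (
        f"{PREFIXES}"
        "SELECT ?resource ?label WHERE {\n"
        f"  ?resource a {dbo_class} ;\n"
        "            rdfs:label ?label .\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # Print the query that would be sent to the triple store for agents.
    print(build_selection_query("edm:Agent", limit=10))
```

Each resulting query string would be run against the triple store holding the dumps; the conversion to EDM and the filtering happen afterwards and are not covered by this sketch.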
11. The Entity Collection
Some statistics for DBpedia
Entity Class   Target vocabulary   Size
Places         GeoNames            140,097
Concepts       DBpedia             5,284
               GEMET               280
Agents         DBpedia             161,209
Time           Semium Time         2,566
13. The Entity Collection
Is DBpedia enough?
• Not enough coreferencing information to other vocabularies
• particularly to the ones we receive from data providers (e.g.
MIMO)
• Labels and values are not always accurate and normalized
• need for better reference data (e.g. VIAF)
14. The Entity Collection
Our roadmap for the next years
• Generate Europeana URIs for Entities
• Make entity services and data available via an API, and
further integrate existing components
• Integrate vocabularies that can further improve
• entity descriptions and multilingual coverage (e.g. VIAF)
• linking between entities (e.g. Wikidata)
• Integrate alignments
• particularly, links between local/domain vocabularies to pivot
vocabularies
15. Linking Metadata to MIMO with CultuurLink
About CultuurLink
• Online tool for aligning SKOS vocabularies,
http://cultuurlink.beeldengeluid.nl
• Successor to EuropeanaConnect's Amalgame
• Developed by Spinque
• With support of the Network Digital Heritage
• Semi-automatic discovery of alignments
• Ability to define complex strategies
• Manual assessment and export of the results
16. Linking Metadata to MIMO with CultuurLink
The Europeana Sounds Experiment
• Scope
• Evaluate MIMO as target vocabulary for enrichment of subject fields
• Evaluate the potential of CultuurLink for alignment of vocabularies
• Participants
• British Library, CREM, MMSH, NISV (6 collections in total)
• Work done
• A SKOS vocabulary was generated for each collection from the labels
found within subject fields
• Each participant used CultuurLink to discover alignments by designing
and testing different strategies
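Generating a SKOS vocabulary from the labels found within subject fields could look roughly like the sketch below. It is not the procedure used in the experiment: the function name `labels_to_skos`, the hash-based concept identifiers and the `http://example.org/...` scheme URI are assumptions made for illustration.

```python
import hashlib


def labels_to_skos(labels, scheme_uri, lang):
    """Build a minimal SKOS vocabulary (Turtle) from free-text subject labels."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{scheme_uri}> a skos:ConceptScheme .",
    ]
    seen = set()
    for raw in labels:
        label = " ".join(raw.split())  # trim and collapse whitespace
        key = label.casefold()         # case-insensitive de-duplication key
        if not label or key in seen:
            continue
        seen.add(key)
        # Assumption: derive a stable identifier from the normalized label.
        concept_id = hashlib.sha1(key.encode("utf-8")).hexdigest()[:10]
        # Escape backslashes and quotes so the Turtle literal stays valid.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append("")
        lines.append(f"<{scheme_uri}/{concept_id}> a skos:Concept ;")
        lines.append(f"    skos:inScheme <{scheme_uri}> ;")
        lines.append(f'    skos:prefLabel "{escaped}"@{lang} .')
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical subject labels; the duplicate "violin " is dropped.
    print(labels_to_skos(["Violin", "violin ", "Oud"],
                         "http://example.org/vocab", "en"))
```

One concept per distinct label keeps the vocabulary flat, which is enough as input for an alignment tool; hierarchy or alternative labels would have to come from other metadata.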
17. Linking Metadata to MIMO with CultuurLink
Example from CNRS
see http://cultuurlink.beeldengeluid.nl
18. Linking Metadata to MIMO with CultuurLink
Results of the experiment
• There were a couple of issues with metadata quality
• matching previous observations: enrichment works better when
metadata quality is good
• The feedback was positive for CultuurLink
• Applying different strategies proved crucial for discovering
alignments between concepts
• ... taking into account different features of the data (preferred /
alternative labels, different languages, vernacular labels, etc.)
• Providers have realized the interest and feasibility of linking to a
richer, more multilingual pivot vocabulary such as MIMO
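A simple label-based strategy of the kind described above can be sketched as follows. This is not CultuurLink's implementation: the function names, the two-step order (preferred labels first, alternative labels second), the choice of `skos:exactMatch` versus `skos:closeMatch`, and the `ex:` identifiers are all assumptions for illustration. The output is meant as a list of candidates for manual assessment.

```python
import unicodedata


def normalize(label):
    """Casefold, strip accents and collapse whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.casefold().split())


def build_index(target, label_kind):
    """Map normalized labels of one kind ('pref' or 'alt') to target URIs."""
    index = {}
    for uri, concept in target.items():
        for label in concept.get(label_kind, []):
            index.setdefault(normalize(label), uri)  # first concept wins
    return index


def align(source, target):
    """Propose alignments: preferred-label matches first, then alternative labels."""
    pref_index = build_index(target, "pref")
    alt_index = build_index(target, "alt")
    alignments = []
    for src_uri, concept in source.items():
        labels = [normalize(label)
                  for label in concept.get("pref", []) + concept.get("alt", [])]
        hit = next(((pref_index[l], "skos:exactMatch")
                    for l in labels if l in pref_index), None)
        if hit is None:
            hit = next(((alt_index[l], "skos:closeMatch")
                        for l in labels if l in alt_index), None)
        if hit is not None:
            alignments.append((src_uri, hit[1], hit[0]))
    return alignments


if __name__ == "__main__":
    # Hypothetical source (provider) and target (pivot) vocabularies.
    source = {"ex:s1": {"pref": ["Flûte"]},
              "ex:s2": {"pref": ["Fiddle"]},
              "ex:s3": {"pref": ["Theremin"]}}
    target = {"ex:t1": {"pref": ["flute"], "alt": []},
              "ex:t2": {"pref": ["violin"], "alt": ["fiddle"]}}
    for triple in align(source, target):
        print(triple)
```

Labels in several languages can simply be placed in the same `pref` and `alt` lists, which is one way a multilingual target vocabulary increases the chance of a match.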
19. Linking Metadata to MIMO with CultuurLink
Demo
http://cultuurlink.beeldengeluid.nl
20. Conclusion
• A Strategy for Entities is a “must” for Europeana
• There is no “one size fits all” vocabulary
• We have a long way to go…
• ...but we are making progress