Europeana as a Linked Data (Quality) case

Presentation for the 3rd Workshop on Humanities in the Semantic Web (WHiSe), co-located with the 17th Extended Semantic Web Conference (ESWC 2020)
June 2, 2020, online
http://whise.cc/2020/


  1. 1. Europeana as a Linked Data (Quality) case. Antoine Isaac, with slides from Hugo Manguinhas, Valentine Charles, Juliane Stiller, Mónica Marrero and other colleagues. 3rd Workshop on Humanities in the Semantic Web (WHiSe), co-located with the 17th Extended Semantic Web Conference (ESWC 2020), June 2, 2020
  2. 2. Outline CC BY-SA • Brief intro to Europeana • Metadata quality challenges • Using Linked Data technology to make data richer • Encouraging data enhancements across the board • How all this fits Research-related efforts
  3. 3. Who is Europeana? CC BY-SA ● A non-profit foundation ● A community of 2400 experts in digital heritage: the Europeana Network ● A mission: improve access to Europe's digital cultural heritage
  4. 4. What is Europeana? CC BY-SA ● The European Commission's digital platform for cultural heritage ● Providing access to over 58M objects from over 3500 museums, libraries, archives
  5. 5. What is Europeana? CC BY-SA ● An Open Data platform providing several services ● Europeana portal: https://europeana.eu ● Europeana APIs: https://pro.europeana.eu/resources/apis
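
As a rough illustration of what using the APIs involves, the sketch below only builds a request URL for a keyword search; it does not send it. The endpoint and the parameter names (`wskey`, `query`, `rows`) are assumptions based on the Search API as commonly documented and should be checked against the API pages linked above. `build_search_url` and the placeholder key are hypothetical names introduced for this example.

```python
from urllib.parse import urlencode

# Assumed endpoint: verify against the Europeana API documentation before use.
SEARCH_ENDPOINT = "https://api.europeana.eu/record/v2/search.json"


def build_search_url(query, api_key, rows=5):
    """Return a search request URL (hypothetical helper, no network call)."""
    params = {"wskey": api_key, "query": query, "rows": rows}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# "YOUR_API_KEY" is a placeholder; a real key has to be requested from Europeana.
url = build_search_url("cycles nautiques", "YOUR_API_KEY")
print(url)
```

The resulting URL can be passed to any HTTP client; the response format and further parameters are described in the API documentation.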
  6. 6. How does it work? France, Public Domain 1914, National Library of France Agence de presse Meurisse Concours de cycles nautiques sur le lac d’Enghien : Berregent piloté par Austerling
  7. 7. What’s inside Europeana? CC BY-SA ● Descriptive and technical metadata: title, creator, subject, rights… ● Editorial content like virtual exhibitions ● (recently started) user-generated metadata, incl. transcriptions, semantic annotations ● Thumbnails As a rule, digitized content is served on our partners’ websites, except for some specific projects: ● Newspapers ● WWI user-generated content
  8. 8. Data flow in Europeana’s network Data providers: cultural institutions that provide metadata and links to digitized content Aggregators: organizations or projects that gather data from a specific country or domain (music, fashion, archaeology…)
  9. 9. France, Public Domain 1932, National Library of France Agence de presse Mondial Photo-Presse. Tournoi royal de motos à Londres : changement d'une roue de side-car en marche Data Quality Issues in Cultural Heritage Caveat: some examples have already been cleaned
  10. 10. Sparseness of (meta)data CC BY-SA
  11. 11. Heterogeneity CC BY-SA 58M objects, from 3,500 institutions ● Many different themes and types of objects: books, newspapers, letters, diaries, archival papers, paintings, maps, drawings, photographs, music, spoken word, radio broadcasts, film, newsreels, fashion, sculpture, 3D objects, and more ● Libraries, archives, museums have different ways to describe objects. Even within a sector, big differences can be observed
  12. 12. Multilingualism CC BY-SA 58M objects, from 44 countries ● Officially we get metadata in 38 languages ● But there are more languages used in individual metadata fields
  13. 13. Multilingualism CC BY-SA ● Officially we get metadata in 38 languages ● But there are more languages used in individual metadata fields • Over 400 language codes, e.g., 6 values in x-aramaic-latn (not a valid code, by the way) • The most common case is lack of language information!
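
The kind of check behind these observations can be sketched as a small audit of language tags. This is a minimal illustration, not Europeana's actual procedure: `KNOWN_CODES` is a deliberately tiny, hypothetical allowlist (a real check would use the full ISO 639 or IANA registries), and the sample values are invented.

```python
from collections import Counter

# Hypothetical allowlist; a real audit would load a complete language registry.
KNOWN_CODES = {"en", "fr", "de", "nl", "it", "es", "pl"}


def classify_language_tag(tag):
    """Classify a language tag as 'missing', 'known' or 'unknown'."""
    if tag is None or not tag.strip():
        return "missing"
    # Only the primary subtag is checked, so "en-GB" counts as "en".
    primary = tag.strip().lower().split("-")[0]
    return "known" if primary in KNOWN_CODES else "unknown"


def audit(values):
    """Count tag categories over (text, language_tag) pairs."""
    return Counter(classify_language_tag(tag) for _, tag in values)


# Invented sample of (field value, language tag) pairs.
sample = [
    ("Concours de cycles nautiques", "fr"),
    ("Clavecin", None),
    ("some text", "x-aramaic-latn"),
    ("Harpsichord", "en-GB"),
]
report = audit(sample)
print(dict(report))
```

Counting the "missing" bucket separately matters here, since untagged values are reported above as the most common case.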
  14. 14. How to get more homogeneous, richer & multilingual data? France, Public Domain 1914, National Library of France Agence de presse Meurisse Concours de cycles nautiques sur le lac d’Enghien : Berregent piloté par Austerling
  15. 15. Data modeling for interoperability and richer metadata CC BY-SA ● Like many aggregators, we ask our providers to give metadata using one metadata model: the Europeana Data Model (EDM) ● But we cannot do whatever we like: we do not operate in isolation! ● Our approach must be ○ easy and rewarding for our partners ○ based on community-agreed best practices
  16. 16. A community sport • Involving (technical) experts from libraries, archives, museums and academics – the EuropeanaTech community • Adopting a collaborative, softer form of standardization http://pro.europeana.eu/europeana-tech Europeana Assembly General Meeting, Rijksmuseum, Amsterdam, 2015
  17. 17. Prior to EDM: flat metadata records CC BY-SA ● No links between objects and persons, places… ● Mixing data on real object and digital content ● Causing a lot of mapping quality problems
  18. 18. Following Best Practices, such as the Linked Open Data principles CC BY-SA http://vimeo.com/36752317
  19. 19. Massive re-use of vocabularies in EDM CC BY-SA Plus • Web Annotation • RDA • WGS84 • EBUcore • ccRel • ODRL • DOAP • SVCS • DCAT • ADMS … (sometimes only for one property!) http://pro.europeana.eu/edm-documentation EDM in Linked Open vocabularies (LOV) OAI-ORE FOAF
  20. 20. Data modeling for interoperability and richer metadata CC BY-SA Clavecin, Bartolomeo Cristofori Cite de la Musique, MIMO - Musical Instruments Museums Online | CC BY-NC-SA http://pro.europeana.eu/edm-documentation
  21. 21. Enriching metadata CC BY-SA • EDM gives a base for (linking to) multilingual, semantic metadata • data as resources with web URIs, not only strings • We encourage data providers to contribute their own links/data to local or external vocabularies https://pro.europeana.eu/page/europeana-semantic-enrichment
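
The difference between "strings" and "resources with web URIs" can be illustrated with a simplified, hypothetical record. The property names (`dc:subject`, `skos:prefLabel`, `skos:Concept`) are ones EDM re-uses, but the dictionary layout, the `example.org` URI and the helper `display_subject` are assumptions made for this sketch, not an actual EDM serialization.

```python
# A flat record: the subject is just a string in one language.
flat_record = {"dc:title": "Clavecin", "dc:subject": "Clavecin"}

# A linked record: the subject points to a concept identified by a URI.
linked_record = {
    "dc:title": "Clavecin",
    "dc:subject": {"@id": "http://example.org/concept/harpsichord"},
}

# Hypothetical vocabulary entry with multilingual preferred labels.
ENTITIES = {
    "http://example.org/concept/harpsichord": {
        "@type": "skos:Concept",
        "skos:prefLabel": {"fr": "Clavecin", "en": "Harpsichord", "de": "Cembalo"},
    }
}


def display_subject(record, language):
    """Return the subject label in the requested language when available."""
    value = record["dc:subject"]
    if isinstance(value, str):
        return value  # plain string: nothing to translate
    labels = ENTITIES[value["@id"]]["skos:prefLabel"]
    return labels.get(language, labels["en"])  # fall back to English


print(display_subject(flat_record, "de"))
print(display_subject(linked_record, "de"))
```

With the flat record a German-speaking user still sees the French string; with the linked record the label comes from the vocabulary in the user's language.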
  22. 22. LOD Vocabularies currently recognized by Europeana in providers' metadata CC BY-SA
      Vocabulary | URL
      MIMO Concepts | http://www.mimo-db.eu/
      MIMO Instrument makers | http://www.mimo-db.eu/
      The Getty - Art & Architecture Thesaurus (AAT) | http://vocab.getty.edu/
      The Getty - Union List of Artist Names (ULAN) | http://vocab.getty.edu/
      Virtual International Authority File (VIAF) | http://viaf.org/viaf/
      Geonames | http://sws.geonames.org/
      IconClass | http://iconclass.org/
      Gemeinsame Normdatei (GND) | http://d-nb.info/gnd
      Israel Museum Jerusalem Concepts | http://www.imj.org.il/imagine/thesaurus/objects/
      Partage Plus concepts | http://partage.vocnet.org/
      data.europeana.eu WWI Concepts from Library of Congress Subject Headings (LCSH) | http://data.europeana.eu/concept/loc
      Europeana Sounds Genres | http://data.europeana.eu/concept/soundgenres/
      EAGLE Material & Object Type | http://www.eagle-network.eu/voc/
      DISMARC Formats & Genres | http://purl.org/dismarc/ns/
      UDC | http://udcdata.info/rdf/
      UNESCO Thesaurus | http://vocabularies.unesco.org/thesaurus/
      YSO General Finnish Ontology | https://finto.fi/yso/en/
      https://pro.europeana.eu/page/europeana-semantic-enrichment
  23. 23. Enriching metadata CC BY-SA • EDM gives a base for (linking to) multilingual, semantic metadata • data as resources with web URIs, not only strings • We encourage data providers to contribute their own links/data to local or external vocabularies • We are going to further develop crowdsourcing/"nichesourcing" of metadata • In parallel, we apply automatic enrichment to link object metadata to reference datasets for places, persons, concepts https://pro.europeana.eu/page/europeana-semantic-enrichment
  24. 24. Enriching metadata CC BY-SA
  25. 25. Enriching metadata – Contextual Entities CC BY-SA We are building an "Entity Collection" • Centralized point of reference and access to data about contextual entities: places, agents (persons and organizations), concepts... • Caching and curating data from the wider Linked Open Data cloud • A sort of Europeana knowledge graph • With a dedicated API https://pro.europeana.eu/page/entity#entity-collection
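
Automatic enrichment of this kind can be pictured, in its simplest form, as matching metadata strings against the labels of a reference dataset. The sketch below is only that simplest form, under stated assumptions: the `PLACES` vocabulary and its `example.org` URIs are invented, matching is exact and case-insensitive, and ambiguous labels are skipped. It is not a description of Europeana's enrichment pipeline.

```python
# Hypothetical mini-vocabulary: entity URI -> labels keyed by language.
PLACES = {
    "http://example.org/place/paris": {"fr": "Paris", "en": "Paris", "it": "Parigi"},
    "http://example.org/place/london": {"en": "London", "fr": "Londres", "it": "Londra"},
}


def build_index(vocabulary):
    """Map each case-folded label to the set of URIs that carry it."""
    index = {}
    for uri, labels in vocabulary.items():
        for label in labels.values():
            index.setdefault(label.casefold(), set()).add(uri)
    return index


def enrich(values, index):
    """Link each value to a URI when exactly one entity matches."""
    links = {}
    for value in values:
        matches = index.get(value.strip().casefold(), set())
        if len(matches) == 1:  # ambiguous or unmatched values are left alone
            links[value] = next(iter(matches))
    return links


INDEX = build_index(PLACES)
links = enrich(["Londres", "Enghien", "parigi"], INDEX)
print(links)
```

Values with no match, such as "Enghien" here, stay plain strings, which is one reason provider-contributed links remain valuable alongside automatic enrichment.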
  26. 26. Data currently in the Entity Collection CC BY-SA • Places: a subset of Geonames, corresponding to places which are part of European countries and of some specific feature classes • Agents: a subset of DBpedia corresponding to most of the instances of dbp:Artist, with some exceptions, integrated from 49 DBpedia language editions • Concepts: a subset of DBpedia and Wikidata corresponding to a selection of concepts matching our needs, e.g., WWI battles, music genres (Europeana Sounds aggregator) and a photography vocabulary (Europeana Photography aggregator) • Organizations: extracted from Europeana's CRM and aligned to Wikidata when possible 216,302 resources 1,572 resources 165,005 resources 1,077 resources https://pro.europeana.eu/page/entity#entity-collection
  27. 27. Selecting data sources CC BY-SA • Availability and access: open license, published as linked data • Granularity, size and coverage: multilingual data, with a rather generic scope. But too generic or too large datasets can create too much ambiguity for the simple processes we have (e.g., enrichment) • Quality: intrinsic aspects like correctness of representation • Connectivity: good data sources are well-connected internally and externally to other datasets
  28. 28. An example DBpedia resource for “Mozart” in the Entity Collection CC BY-SA Coreference links to 6 other datasets (e.g. Freebase, Wikidata) Inter-linking information Preferred labels for 48 languages
  29. 29. An enrichment example Links to contextual entities
  30. 30. And what it allows
  31. 31. And what it allows
  32. 32. Multilingual enrichment is not easy! CC BY-SA Poisonous India or the Importance of a Semantic and Multilingual Enrichment Strategy Marlies Olensky, Juliane Stiller, Evelyn Dröge, MTSR 2012 http://link.springer.com/chapter/10.1007%2F978-3-642-35233-1_25
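
One way multilingual matching goes wrong can be shown with a classic false friend. The example is hypothetical and not taken from the cited paper: the word "Gift" means "present" in English and "poison" in German, so a matcher that ignores the language of the metadata value finds two candidate concepts. The `CONCEPTS` vocabulary and `candidates` helper are invented for this sketch.

```python
# Hypothetical concepts with labels per language.
CONCEPTS = {
    "http://example.org/concept/present": {"en": ["gift", "present"], "de": ["Geschenk"]},
    "http://example.org/concept/poison": {"en": ["poison"], "de": ["Gift"]},
}


def candidates(value, language=None):
    """Return URIs whose labels match value, optionally in one language only."""
    needle = value.casefold()
    found = set()
    for uri, labels in CONCEPTS.items():
        for lang, terms in labels.items():
            if language is not None and lang != language:
                continue  # skip labels in other languages
            if needle in (term.casefold() for term in terms):
                found.add(uri)
    return found


print(candidates("Gift"))        # language unknown: two candidates
print(candidates("Gift", "de"))  # German value: only the poison concept
```

This is also why missing language information in the source metadata makes enrichment harder: without it, the language filter cannot be applied.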
  33. 33. Encouraging everyone on the way to improve their data University Of Edinburgh, CC BY Roslin Glass Slides, creator unknown Photograph of two men step cutting on the ice face of the Tasman Glacier, New Zealand in the late 19th or early 20th century.
  34. 34. Challenges for working on quality improvement CC BY-SA ● Methodological frameworks are not easy to apply ● Getting stakeholders interested is hard for us ● Communication lines are rather long ● It’s a sensitive area ● It’s hard to get users involved
  35. 35. A general effort on quality CC BY-SA We have set up a Data Quality Committee to analyze quality issues and make recommendations to the Europeana community about: ○ Mandatory metadata elements ○ Metadata checking and normalization ○ Multilingualism … http://pro.europeana.eu/get-involved/europeana-tech/data-quality-committee
  36. 36. https://pro.europeana.eu/post/publishing-framework
  37. 37. Convincing by impact CC BY-SA
  38. 38. Europeana Publishing Framework: Metadata CC BY-SA ● language attributes: happy users (using Europeana portal in their native language) ● links to vocabularies: context (for users browsing Europeana portal by persons, places, or concepts) ● enabling elements: visibility (collections being findable along various dimensions: by subject, type, creator, date)
  39. 39. A community sport, again! • Involving (technical) experts from libraries, archives, museums and academics – the EuropeanaTech community • Adopting a collaborative, softer form of standardization http://pro.europeana.eu/europeana-tech Europeana Assembly General Meeting, Rijksmuseum, Amsterdam, 2015
  40. 40. France, Public Domain 1932, National Library of France Agence de presse Mondial Photo-Presse. Tournoi royal de motos à Londres : changement d'une roue de side-car en marche Europeana and the Research community
  41. 41. Europeana Research CC BY-SA Partnerships, Expertise, Research Grants Programme, Community, Europeana portal, Connections, Europeana APIs, Europeana R&D Projects https://pro.europeana.eu/page/europeana-research
  42. 42. Europeana & CLARIN: building partnerships with research infrastructures Europeana Research CC BY-SA • 180K Europeana sources loaded into CLARIN’s Virtual Language Observatory (VLO); Europeana is now the largest provider of individual metadata records in the VLO • Selection based on quality, accessibility, processability and reusability • Full case study at https://bit.ly/2J5w8jc • Challenge for SW (not new!): generic & rich models/formats vs. community-specific & easier to consume
  43. 43. Semantic Web technology can help too, here CC BY-SA Europeana is involved in initiatives that can help bridge gaps ● International Image Interoperability Framework (IIIF) https://iiif.io ● Not only images: representation of document structures, (linking to) metadata, etc. ● With a strong focus on research cases (manuscripts, newspapers) Cf. https://www.slideshare.net/antoineisaac/iiif-and-the-europeana-mission ● Linked Art https://linked.art ● Shared model based on LOD to describe art ● Re-using a (LOUD) subset of CIDOC CRM
  44. 44. Semantic Web technology can help too, here CC BY-SA ● The SW approach makes it possible to create links between underlying models and vocabularies ● W3C Web Annotation, CIDOC CRM, EDM ● Vocabularies expressed using SKOS ● Heavy reliance on JSON-LD ● Importance of data patterns ● Linked Open Usable Data - Rob Sanderson (Getty) ● See for example “The Importance of being LOUD” https://www.slideshare.net/azaroth42/the-importance-of-being-loud
  45. 45. It can work! CC BY-SA https://twitter.com/jbaiter_/status/1267553133942751232
  46. 46. A community sport, again…
  47. 47. Helping FAIRification of Cultural Data University Of Edinburgh, CC BY Roslin Glass Slides, creator unknown Photograph of two men step cutting on the ice face of the Tasman Glacier, New Zealand in the late 19th or early 20th century.
  48. 48. How do Europeana's data and services meet the FAIR requirements? CC BY-SA Findable ● The Europeana aggregation network partially homogenizes its data via a shared data model ● Providers and Europeana seek to enrich the data with multilingual, semantic resources ● We promote persistent identifiers and links across them ● Europeana provides a search engine ● Data is made findable through other platforms (e.g., CLARIN) https://pro.europeana.eu/post/europeana-and-the-fair-principles-for-research-data
  49. 49. How do Europeana's data and services meet the FAIR requirements? CC BY-SA Accessible ● Data is published as (Linked Data) web resources ● Freely available, standard web APIs Interoperable ● Europeana uses a community-based model ● Following best practices, such as mixing and re-using existing data models and vocabularies ● We promote more open and richer content access protocols (IIIF) https://pro.europeana.eu/post/europeana-and-the-fair-principles-for-research-data
  50. 50. How do Europeana's data and services meet the FAIR requirements? CC BY-SA Re-usable ● The conditions for re-using digitized content are made clear, using shared vocabularies (Creative Commons, RightsStatements.org) ● Metadata is fully open – CC0 ● Data model seeks to bridge with other communities’ models, such as W3C Web Annotation, Schema.org https://pro.europeana.eu/post/europeana-and-the-fair-principles-for-research-data
  51. 51. Data on the Web Best Practices Working Group CC BY-SA • Active in 2014-2016 • To develop the open data ecosystem, facilitating better communication between developers and publishers • To provide guidance to publishers, promoting the re-use of data • To foster trust in the data among developers • Linked Data, but not only! https://www.w3.org/2013/dwbp/
  52. 52. Best Practice 15: Reuse vocabularies, preferably standardized ones (Data on the Web Best Practices, W3C Recommendation) CC BY-SA • Use terms from shared vocabularies, preferably standardized ones • Check that classes, properties, terms, elements or attributes used to represent a dataset do not replicate those defined by vocabularies used for other datasets, e.g. using the Linked Open Vocabularies repository • Or if you have to replicate, indicate mappings clearly
  53. 53. Best Practice 16: Choose the right formalization level (Data on the Web Best Practices, W3C Recommendation) CC BY-SA • Accept that (OWL) semantics establish precise specs and can enable automated reasoning but that complex vocabularies require more effort to produce and hamper reuse of data • Minimize ontological commitment of your vocabulary – or seek to minimize the commitment of others’ vocabularies • Check that inference does not produce too many statements that are unnecessary for target applications • Check examples of “softer” specs, e.g. Schema.org or SKOS
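
A first, very rough check in the spirit of this best practice is to see which properties of a dataset already come from shared vocabularies and which are locally minted. The sketch below assumes a short, hand-picked list of well-known namespaces; `split_by_reuse` and the `example.org` property are hypothetical, and a real review would consult a repository such as Linked Open Vocabularies rather than a hard-coded list.

```python
# Small, hand-picked set of widely shared vocabulary namespaces.
SHARED_NAMESPACES = {
    "http://purl.org/dc/elements/1.1/": "Dublin Core elements",
    "http://purl.org/dc/terms/": "Dublin Core terms",
    "http://www.w3.org/2004/02/skos/core#": "SKOS",
    "http://xmlns.com/foaf/0.1/": "FOAF",
}


def split_by_reuse(property_uris):
    """Split property URIs into (reused, local) lists by namespace prefix."""
    reused, local = [], []
    for uri in property_uris:
        if any(uri.startswith(ns) for ns in SHARED_NAMESPACES):
            reused.append(uri)
        else:
            local.append(uri)
    return reused, local


# Invented example: two shared properties and one locally defined one.
properties = [
    "http://purl.org/dc/elements/1.1/creator",
    "http://www.w3.org/2004/02/skos/core#prefLabel",
    "http://example.org/myschema/authorName",
]
reused, local = split_by_reuse(properties)
print("reused:", reused)
print("local:", local)
```

Properties that end up in the local list are candidates either for replacement by a shared term or for an explicit mapping, as the last bullet above recommends.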
  54. 54. Is it perfect? CC BY-SA No. In particular we would always like to get more input from users and researchers (the perspective is very CH-focused). But we’re working on it and we hope the situation is better than if we had done nothing! Has Semantic Web technology helped? YES
  55. 55. Want to engage? CC BY-SA Do you want to hear more about these issues? Check the upcoming “Enriching research – enriching metadata” webinars: https://www.raa.se/in-english/events-seminars-and-cultural-experiences/workshop-on-digitised-collections-enriching-research-enriching-metadata/ Europeana Research has a grants programme to fund events that bring together cultural heritage and researchers. Check future calls! https://pro.europeana.eu/page/grants-programme Join the Europeana Network and (one of) its communities! https://pro.europeana.eu
  56. 56. antoine.isaac@europeana.eu @antoine_isaac
