Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0
EUROPEANA MEETING
UNDER FINLAND’S PRESIDENCY
OF THE COUNCIL OF THE EU
ESPOO, FINLAND
24 October 2019
Andy Neale
Technical Director
Europeana Foundation
Europa [Material cartográfico] : Nach den vorzüglichsten Hülfsmitteln, Götze, Johann August Ferdinand, 1773-1819 Biblioteca Digital de Madrid Spain, Public domain
Contribution to EU GDP by culture and creative sectors: 4.2%
Trade surplus in cultural goods: € 8.7B
New Agenda for Culture
Automotive + Manufacturing +
Chemical Industries
Cultural + Creative
Sector
7.8M 4.4M>
Employment
young professionals
(15-29 yrs old)
19.1%
The role of
Europeana
Europeana Party People @ Christmas party, CC BY
We support cultural heritage institutions in
their digital transformation
3,700
CHIs across Europe
EUROPEANA COLLECTIONS
58m
Cultural heritage records
2.5 billion
Information items
1. Common Tech & data architecture
Europeana
Data Model +
Metis
2. Common policies & standards
Europeana
● Licensing
Framework
● Publishing
Framework
Statements for works that are
not in copyright
Statements for works where the
copyright status is unclear
Statements for works that are in
copyright
3. Websites & APIs
Europeana
Collections
Programme
Objectives
1. Stimulate reflection on multilingualism in digital cultural heritage at
large using Europeana as a case study;
2. Develop a deeper understanding of the multilingualism
problem/opportunity space for digital cultural heritage;
3. Consider what options can be pursued to provoke action at the local
level, furthering the multilingual capabilities;
4. Provide input and feedback for the Europeana multilingual strategy.
Sessions
1. Setting the scene
2. User interactions
3. Multilingual metadata
4. Content translation
5. Conclusions and steps for progress
Juliane
Stiller
Information Specialist
You, We & Digital
‘Multilingual Developments in
Digital Cultural Heritage Domain:
Problem Space & Solutions’
● 10 years researcher at
Humboldt-Universität zu
Berlin in Europeana-related
projects
● multilinguality, interaction
patterns, metadata and its
quality, research on search
and browse, retrieval,
evaluation
● since 2019 consultancy and
training in digital literacy
@stillinsky
Agenda
• Multilinguality: the problem space
• Bridging the language gap
• Translations
• Enrichments
• What is left to do?
Multilinguality
The Problem Space
Christoffel van Sichem: Bouw van de toren van Babel
Content
Information Access
Interactions
User Interface
Metadata and digital
CH objects
Search, Browse & Explore
Show user‘s
preferred language
Bridge the gap between
language of user input
and content
Layers of digital CH system
User Interface
Challenges:
• Translation of static and dynamic
pages
• Switching languages via text or
icons such as flags
• Default language
• Determine the user‘s preferred
language through IP address or
browser settings
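The browser-settings option above can be sketched by reading the HTTP `Accept-Language` header. This is a minimal illustration, not Europeana's implementation; the function names, the supported-language list and the English default are assumptions made for the example.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q-value has weight 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat malformed weights as lowest priority
        weighted.append((-quality, position, tag.strip().lower()))
    # Sort by descending quality, then by original position in the header.
    return [tag for _, _, tag in sorted(weighted)]


def choose_ui_language(header: str, supported: list[str], default: str = "en") -> str:
    """Pick the first supported language, falling back to the primary subtag."""
    for tag in parse_accept_language(header):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]
        if primary in supported:
            return primary
    return default


if __name__ == "__main__":
    print(choose_ui_language("fi-FI,fi;q=0.9,sv;q=0.8,en;q=0.7", ["en", "sv", "fi"]))
```

A real portal would still let the user override the detected language, since browser settings and IP address only approximate the preferred language.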
User Interface
Interactions: Search
Interactions
Mismatch between query and
content language
• Mona Lisa 203 results
• Monna Lisa 13 results
• La Gioconda 376 results 
• La Joconde 78 results
Interactions
Roma, Galleria Corsini - La
Gioconda,
Interactions: Browse
● Search vs. browse
● (Metadata) text vs. object
Interactions
Interactions: Explore
cater for different information needs in different languages:
• Entities
• Colors, format
• Access & copyrights
• Inspiration
Interactions
Content & Metadata
Image Credit: both from Europeana with title „Kinderbuch” from
Spielzeugmuseum der Stadt Nürnberg (CC BY-NC-SA)
Content
Metadata multilinguality
+ 40 other languages ...
Content
Bridging the language gap
Translations & Enrichments
Bridge by Mark Robinson (CC-BY 2.0)
To bridge the gap between language of user
input and content, one can translate
1. Queries
2. Content / Metadata
3. .....
1) Translating queries
Query
English
Spanish
French
....
comes with challenges ....
Database
Information Access
Cultural heritage queries
κερκυρα
poblet
bævre
humble østerskovvej
espana salamanca
academia coleccam
documentos estatutos
εσκι σεχιρ
first war world
berlin berliner mauer or wall
alphonse mucha
Query heterogeneity & long tail
Europeana queries in a month in 2016
442 times: Wolfgang Amadeus
Mozart
once: full history of ging
tsholing in bhutan
Queries in cultural heritage are
● Short
● Heterogeneous
● Focus on entities: 61.96% of the queries contain NE (Stiller, Gäde &
Petras, 2010)
● Highly ambiguous in language:
○ “culture”, “administration”, “paris”, “madonna”
● Semantically ambiguous:
○ “barber” (composer or hairdresser)
Multilingual academic search
● informational queries from the psychology domain in 4
languages: pubpsych.eu
● Building domain-specific lexical resources and map them
to queries; entries look like this:
○ wohlbefinden|||en:well-being|||es:bienestar|||fr:bien-etre
○ wohlfuhlen|||en:well-being|||es:bienestar|||fr:bien-etre
○ Well-being|||es:bienestar|||de:wohlbefinden|||fr:bien-etre
● Translation does not depend on language identification
● Deals well with NE -> no match in Lexicon, no translation
More Info on the project: https://www.clubs-project.eu/en/
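The lexicon entries shown above can be read with a few lines of code. The sketch below assumes the `|||` separator and `lang:term` layout exactly as printed on the slide; the function names are hypothetical, and only single-token lookup is shown (multi-word terms would need extra handling). Terms with no lexicon entry, such as named entities, pass through untranslated, which matches the behaviour described in the bullets.

```python
LEXICON_LINES = [
    "wohlbefinden|||en:well-being|||es:bienestar|||fr:bien-etre",
    "wohlfuhlen|||en:well-being|||es:bienestar|||fr:bien-etre",
    "Well-being|||es:bienestar|||de:wohlbefinden|||fr:bien-etre",
]


def load_lexicon(lines):
    """Map each lower-cased source term to a {language: translation} dict."""
    lexicon = {}
    for line in lines:
        source, *targets = line.strip().split("|||")
        translations = {}
        for target in targets:
            lang, _, term = target.partition(":")
            translations[lang] = term
        lexicon[source.lower()] = translations
    return lexicon


def translate_query(query, lexicon):
    """Return {language: query}, replacing only tokens found in the lexicon."""
    languages = sorted({lang for entry in lexicon.values() for lang in entry})
    tokens = query.split()
    return {
        lang: " ".join(lexicon.get(tok.lower(), {}).get(lang, tok) for tok in tokens)
        for lang in languages
    }


if __name__ == "__main__":
    lexicon = load_lexicon(LEXICON_LINES)
    print(translate_query("Mozart wohlbefinden", lexicon))
```

Because the lookup is keyed on the term itself, no language identification step is needed before translating.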
Query
2) Translate the content
Spanish
French
German
English
Content
English
French
German
Spanish
Content
Database
Metadata
heterogeneity &
sparsity
https://www.europeana.eu/portal/en/record/92022/BibliographicResource_1000125938148.html
Challenges
• Missing training data for small languages
• Missing training data for (sub)domains
• The number of language pairs is immense with 50+
languages
• Metadata is too sparse for good translation results
Enrichment
Enrich
metadata
Number of enriched objects, their type and
vocabularies
• Locations (GeoNames): 7 million
• Concepts (GEMET, DBpedia): 9.2 million
• Time (Semium Time): 10.2 million
• Agents (DBpedia): 144,000
Enriched entities in Europeana
Semantically incorrect
enrichment
Polen (Dutch) Polen (Basque)
What is left to do?
Adapt to queries
Entity graphs for
exploration
• Object
• Person
• Concept
• Period
• Location
• Event
Evaluate solution based on goal
○ E.g. for ML retrieval we might not need the perfect fluent
translation
○ Identify the impact of different workflows / processes on
multilinguality of system
○ Translations do not only have an impact on data but also on
retrieval and therefore on user satisfaction
Thank you!
http://tatecollectives.tumblr.com/tagged/1840s-GIF-Party
@stillinsky
hello@you-we-digital.com
References
• Petras, V., Hill, T., Stiller, J., & Gäde, M. (2017). Europeana – a Search Engine for Digitised Cultural
Heritage Material. Datenbank-Spektrum, 1–6. https://doi.org/10.1007/s13222-016-0238-1
• Hill, T. D., Charles, V., Isaac, A., & Stiller, J. (2016). “Searching for Inspiration”: User Needs and Search
Architecture in Europeana Collections. ASIS&T 2016 Annual Meeting.
• Manguinhas, H. (2016). Europeana Semantic Enrichment Framework. Documentation, Europeana.
https://docs.google.com/document/d/1JvjrWMTpMIH7WnuieNqcT0zpJAXUPo6x4uMBj1pEx0Y
• Stiller, J. (Ed.) (2016). Best practices for multilingual access. Tech. rep.
http://pro.europeana.eu/files/Europeana_Professional/Publications/BestPracticesForMultilingualAccess_whitepaper.pdf
• Stiller, J., Gäde, M., & Petras, V. (2013). Multilingual access to digital libraries: The Europeana use
case. Information-Wissenschaft Und Praxis, 64, 86–95.
• Olensky, M., Stiller, J., & Dröge, E. (2012). Poisonous India or the Importance of a Semantic and
Multilingual Enrichment Strategy. In 6th Research Conference, MTSR 2012, Cádiz, Spain, November
28-30, 2012. (pp. 252–263). Berlin: Springer.
• Stiller, J., Gäde, M., & Petras, V. (2010). Ambiguity of Queries and the Challenges for Query Language Detection.
Rickard
Domeij
Language Planner
Language Council of Sweden, Institute of
Language and Folklore
Multilingualism, technology
and language policy
Content
● The LC and the multilingual language policy of Sweden (and EU)
● Multilingually accessible services
● Language technology (LT) and language resources
● National Language Bank
● First experiences in digital humanities and cultural heritage
● Challenges for LT in cultural heritage
● Next steps
Multilingual language policy
● The LC monitors and promotes the languages of Sweden and their use
● Language policy (2005) and Language act (2009)
● Status and rights to use Swedish and other languages in Sweden
● National minority languages: Sami, Meänkieli, Finnish, Romani, Yiddish
● Swedish sign, Nordic languages, EU-languages, immigrant languages
● Public agencies have to reach out to the whole population
● Also good for business
Multilingually accessible services
● Vision: a multilingual society in which all citizens are included with
respect to different backgrounds and languages --> digital inclusion
● Access to info and services according to language rights and needs
● Switch between languages and modes according to preferences
● Example: have a web text read aloud in your language
● Essential for people with disabilities but also useful for others = design
for all (e.g. subtitling)
LT to make it possible
● Conversions between languages and modes
● Different modes: writing, speech, gestures …
● Multilingualism = multilinguality + multimodality
● LT modules: text-to-speech (TTS), speech-to-text (STT), machine
translation (MT) …
● Applications: recitation, dictation, translation …
● Voice translation: STT > MT > TTS
LT to make it possible II
● Problems with quality and trust, especially on unrestricted data
● User and domain adaptation, user interaction
● Ex: respeaking system for subtitling on tv
● Accessibility often means loss of quality, but other gains
● Accessible and usable
Language resources needed
● Data and tools: corpora, markup tools, lexicons, language models …
● Rule-based methods, especially for less resourced languages
● Market forces are not enough
● Stimulate the development of LT and multilingually accessible services
by national means (ex: respeaking system for Swedish tv)
● National Language Bank (NLB) to make resources available for R&D
An NLB promotes the development of technology, which benefits the languages in
Sweden and improves access to information for everyone.
Digital agenda for Sweden (2011)
National research infrastructure (2017-
00626) funded by the Swedish Research
Council by 1,5 mil./year until 2025.
Two main types of data:
Multilingual texts and terms from PAs
Multimodal cultural heritage collections
First experiences in cultural heritage
● Available voice recognition and MT don’t work!
● Instead try other methods:
○ ”sound browsing” to explore speech recordings acoustically
○ respeaking for transcribing speech
○ transcription of handwritten dialect text in Transkribus
○ time-alignment of existing transcripts to sound in ELAN
○ linking from text to speech data in the archives (see next page)
● Usage centered, participative design in multidisciplinary teams
● Tilltal project (SAF16-0917:1)
Challenges for LT in cultural heritage
● Interface or content (= multilingual in a broad sense)
● Far beyond modern standard language use
● Great variation makes domain adaptation hard
● Variation in place (dialects and languages), time (old Swedish) and
situation (informal-formal)
● Modal variation in collections: (handwritten) text, speech, pictures
● Hard to handle as researchers want to explore a collection as a whole
Next steps
● Linked data to describe the collection conceptually and relationally
● Multilingual search methods for handling language variation in place,
time and situation
● Domain-adapted speech-to-text conversion to transcribe recordings
● Crowdsourcing for correcting
● Shared resources for the languages, dialects, domains etc
● Long-term funding for the National Language Bank
● Collaborative projects involving LTists, researchers and data holders
Andrejs Vasiļjevs
Executive Chairman
Tilde
Jānis Ziediņš
Project Manager
Culture information systems centre, Riga
Learnings from the automatic
translation projects and how to
apply them for the culture and
heritage sector
Culture information systems centre
Our mission is to assist cultural heritage institutions -
ARCHIVES LIBRARIES MUSEUMS
maintain and make available cultural heritage for future generations
through the latest information technology solutions.
Benefit for eGovernment
State Gov.lv platform
Platform for the
provision and
management of
e-Services
Single Public Administration
Data Area
Municipality IS
Other IS
State information systems
MT platform
OpenData
Digitization of the Cultural Heritage Content
The National Library of Latvia is implementing a European Regional Development
Fund (ERDF) and nationally co-funded project in the field of Latvia's digital cultural
heritage, together with project partners – the National Archives of Latvia, the State
Inspection for Heritage Protection of Latvia, and the Cultural Information System
Centre.
The project will further develop the Digital Object Management and Conservation
System, develop the Copyright Management and Content Licensing System, publish
several Open Datasets, including Related Open Datasets, and develop the Stage of
an Integrated Centralized Open System Information Platform.
Translation test
Manual translation:
A photomontage postcard with five views of Riga. The central city panorama with the new Pontoon Bridge opened in 1896 and the Mazā Guild building in the right corner. Below these images, the city theatre, Vērmanes Garden and the bridge across the canal by Bastejkalns.
Hugo.lv translation:
A postcard is assembled from five views of Riga - downtown panorama with the new Pontonbridge discovered in 1896, the Little Guild House in the right corner, under these images - City Theatre, Verman Gardens, a bridge over the canal near BastejHill.
Source: VRVM 176655, http://www.nmkk.lv/Items/ItemViewForm.aspx?id=167748
AI for breaking
language barriers
Enablers of AI
ML Algorithms
Computing Power
Big Data
Based on Tilde Neural MT technologies, which won first place at
WMT 2017, WMT 2018 and WMT 2019, a global
competition between the world’s top language
technology providers
• Generic MT systems were
trained on
52 million parallel sentences
• Cultural domain MT systems
were customized with
additional
826 000 parallel sentences
5 million monolingual
sentences
Data for MT System Development
Books
▪ Fiction
▪ Scientific literature
▪ Technical literature (manuals, instructions)
News and web content
▪ News from popular media (also multilingual media)
▪ Company press releases
▪ Multilingual web site content
Public sector data
▪ Laws, regulations, directives, etc.
▪ Documents of internal and external use
▪ Press releases
▪ Public sector web site data
Proprietary translation memories
▪ Professional and amateur translator produced data
▪ Translation memories of translation and localisation service providing companies
▪ Translation memories of international organisations
Comparison to Google –
Automatic Evaluation
Comparison to Google –
Human Evaluation
Usability, productivity and
integration
Translation add-on
for browsers
Translation API
Plug-ins for
CAT tools
Translation widget
Hugo.lv – AI powered language technology portal
1.1
million terms
22
subject fields
164 216
terms in culture
domain
EU Council Presidency Translator
2017-2020
EU COUNCIL PRESIDENCY
TRANSLATOR
EU PRESIDENCY TRANSLATOR
CEF eTranslation: MT systems for the 24 official EU languages enabling
translation of full documents, preserving text formatting
AI-powered Neural MT: AI-powered custom Neural MT providing superior-quality
translation adapted for the Presidency requirements
Web Site – Text Translation
Formatting-Rich Document Translation
Website Translation
BENEFITS FOR ESTONIA, BULGARIA, AUSTRIA
• Enables Presidency staff to quickly translate documents
• Empowers visiting journalists and delegates to access info in
the local language, e.g., press releases, local news sites
• Supports staff translators in their work by boosting
translation productivity up to 35%
• Lowers costs of translation for documents by utilizing
post-edited machine translation
• Allows public sector organizations to translate content and
websites into multiple languages
STATISTICS
From September 2017 to October 2019, the EU Council
Presidency Translator processed:
• 32 159 082 words
• 2.83 million sentences
• 1.09 million translation requests
This is roughly 207 books (at an average of 155 thousand words in one
Harry Potter book).
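The "~207 books" comparison follows directly from the word count and the average book length quoted on the slide. A short check, using only those two figures (the variable names are illustrative):

```python
# Figures taken from the slide; names are illustrative only.
WORDS_PROCESSED = 32_159_082   # words processed, Sept 2017 to Oct 2019
WORDS_PER_BOOK = 155_000       # average words in one Harry Potter book

books = WORDS_PROCESSED / WORDS_PER_BOOK

if __name__ == "__main__":
    print(f"{books:.1f} book equivalents")
```

The division also shows that the first statistic is a plain word count of about 32 million, not a count in millions of words.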
Conclusions
• New generation of Neural MT strongly improves quality and applicability of
machine translation, especially for morphology rich languages
• Domain specific data is crucial for making MT suitable for cultural and other
domains
• Depending on the application, translation needs can be served by selecting
the most efficient approach – pure MT, human review of the MT, or fully
human translation
• We will be happy to share our experience, technologies and tools :)
Thank you!
Jānis Ziediņš, janis.ziedins@kis.gov.lv
Andrejs Vasiļjevs, andrejs@tilde.com
Heli
Kautonen
Library Director
Finnish Literature Society SKS
Design for Diversity
24.10.2019
Europeana meeting on multilingualism, Hanaholmen, Finland
1831
Photo © Gary Wornell, SKS 2019
Image © SKS 2010
Suomalaisen
Kirjallisuuden
Seura
(Finnish Literature Society)
Photo © Gary Wornell, SKS 2019
Photo: Alexandre Caffiaux, Université de Lille, 2018. CC-BY 2.0
Diversity
Photo: Jackster121212 - Own work, CC BY-SA 4.0,
https://commons.wikimedia.org/w/index.php?curid=80077504C
Photo: Heli Kautonen 2017
Design
Universal Design
Critical Design
Inclusive Design
Value-Sensitive Design
Photo: Helsinki City Museum, CC-BY 4.0
Source: Finna.fi
Photo: newobj Source: Github.com
How might we…?
Photo: Heli Kautonen 2016
…measure
the value…
…now,
next year,
in the future?
Photo: Heli Kautonen 2019
Timeline: Initiation (of a new service), Development, Implementation, Operation and maintenance
Process-time: Who is involved in the development and implementation of your service? What kinds of benefits can be identified?
Use-time: Who uses your service? Are there other stakeholders? What kinds of benefits can be identified?
Future: Who could (re)use your service or materials in the (undefined) future? What kinds of benefits can be anticipated?
Model for temporal division of benefits
Kautonen, H. & Nieminen, M. (2018): Conceptualizing Benefits of User-Centered Design for Digital Library
Services. Liber Quarterly, 28(1), pp. 1–34. DOI: http://doi.org/10.18352/lq.10231.
Trust
Efficiency
Revenue
Better
quality
Learning &
competence
Self
esteem
Ease
of use
Cost
savings
COMMITMENT
Sustainability
”for + with
society”
Prof. Linda Doyle
Trinity College Dublin
Photo: Heli Kautonen 2019
Photo: Heli Kautonen 2019
2031
Questions and comments
heli.kautonen@finlit.fi
Twitter: @helimuori
https://fi.linkedin.com/in/heli-kautonen-38136512
Dasha
Moskalenko
Manager Service Design
Europeana Foundation
Europeana case study
UX Design and user testing
Ο Ζητιάνος Φοιτητής, Άγνωστος δημιουργός, 1945,Ίδρυμα Μουσείου Νίκου Καζαντζάκη, Greece, CC BY-NC-ND
Καντσονίσιμα-Σατιρίσιμα-Ψυθιρίσιμα, Άγνωστος δημιουργός, 1971, Ίδρυμα Μουσείου Νίκου Καζαντζάκη, Greece, CC BY-NC-ND
Language in Portuguese
Language detection and display (for validation)
Query translated in 24 languages
Results displayed based on relevance in all languages
Results displayed in original language
Search term highlighted
Sort by language available
Language tag showing item’s original language
Languages in which item metadata is available
Item’s original language & option for automatic translation
Hands showing the French sign language alphabet, Wellcome Collection, CC BY
europeana.eu
@EuropeanaEU
THANK YOU!
Questions & comments are welcome.
dasha.moskalenko@europeana.eu
Matias
Frosterus
Information Systems Manager
with Mikko Lappalainen, Osma Suominen,
Satu Niininen
National Library of Finland
Multilingual linked vocabularies
and automatic subject indexing
services - National Library's
Finto and Annif
THE NATIONAL LIBRARY OF FINLAND
Libraries and access
The goal
▪ Bringing the library know-how into use for all of the public
sector
▪ But better!
▪ Better vocabularies
▪ Publication, use, and integration of those better vocabularies
▪ Automated tools to make it even easier
What is needed?
▪ Modern linked data vocabularies
▪ A way to publish them for everyone to use
▪ A way to integrate them into your systems
▪ A way to make using them less labour-intensive
Vocabularies
▪ Starting point: General Finnish Thesaurus YSA
▪ Developed in the 1980’s mainly for book indexing
▪ Over 30,000 terms
▪ Monolingual but has a Swedish counterpart Allärs
Thesaurus to ontology
▪ Reconstruction of YSA into machine-readable and multilingual YSO
▪ Trilingual terms for concepts (fin, swe, eng)
▪ YSA and Allärs merged together and translated into English
▪ Concepts are a compromise between Finnish and Swedish as YSA
and Allärs are not completely identical
▪ Links to Library of Congress Subject Headings (LCSH)
▪ Linking to Wikidata underway
▪ YSO just made the list of Europeana dereferenceable vocabularies
that can be enriched in the Europeana portal
Annotate in one language, find using another
Challenges of multilinguality
▪ Founded on the concepts of the Finnish cultural sphere
▪ Some concepts may not be common outside of that
▪ sandwich cakes, uncles (maternal)
▪ väheneminen = minskning (antal) = decrease (passive)
vähentäminen = minskning (aktiv reducering av antal) = decrease (active)
▪ Liikuntalukiot = idrottsgymnasier = general upper secondary schools
focusing on sport and exercise
▪ Some may result in somewhat awkward terms
▪ rivers = joet = floder, åar och älvar
▪ The original Swedish thesaurus Allärs had three terms that could be
used interchangeably
▪ Can also affect hierarchy
▪ pesät
⤷ muurahaispesät (literally ant nests)
bon
⤷ myrstackar
nests
⤷ ant hills
▪ For more information, see http://urn.fi/URN:NBN:fi-fe201705106375
Satu Niininen, Susanna Nykyri, Osma Suominen, (2017) "The future of
metadata: open, linked, and multilingual – the YSO case", Journal of
Documentation, Vol. 73 Issue: 3, pp.451-465, doi: 10.1108/JD-06-2016-0084.
YSO
YSO
Upper
hierarchy
General
concepts
Specific
concepts
Adapted into use outside the library domain
▪ Extended with domain ontologies
▪ Using the core provided by YSO
▪ Helps interoperability!
▪ Developed by the domain experts in various organizations
▪ Over a dozen domain ontologies such as:
▪ AFO - Agriculture - 7 000 concepts
▪ JUHO - Government - 6 300
▪ KAUNO - Literature - 5 000
▪ KULO - Cultural research - 1 500
▪ LIITO - Economics - 3 000
▪ SOTO - Military - 2 000
▪ TERO - Health - 6 500
▪ And others
Domain ontologies all extending YSO in
KOKO
▪ An ”ontology cloud” which combines the domain ontologies
and the general ontology into a cohesive whole
Vocabulary service
National vocabulary and ontology service
Finto
▪ A bit of history
▪ FinnONTO-research project (2003-2012)
▪ Built research prototypes of services and started the ontologization
process of the various thesauri
▪ The National Library began the Finto project in 2013 funded by
the Ministry of Education and Culture and the Ministry of Finance
▪ A national vocabulary and ontology service for the whole public
sector
Finto offers
Free to use
Open licenses
http://finto.fi
Adopted widely in Finland
▪ Finto is used in many organizations in Finland to annotate
their various resources, among them
▪ The national broadcasting company Yle
▪ Suomi.fi citizen’s portal to public services
▪ Various public sector content systems
▪ Websites of various ministries
▪ Various museums, archives, and libraries
Skosmos
▪ The heart beating inside Finto
▪ Open source SKOS vocabulary browser
▪ http://skosmos.org
▪ Publication and use of light-weight ontologies, thesauri and classifications
▪ Web interface
▪ REST API
▪ SPARQL endpoint
▪ Community
▪ https://groups.google.com/forum/#!forum/skosmos-users
How does it work?
▪ Make your thesaurus into SKOS
▪ Put it in a SPARQL triple store
▪ Point Skosmos at your SPARQL endpoint
▪ And serve your thesaurus for
humans, Linked Data agents,
and REST API access
Key features
▪ Multilingual browser interface (10 languages)
▪ Autocomplete search
▪ Alphabetical index
▪ Concept hierarchy display
▪ Concept groups (thematic index)
▪ New concepts
▪ REST API for enabling use of vocabularies in other
applications
▪ responses usually JSON-LD
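As a rough illustration of using the REST API from another application, the sketch below only builds a search URL; it makes no network call. The base URL, the `search` path and the `query`, `vocab` and `lang` parameter names are assumptions based on a typical Skosmos installation and should be checked against the API documentation of the service you actually use.

```python
from urllib.parse import urlencode

# Assumed base URL of a Skosmos REST API; replace with your installation's.
ASSUMED_BASE_URL = "https://api.finto.fi/rest/v1/"


def build_search_url(base_url: str, query: str, vocab: str, lang: str) -> str:
    """Build a concept search URL (parameter names are assumptions)."""
    params = urlencode({"query": query, "vocab": vocab, "lang": lang})
    return f"{base_url.rstrip('/')}/search?{params}"


if __name__ == "__main__":
    print(build_search_url(ASSUMED_BASE_URL, "kunniamerkit", "yso", "fi"))
```

The response would then be fetched with any HTTP client and parsed as JSON, since the slide notes that responses are usually JSON-LD.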
www.loterre.fr/skosmos
http://chemskos.com
Skosmos installations around the world
http://vocabularies.unesco.org/ http://aims.fao.org/standards/agro
voc/functionalities/search
Automated subject indexing
Many possible solutions
Some problems
YSO
KOKO
AFO
JUHO
€ £ $
Automated Subject Indexing made easy:
Annif
▪ An open source multilingual automated subject indexing
system using machine learning and our own vocabularies
Where to get the learning material?
Metadata about 13M documents,
many of them tagged with subjects!
Hot tub by a lake, Andrei Niemimäki, CC BY-SA
Finna API
▪ All Finna metadata is
▪ YSO and KOKO widely used
▪ Try it out for yourself at http://annif.org/
Automated Subject Indexing made easy:
Annif
Prototype in 2017
Automated Subject Indexing made easy:
Annif
Automating our own processes vs. creating generic tools for many contexts
Annif development
▪ Packaging Annif into an easy-to-deploy solution via Docker
▪ Tuning the various algorithms and their hyperparameters
powering Annif
▪ Making integration easier through a Finto API
Summary
Interlinked multilingual vocabularies
for various domains
A national service for
publishing and using
said vocabularies
An automated system
for making it easy
to produce annotations
with said vocabularies
All the while
utilizing library
know-how
Richer metadata
Cross-domain findability and interoperability
More efficient workflows
New connections, new possibilities
Thank you!
matias.frosterus@helsinki.fi
finto-posti@helsinki.fi
@Fintopalvelu
All pictures used under CC0
license unless otherwise noted
Hugo
Manguinhas
Product Manager API
Europeana Foundation
Case Study -
Translation of object metadata
using the Knowledge Graph
Multilingual experience
Collections
Object metadata
Text objects
Search
Browse
Display
Translatable data
Usage scenarios
Editorial content
User interface
Object Metadata
What is the title of the object?
Who created or contributed it?
What topics is the object about?
What kind of object it is?
When was it created or published?
Where was it created or is located?
...
KNOWLEDGE GRAPH
Bulong Miao, Wellcome Collection, United Kingdom, CC BY
About the Knowledge Graph
● Vast network of data sources made available in
the wider Linked Open Data cloud
● Can be linked to and used to bring more
contextual information to the items
● Vast and readily available source of controlled
translations
Part of the Linking Open Data (LOD) Project Cloud Diagram, CC-BY-SA.
EDM and the Knowledge Graph
We encourage data providers to
● Contribute links to their own
vocabularies and publish them as
Linked Open Data
● Use available reference vocabularies
to describe their content
Clavecin, Bartolomeo Cristofori
Cite de la Musique,
MIMO - Musical Instruments Museums
Online|CC BY-NC-SA
● Available as Linked Open Data and
therefore part of the Knowledge Graph
● The rights statements have been
translated into: Estonian, Finnish,
French, German, Polish and Spanish,
but 7 more translation efforts are
ongoing
Research has shown that the
official translation of rights
information leads to better
investment/effort into adoption
of rs.org and thus more accurate
copyright info
General Finnish Ontology (YSO)
<skos:Concept rdf:about="http://www.yso.fi/onto/yso/p4349">
  <skos:prefLabel xml:lang="sv">hederstecken</skos:prefLabel>
  <skos:prefLabel xml:lang="fi">kunniamerkit</skos:prefLabel>
  <skos:prefLabel xml:lang="en">medals of honour</skos:prefLabel>
  <skos:altLabel xml:lang="sv">ordnar</skos:altLabel>
  <skos:altLabel xml:lang="sv">ordnar (hederstecken)</skos:altLabel>
  <skos:broader rdf:resource="http://www.yso.fi/onto/yso/p1581"/>
  <skos:related rdf:resource="http://www.yso.fi/onto/yso/p4347"/>
  <skos:related rdf:resource="http://www.yso.fi/onto/yso/p4348"/>
  <skos:related rdf:resource="http://www.yso.fi/onto/yso/p11634"/>
  <skos:exactMatch rdf:resource="http://www.yso.fi/onto/koko/p30868"/>
  <skos:exactMatch rdf:resource="http://www.yso.fi/onto/ysa/Y96541"/>
  <skos:exactMatch rdf:resource="http://www.yso.fi/onto/allars/Y23916"/>
</skos:Concept>
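To illustrate how a concept like this can serve as a source of controlled translations, here is a minimal Python sketch that reads the preferred labels per language from a SKOS record. It uses only the standard library. The `rdf:RDF` wrapper with namespace declarations and the function name `pref_labels` are assumptions added so the excerpt parses on its own; this is not Europeana's actual tooling.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Shortened copy of the YSO concept, wrapped in rdf:RDF so it is well-formed.
RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="http://www.yso.fi/onto/yso/p4349">
    <skos:prefLabel xml:lang="sv">hederstecken</skos:prefLabel>
    <skos:prefLabel xml:lang="fi">kunniamerkit</skos:prefLabel>
    <skos:prefLabel xml:lang="en">medals of honour</skos:prefLabel>
    <skos:altLabel xml:lang="sv">ordnar</skos:altLabel>
  </skos:Concept>
</rdf:RDF>"""


def pref_labels(rdf_xml):
    """Map each concept URI to a {language: preferred label} dictionary."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        uri = concept.get(f"{{{RDF}}}about")
        labels = {}
        for label in concept.findall(f"{{{SKOS}}}prefLabel"):
            lang = label.get(XML_LANG)
            if lang:  # labels without a language tag cannot be used as translations
                labels[lang] = label.text
        result[uri] = labels
    return result


labels = pref_labels(RDF_XML)
print(labels["http://www.yso.fi/onto/yso/p4349"])
```

Labels without an `xml:lang` attribute are skipped, which mirrors the point that vocabularies need to be properly language tagged to be useful for translation.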
Vocabularies used by Data Providers
language coverage: 0.36
(topics and subjects)
Not all vocabularies are
properly language tagged!
Europeana’s Knowledge Graph
Entity
Collection
Entity Collection: benefits
● Allows Europeana to establish links to the
Knowledge Graph by means of semantic
enrichment of the object metadata
● Harmonizes vocabularies from the multiplicity of
data providers into a single point of reference
● Exploits coreference links between vocabularies
to increase multilingual coverage
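As a rough illustration of the last benefit, the Python sketch below groups concepts that are connected by coreference links (for example `skos:exactMatch`) and merges their labels, so that each group covers more languages than any single vocabulary. The `example.org` URIs, the extra labels and the function name are made-up sample data, and the "first label per language wins" rule is an assumption, not the Entity Collection's documented behaviour.

```python
from collections import defaultdict

# Hypothetical sample data: labels per concept URI, keyed by language.
LABELS = {
    "http://www.yso.fi/onto/yso/p4349": {
        "fi": "kunniamerkit", "sv": "hederstecken", "en": "medals of honour"},
    "http://example.org/vocab/medal": {"de": "Ehrenzeichen", "en": "decorations"},
    "http://example.org/vocab/other": {"fr": "clavecin"},
}
# Hypothetical coreference links between the vocabularies above.
COREFS = [("http://www.yso.fi/onto/yso/p4349", "http://example.org/vocab/medal")]


def merge_by_coreference(labels, corefs):
    """Cluster URIs with union-find, then merge the labels of each cluster."""
    parent = {uri: uri for uri in labels}

    def find(uri):
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in corefs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        parent[find(b)] = find(a)

    clusters = defaultdict(dict)
    for uri, by_lang in labels.items():
        merged = clusters[find(uri)]
        for lang, text in by_lang.items():
            merged.setdefault(lang, text)  # keep the first label seen per language
    return dict(clusters)


merged = merge_by_coreference(LABELS, COREFS)
for root, by_lang in merged.items():
    print(root, sorted(by_lang))
```

In this sample the linked pair ends up with four languages instead of three, while the unlinked concept keeps its single label.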
Entity Collection: multilingual coverage
language coverage: 13.1
(topics and subjects)
For persons drops to 4.8
Entity Collection: multilingual improvements
Steps to improve the Knowledge Graph
● Promote alignment efforts
between vocabularies used by
data providers and complementary
vocabularies such as Wikidata
● Promote translation
efforts/campaigns to increase
multilingual coverage of the
Knowledge Graph, prioritising
discovery-enabling metadata
fields
A FOCUSED VIEW ON
THE GENERAL STRATEGY
Idrottstävlingar på Eyravallen. "Benke". 27 september 1955.,Örebro Kuriren, Örebro läns museum, Sweden, Public domain
Multilingual search, browse and display
Usage scenarios
● Enter search query in chosen language
● See search results and interact with filters in
chosen language
● Display object metadata on item page
● Navigate to entities
Proposals for indexing and storing translations
● Automated identification of language where needed (only 26.5% of the data
providers’ metadata is language-qualified)
● Use translations from multilingual knowledge graph
● Augment the provider metadata with static translation of the fields to English
(to fill metadata values not covered by the knowledge graph)
● Store and index translated metadata for search and display (original metadata
+ languages of the knowledge graph + English)
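The four proposals can be read as one indexing step per record. The Python sketch below is one possible reading under stated assumptions: the record layout, the function name `build_index_document`, and the two callables `detect_language` and `translate_to_english` are hypothetical placeholders for whatever language detection and machine translation services would actually be used.

```python
def build_index_document(fields, kg_labels, detect_language, translate_to_english):
    """Return {field: {language: [values]}} for one metadata record.

    fields: {field: [{"text": str, "lang": str (optional), "entity": URI (optional)}]}
    kg_labels: {entity URI: {language: label}} taken from the knowledge graph
    """
    doc = {}
    for name, values in fields.items():
        by_lang = doc.setdefault(name, {})
        for value in values:
            # 1. Use the provider's language tag, or detect it if missing.
            lang = value.get("lang") or detect_language(value["text"])
            by_lang.setdefault(lang, []).append(value["text"])
            # 2. Add translations from the knowledge graph when the value is linked.
            entity_labels = kg_labels.get(value.get("entity"), {})
            for kg_lang, label in entity_labels.items():
                if label not in by_lang.setdefault(kg_lang, []):
                    by_lang[kg_lang].append(label)
            # 3. Fill the English gap with a static translation.
            if "en" not in entity_labels and lang != "en":
                by_lang.setdefault("en", []).append(
                    translate_to_english(value["text"], lang))
    return doc


YSO = "http://www.yso.fi/onto/yso/p4349"
record = {
    "subject": [{"text": "kunniamerkit", "entity": YSO}],
    "title": [{"text": "Mitali", "lang": "fi"}],
}
kg = {YSO: {"fi": "kunniamerkit", "sv": "hederstecken", "en": "medals of honour"}}

index_doc = build_index_document(
    record,
    kg,
    detect_language=lambda text: "fi",                       # stub detector
    translate_to_english=lambda text, lang: f"[en] {text}",  # stub translator
)
print(index_doc)
```

The stored document then holds the original value, the knowledge graph languages, and English, which matches the last proposal in the list.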
Proposals for search on object metadata
Original query
→ Identify language (ranked candidates, e.g. #1: French, #2: Spanish, #3: Polish)
→ Translate to English → Translated query (English)
→ Suggest Entity (Knowledge Graph) → User disambiguates → Entity-based query
→ Multilingual query: entity-based query OR original query + translated query
→ Search → Multilingual index
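The final step of this flow, combining the query variants, could look like the Python sketch below. It assumes one reading of the diagram: if the user picks a suggested entity, the entity-based query is used; otherwise the original query is combined with its English translation. The query syntax and the `entity` field name are hypothetical and not tied to any particular search engine.

```python
def build_multilingual_query(original, translated_en=None, entity_uri=None):
    """Build a query string from the original query and its optional variants."""
    if entity_uri:
        # The user disambiguated the query to a knowledge graph entity.
        return f'entity:"{entity_uri}"'
    terms = [original]
    if translated_en and translated_en != original:
        terms.append(translated_en)
    return " OR ".join(f'"{term}"' for term in terms)


text_query = build_multilingual_query("kunniamerkit", "medals of honour")
entity_query = build_multilingual_query(
    "kunniamerkit", "medals of honour",
    entity_uri="http://www.yso.fi/onto/yso/p4349")
print(text_query)
print(entity_query)
```

An entity-based query can match every language the entity has labels for, which is why it is preferred here whenever the user has disambiguated.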
Proposals for display of object metadata
Request metadata → Multilingual Database
● Obtain metadata in original language or English
● Obtain metadata (Knowledge Graph)
● Translate from English → Obtain metadata in other language
MULTILINGUAL EXPERIENCE
OUTCOMES
● Users can search and filter in one of 24 official languages
● Item page metadata would display in chosen language if knowledge
graph translations were present
● Where chosen language is not supported, display will default to
source language and offer option to view in English
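The display rule in the last two outcomes can be sketched as a small fallback function. This Python example is only an illustration: the function name, the returned dictionary and its `offer_english` flag are assumptions, and it presumes the source-language value is always stored.

```python
def select_display(values_by_lang, chosen, source):
    """Pick the value to show for one metadata field.

    values_by_lang: {language: value} as stored for the field
    chosen: the language selected by the user
    source: the language of the provider's original metadata
    """
    if chosen in values_by_lang:
        return {"lang": chosen, "value": values_by_lang[chosen],
                "offer_english": False}
    # Chosen language not available: default to the source language and
    # offer English when an English value exists.
    return {
        "lang": source,
        "value": values_by_lang[source],
        "offer_english": "en" in values_by_lang and source != "en",
    }


field = {"fi": "kunniamerkit", "sv": "hederstecken", "en": "medals of honour"}
in_swedish = select_display(field, chosen="sv", source="fi")
in_polish = select_display(field, chosen="pl", source="fi")
print(in_swedish)
print(in_polish)
```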
Challenges & Open Questions
● How successful is automated language detection?
● Would prioritising static translation of discovery-enabling metadata fields to
English be “good enough”?
● How well can we statically translate the remaining metadata fields to English,
especially when they contain single words or short phrases?
● Would dynamic translation of metadata (for languages other than English) be
good enough?
The Chinese Market, 1767 - 1769, Rijksmuseum, Netherlands, Public domain
europeana.eu
@EuropeanaEU

Europeana meeting under Finland’s Presidency of the Council of the EU - Day 1, 24 October 2019

  • 1. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0 EUROPEANA MEETING UNDER FINLAND’S PRESIDENCY OF THE COUNCIL OF THE EU ESPOO, FINLAND 24 October 2019
  • 2. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0 Andy Neale Technical Director Europeana Foundation
  • 3. Europa [Material cartográfico] : Nach den vorzüglichsten Hülfsnitteln, Götze, Johann August Ferdinand, 1773-1819 Biblioteca Digital de Madrid Spain, Public domain Contribution to EU GDP by culture and creative sectors Trade Surplus in cultural goods € 8.7B 4.2% New Agenda for Culture
  • 4. Automotive + Manufacturing + Chemical Industries Cultural + Creative Sector 7.8M 4.4M> Employment
  • 6. The role of Europeana Europeana Party People @ Christmas party, CC BY
  • 7. We support cultural heritage institutions in their digital transformation
  • 8. Europa [Material cartográfico] : Nach den vorzüglichsten Hülfsnitteln, Götze, Johann August Ferdinand, 1773-1819 Biblioteca Digital de Madrid Spain, Public domain 3.700 CHIs across Europe
  • 9. EUROPEANA COLLECTIONS 58m Cultural heritage records Europa [Material cartográfico] : Nach den vorzüglichsten Hülfsnitteln, Götze, Johann August Ferdinand, 1773-1819 Biblioteca Digital de Madrid Spain, Public domain 2.5bln Information items
  • 10. 1. Common Tech & data architecture Europeana Data Model + Metis
  • 11. 2. Common policies & standards Europeana ● Licensing Framework ● Publishing Framework
  • 12. Statements for works that are not in copyright Statements for works where the copyright status is unclear Statements for works that are in copyright
  • 13.
  • 14. 3. Websites & APIs Europeana Collections
  • 15. Programme Europeana Party People @ Christmas party, CC BY
  • 16. Objectives 1. Stimulate reflection on multilingualism in digital cultural heritage at large using Europeana as a case study; 2. Develop a deeper understanding of the multilingualism problem/opportunity space for digital cultural heritage; 3. Consider what options can be pursued to provoke action at the local level, furthering the multilingual capabilities; 4. Provide input and feedback for the Europeana multilingual strategy.
  • 17. Sessions 1. Setting the scene 2. User interactions 3. Multilingual metadata 4. Content translation 5. Conclusions and steps for progress
  • 18.
  • 19. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0 Juliane Stiller Information Specialist You, We & Digital ‘Multilingual Developments in Digital Cultural Heritage Domain: Problem Space & Solutions’
  • 20. ● 10 years as a researcher at Humboldt-Universität zu Berlin in Europeana-related projects ● Multilinguality, interaction patterns, metadata and its quality, research on search and browse, retrieval, evaluation ● Since 2019: consultancy and training in digital literacy @stillinsky
  • 21. Agenda • Multilinguality: the problem space • Bridging the language gap • Translations • Enrichments • What is left to do? 21
  • 22. 22 Multilinguality The Problem Space Christoffel van Sichem: Bouw van de toren van Babel
  • 23. Content Information Access Interactions User Interface Metadata and digital CH objects Search, Browse & Explore Show user's preferred language Bridge the gap between language of user input and content Layers of digital CH system
  • 24. User Interface Challenges: • Translation of static and dynamic pages • Switching languages via text or icons such as flags • Default language • Determine the user's preferred language through IP address or browser settings
  • 26. Mismatch between query and content language • Mona Lisa 203 results • Monna Lisa 13 results • La Gioconda 376 results  • La Joconde 78 results 26 Interactions Roma, Galleria Corsini - La Gioconda,
  • 27. Interactions: Browse ● Search vs. browse ● (Metadata) text vs. object 27 Interactions
  • 28. Interactions: Explore cater for different information needs in different languages: • Entities • Colors, format • Access & copyrights • Inspiration 28 Interactions
  • 29. Content & Metadata Image credit: both from Europeana, with the title „Kinderbuch”, from Spielzeugmuseum der Stadt Nürnberg (CC BY-NC-SA) Content
  • 30. Metadata multilinguality 30+ 40 other languages.... Content
  • 31. Bridging the language gap Translations & Enrichments 31 Bridge by Mark Robinson (CC-BY 2.0)
  • 32. To bridge the gap between language of user input and content, one can translate 1. Queries 2. Content / Metadata 3. ..... 32
  • 33. 1) Translating queries 33 Query English Spanish French .... comes with challenges .... Database Information Access
  • 34. Cultural heritage queries 34 κερκυρα poblet bævre humble østerskovvej espana salamanca academia coleccam documentos estatutos εσκι σεχιρ first war world berlin berliner mauer or wall alphonse mucha
  • 35. Query heterogeneity & long tail 35 Europeana queries in a month in 2016 442 times: Wolfgang Amadeus Mozart once: full history of ging tsholing in bhutan
  • 36. Queries in cultural heritage are ● Short ● Heterogeneous ● Focus on entities: 61.96% of the queries contain NE (Stiller, Gäde & Petras, 2010) ● Highly ambiguous in language: ○ “culture”, “administration”, “paris”, “madonna” ● Semantically ambiguous: ○ “barber” (composer or hairdresser) 36
  • 37. Multilingual academic search ● Informational queries from the psychology domain in 4 languages: pubpsych.eu ● Building domain-specific lexical resources and mapping them to queries; entries look like this: ○ wohlbefinden|||en:well-being|||es:bienestar|||fr:bien-etre ○ wohlfuhlen|||en:well-being|||es:bienestar|||fr:bien-etre ○ Well-being|||es:bienestar|||de:wohlbefinden|||fr:bien-etre ● Translation does not depend on language identification ● Deals well with NE -> no match in lexicon, no translation More info on the project: https://www.clubs-project.eu/en/
  • 38. Query 2) Translate the content 38 Spanish French German English Content English French German Spanish Content Database
  • 40. Challenges • Missing training data for small languages • Missing training data for (sub)domains • Amount of language pairs is immense with 50+ languages • Metadata is too scarce for good translation results 40
  • 43. Number of enriched objects, their type and vocabularies GeoNames 7 Millions GEMET, DBpedia 9.2 Millions Semium Time 10.2 Millions DBpedia 144,000 Time Concept Locations Agents Enriched entities in Europeana
  • 45. What is left to do? 45
  • 46. Adapt to queries Entity graphs for exploration • Object • Person • Concept • Period • Location • Event 46
  • 47. Evaluate solution based on goal ○ E.g. for ML retrieval we might not need the perfect fluent translation ○ Identify the impact of different workflows / processes on multilinguality of system ○ Translations do not only have an impact on data but also on retrieval and therefore on user satisfaction 47
  • 49. References • Petras, V., Hill, T., Stiller, J., & Gäde, M. (2017). Europeana – a Search Engine for Digitised Cultural Heritage Material. Datenbank-Spektrum, 1–6. https://doi.org/10.1007/s13222-016-0238-1 • Hill, T. D., Charles, V., Isaac, A., & Stiller, J. (2016). “Searching for Inspiration”: User Needs and Search Architecture in Europeana Collections. ASIS&T 2016 Annual Meeting. • Manguinhas, H. (2016). Europeana Semantic Enrichment Framework. Documentation, Europeana. https://docs.google.com/document/d/1JvjrWMTpMIH7WnuieNqcT0zpJAXUPo6x4uMBj1pEx0Y • Stiller, J. (Ed.) (2016). Best practices for multilingual access. Tech. rep. http://pro.europeana.eu/files/Europeana_Professional/Publications/BestPracticesForMultilingualAccess_whitepaper.pdf • Stiller, J., Gäde, M., & Petras, V. (2013). Multilingual access to digital libraries: The Europeana use case. Information-Wissenschaft Und Praxis, 64, 86–95. • Olensky, M., Stiller, J., & Dröge, E. (2012). Poisonous India or the Importance of a Semantic and Multilingual Enrichment Strategy. In 6th Research Conference, MTSR 2012, Cádiz, Spain, November 28-30, 2012 (pp. 252–263). Berlin: Springer. • Stiller, J., Gäde, M., & Petras, V. (2010). Ambiguity of Queries and the Challenges for Query Language Detection.
  • 50. Rickard Domeij, Language Planner, Language Council of Sweden, Institute of Language and Folklore: Multilingualism, technology and language policy
  • 51. Content ● The LC and the multilingual language policy of Sweden (and EU) ● Multilingually accessible services ● Language technology (LT) and language resources ● National Language Bank ● First experiences in digital humanities and cultural heritage ● Challenges for LT in cultural heritage ● Next steps
  • 52. Multilingual language policy ● The LC monitors and promotes the languages of Sweden and their use ● Language policy (2005) and Language Act (2009) ● Status of, and rights to use, Swedish and other languages in Sweden ● National minority languages: Sami, Meänkieli, Finnish, Romani, Yiddish ● Swedish Sign Language, Nordic languages, EU languages, immigrant languages ● Public agencies have to reach out to the whole population ● Also good for business
  • 53. Multilingually accessible services ● Vision: a multilingual society in which all citizens are included with respect to different backgrounds and languages --> digital inclusion ● Access to info and services according to language rights and needs ● Switch between languages and modes according to preferences ● Example: have a web text read aloud in your language ● Essential for people with disabilities but also useful for others = design for all (e.g. subtitling)
  • 54. LT to make it possible ● Conversions between languages and modes ● Different modes: writing, speech, gestures … ● Multilingualism = multilinguality + multimodality ● LT modules: text-to-speech (TTS), speech-to-text (STT), machine translation (MT) … ● Applications: recitation, dictation, translation … ● Voice translation: STT > MT > TTS
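The voice translation chain on this slide (STT > MT > TTS) is a composition of three modules. A minimal sketch of that composition follows; the three stand-in functions are hypothetical placeholders for illustration, not real speech or translation engines, and any real system would plug in its own components.

```python
from typing import Callable

# Type aliases for the three LT modules named on the slide.
SpeechToText = Callable[[bytes], str]
MachineTranslation = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


def make_voice_translator(
    stt: SpeechToText, mt: MachineTranslation, tts: TextToSpeech
) -> Callable[[bytes], bytes]:
    """Chain the modules in the order STT > MT > TTS."""

    def translate(audio: bytes) -> bytes:
        text = stt(audio)        # speech-to-text
        translated = mt(text)    # machine translation
        return tts(translated)   # text-to-speech

    return translate


# Hypothetical stand-ins so the sketch runs without any external engine.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_mt(text: str) -> str:
    lookup = {"hej": "hello"}  # toy Swedish-to-English dictionary
    return lookup.get(text.lower(), text)


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


voice_translate = make_voice_translator(fake_stt, fake_mt, fake_tts)
```

Because each stage only depends on the text passed to it, a module can be swapped (for example a domain-adapted STT) without touching the others.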
  • 56. LT to make it possible II ● Problems with quality and trust, especially on unrestricted data ● User and domain adaptation, user interaction ● Example: respeaking system for subtitling on TV ● Accessibility often means a loss of quality, but brings other gains ● Accessible and usable
  • 57. Language resources needed ● Data and tools: corpora, markup tools, lexicons, language models … ● Rule-based methods, especially for less resourced languages ● Market forces are not enough ● Stimulate the development of LT and multilingually accessible services by national means (e.g. respeaking system for Swedish TV) ● National Language Bank (NLB) to make resources available for R&D ● “An NLB promotes the development of technology, which benefits the languages in Sweden and improves access to information for everyone.” (Digital agenda for Sweden, 2011)
  • 58. National research infrastructure (2017-00626) funded by the Swedish Research Council with 1.5 million per year until 2025. Two main types of data: ● Multilingual texts and terms from public agencies (PAs) ● Multimodal cultural heritage collections
  • 59. First experiences in cultural heritage ● Available voice recognition and MT don’t work! ● Instead try other methods: ○ “sound browsing” to explore speech recordings acoustically ○ respeaking for transcribing speech ○ transcription of handwritten dialect text in Transkribus ○ time-alignment of existing transcripts to sound in ELAN ○ linking from text to speech data in the archives (see next page) ● Usage-centred, participative design in multidisciplinary teams ● Tilltal project (SAF16-0917:1)
  • 61. First experiences in cultural heritage ● State-of-the-art voice recognition and MT don’t work! ● Instead try other methods: ○ “sound browsing” to explore speech recordings acoustically ○ respeaking for transcribing speech ○ transcription of handwritten dialect text in Transkribus ○ time-alignment of existing transcripts to sound in ELAN ○ linking from text to speech data in the archives ● Usage-centred, participative design in multidisciplinary teams ● Tilltal project (SAF16-0917:1)
  • 62. Challenges for LT in cultural heritage ● Interface or content (= multilingual in a broad sense) ● Far beyond modern standard language use ● Great variation makes domain adaptation hard ● Variation in place (dialects and languages), time (old Swedish) and situation (informal-formal) ● Modal variation in collections: (handwritten) text, speech, pictures ● Hard to handle as researchers want to explore a collection as a whole
  • 63. Next steps ● Linked data to describe the collection conceptually and relationally ● Multilingual search methods for handling language variation in place, time and situation ● Domain-adapted speech-to-text conversion to transcribe recordings ● Crowdsourcing for correcting ● Shared resources for the languages, dialects, domains etc. ● Long-term funding for the National Language Bank ● Collaborative projects involving language technologists, researchers and data holders
  • 64. Andrejs Vasiļjevs, Executive Chairman, Tilde, and Jānis Ziediņš, Project Manager, Culture Information Systems Centre, Riga: Learnings from the automatic translation projects and how to apply them for the culture and heritage sector
  • 65. Culture Information Systems Centre. Our mission is to assist cultural heritage institutions (archives, libraries, museums) to maintain and make available cultural heritage for future generations through the latest information technology solutions.
  • 68. Benefit for eGovernment [diagram labels: State; Gov.lv platform; Platform for the provision and management of e-Services; Single Public Administration Data Area; Municipality IS; Other IS; State information systems; MT platform; OpenData]
  • 77. Digitization of the Cultural Heritage Content. The National Library of Latvia is implementing a European Regional Development Fund (ERDF) and nationally co-funded project in the field of Latvia's digital cultural heritage, together with project partners – the National Archives of Latvia, the State Inspection for Heritage Protection of Latvia, and the Cultural Information System Centre. The project will further develop the Digital Object Management and Conservation System, develop the Copyright Management and Content Licensing System, publish several Open Datasets, including Related Open Datasets, and develop the Stage of an Integrated Centralized Open System Information Platform.
  • 80. Translation test (VRVM 176655, http://www.nmkk.lv/Items/ItemViewForm.aspx?id=167748) ● Manual translation: “A photomontage postcard with five views of Riga. The central city panorama with the new Pontoon Bridge opened in 1896 and the Mazā Guild building in the right corner. Below these images, the city theatre, Vērmanes Garden and the bridge across the canal by Bastejkalns.” ● Hugo.lv translation: “A postcard is assembled from five views of Riga - downtown panorama with the new Pontonbridge discovered in 1896, the Little Guild House in the right corner, under these images - City Theatre, Verman Gardens, a bridge over the canal near BastejHill.”
  • 82. Enablers of AI: ML algorithms, computing power, big data
  • 84. Based on Tilde Neural MT technologies that won 1st place at WMT 2017, 2018 and 2019, a global competition between the world’s top language technology providers (Best WMT 2017, Best WMT 2018, Best WMT 2019)
  • 85. Data for MT System Development • Generic MT systems were trained on 52 million parallel sentences • Cultural domain MT systems were customized with an additional 826 000 parallel sentences and 5 million monolingual sentences • Books: ▪ Fiction ▪ Scientific literature ▪ Technical literature (manuals, instructions) • News and web content: ▪ News from popular media (also multilingual media) ▪ Company press releases ▪ Multilingual web site content • Public sector data: ▪ Laws, regulations, directives, etc. ▪ Documents of internal and external use ▪ Press releases ▪ Public sector web site data • Proprietary translation memories: ▪ Professional and amateur translator produced data ▪ Translation memories of translation and localisation service providing companies ▪ Translation memories of international organisations
  • 86. Comparison to Google – Automatic Evaluation
  • 87. Comparison to Google – Human Evaluation
  • 88. Usability, productivity and integration: translation add-on for browsers; translation API; plug-ins for CAT tools; translation widget
  • 89. Hugo.lv – AI powered language technology portal
  • 91. 1.1 million terms; 22 subject fields; 164 216 terms in the culture domain
  • 92. EU Council Presidency Translator 2017-2020
  • 93. EU Council Presidency Translator [image]
  • 94. EU Presidency Translator: AI-powered Neural MT ● CEF eTranslation: MT systems for the 24 official EU languages enabling translation of full documents, preserving text formatting ● AI-powered custom Neural MT: providing superior-quality translation adapted for the Presidency requirements
  • 95. Web Site – Text Translation
  • 98. Benefits for Estonia, Bulgaria, Austria • Enables Presidency staff to quickly translate documents • Empowers visiting journalists and delegates to access info in the local language, e.g., press releases, local news sites • Supports staff translators in their work by boosting translation productivity up to 35% • Lowers costs of translation for documents by utilizing post-edited machine translation • Allows public sector organizations to translate content and websites into multiple languages
  • 99. Statistics. From September 2017 to October 2019 the EU Council Presidency Translator processed: 32 159 082 words; 2.83 million sentences; 1.09 million translation requests; ~207 books (there are 155 thousand words on average in one Harry Potter book)
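The book-equivalent figure on this slide can be checked with a short calculation. The sketch below assumes the total is 32 159 082 words (about 32 million, which is the only reading consistent with ~207 books) and uses the slide's own average of 155 thousand words per book.

```python
# Figures taken from the slide; the word total is read as plain words,
# since only that reading is consistent with the ~207 books stated there.
TOTAL_WORDS = 32_159_082
WORDS_PER_BOOK = 155_000
TOTAL_SENTENCES = 2_830_000


def book_equivalents(total_words: int, words_per_book: int) -> float:
    """Number of average-length books that the word total corresponds to."""
    return total_words / words_per_book


books = book_equivalents(TOTAL_WORDS, WORDS_PER_BOOK)
words_per_sentence = TOTAL_WORDS / TOTAL_SENTENCES

print(f"about {round(books)} books")
print(f"about {words_per_sentence:.1f} words per sentence")
```

The average sentence length that falls out of the same figures (roughly 11 words) is a plausible value, which supports reading the total as about 32 million words.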
  • 101. Conclusions • The new generation of Neural MT strongly improves the quality and applicability of machine translation, especially for morphologically rich languages • Domain-specific data is crucial for making MT suitable for cultural and other domains • Depending on the application, translation needs can be served by selecting the most efficient approach – pure MT, human review of the MT, or fully human translation • We will be happy to share our experience, technologies and tools :)
  • 102. Thank you! Jānis Ziediņš, janis.ziedins@kis.gov.lv Andrejs Vasiļjevs, andrejs@tilde.com
  • 103. Heli Kautonen, Library Director, Finnish Literature Society SKS: Design for Diversity
  • 104. Design for Diversity Heli Kautonen Library Director, Finnish Literature Society (SKS) 24.10.2019 Europeana meeting on multilingualism, Hanaholmen, Finland
  • 105. 1831 Photo © Gary Wornell, SKS 2019
  • 106. Image © SKS 2010 Suomalaisen Kirjallisuuden Seura (Finnish Literature Society)
  • 107. Photo © Gary Wornell, SKS 2019
  • 108. Photo: Alexandre Caffiaux, Université de Lille, 2018. CC-BY 2.0 Diversity
  • 109. Diversity Photo: Jackster121212 - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=80077504C
  • 110. Photo: Heli Kautonen 2017 Design
  • 111. Universal Design; Critical Design; Inclusive Design; Value-Sensitive Design (Photo: Helsinki City Museum, CC-BY 4.0, Source: Finna.fi; Photo: newobj, Source: Github.com)
  • 112. How might we…? Photo: Heli Kautonen 2016
  • 113. …measure the value… …now, next year, in the future? Photo: Heli Kautonen 2019
  • 114. Model for temporal division of benefits. Timeline: initiation (of a new service), development, implementation, operation and maintenance ● Process-time: Who is involved in the development and implementation of your service? What kinds of benefits can be identified? ● Use-time: Who uses your service? Are there other stakeholders? What kinds of benefits can be identified? ● Future: Who could (re)use your service or materials in the (undefined) future? What kinds of benefits can be anticipated? Source: Kautonen, H. & Nieminen, M. (2018): Conceptualizing Benefits of User-Centered Design for Digital Library Services. Liber Quarterly, 28(1), pp. 1–34. DOI: http://doi.org/10.18352/lq.10231.
  • 116. ”for + with society” Prof. Linda Doyle Trinity College Dublin Photo: Heli Kautonen 2019
  • 117. Photo: Heli Kautonen 2019 2031
  • 118. Questions and comments heli.kautonen@finlit.fi Twitter: @helimuori https://fi.linkedin.com/in/heli-kautonen-38136512
  • 119. Dasha Moskalenko, Manager Service Design, Europeana Foundation: Europeana case study, UX Design and user testing
  • 120. Ο Ζητιάνος Φοιτητής, Άγνωστος δημιουργός, 1945, Ίδρυμα Μουσείου Νίκου Καζαντζάκη, Greece, CC BY-NC-ND
  • 123. Καντσονίσιμα-Σατιρίσιμα-Ψυθιρίσιμα, Άγνωστος δημιουργός, 1971, Ίδρυμα Μουσείου Νίκου Καζαντζάκη, Greece, CC BY-NC-ND
  • 132. Language detection and display (for validation); query translated into 24 languages
  • 133. Results displayed based on relevance in all languages; results displayed in original language; search term highlighted
  • 134. Sort by language available; language tag showing item’s original language; languages in which item metadata is available
  • 135. Item’s original language and option for automatic translation
  • 136. Hands showing the French sign language alphabet, Wellcome Collection, CC BY europeana.eu @EuropeanaEU THANK YOU! Questions & comments are welcome. dasha.moskalenko@europeana.eu
  • 137. Matias Frosterus, Information Systems Manager, with Mikko Lappalainen, Osma Suominen, Satu Niininen, National Library of Finland: Multilingual linked vocabularies and automatic subject indexing services - National Library's Finto and Annif
  • 138. THE NATIONAL LIBRARY OF FINLAND Libraries and access
  • 139. Libraries and access ?
  • 141. Libraries and access !
  • 144. The goal ▪ Bringing the library know-how into use for all of the public sector ▪ But better! ▪ Better vocabularies ▪ Publication, use, and integration of those better vocabularies ▪ Automated tools to make it even easier
  • 145. What is needed? ▪ Modern linked data vocabularies ▪ A way to publish them for everyone to use ▪ A way to integrate them into your systems ▪ A way to make using them less labour-intensive
  • 146. Vocabularies
  • 147. Vocabularies ▪ Starting point: General Finnish Thesaurus YSA ▪ Developed in the 1980s mainly for book indexing ▪ Over 30,000 terms ▪ Monolingual but has a Swedish counterpart, Allärs
  • 149. Thesaurus to ontology ▪ Reconstruction of YSA into machine-readable and multilingual YSO ▪ Trilingual terms for concepts (fin, swe, eng) ▪ YSA and Allärs merged together and translated into English ▪ Concepts are a compromise between Finnish and Swedish as YSA and Allärs are not completely identical ▪ Links to Library of Congress Subject Headings (LCSH) ▪ Linking to Wikidata underway ▪ YSO just made the list of Europeana dereferenceable vocabularies that can be enriched in the Europeana portal
  • 150. Annotate in one language, find using another
  • 151. Challenges of multilinguality ▪ Founded on the concepts of the Finnish cultural sphere ▪ Some concepts may not be common outside of that ▪ sandwich cakes, uncles (maternal) ▪ väheneminen = minskning (antal) = decrease (passive); vähentäminen = minskning (aktiv reducering av antal) = decrease (active) ▪ Liikuntalukiot = idrottsgymnasier = general upper secondary schools focusing on sport and exercise
  • 152. Challenges of multilinguality ▪ Some may result in somewhat awkward terms ▪ rivers = joet = floder, åar och älvar ▪ The original Swedish thesaurus Allärs had three terms that could be used interchangeably
  • 153. Challenges of multilinguality ▪ Can also affect hierarchy ▪ pesät ⤷ muurahaispesät (literally “ant nests”); bon ⤷ myrstackar; nests ⤷ ant hills ▪ For more information, see http://urn.fi/URN:NBN:fi-fe201705106375 Satu Niininen, Susanna Nykyri, Osma Suominen (2017). "The future of metadata: open, linked, and multilingual – the YSO case", Journal of Documentation, Vol. 73, Issue 3, pp. 451-465, doi: 10.1108/JD-06-2016-0084.
  • 154. YSO: upper hierarchy; general concepts; specific concepts
  • 157. Adapted for use outside the library domain ▪ Extended with domain ontologies ▪ Using the core provided by YSO ▪ Helps interoperability! ▪ Developed by the domain experts in various organizations ▪ Over a dozen domain ontologies such as: ▪ AFO - Agriculture - 7 000 concepts ▪ JUHO - Government - 6 300 ▪ KAUNO - Literature - 5 000 ▪ KULO - Cultural research - 1 500 ▪ LIITO - Economics - 3 000 ▪ SOTO - Military - 2 000 ▪ TERO - Health - 6 500 ▪ And others
  • 158. Domain ontologies all extending YSO in
  • 159. KOKO ▪ An “ontology cloud” which combines the domain ontologies and the general ontology into a cohesive whole
  • 161. Vocabulary service
  • 162. National vocabulary and ontology service Finto ▪ A bit of history ▪ FinnONTO research project (2003-2012) ▪ Built research prototypes of services and started the ontologization process of the various thesauri ▪ The National Library began the Finto project in 2013, funded by the Ministry of Education and Culture and the Ministry of Finance ▪ A national vocabulary and ontology service for the whole public sector
  • 164. Finto offers: free to use; open licenses
  • 167. Adopted widely in Finland ▪ Finto is used in many organizations in Finland to annotate their various resources, among them ▪ The national broadcasting company Yle ▪ Suomi.fi citizen’s portal to public services ▪ Various public sector content systems ▪ Websites of various ministries ▪ Various museums, archives, and libraries
  • 168. Skosmos ▪ The heart beating inside Finto ▪ Open source SKOS vocabulary browser ▪ http://skosmos.org ▪ Publication and use of light-weight ontologies, thesauri and classifications ▪ Web interface ▪ REST API ▪ SPARQL endpoint ▪ Community ▪ https://groups.google.com/forum/#!forum/skosmos-users
  • 169. How does it work? ▪ Make your thesaurus into SKOS
  • 170. How does it work? ▪ Put it in a SPARQL triple store
  • 171. How does it work? ▪ Point Skosmos at your SPARQL endpoint
  • 172. How does it work? ▪ And serve your thesaurus for humans, Linked Data agents, and REST API access
  • 173. Key features ▪ Multilingual browser interface (10 languages) ▪ Autocomplete search ▪ Alphabetical index ▪ Concept hierarchy display ▪ Concept groups (thematic index) ▪ New concepts ▪ REST API for enabling use of vocabularies in other applications ▪ responses usually JSON-LD
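To illustrate how another application might use the REST API mentioned above, here is a minimal sketch that only builds a concept search URL. The `/rest/v1/search` path and the `query`, `lang` and `vocab` parameters follow the Skosmos REST API as I recall it, and the `https://api.finto.fi` base URL is likewise an assumption; both should be checked against the documentation of the installation you are targeting.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_url(
    base_url: str, query: str, lang: str, vocab: Optional[str] = None
) -> str:
    """Build a concept search URL for a Skosmos-style REST API.

    The path and parameter names are assumptions based on the Skosmos
    REST API; verify them against the target installation.
    """
    params = {"query": query, "lang": lang}
    if vocab is not None:
        params["vocab"] = vocab  # restrict the search to one vocabulary
    return f"{base_url.rstrip('/')}/rest/v1/search?{urlencode(params)}"


# Assumed base URL of the Finto API, used here only for illustration.
url = build_search_url("https://api.finto.fi", "medals", "en", vocab="yso")
print(url)
```

The sketch stops at URL construction so that it runs without network access; fetching the URL and reading the JSON-LD response is left to the HTTP client of your choice.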
  • 174. Skosmos installations around the world: www.loterre.fr/skosmos; http://chemskos.com; http://vocabularies.unesco.org/; http://aims.fao.org/standards/agrovoc/functionalities/search
  • 175. Automated subject indexing
  • 176. Many possible solutions
  • 177. Some problems: YSO, KOKO, AFO, JUHO; € £ $
  • 178. Automated Subject Indexing made easy: Annif ▪ An open source multilingual automated subject indexing system using machine learning and our own vocabularies
  • 179. Where to get the learning material?
  • 180. Metadata about 13M documents, many of them tagged with subjects! (Photo: Hot tub by a lake, Andrei Niemimäki, CC BY-SA)
  • 183. Finna API ▪ All Finna metadata is ▪ YSO and KOKO widely used
  • 184. Automated Subject Indexing made easy: Annif ▪ Prototype in 2017 ▪ Try it out for yourself at http://annif.org/
  • 185. Automated Subject Indexing made easy: Annif ▪ Automating our own processes vs. creating generic tools for many contexts
  • 186. Annif development ▪ Packaging Annif into an easy-to-deploy solution via Docker ▪ Tuning the various algorithms and their hyperparameters powering Annif ▪ Making integration easier through a Finto API
  • 189. Summary ▪ Interlinked multilingual vocabularies for various domains ▪ A national service for publishing and using said vocabularies ▪ An automated system for making it easy to produce annotations with said vocabularies ▪ All the while utilizing library know-how: richer metadata; cross-domain findability and interoperability; more efficient workflows; new connections, new possibilities
  • 190. Thank you! matias.frosterus@helsinki.fi, finto-posti@helsinki.fi, @Fintopalvelu. All pictures used under CC0 license unless otherwise noted
  • 191. Hugo Manguinhas, Product Manager API, Europeana Foundation: Case Study - Translation of object metadata using the Knowledge Graph
  • 192. Multilingual experience ● Translatable data: collections (object metadata, text objects), editorial content, user interface ● Usage scenarios: search, browse, display
  • 193. Object Metadata: What is the title of the object? Who created or contributed it? What topics is the object about? What kind of object is it? When was it created or published? Where was it created or is it located? ...
  • 194. KNOWLEDGE GRAPH Bulong Miao, Wellcome Collection, United Kingdom, CC BY
  • 195. About the Knowledge Graph ● Vast network of data sources made available in the wider Linked Open Data cloud ● Can be linked to and used to bring more contextual information to the items ● Vast and readily available source of controlled translations (Image: part of the Linking Open Data (LOD) Project Cloud Diagram, CC-BY-SA)
  • 196. EDM and the Knowledge Graph We encourage data providers to ● Contribute links to their own vocabularies and publish them as Linked Open Data ● Use available reference vocabularies to describe their content Clavecin, Bartolomeo Cristofori Cite de la Musique, MIMO - Musical Instruments Museums Online|CC BY-NC-SA
  • 197. ● Available as Linked Open Data and therefore part of the Knowledge Graph ● The rights statements have been translated into Estonian, Finnish, French, German, Polish and Spanish, and 7 more translation efforts are ongoing ● Research has shown that the official translation of rights information leads to better investment/effort into adoption of rightsstatements.org (rs.org) and thus more accurate copyright info
  • 198. General Finnish Ontology (YSO)
      <skos:Concept rdf:about="http://www.yso.fi/onto/yso/p4349">
        <skos:prefLabel xml:lang="sv">hederstecken</skos:prefLabel>
        <skos:prefLabel xml:lang="fi">kunniamerkit</skos:prefLabel>
        <skos:prefLabel xml:lang="en">medals of honour</skos:prefLabel>
        <skos:altLabel xml:lang="sv">ordnar</skos:altLabel>
        <skos:altLabel xml:lang="sv">ordnar (hederstecken)</skos:altLabel>
        <skos:broader rdf:resource="http://www.yso.fi/onto/yso/p1581"/>
        <skos:related rdf:resource="http://www.yso.fi/onto/yso/p4347"/>
        <skos:related rdf:resource="http://www.yso.fi/onto/yso/p4348"/>
        <skos:related rdf:resource="http://www.yso.fi/onto/yso/p11634"/>
        <skos:exactMatch rdf:resource="http://www.yso.fi/onto/koko/p30868"/>
        <skos:exactMatch rdf:resource="http://www.yso.fi/onto/ysa/Y96541"/>
        <skos:exactMatch rdf:resource="http://www.yso.fi/onto/allars/Y23916"/>
      </skos:Concept>
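The language-tagged labels of a SKOS concept like the YSO example on this slide are what make it usable as a source of controlled translations. A minimal sketch of reading them with the Python standard library follows; the slide's fragment omits the namespace declarations, so the enclosing `rdf:RDF` element with the standard RDF and SKOS namespaces is added here as an assumption, and the concept is shortened to its labels.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Shortened copy of the slide's concept, wrapped in an rdf:RDF root
# (the wrapper and namespace declarations are not on the slide).
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://www.yso.fi/onto/yso/p4349">
    <skos:prefLabel xml:lang="sv">hederstecken</skos:prefLabel>
    <skos:prefLabel xml:lang="fi">kunniamerkit</skos:prefLabel>
    <skos:prefLabel xml:lang="en">medals of honour</skos:prefLabel>
    <skos:altLabel xml:lang="sv">ordnar</skos:altLabel>
  </skos:Concept>
</rdf:RDF>"""


def pref_labels(rdf_xml: str) -> dict:
    """Map each concept URI to its preferred labels keyed by language."""
    root = ET.fromstring(rdf_xml)
    about = f"{{{NS['rdf']}}}about"
    result = {}
    for concept in root.findall("skos:Concept", NS):
        result[concept.get(about)] = {
            label.get(XML_LANG): label.text
            for label in concept.findall("skos:prefLabel", NS)
        }
    return result


labels = pref_labels(RDF_XML)
print(labels["http://www.yso.fi/onto/yso/p4349"])
```

A label without an `xml:lang` attribute would end up under the key `None`, which is one way the "not properly language tagged" problem would surface in practice.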
  • 199. Vocabularies used by Data Providers ● Language coverage: 0.36 (topics and subjects) ● Not all vocabularies are properly language tagged!
  • 201. Entity Collection: benefits ● Allows Europeana to establish links to the Knowledge Graph by means of semantic enrichment of the object metadata ● Harmonizes vocabularies from the multiplicity of data providers into a single point of reference ● Exploits coreference links between vocabularies to increase multilingual coverage
  • 202. Entity Collection: multilingual coverage ● Language coverage: 13.1 (topics and subjects) ● For persons it drops to 4.8
  • 204. Steps to improve the Knowledge Graph ● Promote alignment efforts between vocabularies used by data providers and complementary vocabularies such as Wikidata ● Promote translation efforts/campaigns to increase multilingual coverage of the Knowledge Graph, prioritising discovery-enabling metadata fields
  • 205. A FOCUSED VIEW ON THE GENERAL STRATEGY Idrottstävlingar på Eyravallen. "Benke". 27 september 1955.,Örebro Kuriren, Örebro läns museum, Sweden, Public domain
  • 206. Multilingual search, browse and display Usage scenarios ● Enter search query in chosen language ● See search results and interact with filters in chosen language ● Display object metadata on item page ● Navigate to entities
  • 207. Proposals for indexing and storing translations ● Automated identification of language if needed (only 26.5% of the data provider’s metadata is language qualified) ● Use translations from multilingual knowledge graph ● Augment the provider metadata with static translation of the fields to English (to fill metadata values not covered by the knowledge graph) ● Store and index translated metadata for search and display (original metadata + languages of the knowledge graph + English)
  • 208. Proposals for search on object metadata [flow diagram] ● User enters the original query ● Identify language (#1: French, #2: Spanish, #3: Polish) ● Translate to English → translated query (English) ● Suggest entity (Knowledge Graph), user disambiguates → entity-based query ● Multilingual query: entity-based query OR original query + translated query ● Search the multilingual index
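The query-building step of this proposal can be sketched as follows. This is one reading of the slide, not Europeana's implementation: if the user has picked an entity from the Knowledge Graph, the entity-based query is used; otherwise the original query is combined with its English translation. The `entity:` field name and the quoting style are hypothetical.

```python
from typing import Optional


def build_multilingual_query(
    original: str,
    translated_en: Optional[str] = None,
    entity_uri: Optional[str] = None,
) -> str:
    """Return an entity-based query, or original query + English translation."""
    if entity_uri:
        # User disambiguated to a Knowledge Graph entity.
        return f'entity:"{entity_uri}"'  # hypothetical field name
    terms = [original]
    if translated_en and translated_en != original:
        terms.append(translated_en)
    return " OR ".join(f'"{term}"' for term in terms)


text_query = build_multilingual_query("médailles", translated_en="medals")
entity_query = build_multilingual_query(
    "médailles",
    translated_en="medals",
    entity_uri="http://www.yso.fi/onto/yso/p4349",
)
print(text_query)
print(entity_query)
```

Language identification and the translation itself are deliberately left outside the function, since the slide treats them as separate steps with their own open quality questions.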
  • 209. Proposals for display of object metadata [flow diagram] ● Request metadata ● In original language or English: obtain metadata from the multilingual database ● In other language: obtain metadata (Knowledge Graph) and translate from English
  • 210. Multilingual experience outcomes ● Users can search and filter in one of the 24 official EU languages ● Item page metadata would display in the chosen language if knowledge graph translations were present ● Where the chosen language is not supported, display will default to the source language and offer the option to view in English
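The display rule in these outcomes amounts to a small fallback function. The sketch below assumes a metadata field is stored as a mapping from language code to value; that data shape and the function name are hypothetical and chosen only for illustration.

```python
def choose_display(values_by_lang: dict, chosen_lang: str, source_lang: str) -> dict:
    """Pick the value to display for one metadata field.

    Uses the chosen language when a value exists for it; otherwise falls
    back to the source language and flags that an English view can be offered.
    """
    if chosen_lang in values_by_lang:
        return {
            "lang": chosen_lang,
            "text": values_by_lang[chosen_lang],
            "offer_english": False,
        }
    return {
        "lang": source_lang,
        "text": values_by_lang[source_lang],
        # No point offering English when the source is already English.
        "offer_english": source_lang != "en",
    }


subject = {"fi": "kunniamerkit", "sv": "hederstecken"}
print(choose_display(subject, "sv", source_lang="fi"))
print(choose_display(subject, "pl", source_lang="fi"))
```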
  • 211. Challenges & Open Questions ● How successful is automated language detection? ● Would prioritising static translation of discovery-enabling metadata fields to English be “good enough”? ● How well can we statically translate the remaining metadata fields to English, especially when they contain single words or short phrases? ● Would dynamic translation of metadata (for languages other than English) be good enough?
  • 212. The Chinese Market, 1767 - 1769, Rijksmuseum, Netherlands, Public domain europeana.eu @EuropeanaEU