Linked data for knowledge curation in
humanities research
Enrico Daga
Research Fellow, Knowledge Media Institute, The Open University
14th January 2020, Lancaster University / History Dept.
enrico.daga@open.ac.uk - @enridaga
https://isds.kmi.open.ac.uk/
https://kmi.open.ac.uk/
Artificial intelligence R&D lab:
• Knowledge Representation and Reasoning
• Semantic Web
• Machine Learning
• Data Science
Application areas:
• Teaching & Learning
• Scholarly Data Analysis
• Social Media Analysis
• Smart Cities
• Humanities
What are Linked Data?
(a super-fast summary)
Invented the web in 1989
(yeah!)
Invented the semantic web
in 1994 (duh?)
“To a computer, then, the web is a flat, boring
world devoid of meaning”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
“This is a pity, as in fact documents on the web
describe real objects and imaginary concepts,
and give particular relationships between them”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
“Adding semantics to the web involves two things:
allowing documents which have information in
machine-readable forms, and allowing links to be
created with relationship values.”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
“The Semantic Web is not a separate Web but an
extension of the current one, in which information is
given well-defined meaning, better enabling
computers and people to work in cooperation.”
Tim Berners-Lee, James Hendler and Ora Lassila, “The Semantic Web”, Scientific American, May 2001
Linked Data example
credits: EUCLID Project http://euclid-project.eu/
URIs (Uniform Resource Identifiers) are used to identify things (also
called entities) in the real world
For instance: people, places, events, companies, products, movies, etc.
This did not come out of the blue
The world’s academic communities have been dealing with knowledge
representation for years
Artificial intelligence, natural language processing, model management,
and many other research fields contributed substantially
Some ancestors traced the way …
EXAMPLE
• Instances are associated with one or several classes:
Boddingtons rdf:type Ale .
Grafentrunk rdf:type Bock .
Hoegaarden rdf:type White .
Jever rdf:type Pilsner .
Ale rdfs:subClassOf TopFermentedBeer .
White rdfs:subClassOf TopFermentedBeer .
TopFermentedBeer rdfs:subClassOf Beer .
Bock rdfs:subClassOf BottomFermentedBeer .
rdfs:subClassOf rdf:type owl:TransitiveProperty .
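The statements above can be turned into a small inference exercise. The sketch below is illustrative only: it stores the triples as plain Python tuples (no RDF library), follows rdfs:subClassOf transitively, and assumes the usual RDFS reading that an instance of a class is also an instance of its superclasses. The function names are hypothetical.

```python
from collections import defaultdict

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# The triples from the slide, as (subject, predicate, object) tuples
triples = [
    ("Boddingtons", TYPE, "Ale"),
    ("Grafentrunk", TYPE, "Bock"),
    ("Hoegaarden", TYPE, "White"),
    ("Jever", TYPE, "Pilsner"),
    ("Ale", SUBCLASS, "TopFermentedBeer"),
    ("White", SUBCLASS, "TopFermentedBeer"),
    ("TopFermentedBeer", SUBCLASS, "Beer"),
    ("Bock", SUBCLASS, "BottomFermentedBeer"),
]


def superclasses(cls, parents):
    """Return every class reachable from cls by following rdfs:subClassOf."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_types(triples):
    """Map each instance to its asserted classes plus all their superclasses."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS:
            parents[s].add(o)
    inferred = defaultdict(set)
    for s, p, o in triples:
        if p == TYPE:
            inferred[s].add(o)
            inferred[s] |= superclasses(o, parents)
    return inferred


types = infer_types(triples)
print(sorted(types["Boddingtons"]))  # ['Ale', 'Beer', 'TopFermentedBeer']
```

Only what the triples state can be inferred: Jever stays a Pilsner and nothing more, because the example does not say where Pilsner sits in the hierarchy.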
Ontologies: complexity, from light-weight to heavy-weight
• Documented meaning: types, labels, descriptions, comments, class hierarchies, relations
• Basic logic rules: inferences, transitivity, domain, range
• Rules
• Description Logic reasoning: class unions, set semantics, intersections, disjointness, […]
Different types of ontologies
Domain independent: SKOS, OWL, Prov, Time, …
Foundational, general purpose:
• DOLCE, SUMO (“Upper Ontologies”)
• CIDOC-CRM: broad scope, targets “cultural heritage” in general
Pragmatic, community-oriented:
• Dublin Core Metadata Initiative
• Google’s schema.org
• https://linked.art/
• Humanities forums: LinkedPasts series, WHiSe Workshops
https://lov.linkeddata.es/dataset/lov
Linked Data in a nutshell
https://en.wikipedia.org/wiki/Linked_data
Linked Data is a way of publishing structured information that allows data
to be connected and enriched by means of links among their entities.
• LD uses the World Wide Web as a publishing platform
• LD is based on basic Web standards (URIs, HTTP, RDF)
• open to everyone
• LD enables the adoption of shared schemas (Ontologies)
• LD makes the data self-explanatory and self-documented
• LD enables your data to refer to other data
• … and other data to refer to yours!
Linked Open Data Cloud in 2007
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
Linked Open Data Cloud in 2010
2010 - The OU launches the data.open.ac.uk 
Linked Open Data portal, the first of its kind in the UK
The OU Open Knowledge Graph
http://data.open.ac.uk
Linked Open Data Cloud in 2014
Crawlable
http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/
https://lod-cloud.net/
Linked Open Data Cloud in 03/2019
https://recogito.pelagios.org/
How can this wealth of data support the
retrieval of documentary evidence?
The identification and cataloguing of documentary evidence from textual
corpora is an important part of empirical research based on
historiographical methodology.
The Listening Experience Database
• An open and freely searchable database
that brings together a mass of data
about people’s experiences of listening
to music of all kinds, in any historical
period and any culture.
• Sophisticated data model, natively in RDF
• Linked Open Data: http://data.open.ac.uk/context/led
• Since 2012, the LED project has collected
over 10,000 unique listening experiences
from a variety of textual sources
https://led.kmi.open.ac.uk/
Problem: humanists coin new concepts!
• Traditional AI research is focused on common sense notions
• keyword & topic based information retrieval (documents related to
“Science” or “Music”)
• events as declared statements (e.g. U.S. base attacked by Iranian missiles)
• Problem: humanities databases are built on novel concepts, e.g.
• Listening experience (LED Project)
• Reading experience (EU funded READ-IT project)
• Sitting Experience (DH/Arts History PhD at the OU)
Manual workflow
Problems: the activity (a) requires effort / time, (b) is not systematic, (c) is
prone to errors, and (d) the methodology is (often) not documented
How can we help scholars find a piece of evidence in a text?
How to detect concepts beyond keywords?
We coin the expression themed evidence, to refer to (direct or indirect) traces of a
fact or situation relevant to a theme of interest and study the problem of
identifying them in texts.
The task of identifying themed evidence is at the intersection between topical text
classification (finding texts relevant to a certain theme) and event retrieval (finding
events mentioned in texts).
Not all topical texts are themed evidence, and the nature of the event itself is often
assumed, implicit, and left to the reader.
Daga, Enrico, and Enrico Motta. "Capturing themed evidence, a hybrid approach." In Proceedings of the 10th
International Conference on Knowledge Capture, pp. 93-100. 2019.
Finding Listening Experiences (theme: music)
• RECMUS-619, positive: Introduced to the Anacreontic Society, consisting of
amateurs who perform admirably the best orchestral works. The usual supper
followed. After propitiating me with a trio from ’Cosi Fan Tutte’, they drew me to
the piano.
• MASONB-31, positive: In the evening we went to Rev. Baptist Noel’s chapel,
where one is always sure of edification from the sermon if not from the psalms.
• MASONB-88, negative: Flags and pendants were suspended from the windows,
[...] the colors of the German States were waving harmoniously together, and
the banners of the Fine Arts, with appropriate inscriptions, particularly those of
music, poetry and painting, were especially honored, and floated triumphant
amidst the standards of electorates, dukedoms, and kingdoms.
A Hybrid Approach
• Themed evidence is a subset of topical texts (e.g. about “music”) - distributional semantics
• Common knowledge graphs include large amounts of interlinked entities, including topical
entities (in the category “music”) - entity linking to structured knowledge
• Background knowledge can be used for learning features and tuning elements of the method -
corpus based analysis
• LE Database includes text excerpts that can be analysed as positive examples.
• Project Gutenberg: more than 58k books in the public domain (48,790 in English)
• DBpedia is a large knowledge graph published as Linked Data. Includes SPARQL endpoint and a
NER tool: DBpedia Spotlight
• We formalise the task as a binary classification problem; approach in three steps:
1. Statistical relatedness analysis -> From a key term (e.g. “Music”)
2. Themed-entity detection -> About a key subject (e.g. dbpedia:Music)
3. Hybridisation phase
Statistical relatedness
0 rontgen[N]
1 play[V]
2 Brahms[N]
3 symphony[N]
4 another[D]
5 musical[J]
6 take[V]
7 always[R]
8 happen[V]
9 specially[R]
10 count[V]
11 something[N]
12 sort[N]
Statistical relatedness // Example
RECMUS-619, positive: Introduced to the Anacreontic Society,
consisting of amateurs who perform admirably the best
orchestral works. The usual supper followed. After propitiating me
with a trio from 'Cosi Fan Tutte', they drew me to the piano.
• Anacreontic[n]: 4.13048797627
• amateur[n]: 4.60138704262
• admirably[r]: 3.65226351076
• orchestral[j]: 7.09262661606
• trio[n]: 5.60459207257
• piano[n]: 6.36957273307
Correct
Statistical relatedness // Example
MASONB-31, positive: In the evening we went to Rev. Baptist
Noel's chapel, where one is always sure of edification from the
sermon if not from the psalms.
psalm[n]: 4.05596201177
Wrong
Statistical relatedness // Example
MASONB-88, negative: Flags and pendants were suspended from the windows,
[...] the colours of the German States were waving harmoniously together, and
the banners of the Fine Arts, with appropriate inscriptions, particularly those of
music, poetry and painting, were especially honored, and floated triumphant
amidst the standards of electorates, dukedoms, and kingdoms.
harmoniously[r]:4.96754289705
music[n]:1.0
poetry[n]:5.93071678171
painting[n]:4.39244380382
triumphant[j]:3.80869437369
amidst[i]:3.6638322575
Wrong
2> Themed entity detection
• DBpedia Spotlight to identify %entities%
• SPARQL query to filter the ones related to
cat:Music
• Where %entities% are the resources identified by
the NER engine, and %d% is a parameter, set to 5
(values above 5 introduce too much noise).
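# Prefix bindings are assumed here (the slide omits them)
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX cat: <http://dbpedia.org/resource/Category:>
SELECT DISTINCT ?sub WHERE {
  VALUES ?sub { %entities% }   # resources found by the NER engine
  ?sub dc:subject ?subject .
  ?subject skos:broader{0:%d%} cat:Music   # bounded path, written as on the slide; not standard SPARQL 1.1
}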
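As a hedged illustration, the query template could be filled in along these lines. The function name is hypothetical, the prefix bindings are assumptions (the slide does not show them), and the bounded path is rewritten as a chain of optional skos:broader steps, which is one way to express "0 to d steps" in standard SPARQL 1.1. This is a sketch, not the code used in the project.

```python
def build_themed_entity_query(entity_uris, max_depth=5):
    """Return a SPARQL query keeping the entities categorised under Category:Music."""
    if not entity_uris:
        raise ValueError("at least one entity URI is required")
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    # %entities%: the resources returned by the NER engine, as IRIs
    values = " ".join(f"<{uri}>" for uri in entity_uris)
    # "p?/p?/.../p?" matches between 0 and max_depth skos:broader steps
    path = "/".join(["skos:broader?"] * max_depth)
    return "\n".join([
        "PREFIX dc: <http://purl.org/dc/terms/>",
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>",
        "PREFIX cat: <http://dbpedia.org/resource/Category:>",
        "SELECT DISTINCT ?sub WHERE {",
        f"  VALUES ?sub {{ {values} }}",
        "  ?sub dc:subject ?subject .",
        f"  ?subject {path} cat:Music",
        "}",
    ])


query = build_themed_entity_query(
    ["http://dbpedia.org/resource/Piano", "http://dbpedia.org/resource/Psalms"],
    max_depth=2,
)
print(query)
```

The resulting string can be submitted to a SPARQL endpoint with any HTTP client; that step is left out to keep the sketch free of network calls.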
3> Hybridisation
• Entity boost: promote terms mapped to entities
• PoS filter: demote terms other than verbs and nouns, to privilege
factual statements
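A minimal sketch of how these two adjustments could be combined with the statistical scores. Everything specific here is an assumption made for illustration: the `Term` structure, the boost and penalty factors, and the averaging step are hypothetical, and the slides do not give the actual formula or its parameters.

```python
from dataclasses import dataclass


@dataclass
class Term:
    lemma: str
    pos: str                        # part-of-speech tag, e.g. "n", "v", "j", "r"
    relatedness: float              # statistical relatedness to the key term
    is_themed_entity: bool = False  # True if linked to an entity under the theme


def hybrid_score(terms, entity_boost=2.0, pos_penalty=0.5):
    """Average the term scores after the entity boost and the PoS filter.

    entity_boost and pos_penalty are illustrative values, not tuned ones.
    """
    if not terms:
        return 0.0
    total = 0.0
    for term in terms:
        weight = term.relatedness
        if term.is_themed_entity:
            weight *= entity_boost   # promote terms mapped to themed entities
        if term.pos not in ("n", "v"):
            weight *= pos_penalty    # demote terms other than nouns and verbs
        total += weight
    return total / len(terms)


# Toy values chosen for readability, not taken from the experiments
toy_terms = [
    Term("piano", "n", 1.0, is_themed_entity=True),
    Term("admirably", "r", 1.0),
]
score = hybrid_score(toy_terms)
print(score)  # 1.25
```

A text would then be labelled as themed evidence when its score passes a threshold learned from the positive examples; that thresholding step is also an assumption of this sketch.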
Hybrid Approach // Example
RECMUS-619, positive: Introduced to the Anacreontic Society,
consisting of amateurs who perform admirably the best
orchestral works. The usual supper followed. After propitiating me
with a trio from 'Cosi Fan Tutte', they drew me to the piano.
http://dbpedia.org/resource/Anacreontic_Society
http://dbpedia.org/resource/Orchestra
http://dbpedia.org/resource/Trio_(music)
http://dbpedia.org/resource/Così_fan_tutte
http://dbpedia.org/resource/Piano
Correct
Hybrid Approach // Example
MASONB-31, positive: In the evening we went to Rev. Baptist
Noel's chapel, where one is always sure of edification from the
sermon if not from the psalms.
http://dbpedia.org/resource/Evening_Prayer_(Anglican)
http://dbpedia.org/resource/Psalms
Correct
Hybrid Approach // Example
MASONB-88, negative: Flags and pendants were suspended from the windows, [...]
the colours of the German States were waving harmoniously together, and the
banners of the Fine Arts, with appropriate inscriptions, particularly those of music,
poetry and painting, were especially honored, and floated triumphant amidst the
standards of electorates, dukedoms, and kingdoms.
http://dbpedia.org/resource/Music
Correct
http://led.kmi.open.ac.uk/findler
What about supporting curation?
How to support users in cataloguing the documentary evidence?
How to detect the entities and their relationships in the sources?
How to automatically populate the database with metadata?
Knowledge Extraction (KE)
• Bet: metadata curation could be supported with KE methods
• KE: automatic or semi-automatic derivation of formal symbolic knowledge from
unstructured or semi-structured sources
• Approaches in the literature vary in task / scope:
• (Named) Entity Recognition and Classification (Person, Work, Time, Place, …)
• Entity Linking (DBpedia, Gazetteers)
• Relation Extraction (listener of, in place)
• Event extraction (Performance)
• Machine reading
Example #1
"I then went to Amsterdam to conduct Oedipus at the
Concertgebouw, which was celebrating its fortieth
anniversary by a series of sumptuous musical
productions. The fine Concertgebouw orchestra, always
at the same high level, the magnificent male choruses
from the Royal Apollo Society, soloists of the first rank -
among them Mme Hélène Sadoven as Jocasta, Louis van
Tulder as Oedipus, and Paul Huf, an excellent reader -
and the way in which my work was received by the public,
have left a particularly precious memory that I recall with
much enjoyment."
listener: Igor Strawinsky
time: in the beginning of 1928
place: Amsterdam
opera: Oedipus Rex
/by: Igor Strawinsky
performer: Concertgebouw orch.
environment: Public
Igor Stravinsky
An Autobiography (1936), p. 139.
https://led.kmi.open.ac.uk/entity/lexp/1435674909834
Example #2
"Music is certainly a pleasure that may be
reckoned intellectual, and we shall never
again have it in the perfection it is this
year, because Mr. Handel will not
compose any more! Oratorios begin next
week, to my great joy, for they are the
highest entertainment to me."
listener: Mrs Delany
time: March, 1737
place: London
opera: Operas and Oratorios
/by: G. F. Handel
environment: Public
From: Mary Granville, and Augusta Hall (ed.),
Autobiography and Correspondence of Mary Granville, Mrs
Delany: with interesting Reminiscences of King George the
Third and Queen Charlotte, volume 1 (London, 1861), p.
594.
https://led.kmi.open.ac.uk/entity/lexp/1444424772006
Feedback: @enridaga | www.enridaga.net
Analysis: detect the Listener & Place of a LE
• Q1 - in the excerpt? The place is mentioned in the
excerpt in 25.9% of cases, the listener in only 13.4%.
• Q2 - near the excerpt? Only 10% of the time is the
place mentioned less than 5 paragraphs from the
excerpt; the agent (listener), in 4% of cases.
• Q3 - in the source? 83.2% of the time the place is
mentioned at least once in the source. In 11.4% of cases
the place has not been found.
• Q4 - in the metadata? 64.8% of the listeners are also
the authors of the text (5874 cases in LED).
[Chart axis: distance of entity (in number of paragraphs)]
Open problems
• Implicit information, based on inference requiring expertise (e.g. Mr
Handel is G. F. Handel, Oedipus is “Oedipus Rex”)
• The role of contextual knowledge is fundamental (1) in identifying
the agent from the metadata of the source; (2) in common-sense
inference (“in the beginning of 1928”)
• Entities can exist in distributed, heterogeneous resources
(encyclopaedic KBs, domain-specific taxonomies, gazetteers, …)
• Cultural studies typically coin novel concepts (ListeningExperience)
with original schemas. Portability of the methods is even more at risk!
Daga, E and Motta, E. "Challenging knowledge extraction to support the curation of documentary evidence in the humanities." (2019).
Summary
• Linked Data transforms the way information is shared on the Web
• but also enables opportunities to apply AI techniques to more
application domains
• supporting users in finding and curating documentary evidence is an
important and difficult task
• finding complex concepts in texts is (more) possible than before,
although most of these techniques have not been applied at scale yet
• traditional AI research is challenged by the richness and diversity of use
cases in the humanities, especially when it comes to knowledge extraction
WHiSe 3
Call for papers!
3rd Workshop on Humanities in
the Semantic Web (WHiSe)
Co-located with the 17th Extended
Semantic Web Conference (ESWC 2020)
Heraklion, Crete, Greece
31/05 or 31/06, 2020 (TBD)
Submission deadline:
28th February
http://whise.cc/
https://commons.wikimedia.org/wiki/File:Edward_Burne-Jones_-_Tile_Design_-_Theseus_and_the_Minotaur_in_the_Labyrinth_-_Google_Art_Project.jpg
PhD position open soon
Title: “Distributed Linked Data for Cultural Heritage”
The aim of this project is to research and develop distributed, Linked Data systems
that enable cultural content to be shared between museums and the public. This may
include innovative ways of publishing digital artworks and related resources by
memory institutions as well as enabling the public to share their own experiences of
visiting and engaging with cultural heritage. The PhD will benefit from being closely
connected with the EU funded SPICE project [1] which is developing methods and
tools to allow citizen groups to actively participate with museums internationally.
[1] http://kmi.open.ac.uk/projects/name/spice
enrico.daga@open.ac.uk - @enridaga
Thank you

More Related Content

What's hot

Jankowski, curriculum vitae, 29 february 2012
Jankowski, curriculum vitae, 29 february 2012Jankowski, curriculum vitae, 29 february 2012
Jankowski, curriculum vitae, 29 february 2012
Nick Jankowski
 
Whose Archives? Reflections on ethics and the cultural significance of web ar...
Whose Archives? Reflections on ethics and the cultural significance of web ar...Whose Archives? Reflections on ethics and the cultural significance of web ar...
Whose Archives? Reflections on ethics and the cultural significance of web ar...
WARCnet
 
Scientific publishing workshop, background materials, finland, jankowski, 8...
Scientific publishing workshop,  background materials, finland,  jankowski, 8...Scientific publishing workshop,  background materials, finland,  jankowski, 8...
Scientific publishing workshop, background materials, finland, jankowski, 8...
Nick Jankowski
 
Jankowski Presentation, Scholarly Publishing And The Web, Final Version, 24fe...
Jankowski Presentation, Scholarly Publishing And The Web, Final Version, 24fe...Jankowski Presentation, Scholarly Publishing And The Web, Final Version, 24fe...
Jankowski Presentation, Scholarly Publishing And The Web, Final Version, 24fe...
Nick Jankowski
 
Mdst3703 2013-09-24-hypertext
Mdst3703 2013-09-24-hypertextMdst3703 2013-09-24-hypertext
Mdst3703 2013-09-24-hypertext
Rafael Alvarado
 

What's hot (20)

Digital Humanities
Digital HumanitiesDigital Humanities
Digital Humanities
 
Jankowski, curriculum vitae, 29 february 2012
Jankowski, curriculum vitae, 29 february 2012Jankowski, curriculum vitae, 29 february 2012
Jankowski, curriculum vitae, 29 february 2012
 
Whose Archives? Reflections on ethics and the cultural significance of web ar...
Whose Archives? Reflections on ethics and the cultural significance of web ar...Whose Archives? Reflections on ethics and the cultural significance of web ar...
Whose Archives? Reflections on ethics and the cultural significance of web ar...
 
Scientific publishing workshop, background materials, finland, jankowski, 8...
Scientific publishing workshop,  background materials, finland,  jankowski, 8...Scientific publishing workshop,  background materials, finland,  jankowski, 8...
Scientific publishing workshop, background materials, finland, jankowski, 8...
 
scholarly Publishing, CORE, Tampere
scholarly Publishing, CORE, Tamperescholarly Publishing, CORE, Tampere
scholarly Publishing, CORE, Tampere
 
Slides accompanying introductory statements, NM&S podcast, 7 july2013
Slides accompanying introductory statements, NM&S podcast, 7 july2013Slides accompanying introductory statements, NM&S podcast, 7 july2013
Slides accompanying introductory statements, NM&S podcast, 7 july2013
 
Jankowski, introduction slides, i cs oii panel scholarly communication, 22sep...
Jankowski, introduction slides, i cs oii panel scholarly communication, 22sep...Jankowski, introduction slides, i cs oii panel scholarly communication, 22sep...
Jankowski, introduction slides, i cs oii panel scholarly communication, 22sep...
 
Granada0611 digital humanities
Granada0611 digital humanitiesGranada0611 digital humanities
Granada0611 digital humanities
 
Jankowski, syllabus, version5, design elements, 14 feb2012
Jankowski, syllabus, version5, design elements, 14 feb2012Jankowski, syllabus, version5, design elements, 14 feb2012
Jankowski, syllabus, version5, design elements, 14 feb2012
 
KCL CeRch presentation, enhanced scholarly publications, 27march2012
KCL CeRch presentation, enhanced scholarly publications, 27march2012KCL CeRch presentation, enhanced scholarly publications, 27march2012
KCL CeRch presentation, enhanced scholarly publications, 27march2012
 
NECTAR_VRE1
NECTAR_VRE1NECTAR_VRE1
NECTAR_VRE1
 
Making scholarly publications accessible online
Making scholarly publications accessible onlineMaking scholarly publications accessible online
Making scholarly publications accessible online
 
Jankowski Presentation, Scholarly Publishing And The Web, Final Version, 24fe...
Jankowski Presentation, Scholarly Publishing And The Web, Final Version, 24fe...Jankowski Presentation, Scholarly Publishing And The Web, Final Version, 24fe...
Jankowski Presentation, Scholarly Publishing And The Web, Final Version, 24fe...
 
Mdst3703 2013-09-24-hypertext
Mdst3703 2013-09-24-hypertextMdst3703 2013-09-24-hypertext
Mdst3703 2013-09-24-hypertext
 
Roger Malina on A Historical Perspective on the Art-Sci-Tech field
Roger Malina on A Historical Perspective on the Art-Sci-Tech fieldRoger Malina on A Historical Perspective on the Art-Sci-Tech field
Roger Malina on A Historical Perspective on the Art-Sci-Tech field
 
Dh usp 2013
Dh usp 2013Dh usp 2013
Dh usp 2013
 
Intro slidecast, jankowski, internet practice, part 2, draft4, 18 feb2012
Intro slidecast, jankowski, internet practice, part 2, draft4, 18 feb2012Intro slidecast, jankowski, internet practice, part 2, draft4, 18 feb2012
Intro slidecast, jankowski, internet practice, part 2, draft4, 18 feb2012
 
The artof of knowledge engineering, or: knowledge engineering of art
The artof of knowledge engineering, or: knowledge engineering of artThe artof of knowledge engineering, or: knowledge engineering of art
The artof of knowledge engineering, or: knowledge engineering of art
 
Principles and pragmatics of a Semantic Culture Web
 Principles and pragmatics of a Semantic Culture Web Principles and pragmatics of a Semantic Culture Web
Principles and pragmatics of a Semantic Culture Web
 
EFA 2.0 - How minorities, autonomists and independentists use social media an...
EFA 2.0 - How minorities, autonomists and independentists use social media an...EFA 2.0 - How minorities, autonomists and independentists use social media an...
EFA 2.0 - How minorities, autonomists and independentists use social media an...
 

Similar to Linked data for knowledge curation in humanities research

UVA MDST 3703 Thematic Research Collections 2012-09-18
UVA MDST 3703 Thematic Research Collections 2012-09-18UVA MDST 3703 Thematic Research Collections 2012-09-18
UVA MDST 3703 Thematic Research Collections 2012-09-18
Rafael Alvarado
 
Judaica europeana dovwinerjudaicalibrarians
Judaica europeana dovwinerjudaicalibrariansJudaica europeana dovwinerjudaicalibrarians
Judaica europeana dovwinerjudaicalibrarians
Dov Winer
 
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...
Alessandro Adamou
 
Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...
Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...
Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...
Northern California Technical Processes Group
 

Similar to Linked data for knowledge curation in humanities research (20)

Capturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachCapturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid Approach
 
UVA MDST 3703 Thematic Research Collections 2012-09-18
UVA MDST 3703 Thematic Research Collections 2012-09-18UVA MDST 3703 Thematic Research Collections 2012-09-18
UVA MDST 3703 Thematic Research Collections 2012-09-18
 
Capturing the semantics of documentary evidence for humanities research
Capturing the semantics of documentary evidence for humanities researchCapturing the semantics of documentary evidence for humanities research
Capturing the semantics of documentary evidence for humanities research
 
Judaica europeana dovwinerjudaicalibrarians
Judaica europeana dovwinerjudaicalibrariansJudaica europeana dovwinerjudaicalibrarians
Judaica europeana dovwinerjudaicalibrarians
 
Estado arte de las Humanidades Digitales. Algunos proyectos de investigación
Estado arte de las Humanidades Digitales. Algunos proyectos de investigaciónEstado arte de las Humanidades Digitales. Algunos proyectos de investigación
Estado arte de las Humanidades Digitales. Algunos proyectos de investigación
 
AHRC CDP Digital Humanities 101
AHRC CDP Digital Humanities 101  AHRC CDP Digital Humanities 101
AHRC CDP Digital Humanities 101
 
American Art Collaborative Linked Open Data presentation to "The Networked Cu...
American Art Collaborative Linked Open Data presentation to "The Networked Cu...American Art Collaborative Linked Open Data presentation to "The Networked Cu...
American Art Collaborative Linked Open Data presentation to "The Networked Cu...
 
Rebecca Grant - DH research data: identification and challenges (DH2016)
Rebecca Grant - DH research data: identification and challenges (DH2016)Rebecca Grant - DH research data: identification and challenges (DH2016)
Rebecca Grant - DH research data: identification and challenges (DH2016)
 
STEAM to STEM: Redesigning Science Itself by Roger Malina
STEAM to STEM: Redesigning Science Itself by Roger MalinaSTEAM to STEM: Redesigning Science Itself by Roger Malina
STEAM to STEM: Redesigning Science Itself by Roger Malina
 
Annotation and Scholarship
Annotation and ScholarshipAnnotation and Scholarship
Annotation and Scholarship
 
Describing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDescribing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classification
 
Tagging and Folksonomies
Tagging and FolksonomiesTagging and Folksonomies
Tagging and Folksonomies
 
Patterns in scholarly publications online: Erdős and beyond
Patterns in scholarly publications online: Erdős and beyondPatterns in scholarly publications online: Erdős and beyond
Patterns in scholarly publications online: Erdős and beyond
 
Linked Books - DH Venice Fall School 2014
Linked Books - DH Venice Fall School 2014Linked Books - DH Venice Fall School 2014
Linked Books - DH Venice Fall School 2014
 
Du Literary and linguistic computing aux Digital Humanities : retour sur 40 a...
Du Literary and linguistic computing aux Digital Humanities : retour sur 40 a...Du Literary and linguistic computing aux Digital Humanities : retour sur 40 a...
Du Literary and linguistic computing aux Digital Humanities : retour sur 40 a...
 
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...
Linkage in Haze: challenges and take-home messages of crowd-sourcing vaguenes...
 
20080606 VöGler GöTtingen E Humanities
20080606 VöGler GöTtingen E Humanities20080606 VöGler GöTtingen E Humanities
20080606 VöGler GöTtingen E Humanities
 
Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...
Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...
Goldminers of the Digital Age: How Libraries are Selecting, Presenting, and D...
 
Open Access and Knowledge Sharing
Open Access and Knowledge SharingOpen Access and Knowledge Sharing
Open Access and Knowledge Sharing
 
Forty Years of the OTA
Forty Years of the OTAForty Years of the OTA
Forty Years of the OTA
 

More from Enrico Daga

Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Enrico Daga
 

More from Enrico Daga (17)

Citizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data JourneyCitizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data Journey
 
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
 
Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.
 
Knowledge graph construction with a façade - The SPARQL Anything Project
Linked data for knowledge curation in humanities research

  • 1. Linked data for knowledge curation in humanities research Enrico Daga Research Fellow, Knowledge Media Institute, The Open University 14th January 2020, Lancaster University / History Dept. enrico.daga@open.ac.uk - @enridaga
  • 2. https://isds.kmi.open.ac.uk/ https://kmi.open.ac.uk/ Artificial intelligence R&D lab: • Knowledge Representation and Reasoning • Semantic Web • Machine Learning • Data Science Application areas: • Teaching & Learning • Scholarly Data Analysis • Social Media Analysis • Smart Cities • Humanities
  • 3. What are Linked Data? (a super-fast summary)
  • 4. Invented the web in 1989 (yeah!) Invented the semantic web in 1994 (duh?)
  • 5. “To a computer, then, the web is a flat, boring world devoid of meaning” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • 6. “This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • 7. “Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • 8. “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
  • 9. Linked Data example credits: EUCLID Project http://euclid-project.eu/
  • 11. This did not come out of the blue. The world’s academic communities have been dealing with knowledge representation for years. Artificial intelligence, natural language processing, model management, and many other research fields largely contributed. Some ancestors traced the way …
  • 12. EXAMPLE • Instances are associated with one or several classes: Boddingtons rdf:type Ale . Grafentrunk rdf:type Bock . Hoegaarden rdf:type White . Jever rdf:type Pilsner . Ale rdfs:subClassOf TopFermentedBeer . White rdfs:subClassOf TopFermentedBeer . TopFermentedBeer rdfs:subClassOf Beer . Bock rdfs:subClassOf BottomFermentedBeer . rdfs:subClassOf rdf:type owl:TransitiveProperty .
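The point of declaring rdfs:subClassOf transitive is that class membership propagates up the hierarchy. A minimal sketch of that inference follows, assuming plain strings in place of full URIs and at most one superclass per class (both simplifications made for illustration; the function names are hypothetical and no RDF library is used):

```python
# Triples from the slide, with plain strings standing in for URIs.
TYPE_OF = {
    "Boddingtons": "Ale",
    "Grafentrunk": "Bock",
    "Hoegaarden": "White",
    "Jever": "Pilsner",
}
SUBCLASS_OF = {
    "Ale": "TopFermentedBeer",
    "White": "TopFermentedBeer",
    "TopFermentedBeer": "Beer",
    "Bock": "BottomFermentedBeer",
}


def superclasses(cls):
    """Follow rdfs:subClassOf links transitively, nearest class first."""
    found = []
    while cls in SUBCLASS_OF and SUBCLASS_OF[cls] not in found:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def inferred_types(instance):
    """Return the asserted class plus every class entailed by the chain."""
    asserted = TYPE_OF[instance]
    return [asserted] + superclasses(asserted)


if __name__ == "__main__":
    for beer in TYPE_OF:
        print(beer, "->", inferred_types(beer))
```

With only the triples listed on this slide, Boddingtons is inferred to be a Beer, while Grafentrunk stops at BottomFermentedBeer because no triple here links that class to Beer.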
  • 15. Ontologies, different types of Domain independent: SKOS, OWL, Prov, Time, … Foundational, general purpose: • DOLCE, SUMO (“Upper Ontologies”) • CIDOC-CRM: broad scope, targets “cultural heritage” in general Pragmatic, community-oriented: • Dublin Core Metadata Initiative • Google’s schema.org • https://linked.art/ • Humanities forums: LinkedPasts series, WHiSe Workshops https://lov.linkeddata.es/dataset/lov
  • 16. Linked Data in a nutshell https://en.wikipedia.org/wiki/Linked_data Linked Data is a way of publishing structured information that allows data to be connected and enriched by means of links among their entities. • LD uses the World Wide Web as a publishing platform • LD is based on basic Web standards (URIs, HTTP, RDF) • open to everyone • LD enables the adoption of shared schemas (Ontologies) • LD makes the data self-explanatory and self-documented • LD enables your data to refer to other data • … and other data to refer to yours!
  • 17. Linked Open Data Cloud in 2007 “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
  • 18. Linked Open Data Cloud in 2010 2010 - The OU launches the data.open.ac.uk Linked Open Data portal, the first of its kind in the UK
  • 19. The OU Open Knowledge Graph http://data.open.ac.uk
  • 20. Linked Open Data Cloud in 2014 Crawlable http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/
  • 24. How can this wealth of data support the retrieval of documentary evidence? The identification and cataloguing of documentary evidence from textual corpora is an important part of empirical research based on historiographical methodology.
  • 25. The Listening Experience Database • An open and freely searchable database that brings together a mass of data about people’s experiences of listening to music of all kinds, in any historical period and any culture. • Sophisticated data model, natively in RDF • Linked Open Data: http://data.open.ac.uk/context/led • Since 2012, the LED project has collected over 10,000 unique listening experiences from a variety of textual sources https://led.kmi.open.ac.uk/
  • 26. Problem: humanists coin new concepts! • Traditional AI research is focused on common sense notions • keyword & topic based information retrieval (documents related to “Science” or “Music”) • events as declared statements (e.g. U.S. base attacked by Iranian missiles) • Problem: humanities databases are built on novel concepts, e.g. • Listening experience (LED Project) • Reading experience (EU funded READ-IT project) • Sitting Experience (DH/Arts History PhD at the OU)
  • 27. Manual workflow Problems: the activity (a) requires effort / time, (b) is not systematic, (c) is prone to errors, and (d) the methodology is (often) not documented. How to help scholars find a piece of evidence in a text?
  • 28. How to detect concepts beyond keywords? We coin the expression themed evidence to refer to (direct or indirect) traces of a fact or situation relevant to a theme of interest, and study the problem of identifying them in texts. The task of identifying themed evidence is at the intersection between topical text classification (finding texts relevant to a certain theme) and event retrieval (finding events mentioned in texts). Not all topical texts are themed evidence, and the nature of the event itself is often assumed, implicit, and left to the reader. Daga, Enrico, and Enrico Motta. "Capturing themed evidence, a hybrid approach." In Proceedings of the 10th International Conference on Knowledge Capture, pp. 93-100. 2019.
  • 29. Finding Listening Experiences (theme: music) • RECMUS-619, positive: Introduced to the Anacreontic Society, consisting of amateurs who perform admirably the best orchestral works. The usual supper followed. After propitiating me with a trio from ’Cosi Fan Tutte’, they drew me to the piano. • MASONB-31, positive: In the evening we went to Rev. Baptist Noel’s chapel, where one is always sure of edification from the sermon if not from the psalms. • MASONB-88, negative: Flags and pendants were suspended from the windows, [...] the colors of the German States were waving harmoniously together, and the banners of the Fine Arts, with appropriate inscriptions, particularly those of music, poetry and painting, were especially honored, and floated triumphant amidst the standards of electorates, dukedoms, and kingdoms.
  • 30. A Hybrid Approach • Themed evidence is a subset of topical texts (e.g. about “music”) - distributional semantics • Common knowledge graphs include large numbers of interlinked entities, including topical entities (in the category “music”) - entity linking to structured knowledge • Background knowledge can be used for learning features and tuning elements of the method - corpus based analysis • LE Database includes text excerpts that can be analysed as positive examples. • Project Gutenberg: >58k books in the public domain (48790 en) • DBpedia is a large knowledge graph published as Linked Data. Includes a SPARQL endpoint and a NER tool: DBpedia Spotlight • We formalise the task as a binary classification problem; approach in three steps: 1. Statistical relatedness analysis -> From a key term (e.g. “Music”) 2. Themed-entity detection -> About a key subject (e.g. dbpedia:Music) 3. Hybridisation phase
  • 31. Statistical relatedness 0 rontgen[N] 1 play[V] 2 Brahms[N] 3 symphony[N] 4 another[D] 5 musical[J] 6 take[V] 7 always[R] 8 happen[V] 9 specially[R] 10 count[V] 11 something[N] 12 sort[N]
  • 32. Statistical relatedness // Example RECMUS-619, positive: Introduced to the Anacreontic Society, consisting of amateurs who perform admirably the best orchestral works. The usual supper followed. After propitiating me with a trio from 'Cosi Fan Tutte', they drew me to the piano. • Anacreontic[n]: 4.13048797627 • amateur[n]: 4.60138704262 • admirably[r]: 3.65226351076 • orchestral[j]: 7.09262661606 • trio[n]: 5.60459207257 • piano[n]: 6.36957273307 Correct
  • 33. Statistical relatedness // Example MASONB-31, positive: In the evening we went to Rev. Baptist Noel's chapel, where one is always sure of edification from the sermon if not from the psalms. psalm[n]: 4.05596201177 Daga, Enrico, and Enrico Motta. "Capturing themed evidence, a hybrid approach." In Proceedings of the 10th International Conference on Knowledge Capture, pp. 93-100. 2019. Wrong
  • 34. Statistical relatedness // Example MASONB-88, negative: Flags and pendants were suspended from the windows, [...] the colours of the German States were waving harmoniously together, and the banners of the Fine Arts, with appropriate inscriptions, particularly those of music, poetry and painting, were especially honored, and floated triumphant amidst the standards of electorates, dukedoms, and kingdoms. harmoniously[r]:4.96754289705 music[n]:1.0 poetry[n]:5.93071678171 painting[n]:4.39244380382 triumphant[j]:3.80869437369 amidst[i]:3.6638322575 Daga, Enrico, and Enrico Motta. "Capturing themed evidence, a hybrid approach." In Proceedings of the 10th International Conference on Knowledge Capture, pp. 93-100. 2019. Wrong
  • 35. 2> Themed entity detection • DBpedia Spotlight to identify %entities% • SPARQL query to filter the ones related to dbcat:Music • Where %entities% are the resources identified by the NER engine, and %d% is a parameter, set to 5 (values above 5 introduce too much noise). SELECT distinct ?sub WHERE { VALUES ?sub { %entities% } ?sub dc:subject ?subject . ?subject skos:broader{0:%d%} cat:Music }
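The query template on this slide can be filled in programmatically. The sketch below is one possible way to do that, not the authors' code: the function names are hypothetical, prefix declarations are omitted just as on the slide, and the `{0:%d%}` quantifier is rewritten as a chain of optional steps (`skos:broader?/skos:broader?/...`) on the assumption that the target endpoint accepts only standard SPARQL 1.1 property paths, where bounded repetition does not appear to be available.

```python
def bounded_path(step, max_hops):
    """Express 'step' repeated 0..max_hops times as a SPARQL 1.1 path.

    Each 'step?' matches zero or one hop, so a sequence of max_hops
    optional steps matches any chain of 0 to max_hops hops.
    """
    if max_hops < 1:
        raise ValueError("max_hops must be at least 1")
    return "/".join([step + "?"] * max_hops)


def themed_entity_query(entities, max_hops=5, category="cat:Music"):
    """Fill the slide's template: %entities% -> entities, %d% -> max_hops."""
    values = " ".join("<" + uri + ">" for uri in entities)
    path = bounded_path("skos:broader", max_hops)
    return (
        "SELECT DISTINCT ?sub WHERE { "
        "VALUES ?sub { " + values + " } "
        "?sub dc:subject ?subject . "
        "?subject " + path + " " + category + " }"
    )


if __name__ == "__main__":
    uris = [
        "http://dbpedia.org/resource/Piano",
        "http://dbpedia.org/resource/Orchestra",
    ]
    print(themed_entity_query(uris))
```

The entity URIs would come from the NER step; sending the resulting string to an endpoint is left out because it depends on the client library in use.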
  • 36. 3> Hybridisation Entity boost: promote terms mapped to entities. PoS filter: demote terms other than verbs and nouns, to privilege factual statements. Daga, Enrico, and Enrico Motta. "Capturing themed evidence, a hybrid approach." In Proceedings of the 10th International Conference on Knowledge Capture, pp. 93-100. 2019.
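To make the two adjustments concrete, here is a rough sketch of how per-term relatedness scores might be reweighted. The slide does not give the actual formula, so the multiplicative form, the two constants, and the function name are all assumptions made for illustration; the cited paper is the authority on the real method.

```python
ENTITY_BOOST = 2.0  # hypothetical factor for terms mapped to a themed entity
POS_DEMOTE = 0.5    # hypothetical factor for terms that are not nouns or verbs


def reweight(term_scores, entity_terms):
    """Apply an entity boost and a PoS demotion to relatedness scores.

    term_scores maps (lemma, pos) to a score, with pos being one of the
    single-letter tags used on the slides (n, v, j, r, ...).
    entity_terms is the set of lemmas linked to a themed entity.
    """
    adjusted = {}
    for (lemma, pos), score in term_scores.items():
        if lemma in entity_terms:
            score *= ENTITY_BOOST
        if pos not in ("n", "v"):
            score *= POS_DEMOTE
        adjusted[(lemma, pos)] = score
    return adjusted


if __name__ == "__main__":
    scores = {("piano", "n"): 6.0, ("orchestral", "j"): 7.0, ("admirably", "r"): 4.0}
    print(reweight(scores, {"piano", "orchestral"}))
```

How the adjusted term scores are combined into the final binary decision for an excerpt is not shown here, since the slides do not specify it.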
  • 37. Hybrid Approach // Example RECMUS-619, positive: Introduced to the Anacreontic Society, consisting of amateurs who perform admirably the best orchestral works. The usual supper followed. After propitiating me with a trio from 'Cosi Fan Tutte', they drew me to the piano. http://dbpedia.org/resource/Anacreontic_Society http://dbpedia.org/resource/Orchestra http://dbpedia.org/resource/Trio_(music) http://dbpedia.org/resource/Così_fan_tutte http://dbpedia.org/resource/Piano Correct
  • 38. Hybrid Approach // Example MASONB-31, positive: In the evening we went to Rev. Baptist Noel's chapel, where one is always sure of edification from the sermon if not from the psalms. http://dbpedia.org/resource/Evening_Prayer_(Anglican) http://dbpedia.org/resource/Psalms Daga, Enrico, and Enrico Motta. "Capturing themed evidence, a hybrid approach." In Proceedings of the 10th International Conference on Knowledge Capture, pp. 93-100. 2019. Correct
  • 39. Hybrid Approach // Example MASONB-88, negative: Flags and pendants were suspended from the windows, [...] the colours of the German States were waving harmoniously together, and the banners of the Fine Arts, with appropriate inscriptions, particularly those of music, poetry and painting, were especially honored, and floated triumphant amidst the standards of electorates, dukedoms, and kingdoms. http://dbpedia.org/resource/Music Correct
  • 41. What about supporting curation? How to support users in cataloguing the documentary evidence? How to detect the entities and their relationships in the sources? How to automatically populate the database with metadata?
  • 43. Knowledge Extraction (KE) • Bet: metadata curation could be supported with KE methods • KE: automatic or semi-automatic derivation of formal symbolic knowledge from unstructured or semi-structured sources • Approaches in the literature vary in task / scope: • (Named) Entity Recognition and Classification (Person, Work, Time, Place, …) • Entity Linking (DBpedia, Gazetteers) • Relation Extraction (listener of, in place) • Event extraction (Performance) • Machine reading
  • 44. Example #1 "I then went to Amsterdam to conduct Oedipus at the Concertgebouw, which was celebrating its fortieth anniversary by a series of sumptuous musical productions. The fine Concertgebouw orchestra, always at the same high level, the magnificent male choruses from the Royal Apollo Society, soloists of the first rank - among them Mme Hélène Sadoven as Jocasta, Louis van Tulder as Oedipus, and Paul Huf, an excellent reader - and the way in which my work was received by the public, have left a particularly precious memory that I recall with much enjoyment." listener: Igor Strawinsky time: in the beginning of 1928 place: Amsterdam opera: Oedipus Rex /by: Igor Strawinsky performer: Concertgebouw orch. environment: Public Igor Stravinsky, An Autobiography (1936), p. 139. https://led.kmi.open.ac.uk/entity/lexp/1435674909834
  • 45. Example #2 "Music is certainly a pleasure that may be reckoned intellectual, and we shall never again have it in the perfection it is this year, because Mr. Handel will not compose any more! Oratorios begin next week, to my great joy, for they are the highest entertainment to me." listener: Mrs Delany time: March, 1737 place: London opera: Operas and Oratorios /by: G. F. Handel environment: Public From: Mary Granville, and Augusta Hall (ed.), Autobiography and Correspondence of Mary Granville, Mrs Delany: with interesting Reminiscences of King George the Third and Queen Charlotte, volume 1 (London, 1861), p. 594. https://led.kmi.open.ac.uk/entity/lexp/1444424772006 Feedback: @enridaga | www.enridaga.net
  • 46. Analysis: detect the Listener & Place of a LE • Q1 - in the excerpt? The place is mentioned in the excerpt in 25.9% cases. The listener only in 13.4%. • Q2 - near the excerpt? Only 10% of the times the place mention is less than 5 paragraphs from the excerpt. The agent, in 4% of the cases. • Q3 - in the source? 83.2% of the times the place is mentioned at least once in the source. In 11.4% the place hasn’t been found. • Q4 - in the meta? 64.8% of the listeners are also the authors of the text - 5874 cases in LED. Distance of entity (in n of paragraphs)
  • 47. Open problems • Implicit information, based on inference requiring expertise (e.g. Mr Handel is G. F. Handel, Oedipus is “Oedipus Rex”) • The role of contextual knowledge is fundamental (1) in identifying the agent from the metadata of the source; (2) in common sense inference (“in the beginning of 1928”) • Entities can exist in distributed, heterogeneous resources (encyclopaedic KBs, domain-specific taxonomies, gazetteers, …) • Cultural studies typically coin novel concepts (ListeningExperience) with original schemas. Portability of the methods is even more at risk! Daga, E and Motta, E. "Challenging knowledge extraction to support the curation of documentary evidence in the humanities." (2019).
  • 48. Summary • Linked Data transforms the way information is shared on the Web • but also enables opportunities to apply AI techniques to more application domains • supporting users in finding and curating documentary evidence is an important and difficult task • finding complex concepts in texts is (more) possible than before, although most of these techniques have not been applied at scale yet • traditional AI research is challenged by the richness and diversity of use cases in the humanities, especially considering knowledge extraction
  • 49. WHiSe 3 Call for papers! 3rd Workshop on Humanities in the Semantic Web (WHiSe) Co-located with the 15th Extended Semantic Web Conference (ESWC 2020) Heraklion, Crete, Greece 31/05 or 31/06, 2020 (TBD) Submission deadline: 28th February http://whise.cc/ https://commons.wikimedia.org/wiki/File:Edward_Burne-Jones_-_Tile_Design_-_Theseus_and_the_Minotaur_in_the_Labyrinth_-_Google_Art_Project.jpg
  • 50. PhD position open soon Title: “Distributed Linked Data for Cultural Heritage” The aim of this project is to research and develop distributed, Linked Data systems that enable cultural content to be shared between museums and the public. This may include innovative ways of publishing digital artworks and related resources by memory institutions as well as enabling the public to share their own experiences of visiting and engaging with cultural heritage. The PhD will benefit from being closely connected with the EU funded SPICE project [1] which is developing methods and tools to allow citizen groups to actively participate with museums internationally. [1] http://kmi.open.ac.uk/projects/name/spice