The document is a presentation by Gill Hamilton from the National Library of Scotland about linked open data and their experiments with it. It discusses three main tips for preparing data for linked open data: 1) using URIs to identify resources rather than strings, 2) not simplifying data structures when converting to linked data, and 3) focusing on making unique contributions by working with distinctive parts of the collection. The presentation also advocates for openly licensing metadata and using open vocabularies.
National Library of Scotland Linked Data Executive Briefing
1. National Library of Scotland
Leabharlann Nàiseanta na h-Alba
the reality of
linked data
in libraries
CILIP Linked Data Executive Briefing
24 November 2015
Gill Hamilton
Digital Access Manager
2. National Library of Scotland
• about the Library and me
• our modest experiments
• Top Tips for preparing for LOD
I’ll be looking at ….
3. National Library of Scotland
4. National Library of Scotland
5. National Library of Scotland
6. National Library of Scotland
7. National Library of Scotland
• if I learned one thing it is …
• dabbling in the DOD
• RDA RDF
our experiments
8. National Library of Scotland
Och!
Dinnae fash
yersel!
oh no!
Oh No!
OH NO!
HELP!
9. National Library of Scotland
you and I know it already …
we call it interoperability
ye olden days
BM rules
AACR
MARC
collaboration
RDA
LOD
CLOSED
OPEN
Open vocs
10. National Library of Scotland
http://www.math.uh.edu/~tomforde/images/UniverseAndMan.jpg
dabbling
in the DOD
researching
RDA
RDF
11. National Library of Scotland
1. strings and things
2. be smart, not dumb
3. uniqueness
….. and some more
12. National Library of Scotland
Top Tip
1
we have strings
“Hamilton, Gill”
but we need things too
we need URIs
13. National Library of Scotland
machines are really stupid
Main catalogue Archive catalogue Moving image catalogue
Hamilton, Gill W. Hamilton, Gillian W. Hamilton, G. W.
you are really really smart
14. National Library of Scotland
machines are really stupid
you are really really smart
15. National Library of Scotland
record just 1 more thing
the URI
C’est tout!
stop digging
16. National Library of Scotland
and …
you might
be lucky
17. National Library of Scotland
Get others to help
We think
this is in
Cambrai
Or do you
think
Cambrai
is here?
Do you
think
Cambrai is
here?
18. National Library of Scotland
Frankly my
dear!
I don’t give a
damn about
your domain
Never, EVER dumb your data
Top Tip 2
19. National Library of Scotland
DOD
data
structure
DOD.title
DOD.keyword
DOD.who
local & closed global & open
XSLT
LOD in
DC
DC:title
DC:subject
DC:creator
20. National Library of Scotland
21. National Library of Scotland
DOD
data
local
& closed
global & open
DOD in
LOD
structure
DOD.title
DOD.keyword
DOD.who
MAPPING
DC
RDA
SCHEMA
22. National Library of Scotland
23. National Library of Scotland
concentrate on the unique
Order to Capt. Campbell by Maj. Duncan
You are hereby ordered to fall upon the
rebells, the McDonalds of Glencoe, and put
all to the sword under seventy.
Top Tip 3
24. National Library of Scotland
uniqueness is about
making the BEST use of limited resources
making the BEST contribution
making BEST metadata for LOD
25. National Library of Scotland
1. record URIs
2. don’t dumb the data
3. work on unique stuff
….. and some more
26. National Library of Scotland
the other tips
27. National Library of Scotland
ZERO
CC
openly licence your metadata
28. National Library of Scotland
is there a library shaped hole in the web?
16 October 2015
LCNAF
LCSH
TGMI
AAT
TGN
DDC … soon?
http://id.loc.gov/
http://www.getty.edu/research/tools/vocabularies/
use open vocabularies
29. National Library of Scotland
DEMAND better systems
30. National Library of Scotland
URI management
the continued or prolonged existence
of something.
firm belief in the reliability, truth,
or ability of someone or something
31. National Library of Scotland
http://www.math.uh.edu/~tomforde/images/UniverseAndMan.jpg
32. National Library of Scotland
Gill Hamilton
Digital Access Manager
g.hamilton@nls.uk
NationalLibraryofScotland
@natlibscot
thank you
Editor's Notes
Scotland’s legal deposit library. Main building on George IV Bridge with the special and general reading rooms, the Causewayside building with the map reading room, and Hillington with the Moving Image Archive (MIA). A new building opens next year at Kelvin Hall in Glasgow, the first time that the Library has had a centre in another city. It will promote and make available our MIA and digital collections.
Our collections are vast and varied: 2 million maps, 7 million manuscripts, more than 4 million books, and more than 30,000 films and videos. Our fastest growing collections are our digital collections, which already run to about 5 million items. Most of this is legal deposit, but a growing amount comes from digitisation. We have 12,000 digitised books and other resources available, with several thousand more in processing.
And we have a new and ambitious strategy that was launched just last month. It has 6 priorities, including guardianship of the collection, working with research communities and developing the physical libraries. Most important and exciting for me is the priority that commits the Library to describing all its collections in the next 10 years and giving digital access to a third of its collection. It is a great challenge for us, but it will deliver a radically different kind of national library.
And me… I’m the digital access manager. I oversee access to the digital collections and lead on resource discovery and library management systems. As part of the new strategy I have just moved to a new team and will be working hard with my colleagues to develop the systems and processes that will allow us to deliver the strategy. These are exciting times.
I am by no means an expert in linked data. I guess I’m an expert in metadata, not from the cataloguing point of view but from the processing point of view.
Oh and in my spare time I do things like cycle across America
Our experiments with linked data: just a little bit of background about what we’ve been doing.
I should stress that these are only experiments; we have nothing in production.
So when I started looking into LOD I would understand it, get a sense of it and then lose it. Then understand it again and lose it again. That seems not uncommon, especially when the technologies are mentioned. You just get freaked out.
But what I learned is that it’s actually very familiar and you don’t need to fret overly about it.
If I learned one thing from experimenting, it is that linked data isn’t new and is very familiar. When you strip away the bamboozling chitter chatter, the technology and the threats of the end of the Library, you see that it’s a continuum of interoperability, but this time we move from the local library domain to a truly global domain.
LOD lets you move out of the traditional library sphere and reach from local to global.
It’s nothing new.
The continuum of interoperability:
BM rules: libraries start to describe things more consistently.
AACR: the English-speaking world describes things in the same way.
MARC: allows us to exchange data with each other for the first time.
Collaboration: MARC brings an era of collaboration between libraries, with shared cataloguing programmes and shared authorities like LCSH and LCNAF.
RDA: new, modern description guidelines that have an RDF representation.
Open vocabularies: published open vocabularies like those from the Getty.
And then we have linked open data.
MARC to RDF, closed to open, local to global.
It’s all familiar to you; it just has a new name and new technologies to try to bamboozle you.
15 million records in the DOD representing the digitised output of the Library: digitised books, maps, broadsides, photos, posters and manuscripts.
It’s a nice place to start because it’s home grown and very familiar, has a nice structure, and the resources are consistently well described using AACR and a range of standard and open vocabularies.
We output in XML a very small section of the data, the photos of the construction of the Forth Bridge, and worked at transforming the XML into RDF. So we took our W3C schools.
We mapped our data to the RDF representation of Dublin Core
We tried to discover URIs for the vocabularies that we used using a tool called Google Refine
We made RDF for a single resource
And we openly published the DOD element set in an open registry.
And here’s what we
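As a rough illustration of "RDF for a single resource" mapped to Dublin Core, the sketch below writes N-Triples by hand. The field names follow the slide (title, keyword, who), but the record values and the example.org URI are invented for illustration, and the Library's own transformation used XSLT rather than Python.

```python
# Hypothetical record: values and the example.org URI are made up.
DC = "http://purl.org/dc/elements/1.1/"

# Mapping from local field names to Dublin Core element URIs.
FIELD_TO_DC = {
    "title": DC + "title",
    "keyword": DC + "subject",
    "who": DC + "creator",
}


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(resource_uri: str, record: dict) -> list[str]:
    """Return one N-Triples line per mapped field in the record."""
    lines = []
    for field, value in record.items():
        predicate = FIELD_TO_DC.get(field)
        if predicate is None:
            continue  # unmapped fields are dropped in this sketch
        lines.append(f'<{resource_uri}> <{predicate}> "{escape_literal(value)}" .')
    return lines


record = {
    "title": "Forth Bridge under construction",
    "keyword": "Bridges",
    "who": "Unknown photographer",
}
triples = to_ntriples("http://example.org/dod/1", record)
for line in triples:
    print(line)
```

Note that the values are still plain strings here. Replacing them with URIs is the subject of the first tip.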
Over the summer we’ve done some experimenting and research into RDA and RDF.
We were interested in learning more about RDA: not the faux RDA we use in MARC, which just seems to be a couple of new fields about content and carrier, but real pure RDA. Cataloguing from scratch in RDA, learning all about the benefits of FRBR.
To do this we used RIMMF, which is a training tool that lets you work in pure RDA, interact with common vocabularies like LCNAF and then output RDF. From RIMMF you can output RDA WEMI in RDF. I’ll tell you a secret: you can also output it as MARC!
RDA has an RDF representation, which basically means it’s linked data. The RDA elements are registered as URIs in the Open Metadata Registry, which means anyone and anything (machines) can view and use them and see the semantics.
You could publish that as a document, but then you would have to issue updates every time something changed. A registry has much better functionality if the RDF is likely to change, which RDA does.
So from our experiments I have 3 tips for you to consider if you want to prepare yourself for linked data
From all our experiments the biggest issue is this
The data we have in the Library, and that you all probably have, is recorded as “strings”: human-readable pieces of text.
Linked data is about linking things together using URIs so machines can process them. So we need to gather the URIs.
Now you might think that a machine could do the matching for you but machines are really really stoopid
So consider this – this is a real example from the library.
We have three databases, and people are sometimes described differently in each one.
Let’s say we want to find the URI in LCNAF for me. (well actually I’m not in LCNAF)
Do you think this is the same person?
Probably! And if not you’d get on the phone and ask. Then you’d find out that I’m mortified by my middle name Wendy
But machines can’t do that. Machines don’t know that these are the same people. You’d have to program a lot of regular expression work and checking other data and still you couldn’t be confident that a machine would get it right.
So machines can’t be trusted to go off and look for URIs. You’ll get multiple hits and false drops.
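A small sketch of why string matching goes wrong. The three name strings are the ones on the slide; "Hamilton, George W." is a hypothetical extra person, and `crude_key` is an invented normalisation rule, not anything the Library uses.

```python
# The three variants come from the slide; everything else is illustrative.
VARIANTS = ["Hamilton, Gill W.", "Hamilton, Gillian W.", "Hamilton, G. W."]


def exact_matches(names: list[str]) -> bool:
    """True only if every string is identical."""
    return len(set(names)) == 1


def crude_key(name: str) -> str:
    """Reduce 'Surname, Forename ...' to 'surname|initials'."""
    surname, _, forenames = name.partition(",")
    initials = "".join(part[0] for part in forenames.split())
    return f"{surname.strip().lower()}|{initials.lower()}"


# Exact comparison misses that the three strings are one person.
print(exact_matches(VARIANTS))

# A looser rule groups the three variants together...
print({crude_key(n) for n in VARIANTS})

# ...but it also matches a different person: a false drop.
print(crude_key("Hamilton, George W.") == crude_key("Hamilton, Gill W."))
```

Tightening the rule loses real matches and loosening it invents false ones, which is why a stored URI is more dependable than any amount of string comparison.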
The data we have in the Library, and that you all probably have, is “strings”: human-readable strings. Not things, not URIs, not machine-processable.
Here’s another one ….. Horses.
You’ve described horses in your database and you want a machine to try and find URIs for that?
Not a chance!
It’s hard for machines
It’s about language
It’s about ambiguity
It’s simple for humans; machines have no intelligence.
So my tip is …...
In your database stop recording only the string.
When you record Gill Hamilton, record the URI too.
When you write the word “horse” into your database, record its URI from LCSH too.
And look in your authority file for “pseudo URIs” like the LCSH and LCNAF id numbers. These are, in fact, usually the basis of the URI. You can’t always be sure of this though, especially with things like subject headings due to their complexity, but it’s probably OK for names.
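A minimal sketch of "record just one more thing", assuming a record structure with a label and an optional URI. The `Heading` class and the helper function are hypothetical. The two prefixes are the ones used by the Library of Congress linked data service at id.loc.gov, and the identifier shown is a placeholder rather than a real authority number.

```python
from dataclasses import dataclass
from typing import Optional

# URI prefixes used by the Library of Congress linked data service.
LC_PREFIXES = {
    "lcnaf": "http://id.loc.gov/authorities/names/",
    "lcsh": "http://id.loc.gov/authorities/subjects/",
}


@dataclass
class Heading:
    label: str                 # the human-readable string, as recorded today
    uri: Optional[str] = None  # the one more thing: an identifier for the thing


def uri_from_authority_id(scheme: str, authority_id: str) -> str:
    """Build a candidate URI from an authority record number.

    The result is only a candidate and should be checked before it is stored.
    """
    return LC_PREFIXES[scheme] + authority_id.replace(" ", "")


# "sh 00000000" is a placeholder, not the real identifier for "Horses".
horse = Heading(label="Horses", uri=uri_from_authority_id("lcsh", "sh 00000000"))
print(horse)
```

Keeping the label alongside the URI means existing displays and indexes carry on working while the data becomes linkable.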
Get others to help you find URIs
You can crowdsource this. Send the machine off to find Cambrai and it tells you there are 2 Cambrais. Well, ask people which they think is the correct Cambrai. The crowd will tell you.
Also, we had colleagues at work do this for us. Students found LCSH and DDC URIs for a subset of our data.
Or ask colleagues who work on reception desks, where the work is stop-start.
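One way the crowd's answers might be tallied, as a sketch only. The thresholds and the example.org candidate URIs are assumptions chosen for illustration; the notes do not describe how votes were actually counted.

```python
from collections import Counter
from typing import Optional


def crowd_choice(votes: list[str], min_votes: int = 3, min_share: float = 0.6) -> Optional[str]:
    """Return the winning candidate, or None if the crowd has not settled."""
    if len(votes) < min_votes:
        return None
    candidate, count = Counter(votes).most_common(1)[0]
    return candidate if count / len(votes) >= min_share else None


# Hypothetical candidate URIs for the two places called Cambrai.
votes = [
    "http://example.org/place/cambrai-a",
    "http://example.org/place/cambrai-a",
    "http://example.org/place/cambrai-b",
    "http://example.org/place/cambrai-a",
]
print(crowd_choice(votes))
```

Returning `None` when there are too few votes or no clear majority leaves the item in the queue for a person to decide.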
We learned this for DOD.
Very well considered database structure.
We clearly understand the semantics of our database.
When we say “keyword” we know what that means.
When we say “Who” we know what that means.
When we were experimenting we were so very pleased to make some RDF.
But we really, really hated that we had to squeeze our data into Dublin Core, and it caused us to lose data.
We didn’t like that we were losing semantics.
We spend a lot of money on smart people and we don’t want to lose the meaning.
Our DOD.who is much, much richer than DC:creator because we can indicate roles: author, depicted, is collector, is subject of.
So we explored a different approach, actually a more open approach.
We published the structure of the DOD as an RDF representation.
Converted it into LOD
It means everyone can see how you structure your data. It is open.
You don’t need to compromise your data. It’s in its original form.
And then you can write mappings into other formats.
You can have your cake and eat it!
To do this you need some kind of registry to record and publish your element set. It’s actually quite straightforward.
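The idea of keeping the native structure and mapping outwards can be sketched as follows. Every example.org URI, the role element names and the choice of Dublin Core targets are hypothetical stand-ins, not the published DOD element set.

```python
# All example.org URIs and role names are invented for illustration.
DC = "http://purl.org/dc/elements/1.1/"
LOCAL = "http://example.org/dod/elements/"

# Mapping from each local element to its nearest Dublin Core element.
LOCAL_TO_DC = {
    LOCAL + "who/author": DC + "creator",
    LOCAL + "who/collector": DC + "contributor",
    LOCAL + "who/depicted": DC + "subject",
}

# The native description keeps the role of each person.
native = [
    (LOCAL + "who/author", "Person A"),
    (LOCAL + "who/collector", "Person B"),
    (LOCAL + "who/depicted", "Person C"),
]


def dc_view(statements):
    """Derive a Dublin Core view; the native statements are left untouched."""
    return [(LOCAL_TO_DC[element], value) for element, value in statements]


for element, value in dc_view(native):
    print(element, value)
```

The Dublin Core view is derived on demand, so the coarser vocabulary never overwrites the richer one, and further mappings can be added later without touching the source data.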
In terms of linked data you don’t need to worry about the bibliographic universe.
Someone else will sort that, the publishers or the national libraries. What you should concentrate on are your unique collections. Describe them and describe them well. You probably already do that, or are thinking of doing it.
Others will sort the traditional published bibliographic universe; no-one will sort the unique stuff.
When you have limited resources, focus only on your unique collections and invest your time there. For example, making the published output of the UK available as LOD is a problem for national libraries. BL have started doing this by publishing LOD for BNB. Invest your effort in what is unique. Perhaps that photo collection. Perhaps those local history pamphlets. Perhaps those manuscripts.
Best contribution. You will then make the best contribution. You won’t be replicating anything else that is being done, and you can be satisfied that it is a valuable contribution. You add to the linked data universe; you don’t duplicate it.
BEST metadata.
What we’re thinking about in the Library is that, because of our strategy, we will most likely digitise a lot of unique material such as our manuscripts and archives. To do that we need to touch them and describe them. As we describe them we can record URIs from open vocabularies. So we improve access to the collections through both traditional and linked metadata, and through presenting a digital version of the original. WIN WIN WIN
So to recap
It’s the O in open.
To link you need to publish openly, because others will use and re-use your metadata for the purposes of linking.
If you’re nervous about this remember
Metadata is an advert to the resource, it isn’t the resource. Your digital object can be licensed another way
You don’t need to publish all of your metadata as CC-0. Perhaps you have curated info that you want to retain the intellectual property over. Just don’t include it with the metadata that is CC-0
So you might not be able to do anything to make your metadata linked, but others will do it for you. For example, give your metadata for digital resources to Europeana and they will turn it into linked open data to power Europeana. The data you’re sending to OCLC is being turned into linked data.
Use open vocabularies. The big library ones are DDC, LCNAF, LCSH and TGMI. The others are TGN, AAT, and perhaps very specific vocabularies for your collection focus.
If you have a specialised local voc then consider publishing it and mapping to other open vocabularies
Image credit: Library of Congress, Washington, D.C., United States. Own work by TheAgency (CJStumpf), 8 June 2006. https://commons.wikimedia.org/wiki/File:LibraryCongressFront1.JPG
Image credit: View of Getty Museum at Unforgetting LA Edit-a-Thon at Getty Research Institute, February 21, 2015. Own work by Pbjamesphoto, 21 February 2015. https://commons.wikimedia.org/wiki/File:Getty_Museum_from_Getty_Research_Institute,_February_21,_2015.jpg
Demand better systems that can use modern content standards such as RDA,
that can help you manage URIs (creation, deprecation),
and that help you publish LOD and make RDF representations.
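To make "manage URIs (creation, deprecation)" concrete, here is a toy in-memory sketch. The `UriRegistry` class, its method names and the example.org base are all hypothetical; a real system would also need persistent storage and HTTP redirects.

```python
class UriRegistry:
    """Toy registry: mints URIs and records which URI replaces a deprecated one."""

    def __init__(self, base: str):
        self.base = base.rstrip("/") + "/"
        self._next_id = 1
        self._replaced_by = {}  # deprecated URI -> replacement URI

    def mint(self) -> str:
        """Create a new URI that has never been issued before."""
        uri = f"{self.base}{self._next_id}"
        self._next_id += 1
        return uri

    def deprecate(self, old: str, new: str) -> None:
        """Record that `old` is superseded by `new`; `old` is never reused."""
        self._replaced_by[old] = new

    def resolve(self, uri: str) -> str:
        """Follow deprecations until a current URI is reached."""
        seen = set()
        while uri in self._replaced_by and uri not in seen:
            seen.add(uri)  # guards against accidental loops
            uri = self._replaced_by[uri]
        return uri


registry = UriRegistry("http://example.org/id")
first = registry.mint()
second = registry.mint()
registry.deprecate(first, second)
print(registry.resolve(first))
```

A deprecated URI keeps resolving to its replacement instead of disappearing, which is what lets others rely on the links they have already made.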
Image credit: Unclesamwantyou, by James Montgomery Flagg (Library of Congress), public domain, via Wikimedia Commons. https://commons.wikimedia.org/wiki/File%3AUnclesamwantyou.jpg
Services that LOD relies on are all in beta. DDC is down, LC was down for maintenance, and we don’t have systems to manage URIs. It’s all in beta.
It’s difficult to convince management to make even a modest investment because the benefit is difficult to demonstrate.
We’re going that way anyway …. It’s a continuum.
Reach out from the library boundary in to the global graph of linked data.