The document discusses the transition from MARC to linked data standards like RDF. Some key points:
- MARC was developed for a world with clear boundaries, but today's information landscape is fractal and spread across the web. Linked data standards allow for a more interconnected approach.
- RDF uses URIs to identify concepts and allows for stating relationships between concepts as triples of subject-predicate-object. This provides a way to integrate library data with the broader web of data.
- Efforts are underway to assign URIs to concepts in standards like RDA to make library data available as linked open data using RDF. This will provide benefits like improved findability, interoperability, and integration with other datasets.
1. RDF, RDA, and other TLAs
Dorothea Salo
Monday, January 2, 2012
2. Captatio benevolentiae
• I am not a cataloger.
• Not even working as a librarian these days!
• I am not a developer, either.
• (I am doing a bit of standards work. Not in this area, though.)
• What am I? An educator and sometime tech translator. I hope that's enough.
3. We built MARC when [catalog cards] stood between us and patron.
Photo: Deborah Fitchett, "Catalogue cards" http://www.flickr.com/photos/deborahfitchett/2970373235/ CC-BY
4. We built MARC when the world was clearly bounded.
Photo: NASA Goddard Photo and Video, "NASA Blue Marble" http://www.flickr.com/photos/gsfc/4392965590/ CC-BY
5. These days,
stands between us and patron.
Photo: Declan Jewell, "My Desk" http://www.flickr.com/photos/declanjewell/2743737312 CC-BY
6. These days, the world's looking a bit fractal!
Photo: NASA Goddard Photo and Video, "Still centered over the Atlantic" http://www.flickr.com/photos/gsfc/4409800816/ CC-BY
7. Review:
• Where are the less-than-perfect fits between library practice and the current information landscape?
• What does this mean for library systems of information organization?
8. Problems with MARC/AACR2/ISBD
(if you're a networked computer)
• Globally unique identifiers for what's in our bibliographic universe?
• And what IS in our bibliographic universe, anyway?
• Interoperability? Who speaks MARC outside libraries?
• This is a problem on both ends of the pipeline, these days!
• FREE TEXT (for anything not transcribed) MUST DIE.
• It is the LEAST consistent, internationalizable, interoperable way to record information on a computer.
• Put another way: we haven't controlled all the cataloging practices we usefully could.
http://robotlibrarian.billdueber.com/isbn-parenthetical-notes-bad-marc-data-1/
9. Practical implications
• Designing standards and practices around what computers do well, and what they need in order to do what they do.
• Designing for being PART of the data universe, not all of it.
• "Open world assumption": no one body has all the data! Or all the answers!
• And nobody can impose their view of the world on everybody else. (Fortunately, nobody necessarily has to.)
• Designing for consistency, flexibility, and extensibility without sacrificing comprehensibility.
• (This is a tall order; we're not there yet. Is anyone?)
10. ... vocabulary note
• "Semantic Web": Tim Berners-Lee disappearing into his own navel.
• Term is a bit out of favor these days.
• "Linked data": a real-world effort to make large datastores more interoperable.
• RDF: invented by the SemWebbers, now a cornerstone for linked data.
• Does this mean that all data will be stored as RDF? NO, IT DOES NOT (and you have my permission to slap anybody who says it will).
• It's totally possible to provide an RDF view onto non-RDF data, IF AND ONLY IF the data structure and meaning are thought through in an RDF-friendly way.
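To make the RDF-view point concrete, here is a minimal Python sketch (standard library only) of exposing an ordinary database-style row as triples. The row layout, the http://example.org/item/ namespace, and the column-to-property mapping are hypothetical; the Dublin Core terms namespace is real. The mapping is the "thinking through" step: someone has to decide what each column means before any RDF view exists.

```python
# Sketch: an RDF "view" over non-RDF data. The namespace, row layout, and
# column mapping are hypothetical illustrations, not any real system's.
BASE = "http://example.org/item/"          # hypothetical namespace for our records
DCTERMS = "http://purl.org/dc/terms/"      # Dublin Core terms namespace

# The thinking-through step: which column means what, as a property URI.
COLUMN_TO_PROPERTY = {
    "title": DCTERMS + "title",
    "creator_uri": DCTERMS + "creator",
}

def row_to_triples(row):
    """Turn one database-style row (a dict) into (subject, property, object) triples."""
    subject = BASE + str(row["id"])
    triples = []
    for column, prop in COLUMN_TO_PROPERTY.items():
        value = row.get(column)
        if value:  # skip empty columns rather than asserting empty values
            triples.append((subject, prop, value))
    return triples

# Caveat: objects here are plain strings, so a URI and a literal look alike.
# A real mapping would also record which columns hold URIs.
row = {"id": 42, "title": "Innkeeper at the Roach Motel",
       "creator_uri": "http://viaf.org/viaf/21599115/"}
for triple in row_to_triples(row):
    print(triple)
```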
12. Linked Data principles
http://www.w3.org/DesignIssues/LinkedData.html
• use URIs as names for things
• use HTTP URIs so that people can look up those things
• (this is one of Linked Data's concessions to pragmatism, compared to the original SemWebbers)
• when someone looks up a URI, provide useful information, using the standards
• include links to other URIs so that they can discover more things
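The second and third principles can be sketched in a few lines of Python: check that a name is an HTTP(S) URI, then prepare a lookup that asks the server for RDF instead of an HTML page (HTTP content negotiation). Asking for Turtle via the Accept header is one common choice, not a requirement; whether a given service, VIAF included, honors that request is an assumption to verify. The request is built here but not sent.

```python
from urllib.parse import urlparse
from urllib.request import Request

def is_http_uri(name):
    """Principle 2: can this name be looked up over HTTP(S)?"""
    parts = urlparse(name)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def lookup_request(uri, accept="text/turtle"):
    """Principle 3, client side: build (not send) a request asking for RDF."""
    if not is_http_uri(uri):
        raise ValueError(f"not an HTTP URI: {uri}")
    # Content negotiation: the Accept header says which format we would like.
    return Request(uri, headers={"Accept": accept})

req = lookup_request("http://viaf.org/viaf/99258155/")
print(req.full_url, req.get_header("Accept"))
# To actually send it (needs network access): urllib.request.urlopen(req)
```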
13. Things computers like
• Unique identifiers
• for anything you plan to discuss or refer to
• that NEVER CHANGE OR DISAPPEAR. (Sorry, name-authority strings.)
• How do we do this given the open-world assumption?
• Consistent, predictable, human-language-independent data
• Free text (including punctuation) makes computers sad. They aren't human. They don't understand it. They can be cued to PRODUCE it, but only based on rules they're given about the underlying data.
• Computers produce typography and layout, but don't understand those, either.
• Controlled vocabularies
• (If they're well-provisioned with identifiers; see above.)
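The point that computers can produce free text only from rules about the underlying data can be shown with a small Python sketch: the structured fields are the data, and the display heading is merely generated output. The PersonalName fields and the display rule are hypothetical, loosely modeled on the familiar "Surname, Forenames, dates" heading form.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PersonalName:
    # Hypothetical structured fields; the heading string is NOT stored.
    surname: str
    forenames: str
    birth: Optional[int] = None
    death: Optional[int] = None

def display_heading(name: PersonalName) -> str:
    """Rule: 'Surname, Forenames, birth-death', with dates only when known."""
    heading = f"{name.surname}, {name.forenames}"
    if name.birth is not None:
        death = name.death if name.death is not None else ""
        heading += f", {name.birth}-{death}"
    return heading

print(display_heading(PersonalName("Tchaikovsky", "Peter Ilich", 1840, 1893)))
# prints: Tchaikovsky, Peter Ilich, 1840-1893
```

Change the rule and every heading changes with it; change a free-text heading and only that one string changes.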
14. Globally unique identifiers
• Astonishingly, we already have a relatively easy way to do this. The Web is an infinitely extensible information space: all the globally unique identifiers we can dream up!
• Term of art: "URI."
• In practice, 99 times out of 100 this will be a plain old ordinary URL.
• The 100th time, it'll mostly look like a URL, just with a different prefix.
• EVERYTHING in linked-data-land revolves around URIs. They're plumbing.
• And like plumbing, we usually don't have to look at them. Just know that they're there.
15. URI wins
• Internationalization
• We can present http://viaf.org/viaf/99258155/ as "Tchaikovsky, Peter Ilich, 1840-1893." A Russian library can present the same URI as "Чайковский, Петр Ильич, 1840-1893."
• Both libraries can exchange information about Tchaikovsky and his works (e.g. holdings) without language barriers, thanks to the URI intermediary.
• Interoperability
• Websites with Tchaikovsky information? Finding aids? Metadata for digitized images? All can use this URI to refer to Tchaikovsky. This makes it painless for computers to aggregate Tchaikovsky-related information, with minimal if any human intervention!
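A minimal Python sketch of the URI as intermediary: each library keeps its own display labels, but both key their data on the same VIAF URI, so merging holdings needs no string matching at all. The holdings counts and the two-library setup are hypothetical.

```python
TCHAIKOVSKY = "http://viaf.org/viaf/99258155/"

# Display labels per language; the URI, not the label, is the shared key.
labels = {
    TCHAIKOVSKY: {
        "en": "Tchaikovsky, Peter Ilich, 1840-1893",
        "ru": "Чайковский, Петр Ильич, 1840-1893",
    }
}

# Each (hypothetical) library reports holdings by URI, not by heading string.
library_a = {TCHAIKOVSKY: 12}
library_b = {TCHAIKOVSKY: 30}

def merge_holdings(*reports):
    """Sum holdings counts across libraries, keyed on URI."""
    totals = {}
    for report in reports:
        for uri, count in report.items():
            totals[uri] = totals.get(uri, 0) + count
    return totals

merged = merge_holdings(library_a, library_b)
for lang in ("en", "ru"):
    print(labels[TCHAIKOVSKY][lang], merged[TCHAIKOVSKY])
```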
16. What to do with URIs
• RDF's answer: "We say things about stuff."
• At base, RDF really is that simple!
• Base unit of RDF: "triple"
• Subject, property, value/object. Much like subject-verb-object in an English sentence.
• Example: "Dorothea Salo is the author of 'Innkeeper at the Roach Motel.'"
17. What to do with URIs
[Diagram: Dorothea Salo → isAuthorOf → "Innkeeper at the Roach Motel"]
18. What to do with URIs
... wait. Where'd all the URIs go?
19. A pause: just URIs?
• Not strictly, according to RDF.
• "Literals," that is, text strings, are also OK as objects. (Don't tell catalogers this!) But they're STRONGLY discouraged.
• "Blank nodes" can also happen: usually when a triple's object is a bundle of further statements that nobody has given its own URI. In lieu of a URI, that node in the graph stays "blank." Which is ugly, but so it goes.
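Slides 16 through 19 describe three kinds of RDF term: URIs, literals, and blank nodes. The Python sketch below models them with small dataclasses and enforces two real RDF rules (a literal cannot be a subject; a predicate must be a URI). The example.org property names are hypothetical stand-ins, and real RDF libraries model terms in much more detail (datatypes, language tags).

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class Literal:
    value: str  # a text string; real RDF literals can also carry datatype/language

@dataclass(frozen=True)
class BlankNode:
    label: str  # a local label only; it is not a global identifier

Term = Union[URI, Literal, BlankNode]

@dataclass(frozen=True)
class Triple:
    subject: Union[URI, BlankNode]
    predicate: URI
    obj: Term

    def __post_init__(self):
        # Two real RDF rules: no literal subjects, and predicates are URIs.
        if isinstance(self.subject, Literal):
            raise TypeError("a literal cannot be the subject of a triple")
        if not isinstance(self.predicate, URI):
            raise TypeError("a predicate must be a URI")

EX = "http://example.org/vocab/"  # hypothetical property namespace

# Allowed, but discouraged: the work is just a text string.
t = Triple(URI("http://viaf.org/viaf/21599115/"), URI(EX + "isAuthorOf"),
           Literal("Innkeeper at the Roach Motel"))

# A blank node: a publication event with no URI of its own, described by
# further triples (hypothetical properties).
event = BlankNode("b1")
graph = [
    Triple(URI("http://example.org/work/1"), URI(EX + "publication"), event),
    Triple(event, URI(EX + "place"), Literal("London")),
]
print(t)
```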
20. URI-izing a triple
[Diagram: Dorothea Salo → isAuthorOf → "Innkeeper at the Roach Motel"]
21. URI-izing a triple
[Diagram: http://viaf.org/viaf/21599115/ → isAuthorOf → "Innkeeper at the Roach Motel"]
22. URI-izing a triple
[Diagram: http://viaf.org/viaf/21599115/ → isAuthorOf → http://digital.library.wisc.edu/1793/22088]
23. URI-izing a triple
[Diagram: http://viaf.org/viaf/21599115/ → isAuthorOf → http://digital.library.wisc.edu/1793/22088]
vocabularies! with URIs!
25. URI-izing a triple
[Diagram: http://viaf.org/viaf/21599115/ and http://digital.library.wisc.edu/1793/22088, linked by dcterms:creator]
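Here is the fully URI-ized triple written out as one line of N-Triples, a plain-text RDF syntax, using a small Python helper. One detail worth knowing: dcterms:creator points from a resource to its creator, so in this serialization the work's URI is the subject and the VIAF URI is the object. The helper only handles URI-only triples; literals would need quoting and escaping.

```python
DCTERMS = "http://purl.org/dc/terms/"

def nt_uri(uri):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{uri}>"

def to_ntriple(subject, predicate, obj):
    """Serialize a triple whose three terms are all URIs as one N-Triples line."""
    return f"{nt_uri(subject)} {nt_uri(predicate)} {nt_uri(obj)} ."

work = "http://digital.library.wisc.edu/1793/22088"   # "Innkeeper at the Roach Motel"
author = "http://viaf.org/viaf/21599115/"             # Dorothea Salo (VIAF)

# "dcterms:creator" is shorthand: namespace prefix + local name = full URI.
line = to_ntriple(work, DCTERMS + "creator", author)
print(line)
# prints: <http://digital.library.wisc.edu/1793/22088> <http://purl.org/dc/terms/creator> <http://viaf.org/viaf/21599115/> .
```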
27. MODS, too.
Hey, look, URIs!
(this is new in MODS version 3.4)
29. (you should be able to read these diagrams now)
Diagram: Stephen J. Miller, "Teaching RDA after the National Implementation Decisions"
30. (even these)
Diagram: Stephen J. Miller, "Teaching RDA after the National Implementation Decisions"
31. But... but...
• What if the same thing has two URIs?
• Foreseen problem! There are ways for linked data to express URI equivalences... though there are huge arguments about when two URIs are really-truly equivalent.
• My sense is that this decision is contextual. (AKA: "Will Amazon.com use FRBR?") What's equivalent for your purposes may not be for mine. And that's okay!
• Where do we get URIs from?
• This will be part of the new cataloging infrastructure a-borning, but the answer works out to "a lot of the same places we already get authority information and catalog records from," e.g. VIAF.
• But we're no longer LIMITED to just those! Key point. Think about ORCID!
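One common linked-data way to assert that two URIs name the same thing is owl:sameAs. Because equivalence is contextual, the Python sketch below lets each application keep its own set of accepted equivalences and merges them with a small union-find structure. All example.org URIs are hypothetical, and the bookstore/library split only illustrates the Amazon-and-FRBR point.

```python
class Equivalences:
    """One application's own view of which URIs name the same thing."""

    def __init__(self):
        self.parent = {}

    def find(self, uri):
        self.parent.setdefault(uri, uri)
        while self.parent[uri] != uri:
            self.parent[uri] = self.parent[self.parent[uri]]  # path halving
            uri = self.parent[uri]
        return uri

    def same_as(self, a, b):
        """Record that, for OUR purposes, a and b are equivalent."""
        self.parent[self.find(a)] = self.find(b)

    def equivalent(self, a, b):
        return self.find(a) == self.find(b)

work = "http://example.org/work/1"
ebook = "http://example.org/manifestation/1-ebook"
paperback = "http://example.org/manifestation/1-paperback"

bookstore = Equivalences()   # doesn't care about FRBR levels
bookstore.same_as(ebook, work)
bookstore.same_as(paperback, work)

library = Equivalences()     # keeps manifestations distinct

print(bookstore.equivalent(ebook, paperback))  # True
print(library.equivalent(ebook, paperback))    # False
```

Neither answer is wrong; each application applies the equivalences it trusts.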
34. But... but...
• Where's the record? And standards for the record?
• The record is what you make it! There'll be metric tons of data about Tchaikovsky linking to (and thus reachable through) his URL. (Somebody'll make a list of his pet dogs' names. Guaranteed. People are funny about dogs.) What's useful to you, you use. What isn't, you ignore. That's how the open world works.
• If we need to impose rules on the data we'll be putting out there (and we probably do!), there are ways to do that. We just can't expect to impose those ways on anybody else. (Though we can put our rules out there for others to follow, and we probably should!)
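"The record is what you make it" can be sketched in Python as a function that gathers whatever triples mention a URI, keeping only the properties an application has chosen and ignoring the rest (the dogs' names included). The triples and the example.org URIs are hypothetical; FOAF name and Dublin Core creator are real properties used for illustration.

```python
TCHAIKOVSKY = "http://viaf.org/viaf/99258155/"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
DC_CREATOR = "http://purl.org/dc/terms/creator"

triples = [
    (TCHAIKOVSKY, FOAF_NAME, "Tchaikovsky, Peter Ilich"),
    (TCHAIKOVSKY, "http://example.org/pets/dogName", "Hypothetical Dog"),
    ("http://example.org/score/1", DC_CREATOR, TCHAIKOVSKY),
]

OUTGOING = {FOAF_NAME: "name"}     # statements about the URI that we want
INCOMING = {DC_CREATOR: "works"}   # links pointing at the URI that we want

def make_record(uri, triples, outgoing, incoming):
    """Collect the wanted properties of `uri` plus wanted links that point to it."""
    record = {}
    for s, p, o in triples:
        if s == uri and p in outgoing:
            record.setdefault(outgoing[p], []).append(o)
        elif o == uri and p in incoming:
            record.setdefault(incoming[p], []).append(s)
    return record  # anything not asked for (the dog) is simply ignored

print(make_record(TCHAIKOVSKY, triples, OUTGOING, INCOMING))
# {'name': ['Tchaikovsky, Peter Ilich'], 'works': ['http://example.org/score/1']}
```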
35. Trust: an unsolved problem
• Review: what happened with <meta> tags on the web?
• Right. What's to stop the same thing happening in a linked-data environment?
• What's to stop me from saying I'm Tchaikovsky?
• The SemWeb people handwaved this for a long time.
• For our purposes? We'll pick and choose the vocabularies and domains we trust, I expect, just as we already do.
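A minimal Python sketch of picking and choosing trusted domains: each statement carries the source it came from, and only statements whose source host is on an allowlist survive. Tracking a source per statement is an assumption of this sketch (RDF stores often use named graphs for this), and the allowlisted hosts are illustrative.

```python
from urllib.parse import urlparse

TRUSTED_HOSTS = {"viaf.org", "id.loc.gov"}  # illustrative allowlist

statements = [
    # (subject, predicate, object, source the statement came from)
    ("http://viaf.org/viaf/99258155/", "http://xmlns.com/foaf/0.1/name",
     "Tchaikovsky, Peter Ilich", "http://viaf.org/viaf/99258155/"),
    # Someone (hypothetically) claiming to be Tchaikovsky:
    ("http://example.org/me", "http://www.w3.org/2002/07/owl#sameAs",
     "http://viaf.org/viaf/99258155/", "http://example.org/me"),
]

def trusted(statements, hosts):
    """Keep only statements whose source URL's host is on the allowlist."""
    return [st for st in statements if urlparse(st[3]).hostname in hosts]

for s, p, o, source in trusted(statements, TRUSTED_HOSTS):
    print(s, p, o)
```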
36. RDF in XML
• RDF has its own namespace, but no fixed XML schema (it's an open-ended universe!).
• Root element: <rdf:RDF>
• Vocabulary in any other XML namespace can be shoehorned into RDF triples.
• But don't fool yourself: RDF triples and graphs and standard XML vocabulary hierarchies do NOT map cleanly or automatically to each other.
• So MARC/AACR2 is FAR from the only metadata expression that's looking at a retooling!
• Typical triple expression in XML:
• <rdf:Description rdf:about="{subject}"> <predicate rdf:resource="{object}" /> </rdf:Description>
• (If the object is a literal instead: <predicate>{object}</predicate>.)
• XML is NOT the only syntax for RDF.
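The following Python sketch emits one description in RDF/XML with the standard library's ElementTree, then parses it back. Note that the attribute is rdf:about, not bare about, and that a URI object goes in an rdf:resource attribute on the property element, while a literal object becomes the element's text. The example reuses the URIs from the "URI-izing a triple" slides.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("rdf", RDF)        # serialize with the usual prefixes
ET.register_namespace("dcterms", DCTERMS)

root = ET.Element(f"{{{RDF}}}RDF")
desc = ET.SubElement(root, f"{{{RDF}}}Description",
                     {f"{{{RDF}}}about": "http://digital.library.wisc.edu/1793/22088"})
# URI object: goes in rdf:resource.
ET.SubElement(desc, f"{{{DCTERMS}}}creator",
              {f"{{{RDF}}}resource": "http://viaf.org/viaf/21599115/"})
# Literal object: goes in the element text.
ET.SubElement(desc, f"{{{DCTERMS}}}title").text = "Innkeeper at the Roach Motel"

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```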
37. Retooling tools: GRDDL, SKOS, and OWL
• Gleaning Resource Descriptions from Dialects of Languages (GRDDL)
• W3C standard for providing a transformation of an existing XML vocabulary into an RDF expression.
• Once there's a GRDDL transform, users of the vocabulary need change (almost) nothing! Vocabulary instance + GRDDL transform = RDF!
• Simple Knowledge Organization System (SKOS)
• RDF data model (plus URIs, of course) for commonly used controlled-vocabulary structures such as thesauri and subject-heading lists.
• Web Ontology Language (OWL; yes, I know)
• SEMWEB NERDS ONLY. Ontologies are serious business.
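To show what SKOS adds, this Python sketch writes two thesaurus entries as plain triples using real SKOS properties (prefLabel, altLabel, broader) and then finds a concept from either its preferred or its variant label. The concept URIs under example.org are hypothetical, and labels are simplified to plain strings.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/subjects/"  # hypothetical vocabulary namespace

def concept(uri, pref, alts=(), broader=None):
    """Express one thesaurus entry as SKOS triples."""
    triples = [(uri, RDF_TYPE, SKOS + "Concept"),
               (uri, SKOS + "prefLabel", pref)]
    triples += [(uri, SKOS + "altLabel", alt) for alt in alts]
    if broader:
        triples.append((uri, SKOS + "broader", broader))
    return triples

thesaurus = (concept(BASE + "music", "Music")
             + concept(BASE + "symphonies", "Symphonies",
                       alts=["Symphony"], broader=BASE + "music"))

def lookup(label, triples):
    """Find the concept URI for a preferred OR variant label."""
    for s, p, o in triples:
        if p in (SKOS + "prefLabel", SKOS + "altLabel") and o == label:
            return s
    return None

print(lookup("Symphony", thesaurus))  # http://example.org/subjects/symphonies
```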
38. So what's this "RDA Vocabularies" work that Diane and Karen et al. are doing?
Assigning URIs to stuff in RDA, so that systems expecting URI-linked data get it.
Seriously.
That's what all the fuss is about.
39. RDFizing RDA
• What does RDA actually talk about?
• FRBR model: Group 1, 2, and 3 entities
• (though Group 1 is still kind of squidgy, really, and some application developers are questioning its usefulness)
• DCMI model (because life can NEVER be simple)
• Relationships among entities
• What do we want to say about them?
• Are there existing ways to say these things that are good enough for our purposes? Can we reuse them, or at least map to them?
• When there aren't, how do we say what we need to in ways that are most useful for the rest of the world?
• Assigning URIs to it all
40. Model friction
• FRBR: entity-relationship model
• ... like relational databases, which is nice
• not entirely RDFish, which is not quite so nice and has caused head-scratching
• But head-scratching is normal in this space! Modeling is hard!
• FRBR does give us some abstractions to model and assign URIs to.
• And IFLA was supposed to do that... but they haven't.
• So the RDA folks have provisionally done it: FRBRoo.
• When IFLA gets back in the game, formal equivalences will be defined and published between FRBRoo and whatever IFLA comes up with.
• FRBR isn't perfect. (Gasp. I know, right?)
• So sticking strictly to FRBR as we model (relationships particularly) causes problems for music and multimedia catalogers, among others.
41. RDA properties
• Expressed (URLized) without reference to FRBR.
• This is also the variant the linked-data web will generally see and use.
• Which makes a certain amount of sense, because it's quite possible to understand a lot of bibliographic data intuitively without reference to FRBR.
• And we'll never get the whole world to agree on FRBR; we can't even agree ourselves!
• Each is given "subproperties" that are the same thing, only FRBRized (and with their own URLs).
• So the linked-data web sees a URL for "Book format."
• But we, because we are librarians and our systems understand us, understand that "Book format" is intrinsically tied up with a Manifestation.
• This also covers us when an RDA property may apply to more than one FRBR entity, e.g. Extent: it's the same property, but two subproperties!
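How can a FRBRized subproperty still count as the plain RDA property? The standard RDFS answer is rdfs:subPropertyOf: from (s, sub, o) and "sub rdfs:subPropertyOf super", infer (s, super, o). The Python sketch below performs that inference, following chains of subproperties. The property URIs are hypothetical stand-ins, not the actual RDA registry URIs.

```python
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
EX = "http://example.org/rda/"  # hypothetical stand-in namespace, NOT the RDA registry

schema = [
    (EX + "extentOfManifestation", RDFS_SUBPROPERTY_OF, EX + "extent"),
    (EX + "extentOfExpression", RDFS_SUBPROPERTY_OF, EX + "extent"),
]
data = [
    ("http://example.org/manifestation/1", EX + "extentOfManifestation", "300 p."),
]

def infer_superproperties(data, schema):
    """Add (s, super, o) for every (s, sub, o), following subproperty chains."""
    parents = {}
    for sub, pred, sup in schema:
        if pred == RDFS_SUBPROPERTY_OF:
            parents.setdefault(sub, set()).add(sup)
    inferred = set(data)
    frontier = list(data)
    while frontier:
        s, p, o = frontier.pop()
        for sup in parents.get(p, ()):
            new = (s, sup, o)
            if new not in inferred:  # also stops loops on cyclic declarations
                inferred.add(new)
                frontier.append(new)
    return inferred

for triple in sorted(infer_superproperties(data, schema)):
    print(triple)
```

Inference runs one way only: a statement made with the general property says nothing about which specific subproperty applies.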
42. Diagram: Hillmann et al., "RDA Vocabularies: Process, Outcome, Use," D-Lib Magazine. http://www.dlib.org/dlib/january10/hillmann/01hillmann.html
43. The ugliest case:
Diagram: Hillmann et al., "RDA Vocabularies: Process, Outcome, Use," D-Lib Magazine. http://www.dlib.org/dlib/january10/hillmann/01hillmann.html
44. ... wait, where did Dublin Core go?
• Dublin Core, as we all know, is annoyingly vague.
• Sadly, there's an awful lot of DC data that we'll have to map into this model.
• Ironic but true: librarians invented DC for the larger web, and then became nearly the only people to use it extensively.
• "Superproperties": DC terms that map to several RDA properties. (E.g. "creator")
• Probably the worst way to solve the problem... except for all the other ways.
45. Disentangling aggregated statements
• The last refuge of the text string!
• E.g. publication statements, which aggregate place, publisher name, and date of publication.
• What if you only WANT one of those three bits of information? ARGH.
• RDA doesn't fix this. So RDA Vocabs is trying to.
• First, URLize each piece separately. Cool. Done. No problem.
• Then define a "Syntax Encoding Scheme" for the aggregate. Yuck.
• I have to tell you, this is a heinously ugly "fix." Given legacy data, though, it's hard to imagine a better one.
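A Python sketch of the disentangling problem: split an ISBD-style publication statement ("Place : Publisher, Date") into three separate values, and re-aggregate them for display. The parser is deliberately naive and the example statement is hypothetical; real legacy strings (multiple places, bracketed or abbreviated data) will defeat it, which is exactly why the aggregated text string is the problem.

```python
import re

# "Place : Publisher, Date" with an optional trailing period.
PATTERN = re.compile(r"^(?P<place>.+?) : (?P<publisher>.+), (?P<date>[^,]+?)\.?$")

def split_publication(statement):
    """Return {'place', 'publisher', 'date'}, or None if the string doesn't fit."""
    m = PATTERN.match(statement.strip())
    if m is None:
        return None  # leave unparseable legacy strings for a human
    return {k: v.strip() for k, v in m.groupdict().items()}

def join_publication(parts):
    """Re-aggregate the separate values into one display string."""
    return f"{parts['place']} : {parts['publisher']}, {parts['date']}"

parts = split_publication("London : Example Press, 2011.")
print(parts)                    # {'place': 'London', 'publisher': 'Example Press', 'date': '2011'}
print(join_publication(parts))  # London : Example Press, 2011
```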
46. There's more modeling pilpul.
A lot of it.
I'll spare you.
Like I said, it's plumbing.
47. What is actually happening?
• We're figuring out what we're talking about.
• We're figuring out what we want to say about it.
• We're assigning URIs to all those things (abstractions included!) so that we can exchange information with the rest of the web.
48. Summary by Diane Hillmann
http://managemetadata.org/blog/2011/09/08/fine-wine-and-old-fish/
• Data should be able to be encoded in a variety of ways, to suit a variety of functions, uses, and systems.
• Data should be managed at a granular, statement level, but also be available in a variety of record "formats." (with records being understood as primarily an on-the-fly method of aggregating data for a variety of downstream users)
• Although current data is expressed mostly as text strings, data improvement strategies will be designed to change most of them to URIs as soon as practicable.
• Data definitions and specifications will be easily available on the web, allowing mapping to be simpler and easier to tweak.
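Hillmann's first two points can be sketched in Python: keep the data as individual statements, and generate "records" in whatever format a downstream user wants. N-Triples and a flat JSON record are just two example outputs; the URI-versus-literal test is a crude heuristic for this sketch only.

```python
import json

statements = [
    ("http://digital.library.wisc.edu/1793/22088",
     "http://purl.org/dc/terms/creator", "http://viaf.org/viaf/21599115/"),
    ("http://digital.library.wisc.edu/1793/22088",
     "http://purl.org/dc/terms/title", "Innkeeper at the Roach Motel"),
]

def is_uri(value):
    # Crude heuristic for this sketch; real data should mark URIs explicitly.
    return value.startswith(("http://", "https://"))

def as_ntriples(stmts):
    """One encoding: N-Triples, one statement per line."""
    lines = []
    for s, p, o in stmts:
        # json.dumps yields a quoted, escaped string that also works as an
        # N-Triples literal for simple strings like these.
        obj = f"<{o}>" if is_uri(o) else json.dumps(o)
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

def as_json_record(subject, stmts):
    """Another encoding: a flat JSON 'record' assembled on the fly."""
    record = {"id": subject}
    for s, p, o in stmts:
        if s == subject:
            record[p.rsplit("/", 1)[-1]] = o  # local name as the field key
    return json.dumps(record, ensure_ascii=False)

print(as_ntriples(statements))
print(as_json_record("http://digital.library.wisc.edu/1793/22088", statements))
```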
49. Library workflows
• Given what you now know about RDF and linked data, and your experiences with cataloging, how do you think the practice of cataloging will change in an RDF-based environment?
50. SPARQL
• With XML data, you generally just dump it on the web and let people figure out what (if anything) to do with it.
• This means a lot of translator-writing and bandwidth cost.
• (There's an XML query language called XQuery, but nobody uses it.)
• You can do this with RDF too (and some do), but it's not really ideal.
• SPARQL: query language for RDF.
• Looks a LOT like SQL, intentionally so. The hardest thing to get to grips with is namespace declarations, and that's not really all that hard.
• "SPARQL endpoint": URL for a given set of RDF data that you can send queries to and get answers from.
• How does this change your answer about library workflows?
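To show roughly what a SPARQL SELECT does, this Python sketch matches a single triple pattern (variables start with "?") against a list of triples; a real SPARQL engine does far more (joins across patterns, FILTER, OPTIONAL, and so on). It also builds the HTTP GET URL that the SPARQL 1.1 Protocol uses to send a query to an endpoint via the "query" parameter. The endpoint URL is hypothetical.

```python
from urllib.parse import urlencode

# The SPARQL query this sketch imitates (note the namespace declaration).
QUERY = """PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?work WHERE { ?work dcterms:creator <http://viaf.org/viaf/21599115/> }"""

def match(pattern, triples):
    """Match one (s, p, o) pattern; terms starting with '?' are variables."""
    results = []
    for triple in triples:
        binding = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                if binding.setdefault(want, have) != have:
                    break  # same variable would need two different values
            elif want != have:
                break      # fixed term doesn't match
        else:
            results.append(binding)
    return results

triples = [("http://digital.library.wisc.edu/1793/22088",
            "http://purl.org/dc/terms/creator", "http://viaf.org/viaf/21599115/")]
print(match(("?work", "http://purl.org/dc/terms/creator",
             "http://viaf.org/viaf/21599115/"), triples))

# SPARQL 1.1 Protocol: a query can be sent as the "query" parameter of a GET.
endpoint = "http://example.org/sparql"  # hypothetical endpoint
url = endpoint + "?" + urlencode({"query": QUERY})
print(url)
```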
52. Your data ages like [wine]
Photo: Matthew, "red wine bottle 1" http://www.flickr.com/photos/falcon1961/3408961521/ CC-BY
53. Your software applications age like [fish]
Photo: amanda mandy, "peixe pelo todo." http://www.flickr.com/photos/polaina/3128038858/ CC-BY