Presentation given on Dec. 4, 2014 at the University of Hawaii Library, on the topic of changes in the library metadata world, with a focus on Linked Open Data.
1. Diane I. Hillmann
Metadata Management Associates LLC
Presentation at the University of Hawaii
Wednesday, December 3, 2014
2. After we empty the
card catalog …
We hear that there’s
a crisis in libraries,
but we still haven’t
realized how
pervasive it is
Reality: we’ve gotten
rid of the cards, now
we need to get rid of
the catalog.
If we don’t, we may
lose our institutional
support, our mission,
and our way …
Linked Open Data, Hawaii (Dec. 2014) 2
3. “As librarians, we pride ourselves on operating outside of the
commercial marketplace. However, whether we like it or not, we
are working in an information environment the dynamics of which
are very much like those of a free market, except that the currency
spent by our “customers” is not money, but time and attention. …
We may believe, for example, that our carefully-crafted catalog
records provide excellent value in return for the time and energy
required to use them—and we may be right. But if our patrons
doubt that the catalog will return good value in exchange for the
time and energy required to use it, then whatever value the catalog
may actually contain becomes irrelevant.”
Rick Anderson, The Crisis in Research Librarianship,
Journal of Academic Librarianship, July 2011
4. “We must look with cold and hard-headed rationality at our
current practices and ask ourselves not what value they offer,
but rather what value our patrons believe they offer. If what we
offer our patrons is not perceived as valuable by them, then we
have two choices: change their minds, or redirect our resources.
The former is virtually impossible; the latter is enormously
painful. But the latter is possible, and if we do not undertake
such a redirection ourselves, it will almost certainly be
undertaken for us.”
Rick Anderson, The Crisis in Research Librarianship,
Journal of Academic Librarianship, July 2011
5. “Wikipedia is founded on the belief (largely correct, as it
turns out) that crowds both can and will provide high-quality
content and metadata to the world at no charge.
For our part, in research libraries we still tend to treat
books as if they are primarily tools for linear reading, and
metadata records as artisanal products. We still build
collections that are fenced off from the larger information
world and encourage our patrons, against all reason, to
begin their information searches within the confines of
our artificially limited collections.”
Rick Anderson, The Crisis in Research Librarianship,
Journal of Academic Librarianship, July 2011
6. “In the big picture, very little will change: libraries
will need to be in the data business to help people
find things. In the close-up view, everything is
changing-- the materials and players are different,
the machines are different, and the technologies
can do things that were hard to imagine even 20
years ago.”
Eric Hellman
http://go-to-hellman.blogspot.com/2011/07/library-data-why-bother.html
7. “Today, we face another significant time of change that is
being prompted by today’s library user. This user no
longer visits the physical library as his primary source of
information, but seeks and creates information while
connected to the global computer network. The change
that libraries will need to make in response must include
the transformation of the library’s public catalog from a
stand-alone database of bibliographic records to a highly
hyperlinked data set that can interact with information
resources on the World Wide Web. The library data can
then be integrated into the virtual working spaces of the
users served by the library.”
--Karen Coyle, Understanding the Semantic Web: Bibliographic Data and Metadata, Jan. 2010
8. “If all of this sounds otherworldly and vague, it is
because there is no specific vision of where these
changes will lead us. The crystal ball is unfortunately
shortsighted, in no small part because this is a time of
rapid change in many aspects of the information
ecology. The few things that are certain, however,
point to the Web, and its eventual successors, as the
place to be. For libraries, this means yet another
evolutionary step in the library of our catalog: from
metadata to metaDATA.”
--Karen Coyle, Understanding the Semantic Web: Bibliographic Data and
Metadata, Jan. 2010
9. Questionable Assumptions?
We’re going to continue to build records for library catalogs
We’ve always shared ‘records’ in cataloging, and that’s still
the right way to share data
The choice of the ‘right metadata format’ (e.g., DC, MODS,
RDA, etc.) is critically important
The proliferation of metadata formats is a bad thing
The ‘old’ way of cataloging materials one-at-a-time always
produces better quality data than any other method
10. Questioning Our
Data Models
Today’s metadata is not
about choices of formats, it’s
about ensuring
interoperability and
harmonization for our data in
the world
Our old model is based on
catalog cards, regardless of
the methods of storage and
delivery through our online
catalogs
The new metadata
environment provides better
ways to express
relationships—both content
to content and concept to
concept
11. Model of ‘the World’ /XML
XML assumes a 'closed' world (domain),
usually defined by a schema:
"We know all of the data describing this
resource. The single description must be a
valid document according to our schema.
The data must be valid.”
XML's document model provides a
neat equivalence to a metadata
'record’
12. Model of ‘the World’ /RDF
RDF assumes an 'open' world:
"There's an infinite amount of unknown data
describing this resource yet to be discovered. It
will come from an infinite number of providers.
There will be an infinite number of descriptions.
Those descriptions must be consistent."
RDF's statement-oriented data model
has no notion of 'record’ (rather,
statements can be aggregated for a
fuller description of a resource)
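The aggregation idea can be sketched in a few lines of Python. This is a minimal illustration, not an RDF implementation: statements are plain (subject, predicate, object) tuples, and the two providers and all `ex:` identifiers are hypothetical placeholders.

```python
from collections import defaultdict

# Each provider publishes its own statements about a resource.
# All identifiers below are made-up placeholders (assumptions).
library_statements = {
    ("ex:PrideAndPrejudice", "ex:hasAuthor", "ex:JaneAusten"),
    ("ex:PrideAndPrejudice", "ex:hasDateOfPublication", "1813"),
}
publisher_statements = {
    ("ex:PrideAndPrejudice", "ex:hasAuthor", "ex:JaneAusten"),  # same fact, stated again
    ("ex:PrideAndPrejudice", "ex:hasLanguage", "English"),
}


def aggregate(*sources):
    """Merge statements from any number of providers; exact duplicates collapse."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def describe(statements, subject):
    """Assemble a description of one resource on demand from loose statements."""
    description = defaultdict(set)
    for s, p, o in statements:
        if s == subject:
            description[p].add(o)
    return dict(description)


all_statements = aggregate(library_statements, publisher_statements)
description = describe(all_statements, "ex:PrideAndPrejudice")
for predicate in sorted(description):
    print(predicate, sorted(description[predicate]))
```

Nothing here is stored as a 'record': the description is assembled when it is needed, and a third provider can be added later without changing anything that already exists.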
13. RDF? Huh?
The current Web is primarily a Web of DOCUMENTS, where
URLs embedded in documents link to other documents. The
Semantic Web is a Web of DATA that exists outside of
documents, and focuses on meaning or semantics
RDF is a general-purpose framework that provides
structured, machine-understandable metadata for the Web
RDF Schema (RDFS) describes the meaning of each
property name; the Web Ontology Language (OWL) is also used
Metadata vocabularies can be developed without central
coordination
14. Semantic Web Building Blocks
Each component of an RDF statement (triple) is a
“resource”
RDF is about making machine-processable
statements, requiring
A machine-processable language for representing RDF
statements
A system of machine-processable identifiers for
resources (subjects, predicates, objects)
Uniform Resource Identifier (URI)
For full machine-processing potential, an RDF statement
is a set of three URIs
15. Subject Predicate Object
Austen, Jane [Subject] -- is author of --> Pride and prejudice [Object]
Austen, Jane [Subject] -- has place of residence, etc. --> Bath, UK [Object]
Pride and prejudice [Subject] -- has date of publication --> “1813” [Object]
17. What is
Linked Open
Data?
”… a term used to
describe a recommended
best practice for exposing,
sharing, and connecting
pieces of data,
information, and
knowledge on the
Semantic Web using URIs
and RDF."
18. Five-Star Linked Data
Make your stuff available on the Web (whatever format)
under an open license
Make it available as structured data (e.g. Excel instead of an
image scan of a table)
Use non-proprietary formats (e.g. CSV instead of Excel)
Use URIs to denote things, so that people can point at your
stuff
Link your data to other data to provide context
19. Linked Data is Inherently
Chaotic
Requires creating and aggregating data in a broader context
There is no one ‘correct’ record to be made from this data, no
objective ‘truth’
This approach is different from the cataloging tradition
BUT, the focus on vocabularies is familiar
Linked data relies on the RDF model (although XML can be
used to express RDF, it’s not always a happy marriage)
The bottom-up chaos and uncertainty of the linked data
world is possibly the hardest thing for catalogers to get their
heads around
20. Delving
Deeper Into
Data
Where do we find sources of data
that might be useful to us in the
short and long term?
How do we assess this data for
quality and stability?
Are we sure this data will work
better for us than what we use
now?
22. Where do the Identifiers come
from?
LC NAF (Jane Austen)
RDA Registry (isAuthorOf)
Worldcat (Work identifier)
Bath, UK (Geonames)
Date is a ‘literal’, with no identifier (but could be ‘typed’)
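As a rough sketch of how these pieces fit together, the Python below builds the Jane Austen statements with an identifier in every position except the date, which is a typed literal. The `example.org` URIs are placeholders standing in for the real identifiers each source would supply, and the `Triple` and `Literal` classes are hypothetical helpers, not part of any RDF library. Only the XSD datatype URI is a real one.

```python
from dataclasses import dataclass
from typing import Optional, Union

XSD_GYEAR = "http://www.w3.org/2001/XMLSchema#gYear"  # standard XSD year datatype


@dataclass(frozen=True)
class Literal:
    """A plain value with no identifier of its own; the datatype is optional."""
    value: str
    datatype: Optional[str] = None


@dataclass(frozen=True)
class Triple:
    subject: str                # URI
    predicate: str              # URI
    obj: Union[str, Literal]    # URI, or a Literal


def is_three_uris(triple):
    """True when the object is itself an identifier rather than a literal."""
    return not isinstance(triple.obj, Literal)


# Placeholder URIs (assumptions); real ones would be looked up in each source.
AUSTEN = "http://example.org/lc-naf/austen-jane"
IS_AUTHOR_OF = "http://example.org/rda-registry/isAuthorOf"
HAS_RESIDENCE = "http://example.org/rda-registry/hasPlaceOfResidence"
HAS_DATE = "http://example.org/rda-registry/hasDateOfPublication"
WORK = "http://example.org/worldcat/work/pride-and-prejudice"
BATH = "http://example.org/geonames/bath-uk"

triples = [
    Triple(AUSTEN, IS_AUTHOR_OF, WORK),
    Triple(AUSTEN, HAS_RESIDENCE, BATH),
    Triple(WORK, HAS_DATE, Literal("1813", XSD_GYEAR)),  # typed literal, no URI
]

for t in triples:
    print(is_three_uris(t), t.subject, t.predicate, t.obj)
```

The first two statements can be followed in both directions by a machine because every part is an identifier. The third ends at the literal: "1813" can be compared as a year, but nothing further can be said about it.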
27. Jane Austen identifier
information from VIAF
“Pride and prejudice”
work identifier
Note that the ‘person’ portion is embedded from the VIAF files,
But the record is about “Pride and prejudice” and the display is created
by OCLC.
36. New Data Management?
Managing data at the statement level rather than record
level
Emphasis on evaluation coming in and provenance going
out
Shift in human effort from creating standard cataloging
records to knowledgeable human intervention in machine-based
processes
Extensive use of data created outside libraries
Intelligent re-use of our legacy data and redistribution of our
data more widely
37. Big Challenges/Big Ideas
Records are still important but not as we’ve used them in the
past
We might want to think about records as the instantiation of a
point of view [News: traditional library data has a point of view]
In this world, records are ‘packages’ for pickup and delivery
MARC required consensus because of limitations built into
the technology
For any data in statements destined for the Semantic Web, we
need provenance, so we know “Who sez?”
Being able to assign quality and trust markers for statements
based on who, what, when is critical
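One way to picture "Who sez?" is to carry the source and date alongside every statement and filter on them. The Python sketch below assumes a hypothetical `Statement` class and an invented table of trust scores. A real policy would be set locally, and none of the names come from an existing system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    source: str      # who said it
    asserted: date   # when it was said


# Hypothetical trust scores per source (assumption, for illustration only).
TRUST = {"ex:NationalLibrary": 0.9, "ex:AnonymousCrowd": 0.4}


def trusted(statements, threshold):
    """Keep statements whose source meets the threshold; unknown sources score 0."""
    return [s for s in statements if TRUST.get(s.source, 0.0) >= threshold]


statements = [
    Statement("ex:PrideAndPrejudice", "ex:hasDateOfPublication", "1813",
              "ex:NationalLibrary", date(2014, 12, 3)),
    Statement("ex:PrideAndPrejudice", "ex:hasDateOfPublication", "1831",
              "ex:AnonymousCrowd", date(2014, 11, 1)),
    Statement("ex:PrideAndPrejudice", "ex:hasDateOfPublication", "1913",
              "ex:UnknownSource", date(2014, 10, 1)),
]

for s in trusted(statements, 0.5):
    print(s.obj, "according to", s.source, "on", s.asserted.isoformat())
```

All three conflicting dates can coexist in the data; the consumer decides which to believe. Without the `source` and `asserted` fields that decision would be impossible.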
38. Mapping Our
Way Around
-There’s not just one way to
align bibliographic data: we
don’t have to agree on one
‘authoritative’ mapping
-‘Crosswalking’ strategies,
aimed at use by particular
applications, see that
activity as primarily
accomplished by networks
-Crosswalks only recognize
one relationship: sameAs—
a very blunt instrument!
39. What We Mean by ‘Mapping’
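[Diagram: a web of mappings among extent, format, and duration properties from several vocabularies]
RDA: rdam:extent, rdam:extentOfText, rdau:extent, rdau:extentOfText, rdau:duration, rdae:duration
Dublin Core: dct:extent, dct:format, dc:format
BIBO: bibo:numVolumes, bibo:numPages
ISBD: isbd:”has extent”
MARC 21: m21:M300, m21:M306__a
UNIMARC: unim:U127__a, unim:U215__a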
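To show why a mapping can say more than 'sameAs', the Python sketch below treats a map as a list of (source, relationship, target) entries and follows only the 'subPropertyOf' links upward. The property names are taken from the slide, but the relationships assigned to them here are illustrative assumptions, not a published mapping.

```python
# (source property, relationship, target property)
# Relationship choices are illustrative assumptions, not published mappings.
MAPPINGS = [
    ("rdam:extentOfText", "subPropertyOf", "rdam:extent"),
    ("rdam:extent", "subPropertyOf", "dct:extent"),
    ("dct:extent", "subPropertyOf", "dct:format"),
]


def broader_properties(prop, mappings):
    """Follow subPropertyOf links upward, returning every broader property."""
    found = []
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for source, relation, target in mappings:
            if (source == current and relation == "subPropertyOf"
                    and target not in found):
                found.append(target)
                frontier.append(target)
    return found


def readable_as(statement, mappings):
    """Restate one triple under each broader property (a lossy 'dumb-down')."""
    subject, predicate, obj = statement
    return [(subject, broader, obj)
            for broader in broader_properties(predicate, mappings)]


detailed = ("ex:SomeBook", "rdam:extentOfText", "279 pages")  # placeholder data
for restated in readable_as(detailed, MAPPINGS):
    print(restated)
```

The mapping runs in one direction only: detailed data can be restated in a broader vocabulary, but a broad `dct:format` value cannot be promoted to `rdam:extentOfText`. A 'sameAs' crosswalk cannot express that asymmetry.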
42. Will This Shift Cost Too Much?
We need to support efforts to invest in more distributed
innovation and focused collaboration
It’s the human effort that costs us
Cost of traditional cataloging is far too high, for increasingly
dubious value
Our current investments have reached the end of their
usefulness
All the possible efficiencies for traditional cataloging have
already been accomplished
Waiting for leadership from the big players costs valuable
time with no guarantees of results
43. How Does Quality Happen?
Lessons from the library community
Quality is quantifiable and measurable
To be effective, enforcement of standards of quality must
take place at the community level
Looking more broadly:
Data problems are not unique to particular communities
General strategies can improve interoperability
Quality is not tied to any particular creation strategy
Human created metadata can be extremely variable
Machine-created metadata is far more consistent, but that
consistency may not be correct
44. The Bottom Line
Our big investment is (and has always been) in our data, not
our systems
Over many changes in format of materials, we’ve always
struggled to keep our focus on the data content that
endures, regardless of structure or presentation format
We are in a great position to have influence on how the
future develops, but we can’t be afraid to change, or afraid
to fail
45. Additional Resources
Introducing Linked Data and The Semantic Web
(http://www.linkeddatatools.com/semantic-web-basics)
Free Your Metadata (http://freeyourmetadata.org/)
Linked Open Data Laundromat (http://lodlaundromat.org)
Van Hooland, Seth and Ruben Verborgh. Linked data for
libraries, archives and museums : how to clean, link and
publish your metadata. Chicago : Neal-Schuman, 2014
46. Contact Information
Diane Hillmann
metadata.maven@gmail.com
Links:
http://RDARegistry.info
http://marc21rdf.info
http://managemetadata.com/blog/
The First MetadataMobile