2. DPLA
• What is it?
• Where are its materials coming from?
• Where is its metadata coming from?
• What does that tell us about the metadata?
• How do you think they’ll collect the metadata?
• What will they need to do with the metadata, once
collected?
• What problems will they run into, do you think?
3. Some eternal verities
• What’s in our catalogs isn’t all the
metadata (broad sense) we have.
• BLASPHEMY: a lot of that catalog metadata probably
isn’t even the most important metadata academic
libraries have! Why might that be?
• Possibly not the most prolific source of metadata
either. This will be truer as time passes. Why?
• What about public libraries? Archives?
• The rest of our metadata exists in many
forms and formats.
• The major, often only, form of interaction
with our metadata is computer-mediated.
• Other people have metadata too!
4. Practical implications
• We need to design standards and practices around
what computers do well, and what they need in
order to do what they do.
• We need to design for being PART of the data
universe, not all of it.
• “Open world assumption”: no one body has all the data! Or all the
answers!
• And nobody can impose their view of the world on everybody
else. (Fortunately, nobody necessarily has to.)
• Designing for consistency, flexibility and
extensibility without sacrificing comprehensibility
• (This is a tall order; we’re not there yet. Is anyone?)
5. Things computers like
• Unique identifiers
• for anything you plan to discuss or refer to
• that NEVER CHANGE OR DISAPPEAR. (Sorry, name-authority strings.)
• How do we do this given the open-world assumption?
• Consistent, predictable, human-language-independent
data
• Free text (including punctuation) makes computers sad. They aren’t
human. They don’t understand it. They can be cued to PRODUCE it, but
only based on rules they’re given about the underlying data.
• Computers produce typography and layout, but don’t understand
those, either.
• Controlled vocabularies
• (If they’re well-provisioned with identifiers; see above.)
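Here is a minimal Python sketch of the last two points. The record stores consistent, structured fields plus an identifier, and punctuation comes only from a rule about those fields. The `Person` fields, the example.org URL, and the heading rule are hypothetical illustrations, not any actual cataloging standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    uri: str                          # stable identifier (hypothetical URL)
    family_name: str
    given_name: str
    birth_year: Optional[int] = None


def heading(p: Person) -> str:
    """Build a display heading from structured fields.

    Hypothetical rule: "Family, Given", plus ", YYYY-" when a birth year is known.
    All the punctuation comes from this rule; none of it is stored in the data.
    """
    text = f"{p.family_name}, {p.given_name}"
    if p.birth_year is not None:
        text += f", {p.birth_year}-"
    return text


author = Person(uri="http://example.org/person/42",
                family_name="Salo", given_name="Dorothea")
print(heading(author))  # Salo, Dorothea
```

If the display convention changes, only `heading` changes. The record and its URI stay put, so nothing that links to `http://example.org/person/42` breaks. Compare that with a name-authority string, where the display form *is* the identifier.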
6. We have silos,
and we both love and hate them.
Photo: Doc Searls, “silos,” http://www.flickr.com/photos/docsearls/5500714140/ CC-BY
8. Possibility 1:
One standard to rule them all
• Issues with this?
• Technical issues
• Quality issues
• Language issues
• Sociological issues
• Who’s trying this? On what level?
9. Possibility 2:
Metasearch
• Issues with this?
• Technical issues
• Quality issues
• Sociological issues
• Who’s trying this? On what level?
Diagram: Angela Pratesi and Kalsang (by permission)
10. Possibility 2:
Metasearch
• Issues with this?
• Technical issues
• Quality issues
• Language issues
• Sociological issues
• Who’s trying this? On what level?
11. Possibility 3:
Big metadata bucket
• Issues with this?
• Technical issues
• Quality issues
• Sociological issues
• Who’s trying this? On what level?
Diagram: Angela Pratesi and Kalsang (by permission)
12. Possibility 3:
Big metadata bucket
• Issues with this?
• Technical issues
• Quality issues
• Language issues
• Sociological issues
• Who’s trying this? On what level?
13. How do you make a big
metadata bucket?
• Given...
• Different file formats (XML, relational databases,
Excel, plain text, etc.)
• Different structures with different granularity
• Different standards... or no standard at all
• Different controlled vocabularies... or none
• One option: the Google route
• But what do we lose there?
14. Crosswalking: the n×n problem
• As you build your bucket, you find that
people are using n metadata standards.
• You decide you want to be able to translate
any of them into any of the others.
• Guess what? You need to write n×n − n
(nearly n²) crosswalks.
• This gets impossibly unwieldy very quickly. How many
metadata standards do you know about, just from
this class?
• And how compatible will the standards be, anyway?
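This small Python sketch shows how fast the n×n problem grows. It lists every ordered (source, target) pair, because a crosswalk from A to B doesn't give you B to A for free. The four standard names are only examples; the count is what matters.

```python
from itertools import permutations


def pairwise_crosswalks(standards: list[str]) -> list[tuple[str, str]]:
    """One crosswalk per ordered pair of distinct standards (A->B and B->A are separate)."""
    return list(permutations(standards, 2))


standards = ["MARC", "MODS", "Dublin Core", "EAD"]
print(len(pairwise_crosswalks(standards)))  # 12, which is 4*4 - 4

# The same formula, n*n - n, for larger numbers of standards
for n in (5, 10, 20):
    print(n, n * n - n)  # 5 20, then 10 90, then 20 380
```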
15. Okay, okay, master
standard, then!
• Crosswalk everything you take in to one
standard. Then you only need to write n
crosswalks!
• Issues with this?
• Technical issues
• Quality issues
• Language issues
• Sociological issues
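The master-standard approach can be sketched in Python as a toy: one hypothetical mapping per incoming standard, all pointing at the same made-up master field names. The element names are simplified. Real MODS, for instance, nests `dateIssued` inside `originInfo`. Treat this as an illustration of the shape of the idea, not a usable crosswalk.

```python
# Hypothetical mappings from each incoming standard into one "master" schema.
# Element names are flattened and simplified for illustration.
CROSSWALKS: dict[str, dict[str, str]] = {
    "dublin_core": {"creator": "author", "title": "title", "date": "date"},
    "mods": {"name": "author", "titleInfo": "title", "dateIssued": "date"},
    "local_spreadsheet": {"Author(s)": "author", "Item title": "title"},
}


def to_master(standard: str, record: dict[str, str]) -> dict[str, str]:
    """Translate a flat record into the master schema, dropping unmapped fields."""
    mapping = CROSSWALKS[standard]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


print(len(CROSSWALKS))  # one crosswalk per incoming standard: n, not n*n - n
print(to_master("local_spreadsheet", {"Author(s)": "Salo, Dorothea",
                                      "Item title": "Innkeeper at the Roach Motel",
                                      "Shelf note": "box 3"}))
```

Look at what happens to `Shelf note`. The master standard has no slot for it, so it silently disappears. That is one of the quality issues the slide asks about.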
17. Five stars of linked data
(the first three, at least)
Sir Tim Berners-Lee:
18. Review: URLs as identifiers
• Where have we seen this already?
• Why URLs?
• What library-type stuff has already been
identified with URLs?
• What would need to be, do you think?
19. So, seriously...
• Every term in every controlled vocabulary, every
element in every metadata standard, every
“document” we might ever talk about (in all its
FRBRish permutations) needs its own URL?
• SERIOUSLY?
• ... basically, yep.
• Not every time. (Dates are dates. Human names are strings.)
• It gets worse, though: XML-based languages use element
nesting to carry meaning, and relational databases use table
membership and data typing. How do you translate THOSE
to URLs?
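Here is one possible answer for the XML half of that question, sketched in Python with the standard-library `xml.etree.ElementTree`. The code walks the tree and turns each parent/child nesting into an explicit triple, minting a property URL from each element name. The record, the example.org namespace, and the `_:n0`-style blank-node labels (nodes without URLs of their own) are assumptions for illustration. Attributes and mixed content are ignored.

```python
import xml.etree.ElementTree as ET
from itertools import count

VOCAB = "http://example.org/vocab/"  # hypothetical property namespace

XML = """<record>
  <name>
    <namePart>Salo, Dorothea</namePart>
    <role>author</role>
  </name>
  <title>Innkeeper at the Roach Motel</title>
</record>"""


def flatten(elem, subject, triples, ids):
    """Turn each parent/child nesting relationship into an explicit triple."""
    for child in elem:
        prop = VOCAB + child.tag
        if len(child):  # child has children of its own: give it its own node
            node = f"_:n{next(ids)}"
            triples.append((subject, prop, node))
            flatten(child, node, triples, ids)
        else:           # leaf element: its text becomes a literal value
            triples.append((subject, prop, (child.text or "").strip()))
    return triples


root = ET.fromstring(XML)
for t in flatten(root, "http://example.org/record/1", [], count()):
    print(t)
```

The meaning that used to live in "`namePart` is inside `name`" now lives in two explicit links: the record points to node `_:n0`, and `_:n0` carries the name and role.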
23. The fundamental strategy
• Break down everything we can say about
the world into the smallest units of
meaning we can manage.
• That’s smaller than you’d think, as we’ll see!
• Build up search indexes, user displays,
and machine interactions from there.
• I’m being vague about “machine interactions.” Don’t
take that to mean they aren’t important! They’re
just a bit more than I can explain here and now.
• Try not to reinvent wheels.
• But if you must, make sure to link new and old.
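This small Python sketch shows what "build up from there" can look like: the same three triples feed both a keyword index and a user display. The `ex:` identifiers and property names are hypothetical. Telling identifiers apart from text by their prefix is a shortcut only a toy can get away with.

```python
from collections import defaultdict

# Small units of meaning: (subject, property, value) triples (hypothetical names)
TRIPLES = [
    ("ex:book1", "ex:title", "Innkeeper at the Roach Motel"),
    ("ex:book1", "ex:author", "ex:person1"),
    ("ex:person1", "ex:name", "Salo, Dorothea"),
]


def keyword_index(triples):
    """Search index built up from the triples: word -> set of subjects."""
    index = defaultdict(set)
    for s, p, o in triples:
        if not o.startswith("ex:"):  # crude test: index literal text, skip identifiers
            for word in o.lower().replace(",", " ").split():
                index[word].add(s)
    return index


def display(subject, triples):
    """User display built up from the same triples, following the author link."""
    props = {p: o for s, p, o in triples if s == subject}
    names = {s: o for s, p, o in triples if p == "ex:name"}
    return f"{props['ex:title']} / {names[props['ex:author']]}"


print(sorted(keyword_index(TRIPLES)["salo"]))  # ['ex:person1']
print(display("ex:book1", TRIPLES))            # Innkeeper at the Roach Motel / Salo, Dorothea
```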
25. Okay, so we have a
bunch of URIs.
What do we actually DO with them?
We plug them into RDF.
26. ... vocabulary note
• “Semantic Web”: Tim Berners-Lee disappearing
into his own navel.
• Term is a bit out of favor these days.
• “Linked data”: a real-world effort to make large
datastores more interoperable
• RDF: invented by the SemWebbers, now a
cornerstone for linked data
• Does this mean that all data will be stored as RDF? NO, IT
DOES NOT (and you have my permission to slap anybody
who says it will).
• Totally possible to provide an RDF view onto non-RDF data,
IF AND ONLY IF the data structures and meanings are
thought through in an RDFfy way.
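To make "an RDF view onto non-RDF data" concrete, here is a Python sketch using the standard-library `sqlite3`. The data stays in ordinary relational tables, and a function generates triples from the rows on demand. The table layout, the example.org URL patterns, and the property names are all hypothetical. The point is that someone had to decide how tables, columns, and foreign keys map onto URLs.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URL for minted identifiers

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES person(id));
    INSERT INTO person VALUES (1, 'Salo, Dorothea');
    INSERT INTO book VALUES (1, 'Innkeeper at the Roach Motel', 1);
""")


def rdf_view(conn):
    """Yield triples describing the rows; the tables themselves are unchanged."""
    for pid, name in conn.execute("SELECT id, name FROM person"):
        yield (f"{BASE}person/{pid}", f"{BASE}vocab/name", name)
    for bid, title, author_id in conn.execute("SELECT id, title, author_id FROM book"):
        book = f"{BASE}book/{bid}"
        yield (book, f"{BASE}vocab/title", title)
        # The foreign key becomes a link between two URLs
        yield (book, f"{BASE}vocab/author", f"{BASE}person/{author_id}")


for triple in rdf_view(conn):
    print(triple)
```

Every line in `rdf_view` is one of those thought-through decisions. If a database overloads a column or leaves the meaning of a table implicit, there is no sensible triple to generate. (The W3C's R2RML standard exists for writing such mappings declaratively; this sketch doesn't use it.)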
27. What to do with URIs
• RDF’s answer: “We say things about stuff.”
• At base, RDF really is that simple!
• Base unit of RDF: “triple”
• Subject, property, value/object. Much like subject-verb-object in an English sentence.
• Example: “Dorothea Salo is the author of ‘Innkeeper at the
Roach Motel.’”
Diagram: Dorothea Salo → isAuthorOf → “Innkeeper at the Roach Motel”
... wait. Where’d all the URLs go?
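This minimal Python sketch shows where the URLs go. It writes the example triple in N-Triples, a simple line-per-triple RDF syntax. In N-Triples, URLs sit in angle brackets and each line ends with a period. The example.org URLs for the person, the `isAuthorOf` property, and the work are made up. `http://purl.org/dc/terms/title` is the real Dublin Core Terms title property.

```python
# Hypothetical URLs standing in for the person, the property, and the work
PERSON = "http://example.org/person/dorothea-salo"
IS_AUTHOR_OF = "http://example.org/vocab/isAuthorOf"
WORK = "http://example.org/work/innkeeper-at-the-roach-motel"
DCT_TITLE = "http://purl.org/dc/terms/title"  # Dublin Core Terms "title"


def iri(url: str) -> str:
    """Write a URL as an N-Triples IRI term."""
    return f"<{url}>"


def literal(text: str) -> str:
    """Write plain text as an N-Triples string literal.

    Escapes backslashes and double quotes only; newlines and other
    control characters would also need escaping, omitted in this sketch.
    """
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def ntriple(s: str, p: str, o: str) -> str:
    """One triple per line: subject, property, object, then a period."""
    return f"{s} {p} {o} ."


print(ntriple(iri(PERSON), iri(IS_AUTHOR_OF), iri(WORK)))
print(ntriple(iri(WORK), iri(DCT_TITLE), literal("Innkeeper at the Roach Motel")))
```

The human-readable strings haven't vanished. They've moved into literals attached to the URLs, like the title on the second line.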
30. Building up from triples
Diagram: Stephen J. Miller, “Teaching RDA after the National Implementation Decisions”
31. ... which can get tangled
Diagram: Stephen J. Miller, “Teaching RDA after the National Implementation Decisions”
32. But... but...
• What if the same thing has two URIs?
• Foreseen problem! There are ways for linked data to express
URI equivalences... though there are huge arguments about
when two URIs are really-truly equivalent.
• My sense is that this decision is contextual. (AKA: “will
Amazon.com use FRBR?”) What’s equivalent for your
purposes may not be for mine. And that’s okay!
• Where do we get URIs from?
• This will be part of the new cataloging infrastructure a-
borning, but the answer works out to “a lot of the same
places we already get authority information and catalog
records from,” e.g. VIAF.
• But we’re no longer LIMITED to just those! Key point. Think
about ORCID!
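This Python sketch treats URI equivalence as a contextual decision. It merges `owl:sameAs` claims with a simple union-find. `owl:sameAs` is a real OWL property for saying that two URIs name the same thing. The code applies only the claims your `accept` function approves. The URIs and the work-versus-edition rule are hypothetical, a stand-in for the FRBR-style judgment call on the slide.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def equivalence_classes(triples, accept):
    """Group URIs linked by owl:sameAs, applying only the claims accept() approves."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for s, p, o in triples:
        if p != OWL_SAME_AS:
            continue
        root_s, root_o = find(s), find(o)  # registers both URIs even if the claim is rejected
        if accept((s, p, o)):
            parent[root_s] = root_o

    groups: dict[str, set[str]] = {}
    for uri in parent:
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


# Hypothetical claims gathered from different datasets
claims = [
    ("http://example.org/work/1", OWL_SAME_AS, "http://example.net/work/9"),
    ("http://example.net/work/9", OWL_SAME_AS, "http://example.com/edition/3"),
]

loose = equivalence_classes(claims, accept=lambda t: True)
# A FRBR-minded context refuses to equate a work with an edition
strict = equivalence_classes(claims, accept=lambda t: "/edition/" not in t[0] + t[2])
print(len(loose), len(strict))  # 1 2
```

The loose context treats the work and the edition as one thing. The stricter context keeps them apart. Neither is wrong; they serve different purposes.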
34. But... but...
• Where’s the record? And standards for
the record?
• The record is what we make it! What’s useful to us,
we use. What isn’t, we ignore. That’s how the open
world assumption works.
• If we need to impose rules on the data we’ll be
putting out there (and we probably do!), there are
ways to do that.
• We just can’t expect to impose those ways on
anybody else. (Though we can put our rules out
there for others to follow, and we probably should!)
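This toy Python sketch shows one way to impose rules on our own data. It checks that each record we publish carries the properties our local rules require. The required-property list and the URLs are hypothetical. W3C's SHACL is one real standard for expressing constraints like this, but the sketch below is plain Python, not SHACL.

```python
# Hypothetical local rule: every book record WE publish needs a title and an author.
REQUIRED = {"http://example.org/vocab/title", "http://example.org/vocab/author"}


def missing_properties(subject, triples, required=REQUIRED):
    """Return the required properties that our own triples lack for `subject`."""
    present = {p for s, p, o in triples if s == subject}
    return required - present


our_data = [
    ("http://example.org/book/1", "http://example.org/vocab/title", "Innkeeper at the Roach Motel"),
]
print(missing_properties("http://example.org/book/1", our_data))
# {'http://example.org/vocab/author'}
```

The check applies to data we're about to put out. Under the open world assumption, someone consuming that record shouldn't conclude that the book has no author, only that this source didn't say who it is.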
35. Trust: an unsolved problem
• Review: what happened with <meta> tags
on the web?
• Right. What’s to stop the same thing
happening in a linked-data environment?
• What’s to stop me from writing a triple that says
I’m Tchaikovsky?
• For our purposes? We’ll pick and choose the
vocabularies and domains we trust, I expect, just
as we already do.
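The pick-and-choose approach can be sketched in Python. Each triple travels with the URL of the document it came from, and we keep only triples from domains we've chosen to trust. The URLs and the trusted-domain list are hypothetical. The sketch also takes the source URL at face value, and verifying that a triple really came from where it claims is exactly the unsolved part.

```python
from urllib.parse import urlparse

# Each assertion carries the URL of the document it came from (its provenance).
# All URLs here are hypothetical.
ASSERTIONS = [
    ("http://example.org/person/salo", "http://example.org/vocab/name", "Salo, Dorothea",
     "http://catalog.example.org/data/salo"),
    ("http://example.org/person/salo", "http://www.w3.org/2002/07/owl#sameAs",
     "http://example.org/person/tchaikovsky", "http://prankster.example.com/triples"),
]

TRUSTED_DOMAINS = {"catalog.example.org"}


def trusted(assertions, domains=TRUSTED_DOMAINS):
    """Keep only triples whose source document is hosted on a trusted domain."""
    return [(s, p, o) for s, p, o, source in assertions
            if urlparse(source).hostname in domains]


for triple in trusted(ASSERTIONS):
    print(triple)  # only the name triple survives; the Tchaikovsky claim is dropped
```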
38. Thanks!
• Copyright 2013 by Dorothea Salo.
• This lecture and slide deck are licensed
under a Creative Commons Attribution
3.0 United States License.
• Please respect ownership and licensing
of included materials. Thanks!