Kevin Balster presented on current and future cataloging practices, focusing on the transition from MARC to BIBFRAME. Some key points:
- MARC has limitations like being library-specific and not supporting linked data. BIBFRAME is being developed to replace MARC using linked data standards like RDF.
- BIBFRAME models bibliographic data using RDF triples and allows linking to other datasets on the semantic web. It also better supports serials modeling than MARC.
- The transition will require mapping legacy MARC data to BIBFRAME and new cataloging workflows. Standards and tools are still in development and the change will not happen overnight.
2. Current Cataloging Environment
•Content Standards: RDA, AACR2, DACS, DCRM, other community guidelines (e.g., OLAC, MLA, CONSER), etc.
•Encoding Standards: MARC 21, EAD, Dublin Core, MODS, MADS, VRA Core, schema.org, etc.
•Exchange Formats: MARC 21, RDF, etc.
3. What’s Wrong with MARC?
•Library specific format
•Stuck data
•Repetitive entry of shared metadata
•MARC Must Die – Roy Tennant
5. What is BIBFRAME?
•Vocabulary built on the Resource Description Framework (RDF)
•Terms describing the Libraries/Archives/Museum world
•Allows our data to live in a Linked Data environment
6. Down the Rabbit Hole (RDF)
•Simple statements to express relationships (AKA Triples)
•Best used with Uniform Resource Identifiers (URIs)
•The essential piece of Linked Data
(Subject) (Predicate) (Object)
Fahrenheit 451 written by Ray Bradbury
<http://worldcat.org/entity/work/id/268886> <http://rdaregistry.info/Elements/w/author> <http://dbpedia.org/resource/Ray_Bradbury>
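The subject-predicate-object structure above can be sketched with plain Python tuples. The URIs are the real identifiers from the slide; holding a triple as a bare tuple is a simplification for illustration, not how an RDF library stores data.

```python
# A triple is an ordered (subject, predicate, object) statement.
# Each part is a URI so machines can identify the entities unambiguously.
triple = (
    "http://worldcat.org/entity/work/id/268886",   # Fahrenheit 451 (the work)
    "http://rdaregistry.info/Elements/w/author",   # the "written by" relationship
    "http://dbpedia.org/resource/Ray_Bradbury",    # Ray Bradbury (the person)
)

subject, predicate, obj = triple
# N-Triples-style rendering of the statement
print(f"<{subject}> <{predicate}> <{obj}> .")
```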
7. Further Down the Rabbit Hole (Linked Data)
•Data encoded in RDF, using global identifiers
•Published online, supporting use with data from other sources
Library Land: Fahrenheit 451 | written by | Ray Bradbury
<http://…> <http://…> <http://dbpedia.org/resource/Ray_Bradbury>
Wikipedia: Ray Bradbury <http://dbpedia.org/resource/Ray_Bradbury>
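The linking shown on this slide, a library triple and an outside fact meeting at the same URI, can be sketched as two small triple sets merged on a shared identifier. The DBpedia and WorldCat URIs are the real ones from the earlier slide; the birth-year property and value are illustrative stand-ins.

```python
BRADBURY = "http://dbpedia.org/resource/Ray_Bradbury"

# Triples published by the library (work URI from the earlier slide).
library_triples = [
    ("http://worldcat.org/entity/work/id/268886",
     "http://rdaregistry.info/Elements/w/author",
     BRADBURY),
]

# Triples from an outside source such as DBpedia (fact shown is illustrative).
dbpedia_triples = [
    (BRADBURY, "http://dbpedia.org/ontology/birthYear", "1920"),
]

# Because both datasets use the same global URI for Ray Bradbury,
# merging them connects the book to facts the library never recorded.
merged = library_triples + dbpedia_triples
about_bradbury = [t for t in merged if BRADBURY in (t[0], t[2])]
print(len(about_bradbury))
```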
9. BIBFRAME Developments
•Version 2.0
•Library of Congress Tools: MARC to BIBFRAME Conversion Specifications/Comparison Viewer
•Upcoming Tool: BIBFRAME Editor
•Library of Congress BIBFRAME Pilot Phase 2
•Zepheira: bibfra.me
10. Future Cataloging
•No direct cataloging in encoding format (i.e., none of this)
•Browser interface with material profiles
12. The Very Near Future of MARC 21
•Necessary to map legacy data
•Not all data will be turned into linked data
•Not all libraries will (be able to) make the switch
•Workflows depend on it!
13. The FRBR Family
•IFLA Library Reference Model (LRM)
• Resolves inconsistencies between FR models
• Drastic changes to serials
• RDA to be frozen August 2017-April 2018 (3R Project)
14. FRBR vs. LRM
•FRBR: one serial work; its expressions (e.g., language editions) can each have multiple manifestations
Serial work
Expression | Expression
Manifestation | Manifestation | Manifestation
•LRM: each serial work has exactly one expression and exactly one manifestation
Serial work -> Expression -> Manifestation
Serial work -> Expression -> Manifestation
Serial work -> Expression -> Manifestation
15. Exchange Formats in Action
•MARC 21 & RDF:
• Easily exchanged and interpreted
• Elements have universal and (relatively) persistent meaning
•MARC 21:
• Natural state: Record
• Innate mechanisms for version control and provenance
• “Over-shares”
•RDF (i.e., BIBFRAME)
• Natural state: Triple
• Lacks self-describing metadata
• “Tight lipped”
16. RDF Questioning
•Who created the statement about the thing? (provenance)
• Based on what information? (authoritativeness)
•Did anybody ever make a different statement about the thing? (version control)
18. Next Steps
•In need of specialized cataloging communities
• BIBFRAME still being updated
• Best practices for BIBFRAME & LRM
•In need of vendors, programmers, and IT folks
• Linked Data technological infrastructure
20. Sources & Other Readings
•Roy Tennant, “MARC Must Die,” Library Journal, October 15, 2002, http://lj.libraryjournal.com/2002/10/ljarchives/marc-must-die/#_
•“Semantic Web,” World Wide Web Consortium (W3C), https://www.w3.org/standards/semanticweb/
•“MARC 21 to BIBFRAME 2.0 Conversion Specifications,” Library of Congress, http://www.loc.gov/bibframe/mtbf/
•“BIBFRAME Comparison Tool,” Library of Congress, http://id.loc.gov/tools/bibframe/compare-id/full-ttl
•MacKenzie Smith, Carl G. Stahmer, Xiaoli Li, and Gloria Gonzalez, “BIBFLOW: A Roadmap for Library Linked Data Transition,” https://bibflow.library.ucdavis.edu/roadmap/
•Pat Riva, Patrick Le Bœuf, and Maja Žumer, eds., “IFLA Library Reference Model,” March 2017 version, International Federation of Library Associations and Institutions, https://www.ifla.org/files/assets/cataloguing/frbr-lrm/ifla_lrm_2017-03.pdf
•James Hennelly and Judy Kuhagen, “3R Project,” presentation for the RDA Steering Committee, May 16, 2017, http://www.rda-rsc.org/sites/all/files/3R%20Update%20Hennelly%20and%20Kuhagen.pdf
21. Image Sources
•RDA Toolkit: http://access.rdatoolkit.org/
•MARC 21 Format for Bibliographic Data (245 field): https://www.loc.gov/marc/bibliographic/bd245.html
•Overview of the BIBFRAME 2.0 Model: http://www.loc.gov/bibframe/docs/bibframe2-model.html
•Functional Requirements for Bibliographic Records, Final Report: https://www.ifla.org/files/assets/cataloguing/frbr/frbr_2008.pdf
•Zepheira BIBFRAME Scribe: http://editor.bibframe.zepheira.com/static/
•OCLC Connexion
•BIBFLOW: A Roadmap for Library Linked Data Transition: https://bibflow.library.ucdavis.edu/roadmap/
Editor's Notes
Hello and welcome. As you can probably guess from the title, you are going to be in for some BIBFRAME for the next hour. However, this will not be an overview of the nuts and bolts of BIBFRAME. Rather, I aim to provide a higher level overview of the BIBFRAME model, and show some of the ways that cataloging in a BIBFRAME environment may be different than in our current environment. And since BIBFRAME is not the only change on the horizon, I will also attempt to cover a few other upcoming changes.
So before we look to see where we’re going, it will be useful to see where we currently stand. Within the current cataloging environment, catalogers essentially work within the scope of a small set of standards. Obviously, this list does not include any of the tools or technology that catalogers work with – things like cataloging utilities, integrated library systems, digital asset management systems, or library service platforms – but I would like to focus on the small number of schemas we work with that dictate much of how we catalog.
No cataloger would need to deal with all of these standards, but all of these standards are likely used by at least a handful of catalogers. This presentation will mostly focus on the bolded standards.
First on our list are the content standards. These provide rules and instructions for describing the materials we catalog. They provide guidance on what constitutes title information, what to capitalize, etc. There are a number of these standards, and they are often community specific. RDA and AACR2 are the general rules, while DACS is used for archives, DCRM is for rare materials, and so on. There are also a number of community-specific guidelines that often accompany the official standards. For example, in RDA, and previously in AACR2, the Program for Cooperative Cataloging and the Library of Congress often issue their own interpretations of instructions, and state their policies when they differ from the explicit instructions. And the Cooperative Online Serials Program, or CONSER, publishes several supplemental documents to help serials catalogers.
Next up are encoding standards. These standards dictate how to encode the metadata created when following a content standard. MARC 21 is probably the most common example; it is made up of numeric fields and alphanumeric subfields that correspond to the various metadata elements where the information is entered. Once again, a number of other encoding standards are used by different communities.
Last up are the exchange formats. These are stable schemas that allow data to be exchanged between different parties. MARC 21 is once again the most common exchange format used in cataloging. By having strictly defined fields and subfields, metadata encoded in MARC 21 can easily be moved around and can be deciphered by anybody who knows what the heck MARC 21 is.
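The "strictly defined fields and subfields" idea can be sketched as a minimal data structure: a 245 (title statement) field holding $a and $c subfields. The record content is illustrative and this is a toy lookup, not a real MARC parser.

```python
# A flat MARC-style record: numeric tags map to indicators plus
# ordered (subfield code, value) pairs.
record = {
    "245": {"indicators": "10",
            "subfields": [("a", "Fahrenheit 451 /"), ("c", "Ray Bradbury.")]},
}

def get_subfield(record, tag, code):
    """Return the first value of subfield `code` in field `tag`, or None."""
    field = record.get(tag)
    if field is None:
        return None
    for sub_code, value in field["subfields"]:
        if sub_code == code:
            return value
    return None

print(get_subfield(record, "245", "a"))  # the title proper
```

Because every party agrees on what tag 245 and subfield $a mean, a record like this can move between systems without losing its meaning, which is exactly what makes MARC 21 work as an exchange format.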
While MARC has been an extremely successful tool for catalogers, it is really showing its age. In the age of the internet, MARC stands out by being virtually unknown to or ignored by the “regular” community; it is used solely within the library world. The structure of MARC is also extremely rigid. Metadata recorded in MARC records is extremely difficult to pull out in an automated fashion and cannot easily be integrated with data from other sources.
Another issue with using MARC when cataloging in RDA is that each record is completely self-contained. The RDA model, based on FRBR, has a structure that allows common metadata to be shared among several resources, but MARC records are single, flat files, so common metadata must be re-entered for every record.
The limitations of MARC have been well documented, but up until recently, there have been no good options for replacement.
Depending on your opinion, the beginning of the end of MARC came in 2011 when the Library of Congress announced the Bibliographic Framework Initiative (BIBFRAME). They tapped Zepheira, a private company founded by Eric Miller, who was formerly part of the World Wide Web Consortium (W3C), to formulate the first draft of the BIBFRAME data model. The intention of BIBFRAME is to serve as the replacement for MARC 21.
Even though BIBFRAME is intended to replace MARC, its structure is vastly different.
BIBFRAME is built on the Resource Description Framework (RDF), which we will cover in just a minute.
The scope of BIBFRAME is the Libraries/Archives/Museum world, but it is being built in such a way that it should be interoperable with outside schemas and vocabularies, and would allow our data to live on the web in a linked data environment.
RDF: Resource Description Framework. The central piece of Linked Data. RDF is a data model for making simple statements about resources and the relationships between them. These statements are expressed as triples, which follow a subject-predicate-object structure. The subject and object normally represent the entities, and the predicate represents the relationship between them. For example, the statement “Fahrenheit 451 was written by Ray Bradbury” can be translated into a triple with Fahrenheit 451 as the subject, Ray Bradbury as the object, and the relationship “written by” as the predicate.
By describing entities and relationships in triples, and describing everything using a URI, we can add semantic meaning to the statement. This semantic meaning is incredibly important for machine “understanding.” If a machine comes across a statement coded in HTML, it cannot parse out the meaning of the statement. It’s just a collection of text. By providing a dereferenceable URI which can stand in for the entity, we can provide a contextual anchor that is useful if the machine comes across other entities tied to the same URI.
Linked Data: describing data using HTTP URIs/IRIs in an RDF framework. This allows machines to “understand” the semantics of the triples.
Here we see an example of how assigning a URI to an entity in a triple can facilitate connections to other resources. By using a “global” URI for Ray Bradbury – in this case a DBpedia URI derived from Wikipedia – any other statements about Ray Bradbury that are tied to the same URI can be linked to our triple.
Another important piece of linked data is publishing RDF triples online and making them available for querying using standard query tools such as SPARQL. This is what allows RDF triples from different sources to be linked together.
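Querying published triples with SPARQL ultimately comes down to matching triple patterns, where variables stand in for the parts you want to find. This toy matcher sketches that idea in Python, with `None` playing the role of a SPARQL `?variable`; the second work URI is an illustrative stand-in, and this is not a SPARQL engine.

```python
triples = [
    ("http://worldcat.org/entity/work/id/268886",       # real URI from the slides
     "http://rdaregistry.info/Elements/w/author",
     "http://dbpedia.org/resource/Ray_Bradbury"),
    ("http://example.org/work/martian-chronicles",      # illustrative second work
     "http://rdaregistry.info/Elements/w/author",
     "http://dbpedia.org/resource/Ray_Bradbury"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts like a SPARQL ?variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Roughly: SELECT ?work WHERE { ?work <…/author> <…/Ray_Bradbury> }
works = match(triples,
              p="http://rdaregistry.info/Elements/w/author",
              o="http://dbpedia.org/resource/Ray_Bradbury")
print(len(works))
```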
Work – Instance – Item
This differs from RDA/FRBR’s WEMI model, but the idea is similar. The reasoning is that BIBFRAME should be content-model agnostic, since RDA is just one of many content models used by the library/archives/museum community. In theory, the RDA manifestation and item correspond one-to-one with the BIBFRAME instance and item, while the RDA work and expression both correspond to the BIBFRAME work. This means that a translation of a work, which would be treated as an expression of the same work in RDA, would be treated as a different work in BIBFRAME. In either case, there would be a relationship established between the two entities.
The Library of Congress released version 2.0 of BIBFRAME in April 2016. Now available on the Library of Congress BIBFRAME site are tools that allow for the investigation of LC’s mapping from MARC to BIBFRAME. The conversion specifications contain instructions on how to perform the conversions, and the Comparison Viewer provides a way of seeing an individual LC record get transformed into BIBFRAME.
Coming soon is the BIBFRAME Editor which is being built to support the 2nd phase of LC’s BIBFRAME pilot which was scheduled to begin in early June 2017.
And Zepheira has since moved on to further develop its own version of BIBFRAME that is still being updated.
In our current environment, catalogers frequently work directly in an encoding schema. When cataloging in MARC, catalogers almost universally work directly in MARC, either in a cataloging utility, or an ILS, or some other system. This cataloging interface will most likely make dramatic changes in a linked data environment.
Both the Library of Congress Editor that is being finalized and Zepheira’s Scribe tool are browser-based cataloging tools where catalogers follow prompts that are customized for particular format profiles. Browser-based cataloging using profiles allows for varied options on the cataloging prompts. When cataloging directly into a MARC record, the metadata we record is stringently based on how MARC 21 defines the various fields. When cataloging outside of a specific format, prompts can be based on content standards, such as RDA, or built in-house using non-standard terms. All that matters is that the recorded metadata is granular enough to move between various encoding standards.
It is also important to note that with these new browser-based interfaces, the mapping from the description happens behind the scenes, so catalogers will not be working directly in RDF.
Having a granular description is important because it allows for mapping between the “generic” description and any number of encoding standards. By avoiding mapping directly between the various encoding standards, we avoid the metadata loss that occurs when mapping to a less granular schema.
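The idea of keeping one granular, format-neutral description and mapping outward can be sketched as follows. The field names, the MARC mapping, and the property URIs are simplified illustrations, not the Library of Congress conversion specifications or the exact BIBFRAME vocabulary.

```python
# A granular, schema-neutral description captured once at cataloging time.
description = {
    "title": "Fahrenheit 451",
    "creator_uri": "http://dbpedia.org/resource/Ray_Bradbury",
}

def to_marc(desc):
    """Map to a flat MARC-style field (illustrative, not the real conversion spec)."""
    return {"245": [("a", desc["title"])]}

def to_triples(desc, work_uri):
    """Map the same description to BIBFRAME-flavored triples (properties simplified)."""
    return [
        (work_uri, "http://id.loc.gov/ontologies/bibframe/title", desc["title"]),
        (work_uri, "http://id.loc.gov/ontologies/bibframe/creator", desc["creator_uri"]),
    ]

# One description, two target encodings; nothing is lost in between,
# because each mapping starts from the full-granularity source.
marc = to_marc(description)
triples = to_triples(description, "http://example.org/work/f451")  # illustrative URI
```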
That being said, MARC is not going away any time soon.
First, direct mapping of MARC data to BIBFRAME will be necessary for legacy data. And even after legacy data is mapped, the MARC records will not just disappear.
Second, it is not known whether all MARC data will be transformed into linked data. Things like payment or invoice information may be better off not being converted.
Next, there are still a large number of institutions that are not investigating linked data. Many may not feel that they will be able to make the switch for a variety of reasons, and others simply may not want to make the switch.
Finally, as seen in the illustration from the BIBFLOW project conducted by UC Davis and Zepheira, MARC data is used extensively throughout the library environment. It will be extremely difficult to pry MARC out of these workflows.
There has been much discussion about the IFLA Library Reference Model (formerly FRBR LRM), and the conclusions it draws. The document itself was created to resolve inconsistencies between the various FR models. Since the previous models varied in a number of ways, it is not surprising that the LRM model differs from the FRBR model that many catalogers know about.
One area that has been drastically changed is serials. Within the FRBR model, a serial work can have a number of expressions. In practice, these expressions usually involve language editions. And each expression can have a number of manifestations, often corresponding to print and online versions. Within the LRM, each serial work has exactly one expression and exactly one manifestation.
It remains to be seen how those changes will filter down to RDA. There is currently a project underway to update RDA, and part of this project involves restructuring elements and instructions to better align with the LRM. This is the RDA Toolkit Restructure and Redesign Project, or 3R.
Now that we have a basic overview of the serials model within the LRM, we can get at least a minimum understanding of how these changes may impact our cataloging practices.
Within our current MARC environment, RDA instructions based on the LRM could mean drastic changes to the “single record approach” and the “provider neutral record.” Following the single record approach allows for including two or more different kinds of manifestations (usually print and online) of the same expression on a single record. The provider neutral guidelines are used for creating single descriptions for serials that have multiple online manifestations available from different providers by leaving out metadata that is specific to the providers. In both situations, we would likely need to create separate records for each manifestation.
While MARC has some problems, it has served admirably as an exchange format for the library community. RDF is a more recent exchange format, but the two have some common characteristics. Both are easily exchanged and interpreted (remember, RDF is intended to be interpreted by machines), and both are made of up elements (classes and properties in the case of RDF) which have universal and relatively persistent meaning.
However, there are also some differences which may cause some problems for libraries in the future.
Metadata encoded in MARC is normally transferred as a complete record, and contains several fields which describe the record itself. These fields allow for robust version control (i.e., when the record was updated, and who updated it) and provenance (who created the record, how authoritative the metadata is, etc.). Metadata encoded in RDF triples does not have these same benefits. A triple may provide information about an entity, but it does not provide information about itself.
When attempting to discover what kind of information we need to be able to draw from our metadata, it may be useful to consider asking questions of the metadata.
Since RDF by itself may not be able to answer these questions, we need to look elsewhere.
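One common approach, offered here as an illustration of the kind of infrastructure involved rather than anything the presentation prescribes, is to attach provenance at the level of named graphs: triples are extended to quads, and each statement carries a pointer to metadata about who asserted it and when. All graph URIs and metadata values below are hypothetical.

```python
# A quad = triple + graph name; the graph URI is where provenance can live.
quads = [
    ("http://worldcat.org/entity/work/id/268886",
     "http://rdaregistry.info/Elements/w/author",
     "http://dbpedia.org/resource/Ray_Bradbury",
     "http://example.org/graph/batch-42"),          # illustrative graph URI
]

# Statements about the graph itself supply version control and provenance,
# roughly what MARC's record-level fields give a whole record.
graph_metadata = {
    "http://example.org/graph/batch-42": {
        "created_by": "Example Library",            # illustrative values
        "created_on": "2017-06-01",
    },
}

s, p, o, g = quads[0]
print(graph_metadata[g]["created_by"])  # who made the statement about the thing
```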
Without going into detail about the specific tools, the answers to our questions may lie in the technical infrastructure that is necessary to support a linked data environment. While catalogers have gotten used to understanding the tools that provide the answers to questions we have about our metadata, it may be necessary to look outside of our community to the IT and programming community for help.