Join this discussion on the benefits and process of harvesting to aggregators such as DPLA and Europeana. Through case studies we'll outline three stages of the process: 1) mapping, migrating, and normalizing data in open source digital repositories, 2) making use of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), and 3) reaping the benefits of increased exposure. Presenters welcome lively discussion and questions from participants of all technical backgrounds and skill levels.
2. Introductions
Erin Tripp, Bus. Dev.
Staff librarian since 2011.
Erin delivers Islandora
training at events worldwide
and has managed more than
40 digital repository projects.
Contact Details
● Email: erin@discoverygarden.ca
● Twitter: @eeohalloran or @discgarden
● Hashtags: #islandora #ALAAC16
3. Agenda
Objectives Overview
By Show of Hands & Introductions
Why Should We Care?
Repository Requirements
OAI-PMH Overview
Case Studies
Top Takeaways
4. Objectives for
Today
Learn a thing or two about:
● OAI-PMH
● Common Harvesters
● Who to ask for help
● What questions to ask
● Confidence to continue
learning/ try a new tool
5. By Show of
Hands...
Who is interested in
● National Harvester,
● State Harvester,
● Subject Harvester, or
● Proprietary Discovery Service
Harvester?
Who has already been involved in a
harvesting project?
Who has experience using
● XSLTs
● OAI-PMH
● REPOX?
7. Why should we care? Discoverability.
In February 2015, LITA Top Technology Trends panelists included enhancing
discoverability among the top trends (Enis, 2015)
Making content accessible where the search originates (e.g. Google, Google
Scholar, WorldCat, DPLA, Europeana) creates value for digital libraries and
users
Repositories contributing to aggregators can see site visits increase by
55 to 109 per cent (DPLA, n.d.)
8. Why should we care? Discoverability.
Increased exposure through
● Blogs, social media and Wikipedia
Provide richer context and increase the visibility of your collections
Make your collections available for re-use by other services (Europeana, n.d.)
Access to valuable skills
Data modelling
Copyright and licensing
Reporting on access usage analytics (Europeana, n.d.)
9. Why should we care? Discoverability.
Using open source
Linking up to thousands of other collections
Interoperable (no vendor lock in/ proprietary formats)
Access to Wikimedia Commons (Europeana, n.d.)
Expanding your network
Connect with like-minded industry professionals
Identify potential partners and joint funding opportunities
Reach out to other sectors – creatives, education, tourism and more (Europeana, n.d.)
10. Why should we care? Discoverability.
Anecdotally, repository harvesting can:
● Act as incentive for people to deposit content into
the repository / buy-in from stakeholders
● Clean up and normalize metadata resulting in better
raw material to support discovery
12. OAI-PMH
Open Archives Initiative Protocol
for Metadata Harvesting (OAI-
PMH)
Low-barrier mechanism for
repository interoperability
OAI-PMH is a set of six requests
(aka verbs or services) that are
invoked within HTTP
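As a minimal sketch of what "invoked within HTTP" means, each of the six requests is an ordinary GET URL carrying a `verb` parameter. The helper name `build_request` and the `example.org` endpoint are placeholders for illustration, not part of any library or real repository.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request(base_url, verb, **arguments):
    """Return the GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    # The verb always comes first, followed by any verb-specific arguments.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Placeholder endpoint; substitute your repository's OAI-PMH base URL.
    print(build_request("http://example.org/oai2", "ListRecords",
                        metadataPrefix="oai_dc"))
```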
13. Providers
Data Providers are repositories that
expose structured metadata via
OAI-PMH
= Repository
Service Providers then make OAI-
PMH service requests to
harvest that metadata
= Harvester
14. Vocabulary
Request/ Verb/ Service
The action that the service
provider (harvester) is
requesting from the data
provider (repository)
Response Size
The maximum number of
records to issue per
response
15. Vocabulary… continued
Resumption Token
When a request matches more records than the response size, a
resumptionToken is issued so that the service provider can resume
harvesting from where it left off
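A harvester's handling of the resumption token can be sketched as a loop that keeps requesting pages until no token comes back. The function names `fetch_url` and `harvest` are hypothetical, and error handling is omitted; treat this as an outline rather than a production harvester.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_url(url):
    """Fetch one response body over HTTP."""
    with urlopen(url) as response:
        return response.read()


def harvest(base_url, metadata_prefix="oai_dc", fetch=fetch_url):
    """Yield every <record> element, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        yield from root.iter(f"{OAI_NS}record")
        token = root.find(f".//{OAI_NS}resumptionToken")
        # An absent or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords",
                  "resumptionToken": token.text.strip()}
```

Passing `fetch` as a parameter is a design choice for this sketch: it lets the loop be exercised against canned responses without touching a live repository.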
Identify
This request is used to retrieve information about a repository. Some of the
information returned is required by the OAI-PMH specification.
Example: YourSite/oai2?verb=Identify
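To illustrate what an Identify response carries, the sketch below reads the elements that OAI-PMH 2.0 requires from a response. The sample XML is made up for illustration, and `parse_identify` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Elements the OAI-PMH 2.0 specification requires in an Identify response.
REQUIRED = ("repositoryName", "baseURL", "protocolVersion", "adminEmail",
            "earliestDatestamp", "deletedRecord", "granularity")

# Made-up response, for illustration only.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://example.org/oai2</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@example.org</adminEmail>
    <earliestDatestamp>2011-01-01T00:00:00Z</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
  </Identify>
</OAI-PMH>"""


def parse_identify(xml_text):
    """Return the required Identify fields as a dict (None if missing)."""
    identify = ET.fromstring(xml_text).find("oai:Identify", OAI_NS)
    return {name: identify.findtext(f"oai:{name}", namespaces=OAI_NS)
            for name in REQUIRED}


if __name__ == "__main__":
    for field, value in parse_identify(SAMPLE_IDENTIFY).items():
        print(f"{field}: {value}")
```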
16. Vocabulary… continued
ListMetadataFormats
This request is used to retrieve the metadata formats available from a
repository.
Example: YourSite/oai2?verb=ListMetadataFormats
ListRecords
This request is used to harvest records from a repository. Optional arguments
permit selective harvesting of records based on set membership and/or
datestamp.
Example: YourSite/oai2?verb=ListRecords&metadataPrefix=oai_dc
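Selective harvesting can be sketched by adding the optional `set`, `from`, and `until` arguments to a ListRecords request and then reading titles out of the `oai_dc` records that come back. The function names, the `example.org` endpoint, and the sample record are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up ListRecords response containing a single oai_dc record.
SAMPLE_RECORDS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2015-02-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix, set_spec=None,
                     from_date=None, until_date=None):
    """Build a ListRecords URL, adding only the optional arguments given."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    return f"{base_url}?{urlencode(params)}"


def titles_by_identifier(xml_text):
    """Map each record's OAI identifier to its list of dc:title values."""
    root = ET.fromstring(xml_text)
    result = {}
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier",
                                     namespaces=NS)
        result[identifier] = [t.text for t in
                              record.iterfind(".//dc:title", NS)]
    return result
```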
17. Vocabulary… continued
ListSets
This request is used to retrieve the set structure of a repository, useful for
selective harvesting
All Collections Example: YourSite/oai2?verb=ListSets
Specific Collection Example:
YourSite/oai2?verb=ListRecords&metadataPrefix=oai_dc&set=ir_citationCollection
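A harvester usually reads the ListSets response first to find the `setSpec` value it then passes as the `set` argument. The sketch below parses a made-up response; the set names and the helper `parse_sets` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Made-up ListSets response; the setName values are invented.
SAMPLE_SETS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set>
      <setSpec>ir_citationCollection</setSpec>
      <setName>Citations</setName>
    </set>
    <set>
      <setSpec>example_collection</setSpec>
      <setName>Example Collection</setName>
    </set>
  </ListSets>
</OAI-PMH>"""


def parse_sets(xml_text):
    """Return a dict mapping each setSpec to its human-readable setName."""
    root = ET.fromstring(xml_text)
    return {
        s.findtext("oai:setSpec", namespaces=NS):
            s.findtext("oai:setName", namespaces=NS)
        for s in root.iterfind("oai:ListSets/oai:set", NS)
    }


if __name__ == "__main__":
    for spec, name in parse_sets(SAMPLE_SETS).items():
        print(f"{spec}: {name}")
```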
18. Repository
Requirements
Accessible to the web
Storing standards-based, XML
descriptive metadata
The ability to apply additional
metadata mapping if needed
(whether in or external to the
repository)
Access to documentation and
XSLTs used for metadata
mapping
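Metadata mapping is normally done with an XSLT. Because the Python standard library has no XSLT processor, the sketch below swaps in a plain ElementTree crosswalk to show the idea of mapping MODS fields to Dublin Core. The three-field `CROSSWALK` table is deliberately tiny and is an assumption, not a complete or official MODS-to-DC mapping.

```python
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# (MODS path, DC element) pairs; illustrative only.
CROSSWALK = [
    ("m:titleInfo/m:title", "title"),
    ("m:name/m:namePart", "creator"),
    ("m:abstract", "description"),
]


def mods_to_dc(mods_xml):
    """Return an oai_dc:dc element built from a MODS record."""
    source = ET.fromstring(mods_xml)
    target = ET.Element(f"{{{OAI_DC}}}dc")
    for path, dc_name in CROSSWALK:
        for node in source.iterfind(path, {"m": MODS}):
            # Skip empty elements and trim stray whitespace.
            if node.text and node.text.strip():
                element = ET.SubElement(target, f"{{{DC}}}{dc_name}")
                element.text = node.text.strip()
    return target
```

Trimming whitespace during the mapping is one small example of the normalization work that tends to surface during a harvest project.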
19. Repository
Requirements
Pass XML metadata to service
provider from the:
1. Preservation (storage)
component or
2. Discovery (index)
component
Provide a method to harvest a thumbnail (TN)
and a link back to the repository
Accommodate customization
20. Repository Requirements … Continued
For example: the University of South Carolina
video content model is tiered for preservation,
media production and
streaming web access. We only want to
harvest one of the three possible records
22. Europeana
Our material comes from all over
Europe and the scope of the
collections is really quite
astonishing. [...]
http://www.europeana.eu/
http://pro.europeana.eu/
23. Intermediate
Aggregator
Digibess repo stores digitized
objects from 18 Economic and
Social Sciences libraries in Italy
Europeana requires an intermediate
aggregator, a national harvester
such as Cultura Italia
Cultura Italia harvests a custom
“Pico” metadata format from
Digibess and is then harvested by
Europeana
24. Harvesting Tools
Digibess pre-dated the Islandora OAI
module and the REPOX
aggregator
Used Proai servlet
oaiprovider-1.2.2
The harvest led us to examine both the
general needs and the specific
applications of the protocol
26. REPOX
Since the Digibess project, a new
intermediate aggregator called REPOX
has been released.
It aims to provide [...] Europeana
partners a simple solution to import,
convert and expose their
bibliographic data via OAI-PMH
http://repox.sysresearch.org/
28. DPLA
The Digital Public Library of
America brings together the riches
of America’s libraries, archives, and
museums, and makes them freely
available to the world.
https://dp.la/info/
29. Service Hub
Empire State Digital Network
(ESDN) is the New York State
service hub for the DPLA
Hosted and administered by the
Metropolitan New York Library
Council in conjunction with eight
allied regional library councils
working collectively in New York
State as the ESLN
Liaise with partners for data
aggregation, mapping and licensing
30. Mapping &
Testing
Harvests from partners using OAI-
PMH
o Provides all partner metadata to
DPLA through one OAI-PMH feed
from REPOX
Undertakes data review and QA
prior to exposing feed to DPLA for
harvest
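The data review and QA step can be sketched as a check that each Dublin Core record carries the fields the hub expects before the feed is exposed. The `REQUIRED_FIELDS` checklist here is hypothetical; a real service hub would follow its own and DPLA's metadata guidelines.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical checklist, not DPLA's or ESDN's actual requirements.
REQUIRED_FIELDS = ("title", "identifier", "rights")


def missing_fields(dc_record):
    """Return the required DC elements that are absent or empty."""
    missing = []
    for name in REQUIRED_FIELDS:
        values = [e.text for e in dc_record.iter(f"{DC}{name}")
                  if e.text and e.text.strip()]
        if not values:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Made-up record with no rights statement.
    record = ET.fromstring(
        '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"'
        ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:title>Example title</dc:title>"
        "<dc:identifier>http://example.org/object/1</dc:identifier>"
        "</oai_dc:dc>"
    )
    print(missing_fields(record))
```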
33. Other Discovery
Services
WorldCat, Summon, & Primo are
commercial discovery services
Local discovery layers can also
collocate resources for discovery
OAI-PMH modules within your
repository framework can allow for
these services to harvest your
repository
34. Everyone is
Harvesting
Everyone
Connecticut State Library
aggregating data to Research It
State Library harvests University of
Connecticut
Archives and Special Collections,
ILS and other
University of Connecticut Library
harvests to Summon/ Primo and will
be harvested by DPLA
35. Creating Lots of
Portals
University of Connecticut Library
started harvesting in mid 2014
Notable increases in access to
digital content since harvest (one of
many factors)
Access statistics available at
CTDA Statistics
38. Top Takeaways -
Data Providers
● Server Load/ Application Load
● Permissions / Copyright
● Relationships with Service
Providers
● Repository Buy-in
● Increased Discovery
● Metadata Normalization
39. Top Takeaways -
Service
Providers
● Knowledge of
○ XSLT,
○ OAI-PMH, and
○ Metadata schemas
(DC, MODS, QDC, MARC XML)
● Technical staff to set-up and
maintain the aggregator & write
scripts to transform harvested
metadata
● Relationships with Data
Providers
41. Discussion
● What are your biggest
challenges?
● What resources do you find
helpful?
● What was your "aha!"
moment?
● What was most useful in this
presentation?
43. Demonstration
To follow along or try it at home,
navigate to:
http://sandbox.discoverygarden.ca/
OR
http://islandora.ca/downloads
Click Islandora > Islandora Utility
Modules > Islandora OAI