This document summarizes Tim Donohue's presentation on digital preservation at IDEALS, a digital repository at the University of Illinois. It begins with the initial goal of IDEALS: to preserve and provide access to digital scholarship. The team quickly ran into open questions about infrastructure, expertise, and resources. The presentation outlines how IDEALS brought in a preservation librarian, trained its staff, and assessed its needs against standards such as OAIS. It describes IDEALS' policies and procedures for digital preservation, including format support documentation and the activities carried out at each level of support. The presentation acknowledges gaps in IDEALS' preservation efforts and closes by revisiting the original goal.
Digital Preservation in the Wild
1. DIGITAL PRESERVATION IN THE WILD
Tim Donohue
Research Programmer - IDEALS
University of Illinois
(with many thanks to Sarah Shreeves)
CARLI Digital Preservation Forum – July 21, 2009
3. IDEALS: the “dream”
“Create a reliable and easy to use repository service
to preserve, manage, and provide persistent and
widespread access to the digital scholarship faculty
and students now produce…”
4. IDEALS: the “dream”
“Create a reliable and easy to use repository service
to preserve, manage, and provide persistent and
widespread access to the digital scholarship faculty
and students now produce…”
BUT…
What’s it mean to preserve this stuff?
Can we preserve everything?
What kind of infrastructure?
What kind of expertise do we need?
What kind of resources?
5. IDEALS: the initial reality
Backup tapes stored
next to the server!
Not Really Our Server Room!
6. 1. Brought in Preservation Librarian
2. Training and self education
3. Assessment of where we were and where we needed to go
7. The Foundation
Open Archival Information System (OAIS) Model
http://public.ccsds.org/publications/archive/650x0b1.pdf
Image borrowed from the ICPSR Digital Preservation Tutorial:
http://www.icpsr.umich.edu/dpm/
8. The Foundation II
TRAC (Trustworthy Repositories Audit & Certification)
http://www.crl.edu/PDF/trac.pdf
Organizational Infr. / Digital Object Mgmt / Technical Infr. & Security
Documentation / Transparency / Adequacy / Measurability?
9. The Digital Preservation Platform
Image borrowed from the ICPSR Digital Preservation Tutorial:
http://www.icpsr.umich.edu/dpm/
10. From Dorothea Salo. 2009. Institutional repositories for the digital arts and
Humanities. Humanities Digital Curation Institute. Champaign IL. May 2009.
http://www.slideshare.net/cavlec/digital-preservation-and-institutional-repositories
11. “Preservation” needs to be unpacked.
Not about the technology.
Explicitness is key.
You don’t have to preserve everything to the fullest extent if you say you aren’t.
12. The 5 Stages of Preservation
Denial / Ignorance
Anger / Fear
Bargaining
Depression
Acceptance & Hope
Based on the Kübler-Ross five stages of grief:
http://en.wikipedia.org/wiki/K%C3%BCbler-Ross_model
13. Denial / Ignorance
backups**
** - This service is entirely fictional
Again, Not Really Our Server Room!
16. Depression
This is too hard.
How can we ever preserve everything?
We don’t have the resources for this.
Why even try?
17. Acceptance & Hope
We can take small steps to…
Preserve some things locally
Develop policies (say what you do)
Enact policies via procedures (do what you say)
Work with others on best practices to preserve the rest
18. The Principles of Preservation
(1) Say what you do…
(2) Do what you say…
Based on: Sarah Shreeves. 2009. Saying what we do – Doing what we say: Preservation
Issues (Metadata and Otherwise) in Institutional Repositories. American Library Association
Conference. Chicago IL. July 2009.
19. IDEALS - Saying what we do
Secured explicit administrative support and commitment
for digital preservation management program in IDEALS.
http://hdl.handle.net/2142/135
Developed high level preservation policy:
http://hdl.handle.net/2142/2383
Developed actionable procedures and policies that can
be reassessed and changed as needed
Began next stage of identifying & documenting gaps
20. IDEALS Preservation Support Policy
Format-based “Categories of Support”
High Confidence: Full Support
Medium Confidence: No migration promised
Low Confidence: “Bit-level” support only
Criteria: Openly Documented; Widely Adopted; Widely Supported; No Embedded Content or DRM; Uncompressed or Lossless Compression
Low Confidence (gray area)
https://services.ideals.uiuc.edu/wiki/bin/view/IDEALS/PreservationSupportPolicy
21. IDEALS Format Support Matrix
Compilation of “known” formats
Concentration on textual formats
Proprietary: Microsoft Office | Open: OpenOffice.org, HTML
Limited Adoption: OpenOffice.org | Widely Adopted: Microsoft Office, HTML
Limited Support: Microsoft Office | Widely Supported: Adobe PDF, HTML
Embedded Content / DRM: MS Powerpoint (w/ Audio or Video) | Nothing Embedded: MS Powerpoint
Lossy Compression: JPEG | No/Lossless Compression: TIFF, JPEG 2000
22. IDEALS Format Recommendations
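Textual
High Confidence / Preference: CSV, Text, PDF/A, XML, Open Document Format
Medium Confidence / Preference: RTF, MS Office, PDF, HTML
Images
High Confidence / Preference: TIFF, JPEG 2000
Medium Confidence / Preference: GIF, JPEG, PNG
Audio
High Confidence / Preference: AIFF, WAVE, Ogg Vorbis
Medium Confidence / Preference: AAC, MP3, Real, WMA
Video
High Confidence / Preference: AVI, Motion JPEG 2000
Medium Confidence / Preference: MP2, MP4, Quicktime, WMV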
https://services.ideals.uiuc.edu/wiki/bin/view/IDEALS/FormatRecommendations
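The recommendations on this slide amount to a lookup from format name to confidence level. A minimal Python sketch of that lookup follows. The format lists are transcribed from the slide; the names FORMAT_CONFIDENCE and confidence_for are hypothetical, and returning "low" for any format not listed is an assumption made for illustration, not something the slide states.

```python
# Format lists transcribed from the "IDEALS Format Recommendations" slide.
# The structure and the function name are illustrative assumptions.
FORMAT_CONFIDENCE = {
    "textual": {
        "high": ["CSV", "Text", "PDF/A", "XML", "Open Document Format"],
        "medium": ["RTF", "MS Office", "PDF", "HTML"],
    },
    "images": {
        "high": ["TIFF", "JPEG 2000"],
        "medium": ["GIF", "JPEG", "PNG"],
    },
    "audio": {
        "high": ["AIFF", "WAVE", "Ogg Vorbis"],
        "medium": ["AAC", "MP3", "Real", "WMA"],
    },
    "video": {
        "high": ["AVI", "Motion JPEG 2000"],
        "medium": ["MP2", "MP4", "Quicktime", "WMV"],
    },
}


def confidence_for(format_name: str) -> str:
    """Return "high" or "medium" for a listed format, else "low"."""
    wanted = format_name.strip().lower()
    for levels in FORMAT_CONFIDENCE.values():
        for level, names in levels.items():
            # Exact, case-insensitive match, so "PDF" and "PDF/A" stay distinct.
            if wanted in (name.lower() for name in names):
                return level
    # Assumption: anything not on the slide is treated as low confidence.
    return "low"
```

Matching on the exact name keeps "JPEG" (medium) separate from "JPEG 2000" (high), which mirrors the distinction the slide draws between lossy and lossless formats.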
23. IDEALS – Doing what we say
Basic Activities (All Items)
Regular Virus Scans, Checksum verification
Nightly off-campus backups
Refresh storage media
Preservation Metadata (minimal)
Format, checksum, file size, etc.
Permanent Identifiers (Handles)
Always keep the original document
Monitoring and reassessment of formats
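The checksum verification and minimal preservation metadata listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the IDEALS implementation: the slide does not say which checksum algorithm is used (SHA-256 is assumed here), the function names are hypothetical, and taking the format from the file extension is a crude stand-in for real format identification.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute a SHA-256 checksum, reading the file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def minimal_metadata(path):
    """Record format, checksum, and file size for one file."""
    p = Path(path)
    return {
        # Assumption: the extension stands in for the format.
        "format": p.suffix.lstrip(".").lower() or "unknown",
        "checksum": sha256_of(p),
        "size_bytes": p.stat().st_size,
    }


def verify(path, recorded):
    """Return True if the file still matches its recorded metadata."""
    return (
        Path(path).stat().st_size == recorded["size_bytes"]
        and sha256_of(path) == recorded["checksum"]
    )
```

A scheduled job could call minimal_metadata once at deposit, store the result, and run verify against the stored record on every later pass; any False result flags a file whose bits have changed.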
24. IDEALS – Doing what we say
Intermediate Activities
Additional monitoring, more frequent reassessment
When possible, attempt to migrate formats to preserve content and style (hopefully)
No promises that functionality will be preserved
(e.g.) Powerpoint → PDF (possible functionality loss)
(e.g.) PDF 1.4 → PDF/A (possible style loss)
25. IDEALS – Doing what we say
Full Support Activities
Additional monitoring, more frequent reassessment
When necessary, migrate document to successive format.
Attempt to preserve content, style and functionality
(e.g.) PDF/A → successor to PDF/A
26. Our First Preservation Problem…
Character issues in Word (and PDF)
Found by chance
Consultation with submitter
Caused by conversion to Word (from WordPerfect)
Resubmitted as RTF
27. We Acknowledge our Gaps
Not checking format validity (yet)
Minimal metadata collection
Not checking files for problems (besides viruses)
Not checking every automated conversion
28. Back to that “dream”?
“Create a reliable and easy to use repository service
to preserve, manage, and provide persistent and
widespread access to the digital scholarship faculty
and students now produce…”
Total Items: 11,500 Total Downloads: 870,000+
30. Contact Info
Tim Donohue
University of Illinois
tdonohue@illinois.edu
http://www.ideals.uiuc.edu/
http://www.ideals.uiuc.edu/wiki/
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License