Digital Preservation in Perspective: How far have we come, and what's next
1. Digital Preservation in Perspective:
How far have we come, and what's next?
Jeff Rothenberg
March 26, 2012
Color photo by Jeff Rothenberg
Jeff Rothenberg Future Perfect 3/26/2012 Rev: 2012-03-24
2. A brief history of digital preservation
• Early statements of the problem
– Jay Bolter, Margaret Hedstrom, David Bearman
– Avra Michelson’s & my 1992 American Archivist paper
– My 1995 Scientific American article
– Into the Future film (CLIR, 1997; shown on PBS)
– Tora Bikson’s & my 1999 report for the Dutch National Archives
• Gradual recognition of the problem
– By librarians, archivists, modern museum curators
– But without much technological depth of understanding in most cases
– OAIS Preservation Planning assumes migration, though it admits problems
• Some experiments & demonstrations
– U. Leeds & U. Mich: CEDARS & CAMiLEON projects; BBC Domesday Book
– Dutch National Archives Testbed: migration & UVC “data archiving”
– San Diego Supercomputer Center (UCSD) & NARA: formalisms (e-mail only)
– Guggenheim “ErlKing” renewal project
– Dutch Royal Library (KB): Dioscuri emulator & eDepot
• Few serious attempts at implementation
– Most implementations essentially ignore long-term preservation
6. Outline
• What should we mean by digital preservation?
• Levels of awareness of the problem
• Responses
• Distinctions across disciplines
• Remaining challenges
7. What should preservation mean?
“The goal of digital preservation is the accurate rendering of authenticated
content over time.”
—ALA “medium” definition
8. Preserve originals as well as “vernacular renditions”
The Canterbury Tales
Original:
Whan that Aprill, with his shoures soote
The droghte of March hath perced to the roote
And specially from every shires ende
Of Engelond, to Caunterbury they wende,
The hooly blisful martir for to seke
That hem hath holpen, whan that they were seeke.
• Used by scholars for serious research
• Used to generate & evaluate vernacular renditions
• Accessed by non-scholars for aesthetic purposes (with help, e.g., see below)
Vernacular Rendition:
When in April the sweet showers fall
That pierce March’s drought to the root and all
And specially from every shire’s end
Of England they to Canterbury went,
The holy blessed martyr there to seek
Who helped them when they lay so ill and weak
• Used by non-scholars for casual research
• May be used by scholars for research as well
• Not thought of as a preservation copy
• Not used as a source for later vernacular renditions
9. A particular “view” of information may be crucial
Example: Space Shuttle O-ring damage vs. temperature
Prior to Challenger
[Chart: levels of O-ring damage (0 to 3) plotted against launch temperature (53 to 81 °F)]
10. Revealing View of Space Shuttle O-ring Data
Extrapolation of damage curve to the 31 °F temperature forecast for
Challenger’s launch on January 28, 1986.
Dots indicate temperature and O-ring damage for 24 successful launches
prior to Challenger. Curve shows that increasing damage is related to
cooler temperature.
[Chart: level of O-ring damage (0 to 3) vs. temperature (30 to 85 °F)]
11. Furthermore, many digital artifacts are inherently digital
• Inherently digital artifacts are those whose perceptibility, meaning, or
usability arise from and rely on their being encoded in digital form
• They cannot be meaningfully represented as page images
– Doing so loses essential aspects of their contents and/or behavior
• Examples include dynamic, active or interactive artifacts
– Multimedia (e.g., web pages, CD-ROM publications, Ph.D. dissertations)
– Dynamically generated (e.g., JavaScript, CGI, ASP or PHP web pages, servlets)
– Active presentation (e.g., animation, simulation, virtual reality)
– Interactive (e.g., applets, interactive virtual reality, games)
– Digital artwork
12. What you see is not what you get
V2.24 ERwin
if
  %JoinPKPK(oldrows,newrows," <> "," or ")
then
  select count(*) into numrows
    from %Child
    where
      %JoinFKPK(%Child,oldrows," = "," and");
  if (numrows > 0)
  then
    signal parent_updrstrct_err
  end if;
end if;
if
  %JoinPKPK(oldrows,newrows," <> "," or ")
then
  update %Child
    set
      %JoinFKPK(%Child,newrows," = ",",")
    where
      %JoinFKPK(%Child,oldrows," = "," and");
14. In fact, every digital artifact is a program
• A program
– Is a sequence of commands in some formal language
– That is intended to be interpreted
– By an interpreter that understands that language
• An interpreter
– Is an active process
– That knows how to perform commands
– Specified in a given formal language
• Interpretation ultimately involves hardware
– ASCII codes are rendered by a printer or display
– More complex entities are interpreted by software (applications)
– But all software is ultimately interpreted by hardware
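A minimal sketch of this point, in Python: the same bitstream yields different results depending on which interpreter is applied to it. The byte string and the three interpreter functions are hypothetical, chosen only for illustration.

```python
import struct

# Hypothetical 8-byte bitstream; any bytes would do.
BITSTREAM = b"ARCHIVES"


def as_ascii(data: bytes) -> str:
    """Interpret each byte as an ASCII character code."""
    return data.decode("ascii")


def as_integers(data: bytes) -> tuple:
    """Interpret the same bytes as two big-endian unsigned 32-bit integers."""
    return struct.unpack(">2I", data)


def as_pixels(data: bytes) -> list:
    """Interpret each byte as one 8-pixel row of a 1-bit image."""
    return [format(byte, "08b").replace("0", ".").replace("1", "#") for byte in data]


if __name__ == "__main__":
    print(as_ascii(BITSTREAM))
    print(as_integers(BITSTREAM))
    print("\n".join(as_pixels(BITSTREAM)))
```

None of the three readings is stored in the bits themselves; each exists only when a particular interpreter is run on them.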
15. Digital information promises to last better than analog
• Digital objects do not decay, fade, tear, crumble, dissolve, etc.
– Their media may, but not the bits themselves
• A bitstream lasts forever
– Producing exactly the same behavior, without loss (at least in principle)
– So long as it can be interpreted correctly
• But interpreting a bitstream correctly requires software
– And software must be run on hardware (a computer)
– A computer is (ultimately) an analog device that does decay
– And both hardware and software become obsolete long before they decay
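The distinction between the bits and their media can be sketched as follows. This is an illustration under assumptions, not a preservation tool: the artifact bytes are made up, and SHA-256 is simply one common way to check that a copy is bit-identical.

```python
import hashlib


def fingerprint(bits: bytes) -> str:
    """Return a SHA-256 digest that identifies a bitstream exactly."""
    return hashlib.sha256(bits).hexdigest()


# Hypothetical artifact: 1 KiB of made-up bytes.
original = bytes(range(256)) * 4

# Copying to a new medium reproduces the bits without loss.
refreshed_copy = bytes(original)

# A decaying medium, by contrast, may flip a bit.
decayed = bytearray(original)
decayed[10] ^= 0x01

copy_is_identical = fingerprint(refreshed_copy) == fingerprint(original)
decay_is_detected = fingerprint(bytes(decayed)) != fingerprint(original)
```

A matching digest shows only that the bits survived; interpreting them correctly still requires software and hardware.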
16. So the best we can say is...
“Digital objects last forever — or five years, whichever comes first”
min ( ∞ ,5)
19. Levels of awareness of the problem
(by disciplines/institutions/individuals)
• Innocence
• Awakening
• Analysis
• Looking under the streetlamp
• Experimentation/Demonstration
• Where are we now?
20. Innocence
• Why should digital artifacts be any different?
– Preservation is preservation, isn’t it?
• Except for media obsolescence
– Isn’t this just analogous to medieval monks copying manuscripts?
• Digital artifacts don’t decay or change
– Isn’t this a dream come true for preservationists?
21. Awakening
• Digital poses unique problems
– Media obsolescence
– Description (unique and complex attributes)
– Cataloging (ephemeral reference, links)
– Metadata (unique requirements)
– Format/encoding (interpretation, conversion, corruption)
– Future rendering (in the face of obsolete software and hardware)
• Digital preservation must be proactive
– Over relatively short timeframes (5 years?)
– Otherwise artifacts are likely to be irretrievably lost
22. Analysis
• Digital artifacts
– What are their essential characteristics for preservation?
• Authenticity
– What does this mean for digital artifacts?
• Rendering
– How can we guarantee proper (or any) rendering in the future?
• Preservation
– What does (should) this mean for digital artifacts in various disciplines?
• Costs
– What are the up-front and long-term costs of digital preservation?
– How should these costs be paid and by whom?
23. Looking under the streetlamp
• Metadata
– Dublin Core, etc.
– Depends on the nature of digital artifacts & technical preservation schemes
• Reference models
– OAIS
– Premature in the absence of viable technical preservation schemes
• Institutional process models
– Premature in the absence of defined, viable technical preservation schemes
– May tend to lock in approaches that are not viable
24. The Open Archival Information System Reference Model
(OAIS)
25. Experimentation/Demonstration
• BBC Domesday Book / CAMiLEON Project
– Early warning of the need for timely, extreme action
– Demonstrated the potential of hardware emulation
• Dutch Archives Testbed
– “Discovered” that migration is very hard (duh!)
• Other emulation examples
– Apple’s M68000 emulator for PowerPC
– U. Warwick’s EDSAC emulator
– Emory U’s MARBL collection
– Guggenheim: Renewing the ErlKing
– KB’s Dioscuri Emulator
• PLANETS, KEEP
– Continuing to explore technically viable approaches
26. The BBC Domesday / CAMiLEON Project
Emulated at the University of Leeds, U.K. (2002)
27. EDSAC: one of the first stored-program electronic digital computers
30. Renewing the ErlKing
• An interactive mixed-media video experience
– By Roberta Friedman and Grahame Weinbren
– That overlays text and graphics on video content
– And branches in response to user touchscreen input
• Highly innovative when created in 1982
– Pushed the limits of affordable computers and video display
– Included a custom-built “authoring” environment
– Widely exhibited in major museums and other venues
31. The ErlKing in the Guggenheim’s “Seeing Double” Show
(March 18, 2004)
32. KB’s Dioscuri Emulator
Running my 1982 Calendar/1 Program
33. Where are we now?
• Somewhere between levels 4 and 5
– Looking under the streetlamp
– Experimentation/Demonstration
• Few end-to-end implementations
– Except for page-image artifacts (e.g., LOCKSS, Portico)
– And KB eDepot
35. Responses
• Denial
– What problem?
• Wishful thinking
– Deus ex machina
• Misguided efforts (IMHO)
– Digital garden paths
• Facing reality
– What will it take?
• Where are we now?
36. Denial
• Just save bits
– And hope for the best (let our grandchildren worry about it)
• Expect commercial sector solutions
– Microsoft, IBM, etc. will save us
• Popular formats will live forever or auto-migrate
– (What the ancient Egyptians thought)
• Convergent formats like HTML and XML solve everything
– But these are really just “scaffold” formats embedding others
37. Preservation approaches
• Save and run obsolete hardware and software
– In “computer museums”
– To read documents by running the original programs that created them
• Rely on universal, formal description of logical formats
– To allow interpreting those formats in the future
– Thereby correctly rendering saved digital artifacts
• Rely on standards and migration
– Expect new programs to read old documents in enduring standard forms
– Convert documents from old standards to new ones as standards evolve
• Rely on emulation of obsolete hardware to run saved software
– Requires no migration or conversion (aside from media)
– Saves originals in original form
38. Wishful thinking
• Metadata is all we need
– Describe formats, behavior, etc.
• Format migration
– The game of “telephone”
• Formal encoding (UCSD/NARA-ERA)
– Maybe someday
• Rely on future cryptography
– Counterexample: Hieroglyphics
• Digitize to preserve
– e.g., Shoah
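The game of “telephone” in format migration can be sketched roughly as below. The three formats and their feature sets are invented for illustration, and a real migration loses information in subtler ways than dropping whole features; the sketch only shows that losses accumulate and are not recovered by later steps.

```python
# Hypothetical formats, each described by the features it can represent.
FORMAT_FEATURES = {
    "format_a": {"text", "fonts", "footnotes", "macros"},
    "format_b": {"text", "fonts", "footnotes"},
    "format_c": {"text", "fonts", "comments"},
}


def migrate(document: dict, target_format: str) -> dict:
    """Keep only the parts of the document the target format can represent."""
    supported = FORMAT_FEATURES[target_format]
    return {feature: value for feature, value in document.items() if feature in supported}


# Hypothetical document originally created in format_a.
original = {
    "text": "Whan that Aprill",
    "fonts": "blackletter",
    "footnotes": ["gloss on 'soote'"],
    "macros": ["renumber_lines"],
}

after_b = migrate(original, "format_b")   # macros are dropped here
after_c = migrate(after_b, "format_c")    # footnotes are dropped here
lost = set(original) - set(after_c)       # everything lost along the chain
```

Because each step starts from the previous step's output rather than from the original, nothing dropped earlier can reappear later in the chain.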
39. Misguided efforts (IMHO)
• Focus on short-term preservation
– Urgent enough to preclude long-term focus (e.g., JSTOR?)
• Reject emulation without understanding it
– Seems like smoke and mirrors
• LC, NARA-ERA
– Full speed ahead and damn the technical realities
40. Facing reality
• Technological issues
– For “inherently digital” artifacts (which will become more prevalent)
• Defining/preserving “digital originals”
– Retaining original rendering & behavior
– Enabling repeated “vernacular extraction” of surrogates
• Comparative cost analyses
– Informed by technological understanding
– Looking at overall lifecycle costs
• Realistic process models
– Based on technologically viable approaches
• Facing long-term issues (KB/IBM-NL eDepot)
– Loss of metadata
– Partial loss or corruption of archival information package indexes
41. Current implementation efforts
• NARA’s ERA project
– Ill-conceived: assumed a solution would magically appear
• LC still seems somewhat aimless
– Lost half their NDIIPP funding after 2006 (some since restored)
• Most so-called “archiving” efforts ignore preservation
– LOCKSS, Portico (journal archiving) offer no real preservation
– Internet Archive seems based on wishful thinking
• BL proceeding rationally
– Pursuing a broadly-based, intelligent strategy
• KB may still be in the lead
– eDepot designed to address long-term preservation
– Using a two-pronged migration/emulation approach
– Planets & KEEP projects continuing to explore longer-term issues
42. Where are we now?
• Still at response 1?
– Denial
• Somewhere between responses 2 and 4?
– Misguided efforts
– Facing reality
44. Distinctions across contexts
• Disciplines: Libraries, Archives, Museums
– Archives: preserve “record” value
– Libraries: preserve[/contextualize] content/rendering
– Museums: preserve/recreate/contextualize experience
• Institutions: National, Commercial, NGO
– Commercial: film industry, petrochemical, pharma
(core vs. ancillary assets)
– Shoah Foundation (Spielberg): http://dornsife.usc.edu/vhi/preservation
• Individuals
– Mostly not yet begun
46. Remaining challenges
• Integrate true long-term perspective
– Render “inherently digital” artifacts
– Recognize the executability of all digital artifacts
– Preserve digital originals and facilitate “vernacular renditions”
• Engage the Computer Science (ICT) field
– Conference sessions, working groups, etc.
• Perform serious cost and process analyses
– Based on viable technological approaches
• Try some small-scale “end-to-end” demonstrations
– Long-term focus
– Inherently digital artifacts
– Preserve digital originals and produce “vernacular renditions”
– Develop and test realistic process models
– Instrument, measure, and evaluate:
- Authenticity, quality, accessibility, usability, cost
- Effort, scalability, reproducibility (of process)
47. Expected cost & effectiveness comparisons
Approaches compared: archaeology, formalization, standards, emulation, migration, viewers
H, M, L: High, Med, Low
+, -: Frequent, Rare
Cost:
Per-approach (x 1)
  Create EVM or formalism    0    H/-  0    0    0    H/-
Per-platform (x 10)
  Create H/W emulators       0    0    0    0    0    H/-
  Port to new platforms      0    L/-  M/-  H/-  M/-  M/-
Per-format (x 1000)
  Reverse-engineer           0    H/-  H/-  H/+  H/+  0
  Obtain necessary S/W       0    0    0    M/+  M/-  L/+
Per-artifact (x 100,000,000)
  Process at Ingest          0    H    H    0    0    L
  Convert over time          0    M/-  H/-  H/+  H/+  0
  Access                     H    M    L    L    L    L
Effectiveness:
  On each artifact           L    M    M    M    M    H
  % of formats handled       L    L    L    M    L    H
48. References for Jeff Rothenberg
http://www.JeffRothenberg.org
jeff@JeffRothenberg.org