Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives

Smithsonian Institution Archives
Lynda Schmitz Fuhrig
MARAC Fall 2011 presentation

  1. Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives
     Lynda Schmitz Fuhrig
     Mid-Atlantic Regional Archives Conference
     Fall 2011, Bethlehem, PA
  3. Smithsonian Institution Archives’ Mission
     • Appraise, acquire, and preserve
     • Offer a range of research and reference services
     • Create and promote products and services that broaden the understanding of the Smithsonian
     • Provide professional archival and conservation expertise
     Above, a collection storage area for the Smithsonian Institution Archives, located on the third floor of Capital Gallery West. Upper left, in 1894 a room on the fourth floor, East Wing of the Smithsonian Institution Building, was converted for use as the Smithsonian Institution Archives.
  6. SI Archives Digital Services Division
     Curate and preserve born-digital collections
     Digitize images, video, and audio
     Research digital preservation issues
     Promote the archives through web and outreach
     SIA Accession 11-124
  7. Born-digital records that document the Smithsonian’s history
     • Text
     • Images
     • Drawings/CAD
     • Databases and spreadsheets
     • Audio
     • Video
     • Websites and social media
     • Email accounts
     Many are part of mixed collections of paper and electronic records
     Received on removable media or by server/FTP transfer
     SIA Accession 11-281
  16. SI Archives’ procedures
      • Inspect media
      • Virus scan
      • Conduct transfer/ingest with checksums
      • Make copy
      • Analyze files for formats and issues
      • Convert proprietary files to preservation formats
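The "transfer/ingest with checksums" and "make copy" steps could be sketched roughly as follows. This is a minimal illustration, not the Archives' actual workflow: the slides do not name a checksum algorithm, so SHA-256 is an assumption, and the function names `sha256_of` and `transfer_with_checksums` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large media files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_checksums(source_dir: Path, dest_dir: Path) -> dict:
    """Copy every file under source_dir into dest_dir.

    Returns a manifest mapping each relative path to its checksum.
    Raises ValueError if a copy does not hash the same as its source.
    """
    manifest = {}
    # Materialize the file list first so the copy loop sees a fixed set.
    sources = sorted(p for p in source_dir.rglob("*") if p.is_file())
    for source in sources:
        relative = source.relative_to(source_dir)
        target = dest_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        before = sha256_of(source)          # checksum taken before transfer
        shutil.copy2(source, target)        # copy2 also keeps timestamps
        if sha256_of(target) != before:     # verify the copy is bit-identical
            raise ValueError(f"checksum mismatch after copy: {relative}")
        manifest[relative.as_posix()] = before
    return manifest
```

Storing the returned manifest alongside the accession would allow the same checksums to be recomputed later to detect silent corruption.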
  20. Current preservation formats
      • MS Word/WordPerfect: PDF/A or PDF
      • PowerPoint, Excel: PDF/A or PDF
      • GIF, JPG, BMP, etc.: TIF
      • Access databases: SIARD XML
      • Audio: WAV/BWF
      • Websites: crawled and captured as WARC
      • Email: saved to XML following the CERP/EMCAP preservation schema
      • Born-digital video: not straightforward; different options
      • Digitized video: Motion JPEG 2000
  21. Tools for processing
      Open source and proprietary software
      JHOVE, DROID, FITS (FITS is also the name of a file format)
      MediaInfo
      In-house batch scripts
      Duke Data Accessioner
      Evaluating Curator’s Workbench
      CERP (SIA-Rockefeller Archive Center) parser
  22. Files in disguise
      • No extension: right-click to open in Notepad to see the coding, especially helpful with WordPerfect
      • Wrong extension: a .doc could be a Word file or a WordPerfect file; a BMP may really be a JPG
      • Complete unknowns that date back 20 years or more
      Accession 10-052
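The Notepad check above amounts to reading the first few bytes of a file, which can also be scripted. The sketch below compares a file's leading "magic" bytes with what its extension promises. The signature table is a small, illustrative subset, the function names are hypothetical, and it is not a substitute for an identification tool such as DROID.

```python
from pathlib import Path

# (leading bytes, label). Short signatures such as "BM" can produce
# false positives, so they are listed last.
SIGNATURES = [
    (b"\xffWPC", "WordPerfect"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 (legacy Word/Excel/PowerPoint)"),
    (b"%PDF-", "PDF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),
    (b"MM\x00*", "TIFF"),
    (b"BM", "BMP"),
]

# What each extension is supposed to contain (illustrative subset).
EXTENSION_LABELS = {
    ".wpd": "WordPerfect",
    ".doc": "OLE2 (legacy Word/Excel/PowerPoint)",
    ".pdf": "PDF",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".gif": "GIF",
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".bmp": "BMP",
}


def identify(header: bytes) -> str:
    """Return a format label for the leading bytes, or "unknown"."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def looks_disguised(path: Path) -> bool:
    """True when the extension promises one format but the bytes say another."""
    with path.open("rb") as handle:
        found = identify(handle.read(16))
    expected = EXTENSION_LABELS.get(path.suffix.lower())
    return expected is not None and found != "unknown" and found != expected
```

A file with no extension, or one whose bytes match no known signature, is reported as not disguised here; those are the "complete unknowns" that still need manual inspection.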
  24. Older files
      Gerber
      PCD (Kodak Photo CD)
      EXE (Executables)
      Gerber overlay, by AA7JC, Creative Commons: Attribution-NonCommercial-ShareAlike 2.0 Generic.
  25. DATs (Digital Audio Tapes)
      Transfer them now, if you can!
      Machine production ended
      Tapes susceptible to fungus, other problems
      DAT recorded in 1990 for the Folk Masters radio program. SIA Accession 06-106
  26. It Says It Is PDF/A
      Accession 08-149
  28. But It’s Not PDF/A
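A file "says" it is PDF/A through its XMP metadata, where the `pdfaid:part` property declares the PDF/A part it claims to follow. The sketch below only reads that claim. It assumes the XMP packet is stored as plain, uncompressed UTF-8 text in the file, and the function name `claimed_pdfa_part` is hypothetical. A declared part is not proof of conformance, which is the point of the two slides above; confirming it takes a real validator.

```python
import re
from typing import Optional

# Matches both XMP spellings of the property:
#   <pdfaid:part>1</pdfaid:part>   (element form)
#   pdfaid:part="1"                (attribute form)
PDFA_PART = re.compile(rb'pdfaid:part\s*(?:=\s*["\x27]|>)\s*(\d)')


def claimed_pdfa_part(pdf_bytes: bytes) -> Optional[int]:
    """Return the PDF/A part the file declares, or None if it declares none.

    This reports the claim only; it does not check that the file
    actually meets the PDF/A requirements.
    """
    match = PDFA_PART.search(pdf_bytes)
    return int(match.group(1)) if match else None
```

Running this over an accession separates files that make no PDF/A claim from files that do, so that only the second group needs to go through full validation.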
  29. Software incompatibility issues
  30. New formats/flavors/technologies
      Geospatial PDF
      WWF: a PDF that doesn’t print
      Keep an eye on mobile sites/apps
      3D scanning and printing: point clouds
  31. Digital forensics
  32. Resources for formats
      • Sustainability of Digital Formats (Library of Congress)
      • PRONOM (The National Archives in the UK)
      • Unified Digital Formats Registry (expected date of operation: 2012)
      • FILExt (File Extension Source)
      • TrID (File Identifier)
  33. Lynda Schmitz Fuhrig
      Digital Services Division
      Smithsonian Institution Archives website: