Benchmarks for Digital Preservation Tools. Kresimir Duretec, Artur Kulmukhametov, Andreas Rauber and Christoph Becker

Paper presented at the 12th International Conference on Digital Preservation, November 2-6, 2015. University of North Carolina at Chapel Hill.

Abstract:
Creating and improving tools for digital preservation is difficult without an established way to assess progress in their quality. This is due to the scarcity of solid evidence and the lack of accessible approaches for creating such evidence. Software benchmarking, as an empirical method, is used in various fields to provide objective evidence about the quality of software tools. However, the digital preservation field has yet to properly adopt this method. This paper establishes a theory of benchmarking of tools in digital preservation as a solid method for gathering and sharing the evidence needed to achieve widespread improvements in tool quality. To this end, we discuss and synthesize literature and experience on the theory and practice of benchmarking as a method and define a conceptual framework for benchmarks in digital preservation. Four benchmarks that address different digital preservation scenarios are presented. We compare existing reports on tool evaluation and how they address the main components of benchmarking, and we discuss whether the field possesses the right combination of social factors to make benchmarking a promising method at this point in time. The conclusions point to significant opportunities for collaborative benchmarks and systematic evidence sharing, but also to several major challenges ahead.

  1. Benchmarks for Digital Preservation Tools
     Kresimir Duretec, Artur Kulmukhametov, Andreas Rauber and Christoph Becker
     Vienna University of Technology and University of Toronto
     iPRES 2015, Chapel Hill, North Carolina, USA, November 3, 2015
  2-3. Overview
     1. Need for evidence in digital preservation • Software tools in digital preservation and evidence around their quality
     2. Software benchmarks in digital preservation • What, why, how • Software quality
     3. Benchmark proposals • Photo migration • Property extraction from documents
     4. Are we ready? Next steps
     https://goo.gl/ps6x8I (in case you get bored)
  4-5. Evidence in digital preservation: an important topic!
  6. Software tools in digital preservation
     • many tools (>400 in the COPTR registry, http://coptr.digipres.org)
     • categories: identification, validation & characterisation; migration; emulation; …
     • high quality required [screenshot: DROID]
  7. However! Blog post at OPF by paul, 19 October 2012: "Practitioners have spoken: we need better characterisation". http://openpreservation.org/blog/2012/10/19/practitioners-have-spoken-we-need-better-characterisation/
  8. However! Blog post at OPF by johan, 31 January 2014: "Why can't we have digital preservation tools that just work?". http://openpreservation.org/blog/2014/01/31/why-cant-we-have-digital-preservation-tools-just-work/
  9. However! Panel at iPRES 2014.
  10. Evidence of software quality in digital preservation
     • the need for evaluation is not new!
     • testbeds and numerous individual evaluation initiatives
     • yet take-up is often missing
     Timeline: 2002 Dutch Digital Preservation Testbed; 2006 DELOS Digital Preservation Testbed; 2010 Planets Testbed; ?
  11. Overview
     1. Need for evidence in digital preservation • Software tools in digital preservation and evidence around their quality
     2. Software benchmarks in digital preservation • What, why, how • Software quality
     3. Benchmark proposals • Photo migration • Property extraction from documents
     4. Are we ready? Next steps
     https://goo.gl/ps6x8I (in case you are already bored)
  12. Software benchmarks
     • Proposal: start creating software benchmarks to extend the evidence base around the software quality of digital preservation tools
     • Major characteristics: collaborative, open, public
  13-14. Software benchmarks: definitions
     • SPEC glossary: "a 'benchmark' is a test, or set of tests, designed to compare the performance of one computer system against the performance of others" (key words: test, compare, performance)
     • IEEE Glossary: "1. a standard against which measurements or comparisons can be made. 2. a procedure, problem, or test that can be used to compare systems or components to each other or to a standard. 3. a recovery file" (key words: measurements, comparison, test)
     • W. Tichy: "a benchmark is a task domain sample executed by a computer or by a human and computer. During execution, the human or computer records well-defined performance measurements. A benchmark provides a level playing field for competing ideas, and (assuming the benchmark is sufficiently representative) allows repeatable and objective comparisons" (key words: task domain sample, performance measurements, repeatable objective comparison)
     • Sim et al.: "a test or set of tests used to compare the performance of alternative tools or techniques" (key words: test, compare)
     Working definition: a DP software benchmark is a set of tests which enables rigorous comparison of software solutions for a specific task in digital preservation.
  15. Benchmarks in other fields
     • Software engineering: transactions, virtualisation, databases, …
     • Information retrieval: text, video, music, …
  16. Benchmarks in other fields
     [Diagram: an iterative process connecting the community, benchmark specifications, the tools & methods that need benchmarks, a yearly meeting, and results.]
  17. Benchmark components in software engineering: motivating comparison, task sample, performance measures
  18. Benchmark components in information retrieval: tasks, datasets, answer sets, measures, representation formats
  19. Benchmark components
     • Software engineering: motivating comparison, task sample, performance measures
     • Information retrieval: tasks, datasets, answer sets, measures, representation formats
     • Digital preservation: motivating comparison, function, dataset, ground truth, performance measures
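To make the five digital preservation components concrete, here is a minimal sketch of a benchmark specification as a data structure. It assumes Python; the class and field names are illustrative, not taken from the paper:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class BenchmarkSpec:
    """The five components of a digital preservation benchmark (sketch)."""
    motivating_comparison: str                   # what is compared, and why
    function: str                                # the specific task a tool must perform
    dataset: list[str]                           # identifiers of the digital objects to process
    ground_truth: Optional[dict] = None          # expected answers per object (optional)
    measures: dict[str, Callable] = field(default_factory=dict)  # measure name -> scoring function
```

A concrete benchmark then instantiates this structure, as the two proposals later in the deck do for RAW photo migration and property extraction.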
  20. Motivating comparison
     • What is the benchmark supposed to compare?
     • Which quality aspects are addressed?
     • What would the benefits be?
  21. Motivating comparison
     • Software quality has many aspects, so we need a model: ISO SQuaRE
  22. Function: the specific task the tool is expected to perform
  23. Dataset: a set of digital objects (images, audio, video, …) that is relevant, accurate and complete
  24. Ground truth: the correct answers; an optional element; machine readable
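As an illustration of "machine readable", a ground-truth record can be as simple as a mapping from each object in the dataset to its expected answers. This layout is hypothetical (filenames and properties made up for illustration), not prescribed by the paper:

```python
# Hypothetical ground truth for a property-extraction benchmark: each
# document in the dataset maps to the property values a correct tool
# should report.
ground_truth = {
    "report.docx": {"page_count": 12, "author": "A. Author", "encrypted": False},
    "memo.docx":   {"page_count": 1,  "author": "B. Writer", "encrypted": False},
}
```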
  25. Performance measures: the fitness of the benchmarked tool; quantitative or qualitative; calculated by human or by machine
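Continuing the hypothetical sketch above: a quantitative, machine-calculated measure is a function the benchmark harness can run itself, while a qualitative, human-assigned score enters the results as data. Both the log layout and the rating scale below are assumptions for illustration:

```python
# Quantitative, calculated by machine: a callable the harness invokes.
def wall_clock_seconds(run_log: dict) -> float:
    """Elapsed processing time, from a hypothetical harness log."""
    return run_log["end_time"] - run_log["start_time"]

# Qualitative, calculated by human: recorded as data, e.g. a 1-5 rating
# entered by an assessor and stored alongside the machine-computed scores.
human_scores = {"tool_a": {"output_readability": 4}}
```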
  26. Overview
     1. Need for evidence in digital preservation • Software tools in digital preservation and evidence around their quality
     2. Software benchmarks in digital preservation • What, why, how • Software quality
     3. Benchmark proposals • RAW photo migration • Property extraction from documents
     4. Are we ready? Next steps
     https://goo.gl/ps6x8I (in case you are bored)
  27. RAW photo migration benchmark
     • Motivating comparison: various RAW file formats need to be migrated to DNG, and many tools should be tested
     • Function: migrate from RAW to DNG
     • Dataset: a collection of photographs in RAW format
     • Performance measure: SSIM
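A sketch of how the SSIM measure might score a single migration, assuming Python with scikit-image, and assuming both the original RAW file and the migrated DNG have already been rendered to comparable RGB arrays (standardizing that rendering step is a design problem the benchmark itself would have to solve):

```python
import numpy as np
from skimage.metrics import structural_similarity

def migration_fidelity(original: np.ndarray, migrated: np.ndarray) -> float:
    """Score a RAW-to-DNG migration by structural similarity (SSIM).

    Both inputs are HxWx3 uint8 arrays rendered from the original RAW
    file and from the migrated DNG; 1.0 means structurally identical.
    """
    return structural_similarity(original, migrated, channel_axis=-1)
```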
  28. Property value extraction benchmark
     • Motivating comparison: different document properties; coverage of the needed properties; correctness of extracted values
     • Function: extract property values from a document
     • Dataset: a collection of documents in the MS Word 2007 file format
     • Ground truth: the correct values
     • Performance measures: coverage, accuracy, exact match ratio
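Two of these measures can be written down precisely in a few lines. This sketch assumes the extracted and expected properties for one document are plain dicts, glossing over the tool-output parsing a real harness would need; averaging the per-document scores over the dataset then yields the result for one tool:

```python
def coverage(extracted: dict, expected: dict) -> float:
    """Fraction of the needed properties for which the tool reports any value."""
    return len(set(extracted) & set(expected)) / len(expected)

def exact_match_ratio(extracted: dict, expected: dict) -> float:
    """Fraction of the needed properties whose extracted value equals the ground truth."""
    hits = sum(1 for prop, value in expected.items() if extracted.get(prop) == value)
    return hits / len(expected)
```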
  29. Overview
     1. Need for evidence in digital preservation • Software tools in digital preservation and evidence around their quality
     2. Software benchmarks in digital preservation • What, why, how • Software quality
     3. Benchmark proposals • RAW photo migration • Property extraction from documents
     4. Are we ready? Next steps
     https://goo.gl/ps6x8I (in case you are bored)
  30. Benchmarking: a community-driven process
     • benchmarks emerge from community consensus
     • community participation needs to be enabled
     • build on established results where possible
     • the community needs to bear the costs
     • continuous evolution of benchmarks is necessary
     • Are we ready?
  31. Is the community ready?
     • Indicators: projects, workshops, a dedicated conference, several other journals and conferences
     • Concerns: vocabularies, metrics
  32. Conclusion
  33. Next step: create benchmark specifications. Benchmarking forum workshop, Friday 6 November, 9:00-17:00.
  34. Questions? https://goo.gl/ps6x8I (in case you were not bored)
