Software curation as a digital preservation service
Euan Cochrane, Yale University Library (@euanc)
Keith Webster, Dean of University Libraries (@cmkeithw)
Software curation – why?
April 1, 2015
Archiving Static Content
What About Executable Content?
Games
Application-specific content
WordPerfect 1.0 doc
Can you read it today?
100 years from now?
Original Wang doc
Can you read it today?
100 years from now?
Simulation model
Can you re-run old
model with new data?
Useful knowledge
Sharable knowledge
• We have spent 20 years converting material to
digital form, establishing standards and protocols,
and looking after it
• We also have a track record in curating born-digital content
• And some of us are making progress with social media products
• The rapid development in computing
technology and the Internet has opened up
new applications for the basic sources of
research (the base material of research data),
which has given a major impetus to
scientific work in recent years.
• Access to research data increases the returns
from public investment in this area; reinforces
open scientific inquiry; encourages diversity of
studies and opinion; promotes new areas of
work and enables the exploration of topics
not envisioned by the initial investigators.
• The value of data lies in their use. Full and
open access to scientific data should be
adopted as the international norm for the
exchange of scientific data derived from
publicly funded research.
What about the products of research?
The data may still be discoverable and accessible, but are they executable?
Data come in different forms, shapes and sizes
[Chart: Operating System Usage Over Time, 2003 to 2015 (share of usage, 0% to 80%). Series: Win8, Win7, Vista, Win2003, Older Win, WinXP, W2000, Win98, Win95, WinNT, Linux, Mac, Mobile]
Why? – Software-dependent content
Old software is required to authentically render old content
[Screenshot: original content in original software (WordPerfect in Windows 95)]
[Screenshot: original content in newer software (LibreOffice Writer in Windows Vista)]
Research results are at risk of loss without original software
[Screenshot: original content in original software (WordStar for DOS in Microsoft DOS). NB: the equation predicting tree growth rates includes exponents documented using the upper line of text]
[Screenshot: original content in newer software (LibreOffice Writer in Windows Vista). NB: the equation layout and meaning changed]
• We need to curate and preserve operating systems to support access to assets that depend on them
• We need to curate and preserve software applications to support access to content that depends on them
• We need to curate and preserve fonts, scripts, plug-ins and other dependencies to support access to content that requires them
• We need to preserve whole desktop environments (e.g. Salman Rushdie's desktop at Emory University) to support access to the experience of interacting with them
• We need to curate and preserve pre-configured disk images with software already installed on them, for running on emulated hardware
Software Curation – How?
How? – Emulation/Virtualization
• An emulation software package ("emulator") is used to create a virtual version of one computer within another computer that has different hardware
• Old software can be run on the "emulated" computer hardware just as it ran on the original physical computer
• Many emulators were originally developed to run old video games
• Emulation is often used to support old hardware devices that require obsolete software (e.g. assembly line management software, scientific instruments, industrial machinery)
• Emulation is widely used by mobile phone application developers to develop software for phone hardware using desktop PC hardware (i.e. phone hardware is emulated on desktop PCs to build phone-compatible applications)
• Virtualization = emulation but with compatible hardware (some of the host machine's hardware is used directly by the "virtualized" computer)
Virtualization bridges the gap between the departure of recently obsolete hardware and the arrival of hardware powerful enough to emulate it
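To make "a virtual version of one computer within another computer" concrete, here is a minimal sketch of the fetch-decode-execute loop at the heart of an emulator. The four-instruction guest machine (LOAD, ADD, PRINT, HALT) and its encoding are entirely hypothetical; real emulators model complete CPUs, memory maps and devices.

```python
from dataclasses import dataclass, field

# Opcodes of a hypothetical 8-bit "guest" machine (not any real CPU).
LOAD, ADD, PRINT, HALT = 0x01, 0x02, 0x03, 0xFF

@dataclass
class ToyMachine:
    memory: list[int]                 # guest program and data
    acc: int = 0                      # single accumulator register
    pc: int = 0                       # program counter
    output: list[int] = field(default_factory=list)

    def step(self) -> bool:
        """Fetch, decode and execute one guest instruction; return False on HALT."""
        opcode = self.memory[self.pc]
        if opcode == LOAD:            # LOAD imm: acc = imm
            self.acc = self.memory[self.pc + 1]
            self.pc += 2
        elif opcode == ADD:           # ADD imm: acc = (acc + imm) mod 256, like 8-bit hardware
            self.acc = (self.acc + self.memory[self.pc + 1]) & 0xFF
            self.pc += 2
        elif opcode == PRINT:         # PRINT: copy acc to the host-side "screen"
            self.output.append(self.acc)
            self.pc += 1
        elif opcode == HALT:
            return False
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at address {self.pc}")
        return True

    def run(self, max_steps: int = 10_000) -> list[int]:
        for _ in range(max_steps):
            if not self.step():
                break
        return self.output

program = [LOAD, 250, ADD, 10, PRINT, HALT]
print(ToyMachine(program).run())  # prints [4]
```

The emulator reproduces the guest's 8-bit overflow (250 + 10 wraps to 4) even though the host's Python integers never overflow, which is the kind of behavioural detail emulation must get right for old software to run authentically.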
How? – Documentation
• We need unique, persistent identifiers for software
• We need software catalogues
• We need unique, persistent identifiers for disk images (installed environments/virtual hard drives)
• We need disk image/virtual hard drive catalogues
• We need unique, persistent identifiers for emulated/virtualized hardware configurations
• We need hardware configuration catalogues
We don't have these (yet!)*
*Mostly: the Internet Archive is doing great work, as are NIST and PRONOM
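The identifiers and catalogues listed above could be modelled roughly as follows. This is a sketch under assumptions: the record fields, the `urn:example:` prefix and the choice to derive disk-image identifiers from a SHA-256 digest of the image bytes are all hypothetical, and are not the scheme used by PRONOM, NIST or the Internet Archive.

```python
import hashlib
from dataclasses import dataclass

# All record types, fields and the "urn:example:" prefix are hypothetical.

@dataclass(frozen=True)
class SoftwareRecord:
    pid: str                      # persistent identifier for one software title/version
    title: str
    version: str

@dataclass(frozen=True)
class HardwareConfig:
    pid: str                      # persistent identifier for an emulated machine setup
    emulator: str
    cpu: str
    ram_mb: int

@dataclass(frozen=True)
class DiskImageRecord:
    pid: str                      # derived from the image bytes (see disk_image_pid)
    installed: tuple[str, ...]    # PIDs of the software installed on the image
    hardware: str                 # PID of the hardware configuration it runs on

def disk_image_pid(image_bytes: bytes) -> str:
    """Content-derived identifier: identical images always get the same PID."""
    return "urn:example:diskimage:sha256:" + hashlib.sha256(image_bytes).hexdigest()

class Catalogue:
    """In-memory catalogue mapping PIDs to records; a PID is never reassigned."""

    def __init__(self) -> None:
        self._records: dict[str, object] = {}

    def register(self, record) -> str:
        existing = self._records.get(record.pid)
        if existing is not None and existing != record:
            raise ValueError(f"{record.pid} is already assigned to a different record")
        self._records[record.pid] = record
        return record.pid

    def resolve(self, pid: str):
        return self._records[pid]

catalogue = Catalogue()
wp = catalogue.register(SoftwareRecord("urn:example:software:wordperfect", "WordPerfect", "x.y"))  # placeholder version
hw = catalogue.register(HardwareConfig("urn:example:hw:pc-i386-64mb", "QEMU", "i386", 64))
image = DiskImageRecord(disk_image_pid(b"placeholder image bytes"), (wp,), hw)
catalogue.register(image)
print(catalogue.resolve(image.pid).installed)
```

Deriving the disk-image identifier from its content means two institutions holding byte-identical images would arrive at the same identifier without coordination; titles and hardware configurations, which have no single canonical byte form, are given assigned identifiers instead.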
How? – Configuring emulated hardware
• Admins configure an emulator
• Admins install and/or configure the emulated software
• Requires various emulator-specific, technically challenging tools
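As an illustration of the emulator-specific configuration involved, the sketch below assembles a command line for QEMU's PC emulator. The flags used (`-m`, `-hda`, `-cdrom`, `-boot`) are standard QEMU options, but the file names and memory size are hypothetical, and a real environment may need further CPU, display or device settings. The function only builds the argument list; it does not launch anything.

```python
import shlex
from typing import Optional

def qemu_pc_command(disk_image: str, ram_mb: int = 64,
                    cdrom: Optional[str] = None,
                    boot_from_cdrom: bool = False) -> list[str]:
    """Build (but do not run) a qemu-system-i386 command line for an old PC setup."""
    cmd = ["qemu-system-i386", "-m", str(ram_mb), "-hda", disk_image]
    if cdrom:
        cmd += ["-cdrom", cdrom]
    # Legacy boot letters: "d" = CD-ROM drive, "c" = first hard disk.
    cmd += ["-boot", "d" if (cdrom and boot_from_cdrom) else "c"]
    return cmd

# Hypothetical file names: a hard disk image plus an installer CD image.
print(shlex.join(qemu_pc_command("dos-pc.img", cdrom="installer.iso", boot_from_cdrom=True)))
```

Passing the list to `subprocess.run` would start the emulator if QEMU is installed. Every such detail (memory size, boot order, attached media) has to be recorded for the environment to be reproducible later, which is part of why this step is hard to do at scale.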
How? – Accessing emulated environments at libraries and archives
• Users access emulated environments via dedicated machines
• Use dedicated software
• At libraries and archives this is mostly restricted to reading rooms
How? – This is too hard!
Emulation as a Service
Emulation as a Service – What is it?
✓ Remote access to pre-configured emulated and virtualized environments via any modern web browser
✓ Abstracts configuration challenges away from end-users
✓ Changes to environments can be saved or discarded at the end of a session (a fresh/unchanged version is always available)
✓ Interactivity can be restricted where appropriate (e.g. limited ability to download or copy content to the local computer)
✓ Relatively simple way to provide custom online environments (virtual reading rooms?)
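Saving or discarding changes at the end of a session is commonly achieved with copy-on-write: the base environment is never modified, and the session's writes go to a separate overlay. The sketch below models a disk as numbered blocks; the class and method names are hypothetical and are not taken from the bwFLA/EaaS code base.

```python
class SessionDisk:
    """Copy-on-write view of a read-only base disk image, modelled as numbered blocks."""

    def __init__(self, base: dict[int, bytes]) -> None:
        self._base = base                        # shared base environment; never written
        self._overlay: dict[int, bytes] = {}     # blocks changed during this session

    def read(self, block: int) -> bytes:
        # A block written in this session shadows the base copy.
        if block in self._overlay:
            return self._overlay[block]
        return self._base[block]

    def write(self, block: int, data: bytes) -> None:
        self._overlay[block] = data

    def discard(self) -> None:
        """End the session without keeping changes; later reads see the base again."""
        self._overlay.clear()

    def changes(self) -> dict[int, bytes]:
        """Copy of this session's changes, e.g. to store as a saved variant."""
        return dict(self._overlay)

base = {0: b"boot", 1: b"letter.doc v1"}
session = SessionDisk(base)
session.write(1, b"letter.doc v2")
saved = session.changes()      # keep the edits...
session.discard()              # ...or throw them away
print(session.read(1), saved[1], base[1])
```

Because the base is shared and untouched, any number of users can start from the same fresh environment at once, and each session's changes cost only the blocks it actually wrote.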
EaaS – Background
• bwFLA project from the University of Freiburg in Germany (http://bw-fla.uni-freiburg.de)
• Personally collaborated with bwFLA at Freiburg while at Archives New Zealand
• Now at Yale University Library and brought the collaboration along
• Yale University Library has the only installation outside of Germany
• Testing and providing requirements for ongoing development
• Planning to implement it in a production-ready environment next financial year
Emulation as a Service (EaaS) – Why?
• A lot of old digital content can only be properly accessed using emulation tools
• Emulation is technically specialized
• Old software can be challenging for modern users to understand
• Modern users don't expect to have to come into a reading room to access digital content
• Maintain control over content: users can't copy data in or out unless authorized (screenshots are inevitably excluded)
• Strong separation between environments, objects and emulators/configurations
• Emulation can be provided remotely (outsourced), with disk image archives and/or content maintained locally
• Small derivative environments can be created from base environments, saving space
• Standard environments can be reused and customized
• Provides the ability to cite environments
EaaS usage examples
• Puppet Motel
• Hebrew Texts
• Companies Data
• See: http://blogs.loc.gov/digitalpreservation/2014/08/emulation-as-a-service-eaas-at-yale-university-library/
EaaS – How it works
Architecture and design
EaaS – How it works (For Technical Administrators)
• Admins configure an emulator on a local PC
• Admins configure the emulated software on a local PC
• The configured environment is saved as a "disk image" with configuration metadata
• Admins confirm that the software environment stored on the disk image works on the local PC
• Admins/archivists/librarians ingest it into the EaaS service
EaaS – How it works (For Librarians/Archivists)
• Pre-configured software environments (e.g. a Windows 95 + Office 95 environment) can have files added to them and be saved as a variant or as a stand-alone new environment
• Only the difference (delta) between the base environment and the customized environment is retained, saving space by not duplicating virtual hard drive content
• CD-ROMs and other software can be ingested, installed/configured on top of a base environment, and tested using an online interface
• Newly customized environments can be stored for future use and further customization
• Librarians/archivists can also ingest disk images captured from machines they have acquired (e.g. authors'/politicians' desktops)
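Keeping only the delta between a base environment and its customized variants can be sketched as a chain of layers, where a read falls through to the parent layer whenever the variant has not changed that block. The layer names, block size and block-level granularity are assumptions for illustration; in practice the same effect is often obtained from copy-on-write disk image formats such as QEMU's qcow2 with backing files.

```python
from typing import Optional

class EnvironmentLayer:
    """A stored environment: its own changed blocks plus an optional parent layer."""

    def __init__(self, name: str, blocks: dict[int, bytes],
                 parent: Optional["EnvironmentLayer"] = None) -> None:
        self.name = name
        self.blocks = blocks      # only the blocks that differ from the parent
        self.parent = parent

    def read(self, block: int) -> bytes:
        # Walk from this variant back towards the base until a layer holds the block.
        layer: Optional[EnvironmentLayer] = self
        while layer is not None:
            if block in layer.blocks:
                return layer.blocks[block]
            layer = layer.parent
        raise KeyError(f"block {block} not found in {self.name} or its ancestors")

    def stored_bytes(self) -> int:
        """Bytes stored by this layer alone (ancestors not counted)."""
        return sum(len(data) for data in self.blocks.values())

    def derive(self, name: str, changes: dict[int, bytes]) -> "EnvironmentLayer":
        """Create a variant that stores only its changed blocks."""
        return EnvironmentLayer(name, dict(changes), parent=self)

# Hypothetical base environment: 1,000 blocks of 4 KiB each.
base = EnvironmentLayer("win95-office95", {i: bytes(4096) for i in range(1000)})
# Hypothetical variant into which a collection's files were copied (10 changed blocks).
variant = base.derive("win95-office95+collection", {i: b"\x01" * 4096 for i in range(10)})
print(variant.stored_bytes(), base.stored_bytes())  # prints 40960 4096000
```

In this toy example the variant costs 1% of the base's storage while still presenting a complete disk, which is the space saving the slides describe for derivative environments.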
EaaS – How it works (For end-users)
• Users can click on links in a catalogue/finding aid to access environments/content
EaaS – How it works (For developers and system integrators)
• Provides generic access to the functionality of many emulators and virtualization tools via a web service and REST API
• Emulation functionality can be incorporated into existing workflows
• Emulated (or virtualized) environments can be embedded into web pages for online access and online exhibitions
• Emulated environment citations, thumbnails, and URIs/URLs enable easy integration with existing catalogues and finding aids
• One-click "image-disk-and-emulate" workflows are being developed (in collaboration with digital forensics initiatives)
EaaS  Demo
Thank you. (Semi-)Public demo:
https://demo.bw-fla.uni-freiburg.de
Username: bwfla
Password: demo
Olive  Demo
Execution Fidelity
Ability to precisely reproduce execution
Many moving parts
• hardware
• operating system
• dynamically linked libraries
• configuration parameters
• language settings
• time zone settings
• …
Very difficult to achieve and then maintain
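Recording the factors listed above does not reproduce them, but it lets drift be detected. The sketch below captures a few of them (operating system, runtime, locale, time zone) with Python's standard `platform`, `locale` and `time` modules and hashes the result; which fields to record is an assumption, and dynamically linked libraries and application configuration would need far more detailed capture.

```python
import hashlib
import json
import locale
import platform
import time

def execution_fingerprint() -> dict[str, str]:
    """Record a few host settings that can change how software behaves."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "runtime": f"Python {platform.python_version()}",
        # Passing None only queries the current locale; it does not change it.
        "locale": str(locale.setlocale(locale.LC_CTYPE, None)),
        "time_zone": "/".join(time.tzname),
    }

def fingerprint_digest(fp: dict[str, str]) -> str:
    # Canonical JSON (sorted keys) so identical settings always hash identically.
    return hashlib.sha256(json.dumps(fp, sort_keys=True).encode("utf-8")).hexdigest()

fp = execution_fingerprint()
print(json.dumps(fp, indent=2))
print(fingerprint_digest(fp))
```

Comparing a stored digest with one taken at replay time flags that something changed, but not what, and says nothing about factors the fingerprint never captured, which is why the slides conclude that fidelity is very difficult to achieve and maintain this way.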
Transform into a Scaling Problem
Pack up and carry the entire environment with you
(including the OS)
Transitive closure of everything you need
Central idea of a (hardware) virtual machine (VM)
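The "transitive closure of everything you need" can be computed as a breadth-first traversal over declared dependencies. The dependency graph below is a hypothetical example loosely modelled on the layers discussed in this talk; real dependency information would have to come from package metadata or analysis of an installation.

```python
from collections import deque

def transitive_closure(graph: dict[str, list[str]], root: str) -> set[str]:
    """Every component reachable from root via dependency edges (root included)."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in seen:       # visit each component once, even if shared
                seen.add(dep)
                queue.append(dep)
    return seen

# Hypothetical dependency graph for one archived simulation model.
deps = {
    "tree-growth-model.xls": ["Excel 5.0"],
    "Excel 5.0": ["Windows 3.1", "Arial font"],
    "Arial font": ["Windows 3.1"],
    "Windows 3.1": ["MS-DOS 6.22", "x86 PC hardware"],
    "MS-DOS 6.22": ["x86 PC hardware"],
}
print(sorted(transitive_closure(deps, "tree-growth-model.xls")))
```

A VM image captures this whole closure at once, including components nobody thought to list, which is why packing up the entire environment turns a fidelity problem into a size (scaling) problem.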
But VMs are Huge!
10 GB VM
• @ 100 Mbps → at least 800 seconds (13 minutes) download
• @ 10 Mbps → at least 8000 seconds (over two hours) download
No one will wait that long to look at something briefly!
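The download figures above follow from a back-of-the-envelope calculation. It assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second) and a link running at full speed with no protocol overhead, which is why the results are lower bounds.

```python
def min_download_seconds(size_gb: float, link_mbps: float) -> float:
    """Lower bound on transfer time: size in decimal GB, link speed in decimal Mbps."""
    bits = size_gb * 1e9 * 8              # bytes -> bits
    return bits / (link_mbps * 1e6)       # bits / (bits per second)

for mbps in (100, 10):
    secs = min_download_seconds(10, mbps)
    print(f"{mbps:>3} Mbps: {secs:,.0f} s ({secs / 60:.1f} min)")
```

A 10 GB image is 8 × 10^10 bits, so 800 s at 10^8 bit/s and 8000 s (about 2.2 hours) at 10^7 bit/s.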
How do we achieve quick launch?
[Diagram: video streaming over the Internet]
VM Streaming Not So Easy
Access to VM image is not linear
Reference pattern depends on many runtime factors
• data dependencies
• human interaction
• spatial and temporal locality (program behavior)
Borrow an old idea from operating systems
• demand paging
• intercept missing VM pieces and fetch over Internet
• prefetching can mask stalls due to demand misses
(if hints are good)
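The demand-paging idea can be sketched as a chunk cache in front of a remote VM image: a read that misses the cache fetches that chunk (a demand miss), and a simple sequential prefetcher also pulls the next few chunks. The chunk granularity, prefetch depth and `fetch_chunk` callback are assumptions; sequential read-ahead is only the simplest possible hint, not Olive's actual prefetching policy, and a real client would prefetch asynchronously rather than inline as here.

```python
from typing import Callable

class DemandPagedImage:
    """Fetch VM image chunks only when first touched, plus naive read-ahead."""

    def __init__(self, fetch_chunk: Callable[[int], bytes], num_chunks: int,
                 prefetch_depth: int = 2) -> None:
        self._fetch = fetch_chunk          # e.g. one HTTP range request per chunk
        self._num_chunks = num_chunks
        self._depth = prefetch_depth
        self._cache: dict[int, bytes] = {}
        self.demand_misses = 0             # reads that had to wait for the network
        self.fetches = 0                   # total chunks transferred

    def _load(self, index: int) -> None:
        if index not in self._cache and 0 <= index < self._num_chunks:
            self._cache[index] = self._fetch(index)
            self.fetches += 1

    def read_chunk(self, index: int) -> bytes:
        if not 0 <= index < self._num_chunks:
            raise IndexError(index)
        if index not in self._cache:
            self.demand_misses += 1        # the guest would stall here
            self._load(index)
        for ahead in range(index + 1, index + 1 + self._depth):
            self._load(ahead)              # sequential prefetch hint
        return self._cache[index]

remote = {i: bytes([i]) * 4 for i in range(8)}   # stand-in for the remote image
image = DemandPagedImage(lambda i: remote[i], num_chunks=8, prefetch_depth=2)
for i in range(6):
    image.read_chunk(i)
print(image.demand_misses, image.fetches)
```

For this purely sequential access pattern only the first read stalls; with `prefetch_depth=0` all six reads would be demand misses. Non-linear access patterns, as the slide notes, are exactly where good hints matter.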
Olive Implementation
Client Structure
Host environment (e.g. Laptop/Linux):
1. Today's Hardware (x86)
2. Operating System (Linux) (host OS)
3. VMNetX (demand paging and prefetching of VM state; Olive caching)
4. Virtual Machine Monitor (KVM/QEMU) (virtualizes host hardware)
Guest environment, a Virtual Machine (streamed over the Internet from the Olive archive):
5. Hardware emulator (e.g. Basilisk II) (not needed if old hardware was x86)
6. Old Operating System (guest OS) (e.g. Windows 3.1)
7. Old Application (e.g. Great American History Machine)
8. Data file, Script, Simulation Model, etc. (e.g. Excel spreadsheet)
Olive caching
Virtualize host hardware
Linux
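[Diagram: VMNetX client. The guest OS and guest app run in KVM/QEMU (the VMM), which reads the VM image file through FUSE. VMNetX backs the image file with a pristine cache and a modified cache, and retrieves data from the Olive server, an unmodified web server, via standard HTTP range requests.]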
https://youtu.be/J32NFUIC4m4
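Because VMNetX fetches VM image data with standard HTTP range requests, an unmodified web server can act as the Olive server. The sketch below shows how a chunk index maps to an HTTP `Range` header, and a lookup order consistent with the pristine and modified caches in the diagram. The chunk size, URL and class names are hypothetical, and the request is only constructed with `urllib.request.Request`, never sent.

```python
import urllib.request
from typing import Callable

CHUNK_SIZE = 128 * 1024   # hypothetical chunk size in bytes

def range_header(chunk: int, chunk_size: int = CHUNK_SIZE) -> str:
    """HTTP Range header value for one chunk (byte ranges are inclusive)."""
    start = chunk * chunk_size
    return f"bytes={start}-{start + chunk_size - 1}"

def chunk_request(url: str, chunk: int) -> urllib.request.Request:
    return urllib.request.Request(url, headers={"Range": range_header(chunk)})

class CachedImage:
    """Reads check local modifications, then pristine cached data, then the server."""

    def __init__(self, fetch: Callable[[int], bytes]) -> None:
        self._fetch = fetch
        self.pristine: dict[int, bytes] = {}   # chunks exactly as served by the archive
        self.modified: dict[int, bytes] = {}   # chunks the guest has written

    def read(self, chunk: int) -> bytes:
        if chunk in self.modified:
            return self.modified[chunk]
        if chunk not in self.pristine:
            self.pristine[chunk] = self._fetch(chunk)
        return self.pristine[chunk]

    def write(self, chunk: int, data: bytes) -> None:
        self.modified[chunk] = data            # in this sketch, writes stay local

req = chunk_request("https://olive.example.org/vm/image.img", 2)
print(req.get_header("Range"))  # prints bytes=262144-393215
```

Keeping guest writes separate from the pristine cache means the pristine copy can be reused for any later session of the same VM, while the archived image on the server is never altered.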
Looking Ahead
Many Technical Challenges
Scaling and performance issues
• VMs keep getting bigger, networks are never fast enough
• clever prefetching techniques
Precise emulation of hardware
• even x86 extended memory modes not quite right in QEMU
(can’t boot Windows 95 in KVM/QEMU)
• exotic hardware platforms
• host compatibility (e.g. CPU flags in x86) vs performance
• hardware performance accelerators (e.g. GPUs)
Multi-VM ensembles (e.g. HPC environments)
Tools for easy building of VMs (physical to virtual?)
Archiving entire cloud services
… many others …
We are a long way from being “done”!
Closing Thoughts
Archiving static content transformed human history
Archiving executable content will be equally transformative
Strong interest from university libraries, philanthropic foundations (e.g.
Sloan, Mellon), and national institutions (e.g. National Archives, Library
of Congress) to create a public good:
Olive reference library for the nation and the world
Library of Alexandria
I wonder what Isaac’s model would
say about this new data?
reaching back in time
Isaac’s archived VM image
Potential to Transform Scholarship
More information
https://olivearchive.org/
Keith Webster
k.webster@library.uq.edu.au
kgw@cmu.edu
@uqkeithw, @cmkeithw
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Recently uploaded (20)

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Software curation as a digital preservation service

  • 1. Software curation as a digital preservation service. Euan Cochrane, Yale University Library; Keith Webster, Dean of University Libraries; @cmkeithw @euanc
  • 3. Archiving Static Content (April 1, 2015)
  • 4. What About Executable Content? Games
  • 5. What About Executable Content? Games; application-specific content. WordPerfect 1.0 doc: can you read it today? 100 years from now? Original Wang doc: can you read it today? 100 years from now? Simulation model: can you re-run the old model with new data?
  • 10. We have spent 20 years converting material to digital form, establishing standards and protocols, and looking after it.
  • 16. We also have a track record in curating born-digital content
  • 17. And some of us are making progress with social media products
  • 18. • The rapid development in computing technology and the Internet have opened up new applications for the basic sources of research — the base material of research data — which has given a major impetus to scientific work in recent years. • Access to research data increases the returns from public investment in this area; reinforces open scientific inquiry; encourages diversity of studies and opinion; promotes new areas of work and enables the exploration of topics not envisioned by the initial investigators. • The value of data lies in their use. Full and open access to scientific data should be adopted as the international norm for the exchange of scientific data derived from publicly funded research. What about the products of research?
  • 25. The data may still be discoverable and accessible, but are they executable?
  • 26. Data come in different forms, shapes and sizes
  • 32. Why? – Software dependent content. [Figure: Operating System Usage Over Time, 2003–2015, share of usage (0–80%) for Win8, Win7, Vista, Win2003, Older Win, WinXP, W2000, Win98, Win95, WinNT, Linux, Mac, Mobile]
  • 33. Old software is required to authentically render old content. Screenshots: original content in original software (WordPerfect in Windows 95); original content in newer software (LibreOffice Writer in Windows Vista)
  • 34. Research results are at risk of loss without the original software. Screenshots: original content in original software (WordStar for DOS in Microsoft DOS) [NB: the equation predicting tree growth rates includes exponents documented using the upper line of text]; original content in newer software (LibreOffice Writer in Windows Vista) [NB: equation layout and meaning changed]
  • 35. Why? – Software dependent content • We need to curate and preserve operating systems to support access to assets that depend on them • We need to curate and preserve software applications to support access to content that depends on them • We need to curate and preserve fonts, scripts, plug-ins and other dependencies to support access to content that requires them • We need to preserve whole desktop environments (e.g. Salman Rushdie's desktop at Emory University) to support access to the experience of interacting with them • We need to curate and preserve pre-configured disk images with software already installed on them, for running on emulated hardware
  • 37. How? – Emulation/Virtualization • An emulation software package ("emulator") is used to create a virtual version of one computer within another computer that has different hardware • Old software can be run on the emulated computer hardware just as it ran on the original physical computer • Many emulators were originally developed to run old video games
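To make the idea concrete, here is a minimal sketch of what an emulator does at its core: a fetch-decode-execute loop that interprets another machine's instructions in software. The three-instruction guest machine below (its LOAD, ADD and HALT opcodes and single accumulator) is entirely hypothetical; real emulators such as QEMU or Basilisk II also model memory maps, devices and timing.

```python
from dataclasses import dataclass

# Opcodes of a made-up 8-bit guest machine (hypothetical, for illustration only).
LOAD, ADD, HALT = 0x01, 0x02, 0xFF


@dataclass
class GuestCPU:
    memory: bytearray
    pc: int = 0          # program counter of the emulated machine
    acc: int = 0         # single accumulator register
    halted: bool = False

    def step(self) -> None:
        """Fetch one guest instruction, decode it, and execute it in software."""
        opcode = self.memory[self.pc]
        if opcode == LOAD:
            self.acc = self.memory[self.pc + 1]
            self.pc += 2
        elif opcode == ADD:
            # Wrap at 8 bits, as the emulated hardware register would.
            self.acc = (self.acc + self.memory[self.pc + 1]) & 0xFF
            self.pc += 2
        elif opcode == HALT:
            self.halted = True
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at {self.pc}")

    def run(self, max_steps: int = 10_000) -> int:
        steps = 0
        while not self.halted and steps < max_steps:
            self.step()
            steps += 1
        return self.acc


program = bytearray([LOAD, 40, ADD, 2, HALT])
print(GuestCPU(program).run())  # 42
```

The host never executes the guest's instructions directly; every one is decoded and carried out by host code, which is what lets old software run on hardware it was never built for.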
  • 38. How? – Emulation/Virtualization • Emulation is often used to support old hardware devices that require obsolete software (e.g. assembly line management software, scientific instruments, industrial machinery, etc.) • Emulation is widely used by mobile phone application developers to develop software for phone hardware on desktop PC hardware (i.e. phone hardware is emulated on desktop PCs to build phone-compatible applications) • Virtualization = emulation, but with compatible hardware (some of the host machine's hardware is used directly by the "virtualized" computer). Virtualization bridges the gap between the departure of recently obsolete hardware and the arrival of hardware powerful enough to emulate it
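The distinction drawn above can be sketched as a greatly simplified decision: if the host CPU can execute the guest's instruction set natively, the environment can be virtualized; otherwise every instruction must be emulated. The compatibility table and function name are hypothetical; real choices also weigh device support and CPU feature flags.

```python
# Hypothetical table: guest instruction sets each host CPU can execute natively.
NATIVE_GUESTS = {
    "x86_64": {"x86_64", "i386"},   # 64-bit x86 CPUs also run 32-bit x86 code
    "i386": {"i386"},
    "aarch64": {"aarch64"},
}


def choose_strategy(guest_arch: str, host_arch: str, hw_virt_available: bool = True) -> str:
    """Return 'virtualize' if guest code can run directly on the host CPU, else 'emulate'."""
    runs_natively = guest_arch in NATIVE_GUESTS.get(host_arch, set())
    if runs_natively and hw_virt_available:
        return "virtualize"  # e.g. KVM: guest instructions execute on the real CPU
    return "emulate"         # every guest instruction is translated or interpreted in software


print(choose_strategy("i386", "x86_64"))  # virtualize (e.g. Windows 3.1 on a modern PC)
print(choose_strategy("m68k", "x86_64"))  # emulate (e.g. classic 68k Mac software)
```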
  • 40. How? – Documentation • We need unique, persistent identifiers for software • We need software catalogues • We need unique, persistent identifiers for disk images (installed environments/virtual hard drives) • We need disk image/virtual hard drive catalogues • We need unique, persistent identifiers for emulated/virtualized hardware configurations • We need hardware configuration catalogues. We don't have these (yet!)* *Mostly: the Internet Archive is doing great work, as are NIST and PRONOM
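A minimal sketch of how the three catalogues might link together, assuming every record carries a persistent identifier. Here identifiers are ARK-like strings minted from UUIDs; the `ark:/99999/` prefix, the record fields and the example titles are hypothetical. A disk image record points at the software installed on it and at the hardware configuration it needs, so a whole environment can be resolved from one identifier.

```python
import uuid
from dataclasses import dataclass, field


def mint_id(kind: str) -> str:
    """Mint an opaque identifier; the ark:/99999/ prefix is a placeholder, not a registered scheme."""
    return f"ark:/99999/{kind}-{uuid.uuid4().hex[:12]}"


@dataclass(frozen=True)
class SoftwareRecord:
    pid: str
    title: str
    version: str


@dataclass(frozen=True)
class HardwareConfig:
    pid: str
    emulator: str
    cpu: str
    memory_mb: int


@dataclass(frozen=True)
class DiskImageRecord:
    pid: str
    installed_software: tuple[str, ...]  # PIDs of SoftwareRecord entries
    hardware_config: str                 # PID of a HardwareConfig entry


@dataclass
class Catalogue:
    software: dict[str, SoftwareRecord] = field(default_factory=dict)
    hardware: dict[str, HardwareConfig] = field(default_factory=dict)
    images: dict[str, DiskImageRecord] = field(default_factory=dict)

    def describe(self, image_pid: str) -> str:
        """Resolve a disk image PID into a human-readable environment description."""
        image = self.images[image_pid]
        hw = self.hardware[image.hardware_config]
        titles = ", ".join(
            f"{self.software[p].title} {self.software[p].version}" for p in image.installed_software
        )
        return f"{titles} on {hw.cpu} ({hw.memory_mb} MB) via {hw.emulator}"


cat = Catalogue()
os_rec = SoftwareRecord(mint_id("sw"), "MS-DOS", "6.22")
wp_rec = SoftwareRecord(mint_id("sw"), "WordPerfect", "5.1")
hw = HardwareConfig(mint_id("hw"), "QEMU", "i386", 16)
img = DiskImageRecord(mint_id("img"), (os_rec.pid, wp_rec.pid), hw.pid)
for rec in (os_rec, wp_rec):
    cat.software[rec.pid] = rec
cat.hardware[hw.pid] = hw
cat.images[img.pid] = img
print(cat.describe(img.pid))  # MS-DOS 6.22, WordPerfect 5.1 on i386 (16 MB) via QEMU
```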
  • 41. How? – Configuring emulated hardware • Admins configure an emulator • Admins install and/or configure the emulated software • Requires various emulator-specific, technically challenging tools
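As an illustration of the emulator-specific tooling involved, here is a sketch that assembles a QEMU command line for a DOS-era PC environment. It uses standard `qemu-system-i386` options (`-m` for memory in MiB, `-hda` and `-cdrom` for drives, `-boot` with `c` for hard disk or `d` for CD-ROM, and `-snapshot` to write guest changes to temporary files instead of the image). The file names are hypothetical, and the function only builds the argument list; it launches nothing.

```python
import shlex
from typing import Optional


def qemu_command(disk_image: str, memory_mb: int = 16,
                 install_cd: Optional[str] = None, discard_changes: bool = False) -> list[str]:
    """Build (but do not run) a qemu-system-i386 argument list."""
    args = ["qemu-system-i386", "-m", str(memory_mb), "-hda", disk_image]
    if install_cd:
        # Attach installation media and boot from CD-ROM ("d") instead of the hard disk ("c").
        args += ["-cdrom", install_cd, "-boot", "d"]
    else:
        args += ["-boot", "c"]
    if discard_changes:
        # Leave the disk image untouched: guest writes go to temporary files.
        args.append("-snapshot")
    return args


# An admin installing software keeps the changes; a reading-room session would discard them.
cmd = qemu_command("dos622.img", install_cd="apps.iso")
print(shlex.join(cmd))  # qemu-system-i386 -m 16 -hda dos622.img -cdrom apps.iso -boot d
```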
  • 42. How? – Accessing emulated environments at libraries and archives • Users access emulated environments via dedicated machines • Use dedicated software • At libraries and archives this is mostly restricted to reading rooms
  • 43. How? – This is too hard!
  • 44. Emulation as a Service
  • 45. Emulation as a Service – What is it? ✓ Remote access to pre-configured emulated and virtualized environments via any modern web browser ✓ Abstracts configuration challenges away from end-users ✓ Changes to environments can be saved or discarded at the end of a session (a fresh, unchanged version is always available) ✓ Interactivity can be restricted where appropriate (e.g. limited ability to download or copy content to the local computer) ✓ Relatively simple way to provide custom online environments (virtual reading rooms?)
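The save-or-discard behaviour can be modelled as a copy-on-write overlay: the base environment is never written during a session, all writes land in a per-session overlay, and at the end the overlay is either kept as a new variant or thrown away. This is a hypothetical in-memory sketch (block numbers mapped to bytes), not the bwFLA implementation.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass
class Session:
    base: Mapping[int, bytes]                      # read-only base environment blocks
    overlay: dict[int, bytes] = field(default_factory=dict)

    def read(self, block: int) -> bytes:
        # Session writes shadow the base; unwritten blocks read as zeros.
        return self.overlay.get(block, self.base.get(block, b"\x00"))

    def write(self, block: int, data: bytes) -> None:
        self.overlay[block] = data                 # the base is never modified

    def end(self, save: bool) -> Optional[dict[int, bytes]]:
        """Return the changes to keep as a variant, or None if they are discarded."""
        changes = dict(self.overlay) if save else None
        self.overlay.clear()
        return changes


base = MappingProxyType({0: b"boot", 1: b"docs"})
s = Session(base)
s.write(1, b"edited")
print(s.read(1))          # b'edited'
print(s.end(save=False))  # None
print(s.read(1))          # b'docs' (the fresh, unchanged version)
```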
  • 46. EaaS – Background • bwFLA project from the University of Freiburg in Germany (http://bw-fla.uni-freiburg.de) • Personally collaborated with bwFLA at Freiburg while at Archives New Zealand • Now at Yale University Library, and brought the collaboration along • Yale University Library has the only installation outside of Germany • Testing and providing requirements for ongoing development • Planning to implement it in a production-ready environment next financial year
  • 47. Emulation as a Service (EaaS) – Why? • A lot of old digital content can only be properly accessed using emulation tools • Emulation is technically specialized • Old software can be challenging for modern users to understand • Modern users don't expect to have to come into a reading room to access digital content • Maintain control over content: users can't copy data in or out unless authorized (screenshots are the unavoidable exception)
  • 48. Emulation as a Service (EaaS) – Why? • Strong separation between environments, objects and emulators/configurations • Emulation can be provided remotely (outsourced) with disk image archives and/or content maintained locally • Small derivative environments can be created from base environments, saving space • Standard environments can be reused and customized • Provides the ability to cite environments
  • 49. EaaS usage examples • Puppet Motel • Hebrew Texts • Companies Data • See: http://blogs.loc.gov/digitalpreservation/2014/08/emulation-as-a-service-eaas-at-yale-university-library/
  • 50. EaaS – How it works: Architecture and design
  • 51. EaaS – How it works (For Technical Administrators) • Admins configure an emulator on a local PC • Admins configure the emulated software on a local PC • The configured environment gets saved as a "disk image" with configuration metadata
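A minimal sketch of saving configuration metadata alongside a disk image, assuming a JSON sidecar file and a SHA-256 checksum for later fixity checks. The field names and the `<image>.json` sidecar convention are hypothetical, not the EaaS metadata format.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the image in chunks so multi-gigabyte files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def save_environment(image: Path, emulator: str, config: dict) -> Path:
    """Write <image>.json describing how to run the image (hypothetical schema)."""
    sidecar = image.with_name(image.name + ".json")
    record = {
        "image": image.name,
        "sha256": sha256_of(image),
        "size_bytes": image.stat().st_size,
        "emulator": emulator,
        "config": config,
    }
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar


with tempfile.TemporaryDirectory() as d:
    img = Path(d) / "win31.img"
    img.write_bytes(b"\x00" * 4096)  # stand-in for a real disk image
    meta = save_environment(img, "qemu-system-i386", {"memory_mb": 16, "boot": "c"})
    print(json.loads(meta.read_text())["size_bytes"])  # 4096
```

Recording a checksum at save time lets the service later confirm that the image it boots is the one the administrator tested.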
  • 52. EaaS – How it works (For Technical Administrators) • Admins confirm that the software environment stored on the disk image works on a local PC • Admins/Archivists/Librarians ingest it into the EaaS service
  • 53. EaaS – How it works (For Librarians/Archivists) • Pre-configured software environments (e.g. a Windows 95 + Office 95 environment) can have files added to them and be saved as a variant or as a stand-alone new environment • Only the difference (delta) between the base environment and the customized environment is retained, saving space by not duplicating virtual hard drive content
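Storing only the difference can be sketched at the block level: split both images into fixed-size blocks, keep the blocks that changed, and rebuild the customized image by applying them over the base. The 4 KiB block size and the function names are assumptions for illustration; in practice, disk formats such as QEMU's qcow2 achieve a similar effect with copy-on-write overlays on a backing file.

```python
BLOCK = 4096  # bytes per block (assumed)


def make_delta(base: bytes, derived: bytes, block: int = BLOCK) -> tuple[int, dict[int, bytes]]:
    """Return the derived image's length and only the blocks that differ from the base."""
    changed = {}
    for offset in range(0, len(derived), block):
        new = derived[offset:offset + block]
        if base[offset:offset + block] != new:
            changed[offset // block] = new
    return len(derived), changed


def apply_delta(base: bytes, length: int, changed: dict[int, bytes], block: int = BLOCK) -> bytes:
    """Rebuild the derived image from the base plus its stored delta."""
    out = bytearray(base[:length].ljust(length, b"\x00"))
    for index, data in changed.items():
        out[index * block:index * block + len(data)] = data
    return bytes(out)


base = bytes(BLOCK * 4)                      # a 16 KiB "base environment" of zeros
derived = bytearray(base)
derived[BLOCK * 2:BLOCK * 2 + 5] = b"hello"  # add a small file inside block 2
length, delta = make_delta(base, bytes(derived))
print(sorted(delta))                         # [2]
assert apply_delta(base, length, delta) == bytes(derived)
```

Only one 4 KiB block is stored for the variant instead of a second 16 KiB copy; the saving grows with the size of the base image.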
  • 54. EaaS – How it works (For Librarians/Archivists) • CD-ROMs and other software can be ingested, installed/configured on top of a base environment, and tested using an online interface • The newly customized environment can be stored for future use and further customization
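Because each customized environment is built on top of another, environments form a tree. A sketch of recording that derivation, assuming each environment stores a pointer to its parent and a note of what it added (all identifiers are hypothetical): walking the parent pointers recovers the chain back to the base environment, which is also the order in which stored deltas would be applied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Environment:
    env_id: str
    parent: Optional[str]  # None for a base environment
    change: str            # what this layer added, e.g. software installed from a CD-ROM


def lineage(envs: dict[str, Environment], env_id: str) -> list[str]:
    """Return environment IDs ordered from the base down to env_id."""
    chain = []
    seen = set()
    current: Optional[str] = env_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        chain.append(current)
        current = envs[current].parent
    return list(reversed(chain))


envs = {e.env_id: e for e in [
    Environment("win95-base", None, "Windows 95 installed"),
    Environment("win95-office", "win95-base", "Office 95 installed"),
    Environment("win95-office-cd42", "win95-office", "CD-ROM title installed and tested"),
]}
print(lineage(envs, "win95-office-cd42"))
# ['win95-base', 'win95-office', 'win95-office-cd42']
```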
  • 55. EaaS – How it works (For Librarians/Archivists) • Librarians/Archivists can also ingest disk images captured from machines they have acquired (e.g. authors'/politicians' desktops)
  • 56. EaaS – How it works (For end-users) • Users can click on links in a catalogue/finding aid to access environments/content
  • 57. EaaS – How it works (For developers and system integrators) • Provides generic access to the functionality of many emulators and virtualization tools via a web service and REST API • Emulation functionality can be incorporated into existing workflows • Emulated (or virtualized) environments can be embedded into web pages for online access and online exhibitions • Emulated environment citations, thumbnails, and URIs/URLs enable easy integration with existing catalogues and finding aids • One-click "image-disk-and-emulate" workflows are being developed (in collaboration with digital forensics initiatives)
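As an example of the integration points listed above, the sketch below generates a catalogue link and an embeddable `<iframe>` for an environment. The base URL, the `/environments/{id}` path and the `object` query parameter are hypothetical placeholders, not the actual bwFLA/EaaS REST API; a real integration would follow the service's own documentation.

```python
from html import escape
from typing import Optional
from urllib.parse import quote, urlencode


def environment_url(base_url: str, env_id: str, object_id: Optional[str] = None) -> str:
    """Build a (hypothetical) link that opens an environment, optionally with an object attached."""
    url = f"{base_url.rstrip('/')}/environments/{quote(env_id, safe='')}"
    if object_id:
        url += "?" + urlencode({"object": object_id})
    return url


def embed_snippet(url: str, width: int = 800, height: int = 600) -> str:
    """HTML that embeds the emulated environment in a web page or online exhibition."""
    return f'<iframe src="{escape(url, quote=True)}" width="{width}" height="{height}"></iframe>'


url = environment_url("https://eaas.example.org/", "base-env-01", "cdrom-0042")
print(url)  # https://eaas.example.org/environments/base-env-01?object=cdrom-0042
print(embed_snippet(url))
```

Because the link is an ordinary URL, it can sit in a catalogue record or finding aid just like any other access link.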
  • 59. Thank you. (Semi-)Public Demo: https://demo.bw-fla.uni-freiburg.de Username: bwfla Password: demo
  • 61. Execution Fidelity: the ability to precisely reproduce execution. Many moving parts • hardware • operating system • dynamically linked libraries • configuration parameters • language settings • time zone settings • … Very difficult to achieve and then maintain
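One way to see how many moving parts are involved is to record a fingerprint of the execution environment and compare it between runs. The sketch below captures a few of the factors listed above using Python's standard `platform`, `locale`, `time` and `sys` modules; the choice of fields is illustrative, and a real fidelity check would cover far more (libraries, configuration files, hardware details).

```python
import locale
import platform
import sys
import time


def fingerprint() -> dict[str, str]:
    """Capture a few execution-environment factors that can change program behaviour."""
    return {
        "machine": platform.machine(),                    # hardware architecture, e.g. x86_64
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),              # one runtime dependency among many
        "locale": locale.setlocale(locale.LC_CTYPE),      # query only: language/encoding setting
        "timezone": "/".join(time.tzname),                # time zone settings
        "byteorder": sys.byteorder,
    }


def differences(recorded: dict[str, str], current: dict[str, str]) -> dict[str, tuple]:
    """Return {factor: (recorded, current)} for every factor that no longer matches."""
    keys = recorded.keys() | current.keys()
    return {k: (recorded.get(k), current.get(k))
            for k in sorted(keys) if recorded.get(k) != current.get(k)}


# Pretend the original run happened on a machine set to US Eastern time.
then = {**fingerprint(), "timezone": "EST/EDT"}
print(differences(then, fingerprint()))
```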
  • 62. Transform into a Scaling Problem: pack up and carry the entire environment with you (including the OS), i.e. the transitive closure of everything you need. This is the central idea of a (hardware) virtual machine (VM)
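"Transitive closure of everything you need" can be made concrete with a dependency graph: start from the item you want to run and follow dependencies until nothing new turns up. The items and edges below are hypothetical; the point is that the closure quickly reaches the operating system and the hardware, which is why packing up the whole VM is the practical answer.

```python
from collections import deque


def transitive_closure(deps: dict[str, list[str]], root: str) -> set[str]:
    """Breadth-first walk collecting everything reachable from root (root included)."""
    needed = {root}
    queue = deque([root])
    while queue:
        item = queue.popleft()
        for dep in deps.get(item, []):
            if dep not in needed:  # visit shared dependencies only once
                needed.add(dep)
                queue.append(dep)
    return needed


deps = {
    "tree-growth-model.xls": ["Excel 5.0"],
    "Excel 5.0": ["Windows 3.1", "Arial font"],
    "Arial font": ["Windows 3.1"],
    "Windows 3.1": ["MS-DOS 6.22", "x86 PC hardware"],
    "MS-DOS 6.22": ["x86 PC hardware"],
}
print(sorted(transitive_closure(deps, "tree-growth-model.xls")))
# ['Arial font', 'Excel 5.0', 'MS-DOS 6.22', 'Windows 3.1', 'tree-growth-model.xls', 'x86 PC hardware']
```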
  • 63. But VMs are Huge! 10 GB VM • @ 100 Mbps → at least 800 seconds (13 minutes) download • @ 10 Mbps → at least 8000 seconds (over two hours) download No one will wait that long to look at something briefly! How do we achieve quick launch?
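The download times above follow from converting gigabytes to megabits (1 GB = 8,000 Mb in decimal units) and dividing by the link speed. This is a lower bound that ignores protocol overhead, which is why the slide says "at least"; the function name is illustrative.

```python
def min_download_seconds(size_gb: float, link_mbps: float) -> float:
    """Lower bound on transfer time: decimal units, no protocol overhead."""
    megabits = size_gb * 1000 * 8  # GB -> MB -> Mb
    return megabits / link_mbps


for mbps in (100, 10):
    s = min_download_seconds(10, mbps)
    print(f"{mbps} Mbps: {s:.0f} s ({s / 60:.1f} min)")
# 100 Mbps: 800 s (13.3 min)
# 10 Mbps: 8000 s (133.3 min)
```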
  • 65. VM Streaming Not So Easy: access to a VM image is not linear. The reference pattern depends on many runtime factors • data dependencies • human interaction • spatial and temporal locality (program behavior). Borrow an old idea from operating systems • demand paging • intercept missing VM pieces and fetch them over the Internet • prefetching can mask stalls due to demand misses (if hints are good)
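The demand-paging idea can be sketched as a chunk cache in front of a remote image: a read that misses fetches the chunk on demand, and a prefetcher also pulls in the next few chunks in the hope they will be needed soon. The chunk size, prefetch depth and injected `fetch` function are assumptions; the slide notes prefetching only helps if its hints are good, and the simple sequential hint used here is just one possible policy, not necessarily Olive's.

```python
from typing import Callable

CHUNK = 128 * 1024  # bytes per chunk (assumed)


class DemandPagedImage:
    """Fetch chunks of a remote VM image only when first touched, with sequential prefetch."""

    def __init__(self, fetch: Callable[[int], bytes], n_chunks: int, prefetch: int = 2):
        self.fetch = fetch        # retrieves one chunk from the remote archive
        self.n_chunks = n_chunks
        self.prefetch = prefetch  # chunks to pull in after each demand miss
        self.cache: dict[int, bytes] = {}
        self.misses = 0

    def _load(self, index: int) -> None:
        if index < self.n_chunks and index not in self.cache:
            self.cache[index] = self.fetch(index)

    def read(self, offset: int, length: int) -> bytes:
        """Return length bytes at offset, fetching only the chunks the read touches."""
        if length <= 0:
            return b""
        first, last = offset // CHUNK, (offset + length - 1) // CHUNK
        if last >= self.n_chunks:
            raise ValueError("read past end of image")
        for index in range(first, last + 1):
            if index not in self.cache:
                self.misses += 1  # demand miss: a real VM would stall here
                self._load(index)
                for ahead in range(index + 1, index + 1 + self.prefetch):
                    self._load(ahead)  # prefetch (done synchronously here for simplicity)
        data = b"".join(self.cache[i] for i in range(first, last + 1))
        start = offset - first * CHUNK
        return data[start:start + length]


fetched: list[int] = []


def fake_fetch(index: int) -> bytes:
    fetched.append(index)
    return bytes([index]) * CHUNK


image = DemandPagedImage(fake_fetch, n_chunks=10, prefetch=2)
image.read(0, 10)             # miss on chunk 0; chunks 1 and 2 are prefetched
image.read(CHUNK + 5, 10)     # hit: chunk 1 is already cached
print(image.misses, fetched)  # 1 [0, 1, 2]
```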
  • 67. Client Structure. Host environment (e.g. a laptop running Linux): 1. Today's hardware (x86) 2. Operating system (Linux) (host OS) 3. VMNetX (demand paging and prefetching of VM state; Olive caching) 4. Virtual machine monitor (KVM/QEMU), virtualizing the host hardware. Guest environment (a virtual machine streamed over the Internet from the Olive archive): 5. Hardware emulator (e.g. Basilisk II) (not needed if the old hardware was x86) 6. Old operating system (guest OS) (e.g. Windows 3.1) 7. Old application (e.g. Great American History Machine) 8. Data file, script, simulation model, etc. (e.g. Excel spreadsheet)
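  • 68. Linux Olive Implementation: the guest app and guest OS run in the KVM/QEMU VMM, which opens a VM image file exposed via FUSE by the VMNetX client; the client keeps a pristine cache and a modified cache and fetches missing data from the Olive server, an unmodified web server, via standard HTTP range requests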
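Because the Olive server is an unmodified web server, fetching part of a VM image needs nothing more than an HTTP `Range` header. The sketch below builds such a request with Python's `urllib.request` and shows one plausible lookup order for the two caches: locally modified data first, then the pristine copy, then the network. The URL, chunk size and class are hypothetical; the request is only constructed, never sent.

```python
from typing import Callable
from urllib.request import Request

CHUNK = 128 * 1024  # bytes per chunk (assumed)


def range_request(url: str, index: int, chunk: int = CHUNK) -> Request:
    """HTTP request for one chunk; byte ranges are inclusive at both ends."""
    start = index * chunk
    return Request(url, headers={"Range": f"bytes={start}-{start + chunk - 1}"})


class TwoLevelCache:
    def __init__(self, fetch: Callable[[int], bytes]):
        self.fetch = fetch
        self.pristine: dict[int, bytes] = {}  # chunks exactly as stored on the server
        self.modified: dict[int, bytes] = {}  # chunks the guest has written locally

    def read(self, index: int) -> bytes:
        if index in self.modified:
            return self.modified[index]
        if index not in self.pristine:
            # e.g. urlopen(range_request(url, index)).read() against a real server
            self.pristine[index] = self.fetch(index)
        return self.pristine[index]

    def write(self, index: int, data: bytes) -> None:
        self.modified[index] = data  # kept locally; the pristine copy stays intact


req = range_request("https://olive.example.org/vms/example.img", 2)
print(req.get_header("Range"))  # bytes=262144-393215
```

Keeping guest writes in a separate cache means the server never needs write support, and the pristine chunks can be reused by later sessions.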
  • 71. Many Technical Challenges Scaling and performance issues • VMs keep getting bigger, networks are never fast enough • clever prefetching techniques Precise emulation of hardware • even x86 extended memory modes not quite right in QEMU (can’t boot Windows 95 in KVM/QEMU) • exotic hardware platforms • host compatibility (e.g. CPU flags in x86) vs performance • hardware performance accelerators (e.g. GPUs) Multi-VM ensembles (e.g. HPC environments) Tools for easy building of VMs (physical to virtual?) Archiving entire cloud services … many others … We are a long way from being “done”!
  • 72. Closing Thoughts: Archiving static content transformed human history; archiving executable content will be equally transformative. Strong interest from university libraries, philanthropic foundations (e.g. Sloan, Mellon), and national institutions (e.g. National Archives, Library of Congress) to create a public good: an Olive reference library for the nation and the world (a Library of Alexandria). Potential to Transform Scholarship: reaching back in time to Isaac's archived VM image. "I wonder what Isaac's model would say about this new data?"