a centre of expertise in data curation and preservation
Archiving Web-based records IWMW 2006 15 June 2006
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5
UK: Scotland License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ or send a letter to Creative Commons,
543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
Archiving the Web: What can Institutions
learn from National and International Web
Archiving Initiatives
IWMW 2006, University of Bath, 15 June 2006
Maureen Pennock, UKOLN, University of Bath
Michael Day, UKOLN, University of Bath
Lizzie Richmond, University of Bath
Today’s workshop
• Records Management and the web:
• Key RM principles
• Justification for archiving web-based records
• Breakout 1 - to discuss the types of record found on the web
• An archivist's perspective:
• Authenticity, accessibility, security, legal compliance
• Breakout 2 - to discuss drivers and barriers
• An overview of selected national and international web
archiving initiatives:
• Breakout 3 - to develop approaches to preserving web sites
• Feedback
Web-Based Records
Philosophy
• Archiving web sites & web-based records requires
collaboration from all stakeholders, including
records managers, but also IT managers, web-project
managers, webmasters, content editors, content
providers, and even senior management, across the
entire life-cycle of the records
• BUT … there is a difference in approaches between
archiving websites and archiving web-based records
What is a record?
• BS ISO 15489 definition: “any information that is
created, received and maintained as evidence and
information by an organisation or person in
pursuance of legal obligations or in the transaction of
business”
• Evidence of a transaction
• Anything that:
• documents a working transaction between two or more
parties
• documents the mission and goals of an organisation
• was created or received in the course of carrying out the
mission and goals of an organisation
Key Records Management issues
• Proper care and management of records throughout
their entire life-cycle
• Not all data has to be retained
• Legal information obligations must be met
• Organisational retention schedules identify the record
classes of concern
• Different records and record classes have different
retention periods
• Metadata must be stored with records
• Disposal and destruction processes
Leads to archival and long-term storage for some records
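A retention schedule of this kind can be sketched as a simple lookup from record class to retention period. The record classes, periods and the WebRecord type below are invented for illustration and are not taken from any real schedule.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical schedule: record class -> retention period in years.
# None marks classes selected for permanent (archival) retention.
RETENTION_YEARS = {
    "committee-minutes": None,
    "prospectus": 10,
    "event-announcement": 1,
}


@dataclass
class WebRecord:
    url: str
    record_class: str
    created: date


def disposal_date(record: WebRecord) -> Optional[date]:
    """Return the date the record may be destroyed, or None if it is kept permanently."""
    years = RETENTION_YEARS[record.record_class]
    if years is None:
        return None
    try:
        return record.created.replace(year=record.created.year + years)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        return record.created.replace(year=record.created.year + years, day=28)
```

Records whose disposal date is None would be the ones that move on to archival and long-term storage.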
Why archive website ‘records’?
• Records are increasingly posted on the web
• Uniquely available informative records
• Users may act or take decisions based on this
information, with important consequences
• Records of business transactions
• Accountability & transparency
• To funding bodies
• To stakeholders
• For legal reasons
• Historically and culturally valuable
Breakout 1
• Discuss and identify the types of records that
can appear on the Web – e.g.:
• Reports, policy documents etc
• Information – submission dates, pricing etc
• Discuss and identify the forms they can take
– e.g.:
• Text-based files
• Web-forms
Feedback I
Archiving the Web
An (inexperienced) archivist's
perspective
More definitions…
• Records management:
“…the field of management responsible for the efficient and
systematic control of the creation, receipt, maintenance, use
and disposition of records, including processes for capturing
and maintaining evidence of and information about business
activities and transactions in the form of records.” (BS ISO
15489 - 2001)
• Archives:
“…documents, irrespective of form, medium or age, intended for
long-term preservation because of their continuing value.” (BS
5454 - 2000)
What we want from our records
and archives …
Authenticity:
• Must be demonstrably reliable as proof
• Creation and capture
• Metadata and context
• Ownership/responsibility
• Version control
• Cataloguing standards
What we want from our records
and archives …
Accessibility:
• Must be capable of use over time
• Locate, retrieve and display
• File plans, naming conventions
• Obsolescence
• Migration strategy
• Reduced functionality?
What we want from our records
and archives …
Security:
• Must be protected
• Physical damage and unauthorised access
• Robust destruction procedures
• Intellectual control
• Storage environment
• Disaster plan
What we want from our records
and archives …
Legal compliance:
• Must not break the law
• Freedom of Information Act 2000
• Data Protection Act 1998
• Copyright issues?
• Defence against litigation
• Legal admissibility
Breakout 2
• What are the main drivers for archiving web-
based records?
• Discuss and identify as many challenges or
barriers to archiving web-based records as
you can:
• Technical barriers
• Cultural barriers
• Socio-economic barriers
• Organisational barriers
Feedback II
Current Approaches to Archiving
the Web
National and International Initiatives
Some basics
• Not all web archives are organised on a records
management basis
• Most web archiving initiatives:
• Emphasise the informational value of the web as a cultural
phenomenon or communication medium
• Highlight the transience of content
• Focus largely on collecting content, less on providing long-
term access (or preservation)
• Have collection strategies that are based on what can be
automatically captured from the client side
• Have problems with the deep (or hidden) web, i.e. sites
driven by databases or otherwise interactive … so what
about Web 2.0?
• Tend to ignore differences in type categories or formats
• Have significant legal problems with providing access
Approaches to collection
• Broadly four main collecting approaches (not
mutually exclusive):
• Domain capture (harvesting)
• Using specialised crawler programs to collect sites within
national (or other) identifiable domains
• Often based on the 'national' web domain
• Can usually only deal with the surface web
• Selective capture (harvesting)
• Capturing selected web sites on a given frequency
• Can usually only deal with the surface web
• Selective capture (conversion or re-engineering)
• Typically requires access at the server-side
• Can deal with the deep web
• Deposit by website owner
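The difference between domain capture and selective capture largely comes down to how a crawler decides whether a URL is in scope. A minimal sketch follows, assuming scope is decided on the host name alone. Both function names and the example hosts are hypothetical and not taken from any particular harvester.

```python
from urllib.parse import urlsplit


def in_domain_scope(url, suffixes=(".uk",)):
    """Domain capture: accept any host that falls under one of the given domain suffixes."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host.endswith(suffix) for suffix in suffixes)


def in_selective_scope(url, selected_hosts):
    """Selective capture: accept only hosts that were chosen in advance."""
    host = (urlsplit(url).hostname or "").lower()
    return host in selected_hosts


# Example: a national-domain crawl versus a hand-picked list of sites.
print(in_domain_scope("http://www.bath.ac.uk/records/"))
print(in_selective_scope("http://www.bath.ac.uk/records/", {"www.ukoln.ac.uk"}))
```

Neither test reaches database-driven content behind forms, which is why both harvesting approaches are usually limited to the surface web.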
Two main models
• Harvesting model
• Used by national and research libraries, university special
collections (e.g., DACHS) and the Internet Archive
• Records management model
• Addresses the issues raised earlier in this session
• May be more appropriate for specific institutional records …
Some Examples …
Internet Archive
• Non-profit organisation, based in US
• Wants to offer permanent access to digital online
materials of all types
• Founded in 1996, has been collecting since then …
much content donated by Alexa Internet
• Collects sites by crawling and harvesting web sites
• Sites can 'opt out' by way of robots.txt file on the web
server
• Most content is freely available to the public, e.g.
through the Wayback Machine
• Interface issues: only the URL indicates that the page
is archived
• Website: http://www.archive.org/
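The robots.txt opt-out can be illustrated with Python's standard urllib.robotparser module. This is only a sketch: the robots.txt content is invented, may_harvest is a hypothetical helper, and "ia_archiver" is assumed here to be the user-agent name of the crawler being excluded.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler entirely and keep
# every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_harvest(user_agent, url):
    """Return True if the parsed robots.txt permits this crawler to fetch the URL."""
    return parser.can_fetch(user_agent, url)
```

A harvester that honours the file would call may_harvest before each request and skip any URL for which it returns False.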
National Library of Australia
• The PANDORA Archive
• Builds on existing NLA collection policies
• Provides long-term access to selected online publications
and websites
• Permission is sought from site owners in advance
• PANDAS (v3): PANDORA Digital Archiving System
• Open Source Software used for managing the process of
gathering, archiving and publishing website resources
• Offers end-to-end archiving workflow
• Supports modularity: currently mostly used with HTTrack,
but other harvester programs can be plugged in
• Assigns persistent identifiers and metadata to each item
when registered
• Website: http://pandora.nla.gov.au/
UK WAC
• UK Web Archiving Consortium (6 members)
• British Library, National Library of Scotland, National Library
of Wales, The National Archives, Wellcome Library, JISC
• Collects Web content selectively
• Uses modified PANDAS collection/harvesting software
developed by the National Library of Australia
• Underlying harvesting program is currently HTTrack
• Permission is sought from site owners in advance
• The collections are publicly accessible
• Persistent Identifier URLs
• Central repository of metadata
• Single partner assumes responsibility for each site
• Website: http://www.webarchive.org.uk/
Nordic Web Archive
• A collaboration between the Nordic national libraries
(Denmark, Finland, Iceland, Norway, Sweden)
• Considerable expertise available:
• For example, the Swedish Royal Library pioneered the
national domain capture approach
• Main focus on developing access tools
• NWA Toolset (open source)
• Work now taken forward in the WERA viewer application,
developed within the International Internet
Preservation Consortium
• Website: http://nwa.nb.no/
IIPC (1)
• International Internet Preservation Consortium
• Builds co-operation between the Internet Archive and
national and research libraries
• Co-ordinated by the Bibliothèque nationale de France
• The British Library is the only current UK member; other
national library partners include the Library of Congress,
Library and Archives Canada and the national libraries of
Australia, Denmark, Finland, Iceland, Italy, Norway and
Sweden
• Reflects those with current experience of Web archiving
• Both working groups and tool development
• Phase II will enable new partners to join the consortium
• Website: http://netpreserve.org/
IIPC (2)
• Phase I - developing the IIPC toolkit
• Standards and tools for supporting:
• Acquisition - archival quality crawler (Heritrix); portable
database extraction and migration tool for database-
driven deep web sites (DeepARC)
• Managing collections - analytical and prioritization tools
for automatically focusing harvesting; curation tools to
provide a non-technical interface for selecting,
monitoring and verifying archived web sites
• Collection storage and maintenance - tools for
manipulating formats; a standardised storage format
(WARC), standards for metadata
• Access and finding aids - browse interfaces (WERA)
and search facilities (NutchWAX)
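To give a feel for what a standardised storage format involves, the sketch below serialises one captured resource as a single WARC-style record. The header field names are assumed from the published WARC 1.0 layout and should be checked against the specification. The function name is hypothetical, and a real archive would normally rely on an established WARC library rather than hand-written code.

```python
import uuid
from datetime import datetime, timezone


def warc_resource_record(target_uri, payload, content_type):
    """Serialise one captured resource as a single WARC-style 'resource' record (bytes)."""
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # A blank line separates the header block from the content block.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Each record is terminated by two CRLF pairs after the content block.
    return head + payload + b"\r\n\r\n"
```

Keeping the capture date, target URI and record identifier next to the payload is what lets many records be concatenated into one file and still be told apart later.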
The National Archives (UK)
• Managing web resources (December 2001)
• ERM toolkit for government agencies
• Practical steps for active records management and
sustainability
• Useful identification of web-based records
• Scenarios
• How websites differ from other records
• Management control mechanisms
• Model action plan
• Sustainability
• Website: http://www.nationalarchives.gov.uk/
National Archives of Australia
• A Policy for keeping records of web-based activity
(January 2001)
• Provides clear directions to Commonwealth agencies to
implement mechanisms for creating, managing and retaining
web-based records of value
• Guidelines (March 2001)
• Challenges and responsibilities
• Types of web-based resources
• Fundamentals of good record-keeping
• Assessing risk – factors to consider
• Strategic & technical options
• Storage & preservation - issues & strategies
• Determining the best option
Managing web-based records
• Fundamentals:
• Information Audit and Risk Assessment
• A systematic approach
• Develop policy
• Formulate plan for capture, maintenance, and
preservation
• Implement appropriate website maintenance procedures
• Assign and document responsibilities
• Identify records
• Determine retention requirements
• Capture records into recordkeeping system
• Add metadata
• Transfer content and metadata into archive as appropriate
* Based on NAA Guidelines for Archiving Web Resources
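The capture and metadata steps above can be sketched as follows. This is an illustration only: the metadata fields are hypothetical rather than drawn from the NAA guidelines, and the SHA-256 checksum is one possible way of supporting later authenticity checks.

```python
import hashlib
from datetime import datetime, timezone


def capture_record(url, content, record_class, owner):
    """Bundle the metadata needed to identify and later verify a captured web record."""
    return {
        "url": url,
        "record_class": record_class,
        "owner": owner,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
    }


def is_unchanged(content, metadata):
    """Fixity check: recompute the checksum and compare it with the stored one."""
    return hashlib.sha256(content).hexdigest() == metadata["sha256"]
```

Storing the checksum at capture time means that any later alteration of the content, accidental or otherwise, shows up as a failed fixity check.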
Breakout 3
• Scenarios for each group
• Read brief
• Identify main actions for each stage of life-cycle
that play a role in archiving web-based resources
• Identify aspects of a successful long-term
preservation strategy
• What aspects of a harvesting model could be of
use? How? Why?
• What other technical development is needed?
Feedback III
Your approach?
Go forth and archive!
Maureen Pennock
M.Pennock@ukoln.ac.uk
Michael Day
M.Day@ukoln.ac.uk
Lizzie Richmond
L.Richmond@bath.ac.uk

More Related Content

Viewers also liked

Reimagining capitalism - Principles of people centered economics
Reimagining capitalism -  Principles of people centered economicsReimagining capitalism -  Principles of people centered economics
Reimagining capitalism - Principles of people centered economicsJeff Mowatt
 
Mahara atelier de prise en main
Mahara   atelier de prise en mainMahara   atelier de prise en main
Mahara atelier de prise en mainNicolas Thorel
 
Tpn°4 poster punto g.pptx
Tpn°4 poster punto g.pptxTpn°4 poster punto g.pptx
Tpn°4 poster punto g.pptxfernando sauer
 
PDP session law
PDP session lawPDP session law
PDP session lawcpjcollege
 
XNN001 Nutrition assessment in individuals and populations
XNN001 Nutrition assessment in individuals and populationsXNN001 Nutrition assessment in individuals and populations
XNN001 Nutrition assessment in individuals and populationsramseyr
 
Laporan ti spss nisadilla n.a (21040114060053)
Laporan ti spss nisadilla n.a (21040114060053)Laporan ti spss nisadilla n.a (21040114060053)
Laporan ti spss nisadilla n.a (21040114060053)Nisadilla Hartoyo
 
IWMW 2003 b4 QA for web sites (3 Intro to Quality)
IWMW 2003 b4 QA for web sites (3 Intro to Quality)IWMW 2003 b4 QA for web sites (3 Intro to Quality)
IWMW 2003 b4 QA for web sites (3 Intro to Quality)IWMW
 
Incubateur Toulousain - Introduction au XNA - Damien Paludetto (26/01/2011)
Incubateur Toulousain - Introduction au XNA - Damien Paludetto (26/01/2011)Incubateur Toulousain - Introduction au XNA - Damien Paludetto (26/01/2011)
Incubateur Toulousain - Introduction au XNA - Damien Paludetto (26/01/2011)lincubateur_tls
 
Example research questions
Example research questionsExample research questions
Example research questionstheelliotthouse
 
IWMW 1999: Web SIte Security
IWMW 1999: Web SIte SecurityIWMW 1999: Web SIte Security
IWMW 1999: Web SIte SecurityIWMW
 
Key Concepts Of Effective Self-Management
Key Concepts Of Effective Self-ManagementKey Concepts Of Effective Self-Management
Key Concepts Of Effective Self-ManagementThorsten Sachtje
 
Paper - Analisa Website Dinomarket.com
Paper - Analisa Website Dinomarket.comPaper - Analisa Website Dinomarket.com
Paper - Analisa Website Dinomarket.comOptima Mijatovic
 
L’attractivité de la France selon les responsables des sociétés étrangères in...
L’attractivité de la France selon les responsables des sociétés étrangères in...L’attractivité de la France selon les responsables des sociétés étrangères in...
L’attractivité de la France selon les responsables des sociétés étrangères in...Ipsos France
 
Epidemiological study designs
Epidemiological study designsEpidemiological study designs
Epidemiological study designsjarati
 
Business model mixer for consulting
Business model mixer for consultingBusiness model mixer for consulting
Business model mixer for consultingArd-Pieter de Man
 

Viewers also liked (20)

Reimagining capitalism - Principles of people centered economics
Reimagining capitalism -  Principles of people centered economicsReimagining capitalism -  Principles of people centered economics
Reimagining capitalism - Principles of people centered economics
 
Italia
ItaliaItalia
Italia
 
ICT Guided Tour Asia Ed. 2015
ICT Guided Tour Asia Ed. 2015ICT Guided Tour Asia Ed. 2015
ICT Guided Tour Asia Ed. 2015
 
Mahara atelier de prise en main
Mahara   atelier de prise en mainMahara   atelier de prise en main
Mahara atelier de prise en main
 
Corporate Lobbying Information
Corporate Lobbying InformationCorporate Lobbying Information
Corporate Lobbying Information
 
Tpn°4 poster punto g.pptx
Tpn°4 poster punto g.pptxTpn°4 poster punto g.pptx
Tpn°4 poster punto g.pptx
 
PDP session law
PDP session lawPDP session law
PDP session law
 
XNN001 Nutrition assessment in individuals and populations
XNN001 Nutrition assessment in individuals and populationsXNN001 Nutrition assessment in individuals and populations
XNN001 Nutrition assessment in individuals and populations
 
Laporan ti spss nisadilla n.a (21040114060053)
Laporan ti spss nisadilla n.a (21040114060053)Laporan ti spss nisadilla n.a (21040114060053)
Laporan ti spss nisadilla n.a (21040114060053)
 
IWMW 2003 b4 QA for web sites (3 Intro to Quality)
IWMW 2003 b4 QA for web sites (3 Intro to Quality)IWMW 2003 b4 QA for web sites (3 Intro to Quality)
IWMW 2003 b4 QA for web sites (3 Intro to Quality)
 
Incubateur Toulousain - Introduction au XNA - Damien Paludetto (26/01/2011)
Incubateur Toulousain - Introduction au XNA - Damien Paludetto (26/01/2011)Incubateur Toulousain - Introduction au XNA - Damien Paludetto (26/01/2011)
Incubateur Toulousain - Introduction au XNA - Damien Paludetto (26/01/2011)
 
Example research questions
Example research questionsExample research questions
Example research questions
 
Karl marx
Karl marxKarl marx
Karl marx
 
IWMW 1999: Web SIte Security
IWMW 1999: Web SIte SecurityIWMW 1999: Web SIte Security
IWMW 1999: Web SIte Security
 
Key Concepts Of Effective Self-Management
Key Concepts Of Effective Self-ManagementKey Concepts Of Effective Self-Management
Key Concepts Of Effective Self-Management
 
Paper - Analisa Website Dinomarket.com
Paper - Analisa Website Dinomarket.comPaper - Analisa Website Dinomarket.com
Paper - Analisa Website Dinomarket.com
 
L’attractivité de la France selon les responsables des sociétés étrangères in...
L’attractivité de la France selon les responsables des sociétés étrangères in...L’attractivité de la France selon les responsables des sociétés étrangères in...
L’attractivité de la France selon les responsables des sociétés étrangères in...
 
Epidemiological study designs
Epidemiological study designsEpidemiological study designs
Epidemiological study designs
 
Business model mixer for consulting
Business model mixer for consultingBusiness model mixer for consulting
Business model mixer for consulting
 
Managing Waqf in Turkey and Malaysia for Educational Development. The Best Pr...
Managing Waqf in Turkey and Malaysia for Educational Development. The Best Pr...Managing Waqf in Turkey and Malaysia for Educational Development. The Best Pr...
Managing Waqf in Turkey and Malaysia for Educational Development. The Best Pr...
 

Similar to IWMW 2006: Archiving the Web What can Institutions learn from National and International Web Archiving Initiatives (1)

Building blocks for success: criteria for trusted institutional repositories
Building blocks for success: criteria for trusted institutional repositoriesBuilding blocks for success: criteria for trusted institutional repositories
Building blocks for success: criteria for trusted institutional repositoriesIna Smith
 
Module 1 - Introduction to Records Management.pdf
Module 1 - Introduction to Records Management.pdfModule 1 - Introduction to Records Management.pdf
Module 1 - Introduction to Records Management.pdfERMIYASTARIKU2
 
Module 1 - Introduction to Records Management.pdf
Module 1 - Introduction to Records Management.pdfModule 1 - Introduction to Records Management.pdf
Module 1 - Introduction to Records Management.pdfERMIYASTARIKU2
 
Preservation planning at the British Library
Preservation planning at the British LibraryPreservation planning at the British Library
Preservation planning at the British LibraryMichael Day
 
Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Roxanne Missingham
 
20yrs: 2004 jisc cni-brighton
20yrs: 2004 jisc cni-brighton20yrs: 2004 jisc cni-brighton
20yrs: 2004 jisc cni-brightonNeil Beagrie
 
Archiving for Now and Later - workshop at Common Field Convening 2019
Archiving for Now and Later - workshop at Common Field Convening 2019Archiving for Now and Later - workshop at Common Field Convening 2019
Archiving for Now and Later - workshop at Common Field Convening 2019Anna Perricci
 
Getting to Grips with Research Data Management
Getting to Grips with Research Data Management Getting to Grips with Research Data Management
Getting to Grips with Research Data Management IzzyChad
 
Criteria for a trusted institutional repository
Criteria for a trusted institutional repositoryCriteria for a trusted institutional repository
Criteria for a trusted institutional repositoryIna Smith
 
Capture All the URLs: First Steps in Web Archiving
Capture All the URLs: First Steps in Web ArchivingCapture All the URLs: First Steps in Web Archiving
Capture All the URLs: First Steps in Web ArchivingKristen Yarmey
 
Planning for Research Data Management: 26th January 2016
Planning for Research Data Management: 26th January 2016Planning for Research Data Management: 26th January 2016
Planning for Research Data Management: 26th January 2016IzzyChad
 
From the Cradle to the Digital Vault: Tracking the Path of e-journals
From the Cradle to the Digital Vault: Tracking the Path of e-journalsFrom the Cradle to the Digital Vault: Tracking the Path of e-journals
From the Cradle to the Digital Vault: Tracking the Path of e-journalsISSN International Centre
 
LOCKSS UK, with a focus on reporting experience
LOCKSS UK, with a focus on reporting experienceLOCKSS UK, with a focus on reporting experience
LOCKSS UK, with a focus on reporting experienceChris Rusbridge
 
Planning for Research Data Management
Planning for Research Data ManagementPlanning for Research Data Management
Planning for Research Data Managementdancrane_open
 
Managing Digital Content Over Time: Identify and Select
Managing Digital Content Over Time: Identify and SelectManaging Digital Content Over Time: Identify and Select
Managing Digital Content Over Time: Identify and SelectRecollection Wisconsin
 
Research Data Services @ Edinburgh: MANTRA & Edinburgh DataShare
Research Data Services @ Edinburgh: MANTRA & Edinburgh DataShareResearch Data Services @ Edinburgh: MANTRA & Edinburgh DataShare
Research Data Services @ Edinburgh: MANTRA & Edinburgh DataShareHistoric Environment Scotland
 

Similar to IWMW 2006: Archiving the Web What can Institutions learn from National and International Web Archiving Initiatives (1) (20)

Building blocks for success: criteria for trusted institutional repositories
Building blocks for success: criteria for trusted institutional repositoriesBuilding blocks for success: criteria for trusted institutional repositories
Building blocks for success: criteria for trusted institutional repositories
 
Building blocks for success: criteria for trusted institutional repositories
Building blocks for success: criteria for trusted institutional repositoriesBuilding blocks for success: criteria for trusted institutional repositories
Building blocks for success: criteria for trusted institutional repositories
 
Scaling up to archive the UK Web. Helen Hockx-Yu
Scaling up to archive the UK Web. Helen Hockx-YuScaling up to archive the UK Web. Helen Hockx-Yu
Scaling up to archive the UK Web. Helen Hockx-Yu
 
Module 1 - Introduction to Records Management.pdf
Module 1 - Introduction to Records Management.pdfModule 1 - Introduction to Records Management.pdf
Module 1 - Introduction to Records Management.pdf
 
Module 1 - Introduction to Records Management.pdf
Module 1 - Introduction to Records Management.pdfModule 1 - Introduction to Records Management.pdf
Module 1 - Introduction to Records Management.pdf
 
Preservation planning at the British Library
Preservation planning at the British LibraryPreservation planning at the British Library
Preservation planning at the British Library
 
Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012Slides anu talkwebarchivingaug2012
Slides anu talkwebarchivingaug2012
 
20yrs: 2004 jisc cni-brighton
20yrs: 2004 jisc cni-brighton20yrs: 2004 jisc cni-brighton
20yrs: 2004 jisc cni-brighton
 
Archiving for Now and Later - workshop at Common Field Convening 2019
Archiving for Now and Later - workshop at Common Field Convening 2019Archiving for Now and Later - workshop at Common Field Convening 2019
Archiving for Now and Later - workshop at Common Field Convening 2019
 
Getting to Grips with Research Data Management
Getting to Grips with Research Data Management Getting to Grips with Research Data Management
Getting to Grips with Research Data Management
 
Criteria for a trusted institutional repository
Criteria for a trusted institutional repositoryCriteria for a trusted institutional repository
Criteria for a trusted institutional repository
 
Capture All the URLs: First Steps in Web Archiving
Capture All the URLs: First Steps in Web ArchivingCapture All the URLs: First Steps in Web Archiving
Capture All the URLs: First Steps in Web Archiving
 
Archives in museums
Archives in museumsArchives in museums
Archives in museums
 
292 daniel dollar ssp yale_28_may2008
292 daniel dollar ssp yale_28_may2008292 daniel dollar ssp yale_28_may2008
292 daniel dollar ssp yale_28_may2008
 
Planning for Research Data Management: 26th January 2016
Planning for Research Data Management: 26th January 2016Planning for Research Data Management: 26th January 2016
Planning for Research Data Management: 26th January 2016
 
From the Cradle to the Digital Vault: Tracking the Path of e-journals
From the Cradle to the Digital Vault: Tracking the Path of e-journalsFrom the Cradle to the Digital Vault: Tracking the Path of e-journals
From the Cradle to the Digital Vault: Tracking the Path of e-journals
 
LOCKSS UK, with a focus on reporting experience
LOCKSS UK, with a focus on reporting experienceLOCKSS UK, with a focus on reporting experience
LOCKSS UK, with a focus on reporting experience
 
Planning for Research Data Management
Planning for Research Data ManagementPlanning for Research Data Management
Planning for Research Data Management
 
Managing Digital Content Over Time: Identify and Select
Managing Digital Content Over Time: Identify and SelectManaging Digital Content Over Time: Identify and Select
Managing Digital Content Over Time: Identify and Select
 
Research Data Services @ Edinburgh: MANTRA & Edinburgh DataShare
Research Data Services @ Edinburgh: MANTRA & Edinburgh DataShareResearch Data Services @ Edinburgh: MANTRA & Edinburgh DataShare
Research Data Services @ Edinburgh: MANTRA & Edinburgh DataShare
 

More from IWMW

Look who's talking now
Look who's talking nowLook who's talking now
Look who's talking nowIWMW
 
Introduction to IWMW 2000 (Liz Lyon)
Introduction to IWMW 2000 (Liz Lyon)Introduction to IWMW 2000 (Liz Lyon)
Introduction to IWMW 2000 (Liz Lyon)IWMW
 
Web Tools report
IWMW 2006: Archiving the Web: What can Institutions learn from National and International Web Archiving Initiatives

  • 1. a centre of expertise in data curation and preservation
      Archiving Web-based records, IWMW 2006, 15 June 2006
      This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
      Archiving the Web: What can Institutions learn from National and International Web Archiving Initiatives
      IWMW 2006, University of Bath, 15 June 2006
      Maureen Pennock (UKOLN, University of Bath), Michael Day (UKOLN, University of Bath), Lizzie Richmond (University of Bath)
  • 2. Today's workshop
      • Records Management and the web:
          • Key RM principles
          • Justification for archiving web-based records
          • Breakout 1 - to discuss the types of record found on the web
      • An archivist's perspective:
          • Authenticity, accessibility, security, legal compliance
          • Breakout 2 - to discuss drivers and barriers
      • An overview of selected national and international web archiving initiatives:
          • Breakout 3 - to develop approaches to preserving web sites
      • Feedback
  • 3. Web-Based Records
  • 4. Philosophy
      • Archiving web sites & web-based records requires collaboration from all stakeholders, including records managers, but also IT managers, web-project managers, webmasters, content editors, content providers, and even senior management, across the entire life-cycle of the records
      • BUT … there is a difference in approaches between archiving websites and archiving web-based records
  • 5. What is a record?
      • BS ISO 15489 definition: “any information that is created, received and maintained as evidence and information by an organisation or person in pursuance of legal obligations or in the transaction of business”
      • Evidence of a transaction
      • Anything that:
          • documents a working transaction between two or more parties
          • documents the mission and goals of an organisation
          • was created or received in the course of carrying out the mission and goals of an organisation
  • 6. Key Records Management issues
      • Proper care and management of records throughout their entire life-cycle
      • Not all data has to be retained
      • Legal information obligations must be met
      • Organisational retention schedules identify the record classes of concern
      • Different records and record classes have different retention periods
      • Metadata must be stored with records
      • Disposal and destruction processes
      Leads to archival and long-term storage for some records
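The point that different record classes carry different retention periods can be illustrated with a minimal sketch. The record classes, the retention periods, and the names `RETENTION_SCHEDULE`, `WebRecord` and `disposal_date` are all hypothetical and are not taken from the workshop or from any real retention schedule.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical retention schedule: record class -> retention period in years.
# None marks a class selected for permanent (archival) retention.
RETENTION_SCHEDULE = {
    "committee-minutes": None,
    "course-prospectus": 10,
    "event-listing": 1,
}


@dataclass
class WebRecord:
    url: str
    record_class: str
    created: date


def disposal_date(record: WebRecord) -> Optional[date]:
    """Return the date the record is due for disposal, or None if kept permanently."""
    years = RETENTION_SCHEDULE[record.record_class]
    if years is None:
        return None
    target_year = record.created.year + years
    try:
        return record.created.replace(year=target_year)
    except ValueError:
        # 29 February has no counterpart in a non-leap year: use 28 February.
        return record.created.replace(year=target_year, day=28)


listing = WebRecord("http://www.example.ac.uk/events/", "event-listing", date(2006, 6, 15))
minutes = WebRecord("http://www.example.ac.uk/senate/minutes/", "committee-minutes", date(2006, 6, 15))
print(disposal_date(listing))
print(disposal_date(minutes))
```

Keeping the schedule as data rather than logic means a records manager could revise retention periods without touching the capture code; that split is a design assumption of this sketch, not something the slides prescribe.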
  • 7. Why archive website ‘records’?
      • Records are increasingly posted on the web
      • Uniquely available informative records
      • Users may act or take decisions based on this information, with important consequences
      • Records of business transactions
      • Accountability & transparency
          • To funding bodies
          • To stakeholders
      • For legal reasons
      • Historically and culturally valuable
  • 8. Breakout 1
      • Discuss and identify the types of records that can appear on the Web – e.g.:
          • Reports, policy documents etc
          • Information – submission dates, pricing etc
      • Discuss and identify the forms they can take – e.g.:
          • Text-based files
          • Web-forms
  • 9. Feedback I
  • 10. Archiving the Web
      An (inexperienced) archivist's perspective
  • 11. More definitions…
      • Records management: “…the field of management responsible for the efficient and systematic control of the creation, receipt, maintenance, use and disposition of records, including processes for capturing and maintaining evidence of and information about business activities and transactions in the form of records.” (BS ISO 15489 - 2001)
      • Archives: “…documents, irrespective of form, medium or age, intended for long-term preservation because of their continuing value.” (BS 5454 - 2000)
  • 12. What we want from our records and archives … Authenticity:
      • Must be demonstrably reliable as proof
      • Creation and capture
      • Metadata and context
      • Ownership/responsibility
      • Version control
      • Cataloguing standards
  • 13. What we want from our records and archives … Accessibility:
      • Must be capable of use over time
      • Locate, retrieve and display
      • File plans, naming conventions
      • Obsolescence
      • Migration strategy
      • Reduced functionality?
  • 14. What we want from our records and archives … Security:
      • Must be protected
      • Physical damage and unauthorised access
      • Robust destruction procedures
      • Intellectual control
      • Storage environment
      • Disaster plan
  • 15. What we want from our records and archives … Legal compliance:
      • Must not break the law
      • Freedom of Information Act 2000
      • Data Protection Act 1998
      • Copyright issues?
      • Defence against litigation
      • Legal admissibility
  • 16. Breakout 2
      • What are the main drivers for archiving web-based records?
      • Discuss and identify as many challenges or barriers to archiving web-based records as you can:
          • Technical barriers
          • Cultural barriers
          • Socio-economic barriers
          • Organisational barriers
  • 17. Feedback II
  • 18. Current Approaches to Archiving the Web
      National and International Initiatives
  • 19. Some basics
      • Not all web archives are organised on a records management basis
      • Most web archiving initiatives:
          • Emphasise the informational value of the web as a cultural phenomenon or communication medium
          • Highlight the transience of content
          • Focus largely on collecting content, less on providing long-term access (or preservation)
          • Have collection strategies that are based on what can be automatically captured from the client side
          • Have problems with the deep (or hidden) web, i.e. sites that are driven by databases or are otherwise interactive … so what about Web 2.0?
          • Tend to ignore differences in type categories or formats
          • Have significant legal problems with providing access
  • 20. Approaches to collection
      • Broadly four main collecting approaches (not mutually exclusive):
          • Domain capture (harvesting)
              • Using specialised crawler programs to collect sites within national (or other) identifiable domains
              • Often based on the 'national' web domain
              • Can usually only deal with the surface web
          • Selective capture (harvesting)
              • Capturing selected web sites on a given frequency
              • Can usually only deal with the surface web
          • Selective capture (conversion or re-engineering)
              • Typically requires access at the server-side
              • Can deal with the deep web
          • Deposit by website owner
  • 21. Two main models
      • Harvesting model
          • Used by national and research libraries, university special collections (e.g., DACHS) and the Internet Archive
      • Records management model
          • Addresses the issues raised earlier in this session
          • May be more appropriate for specific institutional records …
  • 22. Some Examples …
  • 23. Internet Archive
      • Non-profit organisation, based in US
      • Wants to offer permanent access to digital online materials of all types
      • Founded in 1996, has been collecting since then … much content donated by Alexa Internet
      • Collects sites by crawling and harvesting web sites
      • Sites can 'opt out' by way of robots.txt file on the web server
      • Most content is freely available to the public, e.g. through the Wayback Machine
      • Interface issues: only the URL indicates that the page is archived
      • Website: http://www.archive.org/
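The robots.txt opt-out mentioned above can be sketched with Python's standard `urllib.robotparser` module. The robots.txt content, the example URLs, the `examplebot` agent and the helper `may_harvest` are assumptions made for illustration. The `ia_archiver` user-agent is the name historically associated with Alexa's crawler; treat it here as an assumption rather than a statement of the Internet Archive's current policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: excludes one named crawler from the whole site
# and keeps every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_harvest(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_harvest(ROBOTS_TXT, "ia_archiver", "http://www.example.ac.uk/about.html"))
print(may_harvest(ROBOTS_TXT, "examplebot", "http://www.example.ac.uk/about.html"))
```

A well-behaved harvester would run a check of this kind before each request. Note that robots.txt is a convention honoured by the crawler, not an access control enforced by the server.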
  • 24. National Library of Australia
      • The PANDORA Archive
          • Builds on existing NLA collection policies
          • Provides long-term access to selected online publications and websites
          • Permission is sought from site owners in advance
      • PANDAS (v3) - PANDORA Digital Archiving System
          • Open Source Software used for managing the process of gathering, archiving and publishing website resources
          • Offers end-to-end archiving workflow
          • Supports modularity: currently mostly used with HTTrack, but other harvester programs can be plugged in
          • Assigns persistent identifiers and metadata to each item when registered
      • Website: http://pandora.nla.gov.au/
  • 25. UK WAC
      • UK Web Archiving Consortium (6 members)
          • British Library, National Library of Scotland, National Library of Wales, The National Archives, Wellcome Library, JISC
      • Collects Web content selectively
      • Uses modified PANDAS collection/harvesting software developed by the National Library of Australia
      • Underlying harvesting program is currently HTTrack
      • Permission is sought from site owners in advance
      • The collections are publicly accessible
      • Persistent Identifier URLs
      • Central repository of metadata
      • Single partner assumes responsibility for each site
      • Website: http://www.webarchive.org.uk/
  • 26. Nordic Web Archive
      • A collaboration between the Nordic national libraries (Denmark, Finland, Iceland, Norway, Sweden)
      • Considerable expertise available:
          • For example, the Swedish Royal Library pioneered the national domain capture approach
      • Main focus on developing access tools
          • NWA Toolset (open source)
          • Work now taken forward as part of the WERA viewer application developed as part of the International Internet Preservation Consortium
      • Website: http://nwa.nb.no/
  • 27. IIPC (1)
      • International Internet Preservation Consortium
      • Builds co-operation between the Internet Archive and national and research libraries
      • Co-ordinated by the Bibliothèque nationale de France
      • The British Library is the only current UK member; other national library partners include the Library of Congress, Library and Archives Canada and the national libraries of Australia, Denmark, Finland, Iceland, Italy, Norway and Sweden
      • Reflects those with current experience of Web archiving
      • Both working groups and tool development
      • Phase II will enable new partners to join the consortium
      • Website: http://netpreserve.org/
  • 28. IIPC (2)
      • Phase I - developing the IIPC toolkit
      • Standards and tools for supporting:
          • Acquisition - archival quality crawler (Heritrix); portable database extraction and migration tool for database-driven deep web sites (DeepARC)
          • Managing collections - analytical and prioritization tools for automatically focusing harvesting; curation tools to provide a non-technical interface for selecting, monitoring and verifying archived web sites
          • Collection storage and maintenance - tools for manipulating formats; a standardised storage format (WARC), standards for metadata
          • Access and finding aids - browse interfaces (WERA) and search facilities (NutchWAX)
  • 29. The National Archives (UK)
      • Managing web resources (December 2001)
          • ERM toolkit for government agencies
          • Practical steps for active records management and sustainability
          • Useful identification of web-based records
          • Scenarios
          • How websites differ from other records
          • Management control mechanisms
          • Model action plan
          • Sustainability
      • Website: http://www.nationalarchives.gov.uk/
  • 30. National Archives of Australia
      • A Policy for keeping records of web-based activity (January 2001)
          • Provides clear directions to Commonwealth agencies to implement mechanisms for creating, managing and retaining web-based records of value
      • Guidelines (March 2001)
          • Challenges and responsibilities
          • Types of web-based resources
          • Fundamentals of good record-keeping
          • Assessing risk – factors to consider
          • Strategic & technical options
          • Storage & preservation - issues & strategies
          • Determining the best option
  • 31. Managing web-based records
      • Fundamentals:
          • Information Audit and Risk Assessment
          • A systematic approach
          • Develop policy
          • Formulate plan for capture, maintenance, and preservation
          • Implement appropriate website maintenance procedures
          • Assign and document responsibilities
          • Identify records
          • Determine retention requirements
          • Capture records into recordkeeping system
          • Add metadata
          • Transfer content and metadata into archive as appropriate
      * Based on NAA Guidelines for Archiving Web Resources
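The "capture records" and "add metadata" steps above might look like the following minimal sketch. It assumes the page content has already been fetched as bytes. The metadata field names and the helpers `capture_record` and `verify_record` are hypothetical and are not drawn from the NAA guidelines or any metadata standard. The SHA-256 checksum is an illustrative fixity check that supports the authenticity requirement discussed earlier in the session.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def capture_record(url: str, content: bytes, record_class: str, owner: str,
                   captured_at: Optional[datetime] = None) -> dict:
    """Build a metadata entry to be stored alongside the captured content."""
    captured_at = captured_at or datetime.now(timezone.utc)
    return {
        "url": url,
        "record_class": record_class,   # links the record to a retention schedule
        "owner": owner,                 # who is responsible for the record
        "captured_at": captured_at.isoformat(),
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),  # fixity value
    }


def verify_record(metadata: dict, content: bytes) -> bool:
    """Return True if the content still matches the checksum taken at capture."""
    return hashlib.sha256(content).hexdigest() == metadata["sha256"]


page = b"<html><body>Admissions policy 2006</body></html>"
entry = capture_record("http://www.example.ac.uk/policy.html", page,
                       "policy-document", "Registry")
print(entry["sha256"], verify_record(entry, page))
```

Recomputing the checksum at transfer time and after any migration gives a simple way to show that the archived copy is the one that was captured.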
  • 32. Breakout 3
      • Scenarios for each group
      • Read brief
      • Identify main actions for each stage of life-cycle that play a role in archiving web-based resources
      • Identify aspects of a successful long-term preservation strategy
      • What aspects of a harvesting model could be of use? How? Why?
      • What other technical development is needed?
  • 33. Feedback III
      Your approach?
  • 34. Go forth and archive!
      Maureen Pennock: M.Pennock@ukoln.ac.uk
      Michael Day: M.Day@ukoln.ac.uk
      Lizzie Richmond: L.Richmond@bath.ac.uk