Future of Web Archiving
Stephen Abrams
California Digital Library
Martin Klein
Los Alamos National Laboratory
Jimmy Lin
University of Maryland
Michael Nelson
Old Dominion University
Digital Preservation 2014, Washington, July 22-24
www.flickr.com/photos/adesigna/4090782772
Agenda
• Web archiving problems and opportunities
• Memento tools
• WarcBase platform
• Assessing quality of archives
• Discussion
Web archiving is important but (really) hard
• Why web archiving?
► Continuation of the longstanding mission to collect, preserve, and provide access to the scholarly record and our cultural heritage
► The web is the publishing/dissemination platform of choice
• But … the web isn’t the web anymore
www.flickr.com/photos/alaig/3522953697
www.flickr.com/photos/hier_gibt_es_nichts_zu_sehen_bitte_gehen_sie_weiter/840587382
Web in transition
Document retrieval → Programming environment
Document viewer → Virtual machine
HTML → JavaScript
Common → Personalized
Desktop → Mobile/handheld/wearable
Information → Things
www.flickr.com/photos/swamibu/2223726960 www.flickr.com/photos/sharples/79222765
“A ‘web’ of notes with links (like references) between them …”
– Tim Berners-Lee, March 1989
(Some) other issues
• Crawlers don’t act like browsers
► Need robots that act more like people
• Responsiveness to time-sensitive content
► Need to bypass v-e-r-y deliberate collection development procedures
• Policies, rights, and permissions
► Need to overcome legal barriers that follow the monetization of content
• Difficult integration into traditional management and discovery services
► Leading to …
• Siloed collections
• Scale
► Storage capacity
► Full-text indexing
► De-duplication
► Resources
www.flickr.com/photos/benhusmann/5126030385
Guardian News and Media Limited
www.flickr.com/photos/vblibrary/7414544704
www.flickr.com/photos/21664580@N04/2095574414
www.flickr.com/photos/54159370@N08/7148880783
Raiders of the Lost Ark © Paramount Pictures
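The de-duplication item under Scale can be illustrated with a minimal sketch, assuming captures arrive as (URL, payload) pairs and that identical payloads are detected by comparing SHA-1 digests. The function name `deduplicate`, the sample URLs, and the in-memory dictionary are illustrative assumptions, not something described in the talk; a real archive would work against on-disk records and far larger volumes.

```python
import hashlib


def deduplicate(captures):
    """Split (url, payload) captures into stored payloads and an index.

    A payload is stored only the first time its digest is seen; later
    identical payloads are recorded as references to the stored copy.
    """
    stored = {}  # digest -> payload bytes, kept once
    index = []   # (url, digest, is_duplicate) for every capture
    for url, payload in captures:
        digest = hashlib.sha1(payload).hexdigest()
        is_duplicate = digest in stored
        if not is_duplicate:
            stored[digest] = payload
        index.append((url, digest, is_duplicate))
    return stored, index


# Hypothetical sample data: the same image is captured under two URLs.
captures = [
    ("http://example.org/", b"<html>home</html>"),
    ("http://example.org/logo.png", b"\x89PNG-logo-bytes"),
    ("http://example.org/about/logo.png", b"\x89PNG-logo-bytes"),
]
stored, index = deduplicate(captures)
print(len(stored), "payloads stored for", len(index), "captures")
```

Every capture keeps an index entry, so replay can still resolve each URL, while the repeated payload is stored only once.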
Supporting research
• Little awareness in the scholarly community
• Poorly understood use cases
• Few tools
• Traditional find→download→manipulate locally workflows may not be feasible at web scale
► Need APIs and business models for in situ analysis
berkeley.edu/teach www.flickr.com/photos/infocux/8450190120
www.flickr.com/photos/bartelomeus/4184705426
www.flickr.com/photos/shebalso/6357626617
Technological opportunities
• Better capture mechanisms
► Headless browsers
► API harvesters
…
• Better discovery modalities
► Browsing the past should be as simple and intuitive as the now
…
Cooperative opportunities
• Complementary collection development
• Coordinated infrastructure support and operation
► Or perhaps centralized: a HathiTrust for web archives?
• Crowdsourcing selection, description, and quality assurance
www.flickr.com/photos/chiotsrun/4115059294 www.flickr.com/photos/sagesolar/9230445157
And now …
cdn.ws.citrix.com/wp-content/uploads/2012/05/iStock_000010348904XSmall.jpg

Future of web archiving

  • 1. Future of Web Archiving Stephen Abrams California Digital Library Martin Klein Los Alamos National Laboratory Jimmy Lin University of Maryland Michael Nelson Old Dominion University Digital Preservation 2014, Washington, July 22-24
  • 2. www.flickr.com/photos/adesigna/4090782772 Agenda Web archiving problems and opportunities Memento tools WarcBase platform Assessing quality of archives Discussion Agenda  Web archiving problems and opportunities  Memento tools  WarcBase platform  Assessing quality of archives  Discussion
  • 3. Web archiving is important but (really) hard  Why web archiving? Continuation of longstanding mission to collect, preserve, and provide access to the scholarly record and our cultural heritage Publishing/dissemination platform of choice  But … www.flickr.com/photos/alaig/3522953697 www.flickr.com/photos/hier_gibt_es_nichts_zu_sehen_bitte_gehen_sie_weiter/840587382 the web isn’t the web anymore
  • 4. Web in transition Document retrieval Document viewer HTML Common Desktop Information Programming environment Virtual machine JavaScript Personalized Mobile/handheld/wearable Things www.flickr.com/photos/swamibu/2223726960 www.flickr.com/photos/sharples/79222765 A “web” of notes with links (like references) between them …” – Tim Berners-Lee, March 1989
  • 5. (Some) other issues  Crawlers don’t act like browsers ► Need robots that act more like people www.flickr.com/photos/benhusmann/5126030385
  • 6. (Some) other issues  Crawlers don’t act like browsers  Responsiveness to time-sensitive content ► Need to bypass v-e-r-y deliberate collection development procedures Gaurdian News and Media Limited
  • 7. www.flickr.com/photos/vblibrary/7414544704 (Some) other issues  Crawlers don’t act like browsers  Responsiveness to time-sensitive content  Policies, rights, and permissions ► Need to overcome legal barriers that follow the monetization of content
  • 8. www.flickr.com/photos/21664580@N04/2095574414 (Some) other issues  Crawlers don’t act like browsers  Responsiveness to time-sensitive content  Policies, rights, and permissions  Difficult integration into traditional management and discovery services ► Leading to …
  • 9. (Some) other issues  Crawlers don’t act like browsers  Responsiveness to time-sensitive content  Policies, rights, and permissions  Difficult integration into traditional management and discovery services  Siloed collections www.flickr.com/photos/54159370@N08/7148880783
  • 10. (Some) other issues  Crawlers don’t act like browsers  Responsiveness to time-sensitive content  Policies, rights, and permissions  Difficult integration into traditional management and discovery services  Siloed collections  Scale ► Storage capacity ► Full-text indexing ► De-duplication ► Resources Raiders of the Lost Ark © Paramount Pictures
  • 11. Supporting research  Little awareness in the scholarly community  Poorly understood use cases  Few tools  Traditional find→download→manipulate locally workflows may not be feasible at web scale ► Need APIs and business models for in situ analysis berkeley.edu/teach www.flickr.com/photos/infocux/8450190120
  • 12. Technological opportunities  Better capture mechanisms ► Headless browsers ► API harvesters …  Better discovery modalities ► Browsing the past should be as simple and intuitive as the now … www.flickr.com/photos/bartelomeus/4184705426 www.flickr.com/photos/shebalso/6357626617
  • 13. Cooperative opportunities  Complementary collection development  Coordinated infrastructure support and operation ► Or perhaps centralized – a HathiTrust for web archives?  Crowd sourcing selection, description, quality assurance www.flickr.com/photos/chiotsrun/4115059294 www.flickr.com/photos/sagesolar/9230445157

Editor's Notes

  1. Checklist, https://www.flickr.com/photos/adesigna/4090782772
  2. First of all, why is web archiving important? For members of memory institutions, it is the continuation, in a new technological context, of our longstanding mission and obligation to collect, preserve, and provide access to the scholarly record and our collective cultural heritage. Since the web is where the content is, that is where we have to go to acquire it. But the fundamental problem is that the web is not the web anymore. As soon as you think you have quantified or characterized it, it has changed into something else; and as soon as you have processes in place to capture web content, the content is no longer available in the same way. What a tangled web we weave, https://www.flickr.com/photos/alaig/3522953697 Thorsten Hartmann, Untitles, https://www.flickr.com/photos/hier_gibt_es_nichts_zu_sehen_bitte_gehen_sie_weiter/840587382
  3. It’s different from what anyone – Tim Berners-Lee included – had in mind 25 years ago. The web is no longer a giant document retrieval system, but a programming environment. The browser is no longer a document viewer, but a general-purpose virtual machine; its fundamental language is no longer HTML but JavaScript. The mode of experience has shifted from a common one to a highly personalized one; whose web are we archiving? Crumbled paper, https://www.flickr.com/photos/84564583@N08/11167321155 The great pyramid: Size matters, https://www.flickr.com/photos/swamibu/2223726960 A pile of rocks, https://www.flickr.com/photos/sharples/79222765
  4. Paywalls, robot exclusions, crawler traps, … What we need is a collection mechanism that acts like a person. Ben Husmann, The FREE HUGS robot says "I am here for you", https://www.flickr.com/photos/benhusmann/5126030385 Event-driven content doesn’t mesh well with established – meaning v-e-r-y deliberate – collection development processes. Search is simple if you know the URL.
  5. Event-driven content doesn’t mesh well with established – meaning v-e-r-y deliberate – collection development processes. Hossam el-Hamalawy, Tahrir Square, https://www.flickr.com/photos/elhamalawy/6378330927
  6. U Can’t Touch This, https://www.flickr.com/photos/vblibrary/7414544704
  7. Dan Storey, Square peg in a round hole, https://www.flickr.com/photos/21664580@N04/2095574414
  8. Silos, https://www.flickr.com/photos/54159370@N08/7148880783
  9. Paywalls, robot exclusions, crawler traps, … Event-driven content doesn’t mesh well with established – meaning v-e-r-y deliberate – collection development processes. Search is simple if you know the URL. How to find enough good people? (We’re hiring!)
  10. “You’re collecting that?” Researchers may need programmatic or API access for in situ analysis of collections.
  11. Headless browsers (PhantomJS, Umbra, etc.), API harvesters. Make browsing the past web as simple and intuitive as browsing the live web. Net casting at disk Contarf Pelican Park, https://www.flickr.com/photos/shebalso/6357626617 Bart van de Biezen, Goed Zoekveld, https://www.flickr.com/photos/bartelomeus/4184705426
  12. Avoid needless duplication of effort. As librarians we have historically given perhaps inordinate priority to content creators and curators and not enough to consumers. But over significant timespans it is the users who affirmatively seek out and exploit content who may be best positioned to contribute towards its successful management. Meyer lemons, https://www.flickr.com/photos/chiotsrun/4115059294 We sit in the shade and drink lemonade, https://www.flickr.com/photos/sagesolar/9230445157
  13. Michael Harries, Drawing back the curtain, http://cdn.ws.citrix.com/wp-content/uploads/2012/05/iStock_000010348904XSmall.jpg