WebART in 10 minutes

Web Archive Retrieval Tools
Paul Doorenbosch, Jaap Kamps, Richard Rogers, Arjen de Vries, René Voorburg
CATCH Meeting HiTime e-History, November 1, 2011

Information Access (Arjen de Vries)
Web Archive (Paul Doorenbosch, René Voorburg)
New Media (Richard Rogers)
Jaap Kamps

Unlimited ways to publish/access/share information

Our daily lives take place “on the Web”

Ease of publishing on the Web comes at a price
Web content is ephemeral
Web archives preserve the heritage of the future

Focus on use: Web research(ers)

Search support has massively improved

Complex tasks are still painstaking!
Many queries, tabs, notes, cut-and-paste, ...

Exploratory and faceted search

Interactively construct a (hidden) query

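To make the “hidden query” of exploratory and faceted search concrete, here is a minimal sketch of how a faceted interface might compile a searcher's clicks into one Boolean query behind the scenes. It is not WebART code: the facet names (`domain`, `year`), the function `compile_facets` and the Lucene-like `field:"value"` syntax are illustrative assumptions.

```python
def compile_facets(keywords: str, facets: dict) -> str:
    """Turn free-text keywords plus facet selections into one query string.

    Values within one facet are OR-ed; different facets are AND-ed.
    The field:"value" syntax is an assumed, Lucene-like convention.
    """
    clauses = [f"({keywords})"] if keywords.strip() else []
    for field in sorted(facets):  # sorted so the query string is deterministic
        values = facets[field]
        if values:
            ors = " OR ".join(f'{field}:"{v}"' for v in values)
            clauses.append(f"({ors})")
    return " AND ".join(clauses)


# The searcher only types a word and ticks boxes; this query stays hidden.
query = compile_facets("elections", {"domain": ["nl"], "year": ["2006", "2010"]})
print(query)  # → (elections) AND (domain:"nl") AND (year:"2006" OR year:"2010")
```

On a restricted domain the fixed set of facets is what keeps this complexity hideable: the interface can only produce queries of this one shape.
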
Search strategy from building blocks

Each block = data or manipulations
Strategy Builder
Build a dedicated search engine “on the fly”

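One way to read “each block = data or manipulations” is as a chain of small functions, each taking a stream of archived pages and returning a new one; chaining them yields a dedicated search engine. The minimal Python sketch below assumes pages reduced to URL, crawl year and text; `Page`, `source`, `in_years`, `containing`, `rank_by_count` and `build` are hypothetical names for illustration, not the Strategy Builder's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Page:
    """A hypothetical, stripped-down archived page."""
    url: str
    year: int
    text: str


# A block maps a stream of pages to a new stream of pages.
Block = Callable[[Iterable[Page]], Iterable[Page]]


def source(pages: List[Page]) -> Block:
    """Data block: ignores its input and emits a fixed collection."""
    return lambda _: iter(pages)


def in_years(start: int, end: int) -> Block:
    """Manipulation block: keep pages crawled between start and end (inclusive)."""
    return lambda pages: (p for p in pages if start <= p.year <= end)


def containing(term: str) -> Block:
    """Manipulation block: keep pages whose text mentions term (case-insensitive)."""
    return lambda pages: (p for p in pages if term.lower() in p.text.lower())


def rank_by_count(term: str) -> Block:
    """Manipulation block: order pages by how often term occurs, most first."""
    return lambda pages: sorted(
        pages, key=lambda p: p.text.lower().count(term.lower()), reverse=True
    )


def build(*blocks: Block) -> Callable[[], List[Page]]:
    """Chain blocks left to right into a dedicated, re-runnable search engine."""
    def run() -> List[Page]:
        stream: Iterable[Page] = []
        for block in blocks:
            stream = block(stream)
        return list(stream)
    return run


archive = [
    Page("http://example.org/a", 2009, "Web archive news"),
    Page("http://example.org/b", 2011, "archive, archive, archive"),
    Page("http://example.org/c", 2011, "unrelated"),
    Page("http://example.org/d", 2010, "one Archive mention"),
]
engine = build(source(archive), in_years(2010, 2011),
               containing("archive"), rank_by_count("archive"))
print([p.url for p in engine()])  # → ['http://example.org/b', 'http://example.org/d']
```

The filter blocks return generators, so each step stays lazy; only the ranking block and the final `list` have to materialize their input.
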
Research methods become search strategies
Store, refine, reuse, share strategies
Researchers enrich the archive

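Storing, refining, reusing and sharing a strategy is easiest if the strategy is kept as data rather than code. A minimal sketch, assuming a strategy is a JSON list of named steps and pages are plain dicts; the registry `BLOCKS` and the functions `run` and `refine` are hypothetical illustrations, not WebART's storage format.

```python
import json

# Hypothetical manipulation blocks over pages represented as plain dicts.
BLOCKS = {
    "years": lambda pages, start, end: [p for p in pages if start <= p["year"] <= end],
    "contains": lambda pages, term: [p for p in pages if term.lower() in p["text"].lower()],
    "limit": lambda pages, n: pages[:n],
}


def run(strategy_json: str, pages: list) -> list:
    """Interpret a stored strategy: a JSON list of {"block": ..., "args": {...}} steps."""
    for step in json.loads(strategy_json):
        pages = BLOCKS[step["block"]](pages, **step["args"])
    return pages


def refine(strategy_json: str, step: dict) -> str:
    """Return a new strategy with one more step; the shared original is unchanged."""
    steps = json.loads(strategy_json)
    steps.append(step)
    return json.dumps(steps)


pages = [
    {"url": "http://example.org/a", "year": 2009, "text": "Dutch elections"},
    {"url": "http://example.org/b", "year": 2010, "text": "Elections and the Web"},
    {"url": "http://example.org/c", "year": 2011, "text": "Web archives"},
]
# A colleague shares this strategy as plain JSON text.
shared = json.dumps([{"block": "years", "args": {"start": 2010, "end": 2011}}])
refined = refine(shared, {"block": "contains", "args": {"term": "elections"}})
print([p["url"] for p in run(shared, pages)])   # → ['http://example.org/b', 'http://example.org/c']
print([p["url"] for p in run(refined, pages)])  # → ['http://example.org/b']
```

Because `refine` returns a new JSON string, the colleague's original strategy stays reproducible, while the refined version can be stored and shared in turn.
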
Archival selection determines future use

Digital humanities is a paradigm switch

Supporting Complex Search Tasks
Nick Belkin, Charlie Clarke, Ning Gao, Jaap Kamps, Jussi Karlgren
SIGIR 2011 Workshop, July 28, 2011
Thanks!

Editor's Notes

  1. Good afternoon. My name is Jaap Kamps and it is my pleasure to introduce the WebART (Web Archive Retrieval Tools) project.
  3. The project is a collaboration of three groups of researchers: (1) specialists working on Information Access (Computer Science, Arjen de Vries); (2) new media scholars working on the Web and the Web Archive (Humanities, Richard Rogers); and (3) Web archivists from the Dutch Web Archive (Heritage Sector, René Voorburg and Paul Doorenbosch). What is special is that all three groups actively build technical tools: the Koninklijke Bibliotheek does large-scale crawling, the new media scholars build dedicated crawlers, screen scrapers and analysis tools, and the computer scientists think they know what the next generation of search tools should look like.
  4. The Web is a unique object with an unprecedented size and growth curve, and by far the largest source of information on basically everything. The Web has had a revolutionary impact on how we publish, access, and share information.
  5. In fact, it fundamentally shapes our daily lives, which increasingly take place “on the Web.”
  6. However, this increasing dependence on the Web comes at a price: the ease of publishing on the Web also makes information easy to lose, since Web content tends to be ephemeral. This project addresses the problem of our future cultural heritage. Globally, this problem has been tackled head-on by the Internet Archive, now supplemented by many national initiatives.
  8. We don’t want to focus on preservation as such, but on its use. That is, we critically assess the value of Web archives for realistic research scenarios, and develop information access tools and methods that maximize the archive’s utility for research. Web research tends to require complex selections and manipulations of the data.
  9. Search technology has advanced at an insane rate over the last decade. Who is still old enough to remember the early days of the Web, when people spent a considerable part of their time collecting and organizing bookmarks?
  10. Despite this progress, complex tasks are still poorly supported by modern search engines! The best available strategy is to slice and dice the complex information need into many small sub-requests, and then combine all the information, post hoc and outside the search engine, into a coherent answer.
  11. Some systems allow for more complex interaction, for example systems catering for exploratory or faceted search.
  12. Such systems construct complex search queries in the back end, and on restricted domains much of this complexity can be hidden from the searcher.
  13. What if we could open up this box, and allow searchers to manipulate complex requests or search strategies directly by combining several building blocks in unconstrained ways? Modern structured DB/IR technology allows for powerful, declarative queries or search strategies, turning a collection of Web pages into a high-dimensional data space.
  14. Each building block corresponds to a particular data source or manipulation of the data. In effect, the search strategy builds a dedicated search engine “on the fly.”
  15. What will happen if we put these tools in the hands of Web researchers? We will develop the appropriate building blocks and let researchers incrementally construct complex search strategies. Effectively, this means they can do their research on the fly, rather than face a turnaround time of weeks or months to develop the right kind of crawler and the right kind of analysis tool, and then run them. Moreover, researchers can store their search strategies, reuse and refine them, and share them with colleagues. In essence, research methods will evolve in parallel with search strategies, at a much faster pace and on a much larger scale than ever before.
  17. However, the selection and archiving strategies chosen for Web material will have a crucial impact on its future value as cultural heritage. What choices are made, or forced upon us? What is the missing Web? The broken Web? The banned Web? We will critically evaluate the state of the Web Archive; the resulting recommendations may help prevent the loss of digital heritage.
  18. Progress is particularly thorny because we combine radically different research paradigms: the truth-finding paradigm of the exact sciences and the interpretative paradigm of the humanities. We are in the unique situation of three disciplines (Computer Science, Media Studies, Heritage Field) looking at the same object of study, while each also sees it in a different way.