Digitisation Overview. Neil Fitzgerald, IMPACT Project Delivery Manager, 24th September 2009
 
British Library Overview
Key Challenges
Boutique Digitisation
Present
Strategic Content Alliance
Google’s Scanning Patent
Mass Digitisation Principles: Continuous improvement; Use standards to benefit resource discovery, interoperability & digital preservation; Content selection by collection; Critical mass required to build useful service; Workflow designed to deliver quality fit for purpose; OCR’d where possible
Scanning Process: Contractor Workflow
Metadata Issues
MDP Book Workflow
Workflow Tools
Copyright Tools
Deliverables
Online Books
E-Book & POD
Permanent Access
Collaborative Correction
R&D Still Necessary
Future
Digital Britain
Europeana
Future Collaboration
www.bl.uk

British Library Digitisation Overview

Editor's Notes

  2. The BL has two main sites: St Pancras (STP) and Boston Spa (Bspa).
  3. One of six legal deposit libraries, along with NLW, NLS, Oxford, Cambridge and Trinity, all of which are currently trying to deal with Electronic Legal Deposit [born digital] material. Involved in a range of digitisation activities using a number of approaches, with no grant-in-aid funding!
  4. For any type of digitisation. TS: appropriate for material/volume, e.g. MDP in TIFF = 1.2 PB, using JP2 = 25 TB. WT: most cost is in inefficient pre- and post-capture processing. PDD: affects ability to operationalise, with direct efficiency/cost effects. MS: affects costs and resource discovery options. Meta: each stage of processing impacted. OCR: processing and resource discovery implications; only good on post-1950 documents.
  5. Self-selecting, i.e. obvious Treasures. Drivers: cultural restitution, wider public access. Sometimes private sponsorship, especially for iconic items. Cultural reunification projects, e.g. International Dunhuang Project and Codex Sinaiticus. Focus on small-scale, high-quality showcases, often a re-cataloguing/metadata/resource discovery tool improvement exercise in disguise! Although it is often said there is only one chance to capture these items, compelling new technology is often an exception to the rule.
  6. Google entry to market. EU i2010 response, devolved to national governments. Microsoft entry and withdrawal from market – Internet Archive. Complex rights landscape. Range of capture approaches available, e.g. move from analogue to digital conversion as historical archives are processed and digital equipment quality improves/costs fall. Some scanners are better at dealing with certain material, e.g. tight bindings, but exaggerate show-through…
  7. We still don’t fully understand who our audience/stakeholders are and what they want! SCA is trying to provide guidance…
  8. Industrial scanning & processing: central services. Requires multiple capture loops to deal with material with specific handling issues. R&D/CoC guidance on optimal capture & OCR required; project-based digitisation is unlikely to deliver long-term improvements in isolation.
  9. General principles, then some points in more detail.
  10. Typical large-scale workflow. Should highlight the QA batch sampling method based on ISO 2859-1, with trend analysis. Proved that good quality is possible in a large-volume workflow!
  11. Publisher and physical description: The earliest, unamended nineteenth-century catalogue records in GK were very brief. Often there is no information on publisher and on the number of pages, and of course no ISBN. Most often, however, ‘format’ is included; in effect a statement of how paper was used in the production process. In many cases publisher names have since been added, as have page numbers. But when the printed catalogue was converted it was not possible to separate the various statements. Some attempt has been made to make up for this, but not always successfully.
      Ambiguous headings: GK made main entries under personal author, corporate body, title or sometimes initial words of the title. When the catalogue was converted to UKMARC the coding made no distinction between the various types of main entry. When the data was converted to MARC 21, algorithms were written to code the type of entry appropriately: as personal author, corporate author, title etc. However, some headings were ambiguous and could not be processed in this way. Those which could not be distinguished were placed in the 720 field, which therefore contains titles as well as some types of authors’ names.
      Name authorities: We now use the Library of Congress / NACO file for name authority control. This ensures that one standard form of a person’s name is preferred in catalogue entries, but that access is also provided from variant forms. GK used its own name authority forms. This may mean that there will be no tie-up with books digitised by Microsoft from another library which has used a different name form for the same author.
      Books originally catalogued in GK, the British Museum General Catalogue of Printed Books. The records were originally intended to be used in the context of a guard book catalogue. The records were converted to machine-readable format in the period 1987-92. The data was copied as seen; errors in the printed catalogue were not systematically corrected. The MARC format employed was a simplified version of UKMARC. On migration to the ILS some of the deficiencies of this format were addressed, but a comprehensive solution was not possible at that time.
  12. Scalable services/systems required to deal with large volume of material. Management information essential to improve outcomes and add value to collection holders/end users.
  13. Complexity of working with historical metadata/current rights landscape requires innovation to provide solutions.
  14. Future shared responsibility [web] services which benefit all will become more prevalent.
  15. Themed deliverables for material content streams, e.g. books/newspapers/journals/special collections. Ability to repurpose files to suit future requirements is essential.
  16. Content will drive new services/resource discovery tools and change user demands. This item was digitised by Google but archived by the collection holder with IA, as they do not currently have their own DP solution.
  17. POD – hardware/format wars – new channels for delivery will impact on capture approach/post capture processing.
  18. Need to join up disparate services to provide an efficient end-to-end solution.
  19. User community involvement will accelerate volume available, increase quality so it’s fit for purpose and change cost model.
  20. Consolidation of processing required to solve outstanding issues.
  21. UK needs a more coordinated approach to ensure cultural memory is available for research and to contribute to the UK plc bottom line; the report did not deliver the required vision or the building blocks to deliver it. More integrated competitors have advanced plans to digitise their own-language material. UK at risk as the language is so widespread: who will deliver?
  22. Current content/tools need to expand if the resource is going to be one of the primary destination choices.
  23. IMPACT CoC to drive cross discipline research and provide source material – datasets and collaborative correction extension.
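
The storage figures quoted in note 4 (MDP at 1.2 petabytes as TIFF against 25 terabytes as JP2) can be put side by side with a short calculation. This is only a rough sketch: it assumes decimal units (1 PB = 1000 TB) and reads the note's figures as petabytes and terabytes. The function and constant names are illustrative and do not come from the slides.

```python
# Rough comparison of the two storage figures quoted in note 4.
# Assumption: decimal (SI) units, i.e. 1 PB = 1000 TB.
TB_PER_PB = 1000


def storage_comparison(tiff_pb, jp2_tb):
    """Return (size ratio, fractional saving) of JP2 relative to TIFF."""
    tiff_tb = tiff_pb * TB_PER_PB      # convert the TIFF figure to terabytes
    ratio = tiff_tb / jp2_tb           # how many times larger the TIFF set is
    saving = 1 - jp2_tb / tiff_tb      # share of storage avoided by using JP2
    return ratio, saving


ratio, saving = storage_comparison(tiff_pb=1.2, jp2_tb=25)
print(f"TIFF is about {ratio:.0f}x larger; JP2 saves about {saving:.1%}")
```

On these assumptions the TIFF masters would be roughly 48 times the size of the JP2 set, which is why the note treats format choice as a question of what is appropriate for the material and volume.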