Digital Initiatives and Digital Scholarship at the British Library
1. Digital Initiatives and Digital Scholarship at the British Library
IMPACT Members Meeting
Neil Fitzgerald
2. www.bl.uk 2
Heritage Made Digital Vision: By 2023
• Users will be able to discover and get access to all BL digitised
content through a single platform
• Users will be able to assess what has been digitised in terms of
the wider collections
• We will have digitised a significant part of our collections which
are culturally important in Britain or internationally, often in
collaboration with others
• We will have digitised major collections which are classified as
select, restricted or in other ways as being of high importance
3. www.bl.uk 3
By 2023
• A high proportion of material which is unfit for use for physical
reasons will be digitised, as will a high proportion of material
which is fragile but still usable
• A mixed-economy approach continues to drive our digitisation.
Funding comes from commercial sources, from fundraising and
from other external sources
• We will have used Grant in Aid and strategically guided external
funds towards the digitisation of parts of the collection which are
underrepresented in our current digital offering
• All this will be underpinned by a standard but expandable and
adaptable workflow from rights clearance to ingest and access
4. www.bl.uk 4
Living Knowledge, Heritage Made Digital and Partnerships
Requires a joined-up strategy
[Diagram: Commercial Strategy, Collection Strategy and Partnership Strategy, resting on Living Knowledge]
• Public / Private Partnerships: Cengage Gale, Find My Past, Adam Matthew, ProQuest, (other)
• Philanthropic Open Access Models: Qatar, Europeana, Save our Sounds, Discovering Literature
• Hybrid Models: Google Books, Microsoft Books, Amazon
6. www.bl.uk 6
Current major digitisation initiatives
• Two Centuries of Indian Print
• Hebrew Manuscripts Digitisation, in partnership with the National Library of Israel
• King’s Topographical Collection
• England and France, 700-1200, in partnership with the Bibliothèque nationale de France, funded by the Polonsky Foundation
• Qatar Foundation Digitisation Partnership
• Google Books Digitisation
• Public Private Partnerships (Adam Matthew Digital, Gale Cengage, FindMyPast)
• Endangered Archives Programme
9. www.bl.uk 9
Digital Asset Management & Preservation
System (DAMPS) Project
• The existing Digital Library System was originally designed over 10
years ago
• It is used to ingest and preserve 150TB of digital content every year
• Contains numerous content types including Ebooks, Ejournals, Web
Archive, Digital images, Newspapers and Audio
• Recent improvements have included:
– Setting up new content streams
– Increased capacity and performance
– Improved automation and MI
10. www.bl.uk 10
Challenges
• Need to cater for new and future types of content
• Need a step change in ingest capacity and replication performance (from
c. 0.5 TB per day to c. 5 TB per day)
• Longer timescales to deliver new functionality with the existing development
approach
• Very complex architecture due to the historical evolution of the DLS
• High ongoing development and maintenance costs
• New technology opportunities exist (such as improved storage &
replication technologies)
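To put the ingest figures above in context, the following is a minimal back-of-envelope sketch. It uses only the numbers quoted on the slides (c. 0.5 TB per day today, c. 5 TB per day as the target, 150 TB ingested per year). Decimal units (1 TB = 10^12 bytes) and round-the-clock ingest are assumptions, and the function names are illustrative only.

```python
# Back-of-envelope sketch of the ingest figures quoted on the slides.
# Assumptions: decimal units (1 TB = 10**12 bytes) and ingest running 24 hours a day.
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_mb_per_s(tb_per_day: float) -> float:
    """Average throughput in MB/s needed to move tb_per_day around the clock."""
    return tb_per_day * TB / SECONDS_PER_DAY / 10**6


def days_to_ingest(total_tb: float, tb_per_day: float) -> float:
    """Days needed to ingest total_tb at a steady rate of tb_per_day."""
    return total_tb / tb_per_day


CURRENT_TB_PER_DAY = 0.5   # "c. 0.5 TB per day" from the slide
TARGET_TB_PER_DAY = 5.0    # "c. 5 TB per day" from the slide
YEARLY_INTAKE_TB = 150     # "150TB of digital content every year" from the slide

if __name__ == "__main__":
    for label, rate in (("current", CURRENT_TB_PER_DAY), ("target", TARGET_TB_PER_DAY)):
        print(
            f"{label}: {sustained_mb_per_s(rate):.1f} MB/s sustained, "
            f"{days_to_ingest(YEARLY_INTAKE_TB, rate):.0f} days for a year's intake"
        )
```

Under these assumptions, a year's intake occupies about 300 days of ingest at the current rate and about 30 days at the target rate, which shows how little headroom the current rate leaves for new content streams.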
11. www.bl.uk 11
Project Focus
• Replace the existing Digital Library System (DLS) application with a commercial ‘off the
shelf’ Digital Asset Management and Preservation (DAMPS) product capable of supporting
the Library’s current and future Collection Management activities.
• Improve the efficiency and automation of business processes for managing ingest and
preservation of existing digital & digitised content streams, enabling the future scaling up
of digital collection operations.
• Enable the rapid configuration and deployment of new digital & digitised content required
to realise the Library’s long term vision for content collection, preservation and access.
• Provide a robust, reliable and flexible solution that is compatible with the Library’s
strategic layered technical architecture model and reduces IT architectural complexity.
• Enable improved IT and Collection Management operating models, bringing cost
efficiencies in the development, support and management of digital collection activity.
12. www.bl.uk 12
[Architecture diagram. Labels: Catalogue Integration, External Integration, Storage Infrastructure, A/D Infrastructure, Acquisition, Discovery Systems, UV/UP, API, Ingest, Data Management, Archival Storage, Access, Administration, Preservation Planning]
13. www.bl.uk 13
Meet the Digital Scholarship Department
Founded in 2010, we support
the innovative use of the British Library's digital
collections and data through:
• Working behind the scenes to get content in
digital form and online
• Offering digital research support and guidance
• Supporting collaborative projects
• Running events, competitions, and awards
Teams:
• BL Labs
• Digital Curators
• Endangered Archives Programme
14. www.bl.uk 14
Digital skill building
The Digital Scholarship Training
Programme is an internal staff training
initiative by the Digital Curator team that
launched in November 2012.
Helps us to situate our collections and
expertise in the realm of digital research.
Explore opportunities and challenges.
Delivered over 100 courses to over 400
staff members so far!
Looking now to go external with this in
2017/2018.
Library Carpentry: Software Skills for
Librarians
15. www.bl.uk 15
Some Courses
• 101 This is Digital Scholarship
• 103 Digitisation at British Library
• 105 Crowdsourcing in Libraries, Museums and Cultural Heritage Institutions
• 107 Data Visualisation for Analysis in Scholarly Research
• 108 Geocoding Historical Information and Digital Mapping
• 109 Data on the Web: Mash-ups, APIs and The Semantic Web
• 118 Cleaning up Data
Some Hack & Yacks
• Handwritten Text Recognition with Transkribus
• From Paper Maps to the Web: A DIY Digital Maps Primer
• Literary & Historical Network Analysis using Gephi
• Interactive writing platforms: Twine and Inklewriter
17. www.bl.uk 17
Two Centuries of Indian Print
Pilot will see over 4,000 items dating from 1713 to 1914,
mostly Bengali, digitised and catalogued
http://www.bl.uk/projects/two-centuries-of-indian-print
Dedicated Digital Curator supporting computationally
driven research, such as text mining, by creating and
curating datasets for inclusion on data.bl.uk and
providing digital skills training.
• Understand how best to support the digital
humanities / digital research community to use these
digital collections
• Create & promote accessible datasets for analysis
• Collaboratively stimulate innovative digital research
• Deliver training workshops for Indian institutions
Pleasing tales designed to improve
the understanding, and direct the
conduct of young persons, 1825
18. www.bl.uk 18
OCR Challenges and Opportunities
• Enables search + research at
scale across many items
• Multiple table styles
• Efficient OCR solution for Bengali
and other South Asian languages
Bengal Library Catalogue of Books, 1918-1919, ‘Quarterly list’, SV 412/8
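As a rough illustration of why OCR "enables search at scale across many items", the sketch below builds a tiny inverted index over OCR text. It is not the Library's search system. The item identifiers and page texts are invented for the example, and the whitespace tokeniser is a simplification: real Bengali OCR output would need script-aware tokenisation.

```python
from collections import defaultdict

# Punctuation stripped from the edges of each token (a simplification).
PUNCT = ".,;:!?'\"()[]"


def tokenize(text):
    """Split on whitespace, trim edge punctuation, lowercase."""
    tokens = []
    for word in text.split():
        word = word.strip(PUNCT).lower()
        if word:
            tokens.append(word)
    return tokens


def build_index(pages):
    """Map each token to the set of item ids whose OCR text contains it."""
    index = defaultdict(set)
    for item_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(item_id)
    return index


def search(index, query):
    """Return the ids of items containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical OCR output; ids and text are made up for illustration.
SAMPLE_PAGES = {
    "item-001": "Pleasing tales designed to improve the understanding",
    "item-002": "Quarterly list of books, Bengal Library",
    "item-003": "Catalogue of books designed for young persons",
}

if __name__ == "__main__":
    idx = build_index(SAMPLE_PAGES)
    print(sorted(search(idx, "books designed")))
```

Once every digitised item has usable OCR text, one query runs across the whole collection instead of item by item, which is what makes an efficient OCR solution for Bengali and other South Asian languages so valuable.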
24. www.bl.uk 24
Open Cultural Heritage Datasets
Collection Guides
• Datasets about our collections: bibliographic datasets relating to our published and archival holdings
• Datasets for content mining: content suitable for use in text and data mining research
• Datasets for image analysis: image collections suitable for large-scale image-analysis-based research
• Datasets from UK Web Archive: data and API services available for accessing UK Web Archive
• Digital mapping: geospatial data, cartographic applications, digital aerial photography and scanned historic map materials
https://data.bl.uk
Each dataset has a Digital Object Identifier
Discussion list: http://www.jiscmail.ac.uk/CULTURAL-HERITAGE-DATASETS
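Because each dataset carries a Digital Object Identifier, a citation can be turned into a stable link through the public DOI resolver at https://doi.org/. The following is a minimal sketch of that step. The DOI used is a placeholder, not a real British Library dataset identifier, and the helper name is illustrative.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"

# Common ways a DOI is written in citations; stripped before resolving.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Normalise a DOI string and return its resolver URL."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Every DOI starts with the directory indicator "10." and has a "/" suffix separator.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unsafe characters but keep the "/" separators.
    return DOI_RESOLVER + quote(doi, safe="/")


if __name__ == "__main__":
    # Placeholder DOI for illustration only.
    print(doi_to_url("doi:10.1234/example-dataset"))
```

Citing the resolver URL rather than a page address means the link keeps working if the dataset's landing page moves.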
25. www.bl.uk 25
British Library Data Strategy 2016
Our vision for the British Library is that research data
are as integrated into our collections, research and
services as text is today.
The British Library's users will be able to consume
research data online through tools that enable it to
be analysed, visualised and understood by non-
specialists.
26. www.bl.uk 26
Four Themes
• Data Archiving and Preservation
• Data Discovery, Access and Reuse
• Data Creation
• Data Management
28. www.bl.uk 28
Next steps
• Create an abridged version of the strategy for public
distribution
• Continue work on DataCite and THOR
• Finalise approach to data management plans
• Establish evidence-base for data archiving
• Increase access to BL datasets through https://data.bl.uk
• Stimulate use of BL datasets at the Turing Institute
• Feed into Digital Research Service & Suite
29. www.bl.uk 29
Endangered Archives Programme
Through an annual competition, EAP grants
provide funding to preserve social and
cultural archival material that is in danger of
destruction, neglect or physical deterioration
worldwide.
To date, the EAP has awarded 290 grants in
80 countries, preserving cultural and social
archives across Africa, Asia, Europe,
the Americas and Oceania.
In 2017/2018 EAP will host a Chevening
fellow to strengthen its activities in South
America and the Middle East.
EAP266: History of Bolama, the first capital of
Portuguese Guinea (1879-1941), as reflected in the
Guinean National Historical Archives