Fourth annual BL Labs Symposium, 7 Nov 2016 keynote by Professor Melissa Terras: ‘Unexpected repurposing: The British Library's digital collections and UCL teaching, research and infrastructure’
1. Unexpected Repurposing: the British Library's digital collections and UCL teaching, research and infrastructure
Professor Melissa Terras
Professor of Digital Humanities, UCL Dept of Information Studies
Director, UCL Centre for Digital Humanities
m.terras@ucl.ac.uk, @melissaterras
5. British Library, 28th May 2008.
https://web.archive.org/web/20110707135434/http://pressandpolicy.bl.uk/Press-Releases/The-British-Library-19th-Century-Book-Digitisatio
Returned to the Library in 2012 and released under a CC0 Public Domain dedication for commercial and non-commercial use.
23. Staff and Students, working together
• James Baker, Adam Farquhar
• Melissa Terras, Dean Mohamedally, Tim Weyrich
• Stefan Alborzpour, Stelios Georgiou, Nektaria Stavrou, Wendy Wong, Jonathan Lloyd, Meral Sahin, Divya Surendran, James Durrant, Muhammad Rafdi, Ali Sarraf
25. Approach
• How can we search the dataset differently?
• Complex and multifaceted needs of humanities researchers
• Boolean and Advanced Search
• Five APIs implemented on Microsoft Azure that scale functionally to the data
• Offering unconventional services such as bulk download of text based on metadata queries, word frequency lists, and OCR text previews.
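The services listed above can be illustrated with a small sketch. It is not the project's Azure implementation: it assumes each book is a record with hypothetical `identifier`, `title`, `year` and `text` fields, selects books by a metadata query (a year range), gathers their text for bulk download, and builds a word frequency list with a deliberately simple tokeniser.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Book:
    # Hypothetical record layout; the real dataset's metadata fields may differ.
    identifier: str
    title: str
    year: int
    text: str


# Simplified tokeniser: runs of lower-case ASCII letters only.
TOKEN = re.compile(r"[a-z]+")


def select(books, start_year, end_year):
    """Metadata query: books published within an inclusive year range."""
    return [b for b in books if start_year <= b.year <= end_year]


def bulk_text(books):
    """Bulk download: the full text of each selected book, keyed by identifier."""
    return {b.identifier: b.text for b in books}


def word_frequencies(books, top=10):
    """Word frequency list across the selected books, most common first."""
    counts = Counter()
    for b in books:
        counts.update(TOKEN.findall(b.text.lower()))
    return counts.most_common(top)


books = [
    Book("b1", "A Treatise on Fevers", 1810, "The fever and the cure."),
    Book("b2", "Poems", 1850, "The sea, the sea."),
    Book("b3", "Sermons", 1790, "Grace and faith."),
]
subset = select(books, 1800, 1860)
print(sorted(bulk_text(subset)))        # ['b1', 'b2']
print(word_frequencies(subset, top=2))  # [('the', 4), ('sea', 2)]
```

Keeping selection, export and counting as separate functions mirrors the idea of offering each as its own service over the same metadata query.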
38. Method
• 65k books from the British Library:
– 17th to 19th century
– 224GB of compressed ALTO XML
• UCL High Performance Computing
• Support from RITS and UCLDH
• 4 humanities researchers
• Turn research questions into computational queries
• Learn from the researchers about their needs, wants, desires, and methods.
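Turning research questions into queries starts with getting text out of the ALTO XML. The sketch below, using only Python's standard library, joins the `CONTENT` attributes of `String` elements into one string per `TextLine`, and reads `.xml` members from a zip archive. The packaging of the files, the member names and the function names are illustrative assumptions. Element names are matched without their namespace so the code does not depend on a particular ALTO version, and hyphenation markup (`HYP`, `SUBS_CONTENT`) is ignored.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def local_name(tag):
    """Strip any '{namespace}' prefix so matching works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_data):
    """Return the OCR text of each TextLine (xml_data may be str or bytes)."""
    root = ET.fromstring(xml_data)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == "TextLine":
            words = [child.get("CONTENT", "") for child in elem
                     if local_name(child.tag) == "String"]
            lines.append(" ".join(words))
    return lines


def iter_zip_alto(archive):
    """Yield (member name, lines) for each .xml member of a zip archive.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith(".xml"):
                yield name, alto_lines(zf.read(name))


# Hypothetical minimal ALTO page for illustration.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="The"/><SP/><String CONTENT="cholera"/></TextLine>
    <TextLine><String CONTENT="in"/><SP/><String CONTENT="London"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

# Build an in-memory zip so the example runs without files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("book_0001/page_0001.xml", SAMPLE)

for name, lines in iter_zip_alto(buffer):
    print(name, lines)  # book_0001/page_0001.xml ['The cholera', 'in London']
```

Because each archive is processed independently, this kind of extraction is straightforward to spread across many jobs on an HPC cluster.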
42. Case Study 1: History of Medicine, Oliver Duke-Williams, UCL
43. Case Study 2: History of Images, Will Finley, Sheffield
44. What did this tell us?
• Best practice recommendations:
– Derived datasets for home use
– Documenting decisions
– Fixed/defined dataset
– Normalisations
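These recommendations can be combined in practice. The sketch below normalises OCR text and returns, alongside the derived data, a small manifest recording which normalisations were applied and a SHA-256 checksum identifying the exact dataset. The choice of Unicode NFKC plus lower-casing, and the manifest fields, are assumptions for illustration; NFKC happens to map period forms such as the long s (ſ) and the ﬁ ligature to plain letters.

```python
import hashlib
import json
import unicodedata

# Hypothetical normalisation steps, recorded so the derived dataset
# documents the decisions taken to produce it.
NORMALISATIONS = [
    "Unicode NFKC (e.g. long s 'ſ' -> 's', ligature 'ﬁ' -> 'fi')",
    "lower-case",
]


def normalise(text):
    """Apply the steps listed in NORMALISATIONS, in order."""
    return unicodedata.normalize("NFKC", text).lower()


def derive(source_texts):
    """Return (derived dataset, manifest) for a dict of identifier -> text."""
    derived = {key: normalise(value) for key, value in sorted(source_texts.items())}
    # A checksum over a canonical serialisation fixes the dataset: anyone
    # holding a copy can recompute it to confirm their copy matches.
    canonical = json.dumps(derived, sort_keys=True, ensure_ascii=False)
    manifest = {
        "normalisations": NORMALISATIONS,
        "items": len(derived),
        "sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }
    return derived, manifest


derived, manifest = derive({"b1": "Profeſſor of Phyſick"})
print(derived)  # {'b1': 'professor of physick'}
print(json.dumps(manifest, indent=2, ensure_ascii=False))
```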
48. Common Queries
• searches for all variants of a word
• searches that return keywords in context, traced over time
• NOT searches for a word or phrase that exclude another word or phrase
• searches for a word when in close proximity to a second word
• searches based on image metadata
… all returned as a derived dataset, in context.
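Several of these query types can be sketched over plain tokenised text. The functions below are illustrative, not the project's implementation, and handle single words only for brevity: prefix matching stands in for proper variant handling (real spelling variants would need lemmatisation or normalisation), and `kwic_by_year` assumes each text arrives paired with a publication year taken from the metadata. Image-metadata searches are not covered.

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")


def tokens(text):
    """Lower-case word tokens."""
    return WORD.findall(text.lower())


def variants(text, stem):
    """All word forms beginning with `stem` (a crude stand-in for variants)."""
    return sorted({t for t in tokens(text) if t.startswith(stem.lower())})


def kwic(text, keyword, window=3):
    """Keyword in context: each hit with up to `window` words on either side."""
    toks = tokens(text)
    kw = keyword.lower()
    return [" ".join(toks[max(0, i - window):i + window + 1])
            for i, t in enumerate(toks) if t == kw]


def kwic_by_year(docs, keyword, window=3):
    """Trace keyword-in-context hits over time; `docs` is (year, text) pairs."""
    hits = defaultdict(list)
    for year, text in docs:
        hits[year].extend(kwic(text, keyword, window))
    return dict(sorted(hits.items()))


def matches_not(text, wanted, unwanted):
    """NOT search: the text contains `wanted` but not `unwanted`."""
    toks = set(tokens(text))
    return wanted.lower() in toks and unwanted.lower() not in toks


def near(text, first, second, distance=5):
    """Proximity search: `first` occurs within `distance` tokens of `second`."""
    toks = tokens(text)
    a = [i for i, t in enumerate(toks) if t == first.lower()]
    b = [i for i, t in enumerate(toks) if t == second.lower()]
    return any(abs(i - j) <= distance for i in a for j in b)


text = "The cholera broke out in London and the cholera spread quickly"
print(kwic(text, "cholera", window=2))
# ['the cholera broke out', 'and the cholera spread quickly']
print(matches_not(text, "cholera", "plague"))       # True
print(near(text, "cholera", "london", distance=2))  # False
```

Each function returns plain lists or booleans, so results can be written straight out as a derived dataset alongside the matching book identifiers.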
49. Do try this at home…
1. Invest in research software engineer capacity to deploy and maintain openly licensed large-scale digital collections from across the GLAM sector, in order to facilitate research in the arts, humanities, and social and historical sciences.
2. Invest in training library staff to run these initial queries in collaboration with humanities faculty, to support work with the subsets of data that are produced, and to document and manage the resulting code and derived data.
54. With thanks to
• BL Labs and Digital Curators: James Baker,
Adam Farquhar, Mahendra Mahey, Ben O’Steen,
Hana Lewis
• UCL CS Student Project Team: James Baker,
Tim Weyrich, Dean Mohamedally
• Bluclobber Project Team: James Baker, James
Hetherington, David Beavan, Anne Welsh, Helen
O’Neill, Will Finley, Oliver Duke-Williams, Adam
Farquhar.
• UCL Research IT Services: James Hetherington,
Clare Gryce, Raquel Algere.