1. Research data spring
Enabling Complex Analysis of Large Scale Digital Collections 27/2/2015
Lots of money has been spent digitising heritage
collections. Digitised heritage collections are data. But non-
computationally trained scholars don't know what to ask
of large quantities of data. Often they do not have access
to high performance computing facilities.
We aim to address this fundamental problem by
extending research data management processes in order
to enable novel research and a deeper understanding of
emerging research needs.
2. Team
18/02/2015 Enabling Complex Analysis of Large Scale Digital Collections 2
James Baker: Curator, Digital Research
Melissa Terras: Prof of Digital Humanities
David Beavan: Senior Research Associate
Martin Zaltz Austwick: Lecturer in Data Visualisation
3. Scope and Gap
Non-computationally trained scholars don't know what
to ask of large quantities of digitised data
Large scale digitised collections are delivered in ad hoc
forms. Exemplar workflows for analysis of large scale
digitised collections are hard to find
Deploy and index large scale British Library (BL) digitised
collections at UCL Research IT Services (UCL RITS).
Work with researchers to turn their research questions
into computational analysis. Create and release
derived data, queries, and visualisations (that demonstrate
potential use) as citeable, CC-BY workflow packages
“I want to know all the sentences that mention European cities
circa 1850 to 1900 in BL digitised texts and take away those
results as a data set”
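A request like the one quoted above could be turned into a computational query roughly as follows. This is a minimal sketch, not the project's workflow: the record layout (`id`, `year`, `text`), the short `CITIES` gazetteer, the sample `BOOKS`, and both function names are hypothetical, and the sentence splitting is deliberately naive.

```python
import csv
import io
import re

# Hypothetical inputs: a tiny gazetteer and two made-up book records.
CITIES = ["Paris", "Vienna", "Berlin"]
BOOKS = [
    {"id": "book-1", "year": 1862,
     "text": "We left Paris at dawn. The road was poor."},
    {"id": "book-2", "year": 1820,
     "text": "Vienna was quiet that winter."},
]


def sentences_mentioning_cities(books, cities, start=1850, end=1900):
    """Yield one row per sentence that names a city, for books dated start..end."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, cities)) + r")\b")
    for book in books:
        if not (start <= book["year"] <= end):
            continue
        # Naive split on whitespace that follows ., ! or ?
        for sentence in re.split(r"(?<=[.!?])\s+", book["text"]):
            found = sorted(set(pattern.findall(sentence)))
            if found:
                yield {
                    "book_id": book["id"],
                    "year": book["year"],
                    "cities": ";".join(found),
                    "sentence": sentence,
                }


def rows_to_csv(rows):
    """Serialise the rows as CSV text so the result can be taken away as a data set."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["book_id", "year", "cities", "sentence"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(sentences_mentioning_cities(BOOKS, CITIES)))
```

At the scale described in this proposal, the same filter-and-extract pattern would need to run over an indexed corpus rather than an in-memory list, which is where the research computing support comes in.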
4. Impact and Benefits
Outputs from phase one of the project would be used as
case studies and exemplars to engage a wider community
and reduce research inefficiency
The project will generate engagement with new scholarly
communities around rich data resources
Narratives and workflows would be used in
interdisciplinary teaching at host institutions (Melissa:
MA/MSc Digital Humanities; Martin: BASc Arts and
Science, MRes Advanced Spatial Analysis and
Visualisation; James: BL Doctoral Training, MA History,
University of Kent)
5. Sustainability
Derived data, queries, documentation, and visualisations
released as citeable, CC-BY workflow packages with
DOIs (DataCite or Figshare)
Workflow packages embedded in teaching and
research training
Research computing communities beyond UCL deepen
understanding of complex, poorly structured, and
heterogeneous humanities data to enable process
improvement
Through BL Labs, university teaching, and BAU outreach
activities, narratives and lessons learned will have
substantial life beyond the project
6. Outputs, milestones and indicators of success
To month 3:
● Deploy 68k digitised books (circa 4bn words!) at UCL
● Identify 3+ early career researchers (2 in hand)
● Run multi-day pilot workshop in partnership with all
parties, to work iteratively on data, workflow and
research questions
● Output: workflow packages, derived data,
visualisations to enable research insights
Social & technical barriers to analysis of large scale digitised collections are reduced
To month 7:
● Lead workshops and hackdays for the wider research community
● Deploy new BL datasets (based on researcher needs)
● Consolidate workflow packages and recipes
● Gather requirements for future infrastructure development (beyond scope of the
project)
To month 13:
● Recruit data champions to drive wider adoption of methods
● Support community-led workshops focussed on specific domain needs and
challenges
● Create cookbook from recipes
7. Funding
To month 3:
UCL RITS Development: £5,500
Materials Development, Management and Administration: £10,025
Delivery of pilot workshops: £4,100
Total, full economic cost: £19,625