2. Problem
»Active data storage being used for everything
»No interface to archiving (good RDM practice)
»Archiving undertaken on unsustainable/non-compliant
storage solutions (DVDs / USB drives)
»No consideration for curation and archiving, data
description, retention and review
»Items of long term value not actively identified and
realised
07/04/2015 Data Vault project pitch 2
4. Use cases
» A paper has been published, and according to the research funder’s rules, the data
underlying the paper must be made available upon. It is therefore important to store a
date-stamped golden-copy of the data associated with the paper. Even if the author’s
own copy of the data is subsequently modified, the data at the point of publication is still
available.
» Data containing personal information, perhaps medical records, needs to be stored
securely, however the data is ‘complete’ and unlikely to change, yet hasn’t reached the
point where it should be deleted.
» Data analysis of a data set has been completed, and the research finished. The data may
need to be accessed again, but is unlikely to change, so needn’t be stored in the
researcher’s active data store. An example might be a set of completed crystallography
analyses, which whilst still useful, will not need to be re-analysed.
» Data is subject to retention rules and must be kept securely for a given period of time, for
example EPSRC funded data that needs to be stored securely for ten years.
07/04/2015 Data Vault project pitch 4
5. Impact and Benefits
» Impact:
› Encourage archiving and proper long term storage of data
› This leads to better data management practice
» Benefits
› Researchers
› University HE Community
› IT Services
07/04/2015 Data Vault project pitch 5
6. Sustainability
»Open source software developed in github
»Monthly sprints with fortnightly catchups
»Institutional commitment
»Embedding in RDM Programmes
»Solution open to all – we’ve seen a lot of interest over
time, and during this sandpit event
07/04/2015 Data Vault project pitch 6
7. Outputs, milestones and indicators of success
»Phase 1
› Month 1
– Project kick-off
– Investigation Report (how to package data in containers,
commonAPI methods from storage providers)
› Month 2
– Specification (define how the DataVault will work)
› Month 3
– Proof of concept (deliver an end to end proof of concept)
07/04/2015 Data Vault project pitch 7
8. Outputs, milestones and indicators of success
»Phase 2
› 4 months
– Minimum viable product
»Phase 3
› 6 months
– Full Release
07/04/2015 Data Vault project pitch 8
9. Outputs, milestones and indicators of success
»Indicators of Success:
› Researchers using the tool
› Other Institutions take-up
› Freeing up of active data storage
› A move towards the culture change required
› Ability to manage and realise assets
07/04/2015 Data Vault project pitch 9
10. Team
»Project members:
› University of Edinburgh
› University of Manchester
› Other active contributors welcome
»Project leads on both sites (institutional contribution)
»Developers x 1 per partner, half funded by JISC, half
locally funded
»Plus each sites RDMTeams
07/04/2015 Data Vault project pitch 10
11. Funding
Jisc Local 3 months - Jisc
3 months -
local
RDM Developer - Edinburgh £2,500 £2,500 £7,500 £7,500
RDM Developer - Manchester £2,500 £2,500 £7,500 £7,500
3 * monthly project meetings 666 2000
Next Research Data Spring workshop 500 500
£17,500
07/04/2015 Data Vault project pitch 11