What is a DMP

Presentation given at EU Research applicants training on the Horizon 2020 Open Research Data pilot in Bern, Switzerland on 10th November 2016

  1. 1. What is a Data Management Plan? Sarah Jones, Digital Curation Centre. Twitter: @sjDCC. EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065. This work is licensed under the Creative Commons CC-BY 4.0 licence.
  2. 2. What is EUDAT? EUDAT offers a pan-European solution, providing a generic set of services to ensure a minimum level of interoperability. Building common data services in close collaboration with 25+ communities.
  3. 3. Overview: What is a DMP and why write one? Requirements under Horizon 2020. Example plans. Lessons and guidance.
  4. 4. WHAT IS A DMP & WHY WRITE ONE? Image CC-BY-NC-SA by Leo Reynolds
  5. 5. Data Management Plans: A DMP is a brief plan to define: how the data will be created; how it will be documented; who will be able to access it; where it will be stored; who will back it up; whether (and how) it will be shared & preserved. DMPs are often submitted as part of grant applications, but are useful whenever researchers are creating data.
  6. 6. How do DMPs help? NON PECUNIAE INVESTIGATIONIS CURATORE SED VITAE FACIMUS PROGRAMMAS DATORUM PROCURATIONIS (Not for the research funder, but for life we make data management plans). Make your research easier; stop yourself drowning in irrelevant stuff; save data for later; avoid accusations of fraud or bad science; write a data paper; share your data for re-use; get credit for it.
  7. 7. Undervaluing research data
  8. 8. Research data lifecycle: creating data, processing data, analysing data, preserving data, giving access to data, re-using data. CREATING DATA: designing research, DMPs, planning consent, locate existing data, data collection and management, capturing and creating metadata. PROCESSING DATA: entering, transcribing, checking, validating and cleaning data, anonymising data, describing data, manage and store data. ANALYSING DATA: interpreting & deriving data, producing outputs, authoring publications, preparing for sharing. PRESERVING DATA: data storage, back-up & archiving, migrating to best format & medium, creating metadata and documentation. ACCESS TO DATA: distributing data, sharing data, controlling access, establishing copyright, promoting data. RE-USING DATA: follow-up research, new research, undertake research reviews, scrutinising findings, teaching & learning. Ref: UK Data Archive:
  9. 9. Planning trick 1: think backwards. What data organisation would a re-user like? CREATING DATA, PROCESSING DATA, PRESERVING DATA, GIVING ACCESS TO DATA, RE-USING DATA.
  10. 10. Data organisation
  11. 11. Planning trick 2: include RDM stakeholders. Institution, RDM policy, Facilities, €$£, Research funders, Publishers, Data Availability policy, Commercial partners.
  12. 12. DMPS IN HORIZON 2020 Image “Open Data” CC BY 2.0 by
  13. 13. Horizon 2020: Open Data Pilot. t/h2020-hi-oa-data-mgt_en.pdf Participants must: develop a Data Management Plan; deposit research data in a repository; take measures to enable third parties to access, mine, exploit, reproduce and disseminate (free of charge for any user); provide information via the chosen repository about the tools that are needed to validate the results.
  14. 14. Open Data by Default from 2017
  15. 15. Approach: as open as possible, as closed as necessary Image: ‘Balancing rocks’ by Viewminder CC-BY-SA-ND
  16. 16. Horizon 2020 and DMPs: In H2020 the Data Management Plan (DMP) is a regular project deliverable, due by month 6. A DMP is a living document: to be used, updated and shared. You can use the H2020 template in DMPonline. The DMP is not part of the proposal evaluation, but there is an optional section on data management evaluated under impact. If (part of your) data cannot be shared with everyone, you may (partially) opt out of the pilot.
  17. 17. Making data FAIR: Findable – assign persistent IDs, provide rich metadata, register in a searchable resource,... Accessible – retrievable by their ID using a standard protocol, metadata remain accessible even if data aren’t... Interoperable – use formal, broadly applicable languages, use standard vocabularies, qualified references... Reusable – rich, accurate metadata, clear licences, provenance, use of community standards...
  18. 18. H2020 template: 1. Data Summary; 2. FAIR data (2.1 Making data findable, including provisions for metadata; 2.2 Making data openly accessible; 2.3 Making data interoperable; 2.4 Increase data re-use (through clarifying licences)); 3. Allocation of resources; 4. Data security; 5. Ethical aspects; 6. Other issues. manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
  19. 19. Common themes in DMPs 1. Description of data to be collected / created (i.e. content, type, format, volume...) 2. Standards / methodologies for data collection & management 3. Ethics and Intellectual Property (highlight any restrictions on data sharing e.g. embargoes, confidentiality) 4. Plans for data sharing and access (i.e. how, when, to whom) 5. Strategy for long-term preservation
  21. 21. Example plans: 108 DMPs from the National Endowment for the Humanities grant-applications-2011-2014-now-available; 20+ scientific DMPs submitted to the NSF (USA) provided by UCSD dmp- samples.html; Example DMP collection from Leeds University; DMPs in RIO journal ournal_id=17; Further examples:
  22. 22. Example H2020 DMPs in Zenodo Helix Nebula – High Energy Physics example Tweether – engineering (micro-electronics) example AutoPost – ICT example
  23. 23. Data description examples: “The final dataset will include self-reported demographic and behavioural data from interviews with the subjects and laboratory data from urine specimens provided.” (From NIH data sharing statements) “Every two days, we will subsample E. affinis populations growing under our treatment conditions. We will use a microscope to identify the life stage and sex of the subsampled individuals. We will document the information first in a laboratory notebook and then copy the data into an Excel spreadsheet. The Excel spreadsheet will be saved as a comma separated value (.csv) file.” (From DataOne – E. affinis DMP example)
  24. 24. Metadata examples: “Metadata will be tagged in XML using the Data Documentation Initiative (DDI) format. The codebook will contain information on study design, sampling methodology, fieldwork, variable-level detail, and all information necessary for a secondary analyst to use the data accurately and effectively.” (From ICPSR Framework for Creating a DMP) “We will first document our metadata by taking careful notes in the laboratory notebook that refer to specific data files and describe all columns, units, abbreviations, and missing value identifiers. These notes will be transcribed into a .txt document that will be stored with the data file. After all of the data are collected, we will then use EML (Ecological Metadata Language) to digitize our metadata. EML is one of the accepted formats used in ecology, and works well for the types of data we will be producing. We will create these metadata using Morpho software, available through KNB. The metadata will fully describe the data files and the context of the measurements.” (From DataOne – E. affinis DMP example)
  25. 25. Data sharing examples: “We will make the data and associated documentation available to users under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate computer technology; and (3) a commitment to destroying or returning the data after analyses are completed.” (From NIH data sharing statements) “The videos will be made available via the website (both as streaming media and downloads). HD and SD versions will be provided to accommodate those with lower bandwidth. Videos will also be made available via Vimeo, a platform that is already well used by research students at Bristol. Appropriate metadata will also be provided to the existing Vimeo standard. All video will also be available for download and re-editing by third parties. To facilitate this, Creative Commons licenses will be assigned to each item. In order to ensure this usage is possible, the required permissions will be gathered from participants (using a suitable release form) before recording commences.” (From University of Bristol Kitchen Cosmology DMP)
  26. 26. Example restrictions: “Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers prior to release for sharing, we believe that there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and associated documentation available to users only under a data-sharing agreement.” (From NIH data sharing statements) “1. Share data privately within 1 year. Data will be held in Private Repository, but metadata will be public. 2. Release data to public within 2 years. Encouraged after one year to release data for public access. 3. Request, in writing, data privacy up to 4 years. Extensions beyond 3 years will only be granted for compelling cases. 4. Consult with creators of private CZO datasets prior to use. PIs required to seek consent before using private data they can access.” (From Boulder Creek Critical Zone Observatory DMP)
  27. 27. Archiving examples: “The investigators will work with staff at the UKDA to determine what to archive and how long the deposited data should be retained. Future long-term use of the data will be ensured by placing a copy of the data into the repository.” (From ICPSR Framework for Creating a DMP) “Data will be provided in file formats considered appropriate for long-term access, as recommended by the UK Data Service. For example, SPSS Portable format and tab-delimited text for qualitative tabular data and RTF and PDF/A for interview transcripts. Appropriate documentation necessary to understand the data will also be provided. Anonymised data will be held for a minimum of 10 years following project completion, in compliance with LSHTM’s Records Retention and Disposal Schedule. Biological samples (output 3) will be deposited with the UK BioBank for future use.” (From Writing a Wellcome Trust Data Management and Sharing Plan)
  28. 28. Share your example DMPs! Send us links to your DMPs. We will add them to the DCC list. Aim to cover a wide range of disciplines and funders. share-DMPs
  29. 29. LESSONS AND RESOURCES Image ‘Energy Resources | Energie Quelle’ CC-BY-NC by K. H. Reichert
  30. 30. Tips for writing DMPs: Seek advice - consult and collaborate. Consider good practice for your field. Base plans on available skills & support. Make sure implementation is feasible. Think about things early…
  31. 31. DCC support on DMPs: webinars and training materials; how-to guides and other advisory documents; checklist on what to cover in DMPs; example DMPs; DMPonline.
  32. 32. DMPonline: A web-based tool to help researchers write DMPs. Includes a template for Horizon 2020.
  33. 33. How the tool works: Click to write a generic DMP, or choose your funder to get their specific template. Pick your uni to add local guidance and to get their template if no funder applies. Choose any additional optional guidance.
  34. 34. Writing plans: features. Ability to leave notes for collaborators; custom guidance from funder, uni, discipline, group...; progress indicators.
  35. 35. Where to find a data repository? The EC guidelines point to Re3data as one of the registries that can be searched to find a home for data ntent/re3data-demo
  36. 36. How to select a repository? Look for provision from your community, university, publisher, funder etc. Check they match your particular data needs, e.g. formats accepted; mixture of Open and Restricted Access. See if they provide guidance on how to cite the deposited data. Do they assign a persistent & globally unique identifier for sustainable citations and to link back to particular researchers and grants? Look for certification as a ‘Trustworthy Digital Repository’ with an explicit ambition to keep the data available in the long term.
  37. 37. How to license research data: Horizon 2020 guidelines point to CC-BY or CC-0. The DCC How-to guide helps you to license data. The EUDAT licensing wizard helps you pick a licence for data & software.
  38. 38. Metadata standards: Metadata Standards Directory: broad, disciplinary listing of standards and tools; maintained by RDA group. metadata- directory Biosharing: a portal of data standards, databases, and policies; focused on life, environmental and biomedical sciences.
  39. 39. Key messages: Data management is part of good practice whether you plan to make the data open or not – it benefits you! If you plan to share data, consider this from the outset as decisions made early on affect what you can do later. The process of planning is the most important aspect of DMPs. Think about the desired end result and plan for this. Approach DMPs in whatever way best fits your project. Don’t just let funder requirements drive things.
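
As a rough illustration of the H2020 template outline (slide 18), the section headings could be turned into an empty plan to fill in. This is only a sketch: the headings are copied from the slide, while the function name `dmp_skeleton`, the `TODO` placeholder and the indentation scheme are assumptions made for this example and are not how DMPonline itself works.

```python
# Section headings copied from the H2020 template slide; the function name,
# placeholder text and layout are hypothetical choices for this sketch.
H2020_TEMPLATE = [
    ("1", "Data Summary"),
    ("2", "FAIR data"),
    ("2.1", "Making data findable, including provisions for metadata"),
    ("2.2", "Making data openly accessible"),
    ("2.3", "Making data interoperable"),
    ("2.4", "Increase data re-use (through clarifying licences)"),
    ("3", "Allocation of resources"),
    ("4", "Data security"),
    ("5", "Ethical aspects"),
    ("6", "Other issues"),
]


def dmp_skeleton(project_title, placeholder="TODO"):
    """Return a plain-text DMP outline with one placeholder per section."""
    lines = [f"Data Management Plan: {project_title}", ""]
    for number, heading in H2020_TEMPLATE:
        # Indent subsections (2.1, 2.2, ...) one level under their parent
        indent = "  " * number.count(".")
        lines.append(f"{indent}{number} {heading}")
        # Section 2 only groups 2.1-2.4, so it gets no placeholder of its own
        if number != "2":
            lines.append(f"{indent}  {placeholder}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(dmp_skeleton("Example project"))
```

Because the DMP is a living document, a skeleton like this can be kept under version control and updated as the project evolves.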

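
The DataOne metadata example (slide 24) describes keeping a .txt document next to each data file that explains all columns, units and missing value identifiers. A minimal sketch of that habit is shown below. The function name `write_data_with_readme`, the file naming pattern, the `NA` missing-value marker and the example columns are all assumptions for illustration; they do not come from the DataOne plan.

```python
import csv
import os
import tempfile


def write_data_with_readme(directory, stem, columns, rows, missing_value="NA"):
    """Write <stem>.csv plus a <stem>_README.txt describing every column.

    columns: list of (name, unit, description) tuples.
    rows: list of tuples; None marks a missing value.
    """
    csv_path = os.path.join(directory, f"{stem}.csv")
    readme_path = os.path.join(directory, f"{stem}_README.txt")

    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([name for name, _unit, _description in columns])
        for row in rows:
            # Replace None with the documented missing-value identifier
            writer.writerow([missing_value if v is None else v for v in row])

    with open(readme_path, "w", encoding="utf-8") as handle:
        handle.write(f"Data file: {os.path.basename(csv_path)}\n")
        handle.write(f"Missing value identifier: {missing_value}\n")
        handle.write("Columns:\n")
        for name, unit, description in columns:
            handle.write(f"  {name} [{unit}]: {description}\n")

    return csv_path, readme_path


if __name__ == "__main__":
    # Hypothetical columns, loosely modelled on the subsampling example
    columns = [
        ("sample_date", "ISO 8601 date", "day the population was subsampled"),
        ("life_stage", "category", "life stage identified under the microscope"),
        ("sex", "category", "sex of the individual, if determined"),
    ]
    rows = [("2016-11-10", "adult", "F"), ("2016-11-12", "juvenile", None)]
    with tempfile.TemporaryDirectory() as tmp:
        _csv_path, readme_path = write_data_with_readme(tmp, "subsample", columns, rows)
        with open(readme_path, encoding="utf-8") as handle:
            print(handle.read())
```

A plain-text note like this is a starting point only; the quoted plan goes on to formalise the same information in a community standard (EML) once data collection is complete.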