1. Data Management Planning for researchers
www.geant.org
01/07/2021
Sarah Jones
EOSC Engagement Manager
sarah.jones@geant.org
Twitter: @sarahroams
Indonesian RDM webinar series
Friday 2nd July 2021
2. What is a DMP?
Image CC-BY-NC-SA by Leo Reynolds www.flickr.com/photos/lwr/13442910354
3. What is research data?
All manner of things that you produce in the course of your research
4. What is research data management?
“the active management and appraisal of data over the lifecycle of scholarly and scientific interest”
Data management is part of good research practice
Create
Document
Use
Store
Share
Preserve
5. What is a DMP?
A short plan that outlines:
• what data will be created and how
• how it will be managed (storage, back-up, access…)
• plans for data sharing and preservation
DMPs are often submitted as part of grant applications, but are useful whenever researchers are creating data
6. Five common themes / questions in DMPs
1. Description of data to be collected / created (i.e. content, type, format, volume...)
2. Standards / methodologies for data collection & management
3. Ethics and Intellectual Property (highlight any restrictions on data sharing e.g. embargoes, confidentiality)
4. Plans for data sharing and access (i.e. how, when, to whom)
5. Strategy for long-term preservation
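The five themes above can double as a completeness check on a draft plan. The sketch below is only an illustration: the theme labels are paraphrased from this slide, and the `missing_themes` helper and the `draft` contents are hypothetical, not part of any DMP tool.

```python
# Hypothetical checklist: labels paraphrase the five themes on this slide.
DMP_THEMES = [
    "data description",
    "standards and methodologies",
    "ethics and intellectual property",
    "data sharing and access",
    "long-term preservation",
]


def missing_themes(plan: dict[str, str]) -> list[str]:
    """Return the themes with no text (or only whitespace) in a draft plan."""
    return [theme for theme in DMP_THEMES if not plan.get(theme, "").strip()]


# Made-up draft plan with two themes absent and one left blank.
draft = {
    "data description": "Interview transcripts and survey responses.",
    "data sharing and access": "Deposit in a repository at the end of the project.",
    "long-term preservation": "   ",
}

for theme in missing_themes(draft):
    print(f"Still to write: {theme}")
```

A check like this only confirms that each theme has some text; whether the text justifies the decisions made is still a matter for review.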
7. Why create a DMP?
Image CC-BY by Ian Dooley https://unsplash.com/photos/DuBNA1QMpPA
10. What do research funders want?
• A brief plan usually submitted in grant applications
• Some funders may want plans at multiple stages, e.g. pre-award, in-project, final report…
• 1-4 sides of A4 as attachment or a section in application
• Typically a prose statement covering suggested themes
• An outline of data management and sharing plans, justifying
decisions and any limitations
11. Trend for DMPs to cover more than data
• Wellcome Trust issued new guidelines in 2017 that ask for an
Outputs Management Plan covering:
– datasets generated by your research
– original software created in the course of your research
– new materials you create – like antibodies, cell lines and reagents
– IP such as patents, copyright, design rights and confidential know-how
• The EPSRC has a requirement for Software Management Plans
12. Why write a DMP / manage your data?
NON PECUNIAE INVESTIGATIONIS CURATORE
SED VITAE FACIMUS PROGRAMMAS DATORUM PROCURATIONIS
(Not for the research funder, but for life we make data management plans)
• Make your research easier
• Stop yourself drowning in irrelevant stuff
• Save data for later
• Avoid accusations of fraud or bad science
• Write a data paper
• Share your data for re-use
• Get credit for it
14. How can we make a good DMP?
Image CC-BY by Kelly Sikkema https://unsplash.com/photos/v9FQR4tbIq8
15. Planning trick 1: think backwards
What data organisation would a re-user like?
CREATING DATA
PROCESSING DATA
PRESERVING DATA
GIVING ACCESS TO DATA
RE-USING DATA
17. Planning trick 2: include RDM stakeholders
Institution
RDM policy
Facilities
€$£
Research funders
Publishers
Data Availability
policy
Commercial partners
www.openaire.eu/briefpaper-rdm-infonoads
18. Use the DMP as a talking point
Consulting, supporting and networking with
researchers & all other interest groups
Slide content courtesy of Mari Elisa Kuusniemi (MEK), University of Helsinki Library
19. Planning trick 3: ground your plan in reality
Base plans on available skills, support and good practice
for the field – show it’s feasible to implement
20. Planning trick 4: plan to share from the outset
Decisions made early on affect what you can do later
• Licenses and consent agreements negotiated without care may preclude later sharing
• Costings can’t be included retrospectively
• Consider data issues at the consortium negotiation stage so that potential problems are identified and resolved as soon as possible
21. Key tools and support
Image CC-BY by Barn Images https://unsplash.com/photos/t5YUoHW6zRo
22. DCC support on DMPs
• Webinars and training materials
• How-to guides and other advisory documents
• Checklist on what to cover in DMPs
• Example DMPs
• DMPonline
https://www.dcc.ac.uk/dmps
24. How does DMPonline work?
Select options to get tailored guidance and support
• Guidance and examples from funders, unis, research disciplines and others
• Requirements from funders, institutions and others
DMP: Create, Share, Review, Export, Update…
25. Many DMP tools available…
Platform | Organisation(s) | Resource link(s)
DMPRoadmap | CDL, DCC, Portage Network, INIST CNRS | https://github.com/DMPRoadmap/roadmap
University of Queensland Research Data Manager | University of Queensland | https://research.uq.edu.au/project/research-data-manager-uqrdm
ReDBox DLC | QCIF | https://www.redboxresearchdata.com.au/rbdlc.html
RDMOrganiser (RDMO) | AIP, FHP, KIT | http://rdmorganiser.github.io/en
Data Stewardship Wizard | ELIXIR, DTL | https://github.com/DataStewardshipPortal
ezDMP | IEDA | https://www.iedadata.org
Data planning tool | UNINETT Sigma2 | https://www.sigma2.no/content/data-planning-tool
And more…
Please update at: https://activedmps.org
26. Managing and sharing data: a best practice guide
• How to write a DMP
• Formatting your data
• Documentation
• Ethics and consent
• Copyright
• Data sharing
• …
http://data-archive.ac.uk/media/2894/managingsharing.pdf
27. Questions and worked examples
Image Israel Palacio https://unsplash.com/photos/P6FgiDNe6W4
28. 1. Describing data to be collected
• What type of data will you produce?
• What file format(s) will your data be in?
• How much data will be produced?
• How will you create your data?
29. Data description examples
The final dataset will include self-reported demographic and behavioural data from
interviews with the subjects and laboratory data from urine specimens provided.
From NIH data sharing statements
Every two days, we will subsample E. affinis populations growing under our
treatment conditions. We will use a microscope to identify the life stage and sex of
the subsampled individuals. We will document the information first in a laboratory
notebook and then copy the data into an Excel spreadsheet. The Excel spreadsheet
will be saved as a comma separated value (.csv) file.
From DataOne – E. affinis DMP example
30. Some formats are better for the long term
It’s preferable to opt for formats that are:
• Uncompressed
• Non-proprietary
• Open, documented
• Standard representation (ASCII, Unicode)
Data centres may have preferred formats for deposit e.g.
Type | Recommended | Non-preferred
Tabular data | CSV, TSV, SPSS portable | Excel
Text | Plain text, HTML, RTF; PDF/A only if layout matters | Word
Media | Container: MP4, Ogg; Codec: Theora, Dirac, FLAC | Quicktime, H264
Images | TIFF, JPEG2000, PNG | GIF, JPG
Structured data | XML, RDF | RDBMS
Further examples: https://www.ukdataservice.ac.uk/manage-data/format/recommended-formats.aspx
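A table like this can be turned into a quick check on a project folder before deposit. The sketch below is illustrative only: the extension lists are our own assumption about which file extensions correspond to the formats named on the slide, and `classify_format` is a hypothetical helper, not part of any data centre's tooling. Media files are left out because their suitability depends on the codec as well as the container.

```python
from pathlib import Path

# Assumed mapping from the formats on the slide to common file extensions.
# Check your own data centre's list before relying on it.
RECOMMENDED = {
    ".csv", ".tsv", ".por",           # tabular data (SPSS portable assumed to be .por)
    ".txt", ".html", ".rtf",          # text
    ".tif", ".tiff", ".jp2", ".png",  # images
    ".xml", ".rdf",                   # structured data
}
NON_PREFERRED = {".xls", ".xlsx", ".doc", ".docx", ".gif", ".jpg", ".jpeg"}


def classify_format(filename: str) -> str:
    """Classify a file by extension as recommended, non-preferred or unknown."""
    suffix = Path(filename).suffix.lower()
    if suffix in RECOMMENDED:
        return "recommended"
    if suffix in NON_PREFERRED:
        return "non-preferred"
    return "unknown"


for name in ["survey.CSV", "results.xlsx", "interview.mp4"]:
    print(name, "->", classify_format(name))
```

Anything reported as "unknown" needs a human decision; the check is a prompt for discussion with the repository, not a verdict.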
31. 2. Standards and methodologies
• What metadata and documentation will you record?
• What standards are used in your field?
• How will your data be organised?
• Where will it be stored and backed-up?
32. Metadata examples
Metadata will be tagged in XML using the Data Documentation Initiative (DDI) format.
The codebook will contain information on study design, sampling methodology,
fieldwork, variable-level detail, and all information necessary for a secondary analyst
to use the data accurately and effectively.
From ICPSR Framework for Creating a DMP
We will first document our metadata by taking careful notes in the laboratory notebook that
refer to specific data files and describe all columns, units, abbreviations, and missing value
identifiers. These notes will be transcribed into a .txt document that will be stored with the
data file. After all of the data are collected, we will then use EML (Ecological Metadata
Language) to digitize our metadata. EML is one of the accepted formats used in ecology, and
works well for the types of data we will be producing. We will create these metadata using
Morpho software, available through KNB. The metadata will fully describe the data files and the
context of the measurements.
From DataOne – E. affinis DMP example
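The second example describes transcribing notes on columns, units and missing value identifiers into a .txt document stored with the data file. A minimal sketch of that step might look like the following; the file name, column names, descriptions and the `build_readme` helper are all hypothetical and are not taken from the DataOne plan.

```python
# Hypothetical column notes for a data file like the one described above.
COLUMNS = [
    {"name": "sample_date", "description": "Date the population was subsampled", "units": "YYYY-MM-DD"},
    {"name": "life_stage", "description": "Life stage identified under the microscope", "units": "category"},
    {"name": "sex", "description": "Sex of the individual", "units": "category"},
]
MISSING_VALUE = "NA"  # assumed missing value identifier


def build_readme(data_file: str, columns: list[dict[str, str]], missing_value: str) -> str:
    """Build the text of a plain-text description to store next to a data file."""
    lines = [
        f"Data file: {data_file}",
        f"Missing value identifier: {missing_value}",
        "",
        "Columns:",
    ]
    for col in columns:
        lines.append(f"- {col['name']}: {col['description']} (units: {col['units']})")
    return "\n".join(lines) + "\n"


readme = build_readme("e_affinis_counts.csv", COLUMNS, MISSING_VALUE)
print(readme)
```

The resulting text could be saved alongside the data file, for example with `pathlib.Path.write_text`, and later converted into a formal standard such as EML.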
33. Where to find relevant standards?
Metadata Standards Directory
Broad, disciplinary listing of standards
and tools. Maintained by RDA group
https://rd-alliance.github.io/metadata-directory
FAIRsharing
A portal of data standards,
databases, and policies
Focused on life, environmental and
biomedical sciences, but expanding
to other disciplines
https://fairsharing.org
34. 3. Ethical and IPR implications
• Are you seeking consent from participants?
• Are you re-using other people’s data?
• Who owns your data or has rights in it?
• Are restrictions on sharing needed?
35. Example restrictions
Because the STDs being studied are reportable diseases, we will be collecting identifying
information. Even though the final dataset will be stripped of identifiers prior to release
for sharing, we believe that there remains the possibility of deductive disclosure of
subjects with unusual characteristics. Thus, we will make the data and associated
documentation available to users only under a data-sharing agreement.
From NIH data sharing statements
1. Share data privately within 1 year.
Data will be held in Private Repository, but metadata will be public
2. Release data to public within 2 years.
Encouraged after one year to release data for public access.
3. Request, in writing, data privacy up to 4 years.
Extensions beyond 3 years will only be granted for compelling cases.
4. Consult with creators of private CZO datasets prior to use.
PIs are required to seek consent before using private data they can access
From Boulder Creek Critical Zone Observatory DMP
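Timed commitments like the first two rules above are easier to honour when the dates are written down. The sketch below computes them under one loudly stated assumption: that the clock starts on the date the data were collected, which the excerpt does not specify. The function names are hypothetical.

```python
from datetime import date


def add_years(start: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def sharing_deadlines(collected: date) -> dict[str, date]:
    """Deadlines for rules 1 and 2, ASSUMING the clock starts at collection."""
    return {
        "share privately by": add_years(collected, 1),
        "release publicly by": add_years(collected, 2),
    }


for label, deadline in sharing_deadlines(date(2021, 7, 2)).items():
    print(f"{label}: {deadline.isoformat()}")
```

If the policy counts from a different event, such as the end of the project, only the starting date passed in needs to change.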
36. Seek consent for data sharing & preservation
• If you don’t ask, data centres won’t be able to accept
your data – regardless of any conditions on the original
grant or your desire for the data to be shared.
37. 4. Data sharing and reuse
• Are you allowed to share your data?
• Who will you share with and how?
• When and where will you make the data available?
• Do you need to impose conditions on reuse?
• How will you license the data for clarity?
38. Data sharing examples
We will make the data and associated documentation available to users under a data-sharing
agreement that provides for: (1) a commitment to using the data only for research purposes and not
to identify any individual participant; (2) a commitment to securing the data using appropriate
computer technology; and (3) a commitment to destroying or returning the data after analyses are
completed.
From NIH data sharing statements
The videos will be made available via the bristol.ac.uk website (both as streaming media and downloads). HD and
SD versions will be provided to accommodate those with lower bandwidth. Videos will also be made available via
Vimeo, a platform that is already well used by research students at Bristol. Appropriate metadata will also be
provided to the existing Vimeo standard.
All video will also be available for download and re-editing by third parties. To facilitate this Creative Commons
licenses will be assigned to each item. In order to ensure this usage is possible, the required permissions will be
gathered from participants (using a suitable release form) before recording commences.
From University of Bristol Kitchen Cosmology DMP
40. 5. Preservation
• Which data do you need to keep?
• Will you deposit your data in a repository?
• Do you need to prepare it for deposit?
41. Archiving examples
Data will be provided in file formats considered appropriate for long-term access, as
recommended by the UK Data Service. For example, SPSS Portable format and tab-delimited
text for qualitative tabular data and RTF and PDF/A for interview
transcripts. Appropriate documentation necessary to understand the data will
also be provided. Anonymised data will be held for a minimum of 10 years
following project completion, in compliance with LSHTM’s Records Retention and
Disposal Schedule. Biological samples (output 3) will be deposited with the UK
BioBank for future use.
From Writing a Wellcome Trust Data Management and Sharing Plan
The investigators will work with staff at the UKDA to determine what to archive and
how long the deposited data should be retained. Future long-term use of the data
will be ensured by placing a copy of the data into the repository.
From ICPSR Framework for Creating a DMP
42. Lists of repositories to choose from
http://databib.org
http://service.re3data.org/search
Zenodo
• OpenAIRE-CERN joint effort
• Multidisciplinary repository
• Multiple data types
– Publications
– Long tail of research data
• Citable data (DOI)
• Links to funder, publications, data
& software
www.zenodo.org
44. Example DMPs
• Public plans on DMPonline
https://dmponline.dcc.ac.uk/public_plans
• Plans from several funders and disciplines via DCC
www.dcc.ac.uk/resources/data-management-plans/guidance-examples
• 108 DMPs from the National Endowment for the Humanities
https://www.neh.gov/sites/default/files/inline-files/dmp_from_successful_grants.zip
• LIBER DMP catalogue in Zenodo
https://libereurope.eu/working-group/research-data-management/plans
• DMPs published in RIO journal
http://riojournal.com/browse_user_collection_documents.php?collection_id=3&journal_id=17
45. Key messages
• Data management is part of good practice whether you
plan to make the data open or not
– it benefits you!
• Seek advice when developing your DMP - consider good
practice for your field
• Base plans on available skills & support so
implementation is feasible
• Justify decisions – particularly restrictions or costs