An Introduction to Research Data Management: slides from a presentation given online on May 12 2022, by Beth Knazook, Project Manager, Research Data. Covers topics such as: what are research data; why share research data; why DMPs are important; and where should you share your data?
1. introduction to research data management
May 12, 2022
Beth Knazook, Project Manager, Research Data
b.knazook@ria.ie & @bethknazook
2. agenda
• what are research data?
• what are the benefits to producing and sharing open data?
• what is a Data Management Plan (DMP) and what can it do for you?
• where to share your data?
3. “Data may exist only in the eye of the beholder: The
recognition that an observation, artifact, or record constitutes
data is itself a scholarly act.”
(Borgman, Christine L. 2012. “The Conundrum of Sharing Research Data”. Journal of the American Society for
Information Science and Technology, 63(6): 1061. DOI: 10.1002/asi.22634)
what are research data?
4. what are research data?
● research data are the primary sources of information used to
support scientific enquiry, research and scholarship
● research data can include many types of data:
○ Observational
○ Computational
○ Experimental
○ Derived
● research data are a valuable asset
5. research data can take many forms:
textual data
audio files
numerical tables
survey or questionnaire responses
interviews (recordings, transcripts)
images (born-digital or digitized, moving or still)
geospatial information
content or thematic analyses
artists’ notes
genomic information
… etc!
6. why share research data?
➤ because it is good practice.
➤ because it is practical.
➤ because it is and/or will be required.
“…the mere publication and display of content on the web is not enough to make
a project a part of Open Science. As a new way of doing science by allowing the
users to process the underlying data of a publication with tools, instead of just
perusing it, Open Science requires Open Data and Open Process on top of Open
Access.”
“Winter School - Open Data Citation for Social Science and Humanities” DARIAH-CAMPUS.
https://campus.dariah.eu/resource/events/open-data-citation-for-social-science-and-humanities
8. Say “no” to single-use data!
why share research data?
9. sharing data can make research more ethical
● promotes data integrity & reproducibility
● sharing human participant data encourages better practices in informed
consent & participant autonomy. Good sharing practices ensure
participants decide about data sharing and understand their
contribution to science.
● gives access to historic controls, multi-site studies, reproducibility studies,
and comparison of change studies
why share research data?
10. Findable, Accessible, Interoperable, Reusable
F (Findable): Is the data easy for both humans and computers to find?
● persistent identifier
● rich metadata
● indexed in a searchable resource
A (Accessible): Once the user finds the required data, can they access it?
● access mediated without specialised or proprietary tools or communication methods
● provide the exact conditions under which the data are accessible
I (Interoperable): Can the data be integrated with other data?
● formal, accessible, shared language for knowledge representation
● references to other (meta)data
R (Reusable): Is the data reproducible?
● adequate context
● accurate and relevant attributes
● clear and accessible data usage licence
● detailed provenance
Adapted from Bishop, B. W., & Hank, C. (2018). Measuring FAIR principles to inform fitness for use.
11. how do we share data?
the research data management lifecycle: Plan → Create → Process → Analyse → Disseminate → Preserve → Reuse
➤ a DMP is a formal document which clearly articulates the strategies and tools you will implement to effectively manage your data
➤ it is also a “living” document that can be modified throughout your project to reflect any changes that have occurred
12. why are DMPs important?
a DMP is important to the research process because it can help
you to:
➤ set out consistent strategies prior to starting research for
how data will be managed throughout its entire lifecycle
➤ identify strengths & weaknesses in current practices and
make decisions on how to adopt effective data
management practices
➤ prepare data for future reuse, preservation and sharing
➤ plan for and reduce overall cost of research by improving
project efficiencies and data management practices.
13. a DMP provides information across key research lifecycle categories:
DMP contents
● Data Collection
● Documentation & Metadata
● Storage & Backup
● Preservation
● Ethics & Legal Compliance
● Responsibilities & Resources
● Sharing & Reuse
14. data collection
● Include descriptions of how you will collect data, including from where and in what format(s).
● Describe any software and/or platforms that will be used for data collection.
● Provide an estimate of the amount of data you will collect (e.g., MBs/GBs/TBs).
● Explain how you will organize your data, including details relating both to file naming and versioning.
● Clearly explain how you will both store and transfer data.
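A file naming and versioning convention is easiest to follow when every name is built the same way. The following is a minimal sketch in Python, assuming a hypothetical pattern of project, description, ISO date, and zero-padded version number. The pattern and the `build_filename` helper are illustrative only and are not prescribed by any standard.

```python
import re
from datetime import date


def build_filename(project: str, description: str, day: date, version: int, ext: str) -> str:
    """Build a name such as 'survey2022_interview-notes_2022-05-12_v03.csv'.

    The pattern is a hypothetical convention chosen for illustration.
    """
    def slug(text: str) -> str:
        # Lowercase, then replace each run of non-alphanumerics with one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (
        f"{slug(project)}_{slug(description)}_{day.isoformat()}"
        f"_v{version:02d}.{ext.lstrip('.')}"
    )


print(build_filename("Survey2022", "Interview notes", date(2022, 5, 12), 3, ".csv"))
# survey2022_interview-notes_2022-05-12_v03.csv
```

Putting the date in ISO format (YYYY-MM-DD) and padding the version number means that files sort in chronological and version order in an ordinary directory listing.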
15. documentation & metadata
● Choose a metadata standard suited to your discipline and/or chosen data repository, or provide rationale for creating your own.
● Describe how you will consistently capture documentation throughout the project.
● Describe what information will be needed for others to understand or reuse your data.
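One way to capture documentation consistently is to keep a small machine-readable metadata record alongside each dataset. The sketch below uses a few element names borrowed from Dublin Core (title, creator, date, description, rights). The record values and the `REQUIRED_FIELDS` minimum are hypothetical placeholders; the schema of your chosen repository determines which fields are actually mandatory.

```python
import json

# Hypothetical minimum set of fields, for illustration only.
REQUIRED_FIELDS = ("title", "creator", "date", "description", "rights")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "title": "Example interview transcripts",  # placeholder values throughout
    "creator": "Example Researcher",
    "date": "2022-05-12",
    "description": "Transcripts of semi-structured interviews.",
    "rights": "",  # licence not yet chosen
}

print(missing_fields(record))  # ['rights']
print(json.dumps(record, indent=2))
```

Running a check like this before deposit flags gaps, such as an unchosen licence, while they are still easy to fix.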
16. storage & backup
● Describe how collaborators or the research team will access, modify, contribute to, and work with your data.
● State a data backup schedule; automated backups are ideal.
● Provide an estimate of the storage space needed during the active phases of your research. Remember to take into account file versioning, backups, and data growth!
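A storage estimate that accounts for versioning, backups, and growth can be worked out with simple arithmetic. The sketch below is a rough illustration, not a sizing standard. The `estimate_storage_gb` function and its parameters are hypothetical, and it assumes that every retained version and every backup is a full copy.

```python
def estimate_storage_gb(raw_gb: float, versions_kept: int, backup_copies: int,
                        annual_growth: float, years: int) -> float:
    """Rough storage need: grow the raw data, then multiply by versions and copies."""
    grown = raw_gb * (1 + annual_growth) ** years  # data growth over the project
    with_versions = grown * versions_kept          # each retained version is a full copy
    return with_versions * (1 + backup_copies)     # working copy plus each backup copy


# 50 GB of raw data, 3 versions kept, 2 backup copies, 20% growth per year for 3 years.
estimate = estimate_storage_gb(50, versions_kept=3, backup_copies=2,
                               annual_growth=0.2, years=3)
print(round(estimate, 1))  # 777.6
```

In this example 50 GB of raw data becomes a requirement of roughly 778 GB, which is why raw data volume alone tends to understate the storage a project needs.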
17. preservation
● Consider optimal file formats for supporting long-term preservation: optimally preserved data are easily accessible and usable by anyone, without requiring proprietary software.
● Not all the data that you create necessarily need to be preserved: consider such things as the value of your data, funding requirements, etc., and decide which, if any, should be preserved.
● Consult with experts as needed!
18. sharing & reuse
● Consider the appropriate sharing of your data, including any funding or confidentiality requirements.
● Consult with colleagues or librarians to choose an appropriate data repository, or search re3data.org.
● Explain what uses can be made of your data through licenses like Creative Commons.
● If applicable, describe how you will ensure file integrity, anonymization and de-identification.
● Choose a repository that assigns permanent identifiers to datasets (e.g., DOI) to enhance discoverability, accessibility, and citability.
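File integrity is commonly checked by recording a checksum for each file and recomputing it later. The following is a minimal sketch using SHA-256 from Python's standard library. The helper names (`sha256_of`, `build_manifest`, `changed_files`) and the temporary demo folder are assumptions made for illustration, and a repository may use its own fixity tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose checksum no longer matches."""
    current = build_manifest(folder)
    return [name for name, checksum in manifest.items() if current.get(name) != checksum]


# Demo in a throwaway folder: record checksums, alter a file, detect the change.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "data.csv").write_text("id,value\n1,10\n")
    manifest = build_manifest(folder)
    (folder / "data.csv").write_text("id,value\n1,11\n")  # simulate an unintended change
    print(changed_files(folder, manifest))  # ['data.csv']
```

Storing the manifest with the deposited dataset lets anyone who downloads the files confirm that they match what was originally shared.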
19. responsibilities & resources
● Estimate and describe any required resources and costs for data management and long-term access to your data.
● Identify data stewardship roles and responsibilities of project members and other organizations during and after the project.
20. ethics & legal compliance
● Describe how you will ensure data are securely managed after the project is completed, in accordance with any ethical obligations, including management of sensitive data.
● Explain how you will comply with any applicable privacy legislation, funding and institutional requirements.
● Describe any legal, ethical, and intellectual property issues that arise when managing and sharing your data.
21. general guidelines for DMPs
● Begin by providing a description of your research project, its focus, and purpose.
● Avoid discipline-specific jargon: your DMP should be easily understood by anyone!
● Provide clarification for any acronyms used.
● Provide rationale for decisions: help others understand why you have made a decision!
● Your DMP is a living document: update it as needed!
● Avoid leaving sections or questions blank.
22. what are some helpful DMP tools?
● there is a DMP creation tool available through the Digital Curation Centre in the UK: https://dmponline.dcc.ac.uk
● you can also browse publicly available discipline- and methodology-specific DMPs to inform your own DMP: https://dmponline.dcc.ac.uk/public_plans
23. where should you share your data?
repository selection considerations:
● domain-specific vs. generalist
● institutional vs. national or international
● how much data do you have?
● preservation capabilities
● persistent identifiers
● clear terms, conditions, and licensing