Managing and Sharing
Research Data
Sarah Jones
Digital Curation Centre, Glasgow
sarah.jones@glasgow.ac.uk
Twitter: @sjDCC
Webinar for Diponegoro University, Indonesia, 3rd September 2019
What is the Digital Curation Centre?
“a centre of expertise in digital information curation
with a focus on building capacity, capability and skills
for research data management across the UK's
higher education research community”
www.dcc.ac.uk
Training | Events | Tools | Advocacy | Consultancy | Guidance | Publications | Projects
What is RDM?
Image CC-BY-SA by Janneke Staaks www.flickr.com/photos/jannekestaaks/14411397343
What is Research Data Management?
Create
Document
Use
Store
Share
Preserve
“the active management and
appraisal of data over the
lifecycle of scholarly and
scientific interest”
Data management is part of
good research practice
What is involved in RDM?
• Data Management Planning
• Data creation
• Annotating / documenting data
• Analysis, use, versioning
• Storage and backup
• Publishing papers and data
• Preparing for deposit
• Archiving and sharing
• Licensing
• Citing…
Why manage and share data?
Direct benefits for you
• To make your research
easier!
• Stop yourself drowning
in irrelevant stuff
• Make sure you can
understand and reuse
your data again later
• Advance your career –
data is growing in
significance
Research integrity
• To avoid accusations of
fraud or bad science
• Evidence findings and
enable validation of
research methods
• Meet codes of practice
on research conduct
• Many research funders
worldwide now require
Data Management and
Sharing Plans
Potential to share data
• So others can reuse and
build on your data
• To gain credit – several
studies have shown
higher citation rates
when data are shared
• For greater visibility,
impact and new
research collaborations
• Promote innovation and
allow research in your
field to advance faster
Why make data available?
Sharing leads to breakthroughs
www.nytimes.com/2010/08/13/health/research/13alzheimer.html?pagewanted=all&_r=0
“It was unbelievable. It's not science
the way most of us have practiced in
our careers. But we all realised that
we would never get biomarkers
unless all of us parked our egos and
intellectual property noses outside
the door and agreed that all of our
data would be public immediately.”
Dr John Trojanowski, University of Pennsylvania
...and increases the speed of discovery
Benefits for researchers:
sharing data increases citations!
Want evidence?
• Piwowar, Vision – 9% (microarray data)
• Drachen, Dorch, et al – 25-40%, astronomy
• Gleditsch, et al – doubling to trebling (international relations)
Open Data Citation Advantage
http://sparceurope.org/open-data-citation-advantage
Benefits for institutions
“If an institution spent A$10 million on data, what
would be the return? The answer is: more publications;
an increased citation count; more grants; greater
profile; and more collaboration.”
Dr Ross Wilkinson, ANDS
www.ariadne.ac.uk/issue72/oar-2013-rpt
Creating data
Image CC-SA-ND by Bill Dickinson www.flickr.com/photos/skynoir/8270436894
Data creation tips
• Ensure consent forms, licences and agreements
don’t restrict opportunities to share data
• Choose appropriate formats
• Adopt a file naming convention
• Create metadata and documentation as you go
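A file naming convention is easier to keep when names are generated rather than typed by hand. The sketch below assumes a hypothetical pattern of project, description, ISO date and version number; the slides do not prescribe a pattern, so the field order and the `build_filename` helper are illustrative only.

```python
import re
from datetime import date


def build_filename(project: str, description: str, day: date, version: int, ext: str) -> str:
    """Build a name like 'rdm_interview-notes_2019-09-03_v02.txt'.

    The pattern (project_description_date_version) is an illustrative
    assumption, not a convention taken from the slides.
    """

    def slug(text: str) -> str:
        # Lower-case the text and collapse every run of other characters into one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    extension = ext.lstrip(".").lower()
    return f"{slug(project)}_{slug(description)}_{day.isoformat()}_v{version:02d}.{extension}"


example_name = build_filename("RDM", "Interview notes", date(2019, 9, 3), 2, ".TXT")
print(example_name)
```

ISO dates (YYYY-MM-DD) and zero-padded version numbers keep files in chronological order when sorted by name.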
Ask for consent for data sharing
If not, data centres won’t be able to accept the data
– regardless of any conditions on the original grant.
www.data-archive.ac.uk/create-manage/consent-ethics/consent?index=3
Choose appropriate file formats
Different formats are good for different things
- open, lossless formats are more sustainable e.g. rtf, xml, tif, wav
- proprietary and/or compressed formats are less preservable but are often
in widespread use e.g. doc, jpg, mp3
One format for analysis then
convert to a standard format
Data centres may suggest preferred formats for deposit
www.data-archive.ac.uk/create-manage/format/formats-table
BioformatsConverter batch converts a
variety of proprietary microscopy image
formats to the Open Microscopy
Environment format - OME-TIFF
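The open versus proprietary distinction above can be turned into a quick check before deposit. This is a minimal sketch: the extension sets contain only the examples named on the slide, and the labels and the `classify_format` helper are assumptions for illustration. A data centre's own preferred-formats list should take precedence.

```python
from pathlib import Path

# Only the example extensions named on the slide; not a complete registry.
OPEN_LOSSLESS = {"rtf", "xml", "tif", "wav"}
PROPRIETARY_OR_COMPRESSED = {"doc", "jpg", "mp3"}


def classify_format(filename: str) -> str:
    """Return a rough preservation label based on the file extension."""
    ext = Path(filename).suffix.lstrip(".").lower()
    if ext in OPEN_LOSSLESS:
        return "open/lossless"
    if ext in PROPRIETARY_OR_COMPRESSED:
        return "consider converting"
    return "unknown"


for name in ["scan.TIF", "interview.mp3", "results.csv"]:
    print(name, "->", classify_format(name))
```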
What is metadata?
Metadata
• Standardised
• Structured
• Machine and human
readable
Metadata helps to cite &
disambiguate data
Documentation aids reuse
Metadata
Documentation
Metadata standards
These can be general – such as Dublin Core
Or discipline specific
– Data Documentation Initiative (DDI) – social science
– Ecological Metadata Language (EML) - ecology
– Flexible Image Transport System (FITS) – astronomy
Search for standards in catalogues like:
http://rd-alliance.github.io/metadata-directory
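To show what a general-purpose standard looks like in practice, here is a small sketch that writes a few Dublin Core elements as XML using Python's standard library. The namespace URI is the Dublin Core element set namespace; the `metadata` wrapper element, the `dublin_core_record` helper and all field values are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialise simple Dublin Core elements (title, creator, date, ...) as XML."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


record = dublin_core_record({
    "title": "Example survey dataset",  # placeholder values throughout
    "creator": "Doe, Jane",
    "date": "2019-09-03",
    "rights": "CC-BY 4.0",
})
print(record)
```

Because the record is structured and namespaced, both people and harvesting software can read the same fields.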
“MTBLS1: A metabolomic study of urinary changes
in type 2 diabetes in……”
Why are ontologies important?
Example courtesy of Ken Haug, European Bioinformatics Institute (EMBL-EBI)
Documentation
Think about what is needed in order to evaluate,
understand, and reuse the data.
• Why was the data created?
• Have you documented what you did and how?
• Did you develop code to run analyses? If so, this
should be kept and shared too.
• Important to provide wider context for trust
ReadMe files
We recommend that a ReadMe be a plain text file containing the following:
• for each filename, a short description of what data it includes,
optionally describing the relationship to the tables, figures, or sections
within the accompanying publication
• for tabular data: definitions of column headings and row labels; data
codes (including missing data); and measurement units
• any data processing steps, especially if not described in the publication,
that may affect interpretation of results
• a description of what associated datasets are stored elsewhere, if
applicable
• whom to contact with questions
http://datadryad.org/pages/readme
Example template: https://www.lib.umn.edu/datamanagement/metadata
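The ReadMe contents listed above can be assembled from a few dictionaries, which helps keep the file in step with the data. This is a sketch only: the section headings, the `render_readme` helper and every example value (file name, column, contact address) are hypothetical.

```python
def render_readme(files, columns, missing_code, processing_steps, contact, related=None):
    """Assemble a plain-text ReadMe covering the items recommended above."""
    lines = ["FILES"]
    lines += [f"  {name}: {description}" for name, description in files.items()]
    lines += ["", "COLUMNS (name: definition and unit)"]
    lines += [f"  {name}: {definition}" for name, definition in columns.items()]
    lines += ["", f"MISSING DATA CODE: {missing_code}", "", "PROCESSING STEPS"]
    lines += [f"  {i}. {step}" for i, step in enumerate(processing_steps, start=1)]
    if related:
        # Associated datasets stored elsewhere, if applicable.
        lines += ["", "RELATED DATASETS"] + [f"  {item}" for item in related]
    lines += ["", f"CONTACT: {contact}"]
    return "\n".join(lines) + "\n"


readme_text = render_readme(
    files={"site_a.csv": "hourly readings from site A (Table 2 in the paper)"},
    columns={"temp_c": "air temperature (degrees Celsius)"},
    missing_code="-999",
    processing_steps=["Removed readings flagged as sensor faults"],
    contact="data.owner@example.org",
)
print(readme_text)
```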
Managing data
Image ‘tools’ CC-BY by zzpza www.flickr.com/photos/zzpza/3269784239
Where will you store the data?
• Your own device (laptop, flash drive, server etc.)
– And if you lose it? Or it breaks?
• Departmental drives or university servers
• “Cloud” storage
– Do they care as much about your data as you do?
The decision will be based on how sensitive your data
are, how robust you need the storage to be, and
who needs access to the data and when
How to keep your data secure?
Develop a practical solution that fits your circumstances
– Store your data on managed servers
– Restrict access to collaborators or a smaller subset
– Encrypt mobile devices carrying sensitive information
– Keep anti-virus software up-to-date
– Use secure data services for long-term sharing
www.wsj.com/articles/SB10001424052748703843804575534122591921594
Collaborative platforms e.g. OSF
https://osf.io
Data-specific platforms e.g. OMERO
Open Microscopy Environment
http://www.openmicroscopy.org/site/products/omero
Third-party tools for collaboration
ownCloud
• Open source product
with Dropbox-like
functionality
• Used by many unis
and service providers
to offer ‘approved’
solution
https://owncloud.org
Using Dropbox and other cloud
services – LSE guidelines
http://www.lse.ac.uk/intranet/LSEServices/IMT/guides/softwareGuides/other/usingDropboxCloudStorageServices.aspx
University RDM services e.g. Edinburgh
• DataStore
• Compute & Data Facility (HPC)
• DataSync
• Wiki service
• Subversion
• Electronic Lab Notebook
• DataShare repository
• DataVault
• Pure (research info)
• Secure data service
www.ed.ac.uk/information-services/research-support/research-data-service
CC image by momboleum on Flickr
One copy = risk of data loss
Who will do the backup?
Use managed services where possible (e.g. University
filestores rather than local or external hard drives), so
backup is done automatically
3… 2… 1… backup!
at least 3 copies of a file
on at least 2 different media
with at least 1 offsite
Ask central IT team for advice
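The 3-2-1 rule is simple enough to check mechanically against an inventory of copies. The sketch below assumes a hypothetical `Copy` record with a location, a storage medium and an offsite flag; the example inventory is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str  # e.g. "university filestore"
    medium: str    # e.g. "server", "external disk", "cloud"
    offsite: bool


def meets_3_2_1(copies: list[Copy]) -> bool:
    """True if there are at least 3 copies, on at least 2 media, with at least 1 offsite."""
    media = {c.medium for c in copies}
    return len(copies) >= 3 and len(media) >= 2 and any(c.offsite for c in copies)


inventory = [
    Copy("laptop", "internal disk", offsite=False),
    Copy("university filestore", "server", offsite=False),
    Copy("partner institution", "server", offsite=True),
]
print(meets_3_2_1(inventory))
```

Dropping the offsite copy, or keeping all three copies on the same kind of medium, makes the check fail.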
Data sharing
Image CC-BY-NC-ND by talkingplant www.flickr.com/photos/talkingplant/2256485110
How to make data open?
1. Choose your dataset(s)
– What can you make open? You may need to revisit this step if you encounter
problems later.
2. Apply an open licence
– Determine what IP exists. Apply a suitable licence e.g. CC-BY
3. Make the data available
– Provide the data in a suitable format. Use repositories.
4. Make it discoverable
– Post on the web, register in catalogues…
https://okfn.org
DCC how-to guide: www.dcc.ac.uk/resources/how-guides/license-research-data
License research data openly
EUDAT licensing tool
Answer questions to determine which licence(s) are
appropriate to use
http://ufal.github.io/lindat-license-selector
Deposit in a data repository
http://databib.org
www.re3data.org
The Re3data catalogue can be searched to find a home for data
www.fosteropenscience.eu/content/re3data-demo
National / domain repositories
BioSharing portal of
databases in life sciences
www.re3data.org
https://biosharing.org
Zenodo is a multi-disciplinary repository that can be
used for the long-tail of research data
• An OpenAIRE-CERN joint effort
• Multidisciplinary repository accepting
– Multiple data types
– Publications
– Software
• Assigns a Digital Object Identifier (DOI)
• Links funding, publications, data & software
www.zenodo.org
Zenodo
Archiving code in Zenodo
Get a DOI for each release
https://guides.github.com/activities/citable-code
Citing research data: why?
http://ands.org.au/cite-data
How to cite data
www.dcc.ac.uk/resources/briefing-papers/introduction-curation/data-citation-and-linking
Key citation elements
• Author
• Publication date
• Title
• Location (= identifier)
• Funder (if applicable)
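The key elements above can be combined into a citation string. This sketch uses one plausible ordering (author, year, title, identifier, optional funder); repositories and journals have their own styles, and the `format_data_citation` helper, the example dataset and the DOI are placeholders rather than real records.

```python
from typing import Optional


def format_data_citation(author: str, year: int, title: str, identifier: str,
                         funder: Optional[str] = None) -> str:
    """Join the key citation elements; the ordering here is an illustrative choice."""
    citation = f"{author} ({year}). {title}. {identifier}"
    if funder:
        citation += f" [Funder: {funder}]"
    return citation


citation = format_data_citation(
    author="Doe, J.",
    year=2019,
    title="Example survey dataset",
    identifier="https://doi.org/10.1234/example",  # placeholder DOI, not a real record
)
print(citation)
```

No full stop is added after the identifier, so the link stays intact when copied.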
How do you share data effectively?
• Use appropriate repositories; this
catalogue is a good place to start
http://www.re3data.org
• Document and describe it enough for
others to understand, use and cite
http://www.dcc.ac.uk/resources/how-guides/cite-datasets
• Licence it so others can reuse
www.dcc.ac.uk/resources/how-guides/license-research-data
Thanks for listening
For DCC resources see:
www.dcc.ac.uk/resources
Follow us on Twitter:
@digitalcuration and #ukdcc

Intro to RDM

  • 1. Manging and Sharing Research Data Sarah Jones Digital Curation Centre, Glasgow sarah.jones@glasgow.ac.uk Twitter: @sjDCC Webinar for Diponegoro University, Indonesia, 3rd September 2019
  • 2. What is the Digital Curation Centre? “a centre of expertise in digital information curation with a focus on building capacity, capability and skills for research data management across the UK's higher education research community” www.dcc.ac.uk Training | Events | Tools | Advocacy | Consultancy | Guidance | Publications | Projects
  • 3. What is RDM? Image CC-BY-SA by Janneke Staaks www.flickr.com/photos/jannekestaaks/14411397343
  • 4. What is Research Data Management? Create Document Use Store Share Preserve “the active management and appraisal of data over the lifecycle of scholarly and scientific interest” Data management is part of good research practice
  • 5. What is involved in RDM? • Data Management Planning • Data creation • Annotating / documenting data • Analysis, use, versioning • Storage and backup • Publishing papers and data • Preparing for deposit • Archiving and sharing • Licensing • Citing… Create Document Use Store Share Preserve
  • 6. Why manage and share data? Direct benefits for you • To make your research easier! • Stop yourself drowning in irrelevant stuff • Make sure you can understand and reuse your data again later • Advance your career – data is growing in significance Research integrity • To avoid accusations of fraud or bad science • Evidence findings and enable validation of research methods • Meet codes of practice on research conduct • Many research funders worldwide now require Data Management and Sharing Plans Potential to share data • So others can reuse and build on your data • To gain credit – several studies have shown higher citation rates when data are shared • For greater visibility, impact and new research collaborations • Promote innovation and allow research in your field to advance faster
  • 7. Why make data available?
  • 8. Sharing leads to breakthroughs www.nytimes.com/2010/08/13/health/research /13alzheimer.html?pagewanted=all&_r=0 “It was unbelievable. Its not science the way most of us have practiced in our careers. But we all realised that we would never get biomarkers unless all of us parked our egos and intellectual property noses outside the door and agreed that all of our data would be public immediately.” Dr John Trojanowski, University of Pennsylvania ...and increases the speed of discovery
  • 9. Benefits for researchers: sharing data increases citations! Want evidence? • Piwowar, Vision – 9% (microarray data) • Drachen, Dorch, et al – 25-40%, astronomy • Gleditch, et al – doubling to trebling (international relations) Open Data Citation Advantage http://sparceurope.org/open-data-citation-advantage
  • 10. Benefits for institutions “If an institution spent A$10 million on data, what would be the return? The answer is: more publications; an increased citation count; more grants; greater profile; and more collaboration.” Dr Ross Wilkinson, ANDS www.ariadne.ac.uk/issue72/oar-2013-rpt
  • 11. Creating data Image CC-SA-ND by Bill Dickinson www.flickr.com/photos/skynoir/8270436894
  • 12. Data creation tips • Ensure consent forms, licences and agreements don’t restrict opportunities to share data • Choose appropriate formats • Adopt a file naming convention • Create metadata and documentation as you go
  • 13. Ask for consent for data sharing If not, data centres won’t be able to accept the data – regardless of any conditions on the original grant. www.data-archive.ac.uk/create-manage/consent-ethics/consent?index=3
  • 14. Choose appropriate file formats Different formats are good for different things - open, lossless formats are more sustainable e.g. rtf, xml, tif, wav - proprietary and/or compressed formats are less preservable but are often in widespread use e.g. doc, jpg, mp3 One format for analysis then convert to a standard format Data centres may suggest preferred formats for deposit www.data-archive.ac.uk/create-manage/format/formats-table BioformatsConverter batch converts a variety of proprietary microscopy image formats to the Open Microscopy Environment format - OME-TIFF
  • 15. What is metadata? Metadata • Standardised • Structured • Machine and human readable Metadata helps to cite & disambiguate data Documentation aids reuse Metadata Documentation
  • 16. Metadata standards These can be general – such as Dublin Core Or discipline specific – Data Documentation Initiative (DDI) – social science – Ecological Metadata Language (EML) - ecology – Flexible Image Transport System (FITS) – astronomy Search for standards in catalogues like: http://rd-alliance.github.io/metadata-directory
  • 17. “MTBLS1: A metabolomic study of urinary changes in type 2 diabetes in……” Why are ontologies important? Example courtesy of Ken Haug, European Bioinformatics Institute (EMBL-EBI)
  • 18. Documentation Think about what is needed in order to evaluate, understand, and reuse the data. • Why was the data created? • Have you documented what you did and how? • Did you develop code to run analyses? If so, this should be kept and shared too. • Important to provide wider context for trust
  • 19. ReadMe files We recommend that a ReadMe be a plain text file containing the following: • for each filename, a short description of what data it includes, optionally describing the relationship to the tables, figures, or sections within the accompanying publication • for tabular data: definitions of column headings and row labels; data codes (including missing data); and measurement units • any data processing steps, especially if not described in the publication, that may affect interpretation of results • a description of what associated datasets are stored elsewhere, if applicable • whom to contact with questions http://datadryad.org/pages/readme Example template: https://www.lib.umn.edu/datamanagement/metadata
  • 20. Managing data Image ‘tools’ CC-BY by zzpza www.flickr.com/photos/zzpza/3269784239
  • 21. Where will you store the data? • Your own device (laptop, flash drive, server etc.) – And if you lose it? Or it breaks? • Departmental drives or university servers • “Cloud” storage – Do they care as much about your data as you do? The decision will be based on how sensitive your data are, how robust you need the storage to be, and who needs access to the data and when
  • 22. How to keep your data secure? Develop a practical solution that fits your circumstances – Store your data on managed servers – Restrict access to collaborators or a smaller subset – Encrypt mobile devices carrying sensitive information – Keep anti-virus software up-to-date – Use secure data services for long-term sharing www.wsj.com/articles/SB10001424052748703843804575534122591921594
  • 23. Collaborative platforms e.g. OSF https://osf.io
  • 24. Data-specific platforms e.g. OMERO Open Microscopy Environment http://www.openmicroscopy.org/site/products/omero
  • 25. Third-party tools for collaboration ownCloud • Open source product with Dropbox-like functionality • Used by many universities and service providers to offer an ‘approved’ solution https://owncloud.org Using Dropbox and other cloud services – LSE guidelines http://www.lse.ac.uk/intranet/LSEServices/IMT/guides/softwareGuides/other/usingDropboxCloudStorageServices.aspx
  • 26. University RDM services e.g. Edinburgh • DataStore • Compute & Data Facility (HPC) • DataSync • Wiki service • Subversion • Electronic Lab Notebook • DataShare repository • DataVault • Pure (research info) • Secure data service www.ed.ac.uk/information-services/research-support/research-data-service
  • 28. Who will do the backup? Use managed services where possible (e.g. University filestores rather than local or external hard drives), so backup is done automatically 3… 2… 1… backup! at least 3 copies of a file on at least 2 different media with at least 1 offsite Ask central IT team for advice
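The 3-2-1 rule is simple enough to check automatically. The sketch below assumes you keep a small inventory of where each copy of a file lives; the `Copy` class, its field names and the example locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str   # e.g. "university filestore"
    medium: str     # e.g. "network storage", "external disk", "cloud"
    offsite: bool   # True if held away from the primary site


def meets_3_2_1(copies):
    """True if there are >= 3 copies, on >= 2 media, with >= 1 offsite."""
    media = {c.medium for c in copies}
    return (
        len(copies) >= 3
        and len(media) >= 2
        and any(c.offsite for c in copies)
    )


if __name__ == "__main__":
    inventory = [
        Copy("laptop", "internal disk", offsite=False),
        Copy("university filestore", "network storage", offsite=False),
        Copy("institutional cloud", "cloud", offsite=True),
    ]
    print(meets_3_2_1(inventory))
```

Managed university services often satisfy the rule on their own, which is why the slide suggests asking the central IT team first.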
  • 29. Data sharing Image CC-BY-NC-ND by talkingplant www.flickr.com/photos/talkingplant/2256485110
  • 30. How to make data open? 1. Choose your dataset(s) – What can you make open? You may need to revisit this step if you encounter problems later. 2. Apply an open licence – Determine what IP exists. Apply a suitable licence e.g. CC-BY 3. Make the data available – Provide the data in a suitable format. Use repositories. 4. Make it discoverable – Post on the web, register in catalogues… https://okfn.org
  • 31. DCC how-to guide: www.dcc.ac.uk/resources/how-guides/license-research-data License research data openly
  • 32. EUDAT licensing tool Answer questions to determine which licence(s) are appropriate to use http://ufal.github.io/lindat-license-selector
  • 33. Deposit in a data repository http://databib.org www.re3data.org The Re3data catalogue can be searched to find a home for data www.fosteropenscience.eu/content/re3data-demo
  • 34. National / domain repositories BioSharing portal of databases in life sciences www.re3data.org https://biosharing.org
  • 35. Zenodo is a multi-disciplinary repository that can be used for the long-tail of research data • An OpenAIRE-CERN joint effort • Multidisciplinary repository accepting – Multiple data types – Publications – Software • Assigns a Digital Object Identifier (DOI) • Links funding, publications, data & software www.zenodo.org Zenodo
  • 36. Archiving code in Zenodo Get a DOI for each release https://guides.github.com/activities/citable-code
  • 37. Citing research data: why? http://ands.org.au/cite-data
  • 38. How to cite data www.dcc.ac.uk/resources/briefing-papers/introduction-curation/data-citation-and-linking Key citation elements • Author • Publication date • Title • Location (= identifier) • Funder (if applicable)
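The key citation elements can be combined into a citation string as follows. This is a sketch only: the ordering and punctuation are assumptions rather than a style prescribed by the DCC, and the author, title and DOI in the example are hypothetical.

```python
def format_data_citation(author, publication_date, title, identifier, funder=None):
    """Join the key citation elements into a single citation string."""
    citation = f"{author} ({publication_date}). {title}. {identifier}"
    if funder:
        # Funder is optional, as noted on the slide.
        citation += f". Funded by {funder}"
    return citation + "."


if __name__ == "__main__":
    print(format_data_citation(
        author="Doe, J.",
        publication_date="2019",
        title="Example survey dataset",
        identifier="https://doi.org/10.1234/example",  # hypothetical DOI
        funder="Example Research Council",
    ))
```

Using a persistent identifier such as a DOI for the location element means the citation keeps working even if the data move.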
  • 39. How do you share data effectively? • Use appropriate repositories; this catalogue is a good place to start: http://www.re3data.org • Document and describe the data well enough for others to understand, use and cite them http://www.dcc.ac.uk/resources/how-guides/cite-datasets • Licence the data so others can reuse them www.dcc.ac.uk/resources/how-guides/license-research-data
  • 40. Thanks for listening For DCC resources see: www.dcc.ac.uk/resources Follow us on Twitter: @digitalcuration and #ukdcc

Editor's Notes

  1. Data as institutional crown jewels, special collections, the new oil...
  2. From a Cambridge University workshop with 33 participants. When asked what species the study was on, participants gave a range of answers that all denoted humans. Humans can see that these all mean the same thing, but computers can’t.