OpenAIRE webinar on Open Research Data in H2020 (OAW2016)
1. dans.knaw.nl
DANS is an institute of KNAW and NWO
Open Research Data in H2020
Marjan Grootveld
OpenAIRE webinar, 26 October 2016
2. Who we are
Open Access Infrastructure for Research in Europe
www.openaire.eu
3. DANS: Data Archiving and Networked Services
Institute of the Dutch Academy and Research Funding Organisation (KNAW & NWO) since 2005
First predecessor dates back to 1964 (Steinmetz Foundation), Historical Data Archive 1989
Mission: promote and provide permanent access to digital research information
4.
DataverseNL for short- and mid-term storage
EASY: certified long-term Electronic Archiving System for self-deposit
NARCIS: Gateway to scholarly information in the Netherlands
Research data in context
5. Contents
• Brief recap from recent OpenAIRE-EUDAT webinars
• The updated Guidelines for FAIR Data Management:
• F, A, I, R
• Costs, data security, ethical aspects, other RDM procedures
• Recommendations
• Links to EC and OpenAIRE information
6. Recent webinars
Introductory RDM webinar, Tony Ross-Hellauer & Sarah Jones, 26 May:
• Reasons to manage data
• How to manage and share data (+ how to respond to concerns about sharing)
• EUDAT & OpenAIRE services
Q&A document: https://b2drop.eudat.eu/s/0H6qRgwdwkAVFvD#pdfviewer
“How to write a DMP”, Sarah Jones & Marjan Grootveld, 7/14 July:
• What is a Data Management Plan and why write one?
• Example DMPs in different domains, with lots of links!
• Lessons and guidance (e.g. storing ≠ archiving; how to find a repository; file-naming conventions)
All recordings and slides are on https://eudat.eu/events/webinars
https://www.eudat.eu Research Data Services, Expertise & Technology
7. Recap: why manage data?
(We make data management plans not for the research funder, but for life)
Make your research easier
Stop yourself drowning in irrelevant stuff
Save data for later
Avoid accusations of fraud or bad science
Write a data paper, connect your nanopublications
Share your data for re-use & get them validated in real life
Get credit for it
NON PECUNIAE INVESTIGATIONIS CURATORE
SED VITAE FACIMUS PROGRAMMAS DATORUM PROCURATIONIS
9. Horizon 2020: Open Research Data Pilot
The use of a Data Management Plan (DMP) is
required for projects participating in the Open
Research Data Pilot, detailing what data the
project will generate, whether and how they will
be exploited or made accessible for verification
and re-use, and how they will be curated and
preserved.
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
10. Guidelines on FAIR DM v.3
Structure of the Guidelines:
1. Background: extension of the pilot
2. DMP general definition
3. Proposal, submission and evaluation
4. RDM plans during the project life cycle
5. Support
6. Annex 1: the DMP template
   1. Data summary
   2. FAIR data
   3. Allocation of resources
   4. Data security
   5. Ethical aspects
   6. Other issues
   7. Summary table “FAIR DM at a glance”
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
11. What’s new?
• You should develop a DMP for your project.
• There is a single DMP template from start to finish.
• The DMP template is inspired by the FAIR principles:
research data should be findable, accessible, interoperable
and re-usable (without suggesting any specific technology,
standard, or implementation solution).
Also explicit in the new guidelines:
• From 1 January 2017 the pilot will cover all thematic areas of
Horizon 2020.
• Costs related to open access to research data are eligible
for reimbursement during the duration of the project under
the conditions defined in the Grant Agreement.
12. Good things that remain
Whether a (proposed) project participates in the
ORD pilot or chooses to opt out does not affect
the evaluation of that project: proposals will not
be penalised for opting out.
Participating in the ORD pilot does
not necessarily mean opening up all
your research data: as open as
possible, as closed as necessary.
The DMP is a living document.
You are not required to
provide detailed answers to all
the questions in the first
version of the DMP (due in month 6).
Deposit in a research data repository:
a. the data needed to validate the results presented
in scientific publications, including the metadata;
b. any other data, including the metadata, as
specified in the DMP;
c. for both a and b, the documentation and the tools
that are needed to validate the results, e.g.
specialised software or software code, algorithms
and analysis protocols (when possible, these
instruments themselves).
13. DMPonline
A web-based tool to help researchers write DMPs
Guidance from EUDAT and OpenAIRE is being added
https://dmponline.dcc.ac.uk
Choose your funder to get their specific template
Choose any additional optional guidance
14. §2 Making data FAIR
Findable
– Assign persistent IDs, provide rich metadata, register in a
searchable resource, ...
Accessible
– Retrievable by their ID using a standard protocol, metadata remain
accessible even if data aren’t...
Interoperable
– Use formal, broadly applicable languages, use standard
vocabularies, qualified references...
Reusable
– Rich, accurate metadata, clear licences, provenance, use of
community standards...
www.force11.org/group/fairgroup/fairprinciples and http://www.nature.com/articles/sdata201618
15. EC in the Guidelines: “This template is not intended as a strict technical implementation of the FAIR principles, it is rather inspired by FAIR as a general concept.”
EC Infographic:
http://ec.europa.eu/research/images/infographics/policy/open-data-2016-w920.png
16. Some F questions
2.1 Making data findable, including provisions for metadata
• Use metadata and specify standards for metadata creation
(if any). If there are no standards in your discipline,
describe what type of metadata will be created and how.
• Search keywords
• Persistent and unique identifiers such as DOIs
• File and folder naming conventions: see the OpenAIRE-EUDAT July webinar
• Versioning of the datasets and clear version numbers
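The naming and versioning questions above can be made concrete with a small checker. The sketch below assumes a hypothetical convention, `<project>_<dataset>_<YYYYMMDD>_v<major>.<minor>.<ext>`; neither the pattern nor the helper names (`make_name`, `parse_version`, `latest`) come from the EC Guidelines or from any existing tool. It shows why explicit version numbers should be compared as numbers: sorted as plain text, `v1.10` comes before `v1.9`.

```python
from __future__ import annotations

import re
from datetime import date

# Hypothetical convention (an assumption for illustration, not an EC requirement):
#   <project>_<dataset>_<YYYYMMDD>_v<major>.<minor>.<ext>
PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)_(?P<dataset>[A-Za-z0-9-]+)_"
    r"(?P<date>\d{8})_v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<ext>[a-z0-9]+)$"
)


def make_name(project: str, dataset: str, day: date,
              major: int, minor: int, ext: str) -> str:
    """Build a file name that sorts by date and carries an explicit version."""
    name = f"{project}_{dataset}_{day:%Y%m%d}_v{major}.{minor}.{ext}"
    if PATTERN.match(name) is None:
        raise ValueError(f"name does not follow the convention: {name}")
    return name


def parse_version(name: str) -> tuple[int, int]:
    """Return (major, minor) as integers so versions compare numerically."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name}")
    return int(match["major"]), int(match["minor"])


def latest(names: list[str]) -> str:
    """Return the file with the highest version number."""
    return max(names, key=parse_version)
```

For example, `make_name("ACME", "survey-wave1", date(2016, 10, 26), 1, 0, "csv")` (hypothetical project and dataset names) gives `ACME_survey-wave1_20161026_v1.0.csv`. Whatever convention a project picks, the DMP is the place to document it.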
17. Metadata and documentation
• Metadata and documentation are needed to find and
understand research data.
• Think about what others would need in order to find,
evaluate, understand, and reuse your data.
• Get others to check the metadata to improve quality.
• Use standards to enable interoperability.
http://rd-alliance.github.io/metadata-directory
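A first, automated pass can complement the human check suggested above. The sketch below tests a metadata record for the six mandatory properties of the DataCite Metadata Schema (version 4.x): Identifier, Creator, Title, Publisher, PublicationYear and ResourceType. The flat dict layout, its snake_case keys, the DOI prefix `10.1234` and the example values are illustrative assumptions, not the DataCite XML or JSON serialisation; a real deposit follows the chosen repository's own schema.

```python
from __future__ import annotations

# Mandatory properties of the DataCite Metadata Schema 4.x, mapped to
# hypothetical snake_case keys used only in this sketch.
MANDATORY = (
    "identifier",        # e.g. a DOI
    "creator",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_fields(record: dict) -> list[str]:
    """Return the mandatory fields that are absent or empty in a record."""
    return [field for field in MANDATORY if not record.get(field)]


# Hypothetical example record.
record = {
    "identifier": "10.1234/example-dataset",  # hypothetical DOI
    "creator": ["Doe, Jane"],                 # placeholder creator
    "title": "Survey responses, wave 1",
    "publisher": "Example repository",
    "publication_year": 2016,
    "resource_type": "Dataset",
    "keywords": ["survey", "open data"],      # aids findability; not mandatory
}
```

Here `missing_fields(record)` returns an empty list, while `missing_fields({"title": "Draft"})` lists the five other mandatory properties.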
18. Some A questions
2.2 Making data openly accessible:
• Explain which data can’t be shared openly, if any
• Specify how access will be provided in case of restrictions,
e.g. through a data committee, a license, or arranged with
the repository.
• Will methods or software tools needed to access the data
(if any) be included or documented?
• Deposit the data and associated metadata, documentation
and code preferably in certified repositories which support
Open Access.
Data Seal of Approval
ICSU World Data System
nestor seal
ISO 16363
19. Where to find a repository?
More information: https://www.openaire.eu/opendatapilot-repository
Zenodo: http://www.zenodo.org
Re3data.org: http://www.re3data.org
20. File format considerations
There is no clear-cut definition of a “sustainable file format”.
Each archive has its own expertise, related to its designated
community. Examples:
http://dans.knaw.nl/en/deposit/information-about-depositing-data?set_language=en
http://researchdata.4tu.nl/en/publishing-research/data-description-and-formats/
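4TU.ResearchData (Level 1 / Level 2 or 3) and DANS (Preferred / Accepted):
• audio: 4TU.ResearchData Level 1: .wav; Level 2 or 3: .ra, .mp3, .wma. DANS Preferred: .wav, .flac; Accepted: .aiff, .mp3, .aac
• chemistry: 4TU.ResearchData: NMR, ChemDoodle, …, .pdb, .xyz
• databases: 4TU.ResearchData Level 1: delimited flat file with DDL; Level 2 or 3: .mdb, .dbf, .acdb. DANS Preferred: .sql, .siard, .csv; Accepted: .mdb, .dbf, .hdf5 …
• video: 4TU.ResearchData: .mp1, .mp2, .mp4, .mov …; DANS Preferred: .mpg2, .mpg4, .avi, .mov; Accepted: .mkv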
21. Interoperability
Before time was standardised, people kept time using different instruments to observe the Sun’s zenith at noon. Towns and cities set their clocks based on sunrise and sunset, so local times differed. Time calculation became a serious problem for people travelling by train, sometimes hundreds of miles in a day. UTC is now the world's time standard.
22. Some I questions
2.3 Making data interoperable
• Specify what data and metadata vocabularies, standards or
methodologies you will follow to facilitate interoperability.
• Standard vocabulary to allow inter-disciplinary
interoperability or a mapping from your vocabulary to more
commonly used ontologies?
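The mapping question above can be answered with something as simple as a lookup table published alongside the data. In the sketch below the keys (`sst`, `t_air`, `rh`) are hypothetical project-specific column names; the values are names from the CF Standard Name Table, used here only as one example of a commonly used vocabulary. The function name `map_columns` is likewise illustrative.

```python
from __future__ import annotations

# Hypothetical project terms (keys) mapped to CF Standard Names (values).
PROJECT_TO_CF = {
    "sst": "sea_water_temperature",
    "t_air": "air_temperature",
    "rh": "relative_humidity",
}


def map_columns(columns: list[str],
                mapping: dict[str, str] = PROJECT_TO_CF) -> tuple[list[str], list[str]]:
    """Rename columns to the shared vocabulary; keep and report unmapped ones."""
    mapped = [mapping.get(name, name) for name in columns]
    unmapped = [name for name in columns if name not in mapping]
    return mapped, unmapped
```

`map_columns(["sst", "rh", "station_code"])` renames the first two columns and reports `station_code` as unmapped. Unmapped terms are kept rather than dropped, so no data disappear silently while the mapping is still incomplete.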
23. Some R questions
2.4 Increase data re-use (through clarifying licences)
• License the data to permit the widest reuse possible
• Specify a data embargo, if this is needed
• How long will the data remain reusable?
• Describe data quality assurance processes
Re-use over time
24. Licensing research data and software
The EUDAT licensing wizard helps you pick a licence for data and software:
http://ufal.github.io/public-license-selector/
You should also license Open Access data, or waive rights.
Horizon 2020 Open Access
guidelines point to:
CC0 or CC BY
25. Keep everything? Forever?
When regenerating data is cheaper than archiving them, don’t archive.
Select what data you’ll need and want to retain.
Ten years is often stated in data policies and academic codes, but
data can remain valuable for much longer in climatology, sociology, health
sciences, astronomy, linguistics, … Look beyond minimal retention
periods where relevant.
“The lifetime of software is generally not as long as that of data”
(Daniel Katz et al. http://bit.ly/2eScCKp)
RDNL Selection criteria: http://www.researchdata.nl/en/services/data-management/selecting-research-data/
DCC How-to guide: http://www.dcc.ac.uk/resources/how-guides/appraise-select-data
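The first rule on this slide is a cost comparison, which the hypothetical helper below spells out. It assumes a flat yearly archiving cost over a chosen retention period and a one-off cost to regenerate the data; both, and the `reproducible` flag for observations that cannot be repeated, are simplifications introduced here. It deliberately ignores the scientific value of the data, which matters just as much when selecting what to keep.

```python
def should_archive(archiving_cost_per_year: float,
                   retention_years: int,
                   regeneration_cost: float,
                   reproducible: bool = True) -> bool:
    """Return True unless regenerating the data is cheaper than keeping them.

    Simplified model (an assumption): constant yearly archiving cost,
    one-off regeneration cost, no weighting of scientific value.
    """
    if not reproducible:
        # One-off observations (e.g. a past weather event) cannot be regenerated.
        return True
    total_archiving_cost = archiving_cost_per_year * retention_years
    return total_archiving_cost <= regeneration_cost
```

With hypothetical amounts: keeping data at €200 a year for 10 years (€2,000) against €50,000 to rerun a simulation favours archiving; at €2,000 a year against a €5,000 rerun, the sketch says regenerate instead.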
26. §3 Allocation of resources
• What are the costs for making data FAIR in your project?
• Resources for long term preservation
Check the UK Data Service Costing model.
Rule of thumb: spend about 5% of the project budget on RDM.
The High Level Expert Group on the European Open Science Cloud
recommends that “well budgeted data stewardship plans should be
made mandatory and we expect that on average about 5% of
research expenditure should be spent on properly managing and
stewarding data”.
UKDS model http://www.data-archive.ac.uk/create-manage/planning-for-sharing/costing
HLEG report
http://ec.europa.eu/research/openscience/pdf/realising_the_european_open_science_cloud_2016.pdf, p. 19
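Applied to a budget, the 5% rule of thumb gives a first estimate, to be refined with a detailed model such as the UK Data Service costing model linked above. The helper name `rdm_budget` and the example budget are hypothetical; the default share is the 5% quoted on this slide.

```python
def rdm_budget(project_budget: float, share: float = 0.05) -> float:
    """First estimate of the RDM budget as a share of the total project budget.

    The default share is the ~5% rule of thumb; it is an estimate, not a fixed rule.
    """
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return round(project_budget * share, 2)
```

For a hypothetical €2,000,000 project, `rdm_budget(2_000_000)` returns `100000.0`, i.e. roughly €100,000 for managing and stewarding data.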
27. §4-6
Data security
• Provisions for data recovery, secure storage, transfer of
sensitive data?
• Safely stored in certified repositories for long term
preservation and curation?
Ethical aspects
• Any ethical or legal issues that can impact data sharing?
• Informed consent for data sharing and long term
preservation included in questionnaires dealing with
personal data?
Other issues
• Which other national/funder/sectoral/departmental
procedures for data management do you use (if any)?
29. Recommendations
• Think about the desired end result and plan for this.
• Involve all work packages and partners to get a coherent
plan.
• “Sharing” means “outside the consortium”.
• Approach the DMP in whatever way best fits your project:
• The EC template is intended as a service, not an obligation. Read the
background information and the guidance, and use the template as a checklist.
• More than one dataset? Describe in general terms what you can, and
per dataset only what is necessary.
• Focus effort on datasets you’ll create rather than reuse.
30. The EC Open Research Data pilot
Key sources of information
• Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
• Guidelines on Data Management in Horizon 2020
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
• Annotated model grant agreement, clause 29.3
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/amga/h2020-amga_en.pdf
• New infographic summarising key policy points
http://ec.europa.eu/research/press/2016/pdf/opendata-infographic_072016.pdf
• Open Access and Data Management
http://ec.europa.eu/research/participants/docs/h2020-funding-guide/cross-cutting-issues/open-access-dissemination_en.htm
31. OpenAIRE support materials
• Briefing papers, factsheets, webinars, workshops, FAQs
• Information on:
• Open Research Data Pilot
• Creating a data management plan
• Selecting a data repository
• Personal data
https://www.openaire.eu/opendatapilot
https://www.openaire.eu/support
32. Thank you!
Acknowledgements:
Thanks to Sarah Jones (DCC), OpenAIRE and EUDAT for slides.
marjan.grootveld@dans.knaw.nl
http://dans.knaw.nl/
Are the data produced in the project interoperable, that is allowing data exchange and re-use between researchers, institutions, organisations, countries, etc. (i.e. adhering to standards for formats, as much as possible compliant with available (open) software applications, and in particular facilitating re-combinations with different datasets from different origins)?
What data and metadata vocabularies, standards or methodologies will you follow to make your data interoperable?
Will you be using standard vocabularies for all data types present in your data set, to allow inter-disciplinary interoperability?
In case it is unavoidable that you use uncommon or generate project specific ontologies or vocabularies, will you provide mappings to more commonly used ontologies?
How will the data be licensed to permit the widest re-use possible?
When will the data be made available for re-use? If an embargo is sought to give time to publish or seek patents, specify why and how long this will apply, bearing in mind that research data should be made available as soon as possible.
Are the data produced and/or used in the project useable by third parties, in particular after the end of the project? If the re-use of some data is restricted, explain why.
How long is it intended that the data remains re-usable? Are data quality assurance processes described?
Remember to give your open data and software a proper licence as well.
The OA guidelines under Horizon 2020 point to CC0 or CC BY as a straightforward and effective way to make it possible for others to mine, exploit and reproduce the data. See p. 11 at: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
Ethical and legal issues can also be discussed in the context of the ethics review. If relevant, include references to ethics deliverables and ethics chapter in the Description of the Action.
Let’s move on to the considerations to make when managing and sharing data.