1. Horizon 2020 Open Research Data Pilot
Sarah Jones
Digital Curation Centre
sarah.jones@glasgow.ac.uk
Twitter: @sjDCC
Webinar for the moderated course on the Horizon 2020 Open Research Data pilot, 10th June 2016
https://www.fosteropenscience.eu/content/horizon-2020-open-research-data-pilot-0
2. BACKGROUND
What are the drivers for open data?
Image CC-BY-SA-ND by Chris Smart www.flickr.com/photos/sigma/5297545749
3. Why open access and open data?
“The European Commission's vision is that
information already paid for by the public
purse should not be paid for again each
time it is accessed or used, and that it
should benefit European companies and
citizens to the full.”
https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
5. Development of EC Open Access policy
Pilot in FP7
Trialled in 7 areas
Expected to:
• Deposit articles into a repository
• Attempt to make these OA within 6 or 12 months
• Choice between green and gold routes
Provision of support
OA fees are eligible for reimbursement
Pilot supported and monitored through OpenAIRE
OA mandate in H2020
Each beneficiary must:
• Deposit machine-readable electronic copy in repository by the date of publication
• Ensure OA via green/gold routes within 6 or 12 month embargo
• Ensure bibliographic metadata is OA
• Aim to deposit research data
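The embargo rule can be illustrated with a short date calculation. This is only a sketch: the speaker notes at the end of this deck give 6 months for science, technology and medicine and 12 months for humanities and social sciences, but the keys "stm" and "hss", the function names, and the choice to clamp to the last day of a shorter month are assumptions made for this example, not wording from the grant agreement.

```python
from datetime import date
import calendar

# Embargo lengths in months, taken from the speaker notes of this deck.
# The short keys "stm" and "hss" are hypothetical labels for this sketch.
EMBARGO_MONTHS = {"stm": 6, "hss": 12}


def add_months(start: date, months: int) -> date:
    """Return the date `months` later, clamping to the end of a shorter month."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def latest_open_access_date(published: date, field: str) -> date:
    """Latest date by which a green-route article should become open access."""
    return add_months(published, EMBARGO_MONTHS[field])


print(latest_open_access_date(date(2016, 8, 31), "stm"))  # 2017-02-28
print(latest_open_access_date(date(2016, 6, 10), "hss"))  # 2017-06-10
```

The binding deadline is whatever the grant agreement and the publisher's policy state, so a helper like this is best treated as a reminder tool.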
6. H2020 Open Research Data pilot
Guidelines on Data Management in Horizon 2020
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
Pilot in H2020
Provision of support
?
Take part and feed back to help shape future policy!
7. REQUIREMENTS OF THE ORD PILOT
What is expected of participants?
Image CC-BY-NC-SA by Ralf Appelt www.flickr.com/photos/adesigna/4090782772
8. Open Research Data (ORD) Pilot
Pilot focuses on research data specifically
‘Research data’ refers to information, in particular facts or numbers,
collected to be examined and considered as a basis for reasoning,
discussion or calculation.
In a research context, examples of data include statistics, results of
experiments, measurements, observations resulting from fieldwork,
survey results, interview recordings and images. The focus is on
research data that is available in digital form.
Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020
v.2.1, 15 February 2016, p3
9. H2020 areas participating in the pilot (2016-17)
• Future and Emerging Technologies
• Research infrastructures
• Leadership in enabling and industrial technologies – Information and Communication
Technologies
• Nanotechnologies, Advanced Materials, Advanced Manufacturing and Processing, and
Biotechnology: ‘nanosafety’ and ‘modelling’ topics
• Societal Challenge: Food security, sustainable agriculture and forestry, marine and maritime
and inland water research and the bioeconomy - selected topics as specified in the work
programme
• Societal Challenge: 'Climate Action, Environment, Resource Efficiency and Raw materials' –
except raw materials
• Societal Challenge: 'Europe in a changing world – inclusive, innovative and reflective Societies'
• Science with and for Society
• Cross-cutting activities - focus areas – part Smart and Sustainable Cities
Projects in other areas can participate on a voluntary basis
10. The scope of participation is growing...
• In 2014-15 work programme, 7 areas participated in the pilot.
• In the 2016 work programme, new topics joined in 3 areas (research
infrastructures, nanotechnologies and food security)
• All calls covered by the 2017 work programme will be part of the pilot.
Effectively it’s moving from a pilot to a mandate.
11. Exemptions – reasons for opting out
• If results are expected to be commercially or industrially exploited
• If participation is incompatible with the need for confidentiality in connection with security issues
• If participation is incompatible with existing rules on the protection of personal data
• If participation would jeopardise the achievement of the main aim of the action
• If the project will not generate / collect any research data
• If there are other legitimate reasons to not take part in the Pilot
Projects can opt out at any stage
Can opt out totally or partially (i.e. for some data only)
Should describe issues in the project DMP
12. Approach:
as open as
possible, as closed
as necessary
Image: ‘Balancing rocks’ by Viewminder CC-BY-SA-ND
www.flickr.com/photos/light_seeker/7780857224
13. Which data does the ORD pilot apply to?
• The data, including associated metadata, needed to validate the results presented in scientific publications
• Other curated and/or raw data, including associated metadata, as specified in the data management plan
Doesn’t apply to all data (researchers to define as appropriate)
Don’t have to share data if inappropriate – exemptions apply
14. Key requirements of the ORD pilot
Beneficiaries participating in the ORD pilot will:
• Deposit data in a research data repository
• Take measures to enable third parties to access, mine, exploit, reproduce and disseminate (free of charge for any user) this research data
• Provide information via the chosen repository about tools and instruments necessary for validating the results (where possible, provide the tools and instruments themselves)
15. Data Management Plans
Projects participating in the pilot will be required to
develop a Data Management Plan (DMP), in which they will
specify what data will be open.
Note that the Commission does NOT require applicants to submit
a DMP at the proposal stage.
A DMP is therefore NOT part of the evaluation.
DMPs are a deliverable for those participating in the pilot.
16. Info on RDM: what and when
PROPOSAL STAGE
Where relevant*, H2020 proposals can include a section on data management which is evaluated under the criterion ‘Impact’
• What types of data will the project generate/collect?
• What standards will be used?
• How will this data be shared/made available? If not, why?
• How will this data be curated and preserved?
* For “Research and Innovation actions” and “Innovation Actions”
IN PROJECT
• DMPs are a project deliverable for those participating in the open data pilot.
• Not a fixed document – should evolve and gain precision
– Deliver first version within initial 6 months of project
– More elaborate versions whenever important changes to the project occur. At least at the mid-term and final review.
17. Initial DMP (at 6 months)
The DMP should address the points below on a dataset by
dataset basis:
• Data set reference and name
• Data set description
• Standards and metadata
• Data sharing
• Archiving and preservation (including storage and backup)
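One way to keep track of these five headings per dataset is a small record type. The sketch below is illustrative only: the class name, field names and the example values are hypothetical, and the Commission does not prescribe any machine-readable format for the DMP.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DatasetPlan:
    """One dataset entry in an initial DMP; fields mirror the five headings."""
    reference_and_name: str = ""
    description: str = ""
    standards_and_metadata: str = ""
    data_sharing: str = ""
    archiving_and_preservation: str = ""


def missing_sections(plan: DatasetPlan) -> List[str]:
    """Return the headings that are still empty for this dataset."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


# Hypothetical example entry, partly filled in.
survey = DatasetPlan(
    reference_and_name="EXAMPLE-WP2-survey-2016",
    description="Anonymised responses to an online survey",
    data_sharing="Deposit in a research data repository under CC-BY",
)

print(missing_sections(survey))
# ['standards_and_metadata', 'archiving_and_preservation']
```

Because the DMP is expected to evolve, a check like this could be rerun before each new version is delivered.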
18. More elaborate DMP
Scientific research data should be easily:
1. Discoverable
Are the data and software discoverable and identifiable by a standard mechanism e.g. DOIs?
2. Accessible
Are the data and software accessible and under what conditions e.g. licenses, embargoes etc?
3. Assessable and intelligible
Are the data and software assessable and intelligible to third parties for scrutiny and peer-review?
E.g. can judgements be made about their reliability and the competence of those who created them?
4. Useable beyond the original purpose for which it was collected
Are the data properly curated and stored together with the minimum software and documentation to
be useable by third parties in the long-term?
5. Interoperable to specific quality standards
Are the data and software interoperable, allowing data exchange between researchers, institutions,
countries etc? e.g. Adhering to standards and compliant with available applications
19. Feedback from the European Commission
LESSONS LEARNED
Image CC-BY-SA-ND by David D Wang https://www.flickr.com/photos/30326117@N08/3475108362
22. Lessons the EC has drawn (1)
• Explanation is paramount!
– Misperception that 'open' bias will be evaluated positively
– Confusion: DMP versus data management section at submission stage
– Need to state that not everything must be open. In theory, it is possible
to be in the ORD Pilot and not open any data.
– Emphasise flexibility (many opt-out / opt-in mechanisms)
• Emphasise the importance of feedback for policy in the
next Framework Programme: being in the Pilot means co-
shaping European policy on opening up research data
Content taken from slides by Daniel Spichtinger, Celina
Ramjoue, Jarkko Siren and Jean-Francois Dechamp
23. Lessons the EC has drawn (2)
• Helps to re-frame ORD Pilot as "Data Management Pilot"
– Stress the fact that researcher has freedom and responsibility via DMP.
Excellent research must include excellent data management.
– Underline overall aim: kick-start a virtuous circle and change of culture
• Questions about eligibility of data management costs
• Tools and support needed for data management / DMPs
Content taken from slides by Daniel Spichtinger, Celina
Ramjoue, Jarkko Siren and Jean-Francois Dechamp
24. How to comply with the pilot requirements?
OPEN RESEARCH DATA
Image CC-BY-NC-SA by Tom Magllery www.flickr.com/photos/lwr/13442910354
25. How can researchers make data open?
1. Choose the dataset(s) to share
• What can be made open? This step may need to be revisited if
problems are encountered later.
2. Apply an open license
• Determine what IP exists. Apply a suitable licence e.g. CC-BY
3. Make the data available
• Provide the data in a suitable format. Use repositories.
4. Make it discoverable
• Post on the web, get a unique ID, register in catalogues…
https://okfn.org
27. Which licenses are appropriate?
Creative Commons clauses that limit sharing
NC NonCommercial
What counts as commercial?
ND NoDerivatives
Severely restricts use
Licences that include these clauses are not open licences
Horizon 2020 Open Access guidelines point to: CC-BY or CC0
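The point about NC and ND can be expressed as a simple check on a Creative Commons licence code. This is a rough sketch: the function name and the string parsing are assumptions made for illustration, it only understands codes written like "CC-BY-SA" or "CC0", and it is no substitute for reading the licence text.

```python
# Clauses flagged on this slide as limiting sharing.
RESTRICTIVE_CLAUSES = {"NC", "ND"}


def is_open_licence(code: str) -> bool:
    """True if a Creative Commons code such as 'CC-BY-SA' has no NC or ND clause."""
    parts = code.upper().replace(" ", "-").split("-")
    is_creative_commons = parts[0] in {"CC", "CC0"}
    return is_creative_commons and not (RESTRICTIVE_CLAUSES & set(parts[1:]))


for code in ["CC-BY", "CC0", "CC-BY-NC-SA", "CC BY-ND 4.0"]:
    print(code, is_open_licence(code))
# CC-BY True
# CC0 True
# CC-BY-NC-SA False
# CC BY-ND 4.0 False
```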
28. EUDAT licensing tool
Researchers can answer a series of questions to determine
which licence(s) are appropriate to use
http://ufal.github.io/public-license-selector
29. Deposit in research data repositories
http://service.re3data.org/search
The EC guidelines point to Re3data as one of the registries
that can be searched to find a home for data
30. Zenodo
Zenodo is a multi-disciplinary repository that can be used for the long-tail of research data
• An OpenAIRE-CERN joint effort
• Multidisciplinary repository accepting
– Multiple data types
– Publications
– Software
• Assigns a Digital Object Identifier (DOI)
• Links funding, publications, data & software
www.zenodo.org
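A DOI becomes a citable, clickable link once it is prefixed with the doi.org resolver. The sketch below shows that step. The regular expression is a deliberately loose assumption about what a DOI looks like, the function name is hypothetical, and the DOI in the example is a placeholder rather than a real record.

```python
import re

# Loose pattern: "10.", a registrant code of 4 to 9 digits, "/", then a suffix.
# This is an assumption for illustration, not the full DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a link that resolves through https://doi.org."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi


# Placeholder DOI, not a real Zenodo record.
print(doi_to_url("10.5281/zenodo.0000000"))
# https://doi.org/10.5281/zenodo.0000000
```

Citing the resolver link rather than the repository page address keeps references stable if the landing page moves.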
31. Metadata and documentation
• Metadata is basic descriptive information to help others identify and
understand the structure of the data e.g. title, author...
• Documentation provides the wider context e.g. the methodology / workflow,
software, tools and any information needed to understand and reuse the data
• Relevant standards should be used for interoperability – check out the
Metadata Standards Directory from the Research Data Alliance
http://rd-alliance.github.io/metadata-directory
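As a small illustration of descriptive metadata travelling with the data, the sketch below writes a record to JSON so it can be deposited alongside the files. The field names and values are hypothetical and do not follow any particular standard; a real project would pick a schema from the directory mentioned above.

```python
import json

# Hypothetical minimal record; field names are illustrative, not a formal schema.
record = {
    "title": "Example survey responses",
    "creators": ["Example, Researcher"],
    "description": "Anonymised responses to an online survey",
    "licence": "CC-BY-4.0",
    "documentation": ["README.txt", "codebook.pdf"],
}


def to_sidecar(metadata: dict) -> str:
    """Serialise the metadata as indented JSON text for a sidecar file."""
    return json.dumps(metadata, indent=2, sort_keys=True)


print(to_sidecar(record))
```

The "documentation" entry points to the files that carry the wider context, such as methodology and workflow, which a short metadata record cannot hold.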
32. OpenAIRE
http://vimeo.com/108790101
Open Access Infrastructure for research in Europe
• aggregates data on OA outputs
• mines & enriches its content by linking things together
• provides services & APIs e.g. to generate publication
lists and help with project reporting
www.openaire.eu
33. EUDAT services
EUDAT offers a pan-European solution, providing a generic set
of services to ensure minimum level of interoperability
Building common
data services in close
collaboration with
25+ communities
www.eudat.eu
34. FOSTER project
Facilitate Open Science Training for European Research
• Network of open access trainers
• Programme of open science courses
• Portal to training materials
• E-learning courses on open access and open data
www.fosteropenscience.eu
35. DMPonline
A web-based tool to help researchers write DMPs
Includes a template for Horizon 2020
https://dmponline.dcc.ac.uk
36. Useful references
• Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020
• https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
• Guidelines on Data Management in Horizon 2020
• https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
37. Thanks – any questions?
For more information check out the FOSTER e-learning
course on the Horizon 2020 Open Research Data pilot
www.fosteropenscience.eu/content/horizon-2020-open-research-data-pilot-0
Editor's Notes
Commissioner for Research, Science and Innovation
Moving from a ‘best-effort’ in FP7 to an ‘obligation’ under Horizon 2020
6 months embargo (science, technology, medicine) and 12 months (humanities and social sciences)
Although DMPs are a project deliverable and not required at the application stage, proposals can include a section on data management if desired. The info suggested here is similar to the preliminary DMP, so essentially gets that started.
These steps align with what the EC asks for:
Choose which data to share – researchers asked to define this in DMP
Apply open licence - Take measures to enable third parties to access, mine, exploit, reproduce and disseminate (free of charge for any user) this research data
Make the data available - deposit data in a research data repository
Make it discoverable – provide associated metadata, provide information on the tools and instruments necessary for validating the results
Make the data available by depositing in repositories
Remember that the data need to be discoverable and understandable, so the associated metadata needs to be deposited too