| www.eudat.eu | This webinar was co-organised by DANS, EUDAT and OpenAIRE and was held on 12th and 13th December 2016.
Everybody wants to play FAIR, but how do we put the principles into practice?
There is a growing demand for quality criteria for research datasets. In this webinar we will argue that the DSA (Data Seal of Approval for data repositories) and FAIR principles get as close as possible to giving quality criteria for research data. They do not do this by trying to make value judgements about the content of datasets, but rather by qualifying the fitness for data reuse in an impartial and measurable way. By bringing the ideas of the DSA and FAIR together, we will be able to offer an operationalization that can be implemented in any certified Trustworthy Digital Repository.
In 2014 the FAIR Guiding Principles (Findable, Accessible, Interoperable and Reusable) were formulated. The well-chosen FAIR acronym is highly attractive: it is one of these ideas that almost automatically get stuck in your mind once you have heard it. In a relatively short term, the FAIR data principles have been adopted by many stakeholder groups, including research funders.
The FAIR principles are remarkably similar to the underlying principles of DSA (2005): the data can be found on the Internet, are accessible (clear rights and licenses), in a usable format, reliable and are identified in a unique and persistent way so that they can be referred to. Essentially, the DSA presents quality criteria for digital repositories, whereas the FAIR principles target individual datasets.
In this webinar the two sets of principles will be discussed and compared and a tangible operationalization will be presented.
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www.eudat.eu |
1. www.dans.knaw.nl
DANS is an institute of KNAW and NWO
FAIR Data in Trustworthy Data Repositories:
Everybody wants to play FAIR, but how do we put the
principles into practice?
Peter Doorn, Director DANS
Ingrid Dillo, Deputy Director DANS
EUDAT/OpenAIRE webinar, 12 and 13 December 2016
2. Who we are
Watch our videos on YouTube
https://www.youtube.com/user/DANSDataArchiving
3. What you can expect to learn
• General understanding of core requirements for trustworthy
data repositories
• General understanding of the FAIR principles
• Introduction to a possible way of operationalizing the FAIR
principles
4. DANS and DSA
• 2005: DANS to promote and provide permanent access to
digital research information
• Formulate quality guidelines for digital repositories including
DANS (TRAC, Nestor)
• 2006: 5 basic principles as basis for 16 DSA guidelines
• 2009: international DSA Board
• Over 60 seals acquired around the globe, but with a focus
on Europe
6. DSA and WDS: look-a-likes
Communalities:
• Lightweight, community review
Complementarity:
• Geographical spread
• Disciplinary spread
7. Partnership
Goals:
• Realizing efficiencies
• Simplifying assessment options
• Stimulating more certifications
• Increasing impact on the community
Outcomes:
• Common catalogue of requirements for core repository
assessment
• Common procedures for assessment
• Shared testbed for assessment
8. New common requirements
• Context (1)
• Organizational infrastructure (6)
• Digital object management (8)
• Technology (2)
• Additional information and
applicant feedback (2)
9. 1. Organisational infrastructure
R1. The repository has an explicit mission to provide access to and preserve data in
its domain.
R2. The repository maintains all applicable licenses covering data access and use and
monitors compliance.
R3. The repository has a continuity plan to ensure ongoing access to and preservation
of its holdings.
R4. The repository ensures, to the extent possible, that data are created, curated,
accessed, and used in compliance with disciplinary and ethical norms.
R5. The repository has adequate funding and sufficient numbers of qualified staff
managed through a clear system of governance to effectively carry out the mission.
R6. The repository adopts mechanism(s) to secure ongoing expert guidance and
feedback (either in-house, or external, including scientific guidance, if relevant).
10. 2. Digital object management (1)
R7. The repository guarantees the integrity and authenticity of the data.
R8. The repository accepts data and metadata based on defined criteria
to ensure relevance and understandability for data users.
R9. The repository applies documented processes and procedures in
managing archival storage of the data.
R10. The repository assumes responsibility for long-term preservation
and manages this function in a planned and documented way.
11. 2. Digital object management (2)
R11. The repository has appropriate expertise to address technical data
and metadata quality and ensures that sufficient information is available
for end users to make quality-related evaluations.
R12. Archiving takes place according to defined workflows from ingest
to dissemination.
R13. The repository enables users to discover the data and refer to
them in a persistent way through proper citation.
R14. The repository enables reuse of the data over time, ensuring that
appropriate metadata are available to support the understanding and
use of the data.
12. 3. Technical infrastructure
R15. The repository functions on well-supported operating systems and
other core infrastructural software and is using hardware and software
technologies appropriate to the services it provides to its Designated
Community.
R16. The technical infrastructure of the repository provides for
protection of the facility and its data, products, services, and users.
13.
14. New requirements are out now!
http://www.datasealofapproval.org/en/news-and-events/news/2016/11/25/wds-
and-dsa-announce-uni-ed-requirements-core-cert/
https://www.icsu-wds.org/news/news-archive/wds-dsa-unified-requirements-for-
core-certification-of-trustworthy-data-repositories
15. Back to 2005: DSA principles
The DSA is intended to ensure that:
• The data can be found on the internet
• The data are accessible (clear rights and licenses)
• The data are in a usable format
• The data are reliable
• The data are identified in a unique and persistent way so that they can be
referred to
16. FAIR Data Principles
Workshop Leiden 2014: minimal set of community agreed
guiding principles to make data more easily discoverable,
accessible, appropriately integrated and re-usable, and
adequately citable.
FAIR principles:
• Findable
• Accessible
• Interoperable
• Reusable
(both for machines
and for people)
17. FAIR principles
In the FAIR approach, data should be:
1. Findable – Easy to find by both humans and computer
systems and based on mandatory description of the metadata
that allow the discovery of interesting datasets;
2. Accessible – Stored for long term such that they can be
easily accessed and/or downloaded with well-defined license
and access conditions (Open Access when possible), whether
at the level of metadata, or at the level of the actual data
content;
3. Interoperable – Ready to be combined with other
datasets by humans as well as computer systems;
4. Reusable – Ready to be used for future research and to be
processed further using computational methods.
Source: http://www.dtls.nl/fair-data/
20. Implementing the FAIR Principles?
See: http://datafairport.org/fair-principles-living-document-menu and
https://www.force11.org/group/fairgroup/fairprinciples
21. Different implementations for different aims?
1. FAIR data management: posing requirements for new data creation
2. FAIR data assessment: establishing the profile of existing data
3. FAIR data technologies: transformation tools to make data FAIR
Creation Assessment Transformation
22. Creation: New Horizon 2020 guidelines on
FAIR Data Management
Section 2. FAIR data
1. Making data findable, including provisions for
metadata (5 questions)
2. Making data openly accessible (10 questions)
3. Making data interoperable (4 questions)
4. Increase data re-use (through clarifying
licenses - 4 questions)
Additional sections:
1. Data summary (6 questions, 5 of which also
cover aspects of FAIRness)
3. Allocation of resources (4 questions)
4. Data security (2 questions)
5. Ethical aspects (2 questions)
6. Other issues (2 questions)
Total of 23 + 16 = 39 questions!!
24. Assess: Resemblance DSA – FAIR principles
DSA Principles (for data repositories) FAIR Principles (for data sets)
data can be found on the internet Findable
data are accessible Accessible
data are in a usable format Interoperable
data are reliable Reusable
data can be referred to (citable)
The resemblance is not perfect:
• usable format (DSA) is an aspect of interoperability (FAIR)
• reliability (DSA) is a condition for reuse (FAIR)
• FAIR explicitly addresses machine readability
• citability is in FAIR an aspect of findability
25. Combine and operationalize: DSA & FAIR
• Growing demand for quality criteria for
research datasets and a way to assess
their fitness for use
• Combine the principles of core repository
certification and FAIR
• Use the principles as quality criteria:
• Core certification – digital repositories
• FAIR – research data (sets)
• Operationalize the principles to make them
easily implementable in any trustworthy
digital repository
26. How could it work?
• Consider F, A and I as separate dimensions of
data quality
• Interaction effects among dimensions
complicate scoring, and so do elements that
occur under different FAIR dimensions
• Score datasets on each dimension (from 1 to 5)
• Consider Reusability as the resultant of the
other three:
• Consider R, the average FAIRness as an
indicator of data quality
• (F + A + I) / 3 = R
• Make scoring as automatic as possible, although
not all principles can be established objectively
• scoring at ingest by data archivists of TDR
• after reuse by data users (community review)
27. Each dataset a FAIR profile
Examples:
• Dataset X has FAIR profile F4-A3-I2 è R=3
• PID with limited metadata (well findable; 4)
• Accessible with some restrictions (3)
• Fairly low interoperability (2)
• Dataset Y has FAIR profile F5-A2-I3 è R=3,3
• PID and extensive metadata (very easy to find; 5)
• Only metadata accessible (2)
• Average rating on interoperability (3)
Numbers of assessments, reviews and downloads indicated as well
28. By the way, alternative visualisation ideas…
FAIR dice suggested by
Herbert van de Sompel
FAIR letters suggested by
Ian Duncan
FAIR leaves suggested by
Wouter Haak
FAIR clover 1
FAIR clover 2
FAIR sunflower
29. Operationalising F - Findable
Findable - defined by metadata, documentation (and
identifier for citation):
1. No PID and no metadata/documentation
2. PID without or with insufficient* metadata
3. Sufficient* metadata without PID
4. PID with sufficient* metadata
– Information on data provenance
5. PID, rich metadata and additional documentation
– Additional explanation of how data can be used
* Sufficient = enough metadata to understand what the data is about
30. Operationalising F - Findable
Findable - defined by metadata, documentation (and
identifier for citation):
1. No PID and no metadata/documentation
2. PID without or with insufficient* metadata
3. Sufficient* metadata without PID
4. PID with sufficient* metadata
– Information on data provenance
5. PID, rich metadata and additional documentation
– Additional explanation of how data can be used
* Sufficient = enough metadata to understand what the data is about
31. Operationalising A - Accessible
Accessible - defined by presence of a user license
[metadata retrievable by identifier: already included
under F]:
1. No user license / unclear conditions of reuse / metadata nor
data are accessible
2. Metadata are accessible (even when the data are not or no
longer available)
3. User restrictions apply (of any kind, including privacy,
commercial interests, embargo period, etc.)
4. Public Access (after registration)
5. Open Access (unrestricted, CC0 – perhaps also CCby?)
Note 1: Some people want “Openness” to be separate from Accessible
Note 2: A3 could be seen as an acceptable threshold (e.g. by funders)
32. Operationalising I - Interoperable
Interoperable - defined by the data
format:
1. Proprietary, non-open format data
2. Proprietary format, accepted by DSA
Certified Trusted Data Repository
3. Non-proprietary, open format (= “preferred”
or “archival” format)
4. Data is additionally harmonized/
standardized, using standard vocabularies
5. Data is additionally linked to other data to
provide context
Note: this is an adaptation of Tim Berners-Lee’s 5-star open data plan!
33. First we attempted to operationalise R –
Reusable as well… but we changed our mind
Reusable – is it a separate dimension? Partly subjective: it
depends on what you want to use the data for!
Idea for operationalization Why we rejected it
Clear provenance of data (to facilitate both
replication and reuse)
Aspect of F4
Data is in a TDR – unsustained data will not
remain usable
Aspect of Repository à
Data Seal of Approval
Explication on how data was or can be used
is available
Aspect of F5
Data automatically usable by machines Aspect of I5
Data is reliable (replicable) Can only be known after
re-analysis
35. Or in Zenodo, Dataverse, Mendeley Data, figshare,
B2SAFE, … (if they comply with the DSA)
36. Or in Zenodo, Dataverse, Mendeley Data, figshare,
B2SAFE, … (if they comply with the DSA)
37. Towards a FAIR Data Assessment Tool
• Independent website like the DSA-website
• Repositories will link to the FAIR assessment
website
• The website will provide:
Ø Assessment tool (“questionnaire” with explanation and
examples for each criterion)
Ø Online database containing:
o Repository holding the dataset
o PID (+ basic metadata such as name) of dataset
o Reviewer info (ID can be withheld – anonymous
reviews should be possible)
o FAIR profile and scores
Ø Analytics of FAIR profiles
38. Can FAIR Data Assessment be automatic?
Criterion Automatic?
Y/N/Semi
Subjective?
Y/N/Semi
Comments
F1 No PID / No Metadata Y N
F2 PID / Insuff. Metadata S S Insufficient metadata is subjective
F3 No PID / Suff. Metadata S S Sufficient metadata is subjective
F4 PID / Sufficient Metadata S S Sufficient metadata is subjective
F5 PID / Rich Metadata S S Rich metadata is subjective
A1 No License / No Access Y N
A2 Metadata Accessible Y N
A3 User Restrictions Y N
A4 Public Access Y N
A5 Open Access Y N
I1 Proprietary Format S N Depends on list of proprietary formats
I2 Accepted Format S S Depends on list of accepted formats
I3 Archival Format S S Depends on list of archival formats
I4 + Harmonized N S Depends on domain vocabularies
I5 + Linked S N Depends on semantic methods used
Optional: qualitative assessment / data review
39. Main takeaways
• The core certification requirements and the FAIR
principles form a perfect couple for quality
assessment of research data and trustworthy data
repositories
• DANS is developing an operationalization of these
principles:
• Data archive staff will assess FAIRness of data upon
ingest
• Data users will assess FAIRness upon reuse
• Scoring mechanism as automatic as possible
• Ideally: all certified trustworthy repositories should
contain FAIR data
• But: the FAIR scoring mechanism is applicable in
any repository
40. Hands-on FAIR data management: IDCC workshops
12th International Digital Curation Conference, Edinburgh, 20-23 February 2017
How EUDAT Services could support FAIR data (workshop 4)
… In this workshop we relate the FAIR principles to EUDAT’s Service Suite. You will experiment
with community metadata and with an annotation tool for curating and retrieving data.
Community representatives will talk about their experiences...
OpenAIRE services and tools for Open Research Data in H2020 (workshop 6)
… The workshop will focus on practical issues of storage and data sharing aspects, on how to
select an appropriate data repository, on guidance for efficient data management plans, on the
monitoring of contextualized (linked) research…
Essentials 4 Data Support: the Train-the Trainer version (workshop 9)
… You learn how Research Data Netherlands made the popular ‘Essentials 4 Data Support’
training with rich online content, weekly assignments and expert speakers… to practice hands-
on writing DMPs and making a data management stakeholder analysis…
http://www.dcc.ac.uk/events/idcc17
For more information and registration see:
41. Thank you for listening
peter.doorn@dans.knaw.nl
ingrid.dillo@dans.knaw.nl
www.dans.knaw.nl