FAIR – Assessment or Improvement?

HMC FAIR Friday, 20th May 2022.
Anusuriya Devaraju1 & Robert Huber2
1 Senior Data Innovation Manager, TERN Australia (a.devaraju@uq.edu.au)
2 Project Manager, PANGAEA, University of Bremen (rhuber@uni-bremen.de)

We at TERN acknowledge the Traditional Owners and Custodians throughout Australia, New Zealand
and all nations. We honour their profound connections to land, water, biodiversity and culture
and pay our respects to their Elders past, present and emerging.
TERN is enabled by NCRIS.
Our work is a result of collaborative partnerships with many Universities and institutions.
To find out more please go to tern.org.au.

Anusuriya Devaraju (2022). FAIR – Assessment or Improvement? HMC FAIR Friday, 20 May 2022.
Databases & Modelling
Knowledge Representation & Reasoning
Data Engineering and Analytics
Project & Data Management
Data Governance
Computer and Spatial Sciences Research Data Management
From Science to Operation

Source: Wilkinson, M., Dumontier, M., Aalbersberg,
I. et al. The FAIR Guiding Principles for scientific
data management and stewardship. Sci Data 3,
160018 (2016).
https://doi.org/10.1038/sdata.2016.18
FAIR Guiding Principles

• Not new! Collectively endorsed by various stakeholders.
• Domain independent, high-level guideline for those (e.g., data provider and
publisher) wishing to improve the reusability of their data holdings.
• Focuses on data; other digital objects may benefit from application of the
principles.
• Places emphasis on machine-based data discovery and accessibility, in addition to
human use.
• May be adopted, in whole or in part, incrementally as the data provider’s publishing
environments evolve.
FAIR Guiding Principles

• Aims at supplying practical
solutions for the use of the FAIR
principles throughout the research
data life cycle.
• 22 partners from 8 Member States.
https://www.fairsfair.eu
Work Package 4 (Task 4.5)
Fostering FAIR Data Practices in Europe (FAIRsFAIR)

• Enable trustworthy data repositories committed to FAIR data provision to improve
the FAIRness of their datasets over time through a programmatic approach.
Our Approach to FAIR Data Assessment
(1) Metrics + (2) Automated Tool + (3) Consultation => FAIR Data Improvement

• 17 core metrics (v0.5) - built on existing work on FAIR metrics (primarily RDA
FAIR Data Maturity Model).
FAIRsFAIR Data Assessment Metrics
Built on: FAIRdat and FAIR enough?; RDA WDS/RDA Assessment of Data Fitness for Use Checklist; RDA FAIR Data Maturity Model (v0.3).
• FAIRsFAIR Data Object Metrics v0.1: metrics consolidation based on existing FAIR assessment frameworks.
• FAIRsFAIR Data Object Metrics v0.2: metrics evaluation and refinement by the FAIRsFAIR project partners.
• FAIRsFAIR Data Object Metrics v0.3: metrics improvement through the focus group and the final RDA FAIR Data Maturity Model.
• FAIRsFAIR Data Object Metrics v0.4: metrics improvement through open consultation and pilot repositories' feedback.
• FAIRsFAIR Data Object Metrics v0.5: FAIR compliance level based on CMMI added.

Principles → Metrics → Practical Tests

Summary of Principles, Metrics and Tests
Source:
Devaraju, A. and Huber, R. (2021). An automated solution for measuring the
progress toward FAIR research data. Patterns.
https://doi.org/10.1016/j.patter.2021.100370

For detailed information about the metrics, see
Devaraju, Anusuriya, Huber, Robert, Mokrane, Mustapha, Herterich,
Patricia, Cepinskas, Linas, de Vries, Jerry, L'Hours, Herve, Davidson,
Joy, & Angus White. (2020). FAIRsFAIR Data Object Assessment
Metrics (0.5). Zenodo. https://doi.org/10.5281/zenodo.6461229

REST API & Front End (https://f-uji.net) https://github.com/pangaea-data-publisher/fuji
F-UJI FAIR Data Assessment Tool
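A minimal sketch of how a client might prepare an assessment request for a locally running F-UJI REST service. The host, port, `/evaluate` path, payload field names and credentials below are assumptions for illustration and should be checked against the API documentation in the F-UJI repository. The request is only built here, not sent.

```python
import base64
import json
import urllib.request


def build_evaluate_request(base_url, identifier, username, password):
    """Build (but do not send) a POST request asking the service to assess one object."""
    # Payload field names are assumed, not taken from the slides.
    payload = {
        "object_identifier": identifier,
        "test_debug": True,
        "use_datacite": True,
    }
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/evaluate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Placeholder base URL and credentials; replace with the values of your deployment.
request = build_evaluate_request(
    "http://localhost:1071/fuji/api/v1",
    "https://doi.org/10.1594/PANGAEA.206402",
    "user",
    "secret",
)
print(request.get_method(), request.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(request)` against a running instance; the response structure is not shown because it is not described in these slides.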

Resources
• Metadata (embedded, and
from services)
• Data file(s)
• Repository Contexts
• Auxiliary information from
FAIR assessment enabling
services
• Link relation types
• HTML meta tags
• Schema.org
structured data
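The three embedded sources listed above (link relation types, HTML meta tags, Schema.org structured data) can all be read from a landing page's HTML. The following is a rough sketch using only the Python standard library; the class name, function name and sample page are hypothetical and this is not F-UJI's own harvesting code.

```python
import json
from html.parser import HTMLParser


class EmbeddedMetadataParser(HTMLParser):
    """Collects meta tags, typed links and JSON-LD blocks from a landing page."""

    def __init__(self):
        super().__init__()
        self.meta = {}      # e.g. {"DC.title": "..."}
        self.links = []     # e.g. [("describedby", "https://...")]
        self.jsonld = []    # parsed Schema.org JSON-LD objects
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"]] = attrs["content"]
        elif tag == "link" and attrs.get("rel") and attrs.get("href"):
            self.links.append((attrs["rel"], attrs["href"]))
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.jsonld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD instead of failing the whole page


def extract_embedded_metadata(html):
    parser = EmbeddedMetadataParser()
    parser.feed(html)
    parser.close()
    return {"meta": parser.meta, "links": parser.links, "jsonld": parser.jsonld}


# Hypothetical landing page used only to exercise the parser.
SAMPLE_PAGE = """
<html><head>
<meta name="DC.title" content="Example dataset">
<link rel="describedby" href="https://example.org/meta.xml" type="application/xml">
<script type="application/ld+json">{"@type": "Dataset", "name": "Example dataset"}</script>
</head><body></body></html>
"""

result = extract_embedded_metadata(SAMPLE_PAGE)
print(result)
```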

High-Level Flow of (Meta)data Gathering
Inputs: identifier (e.g., URL, PID); metadata-access endpoint (optional).
• Parse request.
• Is the identifier a persistent identifier? If yes: retrieve metadata from the PID provider (DataCite).
• Extract metadata from landing page, typed links, content negotiation, etc.
• Is a service endpoint (OAI/CSW/SPARQL) provided? If yes: extract metadata standards via the endpoint.
• Extract repository metadata (API, metadata standards) through re3data.
• Parse metadata: DDI, DCAT, DC, EML, METS, MODS, ISO19xx, etc.
• Collate metadata of the data object.
Outputs: metadata at the object level; metadata at the repository level.
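The two decision points in the flow (is the identifier a PID, and is a service endpoint provided) can be sketched as a small planning function. The patterns are deliberately simplified, the function names are hypothetical, and the order of steps is one reading of the diagram rather than a statement of how F-UJI sequences its work. No network access is performed.

```python
import re

# Simplified patterns; real PID detection covers more schemes (ARK, URN, ...).
DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE
)
HANDLE_PATTERN = re.compile(r"^https?://hdl\.handle\.net/(\S+)$", re.IGNORECASE)


def detect_pid(identifier):
    """Return (scheme, value) if the identifier looks like a PID, else None."""
    identifier = identifier.strip()
    match = DOI_PATTERN.match(identifier)
    if match:
        return ("doi", match.group(1))
    match = HANDLE_PATTERN.match(identifier)
    if match:
        return ("handle", match.group(1))
    return None


def plan_harvest(identifier, service_endpoint=None):
    """List the harvesting steps the flow would take, without doing any I/O."""
    steps = ["parse request"]
    if detect_pid(identifier):
        steps.append("retrieve metadata from PID provider")
    steps.append("extract metadata from landing page, typed links, content negotiation")
    if service_endpoint:
        steps.append("extract metadata standards via the endpoint")
    steps.append("extract repository metadata through re3data")
    steps.append("collate metadata of the data object")
    return steps


for step in plan_harvest("https://doi.org/10.1594/PANGAEA.206402"):
    print(step)
```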

FAIR Assessment Enabling Services
Repository Contexts
‘Lookup’ Services
• PID provider service
(DataCite)
• re3data.org
• SPDX license list
• RDA Metadata Standards
Catalog
• LOV, LOD
• ISO/TR 22299 (Digital file
format recommendations
for long-term storage)
• Wolfram scientific formats
• more ….
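As an illustration of how one such lookup service can support a practical test, the sketch below matches a license string found in metadata against a tiny, hand-picked subset of SPDX identifiers. The subset, the URLs attached to it and the function names are assumptions for the example; a real check would load the full SPDX license list.

```python
# Tiny illustrative subset; identifiers are SPDX IDs, URLs are example values.
SPDX_SUBSET = {
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/legalcode",
    "CC0-1.0": "https://creativecommons.org/publicdomain/zero/1.0/legalcode",
    "MIT": "https://opensource.org/licenses/MIT",
}


def _normalise_url(url):
    """Lower-case, force https, and drop trailing '/' and '/legalcode'."""
    url = url.strip().lower().rstrip("/")
    if url.startswith("http://"):
        url = "https://" + url[len("http://"):]
    if url.endswith("/legalcode"):
        url = url[: -len("/legalcode")]
    return url


def match_license(value):
    """Map a license name or URL from metadata to an SPDX identifier, if known."""
    by_id = {spdx_id.lower(): spdx_id for spdx_id in SPDX_SUBSET}
    by_url = {_normalise_url(url): spdx_id for spdx_id, url in SPDX_SUBSET.items()}
    key = value.strip().lower()
    if key in by_id:
        return by_id[key]
    return by_url.get(_normalise_url(value))


print(match_license("http://creativecommons.org/licenses/by/4.0/"))
print(match_license("All rights reserved"))
```

Normalising before comparison matters because repositories record the same license with different URL variants (http vs. https, with or without a trailing slash).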

F-UJI in Action
Dataset Tested : https://doi.org/10.1594/PANGAEA.206402

Our Approach to FAIR Data Assessment (Revisit)
Metrics + Automated Tool + Consultation => FAIR Data Improvement

Pilot Repositories
Repository | Certification | Subject Areas | Datasets Evaluated (as of 25.09.2020)
PANGAEA | CoreTrustSeal, WDS Regular Member | Earth and Environmental Science | 500
Phaidra-Italy | CoreTrustSeal | Cultural Heritage | 500
CSIRO Data Portal | CoreTrustSeal | Multiple disciplines | 500
World Data Centre for Climate (WDCC) | CoreTrustSeal, WDS Regular Member | Earth System Science | 500
DataverseNO | CoreTrustSeal | Multiple disciplines | 500

Before and After
Note: We applied release v1.0.0 of the tool to perform the evaluation. For more details on the
assessment, see Devaraju & Huber (2021).

Uptake
• Open-source development
• 12 contributors, 18 forks, clients (R, web)
• Dataset assessments:
• ~10,000 individual tests via f-uji.net
• > n-thousands during repo tests (see below*)
• Repository assessments*:
• FAIRsFAIR pilots: 5 + 4 repos assessed
• DANS DGRTD project
• Institutional tests (e.g. Charité Berlin, UVP, Novartis)
• 2 articles published in reputable journals
and several invited talks.

Translating Principles to Metrics
• Some aspects of the FAIR principles (e.g. rich metadata, accurate and relevant
attributes) require human mediation, whereas programmatic assessment requires
clear and machine-accessible metrics (and tests).
• The principles should be elaborated with care
• F1 – registering data and metadata objects with permanent identifiers
• I2 – FAIR vocabulary work in progress
• A2 – preserving metadata should be addressed at repository-level
• Our approach
• Metrics for research data focus on generally applicable data/metadata characteristics until
domain/community-driven criteria have been agreed.
• The metrics are built on established work and practical tests consider standard data practices.
• The hierarchical model of principle-metric-practical test.
• Domain-specific metrics will be developed as part of the FAIR-IMPACT project.
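The hierarchical principle-metric-practical test model mentioned above can be represented with a few small data classes. This is a sketch only: the identifiers, the one-point-per-test scoring and the aggregation by FAIR category are assumptions made for illustration, not the scoring scheme of the published metrics.

```python
from dataclasses import dataclass, field


@dataclass
class PracticalTest:
    test_id: str
    description: str
    passed: bool


@dataclass
class Metric:
    metric_id: str          # hypothetical identifier
    principle: str          # FAIR principle the metric elaborates, e.g. "F1"
    tests: list = field(default_factory=list)

    def score(self):
        """Return (earned, total), one point per practical test (assumed scheme)."""
        earned = sum(1 for test in self.tests if test.passed)
        return earned, len(self.tests)


def summarise(metrics):
    """Aggregate earned/total points per FAIR category (first letter of the principle)."""
    summary = {}
    for metric in metrics:
        earned, total = metric.score()
        category = metric.principle[0]
        previous = summary.get(category, (0, 0))
        summary[category] = (previous[0] + earned, previous[1] + total)
    return summary


# Hypothetical assessment result for one data object.
metrics = [
    Metric("example-F1-a", "F1", [
        PracticalTest("t1", "identifier is globally unique", True),
        PracticalTest("t2", "identifier is persistent", False),
    ]),
    Metric("example-R1.1-a", "R1.1", [
        PracticalTest("t3", "license is machine-readable", True),
    ]),
]

summary = summarise(metrics)
print(summary)
```

Keeping tests separate from metrics means a test can be refined or replaced as data practices change without rewording the metric it supports.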

Level of Data Objects
• The ‘type’ of data objects may influence the assessment result
[Figure: example data object hierarchies in Data Repository A, Data Repository B and Data Repository C, built from levels such as Experiment, Dataset Group, Collection, DataSeries, Dataset and Files.]

Level of Data Objects (Example)
Dataset
Data Series

Restricted Objects
• Restricted data can be FAIR too!

Performance Matters
The number of data content files to be assessed can be
pre-configured in F-UJI.
Cache selected external resources locally.
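Both performance measures can be illustrated in a few lines: cap how many content files are inspected, and memoise lookups of external resources so each one is fetched once. The setting name `MAX_CONTENT_FILES`, its value and the stand-in fetch function are hypothetical and do not reflect F-UJI's actual configuration keys.

```python
from functools import lru_cache
from itertools import islice

MAX_CONTENT_FILES = 3  # hypothetical setting name and value

FETCH_COUNT = {"n": 0}  # counts simulated remote fetches, for demonstration only


def _fetch_remote(name):
    """Stand-in for a slow network call to an external lookup resource."""
    FETCH_COUNT["n"] += 1
    return f"contents of {name}"


@lru_cache(maxsize=32)
def load_resource(name):
    """Fetch an external resource once; later calls are served from the cache."""
    return _fetch_remote(name)


def select_files(file_urls, limit=MAX_CONTENT_FILES):
    """Return at most `limit` content files to assess."""
    return list(islice(file_urls, limit))


load_resource("license-list")
load_resource("license-list")  # second call is a cache hit, no new fetch
selected = select_files([f"https://example.org/file{i}.csv" for i in range(10)])
print(FETCH_COUNT["n"], len(selected))
```

An in-memory cache like this lasts only for the process lifetime; caching on disk with periodic refresh would be the analogous choice for a long-running service.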

Keep repository in the loop
• F. A. I. R. are not new to data repositories/infrastructures.
• Assessment should take into account contexts (e.g., disciplinary practices, data
structures, types) and data infrastructure.

Object Meets Repository
• FAIR assessment must go beyond the
object itself.
• FAIR-enabling (trustworthiness) criteria for
repositories/services evolve in parallel.
Image Source: Herve L’Hours (UKDA)
Hervé L'Hours, Ilona von Stein, Frans Huigen, Anusuriya Devaraju, Mustapha
Mokrane, Joy Davidson, Jerry de Vries, Patricia Herterich, Linas Cepinskas, &
Robert Huber. (2020). CoreTrustSeal plus FAIR Overview (03.00). Zenodo.
https://doi.org/10.5281/zenodo.4003630

Conclusions
FAIR – Assessment or Improvement?
We assess the datasets to improve their FAIRness.
Improvement is an ongoing effort.
Let’s focus on the outcomes (improvement & uptake), not just
outputs (metric, score, badge, recommendation) :)

Related Resources
• Devaraju, A. and Huber, R. (2021). An automated solution for measuring the progress toward
FAIR research data. Patterns. https://doi.org/10.1016/j.patter.2021.100370
• Devaraju, Anusuriya, Huber, Robert, Mokrane, Mustapha, Herterich, Patricia, Cepinskas, Linas,
de Vries, Jerry, L'Hours, Herve, Davidson, Joy, & Angus White. (2020). FAIRsFAIR Data Object
Assessment Metrics (0.5). Zenodo. https://doi.org/10.5281/zenodo.6461229
• Devaraju, A, Mokrane, M, Cepinskas, L, Huber, R, Herterich, P, de Vries, J, Akerman, V, L’Hours, H,
Davidson, J and Diepenbroek, M. (2021). From Conceptualization to Implementation: FAIR
Assessment of Research Data Objects. Data Science Journal, 20: 4, pp. 1–14.
https://doi.org/10.5334/dsj-2021-004.
• F-UJI Github Repository, https://github.com/pangaea-data-publisher/fuji
• F-UJI Front-end, https://f-uji.net/
