Towards a FAIR lifecycle

Presentation at the Digital Humanities in the Nordics 2020 conference, in the panel "Towards deterioration, disappearance or destruction? Discussing the critical issue of long-term sustainability of digital humanities projects".

Towards a FAIR data lifecycle
Jessica Parland-von Essen
22.10.2020
https://orcid.org/0000-0003-4460-3906

FINDABLE
• Described in relevant catalog with enough detail
• Landing page with globally unique persistent identifier
ACCESSIBLE
• Can be retrieved over the internet
• Versioning and lifecycle are documented
• Tombstone page if data is deleted
INTEROPERABLE
• Common, documented, and open formats
RE-USABLE
• Well documented and intelligible
• Rights clearly stated
https://doi.org/10.5281/zenodo.4045402
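
The checklist above can be sketched as a simple check over a metadata record. This is only an illustration: the `DatasetRecord` fields, the `OPEN_FORMATS` set and the `fair_gaps` function are hypothetical names invented for this example, not part of any FAIR specification or service.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    pid: str = ""                  # globally unique persistent identifier
    landing_page: str = ""         # URL of the landing (or tombstone) page
    catalog_description: str = ""  # description in a relevant catalog
    file_formats: list = field(default_factory=list)
    license: str = ""              # rights statement
    version: str = ""              # documented version


# Assumed, non-exhaustive set of open formats for this sketch.
OPEN_FORMATS = {"csv", "json", "xml", "txt"}


def fair_gaps(record: DatasetRecord) -> list:
    """Return the checklist items from the slide that the record does not meet."""
    gaps = []
    if not record.catalog_description:
        gaps.append("Findable: no catalog description")
    if not (record.pid and record.landing_page):
        gaps.append("Findable: no landing page with PID")
    if not record.version:
        gaps.append("Accessible: versioning not documented")
    if not all(fmt in OPEN_FORMATS for fmt in record.file_formats):
        gaps.append("Interoperable: non-open file format")
    if not record.license:
        gaps.append("Re-usable: rights not stated")
    return gaps
```

A record with every field filled in and only open formats yields an empty list; an empty record reports the Findable, Accessible and Re-usable gaps.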

FAIR Ecosystem Components and FAIR Digital Objects
http://doi.org/10.5281/zenodo.3565428
https://doi.org/doi:10.2777/1524

Shallow FAIR and Deep FAIR
• Shallow FAIR (Research Information): necessary research information, PIDs, machine readable license
• Deep FAIR (Research Data): all data elements are machine accessible

Research Data Types
• ACTIVE DATA: raw, continuously updated
• DYNAMIC RESEARCH DATA: version controlled, possible to cite
• RESEARCH DATASET PUBLICATION: immutable
Figure labels: Documentation, validation; Research
https://doi.org/10.23978/inf.77419

Figure, lifecycle steps: Discovering, Acquiring, Accessing, Ingesting, Documenting, Preparing, Processing, Documenting, Storing, Sharing, Publishing, Citing, Preservation
Figure, related components: Metadata, Database, HPC, Code, Workflows, Articles, LAM, Semantic artefacts, PIDs

Data requirements on different levels for enabling FAIR?
• LEVEL 0: Output from automated data collection
• LEVEL 1: Near Real Time data
• LEVEL 1: Internal Working data
• LEVEL 2: Final quality-checked gap-filled dataset
• LEVEL 3: Elaborated Data Products
Figure labels: Metadata, Control, External

Interoperability and persistence
• SSHOC reference ontology
• FAIRsFAIR Recommendations for semantic artefacts
• Choosing open formats and protocols
• Good data lifecycle management planning
• Using FAIR enabling services
• Managing reproducibility vs citations
A PID should be globally unique, i.e. nobody else in the world should use the same string to refer to anything else. In practice this means that a PID has a controlled syntax and a governed namespace (generally consisting of a namespace indicator (prefix) and a local identifier (suffix)) and is issued and managed by a clearly specified registration authority.

A PID should be resolvable, i.e. provide a way for both machines and humans to access the digital object itself, the state information and/or landing page (in current practice this means the identifier can be translated to a fully defined URI, at any moment, without the requirement that it resolves to the same URL over time).

A PID should be persistent, i.e. remain unique and resolvable with a persistent syntax. The object it represents should ideally also be persistent, but even if that last persistence is broken, the PID should guarantee not to be reused for any other object in the future.
Persistent Identifiers
https://doi.org/10.5281/zenodo.4001631
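
The prefix/suffix structure and the "translate to a URI" step can be illustrated with a DOI. The sketch below assumes the DOI convention of splitting at the first slash and uses the doi.org resolver already seen in the links of this deck; `ParsedPid`, `parse_doi` and `resolver_url` are hypothetical helper names, and other PID systems use different syntaxes.

```python
from typing import NamedTuple


class ParsedPid(NamedTuple):
    prefix: str  # namespace indicator, governed by the registration authority
    suffix: str  # local identifier within that namespace


def parse_doi(doi: str) -> ParsedPid:
    """Split a DOI such as '10.5281/zenodo.4001631' at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI-like identifier: {doi!r}")
    return ParsedPid(prefix, suffix)


def resolver_url(doi: str) -> str:
    """Translate the identifier into a URL handled by the doi.org resolver."""
    pid = parse_doi(doi)
    return f"https://doi.org/{pid.prefix}/{pid.suffix}"
```

The URL built here only points at the resolver. Where the resolver redirects to may change over time, which is exactly the property described above: the identifier stays the same while the location can move.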

Co-creation & co-development
23/10/2020
Always design a thing by considering it in its next larger context – a chair in a room, a room in a house, a house in an environment, an environment in a city plan.
Eliel Saarinen, Finnish architect (1873–1950)
LA2 / CC BY-SA, Wikimedia
(https://creativecommons.org/licenses/by-sa/4.0)

Editor's Notes

F = Findable: when the material has a persistent identifier, e.g. a DOI, the link to the material always works even if the storage location changes.
A = Accessible: the research dataset's identifier works as a hyperlink through which the data and its descriptive metadata can be accessed with a web browser.
I = Interoperable: the interoperability principle requires open file formats and common standards; no more files that will not open.
R = Re-usable (describing the data supports this): data can be used when it has metadata and a license stating the terms of use.
Figure 8 source: TFiR https://doi.org/doi:10.2777/1524 Diagram 2 source: http://doi.org/10.5281/zenodo.3565428
The first use case is the visibility of your work and outputs. When reporting on your work to funders and publishing outputs, a basic level of FAIRness and PID use is sufficient to enable findability, simple citation and output registration with core descriptive metadata. This is the context of what is usually called research information (sometimes referred to as current research information). The most common and useful PIDs for this are the DOI for the research output and the ORCID for the creator(s) and contributor(s). Other systems identify other kinds of entities, such as organisations or protocols, to help link information further. Funders and employers might, for instance, require links to other contextual reference data such as lists of grants, funders and affiliated organisations. This kind of information is becoming more important, but the actual data quality depends on the functionality each service provides. If the services used for dataset publication or reporting don't require PIDs, or don't offer reference (meta)datasets or PID integration for these kinds of entities, it is difficult for the researcher to provide this information unambiguously.

The other use case for PIDs is the management of the research data itself. Here the PIDs can have different functions: (a) creating deep FAIR research datasets as research outputs, where all individual data elements are machine accessible (see panel F in Figure 1), or (b) managing and documenting the actual workflow, data and related information during research to ensure reproducibility of research results.
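
For the research information case, a service that accepts ORCID iDs can at least check that an identifier is well formed before storing it. The sketch below uses the ISO 7064 MOD 11-2 check digit that ORCID iDs carry as their last character; the function names are hypothetical, and a passing checksum only shows that the string is syntactically valid, not that the iD is registered.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, digits and check character of a hyphenated ORCID iD."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()
```

Applied to the ORCID iD on the title slide, `is_valid_orcid("0000-0003-4460-3906")` returns `True`, while changing the last digit makes it return `False`.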
The archive or generic repository usually operates with research dataset publications, which are a sort of publication: complex, but immutable, archived as output and evidence for research. This case is quite easy, PID-wise. But in real life there are many steps and varieties of data before this, which should be taken into account when citing, for instance. How can we support sufficient reproducibility without flooding all systems with PIDs that should be kept and maintained forever? Dynamic data citation.
It is NOT recommended that the researcher or any individual person is the PID owner; ownership, as well as management, should be governed in a sustainable way.
● Data Versioning: For retrieving earlier states of datasets, the data needs to be versioned. Markers shall indicate inserts, updates and deletes of data in the database.
● Data Timestamping: Ensure that operations on data are timestamped, i.e. any additions or deletions are marked with a timestamp.
● Data Identification: The data used shall be identified via a PID pointing to a time-stamped query, resolving to a landing page.
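
The three requirements above can be sketched with an append-only log in which every insert, update and delete is kept with a timestamp, so that an earlier state can be rebuilt for a cited query time. This is a minimal in-memory illustration under assumed names (`RowVersion`, `VersionedTable`, integer logical timestamps); a real system would use a database with versioning support and register the query and timestamp behind a PID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    key: str
    value: Optional[str]  # None marks a delete
    timestamp: int        # logical time of the insert, update or delete


class VersionedTable:
    """Append-only table: no operation overwrites or removes earlier entries."""

    def __init__(self) -> None:
        self._log = []

    def write(self, key: str, value: Optional[str], timestamp: int) -> None:
        """Record an insert or update (value given) or a delete (value None)."""
        self._log.append(RowVersion(key, value, timestamp))

    def as_of(self, timestamp: int) -> dict:
        """Replay the log to rebuild the state as it was at the given time."""
        state = {}
        for row in sorted(self._log, key=lambda r: r.timestamp):
            if row.timestamp > timestamp:
                break
            if row.value is None:
                state.pop(row.key, None)
            else:
                state[row.key] = row.value
        return state


table = VersionedTable()
table.write("station-1", "4.2", timestamp=1)  # insert
table.write("station-1", "4.3", timestamp=2)  # update
table.write("station-1", None, timestamp=3)   # delete
```

A citation would then store the query together with its timestamp. Re-running `table.as_of(1)` later still returns the value that was cited, even though the row has since been updated and deleted.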
