A talk prepared for the workshop "Working on data stewardship? Meet your peers!"
Date: 03 Oct 2017
https://www.surf.nl/agenda/2017/10/workshop-working-on-data-stewardship-meet-your-peers/index.html
Developing and assessing FAIR digital resources
1. Developing and assessing
FAIR digital resources
Michel Dumontier, Ph.D.
Distinguished Professor of Data Science
@micheldumontier::datastewards:2017-10-03
2.
"Most published research findings are false."
- John Ioannidis, Stanford University
Non-reproducibility: 64% in psychological studies and 65–89% in pharmacological studies.
PLoS Med 2005;2(8):e124
3. Grand Challenge:
How can we automatically find the evidence that supports or disputes a hypothesis, using the totality of available data, tools, and scientific knowledge?
5. Can we empower scientists to make new discoveries from the analysis of other people's data?
A common rejection module (CRM) for acute rejection across multiple organs identifies novel therapeutics for organ transplantation.
Khatri et al. JEM 210(11):2205. DOI: 10.1084/jem.20122709
6. How important is data reuse?
(0 = not important, 5 = very important)
- Tom Plasterer
http://bit.ly/BiopharmaDataStewardship
8. So what do we need to achieve this?
1. Data Science: infrastructure to identify, represent, store, transport, retrieve, aggregate, query, and analyze data, and to execute services on demand in a reproducible manner; methods to continuously uncover plausible, supported, prioritized, and experimentally verifiable associations.
2. Community: to build a massive, decentralized network of interconnected and interoperable data and services.
9. 15 principles to enhance the value of all digital resources and their metadata:
data, images, software, web services, repositories.
http://www.nature.com/articles/sdata201618
10. Rapid Adoption of the Principles
Developed and endorsed by researchers, publishers, funding agencies, and industry partners.
As of May 2017, 200+ citations since the 2016 publication.
Included in the G20 communiqué, EOSC, H2020, NIH, and more…
11. Findable
F1: (Meta)data are assigned globally unique and persistent identifiers.
F2: Data are described with rich metadata.
F3: Metadata clearly and explicitly include the identifier of the data they describe.
F4: (Meta)data are registered or indexed in a searchable resource.
Accessible
A1: (Meta)data are retrievable by their identifier using a standardized communication protocol.
A1.1: The protocol is open, free, and universally implementable.
A1.2: The protocol allows for an authentication and authorization procedure, where necessary.
A2: Metadata should be accessible even when the data is no longer available.
Interoperable
I1: (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2: (Meta)data use vocabularies that follow the FAIR principles.
I3: (Meta)data include qualified references to other (meta)data.
Reusable
R1: (Meta)data are richly described with a plurality of accurate and relevant attributes.
R1.1: (Meta)data are released with a clear and accessible data usage license.
R1.2: (Meta)data are associated with detailed provenance.
R1.3: (Meta)data meet domain-relevant community standards.
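To make several of these principles concrete, here is a minimal sketch of a metadata record that carries its own identifier, a license, and a qualified reference. Every identifier, title, and field name below is invented for illustration; real FAIR metadata would follow a community vocabulary (I2) and be registered in a searchable index (F4).

```python
# Hypothetical metadata record illustrating F1, F3, R1.1, and I3.
# All identifiers and values are invented examples, not real resources.

dataset_id = "https://doi.org/10.9999/example.dataset.1"  # F1: globally unique, persistent (hypothetical DOI)

metadata = {
    "identifier": dataset_id,  # F3: the metadata explicitly carries the data's identifier
    "title": "Example gene-expression dataset",
    "license": "https://creativecommons.org/licenses/by/4.0/",  # R1.1: clear, accessible usage license
    "wasDerivedFrom": "https://doi.org/10.9999/example.source.1",  # I3: qualified reference (hypothetical)
    "keywords": ["transplantation", "gene expression"],  # F2/R1: rich, relevant attributes
}

# A2 asks that this record stay available even if the data itself disappears.
print(metadata["identifier"])
```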
12. The Semantic Web is a global web of FAIR data:
standards for publishing, sharing, and querying facts, expert knowledge, and services;
a scalable approach for the discovery of independently constructed, collaboratively described, distributed knowledge.
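The core idea can be sketched in a few lines: facts are subject-predicate-object triples, and because the triple model is a shared standard, independently published graphs merge by simple union and can be queried uniformly. This toy code is not the talk's software; all names are invented.

```python
# Toy illustration of the Semantic Web triple model.
# Two independently constructed "graphs" (the names are invented):
graph_a = [("ex:atorvastatin", "ex:regulates", "ex:CRM_gene_1")]
graph_b = [("ex:CRM_gene_1", "ex:overexpressedIn", "ex:acute_rejection")]

# Merging independently published graphs is just set union,
# because both use the same triple representation.
merged = set(graph_a) | set(graph_b)

def query(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# What do the combined sources say about ex:CRM_gene_1 as a subject?
print(query(merged, s="ex:CRM_gene_1"))
```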
13. We are building a massive decentralized knowledge graph
Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
20. Build on API metadata specification standards
Swagger
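Swagger (now the OpenAPI Specification) describes a service's operations in a machine-readable document that registries can index. Below is a minimal, hypothetical description built as a Python dict; the service, path, and fields are invented for illustration, not taken from any real API.

```python
# A minimal, hypothetical OpenAPI-style service description.
# Serialized to JSON or YAML, such a document makes the service
# findable and its interface machine-interpretable.
import json

api_spec = {
    "openapi": "3.0.0",
    "info": {
        "title": "Example dataset lookup service",  # invented service
        "version": "1.0.0",
    },
    "paths": {
        "/datasets/{id}": {  # hypothetical endpoint
            "get": {
                "summary": "Retrieve a dataset's metadata by identifier",
                "parameters": [
                    {"name": "id", "in": "path", "required": True,
                     "schema": {"type": "string"}},
                ],
                "responses": {"200": {"description": "Metadata record"}},
            }
        }
    },
}

print(json.dumps(api_spec["info"], indent=2))
```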
21. Find new uses for existing drugs by exploring a probabilistic knowledge graph, and validate them against pipelines for drug discovery.
Finding melanoma drugs through a probabilistic knowledge graph. PeerJ Computer Science. 2017. 3:e106. https://doi.org/10.7717/peerj-cs.106
22. Investigate the claims made by others
AUC 0.91 across all therapeutic indications
23. How do we measure how FAIR something is?
24. We can ask investigators what they intend to do…
Section 2. FAIR data:
1. Making data findable, including provisions for metadata (5 questions)
2. Making data openly accessible (10 questions)
3. Making data interoperable (4 questions)
4. Increasing data re-use, through clarifying licenses (4 questions)
Additional sections:
1. Data summary (6 questions, 5 of which also cover aspects of FAIRness)
2. Allocation of resources (4 questions)
3. Data security (2 questions)
4. Ethical aspects (2 questions)
5. Other issues (2 questions)
Total: 23 + 16 = 39 questions!
https://goo.gl/Strjua
25. FAIRness
FAIRness reflects the extent to which a digital resource addresses the FAIR principles, as per the expectations defined by a community of stakeholders.
26. Stakeholders
People worried about:
- Findability
- Accessibility
- Interoperability
- Reuse
- Provenance
- Licensing
- Recognition
- Value
People who are:
- Potential users
- Resource creators
- Academics
- Publishers
- Industry
- Funding agencies
- The public
27. Metrics as explicit measures of expectation
• A metric is a standard of measurement.
• It must provide a clear definition of what is being measured and why one wants to measure it.
• It must describe the process by which a valid measurement result is obtained, so that the measurement can be reproduced by others, and it must specify what counts as a valid result.
28. Candidate Metrics
FM-F1A - Identifier uniqueness
FM-F1B - Identifier persistence
FM-F2 - Machine-readability of metadata
FM-F3 - Identifier in metadata
FM-F4 - Findable in search results
FM-A1.1 - Access protocol
FM-A1.2 - Access authorization
FM-A2 - Metadata Longevity
FM-I1 - Use of a knowledge representation language
FM-I2 - Use FAIR vocabularies
FM-I3 - Use qualified references
FM-R1.1 - Accessible licenses
FM-R1.2 - Provenance
FM-R1.3 - Standard conformance
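As an illustration of what one of these candidate metrics might look like as an automated test, here is an unofficial sketch loosely in the spirit of FM-F1A (identifier uniqueness): check whether an identifier declares a registered, globally unique scheme. The accepted-scheme list is an assumption for illustration, not the metric's actual specification.

```python
# Unofficial, illustrative sketch of an FM-F1A-style check:
# does the identifier use a scheme that guarantees global uniqueness?
from urllib.parse import urlparse

# Assumed, illustrative list of acceptable global identifier schemes.
GLOBALLY_UNIQUE_SCHEMES = {"http", "https", "doi", "urn"}

def fm_f1a_identifier_uniqueness(identifier: str) -> bool:
    """Pass if the identifier declares a recognized global scheme."""
    scheme = urlparse(identifier).scheme.lower()
    return scheme in GLOBALLY_UNIQUE_SCHEMES

print(fm_f1a_identifier_uniqueness("https://doi.org/10.1084/jem.20122709"))  # True
print(fm_f1a_identifier_uniqueness("dataset_42.csv"))                        # False: a bare filename is not globally unique
```

A real metric would also define how the measurement is performed and reproduced, per slide 27.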
30. FAIRness Index
• A community, comprised of clearly defined stakeholders (researchers, publishers, users, etc.), may define its own FAIRness Index (indicators) expressing what makes a digital resource ideally or maximally FAIR.
• A FAIRness Index is a collection of metrics that are aligned to the FAIR principles and can be consistently and transparently evaluated.
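A sketch of that idea: a community picks a collection of metric tests, applies each to a resource, and reports both the aggregate score and the per-metric results for transparency. The two stand-in metrics and the sample resource below are invented placeholders, not an official index.

```python
# Illustrative sketch of a FAIRness Index: a community-chosen set of
# metric tests, evaluated consistently and transparently.
# Both metrics below are invented stand-ins.

def has_persistent_id(resource):  # stand-in for an F1-style metric
    return resource.get("identifier", "").startswith("https://doi.org/")

def has_license(resource):        # stand-in for an R1.1-style metric
    return bool(resource.get("license"))

fairness_index = [has_persistent_id, has_license]  # the community's chosen metrics

def evaluate(resource, index):
    """Return (aggregate score, per-metric results) for transparency."""
    results = {m.__name__: m(resource) for m in index}
    return sum(results.values()) / len(index), results

resource = {"identifier": "https://doi.org/10.9999/example", "license": "CC-BY-4.0"}
score, detail = evaluate(resource, fairness_index)
print(score, detail)  # 1.0 for this fully passing example
```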
31. Measures for Digital Repositories
• Data Seal of Approval: 6 core requirements, 16 criteria
• DIN 31644, Information and documentation - Criteria for trustworthy digital archives: 10 core requirements, 34 criteria
• ISO 16363, Audit and certification of trustworthy digital repositories: 100+ criteria
32. DSA 16 Requirements
1. Mission to provide access to and preserve data
2. Licenses covering data access and use, with monitoring of compliance
3. Continuity plan
4. Ensures that data are created and used in compliance with norms
5. Adequate funding and qualified staff through clear governance
6. Mechanism(s) for expert guidance and feedback
7. Guarantees the integrity and authenticity of the data
8. Accepts data and metadata so as to ensure relevance and understandability
9. Applies documented processes in archiving
10. Documented responsibility for preservation
11. Expertise to address data and metadata quality
12. Archiving according to defined workflows
13. Enables discovery and citation
14. Enables reuse with appropriate metadata
15. Infrastructure
16. Infrastructure
https://www.datasealofapproval.org
33. Data Seal of Approval
• Self-assessment in the DSA online tool: the tool takes you through the 16 requirements and provides support.
• Once you have completed your self-assessment, you can submit it for peer review.
34. How can we gather information to assess FAIRness?
A) Self-assessment
B) FAIR assessment team
C) Automated assessment
D) Crowdsourcing
E) All of the above
36. Summary
• Coupling discovery science with research data
management is the right incentive to produce
high quality data and metadata
• New infrastructure is needed to enable this
collaboration
• A framework to assess the FAIRness of digital
resources according to community
expectations is being developed
SURFacademy, in collaboration with LCRDM and the UKB Research Data working group, is organizing a networking event on 3 October 2017 for data stewards and others who support researchers at universities and research institutions with research data management. At this meeting you will get to know colleagues and learn from each other's practice.
Abstract
Using meta-analysis of eight independent transplant datasets (236 graft biopsy samples) from four organs, we identified a common rejection module (CRM) consisting of 11 genes that were significantly overexpressed in acute rejection (AR) across all transplanted organs. The CRM genes could diagnose AR with high specificity and sensitivity in three additional independent cohorts (794 samples). In another two independent cohorts (151 renal transplant biopsies), the CRM genes correlated with the extent of graft injury and predicted future injury to a graft using protocol biopsies. Inferred drug mechanisms from the literature suggested that two FDA-approved drugs (atorvastatin and dasatinib), approved for nontransplant indications, could regulate specific CRM genes and reduce the number of graft-infiltrating cells during AR. We treated mice with HLA-mismatched mouse cardiac transplant with atorvastatin and dasatinib and showed reduction of the CRM genes, significant reduction of graft-infiltrating cells, and extended graft survival. We further validated the beneficial effect of atorvastatin on graft survival by retrospective analysis of electronic medical records of a single-center cohort of 2,515 renal transplant patients followed for up to 22 yr. In conclusion, we identified a CRM in transplantation that provides new opportunities for diagnosis, drug repositioning, and rational drug design.