SlideShare a Scribd company logo
1 of 134
Download to read offline
Mark D. Wilkinson
CBGP-UPM/INIA, Madrid
markw@illuminae.com
Data Interoperability and
FAIRness Through Existing
Web Technologies
DTL Partner Wkshp
November, 2016
The Problem
...one recent survey of 18 microarray studies found that only
two were fully reproducible using the archived data. Another
study of 19 papers in population genetics found that 30% of
analyses could not be reproduced from the archived data and
that 35% of datasets were incorrectly or insufficiently
described.
“
”
Dominique G. Roche , Loeske E. B. Kruuk,
Robert Lanfear, Sandra A. Binning (2015)
http://dx.doi.org/10.1371/journal.pbio.1002295
The Problem
We surveyed 100 datasets associated with nonmolecular
studies in journals that commonly publish ecological and
evolutionary research and have a strong PDA policy. Out of
these datasets, 56% were incomplete, and
64% were archived in a way that
partially or entirely prevented reuse.
“
”
Dominique G. Roche , Loeske E. B. Kruuk,
Robert Lanfear, Sandra A. Binning (2015)
http://dx.doi.org/10.1371/journal.pbio.1002295
FAIR
Findable
→ Globally unique, resolvable, and persistent identifiers
→ Machine-actionable contextual information supporting discovery
Accessible
→ Clearly-defined access protocol
→ Clearly-defined rules for authorization/authentication
Interoperable
→ Use shared vocabularies and/or ontologies
→ Syntactically and semantically machine-accessible format
Reusable
→ Be compliant with the F, A, and I Principles
→ Contextual information, allowing proper interpretation
→ Rich provenance information facilitating accurate citation
The Four Principles
“Skunkworks”
Task: Build a prototype
Skunkworks Participants
● Mark Wilkinson
● Michel Dumontier
● Barend Mons
● Tim Clark
● Jun Zhao
● Paolo Ciccarese
● Paul Groth
● Erik van Mulligen
● Luiz Olavo Bonino da
Silva Santos
● Matthew Gamble
● Carole Goble
● Joël Kuiper
● Morris Swertz
● Erik Schultes
● Erik Schultes
● Mercè Crosas
● Adrian Garcia
● Philip Durbin
● Jeffrey Grethe
● Katy Wolstencroft
● Sudeshna Das
● M. Emily Merrill
The Hourglass Concept
We want a large ecosystem of
apps that use FAIR Data
The Hourglass Concept
We want to support a wide
range of source providers
The Hourglass Concept
The FAIR solution between them must be THIN!
Skunkworks
participants had tons of
experience v.v.
metadata around
scholarly publication
Skunkworks
participants had tons of
experience v.v.
metadata around
scholarly publication
RDA,
Force11,
Dataverse,
Research
Objects,
NanoPubs,
Semantic
Science,
SADI,
AlzForum,
SWAN,
LSID,
…
…
…
...
There was very little
disagreement
about F,
about A,
or about R
The “I” is the big problem!
Interoperability is
Hard!!
The “I” is the big problem!
Keeping the history brief
A series of teleconferences led to the concept of putting
metadata into an iterative set of ~identical “containers”
Skunkworks Hackathons
The “containers of containers of containers” idea
was elaborated by the belief that we should also
reject any solution that required a new API
ProgrammableWeb.com already catalogues
>16,000 different Web APIs!!!!!!
Skunkworks Hackathons
The “containers of containers of containers” idea
was elaborated by the belief that we should also
reject any solution that required a new API
ProgrammableWeb.com already catalogues
>16,000 different Web APIs!!!!!!
APIs DO NOT MAKE YOU INTEROPERABLE!
Skunkworks Hackathons
The “containers of containers of containers” idea
was elaborated by the belief that we should also
reject any solution that required a new API
Skunkworks Hackathons
The “containers of containers of containers” idea
was elaborated by the belief that we should also
reject any solution that required a new API
REST is an architectural style for software
Skunkworks Hackathons
The “containers of containers of containers” idea
was elaborated by the belief that we should also
reject any solution that required a new API
Hypermedia-driven, NOT parameter-driven
The phrase “my REST API” makes almost no sense,
and in bioinformatics, is usually accompanied by a page explaining the
key/value pairs you should add to the URL to access various functions.
This Is Not REST!
Skunkworks Hackathons
Are there existing standards that are
And have the properties of
?
Uses machine-accessible standards and
representations, following a REST
paradigm
LDP
Useful Features
I
I + R
F + A
I
Defines HTTP-resolvable URIs for each of
these containers
Defines the concept of a “Container” - a
machine-actionable way to represent
repositories, data deposits, data files,
data points, and their metadata
Uses a widely accepted standard (DCAT)
to relate metadata to data →
machine-actionable data mining
Caveats
We used LDP as “inspiration”
Some parts of LDP are not really “REST”
(in our opinion) so we stripped-out
those parts in favour of a 100%
API-free solution
For the moment we require
a read-only solution;
LDP defines a read-write platform
so we don’t use those portions either
(for now)
The FAIR Accessor
In incremental detail
26
What can we describe with
FAIR Accessors?
FAIR Accessors provide a machine-actionable, structured,
REST-oriented way to publish Metadata
about a wide range of scholarly “entities”
What can we describe with
FAIR Accessors?
Warehouses (e.g. EBI)
Databases (e.g. UniProt)
Repositories (e.g. Zenodo, Dataverse, Leiden Repository, etc.)
Datasets (e.g. output from a workflow)
Research Objects (data a/o workflow a/o results a/o publications)
Data “slices” (e.g. the result of a database query)
Data Records (e.g. image, excel file, patient clinical record)
Other…
Container
Resource HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource1
MetaRecordResource2
MetaRecordResource3
...
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
HTTP GET
What does a FAIR Accessor “look like”?
Container
Resource HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource1
MetaRecordResource2
MetaRecordResource3
...
What does a FAIR Accessor “look like”?
Container
Resource HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource1
MetaRecordResource2
MetaRecordResource3
...
What does a FAIR Accessor “look like”?
In REST, a “Resource” is
“anything that can be identified”
whether that is a single thing
or an identifiable collection of things,
and whether it exists in the real-world
or only in the network.
A “Resource” is identified by a URI
Container
Resource HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource1
MetaRecordResource2
MetaRecordResource3
...
What does a FAIR Accessor “look like”?
Resources are manipulated using the HTTP
protocol on the Resource URI
For the FAIR Accessor, the only HTTP method
we currently require is HTTP GET
Container
Resource HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource1
MetaRecordResource2
MetaRecordResource3
...
What does a FAIR Accessor “look like”?
What is returned is a document full of
metadata richly describing that Container
(warehouse, database, dataset, slice, etc.)
And a list of Resources (URIs) that represent
the contained “things”
Container
Resource HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource1
MetaRecordResource2
MetaRecordResource3
...
What does a FAIR Accessor “look like”?
Looking more closely at one of those
contained things...
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
HTTP GET
What does a FAIR Accessor “look like”?
The contained thing is a Resource
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
HTTP GET
What does a FAIR Accessor “look like”?
That Resource can be resolved by HTTP GET
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
HTTP GET
What does a FAIR Accessor “look like”?
To retrieve a Metadata document describing
that resource (e.g. a single record)
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
HTTP GET
What does a FAIR Accessor “look like”?
Which record does this Metadata describe?
The foaf:primaryTopic attribute defines this
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
HTTP GET
What does a FAIR Accessor “look like”?
Using the metadata structures defined by DCAT the
FAIR Accessor may also tell you how to get the
content of the record, and what formats are available
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
HTTP GET
What does a FAIR Accessor “look like”?
In this case, the record is available in XML format
By calling HTTP GET on URL_U2
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
HTTP GET
What does a FAIR Accessor “look like”?
Or in RDF format by calling HTTP GET on URL_U1
Container
Resource HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource1
MetaRecordResource2
MetaRecordResource3
...
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
HTTP GET
What does a FAIR Accessor “look like”?
Or you may add additional layers...
Metadata
Metadata
Metadata
Metadata
DATA - format 1
DATA - format 2
Features of the FAIR Accessor
1: There is no API
GET
Interpret the Metadata
Select the desired Resource
GET
Features of the FAIR Accessor
1: There is no API
GET
Interpret the Metadata
Select the desired Resource
GET
ANY Web agent can explore/index a FAIR Accessor
(e.g. Google)
An agent that understands globally-accepted vocabularies
can explore it “intelligently”!
Features of the FAIR Accessor
46
1: There is no API
It’s difficult to get thinner than nothing ;-)
Features of the FAIR Accessor
2: Identifiers for unidentifi-ed/-able things
HTTP GET
<FAIR metadata/>
This is the ArrayExpress query
I did for paper doi:10/1234.56
Results:
MetaRecordResource1
MetaRecordResource2
MetaRecordResource3
...
Features of the FAIR Accessor
2: Identifiers for unidentifi-ed/-able things
HTTP GET
<FAIR metadata/>
This is the ArrayExpress query
I did for paper doi:10/1234.56
Results:
MetaRecordResource1
MetaRecordResource2
MetaRecordResource3
...
Should assist with reproducibility and transparency
Features of the FAIR Accessor
3: A predictable “place” for metadata
MetaRecordURL
PrimaryTopic: record 1A445
Metadata...
DATA - format 1
DATA - format 2
Different “kinds” of metadata have distinct ontological types, and
distinct document structures. There is no ambiguity regarding
what the metadata is describing - a repository or a record.
Repository metadata
MetaRecordURL
Features of the FAIR Accessor
3: Symmetry & predictable path to citation
XXX
Part of dataset XXX
Metadata...
DATA - format 1
DATA - format 2
The record metadata contains an “upward” link to the
Repository-level metadata, which should contain license and
citation information
Repository metadata:
Cite: doi:10/8847.384
License: cc-by
Features of the FAIR Accessor
4: Granularity of Access/Privacy/Security
Container
Resource HTTP GET
<FAIR metadata/>
Contains
<<184 Records>>
Contact Mark Wilkinson
For more information about
These records
Features of the FAIR Accessor
4: Granularity of Access/Privacy/Security
Container
Resource HTTP GET
<FAIR metadata/>
Contains
<<184 Records>>
Contact Mark Wilkinson
For more information about
These records
Features of the FAIR Accessor
Container HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource3
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:distribution
<<NONE>>
HTTP GET
4: Granularity of Access/Privacy/Security
Features of the FAIR Accessor
Container HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource3
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:distribution
<<NONE>>
HTTP GET
4: Granularity of Access/Privacy/Security
Container HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource3
...
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
HTTP GET
Features of the FAIR Accessor
4: Granularity of Access/Privacy/Security
Container HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource3
...
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
HTTP GET
Features of the FAIR Accessor
4: Granularity of Access/Privacy/Security
Features of the FAIR Accessor
4: Granularity of Access/Privacy/Security
Thin solution - if it’s private, do nothing! Literally!
The Real Thing
A working FAIR Accessor
Serving a “Slice” of UniProt
A real-world scenario...
Imagine you are publishing a paper
describing the evolution of proteins
involved in the RNA Processing machineries
of the fungus Aspergillus nidulans.
If you want to be a “good” scholarly publisher,
interested in transparency and reproducibility,
what is the first thing you need to explain
to your readers and reviewers?
A real-world scenario...
Imagine you are publishing a paper
describing the evolution of proteins
involved in the RNA Processing machineries
of the fungus Aspergillus nidulans.
If you want to be a “good” scholarly publisher,
interested in transparency and reproducibility,
what is the first thing you need to explain
to your readers and reviewers?
The list of proteins and their inclusion/exclusion criteria
How would this usually be done today?
The query that returns the relevant proteins
WHERE
{
?protein a up:Protein .
?protein up:organism ?organism .
?organism rdfs:subClassOf taxon:162425 .
?protein up:classifiedWith ?go .
?go rdfs:subClassOf* <http://purl.obolibrary.org/obo/GO_0006396> .
bind(replace(str(?protein),
"http://purl.uniprot.org/uniprot/", "", "i") as ?id)
}
The query that returns the relevant proteins
WHERE
{
?protein a up:Protein .
?protein up:organism ?organism .
?organism rdfs:subClassOf taxon:162425 .
?protein up:classifiedWith ?go .
?go rdfs:subClassOf* <http://purl.obolibrary.org/obo/GO_0006396> .
bind(replace(str(?protein),
"http://purl.uniprot.org/uniprot/", "", "i") as ?id)
}
NCBI Taxonomy:
Aspergillus nidulans
The query that returns the relevant proteins
WHERE
{
?protein a up:Protein .
?protein up:organism ?organism .
?organism rdfs:subClassOf taxon:162425 .
?protein up:classifiedWith ?go .
?go rdfs:subClassOf* <http://purl.obolibrary.org/obo/GO_0006396> .
bind(replace(str(?protein),
"http://purl.uniprot.org/uniprot/", "", "i") as ?id)
}
Gene Ontology:
RNA Processing
Create and publish a FAIR Accessor for that query
http://linkeddata.systems/Accessors/UniProtAccessor
Container
Resource HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource1
MetaRecordResource2
MetaRecordResource3
...
Create and publish a FAIR Accessor for that query
http://linkeddata.systems/Accessors/UniProtAccessor
Container
Resource HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource1
MetaRecordResource2
MetaRecordResource3
...
Resolve the URI
(in software or in your
browser)
Create and publish a FAIR Accessor for that query
Returns a page of metadata (in this example, in RDF)
Container
Resource HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource1
MetaRecordResource2
MetaRecordResource3
...
68
69
Note that this Metadata is about ME! I am the creator of this dataset, and may be credited for it.
I have a script that
generates these…
not a significant barrier!
Container Resource HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource1
MetaRecordResource2
MetaRecordResource3
...
Container
Resource HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource1
MetaRecordResource2
MetaRecordResource3
...
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
HTTP GET
Step down to individual Record metadata
Step down to individual Record metadata
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
HTTP GET
Software calls HTTP GET on the URL
representing the MetaRecord Resource
for the desired record in the Container
(or just click on it, or type it into your browser)
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
The document that is returned
Note the change in metadata focus!
This metadata is about the UniProt Record
(not about Mark Wilkinson).
The record described in this metadata was
created by UniProt, so the citation and
authorship information is now THEIRS, not
MINE.
Container
Resource
Symmetrical Link
back upward to the Accessor
Container
(possibly relevant metadata
up there… about Me :-) )
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
Two ways to retrieve the record - RDF or HTML
(in REST-speak, two Representations
of that Resource)
Note that this metadata record is
somewhat more FAIR, than what you
can (easily) retrieve from UniProt itself!
e.g. the UniProt record does not include
the citation or license information - you
have to manually surf around the
UniProt Web page to find that.
So the Accessor makes UniProt’s
already notably FAIR data, even more
FAIR (with respect to “R”)
How FAIR are we now?
What does the Accessor give us?
What we have achieved
We have created a FAIR record for something - i.e. a slice of a database - that was,
historically, un-recordable and un-identifiable in any formal way.F
F + A
F + R
Accessors are a standard approach to providing human & machine accessible metadata
to facilitate appropriate discovery (contextual, biological), proper usage (license) and
proper citation for any kind of data.
The discovery, accessibility, and drill-down/up behaviors do not require any novel
API, rather simply rely on global Web standards; this allows them to be indexed by
existing Web search engines
What we have achieved
F + AI
The metadata itself uses machine-accessible syntaxes, and widely adopted ontologies
and vocabularies, thus easily integrates with other metadata
A
Accessors provide a lightweight means to protect privacy while still providing the
maximum degree of transparency possible
+
Accessors can be static, or dynamic. i.e. we can provide template Accessor file(s)
that are edited in Notepad, then published together with the data; or Accessors can
dynamically generate their output from code (e.g. layered on a database server)
Can we be FAIRer?
Not good enough
for the Skunk!
So far we have only addressed
Interoperability and FAIRness of Metadata
The Holy Grail of Interoperability is to
provide interoperable Data.
FAIR Projection:
Providing FAIR Data
from non-FAIR Data
This is going to be a bit complicated, but
please be patient
Interoperability is Hard!!
Imagine the data
we need to integrate
is in a CSV file
In FigShare or Zenodo
How do we discover and integrate that data?
CSV (usually) has no inherent semantics
and semantics is ~impossible to automatically guess
Things we need to do:
We need a way to query “opaque” data blobs (like CSV) about their content
We need a way to retrieve that content in a FAIR format
We need, therefore, to model semantics for that opaque data content
We need to model various semantics for that content (one “size” doesn’t fit all!)
We need to associate those semantic models with a record or record-sets
We need a way to query those semantics determine which “size” fits our req’s
We would like to reuse semantic definitions as much as possible
We need to do all of this without creating a new API :-)
Triple Pattern Fragments
+
RDF Modeling/Mapping Language
Ruben Verborgh
Ghent University
Anastasia Dimou
Ghent University
Triple Pattern Fragments (TPF)
A REST interface for requesting/retrieving RDF Triples
(from any source)
Ruben Verborgh
“Slices” of data, from any source, are considered Resources
and are therefore represented by a distinct URL:
http://some.database.org/dataset?s=___;p=___;o=___
Calling HTTP GET on a TPF URL returns the set of Triples matching {?s, ?p, ?o}
PLUS hypermedia instructions and Resource URLs for other relevant slices.
Triple Pattern Fragments (TPF)
A REST interface for retrieving RDF Triples
(from any source)
Ruben Verborgh
For example, the “BMI” column from a patient registry is a Resource with the URL:
http://my.registry.org/patients?p=CMO:0000105 (CMO:0000105 = “body mass index””)
HTTP GET gives me all BMI triples in the registry, together with other Resource URLs
representing other “slices” that might be useful, for example:
http://my.registry.org/patients?p=CMO:0000004 (CMO:0000004 = “systolic B.P.”)
Triple Pattern Fragments (TPF)
A REST interface for retrieving RDF Triples
(from any source)
Ruben Verborgh
For example, the “BMI” column from a patient registry is a Resource with the URL:
http://my.registry.org/patients?p=CMO:0000105 (CMO:0000105 = “body mass index””)
HTTP GET gives me all BMI triples in the registry, together with other Resource URLs
representing other “slices” that might be useful, for example:
http://my.registry.org/patients?p=CMO:0000004 (CMO:0000004 = “systolic B.P.”)
We have a standard, RESTful way to
request triples from any data source
i.e. every slice of every dataset will be considered a distinct Resource
→ simply call HTTP GET on that Resource to get the Triples
But it’s a disaster!
We have no way to know what Resources are
available for any given dataset
or what those Resources “are”
(proteins? genes? patients? articles?)
...and still no defined way to generate the associated Triples
RML
A way to describe the structure of an RDF document
Anastasia Dimou
RML allows us to create models of (meta)data structures
“What could this (meta)data look like?”
RML fulfills similar objectives to DCAT Profiles, the Dublin Core
Application Profile, and ISO 11179 - Metadata Registries;
but has added advantages!
http://rml.io/RMLmappingLanguage.html
Using RML to describe the structure
and semantics of a single Triple
Map1
Predicate
Object Map
Subject
Map
Object
Map
ex:Patient
Record
subjectMap template
“http://example.org/patient/{id}”
predicateObjectMap
predicate ex:has
Variant
objectM
ap
class
Map2
parentTriplesMap
Subject
Map2
SO:0000694
(“SNP”)
subjectMap template
“http://identifiers.org/dbsnp/{snp}”
class
T
H
E
M
O
D
E
L
Using RML to describe the structure
(and semantics) of a single triple
Map1
Predicate
Object Map
Subject
Map
Object
Map
ex:Patient
Record
subjectMap template
“http://example.org/patient/{id}”
predicateObjectMap
predicate ex:has
Variant
objectM
ap
class
Map2
parentTriplesMap
Subject
Map2
SO:0000694
(“SNP”)
subjectMap template
“http://identifiers.org/dbsnp/{snp}”
class
T
H
E
M
O
D
E
L
We call this a “Triple Descriptor”
These are used to describe the
structure of data “slices” in which all
Triples have the same structure
T
H
E
M
O
D
E
L
Using RML to describe the structure
(and semantics) of a single triple
Map1
Predicate
Object Map
Subject
Map
Object
Map
ex:Patient
Record
subjectMap template
“http://example.org/patient/{id}”
predicateObjectMap
predicate ex:has
Variant
objectM
ap
class
Map2
parentTriplesMap
Subject
Map2
SO:0000694
(“SNP”)
subjectMap template
“http://identifiers.org/dbsnp/{snp}”
class
Patient:123
rdf:type
ex:Patient
Record
snp:
rs0020394
ex:hasVariant
rdf:type
ex:Patient
Record
The Data
Where are we now?
TPF - A standard, RESTful way to request Triples
Triple Descriptors - A standard way to describe
the structure and meaning of a Triple
Where are we now?
TPF - A standard, RESTful way to request Triples
Triple Descriptors - A standard way to describe
the structure and meaning of a Triple
There is still no way to associate these with the dataset that they represent
There is still no way to make these triples “real”
Wait… that’s not right!
“There is still no way to associate these
with the dataset that they represent”
???!!!
Of course there is!
We already have that!
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
HTTP GET
The FAIR Accessor carries this information
Using the metadata structures defined by DCAT the
FAIR Accessor also tells you how to get the content of
the record, and what formats are available
If we consider the TPF Resource URL to be just another
DCAT Distribution, we get...
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
dcat:Distribution_3
Source TPFrag_URL_1
format rdf+xml
If we consider the TPF Resource URL to be just another
DCAT Distribution, we get...
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
dcat:Distribution_3
Source TPFrag_URL_1
format rdf+xml
URL to the
Triple Pattern
Fragment
Server
If we consider the TPF Resource URL to be just another
DCAT Distribution, we get… now add the Triple Descriptor
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
dcat:Distribution_3
Source TPFrag_URL_1
format rdf+xml
Model: Triple_Desc_URL
If we consider the TPF Resource URL to be just another
DCAT Distribution, we get… now add the Triple Descriptor
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
dcat:Distribution_3
Source TPFrag_URL_1
format rdf+xml
Model: Triple_Desc_URL
HTTP GET on that
URL returns:
If we consider the TPF Resource URL to be just another
DCAT Distribution, we get… now add the Triple Descriptor
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
dcat:Distribution_3
Source TPFrag_URL_1
format rdf+xml
Model: Triple_Desc_URL
When a TPF Server
is associated with a
Record and a Triple
Descriptor, we call
the combination of
technologies a
FAIR
Projector
If we consider the TPF Resource URL to be just another
DCAT Distribution, we get… now add the Triple Descriptor
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
dcat:Distribution_3
Source TPFrag_URL_1
format rdf+xml
Model: Triple_Desc_URL
Calling HTTP GET on TPFrag_URL_1
will give you rdf+xml triples from
Record R
That look like
VOILA!!
Interoperability
from
I hear you objecting… I skipped something important!!!
There is still no way to
make these triples “real”
I hear you objecting… I skipped something important!!!
There is still no way to
make these triples “real”
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
dcat:Distribution_3
Source TPFrag_URL_1
format rdf+xml
Model: Triple_Desc_URL
What is this??
Sadly, there is no magic wand to create interoperability
Sadly, there is no magic wand to create interoperability
Someone has to write the TPF server, and host it somewhere
Interoperability will never come “for free”
(because semantics will never come “for free”)
However, there are reasons for optimism!
1. People must transform data anyway in order to integrate it - this is a daily
routine in most bioinformatics labs
2. For the most common file formats (e.g. CSV or Excel), there are RML-based
tools to automate these transformations; simply create an RML model of
what you want/need, and point the tool at the file.
3. Investing time into creating an RML model, vs. ad hoc “re-useless”
transformation, makes it ~trivial to create a TPF server, and therefore to create
a FAIR Projector for your own data transformation needs… and it is reusable!
AND
4. Because the RML of Triple Descriptors is itself so simple (one triple!), we can
also templatize their construction, making the entire process even more trivial
5. Citations Citations Citations! FAIR Accessors/Projectors are FAIR objects! You
can get credit if other people use your Projector for their own analyses!
Summary of FAIR Projectors
FAIR Projectors create interoperable data, associated with
interoperable metadata, that can be discovered by querying a FAIR
Port (or potentially Google, when/if Google properly indexes RDF)
F+I+R
A + I
I
FAIR Projectors can convert non-FAIR data into FAIR data, or can change
the structure, URL format, or semantics of existing FAIR data sources
FAIR Projectors can be deployed over, and provide a common interface to:
- Static Data Deposits (in any format!)
- Databases
- Triplestores
- Certain (common) types of Web Services
R+++
Triple Descriptors are FAIR entities, intended for reuse, &
None of this required a new API
How does the FAIR Projector fit into the Data FAIR Port?
How does the FAIR Projector fit into the Data FAIR Port?
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
dcat:Distribution_3
Source TPFrag_URL_1
format rdf+xml
Model: Triple_Desc_URL
How does the FAIR Projector fit into the Data FAIR Port?
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
dcat:Distribution_2
Source URL_U2
format application/xml
dcat:Distribution_3
Source TPFrag_URL_1
format rdf+xml
Model: Triple_Desc_URL HTTP GET on that
URL returns:
We know that this is
about Record R in
some repository
We know one possible
representation of a slice
of the data in Record R
Or different Accessors published
by different individuals…
Anyone can publish an Accessor
for any publicly-visible data!
Every possible representation for
Record R
Can be used simply for discovery of
the “type” of content inside of
Record R, or can be used to
discover a FAIR Projector that
executes the transformation
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
Index all possible
representations for
Record R
(and for every other
record)
FAIR Profile
Map of all semantic
representations for
Record R
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
FAIR
Port
Registry/
Broker
Index of all FAIR
Profiles in one or more
of the FAIR Ports
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
FAIR
Port
Registry/
Broker
“I need the blood creatinine levels for patient
Record R using Clinical Measurement
Ontology format”
?
http://hosp.nl/recs/R?s.CMO.o
http://hosp.nl/recs/R?s.CMO.o
http://hosp.nl/recs/R?s.CMO.o
The data you want,
in the format you
want!
Go ahead and
integrate it with
your other data!
<FAIR metadata/>
foaf:primaryTopic u:Q2348Y
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic u:Q2348Y
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic u:Q2348Y
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic u:Q2348Y
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic u:Q2348Y
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic u:Q2348Y
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
<FAIR metadata/>
foaf:primaryTopic u:Q2348Y
dcat:Distribution_3
Source TPFrag_URL
format rdf+xml
Model: Triple_Desc_URL
FAIR
Port
Registry/
Broker
“I need the functional annotations for UniProt
Q2348Y, using EDAM ontology”
?
Exactly the same process for any
data you are trying to retrieve...
http://linkdd.sys/UP/Q2348Y?s.fa.o
With all of this in-place, what will we be able to do?
With all of this in-place, what will we be able to do?
Siri?
Yes, master!
Please find me all known low molecular weight inhibitors of the
Human p65 Protein. Please separate the list based on those
that were found in curated databases, and those that were
found in self-deposited data archives. Also, keep track of the
license and citation information for each one. If you find data
that is relevant, but not public, please provide me with the
contact information for the person I need to request the data
from.
Thanks to:
Michel Dumontier - Stanford Center for Biomedical Informatics Research, Stanford, California.
Ruben Verborgh – Ghent University – imec, Ghent, Belgium
Luiz Olavo Bonino da Silva Santos - Dutch Techcentre for Life Sciences, Utrecht, The Netherlands -
Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.
Tim Clark - Department of Neurology, Massachusetts General Hospital Boston MA and Harvard Medical
School, Boston, MA, USA
Morris A. Swertz - Genomics Coordination Center and Department of Genetics, University Medical
Center Groningen, Groningen, The Netherlands
Fleur D.L. Kelpin - Genomics Coordination Center and Department of Genetics, University Medical
Center Groningen, Groningen, The Netherlands
Alasdair J. G. Gray - Department of Computer Science, School of Mathematical and Computer Sciences,
Heriot-Watt University, Edinburgh, UK
Erik A. Schultes - Department of Human Genetics, Leiden University Medical Center, The Netherlands
Erik M. van Mulligen - Department of Medical Informatics, Erasmus University Medical Center
Rotterdam, The Netherlands
Paolo Ciccarese - Perkin Elmer Innovation Lab, Cambridge MA and Harvard Medical School, Boston
MA, USA
Mark Thompson - Leiden University Medical Center, Leiden, The Netherlands
Jerven T. Bolleman - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical
Universitaire, Geneva, Switzerland
Funding for Mark Wilkinson from:
Fundacion BBVA and the UPM Isaac Peral programme, and the
Spanish Ministerio de Economía y Competitividad grant number
TIN2014-55993-R.
Additional support for FAIR Skunkworks members comes from:
European Union funded projects ELIXIR-EXCELERATE (H2020 no. 676559),
ADOPT BBMRI-ERIC (H2020 no. 676550)
CORBEL (H2020 no. 654248)
Netherlands Organisation for Scientific Research (Odex4all project)
Stichting Topconsortium voor Kennis en Innovatie High Tech Systemen en Materialen
(FAIRdICT project)
BBMRI-NL
RD-Connect and ELIXIR (Rare disease implementation study FP7 no. 305444).
You are not only welcome to share
and reuse this presentation...
...You are encouraged to!
Important: All our templates are free to use under Creative Commons Attribution License. If you use the graphic assets (photos,
icons and typographies) included in this Google Slides Templates you must keep the Credits slide or add all attributions in the last
slide notes.
FGST
Free GoogleSlides
Templates
Some graphical elements were taken from slide templates provided by:

More Related Content

What's hot

Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011 Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011 Jane Stevenson
 
Contributing to the Smart City Through Linked Library Data
Contributing to the Smart City Through Linked Library DataContributing to the Smart City Through Linked Library Data
Contributing to the Smart City Through Linked Library DataMarcia Zeng
 
Libraries and Linked Data: Looking to the Future (2)
Libraries and Linked Data: Looking to the Future (2)Libraries and Linked Data: Looking to the Future (2)
Libraries and Linked Data: Looking to the Future (2)ALATechSource
 
Inferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL RulesInferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL RulesMatthew Rowe
 
Open Source: Liberating your systems
Open Source: Liberating your systemsOpen Source: Liberating your systems
Open Source: Liberating your systemsRichard Wallis
 
Linked Data - Exposing what we have
Linked Data - Exposing what we haveLinked Data - Exposing what we have
Linked Data - Exposing what we haveRichard Wallis
 
Using Semantics to personalize medical research
Using Semantics to personalize medical researchUsing Semantics to personalize medical research
Using Semantics to personalize medical researchMark Wilkinson
 
Linked Data - Radical Change?
Linked Data -  Radical Change?Linked Data -  Radical Change?
Linked Data - Radical Change?Richard Wallis
 
DataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse IntegrationDataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse IntegrationMichael Bar-Sinai
 
GDG Meets U event - Big data & Wikidata - no lies codelab
GDG Meets U event - Big data & Wikidata -  no lies codelabGDG Meets U event - Big data & Wikidata -  no lies codelab
GDG Meets U event - Big data & Wikidata - no lies codelabCAMELIA BOBAN
 
Graph DB + Bioinformatics: Bio4j, recent applications and future directions
Graph DB + Bioinformatics:  Bio4j, recent applications and future directions Graph DB + Bioinformatics:  Bio4j, recent applications and future directions
Graph DB + Bioinformatics: Bio4j, recent applications and future directions Pablo Pareja Tobes
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale Bernadette Hyland-Wood
 
How SADI & SHARE help restore the Scientific Method to in silico science
How SADI & SHARE help restore the Scientific Method to in silico scienceHow SADI & SHARE help restore the Scientific Method to in silico science
How SADI & SHARE help restore the Scientific Method to in silico scienceMark Wilkinson
 
CrossRef Technical Information for Libraries
CrossRef Technical Information for LibrariesCrossRef Technical Information for Libraries
CrossRef Technical Information for LibrariesCrossref
 
Experience from 10 months of University Linked Data
Experience from 10 months of University Linked Data Experience from 10 months of University Linked Data
Experience from 10 months of University Linked Data Mathieu d'Aquin
 
Working with data.open.ac.uk, the Linked Data Platform of the Open University
Working with data.open.ac.uk, the Linked Data Platform of the Open UniversityWorking with data.open.ac.uk, the Linked Data Platform of the Open University
Working with data.open.ac.uk, the Linked Data Platform of the Open UniversityMathieu d'Aquin
 

What's hot (20)

Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011 Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011
 
Contributing to the Smart City Through Linked Library Data
Contributing to the Smart City Through Linked Library DataContributing to the Smart City Through Linked Library Data
Contributing to the Smart City Through Linked Library Data
 
Bio4j
Bio4jBio4j
Bio4j
 
Libraries and Linked Data: Looking to the Future (2)
Libraries and Linked Data: Looking to the Future (2)Libraries and Linked Data: Looking to the Future (2)
Libraries and Linked Data: Looking to the Future (2)
 
Inferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL RulesInferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL Rules
 
Open Source: Liberating your systems
Open Source: Liberating your systemsOpen Source: Liberating your systems
Open Source: Liberating your systems
 
Linked Data - Exposing what we have
Linked Data - Exposing what we haveLinked Data - Exposing what we have
Linked Data - Exposing what we have
 
Semantic Web and Linked Open Data
Semantic Web and Linked Open DataSemantic Web and Linked Open Data
Semantic Web and Linked Open Data
 
Using Semantics to personalize medical research
Using Semantics to personalize medical researchUsing Semantics to personalize medical research
Using Semantics to personalize medical research
 
Linked Data - Radical Change?
Linked Data -  Radical Change?Linked Data -  Radical Change?
Linked Data - Radical Change?
 
DataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse IntegrationDataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse Integration
 
GDG Meets U event - Big data & Wikidata - no lies codelab
GDG Meets U event - Big data & Wikidata -  no lies codelabGDG Meets U event - Big data & Wikidata -  no lies codelab
GDG Meets U event - Big data & Wikidata - no lies codelab
 
Graph DB + Bioinformatics: Bio4j, recent applications and future directions
Graph DB + Bioinformatics:  Bio4j, recent applications and future directions Graph DB + Bioinformatics:  Bio4j, recent applications and future directions
Graph DB + Bioinformatics: Bio4j, recent applications and future directions
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale
 
How SADI & SHARE help restore the Scientific Method to in silico science
How SADI & SHARE help restore the Scientific Method to in silico scienceHow SADI & SHARE help restore the Scientific Method to in silico science
How SADI & SHARE help restore the Scientific Method to in silico science
 
CrossRef Technical Information for Libraries
CrossRef Technical Information for LibrariesCrossRef Technical Information for Libraries
CrossRef Technical Information for Libraries
 
Experience from 10 months of University Linked Data
Experience from 10 months of University Linked Data Experience from 10 months of University Linked Data
Experience from 10 months of University Linked Data
 
Working with data.open.ac.uk, the Linked Data Platform of the Open University
Working with data.open.ac.uk, the Linked Data Platform of the Open UniversityWorking with data.open.ac.uk, the Linked Data Platform of the Open University
Working with data.open.ac.uk, the Linked Data Platform of the Open University
 
RDF data model
RDF data modelRDF data model
RDF data model
 
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
 

Viewers also liked

Mappings Validation
Mappings ValidationMappings Validation
Mappings Validationandimou
 
FAIRness through a novel combination of Web technologies
FAIRness through a novel combination of Web technologiesFAIRness through a novel combination of Web technologies
FAIRness through a novel combination of Web technologiesResearch Data Alliance
 
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Stuart Chalk
 
LDIF Lightening Talk
LDIF Lightening TalkLDIF Lightening Talk
LDIF Lightening TalkWilliam Smith
 
Assessment & adjustment for data quality used in the South African DISTRICT ...
Assessment & adjustment for data quality used in the South African DISTRICT ...Assessment & adjustment for data quality used in the South African DISTRICT ...
Assessment & adjustment for data quality used in the South African DISTRICT ...Routine Health Information NetwOrk (RHINO)
 
2014 review of data quality assessment methods
2014 review of data quality assessment methods2014 review of data quality assessment methods
2014 review of data quality assessment methodsRoger Zapata
 
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...Beniamino Murgante
 
Data quality assessment of OSM datasets of Ringroad, Kathmandu, Nepal
Data quality assessment of OSM datasets of Ringroad, Kathmandu, NepalData quality assessment of OSM datasets of Ringroad, Kathmandu, Nepal
Data quality assessment of OSM datasets of Ringroad, Kathmandu, NepalSurvey Department
 
Leveraging DBpedia for Adaptive Crowdsourcing in Linked Data Quality Assessment
Leveraging DBpedia for Adaptive Crowdsourcing in Linked Data Quality AssessmentLeveraging DBpedia for Adaptive Crowdsourcing in Linked Data Quality Assessment
Leveraging DBpedia for Adaptive Crowdsourcing in Linked Data Quality AssessmentUmair ul Hassan
 
Using Web Data Provenance for Quality Assessment
Using Web Data Provenance for Quality AssessmentUsing Web Data Provenance for Quality Assessment
Using Web Data Provenance for Quality AssessmentOlaf Hartig
 
Assessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset QualityAssessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset Qualityandimou
 
METHODS, MATHEMATICAL MODELS, DATA QUALITY ASSESSMENT AND RESULT INTERPRETATI...
METHODS, MATHEMATICAL MODELS, DATA QUALITY ASSESSMENT AND RESULT INTERPRETATI...METHODS, MATHEMATICAL MODELS, DATA QUALITY ASSESSMENT AND RESULT INTERPRETATI...
METHODS, MATHEMATICAL MODELS, DATA QUALITY ASSESSMENT AND RESULT INTERPRETATI...HTAi Bilbao 2012
 
MEASURE Evaluation Data Quality Assessment Methodology and Tools
MEASURE Evaluation Data Quality Assessment Methodology and ToolsMEASURE Evaluation Data Quality Assessment Methodology and Tools
MEASURE Evaluation Data Quality Assessment Methodology and ToolsMEASURE Evaluation
 
Data Quality Rules introduction
Data Quality Rules introductionData Quality Rules introduction
Data Quality Rules introductiondatatovalue
 
Linked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A SurveyLinked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A SurveyAmrapali Zaveri, PhD
 
Data quality overview
Data quality overviewData quality overview
Data quality overviewAlex Meadows
 
¡LA BELLEZA DE LOS ARBOLES!
¡LA BELLEZA DE LOS ARBOLES!¡LA BELLEZA DE LOS ARBOLES!
¡LA BELLEZA DE LOS ARBOLES!pipis397
 
Ata 2010 Steve Brubaker Data Analytics
Ata 2010   Steve Brubaker Data AnalyticsAta 2010   Steve Brubaker Data Analytics
Ata 2010 Steve Brubaker Data Analyticsstevebrubaker
 
ANDRE BOCELLI- SUIZA
ANDRE BOCELLI- SUIZAANDRE BOCELLI- SUIZA
ANDRE BOCELLI- SUIZApipis397
 

Viewers also liked (20)

Mappings Validation
Mappings ValidationMappings Validation
Mappings Validation
 
FAIRness through a novel combination of Web technologies
FAIRness through a novel combination of Web technologiesFAIRness through a novel combination of Web technologies
FAIRness through a novel combination of Web technologies
 
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
 
LDIF Lightening Talk
LDIF Lightening TalkLDIF Lightening Talk
LDIF Lightening Talk
 
Assessment & adjustment for data quality used in the South African DISTRICT ...
Assessment & adjustment for data quality used in the South African DISTRICT ...Assessment & adjustment for data quality used in the South African DISTRICT ...
Assessment & adjustment for data quality used in the South African DISTRICT ...
 
2014 review of data quality assessment methods
2014 review of data quality assessment methods2014 review of data quality assessment methods
2014 review of data quality assessment methods
 
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...
Data Usability Assessment for Remote Sensing Data: Accuracy of Interactive Da...
 
Data quality assessment of OSM datasets of Ringroad, Kathmandu, Nepal
Data quality assessment of OSM datasets of Ringroad, Kathmandu, NepalData quality assessment of OSM datasets of Ringroad, Kathmandu, Nepal
Data quality assessment of OSM datasets of Ringroad, Kathmandu, Nepal
 
Leveraging DBpedia for Adaptive Crowdsourcing in Linked Data Quality Assessment
Leveraging DBpedia for Adaptive Crowdsourcing in Linked Data Quality AssessmentLeveraging DBpedia for Adaptive Crowdsourcing in Linked Data Quality Assessment
Leveraging DBpedia for Adaptive Crowdsourcing in Linked Data Quality Assessment
 
LDQ 2014 DQ Methodology
LDQ 2014 DQ MethodologyLDQ 2014 DQ Methodology
LDQ 2014 DQ Methodology
 
Using Web Data Provenance for Quality Assessment
Using Web Data Provenance for Quality AssessmentUsing Web Data Provenance for Quality Assessment
Using Web Data Provenance for Quality Assessment
 
Assessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset QualityAssessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset Quality
 
METHODS, MATHEMATICAL MODELS, DATA QUALITY ASSESSMENT AND RESULT INTERPRETATI...
METHODS, MATHEMATICAL MODELS, DATA QUALITY ASSESSMENT AND RESULT INTERPRETATI...METHODS, MATHEMATICAL MODELS, DATA QUALITY ASSESSMENT AND RESULT INTERPRETATI...
METHODS, MATHEMATICAL MODELS, DATA QUALITY ASSESSMENT AND RESULT INTERPRETATI...
 
MEASURE Evaluation Data Quality Assessment Methodology and Tools
MEASURE Evaluation Data Quality Assessment Methodology and ToolsMEASURE Evaluation Data Quality Assessment Methodology and Tools
MEASURE Evaluation Data Quality Assessment Methodology and Tools
 
Data Quality Rules introduction
Data Quality Rules introductionData Quality Rules introduction
Data Quality Rules introduction
 
Linked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A SurveyLinked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A Survey
 
Data quality overview
Data quality overviewData quality overview
Data quality overview
 
¡LA BELLEZA DE LOS ARBOLES!
¡LA BELLEZA DE LOS ARBOLES!¡LA BELLEZA DE LOS ARBOLES!
¡LA BELLEZA DE LOS ARBOLES!
 
Ata 2010 Steve Brubaker Data Analytics
Ata 2010   Steve Brubaker Data AnalyticsAta 2010   Steve Brubaker Data Analytics
Ata 2010 Steve Brubaker Data Analytics
 
ANDRE BOCELLI- SUIZA
ANDRE BOCELLI- SUIZAANDRE BOCELLI- SUIZA
ANDRE BOCELLI- SUIZA
 

Similar to Data Interoperability and FAIRness Through Existing Web Technologies

Tech. session : Interoperability and Data FAIRness emerges from a novel combi...
Tech. session : Interoperability and Data FAIRness emerges from a novel combi...Tech. session : Interoperability and Data FAIRness emerges from a novel combi...
Tech. session : Interoperability and Data FAIRness emerges from a novel combi...Mark Wilkinson
 
Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013François Belleau
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Anita de Waard
 
State of the Semantic Web
State of the Semantic WebState of the Semantic Web
State of the Semantic WebIvan Herman
 
Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012François Belleau
 
Force11 JDDCP workshop presentation, @ Force2015, Oxford
Force11 JDDCP workshop presentation, @ Force2015, OxfordForce11 JDDCP workshop presentation, @ Force2015, Oxford
Force11 JDDCP workshop presentation, @ Force2015, OxfordMark Wilkinson
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod GmodJun Zhao
 
RO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsRO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsCarole Goble
 
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015Mark Wilkinson
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout Carole Goble
 
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...Cory Lampert
 
THGenius, rdf and open linked data for thesaurus management
THGenius, rdf and open linked data for thesaurus managementTHGenius, rdf and open linked data for thesaurus management
THGenius, rdf and open linked data for thesaurus management@CULT Srl
 
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...datascienceiqss
 
Lifting the Lid on Linked Data
Lifting the Lid on Linked DataLifting the Lid on Linked Data
Lifting the Lid on Linked DataJane Stevenson
 
Make our Scientific Datasets Accessible and Interoperable on the Web
Make our Scientific Datasets Accessible and Interoperable on the WebMake our Scientific Datasets Accessible and Interoperable on the Web
Make our Scientific Datasets Accessible and Interoperable on the WebFranck Michel
 
RDF and Open Linked Data, a first approach
RDF and Open Linked Data, a first approachRDF and Open Linked Data, a first approach
RDF and Open Linked Data, a first approachhorvadam
 

Similar to Data Interoperability and FAIRness Through Existing Web Technologies (20)

Tech. session : Interoperability and Data FAIRness emerges from a novel combi...
Tech. session : Interoperability and Data FAIRness emerges from a novel combi...Tech. session : Interoperability and Data FAIRness emerges from a novel combi...
Tech. session : Interoperability and Data FAIRness emerges from a novel combi...
 
Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
State of the Semantic Web
State of the Semantic WebState of the Semantic Web
State of the Semantic Web
 
Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012
 
Force11 JDDCP workshop presentation, @ Force2015, Oxford
Force11 JDDCP workshop presentation, @ Force2015, OxfordForce11 JDDCP workshop presentation, @ Force2015, Oxford
Force11 JDDCP workshop presentation, @ Force2015, Oxford
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
 
Timbuctoo 2 EASY
Timbuctoo 2 EASYTimbuctoo 2 EASY
Timbuctoo 2 EASY
 
RO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsRO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital Objects
 
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
 
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
 
THGenius, rdf and open linked data for thesaurus management
THGenius, rdf and open linked data for thesaurus managementTHGenius, rdf and open linked data for thesaurus management
THGenius, rdf and open linked data for thesaurus management
 
Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of ...
Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of ...Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of ...
Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of ...
 
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
 
ITWS Capstone (RPI, Fall 2013)
ITWS Capstone (RPI, Fall 2013)ITWS Capstone (RPI, Fall 2013)
ITWS Capstone (RPI, Fall 2013)
 
Democratizing Big Semantic Data management
Democratizing Big Semantic Data managementDemocratizing Big Semantic Data management
Democratizing Big Semantic Data management
 
Lifting the Lid on Linked Data
Lifting the Lid on Linked DataLifting the Lid on Linked Data
Lifting the Lid on Linked Data
 
Make our Scientific Datasets Accessible and Interoperable on the Web
Make our Scientific Datasets Accessible and Interoperable on the WebMake our Scientific Datasets Accessible and Interoperable on the Web
Make our Scientific Datasets Accessible and Interoperable on the Web
 
RDF and Open Linked Data, a first approach
RDF and Open Linked Data, a first approachRDF and Open Linked Data, a first approach
RDF and Open Linked Data, a first approach
 

More from Mark Wilkinson

FAIR Metrics - Presentation to NIH KC1
FAIR Metrics - Presentation to NIH KC1FAIR Metrics - Presentation to NIH KC1
FAIR Metrics - Presentation to NIH KC1Mark Wilkinson
 
Introducing the fair evaluator
Introducing the fair evaluatorIntroducing the fair evaluator
Introducing the fair evaluatorMark Wilkinson
 
FAIR Projector Builder
FAIR Projector BuilderFAIR Projector Builder
FAIR Projector BuilderMark Wilkinson
 
Sample data and other ur ls
Sample data and other ur lsSample data and other ur ls
Sample data and other ur lsMark Wilkinson
 
Example code for the SADI BMI Calculator Web Service
Example code for the SADI BMI Calculator Web ServiceExample code for the SADI BMI Calculator Web Service
Example code for the SADI BMI Calculator Web ServiceMark Wilkinson
 
Tutorial - Creating SADI semantic-web-services
Tutorial - Creating SADI semantic-web-servicesTutorial - Creating SADI semantic-web-services
Tutorial - Creating SADI semantic-web-servicesMark Wilkinson
 
Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014Mark Wilkinson
 
Enhancing Reproducibility and Transparency in Clinical Research through Seman...
Enhancing Reproducibility and Transparency in Clinical Research through Seman...Enhancing Reproducibility and Transparency in Clinical Research through Seman...
Enhancing Reproducibility and Transparency in Clinical Research through Seman...Mark Wilkinson
 
Web Science 2.0 - in silico science
Web Science 2.0 - in silico scienceWeb Science 2.0 - in silico science
Web Science 2.0 - in silico scienceMark Wilkinson
 
Web Science - ISoLA 2012
Web Science - ISoLA 2012Web Science - ISoLA 2012
Web Science - ISoLA 2012Mark Wilkinson
 
Web Science, SADI, and the Singularity
Web Science, SADI, and the SingularityWeb Science, SADI, and the Singularity
Web Science, SADI, and the SingularityMark Wilkinson
 
Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to cho...
Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to cho...Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to cho...
Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to cho...Mark Wilkinson
 
SWAT4LS 2011: SADI Knowledge Explorer Plug-in
SWAT4LS 2011: SADI Knowledge Explorer Plug-inSWAT4LS 2011: SADI Knowledge Explorer Plug-in
SWAT4LS 2011: SADI Knowledge Explorer Plug-inMark Wilkinson
 
SADI in Perl - Protege Plugin Tutorial (fixed Aug 24, 2011)
SADI in Perl - Protege Plugin Tutorial (fixed Aug 24, 2011)SADI in Perl - Protege Plugin Tutorial (fixed Aug 24, 2011)
SADI in Perl - Protege Plugin Tutorial (fixed Aug 24, 2011)Mark Wilkinson
 
Technologies, methods and challenges to data sharing and aggrigation
Technologies, methods and challenges to data sharing and aggrigationTechnologies, methods and challenges to data sharing and aggrigation
Technologies, methods and challenges to data sharing and aggrigationMark Wilkinson
 
ISoLA 2010: SADI Taverna plug-in
ISoLA 2010:  SADI Taverna plug-inISoLA 2010:  SADI Taverna plug-in
ISoLA 2010: SADI Taverna plug-inMark Wilkinson
 
The Scientific Method on the Semantic Web
The Scientific Method on the Semantic WebThe Scientific Method on the Semantic Web
The Scientific Method on the Semantic WebMark Wilkinson
 
SADI in Taverna Tutorial
SADI in Taverna TutorialSADI in Taverna Tutorial
SADI in Taverna TutorialMark Wilkinson
 

More from Mark Wilkinson (20)

FAIR Metrics - Presentation to NIH KC1
FAIR Metrics - Presentation to NIH KC1FAIR Metrics - Presentation to NIH KC1
FAIR Metrics - Presentation to NIH KC1
 
Introducing the fair evaluator
Introducing the fair evaluatorIntroducing the fair evaluator
Introducing the fair evaluator
 
FAIR Projector Builder
FAIR Projector BuilderFAIR Projector Builder
FAIR Projector Builder
 
Sample data and other ur ls
Sample data and other ur lsSample data and other ur ls
Sample data and other ur ls
 
Example code for the SADI BMI Calculator Web Service
Example code for the SADI BMI Calculator Web ServiceExample code for the SADI BMI Calculator Web Service
Example code for the SADI BMI Calculator Web Service
 
Sadi service
Sadi serviceSadi service
Sadi service
 
Tutorial - Creating SADI semantic-web-services
Tutorial - Creating SADI semantic-web-servicesTutorial - Creating SADI semantic-web-services
Tutorial - Creating SADI semantic-web-services
 
Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014
 
Enhancing Reproducibility and Transparency in Clinical Research through Seman...
Enhancing Reproducibility and Transparency in Clinical Research through Seman...Enhancing Reproducibility and Transparency in Clinical Research through Seman...
Enhancing Reproducibility and Transparency in Clinical Research through Seman...
 
SADI CSHALS 2013
SADI CSHALS 2013SADI CSHALS 2013
SADI CSHALS 2013
 
Web Science 2.0 - in silico science
Web Science 2.0 - in silico scienceWeb Science 2.0 - in silico science
Web Science 2.0 - in silico science
 
Web Science - ISoLA 2012
Web Science - ISoLA 2012Web Science - ISoLA 2012
Web Science - ISoLA 2012
 
Web Science, SADI, and the Singularity
Web Science, SADI, and the SingularityWeb Science, SADI, and the Singularity
Web Science, SADI, and the Singularity
 
Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to cho...
Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to cho...Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to cho...
Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to cho...
 
SWAT4LS 2011: SADI Knowledge Explorer Plug-in
SWAT4LS 2011: SADI Knowledge Explorer Plug-inSWAT4LS 2011: SADI Knowledge Explorer Plug-in
SWAT4LS 2011: SADI Knowledge Explorer Plug-in
 
SADI in Perl - Protege Plugin Tutorial (fixed Aug 24, 2011)
SADI in Perl - Protege Plugin Tutorial (fixed Aug 24, 2011)SADI in Perl - Protege Plugin Tutorial (fixed Aug 24, 2011)
SADI in Perl - Protege Plugin Tutorial (fixed Aug 24, 2011)
 
Technologies, methods and challenges to data sharing and aggrigation
Technologies, methods and challenges to data sharing and aggrigationTechnologies, methods and challenges to data sharing and aggrigation
Technologies, methods and challenges to data sharing and aggrigation
 
ISoLA 2010: SADI Taverna plug-in
ISoLA 2010:  SADI Taverna plug-inISoLA 2010:  SADI Taverna plug-in
ISoLA 2010: SADI Taverna plug-in
 
The Scientific Method on the Semantic Web
The Scientific Method on the Semantic WebThe Scientific Method on the Semantic Web
The Scientific Method on the Semantic Web
 
SADI in Taverna Tutorial
SADI in Taverna TutorialSADI in Taverna Tutorial
SADI in Taverna Tutorial
 

Recently uploaded

final waves properties grade 7 - third quarter
final waves properties grade 7 - third quarterfinal waves properties grade 7 - third quarter
final waves properties grade 7 - third quarterHanHyoKim
 
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11GelineAvendao
 
FBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxFBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxPayal Shrivastava
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests GlycosidesNandakishor Bhaurao Deshmukh
 
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书zdzoqco
 
complex analysis best book for solving questions.pdf
complex analysis best book for solving questions.pdfcomplex analysis best book for solving questions.pdf
complex analysis best book for solving questions.pdfSubhamKumar3239
 
DECOMPOSITION PATHWAYS of TM-alkyl complexes.pdf
DECOMPOSITION PATHWAYS of TM-alkyl complexes.pdfDECOMPOSITION PATHWAYS of TM-alkyl complexes.pdf
DECOMPOSITION PATHWAYS of TM-alkyl complexes.pdfDivyaK787011
 
DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...
DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...
DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...HafsaHussainp
 
Quarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsQuarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsCharlene Llagas
 
How we decide powerpoint presentation.pptx
How we decide powerpoint presentation.pptxHow we decide powerpoint presentation.pptx
How we decide powerpoint presentation.pptxJosielynTars
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learningvschiavoni
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxkumarsanjai28051
 
DNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxDNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxGiDMOh
 
Gas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGiovaniTrinidad
 
Immunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.pptImmunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.pptAmirRaziq1
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptxpallavirawat456
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxMedical College
 
Q4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptxQ4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptxtuking87
 

Recently uploaded (20)

final waves properties grade 7 - third quarter
final waves properties grade 7 - third quarterfinal waves properties grade 7 - third quarter
final waves properties grade 7 - third quarter
 
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
 
FBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxFBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptx
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
 
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
 
complex analysis best book for solving questions.pdf
complex analysis best book for solving questions.pdfcomplex analysis best book for solving questions.pdf
complex analysis best book for solving questions.pdf
 
DECOMPOSITION PATHWAYS of TM-alkyl complexes.pdf
DECOMPOSITION PATHWAYS of TM-alkyl complexes.pdfDECOMPOSITION PATHWAYS of TM-alkyl complexes.pdf
DECOMPOSITION PATHWAYS of TM-alkyl complexes.pdf
 
AZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTXAZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTX
 
DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...
DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...
DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...
 
Quarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsQuarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and Functions
 
How we decide powerpoint presentation.pptx
How we decide powerpoint presentation.pptxHow we decide powerpoint presentation.pptx
How we decide powerpoint presentation.pptx
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptx
 
DNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxDNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptx
 
Gas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptx
 
Immunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.pptImmunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.ppt
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
PLASMODIUM. PPTX
PLASMODIUM. PPTXPLASMODIUM. PPTX
PLASMODIUM. PPTX
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptx
 
Q4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptxQ4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptx
 

Data Interoperability and FAIRness Through Existing Web Technologies

  • 1. Mark D. Wilkinson CBGP-UPM/INIA, Madrid markw@illuminae.com Data Interoperability and FAIRness Through Existing Web Technologies DTL Partner Wkshp November, 2016
  • 2. The Problem ...one recent survey of 18 microarray studies found that only two were fully reproducible using the archived data. Another study of 19 papers in population genetics found that 30% of analyses could not be reproduced from the archived data and that 35% of datasets were incorrectly or insufficiently described. “ ” Dominique G. Roche , Loeske E. B. Kruuk, Robert Lanfear, Sandra A. Binning (2015) http://dx.doi.org/10.1371/journal.pbio.1002295
  • 3. The Problem We surveyed 100 datasets associated with nonmolecular studies in journals that commonly publish ecological and evolutionary research and have a strong PDA policy. Out of these datasets, 56% were incomplete, and 64% were archived in a way that partially or entirely prevented reuse. “ ” Dominique G. Roche , Loeske E. B. Kruuk, Robert Lanfear, Sandra A. Binning (2015) http://dx.doi.org/10.1371/journal.pbio.1002295
  • 4.
  • 5. FAIR Findable → Globally unique, resolvable, and persistent identifiers → Machine-actionable contextual information supporting discovery Accessible → Clearly-defined access protocol → Clearly-defined rules for authorization/authentication Interoperable → Use shared vocabularies and/or ontologies → Syntactically and semantically machine-accessible format Reusable → Be compliant with the F, A, and I Principles → Contextual information, allowing proper interpretation → Rich provenance information facilitating accurate citation The Four Principles
  • 7. Skunkworks Participants ● Mark Wilkinson ● Michel Dumontier ● Barend Mons ● Tim Clark ● Jun Zhao ● Paolo Ciccarese ● Paul Groth ● Erik van Mulligen ● Luiz Olavo Bonino da Silva Santos ● Matthew Gamble ● Carole Goble ● Joël Kuiper ● Morris Swertz ● Erik Schultes ● Erik Schultes ● Mercè Crosas ● Adrian Garcia ● Philip Durbin ● Jeffrey Grethe ● Katy Wolstencroft ● Sudeshna Das ● M. Emily Merrill
  • 8. The Hourglass Concept We want a large ecosystem of apps that use FAIR Data
  • 9. The Hourglass Concept We want to support a wide range of source providers
  • 10. The Hourglass Concept The FAIR solution between them must be THIN!
  • 11. Skunkworks participants had tons of experience v.v. metadata around scholarly publication
  • 12. Skunkworks participants had tons of experience v.v. metadata around scholarly publication RDA, Force11, Dataverse, Research Objects, NanoPubs, Semantic Science, SADI, AlzForum, SWAN, LSID, … … … ...
  • 13. There was very little disagreement about F, about A, or about R
  • 14. The “I” is the big problem!
  • 16. Keeping the history brief A series of teleconferences led to the concept of putting metadata into an iterative set of ~identical “containers”
  • 17. Skunkworks Hackathons The “containers of containers of containers” idea was elaborated by the belief that we should also reject any solution that required a new API ProgrammableWeb.com already catalogues >16,000 different Web APIs!!!!!!
  • 18. Skunkworks Hackathons The “containers of containers of containers” idea was elaborated by the belief that we should also reject any solution that required a new API ProgrammableWeb.com already catalogues >16,000 different Web APIs!!!!!! APIs DO NOT MAKE YOU INTEROPERABLE!
  • 19. Skunkworks Hackathons The “containers of containers of containers” idea was elaborated by the belief that we should also reject any solution that required a new API
  • 20. Skunkworks Hackathons The “containers of containers of containers” idea was elaborated by the belief that we should also reject any solution that required a new API REST is an architectural style for software
  • 21. Skunkworks Hackathons The “containers of containers of containers” idea was elaborated by the belief that we should also reject any solution that required a new API Hypermedia-driven, NOT parameter-driven The phrase “my REST API” makes almost no sense, and in bioinformatics, is usually accompanied by a page explaining the key/value pairs you should add to the URL to access various functions. This Is Not REST!
  • 22. Skunkworks Hackathons Are there existing standards that are And have the properties of ?
  • 23.
  • 24. Uses machine-accessible standards and representations, following a REST paradigm LDP Useful Features I I + R F + A I Defines HTTP-resolvable URIs for each of these containers Defines the concept of a “Container” - a machine-actionable way to represent repositories, data deposits, data files, data points, and their metadata Uses a widely accepted standard (DCAT) to relate metadata to data → machine-actionable data mining
  • 25. Caveats We used LDP as “inspiration” Some parts of LDP are not really “REST” (in our opinion) so we stripped-out those parts in favour of a 100% API-free solution For the moment we require a read-only solution; LDP defines a read-write platform so we don’t use those portions either (for now)
  • 26. The FAIR Accessor In incremental detail 26
  • 27. What can we describe with FAIR Accessors? FAIR Accessors provide a machine-actionable, structured, REST-oriented way to publish Metadata about a wide range of scholarly “entities”
  • 28. What can we describe with FAIR Accessors? Warehouses (e.g. EBI) Databases (e.g. UniProt) Repositories (e.g. Zenodo, Dataverse, Leiden Repository, etc.) Datasets (e.g. output from a workflow) Research Objects (data a/o workflow a/o results a/o publications) Data “slices” (e.g. the result of a database query) Data Records (e.g. image, excel file, patient clinical record) Other…
  • 29. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”?
  • 30. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... What does a FAIR Accessor “look like”?
  • 31. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... What does a FAIR Accessor “look like”? In REST, a “Resource” is “anything that can be identified” whether that is a single thing or an identifiable collection of things, and whether it exists in the real-world or only in the network. A “Resource” is identified by a URI
  • 32. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... What does a FAIR Accessor “look like”? Resources are manipulated using the HTTP protocol on the Resource URI For the FAIR Accessor, the only HTTP method we currently require is HTTP GET
  • 33. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... What does a FAIR Accessor “look like”? What is returned is a document full of metadata richly describing that Container (warehouse, database, dataset, slice, etc.) And a list of Resources (URIs) that represent the contained “things”
  • 34. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... What does a FAIR Accessor “look like”? Looking more closely at one of those contained things...
  • 35. MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”? The contained thing is a Resource
  • 36. MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”? That Resource can be resolved by HTTP GET
  • 37. MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”? To retrieve a Metadata document describing that resource (e.g. a single record)
  • 38. MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”? Which record does this Metadata describe? The foaf:primaryTopic attribute defines this
  • 39. MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”? Using the metadata structures defined by DCAT the FAIR Accessor may also tell you how to get the content of the record, and what formats are available
  • 40. MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”? In this case, the record is available in XML format By calling HTTP GET on URL_U2
  • 41. MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”? Or in RDF format by calling HTTP GET on URL_U1
  • 42. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”?
  • 43. Or you may add additional layers... Metadata Metadata Metadata Metadata DATA - format 1 DATA - format 2
  • 44. Features of the FAIR Accessor 1: There is no API GET Interpret the Metadata Select the desired Resource GET
  • 45. Features of the FAIR Accessor 1: There is no API GET Interpret the Metadata Select the desired Resource GET ANY Web agent can explore/index a FAIR Accessor (e.g. Google) An agent that understands globally-accepted vocabularies can explore it “intelligently”!
  • 46. Features of the FAIR Accessor 46 1: There is no API It’s difficult to get thinner than nothing ;-)
  • 47. Features of the FAIR Accessor 2: Identifiers for unidentifi-ed/-able things HTTP GET <FAIR metadata/> This is the ArrayExpress query I did for paper doi:10/1234.56 Results: MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ...
  • 48. Features of the FAIR Accessor 2: Identifiers for unidentifi-ed/-able things HTTP GET <FAIR metadata/> This is the ArrayExpress query I did for paper doi:10/1234.56 Results: MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... Should assist with reproducibility and transparency
  • 49. Features of the FAIR Accessor 3: A predictable “place” for metadata MetaRecordURL PrimaryTopic: record 1A445 Metadata... DATA - format 1 DATA - format 2 Different “kinds” of metadata have distinct ontological types, and distinct document structures. There is no ambiguity regarding what the metadata is describing - a repository or a record. Repository metadata MetaRecordURL
  • 50. Features of the FAIR Accessor 3: Symmetry & predictable path to citation XXX Part of dataset XXX Metadata... DATA - format 1 DATA - format 2 The record metadata contains an “upward” link to the Repository-level metadata, which should contain license and citation information Repository metadata: Cite: doi:10/8847.384 License: cc-by
  • 51. Features of the FAIR Accessor 4: Granularity of Access/Privacy/Security Container Resource HTTP GET <FAIR metadata/> Contains <<184 Records>> Contact Mark Wilkinson For more information about These records
  • 52. Features of the FAIR Accessor 4: Granularity of Access/Privacy/Security Container Resource HTTP GET <FAIR metadata/> Contains <<184 Records>> Contact Mark Wilkinson For more information about These records
  • 53. Features of the FAIR Accessor Container HTTP GET <FAIR metadata/> Contains MetaRecordResource3 MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:distribution <<NONE>> HTTP GET 4: Granularity of Access/Privacy/Security
  • 54. Features of the FAIR Accessor Container HTTP GET <FAIR metadata/> Contains MetaRecordResource3 MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:distribution <<NONE>> HTTP GET 4: Granularity of Access/Privacy/Security
  • 55. Container HTTP GET <FAIR metadata/> Contains MetaRecordResource3 ... MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml HTTP GET Features of the FAIR Accessor 4: Granularity of Access/Privacy/Security
  • 56. Container HTTP GET <FAIR metadata/> Contains MetaRecordResource3 ... MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml HTTP GET Features of the FAIR Accessor 4: Granularity of Access/Privacy/Security
  • 57. Features of the FAIR Accessor 4: Granularity of Access/Privacy/Security Thin solution - if it’s private, do nothing! Literally!
  • 58. The Real Thing A working FAIR Accessor Serving a “Slice” of UniProt
  • 59. A real-world scenario... Imagine you are publishing a paper describing the evolution of proteins involved in the RNA Processing machineries of the fungus Aspergillus nidulans. If you want to be a “good” scholarly publisher, interested in transparency and reproducibility, what is the first thing you need to explain to your readers and reviewers?
  • 60. A real-world scenario... Imagine you are publishing a paper describing the evolution of proteins involved in the RNA Processing machineries of the fungus Aspergillus nidulans. If you want to be a “good” scholarly publisher, interested in transparency and reproducibility, what is the first thing you need to explain to your readers and reviewers? The list of proteins and their inclusion/exclusion criteria How would this usually be done today?
  • 61. The query that returns the relevant proteins WHERE { ?protein a up:Protein . ?protein up:organism ?organism . ?organism rdfs:subClassOf taxon:162425 . ?protein up:classifiedWith ?go . ?go rdfs:subClassOf* <http://purl.obolibrary.org/obo/GO_0006396> . bind(replace(str(?protein), "http://purl.uniprot.org/uniprot/", "", "i") as ?id) }
  • 62. The query that returns the relevant proteins WHERE { ?protein a up:Protein . ?protein up:organism ?organism . ?organism rdfs:subClassOf taxon:162425 . ?protein up:classifiedWith ?go . ?go rdfs:subClassOf* <http://purl.obolibrary.org/obo/GO_0006396> . bind(replace(str(?protein), "http://purl.uniprot.org/uniprot/", "", "i") as ?id) } NCBI Taxonomy: Aspergillus nidulans
  • 63. The query that returns the relevant proteins WHERE { ?protein a up:Protein . ?protein up:organism ?organism . ?organism rdfs:subClassOf taxon:162425 . ?protein up:classifiedWith ?go . ?go rdfs:subClassOf* <http://purl.obolibrary.org/obo/GO_0006396> . bind(replace(str(?protein), "http://purl.uniprot.org/uniprot/", "", "i") as ?id) } Gene Ontology: RNA Processing
  • 64. Create and publish a FAIR Accessor for that query http://linkeddata.systems/Accessors/UniProtAccessor Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ...
  • 65. Create and publish a FAIR Accessor for that query http://linkeddata.systems/Accessors/UniProtAccessor Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... Resolve the URI (in software or in your browser)
  • 66. Create and publish a FAIR Accessor for that query Returns a page of metadata (in this example, in RDF) Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ...
  • 67.
  • 68. 68
  • 69. 69 Note that this Metadata is about ME! I am the creator of this dataset, and may be credited for it.
  • 70.
  • 71.
  • 72.
  • 73. I have a script that generates these… not a significant barrier!
  • 74. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ...
  • 75. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET Step down to individual Record metadata
  • 76. Step down to individual Record metadata MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET Software calls HTTP GET on the URL representing the MetaRecord Resource for the desired record in the Container (or just click on it, or type it into your browser)
  • 77. <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml The document that is returned
  • 78.
  • 79. Note the change in metadata focus! This metadata is about the UniProt Record (not about Mark Wilkinson). The record described in this metadata was created by UniProt, so the citation and authorship information is now THEIRS, not MINE.
  • 80. Container Resource Symmetrical Link back upward to the Accessor Container (possibly relevant metadata up there… about Me :-) )
  • 81. <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml
  • 82. Two ways to retrieve the record - RDF or HTML (in REST-speak, two Representations of that Resource)
  • 83. Note that this metadata record is somewhat more FAIR, than what you can (easily) retrieve from UniProt itself! e.g. the UniProt record does not include the citation or license information - you have to manually surf around the UniProt Web page to find that. So the Accessor makes UniProt’s already notably FAIR data, even more FAIR (with respect to “R”)
  • 84. How FAIR are we now? What does the Accessor give us?
  • 85. What we have achieved We have created a FAIR record for something - i.e. a slice of a database - that was, historically, un-recordable and un-identifiable in any formal way.F F + A F + R Accessors are a standard approach to providing human & machine accessible metadata to facilitate appropriate discovery (contextual, biological), proper usage (license) and proper citation for any kind of data. The discovery, accessibility, and drill-down/up behaviors do not require any novel API, rather simply rely on global Web standards; this allows them to be indexed by existing Web search engines
  • 86. What we have achieved F + AI The metadata itself uses machine-accessible syntaxes, and widely adopted ontologies and vocabularies, thus easily integrates with other metadata A Accessors provide a lightweight means to protect privacy while still providing the maximum degree of transparency possible + Accessors can be static, or dynamic. i.e. we can provide template Accessor file(s) that are edited in Notepad, then published together with the data; or Accessors can dynamically generate their output from code (e.g. layered on a database server)
  • 87. Can we be FAIRer?
  • 88. Not good enough for the Skunk! So far we have only addressed Interoperability and FAIRness of Metadata The Holy Grail of Interoperability is to provide interoperable Data.
  • 89. FAIR Projection: Providing FAIR Data from non-FAIR Data
  • 90. This is going to be a bit complicated, but please be patient Interoperability is Hard!!
  • 91. Imagine the data we need to integrate is in a CSV file In FigShare or Zenodo How do we discover and integrate that data? CSV (usually) has no inherent semantics and semantics is ~impossible to automatically guess
  • 92. Things we need to do: We need a way to query “opaque” data blobs (like CSV) about their content We need a way to retrieve that content in a FAIR format We need, therefore, to model semantics for that opaque data content We need to model various semantics for that content (one “size” doesn’t fit all!) We need to associate those semantic models with a record or record-sets We need a way to query those semantics determine which “size” fits our req’s We would like to reuse semantic definitions as much as possible We need to do all of this without creating a new API :-)
  • 93. Triple Pattern Fragments + RDF Modeling/Mapping Language Ruben Verborgh Ghent University Anastasia Dimou Ghent University
  • 94. Triple Pattern Fragments (TPF) A REST interface for requesting/retrieving RDF Triples (from any source) Ruben Verborgh “Slices” of data, from any source, are considered Resources and are therefore represented by a distinct URL: http://some.database.org/dataset?s=___;p=___;o=___ Calling HTTP GET on a TPF URL returns the set of Triples matching {?s, ?p, ?o} PLUS hypermedia instructions and Resource URLs for other relevant slices.
  • 95. Triple Pattern Fragments (TPF) A REST interface for retrieving RDF Triples (from any source) Ruben Verborgh For example, the “BMI” column from a patient registry is a Resource with the URL: http://my.registry.org/patients?p=CMO:0000105 (CMO:0000105 = “body mass index””) HTTP GET gives me all BMI triples in the registry, together with other Resource URLs representing other “slices” that might be useful, for example: http://my.registry.org/patients?p=CMO:0000004 (CMO:0000004 = “systolic B.P.”)
  • 96. Triple Pattern Fragments (TPF) A REST interface for retrieving RDF Triples (from any source) Ruben Verborgh For example, the “BMI” column from a patient registry is a Resource with the URL: http://my.registry.org/patients?p=CMO:0000105 (CMO:0000105 = “body mass index””) HTTP GET gives me all BMI triples in the registry, together with other Resource URLs representing other “slices” that might be useful, for example: http://my.registry.org/patients?p=CMO:0000004 (CMO:0000004 = “systolic B.P.”)
  • 97. We have a standard, RESTful way to request triples from any data source i.e. every slice of every dataset will be considered a distinct Resource → simply call HTTP GET on that Resource to get the Triples
  • 98. But it’s a disaster! We have no way to know what Resources are available for any given dataset or what those Resources “are” (proteins? genes? patients? articles?) ...and still no defined way to generate the associated Triples
  • 99. RML A way to describe the structure of an RDF document Anastasia Dimou RML allows us to create models of (meta)data structures “What could this (meta)data look like?” RML fulfills similar objectives to DCAT Profiles, the Dublin Core Application Profile, and ISO 11179 - Metadata Registries; but has added advantages! http://rml.io/RMLmappingLanguage.html
  • 100. Using RML to describe the structure and semantics of a single Triple Map1 Predicate Object Map Subject Map Object Map ex:Patient Record subjectMap template “http://example.org/patient/{id}” predicateObjectMap predicate ex:has Variant objectM ap class Map2 parentTriplesMap Subject Map2 SO:0000694 (“SNP”) subjectMap template “http://identifiers.org/dbsnp/{snp}” class T H E M O D E L
  • 101. Using RML to describe the structure (and semantics) of a single triple Map1 Predicate Object Map Subject Map Object Map ex:Patient Record subjectMap template “http://example.org/patient/{id}” predicateObjectMap predicate ex:has Variant objectM ap class Map2 parentTriplesMap Subject Map2 SO:0000694 (“SNP”) subjectMap template “http://identifiers.org/dbsnp/{snp}” class T H E M O D E L We call this a “Triple Descriptor” These are used to describe the structure of data “slices” in which all Triples have the same structure
  • 102. T H E M O D E L Using RML to describe the structure (and semantics) of a single triple Map1 Predicate Object Map Subject Map Object Map ex:Patient Record subjectMap template “http://example.org/patient/{id}” predicateObjectMap predicate ex:has Variant objectM ap class Map2 parentTriplesMap Subject Map2 SO:0000694 (“SNP”) subjectMap template “http://identifiers.org/dbsnp/{snp}” class Patient:123 rdf:type ex:Patient Record snp: rs0020394 ex:hasVariant rdf:type ex:Patient Record The Data
  • 103. Where are we now? TPF - A standard, RESTful way to request Triples Triple Descriptors - A standard way to describe the structure and meaning of a Triple
  • 104. Where are we now? TPF - A standard, RESTful way to request Triples Triple Descriptors - A standard way to describe the structure and meaning of a Triple There is still no way to associate these with the dataset that they represent There is still no way to make these triples “real”
  • 105. Wait… that’s not right! “There is still no way to associate these with the dataset that they represent” ???!!! Of course there is! We already have that!
  • 106. MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET The FAIR Accessor carries this information Using the metadata structures defined by DCAT the FAIR Accessor also tells you how to get the content of the record, and what formats are available
  • 107. If we consider the TPF Resource URL to be just another DCAT Distribution, we get... <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml dcat:Distribution_3 Source TPFrag_URL_1 format rdf+xml
  • 108. If we consider the TPF Resource URL to be just another DCAT Distribution, we get... <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml dcat:Distribution_3 Source TPFrag_URL_1 format rdf+xml URL to the Triple Pattern Fragment Server
  • 109. If we consider the TPF Resource URL to be just another DCAT Distribution, we get… now add the Triple Descriptor <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml dcat:Distribution_3 Source TPFrag_URL_1 format rdf+xml Model: Triple_Desc_URL
  • 110. If we consider the TPF Resource URL to be just another DCAT Distribution, we get… now add the Triple Descriptor <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml dcat:Distribution_3 Source TPFrag_URL_1 format rdf+xml Model: Triple_Desc_URL HTTP GET on that URL returns:
  • 111. If we consider the TPF Resource URL to be just another DCAT Distribution, we get… now add the Triple Descriptor <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml dcat:Distribution_3 Source TPFrag_URL_1 format rdf+xml Model: Triple_Desc_URL When a TPF Server is associated with a Record and a Triple Descriptor, we call the combination of technologies a FAIR Projector
  • 112. If we consider the TPF Resource URL to be just another DCAT Distribution, we get… now add the Triple Descriptor <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml dcat:Distribution_3 Source TPFrag_URL_1 format rdf+xml Model: Triple_Desc_URL Calling HTTP GET on TPFrag_URL_1 will give you rdf+xml triples from Record R That look like VOILA!! Interoperability from
  • 113. I hear you objecting… I skipped something important!!! There is still no way to make these triples “real”
  • 114. I hear you objecting… I skipped something important!!! There is still no way to make these triples “real” <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml dcat:Distribution_3 Source TPFrag_URL_1 format rdf+xml Model: Triple_Desc_URL What is this??
  • 115. Sadly, there is no magic wand to create interoperability
  • 116. Sadly, there is no magic wand to create interoperability Someone has to write the TPF server, and host it somewhere Interoperability will never come “for free” (because semantics will never come “for free”)
  • 117. However, there are reasons for optimism! 1. People must transform data anyway in order to integrate it - this is a daily routine in most bioinformatics labs 2. For the most common file formats (e.g. CSV or Excel), there are RML-based tools to automate these transformations; simply create an RML model of what you want/need, and point the tool at the file. 3. Investing time into creating an RML model, vs. ad hoc “re-useless” transformation, makes it ~trivial to create a TPF server, and therefore to create a FAIR Projector for your own data transformation needs… and it is reusable! AND 4. Because the RML of Triple Descriptors is itself so simple (one triple!), we can also templatize their construction, making the entire process even more trivial 5. Citations Citations Citations! FAIR Accessors/Projectors are FAIR objects! You can get credit if other people use your Projector for their own analyses!
  • 118. Summary of FAIR Projectors FAIR Projectors create interoperable data, associated with interoperable metadata, that can be discovered by querying a FAIR Port (or potentially Google, when/if Google properly indexes RDF) F+I+R A + I I FAIR Projectors can convert non-FAIR data into FAIR data, or can change the structure, URL format, or semantics of existing FAIR data sources FAIR Projectors can be deployed over, and provide a common interface to: - Static Data Deposits (in any format!) - Databases - Triplestores - Certain (common) types of Web Services R+++ Triple Descriptors are FAIR entities, intended for reuse, & None of this required a new API
  • 119. How does the FAIR Projector fit into the Data FAIR Port?
  • 120. How does the FAIR Projector fit into the Data FAIR Port? <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml dcat:Distribution_3 Source TPFrag_URL_1 format rdf+xml Model: Triple_Desc_URL
  • 121. How does the FAIR Projector fit into the Data FAIR Port? <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml dcat:Distribution_3 Source TPFrag_URL_1 format rdf+xml Model: Triple_Desc_URL HTTP GET on that URL returns: We know that this is about Record R in some repository We know one possible representation of a slice of the data in Record R
  • 122. Or different Accessors published by different individuals… Anyone can publish an Accessor for any publicly-visible data! Every possible representation for Record R Can be used simply for discovery of the “type” of content inside of Record R, or can be used to discover a FAIR Projector that executes the transformation
  • 123. <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL Index all possible representations for Record R (and for every other record) FAIR Profile Map of all semantic representations for Record R
  • 124. <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL FAIR Port Registry/ Broker Index of all FAIR Profiles in one or more of the FAIR Ports
  • 125. <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL FAIR Port Registry/ Broker “I need the blood creatinine levels for patient Record R using Clinical Measurement Ontology format” ? http://hosp.nl/recs/R?s.CMO.o
  • 127. http://hosp.nl/recs/R?s.CMO.o The data you want, in the format you want! Go ahead and integrate it with your other data!
  • 128. <FAIR metadata/> foaf:primaryTopic u:Q2348Y dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic u:Q2348Y dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic u:Q2348Y dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic u:Q2348Y dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic u:Q2348Y dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic u:Q2348Y dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL <FAIR metadata/> foaf:primaryTopic u:Q2348Y dcat:Distribution_3 Source TPFrag_URL format rdf+xml Model: Triple_Desc_URL FAIR Port Registry/ Broker “I need the functional annotations for UniProt Q2348Y, using EDAM ontology” ? Exactly the same process for any data you are trying to retrieve... http://linkdd.sys/UP/Q2348Y?s.fa.o
  • 129. With all of this in-place, what will we be able to do?
  • 130. With all of this in-place, what will we be able to do? Siri? Yes, master! Please find me all known low molecular weight inhibitors of the Human p65 Protein. Please separate the list based on those that were found in curated databases, and those that were found in self-deposited data archives. Also, keep track of the license and citation information for each one. If you find data that is relevant, but not public, please provide me with the contact information for the person I need to request the data from.
  • 131. Thanks to: Michel Dumontier - Stanford Center for Biomedical Informatics Research, Stanford, California. Ruben Verborgh – Ghent University – imec, Ghent, Belgium Luiz Olavo Bonino da Silva Santos - Dutch Techcentre for Life Sciences, Utrecht, The Netherlands - Vrije Universiteit Amsterdam, Amsterdam, The Netherlands. Tim Clark - Department of Neurology, Massachusetts General Hospital Boston MA and Harvard Medical School, Boston, MA, USA Morris A. Swertz - Genomics Coordination Center and Department of Genetics, University Medical Center Groningen, Groningen, The Netherlands Fleur D.L. Kelpin - Genomics Coordination Center and Department of Genetics, University Medical Center Groningen, Groningen, The Netherlands Alasdair J. G. Gray - Department of Computer Science, School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK Erik A. Schultes - Department of Human Genetics, Leiden University Medical Center, The Netherlands Erik M. van Mulligen - Department of Medical Informatics, Erasmus University Medical Center Rotterdam, The Netherlands Paolo Ciccarese - Perkin Elmer Innovation Lab, Cambridge MA and Harvard Medical School, Boston MA, USA Mark Thompson - Leiden University Medical Center, Leiden, The Netherlands Jerven T. Bolleman - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
  • 132. Funding for Mark Wilkinson from: Fundacion BBVA and the UPM Isaac Peral programme, and the Spanish Ministerio de Economía y Competitividad grant number TIN2014-55993-R. Additional support for FAIR Skunkworks members comes from: European Union funded projects ELIXIR-EXCELERATE (H2020 no. 676559), ADOPT BBMRI-ERIC (H2020 no. 676550) CORBEL (H2020 no. 654248) Netherlands Organisation for Scientific Research (Odex4all project) Stichting Topconsortium voor Kennis en Innovatie High Tech Systemen en Materialen (FAIRdICT project) BBMRI-NL RD-Connect and ELIXIR (Rare disease implementation study FP7 no. 305444).
  • 133. You are not only welcome to share and reuse this presentation... ...You are encouraged to!
  • 134. Important: All our templates are free to use under Creative Commons Attribution License. If you use the graphic assets (photos, icons and typographies) included in this Google Slides Templates you must keep the Credits slide or add all attributions in the last slide notes. FGST Free GoogleSlides Templates Some graphical elements were taken from slide templates provided by:

Editor's Notes

  1. http://www.freedigitalphotos.net/images/Business_people_g201-Businessman_shaking_hand_p88735.html