RDF what and why
Jerven Bolleman
Developer
Swiss-Prot Group
Introduction
• RDF
• It's a technology
• Cost and affordability are key concerns
]<--.+++++++++++.++++++++.---------.>++++++++[<---------->-]
<++.>+++++[<+++++++++++++>-]<.+++++++++++++.----------.>++++
+++[<---------->-]<++.>++++++++[<++++++++++>-]<.>+++[<----->
-]<.>+++[<++++++>-]<..>+++++++++[<--------->-]<--.>+++++++[<
++++++++++>-]<+++.+++++++++++.>++++++++[<----------->-]<++++
.>+++++[<+++++++++++++>-]<.>+++[<++++++>-]<-.---.++++++.----
---.----------.>++++++++[<----------->-]<+.---.[-]<<<->[-]>[
-]<<[>+>+<<-]>>[<<+>>-]>>>[-]<<<+++++++++<[>>>+<<[>+>[-]<<-]
>[<+>-]>[<<++++++++++>>>+<-]<<-<-]+++++++++>[<->-]>>+>[<[-]<
<+>>>-]>[-]+<<[>+>-<<-]<<<[>>+>+<<<-]>>>[<<<+>>>-]<>>[<+>-]<
<-[>[-]<[-]]>>+<[>[-]<-]<++++++++[<++++++<++++++>>-]>>>[>+>+
<<-]>>[<<+>>-]<[<<<<<.>>>>>-]<<<<<<.>>[-]>[-]++++[<++++++++>
-]<.>++++[<++++++++>-]<++.>+++++[<+++++++++>-]<.><+++++..---
-----.-------.>>[>>+>+<<<-]>>>[<<<+>>>-]<[<<<<++++++++++++++
.>>>>-]<<<<[-]>++++[<++++++++>-]<.>+++++++++[<+++++++++>-]<-
-.---------.>+++++++[<---------->-]<.>++++++[<+++++++++++>-]
<.+++..+++++++++++++.>++++++++[<---------->-]<--.>+++++++++[
<+++++++++>-]<--.-.>++++++++[<---------->-]<++.>++++++++[<++
++++++++>-]<++++.------------.---.>+++++++[<---------->-]<+.
>++++++++[<+++++++++++>-]<-.>++[<----------->-]<.+++++++++++
..>+++++++++[<---------->-]<-----.---.+++.---.[-]<<<]
What is RDF?
What?
Why?
SPARQL?
Examples
RDF: Resource Description Framework
• Resource
– Generalization of “Web resource”
– A thing that can be identified (but not necessarily
retrieved) on the Web
• Description
– A resource is described with statements that
specify the properties and property values of the
resource
• Statement (aka Triple)
– subject: identifies the resource
– predicate: identifies a property of the resource
– object: identifies the value of that property
Everything can be described with (loads
of) triples...
[Figure: a triple. Subject (resource), Property (resource), Object (resource or literal value)]
Related triples form a graph...
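As a rough sketch of the idea above (plain Python, not any RDF library), a triple can be held as a 3-tuple and a graph as a set of such tuples. All `ex:` names are made up for illustration and do not come from the slides.

```python
# A triple is (subject, predicate, object); a graph is a set of triples.
# All ex: names are hypothetical and exist only for this sketch.
graph = {
    ("ex:talk", "ex:title", '"RDF what and why"'),
    ("ex:talk", "ex:presentedBy", "ex:jerven"),
    ("ex:jerven", "ex:name", '"Jerven Bolleman"'),
}

def outgoing(graph, node):
    """Return the (predicate, object) edges leaving a node, sorted."""
    return sorted((p, o) for s, p, o in graph if s == node)

def reachable(graph, start):
    """Follow edges from start: related triples link up into a graph."""
    seen, todo = set(), [start]
    while todo:
        node = todo.pop()
        if node in seen:
            continue
        seen.add(node)
        todo.extend(o for s, _, o in graph if s == node)
    return seen

print(outgoing(graph, "ex:talk"))
print(sorted(reachable(graph, "ex:talk")))
```

Because the object of one triple (`ex:jerven`) is the subject of another, walking from `ex:talk` also reaches the presenter's name.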
An RDF graph can be serialized in several ways
• RDF/XML: the W3C’s official format
– XML is well established: good for application developers
– very verbose, not very “readable”
– e.g. uniprot.org/uniprot/P00750.rdf
• N-Triples
– good for loading into triple stores
– e.g. uniprot.org/uniprot/P00750.nt
• Turtle ⟵ most examples will use this
– good for reading by humans
– e.g. uniprot.org/uniprot/P00750.ttl
• JSON-LD
– easy for JavaScript/websites
• ....
• Conversion 100% lossless
A simple example
[Figure: a triple. Subject "RDF What and why", property "presented by", object “Jerven Bolleman” (literal value)]
RDF identifies resources with URIs
[Figure: a triple. Subject "UniProt.rdf What and why", property "presented by", object expasy.org/people/Jerven_Tjalling.Bolleman.htm (URI)]
Multiple URIs may identify the same thing
[Figure: a triple. expasy.org/people/Jerven_Tjalling.Bolleman.htm owl:sameAs ch.linkedin.com/in/jervenbolleman]
The life sciences have an identity problem...
• www.genenames.org/data/hgnc_data.php?hgnc_id=9993
– RGS11: regulator of G-protein signaling 11
• http://www.uniprot.org/taxonomy/9993
– European alpine marmot
• ...
What is “9993”?
Hello, I am 9993.
I like flowers?
The solution is URIs
• In RDF statements:
– subjects and predicates must be URIs
– objects may be URIs or literal values

• Advantages:
– No risk of “name clashes” when integrating data from
different sources
– Different people can make statements about the same
resource:

Distributed annotation at a global scale!
Example: From tab-delimited to semantic
[Figure: an example of a tab-delimited file converted to RDF in Turtle format]
Example: From tab-delimited to semantic
Interactions.txt (each row will become a triple)
P32234	Q9VGZ4
P32234	P25724
P32234	Q9V3H7
P42643	Q00403
P42643	P23312
P42643	P31928
P41932	Q9NAE1
P41932	Q9TYY1
P41932	Q10666
P41932	Q21921
Example step 1: Use URIs for subjects and objects
Interactions.txt
purl.uniprot.org/uniprot/P32234	...prot/Q9VGZ4
purl.uniprot.org/uniprot/P32234	...prot/P25724
purl.uniprot.org/uniprot/P32234	...prot/Q9V3H7
...prot/P42643	...prot/Q00403
...prot/P42643	...prot/P23312
...prot/P42643	...prot/P31928
...prot/P41932	...prot/Q9NAE1
...prot/P41932	...prot/Q9TYY1
...prot/P41932	...prot/Q10666
...prot/P41932	...prot/Q21921
Example step 2: Use shorthand syntax
Interactions.txt
@prefix prot: <http://purl.uniprot.org/uniprot/> .
prot:P32234 prot:Q9VGZ4 .
prot:P32234 prot:P25724 .
prot:P32234 prot:Q9V3H7 .
prot:P42643 prot:Q00403 .
prot:P42643 prot:P23312 .
prot:P42643 prot:P31928 .
prot:P41932 prot:Q9NAE1 .
prot:P41932 prot:Q9TYY1 .
prot:P41932 prot:Q10666 .
prot:P41932 prot:Q21921 .
Example step 3: Make statements
Interactions.txt
@prefix prot: <http://purl.uniprot.org/uniprot/> .
prot:P32234 interacts_with prot:Q9VGZ4 .
prot:P32234 interacts_with prot:P25724 .
prot:P32234 interacts_with prot:Q9V3H7 .
prot:P42643 interacts_with prot:Q00403 .
prot:P42643 interacts_with prot:P23312 .
prot:P42643 interacts_with prot:P31928 .
prot:P41932 interacts_with prot:Q9NAE1 .
prot:P41932 interacts_with prot:Q9TYY1 .
prot:P41932 interacts_with prot:Q10666 .
prot:P41932 interacts_with prot:Q21921 .
Example step 4: Use URIs for properties
Interactions.ttl
@prefix prot: <http://purl.uniprot.org/uniprot/> .
@prefix core: <http://purl.uniprot.org/core/> .
prot:P32234 core:interacts_with prot:Q9VGZ4 .
prot:P32234 core:interacts_with prot:P25724 .
prot:P32234 core:interacts_with prot:Q9V3H7 .
prot:P42643 core:interacts_with prot:Q00403 .
prot:P42643 core:interacts_with prot:P23312 .
prot:P42643 core:interacts_with prot:P31928 .
prot:P41932 core:interacts_with prot:Q9NAE1 .
prot:P41932 core:interacts_with prot:Q9TYY1 .
prot:P41932 core:interacts_with prot:Q10666 .
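The four steps can be sketched as one small conversion function. This is only an illustration in plain Python: the function name `interactions_to_turtle` is made up, `core:interacts_with` is the property name used on the slides (whether it exists in the real UniProt vocabulary is not checked here), and the `http://` form of the namespaces is taken from the prefixes shown later in the deck.

```python
import csv
import io

# Prefix declarations written once at the top of the Turtle output.
PREFIXES = (
    "@prefix prot: <http://purl.uniprot.org/uniprot/> .\n"
    "@prefix core: <http://purl.uniprot.org/core/> .\n"
)

def interactions_to_turtle(tab_text):
    """Turn 'accession<TAB>accession' rows into Turtle statements."""
    lines = [PREFIXES]
    for row in csv.reader(io.StringIO(tab_text), delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        left, right = (cell.strip() for cell in row)
        # Steps 1-4: URIs via the prot: shorthand, plus a URI property.
        lines.append(f"prot:{left} core:interacts_with prot:{right} .")
    return "\n".join(lines)

sample = "P32234\tQ9VGZ4\nP42643\tQ00403\n"
print(interactions_to_turtle(sample))
```

Each input row yields exactly one statement, so the output grows linearly with the tab-delimited file.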
RDF What? Quick recap
• RDF describes data with statements (aka triples)
– statement = subject + predicate + object
– related statements form a directed graph
• RDF uses URIs to identify things:
– subjects and predicates must be URIs
– objects may be URIs or literal values
• Multiple serialisation formats that are 99.999999%
automatically convertible
Why RDF? Isn’t there a simpler solution?
A very simple example: FASTA
• Why does everyone in the sequence world use
FASTA?
– The lowest common denominator
– You can put in the header what you like and I can
choose to ignore it
• BUT: You only get a sequence...
>Who|cares_about:this?
THISISWHATWEWANT
A simple example: GFF
• Some people want to exchange more than
sequences, and invented GFF:
• BUT: What do the columns mean?
– Originally, an exchange format for sequence
feature descriptions, later also used for other
annotations
– 3 versions known (to me ;)
– Not extensible without prior agreement of all users
SEQ1 EMBL atg 103 105 . + 0
SEQ1 EMBL exon 103 172 . + 0
A proper solution: XML
• There is a world beyond sequences and
bioinformatics!

• XML is an IT-industry standard
– Datatypes
– Multiple namespaces
– Schemas

• BUT:
– Hierarchical data model
– Schemas close off extension
XML represents data as a tree
• XML datatypes
– Multiple namespaces
– XML Schema closes off extensions
• Tree format
[Figure: a tree rooted at "entry" with the nodes "active", "Proton acceptor", "196", "EC" and "2.7.11.-"]
No XML standard for other relationships
prizes:a case study
Our data is a graph!
[Figure: the nodes "entry", "active", "Proton acceptor", "196", "EC" and "2.7.11.-" drawn as a graph]
RDF advantages
• W3C standard
• Can be serialized as XML or JSON
• i.e. most benefits of XML or JSON
• Generic graph structure
• URIs as a standard way to identify resources and
their properties
– data integration without name clashes
– distributed annotation
– normalization
• Extensible!
RDF is extensible
• Anyone can say Anything about Anything
– You can say something about my data
• RDF extensions remain compatible
• RDF encourages data and schema reuse
Interactions.ttl
@prefix prot: <http://purl.uniprot.org/uniprot/> .
@prefix intact: <http://fake.ebi.ac.uk/intact/example> .
prot:P32234 intact:interacts_with prot:Q9VGZ4 .
prot:P32234 intact:interacts_with prot:P25724 .
RDF data model is simple
• Everything can be said with triples

• Generic triple stores
– low maintenance data integration

• SPARQL
– SQL for RDF
– XPath for RDF
– Regular expressions for RDF
Comparison
                     Flat file   XML   RDF
Standard             NO          YES   YES
Scalable             NO          YES   YES +
Extensible           NO          NO    YES
Generic data model   NO          NO    YES
Modeling data using RDF
Most common failure in RDF world:
Philosophy over pragmatism
1. Be honest about your data
• what you have: not what you want
2. Change the concept, change the IRI
• One concept can be referred to by multiple IRIs
3. Better to “todo” than to “debate”
Model real data, not the “real world”
• Describe records that relate to real world things
• Acknowledge that they are records
• Model measurements before “facts”
Example: mouse in a lab
[Figure: a mouse with <weight> 1.5g]
Example: mouse in a lab
[Figure: the same mouse with <weight> 1.5g and <weight> 20g]
TIME: it made you a liar
Example: mouse in a lab
[Figure: the mouse has two <measurement> nodes, _:1 and _:2. _:1 has <weight> 1.5g and <age> 1week; _:2 has <weight> 20g and <age> 3week]
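A minimal sketch of the measurement pattern in plain Python, assuming the figure pairs 1.5g with 1week and 20g with 3week. The subject name `<mouse>` and the helper names are hypothetical; the point is only that each weighing gets its own node, so a later weight never contradicts an earlier one.

```python
from itertools import count

_ids = count(1)  # source of fresh blank node labels: _:1, _:2, ...

def add_measurement(graph, subject, weight, age):
    """Record one weighing as its own blank node."""
    node = f"_:{next(_ids)}"
    graph.add((subject, "<measurement>", node))
    graph.add((node, "<weight>", weight))
    graph.add((node, "<age>", age))
    return node

def weight_at(graph, subject, age):
    """Look up the weight recorded at a given age, or None."""
    for s, p, node in graph:
        if s == subject and p == "<measurement>" and (node, "<age>", age) in graph:
            for n, q, value in graph:
                if n == node and q == "<weight>":
                    return value
    return None

graph = set()
add_measurement(graph, "<mouse>", "1.5g", "1week")
add_measurement(graph, "<mouse>", "20g", "3week")
print(weight_at(graph, "<mouse>", "3week"))
```

Both weights stay true at the same time because each one is a statement about a measurement record, not about the mouse directly.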
Describing models using OWL
OWL: Web Ontology Language
• Will be presented in detail during the week
• Logical meaning added to RDF statements
• That tools use
• Classifies existing data or infers new data
• Very powerful and useful
DANGER
It is pure Logic (first order)
Classification by restricting set membership
<human> a owl:Class ;
    rdfs:subClassOf [ owl:onProperty <legs> ;
                      owl:cardinality 2 ] ;
    rdfs:subClassOf [ owl:onProperty <brains> ;
                      owl:cardinality 1 ] ;
    rdfs:subClassOf [ owl:onProperty <referenceGenome> ;
                      owl:allValuesFrom <HGCHR_genome> ] ;
…
Lose a leg → no longer human
Validating RDF Data
W3C working group in progress
• Data Shapes
• You don’t want to know how the sausage is made…
• Vendors looking forward to implementing it
• Currently not that bad, could be better
• First Working Draft
SPARQL
Why provide a public SPARQL endpoint
• A 10-person wet laboratory cannot afford:
– to host their own database in house holding all or even a bit of all life science data.
– not to have access to, and use of, existing life science information.
Not CPU time... but brain time: the right kind of optimisation
Why provide a public SPARQL endpoint
• Classical SQL can be provided on the web
– Is not practical
– No federation
– Poor standards conformance
• Local SQL is expensive
• Local JSON is no better
• Nor is local XML
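Data Integration Traditional
[Figure: Pathway.txt and UniProt.txt each pass through their own parser (Pathway Parser, UniProt Parser) and schema (Pathway Schema, UniProt Schema) and, together with own lab data, feed a data warehouse that is queried with SQL. Six $ cost markers along the way]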
Data Integration RDF/SPARQL
[Figure: Pathway.rdf, UniProt.rdf and own lab data load into a triple store that is queried with SPARQL. Cost markers: $ and $?]
Why provide a public SPARQL endpoint
• Document centric REST is not enough
– Swiss-Prot available as REST
– (over e-mail !!) since 1986
– expasy.ch since 1993
– www.uniprot.org since 2002
• Most users use a GUI, not a CLI
• Developers build GUIs on a CLI
© 2015 SIB
help@uniprot.org
Queries per month in 2015
[Figure: queries per month from 2015-01 to 2015-08, split into ask, select, construct and describe; axis ticks at 100, 10'000 and 1'000'000]
peak: 4 million per month
Real users
Mix between hard analytics and super specific
Estimate somewhere between:
300 - 1000 real humans per month
We know they are real because they take
holidays ;)
Using the Semantic Web for faster (Bio-) Research
Exercises with SPARQL
tutorial.sparql.uniprot.org
Why learn SPARQL
• Standardised formal query language
– implementation independent
• SPARQL ➔ SQL (via R2RML)
• SPARQL ➔ webservice (via SADI)
• SPARQL ➔ LDAP (e.g. SquirrelRDF)
• SPARQL ➔ RDF (triplestore e.g. OWLIM-se)
• SPARQL ➔ HADOOP/HIVE (e.g. SHARD)
• SPARQL ➔ Linked Data Fragments
– How you query independent of how you store!
Apparently it helps
kill vampires !!!
It's SPARQLy mammal time !!
Let's look at a single taxon record
www.uniprot.org/taxonomy/9993
@base <http://purl.uniprot.org/taxonomy/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix up: <http://purl.uniprot.org/core/> .
<9993> rdf:type up:Taxon ;
    up:rank up:Species ;
    up:reviewed true ;
    up:mnemonic "MARMR" ;
    up:scientificName "Marmota marmota" ;
    up:commonName "Alpine marmot" ;
    up:otherName "European marmot" ;
    rdfs:seeAlso <http://animaldiversity.ummz.umich.edu/site/accounts/information/Marmota_marmota.html> ,
        <http://www.alphagalileo.org/Organisations/ViewItem.aspx?OrganisationId=2043&ItemId=70106&CultureCode=en> ,
        <http://www.arkive.org/alpine-marmot/marmota-marmota/info.html> ,
Turtle is the RDF serialization aligned with
SPARQL
• Shorthand to avoid typing so much
– . ‘dot’ ends a statement
– ; ‘semicolon’ repeats the subject
– , ‘comma’ repeats the subject and predicate
• Prefix
– the part before ‘:’ abbreviates a URI
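To make the shorthand concrete, here is a small plain-Python sketch that expands it back into one full statement per triple. The function name `expand` and the two `example.org` URLs are assumptions for illustration; this is not a Turtle parser.

```python
def expand(subject, predicate_objects):
    """Expand Turtle-style shorthand into full statements.

    predicate_objects maps a predicate to a list of objects:
    each extra predicate plays the role of ';' (repeat the subject) and
    each extra object the role of ',' (repeat subject and predicate).
    """
    return [
        f"{subject} {predicate} {obj} ."
        for predicate, objects in predicate_objects.items()
        for obj in objects
    ]

statements = expand("<9993>", {
    "rdf:type": ["up:Taxon"],
    "up:rank": ["up:Species"],
    # Hypothetical links, standing in for the rdfs:seeAlso list.
    "rdfs:seeAlso": ["<http://example.org/a>", "<http://example.org/b>"],
})
for line in statements:
    print(line)
```

Three predicates and four objects in shorthand become four dot-terminated statements with the subject written out each time.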
Why don’t these queries work elsewhere?
• PREFIX
– On the web you often have to add these
– But some can be preconfigured
PREFIX :<http://purl.uniprot.org/core/>
SELECT ?x
FROM <http://purl.uniprot.org/taxonomy/>
WHERE {?x a :Taxon}
a = rdf:type = <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<9993> rdf:type up:Taxon ;
    up:rank up:Species ;
    up:reviewed true ;
    up:mnemonic "MARMR" ;
    up:scientificName "Marmota marmota" ;
    up:commonName "Alpine marmot" ;
    up:otherName "European marmot" ;
    rdfs:subClassOf <9992> ;
    skos:narrowerTransitive <9994> ;
rdfs:subClassOf
taxon:9994 is a more specific
classification than
rank => “The level, for nomenclatural
purposes, of a taxon in a taxonomic
hierarchy”
Let's learn SPARQL
• Queries over RDF data.
– Four basic types
• SELECT
– Returns “tab delimited” results
• ASK
– Returns true or false
• CONSTRUCT
– Makes new triples
• DESCRIBE
– Returns all triples mentioning a resource
SPARQL: queries triple pattern
taxon:9606 rdf:type core:Taxon .
SPARQL: queries triple pattern
?anyTaxon rdf:type core:Taxon .
SPARQL: queries triple pattern
SELECT ?anyTaxon
WHERE {
  ?anyTaxon rdf:type core:Taxon .
}
SPARQL: queries triple pattern
taxon:9606 rdf:type core:Taxon .
taxon:9606 core:reviewed true .
SPARQL: queries triple pattern
?anyTaxon rdf:type core:Taxon .
?anyTaxon core:reviewed true .
SPARQL: queries triple pattern
SELECT ?anyTaxon
WHERE {
  ?anyTaxon rdf:type core:Taxon .
  ?anyTaxon core:reviewed true .
}
SPARQL: queries triple pattern
SELECT ?anyTaxon
WHERE {
  ?anyTaxon rdf:type core:Taxon .
  # ?anyTaxin is a different variable from ?anyTaxon, so the two patterns do not join
  ?anyTaxin core:reviewed true .
}
SPARQL: queries triple pattern
SELECT ?anyTaxon
WHERE {
  ?anyTaxon rdf:type core:Taxon .
  # $anyTaxon and ?anyTaxon are the same variable
  $anyTaxon core:reviewed true .
}
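The triple-pattern slides can be mimicked with a toy matcher in plain Python. This is a sketch of the idea only, not how a triple store is implemented; the sample data (including `ex:other`) and the function names are made up for illustration.

```python
def match(pattern, triple, binding):
    """Try to extend binding so that pattern equals triple; None if it cannot."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term[0] in "?$":              # '?x' and '$x' name the same variable
            name = term[1:]
            if result.setdefault(name, value) != value:
                return None              # variable already bound to something else
        elif term != value:
            return None                  # constant term does not match
    return result

def select(patterns, graph):
    """Join all patterns (implicit AND) and return the solution bindings."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in graph:
                candidate = match(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions

graph = [
    ("taxon:9606", "rdf:type", "core:Taxon"),
    ("taxon:9606", "core:reviewed", "true"),
    ("taxon:9993", "rdf:type", "core:Taxon"),
    ("ex:other", "core:reviewed", "true"),
]

good = select([("?anyTaxon", "rdf:type", "core:Taxon"),
               ("?anyTaxon", "core:reviewed", "true")], graph)
typo = select([("?anyTaxon", "rdf:type", "core:Taxon"),
               ("?anyTaxin", "core:reviewed", "true")], graph)
print(len(good), len(typo))
```

With the misspelled variable the second pattern no longer constrains `?anyTaxon`, so every taxon is paired with every reviewed resource and the result set grows instead of shrinking.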
tutorial.sparql.uniprot.org
1: Select all taxa from NCBI/UniProt taxonomy
• Taxonomy at www|sparql.uniprot.org
• Matches NCBI
• Time sync
• Adds more names
• And images
Let's learn SPARQL
Shorthand a = rdf:type
2: AND join (default)
3: Shortcuts
Remember ‘;’ shortcut
4: Two variables one output column
5: Optional
• When values may be missing
– yet interesting when they are there
• Use as sub query
• Bound values from outside stay bound inside
– ?x ?y ?z . OPTIONAL {?x ?b ?c}
• ?x same variable = same thing
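OPTIONAL behaves like a left join. The sketch below shows that behaviour in plain Python under simplifying assumptions: solutions are dictionaries, the only shared variable is `taxon`, and the second resource (`ex:noCommonName`) is invented so that one row has nothing to match.

```python
graph = [
    ("taxon:9993", "up:scientificName", "Marmota marmota"),
    ("taxon:9993", "up:commonName", "Alpine marmot"),
    # Hypothetical record with no common name.
    ("ex:noCommonName", "up:scientificName", "Exemplum exempli"),
]

def required(graph, predicate, var):
    """Solutions for the pattern: ?taxon <predicate> ?var ."""
    return [{"taxon": s, var: o} for s, p, o in graph if p == predicate]

def optional(solutions, graph, predicate, var):
    """Left join: keep every solution, add ?var only where a match exists."""
    joined = []
    for solution in solutions:
        # ?taxon is already bound outside, so it stays bound inside.
        matches = [o for s, p, o in graph
                   if s == solution["taxon"] and p == predicate]
        if matches:
            joined.extend({**solution, var: o} for o in matches)
        else:
            joined.append(solution)  # no match: the row survives, ?var unbound
    return joined

rows = optional(required(graph, "up:scientificName", "name"),
                graph, "up:commonName", "common")
for row in rows:
    print(row)
```

The record without a common name still appears in the output; it simply lacks the `common` key, which is what an unbound variable looks like in this sketch.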
5: OPTIONAL commonName
6: UNION
• Allows you to combine query patterns as an OR operation.
• Joins are still from outer to inner.
UNION
Negation
• When you do not want a certain category of matches.
SELECT ?pet
WHERE {
?pet a pets:Friendly .
}
Oooops
7: Not exists (Negation 1)
8: Minus (Negation 2)
MINUS{} or FILTER (NOT EXISTS{})
• What's the difference?
– MINUS subtracts results
– NOT EXISTS tests if the sub pattern is possible at all.
• Normally the faster option.
9: MINUS all data
10: FILTER (NOT EXISTS{}) no results
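One plausible reading of slides 9 and 10 is the classic case where the inner pattern shares no variables with the outer one. The plain-Python sketch below follows the SPARQL 1.1 definitions in simplified form; the data and function names are made up for illustration.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def minus(left, right):
    """MINUS: drop a left solution only if some compatible right solution
    shares at least one variable with it."""
    return [
        row for row in left
        if not any(row.keys() & other.keys() and compatible(row, other)
                   for other in right)
    ]

def filter_not_exists(left, pattern_matches):
    """FILTER NOT EXISTS: keep a solution only if the inner pattern,
    evaluated with the outer bindings, has no match at all."""
    return [row for row in left if not pattern_matches(row)]

graph = [("ex:a", "ex:p", "ex:b"), ("ex:c", "ex:p", "ex:d")]
outer = [{"s": s, "p": p, "o": o} for s, p, o in graph]   # ?s ?p ?o
inner = [{"x": s, "y": p, "z": o} for s, p, o in graph]   # ?x ?y ?z

# No shared variables: MINUS removes nothing ("all data") ...
after_minus = minus(outer, inner)
# ... while NOT EXISTS { ?x ?y ?z } always finds a match ("no results").
after_not_exists = filter_not_exists(outer, lambda binding: len(graph) > 0)
print(len(after_minus), len(after_not_exists))
```

When the two sides do share a variable, both forms usually give the same answer; the difference only shows up in cases like this one.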
11: Negation option 3
SPARQL 1.0
SELECT ?subject ?rank
WHERE {
  ?subject up:rank ?rank .
  OPTIONAL { ?subject up:rank up:Genus .
             ?subject up:rank ?genus . }
  FILTER(! BOUND(?genus))
}
FILTERS
• You just saw it twice
– Once in the !BOUND
– Once in the NOT EXISTS
• FILTER trims a result set by possibly removing values
– FILTER does not add a value to the result
• Inside the same graph pattern, FILTER is order independent.
12: Filter
13: Filter on not in
Using implicit AND between lines
15: FILTER IN
16: FILTER using OR
FILTER on numbers
• <
– FILTER (1 < 2) (17)
• >
– FILTER (2 > 1) (18)
• =
– FILTER (1 = 1) (19)
• !=
– FILTER (1 != 2) (20)
Filters
• ?x = ?y does casting (value conversions) (21)
– "1.0"^^xsd:float = "1"^^xsd:int is true
• sameTerm(?x, ?y) does not (22)
– sameTerm("1.0"^^xsd:float, "1"^^xsd:int) is false
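The difference between `=` and `sameTerm` can be sketched in plain Python. This is a deliberately simplified stand-in: real SPARQL has detailed type promotion and error rules that are not modelled here, and the `Literal` class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    lexical: str    # the written form, e.g. "1.0"
    datatype: str   # e.g. "xsd:float"

NUMERIC = {"xsd:int", "xsd:integer", "xsd:float", "xsd:double", "xsd:decimal"}

def value_equal(a, b):
    """Rough stand-in for '=': numeric literals compare by value."""
    if a.datatype in NUMERIC and b.datatype in NUMERIC:
        return float(a.lexical) == float(b.lexical)
    return a == b

def same_term(a, b):
    """Rough stand-in for sameTerm: lexical form and datatype must both match."""
    return a == b

one_float = Literal("1.0", "xsd:float")
one_int = Literal("1", "xsd:int")
print(value_equal(one_float, one_int), same_term(one_float, one_int))
```

The two literals denote the same number, so the value comparison succeeds, but they are written differently and typed differently, so they are not the same term.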
FUNCTIONS for use in filters and in binds
• Functions
– STRLEN
– SUBSTR
– UCASE
– LCASE
– STRSTARTS
– STRENDS
– CONTAINS
– STRBEFORE
– STRAFTER
– ENCODE_FOR_URI
– CONCAT
– langMatches
– REGEX
– REPLACE
– IRI
– STR
24: SUBSTR == substring
24: STRLEN == String Length
25: CONTAINS is a case-sensitive "is it in there?"
26: REGEX, just like java|python regex
BIND
• Builds new values
– Closes the basic graph pattern (22)
• Always declare before use.
SELECT ?p WHERE {
  {
    ?taxon a :Taxon .
  }
  BIND (?taxon AS ?p)
}
BIND existing variable to a new one
27: CONCAT
BIND can assign any output
Aggregate functions
• on select line
• limited in number
– count
– sum
– avg
– min
– max
– group_concat
– sample
© 2013 SIB
30: count
31: SAMPLE should give an arbitrary result back
Follow the path
32: Path queries
33: Finding a grand parent using normal joins
34: Finding a grandParent using a path query
35: | is OR for predicate
36: Same result with UNION
37: Finding any ancestor
38: Can use the variable in a normal join afterwards
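The path slides can be approximated in plain Python: two fixed hops for a grandparent, and a repeated walk for "any ancestor". The resources `ex:parentOf9992` and `ex:root` are invented for the sketch, and the function names are hypothetical.

```python
def grandparents(graph, start, predicate="rdfs:subClassOf"):
    """Exactly two hops, like the path predicate/predicate."""
    parents = [o for s, p, o in graph if s == start and p == predicate]
    return [o for s, p, o in graph if s in parents and p == predicate]

def ancestors(graph, start, predicate="rdfs:subClassOf"):
    """One or more hops, like the path predicate+ (breadth first)."""
    found, todo = [], [start]
    while todo:
        node = todo.pop(0)
        for s, p, o in graph:
            if s == node and p == predicate and o not in found:
                found.append(o)
                todo.append(o)
    return found

graph = [
    ("taxon:9993", "rdfs:subClassOf", "taxon:9992"),
    ("taxon:9992", "rdfs:subClassOf", "ex:parentOf9992"),   # hypothetical
    ("ex:parentOf9992", "rdfs:subClassOf", "ex:root"),      # hypothetical
]

print(grandparents(graph, "taxon:9993"))
print(ancestors(graph, "taxon:9993"))
```

The `found` list doubles as a visited set, so the walk terminates even if the data contains a cycle.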
GROUP BY
• Needed for aggregate values
• After closing the where clause
– ... WHERE {?x ?y ?z} GROUP BY ?x
39: GROUP BY
HAVING
I have carrot !
HAVING
• FILTER for aggregates
• After the GROUP BY clause
– ... GROUP BY ?x HAVING (count(?y) > 2)
– ... GROUP BY ?x HAVING (min(?y) = 2)
– etc...
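GROUP BY with a HAVING on a count can be sketched in a few lines of plain Python. The function name and the sample solutions are assumptions for illustration; the sketch covers only the `count(?y) > n` form shown above.

```python
from collections import defaultdict

def group_by_having_count(solutions, key, counted, minimum):
    """Mimic: ... GROUP BY ?key HAVING (count(?counted) > minimum)."""
    groups = defaultdict(list)
    for solution in solutions:
        groups[solution[key]].append(solution[counted])
    # HAVING filters whole groups after aggregation, not individual rows.
    return {k: len(values) for k, values in groups.items()
            if len(values) > minimum}

solutions = [
    {"x": "ex:a", "y": 1}, {"x": "ex:a", "y": 2}, {"x": "ex:a", "y": 3},
    {"x": "ex:b", "y": 4},
]
print(group_by_having_count(solutions, "x", "y", 2))
```

Only `ex:a` has more than two values for `y`, so it is the only group that survives the HAVING step.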
40: HAVING
LIMIT & OFFSET
41: LIMIT and OFFSET
• OFFSET skips the first results
• LIMIT returns no more than x results
ORDER
VALUES
• Super BIND
• Provide inline data
Marmota marmota marmota
Examples
• Parameter lists are between ()
VALUES (?annotation) {
  (core:Disease_Annotation)
  (core:Disulfide_Bond_Annotation)
}
Examples
• UNDEF means no value at all
– not bound
VALUES (?annotation ?begin) {
  (core:Disease_Annotation UNDEF)
  (core:Disulfide_Bond_Annotation 2)
}
VALUES
• After declaring a set of values you can use them in your query.
SELECT ?comment WHERE {
  VALUES (?annotation ?begin) {
    (core:Disease_Annotation UNDEF)
    (core:Disulfide_Bond_Annotation 2)
  }
  ?annotation rdfs:comment ?comment .
}
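How inline data joins with the rest of a query, and what UNDEF does, can be sketched in plain Python. The `UNDEF` marker, the function name and the sample solutions (including the `begin` values 7 and 9) are assumptions for illustration.

```python
UNDEF = object()  # marker: this variable is left unbound in that row

def join_values(rows, solutions):
    """Join inline VALUES rows with pattern solutions on shared variables."""
    joined = []
    for row in rows:
        # An UNDEF cell puts no constraint on its variable.
        bound = {k: v for k, v in row.items() if v is not UNDEF}
        for solution in solutions:
            if all(solution.get(k, v) == v for k, v in bound.items()):
                joined.append({**solution, **bound})
    return joined

values = [
    {"annotation": "core:Disease_Annotation", "begin": UNDEF},
    {"annotation": "core:Disulfide_Bond_Annotation", "begin": 2},
]
solutions = [
    {"annotation": "core:Disease_Annotation", "begin": 7},
    {"annotation": "core:Disulfide_Bond_Annotation", "begin": 2},
    {"annotation": "core:Disulfide_Bond_Annotation", "begin": 9},
]
for row in join_values(values, solutions):
    print(row)
```

The disease row accepts any `begin`, while the disulfide bond row only matches the solution whose `begin` is 2.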
SERVICE: Using other sparql endpoints
• SERVICE <URL of other endpoint>
– Runs a sub query on the other endpoint and merges it back into your query.
“Life is better with friends who understand you.”
SERVICE
• Useful
– Quick experimenting with combining multiple data sources
– Quick for queries where not too much data is sent to the remote endpoint
• Slow
– When you ask for too much data
– Remote endpoint not resourced for your questions
SERVICE
• Slowly improving
• Theoretically unfixable
• Practically could be much better
• 1000x speed-up a small step away
Let's make some triples
Construction
• CONSTRUCT
– New triples
• Downloads RDF
– Does not update store
Constructing an owl:sameAs between two URIs
INSERT
• Adds data
– like construct
DELETE
• Removes data
– Triples matching are removed from the data
– Triples can be bound using where clause
DELETE INSERT
• Single atomic operation
• Transactions: a store API option
Atomic operation
I’m exhausted now
Of Course Biology is complicated
#baseURI: http://purl.uniprot.org/unirule/UR000107224
#Rule UR000107224 Created by:bridge on:2009-02-12 Modified by:rantunes on:2015-06-09
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX uniprot:<http://purl.uniprot.org/uniprot/>
PREFIX sequence:<http://purl.uniprot.org/sequences/>
PREFIX unirule:<http://purl.uniprot.org/unirules/>
PREFIX taxon:<http://purl.uniprot.org/taxonomy/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX hamap-sparql:<http://example.org/hamap_sparql/>
PREFIX up:<http://purl.uniprot.org/core/>
PREFIX faldo:<http://biohackathon.org/resource/faldo#>
PREFIX method:<http://example.org/method/>
PREFIX keyword:<http://purl.uniprot.org/keywords/>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX proteome:<http://purl.uniprot.org/proteomes/>
PREFIX hamap:<http://purl.uniprot.org/hamap/>
PREFIX annotation:<http://purl.uniprot.org/annotation/>
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
CONSTRUCT {
  ?this up:annotation ?annotation0,
      ?annotation1,
      ?annotation2,
      ?annotation3,
      ?annotation5;
    up:classifiedWith <http://purl.obolibrary.org/obo/19805>,
      <http://purl.obolibrary.org/obo/334>,
      <http://purl.obolibrary.org/obo/34354>,
      <http://purl.obolibrary.org/obo/43420>,
      <http://purl.obolibrary.org/obo/6569>,
      <http://purl.obolibrary.org/obo/8198>,
      keyword:223,
      keyword:560,
      keyword:662 .
  ?annotation0 a up:Function_Annotation;
    rdfs:comment "Catalyzes the oxidative ring opening of 3-hydroxyanthranilate to 2-amino-3-carboxymuconate semialdehyde, which
Questions

More Related Content

What's hot

Keystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joinedKeystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joinedJoel Azzopardi
 
Overview of Open Data, Linked Data and Web Science
Overview of Open Data, Linked Data and Web ScienceOverview of Open Data, Linked Data and Web Science
Overview of Open Data, Linked Data and Web ScienceHaklae Kim
 
Usage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application ScenariosUsage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application ScenariosEUCLID project
 
WWW2014 Overview of W3C Linked Data Platform 20140410
WWW2014 Overview of W3C Linked Data Platform 20140410WWW2014 Overview of W3C Linked Data Platform 20140410
WWW2014 Overview of W3C Linked Data Platform 20140410Arnaud Le Hors
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale Bernadette Hyland-Wood
 
euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)Besnik Fetahu
 
How links can make your open data even greater
How links can make your open data even greaterHow links can make your open data even greater
How links can make your open data even greaterCristina Sarasua
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataSören Auer
 
Big Linked Data - Creating Training Curricula
Big Linked Data - Creating Training CurriculaBig Linked Data - Creating Training Curricula
Big Linked Data - Creating Training CurriculaEUCLID project
 
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Build Narratives, Connect Artifacts: Linked Open Data for Cultural HeritageBuild Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Build Narratives, Connect Artifacts: Linked Open Data for Cultural HeritageOntotext
 
Exploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sourcesExploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sourcesLaura Po
 
30° Nexa Lunch Seminar - Linked Data Platform vs real world
30° Nexa Lunch Seminar - Linked Data Platform vs real world30° Nexa Lunch Seminar - Linked Data Platform vs real world
30° Nexa Lunch Seminar - Linked Data Platform vs real worldDiego Valerio Camarda
 
Introduction to linked data
Introduction to linked dataIntroduction to linked data
Introduction to linked dataLaura Po
 
Working with data.open.ac.uk, the Linked Data Platform of the Open University
Working with data.open.ac.uk, the Linked Data Platform of the Open UniversityWorking with data.open.ac.uk, the Linked Data Platform of the Open University
Working with data.open.ac.uk, the Linked Data Platform of the Open UniversityMathieu d'Aquin
 
Information Extraction and Linked Data Cloud
Information Extraction and Linked Data CloudInformation Extraction and Linked Data Cloud
Information Extraction and Linked Data CloudDhaval Thakker
 
Introduction to the Semantic Web
Introduction to the Semantic WebIntroduction to the Semantic Web
Introduction to the Semantic WebOscar Corcho
 
Experience from 10 months of University Linked Data
Experience from 10 months of University Linked Data Experience from 10 months of University Linked Data
Experience from 10 months of University Linked Data Mathieu d'Aquin
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked DataEUCLID project
 

What's hot (20)

Keystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joinedKeystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joined
 
Overview of Open Data, Linked Data and Web Science
Overview of Open Data, Linked Data and Web ScienceOverview of Open Data, Linked Data and Web Science
Overview of Open Data, Linked Data and Web Science
 
Usage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application ScenariosUsage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application Scenarios
 
WWW2014 Overview of W3C Linked Data Platform 20140410
WWW2014 Overview of W3C Linked Data Platform 20140410WWW2014 Overview of W3C Linked Data Platform 20140410
WWW2014 Overview of W3C Linked Data Platform 20140410
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)
 
How links can make your open data even greater
How links can make your open data even greaterHow links can make your open data even greater
How links can make your open data even greater
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
RDF: what and why plus a SPARQL tutorial

  • 1. RDF what and why Jerven Bolleman Developer Swiss-Prot Group
  • 2. Introduction • RDF • It's a technology • Cost and affordability are key concerns
  • 3.
  • 4. ]<--.+++++++++++.++++++++.---------.>++++++++[<---------->-] <++.>+++++[<+++++++++++++>-]<.+++++++++++++.----------.>++++ +++[<---------->-]<++.>++++++++[<++++++++++>-]<.>+++[<-----> -]<.>+++[<++++++>-]<..>+++++++++[<--------->-]<--.>+++++++[< ++++++++++>-]<+++.+++++++++++.>++++++++[<----------->-]<++++ .>+++++[<+++++++++++++>-]<.>+++[<++++++>-]<-.---.++++++.---- ---.----------.>++++++++[<----------->-]<+.---.[-]<<<->[-]>[ -]<<[>+>+<<-]>>[<<+>>-]>>>[-]<<<+++++++++<[>>>+<<[>+>[-]<<-] >[<+>-]>[<<++++++++++>>>+<-]<<-<-]+++++++++>[<->-]>>+>[<[-]< <+>>>-]>[-]+<<[>+>-<<-]<<<[>>+>+<<<-]>>>[<<<+>>>-]<>>[<+>-]< <-[>[-]<[-]]>>+<[>[-]<-]<++++++++[<++++++<++++++>>-]>>>[>+>+ <<-]>>[<<+>>-]<[<<<<<.>>>>>-]<<<<<<.>>[-]>[-]++++[<++++++++> -]<.>++++[<++++++++>-]<++.>+++++[<+++++++++>-]<.><+++++..--- -----.-------.>>[>>+>+<<<-]>>>[<<<+>>>-]<[<<<<++++++++++++++ .>>>>-]<<<<[-]>++++[<++++++++>-]<.>+++++++++[<+++++++++>-]<- -.---------.>+++++++[<---------->-]<.>++++++[<+++++++++++>-] <.+++..+++++++++++++.>++++++++[<---------->-]<--.>+++++++++[ <+++++++++>-]<--.-.>++++++++[<---------->-]<++.>++++++++[<++ ++++++++>-]<++++.------------.---.>+++++++[<---------->-]<+. >++++++++[<+++++++++++>-]<-.>++[<----------->-]<.+++++++++++ ..>+++++++++[<---------->-]<-----.---.+++.---.[-]<<<] @
  • 6. RDF: Resource Description Framework • Resource – Generalization of “Web resource” – A thing that can be identified (but not necessarily retrieved) on the Web • Description – A resource is described with statements that specify the properties and property values of the resource • Statement (aka Triple) – subject: identifies the resource – predicate: identifies a property of the resource – object: identifies the value of that property
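The subject/predicate/object structure described on this slide can be sketched in a few lines of Python. This is only an illustration: the `example.org` URIs and the `describe` helper are made-up names, not part of the talk or of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # identifies the resource being described
    predicate: str  # identifies a property of that resource
    obj: str        # the value of that property (a resource or a literal)


# Hypothetical URIs, used purely for illustration.
talk = "http://example.org/talks/rdf-what-and-why"
statements = [
    Triple(talk, "http://example.org/terms/presentedBy", '"Jerven Bolleman"'),
    Triple(talk, "http://example.org/terms/topic", "http://example.org/topics/RDF"),
]


def describe(resource, triples):
    """Return every (predicate, object) pair stated about one resource."""
    return [(t.predicate, t.obj) for t in triples if t.subject == resource]


description = describe(talk, statements)
```

A description of a resource is then simply the set of statements that share it as subject.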
  • 7. Everything can be described with (loads of) triples... [Figure: a triple. Subject (resource) → Property (resource) → Object (resource or literal value)]
  • 8. Related triples form a graph...
  • 9. An RDF graph can be serialized in several ways • RDF/XML: the W3C’s official format – XML is well established: good for application developers – very verbose, not very “readable” – e.g. uniprot.org/uniprot/P00750.rdf • N-Triples – good for loading into triple stores – e.g. uniprot.org/uniprot/P00750.nt • Turtle ⟵ most examples will use this – good for reading by humans – e.g. uniprot.org/uniprot/P00750.ttl • JSON-LD – easy for JavaScript/websites • ... • Conversion 100% lossless
  • 10. A simple example RDF What and why presented by A Triple “Jerven Bolleman” Literal value
  • 11. RDF identifies resources with URIs UniProt.rdf What and why presented by A Triple expasy.org/people/Jerven_Tjalling.Bolleman.htm URI
  • 12. Multiple URIs may identify the same thing expasy.org/people/Jerven_Tjalling.Bolleman.htm ch.linkedin.com/in/jervenbolleman owl:sameAs A Triple
  • 13. The life sciences have an identity problem... • www.genenames.org/data/hgnc_data.php?hgnc_id=9993 – RGS11: regulator of G-protein signaling 11 • http://www.uniprot.org/taxonomy/9993 – European alpine marmot • ... What is “9993”?
  • 14. Hello, I am 9993. I like flower?
  • 15. The solution is URIs • In RDF statements: – subjects and predicates must be URIs – objects may be URIs or literal values
 • Advantages: – No risk of “name clashes” when integrating data from different sources – Different people can make statements about the same resource:
 Distributed annotation at a global scale!
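A small Python sketch of the name-clash problem with the bare identifier "9993" from slide 13. The taxonomy namespace is the one used later in the deck; the HGNC namespace below is a hypothetical placeholder, not a real identifier scheme.

```python
# Two sources that both use the bare identifier "9993" (labels from slide 13).
hgnc = {"9993": "RGS11: regulator of G-protein signaling 11"}
taxonomy = {"9993": "European alpine marmot"}

# Merging on the bare identifier silently overwrites one of the records.
merged_by_id = {**hgnc, **taxonomy}

HGNC_NS = "http://example.org/hgnc/"            # hypothetical namespace
TAXON_NS = "http://purl.uniprot.org/taxonomy/"  # namespace used in slide 71


def with_namespace(namespace, records):
    """Turn local identifiers into globally unique URIs."""
    return {namespace + local_id: label for local_id, label in records.items()}


# With URIs as keys, both records survive the merge.
merged_by_uri = {
    **with_namespace(HGNC_NS, hgnc),
    **with_namespace(TAXON_NS, taxonomy),
}
```

The point is only that a namespace makes the identifier unambiguous; which namespace a provider actually publishes is up to that provider.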
  • 16. Example: From tab-delimited to semantic RDF in Turtle format Tab delimited Converted To An example
  • 17. Example: From tab-delimited to semantic (Interactions.txt)
      P32234  Q9VGZ4
      P32234  P25724
      P32234  Q9V3H7
      P42643  Q00403
      P42643  P23312
      P42643  P31928
      P41932  Q9NAE1
      P41932  Q9TYY1
      P41932  Q10666
      P41932  Q21921
  • 18. Example step 1: Use URIs for subjects and objects (Interactions.txt)
      purl.uniprot.org/uniprot/P32234  ...prot/Q9VGZ4
      purl.uniprot.org/uniprot/P32234  ...prot/P25724
      purl.uniprot.org/uniprot/P32234  ...prot/Q9V3H7
      ...prot/P42643  ...prot/Q00403
      ...prot/P42643  ...prot/P23312
      ...prot/P42643  ...prot/P31928
      ...prot/P41932  ...prot/Q9NAE1
      ...prot/P41932  ...prot/Q9TYY1
      ...prot/P41932  ...prot/Q10666
      ...prot/P41932  ...prot/Q21921
  • 19. Example step 2: Use shorthand syntax (Interactions.txt)
      @prefix prot: <http://purl.uniprot.org/uniprot/> .
      prot:P32234  prot:Q9VGZ4 .
      prot:P32234  prot:P25724 .
      prot:P32234  prot:Q9V3H7 .
      prot:P42643  prot:Q00403 .
      prot:P42643  prot:P23312 .
      prot:P42643  prot:P31928 .
      prot:P41932  prot:Q9NAE1 .
      prot:P41932  prot:Q9TYY1 .
      prot:P41932  prot:Q10666 .
      prot:P41932  prot:Q21921 .
  • 20. Example step 3: Make statements (Interactions.txt)
      @prefix prot: <http://purl.uniprot.org/uniprot/> .
      prot:P32234 interacts_with prot:Q9VGZ4 .
      prot:P32234 interacts_with prot:P25724 .
      prot:P32234 interacts_with prot:Q9V3H7 .
      prot:P42643 interacts_with prot:Q00403 .
      prot:P42643 interacts_with prot:P23312 .
      prot:P42643 interacts_with prot:P31928 .
      prot:P41932 interacts_with prot:Q9NAE1 .
      prot:P41932 interacts_with prot:Q9TYY1 .
      prot:P41932 interacts_with prot:Q10666 .
      prot:P41932 interacts_with prot:Q21921 .
  • 21. Example step 4: Use URIs for properties (Interactions.ttl)
      @prefix prot: <http://purl.uniprot.org/uniprot/> .
      @prefix core: <http://purl.uniprot.org/core/> .
      prot:P32234 core:interacts_with prot:Q9VGZ4 .
      prot:P32234 core:interacts_with prot:P25724 .
      prot:P32234 core:interacts_with prot:Q9V3H7 .
      prot:P42643 core:interacts_with prot:Q00403 .
      prot:P42643 core:interacts_with prot:P23312 .
      prot:P42643 core:interacts_with prot:P31928 .
      prot:P41932 core:interacts_with prot:Q9NAE1 .
      prot:P41932 core:interacts_with prot:Q9TYY1 .
      prot:P41932 core:interacts_with prot:Q10666 .
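The four conversion steps of slides 17 to 21 could be automated roughly as follows. This is a minimal sketch that assumes a two-column, tab-separated input and reuses the `prot:` and `core:` prefixes from the slides; the function name `interactions_to_turtle` is made up for this example.

```python
PREFIXES = {
    "prot": "http://purl.uniprot.org/uniprot/",
    "core": "http://purl.uniprot.org/core/",
}


def interactions_to_turtle(tab_delimited: str) -> str:
    """Convert 'accession<TAB>accession' rows into Turtle statements."""
    # Steps 2 and 4: declare the prefixes once at the top of the file.
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in PREFIXES.items()]
    for row in tab_delimited.splitlines():
        if not row.strip():
            continue  # skip blank lines
        left, right = row.split("\t")
        # Step 3: one statement per row, terminated by a dot.
        lines.append(
            f"prot:{left.strip()} core:interacts_with prot:{right.strip()} ."
        )
    return "\n".join(lines) + "\n"


sample = "P32234\tQ9VGZ4\nP32234\tP25724\nP42643\tQ00403\n"
turtle = interactions_to_turtle(sample)
```

Printing `turtle` gives two `@prefix` lines followed by one `core:interacts_with` statement per input row.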
  • 22. RDF What? Quick recap • RDF describes data with statements (aka triples) – statement = subject + predicate + object – related statements form a directed graph • RDF uses URIs to identify things: – subject and predicates must be URIs – objects may be URIs or literal values • Multiple serialisation formats that are 99.999999% automatically convertible
  • 23. Why RDF? Isn’t there a simpler solution? [Agenda: What? Why? SPARQL? Examples]
  • 24. A very simple example: FASTA • Why does everyone in the sequence world use FASTA?
  • 25. A very simple example: FASTA • Why does everyone in the sequence world use FASTA? – The smallest common denominator – You can put in the header what you like and I can choose to ignore it • BUT: You only get a sequence... >Who|cares_about:this? THISISWHATWEWANT
  • 26. A simple example: GFF • Some people want to exchange more than sequences, and invented GFF: • BUT: ... SEQ1 EMBL atg 103 105 . + 0 SEQ1 EMBL exon 103 172 . + 0
  • 27. A simple example: GFF • Some people want to exchange more than sequences, and invented GFF: • BUT: What do the columns mean? – Originally, an exchange format for sequence feature descriptions, later also used for other annotations – 3 versions known (to me ;) – Not extendable without prior agreement of all users SEQ1 EMBL atg 103 105 . + 0 SEQ1 EMBL exon 103 172 . + 0
  • 28. A proper solution: XML • There is a world beyond sequences and bioinformatics!
 • XML is an IT-industry standard – Datatypes – Multi namespaces – Schemas
 • BUT: – Hierarchical data model – Schemas close extension
  • 29. XML represents data as a tree • XML datatypes – Multi namespace – XML Schema closes extensions • Tree format [Figure labels: entry, active, Proton acceptor, 196, EC, 2.7.11.-]
  • 30. No XML standard for other relationships prizes:a case study • XML datatypes – Multi namespace – XML Schema closes extensions • Tree format [Figure labels: entry, active, Proton acceptor, 196, EC, 2.7.11.-]
  • 31. Our data is a graph! [Figure labels: entry, active, Proton acceptor, 196, EC, 2.7.11.-]
  • 32. RDF advantages • W3C standard • Can be serialized as XML or JSON • i.e. most benefits of XML or JSON • Generic graph structure • URIs as a standard way to identify resources and their properties – data integration without name clashes – distributed annotation – normalization • Extensible!
  • 33. RDF is extensible • Anyone can say Anything about Anything – You can say something about my data • RDF extensions remain compatible • RDF encourages data and schema reuse
      Interactions.ttl:
      @prefix prot: <http://purl.uniprot.org/uniprot/> .
      @prefix intact: <http://fake.ebi.ac.uk/intact/example> .
      prot:P32234 intact:interacts_with prot:Q9VGZ4 .
      prot:P32234 intact:interacts_with prot:P25724 .
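One way to picture why extensions stay compatible: if a graph is treated as a set of triples, adding someone else's statements is a set union, and existing statements are untouched. The sketch below assumes that simplified view; the `OTHER` namespace is a hypothetical third party.

```python
PROT = "http://purl.uniprot.org/uniprot/"
CORE = "http://purl.uniprot.org/core/"
OTHER = "http://example.org/other-lab/"  # hypothetical third-party namespace

# Statements published by the original data provider.
provider_graph = {
    (PROT + "P32234", CORE + "interacts_with", PROT + "Q9VGZ4"),
}

# Statements someone else makes about the same resource, with their own property.
third_party_graph = {
    (PROT + "P32234", OTHER + "interacts_with", PROT + "P25724"),
    # Restating an existing triple is harmless: a set keeps only one copy.
    (PROT + "P32234", CORE + "interacts_with", PROT + "Q9VGZ4"),
}

# Integration is a plain union; no schema migration is needed.
merged = provider_graph | third_party_graph
about_p32234 = {t for t in merged if t[0] == PROT + "P32234"}
```

Because the shared subject URI is identical in both sets, the merged graph links both sources' statements to the same protein.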
  • 34. RDF data model is simple • Everything can be said with triples
 • Generic triple stores – low maintenance data integration
 • SPARQL – SQL for RDF – XPath for RDF – Regular expressions for RDF
  • 35. Comparison
                            Flat file   XML   RDF
      Standard              NO          YES   YES
      Scalable              NO          YES   YES +
      Extendable            NO          NO    YES
      Generic data model    NO          NO    YES
  • 37. Most common failure in RDF world: Philosophy over pragmatism 1. Be honest about your data • what you have, not what you want 2. Change the concept, change the IRI • One concept can be referred to by multiple IRIs 3. Better to “do” than to “debate”
  • 38. Model real data, not the “real world” • Describe records that relate to real world things • Acknowledge that they are records • Model measurements before “facts”
  • 39. Example: mouse in a lab 1.5g <weight>
  • 40. Example: mouse in a lab 1.5g <weight> 20g <weight>
  • 41. TIME it made you a liar
  • 42. Example: mouse in a lab 1.5g <measurement> 20g <measurement> <weight> <weight> 1week 3week _:1 _:2 <age> <age>
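The measurement pattern from the mouse example could look like this in Python. The exact shape of the graph on the slide is not recoverable from the transcript, so the layout below (mouse, then a blank node per measurement carrying weight and age) is an assumption, as are the `<mouse>` identifier and the `measurement` helper.

```python
from itertools import count

_blank_ids = count(1)  # generates _:1, _:2, ... like the blank nodes on the slide


def measurement(mouse, weight_g, age_weeks):
    """Record one weighing as its own node instead of a timeless 'fact'."""
    node = f"_:{next(_blank_ids)}"
    return [
        (mouse, "<measurement>", node),
        (node, "<weight>", f'"{weight_g}g"'),
        (node, "<age>", f'"{age_weeks}week"'),
    ]


# Two weighings of the same mouse no longer contradict each other.
triples = measurement("<mouse>", 1.5, 1) + measurement("<mouse>", 20, 3)
weights = [o for s, p, o in triples if p == "<weight>"]
```

Each weight is attached to a record made at a given age, so adding a later measurement never turns an earlier statement into a falsehood.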
  • 44. OWL: Web Ontology Language • Will be presented in detail during the week • Logical meaning added to RDF statements • That tools use • Classifies existing data or infers new data • Very powerful and useful
  • 45. DANGER It is pure Logic (first order)
  • 46. Classification by restricting set membership
      <human> a owl:Class ;
        rdfs:subClassOf [ owl:onProperty <legs> ; owl:cardinality 2 ] ;
        rdfs:subClassOf [ owl:onProperty <brains> ; owl:cardinality 1 ] ;
        rdfs:subClassOf [ owl:onProperty <referenceGenome> ; owl:allValuesFrom <HGCHR_genome> ] ;
        …
  • 47. Classification by restricting set membership
      <human> a owl:Class ;
        rdfs:subClassOf [ owl:onProperty <legs> ; owl:cardinality 2 ] ;
        rdfs:subClassOf [ owl:onProperty <brains> ; owl:cardinality 1 ] ;
        rdfs:subClassOf [ owl:onProperty <referenceGenome> ; owl:allValuesFrom <HGCHR_genome> ] ;
        …
      Lose a leg → no longer human
  • 49. W3C workgroup in progress • Data-Shapes • You don’t want to know how the sausage is made… • Vendors looking forward to implementing it • Currently not that bad, could be better • First Working Draft
  • 51. Why provide a public SPARQL endpoint • A 10-man wet laboratory cannot afford: – to host their own database in house, holding all or even a bit of all life science data. – not to have access to, and use, existing life science information.
  • 52. ← Not CPU Time... But Brain Time ↓ The right kind of optimisation
  • 53. Why provide a public SPARQL endpoint • Classical SQL can be provided on the web – Is not practical – No federation – Poor standards conformance • Local SQL is expensive • Local JSON is no better • Nor is local XML
  • 55. Data Integration RDF/SPARQL Pathway.rdf UniProt.rdf Own Lab Data Triple Store SPARQL Queries $ $?
  • 56. Why provide a public SPARQL endpoint • Document-centric REST is not enough – Swiss-Prot available as REST – (over e-mail !!) since 1986 – expasy.ch since 1993 – www.uniprot.org since 2002 • Most users use a GUI, not a CLI • Developers build GUIs on a CLI
  • 59.
  • 62. Real users Mix between hard analytics and super specific Estimate somewhere between: 300 - 1000 real humans per month We know they are real because they take holidays ;)
  • 63. Using the Semantic Web for faster (Bio-) Research
  • 65. Why learn SPARQL • Standardised formal query language – implementation independent • SPARQL ➔ SQL (via R2RML) • SPARQL ➔ webservice (via SADI) • SPARQL ➔ LDAP (e.g. SquirrelRDF) • SPARQL ➔ RDF (triplestore e.g. OWLIM-se) • SPARQL ➔ HADOOP/HIVE (e.g. SHARD) • SPARQL ➔ Linked Data Fragments – How you query independent of how you store!
  • 66. Apparently it helps kill vampires !!!
  • 68. Let's look at a single taxon record www.uniprot.org/taxonomy/9993
  • 71. @base <http://purl.uniprot.org/taxonomy/> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix skos: <http://www.w3.org/2004/02/skos/core#> . @prefix up: <http://purl.uniprot.org/core/> . <9993> rdf:type up:Taxon ; up:rank up:Species ; up:reviewed true ; up:mnemonic "MARMR" ; up:scientificName "Marmota marmota" ; up:commonName "Alpine marmot" ; up:otherName "European marmot" ; rdfs:seeAlso <http://animaldiversity.ummz.umich.edu/site/accounts/information/Marmota_marmota.html> , <http://www.alphagalileo.org/Organisations/ViewItem.aspx?OrganisationId=2043&ItemId=70106&CultureCode=en> , <http://www.arkive.org/alpine-marmot/marmota-marmota/info.html> ,
  • 72. Turtle is the RDF serialization aligned with SPARQL • Shorthand to avoid typing so much – . ‘dot’ ends a statement – ; ‘semicolon’ repeats the subject – , ‘comma’ repeats the subject and predicate • prefix – the part before ‘:’ is an abbreviation of a URI
  • 73. Why don’t these queries work elsewhere? • PREFIX – On the web you often have to add these – But some can be preconfigured PREFIX :<http://purl.uniprot.org/core/> SELECT ?x FROM <http://purl.uniprot.org/taxonomy/> WHERE {?x a :Taxon}
  • 74. a = rdf:type = <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  • 75. <9993> rdf:type up:Taxon ; up:rank up:Species ; up:reviewed true ; up:mnemonic "MARMR" ; up:scientificName "Marmota marmota" ; up:commonName "Alpine marmot" ; up:otherName "European marmot" ; rdfs:subClassOf <9992> ; skos:narrowerTransitive <9994> ; rdfs:subClassOf taxon:9994 is a more specific classification than
  • 76. <9993> rdf:type up:Taxon ; up:rank up:Species ; up:reviewed true ; up:mnemonic "MARMR" ; up:scientificName "Marmota marmota" ; up:commonName "Alpine marmot" ; up:otherName "European marmot" ; rdfs:subClassOf <9992> ; skos:narrowerTransitive <9994> ; rank => “The level, for nomenclatural purposes, of a taxon in a taxonomic hierarchy”
  • 77. Let's learn SPARQL • Queries over RDF data. – Four basic types • SELECT – Returns “tab delimited” results • CONSTRUCT – Makes new triples • DESCRIBE – Returns all triples mentioning a resource • ASK – Returns true or false
  • 80. SPARQL:queries triple pattern ?anyTaxon rdf:type core:Taxon . SELECT ?anyTaxon WHERE { }
  • 81. SPARQL:queries triple pattern taxon:9606 rdf:type core:Taxon . taxon:9606 core:reviewed true .
  • 82. SPARQL:queries triple pattern ?anyTaxon rdf:type core:Taxon . ?anyTaxon core:reviewed true .
  • 83. SPARQL:queries triple pattern ?anyTaxon rdf:type core:Taxon . ?anyTaxon core:reviewed true . SELECT ?anyTaxon WHERE { }
  • 84. SPARQL:queries triple pattern ?anyTaxon rdf:type core:Taxon . ?anyTaxin core:reviewed true . SELECT ?anyTaxon WHERE { }
  • 85. SPARQL:queries triple pattern ?anyTaxon rdf:type core:Taxon . $anyTaxon core:reviewed true . SELECT ?anyTaxon WHERE { }
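The triple pattern slides above can be modelled in a few lines of Python. This is only a minimal sketch, not how a triple store works internally: the sample triples and the names `TRIPLES`, `is_var`, `match_pattern` and `match_all` are illustrative assumptions, and `reviewed` is modelled as a Python boolean. It shows that repeating a variable forces a join, that `?x` and `$x` name the same variable, and that a misspelled variable such as `?anyTaxin` quietly becomes a new, unconstrained variable.

```python
# Toy in-memory graph: each triple is a (subject, predicate, object) tuple.
# The data is illustrative, loosely based on the taxon record shown earlier.
TRIPLES = {
    ("taxon:9993", "rdf:type", "core:Taxon"),
    ("taxon:9993", "core:reviewed", True),
    ("taxon:9606", "rdf:type", "core:Taxon"),
    ("taxon:9606", "core:reviewed", True),
    ("taxon:1", "rdf:type", "core:Taxon"),
}


def is_var(term):
    """SPARQL variables start with '?' or '$'; both spell the same variable."""
    return isinstance(term, str) and term[:1] in ("?", "$")


def match_pattern(pattern, binding):
    """Yield every extension of `binding` under which `pattern` is in TRIPLES."""
    for triple in TRIPLES:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if is_var(term):
                name = term[1:]  # ?x and $x share the name "x"
                if candidate.setdefault(name, value) != value:
                    break  # variable already bound to something else
            elif term != value:
                break  # constant term does not match
        else:
            yield candidate


def match_all(patterns):
    """Join patterns left to right: the implicit AND of a basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match_pattern(pattern, b)]
    return bindings


joined = match_all([("?anyTaxon", "rdf:type", "core:Taxon"),
                    ("$anyTaxon", "core:reviewed", True)])
print(sorted(row["anyTaxon"] for row in joined))  # ['taxon:9606', 'taxon:9993']
```

With the typo `?anyTaxin` in the second pattern the two patterns no longer share a variable, so the result is a cross product of both matches rather than a join.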
  • 87. 1: Select all taxa from NCBI/UniProt taxonomy • Taxonomy at www|sparql.uniprot.org • Matches NCBI • Time sync • Adds more names • Adds images
  • 90. 2: AND join (default)
  • 93. 4: Two variables one output column
  • 94. 5: Optional • When values may be missing – yet interesting when they are there • Use as sub query • Bound values from outside stay bound inside – ?x ?y ?z . OPTIONAL {?x ?b ?c} • ?x same variable = same thing
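OPTIONAL behaves like a left join: a row survives even when the optional part finds nothing. The sketch below models the pattern `?x up:scientificName ?name . OPTIONAL { ?x up:commonName ?common }` in plain Python. The sample triples and the function names are hypothetical, chosen only to illustrate the behaviour.

```python
# Hypothetical sample data: two taxa, only one of which has a common name.
TRIPLES = [
    ("taxon:9993", "up:scientificName", "Marmota marmota"),
    ("taxon:9993", "up:commonName", "Alpine marmot"),
    ("taxon:9992", "up:scientificName", "Marmota"),
]


def values_of(subject, predicate):
    """Return all objects stored for a given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def names_with_optional_common_name():
    """Model: ?x up:scientificName ?name . OPTIONAL { ?x up:commonName ?common }"""
    rows = []
    for subject, predicate, name in TRIPLES:
        if predicate != "up:scientificName":
            continue
        commons = values_of(subject, "up:commonName")
        if commons:
            # Optional part matched: one row per match, ?x stays bound inside.
            rows.extend({"x": subject, "name": name, "common": c} for c in commons)
        else:
            # No match: keep the row and leave ?common unbound (key absent).
            rows.append({"x": subject, "name": name})
    return rows
```

Both taxa come back; only the first row carries a `common` value.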
  • 96. 6: UNION • Allows you to combine query patterns as an OR operation. • Joins are still from outer to inner.
  • 97. UNION
  • 98. Negation • When you do not want a certain category of matches. SELECT ?pet WHERE { ?pet a pets:Friendly . }
  • 100. 7: Not exists (Negation 1)
  • 102. MINUS{} or FILTER (NOT EXISTS{}) • What's the difference? – MINUS subtracts results – NOT EXISTS tests if the sub pattern is possible at all. • Normally the faster option.
  • 103. 9: MINUS all data
  • 104. 10: FILTER (NOT EXISTS{}) no results
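The queries behind slides 9 and 10 are not in this transcript. The titles suggest the classic comparison in which the negated pattern shares no variable with the outer pattern, for example `?s ?p ?o MINUS { ?x ?y ?z }` versus `?s ?p ?o FILTER NOT EXISTS { ?x ?y ?z }`; that reading is an assumption. The Python sketch below models both operators on hypothetical data, following the SPARQL 1.1 rule that MINUS only removes a row when the two sides share at least one variable.

```python
# Hypothetical sample data.
TRIPLES = [
    ("taxon:9993", "up:rank", "up:Species"),
    ("taxon:9992", "up:rank", "up:Genus"),
]


def solutions(names):
    """All solutions of a fully variable triple pattern such as ?s ?p ?o."""
    return [dict(zip(names, triple)) for triple in TRIPLES]


def compatible(a, b):
    """Two rows are compatible if they agree on every variable they share."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def minus(left, right):
    """Drop a left row only if some right row shares a variable and agrees on it."""
    return [l for l in left
            if not any((l.keys() & r.keys()) and compatible(l, r) for r in right)]


def filter_not_exists(left, inner_names):
    """Keep a left row only if the inner pattern cannot match under its bindings."""
    return [l for l in left
            if not any(compatible(l, r) for r in solutions(inner_names))]


outer = solutions(("s", "p", "o"))
print(len(minus(outer, solutions(("x", "y", "z")))))   # 2: nothing shared, all data stays
print(len(filter_not_exists(outer, ("x", "y", "z"))))  # 0: the inner pattern always matches
```

When the two sides do share variables, both functions remove the same rows.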
  • 105. 11: Negation option 3 SPARQL 1.0 SELECT ?subject ?rank WHERE { ?subject up:rank ?rank . OPTIONAL { ?subject up:rank up:Genus . ?subject up:rank ?genus .} FILTER(! BOUND(?genus)) }
  • 107. FILTERS • You just saw it twice – Once in the !BOUND – Once in the NOT EXISTS • A FILTER restricts a result set by possibly removing solutions – a FILTER does not add values to the result • Within the same graph pattern, FILTER placement is order independent.
  • 109. 13: Filter on not in
  • 110. Using implicit AND between lines
  • 114. FILTER on numbers • < – FILTER (1 < 2) (17) • > – FILTER (2 > 1) (18) • = – FILTER (1 = 1) (19) • != – FILTER (1 != 2) (20)
  • 115. Filters • ?x = ?y does casting (value conversions) (21) – "1.0"^^xsd:float = "1"^^xsd:int is true • sameTerm(?x, ?y) does not (22) – sameTerm("1.0"^^xsd:float, "1"^^xsd:int) is false
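The difference between `=` and `sameTerm` can be pictured by treating a literal as a pair of lexical form and datatype. The sketch below is a simplified model that assumes only `xsd:int` and `xsd:float`; the tuple representation and the function names are illustrative, not part of any SPARQL library.

```python
# Simplified model: a literal is a (lexical form, datatype) pair.
NUMERIC = {"xsd:int": int, "xsd:float": float}


def value_equal(a, b):
    """Model of `?x = ?y` for numeric literals: compare the values they denote."""
    (lex_a, type_a), (lex_b, type_b) = a, b
    return NUMERIC[type_a](lex_a) == NUMERIC[type_b](lex_b)


def same_term(a, b):
    """Model of sameTerm(?x, ?y): lexical form and datatype must both be identical."""
    return a == b


one_float = ("1.0", "xsd:float")
one_int = ("1", "xsd:int")
print(value_equal(one_float, one_int))  # True
print(same_term(one_float, one_int))    # False
```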
  • 116. FUNCTIONS for use in filters and in binds • Functions – STRLEN – SUBSTR – UCASE – LCASE – STRSTARTS – STRENDS – CONTAINS – STRBEFORE – STRAFTER – ENCODE_FOR_URI – CONCAT – langMatches – REGEX – REPLACE – IRI – STR
  • 117. 24: SUBSTR == substring
  • 118. 24: STRLEN == String Length
  • 119. 25: CONTAINS: is it in there? (case sensitive)
  • 120. 26: REGEX, just like java|python regex
  • 121. BIND • Builds new values – Closes the basic graph pattern (22) • Always declare before use. SELECT ?p WHERE { { ?taxon a :Taxon . } BIND (?taxon AS ?p) }
  • 122. BIND existing variable to a new one
  • 125. BIND can assign any output
  • 126. Aggregate functions • on select line • limited in number – count – sum – avg – min – max – group_concat – sample
  • 127. © 2013 SIB 30: count
  • 128. 31: SAMPLE returns an arbitrary value from each group (not guaranteed to be random)
  • 129. Follow the path
  • 131. 33: Finding a grand parent using normal joins
  • 132. 34: Finding a grandParent using a path query
  • 133. 35: | is OR for predicate
  • 134. 36: Same result with UNION
  • 135. 37: Finding any ancestor
  • 136. 38: Can use the variable in a normal join afterwards
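The path slides above lost their queries in extraction. As a rough model, a sequence path such as `rdfs:subClassOf/rdfs:subClassOf` walks exactly two steps, while a one-or-more path such as `rdfs:subClassOf+` follows the property until it runs out. The Python sketch below imitates both over a hypothetical parent table; `ex:family` is a made-up node, and the exact paths used on the slides are an assumption.

```python
# Hypothetical child -> parents links standing in for rdfs:subClassOf.
PARENTS = {
    "taxon:9994": ["taxon:9993"],
    "taxon:9993": ["taxon:9992"],
    "taxon:9992": ["ex:family"],
}


def grandparents(node):
    """Two-step sequence path: rdfs:subClassOf/rdfs:subClassOf."""
    return {g for p in PARENTS.get(node, ()) for g in PARENTS.get(p, ())}


def ancestors(node):
    """One-or-more path: rdfs:subClassOf+ (iterative traversal, safe on cycles)."""
    seen = set()
    stack = list(PARENTS.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


print(grandparents("taxon:9994"))       # {'taxon:9992'}
print(sorted(ancestors("taxon:9994")))  # ['ex:family', 'taxon:9992', 'taxon:9993']
```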
  • 138. GROUP BY • Needed for aggregate values • After closing the where clause – ... WHERE {?x ?y ?z} GROUP BY ?x
  • 140. HAVING • I have carrot !
  • 141. HAVING • FILTER for aggregates • After the GROUP BY clause – ... GROUP BY ?x HAVING (count(?y) > 2) – ... GROUP BY ?x HAVING (min(?y) = 2) – etc...
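GROUP BY collects rows per key and HAVING then filters on the aggregate, not on individual rows. A minimal Python sketch of `... GROUP BY ?x HAVING (count(?y) > 2)`, using hypothetical `(?x, ?y)` rows and an illustrative function name:

```python
from collections import Counter

# Hypothetical (?x, ?y) solution rows produced by a WHERE clause.
ROWS = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2), ("c", 5)]


def group_count_having(rows, minimum):
    """Model: SELECT ?x (COUNT(?y) AS ?n) ... GROUP BY ?x HAVING (COUNT(?y) > minimum)"""
    counts = Counter(x for x, _ in rows)  # GROUP BY ?x with COUNT(?y)
    return {x: n for x, n in counts.items() if n > minimum}  # HAVING


print(group_count_having(ROWS, 2))  # {'a': 3}
```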
  • 144. 41: LIMIT and OFFSET • OFFSET skips the first x results • LIMIT returns no more than x results
  • 145. ORDER
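ORDER BY, OFFSET and LIMIT combine into paging. Without an ORDER BY the result order is not guaranteed, so pages can overlap or skip rows between requests. A small Python sketch of the combination; the function name `page` is illustrative:

```python
def page(rows, limit, offset=0, key=None):
    """Model: ... ORDER BY <key> OFFSET <offset> LIMIT <limit>"""
    ordered = sorted(rows, key=key)          # ORDER BY: fix the order first
    return ordered[offset:offset + limit]    # OFFSET skips, LIMIT caps


print(page([3, 1, 2, 5, 4], limit=2, offset=1))  # [2, 3]
```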
  • 149. VALUES • Super BIND • Provide inline data
  • 151. Examples • Parameter lists are between () VALUES (?annotation) { (core:Disease_Annotation) (core:Disulfide_Bond_Annotation) }
  • 152. Examples • UNDEF means no value at all – not bound VALUES (?annotation ?begin) { (core:Disease_Annotation UNDEF) (core:Disulfide_Bond_Annotation 2) }
  • 153. VALUES • After declaring a set of values you can use them in your query. SELECT ?comment WHERE { VALUES (?annotation ?begin) { (core:Disease_Annotation UNDEF) (core:Disulfide_Bond_Annotation 2) } ?annotation rdfs:comment ?comment . }
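A VALUES block is joined with the rest of the query like any other set of rows, and UNDEF simply leaves that variable unconstrained. The Python sketch below models this join. The pattern rows and comment strings are hypothetical, and unlike the query above the pattern side here also binds `?begin`, purely so the effect of UNDEF is visible.

```python
UNDEF = None  # stand-in for SPARQL's UNDEF: the variable is left unbound

VALUES_ROWS = [
    {"annotation": "core:Disease_Annotation", "begin": UNDEF},
    {"annotation": "core:Disulfide_Bond_Annotation", "begin": 2},
]

# Hypothetical solutions of the rest of the query.
PATTERN_ROWS = [
    {"annotation": "core:Disease_Annotation", "begin": 10, "comment": "disease note"},
    {"annotation": "core:Disulfide_Bond_Annotation", "begin": 2, "comment": "bond at 2"},
    {"annotation": "core:Disulfide_Bond_Annotation", "begin": 7, "comment": "bond at 7"},
]


def join_values(values_rows, pattern_rows):
    """Join inline data with pattern solutions; UNDEF entries constrain nothing."""
    joined = []
    for allowed in values_rows:
        bound = {k: v for k, v in allowed.items() if v is not UNDEF}
        for row in pattern_rows:
            # Compatible if every bound VALUES variable agrees with the row.
            if all(row.get(k, v) == v for k, v in bound.items()):
                joined.append({**bound, **row})
    return joined


print([row["comment"] for row in join_values(VALUES_ROWS, PATTERN_ROWS)])
# ['disease note', 'bond at 2']
```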
  • 154. SERVICE: Using other SPARQL endpoints • SERVICE <URL of other endpoint> – Runs a sub query on the other endpoint and merges it back into your query.
  • 155. “Life is better with friends who understand you.”
  • 157. SERVICE • Useful – Quick experimenting with combining multiple data sources – Quick for queries where not too much data is sent to the remote endpoint • Slow – When you ask for too much data – Remote endpoint not resourced for your questions
  • 158. SERVICE • Slowly improving • Theoretically unfixable • Practically could be much better • A 1000x speed-up is a small step away
  • 160. Construction • CONSTRUCT – New triples • downloads RDF – Does not update store
  • 162. Constructing an owl:sameAs between two URIs
  • 163. INSERT • Adds data – like CONSTRUCT
  • 165. DELETE • Removes data – Matching triples are removed from the data – Triples can be bound using a WHERE clause
  • 166. DELETE
  • 167. DELETE INSERT • Single atomic operation • Transactions store API option
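In SPARQL 1.1 Update a combined DELETE/INSERT applies its deletions before its insertions, as one operation. The sketch below models a store as a Python set of triples. The function name is illustrative, and real stores implement atomicity with their own transaction machinery rather than by copying sets.

```python
def delete_insert(store, delete_triples, insert_triples):
    """Return a new store: deletions are applied first, then insertions."""
    return (store - set(delete_triples)) | set(insert_triples)


store = {("ex:a", "ex:p", "1")}
# Callers see either the old set or the new one, never a half-updated state.
store = delete_insert(store, [("ex:a", "ex:p", "1")], [("ex:a", "ex:p", "2")])
print(store)  # {('ex:a', 'ex:p', '2')}
```

Because deletion comes first, a triple named in both templates is still present afterwards.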
  • 169. I’m exhausted now
  • 170. Of Course Biology is complicated #baseURI: http://purl.uniprot.org/unirule/UR000107224 #Rule UR000107224 Created by:bridge on:2009-02-12 Modified by:rantunes on:2015-06-09 PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX uniprot:<http://purl.uniprot.org/uniprot/> PREFIX sequence:<http://purl.uniprot.org/sequences/> PREFIX unirule:<http://purl.uniprot.org/unirules/> PREFIX taxon:<http://purl.uniprot.org/taxonomy/> PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> PREFIX hamap-sparql:<http://example.org/hamap_sparql/> PREFIX up:<http://purl.uniprot.org/core/> PREFIX faldo:<http://biohackathon.org/resource/faldo#> PREFIX method:<http://example.org/method/> PREFIX keyword:<http://purl.uniprot.org/keywords/> PREFIX owl:<http://www.w3.org/2002/07/owl#> PREFIX proteome:<http://purl.uniprot.org/proteomes/> PREFIX hamap:<http://purl.uniprot.org/hamap/> PREFIX annotation:<http://purl.uniprot.org/annotation/> PREFIX xsd:<http://www.w3.org/2001/XMLSchema#> CONSTRUCT { ?this up:annotation ?annotation0, ?annotation1, ?annotation2, ?annotation3, ?annotation5; up:classifiedWith <http://purl.obolibrary.org/obo/19805>, <http://purl.obolibrary.org/obo/334>, <http://purl.obolibrary.org/obo/34354>, <http://purl.obolibrary.org/obo/43420>, <http://purl.obolibrary.org/obo/6569>, <http://purl.obolibrary.org/obo/8198>, keyword:223, keyword:560, keyword:662 . ?annotation0 a up:Function_Annotation; rdfs:comment "Catalyzes the oxidative ring opening of 3-hydroxyanthranilate to 2-amino-3-carboxymuconate semialdehyde, which