Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and Graph Database approach

Presented at Integrative Bioinformatics Conference (IB2018, Harpenden, 2018).

We describe how to use Semantic Web Technologies and graph databases like Neo4j to serve life science data and address the FAIR data principles.

Published in: Data & Analytics

  1. 1. Towards FAIRer Biological Knowledge Networks Using a Hybrid Linked Data and Graph Database approach
      Harpenden, 5/6/2018
      Marco Brandizi <marco.brandizi@rothamsted.ac.uk>
      Find these slides on SlideShare
      KnetMiner-inspired artwork by Hugo Dalton (hugodalton.com)
  2. 2. Can we do More with KnetMiner Data? (and better)
  3. 3. Behind the Scenes
      • Starting point: graph data model
        • With concepts, relations between concepts, hierarchies of concept classes and relation types
        • => There are standardised ways for it
      • Make app development easier
        • independent components on top of a unified data model
        • clear separation between data access and apps
      • Serve third-party applications, making their data access no different than ours
      • Simplify the way we ingest data
        • ease conversions from multiple formats into a unified model
        • relax the high-memory requirements (e.g., backing data store)
        • prepare for scalability (e.g., cloud stores, big data stores)
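The graph data model on this slide (concepts, typed relations, a hierarchy of concept classes) could be sketched as below. This is only an illustration: the names `ConceptClass`, `Concept`, `Relation` and `Graph` are hypothetical and are not taken from KnetMiner or Ondex, and relation types are kept as plain strings for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ConceptClass:
    """A node type; `parent` encodes the hierarchy of concept classes."""
    name: str
    parent: Optional["ConceptClass"] = None

    def is_a(self, other: "ConceptClass") -> bool:
        # Walk up the hierarchy until `other` is found or the root is passed.
        node: Optional["ConceptClass"] = self
        while node is not None:
            if node == other:
                return True
            node = node.parent
        return False


@dataclass(frozen=True)
class Concept:
    id: str
    cls: ConceptClass


@dataclass(frozen=True)
class Relation:
    source: Concept
    rel_type: str
    target: Concept


@dataclass
class Graph:
    relations: List[Relation] = field(default_factory=list)

    def add(self, source: Concept, rel_type: str, target: Concept) -> None:
        self.relations.append(Relation(source, rel_type, target))

    def neighbours(self, concept: Concept, rel_type: Optional[str] = None) -> List[Concept]:
        # Targets of outgoing relations, optionally filtered by relation type.
        return [
            r.target
            for r in self.relations
            if r.source == concept and (rel_type is None or r.rel_type == rel_type)
        ]


# Hypothetical example data
bio_entity = ConceptClass("BioEntity")
protein = ConceptClass("Protein", parent=bio_entity)
pathway = ConceptClass("Pathway")

tob1 = Concept("TOB1", protein)
bmp = Concept("BMP_signalling", pathway)

graph = Graph()
graph.add(tob1, "participates_in", bmp)
```

Applications written against such a model only depend on `Graph`, which is the separation between data access and apps that the slide argues for.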
  4. 4. Putting it on a Bigger Picture
  5. 5. The Semantic Web Way
      • It’s for networked knowledge (semantic networks)
      • Focuses on sharing via web technologies and principles (e.g., sharing resolvable URIs)
      • Rich ‘schema’ language, already much used in life sciences (i.e., ontologies, coming from frames and first-order logic)
      • Protocol + a standard query language (SPARQL)
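As a small illustration of the "protocol + query language" point: in the SPARQL Protocol, a query can be sent as an HTTP GET with the query text in the `query` parameter. The sketch below only builds such a URL; the endpoint address is a placeholder and no request is sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL (the query goes in the `query` parameter)."""
    return f"{endpoint}?{urlencode({'query': query})}"


QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://purl.obolibrary.org/obo/GO_0030014> rdfs:label ?label
}
""".strip()

# Placeholder endpoint, not a real service
url = sparql_get_url("https://example.org/sparql", QUERY)

# Round trip: decoding the URL gives back the original query text
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

Any HTTP client can then fetch `url`, typically with an `Accept` header such as `application/sparql-results+json`.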
  22. 22. Modelling data with OWL: Promises & Wishes
      • Rich semantics, but also very formal, so that powerful automated reasoning is possible
        • On the messy web ocean?!
        • What about performance?!
      • Very formal semantics is not very easy:
        • SomeValuesFrom Restriction?!
        • My blood sample derives from some Skolem human, not from NCBI:HomoSapiens?
      • Ontologies defined for the whole world and then harmoniously and lovingly shared
        • Fairly reasonable: all those vocabularies are headaches, I don’t have expertise
        • Reasonable: I have a different point of view
        • Not always reasonable: possibly, yours is complicated/wrong/stupid/idiotic/worse
        • Likely not reasonable: Not Invented Here
        • No comment: if I reinvent it, I can publish it
        • Just joking (maybe…): Your ontology is good, but I’d rather stab you in the back
      In fact, they did this
  24. 24. Simplifying Views in BioKNO
      Original OWL (Gene Ontology):

        obo:GO_0030015 a owl:Class ;
          rdfs:label "CCR4-NOT core complex"^^xsd:string ;
          rdfs:subClassOf obo:GO_0044424, obo:GO_0043234,
            [ a owl:Restriction ;
              owl:onProperty <http://purl.obolibrary.org/obo/BFO_0000050> ; # 'part of'
              owl:someValuesFrom obo:GO_0030014 # CCR4-NOT complex
            ] ;
          oboInOwl:id "GO:0030015"^^xsd:string ;
          obo:IAO_0000115 "The core of the CCR4-NOT complex. In Saccharomyces the CCR4-NOT..." ;
          oboInOwl:hasOBONamespace "cellular_component"^^xsd:string .

      Simplified BioKNO view:

        obo:GO_0030014 a bk:GeneOntologyTerm ;
          dc:identifier obo:GO_0030014_acc ;
          bk:is_a obo:GO_0044424, obo:GO_0043234 ;
          bk:prefName "CCR4-NOT complex" .

        obo:GO_0030015 a bk:GeneOntologyTerm ;
          bk:prefName "CCR4-NOT core complex" ;
          bk:is_a obo:GO_0044424, obo:GO_0043234 ;
          bk:part_of obo:GO_0030014 ;
          dc:identifier obo:GO_0030015_acc .

        obo:GO_0044424 a bk:GeneOntologyTerm ;
          bk:prefName "intracellular part" ;

      • OWL is simplified by mixing classes with SKOS-style concepts
      • More suitable for less formal, simpler taxonomies
      • OWL 2 punning makes it consistent
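The simplification shown on this slide can be pictured as a rewrite over triples: a `subClassOf` axiom pointing at a `someValuesFrom` restriction becomes one direct relation. The sketch below works on plain Python tuples rather than an RDF library. The mapping tables and the hard-coded `bk:GeneOntologyTerm` type are assumptions made for this example, not the author's actual conversion code.

```python
RDF_TYPE = "rdf:type"
PART_OF = "obo:BFO_0000050"  # 'part of'

# OWL view of GO:0030015; the blank node "_:r1" stands for the restriction
owl_triples = [
    ("obo:GO_0030015", RDF_TYPE, "owl:Class"),
    ("obo:GO_0030015", "rdfs:label", "CCR4-NOT core complex"),
    ("obo:GO_0030015", "rdfs:subClassOf", "obo:GO_0044424"),
    ("obo:GO_0030015", "rdfs:subClassOf", "obo:GO_0043234"),
    ("obo:GO_0030015", "rdfs:subClassOf", "_:r1"),
    ("_:r1", RDF_TYPE, "owl:Restriction"),
    ("_:r1", "owl:onProperty", PART_OF),
    ("_:r1", "owl:someValuesFrom", "obo:GO_0030014"),
]

# Assumed mapping from OWL object properties to simplified relations
PROPERTY_MAP = {PART_OF: "bk:part_of"}


def simplify(triples):
    """Rewrite OWL class axioms into flat, SKOS-style triples."""
    # First pass: collect the restriction nodes and their two defining values.
    restrictions = {}
    for s, p, o in triples:
        if p in ("owl:onProperty", "owl:someValuesFrom"):
            restrictions.setdefault(s, {})[p] = o

    out = []
    for s, p, o in triples:
        if s in restrictions:
            continue  # restriction nodes vanish in the simplified view
        if p == RDF_TYPE and o == "owl:Class":
            out.append((s, RDF_TYPE, "bk:GeneOntologyTerm"))
        elif p == "rdfs:label":
            out.append((s, "bk:prefName", o))
        elif p == "rdfs:subClassOf" and o in restrictions:
            prop = PROPERTY_MAP.get(restrictions[o].get("owl:onProperty"))
            target = restrictions[o].get("owl:someValuesFrom")
            if prop and target:
                out.append((s, prop, target))
        elif p == "rdfs:subClassOf":
            out.append((s, "bk:is_a", o))
    return out


simple_triples = simplify(owl_triples)
```

The price of the flat view is that `bk:part_of` now links two terms directly, which is where the punning mentioned on the slide comes in.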
  25. 25. The BioKNO Ontology (and the Rest of the World)
      BioKNO | External Ontologies | Mapping Type
      bk:Concept | skos:Concept | Subclass
      bk:Relation; bk:relFrom, bk:relTypeRef, bk:relTo | rdf:Statement; rdf:subject, rdf:predicate, rdf:object | Subclass; sub-properties (i.e., mapping to RDF reified statements)
      bk:Path, bk:Participant, bk:Interaction, bk:Transport, bk:Protein, bk:Gene | Classes with same names in BioPAX and SIO | Equivalent class
      bk:participates_in, bk:has_participant | Relation Ontology (RO) properties with same names; biopax:participant (as sub-property) | Equivalent property
      bk:produces, bk:produced_by, bk:consumes, bk:consumed_by | biopax:product (as sub-property); RO properties with same names | Equivalent property
      bk:regulates, bk:positively_regulates, bk:negatively_regulates | RO properties with same names | Equivalent property
      bk:is_a, bk:part_of, bk:has_part, bk:occurs_in, bk:co_occurs_with | skos:broader; Basic Formal Ontology (BFO)/RO properties with same names | Equivalent property
      bk:Publication | schema:CreativeWork | Subclass
      bka:abstract, bka:title (also known as AbstractHeader), bka:authors | dcterms:description, dcterms:title, dc:creator | Sub-property
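The second row of the mapping can be read as a simple entailment rule: because `bk:relFrom`, `bk:relTypeRef` and `bk:relTo` are sub-properties of the RDF reification vocabulary, every `bk:Relation` can also be seen as an `rdf:Statement`. A minimal sketch of that inference over tuples follows; the relation identifier `bkr:rel1` and the pathway identifier are hypothetical.

```python
# Sub-property and subclass axioms taken from the mapping table
SUBPROPERTY_OF = {
    "bk:relFrom": "rdf:subject",
    "bk:relTypeRef": "rdf:predicate",
    "bk:relTo": "rdf:object",
}
SUBCLASS_OF = {"bk:Relation": "rdf:Statement"}


def entail_reification(triples):
    """Return the RDF-reification triples implied by the BioKNO ones."""
    inferred = []
    for s, p, o in triples:
        if p in SUBPROPERTY_OF:
            inferred.append((s, SUBPROPERTY_OF[p], o))
        elif p == "rdf:type" and o in SUBCLASS_OF:
            inferred.append((s, "rdf:type", SUBCLASS_OF[o]))
    return inferred


# Hypothetical reified relation carrying an extra attribute
relation = [
    ("bkr:rel1", "rdf:type", "bk:Relation"),
    ("bkr:rel1", "bk:relFrom", "bkr:TOB1"),
    ("bkr:rel1", "bk:relTypeRef", "bk:participates_in"),
    ("bkr:rel1", "bk:relTo", "bkr:pathway1"),
    ("bkr:rel1", "bk:evidence", "bkev:IMPD"),
]

statement = entail_reification(relation)
```

A generic RDF tool that understands `rdf:Statement` can therefore consume BioKNO relations without knowing the `bk:` vocabulary.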
  26. 26. The BioKNO Ontology
  27. 27. Putting it on a Bigger Picture
  29. 29. Accessing RDF through SPARQL
  32. 32. Extraction, Loading, Transformation: SPARQL/TARQL Example

        CONSTRUCT {
          ?protIri bk:expressed_by ?sampleIri.

          ?degRelIri a bk:Relation;
            bka:PVALUE ?pValue;
            bk:evidence bkev:EXP;    # Inferred from experiment
            bk:relFrom ?protIri;     # Details defined by UniProt info
            bk:relTo ?sampleIri;     # Details defined by sample_degs_2.tsv
            bk:relTypeRef bk:expressed_by.
        }
        WHERE {
          # Some IDs and IRIs to be defined above
          BIND ( LCASE ( REPLACE ( ?Sample, ' ', '_' ) ) AS ?sampleId )
          BIND ( IRI ( CONCAT ( STR ( bkr: ), ?Gene_Symbol ) ) AS ?protIri )
          BIND ( IRI ( CONCAT ( STR ( bkr: ), 'degex_', ?sampleId ) ) AS ?sampleIri )
          BIND ( IRI ( CONCAT ( STR ( bkr: ), 'degex_', ?sampleId, '_', LCASE ( ?Gene_Symbol ) ) ) AS ?degRelIri )
          BIND ( xsd:double ( ?p_value ) AS ?pValue )
        }
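For readers less familiar with SPARQL, the same row-to-triples mapping can be written procedurally. The sketch below mirrors the `BIND` expressions of the query with the standard `csv` module. The sample row is invented for illustration, the `bkr:` namespace is the one declared in the JSON-LD context later in the deck, and the column names are those used in the query.

```python
import csv
import io

BKR = "http://www.ondex.org/bioknet/resources/"


def row_to_triples(row):
    """Mirror the BIND expressions of the CONSTRUCT query for one table row."""
    sample_id = row["Sample"].replace(" ", "_").lower()
    prot_iri = BKR + row["Gene_Symbol"]
    sample_iri = f"{BKR}degex_{sample_id}"
    rel_iri = f"{BKR}degex_{sample_id}_{row['Gene_Symbol'].lower()}"
    p_value = float(row["p_value"])
    return [
        (prot_iri, "bk:expressed_by", sample_iri),
        (rel_iri, "rdf:type", "bk:Relation"),
        (rel_iri, "bka:PVALUE", p_value),
        (rel_iri, "bk:evidence", "bkev:EXP"),
        (rel_iri, "bk:relFrom", prot_iri),
        (rel_iri, "bk:relTo", sample_iri),
        (rel_iri, "bk:relTypeRef", "bk:expressed_by"),
    ]


# Invented one-row table standing in for the real TSV input
TSV = "Sample\tGene_Symbol\tp_value\nLeaf Day 3\tTOB1\t0.0012\n"

rows = csv.DictReader(io.StringIO(TSV), delimiter="\t")
triples = [t for row in rows for t in row_to_triples(row)]
```

The declarative TARQL version keeps the mapping next to the vocabulary it targets, while a script like this is easier to debug step by step.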
  33. 33. SPARQL/RDF for ELT
      • RDF-to-RDF translation via CONSTRUCT (or SPARUL)
      • TARQL: Using SPARQL to RDF-Convert Tabular CSV Files
      • RDF/XML can be transformed via XSL
        • We have done it for bio-specific ontology definitions in Ondex
      • Programmatic conversions
        • Using RDF frameworks, e.g., Jena, RDF4J (formerly Sesame), rdflib for Python
        • See also java2rdf (https://github.com/EBIBioSamples/java2rdf)
        • We have used it for the Ondex->RDF converter
  36. 36. Issues
      https://lod-cloud.net/
      • Still not so popular (especially in more commercial contexts)
      • It’s (perceived as) difficult (in particular, SPARQL)
      • Bad reputation
      • Performance can still be an issue
        • e.g., optimising SPARQL can be hard
      • Specific issues
        • e.g., I need contextualised/attribute-attached properties
        • and I don’t fancy reified relations…
  37. 37. Another Graph Database World: Property Graphs
  38. 38. Neo4j on Top of RDF
  39. 39. Application to Semantic Motif Search
  40. 40. The rdf2neo Tool https://github.com/Rothamsted/rdf2neo
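To give an idea of what an RDF-to-property-graph conversion involves, the sketch below collapses a reified `bk:Relation` into a single edge with properties and builds a parameterised Cypher statement for it. It illustrates the general idea only and is not how rdf2neo is implemented; the function names, the `iri` node property and the sample identifiers are all assumptions.

```python
import re


def relation_to_edge(rel_iri, triples):
    """Collapse the triples about one reified relation into an edge record."""
    edge = {"iri": rel_iri, "props": {}}
    for s, p, o in triples:
        if s != rel_iri:
            continue
        if p == "bk:relFrom":
            edge["src"] = o
        elif p == "bk:relTo":
            edge["dst"] = o
        elif p == "bk:relTypeRef":
            edge["type"] = o.split(":", 1)[-1]
        elif p != "rdf:type":
            # Any other attribute becomes a property of the edge
            edge["props"][p.split(":", 1)[-1]] = o
    return edge


def edge_to_cypher(edge):
    """Return (query, params) creating the edge between two existing nodes."""
    rel_type = edge["type"]
    # Relationship types cannot be passed as Cypher parameters, so validate
    # the name before placing it in the query text.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", rel_type):
        raise ValueError(f"unsafe relationship type: {rel_type!r}")
    query = (
        "MATCH (a {iri: $src}), (b {iri: $dst}) "
        f"MERGE (a)-[r:{rel_type} {{iri: $iri}}]->(b) "
        "SET r += $props"
    )
    params = {
        "src": edge["src"],
        "dst": edge["dst"],
        "iri": edge["iri"],
        "props": edge["props"],
    }
    return query, params


# Hypothetical reified relation
triples = [
    ("bkr:rel1", "rdf:type", "bk:Relation"),
    ("bkr:rel1", "bk:relFrom", "bkr:TOB1"),
    ("bkr:rel1", "bk:relTo", "bkr:degex_leaf"),
    ("bkr:rel1", "bk:relTypeRef", "bk:expressed_by"),
    ("bkr:rel1", "bka:PVALUE", 0.0012),
    ("bkr:rel1", "bk:evidence", "bkev:EXP"),
]

edge = relation_to_edge("bkr:rel1", triples)
query, params = edge_to_cypher(edge)
```

The reified relation, which needs several triples in RDF, ends up as one relationship carrying `PVALUE` and `evidence` as properties, which is the main modelling convenience of property graphs.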
  41. 41. Triple Stores vs Prop Graphs
      Data exchange format
        Neo4j, Cypher DBs, Graph DBs:
          - No official one, just Cypher; support for GraphML, RDF
          +/- Focus on backing applications
        Semantic Web/Triple Stores:
          + Focus on data sharing standards
      Data model
        Neo4j, Cypher DBs, Graph DBs:
          + Relations with properties
          - Metadata/schemas/ontologies management
        Semantic Web/Triple Stores:
          - Relations cannot have properties (reification required)
          + Metadata/schemas/ontologies as first-class citizens and standardised (OWL)
      Performance
        Neo4j, Cypher DBs, Graph DBs:
          + Complex graph traversals
        Semantic Web/Triple Stores:
          + Comparable in most cases
      Query language
        Neo4j, Cypher DBs, Graph DBs:
          + Cypher is easier (e.g., compact, implicit elements)?
          - Expressivity issues (unions)
          - No standard QL (but efforts in progress, e.g., OpenCypher)
        Semantic Web/Triple Stores:
          - SPARQL is harder? (URIs, namespaces, verbosity)
          + SPARQL is more expressive
      Standardisation, openness
        Neo4j, Cypher DBs, Graph DBs:
          +/- (TinkerPop is open, Neo4j isn’t)
          + Commercial support
          + More alive and up to date (e.g., support for Hadoop, nice Neo4j browser, easy installation)
        Semantic Web/Triple Stores:
          + Natively open, many open implementations
          - Instability and many short-lived prototypes
          - Advancements seem to be slowing down
          + Some nice open and commercial browsers (LODEStar, ...
      Scalability, big data
        Neo4j, Cypher DBs, Graph DBs:
          +/- Commercial support for clustering/clouds for Neo4j
          + Open support in TinkerPop
        Semantic Web/Triple Stores:
          + Load balancing/cluster solutions, commercial cloud support (e.g., GraphDB)
          + SPARQL over TinkerPop (via SAIL interface)
  42. 42. Bridging to RDF: JSON-LD

        {
          "@context": {
            "bk": "http://www.ondex.org/bioknet/terms/",
            "bka": "http://www.ondex.org/bioknet/terms/attributes/",
            "bkds": "http://www.ondex.org/bioknet/terms/dataSources/",
            "bkev": "http://www.ondex.org/bioknet/terms/evidences/",
            "bkr": "http://www.ondex.org/bioknet/resources/",
            "dcterms": "http://purl.org/dc/terms/",
            "obo": "http://purl.obolibrary.org/obo/",
            "xsd": "http://www.w3.org/2001/XMLSchema#",
            "@vocab": "http://www.ondex.org/bioknet/terms/",
            "dcterms:identifier": { "@type": "xsd:string" },
            "evidence": { "@type": "@id" }
          },
          …
          "@id": "bkr:TOB1",
          "@type": "bk:Protein",
          "prefName": "TOB1 Human",
          "dcterms:identifier": "TOB1",
          "is_annotated_by": "obo:GO_0030014",
          "participates_in": {
            "@id": "http://www.wikipathways.org/id1",
            "@type": "bk:Pathway",
            "evidence": "bkev:IMPD",
            "prefName": "Bone Morphogenic Protein (BMP) Signalling and Regulation"
          }
        }
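What makes this JSON also RDF is the `@context`: prefixes and `@vocab` turn short keys and compact IRIs into full URIs. The sketch below shows that expansion for property names and prefixed identifiers only. It is a deliberate simplification of the JSON-LD expansion algorithm (no term definitions, type coercion or base IRI handling), and the `expand` helper is a name invented for this example.

```python
# Subset of the @context shown on the slide
CONTEXT = {
    "bk": "http://www.ondex.org/bioknet/terms/",
    "bkev": "http://www.ondex.org/bioknet/terms/evidences/",
    "bkr": "http://www.ondex.org/bioknet/resources/",
    "dcterms": "http://purl.org/dc/terms/",
    "obo": "http://purl.obolibrary.org/obo/",
    "@vocab": "http://www.ondex.org/bioknet/terms/",
}


def expand(term: str, context: dict) -> str:
    """Expand a property name or compact IRI into a full IRI (simplified)."""
    if term.startswith("@") or "://" in term:
        return term  # JSON-LD keyword or already an absolute IRI
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local  # compact IRI such as bkr:TOB1
    return context["@vocab"] + term  # bare term, resolved against @vocab


subject = expand("bkr:TOB1", CONTEXT)
predicate = expand("prefName", CONTEXT)
go_term = expand("obo:GO_0030014", CONTEXT)
```

A JSON consumer can ignore the context entirely and read `prefName` as a plain key, while an RDF consumer gets the same data as triples with resolvable URIs.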
  43. 43. Addressing FAIR
      KnetMiner UI Overview: Search, Select, Explore
      • Findable (or, Semantic Web is still useful)
        • SPARQL endpoint
          • which powers URI resolution
        • Dataset-level metadata (e.g., VoID)
        • Mapping to standard ontologies
        • Interested in contributing to existing standards (e.g., Bioschemas)
        • API/JSON-Schema formalisation
      • Accessible
        • Multiple access means (SPARQL, URIs, JSON APIs, Cypher)
        • Triple Stores and Property Graphs are complementary, not alternative
        • Data dumps
      • Interoperable (or, Sem Web is still useful)
        • Unified model encoded in at least one common syntax (RDF)
        • URIs are reused
        • Mappings to ontologies
      • Reusable
        • All of the above, plus multiple interfaces under a unified model
        • Support for common graph languages (e.g., Cytoscape.js)
        • Converters (e.g., our RDF conversion scripts/tools, rdf2neo)
        • Open Data licences
