SlideShare a Scribd company logo
1 of 27
Getting the best of Linked Data and
Property Graphs: rdf2neo and the
KnetMiner Use Case
Marco Brandizi
marco.brandizi@rothamsted.ac.uk
Find these slides at:
https://www.slideshare.net/mbrandizi
A short story about Gene Knowledge
www.knetminer.org
<concept>
<id>1</id>
<pid>Q75WV3</pid>
<description/>
<elementOf>
<idRef>UNIPROTKB-SwissProt</idRef>
</elementOf>
<ofType>
<idRef>Protein</idRef>
</ofType>
<evidences>
<evidence>
<idRef>IMPD</idRef>
</evidence>
</evidences>
<conames>
<concept_name>
<name>Probable trehalose-phosphate phosphatase 1</name>
<isPreferred>true</isPreferred>
</concept_name>
…
<cc>
<id>Protein</id>
<fullname>Protein</fullname>
<description>
A protein is comprised of one or more Polypeptides
and potentially other molecules.
</description>
<specialisationOf>
<idRef>MolCmplx</idRef>
</specialisationOf>
</cc>
<relation>
<fromConcept>1</fromConcept>
<toConcept>3</toConcept>
<ofType>
<idRef>participates_in</idRef>
</ofType>
<evidences>
<evidence>
<idRef>ECO:0000316</idRef>
</evidence>
</evidences>
<relgds/>
</relation>
<concept>
<id>3</id>
<pid>GO:0009651</pid>
<description>response to salt stress</description>
<ofType><idRef>BioProc</idRef></ofType>
<coaccessions>
<concept_accession>
<accession>GO:0009651</accession>
<elementOf><idRef>GO</idRef></elementOf>
<ambiguous>false</ambiguous>
</concept_accession>
</coaccessions>
</concept>
A short story about Gene Knowledge
www.ondex.org
A short story about Gene KnowledgeCan we improve? Graph DBs?
Query Languages? Open Data?
FAIR?
Sure! RDF! OWL!
Triple Store! SPARQL!
Uhm, we’ve tried that,
but…
I can feel what you mean,
but, it’s
not so difficult, let me…
Look! I’ve seen this Neo4j! It
has relations with properties!
Uhm… well… yeah, but no data
format, bad with ontologies, No
URIs/merging…
And look how cool a browser!
Oh, yes, that’s cool, but
maybe not the most
important thing…And Cypher is a
breeze!
Uhm… let me try. Oh, cool,
but UNION sucks, and…
And has graph algorithms!
And devs got the APIs in
minutes!
Uhm… Are Jena/RDF4J
that harder?
… …
Source: https://digiday.com/uk/weve-created-monster-publishers-vent-ad-tech-frustration
Why not Taking the Best of Both
Worlds?
Actually, Like This…
Details
The Bridge: rdf2neo
The Bridge: rdf2neo
The rdf2neo in Principle
The rdf2neo in Principle
The rdf2neo in Principle
It works!
Comparing Functionality
• Data ELT and Integration
• See our example: https://github.com/Rothamsted/bioknet-
onto/tree/master/examples/bmp_reg_human
• Semantic Web is focused on standardised data sharing
• Neo4j doesn’t have a data format, focused on backing applications
• URI-based merging in RDF
• CONSTRUCT-based data transformations in Sem Web (including tools like TARQL)
• MATCH/CREATE in Cypher, but not the same
• Query languages
• Cypher considered compact and simple to learn
• SPARQL better at complex graph patterns with branches
• Cypher very good at chain patterns
Query Performance
Details at: https://github.com/Rothamsted/graphdb-benchmark
Query Performance: Graph Traversal
Query Performance: Branch Union
Query Performance: Branch Union
Conclusions
• Hybrid architectures might be good at getting the best of both
• They’re feasible, performance are acceptable with both technologies
• rdf2neo can help you with keeping everything aligned to a conceptual
data model
• Helps with Linked Data and FAIR Principles
• Please checkout GitHub, get in touch (especially if you’re on
agriculture/plant biology)
• It comes with some overhead. You might need just one half
• Whatever you do, follow LOD/FAIR
Acknowledgements
Ajit Singh
Software Engineer
Monika Mistry
Master Student, Data Curator
Keywan Hassani-Pak
KnetMiner Team Leader
Chris Rawlings
Head of Computational & Analytical Sciences
William Brown
IT Admin
And You All!
Marco Brandizi
marco.brandizi@rothamsted.ac.ukFind these slides at:
EXTRA
rdf2neo Architecture
Cypher vs SPARQL
Proteins->Reactions->Pathways:
// chain of paths, node selection via property (exploits indices)
MATCH (prot:Protein) - [csby:consumed_by] -> (:Reaction) - [:part_of] ->
(pway:Path{ title: ‘apoptosis’ })
// further conditions, not always so performant
WHERE prot.name =~ ‘(?i)^DNA.+’
// Usual projection and post-selection operators
RETURN prot.name, pway
// Relations can have properties
ORDER BY csby.pvalue
LIMIT 1000
Proteins->Reactions->Pathways:
// Single-path (or same-direction branching) easy to write
MATCH (prot:Protein) - [:produced_by|consumed_by] -> (:Reaction)
- [:part_of*1..3] -> (pway:Path)
RETURN ID(prot), ID(pway) LIMIT 1000
// Very compact forms available, depending on the data
MATCH (prot:Protein) - (pway:Path) RETURN pway
select distinct ?prot ?pway {
where {
# Branch 1
?prot kb:pd_by|kb:cs_by ?react.
?prot a kb:Protein.
?react a kb:Reaction.
?react kb:part_of ?pway.
?pway a kb:Path.
}
union { # Branch 2
?prot ^kb:ac_by|kb:is_a ?enz.
?prot a kb:Protein.
?enz a kb:Enzyme.
{ # Branch 2.1
?enz kb:ac_by|kb:in_by ?comp.
?comp a kb:Compound.
?comp kb:cs_by|kb:pd_by ?trns
?trns a kb:Transport
} union {
# Branch 2.2
?enz ^kb:ca_by ?trns.
?comp a kb:Compound.
?trns a kb:Transport
}
?trns kb:part_of ?pway.
?pway a kb:Path.
}
} LIMIT 1000
Cypher vs SPARQL
Loading Performance
Details at: https://github.com/Rothamsted/graphdb-benchmarks
Conclusions
Neo4J, Cypher DBs, Graph DBs Semantic Web/Triple Stores
Data xchg format
- No official one, just Cypher,
Support for GraphML, RDF
+/- Focus on backing applications
+ Focus on data sharing standards
Data model
+ Relations with properties
- Metadata/schemas/ontologies management
- Relations cannot have properties (reification
required)
+ Metadata/schemas/ontologies as first citizen
and standardised OWL
Performance + complex graph traversals + Comparable in most cases
Query Language
+ Cypher is easier (eg, compact, implicit elems)? -
Expressivity issues (unions)
- No standard QL (but efforts in progress, eg,
OpenCypher)
- SPARQL is Harder? (URIs, namespaces,
verbosity) + SPARQL More expressive
Standardisation,
openness
+/- (TinkerPop is open, Neo4J isn’t)
+ Commercial support
+ More alive and up-to date (e.g., support for
Hadoop, nice Neo4j browser, easy installation)
+ Natively open, many open implementations
- Instability and many short-lived prototypes
- Advancements seems to be slowing down
+ Some nice open and commercial browser
(LODEStar,
Scalability, big data
+/- Commercial support to clustering/clouds for
Neo4J + Open support in TinkerPop
+ Load Balancing/Cluster solutions, Commercial
Cloud support (eg GraphDB) + SPARQL Over
TinkerPop (via SAIL inteface)

More Related Content

What's hot

Inferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL RulesInferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL RulesMatthew Rowe
 
Linked APIs for Life Sciences Tutorial at SWAT4LS 3011
Linked APIs for Life Sciences Tutorial at SWAT4LS 3011Linked APIs for Life Sciences Tutorial at SWAT4LS 3011
Linked APIs for Life Sciences Tutorial at SWAT4LS 3011sspeiser
 
Data Management for librarians
Data Management for librariansData Management for librarians
Data Management for librariansC. Tobin Magle
 
30° Nexa Lunch Seminar - Linked Data Platform vs real world
30° Nexa Lunch Seminar - Linked Data Platform vs real world30° Nexa Lunch Seminar - Linked Data Platform vs real world
30° Nexa Lunch Seminar - Linked Data Platform vs real worldDiego Valerio Camarda
 
Introduction of semantic technology for SAS programmers
Introduction of semantic technology for SAS programmersIntroduction of semantic technology for SAS programmers
Introduction of semantic technology for SAS programmersKevin Lee
 
OEG-Tools for supporting Ontology Engineering
OEG-Tools for supporting Ontology EngineeringOEG-Tools for supporting Ontology Engineering
OEG-Tools for supporting Ontology EngineeringMaría Poveda Villalón
 
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...Araport
 
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015Charlie Hull
 
Data and Donuts: Data organization
Data and Donuts: Data organizationData and Donuts: Data organization
Data and Donuts: Data organizationC. Tobin Magle
 
Towards a Unified PageRank for DBpedia and Wikidata
Towards a Unified PageRank for DBpedia and WikidataTowards a Unified PageRank for DBpedia and Wikidata
Towards a Unified PageRank for DBpedia and WikidataAndreas Thalhammer
 
Vaughn aip walkthru_pag2015
Vaughn aip walkthru_pag2015Vaughn aip walkthru_pag2015
Vaughn aip walkthru_pag2015Araport
 
Knowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaKnowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaPaul Groth
 
SAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
SAFE: Policy Aware SPARQL Query Federation Over RDF Data CubesSAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
SAFE: Policy Aware SPARQL Query Federation Over RDF Data CubesMuhammad Saleem
 
Reproducible research concepts and tools
Reproducible research concepts and toolsReproducible research concepts and tools
Reproducible research concepts and toolsC. Tobin Magle
 
Data Archiving and Sharing
Data Archiving and SharingData Archiving and Sharing
Data Archiving and SharingC. Tobin Magle
 
A guided tour of Araport
A guided tour of AraportA guided tour of Araport
A guided tour of AraportAraport
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod GmodJun Zhao
 
ICAR 2015 Workshop - Blake Meyers
ICAR 2015 Workshop - Blake MeyersICAR 2015 Workshop - Blake Meyers
ICAR 2015 Workshop - Blake MeyersAraport
 

What's hot (20)

Inferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL RulesInferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL Rules
 
Linked APIs for Life Sciences Tutorial at SWAT4LS 3011
Linked APIs for Life Sciences Tutorial at SWAT4LS 3011Linked APIs for Life Sciences Tutorial at SWAT4LS 3011
Linked APIs for Life Sciences Tutorial at SWAT4LS 3011
 
Data Management for librarians
Data Management for librariansData Management for librarians
Data Management for librarians
 
Reproducible research
Reproducible researchReproducible research
Reproducible research
 
30° Nexa Lunch Seminar - Linked Data Platform vs real world
30° Nexa Lunch Seminar - Linked Data Platform vs real world30° Nexa Lunch Seminar - Linked Data Platform vs real world
30° Nexa Lunch Seminar - Linked Data Platform vs real world
 
Introduction of semantic technology for SAS programmers
Introduction of semantic technology for SAS programmersIntroduction of semantic technology for SAS programmers
Introduction of semantic technology for SAS programmers
 
OEG-Tools for supporting Ontology Engineering
OEG-Tools for supporting Ontology EngineeringOEG-Tools for supporting Ontology Engineering
OEG-Tools for supporting Ontology Engineering
 
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
HRGRN: enabling graph search and integrative analysis of Arabidopsis signalin...
 
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015
 
Data and Donuts: Data organization
Data and Donuts: Data organizationData and Donuts: Data organization
Data and Donuts: Data organization
 
Towards a Unified PageRank for DBpedia and Wikidata
Towards a Unified PageRank for DBpedia and WikidataTowards a Unified PageRank for DBpedia and Wikidata
Towards a Unified PageRank for DBpedia and Wikidata
 
Vaughn aip walkthru_pag2015
Vaughn aip walkthru_pag2015Vaughn aip walkthru_pag2015
Vaughn aip walkthru_pag2015
 
Knowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaKnowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPedia
 
SAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
SAFE: Policy Aware SPARQL Query Federation Over RDF Data CubesSAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
SAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
 
Reproducible research concepts and tools
Reproducible research concepts and toolsReproducible research concepts and tools
Reproducible research concepts and tools
 
Data Archiving and Sharing
Data Archiving and SharingData Archiving and Sharing
Data Archiving and Sharing
 
A guided tour of Araport
A guided tour of AraportA guided tour of Araport
A guided tour of Araport
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
 
Converting GHO to RDF
Converting GHO to RDFConverting GHO to RDF
Converting GHO to RDF
 
ICAR 2015 Workshop - Blake Meyers
ICAR 2015 Workshop - Blake MeyersICAR 2015 Workshop - Blake Meyers
ICAR 2015 Workshop - Blake Meyers
 

Similar to Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMiner Use Case

Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitGreg Landrum
 
RDF Join Query Processing with Dual Simulation Pruning
RDF Join Query Processing with Dual Simulation PruningRDF Join Query Processing with Dual Simulation Pruning
RDF Join Query Processing with Dual Simulation Pruningwajrcs
 
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...Rothamsted Research, UK
 
Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013François Belleau
 
Sem tech 2011 v8
Sem tech 2011 v8Sem tech 2011 v8
Sem tech 2011 v8dallemang
 
DockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability WorkshopDockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability WorkshopKevin Crawley
 
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...Lixi Conrads
 
Reproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookReproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookKeiichiro Ono
 
Neo4j_Cypher.pdf
Neo4j_Cypher.pdfNeo4j_Cypher.pdf
Neo4j_Cypher.pdfJaberRad1
 
Life Science Database Cross Search and Metadata
Life Science Database Cross Search and MetadataLife Science Database Cross Search and Metadata
Life Science Database Cross Search and MetadataMaori Ito
 
Processing genetic data at scale
Processing genetic data at scaleProcessing genetic data at scale
Processing genetic data at scaleMark Schroering
 
Distributed approach for Peptide Identification
Distributed approach for Peptide IdentificationDistributed approach for Peptide Identification
Distributed approach for Peptide Identificationabhinav vedanbhatla
 
Adaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of EndpointsAdaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of EndpointsMaribel Acosta Deibe
 
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...VMware Tanzu
 
Metabolic network mapping for metabolomics
Metabolic network mapping for metabolomicsMetabolic network mapping for metabolomics
Metabolic network mapping for metabolomicsDinesh Barupal
 
2020.04.07 automated molecular design and the bradshaw platform webinar
2020.04.07 automated molecular design and the bradshaw platform webinar2020.04.07 automated molecular design and the bradshaw platform webinar
2020.04.07 automated molecular design and the bradshaw platform webinarPistoia Alliance
 
Textmining activities at BioHackathon 2010
Textmining activities at BioHackathon 2010Textmining activities at BioHackathon 2010
Textmining activities at BioHackathon 2010Alberto Labarga
 

Similar to Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMiner Use Case (20)

Knetminer Backend Training, Nov 2018
Knetminer Backend Training, Nov 2018Knetminer Backend Training, Nov 2018
Knetminer Backend Training, Nov 2018
 
Bio2RDF@BH2010
Bio2RDF@BH2010Bio2RDF@BH2010
Bio2RDF@BH2010
 
Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKit
 
RDF Join Query Processing with Dual Simulation Pruning
RDF Join Query Processing with Dual Simulation PruningRDF Join Query Processing with Dual Simulation Pruning
RDF Join Query Processing with Dual Simulation Pruning
 
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
 
Cshl minseqe 2013_ouellette
Cshl minseqe 2013_ouelletteCshl minseqe 2013_ouellette
Cshl minseqe 2013_ouellette
 
Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013
 
Sem tech 2011 v8
Sem tech 2011 v8Sem tech 2011 v8
Sem tech 2011 v8
 
DockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability WorkshopDockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability Workshop
 
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...
 
Reproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookReproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter Notebook
 
Neo4j_Cypher.pdf
Neo4j_Cypher.pdfNeo4j_Cypher.pdf
Neo4j_Cypher.pdf
 
Life Science Database Cross Search and Metadata
Life Science Database Cross Search and MetadataLife Science Database Cross Search and Metadata
Life Science Database Cross Search and Metadata
 
Processing genetic data at scale
Processing genetic data at scaleProcessing genetic data at scale
Processing genetic data at scale
 
Distributed approach for Peptide Identification
Distributed approach for Peptide IdentificationDistributed approach for Peptide Identification
Distributed approach for Peptide Identification
 
Adaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of EndpointsAdaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of Endpoints
 
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
 
Metabolic network mapping for metabolomics
Metabolic network mapping for metabolomicsMetabolic network mapping for metabolomics
Metabolic network mapping for metabolomics
 
2020.04.07 automated molecular design and the bradshaw platform webinar
2020.04.07 automated molecular design and the bradshaw platform webinar2020.04.07 automated molecular design and the bradshaw platform webinar
2020.04.07 automated molecular design and the bradshaw platform webinar
 
Textmining activities at BioHackathon 2010
Textmining activities at BioHackathon 2010Textmining activities at BioHackathon 2010
Textmining activities at BioHackathon 2010
 

More from Rothamsted Research, UK

FAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use CaseFAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use CaseRothamsted Research, UK
 
Interoperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesInteroperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesRothamsted Research, UK
 
AgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with BioschemasAgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with BioschemasRothamsted Research, UK
 
AgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesAgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesRothamsted Research, UK
 
A Preliminary survey of RDF/Neo4j as backends for KnetMiner
A Preliminary survey of RDF/Neo4j as backends for KnetMinerA Preliminary survey of RDF/Neo4j as backends for KnetMiner
A Preliminary survey of RDF/Neo4j as backends for KnetMinerRothamsted Research, UK
 
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...Rothamsted Research, UK
 
graph2tab, a library to convert experimental workflow graphs into tabular for...
graph2tab, a library to convert experimental workflow graphs into tabular for...graph2tab, a library to convert experimental workflow graphs into tabular for...
graph2tab, a library to convert experimental workflow graphs into tabular for...Rothamsted Research, UK
 
myEquivalents, aka a new cross-reference service
myEquivalents, aka a new cross-reference servicemyEquivalents, aka a new cross-reference service
myEquivalents, aka a new cross-reference serviceRothamsted Research, UK
 
BioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS TutorialBioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS TutorialRothamsted Research, UK
 

More from Rothamsted Research, UK (20)

FAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use CaseFAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use Case
 
Interoperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesInteroperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use Cases
 
AgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with BioschemasAgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with Bioschemas
 
Continuos Integration @Knetminer
Continuos Integration @KnetminerContinuos Integration @Knetminer
Continuos Integration @Knetminer
 
Better Data for a Better World
Better Data for a Better WorldBetter Data for a Better World
Better Data for a Better World
 
AgriSchemas Progress Report
AgriSchemas Progress ReportAgriSchemas Progress Report
AgriSchemas Progress Report
 
AgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesAgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use Cases
 
Notes about SWAT4LS 2018
Notes about SWAT4LS 2018Notes about SWAT4LS 2018
Notes about SWAT4LS 2018
 
A Preliminary survey of RDF/Neo4j as backends for KnetMiner
A Preliminary survey of RDF/Neo4j as backends for KnetMinerA Preliminary survey of RDF/Neo4j as backends for KnetMiner
A Preliminary survey of RDF/Neo4j as backends for KnetMiner
 
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
 
graph2tab, a library to convert experimental workflow graphs into tabular for...
graph2tab, a library to convert experimental workflow graphs into tabular for...graph2tab, a library to convert experimental workflow graphs into tabular for...
graph2tab, a library to convert experimental workflow graphs into tabular for...
 
Interoperable Open Data: Which Recipes?
Interoperable Open Data: Which Recipes?Interoperable Open Data: Which Recipes?
Interoperable Open Data: Which Recipes?
 
Linked Data with the EBI RDF Platform
Linked Data with the EBI RDF PlatformLinked Data with the EBI RDF Platform
Linked Data with the EBI RDF Platform
 
BioSD Linked Data: Lessons Learned
BioSD Linked Data: Lessons LearnedBioSD Linked Data: Lessons Learned
BioSD Linked Data: Lessons Learned
 
BioSD Tutorial 2014 Editition
BioSD Tutorial 2014 EdititionBioSD Tutorial 2014 Editition
BioSD Tutorial 2014 Editition
 
myEquivalents, aka a new cross-reference service
myEquivalents, aka a new cross-reference servicemyEquivalents, aka a new cross-reference service
myEquivalents, aka a new cross-reference service
 
Dev 2014 LOD tutorial
Dev 2014 LOD tutorialDev 2014 LOD tutorial
Dev 2014 LOD tutorial
 
BioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS TutorialBioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS Tutorial
 
Semic 2013
Semic 2013Semic 2013
Semic 2013
 
Uk onto net_2013_notes_brandizi
Uk onto net_2013_notes_brandiziUk onto net_2013_notes_brandizi
Uk onto net_2013_notes_brandizi
 

Recently uploaded

Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 

Recently uploaded (20)

Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 

Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMiner Use Case

  • 1. Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMiner Use Case Marco Brandizi marco.brandizi@rothamsted.ac.uk Find these slides at: https://www.slideshare.net/mbrandizi
  • 2. A short story about Gene Knowledge www.knetminer.org
  • 3. <concept> <id>1</id> <pid>Q75WV3</pid> <description/> <elementOf> <idRef>UNIPROTKB-SwissProt</idRef> </elementOf> <ofType> <idRef>Protein</idRef> </ofType> <evidences> <evidence> <idRef>IMPD</idRef> </evidence> </evidences> <conames> <concept_name> <name>Probable trehalose-phosphate phosphatase 1</name> <isPreferred>true</isPreferred> </concept_name> … <cc> <id>Protein</id> <fullname>Protein</fullname> <description> A protein is comprised of one or more Polypeptides and potentially other molecules. </description> <specialisationOf> <idRef>MolCmplx</idRef> </specialisationOf> </cc> <relation> <fromConcept>1</fromConcept> <toConcept>3</toConcept> <ofType> <idRef>participates_in</idRef> </ofType> <evidences> <evidence> <idRef>ECO:0000316</idRef> </evidence> </evidences> <relgds/> </relation> <concept> <id>3</id> <pid>GO:0009651</pid> <description>response to salt stress</description> <ofType><idRef>BioProc</idRef></ofType> <coaccessions> <concept_accession> <accession>GO:0009651</accession> <elementOf><idRef>GO</idRef></elementOf> <ambiguous>false</ambiguous> </concept_accession> </coaccessions> </concept> A short story about Gene Knowledge www.ondex.org
  • 4. A short story about Gene KnowledgeCan we improve? Graph DBs? Query Languages? Open Data? FAIR? Sure! RDF! OWL! Triple Store! SPARQL! Uhm, we’ve tried that, but… I can feel what you mean, but, it’s not so difficult, let me… Look! I’ve seen this Neo4j! It has relations with properties! Uhm… well… yeah, but no data format, bad with ontologies, No URIs/merging… And look how cool a browser! Oh, yes, that’s cool, but maybe not the most important thing…And Cypher is a breeze! Uhm… let me try. Oh, cool, but UNION sucks, and… And has graph algorithms! And devs got the APIs in minutes! Uhm… Are Jena/RDF4J that harder? … … Source: https://digiday.com/uk/weve-created-monster-publishers-vent-ad-tech-frustration
  • 5. Why not Taking the Best of Both Worlds?
  • 10. The rdf2neo in Principle
  • 11. The rdf2neo in Principle
  • 12. The rdf2neo in Principle
  • 14. Comparing Functionality • Data ELT and Integration • See our example: https://github.com/Rothamsted/bioknet- onto/tree/master/examples/bmp_reg_human • Semantic Web is focused on standardised data sharing • Neo4j doesn’t have a data format, focused on backing applications • URI-based merging in RDF • CONSTRUCT-based data transformations in Sem Web (including tools like TARQL) • MATCH/CREATE in Cypher, but not the same • Query languages • Cypher considered compact and simple to learn • SPARQL better at complex graph patterns with branches • Cypher very good at chain patterns
  • 15. Query Performance Details at: https://github.com/Rothamsted/graphdb-benchmark
  • 19. Conclusions • Hybrid architectures might be good at getting the best of both • They’re feasible, performance are acceptable with both technologies • rdf2neo can help you with keeping everything aligned to a conceptual data model • Helps with Linked Data and FAIR Principles • Please checkout GitHub, get in touch (especially if you’re on agriculture/plant biology) • It comes with some overhead. You might need just one half • Whatever you do, follow LOD/FAIR
  • 20. Acknowledgements Ajit Singh Software Engineer Monika Mistry Master Student, Data Curator Keywan Hassani-Pak KnetMiner Team Leader Chris Rawlings Head of Computational & Analytical Sciences William Brown IT Admin
  • 21. And You All! Marco Brandizi marco.brandizi@rothamsted.ac.ukFind these slides at:
  • 22. EXTRA
  • 24. Cypher vs SPARQL Proteins->Reactions->Pathways: // chain of paths, node selection via property (exploits indices) MATCH (prot:Protein) - [csby:consumed_by] -> (:Reaction) - [:part_of] -> (pway:Path{ title: ‘apoptosis’ }) // further conditions, not always so performant WHERE prot.name =~ ‘(?i)^DNA.+’ // Usual projection and post-selection operators RETURN prot.name, pway // Relations can have properties ORDER BY csby.pvalue LIMIT 1000 Proteins->Reactions->Pathways: // Single-path (or same-direction branching) easy to write MATCH (prot:Protein) - [:produced_by|consumed_by] -> (:Reaction) - [:part_of*1..3] -> (pway:Path) RETURN ID(prot), ID(pway) LIMIT 1000 // Very compact forms available, depending on the data MATCH (prot:Protein) - (pway:Path) RETURN pway
  • 25. select distinct ?prot ?pway { where { # Branch 1 ?prot kb:pd_by|kb:cs_by ?react. ?prot a kb:Protein. ?react a kb:Reaction. ?react kb:part_of ?pway. ?pway a kb:Path. } union { # Branch 2 ?prot ^kb:ac_by|kb:is_a ?enz. ?prot a kb:Protein. ?enz a kb:Enzyme. { # Branch 2.1 ?enz kb:ac_by|kb:in_by ?comp. ?comp a kb:Compound. ?comp kb:cs_by|kb:pd_by ?trns ?trns a kb:Transport } union { # Branch 2.2 ?enz ^kb:ca_by ?trns. ?comp a kb:Compound. ?trns a kb:Transport } ?trns kb:part_of ?pway. ?pway a kb:Path. } } LIMIT 1000 Cypher vs SPARQL
  • 26. Loading Performance Details at: https://github.com/Rothamsted/graphdb-benchmarks
  • 27. Conclusions Neo4J, Cypher DBs, Graph DBs Semantic Web/Triple Stores Data xchg format - No official one, just Cypher, Support for GraphML, RDF +/- Focus on backing applications + Focus on data sharing standards Data model + Relations with properties - Metadata/schemas/ontologies management - Relations cannot have properties (reification required) + Metadata/schemas/ontologies as first citizen and standardised OWL Performance + complex graph traversals + Comparable in most cases Query Language + Cypher is easier (eg, compact, implicit elems)? - Expressivity issues (unions) - No standard QL (but efforts in progress, eg, OpenCypher) - SPARQL is Harder? (URIs, namespaces, verbosity) + SPARQL More expressive Standardisation, openness +/- (TinkerPop is open, Neo4J isn’t) + Commercial support + More alive and up-to date (e.g., support for Hadoop, nice Neo4j browser, easy installation) + Natively open, many open implementations - Instability and many short-lived prototypes - Advancements seems to be slowing down + Some nice open and commercial browser (LODEStar, Scalability, big data +/- Commercial support to clustering/clouds for Neo4J + Open support in TinkerPop + Load Balancing/Cluster solutions, Commercial Cloud support (eg GraphDB) + SPARQL Over TinkerPop (via SAIL inteface)

Editor's Notes

  1. Let me start from a little story…
  2. Eventually, it’s not really a tug of war, it’s that we realised they’re complementary.
  3. So, why not taking the best of the two worlds Hopefully less controversial than that…
  4. We have done some formal modelling of the mapping procedure Main results are: 1) it works as expected 2) Computational complexity is no worse than SPARQL, which is known to be constrainable into LOGSPACE.
  5. Loading is scalable and fairly OK, skipped here In querying, Both have comparable performance. Single use cases, they’re complementary again.
  6. Virtuoso is better with queries involving distant subgraphs (traversal not possible) and with complex branching, based on (nested) UNIONs. In the latter case, SPARQL looks easier to write (especially with nested patterns), though OpenCypher promised subqueries
  7. Virtuoso is better with queries involving distant subgraphs (traversal not possible) and with complex branching, based on (nested) UNIONs. In the latter case, SPARQL looks easier to write (especially with nested patterns), though OpenCypher promised subqueries