Open data with Neo4j and Kotlin

How to prepare, import, query and analyze an open-data healthcare dataset with Neo4j and Kotlin.

  1. 1. Open data with Neo4j: From ideation to production
  2. 2. Our (fictional) customer: investigative journalist, specializes in health-related scandals, nominated for the Pulitzer Prize in 2017
  3. 3. A few scandals over the years
  4. 4. A few scandals over the years
  5. 5. A few scandals over the years
  6. 6. The Customer and US
  7. 7. Scoping - MVP EMERGENCE As a journalist, I need to quickly find people to interview who are related to a particular health product. For example: Who are the managers of pharmaceutical labs producing a faulty drug? Which health professionals are most influenced by these labs? Who are the patient’s relatives, friends, colleagues...?
  8. 8. Backlog ● Find the address of a lab ● Find labs that own a specific drug ● Find health professionals related to/influenced by labs ● Find health professionals the most influenced by labs within a year ● Find patients related to health professionals ● Find patients’ relatives, friends, colleagues ● ...
  9. 9. Backlog ● Find the address of a lab ● Find labs that own a specific drug ● Find health professionals related to/influenced by labs ● Find health professionals the most influenced by labs within a year ● Find patients related to health professionals ● Find patients’ relatives, friends, colleagues ● ...
  10. 10. Data sources ? Public data of gifts by pharmaceutical labs to health professionals
  11. 11. ETALAB - Data source schema
  12. 12. Pharmaceutical Lab sub-graph
  13. 13. Technical Stakeholder interview: “Why would anyone use a graph database? We are using Oracle 12c!”
  14. 14. DETOUR : Relational VS graph database
  15. 15. Technical Stakeholder interview: “NEO4J INC. IS LIKE NOSQL, IT HAS NO FUTURE, RIGHT?”
  16. 16. Neo4j: leading graph database for more than 10 years! Early milestones (timeline marks 2000, 2002, 2007, 2010): performance issues with document management systems ● first graph library prototypes ● development of the first version of Neo4j ● Neo Technology is created ● Neo4j 1.0 is out ● headquarters moved to Silicon Valley. 2013: Neo4j 2.0 (labels added to the graph model, Neo4j Browser reworked). 2016: Neo4j 3.0 (Bolt protocol, Cypher extensions). 2017: Neo4j 3.3 (Neo Technology -> Neo4j Inc., Neo4j Desktop with Enterprise Edition).
  17. 17. Technical Stakeholder interview: “But then, why Neo4j and NOT another graph database?”
  18. 18. DETOUR: NATIVE GRAPH DATABASE [diagram: example property graph with a :Person:Speaker node (first_name: Marouane, age: 30, shoe_size: 42), a :Person:Org node (first_name: Hanae) and a :Conference node (name: Devoxx Morocco), linked by two ATTENDS relationships (one with since: 2015) and one EMAILED relationship]
  19. 19. DETOUR: NATIVE GRAPH DATABASE [diagram: the same example graph, with the labels :Person, :Org, :Conference and :Speaker drawn separately from the node properties]
  20. 20. DETOUR: NATIVE GRAPH DATABASE [diagram: the same example graph as on the previous slide]
  21. 21. DETOUR: NATIVE GRAPH DATABASE [diagram: the same example graph; a relationship record is annotated with START NODE (SN), END NODE (EN), SN PrevRel = ∅ and SN NextRel]
  22. 22. DETOUR: NATIVE GRAPH DATABASE [diagram: the same example graph; the relationship record is annotated with START NODE (SN), END NODE (EN), SN PrevRel = ∅, SN NextRel, EN PrevRel and EN NextRel]
  23. 23. DETOUR: NATIVE GRAPH DATABASE [diagram: the same annotated relationship record (SN, EN, SN PrevRel = ∅, SN NextRel, EN PrevRel, EN NextRel)] Index-free adjacency: every co-located piece of data in the graph is co-located on the disk
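The relationship-chain idea behind the slides above can be sketched as a toy in-memory model. This is only an illustration under simplifying assumptions, not Neo4j's actual store format: the class and function names (`Node`, `Rel`, `connect`, `relationshipsOf`) are hypothetical, only the "next" pointers are kept (the slides also show "previous" pointers), and self-loops are left out. The point it tries to show is that a node's relationships are reached by following pointers from the node itself, without an index lookup.

```kotlin
// Toy model: each node points at the head of its relationship chain,
// and each relationship points at the next one for its start and end node.
class Node(val name: String) {
    var firstRel: Rel? = null
}

class Rel(val type: String, val start: Node, val end: Node) {
    var startNext: Rel? = null // next relationship in the start node's chain
    var endNext: Rel? = null   // next relationship in the end node's chain
}

// Creates a relationship and pushes it at the head of both nodes' chains.
fun connect(start: Node, type: String, end: Node): Rel {
    require(start !== end) { "self-loops are left out of this toy model" }
    val rel = Rel(type, start, end)
    rel.startNext = start.firstRel
    start.firstRel = rel
    rel.endNext = end.firstRel
    end.firstRel = rel
    return rel
}

// Walks the chain of one node by following pointers only.
fun relationshipsOf(node: Node): List<Rel> {
    val found = mutableListOf<Rel>()
    var current = node.firstRel
    while (current != null) {
        found += current
        // Pick the pointer that belongs to this node's chain.
        current = if (current.start === node) current.startNext else current.endNext
    }
    return found
}
```

With the example graph from the slides (the direction of EMAILED is assumed here), calling `connect(marouane, "ATTENDS", devoxx)`, `connect(hanae, "ATTENDS", devoxx)` and `connect(hanae, "EMAILED", marouane)` lets `relationshipsOf(devoxx)` return both ATTENDS relationships by pointer chasing alone.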
  24. 24. Technical Stakeholder interview
  25. 25. Pharmaceutical Lab sub-graph
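  26. 26. Import - options ● LOAD CSV in Cypher (~= SQL for Neo4j) ● UNWIND in Cypher ● ETL ● APOC ● Cypher shell ● ...
  27. 27. Cypher Crash course [diagram: two nodes with the labels Person and Conf, connected by a relationship of TYPE ATTENDS; properties are key/value pairs (k1: v1, k2: v2)]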
  28. 28. Cypher Crash course - PATTERN MATCHING
  29. 29. Cypher Crash course - PATTERN MATCHING
  30. 30. Cypher Crash course - PATTERN MATCHING
  31. 31. Cypher Crash course - PATTERN MATCHING
  32. 32. Cypher Crash course - PATTERN MATCHING
  33. 33. Cypher Crash course - READ queries [MATCH WHERE] [OPTIONAL MATCH WHERE] [WITH [ORDER BY] [SKIP] [LIMIT]] RETURN [ORDER BY] [SKIP] [LIMIT]
  34. 34. MATCH (c:Conf) RETURN c Cypher Crash course - READ queries
  35. 35. MATCH (c:Conf {name: 'Devoxx Morocco'}) RETURN c Cypher Crash course - READ queries
  36. 36. MATCH (c:Conf) WHERE c.name ENDS WITH 'Morocco' RETURN c Cypher Crash course - READ queries
  37. 37. MATCH (s:Speaker) OPTIONAL MATCH (s)-[:TALKED_AT]->(c:Conf) WHERE c.name STARTS WITH 'Devoxx' RETURN s Cypher Crash course - READ queries
  38. 38. MATCH (p1:Player)-[:PLAYED]->(g:Game), (p1)-[:IN_TEAM]->(t:Team)<-[:IN_TEAM]-(p2:Player) WITH p1, COUNT(g) AS games, COLLECT(p2) AS teammates WHERE games > 100 AND ANY(p IN teammates WHERE p.name = 'Hadji') RETURN p1 Cypher Crash course - READ queries
  39. 39. (CREATE | MERGE) [SET|DELETE|REMOVE|FOREACH] [RETURN [ORDER BY] [SKIP] [LIMIT]] Cypher Crash course - write queries
  40. 40. CREATE (c:Conf {name: 'Devoxx Morocco'}) Cypher Crash course - write queries
  41. 41. MATCH (c:Conference {name: 'GraphConnect'}), (s:Speaker {name: 'Michael'}) MERGE (s)-[l:LOVES]->(c) ON CREATE SET l.how_much = 'very much' Cypher Crash course - write queries
  42. 42. MATCH (s:Speaker {name: 'Michael'}) REMOVE s.surname Cypher Crash course - write queries
  43. 43. MATCH (s:Speaker {name: 'Michael'}) DETACH DELETE s Cypher Crash course - write queries
  44. 44. MATCH (n) DETACH DELETE n Cypher Crash course - write queries
  45. 45. LAB IMPORT - TDD style <dependency> <groupId>org.neo4j.driver</groupId> <artifactId>neo4j-java-driver</artifactId> </dependency> <dependency> <groupId>org.neo4j.test</groupId> <artifactId>neo4j-harness</artifactId> <scope>test</scope> </dependency>
  46. 46. class MyClassTest { @get:Rule val graphDb = Neo4jRule() @Test fun `some interesting test`() { val subject = MyClass(graphDb.boltURI().toString()) subject.importDataset("/dataset.csv") graphDb.graphDatabaseService.execute("MATCH (s:Something) RETURN s").use { assertThat(it) // ... } } } LAB IMPORT - TDD style - Test skeleton
  47. 47. LAB IMPORT - TDD style - companies.csv
      identifiant,pays_code,pays,secteur_activite_code,secteur,denomination_sociale,adresse_1,adresse_2,adresse_3,adresse_4,code_postal,ville
      QBSTAWWV,[FR],FRANCE,[PA],Prestataires associés,IP Santé domicile,16 Rue de Montbrillant,Buroparc Rive Gauche,"","",69003,LYON
      MQKQLNIC,[FR],FRANCE,[DM],Dispositifs médicaux,SIGVARIS,ZI SUD D'ANDREZIEUX,RUE B. THIMONNIER,"","",42173,SAINT-JUST SAINT-RAMBERT CEDEX
      OETEUQSP,[FR],FRANCE,[AUT],Autres,HEALTHCARE COMPLIANCE CONSULTING FRANCE SAS,47 BOULEVARD CHARLES V,"","","",14600,HONFLEUR
      FRQXZIGY,[FR],FRANCE,[MED],Médicaments,SANOFI PASTEUR MSD SNC,162 avenue Jean Jaurès,"","","",69007,Lyon
      GXIVOHBB,[FR],FRANCE,[PA],Prestataires associés,ISIS DIABETE,10-16 RUE DU COLONEL ROL TANGUY,ZAC DU BOIS MOUSSAY,"","",93240,STAINS
      ZQKPAZKB,[FR],FRANCE,[PA],Prestataires associés,CREAFIRST,8 Rue de l'Est,"","","",92100,BOULOGNE BILLANCOURT
      GEJLGPVD,[US],ÉTATS-UNIS,[DM],Dispositifs médicaux,Nobel Biocare USA LLC,800 Corporate Drive,"","","",07430,MAHWAH
      XSQKIAGK,[FR],FRANCE,[DM],Dispositifs médicaux,Cook France SARL,2 Rue due Nouveau Bercy,"","","",94227,Charenton Le Pont Cedex
      ARHHJTWT,[FR],FRANCE,[DM],Dispositifs médicaux,EYETECHCARE,2871 Avenue de l'Europe,"","","",69140,RILLIEUX-LA-PAPE
  48. 48. @Test fun `imports countries of companies`() { newReader("/companies.csv").use { subject.import(it) } graphDb.graphDatabaseService.execute( "MATCH (country:Country) " + "RETURN country {.code, .name} " + "ORDER BY country.code ASC").use { assertThat(it).containsExactly( row("country", mapOf(Pair("code", "[FR]"), Pair("name", "FRANCE"))), row("country", mapOf(Pair("code", "[US]"), Pair("name", "ÉTATS-UNIS"))) ) } assertThat(commitCounter.getCount()).isEqualTo(1) } LAB IMPORT - TDD style - COUNTRIES
  49. 49. session.run(""" UNWIND {rows} AS row MERGE (country:Country {code: row.country_code}) ON CREATE SET country.name = row.country_name """.trimMargin(), mapOf(Pair("rows", rows))) LAB IMPORT - TDD style - COUNTRIES
  50. 50. @Test fun `imports cities`() { newReader("/companies.csv").use { subject.import(it) } graphDb.graphDatabaseService.execute( "MATCH (city:City) " + "RETURN city {.name} " + "ORDER BY city.name ASC").use { assertThat(it).containsExactly( row("city", mapOf(Pair("name", "BOULOGNE BILLANCOURT"))), row("city", mapOf(Pair("name", "CHARENTON LE PONT CEDEX"))), row("city", mapOf(Pair("name", "HONFLEUR"))), row("city", mapOf(Pair("name", "LYON"))), row("city", mapOf(Pair("name", "MAHWAH"))), row("city", mapOf(Pair("name", "RILLIEUX-LA-PAPE"))), row("city", mapOf(Pair("name", "SAINT-JUST SAINT-RAMBERT CEDEX"))), row("city", mapOf(Pair("name", "STAINS"))) ) } assertThat(commitCounter.getCount()).isEqualTo(1) } LAB IMPORT - TDD style - CITIES
  51. 51. session.run(""" UNWIND {rows} AS row MERGE (country:Country {code: row.country_code}) ON CREATE SET country.name = row.country_name MERGE (city:City {name: row.city_name}) """.trimMargin(), mapOf(Pair("rows", rows))) LAB IMPORT - TDD style - CITIES
  52. 52. @Test fun `imports city|country links`() { newReader("/companies.csv").use { subject.import(it) } graphDb.graphDatabaseService.execute( "MATCH (city:City)-[:LOCATED_IN_COUNTRY]->(country:Country) " + "RETURN country {.code}, city {.name} " + "ORDER BY city.name ASC").use { assertThat(it).containsExactly( mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "B[...]")))), mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "C[...]")))), mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "H[...]")))), mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "LYON")))), mapOf(Pair("country", mapOf(Pair("code", "[US]"))), Pair("city", mapOf(Pair("name", "MAHWAH")))), mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "R[...]")))), mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "S[...]")))), mapOf(Pair("country", mapOf(Pair("code", "[FR]"))), Pair("city", mapOf(Pair("name", "STAINS")))) ) } assertThat(commitCounter.getCount()).isEqualTo(1) } LAB IMPORT - TDD style - COUNTRIES-[]-Cities
  53. 53. session.run(""" UNWIND {rows} AS row MERGE (country:Country {code: row.country_code}) ON CREATE SET country.name = row.country_name MERGE (city:City {name: row.city_name}) MERGE (city)-[:LOCATED_IN_COUNTRY]->(country) """.trimMargin(), mapOf(Pair("rows", rows))) LAB IMPORT - TDD style - COUNTRIES-[]-Cities
  54. 54. @Test fun `imports addresses`() { newReader("/companies.csv").use { subject.import(it) } graphDb.graphDatabaseService.execute( "MATCH (address:Address) " + "RETURN address {.address} ").use { assertThat(it).containsOnlyOnce( row("address", mapOf(Pair("address", "16 RUE DE MONTBRILLANT\nBUROPARC RIVE GAUCHE"))), row("address", mapOf(Pair("address", "ZI SUD D'ANDREZIEUX\nRUE B. THIMONNIER"))), row("address", mapOf(Pair("address", "47 BOULEVARD CHARLES V"))), row("address", mapOf(Pair("address", "162 AVENUE JEAN JAURÈS"))), row("address", mapOf(Pair("address", "10-16 RUE DU COLONEL ROL TANGUY\nZAC DU BOIS MOUSSAY"))), row("address", mapOf(Pair("address", "8 RUE DE L'EST"))), row("address", mapOf(Pair("address", "800 CORPORATE DRIVE"))), row("address", mapOf(Pair("address", "2 RUE DUE NOUVEAU BERCY"))), row("address", mapOf(Pair("address", "2871 AVENUE DE L'EUROPE"))) ) } assertThat(commitCounter.getCount()).isEqualTo(1) } LAB IMPORT - TDD style - ADDRESSES
  55. 55. session.run(""" UNWIND {rows} AS row MERGE (country:Country {code: row.country_code}) ON CREATE SET country.name = row.country_name MERGE (city:City {name: row.city_name}) MERGE (city)-[:LOCATED_IN_COUNTRY]->(country) MERGE (address:Address {address: row.address}) """.trimMargin(), mapOf(Pair("rows", rows))) LAB IMPORT - TDD style - ADDRESSES
  56. 56. @Test fun `imports address|city links`() { newReader("/companies.csv").use { subject.import(it) } graphDb.graphDatabaseService.execute( "MATCH (address:Address)-[location:LOCATED_IN_CITY]->(city:City) " + "RETURN location {.zipcode}, city {.name}, address {.address} " + "ORDER BY location.zipcode ASC").use { assertThat(it).containsOnlyOnce( mapOf( Pair("location", mapOf(Pair("zipcode", "07430"))), Pair("city", mapOf(Pair("name", "MAHWAH"))), Pair("address", mapOf(Pair("address", "800 CORPORATE DRIVE"))) ) //, [...] ) } assertThat(commitCounter.getCount()) .overridingErrorMessage("Expected 1 commit") .isEqualTo(1) } LAB IMPORT - TDD style - ADDRESSES-[]-CITIES
  57. 57. session.run(""" UNWIND {rows} AS row MERGE (country:Country {code: row.country_code}) ON CREATE SET country.name = row.country_name MERGE (city:City {name: row.city_name}) MERGE (city)-[:LOCATED_IN_COUNTRY]->(country) MERGE (address:Address {address: row.address}) MERGE (address)-[:LOCATED_IN_CITY {zipcode: row.zipcode}]->(city) """.trimMargin(), mapOf(Pair("rows", rows))) LAB IMPORT - TDD style - ADDRESSES-[]-CITIES
  58. 58. @Test fun `imports business segments`() { newReader("/companies.csv").use { subject.import(it) } graphDb.graphDatabaseService.execute( "MATCH (segment:BusinessSegment) " + "RETURN segment {.code, .label} " + "ORDER BY segment.code ASC").use { assertThat(it).containsOnlyOnce( row("segment", mapOf(Pair("code", "[AUT]"), Pair("label", "AUTRES"))), row("segment", mapOf(Pair("code", "[DM]"), Pair("label", "DISPOSITIFS MÉDICAUX"))), row("segment", mapOf(Pair("code", "[MED]"), Pair("label", "MÉDICAMENTS"))), row("segment", mapOf(Pair("code", "[PA]"), Pair("label", "PRESTATAIRES ASSOCIÉS"))) ) } assertThat(commitCounter.getCount()) .overridingErrorMessage("Expected 1 commit") .isEqualTo(1) } LAB IMPORT - TDD style - business segment
  59. 59. session.run(""" UNWIND {rows} AS row MERGE (country:Country {code: row.country_code}) ON CREATE SET country.name = row.country_name MERGE (city:City {name: row.city_name}) MERGE (city)-[:LOCATED_IN_COUNTRY]->(country) MERGE (address:Address {address: row.address}) MERGE (address)-[:LOCATED_IN_CITY {zipcode: row.zipcode}]->(city) MERGE (segment:BusinessSegment { code: row.segment_code, label: row.segment_label}) """.trimMargin(), mapOf(Pair("rows", rows))) LAB IMPORT - TDD style - business segment
  60. 60. @Test fun `imports companies`() { newReader("/companies.csv").use { subject.import(it) } graphDb.graphDatabaseService.execute( "MATCH (company:Company) " + "RETURN company {.identifier, .name} " + "ORDER BY company.identifier ASC").use { assertThat(it).containsOnlyOnce( row("company", mapOf(Pair("identifier", "ARHHJTWT"), Pair("name", "EYETECHCARE"))), row("company", mapOf(Pair("identifier", "FRQXZIGY"), Pair("name", "SANOFI PASTEUR MSD SNC"))), row("company", mapOf(Pair("identifier", "GEJLGPVD"), Pair("name", "NOBEL BIOCARE USA LLC"))), row("company", mapOf(Pair("identifier", "GXIVOHBB"), Pair("name", "ISIS DIABETE"))), // [...] row("company", mapOf(Pair("identifier", "ZQKPAZKB"), Pair("name", "CREAFIRST"))) ) } assertThat(commitCounter.getCount()) .overridingErrorMessage("Expected 1 commit") .isEqualTo(1) } LAB IMPORT - TDD style - companies
  61. 61. session.run(""" UNWIND {rows} AS row MERGE (country:Country {code: row.country_code}) ON CREATE SET country.name = row.country_name MERGE (city:City {name: row.city_name}) MERGE (city)-[:LOCATED_IN_COUNTRY]->(country) MERGE (address:Address {address: row.address}) MERGE (address)-[:LOCATED_IN_CITY {zipcode: row.zipcode}]->(city) MERGE (segment:BusinessSegment { code: row.segment_code, label: row.segment_label}) MERGE (company:Company {identifier: row.company_id, name: row.company_name}) """.trimMargin(), mapOf(Pair("rows", rows))) LAB IMPORT - TDD style - companies
  62. 62. @Test fun `imports address|company|business segment`() { newReader("/companies.csv").use { subject.import(it) } graphDb.graphDatabaseService.execute( "MATCH (segment:BusinessSegment)<-[:IN_BUSINESS_SEGMENT]-(company:Company)-[:LOCATED_AT_ADDRESS]->(address:Address) " + "RETURN company {.identifier}, segment {.code}, address {.address} " + "ORDER BY company.identifier ASC").use { assertThat(it).containsOnlyOnce( mapOf( Pair("company", mapOf(Pair("identifier", "ARHHJTWT"))), Pair("segment", mapOf(Pair("code", "[DM]"))), Pair("address", mapOf(Pair("address", "2871 AVENUE DE L'EUROPE"))) ) // [...] ) } assertThat(commitCounter.getCount()) .overridingErrorMessage("Expected 1 commit") .isEqualTo(1) } LAB IMPORT - TDD style - addresses-[]-companies-[]-business segment
  63. 63. session.run(""" UNWIND {rows} AS row MERGE (country:Country {code: row.country_code}) ON CREATE SET country.name = row.country_name MERGE (city:City {name: row.city_name}) MERGE (city)-[:LOCATED_IN_COUNTRY]->(country) MERGE (address:Address {address: row.address}) MERGE (address)-[:LOCATED_IN_CITY {zipcode: row.zipcode}]->(city) MERGE (segment:BusinessSegment { code: row.segment_code, label: row.segment_label}) MERGE (company:Company {identifier: row.company_id, name: row.company_name}) MERGE (company)-[:IN_BUSINESS_SEGMENT]->(segment) MERGE (company)-[:LOCATED_AT_ADDRESS]->(address) """.trimMargin(), mapOf(Pair("rows", rows))) LAB IMPORT - TDD style - addresses-[]-companies-[]-business segment
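The slides do not show how each CSV line becomes an entry of the `rows` parameter, so here is a minimal sketch of one way it might be done. The key names (`country_code`, `city_name`, `address`, ...) are taken from the Cypher queries above; everything else is an assumption: the function name `toRow` is hypothetical, the comma split is naive (it assumes no field contains a comma, which holds for the sample file), and the uppercasing and newline-joined address lines are inferred from the test expectations.

```kotlin
// Hypothetical helper: turns one line of companies.csv into a parameter map.
// Assumes 12 comma-separated columns and no commas inside fields.
fun toRow(line: String): Map<String, String> {
    val fields = line.split(",").map { it.trim().removeSurrounding("\"") }
    require(fields.size == 12) { "expected 12 columns, got ${fields.size}" }

    // adresse_1..adresse_4: drop empty parts, join the rest with newlines.
    val address = fields.subList(6, 10)
        .filter { it.isNotEmpty() }
        .joinToString("\n")
        .uppercase()

    return mapOf(
        "company_id" to fields[0],
        "country_code" to fields[1],
        "country_name" to fields[2],
        "segment_code" to fields[3],
        "segment_label" to fields[4].uppercase(),
        "company_name" to fields[5].uppercase(),
        "address" to address,
        "zipcode" to fields[10],
        "city_name" to fields[11].uppercase()
    )
}
```

Normalizing case before the `MERGE` is what would let "LYON" and "Lyon" from the sample file collapse into a single City node, as the cities test expects. A real import would more likely rely on a proper CSV parser than on `split`.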
  64. 64. @Test fun `batches commits`() { newReader("/companies.csv").use { subject.import(it, commitPeriod = 2) } assertThat(commitCounter.getCount()) .overridingErrorMessage("Expected 5 batched commits.") .isEqualTo(5) } LAB IMPORT - TDD style - batch import
  65. 65. class CommitCounter : TransactionEventHandler<Any?> { private val count = AtomicInteger(0) override fun afterRollback(p0: TransactionData?, p1: Any?) {} override fun beforeCommit(p0: TransactionData?): Any? = null override fun afterCommit(p0: TransactionData?, p1: Any?) { count.incrementAndGet() } fun getCount(): Int = count.get() fun reset() = count.set(0) } LAB IMPORT - TDD style - batch import
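The batching behaviour checked by the `batches commits` test can be sketched without a database. This is an assumption about how `commitPeriod` might be honoured, not the authors' implementation: `importInBatches` and its `commit` callback are hypothetical names, and the callback stands in for running the UNWIND query inside one transaction.

```kotlin
// Splits the rows into chunks of at most commitPeriod items and calls
// commit once per chunk. Returns the number of commits performed.
fun <T> importInBatches(
    rows: List<T>,
    commitPeriod: Int,
    commit: (List<T>) -> Unit
): Int {
    require(commitPeriod > 0) { "commitPeriod must be positive" }
    var commits = 0
    rows.chunked(commitPeriod).forEach { batch ->
        commit(batch) // stand-in for one transaction running the import query
        commits++
    }
    return commits
}
```

With the 9 data rows of companies.csv and `commitPeriod = 2`, `chunked` yields four batches of two rows and one batch of one row, which matches the 5 commits the test expects.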
  66. 66. Backlog ● Find the address of a lab ● Find labs that own a specific drug ● Find health professionals related to/influenced by labs ● Find health professionals the most influenced by labs within a year ● Find patients related to health professionals ● Find patients’ relatives, friends, colleagues ● ...
  67. 67. Data sources
  68. 68. data sources - PROBLEM ? Lab name mismatch >_<
  69. 69. data sources - String matching option ™
  70. 70. data sources - Stack Overflow-DRIVEN DEVELOPMENT !
  71. 71. Sørensen–Dice coefficient
  72. 72. Sørensen–Dice coefficient “bois vert” -> “bo”, “oi”, “is”, “ve”, “er”, “rt” (6 pairs) “bois ça” -> “bo”, “oi”, “is”, “ça” (4 pairs) 3 pairs in common: 2 * 3 / (6 + 4) = 60% similarity
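The arithmetic on this slide can be reproduced with a short standalone sketch. It is not the deck's `strings.similarity` function: the names `bigrams` and `diceSimilarity` are hypothetical, pairs are built per whitespace-separated word (as in the slide's example), and lowercasing is an assumed normalization step.

```kotlin
// Character pairs (bigrams) of each whitespace-separated word, lowercased.
fun bigrams(input: String): List<String> =
    input.lowercase()
        .split(Regex("\\s+"))
        .filter { it.isNotEmpty() }
        .flatMap { word -> word.windowed(size = 2) }

// Sørensen–Dice coefficient: 2 * |common pairs| / (|pairs of a| + |pairs of b|).
fun diceSimilarity(a: String, b: String): Double {
    val pairsA = bigrams(a)
    val pairsB = bigrams(b).toMutableList()
    if (pairsA.isEmpty() && pairsB.isEmpty()) return 1.0
    val total = pairsA.size + pairsB.size
    var matches = 0
    for (pair in pairsA) {
        // Each pair of b can be matched only once.
        if (pairsB.remove(pair)) matches++
    }
    return 2.0 * matches / total
}
```

For “bois vert” and “bois ça” this finds 3 common pairs out of 6 + 4, so `diceSimilarity` returns 2 * 3 / 10 = 0.6, the 60% shown on the slide.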
  73. 73. CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+) Publishing an extension 101 ● Write the extension in any JVM language (Java, Scala, Kotlin…) ● Package a JAR ● Deploy the JAR to your Neo4j server: $NEO4J_HOME/plugins
  74. 74. CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+) Publishing an extension 101 ● Write the extension in any JVM language (Java, Scala, Kotlin…) ● Package a JAR ● Deploy the JAR to your Neo4j server: $NEO4J_HOME/plugins
  75. 75. CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+) class MyFunction { @UserFunction(name = "my.function") fun doSomethingAwesome(@Name("input1") input1: String, @Name("input2") input2: String): Double { // do something awesome... } }
  76. 76. CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+) In Java (Maven) <dependency> <groupId>org.neo4j</groupId> <artifactId>procedure-compiler</artifactId> <version>${neo4j.version}</version> </dependency> In Kotlin (Maven) <plugin> <groupId>org.jetbrains.kotlin</groupId> <artifactId>kotlin-maven-plugin</artifactId> <version>${kotlin.version}</version> <configuration> <annotationProcessorPaths> <annotationProcessorPath> <groupId>org.neo4j</groupId> <artifactId>procedure-compiler</artifactId> <version>${neo4j.version}</version> </annotationProcessorPath> </annotationProcessorPaths> </configuration> <executions> <execution><id>compile-annotations</id> <goals><goal>kapt</goal></goals> </execution> </executions> </plugin> https://bit.ly/safer-neo4j-extensions
  77. 77. CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+) @UserFunction(name = "strings.similarity") fun computeSimilarity(@Name("input1") input1: String, @Name("input2") input2: String): Double { if (input1 == input2) return totalMatch val whitespace = Regex("\\s+") val words1 = normalizedWords(input1, whitespace) val words2 = normalizedWords(input2, whitespace) if (words1 == words2) return totalMatch val matchCount = AtomicInteger(0) val initialPairs1 = allPairs(words1) val initialPairs2 = allPairs(words2) val pairs2 = initialPairs2.toMutableList() initialPairs1.forEach { val pair1 = it val matchIndex = pairs2.indexOfFirst { it == pair1 } if (matchIndex > -1) { matchCount.incrementAndGet() pairs2.removeAt(matchIndex) return@forEach } } return 2.0 * matchCount.get() / (initialPairs1.size + initialPairs2.size) }
  78. 78. CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+) @UserFunction(name = "strings.similarity") fun computeSimilarity(@Name("input1") input1: String, @Name("input2") input2: String): Double { if (input1 == input2) return totalMatch val whitespace = Regex("\\s+") val words1 = normalizedWords(input1, whitespace) val words2 = normalizedWords(input2, whitespace) if (words1 == words2) return totalMatch val matchCount = AtomicInteger(0) val initialPairs1 = allPairs(words1) val initialPairs2 = allPairs(words2) val pairs2 = initialPairs2.toMutableList() initialPairs1.forEach { val pair1 = it val matchIndex = pairs2.indexOfFirst { it == pair1 } if (matchIndex > -1) { matchCount.incrementAndGet() pairs2.removeAt(matchIndex) return@forEach } } return 2.0 * matchCount.get() / (initialPairs1.size + initialPairs2.size) } 83% of matches!
  79. 79. detour - neo4j Rule and user-defined functions @get:Rule val graphDb = Neo4jRule() .withFunction( StringSimilarityFunction::class.java )
  80. 80. Drug import session.run(""" UNWIND {rows} as row MERGE (drug:Drug {cisCode: row.cisCode}) ON CREATE SET drug.name = row.drugName WITH drug, row UNWIND row.labNames AS labName """.trimIndent(), mapOf(Pair("rows", rows), Pair("threshold", labNameSimilarity)))
  81. 81. session.run(""" UNWIND {rows} as row MERGE (drug:Drug {cisCode: row.cisCode}) ON CREATE SET drug.name = row.drugName WITH drug, row UNWIND row.labNames AS labName MATCH (lab:Company) WITH drug, lab, labName, strings.similarity(labName, lab.name) AS similarity """.trimIndent(), mapOf(Pair("rows", rows), Pair("threshold", labNameSimilarity))) Drug import
  82. 82. session.run(""" UNWIND {rows} as row MERGE (drug:Drug {cisCode: row.cisCode}) ON CREATE SET drug.name = row.drugName WITH drug, row UNWIND row.labNames AS labName MATCH (lab:Company) WITH drug, lab, labName, strings.similarity(labName, lab.name) AS similarity WITH drug, CASE WHEN similarity > {threshold} THEN lab ELSE NULL END AS lab, labName ORDER BY similarity DESC WITH drug, labName, HEAD(COLLECT(lab)) AS lab """.trimIndent(), mapOf(Pair("rows", rows), Pair("threshold", labNameSimilarity))) Drug import
  83. 83. session.run(""" UNWIND {rows} as row MERGE (drug:Drug {cisCode: row.cisCode}) ON CREATE SET drug.name = row.drugName WITH drug, row UNWIND row.labNames AS labName MATCH (lab:Company) WITH drug, lab, labName, strings.similarity(labName, lab.name) AS similarity WITH drug, CASE WHEN similarity > {threshold} THEN lab ELSE NULL END AS lab, labName ORDER BY similarity DESC WITH drug, labName, HEAD(COLLECT(lab)) AS lab FOREACH (ignored IN CASE WHEN lab IS NOT NULL THEN [1] ELSE [] END | MERGE (lab)<-[:DRUG_HELD_BY]-(drug)) FOREACH (ignored IN CASE WHEN lab IS NULL THEN [1] ELSE [] END | MERGE (fallback:Company:Ansm {name: labName}) MERGE (fallback)<-[:DRUG_HELD_BY]-(drug) )""".trimIndent(), mapOf(Pair("rows", rows), Pair("threshold", labNameSimilarity))) Drug import
  84. 84. CYPHER TRICKS - FOREACH as poor man’s IF FOREACH (ignored IN CASE WHEN lab IS NOT NULL THEN [1] ELSE [] END | MERGE (lab)<-[:DRUG_HELD_BY]-(drug)) FOREACH (ignored IN CASE WHEN lab IS NULL THEN [1] ELSE [] END | MERGE (fallback:Company:Ansm {name: labName}) MERGE (fallback)<-[:DRUG_HELD_BY]-(drug) ) FOREACH (item in collection | ...do something...)
  85. 85. @RestController class LabsApi(private val repository: LabsRepository) { @GetMapping("/packages/{package}/labs") fun findLabsByMarketedDrug(@PathVariable("package") drugPackage: String): List<Lab> { return repository.findAllByMarketedDrugPackage(drugPackage) } } Drug import - API
  86. 86. @Repository class LabsRepository(private val driver: Driver) { fun findAllByMarketedDrugPackage(drugPackage: String): List<Lab> { driver.session(AccessMode.READ).use { val result = it.run(""" MATCH (lab:Company)<-[:DRUG_HELD_BY]-(:Drug)-[:DRUG_PACKAGED_AS]->(:Package {name: {name}}) OPTIONAL MATCH (lab)-[:IN_BUSINESS_SEGMENT]->(segment:BusinessSegment), (lab)-[:LOCATED_AT_ADDRESS]->(address:Address), (address)-[cityLoc:LOCATED_IN_CITY]->(city:City), (city)-[:LOCATED_IN_COUNTRY]->(country:Country) RETURN lab {.identifier, .name}, segment {.code, .label}, address {.toAddress}, cityLoc {.zipcode}, city {.name}, country {.code, .name} ORDER BY lab.identifier ASC""".trimIndent(), mapOf(Pair("name", drugPackage))) return result.list().map(this::toLab) } } Drug import - REPOSITORY
  87. 87. Backlog ● Find the address of a lab ● Find labs that own a specific drug ● Find health professionals related to/influenced by labs ● Find health professionals the most influenced by labs within a year ● Find patients related to health professionals ● Find patients’ relatives, friends, colleagues ● ...
  88. 88. BENEFIT IMPORT (from previous User Story) session.run(""" UNWIND {rows} AS row MERGE (hp:HealthProfessional {first_name: row.first_name, last_name: row.last_name}) MERGE (ms:MedicalSpecialty {code: row.specialty_code}) ON CREATE SET ms.name = row.specialty_name MERGE (ms)<-[:SPECIALIZES_IN]-(hp) MERGE (y:Year {year: row.year}) MERGE (y)<-[:MONTH_IN_YEAR]-(m:Month {month: row.month}) MERGE (m)<-[:DAY_IN_MONTH]-(d:Day {day: row.day}) MERGE (bt:BenefitType {type: row.benefit_type}) CREATE (b:Benefit {amount: row.benefit_amount}) CREATE (b)-[:GIVEN_AT_DATE]->(d) CREATE (b)-[:HAS_BENEFIT_TYPE]->(bt) MERGE (lab:Company {identifier:row.lab_identifier}) CREATE (lab)-[:HAS_GIVEN_BENEFIT]->(b) CREATE (hp)<-[:HAS_RECEIVED_BENEFIT]-(b) """.trimIndent(), mapOf(Pair("rows", rows)))
  89. 89. TOP 3 Health Professionals - API @RestController class HealthProfessionalApi(private val repository: HealthProfessionalsRepository) { @GetMapping("/benefits/{year}/health-professionals") fun findTop3ProfessionalsWithBenefits(@PathVariable("year") year: String) : List<Pair<HealthProfessional, AggregatedBenefits>> { return repository.findTop3ByMostBenefitsWithinYear(year) } }
  90. 90. TOP 3 Health Professionals - API @Repository class HealthProfessionalsRepository(private val driver: Driver) { fun findTop3ByMostBenefitsWithinYear(year: String): List<Pair<HealthProfessional, AggregatedBenefits>> { val result = driver.session(AccessMode.READ).use { val parameters = mapOf(Pair("year", year)) it.run(""" MATCH (:Year {year: {year}})<-[:MONTH_IN_YEAR]-(:Month)<-[:DAY_IN_MONTH]-(d:Day), (bt:BenefitType)<-[:HAS_BENEFIT_TYPE]-(b:Benefit)-[:GIVEN_AT_DATE]->(d), (lab:Company)-[:HAS_GIVEN_BENEFIT]->(b)-[:HAS_RECEIVED_BENEFIT]->(hp:HealthProfessional), (hp)-[:SPECIALIZES_IN]->(ms:MedicalSpecialty) WITH ms, hp, SUM(toFloat(b.amount)) AS total_amount, COLLECT(DISTINCT lab.name) AS labs, COLLECT(bt.type) AS benefit_types ORDER BY total_amount DESC RETURN ms {.code, .name}, hp {.first_name, .last_name}, total_amount, labs, benefit_types LIMIT 3 """.trimIndent(), parameters) } return result.list().map(this::toAggregatedHealthProfessionalBenefits) } }
  91. 91. DEPLOYMENT OPTIONS ● DIY - https://neo4j.com/docs/operations-manual/current/installation/ ● Azure - https://neo4j.com/blog/neo4j-microsoft-azure-marketplace-part-1/ ● Neo4j ON KUBERNETES - https://github.com/mneedham/neo4j-kubernetes ● Graphene DB ○ https://www.graphenedb.com/ ○ ON HEROKU - https://elements.heroku.com/addons/graphenedb ● NEO4J Cloud FOUNDRY - WIP !
  92. 92. DEPLOYMENT OPTIONS
  93. 93. DEPLOYMENT OPTIONS
  94. 94. “Nothing is ever finished” - TODO list ● Optimize the import ● Use Spring Data Neo4j ● Use “graphier” algorithms (shortest paths, page rank…) ● Expose GraphQL API - http://grandstack.io/
  95. 95. Thank you ! Florent Biville (@fbiville) Marouane Gazanayi (@mgazanayi) https://github.com/graph-labs/open-data-with-neo4j
  96. 96. Little ad for a friend (jérôme ;-))
  97. 97. Q&A ?
  98. 98. One more thing graph-labs.fr
