Transform Your Data: A Worked Example at GraphDay LA

This talk examines graph databases and Neo4j with a use-case-driven approach. First, we look at some property graph model examples taken from real-world datasets. Next, we discuss converting a relational model to a graph, using the canonical Northwind example. Finally, we dive into Fraud Detection and Personalized Recommendation examples, learning about Neo4j developer tooling as we explore these use cases.

  1. 1. RDBMS TO GRAPH Graph Day LA
  2. 2. [Diagram with nodes: Account Holder 1, Account Holder 2, Account Holder 3, Credit Card (x2), Bank Account (x3), Address, Phone Number (x2), SSN 2 (x2), Unsecured Loan (x2)]
  3. 3. I'm Kevin, Deployment Strategy at Neo4j: /in/kevinvangundy, @kevinvangundy, kevin@neo4j.com
  4. 4. Does the underlying data-structure matter?
  5. 5. Employees
        Name     Country   Dept                 University
        John     UK        Prime Brokerage      Princeton
        Mary     USA       Sales and Trade      Yale
        Li       China     Investment Banking   Princeton
        Kate     UK        Sales and Trade      Princeton
        Michal   CA        Investment Banking   Brown
  6. 6. Countries
        ID    Country
        17    UK
        12    USA
        19    China
        17    UK
        112   CA
  7. 7. Countries
        ID    Country   Leader
        17    UK        Cameron
        12    USA       Obama
        19    China     Xi Jinping
        17    UK        Cameron
        112   CA        Trudeau
  8. 8. Employees
        Name     Country   Dept                 University
        John     17        Prime Brokerage      Princeton
        Mary     12        Sales and Trade      Yale
        Li       19        Investment Banking   Princeton
        Kate     17        Sales and Trade      Princeton
        Michal   112       Investment Banking   Brown
  9. 9. University
        ID   Name        President   State
        92   Princeton   Eisgrubt    NJ
        34   Yale        Salovey     CT
        1    Brown       Paxson      RI
  10. 10. Employees
        Name     Country   Dept                 University
        John     17        Prime Brokerage      92
        Mary     12        Sales and Trade      34
        Li       19        Investment Banking   92
        Kate     17        Sales and Trade      92
        Michal   112       Investment Banking   1
  11. 11. Employees
        Name     Country   Dept                 University
        John     17        Prime Brokerage      92
        Mary     12        Sales and Trade      34
        Li       19        Investment Banking   92
        Kate     17        Sales and Trade      92
        Michal   112       Investment Banking   1
      Countries
        ID    Country
        17    UK
        12    USA
        19    China
        17    UK
        112   CA
      University
        ID   Name        President   State
        92   Princeton   Eisgrubt    NJ
        34   Yale        Salovey     CT
        1    Brown       Paxson      RI
  12. 12. SELECT p.name, c.country, c.leader, p.hair, u.name, u.pres, u.state FROM people p LEFT JOIN country c ON c.ID = p.country LEFT JOIN uni u ON p.uni = u.id WHERE u.state = 'CT'
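The query on slide 12 can be tried against the three small tables from the preceding slides. What follows is a minimal sketch using Python's built-in sqlite3 module, not the speaker's setup. The table and column names follow the query. The slides show no `hair` column, so the sketch selects `p.dept` instead, and the duplicated "17 UK" row is loaded once so that `id` can serve as a key; both are assumptions made for illustration.

```python
import sqlite3

# In-memory database holding the three normalized tables from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (id INTEGER PRIMARY KEY, country TEXT, leader TEXT);
CREATE TABLE uni (id INTEGER PRIMARY KEY, name TEXT, pres TEXT, state TEXT);
CREATE TABLE people (name TEXT, country INTEGER, dept TEXT, uni INTEGER);
""")
conn.executemany("INSERT INTO country VALUES (?, ?, ?)", [
    (17, "UK", "Cameron"),
    (12, "USA", "Obama"),
    (19, "China", "Xi Jinping"),
    (112, "CA", "Trudeau"),
])
conn.executemany("INSERT INTO uni VALUES (?, ?, ?, ?)", [
    (92, "Princeton", "Eisgrubt", "NJ"),
    (34, "Yale", "Salovey", "CT"),
    (1, "Brown", "Paxson", "RI"),
])
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", [
    ("John", 17, "Prime Brokerage", 92),
    ("Mary", 12, "Sales and Trade", 34),
    ("Li", 19, "Investment Banking", 92),
    ("Kate", 17, "Sales and Trade", 92),
    ("Michal", 112, "Investment Banking", 1),
])

# Same shape as the slide's query; p.dept stands in for the slide's p.hair.
QUERY = """
SELECT p.name, c.country, c.leader, p.dept, u.name, u.pres, u.state
FROM people p
LEFT JOIN country c ON c.id = p.country
LEFT JOIN uni u ON p.uni = u.id
WHERE u.state = 'CT'
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

Only Mary attended a university in CT, so a single row comes back. Each extra lookup table adds one more JOIN to the query.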
  13. 13. [The Employees, Countries, and University tables from slide 11, repeated many times over]
  14. 14. (SELECT T.directReportees AS directReportees, sum(T.count) AS count FROM ( SELECT manager.pid AS directReportees, 0 AS count FROM person_reportee manager WHERE manager.pid = (SELECT id FROM person WHERE name = 'fName lName') UNION SELECT manager.pid AS directReportees, count(manager.directly_manages) AS count FROM person_reportee manager WHERE manager.pid = (SELECT id FROM person WHERE name = 'fName lName') GROUP BY directReportees UNION SELECT manager.pid AS directReportees, count(reportee.directly_manages) AS count FROM person_reportee manager JOIN person_reportee reportee ON manager.directly_manages = reportee.pid WHERE manager.pid = (SELECT id FROM person WHERE name = 'fName lName') GROUP BY directReportees UNION SELECT manager.pid AS directReportees, count(L2Reportees.directly_manages) AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = 'fName lName') GROUP BY directReportees ) AS T GROUP BY directReportees)
      UNION (SELECT T.directReportees AS directReportees, sum(T.count) AS count FROM ( SELECT manager.directly_manages AS directReportees, 0 AS count FROM person_reportee manager WHERE manager.pid = (SELECT id FROM person WHERE name = 'fName lName') UNION SELECT reportee.pid AS directReportees, count(reportee.directly_manages) AS count FROM person_reportee manager JOIN person_reportee reportee ON manager.directly_manages = reportee.pid WHERE manager.pid = (SELECT id FROM person WHERE name = 'fName lName') GROUP BY directReportees UNION SELECT L1Reportees.pid AS directReportees, count(L2Reportees.directly_manages) AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = 'fName lName') GROUP BY directReportees ) AS T GROUP BY directReportees)
      UNION (SELECT T.directReportees AS directReportees, sum(T.count) AS count FROM ( SELECT reportee.directly_manages AS directReportees, 0 AS count FROM person_reportee manager JOIN person_reportee reportee ON manager.directly_manages = reportee.pid WHERE manager.pid = (SELECT id FROM person WHERE name = 'fName lName') GROUP BY directReportees UNION SELECT L2Reportees.pid AS directReportees, count(L2Reportees.directly_manages) AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = 'fName lName') GROUP BY directReportees ) AS T GROUP BY directReportees)
      UNION (SELECT L2Reportees.directly_manages AS directReportees, 0 AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = 'fName lName'))
  15. 15. JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN
  16. 16. JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN
  17. 17. JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN JOIN
  18. 18. Have you seen Ted's UUID?
  19. 19. SQL Trouble
      • Complex to model and store relationships
      • Performance degrades with increases in data
      • Queries get long and complex
      • Maintenance is painful
  20. 20. Graph Motivations
      • Easy to model and store relationships
      • Performance of relationship traversal remains constant with growth in data size
      • Queries are shortened and more readable
      • Adding additional properties and relationships can be done on the fly - no migrations
  21. 21. Graph Gains
  22. 22. [The SQL query from slide 14, repeated]
  23. 23. MATCH (boss)-[:MANAGES*0..3]->(sub), (sub)-[:MANAGES*1..3]->(report) WHERE boss.name = "John Doe" RETURN sub.name AS Subordinate, count(report) AS Total
  24. 24. MATCH (boss)-[:MANAGES*0..3]->(sub), (sub)-[:MANAGES*1..3]->(report) WHERE boss.name = "John Doe" RETURN sub.name AS Subordinate, count(report) AS Total
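To make the variable-length pattern in this Cypher query concrete, here is a rough Python emulation over a tiny in-memory reporting tree. It is a sketch under stated assumptions, not how Neo4j evaluates the query. The names other than "John Doe" and the `within` and `report_counts` helpers are hypothetical. The hierarchy is assumed to be a tree, so counting matched paths and counting distinct reports give the same number.

```python
# Hypothetical reporting tree: manager -> direct reports (MANAGES edges).
MANAGES = {
    "John Doe": ["Ann", "Bob"],
    "Ann": ["Cat", "Dan"],
    "Cat": ["Eve"],
}


def within(start, min_hops, max_hops):
    """Return nodes reachable from start in min_hops..max_hops MANAGES steps."""
    found = [start] if min_hops == 0 else []
    frontier = [start]
    for depth in range(1, max_hops + 1):
        # Expand one level: everyone directly managed by the current frontier.
        frontier = [r for m in frontier for r in MANAGES.get(m, [])]
        if depth >= min_hops:
            found.extend(frontier)
    return found


def report_counts(boss):
    """Emulate (boss)-[:MANAGES*0..3]->(sub), (sub)-[:MANAGES*1..3]->(report)."""
    counts = {}
    for sub in within(boss, 0, 3):
        total = len(within(sub, 1, 3))
        if total:  # a MATCH yields no row for a sub with no reports
            counts[sub] = total
    return counts


totals = report_counts("John Doe")
print(totals)
```

The two range bounds in the pattern map directly to the `min_hops` and `max_hops` arguments, which is the part the SQL version has to spell out as one JOIN chain per depth.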
  25. 25. How Fast is Fast?
      • Sample Social Graph with roughly 1,000 persons
      • On average each person has 50 friends
      • pathExists(a,b) limited to depth 4
      • Caches warmed up to eliminate disk I/O
  26. 26. How Fast is Fast?
        DATABASE   # OF PERSONS   QUERY TIME
  27. 27. How Fast is Fast?
        DATABASE   # OF PERSONS   QUERY TIME
        RDBMS      1,000          2,000 ms
  28. 28. How Fast is Fast?
        DATABASE   # OF PERSONS   QUERY TIME
        RDBMS      1,000          2,000 ms
        Neo4j      1,000          2 ms
  29. 29. How Fast is Fast?
        DATABASE   # OF PERSONS   QUERY TIME
        RDBMS      1,000          2,000 ms
        Neo4j      1,000          2 ms
        Neo4j      10,000,000     2 ms
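The slides do not show how pathExists(a,b) was implemented. As an illustration only, a depth-limited breadth-first search is one common way to answer that kind of question. The `path_exists` function, the `friends` adjacency dictionary, and the six-person chain below are all assumptions made for this sketch, and it says nothing about the timings in the table.

```python
from collections import deque


def path_exists(friends, a, b, max_depth=4):
    """Breadth-first search from a that gives up beyond max_depth hops."""
    if a == b:
        return True
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for friend in friends.get(person, ()):
            if friend == b:
                return True
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    return False


# Hypothetical friendship chain: p0 - p1 - p2 - p3 - p4 - p5 (undirected).
names = [f"p{i}" for i in range(6)]
friends = {name: [] for name in names}
for left, right in zip(names, names[1:]):
    friends[left].append(right)
    friends[right].append(left)

print(path_exists(friends, "p0", "p4"))  # four hops apart
print(path_exists(friends, "p0", "p5"))  # five hops apart
```

The search only touches people within four hops of the start, so its cost depends on the neighbourhood it visits rather than on the total number of persons stored.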
  30. 30. What is the "Graph Impact?"
  31. 31. David Meza of NASA said: "Neo helped NASA save millions of dollars and up to two years by locating existing research they could use in his work on the Orion, the spacecraft NASA hopes eventually will take humans to Mars."
  32. 32. We're getting to Mars 2 years sooner because of Neo4j…
  33. 33. “We needed to understand consumer behavior across devices in order to capture a complete picture. Conceptually we could have done this in a relational database, but the multiple JOINS would have made it much too complicated.” - Qualia CTO, Niels Meersschaert
  34. 34. We're smashing a billion queries a day that'd be impossible in relational…
  35. 35. "I found graph databases, which perform well with queries on connected data. With more than 10 years of experience of using relational database, I know that complicated joins are the performance killer. But graph databases kick ass of other databases." - LinkedIn China Development Lead, Dong Bin
  36. 36. Neo4j is awesome…
  37. 37. How do you use Neo4j? KEY QUESTIONS
  38. 38. How do you use Neo4j? KEY QUESTIONS MODEL
  39. 39. How do you use Neo4j? KEY QUESTIONS MODEL + LOAD DATA
  40. 40. How do you use Neo4j? KEY QUESTIONS MODEL + LOAD DATA QUERY DATA
  41. 41. How do you use Neo4j?
  42. 42. How do you use Neo4j?
  43. 43. Language Drivers
  44. 44. Native Procedures and Functions
  45. 45. From RDBMS to Graphs
  46. 46. Northwind
  47. 47. Northwind - a Canonical RDBMS Example
  48. 48. ( )-[:TO]->(Graph)
  49. 49. ( )-[:IS_BETTER_AS]->(Graph)
  50. 50. Starting with the ER Diagram
  51. 51. Locate the Foreign Keys
  52. 52. Drop the Foreign Keys
  53. 53. Find the JOIN Tables
  54. 54. (Simple) JOIN Tables Become Relationships
  55. 55. Attributed JOIN Tables -> Relationships with Properties
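The conversion steps on slides 50 to 55 can be sketched in plain Python. This is an illustrative miniature, not the Northwind import itself. The four tiny tables, their rows, and the `add_nodes` helper are hypothetical, while the `SOLD` and `INCLUDES` relationship types are borrowed from the cross-selling query later in the deck.

```python
# Hypothetical miniature of four Northwind-style tables.
employees = [{"employeeID": 1, "firstName": "Nancy"}]
orders = [{"orderID": 10, "employeeID": 1}]  # employeeID is a foreign key
products = [{"productID": 48, "productName": "Chocolade"}]
order_details = [{"orderID": 10, "productID": 48, "quantity": 5}]  # attributed JOIN table

nodes = {}          # (label, id) -> properties
relationships = []  # (start node, type, end node, properties)


def add_nodes(label, rows, key, drop=()):
    """Turn each row into a node, leaving out any foreign-key columns."""
    for row in rows:
        nodes[(label, row[key])] = {k: v for k, v in row.items() if k not in drop}


add_nodes("Employee", employees, "employeeID")
add_nodes("Order", orders, "orderID", drop=("employeeID",))  # drop the foreign key
add_nodes("Product", products, "productID")

# A foreign key becomes a relationship between the two nodes it linked.
for order in orders:
    relationships.append(
        (("Employee", order["employeeID"]), "SOLD", ("Order", order["orderID"]), {})
    )

# An attributed JOIN table becomes a relationship that carries properties.
for detail in order_details:
    relationships.append(
        (("Order", detail["orderID"]), "INCLUDES",
         ("Product", detail["productID"]), {"quantity": detail["quantity"]})
    )

print(nodes)
print(relationships)
```

A JOIN table with no extra columns would follow the same loop with an empty property dictionary.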
  56. 56. Querying a Subset Today
  57. 57. As a Graph
  58. 58. QUERYING THE GRAPH
  59. 59. Property Graph Model: CREATE (:Employee {firstName: "Steven"})-[:REPORTS_TO]->(:Employee {firstName: "Andrew"}) [Diagram: nodes Steven and Andrew joined by a REPORTS_TO relationship, each annotated NODE, LABEL, PROPERTY]
  60. 60. Who do people report to? MATCH (e:Employee)<-[:REPORTS_TO]-(sub:Employee) RETURN *
  61. 61. Who do people report to?
  62. 62. Who do people report to? MATCH (e:Employee)<-[:REPORTS_TO]-(sub:Employee) RETURN e.employeeID AS managerID, e.firstName AS managerName, sub.employeeID AS employeeID, sub.firstName AS employeeName;
  63. 63. Who do people report to?
  64. 64. Who does Robert report to? MATCH p=(e:Employee)<-[:REPORTS_TO]-(sub:Employee) WHERE sub.firstName = 'Robert' RETURN p
  65. 65. Who does Robert report to?
  66. 66. What is Robert’s reporting chain? MATCH p=(e:Employee)<-[:REPORTS_TO*]-(sub:Employee) WHERE sub.firstName = 'Robert' RETURN p
  67. 67. What is Robert’s reporting chain?
  68. 68. Who’s the Big Boss? MATCH (e:Employee) WHERE NOT (e)-[:REPORTS_TO]->() RETURN e.firstName as bigBoss
  69. 69. Who’s the Big Boss?
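The reporting-chain and big-boss questions can be mirrored in a few lines of Python. This is a rough emulation, not Neo4j's execution. Only the Steven to Andrew edge comes from slide 59. The other edges and the helper names are assumptions made for illustration. Unlike `REPORTS_TO*`, which returns every path of length one or more, the sketch returns just the full chain.

```python
# REPORTS_TO edges, employee -> manager. Only Steven -> Andrew appears in the
# slides; the Robert and Nancy edges are assumed for this example.
REPORTS_TO = {"Steven": "Andrew", "Robert": "Steven", "Nancy": "Andrew"}
EMPLOYEES = {"Andrew", "Steven", "Robert", "Nancy"}


def reporting_chain(name):
    """Follow REPORTS_TO upward from name, collecting each manager in turn."""
    chain = []
    seen = {name}
    while name in REPORTS_TO:
        name = REPORTS_TO[name]
        if name in seen:  # guard against a cyclic hierarchy
            break
        seen.add(name)
        chain.append(name)
    return chain


def big_bosses():
    """Employees with no outgoing REPORTS_TO edge."""
    return sorted(e for e in EMPLOYEES if e not in REPORTS_TO)


print(reporting_chain("Robert"))
print(big_bosses())
```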
  70. 70. Product Cross-Selling MATCH (choc:Product {productName: 'Chocolade'}) <-[:INCLUDES]-(:Order)<-[:SOLD]-(employee), (employee)-[:SOLD]->(o2)-[:INCLUDES]->(other:Product) RETURN employee.firstName, other.productName, COUNT(DISTINCT o2) as count ORDER BY count DESC LIMIT 5;
  71. 71. Product Cross-Selling
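The shape of the cross-selling query can be sketched in Python as follows. It is a simplified emulation over hypothetical data: it ranks the other products sold by any employee who sold the given product, counting distinct orders, and it does not reproduce Cypher's relationship-uniqueness rules exactly. The dictionaries and the `cross_sell` function are invented for the example.

```python
from collections import defaultdict

# Hypothetical data: order -> selling employee, order -> products included.
SOLD_BY = {1: "Nancy", 2: "Nancy", 3: "Nancy", 4: "Robert"}
INCLUDES = {
    1: ["Chocolade", "Chai"],
    2: ["Chai", "Tofu"],
    3: ["Chai"],
    4: ["Tofu"],
}


def cross_sell(product, limit=5):
    """Rank (employee, other product) pairs by number of distinct orders."""
    # Employees who sold at least one order that includes the product.
    sellers = {SOLD_BY[o] for o, items in INCLUDES.items() if product in items}
    orders_by_pair = defaultdict(set)
    for order, items in INCLUDES.items():
        employee = SOLD_BY[order]
        if employee not in sellers:
            continue
        for other in items:
            if other != product:
                orders_by_pair[(employee, other)].add(order)
    ranked = sorted(
        ((e, p, len(order_ids)) for (e, p), order_ids in orders_by_pair.items()),
        key=lambda row: (-row[2], row[0], row[1]),  # count DESC, then names
    )
    return ranked[:limit]


suggestions = cross_sell("Chocolade")
print(suggestions)
```

Robert never sold Chocolade, so his Tofu order does not contribute to the ranking.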
  72. 72. POWERING AN APP
  73. 73. NYC Meetup Recommendation App
