RDBMS to Graphs

Relational databases were conceived to digitize paper forms and automate well-structured business processes, and they still have their uses. But an RDBMS cannot model or store data and its relationships without added complexity, so performance degrades as the number and depth of relationships and the size of the data grow. In addition, new types of data and relationships require schema redesign, which increases time to market.

A graph database like Neo4j naturally stores, manages, analyzes, and uses data within the context of its connections. As a result, Neo4j provides faster query performance on connected data and far more flexibility in handling complex hierarchies than a relational (SQL) database.

  1. 1. RDBMS to Graphs: Harnessing the Power of the Graph. September 2015. Ryan Boyd, @ryguyrg
  2. 2. Agenda • Origins of Neo4j • Benefits of Graphs • Designing your Graph Model • Query time! • Fitting Neo4j into your Enterprise Architecture • Q&A
  3. 3. Neo Technology Overview. Product • Neo4j - World’s leading graph database • 150+ enterprise subscription customers including over 50 of the Global 2000. Company • Neo Technology, Creator of Neo4j • 100 employees with HQ in Silicon Valley, London, Munich, Paris and Malmö • $45M in funding
  4. 4. Neo4j Adoption by Selected Verticals: Financial Services, Communications, Health & Life Sciences, HR & Recruiting, Media & Publishing, Social Web, Industry & Logistics, Entertainment, Consumer Retail, Information Services, Business Services
  5. 5. How Customers Use Neo4j: Network & Data Center, Master Data Management, Social, Recommendations, Identity & Access, Search & Discovery, GEO
  6. 6. Neo4j Leads the Graph Database Revolution. “Forrester estimates that over 25% of enterprises will be using graph databases by 2017.” “Neo4j is the current market leader in graph databases.” “Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.” Sources: IT Market Clock for Database Management Systems, 2014, https://www.gartner.com/doc/2852717/it-market-clock-database-management; TechRadar™: Enterprise DBMS, Q1 2014, http://www.forrester.com/TechRadar+Enterprise+DBMS+Q1+2014/fulltext/-/E-RES106801; Graph Databases – and Their Potential to Transform How We Capture Interdependencies (Enterprise Management Associates), http://blogs.enterprisemanagement.com/dennisdrogseth/2013/11/06/graph-databasesand-potential-transform-capture-interdependencies/
  7. 7. High Business Value in Data Relationships. Data is increasing in volume… • New digital processes • More online transactions • New social networks • More devices … and is getting more connected: customers, products, processes, devices interact and relate to each other. Using Data Relationships unlocks value • Real-time recommendations • Fraud detection • Master data management • Network and IT operations • Identity and access management • Graph-based search. Early adopters became industry leaders
  8. 8. Relational DBs Can’t Handle Relationships Well • Cannot model or store data and relationships without complexity • Performance degrades with number and levels of relationships, and database size • Query complexity grows with need for JOINs • Adding new types of data and relationships requires schema redesign, increasing time to market … making traditional databases inappropriate when data relationships are valuable in real-time. Slow development. Poor performance. Low scalability. Hard to maintain
  9. 9. Modeling as a Graph
  10. 10. The Whiteboard Model Is the Physical Model
  11. 11. Property Graph Model Components. Nodes • The objects in the graph • Can have name-value properties • Can be labeled. Relationships • Relate nodes by type and direction • Can have name-value properties. Diagram labels: PERSON (name: “Dan”, born: May 29, 1970, twitter: “@dan”), PERSON (name: “Ann”, born: Dec 5, 1975), CAR (brand: “Volvo”, model: “V70”), relationships LOVES, LOVES, LIVES WITH, relationship property since: Jan 10, 2011
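The components on this slide can be sketched in a few lines of Python. This is only an illustrative in-memory model, not how Neo4j stores data: the `Node` and `Relationship` classes and the `outgoing` helper are hypothetical names, and attaching the `since` property to the LIVES_WITH relationship (as well as its direction) is an assumption, since the transcript does not say which relationship carries it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An object in the graph: a set of labels plus name-value properties."""
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """Connects two nodes with a type and a direction (start -> end)."""
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"Person"}, {"name": "Dan", "born": "1970-05-29", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "1975-12-05"})

relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    # Assumption: the slide does not state which relationship holds `since`.
    Relationship(dan, "LIVES_WITH", ann, {"since": "2011-01-10"}),
]


def outgoing(node, rel_type):
    """Return the nodes reached from `node` over relationships of `rel_type`."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]
```

Calling `outgoing(dan, "LOVES")` returns the node for Ann; direction matters, so the reverse lookup needs the second LOVES relationship.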
  12. 12. Relational Versus Graph Models. Relational Model: tables Person, Person-Friend, Friend (ANDREAS, DELIA, TOBIAS, MICA). Graph Model: nodes ANDREAS, TOBIAS, MICA, DELIA connected by KNOWS
  13. 13. Let’s Model! Customer, Supplier, and Product (Master Data). Orders (Activity)
  14. 14. The Domain Model
  15. 15. Except…
  16. 16. Northwind Example!
  17. 17. The Quintessential Northwind Example! NOT JUST ANY
  18. 18. (Northwind)-[:TO]->(Graph) Building the Graph Model
  19. 19. Building Relationships in Graphs. Diagram labels: Employee, SOLD, Order, Order
  20. 20. Locate Foreign Keys
  21. 21. (FKs)-[:BECOME]->(Relationships) Correct Directions
  22. 22. Simple Join Tables Become Relationships
  23. 23. Attributed Join Tables Become Relationships with Properties
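The two conversion rules from the preceding slides (foreign keys become relationships, attributed join tables become relationships with properties) can be sketched as follows. The rows and column names (`order_id`, `employee_id`, `product_id`, `quantity`) are hypothetical simplifications of Northwind, and the tuple representation of a relationship is an assumption for illustration. The SOLD and PRODUCT types and their directions follow the cross-sell query later in the deck.

```python
# Hypothetical, heavily simplified Northwind-style rows.
orders = [
    {"order_id": 10248, "employee_id": 5},
    {"order_id": 10249, "employee_id": 6},
]
order_details = [  # attributed join table between orders and products
    {"order_id": 10248, "product_id": 11, "quantity": 12},
    {"order_id": 10248, "product_id": 42, "quantity": 10},
    {"order_id": 10249, "product_id": 14, "quantity": 9},
]


def fk_to_relationships(rows):
    """Each orders.employee_id foreign key becomes (Employee)-[:SOLD]->(Order)."""
    return [
        (("Employee", row["employee_id"]), "SOLD", ("Order", row["order_id"]), {})
        for row in rows
    ]


def join_table_to_relationships(rows):
    """Each join-table row becomes (Order)-[:PRODUCT {...}]->(Product)."""
    key_columns = ("order_id", "product_id")
    edges = []
    for row in rows:
        # Non-key columns carry over as relationship properties.
        props = {k: v for k, v in row.items() if k not in key_columns}
        edges.append(
            (("Order", row["order_id"]), "PRODUCT", ("Product", row["product_id"]), props)
        )
    return edges


edges = fk_to_relationships(orders) + join_table_to_relationships(order_details)
```

A simple (non-attributed) join table is the same case with an empty property dictionary.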
  24. 24. Working Subset (Today’s Exercise)
  25. 25. Northwind Graph Model
  26. 26. Querying Your Data
  27. 27. Basic Query: Who do people report to? MATCH (:Employee {firstName: "Steven"})-[:REPORTS_TO]->(:Employee {firstName: "Andrew"}) Diagram labels: NODE (LABEL, PROPERTY) Steven, REPORTS_TO, NODE (LABEL, PROPERTY) Andrew
  28. 28. Basic Query: Who do people report to? MATCH (e:Employee)<-[:REPORTS_TO]-(sub:Employee) RETURN *
  29. 29. Basic Query: Who do people report to?
  30. 30. Basic Query: Who do people report to?
  31. 31. Real Query from a Customer. Find all direct reports and how many people they manage, each up to 3 levels down
  32. 32. (SELECT T.directReportees AS directReportees, sum(T.count) AS count FROM ( SELECT manager.pid AS directReportees, 0 AS count FROM person_reportee manager WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") UNION SELECT manager.pid AS directReportees, count(manager.directly_manages) AS count FROM person_reportee manager WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees UNION SELECT manager.pid AS directReportees, count(reportee.directly_manages) AS count FROM person_reportee manager JOIN person_reportee reportee ON manager.directly_manages = reportee.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees UNION SELECT manager.pid AS directReportees, count(L2Reportees.directly_manages) AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees ) AS T GROUP BY directReportees) UNION (SELECT T.directReportees AS directReportees, sum(T.count) AS count FROM ( SELECT manager.directly_manages AS directReportees, 0 AS count FROM person_reportee manager WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") UNION SELECT reportee.pid AS directReportees, count(reportee.directly_manages) AS count FROM person_reportee manager   JOIN person_reportee reportee ON manager.directly_manages = reportee.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lN GROUP BY directReportees UNION SELECT depth1Reportees.pid AS directReportees, count(depth2Reportees.directly_manages) AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid WHERE manager.pid = (SELECT id FROM 
person WHERE name = "fName lN GROUP BY directReportees ) AS T GROUP BY directReportees) UNION (SELECT T.directReportees AS directReportees, sum(T.count) AS count OUTER UNIONS FROM( SELECT reportee.directly_manages AS directReportees, 0 AS count FROM person_reportee manager JOIN person_reportee reportee ON manager.directly_manages = reportee.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lNam GROUP BY directReportees count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lN GROUP BY directReportees ) AS T GROUP BY directReportees) UNION (SELECT L2Reportees.directly_manages AS directReportees, 0 AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees
  33. 33. Real Query from a Customer. Find all direct reports and how many people they manage, up to 3 levels down. Cypher Query: MATCH (manager)-[:REPORTS_TO*0..3]->(boss), (report)-[:REPORTS_TO*1..3]->(manager) WHERE boss.name = "John Doe" RETURN manager.name AS Manager, count(report) AS TotalReports
  34. 34. Real Query from a Customer. Find all direct reports and how many people they manage, up to 3 levels down. SQL Query versus Cypher Query: MATCH (manager)-[:REPORTS_TO*0..3]->(boss), (report)-[:REPORTS_TO*1..3]->(manager) WHERE boss.name = "John Doe" RETURN manager.name AS Manager, count(report) AS TotalReports
  35. 35. Express Complex Queries Easily with Cypher. Find all direct reports and how many people they manage, up to 3 levels down. SQL Query versus Cypher Query: MATCH (sub)-[:REPORTS_TO*0..3]->(boss), (report)-[:REPORTS_TO*1..3]->(sub) WHERE boss.name = "John Doe" RETURN sub.name AS Subordinate, count(report) AS Total
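To make the variable-length patterns in the Cypher query above concrete, here is a rough Python sketch of the same traversal. It assumes the reporting structure is a tree (one manager per person, no cycles), and the sample names, the `REPORTS_TO` dictionary, and all function names are hypothetical. It is meant to show what `*0..3` and `*1..3` mean, not how Neo4j executes the query.

```python
from collections import deque

# Hypothetical sample data: employee -> the manager they report to.
REPORTS_TO = {
    "Ann": "John Doe",
    "Bob": "John Doe",
    "Cat": "Ann",
    "Dan": "Ann",
    "Eve": "Cat",
    "Fay": "Eve",
    "Gus": "Fay",
}


def direct_reports(reports_to):
    """Invert the employee -> manager map into manager -> [employees]."""
    children = {}
    for employee, manager in reports_to.items():
        children.setdefault(manager, []).append(employee)
    return children


def within_levels(children, start, min_depth, max_depth):
    """Breadth-first walk: people min_depth..max_depth levels below `start`."""
    found = []
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth >= min_depth:
            found.append(person)
        if depth < max_depth:
            for child in children.get(person, []):
                queue.append((child, depth + 1))
    return found


def report_counts(reports_to, boss, levels=3):
    """For the boss and everyone up to `levels` below, count reports up to `levels` down."""
    children = direct_reports(reports_to)
    counts = {}
    for manager in within_levels(children, boss, 0, levels):   # like *0..3
        total = len(within_levels(children, manager, 1, levels))  # like *1..3
        if total:  # as with MATCH, people with no reports produce no row
            counts[manager] = total
    return counts
```

With the sample data, `report_counts(REPORTS_TO, "John Doe")` gives `{"John Doe": 5, "Ann": 4, "Cat": 3, "Eve": 2}`; Bob and Dan are omitted because nobody reports to them.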
  36. 36. “We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10 to 100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.” Volker Pacher, Senior Developer
  37. 37. Who is in Robert’s (direct, upwards) reporting chain? MATCH p=(e:Employee)<-[:REPORTS_TO*]-(sub:Employee) WHERE sub.firstName = 'Robert' RETURN p
  38. 38. Who is in Robert’s (direct, upwards) reporting chain?
  39. 39. Who’s the Big Boss? MATCH p=(e:Employee) WHERE NOT (e)-[:REPORTS_TO]->() RETURN e.firstName as bigBoss
  40. 40. Who’s the Big Boss?
  41. 41. Product Cross-Sell. MATCH (choc:Product {productName: 'Chocolade'})<-[:PRODUCT]-(:Order)<-[:SOLD]-(employee), (employee)-[:SOLD]->(o2)-[:PRODUCT]->(other:Product) RETURN employee.firstName, other.productName, count(distinct o2) as count ORDER BY count DESC LIMIT 5;
  42. 42. Product Cross-Sell
  43. 43. High Performance
  44. 44. Cypher vs SQL - Paths. Find Size of John’s 5th degree Network. Cypher: MATCH (u:User)-[:KNOWS*5..5]->(f5) WHERE u.name = 'John' RETURN count(f5) as size; ● 100k Users ● 5M Relationships ● Query took 5 min, 30s ● Returns count of 312M. Neo4j config: page-cache = 512m, heap = 4G
  45. 45. Cypher vs SQL - Paths. Find Size of John’s 5th degree Network. SQL: SELECT count(*) FROM user, user_friend as uf1, user_friend as uf2, user_friend as uf3, user_friend as uf4, user_friend as uf5, user as f5 WHERE user.name='John' AND user.id = uf1.user_1 AND uf1.user_2 = uf2.user_1 AND uf2.user_2 = uf3.user_1 AND uf3.user_2 = uf4.user_1 AND uf4.user_2 = uf5.user_1 AND uf5.user_2 = f5.id; ● 100k Users ● 5M Connections ● Query took 1hr 55 mins ● Returns 312M. MySQL config: key_buffer = 2G, join_buffer_size = 2G
  46. 46. Cypher vs SQL - Paths. Optimize: Only count on JOIN table. SQL: SELECT count(*) FROM user, user_friend as uf1, user_friend as uf2, user_friend as uf3, user_friend as uf4, user_friend as uf5 WHERE user.name='John' AND user.id = uf1.user_1 AND uf1.user_2 = uf2.user_1 AND uf2.user_2 = uf3.user_1 AND uf3.user_2 = uf4.user_1 AND uf4.user_2 = uf5.user_1; ● 100k Users ● 5M Connections ● Query took 2 min, 30s ● Returns count of 312M. MySQL config: key_buffer = 2G, join_buffer_size = 2G
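The SQL optimization above drops the final join back to the user table and counts rows of the join table alone. A small sketch using Python's built-in `sqlite3` module (not the MySQL setup from the slides) shows why the two counts agree when every `user_2` value refers to an existing user. The sample rows are hypothetical, and the path is shortened to 2 hops to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_friend (user_1 INTEGER, user_2 INTEGER);
    INSERT INTO user VALUES (1, 'John'), (2, 'A'), (3, 'B'), (4, 'C');
    INSERT INTO user_friend VALUES (1, 2), (1, 3), (2, 3), (2, 4), (3, 4);
""")

# Same shape as the first SQL slide: join back to `user` at the end of the path.
WITH_FINAL_JOIN = """
    SELECT count(*)
    FROM user, user_friend AS uf1, user_friend AS uf2, user AS f2
    WHERE user.name = 'John'
      AND user.id = uf1.user_1
      AND uf1.user_2 = uf2.user_1
      AND uf2.user_2 = f2.id
"""

# Same shape as the optimized slide: count on the join table only.
JOIN_TABLE_ONLY = """
    SELECT count(*)
    FROM user, user_friend AS uf1, user_friend AS uf2
    WHERE user.name = 'John'
      AND user.id = uf1.user_1
      AND uf1.user_2 = uf2.user_1
"""

full_count = conn.execute(WITH_FINAL_JOIN).fetchone()[0]
optimized_count = conn.execute(JOIN_TABLE_ONLY).fetchone()[0]
```

Both queries count the three 2-hop paths starting at John (1→2→3, 1→2→4, 1→3→4). The final join adds work without changing the count, which is the point of the optimization.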
  47. 47. Cypher vs SQL - Paths. Optimize: Only sum degree of last step. Cypher: MATCH (u:User)-[:KNOWS*4..4]->(f4) WHERE u.name = 'John' RETURN sum(size((f4)-[:KNOWS]->())) ● 100k Users ● 5M Relationships ● Query takes 12 sec ● Returns count of 312M. Neo4j config: page-cache = 512m, heap = 4G
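The idea behind this optimization is that the last hop never needs to be expanded: counting paths of length n equals summing the out-degree of every endpoint reached after n-1 hops. A minimal Python sketch on a hypothetical adjacency list illustrates the equivalence. It counts plain walks and ignores Cypher's rule that a relationship may not repeat within one path, which makes no difference on this acyclic sample; `hops` is assumed to be at least 1.

```python
# Hypothetical directed KNOWS graph as an adjacency list.
GRAPH = {
    "John": ["A", "B"],
    "A": ["C", "D"],
    "B": ["D"],
    "C": ["E"],
    "D": ["E", "F"],
    "E": [],
    "F": [],
}


def count_paths_naive(graph, start, hops):
    """Expand every step and count the walks of exactly `hops` relationships."""
    if hops == 0:
        return 1
    return sum(count_paths_naive(graph, nxt, hops - 1) for nxt in graph.get(start, []))


def count_paths_degree_sum(graph, start, hops):
    """Expand hops-1 steps, then sum out-degrees instead of expanding the last step."""
    frontier = [start]
    for _ in range(hops - 1):
        # Duplicates are kept on purpose: each entry is a distinct path endpoint.
        frontier = [nxt for node in frontier for nxt in graph.get(node, [])]
    return sum(len(graph.get(node, [])) for node in frontier)
```

For 3 hops from John, both functions return 5. The second one touches only the 2-hop frontier (C, D, D) and reads each node's degree, which is cheap when the degree is available without visiting the neighbours.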
  48. 48. Neo4j Clustering Architecture Optimized for Speed & Availability at Scale. Performance Benefits • No network hops within queries • Real-time operations with fast and consistent response times • Cache sharding spreads cache across cluster for very large graphs. Clustering Features • Master-slave replication with master re-election and failover • Each instance has its own local cache • Horizontal scaling & disaster recovery. Diagram labels: Load Balancer, Neo4j, Neo4j, Neo4j
  49. 49. Getting Data into Neo4j. Cypher-Based “LOAD CSV” Capability • Transactional (ACID) writes • Initial and incremental loads of up to 10 million nodes and relationships. Command-Line Bulk Loader neo4j-import • For initial database population • For loads with 10B+ records • Up to 1M records per second. 4.58 million things and their relationships… Loads in 100 seconds!
  50. 50. Three Ways to Load Data into Neo4j: MIGRATE ALL DATA, MIGRATE GRAPH DATA, DUPLICATE GRAPH DATA. Diagram labels: Application, Relational Database, Graph Database, All data, Graph data, Non-graph data
  51. 51. Polyglot Persistence
  52. 52. Neo4j Fits into Your Enterprise Environment. Diagram labels: End User, Application, Graph Database Cluster (Neo4j, Neo4j, Neo4j), Data Storage and Business Rules Execution, Data Scientist, Ad Hoc Analysis, Bulk Analytic Infrastructure, Graph Compute Engine, EDW, …, Data Mining and Aggregation, Databases (Relational, NoSQL, Hadoop)
  53. 53. Neo4j + Mongo!
  54. 54. Users Love Neo4j
  55. 55. Users Love Neo4j
  56. 56. Learn the Way of the Graph Quickly and Easily
  57. 57. Quick Start in 1 minute
  58. 58. Quick Start: Plan Your Project. PROFESSIONAL SERVICES PLAN. Timeline (1 2 3 4 5 6 7 8): Learn Neo4j, Decide on Architecture, Import and Model Data, Build Application, Test Application. Deploy your app in as little as 8 weeks
  59. 59. There Are Lots of Ways to Easily Learn Neo4j
  60. 60. Huge Ecosystem of Graph Enthusiasts • 1,000,000+ downloads • 20,000+ education registrants • 18,000+ Meetup members • 100+ technology and service partners • 150+ enterprise subscription customers including 50+ Global 2000 companies
  61. 61. Get Started Now
  62. 62. Summary of the Power of the Graph • Take relationships and connected data seriously • Seriously easy to model • Serious performance • Fits in with your Enterprise Architecture • Easy to get started • Fast to reap the benefits
  63. 63. RDBMS to Graphs: Harnessing the Power of the Graph. Start of Q&A. Ryan Boyd, @ryguyrg
