The Graph-Native Advantage

Graph databases have gone from being the secret sauce of Web giants to a mainstream enterprise utility. As the excitement about the potential of the graph category grows, we're starting to see more and more non-graph technology with graph bolt-ons enter the fray.

In this webinar we'll survey some of the non-native graph approaches on the market today and show where they fall short of the native graph approach of Neo4j in terms of efficiency, performance, safety and agility.

  1. The Graph-Native Advantage. Dr. Jim Webber, Chief Scientist, Neo4j (@jimwebber)
  2. Overview • Graph-Native overview • A friendly little bit of computer science • Database architecture from 30,000 ft • Why Neo4j is graph native, and why it matters • Quantitative performance advantages • Q&A
  3. Assumptions • Graphs are the most important data structure on the planet • Most workloads can be conveniently modelled as graphs
  4. An Unordered Singly Linked List (diagram: 27 → 1657 → 5674)
  5. A Write-Centric Database? (diagram: one client writing to the list 27 → 1657 → 5674)
  6. Not a Practical Write-Centric Database? (diagram: several clients writing to the same list) Every client contends for the write lock in a naïve implementation
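A minimal Python sketch of the naïve write-centric list from slides 4 to 6, assuming a single `threading.Lock` guards the whole structure; `LockedList` and `run_clients` are hypothetical names, not from the talk. It stays correct under concurrent writers, but only because every push queues on that one lock, which is exactly the contention the slide describes.

```python
import threading


class Node:
    """One cell of an unordered singly linked list."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LockedList:
    """Naive shared list: one lock guards the whole structure, so every
    writer serializes on it no matter which part of the list it touches."""

    def __init__(self):
        self.head = None
        self._lock = threading.Lock()

    def push(self, value):
        # Unordered list, so insert at the head; all clients contend here.
        with self._lock:
            self.head = Node(value, self.head)

    def values(self):
        # Walk the list from the head (newest first).
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out


def run_clients(shared, clients, writes_per_client):
    """Start several writer threads that all push into the same list."""

    def client(cid):
        for i in range(writes_per_client):
            shared.push((cid, i))

    threads = [threading.Thread(target=client, args=(c,)) for c in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```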
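  7. CRDTs to the Rescue! (diagram: each client appends values to its own replica of the list, e.g. 7689 and 6 after 5674 on one replica and 66 on another; the replicas merge into 27, 1657, 5674, 7689, 6, 66)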
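The slide's example is an ordered list, and list CRDTs need extra machinery to agree on element order. As a simpler stand-in, this Python sketch shows a grow-only set (G-Set), the most basic CRDT: replicas accept writes without coordinating, and merging is set union, so replicas that have seen the same writes converge whatever order they merge in. `GSet` is a hypothetical name; the values reuse the slide's numbers.

```python
class GSet:
    """Grow-only set CRDT: replicas only ever add elements, and merge is set
    union, which is commutative, associative and idempotent."""

    def __init__(self, items=()):
        self.items = set(items)

    def add(self, item):
        # Local write: no coordination with other replicas is needed.
        self.items.add(item)

    def merge(self, other):
        # Join of two replica states; the result does not depend on merge order.
        return GSet(self.items | other.items)

    def __eq__(self, other):
        return isinstance(other, GSet) and self.items == other.items


# Three replicas, echoing the slide's values, each written to independently.
a = GSet([27, 1657, 5674])
b = GSet([5674])
b.add(7689)
b.add(6)
c = GSet([1657, 5674])
c.add(66)
merged = a.merge(b).merge(c)
```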
  8. Trees (diagram: the values 27, 1657, 5674, 7689, 6 and 66 arranged as a tree)
  9. Spread contention around the structure (diagram: four clients writing to different parts of the tree)
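One way to picture spreading contention, sketched in Python under simplifying assumptions: a fixed two-level tree whose root holds separator keys and whose leaves each carry their own lock, so writers to different leaves never block each other. Unlike a real B-tree there are no node splits or rebalancing, and `PartitionedTree` is a hypothetical name.

```python
import bisect
import threading


class PartitionedTree:
    """Two-level tree: the root's separator keys route each write to one leaf,
    and each leaf has its own lock, so only writers on the same leaf contend."""

    def __init__(self, separators):
        self.separators = sorted(separators)              # root node keys
        n = len(self.separators) + 1
        self.leaves = [[] for _ in range(n)]               # sorted keys per leaf
        self.locks = [threading.Lock() for _ in range(n)]  # one lock per leaf

    def leaf_index(self, key):
        # bisect_right: a key equal to a separator goes to the right-hand leaf.
        return bisect.bisect_right(self.separators, key)

    def insert(self, key):
        i = self.leaf_index(key)
        with self.locks[i]:                                # lock only this leaf
            bisect.insort(self.leaves[i], key)

    def keys(self):
        # In-order traversal: leaves are already ordered by the separators.
        return [k for leaf in self.leaves for k in leaf]
```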
  10. Databases <3 Trees, usually • Classic B-trees are a common pattern for on-disk databases • “Index” in memory, files on leaf nodes on disk • B+ trees are neat for linear scans! • But…
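Why B+ trees suit linear scans can be sketched in Python: keys sit only in the leaves, and each leaf links to the next, so a range scan descends once and then walks sideways. This builds just the leaf level plus one layer of separators, with no inserts or splits; `Leaf`, `build_leaf_level` and `range_scan` are hypothetical names.

```python
import bisect


class Leaf:
    """B+ tree leaf: sorted keys plus a link to the next leaf."""

    def __init__(self, keys):
        self.keys = list(keys)
        self.next = None


def build_leaf_level(sorted_keys, leaf_size):
    """Chunk already-sorted keys into linked leaves and return
    (separators, leaves), where separators[i] is the first key of leaf i + 1,
    as a single inner node above the leaves would store it."""
    leaves = [Leaf(sorted_keys[i:i + leaf_size])
              for i in range(0, len(sorted_keys), leaf_size)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    separators = [leaf.keys[0] for leaf in leaves[1:]]
    return separators, leaves


def range_scan(separators, leaves, lo, hi):
    """Return keys k with lo <= k < hi: descend once to the leaf that could
    hold lo, then follow next-leaf links instead of revisiting the tree."""
    if not leaves:
        return []
    leaf = leaves[bisect.bisect_right(separators, lo)]
    start = bisect.bisect_left(leaf.keys, lo)
    out = []
    while leaf is not None:
        for k in leaf.keys[start:]:
            if k >= hi:
                return out
            out.append(k)
        leaf, start = leaf.next, 0
    return out
```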
  11. So what?
  12. Pick the right tool for the job
  13. All Databases have a native model • It could be tables or columns or KV or documents… • Each database is likely very good for that model • Evolution driven by its primary workload in its primary market • Any add-on doesn’t benefit from this • Unloved • Opportunistic (e.g. “multi model”) • Models don’t compose easily. Some vendors have spotted the enormous graph trend and are simply jumping on the bandwagon
  14. Two Non-Native Approaches to Graph. Graph Layer: • Take existing data store • Bolt on a graph-like API from third-party open source • Declare victory. Graph Operator: • Take existing data store • Add graph features into the query language • Declare victory
  15. Non-Native Architectures (diagram: Graph Layer: a Graph API on top of a graph layer over another DBMS, e.g. a column store; Graph Operator: another query language with a graph operator, over another DBMS, e.g. a document store)
  16. Non-Native Architectures (same diagram, annotated: No Cypher!)
  17. Non-Native Architectures (same diagram, further annotated: Requires convention at user level; Denormalization)
  18. Non-Native Architectures (same diagram, further annotated: Does not understand graphs; Cannot prevent dangling relationships/logical corruption/etc.)
  19. Two Non-Native Approaches to Graph. Graph Layer: • Take existing data store • Bolt on a graph-like API from third-party open source • Declare victory
  20. Popular Implementation: Column Store (source: http://javahungry.blogspot.com/2013/08/hashing-how-hash-map-works-in-java-or.html)
  21. Two Non-Native Approaches to Graph. Graph Operator: • Take existing data store • Add graph features into the query language • Declare victory
  22. Popular Implementation: B-Trees! (source: http://zhangliyong.github.io/posts/2014/02/19/mongodb-index-internals.html)
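To make the cost of the graph-operator approach concrete, here is a rough Python sketch assuming the underlying store offers only a sorted index on each edge's source (a stand-in for a B-tree index): every hop of a traversal pays a binary search over all edges. `EdgeIndex` and `reachable_within` are hypothetical names, and real products differ in detail.

```python
import bisect


class EdgeIndex:
    """Edges kept as sorted (source, target) pairs, standing in for a B-tree
    index on the source column; each neighbour lookup is an O(log E) search."""

    def __init__(self, edges):
        self.entries = sorted(edges)
        self.lookups = 0  # number of index searches performed so far

    def neighbours(self, node):
        self.lookups += 1
        # (node,) sorts before every (node, target), so this finds the first entry.
        i = bisect.bisect_left(self.entries, (node,))
        out = []
        while i < len(self.entries) and self.entries[i][0] == node:
            out.append(self.entries[i][1])
            i += 1
        return out


def reachable_within(index, start, hops):
    """Breadth-first expansion up to `hops` steps, one index search per node."""
    frontier, seen = {start}, {start}
    for _ in range(hops):
        nxt = set()
        for n in frontier:
            for m in index.neighbours(n):
                if m not in seen:
                    seen.add(m)
                    nxt.add(m)
        frontier = nxt
    return seen - {start}
```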
  23. Pick the Right Tool for the Job
  24. Real-Time Query Performance: “Minutes to milliseconds” (chart: response time against connectedness and size of data set) • Relational and other NoSQL databases: 0 to 2 hops, 0 to 3 degrees, thousands of connections • Neo4j: tens to hundreds of hops, thousands of degrees, billions of connections • 1000x advantage
  25. Use Cases. Real-time package routing: • Large postal service with over 500k employees • Neo4j routes 7M+ packages daily at peak, with peaks of 5,000+ routing operations per second • Many hops per transaction. Real-time promotion recommendations: • Record “Cyber Monday” sales • About 35M daily transactions off-peak • Each transaction is 3-22 hops • Queries executed in 4 ms or less • Replaced IBM WebSphere Commerce. Real-time pricing engine: • 300M pricing operations per day • 10x transaction throughput on half the hardware compared to Oracle • Presentation at http://graphconnect.com/gc2016-sf/ • Replaced Oracle database • 7-22 hops per transaction
  26. • >90% of the U.S. population • Graph of (People)-(Devices)-(Cookies)-(Trackers), IPs, etc. • >1B transactions per day • 3 TB graph, 5B+ nodes, 256 GB RAM
  27. Neo4j’s Graph Native Stack (diagram of components: Cypher Engine, Cypher HTTP Endpoint, Bolt Endpoint, Custom REST, APOC Extensions, Parser, Compiled Runtime (EE), Interpreted Runtime, Native Graph Engine, In-Memory Page Cache, Native Graph Storage, Indexing, ACID, Cost-based Optimizer, Fast Write Buffering, CAPI Adapter, Configuration, Data Stores, Logging, Security, High Availability, Monitoring, Command Line Interface, Neo4j Browser, Sync, Custom Functions, App or Community Driver, Language Drivers)
  28. Neo4j’s Graph Native Stack (same diagram, annotated with improvements: • Native label index speeds writes • Composite indexes speed query performance • Compiled Cypher runtime for common queries now, all soon • Query depth optimization for DISTINCT • New JavaScript framework for better flexibility • Cost-based optimizer by default)
  29. Graph Native Approach • Declarative query language • Human readability • Graph expressiveness • Optimizer and query planner for graphs • Graph metadata • Runtime metadata • Aim to work in main memory • And optimize for the L2 cache where possible • Maximize IO performance • Graph traversals by pointer chasing
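Pointer chasing can be sketched in Python with records addressed by ID: a node record stores the ID of its first relationship, and each relationship record stores the ID of the next one, so expanding a node follows stored offsets rather than consulting an index. This is a simplified illustration, not Neo4j’s actual store format (which, for example, links each relationship into the chains of both its end nodes); `RecordStore` and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

NONE = -1  # sentinel "null pointer" for record IDs


@dataclass
class NodeRecord:
    first_rel: int = NONE  # ID (array offset) of this node's first relationship


@dataclass
class RelRecord:
    start: int
    end: int
    next_rel: int = NONE   # next relationship in the start node's chain


class RecordStore:
    """Nodes and relationships live in arrays addressed by ID, so following a
    relationship is an array access rather than an index lookup (simplified:
    each relationship is chained only from its start node)."""

    def __init__(self, node_count: int):
        self.nodes: List[NodeRecord] = [NodeRecord() for _ in range(node_count)]
        self.rels: List[RelRecord] = []

    def add_relationship(self, start: int, end: int) -> int:
        rel_id = len(self.rels)
        # Prepend to the start node's chain: O(1), no index to maintain.
        self.rels.append(RelRecord(start, end, self.nodes[start].first_rel))
        self.nodes[start].first_rel = rel_id
        return rel_id

    def outgoing(self, node_id: int) -> List[int]:
        """Walk the node's relationship chain by following stored IDs."""
        out, rel_id = [], self.nodes[node_id].first_rel
        while rel_id != NONE:
            rel = self.rels[rel_id]
            out.append(rel.end)
            rel_id = rel.next_rel
        return out
```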
  30. Design Trade-off: Availability vs. Reliability
  31. (image: http://scienceprogress.org/wp-content/uploads/2008/04/two_way_591.jpg)
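  32. LOVES (diagram: two nodes joined by a LOVES relationship, stored as one document per node, each holding its side of the relationship) Node 0: [ { "NodeId": 0 }, { "Country": "USA" }, { "LOVES": 1, "Direction": "in" } ] Node 1: [ { "NodeId": 1 }, { "Country": "UK" }, { "LOVES": 0, "Direction": "out" } ]
  33. LOVES (the same documents after an update: node 0’s side of the relationship has gained a property, while node 1’s document no longer records its side) Node 0: [ { "NodeId": 0 }, { "Country": "USA" }, { "LOVES": 1, "Direction": "in", "Value": "100%" } ] Node 1: [ { "NodeId": 1 }, { "Country": "UK" } ]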
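Because each document holds only its own side of the relationship, nothing in the store stops the two sides drifting apart, as in slide 33. A minimal Python check for such half-written relationships, assuming a hypothetical simplified schema (a `rels` list per node with `type`, `other` and `direction` keys) rather than the slide’s exact JSON; `dangling_halves` is likewise a hypothetical name.

```python
MIRROR = {"in": "out", "out": "in"}


def dangling_halves(docs):
    """Return (node, type, other, direction) for every relationship side whose
    mirror-image side is missing from the other node's document."""
    problems = []
    for node_id, doc in docs.items():
        for rel in doc["rels"]:
            other_doc = docs.get(rel["other"])
            # The other node should record the same type, pointing back at us,
            # with the opposite direction.
            expected = {"type": rel["type"], "other": node_id,
                        "direction": MIRROR[rel["direction"]]}
            mirrored = other_doc is not None and any(
                all(r.get(k) == v for k, v in expected.items())
                for r in other_doc["rels"])
            if not mirrored:
                problems.append((node_id, rel["type"], rel["other"], rel["direction"]))
    return problems
```

In a native graph store this invariant is maintained by the database itself; here the application would have to run such checks and repair the damage after the fact.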
  34. For graphs: Reliability > Availability
  35. Consistency models: Can you read what you write? (source: https://aphyr.com/posts/313-strong-consistency-models)
  36. Cluster members slightly “ahead” or “behind” of each other (diagram: the transaction logs of five cluster members, some of which have applied transaction 11 and some of which have not) • If I query a server that has not applied transaction 11, I won’t see its updates • If I query a server that has, I’ll see all updates from all committed transactions
  37. Register | Login: “You need to log in to continue your purchase!”
  38. Register form: Username, Password, Create Account
  39. Register form filled in: Username jim_w, Password ********, Create Account
  40. Login form: Username, Password, Login
  41. Login form filled in: Username jim_w, Password ********, Login
  42. Login fails: “No account found! Try again” ✗ (instead of “Login Successful” → Purchase)
  43. A few moments later... the same login (Username jim_w, Password ********) succeeds ✓
  44. “Login Successful” → Purchase
  45. Q: Why didn’t this work? A: Eventual consistency
  46. Bookmark • Session token • String (for portability) • Opaque to application • Represents the end user’s most recent view of the graph • More capabilities to come
  47. Let’s Build a System with Causal Consistency
  48. (diagram: five cluster members, with transaction logs ending at transaction 10 or 9) App Server A’s driver handles “Create Account”
  49. App Server A’s driver sends CREATE (:User)
  50. One member appends the write as transaction 11
  51-54. (animation builds of the same diagram, with transaction 11 highlighted)
  55. App Server B’s driver handles “Login” with MATCH (:User)
  56-58. (animation builds: some members have now applied transaction 11, others have not yet)
  59. Obtain bookmark (transaction 11)
  60. Use bookmark
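The bookmark walkthrough can be simulated with a toy Python model, assuming a five-member cluster in which a write counts as committed once a majority holds it, and a bookmark is simply the committed transaction’s ID (the real drivers hand out opaque strings, per slide 46). A read carrying a bookmark is not served until the chosen member has caught up; here that waiting is modeled by letting replication finish first. `Cluster`, `Member` and their methods are hypothetical, not a Neo4j driver API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Member:
    # Committed (tx_id, item) pairs in commit order.
    log: List[Tuple[int, str]] = field(default_factory=list)

    def last_tx(self) -> int:
        return self.log[-1][0] if self.log else 0

    def contains(self, item: str) -> bool:
        return any(entry == item for _, entry in self.log)


class Cluster:
    """Toy cluster: members[0] is the leader; a write is committed once a
    majority holds it, and the remaining members lag until replicated."""

    def __init__(self, size: int):
        self.members = [Member() for _ in range(size)]

    def replicate(self, index: int) -> None:
        # Copy the leader's log to one follower (instantly, for simplicity).
        self.members[index].log = list(self.members[0].log)

    def write(self, item: str) -> int:
        leader = self.members[0]
        tx_id = leader.last_tx() + 1
        leader.log.append((tx_id, item))
        majority = len(self.members) // 2 + 1
        for i in range(1, majority):
            self.replicate(i)
        return tx_id  # handed back to the client as its bookmark

    def read(self, index: int, item: str, bookmark: Optional[int] = None) -> bool:
        member = self.members[index]
        if bookmark is not None and member.last_tx() < bookmark:
            # Stand-in for waiting until this member has applied the
            # bookmarked transaction: let replication catch it up first.
            self.replicate(index)
        return member.contains(item)


cluster = Cluster(5)
bookmark = cluster.write("User jim_w")           # App Server A: Create Account
stale = cluster.read(4, "User jim_w")             # App Server B hits a lagging member
causal = cluster.read(4, "User jim_w", bookmark)  # same read, carrying the bookmark
```

Without the bookmark, the login read on a lagging member misses the new user, which is the failure from slides 42 and 45; with it, the read reflects the user’s own earlier write.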
  61. Fault Tolerance must respect the graph • Neo4j’s approach maintains the necessary ACID semantics over the network • No corruption by design • Compared to corruption under non-fault operation for eventually consistent (EC) systems • Replication model maintains high-performance pointer chasing • No over-the-network traversals • Compared to expensive hash lookups in others • All future versions of Neo4j will honor this too, even as the fault tolerance protocols evolve
  62. Pushing Neo4j to the Limits • Asymptotic benchmarking effort for native graph tech • “What can Neo4j do when it’s pushed to its limits?” • And the results are pretty amazing
  63. Traversals • Realistic retail dataset from Amazon • Commodity dual Xeon processor server • Social recommendation (Java procedure) equivalent to: MATCH (you)-[:BOUGHT]->(something)<-[:BOUGHT]-(other)-[:BOUGHT]->(reco) WHERE id(you) = $id RETURN reco • Threads: hops/second • 1: 3-4M • 10: 17-29M • 20: 34-50M • 30: 36-60M
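A Python sketch of what the recommendation pattern on slide 63 computes, over an in-memory map from each person to the products they bought. It assumes at most one BOUGHT relationship per (person, product) pair, so Cypher’s rule that a pattern never reuses a relationship becomes `other != you` and `reco != something`; like the query, it returns one row per matching path. The benchmark itself used a Java procedure, and `recommend` and `top_recommendations` (including ranking by path count) are hypothetical.

```python
from collections import Counter, defaultdict


def recommend(bought, you):
    """One result per path (you)-[:BOUGHT]->(something)<-[:BOUGHT]-(other)-[:BOUGHT]->(reco)."""
    buyers = defaultdict(list)  # product -> people who bought it
    for person, products in bought.items():
        for product in products:
            buyers[product].append(person)

    rows = []
    for something in bought.get(you, []):   # hop 1: what you bought
        for other in buyers[something]:     # hop 2: who else bought it
            if other == you:
                continue
            for reco in bought[other]:      # hop 3: what they bought
                if reco != something:
                    rows.append(reco)
    return rows


def top_recommendations(bought, you, n=3):
    """Rank recommendations by the number of paths that reach them."""
    return Counter(recommend(bought, you)).most_common(n)
```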
  64. Read Scale (“Trillions!” @profbriancox) • Can comfortably handle 1 trillion relationships on a single server • 24x2TB SSDs, 33TB size on disk • Compiled Cypher query • Random reads • Sustains over 100k user transactions/sec • Even with 99.8% page faults, because the machine has a modest 512 GB of RAM
  65. Write Scale (“Millions and billions!” @profbriancox) • Import highly connected Friendster dataset • 1.8 billion relationships takes around 20 minutes • That is over 1M writes/second!
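A quick check of the import rate on slide 65, counting relationships only (writing the nodes would add to the total):

```latex
\frac{1.8 \times 10^{9}\ \text{relationships}}{20\ \text{min} \times 60\ \text{s/min}}
  = \frac{1.8 \times 10^{9}\ \text{relationships}}{1200\ \text{s}}
  = 1.5 \times 10^{6}\ \text{relationships/s}
```

So the 1M writes/second figure is, if anything, conservative.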
  66. >50M TRAVERSALS/SEC • 1,000,000 WRITES/SEC • 1,000,000,000 RECORDS
  67. Comparison on a ~10M node, ~100M relationship graph. Non-native graph DB: 6 machines, each with 48 vCPUs, 256 GB disk and 256 GB of RAM. Neo4j: single thread. Workload: non-native graph DB vs. Neo4j • Count nodes: 201 s vs. < 1 ms • Count outgoing rels: 202 s vs. < 1 ms • Count outgoing rels at depth 2: 276 s vs. 23 s • Count outgoing rels at depth 3: 511 s vs. 423 s* • Group nodes by property value: 212 s vs. 8 s • Group rels by type: 198 s vs. 54 s • Count depth 2 knows-likes: 324 s vs. 149 s* • Page Rank: 2571 s vs. 27 s*
  68. Why you should care about Graph-Native Tech • Performance • Blazingly fast for graph workloads on commodity hardware • Safety • Don’t compromise graph data • Usability • Faster time to value • Ease of evolution • Better ecosystem tooling
  69. Q&A @jimwebber
