Scalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4j

Neo4j Vertical Summit Telecommunications
Stefan Kolmar, Neo4j


  1. 1. Scalability and Graph Analytics with Neo4j Stefan Kolmar VP Field Engineering - Neo4j
  2. 2. I Remember...
  3. 3. The Evolution of Databases
  4. 4. The Evolution of Databases TRADITIONAL OLTP/RELATIONAL
  5. 5. The Evolution of Databases TRADITIONAL OLTP/RELATIONAL BIG DATA TECHNOLOGY
  6. 6. The Evolution of Databases TRADITIONAL OLTP/RELATIONAL BIG DATA TECHNOLOGY
  7. 7. The Evolution of Databases TRADITIONAL OLTP/RELATIONAL BIG DATA TECHNOLOGY
  8. 8. The classic challenges for Telcos: Large Data Volumes: CDRs, Network Metrics, Customer Metrics
  9. 9. The classic challenges for Telcos: Large Data Volumes: CDRs, Network Metrics, Customer Metrics. Dynamic Access
  10. 10. What Is Different in Neo4j? Index-Free Adjacency
  11. 11. Connectedness and Size of Data Set vs. Response Time • Relational and Other NoSQL Databases: 0 to 2 hops, 0 to 3 degrees, thousands of connections • Neo4j: tens to hundreds of hops, thousands of degrees, billions of connections • 1000x Advantage: “Minutes to milliseconds”
  12. 12. The Largest Investment in Graph Databases
  13. 13. Multi-tenancy with Neo4j 4.0
  14. 14. Multi-Database: Use Cases • B2B SaaS: Greatly simplified management of DB infrastructure for your customers. • Multi-tenancy: A single instance of Neo4j Server/Cluster may serve multiple customers/users within an organization. • Rapid Testing/Development/Deployment: Manage separate databases for development, testing, staging, etc. in a single infrastructure. • Scalability: Disjoint data is organized in physically separate structures with strong isolation. • Cloud-Friendly: Databases can be associated with cloud storage and easily detached from one server and attached to another.
  15. 15. Neo4j Multi-Database • Administration commands: CREATE|DROP|START|STOP DATABASE name • Use commands: HTTP API: http://server:port/.../database; Browser & Cypher Shell: :USE database; Drivers: Session(database) • Browser: Configure and Manage • Diagram labels: Network Mgmt, Customer Relations
  16. 16. Unbounded Scalability in Neo4j 4.0
  17. 17. Causal Clustering with Neo4j
  18. 18. Fabric: Distributed Graph Query • Scale-out model • Two ways of using: • Operate over a single large, decomposed graph • Query across disjoint graphs, per business domain • Data Scientists: Run analysis on large, distributed databases. • Developers: Develop large-scale applications on laptops/desktops and deploy in a network of Neo4j clusters. • Enterprises: Keep data in designated geographies. Analyse graphs without replicating or moving them.
  19. 19. Cypher Queries: SQL vs. Cypher in Neo4j. Cypher: MATCH (boss)-[:MANAGES*0..3]->(sub), (sub)-[:MANAGES*1..3]->(report) RETURN boss.name AS Boss, sub.name AS Subordinate, count(report) AS Total
  20. 20. Multi-graph Cypher Queries. Cypher in Neo4j 4.0: UNWIND corporate.graphIds() AS gid CALL { USE corporate.graph(gid) MATCH (boss)-[:MANAGES*0..3]->(sub), (sub)-[:MANAGES*1..3]->(report) RETURN boss.name AS Boss, sub.name AS Subordinate, count(report) AS Total } RETURN Boss, Subordinate, Total ORDER BY Total • Executes queries in parallel on multiple databases, combining or aggregating results. • Chains queries together from multiple databases for sophisticated real-time analyses.
  21. 21. How will this help a Telco to scale? • The foundation: Causal Cluster • The evolution: Fabric • Large Data Volumes: CDRs, Network Metrics, Customer Metrics • Scaling R/W access
  22. 22. How will this help a Telco to scale? • The foundation: Causal Cluster • The evolution: Fabric • Large Data Volumes: CDRs, Network Metrics, Customer Metrics • Scaling R/W access
  23. 23. [Diagram: a user connects to a Neo4j DBMS. Cluster A has Core 1, Core 2, Core 3, Replica 1 and Replica 2; Cluster B has Core 1, Core 2 and Core 3. The Network Metrics databases NM1, NM2 and NM3 are distributed across the cluster members.]
  24. 24. Neo4j 4.0 Scalability in Action: Sharding the LDBC Social Network Benchmark. Data Model (http://ldbcouncil.org/developer/snb and https://neo4j.com/fosdem20)
  25. 25. Neo4j 4.0 Scalability in Action: Sharding the LDBC Social Network Benchmark • 1 shard for the Persons graph • N shards for the Forums graph (http://ldbcouncil.org/developer/snb and https://neo4j.com/fosdem20)
  26. 26. Neo4j 4.0 Scalability in Action: Sharding the LDBC Social Network Benchmark • Up to 300x reduced latency • Up to 10x performance improvement (http://ldbcouncil.org/developer/snb and https://neo4j.com/fosdem20)
  27. 27. Scalability → Security?
  28. 28. Security and Data Privacy • Based on Role-based Access Control for graphs • Restrictions on what data can be seen by different users, applied to all database interactions • Implicit security view of the data for each user through schema-based security definitions • Grant/Deny permissions to traverse, read or write data based on node labels, relationship types or database and property names • Security rules are replicated across the cluster via roles that are associated with the users • Diagram labels: users Bob and Joe; Baseline_Personnel_Security_Standard, Security_Check, Counter_Terrorism_Check, Developed_Vetting
  29. 29. Security and Data Privacy in Practice
  30. 30. Constraints • Call Centre Agent: needs the Doctor’s name; not allowed to read the diagnosis • Doctor: able to view patient records and patient diagnoses
  31. 31. Security Config // Doctors get wide-ranging access GRANT ACCESS ON DATABASE healthcare TO doctor; GRANT TRAVERSE {*} ON GRAPH healthcare TO doctor; GRANT READ {*} ON GRAPH healthcare TO doctor; GRANT WRITE ON GRAPH healthcare TO doctor; // Agents get narrower access GRANT ACCESS ON DATABASE healthcare TO agent; GRANT TRAVERSE {*} ON GRAPH healthcare TO agent; GRANT READ {Name} ON GRAPH healthcare NODES Doctor TO agent; GRANT READ {Name} ON GRAPH healthcare NODES Patient TO agent;
  32. 32. Call Centre Agent MATCH (:CallcenterAgent {name: 'Alice'}) <-[:CALLED]-(p:Patient)-[:HAS_DIAGNOSIS]-(dia) <-[:ESTABLISHED]-(d:Doctor) RETURN p.name, d.name, dia.name;
  33. 33. Reactive Architecture Neo4j 4.0
  34. 34. Reactive Architecture • Flow control throughout the stack, allowing the client application to fully control the production and flow of records within a result • Synchronous/Asynchronous execution • Based on a reactive streams library with non-blocking backpressure • Client applications can pull or discard the whole result or N elements • Can also be gracefully cancelled • Exposed through a reactive API in Drivers v4.0 • Use Cases: • Long queries with large result sets • Paged results • Thin/small clients
  35. 35. Graph Data Science: Science-driven approach to gain knowledge from the relationships and structures in data, typically to power predictions. Uses multi-disciplinary workflows that may include queries, statistics, algorithms and machine learning. • Graph Recipes & Analytics: Answers specific questions to gain insights from connections in existing/historical data. Approaches typically include global queries and algorithms and direct use of results. • Graph Enhanced ML & AI: Training models (ML) with graph-structured data to be used to emulate human, probabilistic decisions within a solution/application (AI system).
  36. 36. The Graph Data Science Library • Optimized for Analytics: Leverage custom data structures optimized for global traversals and aggregation. Flexibly decompose and reshape your graph for specific use cases. • Algorithms for Insights: Robust algorithms that are highly parallelized and scale to billions of nodes. Early access to dozens of experimental implementations. • Intuitive Interface: Drastically simplified and standardized API that enables custom configurations. Documentation, training, and examples so getting started is simple. • Product Supported & Under Active Development
  37. 37. Graph Data Science • Analytics projections: - Specialized data structure for algorithms, capable of supporting billions of nodes - Cypher loaders for experimentation - Quickly reshape, combine, aggregate, and deduplicate your transactional data - Support for multiple node labels, relationship types, and properties - Manage multiple in-memory analytics graphs for different workloads - Memory footprint that allows large-scale use • Graph algorithms & more: - 40+ algorithms in 5 categories: community, centrality, similarity, pathfinding, and link prediction - Helper algorithms like graph generation, one-hot encoding, and random walk - Early previews of new implementations in the alpha & beta namespaces - Supported, scalable algorithms include seeding, determinism, and incremental calculations - Estimate mode for memory requirements
  38. 38. Graph Data Science Algorithms (Generally Unsupervised): A subset of data science algorithms that come from network science, Graph Algorithms enable reasoning about network structure. • Pathfinding and Search • Centrality (Importance) • Community Detection • Heuristic Link Prediction • Similarity
  39. 39. Conclusions • Neo4j provides: • Scalability for Telcos • Carrier-grade high availability with Causal Cluster • Security features to fulfill privacy requirements • Graph Analytics to provide Data Science infrastructure for Telcos
  40. 40. Scalability and Graph Analytics with Neo4j Stefan Kolmar VP Field Engineering - Neo4j
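
The effect of the property-level grants on slides 30 and 31 can be sketched with a small toy model. This is only an illustration in plain Python, not Neo4j's enforcement mechanism or driver API: the `READ_GRANTS` table, the `visible_properties` helper, and the sample nodes are all hypothetical names and data introduced here. It assumes that a role can read a property only when a READ grant covers that property on that node label.

```python
# Toy model of the READ grants on slide 31. This is NOT how Neo4j enforces
# security; it only illustrates which properties each role would see.
from typing import Dict, Set, Tuple

ALL = "*"

# (role, label) -> readable property names; the label "*" means every label.
READ_GRANTS: Dict[Tuple[str, str], Set[str]] = {
    ("doctor", ALL): {ALL},          # GRANT READ {*} ON GRAPH healthcare TO doctor
    ("agent", "Doctor"): {"Name"},   # GRANT READ {Name} ... NODES Doctor TO agent
    ("agent", "Patient"): {"Name"},  # GRANT READ {Name} ... NODES Patient TO agent
}


def visible_properties(role: str, label: str, props: Dict[str, str]) -> Dict[str, str]:
    """Return only the properties that `role` may read on a node with `label`."""
    allowed = READ_GRANTS.get((role, label), set()) | READ_GRANTS.get((role, ALL), set())
    if ALL in allowed:
        return dict(props)
    return {key: value for key, value in props.items() if key in allowed}


# Hypothetical sample nodes (not taken from the slides).
doctor_node = {"Name": "Dr. Smith", "Speciality": "Cardiology"}
diagnosis_node = {"Name": "Hypertension"}

agent_sees_doctor = visible_properties("agent", "Doctor", doctor_node)
agent_sees_diagnosis = visible_properties("agent", "Diagnosis", diagnosis_node)
doctor_sees_diagnosis = visible_properties("doctor", "Diagnosis", diagnosis_node)

print(agent_sees_doctor)
print(agent_sees_diagnosis)
print(doctor_sees_diagnosis)
```

In this sketch the agent role gets the doctor's name but nothing from the diagnosis node, while the doctor role reads everything, which mirrors the constraints listed on slide 30.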
