Enterprise Ready: A Look at Neo4j in Production

This talk covers Neo4j architecture basics, helping you match Neo4j with the right technical problem. It also provides guidance for success in production and where Neo4j fits in your enterprise architecture stack.

  1. 1. Agenda, April 26, 2017, Santa Clara
      • 09:00-09:30 Breakfast and Registration
      • 09:30-10:15 The Connected Data Imperative: Why Graphs
      • 10:15-11:00 Transform Your Data: A Worked Example
      • 11:00-11:30 Break
      • 11:30-12:30 Enterprise Ready: A Look at Neo4j in Production
      • 12:30-13:30 Lunch
      • 13:30-17:00 Hands-On Training Session
  2. 2. Key Takeaways 1. Neo4j architecture basics… to help you match up Neo4j with the right technical problem 2. Some guidelines for success in production 3. Where Neo4j fits into your enterprise architecture
  3. 3. Part I: The Right Data Technology for the Right Job
  4. 4. (Cruising)-[:TO]->(Technology Selection)
  5. 5. First Step: Align Technology with Need
  6. 6. First Step: Align Technology with Need
  7. 7. First Step: Align Technology with Need
  8. 8. One Perspective on “Big Data”: Hordes of Data, Hoards of Data
  9. 9. Trending & Aggregation Finding Needles in Haystacks One Perspective on “Big Data”
  10. 10. Commodity Server Farms Cheap & Abundant Storage One Perspective on “Big Data”
  11. 11. One Perspective on “Big Data”
  12. 12. One Perspective on “Big Data”
      • End Users = Data Specialists; Latency & Freshness = Batch
      • End Users = Systems of Interaction; Latency & Freshness = Real-Time
  13. 13. Other Side of the Coin
  14. 14. Other Side of the Coin
      • End Users = Data Specialists; Latency & Freshness = Batch
      • End Users = Systems of Interaction; Latency & Freshness = Real-Time
  15. 15. Discrete Data Minimally connected data Other NoSQL Relational DBMS Neo4j Graph DB Connected Data Focused on Data Relationships DBMSs Another Perspective on “Big Data”
  16. 16. Database Management Systems: Five Key Sub-Patterns (Including SQL)
      • Tabular: RDBMS
      • Aggregate-Oriented (3): Key-Value, Column-Family, Document Database
      • Graph: Graph Database
      Source: Martin Fowler, NoSQL Distilled
  17. 17. Illustration by David Somerville based on the original by Hugh McLeod (@gapingvoid) Connectedness Latency & Freshness Batch- Precompute Real-Time Important Dimensions in Technology Selection
  18. 18. Illustration by David Somerville based on the original by Hugh McLeod (@gapingvoid) RDBMS & Aggregate- Oriented NoSQL Hadoop / MapReduce |<———————- Graph Database & ———————>| Graph Compute Engine A View of the Data Management Portfolio
  19. 19. Latency & Freshness Batch- Precompute Real-Time Connectedness Illustration by David Somerville based on the original by Hugh McLeod (@gapingvoid) Neo4j Solves Connected, Real-Time Problems
  20. 20. A View of the Data Management Portfolio
      • End Users = Data Specialists; Latency & Freshness = Batch
      • End Users = Systems of Interaction; Latency & Freshness = Real-Time
  21. 21. Batch Processing vs. Real-Time Processing
      Batch Processing
      • Recommendations based on activity from yesterday
      • Overnight/intermittent loading and calculations
      • Results in lag between activity & knowledge response
      • System-wide local pre-calculations are computationally inefficient
      Real-Time Processing
      • Real-time reads & writes
      • Up-to-the-moment freshness
      • “Just-in-time” processing most efficient for “local” queries
      • Recommendations that reflect your latest activity
  22. 22. Architectures for Leveraging Connectedness
      Discrete Data (minimally connected data): Hadoop, Other NoSQL, Relational DBMS
      • Designed for discrete lookups & aggregation
      • Architecture tradeoffs:
        - Data model richness for volume
        - Performant insight into connections
        - Data trustability (ACID)
      Connected Data (focused on data relationships): Graph Database
      • Designed for causality & pattern-based queries
      • Architecture tradeoffs:
        - Aggregation performance for arbitrary hop performance
        - “Infinite scale” for large-scale index-free relationship performance
  23. 23. Part II: Distinguishing Features of a Native Graph Database
  24. 24. Intuitiveness Speed Agility Top Benefits
  25. 25. #1 Benefit: Project Agility. The Whiteboard Model Is the Physical Model
      A unified view for ultimate agility
      • Easily understood
      • Easily evolved
      • Easy collaboration between business and IT
  26. 26. #2 Benefit: “Minutes to Milliseconds” Real-Time Query Performance
      • Chart: response time vs. connectedness and size of data set
      • Relational and other NoSQL databases: 0 to 2 hops; 0 to 3 degrees; thousands of connections
      • Neo4j: tens to hundreds of hops; thousands of degrees; billions of connections
      • 1000x advantage: “minutes to milliseconds”
  27. 27. Benefit #3 of 3: Query Productivity
      Example HR Query in SQL vs. The Same Query using Cypher:
      MATCH (boss)-[:MANAGES*0..3]->(sub), (sub)-[:MANAGES*1..3]->(report) WHERE = “John Doe” RETURN AS Subordinate, count(report) AS Total
      Project Impact
      • Less time writing queries: more time understanding the answers; leaving time to ask the next question
      • Less time debugging queries: more time writing the next piece of code; improved quality of overall code base
      • Code that’s easier to read: faster ramp-up for new project members; improved maintainability & troubleshooting
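The Cypher query on this slide lost its property names in transcription, so the following is only a minimal Python sketch of what such a query computes: for every subordinate within 0 to 3 `MANAGES` hops of a boss, count the reports 1 to 3 hops below that subordinate. The sample hierarchy, the use of plain name strings as identifiers, and the function names are all assumptions made for illustration. The hierarchy is assumed to be a tree, so counting paths (what Cypher does) and counting people give the same totals.

```python
from collections import deque

# Hypothetical MANAGES edges (manager -> direct reports); not from the talk.
MANAGES = {
    "John Doe": ["Ann", "Bob"],
    "Ann": ["Cid", "Dee"],
    "Bob": ["Eve"],
    "Cid": ["Fay"],
}

def within_hops(start, min_hops, max_hops):
    """Return people reachable from `start` in min_hops..max_hops MANAGES hops."""
    found = []
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth >= min_hops:
            found.append(person)
        if depth < max_hops:
            for report in MANAGES.get(person, []):
                queue.append((report, depth + 1))
    return found

def report_totals(boss):
    """For each subordinate 0..3 hops below `boss`, count reports 1..3 hops below them."""
    totals = {}
    for sub in within_hops(boss, 0, 3):
        count = len(within_hops(sub, 1, 3))
        if count:  # a MATCH yields no row for a subordinate without reports
            totals[sub] = count
    return totals

print(report_totals("John Doe"))
```

The two hop ranges in the pattern map directly onto the two `within_hops` calls, which is the productivity point the slide makes: the variable-length pattern replaces a stack of self-joins.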
  28. 28. Where’s the Magic?
  29. 29. Key Ingredient #1 of 3: Graph Optimized Memory & Storage
      • At write time: data is connected as it is stored
      • At read time: lightning-fast retrieval of data and relationships via pointer chasing
      • Index-free adjacency
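The idea of index-free adjacency can be illustrated with a small conceptual sketch in Python. It is not Neo4j's storage format; the `Node` and `Relationship` classes and the edge table are hypothetical. Each node keeps direct references to its own relationships, so a traversal follows references from the node instead of searching a global structure.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.outgoing = []  # direct references to Relationship objects

class Relationship:
    def __init__(self, rel_type, start, end):
        self.type = rel_type
        self.start = start
        self.end = end
        start.outgoing.append(self)  # write time: connect as the data is stored

def neighbours(node, rel_type):
    """Read time: follow the node's own references; no global lookup is needed."""
    return [rel.end for rel in node.outgoing if rel.type == rel_type]

def neighbours_via_scan(edge_table, name, rel_type):
    """Contrast: a join-style lookup that searches a global edge table."""
    return [end for (start, t, end) in edge_table if start == name and t == rel_type]

a, b, c = Node("A"), Node("B"), Node("C")
Relationship("KNOWS", a, b)
Relationship("KNOWS", b, c)
edge_table = [("A", "KNOWS", "B"), ("B", "KNOWS", "C")]

# Two hops from A by pointer chasing: the cost depends only on the nodes visited.
two_hops = [n2.name for n1 in neighbours(a, "KNOWS") for n2 in neighbours(n1, "KNOWS")]
print(two_hops)
print(neighbours_via_scan(edge_table, "A", "KNOWS"))
```

In the scan version every hop pays for the size of the whole edge table (or for an index lookup over it), whereas the pointer-chasing version touches only the relationships attached to the current node.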
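  30. 30. Key Ingredient #2 of 3: A Productive and Powerful Graph Query Language
      MATCH (:Person { name:“Dan”} ) -[:MARRIED_TO]-> (spouse)
      • Diagram: Dan, MARRIED_TO, Ann
      • Annotated elements: NODE (the parenthesized patterns), LABEL (Person), PROPERTY (name:“Dan”), RELATIONSHIP TYPE (MARRIED_TO), VARIABLE (spouse)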
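To make the roles of label, property, relationship type and variable concrete, here is a minimal Python sketch that evaluates the slide's pattern over a tiny in-memory property graph. The data layout and the `match` function are hypothetical, and giving Ann the `Person` label is an assumption (the slide only labels Dan's node).

```python
# Hypothetical in-memory property graph mirroring the slide's example.
nodes = {
    1: {"labels": {"Person"}, "props": {"name": "Dan"}},
    2: {"labels": {"Person"}, "props": {"name": "Ann"}},  # label assumed
}
relationships = [
    {"type": "MARRIED_TO", "start": 1, "end": 2},
]

def match(label, props, rel_type):
    """Yield one {'spouse': node} binding per match of
    (:label {props})-[:rel_type]->(spouse)."""
    for rel in relationships:
        start = nodes[rel["start"]]
        if (rel["type"] == rel_type
                and label in start["labels"]
                and all(start["props"].get(k) == v for k, v in props.items())):
            yield {"spouse": nodes[rel["end"]]}

rows = list(match("Person", {"name": "Dan"}, "MARRIED_TO"))
spouse_names = [row["spouse"]["props"]["name"] for row in rows]
print(spouse_names)
```

The label and property filter the starting node, the relationship type filters the edge, and the variable names whatever sits at the other end so that later clauses can refer to it.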
  31. 31. Key Ingredient #3 of 3: ACID Graph Writes
      • Graph transactions over ACID consistency: maintains integrity over time
      • Graph transactions over non-ACID DBMSs: becomes corrupt over time
  32. 32. “Why Neo4j”: What We Hear From Users
      • ACID Transactions: ACID transactions with causal consistency; Neo4j Security Foundation delivers enterprise-class security and control
      • Performance: index-free adjacency delivers millions of hops per second; in-memory pointer chasing for fast query results
      • Agility: native property graph model; modify schema as business changes without disrupting existing data
      • Developer Productivity: easy to learn, declarative openCypher graph query language; procedural language extensions; open library of procedures and functions (APOC); Neo4j support and training; worldwide developer community
      • Hardware Efficiency: native graph query processing and storage requires 10x less hardware; index-free adjacency requires 10x less CPU
      … all backed by Neo’s track record of leadership and product roadmap
  33. 33. Part III: Recipes for Success with Neo4j
  34. 34. #1: Get to know the “Whole Product”
      Diagram labels: Cloud IaaS, PaaS, DBaaS Marketplace Companion Service Education Documents Online Training Classroom Custom Onsite OSS Community Foundations LDBS, openCypher Events Forums Add-Ons Tech Ecosystem Tech Partners Graph Solutions Data Science Architecture Data Models Partners System Integrators Trainers OEMs Commercial Support Technical Support Packaged Services Custom Services
  35. 35. #2: Don’t be afraid to ask for help
  36. 36. #3: Use the technology for what it’s good for …not as a stashing ground for all data OLTP Relationships in Data Concrete Use Case
  37. 37. #4: Use the various APIs and components to your advantage
      • Querying
        - Cypher: filtering & pattern matching; most convenient
        - Procedures: more complex imperative code; extreme performance; APOC is your friend
      • Importing Data
        - Bulk: “neo4j-admin import” (1M rec/sec)
        - Transactional: LOAD CSV; community adapters & procedures; roll-your-own
      • Product Edition
        - Community Edition: learning; simple projects
        - Enterprise Edition: 24x7; large scale; secure
  38. 38. Part IV: Deploying for 24x7: An Overview of Key Enterprise Features
  39. 39. What’s Possible
      • Real-time package routing
        - Large postal service with over 500k employees
        - Neo4j routes 7M+ packages daily at peak, with peaks of 5,000+ routing operations per second
      • Real-time promotion recommendations
        - Record “Cyber Monday” sales
        - About 35M daily transactions
        - Each transaction is 3-22 hops
        - Queries executed in 4ms or less
        - Replaced IBM WebSphere Commerce
      • Real-time pricing engine
        - 300M pricing operations per day
        - 10x transaction throughput on half the hardware compared to Oracle
        - Presentation at
        - Replaced Oracle database
  40. 40. The Graph Foundation for the Enterprise: Neo4j 3.1 Security and Clustering Architecture
      Build and deploy graph applications across an entire enterprise
      • Compliance with internal and external enterprise information security needs
      • Robust and flexible new clustering architecture for diverse operational scenarios and application needs
      A foundation that enables mainstream enterprise solutions on-premises and in the cloud
      Diagram labels: Enterprise Graph Applications; Operational, Analytic, and Transactional Uses; Enterprise Graph Foundation (Security, Clustering, Operability)
  41. 41. Neo4j Causal Clustering Architecture (Enterprise Edition): Resilient, Modern, Fault-Tolerant. Guarantees Graph Safety.
      • Raft-based architecture: continuously available; consensus commits; third-generation cluster architecture
      • Cluster-aware stack: seamless integration among drivers, Bolt protocol and cluster; eliminates need for external load balancer; stateful, cluster-aware sessions with encrypted connections
      • Streamlined development: relieves developers from complex infrastructure concerns; faster and easier to develop distributed graph applications
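“Consensus commits” in a Raft-based cluster rest on simple majority arithmetic: a write is committed once a majority of the voting members have accepted it. The sketch below shows that arithmetic in Python. It is general Raft reasoning rather than Neo4j configuration, and the function names are made up for illustration.

```python
def quorum_size(core_servers):
    """Smallest majority of a cluster with `core_servers` voting members."""
    return core_servers // 2 + 1

def tolerated_failures(core_servers):
    """How many core servers can fail while a majority is still reachable."""
    return (core_servers - 1) // 2

def is_committed(acks, core_servers):
    """A write counts as committed once a majority has acknowledged it."""
    return acks >= quorum_size(core_servers)

for n in (3, 5, 7):
    print(n, quorum_size(n), tolerated_failures(n))
```

This is why core clusters are usually sized with an odd number of members: going from 3 to 4 servers raises the quorum from 2 to 3 without tolerating any additional failure.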
  42. 42. How Causal Clustering Works (Enterprise Edition)
      • Diagram labels: Graph App, Driver, BOLT, Write, Read; Core Servers (Synced Cluster, Read-Write); Replica Servers (Read Replica; Query, View; Reporting and Analysis)
      • Built-in load balancing: spreads reads to core and replica servers; spreads writes across core servers
      • Causal consistency: always-consistent view of data at any scale; stronger than eventual consistency; supports varying app SLAs; best model for graph transactions
      • Large heterogeneous clusters: 1000+ instance clusters; no dependence on master avoids bottleneck; mix and match instance types (app servers, reporting servers, IoT devices…)
  43. 43. Causal Clustering Architecture Optimizes for Cost-Consistency at Query Time (Enterprise Edition)
      • Chart axis: query cost
      • Replica queries: Read Any; Read Your Own Writes
      • Core queries: Read Any; Read Your Own Writes; Linearizable (Future 3.x)
      • QUORATE: The Holy Grail of Distributed Systems
  44. 44. How Causally Consistent Reads Work (Enterprise Edition)
      • Diagram labels: App Server and Driver (three times), Core Server (twice), Replica Server, Raft Replication, Async Replication
      • Steps: 1: Read Product Catalog; 2: Create Account; 3: Review Profile; 4: Create an order; 5: Review orders
      • 1: Read any replica | 2: Write → [Tx 101] | 3: RYOW* [Tx 101] | 4: Write → [Tx 102] | 5: RYOW [Tx 102]
      • How it works: the application chooses a consistency level (“Read Any” vs. “Read your own writes”); the cluster chooses appropriate members; the default optimizes for scalability (i.e. read replica server for reads)
      • Causal Clustering enables: application-driven SLAs; optimizing for freshness vs. cost; tunability within an application, on an application & session basis
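The difference between “Read Any” and “Read Your Own Writes” on this slide can be simulated in a few lines of Python. This is a toy model under stated assumptions, not the driver API: the `Core` and `Replica` classes are hypothetical, transaction ids are simple counters rather than the slide's Tx 101 and Tx 102, and catching up is done by pulling synchronously where a real replica would wait for asynchronous replication. The token returned by a write plays the role that the Neo4j drivers call a bookmark.

```python
class Core:
    """Stands in for the Raft-replicated core servers: orders committed writes."""
    def __init__(self):
        self.log = []  # committed writes, in commit order

    def write(self, data):
        self.log.append(data)
        return len(self.log)  # token identifying the transaction just committed

class Replica:
    """Read replica that receives the core's log asynchronously."""
    def __init__(self, core):
        self.core = core
        self.applied = 0  # number of transactions applied so far

    def pull(self, max_tx=1):
        """Apply up to `max_tx` further transactions from the core."""
        self.applied = min(len(self.core.log), self.applied + max_tx)

    def read(self, bookmark=0):
        """'Read any' when bookmark is 0; otherwise catch up to the bookmark first.
        The bookmark must come from Core.write, or the loop would never finish."""
        while self.applied < bookmark:
            self.pull()  # a real server would wait for replication instead
        return self.core.log[:self.applied]

core = Core()
replica = Replica(core)
bookmark = core.write("create account")

stale = replica.read()          # read any: the replica has not caught up yet
fresh = replica.read(bookmark)  # read your own writes: waits for that transaction
print(stale, fresh)
```

The cost trade-off from the slide shows up directly: the unbookmarked read returns immediately but may be stale, while the bookmarked read pays for catching up only as far as the caller's own last write.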
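  45. 45. Consistency with Causal Clustering (Enterprise Edition)
      Expected consistency behavior, compared for Eventual Consistency and Neo4j Causal Consistency:
      • Every single server is eventually updated: Eventual Consistency ✔; Neo4j Causal Consistency ✔
      • View of related data is always consistent: Eventual Consistency (not checked); Neo4j Causal Consistency ✔
      • Users reading and re-reading data always see the same data, unless there have been intervening updates by others: Eventual Consistency (not checked); Neo4j Causal Consistency ✔
      • Users writing and updating data always see the latest data, unless there have been intervening updates by others: Eventual Consistency (not checked); Neo4j Causal Consistency ✔
      Eventual consistency is not good enough for graphs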
  46. 46. Neo4j Security Foundation: Enterprise-Class Security and Control (Enterprise Edition)
      Satisfy enterprise admin and database security requirements
      • Flexible authentication options: Active Directory/LDAP or native users
      • Role-based authorization
      • List and kill running queries
      • Access controls for user-defined procedures: enables subgraph access control
      • Query logging and security event logging: passes through originating end user
      • Extendable auth plugin architecture: Kerberos support coming soon!
      • Enables Sarbanes-Oxley, HIPAA, PCI-DSS, et al.
      Predefined roles and the privileges each one holds
      • Change own password: Reader, Publisher, Architect, Admin
      • Read data: Reader, Publisher, Architect, Admin
      • View own details: Reader, Publisher, Architect, Admin
      • Terminate own query: Reader, Publisher, Architect, Admin
      • Write/update/delete data: Publisher, Architect, Admin
      • Manage index/constraints: Architect, Admin
      • Terminate others’ queries: Admin
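The predefined-role matrix on this slide is cumulative: each role appears to hold everything the role before it holds, plus more. A minimal Python sketch of that reading follows. It assumes the matrix's dots fill from the right-hand (Admin) column, and the `allowed` and `privileges_of` helpers are hypothetical names, not part of any Neo4j API.

```python
# Roles in increasing order of privilege, as listed on the slide.
ROLES = ["Reader", "Publisher", "Architect", "Admin"]

# Privilege -> lowest role that holds it, as read off the slide's matrix.
MIN_ROLE = {
    "Change own password": "Reader",
    "Read data": "Reader",
    "View own details": "Reader",
    "Terminate own query": "Reader",
    "Write/update/delete data": "Publisher",
    "Manage index/constraints": "Architect",
    "Terminate others' queries": "Admin",
}

def allowed(role, privilege):
    """True if `role` ranks at or above the lowest role holding `privilege`."""
    return ROLES.index(role) >= ROLES.index(MIN_ROLE[privilege])

def privileges_of(role):
    """All privileges a role holds under the cumulative reading."""
    return [p for p in MIN_ROLE if allowed(role, p)]

print(privileges_of("Publisher"))
```

Encoding only the lowest qualifying role per privilege keeps the table short and makes the nesting explicit; a deployment with custom, non-nested roles would need a full role-to-privilege mapping instead.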
  47. 47. Neo4j Deployment Success Program: Sustained Customer Success
      • End-to-end Neo4j support throughout the project lifecycle: tailored expert advice to guide you all the way through to deployment; ensures you are successful with Neo4j
      • Dedicated Neo4j expert: design & product manager advice to avoid common mistakes; topology advice to get you to production; expert best-practice guidance
      • Deployment success engagement: proactive solution review throughout the project’s lifetime; continuous delivery of knowledge as we progress
  48. 48. Admin Query Monitoring
      • List all running queries with :qs (soon to be :queries)
      • List query string with parameters and transaction metadata
      • Track elapsed time for queries
      • Terminate selected query
      • Users can only see and terminate their own queries
      • Admins can view and terminate all running queries across the cluster
  49. 49. Coming Soon! Neo4j 3.2 • Multi Data Center • Even Faster Reads & Writes • More Schema Constraints • Add-on for Kerberos • Query Monitoring Improvements • And more…!
  50. 50. Thanks! Stay Connected
  52. 52. Some Perspective We are still here Journeying to here
  54. 54. A way of representing data DATA DATA
  55. 55. A way of representing data: Relational Database
      Good for:
      • Well-understood data structures that don’t change too frequently
      • Known problems involving discrete parts of the data, or minimal connectivity
  56. 56. A way of representing data: Graph Database and Relational Database
      Graph database, good for:
      • Dynamic systems: where the data topology is difficult to predict
      • Dynamic requirements: those that evolve with the business
      • Problems where the relationships in data contribute meaning & value
      Relational database, good for:
      • Well-understood data structures that don’t change too frequently
      • Known problems involving discrete parts of the data, or minimal connectivity
  57. 57. Access to Knowledge Base; Direct Line to Support
  58. 58. Graph is easy to learn, hard to master • Common issues your team will hit • Underestimate graph complexity • Complaints of slow queries • Undersized hardware, especially memory, but also CPU • Ambitious number of future nodes • Bad scaling topology / architecture assumptions • Disappointing ‘Write’ speed • Deep analytics mismatch • You still need your 10,000 hours • 8760 hours in a year, so depending on how long you sleep, 5-7 years.
  59. 59. World Class 24/7 Support Neo4j Enterprise Support Access to Knowledge Base & User Forums Easy Access Support Portal and Lifecycle to track and manage issues Prioritized Fixes to product issues Agreements designed to fit any project demand NET PROMOTER SCORE 92% NPS Source: -is-leading-neo-technologys-new-20million-series-c/ ENTERPRISE-CLASS SUPPORT
  60. 60. Enterprise Scale for Global Internet Applications • Causal Clustering works across data centers • Cores can be spread out across DCs • 1 leader, all followers, consensus commits • Read-follows-writes still ensured • Subclusters for speedier local activity • Replica can hierarchically map to local replicas • Cluster API-level control for developers • Cloud delivery via Azure and AWS EC2 Reston Data Center UK Data Center
  61. 61. Native Performance Improvements • Label Indexes added to speed inserts, updates and deletes • Compound indexes to improve operational speeds • Cypher’s depth query in “DISTINCT” function has been dramatically improved by eliminating repetitious traversals through deep levels. • Common Cypher queries can be compiled to improve performance • Improved performance of Neo4j browser with new JavaScript framework
  62. 62. Production Governance Improvements Neo4j is “Enterprise-Obedient” • Node Keys are now available as schema constraint • now specify keys for any label • This helps assure the integrity of your graph by enforcing existence and guaranteeing uniqueness • especially useful for applications exchanging or importing data from across multiple data sources. • Kerberos encrypted authentication module add-on • Supports 3-tier integration with client, directory and database • CAPI-Flash hardware from IBM Power8 add-on • Role-based control of queries in Query Monitor
  63. 63. Software for “Big Data”: HDFS/MapReduce/Spark (Storage & Aggregation); Streaming (Filtering & Aggregation); Machine Learning; Graph Computation
  64. 64. Interoperability
  65. 65. Extensible
  66. 66. Write Scale: One million writes per second! Import 1.8 billion highly connected relationships
  67. 67. Neo4j: Optimized for Performance
      • Cost-based optimizer: optimizes Cypher queries to traverse the graph in the most efficient way
      • Computed statistics: exact statistics enable efficient costing, and instant query responses for counts and groupings
      • Binary wire protocol: high-performance Bolt protocol used by official Neo4j language drivers
      • Native graph API: enables low-level access to the graph, for hand-tuned levels of performance
      Neo4j Advantage – Developer productivity
  68. 68. Neo4j Scalability
      • Dynamic pointer compression: unlimited-sized graphs with no performance compromise
      • Index partitioning: auto-partitioning of indexes into 2GB partitions
      • Causal clustering architecture: enables unlimited read scaling with ACID writes and a choice of consistency levels
      • Efficient processing: native graph processing and storage often requires 10x less hardware
      • Efficient storage: one-tenth the disk and memory requirements of certain alternatives
      Neo4j Advantage – Massive Scalability