Enterprise Ready: A Look at Neo4j in Production

Agenda
APRIL 26, 2017
SANTA CLARA
09:00-09:30 Breakfast and Registration
09:30-10:15 The Connected Data Imperative: Why Graphs
10:15-11:00 Transform Your Data: A Worked Example
11:00-11:30 Break
11:30-12:30 Enterprise Ready: A Look at Neo4j in Production
12:30-13:30 Lunch
13:30-17:00 Hands-On Training Session

Key Takeaways
1. Neo4j architecture basics, to help you match Neo4j with the right technical problem
2. Some guidelines for success in production
3. Where Neo4j fits into your enterprise architecture

Part I:
The Right Data Technology for the Right Job

(Technology
Selection)
(Cruising)->(:TO)->

First Step:
Align Technology with Need

Hordes of Data | Hoards of Data
One Perspective on “Big Data”

Trending & Aggregation Finding Needles in Haystacks

Commodity Server Farms Cheap & Abundant
Storage

End Users = Data Specialists
End Users = Systems of Interaction
Latency & Freshness = Batch
Latency & Freshness = Real-Time

Discrete Data
Minimally
connected data
Other NoSQL Relational DBMS Neo4j Graph DB
Connected Data
Focused on
Data Relationships
DBMSs
Another Perspective on “Big Data”

Database Management Systems
Five Key Sub-Patterns (Including SQL)
Tabular: RDBMS
Aggregate Oriented (3): Key-Value, Column-Family, Document Database
Graph: Graph Database
Source: Martin Fowler, NoSQL Distilled

Illustration by David Somerville, based on the original by Hugh MacLeod (@gapingvoid)
Important Dimensions in Technology Selection
[Chart axes: Connectedness; Latency & Freshness, from Batch-Precompute to Real-Time]

A View of the Data Management Portfolio
RDBMS & Aggregate-Oriented NoSQL
Hadoop / MapReduce
|<- Graph Database & Graph Compute Engine ->|

Neo4j Solves Connected, Real-Time Problems
[Chart axes: Connectedness; Latency & Freshness, from Batch-Precompute to Real-Time]

End Users = Data Specialists
End Users = Systems of Interaction
Latency & Freshness = Batch
Latency & Freshness = Real-Time
A View of the Data Management Portfolio

Batch Processing
• Overnight/intermittent loading and calculations
• Results in lag between activity and knowledge response
• System-wide local pre-calculations are computationally inefficient
• Recommendations based on activity from yesterday
Real-Time Processing
• Real-time reads and writes
• Up-to-the-moment freshness
• “Just-in-time” processing is most efficient for “local” queries
• Recommendations that reflect your latest activity

Architectures for Leveraging Connectedness
Discrete Data (minimally connected data): Hadoop, Other NoSQL, Relational DBMS
Designed for Discrete Lookups & Aggregation
Architecture tradeoffs:
- Data Model Richness for Volume
- Performant Insight Into Connections
- Data Trustability (ACID)
Connected Data (focused on data relationships): Graph Database
Designed for Causality & Pattern-Based Queries
Architecture tradeoffs:
- Aggregation performance for arbitrary hop performance
- “Infinite scale” for large scale index-free relationship performance

Part II:
Distinguishing Features of a Native Graph Database

Intuitiveness
Speed
Agility
Top Benefits

#1 Benefit: Project Agility
The Whiteboard Model Is the Physical Model
A unified view for ultimate agility
• Easily understood
• Easily evolved
• Easy collaboration between business and IT

#2 Benefit: “Minutes to Milliseconds” Real-Time Query Performance
[Chart: Response Time vs. Connectedness and Size of Data Set]
Relational and Other NoSQL Databases: 0 to 2 hops, 0 to 3 degrees, thousands of connections
Neo4j: tens to hundreds of hops, thousands of degrees, billions of connections
1000x Advantage: “Minutes to milliseconds”

#3 Benefit: Query Productivity
Example HR Query in SQL | The Same Query using Cypher
MATCH (boss)-[:MANAGES*0..3]->(sub),
      (sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate,
       count(report) AS Total
Project Impact
Less time writing queries:
• More time understanding the answers
• Leaving time to ask the next question
Less time debugging queries:
• More time writing the next piece of code
• Improved quality of overall code base
Code that’s easier to read:
• Faster ramp-up for new project members
• Improved maintainability & troubleshooting
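To make concrete what the Cypher query on this slide computes, here is a minimal Python sketch over a hypothetical in-memory hierarchy. The org chart, the `MANAGES` dictionary, and the function names are illustrative assumptions, not part of the talk. The sketch also assumes a tree-shaped hierarchy, so that counting matched paths (which is what `count(report)` does) equals counting distinct people.

```python
from collections import deque

# Hypothetical org chart: manager -> direct reports (assumed tree-shaped).
MANAGES = {
    "John Doe": ["Ann", "Bob"],
    "Ann": ["Cid", "Dee"],
    "Bob": ["Eve"],
    "Cid": ["Fay"],
}

def descendants(start, min_hops, max_hops):
    """Yield (person, hops) reachable via MANAGES within [min_hops, max_hops]."""
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if hops >= min_hops:
            yield person, hops
        if hops < max_hops:
            for report in MANAGES.get(person, []):
                queue.append((report, hops + 1))

def report_counts(boss):
    """Mirror of (boss)-[:MANAGES*0..3]->(sub), (sub)-[:MANAGES*1..3]->(report)."""
    totals = {}
    for sub, _ in descendants(boss, 0, 3):
        count = sum(1 for _ in descendants(sub, 1, 3))
        if count:  # MATCH drops subordinates with no matching report
            totals[sub] = count
    return totals

print(report_counts("John Doe"))
```

The nested traversal is what the single declarative pattern replaces; in SQL the same question typically needs one self-join per level of depth.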

Key Ingredient #1 of 3: Graph Optimized Memory & Storage
Index-free adjacency
At Write Time: data is connected as it is stored
At Read Time: lightning-fast retrieval of data and relationships via pointer chasing
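The idea of index-free adjacency can be sketched in a few lines of Python. This is only an analogy, assuming in-memory objects stand in for Neo4j's stored records; the `Node`, `Relationship`, `neighbors`, and `hop` names are hypothetical and are not Neo4j APIs.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.outgoing = []  # direct references to Relationship objects

class Relationship:
    def __init__(self, rel_type, start, end):
        self.type = rel_type
        self.start = start
        self.end = end
        start.outgoing.append(self)  # connected at write time

def neighbors(node, rel_type):
    """Follow stored references; no global index lookup is involved."""
    return [r.end for r in node.outgoing if r.type == rel_type]

def hop(node, rel_type, depth):
    """Names reachable in exactly `depth` hops by chasing references."""
    frontier = [node]
    for _ in range(depth):
        frontier = [n for cur in frontier for n in neighbors(cur, rel_type)]
    return [n.name for n in frontier]

dan, ann, joe = Node("Dan"), Node("Ann"), Node("Joe")
Relationship("MARRIED_TO", dan, ann)
Relationship("KNOWS", dan, ann)
Relationship("KNOWS", ann, joe)
print(hop(dan, "KNOWS", 2))
```

Each hop costs work proportional to the relationships on the current node rather than to the size of the whole data set, which is the property the slide attributes to pointer chasing.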

MATCH (:Person { name:"Dan"} ) -[:MARRIED_TO]-> (spouse)
MARRIED_TO
Dan Ann
NODE RELATIONSHIP TYPE
LABEL PROPERTY VARIABLE
A Productive and Powerful Graph Query Language

ACID Graph Writes
Graph Transactions Over ACID Consistency: Maintains Integrity Over Time
Graph Transactions Over Non-ACID DBMSs: Becomes Corrupt Over Time

“Why Neo4j”: What We Hear From Users
ACID Transactions
• ACID transactions with causal consistency
• Neo4j Security Foundation delivers enterprise-class security and control
Performance
• Index-free adjacency delivers millions of hops per second
• In-memory pointer chasing for fast query results
Agility
• Native property graph model
• Modify schema as business changes without disrupting existing data
Developer Productivity
• Easy to learn, declarative openCypher graph query language
• Procedural language extensions
• Open library of procedures and functions (APOC)
• Neo4j support and training
• Worldwide developer community
… all backed by Neo’s track record of leadership and product roadmap
Hardware Efficiency
• Native graph query processing and storage requires 10x less hardware
• Index-free adjacency requires 10x less CPU

Part III:
Recipes for Success with Neo4j

Confidential - Neo Technology, Inc.
#1: Get to know the
“Whole Product”
Cloud
IaaS, PaaS, DBaaS
Marketplace
Companion Service
Education
Documents
Online Training
Classroom
Custom Onsite
OSS
Community
Foundations
LDBC, openCypher
Events
Forums
Add-Ons
Tech
Ecosystem
Tech Partners
Graph Solutions
Data Science
Architecture
Data Models
Partners
System Integrators
Trainers
OEMs
Commercial
Support
Technical Support
Packaged Services
Custom Services

#2: Don’t be afraid to ask for help

#3: Use the technology for what it’s good for
…not as a stashing ground for all data
OLTP
Relationships in
Data
Concrete
Use Case

#4: Use the various APIs and components to your advantage
Querying
Cypher
- Filtering & Pattern Matching
- Most convenient
Procedures
- More complex imperative code
- Extreme performance
- APOC is your friend
Importing Data
Bulk
- “neo4j-admin import” (1M rec/sec)
Transactional
- LOAD CSV
- Community adapters & procedures
- Roll-your-own
Product Edition
Community Edition
- Learning
- Simple projects
Enterprise Edition
- 24x7
- Large scale
- Secure

Part IV:
Deploying for 24x7
An Overview of Key Enterprise Features

What’s Possible
Real-time Package Routing
• Large postal service with over 500k employees
• Neo4j routes 7M+ packages daily at peak, with peaks of 5,000+ routing operations per second
Real-time promotion recommendations
• Record “Cyber Monday” sales
• About 35M daily transactions
• Each transaction is 3-22 hops
• Queries executed in 4ms or less
• Replaced IBM WebSphere Commerce
Real-time pricing engine
• 300M pricing operations per day
• 10x transaction throughput on half the hardware compared to Oracle
• Replaced Oracle database
• Presentation at http://graphconnect.com/gc2016-sf/

The Graph Foundation for the Enterprise
Neo4j 3.1: Security and Clustering Architecture
Build and deploy graph applications across an entire enterprise
• Compliance with internal and external enterprise Information Security needs
• Robust and flexible new clustering architecture for diverse operational scenarios and application needs
A foundation that enables mainstream enterprise solutions on-premises and in the cloud
ENTERPRISE GRAPH FOUNDATION
Enterprise Graph Applications
Operational, Analytic, and Transactional Uses
Security | Clustering | Operability

Neo4j Causal Clustering Architecture
Resilient, Modern, Fault-Tolerant. Guarantees Graph Safety.
Raft-based architecture
• Continuously available
• Consensus commits
• Third-generation cluster architecture
Cluster-aware stack
• Seamless integration among drivers, Bolt protocol and cluster
• Eliminates need for external load balancer
• Stateful, cluster-aware sessions with encrypted connections
Streamlined development
• Relieves developers from complex infrastructure concerns
• Faster and easier to develop distributed graph applications
ENTERPRISE EDITION
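The "consensus commits" bullet rests on Raft's majority rule: a write counts as committed once more than half of the core servers have acknowledged it. A minimal sketch of that arithmetic follows; the function names are hypothetical and this is not Neo4j's implementation.

```python
def quorum_size(core_count):
    """Smallest majority of the core servers."""
    return core_count // 2 + 1

def tolerated_failures(core_count):
    """Core servers that can be lost while a majority remains."""
    return (core_count - 1) // 2

def is_committed(acks, core_count):
    """A write commits once a majority of cores acknowledge it."""
    return acks >= quorum_size(core_count)

for cores in (3, 5, 7):
    print(cores, quorum_size(cores), tolerated_failures(cores))
```

This is why core clusters are usually sized at an odd number: going from 3 to 4 cores raises the quorum from 2 to 3 without tolerating any additional failure.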

How Causal Clustering Works
[Diagram: a Graph App uses the Driver over BOLT to send writes and reads to the Core Servers (a synced cluster of Read-Write instances); Replica Servers (Read Replicas) serve queries, views, reporting and analysis]
Built-in load balancing
• Spreads reads to core and replica servers
• Spreads writes across core servers
Causal consistency
• Always-consistent view of data at any scale
• Stronger than eventual consistency
• Supports varying app SLAs
• Best model for graph transactions
Large heterogeneous clusters
• 1000+ instance clusters
• No dependence on master avoids bottleneck
• Mix and match instance types: app servers, reporting servers, IoT devices…
ENTERPRISE EDITION

Causal Clustering Architecture Optimizes for Cost-Consistency at Query Time
[Diagram: consistency options arranged by QUERY COST]
REPLICA QUERIES: Read Any, Read Your Own Writes
CORE QUERIES: Read Any, Read Your Own Writes, Linearizable (Future 3.x)
QUORATE
The Holy Grail of Distributed Systems
ENTERPRISE EDITION

How Causally Consistent Reads Work
[Diagram: App Servers with Drivers talk to two Core Servers (Raft Replication) and a Replica Server (Async Replication)]
Steps: 1: Read Product Catalog | 2: Create Account | 3: Review Profile | 4: Create an order | 5: Review orders
1: Read any replica | 2: Write → [Tx 101] | 3: RYOW* [Tx 101] | 4: Write → [Tx 102] | 5: RYOW [Tx 102]
* RYOW = Read your own writes
How it Works:
• Application chooses a consistency level: “Read Any” vs “Read your own writes”
• Cluster chooses appropriate members; the default optimizes for scalability (i.e. read replica server for reads)
Causal Clustering Enables:
• Application-driven SLAs
• Optimizing for freshness vs. cost
• Tunability within an application, on an application & session basis
ENTERPRISE EDITION
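The five-step flow above can be imitated with a toy model. This is a sketch under loud assumptions: one core, one lagging read replica, and a transaction id used as the bookmark a session carries forward. The `Cluster` and `Server` classes are hypothetical. Here a stale replica is simply bypassed in favour of the core, which is a simplification chosen for the sketch, not a description of how a real cluster behaves.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.applied_tx = 100  # id of the last transaction applied locally
        self.data = {}

class Cluster:
    """Toy model: one core accepting writes, one lagging read replica."""
    def __init__(self):
        self.core = Server("core")
        self.replica = Server("replica")
        self.log = []  # (tx_id, key, value) not yet replicated

    def write(self, key, value):
        tx_id = self.core.applied_tx + 1
        self.core.data[key] = value
        self.core.applied_tx = tx_id
        self.log.append((tx_id, key, value))
        return tx_id  # acts as the bookmark for this write

    def replicate(self):
        """Asynchronous catch-up of the replica."""
        for tx_id, key, value in self.log:
            self.replica.data[key] = value
            self.replica.applied_tx = tx_id
        self.log.clear()

    def read(self, key, bookmark=None):
        """bookmark=None is 'Read Any'; a bookmark asks for 'Read your own writes'."""
        if bookmark is None or self.replica.applied_tx >= bookmark:
            server = self.replica
        else:
            server = self.core  # replica is stale for this session
        return server.name, server.data.get(key)

cluster = Cluster()
bookmark = cluster.write("account", "created")      # step 2: Write -> Tx 101
print(cluster.read("account"))                      # Read Any may miss the write
print(cluster.read("account", bookmark=bookmark))   # RYOW sees Tx 101
```

The point of the model is the trade-off on the slide: "Read Any" is cheap and scalable but may be stale, while a bookmarked read pays for freshness only when the session actually needs it.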

Consistency with Causal Clustering
Expected Consistency Behavior | Eventual Consistency | Neo4j Causal Consistency
Every single server is eventually updated | ✔ | ✔
View of related data is always consistent | | ✔
Users reading and re-reading data always see the same data (unless there have been intervening updates by others) | | ✔
Users writing and updating data always see the latest data (unless there have been intervening updates by others) | | ✔
Eventual consistency is not good enough for graphs
ENTERPRISE EDITION

Neo4j Security Foundation
Enterprise-Class Security and Control
Satisfy enterprise admin and database security requirements
• Flexible authentication options: ActiveDirectory/LDAP or Native users
• Role-based Authorization
• List and kill running queries
• Access controls for User-Defined Procedures: enables subgraph access control
• Query logging and Security event logging: passes through originating end user
• Extendable Auth plugin Architecture: Kerberos support coming soon!
Enables Sarbanes-Oxley, HIPAA, PCI-DSS, et al
PREDEFINED ROLES
Privileges | Reader | Publisher | Architect | Admin
Change own password | • | • | • | •
Read data | • | • | • | •
View own details | • | • | • | •
Terminate own query | • | • | • | •
Write/update/delete data | | • | • | •
Manage index/constraints | | | • | •
Terminate others’ queries | | | | •
ENTERPRISE EDITION
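The predefined-roles matrix above can be read as a simple ordering, where each role holds every privilege of the roles to its left. The sketch below encodes that reading. It assumes the bullets in the matrix are right-aligned toward Admin, and the names `ROLE_ORDER`, `MINIMUM_ROLE`, and `is_allowed` are hypothetical, not Neo4j APIs.

```python
ROLE_ORDER = ["reader", "publisher", "architect", "admin"]

# Lowest role that holds each privilege, as read off the roles matrix
# (assumption: every role to the right also holds it).
MINIMUM_ROLE = {
    "change own password": "reader",
    "read data": "reader",
    "view own details": "reader",
    "terminate own query": "reader",
    "write/update/delete data": "publisher",
    "manage index/constraints": "architect",
    "terminate others' queries": "admin",
}

def is_allowed(role, privilege):
    """True if `role` ranks at or above the minimum role for `privilege`."""
    return ROLE_ORDER.index(role) >= ROLE_ORDER.index(MINIMUM_ROLE[privilege])

print(is_allowed("publisher", "write/update/delete data"))
print(is_allowed("publisher", "manage index/constraints"))
```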

Neo4j Deployment Success Program
End-to-end Neo4j support throughout the project lifecycle
Dedicated Neo4j Expert
• Tailored expert advice to guide you all the way through to deployment
• Ensures you are successful with Neo4j
Deployment Success Engagement
• Design & Product Manager advice to avoid common mistakes
• Topology advice to get you to production
• Provide expert best practice guidance
Sustained Customer Success
• Proactive solution review throughout the project’s lifetime
• Continuous delivery of knowledge as we progress

Admin Query Monitoring
List all running queries with :qs (soon to be :queries)
List query string with parameters and transaction metadata
Users can only see and terminate their own queries
Terminate selected query
Admins can view and terminate all running queries across the cluster
Track elapsed time for queries

Coming Soon!
Neo4j 3.2
• Multi Data Center
• Even Faster Reads & Writes
• More Schema Constraints
• Add-on for Kerberos
• Query Monitoring Improvements
• And more…!

Some Perspective
We are
still here
Journeying
to here

A way of representing data
DATA DATA

Relational
Database
Good for:
• Well-understood data structures
that don’t change too frequently
• Known problems involving
discrete parts of the data, or
minimal connectivity
DATA

Graph
Database
Relational
Database
Good for:
• Dynamic systems: where the data
topology is difficult to predict
• Dynamic requirements:
that evolve with the business
• Problems where the relationships
in data contribute meaning & value
Good for:
• Well-understood data structures
that don’t change too frequently
• Known problems involving
discrete parts of the data, or
minimal connectivity

Access to Knowledge Base
Direct Line to Support

Graph is easy to learn, hard to master
• Common issues your team will hit
• Underestimate graph complexity
• Complaints of slow queries
• Undersized hardware, especially memory, but also CPU
• Ambitious number of future nodes
• Bad scaling topology / architecture assumptions
• Disappointing ‘Write’ speed
• Deep analytics mismatch
• You still need your 10,000 hours
• 8760 hours in a year, so depending on how long you sleep, 5-7 years.

World Class 24/7 Support
Neo4j Enterprise Support
Access to Knowledge Base & User Forums
Easy Access Support Portal and Lifecycle to track
and manage issues
Prioritized Fixes to product issues
Agreements designed to fit any project demand
NET PROMOTER
SCORE
92%
NPS Source: http://www.creandum.com/some-thoughts-on-why-creandum-is-leading-neo-technologys-new-20million-series-c/
ENTERPRISE-CLASS SUPPORT

Enterprise Scale for Global Internet Applications
• Causal Clustering works across data centers
• Cores can be spread out across DCs
• 1 leader, all followers, consensus commits
• Read-follows-writes still ensured
• Subclusters for speedier local activity
• Replica can hierarchically map to local replicas
• Cluster API-level control for developers
• Cloud delivery via Azure and AWS EC2
Reston Data Center
UK Data Center

Native Performance Improvements
• Label Indexes added to speed inserts, updates and deletes
• Compound indexes to improve operational speeds
• Cypher’s depth query in “DISTINCT” function has been dramatically
improved by eliminating repetitious traversals through deep levels.
• Common Cypher queries can be compiled to improve performance
• Improved performance of Neo4j browser with new JavaScript
framework

Production Governance Improvements
Neo4j is “Enterprise-Obedient”
• Node Keys are now available as schema constraint
• now specify keys for any label
• This helps assure the integrity of your graph by enforcing existence and
guaranteeing uniqueness
• especially useful for applications exchanging or importing data from across
multiple data sources.
• Kerberos encrypted authentication module add-on
• Supports 3-tier integration with client, directory and database
• CAPI-Flash hardware from IBM Power8 add-on
• Role-based control of queries in Query Monitor
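What a Node Key guarantees (every key property exists, and the combination is unique) can be illustrated with a small client-side check. This is only a sketch of the rule; in Neo4j the constraint is enforced by the database as part of the schema. The function name, the sample data, and the assumption that a key may span several properties are all illustrative.

```python
def check_node_key(nodes, key_props):
    """Return violations of existence and uniqueness for key_props."""
    seen = {}
    violations = []
    for i, props in enumerate(nodes):
        missing = [p for p in key_props if props.get(p) is None]
        if missing:
            violations.append((i, "missing " + ", ".join(missing)))
            continue
        key = tuple(props[p] for p in key_props)
        if key in seen:
            violations.append((i, "duplicate of node %d" % seen[key]))
        else:
            seen[key] = i
    return violations

# Hypothetical records merged from two data sources.
people = [
    {"first": "Dan", "last": "Lee"},
    {"first": "Ann"},
    {"first": "Dan", "last": "Lee"},
]
print(check_node_key(people, ["first", "last"]))
```

Checks like this are what an importing application would otherwise have to run itself when merging data from multiple sources.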

Software for “Big Data”
HDFS/MapReduce/Spark (Storage & Aggregation)
Streaming (Filtering & Aggregation)
Machine Learning
Graph Computation

Write Scale
One million writes per second!
Import 1.8 billion highly connected relationships

Neo4j: Optimized for Performance
Cost-based optimizer
Optimizes Cypher queries to traverse the
graph in the most efficient way
Computed statistics
Exact statistics enable efficient costing,
and instant query responses for counts
and groupings
Binary wire protocol
High-performance Bolt protocol used by
official Neo4j language drivers
Native graph API
Enables low-level access to the graph, for
hand-tuned levels of performance
Neo4j Advantage – Developer productivity

Neo4j Scalability
Dynamic pointer compression
Unlimited-sized graphs with no
performance compromise
Index partitioning
Auto-partitioning of indexes into
2GB partitions
Causal clustering architecture
Enables unlimited read scaling
with ACID writes and a choice
of consistency levels
Efficient processing
Native graph processing and storage
often requires 10x less hardware
Efficient storage
One-tenth the disk and memory
requirements of certain alternatives
Neo4j Advantage – Massive Scalability
