Dr. Jim Webber presented the closing keynote at a conference. He recapped Neo4j's recent releases, including version 3.0 which delivered new graph capabilities. He discussed future hardware trends and how native graph technology provides performance advantages. He also showed how Neo4j can handle workloads at massive scales of over 1 trillion relationships and 1 million writes per second on a single machine. Finally, he outlined Neo4j's new causal clustering architecture in version 3.1 which provides security, scalability, and fault tolerance for enterprise graph applications.
8. Graph Insanity**
Dr. Jim Webber
Chief Scientist, Neo4j
** = “insanity” in this context refers to scientifically responsible jubilation
9. Overview
• A brief recap
• Future hardware trends
• Performance advantages of native graph
technology
• Looking to the future
• Drinks
10.
11. Neo4j 3.0 Recap APRIL 2016 RELEASE
Delivering New Graph Capabilities
Developers
Develop applications
faster and easier
Architects
Design bigger and
faster applications
Administrators
Deploy Neo4j
anywhere easily
Neo4j 3.0 enables and accelerates large-scale graph initiatives
Giant graphs,
fast performance
Easy full-stack
development
Cloud, container
and on-premise
12. Introducing Neo4j 3.1
New Security and Clustering Architecture
Build and deploy graph applications across
an entire enterprise
• Compliance with internal and external
enterprise Information Security needs
• Robust and flexible new clustering
architecture for diverse operational
scenarios and application needs
A foundation that enables mainstream
enterprise solutions on-premises and
in the cloud
ENTERPRISE GRAPH FOUNDATION
Operational, Analytic, and Transactional Uses
Security Clustering Operability
Enterprise
Graph Applications
The Graph Foundation for the Enterprise
20. Neo4j IBM POWER8 CAPI Flash
• Enables ultra-large in-memory graphs
• High performance, ultra-high
throughput graph processing on
56TB of near memory
• IBM CAPI Flash is a specialized IO co-processor that provides IO gains similar to those GPUs provide for graphics
Significant improvements in
concurrency and scale
21. Pushing Neo4j to the Limits
• Asymptotic benchmarking effort
• “What can Neo4j do when pushed to its limits?”
• And the results are pretty amazing
This is our CTO Johan, please talk to
him. He’s totally a people person.
22. Traversals
• Realistic retail dataset from Amazon
• Commodity dual Xeon processor server
• Social recommendation (Java procedure) equivalent to:
MATCH (you)-[:BOUGHT]->(something)<-[:BOUGHT]-(other)-[:BOUGHT]->(reco)
WHERE id(you)={id}
RETURN reco
Threads   Hops/second
1         3-4M
10        17-29M
20        34-50M
30        36-60M
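The recommendation pattern above can be sketched over a toy in-memory adjacency list in plain Python (hypothetical data; the real benchmark used a compiled Java procedure walking native relationship records, where the reverse hop follows incoming pointers instead of scanning):

```python
# Toy adjacency-list model of the BOUGHT graph.
bought = {
    "you":   ["book"],
    "alice": ["book", "lamp"],
    "bob":   ["book", "mug"],
}

def recommend(user):
    """(you)-[:BOUGHT]->(x)<-[:BOUGHT]-(other)-[:BOUGHT]->(reco)"""
    mine = set(bought[user])
    recos = set()
    for item in mine:
        # In a native graph store this reverse hop is pointer chasing;
        # the toy version scans all users instead.
        for other, basket in bought.items():
            if other != user and item in basket:
                recos.update(b for b in basket if b not in mine)
    return recos

print(sorted(recommend("you")))  # ['lamp', 'mug']
```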
23. Trillions!
@profbriancox
Read Scale
• Can comfortably handle 1 trillion
relationships on a single server
• 24x2TB SSDs, 33TB size on disk.
• Compiled Cypher query
• Random reads
• Sustains over 100k user
transactions/sec
• Even with 99.8% page faults, because the machine has only 512 GB of RAM
24. Write Scale
• Import highly connected
Friendster dataset
• 1.8 billion relationships
takes around 20 minutes
• That is 1M writes/second!
Millions and
billions!
@profbriancox
27. Graph-native advantages
• Prioritize graph workloads
• Adapt at any point in the stack for graphs
• Disks, RAM, NVRAM, Coprocessors, RDMA, drivers,
query language, consensus protocol…
• Non-native approaches will adapt for their
primary use case
• Columns, documents
28.
29. Comparison on a ~10M node, ~100M relationship graph
Non-native graph DB: 6 machines, each with 48 VCPUs, 256 GB disk and 256 GB of RAM
Neo4j: single thread

Workload                          Non-native graph DB   Neo4j
Count nodes                       201s                  < 1ms
Count outgoing rels               202s                  < 1ms
Count outgoing rels at depth 2    276s                  23s
Count outgoing rels at depth 3    511s                  423s*
Group nodes by property val       212s                  8s
Group rels by type                198s                  54s
Count depth 2 knows-likes         324s                  149s*
Page Rank                         2571s                 27s*
30. • Consider the possum
• What’s that Emil? If I bring another animal into this venue, this keynote is
over?
• (emil’s head saying those words?)
31. Raft-based architecture
• Continuously available
• Consensus commits
• Third-generation cluster architecture
Cluster-aware stack
• Seamless integration among drivers,
Bolt protocol and cluster
• Eliminates need for external load balancer
• Cluster-aware sessions with encrypted
connections
Streamlined development
• Relieves developers from complex infrastructure concerns
• Faster and easier to develop distributed graph applications
Neo4j Causal Clustering Architecture
Fault-Tolerant and Scalable.
ENTERPRISE EDITION
32. How Causal Clustering Works
[Topology diagram: a Graph App talks to the cluster through a Bolt driver. Writes go to a synced cluster of read-write Core Servers; reads are served by Read Replica servers, which also feed reporting and analysis.]
Built-in load balancing
• Spreads reads to core and replica servers
• Directs writes to core servers
Causal consistency
• Always-consistent view of data at any scale
• Stronger than eventual consistency
• Best model for graphs:
• Reliability >> Availability
Large heterogeneous clusters
• Non-blocking & asynchronous
protocols
• Mix and match instance types
App servers, reporting servers,
IoT devices…
ENTERPRISE EDITION
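The built-in load balancing above can be sketched roughly as follows (hypothetical server names; in practice the cluster-aware Bolt drivers do this routing for you):

```python
import itertools

# Hypothetical topology: three core servers, two read replicas.
CORES = ["core1", "core2", "core3"]      # accept writes (and reads)
REPLICAS = ["replica1", "replica2"]      # read-only

_write_pool = itertools.cycle(CORES)
_read_pool = itertools.cycle(CORES + REPLICAS)  # spread reads over everything

def route(query_kind):
    """Direct writes to core servers, spread reads across cores and replicas."""
    return next(_write_pool) if query_kind == "write" else next(_read_pool)

print(route("write"))  # core1
print(route("read"))   # core1
```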
33. Causal Clustering Architecture Optimizes for Cost-Consistency at Query Time
[Diagram: consistency levels ordered by increasing query cost. Replica queries offer "Read Any" and "Read Your Own Writes"; core queries offer the same, plus quorate "Linearizable" reads (future 3.x) – "the holy grail of distributed systems".]
ENTERPRISE EDITION
34. Causal Clustering Topology Awareness
• Today cluster round-robin load balances based on
consistency level
• Defaults to a core instance for writes, a read-replica for reads
• Tomorrow cluster will load balance by:
• Network topology
• Geography
• Bandwidth
• Server load
• Server capacity
• User preference
• Etc.
35. Efficient Fan-Out for Very Large Clusters
• Replica-to-replica catchup
• Chains, trees…
• Exploit DC locality
• Retain causal consistency
• Never see earlier versions
of the data
• Even over WAN latencies
38. Neo4j 3.1 Creates a New Foundation
Enables Graphs Across the Enterprise
The graph database has gone mainstream
Has become a core enterprise technology spanning
a wide variety of business domains
Neo4j is the leading graph database
Extensive track record of graph leadership and
innovation
Neo4j 3.1 is the graph foundation for the enterprise
Provides the security, scalability, integration,
administration and operability required
to support enterprise graph applications
World-Class Research and Development
Reactions:
First kindly
Then awkward
Then distaste
My friend and colleague Max Sumrall has an ironic Trump hat and gets more respect than me!
Tough days.
But my Silicon Valley CEO told me to stop worrying about piffling things like Brexit, collapsing currency, social division, austerity, inequality, and a new prime minister that makes Thatcher seem cuddly
Focus on the important stuff, which all happens in the Silicon Valley tech scene.
OK boss. Game on.
Of course I do have to be a little careful, [since with $35M in his pocket Emil could really fund the engineering department]
Emil could make us build a box
Bulldog
What neo4j engineers and neo4j contributors have done to neo4j since we last met at the neo4j conference to talk with neo4j people about neo4j things
You folks might have missed this – we announced it in London at GraphConnect “pre-Brexit edition”
But there’s a lot to like in the 3.x series.
For Architects, we bring giant graphs with no practical limits and fast performance.
For Developers, we bring easy full-stack development (drivers).
And for Administrators, 3.0 provides better support for cloud, container and on premise deployments.
Establishes Neo4j as the enterprise standard graph technology
There’s a lot of useful stuff in the 3.x releases. Clustering, security, IO, modelling, performance.
But there’s relentless innovation in the computing industry. And as a native graph database, we’re well positioned to benefit from that.
Let’s take a look at some near-future hardware trends.
Consider the elephant.
Legend has it, its memory is so robust it never forgets. And something very interesting is happening with computer memory
For the last few years something profound happened with memory and it passed most of us by
Circa 2012 Sandy Bridge: cache sub-system is considered the "source of truth" for mainstream systems
Cache miss costs 500 instructions per socket – optimise for cache!
Neo4j responds:
Compact representations for native graph data storage – yields cache friendliness!
In 3.0 with the enterprise format we use relative addressing which means generally smaller pointers and more efficient use of cache space.
We can do this because we’re graph native – we can optimise all the way down the stack for graph traversals.
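As an illustration of why relative addressing helps (a generic varint-style sketch, not Neo4j's actual record format): neighbouring records produce small offsets, which encode in far fewer bytes than absolute ids, so more pointers fit per cache line.

```python
def encoded_size(value):
    """Bytes needed for a varint-style encoding: 7 payload bits per byte."""
    value = abs(value)
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size

# Hypothetical record ids: a record and a nearby neighbour.
record_id, neighbour_id = 1_000_000_000, 1_000_000_042

absolute = encoded_size(neighbour_id)              # address the record directly
relative = encoded_size(neighbour_id - record_id)  # address it as an offset

print(absolute, relative)  # 5 1
```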
Super important macro-level trend:
Jure Leskovec from Stanford: analysis from GRADES 2016
Over the past 10 years, there are indications that in data analytics the size of RAM has been growing by 50% every year, while the size of the data has grown by only 20%.
“Maybe your data increases faster. Maybe you think data is bigger and increasing faster. But facts should trump opinions” – Szilard Pafka
Let’s look at the memory.
100G commodity, soon to be high-end laptop territory.
10T off the shelf (Jure’s group bought a 12TB machine)
1P not yet available in a commodity single machine, but 50% per year growth means it's not far away (~7 years)
Another very interesting trend: Large disks are coming.
Next year Seagate will release a 60TB SSD.
3.0 effectively removes the upper limit for how many nodes, relationships and properties you can have in a single graph – quadrillions of items
Actually pushes past the ext4 boundaries!
Not for everyone, but cost effective for valuable graph production use cases.
We are well placed to benefit from this – neo4j loves SSDs; its on-disk format is optimised for rapid, low-overhead traversals
BECAUSE NATIVE we can perform optimisations for graphs ALL THE WAY DOWN TO HARDWARE
Large ram and large disks are Neo4j’s sweet spot – we’re optimized for index-free adjacency
Native databases support “index free adjacency” c.f. Neubauer and Rodriguez 2010 Graph Traversal Pattern. Few index lookups; most hops cost O(1) – in Neo4j that’s cheap pointer chasing through memory or on-disk.
Index free adjacency isn’t some academic curiosity, it’s important for performance. Links need to be first class citizens in the database, natively.
And pointer chasing in a memory space is mechanically cheap – way cheaper than hash lookups over a network, for example.
Some non-native databases use global indexes – one index per direction, even. This means every hop is O(log n). For graphs with millions of elements, that's 6x more IO; for graphs with billions (typical), that's 9x more IO.
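The IO arithmetic behind those factors can be sketched as follows (assuming, to match the 6x/9x figures, that each index lookup touches roughly log10(n) pages; real index fan-outs vary):

```python
import math

def hops_io(depth):
    """Index-free adjacency: each hop is one pointer dereference, O(1)."""
    return depth

def index_hops_io(depth, n):
    """Global index per hop: each hop costs ~log(n) page touches."""
    return depth * math.ceil(math.log10(n))

# One 3-hop traversal:
print(hops_io(3))               # 3
print(index_hops_io(3, 10**6))  # 18 -> 6x more IO on a millions-scale graph
print(index_hops_io(3, 10**9))  # 27 -> 9x more IO on a billions-scale graph
```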
(green river colorado river confluence)
NVRAM is coming: promising near-RAM latency (4x slower, but 10x the size).
Massive paradigm shift for database folks
Neo4j has an in-memory representation of graph and on-disk representation
These can converge with NVRAM
Result: less time spent translating formats, more time spent doing useful graph workloads
RAM is getting bigger; NVRAM will follow suit
Most graphs today can be accommodated in large RAM (including yours)
Medium term: practically all graphs will be hosted in durable RAM
And finally we’re also seeing the emergence of specialised coprocessors on the bus like IBM’s P8 and soon P9.
Right now that means 56TB of fast SSD and lots of RAM too (16TB)
But this isn’t just some “Big Iron” play.
CAPI Flash is IBM's innovation, doing for IO what GPUs do for graphics workloads
And since we’re native all the way down the stack we can and have built a plugin for Neo4j that takes advantage of the hardware acceleration.
IBM engineers measured 2x better performance with hardware acceleration.
We’re a very unconventional database in many respects.
We’re graph-native, transactional, and we can even be embedded.
Because of that, there’s a lot of misleading FUD.
Our CTO Johan Svensson is a very chilled out Swedish guy, unrecognisable from his Viking ancestors. He’s slow to anger.
But he did want to document that our unconventional design has real-world benefits.
So he ran a benchmarking project to see what happens to the database when you accelerate along various axes of scale
Neo4j best in class on traversal speed and scaling reads.
User transaction means real units of work that are meaningful and valuable to the application.
Lots of traversals involved.
Not some artificial to-first-byte delivery benchmark.
Random reads are the hardest for a database to optimise so this is a truly challenging benchmark.
But don’t take my word for it, please give it up for Professor Brian Cox!
You can get so much work done so quickly with numbers like those.
Consider Tortoise and his nemesis the insolent hare.
Well, we’ve seen neo4j’s no slow-coach. But how does it compare to non-native graph tech?
I saw this on the internet and thought it looked like a neat laptop-sized exercise.
We had the Dbpedia dataset to hand which is comparable in size (slightly larger but from the real world, 11M nodes 116M links)
The original experiment ran on 288 cores with 1.5TB RAM. I ran on one core with 8GB given to neo4j (half for heap, half for cache).
*Michael Hunger ran on one core with 128GB given over to the database when I needed more heap space than my laptop could give for the larger queries - I join the many thousands of you that have been on the benefitting end of Michael’s amazing tech support!
That itself is a remarkable illustration of how efficient neo4j can be. Sure, it's macho to run 6 large machines, but it's more sensible not to.
*** Describe what’s going on *** then:
This is not really a fair comparison.
The work undertaken by the non-native store is far higher than the work undertaken by neo4j.
But that’s the whole point!
Because neo4j can optimise for graphs all the way down the stack, we can and have implemented all kinds of shortcuts that databases optimised for tables or columns or keys and values or documents can’t do.
And in our forthcoming compiled runtime, you'll see results an order of magnitude better still for aggregation.
Consider the data centre of the future.
No possums.
Non-blocking consensus, “Masterless” system
Raft at the core
Async replication to the edges
But with read consistency!
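Raft's fault tolerance comes from majority quorums among the core servers; a quick sketch of the arithmetic:

```python
def majority(cluster_size):
    """A Raft cluster commits once a majority of core servers acknowledge."""
    return cluster_size // 2 + 1

def tolerated_failures(cluster_size):
    """How many core servers can fail while the cluster stays available."""
    return cluster_size - majority(cluster_size)

for n in (3, 5, 7):
    print(f"{n} cores: quorum {majority(n)}, tolerates {tolerated_failures(n)} failures")
```

This is why core clusters come in odd sizes: adding a fourth server raises the quorum without tolerating any extra failures.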
Leading-edge modern architecture. Doug Terry, inventor of eventual consistency at Xerox PARC in 1994.
Further reading: https://littlemindslargeclouds.wordpress.com/2014/05/27/implementing-causal-consistency/
RYOW is a major step forward for developers
With eventual consistency you have to figure this out in your app.
Now you always see your changes to the graph with no special work.
Makes reasoning about your app as easy as if you just had one computer, even when you have hundreds.
Trade off: latency
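Read-your-own-writes can be modelled with a bookmark the client carries between requests: the replica serves a read only once it has caught up to the client's last write. A toy sketch (hypothetical class names, not the driver API):

```python
class Core:
    """Accepts writes; hands the client back a bookmark (transaction id)."""
    def __init__(self):
        self.last_tx = 0
        self.data = {}

    def write(self, key, value):
        self.last_tx += 1
        self.data[key] = value
        return self.last_tx  # the bookmark

class Replica:
    """Asynchronously replays the core's transactions."""
    def __init__(self):
        self.applied_tx = 0
        self.data = {}

    def catch_up(self, core):
        self.data = dict(core.data)
        self.applied_tx = core.last_tx

def causal_read(replica, key, bookmark):
    # Refuse to serve a read from a replica that hasn't seen our writes yet.
    if replica.applied_tx < bookmark:
        raise RuntimeError("replica behind bookmark; retry or reroute")
    return replica.data[key]

core, replica = Core(), Replica()
bookmark = core.write("name", "Jim")
replica.catch_up(core)
print(causal_read(replica, "name", bookmark))  # Jim
```

The latency trade-off is visible here: a lagging replica makes the client wait (or reroute) instead of returning stale data.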
Oh, just one more thing…
Cypher is by far the leading graph query language, and growing with the openCypher consortium
As Cypher-aware humans you can see the opportunities for parallelism here.
Two patterns can be evaluated in parallel.
Pipeline parallelism is possible from pattern evaluation via WITH into MERGE.
But a computer can do better, much better.
Cypher represented as a tree of operators
Subtrees can be evaluated at runtime, by cost, for suitability for dispatch to different threads
Run threads in parallel, dynamically check whether more subtrees can be cost-effectively extracted, return results and aggregate
Based on:
Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age
Dynamic subtree allocation, locality aware – thread affinity
Partnering with several universities, hardware orgs to develop deeply scalable tech for mixed OLTP and analytic workloads.
You choose whether to:
All resources to a single query (analytics)
Single thread per query (classic OLTP)
Some queries get > 1 thread (optimise for difficult OLTP queries)
Target: next-gen Cypher (compiled) runtime
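A toy sketch of the idea: two independent operator subtrees evaluated on separate threads, with their results aggregated afterwards (standard-library threads over toy rows, not the actual Cypher runtime):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy row stream; each "subtree" is an independent aggregation over it.
rows = list(range(1_000))

def subtree_even(chunk):
    return sum(1 for r in chunk if r % 2 == 0)

def subtree_big(chunk):
    return sum(1 for r in chunk if r > 500)

# Dispatch the independent subtrees to separate worker threads.
with ThreadPoolExecutor(max_workers=2) as pool:
    evens = pool.submit(subtree_even, rows)
    bigs = pool.submit(subtree_big, rows)
    print(evens.result(), bigs.result())  # 500 499
```

The morsel-driven scheme in the paper goes further: work is carved into small chunks so idle threads can steal from busy ones, with NUMA-local data preferred.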
Graphs mainstream: emergence of enterprise standard database
Leading graph database: mature and with a wonderful community (that’s YOU folks!)
But the foundation for all of this is the neo4j research and development team’s efforts
World-class research and development
I’m almost done now, but…
For those of you old hands expecting me to talk about triangles, I bet you’ve been disappointed this far.
So let me leave you with a little something they “now teach at business school”
Let’s ride this graph unicorn!
Onwards to the afterparty disConnect!