Graph databases try to make it easy for developers to leverage huge amounts of connected information for everything from routing to recommendations. Doing that poses a number of challenges on the implementation side. In this talk we want to look at the different storage, query and consistency approaches that are used behind the scenes. We’ll check out current and future solutions used in Neo4j and other graph databases for addressing global consistency, query and storage optimization, indexing and more, and see which papers and research database developers take inspiration from.
2. @mesirii
11. @mesirii
A short history
There was a CMS/DMS in Sweden
Which had two big issues
Language independent Keywords
Complex Access Control for SaaS
RDBMS failed
In Memory Graph was cool
Dot Com Bubble burst
A new star was born
29. @mesirii
Implementation Designs
● Adjacency List
● Adjacency Matrix (compressed)
● Sparse Matrices
● Column Store
● HexaStore
● Hash Index
● Document Store
● Object Storage
30. @mesirii
Adjacency List
● pre-materialize connections
● store "neighbours" with each node
● direct memory pointer
● cheap O(1) lookup, O(n) scan
● random memory access !
● Properties on Relationships
● Grouped by Type & Direction
● Neo4j
[Figure: Node records pointing to their Rel (relationship) records]
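The adjacency-list idea above can be sketched in a few lines of Python. This is only an illustration, not Neo4j's storage format: the class names `Node` and `Rel`, the helpers `connect` and `neighbours`, and the sample data are all made up for this example. Each node keeps its own relationships, grouped by type and direction, so finding a group is a dictionary lookup and expanding it is a scan over that group only.

```python
from collections import defaultdict


class Node:
    """A node that stores its own relationships, grouped by (type, direction)."""

    def __init__(self, name):
        self.name = name
        # (rel_type, direction) -> list of Rel objects touching this node
        self.rels = defaultdict(list)


class Rel:
    """A relationship with direct references to both end nodes, plus properties."""

    def __init__(self, rel_type, start, end, **props):
        self.type = rel_type
        self.start = start
        self.end = end
        self.props = props


def connect(start, rel_type, end, **props):
    """Pre-materialize the connection on both end nodes."""
    rel = Rel(rel_type, start, end, **props)
    start.rels[(rel_type, "OUT")].append(rel)
    end.rels[(rel_type, "IN")].append(rel)
    return rel


def neighbours(node, rel_type, direction):
    """O(1) to find the group, then a scan over only that group's relationships."""
    for rel in node.rels.get((rel_type, direction), []):
        yield rel.end if direction == "OUT" else rel.start


# Hypothetical sample data
alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
connect(alice, "KNOWS", bob, since=2015)
connect(alice, "KNOWS", carol, since=2018)
connect(bob, "WORKS_WITH", alice)

knows = [n.name for n in neighbours(alice, "KNOWS", "OUT")]
colleagues = [n.name for n in neighbours(alice, "WORKS_WITH", "IN")]
```

Because each `Rel` holds object references rather than keys, following a relationship needs no index lookup, which is the "direct memory pointer" point from the slide.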
31. @mesirii
Adjacency Matrix
● matrix with nodes as
○ row and column
○ cell is relationship
○ can contain weight
○ 0 … no relationship
● matrix operations as graph operations
● size is a problem (N^2)
● need to compress
● e.g. bitsets (SparkSee)
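A rough sketch of the bitset idea, assuming an unweighted directed graph with nodes numbered from 0. Each matrix row is stored as one Python integer used as a bitset, and a one-hop expansion is the OR of the rows of all frontier nodes. The function names and the sample edges are invented for this example and do not reflect how SparkSee is implemented.

```python
def build_rows(n, edges):
    """Return one bitset (a Python int) per node; bit j of row i means i -> j."""
    rows = [0] * n
    for src, dst in edges:
        rows[src] |= 1 << dst
    return rows


def expand(rows, frontier):
    """One hop: OR together the rows of every node set in the frontier bitset."""
    result = 0
    node = 0
    while frontier:
        if frontier & 1:
            result |= rows[node]
        frontier >>= 1
        node += 1
    return result


def members(bitset):
    """List the node ids whose bit is set."""
    return [i for i in range(bitset.bit_length()) if (bitset >> i) & 1]


# Hypothetical sample graph
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
rows = build_rows(5, edges)

one_hop = expand(rows, 1 << 0)   # neighbours of node 0
two_hop = expand(rows, one_hop)  # neighbours of those neighbours
```

Here `members(one_hop)` gives the direct neighbours of node 0 and `members(two_hop)` the nodes two steps away. The uncompressed rows still cost N bits each, which is the N^2 size problem the slide mentions.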
32. @mesirii
Sparse Matrix
● linear algebra
● GraphBLAS
○ research & development at Texas A&M University
● efficient sparse matrices on CPU & GPU
● matrix operations (and filters) as graph operations
● RedisGraph
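To make "matrix operations as graph operations" concrete, here is a minimal plain-Python sketch. It does not use the GraphBLAS API. It stores the adjacency matrix in compressed sparse row (CSR) form and runs a breadth-first search in which every step is conceptually a vector-matrix product, with the "not yet visited" check playing the role of the filter (mask). All names and the sample edges are assumptions made for illustration.

```python
def to_csr(n, edges):
    """Build CSR arrays (row_ptr, col_idx) for an n x n adjacency matrix."""
    counts = [0] * n
    for src, _ in edges:
        counts[src] += 1
    row_ptr = [0] * (n + 1)
    for i in range(n):
        row_ptr[i + 1] = row_ptr[i] + counts[i]
    col_idx = [0] * len(edges)
    next_slot = row_ptr[:-1].copy()  # next free position in each row
    for src, dst in edges:
        col_idx[next_slot[src]] = dst
        next_slot[src] += 1
    return row_ptr, col_idx


def bfs_levels(n, row_ptr, col_idx, start):
    """BFS where each step multiplies the frontier vector by the matrix."""
    level = [-1] * n
    level[start] = 0
    frontier = [start]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            # Only the stored (non-zero) entries of row u are touched
            for k in range(row_ptr[u], row_ptr[u + 1]):
                v = col_idx[k]
                if level[v] == -1:  # the mask: skip already visited nodes
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level


# Hypothetical sample graph
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
row_ptr, col_idx = to_csr(5, edges)
levels = bfs_levels(5, row_ptr, col_idx, 0)
```

Storage is proportional to the number of relationships rather than N^2, which is what makes the matrix view practical for sparse graphs.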
33. @mesirii
Column Store
● sort by "natural ids"
● all properties and relationships as very wide columns
● need fixed schema
34. @mesirii
Hash / Hybrid Index
● Nodes and Relationships are documents
● Additional HashIndex(Source, Target) -> Linked List of Rels
● Used in ArangoDB
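A toy version of the hybrid approach, assuming documents are plain dictionaries and a Python list stands in for the linked list of relationships. The `_from` and `_to` field names, the helper functions and the sample data are illustrative choices for this sketch, not ArangoDB's internal structures.

```python
from collections import defaultdict

nodes = {}                      # key -> node document
rels = {}                       # key -> relationship document
edge_index = defaultdict(list)  # (source, target) -> list of relationship keys


def add_node(key, **doc):
    nodes[key] = doc


def add_rel(key, source, target, **doc):
    """Store the relationship as a document and register it in the hash index."""
    rels[key] = {"_from": source, "_to": target, **doc}
    edge_index[(source, target)].append(key)


def rels_between(source, target):
    """One hash lookup, then a walk over the relationships for that pair."""
    return [rels[k] for k in edge_index.get((source, target), [])]


# Hypothetical sample data
add_node("alice", name="Alice")
add_node("bob", name="Bob")
add_rel("r1", "alice", "bob", type="KNOWS")
add_rel("r2", "alice", "bob", type="WORKS_WITH")

found = [r["type"] for r in rels_between("alice", "bob")]
```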
36. @mesirii
Hexa-Store
● Used by TripleStores
● Backed by Key-Value Store
● Store all combinations of triples
○ S-P-O
○ S-O-P
○ P-S-O
○ P-O-S
○ O-P-S
○ O-S-P
● And use prefix search for lookups/expand
● JS - GunDB, DGraph, Cayley
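The six orderings and the prefix search can be sketched as follows. A sorted Python list stands in for the ordered key-value store, and the class name `HexaStore`, its methods and the sample triples are assumptions made for this example rather than any product's API.

```python
from bisect import bisect_left
from itertools import permutations

# All six orderings of subject, predicate, object: spo, sop, pso, pos, osp, ops
ORDERS = ["".join(p) for p in permutations("spo")]


class HexaStore:
    def __init__(self):
        self.keys = []  # sorted list standing in for an ordered key-value store

    def add(self, s, p, o):
        """Insert the triple once under each of the six orderings."""
        triple = {"s": s, "p": p, "o": o}
        for order in ORDERS:
            key = (order,) + tuple(triple[c] for c in order)
            i = bisect_left(self.keys, key)
            if i == len(self.keys) or self.keys[i] != key:
                self.keys.insert(i, key)

    def scan(self, order, *prefix):
        """Prefix search: yield triples whose key starts with (order, *prefix)."""
        start = (order,) + prefix
        i = bisect_left(self.keys, start)
        while i < len(self.keys) and self.keys[i][:len(start)] == start:
            yield dict(zip(order, self.keys[i][1:]))
            i += 1


# Hypothetical sample data
store = HexaStore()
store.add("alice", "knows", "bob")
store.add("alice", "knows", "carol")
store.add("bob", "knows", "carol")
store.add("alice", "likes", "graphs")

who_knows_carol = [t["s"] for t in store.scan("pos", "knows", "carol")]
alice_out = [(t["p"], t["o"]) for t in store.scan("spo", "alice")]
```

Whatever combination of subject, predicate and object is known, one of the six orderings has it as a key prefix, so every lookup or expand becomes a single range scan at the cost of storing each triple six times.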
38. @mesirii
Native Database
• Each database is native to its core model
• e.g. relational, column
• so optimized for that model in storage and operations
• And non-native to other models that you put on top
• which causes lack of safety, performance, expressiveness
47. @mesirii
File System
Record based files
Fixed Size Records
ID = Record ID
Offset = ID * Block Size
Pointer = Memory + Offset
Nodes
Relationships
Properties
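The offset arithmetic above can be tried out with a small sketch. The record layout used here (an in-use flag plus two 4-byte ids) is purely hypothetical and is not the actual Neo4j node record. The point is only that with fixed-size records, the id alone determines where a record lives, so no index is needed to find it.

```python
import struct

# Hypothetical node record: in-use flag, first relationship id, first property id
NODE_RECORD = struct.Struct(">BII")
RECORD_SIZE = NODE_RECORD.size  # 1 + 4 + 4 = 9 bytes


def offset_of(record_id):
    """Offset = ID * record size."""
    return record_id * RECORD_SIZE


def write_node(store, record_id, in_use, first_rel, first_prop):
    NODE_RECORD.pack_into(store, offset_of(record_id), in_use, first_rel, first_prop)


def read_node(store, record_id):
    return NODE_RECORD.unpack_from(store, offset_of(record_id))


# A bytearray stands in for the store file or a memory-mapped page
store = bytearray(RECORD_SIZE * 100)
write_node(store, 42, 1, 7, 3)
record = read_node(store, 42)
```

The stored relationship and property ids are themselves record ids in their own stores, so following them is again a multiplication and a read.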
48. @mesirii
Page Cache
● OS Memory Mapping insufficient
● Which pages are important when (LRU-K)
● Transactional Guarantees / Isolation
● Concurrency
● Use for other types (indexes)
○ Generational Datastructures
● Seed cache
50. @mesirii
DB Engine
● Low Level Kernel SPI for common operations
● Only works with primitives / arrays
● Off Heap (tx, index, metadata, next: query state)
● Record Access
● Transaction Layer (Isolation)
● reusable Cursors (Prefetching)
● soon: Store Abstraction
53. @mesirii
Neo4j Type System (Cypher, Drivers, Browser)
● Null: missing or unknown value
● Boolean: true or false
● Integer: 64-bit signed integer
● Float: double precision floating point
● Spatial: different 2d and 3d coordinate systems
● Bytes: raw octet stream
● String: Unicode text
● List: ordered collection
● Map: keyed collection
● Temporal: (local)date(time), duration
● Structure: Node, Relationship, Path
54. @mesirii
Why the hell - 4j?
Good
Founders were Java Developers
Easier to hire
Java has memory management (GC)
Java NIO
Portability
JVM got way faster/better over the years
Extensibility in all JVM Languages
Can utilize GraalVM
Bad
Little access to low level system capabilities (Cache, Memory, Network)
Need to use Unsafe
Garbage Collection (unpredictable pauses)
No value types
Scala runtime behavior
C-Libraries are harder to integrate
59. @mesirii
SQL
SELECT distinct c.CompanyName
FROM customers AS c
JOIN orders AS o
ON (c.CustomerID = o.CustomerID)
JOIN order_details AS od
ON (o.OrderID = od.OrderID)
JOIN products AS p
ON (od.ProductID = p.ProductID)
WHERE p.ProductName = 'Chocolat'
65. @mesirii
A (real) Question
Find all Actors and Movies they acted in
Whose name contains the letter "a"
Aggregate the frequency and movie titles
Filter by who acted in more than 5 movies
Return their name, birth year and movie titles
Ordered by number of movies
Limited to top 10
66. @mesirii
A (real) Cypher Query
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WHERE a.name CONTAINS "a"
WITH a,
count(m) AS cnt,
collect(m) AS movies
WHERE cnt > 5
RETURN a.name, a.born,
[m IN movies | m.title] as titles
ORDER BY size(movies) DESC
LIMIT 10
68. @mesirii
Query Planning
● cost based planner
○ e.g. index selectivity, db-statistics
● IDP (Iterative Dynamic Programming)
● Loads of papers on query planning
71. @mesirii
openCypher
● open-source query language spec
● implementers group
● publishes artifacts
● reference implementation
● open collaboration
● toward a new standard
○ fun with standards orgs
74. @mesirii
Architecture & Data Flow
Application -> Cypher / Bolt Driver -> Cypher / Bolt Server -> Neo4j
Request (parameterised Cypher):
MATCH (a:Person)
WHERE a.name = 'Alice'
RETURN a.surname, a.age
Response (result stream plus metadata):
{surname: 'Smith',
age: 33}
75. @mesirii
Driver Implementation
● Versioned Protocol (Handshake)
● Packstream Protocol based on MessagePack
● Asynchronous w/ sync APIs
● Uses Netty on Server
● Reactive w/ backpressure in v2 next year
77. @mesirii
Driver Concepts
Driver
Top-level object for all Neo4j interaction
Session
Logical context for sequence of transactions
Transaction
Unit of work
Statement Result
Stream of records plus metadata
79. @mesirii
Python (blocking)
from neo4j import GraphDatabase

uri = "bolt://localhost:7687"
driver = GraphDatabase.driver(uri, auth=("neo4j", "p4ssw0rd"))

def print_names(tx):
    # Runs inside a read transaction managed by the session
    result = tx.run("MATCH (a:Person) RETURN a.name")
    for record in result:
        print(record["a.name"])

with driver.session() as session:
    session.read_transaction(print_names)
80. @mesirii
Driver Implementation
● Same architecture across languages
81. @mesirii
Transaction Routing
Session -> Load Balancing Connection Pool -> Connections to readers and to the writer
driver.session()
session.read_transaction(...): ACQUIRE connection to reader, then RELEASE
session.write_transaction(...): ACQUIRE connection to writer, then RELEASE
session.read_transaction(...): ACQUIRE connection to reader, then RELEASE
session.close()
84. @mesirii
Server Selection Strategy
Scenario: one server starts to run slow.
The Round Robin strategy (prior to 1.5) continues to try all servers in turn, leading to a severe backlog of work and a dramatically lower overall throughput.
The Least Connected strategy (introduced in 1.5) leads to only a proportional drop in throughput under the same circumstances, as the misbehaving server is avoided.
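The difference between the two strategies shows up even in a toy simulation. This sketch is not the driver's code: the class names, the `simulate` helper and the server names are invented, and the "slow" server is modelled crudely as one whose connections are never released during the run.

```python
import itertools


class RoundRobin:
    """Pick servers strictly in turn, ignoring their load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, in_use):
        return next(self._cycle)


class LeastConnected:
    """Pick the server with the fewest connections currently in use."""

    def __init__(self, servers):
        self._servers = list(servers)

    def pick(self, in_use):
        return min(self._servers, key=lambda s: in_use[s])


def simulate(strategy, servers, slow, requests):
    """Count picks per server; work sent to the slow server stays in flight."""
    in_use = {s: 0 for s in servers}
    picks = {s: 0 for s in servers}
    for _ in range(requests):
        server = strategy.pick(in_use)
        picks[server] += 1
        in_use[server] += 1
        if server != slow:
            in_use[server] -= 1  # healthy servers finish immediately
    return picks


# Hypothetical three-server cluster in which "c" has started to run slow
servers = ["c", "a", "b"]
rr_picks = simulate(RoundRobin(servers), servers, slow="c", requests=9)
lc_picks = simulate(LeastConnected(servers), servers, slow="c", requests=9)
```

Round robin keeps sending every third request to the slow server, while least connected sends it one request, sees the connection stay busy, and routes the rest elsewhere.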
86. @mesirii
Clustering History
1. Zookeeper
2. Paxos (v1)
3. Paxos (v2)
4. Raft
"Raft is a consensus algorithm that is designed to be easy
to understand. It's equivalent to Paxos in fault-tolerance
and performance." raft.github.io
93. @mesirii
Clustering (next)
● Analytics on Reporting Instances
● Cluster member integration with Spark
● Distributed linear Transactions
● Sharding
● Workload based sharding
95. @mesirii
Past Graph Compute Options
● Data Processing
○ Spark with GraphX, Flink with Gelly
○ Gremlin Graph Computer
● Dedicated Graph Processing
○ Urika, GraphLab, Giraph, Mosaic,
GPS, Signal-Collect, Gradoop
● Data Scientist Toolkit
○ igraph, NetworkX, Boost in Python, R, C
96. @mesirii
Pregel - Bulk Synchronous Parallel (BSP)
The map-reduce for graph compute.
Node-Centric Processing
1. Each node sends a message about its own state
2. Each node receives messages from neighbours
3. Updates its own state
4. Global Compute Superstep
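The four steps can be sketched as a single-process loop. This example, with made-up names and data, finds connected components by letting every node adopt the smallest label it hears about. A real Pregel-style system would distribute the nodes over workers and synchronize at the superstep barrier; here the barrier is simply the end of each loop iteration.

```python
def pregel_min_label(neighbours):
    """neighbours: dict of node -> adjacent nodes (undirected graph).

    Returns (state, supersteps), where state maps each node to the smallest
    node id in its connected component.
    """
    state = {n: n for n in neighbours}
    active = set(neighbours)
    supersteps = 0
    while active:
        # 1. Each active node sends a message about its own state
        inbox = {n: [] for n in neighbours}
        for n in active:
            for m in neighbours[n]:
                inbox[m].append(state[n])
        # 2. + 3. Each node receives its messages and updates its own state
        active = set()
        for n, messages in inbox.items():
            if messages and min(messages) < state[n]:
                state[n] = min(messages)
                active.add(n)  # changed nodes speak again in the next round
        # 4. End of the global superstep
        supersteps += 1
    return state, supersteps


# Hypothetical graph with two components: {1, 2, 3} and {4, 5}
graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
labels, rounds = pregel_min_label(graph)
```

The loop ends once a superstep changes no state, so the number of rounds grows with the diameter of the largest component rather than with the number of nodes.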
100. @mesirii
How does it work?
Procedures: Neo4j -> Graph Loader -> In Memory Graph
1. Read projected graph
2. Load projected graph
3. Execute algorithm
4. Store results
Every operation is concurrent
101. @mesirii
How do you use it?
1. Call as Cypher procedure
2. Pass in specification (Label, Prop, Query) and configuration
3. ~.stream variant returns (a lot of) results
CALL algo.<name>.stream('Label','TYPE',{conf})
YIELD nodeId, score
4. non-stream variant writes results to graph; returns statistics
CALL algo.<name>('Label','TYPE',{conf})