Relational databases power most applications, but new use cases have requirements that they are not well suited for.
That's why new approaches like graph databases are used to handle the join-heavy, highly connected and real-time aspects of your applications.
This talk compares relational and graph databases, showing similarities and important differences.
We take a hands-on deep dive into the ease of data modeling and structural evolution, massive data import and high-performance querying with Neo4j, the most popular graph database.
I demonstrate a useful tool that makes data import from existing relational databases with a normalized ER model a "one-click" experience.
That leaves the biggest challenge for people coming from a relational background: adapting some of their existing database experience to new ways of thinking.
4. History of Neo4j - Problem
• Digital Asset Management System in 2000
• SaaS many users in many countries
• Two hard use-cases
• Multi language keyword search
• Including synonyms / word hierarchies
• Access Management to Assets for SaaS Scale
5. History of Neo4j – Relational Attempt
• Tried with many relational DBs
• JOIN Performance Problems
• Hierarchies, Networks, Graphs
• Modeling Problems
• Data Model evolution
• No Success, even …
• With expensive database consultants!
6. History of Neo4j – First working Implementation
• Graph Model & API sketched on a napkin
• Nodes connected by Relationships
• Just like your conceptual model
• Implemented network-database in memory
• Java API, fast Traversals
• Worked well, but …
• No persistence, No Transactions
• Long import / export time from relational storage
7. History of Neo4j - Solution
• Evolved to full fledged database in Java
• With persistence using files + memory mapping
• Transactions with Transaction Log (WAL)
• Lucene for fast Node search
• Founded Company in 2007
• Neo4j (REST)-Server
• Neo4j Clustering & HA
• Cypher Query Language
• Today …
8. Neo Technology Overview
Product
• Neo4j - World’s leading graph database
• 1M+ downloads, adding 50k+ per month
• 150+ enterprise subscription customers including over 50 of the Global 2000
Company
• Neo Technology, Creator of Neo4j
• 80 employees with HQ in Silicon Valley, London, Munich, Paris and Malmö
• $45M in funding from Fidelity, Sunstone, Conor, Creandum, Dawn Capital
9. Neo4j Adoption by Selected Verticals
Financial Services
Communications
Health & Life Sciences
HR & Recruiting
Media & Publishing
Social Web
Industry & Logistics
Entertainment
Consumer Retail
Information Services
Business Services
10. How Customers Use Neo4j
Network & Data Center
Master Data Management
Social
Recommendations
Identity & Access
Search & Discovery
GEO
11. Neo4j Leads the Graph Database Revolution
“Forrester estimates that over 25% of enterprises will be using graph databases by 2017”
“Neo4j is the current market leader in graph databases.”
“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”
• IT Market Clock for Database Management Systems, 2014: https://www.gartner.com/doc/2852717/it-market-clock-database-management
• TechRadar™: Enterprise DBMS, Q1 2014: http://www.forrester.com/TechRadar+Enterprise+DBMS+Q1+2014/fulltext/-/E-RES106801
• Graph Databases – and Their Potential to Transform How We Capture Interdependencies (Enterprise Management Associates): http://blogs.enterprisemanagement.com/dennisdrogseth/2013/11/06/graph-databasesand-potential-transform-capture-interdependencies/
12. Largest Ecosystem of Graph Enthusiasts
• 1,000,000+ downloads
• 20,000+ educated developers
• 18,000+ Meetup members
• 100+ technology and service partners
• 150+ enterprise subscription customers including 50+ Global 2000 companies
13. High Business Value in Data Relationships
Data is increasing in volume…
• New digital processes
• More online transactions
• New social networks
• More devices
… and is getting more connected
Customers, products, processes, devices interact and relate to each other
Using Data Relationships unlocks value
• Real-time recommendations
• Fraud detection
• Master data management
• Network and IT operations
• Identity and access management
• Graph-based search
Early adopters became industry leaders
15. Relational DBs Can’t Handle Relationships Well
• Cannot model or store data and relationships without complexity
• Performance degrades with number and levels of relationships, and database size
• Query complexity grows with need for JOINs
• Adding new types of data and relationships requires schema redesign, increasing time to market
… making traditional databases inappropriate when data relationships are valuable in real-time
Slow development
Poor performance
Low scalability
Hard to maintain
16. Why Can’t Relational DBs Handle Relationships Well?
• Data model built for tabular forms, not JOINs: managing connections was bolted on, both in schema and query
• Strict schema not suitable for variably structured data, which is generated and used by today’s applications
• Data volume and JOIN number affect cost of query operation exponentially
• Variable hierarchies and networks are hard to store and query, so many “patterns” were developed
… often only denormalization makes complex relational queries fast, but it destroys the good normalized data model
Built for Forms
Joins are expensive
Denormalize #FTW
17. Unlocking Value from Your Data Relationships
• Model your data naturally as a graph of data and relationships
• Drive graph model from domain and use-cases
• Use relationship information in real-time to transform your business
• Add new relationships on the fly to adapt to your changing requirements
18. High Query Performance with a Native Graph DB
• Relationships are first class citizens
• No need for joins, just follow pre-materialized relationships of nodes
• Query & Data-locality – navigate out from your starting points
• Only load what’s needed
• Aggregate and project results as you go
• Optimized disk and memory model for graphs
19. High Query Performance: Some Numbers
• Traverse 4M+ relationships per second and core
• Cost based query optimizer – complex queries return in milliseconds
• Import 100K-1M records per second transactionally
• Bulk import tens of billions of records in a few hours
23. Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
[Figure: example property graph. Nodes: PERSON (name: “Dan”, born: May 29, 1970, twitter: “@dan”), PERSON (name: “Ann”, born: Dec 5, 1975), CAR (brand: “Volvo”, model: “V70”). Relationship types shown: LOVES, LOVES, LIVES WITH. Relationship property shown: since: Jan 10, 2011.]
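The components listed on this slide can be sketched in a few lines of Python. This is only an illustration of the property graph idea, not Neo4j's API: the class names `Node` and `Relationship` and the helper `outgoing` are hypothetical, the direction of each relationship is assumed, and placing `since` on the LIVES WITH relationship is an assumption, because the slide text does not say which relationship carries it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                       # e.g. {"PERSON"}
    properties: dict = field(default_factory=dict)    # name-value pairs


@dataclass
class Relationship:
    start: Node                                       # direction: start -> end
    rel_type: str                                     # e.g. "LOVES"
    end: Node
    properties: dict = field(default_factory=dict)    # name-value pairs


dan = Node({"PERSON"}, {"name": "Dan", "born": "1970-05-29", "twitter": "@dan"})
ann = Node({"PERSON"}, {"name": "Ann", "born": "1975-12-05"})
car = Node({"CAR"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    # Directions are assumed for illustration.
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    # Assumption: the slide does not say which relationship carries `since`.
    Relationship(dan, "LIVES_WITH", ann, {"since": "2011-01-10"}),
]


def outgoing(node, rel_type):
    """Return the end nodes of all rel_type relationships starting at node."""
    return [r.end for r in relationships
            if r.start is node and r.rel_type == rel_type]


loved_by_dan = [n.properties["name"] for n in outgoing(dan, "LOVES")]
print(loved_by_dan)
```

Both nodes and relationships carry their own property maps, which is the main structural difference from a join table row that only holds foreign keys.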
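24. Relational Versus Graph Models
[Figure: the same data in both models. Relational Model: tables Person, Person-Friend and Friend, with rows ANDREAS, DELIA, TOBIAS, MICA. Graph Model: nodes ANDREAS, TOBIAS, MICA, DELIA connected by KNOWS relationships.]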
38. Northwind Graph Model
[Figure: Northwind graph model. Node labels: Order, Product, Customer, Employee, Category, Supplier. Relationship types: SOLD, ORDERS, REPORTS_TO, PART_OF, PURCHASED, SUPPLIES.]
39. Recap - Rules
Model your graph first and import into that model.
Alternatively …
40. Normalized ER-Models: Transformation Rules
• Tables become nodes
• Table name as node-label
• Columns turn into properties
• Convert values if needed
• Foreign Keys (1:1, 1:n, n:1) into relationships, column name into relationship-type (or better verb)
• JOIN-Tables represent relationships
• Also other tables without domain identity (w/o PK) and two FKs
• Columns turn into relationship properties
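The transformation rules above can be sketched as a small Python function over in-memory tables. This is a minimal illustration, not the import tool's implementation. The function `to_graph`, the metadata dictionaries and the `Project` and `Employee_Project` tables are hypothetical, and value conversion and renaming to a better verb are left out.

```python
def to_graph(tables, primary_keys, foreign_keys):
    """Apply the slide's rules: tables -> nodes, FKs and join tables -> relationships."""
    nodes = {}          # (table, pk value) -> {"label": ..., "properties": {...}}
    relationships = []  # (start key, type, end key, properties)

    # Rule: tables without a PK and with exactly two FKs represent relationships.
    join_tables = {t for t in tables
                   if t not in primary_keys
                   and sum(1 for (ft, _c) in foreign_keys if ft == t) == 2}

    # Rule: tables become nodes, table name as label, columns as properties.
    for table, rows in tables.items():
        if table in join_tables:
            continue
        pk = primary_keys[table]
        fk_cols = {c for (t, c) in foreign_keys if t == table}
        for row in rows:
            props = {c: v for c, v in row.items()
                     if c not in fk_cols and v is not None}
            nodes[(table, row[pk])] = {"label": table, "properties": props}

    # Rule: foreign keys become relationships, column name as relationship type.
    for (table, column), target in foreign_keys.items():
        if table in join_tables:
            continue
        pk = primary_keys[table]
        for row in tables[table]:
            if row[column] is not None:
                relationships.append(
                    ((table, row[pk]), column.upper(), (target, row[column]), {}))

    # Rule: join tables become relationships, remaining columns as properties.
    for table in join_tables:
        (c1, t1), (c2, t2) = [(c, tgt) for (t, c), tgt in foreign_keys.items()
                              if t == table]
        for row in tables[table]:
            props = {c: v for c, v in row.items() if c not in (c1, c2)}
            relationships.append(
                ((t1, row[c1]), table.upper(), (t2, row[c2]), props))
    return nodes, relationships


# Hypothetical sample schema and data.
tables = {
    "Employee": [{"id": 1, "firstName": "Andrew", "reports_to": None},
                 {"id": 2, "firstName": "Steven", "reports_to": 1}],
    "Project": [{"id": 7, "name": "Import"}],
    "Employee_Project": [{"employee_id": 2, "project_id": 7, "role": "lead"}],
}
primary_keys = {"Employee": "id", "Project": "id"}
foreign_keys = {("Employee", "reports_to"): "Employee",
                ("Employee_Project", "employee_id"): "Employee",
                ("Employee_Project", "project_id"): "Project"}

nodes, relationships = to_graph(tables, primary_keys, foreign_keys)
for rel in relationships:
    print(rel)
```

The sketch keeps the technical `id` column as a node property so that relationships can be resolved. Dropping it afterwards belongs to a separate cleanup step.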
41. Normalized ER-Models: Cleanup Rules
• Remove technical IDs (auto-incrementing PKs)
• Keep domain IDs (e.g. ISBN)
• Add constraints for those
• Add indexes for lookup fields
• Adjust names for Label, REL_TYPE and propertyName
Note: currently no composite constraints and indexes
43. Getting Data into Neo4j
Cypher-Based “LOAD CSV” Capability
• Transactional (ACID) writes
• Initial and incremental loads of up to 10 million nodes and relationships
Command-Line Bulk Loader neo4j-import
• For initial database population
• For loads up to 10B+ records
• Up to 1M records per second
4.58 million things and their relationships… Loads in 100 seconds!
[Figure label: CSV]
44. Getting Data into Neo4j
Custom Cypher-Based Loader
• Uses transactional Cypher http endpoint
• Parametrized, batched, concurrent Cypher statements
• Any programming/script language with driver or plain http
JVM Transactional Loader
• Use Neo4j’s Java-API
• From any JVM language
• Up to 1M records per second
[Figure labels: Any Data, Program]
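As a rough sketch of what "parametrized, batched" statements for the transactional Cypher HTTP endpoint might look like, the Python below only builds the JSON request bodies and does not send them. The helper names `batches` and `build_payload` and the sample rows are hypothetical. The `{firstName}` parameter syntax and the endpoint path in the comment match Neo4j 2.x as far as I know, so treat both as assumptions for other versions.

```python
import json


def batches(rows, size):
    """Yield consecutive slices of rows with at most `size` items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_payload(batch):
    """Build one request body: one parameterized statement per row."""
    # Assumption: Neo4j 2.x parameter syntax {name}; later versions use $name.
    statement = "CREATE (e:Employee {firstName: {firstName}})"
    return {
        "statements": [
            {"statement": statement,
             "parameters": {"firstName": row["firstName"]}}
            for row in batch
        ]
    }


# Hypothetical input rows, e.g. read from a CSV file or a JDBC result set.
rows = [{"firstName": name}
        for name in ["Andrew", "Steven", "Robert", "Nancy", "Anne"]]

payloads = [build_payload(batch) for batch in batches(rows, 2)]

# Each payload would be POSTed as JSON to the transactional endpoint
# (assumed here to be /db/data/transaction/commit, as in Neo4j 2.x).
print(json.dumps(payloads[0], indent=2))
```

Keeping the statement text constant and passing only the parameters lets the server reuse the query plan. The independent batches could then be posted from several workers to get the concurrency the slide mentions.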
46. Import Demo
Cypher-Based “LOAD CSV” Capability
• Use to import Northwind CSV dumps
Command-Line Bulk Loader neo4j-import
• Chicago Crimes Dataset
Relational Import Tool neo4j-rdbms-import
• Proof of Concept
[Figure labels: JDBC + API, CSV]
47. RDBMS Import Tool Demo – Proof of Concept
• JDBC for vendor-independent database connection
• SchemaCrawler to extract DB-Meta-Data
• Use Rules to drive graph model import
• Optional means to override default behavior
• Scales writes with Parallel Batch Importer API
• Reads tables concurrently for nodes & relationships
Demo: MySQL - Employee Demo Database
Source: github.com/jexp/neo4j-rdbms-import
[Figure labels: Postgres, MySQL, Oracle]
49. Basic Query: Who do people report to?
MATCH (:Employee {firstName:"Steven"})-[:REPORTS_TO]->(:Employee {firstName:"Andrew"})
[Figure: the pattern annotated with its parts. Two NODEs, Steven and Andrew, each with a LABEL and a PROPERTY, connected by a REPORTS_TO relationship.]
50. Basic Query Comparison: Who do people report to?
SELECT *
FROM Employee AS e
JOIN Employee_Report AS er ON (e.id = er.manager_id)
JOIN Employee AS sub ON (er.sub_id = sub.id)
MATCH (e:Employee)<-[:REPORTS_TO]-(sub:Employee)
RETURN *
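To make the comparison concrete, the SQL form can be run against an in-memory SQLite database, next to a plain-Python stand-in for following REPORTS_TO relationships directly. This is only a sketch: the column `firstName` on `Employee` and the sample rows are assumptions, and the dictionary lookup merely imitates what the Cypher pattern expresses without using Neo4j.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (id INTEGER PRIMARY KEY, firstName TEXT);
    CREATE TABLE Employee_Report (manager_id INTEGER, sub_id INTEGER);
    -- Hypothetical sample data
    INSERT INTO Employee VALUES (1, 'Andrew'), (2, 'Steven'), (3, 'Robert');
    INSERT INTO Employee_Report VALUES (1, 2), (2, 3);
""")

# Relational form: two JOINs through the Employee_Report join table.
rows = conn.execute("""
    SELECT e.firstName, sub.firstName
    FROM Employee AS e
    JOIN Employee_Report AS er ON (e.id = er.manager_id)
    JOIN Employee AS sub ON (er.sub_id = sub.id)
    ORDER BY e.id
""").fetchall()

# Graph-style stand-in: each employee points directly at their manager.
reports_to = {"Steven": "Andrew", "Robert": "Steven"}
graph_rows = [(boss, sub) for sub, boss in reports_to.items()]

print(rows)
print(graph_rows)
```

In the relational version the relationship lives in a separate table and has to be reassembled at query time. In the graph-style version it is stored as a direct reference.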
53. Express Complex Queries Easily with Cypher
Find all direct reports and how many people they manage, each up to 3 levels down
Cypher Query
MATCH (sub)-[:REPORTS_TO*0..3]->(boss),
      (report)-[:REPORTS_TO*1..3]->(sub)
WHERE boss.firstName = 'Andrew'
RETURN sub.firstName AS Subordinate, count(report) AS Total;
SQL Query
[Figure: the equivalent SQL query, not recoverable from the slide text]
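The variable-length patterns in the Cypher query can be imitated with a bounded breadth-first walk in Python. This is a sketch under stated assumptions, not a description of how Neo4j evaluates the query. The sample hierarchy and the helper `descendants` are hypothetical, and the data is assumed to be a tree, so each report is reached by exactly one path.

```python
from collections import defaultdict

# Hypothetical sample data: employee -> manager (one REPORTS_TO each).
reports_to = {"Steven": "Andrew", "Nancy": "Andrew",
              "Robert": "Steven", "Anne": "Steven", "Michael": "Robert"}

direct_reports = defaultdict(list)
for employee, manager in reports_to.items():
    direct_reports[manager].append(employee)


def descendants(person, min_depth, max_depth):
    """People between min_depth and max_depth REPORTS_TO hops below person."""
    found, frontier = [], [person]
    if min_depth == 0:
        found.append(person)          # *0.. also matches the start node itself
    for depth in range(1, max_depth + 1):
        frontier = [r for p in frontier for r in direct_reports.get(p, [])]
        if depth >= min_depth:
            found.extend(frontier)
    return found


totals = {}
for sub in descendants("Andrew", 0, 3):       # (sub)-[:REPORTS_TO*0..3]->(boss)
    count = len(descendants(sub, 1, 3))       # (report)-[:REPORTS_TO*1..3]->(sub)
    if count:                                 # a plain MATCH drops empty rows
        totals[sub] = count

print(totals)
```

Because the first pattern starts at depth 0, Andrew himself appears as a row. Employees without any reports are absent because both patterns must match.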
54. “We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10 to 100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.”
Volker Pacher, Senior Developer
55. Who is in Robert’s (direct, upwards) reporting chain?
MATCH path=(e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
WHERE sub.firstName = 'Robert'
RETURN path;
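What the unbounded `*` pattern walks through can be sketched in Python as a loop that keeps following REPORTS_TO upwards. The sample data and the function `reporting_chain` are hypothetical, and the sketch assumes every employee has at most one manager.

```python
# Hypothetical sample data: employee -> manager.
reports_to = {"Robert": "Steven", "Steven": "Andrew"}


def reporting_chain(employee):
    """Return the managers above employee, nearest first."""
    chain = []
    seen = {employee}
    while employee in reports_to:
        employee = reports_to[employee]
        if employee in seen:      # guard against cycles in bad data
            break
        seen.add(employee)
        chain.append(employee)
    return chain


print(reporting_chain("Robert"))
```

The Cypher query returns one path per manager reached (Robert to Steven, Robert to Andrew), while this sketch returns only the longest chain as a list.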
57. Who’s the Big Boss?
MATCH (e:Employee)
WHERE NOT (e)-[:REPORTS_TO]->()
RETURN e.firstName AS bigBoss;
61. Neo4j Query Planner
Cost based Query Planner since Neo4j 2.2
• Uses database stats to select best plan
• Currently for Read Operations
• Query Plan Visualizer, finds
• Non optimal queries
• Cartesian Product
• Missing Indexes, Global Scans
• Typos
• Massive Fan-Out
62. Query Planner
Slight change, add an :Employee label -> more stats available -> new plan with fewer database-hits
64. Neo4j Clustering Architecture Optimized for Speed & Availability at Scale
Performance Benefits
• No network hops within queries
• Real-time operations with fast and consistent response times
• Cache sharding spreads cache across cluster for very large graphs
Clustering Features
• Master-slave replication with master re-election and failover
• Each instance has its own local cache
• Horizontal scaling & disaster recovery
[Figure: a Load Balancer in front of three Neo4j instances]
65. Three Ways to Migrate Data to Neo4j
• MIGRATE ALL DATA
• MIGRATE GRAPH DATA
• DUPLICATE GRAPH DATA
[Figure: three diagrams, each with an Application on top of a Relational Database and/or a Graph Database. Data flows are labeled “All data”, “Graph data” and “Non-graph data”.]
66. Neo4j Fits into Your Enterprise Environment
[Figure: enterprise architecture diagram. Labels: Application; End User; Data Scientist; Graph Database Cluster (Neo4j, Neo4j, Neo4j); Data Storage and Business Rules Execution; Data Mining and Aggregation; Ad Hoc Analysis; Bulk Analytic Infrastructure; Graph Compute Engine; EDW; …; Databases: Relational, NoSQL, Hadoop.]
70. Quick Start: Plan Your Project
Deploy your app in as little as 8 weeks
• Learn Neo4j
• Decide on Architecture
• Import and Model Data
• Build Application
• Test Application
[Figure: project timeline numbered 1 to 8, labeled PROFESSIONAL SERVICES PLAN]