1. Building a graph database requires modeling the data, choosing a query language, and providing storage.
2. Existing distributed databases like Cassandra can be used for storage due to their scalability and reliability, though a native graph database provides more functionality.
3. Solving complex graph problems requires capabilities beyond basic queries, including search, analytics, and integration with machine learning, which graph databases are designed to support at scale.
10. Storage - Cassandra
• Fast
• Distributed
• Scalable
• Reliable
• 11 years of development
• 54 committers (listed on Apache)
• 274 contributors (listed on GitHub)
17. Typical customer 360 queries
[Chart: response time (offline fast, human fast, machine fast) against query complexity (simple to complex), with regions labeled CQL, Search, Analytics, and DSE, plus a "No go zone".]
• Find me Jenny.
• Find me all people with similar names to 'Jenny'.
• Tell me if there are duplicate Jennys.
• Find how Jenny and John are connected.
• Find how influential Jenny is in my application.
19. Find me all people with similar names to 'Jenny'
How Complex?
• Medium
How Fast?
• Human Fast
What?
• Search
• Graph
Why?
• Single index lookup
• Single iteration
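A minimal sketch of the "similar names" query, assuming an in-memory list of names stands in for a search index. The deck attributes this query to Search plus Graph; the fuzzy matching below uses Python's standard `difflib` and is only an illustration, not how DSE Search scores matches. The `similar_names` helper, the threshold, and the sample names are all hypothetical.

```python
from difflib import SequenceMatcher


def similar_names(target, names, threshold=0.6):
    """Return names whose similarity ratio to `target` meets the threshold, best first."""
    scored = []
    for name in names:
        # Case-insensitive similarity in [0, 1]; 1.0 means identical strings.
        ratio = SequenceMatcher(None, target.lower(), name.lower()).ratio()
        if ratio >= threshold:
            scored.append((ratio, name))
    # Highest ratio first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


# Hypothetical sample data.
people = ["Jenny", "Jenni", "Jennifer", "John", "Jen", "Bob"]
matches = similar_names("Jenny", people)
print(matches)
```

One pass over the candidates is enough here, which matches the "single iteration" reasoning on the slide.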
20. Tell me if there are duplicate Jennys
How Complex?
• Medium
How Fast?
• Offline
What?
• Analytics
• Graph
Why?
• Aggregation
• Multiple iterations
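The aggregation step behind duplicate detection can be sketched as a group-by over a normalized key. This assumes a duplicate means "same name and email after trimming and lowercasing", which is an illustrative rule and not something the deck specifies; the record fields and the `find_duplicates` helper are hypothetical. A real offline job would typically run this kind of grouping in an analytics engine over the full data set.

```python
from collections import defaultdict


def find_duplicates(records):
    """Group record ids by a normalized (name, email) key and keep groups with more than one id."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["name"].strip().lower(), rec["email"].strip().lower())
        groups[key].append(rec["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical sample data.
records = [
    {"id": 1, "name": "Jenny", "email": "jenny@example.com"},
    {"id": 2, "name": "John", "email": "john@example.com"},
    {"id": 3, "name": " jenny ", "email": "JENNY@example.com"},
    {"id": 4, "name": "Jenny", "email": "other.jenny@example.com"},
]
duplicates = find_duplicates(records)
print(duplicates)
```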
21. Find how Jenny and John are connected
How Complex?
• Complex
How Fast?
• Machine fast
What?
• Graph
Why?
• Multiple partition lookup
• Multiple iterations
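To illustrate why a connection query needs multiple lookups and multiple iterations, here is a sketch of a breadth-first search over an in-memory adjacency map. Each `graph.get(node)` call stands in for one lookup of a vertex's neighbors; the graph contents and the `shortest_path` helper are hypothetical, and this is not the traversal engine the deck describes.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal as a list of nodes, or None if unreachable."""
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # One neighbor lookup per visited node.
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start.
                path = [neighbor]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical "knows" relationships.
knows = {
    "Jenny": ["Alice", "Bob"],
    "Alice": ["Jenny", "Carol"],
    "Bob": ["Jenny", "John"],
    "Carol": ["Alice"],
    "John": ["Bob"],
}
path = shortest_path(knows, "Jenny", "John")
print(path)
```

The number of lookups grows with the size of the explored neighborhood, which is why the slide rates this query as complex.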
22. Find how influential Jenny is in my application
How Complex?
• Complex
How Fast?
• Offline
What?
• Spark Analytics
• Graph via PageRank
Why?
• Full scan
• Unknown iterations
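The "full scan, unknown iterations" reasoning can be seen in a small power-iteration sketch of PageRank. Every pass touches every vertex, and the loop runs until the ranks stop changing, so the iteration count is not known up front. This is a plain-Python illustration, not the Spark job the slide refers to; the graph, the `pagerank` helper, and its parameter defaults are assumptions, and the sketch expects every edge target to also appear as a key in the graph.

```python
def pagerank(graph, damping=0.85, tol=1e-8, max_iter=100):
    """Compute PageRank by power iteration over a dict of node -> list of outgoing neighbors."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        # Full scan: every node passes a share of its rank to each neighbor.
        for node in nodes:
            out = graph[node]
            if out:
                share = damping * rank[node] / len(out)
                for target in out:
                    new_rank[target] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break  # converged; how many passes this takes depends on the graph
    return rank


# Hypothetical "follows" relationships.
follows = {
    "Jenny": ["John"],
    "John": ["Jenny"],
    "Alice": ["Jenny"],
    "Bob": ["Jenny"],
    "Carol": ["Jenny", "John"],
}
ranks = pagerank(follows)
most_influential = max(ranks, key=ranks.get)
print(most_influential)
```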
23. Typical customer 360 queries
• Find me Jenny.
• Find me all people with similar names to 'Jenny'.
• Tell me if there are duplicate Jennys.
• Find how Jenny and John are connected.
• Find how influential Jenny is in my application.
24. Summary
1. What it takes to create a graph database
a. Model
b. Language
c. Storage
2. How you can leverage an existing storage engine, and why Cassandra is a great choice.
3. Solving graph problems requires more than just the basics. Search and analytics are essential tools, especially for a graph database.
25. Don't try this at home
Do not try to replicate 100 person-years of dev effort by creating your own storage engine.
Creating a graph database that scales is tough enough.