A Practical Guide to Graph
Databases
About Me
Architect and Full Stack Developer
● 20 years of full stack experience
● Distributed high performance low
latency big data platforms
● Graph Databases are kinda my thing
www.bechbergerconsulting.com
www.bechberger.com
@bechbd
www.linkedin.com/in/davebechberger
Graph Databases
Graph Databases are hot
Graph Theory
What is a Graph Datastore?
● Type of NoSQL datastore
● Uses graph structures (nodes, edges)
to store data
● Efficiently represents and traverses
relationships
The NoSQL Spectrum
Why use a graph database?
Network Analysis
Master Data Management
Recommendation Engines
Fraud Detection
Graph Ecosystem
The ecosystem is large and growing
The ecosystem is complex
Frameworks
RDF Triple Stores Labeled Property Model
Databases
Databases vs. Frameworks
Frameworks
● Data is processed not persisted
● Works on enormous datasets
● OLAP workloads
Databases
● Data is persisted and processed
● Real time querying
● OLTP and OLAP workloads
RDF/Triple Stores vs. Labeled Property Graphs
RDF Triple Stores
● Each entity is a triple
● Works with subject - predicate -
object
● Comes from semantic web
● Great for inferring relationships
Labeled Property Graphs
● Entities are a node or an edge
● Works with nodes - edges -
properties - labels
● Both nodes and edges contain
properties
● Great for efficiently traversing
relationships
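As a rough illustration of the two models above, the same fact ("Alice has been friends with Bob since 2015") could be held either way. This is a minimal Python sketch, not any datastore's API. The names `triples`, `nodes`, `edges` and the `ex:` identifiers are made up for the example.

```python
# Hypothetical data: every identifier below is illustrative only.

# RDF-style: everything is a (subject, predicate, object) triple.
# In plain RDF, a property on the relationship itself needs an extra
# resource (here ex:f1) to hang the value on.
triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:f1", "ex:from", "ex:alice"),
    ("ex:f1", "ex:to", "ex:bob"),
    ("ex:f1", "ex:since", 2015),
]

# Labeled property graph: nodes and edges both carry labels and properties.
nodes = {
    1: {"label": "Person", "properties": {"name": "Alice"}},
    2: {"label": "Person", "properties": {"name": "Bob"}},
}
edges = [
    {"label": "FRIEND", "out": 1, "in": 2, "properties": {"since": 2015}},
]


def objects(subject, predicate):
    """Return all objects for a subject/predicate pair in the triple list."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def friends_since(node_id):
    """Return (friend name, since) pairs by following outgoing FRIEND edges."""
    return [
        (nodes[e["in"]]["properties"]["name"], e["properties"]["since"])
        for e in edges
        if e["label"] == "FRIEND" and e["out"] == node_id
    ]
```

In the property graph the `since` value sits directly on the edge, which is one reason traversals that filter on relationship properties tend to read more simply in that model.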
RDF/Triple Stores vs. Labeled Property Graphs
RDF Triple Stores Labeled Property Graphs
Graph Query Languages
Gremlin
● Imperative +
Declarative
● Powerful
● Steep Learning
Curve
GraphQL
● Most useful for
REST endpoints
● Query Language
for APIs
SPARQL
● W3C Standard
for RDF
● Based on
semantic Web
Cypher
● Declarative
● Easy to Use
● Most Popular
Language
Others
● Most are
extensions of SQL
● Usually specific to
one system
Queries - Find a Friend of a Friend
SPARQL
PREFIX foaf:
<http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
?me foaf:knows ?friend .
?friend foaf:knows ?fof .
?fof foaf:name ?name .}
Cypher
MATCH (me:Person)-[:FRIEND*2]->
(fof:Person) RETURN fof.name
Gremlin
g.V().hasLabel('person')
.repeat(out('friend')).times(2)
.dedup().values('name').toList()
GraphQL
{
friend {
friend {
name
}
}
}
SQL Variants
SELECT name FROM expand(
bothE('is_friend_with').bothV()
.bothE('is_friend_with').bothV()
)
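For readers new to all of these languages, the friend-of-a-friend traversal itself is small. Here is a plain Python sketch of the same two-hop walk over an adjacency list. The `friends` data and the function name are assumptions made up for illustration, not part of any of the query languages above.

```python
# Hypothetical friendship data: directed "friend" edges as an adjacency list.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": [],
}


def friends_of_friends(graph, start):
    """Names exactly two outgoing 'friend' hops from start, de-duplicated."""
    result = []
    for friend in graph.get(start, []):
        for fof in graph.get(friend, []):
            if fof not in result:  # same role as dedup() in the Gremlin query
                result.append(fof)
    return result
```

Calling `friends_of_friends(friends, "alice")` walks alice -> bob/carol -> dave/erin and reports each name once.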
Both
Visualization
Desktop Tool Web
Visualizations
To use or not to use,
that is the question
Everything is a
Graph
But that doesn’t mean you should solve it with a graph
Explore the
Questions
Search and Selection
● Get me everyone who works at X?
● Find me everyone with a first name like “John”?
● Find me all stores within X miles?
Answer: Use an RDBMS or a Search Server
Related Data
● What is the easiest way for me to be introduced to an executive at X?
● How do “John” and “Paula” know each other?
● How is company X related to company Y?
Answer: Use a Graph
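A question like "How do John and Paula know each other?" is at heart a shortest-path search. As a hedged sketch of what a graph engine does for you, here is a breadth-first search in Python. The `knows` data and the `shortest_path` helper are hypothetical, and a real datastore would use its own storage and indexes.

```python
from collections import deque

# Hypothetical undirected "knows" graph, stored as an adjacency list.
knows = {
    "John": ["Ann", "Raj"],
    "Ann": ["John", "Paula"],
    "Raj": ["John", "Mei"],
    "Mei": ["Raj", "Paula"],
    "Paula": ["Ann", "Mei"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no connection between the two people
```

The same search in SQL needs a recursive query or one self-join per hop, which is why this class of question is listed under "Use a Graph".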
Aggregation
● How many companies are in my system?
● What are my average sales for each day over the past month?
● What is the number of transactions processed by my system each day?
Answer: Use an RDBMS
Pattern Matching
● Who in my system has a similar profile to me?
● Does this transaction look like other known fraudulent transactions?
● Is the user “J. Smith” the same as “Johan S.”?
Answer: It depends, you might use a search server or a graph
Clustering, Centrality, and Influence
● Who is the most influential person I am connected with on LinkedIn?
● What equipment in my network will have the largest impact if it breaks?
● What parts tend to fail at the same time?
Answer: Use a graph
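"Most influential" can be measured in many ways. The simplest is degree centrality (how many connections someone has). The sketch below assumes that measure and uses made-up data and function names. Real graph tools typically offer richer measures such as PageRank or betweenness.

```python
# Hypothetical undirected connections, each pair listed once.
connections = [
    ("me", "ann"), ("me", "raj"), ("ann", "raj"),
    ("ann", "mei"), ("ann", "paula"), ("raj", "mei"),
]


def degree_centrality(edge_list):
    """Count how many connections each person has."""
    degree = {}
    for a, b in edge_list:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


def most_connected_neighbor(edge_list, person):
    """Among a person's direct connections, return the highest-degree one."""
    degree = degree_centrality(edge_list)
    neighbors = {b for a, b in edge_list if a == person}
    neighbors |= {a for a, b in edge_list if b == person}
    # sorted() makes ties resolve alphabetically; None if there are no neighbors
    return max(sorted(neighbors), key=lambda n: degree[n], default=None)
```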
Still not sure?
Should I use Graph?
I sold this to Management as a Graph
project so we are using a graph
Based on work by Dr. Denise Gosnell: https://bit.ly/2s0qBC2
I’m still confused
● Do we care about the relationships between entities as
much or more than the entities themselves?
● If I were to model this in an RDBMS would I be writing
queries with multiple (5+) joins or recursive CTEs to
retrieve my data?
● Is the structure of my data continuously evolving?
● Is my domain a natural fit for a graph?
Can’t I just do this in
SQL?
Northwind Data Models
Give me all products in a category (Search/Selection)
SQL
SELECT c.categoryName, p.productName
FROM product AS p
INNER JOIN category AS c ON
c.categoryId=p.categoryId
WHERE c.categoryName='Beverages'
Gremlin
g.V().has('category', 'categoryName',
'Beverages').as('c').in('part_of')
.as('p').select('c', 'p')
.by('categoryName').by('productName')
Cypher
MATCH (p:Product)-[:PART_OF]->(c:Category {categoryName: 'Beverages'})
RETURN c.categoryName, p.productName
Give me the top 5 products ordered (Aggregation)
SQL
SELECT TOP(5) c.categoryName,
p.productName, COUNT(*) AS orderCount
FROM [order] AS o
INNER JOIN product AS p ON
p.productId=o.productId
INNER JOIN category AS c ON
c.categoryId=p.categoryId
GROUP BY c.categoryName, p.productName
ORDER BY orderCount DESC
Gremlin
g.V().hasLabel('order')
.out('orders').as('p').out('part_of')
.as('c').select('c', 'p')
.by('categoryName').by('productName')
.groupCount().order(local)
.by(values, desc).limit(local, 5)
Cypher
MATCH (o:Order)-[:ORDERS]->(p:Product) -
[:PART_OF]->(c:Category)
RETURN c.categoryName, p.productName,
count(o)
ORDER BY count(o)
DESC LIMIT 5
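All three queries above follow the same recipe: group, count, sort descending, keep the first few. A short Python sketch of that recipe over made-up order lines looks like this. The `order_lines` data and `top_products` name are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical order lines: (order id, product name).
order_lines = [
    (1, "Chai"), (1, "Tofu"), (2, "Chai"), (3, "Chang"),
    (3, "Chai"), (4, "Tofu"), (5, "Ikura"),
]


def top_products(lines, n=5):
    """Return the n most frequently ordered products with their counts."""
    counts = Counter(product for _, product in lines)  # GROUP BY + COUNT
    return counts.most_common(n)                       # ORDER BY DESC + LIMIT
```

Nothing here follows a relationship more than one step, which is why aggregation questions like this one sit comfortably in an RDBMS.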
Find Products Purchased by others that I haven’t purchased
(Related Data/Pattern Matching)
SQL
SELECT TOP(5) product.product_name as Recommendation,
count(1) as Frequency
FROM product, customer_product_mapping,
(SELECT cpm3.product_id, cpm3.customer_id
FROM Customer_product_mapping cpm,
Customer_product_mapping cpm2, Customer_product_mapping cpm3
WHERE cpm.customer_id = '123'
and cpm.product_id = cpm2.product_id
and cpm2.customer_id != '123'
and cpm3.customer_id = cpm2.customer_id
and cpm3.product_id not in (select distinct product_id
FROM Customer_product_mapping cpm
WHERE cpm.customer_id = '123')
) recommended_products
WHERE customer_product_mapping.product_id = product.product_id
and customer_product_mapping.product_id =
recommended_products.product_id
and customer_product_mapping.customer_id =
recommended_products.customer_id
GROUP BY product.product_name
ORDER BY Frequency desc
Gremlin
g.V().has("customer", "customerId", "123").as("c").
out("ordered").out("contains").out("is").aggregate("p").
in("is").in("contains").in("ordered").where(neq("c")).
out("ordered").out("contains").out("is").where(without("p")).
groupCount().order(local).by(values,
desc).select(keys).limit(local, 5).
unfold().values("name")
Cypher
MATCH (u:Customer {customer_id:'123'})-[:BOUGHT]->(p:Product)<-
[:BOUGHT]-(peer:Customer)-[:BOUGHT]->(r:Product)
WHERE not (u)-[:BOUGHT]->(r)
RETURN r as Recommendation, count(*) as Frequency
ORDER BY Frequency DESC LIMIT 5;
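The pattern in the Cypher query (me -> product <- peer -> other product) can be sketched in a few lines of Python, which may help when checking results from any of the three versions. The `bought` data and `recommend` function are hypothetical. Like the Cypher, the count goes up once per shared product, i.e. once per matching path.

```python
from collections import Counter

# Hypothetical purchase data: customer id -> set of product names.
bought = {
    "123": {"Chai", "Tofu"},
    "456": {"Chai", "Ikura", "Chang"},
    "789": {"Tofu", "Ikura"},
    "999": {"Konbu"},
}


def recommend(purchases, customer_id, n=5):
    """Products bought by peers who share a purchase with the customer,
    excluding what the customer already owns, ranked by frequency."""
    mine = purchases[customer_id]
    frequency = Counter()
    for peer, products in purchases.items():
        if peer == customer_id:
            continue
        shared = mine & products
        if not shared:
            continue  # this peer has nothing in common with the customer
        for product in products - mine:
            frequency[product] += len(shared)  # one count per matching path
    return frequency.most_common(n)
```

With the sample data, customer 999 shares no purchases with customer 123, so "Konbu" is never recommended.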
Give me all employees, their supervisor and level (Recursive CTE)
SQL
WITH EmployeeHierarchy (EmployeeID,
LastName,
FirstName,
ReportsTo,
HierarchyLevel) AS
( SELECT EmployeeID
, LastName
, FirstName
, ReportsTo
, 1 as HierarchyLevel
FROM Employees
WHERE ReportsTo IS NULL
UNION ALL
SELECT e.EmployeeID
, e.LastName
, e.FirstName
, e.ReportsTo
, eh.HierarchyLevel + 1 AS HierarchyLevel
FROM Employees e
INNER JOIN EmployeeHierarchy eh
ON e.ReportsTo = eh.EmployeeID)
SELECT *
FROM EmployeeHierarchy
ORDER BY HierarchyLevel, LastName, FirstName
Gremlin
g.V().hasLabel("employee").where(__.not(out("reportsTo"))).
repeat(__.in("reportsTo")).emit().tree().by(map {
def employee = it.get()
employee.value("firstName") + " " + employee.value("lastName")
}).next()
Cypher
MATCH p = (u:Employee)-[:ReportsTo*0..]->(top:Employee)
WHERE NOT (top)-[:ReportsTo]->()
OPTIONAL MATCH (u)-[:ReportsTo]->(s:Employee)
RETURN u.firstName AS FirstName, u.lastName AS LastName,
(s.firstName + " " + s.lastName) AS ReportsTo,
length(p) + 1 AS HierarchyLevel
ORDER BY HierarchyLevel, LastName, FirstName
Based on work by http://sql2gremlin.com/
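What the recursive CTE computes is each employee's depth in the reporting tree. A small Python sketch of the same idea follows. The `employees` data and the `hierarchy` function are illustrative assumptions, and the code assumes every employee has at most one supervisor and that there are no cycles.

```python
# Hypothetical employees: id -> (name, id of supervisor or None).
employees = {
    1: ("Andrew Fuller", None),
    2: ("Nancy Davolio", 1),
    3: ("Steven Buchanan", 1),
    4: ("Michael Suyama", 3),
}


def hierarchy(staff):
    """Return (name, supervisor name, level) rows; the top of the tree is level 1."""
    def level(emp_id):
        boss = staff[emp_id][1]
        # Same recursion as the CTE: level = supervisor's level + 1
        return 1 if boss is None else level(boss) + 1

    rows = []
    for emp_id, (name, boss) in staff.items():
        boss_name = staff[boss][0] if boss is not None else None
        rows.append((name, boss_name, level(emp_id)))
    return sorted(rows, key=lambda r: (r[2], r[0]))
```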
Where do I start?
Choosing a Datastore
● Framework vs. RDF vs. Property Model
● HA/Transaction Volume/Data Size
● Hosted vs On Premise
Datastore Concerns
● Data Consistency - ACID or BASE
● Explore your choices
● Beware the Operational Overhead
Data Modelling
● Whiteboard friendly - close to the conceptual model, but pragmatic
● Take into account how you are traversing data
● Use your Relational model to start
● Iterate, Iterate, Iterate
Data Modelling Concerns
● Don’t use Symmetric Relationships
● Look out for Hidden/Anemic Relationships
● Look for Supernodes
● Schema - Use it and make it general
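Supernodes (nodes with a very large number of edges) are easy to create by accident, for example a single "USA" node that every customer links to. One hedged way to look for them in sample data is a plain degree count. The `find_supernodes` helper, the sample `edges` and the threshold are all assumptions, and what counts as "too many" depends on the datastore.

```python
from collections import Counter


def find_supernodes(edge_list, threshold):
    """Return nodes whose total degree (in + out) meets or exceeds threshold."""
    degree = Counter()
    for source, target in edge_list:
        degree[source] += 1
        degree[target] += 1
    return sorted(node for node, count in degree.items() if count >= threshold)


# Hypothetical data: 100 customers all linked to one country node.
edges = [(f"customer{i}", "USA") for i in range(100)]
edges.append(("customer0", "customer1"))
```

Running this against an extract of your data before loading it can flag modelling choices worth revisiting, such as turning the country into a property.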
What next?
Summary
The Good
● Graphs are flexible
● Great at finding and traversing relationships
● Natural fit in many complex domains
● Query times are proportional to the amount of the graph you traverse
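The last point can be seen in a toy example: a bounded traversal touches only the neighbourhood of the start node, however large the rest of the graph is. This sketch assumes an in-memory adjacency list with direct neighbour lookup. The data and the `traverse` function are made up, and real datastores differ in how closely they match this behaviour.

```python
def traverse(graph, start, max_hops):
    """Visit everything within max_hops of start; return the set of visited nodes."""
    visited = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return visited


# Hypothetical graph: a small cluster around "a" plus a large unrelated chain.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
graph.update({f"n{i}": [f"n{i + 1}"] for i in range(10_000)})
```

Here `traverse(graph, "a", 2)` visits four nodes even though the graph holds more than ten thousand.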
The Bad
● Different options scale very differently
● Team needs to learn a new mindset
● Still immature space
The Ugly
● Lack of documentation
● Large, splintered and rapidly evolving ecosystem
● Hard for new users to tell good versus bad use cases
Advice from the trenches...
● Graph datastores may solve your problem, but understand your problem first
● Expect some trial and error
● Your data model will evolve, plan for it
● Don’t underestimate the time it takes to bring your team up to speed
● Graph databases are not a silver bullet
www.bechbergerconsulting.com
www.bechberger.com
@bechbd
www.linkedin.com/in/davebechberger
Questions?

NDC Oslo 2018 - A Practical Guide to Graph Databases


Editor's Notes

  1. Test text for sizing
  2. Not an architect that just draws boxes and lines, I get my hands dirty by actually helping to build these things
  3. Graph database popularity is up almost 800% since January of 2013
  4. Leonhard Euler - 1735 - Seven Bridges of Königsberg: 2 islands in the Pregel River with 7 bridges. Can you walk all the bridges and return to the start without repeating any? A knowledge of graph theory may help but is not required.
  5. Lots of examples out there as to why use a graph database but these are just a few
  6. The ecosystem is large and growing. This slide currently shows 43. I originally put this out on Twitter and immediately had ~10 more additions of datastores I had never heard of.
  7. Lots of options out there. SPARQL is a standard for RDF graphs; there is not one for property model graphs. There is a movement called GQL that is attempting to create a standard property model graph language.
  8. There are lots of tools to help you visualize your data. Don’t fall into the trap of thinking that the only way to view your data is as a node chart.
  9. There are lots of tools to help you visualize your data. Don’t fall into the trap of thinking that the only way to view your data is as a node chart.
  10. Graphs are flexible. In general it is easy to extend your model with additional attributes and objects, allowing data evolution at a rapid pace. Graphs are great for searching relationships between items, but make sure that's what you want to search. Graphs are a more natural data model in many domains. Graph processing times are proportional to the number of nodes and edges you choose to traverse, not the data size.
  11. Depending on the graph datastore, they scale differently in terms of transactions and data size; many are single server only. It is a different mindset your team has to learn, and learning is not a cheap process. Graph databases are still not as mature as RDBMS systems.
  12. There is a lot of documentation for neophyte and expert users, not much in between. The ecosystem is vast, splintered and constantly evolving. Graph databases are great for some use cases, horrible for others, and it's not always easy to tell which you are in.