There and Back Again, A Developer's Tale
1. There And Back Again…
A Developer’s Tale
By: Jennifer Reif
Neo4j Developer Relations Engineer
jennifer.reif@neo4j.com
@JMHReif
2. Who Am I?
• Developer Relations Engineer for Neo4j
• Continuous learner, developer, blogger
• Conference speaker
• Survivor of financial industry development
Email: jennifer.reif@neo4j.com
Twitter: @JMHReif
4. We want to know…
• Which actors played which characters in the Lord of the Rings movies
• Other scenarios:
• Which employees have which skills for job openings in the company
• Which customers purchased which products, and from which suppliers
• Which patients were prescribed which medications by which doctors
• Which customers bought which vehicles from which dealerships/people
5. Existing solutions are painful
• Thousands of actors, employees, customers, patients, doctors, dealerships, skills, etc.
• Relational:
• Great for reports and simple JOINs, but these questions need many JOINs across 3 core tables plus lookup tables, each with endless rows
• Document:
• Great for pulling information about individual components, but linking properties across substructures is complicated
• Key-value:
• Great for retrieving individual bits of information very quickly, but aggregating and compiling lots of related data is arduous
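To make the relational JOIN cost concrete, here is a minimal Python sketch that answers the actor/character/movie question over in-memory "tables". The table names, column names, and ids are hypothetical. With three core tables and two lookup tables, four joins must run before a single answer row comes out.

```python
# Hypothetical tables and column names, for illustration only.
actors = [
    {"actor_id": 1, "actor_name": "Elijah Wood"},
    {"actor_id": 2, "actor_name": "Orlando Bloom"},
]
characters = [
    {"character_id": 10, "character_name": "Frodo Baggins"},
    {"character_id": 20, "character_name": "Legolas"},
]
movies = [
    {"movie_id": 100, "title": "The Lord of the Rings: The Fellowship of the Ring"},
]
# Lookup tables that link the core tables together.
actor_character = [
    {"actor_id": 1, "character_id": 10},
    {"actor_id": 2, "character_id": 20},
]
character_movie = [
    {"character_id": 10, "movie_id": 100},
    {"character_id": 20, "movie_id": 100},
]


def join(left, right, key):
    """Nested-loop inner join of two lists of dicts on a shared column."""
    return [{**a, **b} for a in left for b in right if a[key] == b[key]]


# Three core tables plus two lookup tables: four joins before any answer.
rows = join(actors, actor_character, "actor_id")
rows = join(rows, characters, "character_id")
rows = join(rows, character_movie, "character_id")
rows = join(rows, movies, "movie_id")
answer = [(r["actor_name"], r["character_name"], r["title"]) for r in rows]
print(answer)
```

Every additional entity in the question (suppliers, doctors, dealerships) adds at least one more core table and one more lookup table to this chain.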
8. Database - specifically graph
• Database: a structured set of data held in a computer, especially one that is accessible in various ways.
• Relational? NoSQL? Graph?
• Graph database: uses graph structures for semantic queries with nodes, edges, and properties to represent and store data.
10. –Wikipedia, “Graph Database”, Performance section
“Execution of queries within a graph database is localized to a portion of
the graph. It does not search through irrelevant data, making it
advantageous for real-time big data analytical queries. Consequently, graph
database performance is proportional to the size of the data needed to be
traversed, staying relatively constant despite the growth of data stored.”
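The quoted claim about localized queries can be illustrated with a simplified Python model. This is an adjacency-list sketch, not how Neo4j actually stores data. A bounded breadth-first traversal only touches nodes reachable from its starting point, so adding unrelated data elsewhere does not change what it visits. The node names and the 2-hop limit are illustrative.

```python
from collections import deque


def neighborhood(adjacency, start, max_hops):
    """Breadth-first traversal that only touches nodes reachable from
    `start` within `max_hops`; returns the set of nodes reached."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen


graph = {
    "Elijah Wood": ["Frodo Baggins"],
    "Frodo Baggins": ["The Fellowship of the Ring"],
    "Orlando Bloom": ["Legolas"],
    "Legolas": ["The Fellowship of the Ring"],
}
before = neighborhood(graph, "Elijah Wood", 2)

# Grow the graph with data unrelated to the query's starting point.
for i in range(10_000):
    graph[f"other-{i}"] = [f"other-{i + 1}"]

after = neighborhood(graph, "Elijah Wood", 2)
print(before == after, sorted(after))
```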
14. What is it used to accomplish?
Use Cases
• Social networks
• Impact analysis
• Logistics and routing
• Recommendations
• Access control
• Fraud analysis
• …and many, many more!
16. Neo4j is a database
• Fast
• Reliable
• No size limit
• Binary & HTTP protocol
• ACID transactions
• 2-4 M ops/s per core
• Clustering for scale & availability
• Official drivers
17. Neo4j is a graph database
• Property Graph Model
• Native GraphDB
• Schema Free
• Graph Storage
• Cypher Query Language
• Developer Workbench
• Extensible Procedures & Functions
• Graph Visualization
20. Property Graph Data Model
• 2 Main Components:
• Nodes
• Relationships
• Additional Components:
• Labels
• Properties
23. Property Graph Data Model
• Nodes:
• Represent the objects in the graph
• Can be categorized using Labels
• Relationships:
• Relate nodes by type and direction
• Properties:
• Name-value pairs that can be applied
to nodes or relationships
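[Diagram: two Person nodes and a Car node. Person {name: "Dan", born: May 29, 1970, twitter: "@dan"}; Person {name: "Ann", born: Dec 5, 1975}; Car {brand: "Volvo", model: "V70"}. Relationships shown: DRIVES, LOVES (twice), LIVES WITH, and OWNS; one relationship carries the property since: Jan 10, 2011.]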
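The four building blocks above (nodes, relationships, labels, properties) map naturally onto two small data types. Below is a minimal Python sketch, not Neo4j's internal representation. The slide does not show which relationship carries `since`, or the endpoints of DRIVES and OWNS, so those choices here are assumptions made for illustration. LIVES WITH is written as `LIVES_WITH`, the usual relationship-type convention.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(eq=False)  # identity semantics: each node is its own entity
class Node:
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass(eq=False)
class Relationship:
    type: str      # every relationship has exactly one type...
    start: Node    # ...and a direction, from start to end
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"Person"}, {"name": "Dan", "born": date(1970, 5, 29), "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": date(1975, 12, 5)})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship("LOVES", dan, ann),
    Relationship("LOVES", ann, dan),
    # Assumption: the slide does not say which relationship holds "since".
    Relationship("LIVES_WITH", dan, ann, {"since": date(2011, 1, 10)}),
    Relationship("DRIVES", ann, car),  # endpoints chosen for illustration
    Relationship("OWNS", dan, car),    # endpoints chosen for illustration
]


def outgoing(node, rel_type):
    """Nodes reached from `node` by following relationships of `rel_type`."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


print([n.properties["name"] for n in outgoing(dan, "LOVES")])
```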
24. Tools for data modeling…
• Arrows tool:
• http://www.apcjones.com/arrows/
• Developer guides:
• https://neo4j.com/developer/data-modeling/
• GraphGists:
• https://neo4j.com/graphgists/
• Community Site:
• https://community.neo4j.com/
• Training - Data Modeling course:
• https://neo4j.com/graphacademy/
27. Whiteboard friendliness
[Diagram: a Movie node (title: The Lord of the Rings…, released: 2003). Three Cast nodes, Elijah Wood, Orlando Bloom, and Viggo Mortensen, each have a PLAYED relationship to a Character node: Frodo Baggins, Legolas, and Aragorn respectively. Each Character has an APPEARS_IN relationship to the Movie.]
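The whiteboard model answers the opening question ("which actors played which characters") with a two-hop walk: Cast, then PLAYED, then Character, then APPEARS_IN, then Movie. Below is a minimal Python sketch of that walk, using plain dicts as a hypothetical stand-in for the graph. It uses the Fellowship title listed later in the deck, because the slide's own title is truncated.

```python
FELLOWSHIP = "The Lord of the Rings: The Fellowship of the Ring"

# (Cast)-[:PLAYED]->(Character)
played = {
    "Elijah Wood": ["Frodo Baggins"],
    "Orlando Bloom": ["Legolas"],
    "Viggo Mortensen": ["Aragorn"],
}
# (Character)-[:APPEARS_IN]->(Movie)
appears_in = {
    "Frodo Baggins": [FELLOWSHIP],
    "Legolas": [FELLOWSHIP],
    "Aragorn": [FELLOWSHIP],
}


def who_played(movie):
    """Follow PLAYED, then APPEARS_IN, keeping pairs that reach `movie`."""
    return sorted(
        (actor, character)
        for actor, characters in played.items()
        for character in characters
        if movie in appears_in.get(character, [])
    )


print(who_played(FELLOWSHIP))
```

No lookup tables are needed, because each relationship already connects the two things it relates.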
30. Options for Importing Data
• Cypher statements / script: write individual statements to load data manually
• LOAD CSV: imports local or online CSV files into the graph; suited to small and medium data sets
• ETL Tool: imports from a relational database, mapping the relational data model to a graph
• Kettle: can import massive amounts of data from a variety of sources
• APOC: standard library that includes several import procedures for different data formats
• Neo4j-admin import tool: command-line interface for bulk-loading large amounts of data
• Import programmatically from drivers: interact via your preferred programming language
33. Cypher: Powerful and Expressive
CREATE (:Person { name:"Dan"}) -[:LOVES]-> (:Person { name:"Ann"})
[Diagram: Dan -LOVES-> Ann, annotated to show each NODE with its LABEL (Person) and PROPERTY (name)]
34. Cypher: Powerful and Expressive
[Diagram: Dan -LOVES-> Ann]
MATCH (:Person { name:"Dan"} ) -[:LOVES]-> ( whom )
RETURN whom
35. Cypher in 20 sec…
• Nodes look like this:
• (var:Label) OR (var:Label { propKey: propValue })
• Relationships look like this:
• -[var:REL_TYPE]-> or -[var:REL_TYPE { propKey: propValue }]->
• Querying with Cypher means describing the patterns of nodes and relationships you are looking for
• (var1:Label)-[var2:REL_TYPE]->(var3:Label)
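Pattern matching is the core idea of Cypher. Below is a toy Python matcher for a single one-hop pattern. It is a hypothetical helper, not how Cypher's planner works. It returns the end nodes for `(:StartLabel {props})-[:REL_TYPE]->(end)`, mirroring `MATCH (:Person {name:"Dan"})-[:LOVES]->(whom) RETURN whom`.

```python
# Hypothetical in-memory graph: node id -> labels and properties.
nodes = {
    1: {"labels": {"Person"}, "props": {"name": "Dan"}},
    2: {"labels": {"Person"}, "props": {"name": "Ann"}},
}
# Relationships as (start id, type, end id); direction runs start -> end.
rels = [(1, "LOVES", 2)]


def match_one_hop(start_label, start_props, rel_type, end_label=None):
    """Return end-node ids matching
    (:start_label {start_props})-[:rel_type]->(end:end_label)."""
    results = []
    for start, rtype, end in rels:
        s, e = nodes[start], nodes[end]
        if (
            rtype == rel_type
            and start_label in s["labels"]
            and all(s["props"].get(k) == v for k, v in start_props.items())
            and (end_label is None or end_label in e["labels"])
        ):
            results.append(end)
    return results


whom = match_one_hop("Person", {"name": "Dan"}, "LOVES")
print([nodes[n]["props"]["name"] for n in whom])
```

Because the pattern is directed, asking the same question from Ann's side returns nothing in this data.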
37. Cypher statements/script
MERGE (m:Movie {id: 100})
ON CREATE SET m.title = "The Lord of the Rings: The Fellowship of the Ring",
m.releaseDate = date('2001-12-19') // …
MERGE (c:Character {id: 300})
ON CREATE SET c.name = "Legolas" // …
MERGE (c)-[:APPEARS_IN]->(m)
// …
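These statements rely on MERGE behaving as get-or-create: the node is created only if no match exists, and ON CREATE SET runs only on creation. That makes the script safe to re-run. Below is a simplified Python model of those semantics. The `Graph` class is hypothetical, and it treats a node's identity as just (label, key), whereas real MERGE matches the whole pattern.

```python
class Graph:
    """Toy store illustrating MERGE / ON CREATE SET semantics."""

    def __init__(self):
        self.nodes = {}   # (label, key) -> properties
        self.rels = set() # (start ident, type, end ident)

    def merge_node(self, label, key, on_create=None):
        """Get-or-create a node; apply `on_create` only for a new node."""
        ident = (label, key)
        created = ident not in self.nodes
        if created:
            self.nodes[ident] = dict(on_create or {})
        return ident, created

    def merge_rel(self, start, rel_type, end):
        """Create the relationship only if an identical one is absent."""
        self.rels.add((start, rel_type, end))


def load(graph, title):
    m, _ = graph.merge_node("Movie", 100, {"title": title})
    c, _ = graph.merge_node("Character", 300, {"name": "Legolas"})
    graph.merge_rel(c, "APPEARS_IN", m)


g = Graph()
load(g, "The Lord of the Rings: The Fellowship of the Ring")
load(g, "A different title")  # re-run: nothing new is created or overwritten
print(len(g.nodes), len(g.rels), g.nodes[("Movie", 100)]["title"])
```

If a plain SET were used instead of ON CREATE SET, the second run would overwrite the title. That difference is the reason for the ON CREATE clause.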
38. LOAD CSV
LOAD CSV WITH HEADERS FROM "file:///movies.csv" AS row
MERGE (m:Movie {id: row.movieId})
ON CREATE SET m.title = row.title, m.releaseDate = date(row.released) // …
// …
LOAD CSV WITH HEADERS FROM "file:///movieCharacters.csv" AS row
MATCH (m:Movie {id: row.movieId})
WITH m, row
MERGE (c:Character {id: row.id})
ON CREATE SET c.name = row.name // …
MERGE (c)-[:APPEARS_IN]->(m)
// …
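LOAD CSV hands every field to Cypher as a string. Two consequences follow for this two-pass load: `{id: row.movieId}` stores the string "100", not the integer 100, and a character row whose movie was never loaded is dropped by the MATCH. Below is a minimal Python sketch of the same two passes using the standard `csv` module. The CSV contents and the second character row are hypothetical.

```python
import csv
import io
from datetime import date

# Hypothetical file contents with the column names used in the queries above.
movies_csv = """movieId,title,released
100,The Lord of the Rings: The Fellowship of the Ring,2001-12-19
"""
characters_csv = """id,name,movieId
300,Legolas,100
301,Hypothetical Character,999
"""

movies = {}
for row in csv.DictReader(io.StringIO(movies_csv)):
    # Like LOAD CSV, every value arrives as a string ("100", not 100).
    if row["movieId"] not in movies:  # MERGE: create only if absent
        movies[row["movieId"]] = {
            "title": row["title"],
            "releaseDate": date.fromisoformat(row["released"]),
        }

characters, appears_in = {}, set()
for row in csv.DictReader(io.StringIO(characters_csv)):
    if row["movieId"] not in movies:
        continue  # like MATCH: rows without a matching movie produce nothing
    characters.setdefault(row["id"], {"name": row["name"]})
    appears_in.add((row["id"], row["movieId"]))

print(movies, characters, appears_in)
```

If the graph also contains movies created with integer ids, as in the hand-written script, converting in Cypher with `toInteger(row.movieId)` keeps both sources matching the same nodes.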
41. APOC
WITH "https://bestmovies.com/" as url
CALL apoc.load.json(url) YIELD value
UNWIND value.results AS results
WITH results
MERGE (m:Movie {id: results.id})
ON CREATE SET m.title = results.title, m.releaseDate = date(results.released)…
….
42. APOC fave procs
• apoc.load.json(url) / apoc.load.csv(file) / apoc.load.xml(file) / apoc.load.jdbc(url)
• Procedures to load various kinds of data
• Can handle flat files or url paths (locally or remote)
• Excellent when you need transformations with data load
• apoc.periodic.iterate('cypher1', 'cypher2', {config})
• For each result of the cypher1 statement, runs the cypher2 statement on it, in batches
• Helpful for selecting a segment for update
• apoc.do.when(condition, ifQuery, elseQuery, {params})
• Runs one of two write queries depending on the condition, e.g. to substitute values
• Used for a variety of tasks, but here it is good for cleaning data
• apoc.date.format(time, 'unit', 'format')
• Can output date in a variety of formats for display or querying
• Very helpful for moving date/time values into or out of Neo4j
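The batching idea behind apoc.periodic.iterate can be modeled in a few lines of Python: stream the outer results and hand them to the inner operation `batchSize` items at a time. This is an analogy only. The real procedure also commits a transaction per batch, can run batches in parallel, and reports more statistics than the `batches` and `total` counts modeled here. The function name `periodic_iterate` is hypothetical.

```python
from itertools import islice


def periodic_iterate(outer, inner, batch_size):
    """Feed results of `outer` to `inner` in chunks of `batch_size`,
    returning how many batches ran and how many items were processed."""
    it = iter(outer)
    batches = total = 0
    while batch := list(islice(it, batch_size)):
        inner(batch)  # in APOC, each batch runs in its own transaction
        batches += 1
        total += len(batch)
    return {"batches": batches, "total": total}


titles = ["A", "B", "C", "D", "E"]  # hypothetical outer-query results
processed = []
stats = periodic_iterate(titles, processed.extend, batch_size=2)
print(stats)
```

Smaller batches mean smaller transactions, which limits memory use and the amount of work lost if one batch fails.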
47. //Load Movie objects that are wanted
WITH 'https://api.themoviedb.org/3/search/movie?api_key='+
$apiKey+'&query=Lord%20of%20the%20Rings' as url
CALL apoc.load.json(url)
YIELD value
UNWIND value.results AS results
WITH results
MERGE (m:Movie {movieId: results.id})
ON CREATE SET m.title = results.title, m.desc = results.overview, m.poster =
results.poster_path, m.reviewStars = results.vote_average, m.reviews = results.vote_count
WITH results, m
CALL apoc.do.when(results.release_date = "",
'SET m.releaseDate = null',
'SET m.releaseDate = date(results.release_date)',
{m:m, results:results}) YIELD value
RETURN m
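The query above merges one Movie per search result and then branches on `release_date`. An empty string becomes no date; anything else is parsed with `date()`. Below is a minimal Python sketch of the same shaping step over a hypothetical response. The field names match those read in the query, but the values are invented for illustration, and no real API is called.

```python
import json
from datetime import date

# Hypothetical response shaped like the fields the query reads.
sample = json.loads("""{"results": [
  {"id": 1, "title": "Example A", "overview": "...", "poster_path": "/a.jpg",
   "vote_average": 8.0, "vote_count": 10, "release_date": "2001-12-19"},
  {"id": 2, "title": "Example B", "overview": "...", "poster_path": null,
   "vote_average": 0.0, "vote_count": 0, "release_date": ""}
]}""")

movies = {}
for result in sample["results"]:            # UNWIND value.results AS results
    if result["id"] not in movies:          # MERGE ... ON CREATE SET
        movies[result["id"]] = {
            "title": result["title"],
            "desc": result["overview"],
            "poster": result["poster_path"],
            "reviewStars": result["vote_average"],
            "reviews": result["vote_count"],
        }
    raw = result.get("release_date")
    # Same branch as apoc.do.when: empty (or missing) -> None, else parse.
    movies[result["id"]]["releaseDate"] = date.fromisoformat(raw) if raw else None

print(movies)
```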
48. //For Movie objects just loaded, pick out trilogy and retrieve cast of those movies
WITH 'https://api.themoviedb.org/3/movie/' as prefix, '/credits?api_key='+$apiKey as suffix,
["The Lord of the Rings: The Fellowship of the Ring", "The Lord of the Rings: The Two Towers",
"The Lord of the Rings: The Return of the King"] as movies
CALL apoc.periodic.iterate('MATCH (m:Movie) WHERE m.title IN $movies RETURN m',
'WITH m CALL apoc.load.json($prefix+m.movieId+$suffix) YIELD value
UNWIND value.cast AS cast
MERGE (c:Cast {id: cast.id})
ON CREATE SET c.name = cast.name
MERGE (ch:Character {name: cast.character})
MERGE (ch)-[r:APPEARS_IN]->(m)
MERGE (c)-[r1:PLAYED]->(ch)',
{batchSize: 1, iterateList:false, params:{movies:movies, prefix:prefix, suffix:suffix}});
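The second query filters the loaded movies down to the trilogy and builds each credits URL as `prefix + movieId + suffix`. It then merges Cast and Character nodes, so an actor or character seen in several movies is created once. Below is a minimal Python sketch of that flow. The `fetch` callable is a hypothetical stand-in for apoc.load.json, and the movie ids, API key, and cast data are invented for illustration.

```python
PREFIX = "https://api.themoviedb.org/3/movie/"
TRILOGY = [
    "The Lord of the Rings: The Fellowship of the Ring",
    "The Lord of the Rings: The Two Towers",
    "The Lord of the Rings: The Return of the King",
]


def credits_url(movie_id, api_key):
    """Same concatenation as $prefix + m.movieId + $suffix in the query."""
    return f"{PREFIX}{movie_id}/credits?api_key={api_key}"


def load_cast(movies, fetch, api_key):
    cast, characters, played, appears_in = {}, set(), set(), set()
    for movie_id, title in movies.items():
        if title not in TRILOGY:               # WHERE m.title IN $movies
            continue
        for member in fetch(credits_url(movie_id, api_key))["cast"]:
            cast.setdefault(member["id"], member["name"])   # MERGE Cast, ON CREATE SET
            characters.add(member["character"])             # MERGE Character by name
            appears_in.add((member["character"], movie_id))  # MERGE APPEARS_IN
            played.add((member["id"], member["character"]))  # MERGE PLAYED
    return cast, characters, played, appears_in


requested = []


def fake_fetch(url):
    """Hypothetical stand-in for apoc.load.json: records the URL."""
    requested.append(url)
    return {"cast": [{"id": 1, "name": "Elijah Wood", "character": "Frodo Baggins"}]}


movies = {10: TRILOGY[0], 11: TRILOGY[1], 99: "Some Other Movie"}
cast, characters, played, appears_in = load_cast(movies, fake_fetch, "KEY")
print(cast, appears_in)
```

With `batchSize: 1` in the original call, each trilogy movie's cast is loaded in its own transaction. This sketch does not model that part.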
51. Other ways to query and explore
• Make calls from an application
• Neo4j drivers for almost any programming language
• Java, Python, JavaScript, Go, Ruby, PHP
• Visualization tools
• Open source and proprietary
• Neovis, Browser, Bloom, 3d-force-graph, Kineviz, yWorks
53. Will it play nice?
• Integrations, integrations, integrations!
• Out-of-the-box plugins (APOC, GraphQL, graph algorithms)
• Custom extensions possible
• Tons of options for feeding data to existing tools/systems
• Tableau, Kettle, Kafka, Elasticsearch, other DBs, Spark, and many more
55. What can I show to others?
• Neo4j Bloom (or partner/open source visualization tools)
• Exploration tool for business users to query with natural language
• Basic reports and query performance
• Build according to specs and compare solutions, just as you would in any technology evaluation
• Use cases and success stories
• https://neo4j.com/resources
• Possible integrations and minimal interruption of existing systems
• What tools are you using today? Does our integration fit neatly?
• Community and support network!
• Support agreement or fabulous expert community answers to questions