Graph Data Science WORST Practices
Worst Practices and Gotchas
Graph Analytics team
Neo4j
About the presenter
Mats Rydberg
mats@neo4j.org
Software Engineer, Neo4j
Team Lead Graph Analytics
Overview
Objectives: Learn the most common issues and how to avoid them.
“Neo4j crashes when I put the JAR in the plugin directory”
Either:
● You’re running the wrong version of Neo4j
○ GDS 1.0 and 1.1 support Neo4j 3.5.9+, not 4.0
○ GDS 1.2 supports Neo4j 4.0.x, not 3.5
● You’ve installed the older Graph Algorithms library and GDS together,
but they’re not compatible
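The version rule above can be written down as a small lookup. This is a plain-Python sketch, not part of GDS: the function name is_compatible and the SUPPORTED table are assumptions for illustration, and the table encodes only the releases named on this slide.

```python
# Compatibility table taken from the slide only; later releases are not covered.
SUPPORTED = {
    "1.0": "3.5",
    "1.1": "3.5",
    "1.2": "4.0",
}


def parse(version):
    """Turn '3.5.9' into (3, 5, 9)."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(gds_version, neo4j_version):
    """Return True if the slide lists this GDS release as supporting this Neo4j version."""
    gds_minor = ".".join(gds_version.split(".")[:2])
    required_line = SUPPORTED.get(gds_minor)
    if required_line is None:
        return False  # unknown GDS release: outside the scope of this sketch
    neo = parse(neo4j_version)
    if neo[:2] != parse(required_line):
        return False
    # GDS 1.0 and 1.1 need at least Neo4j 3.5.9.
    if required_line == "3.5":
        return neo >= (3, 5, 9)
    return True


print(is_compatible("1.1.0", "3.5.14"))  # True
print(is_compatible("1.1.0", "4.0.3"))   # False
```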
Common Mistakes
“I can't find feature X”
Make sure you're using the latest GDS version. Upgrade using Neo4j
Desktop or from the Download Center:
https://neo4j.com/download-center/#algorithms
Also make sure you're reading the corresponding version of the
documentation: https://neo4j.com/docs/graph-data-science/1.1/
“My algorithm ran but returned all 0s”
or
“My algorithm ran but all the results are the same”
or
“My algorithm ran but every node is in its own community”
or
“My algorithm ran but the shortest path is infinity”
These may result from having no connections between nodes.
Check the node and relationship projections -- are there edges
between your nodes?
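A toy illustration of how an empty relationship projection produces these symptoms. Everything here is plain Python written for this example (the helpers project, degree, components and hop_distance are hypothetical and are not GDS procedures); a mistyped relationship type stands in for a projection that matches no edges.

```python
import math
from collections import deque


def project(relationships, rel_type):
    """Keep only relationships of the requested type."""
    return [(s, t) for s, t, kind in relationships if kind == rel_type]


def degree(nodes, edges):
    """Out-degree per node."""
    counts = {n: 0 for n in nodes}
    for source, _ in edges:
        counts[source] += 1
    return counts


def components(nodes, edges):
    """Weakly connected components via union-find; returns node -> component id."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for source, target in edges:
        parent[find(source)] = find(target)
    return {n: find(n) for n in nodes}


def hop_distance(nodes, edges, source, target):
    """Unweighted shortest path length, or infinity if the target is unreachable."""
    adjacency = {n: [] for n in nodes}
    for s, t in edges:
        adjacency[s].append(t)
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance.get(target, math.inf)


nodes = ["a", "b", "c"]
relationships = [("a", "b", "KNOWS"), ("b", "c", "KNOWS")]

# A typo in the relationship type projects zero edges.
edges = project(relationships, "KNOWZ")
print(degree(nodes, edges))                          # all 0s
print(len(set(components(nodes, edges).values())))   # every node in its own component
print(hop_distance(nodes, edges, "a", "c"))          # inf
```

Counting the projected relationships before running anything is usually the quickest check.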
“My algorithm ran, but all the nodes are in the same community”
Your graph may be too densely connected … but all hope is not lost!
1) Try a different algorithm -- if WCC finds a single community, that
doesn’t mean LPA will as well.
2) Try using weights (for Louvain, LPA) or thresholds (for WCC)
3) If you’re using Louvain, check the intermediate communities
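To make the threshold idea in point 2 concrete, here is a minimal union-find sketch in plain Python. It is not the GDS implementation; it assumes that only relationships whose weight is strictly above the threshold are considered, and the function name wcc and the sample weights are made up for the example.

```python
def wcc(nodes, weighted_edges, threshold=None):
    """Weakly connected components; edges at or below the threshold are ignored."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for source, target, weight in weighted_edges:
        if threshold is not None and weight <= threshold:
            continue  # too weak to count as a connection
        parent[find(source)] = find(target)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in groups.values())


nodes = ["a", "b", "c", "d"]
edges = [("a", "b", 0.9), ("b", "c", 0.1), ("c", "d", 0.8)]

print(wcc(nodes, edges))                 # one big component
print(wcc(nodes, edges, threshold=0.5))  # the weak b-c link no longer merges the two halves
```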
“The algorithm ran, so it must be right… right?”
We’ve built in some guardrails - for example, you can’t run an algo
that’s incompatible with the direction of your graph - but the library
isn’t foolproof.
- You could project a bunch of different node labels as if they’re the
same thing
- You could treat any number as if it’s a weight or a seed property
- You could run a weighted algorithm on unweighted relationships with
default settings
- You could set weird values for tolerance, damping factor, iterations...
“I ran my algorithm twice and got different results”
Yes, for two reasons:
1) A number of the algorithms are stochastic: they rely on a
non-deterministic heuristic.
2) Thread concurrency may cause non-deterministic results because we
can’t control the order in which threads are scheduled.
It doesn’t mean the results are wrong!
- Use seeding so you keep the results from the first run
- Know which algorithms are stochastic and choose appropriately
- Check if an algorithm converged on an answer
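A small sketch of why processing order matters and how seeding helps. This is a simplified, single-threaded label propagation in plain Python, not the GDS algorithm: the order argument stands in for thread scheduling, and seed_labels stands in for seeding a run with the communities from an earlier one. All names are assumptions for this example.

```python
from collections import Counter


def label_propagation(adjacency, order, seed_labels=None, max_iterations=10):
    """Asynchronous label propagation; ties go to the smallest label."""
    if seed_labels:
        labels = dict(seed_labels)
    else:
        # Without a seed every node starts in its own community.
        labels = {node: i for i, node in enumerate(sorted(adjacency))}
    for _ in range(max_iterations):
        changed = False
        for node in order:
            counts = Counter(labels[nb] for nb in adjacency[node])
            if not counts:
                continue  # isolated node keeps its label
            best = max(counts.values())
            new_label = min(label for label, count in counts.items() if count == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # converged
    return labels


# Undirected path a - b - c.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

first = label_propagation(adjacency, order=["a", "b", "c"])
second = label_propagation(adjacency, order=["c", "b", "a"])
seeded = label_propagation(adjacency, order=["c", "b", "a"], seed_labels=first)

print(first)   # all nodes share label 1
print(second)  # same grouping, but the community id is 0
print(seeded)  # seeding with the first result keeps its ids
```

Both unseeded runs find the same grouping here, yet the community ids differ, which is enough to make two result sets look different when compared naively.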
“I get different results when I run my algorithm with different X”
Yes - that’s intended behavior.
In particular, be mindful of:
- orientation: 'UNDIRECTED' will double all your edges
- aggregation: 'SINGLE' will deduplicate parallel edges
- maxIterations, tolerance control when an algorithm stops
- nodeLabels, relationshipTypes control what parts of your
projected graph are used in the algorithm
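The effect of the first two settings on the projected edge list can be sketched in a few lines. This is plain Python for illustration only; the function project is hypothetical and simply mimics what the orientation and aggregation options named above do to a list of relationships.

```python
def project(relationships, orientation="NATURAL", aggregation="NONE"):
    """Return the edge list an algorithm would see under the given settings."""
    edges = []
    for source, target in relationships:
        if orientation in ("NATURAL", "UNDIRECTED"):
            edges.append((source, target))
        if orientation in ("REVERSE", "UNDIRECTED"):
            edges.append((target, source))
    if aggregation == "SINGLE":
        # Keep one edge per (source, target) pair, preserving order.
        edges = list(dict.fromkeys(edges))
    return edges


rels = [("a", "b"), ("a", "b"), ("b", "c")]

print(len(project(rels)))                            # 3
print(len(project(rels, orientation="UNDIRECTED")))  # 6
print(len(project(rels, aggregation="SINGLE")))      # 2
```

Any algorithm that counts edges, such as a degree-based one, will report different numbers for each of these three projections of the same data.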
“This algorithm has literally been running for three months”
Some of the algorithms are really slow by their nature, specifically:
- Betweenness Centrality
- Node Similarity, or any of the alpha similarity algorithms
1) Check progress in the debug log
2) Break up the problem -- run WCC and execute on individual
components, or within individual communities
3) Set topK, topN, degreeCutoff for similarity
4) Use an approximation method:
gds.alpha.betweenness.sampled, gds.alpha.ml.ann
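To show why the similarity settings in point 3 help, here is a brute-force Jaccard similarity in plain Python. It is a sketch, not the GDS Node Similarity implementation; degree_cutoff and top_k are Python stand-ins for the parameters named above, and the comparison counter exists only to make the saved work visible.

```python
from itertools import combinations


def node_similarity(neighbours, degree_cutoff=1, top_k=10):
    """Jaccard similarity between nodes based on shared neighbours."""
    # Nodes with fewer neighbours than the cutoff are skipped entirely.
    candidates = {
        node: set(ns) for node, ns in neighbours.items() if len(ns) >= degree_cutoff
    }
    scores = {node: [] for node in candidates}
    comparisons = 0
    for a, b in combinations(sorted(candidates), 2):
        comparisons += 1
        shared = candidates[a] & candidates[b]
        if not shared:
            continue
        score = len(shared) / len(candidates[a] | candidates[b])
        scores[a].append((score, b))
        scores[b].append((score, a))
    # Keep only the top_k most similar nodes per node.
    top = {
        node: sorted(pairs, key=lambda p: (-p[0], p[1]))[:top_k]
        for node, pairs in scores.items()
    }
    return top, comparisons


neighbours = {
    "p1": ["x", "y", "z"],
    "p2": ["x", "y"],
    "p3": ["z"],
    "p4": ["x"],
}

_, all_pairs = node_similarity(neighbours)
_, pruned_pairs = node_similarity(neighbours, degree_cutoff=2)
print(all_pairs, pruned_pairs)  # 6 comparisons versus 1

top, _ = node_similarity(neighbours, top_k=1)
print(top["p1"])  # only the single best match for p1 is kept
```

The number of pairwise comparisons grows quadratically with the number of candidate nodes, so pruning low-degree nodes and capping the results per node both cut the work and the output size.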
“My algorithm didn't run because of the memory guard”
The memory guard protects your database from crashing: it refuses to
start an algorithm whose estimated memory requirement exceeds the
available heap. If you are sure the run is safe, you can bypass the
guard with the sudo configuration parameter.
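The behaviour can be pictured as a simple pre-flight check. The sketch below is a plain-Python model, not GDS code: it assumes the guard compares an up-front estimate against free heap, and the names MemoryGuardError and run_algorithm are invented for this example.

```python
class MemoryGuardError(RuntimeError):
    """Raised when the estimate exceeds the free heap (hypothetical name)."""


def run_algorithm(estimated_bytes, free_heap_bytes, sudo=False):
    """Refuse to start when the estimate does not fit, unless sudo is set."""
    if estimated_bytes > free_heap_bytes and not sudo:
        raise MemoryGuardError(
            f"estimated {estimated_bytes} bytes, only {free_heap_bytes} free"
        )
    return "running"


GIB = 1024 ** 3

try:
    run_algorithm(estimated_bytes=8 * GIB, free_heap_bytes=4 * GIB)
except MemoryGuardError as error:
    print("blocked:", error)

# Bypassing the guard starts the run, but it may still exhaust the heap.
print(run_algorithm(estimated_bytes=8 * GIB, free_heap_bytes=4 * GIB, sudo=True))
```

Bypassing the check removes the protection, so it is worth looking at the estimate first and freeing memory if the numbers are close.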
“My JVM went out of memory!”
The memory footprint can grow during a workload, for example when
you use the .mutate execution mode.
To free up memory, you could:
• Drop unused graphs
• Remove unused properties or relationships
Some algorithms can also be configured to use less memory.
… so how do I avoid this?
Configuration
Do
- Use memory estimation to find out about memory requirements
- Configure Neo4j to use as much heap as possible
  (dbms.memory.heap.max_size)
- Run algorithms on a single instance or read replica
Don't
- Configure Neo4j to use as much page cache as possible
  (dbms.memory.pagecache.size)
- Run algorithms on a core member of a cluster
Projections
Do
- Only load nodes and relationships that you plan to use
- Only load necessary properties
- Avoid redundant relationship projections (natural + reverse ==
  undirected)
- Consider aggregating parallel relationships
Don't
- Use '*' when creating in-memory graphs
- Use Cypher projections in production
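Aggregating parallel relationships can be sketched as a group-and-reduce step. This is plain Python for illustration, not the GDS projection code: the function aggregate and the sample data are made up, and the reducer names are used here simply as labels for sum, min, max and count.

```python
from collections import defaultdict


def aggregate(relationships, how="SUM"):
    """Collapse parallel (source, target, weight) relationships into one each."""
    grouped = defaultdict(list)
    for source, target, weight in relationships:
        grouped[(source, target)].append(weight)
    reducers = {"SUM": sum, "MIN": min, "MAX": max, "COUNT": len}
    reduce_weights = reducers[how]
    return {pair: reduce_weights(weights) for pair, weights in grouped.items()}


transfers = [("a", "b", 10.0), ("a", "b", 5.0), ("b", "c", 1.0)]

print(aggregate(transfers, "SUM"))    # one a->b edge carrying the total
print(aggregate(transfers, "COUNT"))  # one a->b edge carrying the number of transfers
```

Three stored relationships become two projected ones, which saves memory and often gives a more meaningful weight than any single parallel edge would.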
Catalog
Do
- Use the catalog if multiple algorithms are run on the same graph
- Drop graphs that are not needed any more
- One large in-memory graph is better than multiple small ones
Don't
- Use the catalog for one-time algorithm executions
- Update your underlying data in Neo4j without refreshing your
  graph projection
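The stale-projection problem comes from the in-memory graph being a snapshot. The toy catalog below is plain Python, not the GDS graph catalog: the Catalog class and its methods are hypothetical and model only the copy-on-create behaviour.

```python
class Catalog:
    """Toy graph catalog: each named graph is a snapshot taken at creation time."""

    def __init__(self):
        self._graphs = {}

    def create(self, name, database_edges):
        # Copy the edges so later database changes do not leak into the projection.
        self._graphs[name] = list(database_edges)

    def get(self, name):
        return self._graphs[name]

    def drop(self, name):
        return self._graphs.pop(name)

    def names(self):
        return sorted(self._graphs)


database = [("a", "b")]
catalog = Catalog()
catalog.create("people", database)

# The underlying data changes after the projection was created.
database.append(("b", "c"))
stale_size = len(catalog.get("people"))
print(stale_size)  # still 1: the projection is stale

# Refresh by dropping and re-creating the projection.
catalog.drop("people")
catalog.create("people", database)
fresh_size = len(catalog.get("people"))
print(fresh_size)  # 2
```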
Algorithm Execution
Do
- Use seeding if possible
- Use threshold if possible
- Use tolerance if possible
- Try different concurrency settings
Don't
- Run on anonymous graphs in production
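How tolerance and an iteration cap interact can be seen in a textbook PageRank loop. This is a minimal plain-Python sketch, not the GDS implementation (it does not redistribute rank from dangling nodes, for instance); the function name and the returned tuple of ranks, iterations run and a convergence flag are choices made for this example.

```python
def pagerank(nodes, edges, damping=0.85, max_iterations=20, tolerance=1e-7):
    """Iterate until the largest per-node change drops below the tolerance."""
    out = {n: [] for n in nodes}
    for source, target in edges:
        out[source].append(target)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for iteration in range(1, max_iterations + 1):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for source in nodes:
            if out[source]:
                share = damping * rank[source] / len(out[source])
                for target in out[source]:
                    new_rank[target] += share
        delta = max(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tolerance:
            return rank, iteration, True  # converged early
    return rank, max_iterations, False  # stopped by the iteration cap


nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "a"), ("c", "a")]

_, ran_short, converged_short = pagerank(nodes, edges, max_iterations=2)
_, ran_tight, converged_tight = pagerank(nodes, edges, max_iterations=200, tolerance=1e-7)
_, ran_loose, converged_loose = pagerank(nodes, edges, max_iterations=200, tolerance=1e-2)

print(ran_short, converged_short)  # hit the cap without converging
print(ran_tight, converged_tight)
print(ran_loose, converged_loose)  # a looser tolerance stops sooner
```

Checking the convergence flag tells you whether a result reflects a stable answer or just the point where the iteration budget ran out.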
Single User Mode
Do
- Try to run only one workload at a time
- Avoid writing into properties used in production
Don't
- Alter the graph in the same transaction that loads the graph
- Run algorithms on operational systems
Caveats - Single User Mode
We assume single user mode
- The catalog is partitioned by user
- Algorithms will grab as many resources as they can
- Concurrency can be controlled with the concurrency parameter
- One user could, for example, remove a graph while an algorithm is
  running
Caveats - Transactionality
Loading
- One transaction per loader thread
- Changes made in the transaction that calls the procedure are not
  visible to the loader
- Loading the same graph twice can result in different graphs
Write back
- Happens in batches of 10k-100k elements
- Node values are written in parallel: one transaction per thread and
  batch
- Relationship values are written single-threaded
- Rollback is not possible once a batch transaction has been committed
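The write-back caveat means a failed run can leave partial results behind. The sketch below models that in plain Python; it is not GDS code, the function write_back and its fail_on_batch switch are invented for the example, and a dict stands in for the database.

```python
def write_back(values, store, batch_size, fail_on_batch=None):
    """Write values in batches; each batch commits independently."""
    items = list(values.items())
    committed = 0
    for index in range(0, len(items), batch_size):
        batch_number = index // batch_size
        if batch_number == fail_on_batch:
            raise RuntimeError(f"batch {batch_number} failed")
        # Commit: once applied, this batch cannot be rolled back.
        store.update(items[index:index + batch_size])
        committed += 1
    return committed


scores = {f"n{i}": float(i) for i in range(5)}
store = {}

try:
    write_back(scores, store, batch_size=2, fail_on_batch=1)
except RuntimeError as error:
    print("write failed:", error)

# The first batch is already committed and stays in the store.
print(store)
```

Writing into a dedicated result property, rather than one that production queries read, keeps a half-written run from affecting anything else.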
Thank you!
Find us at
https://github.com/neo4j/graph-data-science
