SlideShare a Scribd company logo
1 of 21
Neo4j GDS Best Practices
Graph Data Science Workflow
Objective
Know how to put together the right workflow
for your project, and how to combine
algorithms effectively.
GDS in production
GDS in production
Emphasis on data modeling:
(1) Define the problem you’re trying to answer:
Making recommendations? Finding anomalies?
(2) Match the problem to the correct set of algorithms:
recommendation = similarity, anomalies = centrality
(3) Modify your data model for the algorithms you want to use
- mono- or bi-partite
- modify labels, relationships to use native graphs
- consider weights, seeding
GDS in production
Configuration & best practices:
(1) Do I have enough memory to run this?
(2) How long will this take to run?
(3) What do I need to change?
GDS in production
Loading the analytics graph:
● This is one of the slowest steps, unavoidable, and takes up the
most memory so we want to minimize the number of graphs
we load
● Is it possible for us to load data for all the algorithms into a
single graph?
GDS in production
Running your algorithms:
- .stats returns summary statistics about the results of the algorithm
without writing to the database. Run this first to check if the calculations
make sense!
- .stream returns all the results as a stream - use this if you’re extracting
them for use elsewhere (eg. Python)
- .mutate writes to the in-memory graph. Output from the first
algorithm are written before moving onto the next one in the sequence
- .write writes to the Neo4j database. This is the slowest, so only run it
once you know your algos make sense!
GDS in production
Don’t forget this step:
CALL gds.graph.drop('Similarity-Graph');d
CALL gds.graph.drop('Monopartite-Graph');
Double check that you’ve got all of them:
CALL gds.graph.list();
GDS in production
Time to data science:
GDS is general purpose!
Data Pre-processing:
1. Identifying and removing super nodes: degree centrality
2. Identifying subgraphs: weakly connected components
Common Algorithms & Combinations
Community Detection + ???? = Profit
Community detection algorithms break up your graph into smaller
subgraphs based on edges. Use community detection to downscope
problems on large graphs and focus on the important parts.
- Speed up calculations: Community detection + node similarity
- Focus on the important stuff: Community detection + centrality
- Aggregate your graph: Treat communities as nodes and run centrality,
similarity, etc
Common Algorithms & Combinations
What else can we use GDS for?
Fraud Detection
Weakly Connected
Components - First Party
Fraud
Louvain - Fraud Rings
Page Rank, Degree
Centrality - Anomalies
Disambiguation
Weakly Connected
Components - Common
identifiers
Label Propagation -
Overlapping relationships
Node Similarity
Recommendations
Louvain - Interacting
communities
Page Rank, Betweenness,
Closeness Centrality-
Important users
Node Similarity
And much more!
Transactional graph:
What kind of questions can we answer?
Examples from … Retail
Customer segmentation:
Identify customers who buy similar items (node
similarity), and use Louvain to identify clusters of
consumers with similar behaviour
Item recommendations:
Recommend items that are frequently bought by the
same customers or in the same transaction with node
similarity.
Find items that influence copurchases using Page
Rank, or fast moving items with closeness Centrality
Examples from … Retail
Examples from … Marketing
Web Traffic Graph:
How can we
tell whos
who?
Examples from … Marketing
Disambiguation:
Identify subgraphs of users with co-occurring
identifiers using Weakly Connected Components
User Behavior:
Identify communities of unique users that interact
with the same websites using Label Propagation
Identify which websites drive traffic using Page
Rank
Knowledge graph representing genes, chemicals, diseases:
Examples from … Life Sciences
What questions can we answer?
PageRank & Betweenness to identify essential regulatory genes or
drug targets
Louvain to identify protein regulatory networks
Shortest path to link drug targets to possible outcomes or side
effects
Node Similarity to find structurally similar chemicals
Link Prediction to estimate likelihood of interactions
Examples from … Life Sciences
What are your use
cases? Let’s brainstorm!
Questions?

More Related Content

What's hot

Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricCambridge Semantics
 
Introduction to Neo4j
Introduction to Neo4jIntroduction to Neo4j
Introduction to Neo4jNeo4j
 
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...Neo4j
 
Graph technology meetup slides
Graph technology meetup slidesGraph technology meetup slides
Graph technology meetup slidesSean Mulvehill
 
How do You Graph
How do You GraphHow do You Graph
How do You GraphBen Krug
 
Relationships Matter: Using Connected Data for Better Machine Learning
Relationships Matter: Using Connected Data for Better Machine LearningRelationships Matter: Using Connected Data for Better Machine Learning
Relationships Matter: Using Connected Data for Better Machine LearningNeo4j
 
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data FabricUsing Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data FabricCambridge Semantics
 
3. Relationships Matter: Using Connected Data for Better Machine Learning
3. Relationships Matter: Using Connected Data for Better Machine Learning3. Relationships Matter: Using Connected Data for Better Machine Learning
3. Relationships Matter: Using Connected Data for Better Machine LearningNeo4j
 
Power of the Run Graph
Power of the Run GraphPower of the Run Graph
Power of the Run GraphVaticle
 
The years of the graph: The future of the future is here
The years of the graph: The future of the future is hereThe years of the graph: The future of the future is here
The years of the graph: The future of the future is hereConnected Data World
 
GraphTour - Neo4j Platform Overview
GraphTour - Neo4j Platform OverviewGraphTour - Neo4j Platform Overview
GraphTour - Neo4j Platform OverviewNeo4j
 
5. Building the Cancer Research Data Commons with Neo4j: The Bento Framework
5. Building the Cancer Research Data Commons with Neo4j: The Bento Framework5. Building the Cancer Research Data Commons with Neo4j: The Bento Framework
5. Building the Cancer Research Data Commons with Neo4j: The Bento FrameworkNeo4j
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsConnected Data World
 
GraphTour London 2020 - Customer Journey
GraphTour London 2020  - Customer Journey GraphTour London 2020  - Customer Journey
GraphTour London 2020 - Customer Journey Neo4j
 
Translating the Human Analog to Digital with Graphs
Translating the Human Analog to Digital with GraphsTranslating the Human Analog to Digital with Graphs
Translating the Human Analog to Digital with GraphsNeo4j
 
Software Analytics with Jupyter, Pandas, jQAssistant, and Neo4j [Neo4j Online...
Software Analytics with Jupyter, Pandas, jQAssistant, and Neo4j [Neo4j Online...Software Analytics with Jupyter, Pandas, jQAssistant, and Neo4j [Neo4j Online...
Software Analytics with Jupyter, Pandas, jQAssistant, and Neo4j [Neo4j Online...Markus Harrer
 
The Business Case for Semantic Web Ontology & Knowledge Graph
The Business Case for Semantic Web Ontology & Knowledge GraphThe Business Case for Semantic Web Ontology & Knowledge Graph
The Business Case for Semantic Web Ontology & Knowledge GraphCambridge Semantics
 
Neo4j GraphTalk Amsterdam - Next Generation Solutions using Neo4j
Neo4j GraphTalk Amsterdam - Next Generation Solutions using Neo4jNeo4j GraphTalk Amsterdam - Next Generation Solutions using Neo4j
Neo4j GraphTalk Amsterdam - Next Generation Solutions using Neo4jNeo4j
 
Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?Cambridge Semantics
 

What's hot (20)

Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
 
Introduction to Neo4j
Introduction to Neo4jIntroduction to Neo4j
Introduction to Neo4j
 
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
 
Graph technology meetup slides
Graph technology meetup slidesGraph technology meetup slides
Graph technology meetup slides
 
How do You Graph
How do You GraphHow do You Graph
How do You Graph
 
Relationships Matter: Using Connected Data for Better Machine Learning
Relationships Matter: Using Connected Data for Better Machine LearningRelationships Matter: Using Connected Data for Better Machine Learning
Relationships Matter: Using Connected Data for Better Machine Learning
 
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data FabricUsing Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
 
3. Relationships Matter: Using Connected Data for Better Machine Learning
3. Relationships Matter: Using Connected Data for Better Machine Learning3. Relationships Matter: Using Connected Data for Better Machine Learning
3. Relationships Matter: Using Connected Data for Better Machine Learning
 
Power of the Run Graph
Power of the Run GraphPower of the Run Graph
Power of the Run Graph
 
The years of the graph: The future of the future is here
The years of the graph: The future of the future is hereThe years of the graph: The future of the future is here
The years of the graph: The future of the future is here
 
GraphTour - Neo4j Platform Overview
GraphTour - Neo4j Platform OverviewGraphTour - Neo4j Platform Overview
GraphTour - Neo4j Platform Overview
 
5. Building the Cancer Research Data Commons with Neo4j: The Bento Framework
5. Building the Cancer Research Data Commons with Neo4j: The Bento Framework5. Building the Cancer Research Data Commons with Neo4j: The Bento Framework
5. Building the Cancer Research Data Commons with Neo4j: The Bento Framework
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
GraphTour London 2020 - Customer Journey
GraphTour London 2020  - Customer Journey GraphTour London 2020  - Customer Journey
GraphTour London 2020 - Customer Journey
 
Translating the Human Analog to Digital with Graphs
Translating the Human Analog to Digital with GraphsTranslating the Human Analog to Digital with Graphs
Translating the Human Analog to Digital with Graphs
 
Software Analytics with Jupyter, Pandas, jQAssistant, and Neo4j [Neo4j Online...
Software Analytics with Jupyter, Pandas, jQAssistant, and Neo4j [Neo4j Online...Software Analytics with Jupyter, Pandas, jQAssistant, and Neo4j [Neo4j Online...
Software Analytics with Jupyter, Pandas, jQAssistant, and Neo4j [Neo4j Online...
 
Apouc 2014-business-analytics-and-big-data
Apouc 2014-business-analytics-and-big-dataApouc 2014-business-analytics-and-big-data
Apouc 2014-business-analytics-and-big-data
 
The Business Case for Semantic Web Ontology & Knowledge Graph
The Business Case for Semantic Web Ontology & Knowledge GraphThe Business Case for Semantic Web Ontology & Knowledge Graph
The Business Case for Semantic Web Ontology & Knowledge Graph
 
Neo4j GraphTalk Amsterdam - Next Generation Solutions using Neo4j
Neo4j GraphTalk Amsterdam - Next Generation Solutions using Neo4jNeo4j GraphTalk Amsterdam - Next Generation Solutions using Neo4j
Neo4j GraphTalk Amsterdam - Next Generation Solutions using Neo4j
 
Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?
 

Similar to Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best Practices

Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Doug Needham
 
Cloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamCloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamDoug Needham
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data AnalyticsOsman Ali
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera, Inc.
 
Data Structure Graph DMZ #DMZone
Data Structure Graph DMZ #DMZoneData Structure Graph DMZ #DMZone
Data Structure Graph DMZ #DMZoneDoug Needham
 
Egypt hackathon 2014 analytics & spss session
Egypt hackathon 2014   analytics & spss sessionEgypt hackathon 2014   analytics & spss session
Egypt hackathon 2014 analytics & spss sessionM Baddar
 
aRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RaRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RGraphRM
 
Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science ChallengeMark Nichols, P.E.
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupDoug Needham
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxShanmugasundaram M
 
Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...
Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...
Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...Ali Alkan
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataTrieu Nguyen
 
How to Become a Big Data Professional.pdf
How to Become a Big Data Professional.pdfHow to Become a Big Data Professional.pdf
How to Become a Big Data Professional.pdfCareervira
 
Analytics demystified
Analytics demystifiedAnalytics demystified
Analytics demystifiedMarc Moreau
 
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Daniel Zivkovic
 
Hadoop Demo eConvergence
Hadoop Demo eConvergenceHadoop Demo eConvergence
Hadoop Demo eConvergencekvnnrao
 
Advertising using big data
Advertising using big dataAdvertising using big data
Advertising using big dataRajathMk1
 
Big Data
Big DataBig Data
Big DataNGDATA
 

Similar to Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best Practices (20)

Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights.
 
Cloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamCloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug Needham
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
 
Data Structure Graph DMZ #DMZone
Data Structure Graph DMZ #DMZoneData Structure Graph DMZ #DMZone
Data Structure Graph DMZ #DMZone
 
Egypt hackathon 2014 analytics & spss session
Egypt hackathon 2014   analytics & spss sessionEgypt hackathon 2014   analytics & spss session
Egypt hackathon 2014 analytics & spss session
 
aRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RaRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con R
 
Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science Challenge
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup Group
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Data science guide
Data science guideData science guide
Data science guide
 
Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...
Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...
Intelligently Automating Machine Learning, Artificial Intelligence, and Data ...
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big data
 
How to Become a Big Data Professional.pdf
How to Become a Big Data Professional.pdfHow to Become a Big Data Professional.pdf
How to Become a Big Data Professional.pdf
 
Analytics demystified
Analytics demystifiedAnalytics demystified
Analytics demystified
 
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
 
Hadoop Demo eConvergence
Hadoop Demo eConvergenceHadoop Demo eConvergence
Hadoop Demo eConvergence
 
Advertising using big data
Advertising using big dataAdvertising using big data
Advertising using big data
 
Big Data
Big DataBig Data
Big Data
 

More from Neo4j

QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansQIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansNeo4j
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...Neo4j
 
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosBBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosNeo4j
 
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Neo4j
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Neo4j
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeNeo4j
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsNeo4j
 
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j
 
Neo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j
 
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...Neo4j
 
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AIDeloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AINeo4j
 

More from Neo4j (20)

QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansQIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
 
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosBBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
 
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge Graphs
 
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
 
Neo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with Graph
 
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
 
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AIDeloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
 

Recently uploaded

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 

Recently uploaded (20)

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 

Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best Practices

  • 1. Neo4j GDS Best Practices Graph Data Science Workflow
  • 2. Objective Know how to put together the right workflow for your project, and how to combine algorithms effectively.
  • 4. GDS in production Emphasis on data modeling: (1) Define the problem you’re trying to answer: Making recommendations? Finding anomalies? (2) Match the problem to the correct set of algorithms: recommendation = similarity, anomalies = centrality (3) Modify your data model for the algorithms you want to use - mono- or bi-partite - modify labels, relationships to use native graphs - consider weights, seeding
  • 5. GDS in production Configuration & best practices: (1) Do I have enough memory to run this? (2) How long will this take to run? (3) What do I need to change?
  • 6. GDS in production Loading the analytics graph: ● This is one of the slowest steps, unavoidable, and takes up the most memory so we want to minimize the number of graphs we load ● Is it possible for us to load data for all the algorithms into a single graph?
  • 7. GDS in production Running your algorithms: - .stats returns summary statistics about the results of the algorithm without writing to the database. Run this first to check if the calculations make sense! - .stream returns all the results as a stream - use this if you’re extracting them for use elsewhere (eg. Python) - .mutate writes to the in-memory graph. Output from the first algorithm are written before moving onto the next one in the sequence - .write writes to the Neo4j database. This is the slowest, so only run it once you know your algos make sense!
  • 8. GDS in production Don’t forget this step: CALL gds.graph.drop('Similarity-Graph');d CALL gds.graph.drop('Monopartite-Graph'); Double check that you’ve got all of them: CALL gds.graph.list();
  • 9. GDS in production Time to data science:
  • 10. GDS is general purpose!
  • 11. Data Pre-processing: 1. Identifying and removing super nodes: degree centrality 2. Identifying subgraphs: weakly connected components Common Algorithms & Combinations
  • 12. Community Detection + ???? = Profit Community detection algorithms break up your graph into smaller subgraphs based on edges. Use community detection to downscope problems on large graphs and focus on the important parts. - Speed up calculations: Community detection + node similarity - Focus on the important stuff: Community detection + centrality - Aggregate your graph: Treat communities as nodes and run centrality, similarity, etc Common Algorithms & Combinations
  • 13. What else can we use GDS for? Fraud Detection Weakly Connected Components - First Party Fraud Louvain - Fraud Rings Page Rank, Degree Centrality - Anomalies Disambiguation Weakly Connected Components - Common identifiers Label Propagation - Overlapping relationships Node Similarity Recommendations Louvain - Interacting communities Page Rank, Betweenness, Closeness Centrality- Important users Node Similarity And much more!
  • 14. Transactional graph: What kind of questions can we answer? Examples from … Retail
  • 15. Customer segmentation: Identify customers who buy similar items (node similarity), and use Louvain to identify clusters of consumers with similar behaviour Item recommendations: Recommend items that are frequently bought by the same customers or in the same transaction with node similarity. Find items that influence copurchases using Page Rank, or fast moving items with closeness Centrality Examples from … Retail
  • 16. Examples from … Marketing Web Traffic Graph: How can we tell whos who?
  • 17. Examples from … Marketing Disambiguation: Identify subgraphs of users with co-occurring identifiers using Weakly Connected Components User Behavior: Identify communities of unique users that interact with the same websites using Label Propagation Identify which websites drive traffic using Page Rank
  • 18. Knowledge graph representing genes, chemicals, diseases: Examples from … Life Sciences What questions can we answer?
  • 19. PageRank & Betweenness to identify essential regulatory genes or drug targets Louvain to identify protein regulatory networks Shortest path to link drug targets to possible outcomes or side effects Node Similarity to find structurally similar chemicals Link Prediction to estimate likelihood of interactions Examples from … Life Sciences
  • 20. What are your use cases? Let’s brainstorm!