Graph algorithms are powerful tools, and there’s a lot of excitement about their applications for data science. It can sometimes be difficult, however - especially for those of us who aren’t data scientists - to know how they might be applied to a particular data set or a specific business problem. There are graph algorithms for centrality and importance measurement, community detection, similarity comparison, pathfinding, and link prediction. Which ones should you use on your data, and which ones might be most useful in answering your business questions?

In this presentation, we’ll look at a few examples of Neo4j graph algorithms, and see how they can be applied to data and business problems from the banking industry. We’ll discuss what kinds of data are appropriate for different types of algorithms, show how to model and structure data to work with graph algorithms, and run through some real-world scenarios demonstrating the use of graph algorithms on a sample banking data set.

Webinar with Joe Depeau, Neo4j, April 15, 2020

1. Graph Algorithms in Banking. Joe Depeau, Sr. Presales Consultant, UK. 15th April, 2020. @joedepeau http://linkedin.com/in/joedepeau
2. Agenda: Introduction to Graphs and Neo4j; Introduction to the Neo4j Graph Data Science Library; Demo Data Overview; Review of Graph Algorithms for Demo; Demo; Q&A
3. Introduction to Graphs and Neo4j
4. Relational vs. Graph Databases
5. Graphs in the Age of Connections
6. [Image-only slide]
7. Anatomy of a Property Graph Database. Nodes: represent the objects in the graph; can be labeled. Relationships: relate nodes by type and direction. Properties: name-value pairs that can go on nodes and relationships. [Diagram: two Person nodes (name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975) and a Car node (brand: “Volvo”, model: “V70”), connected by LOVES, LOVES, LIVES WITH, DRIVES, and OWNS relationships; one relationship carries the property since: Jan 10, 2011.]
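The property graph anatomy on slide 7 can be sketched in plain Python as a rough illustration. The class names, the `outgoing` helper, the relationship directions, and the placement of the `since` property on DRIVES are all assumptions; the flattened slide text does not say which person drives or owns the car.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set                      # e.g. {"Person"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int                       # node_id of the source node
    end: int                         # node_id of the target node
    rel_type: str                    # relationships have a type and a direction
    properties: dict = field(default_factory=dict)


dan = Node(1, {"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node(2, {"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node(3, {"Car"}, {"brand": "Volvo", "model": "V70"})

# Directions and the owner of the `since` property are assumed for illustration.
relationships = [
    Relationship(dan.node_id, ann.node_id, "LOVES"),
    Relationship(ann.node_id, dan.node_id, "LOVES"),
    Relationship(dan.node_id, ann.node_id, "LIVES_WITH"),
    Relationship(dan.node_id, car.node_id, "DRIVES", {"since": "Jan 10, 2011"}),
    Relationship(ann.node_id, car.node_id, "OWNS"),
]


def outgoing(node, rels):
    """Return the relationships that start at the given node."""
    return [r for r in rels if r.start == node.node_id]


dan_rel_types = [r.rel_type for r in outgoing(dan, relationships)]
```

Here `dan_rel_types` lists the types of the relationships leaving the Dan node, which is the kind of local traversal a graph database is built to make cheap.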
8. Neo4j Graph Data Science Library
9. What the heck are graph algorithms? Graph algorithms are calculations that describe the topology and connectivity of your graph: global traversals and computations; learning overall structure; typically heuristics and approximations; extracting new data from what you already have. Questions they answer: What’s important? What’s similar? What are efficient traversals?
10. ...and what do I do with them? Explore, plan, measure: find significant patterns and plan for optimal structures; score outcomes and set a threshold value for a prediction. Machine learning: use the measures as features to train an ML model. Example feature table (1st node, 2nd node, common neighbors, preferential attachment, label): (1, 2, 4, 15, 1); (3, 4, 7, 12, 1); (5, 6, 1, 1, 0).
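The feature table on slide 10 pairs two link-prediction measures with a label for each node pair. A minimal sketch of how such rows could be computed, assuming an undirected graph, the usual definitions (common neighbors is the size of the shared neighbor set, preferential attachment is the product of the two degrees), and a made-up edge list; it does not reproduce the numbers on the slide.

```python
from collections import defaultdict

# Hypothetical undirected edges, for illustration only.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]

neighbors = defaultdict(set)
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)


def common_neighbors(u, v):
    """Number of nodes adjacent to both u and v."""
    return len(neighbors[u] & neighbors[v])


def preferential_attachment(u, v):
    """Product of the degrees of u and v."""
    return len(neighbors[u]) * len(neighbors[v])


def feature_row(u, v, label):
    """One training row: (1st node, 2nd node, common neighbors, pref. attachment, label)."""
    return (u, v, common_neighbors(u, v), preferential_attachment(u, v), label)


rows = [feature_row("a", "d", 1), feature_row("a", "e", 0)]
```

Rows like these can then be handed to any classifier as training features, with the label recording whether the relationship actually formed.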
11. Tell me more! Pathfinding & Search: finds optimal paths or evaluates route availability and quality. Centrality / Importance: determines the importance of distinct nodes in the network. Community Detection: detects group clustering or partition options. Similarity: evaluates how alike nodes are by neighbors and relationships. Link Prediction: estimates the likelihood of nodes forming a future relationship.
12. Graph and ML algorithms in Neo4j (https://neo4j.com/docs/graph-data-science/1.0/). Pathfinding & Search: Minimum Weight Spanning Tree; Shortest Path; Single Source Shortest Path; All Pairs Shortest Path; A*; Yen’s K-shortest Paths; Random Walk; Breadth First Search; Depth First Search. Centrality / Importance: Degree Centrality; Closeness Centrality; Betweenness Centrality; PageRank; ArticleRank; Eigenvector Centrality. Community Detection: Triangle Count / Clustering Coefficient; Weakly Connected Components; Strongly Connected Components; Label Propagation; Louvain Modularity; K-1 Colouring; Modularity Optimisation. Similarity: Node Similarity; Approximate Nearest Neighbours; Cosine Similarity; Euclidean Similarity; Jaccard Similarity; Overlap Similarity; Pearson Similarity. Link Prediction: Adamic Adar; Common Neighbours; Preferential Attachment; Resource Allocation; Same Community; Total Neighbours.
13. Demo Data Overview
14. Some Examples of Typical Bank Data. Categories: Event Data; Product and Services Data; Customer Data; Organisational Data; 3rd Party Data. Items shown: Documentation; Employee Data; Processes; Systems and Databases; KPIs and Reports; Address; Personal Data; Documents; Relationships; Assets; Documentation; Processes; Product / Service Details; Product / Service Hierarchy; Pricing; Money Movements; Web / App Activity; Customer Contact; Social Media; Credit Rating Agencies; Market Data; Organisational Hierarchy; Corporate Data.
16. Our Graph Model
17. Three ways a Client node can be flagged: it performed a transaction flagged as fraud; it shares an SSN with another Client; it has more than one SSN on file.
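The three flagging rules on slide 17 can be sketched as a small Python function. The data layout (a mapping from client to the set of SSNs on file, plus a set of clients with fraud-flagged transactions), the reason names, and the sample values are hypothetical; the demo's actual graph model is not reproduced here.

```python
from collections import defaultdict

# Hypothetical sample data, for illustration only.
client_ssns = {
    "c1": {"111"},
    "c2": {"111"},          # shares an SSN with c1
    "c3": {"222", "333"},   # more than one SSN on file
    "c4": {"444"},
    "c5": {"555"},
}
fraud_clients = {"c4"}      # clients that performed a fraud-flagged transaction


def flagged_clients(client_ssns, fraud_clients):
    """Map each flagged client to the set of reasons it was flagged."""
    ssn_owners = defaultdict(set)
    for client, ssns in client_ssns.items():
        for ssn in ssns:
            ssn_owners[ssn].add(client)

    reasons = defaultdict(set)
    for client in fraud_clients:
        reasons[client].add("fraud_transaction")
    for client, ssns in client_ssns.items():
        if len(ssns) > 1:
            reasons[client].add("multiple_ssns")
        if any(len(ssn_owners[ssn]) > 1 for ssn in ssns):
            reasons[client].add("shared_ssn")
    return dict(reasons)


flags = flagged_clients(client_ssns, fraud_clients)
```

Clients that match none of the rules, such as `c5` here, are simply absent from the result.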
18. Graph Algorithms for Demonstration
19. PageRank. What: finds important nodes based on their relationships. Why: identify important or influential Client nodes by quantifying the flows of money towards them. Uses: fraud detection; anti-money laundering; informing prioritization during analysis and investigation.
20. The PageRank Algorithm. PageRank: what nodes can be considered ‘important’ in our graph based on money flows?
21. The PageRank Algorithm. PageRank: what nodes can be considered ‘important’ in our graph based on money flows? [Diagram labels: Inputs; Output: .pagerank property]
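A minimal sketch of the idea behind PageRank on money flows, assuming the textbook power-iteration formulation with a damping factor of 0.85 and scores that sum to 1. This is not the Neo4j Graph Data Science implementation, whose score scaling may differ, and the client names and transfers are made up for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over directed (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical transfers: each pair is (sender, receiver).
transfers = [("alice", "carol"), ("bob", "carol"), ("dave", "carol"), ("carol", "erin")]
scores = pagerank(transfers)
```

In this toy graph `carol`, who receives three transfers, scores above each of her senders, and that score is what a `.pagerank` property on a Client node would hold.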
22. Weakly Connected Components. What: finds disconnected community subgraphs in our data. Why: identify communities based on connections with shared pieces of identity. Uses: householding; synthetic identities; stolen identities.
23. The Weakly Connected Components Algorithm. Weakly Connected Components: what communities exist in the data based on connections to pieces of identity?
24. The Weakly Connected Components Algorithm. Weakly Connected Components: what communities exist in the data based on connections to pieces of identity? [Diagram labels: Inputs; Output: .component_id property]
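Weakly connected components ignore relationship direction and group every node that can be reached from another. A minimal sketch using a union-find structure, assuming made-up client and identity node names; it illustrates the concept and is not the library's implementation.

```python
def weakly_connected_components(nodes, edges):
    """Return {node: component_id}, treating every edge as undirected."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the root, halving the path along the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller name as root so component ids are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return {node: find(node) for node in nodes}


# Hypothetical clients linked to the pieces of identity they use.
nodes = ["client1", "client2", "client3", "ssn_1", "ssn_2", "phone_1"]
edges = [
    ("client1", "ssn_1"),
    ("client2", "ssn_1"),
    ("client2", "phone_1"),
    ("client3", "ssn_2"),
]
component_id = weakly_connected_components(nodes, edges)
```

Here `client1` and `client2` end up with the same `component_id` because they share an SSN, while `client3` sits in a separate component.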
25. Node Similarity. What: similarity between nodes based on neighbours; writes a new relationship to the graph. Why: identify similar nodes that share common pieces of identity. Uses: entity resolution; synthetic identities; stolen identities.
26. The Node Similarity Algorithm. Node Similarity: how similar are two Client nodes based on pieces of shared identity?
27. The Node Similarity Algorithm. Node Similarity: how similar are two Client nodes based on pieces of shared identity? [Diagram labels: Inputs; Output: SIMILAR relationship with .score property]
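Neighbour-based similarity is commonly measured with the Jaccard coefficient: the size of the shared neighbour set divided by the size of the combined set. A minimal sketch under that assumption, with hypothetical clients and identity values; the `cutoff` parameter and function names are illustrative and not the library's API.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(identity, cutoff=0.0):
    """Return (client1, client2, score) for every pair scoring above the cutoff."""
    pairs = []
    for c1, c2 in combinations(sorted(identity), 2):
        score = jaccard(identity[c1], identity[c2])
        if score > cutoff:
            pairs.append((c1, c2, score))
    return pairs


# Hypothetical clients and the identity nodes each one is connected to.
identity = {
    "client1": {"ssn_1", "phone_1", "email_1"},
    "client2": {"ssn_1", "phone_1", "email_2"},
    "client3": {"ssn_2", "phone_3", "email_3"},
}
pairs = similar_pairs(identity)
```

Each returned triple corresponds to one SIMILAR relationship with its `.score` property; pairs with nothing in common produce no relationship.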
28. Louvain Modularity. What: finds communities of connected nodes in our graph; can return intermediate results. Why: useful for identifying communities based on transaction behaviour rather than identity. Uses: fraud ring detection; anti-money laundering.
29. The Louvain Algorithm. Louvain: what communities of nodes transact amongst themselves?
30. The Louvain Algorithm. Louvain: what communities of nodes transact amongst themselves? [Diagram labels: Inputs; Output: .louvain_community property]
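Louvain searches for a community assignment that maximizes modularity, a score comparing the share of edges inside communities with what random wiring would give. The sketch below computes only that score for a given assignment, assuming an undirected, unweighted graph; it is not the Louvain search itself, and the accounts and transactions are made up.

```python
def modularity(edges, community):
    """Modularity Q of a community assignment on an undirected, unweighted graph.

    Q = sum over communities c of: L_c / m - (d_c / (2 * m)) ** 2,
    where m is the edge count, L_c the edges inside c, d_c the total degree in c.
    """
    m = len(edges)
    internal = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items()
    )


# Hypothetical accounts: two tight groups joined by a single transaction.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the accounts into the two tight groups scores higher than lumping them together, which is the preference Louvain exploits when it assigns a `.louvain_community` value to each node.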
31. Demo
32. Q & A
33. Thank you!