Leveraging Graph Analytics for Fraud Detection in PaySim Data
1. Graph Analytics for
Fraud Detection
Using PaySim and the Neo4j Graph Data Science Library
Dave Voutila <dave.voutila@neo4j.com>
Sales Engineer
2. ● Sales Engineer with Neo4j! 👋
● Based in Vermont, USA 🌲🍁⛰
○ Work primarily with our Canadian clients
● You can find me on…
○ ...the web: https://sisu.io
○ ...LinkedIn: https://www.linkedin.com/in/davevoutila/
○ ...GitHub: https://github.com/voutilad
○ In the hive of scum and villainy aka Twitter: @voutilad
Who am I?
3. ● Generating realistic, synthetic financial
transactions with PaySim
● Quick rundown of the Neo4j
Graph Data Science Library
● Live Demo of using graph algorithms to
analyze PaySim for fraudulent and risky
behavior
A Tale in 3 Acts
5. Meet PaySim 👋
● Simulates actors in a mobile
money network
○ Clients
○ Merchants
○ Banks
● Generate synthetic data that
is realistic in the aggregate
● Open source, customizable
● DETERMINISTIC!
7. ● Parameterized simulation of Client transactions
● Some fraud simulation, specifically money mules
PaySim v1 & v2-snapshot
8. ● 1st Party / Synthetic Fraud
○ Reuse of identifiers (SSN, email, phone)
○ Fabrication of identifiers
● 3rd Party Fraud
○ Attacks via Merchant vectors
○ Persistence and retargeting of victims
● You can more easily build your own fraudsters
● And a bunch more knobs and dials
PaySim v2.3 (my fork)
9. An Aside: 1st Party & Synthetic Fraud
See: https://sisu.io/posts/paysim-part3/
13. What is Graph data science?
Graph Data Science is a science-driven approach to gain knowledge from the relationships and structures in data, typically to power predictions.
Data scientists use relationships to answer questions.
14. Query (e.g. Cypher): Local Patterns
● Real-time, local decisioning and pattern matching
● You know what you’re looking for and making a decision
Graph Algorithms: Global Computation
● Global analysis and iterations
● You’re learning the overall structure of a network, updating data, and predicting
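To make the local/global contrast concrete, here is a minimal sketch in plain Python rather than Cypher or the GDS library. The toy edge list and both function names are assumptions made up for illustration. The first function answers a question about a single node's neighbourhood; the second iterates over the whole graph in the style of PageRank.

```python
# Hypothetical toy graph: (sender, receiver) money transfers.
EDGES = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "a"), ("d", "c")]


def paid_and_paid_by(edges, node):
    """Local pattern: counterparties that `node` both paid and was paid by."""
    paid = {dst for src, dst in edges if src == node}
    paid_by = {src for src, dst in edges if dst == node}
    return paid & paid_by


def pagerank(edges, damping=0.85, iterations=50):
    """Global computation: power-iteration PageRank over every node."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for src, targets in out_links.items():
            if not targets:
                continue
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


local_answer = paid_and_paid_by(EDGES, "a")
global_scores = pagerank(EDGES)
```

The local check only ever touches the edges around one node, which is why it suits real-time decisioning; the PageRank loop has to revisit every node on every iteration before any single score is meaningful.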
15. Graph Algorithms & Functions in Neo4j
Pathfinding & Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• A* Shortest Path
• Yen’s K Shortest Path
• Minimum Weight Spanning Tree
• K-Spanning Tree (MST)
• Random Walk
Centrality / Importance
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev, Wasserman & Faust
• Betweenness Centrality & Approximate
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
Community Detection
• Triangle Count
• Clustering Coefficients
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity
• Balanced Triad (identification)
• K-1 Coloring
• Modularity Optimization
Similarity
• Euclidean Distance
• Cosine Similarity
• Node Similarity (Jaccard)
• Overlap Similarity
• Pearson Similarity
• Approximate KNN
Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocations
• Same Community
• Total Neighbors
...and also Auxiliary Functions:
• Random graph generation
• One hot encoding
• Distributions & metrics
16. It’s easier than it sounds (promise)
The GDS doesn’t operate using the Neo4j kernel API
17. Graph Algorithms for Detecting Fraud
Graph algorithms enable reasoning about network structure
● Louvain to identify communities that frequently interact
● PageRank to measure influence and transaction volumes
● Connected components to identify disjoint groups sharing identifiers
● Jaccard to measure account similarity
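Two of these ideas are small enough to sketch in plain Python. This is only a stand-in for the corresponding GDS algorithms, not their implementation; the client names, the identifier strings, and the function names are all hypothetical. Connected components are found here with a simple union-find, and Jaccard similarity is the size of the intersection divided by the size of the union.

```python
# Hypothetical clients and the identifiers attached to their accounts.
CLIENTS = {
    "c1": {"ssn:111", "phone:555"},
    "c2": {"phone:555", "email:x@example.com"},
    "c3": {"email:x@example.com"},
    "c4": {"ssn:999"},
}


def jaccard(a, b):
    """Jaccard similarity of two collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def components_by_shared_identifier(identifiers_by_client):
    """Group clients that are linked, directly or transitively, by an identifier."""
    parent = {client: client for client in identifiers_by_client}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner = {}
    for client, identifiers in identifiers_by_client.items():
        for identifier in identifiers:
            owner = first_owner.setdefault(identifier, client)
            parent[find(client)] = find(owner)  # union the two groups

    groups = {}
    for client in identifiers_by_client:
        groups.setdefault(find(client), set()).add(client)
    return sorted(groups.values(), key=sorted)


components = components_by_shared_identifier(CLIENTS)
similarity = jaccard(CLIENTS["c1"], CLIENTS["c2"])
```

In this toy data c1 and c3 share nothing directly, yet they land in the same component through c2, which is the kind of indirect link that identifier reuse tends to create.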
19. ● At each step, the fraudster has a probability of
committing fraud
● If they’re feeling malicious…
○ They have a probability of
re-victimizing someone
○ Or they’ll find a new victim via a high
risk Merchant that they target
(peeking into their history)
● They perform test charges (payments)
● Subsequently may transfer balance
Meet our 3rd Party Fraudster
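The behaviour described above can be sketched as a small seeded simulation. This is a toy written in Python for illustration and is not PaySim's actual (Java) logic: the probabilities, the merchant history table, and the function name are all assumptions. Seeding the random generator also shows why a simulation like this can be deterministic.

```python
import random

# Hypothetical history of which clients paid which high-risk merchant.
MERCHANT_HISTORY = {"m1": ["c1", "c2", "c3"], "m2": ["c4", "c5"]}


def simulate_fraudster(seed, steps, fraud_probability=0.3,
                       revictimize_probability=0.5, transfer_probability=0.5):
    """Return a list of (step, action, victim) events for one toy fraudster."""
    rng = random.Random(seed)  # same seed, same event sequence
    victims = []
    events = []
    for step in range(steps):
        if rng.random() >= fraud_probability:
            continue  # not feeling malicious at this step
        if victims and rng.random() < revictimize_probability:
            victim = rng.choice(victims)  # re-victimize someone
        else:
            # Peek into a targeted merchant's history for a new victim.
            merchant = rng.choice(sorted(MERCHANT_HISTORY))
            fresh = [c for c in MERCHANT_HISTORY[merchant] if c not in victims]
            victim = rng.choice(fresh) if fresh else rng.choice(victims)
            if victim not in victims:
                victims.append(victim)
        events.append((step, "PAYMENT", victim))  # test charge
        if rng.random() < transfer_probability:
            events.append((step, "TRANSFER", victim))  # drain the balance
    return events


run_a = simulate_fraudster(seed=42, steps=200)
run_b = simulate_fraudster(seed=42, steps=200)
```

Because both runs share a seed, `run_a` and `run_b` are identical, so any analysis built on the generated data can be reproduced exactly.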
20. ● The Goal
○ Find unreported Fraud Victims
○ Find at-risk individuals
● The Approach
○ Build a training set of clients
○ Engineer some sort of risk score for merchants (our alleged
fraud vector)
○ Use Client transaction history with Merchants to categorize
them as likely fraud victims
Our mission
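The slides do not spell out how the merchant risk score is engineered, so the following Python sketch is one hypothetical choice rather than the method used in the demo. It scores a merchant by the share of its labelled clients who are known victims, then scores an unlabelled client by the riskiest merchant they have used. All names and data are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (client, merchant) transaction pairs.
TRANSACTIONS = [
    ("v1", "m1"), ("v2", "m1"), ("n1", "m1"),
    ("n1", "m2"), ("n2", "m2"),
    ("u1", "m1"), ("u2", "m2"),
]
KNOWN_VICTIMS = {"v1", "v2"}
KNOWN_NON_VICTIMS = {"n1", "n2"}


def merchant_risk_scores(transactions, victims, non_victims):
    """Share of each merchant's labelled clients who are known victims."""
    labelled = victims | non_victims
    clients_by_merchant = defaultdict(set)
    for client, merchant in transactions:
        if client in labelled:
            clients_by_merchant[merchant].add(client)
    return {
        merchant: len(clients & victims) / len(clients)
        for merchant, clients in clients_by_merchant.items()
    }


def client_risk(transactions, scores, client):
    """Highest risk score among the merchants this client has used."""
    used = {m for c, m in transactions if c == client}
    return max((scores.get(m, 0.0) for m in used), default=0.0)


scores = merchant_risk_scores(TRANSACTIONS, KNOWN_VICTIMS, KNOWN_NON_VICTIMS)
risk_u1 = client_risk(TRANSACTIONS, scores, "u1")
risk_u2 = client_risk(TRANSACTIONS, scores, "u2")
```

Taking the maximum is a deliberately simple rule: one visit to a risky merchant is enough to flag a client. An average or a visit-weighted sum would be equally plausible alternatives.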
21. ● PaySim generated
○ 1.6M Transactions
○ ~10k Clients
○ 500 Merchants
● The graph
○ 1.6M nodes (98% transactions)
○ 5M relationships (we’ll be making more)
Our Playground
22. ● Known Fraud Victims
○ Folks that reported fraudulent charges
○ In the case of PaySim, we are all-seeing
● Known Non-Victims
○ What’s a term for non-victims anyway?!
○ These are accounts that have no fraud
Our Training Set
23. ● We’ll primarily be relating Clients to Merchants
Bipartite Graphs
Clients
Merchants
Transactions
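A common thing to do with a bipartite Client/Merchant graph is to project it onto one side, for example linking two clients whenever they have used the same merchant. The Python sketch below illustrates that idea on made-up data; it is an assumption about how such a projection could look, not the projection used in the demo.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (client, merchant) pairs derived from transactions.
PAIRS = [("c1", "m1"), ("c2", "m1"), ("c1", "m2"), ("c2", "m2"), ("c3", "m2")]


def project_clients(client_merchant_pairs):
    """Map each client pair to the number of merchants they have in common."""
    clients_by_merchant = defaultdict(set)
    for client, merchant in client_merchant_pairs:
        clients_by_merchant[merchant].add(client)
    shared = defaultdict(int)
    for clients in clients_by_merchant.values():
        # Every pair of clients at the same merchant gains one shared merchant.
        for a, b in combinations(sorted(clients), 2):
            shared[(a, b)] += 1
    return dict(shared)


client_graph = project_clients(PAIRS)
```

The resulting weights give a client-to-client graph on which similarity or community algorithms can run without touching the Transaction nodes at all.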
29. ● Those ~650 high-risk client accounts...
○ Can similarity routines reveal
anything?
○ What if we look at additional historical
transactions?
● That suspect merchant…
○ What can we glean from their activity?
● Operationalizing our findings...
○ How can we implement mutable
graph projections?
Possible next steps in our investigation
30. ● Make your own Fraudsters
○ https://github.com/voutilad/paysim
○ https://www.sisu.io/posts/paysim
○ Requires Java JDK 8 or newer (tested with 11)
● Integrate PaySim with Neo4j
○ https://github.com/voutilad/paysim-demo
○ Works with both Neo4j 3.5 and 4.0
Your Turn: Getting & Using PaySim