5. The Trinity: Streams, Tables, Graphs
Streams
● Record history
● Sequence of immutable data records
Tables
● Represent state
● Collection of key-value pairs
Graphs
● Integrate datasets and query across them in near real time
● Graph analytics provide actionable insight
https://docs.confluent.io/current/streams/concepts.html#duality-of-streams-and-tables
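The stream/table duality above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the record shape, the account names, and both helper functions are hypothetical and are not part of Kafka or Neo4j.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (key, value), one immutable change record.
Record = Tuple[str, int]


def stream_to_table(stream: Iterable[Record]) -> Dict[str, int]:
    """Replay a stream in order; the latest value per key is the table state."""
    table: Dict[str, int] = {}
    for key, value in stream:
        table[key] = value
    return table


def table_changes(before: Dict[str, int], after: Dict[str, int]) -> List[Record]:
    """Emit one record per key whose value is new or changed (a changelog stream)."""
    return [(k, v) for k, v in after.items() if before.get(k) != v]


# Fake data: three balance updates for two accounts.
balance_stream = [("alice", 10), ("bob", 5), ("alice", 25)]
balances = stream_to_table(balance_stream)
changelog = table_changes({"alice": 10, "bob": 5}, balances)
```

Replaying the stream yields the table (current state), and comparing two table states yields a stream of changes again, which is the sense in which the two views are interchangeable.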
6. The world is a graph – everything is connected
• people, places, events
• companies, markets
• countries, history, politics
• sciences, art, teaching
• technology, networks, machines, applications, users
• software, code, dependencies, architecture, deployments
• criminals, fraudsters and their behavior
7. Use Cases
Internal Applications
● Master Data Management
● Network and IT Operations
● Fraud Detection
Customer-Facing Applications
● Real-Time Recommendations
● Graph-Based Search
● Identity and Access Management
12. What is Neo4j?
Graph Database
● Database management system (DBMS)
● Property Graph data model
● Cypher query language
● Graph analytics
● Data visualization
● Developer tool for building applications
neo4j.com/
14. Cypher Query Language
CREATE (:Company {name: "Neo4j"})-[:LOCATED_IN]->(:City {name: "San Mateo"})
[Diagram: a NODE with LABEL Company and PROPERTY name "Neo4j", joined by a LOCATED_IN relationship to a NODE with LABEL City and PROPERTY name "San Mateo"]
16. Fraud Detection With Neo4j At Pig E. Bank
Proof of concept goal:
● Combine customer, account, and session data from different systems into Neo4j
● Find suspicious parties and accounts
● Identify potential fraud rings (connected parties) and flag for analyst follow-up
[Diagram labels: Customers, Accounts, Sessions]
17. Evidence of Fraud
○ Suspicious:
■ Shared SSNs, phones, cookies
■ Connected to known fraudsters
[Diagram: Person nodes linked through shared Cookie, SSN, and Phone nodes]
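As a rough illustration of the "shared SSNs, phones, cookies" signal, the sketch below links any two parties that hold the same identifier. The data is fake, and the tuple layout and function name are assumptions made for this example. In the deck the equivalent lookup would be a graph query in Neo4j.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, fake records: (party_id, identifier_type, identifier_value)
observations = [
    (1, "ssn", "000-00-0001"),
    (2, "ssn", "000-00-0001"),
    (2, "phone", "555-0100"),
    (3, "phone", "555-0100"),
    (4, "cookie", "c-42"),
]


def shared_identifier_links(rows):
    """Return {(party_a, party_b): [identifier types the two parties share]}."""
    holders = defaultdict(set)
    for party_id, id_type, id_value in rows:
        holders[(id_type, id_value)].add(party_id)

    links = defaultdict(list)
    for (id_type, _), parties in holders.items():
        # Every pair of parties holding the same identifier becomes a link.
        for a, b in combinations(sorted(parties), 2):
            links[(a, b)].append(id_type)
    return dict(links)


links = shared_identifier_links(observations)
```

Party 4 shares nothing with anyone, so it produces no link. Parties 1, 2, and 3 form a chain through a shared SSN and a shared phone.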
25. Fraud Flagger
1. Links innocents to suspects
2. Suspects: a known fraudster, or anyone connected to one
3. Louvain Community Detection to group all associated parties into candidate fraud rings
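The flagging steps can be approximated with a small standard-library sketch. Two assumptions are worth stating loudly: "connected" is treated here as transitive, and plain connected components stand in for Louvain community detection, which is a modularity-based algorithm and would usually split a large component into smaller groups. The function names and data are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency sets from (party, party) association pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def candidate_rings(edges, known_fraudsters):
    """Connected components that contain at least one known fraudster.

    Stand-in for Louvain: components only, no modularity optimisation.
    """
    adj = build_adjacency(edges)
    fraudsters = set(known_fraudsters)
    seen = set()
    rings = []
    for start in sorted(adj):
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if component & fraudsters:
            rings.append(sorted(component))
    return rings


# Fake data: party 3 is a known fraudster; parties 4 and 5 are unrelated.
edges = [(1, 2), (2, 3), (4, 5)]
rings = candidate_rings(edges, known_fraudsters={3})
```

Here parties 1 and 2 become suspects because they are connected to party 3, while parties 4 and 5 are left alone.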
26. Graph Algorithm Categories in Neo4j
● Pathfinding & Search: Finds optimal paths or evaluates route availability and quality
● Centrality / Importance: Determines the importance of distinct nodes in the network
● Community Detection: Detects group clustering or partition options
● Similarity: Evaluates how alike nodes are
● Link Prediction: Estimates the likelihood of nodes forming a future relationship
neo4j.com/graph-algorithms-book/
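To make two of these categories concrete, here is a minimal sketch of a similarity measure (Jaccard) and a link prediction score (common neighbors) over plain Python sets. The adjacency data and function names are hypothetical, and this does not use the Neo4j graph algorithms library.

```python
def common_neighbors(adj, a, b):
    """Link prediction score: number of neighbors the two nodes share."""
    return len(adj[a] & adj[b])


def jaccard(adj, a, b):
    """Similarity: shared neighbors divided by all neighbors of either node."""
    union = adj[a] | adj[b]
    return len(adj[a] & adj[b]) / len(union) if union else 0.0


# Fake data: each party mapped to the identifiers it is linked to.
adj = {
    "p1": {"ssn1", "phone1"},
    "p2": {"phone1", "cookie1"},
    "p3": {"cookie9"},
}

score_p1_p2 = jaccard(adj, "p1", "p2")
score_p1_p3 = jaccard(adj, "p1", "p3")
shared_p1_p2 = common_neighbors(adj, "p1", "p2")
```

Parties p1 and p2 share one of three distinct identifiers, so their Jaccard score is one third. Parties p1 and p3 share none and score zero.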
27. Graph Algorithms in Neo4j (+35)
Pathfinding & Search
• Parallel Breadth First Search & DFS
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• Minimum Spanning Tree
• A* Shortest Path
• Yen’s K Shortest Path
• K-Spanning Tree (MST)
• Random Walk
Centrality / Importance
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev, Wasserman & Faust
• Betweenness Centrality
• Approximate Betweenness Centrality
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
Community Detection
• Triangle Count
• Clustering Coefficients
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity – 1 Step & Multi-Step
• Balanced Triad (identification)
Similarity
• Euclidean Distance
• Cosine Similarity
• Jaccard Similarity
• Overlap Similarity
• Pearson Similarity
Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocations
• Same Community
• Total Neighbors
neo4j.com/docs/graph-algorithms/current/
Updated June 2019
36. Kafka Connect Sink
● Easiest way to deploy a connector to get data into Neo4j
● Best flexibility to change which data you pull from Kafka and what goes into Neo4j without touching the database
https://www.confluent.io/hub/neo4j/kafka-connect-neo4j
37. Infrastructure at Pig E. Bank
[Diagram labels: Existing Systems (Online Banking, Account Registration, Customer Service); stream1, stream2, stream3; Party Interaction Stream]
38. Configuring Neo4j Streams (docker)
# Sink: take messages from the "cookies" topic and write new Cookie nodes to the graph, matched to the right Party.
NEO4J_kafka_group_id: myconsumer
NEO4J_streams_sink_topic_cypher_cookies: "
  MERGE (c:Cookie { cookie_id: event.cookie_id })
  ON CREATE SET c += event
  MERGE (p:Party { id: toInteger(event.party_id) })
  MERGE (p)-[:COOKIE]->(c)"
NEO4J_streams_sink_enabled: "true"
NEO4J_streams_procedures_enabled: "true"
# Source: whenever a change is made to a Party, or an ASSOCIATED link is created, report that to a topic.
NEO4J_streams_source_enabled: "true"
NEO4J_streams_source_topic_nodes_fraudflags: Party{*}
NEO4J_streams_source_topic_relationships_associations: ASSOCIATED{*}
NEO4J_streams_source_schema_polling_interval: 10000
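The sink template relies on MERGE being idempotent: replaying the same Kafka message should not create duplicate nodes or relationships. The sketch below imitates that behaviour with in-memory dictionaries. It is only an analogy; the `graph` layout and the function name are hypothetical, and the real work is done by Cypher inside Neo4j.

```python
def apply_cookie_event(graph, event):
    """Mimic the sink's MERGE / ON CREATE SET logic for one "cookies" message."""
    cookies, parties, rels = graph["cookies"], graph["parties"], graph["rels"]

    cookie_id = event["cookie_id"]
    if cookie_id not in cookies:
        # MERGE creates the Cookie only if absent; ON CREATE SET copies the event.
        cookies[cookie_id] = dict(event)

    # Integer conversion of event.party_id, then MERGE the Party node.
    party_id = int(event["party_id"])
    parties.setdefault(party_id, {"id": party_id})

    # MERGE (p)-[:COOKIE]->(c): a set keeps the relationship unique.
    rels.add((party_id, "COOKIE", cookie_id))
    return graph


# Fake data: the same message delivered twice, as can happen on a retry.
graph = {"cookies": {}, "parties": {}, "rels": set()}
event = {"cookie_id": "c-42", "party_id": "7"}
apply_cookie_event(graph, event)
apply_cookie_event(graph, event)
```

After both calls the graph still holds one Cookie, one Party, and one COOKIE relationship, which is the property that makes at-least-once delivery from Kafka safe for this template.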
39. Graphs Back to Tables, with a little help from KSQL
● Neo4j-streams publishes change data capture (CDC) events back to Kafka
● Define a stream using KSQL that structures that JSON
● Simple KSQL query over that stream yields all of the cases WHERE fraud_followup OR fraud_confirmed;
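The filter that the KSQL query expresses can be sketched in Python over raw JSON messages. The event shape used here (a `payload` object with an `after` state holding `properties`) is a simplified assumption about the CDC messages, and the function name is hypothetical. The deck itself does this step in KSQL, not Python.

```python
import json


def flagged_cases(messages):
    """Yield the properties of each changed entity whose new state is flagged."""
    for raw in messages:
        event = json.loads(raw)
        # Tolerate events without an "after" state (assumed for deletes).
        after = (event.get("payload") or {}).get("after") or {}
        props = after.get("properties") or {}
        # Same predicate as: WHERE fraud_followup OR fraud_confirmed
        if props.get("fraud_followup") or props.get("fraud_confirmed"):
            yield props


# Fake CDC messages in the assumed, simplified shape.
messages = [
    json.dumps({"payload": {"after": {"properties": {"id": 1, "fraud_followup": True}}}}),
    json.dumps({"payload": {"after": {"properties": {"id": 2, "fraud_followup": False,
                                                     "fraud_confirmed": False}}}}),
    json.dumps({"payload": {"after": None}}),
]
cases = list(flagged_cases(messages))
```

Only the first message survives the filter, giving a flat, table-like row per flagged party.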
51. Investigative GRANDstack App
● React UI fetches data from Neo4j using GraphQL
● View data on parties or “fraud flagged” cases
● Select an active case to begin adjudication analysis
● Graph visualization enables the fraud analyst to explore the connected accounts to verify fraudulent behavior
● Analyst adjudicates the case, updating data in Neo4j, which sends an event to the Kafka fraud stream via neo4j-streams
55. No PII Was Harmed in the Making of this Presentation
You may see phone numbers and Social Security Numbers on screen.
Most of the schema and use case is real, the data is fake.
58. Resources
● Neo4j-streams: integrate Kafka & Neo4j, deploy as a Neo4j plugin or as a connect worker:
○ Code: https://github.com/neo4j-contrib/neo4j-streams
○ Kafka Connect Neo4j Sink: https://www.confluent.io/hub/neo4j/kafka-connect-neo4j
● GRANDStack: GraphQL, React, Apollo, and Neo4j for building rich web applications on graphs https://grandstack.io/
● How to Leverage Neo4j-Streams to build a just-in-time data warehouse https://www.freecodecamp.org/news/how-to-leverage-neo4j-streams-and-build-a-just-in-time-data-warehouse-64adf290f093/
● Neo4j Graph Algorithms https://neo4j.com/docs/graph-algorithms/current/