Neo4j GraphTour New York_Leveraging Graphs for AI_Neo4j

Presentation from GraphTour New York 2019, held October 16: Extending the Stream-Table Duality Into a Trinity With Graphs, Jennifer Reif, Neo4j

  1. 1. Extending the Stream-Table Duality Into a Trinity With Graphs Jennifer Reif Neo4j jennifer.reif@neo4j.com @JMHReif
  2. 2. OR: “Streams and Tables and Graphs, oh my!”
  3. 3. Jennifer Reif, Neo4j Labs Engineer jennifer.reif@neo4j.com @JMHReif
  4. 4. Stream-Table Duality: A stream can be considered a changelog of a table. A table is a snapshot of the latest value of each key in a stream. Many applications need both streams and tables, and leverage both depending on the use case. Streams: ● Record history ● Sequence of immutable data records. Tables: ● Represent state ● Collection of key-value pairs. Source: https://docs.confluent.io/current/streams/concepts.html#duality-of-streams-and-tables
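The duality described on this slide can be sketched in a few lines of plain Python. This is only an illustration, not code from the talk: the function names, the sample records, and the use of `None` as a deletion marker are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record shape: (key, value); a value of None marks a deletion.
Record = Tuple[str, Optional[str]]


def stream_to_table(stream: Iterable[Record]) -> Dict[str, str]:
    """Replay a changelog stream and keep only the latest value per key."""
    table: Dict[str, str] = {}
    for key, value in stream:
        if value is None:
            table.pop(key, None)  # deletion marker removes the key
        else:
            table[key] = value  # later records overwrite earlier ones
    return table


def table_to_changelog(before: Dict[str, str], after: Dict[str, str]) -> List[Record]:
    """Emit the change records that turn the `before` snapshot into `after`."""
    changes: List[Record] = []
    for key in sorted(before.keys() | after.keys()):
        if key not in after:
            changes.append((key, None))
        elif before.get(key) != after[key]:
            changes.append((key, after[key]))
    return changes


# Made-up sample data: alice moves from NYC to Chicago.
stream = [("alice", "NYC"), ("bob", "Boston"), ("alice", "Chicago")]
table = stream_to_table(stream)
changelog = table_to_changelog({}, table)
```

Replaying `changelog` through `stream_to_table` gives back the same table, which is the sense in which the two views are interchangeable.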
  5. 5. The Trinity: Streams, Tables, Graphs. Streams: ● Record history ● Sequence of immutable data records. Tables: ● Represent state ● Collection of key-value pairs. Graphs: ● Integrate datasets and query across them in near real time ● Graph analytics provide actionable insight. Source: https://docs.confluent.io/current/streams/concepts.html#duality-of-streams-and-tables
  6. 6. The world is a graph – everything is connected • people, places, events • companies, markets • countries, history, politics • sciences, art, teaching • technology, networks, machines, applications, users • software, code, dependencies, architecture, deployments • criminals, fraudsters and their behavior
  7. 7. Use Cases. Internal Applications: ● Master Data Management ● Network and IT Operations ● Fraud Detection. Customer-Facing Applications: ● Real-Time Recommendations ● Graph-Based Search ● Identity and Access Management
  8. 8. Pig E. Bank Paul Pennequin - Software Engineer
  9. 9. No problem! Give me 3 weeks!
  10. 10. Week 1: Getting Started With Graphs
  11. 11. What is Neo4j? Graph Database ● Database management system (DBMS) ● Property Graph data model ● Cypher query language ● Graph analytics ● Data visualization ● Developer tool for building applications neo4j.com/
  12. 12. Property Graph
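  13. 13. Cypher Query Language: CREATE (:Company {name: "Neo4j"})-[:LOCATED_IN]->(:City {name: "San Mateo"}) [Diagram: two nodes, labeled Company and City, each with a name property ("Neo4j", "San Mateo"), joined by a LOCATED_IN relationship; callouts mark NODE, LABEL, and PROPERTY]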
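To make the property graph terms on this slide concrete, here is a minimal in-memory sketch of the same two nodes and one relationship. The `Node` and `Relationship` classes are hypothetical illustrations and are not part of any Neo4j driver or API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node: a set of labels plus key-value properties."""
    labels: frozenset
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """Hypothetical relationship: typed, directed, with optional properties."""
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Mirrors the Cypher statement on the slide.
company = Node(frozenset({"Company"}), {"name": "Neo4j"})
city = Node(frozenset({"City"}), {"name": "San Mateo"})
located_in = Relationship(company, "LOCATED_IN", city)
```

The point of the sketch is that labels and properties live on the nodes, while the relationship carries its own type and direction.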
  14. 14. Fraud Detection With Neo4j At Pig E. Bank. Proof of concept goal: ● Combine customer, account, and session data from different systems into Neo4j ● Find suspicious parties and accounts ● Identify potential fraud rings (connected parties) and flag for analyst follow-up [Diagram labels: Customers, Accounts, Sessions]
  15. 15. Evidence of Fraud ○ Suspicious: ■ Shared SSNs, phones, cookies ■ Connected to known fraudsters [Diagram labels: Person, Person, Person, Cookie, SSN, Phone]
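The "shared SSNs, phones, cookies" signal can be sketched without a database. The example below is a plain-Python stand-in for the Cypher queries shown later in the deck, not a reproduction of them: the function name, the input shape, and every identifier value are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def shared_identifier_pairs(parties):
    """Map each pair of party ids to the identifiers they have in common."""
    owners = defaultdict(set)
    for party_id, identifiers in parties.items():
        for kind, values in identifiers.items():
            for value in values:
                owners[(kind, value)].add(party_id)

    pairs = defaultdict(set)
    for identifier, ids in owners.items():
        # Any identifier with two or more owners links those owners pairwise.
        for a, b in combinations(sorted(ids), 2):
            pairs[(a, b)].add(identifier)
    return dict(pairs)


# Fake sample data (hypothetical party ids and identifier values).
parties = {
    1: {"ssn": ["000-00-0001"], "phone": ["555-0100"], "cookie": ["c1"]},
    2: {"ssn": ["000-00-0002"], "phone": ["555-0100"], "cookie": ["c2"]},
    3: {"ssn": ["000-00-0001"], "phone": ["555-0199"], "cookie": ["c2"]},
    4: {"ssn": ["000-00-0004"], "phone": ["555-0142"], "cookie": ["c4"]},
}
overlaps = shared_identifier_pairs(parties)
```

In a graph database the same question becomes a pattern match over shared identifier nodes, which avoids building the intermediate index by hand.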
  16. 16. Fraud Detection Data Model
  17. 17. LOAD CSV - parties.csv
  18. 18. Querying With Cypher
  19. 19. Find Parties With Overlapping SSN, Phone, Or Cookies
  21. 21. Find Fraud Rings
  23. 23. Fraud Flagger 1. Links innocents to suspects 2. Suspects: a known fraudster, or anyone connected to one 3. Louvain Community Detection to group all associated parties into candidate fraud rings
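The grouping step can be illustrated with a much simpler technique than the one the slide names. The sketch below uses connected components (union-find), not Louvain community detection, so it only shows the idea of "a known fraudster, or anyone connected to one". The function names and sample links are hypothetical.

```python
from collections import defaultdict


def find(parent, x):
    """Return the representative of x's group, shortening the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def candidate_rings(links, known_fraudsters):
    """Group linked parties and keep groups containing a known fraudster.

    Stand-in for the Louvain step: plain connected components.
    """
    parent = {}
    for a, b in links:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    groups = defaultdict(set)
    for node in parent:
        groups[find(parent, node)].add(node)
    return [members for members in groups.values() if members & known_fraudsters]


# Made-up links between party ids; party 3 is the known fraudster.
links = [(1, 2), (2, 3), (4, 5)]
rings = candidate_rings(links, known_fraudsters={3})
suspects = rings[0] - {3}
```

Connected components treats any chain of links as one group, whereas a modularity-based method such as Louvain can split a large component into denser communities, which matters once the graph is big and loosely connected.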
  24. 24. Graph Algorithm Categories in Neo4j ● Pathfinding & Search: Finds optimal paths or evaluates route availability and quality ● Centrality / Importance: Determines the importance of distinct nodes in the network ● Community Detection: Detects group clustering or partition options ● Similarity: Evaluates how alike nodes are ● Link Prediction: Estimates the likelihood of nodes forming a future relationship neo4j.com/graph-algorithms-book/
  25. 25. Graph Algorithms in Neo4j. Pathfinding & Search: • Parallel Breadth First Search & DFS • Shortest Path • Single-Source Shortest Path • All Pairs Shortest Path • Minimum Spanning Tree • A* Shortest Path • Yen’s K Shortest Path • K-Spanning Tree (MST) • Random Walk. Centrality / Importance: • Degree Centrality • Closeness Centrality • CC Variations: Harmonic, Dangalchev, Wasserman & Faust • Betweenness Centrality • Approximate Betweenness Centrality • PageRank • Personalized PageRank • ArticleRank • Eigenvector Centrality. Community Detection: • Triangle Count • Clustering Coefficients • Connected Components (Union Find) • Strongly Connected Components • Label Propagation • Louvain Modularity – 1 Step & Multi-Step • Balanced Triad (identification). Similarity: • Euclidean Distance • Cosine Similarity • Jaccard Similarity • Overlap Similarity • Pearson Similarity. Link Prediction: • Adamic Adar • Common Neighbors • Preferential Attachment • Resource Allocations • Same Community • Total Neighbors. +35. Updated June 2019. neo4j.com/docs/graph-algorithms/current/
  26. 26. Community Detection With Louvain Algorithms: Identify communities around fraudulent actors
  28. 28. OK, but what about updates? We need this on a stream.
  29. 29. Week 2: Bringing in Kafka
  30. 30. Pig E. Bank Architecture - POC [Diagram labels: Customers, Accounts, Sessions; Account Management System {event}; Web API {event}]
  31. 31. Pig E. Bank Architecture - TARGET [Diagram labels: Customers, Accounts, Sessions; Account Management System {event}; Web API {event}; Fraud Detection]
  32. 32. neo4j-streams • Consume messages from a topic, write to the graph • Produce messages from the graph (Debezium format) neo4j.com/labs/kafka/
  33. 33. Kafka Connect Sink ● Easiest way to deploy a connector to get data into Neo4j ● Best flexibility to change which data you pull from Kafka and what goes into Neo4j without touching the database https://www.confluent.io/hub/neo4j/kafka-connect-neo4j
  34. 34. Infrastructure at Pig E. Bank [Diagram labels: Existing Systems: Online Banking, Account Registration, Customer Service; stream1, stream2, stream3; Party Interaction Stream]
  35. 35. Configuring Neo4j Streams (docker). NEO4J_kafka_group_id: myconsumer NEO4J_streams_sink_topic_cypher_cookies: " MERGE (c:Cookie { cookie_id: event.cookie_id }) ON CREATE SET c += event MERGE (p:Party { id: toInt(event.party_id) }) MERGE (p)-[:COOKIE]->(c)" NEO4J_streams_sink_enabled: "true" NEO4J_streams_procedures_enabled: "true" (Take messages from the “cookies” topic and write new cookie nodes to the graph, matched to the right party!) NEO4J_streams_source_enabled: "true" NEO4J_streams_source_topic_nodes_fraudflags: Party{*} NEO4J_streams_source_topic_relationships_associations: ASSOCIATED{*} NEO4J_streams_source_schema_polling_interval: 10000 (Whenever a change is made to a Party or an ASSOCIATED link is created, report that to a topic.)
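The sink statement on this slide relies on MERGE so that repeated or redelivered messages do not create duplicate nodes or relationships. The following is a rough in-memory simulation of that behavior in Python, offered as a sketch only: it is not how neo4j-streams is implemented, and the `graph` dictionary layout and sample events are assumptions.

```python
def apply_cookie_event(graph, event):
    """Simulate the slide's sink statement for one message from the cookies topic."""
    cookie_id = event["cookie_id"]
    party_id = int(event["party_id"])  # stands in for toInt(event.party_id)

    # MERGE (c:Cookie {cookie_id}) ON CREATE SET c += event
    if cookie_id not in graph["cookies"]:
        graph["cookies"][cookie_id] = dict(event)

    # MERGE (p:Party {id})
    graph["parties"].setdefault(party_id, {"id": party_id})

    # MERGE (p)-[:COOKIE]->(c); a set keeps the relationship unique
    graph["cookie_links"].add((party_id, cookie_id))


graph = {"cookies": {}, "parties": {}, "cookie_links": set()}
events = [
    {"cookie_id": "c1", "party_id": "7"},
    {"cookie_id": "c1", "party_id": "7"},  # redelivered message, no new data
    {"cookie_id": "c1", "party_id": "8"},  # same cookie seen for another party
]
for event in events:
    apply_cookie_event(graph, event)
```

After the three events there is one cookie, two parties, and two COOKIE links, so the shared cookie shows up as a connection between parties 7 and 8.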
  36. 36. Graphs Back to Tables, with a little help from KSQL ● Neo4j-streams publishes CDC back to Kafka ● Define a stream using KSQL that structures that JSON ● Simple KSQL query over that stream yields all of the cases WHERE fraud_followup OR fraud_confirmed;
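The filtering step can be pictured as follows. This Python sketch is not KSQL and does not reproduce the stream definition from the talk; it only shows the predicate applied to change events. The JSON layout used here (node properties under `payload.after.properties`) is an assumption about the CDC message shape, and the sample messages are invented.

```python
import json


def flagged_cases(raw_messages):
    """Return node properties for events where either fraud flag is set."""
    cases = []
    for raw in raw_messages:
        event = json.loads(raw)
        # Assumed layout: the node's new state sits under payload.after.properties.
        after = (event.get("payload") or {}).get("after") or {}
        props = after.get("properties") or {}
        if props.get("fraud_followup") or props.get("fraud_confirmed"):
            cases.append(props)
    return cases


# Invented sample messages.
messages = [
    json.dumps({"payload": {"after": {"properties": {"id": 1, "fraud_followup": True}}}}),
    json.dumps({"payload": {"after": {"properties": {"id": 2}}}}),
    json.dumps({"payload": {"after": {"properties": {"id": 3, "fraud_confirmed": True}}}}),
    json.dumps({"payload": {"after": None}}),  # e.g. a deletion with no new state
]
cases = flagged_cases(messages)
```

The flag names `fraud_followup` and `fraud_confirmed` come from the slide; everything else about the message structure should be checked against the events actually produced.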
  37. 37. Confluent Cloud - KSQL Access to Graph Analytics
  38. 38. Week 3: How To Expose Graph Data To Fraud Analyst Team?
  39. 39. Business View [Diagram labels: Account Activity (New Accounts, Online Accesses, Profile Updates); Known Fraudulent Accounts; Auditor Investigation; Account Action; Improved Insight; Close Account; Legal Action; confirmed and suspected fraud cases]
  40. 40. Neo4j Client Drivers
  41. 41. Neo4j Client Drivers ● Cluster routing (bolt+routing://) ● Bolt binary protocol ● Cypher type system ● Authentication / TLS neo4j.com/developer/language-guides/
  42. 42. JavaScript Example
  43. 43. GraphQL
  44. 44. What is GraphQL? An API query language and runtime for building APIs graphql.org
  45. 45. Fullstack GraphQL with GRANDstack ● Fullstack framework for building applications grandstack.io
  46. 46. “Your Application Data Is A Graph” -- GraphQL
  47. 47. Expose A GraphQL API From Neo4j GRANDstack.io
  48. 48. Investigative GRANDstack App: React UI fetches data from Neo4j using GraphQL. View data on parties or “fraud flagged” cases. Select an active case to begin adjudication analysis. Graph visualization enables the fraud analyst to explore the connected accounts to verify fraudulent behavior. The analyst adjudicates the case, updating data in Neo4j, which sends an event to the Kafka fraud stream via neo4j-streams.
  49. 49. Technical View [Diagram labels: Fraud Flagger; Party Interaction Stream; Fraud Suspect Stream; Fraud Confirmation Stream; Identify Suspects; Investigate suspects; Adjudicate cases; Investigative App; Bank Internal Analyst; neo4j-streams]
  50. 50. Demo
  51. 51. Show me something real!
  52. 52. No PII Was Harmed in the Making of this Presentation: You may see phone numbers and Social Security Numbers on screen. Most of the schema and use case are real; the data is fake.
  53. 53. 10x Engineer
  54. 54. Resources
  55. 55. Resources ● Neo4j-streams: integrate Kafka & Neo4j, deploy as a Neo4j plugin or as a Connect worker: ○ Code: https://github.com/neo4j-contrib/neo4j-streams ○ Kafka Connect Neo4j Sink: https://www.confluent.io/hub/neo4j/kafka-connect-neo4j ● GRANDstack: GraphQL, React, Apollo, and Neo4j for building rich web applications on graphs https://grandstack.io/ ● How to Leverage Neo4j-Streams to build a just-in-time data warehouse https://www.freecodecamp.org/news/how-to-leverage-neo4j-streams-and-build-a-just-in-time-data-warehouse-64adf290f093/ ● Neo4j Graph Algorithms https://neo4j.com/docs/graph-algorithms/current/
  56. 56. Graph Based Fraud Detection Resources ● https://maxdemarzi.com/2019/08/19/finding-fraud/ ● https://maxdemarzi.com/2019/08/20/finding-fraud-part-two/
  57. 57. Demo Code github.com/moxious/kafka-summit-fraud-demo jennifer.reif@neo4j.com @JMHReif
  58. 58. github.com/moxious/kafka-summit-fraud-demo jennifer.reif@neo4j.com @JMHReif
  59. 59. neo4jsandbox.com
  60. 60. neo4j.com/developer/get-started/ Neo4j Developer Guides
  61. 61. r.neo4j.com/youtube Neo4j Youtube Channel
  62. 62. medium.com/neo4j Neo4j Developer Blog
  63. 63. community.neo4j.com Neo4j Community Site
  64. 64. neo4j.com/tag/twin4j TWIN4j
  65. 65. neo4jsandbox.com https://neo4jsandbox.com
  66. 66. neo4j.com/graphacademy/neo4j-certification/
