GraphTour London 2020 - Graphs for AI, Amy Hodler

Graphs for AI. A Path for Data Science
Amy Hodler

  1. 1. Graphs & AI: A Path for Data Science. Amy E. Hodler, Director, Graph Analytics & AI Programs, Neo4j. @amyhodler
  2. 2. It’s Not What You Know
  3. 3. It’s Who You Know And Where They Are
  4. 4. Whose pay will increase the most?
  5. 5. Network structure is highly predictive of pay and promotions: • People near structural holes • Organizational misfits. Sources: “Organizational Misfits and the Origins of Brokerage in Intrafirm Networks,” A. Kleinbaum; “Structural Holes and Good Ideas,” R. Burt. Photo by Helena Lopes on Unsplash.
  6. 6. Relationships and Network Structure: Strongest Predictors of Behavior & Complex Outcomes. “Research into networks reveal that, surprisingly, the most connected people inside a tight group within a single industry are less valuable than the people who span the gaps ...” “…jumping from ladder to ladder is a more effective strategy, and that lateral or even downward moves across an organization are more promising in the longer run . . .”
  7. 7. It’s a counter-intuitive notion
  8. 8. Which is why network science is so powerful
  9. 9. Overview: • Network Structure and Predictions • Neo4j for Graph Data Science • Steps of Graph Data Science
  10. 10. Relationships: The Strongest Predictors of Behavior! “Increasingly we're learning that you can make better predictions about people by getting all the information from their friends and their friends’ friends than you can from the information you have about the person themselves” (James Fowler)
  11. 11. Graph Is Accelerating AI Innovation. [Chart: AI Research Papers Featuring Graph Data, 2010 to 2019; y-axis 0 to 7000; data labels 823, 1607, 2439, 3765, 5824. Terms shown: Graph Technology, graph neural network, graph convolutional, graph embedding, graph learning, graph attention, graph kernel, graph completion. Source: Dimensions knowledge system.]
  12. 12. Better Predictions with Graphs, Using the Data You Already Have: • Current data science models ignore network structure • Graphs add highly predictive features to ML models, increasing accuracy • Otherwise unattainable predictions based on relationships. [Diagram: Machine Learning Pipeline]
  13. 13. Neo4j for Graph Analytics, AI and Data Science
        Caterpillar’s AI Supply Chain & Maintenance: • 27 million warranty & service documents parsed for text to knowledge graph • Graph is context for AI to learn “prime examples” and anticipate maintenance • Improves satisfaction and equipment lifespan
        German Center for Diabetes Research (DZD): • Connecting 50 research databases, 100k’s of Excel workbooks, 30 bio-sample databases • Bytes 4 Diabetes Award for use of a knowledge graph, graph analytics, and AI • Customized views for flexible research angles
        Financial Fraud Detection & Recovery (Top 10 Bank): • Almost 70% of credit card fraud was missed • ~1B Nodes and +1B Relationships to analyze • Graph analytics with queries & algorithms help find $ millions of fraud in 1st year
  14. 14. Graph Data Science Applications (just a few examples…): • Predictive Maintenance • Churn Prediction • Fraud Detection • Life Sciences • Recommendations • Cybersecurity • Customer Segmentation • Search/MDM
  15. 15. A Path for Graph Data Science
  16. 16. The Steps of Graph Data Science: • Knowledge Graphs • Graph Analytics • Graph Feature Engineering • Graph Embeddings • Graph Networks. Phases: • Decision Support • Graph Based Predictions • Graph Native Learning
  17. 17. The Steps of Graph Data Science: • Knowledge Graphs • Graph Analytics • Graph Feature Engineering • Graph Embeddings • Graph Networks. Knowledge Graphs: graph search and queries; support domain experts
  18. 18. Knowledge Graph: Connecting the Dots. • Multiple graph layers of financial information • Includes corporate data with cross-relationships and external news. has become...
  19. 19. Knowledge Graph with Queries: Connecting the Dots. Dashboards and tools: • Credit risk • Investment risk • Portfolio news recommendations • Typical analyst portfolio is 200 companies • Custom relative weights. 1 Week Snapshot: 800,000 shortest path calculations for the ranked newsfeed. Each calculation optimized to take approximately 10 ms. has become...
  20. 20. The Steps of Graph Data Science: • Knowledge Graphs • Graph Analytics • Graph Feature Engineering • Graph Embeddings • Graph Networks. Graph Analytics: graph queries & algorithms for offline analysis; understanding structures
  21. 21. Local Patterns: Query (e.g. Cypher). Fast, local decisioning and pattern matching. You know what you’re looking for and are making a decision. Global Computation: Graph Algorithms (e.g. Neo4j Algorithms Library). Global analysis and iterations. You’re learning the overall structure of a network, updating data, and predicting.
  22. 22. Graph Analytics via Queries: Detecting Financial Fraud. Improving existing pipelines to identify fraud via heuristics. Deceptively simple queries: • How many flagged accounts are in the applicant’s network 4+ hops out? • How many login / account variables are in common? Add these metrics to your approval process. Difficult for an RDBMS beyond 3 hops.
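The hop-count question on this slide can be sketched outside the database as a depth-limited breadth-first search. This is a minimal illustration only: the adjacency dictionary, the account names, and the `flagged_within_hops` helper are hypothetical and are not taken from the talk or from any Neo4j API.

```python
from collections import deque


def flagged_within_hops(graph, start, flagged, max_hops):
    """Count flagged accounts reachable from `start` in at most `max_hops` hops."""
    seen = {start}
    queue = deque([(start, 0)])
    count = 0
    while queue:
        node, depth = queue.popleft()
        if node != start and node in flagged:
            count += 1
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return count


# Hypothetical toy network: undirected links between accounts.
graph = {
    "applicant": ["a1", "a2"],
    "a1": ["applicant", "a3"],
    "a2": ["applicant"],
    "a3": ["a1", "a4"],
    "a4": ["a3", "a5"],
    "a5": ["a4"],
}
flagged = {"a3", "a5"}

for hops in (1, 2, 4):
    print(hops, flagged_within_hops(graph, "applicant", flagged, hops))
```

The resulting count is the kind of metric the slide suggests adding to an approval process. In a graph database the same question would typically be a variable-length path query rather than hand-written traversal code.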
  23. 23. Graph Analytics via Algorithms (Generally Unsupervised). A subset of data science algorithms that come from network science, Graph Algorithms enable reasoning about network structure. • Pathfinding and Search • Centrality (Importance) • Community Detection • Heuristic Link Prediction • Similarity
  24. 24. Graph Algorithms & Functions in Neo4j
        Pathfinding & Search: • Shortest Path • Single-Source Shortest Path • All Pairs Shortest Path • A* Shortest Path • Yen’s K Shortest Path • Minimum Weight Spanning Tree • K-Spanning Tree (MST) • Random Walk
        Centrality / Importance: • Degree Centrality • Closeness Centrality • CC Variations: Harmonic, Dangalchev, Wasserman & Faust • Betweenness Centrality & Approximate • PageRank • Personalized PageRank • ArticleRank • Eigenvector Centrality
        Community Detection: • Triangle Count • Clustering Coefficients • Connected Components (Union Find) • Strongly Connected Components • Label Propagation • Louvain Modularity • K-1 Coloring • Balanced Triad (identification)
        Similarity: • Euclidean Distance • Cosine Similarity • Node Similarity (Jaccard) • Overlap Similarity • Pearson Similarity • Approximate KNN
        Link Prediction: • Adamic Adar • Common Neighbors • Preferential Attachment • Resource Allocation • Same Community • Total Neighbors
        ...and also Auxiliary Functions: • Random graph generation • One hot encoding • Distributions & metrics
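To make the link prediction entries concrete, here is a small sketch of five of the listed heuristics using their common network-science definitions. It is not the Neo4j implementation, and results from the library may differ in detail; the function names and the toy graph are assumptions made for illustration.

```python
import math


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def total_neighbors(adj, u, v):
    """Number of distinct neighbors of u or v."""
    return len(adj[u] | adj[v])


def preferential_attachment(adj, u, v):
    """Product of the two node degrees."""
    return len(adj[u]) * len(adj[v])


def adamic_adar(adj, u, v):
    """Shared neighbors weighted by 1 / log(degree), so rare links count more."""
    return sum(
        1.0 / math.log(len(adj[w]))
        for w in adj[u] & adj[v]
        if len(adj[w]) > 1  # guard against log(1) == 0
    )


def resource_allocation(adj, u, v):
    """Shared neighbors weighted by 1 / degree."""
    return sum(1.0 / len(adj[w]) for w in adj[u] & adj[v])


# Hypothetical undirected graph stored as neighbor sets.
adj = {
    "A": {"C", "D"},
    "B": {"C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B", "E"},
    "E": {"D"},
}

print(common_neighbors(adj, "A", "B"))
print(preferential_attachment(adj, "A", "B"))
print(round(adamic_adar(adj, "A", "B"), 4))
```

Higher scores suggest that an unlinked pair such as A and B is more likely to become connected, which is why these measures are described as heuristic link prediction.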
  25. 25. Graph Algorithms: Detecting Financial Fraud. Graph algorithms enable reasoning about network structure: • Louvain to identify communities that frequently interact • PageRank to measure influence and transaction volumes • Connected components to identify disjoint groups sharing identifiers • Jaccard to measure account similarity
  26. 26. The Steps of Graph Data Science: • Knowledge Graphs • Graph Analytics • Graph Feature Engineering • Graph Embeddings • Graph Networks. Graph Feature Engineering: graph algorithms & queries for machine learning; improve prediction accuracy
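Two of the techniques named on this slide are simple enough to sketch in a few lines: connected components over shared identifiers (here via union-find) and Jaccard similarity between accounts. The account names, identifier strings, and helper functions below are hypothetical, and this is an illustration of the idea rather than the library procedures used in the talk.

```python
from collections import defaultdict


def find(parent, x):
    """Return the representative of x's set, shortening paths along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def groups_sharing_identifiers(account_ids):
    """Group accounts that are linked, directly or transitively, by a shared identifier."""
    parent = {acct: acct for acct in account_ids}
    first_owner = {}  # identifier -> first account seen using it
    for acct, ids in account_ids.items():
        for ident in ids:
            if ident in first_owner:
                root_a = find(parent, acct)
                root_b = find(parent, first_owner[ident])
                parent[root_a] = root_b  # merge the two groups
            else:
                first_owner[ident] = acct
    groups = defaultdict(set)
    for acct in account_ids:
        groups[find(parent, acct)].add(acct)
    return sorted(sorted(group) for group in groups.values())


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical accounts and the identifiers attached to each.
account_ids = {
    "acct1": {"phone:555", "email:x"},
    "acct2": {"phone:555", "ssn:1"},
    "acct3": {"ssn:1"},
    "acct4": {"email:y"},
}

print(groups_sharing_identifiers(account_ids))
print(jaccard(account_ids["acct1"], account_ids["acct2"]))
```

Here acct1, acct2, and acct3 form one group even though acct1 and acct3 share nothing directly, which is the transitive linkage that connected components expose.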
  27. 27. Graph Feature Engineering: Feature engineering combines and processes data to create new, more meaningful features, such as clustering or connectivity metrics. Extraction (PaySim Dataset):
        Client          | Betweenness Centrality | Unique Shared Identifiers | Weighted Score | Known Fraudster?
        Jacob Olsen     | 0                      | 1                         | 1              | No
        Kaylee Roach    | 32                     | 2                         | 4              | Yes
        Mackenzie Burns | 0                      | 0                         | 0              | No
        Kayla Knowles   | 192                    | 3                         | 4              | Yes
        Nicholas Jones  | 0                      | 1                         | 2              | No
        John Smith      | 0.08                   | 2                         | 10             | Yes
  28. 28. Graph Feature Engineering: Feature engineering combines and processes data to create new, more meaningful features, such as clustering or connectivity metrics. Machine learning on this table to build a predictive model:
        Client          | Betweenness Centrality | Shared Identifiers | Weighted PageRank | Known Fraudster?
        Jacob Olsen     | 0                      | 1                  | 1                 | No
        Kaylee Roach    | 32                     | 2                  | 4                 | Yes
        Mackenzie Burns | 0                      | 0                  | 0                 | No
        Kayla Knowles   | 192                    | 3                  | 4                 | Yes
        Nicholas Jones  | 0                      | 1                  | 2                 | No
        John Smith      | 0.08                   | 2                  | 10                | Yes
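As a rough sketch of how one column of such a feature table could be produced, the code below computes betweenness centrality with Brandes' algorithm on an unweighted, undirected graph and pairs it with a shared-identifier count per client. The graph, client names, and counts are made up for illustration and are not the PaySim data; the talk itself relies on library algorithms rather than hand-written code.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []  # nodes in order of non-decreasing distance from s
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical client graph: hub "h" links a, b, c; c also links d.
adj = {
    "h": ["a", "b", "c"],
    "a": ["h"],
    "b": ["h"],
    "c": ["h", "d"],
    "d": ["c"],
}
shared_identifiers = {"h": 3, "a": 1, "b": 0, "c": 2, "d": 1}  # made-up counts

bc = betweenness_centrality(adj)
feature_rows = [(client, bc[client], shared_identifiers[client]) for client in sorted(adj)]
for row in feature_rows:
    print(row)
```

Rows like these, joined with a known label column, are what a conventional machine learning model would then be trained on.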
  29. 29. The Steps of Graph Data Science: • Knowledge Graphs • Graph Analytics • Graph Feature Engineering • Graph Embeddings • Graph Networks. Phases: • Decision Support • Graph Based Predictions • Graph Native Learning. FUTURE
  30. 30. Neo4j GDS Library (Preview): Evolving the Graph Algorithms Library for Data Scientists. • Run optimized, parallel algorithms over 10’s of billions of nodes • Production features like seeding for consistency • Scalable in-memory graph model that loads in parallel and can flexibly aggregate & reshape underlying data models • Simplified syntax & API with easy-to-understand guides, warnings, & error messages • Extensive documentation with examples, tips, and browser guides
  31. 31. for Enterprise Graph Data Science: • Neo4j Graph Data Science Library: Practical, Scalable Graph Data Science • Neo4j Database: Native Graph Creation & Persistence • Neo4j Bloom: Graph Exploration & Prototyping. Preview
  32. 32. Business: neo4j.com/use-cases/artificial-intelligence-analytics/ • Data Scientists: neo4j.com/sandbox • Developers: neo4j.com/download • neo4j.com/graph-algorithms-book (Free Until April 15)
  33. 33. “AI is not all about Machine Learning. Context, structure, and reasoning are necessary ingredients, and Knowledge Graphs and Linked Data are key technologies for this.” Wais Bashir, Managing Editor, Onyx Advisory
  34. 34. Amy E. Hodler • @amyhodler • amy.hodler@neo4j.com
