Utilizing Neo4j with AI Applications

Patrick Smith and Brian Rodrigue of Excella present "Utilizing Neo4j with AI Applications" at GraphTour DC.

  1. | @excellaco Utilizing Neo4j with AI Applications. Patrick D. Smith & Brian Rodrigue
  2. Introduction
  3. About Us: Excella has been implementing successful mission-critical IT solutions in the commercial, federal, and non-profit sectors since 2002. Our experts specialize in software development, data and analytics, DevOps, program management, business analysis, and Agile best practices. We also deliver Agile and Scrum training to corporations, government agencies, and associations at our headquarters in Arlington, VA.
  4. About Us: Our data team consists of data scientists, data engineers, and data visualization professionals working across a range of federal and commercial clients. Our experts hold PhDs and master's degrees from universities such as Johns Hopkins, Oxford, Harvard, and the University of Chicago, with expertise in natural language processing and computer vision.
  5. Introduction: We'll introduce three ways to integrate AI into your graph-based systems:
      • Intelligent Retrieval for Graph Ingest
      • Graph Embeddings and Intelligent Graph Reasoning
      • Graph Knowledge Reinforcement
  6. The MAKO Project: AI-Based Decision Making
  7. Excella's AI Research & Development Effort
      1. Advancing the field through active research and development into the most innovative AI methods
      2. Developing AI-based solutions
  8. "How can it be that mathematics, being after all a product of human thought independent of experience, is so admirably adapted to the objects of reality?" (Albert Einstein)
  9. Creating Intelligent Agents. DALE: The Deep Answer Learning Engine
      • An intelligent customer service system that responds to Tier 1 inquiries
      • Uses custom embedding and memory unit structures
      • On test sets: 87% accuracy classifying a tweet as originating from one of 10 Twitter accounts using 190-dimensional LieGr vectors, versus 83% with 50-dimensional GloVe vectors
  10. Our Work: LieGr, GeoNN, and More
      • Mathematical Structure of Networks: We discovered that words, trained end-to-end with RNNs on NLP tasks, tend to naturally embed into a Lie group structure. This connects the "black box" of neural nets to mathematics that has been well understood for over a century.
      • LieGr: Leveraging words' natural embedding structure, we created a basic unsupervised word embedding scheme using special orthogonal Lie groups and the distributional hypothesis.
  11. Our Work: LieGr, GeoNN, and More
      • Geodesic Neural Networks (GeoNN): Generates text without treating words as discrete units, by modeling sequences of words as geodesic flow (the analogue of straight-line motion) on a Lie group. This permits the use of a Generative Adversarial Network (GAN) for training. Sentences are generated deterministically, but the path along which they are generated can be randomly seeded.
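The slides do not spell out how LieGr or GeoNN are trained, so the following is only an illustration of the underlying math, not Excella's method. Any real skew-symmetric $n \times n$ matrix $X$ (an element of the Lie algebra $\mathfrak{so}(n)$) exponentiates to a rotation in $SO(n)$, and for the standard bi-invariant metric the curve $g(t) = g_0 \exp(tX)$ is a geodesic through $g_0$. The mapping `vector_to_skew` below is a hypothetical choice for turning a flat vector into such a generator.

```python
import numpy as np
from scipy.linalg import expm


def vector_to_skew(v: np.ndarray, n: int) -> np.ndarray:
    """Hypothetical mapping: fill the strict upper triangle of an n x n
    matrix with v, then antisymmetrize so that X^T = -X (X is in so(n))."""
    if v.shape != (n * (n - 1) // 2,):
        raise ValueError("v must have n(n-1)/2 entries")
    X = np.zeros((n, n))
    X[np.triu_indices(n, k=1)] = v
    return X - X.T


def geodesic(g0: np.ndarray, X: np.ndarray, ts) -> list:
    """Points g(t) = g0 @ expm(t X) along a geodesic of SO(n)
    under the bi-invariant metric."""
    return [g0 @ expm(t * X) for t in ts]


rng = np.random.default_rng(0)
n = 4
X = vector_to_skew(rng.normal(size=n * (n - 1) // 2), n)
word_rotation = expm(X)  # an orthogonal matrix with determinant +1
path = geodesic(np.eye(n), X, np.linspace(0.0, 1.0, 5))
```

As a side note on dimensions only: $\mathfrak{so}(20)$ has $20 \cdot 19 / 2 = 190$ free parameters, which matches the 190-dimensional LieGr vectors quoted earlier, although the slides do not say the vectors are used this way.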
  12. Aiding Graph with AI: Artificial intelligence applications with deep neural networks can help advance a variety of graph computational problems:
      • node classification
      • node clustering
      • node retrieval/recommendation
      • link prediction
  13. Graph-Based AI for Customer Service: Intelligent Graph-Based Knowledge Retrieval
  14. Business Case
      • A complicated forms process leads to costly mistakes with serious repercussions for applicants
      • Confusion and anxiety lead to frequent calls to the agency to check status or ask questions
      • The ultimate goal is to reduce call volume
  15. Technical Response
      • Create an AI using deep learning to provide initial customer service responses
      • Use a graph to provide context on forms, supporting documents, processing times, costs, eligibility, etc.
  16. The Complement
      • User research showed that there was additional value in opening graph access to end users
      • Allow users to plan their journey and understand the options that fit their situation
      • Add Q&A layers to support interaction
  17. The Stack
      • Initial PoC using LOAD CSV to start building the knowledge graph
      • React front end to allow user interaction and support agile development
      • Py2Neo for AI interaction
      • AWS Cloud, CI pipeline
      • Replacing LOAD CSV with a custom front end for maintenance
      • Automated uploads of updates to costs and processing times
  18. Core Inference Engine
      • Uses a combined CNN/RNN structure to extract sentence meaning
      • Accesses additional structured information from Neo4j that is relevant to the question via a Neural Variational Answer Model
      • Combines the output of the Neo4j NVAM pipeline with the RNN/CNN output in a fully connected layer
      • Uses a generative network component for answer generation
  19. Core Inference Engine [Diagram: Question → LieGr Embeddings → RNN and CNN → Fully Connected Layer (with Neo4j input) → Answer]
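The fusion step in the Core Inference Engine is described only at a high level. Below is a minimal NumPy sketch of the idea under stated assumptions: the RNN and CNN sentence encodings and the vector coming back from the Neo4j answer-model pipeline are taken as given (random placeholders here), concatenated, and passed through one fully connected layer. The feature sizes, hidden size, and ReLU activation are illustrative choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder inputs with assumed sizes; in the real system these would be
# produced by the trained encoders and the Neo4j retrieval pipeline.
rnn_features = rng.normal(size=128)   # e.g. final RNN hidden state
cnn_features = rng.normal(size=64)    # e.g. max-pooled CNN feature maps
graph_features = rng.normal(size=32)  # e.g. vector from the Neo4j answer model


def fully_connected(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """One dense layer with a ReLU activation."""
    return np.maximum(0.0, W @ x + b)


# Concatenate every source into a single input vector for the fusion layer
fused_input = np.concatenate([rnn_features, cnn_features, graph_features])
hidden_size = 100
W = rng.normal(scale=0.05, size=(hidden_size, fused_input.size))
b = np.zeros(hidden_size)
fused = fully_connected(fused_input, W, b)  # would feed the answer generator
```

Concatenation is the simplest way to let one layer weigh text-derived and graph-derived evidence together; gating or attention over the sources are common alternatives.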
  20. Modeling Customer Service Data
      • The customer service knowledge base is modeled on documents
  21. Embeddings
      • Embeddings are low-dimensional vector representations of unstructured data
      • Embeddings store latent information about the data and its structure
      • They are generated by predictive and count-based (dimensionality reduction) models
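To make the "count-based (dimensionality reduction)" family concrete, here is a small sketch: build a word co-occurrence matrix over a sliding window, then keep the top singular directions from an SVD as low-dimensional word vectors. The toy corpus and the window size are invented for illustration.

```python
import numpy as np

corpus = [  # hypothetical toy corpus
    "submit the form online",
    "submit the fee online",
    "check the form status",
    "check the fee amount",
]


def cooccurrence(sentences, window=2):
    """Symmetric word co-occurrence counts within a sliding window."""
    vocab = sorted({w for s in sentences for w in s.split()})
    index = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        words = s.split()
        for i, w in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    M[index[w], index[words[j]]] += 1
    return vocab, M


def svd_embeddings(M, dim):
    """Truncated SVD: keep the top `dim` singular directions as embeddings."""
    U, S, _ = np.linalg.svd(M)
    return U[:, :dim] * S[:dim]


vocab, M = cooccurrence(corpus)
emb = svd_embeddings(M, dim=2)  # one 2-d vector per vocabulary word
```

Predictive models such as word2vec instead learn vectors by training a network to predict context words, but they exploit the same co-occurrence signal.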
  22. Graph Embeddings: Graph embeddings help solve the computational efficiency problem of graph computing by embedding graph structures on a compact manifold. Embedding structures are broken down into:
      • node embedding
      • edge embedding
      • hybrid embedding
      • whole-graph embedding
  23. Graph Embeddings
      • Converting graphs to vector spaces makes computation easier for artificial neural networks
      • It's hard to find meaningful information after traversing several edges away from a node
      • Embeddings make it easier to discover latent information embedded within the data
  24. Graph Embeddings: The learned representations of graph embeddings are useful for machine learning tasks such as node labeling, regression, and edge prediction. Features extracted with these sequence-based graph embedding procedures can be used to predict:
      • social network users' missing ages
      • the category of scientific papers in citation networks
      • the function of proteins in protein-protein interaction networks
  25. Graph Embeddings: Besides supervised learning tasks on nodes, the extracted features can be used for:
      • graph visualization
      • edge prediction
      • community detection
      • structural role identification
  26. Graph Embeddings
      • The first big push in modern graph embedding research was DeepWalk by Perozzi et al., which uses truncated random walks to model sequences.
      • More involved sequence sampling methods include second-order random walks, skips in random walks, and branching processes.
      • More sophisticated models encode the structural role of nodes, yielding representations in line with the multi-level structure of the graph and consequently improving predictive performance on downstream machine learning tasks.
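A minimal sketch of DeepWalk-style sampling: start truncated, uniform random walks from every node, reshuffling the node order on each pass. DeepWalk then treats these walks as "sentences" for a skip-gram model; that training step is omitted here. The toy graph (echoing the audit example later in the deck) and the walk parameters are hypothetical.

```python
import random

# Hypothetical undirected toy graph as an adjacency list
graph = {
    "audit": ["org"],
    "org": ["audit", "grant"],
    "grant": ["org", "agency"],
    "agency": ["grant"],
}


def random_walk(adj, start, length, rng):
    """One truncated, uniform random walk of at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:
            break  # dead end: truncate the walk early
        walk.append(rng.choice(neighbors))
    return walk


def deepwalk_corpus(adj, walks_per_node=10, length=5, seed=0):
    """Walks from every node, shuffling the start order on each pass.
    The resulting walks would be fed to a skip-gram model such as word2vec."""
    rng = random.Random(seed)
    walks = []
    nodes = list(adj)
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        walks.extend(random_walk(adj, node, length, rng) for node in nodes)
    return walks


walks = deepwalk_corpus(graph)
```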
  27. Data Retrieval: The first step in utilizing deep learning on graphs is to extract features:
      • Nodes
      • Pairs: connections and number of common neighbors
      • Groups: existing cluster assignments
      (Perozzi et al.)
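The pair-level features on this slide (whether two nodes are connected, and how many neighbors they share) are straightforward to compute from an adjacency structure. A sketch on a hypothetical four-node graph:

```python
from itertools import combinations

# Hypothetical undirected graph as an adjacency dictionary of sets
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C"},
}


def pair_features(adj):
    """For every unordered node pair: whether the nodes are connected
    and how many neighbors they have in common."""
    feats = {}
    for u, v in combinations(sorted(adj), 2):
        feats[(u, v)] = {
            "connected": v in adj[u],
            "common_neighbors": len(adj[u] & adj[v]),
        }
    return feats


features = pair_features(adj)
```

For example, `B` and `D` are not linked but share two neighbors (`A` and `C`), which is exactly the kind of signal a link-prediction model can learn from.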
  28. Embedding Structures: GEMSEC
      GEMSEC: Embeddings with Clustering
      • A graph embedding scheme that learns embeddings and latent clusters at the same time
      • Similar representations for nodes that have similar sampled neighborhoods
      • Probabilistic model on graphs: minimizes the negative log-likelihood of observed neighborhood samples
      (Rozemberczki et al.)
  29. Embedding Structures: GEMSEC
      GEMSEC: Embeddings with Clustering
      • Clusters from GEMSEC provide the basis for information retrieval
      • We use the trained GEMSEC model to determine which cluster an incoming information point lies in
      • Once we have the cluster, we use a selection model to determine which information is most relevant
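The retrieval flow above can be sketched in two steps, assuming GEMSEC training has already produced node embeddings, per-node cluster labels, and cluster centres: assign the query vector to its nearest centre, then rank only the nodes in that cluster. The slides do not describe the selection model, so the cosine-similarity ranking here is a stand-in, and all arrays are hypothetical.

```python
import numpy as np


def assign_cluster(query_vec, centroids):
    """Index of the nearest cluster centre (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(centroids - query_vec, axis=1)))


def most_relevant(query_vec, node_embs, node_clusters, centroids, k=2):
    """Stand-in selection step: rank nodes in the query's cluster by
    cosine similarity and return (cluster, top-k node indices)."""
    c = assign_cluster(query_vec, centroids)
    candidates = np.flatnonzero(node_clusters == c)
    embs = node_embs[candidates]
    sims = embs @ query_vec / (
        np.linalg.norm(embs, axis=1) * np.linalg.norm(query_vec) + 1e-12
    )
    return c, candidates[np.argsort(-sims)[:k]].tolist()


# Hypothetical outputs of a trained embedding/clustering model
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
node_embs = np.array([[0.1, 0.0], [0.0, 0.2], [5.0, 4.9], [4.0, 5.5], [6.0, 6.0]])
node_clusters = np.array([0, 0, 1, 1, 1])
query = np.array([4.8, 5.0])
result = most_relevant(query, node_embs, node_clusters, centroids)
```

Restricting the search to one cluster is what makes the clusters "the basis for information retrieval": only a fraction of the graph has to be scored per query.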
  30. Variational Inference Answer Selection
      • Proposed as an answer selection model for question answering tasks
      • Employs a latent attention mechanism
      • Given a question q, it finds a set of answer sentences associated with q
      • The answer set determines the context vector, which captures the words in the answer sentences that are most prominent for predicting whether an answer matches the current question. This enables the model to learn subtleties inherent in the questions.
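The attention idea on this slide can be illustrated with a deliberately simplified, deterministic sketch: weight each answer word by its affinity to the question, form a context vector from the weighted words, and score the answer by how well that context matches the question. This omits the variational latent variable entirely, so it is not the model the slide describes; the vectors are hypothetical.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def attentive_answer_score(q_vec, answer_word_vecs):
    """Question-conditioned attention over answer words, then a cosine
    match between the attended context vector and the question."""
    weights = softmax(answer_word_vecs @ q_vec)  # one weight per answer word
    context = weights @ answer_word_vecs         # weighted sum of word vectors
    score = context @ q_vec / (np.linalg.norm(context) * np.linalg.norm(q_vec) + 1e-12)
    return score, weights


def rank_answers(q_vec, answers):
    """Answer indices ordered from best to worst match."""
    scores = [attentive_answer_score(q_vec, a)[0] for a in answers]
    return [int(i) for i in np.argsort(scores)[::-1]]


# Hypothetical 2-d word vectors for a question and two candidate answers
q = np.array([1.0, 0.0])
answers = [np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([[0.0, 1.0], [0.0, 1.0]])]
ranking = rank_answers(q, answers)
```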
  31. Network Operation
      • The fully connected layer identifies a need for information and sends a "query" to the information retrieval model (NASM)
      • The query is vectorized via LieGr and sent to NASM
      • NASM uses an embedded graph representation as input
      [Diagram: Fully Connected Layer, Neo4j, Answer]
  32. Utilizing Graph for Intelligent Fraud Detection: AI-Based Graph Reasoning
  33. Our System
  34. Intelligent Ingest [Diagram: Documents with Fraud, Intelligent Retrieval, Graph Reasoning, Neo4j]
  36. Unstructured Data to Graph
      PDF to Text
      • Python PDFMiner extracts unstructured text information
      Structured Data Extraction
      • OCR techniques for extracting tables and figures from PDF files
      • Uses Tesseract OCR for recognition
  37. Unstructured Data to Graph
      Speech to Text
      • Takes in call center audio data as an addition to the PDF-to-text pipeline
      • Uses the Google Cloud Speech-to-Text API
      • 96% accuracy on call center audio files
  38. Unstructured Data to Graph
      Extracting Entities: Stanford Named Entity Recognizer
      • Conditional Random Fields (CRF) model: a discriminative sequence modeling method
      • Entities: Person, Location, Organization
      • Trained on both British and American newswire, so robust across both domains
      • Optimized with L-BFGS
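A CRF tagger like the one above labels each token (for the three-class model: PERSON, LOCATION, ORGANIZATION, or O for "outside"). Before entities can become graph nodes, consecutive tokens with the same tag have to be merged into spans. A sketch of that post-processing step, assuming the tagger output is a list of `(token, tag)` pairs; the example sentence is invented.

```python
def group_entities(tagged_tokens):
    """Merge consecutive tokens that share a non-'O' tag into entities.

    `tagged_tokens` is a list of (token, tag) pairs, e.g. the 3-class
    output of a CRF NER tagger (PERSON, LOCATION, ORGANIZATION, O).
    """
    entities = []
    current_tokens, current_tag = [], None
    for token, tag in tagged_tokens:
        if tag == current_tag and tag != "O":
            current_tokens.append(token)  # continue the current entity
            continue
        if current_tag not in (None, "O"):
            entities.append((" ".join(current_tokens), current_tag))
        current_tokens, current_tag = [token], tag
    if current_tag not in (None, "O"):
        entities.append((" ".join(current_tokens), current_tag))
    return entities


tagged = [  # hypothetical tagger output
    ("Aurum", "ORGANIZATION"), ("Institute", "ORGANIZATION"),
    ("received", "O"), ("funds", "O"), ("in", "O"),
    ("Washington", "LOCATION"), (".", "O"),
]
entities = group_entities(tagged)
```

One limitation of this simple scheme: two different entities of the same type that appear back to back are merged into one span, because the three-class tag set has no begin/inside distinction.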
  39. Neural Variational Inference
      • NVDM (Neural Variational Document Model): a generative model for probabilistic document modeling
      • Combines unsupervised variational autoencoders with generative approaches
      • Unlike traditional models, the NVDM provides a dynamic, variational model of the text's distribution
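The core of NVDM-style training is the variational autoencoder objective: an encoder maps a bag-of-words vector to the mean and log-variance of a Gaussian latent $z$, a sample is drawn with the reparameterization trick, a softmax decoder scores the words, and the loss is reconstruction error plus the KL divergence to a standard normal prior. A forward-pass-only NumPy sketch; the layer sizes are arbitrary and the random weights stand in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden, latent = 6, 8, 3

# Random weights stand in for trained parameters (assumption)
W_enc = rng.normal(scale=0.1, size=(hidden, vocab_size))
W_mu = rng.normal(scale=0.1, size=(latent, hidden))
W_logvar = rng.normal(scale=0.1, size=(latent, hidden))
R = rng.normal(scale=0.1, size=(vocab_size, latent))  # decoder weights
b = np.zeros(vocab_size)


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))


def neg_elbo(bow):
    """One-sample estimate of the negative ELBO for a bag-of-words vector."""
    h = np.maximum(0.0, W_enc @ bow)          # MLP encoder
    mu, logvar = W_mu @ h, W_logvar @ h       # parameters of q(z | document)
    eps = rng.standard_normal(latent)
    z = mu + np.exp(0.5 * logvar) * eps       # reparameterization trick
    logits = R @ z + b
    m = logits.max()
    log_probs = logits - (m + np.log(np.exp(logits - m).sum()))  # log-softmax
    recon = -(bow @ log_probs)                # -log p(document | z)
    return recon + gaussian_kl(mu, logvar)


bow = np.array([2.0, 0.0, 1.0, 0.0, 0.0, 1.0])  # hypothetical word counts
loss = neg_elbo(bow)
```

Sampling through `mu + sigma * eps` rather than directly from the Gaussian is what lets gradients flow back into the encoder during training.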
  40. Intelligent Ingest [Diagram: Documents with Fraud, Intelligent Retrieval, Graph Reasoning, Neo4j]
  41. Unstructured Data to Graph
      Intelligent Search
      • Crawl and extract further structured and unstructured data from websites using the Python-based Selenium API (, public information databases, etc.)
      • Searches are handled with respect to nodes: we iterate through the nodes and run searches on metadata relating to each entity to enrich the dataset
      • Information pulled from and pushed back to nodes is handled with Py2Neo
  42. Graph Reasoning [Diagram: Documents with Fraud, Intelligent Retrieval, Graph Reasoning, Neo4j]
  43. Effective Graph Analytics: What types of machine learning can we do on graphs?
      • node classification
      • node clustering
      • node retrieval/recommendation
      • link prediction
  44. Knowledge Graph Reinforcement
  45. Graph-Based Reasoning: Reasoning over large-scale knowledge graphs. One option is to use traditional graph algorithms supported in Neo4j:
      • PageRank to determine entity importance
      • Path-finding algorithms for relationship modeling
      • Label Propagation for group recognition
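In practice you would run Neo4j's built-in PageRank, but the algorithm itself is short enough to show directly. A power-iteration sketch on a hypothetical directed graph with nodes numbered `0..n-1`: each step redistributes rank along out-links with damping factor 0.85, and nodes without out-links spread their rank uniformly.

```python
import numpy as np


def pagerank(edges, n, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed graph with nodes 0..n-1.
    Dangling nodes (no out-links) spread their rank uniformly."""
    out_deg = np.zeros(n)
    for u, _ in edges:
        out_deg[u] += 1
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = np.full(n, (1.0 - damping) / n)      # teleportation term
        new += damping * rank[out_deg == 0].sum() / n
        for u, v in edges:
            new[v] += damping * rank[u] / out_deg[u]
        if np.abs(new - rank).sum() < tol:
            return new
        rank = new
    return rank


# Hypothetical graph: node 2 receives links from both node 1 and node 3
edges = [(0, 1), (1, 2), (2, 0), (3, 2)]
ranks = pagerank(edges, n=4)
```

Node 3 has no incoming links, so it keeps only the teleportation share $(1 - 0.85)/4 = 0.0375$, while node 2, with two incoming links, ends up the most "important" entity.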
  46. Graph-Based Reasoning: GEMSEC Embeddings
      • Pull nodes and relationships from Neo4j to feed into our embedding pipeline, which outputs clusters and embeddings
      • Shows an 8.79% improvement over previous deep methods in predicting a related grouping
      • GEMSEC built with GPU-enabled TensorFlow
  47. Pulling Data for Reasoning
      CREATE p = (audit:audit {number: 'A-04-17-01003'})-[:AUDITS]->(organization:organization {name: 'AURUM INSTITUTE'})-[:manages]->(grant:grant {name: 'PEPFAR'})-[:ISSUED_BY]->(agency:agency {name: 'CDC'}) RETURN p
      For our downstream predictive tasks, we pull nodes and their immediate relationships, or chains of relationships.
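The statement on this slide creates a sample path; the query used to pull data back out is not shown. The sketch below assumes a simple read-only Cypher query that returns every relationship as a (source, type, target) row, and converts those rows into an edge list plus a node-index map that an embedding pipeline can consume. `PULL_QUERY` and `records_to_edges` are hypothetical names, and the py2neo connection in the comment uses placeholder credentials and is not executed.

```python
# Assumed read-only query: one row per relationship in the graph.
PULL_QUERY = """
MATCH (a)-[r]->(b)
RETURN id(a) AS source, type(r) AS rel, id(b) AS target
"""


def records_to_edges(records):
    """Turn rows like {'source': 1, 'rel': 'AUDITS', 'target': 2} into an
    edge list plus a node-id -> row-index map for an embedding pipeline."""
    edges = [(r["source"], r["target"], r["rel"]) for r in records]
    nodes = sorted({node for s, t, _ in edges for node in (s, t)})
    index = {node_id: i for i, node_id in enumerate(nodes)}
    return edges, index


# With py2neo (not executed here; URI and credentials are placeholders):
#   from py2neo import Graph
#   graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
#   records = graph.run(PULL_QUERY).data()
records = [  # hypothetical rows shaped like the query output
    {"source": 10, "rel": "AUDITS", "target": 11},
    {"source": 11, "rel": "manages", "target": 12},
]
edges, index = records_to_edges(records)
```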
  48. Graph-Based Reasoning
      Downstream Prediction Tasks:
      • Is this fraud or not? The graph and its embeddings hold all of our latent information
      • We used a recurrent neural network with a single softmax output layer, trained end to end, for our fraud detection predictions
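A forward-pass sketch of the classifier shape described above: a vanilla (Elman) RNN reads a sequence of node embeddings, for example the chain audit → organization → grant → agency, and a single softmax layer turns the final hidden state into fraud / not-fraud probabilities. The talk does not give the cell type or sizes, so those are assumptions, and the weights here are untrained random values.

```python
import numpy as np

rng = np.random.default_rng(1)
emb_dim, hidden, n_classes = 16, 32, 2  # classes: [not fraud, fraud]

# Untrained, randomly initialized parameters (assumption)
W_xh = rng.normal(scale=0.1, size=(hidden, emb_dim))
W_hh = rng.normal(scale=0.1, size=(hidden, hidden))
b_h = np.zeros(hidden)
W_out = rng.normal(scale=0.1, size=(n_classes, hidden))
b_out = np.zeros(n_classes)


def classify_sequence(node_embeddings):
    """Vanilla RNN over a sequence of node embeddings, followed by a
    single softmax output layer."""
    h = np.zeros(hidden)
    for x in node_embeddings:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    logits = W_out @ h + b_out
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()  # [P(not fraud), P(fraud)]


# Hypothetical embeddings for a four-node relationship chain
path_embeddings = rng.normal(size=(4, emb_dim))
probs = classify_sequence(path_embeddings)
```

Training end to end would backpropagate a cross-entropy loss on `probs` through every weight matrix above.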
  49. Other Examples of Reasoning: Graph embeddings let us harness the power of graphs while using a standard suite of machine learning and deep learning methods on downstream tasks:
      • Perozzi and Skiena showed in 2015 that graph embeddings can be used for downstream age prediction in social networks
      • Graph Convolutional Networks: Kipf & Welling introduced a structure for modeling arbitrarily structured graphs
  50. Other Examples of Reasoning: Convolutional Graph Networks. For these models, the goal is to learn a function of signals/features on a graph that takes as input:
      • A feature description $x_i$ for every node $i$, summarized in an $N \times D$ feature matrix $X$ ($N$: number of nodes, $D$: number of input features)
      • A representative description of the graph structure in matrix form, typically an adjacency matrix $A$ (or some function thereof)
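Given $X$ and $A$, one layer of the Kipf & Welling GCN applies the propagation rule $H' = \mathrm{ReLU}(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H W)$, where $\tilde{A} = A + I$ adds self-loops, $\tilde{D}$ is the degree matrix of $\tilde{A}$, $H = X$ for the first layer, and $W$ is a trainable weight matrix. A NumPy sketch of a single layer on a hypothetical four-node path graph, with random weights in place of trained ones:

```python
import numpy as np


def gcn_layer(A, H, W):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W),
    with D the degree matrix of A + I."""
    A_tilde = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric normalization
    return np.maximum(0.0, A_hat @ H @ W)


# N = 4 nodes in a path graph, D = 3 input features, 2 output features
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 3))  # N x D feature matrix
W = np.random.default_rng(1).normal(size=(3, 2))  # weights (random, untrained)
H1 = gcn_layer(A, X, W)
```

Because $W$ is shared across nodes and $\tilde{A}$ only mixes each node with its neighbors, each layer widens a node's receptive field by one hop.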
  51. Other Examples of Reasoning: Convolutional Graph Networks
      • Reduces the complexity of the training procedures
      • Powerful, but the learned structures cannot be transferred to other graphs
  52. Closing: Why Utilize AI Methods for Graph Analysis?
  53. Closing
      • Graph embeddings are a powerful means of utilizing your graph-based data for deep learning
      • Embedding structures can aid ingest, in-graph, and downstream post-graph predictive tasks
      • Still a long way to go: how can we more closely integrate graph and deep learning?
  55. Patrick D. Smith & Brian Rodrigue