AI, Knowledge Representation
and Graph Databases -
Key Trends in Data Science
Dan McCreary
Social Data Science Meetup
March 2nd, 2019
1
Talk Description
Knowledge Representation is a key focus for most modern AI texts. Many AI
experts feel that over half of their work is understanding how to find the
right knowledge structures to build intelligent agents that can continuously
learn and respond to changing events in their world. In 2012, a paper
published by Google started a consolidation of the many diverse forms of
knowledge representation into a single general-purpose structure called a
labeled property graph.
This talk will describe the key events behind this movement and show how a
new generation of data scientists will be needed to build and maintain
corporate knowledge graphs that contain uniform, normalized and highly
connected data sets for use by researchers and intelligent agents. We will
also discuss the challenges of transferring siloed project knowledge to
reusable structures.
2
Hello, my name is
dan.mccreary@gmail.com
• Distinguished Engineer in AI and Graph Technologies at
Optum’s Advanced Technology Collaborative
• Co-founder of "NoSQL Now!" conference (now part of
Dataversity)
• Author of "Making Sense of NoSQL" (w. Ann Kelly)
• 15+ years of working with non-tabular knowledge
representations
• Background in solution architecture, metadata management,
NLP, semantics, text analytics and knowledge
representation for AI
• Disclaimer: All opinions are my own and may not reflect the
views of my employer
3
Graph a “NoSQL” Data Architecture
Relational, Analytical (OLAP), Key-Value,
Column-Family, Document, Graph
See Chapter 1: https://www.manning.com/books/making-sense-of-nosql
4
Graph Databases are Hot!
5
Relational vs. Graph
6
Relational (row store)
1. Atomic unit of storage is a row of a table
2. Data is appended one row at a time
3. All rows within a table must have the same
structure and no variations within a table are
allowed
4. Table structures are fixed after design
5. Query language is SQL
6. Joins are log(N) searches against other tables
(Figure: a table with columns ID, NAME, DATE, AMOUNT)
Graph
1. Atomic units of storage are nodes and edges
2. Each node and edge may have independent
properties that are determined at run time
(schema agnostic)
3. Joins between nodes and edges are computed at
load time and are stored as memory pointers
4. Each core hops through 2M edges/second (1,000x
faster than joins)
5. Query language varies although there are some
standards, e.g. SPARQL, Cypher, Gremlin
Gartner on Graph Analytics
Key Analytics Trends for 2019
1. Augmented Analytics
2. Augmented Data Management
3. Continuous Intelligence
4. Explainable AI
5. Graph
6. Data Fabric
7. NLP/Conversational Analytics
8. Commercial AI/ML
9. Blockchain
10. Persistent Memory Servers
7
…Graph processing to continuously accelerate data
preparation and enable more complex and adaptive
data science…to efficiently model, explore and query
data with complex interrelationships across data
silos…the need to ask complex questions across data
silos which is not practical or even possible at scale
using SQL queries.
Graphs are also related to 4, 6 and 7
Which of the following organizations use graph databases?
• Every major airline uses a graph database to calculate fares in real-time
• Over half of retailers use graphs for product recommendations
Answer: They all do!
8
Amazon Product Graph Job Posting
As a leader in e-commerce, Amazon is building an authoritative knowledge base for every product in the world. With hundreds of
millions of customers and billions of products, Amazon will offer a challenging but fun journey to turn this big and rapidly changing data
into high-quality knowledge to impact customer experiences across Amazon from Alexa to Search to Shopping. As a member of the
Product Graph team, based in Seattle, you will play a key role in the establishment of a new platform, with opportunities to create
enormous benefit for our customers and Amazon.
9
How do we store
knowledge?
• What exactly is “knowledge”?
• How is knowledge different from raw data and
information?
• What is knowledge engineering?
• What is knowledge architecture?
• How does this relate to data science?
10
11
The Data Science Lifecycle
12
What do we mean by
“Understanding”?
2018: Graphs Join with Deep Learning
How did we get here?
https://arxiv.org/pdf/1806.01261.pdf
13
Why Metaphors Matter
“Metaphors drive design decisions”
GraphConnect Keynote 2018
We make decisions not by having a deep
understanding of how technology works,
but through being exposed to the right
metaphors.
Hilary Mason
Data Scientist
https://neo4j.com/graphconnect-2018/ around 51 minutes in
14
Four Graph Metaphors
15
Neighborhood Walk
(explains index-free adjacency, performance)
Knowledge Triangle
(explains data, information and knowledge)
The Open World Assumption
(explains graph integration, agility)
The Jenga Tower
(explains resilience of graphs to change)
How do you get to your neighbor’s house?
• You walk out your door and over to the house
• Your houses are “Adjacent” so getting there is a direct “hop”
• You don’t need to consult anyone else about how to get to a
neighbor’s house because you have the right pointer
16
Your Logical Graph Model
• If two physical items are related, they have a relationship arc between
them and we model it as in the diagram above
• We build a “logical” data model that has this link
• In a native graph, the vertices are loaded into memory and the
physical memory address of each link is stored in each of the
nodes
17
Dan LIVES_NEXT_TO Ann
Relational Databases Use Indexes
● You must walk to a central index system
● The index system does a “search” given the house’s address
● The index system will tell you how to get to your neighbor’s house
18
Central
Index
Search: 123 Main St.
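The contrast between the neighborhood walk and the central index can be sketched in Python. This is a minimal illustration, not how any particular database is implemented: the `House` class, the second address, and the `bisect`-based lookup are assumptions made for the example. Following a stored reference is a single hop, while a sorted index needs a search whose cost grows with log(N).

```python
import bisect


class House:
    """A vertex that stores direct references to its neighbors."""

    def __init__(self, address):
        self.address = address
        self.neighbors = []  # direct object references ("pointers")


# Graph style: the link is stored on the node itself.
dan = House("123 Main St.")
ann = House("125 Main St.")  # hypothetical address for illustration
dan.neighbors.append(ann)
ann.neighbors.append(dan)


def hop_to_neighbor(house):
    """One hop: follow the stored reference, no search needed."""
    return house.neighbors[0]


# Index style: every lookup goes through a central sorted index.
houses = sorted([dan, ann], key=lambda h: h.address)
index_keys = [h.address for h in houses]


def lookup_by_address(address):
    """Binary search over the sorted keys, O(log N) comparisons."""
    pos = bisect.bisect_left(index_keys, address)
    if pos < len(index_keys) and index_keys[pos] == address:
        return houses[pos]
    return None
```

With two houses the difference is invisible; the point is that the hop cost stays constant as the neighborhood grows, while the index search does not.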
The Knowledge Triangle Metaphor
19
https://en.wikipedia.org/wiki/DIKW_pyramid
• Diagram for representing the
relationships between data,
information, knowledge, and wisdom
• Too often we focus on “Big Data” and
not enough on connected knowledge
and transferrable knowledge
• Graphs are connected information
concepts
• Wisdom is reusable across multiple
contexts
• Can we capture knowledge in a form
that can be reused across multiple
domains?
Data - Binary, Codes, Data Lakes
Information - Concepts
Knowledge -
Patterns Relationships
Wisdom
& AI
From Raw Data to Wisdom with Continuous Enrichment
20
Knowledge that can be reused
across multiple contexts –
Transfer Learning
Data
Lake
Information
Knowledge
Graph
Wisdom
Raw data dumped from a
database, log files or
documents
Tagged Text
Connections
Consistent
De-duplicated
Semantics
Concepts
Reusable To New Problems
Definitions
Validity
Searchable
Continuous
Enrichment
Structure
Definition: The arrangement of and relations between the parts or elements
of something complex
• The real world has lots of structure
• How is structure captured in a machine learning feature?
• If we take simple “features” out of the real world do we lose structure?
• Do our brains “extract” features? (answer: no)
21
The Adversarial Turtle
99% of modern image recognition is just simple (but precise) texture matching
https://www.theverge.com/2017/11/2/16597276/google-ai-image-attacks-adversarial-turtle-rifle-3d-printed
• Use a 3D printer to print a turtle
• Place different “textures” on the shell
• Most image recognition systems fail
• CNNs: “On a very fundamental level,
our work highlights how far current
CNNs are from learning the 'true'
structure of the world”
22
rifle
rifle
Our Brains are Graphs
100B Neurons
10K Connections per Neuron (degree)
23
Three Eras of Computing
Procedural Code
(Rules)
Programs
Data Answers
Explanations (Why)
Machine
Learning
Data
Answers
Rules
(10M weights)
1) Procedural Era
2) Machine Learning Era
3) Graph Era
Data
Answers
Explanations (Why)
Knowledge
Machine
Learning
24
Graph Timeline
25
1736: Euler solves the 7 Bridges of Königsberg
2001: Sir Tim’s vision, RDF
2005: On Intelligence
2008: W3C SPARQL
2010: Labeled Property Graph, Neo4j 1.0
May 2012: Google’s Knowledge Graph, “Things Not Strings”
Sept 2012: AlexNet
2018: LPG in AI
Graphs Rise
The Birth of the Semantic Web
• May 2001
• Resource Description Framework - RDF
• Keep it simple: Triples all the way down
• Universal Identifiers: URIs
• Ideal for data interchange
• The problem: reification
• When adding a simple attribute to a relationship
causes 10,000 SPARQL queries to become
obsolete
Subject Object
Property
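The reification problem can be illustrated with plain Python tuples standing in for triples. This is a sketch under stated assumptions: the `ex:` names and the `since` attribute are hypothetical, and only `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` come from the RDF vocabulary. Attaching an attribute to a relationship means describing the statement itself as a resource, so a query written against the original single triple would need to be rewritten to see the new shape.

```python
# A triple is (subject, property, object).
triples = [("ex:Dan", "ex:livesNextTo", "ex:Ann")]


def reify(subject, prop, obj, statement_id, extra):
    """Describe a statement as a resource so attributes can be attached."""
    result = [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", prop),
        (statement_id, "rdf:object", obj),
    ]
    # Each extra attribute becomes one more triple about the statement.
    result += [(statement_id, key, value) for key, value in extra.items()]
    return result


reified = reify("ex:Dan", "ex:livesNextTo", "ex:Ann",
                "ex:stmt1", {"ex:since": "2015"})


def neighbors_of(data, person):
    """A query written for the simple, un-reified shape."""
    return [o for s, p, o in data if s == person and p == "ex:livesNextTo"]
```

Here `neighbors_of(triples, "ex:Dan")` finds Ann, but the same query over `reified` finds nothing, because the relationship is now spread across five triples.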
26
The Semantic Web Stack
27
Graph Model
Graph Query
Blockchain
2010 Neo4j 1.0 Released
• Neo4j used a new graph data model called a Labeled Property Graph (LPG)
• Each vertex and edge has its own set of properties (key-value pairs)
• Each edge must have a single type
• Vertices can have 0-N types (labels)
• Adding a new property to a relationship does not require you to rewrite
your queries! They solved the reification problem but kept the flexibility of
graphs!
• Developers LOVE Neo4j!
Vertex Vertex
Properties
Properties
Properties
Edge
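The labeled property graph structure described on this slide can be sketched with two small Python dataclasses. This is an illustration only, not Neo4j's storage format or API: the `Vertex` and `Edge` classes, the `Author` label, and the `since` property are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    labels: set                                     # 0..N labels
    properties: dict = field(default_factory=dict)  # key-value pairs


@dataclass
class Edge:
    source: Vertex
    target: Vertex
    type: str                                       # exactly one type
    properties: dict = field(default_factory=dict)  # key-value pairs


dan = Vertex({"Person"}, {"name": "Dan"})
ann = Vertex({"Person", "Author"}, {"name": "Ann"})
edges = [Edge(dan, ann, "LIVES_NEXT_TO")]


def neighbor_names(edge_list, person):
    """A query that only looks at the edge type and its endpoints."""
    return [e.target.properties["name"] for e in edge_list
            if e.source is person and e.type == "LIVES_NEXT_TO"]


before = neighbor_names(edges, dan)
# Add a property to the relationship; the existing query is untouched.
edges[0].properties["since"] = 2015
after = neighbor_names(edges, dan)
```

Because the new property lives on the edge itself, `before` and `after` are identical and no query needs rewriting.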
28
2012 Google Introduces the Knowledge Graph
29
Google Knowledge Summary
30
Knowledge
Graph
Summary
Chest
Pain
Angina
Ischemic
Chest Pain
About
Symptoms
Treatments
Google Knowledge Graph
Results: Better Search
• 110 Billion Concepts and an API to get vertices but no edges.
• Relationships are too proprietary!
31
23andMe Ancestry Graph
Better relationship insights
32
Graphs and the “Open World”
33
Past: Closed World
• Everything is prohibited until it is
permitted
• You can only add data that you model
Graph: Open World
• Everything is permitted until it is
prohibited
• Anyone can easily add any data at any
time without disruption of services
Also known as “schemaless” or “schema agnostic”
The Jenga Tower Metaphor
• What happens to your existing queries when you make a change to your data model?
• LPGs allow anyone to add properties to vertices or relationships without disrupting
other queries
34
Add a property to your model
1,000 queries need to be rewritten
TigerGraph
• First commercial distributed native graph
product to fully support the LPG data
model
• Scales to 100B vertices on commodity
hardware
• Support for subgraphs (lightweight
security)
• Supports distributed ACID transactions
using multi-version concurrency control
• Large library of graph algorithms
35
Sample Graph Algorithms
36
Dependencies
• Failure chains
• Order of operations
Clustering
• Finding related items
• Friends, fraud networks
Similarity
• Similar paths and patterns
Matching/Categorizing
• Look for and tag specific patterns
Flow/Cost
• Optimize costs based on
routing
• Path optimization
Centrality/Search
• Which nodes are the most
connected or relevant?
API Server Auth
Data
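Two of the algorithm families above, dependencies and centrality, can be sketched with the Python standard library. The graph below is an assumption built loosely from the diagram labels (API, Server, Auth, Data) plus a hypothetical `UI` node; it is not taken from any real system.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each key depends on the listed values.
depends_on = {
    "UI": {"API"},
    "API": {"Server", "Auth"},
    "Auth": {"Data"},
    "Server": {"Data"},
    "Data": set(),
}

# Dependencies / order of operations: start what others depend on first.
start_order = list(TopologicalSorter(depends_on).static_order())

# Centrality: count how many links touch each node (degree centrality).
degree = {node: 0 for node in depends_on}
for node, deps in depends_on.items():
    for dep in deps:
        degree[node] += 1
        degree[dep] += 1

most_connected = max(degree, key=degree.get)
```

Reading `start_order` from left to right gives a safe start-up sequence, and reading it in reverse shows what is affected when an early node fails.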
Pinterest Interest Graph
better ad targeting
37
Billions of interest concepts (an ontology)
Graphs Store Concept Distance
• How similar are concepts?
• How can the distance in a graph help us find the underlying intent of the questions?
• How can this help us build automated chatbots?
38
Baby Infant
Child
Chat Question: We are planning to have a new [baby, infant, child], what are my benefits?
Chatbot: Here is a link to your maternity benefits.
Maternity
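Concept distance can be approximated as the number of hops between two concepts. The sketch below uses breadth-first search over a tiny hand-made graph; the concept names beyond those on the slide and every edge are assumptions for illustration, not part of any real ontology.

```python
from collections import deque

# Hypothetical concept graph; edges link closely related concepts.
concept_graph = {
    "Baby": ["Infant", "Maternity"],
    "Infant": ["Baby", "Child"],
    "Child": ["Infant"],
    "Maternity": ["Baby", "Benefits"],
    "Benefits": ["Maternity"],
}


def concept_distance(graph, start, goal):
    """Breadth-first search: number of hops on the shortest path."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None  # the two concepts are not connected
```

A chatbot could map each of "baby", "infant" and "child" to the nearest benefits concept by picking the candidate with the smallest distance.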
Normalized Google Distance (NGD)
A semantic similarity measure derived from the number of hits
returned by the Google search engine for a given set of keywords.
39
1. "Shakespeare" returns 130M pages
2. "Macbeth" returns 26M pages
3. "Shakespeare Macbeth" returns 20.8M pages
NGD(x, y) = (max{log f(x), log f(y)} − log f(x, y)) / (log N − min{log f(x), log f(y)})
where N is the total number of web pages searched by Google multiplied by
the average number of singleton search terms occurring on pages; f(x) and f(y)
are the number of hits for search terms x and y, respectively; and f(x, y) is the
number of web pages on which both x and y occur.
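The definition above can be turned into a few lines of Python. The hit counts are the Shakespeare and Macbeth figures from the slide; the slide does not give a value for N, so `ASSUMED_N` is a placeholder chosen only to make the example runnable, and the resulting number should not be read as a real measurement.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """NGD = (max(log fx, log fy) - log fxy) / (log n - min(log fx, log fy))."""
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# ASSUMPTION: N is not stated in the source; 1e12 is a placeholder.
ASSUMED_N = 1e12

ngd = normalized_google_distance(
    fx=130e6,    # "Shakespeare"
    fy=26e6,     # "Macbeth"
    fxy=20.8e6,  # "Shakespeare Macbeth"
    n=ASSUMED_N,
)
```

Values near 0 mean the two terms almost always appear together; larger values mean they are less related.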
Data Lakes Today
• Very little reuse of code to “understand” and link data
• Few people use rules engines and ML to link data
40
Data Lake
(Data Swamp?)
100s of Data Scientists
100s of R and Python Libraries
80% of effort is “Data Engineering”
20% is Data Science
Data Access Code
From Data Scientist to Knowledge Scientist
41
Data
Information
Knowledge
Graph
Wisdom
Data Engineering
Insights
70% of
your time
Fast Path with Feedback
Slow Path
Causality: The Apocryphal Cancer Gene Story
42
The Smoking Gene Theory (1950) The Correct Causal Relationship (1969)
Smoking
Gene
Lung
Cancer
Urge to
Smoke
Lung
Cancer
Tar
Smoking
20 million preventable deaths From “The Book of Why” by Judea Pearl
?
Explainable AI Models are Graphs
43
https://www.darpa.mil/program/explainable-artificial-intelligence
Today
Training Data → Machine Learning Process → Learned Function → Decision or Recommendation → Task
• Why did you do that?
• Why not something else?
• When do you succeed?
• When do you fail?
• When can I trust you?
• How do I correct an error?
Explainable AI
Training Data → New Machine Learning Process → Explainable Model → Explanation Interface → Task
• I understand why
• I understand why not
• I know when you succeed
• I know when you fail
• I know when to trust you
• I know why you erred
Model-based Machine Learning
44
• Model-based machine learning requires the user to create a model of the world in the form of a graph. This
model encodes the assumptions you make, such as how a change in one variable changes another variable
(causal relationships).
• Model-based machine learning allows fewer algorithms and universal inference
Many Algorithms
Flat Data
Hidden Assumptions
Many Algorithms
Specific Solution
Graph Data
Assumptions in Graph
Structure
One Algorithm
Universal Inference
General Solution
Few Algorithms
http://www.mbmlbook.com
2005: On Intelligence
The key to Artificial Intelligence has
always been the representation.
Jeff Hawkins
Sensory Data
Abstract Concepts
Hierarchical Temporal Memory (HTM)
45
Edge Computing Converts
Sensors into Concepts
General CPU Hardware vs. Graph Hardware
• Most graph traversal algorithms only use simple pointer hopping
• How efficient are CPUs and GPUs at running graph algorithms?
• No need for floating point
• No need for matrix multiplication
46
CPU
1,000 Instructions
Available
(1503 defined x86
instructions)
100
Instructions
Used
GPUs are optimized for array processing
• GPUs were designed to support video games
• GPUs are optimized for highly parallel matrix multiplication
• Inefficient for graph “hop” calculations
47
NVIDIA "Pascal" GP100
What’s Coming: Graph Hardware!
Graphcore – graph in hardware - $350M in funding
48
AlexNet
• A convolutional neural
network (CNN), designed by
Alex Krizhevsky
• First image recognition
program to use GPUs
• Beat other teams by a huge
margin (about 10%) in 2012
49
AlexNet in Graphcore
• Each layer in the algorithm maps to a
region of the graph
• Initial layers are convolutional layers
• The colors represent connection density
50
Different Algorithms Look Different
51
ResNet50:
“AI Brain Scan”
https://www.wired.co.uk/gallery/machine-learning-graphcore-pictures-inside-ai
52
Dan’s Predictions
• Graph technology will continue to gain relevance in analytics
• LPGs will be the dominant graph data model
• RDF/SPARQL might only be used in niche areas
• Once a standard query language is adopted the ability to reuse
algorithms will grow quickly
• The number of graph databases and middle-tier algorithm products
will continue to grow
53
Recommendations
• Become a knowledge scientist!
• Learn a bit about graph modeling and graph algorithms
• Think about structure when doing feature design
• Understand the causal relationships between your data (Bayesian Graphs)
• Use graph databases when you have lots of relationships or want real-time
analytics (rules engines, recommendations)
• Build knowledge APIs, not just more libraries for data lakes
• Learn a few graph algorithms
• Similarity
• Clustering
• Recommendation
54
Resources
• Dan’s Medium Blog:
https://medium.com/@dmccreary
• Machine Learning with Python
• A Comprehensive Guide to Graph Algorithms
in Neo4j, by Mark Needham & Amy E. Hodler
• Wikipedia Articles
• Graph Databases
• Similarity (Network Science)
• Normalized Google Distance
• Explainable AI
55
Thank You!
Please send e-mail to Dan.McCreary@gmail.com if you want a copy of
the slides.
56

 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxAleenaJamil4
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxellehsormae
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 

Recently uploaded (20)

RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptx
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptx
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 

AI, Knowledge Representation and Graph Databases: Key Trends in Data Science

  • 1. AI, Knowledge Representation and Graph Databases - Key Trends in Data Science Dan McCreary Social Data Science Meetup March 2nd, 2019 1
  • 2. Talk Description Knowledge Representation is a key focus for most modern AI texts. Many AI experts feel that over half of their work is understanding how to find the right knowledge structures to build intelligent agents that can continuously learn and respond to changing events in their world. In 2012, a paper published by Google started a consolidation of the many diverse forms of knowledge representation into a single general-purpose structure called a labeled property graph. This talk will describe the key events behind this movement and show how a new generation of data scientists will be needed to build and maintain corporate knowledge graphs that contain uniform, normalized and highly connected data sets for use by researchers and intelligent agents. We will also discuss the challenges of transferring siloed project knowledge to reusable structures. 2
  • 3. Hello, my name is dan.mccreary@gmail.com • Distinguished Engineer in AI and Graph Technologies at Optum’s Advanced Technology Collaborative • Co-founder of "NoSQL Now!" conference (now part of Dataversity) • Author of "Making Sense of NoSQL" (w. Ann Kelly) • 15+ years of working with non-tabular knowledge representations • Background in solution architecture, metadata management, NLP, semantics, text analytics and knowledge representation for AI • Disclaimer: All opinions are my own and may not reflect the views of my employer 3
  • 4. Graph: a “NoSQL” Data Architecture. The six architectures: Relational, Analytical (OLAP), Key-Value, Column-Family, Document, Graph. See Chapter 1: https://www.manning.com/books/making-sense-of-nosql 4
  • 6. Relational vs. Graph 6 Relational (row store): 1. Atomic unit of storage is a row of a table 2. Data is appended one row at a time 3. All columns within a table must have the same structure and no variations within a table are allowed 4. Table structures are fixed after design 5. Query language is SQL 6. Joins are log(N) searches against other tables. Graph: 1. Atomic units of storage are nodes and edges 2. Each node and edge may have independent properties that are determined at run time (schema agnostic) 3. Joins between nodes and edges are computed at load time and are stored as memory pointers 4. Each core hops through 2M edges/second (1,000x faster than joins) 5. Query language varies although there are some standards e.g. SPARQL/Cypher/Gremlin/etc. Example relational columns: ID, NAME, DATE, AMOUNT
  • 7. Gartner on Graph Analytics Key Analytics Trends for 2019 1. Augmented Analytics 2. Augmented Data Management 3. Continuous Intelligence 4. Explainable AI 5. Graph 6. Data Fabric 7. NLP/Conversational Analytics 8. Commercial AI/ML 9. Blockchain 10. Persistent Memory Servers 7 …Graph processing to continuously accelerate data preparation and enable more complex and adaptive data science…to efficiently model, explore and query data with complex interrelationships across data silos…the need to ask complex questions across data silos which is not practical or even possible at scale using SQL queries. Graphs are also related to 4, 6 and 7
  • 8. Which of the following organizations use graph databases? • Every major airline uses a graph database to calculate fares in real-time • Over half of retailers use graphs for product recommendations Answer: They all do! 8
  • 9. Amazon Product Graph Job Posting As a leader in e-commerce, Amazon is building an authoritative knowledge base for every product in the world. With hundreds of millions of customers and billions of products, Amazon will offer a challenging but fun journey to turn this big and rapidly changing data into high-quality knowledge to impact customer experiences across Amazon from Alexa to Search to Shopping. As a member of the Product Graph team, based in Seattle, you will play a key role in the establishment of a new platform, with opportunities to create enormous benefit for our customers and Amazon. 9
  • 10. How do we store knowledge? • What exactly is “knowledge”? • How is knowledge different from raw data and information? • What is knowledge engineering? • What is knowledge architecture? • How does this relate to data science? 10
  • 11. 11
  • 12. The Data Science Lifecycle 12 What do we mean by “Understanding”?
  • 13. 2018: Graphs Join with Deep Learning How did we get here? https://arxiv.org/pdf/1806.01261.pdf 13
  • 14. Why Metaphors Matter “Metaphors drive design decisions” GraphConnect Keynote 2018 “We make decisions not by having a deep understanding of how technology works, but through being exposed to the right metaphors.” Hilary Mason, Data Scientist https://neo4j.com/graphconnect-2018/ around 51 minutes in 14
  • 15. Four Graph Metaphors 15 Neighborhood Walk (explains index-free adjacency, performance) Knowledge Triangle (explains data, information and knowledge) The Open World Assumption (explains graph integration, agility) The Jenga Tower (explains resilience of graphs to change)
  • 16. How do you get to your neighbor’s house? • You walk out your door and over to the house • Your houses are “Adjacent” so getting there is a direct “hop” • You don’t need to consult anyone else about how to get to a neighbor’s house because you have the right pointer 16
  • 17. Your Logical Graph Model • If two physical items are related, they have a relationship arc between them and we model it like the above • We build a “logical” data model that has this link • In a native graph, the vertices are loaded into memory and then the physical memory address of each link is also reflected in each of the nodes 17 Dan LIVES_NEXT_TO Ann
  • 18. Relational Databases Use Indexes ● You must walk to a central index system ● The index system does a “search” given the house’s address ● The index system will tell you how to get to your neighbor’s house 18 Central Index Search: 123 Main St.
  • 19. The Knowledge Triangle Metaphor 19 https://en.wikipedia.org/wiki/DIKW_pyramid • Diagram for representing the relationships between data, information, knowledge, and wisdom • Too often we focus on “Big Data” and not enough on connected knowledge and transferable knowledge • Graphs are connected information concepts • Wisdom is reusable across multiple contexts • Can we capture knowledge in a form that can be reused across multiple domains? Data - Binary, Codes, Data Lakes Information - Concepts Knowledge - Patterns Relationships Wisdom & AI
  • 20. From Raw Data to Wisdom with Continuous Enrichment 20 Knowledge that can be reused across multiple contexts – Transfer Learning Data Lake Information Knowledge Graph Wisdom Raw data dumped from a database, log files or documents Tagged Text Connections Consistent De-duplicated Semantics Concepts Reusable To New Problems Definitions Validity Searchable Continuous Enrichment
  • 21. Structure Definition: The arrangement of and relations between the parts or elements of something complex • The real world has lots of structure • How is structure captured in a machine learning feature? • If we take simple “features” out of the real world do we lose structure? • Do our brains “extract” features? (answer: no) 21
  • 22. The Adversarial Turtle 99% of modern image recognition is just simple (but precise) texture matching https://www.theverge.com/2017/11/2/16597276/google-ai-image-attacks-adversarial-turtle-rifle-3d-printed • Use a 3D printer to print a turtle • Place different “textures” on the shell • Most image recognition systems fail • CNNs: “On a very fundamental level, our work highlights how far current CNNs are from learning the 'true' structure of the world” 22 rifle rifle
  • 23. Our Brains are Graphs 100B Neurons 10K Connections per Neuron (degree) 23
  • 24. Three Eras of Computing Procedural Code (Rules) Programs Data Answers Explanations (Why) Machine Learning Data Answers Rules (10M weights) 1) Procedural Era 2) Machine Learning Era 3) Graph Era Data Answers Explanations (Why) Knowledge Machine Learning 24
  • 25. Sir Tim’s Vision RDF 2001 Euler Solves 7 Bridges of Königsberg 1736 W3C SPARQL 2008 Labeled Property Graph Neo4j 1.0 2010 LPG in AI 2018 Google’s Knowledge Graph “Things Not Strings” May 2012 Graph Timeline 25 AlexNet Sept 2012 Graphs Rise On Intelligence 2005
  • 26. The Birth of the Semantic Web • May 2001 • Resource Description Framework - RDF • Keep it simple: Triples all the way down • Universal Identifiers: URIs • Ideal for data interchange • The problem: reification • Adding a simple attribute to a relationship can cause 10,000 SPARQL queries to become obsolete Subject Object Property 26
  • 27. The Semantic Web Stack 27 Graph Model Graph Query Blockchain
  • 28. 2010 Neo4j 1.0 Released • Neo4j used a new graph data model called a Labeled Property Graph (LPG) • Each vertex and edge has its own set of properties (key-value pairs) • Each edge must have a single type • Vertices can have 0-N types (labels) • Adding a new property to a relationship does not require you to rewrite your queries! They solved the reification problem but kept the flexibility of graphs! • Developers LOVE Neo4j! Vertex Vertex Properties Properties Properties Edge 28
  • 29. 2012 Google Introduces the Knowledge Graph 29
  • 31. Google Knowledge Graph Results: Better Search • 110 Billion Concepts and an API to get vertices but no edges. • Relationships are too proprietary! 31
  • 32. 23andMe Ancestry Graph Better relationship insights 32
  • 33. Graphs and the “Open World” 33 Past: Closed World Graph: Open World • Everything is prohibited until it is permitted • You can only add data that you model • Everything is permitted until it is prohibited • Anyone can easily add any data at any time without disruption of services Also known as “schemaless” or “schema agnostic”
  • 34. The Jenga Tower Metaphor • What happens to your existing queries when you make a change to your data model? • LPGs allow anyone to add properties to vertices or relationships without disrupting other queries 34 Add a property to your model 1,000 queries need to be rewritten
  • 35. TigerGraph • First commercial distributed native graph product to fully support the LPG data model • Scales to 100B vertices on commodity hardware • Support for subgraphs (lightweight security) • Supports distributed ACID transactions using multi-version concurrency control • Large library of graph algorithms 35
  • 36. Sample Graph Algorithms 36 Dependencies • Failure chains • Order of operations Clustering • Finding related items • Friends, fraud networks Similarity • Similar paths and patterns Matching/Categorizing • Look for and tag specific patterns Flow/Cost • Optimize costs based on routing • Path optimization Centrality/Search • Which nodes are the most connected or relevant? API Server Auth Data
  • 37. Pinterest Interest Graph: better ad targeting 37 Billions of interest concepts (an ontology)
  • 38. Graphs Store Concept Distance • How similar are concepts? • How can the distance in a graph help us find the underlying intent of the questions? • How can this help us build automated chatbots? 38 Baby Infant Child Chat Question: We are planning to have a new [baby, infant, child], what are my benefits? Chatbot: Here is a link to your maternity benefits. Maternity
  • 39. Normalized Google Distance (NGD) A semantic similarity measure derived from the number of hits returned by the Google search engine for a given set of keywords. 39 1. "Shakespeare" returns 130M pages 2. "Macbeth" returns 26M pages 3. "Shakespeare Macbeth" returns 20.8M pages NGD(x, y) = (max{log f(x), log f(y)} − log f(x, y)) / (log N − min{log f(x), log f(y)}), where N is the total number of web pages searched by Google multiplied by the average number of singleton search terms occurring on pages; f(x) and f(y) are the number of hits for search terms x and y, respectively; and f(x, y) is the number of web pages on which both x and y occur.
  • 40. Data Lakes Today • Very little reuse of code to “understand” and link data • Few people using rules engines and ML to link data 40 Data Lake (Data Swamp?) 100s of Data Scientists 100s of R and Python Libraries 80% of effort is “Data Engineering” 20% is Data Science Data Access Code
  • 41. From Data Scientist to Knowledge Scientist 41 Data Information Knowledge Graph Wisdom Data Engineering Insights 70% of your time Fast Path with Feedback Slow Path
  • 42. Causality: The Apocryphal Cancer Gene Story 42 The Smoking Gene Theory (1950) The Correct Causal Relationship (1969) Smoking Gene Lung Cancer Urge to Smoke Lung Cancer Tar Smoking 20 million preventable deaths From “The Book of Why” by Judea Pearl ?
  • 43. Explainable AI Models are Graphs 43 https://www.darpa.mil/program/explainable-artificial-intelligence Training Data Training Data Machine Learning Process New Machine Learning Process Learned Function Today Explainable AI • Why did you do that? • Why not something else? • When do you succeed? • When do you fail? • When can I trust you? • How do I correct an error? • I understand why • I understand why not • I know when you succeed • I know when you fail • I know when to trust you • I know why you erred Decision or Recommendation Explanation Interface Explainable Model Task Task
  • 44. Model-based Machine Learning 44 • Model-based machine learning requires the user to create a model of the world in the form of a graph. This model encodes the assumptions you make, such as how a change in one variable changes another variable (causal relationships). • Model-based machine learning allows fewer algorithms and universal inference Many Algorithms Flat Data Hidden Assumptions Many Algorithms Specific Solution Graph Data Assumptions in Graph Structure One Algorithm Universal Inference General Solution Few Algorithms http://www.mbmlbook.com
  • 45. 2005: On Intelligence The key to Artificial Intelligence has always been the representation. Jeff Hawkins Sensory Data Abstract Concepts Hierarchical Temporal Memory (HTM) 45 Edge Computing Converts Sensors into Concepts
  • 46. General CPU Hardware vs. Graph Hardware • Most graph traversal algorithms only use simple pointer hopping • How efficient are CPUs and GPUs at running graph algorithms? • No need for floating point • No need for matrix multiplication 46 CPU 1,000 Instructions Available (1503 defined x86 instructions) 100 Instructions Used
  • 47. GPUs are optimized for array processing • GPUs were designed to support video games • GPUs are optimized for highly parallel matrix multiplication • Inefficient for graph “hop” calculations 47 NVIDIA "Pascal" GP100
  • 48. What’s Coming: Graph Hardware! Graphcore – graph in hardware - $350M in funding 48
  • 49. AlexNet • A convolutional neural network (CNN), designed by Alex Krizhevsky • First image recognition program to use GPUs • Beat other teams by a huge margin (10%) in 2012 49
  • 50. AlexNet in Graphcore • Each layer in the algorithm maps to a region of the graph • Initial layers are convolutional layers • The colors represent connection density 50
  • 51. Different Algorithms Look Different 51 ResNet50: “AI Brain Scan” https://www.wired.co.uk/gallery/machine-learning-graphcore-pictures-inside-ai
  • 52. 52
  • 53. Dan’s Predictions • Graph technology will continue to gain relevance in analytics • LPGs will be the dominant graph data model • RDF/SPARQL might only be used in niche areas • Once a standard query language is adopted the ability to reuse algorithms will grow quickly • The number of graph databases and middle-tier algorithm products will continue to grow 53
  • 54. Recommendations • Become a knowledge scientist! • Learn a bit about graph modeling and graph algorithms • Think about structure when doing feature design • Understand the causal relationships between your data (Bayesian Graphs) • Use graph databases when you have lots of relationships or want real-time analytics (rules engines, recommendations) • Build knowledge APIs, not just more libraries for data lakes • Learn a few graph algorithms • Similarity • Clustering • Recommendation 54
  • 55. Resources • Dan’s Medium Blog: https://medium.com/@dmccreary • Machine Learning with Python • A Comprehensive Guide to Graph Algorithms in Neo4j Mark Needham & Amy E. Hodler • Wikipedia Articles • Graph Databases • Similarity (Network Science) • Google Normalized Distance • Explainable AI 55
  • 56. Thank You! Please send e-mail to Dan.McCreary@gmail.com if you want a copy of the slides. 56
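The Normalized Google Distance on slide 39 can be tried out with the three hit counts quoted there. The sketch below uses the standard NGD formula; the slide does not give a value for N, so `ASSUMED_N` is a made-up placeholder chosen only to make the arithmetic concrete, and the function name is hypothetical as well.

```python
import math


def normalized_google_distance(fx: float, fy: float, fxy: float, n: float) -> float:
    """NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y))
                   / (log N - min(log f(x), log f(y)))."""
    if min(fx, fy, fxy) <= 0:
        raise ValueError("hit counts must be positive")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hit counts quoted on the slide.
SHAKESPEARE_HITS = 130e6
MACBETH_HITS = 26e6
BOTH_HITS = 20.8e6

# ASSUMPTION: the slide gives no value for N; 50 billion is a placeholder.
ASSUMED_N = 50e9

ngd = normalized_google_distance(SHAKESPEARE_HITS, MACBETH_HITS, BOTH_HITS, ASSUMED_N)
print(f"NGD(Shakespeare, Macbeth) with assumed N: {ngd:.3f}")
```

A smaller value means the two terms co-occur more often than their individual frequencies would suggest, so they are treated as semantically closer. The result shifts with the choice of N, which is why the placeholder matters.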
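The contrast between slides 6, 17 and 18 (index search versus a stored pointer) can be sketched in a few lines. This is a toy illustration, not how any particular database engine is implemented; the table contents, the second address and all function names are made up for the example.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

# Relational style: rows sorted by key, found by binary search (log N steps).
# The names come from the slides; Ann's address is a made-up value.
person_rows = [("Ann", "125 Main St."), ("Dan", "123 Main St.")]
person_index = [name for name, _ in person_rows]  # sorted key column
lives_next_to_rows = [("Dan", "Ann")]             # foreign-key pairs


def lookup_by_index(name):
    """Binary search of the index on every call (the trip to the DMV)."""
    pos = bisect_left(person_index, name)
    if pos < len(person_index) and person_index[pos] == name:
        return person_rows[pos]
    return None


def neighbors_via_join(name):
    """Resolve each foreign key with a fresh index search."""
    return [lookup_by_index(other) for who, other in lives_next_to_rows if who == name]


# Graph style: the relationship is stored as a direct object reference.
@dataclass(eq=False)
class Vertex:
    name: str
    address: str
    lives_next_to: list = field(default_factory=list)  # pointers to neighbors


ann = Vertex("Ann", "125 Main St.")
dan = Vertex("Dan", "123 Main St.")
dan.lives_next_to.append(ann)  # the link is computed once, at load time

print(neighbors_via_join("Dan"))                # found by searching the index
print([v.name for v in dan.lives_next_to])      # found by following a pointer
```

Both approaches return the same neighbor. The difference is that the join repeats a search whose cost grows with table size, while the pointer hop costs the same no matter how many vertices are loaded.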

Editor's Notes

  1. My background is as a solution architect. I have spent most of the last 20 years understanding how to objectively match business problems to the appropriate technologies. My focus has been on the fast-evolving area of NoSQL databases. I have also had a strong interest in AI, semantics, natural language processing and search.
  2. OK, now let's take a step back and take a more structured look at where these tools fit into our business processes at Optum. There are six main database architecture patterns we use when we think of a business problem: relational or row stores; analytical or OLAP; key-value stores, one of the simplest but most extensible data architectures; column-family stores; graph stores; and document stores. Graph is just one of these six. Graph stores are often most closely related to document stores. Both graph and document stores allow new data to be added to structures without needing to remodel the data. We call these systems schema-free or schema agnostic. They are a key driver for highly agile systems. Your systems may often draw on two or more of these architectures. Databases that support multiple data models are called multi-model databases. They save us from having to store the same data over again in multiple systems for transactions, search and analytics. Multi-model systems that integrate graph technologies are also an emerging trend.
  3. https://db-engines.com/en/ranking_categories
  4. Now let's do a side-by-side comparison of the traditional relational row store and the graph database and compare some key facts. With a row store, the atomic unit of work is adding a single row at a time to a table. The key is that all the datatypes in each column must be the same. If there are dates in the third column and decimals in the fourth column, all your data must conform to this standard. The table column structures and datatypes are fixed when you design your database. Once you have a million rows loaded into each table and 10,000 reports created, it becomes challenging to modify the database. Relational databases also use the SQL language, and they use join operations to dynamically calculate relationships each time the query is run. These calculations are based on binary search algorithms that run in log(N) time, where N is the number of rows in each table. As the table grows, the searches scale as the log of the number of rows. Graph databases, on the other hand, allow you to add any number of nodes and relationships to your graph database. Each node and relationship has its own properties, but there are usually few overarching rules about what datatypes these structures can contain. Graph databases also use fixed memory pointers to store relationships between each node. As a result, the queries over relationships are fast and there is no slow-down as the number of vertices gets bigger.
  5. http://aima.cs.berkeley.edu/
  6. https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview
  7. https://arxiv.org/pdf/1806.01261.pdf
  8. https://neo4j.com/graphconnect-2018/ around 51 minutes in
  9. Sometimes the best way to understand how graph databases are different is using a metaphor. We call this metaphor the “Neighbor Walk” metaphor. It has proven very helpful for people who are trying to understand the performance differences between relational joins and a graph traversal. Let’s say you want to walk out your front door and over to your neighbor’s house. You open the door, point your body to the neighbor’s house and walk over there. Pretty simple. Since your houses are adjacent this is the logical way to do it.
  10. Here is the graph “logical model” for this. You might have a vertex for your house, and a relationship called LIVES_NEXT_TO as a pre-calculated relationship to your neighbor’s house.
  11. Here are the steps that reflect how a relational database does joins: (1) Walk out the door. (2) Walk downtown to the DMV, where they have a service that tells you how to get to your neighbor’s house. (3) When you get to the DMV you take a number. (4) When your number comes up you get called by a search agent. (5) You give your neighbor’s name to the DMV search agent. (6) The search agent at the DMV has a list of all the people in your town sorted by their address (called the index). They search the list (using a binary search algorithm) and they finally give you the GPS coordinates of your neighbor’s house. (7) You take these GPS coordinates, enter them into your GPS tracker and follow the directions to Ann’s house. Now granted, the metaphor is not perfect. The speedup really depends on how many rows each table has. However, this metaphor helps you remember that directly addressing a memory location that is pre-calculated for you at load time can sometimes be three orders of magnitude faster than searching for something every time you need to access it. RDBMS systems try to minimize this search time by doing clever things like caching. However, the more data you have the longer the searches take. Direct memory access will always be faster than doing a search! Now let's also look at some different architectures than a single node graph.
  12. Now let's take a look into some of the reasons that organizations are moving toward knowledge graphs as ways to connect and reuse data. The structure we use to describe this system is called the DIKW pyramid. It is a triangle with Data at the bottom, Information one level up, Knowledge (in the form of a graph) at the third level and Wisdom at the top. Wisdom is the layer most strongly associated with AI. When we think of going to the top of the mountain and asking the gurus for advice, we are asking them to apply their knowledge to our specific problem. We are asking them to transfer knowledge to our context.
  13. This picture follows the same DIKW pattern. However, we want to evoke the idea of raw binary data at the bottom, with binary symbols that have little meaning without context. The second layer is where we identify the nouns in our documents and data. We look for people, places and things in the byte stream. This is where we can understand isolated data: we know the types and the definitions of the types, and we can validate whether the 1s and 0s make sense within a narrow context. The third layer is where we start to tie our nouns together in a graph. It is where we build relationship links between things. It is where we might look for duplicated data (Master Data Management), where we check for consistency and where we verify that the patterns of connections are consistent. The highest level is where we restructure our graph into sub-graphs that are reusable across multiple applications. We can provide consistent APIs that pull data in consistent ways, and these interfaces are reusable across many domains. Central to this pyramid is the concept of continuous enrichment cycles. As we discover new things at a higher level, we sometimes provide feedback to lower levels.
  14. https://www.blog.google/products/search/introducing-knowledge-graph-things-not/
  15. If you go to Google and type “chest pain” you will note that a “Knowledge Summary” box appears on the right side. Note that the keyword is mapped to a Preferred Term called “Angina”, which is the formal medical condition name. You will also note that the term “Ischemic chest pain” is shown as an alternate label. The Knowledge Summary has tabs for ABOUT, SYMPTOMS and TREATMENTS. The summary box also indicates that this is a common condition that impacts over 3 million people per year in the US. This is an example of using a graph to group common concepts about a topic. The keyword gets you to the right part of the graph, but the knowledge summary is a machine-generated summary that has been carefully reviewed for quality by the Mayo Clinic. Google’s Knowledge Graph contains over 100 billion “facts” about things that people search for. They build this graph by harvesting information from many web pages and using both Natural Language Processing (NLP) and machine learning based on what users actually click on to find the most relevant summary information for any topic in the graph.
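The mapping from a search keyword to a preferred term can be illustrated with a small Python sketch. This is not how Google implements it; the `concepts` dictionary, the `label_index` and the `resolve` function are hypothetical, and the only data used is the example from the note above.

```python
# Hypothetical concept store: one concept with a preferred term and
# alternate labels, using only the example from the note above.
concepts = {
    "angina": {
        "preferred_term": "Angina",
        "alt_labels": ["Ischemic chest pain", "chest pain"],
    },
}

# Build a lookup from every label (lowercased) to its concept id.
label_index = {}
for concept_id, concept in concepts.items():
    for label in [concept["preferred_term"], *concept["alt_labels"]]:
        label_index[label.lower()] = concept_id


def resolve(keyword):
    """Map a search keyword to the preferred term, or None if unknown."""
    concept_id = label_index.get(keyword.strip().lower())
    if concept_id is None:
        return None
    return concepts[concept_id]["preferred_term"]


print(resolve("Chest Pain"))
```

Every label leads to the same concept, so the summary information only needs to be attached to the concept once.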
  16. There is another way to describe the differences between fixed-schema and schema-agnostic systems. This is related to the way that logic systems make assumptions about unknown data. Many relational databases use logic that implies that missing or unknown data is always false. This is known as the Closed World Assumption (CWA). Graph databases usually assume that unknown data is simply unknown. This is known as the Open World Assumption (OWA). https://en.wikipedia.org/wiki/Open-world_assumption http://www.mkbergman.com/852/the-open-world-assumption-elephant-in-the-room/
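The difference between the two assumptions can be shown in a short Python sketch. It is a simplified illustration: the fact triples and the `closed_world` and `open_world` functions are hypothetical and do not reflect how any specific database evaluates queries.

```python
# Facts we have recorded, as (subject, predicate, object) triples.
known_true = {("Ann", "OWNS", "house")}
# Facts we have explicitly recorded as false.
known_false = {("Ann", "OWNS", "boat")}


def closed_world(fact):
    """Closed world: anything not recorded as true is treated as false."""
    return fact in known_true


def open_world(fact):
    """Open world: True, False, or None when we simply do not know."""
    if fact in known_true:
        return True
    if fact in known_false:
        return False
    return None


question = ("Ann", "OWNS", "car")  # nothing is recorded about this
print(closed_world(question), open_world(question))
```

Under the closed-world reading the missing fact becomes a "no", while the open-world reading leaves room for the fact to be added later.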
  17. Dan
  18. Explainable AI is a key https://www.darpa.mil/program/explainable-artificial-intelligence