KNOWLEDGE GRAPH CONSTRUCTION
FOR RESEARCH & MEDICINE
Paul Groth (@pgroth)
pgroth.com
Disruptive Technology Director
Elsevier Labs (@elsevierlabs)
Connected Data London 2017
Contributions: Brad Allen, Pascal Coupet, Sujit Pal, Craig Stanley, Ron Daniel, Alex de Jong
Our customers are facing challenges in science and health
Elsevier is in a unique position to make a contribution towards solving these challenges
Global research spend is growing every year.1
• Predicted spend: $1.9TN on research in 2016, 3.4% from 2015
Researchers lack the tools they need to be effective.2
• Studies: 70-80% of research asks the wrong questions or cannot be reproduced
Life-saving drugs are expensive to develop.3
• $2.5BN median pharmaceutical spend per drug
• 1/20 success rate of drugs
Health providers cannot save lives without the best information.4
• Preventable medical errors: third largest cause of death in the US
• Chart: Heart Disease 611k, Cancer 585k, Medical Error 225k, Respiratory Illness 149k
Sources: 1. Industrial Research Institute 2. The Lancet 3. Tufts 4. World Health Organization
ELSEVIER’S BUSINESS: PROVIDING ANSWERS FOR
RESEARCHERS, DOCTORS AND NURSES
My work is moving towards a new field; what should I know?
• Journal articles, reference works, profiles of researchers, funders &
institutions
• Recommendations of people to connect with, reading lists, topic pages
How should I treat my patient given her condition & history?
• Journal articles, reference works, medical guidelines, electronic health
records
• Treatment plan with alternatives personalized for the patient
How can I master the subject matter of the course I am taking?
• Course syllabus, reference works, course objectives, student history
• Quiz plan based on the student’s history and course objectives
THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER
ANSWERS ARE ABOUT THINGS, NOT JUST WORKS
Why shouldn’t a search on an author return
information about the author, including the
author’s works? Where was the author born,
when did she live, what is she known for? … All of
this is possible, but only if we can make some
fundamental changes in our approach to
bibliographic description. ... The challenge for us
lies in transforming what we can of our data into
interrelated “things” without overindulging that
metaphor.
Coyle, K. (2016). FRBR, before and after: a look at our
bibliographical models. Chicago: ALA Editions.
KNOWLEDGE GRAPHS DEFINED
• Knowledge graphs are "graph structured knowledge bases (KBs) which store factual
information in form of relationships between entities” (Nickel, M., Murphy, K., Tresp, V. and
Gabrilovich, E. (2015). A review of relational machine learning for knowledge graphs.
arXiv:1503.00759v3)
• Knowledge graphs are metadata evolved beyond the focus on the work, linking people, concepts,
things and events
• Knowledge Graphs are focused on things to provide answers
ELSEVIER’S KNOWLEDGE PLATFORM
Products: Reaxys, CK, Sherpath, Scopus, SD, ROS
Platforms & Shared Services: Content, Life Sciences, Search, Identity, Research
Knowledge Graphs: Research, Healthcare, Life Sciences
Entity Hubs: Funder Hub, Article Hub, Profile Hub, Journal Hub, Institution Hub
Data & Content Sources: Usage logs, Pathways, EHRs, Articles, Authors, Institutions, Syllabi, Citations, Chemicals, Books, Drugs, Funders
THE GROWTH OF SCIENCE COMPLICATES OUR EFFORTS
MORE DOMAINS & MORE SPECIFICITY
Gregory, K., Groth, P., Cousijn, H., Scharnhorst, A.,
& Wyatt, S. (2017). Searching Data: A Review of
Observational Data Retrieval Practices. arXiv
preprint arXiv:1707.06937.
Some observations from @gregory_km
survey:
1. The needs and behaviours of specific user groups
(e.g. early career researchers, policy makers,
students) are not well documented.
2. Background uses of observational data are better
documented than foreground uses.
3. Reconstructing data tables from journal articles,
using general search engines, and making direct data
requests are common.
MACHINE READING
TOPIC PAGES
• Definition
• Related terms
• Relevant ranked snippets
HOW - OVERVIEW
Content
Books, Articles, Ontologies ...
• Identification of concepts
• Disambiguation
• Domain/sub-domain identification
• Abbreviations, variants
• Gazetteering
• Identification and classification of text snippets around concepts
• Feature building for concept/snippet pairs
• Lexical, syntactic, semantic, doc structure …
• Ranking concept/snippet pairs
• Machine learning
• Hand-made rules
• Similarities
• Deduplication
Technologies
NLP, ML
• Curation
• White list driven
• Black list
• Corrections/improvements
• Evaluation
• Gold set by domain
• Random set by domain
• By SMEs (Subject Matter Experts)
• Automation
• Content Enrichment Framework
• Taxonomy coverage extension
Knowledge Graph
Concepts, snippets, meta data, …
OmniScience
Neuroscience
Extension vocabularies by domains to provide coverage
Vocabulary                                              Number of Concepts   Number of Labels
OmniScience 01.16.11                                    45969                47421
OmniScience Neuroscience branch 21/11/2016              2356                 2455
OmniScience Extension Neuroscience branch 21/11/2016    23932                101276
Concept: Inferior Colliculus
Bad: By comparing activation obtained in an equivalent standard ( non-cardiac-gated ) fMRI experiment , Guimaraes and colleagues found that cardiac-gated activation maps yielded much greater activation in subcortical nuclei , such as the inferior colliculus .
Good: The inferior colliculus (IC) is part of the tectum of the midbrain (mesencephalon) comprising the quadrigeminal plate (Lamina quadrigemina). It is located caudal to the superior colliculus on the dorsal surface of the mesencephalon ( Figure 36.7 FIGURE 36.7Overview of the human brainstem; view from dorsal. The superior and inferior colliculi form the quadrigeminal plate. Parts of the cerebellum are removed.). The ventral border is formed by the lateral lemniscus. The inferior colliculus is the largest nucleus of the human auditory system. …
Concept: Purkinje cells
Bad: It is felt that the aminopyridines are likely to increase the excitability of the potassium channel-rich cerebellar Purkinje cells in the flocculus ( Etzion and Grossman , 2001 ) .
Good: Purkinje cells are the most salient cellular elements of the cerebellar cortex. They are arranged in a single row throughout the entire cerebellar cortex between the molecular (outer) layer and the granular (inner) layer. They are among the largest neurons and have a round perikaryon, classically described as shaped “like a chianti bottle,” with a highly branched dendritic tree shaped like a candelabrum and extending into the molecular layer where they are contacted by incoming systems of afferent fibers from granule neurons and the brainstem…
Concept: Olfactory Bulb
Bad: The most common sites used for induction of kindling include the amygdala, perforant path , dorsal hippocampus , olfactory bulb , and perirhinal cortex.
Good: The olfactory bulb is the first relay station of the central olfactory system in the vertebrate brain and contains in its superficial layer a few thousand glomeruli, spherical neuropils with sharp borders ( Figure 1 Figure 1Axonal projection pattern of olfactory sensory neurons to the glomeruli of the rodent olfactory bulb. The olfactory epithelium in rats and mice is divided into four zones (zones 1–4). A given odorant receptor is expressed by sensory neurons located within one zone of the epithelium. Individual olfactory sensory neurons express a single odorant receptor…
Examples of good and bad snippets
One Weird Trick from Natural Language Processing (NLP)
• Knowledge bases are populated by scanning text and doing Information Extraction
• Most information extraction systems are looking for very specific things, like drug-drug interactions
• Best accuracy for that one kind of data, but misses out on all the other concepts and relations in the text
• For a broad knowledge base, use Open Information Extraction, which relies only on some knowledge of grammar
• The weird trick for open information extraction … a simple algorithm, known as ReVerb*:
1. Find “relation phrases” starting with a verb and ending with a verb or preposition
2. Find noun phrases before and after the relation phrase
3. Discard relation phrases not used with multiple combinations of arguments.
In addition, brain scans were performed to exclude
other causes of dementia.
* Fader et al. Identifying Relations for Open Information Extraction
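The three steps above can be sketched roughly as follows. This is a simplified illustration and not the ReVerb implementation: the part-of-speech tags are supplied by hand (a real pipeline would run a tagger), the tag sets, the two extra sentences, the function names and the `min_pairs=2` threshold are all assumptions chosen for a toy corpus.

```python
from collections import defaultdict

# Universal POS tags, assigned by hand for this sketch (assumption).
VERB_TAGS = {"VERB", "AUX"}
RELATION_TAGS = VERB_TAGS | {"ADV", "PART", "ADP"}
RELATION_END_TAGS = VERB_TAGS | {"ADP"}
NOUN_PHRASE_TAGS = {"DET", "ADJ", "NOUN", "PROPN", "NUM"}


def extract_triples(tagged):
    """Return (arg1, relation, arg2) triples from a list of (word, tag) pairs."""
    def text(start, end):
        return " ".join(word for word, _ in tagged[start:end])

    triples = []
    i = 0
    while i < len(tagged):
        if tagged[i][1] not in VERB_TAGS:
            i += 1
            continue
        # Step 1: grow a relation phrase that starts with a verb ...
        run_end = i
        while run_end < len(tagged) and tagged[run_end][1] in RELATION_TAGS:
            run_end += 1
        # ... and trim it so that it ends with a verb or a preposition.
        rel_end = run_end
        while tagged[rel_end - 1][1] not in RELATION_END_TAGS:
            rel_end -= 1
        # Step 2: take the noun phrases directly before and after the phrase.
        left = i
        while left > 0 and tagged[left - 1][1] in NOUN_PHRASE_TAGS:
            left -= 1
        right = rel_end
        while right < len(tagged) and tagged[right][1] in NOUN_PHRASE_TAGS:
            right += 1
        if left < i and right > rel_end:
            triples.append((text(left, i), text(i, rel_end), text(rel_end, right)))
        i = run_end
    return triples


def filter_by_argument_diversity(triples, min_pairs=2):
    """Step 3: keep relation phrases seen with several distinct argument pairs."""
    pairs_by_relation = defaultdict(set)
    for arg1, relation, arg2 in triples:
        pairs_by_relation[relation].add((arg1, arg2))
    return [t for t in triples if len(pairs_by_relation[t[1]]) >= min_pairs]


# First sentence is the slide's example; the other two are made up.
SENTENCES = [
    [("In", "ADP"), ("addition", "NOUN"), (",", "PUNCT"), ("brain", "NOUN"),
     ("scans", "NOUN"), ("were", "AUX"), ("performed", "VERB"), ("to", "PART"),
     ("exclude", "VERB"), ("other", "ADJ"), ("causes", "NOUN"), ("of", "ADP"),
     ("dementia", "NOUN"), (".", "PUNCT")],
    [("Blood", "NOUN"), ("tests", "NOUN"), ("were", "AUX"), ("performed", "VERB"),
     ("to", "PART"), ("exclude", "VERB"), ("infection", "NOUN"), (".", "PUNCT")],
    [("The", "DET"), ("patient", "NOUN"), ("slept", "VERB"), ("badly", "ADV"),
     ("near", "ADP"), ("noisy", "ADJ"), ("machines", "NOUN"), (".", "PUNCT")],
]

all_triples = [t for sentence in SENTENCES for t in extract_triples(sentence)]
kept_triples = filter_by_argument_diversity(all_triples)
print(kept_triples)
```

In this toy corpus the phrase "were performed to exclude" occurs with two different argument pairs and survives step 3, while "slept badly near" occurs once and is dropped. The noun-phrase matching here stops at prepositions, so the second argument of the slide's sentence comes out as "other causes" and not the full "other causes of dementia".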
ReVerb output
# SD Documents Scanned 14,000,000
Extracted ReVerb Triples 473,350,566
Universal schemas – Predict ‘missing‘ KG facts
• Make a matrix:
• columns for the relation phrases from ReVerb or the semantic relations from EMMeT
• rows are the pairs of concepts linked by a relation
• A ‘1.0’ in a cell if those concepts were linked by that relation
• Outlined cells in the diagram are the ones initialized to 1.
• Factorize the matrix into E×K and K×R, then recombine.
• “Learns” the correlations between text relations and EMMeT relations, in the context of pairs of objects.
• Cells going from 0 to > 0 indicate potential new facts.
• Find new triples to go into EMMeT, e.g., (glaucoma, has_alternativeProcedure, biofeedback)
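A minimal sketch of the factorize-and-recombine idea, under loud assumptions: the slide does not say which factorization was used, so a rank-1 approximation computed by power iteration (K = 1) stands in for it here. The placeholder concept pairs, the text relation phrase, the function names and the 0.3 threshold are all hypothetical toy values.

```python
# Rows: concept pairs. Columns: one text relation and one EMMeT relation.
# All rows except the slide's (glaucoma, biofeedback) example are placeholders.
PAIRS = [
    ("disease_1", "procedure_1"),
    ("disease_2", "procedure_2"),
    ("glaucoma", "biofeedback"),
]
RELATIONS = ["text:can be treated with", "EMMeT:has_alternativeProcedure"]
OBSERVED = [
    [1.0, 1.0],
    [1.0, 1.0],
    [1.0, 0.0],  # seen in text only; the EMMeT relation is "missing"
]


def rank_one_factors(matrix, iterations=100):
    """Factorize an E x R matrix into E x 1 and 1 x R factors by power iteration."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0] * n_cols
    for _ in range(iterations):
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    pair_factors = [[sum(matrix[i][j] * v[j] for j in range(n_cols))]
                    for i in range(n_rows)]          # E x K
    relation_factors = [v]                            # K x R
    return pair_factors, relation_factors


def recombine(pair_factors, relation_factors):
    """Multiply the two factor matrices back into an E x R score matrix."""
    k = len(relation_factors)
    n_cols = len(relation_factors[0])
    return [[sum(row[f] * relation_factors[f][j] for f in range(k))
             for j in range(n_cols)] for row in pair_factors]


def candidate_facts(observed, scores, threshold=0.3):
    """Cells that were 0 in the input but score above the threshold afterwards."""
    return [(PAIRS[i], RELATIONS[j])
            for i, row in enumerate(observed)
            for j, value in enumerate(row)
            if value == 0.0 and scores[i][j] > threshold]


pair_factors, relation_factors = rank_one_factors(OBSERVED)
scores = recombine(pair_factors, relation_factors)
candidates = candidate_facts(OBSERVED, scores)
print(candidates)
```

Because the text relation and the EMMeT relation co-occur for the first two pairs, the low-rank reconstruction assigns a positive score to the empty EMMeT cell of the third pair, which is what makes it a candidate triple for review.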
BUILDING
KNOWLEDGE GRAPHS
FROM DATA
Medical Graph – Statistical correlations at scale
Example nodes (ICD-10 codes), linked by has_successor edges (example odds ratio: 1.12):
• I65 Occlusion and stenosis of precerebral arteries
• G40 Epilepsy
• I61 Intracerebral hemorrhage
• C71 Malignant neoplasm of brain
has_successor criteria1:
• Correlation selected by predictive modeling algorithm
• No. of relations is higher than in mirrored relation
• p-value < 0.05
• Odds ratios balanced over all covariates.
1 Criteria based on: Jensen et al.: Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients. Nature Communications, 2014 Jun 24; 5:4022. doi: 10.1038/ncomms5022.
Data: primary care, secondary care, drug prescriptions, other covariates
5m patients, each with 6 years of longitudinal data
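Two of the has_successor criteria, the comparison with the mirrored relation and the p-value threshold, can be illustrated with a short sketch. The slide does not name the statistical test, so the exact two-sided binomial test used here is an assumption, as are the function names and the synthetic patient histories. The model-based selection and the covariate-balanced odds ratios are not covered.

```python
from math import comb


def first_diagnosis_years(history):
    """Map each code to the first year it appears in a list of (code, year)."""
    first = {}
    for code, year in history:
        if code not in first or year < first[code]:
            first[code] = year
    return first


def count_directions(histories, code_a, code_b):
    """Count patients with A before B, and patients with B before A."""
    a_then_b = b_then_a = 0
    for history in histories:
        first = first_diagnosis_years(history)
        if code_a in first and code_b in first:
            if first[code_a] < first[code_b]:
                a_then_b += 1
            elif first[code_b] < first[code_a]:
                b_then_a += 1
    return a_then_b, b_then_a


def two_sided_binomial_p(successes, trials):
    """Exact two-sided binomial p-value against a 50/50 split (assumed test)."""
    k = max(successes, trials - successes)
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


def has_successor(histories, code_a, code_b, alpha=0.05):
    """True if A -> B is more frequent than the mirrored B -> A and p < alpha."""
    a_then_b, b_then_a = count_directions(histories, code_a, code_b)
    trials = a_then_b + b_then_a
    if trials == 0:
        return False
    return a_then_b > b_then_a and two_sided_binomial_p(a_then_b, trials) < alpha


# Synthetic toy histories, not real patient data.
histories = (
    [[("I65", 2009), ("G40", 2012)]] * 15
    + [[("G40", 2009), ("I65", 2011)]] * 5
    + [[("I65", 2010)]] * 3
)
counts = count_directions(histories, "I65", "G40")
p_value = two_sided_binomial_p(counts[0], sum(counts))
print(counts, round(p_value, 4), has_successor(histories, "I65", "G40"))
```

Patients with only one of the two codes do not contribute to the direction count, so the three single-diagnosis histories in the toy data are ignored.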
Medical Graph in practice, patient 35: risk of depression
• 49-year-old man
• Dx: overweight, diabetes, hypertension, anxiety disorder
→ has an absolute risk of 36% of developing depression within the next 4 years
… and rationale of why model thinks this
• Targets for prediction: ICD-coded diagnoses
• Only incident patients per diagnosis considered, i.e. diagnosis-free 2009 – 2010
• If these patients remain diagnosis-free 2011 – 2014 (observation period), then 0, else 1
• Covariates: all ICD-/ATC-codes, age and sex measured in 2010
Example: Model to predict "I50 – Heart Failure"
Analysis Design
Predict 4 year long-term effects, balanced for all co-variables
Timeline: I50-free patients in 2009 – 2010, with covariates measured in that period; observation 2011 – 2014. Remaining I50-free patients are I50 - (coded as 0); newly I50-diagnosed patients are I50 + (coded as 1).
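The labelling rule in this design can be sketched as a small function. The input layout (a dict from year to the set of codes recorded that year), the function name and the three toy patients are assumptions for illustration; age and sex are left out to keep the example short.

```python
def build_training_row(codes_by_year, target="I50"):
    """Return (covariates, label) for one patient, or None if not eligible.

    codes_by_year maps a year to the set of ICD/ATC codes recorded that year
    (hypothetical layout, not the deck's actual data model).
    """
    baseline_years = (2009, 2010)
    observation_years = range(2011, 2015)  # 2011 to 2014 inclusive

    # Only incident patients: the target must be absent in 2009 and 2010.
    if any(target in codes_by_year.get(year, set()) for year in baseline_years):
        return None

    # Label 1 if the target is first coded in the observation period, else 0.
    label = int(any(target in codes_by_year.get(year, set())
                    for year in observation_years))

    # Covariates: every code measured in 2010.
    covariates = sorted(codes_by_year.get(2010, set()))
    return covariates, label


# Toy patients, made up for this sketch.
patients = {
    "p1": {2010: {"E11", "I10"}, 2013: {"I50"}},   # newly diagnosed
    "p2": {2009: {"I50"}, 2010: {"I10"}},          # prevalent, excluded
    "p3": {2010: {"I10", "C09AA"}},                # stays I50-free
}
rows = {pid: build_training_row(codes) for pid, codes in patients.items()}
print(rows)
```

Excluding prevalent patients before labelling is what keeps the model focused on newly developing disease instead of re-detecting an existing diagnosis.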
(A) integrate & clean
Research on anonymized claims data
Billing data flow from 60+ sickness funds:
• Primary care: visits & diagnoses
• Secondary care: visits, diagnoses & procedures
• Drug prescriptions
• Other data: further cooperations just started; will enable analysis of vital and laboratory parameters
Data integration & cleaning
• Data cleaning
• Longitudinally linked & integrated for analytics
• Anonymized
6 million patients, 6 years, > 1.5b events
(B) mine & learn
Calculate statistics & build prediction models for ~1600 targets
Technology stack
Feature extraction
For 3.8m patients:
• age, gender
• all diagnoses: ICD10-coded, 3 digits, i.e. 2054 codes
• all medications: ATC-coded, 5 digits, i.e. 906 codes
• death, hospitalization
Results in: 6277 features
• 1623 targets, 2011-2014
• 2320 covariates, 2010
• 2334 filter-columns, 2009-2010
Data mining
Calculate prevalence, incidence, mean age for all covariates (i.e. diseases and medications)
Machine learning
Predictive modelling for ~1600 targets
• Linear classification model, resulting in odds ratios
• Calculation of p-values
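To illustrate how a linear classification model yields odds ratios, here is a minimal sketch that assumes the model is a logistic regression (the slides only say "linear classification model"). It fits one binary covariate by plain gradient descent and exponentiates the coefficient. The synthetic counts, function name and hyperparameters are hypothetical, and the p-value calculation is not shown.

```python
from math import exp


def fit_logistic(rows, labels, learning_rate=0.5, iterations=2000):
    """Fit logistic regression by batch gradient descent; return (bias, weights)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(iterations):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + exp(-z)) - y  # predicted probability minus label
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        bias -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return bias, weights


# Synthetic cohort: covariate present -> 30 of 100 develop the target,
# covariate absent -> 10 of 100 develop the target.
rows = [[1.0]] * 100 + [[0.0]] * 100
labels = [1] * 30 + [0] * 70 + [1] * 10 + [0] * 90

bias, weights = fit_logistic(rows, labels)
odds_ratios = [exp(w) for w in weights]
print(odds_ratios)
```

With a single binary covariate the fitted odds ratio matches the one read directly off the 2x2 table, (30 × 90) / (70 × 10) ≈ 3.86. With many covariates in the model, each coefficient gives an odds ratio adjusted for the others.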
KNOWLEDGE GRAPHS
AS A BACKBONE
EMMeT Ontology
• Total concepts = 540,632
• 100+ person years of clinical expert knowledge
EXPANDING TO INCLUDE IMAGE SNIPPETS
Enrichment through integration using linked data
CONCLUSION
• Knowledge graphs are critical components for delivering customer value
• AI techniques such as machine learning and predictive modelling from data are key parts of knowledge graph construction
• This is particularly the case as the amount, speed and specificity of data and requirements accelerate
• Leveraging existing assets such as ontologies, data, and controlled vocabularies (i.e. connected data) has been key for Elsevier in the build-out of knowledge graphs
• How all this enables intelligence-based solutions is a topic for another talk
• Oh, and we are hiring
• Paul Groth p.groth@elsevier.com

More Related Content

What's hot

Knowledge Graphs and Generative AI
Knowledge Graphs and Generative AIKnowledge Graphs and Generative AI
Knowledge Graphs and Generative AINeo4j
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchNeo4j
 
Knowledge Graphs and Generative AI_GraphSummit Minneapolis Sept 20.pptx
Knowledge Graphs and Generative AI_GraphSummit Minneapolis Sept 20.pptxKnowledge Graphs and Generative AI_GraphSummit Minneapolis Sept 20.pptx
Knowledge Graphs and Generative AI_GraphSummit Minneapolis Sept 20.pptxNeo4j
 
FIBO in Neo4j: Applying Knowledge Graphs in the Financial Industry
FIBO in Neo4j: Applying Knowledge Graphs in the Financial IndustryFIBO in Neo4j: Applying Knowledge Graphs in the Financial Industry
FIBO in Neo4j: Applying Knowledge Graphs in the Financial IndustryNeo4j
 
ChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptxChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptxJesus Rodriguez
 
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Preferred Networks
 
AI in Finance: Moving forward!
AI in Finance: Moving forward!AI in Finance: Moving forward!
AI in Finance: Moving forward!Adrian Hornsby
 
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITYGENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITYAndre Muscat
 
Power BI Desktop | Power BI Tutorial | Power BI Training | Edureka
Power BI Desktop | Power BI Tutorial | Power BI Training | EdurekaPower BI Desktop | Power BI Tutorial | Power BI Training | Edureka
Power BI Desktop | Power BI Tutorial | Power BI Training | EdurekaEdureka!
 
An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceJulien SIMON
 
Leveraging Generative AI & Best practices
Leveraging Generative AI & Best practicesLeveraging Generative AI & Best practices
Leveraging Generative AI & Best practicesDianaGray10
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models BootcampData Science Dojo
 
Introduction to Azure Synapse Webinar
Introduction to Azure Synapse WebinarIntroduction to Azure Synapse Webinar
Introduction to Azure Synapse WebinarPeter Ward
 
Knowledge Graphs & Graph Data Science, More Context, Better Predictions - Neo...
Knowledge Graphs & Graph Data Science, More Context, Better Predictions - Neo...Knowledge Graphs & Graph Data Science, More Context, Better Predictions - Neo...
Knowledge Graphs & Graph Data Science, More Context, Better Predictions - Neo...Neo4j
 
Generative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGenerative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGene Leybzon
 
Accelerate Your ML Pipeline with AutoML and MLflow
Accelerate Your ML Pipeline with AutoML and MLflowAccelerate Your ML Pipeline with AutoML and MLflow
Accelerate Your ML Pipeline with AutoML and MLflowDatabricks
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelinesjeykottalam
 

What's hot (20)

Knowledge Graphs and Generative AI
Knowledge Graphs and Generative AIKnowledge Graphs and Generative AI
Knowledge Graphs and Generative AI
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based Search
 
Knowledge Graphs and Generative AI_GraphSummit Minneapolis Sept 20.pptx
Knowledge Graphs and Generative AI_GraphSummit Minneapolis Sept 20.pptxKnowledge Graphs and Generative AI_GraphSummit Minneapolis Sept 20.pptx
Knowledge Graphs and Generative AI_GraphSummit Minneapolis Sept 20.pptx
 
FIBO in Neo4j: Applying Knowledge Graphs in the Financial Industry
FIBO in Neo4j: Applying Knowledge Graphs in the Financial IndustryFIBO in Neo4j: Applying Knowledge Graphs in the Financial Industry
FIBO in Neo4j: Applying Knowledge Graphs in the Financial Industry
 
ChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptxChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptx
 
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
 
AI in Finance: Moving forward!
AI in Finance: Moving forward!AI in Finance: Moving forward!
AI in Finance: Moving forward!
 
introduction Azure OpenAI by Usama wahab khan
introduction  Azure OpenAI by Usama wahab khanintroduction  Azure OpenAI by Usama wahab khan
introduction Azure OpenAI by Usama wahab khan
 
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITYGENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY
 
Journey of Generative AI
Journey of Generative AIJourney of Generative AI
Journey of Generative AI
 
Power BI Desktop | Power BI Tutorial | Power BI Training | Edureka
Power BI Desktop | Power BI Tutorial | Power BI Training | EdurekaPower BI Desktop | Power BI Tutorial | Power BI Training | Edureka
Power BI Desktop | Power BI Tutorial | Power BI Training | Edureka
 
An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging Face
 
Leveraging Generative AI & Best practices
Leveraging Generative AI & Best practicesLeveraging Generative AI & Best practices
Leveraging Generative AI & Best practices
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models Bootcamp
 
Introduction to Azure Synapse Webinar
Introduction to Azure Synapse WebinarIntroduction to Azure Synapse Webinar
Introduction to Azure Synapse Webinar
 
Knowledge Graphs & Graph Data Science, More Context, Better Predictions - Neo...
Knowledge Graphs & Graph Data Science, More Context, Better Predictions - Neo...Knowledge Graphs & Graph Data Science, More Context, Better Predictions - Neo...
Knowledge Graphs & Graph Data Science, More Context, Better Predictions - Neo...
 
Generative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGenerative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First Session
 
Accelerate Your ML Pipeline with AutoML and MLflow
Accelerate Your ML Pipeline with AutoML and MLflowAccelerate Your ML Pipeline with AutoML and MLflow
Accelerate Your ML Pipeline with AutoML and MLflow
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelines
 

Similar to Knowledge graph construction for research & medicine

Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsPaul Groth
 
Applying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domainApplying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domainAngelo Salatino
 
The real world of ontologies and phenotype representation: perspectives from...
The real world of ontologies and phenotype representation:  perspectives from...The real world of ontologies and phenotype representation:  perspectives from...
The real world of ontologies and phenotype representation: perspectives from...Maryann Martone
 
The Neuroscience Information Framework:The present and future of neuroscience...
The Neuroscience Information Framework:The present and future of neuroscience...The Neuroscience Information Framework:The present and future of neuroscience...
The Neuroscience Information Framework:The present and future of neuroscience...Neuroscience Information Framework
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsPaul Groth
 
The possibility and probability of a global Neuroscience Information Framework
The possibility and probability of a global Neuroscience Information Framework The possibility and probability of a global Neuroscience Information Framework
The possibility and probability of a global Neuroscience Information Framework Neuroscience Information Framework
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsPaul Groth
 
How do we know what we don’t know: Using the Neuroscience Information Framew...
How do we know what we don’t know:  Using the Neuroscience Information Framew...How do we know what we don’t know:  Using the Neuroscience Information Framew...
How do we know what we don’t know: Using the Neuroscience Information Framew...Maryann Martone
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasAngelo Salatino
 
A knowledge capture framework for domain specific search systems
A knowledge capture framework for domain specific search systemsA knowledge capture framework for domain specific search systems
A knowledge capture framework for domain specific search systemsramakanz
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasAngelo Salatino
 
Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...BaoTramDuong2
 
How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...Maryann Martone
 
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...Artificial Intelligence Institute at UofSC
 
Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014Mark Wilkinson
 
Effective search of bibliographic databases
Effective search of bibliographic databasesEffective search of bibliographic databases
Effective search of bibliographic databasesTarek Tawfik Amin
 

Similar to Knowledge graph construction for research & medicine (20)

Paul Groth
Paul GrothPaul Groth
Paul Groth
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domains
 
Applying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domainApplying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domain
 
The real world of ontologies and phenotype representation: perspectives from...
The real world of ontologies and phenotype representation:  perspectives from...The real world of ontologies and phenotype representation:  perspectives from...
The real world of ontologies and phenotype representation: perspectives from...
 
The Neuroscience Information Framework:The present and future of neuroscience...
The Neuroscience Information Framework:The present and future of neuroscience...The Neuroscience Information Framework:The present and future of neuroscience...
The Neuroscience Information Framework:The present and future of neuroscience...
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
 
The possibility and probability of a global Neuroscience Information Framework
The possibility and probability of a global Neuroscience Information Framework The possibility and probability of a global Neuroscience Information Framework
The possibility and probability of a global Neuroscience Information Framework
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization Systems
 
How do we know what we don’t know: Using the Neuroscience Information Framew...
How do we know what we don’t know:  Using the Neuroscience Information Framew...How do we know what we don’t know:  Using the Neuroscience Information Framew...
How do we know what we don’t know: Using the Neuroscience Information Framew...
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
A knowledge capture framework for domain specific search systems
A knowledge capture framework for domain specific search systemsA knowledge capture framework for domain specific search systems
A knowledge capture framework for domain specific search systems
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...Discover How Scientific Data is Used for the Public Good with Natural Languag...
Discover How Scientific Data is Used for the Public Good with Natural Languag...
 
How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...
 
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
 
Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant ...
Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant ...Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant ...
Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant ...
 
Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014
 
Neuroscience as networked science
Neuroscience as networked scienceNeuroscience as networked science
Neuroscience as networked science
 
Effective search of bibliographic databases
Effective search of bibliographic databasesEffective search of bibliographic databases
Effective search of bibliographic databases
 
Navigating the Neuroscience Data Landscape
Navigating the Neuroscience Data LandscapeNavigating the Neuroscience Data Landscape
Navigating the Neuroscience Data Landscape
 

Knowledge graph construction for research & medicine

  • 1. KNOWLEDGE GRAPH CONSTRUCTION FOR RESEARCH & MEDICINE Paul Groth (@pgroth) pgroth.com Disruptive Technology Director Elsevier Labs (@elsevierlabs) Connected Data London 2017 Contributions: Brad Allen, Pascal Coupet, Sujit Pal, Craig Stanley, Ron Daniel, Alex de Jong
  • 2. Our customers are facing challenges in science and health. Elsevier is in a unique position to make a contribution towards solving these challenges. Global research spend is growing every year.1 Predicted research spend in 2016: $1.9TN (3.4% from 2015). Researchers lack the tools they need to be effective.2 Studies: 70-80% of research asks the wrong questions or cannot be reproduced. Life-saving drugs are expensive to develop.3 $2.5BN median pharmaceutical spend per drug; 1/20 success rate of drugs. Health providers cannot save lives without the best information.4 Preventable medical errors: third largest cause of death in the US (Heart Disease 611k, Cancer 585k, Medical Error 225k, Respiratory Illness 149k). Sources: 1. Industrial Research Institute 2. The Lancet 3. Tufts 4. World Health Organization
  • 3. ELSEVIER’S BUSINESS: PROVIDING ANSWERS FOR RESEARCHERS, DOCTORS AND NURSES My work is moving towards a new field; what should I know? • Journal articles, reference works, profiles of researchers, funders & institutions • Recommendations of people to connect with, reading lists, topic pages How should I treat my patient given her condition & history? • Journal articles, reference works, medical guidelines, electronic health records • Treatment plan with alternatives personalized for the patient How can I master the subject matter of the course I am taking? • Course syllabus, reference works, course objectives, student history • Quiz plan based on the student’s history and course objectives
  • 4. THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER ANSWERS ARE ABOUT THINGS, NOT JUST WORKS Why shouldn’t a search on an author return information about the author, including the author’s works? Where was the author born, when did she live, what is she known for? … All of this is possible, but only if we can make some fundamental changes in our approach to bibliographic description. ... The challenge for us lies in transforming what we can of our data into interrelated “things” without overindulging that metaphor. Coyle, K. (2016). FRBR, before and after: a look at our bibliographical models. Chicago: ALA Editions.
  • 5. KNOWLEDGE GRAPHS DEFINED • Knowledge graphs are “graph structured knowledge bases (KBs) which store factual information in form of relationships between entities” (Nickel, M., Murphy, K., Tresp, V. and Gabrilovich, E. (2015). A review of relational machine learning for knowledge graphs. arXiv:1503.00759v3) • Knowledge graphs are metadata evolved beyond the focus on the work, linking people, concepts, things and events • Knowledge graphs are focused on things to provide answers
  • 9. ELSEVIER’S KNOWLEDGE PLATFORM Layers: Products; Data & Content Sources; Knowledge Graphs; Platforms & Shared Services; Entity Hubs. Diagram labels: Usage logs, Pathways, EHRs, Articles, Authors, Institutions, Syllabi, Citations, Chemicals, Books, Drugs, Funders. Funder Hub, Article Hub, Profile Hub, Journal Hub, Institution Hub. Research, Healthcare, Life Sciences. Content, Life Sciences, Search, Identity, Research. Reaxys, CK, Sherpath, Scopus, SD, ROS.
  • 10. THE GROWTH OF SCIENCE COMPLICATES OUR EFFORTS
  • 11. MORE DOMAINS & MORE SPECIFICITY Gregory, K., Groth, P., Cousijn, H., Scharnhorst, A., & Wyatt, S. (2017). Searching Data: A Review of Observational Data Retrieval Practices. arXiv preprint arXiv:1707.06937. Some observations from @gregory_km survey: 1. The needs and behaviours of specific user groups (e.g. early career researchers, policy makers, students) are not well documented. 2. Background uses of observational data are better documented than foreground uses. 3. Reconstructing data tables from journal articles, using general search engines, and making direct data requests are common.
  • 15. HOW - OVERVIEW Content: Books, Articles, Ontologies ... • Identification of concepts • Disambiguation • Domain/sub-domain identification • Abbreviations, variants • Gazetteering • Identification and classification of text snippets around concepts • Feature building for concept/snippet pairs • Lexical, syntactic, semantic, doc structure … • Ranking concept/snippet pairs • Machine learning • Hand-made rules • Similarities • Deduplication Technologies: NLP, ML • Curation • White list driven • Black list • Corrections/improvements • Evaluation • Gold set by domain • Random set by domain • By SMEs (Subject Matter Experts) • Automation • Content Enrichment Framework • Taxonomy coverage extension Knowledge Graph: Concepts, snippets, metadata, …
  • 16. OmniScience Neuroscience: extension vocabularies by domains to provide coverage
      Vocabulary                                              Number of Concepts   Number of Labels
      OmniScience 01.16.11                                    45969                47421
      OmniScience Neuroscience branch 21/11/2016              2356                 2455
      OmniScience Extension Neuroscience branch 21/11/2016    23932                101276
  • 17. Examples of good and bad snippets
      Concept: Inferior Colliculus
      Bad: By comparing activation obtained in an equivalent standard (non-cardiac-gated) fMRI experiment, Guimaraes and colleagues found that cardiac-gated activation maps yielded much greater activation in subcortical nuclei, such as the inferior colliculus.
      Good: The inferior colliculus (IC) is part of the tectum of the midbrain (mesencephalon) comprising the quadrigeminal plate (Lamina quadrigemina). It is located caudal to the superior colliculus on the dorsal surface of the mesencephalon (Figure 36.7: Overview of the human brainstem; view from dorsal. The superior and inferior colliculi form the quadrigeminal plate. Parts of the cerebellum are removed.). The ventral border is formed by the lateral lemniscus. The inferior colliculus is the largest nucleus of the human auditory system. …
      Concept: Purkinje cells
      Bad: It is felt that the aminopyridines are likely to increase the excitability of the potassium channel-rich cerebellar Purkinje cells in the flocculus (Etzion and Grossman, 2001).
      Good: Purkinje cells are the most salient cellular elements of the cerebellar cortex. They are arranged in a single row throughout the entire cerebellar cortex between the molecular (outer) layer and the granular (inner) layer. They are among the largest neurons and have a round perikaryon, classically described as shaped “like a chianti bottle,” with a highly branched dendritic tree shaped like a candelabrum and extending into the molecular layer where they are contacted by incoming systems of afferent fibers from granule neurons and the brainstem…
      Concept: Olfactory Bulb
      Bad: The most common sites used for induction of kindling include the amygdala, perforant path, dorsal hippocampus, olfactory bulb, and perirhinal cortex.
      Good: The olfactory bulb is the first relay station of the central olfactory system in the vertebrate brain and contains in its superficial layer a few thousand glomeruli, spherical neuropils with sharp borders (Figure 1: Axonal projection pattern of olfactory sensory neurons to the glomeruli of the rodent olfactory bulb. The olfactory epithelium in rats and mice is divided into four zones (zones 1–4). A given odorant receptor is expressed by sensory neurons located within one zone of the epithelium. Individual olfactory sensory neurons express a single odorant receptor…
  • 19. One Weird Trick from Natural Language Processing (NLP) • Knowledge bases are populated by scanning text and doing Information Extraction • Most information extraction systems are looking for very specific things, like drug-drug interactions • Best accuracy for that one kind of data, but misses out on all the other concepts and relations in the text • For a broad knowledge base, use Open Information Extraction, which only uses some knowledge of grammar • The weird trick for open information extraction is a simple algorithm, known as ReVerb*: 1. Find “relation phrases” starting with a verb and ending with a verb or preposition 2. Find noun phrases before and after the relation phrase 3. Discard relation phrases not used with multiple combinations of arguments. Example sentence: In addition, brain scans were performed to exclude other causes of dementia. * Fader et al. Identifying Relations for Open Information Extraction
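The three ReVerb steps on this slide can be sketched in a few lines. This is a toy illustration under stated assumptions, not the ReVerb implementation: the one-letter part-of-speech tags are assigned by hand (a real system would run a tagger and a noun-phrase chunker), the two regular expressions are simplified stand-ins for ReVerb's syntactic patterns, and the two extra "inhibits" sentences and all function names are invented for the example.

```python
import re
from collections import defaultdict

# Hand-assigned coarse tags (assumption): V verb, N noun, A adjective,
# D determiner, R adverb, P preposition/particle/"to", X anything else.
RELATION = re.compile(r"(?:V+(?:[NADR]*P)?)+")  # step 1: starts with a verb, ends with a verb or preposition
NOUN_PHRASE = re.compile(r"[DA]*N+")            # step 2: very simple noun-phrase chunk


def extract_triples(tagged):
    """Return (arg1, relation phrase, arg2) triples from one tagged sentence."""
    words = [word for word, _ in tagged]
    tags = "".join(tag for _, tag in tagged)

    def text(match):
        return " ".join(words[match.start():match.end()])

    triples = []
    for rel in RELATION.finditer(tags):
        left = list(NOUN_PHRASE.finditer(tags, 0, rel.start()))  # noun phrases before the relation
        right = NOUN_PHRASE.match(tags, rel.end())               # noun phrase right after it
        if left and right:
            triples.append((text(left[-1]), text(rel), text(right)))
    return triples


def filter_relations(triples, min_pairs=2):
    """Step 3: keep relation phrases seen with at least min_pairs distinct argument pairs."""
    pairs = defaultdict(set)
    for arg1, rel, arg2 in triples:
        pairs[rel].add((arg1, arg2))
    return {rel: sorted(p) for rel, p in pairs.items() if len(p) >= min_pairs}


slide_sentence = [("In", "P"), ("addition", "N"), (",", "X"), ("brain", "N"), ("scans", "N"),
                  ("were", "V"), ("performed", "V"), ("to", "P"), ("exclude", "V"),
                  ("other", "A"), ("causes", "N"), ("of", "P"), ("dementia", "N"), (".", "X")]
toy_corpus = [
    slide_sentence,
    [("Aspirin", "N"), ("inhibits", "V"), ("platelet", "N"), ("aggregation", "N")],
    [("Ibuprofen", "N"), ("inhibits", "V"), ("cyclooxygenase", "N")],
]
all_triples = [t for sentence in toy_corpus for t in extract_triples(sentence)]
kept = filter_relations(all_triples)
```

On this toy corpus the long relation phrase found in the slide's sentence occurs with only one argument pair, so step 3 drops it, while "inhibits" survives because it occurs with two.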
  • 20. ReVerb output: SD documents scanned: 14,000,000; extracted ReVerb triples: 473,350,566
  • 21. Universal schemas – Predict ‘missing’ KG facts • Make a matrix: • columns for the relation phrases from ReVerb or the semantic relations from EMMeT • rows are the pairs of concepts linked by a relation • A ‘1.0’ in a cell if those concepts were linked by that relation • Outlined cells in diagram are the ones initialized to 1. • Factorize the matrix into ExK and KxR factors, then recombine. • “Learns” the correlations between text relations and EMMeT relations, in the context of pairs of objects. • Cells going from 0 to > 0 indicate potential new triples. • Find new triples to go into EMMeT, e.g., (glaucoma, has_alternativeProcedure, biofeedback)
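The factorize-then-recombine idea on this slide can be sketched as follows. It is a minimal illustration, assuming a plain squared-error factorization fitted by gradient descent; the slide does not say which loss or optimizer was used. The matrix values, concept pairs other than the slide's glaucoma example, the two text relation phrases, and the function name `factorize` are all made up for the example.

```python
import random


def factorize(X, k=2, steps=3000, lr=0.02, seed=0):
    """Fit X ~ E (rows x k) times R^T (k x cols) by full-batch gradient descent.

    Returns (scores, initial_loss, final_loss); scores has the same shape as X.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    E = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]  # one vector per concept pair
    R = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(m)]  # one vector per relation

    def scores():
        return [[sum(E[i][f] * R[j][f] for f in range(k)) for j in range(m)] for i in range(n)]

    def loss(S):
        return sum((S[i][j] - X[i][j]) ** 2 for i in range(n) for j in range(m))

    initial = loss(scores())
    for _ in range(steps):
        S = scores()
        err = [[S[i][j] - X[i][j] for j in range(m)] for i in range(n)]
        new_E = [[E[i][f] - lr * 2 * sum(err[i][j] * R[j][f] for j in range(m))
                  for f in range(k)] for i in range(n)]
        new_R = [[R[j][f] - lr * 2 * sum(err[i][j] * E[i][f] for i in range(n))
                  for f in range(k)] for j in range(m)]
        E, R = new_E, new_R
    final = scores()
    return final, initial, loss(final)


# Hypothetical toy matrix: rows are concept pairs, columns are two text relation
# phrases plus one ontology relation. A 1 means the pair was seen with that relation.
pairs = [("glaucoma", "biofeedback"), ("migraine", "biofeedback"),
         ("hypertension", "biofeedback"), ("glaucoma", "trabeculectomy")]
relations = ["is treated with", "responds to", "has_alternativeProcedure"]
X = [[1, 1, 0],
     [1, 1, 1],
     [1, 0, 1],
     [1, 0, 0]]

recombined, initial_loss, final_loss = factorize(X)
# Cells that were 0 in X, ranked by their recombined score: the higher the score,
# the stronger the suggestion that the triple is a missing fact worth reviewing.
candidates = sorted(((recombined[i][j], pairs[i], relations[j])
                     for i in range(len(X)) for j in range(len(X[0])) if X[i][j] == 0),
                    reverse=True)
```

The low rank K is what forces generalization: with fewer latent dimensions than relations, the model cannot memorize each cell and has to reuse correlations between columns.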
  • 23. Medical Graph – Statistical correlations at scale. Diagram nodes: I65 Occlusion and stenosis of precerebral arteries, G40 Epilepsy, C71 Malignant neoplasm of brain, I61 intracerebral hemorrhage; edges labelled has_successor; odds ratio: 1.12. has_successor criteria1: • Correlation selected by predictive modelling algorithm • No. of relations is higher than in mirrored relation • p-value < 0.05 • Odds ratios balanced over all covariates. 1 Criteria based on: Jensen et al.: Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients. Nature Communications, 2014 Jun 24;5:4022. doi: 10.1038/ncomms5022. Data: primary care, secondary care, drug prescriptions, other covariates; 5m patients each; 6 years longitudinality
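Two of the has_successor criteria above (p-value below 0.05, and more relations in the forward than in the mirrored direction) can be sketched with a plain 2x2 table. This is only an illustration: the slide's odds ratios come from a predictive model balanced over all covariates, whereas the sketch computes an unadjusted odds ratio with a Wald test, and the counts and function names are hypothetical.

```python
import math


def odds_ratio_with_p(a, b, c, d):
    """Unadjusted odds ratio and two-sided Wald p-value from a 2x2 table.

    a: first diagnosis present, later diagnosis present
    b: first diagnosis present, later diagnosis absent
    c: first diagnosis absent,  later diagnosis present
    d: first diagnosis absent,  later diagnosis absent
    """
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(odds ratio)
    z = math.log(odds_ratio) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))                 # two-sided normal tail
    return odds_ratio, p_value


def is_successor_candidate(p_value, n_forward, n_mirrored, alpha=0.05):
    """Apply the significance and direction criteria (A before B more often than B before A)."""
    return p_value < alpha and n_forward > n_mirrored


# Hypothetical counts, not taken from the talk.
strong_or, strong_p = odds_ratio_with_p(30, 70, 10, 90)
weak_or, weak_p = odds_ratio_with_p(20, 80, 10, 90)
```

With these made-up counts the first table passes the p-value criterion and the second narrowly misses it, which shows why the direction check alone is not enough.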
  • 24. Medical Graph in practice, patient 35: risk of depression • 49 year old man • Dx: overweight, diabetes, hypertension, anxiety disorder → has an absolute risk of 36% of developing depression within the next 4 years
  • 25. … and the rationale for why the model thinks this
  • 26. Analysis Design: predict 4 year long-term effects, balanced for all co-variables • Targets for prediction: ICD-coded diagnoses • Only incident patients per diagnosis considered, i.e. diagnosis-free 2009 – 2010 • If these patients remain diagnosis-free 2011 - 2014 (observation period), then 0 else 1 • Covariates: all ICD-/ATC-codes, age and sex measured in 2010. Example: model to predict “I50 – Heart Failure”. Timeline: I50-free patients 2009 - 2010 (covariates measured 2010); 2011 - 2014: remaining I50-free patients (I50 -, coded as 0) / newly I50 diagnosed patients (I50 +, coded as 1)
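The labelling rule in this analysis design can be written down directly. The sketch below assumes each patient is represented by the set of years in which the target code (for example I50) was recorded; that data layout, the patient ids, and the function name are assumptions for illustration.

```python
def label_incident_patients(diagnosis_years, baseline=(2009, 2010), observation=(2011, 2014)):
    """Map patient id -> 0/1 target for one diagnosis code.

    diagnosis_years: {patient_id: set of years in which the target code was recorded}
    Patients with the code during the baseline window are excluded (not incident).
    """
    labels = {}
    for patient_id, years in diagnosis_years.items():
        if any(baseline[0] <= year <= baseline[1] for year in years):
            continue  # already diagnosed in 2009-2010, so not part of the cohort
        diagnosed_later = any(observation[0] <= year <= observation[1] for year in years)
        labels[patient_id] = 1 if diagnosed_later else 0
    return labels


# Hypothetical patients: p1 never diagnosed, p2 newly diagnosed in 2012,
# p3 and p4 already diagnosed during the baseline window.
example = {"p1": set(), "p2": {2012}, "p3": {2009, 2013}, "p4": {2010}}
labels = label_incident_patients(example)
```

Excluding prevalent cases keeps the model focused on predicting new diagnoses rather than rediscovering existing ones.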
  • 27. (A) integrate & clean: Research on anonymized claims data. Billing data flow from 60+ sickness funds. Primary care: visits & diagnoses. Secondary care: visits, diagnoses & procedures. Drug prescriptions: drug prescriptions. Other data: further cooperations just started; will enable analysis of vital and laboratory parameters. Data integration & cleaning • Data cleaning • Longitudinally linked & integrated for analytics • Anonymized. 6 million patients, 6 years, > 1.5b events
  • 28. (B) mine & learn: Calculate statistics & build prediction models for ~1600 targets. Technology stack: feature extraction – For 3.8m patients: • age, gender • all diagnoses: ICD10-coded, 3 digits, i.e. 2054 codes • all medications: ATC-coded, 5 digits, i.e. 906 codes • death, hospitalization. Results in: 6277 features • 1623 targets, 2011-2014 • 2320 covariates, 2010 • 2334 filter-columns, 2009-2010. data mining – Calculate prevalence, incidence, mean age for all covariates (i.e. diseases and medications). machine learning – Predictive modelling for ~1600 targets • Linear classification model, resulting in odds ratios • Calculation of p-values
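The feature extraction step described here (diagnoses truncated to 3-character ICD-10 codes, medications to 5-character ATC codes, plus age and gender) might look roughly like this. The dictionary layout, the key prefixes, and the function name are assumptions; the slide gives only the truncation lengths and the resulting feature counts.

```python
def patient_features(age, sex, icd_codes, atc_codes):
    """Build a sparse feature dict: demographics plus one indicator per truncated code."""
    features = {"age": age, "sex": sex}
    for code in icd_codes:
        features["ICD:" + code.replace(".", "")[:3]] = 1  # e.g. E11.9 -> E11
    for code in atc_codes:
        features["ATC:" + code[:5]] = 1                   # e.g. A10BA02 -> A10BA
    return features


# Hypothetical patient record, for illustration only.
row = patient_features(49, "m", ["E11.9", "I10", "E11.2"], ["A10BA02"])
```

Truncating codes merges rare, very specific codes into broader groups, which keeps the feature matrix to a few thousand columns.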
  • 29. KNOWLEDGE GRAPHS AS A BACKBONE
  • 31. EMMeT Ontology • Total concepts = 540,632 • 100+ person years of clinical expert knowledge
  • 32. EXPANDING TO INCLUDE IMAGE SNIPPETS
  • 33. Enrichment through integration using linked data
  • 34. CONCLUSION • Knowledge graphs are critical components for delivering customer value • AI techniques such as machine learning and predictive modelling from data are key parts of knowledge graph construction • This is particularly the case as the amount, speed and specificity of data and requirements accelerate • Leveraging existing assets such as ontologies, data, and controlled (i.e. connected data) has been key for Elsevier in the build out of knowledge graphs • How all this enables intelligence-based solutions is a topic for another talk • Oh and we are hiring • Paul Groth p.groth@elsevier.com

Editor's Notes

  1. Work with DANS. Reviewed 400 papers; deep dive on 114.