A CONTEXT-DRIVEN SUBGRAPH MODEL FOR
LITERATURE-BASED DISCOVERY
PH.D. DISSERTATION DEFENSE
DELROY CAMERON
AUGUST 18, 2014
PH.D. COMMITTEE
AMIT P. SHETH (ADVISOR)
KRISHNAPRASAD THIRUNARAYAN
MICHAEL RAYMER
RAMAKANTH KAVULURU (UKY)
THOMAS C. RINDFLESCH (NIH)
VARUN BHAGWAN (YAHOO! LABS)
All truths are easy to understand once they are discovered; the point is to discover them. (Galileo Galilei, 1564–1642)
Historical Perspectives
Gregor Johann Mendel (1822–1884): Mendelian Laws of Inheritance (1866)
Theodor Boveri (1862–1915) and Walter Sutton (1877–1916): Boveri-Sutton Chromosome Theory (1903)
Science of Making Discoveries
Discovery
Information Processing System
What is promising?
Thesis Statement
An information processing system that leverages rich representations
of textual content from scientific literature based on implicit and explicit
context can provide effective means for literature-based discovery.
Motivation
1999: Rofecoxib (Vioxx, Merck & Co.) TREATS Osteoarthritis
2002: Increased risk of heart attack
2004: Vioxx withdrawn; risk confirmed by clinical trial
2005: $254.3 million settlement
2007: $4.85 billion settlement
2011: $950 million settlement
2013: $23 million settlement
Motivation
Literature-Based Discovery (LBD)
Literature-Based Discovery (LBD)
Models: ABC Model (A – B – C); AnC Model (A – B1, B2, …, Bi – C); Context-Driven Subgraph Model
[Photo: Don R. Swanson. Source: Wikipedia - http://en.wikipedia.org/wiki/Don_R._Swanson]
Timeline of LBD systems (axis years: 1986, 1996, 1999, 2001, 2003, 2004, 2005, 2006, 2010, 2011, 2013)
Keyword-based
• ARROWSMITH v1: Term Frequency
• IRIDESCENT: Term Co-occurrence
• CoPub: Keywords, Mutual Information
Concept-based
• DAD: MetaMap, UMLS
• Litlinker: MeSH, UMLS, Rules
• Manjal: UMLS, MeSH, Topic Profiles, TF-IDF
• Rajolink: MeSH, Rarity
• BITOLA: UMLS, MeSH, Association Rules, Confidence
Relations-based
• SemBT: Semantic Predications, Level of Support
• BioSbKDS: UMLS Relations, MeSH
• Discovery Patterns: A – B – C chains with predicates such as CAUSES and INHIBITS
Graph-based
• ACS (2004): MeSH, Hebbian Learning
• Discovery Browsing: Degree Centrality, Cooperative Reciprocity, Manual
• Contribution #1: Context-Driven Subgraph Model for LBD (A and C linked through several predicates, e.g., PRODUCES and INHIBITS)
Hybrid
• ARROWSMITH v2: 8 Features (2007)
• Semantic MEDLINE: Summarization, Discovery Browsing
• Epiphanet: Predication-based Semantic Indexing
Literature-based discovery refers to the use of papers and other academic publications
(the “literature”) to find new relationships between existing knowledge (the “discovery”).
Definition courtesy of Wikipedia: http://en.wikipedia.org/wiki/Literature-based_discovery
Application: Raynaud Syndrome – Fish Oil
[Figure: subgraph connecting Dietary Fish Oils to Raynaud Syndrome through Prostaglandin I3, Prostaglandin, Epoprostenol and Platelet Aggregation, using the predicates ISA, CONVERTS_TO, STIMULATES, DISRUPTS, CAUSES and TREATS]
D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition
of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13). 46(2): 238–251, 2013.
Representations of the same hypothesis:
• Keyword/Concept based: Dietary Fish Oils – Platelet Aggregation – Raynaud Syndrome
• Relations based: Dietary Fish Oils DISRUPTS Platelet Aggregation; Platelet Aggregation CAUSES Raynaud Syndrome
• Subgraph based: the full subgraph, including inferred predicates
Comparison
Table 1: Comparison of intermediates rediscovered for Raynaud Syndrome – Dietary Fish Oil
Systems compared: Cameron [19], Srinivasan [88, 89], Weeber [101, 102], Gordon [36, 37, 38], Hristovski [40], Ramakrishnan [72]*
• Blood Viscosity: × × × × × (Ramakrishnan: ?)
• Platelet Aggregation: × × × × × (Ramakrishnan: ?)
• Vascular Reactivity: × × × × (Ramakrishnan: ?)
[Figure: decomposition of Swanson's hypothesis. Dietary Fish Oils connect to Raynaud Syndrome through Platelet Aggregation, Prostaglandins, Prostacyclin (PGI2), Prostaglandin I3 (PGI3), Vasoconstriction and Blood Viscosity; Dietary Fish Oils are linked by ISA to Fatty Acid, Essential Fatty Acid, Triglyceride and Lipid; the associations are labelled with themes such as Cellular Activity, Blood Physiology and Tissue Function]
Problem: How to automate this?
Literature-Based Discovery
Context-Driven Subgraph Model
Foundations
Automatic Subgraph Creation
Experimental Results
Dissertation Contributions
Knowledge Exploration
Limitations & Future Work
Predications Graph
Subgraph Model: Predications Graph (G) → Candidate Graph (RG) → Subgraphs (SG)
No two contexts are the same: the relationship R(s,t) between s and t splits into context-specific relationships R(s,t)(c1), R(s,t)(c2), …, R(s,t)(ck).
What is context?
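The following is a minimal sketch of the step from the predications graph G to a candidate graph, under several assumptions. G is held as (subject, PREDICATE, object) triples, edges are treated as undirected, and candidate paths are the A–B–C chains that link a source concept to a target concept. The triples and the helper `candidate_paths` are hypothetical illustrations, not the dissertation's implementation.

```python
from collections import defaultdict

# Hypothetical predications graph G as (subject, PREDICATE, object) triples.
PREDICATIONS = [
    ("Dietary Fish Oils", "INHIBITS", "Platelet Aggregation"),
    ("Platelet Aggregation", "CAUSES", "Raynaud Syndrome"),
    ("Dietary Fish Oils", "INHIBITS", "Blood Viscosity"),
    ("Blood Viscosity", "CAUSES", "Raynaud Syndrome"),
    ("Dietary Fish Oils", "ISA", "Lipids"),
]


def build_adjacency(triples):
    """Map each concept to (neighbour, predication) pairs, ignoring edge direction."""
    adjacency = defaultdict(list)
    for subj, pred, obj in triples:
        adjacency[subj].append((obj, (subj, pred, obj)))
        adjacency[obj].append((subj, (subj, pred, obj)))
    return adjacency


def candidate_paths(triples, source, target):
    """Return every source-B-target path as a pair of predications."""
    adjacency = build_adjacency(triples)
    paths = []
    for middle, first in adjacency[source]:
        if middle == target:
            continue  # skip direct source-target links
        for end, second in adjacency[middle]:
            if end == target:
                paths.append((first, second))
    return paths


for first, second in candidate_paths(PREDICATIONS, "Dietary Fish Oils", "Raynaud Syndrome"):
    print(" ".join(first), "|", " ".join(second))
```

Each returned path keeps both predications rather than only the intermediate concept B. This lets later steps attach a context to the path as a whole.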
• Path Relatedness
• Semantic Predication Context
Context Distribution Assumption: The context of a semantic predication
can be expressed as the distribution of all MeSH descriptors associated
with all articles that contain it.
Semantic Underpinnings: Relational Semantic Summary, Textual Semantic Summary, Concept-Level Semantic Summary
Interchangeability Assumption: The concept-level and relational semantic
summary of a MEDLINE article are interchangeable.
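The Context Distribution Assumption can be sketched as follows. The sketch assumes that each article is represented by its MeSH descriptors and by the predications extracted from it; the toy articles are invented. The context of a predication is then the normalized count of descriptors over every article that contains it. The name `predication_context` is hypothetical.

```python
from collections import Counter

# Invented articles: MeSH descriptors plus the predications extracted from each.
ARTICLES = [
    {"mesh": ["Platelet Aggregation", "Fish Oils", "Blood Viscosity"],
     "predications": {("Dietary Fish Oils", "INHIBITS", "Platelet Aggregation")}},
    {"mesh": ["Platelet Aggregation", "Epoprostenol"],
     "predications": {("Dietary Fish Oils", "INHIBITS", "Platelet Aggregation"),
                      ("Epoprostenol", "DISRUPTS", "Platelet Aggregation")}},
    {"mesh": ["Raynaud Disease"],
     "predications": {("Platelet Aggregation", "CAUSES", "Raynaud Syndrome")}},
]


def predication_context(predication, articles):
    """Normalized MeSH descriptor distribution over all articles containing the predication."""
    counts = Counter()
    for article in articles:
        if predication in article["predications"]:
            counts.update(article["mesh"])
    total = sum(counts.values())
    return {descriptor: n / total for descriptor, n in counts.items()} if total else {}


print(predication_context(("Dietary Fish Oils", "INHIBITS", "Platelet Aggregation"), ARTICLES))
```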
Linguistic Underpinnings
Linguistic items with similar distributions have similar meanings
“You shall know a word by the company it keeps” – J. R. Firth, 1957
Semantic Predications with shared contexts in their distributions are related
Distributional Semantics
Context-sensitive nature of meaning
MeSH Hierarchy
Automatic Subgraph Creation
[Figure: two paths, pi and pj, each annotated with MeSH descriptors (m1, m2, m5, m7, m8, m9) from the MeSH Hierarchy; the semantic relatedness of their MeSH context vectors measures how related the paths are]
Contribution #2: Context of a path as a vector of MeSH Descriptors
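The slide states only that a path's context is a vector of MeSH descriptors. For illustration, the sketch below assumes that this vector is the element-wise sum of the descriptor counts of the predications on the path, projected onto a fixed descriptor order m1, m2, …. The counts are made up.

```python
from collections import Counter


def path_context(predication_contexts):
    """Sum MeSH descriptor counts over the predications on a path (assumed aggregation)."""
    total = Counter()
    for ctx in predication_contexts:
        total.update(ctx)
    return total


def to_vector(context, vocabulary):
    """Project a context onto a fixed, ordered MeSH vocabulary."""
    return [context.get(m, 0) for m in vocabulary]


VOCAB = ["m1", "m2", "m5", "m7", "m8", "m9"]
p_i = path_context([Counter({"m1": 2, "m7": 1}), Counter({"m2": 1, "m8": 3})])
p_j = path_context([Counter({"m1": 1, "m5": 2}), Counter({"m8": 1, "m9": 1})])
print(to_vector(p_i, VOCAB))  # [2, 1, 0, 1, 3, 0]
print(to_vector(p_j, VOCAB))  # [1, 0, 2, 0, 1, 1]
```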
Path Relatedness
[Figure: MeSH context vectors C(pi) and C(pj) for paths pi and pj, with descriptor counts over m1–m13]
Objective #1: Maximize weights of In-Context Descriptors
Objective #2: Minimize weights of Out-Of-Context Descriptors
p – path; t – semantic predication
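One simple reading of the two objectives is sketched below, offered as an assumption rather than the dissertation's weighting. Descriptors present in both C(pi) and C(pj) are in-context and keep their weight. Descriptors present in only one vector are out-of-context and are scaled down by a hypothetical factor `alpha`.

```python
def split_context(c_i, c_j):
    """Split descriptors into in-context (in both vectors) and out-of-context (in one only)."""
    present_i = {m for m, w in c_i.items() if w > 0}
    present_j = {m for m, w in c_j.items() if w > 0}
    return present_i & present_j, present_i ^ present_j


def reweight(context, in_context, alpha=0.1):
    """Keep in-context weights; scale out-of-context weights by a hypothetical alpha."""
    return {m: (w if m in in_context else alpha * w) for m, w in context.items() if w > 0}


c_i = {"m1": 3, "m2": 2, "m6": 2}
c_j = {"m1": 5, "m2": 4, "m3": 2, "m4": 5}
shared, unshared = split_context(c_i, c_j)
print(sorted(shared), sorted(unshared))  # ['m1', 'm2'] ['m3', 'm4', 'm6']
print(reweight(c_i, shared))
```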
Path Relatedness: Shared Context
[Figure: binary views of C(pi) and C(pj). Descriptors such as Platelet Aggregation, Platelet Activation, Platelet Adhesiveness, Epoprostenol and Prostaglandins are matched through the MeSH G-Tree (platelet aggregation; hemostasis; Blood physiological process; Blood physiological phenomena; Circulatory and respiratory physiological phenomena) and D-Tree (Epoprostenol; Prostaglandins I; Prostaglandins; Eicosanoids; Arachidonic Acids; Fatty Acids, Unsaturated; Fatty Acids; Lipids)]
Contribution #3: Structured Background Knowledge for computing shared context of paths
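Exact matching misses descriptors that differ in name but sit close together in MeSH; the structured background knowledge addresses this gap. The sketch below makes three assumptions. Dice similarity is computed over ancestor sets. The hierarchy is simplified and only loosely based on the G-Tree and D-Tree fragments on the slide; it does not use real MeSH tree numbers. A hypothetical cut-off of 0.5 stands in for the manually chosen threshold.

```python
# Simplified child -> parent links, loosely following the slide's G-Tree and
# D-Tree fragments; these are illustrative, not actual MeSH tree numbers.
PARENT = {
    "Platelet Aggregation": "Platelet Activation",
    "Platelet Adhesiveness": "Platelet Activation",
    "Platelet Activation": "Hemostasis",
    "Hemostasis": "Blood Physiological Phenomena",
    "Blood Physiological Phenomena": "Circulatory and Respiratory Physiological Phenomena",
    "Epoprostenol": "Prostaglandins I",
    "Prostaglandins I": "Prostaglandins",
    "Prostaglandins": "Eicosanoids",
    "Eicosanoids": "Lipids",
}


def ancestors(descriptor):
    """The descriptor plus every ancestor up to the root."""
    chain = {descriptor}
    while descriptor in PARENT:
        descriptor = PARENT[descriptor]
        chain.add(descriptor)
    return chain


def dice(a, b):
    """Dice similarity of two ancestor sets: 2|A & B| / (|A| + |B|)."""
    set_a, set_b = ancestors(a), ancestors(b)
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def semantic_shared_context(c_i, c_j, threshold=0.5):
    """Descriptor pairs from two path contexts that are close in the hierarchy."""
    return {(a, b) for a in c_i for b in c_j if dice(a, b) >= threshold}


print(dice("Platelet Aggregation", "Platelet Adhesiveness"))  # 0.8
print(dice("Platelet Aggregation", "Epoprostenol"))  # 0.0
```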
Path Relatedness Score
*Dictionary of Distances, Elena Deza, Michel-Marie Deza, Elsevier, 2006
Hierarchical Agglomerative Clustering
[Figure: candidate A–C paths start in separate buckets (Iteration 1) and are merged over successive iterations (Iteration n) according to the Path Relatedness Threshold]
1. Bucket Population
2. Bucket Merging
3. Subgraph Ranking
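The first two steps can be sketched as threshold-based agglomerative clustering. This illustrative version rests on three assumptions. Every path starts in its own bucket. Buckets are compared by group-average relatedness. Merging stops once no pair reaches the path relatedness threshold. The scores are invented. Because the loop rescans all bucket pairs on each iteration, it is slower than the Θ(N² log N) bound on the Algorithm slide; it trades speed for readability.

```python
from itertools import combinations

# Invented path relatedness scores; unlisted pairs default to 0.5.
SCORES = {
    frozenset(("p1", "p2")): 4.0,
    frozenset(("p3", "p4")): 3.5,
}


def relatedness(x, y):
    return SCORES.get(frozenset((x, y)), 0.5)


def group_average(a, b, sim):
    """Mean pairwise relatedness between two buckets."""
    return sum(sim(x, y) for x in a for y in b) / (len(a) * len(b))


def agglomerate(paths, sim, threshold):
    # 1. Bucket population: each path starts in a bucket of its own.
    buckets = [[p] for p in paths]
    # 2. Bucket merging: join the most related pair while it clears the threshold.
    while len(buckets) > 1:
        (i, j), best = max(
            (((i, j), group_average(buckets[i], buckets[j], sim))
             for i, j in combinations(range(len(buckets)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        buckets[i] = buckets[i] + buckets[j]
        del buckets[j]  # j > i, so index i is unaffected
    return buckets


print(agglomerate(["p1", "p2", "p3", "p4"], relatedness, threshold=3.0))
# [['p1', 'p2'], ['p3', 'p4']]
```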
Summary of Metrics
• Path Relatedness
– Model: MeSH Context Vectors
– Metrics: Semantics-enhanced shared context, Log Reduction
– Threshold: ??
• MeSH Semantic Similarity
– Model: MeSH Hierarchy
– Metric: Dice Similarity
– Threshold: Set manually
Automatic Threshold Selection
[Figure: RS-DFO experiment; number of path pairs versus path relatedness score follows an approximately Gaussian distribution; manual threshold = 3.0]
[Figure: Gaussian function; expected value versus path relatedness score]
[Figure: Gaussian distribution with its points of inflection. Diagram courtesy of Wikipedia*]
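The summary of metrics sets the path relatedness threshold at the 2nd standard deviation from the mean of a Gaussian. The sketch assumes the Gaussian is fitted by moments, that is, by the mean and population standard deviation of the observed scores; the dissertation may fit it differently. The inflection points of a Gaussian density lie at μ ± σ, so σ is a natural unit for the cut-off. The scored pairs are invented.

```python
import statistics


def gaussian_threshold(scores, k=2.0):
    """Mean plus k standard deviations of the observed path relatedness scores."""
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores, mu)
    return mu + k * sigma


def related_pairs(scored_pairs, k=2.0):
    """Keep the path pairs whose relatedness score reaches the threshold."""
    threshold = gaussian_threshold([score for _, score in scored_pairs], k)
    return [pair for pair, score in scored_pairs if score >= threshold]


scored = [(("a", "b"), 1.0), (("a", "c"), 1.0), (("b", "c"), 1.0),
          (("c", "d"), 1.0), (("d", "e"), 10.0)]
print(related_pairs(scored, k=1.0))  # [('d', 'e')]
```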
Threshold Comparisons
Scenario | 2 Std Dev. | Manual | 3 Std Dev. | Max
RS-DFO | 2.68 | 3.0 | 3.04 | 3.38
Testosterone-Sleep | 3.35 | 3.5 | 3.8262 | 6.22
DEHP-Sepsis | 3.94 | 4.0 | 4.53 | 4.84
Table 2: Path Relatedness Threshold Comparisons (values are path relatedness scores)
Bucket Merging
[Figure: merging buckets Ba and Bb; straggly clusters, compact clusters and broad clusters]
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze: Introduction to Information Retrieval. Cambridge University Press, 2008, ISBN 978-0-521-86571-5, pp. I-XXI, 1-482
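For reference, the three standard inter-cluster criteria discussed by Manning et al. are shown below. Single-link tends to produce straggly, chained clusters, and complete-link tends to produce compact ones. The slide does not say which criterion the Inter-Cluster Similarity metric uses. The toy similarity `1 / (1 + |x - y|)` and the buckets are illustrative only.

```python
from statistics import fmean


def similarity(x, y):
    """Toy similarity between two 1-D points."""
    return 1 / (1 + abs(x - y))


def single_link(a, b, sim):
    """Similarity of the closest pair (tends to chain into straggly clusters)."""
    return max(sim(x, y) for x in a for y in b)


def complete_link(a, b, sim):
    """Similarity of the farthest pair (favours compact clusters)."""
    return min(sim(x, y) for x in a for y in b)


def group_average_link(a, b, sim):
    """Mean similarity over all cross-bucket pairs."""
    return fmean(sim(x, y) for x in a for y in b)


B_a, B_b = [0, 1], [3, 6]
for linkage in (single_link, complete_link, group_average_link):
    print(linkage.__name__, round(linkage(B_a, B_b, similarity), 3))
```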
Subgraph Ranking: Intra-Cluster Rank
Singleton Ranking: Association Rarity
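Both ranking steps are sketched below with assumed definitions. A subgraph's intra-cluster score is its mean pairwise path relatedness. Singleton paths are ordered by ascending association rarity, so zero-rarity (ZR) paths come first. The function names, scores and rarity values are hypothetical.

```python
from itertools import combinations
from statistics import fmean

# Invented relatedness scores; unlisted pairs score 0.0.
SCORES = {
    frozenset(("p1", "p2")): 4.0,
    frozenset(("p3", "p4")): 3.0,
    frozenset(("p3", "p5")): 3.0,
    frozenset(("p4", "p5")): 3.0,
}


def relatedness(x, y):
    return SCORES.get(frozenset((x, y)), 0.0)


def intra_cluster_score(bucket, sim):
    """Mean pairwise relatedness inside one subgraph (assumed definition)."""
    return fmean(sim(x, y) for x, y in combinations(bucket, 2))


def rank_subgraphs(buckets, sim):
    """Order multi-path buckets from most to least internally related."""
    subgraphs = [b for b in buckets if len(b) > 1]
    return sorted(subgraphs, key=lambda b: intra_cluster_score(b, sim), reverse=True)


def rank_singletons(buckets, rarity):
    """Order single-path buckets by ascending association rarity (zero rarity first)."""
    singletons = [b[0] for b in buckets if len(b) == 1]
    return sorted(singletons, key=lambda p: rarity[p])


buckets = [["p3", "p4", "p5"], ["p6"], ["p1", "p2"], ["p7"]]
print(rank_subgraphs(buckets, relatedness))  # [['p1', 'p2'], ['p3', 'p4', 'p5']]
print(rank_singletons(buckets, {"p6": 0.5, "p7": 0.0}))  # ['p7', 'p6']
```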
Summary of Metrics
• Path Relatedness
– Model: MeSH Context Vectors
– Metrics: Semantics-enhanced shared context, Log Reduction
– Manual Threshold for Semantic Similarity, Dice Similarity
– Threshold: 2nd Standard Deviation from Mean of Gaussian
• Bucket Relatedness
– Model: Set of Paths
– Metric: Inter-Cluster Similarity
– Threshold: 2nd Standard Deviation from Mean of Gaussian
• Subgraph Ranking
– Metrics: Intra-Cluster Similarity, Singleton Rank (Association Rarity)
Algorithm
Time Complexity: Θ(N² log N)
Raynaud Syndrome – Dietary Fish Oil
[Figure: automatically created subgraphs with inferred predicates; Path Relatedness Threshold = 3σ]
Scenario 1: Raynaud Syndrome – Dietary Fish Oil
Cut-off date: Nov. 1985. By D. R. Swanson (Article)
Intermediate | Association | Status
Blood Viscosity | Dietary Fish Oils INHIBITS Blood Viscosity; Blood Viscosity CAUSES Raynaud Syndrome | ZR-15
Platelet Aggregation | Dietary Fish Oils INHIBITS Platelet Aggregation; Platelet Aggregation CAUSES Raynaud Syndrome | S1
Vasoconstriction | Dietary Fish Oils INHIBITS Vasoconstriction; Vasoconstriction CAUSES Raynaud Syndrome |
Legend: ZR – zero-rarity singleton; S – Subgraph; Not Found
Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation
Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
Scenario 2: Magnesium – Migraine
Cut-off date: Apr. 1987. By D. R. Swanson (Article)
Intermediate | Association | Status
Calcium Channel Blockers | Magnesium ISA Calcium Channel Blocker; Calcium Channel Blockers TREATS Migraine | S22
Epilepsy | Magnesium AFFECTS Epilepsy; Epilepsy CO_EXISTS_WITH Migraine | S9
Hypoxia | Magnesium INHIBITS Hypoxia; Hypoxia ASSOCIATED_WITH Migraine |
Inflammation | Magnesium INHIBITS Inflammation; Inflammation CAUSES Migraine | ZR-3
Platelet Activity | Magnesium INHIBITS Platelet Aggregation; Platelet Aggregation CAUSES Migraine | S1
Prostaglandins | Magnesium STIMULATES Prostaglandins; Prostaglandins DISRUPTS Migraine | S4
Stress/Type A Personality | Stress INHIBITS Magnesium; Stress ASSOCIATED_WITH Migraine |
Serotonin | Magnesium INHIBITS Serotonin; Serotonin CAUSES Migraine | S1
Cortical Depression | Magnesium INHIBITS Spreading Cortical Depression; Spreading Cortical Depression CAUSES Migraine |
Substance P | Magnesium INHIBITS Substance P; Substance P CAUSES Migraine |
Vascular Mechanisms | Magnesium INHIBITS Vasoconstriction; Vasoconstriction CAUSES Migraine | S9
Scenario 3: Somatomedin C – Arginine
Cut-off date: Apr. 1989. By D. R. Swanson (Article)
Intermediate | Association | Status
Growth Hormone | Arginine STIMULATES Growth Hormone; Growth Hormone STIMULATES Somatomedins (IGF1) | S5
Body Weight (body mass) | Somatomedins (IGF1) STIMULATES Growth; Arginine STIMULATES Growth | S7
Malnutrition | Somatomedins TREATS Malnutrition; Arginine TREATS Malnutrition | S7
Wound Healing (NK activity) | Somatomedins STIMULATES Wound Healing; Arginine STIMULATES Wound Healing |
Scenario 4: Indomethacin – Alzheimer’s Disease
Cut-off date: Jul. 1995. By Swanson/Smalheiser (Article)
Intermediate | Association | Status
Acetylcholine | Indomethacin INHIBITS Acetylcholine; Acetylcholine CAUSES Alzheimers | S4
Lipid Peroxidation | Indomethacin INHIBITS Lipid Peroxidation; Lipid Peroxidation CAUSES Alzheimers | S2
M2-Muscarinic | Indomethacin INHIBITS M2-Muscarinic; M2-Muscarinic CAUSES Alzheimers |
Membrane Fluidity | Indomethacin INHIBITS Membrane Fluidity; Membrane Fluidity CAUSES Alzheimers |
Lymphocytes | Indomethacin STIMULATES Natural Killer T-Cell Activity; T-Cell Activity INHIBITS Alzheimers | S14
Thyrotropin | Indomethacin STIMULATES Thyrotropin; Thyrotropin AFFECTS Alzheimers | ZR-20
T-lymphocytes (T-Cells) | Indomethacin STIMULATES T-lymphocytes; T-lymphocyte Activity INHIBITS Alzheimers | S3
Scenario 5: Estrogen – Alzheimer’s Disease
Cut-off date: Jul. 1995. By Swanson/Smalheiser (Article)
Intermediate | Association | Status
Antioxidant Activity | Estrogen INHIBITS Antioxidant Activity; Antioxidant Activity CAUSES Alzheimers | S4
Apolipoprotein E (ApoE) | Estrogen INHIBITS ApoE; ApoE CAUSES Alzheimers | S3
Calbindin D28k | Estrogen REGULATES Calbindin D28k; Calbindin D28k AFFECTS Alzheimers | S4
Cathepsin D | Estrogen STIMULATES Cathepsin D; Cathepsin D PREVENTS Alzheimers |
Cytochrome C Oxidase Subunit III | Estrogen STIMULATES Cytochrome C Oxidase Subunit III; Cytochrome C Oxidase Subunit III AFFECTS Alzheimers |
Glutamate | Estrogen STIMULATES Glutamate; Glutamate AFFECTS Alzheimers |
Receptor Polymorphism | Estrogen EXHIBITS Receptor Polymorphism; Receptor Polymorphism AFFECTS Alzheimers |
Scenario 6: Calcium-Independent PLA2 – Schizophrenia
Cut-off date: 1997. By Swanson/Smalheiser (Article)
Intermediate | Association | Status
Oxidative Stress | Oxidative Stress INHIBITS Calcium-Independent PLA2; Oxidative Stress CAUSES Schizophrenia | ZR-2
Selenium | Selenium INHIBITS Calcium-Independent PLA2; Selenium PREVENTS Schizophrenia | ZR-2
Vitamin E | Vitamin E INHIBITS Calcium-Independent PLA2; Vitamin E PREVENTS Schizophrenia | ZR-2
Scenario 7: Chlorpromazine – Cardiac Hypertrophy
Cut-off date: 01/01/2002. By J. D. Wren (Article)
Intermediate | Association | Status
Calcineurin | Chlorpromazine INHIBITS Calcineurin; Calcineurin CAUSES Cardiac Hypertrophy | S5
Isoproterenol | Chlorpromazine INHIBITS Isoproterenol; Isoproterenol CAUSES Cardiomegaly | S12
Scenario 8: Testosterone – Sleep
Cut-off date: 01/01/2012. By Miller/Rindflesch (Article)
Intermediate | Association | Status
Cortisol/Hydrocortisone | Testosterone INHIBITS Cortisol; Cortisol DISRUPTS Sleep | S7
Scenario 9: Diethylhexyl Phthalate (DEHP) – Sepsis
Cut-off date: 01/01/2013. By Cairelli/Rindflesch (Article)
Intermediate | Association | Status
PPAR-gamma | DEHP STIMULATES PPAR-gamma; PPAR-gamma INHIBITS Sepsis |
Statistical Evaluation
Association Rarity and Interestingness
Experiment | # Unique Associations | Total MEDLINE Frequency | Rarity r(E) | Interestingness I(E)
Raynaud-Fish Oil | 10 | 0 | 0.00 | 1.00
Magnesium-Migraine | 48 | 27 | 0.56 | 0.64
SomaC-Arginine | 18 | 306 | 17.00 | 0.06
Indomethacin-Alzheimers | 21 | 9 | 0.43 | 0.70
Estrogen-Alzheimers | 42 | 36 | 0.86 | 0.54
PLA2-Schizophrenia | 10 | 0 | 0.00 | 1.00
CPZ-Cardiac Hypertrophy | 21 | 2 | 0.10 | 0.91
Testosterone-Sleep | 61 | 654 | 10.72 | 0.09
Average | 29 | 129 | 3.71 | 0.62
Table 3: Rarity and Interestingness scores of the subgraphs in the rediscoveries
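The numbers in Table 3 are consistent with two formulas. Rarity is the total MEDLINE frequency divided by the number of unique associations, r(E) = F / U. Interestingness is I(E) = 1 / (1 + r(E)). For Magnesium-Migraine, 27 / 48 ≈ 0.56 and 1 / 1.5625 = 0.64. These formulas are reconstructed from the table and are not the formulas as printed on the slide. The snippet recomputes every row:

```python
def rarity(total_frequency, unique_associations):
    """r(E): total MEDLINE frequency per unique association."""
    return total_frequency / unique_associations


def interestingness(r):
    """I(E) = 1 / (1 + r(E)); equals 1.0 when none of the associations appear in MEDLINE."""
    return 1.0 / (1.0 + r)


# (unique associations, total MEDLINE frequency) per experiment, from Table 3.
TABLE_3 = {
    "Raynaud-Fish Oil": (10, 0),
    "Magnesium-Migraine": (48, 27),
    "SomaC-Arginine": (18, 306),
    "Indomethacin-Alzheimers": (21, 9),
    "Estrogen-Alzheimers": (42, 36),
    "PLA2-Schizophrenia": (10, 0),
    "CPZ-Cardiac Hypertrophy": (21, 2),
    "Testosterone-Sleep": (61, 654),
}

for name, (unique, frequency) in TABLE_3.items():
    r = rarity(frequency, unique)
    print(f"{name}: r(E) = {r:.2f}, I(E) = {interestingness(r):.2f}")
```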
Predications-based Knowledge Exploration
[Figure: components of the exploration framework: Corpus, Predications Graph, Definitional Knowledge (UMLS + MeSH), Provenance, Knowledge Abstraction]
D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. IEEE International Conference on Bioinformatics and Biomedicine (BIBM11). 512–519, 2011.
Contribution #4: Combining Assertional and Definitional Knowledge for Knowledge Exploration
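The sketch below shows one way to combine assertional knowledge (predications with provenance) with definitional knowledge (ISA links) for knowledge abstraction. Predications are grouped by the parent of their object concept, and the supporting articles are kept as provenance. The ISA links are simplified illustrations rather than actual UMLS or MeSH relations, and the article identifiers are placeholders.

```python
from collections import defaultdict

# Assertional knowledge: predications paired with the (placeholder) article asserting them.
PREDICATIONS = [
    (("Magnesium", "INHIBITS", "Serotonin"), "article-1"),
    (("Magnesium", "INHIBITS", "Substance P"), "article-2"),
    (("Magnesium", "STIMULATES", "Prostaglandins"), "article-3"),
    (("Magnesium", "INHIBITS", "Serotonin"), "article-4"),
]

# Definitional knowledge: simplified ISA links, for illustration only.
ISA = {
    "Serotonin": "Neurotransmitter",
    "Substance P": "Neurotransmitter",
    "Prostaglandins": "Eicosanoids",
}


def abstract(predications, isa):
    """Lift each object to its ISA parent and collect provenance per abstracted predication."""
    groups = defaultdict(set)
    for (subj, pred, obj), source in predications:
        groups[(subj, pred, isa.get(obj, obj))].add(source)
    return dict(groups)


for (subj, pred, obj), sources in abstract(PREDICATIONS, ISA).items():
    print(subj, pred, obj, sorted(sources))
```

Keeping provenance as a set of article identifiers lets a user move from the abstracted statement back to the supporting sentences.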
Levels of Contexts
• Predication Context: A – B – C
• Path Context: A – B1, B2, …, Bi – C
• Shared Context: related paths such as A – B1 – B2 – C and A – B1 – B2 – B3 – C
• Subgraph Context: A and C linked across many paths by predicates such as PRODUCES and INHIBITS
Dimensions
Dissertation Contributions
1. Context-Driven Subgraph Model
– Knowledge Rediscovery & Decomposition
2. Predication/Path Context
– Vector of MeSH Descriptors
3. Shared Context
– Background Knowledge (MeSH Hierarchy)
4. Semantic Predications-based Text Exploration
– Obvio Web Application
Innovation
System/Technique | Technique Type | Capabilities (Automatic, Relational, Evidence-based, Thematic) | #Discoveries | #Rediscoveries
IRIDESCENT [108] | Keyword | | 1 | 0
ARROWSMITH [84] | Keyword/Concept | | 5 | 0
DAD [101, 102] | Concept | | 0 | 2
BITOLA [46] | Concept | | 0 | 1
Litlinker [110] | Concept | | 0 | 2
Manjal [87, 88] | Concept | × | 0 | 5
SemBT [40, 41, 42] | Relations | × × | 0 | 1
BioSbKDS [47] | Relations | × × | 0 | 1
Wilkowski [107] | Graph | × × | 0 | 0
Ramakrishnan [72] | Graph | × × | 0 | 1*
Zhang [114] | Graph | × × × | 0 | 0
Obvio [19, 21] | Graph | × × × × | 0 | 8
ARROWSMITH v2 [86, 98] | Hybrid | × | 0 | 6*
Semantic MEDLINE [18, 63] | Hybrid | × × | 2 | 0
Note: References are from the Ph.D. dissertation manuscript entitled: A Context-Driven Subgraph Model for Literature-Based Discovery
Table 4: Comparison of capabilities and accomplishments of LBD techniques
Levels of Semantic Representation
Keywords
Concepts
MeSH Descriptors
Semantic Predications
Ensemble of Features
Relationships
Semantic Predication: A PREDICATE B
Limitations
1. Manual Threshold
– MeSH Semantic Similarity
2. Path Relatedness Threshold
– Only Approximate Gaussian
3. Definition of Context
4. MEDLINE Querying
– Deep integration of assertional and definitional knowledge
5. Contradiction Detection
6. Statistical Evaluation
7. Scalability of Clustering Algorithm
8. Subgraph Labeling
Take Away
• Future of Information Processing
– Rich Knowledge Representations
o Implicit, Formal, Powerful semantics
– Application to Literature-Based Discovery
Conclusion
• Context-Driven Subgraph Model
– Manually create Complex Associations
– Automatic Subgraph Creation
o Novel definitions for Context and Shared Context
o Multiple Thematic Dimensions
– Predications-based Knowledge Exploration
o Predicates
o Highlighted MEDLINE sentences
– Knowledge Rediscovery
o 8 out of 9 existing scientific discoveries
Publications
1. D. Cameron, R. Kavuluru, T. C. Rindflesch, O. Bodenreider, A. P. Sheth, K. Thirunarayan. Context-Driven Automatic Subgraph Creation for Literature-Based Discovery (under preparation).
2. D. Cameron, A. P. Sheth, N. Jaykumar, G. Anand, K. Thirunarayan, G. A. Smith. A Hybrid Approach to Finding Relevant Social Media Content for Domain Specific Information Needs (submitted to the Journal of Web Semantics).
3. D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13). 46(2): 238–251, 2013.
4. D. Cameron, G. A. Smith, R. Daniulaityte, A. P. Sheth, D. Dave, L. Chen, G. Anand, R. Carlson, K. Z. Watkins, R. Falck. PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology using Social Media. Journal of Biomedical Informatics (JBI13). 46(6): 985–997, 2013.
5. R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Perera, L. Chen, A. P. Sheth. “I just wanted to tell you that Loperamide WILL WORK”: A Web-Based Study of Extra-medical use of Loperamide. Drug and Alcohol Dependence (DAD13). 130(1–3): 241–244, 2013.
6. D. Cameron, V. Bhagwan, A. P. Sheth. Towards Comprehensive Longitudinal Healthcare Data Capture. International Workshop on Semantic Web in Literature-Based Discovery (SWLBD12). 241–247, 2012.
7. R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Perera, L. Chen, A. P. Sheth. A Web-Based Study of Extra-medical use of Loperamide. The College on Problems of Drug Dependence (CPDD12), 2012.
8. D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. IEEE International Conference on Bioinformatics and Biomedicine (BIBM11). 512–519, 2011.
9. D. Cameron, B. Aleman-Meza, I. B. Arpinar, S. L. Decker, A. P. Sheth. A Taxonomy-based Model for Expertise Extrapolation. International Conference on Semantic Computing (ICSC10). 333–340, 2010.
10. D. Cameron, P. N. Mendes, A. P. Sheth, V. Chan. Semantics-empowered Text Exploration for Knowledge Discovery. ACM Southeast Conference (ACMSE10). 14, 2010.
11. C. Thomas, W. Wang, P. Mehra, D. Cameron, P. N. Mendes, A. P. Sheth. What Goes Around Comes Around – Improving Linked Open Data through On-Demand Model Creation. Web Science Conference (WebSci10), 2010.
12. P. N. Mendes, P. Kapanipathi, D. Cameron, A. P. Sheth. Dynamic Associative Relationships on the Linked Data Web. Web Science Conference (WebSci10), 2010.
Research Expertise
[Figure: publications [1]–[8] and [10] mapped to the research areas Literature-Based Discovery, Text Mining, Question Answering and Information Retrieval]
Parting Words
“...some day the piecing together of dissociated knowledge will open up such
terrifying vistas of reality,...that we shall either go mad from the revelation or
flee from the deadly light into the peace and safety of a new dark age.”
– H. P. Lovecraft (The Call of Cthulhu, The Horror in Clay).
H. P. Lovecraft. The Call of Cthulhu. In S. T. Joshi, editor. The Call of Cthulhu and Other Weird Stories. Penguin Books Ltd., London, 1999
Acknowledgements
• Olivier Bodenreider
• Marcelo Fiszman
• Mike Cairelli
• Swapna Abhyankar
• Drashti Dave
• Dongwook Shin
• Special Thanks
o Pavan
o Shreyansh
o Swapnil
o Nishita
• PREDOSE Team
o Nishita
o Gaurish
o Alan
o Revathy
Ph.D. Committee Members
Amit P. Sheth (Advisor)
T.K. Prasad
Michael Raymer
Ramakanth Kavuluru
Thomas C. Rindflesch
Varun Bhagwan

Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
TEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxTEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxruthvilladarez
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 

Recently uploaded (20)

INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptx
 
EMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxEMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
TEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxTEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docx
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 

Delroy Cameron's Dissertation Defense: A Context-Driven Subgraph Model for Literature-based Discovery

  • 1. A CONTEXT-DRIVEN SUBGRAPH MODEL FOR LITERATURE-BASED DISCOVERY. Ph.D. Dissertation Defense. Delroy Cameron, August 18, 2014. Ph.D. Committee: Amit P. Sheth (Advisor), Krishnaprasad Thirunarayan, Michael Raymer, Ramakanth Kavuluru (UKY), Thomas C. Rindflesch (NIH), Varun Bhagwan (Yahoo! Labs). "All truths are easy to understand once they are discovered; the point is to discover them." (Galileo Galilei, 1564–1642)
  • 2. Historical Perspectives: Gregor Johann Mendel (1822–1884), Mendelian Laws of Inheritance (1866); Theodor Boveri (1862–1915) and Walter Sutton (1877–1916), Boveri-Sutton Chromosome Theory (1903).
  • 3. Science of Making Discoveries: Discovery; Information Processing System; What is promising?
  • 4. Thesis Statement: An information processing system that leverages rich representations of textual content from scientific literature based on implicit and explicit context can provide effective means for literature-based discovery.
  • 5. Motivation (Vioxx timeline): 1999, rofecoxib (Vioxx, Merck & Co.) TREATS osteoarthritis; 2002, increased risk of heart attack; 2004, risk confirmed by clinical trial and Vioxx withdrawn; 2005, $254.3 million settlement; 2007, $4.85 billion settlement; 2011, $950 million settlement; 2013, $23 million settlement.
  • 7. Literature-Based Discovery (LBD). Models: the ABC model (A → B → C), the AnC model (A → B1 → B2 → … → Bi → C), and the Context-Driven Subgraph Model (Contribution #1), which links A and C through discovery patterns built from predicates such as INHIBITS, CAUSES, and PRODUCES. Timeline of LBD systems (1986–2013), by approach. Keyword-based: ARROWSMITH v1 (term frequency); IRIDESCENT (1999; term co-occurrence). Concept-based: DAD (2001; MetaMap, UMLS); Litlinker (2003; MeSH, UMLS, rules, level of support); Manjal (UMLS, MeSH, topic profiles, TF-IDF); RaJoLink (MeSH, rarity); BITOLA (2005; UMLS, MeSH, association rules, confidence). Relations-based: SemBT (semantic predications, level of support, discovery browsing); BioSbKDS (UMLS relations, MeSH). Graph-based: ACS (2004; MeSH, Hebbian learning); degree centrality; cooperative reciprocity; manual. Hybrid: ARROWSMITH v2 (2007; 8 features); Semantic MEDLINE (summarization, discovery browsing); Epiphanet (predications-based semantic indexing); CoPub (keywords, mutual information). Image source: Wikipedia, http://en.wikipedia.org/wiki/Don_R._Swanson. Literature-based discovery refers to the use of papers and other academic publications (the "literature") to find new relationships between existing knowledge (the "discovery"). Definition courtesy of Wikipedia: http://en.wikipedia.org/wiki/Literature-based_discovery
  • 8. Application: Raynaud Syndrome – Fish Oil. The same hypothesis at three levels of representation: keyword/concept based (Dietary Fish Oils, Platelet Aggregation, Raynaud Syndrome); relations based (Dietary Fish Oils DISRUPTS Platelet Aggregation; Platelet Aggregation CAUSES Raynaud Syndrome); and subgraph based, with inferred predicates, linking Dietary Fish Oils, Prostaglandins, Prostaglandin I3, Epoprostenol, Platelet Aggregation, and Raynaud Syndrome through ISA, CONVERTS_TO, DISRUPTS, STIMULATES, TREATS, and CAUSES. D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13). 46(2): 238–251, 2013.
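  • 9. Comparison. Table 1: Comparison of intermediates rediscovered for Raynaud Syndrome – Dietary Fish Oil. Systems compared: Cameron [19], Srinivasan [88, 89], Weeber [101, 102], Gordon [36, 37, 38], Hristovski [40], and Ramakrishnan [72]*. Blood Viscosity: rediscovered (×) by Cameron, Srinivasan, Weeber, Gordon, and Hristovski; unknown (?) for Ramakrishnan. Platelet Aggregation: rediscovered (×) by Cameron, Srinivasan, Weeber, Gordon, and Hristovski; unknown (?) for Ramakrishnan. Vascular Reactivity: rediscovered (×) by four of these five systems; unknown (?) for Ramakrishnan.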
  • 10. Problem: how to automate this? Manually built subgraphs connect Dietary Fish Oils and Raynaud Syndrome through intermediates including Platelet Aggregation, Prostaglandins, Prostacyclin (PGI2), Prostaglandin I3 (PGI3), Vasoconstriction, Fatty Acid, Essential Fatty Acid, Triglyceride, Lipid, Blood Viscosity, Blood Physiology, Cellular Activity, and Tissue Function, using the predicates ISA, DISRUPTS, CAUSES, CONVERTS_TO, STIMULATES, TREATS, INHIBITS, and AFFECTS. Source: D. Cameron et al., A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications (full citation on slide 8).
  • 13. Subgraph Model: Predications Graph (G) → Candidate Graph (RG) → Subgraphs (SG). No two contexts are the same: a relation R(s,t) occurs in contexts c1, c2, …, ck, giving R(s,t)(c1), R(s,t)(c2), …, R(s,t)(ck). What is context?
  • 15. Semantic Underpinnings: path relatedness; semantic predication context. Context Distribution Assumption: the context of a semantic predication can be expressed as the distribution of all MeSH descriptors associated with all articles that contain it. A MEDLINE article has a textual, a concept-level, and a relational semantic summary. Interchangeability Assumption: the concept-level and relational semantic summaries of a MEDLINE article are interchangeable.
  • 16. Linguistic Underpinnings: distributional semantics and the context-sensitive nature of meaning. Linguistic items with similar distributions have similar meanings: "You shall know a word by the company it keeps" (J. R. Firth, 1957). Likewise, semantic predications with shared contexts in their distributions are related.
  • 18. Automatic Subgraph Creation. Contribution #2: the context of a path as a vector of MeSH descriptors. Descriptors (m1, m2, m5, m7, m8, m9, …) from the MeSH hierarchy form the context vectors of paths pi and pj, and the semantic relatedness of these MeSH context vectors is compared.
  • 19. Path Relatedness. Objective #1: maximize the weights of in-context descriptors. Objective #2: minimize the weights of out-of-context descriptors. Notation: p, path; t, semantic predication; C(pi) and C(pj), the MeSH context vectors of paths pi and pj over descriptors m1, …, m13.
  • 20. Path Relatedness: Shared Context. Contribution #3: structured background knowledge for computing the shared context of paths. Example descriptors in C(pi) and C(pj): Platelet Aggregation, Platelet Activation, Platelet Adhesiveness, Epoprostenol, Prostaglandins. MeSH G-tree nodes: Circulatory and Respiratory Physiological Phenomena; Blood Physiological Phenomena; Blood Physiological Process; Hemostasis; Platelet Aggregation; Platelet Adhesiveness; Platelet Activation. MeSH D-tree nodes: Lipids; Fatty Acids; Fatty Acids, Unsaturated; Arachidonic Acids; Eicosanoids; Prostaglandins; Prostaglandins I; Epoprostenol.
  • 21. Path Relatedness Score. Reference: Elena Deza, Michel-Marie Deza. Dictionary of Distances. Elsevier, 2006.
  • 22. Hierarchical Agglomerative Clustering of A–C paths in three steps: 1. bucket population; 2. bucket merging, repeated from iteration 1 to iteration n; 3. subgraph ranking. A path relatedness threshold controls which paths are grouped together.
  • 23. Summary of Metrics • Path Relatedness – Model: MeSH Context Vectors – Metrics: Semantics-enhanced shared context, Log Reduction – Threshold: ?? • MeSH Semantic Similarity – Model: MeSH Hierarchy – Metric: Dice Similarity – Threshold: set manually
  • 24. Automatic Threshold Selection. RS-DFO experiment (manual threshold = 3.0): the path relatedness scores approximately follow a Gaussian distribution (x-axis: path relatedness score; y-axis: number of path pairs).
  • 25. Automatic Threshold Selection. Gaussian function (x-axis: path relatedness score; y-axis: expected value).
  • 26. Automatic Threshold Selection. Gaussian distribution with its points of inflection marked (diagram courtesy of Wikipedia).
  • 27. Threshold Comparisons. Table 2: Path Relatedness Threshold Comparisons (path relatedness scores). RS-DFO: 2 Std Dev. 2.68, Manual 3.0, 3 Std Dev. 3.04, Max 3.38. Testosterone-Sleep: 2 Std Dev. 3.35, Manual 3.5, 3 Std Dev. 3.8262, Max 6.22. DEHP-Sepsis: 2 Std Dev. 3.94, Manual 4.0, 3 Std Dev. 4.53, Max 4.84.
  • 28. Bucket Merging: merging buckets Ba and Bb; depending on how inter-cluster similarity is measured, merging yields straggly, compact, or broad clusters. Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze: Introduction to Information Retrieval. Cambridge University Press 2008, ISBN 978-0-521-86571-5, pp. I-XXI, 1-482.
  • 31. Summary of Metrics • Path Relatedness – Model: MeSH Context Vectors – Metrics: Semantics-enhanced shared context, Log Reduction – Manual Threshold for Semantic Similarity, Dice Similarity – Threshold: 2nd Standard Deviation from Mean of Gaussian • Bucket Relatedness – Model: Set of Paths – Metric: Inter-Cluster Similarity – Threshold: 2nd Standard Deviation from Mean of Gaussian • Subgraph Ranking – Metrics: Intra-Cluster Similarity, Singleton Rank (Association Rarity)
  • 34. Raynaud Syndrome – Dietary Fish Oil: inferred predicates; path relatedness threshold = 3σ.
  • 35. Scenario 1: Raynaud Syndrome – Dietary Fish Oil. Details: cut-off date Nov. 1985; by D. R. Swanson (article). Intermediates, associations, and status: Blood Viscosity: Dietary Fish Oils INHIBITS Blood Viscosity; Blood Viscosity CAUSES Raynaud Syndrome (ZR-15). Platelet Aggregation: Dietary Fish Oils INHIBITS Platelet Aggregation; Platelet Aggregation CAUSES Raynaud Syndrome (S1). Vasoconstriction: Dietary Fish Oils INHIBITS Vasoconstriction; Vasoconstriction CAUSES Raynaud Syndrome. Legend: ZR = zero rarity singleton; S = subgraph; Not Found. Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation; Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
  • 36. Scenario 2: Magnesium – Migraine. Details: cut-off date Apr. 1987; by D. R. Swanson (article). Intermediates, associations, and status: Calcium Channel Blockers: Magnesium ISA Calcium Channel Blocker; Calcium Channel Blockers TREATS Migraine (S22). Epilepsy: Magnesium AFFECTS Epilepsy; Epilepsy CO_EXISTS_WITH Migraine (S9). Hypoxia: Magnesium INHIBITS Hypoxia; Hypoxia ASSOCIATED_WITH Migraine. Inflammation: Magnesium INHIBITS Inflammation; Inflammation CAUSES Migraine (ZR-3). Platelet Activity: Magnesium INHIBITS Platelet Aggregation; Platelet Aggregation CAUSES Migraine (S1). Prostaglandins: Magnesium STIMULATES Prostaglandins; Prostaglandins DISRUPTS Migraine (S4). Stress/Type A Personality: Stress INHIBITS Magnesium; Stress ASSOCIATED_WITH Migraine. Serotonin: Magnesium INHIBITS Serotonin; Serotonin CAUSES Migraine (S1). Cortical Depression: Magnesium INHIBITS Spreading Cortical Depression; Spreading Cortical Depression CAUSES Migraine. Substance P: Magnesium INHIBITS Substance P; Substance P CAUSES Migraine. Vascular Mechanisms: Magnesium INHIBITS Vasoconstriction; Vasoconstriction CAUSES Migraine (S9). Legend: ZR = zero rarity singleton; S = subgraph; Not Found. Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation; Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
  • 37. Scenario 3: Somatomedin C – Arginine. Details: cut-off date Apr. 1989; by D. R. Swanson (article). Intermediates, associations, and status: Growth Hormone: Arginine STIMULATES Growth Hormone; Growth Hormone STIMULATES Somatomedins (IGF1) (S5). Body Weight (body mass): Somatomedins (IGF1) STIMULATES Growth; Arginine STIMULATES Growth (S7). Malnutrition: Somatomedins TREATS Malnutrition; Arginine TREATS Malnutrition (S7). Wound Healing (NK activity): Somatomedins STIMULATES Wound Healing; Arginine STIMULATES Wound Healing. Legend: ZR = zero rarity singleton; S = subgraph; Not Found. Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation; Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
  • 38. Scenario 4: Indomethacin – Alzheimer’s Disease. Details: cut-off date Jul. 1995; by Swanson/Smalheiser (article). Intermediates, associations, and status: Acetylcholine: Indomethacin INHIBITS Acetylcholine; Acetylcholine CAUSES Alzheimers (S4). Lipid Peroxidation: Indomethacin INHIBITS Lipid Peroxidation; Lipid Peroxidation CAUSES Alzheimers (S2). M2-Muscarinic: Indomethacin INHIBITS M2-Muscarinic; M2-Muscarinic CAUSES Alzheimers. Membrane Fluidity: Indomethacin INHIBITS Membrane Fluidity; Membrane Fluidity CAUSES Alzheimers. Lymphocytes: Indomethacin STIMULATES Natural Killer T-Cell Activity; T-Cell Activity INHIBITS Alzheimers (S14). Thyrotropin: Indomethacin STIMULATES Thyrotropin; Thyrotropin AFFECTS Alzheimers (ZR-20). T-lymphocytes (T-Cells): Indomethacin STIMULATES T-lymphocytes; T-lymphocyte Activity INHIBITS Alzheimers (S3). Legend: ZR = zero rarity singleton; S = subgraph; Not Found. Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation; Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
  • 39. Scenario 5: Estrogen – Alzheimer’s Disease. Details: cut-off date Jul. 1995; by Swanson/Smalheiser (article). Intermediates, associations, and status: Antioxidant Activity: Estrogen INHIBITS Antioxidant Activity; Antioxidant Activity CAUSES Alzheimers (S4). Apolipoprotein E (ApoE): Estrogen INHIBITS ApoE; ApoE CAUSES Alzheimers (S3). Calbindin D28k: Estrogen REGULATES Calbindin D28k; Calbindin D28k AFFECTS Alzheimers (S4). Cathepsin D: Estrogen STIMULATES Cathepsin D; Cathepsin D PREVENTS Alzheimers. Cytochrome C Oxidase Subunit III: Estrogen STIMULATES Cytochrome C Oxidase Subunit III; Cytochrome C Oxidase Subunit III AFFECTS Alzheimers. Glutamate: Estrogen STIMULATES Glutamate; Glutamate AFFECTS Alzheimers. Receptor Polymorphism: Estrogen EXHIBITS Receptor Polymorphism; Receptor Polymorphism AFFECTS Alzheimers. Legend: ZR = zero rarity singleton; S = subgraph; Not Found. Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation; Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
  • 40. Scenario 6: Calcium-Independent PLA2 – Schizophrenia. Details: cut-off date 1997; by Swanson/Smalheiser (article). Intermediates, associations, and status: Oxidative Stress: Oxidative Stress INHIBITS Calcium-Independent PLA2; Oxidative Stress CAUSES Schizophrenia (ZR-2). Selenium: Selenium INHIBITS Calcium-Independent PLA2; Selenium PREVENTS Schizophrenia (ZR-2). Vitamin E: Vitamin E INHIBITS Calcium-Independent PLA2; Vitamin E PREVENTS Schizophrenia (ZR-2). Legend: ZR = zero rarity singleton; S = subgraph; Not Found. Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation; Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
  • 41. Scenario 7: Chlorpromazine – Cardiac Hypertrophy. Details: cut-off date 01/01/2002; by J. D. Wren (article). Intermediates, associations, and status: Calcineurin: Chlorpromazine INHIBITS Calcineurin; Calcineurin CAUSES Cardiac Hypertrophy (S5). Isoproterenol: Chlorpromazine INHIBITS Isoproterenol; Isoproterenol CAUSES Cardiomegaly (S12). Legend: ZR = zero rarity singleton; S = subgraph; Not Found. Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation; Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
  • 42. Scenario 8: Testosterone – Sleep. Details: cut-off date 01/01/2012; by Miller/Rindflesch (article). Intermediates, associations, and status: Cortisol/Hydrocortisone: Testosterone INHIBITS Cortisol; Cortisol DISRUPTS Sleep (S7). Legend: ZR = zero rarity singleton; S = subgraph; Not Found. Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation; Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
  • 43. Scenario 9: Diethylhexyl Phthalate (DEHP) – Sepsis. Details: cut-off date 01/01/2013; by Cairelli/Rindflesch (article). Intermediates, associations, and status: PPARgamma: DEHP STIMULATES PPARgamma; PPARgamma INHIBITS Sepsis. Legend: ZR = zero rarity singleton; S = subgraph; Not Found. Results available online: http://wiki.knoesis.org/index.php/Obvio#Automatic_Subgraph_Creation; Obvio Web Application: http://knoesis-hpco.cs.wright.edu/obvio/
  • 45. Statistical Evaluation. Table 3: Rarity and interestingness scores of the subgraphs in the rediscoveries (columns: # unique associations, total MEDLINE frequency, rarity r(E), interestingness I(E)). Raynaud-Fish Oil: 10, 0, 0.00, 1.00. Magnesium-Migraine: 48, 27, 0.56, 0.64. SomaC-Arginine: 18, 306, 17.00, 0.06. Indomethacin-Alzheimers: 21, 9, 0.43, 0.70. Estrogen-Alzheimers: 42, 36, 0.86, 0.54. PLA2-Schizophrenia: 10, 0, 0.00, 1.00. CPZ-Cardiac Hypertrophy: 21, 2, 0.10, 0.91. Testosterone-Sleep: 61, 654, 10.72, 0.09. Average: 29, 129, 3.71, 0.62.
  • 47. Predications-based Knowledge Exploration: corpus; predications graph; definitional knowledge (UMLS + MeSH); provenance; knowledge abstraction. Contribution #4: combining assertional and definitional knowledge for knowledge exploration. D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11). 512–519, 2011.
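  • 48. Levels of Contexts: predication context (A → B → C); path context (A → B1 → B2 → … → Bi → C); shared context (between paths such as A → B1 → B2 → B3 → C and A → B1 → B2 → C); subgraph context (subgraphs linking A and C through predicates such as PRODUCES and INHIBITS, organized along multiple dimensions).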
  • 50. Dissertation Contributions 1. Context-Driven Subgraph Model – Knowledge Rediscovery & Decomposition 2. Predication/Path Context – Vector of MeSH Descriptors 3. Shared Context – Background Knowledge (MeSH Hierarchy) 4. Semantic Predications-based Text Exploration – Obvio Web Application
  • 51. Innovation. Table 4: Comparison of capabilities and accomplishments of LBD techniques. Columns: system/technique [references]; technique type; capabilities marked × among Automatic, Relational, Evidence-based, and Thematic; results (# discoveries, # rediscoveries). IRIDESCENT [108]: Keyword; none; 1, 0. ARROWSMITH [84]: Keyword/Concept; none; 5, 0. DAD [101, 102]: Concept; none; 0, 2. BITOLA [46]: Concept; none; 0, 1. Litlinker [110]: Concept; none; 0, 2. Manjal [87, 88]: Concept; ×; 0, 5. SemBT [40, 41, 42]: Relations; × ×; 0, 1. BioSbKDS [47]: Relations; × ×; 0, 1. Wilkowski [107]: Graph; × ×; 0, 0. Ramakrishnan [72]: Graph; × ×; 0, 1*. Zhang [114]: Graph; × × ×; 0, 0. Obvio [19, 21]: Graph; × × × ×; 0, 8. ARROWSMITH v2 [86, 98]: Hybrid; ×; 0, 6*. Semantic MEDLINE [18, 63]: Hybrid; × ×; 2, 0. Note: References are from the PhD Dissertation manuscript entitled: A Context Driven Subgraph Model for Literature-Based Discovery.
  • 53. Limitations 1. Manual Threshold – MeSH Semantic Similarity 2. Path Relatedness Threshold – Only Approximate Gaussian 3. Definition of Context
  • 54. Levels of Semantic Representation: keywords; concepts; MeSH descriptors; semantic predications (relationships of the form A PREDICATE B); ensemble of features.
  • 55. Limitations 1. Manual Threshold – MeSH Semantic Similarity 2. Path Relatedness Threshold – Only Approximate Gaussian 3. Definition of Context 4. MEDLINE Querying – Deep integration of Assertional/Definitional Knowledge 5. Contradiction Detection 6. Statistical Evaluation 7. Scalability of Clustering Algorithm 8. Subgraph Labeling
  • 56. Take Away • Future of Information Processing – Rich Knowledge Representations o Implicit, Formal, Powerful semantics – Application to Literature-Based Discovery
  • 57. Conclusion • Context-Driven Subgraph Model – Manually create Complex Associations – Automatic Subgraph Creation o Novel definitions for Context and Shared Context o Multiple Thematic Dimensions – Predications-based Knowledge Exploration o Predicates o Highlighted MEDLINE sentences – Knowledge Rediscovery o 8 out of 9 existing scientific discoveries
  • 58. Publications 1. D. Cameron, R. Kavuluru, T. C. Rindflesch, O. Bodenreider, A. P. Sheth, K. Thirunarayan. Context-Driven Automatic Subgraph Creation for Literature-Based Discovery (under preparation). 2. D. Cameron, A. P. Sheth, N. Jaykumar, G. Anand, K. Thirunarayan, G. A. Smith. A Hybrid Approach to Finding Relevant Social Media Content for Domain Specific Information Needs (submitted to the Journal of Web Semantics). 3. D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13). 46(2): 238–251, 2013. 4. D. Cameron, G. A. Smith, R. Daniulaityte, A. P. Sheth, D. Dave, L. Chen, G. Anand, R. Carlson, K. Z. Watkins, R. Falck. PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology using Social Media. Journal of Biomedical Informatics (JBI13). 46(6): 985–997, 2013. 5. R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Perera, L. Chen, A. P. Sheth. “I just wanted to tell you that Loperamide WILL WORK”: A Web-Based Study of Extra-medical use of Loperamide. Journal of Drug and Alcohol Dependence (DAD13). 130(1–3): 241–244, 2013. 6. D. Cameron, V. Bhagwan, A. P. Sheth. Towards Comprehensive Longitudinal Healthcare Data Capture. International Workshop on Semantic Web in Literature-Based Discovery (SWLBD12). 241–247, 2012. 7. R. Daniulaityte, R. Carlson, R. Falck, D. Cameron, S. Perera, L. Chen, A. P. Sheth. A Web-Based Study of Extra-medical use of Loperamide. The College on Problems of Drug Dependence (CPDD12), 2012. 8. D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11). 512–519, 2011. 9. D. Cameron, B. Aleman-Meza, I. B. Arpinar, S. L. Decker, A. P. Sheth. A Taxonomy-based Model for Expertise Extrapolation. International Conference on Semantic Computing (ICSC10). 333–240, 2010. 10. D. Cameron, P. N. Mendes, A. P. Sheth, V. Chan. Semantics-empowered Text Exploration for Knowledge Discovery. ACM Southeast Conference (ACMSE10). 14, 2010. 11. C. Thomas, W. Wang, P. Mehra, D. Cameron, P. N. Mendes, A. P. Sheth. What Goes Around Comes Around – Improving Linked Open Data through On-Demand Model Creation. Web Science Conference (WebSci10), 2010. 12. P. N. Mendes, P. Kapanipathi, D. Cameron, A. P. Sheth. Dynamic Associative Relationships on the Linked Data Web. Web Science Conference (WebSci10), 2010.
  • 60. Parting Words “...some day the piecing together of dissociated knowledge will open up such terrifying vistas of reality,...that we shall either go mad from the revelation or flee from the deadly light into the peace and safety of a new dark age.” – H. P. Lovecraft (The Call of Cthulhu, The Horror in Clay). H. P. Lovecraft. The Call of Cthulhu. In S. T. Joshi, editor. The Call of Cthulhu and Other Weird Stories. Penguin Books Ltd., London, 1999
  • 61. Acknowledgements • Olivier Bodenreider • Marcelo Fiszman • Mike Cairelli • Swapna Abhyankar • Drashti Dave • Dongwook Shin • Special Thanks o Pavan o Shreyansh o Swapnil o Nishita • PREDOSE Team o Nishita o Gaurish o Alan o Revathy
  • 62. Ph.D. Committee Members: Amit P. Sheth (Advisor), T.K. Prasad, Michael Raymer, Ramakanth Kavuluru, Thomas C. Rindflesch, Varun Bhagwan
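At its simplest, the ABC model on slide 7 and the Scenario 1 associations on slide 35 amount to a two-hop search over semantic predications: find every intermediate B with A → B and B → C. The sketch below is a minimal illustration, assuming predications are plain (subject, predicate, object) tuples; the function name `abc_paths` and the toy predication list are hypothetical and are not part of Obvio.

```python
from collections import defaultdict

# Toy predications (subject, predicate, object) taken from the Scenario 1
# associations on slide 35, plus one unrelated predication as a distractor.
PREDICATIONS = [
    ("Dietary Fish Oils", "INHIBITS", "Blood Viscosity"),
    ("Blood Viscosity", "CAUSES", "Raynaud Syndrome"),
    ("Dietary Fish Oils", "INHIBITS", "Platelet Aggregation"),
    ("Platelet Aggregation", "CAUSES", "Raynaud Syndrome"),
    ("Dietary Fish Oils", "INHIBITS", "Vasoconstriction"),
    ("Vasoconstriction", "CAUSES", "Raynaud Syndrome"),
    ("Dietary Fish Oils", "ISA", "Lipid"),
]


def abc_paths(predications, a, c):
    """Return every two-hop path A -p1-> B -p2-> C as a (p1, B, p2) tuple,
    sorted by the intermediate concept B (hypothetical helper)."""
    outgoing = defaultdict(list)  # subject -> [(predicate, object), ...]
    for subject, predicate, obj in predications:
        outgoing[subject].append((predicate, obj))
    paths = []
    for p1, b in outgoing[a]:
        if b in (a, c):
            continue  # skip self-loops and direct A -> C links
        for p2, target in outgoing[b]:
            if target == c:
                paths.append((p1, b, p2))
    return sorted(paths, key=lambda path: path[1])


for p1, b, p2 in abc_paths(PREDICATIONS, "Dietary Fish Oils", "Raynaud Syndrome"):
    print(f"Dietary Fish Oils {p1} {b}; {b} {p2} Raynaud Syndrome")
```

The "Lipid" predication is dropped because no Lipid → Raynaud Syndrome link exists; the AnC model would generalize this to longer chains B1 → … → Bi.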
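Slide 15's Context Distribution Assumption treats a predication's context as the distribution of MeSH descriptors over the articles that contain it, and slide 18 extends this to a path as a vector of MeSH descriptors. The sketch below is a minimal illustration under stated assumptions: the PMIDs and descriptor lists are made up, a path's context is assumed to be the sum of its predications' vectors, and cosine similarity is used only as a stand-in comparison. The slides name a semantics-enhanced shared-context metric with log reduction, which this sketch does not reproduce.

```python
import math
from collections import Counter

# Hypothetical predications used as dictionary keys.
FISH_PA = ("Dietary Fish Oils", "INHIBITS", "Platelet Aggregation")
PA_RS = ("Platelet Aggregation", "CAUSES", "Raynaud Syndrome")
FISH_BV = ("Dietary Fish Oils", "INHIBITS", "Blood Viscosity")

# Hypothetical toy data: PMID -> MeSH descriptors that index the article.
ARTICLE_MESH = {
    "pmid1": ["Platelet Aggregation", "Epoprostenol", "Fish Oils"],
    "pmid2": ["Platelet Aggregation", "Prostaglandins"],
    "pmid3": ["Blood Viscosity", "Fish Oils"],
    "pmid4": ["Platelet Activation", "Prostaglandins", "Epoprostenol"],
}

# Hypothetical toy data: predication -> PMIDs of the articles that contain it.
PREDICATION_ARTICLES = {
    FISH_PA: ["pmid1", "pmid2"],
    PA_RS: ["pmid2", "pmid4"],
    FISH_BV: ["pmid3"],
}


def predication_context(predication):
    """Count how often each MeSH descriptor indexes the articles that
    contain the predication (its context distribution)."""
    context = Counter()
    for pmid in PREDICATION_ARTICLES.get(predication, []):
        context.update(ARTICLE_MESH[pmid])
    return context


def path_context(path):
    """Assumption: a path's context is the sum of its predications' contexts."""
    context = Counter()
    for predication in path:
        context.update(predication_context(predication))
    return context


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (stand-in measure)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


c_pi = path_context([FISH_PA, PA_RS])  # Fish Oils -> Platelet Aggregation -> Raynaud
c_pj = path_context([FISH_BV])         # first hop of the Blood Viscosity path
print(c_pi.most_common(3))
print(round(cosine(c_pi, c_pj), 3))
```

Only "Fish Oils" is shared between the two toy contexts, so their similarity is low; a metric that weights in-context descriptors (slide 19) would change these numbers.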
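For MeSH semantic similarity, slide 23 lists Dice similarity over the MeSH hierarchy with a manually set threshold. One common way to apply Dice to a hierarchy is to compare the ancestor sets of two descriptors, Dice(a, b) = 2|anc(a) ∩ anc(b)| / (|anc(a)| + |anc(b)|). The sketch below does this over a small, hypothetical parent map loosely based on the G-tree nodes on slide 20. It is not the real MeSH tree, the dissertation's exact formulation may differ, and the 0.7 cut-off is an invented placeholder for the manual threshold.

```python
# Hypothetical, simplified child -> parent links loosely based on slide 20
# (not the actual MeSH G-tree).
PARENT = {
    "Blood Physiological Phenomena": "Circulatory and Respiratory Physiological Phenomena",
    "Blood Physiological Process": "Blood Physiological Phenomena",
    "Hemostasis": "Blood Physiological Process",
    "Platelet Aggregation": "Hemostasis",
    "Platelet Adhesiveness": "Hemostasis",
    "Platelet Activation": "Hemostasis",
}

SIMILARITY_THRESHOLD = 0.7  # hypothetical manual cut-off; the slides give no value


def ancestors(term):
    """Return the term together with all of its ancestors up to the root."""
    chain = set()
    while term is not None:
        chain.add(term)
        term = PARENT.get(term)
    return chain


def dice(a, b):
    """Dice coefficient of the two descriptors' ancestor sets."""
    sa, sb = ancestors(a), ancestors(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def similar(a, b):
    """True if two descriptors clear the (hypothetical) manual threshold."""
    return dice(a, b) >= SIMILARITY_THRESHOLD


print(dice("Platelet Aggregation", "Platelet Adhesiveness"))
print(similar("Platelet Aggregation", "Blood Physiological Phenomena"))
```

Siblings under Hemostasis share four of their five ancestors, so they score high, while a descriptor and a distant ancestor score lower. This is why the choice of manual threshold (listed as a limitation on slide 53) matters.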
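Slides 24 to 27 and 31 replace the manual path relatedness threshold with the second standard deviation from the mean of a Gaussian fitted to the scores. The sketch below assumes the threshold lies above the mean (mean + kσ) and that the Gaussian is fitted by maximum likelihood. For a normal distribution, maximum likelihood gives the sample mean and the population standard deviation (dividing by n). The function names and sample scores are hypothetical, and `statistics.fmean` requires Python 3.8 or later.

```python
import statistics


def gaussian_threshold(scores, k=2.0):
    """Return mu + k * sigma for a Gaussian fitted by maximum likelihood.

    The maximum-likelihood estimates for a normal distribution are the
    sample mean and the population standard deviation (divide by n).
    """
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores, mu)
    return mu + k * sigma


def related_pairs(scored_pairs, k=2.0):
    """Keep the path pairs whose relatedness score reaches the threshold."""
    threshold = gaussian_threshold([score for _, score in scored_pairs], k)
    return [pair for pair, score in scored_pairs if score >= threshold]


# Hypothetical path-relatedness scores (the slides do not list raw scores).
scores = [1.2, 1.8, 2.0, 2.1, 2.4, 2.6, 3.1]
print(round(gaussian_threshold(scores, k=2), 2), round(gaussian_threshold(scores, k=3), 2))
```

With k = 2 and k = 3, this produces the kind of "2 Std Dev." and "3 Std Dev." thresholds compared in Table 2. Slide 53 notes that the scores are only approximately Gaussian, so the cut-off is a heuristic.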
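Slides 22, 28, and 31 describe bucket merging as hierarchical agglomerative clustering controlled by an inter-cluster similarity and a threshold. The sketch below is a generic, threshold-stopped HAC and not the dissertation's bucket population/merging procedure. Here `sim` stands for any pairwise path-relatedness function, `toy_similarity` is a hypothetical placeholder, and the three linkage options are the textbook choices from Manning et al.: single-link tends to chain into straggly clusters, while complete-link favors compact ones.

```python
from itertools import combinations
from statistics import fmean

LINKAGES = {
    "single": max,     # most similar pair of members: can chain (straggly clusters)
    "complete": min,   # least similar pair of members: favors compact clusters
    "average": fmean,  # mean pairwise similarity between the two clusters
}


def agglomerate(items, sim, threshold, linkage="average"):
    """Repeatedly merge the two most similar clusters until no pair of
    clusters reaches the similarity threshold."""
    link = LINKAGES[linkage]
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        best, pair = None, None
        for i, j in combinations(range(len(clusters)), 2):
            score = link([sim(a, b) for a in clusters[i] for b in clusters[j]])
            if best is None or score > best:
                best, pair = score, (i, j)
        if best < threshold:
            break  # no remaining merge is related enough
        i, j = pair  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def toy_similarity(a, b):
    """Hypothetical stand-in for path relatedness between numeric items."""
    return 1 / (1 + abs(a - b))


print(agglomerate([1, 2, 3, 10, 11], toy_similarity, 0.4, "average"))
print(agglomerate([1, 2, 3, 10, 11], toy_similarity, 0.4, "complete"))
```

The two calls show that linkage alone can change the result: average-link absorbs item 3 into the first cluster, while complete-link leaves it as a singleton. This naive version recomputes all pairs each round, which relates to the scalability limitation on slide 55.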
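Slide 31 ranks subgraphs by intra-cluster similarity and ranks singletons separately by association rarity. The sketch below covers only the first part and assumes that intra-cluster similarity is the mean pairwise similarity of a cluster's members. Singletons are returned unranked because the slides do not define their rarity-based rank precisely. `toy_similarity` is a hypothetical placeholder for path relatedness.

```python
from itertools import combinations
from statistics import fmean


def intra_cluster_similarity(cluster, sim):
    """Mean pairwise similarity of a cluster's members (needs >= 2 members)."""
    return fmean(sim(a, b) for a, b in combinations(cluster, 2))


def rank_subgraphs(clusters, sim):
    """Sort multi-member clusters by intra-cluster similarity, highest first.

    Singletons are returned separately and left unranked in this sketch.
    """
    groups = [c for c in clusters if len(c) >= 2]
    singletons = [c for c in clusters if len(c) < 2]
    ranked = sorted(groups, key=lambda c: intra_cluster_similarity(c, sim), reverse=True)
    return ranked, singletons


def toy_similarity(a, b):
    """Hypothetical stand-in for path relatedness between numeric items."""
    return 1 / (1 + abs(a - b))


ranked, singletons = rank_subgraphs([[1, 2, 3], [10, 11], [20]], toy_similarity)
print(ranked, singletons)
```

With this toy data, the tighter two-member cluster outranks the three-member one. Mean pairwise similarity does not reward size, which is one design choice among several possible ones.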
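The numbers in Table 3 (slide 45) are consistent with two formulas: rarity r(E) equals the total MEDLINE frequency divided by the number of unique associations, and interestingness I(E) = 1 / (1 + r(E)). Both formulas are inferred from the table rather than stated on the slide. The sketch below recomputes the table under that assumption, and the function names are hypothetical.

```python
def rarity(medline_frequency, unique_associations):
    """r(E): total MEDLINE frequency per unique association (inferred form)."""
    return medline_frequency / unique_associations


def interestingness(r):
    """I(E) = 1 / (1 + r(E)): an inferred form that reproduces Table 3."""
    return 1.0 / (1.0 + r)


# Table 3 inputs: experiment -> (# unique associations, total MEDLINE frequency)
TABLE_3 = {
    "Raynaud-Fish Oil": (10, 0),
    "Magnesium-Migraine": (48, 27),
    "SomaC-Arginine": (18, 306),
    "Indomethacin-Alzheimers": (21, 9),
    "Estrogen-Alzheimers": (42, 36),
    "PLA2-Schizophrenia": (10, 0),
    "CPZ-Cardiac Hypertrophy": (21, 2),
    "Testosterone-Sleep": (61, 654),
}

for name, (unique, freq) in TABLE_3.items():
    r = rarity(freq, unique)
    print(f"{name}: r(E)={r:.2f}, I(E)={interestingness(r):.2f}")
```

The Average row in Table 3 matches the column means of these unrounded values (3.71 and 0.62). Applying I(E) to the average rarity would give about 0.21 instead, so the Average row is a per-column mean rather than a recomputed score.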

Editor's Notes

  1. Thank everyone for coming. Feel free to ask questions.
  2. Gregor Johann Mendel explored the research question of how traits are inherited across generations of peas; through pea hybridization (1866) he debunked blending inheritance and founded genetics. EXPERIMENTATION. OBSERVATION: inheritance of traits across generations seemed to extend beyond the immediate parents in the lineage. EXPLANATION: inheritance of traits appears to be influenced by the presence of dominant and recessive factors, which split and then independently recombine. THEORY: the Law of Segregation and the Law of Independent Assortment. Walter Sutton and Theodor Boveri explored the mechanism of cell division (cytology) in the embryos of grasshoppers (1903, genetic inheritance); each cell split is equally likely, which gives the causal mechanism for Mendel's laws. OBSERVED: the splitting of chromosomes in the cells of grasshoppers (meiosis). EXPLAINED: Mendel's laws of inheritance applied to chromosomes at the cellular level in living organisms. THEORIZED: chromosomes are the basis for genetic inheritance. Jorn Dyerberg and Hans Olaf Bang (1913–1994), the Greenland Eskimo studies. OBSERVED: no acute myocardial infarction (AMI) among Greenland Eskimos. EXPLAINED: a diet rich in omega-3 fatty acids. THEORIZED: marine oils can treat thrombosis, atherosclerosis, and AMI.
  3. LBD is now driven by digital data (in silico as opposed to in vivo). Four activities are involved in the science of making discoveries under the guidance of a human.
  4. An information processing system that leverages rich representations of textual content from scientific literature based on implicit and explicit context can provide effective means for literature-based discovery. This has been convincingly demonstrated through the rediscovery of several well-known associations between biomedical concepts and their substantiation using MEDLINE and the Medical Subject Headings (MeSH) vocabulary.
  5. Vioxx is the brand name of rofecoxib, a nonsteroidal anti-inflammatory drug (NSAID): a stronger pain medication than naproxen (brand name Aleve) and easier on the stomach than naproxen. 2004: Merck's clinical trial confirmed the risk of heart attack. Lawsuit by 50,000 patients.
  6. Vioxx (anti-inflammatory): stronger, with less severe side effects (easier on the stomach). Lawsuit by 50,000 patients.
  7. LBD differs from traditional research, which relies on direct observation of the object of interest. Keyword-based: error-prone because text is not normalized to standard concepts. Concept-based (also semantics-based): concepts, but no explicit relationships. Relations-based: explicit relationships, but of limited complexity; unable to capture causality or mechanisms of interaction. Graph-based: giant component, clustering coefficient, geodesics, centrality (betweenness, closeness). Hybrid: combines machine learning and summarization with traditional LBD approaches.
  8. Rich representations: personalization, Google Knowledge Graph, human activity modeling, mobile applications/advertising (get examples). Two goals for automation: (1) create subgraphs that capture complex associations along multiple thematic dimensions; (2) use background knowledge (BKR, MeSH) to improve LBD.
  9. Context helps overcome combinatorial explosion and enables scalability.
  10. Problem definition: stated in terms of path relatedness, which is decomposed into semantic predication relatedness. To achieve this, we studied characteristics of MEDLINE abstracts: articles have properties/attributes that provide various levels of abstraction of the full text.
  11. Given a way to represent the context of a path, subgraphs can be created automatically in six steps.
  12. Frequency is the epiphenomenon of context
  13. Compute path relatedness. Two objectives. Binarize the vectors.
  14. Notice the binary vectors. MeSH semantic similarity measures: set-based (Jaccard, Dice); path length (Rada; Wu & Palmer; Leacock & Chodorow); information content (Lin; Resnik; Jiang & Conrath); gloss vectors (LSI).
  15. Mean: the weighted average of the points. Variance: the average of the squared distances from the mean. Standard deviation: the square root of the variance (what is normal, what is not).
  16. Mean: the weighted average of the points. Variance: the average of the squared distances from the mean. Standard deviation: the square root of the variance (what is normal, what is not).
  17. Single-link: merge clusters if their maximum pairwise similarity is above the threshold; yields straggly clusters. Complete-link: merge clusters if their minimum pairwise similarity is above the threshold; yields strict, compact clusters. Group-average: average of intra-cluster and inter-cluster similarities; well connected, but with broader connections than complete-link.
  18. Definitional knowledge: top-down. Assertional knowledge: bottom-up. Using both together is probably best.
  19. Analogy Google Knowledge Graph IBM Human Activity Modeling Yahoo Personalization Biomedicine Literature-based Discovery Mobile Applications
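Notes 13 to 15 mention binarizing context vectors, comparing them with set-based MeSH measures (Jaccard, Dice), and using the mean and standard deviation to tell what is normal from what is not. The Python sketch below strings these steps together under explicit assumptions that are not taken from the slides: each predication's context is the set of MeSH descriptors of the abstracts it was extracted from, path relatedness is the mean Dice similarity of consecutive predications, and a score is "notable" when it exceeds the mean by more than one population standard deviation. The function names and descriptor counts are hypothetical and illustrative only.

```python
from statistics import mean, pstdev


def binarize(freq_vector):
    """Keep descriptors that occur at least once: a 0/1 vector stored as a set."""
    return {term for term, count in freq_vector.items() if count > 0}


def jaccard(a, b):
    """Set-based similarity |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Set-based similarity 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def path_relatedness(contexts, sim=dice):
    """ASSUMPTION: mean similarity of consecutive predications along the path."""
    pairs = list(zip(contexts, contexts[1:]))
    if not pairs:
        return 0.0
    return mean(sim(a, b) for a, b in pairs)


def above_normal(scores, k=1.0):
    """ASSUMPTION: keep scores more than k population std devs above the mean."""
    cutoff = mean(scores) + k * pstdev(scores)
    return [s for s in scores if s > cutoff]


# Hypothetical descriptor counts for the abstracts behind three predications on one path.
path = [
    {"Fish Oils": 3, "Blood Platelets": 1, "Thrombosis": 0},
    {"Blood Platelets": 2, "Platelet Aggregation": 1},
    {"Platelet Aggregation": 1, "Raynaud Disease": 2},
]
contexts = [binarize(v) for v in path]
print(path_relatedness(contexts))                # 0.5
print(above_normal([0.1, 0.2, 0.2, 0.3, 0.9]))   # [0.9]
```

Storing a binary vector as a set keeps Jaccard and Dice to one line each; passing `sim=jaccard` to `path_relatedness` swaps the measure without touching anything else. `pstdev` matches the note's definition of variance as the plain average of squared distances from the mean.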
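Note 17 contrasts single-link, complete-link, and group-average clustering. The minimal agglomerative sketch below assumes a precomputed symmetric similarity matrix (for example, path relatedness scores) and a merge threshold. Group-average follows the note's reading: it averages every pairwise similarity inside the candidate merged cluster, both intra-cluster and inter-cluster. The matrix, threshold, and function names are hypothetical.

```python
from itertools import combinations


def linkage(c1, c2, sim, method):
    """Similarity between two clusters (lists of item indices) under a linkage criterion."""
    cross = [sim[i][j] for i in c1 for j in c2]
    if method == "single":         # most similar cross-cluster pair
        return max(cross)
    if method == "complete":       # least similar cross-cluster pair
        return min(cross)
    if method == "group_average":  # all pairs in the merged cluster, intra and inter
        pairs = [sim[i][j] for i, j in combinations(c1 + c2, 2)]
        return sum(pairs) / len(pairs)
    raise ValueError(f"unknown linkage: {method}")


def agglomerate(sim, threshold, method="complete"):
    """Repeatedly merge the most similar pair of clusters while its linkage is above threshold."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        score, a, b = max(
            (linkage(clusters[a], clusters[b], sim, method), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if score <= threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected
    return clusters


# Hypothetical similarity matrix over four items.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.6, 0.1],
    [0.2, 0.6, 1.0, 0.3],
    [0.1, 0.1, 0.3, 1.0],
]
for method in ("single", "complete", "group_average"):
    print(method, agglomerate(sim, threshold=0.5, method=method))
# single [[0, 1, 2], [3]]
# complete [[0, 1], [2], [3]]
# group_average [[0, 1, 2], [3]]
```

On this toy matrix, complete-link refuses to pull item 2 into the cluster with items 0 and 1 because item 0 is dissimilar to it (0.2), which is the strictness the note attributes to complete-link; single-link merges on the one strong link between items 1 and 2.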