Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University
1. Open data, compound repurposing, and rare diseases
Andrew Su, Ph.D.
@andrewsu
asu@scripps.edu
http://sulab.org
January 30, 2017
Slides: slideshare.net/andrewsu
13.
NGLY1 (11 PubMed articles)
Congenital disorders of glycosylation (822)
PNGase (686)
ERAD (1330)
glycosylation (48,862)
alacrima (164)
Genetic interactors (3016)
symptoms (109,928)
25 million articles in PubMed
14. The biomedical literature is massive…
[Figure: Number of new PubMed-indexed articles per year, 1985 to 2015; y-axis from 0 to 1,400,000]
16. … but it is very hard to query and compute
Drugs: Imatinib, Crizotinib, Erlotinib, Gefitinib, Sorafenib, Lapatinib, Dasatinib, …
AND
Diseases: Acute myeloid leukemia, Acute lymphoblastic leukemia, Chronic myelogenous leukemia, Chronic lymphocytic leukemia, Hodgkin lymphoma, Non-Hodgkin lymphoma, Myeloma, …
Names used for imatinib alone: Gleevec, Glivec, STI-571, STI 571, STI571, ST1571, ST 1571, CGP-57148, CGP 57148, CGP57148, CGP57148B, …
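A minimal sketch of why this is laborious: every drug synonym has to be OR-ed together before the drug group can be AND-ed with the disease group. The helper names `or_group` and `build_query` are hypothetical, and the quoted boolean syntax is a generic illustration, not the exact syntax of any particular search engine.

```python
def or_group(terms):
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(drug_terms, disease_terms):
    """Combine a drug synonym group and a disease group with AND."""
    return or_group(drug_terms) + " AND " + or_group(disease_terms)


# A subset of the names listed on the slide (illustrative, not exhaustive).
imatinib_names = ["imatinib", "Gleevec", "Glivec", "STI-571", "STI571", "CGP57148B"]
leukemias = ["Acute myeloid leukemia", "Chronic myelogenous leukemia"]

query = build_query(imatinib_names, leukemias)
print(query)
```

Even this toy query covers one drug and two diseases; the full question would need the same expansion for every drug and every disease in the lists.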
17. … but it is very hard to query and compute
Human genes referred to as “PAP”:
EntrezGene ID   HGNC symbol   Description
10884           MRPS30        mitochondrial ribosomal protein S30
10914           PAPOLA        poly(A) polymerase alpha
11333           PDAP1         PDGFA associated protein 1
11334           TUSC2         tumor suppressor candidate 2
130120          REG3G         regenerating islet-derived 3 gamma
5068            REG3A         regenerating islet-derived 3 alpha
50807           ASAP1         ArfGAP with SH3 domain, ankyrin repeat and PH domain 1
55              ACPP          acid phosphatase, prostate
8853            ASAP2         ArfGAP with SH3 domain, ankyrin repeat and PH domain 2
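The ambiguity can be made concrete with a small lookup table. This sketch uses the nine identifier and symbol pairs from the slide; the names `alias_index`, `resolve`, and `is_ambiguous` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# (EntrezGene ID, HGNC symbol) pairs from the slide; each has been called "PAP".
PAP_GENES = [
    (10884, "MRPS30"), (10914, "PAPOLA"), (11333, "PDAP1"),
    (11334, "TUSC2"), (130120, "REG3G"), (5068, "REG3A"),
    (50807, "ASAP1"), (55, "ACPP"), (8853, "ASAP2"),
]

# Index each gene under its official symbol and under the shared alias "PAP".
alias_index = defaultdict(list)
for entrez_id, symbol in PAP_GENES:
    alias_index[symbol].append((entrez_id, symbol))
    alias_index["PAP"].append((entrez_id, symbol))


def resolve(name):
    """Return every (EntrezGene ID, symbol) pair that the name could refer to."""
    return alias_index.get(name, [])


def is_ambiguous(name):
    """A name is ambiguous when it maps to more than one gene."""
    return len(resolve(name)) > 1


print(len(resolve("PAP")), "candidate genes for 'PAP'")
print(resolve("ACPP"))
```

An official symbol resolves to a single identifier, while the alias returns all nine candidates, so the text alone cannot say which gene an author meant.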
19. Information extraction from biomedical text
1. Identify biomedical concepts in text
… We report a case of familial systemic mastocytosis with the rare KIT K509I germ line mutation. In vitro treatment with imatinib, dasatinib and PKC412 reduced cell viability of primary mast cells harboring KIT K509I mutation. Both patients with familial systemic mastocytosis had remarkable hematological and skin improvement after three months of imatinib treatment.
Leuk Res. 2014 Oct;38(10):1245-51. doi: 10.1016/j.leukres.
Concept types: GENES, DISEASES, DRUGS, VARIANTS
20. Information extraction from biomedical text
1. Identify biomedical concepts in text
Drugs: imatinib, dasatinib, PKC412
Disease: familial systemic mastocytosis
Gene: KIT
Variant: K509I
2. Identify relationships between concepts
Relationship labels: mutation of, mutation causes, causes, treats, inhibits
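One common way to hold extracted concepts and relationships is as subject, predicate, object triples that also record their source. The sketch below is an assumption about representation, not the speaker's implementation. It includes only two edges that can be read directly from the quoted abstract (K509I is a mutation of KIT; imatinib treatment improved familial systemic mastocytosis); which concepts the other labels connect is not recoverable from the slide text, so they are left out.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # where the statement came from, so it stays traceable


SOURCE = "Leuk Res. 2014 Oct;38(10):1245-51"

# Edge assignments are assumptions read from the abstract, for illustration only.
triples = [
    Triple("K509I", "mutation of", "KIT", SOURCE),
    Triple("imatinib", "treats", "familial systemic mastocytosis", SOURCE),
]


def subjects_of(predicate, obj, graph):
    """Find every subject linked to `obj` by `predicate`."""
    return [t.subject for t in graph if t.predicate == predicate and t.obj == obj]


def objects_of(subject, predicate, graph):
    """Find every object linked from `subject` by `predicate`."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


print(subjects_of("treats", "familial systemic mastocytosis", triples))
print(objects_of("K509I", "mutation of", triples))
```

Once statements are stored this way, a question such as "what treats this disease?" becomes a filter over the triples instead of a free-text search.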
21. Goal: Assemble a network of biomedical knowledge that is comprehensive, current, computable and traceable.
23. The Gene Wiki project, circa 2008
• Protein structure
• Symbols and identifiers
• Tissue expression pattern
• Gene Ontology annotations
• Links to structured databases
• Gene summary
• Protein interactions
• Linked references
Huss, PLoS Biol, 2008
33. Seeding Wikidata with biomedical data
• All human and mouse genes and proteins
• All Gene Ontology terms
• All FDA approved drugs
• 9,000+ human diseases
• 120 reference microbial genomes
Mitraka et al (2015) Semantic Web Applications for the Life Sciences
Burgstaller-Muehlbacher et al (2016) Database
Putman et al (2016) Database
34. Centralizing key data storage
287 language editions of Wikipedia
Bioinformatics community
Toxicology community
Epidemiology community
…
35. “Show all tyrosine kinase inhibitors that are used to treat hematologic cancers.”
36. “Show all human membrane proteins associated with colorectal cancer.”
44.
Paid crowdsourcing ($$$)
• F = 0.87
• 9 days
• 145 workers
• Total: $630.96
Citizen Science (“Help science, please”)
• F = 0.84
• 28 days
• 212 workers
• Total cost: $0
45. Does Citizen Science scale?
Volunteers needed
= (number of annotation events per year) / (number of annotation events per year per volunteer)
= (1,000,000 articles * 10 AE / article) / ((10,275 AE * 365 days) / (212 annotators * 28 days))
≈ 15,828 volunteers needed
AE = Annotation events
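The estimate on this slide can be reproduced with a few lines of arithmetic. This is a sketch under the slide's own assumptions: 1,000,000 articles per year, 10 annotation events per article, and a per-volunteer rate extrapolated from 10,275 annotation events by 212 annotators over 28 days. The function and parameter names are hypothetical.

```python
import math


def volunteers_needed(articles_per_year, ae_per_article,
                      observed_ae, observed_annotators, observed_days):
    """Estimate how many volunteers are needed to annotate a year of articles."""
    # Annotation events (AE) required per year.
    ae_needed_per_year = articles_per_year * ae_per_article
    # AE per volunteer per day in the observed study, scaled to 365 days.
    ae_per_volunteer_per_year = (
        observed_ae / (observed_annotators * observed_days) * 365
    )
    # Round up: a fraction of a volunteer still means one more person.
    return math.ceil(ae_needed_per_year / ae_per_volunteer_per_year)


estimate = volunteers_needed(1_000_000, 10, 10_275, 212, 28)
print(estimate)  # 15828
```

The per-volunteer rate works out to roughly 632 annotation events per year, so the answer is sensitive to whether volunteers sustain the pace seen during the 28-day study.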
52. Finding new indications for existing drugs or therapies
Endpoints (A, C): Raynaud’s Syndrome, Fish oil
Intermediate concepts (B):
• Abnormal platelet activity
• Abnormal blood viscosity
• High blood viscosity
• Elevated RBC rigidity
• Vasodilation
• Low blood triglycerides
• Increased prostacyclins
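The A, B, C pattern amounts to finding intermediate concepts (B) that are linked to both endpoints when the endpoints are not linked to each other directly. The sketch below shows the idea on a toy edge list. The links are hand-written for illustration and are not mined from the literature; the function name `bridging_concepts` is hypothetical.

```python
def bridging_concepts(a, c, links):
    """Return the B concepts that are linked to both endpoint A and endpoint C."""
    neighbors = {}
    for x, y in links:
        # Treat links as undirected associations between two concepts.
        neighbors.setdefault(x, set()).add(y)
        neighbors.setdefault(y, set()).add(x)
    shared = neighbors.get(a, set()) & neighbors.get(c, set())
    return sorted(shared - {a, c})


# Toy links for illustration only (assumed, NOT extracted from real articles).
toy_links = [
    ("Fish oil", "High blood viscosity"),
    ("High blood viscosity", "Raynaud's Syndrome"),
    ("Fish oil", "Vasodilation"),
    ("Vasodilation", "Raynaud's Syndrome"),
    ("Fish oil", "Unrelated concept (hypothetical)"),
]

print(bridging_concepts("Fish oil", "Raynaud's Syndrome", toy_links))
```

Only concepts connected to both endpoints are returned, so the hypothetical concept linked to a single endpoint is filtered out.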
53. A preliminary view of the NGLY1-focused biological network
[Figure: network diagram with nodes labeled A, B, and C]
56. Louis Gioia
Julee Adesara
Toby Li
Karthik G
Erick Scott
Adam Mark
Kevin Xin
Jake Bruggemann
Mike Mayers
Andra Waagmeester
Max Nanis
Cyrus Afrasiabi
Ian MacLeod
Julia Turner
Ginger Tsueng
Sebastien Lelong
Erik Clarke
Jennifer Fouquier
Ben Good
Chunlei Wu
Shirley Willis
Tobias Meissner
Katie Fisch
Sandip Chatterjee
Ramya Gamini
Greg Stupp
Sebastian Burgstaller
Tim Putman
Nuria Queralt Rosinach
Sal Loguercio