2. Disclosures
•Founder & Consultant, Personalis Inc (genome
sequencing for clinical applications).
•Funding support: NIH, NSF, Microsoft, Oracle,
Lightspeed Ventures, PARSA Foundation.
•I am a fan of informatics, genomics, medicine &
clinical pharmacology.
3. Goals
•Provide an overview of the scientific trends and
publications in translational bioinformatics
•Create a “snapshot” of what seems to be
important in Spring, 2014 for the amusement of
future generations.
•Marvel at the progress made and the
opportunities ahead.
4. Process
1. Follow literature through the year
2. Solicit nominations from colleagues
3. Search key journals and key topics on PubMed
4. Evaluate & ponder
5. Select papers to highlight in ~2-3 slides
5. Caveats
•Translational bioinformatics = informatics methods
that link biological entities (genes, proteins, small
molecules) to clinical entities (diseases, symptoms,
drugs)--or vice versa.
•Considered last ~14 months (to this week)
•Focused on human biology and clinical implications:
molecules, clinical data, informatics.
•NOTE: Amazing biological papers with
straightforward informatics generally not included.
•NOTE: Amazing informatics papers which don’t link
clinical to molecular generally not included.
6. Final list
•105 Semifinalists, 49 finalists
•32 Presented here (briefly) + 10 “shout outs”
•Apologies to those I misjudged. Mistakes are mine.
•These slides and bibliography will be made available on
rbaltman.wordpress.com
•8 TOPICS: Controversies, Clinical genomics, Drugs,
Genetic basis of disease, Emerging data sources, Mice,
Scientific process, Odds & Ends.
7. Thanks!
Conversations and recommendations
Phil Bourne
Josh Denny
Joel Dudley
Michel Dumontier
Guy Fernald
George Hripcsak
Larry Hunter
Konrad Karczewski
Lang Li
Yong Li
Tianyun Liu
Yves Lussier
Dan Masys
Hua Fan-Minogue
Alex Morgan
Sandy Napel
Peter O’Donnell
Lucila Ohno-Machado
Chirag Patel
Beth Percha
Raul Rabadan
Dan Roden
Neil Sarkar
Nigam Shah
David States
Josh Stuart
Peter Tarczy-Hornoch
Nick Tatonetti
Laura Taylor
Jessie Tenenbaum
Olga Troyanskaya
Piet van der Graaf
Scott Waldman
9. “Warning Letter. November 22, 2013” (Alberto
Gutierrez, Director, Office of In Vitro Diagnostics &
Radiological Health, US FDA, to Anne Wojcicki, CEO,
23andMe)
• Goal: Stop marketing a ‘device’ that is not cleared.
• Method: Send letter, acknowledge 14 face-to-face
meetings, cite laws & regulations.
• Result: 23andMe suspended health advice on its
website, still provides raw data.
• Conclusion: Do not mess with the FDA.
FDA Document Number: GEN1300666
11. “Why I read the network nonsense papers” (Lior
Pachter, Prof. of Math, Berkeley)
• Goal: Use untraditional channels (blog) to voice
concern over potentially flawed science.
• Method: Blog posts with detailed analysis of papers
and concerns about correctness of conclusions,
especially directed at a particular colleague.
• Result: Entertaining/informative set of accusations
and responses, serving as a reminder to do due diligence
in literature review and technical content.
• Conclusion: Do not mess with Lior Pachter.
13. “Inconsistency in large pharmacogenomic
studies” (Haibe-Kains et al, Nature)
• Goal: Evaluate consistency of two major reports of
cancer cell line drug sensitivity.
• Method: Curate and compare results on same
drugs, as possible.
• Result: Correlation of drug sensitivity ranged from 0
to 0.6.
• Conclusion: Do not mess with experimental data.
PMID: 24284626
14. “Inconsistency in large pharmacogenomic
studies” (Haibe-Kains et al, Nature)
• Goal: Evaluate consistency of two major studies
(CCLE & CGP) of cancer cell line drug sensitivity.
• Method: Curate and compare results on same
drugs, as possible.
• Result: Correlation of drug sensitivity ranged from 0
to 0.6.
• Conclusion: High variability in experimental
measures of drug sensitivity calls for extreme
caution in using these measures.
PMID: 24284626
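The consistency check on this slide can be illustrated with a small sketch. The slide does not say which correlation statistic was used, so the Spearman rank correlation below is an assumption, chosen because it is a common way to compare sensitivity rankings between studies. The cell line names, the values, and the function names are all hypothetical.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical sensitivity values for one drug in two studies.
study_a = {"line1": 0.2, "line2": 1.5, "line3": 0.9, "line4": 3.1, "line5": 0.4}
study_b = {"line1": 0.3, "line2": 0.8, "line3": 2.0, "line4": 2.5, "line5": 0.1}

# Only cell lines measured in both studies can be compared.
shared = sorted(set(study_a) & set(study_b))
rho = spearman([study_a[c] for c in shared], [study_b[c] for c in shared])
print(f"Spearman rho across {len(shared)} shared cell lines: {rho:.2f}")
```

Repeating this per drug would give one correlation per drug, which is the kind of range the Result bullet summarizes.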
18. “A pharmacogenetic versus clinical algorithm for warfarin
dosing” (Kimmel et al, NEJM)
“A randomized trial of genotype-guided dosing of
acenocoumarol and phenprocoumon” (Verhoef et al, NEJM)
“A randomized trial of genotype-guided dosing of
warfarin” (Pirmohamed et al, NEJM)
•Goal: See if genetics improves warfarin dosing.
•Method: Randomized trials vs. clinical algorithm OR standard of care.
•Result: PGx beats standard of care, but not clinical algorithm. African-
Americans seemed to do worse with PGx.
•Conclusion: Study design matters, quality of execution matters, which
SNPs are measured matters.
PMID: 24251361
20. “Clinically actionable genotypes among 10,000
patients with preemptive pharmacogenomic
testing” (Van Driest et al, Clin Pharmacol Ther)
• Goal: Estimate value of preemptive testing versus
“reactive” testing for pharmacogenomics.
• Method: Focus on five drug-gene interactions.
• Result: 1+ actionable variant in 91% of patients (96%
of AA). “Reactive” strategy would generate 15K
tests.
• Conclusion: Most patients have at least one PGx
variant, point of care availability helps, less total
testing with preemptive strategy.
242563661
23. “Genic intolerance to functional variance and the
interpretation of personal genomes” (Petrovski et al,
PLoS Genetics)
• Goal: Figure out which mutations are most likely
to influence disease.
• Method: Using 6503 exomes, create a scoring
system for “intolerance” to mutations based on
amount of observed genetic variation vs. expected.
• Result: Mendelian disease genes very intolerant,
striking variation within other classes.
• Conclusion: May aid in identifying class-specific
deleterious mutations.
PMID: 23990802
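The "observed vs. expected" idea in the Method bullet can be sketched roughly as a regression residual. This is a simplified illustration and not the authors' exact scoring procedure: it assumes that the expected number of common functional variants in a gene is predicted from the gene's total variant count with a straight-line fit, and it scores each gene by its standardized residual. The gene names and counts are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def intolerance_scores(genes):
    """Score each gene by observed minus expected functional variation.

    Negative scores mean fewer functional variants than expected,
    which is read here as intolerance to variation.
    """
    names = list(genes)
    total = [genes[g][0] for g in names]
    functional = [genes[g][1] for g in names]
    slope, intercept = fit_line(total, functional)
    residuals = [f - (slope * t + intercept) for t, f in zip(total, functional)]
    # Divide by the root mean square residual to put scores on one scale.
    spread = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
    return {g: r / spread for g, r in zip(names, residuals)}


# Hypothetical counts: gene -> (all variants, common functional variants).
genes = {
    "GENE_A": (100, 10),
    "GENE_B": (200, 22),
    "GENE_C": (150, 4),
    "GENE_D": (300, 31),
    "GENE_E": (250, 27),
}

scores = intolerance_scores(genes)
for gene, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{gene}: {score:+.2f}")
```

In this toy data GENE_C carries far fewer functional variants than its total variant count would predict, so it receives the most negative score.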
26. “A general framework for estimating the relative
pathogenicity of human genetic variants” (Kircher et
al, Nat Genetics)
• Goal: Integrate diverse annotations into a single
score for evaluating the probable health impact of a SNP.
• Method: Combined Annotation-Dependent
Depletion (C-Score) defined and computed for 8.6
billion SNPs using machine learning approach.
• Result: C-score correlates with pathogenicity, disease
severity, regulatory effects, allelic diversity.
• Conclusion: CADD can prioritize functional,
deleterious and pathogenic variants across many
categories.
PMID: 24487276
28. Shout Outs for Clinical Genomics
“An informatics approach to analyzing the
incidentalome” (Berg et al, Genet Med)
Result: Categorized 2,016 genes into bins based
on clinical utility and validity, analyzed 80 genomes,
created algorithm that selected variants worth
pursuing.
“Whole genome sequencing in support of
wellness and health maintenance” (Patel et al,
Genome Medicine)
Result: Combine genetic and clinical markers to
assess risk and make lifestyle recommendations.
PMIDs: 22995991, 23806097
30. “A CTD-Pfizer collaboration: manual curation of
88,000 scientific articles text mined for drug-disease
and drug-phenotype interactions” (Davis et al,
Database)
• Goal: Curate the relationship of 1200 drugs to
potential toxicities in CV, neuro, renal, liver.
• Method: In one year, 5 curators curated 88K articles
and 254,173 interactions (!).
• Result: 152,173 chemical-disease, 58,572 chemical-gene,
5,345 gene-disease and 38,083 chemical-phenotype.
• Conclusion: Comprehensive manual curation of the
literature is possible and useful.
PMID: 24288140
33. “DGIdb: mining the druggable genome” (Griffith et al,
Nature Methods)
• Goal: Create central resource to associate
mutated genes with their potential to be “drugged.”
• Method: Mine existing gene-drug relationship
resources, and bring into a single resource.
• Result: 14,144 drug-gene interactions (2611 genes &
6307 drugs). 39 druggable gene categories.
• Conclusion: http://dgidb.org/ is a useful compendium
of existing and potential drug targets
PMID: 24122041
35. “Pathway-based screening strategy for multi target
inhibitors of diverse proteins in metabolic
pathways” (Hsu et al, PLoS Comp Bio)
• Goal: Find ways to treat pathways and networks vs.
single targets (to avoid resistance, ineffectiveness)
• Method: Pathway-based screening using 3D
structural information to find promiscuous inhibitors
that hit multiple members of a pathway.
• Result: Two inhibitors for pathways in H. pylori.
• Conclusion: Shared small molecule binding
properties within pathways may yield poly-active
compounds.
PMID: 23861662
37. “Systematic identification of proteins that elicit drug
side effects” (Kuhn et al, Mol Sys Biol)
• Goal: Clarify the mechanisms of action behind
drug side effects.
• Method: Integrate drug-phenotype and drug-target
relations to establish target-phenotype relations.
• Result: 732 side effects with single protein
associations, 137 of these with existing evidence. 1
novel association confirmed experimentally (HTR7 and
hyperesthesia).
• Conclusion: Large fraction of drug side effects are
mediated predominantly by single proteins.
PMID: 23632385
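The integration step in the Method bullet can be pictured as a join between two tables that share a drug column. The sketch below is only an illustration under simple assumptions: it scores each target and side effect pair by how much more often the side effect appears among drugs hitting that target than among all drugs. The paper's actual statistics are not reproduced here, and the drug, protein, and side effect names are hypothetical.

```python
from collections import defaultdict

# Hypothetical drug-target and drug-phenotype relations.
drug_targets = {
    "drugA": {"P1", "P2"},
    "drugB": {"P1"},
    "drugC": {"P2", "P3"},
    "drugD": {"P3"},
}
drug_side_effects = {
    "drugA": {"nausea", "rash"},
    "drugB": {"nausea"},
    "drugC": {"rash"},
    "drugD": {"headache"},
}


def target_side_effect_enrichment(targets, effects):
    """Return {(target, side_effect): enrichment ratio} for co-occurring pairs."""
    drugs = set(targets) & set(effects)
    by_target = defaultdict(set)
    by_effect = defaultdict(set)
    for drug in drugs:
        for target in targets[drug]:
            by_target[target].add(drug)
        for effect in effects[drug]:
            by_effect[effect].add(drug)

    table = {}
    for target, target_drugs in by_target.items():
        for effect, effect_drugs in by_effect.items():
            both = len(target_drugs & effect_drugs)
            if both == 0:
                continue  # no drug links this target to this side effect
            rate_with_target = both / len(target_drugs)
            background_rate = len(effect_drugs) / len(drugs)
            table[(target, effect)] = rate_with_target / background_rate
    return table


table = target_side_effect_enrichment(drug_targets, drug_side_effects)
for (target, effect), ratio in sorted(table.items(), key=lambda kv: -kv[1]):
    print(f"{target} -> {effect}: {ratio:.1f}x background")
```

A ratio well above 1 flags a candidate target-phenotype relation; a real analysis would also need a significance test and a correction for the many pairs examined.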
39. “Network-assisted prediction of potential drugs for
addiction” (Sun et al, Biomed Res Intl)
• Goal: Novel therapeutics are needed to battle
addiction.
• Method: Create a network of drugs and their
associated genes, expand to include other drugs.
• Result: Addictive drugs with similar actions cluster
together. Predicted 94 non-addictive drugs that may
modulate addictive response.
• Conclusion: Network analysis provides candidate
drugs for addiction treatment (or risk).
PMID: 24689033
42. “A drug repositioning approach identifies tricyclic
antidepressants as inhibitors of small cell lung cancer
and other neuroendocrine tumors” (Jahchan et al,
Cancer Discovery)
• Goal: Find novel treatments for small cell lung
cancer (SCLC, neuroendocrine subtype).
• Method: Query gene expression compendium to
find drugs that oppose or synergize with SCLC
• Result: Tricyclic antidepressants consistently
antagonize SCLC, induce SCLC apoptosis, activate
stress pathways.
• Conclusion: Expression data can suggest novel drug
treatments for difficult diseases.
PMID: 24078773
45. Shout Outs for Drugs
“Combinatorial therapy discovery using mixed integer linear
programming” (Pang et al, Bioinformatics)
Result: Combinatorial algorithm for maximizing coverage of
targets and minimizing off-targets for drug combinations.
“The druggable genome: evaluation of drug targets in clinical trials
suggests major shifts in molecular class and indication”
(Rask-Andersen et al, Ann Rev Pharm Toxicol)
Result: Analyzed clinical trials to find 475 novel targets.
“Identification and characterization of potential drug targets by
subtractive genome analyses of methicillin resistant Staphylococcus
aureus” (Uddin & Saeed, Comp Biol & Chem)
Result: Find non-homologous & essential proteins in MRSA
genome to define new drug targets.
PMIDs: 24463180, 24016212, 24361957
47. “A nondegenerate code of deleterious variants in
Mendelian loci contributes to complex disease
risk” (Blair et al, Cell)
• Goal: Understand genetic architecture of complex
disease.
• Method: Mine EMR of 110 million patients to
associate Mendelian variation with complex disease.
• Result: Each complex disorder linked to a unique
set of Mendelian disorders. GWAS hits enriched in
these, Mendelian variants contribute more to risk.
• Conclusion: Complex diseases have comorbidity
with Mendelian disorders, with deep genetic overlap.
Mendelian genes are key for complex disease.
PMID: 24074861
50. “Systematic comparison of phenome-wide association
study of electronic medical record data and genome-
wide association data” (Denny et al, Nature Biotech)
• Goal: Replicate genetic associations using PheWAS.
• Method: For each of 3144 SNPs, look for
associations with 1358 EMR-defined phenotypes in
14K individuals.
• Result: 51/77 associations replicated. 63 SNPs with
pleiotropic associations.
• Conclusion: EMR and PheWAS powerful tool for
genetic discovery and replication.
PMID: 24270849
52. “Coherent functional modules improve transcription
factor target identification, cooperativity prediction,
and disease association” (Karczewski et al, PLoS
Genetics)
• Goal: Understand role of transcription factors (TFs)
in disease.
• Method: Integrate TF binding data with functional
gene modules from 9K expression experiments to
establish associations of TFs to modules.
• Result: 30 TF-TF associations (14 known). 4K TF-
disease relationships, including MEF2A + Crohn’s.
• Conclusion: ChIP-seq data + co-expression modules
amplify signal of TF-TF and TF-disease relations.
PMID: 24516403
55. “Towards building a disease-phenotype knowledge
base: extracting disease-manifestation relationship
from literature” (Xu et al, Bioinformatics)
• Goal: Catalog full set of disease manifestations
• Method: Extract connections between diseases and
their manifestations using NLP.
• Result: 119M sentences provide 121K Disease-
Manifestation pairs, 99.2% of them previously not
available in structured repository.
• Conclusion: Automated characterization of disease
will be useful for disease classification and ultimately
treatment.
PMID: 23828786
57. “A common rejection module (CRM) for acute
rejection across multiple organs identifies novel
therapeutics for organ transplantation” (Khatri et al, J
Exp Med)
• Goal: Understand biology of acute rejection.
• Method: Use expression data from 8 transplant data
sets to find genes significantly and consistently
overexpressed in rejected organs.
• Result: Defined a module of 11 genes present in all
rejection samples. Suggested sensitivity to
atorvastatin and dasatinib, based on their targets.
• Conclusion: This CRM useful for both diagnosis &
treatment of acute rejection in transplant.
PMID: 24127489
59. Shout Outs for Genetic Basis of Disease
“Network models of genome-wide association
studies uncover the topological centrality of protein
interactions in complex disease” (Lee et al, JAMIA)
Result: Complex trait associated loci are more likely
to be hub and bottleneck genes in protein-protein
interaction networks.
PMID: 23355459
61. “A network based method for analysis of lncRNA-
disease associations and prediction of lncRNAs
implicated in disease” (Yang et al, PLoS ONE)
• Goal: Understand role of long non-coding RNAs
(lncRNAs) in disease.
• Method: Create network of lncRNA-disease
associations from literature, and link to known
disease genes.
• Result: 295 lncRNAs associated with 801 genes in
context of 214 diseases. Predict 768 new
associations using shared links. Validated 3 of them.
• Conclusion: lncRNAs have important role in
regulating disease gene expression and thus disease.
PMID: 24498199
63. “Lineage structure of the human antibody repertoire
in response to influenza vaccination” (Jiang et al, Sci
Trans Med)
• Goal: Understand immune response to vaccines
• Method: Sequence B-cell antibodies in 17 volunteers
(young and old) after flu vaccine.
• Result: Elderly subjects have decreased number of
B-cell lineages, increased pre-vaccine diversity,
decreased post-vaccine diversity.
• Conclusion: Immune response evolves with age, and
can be directly interrogated with NGS technology.
PMID: 23390249
65. “An integrated clinico-metabolomic model improves
prediction of death in sepsis” (Langley et al, Sci Trans
Med)
• Goal: Understand predictors of death from sepsis.
• Method: Combine metabolome and proteome of
patients admitted with sepsis.
• Result: Those who died from sepsis showed
divergent profiles for fatty acid transport,
β-oxidation, gluconeogenesis, citric acid cycle.
Classifier created to predict survival.
• Conclusion: Proteome/metabolome can predict
outcomes in patients with sepsis.
PMID: 23884467
67. “Meta-analyses of studies of the human
microbiota” (Lozupone et al, Genome Research)
• Goal: Understand the ability to pool microbiome
data across populations.
• Method: Combine data from 12 studies to evaluate
reproducibility.
• Result: Body-site differences give a consistently clear signal.
Fecal samples dominated by local factors. Some
unusual similarities suggest need for care.
• Conclusion: Microbiome studies must select cases
and controls carefully, and measure effect size with
“out groups.”
PMID: 23861384
69. Shout Outs for Emerging Data Sources
“PhenDisco: phenotype discovery system for the database of
genotypes and phenotypes” (Doan et al, JAMIA)
Result: It may be possible to search dbGaP!
“Comorbidity clusters in autism spectrum disorders: an electronic
health record time-series analysis” (Doshi-Velez et al, Pediatrics)
Result: Three distinct syndromes/trajectories seen in ASD.
“Network-based analysis of vaccine-related associations reveals
consistent knowledge with the vaccine ontology” (Zhang et al, J
Biomed Sem)
Result: Identified connections between different vaccines and
genes important for vaccine response.
PMIDs: 23989082, 24323995, 24209834
71. “Knockouts model the 100 best-selling drugs—will
they model the next 100?” (Zambrowicz & Sands,
Nature)
• Goal: Evaluate value of mouse-knockouts for drug
target discovery & validation.
• Method: Retrospective evaluation for 100 best-
selling drugs.
• Result: Phenotypes correlate well with known drug
efficacy.
• Conclusion: Large-scale mouse knockout programs
may be a source of new targets and useful drugs.
PMID: 12509758
73. “Mouse model phenotypes provide information about
human drug targets” (Hoehndorf et al, Bioinformatics)
• Goal: Create automated methods for transferring
data from model organisms (mice) to humans.
• Method: Use metric of phenotypic similarity to map
from mouse to human drug-relevant phenotypes.
• Result: General method. Example mapping for
diclofenac.
• Conclusion: Semantic methods may be useful for
automated mapping of mouse knockout phenotypes
to relevant human disease phenotypes.
PMID: 24158600
75. “Genomic responses in mouse models poorly mimic
human inflammatory disease” (Seok et al, PNAS)
• Goal: Assess the utility of mouse models of acute
inflammation.
• Method: Assess gene expression changes in humans
and mice for burn, trauma, endotoxemia.
• Result: Mouse results don’t agree with humans.
Mouse results don’t agree with mouse results.
• Conclusion: Mouse models for human inflammatory
diseases are not going to be useful.
PMID: 23401516
78. “Atypical combinations and scientific impact” (Uzzi et
al, Science)
• Goal: Understand why some papers have high
scientific impact.
• Method: Analyze frequency of co-citation between
all pairs of papers. Define “conventionality” metric,
and “tail” metric for out-of-discipline citations.
• Result: High impact papers are both very
conventional and feature unusual citations. Teams are
38% more likely than solo authors to do something
novel.
• Conclusion: Read and refer to papers outside your
discipline. Write papers in groups.
PMID: 24159044
80. “Chapter 4: Protein Interactions and
Disease” (Gonzalez & Kann, PLoS Comp Bio)
• Goal: Disseminate knowledge about translational
bioinformatics widely.
• Method: Publish a textbook in an Open Source
journal.
• Result: “Translational Bioinformatics” edited by
Kann, available at PLoS Comp Bio. 17 chapters +
intro.
• Conclusion: You can publish an open source
textbook. Count citations to your chapter!
PMID: 23300410
82. Shout Outs for the Scientific Process
“Quantifying long-term scientific impact” (Wang et al, Science)
Result: Initial citation trajectory predicts lifetime trajectory.
“A historic moment for open science: the Yale University open
data access project and Medtronic” (Krumholz et al, Ann Intern
Med)
Result: Created a model for sharing industrial trial data for
re-analysis.
“Evidence of community structure in biomedical research grant
collaborations” (Nagarajan et al, J Biomed Inf)
Result: CTSAs have encouraged more team-science and more
collaborative publications.
PMIDs: 24092745, 23778908, 22981843
84. “A haplotype-resolved genome and epigenome of the
aneuploid HeLa cancer cell line” (Adey et al, Nature)
• Goal: Understand the genomic features of the HeLa
cell line genome.
• Method: High quality, phased sequencing of the
genome.
• Result: Valuable map of genetic variations. Careful
attention paid to sensitive release and data access
involving NIH leadership & family.
• Conclusion: HeLa cells continue to provide valuable
information at genotype and phenotype levels.
PMID: 23925245
86. “A social network of hospital acquired infection built
from electronic medical record data” (Cusumano-
Tower et al, JAMIA)
• Goal: Understand how infections spread in a hospital
• Method: Use EMR to create social network of
patient contacts, and simulate infectious outbreaks.
• Result: Simulations reflect staffing and patient flow
practices.
• Conclusion: EMR allowed creation of robust
network, useful for simulation.
PMID: 23467473
87. [Figure panels: Room sharing; Provider sharing;
Probability of spread (influenza) between wards;
MRSA simulation results: seed in the MRI suite.]
PMID: 23467473
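The two steps named in the Method bullet of the previous slide, building a contact network from EMR records and simulating an outbreak on it, can be sketched as follows. This is a minimal illustration and not the authors' model: it assumes contacts come only from sharing a room on the same day, ignores the time order of contacts, and uses a single hypothetical transmission probability per contact. The record layout and all names are made up.

```python
import random
from collections import defaultdict
from itertools import combinations

# Hypothetical EMR extract: (patient, room, day).
stays = [
    ("p1", "roomA", 1), ("p2", "roomA", 1),
    ("p2", "roomB", 2), ("p3", "roomB", 2),
    ("p4", "roomC", 2),
    ("p3", "roomC", 3), ("p4", "roomC", 3),
]


def build_contact_network(stays):
    """Link every pair of patients who shared a room on the same day."""
    occupants = defaultdict(set)
    for patient, room, day in stays:
        occupants[(room, day)].add(patient)
    network = defaultdict(set)
    for patients in occupants.values():
        for a, b in combinations(sorted(patients), 2):
            network[a].add(b)
            network[b].add(a)
    return network


def simulate_outbreak(network, seed_patient, p_transmit, rng):
    """Spread outward from the seed; each contact transmits with p_transmit."""
    infected = {seed_patient}
    frontier = [seed_patient]
    while frontier:
        next_frontier = []
        for patient in frontier:
            for contact in sorted(network[patient]):
                if contact not in infected and rng.random() < p_transmit:
                    infected.add(contact)
                    next_frontier.append(contact)
        frontier = next_frontier
    return infected


network = build_contact_network(stays)
rng = random.Random(42)  # fixed seed so repeated runs give the same numbers
sizes = [len(simulate_outbreak(network, "p1", 0.5, rng)) for _ in range(1000)]
print("mean outbreak size:", sum(sizes) / len(sizes))
```

Adding provider sharing as a second kind of edge, or seeding the outbreak in a shared location, would follow the same pattern with a different definition of contact.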
88. “The hidden geometry of complex, network-driven
contagion phenomena” (Brockmann & Helbing,
Science)
• Goal: Understand global spread of epidemics.
• Method: Wave propagation models applied to
“effective distance” between locations based on air
traffic flow.
• Result: Method can predict arrival times and correct
for discontinuities in effective distance.
• Conclusion: You are closer to SARS than you think.
PMID: 24337289
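The "effective distance" in the Method bullet can be sketched in a few lines. The paper defines the effective length of a direct link as 1 - ln(P), where P is the fraction of travelers leaving one location who go to the other, and the effective distance between two locations as the length of the shortest path under those link lengths. The airport names and passenger flows below are hypothetical, and the function names are illustrative only.

```python
import heapq
import math

# Hypothetical flows[origin][destination] = travelers per day.
flows = {
    "A": {"B": 900, "C": 100},
    "B": {"A": 500, "C": 500},
    "C": {"A": 100, "B": 900},
}


def effective_lengths(flows):
    """Convert flows to link lengths 1 - ln(P), P = fraction of outflow."""
    lengths = {}
    for origin, out in flows.items():
        total = sum(out.values())
        lengths[origin] = {
            dest: 1 - math.log(flow / total)
            for dest, flow in out.items()
            if flow > 0
        }
    return lengths


def effective_distances(flows, source):
    """Shortest-path effective distance from source (Dijkstra's algorithm)."""
    lengths = effective_lengths(flows)
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, length in lengths.get(node, {}).items():
            candidate = dist + length
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best


distances = effective_distances(flows, "A")
for place, dist in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"A -> {place}: {dist:.3f}")
```

In this toy network the direct A to C link carries little traffic, so the shortest effective path to C runs through B. That is the sense in which a location can be effectively closer than its direct connection suggests.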
90. Shout Outs for Odds & Ends
“How do you feel? Your computer knows” (Geller, CACM)
Result: Facial expression encodes emotions, and can be decoded
by current algorithms.
“Simulation of repetitive diagnostic blood loss and onset of
iatrogenic anemia in critical care patients with a mathematical
model” (Lyon et al, Comp in Biol & Med)
Result: If you order too many blood tests, you can bleed your
patient to death. This can be modeled with math.
PMID: 23228481
DOI: 10.1145/2555809
91. 2013 Crystal ball...
Increased focus on methods to untangle regulatory
control of clinical phenotypes
Rare variant GWAS with exomes & genomes
Microbiome integrated with immunology &
metabolomics, and disease risk.
Emphasis on non European-descent populations for
discovery of disease associations
Mobile computing resources for genomics
Crowd-based discovery in translational bioinformatics
92. 2014 Crystal ball...
Emphasis on non European-descent populations for
discovery of disease associations
Crowd-based discovery in translational bioinformatics
Methods to recommend treatment for cancer based on
genome/transcriptome
Increase in “trained systems” (à la Watson) applications
in translational bioinformatics
Repurposing with combinations of drugs (vs. one)
More cost-effectiveness evidence for genomics
Linking essential genes, drug targets, and drug response