Genomics is generating complex data about human biology and promises major medical advances. In particular, it is enabling precision medicine: the use of a patient's genome and physiological state to improve therapeutic efficacy and outcome. However, routine use of genomics data in medical research is in its infancy, mainly because of the challenges of working with "big data". The data are too large and too complex for typical researchers to handle. Processing and interpreting them correctly requires an understanding of many aspects of experimental biology and medicine. Data size is also an issue: an individual researcher may need to handle tens of terabytes (genomes from a few hundred patients), which is challenging to download and store on a typical workstation. To effectively support precision medicine, scientists from a wide range of disciplines, including computer science, must develop algorithms that improve diagnostics and prognostics, genome interpretation, raw data processing and secure high-performance computing.
2. PRECISION MEDICINE
• TRADITIONAL MEDICINE, WITH MORE DATA
• DIAGNOSIS: ASSIGNING PATIENTS TO GROUPS
– BIOLOGY, DISEASE PROGRESSION, TREATMENT RESPONSE
• PERSONALIZED, BUT NOT EVERYONE HAS A DIFFERENT DISEASE
NATURE MEDICINE 19, 249 (2013) DOI:10.1038/NM0313-249
3. NATIONAL COMPREHENSIVE CANCER NETWORK (NCCN)
Breast Cancer
• Noninvasive
– Lobular Carcinoma In Situ
– Ductal Carcinoma In Situ
• Invasive
– Lobular Carcinoma
– Ductal Carcinoma
– Inflammatory
4. IMPROVING PRECISION WITH GENOMICS
• BRCA1/BRCA2 MUTATIONS PREDICT RISK
• COMMERCIAL PROGNOSTIC TESTS BASED ON GENE SIGNATURES
HTTP://THEBIGCANDME.BLOGSPOT.CA/
5. GENOMICS
• NEW TECHNOLOGY FOR READING/WRITING DNA
• MEASURE OUR GENETIC CODE AND SYSTEM STATE
• LOTS OF VARIABLES
– WHOLE GENOME, TRANSCRIPT AND PROTEIN EXPRESSION, SPLICING, CHROMATIN STRUCTURE, MOLECULAR INTERACTION, TRANSCRIPTION FACTOR, METHYLATION, METABOLITE, PATIENT PHENOTYPE
7. HTTP://WWW.LHSC.ON.CA/
[FIGURE LABELS: SOURCE CODE ON DISK, LOAD TO ACTIVE MEMORY, COMPILER, RUNNING SOFTWARE, ACTIVE MEMORY]
4 LETTER CODE (DNA/RNA BASES): GATGGGATTGGGGTTTTCCCCTCCCAT…
20 LETTER CODE (AMINO ACIDS): MEEPQSDPSVEPPLSQETFSDLWKLLPEN…
14. COMPUTING NEEDS: 1 HUMAN GENOME
• ~125 BASE READ LENGTH X MILLIONS
• >30X COVERAGE
• ALIGNMENT TO REFERENCE GENOME
• COMPUTE VARIANTS (MUTATIONS)
• ANNOTATE VARIANTS
• COMPUTE TIME: UP TO 2 DAYS/GENOME
– OPTIMIZED 4 HOURS: 128G/2CPU/SSD, 3.1GHZ
• MEDICALLY IMPORTANT TO BE FAST
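The read count implied by these figures follows from the standard coverage relation (coverage = number of reads × read length / genome size). The sketch below is only a back-of-the-envelope estimate. The genome size of about 3.1 billion bases is an assumption that does not appear on the slide, and the function name is illustrative.

```python
import math

GENOME_SIZE_BP = 3_100_000_000  # assumption: approximate haploid human genome size
READ_LENGTH_BP = 125            # from the slide (~125 base read length)
TARGET_COVERAGE = 30            # from the slide (>30x coverage)


def reads_needed(coverage: float, genome_size: int, read_length: int) -> int:
    """Number of reads needed to reach the requested mean coverage."""
    return math.ceil(coverage * genome_size / read_length)


n_reads = reads_needed(TARGET_COVERAGE, GENOME_SIZE_BP, READ_LENGTH_BP)
print(f"{n_reads:,} reads (~{n_reads / 1e6:.0f} million)")
```

Under these assumptions a single 30x genome needs several hundred million reads, which is consistent with the "X MILLIONS" figure above.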
15. THE POWER OF GENOMICS IN MEDICINE
• 7000 RARE MONOGENIC DISEASES
– 50% HAVE A KNOWN GENE RESPONSIBLE
– QUADRUPLED RATE OF IDENTIFICATION SINCE 2012
• BRAIN DOPAMINE-SEROTONIN VESICULAR TRANSPORT DISEASE AND ITS TREATMENT
– TWO YEARS FROM DISEASE DEFINITION TO GENE IDENTIFICATION TO TREATMENT
NAT REV GENET. 2013 OCT;14(10):681-91
N ENGL J MED. 2013 FEB 7;368(6):543-50
18. CANCER GENOMICS
• GERM LINE VS. SOMATIC MUTATIONS
• AIM: IDENTIFY FREQUENT MUTATIONS IN CANCER
• >11,000 TUMOUR GENOMES, 9M MUTATIONS
HUMAN COLORECTAL CARCINOMA
HTTPS://DCC.ICGC.ORG/
19. COMPUTING CHALLENGES
• EXPONENTIAL DATA GROWTH (>MOORE’S LAW)
– BILLIONS OF GENOMES
– SIZE: >100GB/HUMAN GENOME, 4GB PROCESSED, MBS (JUST MUTATIONS)
• HETEROGENEOUS, NOISY, COMPLEX DATA
– DATA SCIENTISTS, DOMAIN EXPERTS
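The per-genome sizes above translate into cohort-level storage as follows. This is a rough sketch: the cohort size of 300 patients is a hypothetical value chosen to match "a few hundred patients", and 1 TB is taken as 1000 GB.

```python
RAW_GB_PER_GENOME = 100       # from the slide: >100 GB per human genome (raw)
PROCESSED_GB_PER_GENOME = 4   # from the slide: 4 GB processed
COHORT_SIZE = 300             # hypothetical: "a few hundred patients"


def cohort_storage_tb(n_patients: int, gb_per_genome: float) -> float:
    """Total storage in terabytes, assuming 1 TB = 1000 GB."""
    return n_patients * gb_per_genome / 1000


raw_tb = cohort_storage_tb(COHORT_SIZE, RAW_GB_PER_GENOME)
processed_tb = cohort_storage_tb(COHORT_SIZE, PROCESSED_GB_PER_GENOME)
print(f"raw: at least {raw_tb:.0f} TB, processed: about {processed_tb:.1f} TB")
```

The raw estimate lands in the tens of terabytes, which is why downloading and storing such a cohort on a typical workstation is difficult.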
21. COMPUTATIONAL BIOLOGY
• RESEARCH: USING COMPUTERS TO ANSWER BIOLOGICAL/BIOMEDICAL QUESTIONS
• EXPLORE, INTERPRET AND DISCOVER: SEARCH
• SPEED AND ACCURACY: ALGORITHMS
• PREDICTING FUNCTIONAL MUTATIONS, PATIENT CLASSIFICATION: MACHINE LEARNING
• PRIVACY: DIFFERENTIAL PRIVACY, ENCRYPTION
• USABLE APPLICATIONS: SOFTWARE ENGINEERING
22. MedSavant
search engine for genetic variants
WWW.MEDSAVANT.COM
Developers: Marc Fiume, James Vlasblom, Ron Ammar, Orion Buske, Eric Smith, Andrew Brook, Misko Dzamba, Khushi Chachcha, Sergiu Dumitriu
Scientific Advisors: Christian Marshall, Kym Boycott, Marta Girdea, Peter Ray, Gary Bader, Michael Brudno
27. PREDICT TREATMENT RESPONSE
• SUPERVISED MACHINE LEARNING E.G. RHEUMATOID ARTHRITIS METHOTREXATE RESPONSE
[Figure: Personal Medical Network. Nodes are patients labelled by response to treatment (Responder, Non-Responder, New). Edges link similar patients (e.g. SNP, smoking status) and range from weakly similar to highly similar. The new patient is predicted to be a Non-Responder.]
SHIRLEY HUI, RUTH ISSERLIN, HUSSAM KACA, TABITHA KUNG, KATHY SIMINOVITCH
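The idea of predicting a new patient's response from similar patients can be sketched as a similarity-weighted vote. This is only an illustration and not the method used in the project. The similarity measure (fraction of matching features), the feature names and the toy cohort are all hypothetical.

```python
from collections import defaultdict


def similarity(a: dict, b: dict) -> float:
    """Fraction of shared features with identical values (0 = none, 1 = all)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(a[key] == b[key] for key in shared) / len(shared)


def predict_response(new_patient: dict, labelled_patients: list) -> str:
    """Similarity-weighted vote over patients whose response is known."""
    votes = defaultdict(float)
    for features, response in labelled_patients:
        votes[response] += similarity(new_patient, features)
    return max(votes, key=votes.get)


# Hypothetical toy cohort: (features, known response to treatment)
cohort = [
    ({"snp_1": "AA", "snp_2": "CT", "smoker": True}, "non-responder"),
    ({"snp_1": "AA", "snp_2": "CC", "smoker": True}, "non-responder"),
    ({"snp_1": "AG", "snp_2": "CT", "smoker": False}, "responder"),
    ({"snp_1": "GG", "snp_2": "TT", "smoker": False}, "responder"),
]
new_patient = {"snp_1": "AA", "snp_2": "CT", "smoker": True}
prediction = predict_response(new_patient, cohort)
print(prediction)
```

In this toy cohort the new patient is most similar to the two non-responders, so the vote assigns that label. A real classifier would need a validated similarity measure and held-out evaluation.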
28. EXPLAINING GENOMICS DATA
• SNAPSHOTS OF SYSTEM STATE
– E.G. CANCER VS. NORMAL
• EXPLAIN WHY STATES DIFFER
– E.G. REGULATOR PERTURBATION
– CAUSAL MODELING
– PRIOR KNOWLEDGE ABOUT MECHANISM: PATHWAYS
WITT H ET AL. CANCER CELL. 2011 AUG 16;20(2):143-57
38. [Figure: enrichment map of gene-sets, with a zoom of the CNS-development cluster. Labelled clusters include microtubule cytoskeleton, cell projection and cell motility, cell proliferation, glycosylation, adhesion, GTPase/Ras signaling, kinase activity/regulation, CNS development, cell cycle, synaptic vesicle maturation, the Reelin pathway, the cKIT pathway, the mTOR pathway and MHC-I. Legend: node type (gene-set): enriched in deletions (coloured by FDR, 0% to 12.5%), known disease genes (ID, ASD, both), enriched only in disease genes; edge type (gene-set overlap): from disease genes to enriched gene-sets, between gene-sets enriched in deletions, between sets enriched in deletions and in disease genes or between disease sets only.]
Pinto et al. Functional impact of global rare copy number variation in autism spectrum disorders. Nature. 2010 Jun 9.
39. [Figure: zoom of the CNS-development cluster. Gene-sets shown: Reelin pathway, neuron migration, cell motility (stricter cluster), cell morphogenesis, cell projection organization, CNS development, brain development, neurite development, CNS neuron differentiation, axonogenesis, projection neuron axonogenesis, cerebral cortex cell migration.]
40. PATHWAY ASSOCIATION TEST
DANIELE MERICO
[FIGURE: CNV-AFFECTED GENES IN PATHWAY GSi PER PATIENT. PATIENT #1: COUNT = 1, PATIENT #2: COUNT = 1, PATIENT #3: COUNT = 1, PATIENT #i: COUNT = 0]
• IF WE HAVE AT LEAST ONE CNV AFFECTING AT LEAST ONE GENE IN A CERTAIN PATHWAY GSi, THEN WE HAVE A PERTURBATION POTENTIAL IN THAT PATHWAY
• WE COUNT THE PRESENCE / ABSENCE OF SUCH PERTURBATION POTENTIAL IN PATIENTS
       PATIENT #1   PATIENT #2   PATIENT #3   …   PATIENT #i   …   PATIENT #n
GS1    1            1            1            …   0            …   0
GS2    0            0            1            …   1            …   0
GS3    0            0            0            …   0            …   0
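The presence/absence table described on this slide can be built with a few lines of code. A minimal sketch, assuming each gene-set and each patient's CNV-affected genes are available as sets; the gene names and the function name are hypothetical placeholders.

```python
def perturbation_matrix(gene_sets: dict, patient_cnv_genes: list) -> dict:
    """Map each gene-set to a 0/1 row with one entry per patient.

    An entry is 1 if at least one CNV-affected gene of that patient
    belongs to the gene-set (a perturbation potential), else 0.
    """
    return {
        name: [int(bool(genes & affected)) for affected in patient_cnv_genes]
        for name, genes in gene_sets.items()
    }


# Hypothetical gene-sets and per-patient CNV-affected genes
gene_sets = {
    "GS1": {"GENE_A", "GENE_B"},
    "GS2": {"GENE_C"},
    "GS3": {"GENE_D"},
}
patients = [{"GENE_A"}, {"GENE_B", "GENE_X"}, {"GENE_A", "GENE_C"}, set()]

matrix = perturbation_matrix(gene_sets, patients)
for name, row in matrix.items():
    print(name, row)
```

Each row records only whether a patient has any hit in the gene-set, not how many genes are hit, which matches the presence/absence counting described above.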
41. PATHWAY ASSOCIATION TEST
DESCRIPTION:
• THE SIGNIFICANCE OF A GENE-SET IS THEN ASSESSED USING FISHER'S EXACT TEST FOR ASSOCIATION
• A SIGNIFICANT GENE-SET IS AFFECTED BY A MUTATION POTENTIAL MORE FREQUENTLY IN CASES THAN CONTROLS
• THE FDR IS ESTIMATED BY SHUFFLING THE COLUMNS IN THE GENE-SET BY PATIENT COUNT TABLE
              CASE         CONTROL
GSi           13           1
NOT IN GSi    1146 - 13    889 - 1
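The test on this slide can be illustrated with the counts shown (13 of 1146 cases and 1 of 889 controls affected in GSi). The sketch below computes a one-sided Fisher's exact p-value from the hypergeometric distribution using only the standard library. The choice of a one-sided test is an assumption, and the permutation-based FDR step is not shown.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the number of cases among the
    affected patients under the hypergeometric null distribution.
    """
    affected = a + b          # patients with a perturbation in the gene-set
    cases = a + c             # all case patients
    total = a + b + c + d
    upper = min(affected, cases)
    favourable = sum(
        comb(cases, k) * comb(total - cases, affected - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(total, affected)


# Counts from the slide: rows are GSi / not in GSi, columns are case / control
p_value = fisher_exact_greater(13, 1, 1146 - 13, 889 - 1)
print(f"one-sided p-value: {p_value:.2e}")
```

A small p-value here means the gene-set is hit more often in cases than expected by chance; across many gene-sets the result still needs multiple-testing control, which the slide addresses with the shuffling-based FDR.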
43. BENEFITS OF SYSTEMS THINKING
• IMPROVES STATISTICAL POWER
– FEWER TESTS
• MORE REPRODUCIBLE
– E.G. GENE EXPRESSION SIGNATURES
• EASIER TO INTERPRET
– FAMILIAR CONCEPTS E.G. CELL CYCLE
• IDENTIFIES MECHANISM
– CAN EXPLAIN CAUSE
VS. PARTS THINKING
46. THE FACTOID PROJECT
MAX FRANZ, IGOR RODCHENKOV, OZGUN BABUR, EMEK DEMIR, CHRIS SANDER
HELPING AUTHORS DIGITIZE THEIR PUBLISHED KNOWLEDGE
HTTP://FACTOID.BADERLAB.ORG/
47. NETWORK VISUALIZATION AND ANALYSIS
UCSD, ISB, AGILENT, MSKCC, PASTEUR, UCSF
HTTP://CYTOSCAPE.ORG
PATHWAY COMPARISON
LITERATURE MINING
GENE ONTOLOGY ANALYSIS
ACTIVE MODULES
COMPLEX DETECTION
NETWORK MOTIF SEARCH
49. GENE FUNCTION PREDICTION
HTTP://WWW.GENEMANIA.ORG
QUAID MORRIS (DONNELLY)
RASHAD BADRAWI, OVI COMES, SYLVA DONALDSON,
MAX FRANZ, CHRISTIAN LOPES, FARZANA KAZI,
JASON MONTOJO, HAROLD RODRIGUEZ, KHALID ZUBERI
• GUILT-BY-ASSOCIATION PRINCIPLE
• BIOLOGICAL NETWORKS ARE COMBINED INTELLIGENTLY TO OPTIMIZE PREDICTION ACCURACY
• ALGORITHM IS FASTER AND MORE ACCURATE THAN ITS PEERS
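The guilt-by-association principle can be illustrated with a simple neighbour-weighting scheme: a gene is scored by how much of its connection weight goes to genes already known to have the function of interest. This is a deliberately simplified sketch and not the GeneMANIA algorithm. The network, gene names and weights are hypothetical.

```python
def guilt_by_association(network: dict, seed_genes: set) -> list:
    """Rank non-seed genes by the fraction of their edge weight linked to seeds."""
    scores = {}
    for gene, neighbours in network.items():
        if gene in seed_genes:
            continue
        total = sum(neighbours.values())
        to_seeds = sum(w for n, w in neighbours.items() if n in seed_genes)
        scores[gene] = to_seeds / total if total else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weighted network: gene -> {neighbour: edge weight}
network = {
    "G1": {"G2": 1.0, "G3": 0.5},
    "G2": {"G1": 1.0, "G3": 0.8, "G4": 0.2},
    "G3": {"G1": 0.5, "G2": 0.8},
    "G4": {"G2": 0.2, "G5": 0.9},
    "G5": {"G4": 0.9},
}
seeds = {"G1", "G2"}  # genes assumed to share a known function

ranking = guilt_by_association(network, seeds)
for gene, score in ranking:
    print(gene, round(score, 2))
```

Genes connected mainly to the seed genes rank first. Combining several networks, as the slide describes, would additionally require a weight per network, which this sketch leaves out.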
50. SOCIAL CHALLENGES
• BIOETHICS AND DATA SHARING
• ENGAGING RESEARCHERS
– CROWDSOURCING: TCGA PAN CANCER, DREAM
• ENCOURAGING RESEARCHERS TO EXPLORE UNCHARTED TERRITORY
• NEED FOR QUANTITATIVE THINKING IN BIOLOGY
– NEW PH.D. PROGRAM IN THE MOLECULAR GENETICS DEPARTMENT AT THE UNIVERSITY OF TORONTO
NATURE. 2011 FEB 10;470(7333):163-5
WWW.NATURE.COM/TCGA/
51. EPENDYMOMA
• 3RD MOST COMMON BRAIN TUMOUR IN CHILDREN
• INCURABLE IN UP TO 45% OF PATIENTS
STEVE MACK, MICHAEL TAYLOR, RUTH ISSERLIN - CANCER CELL. 2011 AUG 16;20(2):143-57
[FIGURE LABELS: GENE EXPRESSION, PATIENT AGE, OVERALL SURVIVAL]
52. EPENDYMOMA GENOMIC ANALYSIS
• EPENDYMOMA BRAIN CANCER - MOST COMMON AND MORBID LOCATION FOR CHILDHOOD IS THE POSTERIOR FOSSA (PF = BRAINSTEM + CEREBELLUM)
• TWO SUBTYPES BY GENE EXPRESSION: PFA - YOUNG, DISMAL PROGNOSIS, PFB - OLDER, EXCELLENT PROGNOSIS.
• WHOLE GENOME SEQUENCING (47 SAMPLES) SHOWED ALMOST NO MUTATIONS, HOWEVER DNA METHYLATION ARRAYS SHOWED CLEAR CLUSTERING INTO PFA AND PFB (79 SAMPLES)
• PFA MORE TRANSCRIPTIONALLY SILENCED BY CPG METHYLATION
STEVE MACK, MICHAEL TAYLOR, SCOTT ZUYDERDUYN NATURE, FEB. 2014
53. INHIBITING POLYCOMB REPRESSIVE COMPLEX 2 (PRC2) WITH DZNEP AND GSK343 KILLED PFA CELLS
NO KNOWN TREATMENT, SO NOW GOING TO CLINICAL TRIAL, COMPASSIONATE USE IN ONE PATIENT
54. TREATMENT OF METASTATIC PF EPENDYMOMA WITH VIDAZA
9 YO WITH METASTATIC PF EPENDYMOMA TO LUNG TREATED WITH AZACYTIDINE
[FIGURE PANELS: 2 MONTHS, 3 MONTHS; 3 CYCLES VIDAZA]
MICHAEL TAYLOR
55. ACKNOWLEDGEMENTS
BADER LAB
DOMAIN INTERACTION TEAM
SHOBHIT JAIN
BRIAN LAW
JÜRI REIMAND
MOHAMED HELMY
ANDREA UETRECHT
MARINA OLHOVSKY
CANCER GENOMICS
FLORENCE CAVALLI
DAVID SHIH
ASHA ROSTAMIANFAR
PRECISION MEDICINE
RON AMMAR
SHIRLEY HUI
FUNDING
HTTP://BADERLAB.ORG
PATHWAY AND NETWORK ANALYSIS
RUTH ISSERLIN
IGOR RODCHENKOV
SCOTT ZUYDERDUYN
RUTH WONG
VERONIQUE VOISIN
SHAHEENA BASHIR
KHALID ZUBERI
CHRISTIAN LOPES
JASON MONTOJO
MAX FRANZ
HAROLD RODRIGUEZ