EWAS and the elusive architecture of
exposome variance in phenotype
Chirag J Patel

Mt. Sinai Exposome Symposium

Brescia, Italy 

5/21/2019
chirag@hms.harvard.edu

@chiragjp

www.chiragjpgroup.org
I have a ticket to the opera 

(tomorrow evening, 8pm)!

Please contact me if you would like to go
@chiragjp
chirag@hms.harvard.edu
P = G + E

Phenotype: Type 2 Diabetes, Cancer, Alzheimer’s, Gene expression
Genome: Variants
Environment: Infectious agents, Diet + Nutrients, Pollutants, Drugs
$\sigma^2_P = \sigma^2_G + \sigma^2_E + \sigma^2_{error}$

$H^2 = \sigma^2_G / \sigma^2_P$
Heritability (H2) is the proportion of phenotypic
variance attributable to genetic variance in a
population.
This estimate captures the genetic architecture of a
phenotype, which is important for the way we model genetic
risk for disease.
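As a minimal sketch of this definition (not the estimator used later in the talk), the simulation below assumes a purely additive model P = G + E with independent, normally distributed G and E of known variance; the function name `simulate_h2` and the chosen variances are illustrative. It recovers H2 as σ2G / σ2P from simulated individuals.

```python
import random
import statistics


def simulate_h2(var_g: float, var_e: float, n: int = 50_000, seed: int = 1) -> float:
    """Simulate P = G + E with independent normal G and E; return var(G) / var(P)."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, var_g ** 0.5) for _ in range(n)]   # genetic values
    e = [rng.gauss(0.0, var_e ** 0.5) for _ in range(n)]   # environment + error
    p = [gi + ei for gi, ei in zip(g, e)]                  # phenotype
    # With G and E independent, var(P) ~ var(G) + var(E), so this ratio estimates H2.
    return statistics.pvariance(g) / statistics.pvariance(p)


if __name__ == "__main__":
    print(f"estimated H2 = {simulate_h2(0.3, 0.7):.3f} (true value 0.3)")
```

If G and E were correlated, or interacted, var(P) would no longer split cleanly into the two components, which is exactly the modeling question raised next.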
For example:
(1) Is my G-P association specified correctly?
Or, how much of the P variation is additive vs. interaction?

P = a + b1*g1

vs.

P = a + b1*g1 + z1*c1 

vs.

P = a + b1*g1 + b2*g2 + b3*g1*g2 + z1*c1
Stratification (confounding)?
Epistasis (interaction)?
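To make the additive-vs-interaction question concrete, here is a hedged sketch on simulated data. It assumes a hypothetical true model with main effects for g1 and g2, a g1*g2 interaction, and a covariate c1, draws genotypes uniformly from {0, 1, 2} for simplicity, and compares the R2 of the three models above using a small least-squares helper (`ols_r2`, written here for illustration rather than taken from any package).

```python
import random


def ols_r2(X, y):
    """Fit y ~ 1 + X by ordinary least squares and return R^2.

    X is a list of rows (one list of predictors per observation);
    an intercept column is prepended automatically.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented k x (k+1) matrix.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    b = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(bi * xi for bi, xi in zip(b, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def compare_models(n=5000, seed=7):
    """Simulate P with a hypothetical g1*g2 interaction, then fit the three models."""
    rng = random.Random(seed)
    g1 = [rng.choice((0, 1, 2)) for _ in range(n)]   # allele counts (uniform, for simplicity)
    g2 = [rng.choice((0, 1, 2)) for _ in range(n)]
    c1 = [rng.gauss(0.0, 1.0) for _ in range(n)]     # covariate / potential confounder
    p = [0.5 * a + 0.3 * b + 0.4 * a * b + 0.5 * c + rng.gauss(0.0, 1.0)
         for a, b, c in zip(g1, g2, c1)]
    r2_g1 = ols_r2([[a] for a in g1], p)
    r2_g1_c1 = ols_r2([[a, c] for a, c in zip(g1, c1)], p)
    r2_full = ols_r2([[a, b, a * b, c] for a, b, c in zip(g1, g2, c1)], p)
    return r2_g1, r2_g1_c1, r2_full


if __name__ == "__main__":
    labels = ("P ~ g1", "P ~ g1 + c1", "P ~ g1 + g2 + g1*g2 + c1")
    for label, r2 in zip(labels, compare_models()):
        print(f"{label:28s} R2 = {r2:.3f}")
```

Because the models are nested, R2 can only grow as terms are added; how much it grows is what separates a simple additive story from one with confounding or epistasis.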
For example:
(2) How much P variation is explained by what we can measure? That is, how much variation in P is captured by GWAS?
[Figure: genome-wide association results, −log10(P) by chromosome, with genotype (AA, Aa, aa) distributions in cases vs. controls. WTCCC, 2007 (NATURE | Vol 447 | 7 June 2007). Image credit: Illumina]
What is the exposomic architecture of complex phenotypes?
$E^2 = \frac{?}{\sigma^2_P}$
Important for the way we model exposomic risk for
disease, including the role of mixtures!
E2: a combination of shared and specific environment
$\sigma^2_E = \sigma^2_{shared} + \sigma^2_{specific} + \text{error}$
🏠 $C^2 = \sigma^2_{shared} / \sigma^2_P$
Shared E (C2) is the proportion of phenotypic variance
attributed to a shared household or geography
(but not to genetics)
Lakhani et al., Nature Genetics 2019
Decompose the mixture of genetics, shared E
(C2), and indicators of shared E (air pollution,
weather, and socioeconomic status)
chirag “the better” braden
Arjun Manrai
(BCH CHIP)
http://apps.chiragjpgroup.org/catch/
To decompose P, G, and E, we need to
measure them all simultaneously!
Decomposing the mixture of genetics, shared E (C2), and indicators of
shared E (air pollution, weather, and socioeconomic status)
in real-world data
Insurance claims (N ~ 45M): disease (ICD9/ICD10), procedures, drugs, labs
+ Weather, Air Pollution, Census SES
chirag “the better” braden, Jian Yang, Peter M Visscher, Arjun Manrai (BCH CHIP)
http://apps.chiragjpgroup.org/catch/
Claims Analysis of Twin Correlation and
Heritability (CATCH)
vulture.com
Lakhani et al., Nature Genetics 2019
Amassing (the largest) twin and sibling cohort
in the US to estimate G and E in ~500 P
• Assume familial relationships in
subscriber groups

• Subscriber group has fewer than 15 members

• Both members are children of the primary
subscriber (e.g., the employed individual)

• Same date of birth
• Year of birth in or after 1985

• Member enrollment greater than 36
months
Twin pairs:
Same sex, female: 17,919
Same sex, male: 17,835
Opposite sex: 20,642
Total: 56,396
Largest collection of twins in the US (the next largest has ~28k pairs)
724K siblings!
Lakhani et al., Nature Genetics 2019
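The selection rules above can be sketched as a filter over enrollment records. This is a hedged illustration only: the `Member` record, its field names, and the relation coding are hypothetical stand-ins for whatever the claims database actually provides, and a real pipeline would also have to handle triplets and data-quality issues.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Member:
    # Hypothetical enrollment record; field names are illustrative only.
    member_id: str
    group_id: str          # subscriber group (e.g., an employee plus dependents)
    relation: str          # "subscriber" or "child" (hypothetical coding)
    birth_date: date
    sex: str
    enrolled_months: int


def candidate_twin_pairs(members, max_group_size=15, min_birth_year=1985, min_months=36):
    """Return (id_a, id_b, "same-sex" | "opposite-sex") for children sharing a birth date."""
    groups = {}
    for m in members:
        groups.setdefault(m.group_id, []).append(m)
    pairs = []
    for group in groups.values():
        if len(group) >= max_group_size:              # group must have fewer than 15 members
            continue
        children = [m for m in group
                    if m.relation == "child"              # child of the primary subscriber
                    and m.birth_date.year >= min_birth_year
                    and m.enrolled_months > min_months]   # enrolled more than 36 months
        for a, b in combinations(children, 2):
            if a.birth_date == b.birth_date:              # same date of birth
                kind = "same-sex" if a.sex == b.sex else "opposite-sex"
                pairs.append((a.member_id, b.member_id, kind))
    return pairs
```

Opposite-sex pairs found this way are necessarily fraternal; same-sex pairs are a mixture of identical and fraternal twins.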
Where do we get E indicators?

Exposome Data Warehouse (~1TB)
Geographical information system-enabled
database to map individuals to E
US distribution of twins and siblings that can be “linked”
to variation in air pollution, climate, and socioeconomic
status
We mapped 13360 ICD9 billing codes to 1809
PheWAS codes (in addition to 95 Mendelian
disorders)
Denny, Bastarache, et al. 2013

Rzhetsky, White et al. 2013
CARDIOVASCULAR
hypertension (401)
cardiac dysrhythmias (427)
DIGESTIVE
irritable bowel syndrome (564.1)
ENDOCRINE
type 1 diabetes (250.1)
type 2 diabetes (250.2)
(and 11 more phenotype groups)
$h^2 = 2(r_{mz} - r_{dz})$

$c^2 = 2r_{dz} - r_{mz}$

In a twin study, h2 and c2 can be
estimated using Falconer’s formula, with
tetrachoric correlations used to estimate rmz and rdz for binary phenotypes
h2: narrow-sense heritability

c2: shared environment

rmz: correlation of the phenotype between identical twins

rdz: correlation of the phenotype between fraternal twins
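A minimal sketch of Falconer's formula follows. Because claims-based phenotypes are binary, it also includes the classic cosine-pi approximation to the tetrachoric correlation from a 2x2 table of twin-pair concordance; that approximation is a swapped-in shortcut for illustration, not the maximum-likelihood tetrachoric estimate a full analysis would use, and the function names are hypothetical.

```python
import math


def tetrachoric_approx(a: int, b: int, c: int, d: int) -> float:
    """Cosine-pi approximation to the tetrachoric correlation of a 2x2 table.

    a = pairs where both twins are affected, d = pairs where neither is,
    b and c = discordant pairs.
    """
    if b == 0 or c == 0:
        return 1.0                       # no discordance in one direction
    if a == 0 or d == 0:
        return -1.0
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


def falconer(r_mz: float, r_dz: float) -> tuple[float, float]:
    """Falconer's formula: return (h2, c2) from MZ and DZ twin correlations."""
    h2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    return h2, c2
```

For example, `falconer(0.6, 0.4)` returns approximately (0.4, 0.2). When the odds ratio ad/bc equals 1, the approximate tetrachoric correlation is 0, as expected for no twin concordance beyond chance.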
… but we do not know the zygosity status of
claimants…
But we do know:

Opposite sex twins: all fraternal

Same Sex twins 👯 : mixture of identical and fraternal
We can estimate the proportion of fraternal and
identical twins from the prevalence of opposite-sex twins
Weinberg, 1902

Benyamin, et al, 2005, 2006
$P(mz) \approx 1 - 2\,(N_{os}/N_{all}) = 0.26$
$P(ss) = N_{ss}/N_{all} = 0.63$
$p = P(mz \mid ss) = P(mz)/P(ss) = 0.41$
$h^2 = \frac{2}{p}(r_{ss} - r_{os})$
$c^2 = \frac{r_{os}(p+1) - r_{ss}}{p}$
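The zygosity correction can be sketched in a few lines, assuming (per Weinberg's rule) that opposite-sex pairs make up about half of all fraternal pairs. Plugging in the pair counts from the cohort slide (20,642 opposite-sex and 17,919 + 17,835 = 35,754 same-sex pairs) gives P(mz) ≈ 0.27, P(ss) ≈ 0.63 and p ≈ 0.42; the small differences from the values above presumably reflect rounding or a slightly different analytic sample. Function names are illustrative.

```python
def weinberg_p_mz_given_ss(n_os: int, n_ss: int) -> tuple[float, float, float]:
    """Return (P(mz), P(ss), p = P(mz | ss)) from opposite- and same-sex pair counts.

    Weinberg's differential rule: fraternal pairs split roughly evenly between
    same- and opposite-sex, so N_dz ~ 2 * N_os and N_mz ~ N_all - 2 * N_os.
    """
    n_all = n_os + n_ss
    p_mz = 1.0 - 2.0 * n_os / n_all
    p_ss = n_ss / n_all
    return p_mz, p_ss, p_mz / p_ss


def mixture_h2_c2(r_ss: float, r_os: float, p: float) -> tuple[float, float]:
    """h2 and c2 when same-sex pairs are an MZ/DZ mixture with MZ fraction p."""
    h2 = 2.0 / p * (r_ss - r_os)
    c2 = (r_os * (p + 1.0) - r_ss) / p
    return h2, c2


if __name__ == "__main__":
    p_mz, p_ss, p = weinberg_p_mz_given_ss(n_os=20_642, n_ss=17_919 + 17_835)
    print(f"P(mz) = {p_mz:.3f}, P(ss) = {p_ss:.3f}, P(mz|ss) = {p:.3f}")
```

As a consistency check, if r_ss is built as p·r_mz + (1 − p)·r_dz and r_os = r_dz, `mixture_h2_c2` returns the same h2 and c2 as Falconer's formula applied to r_mz and r_dz.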
Lakhani et al., Nature Genetics 2019
http://apps.chiragjpgroup.org/catch/
Patient cohorts in the “real world”:
overall heritability (0.32) and shared environment (0.09).
A global view among 560 phenotypes gives a nuanced view of G and E
CaTCH: Claims analysis of Twin Correlation and Heritability
US-based, ages < 25
[Figure: h2 statistic by phenotype category; Overall (0.32), Endocrine (0.4), Metabolic (0.4)]
Decomposing G (h2), shared E (c2), and factors of shared E (SES, air quality, and temperature) in two candidate phenotypes: Lyme disease and obesity
Lakhani et al., Nature Genetics 2019
temperature/seasonality = 5% of shared environment
SES explains 10% of shared E
significant total genetics
significant total shared E
zero genetics
moderate shared environment
h2 and c2 estimates for 560 phenotypes versus statistical significance:
326/560 traits (58%) had a heritable component and 180/560 (32%) had a shared
environment component!
r=0.817
Lakhani et al., Nature Genetics 2019
… but air pollution, climate, and geocoded SES
play a modest role in total shared environment (c2)
r=0.817
Lakhani et al., Nature Genetics 2019
56K twins and 700K siblings in a massive health insurance cohort
point to complex and elusive variation in 560 phenotypes
[Figure: variance in 560 phenotypes; h2 = 0.3, c2 = 0.1 (🏠), remainder: ?]
http://apps.chiragjpgroup.org/catch/
https://rdcu.be/boZeV
• 58.2% of P had non-zero h2 

• 32% of P had a non-zero c2

• 87.9% had non-zero age fixed effect

• 50% had non-zero sex fixed effect

• Shared c2 is largely not explained by pollution, income, or climate

• 0.32 + 0.09 = 0.41
• 1 - 0.41 = 0.59!
The shared environment and genetic contributions to phenotype
are complex and modest:
Where is the rest of the variation in P?
Explaining the missing variation in P with the Exposome:
A data-driven paradigm for robust discovery of E
Wild, 2005, 2012
Ioannidis, 2009

Rappaport and Smith, 2010, 2011

Buck-Louis and Sundaram 2012

Miller and Jones, 2014

Patel CJ and Ioannidis JPA, 2014a,b
Ioannidis, 2016
Manrai et al 2017
Hypothesis: on average, we can explain only ~10% of the remaining, unexplained variation (in red)
[Figure: variance in 560 phenotypes; h2 = 0.3, c2 = 0.1 (🏠), ~10%]
http://apps.chiragjpgroup.org/catch/
https://rdcu.be/boZeV
Challenges in EWAS for identification of specific E factors, E
mixtures, and variance explained (E2) of the exposome
Power (both low and large sample sizes)
Model misspecification and (mis)interpretation
Dense correlational web of the exposome
Time-dependence of the exposome-phenome associations
ARPH 2016

JAMA 2014

JECH 2014
Scalable measurement of the exposome
Correlation globes paint a complex view of the exposome: average correlation < 0.3
For each pair of E: Spearman ρ (575 factors: 81,937 correlations)
Red: positive ρ; blue: negative ρ; thickness: |ρ|
Permuted data to produce a “null ρ”
Sought replication in > 1 cohort
Pac Symp Biocomput. 2015

JECH. 2015

Chung et al., ES&T 2018
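One pair's Spearman ρ and its permutation null can be sketched with the standard library alone. This is an illustrative sketch, not the published pipeline: ties get average ranks, the null is built by shuffling one variable, and the helper names and permutation count are assumptions. In a correlation globe, the same computation would be repeated for each pair of E.

```python
import random


def average_ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks (undefined if a variable is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def permutation_test(x, y, n_perm=1000, seed=0):
    """Return (observed rho, list of null rhos, two-sided permutation p-value)."""
    rng = random.Random(seed)
    observed = spearman(x, y)
    y_perm = list(y)
    null = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)              # breaks any real x-y association
        null.append(spearman(x, y_perm))
    extreme = sum(1 for r in null if abs(r) >= abs(observed))
    return observed, null, (extreme + 1) / (n_perm + 1)
```

The null distribution gives a data-driven reference for how large |ρ| can get by chance in a sample of this size, which is the role the “null ρ” plays above.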
Effective number of variables: 500 (10% decrease)
Co-exposure (Co-E) patterns are similar between females and males
[Figure: correlation globes for females and males]
… however:
Co-E within households is weaker!
Chung et al., ES&T 2018
Jake Chung
High-throughput data analytics to mitigate analytical challenges of
exposome-based research:

What model do we use, how do we assess it, and how do we
interpret it?
Agier, Portengen et al, EHP 2016
It is unclear which model to use, particularly in the choice of adjustment
variables
Estimating the Vibration of Effects (or Risk)
Variable of interest: e.g., 1 SD of log(serum Vitamin D)
Adjusting variable set (n = 13): SES [3rd tertile], education [>HS], race [white], body mass index [normal], total cholesterol, any heart disease, family heart disease, any hypertension, any diabetes, any cancer, current/past smoker [no smoking], drink 5/day, physical activity
All-subsets Cox regression: $2^{13} + 1 = 8{,}193$ models
Data source: NHANES 1999-2004; 417 variables of interest; outcome: time to death; N ≥ 1000 (≥ 100 deaths)
Output: effect sizes and p-values
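The all-subsets procedure and its summary statistics can be sketched as follows. Assumptions, stated loudly: the model-fitting step is a hypothetical callback `fit(adjustment_set) -> (hazard_ratio, p_value)` standing in for the Cox regression; RHR is taken as the ratio of the 99th to the 1st percentile hazard ratio and RP as the difference between the 99th and 1st percentiles of −log10(p), assumed here to follow the JCE 2015 definitions; and the Janus flag marks 1st and 99th percentile hazard ratios on opposite sides of 1. The sketch enumerates the 2^13 = 8,192 subsets of the 13 adjusters, including the empty set.

```python
import math
import statistics
from itertools import combinations


def all_adjustment_sets(adjusters):
    """Yield every subset of the adjusting variables, from none to all of them."""
    for k in range(len(adjusters) + 1):
        yield from combinations(adjusters, k)


def vibration_of_effects(adjusters, fit):
    """Fit one model per adjustment set and summarize the spread of results.

    `fit` is a hypothetical user-supplied callable: given a tuple of adjusting
    variables, it fits the model for the variable of interest (e.g., a Cox
    regression of time to death) and returns (hazard_ratio, p_value).
    """
    hrs, neg_log_ps = [], []
    for subset in all_adjustment_sets(adjusters):
        hr, p = fit(subset)
        hrs.append(hr)
        neg_log_ps.append(-math.log10(p))
    hr_pct = statistics.quantiles(hrs, n=100)          # 99 cut points: 1st..99th percentile
    lp_pct = statistics.quantiles(neg_log_ps, n=100)
    hr1, hr99 = hr_pct[0], hr_pct[98]
    return {
        "n_models": len(hrs),
        "RHR": hr99 / hr1,                              # 99th / 1st percentile hazard ratio
        "RP": lp_pct[98] - lp_pct[0],                   # 99th - 1st percentile of -log10(p)
        "janus": (hr1 - 1.0) * (hr99 - 1.0) < 0,        # percentiles straddle HR = 1
    }


if __name__ == "__main__":
    # Toy stand-in for a real model fit: the HR drifts as adjusters are added.
    names = [f"adj{i}" for i in range(13)]
    print(vibration_of_effects(names, lambda s: (0.7 + 0.01 * len(s), 1e-5 * (1 + len(s)))))
```

With a real fitted model plugged in as `fit`, RHR and RP are meant to correspond to the summaries printed beside each vibration plot; a large RHR or RP signals that the association depends heavily on the adjustment choice.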
[Figure: vibration of effects for Vitamin D (1 SD of log) and Thyroxine (1 SD of log): −log10(p-value) vs. hazard ratio across all adjustment models, labeled by the number of adjusting variables (0-13), with 1st, 50th, and 99th percentile indicators and the median p-value/HR for each k. Vitamin D: RHR = 1.14, RP = 4.68; Thyroxine: RHR = 1.15, RP = 2.90. JCE, 2015]
http://bit.ly/effectvibration
The Vibration of Effects:
Vitamin D, Thyroxine, and attenuated mortality risk
JCE, 2015
[Figure: vibration of effects for Cadmium (1 SD of log): −log10(p-value) vs. hazard ratio across adjustment models, with a separate panel for models including adjustment=current_past_smoking; RHR = 1.29, RP = 8.29]
The Vibration of Effects: shifts in the effect size distribution
due to select adjustments (e.g., adjusting cadmium levels for current/past smoking)
JCE, 2015
[Figure: vibration of effects for PCB, b-carotene, C-reactive protein, and cotinine]
JCE, 2015
Risk and significance depend on the modeling scenario!
The Vibration of Effects: beware of the Janus effect

(both risk and protection?!)
[Figure regions: “risk”, “protection”, “significant”]
http://bit.ly/effectvibration
How do we interpret complex models?
What is a mixture? An integration of correlated E? Interaction?
Curr Epidemiol Rep 2017
P = a + b1*e1 + b2*e2 + b3*e3

vs.

P = a + b1*e1 + b2*e2 + b3*e3 + b4*e1*e2

vs.
P = a + b1*e1 + b2*e2 + b3*e3 + b4*e1*e2

+ b5*e2*e3 + b6*e1*e3 + b7*e1*e2*e3
All terms significant?

Only interaction terms?
Additive?
Simple interaction?
All 2-way/n-way?
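The growth in model size behind these questions can be sketched by enumerating terms. Assuming a model with all interactions up to a chosen order (an illustrative assumption, with hypothetical helper names), the number of coefficients besides the intercept is the sum of binomial coefficients C(n, k) for k = 1..order: 7 for three exposures with all interactions up to third order, but 165,600 for all main effects and pairwise products of the 575 exposures in the correlation globes.

```python
from itertools import combinations
from math import comb


def interaction_terms(exposures, max_order):
    """All main effects and interaction products involving up to `max_order` exposures."""
    terms = []
    for order in range(1, max_order + 1):
        terms.extend("*".join(c) for c in combinations(exposures, order))
    return terms


def n_terms(n_exposures, max_order):
    """Closed-form count: sum of C(n, k) for k = 1..max_order (intercept excluded)."""
    return sum(comb(n_exposures, k) for k in range(1, max_order + 1))


if __name__ == "__main__":
    print(interaction_terms(["e1", "e2", "e3"], 3))
    print(n_terms(575, 2))
```

With far more candidate terms than observations, this is one reason current exposomic cohorts may be underpowered for complex modeling.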
Our current exposomic research cohorts may be underpowered
for complex modeling!
Chung et al, Environment International, 2019
… or overpowered for detection of simple
associations!
Emerging cohorts with large sample sizes will expose massive
misspecification, confounding, and correlation amongst E!
Manrai et al, American Journal of Epidemiology 2019
EWAS in telomere length
(Patel et al., IJE 2017)
Adjusted for age, poverty, ethnicity, and sex
GWAS in telomere length
(Pearson and Manolio, JAMA 2008)
Before and after confounding adjustment
Aside from residual confounding and misspecification:
modest association sizes and R2 (if the effects are real) may be diluted
by the complex dynamics of individual exposure over time
Athersuch Bioanalysis 2012
Cumulative (cadmium, PCB)
Constant, but excreted (phenols, vitamins)
Intervention (drugs)
Seasonal (allergen)
In-utero
Not shown: Diurnal
In conclusion:
In complex traits and phenotypes, we are missing much
of the phenotypic variation in the population.
H2 ~ 0.3
C2 ~ 0.1
Challenges in EWAS for identification of specific E factors, E
mixtures, and variance explained (E2) of the exposome
Power (both low and large sample sizes)
Model misspecification and (mis)interpretation
Dense correlational web of the exposome
Time-dependence of the exposome-phenome associations
ARPH 2016

JAMA 2014

JECH 2014
Scalable measurement of the exposome
Now hiring!

Data scientist for translational exposome informatics!
[Figure: phenome = exposome + genome across individuals; phenome (e.g., diabetes, telomeres), exposome (e.g., air pollution, nutrients, influenza), genome (e.g., FTO, TCF7L2)]
Thousands of environmental exposures; millions of genetic variants
Develop data science tools for dissecting relationships between the
phenome, exposome, and genome to optimize medical decision
making in the era of massive data.
[Figure: you and your phenome: P, E, G]
@chiragjp

chirag@hms.harvard.edu
Harvard DBMI
Susanne Churchill

Nathan Palmer

Sophia Mamousette

Sunny Alvear

Chirag J Patel

chirag@hms.harvard.edu

@chiragjp

www.chiragjpgroup.org
NIH Common Fund

Big Data to Knowledge
Acknowledgements
RagGroup
Jake Chung
Kajal Claypool
Chirag Lakhani
Danielle Rasooly

Alan LeGoallec

Braden Tierney

Yixuan He
Mentioned Collaborators
Arjun Manrai
John Ioannidis

Peter Visscher

More Related Content

What's hot

Day2 145pm Crawford
Day2 145pm CrawfordDay2 145pm Crawford
Day2 145pm Crawford
Sean Paul
 
MathiasHibbard_604FinalPaper
MathiasHibbard_604FinalPaperMathiasHibbard_604FinalPaper
MathiasHibbard_604FinalPaper
Mathias Hibbard
 
MathiasHibbard_655PaperFinal
MathiasHibbard_655PaperFinalMathiasHibbard_655PaperFinal
MathiasHibbard_655PaperFinal
Mathias Hibbard
 
dkNET Webinar: Population-Based Approaches to Investigate Endocrine Communica...
dkNET Webinar: Population-Based Approaches to Investigate Endocrine Communica...dkNET Webinar: Population-Based Approaches to Investigate Endocrine Communica...
dkNET Webinar: Population-Based Approaches to Investigate Endocrine Communica...
dkNET
 

What's hot (20)

NCI systems epidemiology 03012019
NCI systems epidemiology 03012019NCI systems epidemiology 03012019
NCI systems epidemiology 03012019
 
Intro to Biomedical Informatics 701
Intro to Biomedical Informatics 701 Intro to Biomedical Informatics 701
Intro to Biomedical Informatics 701
 
Bioinformatics Strategies for Exposome 100416
Bioinformatics Strategies for Exposome 100416Bioinformatics Strategies for Exposome 100416
Bioinformatics Strategies for Exposome 100416
 
Repurposing large datasets for exposomic discovery in disease
Repurposing large datasets for exposomic discovery in diseaseRepurposing large datasets for exposomic discovery in disease
Repurposing large datasets for exposomic discovery in disease
 
AACR 041616 digital exposomes
AACR 041616 digital exposomesAACR 041616 digital exposomes
AACR 041616 digital exposomes
 
Correlation globes of the exposome 2016
Correlation globes of the exposome 2016Correlation globes of the exposome 2016
Correlation globes of the exposome 2016
 
Data analytics to support exposome research course slides
Data analytics to support exposome research course slidesData analytics to support exposome research course slides
Data analytics to support exposome research course slides
 
Methods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big dataMethods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big data
 
Japanese Environmental Children's Study and Data-driven E
Japanese Environmental Children's Study and Data-driven EJapanese Environmental Children's Study and Data-driven E
Japanese Environmental Children's Study and Data-driven E
 
Search engine for E NEU network science 080817
Search engine for E NEU network science 080817Search engine for E NEU network science 080817
Search engine for E NEU network science 080817
 
Montgomery expression
Montgomery expressionMontgomery expression
Montgomery expression
 
Day2 145pm Crawford
Day2 145pm CrawfordDay2 145pm Crawford
Day2 145pm Crawford
 
Osmf rnk
Osmf rnkOsmf rnk
Osmf rnk
 
Introduction to Network Medicine
Introduction to Network MedicineIntroduction to Network Medicine
Introduction to Network Medicine
 
BRN Seminar 12/06/14 Introduction to Network Medicine
BRN Seminar 12/06/14 Introduction to Network Medicine BRN Seminar 12/06/14 Introduction to Network Medicine
BRN Seminar 12/06/14 Introduction to Network Medicine
 
MathiasHibbard_604FinalPaper
MathiasHibbard_604FinalPaperMathiasHibbard_604FinalPaper
MathiasHibbard_604FinalPaper
 
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
 
MathiasHibbard_655PaperFinal
MathiasHibbard_655PaperFinalMathiasHibbard_655PaperFinal
MathiasHibbard_655PaperFinal
 
dkNET Webinar: Population-Based Approaches to Investigate Endocrine Communica...
dkNET Webinar: Population-Based Approaches to Investigate Endocrine Communica...dkNET Webinar: Population-Based Approaches to Investigate Endocrine Communica...
dkNET Webinar: Population-Based Approaches to Investigate Endocrine Communica...
 
Parent of origin effect
Parent of origin effectParent of origin effect
Parent of origin effect
 

Similar to EWAS and the exposome: Mt Sinai in Brescia 052119

1- Why was the Tomasetti et al article so misinterpreted by th
1- Why was the Tomasetti et al article so misinterpreted by th1- Why was the Tomasetti et al article so misinterpreted by th
1- Why was the Tomasetti et al article so misinterpreted by th
AgripinaBeaulieuyw
 
1- Why was the Tomasetti et al article so misinterpreted by th
1- Why was the Tomasetti et al article so misinterpreted by th1- Why was the Tomasetti et al article so misinterpreted by th
1- Why was the Tomasetti et al article so misinterpreted by th
sachazerbelq9l
 
1- Why was the Tomasetti et al article so misinterpreted by th.docx
1- Why was the Tomasetti et al article so misinterpreted by th.docx1- Why was the Tomasetti et al article so misinterpreted by th.docx
1- Why was the Tomasetti et al article so misinterpreted by th.docx
jeremylockett77
 

Similar to EWAS and the exposome: Mt Sinai in Brescia 052119 (19)

Arjun Manrai - National Academies Talk - June 6, 2019
Arjun Manrai - National Academies Talk - June 6, 2019Arjun Manrai - National Academies Talk - June 6, 2019
Arjun Manrai - National Academies Talk - June 6, 2019
 
6 55 E
6 55 E6 55 E
6 55 E
 
Hoisington be and mh healthy buildings 2017
Hoisington be and mh healthy buildings 2017Hoisington be and mh healthy buildings 2017
Hoisington be and mh healthy buildings 2017
 
Multi-trait modeling in polygenic scores
Multi-trait modeling in polygenic scoresMulti-trait modeling in polygenic scores
Multi-trait modeling in polygenic scores
 
Isaac Kohane, "A Data Perspective on Autonomy, Human Rights, and the End of N...
Isaac Kohane, "A Data Perspective on Autonomy, Human Rights, and the End of N...Isaac Kohane, "A Data Perspective on Autonomy, Human Rights, and the End of N...
Isaac Kohane, "A Data Perspective on Autonomy, Human Rights, and the End of N...
 
Role of Human Genome Project in Medical Science
Role of Human Genome Project in Medical ScienceRole of Human Genome Project in Medical Science
Role of Human Genome Project in Medical Science
 
Presentation on Heritability
 Presentation on Heritability Presentation on Heritability
Presentation on Heritability
 
Theory and practice
Theory and practiceTheory and practice
Theory and practice
 
1- Why was the Tomasetti et al article so misinterpreted by th
1- Why was the Tomasetti et al article so misinterpreted by th1- Why was the Tomasetti et al article so misinterpreted by th
1- Why was the Tomasetti et al article so misinterpreted by th
 
1- Why was the Tomasetti et al article so misinterpreted by th
1- Why was the Tomasetti et al article so misinterpreted by th1- Why was the Tomasetti et al article so misinterpreted by th
1- Why was the Tomasetti et al article so misinterpreted by th
 
1- Why was the Tomasetti et al article so misinterpreted by th.docx
1- Why was the Tomasetti et al article so misinterpreted by th.docx1- Why was the Tomasetti et al article so misinterpreted by th.docx
1- Why was the Tomasetti et al article so misinterpreted by th.docx
 
Probability.pptx
Probability.pptxProbability.pptx
Probability.pptx
 
Hetman immem xi final March 2016
Hetman immem xi final March 2016Hetman immem xi final March 2016
Hetman immem xi final March 2016
 
Genetics research for society and global understanding - Myles Axton
Genetics research for society and global understanding - Myles AxtonGenetics research for society and global understanding - Myles Axton
Genetics research for society and global understanding - Myles Axton
 
Dermatoglyphic patterns of autistic children in nigeria
Dermatoglyphic patterns of autistic children in nigeriaDermatoglyphic patterns of autistic children in nigeria
Dermatoglyphic patterns of autistic children in nigeria
 
De novo reciprocal translocation t(4;20) (q28;q11) associated in a child with...
De novo reciprocal translocation t(4;20) (q28;q11) associated in a child with...De novo reciprocal translocation t(4;20) (q28;q11) associated in a child with...
De novo reciprocal translocation t(4;20) (q28;q11) associated in a child with...
 
헬스케어 빅데이터로 무엇을 할 수 있는가?
헬스케어 빅데이터로 무엇을 할 수 있는가?헬스케어 빅데이터로 무엇을 할 수 있는가?
헬스케어 빅데이터로 무엇을 할 수 있는가?
 
General Genetics: Gene Segregation and Integration (Part 3)
General Genetics: Gene Segregation and Integration (Part 3)General Genetics: Gene Segregation and Integration (Part 3)
General Genetics: Gene Segregation and Integration (Part 3)
 
Dr. Ángel Carracedo - Simposio Internacional 'La enfermedad de la duda: el TOC'
Dr. Ángel Carracedo - Simposio Internacional 'La enfermedad de la duda: el TOC'Dr. Ángel Carracedo - Simposio Internacional 'La enfermedad de la duda: el TOC'
Dr. Ángel Carracedo - Simposio Internacional 'La enfermedad de la duda: el TOC'
 

Recently uploaded

Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
RohitNehra6
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Sérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
gindu3009
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
anilsa9823
 

Recently uploaded (20)

TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 

EWAS and the exposome: Mt Sinai in Brescia 052119

  • 1. EWAS and the elusive architecture of exposome variance in phenotype Chirag J Patel Mt. Sinai Exposome Symposium Brescia, Italy 5/21/2019 chirag@hms.harvard.edu @chiragjp www.chiragjpgroup.org
  • 2. I have a ticket to the opera (tomorrow evening, 8pm)! Please contact me if you would like to go @chiragjp chirag@hms.harvard.edu
  • 3. P = G + EType 2 Diabetes Cancer Alzheimer’s Gene expression Phenotype Genome Variants Environment Infectious agents Diet + Nutrients Pollutants Drugs
  • 4. σ2P = σ2G + σ2E + σ2error
  • 5. σ2G σ2P H2 = Heritability (H2) is the range of phenotypic variability attributed to genetic variability in a population This estimate captures the genetic architecture of phenotype, important for way we model genetic risk for disease
  • 6. For example: (1) Is my G-P association specified correctly? Or, how much of the P variation is additive vs. interaction? P = a + b1*g1 vs. P = a + b1*g1 + z1*c1 vs. P = a + b1*g1 + b2*g2 + b2*g1*g2 + z1*c1 Stratification (confounding)? Epistasis (interaction)?
  • 7. For example: (2) How much P variation explained by what we can measure? evolut partic eases; tase 1) well a biolog The captur implem STRU revert subset librium clearly −log10(P) 0 5 10 15 Chromosome 22 X 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 80 60 40 100 rvedteststatistic a b NATURE|Vol 447|7 June 2007 WTCCC, 2007 AA Aa aa case control ie, how much variation in P captured by GWAS? Image credit: illumina
  • 8. What is the exposomic architecture of complex phenotype? ? σ2P E2 = Important for way we model exposomic risk for disease, including the role of mixtures!
  • 9. E2 Combination of shared and specific environment σ2E = σ2shared + σ2specific + error
  • 10. 🏠 σ2shared σ2P C2 = Shared E (C2) is the range of phenotypic variability attributed to shared household or geography (but not genetics)
  • 11. Lakhani et al., Nature Genetics 2019 Decompose the mixture of genetics, shared E (C2), and indicators of shared E (air pollution, weather, and socioeconomic status) chirag “the better” braden Arjun Manrai (BCH CHIP) http://apps.chiragjpgroup.org/catch/
  • 12. To decompose P, G, and E, we need to measure them all simultaneously!
  • 13. Decomposing the mixture of genetics, shared E (C2), and indicators of shared E (air pollution, weather, and socioeconomic status) in real-world data + Disease (ICD9/ICD10), procedures, drugs, labs N ~ 45M chirag “the better” braden Jian Yang Peter M Visscher Arjun Manrai (BCH CHIP) insurance claims Weather Air Pollution Census SES
  • 14. http://apps.chiragjpgroup.org/catch/ Claims Analysis of Twin Correlation and Heritability (CATCH) vulture.com Lakhani et al., Nature Genetics 2019
  • 15. Amassing (the largest) twin and sibling cohort in the US to estimate G and E in ~500 P • Assume familial relationships in subscriber groups • Subscriber group less than 15 members • Both members are child of primary subscriber (e.g., employed individual) • Same date of birth • Year of birth occurs on or after 1985 • Member enrollment greater than 36 months Same Sex - Female 17,919 Same Sex - Male 17,835 Opposite Sex 20,642 total 56,396 Largest collection of twins in US (next largest has ~28k pairs) Largest collection of twins in US (next largest has ~28k pairs) 724K siblings! Lakhani et al., Nature Genetics 2019
  • 16. Where do we get E indicators? Exposome Data Warehouse (~1TB) Geographical information system-enabled database to map individuals to E
  • 17. US distribution of twins and siblings that can be “linked” to variation in air pollution, climate, and socioeconomic status
  • 18. We mapped 13360 ICD9 billing codes to 1809 PheWAS codes (in addition to 95 Mendelian disorders) Denny, Bastarache, et al. 2013 Rzhetsky, White et al. 2013 CARDIOVASCULAR hypertension (401) cardiac dysrhythmias (427) DIGESTIVE irritable bowel syndrome (564.1) ENDOCRINE type 2 diabetes (250.1) type 1 diabetes (250.2) (and 11 more phenotype groups)
  • 19. In a twin study, h2 and c2 can be estimated using Falconer’s formula: h2 = 2(rmz − rdz) and c2 = 2rdz − rmz, with tetrachoric correlation used to estimate rmz and rdz. h2: narrow-sense heritability; c2: shared environment; rmz: correlation of phenotype between identical twins; rdz: correlation of phenotype between fraternal twins.
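A minimal, dependency-free sketch of the two steps named on this slide, assuming binary (case/control) phenotypes: a tetrachoric correlation from a 2x2 table of pair disease status, under the usual assumption of a bivariate-normal liability with thresholds set by prevalence, followed by Falconer’s formula. The function names are hypothetical, and a real analysis would also need standard errors and a non-degenerate table (prevalence strictly between 0 and 1).

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal


def _both_above(a, b, rho, steps=2000):
    """P(X > a, Y > b) for a standard bivariate normal with correlation rho.

    Integrates phi(x) * P(Y > b | X = x) over x in [a, a + 10] with Simpson's rule.
    """
    s = math.sqrt(1.0 - rho * rho)
    h = 10.0 / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        f = _STD.pdf(x) * (1.0 - _STD.cdf((b - rho * x) / s))
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * f
    return total * h / 3.0


def tetrachoric(n11, n10, n01, n00):
    """Tetrachoric correlation from a 2x2 table of pair case status."""
    n = n11 + n10 + n01 + n00
    a = _STD.inv_cdf(1.0 - (n11 + n10) / n)   # liability threshold, member 1
    b = _STD.inv_cdf(1.0 - (n11 + n01) / n)   # liability threshold, member 2
    target = n11 / n                          # observed fraction of concordant cases
    lo, hi = -0.999, 0.999
    for _ in range(50):                       # bisection: P(both cases) rises with rho
        mid = 0.5 * (lo + hi)
        if _both_above(a, b, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def falconer(r_mz, r_dz):
    """Falconer's formula: returns (h2, c2)."""
    return 2.0 * (r_mz - r_dz), 2.0 * r_dz - r_mz
```

For example, 40 concordant-case, 20 discordant, and 40 concordant-control pairs at 50% prevalence give a tetrachoric correlation of about 0.81, and `falconer(0.8, 0.5)` gives h2 = 0.6 and c2 = 0.2.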
  • 20. … but we do not know the zygosity status of claimants. We do know that opposite-sex twins are all fraternal, while same-sex twins 👯 are a mixture of identical and fraternal.
  • 21. We can estimate the proportion of fraternal and identical twins from the prevalence of opposite-sex twins (Weinberg, 1902; Benyamin et al., 2005, 2006): P(mz) ≈ 1 − 2(NOS / Nall) = 0.26; P(ss) = NSS / Nall = 0.63; p = P(mz | ss) = P(mz) / P(ss) = 0.41. Then h2 = (2/p)(rss − ros) and c2 = (ros(p + 1) − rss) / p, which follow from Falconer’s formula if rss = p·rmz + (1 − p)·rdz and ros = rdz.
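The Weinberg-based correction fits in a few lines. This sketch assumes, as the formulas imply, that the same-sex pair correlation is a mixture rss = p·rmz + (1 − p)·rdz and that opposite-sex pairs estimate rdz; the function names are hypothetical.

```python
def prob_mz_given_same_sex(n_same_sex, n_opposite_sex):
    """Weinberg's differential rule: DZ pairs ~ 2 x opposite-sex pairs."""
    n_all = n_same_sex + n_opposite_sex
    p_mz = 1.0 - 2.0 * n_opposite_sex / n_all   # P(MZ)
    p_ss = n_same_sex / n_all                   # P(same sex)
    return p_mz / p_ss                          # p = P(MZ | same sex)


def h2_c2_without_zygosity(r_ss, r_os, p):
    """h2 and c2 from same-sex and opposite-sex twin correlations.

    Assumes r_ss = p * r_mz + (1 - p) * r_dz and r_os = r_dz, then applies
    Falconer's formula.
    """
    h2 = 2.0 / p * (r_ss - r_os)
    c2 = (r_os * (p + 1.0) - r_ss) / p
    return h2, c2


# Pair counts from slide 15: same-sex (female + male) and opposite-sex.
p = prob_mz_given_same_sex(17_919 + 17_835, 20_642)
print(round(p, 3))  # 0.423
```

With the slide 15 pair counts these formulas give P(mz) ≈ 0.27 and p ≈ 0.42, slightly above the 0.26 and 0.41 shown on this slide; the talk does not say which counts fed the slide's numbers.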
  • 22. Patient cohorts in the “real world”: overall heritability (0.32) and shared environment (0.09). A global view across 560 phenotypes gives a nuanced view of G and E. CaTCH: Claims analysis of Twin Correlation and Heritability; US-based, ages < 25. [Figure: statistic by phenotype category; Overall (0.32), Endocrine (0.4), Metabolic (0.4)] Lakhani et al., Nature Genetics 2019; http://apps.chiragjpgroup.org/catch/
  • 23. Decomposing G (h2), shared E (c2), and factors of shared E (SES, air quality, and temperature) in 2 candidate phenotypes: Lyme disease and obesity. [Figure annotations: temperature/seasonality = 5% of shared environment; SES explains 10% of shared E; significant total genetics and significant total shared E; zero genetics and moderate shared environment.] Lakhani et al., Nature Genetics 2019
  • 24. h2 and c2 estimates for 560 phenotypes versus statistical significance: 326/560 traits (58%) had a heritable component and 180/560 (32%) had a shared-environment component! [Figure: r = 0.817] Lakhani et al., Nature Genetics 2019
  • 25. … but air pollution, climate, and geocoded SES play a modest role in total shared environment (c2) r=0.817 Lakhani et al., Nature Genetics 2019
  • 26. 56K twins and 700K siblings in a massive health insurance cohort point to complex and elusive variation in 560 phenotypes: h2 = 0.3, c2 = 0.1 (🏠), and the rest (?). http://apps.chiragjpgroup.org/catch/; https://rdcu.be/boZeV
  • 27. The shared-environment and genetic contributions to phenotype are complex and modest: 58.2% of P had non-zero h2; 32% of P had non-zero c2; 87.9% had a non-zero age fixed effect; 50% had a non-zero sex fixed effect; shared c2 ≠ pollution, income, climate; 0.32 + 0.09 = 0.41, so 1 − 0.41 = 0.59! Where is the rest of the variation in P?
  • 28. Explaining the missing variation in P with the exposome: a data-driven paradigm for robust discovery of E (Wild, 2005, 2012; Ioannidis, 2009; Rappaport and Smith, 2010, 2011; Buck-Louis and Sundaram, 2012; Miller and Jones, 2014; Patel CJ and Ioannidis JPAI, 2014ab; Ioannidis, 2016; Manrai et al., 2017)
  • 29. Hypothesis: on average, we are explaining only ~10% of the variation shown in red (560 phenotypes; h2 = 0.3, c2 = 0.1 (🏠)). http://apps.chiragjpgroup.org/catch/; https://rdcu.be/boZeV
  • 30. Challenges in EWAS for identification of specific E factors, E mixtures, and variance explained (E2) of the exposome: power (both low and large sample sizes); model misspecification and mis(?)-interpretation; the dense correlational web of the exposome; time dependence of exposome-phenome associations; scalable measurement of the exposome. (ARPH 2016; JAMA 2014; JECH 2014)
  • 31. Correlation globes paint a complex view of the exposome. For each pair of E: Spearman ρ (575 factors; 81,937 correlations); red: positive ρ; blue: negative ρ; edge thickness: |ρ|. Average correlation < 0.3; permuted data to produce a “null ρ”; sought replication in > 1 cohort. Effective number of variables: 500 (10% decrease). Pac Symp Biocomput. 2015; JECH. 2015; Chung et al., ES&T 2018
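A sketch of the pairwise-correlation, permutation-null, and effective-number steps, assuming a complete numeric matrix of exposures (rows: participants, columns: exposures). The talk does not state how the effective number of variables was computed; Nyholt’s (2004) eigenvalue-based estimate is swapped in here as one common choice. The rank computation ignores ties (`scipy.stats.spearmanr` handles them properly), and the function names are hypothetical.

```python
import numpy as np


def spearman_matrix(X):
    """Spearman rho between every pair of columns (exposures) of X.

    Ranks via double argsort, so ties are NOT averaged (fine for continuous data).
    """
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)


def permuted_null_rho(X, n_perm=10, seed=0):
    """Pairwise rho after shuffling each column independently (a 'null rho')."""
    rng = np.random.default_rng(seed)
    upper = np.triu_indices(X.shape[1], k=1)
    null = []
    for _ in range(n_perm):
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        null.append(spearman_matrix(Xp)[upper])
    return np.concatenate(null)


def effective_number(R):
    """Nyholt (2004) effective number of variables from eigenvalues of R."""
    lam = np.linalg.eigvalsh(R)
    m = len(lam)
    return 1.0 + (m - 1) * (1.0 - np.var(lam, ddof=1) / m)
```

Observed correlations can then be compared against the spread of `permuted_null_rho`; real exposome data also has missingness, which is why the slide reports fewer correlations than the 575 × 574 / 2 possible pairs.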
  • 32. Co-E patterns are similar between females and males [figure panels: females, males] … however, Co-E within households is weaker! Chung et al., ES&T 2018 (Jake Chung)
  • 33. High-throughput data analytics to mitigate the analytical challenges of exposome-based research: which model do we use, how do we assess it, and how do we interpret it? (Agier, Portengen et al., EHP 2016)
  • 34. Unclear what model to use in the context of adjustment variables: estimating the Vibration of Effects (or Risk). Variable of interest: e.g., 1 SD of log(serum Vitamin D). Adjusting variable set (n = 13): SES [3rd tertile], education [>HS], race [white], body mass index [normal], total cholesterol, any heart disease, family heart disease, any hypertension, any diabetes, any cancer, current/past smoker [no smoking], drink 5/day, physical activity. All-subsets Cox regression: 2^13 + 1 = 8,193 models. Data source: NHANES 1999–2004; 417 variables of interest; outcome: time to death; N ≥ 1000 (≥ 100 deaths). Outputs: effect sizes and p-values; median p-value/HR for k adjusting variables (k = 0–13); percentile indicators (1st, 50th, 99th). [Figure panels A–E: hazard ratio vs. −log10(p-value) across models; Vitamin D (1 SD(log)): RHR = 1.14, RP = 4.68; Thyroxine (1 SD(log)): RHR = 1.15, RP = 2.90.] JCE, 2015; http://bit.ly/effectvibration
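The all-subsets idea behind the vibration of effects can be sketched as below. The original analysis fits Cox models for time to death; to stay dependency-free, this sketch swaps in ordinary least squares with a normal-approximation p-value. Mirroring, as we read the slide, RHR and RP: `effect_range` (99th minus 1st percentile of the coefficient) stands in for RHR, `rp` is the spread of −log10(p) between the 99th and 1st percentiles, and `janus` flags coefficients that change sign across models. All names are hypothetical, and the loop enumerates the 2^k covariate subsets, including the empty set.

```python
import itertools
import math

import numpy as np


def ols_effect(y, X, j=1):
    """OLS coefficient of column j and its two-sided p-value (normal approximation)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[j, j])
    p = math.erfc(abs(beta[j] / se) / math.sqrt(2.0))
    return beta[j], max(p, 1e-300)            # clamp so -log10(p) stays finite


def vibration_of_effects(y, exposure, adjusters):
    """Fit one model per subset of adjusting variables (2**k models)."""
    n, k = adjusters.shape
    betas, nlog_p = [], []
    for size in range(k + 1):
        for subset in itertools.combinations(range(k), size):
            X = np.column_stack([np.ones(n), exposure, adjusters[:, list(subset)]])
            b, p = ols_effect(y, X)
            betas.append(b)
            nlog_p.append(-math.log10(p))
    betas, nlog_p = np.array(betas), np.array(nlog_p)
    b1, b99 = np.percentile(betas, [1, 99])
    p1, p99 = np.percentile(nlog_p, [1, 99])
    return {
        "n_models": len(betas),
        "effect_range": b99 - b1,             # linear-model stand-in for RHR
        "rp": p99 - p1,                       # spread of -log10(p), like RP
        "janus": bool(b1 < 0 < b99),          # sign flips across models
    }
```

With k = 13 adjusting variables the loop fits 2^13 = 8,192 models per exposure.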
  • 35. The Vibration of Effects: Vitamin D and thyroxine are associated with attenuated (lower) risk of mortality (JCE, 2015). [Figure: hazard ratio vs. −log10(p-value) across models; Vitamin D (1 SD(log)): RHR = 1.14, RP = 4.68; Thyroxine (1 SD(log)): RHR = 1.15, RP = 2.90.]
  • 36. The Vibration of Effects: shifts in the effect-size distribution due to select adjustments (e.g., adjusting cadmium levels for current/past smoking). [Figure: hazard ratio vs. −log10(p-value) across models for Cadmium (1 SD(log)), highlighting models with adjustment = current_past_smoking; RHR = 1.29, RP = 8.29.] JCE, 2015
  • 38. The Vibration of Effects: beware of the Janus effect (both risk and protection?!). Risk and significance depend on the modeling scenario! [Figure regions: “risk”, “protection”, “significant”] JCE, 2015; http://bit.ly/effectvibration
  • 39. How do we interpret the complex models? What is a mixture: an integration of correlated E? Interaction? (Curr Epidemiol Rep 2017) P = a + b1*e1 + b2*e2 + b3*e3 vs. P = a + b1*e1 + b2*e2 + b3*e3 + b4*e1*e2 vs. P = a + b1*e1 + b2*e2 + b3*e3 + b4*e1*e2 + b5*e2*e3 + b6*e1*e3 + b7*e1*e2*e3. All terms significant? Only interaction terms? Additive? Simple interaction? All 2-way/n-way?
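The three model families on this slide differ only in which product terms enter the design matrix. A minimal sketch, assuming exposures in a numeric NumPy array: `max_order=1` gives the additive model and `max_order=3` gives the fully crossed model with all two- and three-way terms (the single-interaction middle model would add just the `e1*e2` column). The function names are hypothetical.

```python
import itertools

import numpy as np


def design_matrix(E, max_order):
    """Intercept, main effects, and all product terms of up to `max_order` exposures."""
    n, k = E.shape
    cols, names = [np.ones(n)], ["a"]
    for order in range(1, max_order + 1):
        for combo in itertools.combinations(range(k), order):
            cols.append(np.prod(E[:, list(combo)], axis=1))   # e.g. e1*e2
            names.append("*".join(f"e{i + 1}" for i in combo))
    return np.column_stack(cols), names


def fit(y, X):
    """Least-squares coefficients (a, b1, b2, ...) in the column order of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```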
  • 40. Our current exposomic research cohorts may be underpowered for complex modeling! Chung et al, Environment International, 2019
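One way to see the power problem is to count coefficients. The sketch below is illustrative arithmetic, not a power calculation: with all interactions up to a given order, the number of terms grows combinatorially with the number of exposures (575 is the exposure count from the correlation-globe slide). The function name is hypothetical.

```python
from math import comb


def n_parameters(k, max_order):
    """Coefficients (including the intercept) with all interactions up to max_order."""
    return sum(comb(k, j) for j in range(max_order + 1))


print(n_parameters(3, 3))     # 8: the fully crossed three-exposure model
print(n_parameters(575, 2))   # 165601: intercept + 575 main effects + 165,025 pairwise terms
```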
  • 41. … or overpowered for detection of simple associations!
  • 42. Emerging cohorts with large sample sizes will expose massive misspecification, confounding, and correlation among E! (Manrai et al., American Journal of Epidemiology 2019) [Figure: EWAS in telomere length (Patel et al., IJE 2017), adjusted for age, poverty, ethnicity, and sex, vs. GWAS in telomere length (Pearson and Manolio, JAMA 2008), before and after confounding adjustment.]
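A small simulation illustrates how a very large sample can make a confounded, null exposure look highly significant. Everything here is hypothetical: one confounder drives both exposure and outcome, the exposure has no direct effect, and the model is OLS with a normal-approximation p-value.

```python
import math

import numpy as np


def slope_and_p(y, X):
    """OLS coefficient for column 1 of X and its two-sided p-value (normal approximation)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], math.erfc(abs(beta[1] / se) / math.sqrt(2.0))


rng = np.random.default_rng(42)
n = 100_000
confounder = rng.normal(size=n)                   # hypothetical confounder (e.g., age)
exposure = 0.3 * confounder + rng.normal(size=n)
outcome = 0.3 * confounder + rng.normal(size=n)   # exposure has NO direct effect

ones = np.ones(n)
crude = slope_and_p(outcome, np.column_stack([ones, exposure]))
adjusted = slope_and_p(outcome, np.column_stack([ones, exposure, confounder]))
print(crude, adjusted)
```

The crude slope comes out near 0.08 with a p-value far below any EWAS threshold, while adjusting for the confounder brings it back to about zero.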
  • 43. Aside from residual confounding and misspecification, modest association sizes and R2 (if the effects are real) may be diluted by the complex dynamics of individual exposure over time (Athersuch, Bioanalysis 2012). Exposure profiles over time: cumulative (cadmium, PCB); constant, but excreted (phenols, vitamins); intervention (drugs); seasonal (allergen); in utero. Not shown: diurnal.
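Dilution by time-varying exposure can be illustrated with the classical regression-dilution model, swapped in here as an assumption rather than taken from the talk: the outcome depends on a person’s long-term average exposure, but only a single snapshot is observed, which adds independent within-person variation. With equal between- and within-person variance, the expected slope is halved and R2 drops from 0.2 to 0.1. All values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
true_slope = 0.5

usual = rng.normal(0.0, 1.0, n)                  # long-term average exposure (between-person var = 1)
snapshot = usual + rng.normal(0.0, 1.0, n)       # one measurement: adds within-person var = 1
outcome = true_slope * usual + rng.normal(0.0, 1.0, n)


def slope(x, y):
    """Simple-regression slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


reliability = 1.0 / (1.0 + 1.0)                  # var_between / (var_between + var_within)
print(reliability)
print(slope(usual, outcome))                     # close to 0.5
print(slope(snapshot, outcome))                  # close to 0.5 * reliability = 0.25
print(np.corrcoef(usual, outcome)[0, 1] ** 2,    # R2 close to 0.2
      np.corrcoef(snapshot, outcome)[0, 1] ** 2) # R2 close to 0.1
```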
  • 44. In conclusion: in complex traits and phenotypes, we are missing much of the phenotypic variation in the population (H2 ~ 0.3, C2 ~ 0.1).
  • 45. Challenges in EWAS for identification of specific E factors, E mixtures, and variance explained (E2) of the exposome: power (both low and large sample sizes); model misspecification and mis(?)-interpretation; the dense correlational web of the exposome; time dependence of exposome-phenome associations; scalable measurement of the exposome. (ARPH 2016; JAMA 2014; JECH 2014)
  • 46. Now hiring! Data scientist for translational exposome informatics! Develop data science tools for dissecting relationships between the phenome, exposome, and genome to optimize medical decision making in the era of massive data. [Figure: your phenome (P, e.g., diabetes, telomeres) = exposome (E: thousands of environmental exposures, e.g., air pollution, nutrients, influenza) + genome (G: millions of genetic variants, e.g., FTO, TCF7L2), across individuals, including (you).] @chiragjp chirag@hms.harvard.edu
  • 47. Acknowledgements. Harvard DBMI: Susanne Churchill, Nathan Palmer, Sophia Mamousette, Sunny Alvear. RagGroup: Jake Chung, Kajal Claypool, Chirag Lakhani, Danielle Rasooly, Alan LeGoallec, Braden Tierney, Yixuan He. Mentioned collaborators: Arjun Manrai, John Ioannidis, Peter Visscher. NIH Common Fund, Big Data to Knowledge. Chirag J Patel, chirag@hms.harvard.edu, @chiragjp, www.chiragjpgroup.org