
Studying the elusive in larger scale

Chirag's talk at BU on 11/30/15

Published in: Health & Medicine

Studying the elusive in larger scale

  1. Studying the elusive environment in larger scale with the exposome and EWAS. Chirag J Patel, Boston University, 11/30/15. chirag@hms.harvard.edu, @chiragjp, www.chiragjpgroup.org
  2. P = G + E. Phenotype: Type 2 Diabetes, Cancer, Alzheimer’s, Gene expression. Genome: Variants. Environment: Infectious agents, Nutrients, Pollutants, Drugs.
  3. We are great at G investigation! Over 2000 Genome-wide Association Studies (GWAS): https://www.ebi.ac.uk/gwas/
  4. Nothing comparable to elucidate E influence! We lack high-throughput methods and data to discover new E in P. E: ???
  5. A similar paradigm for discovery should exist for E! Why?
  6. $\sigma^2_P = \sigma^2_G + \sigma^2_E$
  7. $H^2 = \sigma^2_G / \sigma^2_P$. Heritability ($H^2$) is the proportion of phenotypic variance attributed to genetic variance in a population. It is an indicator of the proportion of phenotypic differences attributed to G.
  8. [Figure: bar chart of heritability, Var(G)/Var(Phenotype), on a 0 to 100 scale, for traits listed from eye color, hair curliness, type-1 diabetes, and height through lung cancer, leukemia, and stomach cancer. Source: SNPedia.com] G estimates for complex traits are low and variable: massive opportunity for high-throughput E discovery. Type 2 Diabetes (25%), Heart Disease (25-30%), Autism (50%???).
  9. [Figure: the same heritability bar chart as slide 8. Source: SNPedia.com] G estimates for complex traits are low and variable: massive opportunity for high-throughput E discovery. $\sigma^2_E$: Exposome!
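The arithmetic behind slides 6 to 9 can be written out as a short worked equation. This is only a sketch under the purely additive model of slide 6, which (as an assumption of this sketch) ignores gene-environment interaction, gene-environment covariance, and measurement error; the 25% value is the type 2 diabetes figure quoted on slide 8.

```latex
\begin{align*}
\sigma^2_P &= \sigma^2_G + \sigma^2_E \\
H^2 &= \frac{\sigma^2_G}{\sigma^2_P} \\
1 - H^2 &= \frac{\sigma^2_P - \sigma^2_G}{\sigma^2_P} = \frac{\sigma^2_E}{\sigma^2_P} \\
H^2 = 0.25 \;&\Rightarrow\; \frac{\sigma^2_E}{\sigma^2_P} = 0.75
\end{align*}
```

Under these assumptions, a trait with low heritability leaves most of its variance to be explained by E, which is the opportunity the slide points to.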
  10. Nature Genetics, 2015: 17,804 traits of the phenome; 2,748 publications; 14,558,903 twin pairs. Average H2 (genome): 0.49. Exposome may play an equal role.
      Meta-analysis of the heritability of human traits based on fifty years of twin studies. Tinca J C Polderman (1,10), Beben Benyamin (2,10), Christiaan A de Leeuw (1,3), Patrick F Sullivan (4–6), Arjen van Bochoven (7), Peter M Visscher (2,8,11) & Danielle Posthuma (1,9,11). © 2015 Nature America, Inc. All rights reserved.
      Affiliations: (1) Department of Complex Trait Genetics, VU University, Center for Neurogenomics and Cognitive Research, Amsterdam, the Netherlands. (2) Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia. (3) Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, the Netherlands. (4) Center for Psychiatric Genomics, Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA. (5) Department of Psychiatry, University of North Carolina, Chapel Hill, North Carolina, USA. (6) Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden. (7) Faculty of Sciences, VU University,
      Despite a century of research on complex traits in humans, the relative importance and specific nature of the influences of genes and environment on human traits remain controversial. We report a meta-analysis of twin correlations and reported variance components for 17,804 traits from 2,748 publications including 14,558,903 partly dependent twin pairs, virtually all published twin studies of complex traits. Estimates of heritability cluster strongly within functional domains, and across all traits the reported heritability is 49%. For a majority (69%) of traits, the observed twin correlations are consistent with a simple and parsimonious model where twin resemblance is solely due to additive genetic variation. The data are inconsistent with substantial influences from shared environment or non-additive genetic variation. This study provides the most comprehensive analysis of the causes of individual differences in human traits thus far and will guide future gene-mapping efforts. All the results can be visualized using the MaTCH webtool.
      Insight into the nature of observed variation in human traits is important in medicine, psychology, social sciences and evolutionary biology. It has gained new relevance with both the ability to map genes for human traits and the availability of large, collaborative data sets to do so on an extensive and comprehensive scale. Individual differences in human traits have been studied for more than a century, yet the causes of variation in human traits remain uncertain and controversial. Specifically, the partitioning of observed variability into underlying genetic and environmental sources and the relative importance of additive and non-additive genetic variation are continually debated [1–5]. Recent results from large-scale genome-wide association studies (GWAS) show that many genetic variants contribute to the variation in complex traits and that effect sizes are typically small [6,7]. However, the sum of the variance explained by the detected variants is much smaller than the reported heritability of the trait [4,6–10]. This ‘missing heritability’ has led some investigators to conclude that non-additive variation must be important [4,11]. Although the presence of gene-gene interaction has been demonstrated empirically [5,12–17], little is known about its relative contribution to observed variation [18].
      In this study, our aim is twofold. First, we analyze empirical estimates of the relative contributions of genes and environment for virtually all human traits investigated in the past 50 years. Second, we assess empirical evidence for the presence and relative importance of non-additive genetic influences on all human traits studied. We rely on classical twin studies, as the twin design has been used widely to disentangle the relative contributions of genes and environment, across a variety of human traits. The classical twin design is based on contrasting the trait resemblance of monozygotic and dizygotic twin pairs. Monozygotic twins are genetically identical, and dizygotic twins are genetically full siblings. We show that, for a majority of traits (69%), the observed statistics are consistent with a simple and parsimonious model where the observed variation is solely due to additive genetic variation. The data are inconsistent with a substantial influence from shared environment or non-additive genetic variation. We also show that estimates of heritability cluster strongly within functional domains, and across all traits the reported heritability is 49%. Our results are based on a meta-analysis of twin correlations and reported variance components for 17,804 traits from 2,748 publications including 14,558,903 partly dependent twin pairs, virtually all twin studies of complex traits published between 1958 and 2012. This study provides the most comprehensive analysis of the causes of individual differences in human traits thus far and will guide future gene-mapping efforts.
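The contrast between monozygotic and dizygotic twin resemblance described above can be illustrated with Falconer's textbook approximation. This is a minimal sketch, not the estimator used in the meta-analysis; the function name and the twin correlations are hypothetical values chosen for illustration. When the monozygotic correlation is twice the dizygotic correlation, the shared-environment term vanishes, which is the pattern the purely additive model predicts.

```python
def falconer(r_mz: float, r_dz: float) -> dict:
    """Falconer's approximations from twin-pair trait correlations.

    r_mz: trait correlation in monozygotic (identical) twin pairs
    r_dz: trait correlation in dizygotic (fraternal) twin pairs
    """
    h2 = 2.0 * (r_mz - r_dz)  # additive genetic share of variance
    c2 = 2.0 * r_dz - r_mz    # shared (family) environment share
    e2 = 1.0 - r_mz           # non-shared environment plus measurement error
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations, not taken from the paper.
estimates = falconer(r_mz=0.50, r_dz=0.25)
print(estimates)
```

With these illustrative inputs the three shares sum to 1, and the non-genetic remainder is the part of the variance an exposome-wide search would target.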
  11. Explaining the other 50%: a new data-driven paradigm for robust discovery of E via EWAS and the exposome. What to measure? How to measure? How to analyze in relation to health?
      “A more comprehensive view of environmental exposure is needed ... to discover major causes of diseases...”
      Wild, 2005; Rappaport and Smith, 2010, 2011; Buck-Louis and Sundaram, 2012; Miller and Jones, 2014; Patel CJ and Ioannidis JPA, 2014.
      [Figure, from a PERSPECTIVES article (www.sciencemag.org, October 21, 2010): the exposome. External environment: radiation, diet, pollution, infections, drugs, life-style, stress. Internal chemical environment: xenobiotics, inflammation, preexisting disease, lipid peroxidation, oxidative stress, gut flora. Chemical classes: reactive electrophiles, metals, endocrine disrupters, immune modulators, receptor-binding proteins.] Characterizing the exposome. The exposome represents the combined exposures from all sources that reach the internal chemical environment. Toxicologically important classes of exposome chemicals are shown. Signatures and biomarkers can detect these agents in blood or serum.
      Article excerpt: ... critical entity for disease etiology (7). Recent discussion has focused on whether and how to implement this vision (8). Although fully characterizing human exposomes is daunting, strategies can be developed for getting “snapshots” of critical portions of a person’s exposome during different stages of life. At one extreme is a “bottom-up” strategy in which all chemicals in each external source of a subject’s exposome are measured at each time point. Although this approach would have the advantage of relating important exposures to the air, water, or diet, it would require enormous effort and would miss essential components of the internal chemical environment due to such factors as gender, obesity, inflammation, and stress. By contrast, a “top-down” strategy would measure all chemicals (or products of their downstream processing or effects, so-called read-outs or signatures) in a subject’s blood. This would require only a single blood specimen at each time point and would relate directly ... ruptors and can be measured through serum ... some (telomere) length in peripheral blood mononuclear cells responded to chronic psychological stress, possibly mediated by the production of reactive oxygen species (15).
      Characterizing the exposome represents a technological challenge like that of the human genome project, which began when DNA sequencing was in its infancy (16). Analytical systems are needed to process small amounts of blood from thousands of subjects. Assays should be multiplexed for measuring many chemicals in each class of interest. Tandem mass spectrometry, gene and protein chips, and microfluidic systems offer the means to do this. Platforms for high-throughput assays should lead to economies of scale, again like those experienced by the human genome project. And because exposome technologies would provide feedback for therapeutic interventions and personalized medicine, they should motivate the development of commercial devices for screening important environmental exposures in blood samples. With successful characterization of both
  12. Connecting Environmental Exposure with Disease: Missing the “System” of Exposures? [Diagram: 2x2 table of exposed (E+) and unexposed (E-) versus diseased and non-diseased, marked with a "?".] We are exposed to many things, but studies do not assess the multiplicity. Fragmented literature of associations. Challenge to discover E associated with disease.
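The one-exposure-at-a-time analysis behind the 2x2 table on slide 12 can be sketched as follows. This is a minimal illustration, assuming an unadjusted odds ratio with a Woolf (log-scale) 95% confidence interval; the function name and the counts are hypothetical and do not come from the talk.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int):
    """Odds ratio and Woolf 95% CI from a 2x2 exposure-by-disease table.

    a: exposed and diseased        b: exposed and non-diseased
    c: unexposed and diseased      d: unexposed and non-diseased
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Hypothetical counts for a single exposure.
or_estimate, ci_lower, ci_upper = odds_ratio(a=30, b=70, c=15, d=85)
print(f"OR = {or_estimate:.2f} (95% CI {ci_lower:.2f} to {ci_upper:.2f})")
```

Repeating a test like this independently for each exposure, in separate studies, is what produces the fragmented literature the slide describes; an EWAS instead runs the whole panel in one analysis and corrects for the number of tests.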
  13. Examples of exposome-driven discovery machinery
  14. Gold standard for breadth of human exposure information: National Health and Nutrition Examination Survey (1). Since the 1960s; now biennial, from 1999 onwards; 10,000 participants per survey. >250 exposures (serum + urine); GWAS chip; >85 quantitative clinical traits (e.g., serum glucose, lipids, BMI); death index linkage (cause of death).
      (1) http://www.cdc.gov/nchs/nhanes.htm
      NHANES brochure excerpt: The sample for the survey is selected to represent the U.S. population of all ages. To produce reliable statistics, NHANES over-samples persons 60 and older, African Americans, and Hispanics. Since the United States has experienced dramatic growth in the number of older people during this century, the aging population has major implications for health care needs, public policy, and research priorities. NCHS is working with public health agencies to increase the knowledge of the health status of older Americans. NHANES has a primary role in this endeavor. All participants visit the physician. Dietary interviews and body measurements are included for everyone. All but the very young have a blood sample taken and will have a dental screening. Depending upon the age of the participant, the rest of the examination includes tests and procedures to assess the various aspects of health listed above. In general, the older the individual, the more extensive the examination.
      Survey Operations: Health interviews are conducted in respondents’ homes. Health measurements are performed in specially-designed and equipped mobile centers, which travel to locations throughout the country. The study team consists of a physician, medical and health technicians, as well as dietary and health interviewers. Many of the study staff are bilingual (English/Spanish). An advanced computer system using high-end servers, desktop PCs, and wide-area networking collects and processes all of the NHANES data, nearly eliminating the need for paper forms and manual coding operations. This system allows interviewers to use notebook computers with electronic pens. The staff at the mobile center can automatically transmit data into databases through such devices as digital scales and stadiometers. Touch-sensitive computer screens let respondents enter their own responses to certain sensitive questions in complete privacy. Survey information is available to NCHS staff within 24 hours of collection, which enhances the capability of collecting quality data and increases the speed with which results are released to the public. In each location, local health and government officials are notified of the upcoming survey. Households in the study area receive a letter from the NCHS Director to introduce the survey. Local media may feature stories about the survey. NHANES is designed to facilitate and encourage participation. Transportation is provided to and from the mobile center if necessary. Participants receive compensation and a report of medical findings is given to each participant. All information collected in the survey is kept strictly confidential. Privacy is protected by public laws.
      Uses of the Data: Information from NHANES is made available through an extensive series of publications and articles in scientific and technical journals. For data users and researchers throughout the world, survey data are available on the internet and on easy-to-use CD-ROMs. Research organizations, universities, health care providers, and educators benefit from survey information. Primary data users are federal agencies that collaborated in the design and development of the survey. The National Institutes of Health, the Food and Drug Administration, and CDC are among the agencies that rely upon NHANES to provide data essential for the implementation and evaluation of program activities. The U.S. Department of Agriculture and NCHS cooperate in planning and reporting dietary and nutrition information from the survey. NHANES’ partnership with the U.S. Environmental Protection Agency allows continued study of the many important environmental influences on our health.
      Health topics listed in the excerpt: physical fitness and physical functioning; reproductive history and sexual behavior; respiratory disease (asthma, chronic bronchitis, emphysema); sexually transmitted diseases; vision.
  15. Gold standard for breadth of human exposure information: National Health and Nutrition Examination Survey. Nutrients and vitamins: vitamin D, carotenes. Infectious agents: hepatitis, HIV, Staph. aureus. Plastics and consumables: phthalates, bisphenol A. Physical activity: steps. Pesticides and pollutants: atrazine; cadmium; hydrocarbons. Drugs: statins; aspirin.
  16. What E factors are associated with type 2 diabetes?
  17. EWAS in Type 2 Diabetes: Searching >250 exposures for associations with FBG > 125 mg/dL. [Figure: −log10(p-value), axis 0 to 2, plotted by exposure class: acrylamide, allergen test, bacterial infection, cotinine, dialkyl, dioxins, furans (dibenzofuran), heavy metals, hydrocarbons, latex, nutrients (carotenoid, minerals, vitamins A, B, C, D, E), PCBs, perchlorate, pesticides (atrazine, chlorophenol, organochlorine, organophosphate, pyrethroid), phenols, phthalates, phytoestrogens, polybrominated ethers, polyfluorochemicals, viral infection, volatile compounds.] Highlighted at FDR < 10%: Heptachlor Epoxide (OR = 3.2, 1.8); PCB170 (OR = 4.5, 2.3); γ-tocopherol (vitamin E) (OR = 1.8, 1.6); β-carotene (OR = 0.6, 0.6). Adjusted for age, sex, race, SES, BMI. PLOS ONE, 2010.
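The FDR threshold on this slide implies a multiple-testing correction across all exposures tested. A minimal sketch follows, assuming the Benjamini-Hochberg step-up procedure (the slide does not say which FDR method was used); the exposure names and p-values are hypothetical placeholders, not results from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical per-exposure p-values from an association scan.
scan = {
    "exposure_1": 0.001,
    "exposure_2": 0.008,
    "exposure_3": 0.039,
    "exposure_4": 0.041,
    "exposure_5": 0.27,
    "exposure_6": 0.60,
}
names = list(scan)
qvals = benjamini_hochberg([scan[n] for n in names])
hits = [n for n, q in zip(names, qvals) if q < 0.10]
print(hits)
```

In a real scan, each p-value would come from a regression of the phenotype on one exposure with the adjustment covariates the slide lists, and the correction would run over the full panel of exposures.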
  18. What E factors are associated with mortality and biological aging?
  19. EWAS to search for exposures and behaviors associated with all-cause mortality. NHANES 1999-2004 with National Death Index linked mortality; 246 behaviors and exposures (serum/urine/self-report). NHANES 1999-2001: N=330 to 6008 (26 to 655 deaths); ~5.5 years of follow-up; Cox proportional hazards on baseline exposure and time to death; false discovery rate < 5%. NHANES 2003-2004: N=177 to 3258 (20 to 202 deaths); ~2.8 years of follow-up; p < 0.05. IJE, 2013.
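One way to read slide 19 is as a two-stage design: screen exposures in the earlier survey waves at FDR < 5%, then keep only those that also reach p < 0.05 in the later wave. The sketch below assumes that reading; the function name, exposure names, and numbers are hypothetical, and the q-values are taken as already computed.

```python
def replicated_exposures(discovery_q, replication_p, q_cut=0.05, p_cut=0.05):
    """Exposures passing the FDR cut in discovery and the p cut in replication.

    discovery_q:   exposure name -> FDR-adjusted q-value in the discovery cohort
    replication_p: exposure name -> raw p-value in the replication cohort
    """
    return sorted(
        name
        for name, q in discovery_q.items()
        # An exposure missing from the replication cohort cannot replicate.
        if q < q_cut and replication_p.get(name, 1.0) < p_cut
    )


# Hypothetical inputs, not values from the study.
discovery_q = {"exposure_a": 0.01, "exposure_b": 0.03, "exposure_c": 0.20}
replication_p = {"exposure_a": 0.02, "exposure_b": 0.30, "exposure_c": 0.001}
validated = replicated_exposures(discovery_q, replication_p)
print(validated)
```

A stricter variant, also an assumption here, would additionally require the hazard ratio to point in the same direction in both cohorts.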
  20. All-cause mortality: 253 exposure/behavior associations in survival. [Figure: volcano plot of adjusted hazard ratio (x-axis, 0.4 to 2.8) versus −log10(p-value) (y-axis, 0 to 8). Legend: FDR < 5%; sociodemographics; replicated factor.] Numbered exposures and behaviors: 1 Physical Activity; 2 Does anyone smoke in home?; 3 Cadmium; 4 Cadmium, urine; 5 Past smoker; 6 Current smoker; 7 trans-lycopene (11). Numbered sociodemographic factors (age, sex, income, education, race/ethnicity, occupation; in red): 1 age (10 year increment); 2 SES_1; 3 male; 4 SES_0; 5 black; 6 SES_2; 7 SES_3; 8 education_hs; 9 other_eth; 10 mexican; 11 occupation_blue_semi; 12 education_less_hs; 13 occupation_never; 14 occupation_blue_high; 15 occupation_white_semi; 16 other_hispanic (69). IJE, 2013.
  21. EWAS (re)-identifies factors associated with all-cause mortality: volcano plot of 200 associations. [Figure: the same volcano plot as slide 20, with the numbered points annotated.] Sociodemographic factors (age, sex, income, education, race/ethnicity, occupation; in red): age (10 years); income (quintile 2); income (quintile 1); male; black; income (quintile 3). Exposures and behaviors: anyone smoke in home?; serum and urine cadmium [1 SD]; past smoker?; current smoker?; serum lycopene [1 SD]; physical activity [low, moderate, high activity]*. *Derived from METs per activity and categorized by Health.gov guidelines. R² ~ 2%.
  22. 452 associations in Telomere Length: Polychlorinated biphenyls associated with longer telomeres?! [Figure: volcano plot of effect size (x-axis, −0.2 to 0.2, from shorter telomeres to longer telomeres) versus −log10(p-value) (y-axis, 0 to 4), with FDR < 5% marked.] Labeled points: PCBs, Trunk Fat, Alk. Phos, CRP, Cadmium, Cadmium (urine), cigs per day, retinyl stearate, VO2 Max, pulse rate. R² ~ 1%. Adjusted by age, age², race, poverty, education, occupation. Median N=3000 (range 300-7000). Manrai, Kohane (in review).
  23. 23. Interindividual variation in mean leukocyte telomere length (LTL) is associated with cancer and several age-associated diseases. We report here a genome-wide meta-analysis of 37,684 individuals with replication of selected variants in an additional 10,739 individuals. We identified seven loci, including five new loci, associated with mean LTL (P < 5 × 10−8). Five of the loci contain candidate genes (TERC, TERT, NAF1, OBFC1 and RTEL1) that are known to be involved in telomere biology. Lead SNPs at two loci (TERC and TERT) associate with several cancers and other diseases, including idiopathic pulmonary fibrosis. Moreover, a genetic risk score analysis combining lead variants at all 7 loci in 22,233 coronary artery disease cases and 64,762 controls showed an association of the alleles associated with shorter LTL with increased risk of coronary artery disease (21% (95% confidence interval, 5–35%) per standard deviation in LTL, P = 0.014). Our findings support a causal role of telomere-length variation in some age-related diseases. Telomeres are the protein-bound DNA repeat structures at the ends of chromosomes that are important in maintaining genomic sta- bility1. They are critical in regulating cellular replicative capacity2. During somatic-cell replication, telomere length progressively short- ens because of the inability of DNA polymerase to fully replicate the 3 end of the DNA strand. Once a critically short telomere length is reached, the cell is triggered to enter replicative senescence, which subsequently leads to cell death1,2. Conversely, in germ cells and other stem cells that require renewal, telomere length is maintained by the enzyme telomerase, a ribonucleoprotein that contains the RNA template TERC and a reverse transcriptase TERT3. 
Both longer and shorter telomere length are associated with increased risk of certain cancers4,5, and reactivation of telomerase, which bypasses cellular senescence, is a common requirement for oncogenic pro- gression6. Therefore, telomere length is an important determinant of telomere function. Mean telomere length exhibits considerable interindividual vari- ability and has high heritability with estimates varying between 44% and 80% (refs. 7–9). Most of these studies have measured mean telomere length in blood leukocytes. However, there is evidence that, within an individual, mean LTL and telomere length in other tissues are highly correlated10,11. In cross-sectional population studies, mean LTL is longer in women than in men and is inversely associated with age (declining by between 20–40 bp per year)9,12–14. Shorter age- adjusted and sex-adjusted mean LTL has been found to be associated with risk of several age-related diseases, including coronary artery disease (CAD)12–15, and has been advanced as a marker of biologi- cal aging16. However, the extent to which the association of shorter LTL with age-related disorders is causal in nature remains unclear. Identifying genetic variants that affect telomere length and testing their association with disease could clarify any causal role. So far, common variants at two loci on chromosome 3q26 (TERC)17–19 and chromosome 10q24.33 (OBFC1)18, which explain <1% of the variance in telomere length, have shown a replicated asso- ciation with mean LTL in genome-wide association studies (GWAS). To identify other genetic determinants of LTL, we conducted a large- scale GWAS meta-analysis of 37,684 individuals from 15 cohorts, followed by replication of selected variants in an additional 10,739 individuals from 6 more cohorts. Details of the studies included in the GWAS meta-analysis and in the replication phase are provided in the Supplementary Note, and key characteristics are summarized in Supplementary Table 1. 
All subjects were of European descent, the majority of the cohorts were population based and three of the replication cohorts were addi- tional subjects from studies used in the meta-analysis. The genotyp- ing platforms and the imputation method (to HapMap 2 build 36) used by each GWAS cohort are summarized in Supplementary Table 2. We measured mean LTL in each cohort using a quantitative PCR method and expressed it as a ratio of telomere repeat length to copy number of a single-copy gene (T/S ratio; Online Methods and Supplementary Note). Then we analyzed LTL, adjusted for age, sex and any study-specific covariates, for association with genotype using linear regression in each study and adjusted the results for genomic inflation control fac- tors (Supplementary Table 2). We performed an inverse variance– weighted meta-analysis for 2,362,330 SNPs (Online Methods) with correction for the overall genomic inflation control factor ( = 1.007; quantile-quantile plot for the meta-analysis is shown in Supplementary Fig. 1). SNPs in seven loci exhibited association with mean LTL at genome- wide significance (P < 5 × 10−8; Figs. 1, 2, Table 1 and Supplementary Fig. 2). The association of the lead SNP on chromosome 2p16.2 (rs11125529) was very close to the threshold for genome-wide sig- nificance, and the lead SNP in a locus on 16q23.3 (rs2967374) fell just short of this threshold (Table 1). We therefore sought replication of results for these two loci. We confirmed the association of rs11125529 Identification of seven loci affecting mean telomere length and their association with disease A full list of authors and affiliations appears at the end of the paper. Received 26 June 2012; accepted 19 December 2012; published online 27 March 2013; doi:10.1038/ng.2528 Nature Genetics, 2013 Interindividual variation in mean leukocyte telomere length (LTL) is associated with cancer and several age-associated diseases. 
We report here a genome-wide meta-analysis of 37,684 individuals with replication of selected variants in an additional 10,739 individuals. We identified seven loci, including five new loci, associated with mean LTL (P < 5 × 10−8). Five of the loci contain candidate genes (TERC, TERT, NAF1, OBFC1 and RTEL1) that are known to be involved in telomere biology. Lead SNPs at two loci (TERC and TERT) associate with several cancers and other diseases, including idiopathic pulmonary fibrosis. Moreover, a genetic risk score analysis combining lead variants at all 7 loci in 22,233 coronary artery disease cases and 64,762 controls showed an association of the alleles associated with shorter LTL with increased risk of coronary artery disease (21% (95% confidence interval, 5–35%) per standard deviation in LTL, P = 0.014). Our findings support a causal role of telomere-length variation in some age-related diseases. Telomeres are the protein-bound DNA repeat structures at the ends of chromosomes that are important in maintaining genomic sta- bility1. They are critical in regulating cellular replicative capacity2. During somatic-cell replication, telomere length progressively short- ens because of the inability of DNA polymerase to fully replicate the 3 end of the DNA strand. Once a critically short telomere length is reached, the cell is triggered to enter replicative senescence, which subsequently leads to cell death1,2. Conversely, in germ cells and other stem cells that require renewal, telomere length is maintained age (declining by between 20–40 bp per year)9,12–14. Shorter age- adjusted and sex-adjusted mean LTL has been found to be associated with risk of several age-related diseases, including coronary artery disease (CAD)12–15, and has been advanced as a marker of biologi- cal aging16. However, the extent to which the association of shorter LTL with age-related disorders is causal in nature remains unclear. 
Identifying genetic variants that affect telomere length and testing their association with disease could clarify any causal role. So far, common variants at two loci on chromosome 3q26 (TERC)17–19 and chromosome 10q24.33 (OBFC1)18, which explain <1% of the variance in telomere length, have shown a replicated asso- ciation with mean LTL in genome-wide association studies (GWAS). To identify other genetic determinants of LTL, we conducted a large- scale GWAS meta-analysis of 37,684 individuals from 15 cohorts, followed by replication of selected variants in an additional 10,739 individuals from 6 more cohorts. Details of the studies included in the GWAS meta-analysis and in the replication phase are provided in the Supplementary Note, and key characteristics are summarized in Supplementary Table 1. All subjects were of European descent, the majority of the cohorts were population based and three of the replication cohorts were addi- tional subjects from studies used in the meta-analysis. The genotyp- ing platforms and the imputation method (to HapMap 2 build 36) used by each GWAS cohort are summarized in Supplementary Table 2. We measured mean LTL in each cohort using a quantitative PCR method and expressed it as a ratio of telomere repeat length to copy number of a single-copy gene (T/S ratio; Online Methods and Supplementary Note). Then we analyzed LTL, adjusted for age, sex and any study-specific Identification of seven loci affecting mean telomere length and their association with disease Does PCB exposure influence expression of 24 (29) genes implicated in telomere length GWAS? L E T T E R S but not of rs2967374 (Table 1). The com- bined P value from the GWAS meta-analyses and replication cohorts for rs11125529 was 7.50 × 10−10. There was no evidence of sex- dependent effects or additional independent signals at any of these loci (Online Methods and Supplementary Tables 3, 4). 
Details of key genes in each locus associated with LTL and their location in relation to the lead SNP are provided in Supplementary Table 5. The most significantly associated locus we found was the previously reported TERC locus on 3q26 (Figs. 1, 2 and Table 1)17. Four additional loci, 5p15.33 (TERT), 4q32.2 (NAF1, nuclear assembly factor 1), 10q24.33 (OBFC1, oligonucleotide/oligosaccharide-binding fold containing 1)18 and 20q13.3 (RTEL1, regulator of telomere elon- gation helicase 1), harbor genes that encode proteins with known function in telomere biology3,20–23. NAF1 protein is required for assembly of H/ACA box small nucleolar RNA, the RNA family to which TERC belongs20. Thus, the three most significantly associated loci (3q26, 5p15.33 and 4q32.2) harbor genes involved in the forma- tion and activity of telomerase. We therefore examined whether the lead SNPs at these loci as well as the other identified loci associate with leukocyte telomerase activity in available data from 208 individuals. We did not find an association of any of the variants with telomerase activity (Supplementary Table 6). However, the study only had 80% power ( of 0.05) to detect a SNP effect that explained 3.7% of the variance in telomerase activity, and therefore smaller effects are likely to have been missed in this exploratory analysis. We also found a significant association (P = 6.90 × 10−11) at the previously reported OBFC1 locus18. OBFC1 is a component of the telomere-binding CST complex that also contains CTC1 and TEN1 (ref. 21). In yeast, this complex binds to the single-stranded gua- nine overhang at the telomere and functions to promote telomere replication. RTEL1 is a DNA helicase that has been shown to have important roles in setting telomere length, telomere maintenance and DNA repair in mice22,23. However, it should be noted that the Figure 1 Signal-intensity plot of genotype association with telomere length. 
Data are displayed as –log10(P values) against chromosomal location for the 2,362,330 SNPs that were tested. The dotted line represents a genome-wide level of significance at P = 5 × 10−8. Loci that showed an association at this level are plotted in red. [Plot residue: –log10(P value) by chromosome (1 to 22), with labeled loci ACYP2, NAF1, TERT, OBFC1, ZNF208, RTEL1 and TERC; regional plots a, b and c for rs10936599, rs2736100 and rs7675998, showing r2 and recombination.]
  24. 24. Samples exposed to PCBs associated with difference in genes implicated in telomere length GWAS? Expression differences for 24 GWAS-implicated genes. Queried the Gene Expression Omnibus for PCBs: Affymetrix human arrays (GPL570); 7 gene expression experiments on humans; 52 exposed, 14 unexposed.
Paper shown: Differential gene expression and a functional analysis of PCB-exposed children: Understanding disease and disorder development. Sisir K. Dutta (a), Partha S. Mitra (a), Somiranjan Ghosh (a), Shizhu Zang (a), Dean Sonneborn (b), Irva Hertz-Picciotto (b), Tomas Trnovec (c), Lubica Palkovicova (c), Eva Sovcikova (c), Svetlana Ghimbovschi (d), Eric P. Hoffman (d). (a) Molecular Genetics Laboratory, Howard University, Washington, DC, USA; (b) Department of Public Health Sciences, University of California Davis, Davis, CA, USA; (c) Slovak Medical University, Bratislava, Slovak Republic; (d) Center for Genetic Medicine, Children's National Medical Center, Washington, DC, USA. Environment International 40 (2012) 143–154. Received 20 December 2010; accepted 10 July 2011.
Abstract (excerpt): The goal of the present study is to understand the probable molecular mechanism of toxicities and the associated pathways related to observed pathophysiology in high PCB-exposed populations. We have performed a microarray-based differential gene expression analysis of children (mean age 46.1 months) of
  25. 25. Samples exposed to PCBs associated with difference in genes implicated in telomere length GWAS? [Volcano plot: log(difference) versus −log10(pvalue); labeled probes: 1555203_s_at (SLC44A4), 1555203_s_at (MYNN), 224206_x_at (MYNN).]
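The comparison behind a plot like this can be sketched in a few lines. The slides do not say which statistical test produced the p-values, so the permutation test below is a stand-in chosen only because it needs no external library; the function name `volcano_point`, the use of log2, and the intensity values are all hypothetical. The sketch also ignores that the samples came from 7 separate experiments.

```python
import math
import random
from statistics import mean


def volcano_point(exposed, unexposed, n_perm=2000, seed=0):
    """Return (difference in mean log2 expression, -log10 permutation p).

    Assumption: a two-sided permutation test on the difference in means;
    the slides do not state the test that was actually used.
    """
    log_exp = [math.log2(v) for v in exposed]
    log_unexp = [math.log2(v) for v in unexposed]
    observed = mean(log_exp) - mean(log_unexp)

    pooled = log_exp + log_unexp
    n_exp = len(log_exp)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Shuffle group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_exp]) - mean(pooled[n_exp:])
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction keeps the p-value strictly above zero.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, -math.log10(p_value)


# Hypothetical probe intensities, not data from the talk.
exposed = [200, 220, 210, 230, 240, 205]
unexposed = [100, 110, 105, 95, 102, 98]
x, y = volcano_point(exposed, unexposed)
print(f"log2 difference = {x:.2f}, -log10(p) = {y:.2f}")
```

Each probe contributes one (x, y) point; repeating this for the probes of the 24 GWAS-implicated genes would give the coordinates of the plot.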
  26. 26. Interdependencies of the exposome: Correlation globes paint a complex view of exposure. For each pair of E: Spearman ρ (575 factors: 81,937 correlations). Red: positive ρ; blue: negative ρ; thickness: |ρ|. Permuted data to produce “null ρ”; sought replication in > 1 cohort. Pac Symp Biocomput 2015; JECH 2015.
  27. 27. Interdependencies of the exposome: Correlation globes paint a complex view of exposure. For each pair of E: Spearman ρ (575 factors: 81,937 correlations). Red: positive ρ; blue: negative ρ; thickness: |ρ|. Permuted data to produce “null ρ”; sought replication in > 1 cohort. Effective number of variables: 500 (10% decrease). Pac Symp Biocomput 2015; JECH 2015.
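As a rough illustration of the pairwise step described on these two slides (Spearman ρ for each pair of exposures, compared against a permuted "null ρ"), the sketch below uses only the Python standard library. The function names, the number of permutations, and the add-one p-value are assumptions made for this example, not details taken from the talk.

```python
import random
from itertools import combinations


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p(x, y, n_perm=1000, seed=0):
    """Two-sided p-value from shuffling y to build a null distribution of rho."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def correlation_edges(exposures, n_perm=1000):
    """exposures: dict of name -> values for the same participants, same order."""
    edges = {}
    for a, b in combinations(sorted(exposures), 2):
        rho = spearman(exposures[a], exposures[b])
        edges[(a, b)] = (rho, permutation_p(exposures[a], exposures[b], n_perm))
    return edges
```

In a globe, each retained edge would be colored by the sign of ρ and drawn with thickness |ρ|; which edges to keep (for example, those that also replicate in a second cohort) is a separate filtering step not shown here.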
  28. 28. Interdependencies of the exposome: Telomeres vs. all-cause mortality. [Two correlation globes: Telomere Length; All-cause mortality.] http://bit.ly/globebrowse
  29. 29. Browse these and 82 other phenotype-exposome globes! http://www.chiragjpgroup.org/exposome_correlation
  30. 30. What nodes have the most correlations / have the most connections? (“hubs of the network”) (What factors are correlated with others the most?) income... AJE, 2015
  31. 31. Lower income associated with 43 of 330 (>13%) exposures and biomarkers in the US population. Higher income: lower levels of biomarkers. (Another 23 associated with higher levels = 20%.) AJE, 2015.
[Volcano plot: effect size per 1 SD of income/poverty ratio versus −log10(pvalue); legend: overall income/poverty ratio effects (per 1 SD), validated results. Labeled points: Pulse rate; Eosinophils number; Lymphocyte number; Monocyte; Segmented neutrophils number; Blood 2,5-Dimethylfuran; Cadmium; Lead; Cotinine; C-reactive protein; Floor, GFAAS; Protoporphyrin; Glycohemoglobin; Glucose, plasma; g-tocopherol; Hepatitis A Antibody; Homocysteine; Herpes I; Herpes II; Red cell distribution width; Alkaline phosphatase; Globulin; Glucose, serum; Gamma glutamyl transferase; Triglycerides; Blood Benzene; Blood 1,4-Dichlorobenzene; Blood Ethylbenzene; Blood Styrene; Blood Toluene; Blood m-/p-Xylene; White blood cell count; Mono-benzyl phthalate; 3-fluorene; 2-fluorene; 3-phenanthrene; 2-phenanthrene; 1-pyrene; Cadmium, urine; Albumin, urine; Lead, urine.]
  32. 32. Studying the Elusive Environment in Large Scale
It is possible that more than 50% of complex disease risk is attributed to differences in an individual’s environment.1 Air pollution, smoking, and diet are documented environmental factors affecting health, yet these factors are but a fraction of the “exposome,” the totality of the exposure load occurring throughout a person’s lifetime.1 Investigating one or a handful of exposures at a time has led to a highly fragmented literature of epidemiologic associations. Much of that literature is not reproducible, and selective reporting may be a major reason for the lack of reproducibility. A new model is required to discover environmental exposures associated with disease while mitigating possibilities of selective reporting.
To remedy the lack of reproducibility and concerns of validity, multiple personal exposures can be assessed simultaneously in terms of their association with a condition or disease of interest; the strongest associations can then be tentatively validated in independent data sets (eg, as done in references 2 and 3).2,3 The main advantages of this process include the ability to search the list of exposures and adjust for multiplicity systematically and report all the probed associations instead of only the most significant results. The term “environment-wide association studies” (EWAS) has been used to describe this approach (an analogy to genome-wide association studies). For example, Wang et al4 screened more than 2000 chemicals in serum to discover endogenous exposures associated with risk for cardiovascular disease.
There are notable hurdles in analyzing “big” environmental data. These same problems affect epidemiology of 1-risk-factor-at-a-time, but in EWAS their prevalence becomes more clearly manifest at large scale. When study-
the EWAS vantage point, intervening on β-carotene (Figure, D) seems a futile exercise given its complex relationship with other nutrients and pollutants.
Given this complexity, how can studies of environmental risk move forward? First, EWAS analyses should be applied to multiple data sets, and consistency can be formally examined for all assessed correlations. Second, the temporal relationship between exposure and changes in health parameters may offer helpful hints about which of the signals are more than simple correlations. Third, standardized adjusted analyses, in which adjustments are performed systematically and in the same way across multiple data sets, may also help. This is in stark contrast with the current model, whereby most epidemiologic studies use single data sets without replication as well as non–time-dependent assessments, and reported adjustments are markedly different across reports and data sets, even those performed by the same team (different approaches increase validity but must be reconciled and assimilated).
However, eventually for most environmental correlates, there may be unsurpassable difficulty establishing potential causal inferences based on observational data alone. Factors that seem protective may sometimes be tested in randomized trials. The complexity of the multiple correlations also highlights the challenge that intervening to modify 1 putative risk factor also may inadvertently affect multiple other correlated factors. Even when a seemingly simple intervention is tested in randomized trials (affecting a single risk factor among the many correlations), the intervention is not really simple. In essence what is tested are multiple perturbations of factors correlated with the one targeted for interven-
VIEWPOINT. Chirag J. Patel, PhD, Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts. John P. A. Ioannidis, MD, DSc, Stanford Prevention Research Center, Department of Health Research and Policy, Department of Medicine, Stanford University School of Medicine, Stanford, California, Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, California, and Meta-Research Innovation Center at Stanford (METRICS), Stanford, California.
Opinion. JAMA, 2014; JECH, 2014; Proc Symp Biocomp, 2015.
How can we study the elusive environment in larger scale for biomedical discovery?
High-throughput ascertainment of endogenous indicators of environmental exposure that may reflect the exposome increasingly attract attention, and their performance needs to be carefully evaluated. These include chemical detection of indicators of exposure through metabolomics, proteomics, and biosensors.7 Eventually, patterns of
US federally funded gene expression experiment data be d ited in public repositories such as the Gene Expression Omnibu repository has been instrumental in development of technolo measurement of gene expression, data standardization, and of data for discovery. Just as with the Gene Expression Omnib
Figure. Correlation Interdependency Globes for 4 Environmental Exposures (Cotinine, Mercury, Cadmium, Trans-β-Carotene) in National Health and Nutrition Examination Survey (NHANES) Participants, 2003-2004. Panels: A, Serum cotinine; B, Serum total mercury; C, Serum cadmium; D, Serum trans-β-carotene. Panel totals as extracted: 37, 42, 68, and 68 total correlations. Legend: negative correlation; positive correlation; infectious agents; pollutants; nutrients and vitamins; demographic attributes. Each correlation interdependency globe includes 317 environmental exposures represented by the nodes around the periphery of the globe. Pairwise correlations are depicted by edges (lines) between the node of interest (arrowhead) and other nodes. Correlations with absolute values exceeding 0.2 are shown (stronge The size of each node is proportional to the number of edges for a node, and thickness of each edge indicates the magnitude of the correlation.
Opinion Viewpoint
•bioinformatics to connect exposome with phenome
•new ‘omics technologies to measure the exposome
•dense correlations
•reverse causality
•confounding
•(longitudinal) publicly available data
  33. 33. NIH National Institute of Environmental Health: $34M in FY 2015: new technologies for ascertaining the exposome in children. http://grants.nih.gov/grants/guide/rfa-files/RFA-ES-15-010.html Exposome Laboratory Network: four E Laboratories and an E Data Center (•Data repository •Analytic ecosystem •Data standards).
  34. 34. with Paul Avillach, Michael McDuffie, Jeremy Easton-Marks, Cartik Saravanamuthu and the BD2K PIC-SURE team 40K participants >1000 indicators of exposure Data and API available now http://nhanes.hms.harvard.edu BD2K Patient-Centered Information Commons NHANES exposome browser
  35. 35. Connecting Environmental Exposure with Disease: Missing the “System” of Exposures? [2×2 table: E+ and E- versus diseased and non-diseased, marked “?”.] Exposed to many things, but do not assess the multiplicity. Fragmented literature of associations. Challenge to discover E associated with disease.
  36. 36. Example of fragmentation: Is everything we eat associated with cancer? Schoenfeld and Ioannidis, AJCN (2012). 50 random ingredients from Boston Cooking School Cookbook: any associated with cancer? Of 50, 40 studied in a cancer risk. Weak statistical evidence: non-replicated; inconsistent effects; non-standardized. FIGURE 1. Effect estimates reported in the literature by malignancy type (top) or ingredient (bottom). Only ingredients with ≥10 studie outliers are not shown (effect estimates >10).
  37. 37. A maze of associations is one way to a fragmented literature and Vibration of Effects. Young, 2011. Modeling scenarios: univariate; sex; sex & age; sex & race; sex & race & age. JCE, 2015.
[Background: cropped page from Young, 2011. Legible text: “With ten covariates, there are over 1000 models.” Label: P < 0.05. Figure 3. The path through a complex process can appear quite simple once the path is defined. Which terms are included in a multiple linear regression model? Each turn in a maze is analogous to including or not a specific term in the evolving linear model. By keeping an eye on the p-value on the term selected to be at issue, one can work towards a suitably small p-value. © ktsdesign – Fotolia]
  38. 38. Distribution of associations and p-values due to model choice: Estimating the Vibration of Effects (or Risk) (e.g., mortality). JCE, 2015.
Data source: NHANES 1999-2004; 417 variables of interest; time to death; N ≥ 1000 (≥ 100 deaths).
Variable of interest: e.g., 1 SD of log(serum Vitamin D).
Adjusting variable set (n = 13): SES [3rd tertile]; education [>HS]; race [white]; body mass index [normal]; total cholesterol; any heart disease; family heart disease; any hypertension; any diabetes; any cancer; current/past smoker [no smoking]; drink 5/day; physical activity.
All-subsets Cox regression: 2^13 + 1 = 8,193 models, giving distributions of effect sizes and p-values.
[Plots of −log10(pvalue) versus hazard ratio; markers show the median p-value/HR for each k (0 to 13), with percentile indicators (1, 50, 99). Vitamin D (1SD(log)): RHR = 1.14, RP = 4.68. Thyroxine (1SD(log)): RHR = 1.15, RP = 2.90.]
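To make the all-subsets idea concrete, the sketch below enumerates every combination of the 13 adjusting variables and summarizes a set of per-model results. It does not fit any Cox models; `results` is meant to be filled by whatever survival-analysis tool is at hand. The summary definitions used here (RHR as the ratio of the 99th to the 1st percentile hazard ratio, RP as the difference between the 99th and 1st percentile of −log10(p)) are this sketch's reading of the vibration-of-effects measures and should be treated as an assumption, as should the nearest-rank percentile rule.

```python
import math
from itertools import combinations

# Adjusting variables as listed on the slide.
ADJUSTERS = [
    "SES", "education", "race", "body mass index", "total cholesterol",
    "any heart disease", "family heart disease", "any hypertension",
    "any diabetes", "any cancer", "current/past smoker", "drink 5/day",
    "physical activity",
]


def adjustment_sets(adjusters):
    """Yield every subset of the adjusting variables, from empty to full."""
    for k in range(len(adjusters) + 1):
        for subset in combinations(adjusters, k):
            yield subset


def percentile(sorted_values, q):
    """Nearest-rank percentile (integer q in 1..100) of a sorted list."""
    rank = max(1, -(-q * len(sorted_values) // 100))  # ceiling division
    return sorted_values[rank - 1]


def vibration(results):
    """results: list of (hazard_ratio, p_value), one per fitted model.

    Returns (RHR, RP) under the definitions assumed in the lead-in.
    """
    hrs = sorted(hr for hr, _ in results)
    neglogp = sorted(-math.log10(p) for _, p in results)
    rhr = percentile(hrs, 99) / percentile(hrs, 1)
    rp = percentile(neglogp, 99) - percentile(neglogp, 1)
    return rhr, rp


n_subsets = sum(1 for _ in adjustment_sets(ADJUSTERS))
print(n_subsets)  # number of subsets of 13 variables, i.e. 2**13
```

The subsets alone account for 2^13 = 8,192 models; the slide counts one more (8,193), and the slide text does not say what that additional model is.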
  39. 39. The Vibration of Effects: examples for Vitamin D and Thyroxine in association with mortality risk. JCE, 2015. [Two panels of −log10(pvalue) versus hazard ratio, with markers for k = 0 to 13 and percentile indicators (1, 50, 99). Vitamin D (1SD(log)): RHR = 1.14, RP = 4.68. Thyroxine (1SD(log)): RHR = 1.15, RP = 2.90.]
  40. 40. The Vibration of Effects: shifts in the effect size distribution due to select adjustments (e.g., adjusting cadmium levels with smoking status). JCE, 2015. [Panels of −log10(pvalue) versus hazard ratio, with markers for k = 0 to 13 and percentile indicators (1, 50, 99). Cadmium (1SD(log)): RHR = 1.29, RP = 8.29. Highlighted panel: Cadmium (1SD(log)), adjustment=current_past_smoking.]
  41. 41. The Vibration of Effects: beware of the Janus effect (both risk and protection?!). Janus (two-faced) risk profile: risk and significance depend on modeling scenario! [Plot regions labeled “risk”, “protection”, and “significant”.] JCE, 2015. Image: Britannica.com
  42. 42. Our modeling scenarios can lead to a fragmented literature; however, we can assess the distribution of effects with VoE. JCE, 2015. http://bit.ly/effectvibration [Background: the cropped page and maze figure (Figure 3, P < 0.05) shown on slide 37.]
  43. 43. Can exposure enable re-classification of phenotypes?
  44. 44. We are many phenotypes (P) simultaneously: Can we better categorize these P? Body measures: Body Mass Index, Height. Blood pressure & fitness: Systolic BP, Diastolic BP, Pulse rate, VO2 Max. Metabolic: Glucose, LDL-Cholesterol, Triglycerides. Inflammation: C-reactive protein, white blood cell count. Kidney function: Creatinine, Sodium, Uric Acid. Liver function: Aspartate aminotransferase, Gamma glutamyltransferase. Aging: Telomere length.
  45. 45. EWAS-derived phenotype-exposure association map: A 2-D view of phenotype-exposure associations for re-classification. [Example labels: PCB170, Glucose, BMI, Height, Cholesterol, β-carotene, folate.] http://bit.ly.com/pemap
  46. 46. Creation of a phenotype-exposure association map: A 2-D view of 83 phenotype by 252 exposure associations. 252 biomarkers of exposure × 83 clinical trait phenotypes. NHANES 1999-2000, 2001-2002, 2005-2006. ~21K regressions: replicated significant (FDR < 5%) in 2003-2004; adjusted by age, age², sex, race, income, chronic disease. Association size: > 0 or < 0. Clusters of exposures associated with clusters of phenotypes? Hugues Aschard, JP Ioannidis. [Heat map axes: 83 phenotypes, 252 exposures.]
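Two pieces of arithmetic sit behind this slide: the number of regressions (252 exposures × 83 phenotypes = 20,916, i.e. the "~21K") and the false discovery rate threshold. The slide does not name the FDR procedure, so the Benjamini-Hochberg step-up adjustment below is an assumption, shown only as one common way to obtain an "FDR < 5%" cutoff.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


N_EXPOSURES, N_PHENOTYPES = 252, 83
n_tests = N_EXPOSURES * N_PHENOTYPES  # 20,916 exposure-phenotype pairs

# Hypothetical p-values, for illustration only.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [value < 0.05 for value in q]
```

In a discovery-and-replication design like the one on the slide, an association would be reported only if it passes the threshold in the discovery surveys and is also significant in the held-out 2003-2004 survey.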
  47. 47. EWAS-derived phenotype-exposure association map: A 2-D view of connections between P and E. http://bit.ly.com/pemap [Heat map of phenotypes by exposures, with positive (+) and negative (-) associations; color key and histogram of values from -0.4 to 0.4. Exposure labels span dietary nutrients, serum nutrients and vitamins, physical activity, smoking and alcohol use, hydrocarbons, phthalates, metals, volatile compounds, dioxins, pesticides, PCBs, perfluorinated compounds, furans, and infectious agents. Phenotype labels span kidney, liver, blood, immunological, blood pressure, heart, metabolic, bone, and body measures.]
  48. 48. EWAS-derived phenotype-exposure association map: A 2-D view of connections between P and E. http://bit.ly.com/pemap [Same heat map of phenotypes by exposures as on slide 47, with annotated clusters: nutrients; BMI, weight, BMD; metabolic; renal function; pcbs; metabolic; blood parameters; hydrocarbons.]
  49. 49. Toward a phenotype-exposure association map: (Re)-categorizing phenotypes with E. Annotated clusters: inflammation; adiposity; kidney function; metabolic traits.
[Dendrogram, distance axis 0 to 7. Leaves in plotted order, as category:phenotype: liver:Albumin; kidney:Bicarbonate; immunological:Basophils percent; immunological:Lymphocyte percent; immunological:Eosinophils percent; kidney:Phosphorus; liver:Total protein; liver:Aspartate aminotransferase AST; liver:Alanine aminotransferase ALT; body measures:Head Circumference; body measures:Recumbent Length; liver:Lactate dehydrogenase LDH; cancer:Prostate specific antigen ratio; cancer:PSA, free; blood:Transferrin saturation; liver:Total bilirubin; heart:Direct HDL-Cholesterol; immunological:Monocyte percent; bone:Head BMD; body measures:Standing Height; body measures:Upper Leg Length; bone:Total BMD; bone:Lumbar Spine BMD; bone:Lumbar Pelvis BMD; heart:Triglycerides; heart:LDL-cholesterol; heart:Total Cholesterol; blood:MCHC; blood:TIBC, Frozen Serum; blood:Hematocrit; blood:Hemoglobin; kidney:Potassium; blood:Mean cell hemoglobin; blood:Mean cell volume; kidney:Uric acid; kidney:Blood urea nitrogen; kidney:Total calcium; kidney:Creatinine; blood:Ferritin; blood:Red blood cell count; body measures:Weight; blood:Segmented neutrophils percent; body measures:Total Lean excl BMC; body measures:Trunk Lean excl BMC; body measures:Body Mass Index; body measures:Waist Circumference; body measures:Triceps Skinfold; body measures:Maximal Calf Circumference; body measures:Thigh Circumference; liver:Gamma glutamyl transferase; blood pressure:60 sec. pulse; metabolic:Insulin; body measures:Total Fat; body measures:Trunk Fat; body measures:Subscapular Skinfold; blood pressure:mean systolic; immunological:C-reactive protein; liver:Globulin; immunological:Monocyte number; immunological:Segmented neutrophils number; immunological:Lymphocyte number; immunological:White blood cell count; immunological:Basophils number; immunological:Eosinophils number; blood:Mean platelet volume; heart:Homocysteine; nutrition:Methylmalonic acid; kidney:Osmolality; kidney:Chloride; kidney:Sodium; kidney:Albumin, urine; blood pressure:60 sec HR; cancer:PSA, total; blood:Platelet count SI; blood:Protoporphyrin; blood:Red cell distribution width; bone:Bone alkaline phosphatase; liver:Alkaline phosphatase; blood pressure:mean diastolic; metabolic:C-peptide: SI; metabolic:Glycohemoglobin; metabolic:Glucose, plasma; metabolic:Glucose, serum.]
  50. 50. Toward a phenotype-exposure association map: (Re)-categorizing phenotypes with E. [Figure: same phenotype clustering plot as slide 49 (Distance axis 0 to 7); highlighted clusters: “bad” cholesterol, “good” cholesterol]
  51. 51. Toward a phenotype-exposure association map: (Re)-categorizing phenotypes with E. [Figure: same phenotype clustering plot as slide 49 (Distance axis 0 to 7); highlighted cluster: height + BMD]
  52. 52. σ²_E vs. H²
  53. 53. [Figure: bar chart of R^2 * 100 (axis 0 to 40) per phenotype, listed from Triglycerides, Total Cholesterol and LDL-cholesterol through PSA total and Basophils percent] 1 to 66 exposures identified for 81 phenotypes. Additive effect of E factors: describe < 20% of variability in P (on average: 8%). σ²_E?
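The "R^2 * 100" quantity on this slide can be illustrated with a small sketch. This is not the analysis behind the slide: it assumes a plain ordinary least-squares fit of one phenotype on several exposures jointly, uses synthetic data, and the function name `additive_r2` is hypothetical. The simulated effect sizes are chosen so that exposures explain roughly 8% of the variance, echoing the slide's average.

```python
import numpy as np


def additive_r2(exposures, phenotype):
    """In-sample R^2 of a least-squares fit of phenotype on all exposures jointly."""
    # Prepend an intercept column so the fit is not forced through the origin.
    X = np.column_stack([np.ones(len(phenotype)), exposures])
    beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    resid = phenotype - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((phenotype - phenotype.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


# Synthetic data (an assumption for illustration, not survey data).
rng = np.random.default_rng(0)
n, k = 5000, 5
E = rng.normal(size=(n, k))              # k independent standardized exposures
signal = E @ np.ones(k)                  # additive exposure effect, variance k
noise_var = k * 0.92 / 0.08              # so that signal is about 8% of the total
P = signal + rng.normal(scale=np.sqrt(noise_var), size=n)

r2 = additive_r2(E, P)
print(f"R^2 * 100 = {100 * r2:.1f}")
```

In-sample R^2 is slightly optimistic when many exposures are fitted at once, so a real analysis would likely validate on held-out data and adjust for covariates.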
  54. 54. Emerging technologies to ascertain exposome will enable biomedical discovery. High-throughput E standards: mitigate fragmented literature of associations. Confounding, reverse causality: how to handle at large dimension? (e.g., EWASs in T2D, telomere length, and mortality). Facilitate G and E interaction investigations and more precise definitions of P.
  55. 55. Possible to use high-throughput data modalities to discover the role of E (and G) in P. P = G + E. [Figures: plot of −log10(p-value) across exposure categories, from acrylamide to volatile compounds; panels A: Serum cotinine and B: Serum total mercury, annotated with total-correlation counts (37, 42, 68, 68); legend: infectious agents, pollutants, nutrients and vitamins, demographic attributes]
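One way to read "P = G + E" alongside the σ²_E and H² notation of the earlier slides is as a variance decomposition. The lines below are a textbook-style sketch, not taken from the slides, and they assume a purely additive model with no G-by-E interaction term:

```latex
\begin{align}
P &= G + E \\
\sigma^2_P &= \sigma^2_G + \sigma^2_E + 2\,\mathrm{Cov}(G, E) \\
H^2 &= \frac{\sigma^2_G}{\sigma^2_P}
\end{align}
```

If G and E are additionally assumed uncorrelated, the covariance term vanishes and the exposure share is $\sigma^2_E / \sigma^2_P = 1 - H^2$.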
  56. 56. Thanks... Harvard HMS: Isaac Kohane, Susanne Churchill, Stan Shaw, Nathan Palmer, Jenn Grandfield, Sunny Alvear, Michal Preminger. Harvard Chan: Hugues Aschard, Francesca Dominici. Stanford: John Ioannidis, Atul Butte (UCSF). U Queensland: Jian Yang, Peter Visscher. Cochrane: Belinda Burford. CDC/NCHS: Ajay Yesupriya. Imperial: Ioanna Tzoulaki, Paul Elliott. Lund (Sweden): Jan Sundquist, Kristina Sundquist. Chirag Lakhani, Adam Brown, Nam Pho, Danielle Rasooly, Arjun Manrai. Stefano Monti, David Scherr. NIH Common Fund Big Data to Knowledge. Chirag J Patel, chirag@hms.harvard.edu, @chiragjp, www.chiragjpgroup.org
