Building a search engine for exposures in disease

Talk at University of Puerto Rico, Humacao

Published in: Health & Medicine

  1. 1. Building a search engine to find environmental and phenotypic factors associated with disease and health Chirag J Patel University of Puerto Rico, Humacao U-STAR 02/21/17 chirag@hms.harvard.edu @chiragjp www.chiragjpgroup.org
  2. 2. P = G + E. Phenotype (P): Type 2 Diabetes, Cancer, Alzheimer’s, Gene expression. Genome (G): Variants. Environment (E): Infectious agents, Nutrients, Pollutants, Drugs.
  3. 3. We are great at investigating G! Over 2,400 genome-wide association studies (GWAS): https://www.ebi.ac.uk/gwas/
  4. 4. Nothing comparable to elucidate E influence! E: ??? We lack high-throughput methods and data to discover new E in P…
  5. 5. A similar paradigm for discovery should exist for E! Why?
  6. 6. Remember… P = G + E. Phenotype (P): Type 2 Diabetes, Cancer, Alzheimer’s, Gene expression. Genome (G): Variants. Environment (E): Infectious agents, Nutrients, Pollutants, Drugs.
  7. 7. $\sigma^2_P = \sigma^2_G + \sigma^2_E + (\sigma^2_{E \times G} + \sigma^2_{G \times G})$
  8. 8. $H^2 = \sigma^2_G / \sigma^2_P$. Heritability ($H^2$) is the proportion of phenotypic variance attributed to genetic variance in a population. It indicates the proportion of phenotypic differences attributed to G.
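The variance decomposition on slides 7 and 8 can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not in the slides: G and E are independent, normally distributed, and there are no interaction terms. The function name and the variance values are hypothetical.

```python
import random
import statistics


def simulate_heritability(var_g, var_e, n=20000, seed=0):
    """Simulate P = G + E and return the sample ratio Var(G) / Var(P)."""
    rng = random.Random(seed)
    # Independent genetic and environmental contributions (an assumption).
    g = [rng.gauss(0.0, var_g ** 0.5) for _ in range(n)]
    e = [rng.gauss(0.0, var_e ** 0.5) for _ in range(n)]
    p = [gi + ei for gi, ei in zip(g, e)]
    return statistics.pvariance(g) / statistics.pvariance(p)


# Hypothetical variances: 60% of phenotypic variance is genetic.
h2 = simulate_heritability(var_g=0.6, var_e=0.4)
print(f"estimated H2 = {h2:.2f}")
```

With these inputs the estimate should land close to 0.6; whatever is left over is the share that, in the talk's framing, E (and interactions) would have to explain.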
  9. 9. Height is an example of a heritable trait: Francis Galton shows how it's done (1887) “mid-height of 205 parents described 60% of variability of 928 offspring”
  10. 10. Height is an example of a heritable trait: Francis Galton shows how it's done (1887) “mid-height of 205 parents described 60% of variability of 928 offspring” What explains the other 40%? Nutrition? Economics?
  11. 11. Height is not the only one…
  12. 12. G estimates for burdensome diseases are low and variable: massive opportunity for high-throughput E discovery. Highlighted: Type 2 Diabetes, Heart Disease, Autism (50%???)
      [Figure: bar chart of heritability, Var(G)/Var(Phenotype), on a 0 to 100 scale. Source: SNPedia.com. Traits shown: Eye color, Hair curliness, Type-1 diabetes, Height, Schizophrenia, Epilepsy, Graves' disease, Celiac disease, Polycystic ovary syndrome, Attention deficit hyperactivity disorder, Bipolar disorder, Obesity, Alzheimer's disease, Anorexia nervosa, Psoriasis, Bone mineral density, Menarche (age at), Nicotine dependence, Sexual orientation, Alcoholism, Lupus, Rheumatoid arthritis, Crohn's disease, Migraine, Thyroid cancer, Autism, Blood pressure (diastolic), Body mass index, Depression, Coronary artery disease, Insomnia, Menopause (age at), Heart disease, Prostate cancer, QT interval, Breast cancer, Ovarian cancer, Hangover, Stroke, Asthma, Blood pressure (systolic), Hypertension, Osteoarthritis, Parkinson's disease, Longevity, Type-2 diabetes, Gallstone disease, Testicular cancer, Cervical cancer, Sciatica, Bladder cancer, Colon cancer, Lung cancer, Leukemia, Stomach cancer]
  13. 13. G estimates for complex traits are low and variable: massive opportunity for high-throughput E discovery. $\sigma^2_E$: Exposome!
      [Figure: the same heritability bar chart as slide 12, Var(G)/Var(Phenotype) on a 0 to 100 scale. Source: SNPedia.com]
  14. 14. 17,804 traits of the phenome; 2,748 publications; 14,558,903 twin pairs. Average H2 (genome): 0.49. Exposome may play an equal role.
      [Paper screenshot] Polderman TJC, Benyamin B, de Leeuw CA, Sullivan PF, van Bochoven A, Visscher PM, Posthuma D. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nature Genetics, 2015.
      Abstract: Despite a century of research on complex traits in humans, the relative importance and specific nature of the influences of genes and environment on human traits remain controversial. We report a meta-analysis of twin correlations and reported variance components for 17,804 traits from 2,748 publications including 14,558,903 partly dependent twin pairs, virtually all published twin studies of complex traits. Estimates of heritability cluster strongly within functional domains, and across all traits the reported heritability is 49%. For a majority (69%) of traits, the observed twin correlations are consistent with a simple and parsimonious model where twin resemblance is solely due to additive genetic variation. The data are inconsistent with substantial influences from shared environment or non-additive genetic variation. This study provides the most comprehensive analysis of the causes of individual differences in human traits thus far and will guide future gene-mapping efforts. All the results can be visualized using the MaTCH webtool.
      Excerpt: We rely on classical twin studies, as the twin design has been used widely to disentangle the relative contributions of genes and environment, across a variety of human traits. The classical twin design is based on contrasting the trait resemblance of monozygotic and dizygotic twin pairs. Monozygotic twins are genetically identical, and dizygotic twins are genetically full siblings. Our results are based on twin studies of complex traits published between 1958 and 2012.
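The twin-design logic quoted on this slide (contrasting monozygotic and dizygotic twin resemblance) is often summarized with Falconer's formulas. The sketch below is an illustration only: it is the textbook ACE-style approximation, not the meta-analytic method used in the paper, and the correlations are hypothetical.

```python
def falconer_components(r_mz, r_dz):
    """Split trait variance using twin correlations (Falconer's approximation).

    r_mz: trait correlation in monozygotic twin pairs
    r_dz: trait correlation in dizygotic twin pairs
    Returns (h2, c2, e2): additive genetic, shared-environment and
    unique-environment shares of the variance.
    """
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    h2 = 2.0 * (r_mz - r_dz)   # MZ pairs share twice the segregating genes of DZ pairs
    c2 = 2.0 * r_dz - r_mz     # resemblance not explained by additive genetics
    e2 = 1.0 - r_mz            # whatever makes identical twins differ
    return h2, c2, e2


# Hypothetical correlations, chosen only for illustration.
h2, c2, e2 = falconer_components(r_mz=0.60, r_dz=0.35)
print(f"h2={h2:.2f} c2={c2:.2f} e2={e2:.2f}")
```

The three shares sum to one by construction, which makes the point of the slide concrete: whatever is not attributed to G is left for the environment to explain.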
  15. 15. It took a new paradigm of GWAS for discovery: Human Genome Project to GWAS.
      Sequencing of the genome: 2001.
      Characterize common variation: HapMap project (http://hapmap.ncbi.nlm.nih.gov/), 2001-current day.
      Measurement tools: high-throughput variant assay, < $99 for ~1M variants, ~2003 (ongoing).
      Comprehensive, high-throughput analyses: GWAS (WTCCC, Nature, 2007).
      [Paper screenshot] The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, Vol 447, 7 June 2007, doi:10.1038/nature05911.
      Abstract (truncated): There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined ~2,000 individuals for each of 7 major diseases and a shared set of ~3,000 controls. Case-control comparisons identified 24 independent association signals at P < 5 × 10^-7: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn’s disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus-far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a …
  16. 16. Explaining the other 50%: A big data-driven paradigm for robust discovery of E in disease via EWAS and the exposome. What to measure? How to measure? How to analyze in relation to health?
      “A more comprehensive view of environmental exposure is needed ... to discover major causes of diseases...”
      Wild, 2005; Rappaport and Smith, 2010, 2011; Buck-Louis and Sundaram, 2012; Miller and Jones, 2014; Patel CJ and Ioannidis JPA, 2014
      [Figure from the Perspectives article: the exposome. External environment: radiation, diet, pollution, infections, drugs, life-style, stress. Internal chemical environment: xenobiotics, inflammation, preexisting disease, lipid peroxidation, oxidative stress, gut flora. Chemical classes: reactive electrophiles, metals, endocrine disrupters, immune modulators, receptor-binding proteins.]
      Figure caption: Characterizing the exposome. The exposome represents the combined exposures from all sources that reach the internal chemical environment. Toxicologically important classes of exposome chemicals are shown. Signatures and biomarkers can detect these agents in blood or serum.
      Article excerpt (left column clipped in the screenshot): … critical entity for disease etiology (7). Recent discussion has focused on whether and how to implement this vision (8). Although fully characterizing human exposomes is daunting, strategies can be developed for getting “snapshots” of critical portions of a person’s exposome during different stages of life. At one extreme is a “bottom-up” strategy in which all chemicals in each external source of a subject’s exposome are measured at each time point. Although this approach would have the advantage of relating important exposures to the air, water, or diet, it would require enormous effort and would miss essential components of the internal chemical environment due to such factors as gender, obesity, inflammation, and stress. By contrast, a “top-down” strategy would measure all chemicals (or products of their downstream processing or effects, so-called read-outs or signatures) in a subject’s blood. This would require only a single blood specimen at each time point and would relate directly …
      Article excerpt (right column): … (telomere) length in peripheral blood mononuclear cells responded to chronic psychological stress, possibly mediated by the production of reactive oxygen species (15). Characterizing the exposome represents a technological challenge like that of the human genome project, which began when DNA sequencing was in its infancy (16). Analytical systems are needed to process small amounts of blood from thousands of subjects. Assays should be multiplexed for measuring many chemicals in each class of interest. Tandem mass spectrometry, gene and protein chips, and microfluidic systems offer the means to do this. Platforms for high-throughput assays should lead to economies of scale, again like those experienced by the human genome project. And because exposome technologies would provide feedback for therapeutic interventions and personalized medicine, they should motivate the development of commercial devices for screening important environmental exposures in blood samples. With successful characterization of both …
  17. 17. What is a Genome-Wide Association Study (GWAS)?: Data-driven search for G factors in P. Robust, transparent, and comprehensive search for G in P.
      [Figure: Manhattan plot of −log10(P) by chromosome (1 to 22, X). Source: WTCCC, Nature, Vol 447, 7 June 2007]
      [Diagram: genotype counts (AA, Aa, aa) compared between cases and controls]
  18. 18. Why carry out a Genome-Wide Association Study: Analytically robust, transparent, and comprehensive search for G in P. Comprehensive and transparent; multiplicity controlled; novel findings (and validated). Patel CJ, Ioannidis JPA, JAMA 2014; Patel CJ, Ioannidis JPA, JECH 2014
      [Figure: Manhattan plot of −log10(P) by chromosome (1 to 22, X). Source: Nature, Vol 447, 7 June 2007]
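The per-variant comparison sketched on slide 17 (AA/Aa/aa counts in cases versus controls) can be illustrated with a Pearson chi-square test on a 2×3 table. This is a minimal sketch, not the test used in any cited study; the function name and counts are hypothetical. It relies on the fact that for 2 degrees of freedom the chi-square tail probability is exactly exp(−x/2).

```python
import math


def genotype_chi_square(case_counts, control_counts):
    """Pearson chi-square test for a 2x3 genotype table (AA, Aa, aa).

    Returns (chi2, p_value). With 2 rows and 3 columns there are
    2 degrees of freedom, so the p-value is exp(-chi2 / 2).
    """
    rows = [list(case_counts), list(control_counts)]
    if any(len(r) != 3 for r in rows):
        raise ValueError("expected three genotype counts per group")
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    total = sum(row_totals)
    if min(row_totals + col_totals) == 0:
        raise ValueError("every row and column needs a non-zero total")
    chi2 = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2, math.exp(-chi2 / 2.0)


# Hypothetical counts for one variant: (AA, Aa, aa).
chi2, p = genotype_chi_square(case_counts=(30, 20, 10), control_counts=(10, 20, 30))
print(f"chi2={chi2:.1f}, -log10(p)={-math.log10(p):.2f}")
```

A GWAS repeats a test of this kind for every variant and plots −log10(p) by chromosome position, which is what a Manhattan plot shows.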
  19. 19. GWAS example. Example of the big data paradigm: GWAS drives discovery of G in P. N=8K T2D, 39K controls. Impossible to reach this scale in E-based investigations. Voight et al, Nature Genetics 2012
      [Figure: genome-wide Manhattan plots, −log10(P) by chromosome (1 to 22, X); top panel: unconditional analysis; lower panel: conditional analysis. Legend: locus established previously; locus identified by current study; locus not confirmed by current study; suggestive statistical association (P < 1 × 10^-5); association in identified or established region (P < 1 × 10^-4). Labeled loci: BCL11A, THADA, NOTCH2, ADAMTS9, IRS1, IGF2BP2, WFS1, ZBED3, CDKAL1, HHEX/IDE, KCNQ1 (2 signals), TCF7L2, KCNJ11, CENTD2, MTNR1B, HMGA2, ZFAND6, PRC1, FTO, HNF1B, DUSP9, TSPAN8/LGR5, HNF1A, CDC123/CAMK1D, CHCHD9, CDKN2A/2B, SLC30A8, TP53INP1, JAZF1, KLF14, PPAR]
      Figure 1: Genome-wide Manhattan plots for the DIAGRAM+ stage 1 meta-analysis. Top panel summarizes the results of the unconditional meta-analysis. Previously established loci are denoted in red and loci identified by the current study are denoted in green. The ten signals in blue are those taken forward but not confirmed in stage 2 analyses. The genes used to name signals have been chosen on the basis of proximity to the index SNP and should not be presumed to indicate causality. The lower panel summarizes the results of equivalent meta-analysis after conditioning on 30 previously established and newly identified autosomal T2D-associated SNPs (denoted by the dotted lines below these loci in the upper panel). Newly discovered conditional signals (outside established loci) are denoted with an orange dot if they show suggestive levels of significance (P < 10^-5), whereas secondary signals close to already confirmed T2D loci are shown in purple (P < 10^-4).
  20. 20. Connecting E with Disease: Missing the “System” of Exposures? [Diagram: 2×2 table of exposed (E+) vs. unexposed (E-) by diseased vs. non-diseased, with a question mark] We are exposed to many things, but studies do not assess the multiplicity. Fragmented literature of associations. Challenge to discover E associated with disease.
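The single-exposure 2×2 table on slide 20 is usually summarized by an odds ratio. Below is a minimal sketch with hypothetical counts, using the standard Woolf (log-scale) approximation for the 95% confidence interval; it is an illustration of the one-exposure-at-a-time analysis the slide questions, not an analysis from the talk.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for a 2x2 table.

    a: exposed and diseased        b: exposed and non-diseased
    c: unexposed and diseased      d: unexposed and non-diseased
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    estimate = (a * d) / (b * c)
    # Woolf's standard error of log(OR).
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts for a single exposure.
or_hat, lo, hi = odds_ratio(a=40, b=60, c=20, d=80)
print(f"OR={or_hat:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Running this separately for one exposure per paper, with no accounting for how many exposures were examined, is the fragmented pattern the slide describes.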
  21. 21. Examples of exposome-driven discovery machinery
  22. 22. Gold standard for breadth of human exposure information: National Health and Nutrition Examination Survey (1). Since the 1960s; now biennial: 1999 onwards; 10,000 participants per survey. >250 exposures (serum + urine); GWAS chip; >85 quantitative clinical traits (e.g., serum glucose, lipids, body mass index); death index linkage (cause of death).
      (1) http://www.cdc.gov/nchs/nhanes.htm
      [Background: NHANES brochure text]
      The sample for the survey is selected to represent the U.S. population of all ages. To produce reliable statistics, NHANES over-samples persons 60 and older, African Americans, and Hispanics. Since the United States has experienced dramatic growth in the number of older people during this century, the aging population has major implications for health care needs, public policy, and research priorities. NCHS is working with public health agencies to increase the knowledge of the health status of older Americans. NHANES has a primary role in this endeavor. All participants visit the physician. Dietary interviews and body measurements are included for everyone. All but the very young have a blood sample taken and will have a dental screening. Depending upon the age of the participant, the rest of the examination includes tests and procedures to assess the various aspects of health listed above. In general, the older the individual, the more extensive the examination.
      Survey Operations: Health interviews are conducted in respondents’ homes. Health measurements are performed in specially-designed and equipped mobile centers, which travel to locations throughout the country. The study team consists of a physician, medical and health technicians, as well as dietary and health interviewers. Many of the study staff are bilingual (English/Spanish). An advanced computer system using high-end servers, desktop PCs, and wide-area networking collect and process all of the NHANES data, nearly eliminating the need for paper forms and manual coding operations. This system allows interviewers to use notebook computers with electronic pens. The staff at the mobile center can automatically transmit data into data bases through such devices as digital scales and stadiometers. Touch-sensitive computer screens let respondents enter their own responses to certain sensitive questions in complete privacy. Survey information is available to NCHS staff within 24 hours of collection, which enhances the capability of collecting quality data and increases the speed with which results are released to the public. In each location, local health and government officials are notified of the upcoming survey. Households in the study area receive a letter from the NCHS Director to introduce the survey. Local media may feature stories about the survey. NHANES is designed to facilitate and encourage participation. Transportation is provided to and from the mobile center if necessary. Participants receive compensation and a report of medical findings is given to each participant. All information collected in the survey is kept strictly confidential. Privacy is protected by public laws.
      Uses of the Data: Information from NHANES is made available through an extensive series of publications and articles in scientific and technical journals. For data users and researchers throughout the world, survey data are available on the internet and on easy-to-use CD-ROMs. Research organizations, universities, health care providers, and educators benefit from survey information. Primary data users are federal agencies that collaborated in the design and development of the survey. The National Institutes of Health, the Food and Drug Administration, and CDC are among the agencies that rely upon NHANES to provide data essential for the implementation and evaluation of program activities. The U.S. Department of Agriculture and NCHS cooperate in planning and reporting dietary and nutrition information from the survey. NHANES’ partnership with the U.S. Environmental Protection Agency allows continued study of the many important environmental influences on our health.
      Health topics listed (partial): physical fitness and physical functioning; reproductive history and sexual behavior; respiratory disease (asthma, chronic bronchitis, emphysema); sexually transmitted diseases; vision.
  23. 23. Gold standard for breadth of exposure & behavior data: National Health and Nutrition Examination Survey. Nutrients and vitamins: vitamin D, carotenes. Infectious agents: hepatitis, HIV, Staph. aureus. Plastics and consumables: phthalates, bisphenol A. Physical activity: e.g., steps. Pesticides and pollutants: atrazine; cadmium; hydrocarbons. Drugs: statins; aspirin.
  24. 24. What exposures are correlated with type 2 diabetes?
  25. 25. Type 2 Diabetes Mellitus: A complex, multifactorial disease • Insulin production vs. use • beta-cell function • insulin sensitivity (BMI) • Moves glucose from blood into cells • Complications arise due to glucose in blood, hyperglycemia • diagnosed by blood glucose levels. CDC; body weight, diet, lifestyle, age
  26. 26. EWAS in Type 2 Diabetes: >200 associations with a Manhattan plot. Outcome: FBG > 125 mg/dL, adjusted by age, sex, race, SES, BMI. FDR < 10%. Highlighted: Heptachlor Epoxide OR=3.2, 1.8; PCB170 OR=4.5, 2.3; γ-tocopherol (vitamin E) OR=1.8, 1.6; β-carotene OR=0.6, 0.6. PLOS ONE. 2010
      [Figure: Manhattan plot of −log10(p-value) (0 to 2) by exposure class: acrylamide, allergen test, bacterial infection, cotinine, dialkyl, dioxins, furans (dibenzofuran), heavy metals, hydrocarbons, latex, nutrients (carotenoid, minerals, vitamins A, B, C, D, E), PCBs, perchlorate, pesticides (atrazine, chlorophenol, organochlorine, organophosphate, pyrethroid), phenols, phthalates, phytoestrogens, polybrominated ethers, polyfluorochemicals, viral infection, volatile compounds]
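The "FDR < 10%" threshold on this slide refers to false discovery rate control across all tested exposures. One common way to do this is the Benjamini-Hochberg step-up procedure, sketched below; the slide does not say which FDR method was used, so treat this as an assumption, and the exposure names and p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical p-values from testing each exposure against the outcome.
pvals = {"exposure_A": 0.0004, "exposure_B": 0.03, "exposure_C": 0.20, "exposure_D": 0.012}
qvals = benjamini_hochberg(list(pvals.values()))
hits = [name for name, q in zip(pvals, qvals) if q < 0.10]
print(hits)
```

Because every exposure is tested and adjusted together, the list of hits accounts for multiplicity in the way a single-exposure study cannot.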
  27. 27. What E are correlated with heart disease risk factors?
  28. 28. EWAS on Serum Lipid Levels: Triglycerides, LDL-Cholesterol, HDL-Cholesterol • Risk factors for coronary heart disease (CHD) • Targets for intervention (e.g., statins) • Influenced by smoking, physical activity, diet, genetics (1) • LDL-C Δ1%: 1% increased risk for CHD (2) • HDL-C Δ1%: 2% decreased risk for CHD (3) • Triglycerides: higher risk for CHD. References as listed on the slide: Teslovich et al. Nature (2010); Grundy et al. ATVB (2004); Gotto et al. JACC (2004)
  29. 29. EWAS in HDL-C: 17 validated factors (FDR < 5%). Exposure classes: carotenes, cotinine, heavy metals, organochlorine pesticides, hydrocarbons, vitamins (A, B, C, D, E), minerals. Outcome: log10(HDL-C), adjusted for BMI, SES, ethnicity, age, age^2, sex. N=1000-3000. Effect sizes: 1-5 mg/dL; R2 ~ 15%. Int J Epidem. 2012
  30. 30. EWAS in Triglycerides and LDL-C. 22 factors: organochlorine pesticides, polychlorinated biphenyls, carotenoids, vitamin E, vitamin A. 8 factors: carotenoids, vitamin E, vitamin A. Effect sizes: 1-15 mg/dL; R2 ~ 15%, 2%. Int J Epidem. 2012.
  31. 31. Effect sizes for validated factors: HDL-C. % change per 1 SD change in exposure. 17 validated factors (pollutants, nutrient factors). [Table columns: survey, N, P-value, FDR, effect (mg/dL)] R2 ~ 15%. Int J Epidem. 2012.
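Slides 29 and 31 model log10(HDL-C) and report effects as a percent change per 1 SD of exposure. A short sketch of that conversion follows, assuming the exposure was standardized (z-scored) so the regression coefficient is already "per SD"; the coefficient and the reference HDL-C level are hypothetical.

```python
import math


def percent_change_per_sd(beta_log10):
    """Percent change in the outcome for a 1 SD increase in exposure.

    beta_log10: coefficient from a regression of log10(outcome) on a
    standardized exposure (an assumption about how the model was fit).
    """
    return (10.0 ** beta_log10 - 1.0) * 100.0


def absolute_change(beta_log10, reference_level):
    """Change in outcome units (e.g., mg/dL) at a chosen reference level."""
    return reference_level * (10.0 ** beta_log10 - 1.0)


# Hypothetical coefficient and a hypothetical reference HDL-C of 50 mg/dL.
beta = math.log10(1.05)
print(f"{percent_change_per_sd(beta):.1f}% per SD")
print(f"{absolute_change(beta, reference_level=50.0):.1f} mg/dL at HDL-C = 50 mg/dL")
```

Expressing every exposure on the same per-SD scale is what lets pollutants and nutrients be compared side by side in one effect-size table.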
  32. 32. Persistent pollutants and endocrine disruptors found in T2D and heart disease risk factors: how are these factors linked with these diseases? • organochlorine pesticides • polychlorinated biphenyls • dibenzofurans • dioxins. Found all over the world; persist in food chain (Porta et al, Environ Int 2008). Also on the slide: capacitors, adhesives. Linked with heart disease and T2D/insulin resistance (Korea, Japan, Europe): Porta et al, Lancet, 2006; Lee et al, Diabetes Care, 2006; Lee et al, Diabetologia, 2007; Everett et al, Environ Res, 2010; Lind et al, EHP, 2011. Biological mechanisms remain elusive...
  33. 33. Challenges in exposome data mining: confounding and reverse causality hinder inference! Example: HDL-C. Could the disease “lead” to exposure (“reverse causality”)? γ-tocopherol and T2D: tocopherol (vitamin E) supplements for T2D individuals? Could something be confounding the association? β-carotene and high HDL: confounders such as statin use?
  34. 34. Longitudinal Study: “Silver Standard” to mitigate risk of reverse • exposure changing through time • reverse causality bias • compute disease risk. [Figure: HDL-Cholesterol (mg/dL) over age/time for high vs. low γ-tocopherol] γ-tocopherol and T2D: tocopherol (vitamin E) supplements for CHD individuals?
  35. 35. What environmental factors are associated with long-term risk for death? [Figure: rate of mortality over age/time for high vs. low levels of E factors]
  36. 36. What E are associated with aging: all-cause mortality and telomere length?
  37. 37. How does it work?: Searching for exposures and behaviors associated with all-cause mortality. Int J Epidem. 2013
      Data: NHANES 1999-2004 with National Death Index linked mortality; 246 behaviors and exposures (serum/urine/self-report).
      Step 1: NHANES 1999-2001, N=330 to 6008 (26 to 655 deaths), ~5.5 years of followup. Cox proportional hazards: baseline exposure and time to death. False discovery rate < 5%.
      Step 2: NHANES 2003-2004, N=177 to 3258 (20 to 202 deaths), ~2.8 years of followup. p < 0.05.
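The two-cohort design on this slide (FDR < 5% in the first survey, then p < 0.05 in the second) can be sketched as a simple filter over per-exposure results. Everything below is hypothetical: the record layout, the exposure names, and the numbers. The requirement that hazard ratios point in the same direction in both cohorts is an added assumption, not something stated on the slide.

```python
from typing import List, NamedTuple


class Association(NamedTuple):
    exposure: str
    discovery_q: float      # FDR-adjusted p-value in the first cohort
    discovery_hr: float     # hazard ratio in the first cohort
    replication_p: float    # raw p-value in the second cohort
    replication_hr: float   # hazard ratio in the second cohort


def replicated(associations: List[Association], fdr=0.05, alpha=0.05) -> List[str]:
    """Keep exposures that pass both stages with a consistent direction."""
    kept = []
    for a in associations:
        # Both hazard ratios on the same side of 1 (assumed requirement).
        same_direction = (a.discovery_hr - 1.0) * (a.replication_hr - 1.0) > 0
        if a.discovery_q < fdr and a.replication_p < alpha and same_direction:
            kept.append(a.exposure)
    return kept


# Made-up results for four hypothetical exposures.
results = [
    Association("exposure_A", 0.001, 1.40, 0.010, 1.30),
    Association("exposure_B", 0.030, 0.80, 0.200, 0.90),   # fails replication
    Association("exposure_C", 0.200, 1.10, 0.001, 1.20),   # fails discovery FDR
    Association("exposure_D", 0.010, 1.20, 0.020, 0.85),   # direction flips
]
print(replicated(results))
```

Only findings that survive the multiplicity-adjusted first stage get a second look, which keeps the replication cohort from being used as another round of searching.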
  38. 38. EWAS in All-cause mortality: 253 exposure/behavior associations in survival. Multivariate Cox (age, sex, income, education, race/ethnicity, occupation [in red]). FDR < 5%. Int J Epidem. 2013
      [Figure: volcano plot of adjusted hazard ratio (0.4 to 2.8) against −log10(p-value) (0 to 8); legend: sociodemographics, replicated factor]
      Labeled exposures and behaviors: 1 Physical Activity; 2 Does anyone smoke in home?; 3 Cadmium; 4 Cadmium, urine; 5 Past smoker; 6 Current smoker; 7 trans-lycopene (11)
      Labeled sociodemographics: 1 age (10 year increment); 2 SES_1; 3 male; 4 SES_0; 5 black; 6 SES_2; 7 SES_3; 8 education_hs; 9 other_eth; 10 mexican; 11 occupation_blue_semi; 12 education_less_hs; 13 occupation_never; 14 occupation_blue_high; 15 occupation_white_semi; 16 other_hispanic (69)
  39. 39. EWAS (re)-identifies factors associated with all-cause mortality: Volcano plot of 200 associations. Multivariate Cox (age, sex, income, education, race/ethnicity, occupation [in red]). R2 ~ 2%
      [Figure: the same volcano plot as slide 38, adjusted hazard ratio (0.4 to 2.8) against −log10(p-value) (0 to 8), with labels written out]
      Sociodemographics: age (10 years); income (quintile 2); income (quintile 1); male; black; income (quintile 3).
      Exposures and behaviors: anyone smoke in home?; serum and urine cadmium [1 SD]; past smoker?; current smoker?; serum lycopene [1 SD]; physical activity [low, moderate, high activity]*
      *derived from METs per activity and categorized by Health.gov guidelines
  40. 40. What exposures modulate telomere length?
  41. 41. 452 associations in Telomere Length: Polychlorinated biphenyls associated with longer telomeres?! Adjusted by age, age^2, race, poverty, education, occupation. Median N=3000; N range: 300-7000. R2 ~ 1%. IJE, 2016
      [Figure: volcano plot of effect size (−0.2 to 0.2, shorter telomeres to longer telomeres) against −log10(p-value) (0 to 4), with FDR<5% threshold. Labeled: PCBs, Trunk Fat, Alk. Phos, CRP, Cadmium, Cadmium (urine), cigs per day, retinyl stearate, VO2 Max, pulse rate]
  42. 42. Are samples exposed to PCBs associated with differences in genes implicated in telomere length GWAS? Expression differences for 24 GWAS-implicated genes. Queried the Gene Expression Omnibus for PCBs: Affymetrix human arrays (GPL570); 7 gene expression experiments on humans; 52 exposed, 14 unexposed. IJE, 2016
      [Paper screenshot] Dutta SK, Mitra PS, Ghosh S, Zang S, Sonneborn D, Hertz-Picciotto I, Trnovec T, Palkovicova L, Sovcikova E, Ghimbovschi S, Hoffman EP. Differential gene expression and a functional analysis of PCB-exposed children: Understanding disease and disorder development. Environment International 40 (2012) 143-154. Received 20 December 2010; accepted 10 July 2011.
      Abstract (truncated): The goal of the present study is to understand the probable molecular mechanism of toxicities and the associated pathways related to observed pathophysiology in high PCB-exposed populations. We have performed a microarray-based differential gene expression analysis of children (mean age 46.1 months) of …
  43. 43. Suggestive, but need more N! Could PCBs influence expression of genes implicated in telomere length GWAS? MYNN (myoneurin): bladder, leukemia, colorectal cancer GWASs. IJE, 2016
      [Figure: volcano plot of log(difference) (−0.50 to 0.75) against −log10(p-value) (0 to 2). Labeled probes: 1555203_s_at (SLC44A4), 1555203_s_at (MYNN), 224206_x_at (MYNN)]
  44. 44. How can we study the elusive environment in larger scale for biomedical discovery?
      • bioinformatics to connect exposome with phenome
      • new ‘omics technologies to measure the exposome
      • dense correlations
      • reverse causality
      • confounding
      • (longitudinal) publicly available data
      JAMA, 2014; JECH, 2014; Pac Symp Biocomput, 2015

      [Slide image: JAMA Opinion/Viewpoint page, text as extracted; "[...]" marks places where the page image is cropped]

      Studying the Elusive Environment in Large Scale

      VIEWPOINT. Chirag J. Patel, PhD, Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts. John P. A. Ioannidis, MD, DSc, Stanford Prevention Research Center, Department of Health Research and Policy, Department of Medicine, Stanford University School of Medicine, Stanford, California, Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, California, and Meta-Research Innovation Center at Stanford (METRICS), Stanford, California.

      It is possible that more than 50% of complex disease risk is attributed to differences in an individual’s environment.1 Air pollution, smoking, and diet are documented environmental factors affecting health, yet these factors are but a fraction of the “exposome,” the totality of the exposure load occurring throughout a person’s lifetime.1 Investigating one or a handful of exposures at a time has led to a highly fragmented literature of epidemiologic associations. Much of that literature is not reproducible, and selective reporting may be a major reason for the lack of reproducibility. A new model is required to discover environmental exposures associated with disease while mitigating possibilities of selective reporting.

      To remedy the lack of reproducibility and concerns of validity, multiple personal exposures can be assessed simultaneously in terms of their association with a condition or disease of interest; the strongest associations can then be tentatively validated in independent data sets (eg, as done in references 2 and 3).2,3 The main advantages of this process include the ability to search the list of exposures and adjust for multiplicity systematically and report all the probed associations instead of only the most significant results. The term “environment-wide association studies” (EWAS) has been used to describe this approach (an analogy to genome-wide association studies). For example, Wang et al4 screened more than 2000 chemicals in serum to discover endogenous exposures associated with risk for cardiovascular disease.

      There are notable hurdles in analyzing “big” environmental data. These same problems affect epidemiology of 1-risk-factor-at-a-time, but in EWAS their prevalence becomes more clearly manifest at large scale. When study- [...] the EWAS vantage point, intervening on β-carotene (Figure, D) seems a futile exercise given its complex relationship with other nutrients and pollutants.

      Given this complexity, how can studies of environmental risk move forward? First, EWAS analyses should be applied to multiple data sets, and consistency can be formally examined for all assessed correlations. Second, the temporal relationship between exposure and changes in health parameters may offer helpful hints about which of the signals are more than simple correlations. Third, standardized adjusted analyses, in which adjustments are performed systematically and in the same way across multiple data sets, may also help. This is in stark contrast with the current model, whereby most epidemiologic studies use single data sets without replication as well as non–time-dependent assessments, and reported adjustments are markedly different across reports and data sets, even those performed by the same team (different approaches increase validity but must be reconciled and assimilated).

      However, eventually for most environmental correlates, there may be unsurpassable difficulty establishing potential causal inferences based on observational data alone. Factors that seem protective may sometimes be tested in randomized trials. The complexity of the multiple correlations also highlights the challenge that intervening to modify 1 putative risk factor also may inadvertently affect multiple other correlated factors. Even when a seemingly simple intervention is tested in randomized trials (affecting a single risk factor among the many correlations), the intervention is not really simple. In essence what is tested are multiple perturbations of factors correlated with the one targeted for interven- [...]

      High-throughput ascertainment of endogenous indicators of environmental exposure that may reflect the exposome increasingly attract attention, and their performance needs to be carefully evaluated. These include chemical detection of indicators of exposure through metabolomics, proteomics, and biosensors.7 Eventually, patterns of [...] US federally funded gene expression experiment data be d[...]ited in public repositories such as the Gene Expression Omnibu[...] repository has been instrumental in development of technolo[...] measurement of gene expression, data standardization, and [...] of data for discovery. Just as with the Gene Expression Omnib[...]

      Figure. Correlation Interdependency Globes for 4 Environmental Exposures (Cotinine, Mercury, Cadmium, Trans-β-Carotene) in National Health and Nutrition Examination Survey (NHANES) Participants, 2003-2004. Panels: A, Serum cotinine; B, Serum total mercury; C, Serum cadmium; D, Serum trans-β-carotene. Total correlations, as listed: 37, 42, 68, 68. Legend: negative correlation, positive correlation; infectious agents, pollutants, nutrients and vitamins, demographic attributes. Each correlation interdependency globe includes 317 environmental exposures represented by the nodes around the periphery of the globe. Pairwise correlations are depicted by edges (lines) between the node of interest (arrowhead) and other nodes. Correlations with absolute values exceeding 0.2 are shown (stronge[...] The size of each node is proportional to the number of edges for a node, and thickness of each edge indicates the magnitude of the correlation.
  45. 45. Interdependencies of the exposome: Correlation globes paint a complex view of exposure. For each pair of E: Spearman ρ (575 factors: 81,937 correlations); permuted data to produce “null ρ”; sought replication in > 1 cohort. Globe legend: red, positive ρ; blue, negative ρ; thickness, |ρ|. Pac Symp Biocomput. 2015; JECH. 2015
  46. 46. Interdependencies of the exposome: Correlation globes paint a complex view of exposure. For each pair of E: Spearman ρ (575 factors: 81,937 correlations); permuted data to produce “null ρ”; sought replication in > 1 cohort. Effective number of variables: 500 (10% decrease). Globe legend: red, positive ρ; blue, negative ρ; thickness, |ρ|. Pac Symp Biocomput. 2015; JECH. 2015
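The pairwise Spearman ρ and the permuted “null ρ” mentioned on these slides can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used for the published analyses: the function names (`rank`, `spearman_rho`, `permutation_null`, `empirical_p`), the number of permutations, and the synthetic data are all hypothetical.

```python
import numpy as np


def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Replace the ranks of tied values by their mean rank.
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rho(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])


def permutation_null(x, y, n_perm=1000, seed=0):
    """Shuffle y to break any x-y link and collect the resulting 'null' rhos."""
    rng = np.random.default_rng(seed)
    return np.array([spearman_rho(x, rng.permutation(y)) for _ in range(n_perm)])


def empirical_p(observed, null):
    """Two-sided empirical p-value with a +1 correction (never exactly 0)."""
    return (1 + np.sum(np.abs(null) >= abs(observed))) / (1 + len(null))


if __name__ == "__main__":
    # Synthetic stand-ins for two exposure measurements (assumed data).
    rng = np.random.default_rng(1)
    exposure_a = rng.normal(size=100)
    exposure_b = 0.3 * exposure_a + rng.normal(size=100)
    rho = spearman_rho(exposure_a, exposure_b)
    null = permutation_null(exposure_a, exposure_b, n_perm=200)
    print(rho, empirical_p(rho, null))
```

In a globe, each exposure pair would get one such ρ; comparing it with the permuted values gives a sense of how large ρ can be by chance alone.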
  47. 47. Interdependencies of the exposome: Telomeres vs. all-cause mortality. [Globes: Telomere Length; All-cause mortality] http://bit.ly/globebrowse
  48. 48. Testing all associations systematically: Consideration of multiplicity of hypotheses and correlational web! Explicit in number of hypotheses tested: false discovery rate; family-wise error rate; report database size! Does my correlation matter? How does my new correlation compare to the family of correlations? For ρ = 0.17 (e.g., carotene and diabetes): is the average ρ much less than 0.17? Greater? JAMA 2014; JECH 2015
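As a rough illustration of the multiplicity ideas on this slide, the sketch below shows a Bonferroni threshold (family-wise error rate), Benjamini-Hochberg adjusted p-values (false discovery rate), and a simple way to place one correlation within a family of correlations. The function names and the choice of the Benjamini-Hochberg procedure are assumptions made for illustration; the slide does not say which procedures were used.

```python
import numpy as np


def bonferroni(pvalues, alpha=0.05):
    """Family-wise error control: compare each p-value to alpha / m."""
    p = np.asarray(pvalues, dtype=float)
    return p <= alpha / len(p)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.minimum(adjusted, 1.0)
    out = np.empty(m)
    out[order] = adjusted
    return out


def fraction_weaker(rho, family):
    """Fraction of correlations in the family with |rho| smaller than this one."""
    family = np.abs(np.asarray(family, dtype=float))
    return float(np.mean(family < abs(rho)))


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.005]  # made-up p-values
    print(bonferroni(pvals))
    print(benjamini_hochberg(pvals))
    print(fraction_weaker(0.17, [0.05, 0.1, 0.2, -0.3]))  # made-up family
```

Reporting the number of tests (the "database size") matters because both corrections depend directly on m.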
  49. 49. How can we study the elusive environment in larger scale for biomedical discovery?
      • bioinformatics to connect exposome with phenome
      • new ‘omics technologies to measure the exposome
      • dense correlations
      • reverse causality
      • confounding
      • (longitudinal) publicly available data
      JAMA, 2014; JECH, 2014; Pac Symp Biocomput, 2015
      [Slide image: Patel CJ, Ioannidis JPA. Studying the Elusive Environment in Large Scale. JAMA Viewpoint.]
  50. 50. You can play with these data!
  51. 51. http://chiragjpgroup.org/exposome-analytics-course Nam Pho
  52. 52. You can use these data! http://chiragjpgroup.org/exposome-analytics-course Contact me for project ideas! @chiragjp chirag_patel@hms.harvard.edu
  53. 53. Connecting Environmental Exposure with Disease: Missing the “System” of Exposures? [Diagram: 2×2 layout of E+ / E− against diseased / non-diseased, marked “?”] Exposed to many things, but do not assess the multiplicity. Fragmented literature of associations. Challenge to discover E associated with disease.
  54. 54. Example of fragmentation: Is everything we eat associated with cancer? Schoenfeld and Ioannidis, AJCN 2012. 50 random ingredients from Boston Cooking School Cookbook: any associated with cancer? Of 50, 40 studied in cancer risk. Weak statistical evidence: non-replicated; inconsistent effects; non-standardized. [Figure caption as extracted: FIGURE 1. Effect estimates reported in the literature by malignancy type (top) or ingredient (bottom). Only ingredients with ≥10 studies [...] outliers are not shown (effect estimates >10).]
  55. 55. https://www.youtube.com/watch?v=0Rnq1NpHdmw
  56. 56. New ways of measuring P are here now! Can we use them to assess E (and G)?
  57. 57. Physical activity monitors (Fitbit); smart devices (iOS); personal E sensors (exposome band?!); Propeller Health
  58. 58. Now possible to consent thousands of people at the push of a button! http://researchkit.org
  59. 59. Monitoring fasting glucose is imperative for diabetics!
  60. 60. Possible to survey P (fasting glucose) of diabetics consented through ResearchKit? Adam Brown Stanley Shaw (MGH) Dennis Ausiello (MGH) http://bit.ly/glucosuccess
  61. 61. Does the high physical activity population have lower fasting glucose?: YES! mashing up 24K step counts with glucose (N=600)
  62. 62. Is step count on previous day associated with fasting glucose the next day?: YES! mashing up 24K step counts with glucose (N=600)
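One way to pose the previous-day question is to pair each day's step count with the next day's fasting glucose within each participant and then estimate a pooled within-person slope. The sketch below is a simplified illustration under assumptions: the data layout (a dict of day index to value per participant), the function names, and the centering approach are all hypothetical, and the slides do not describe the model that was actually fitted.

```python
import numpy as np


def lagged_pairs(steps_by_day, glucose_by_day, lag=1):
    """Pair steps on day d with fasting glucose measured on day d + lag."""
    return [
        (steps, glucose_by_day[day + lag])
        for day, steps in sorted(steps_by_day.items())
        if day + lag in glucose_by_day
    ]


def within_person_slope(participants, lag=1):
    """Pooled slope of glucose on lagged steps after centering per participant.

    Centering removes stable between-person differences, so the slope reflects
    day-to-day variation within the same person.
    """
    xs, ys = [], []
    for steps_by_day, glucose_by_day in participants:
        pairs = lagged_pairs(steps_by_day, glucose_by_day, lag)
        if len(pairs) < 2:
            continue  # need at least two paired days to center
        x, y = np.array(pairs, dtype=float).T
        xs.append(x - x.mean())
        ys.append(y - y.mean())
    if not xs:
        raise ValueError("no participant has two or more paired days")
    x_all, y_all = np.concatenate(xs), np.concatenate(ys)
    return float(np.sum(x_all * y_all) / np.sum(x_all ** 2))


if __name__ == "__main__":
    # Two made-up participants: (steps by day, fasting glucose by day).
    participants = [
        ({1: 1000, 2: 3000, 3: 5000}, {2: 150, 3: 140, 4: 130}),
        ({1: 2000, 2: 4000}, {2: 120, 3: 110}),
    ]
    print(within_person_slope(participants))  # glucose change per extra step
```

A negative slope would be in line with the "YES!" on the slide, but a real analysis would also need to handle repeated measures, missing days, and confounders such as medication and diet.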
  64. 64. GlucoSuccess reflects a unique population: must do more to get more involved! http://bit.ly/glucosuccess
      Age (years): 43.6
      Sex: Male 80%; Female 20%
      Race: White 57%; Black 7%; Hispanic 11%; Other 25%
      Education: Some high school 2%; High school 8%; Some college 20%; 2-year college 10%; 4-year college 26%; Post-college 32%
      Mean years diabetic: 7.8
      Body mass index: 31
      Hemoglobin A1C: 7.7
      Comorbidities, GlucoSuccess (CDC figure in parentheses, http://www.cdc.gov/diabetes): Stroke 2% (0.7%); Heart failure 2% (1%); High blood pressure 47% (57%); High lipids 36% (58%); Kidney disease 4% (0.2%, end-stage renal disease); Circulation problems 8% (4%); Eye problems 9% (17%, visual impairment)
  65. 65. GlucoSuccess-like apps can enable longitudinal and dynamic surveillance of P. However: population-level differences and generalizability. http://bit.ly/glucosuccess
  66. 66. RagGroup Team: 2 post-docs, 3 PhD, 2 MS, 1 HS, 2 visiting. Pictured: Rolando Acosta, Jr; Shreyas Bhave; Sivateja Tangirala; Alan LeGoallec; Danielle Rasooly
  67. 67. Possible to discover new E using high-throughput data (exposome, medical claims, devices) to discover the role of E (and G) in P. P = G + E [Figure: −log10(pvalue) by exposure category: acrylamide, allergen test, bacterial infection, cotinine, dialkyl, dioxins, furans (dibenzofuran), heavy metals, hydrocarbons, latex, nutrients (carotenoid, minerals, vitamins A, B, C, D, E), PCBs, perchlorate, pesticides (atrazine, chlorophenol, organochlorine, organophosphate, pyrethroid), phenols, phthalates, phytoestrogens, polybrominated ethers, polyfluorochemicals, viral infection, volatile compounds] [Figure: correlation globes, including A Serum cotinine and B Serum total mercury; legend: infectious agents, pollutants, nutrients and vitamins, demographic attributes]
  68. 68. Acknowledgements
      Harvard DBMI: Isaac Kohane, Susanne Churchill, Stan Shaw, Nathan Palmer, Jenn Grandfield, Sunny Alvear, Michal Preminger
      RagGroup: Chirag Lakhani, Adam Brown, Danielle Rasooly, Nam Pho, Jake Chung, Alan LeGoallec, Arjun Manrai, Sivateja Tangirala, Shreyas Bhave, Rolando Acosta
      Dr. Edwin Traverso Aviles
      NIH Common Fund Big Data to Knowledge
      Chirag J Patel, chirag@hms.harvard.edu, @chiragjp, www.chiragjpgroup.org
