Statistical Foundations for the design/analysis and
interpretation of Molecular Biomarker studies
Athula Herath, PhD, MBCS, CITP, CEng
https://uk.linkedin.com/in/athulaherath
March 2006 (Printed October 2015)
Shortcomings of the Statistical Analysis of molecular profiling data
• Analysts are often tempted to look for the “smoking gun”: the one or few
most important genes/proteins/metabolites.
• The underlying biology is often lost.
• Normal and faulty biological processes are often manifested by
the artefacts of a large number of biological entities (i.e. they are
multivariate) rather than one or a few
– For example:
• A large number of malfunctioning genes set the stage for cardiovascular disease
• Almost 300 genes are involved in asthma
• 140 faulty genes contribute to the problem of failing memory (Alzheimer's and other conditions)
• It is therefore prudent to use biology to explore the effects
instead, and to form/test/validate/reform hypotheses
• An additional advantage is that we build the biological
story simultaneously with the analysis
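
The multivariate point above can be illustrated with a small numerical sketch. It is not taken from the slides: the gene set, the z-scores and the choice of Stouffer's method for combining them are all assumptions made for illustration, and the method as written treats genes as independent, which real pathway members rarely are.

```python
import math

def upper_tail_p(z: float) -> float:
    """One-sided p-value for a standard normal z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def stouffer_z(z_scores: list[float]) -> float:
    """Combine z-scores across a gene set (assumes independent genes)."""
    return sum(z_scores) / math.sqrt(len(z_scores))

# Hypothetical gene set: 25 genes, each shifted only modestly (z = 1.5).
gene_z = [1.5] * 25
alpha = 0.05
bonferroni_cutoff = alpha / len(gene_z)  # per-gene threshold after correction

# "Smoking gun" search: test each gene on its own.
single_gene_hits = [z for z in gene_z if upper_tail_p(z) < bonferroni_cutoff]

# Multivariate view: test the gene set as one unit.
set_level_p = upper_tail_p(stouffer_z(gene_z))

print(len(single_gene_hits))  # genes that pass individually
print(set_level_p < alpha)    # whether the set as a whole stands out
```

Under these assumed numbers no single gene survives correction, yet the set as a whole is clearly shifted, which is the situation the slide describes.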
Statistical Foundation for Molecular Biomarker studies
[Diagram labels: Biological processes; biomarkers; Clinical Samples; Molecular Profiling (transcriptomics, proteomics, metabolomics etc.); Traditional clinical chemistry analysis]
Epidemiology
Study the association between the environmental factors and the disease of
individuals
• Create a list of factors (hypotheses)
• Assess the effects of these systematically
• Draw conclusions/refine
Biologically motivated Data Analysis
• Hypothesis Free
– Independently discover and make
inferences from data.
• Hypothesis Driven
– Establish a biological relationship between entities (Form Hypothesis)
– Test
– Conclude/Refine
Pragmatic and opportunistic approach.
Biology driven analysis, how?
1. Be pragmatic and subject specific (e.g. breast cancer, Alzheimer's etc., or even
narrower areas within wider subject areas) in establishing the active knowledge
repository described in step 5.
2. Filter and extract (using keywords, synonyms etc.) the appropriate molecular
entities and pathways from public and commercial curated pathway databases (e.g.
Entrez Gene, KEGG, GenMAPP, GO, UniProt, … ).
3. Collate all genetic polymorphism data (OMIM, dbSNP, HapMap) on the relevant
molecular entities.
4. Use 1, 2 and 3 above as seeds and establish new relationships using the literature
and literature-mining tools.
5. Collate the results from stages 1-4 into a repository (a de novo, focused, targeted,
disease-oriented active knowledge repository).
6. Suitably parameterize the facts to form multivariate models (e.g. in a Bayesian
framework), forming an associated statistical model repository.
7. Construct a suitable inference engine to generate plausible hypotheses.
8. Use the hypotheses as an aid to design biomarker studies.
9. Integrate the experimental data from relevant molecular profiling experiments
(transcriptomics, proteomics, metabolomics etc.).
10. Drive the statistical analysis (testing and verification of hypotheses).
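
Steps 6, 7 and 9 can be sketched in a few lines, under loudly stated assumptions. The slides name only "a Bayesian framework"; the Beta-Binomial model, the class and function names, the pathway names and all counts below are hypothetical choices made for illustration, not the author's method.

```python
from dataclasses import dataclass

@dataclass
class PathwayHypothesis:
    """One entry of a hypothetical statistical model repository (step 6)."""
    name: str
    prior_support: float  # pseudo-count of curated facts supporting involvement
    prior_against: float  # pseudo-count of curated facts against involvement

    def posterior_mean(self, altered: int, measured: int) -> float:
        """Beta-Binomial update: expected fraction of pathway members altered."""
        a = self.prior_support + altered
        b = self.prior_against + (measured - altered)
        return a / (a + b)

def rank_hypotheses(hypotheses, observations):
    """Toy inference engine (step 7): order pathways by posterior mean."""
    scored = [(h.name, h.posterior_mean(*observations[h.name])) for h in hypotheses]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical priors distilled from the knowledge repository (steps 1-5).
model_repository = [
    PathwayHypothesis("inflammation", prior_support=8, prior_against=2),
    PathwayHypothesis("lipid_metabolism", prior_support=2, prior_against=8),
]

# Hypothetical profiling data (step 9): (altered members, measured members).
observations = {"inflammation": (6, 10), "lipid_metabolism": (6, 10)}

ranking = rank_hypotheses(model_repository, observations)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Both pathways show the same experimental signal here, so the ranking differs only because of the prior knowledge. That is the sense in which the repository, rather than the data alone, drives the analysis.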
