SlideShare a Scribd company logo
1 of 14
Automation of Biological Data Analysis and Report Generation
Dmitry Grapov, PhD
Bots write the darndest things
http://www.latimes.com/local/lanow/earthquake-27-quake-strikes-near-westwood-
california-rdivor,0,3229825.story#axzz2wQwc82EK
•fill in the template (easy)
•human-guided automation
(e.g. Metaboanalyst,
intermediate)
•intelligent/reactive writing
(e.g. ~AI, advanced)
http://narrativescience.com/
Humans + Bots
Interaction:
•Bots and humans combine
in guided analyses
•Humans: make choices
(based on bot guides)
•Bots: automate!
Facilitate:
• workflow logging and
template creation
•reproducible results
Bot: Initial data and meta data
parsing and quality validation
(need: template input)
Human: data cleaning and
experimental design identification
(use: multiple choice, dynamic GUI)
Bot: instantiation of complex
workflows
Human: overview of bot
assumptions and results
Bot: Numerical and text output
generation
Humans + Bots write
darndender things?
Choose Your Own Life Adventure!
?
https://github.com/
dgrapov/AdventureR
Data Analysis Tasks
Visualization (how does it look?)
• histograms, density plots, box plots, line plots, scatter plots, networks, etc.
Statistical Analysis (what is statistically significant?)
• summary tables, ANOVA, FDR adjustment, power analysis, etc.
Exploration (what are the major patterns/trends?)
• clustering, PCA, ICA, etc.
Predictive Modeling (what explains my hypothesis?)
• mixed effects, partial least squares (O-/PLS/-DA), etc.
Network Analysis and Mapping (how are things related?)
• Functional analysis: pathway enrichment or overrepresentation
• Networks: biochemical, structural, mass spectral and empirical networks
• Mapping: projection of analysis results onto network
WCMC Data Analysis Reports ™
Statistical analysis
Clustering
PCA
O-PLS-DA
Biochemical enrichment
Network mapping
Input template: BinBase
•inference of experimental
goals from sample meta data
•mapping variables to external
databases
Tasks:
Report:
Tools:
Automation Challenges
Data cleaning and quality validation
•use: quality control samples; identify: precision/accuracy,
normalization, batch corrections; mitigate: outliers, missing
values, batch effects, etc.
Identification of experimental goals
•use: meta data, identify: main and accessory effects;
choose: statistics, multivariate tests and visualizations
Integration of multiple tasks to evolve robust analyses
•tasks: statistics, multivariate, functional, networks, database
mapping, etc
Data analysis report generation
•use: R, Latex, markdown
?
Challenges to automated
metabolite ID mapping
Stereochemistry?
Search: catechin
Best Match:
Catechin
Biologically relevant:
D-catechin
Synonyms?
Search: UDP GlcNAc
FAIL: UDP GlcNac
PASS: UDP-GlcNac
Strategies for automated
metabolite ID mapping (from synonym)
#1: CTS+ #2: Web query #3: Curated DB
•Use CTS to translate
from synonyms to KEGG
(KID) and PubChem (CID)
•Use KEGGREST and
PUG to filter and choose
most appropriate IDs
•Use fuzzy matching and
word similarity metrics
(e.g. Damerau–
Levenshtein distance)
•Use KEGGREST +
PubChem PUG to
translate synonyms to
IDs
•For KEGG ID:
synonym  SID  KID
•Generate a curated DB
for KEGG and CID
translations +
•Include InChI Keys
•Map to other DBs
•Allow fuzzy matching
on synonyms
•e.g. IDEOM
http://bioinformatics.oxfordjournals.org/content
/early/2012/02/04/bioinformatics.bts069
Interactive Analysis and
Report Generation
knitr (http://yihui.name/knitr/)
Analysis Report Generation
•Analysis on rails or open sandbox
•Humans facilitate robust results generation + Bots ensure reproduction
•Generation of Methods and Results should be automateable
Devium 2.0
Human-guided automated data
analysis and report generator
Human-guided automation could help
ensure robust results by making choices
which are otherwise difficult to automate.
https://github.com/dgrapov/DeviumWeb
MetaMapR
Linking data analysis and biology
https://github.com/dgrapov/MetaMapR
Integration of complex work flows is key to automation.
+ Workflows for complex experiments (e.g. time-course)
+ Biochemical functional analysis (pathway enrichment)
+ GUI for report generation (Devium 2.0)
+ Integrate multi-’Omic’ data sets (MetaMapR 2.0)
+ Scientific literature mining (RapportR)
+ Interactive plots and networks (JavaScript)
Future Goals
dgrapov@ucdavis.edu
metabolomics.ucdavis.edu
This research was supported in part by NIH 1 U24 DK097154

More Related Content

What's hot

High Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and VisualizationHigh Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and VisualizationDmitry Grapov
 
Mapping to the Metabolomic Manifold
Mapping to the Metabolomic ManifoldMapping to the Metabolomic Manifold
Mapping to the Metabolomic ManifoldDmitry Grapov
 
Metabolomic data analysis and visualization tools
Metabolomic data analysis and visualization toolsMetabolomic data analysis and visualization tools
Metabolomic data analysis and visualization toolsDmitry Grapov
 
3 principal components analysis
3  principal components analysis3  principal components analysis
3 principal components analysisDmitry Grapov
 
Data Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological StudiesData Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological StudiesDmitry Grapov
 
Normalization of Large-Scale Metabolomic Studies 2014
Normalization of Large-Scale Metabolomic Studies 2014Normalization of Large-Scale Metabolomic Studies 2014
Normalization of Large-Scale Metabolomic Studies 2014Dmitry Grapov
 
Data analysis workflows part 2 2015
Data analysis workflows part 2 2015Data analysis workflows part 2 2015
Data analysis workflows part 2 2015Dmitry Grapov
 
Gene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialGene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialDmitry Grapov
 
Case Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization StrategiesCase Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization StrategiesDmitry Grapov
 
4 partial least squares modeling
4  partial least squares modeling4  partial least squares modeling
4 partial least squares modelingDmitry Grapov
 
Omic Data Integration Strategies
Omic Data Integration StrategiesOmic Data Integration Strategies
Omic Data Integration StrategiesDmitry Grapov
 
3 data normalization (2014 lab tutorial)
3  data normalization (2014 lab tutorial)3  data normalization (2014 lab tutorial)
3 data normalization (2014 lab tutorial)Dmitry Grapov
 
Prote-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationProte-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationDmitry Grapov
 
1 statistical analysis
1  statistical analysis1  statistical analysis
1 statistical analysisDmitry Grapov
 
Some statistical concepts relevant to proteomics data analysis
Some statistical concepts relevant to proteomics data analysisSome statistical concepts relevant to proteomics data analysis
Some statistical concepts relevant to proteomics data analysisUC Davis
 
Data analysis workflows part 1 2015
Data analysis workflows part 1 2015Data analysis workflows part 1 2015
Data analysis workflows part 1 2015Dmitry Grapov
 
Quality Metrics for Linked Open Data
Quality Metrics for  Linked Open Data Quality Metrics for  Linked Open Data
Quality Metrics for Linked Open Data ebrahim_bagheri
 
Harnessing The Proteome With Proteo Iq Quantitative Proteomics Software
Harnessing The Proteome With Proteo Iq Quantitative Proteomics SoftwareHarnessing The Proteome With Proteo Iq Quantitative Proteomics Software
Harnessing The Proteome With Proteo Iq Quantitative Proteomics Softwarejatwood3
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 

What's hot (20)

High Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and VisualizationHigh Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and Visualization
 
Mapping to the Metabolomic Manifold
Mapping to the Metabolomic ManifoldMapping to the Metabolomic Manifold
Mapping to the Metabolomic Manifold
 
Metabolomic data analysis and visualization tools
Metabolomic data analysis and visualization toolsMetabolomic data analysis and visualization tools
Metabolomic data analysis and visualization tools
 
3 principal components analysis
3  principal components analysis3  principal components analysis
3 principal components analysis
 
Data Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological StudiesData Normalization Approaches for Large-scale Biological Studies
Data Normalization Approaches for Large-scale Biological Studies
 
Normalization of Large-Scale Metabolomic Studies 2014
Normalization of Large-Scale Metabolomic Studies 2014Normalization of Large-Scale Metabolomic Studies 2014
Normalization of Large-Scale Metabolomic Studies 2014
 
Data analysis workflows part 2 2015
Data analysis workflows part 2 2015Data analysis workflows part 2 2015
Data analysis workflows part 2 2015
 
Gene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialGene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -Tutorial
 
0 introduction
0  introduction0  introduction
0 introduction
 
Case Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization StrategiesCase Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization Strategies
 
4 partial least squares modeling
4  partial least squares modeling4  partial least squares modeling
4 partial least squares modeling
 
Omic Data Integration Strategies
Omic Data Integration StrategiesOmic Data Integration Strategies
Omic Data Integration Strategies
 
3 data normalization (2014 lab tutorial)
3  data normalization (2014 lab tutorial)3  data normalization (2014 lab tutorial)
3 data normalization (2014 lab tutorial)
 
Prote-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationProte-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and Visualization
 
1 statistical analysis
1  statistical analysis1  statistical analysis
1 statistical analysis
 
Some statistical concepts relevant to proteomics data analysis
Some statistical concepts relevant to proteomics data analysisSome statistical concepts relevant to proteomics data analysis
Some statistical concepts relevant to proteomics data analysis
 
Data analysis workflows part 1 2015
Data analysis workflows part 1 2015Data analysis workflows part 1 2015
Data analysis workflows part 1 2015
 
Quality Metrics for Linked Open Data
Quality Metrics for  Linked Open Data Quality Metrics for  Linked Open Data
Quality Metrics for Linked Open Data
 
Harnessing The Proteome With Proteo Iq Quantitative Proteomics Software
Harnessing The Proteome With Proteo Iq Quantitative Proteomics SoftwareHarnessing The Proteome With Proteo Iq Quantitative Proteomics Software
Harnessing The Proteome With Proteo Iq Quantitative Proteomics Software
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 

Viewers also liked

6 metabolite enrichment analysis
6  metabolite enrichment analysis6  metabolite enrichment analysis
6 metabolite enrichment analysisDmitry Grapov
 
5 data analysis case study
5  data analysis case study5  data analysis case study
5 data analysis case studyDmitry Grapov
 
A Primer for Your Next Data Science Proof of Concept on the Cloud
A Primer for Your Next Data Science Proof of Concept on the CloudA Primer for Your Next Data Science Proof of Concept on the Cloud
A Primer for Your Next Data Science Proof of Concept on the CloudAlton Alexander
 
Pragmatic steps to implement big data analytics
Pragmatic steps to implement big data analyticsPragmatic steps to implement big data analytics
Pragmatic steps to implement big data analyticsAlton Alexander
 
Connecting Metabolomic Data with Context
Connecting Metabolomic Data with ContextConnecting Metabolomic Data with Context
Connecting Metabolomic Data with ContextDmitry Grapov
 
Complex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine LearningComplex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine LearningDmitry Grapov
 

Viewers also liked (10)

6 metabolite enrichment analysis
6  metabolite enrichment analysis6  metabolite enrichment analysis
6 metabolite enrichment analysis
 
5 data analysis case study
5  data analysis case study5  data analysis case study
5 data analysis case study
 
2 cluster analysis
2  cluster analysis2  cluster analysis
2 cluster analysis
 
A Primer for Your Next Data Science Proof of Concept on the Cloud
A Primer for Your Next Data Science Proof of Concept on the CloudA Primer for Your Next Data Science Proof of Concept on the Cloud
A Primer for Your Next Data Science Proof of Concept on the Cloud
 
Pragmatic steps to implement big data analytics
Pragmatic steps to implement big data analyticsPragmatic steps to implement big data analytics
Pragmatic steps to implement big data analytics
 
Connecting Metabolomic Data with Context
Connecting Metabolomic Data with ContextConnecting Metabolomic Data with Context
Connecting Metabolomic Data with Context
 
Complex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine LearningComplex Systems Biology Informed Data Analysis and Machine Learning
Complex Systems Biology Informed Data Analysis and Machine Learning
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Big Data and Advanced Analytics
Big Data and Advanced AnalyticsBig Data and Advanced Analytics
Big Data and Advanced Analytics
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 

Similar to Automation of (Biological) Data Analysis and Report Generation

Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Shirshanka Das
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Yael Garten
 
Using MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherUsing MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherMongoDB
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadatamarkgrover
 
Transversal Delivery Pipeline by Mike Nescot and Nick Grace
Transversal Delivery Pipeline by Mike Nescot and Nick GraceTransversal Delivery Pipeline by Mike Nescot and Nick Grace
Transversal Delivery Pipeline by Mike Nescot and Nick GraceDevOpsDays Baltimore
 
Transversal Delivery Pipeline by Mike Nescot and Nick Grace
Transversal Delivery Pipeline by Mike Nescot and Nick GraceTransversal Delivery Pipeline by Mike Nescot and Nick Grace
Transversal Delivery Pipeline by Mike Nescot and Nick GraceDevOpsDays Baltimore
 
Big Data: Guidelines and Examples for the Enterprise Decision Maker
Big Data: Guidelines and Examples for the Enterprise Decision MakerBig Data: Guidelines and Examples for the Enterprise Decision Maker
Big Data: Guidelines and Examples for the Enterprise Decision MakerMongoDB
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesKonstantinos Xirogiannopoulos
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesPyData
 
Automatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia KalavriAutomatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia KalavriFlink Forward
 
Anaconda and PyData Solutions
Anaconda and PyData SolutionsAnaconda and PyData Solutions
Anaconda and PyData SolutionsTravis Oliphant
 
Artificial Intelligence for Data Quality
Artificial Intelligence for Data QualityArtificial Intelligence for Data Quality
Artificial Intelligence for Data QualityVera Ekimenko
 
Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Trey Grainger
 
EUGM 2014 - Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...
EUGM 2014 -  Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...EUGM 2014 -  Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...
EUGM 2014 - Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...ChemAxon
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentationTao Feng
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningPaco Nathan
 

Similar to Automation of (Biological) Data Analysis and Report Generation (20)

Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
 
Using MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherUsing MongoDB + Hadoop Together
Using MongoDB + Hadoop Together
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadata
 
Transversal Delivery Pipeline by Mike Nescot and Nick Grace
Transversal Delivery Pipeline by Mike Nescot and Nick GraceTransversal Delivery Pipeline by Mike Nescot and Nick Grace
Transversal Delivery Pipeline by Mike Nescot and Nick Grace
 
Transversal Delivery Pipeline by Mike Nescot and Nick Grace
Transversal Delivery Pipeline by Mike Nescot and Nick GraceTransversal Delivery Pipeline by Mike Nescot and Nick Grace
Transversal Delivery Pipeline by Mike Nescot and Nick Grace
 
Big Data: Guidelines and Examples for the Enterprise Decision Maker
Big Data: Guidelines and Examples for the Enterprise Decision MakerBig Data: Guidelines and Examples for the Enterprise Decision Maker
Big Data: Guidelines and Examples for the Enterprise Decision Maker
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational Databases
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational Databases
 
Automatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia KalavriAutomatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia Kalavri
 
Bots & spiders
Bots & spidersBots & spiders
Bots & spiders
 
Anaconda and PyData Solutions
Anaconda and PyData SolutionsAnaconda and PyData Solutions
Anaconda and PyData Solutions
 
Artificial Intelligence for Data Quality
Artificial Intelligence for Data QualityArtificial Intelligence for Data Quality
Artificial Intelligence for Data Quality
 
Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...
 
EUGM 2014 - Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...
EUGM 2014 -  Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...EUGM 2014 -  Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...
EUGM 2014 - Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...
 
Knowledge Discovery in Production
Knowledge Discovery in ProductionKnowledge Discovery in Production
Knowledge Discovery in Production
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
OpenML Tutorial ECMLPKDD 2015
OpenML Tutorial ECMLPKDD 2015OpenML Tutorial ECMLPKDD 2015
OpenML Tutorial ECMLPKDD 2015
 
Venkata brundavanam 2020
Venkata brundavanam 2020Venkata brundavanam 2020
Venkata brundavanam 2020
 

More from Dmitry Grapov

R programming for Data Science - A Beginner’s Guide
R programming for Data Science - A Beginner’s GuideR programming for Data Science - A Beginner’s Guide
R programming for Data Science - A Beginner’s GuideDmitry Grapov
 
Network mapping 101 course
Network mapping 101 courseNetwork mapping 101 course
Network mapping 101 courseDmitry Grapov
 
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...Dmitry Grapov
 
Dmitry Grapov Resume and CV
Dmitry Grapov Resume and CVDmitry Grapov Resume and CV
Dmitry Grapov Resume and CVDmitry Grapov
 
Machine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network AnalysisMachine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network AnalysisDmitry Grapov
 
American Society of Mass Spectrommetry Conference 2014
American Society of Mass Spectrommetry Conference 2014American Society of Mass Spectrommetry Conference 2014
American Society of Mass Spectrommetry Conference 2014Dmitry Grapov
 

More from Dmitry Grapov (7)

R programming for Data Science - A Beginner’s Guide
R programming for Data Science - A Beginner’s GuideR programming for Data Science - A Beginner’s Guide
R programming for Data Science - A Beginner’s Guide
 
Network mapping 101 course
Network mapping 101 courseNetwork mapping 101 course
Network mapping 101 course
 
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integratio...
 
Dmitry Grapov Resume and CV
Dmitry Grapov Resume and CVDmitry Grapov Resume and CV
Dmitry Grapov Resume and CV
 
Machine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network AnalysisMachine Learning Powered Metabolomic Network Analysis
Machine Learning Powered Metabolomic Network Analysis
 
Modeling poster
Modeling posterModeling poster
Modeling poster
 
American Society of Mass Spectrommetry Conference 2014
American Society of Mass Spectrommetry Conference 2014American Society of Mass Spectrommetry Conference 2014
American Society of Mass Spectrommetry Conference 2014
 

Recently uploaded

ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxAmita Gupta
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxcallscotland1987
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 

Recently uploaded (20)

ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptx
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 

Automation of (Biological) Data Analysis and Report Generation

  • 1. Automation of Biological Data Analysis and Report Generation Dmitry Grapov, PhD
  • 2. Bots write the darndest things http://www.latimes.com/local/lanow/earthquake-27-quake-strikes-near-westwood- california-rdivor,0,3229825.story#axzz2wQwc82EK •fill in the template (easy) •human-guided automation (e.g. Metaboanalyst, intermediate) •intelligent/reactive writing (e.g. ~AI, advanced) http://narrativescience.com/
  • 3. Humans + Bots Interaction: •Bots and humans combine in guided analyses •Humans: make choices (based on bot guides) •Bots: automate! Facilitate: • workflow logging and template creation •reproducible results Bot: Initial data and meta data parsing and quality validation (need: template input) Human: data cleaning and experimental design identification (use: multiple choice, dynamic GUI) Bot: instantiation of complex workflows Human: overview of bot assumptions and results Bot: Numerical and text output generation
  • 4. Humans + Bots write darndender things? Choose Your Own Life Adventure! ? https://github.com/ dgrapov/AdventureR
  • 5. Data Analysis Tasks Visualization (how does it look?) • histograms, density plots, box plots, line plots, scatter plots, networks, etc. Statistical Analysis (what is statistically significant?) • summary tables, ANOVA, FDR adjustment, power analysis, etc. Exploration (what are the major patterns/trends?) • clustering, PCA, ICA, etc. Predictive Modeling (what explains my hypothesis?) • mixed effects, partial least squares (O-/PLS/-DA), etc. Network Analysis and Mapping (how are things related?) • Functional analysis: pathway enrichment or overrepresentation • Networks: biochemical, structural, mass spectral and empirical networks • Mapping: projection of analysis results onto network
  • 6. WCMC Data Analysis Reports ™ Statistical analysis Clustering PCA O-PLS-DA Biochemical enrichment Network mapping Input template: BinBase •inference of experimental goals from sample meta data •mapping variables to external databases Tasks: Report: Tools:
  • 7. Automation Challenges Data cleaning and quality validation •use: quality control samples; identify: precision/accuracy, normalization, batch corrections; mitigate: outliers, missing values, batch effects, etc. Identification of experimental goals •use: meta data, identify: main and accessory effects; choose: statistics, multivariate tests and visualizations Integration of multiple tasks to evolve robust analyses •tasks: statistics, multivariate, functional, networks, database mapping, etc Data analysis report generation •use: R, Latex, markdown ?
  • 8. Challenges to automated metabolite ID mapping Stereochemistry? Search: catechin Best Match: Catechin Biologically relevant: D-catechin Synonyms? Search: UDP GlcNAc FAIL: UDP GlcNac PASS: UDP-GlcNac
  • 9. Strategies for automated metabolite ID mapping (from synonym) #1: CTS+ #2: Web query #3: Curated DB •Use CTS to translate from synonyms to KEGG (KID) and PubChem (CID) •Use KEGGREST and PUG to filter and choose most appropriate IDs •Use fuzzy matching and word similarity metrics (e.g. Damerau– Levenshtein distance) •Use KEGGREST + PubChem PUG to translate synonyms to IDs •For KEGG ID: synonym  SID  KID •Generate a curated DB for KEGG and CID translations + •Include InChI Keys •Map to other DBs •Allow fuzzy matching on synonyms •e.g. IDEOM http://bioinformatics.oxfordjournals.org/content /early/2012/02/04/bioinformatics.bts069
  • 10. Interactive Analysis and Report Generation knitr (http://yihui.name/knitr/) Analysis Report Generation •Analysis on rails or open sandbox •Humans facilitate robust results generation + Bots ensure reproduction •Generation of Methods and Results should be automateable
  • 11. Devium 2.0 Human-guided automated data analysis and report generator Human-guided automation could help ensure robust results by making choices which are otherwise difficult to automate. https://github.com/dgrapov/DeviumWeb
  • 12. MetaMapR Linking data analysis and biology https://github.com/dgrapov/MetaMapR Integration of complex work flows is key to automation.
  • 13. + Workflows for complex experiments (e.g. time-course) + Biochemical functional analysis (pathway enrichment) + GUI for report generation (Devium 2.0) + Integrate multi-’Omic’ data sets (MetaMapR 2.0) + Scientific literature mining (RapportR) + Interactive plots and networks (JavaScript) Future Goals
  • 14. dgrapov@ucdavis.edu metabolomics.ucdavis.edu This research was supported in part by NIH 1 U24 DK097154