SlideShare a Scribd company logo
1 of 35
Download to read offline
Reproducible Research(1)
문건웅
2017-10-11
1 / 35
내 용
Replication ? Reproducible Research ?
Reproducible research using RStudio
2 / 35
3 / 35
Replication
Replication of findings and conducting studies with independent
Investigators
Data
Analytic Methods
Laboratories
Instruments
Replication is particularly important in studies that can impact broad
policy or regulatory decisions
4 / 35
What’s Wrong with Replication?
Some studies cannot be replicated
No time, opportunistic : 장기간 연구
No money : 대규모 무작위 - 대조군 연구
Unique
대규모 , 장기간 연구가 아닌 경우
replication이 가능할까?
5 / 35
6 / 35
Repeatability of published microarray gene
expression analyses
Selected articles published in Nature Genetics between January 2005 and
December 2006 that had used profiling with microarrays
Of the 56 items retrieved electronically, 20 articles were considered
potentially eligible for the project
The four teams were from
University of Alabama at Birmingham(UAB)
Stanford/Dana-Farber(SD)
London(L)
Ioannina/Trento(IT)
Each team was comprised of 3-6 scientists who worked together to
evaluate each article.
7 / 35
Results
Result could be reproduced : n=2
Reproduced with discrepancy : n=6
Could not be reproduced : n=10
No data n=4 (no data n=2, subset n=1, no reporter data n=1)
Confusion over matching of data to analysis (n=2)
Specialized software required and not available (n=1)
Raw data available but could not be processed (n=2)
8 / 35
9 / 35
How Can We Bridge the Gap
10 / 35
How Can We Bridge the Gap
11 / 35
Reproducible Research
data and the computer code used to analyze the data be made available
to others
attainable minimum reproducibility standard
fill the gap between full replication of a study and no replication
12 / 35
Why Do We Need Reproducible Research?
New technologies increasing data collection throughput; data are more
complex and extremely high dimensional
Existing databases can be merged into new “megadatabases”
Computing power is greatly increased, allowing more sophisticated
analyses
For every field “X” there is a field “Computational X”
13 / 35
Research Pipeline
14 / 35
Research Pipeline
15 / 35
Recent Developments in Reproducible Research
16 / 35
Recent Developments in Reproducible Research
17 / 35
Recent Developments in Reproducible Research
18 / 35
Omics ??
genomics
transcriptomics
proteomics
metabolomics
19 / 35
The IOM Report
In the Discovery/Test Validation stage of omics-based tests:
Data/metadata used to develop test should be made publicly available
The computer code and fully specified computational procedures used
for development of the candidate omics-based test should be made
sustainably available
“Ideally, the computer code that is released will encompass all of the
steps of computational analysis, including all data preprocessing steps,
that have been described in this chapter. All aspects of the analysis need
to be transparently reported.”
20 / 35
Reproducible research
Literate Programing (문학적 프로그래밍)
Reproducible Research (재현가능한 연구)
21 / 35
When issues of reproducibility arise
"Remember that microarray analysis you did six months ago?
We ran a few more arrays.
Can you add them to the project and repeat the same analysis?"
"The statistical analyst who looked at the data I generated previously is no
longer available.
Can you get someone else to analyze my new data set using the same
methods (and thus producing a report I can expect to understand)"
"Please write/edit the methods sections for the abstract/paper/grant
proposal I am submitting based on the analysis you did several months
ago."
22 / 35
Typical work ow of many research projects
First have an idea
e.g. stopping distance correlate with speed ?
23 / 35
1. Prepare data(Excel/Numbers)
24 / 35
2. Do Some analysis(R/SPSS/SAS)
25 / 35
All results(figures, tables) manually
imported to Word
3. Write a report/paper(Word?Pages)
26 / 35
This work ow is BROKEN
1. Collect and manage data(EXCEL)
2. Analysis (R/SPSS)
3. Writeup(WORD)
27 / 35
What analysis is behind this
figure? Did you account for
[ooo] in the analysis?
What dataset was used (e.g.
final vs preliminary dataset)?
Oops, there is an error in the
data. Can you repeat the
analysis? And update
figures/tables in Word!
As a coauthor/reader, I'd like to
see the whole research process
(how you arrived to that
conclusion), rather than cooked
manuscript with inserted
tables/figures.
Problems brought by the broken work ow
28 / 35
RStudio allows us to x the disconnect
Integrating
Data management
Data analysis
Writing up results
in a single dynamic document
Reproducible research !!
29 / 35
In literate programming, an
analytical document is
composed of a descriptive
narrative “woven” together with
software code and computed
results.
Advantages
A single document both
describes and performs the
analysis
Enforces reproducibility
Let's make our project reproducible
30 / 35
If error
If spotting eror in data, or using different dataset...
make changes in Rmarkdown and report will update automatically
31 / 35
Data management fully
documented (no more manual
changes in Excel!)
Analysis fully documented
Automated reports
can publish via Rpubs.com :
http://www.rpubs.com/cardiomoon/19541
can share the project
So... Main Advantages
32 / 35
Summary
Reproducible research is important as a minimum standard, particularly
for studies that are difficult to replicate
Infrastructure is needed for creating and distributing reproducible
documents, beyond what is currently available
There is a growing number of tools for creating reproducible documents
33 / 35
참고자료
의학논문작성을 위한 R통계와 그래프
R과 knitr를 활용한 데이타 연동형 문서 만들기
34 / 35
Online Resources
www.rstudio.com
Web-R.org
온라인교육프로그램
Coursera : Reproducible Research (Jones Hopkins University)
35 / 35

More Related Content

What's hot

06 quantitative data processing
06 quantitative data processing06 quantitative data processing
06 quantitative data processingKanagaraj Easwaran
 
Business Research Methods Chap008
Business Research Methods Chap008Business Research Methods Chap008
Business Research Methods Chap008Mazhar Masood
 
Data Management Lab: Session 3 Data Entry Best Practices
Data Management Lab: Session 3 Data Entry Best PracticesData Management Lab: Session 3 Data Entry Best Practices
Data Management Lab: Session 3 Data Entry Best PracticesIUPUI
 
Business Research Methods Chap017
Business Research Methods Chap017Business Research Methods Chap017
Business Research Methods Chap017Mazhar Masood
 
A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...
A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...
A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...arx-deidentifier
 
Data warehouse 14 data reconciliation tools
Data warehouse 14 data reconciliation toolsData warehouse 14 data reconciliation tools
Data warehouse 14 data reconciliation toolsVaibhav Khanna
 
Mining from Open Answers in Questionnaire Data
Mining from Open Answers in Questionnaire DataMining from Open Answers in Questionnaire Data
Mining from Open Answers in Questionnaire Datafeiwin
 
Statistics in real life engineering
Statistics in real life engineeringStatistics in real life engineering
Statistics in real life engineeringMD TOUFIQ HASAN ANIK
 
Machine learning - session 2
Machine learning - session 2Machine learning - session 2
Machine learning - session 2Luis Borbon
 
How Bird Atlas Data is getting used
How Bird Atlas Data is getting used How Bird Atlas Data is getting used
How Bird Atlas Data is getting used Praveen Jayadevan
 
Change Up Your 2016 Election Coverage. Create a Computational Campaign.
Change Up Your 2016 Election Coverage. Create a Computational Campaign.Change Up Your 2016 Election Coverage. Create a Computational Campaign.
Change Up Your 2016 Election Coverage. Create a Computational Campaign.Online News Association
 
An experimental comparison of globally-optimal data de-identification algorithms
An experimental comparison of globally-optimal data de-identification algorithmsAn experimental comparison of globally-optimal data de-identification algorithms
An experimental comparison of globally-optimal data de-identification algorithmsarx-deidentifier
 
IRE2014 Filtering Tweets Related to an entity
IRE2014 Filtering Tweets Related to an entityIRE2014 Filtering Tweets Related to an entity
IRE2014 Filtering Tweets Related to an entitykartik179
 
rOpenGov: an R ecosystem for open government data and computational social sc...
rOpenGov: an R ecosystem for open government data and computational social sc...rOpenGov: an R ecosystem for open government data and computational social sc...
rOpenGov: an R ecosystem for open government data and computational social sc...Leo Lahti
 
International Journal on Computational Science & Applications (IJCSA)
International Journal on Computational Science & Applications (IJCSA)International Journal on Computational Science & Applications (IJCSA)
International Journal on Computational Science & Applications (IJCSA)ijcsa
 
International Journal on Computational Science & Applications (IJCSA)
International Journal on Computational Science & Applications (IJCSA)International Journal on Computational Science & Applications (IJCSA)
International Journal on Computational Science & Applications (IJCSA)ijcsa
 

What's hot (20)

06 quantitative data processing
06 quantitative data processing06 quantitative data processing
06 quantitative data processing
 
Business Research Methods Chap008
Business Research Methods Chap008Business Research Methods Chap008
Business Research Methods Chap008
 
Diploma_Thesis_Abstract
Diploma_Thesis_AbstractDiploma_Thesis_Abstract
Diploma_Thesis_Abstract
 
20090813MEETING
20090813MEETING20090813MEETING
20090813MEETING
 
Data Management Lab: Session 3 Data Entry Best Practices
Data Management Lab: Session 3 Data Entry Best PracticesData Management Lab: Session 3 Data Entry Best Practices
Data Management Lab: Session 3 Data Entry Best Practices
 
Business Research Methods Chap017
Business Research Methods Chap017Business Research Methods Chap017
Business Research Methods Chap017
 
A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...
A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...
A Tool for Optimizing De-Identified Health Data for Use in Statistical Classi...
 
Data warehouse 14 data reconciliation tools
Data warehouse 14 data reconciliation toolsData warehouse 14 data reconciliation tools
Data warehouse 14 data reconciliation tools
 
Mining from Open Answers in Questionnaire Data
Mining from Open Answers in Questionnaire DataMining from Open Answers in Questionnaire Data
Mining from Open Answers in Questionnaire Data
 
Statistics in real life engineering
Statistics in real life engineeringStatistics in real life engineering
Statistics in real life engineering
 
Machine learning - session 2
Machine learning - session 2Machine learning - session 2
Machine learning - session 2
 
How Bird Atlas Data is getting used
How Bird Atlas Data is getting used How Bird Atlas Data is getting used
How Bird Atlas Data is getting used
 
Input modeling
Input modelingInput modeling
Input modeling
 
Change Up Your 2016 Election Coverage. Create a Computational Campaign.
Change Up Your 2016 Election Coverage. Create a Computational Campaign.Change Up Your 2016 Election Coverage. Create a Computational Campaign.
Change Up Your 2016 Election Coverage. Create a Computational Campaign.
 
Ei u4
Ei u4Ei u4
Ei u4
 
An experimental comparison of globally-optimal data de-identification algorithms
An experimental comparison of globally-optimal data de-identification algorithmsAn experimental comparison of globally-optimal data de-identification algorithms
An experimental comparison of globally-optimal data de-identification algorithms
 
IRE2014 Filtering Tweets Related to an entity
IRE2014 Filtering Tweets Related to an entityIRE2014 Filtering Tweets Related to an entity
IRE2014 Filtering Tweets Related to an entity
 
rOpenGov: an R ecosystem for open government data and computational social sc...
rOpenGov: an R ecosystem for open government data and computational social sc...rOpenGov: an R ecosystem for open government data and computational social sc...
rOpenGov: an R ecosystem for open government data and computational social sc...
 
International Journal on Computational Science & Applications (IJCSA)
International Journal on Computational Science & Applications (IJCSA)International Journal on Computational Science & Applications (IJCSA)
International Journal on Computational Science & Applications (IJCSA)
 
International Journal on Computational Science & Applications (IJCSA)
International Journal on Computational Science & Applications (IJCSA)International Journal on Computational Science & Applications (IJCSA)
International Journal on Computational Science & Applications (IJCSA)
 

Similar to Reproducible research(1)

Role of Computers in Research, Data Processing, Data Analysis
Role of Computers in Research, Data Processing, Data AnalysisRole of Computers in Research, Data Processing, Data Analysis
Role of Computers in Research, Data Processing, Data AnalysisRKavithamani
 
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...Adel Sabour
 
softwares in public health
softwares in public healthsoftwares in public health
softwares in public healthPragyan Parija
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptbutest
 
Accelerating the production of safety summary and clinical safety reports - a...
Accelerating the production of safety summary and clinical safety reports - a...Accelerating the production of safety summary and clinical safety reports - a...
Accelerating the production of safety summary and clinical safety reports - a...Steffan Stringer
 
Spss and software Application
Spss and software ApplicationSpss and software Application
Spss and software ApplicationAshok Pandey
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Dmitry Grapov
 
Profiling Linked Open Data
Profiling Linked Open DataProfiling Linked Open Data
Profiling Linked Open DataBlerina Spahiu
 
Standards and tools for model management in biomedical research
Standards and tools for model management in biomedical researchStandards and tools for model management in biomedical research
Standards and tools for model management in biomedical researchUniversity Medicine Greifswald
 
Large Scale Studies: Malware Needles in a Haystack
Large Scale Studies: Malware Needles in a HaystackLarge Scale Studies: Malware Needles in a Haystack
Large Scale Studies: Malware Needles in a HaystackMarcus Botacin
 
Modellbildung, Berechnung und Simulation in Forschung und Lehre
Modellbildung, Berechnung und Simulation in Forschung und LehreModellbildung, Berechnung und Simulation in Forschung und Lehre
Modellbildung, Berechnung und Simulation in Forschung und LehreJoachim Schlosser
 
COMBINE standards & tools: Getting model management right
COMBINE standards & tools: Getting model management rightCOMBINE standards & tools: Getting model management right
COMBINE standards & tools: Getting model management rightUniversity Medicine Greifswald
 
QUARTER-3.-LESSON-2-in-RESEARCH-II.pptx
QUARTER-3.-LESSON-2-in-RESEARCH-II.pptxQUARTER-3.-LESSON-2-in-RESEARCH-II.pptx
QUARTER-3.-LESSON-2-in-RESEARCH-II.pptxRheaannCaparas1
 
Reproducibility in computer-assisted research
Reproducibility in computer-assisted researchReproducibility in computer-assisted research
Reproducibility in computer-assisted researchkhinsen
 
Data Processing DOH Workshop.pptx
Data Processing DOH Workshop.pptxData Processing DOH Workshop.pptx
Data Processing DOH Workshop.pptxcharlslabarda
 
Preserving the currency of analytics outcomes over time through selective re-...
Preserving the currency of analytics outcomes over time through selective re-...Preserving the currency of analytics outcomes over time through selective re-...
Preserving the currency of analytics outcomes over time through selective re-...Paolo Missier
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
An Algorithm Analysis on Data Mining-396
An Algorithm Analysis on Data Mining-396An Algorithm Analysis on Data Mining-396
An Algorithm Analysis on Data Mining-396Nida Rashid
 

Similar to Reproducible research(1) (20)

Role of Computers in Research, Data Processing, Data Analysis
Role of Computers in Research, Data Processing, Data AnalysisRole of Computers in Research, Data Processing, Data Analysis
Role of Computers in Research, Data Processing, Data Analysis
 
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
 
softwares in public health
softwares in public healthsoftwares in public health
softwares in public health
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
 
Accelerating the production of safety summary and clinical safety reports - a...
Accelerating the production of safety summary and clinical safety reports - a...Accelerating the production of safety summary and clinical safety reports - a...
Accelerating the production of safety summary and clinical safety reports - a...
 
Spss and software Application
Spss and software ApplicationSpss and software Application
Spss and software Application
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)
 
Profiling Linked Open Data
Profiling Linked Open DataProfiling Linked Open Data
Profiling Linked Open Data
 
Standards and tools for model management in biomedical research
Standards and tools for model management in biomedical researchStandards and tools for model management in biomedical research
Standards and tools for model management in biomedical research
 
Large Scale Studies: Malware Needles in a Haystack
Large Scale Studies: Malware Needles in a HaystackLarge Scale Studies: Malware Needles in a Haystack
Large Scale Studies: Malware Needles in a Haystack
 
Modellbildung, Berechnung und Simulation in Forschung und Lehre
Modellbildung, Berechnung und Simulation in Forschung und LehreModellbildung, Berechnung und Simulation in Forschung und Lehre
Modellbildung, Berechnung und Simulation in Forschung und Lehre
 
COMBINE standards & tools: Getting model management right
COMBINE standards & tools: Getting model management rightCOMBINE standards & tools: Getting model management right
COMBINE standards & tools: Getting model management right
 
QUARTER-3.-LESSON-2-in-RESEARCH-II.pptx
QUARTER-3.-LESSON-2-in-RESEARCH-II.pptxQUARTER-3.-LESSON-2-in-RESEARCH-II.pptx
QUARTER-3.-LESSON-2-in-RESEARCH-II.pptx
 
Reproducibility in computer-assisted research
Reproducibility in computer-assisted researchReproducibility in computer-assisted research
Reproducibility in computer-assisted research
 
Pine education-platform
Pine education-platformPine education-platform
Pine education-platform
 
Spss and software
Spss and softwareSpss and software
Spss and software
 
Data Processing DOH Workshop.pptx
Data Processing DOH Workshop.pptxData Processing DOH Workshop.pptx
Data Processing DOH Workshop.pptx
 
Preserving the currency of analytics outcomes over time through selective re-...
Preserving the currency of analytics outcomes over time through selective re-...Preserving the currency of analytics outcomes over time through selective re-...
Preserving the currency of analytics outcomes over time through selective re-...
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
An Algorithm Analysis on Data Mining-396
An Algorithm Analysis on Data Mining-396An Algorithm Analysis on Data Mining-396
An Algorithm Analysis on Data Mining-396
 

Recently uploaded

Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICEayushi9330
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLkantirani197
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 

Recently uploaded (20)

Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 

Reproducible research(1)

  • 2. 내 용 Replication ? Reproducible Research ? Reproducible research using RStudio 2 / 35
  • 4. Replication Replication of findings and conducting studies with independent Investigators Data Analytic Methods Laboratories Instruments Replication is particularly important in studies that can impact broad policy or regulatory decisions 4 / 35
  • 5. What’s Wrong with Replication? Some studies cannot be replicated No time, opportunistic : 장기간 연구 No money : 대규모 무작위 - 대조군 연구 Unique 대규모 , 장기간 연구가 아닌 경우 replication이 가능할까? 5 / 35
  • 7. Repeatability of published microarray gene expression analyses Selected articles published in Nature Genetics between January 2005 and December 2006 that had used profiling with microarrays Of the 56 items retrieved electronically, 20 articles were considered potentially eligible for the project The four teams were from University of Alabama at Birmingham(UAB) Stanford/Dana-Farber(SD) London(L) Ioannina/Trento(IT) Each team was comprised of 3-6 scientists who worked together to evaluate each article. 7 / 35
  • 8. Results Result could be reproduced : n=2 Reproduced with discrepancy : n=6 Could not be reproduced : n=10 No data n=4 (no data n=2, subset n=1, no reporter data n=1) Confusion over matching of data to analysis (n=2) Specialized software required and not available (n=1) Raw data available but could not be processed (n=2) 8 / 35
  • 10. How Can We Bridge the Gap 10 / 35
  • 11. How Can We Bridge the Gap 11 / 35
  • 12. Reproducible Research data and the computer code used to analyze the data be made available to others attainable minimum reproducibility standard fill the gap between full replication of a study and no replication 12 / 35
  • 13. Why Do We Need Reproducible Research? New technologies increasing data collection throughput; data are more complex and extremely high dimensional Existing databases can be merged into new “megadatabases” Computing power is greatly increased, allowing more sophisticated analyses For every field “X” there is a field “Computational X” 13 / 35
  • 16. Recent Developments in Reproducible Research 16 / 35
  • 17. Recent Developments in Reproducible Research 17 / 35
  • 18. Recent Developments in Reproducible Research 18 / 35
  • 20. The IOM Report In the Discovery/Test Validation stage of omics-based tests: Data/metadata used to develop test should be made publicly available The computer code and fully specified computational procedures used for development of the candidate omics-based test should be made sustainably available “Ideally, the computer code that is released will encompass all of the steps of computational analysis, including all data preprocessing steps, that have been described in this chapter. All aspects of the analysis need to be transparently reported.” 20 / 35
  • 21. Reproducible research Literate Programing (문학적 프로그래밍) Reproducible Research (재현가능한 연구) 21 / 35
  • 22. When issues of reproducibility arise "Remember that microarray analysis you did six months ago? We ran a few more arrays. Can you add them to the project and repeat the same analysis?" "The statistical analyst who looked at the data I generated previously is no longer available. Can you get someone else to analyze my new data set using the same methods (and thus producing a report I can expect to understand)" "Please write/edit the methods sections for the abstract/paper/grant proposal I am submitting based on the analysis you did several months ago." 22 / 35
  • 23. Typical work ow of many research projects First have an idea e.g. stopping distance correlate with speed ? 23 / 35
  • 25. 2. Do Some analysis(R/SPSS/SAS) 25 / 35
  • 26. All results(figures, tables) manually imported to Word 3. Write a report/paper(Word?Pages) 26 / 35
  • 27. This work ow is BROKEN 1. Collect and manage data(EXCEL) 2. Analysis (R/SPSS) 3. Writeup(WORD) 27 / 35
  • 28. What analysis is behind this figure? Did you account for [ooo] in the analysis? What dataset was used (e.g. final vs preliminary dataset)? Oops, there is an error in the data. Can you repeat the analysis? And update figures/tables in Word! As a coauthor/reader, I'd like to see the whole research process (how you arrived to that conclusion), rather than cooked manuscript with inserted tables/figures. Problems brought by the broken work ow 28 / 35
  • 29. RStudio allows us to x the disconnect Integrating Data management Data analysis Writing up results in a single dynamic document Reproducible research !! 29 / 35
  • 30. In literate programming, an analytical document is composed of a descriptive narrative “woven” together with software code and computed results. Advantages A single document both describes and performs the analysis Enforces reproducibility Let's make our project reproducible 30 / 35
  • 31. If error If spotting eror in data, or using different dataset... make changes in Rmarkdown and report will update automatically 31 / 35
  • 32. Data management fully documented (no more manual changes in Excel!) Analysis fully documented Automated reports can publish via Rpubs.com : http://www.rpubs.com/cardiomoon/19541 can share the project So... Main Advantages 32 / 35
  • 33. Summary Reproducible research is important as a minimum standard, particularly for studies that are difficult to replicate Infrastructure is needed for creating and distributing reproducible documents, beyond what is currently available There is a growing number of tools for creating reproducible documents 33 / 35
  • 34. 참고자료 의학논문작성을 위한 R통계와 그래프 R과 knitr를 활용한 데이타 연동형 문서 만들기 34 / 35
  • 35. Online Resources www.rstudio.com Web-R.org 온라인교육프로그램 Coursera : Reproducible Research (Jones Hopkins University) 35 / 35