BioSMACK: a Linux Live CD for Analysis of Genome-Wide Association

IEEE BIBM 2010 Workshop?
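[Figure: location labels: Osong; Guangzhou; Hong Kong; university; BGI; airport; accommodation; Hong Kong Island; Hung Hom Station]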
[Figure: population labels: Northern Han; HapMap Chinese; Singapore Chinese; Hong Kong Chinese]
121 samples - Chinese University of Hong Kong

What is a Genome-Wide Association Study (GWAS)?

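[Figure: timeline of genomics milestones, 1990 to 2011. The alignment of events to years was lost in extraction, so the events are listed without dates unless the slide gave one.]
• HGP started
• HGP completed
• Illumina founded
• Affymetrix/Illumina SNP chips developed
• HapMap completed (2002~)
• KHapMap completed (2003~)
• First GWAS (age-related macular degeneration), Science
• Illumina, Infinium whole-genome genotyping (100,000 markers)
• 23andMe founded
• Genome Center of the National Institute of Health established
• Joined the Genome Center of the National Institute of Health
• Science, Breakthrough of the Year: Human Genetic Variation
• KARE project started
• Venter
• Watson, Yang Huanming
• Dr. Seong-Jin Kim's whole genome completed
• 1000 Genomes Project started
• PGP-10 data released
• Nature Genetics, Korean GWAS results published
• Science, migration routes of Koreans
• Seoul National University, Korean whole-genome paper in Nature
• KAREBrowser developed
• Venter, paper on DTC services published
• 904 published GWAS for 165 traits
• Genome Research Foundation, Korean Genome Project launched
• Public personal genomes released (e.g. genomeunzipped)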
1 Analyzing millions of genotypes requires substantial computing
power and highly skilled specialists who can handle large data sets
and multi-step analyses
2 Various software packages (e.g. PLINK, Eigensoft,
STRUCTURE and SnpMatrix) have been developed
for GWAS
3 Researchers often run into problems when
compiling and installing this software and when configuring
environment parameters and library dependencies
Problem solved??

What is a Linux Live CD?

• Linux is a free, open-source operating system
• Many GWA software packages support Linux
• A Linux live CD is a customized Linux system that boots from a
  CD or USB flash drive
• Developers can build Linux live CDs for their own
  purposes (e.g. biology, chemistry, physics, games)
• For biological data analysis: BioLinux, Open
  Discovery, GRIMP, BioConductorBuntu and PhyLIS
• GWAS methods are developing rapidly, so there is a
  need for a live CD focused on GWAS

How is BioSMACK implemented?

• Based on open-source software (free to use and
  redistribute under the GNU General Public License)
• Based on the Ubuntu Linux distribution (v5.5)
• Ubuntu is among the most popular Linux distributions
• GWA software comes pre-compiled, installed and
  configured
• Command-line and Java Swing-based GUI front ends for running
  the GWA software
• A user manual and example data are also included
• Call genotypes from genome-wide SNP chip data
• Convert raw genotype data to PLINK binary format
• Detect population stratification
• Run association analysis with PLINK
• Estimate the genotypes of SNPs that were not
  observed in the GWAS (imputation)
• Run meta-analysis for two-sample comparisons
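
The conversion and association steps above can be sketched as PLINK command lines. This is a minimal sketch, not BioSMACK's own scripts: the file prefixes `mydata`, `mydata_bin` and `assoc_result` and the two helper functions are hypothetical, and it assumes a `plink` (1.x) executable on the PATH with input in PED/MAP text format. The code only builds the argument lists; passing one to `subprocess.run` would execute it.

```python
import shlex


def plink_make_bed(ped_prefix: str, out_prefix: str) -> list[str]:
    """Build a PLINK call that converts PED/MAP text files to binary BED/BIM/FAM."""
    return ["plink", "--file", ped_prefix, "--make-bed", "--out", out_prefix]


def plink_assoc(bed_prefix: str, out_prefix: str) -> list[str]:
    """Build a PLINK call for a basic case/control association test."""
    return ["plink", "--bfile", bed_prefix, "--assoc", "--out", out_prefix]


# Hypothetical file prefixes, for illustration only.
convert_cmd = plink_make_bed("mydata", "mydata_bin")
assoc_cmd = plink_assoc("mydata_bin", "assoc_result")

# Show the shell-equivalent command lines without running anything.
for cmd in (convert_cmd, assoc_cmd):
    print(shlex.join(cmd))
```

Keeping each command as an argument list rather than a single string avoids shell quoting problems when file prefixes contain spaces.
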
• PLINK
• SnpMatrix
• EIGENSTRAT
• STRUCTURE
• RMETA
• METAL
• IMPUTE
• MACH
• HTML-based:
  table of contents,
  command descriptions,
  commands for running the example data

How to install BioSMACK?

1   Download the BioSMACK ISO image file (about 1 GB),
    freely available at ksnp.cdc.go.kr/biosmack

2   Burn the ISO image to a CD/DVD, or

    write the ISO image to a USB flash drive

3   Either install to the hard disk (erasing the previous
    operating system), or

    run without installing (boot from the CD or USB
    flash drive without making changes to the
    underlying operating system)

Results and Future Work

1 Useful for education and for simple on-the-fly analysis,
with no installation or configuration

2 BioSMACK was used on various laptops and
netbooks at the 5th workshop of the Asian Institute in
Statistical Genetics and Genomics

3 A fully functional GWAS research environment can
be set up on any computer within a couple of hours
1 Cloud computing
Computing with resources acquired on demand

2 Cluster computing
Support for parallel jobs through a job scheduler (e.g. Sun
Grid Engine, OpenPBS, Torque)

3 Parallel software
High-performance, parallel, on-demand GWAS analysis
A BioSMACK AMI (Amazon Machine Image) will be
provided for cloud computing
Parallel job scripts will be provided for HPC
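
One way the planned parallel job scripts could look is a split by chromosome: one PLINK run per autosome, each of which a scheduler or a local process pool can execute independently. This is a minimal sketch under stated assumptions. The per-chromosome split, the file prefix `mydata_bin` and the helper `per_chromosome_jobs` are illustrative choices, not the authors' design; `--chr` is PLINK's chromosome filter.

```python
def per_chromosome_jobs(bed_prefix: str, chromosomes=range(1, 23)) -> list[list[str]]:
    """Return one PLINK association command per chromosome."""
    jobs = []
    for chrom in chromosomes:
        jobs.append([
            "plink", "--bfile", bed_prefix,
            "--chr", str(chrom),                 # restrict the run to one chromosome
            "--assoc",
            "--out", f"{bed_prefix}.chr{chrom}",  # separate output per chromosome
        ])
    return jobs


# Hypothetical binary file prefix, for illustration only.
jobs = per_chromosome_jobs("mydata_bin")
print(len(jobs), "independent jobs")
print(" ".join(jobs[0]))
```

Each command could then be written into its own job script and submitted with the scheduler's `qsub`, or run locally through `subprocess` with a worker pool.
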