SlideShare a Scribd company logo
1 of 9
Download to read offline
The	Genome	Assembly	
Problem
a	lightning	talk
Mark	Chang
The	Genome	Assembly	Problem
ACCTCAGAACCCCGCAGTCACGTAGCGTTTGTGGGTACCTCGTGTCTAGT
ACCTCAGAACCCCGCAGTCACGTAGCGTTTGTGGGTACCTCGTGTCTAGT
Fragmented	and	Sequenced
Finding	Overlaps
Recunstruction
CGTAGCGTTTGTGGGTACCTCAGAACCC
AACCCCGCAGTCACGTAG GTGGGTACCTCGTG
TCGTGTCTAGT
Genome
ACCTCAGAACCC
AACCCCGCAGTCACGTAG
GTGGGTACCTCGTGCGTAGCGTTTGTGGGT
TCGTGTCTAGT
Short-reads
Finding	Overlaps
• It	is	very	hard	to	find	the	overlaps	between	millions	of	short-reads
ACCTCAGAACCC
AACCCCGCAGTCACGTAG
GTGGGTACCTCGTGCGTAGCGTTTGTGGGT
TCGTGTCTAGT
Finding	Overlaps
• Using	de	Bruijn Graphs
AAB
CDC
ABC
BCC
CCD
CDA
DCC
AABCCDCCDA
graph	traversal
AABCCDCCDA
convert	to	de	Bruijn Graph
Using	de	Bruijn Graphs
• Convert	the	short-reads	into	k-mers
ACCTCAGAACCC AACCCCGCAGTCACGTAG
GAACC
AGAAC
CAGAA
ACCTC
CCTCA
CTCAG
TCAGA
AACCC
AACCC
ACGTA
CACGT
GCAGT
CGCAG
CCGCA
CCCGC
ACCCC
CCCCG
CGTAG
Using	de	Bruijn Graphs
• Build	de	bruijn graph	from	k-mers
GAACC
AGAAC
CAGAA
ACCTC
CCTCA
CTCAG
TCAGA
AACCC
GAACC AGAAC CAGAA
ACCTC CCTCA CTCAG TCAGA
AACCC
Using	de	Bruijn Graphs
• Build	de	bruijn graph	from	k-mers
GAACC AGAAC CAGAA
ACCTC CCTCA CTCAG TCAGA
AACCC
AACCC
ACGTA
CACGT
GCAGT
CGCAG
CCGCA
CCCGC
ACCCC
CCCCG
CGTAG
ACGTA CACGT GCAGT CGCAG
CCGCACCCGCACCCC CCCCG
CGTAG
Using	de	Bruijn Graphs
• Graph	traversal
ACCTCAGAACCCCGCAGTCACGTAG
GAACC AGAAC CAGAA
ACCTC CCTCA CTCAG TCAGA
AACCC
ACGTA CACGT GCAGT CGCAG
CCGCACCCGCACCCC CCCCG
CGTAG
Reference
• The	Genome	Assembly	Problem
• http://homolog.us/Tutorials/index.php?p=1.1&s=1

More Related Content

Viewers also liked

Boston Spark Meetup May 24, 2016
Boston Spark Meetup May 24, 2016Boston Spark Meetup May 24, 2016
Boston Spark Meetup May 24, 2016
Chris Fregly
 
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...
Chris Fregly
 
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Alex Pinto
 
Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,...
Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,...Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,...
Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,...
Chris Fregly
 

Viewers also liked (20)

[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊
 
高嘉良/Open Innovation as Strategic Plan
高嘉良/Open Innovation as Strategic Plan高嘉良/Open Innovation as Strategic Plan
高嘉良/Open Innovation as Strategic Plan
 
Machine Learning Essentials (dsth Meetup#3)
Machine Learning Essentials (dsth Meetup#3)Machine Learning Essentials (dsth Meetup#3)
Machine Learning Essentials (dsth Meetup#3)
 
Boston Spark Meetup May 24, 2016
Boston Spark Meetup May 24, 2016Boston Spark Meetup May 24, 2016
Boston Spark Meetup May 24, 2016
 
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...
 
TensorFlow 深度學習快速上手班--電腦視覺應用
TensorFlow 深度學習快速上手班--電腦視覺應用TensorFlow 深度學習快速上手班--電腦視覺應用
TensorFlow 深度學習快速上手班--電腦視覺應用
 
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
 
陸永祥/全球網路攝影機帶來的機會與挑戰
陸永祥/全球網路攝影機帶來的機會與挑戰陸永祥/全球網路攝影機帶來的機會與挑戰
陸永祥/全球網路攝影機帶來的機會與挑戰
 
Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...
Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...
Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...
 
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
 
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
 
NTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANsNTHU AI Reading Group: Improved Training of Wasserstein GANs
NTHU AI Reading Group: Improved Training of Wasserstein GANs
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial Networks
 
DRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive WriterDRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive Writer
 
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
 
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
 
Advanced Spark and TensorFlow Meetup May 26, 2016
Advanced Spark and TensorFlow Meetup May 26, 2016Advanced Spark and TensorFlow Meetup May 26, 2016
Advanced Spark and TensorFlow Meetup May 26, 2016
 
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUsOptimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs
 
Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,...
Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,...Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,...
Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,...
 
[系列活動] 智慧城市中的時空大數據應用
[系列活動] 智慧城市中的時空大數據應用[系列活動] 智慧城市中的時空大數據應用
[系列活動] 智慧城市中的時空大數據應用
 

Similar to The Genome Assembly Problem

chapter 2 Java at rupp cambodia
chapter 2 Java at rupp cambodiachapter 2 Java at rupp cambodia
chapter 2 Java at rupp cambodia
Sami Mut
 
Sequencing, Alignment and Assembly
Sequencing, Alignment and AssemblySequencing, Alignment and Assembly
Sequencing, Alignment and Assembly
Shaun Jackman
 
Ivan Erill: "Beyond the Regulon: reconstructing the SOS response of the human...
Ivan Erill: "Beyond the Regulon: reconstructing the SOS response of the human...Ivan Erill: "Beyond the Regulon: reconstructing the SOS response of the human...
Ivan Erill: "Beyond the Regulon: reconstructing the SOS response of the human...
Vall d'Hebron Institute of Research (VHIR)
 
Cable and configuration
Cable  and configurationCable  and configuration
Cable and configuration
Pheun vireak
 
GeneIndex: an open source parallel program for enumerating and locating words...
GeneIndex: an open source parallel program for enumerating and locating words...GeneIndex: an open source parallel program for enumerating and locating words...
GeneIndex: an open source parallel program for enumerating and locating words...
PTIHPA
 
ScienceShare.co.uk Shared Resource
ScienceShare.co.uk Shared ResourceScienceShare.co.uk Shared Resource
ScienceShare.co.uk Shared Resource
ScienceShare.co.uk
 

Similar to The Genome Assembly Problem (20)

Fast qPCR assay optimization and validation techniques for HTS
Fast qPCR assay optimization and validation techniques for HTSFast qPCR assay optimization and validation techniques for HTS
Fast qPCR assay optimization and validation techniques for HTS
 
CS176: Genome Assembly
CS176: Genome AssemblyCS176: Genome Assembly
CS176: Genome Assembly
 
Understanding Garbage Collection
Understanding Garbage CollectionUnderstanding Garbage Collection
Understanding Garbage Collection
 
introduction to metagenomics
introduction to metagenomicsintroduction to metagenomics
introduction to metagenomics
 
chapter 2 Java at rupp cambodia
chapter 2 Java at rupp cambodiachapter 2 Java at rupp cambodia
chapter 2 Java at rupp cambodia
 
Sequencing, Alignment and Assembly
Sequencing, Alignment and AssemblySequencing, Alignment and Assembly
Sequencing, Alignment and Assembly
 
Sequence Assembly
Sequence AssemblySequence Assembly
Sequence Assembly
 
Ivan Erill: "Beyond the Regulon: reconstructing the SOS response of the human...
Ivan Erill: "Beyond the Regulon: reconstructing the SOS response of the human...Ivan Erill: "Beyond the Regulon: reconstructing the SOS response of the human...
Ivan Erill: "Beyond the Regulon: reconstructing the SOS response of the human...
 
Lecture6.pptx
Lecture6.pptxLecture6.pptx
Lecture6.pptx
 
Cable and configuration
Cable  and configurationCable  and configuration
Cable and configuration
 
GeneIndex: an open source parallel program for enumerating and locating words...
GeneIndex: an open source parallel program for enumerating and locating words...GeneIndex: an open source parallel program for enumerating and locating words...
GeneIndex: an open source parallel program for enumerating and locating words...
 
Java GC, Off-heap workshop
Java GC, Off-heap workshopJava GC, Off-heap workshop
Java GC, Off-heap workshop
 
Bio info
Bio infoBio info
Bio info
 
Sage technology
Sage technologySage technology
Sage technology
 
Scaling Genomic Analyses
Scaling Genomic AnalysesScaling Genomic Analyses
Scaling Genomic Analyses
 
Bio animation
Bio animationBio animation
Bio animation
 
Gc algorithms
Gc algorithmsGc algorithms
Gc algorithms
 
ScienceShare.co.uk Shared Resource
ScienceShare.co.uk Shared ResourceScienceShare.co.uk Shared Resource
ScienceShare.co.uk Shared Resource
 
Ashg2017 workshop tg
Ashg2017 workshop tgAshg2017 workshop tg
Ashg2017 workshop tg
 
PLC
PLCPLC
PLC
 

More from Mark Chang

More from Mark Chang (20)

Modeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential EquationModeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential Equation
 
Modeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential EquationModeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential Equation
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the Weights
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the Weights
 
PAC Bayesian for Deep Learning
PAC Bayesian for Deep LearningPAC Bayesian for Deep Learning
PAC Bayesian for Deep Learning
 
PAC-Bayesian Bound for Deep Learning
PAC-Bayesian Bound for Deep LearningPAC-Bayesian Bound for Deep Learning
PAC-Bayesian Bound for Deep Learning
 
Domain Adaptation
Domain AdaptationDomain Adaptation
Domain Adaptation
 
NTU ML TENSORFLOW
NTU ML TENSORFLOWNTU ML TENSORFLOW
NTU ML TENSORFLOW
 
Applied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural NetworksApplied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural Networks
 
淺談深度學習
淺談深度學習淺談深度學習
淺談深度學習
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational Autoencoder
 
TensorFlow 深度學習快速上手班--深度學習
 TensorFlow 深度學習快速上手班--深度學習 TensorFlow 深度學習快速上手班--深度學習
TensorFlow 深度學習快速上手班--深度學習
 
TensorFlow 深度學習快速上手班--自然語言處理應用
TensorFlow 深度學習快速上手班--自然語言處理應用TensorFlow 深度學習快速上手班--自然語言處理應用
TensorFlow 深度學習快速上手班--自然語言處理應用
 
TensorFlow 深度學習快速上手班--機器學習
TensorFlow 深度學習快速上手班--機器學習TensorFlow 深度學習快速上手班--機器學習
TensorFlow 深度學習快速上手班--機器學習
 
Computational Linguistics week 10
 Computational Linguistics week 10 Computational Linguistics week 10
Computational Linguistics week 10
 
Neural Doodle
Neural DoodleNeural Doodle
Neural Doodle
 
TensorFlow 深度學習講座
TensorFlow 深度學習講座TensorFlow 深度學習講座
TensorFlow 深度學習講座
 
Computational Linguistics week 5
Computational Linguistics  week 5Computational Linguistics  week 5
Computational Linguistics week 5
 
Neural Art (English Version)
Neural Art (English Version)Neural Art (English Version)
Neural Art (English Version)
 
AlphaGo in Depth
AlphaGo in Depth AlphaGo in Depth
AlphaGo in Depth
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

The Genome Assembly Problem