Comparative genomics
Presented by
Arooba Baig
Fomaz Tariq
Genomics
Genomics is an area within genetics that concerns the sequencing and analysis of an
organism’s genome.
It involves the development and application of genetic mapping, sequencing, and computation
(bioinformatics) to analyze the genomes of organisms.
Sub-fields of genomics:
• Structural genomics: genetic and physical mapping of genomes.
• Functional genomics: analysis of gene function (and non-genes).
• Comparative genomics: comparison of genomes across species.
  • Includes structural and functional genomics.
  • Evolutionary genomics.
Comparative genomics
Comparative genomics is an exciting field of biological research in which
researchers use a variety of tools, including computer-based analysis, to
compare the complete genome sequences of different species.
It compares gene numbers, gene locations, and the biological functions of
genes in the genomes of different organisms, one objective being to
identify groups of genes that play a unique biological role in a particular
organism.
History
• Comparative genomics has its roots in the comparison of virus genomes in
the early 1980s.
• For example, small RNA viruses infecting animals (picornaviruses) and
those infecting plants (cowpea mosaic virus) were compared and turned
out to share significant sequence similarity and, in part, the order of their
genes.
• In 1986, the first comparative genomic study at a larger scale was
published, comparing the genomes of varicella-zoster virus and Epstein-Barr
virus, which contained more than 100 genes each.
Contd..
• The first complete genome sequence of a cellular organism, that of
Haemophilus influenzae Rd, was published in 1995.
• The second genome sequencing paper was of the small parasitic
bacterium Mycoplasma genitalium published in the same year.
• Saccharomyces cerevisiae, the baker's yeast, was the first eukaryote
to have its complete genome sequence published in 1996.
• After the publication of the roundworm Caenorhabditis elegans genome
in 1998 and the fruit fly Drosophila melanogaster genome in 2000,
Gerald M. Rubin and his team published a paper titled
"Comparative Genomics of the Eukaryotes", in which they compared the
genomes of the eukaryotes D. melanogaster, C. elegans, and S. cerevisiae,
as well as the prokaryote H. influenzae.
Related Terminologies
• Homology is the relationship between any two characters (such as two proteins that have similar
sequences) that have descended, usually through divergence, from a common ancestral
character.
• Homologues can be either orthologues or paralogues.
• Orthologues are homologues that have evolved from a common ancestral gene by
speciation. They usually have similar functions.
• Paralogues are homologues that were produced by duplication within a
genome. They have often evolved to perform different functions.
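The orthologue definition above is often put into practice with a "reciprocal best hit" heuristic: two genes from different genomes are treated as putative orthologues when each is the other's highest-scoring match. The slides do not name this heuristic, so the sketch below is only an illustration under that assumption. The gene names and similarity scores are hypothetical and stand in for the output of a similarity search.

```python
def best_hits(scores):
    """Map each query gene to its single highest-scoring subject gene."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (gene_a, gene_b) pairs that are each other's best hit."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


# Hypothetical similarity scores (higher means more similar).
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90,
          ("a2", "b2"): 180, ("a3", "b2"): 200}
b_vs_a = {("b1", "a1"): 245, ("b2", "a3"): 198, ("b2", "a2"): 175}

putative_orthologues = reciprocal_best_hits(a_vs_b, b_vs_a)
print(putative_orthologues)
```

Here a2 also matches b2, but b2's best hit is a3, so a2 is left unpaired; within one genome, a2 and a3 would be candidates for paralogues rather than orthologues of b2.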
Comparative Genomics Tools
Similarity search programs
• BLAST2 (Basic Local Alignment Search Tool)
• FASTA
• MUMmer (Maximal Unique Match) (comparisons and analyses at both
nucleic acid and protein level)
Other alignment programs
• DBA [DNA Block Aligner]
• Blastz
• BLAT
• AVID
• WABA [Wobble Aware Bulk Aligner]
• DIALIGN [DIagonal ALIGNment]
• SSAHA [Sequence Search and Alignment by Hashing Algorithm]
Contd..
Comparative gene prediction programs
• TWINSCAN
• DOUBLESCAN
• SGP-1
Regulatory region prediction
• ConSite
Visualization/sequence analysis programs
• Dot plot (e.g. Dotter)
• PipMaker (Percent Identity Plot)
• Alfresco
• VISTA (VISualization Tools for Alignments)
• ACT (Artemis Comparison Tool)
Comparative Genomics Tool
• The UCSC Genome Browser is an online genome
browser hosted by the University of California, Santa
Cruz.
Synteny Regions
Syntenic regions are regions of two genomes that show considerable similarity in
terms of sequence and conservation of the order of genes.
Syntenic genes are genes that are in the same relative position on two different
chromosomes.
Closely related species generally have a similar order of genes on their
chromosomes.
Synteny can be used to identify genes in one species based on their map
position in another.
Interactive DAGchainer Algorithm:
Tool for mining genome duplication and synteny
• Finding putative genes or regions of homology between two
genomes
• Identifying collinear sets of genes or regions of sequence
• Generating a dot plot of the results and coloring syntenic pairs.
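The step of identifying collinear sets of genes can be illustrated with a much simpler stand-in than DAGchainer itself: given homologous gene pairs as (position in genome A, position in genome B), find the longest chain in which both positions increase. This is a sketch only. It ignores the gap penalties, match scores, and reverse-strand chains that a real chaining tool would consider, and the positions used are hypothetical.

```python
def longest_collinear_chain(pairs):
    """Longest chain of (pos_a, pos_b) pairs increasing in both genomes.

    Simple O(n^2) dynamic programme, used here as a simplified
    illustration of collinearity (not DAGchainer's own scoring).
    """
    pairs = sorted(set(pairs))
    if not pairs:
        return []
    length = [1] * len(pairs)   # best chain length ending at each pair
    prev = [-1] * len(pairs)    # back-pointer to the previous pair in that chain
    for i, (a_i, b_i) in enumerate(pairs):
        for j in range(i):
            a_j, b_j = pairs[j]
            if a_j < a_i and b_j < b_i and length[j] + 1 > length[i]:
                length[i] = length[j] + 1
                prev[i] = j
    end = max(range(len(pairs)), key=lambda i: length[i])
    chain = []
    while end != -1:
        chain.append(pairs[end])
        end = prev[end]
    return chain[::-1]


# Hypothetical homologous gene pairs: (gene index in genome A, gene index in genome B).
homologous_pairs = [(1, 10), (2, 11), (3, 40), (4, 12), (5, 13), (6, 2)]
chain = longest_collinear_chain(homologous_pairs)
print(chain)
```

The pairs (3, 40) and (6, 2) fall off the main diagonal and are left out of the chain, which is how isolated matches are separated from a syntenic block.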
Comparative Genomics Tool
• Syntenic dot plot: Syntenic dot plots give biologists
valuable information about how organisms diverged from a
common ancestor.
• Biologists can look at one of these dot plots and see
where large sections of DNA have been deleted, inserted,
copied, or moved.
• The dot plots also depict how closely two
organisms are related through the quantity and linearity of
green dots over an entire genome.
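A dot plot can be sketched at a small scale by recording every position pair where two sequences share an exact k-mer. This is a minimal illustration, assuming exact word matches and toy sequences; genome-scale tools work on gene pairs or on alignments rather than raw k-mers.

```python
from collections import defaultdict


def dot_plot(seq_a, seq_b, k=4):
    """Return (i, j) pairs where seq_a[i:i+k] equals seq_b[j:j+k]."""
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    dots = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            dots.append((i, j))
    return dots


# Toy sequences: seq_b carries a two-base insertion relative to seq_a.
seq_a = "GATTACAGG"
seq_b = "GATTTTACAGG"
dots = dot_plot(seq_a, seq_b, k=4)
print(dots)
```

The dots start on the main diagonal and then continue on a diagonal shifted by two positions, which is the signature of an insertion in one sequence relative to the other.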
Sequence Similarity Search
The most frequently performed type of sequence comparison is the
sequence similarity search.
• Sequence comparisons that implicate function are widely used:
  • To determine whether a newly sequenced cDNA or genomic region encodes a gene
of known function.
  • To search for similar sequences in other species (or in the same species).
Contd..
• Search databases of DNA sequences
• Use computer algorithms to align sequences
• Don’t require perfect matches between sequences
• Most commonly used algorithms for homology searches:
  • BLAST
  • FASTA
BLAST
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between
sequences. The program compares nucleotide or protein sequences to sequence databases
and calculates the statistical significance of matches. BLAST can be used to infer functional
and evolutionary relationships between sequences as well as help identify members of gene
families.
General Databases Useful for Comparative
Genomics
• LocusLink/RefSeq:
http://www.ncbi.nih.gov/LocusLink/
• PEDANT - Protein Extraction, Description and Analysis Tool
http://pedant.gsf.de
• COGs - Clusters of Orthologous Groups (of proteins)
http://www.ncbi.nih.gov/COG/
• KEGG - Kyoto Encyclopedia of Genes and Genomes
http://www.genome.ad.jp/kegg/
• MBGD - Microbial Genome Database
http://mbgd.genome.ad.jp/
• GOLD - Genomes OnLine Database
http://wit.integratedgenomics.com/GOLD/
• TIGR - The Institute for Genomic Research
Comparative genomics of Parasites
Comparative genomic process
• Alignment of DNA sequences is the core process in comparative
genomics.
• An alignment is a mapping of the nucleotides in one sequence onto the
nucleotides in the other sequence, with gaps introduced into one or the
other sequence to increase the number of positions with matching
nucleotides.
• Several powerful alignment algorithms have been developed to align two
or more sequences.
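The definition of an alignment above can be made concrete with a small global alignment by dynamic programming (the Needleman-Wunsch approach). The slides do not single out an algorithm, so this is one standard choice shown as a sketch; the scoring values (match +1, mismatch -1, gap -1) are assumptions picked for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two bases
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, best_score = global_align("ACGT", "AGT")
print(aligned_a)
print(aligned_b)
print(best_score)
```

The gap character "-" marks positions where one sequence has no counterpart in the other, which is the "gaps introduced to increase the number of matching positions" described above.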
Methods for comparative genomics
• Comparative analysis of genome structure
• Comparative analysis of coding regions (exons)
• Comparative analysis of non-coding regions (e.g., introns)
Comparative analysis of genome structure
Analysis of the global structure of genomes, such as nucleotide
composition, syntenic relationships, and gene ordering, offers insight into the
similarities and differences between genomes.
This provides information on the organization and evolution of the
genomes and highlights the unique features of individual genomes.
The structure of different genomes can be compared at three levels:
• Overall nucleotide statistics,
• Genome structure at DNA level
• Genome structure at gene level.
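The first level, overall nucleotide statistics, usually starts with base composition and GC content. A minimal sketch, assuming a plain DNA string in which any character other than A, C, G, or T (for example N) is ignored:

```python
from collections import Counter


def nucleotide_stats(sequence):
    """Return base counts, counted length, and GC content of a DNA string."""
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    gc_content = (counts["G"] + counts["C"]) / total if total else 0.0
    return {"length": total, "counts": dict(counts), "gc_content": gc_content}


stats = nucleotide_stats("AACCGGTTNN")
print(stats)
```

Running the same function over two genomes gives directly comparable figures, which is the kind of overall statistic meant here.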
Comparison of genome structure at DNA level
Chromosomal breakage and exchange of chromosomal fragments are
common modes of gene evolution. They can be studied by comparing
genome structures at the DNA level.
• Identification of conserved synteny and genome rearrangement events
• Analysis of breakpoints
• Analysis of content and distribution of DNA repeats
Comparison of genome structure at gene level
Chromosomal breakage and exchange of chromosomal fragments
disrupt gene order.
Conservation of gene order therefore tends to decrease as the evolutionary distance between
genomes increases.
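One simple way to quantify disrupted gene order is to count breakpoints: pairs of genes that are neighbours in one genome but not in the other. The sketch below assumes both genomes carry the same genes on a single chromosome and ignores gene orientation; the gene names are hypothetical.

```python
def breakpoint_count(order_a, order_b):
    """Count neighbouring gene pairs in order_a that are not neighbours in order_b."""
    adjacent_in_b = {frozenset(pair) for pair in zip(order_b, order_b[1:])}
    return sum(1 for pair in zip(order_a, order_a[1:])
               if frozenset(pair) not in adjacent_in_b)


# Hypothetical gene orders: genome_b carries an inversion of the c-d-e segment.
genome_a = ["a", "b", "c", "d", "e", "f"]
genome_b = ["a", "b", "e", "d", "c", "f"]
breakpoints = breakpoint_count(genome_a, genome_b)
print(breakpoints)
```

A single inversion creates two breakpoints, one at each end of the inverted segment, so more rearrangements generally mean more breakpoints.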
Comparative analysis of coding regions
The analysis and comparison of coding regions starts with the gene
identification algorithm that is used to infer which portions of the genomic
sequence code for genes.
There are four basic approaches for gene identification.
Comparative analysis of coding regions
A number of algorithms have been used in comparative genomics
to aid function prediction of genes:
• Identification of gene-coding regions
• Comparison of gene content
• Comparison of protein content
• Comparative genome-based function prediction
Comparison of gene content
After the predicted gene set is generated, it is important to
compare the content of genes across genomes.
The first statistic to compare is the estimated total number of genes in each genome.
Other statistics that elucidate the similarities and differences between genomes include the percentage
of the genome that codes for genes, the distribution of coding regions across the
genome, average gene length, and codon usage.
Gene content is often compared using a pairwise sequence comparison tool such as BLASTN or
TBLASTX.
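The gene-content statistics listed above are simple to compute once a predicted gene set is available. The sketch below assumes a hypothetical dictionary of coding sequences and a known genome length; it reports gene count, mean gene length, coding fraction, and codon usage.

```python
from collections import Counter


def gene_content_summary(genes, genome_length):
    """Summarise a gene set given as {gene_name: coding_sequence}."""
    lengths = [len(seq) for seq in genes.values()]
    coding_bases = sum(lengths)
    return {
        "gene_count": len(genes),
        "mean_gene_length": coding_bases / len(genes) if genes else 0.0,
        "coding_fraction": coding_bases / genome_length,
    }


def codon_usage(coding_sequence):
    """Count codons, reading in frame from the first base; trailing bases are dropped."""
    usable = len(coding_sequence) - len(coding_sequence) % 3
    return Counter(coding_sequence[i:i + 3] for i in range(0, usable, 3))


# Hypothetical toy gene set in a hypothetical 100 bp genome.
genes = {"g1": "ATGAAATAA", "g2": "ATGGGGGGGTAA"}
summary = gene_content_summary(genes, genome_length=100)
usage = sum((codon_usage(seq) for seq in genes.values()), Counter())
print(summary)
print(usage)
```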
Comparison of protein content
A second level of analysis is to compare the set of
gene products (proteins) between the genomes, which has been termed
"comparative proteomics".
It is important to compare the protein content of critical pathways and
important functional categories across genomes.
Two widely used resources for pathways and functional categories are the
KEGG pathway database and the Gene Ontology (GO) hierarchy.
Interesting statistics to compare include:
• Level of sequence identity between orthologous pairs across genomes
• Paralogous pairs within a genome
• Number of replicated copies in corresponding paralog families
• Functions of the paralogs
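The level of sequence identity between an orthologous pair can be computed from an existing pairwise alignment. Several definitions of percent identity are in use; this sketch assumes one of them, counting identical residues over aligned columns that contain no gap. The protein fragments are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned protein fragments from an orthologous pair.
identity = percent_identity("MKT-LV", "MKSALV")
print(identity)
```

Using alignment length or the shorter sequence length as the denominator would give different values, so the definition should be stated alongside any reported figure.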
Comparative analysis of noncoding regions
Noncoding regions of the genome have gained a lot of attention in recent years
because of their predicted role in the regulation of transcription, DNA replication,
and other biological functions.
Insights into Genome Fluxes and the Processes of Evolution
• From an evolutionary biology perspective, whole-genome comparisons
provide molecular insights into the processes of evolution, including the
molecular events responsible for the variations and fluxes that occur throughout
a genome. These events include inversions, translocations,
deletions, duplications, and insertions.
The Impact of Comparative Genomics in Phylogenetic Analysis
• Schematic depiction of the phylogenetic position of Microsporidia based on small subunit ribosomal RNA (SSU
rRNA), as an early-branching eukaryote that evolved prior to the acquisition of mitochondria,
and its subsequent placement, based on a composite gene phylogeny, closer
to fungi. The latter placement has been confirmed by the complete genome sequence of the
microsporidian Encephalitozoon cuniculi, in which, despite the absence of mitochondria,
several mitochondrial genes could be observed.
Contd…
We have learned from homologous sequence alignment that the information that
can be gained by comparing two genomes together is largely dependent upon
the phylogenetic distance between them.
Phylogenetic distance is a measure of the degree of separation between two
organisms or their genomes on an evolutionary scale, usually expressed as the
number of accumulated sequence changes, number of years, or number of
generations.
The more distantly related two organisms are, the less sequence similarity or
shared genomic features will be detected between them.
• Thus, only general insights about classes of shared genes can be gathered by
genomic comparisons at very long phylogenetic distances (e.g., over one billion
years since their separation). Over such very large distances, the order of genes
and the signatures of sequences that regulate their transcription are rarely
conserved.
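Phylogenetic distance expressed as a number of accumulated sequence changes can be estimated from an alignment. The sketch below computes the observed proportion of differing sites (the p-distance) and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), to account for multiple changes at the same site. The slides do not name a substitution model, so Jukes-Cantor is an assumption chosen because it is the simplest one; the sequences are toy examples.

```python
import math


def p_distance(aligned_a, aligned_b):
    """Proportion of ungapped aligned sites at which two sequences differ."""
    sites = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(1 for x, y in sites if x != y) / len(sites)


def jukes_cantor_distance(p):
    """Estimated substitutions per site under the Jukes-Cantor model."""
    if p >= 0.75:
        raise ValueError("p must be below 0.75 for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGT", "ACGTACGA")
d = jukes_cantor_distance(p)
print(p, d)
```

The corrected distance is always at least as large as the observed proportion, and the gap between the two widens as sequences diverge, which is one reason comparisons over very long distances recover so little signal.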
How Are Genomes Compared?
• A simple comparison of the general features of genomes such as genome
size, number of genes, and chromosome number presents an entry point
into comparative genomic analysis.
• Data for several fully-sequenced model organisms is shown in Table 1.
Contd…
• For example, the tiny flowering plant Arabidopsis thaliana has a
smaller genome than the fruit fly Drosophila melanogaster
(157 million base pairs vs. 165 million base pairs, respectively),
yet it possesses nearly twice as many genes (25,000 vs. 13,000).
• In fact, A. thaliana has approximately the same number of genes as
humans (~25,000).
• Thus, a very early lesson learned in the "genomic era" is that genome
size does not correlate with evolutionary status, nor is the number of
genes proportionate to genome size.
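The point that gene number is not proportionate to genome size can be checked with a quick gene-density calculation. The sketch uses only the figures quoted on this slide; the function and variable names are illustrative.

```python
# Figures quoted on the slide: (genome size in base pairs, approximate gene count).
GENOMES = {
    "Arabidopsis thaliana": (157_000_000, 25_000),
    "Drosophila melanogaster": (165_000_000, 13_000),
}


def genes_per_megabase(genome_size_bp, gene_count):
    """Gene density expressed as genes per million base pairs."""
    return gene_count / (genome_size_bp / 1_000_000)


densities = {name: genes_per_megabase(size, genes)
             for name, (size, genes) in GENOMES.items()}
for name, density in densities.items():
    print(f"{name}: {density:.1f} genes per Mb")
```

With these figures the plant genome is roughly twice as gene-dense as the fly genome despite being slightly smaller.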
Contd..
• Figure 1 depicts a chromosome-level comparison of the human and mouse
genomes that shows the level of synteny between these two mammals.
• Synteny is a situation in which genes are arranged in similar blocks in
different species.
• The nature and extent of conservation of synteny differ substantially among
chromosomes.
• For example, the X chromosomes are represented as single, reciprocal
syntenic blocks.
• Human chromosome 20 corresponds entirely to a portion of mouse
chromosome 2, with nearly perfect conservation of order along almost the
entire length, disrupted only by a small central segment.
• Human chromosome 17 corresponds entirely to a portion of mouse
chromosome 11.
• Other chromosomes, however, show evidence of more extensive
interchromosomal rearrangement.
• Results such as these provide an extraordinary glimpse into the
chromosomal changes that have shaped the mouse and human genomes
since their divergence from a common ancestor 75–80 million years ago.
Comparing Human, Chimp, and Mouse Genomes
• The graphs below indicate the similarity between the human genome and those of the chimpanzee
and the mouse as they are mapped to identical locations in the human genome.
• Since the chimpanzee genome is closer in evolutionary time to the human genome, the chimp
chromosomes map very closely to human chromosomes.
• The mouse genome is more distant in evolutionary time from human, and thus its chromosomes do
not map as closely as do the chimp chromosomes.
• The white areas indicate areas of the human genome that either do not map well to the other
genome, or are areas of centromeres and telomeres where the genome sequence is unknown.
• Chromosome numbering is purely arbitrary, based upon early microscopic estimates of
chromosome length.
• The chimpanzee genome has 23 numbered chromosomes, the human genome has 22 numbered
chromosomes (chimp chromosomes 2a and 2b map to human chromosome 2), and the mouse genome
has 19 numbered chromosomes.
• The X and Y sex chromosomes have unique names, as well as other unique characteristics.
Mouse genome mapped
on the human genome
• This image shows the 34% of the
mouse genome that maps to identical
sequence in the human genome.
• The matching locations are jumbled,
indicating rearrangements of the two
genomes since their last common
ancestor, approximately 75 million
years before present.
• Data for this figure comes from
assemblies of the human and mouse
genomes available from the UCSC
Genome Browser in June 2006.
Chimpanzee genome mapped on the human
genome
• This image shows the 95% of the
chimpanzee genome that maps to identical
sequence in the human genome.
• The consistency of the color indication
demonstrates the close identity between the
two genomes since their last common
ancestor, approximately 5 million years
before present.
• Human chromosome 2 actually aligns
to two separate chimp chromosomes, now
called chr2a and chr2b and represented
here by the same color.
• Data for this figure comes from assemblies
of the human and chimpanzee genomes
available from the UCSC Genome Browser
in June 2006.
Benefits of comparative genomics
Comparative genomics identifies DNA sequences that have been "conserved" across species.
It pinpoints genes that are essential to life and highlights genomic signals
that control gene function across many species.
Comparative genomics also provides a powerful tool for studying evolution.
Applications
• Agriculture
• Biotechnology
• Zoology
• Evolutionary trees
• Drug discovery
Comparative Genomics in Drug Discovery
Comparative genomic studies shed light on the
pathogenesis of organisms, opening up opportunities for
therapeutic intervention and helping to understand and
identify disease genes.
One of the most important outcomes of comparative analyses at a
genome-wide scale is the ability to identify and develop novel
drug targets.
Comparative genomics in drug discovery programs. A flow chart
explaining how comparative genomics can facilitate drug discovery programs for
the discovery of new antimicrobials.
  • 4. History • Comparative genomics has a root in the comparison of virus genomes in the early 1980s. • For example, small RNA viruses infecting animals (picorna viruses) and those infecting plants ( cowpea mosaic virus) were compared and turned out to share significant sequence similarity and, in part, the order of their genes. • In 1986, the first comparative genomic study at a larger scale was published, comparing the genomes of varicella-zoster virus and Epstein- Barr virus that contained more than 100 genes each
  • 5. Contd.. • The first complete genome sequence of a cellular organism, that of Haemophilus influenzae Rd, was published in 1995. • The second genome sequencing paper was of the small parasitic bacterium Mycoplasma genitalium, published in the same year. • Saccharomyces cerevisiae, the baker's yeast, was the first eukaryote to have its complete genome sequence published, in 1996. • After the publication of the roundworm Caenorhabditis elegans genome in 1998 and the fruit fly Drosophila melanogaster genome in 2000, Gerald M. Rubin and his team published a paper titled "Comparative Genomics of the Eukaryotes", in which they compared the genomes of the eukaryotes D. melanogaster, C. elegans, and S. cerevisiae, as well as the prokaryote H. influenzae.
  • 6. Related Terminologies • Homology is the relationship between any two characters (such as two proteins with similar sequences) that have descended, usually through divergence, from a common ancestral character. • Homologues can be either orthologues or paralogues. • Orthologues are homologues that have evolved from a common ancestral gene by speciation. They usually have similar functions. • Paralogues are homologues that arose by duplication within a genome. They have often evolved to perform different functions.
  • 7. Comparative Genomics Tools Similarity search programs (comparisons and analyses at both nucleic acid and protein level) • BLAST2 (Basic Local Alignment Search Tool) • FASTA • MUMmer (Maximal Unique Match) Other alignment programs • DBA [DNA Block Aligner] • BLASTZ • BLAT • AVID • WABA [Wobble Aware Bulk Aligner] • DIALIGN [Diagonal ALIGNment] • SSAHA [Sequence Search and Alignment by Hashing Algorithm]
  • 8. Contd.. Comparative gene prediction programs: Twinscan, Doublescan, SGP-1. Regulatory region prediction: ConSite. Visualization/sequence analysis programs: dot plot (e.g. Dotter), PipMaker (Percent Identity Plot), Alfresco, VISTA (VISualization Tools for Alignments), ACT (Artemis Comparison Tool). S S Jena
  • 9. Comparative Genomics Tool  The UCSC Genome Browser is an on-line genome browser hosted by the University of California, Santa Cruz.
  • 10. Synteny Regions Syntenic regions are regions of two genomes that show considerable similarity in sequence and in conservation of gene order. Syntenic genes occupy the same relative position on two different chromosomes. Closely related species generally have a similar order of genes on their chromosomes. Synteny can be used to identify genes in one species based on map position in another.
  • 11. Comparative Genomics Tool Interactive DAGchainer algorithm: a tool for mining genome duplication and synteny  Finding putative genes or regions of homology between two genomes  Identifying collinear sets of genes or regions of sequence  Generating a dot plot of the results and coloring syntenic pairs.
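The step of identifying collinear sets of genes can be illustrated with a small sketch. This is not the DAGchainer implementation; it is a simplified greedy pass, and the function name `collinear_blocks`, the `max_gap` and `min_size` parameters, and the input format (pairs of gene-order positions) are all assumptions made for illustration. It only detects blocks in the forward orientation.

```python
def collinear_blocks(pairs, max_gap=2, min_size=3):
    """Group homologous gene pairs into collinear (same-order) blocks.

    pairs: iterable of (index_in_genome_a, index_in_genome_b) gene-order positions.
    A pair extends the current block when both indices advance by 1..max_gap.
    Only forward-oriented blocks are detected in this simplified sketch.
    """
    blocks, current = [], []
    for a, b in sorted(pairs):
        if current:
            prev_a, prev_b = current[-1]
            if 0 < a - prev_a <= max_gap and 0 < b - prev_b <= max_gap:
                current.append((a, b))
                continue
            # The run is broken: keep it only if it is long enough.
            if len(current) >= min_size:
                blocks.append(current)
        current = [(a, b)]
    if len(current) >= min_size:
        blocks.append(current)
    return blocks


# Hypothetical gene-order positions of homologous pairs in two genomes.
pairs = [(1, 10), (2, 11), (3, 12), (5, 14), (8, 3), (9, 4), (10, 5), (20, 40)]
for block in collinear_blocks(pairs):
    print(block)
```

Each printed block is a run of gene pairs whose order is preserved in both genomes; the isolated pair is discarded because it does not reach `min_size`.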
  • 12.
  • 13.  Syntenic dot plot: Syntenic dot plots give biologists valuable information about how organisms diverged from a common ancestor.  Biologists can look at one of these dot plots and see where large sections of DNA have been deleted, inserted, copied, or moved.  The dot plots also depict how closely two organisms are related, through the quantity and linearity of green dots over an entire genome.
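A dot plot can be sketched in a few lines. The version below is a minimal illustration that marks a dot wherever two sequences share an identical word of length `k`; the function names `dot_plot` and `render`, the word length, and the example sequences are assumptions, and real syntenic dot plots are built from gene or region matches across whole genomes rather than from short strings.

```python
def dot_plot(seq_a, seq_b, k=3):
    """Return the set of (i, j) positions where the k-mers starting at
    seq_a[i] and seq_b[j] are identical."""
    # Index every k-mer of seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j)
    dots = set()
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the dot plot as text: rows follow seq_a, columns follow seq_b."""
    lines = []
    for i in range(len(seq_a)):
        lines.append("".join("*" if (i, j) in dots else "." for j in range(len(seq_b))))
    return "\n".join(lines)


a = "ACGTACGTTT"
b = "ACGTACGAAA"
print(render(a, b, dot_plot(a, b)))
```

Runs of dots along a diagonal correspond to stretches that appear in the same order in both sequences; off-diagonal runs indicate repeated or moved segments.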
  • 14. Sequence Similarity Search The most frequently performed type of sequence comparison is the sequence similarity search  Sequence comparisons that implicate function are widely used:  To determine if newly sequenced cDNA or genomic region encodes gene of known function.  Search for similar sequence in other species (or in same species)
  • 15. Contd.. Homology searches  Search databases of DNA sequences  Use computer algorithms to align sequences  Don’t require perfect matches between sequences  Most commonly used algorithms:  BLAST  FASTA
  • 16. BLAST The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
  • 17.
  • 18. General Databases Useful for Comparative Genomics • LocusLink/RefSeq: http://www.ncbi.nih.gov/LocusLink/ • PEDANT - Protein Extraction Description Analysis Tool http://pedant.gsf.de • COGs - Clusters of Orthologous Groups (of proteins) http://www.ncbi.nih.gov/COG/ • KEGG - Kyoto Encyclopedia of Genes and Genomes http://www.genome.ad.jp/kegg/ • MBGD - Microbial Genome Database http://mbgd.genome.ad.jp/ • GOLD - Genomes OnLine Database http://wit.integratedgenomics.com/GOLD/ • TIGR – The Institute for Genomic Research Comparative genomics of Parasites
  • 19. Comparative genomic process  Alignment of DNA sequences is the core process in comparative genomics. An alignment is a mapping of the nucleotides in one sequence onto the nucleotides in the other sequence, with gaps introduced into one or the other sequence to increase the number of positions with matching nucleotides. Several powerful alignment algorithms have been developed to align two or more sequences
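The definition of an alignment as a mapping with gaps can be illustrated with the classic Needleman-Wunsch global alignment. The sketch below is one standard algorithm, not necessarily the one any particular tool in this deck uses; the function name `global_align` and the scoring values are assumptions.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")  # gap in b
            i -= 1
        else:
            out_a.append("-")  # gap in a
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = global_align("ACGT", "AGT")
print(score)
print(top)
print(bottom)
```

The gap character `-` marks positions where one sequence has no counterpart in the other, which is exactly the "gaps introduced to increase the number of matching positions" described above.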
  • 20. Methods for comparative genomics • Comparative analysis of genome structure • Comparative analysis of coding regions (exon) • Comparative analysis of non-coding regions (introns)
  • 21. Comparative analysis of genome structure Analysis of the global structure of genomes, such as nucleotide composition, syntenic relationships, and gene ordering, offers insight into the similarities and differences between genomes. This provides information on the organization and evolution of the genomes, and highlights the unique features of individual genomes. The structure of different genomes can be compared at three levels: • Overall nucleotide statistics • Genome structure at DNA level • Genome structure at gene level.
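The first level, overall nucleotide statistics, is simple to compute. The sketch below reports base counts, GC content, and dinucleotide (nucleotide pair) frequencies for a sequence; the function name `nucleotide_stats` and the returned dictionary layout are assumptions for illustration.

```python
from collections import Counter


def nucleotide_stats(seq):
    """Basic composition statistics for a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    # Only unambiguous bases contribute to GC content.
    total = sum(counts[base] for base in "ACGT")
    gc = (counts["G"] + counts["C"]) / total if total else 0.0
    # Overlapping dinucleotides: positions (0,1), (1,2), ...
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return {
        "length": len(seq),
        "counts": {base: counts[base] for base in "ACGT"},
        "gc_content": gc,
        "dinucleotides": dict(pairs),
    }


print(nucleotide_stats("ATGCGCGATATTAGC"))
```

Running the same function on two genomes and comparing the resulting dictionaries gives a first, coarse picture of how their composition differs.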
  • 22. Comparison of genome structure at DNA level Chromosomal breakage and exchange of chromosomal fragments are common modes of gene evolution. They can be studied by comparing genome structures at DNA level. • Identification of conserved synteny and genome rearrangement events • Analysis of breakpoints • Analysis of content and distribution of DNA repeats
  • 23. Comparison of genome structure at gene level Chromosomal breakage and exchange of chromosomal fragments cause disruption of gene order. Therefore, gene order correlates with the evolutionary distance between genomes.
  • 24. Comparative analysis of coding regions The analysis and comparison of the coding regions starts with the gene identification algorithm that is used to infer what portions of the genomic sequence actively code for genes. There are four basic approaches for gene identification
  • 25. Comparative analysis of coding regions A number of algorithms have been used in comparative genomics to aid function prediction of genes: • Identification of gene-coding regions • Comparison of gene content • Comparison of protein content • Comparative genome-based function prediction
  • 26. Comparison of gene content After the predicted gene set is generated, it is important to compare the content of genes across genomes. The first statistic to compare is the estimated total number of genes in a genome. Other statistics that elucidate the similarities and differences between the genomes include the percentage of the genome that codes for genes, the distribution of coding regions across the genome, average gene length, and codon usage. This is often done using a pairwise sequence comparison tool such as BLASTN or TBLASTX.
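Several of the gene-content statistics listed above can be computed directly from a gene annotation. The sketch below is a minimal illustration; the function names, the representation of genes as half-open `(start, end)` intervals, and the example values are all assumptions, not data from the slides.

```python
from collections import Counter


def codon_usage(cds):
    """Relative frequency of each codon in a coding sequence read in frame 0."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def coding_fraction(genes, genome_length):
    """Fraction of the genome covered by genes given as half-open (start, end)
    intervals; overlapping genes are counted once."""
    covered, last_end = 0, 0
    for start, end in sorted(genes):
        start = max(start, last_end)
        if end > start:
            covered += end - start
            last_end = end
    return covered / genome_length


def mean_gene_length(genes):
    """Average gene length in base pairs."""
    return sum(end - start for start, end in genes) / len(genes)


# Hypothetical annotation on a 1,000 bp toy genome.
genes = [(0, 100), (50, 150), (200, 300)]
print(coding_fraction(genes, 1000))
print(mean_gene_length(genes))
print(codon_usage("ATGAAAAAATAA"))
```

Computing these values for each genome and placing them side by side gives the kind of gene-content comparison the slide describes.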
  • 27. Comparison of protein content A second level of analysis is to compare the set of gene products (proteins) between the genomes, which has been termed "comparative proteomics". It is important to compare the protein contents in critical pathways and important functional categories across genomes. Two widely used resources for pathways and functional categories are the KEGG pathway database and the Gene Ontology (GO) hierarchy.
  • 28. • Interesting statistics to compare include • Level of sequence identity between orthologous pairs across genome • Paralogous pairs within genome, • Number of replicated copies in corresponding paralog families • Functions of the paralogs
  • 29. Comparative analysis of noncoding regions Noncoding regions of the genome have gained a lot of attention in recent years because of their predicted role in the regulation of transcription, DNA replication, and other biological functions.
  • 30. Insights into Genome Fluxes and the Processes of Evolution • From an evolutionary biology perspective, whole-genome comparisons provide molecular insights into the processes of evolution, including the molecular events responsible for the variations and fluxes that occur through a genome. These include processes such as inversions, translocations, deletions, duplications, and insertions.
  • 31. The Impact of Comparative Genomics in Phylogenetic Analysis  Schematic depiction of the phylogenetic position of Microsporidia: based on small subunit ribosomal RNA (SSU rRNA), it was placed as an early-branching eukaryote that evolved prior to the acquisition of mitochondria; a composite gene phylogeny subsequently placed it closer to fungi. The latter placement has been confirmed by the complete sequence of the microsporidian Encephalitozoon cuniculi, in which, despite the absence of mitochondria, several mitochondrial genes could be observed.
  • 32.
  • 33. Contd… We have learned from homologous sequence alignment that the information that can be gained by comparing two genomes together is largely dependent upon the phylogenetic distance between them. Phylogenetic distance is a measure of the degree of separation between two organisms or their genomes on an evolutionary scale, usually expressed as the number of accumulated sequence changes, number of years, or number of generations. The more distantly related two organisms are, the less sequence similarity or shared genomic features will be detected between them.  Thus, only general insights about classes of shared genes can be gathered by genomic comparisons at very long phylogenetic distances (e.g., over one billion years since their separation). Over such very large distances, the order of genes and the signatures of sequences that regulate their transcription are rarely conserved
  • 34. How Are Genomes Compared? • A simple comparison of the general features of genomes such as genome size, number of genes, and chromosome number presents an entry point into comparative genomic analysis. • Data for several fully-sequenced model organisms is shown in Table 1.
  • 35. Contd… • For example, while the tiny flowering plant Arabidopsis thaliana has a smaller genome than that of the fruit fly Drosophila melanogaster (157 million base pairs v. 165 million base pairs, respectively) • It possesses nearly twice as many genes (25,000 v. 13,000). • In fact A. thaliana has approximately the same number of genes as humans (~25,000). • Thus, a very early lesson learned in the "genomic era" is that genome size does not correlate with evolutionary status, nor is the number of genes proportionate to genome size.
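The arithmetic behind this comparison can be written out as gene density, using only the figures quoted on this slide. The dictionary layout and the function name `genes_per_mb` are assumptions for illustration.

```python
# Genome sizes and gene counts as quoted on the slide.
genomes = {
    "Arabidopsis thaliana": {"size_bp": 157_000_000, "genes": 25_000},
    "Drosophila melanogaster": {"size_bp": 165_000_000, "genes": 13_000},
}


def genes_per_mb(size_bp, genes):
    """Gene density in genes per million base pairs."""
    return genes / (size_bp / 1_000_000)


for name, g in genomes.items():
    print(f"{name}: {genes_per_mb(g['size_bp'], g['genes']):.1f} genes per Mb")
```

With these numbers the plant carries roughly twice as many genes per megabase as the fly, even though its genome is slightly smaller.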
  • 36. Contd.. • Figure 1 depicts a chromosome-level comparison of the human and mouse genomes that shows the level of synteny between these two mammals. • Synteny is a situation in which genes are arranged in similar blocks in different species. • The nature and extent of conservation of synteny differs substantially among chromosomes. • For example, the X chromosomes are represented as single, reciprocal syntenic blocks. • Human chromosome 20 corresponds entirely to a portion of mouse chromosome 2, with nearly perfect conservation of order along almost the entire length, disrupted only by a small central segment. • Human chromosome 17 corresponds entirely to a portion of mouse chromosome 11. • Other chromosomes, however, show evidence of more extensive interchromosomal rearrangement. • Results such as these provide an extraordinary glimpse into the chromosomal changes that have shaped the mouse and human genomes since their divergence from a common ancestor 75–80 million years ago.
  • 37. Comparing Human, Chimp, and Mouse Genomes  The graphs below indicate the similarity between the human genome and those of the chimpanzee and the mouse as they are mapped to identical locations in the human genome.  Since the chimpanzee genome is closer in evolutionary time to the human genome, the chimp chromosomes map very closely to human chromosomes  The mouse genome is more distant in evolutionary time from human, and thus its chromosomes do not map as closely as do the chimp chromosomes.  The white areas indicate areas of the human genome that either do not map well to the other genome, or are areas of centromeres and telomeres where the genome sequence is unknown.  Chromosome numbering is purely arbitrary, based upon early microscopic estimates of chromosome length.  The chimpanzee genome has 23 numbered chromosomes, the human genome has 22 numbered chromosomes (chimp chromosomes 2a and 2b map to human chromosome 2), the mouse genome has 19 numbered chromosomes.  The X and Y sex chromosomes have unique names, as well as other unique characteristics.
  • 38. Mouse genome mapped on the human genome • This image shows the 34% of the mouse genome that maps to identical sequence in the human genome. • The matching locations are jumbled, indicating rearrangements of the two genomes since their last common ancestor, approximately 75 million years before present. • Data for this figure comes from assemblies of the human and mouse genomes available from the UCSC Genome Browser in June 2006.
  • 39. Chimpanzee genome mapped on the human genome • This image shows the 95% of the chimpanzee genome that maps to identical sequence in the human genome. • The consistency of the color indication demonstrates the close identity between the two genomes since their last common ancestor, approximately 5 million years before present. • Human chromosome 2 actually aligns to two separate chimp chromosomes, now called chr2a and chr2b and represented here by the same color. • Data for this figure comes from assemblies of the human and chimpanzee genomes available from the UCSC Genome Browser in June 2006.
  • 40. Benefits of comparative genomics Identifying DNA sequences that have been "conserved" pinpoints genes that are essential to life and highlights genomic signals that control gene function across many species. Comparative genomics also provides a powerful tool for studying evolution. Applications • Agriculture • Biotechnology • Zoology • Evolutionary trees • Drug discovery
  • 41. Comparative Genomics in Drug Discovery Comparative genomic studies shed light on the pathogenesis of organisms, opening up opportunities for therapeutic intervention and helping to understand and identify disease genes. One of the most important outcomes of comparative analyses at a genome-wide scale is the ability to identify and develop novel drug targets.
  • 42. Comparative genomics in drug discovery programs. A flow chart diagram explaining how comparative genomics can facilitate drug discovery programs for the discovery of new antimicrobials

Editor's Notes

  1. The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
  2. The UCSC Genome Browser is an on-line genome browser.  It is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations.
  3. The Statistics main menu option allows you to calculate Nucleotide Composition, Nucleotide Pair Frequencies and Codon
  4. The distances are often placed on phylogenetic trees, which show the deduced relationships among the organisms
  5. If one is looking for antibacterial, antifungal, or antiprotozoal proteins to be used as targets, comparative genome analysis can reveal virulence genes, uncharacterized essential genes, species-specific genes, and organism-specific genes, while ensuring that the chosen genes have no homologues in humans.