The document summarizes the sequencing of the yeast Saccharomyces cerevisiae genome. Key points:
1) The yeast genome was sequenced between 1989 and 1996 in a collaborative effort that began with some 35 European laboratories and ended as an international joint effort. By 1996, the entire 12 megabase genome across 16 chromosomes had been sequenced.
2) The genome contains approximately 6,000 open reading frames that were annotated after sequencing. More than 30% of yeast genes are estimated to have homologues among human genes.
3) Sequencing involved creating ordered cosmid libraries, shotgun sequencing, and assembling overlapping sequences into contigs. Genes were identified and analyzed after full genome assembly.
Yeast Genome Sequencing
1. ISF College of Pharmacy, Moga
Ghal Kalan, GT Road, Moga - 142001, Punjab, INDIA
Internal Quality Assurance Cell (IQAC)
Yeast Genome
Ruchika Sharma, Assistant Professor, Dept. of Biotechnology
Website: www.isfcp.org
3. INTRODUCTION
Genome: The entire chromosomal genetic material of an organism.
Sequencing a genome: Determining the identity and order of nucleotides in the genetic material (usually DNA, sometimes RNA) of an organism.
Gene (DNA) → mRNA → Protein
4. Genomics: a discipline in genetics concerned with the study of the genomes of organisms.
The field includes efforts to determine the entire DNA sequence of organisms, genetic mapping, and the study of interactions between loci and alleles within the genome.
The yeast Saccharomyces cerevisiae (“baker’s yeast”) is probably the ideal eukaryotic microorganism for biological studies.
[Figure labels: classified in the kingdom Fungi; 1% of all fungal species]
5. History
The first genetic map of S. cerevisiae was published in 1949.
In 1989, it was decided to initiate a yeast sequencing project within the framework of the European Union biotechnology programmes.
Based on a network approach, some 35 European laboratories initially became involved in this enterprise.
6. In May 1992, the complete nucleotide sequence (315 kb) of an entire chromosome, yeast chromosome III, was published for the first time by 35 European laboratories.
In 1994, the sequences of two more chromosomes were published: chromosome II (820 kb) and chromosome XI (666 kb).
7. History (continued)
By the end of 1995, more than 50% of the yeast genome was expected to have been sequenced under the European Union project, with the entire sequence to follow by the end of 1996 through an international joint effort.
8. Basic problem
Genomes are large (typically millions or billions of base pairs).
Current technology can only reliably ‘read’ a short stretch, typically hundreds of base pairs.
9. Elements of a solution
Automation: over the past decade, the amount of hand labour in the ‘reads’ has been steadily and dramatically reduced.
Assembly: the ‘reads’ (sequences) are assembled algorithmically by computer programmes.
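The assembly idea can be illustrated with a toy example. The slides do not name the assembly software used by the project, so the sketch below is only an assumption-laden illustration: it uses a simple greedy overlap merge, assumes error-free reads from a single strand, and the names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap.

    Toy model only: assumes error-free reads that all come from one strand.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # nothing left to merge: the remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Three short reads that overlap by four bases each are merged into a single contig. Real reads contain errors and repeats, which is one reason the project built ordered clone libraries and physical maps before sequencing.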
11. Procedure
Sequencing of each chromosome started from a collection of overlapping plasmid or phage lambda clones that the DNA co-ordinator distributed to the contracting laboratories.
However, it soon became evident that ordered cosmid libraries were much more advantageous for large-scale sequencing.
12. The low number of clones required was an advantage in setting up ordered yeast cosmid libraries and in sorting out and mapping chromosome-specific sublibraries.
For example, a chromosome XI-specific sublibrary of 138 clones was sorted out of an unordered cosmid library by colony hybridization, using chromosome XI DNA purified by pulsed-field gel electrophoresis. The ‘nested chromosomal fragmentation’ approach was then applied to rapid sorting of these clones.
13. To facilitate sequencing and assembly of the sequences, contigs of overlapping cosmids and fine-resolution physical maps of the respective chromosomes were constructed first, by application of classical mapping methods (fingerprints, cross-hybridization) or by novel methods developed for this programme, such as site-specific chromosome fragmentation.
16. Sequencing Strategies
Two principal approaches were used to prepare subclones for sequencing:
(i) Generation of sublibraries by the use of a series of appropriate restriction enzymes or from nested deletions of appropriate subfragments made by exonuclease III;
(ii) Generation of shotgun libraries from whole cosmids or subcloned fragments by random shearing of the DNA.
Sequencing was carried out by the Sanger technique.
17. Sequence Analysis
The sequences were analysed by various algorithms, first as the individual laboratories submitted their data and again once the complete sequences were available.
18. The sequences have been interpreted using the following principles
(i) All intron splice site pairs were detected using specially defined patterns.
(ii) All open reading frames (ORFs) containing at least 100 contiguous sense codons and not contained entirely in a longer ORF on either DNA strand were listed (this included partially overlapping ORFs).
19. (iii) The two lists were merged and all intron splice site pairs occurring inside an ORF but in opposite orientation were disregarded.
(iv) Centromere and telomere regions were sought by comparison with previously characterized datasets of such elements, including the database entries provided in a continuously updated library.
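Principle (ii) can be sketched in a few lines of code. This is a minimal illustration, not the project's actual software: it assumes an ORF is a maximal run of sense (non-stop) codons in one of the six reading frames, does not require a start codon, ignores introns, and uses the hypothetical names `stop_free_runs` and `find_orfs`.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def stop_free_runs(seq: str, min_codons: int):
    """Yield (start, end) of maximal stop-free codon runs in the 3 forward frames.

    `end` is exclusive and does not include the terminating stop codon.
    """
    for frame in range(3):
        run_start = frame
        last = frame + (len(seq) - frame) // 3 * 3  # end of the last full codon
        for pos in range(frame, last, 3):
            if seq[pos:pos + 3] in STOPS:
                if (pos - run_start) // 3 >= min_codons:
                    yield run_start, pos
                run_start = pos + 3
        if (last - run_start) // 3 >= min_codons:
            yield run_start, last


def find_orfs(seq: str, min_codons: int = 100):
    """Return sorted (start, end, strand) tuples in forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    orfs = [(s, e, "+") for s, e in stop_free_runs(seq, min_codons)]
    # Map hits found on the reverse complement back to forward coordinates.
    orfs += [(n - e, n - s, "-")
             for s, e in stop_free_runs(reverse_complement(seq), min_codons)]
    # Drop any ORF contained entirely in a longer ORF on either strand.
    kept = []
    for s, e, strand in orfs:
        contained = any(
            s2 <= s and e <= e2 and (e2 - s2) > (e - s)
            for s2, e2, _ in orfs
        )
        if not contained:
            kept.append((s, e, strand))
    return sorted(kept)
```

Partially overlapping ORFs are kept, as the rule requires; only ORFs lying wholly inside a longer one are discarded. The default threshold of 100 codons follows the slide, and a smaller `min_codons` is convenient when experimenting with short test sequences.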
20. Searches for similarity of proteins to entries in the databanks were performed with FASTA and FLASH, in combination with the Protein Sequence Database of PIR-International and other public databases.
Protein signatures were detected using the PROSITE dictionary, as well as BLOCKS and PRODOM domains, whenever relevant for the interpretation of the query sequence.
21. Compositional analyses of the chromosomes (base composition, nucleotide pattern frequencies, GC profiles, ORF distribution profiles, etc.) were performed using GCG programmes. The GC content of ORFs was calculated with the CODONS algorithm.
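As an illustration of what a base-composition analysis involves, the sketch below computes GC content and a sliding-window GC profile. It is not the GCG or CODONS implementation, whose details the slides do not give; the function names, window size and step are assumptions chosen for the example.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T) in `seq`."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0  # empty sequence or only ambiguous symbols
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def gc_profile(seq: str, window: int = 1000, step: int = 500):
    """Return (window_start, GC %) pairs along the sequence.

    If the sequence is shorter than `window`, a single value for the
    whole sequence is returned.
    """
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, max(len(seq) - window, 0) + 1, step)
    ]


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))
    print(gc_profile("GGGGAAAA", window=4, step=4))
```

Applying `gc_content` separately to coding and non-coding segments is one simple way to compare the two kinds of region.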
22. This information was then compiled at the end of the sequencing project to annotate all genetic elements in the yeast genome.
24. Result
In 1996 the Saccharomyces Genome Project revealed the presence of more than 6000 open reading frames (ORFs) in the S. cerevisiae genome.
The goal of the Saccharomyces Genome Deletion Project was to generate as complete a set as possible of yeast deletion strains, with the overall goal of assigning function to the ORFs through phenotypic analysis of the mutants.
25. Result (continued)
The average ORF size is 1450 bp. The sizes of the majority of the open reading frames (ORFs) in yeast vary between 100 and 4000 codons.
Fewer than 1% of the ORFs are estimated to be shorter than 100 codons.
14.8% of the total base pairs are homologues among genes of unknown function, sometimes called ‘orphans’.
26. Result (continued)
Five different types of Ty elements that exhibit substantial homology to retroviruses and retrotransposons from plants and animals are present in the yeast genome.
The average base composition of yeast DNA is 38.4% (G+C).
The protein coding regions have a higher GC content on average (40.2%) than the non-coding regions (35.1%).
27. Result (continued)
The genome comprises 12,069,313 base pairs and 6,275 genes, compactly organized on 16 chromosomes. Only about 5,800 of these are believed to be true functional genes.
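These figures allow a rough check of how compact the genome is. The sketch below combines them with the average ORF size of 1450 bp quoted in the Result slides. It assumes that this average applies to whichever gene count is used, which the slides do not state, so the output is an estimate only.

```python
GENOME_BP = 12_069_313           # genome size quoted in the slides
ANNOTATED_GENES = 6_275          # annotated genes quoted in the slides
LIKELY_FUNCTIONAL_GENES = 5_800  # genes believed to be truly functional
AVERAGE_ORF_BP = 1_450           # average ORF size quoted in the slides


def bp_per_gene(genome_bp: int, genes: int) -> float:
    """Average spacing between genes, in base pairs per gene."""
    return genome_bp / genes


def coding_fraction(genome_bp: int, genes: int, average_orf_bp: int) -> float:
    """Estimated fraction of the genome covered by ORFs of average size."""
    return genes * average_orf_bp / genome_bp


if __name__ == "__main__":
    print(f"{bp_per_gene(GENOME_BP, ANNOTATED_GENES):.0f} bp per gene")
    for genes in (LIKELY_FUNCTIONAL_GENES, ANNOTATED_GENES):
        share = coding_fraction(GENOME_BP, genes, AVERAGE_ORF_BP)
        print(f"{genes} genes: about {share:.0%} coding")
```

Under these assumptions there is roughly one gene every 1.9 kb, and the estimated coding share is about 70% for 5,800 genes and about 75% for 6,275 genes.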
33. Conti…
With the completion of the yeast genome sequence, it became possible for the first time to define the proteome of a eukaryotic cell.
The term ‘proteome’ has been coined to describe the complete set of proteins synthesized by a living cell.
34. Comparison of the Yeast Genome with Other Genomes
The Human-Yeast Connection: It is estimated that more than 30% of the yeast genes have homologues among the human genes.
37. Conclusion
Sequence completed in April 1996.
12 megabases on 16 chromosomes.
About 6000 open reading frames.
Few introns (4%).
70% of the genome encodes proteins.
75-80% of genes are expressed.
43% of genes are functionally characterized.