2. Data in a biological database is stored in
sequence form or in molecular form
Each database has its own file format
Representation of data in a biological
database
Categories of file formats
Sequence database
Molecular database
3. GenBank flat-file format
FASTA format
Multi-FASTA format
GCG format
GCG-MSF format
EMBL format
Clustal format
Swiss-Prot format
4. GenBank flat-file format: used by NCBI
It is divided into three parts
Header: a brief, precise introductory part
Features: all genes in the sequence, location of genes in the genome,
protein products, coding genes, etc.
Sequence: begins with ORIGIN and ends with //, e.g.
ORIGIN atcgatcgatgcgctat //
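The three-part layout can be illustrated with a minimal Python sketch. It assumes that the feature table begins at a line starting with FEATURES, the sequence at a line starting with ORIGIN, and that // closes the record. The function name split_genbank and the demo record (DEMO0001) are made up for illustration.

```python
def split_genbank(record):
    """Split one GenBank flat-file record into header, features, sequence."""
    header, features, sequence = [], [], []
    section = header                      # everything before FEATURES is header
    for line in record.splitlines():
        if line.startswith("//"):         # record terminator
            break
        if line.startswith("FEATURES"):
            section = features
            continue
        if line.startswith("ORIGIN"):
            section = sequence
            continue
        section.append(line)
    # Sequence lines carry position numbers and spaces; keep letters only.
    seq = "".join(ch for line in sequence for ch in line if ch.isalpha())
    return {"header": header, "features": features, "sequence": seq}


# Hypothetical demo record, not taken from any real database entry.
example = """LOCUS       DEMO0001      17 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical demo record.
FEATURES             Location/Qualifiers
     source          1..17
ORIGIN
        1 atcgatcgat gcgctat
//
"""
parts = split_genbank(example)
print(parts["sequence"])
```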
5. HEADER
Locus
Definition
Accession
Version
Dbsource: dates for creation and modifications
Keywords
Source
Organism
References
Authors
Title
Journal
Medline ID: all published sources
Comment
FEATURES
SEQUENCE
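As a rough sketch of how the header keywords listed above could be read, the Python below collects keyword/value pairs. It assumes that each keyword (or indented sub-keyword such as ORGANISM) is the first word on its line, and that a line indented by 12 spaces continues the previous value. The function name parse_header and the demo lines are hypothetical.

```python
def parse_header(header_text):
    """Return a list of (keyword, value) pairs from a GenBank header."""
    fields = []
    for line in header_text.splitlines():
        if not line.strip():
            continue
        if line.startswith(" " * 12):
            # Continuation line: append to the value of the previous keyword.
            if fields:
                fields[-1][1] += " " + line.strip()
        else:
            parts = line.split(None, 1)   # keyword, then the rest of the line
            value = parts[1].strip() if len(parts) > 1 else ""
            fields.append([parts[0], value])
    return [tuple(f) for f in fields]


# Hypothetical demo header; the values are invented for illustration.
demo_header = "\n".join([
    "LOCUS       DEMO0001 17 bp DNA linear SYN 01-JAN-2000",
    "DEFINITION  Hypothetical demo record",
    " " * 12 + "spanning two lines.",
    "ACCESSION   DEMO0001",
    "SOURCE      synthetic construct",
    "  ORGANISM  synthetic construct",
])
fields = parse_header(demo_header)
for key, value in fields:
    print(key, "=", value)
```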
9. FASTA format: one-line header
Starts with > followed by the name of the gene
Sequence of the gene or protein follows
Blank spaces, paragraph marks, and numerals
are all ignored
Asterisk (*) at the end
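The FASTA rules above can be sketched in Python as follows. This is a minimal reader for a single record, assuming that every non-letter character in the sequence lines (spaces, line breaks, digits, the closing asterisk) can be dropped. The function name read_fasta_record and the demo record are made up.

```python
def read_fasta_record(text):
    """Read one FASTA record and return (name, sequence)."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with '>'")
    name = lines[0][1:].strip()           # header text after '>'
    body = "".join(lines[1:])
    # Keep letters only: spaces, digits, and the trailing '*' are dropped.
    seq = "".join(ch for ch in body if ch.isalpha())
    return name, seq


name, seq = read_fasta_record(">demo_gene hypothetical\nATCG ATCG 10\nATGC*\n")
print(name)
print(seq)
```

Note that this letters-only filter would also drop gap characters such as '-', so it suits unaligned sequences only.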
12. Multi-FASTA: an aggregation of FASTA records as described
above
Multiple sequences follow one after the other
in a single file
Accepted by several programs, e.g.
Clustal W
Multalin
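A multi-FASTA file can be read by starting a new record at every > line. The Python sketch below shows one way to do that; the function name read_multi_fasta and the two demo records are hypothetical.

```python
def read_multi_fasta(text):
    """Yield (name, sequence) for each record in a multi-FASTA string."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:          # emit the record collected so far
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line and name is not None:
            chunks.append(line.rstrip("*"))   # drop a closing asterisk
    if name is not None:                  # emit the last record
        yield name, "".join(chunks)


records = list(read_multi_fasta(">seq1\nATCG\nGGTA\n>seq2\nMKV*\n"))
for rec_name, rec_seq in records:
    print(rec_name, rec_seq)
```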
15. GCG: Genetics Computer Group
The first line states the sequence type:
!!NA_SEQUENCE 1.0
!!AA_SEQUENCE 1.0
A simple format that gives just
the sequence of the gene or
protein
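Because the first line identifies the content, a program can decide how to treat a GCG file before reading the rest. The Python sketch below assumes the nucleic-acid tag is spelled !!NA_SEQUENCE and the protein tag !!AA_SEQUENCE; the function name gcg_sequence_type is made up.

```python
def gcg_sequence_type(first_line):
    """Return the sequence type declared on the first line of a GCG file."""
    line = first_line.strip()
    if line.startswith("!!NA_SEQUENCE"):
        return "nucleic acid"
    if line.startswith("!!AA_SEQUENCE"):
        return "protein"
    return None                           # no recognised type tag


print(gcg_sequence_type("!!NA_SEQUENCE 1.0"))
print(gcg_sequence_type("!!AA_SEQUENCE 1.0"))
```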
17. GCG-MSF format: multiple sequences
Sequence names
Sequences
Alignment
The word PileUp indicates that the file contains
multiple sequences
The mandatory keyword MSF in the file
shows that it is a GCG-MSF file and not a plain
GCG file
Comments are terminated with //
2 consecutive blank lines
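The layout described above can be sketched as a small Python reader. It assumes a simplified MSF file: an MSF: line in the header, one Name: line per sequence, a // divider, then alignment blocks whose lines start with a sequence name. The function name read_msf and the demo file (including its Check and Weight values) are invented for illustration.

```python
def read_msf(text):
    """Return [(name, aligned_sequence), ...] from a simplified MSF string."""
    lines = text.splitlines()
    if not any("MSF:" in line for line in lines):
        raise ValueError("no MSF: line found; not a GCG-MSF file")
    names, seqs, in_alignment = [], {}, False
    for line in lines:
        stripped = line.strip()
        if stripped == "//":              # end of the header part
            in_alignment = True
        elif not in_alignment and stripped.startswith("Name:"):
            name = stripped.split()[1]
            names.append(name)
            seqs[name] = ""
        elif in_alignment and stripped:
            parts = stripped.split()
            if parts[0] in seqs:          # skip coordinate lines
                seqs[parts[0]] += "".join(parts[1:])
    return [(n, seqs[n]) for n in names]


# Hypothetical demo alignment; all numbers are placeholders.
demo_msf = """PileUp

 MSF: 8  Type: N  Check: 0  ..

 Name: seqA  Len: 8  Check: 0  Weight: 1.00
 Name: seqB  Len: 8  Check: 0  Weight: 1.00

//

seqA  ATCG
seqB  AT.G

seqA  GGTA
seqB  GCTA
"""
alignment = read_msf(demo_msf)
for aln_name, aln_seq in alignment:
    print(aln_name, aln_seq)
```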
19. EMBL format: sequence format of the European Molecular
Biology Laboratory database
Starts with an ID (identification) line
Ends with // as terminator
Each line type has its own format
Used to record various forms of data,
e.g. DNA, RNA, gene, protein, etc.
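A minimal Python sketch of reading such a record follows. It assumes each line begins with a two-letter line code followed by its data, that XX lines are empty spacers, that the sequence follows the SQ line, and that // ends the record. The function name read_embl_record and the demo entry are hypothetical.

```python
def read_embl_record(text):
    """Return (fields, sequence) for one EMBL-style record."""
    if not text.startswith("ID"):
        raise ValueError("an EMBL record must start with an ID line")
    fields, seq_lines, in_seq = {}, [], False
    for line in text.splitlines():
        if line.startswith("//"):         # record terminator
            break
        if in_seq:
            seq_lines.append(line)
            continue
        code, value = line[:2], line[5:].strip()
        if code == "SQ":                  # sequence data starts on the next line
            in_seq = True
        elif code.strip() and code != "XX":
            fields.setdefault(code, []).append(value)
    # Drop spaces and position numbers from the sequence lines.
    seq = "".join(ch for l in seq_lines for ch in l if ch.isalpha())
    return fields, seq


# Hypothetical demo entry, not a real database record.
demo_embl = """ID   DEMO0001; SV 1; linear; genomic DNA; STD; SYN; 17 BP.
XX
AC   DEMO0001;
XX
DE   Hypothetical demo record.
XX
SQ   Sequence 17 BP;
     atcgatcgat gcgctat                                        17
//
"""
embl_fields, embl_seq = read_embl_record(demo_embl)
print(sorted(embl_fields))
print(embl_seq)
```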
26. Several software tools have been designed by … ?
The aim of these tools is to convert
one sequence format
into another
Some of the tools widely used for sequence
interconversion are:
ReadSeq
GCG
SeqVerter
Seqret
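To give a feel for what format interconversion involves, the Python sketch below rewrites one FASTA record as a GenBank-style ORIGIN block. It is only an illustration and does not reproduce how any of the tools listed above work. It assumes the ORIGIN layout of a right-aligned position number followed by groups of 10 letters, 6 groups per line; the function name fasta_to_origin is made up.

```python
def fasta_to_origin(fasta_text):
    """Convert one FASTA record to (name, GenBank-style ORIGIN block)."""
    lines = fasta_text.strip().splitlines()
    name = lines[0][1:].strip()           # header text after '>'
    seq = "".join(ch for ch in "".join(lines[1:]) if ch.isalpha()).lower()
    out = ["ORIGIN"]
    for start in range(0, len(seq), 60):  # 60 letters per line
        chunk = seq[start:start + 60]
        groups = [chunk[i:i + 10] for i in range(0, len(chunk), 10)]
        out.append(f"{start + 1:>9} " + " ".join(groups))
    out.append("//")                      # record terminator
    return name, "\n".join(out)


name, block = fasta_to_origin(">demo_gene\nATCGATCGATGCGCTAT\n")
print(name)
print(block)
```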
27. ReadSeq: developed by Dr. D. G. Gilbert
Automated conversion
18 file formats are supported and
can be interconverted