Biological databases are libraries of life-science information collected from scientific experiments, published literature, high-throughput experimental technologies and computational analysis.
2. Primary Database
It acts as a repository of raw data obtained directly from experiments.
For DNA (nucleotide/genome sequence databases):
1. GenBank
2. EMBL
For protein (protein sequence databases):
1. SWISS-PROT
2. PIR
3. Proteomic Databases
SWISS-PROT
Curated protein sequence database with:
• High level of annotation
• Low redundancy (redundancy means having multiple copies of the same data in the database)
• High level of integration with other databases
Each entry describes the:
• Sequence of the protein
• Structure of the protein
• Function of the protein
• Modifications in the protein
(Figure: SWISS-PROT linked to Databases 1, 2 and 3.)
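To make "low redundancy" concrete, here is a minimal sketch that merges entries whose sequences are exactly identical, keeping every accession. It is only a rough illustration under simple assumptions: real SWISS-PROT curation merges records describing the same protein, which is more than exact string matching, and the accessions below are made up.

```python
def remove_redundancy(entries):
    """Merge (accession, sequence) pairs whose sequences are identical.

    Returns a dict mapping each unique sequence to the accessions that share it.
    """
    merged = {}
    for accession, sequence in entries:
        key = sequence.strip().upper()  # ignore case and stray whitespace
        merged.setdefault(key, []).append(accession)
    return merged


# Hypothetical accessions: P1 and P2 are redundant copies of the same sequence.
entries = [("P1", "MKTAYIAK"), ("P2", "mktayiak"), ("P3", "GSHMLEDP")]
unique = remove_redundancy(entries)
print(len(entries), "entries ->", len(unique), "unique sequences")
```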
4. TrEMBL
Computer-annotated supplement to SWISS-PROT that contains all translations of EMBL nucleotide sequence entries not yet integrated into SWISS-PROT.
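Because TrEMBL entries are computer translations of nucleotide sequences, a short sketch of translation with the standard genetic code may help. This is a simplified assumption of how such a translation works: it reads one frame from the first base and stops at the first stop codon, whereas real pipelines translate annotated coding regions and may use organism-specific codon tables.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA (or RNA) string codon by codon until the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for start in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[start:start + 3], "X")  # X for ambiguous codons
        if residue == "*":  # stop codon ends the protein
            break
        protein.append(residue)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGA"))  # MAIVMGR
```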
Accessed through EMBL:
• SRS (Sequence Retrieval System) at the EBI
• Downloading the entire database as a single flat file
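The flat-file download is plain text in which each line starts with a two-letter code (ID, AC, SQ, ...), sequence lines follow the SQ line, and "//" ends an entry. The sketch below reads only ID, AC and the sequence and ignores every other line type; the sample entries are invented for illustration.

```python
# Made-up sample in the SWISS-PROT/UniProt flat-file layout (not real entries).
SAMPLE = "\n".join([
    "ID   TEST1_HUMAN             Reviewed;          16 AA.",
    "AC   P00001; Q00001;",
    "DE   RecName: Full=Hypothetical test protein 1;",
    "SQ   SEQUENCE   16 AA;",
    "     MKTAYIAKQR QISFVK",
    "//",
    "ID   TEST2_HUMAN             Reviewed;           8 AA.",
    "AC   P00002;",
    "SQ   SEQUENCE   8 AA;",
    "     GSHMLEDP",
    "//",
])


def parse_flat_file(text):
    """Return a list of {"id", "accessions", "sequence"} dicts, one per entry."""
    entries, current, in_sequence = [], None, False
    for line in text.splitlines():
        code, content = line[:2], line[5:].strip()  # 2-letter code, data from column 6
        if code == "ID":
            current = {"id": content.split()[0], "accessions": [], "sequence": ""}
            in_sequence = False
        elif code == "AC":
            current["accessions"] += [a.strip() for a in content.split(";") if a.strip()]
        elif code == "SQ":
            in_sequence = True  # the following lines hold the sequence itself
        elif code == "//":
            entries.append(current)  # "//" terminates an entry
            current, in_sequence = None, False
        elif in_sequence:
            current["sequence"] += line.replace(" ", "")
    return entries


for entry in parse_flat_file(SAMPLE):
    print(entry["id"], entry["accessions"], entry["sequence"])
```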
5. PIR (Protein Information Resource)
It produces the Protein Sequence Database (PSD) of functionally annotated protein sequences.
PIR + EBI (European Bioinformatics Institute) + SIB (Swiss Institute of Bioinformatics) = UniProt (Universal Protein Resource), the central resource of protein sequence and function information.
6. iProClass
Central point for exploring protein information: it integrates protein family, structure and function data for entries of PIR-PSD, SWISS-PROT and TrEMBL.
8. Secondary Database
• It contains results from the analysis of entries in primary databases.
• These databases are either manually curated or automatically generated.
• They contain information such as the conserved sequences, signature sequences and active-site residues of protein families, derived from a multiple sequence alignment of a set of related proteins.
(Figure: Protein-1, Protein-2 and Protein-3 from a primary database are aligned with a multiple sequence alignment tool; the result shows the conserved sequence, signature sequence and active-site residues.)
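As a small illustration of how conserved positions fall out of a multiple sequence alignment, the sketch below reports the columns where every sequence has the same residue. It assumes the alignment has already been produced by an alignment tool; the three aligned fragments are hypothetical stand-ins for Protein-1 to Protein-3.

```python
def conserved_columns(alignment):
    """0-based indices of columns where all sequences share the same residue (no gaps)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    return [
        col for col in range(length)
        if len({seq[col] for seq in alignment}) == 1 and alignment[0][col] != "-"
    ]


def consensus_line(alignment):
    """Show fully conserved residues and mark every other column with '.'."""
    keep = set(conserved_columns(alignment))
    return "".join(
        alignment[0][col] if col in keep else "."
        for col in range(len(alignment[0]))
    )


# Hypothetical aligned fragments standing in for Protein-1, Protein-2 and Protein-3.
aligned = ["MKC-LH", "MRC-LH", "MKCALY"]
print(conserved_columns(aligned))  # [0, 2, 4]
print(consensus_line(aligned))     # M.C.L.
```

Secondary databases go further (weighting similar residues, scoring motifs), but fully conserved columns are the simplest signal they build on.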
9. PROSITE
• PRO+SITE = Protein Site
• Database of short sequence patterns & profiles that characterize biologically significant sites in proteins.
• Database of protein families and domains.
(Figure: a new protein sequence is compared with protein sequences in the existing database to identify new sites with a significant property.)
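PROSITE patterns use a compact syntax: elements separated by "-", x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, and < or > for the N- or C-terminus. The sketch below converts such a pattern into a Python regular expression and scans a sequence, using the N-glycosylation pattern N-{P}-[ST]-{P} as the example. It covers only these basic elements (not, for instance, a ">" inside brackets), and the query sequence is made up.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern (e.g. 'N-{P}-[ST]-{P}') into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        starts_at_n_term = element.startswith("<")
        ends_at_c_term = element.endswith(">")
        element = element.strip("<>")
        # Separate an optional repeat count such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            part = "."                      # x = any amino acid
        elif core.startswith("{"):
            part = "[^" + core[1:-1] + "]"  # {..} = any amino acid except these
        else:
            part = core                     # single residue or [..] set: same in regex
        if repeat:
            part += "{" + repeat + "}"
        if starts_at_n_term:
            part = "^" + part
        if ends_at_c_term:
            part += "$"
        parts.append(part)
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return (1-based start, matched segment) for every, possibly overlapping, match."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))               # N[^P][ST][^P]
print(find_sites("N-{P}-[ST]-{P}", "MKNGSATNPSA"))     # [(3, 'NGSA')]
```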
10. Pfam
• P+fam = Protein Family
• Database of protein families and domains
• A protein domain is a conserved part of a given protein sequence and tertiary structure that can
evolve, function, and exist independently of the rest of the protein chain. Each domain forms a
compact three-dimensional structure and often can be independently stable and folded.
(Figure: a domain is defined from a multiple alignment of a set of sequences and matched against SWISS-PROT and TrEMBL entries.)
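The idea of building a model from a multiple alignment and then matching it against database sequences can be sketched with a simple position-specific scoring matrix. Note that this swaps in a simpler technique: Pfam itself uses profile hidden Markov models, which also model insertions and deletions. The pseudocount, the uniform background and the example alignment are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Per-column log-odds scores (bits) against a uniform background frequency."""
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_match(profile, sequence):
    """Slide the profile along the sequence; return (best score, 0-based start)."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids contribute 0.
        score = sum(column.get(aa, 0.0) for column, aa in zip(profile, window))
        best = max(best, (score, start))
    return best


# Hypothetical "domain" alignment and query sequence.
domain_alignment = ["GKST", "GKSS", "GRST"]
profile = build_profile(domain_alignment)
score, start = best_match(profile, "AAAGKSTAAA")
print(f"best window starts at {start} with score {score:.2f} bits")
```

Here the best window is the GKST segment at position 3, because each of its residues is frequent in the corresponding alignment column.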
11. Structural Database
PDB (Protein Data Bank): the main primary database for 3-D structures of macromolecules (proteins, RNA, DNA) determined by X-ray crystallography and NMR.
(Figure: Researcher → Publication → Database (PDB).)
12. PDB Entry
1. The contents of each entry are separated into polymers and non-polymers.
Polymers = Proteins/DNA/RNA
Non-polymers = water & small molecules
2. Polymers or non-polymers of identical chemical composition (e.g. proteins with
identical amino acid sequences or sugars with identical chemical formula, bond and
stereochemistry) are grouped together to form a distinct chemical entity.
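In the legacy fixed-column PDB file format, coordinates of standard polymer residues are stored in ATOM records and those of other groups such as water and ligands in HETATM records. The sketch below reads those records and splits them along that line. Treat the split as approximate: HETATM is also used for some modified residues inside polymers, and the newer mmCIF format is not handled here.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    record: str    # "ATOM" or "HETATM"
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_pdb_atoms(lines):
    """Read ATOM/HETATM records using the fixed columns of the legacy PDB format."""
    atoms = []
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, REMARK, TER, END, ...
        atoms.append(Atom(
            record=record,
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


def split_polymer_nonpolymer(atoms):
    """Rough split: ATOM ~ polymer residues, HETATM ~ water, ligands and other groups."""
    polymer = [a for a in atoms if a.record == "ATOM"]
    non_polymer = [a for a in atoms if a.record == "HETATM"]
    return polymer, non_polymer
```

For a downloaded entry (the file name is hypothetical), `with open("entry.pdb") as f: atoms = parse_pdb_atoms(f)` followed by `split_polymer_nonpolymer(atoms)` gives the two groups.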
In addition to the 3-dimensional (3D) atomic coordinates, a PDB entry can be explored for a variety of information:
• visualising interactive 3D structure;
• secondary structure, domains and folds present in the proteins;
• biological assembly or quaternary structures for the proteins and DNA/RNA;
• sequence information for all the proteins and nucleic acids that are present in the entry along with their
mapping to UniProt (protein) or GenBank (RNA);
• bound molecules or ligands and their environment;
• source and expression system of the proteins/nucleic acids;
• quality of the structure and experimental information;
• publication information.
13. CATH
Hierarchical classification of protein domain structures into four levels:
• Class: secondary structure content of the protein
• Architecture: orientation of the secondary structures
• Topology: topological connections between the secondary structures
• Homologous superfamily: clusters of proteins with highly similar structures
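Each CATH domain is labelled with a dotted code whose four numbers give its Class, Architecture, Topology and Homologous superfamily, so comparing codes level by level shows how closely two domains are related. The sketch below parses such codes; the four class names are CATH's main classes, while the example codes are only illustrative.

```python
# The four main CATH classes (class numbers 1-4).
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}
LEVELS = ("class", "architecture", "topology", "homologous_superfamily")


def parse_cath_code(code):
    """Split a dotted code such as '3.40.50.300' into its C, A, T and H numbers."""
    numbers = [int(part) for part in code.split(".")]
    if not 1 <= len(numbers) <= len(LEVELS):
        raise ValueError(f"not a CATH code: {code!r}")
    return dict(zip(LEVELS, numbers))


def shared_level(code_a, code_b):
    """Deepest level at which two domains fall in the same CATH node (None if none)."""
    deepest = None
    for level, a, b in zip(LEVELS, code_a.split("."), code_b.split(".")):
        if a != b:
            break
        deepest = level
    return deepest


# Illustrative codes only.
print(parse_cath_code("3.40.50.300"))
print(shared_level("3.40.50.300", "3.40.50.720"))  # topology
```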
14. SCOP
• Structural Classification of Proteins
• Provides a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.