2. • Biological databases are libraries of life sciences information, collected from scientific
experiments, published literature, high-throughput experiment technology, and
computational analysis.
• They contain information from research areas including genomics, proteomics,
metabolomics, microarray gene expression, and phylogenetics.
• Information contained in biological databases includes gene function, structure,
localization (both cellular and chromosomal), clinical effects of mutations as well as
similarities of biological sequences and structures.
4. • Primary databases: Experimental results are submitted directly into the database by
researchers, and the data are essentially archival in nature.
• Secondary databases: These comprise data derived from the results of analysing
primary data.
Primary nucleotide database | Secondary nucleotide database | Primary protein database | Secondary protein database
GenBank                     | UniGene                       | PIR                      | PROSITE
EMBL                        | Ensembl                       | SWISS-PROT               | PRINTS
DDBJ                        | EBI genomics                  | NRL-3D                   | TrEMBL
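The classification in the table can be held in a small lookup structure. The following is only an illustrative sketch: the names `DATABASE_CLASSES` and `databases_of` are hypothetical, and the entries simply mirror the table rather than any official registry.

```python
# Classification copied from the table above: name -> (molecule type, level).
# The dictionary and function names are illustrative assumptions.
DATABASE_CLASSES = {
    "GenBank": ("nucleotide", "primary"),
    "EMBL": ("nucleotide", "primary"),
    "DDBJ": ("nucleotide", "primary"),
    "UniGene": ("nucleotide", "secondary"),
    "Ensembl": ("nucleotide", "secondary"),
    "EBI genomics": ("nucleotide", "secondary"),
    "PIR": ("protein", "primary"),
    "SWISS-PROT": ("protein", "primary"),
    "NRL-3D": ("protein", "primary"),
    "PROSITE": ("protein", "secondary"),
    "PRINTS": ("protein", "secondary"),
    "TrEMBL": ("protein", "secondary"),
}


def databases_of(molecule, level):
    """Return the sorted names of databases in one cell group of the table."""
    return sorted(
        name
        for name, cls in DATABASE_CLASSES.items()
        if cls == (molecule, level)
    )


print(databases_of("nucleotide", "primary"))
```

Keeping the two attributes (molecule type and primary/secondary level) as a tuple makes it easy to filter on either one.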
5. NUCLEOTIDE DATABASES
• GenBank: (www.ncbi.nlm.nih.gov/Genbank/) maintained by the National Center
for Biotechnology Information (NCBI); contains nucleotide and amino acid sequences
and is part of the International Nucleotide Sequence Database Collaboration.
• EMBL: (www.ebi.ac.uk/embl/) The EMBL (European Molecular Biology
Laboratory) nucleotide sequence database is maintained by the European
Bioinformatics Institute (EBI); it incorporates, organises, and distributes nucleotide
sequences from public sources.
• DDBJ: (www.ddbj.nig.ac.jp) DNA Data Bank of Japan, a biological database that
collects DNA sequences.
• UniGene: an NCBI database of the transcriptome and thus, despite the
name, not primarily a database of genes.
• Ensembl and EBI genomics: contain data derived from EMBL-EBI.
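Records in these nucleotide databases are normally retrieved by accession number. As a hedged illustration, the sketch below builds a request URL for NCBI's E-utilities `efetch` service; the helper name `genbank_fetch_url` and the example accession are assumptions made for this example, and the code only constructs the URL without contacting the server.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_fetch_url(accession, rettype="gb"):
    """Build an efetch URL for one nucleotide record.

    rettype="gb" requests the GenBank flat-file format,
    rettype="fasta" requests the sequence in FASTA format.
    """
    params = {
        "db": "nuccore",      # the nucleotide database
        "id": accession,
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


# Example accession chosen only for illustration.
print(genbank_fetch_url("U49845", rettype="fasta"))
```

The resulting URL can be passed to any HTTP client; NCBI asks users to respect its published request-rate limits.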
6. PROTEIN DATABASES
• PIR: Protein Information Resource, maintained by the National Biomedical Research
Foundation (NBRF).
• SWISS-PROT: produced by EMBL, maintained by SIB (Swiss Institute of Bioinformatics).
• NRL-3D: produced by PIR.
• UniProt: (Universal Protein Resource) a central repository of protein data created by
combining the Swiss-Prot, TrEMBL, and PIR-PSD databases.
• PROSITE: consists of entries describing protein families, domains, and
functional sites, as well as the amino acid patterns and profiles found in them.
• PRINTS: protein fingerprint database; it provides both a detailed annotation
resource for protein families and a diagnostic tool for newly determined sequences.
• TrEMBL: (translated EMBL) a computer-annotated supplement of Swiss-Prot
that contains all the translations of EMBL nucleotide sequence entries not yet
integrated in Swiss-Prot.
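The amino acid patterns mentioned for PROSITE use a compact syntax that maps closely onto regular expressions. The sketch below is a minimal, assumption-laden converter (the function name `prosite_to_regex` is hypothetical) that handles only the common elements: `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, and `<` / `>` for sequence ends. It is not PROSITE's own scanning tool, and the test sequence is synthetic.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Anchors for the N-terminal and C-terminal ends of the sequence.
        element = element.replace("<", "^").replace(">", "$")
        # {ABC} means "any residue except A, B, or C".
        element = element.replace("{", "[^").replace("}", "]")
        # (n) or (n,m) is a repeat count; regex uses curly braces for this.
        element = element.replace("(", "{").replace(")", "}")
        # Lowercase x stands for any residue.
        element = element.replace("x", ".")
        parts.append(element)
    return "".join(parts)


pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
regex = prosite_to_regex(pattern)
print(regex)

# Synthetic sequence built only to exercise the pattern.
sequence = "MK" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "GG"
print(bool(re.search(regex, sequence)))
```

The order of the replacements matters: forbidden-residue braces are rewritten before repeat counts, so the curly braces introduced for repeats are never mistaken for exclusions.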
7. STRUCTURAL DATABASES
• PDB: (www.rcsb.org) Protein Data Bank, a database of three-dimensional structure
data for large biomolecules such as proteins and nucleic acids, obtained mainly by
X-ray crystallography and NMR spectroscopy.
• SCOP: The Structural Classification of Proteins (SCOP) database is a largely
manual classification of protein structural domains based on similarities of their
structures and amino acid sequences.
• CATH: Class, Architecture, Topology, Homology. The CATH Protein Structure
Classification database is a free, publicly available online resource that provides
information on the evolutionary relationships of protein domains.
8. • KEGG: Kyoto Encyclopedia of Genes and Genomes is a collection of databases
dealing with genomes, biological pathways, diseases, drugs, and chemical substances.
• SMPDB: The Small Molecule Pathway Database (SMPDB) is a comprehensive,
high-quality, freely accessible, online database containing more than 600 small
molecule (i.e., metabolic) pathways found in humans.
• BioCyc: The BioCyc database collection is an assortment of organism-specific
Pathway/Genome Databases (PGDBs). They provide reference information on the
genomes and metabolic pathways of thousands of organisms.