Presentation on Biological Databases by Elufer Akram, University of Science and Technology, Meghalaya, BBT 5th Semester
1. By: Elufer Akram (14/BBT/06)
University of Science and Technology, Meghalaya
2. What is a Database?
Database Architecture
Variants of Biological Databases
Nucleotide Sequence Databases
GenBank
NCBI
DDBJ
Protein Sequence Databases
PDB (Protein Data Bank)
TrEMBL, PIR, UniProt
Collaboration
Main Objectives of Biological Databases
3. A database is a convenient system to properly
store, search and retrieve any type of data.
A database makes it easy to handle and share
large amounts of data and supports large-scale
analysis through easy access and data
updating. Furthermore, databases link
information drawn from various sources of knowledge
about the subject under consideration.
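The store, search, retrieve and update operations described above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name, column names and the two records are made up for illustration only and do not correspond to any real biological database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding two made-up sequence records."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per record, keyed by a demo accession.
    conn.execute(
        "CREATE TABLE records ("
        "accession TEXT PRIMARY KEY, description TEXT, sequence TEXT)"
    )
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [
            ("DEMO0001", "made-up example gene A", "ATGGCCTAA"),
            ("DEMO0002", "made-up example gene B", "ATGAAATAG"),
        ],
    )
    conn.commit()
    return conn


def find_by_accession(conn, accession):
    """Retrieve one record as a tuple, or None if it is not stored."""
    return conn.execute(
        "SELECT accession, description, sequence FROM records "
        "WHERE accession = ?",
        (accession,),
    ).fetchone()


def update_description(conn, accession, new_description):
    """Update the description of an existing record."""
    conn.execute(
        "UPDATE records SET description = ? WHERE accession = ?",
        (new_description, accession),
    )
    conn.commit()


if __name__ == "__main__":
    db = build_demo_db()
    print(find_by_accession(db, "DEMO0001"))
    update_description(db, "DEMO0001", "revised description")
    print(find_by_accession(db, "DEMO0001"))
```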
4. Biological databases are libraries of life sciences
information, collected from scientific
experiments, published literature, high-throughput
experimental technology and
computational analysis. They contain information
from genomics, proteomics and microarray gene
expression.
Information contained in biological databases
includes gene function, structure, localization (both
cellular and chromosomal), biological sequences
and structures.
10. Primary databases are the repositories used to
store nucleic acid sequences, protein sequences and structural
information of biological macromolecules.
Some primary databases:
NCBI (National Center for Biotechnology
Information), GenBank, DDBJ (DNA Data Bank of Japan),
SWISS-PROT (the manually annotated and reviewed section of the
UniProt Knowledgebase, UniProtKB), PIR (Protein Information
Resource), PDB (Protein Data Bank)
The sequence collections of these databases result from the
efforts of basic research in academic, industrial and
sequencing laboratories.
11. These repositories are developed in
collaboration with each other and as a result
contain similar data. However, each database
has a different user interface for querying and
searching the information it holds.
12. A secondary database contains additional
information derived from the analysis of data
available in primary repositories. The primary
data are analysed in a variety of ways, so secondary
databases contain different information in different
formats. SWISS-PROT, one of the major primary databases,
is used to derive several
secondary databases.
Some secondary databases:
TrEMBL, Pfam, PROSITE, Profiles, SCOP, CATH
13. A composite database combines information
from various primary databases and makes it
convenient to search for the desired information
without querying each primary database separately.
Composite databases make searching much
simpler because information from different
resources is gathered in a single database. Each has
its own format and its own strategy for storing
data from the various primary databases.
Some composite databases:
OWL, MIPSX, NRDB (Non-Redundant Database)
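The idea of gathering several primary sources into one searchable collection can be sketched as follows. This is only an illustration that assumes redundancy means "exactly the same sequence"; real composite databases use their own formats and redundancy criteria. The source names, identifiers and sequences are hypothetical.

```python
def build_composite(sources):
    """Merge records from several sources, keeping one entry per sequence.

    `sources` maps a source name to a list of (identifier, sequence) pairs.
    Each composite entry remembers every source record it was found in.
    """
    composite = {}
    for source_name, records in sources.items():
        for identifier, sequence in records:
            key = sequence.upper()  # treat case differences as the same sequence
            entry = composite.setdefault(key, {"sequence": key, "found_in": []})
            entry["found_in"].append((source_name, identifier))
    return list(composite.values())


if __name__ == "__main__":
    # Hypothetical source names and records, for illustration only.
    demo_sources = {
        "primary_a": [("A001", "MKTAYIAK"), ("A002", "MSLLTEVE")],
        "primary_b": [("B100", "mktayiak"), ("B101", "MGSSHHHH")],
    }
    for entry in build_composite(demo_sources):
        print(entry["sequence"], entry["found_in"])
```

A single search over the merged list then replaces one query per primary source.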
14. NCBI: created in 1988 as part of the
National Library of Medicine at NIH
– Establish public databases
– Research in computational biology
– Develop software tools for sequence analysis
– Disseminate biomedical information
Bethesda, MD
15. GenBank, the EMBL Nucleotide Sequence
Database and DDBJ are the major sequence
repositories from which various databases
have been derived.
17. GenBank is a comprehensive,
annotated collection of publicly available DNA
sequences and is a part of the International
Nucleotide Sequence Database
Collaboration (INSDC), which consists of the DNA
Data Bank of Japan (DDBJ), the European
Molecular Biology Laboratory (EMBL) and
GenBank at the National Center for Biotechnology
Information (NCBI, USA). A new release of
GenBank is made every two months.
18. ACCESSION U07418
VERSION U07418.1 GI:466461
Accession: stable, reportable, universal
Version: tracks changes in sequence
GI number: NCBI internal use
Well annotated; the sequence is the data
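The VERSION line shown on this slide packs three identifiers into one string. A minimal sketch of how it could be taken apart is given below; the function name and the returned dictionary keys are illustrative choices, and the parsing assumes the simple "VERSION accession.version GI:number" layout seen above.

```python
def parse_version_line(line):
    """Split a GenBank VERSION line into accession, version and GI number.

    Assumes the layout 'VERSION <accession>.<version> [GI:<number>]'.
    """
    parts = line.split()
    if len(parts) < 2 or parts[0] != "VERSION":
        raise ValueError("not a VERSION line: %r" % line)

    # The accession is stable; the number after the dot tracks sequence changes.
    accession, _, version = parts[1].partition(".")
    if not accession or not version.isdigit():
        raise ValueError("malformed accession.version: %r" % parts[1])

    gi = None
    for token in parts[2:]:
        if token.startswith("GI:"):
            gi = int(token[3:])

    return {"accession": accession, "version": int(version), "gi": gi}


if __name__ == "__main__":
    print(parse_version_line("VERSION U07418.1 GI:466461"))
```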
19. The NCBI (National Center for Biotechnology
Information) was established on November 4,
1988 as a part of the National Library of
Medicine (NLM) at the National Institutes of
Health (NIH), USA. Its multidisciplinary
research group consists of scientists from
diverse fields
(computer science, mathematics, biochemistry,
physics, etc.).
23. The DNA Data Bank of Japan (DDBJ) was established in
1986 at the National Institute of Genetics
(NIG), Japan, with the support of the Ministry of
Education, Science, Sports and Culture, Japan.
DDBJ serves as one of the three
collaborating international DNA databases.
25. Protein sequences are held in a wide range of databases, such as
SWISS-PROT, TrEMBL, Protein Information Resource (PIR) and
UniProt.
SWISS-PROT: a database of protein sequences
that provides high-quality annotation with minimal redundancy. It
was created in 1986 at the Department of Medical
Biochemistry, University of Geneva.
SWISS-PROT is cross-referenced with several other
databases, including nucleic acid and protein structure
databases. It divides its data into two classes:
i) Core data
ii) Annotation
27. TrEMBL is a computer-annotated supplement
of SWISS-PROT that contains all the
translations of EMBL nucleotide sequence
entries not yet integrated in SWISS-PROT.
These databases are developed by the SWISS-
PROT groups at SIB and at EBI.
TrEMBL was created in 1996 with the objective of
bridging the gap between the flow of genomic data
and annotated protein sequences.
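The "translation of nucleotide sequence entries" behind TrEMBL can be sketched in a few lines. This sketch assumes the standard genetic code and an input that is already an in-frame coding sequence. It is not the actual TrEMBL pipeline, and the function and variable names are illustrative.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence up to the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA letters
    protein = []
    for start in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[start:start + 3], "X")  # X = unknown
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```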
29. The Protein Information Resource (PIR),
located at Georgetown University Medical
Center (GUMC), is an integrated public
bioinformatics resource to support genomic
and proteomic research and scientific studies.
PIR was established in 1984 by the National
Biomedical Research Foundation (NBRF) as a
resource to assist researchers and customers
in the identification and interpretation of
protein sequence information.
30. UniProt is a freely accessible database of
protein sequence and functional information,
many entries being derived from genome
sequencing projects. It contains a large
amount of information about the biological
function of proteins derived from the
research literature.
31. The UniProt consortium comprises the European
Bioinformatics Institute (EBI), the Swiss Institute
of Bioinformatics (SIB), and the Protein
Information Resource (PIR). EBI, located at the
Wellcome Trust Genome Campus in Hinxton, UK,
hosts a large resource of bioinformatics
databases and services. SIB, located in Geneva,
Switzerland, maintains the ExPASy (Expert Protein
Analysis System) servers that are a central
resource for proteomics tools and databases. PIR,
hosted by the National Biomedical Research
Foundation (NBRF) at the Georgetown University
Medical Center in Washington, DC, USA, is heir to
the oldest protein sequence database.
33. LOCUS: Unique string of 10 letters and numbers in the database. Not maintained
across databases, and therefore a poor sequence identifier.
ACCESSION: A unique, citable identifier for the record; it does not change
when the record is updated. A good record identifier, ideal for citation in publications.
VERSION: Newer system in which the accession and version together play the same role as
the accession and gi number.
Nucleotide gi: GenInfo identifier (gi), a unique integer which changes every
time the sequence changes.
PID: Protein Identifier: g, e or d prefix to the gi number. There can be one or two on one
CDS.
Protein gi: GenInfo identifier (gi), a unique integer which changes every time
the sequence changes.
protein_id: Identifier which has the same structure and function as the nucleotide
Differences…..
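As a rough illustration of how the identifier fields listed on this slide could be pulled out of a flat-file record, the sketch below scans for keywords that start in the first column. The function names are hypothetical. The sample text reuses only the ACCESSION and VERSION values shown earlier in this presentation and is not a complete GenBank record.

```python
def read_header_fields(flat_text, wanted=("LOCUS", "ACCESSION", "VERSION")):
    """Collect selected top-level keyword lines from flat-file text.

    Only lines whose first word is one of `wanted` are kept; indented
    continuation lines start with a space and are therefore skipped.
    """
    fields = {}
    for line in flat_text.splitlines():
        keyword, _, value = line.partition(" ")
        if keyword in wanted:
            fields[keyword] = " ".join(value.split())  # collapse padding
    return fields


def citable_identifier(fields):
    """Return the accession, the identifier recommended for citation."""
    return fields["ACCESSION"].split()[0]


if __name__ == "__main__":
    sample = (
        "ACCESSION   U07418\n"
        "VERSION     U07418.1  GI:466461\n"
    )
    header = read_header_fields(sample)
    print(header)
    print(citable_identifier(header))
```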
35. Main Objectives of Biological Databases
Recognize various data formats, and know
their primary uses.
Know, understand and utilize all types of sequence
identifiers.
Know and understand the various feature types
present in GenBank flat files.
Know and understand the various GenBank
divisions.