2. Protein Data Bank
http://www.rcsb.org/pdb/home/home.do
A repository for 3-D structures of biological
macromolecules.
It includes proteins and nucleic acids.
Structures are obtained mainly by X-ray crystallography (80%) or
NMR spectroscopy (16%).
Transferred to the Research Collaboratory for
Structural Bioinformatics (RCSB) in 1998.
It currently holds 141,616 released structures.
3. The PDB is freely accessible on the Internet via the websites of its member
organisations (PDBe, PDBj, and RCSB).
The PDB is overseen by an organization called the Worldwide
Protein Data Bank, wwPDB.
The PDB is a key resource in areas of structural biology, such
as structural genomics.
Most major scientific journals, and some funding agencies,
now require scientists to submit their structure data to the PDB.
4. Many other databases use protein structures deposited
in the PDB. For example, SCOP and CATH classify
protein structures, while PDBsum provides a graphical
overview of PDB entries using information from other
sources, such as Gene Ontology.
5. History
Founded in 1971 at Brookhaven National
Laboratory, New York.
In October 1998, the PDB was transferred to the
Research Collaboratory for Structural
Bioinformatics (RCSB).
In 2003, with the formation of the wwPDB, the
PDB became an international organization.
The founding members are PDBe (Europe), RCSB
(USA), and PDBj (Japan).
6. Content
The PDB database is updated weekly.
As of 12 May 2017, the breakdown of current
holdings is as follows:
7. In the past, the number of structures in the PDB has
grown at an approximately exponential rate, passing
the 100,000-structure milestone in 2014.
8. PDB Data Formats
The file format initially used by the PDB was called the PDB
file format.
The PDB file format was designed to hold the atomic coordinates and
related information.
In the late 1990s, the macromolecular Crystallographic Information File
(mmCIF) format emerged.
mmCIF and PDBML:
These newer formats reflect a push to make structure files completely
self-contained descriptions of the experiment and of the details of the
structure determination.
By comparison, the PDB file format is unstructured and now obsolete.
9. PDB File Format
Text file – it can be edited with a text editor, e.g. WordPad
Atomic co-ordinates
Rich annotation
Citation
Experimental Method
Biological source e.
Etc.
10. Viewing the data
The structure files may be viewed using one of several
open-source programs, including Jmol, PyMOL, and
RasMol.
Other programs that are free but not open source include
ICM-Browser, VMD, MDL Chime, UCSF Chimera, Swiss-PDB Viewer,
StarBiochem (a Java-based interactive molecular viewer
with integrated search of the Protein Data Bank),
Sirius, and VisProt3DS (a tool for protein
visualization in 3D stereoscopic view and other modes).
The RCSB PDB website contains an extensive list of both
free and commercial molecule visualization programs and
web browser plugins.
12. PDB File Format
A deposited set of protein coordinates becomes
an entry in PDB.
One can search for a structure in the PDB using its
four-character code (the PDB ID) or keywords related
to its annotation.
The identified structure can be viewed directly
online or downloaded to a local computer for
analysis.
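As a rough illustration of looking up an entry by its code, the sketch
below checks that an identifier has the classic four-character shape and
builds a download address for it. The helper names are invented for this
example. The ID pattern (a digit followed by three letters or digits) and
the files.rcsb.org address pattern are assumptions that should be checked
against the current RCSB documentation before use.

```python
import re

# Assumed shape of a classic PDB ID: one digit (1-9), then three alphanumerics.
PDB_ID_PATTERN = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def normalize_pdb_id(code: str) -> str:
    """Return the ID in upper case, or raise ValueError if it is malformed."""
    code = code.strip()
    if not PDB_ID_PATTERN.match(code):
        raise ValueError(f"not a four-character PDB ID: {code!r}")
    return code.upper()


def download_url(code: str, fmt: str = "pdb") -> str:
    """Build a download address for an entry (address pattern is an assumption)."""
    if fmt not in ("pdb", "cif"):
        raise ValueError("fmt must be 'pdb' or 'cif'")
    return f"https://files.rcsb.org/download/{normalize_pdb_id(code)}.{fmt}"


print(download_url("1crn"))
```

The function only builds the address; fetching the file is left to
whatever download tool is preferred.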
The PDB website provides options for retrieval,
analysis, and direct viewing of macromolecular
structures.
13. It also provides links to protein structural
classification results available in databases
such as SCOP and CATH.
• The PDB data format was designed to be
FORTRAN-compatible.
• Header:-
• The header section provides an overview of the
protein and the quality of the structure.
It contains information about the name of the
molecule, source organism, bibliographic reference,
methods of structure determination, resolution,
crystallographic parameters, protein sequence,
cofactors, a description of structure types and
locations, and sometimes secondary structure
information.
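One way to picture the header is as a series of lines, each beginning
with a record name in the first six columns. The sketch below groups
header text by record name and stops at the first coordinate record. The
function name and the example lines are made up for illustration, and
continuation-line handling is deliberately ignored.

```python
from collections import defaultdict


def group_header_records(lines):
    """Group the text of each record by its record name (columns 1-6)."""
    records = defaultdict(list)
    for line in lines:
        name = line[:6].strip()
        if name in ("ATOM", "HETATM"):
            break  # coordinate section reached; stop collecting header text
        if name:
            records[name].append(line[6:].strip())
    return dict(records)


# Illustrative lines written for this example, not copied from a real entry.
example = [
    "HEADER    EXAMPLE PROTEIN",
    "TITLE     A HYPOTHETICAL STRUCTURE",
    "EXPDTA    X-RAY DIFFRACTION",
    "ATOM      1  N   MET A   1",
]

header = group_header_records(example)
print(header["EXPDTA"])
```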
14. Structure coordinates:- There are a specified number of
columns with predetermined contents.
The ATOM part refers to protein atom information
whereas the HETATM(for heteroatom group) part refers
to atoms of cofactor or substrate molecules.
Approximately ten columns of text and numbers are
designated.
They include the atom number, atom name, residue
name, polypeptide chain identifier, residue number,
x, y, and z Cartesian coordinates, occupancy, and
temperature factor.
The last two parameters, occupancy and temperature
factor, relate to disorder in atomic positions in crystals.
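Because each field sits at fixed column positions, a coordinate line can
be read by slicing. The sketch below is a minimal example of that idea.
The column ranges follow the published PDB format layout as best recalled
here (they are not given in these slides), the class and function names
are invented, and the sample values are made up. The sample line is built
with a format string so that the columns are guaranteed to line up.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    record: str        # "ATOM" or "HETATM"
    serial: int        # atom number
    name: str          # atom name
    res_name: str      # residue name
    chain_id: str      # one-character chain identifier
    res_seq: int       # residue number
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float


def parse_atom_line(line: str) -> AtomRecord:
    """Slice one ATOM/HETATM line by its assumed fixed column positions."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not a coordinate record: {record!r}")
    return AtomRecord(
        record=record,
        serial=int(line[6:11]),          # columns 7-11
        name=line[12:16].strip(),        # columns 13-16
        res_name=line[17:20].strip(),    # columns 18-20
        chain_id=line[21],               # column 22
        res_seq=int(line[22:26]),        # columns 23-26
        x=float(line[30:38]),            # columns 31-38
        y=float(line[38:46]),            # columns 39-46
        z=float(line[46:54]),            # columns 47-54
        occupancy=float(line[54:60]),    # columns 55-60
        temp_factor=float(line[60:66]),  # columns 61-66
    )


# Sample line with made-up values, laid out in the same fixed columns.
SAMPLE_LINE = (
    "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   "
    "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
).format("ATOM", 1, " N", "", "MET", "A", 1, "", 27.340, 24.430, 2.614, 1.00, 9.67)

atom = parse_atom_line(SAMPLE_LINE)
print(atom.name, atom.chain_id, atom.x, atom.y, atom.z)
```

Slicing by position rather than splitting on whitespace matters because
neighbouring fields may touch or be blank in real files.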
END
Restriction:-
The field for the polypeptide chain identifier is only one
character wide, meaning that no more than 26 chains
(one per letter of the alphabet) can be used in a
multisubunit protein model.
16. mmCIF and MMDB Formats
The most popular new formats include the
macromolecular crystallographic information file
(mmCIF) and the molecular modeling database
(MMDB) file.
Both formats are highly parsable by computer
software, meaning that information in each field of
a record can be retrieved separately.
These new formats facilitate the retrieval and
organization of information from database
structures.
The mmCIF format is similar to the format of a
relational database, in which a set of tables is
used to organize database records.
17. Each table or field of information is explicitly
assigned by a tag and linked to other fields
through a special syntax.
A single line of description in the header section
of a PDB file is divided into many lines or fields, with
each field having an explicit assignment of item
names and item values.
Each field starts with an underscore character
followed by the category name and keyword
description, separated by a period.
Using multiple fields with tags for the same
information has the advantage of providing a
one-to-one relationship between item names and item
values.
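The tag syntax can be illustrated with a small reader that splits each
tag at the period into a category and a keyword and stores the value
under both. This is a simplified sketch: it handles only one-line items,
skips loops and multi-line values, and is not a full mmCIF parser. The
function name and the values in the example are made up. The item names
are believed to exist in the mmCIF dictionary but should be checked there.

```python
import shlex


def parse_simple_items(text):
    """Parse one-line '_category.keyword value' items into nested dicts."""
    data = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("_"):
            continue  # not a data item
        tokens = shlex.split(line)  # keeps quoted values together
        if len(tokens) != 2:
            continue  # loops and multi-line values are out of scope here
        tag, value = tokens
        category, _, keyword = tag[1:].partition(".")
        data.setdefault(category, {})[keyword] = value
    return data


# Example text with invented values.
example = """
_entry.id        EXAMPLE
_exptl.method    'X-RAY DIFFRACTION'
_cell.length_a   61.776
_cell.length_b   61.776
"""

items = parse_simple_items(example)
print(items["cell"])
```

Each category behaves like a small table, which is the relational
flavour described above.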
19. MMDB
Another new format is the MMDB format
developed by the NCBI to parse and sort pieces
of information in PDB.
The objective is to allow the information to be
more easily integrated with GenBank and Medline
through Entrez.
An MMDB file is written in ASN.1 format, in which
the information in a record is structured as a nested
hierarchy.
This allows faster retrieval than mmCIF and PDB.
Furthermore, the MMDB format includes bond
connectivity information for each molecule, called
a “chemical graph,” which is recorded in the
ASN.1 file.
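To give a feel for a nested record that carries bond connectivity, the
sketch below stores a water molecule as a nested Python dictionary and
derives each atom's bonded neighbours from it. This is only an analogy:
it is not ASN.1 and not the actual MMDB schema, and all field names are
hypothetical.

```python
from collections import defaultdict

# Hypothetical nested record for a water molecule; field names are invented.
molecule = {
    "name": "water",
    "atoms": [
        {"id": 1, "element": "O"},
        {"id": 2, "element": "H"},
        {"id": 3, "element": "H"},
    ],
    "bonds": [
        {"from": 1, "to": 2, "order": 1},
        {"from": 1, "to": 3, "order": 1},
    ],
}


def build_adjacency(mol):
    """Return a mapping from each atom id to the sorted ids it is bonded to."""
    neighbors = defaultdict(set)
    for bond in mol["bonds"]:
        neighbors[bond["from"]].add(bond["to"])
        neighbors[bond["to"]].add(bond["from"])
    return {atom["id"]: sorted(neighbors[atom["id"]]) for atom in mol["atoms"]}


graph = build_adjacency(molecule)
print(graph)
```

Storing the bonds explicitly means connectivity does not have to be
inferred from interatomic distances.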