By Elufer Akram (14/BBT/06)
University Of Science and Technology, Meghalaya
 What is the Database?
 Databases Architecture
 Variants Of Biological Database
 Nucleotide sequence database
 GenBank
 NCBI
 DDBJ
 Protein Sequence Database
 PDB ( Protein Data Bank)
 TrEMBL, PIR, UniPROT
 Collaboration
 Main Objectives of Biological Databases
 A database is a convenient system for storing, searching and
retrieving any type of data. It makes large amounts of data
easy to handle and share, and it supports large-scale
analysis through easy access and updating. Databases also
link information generated from different sources of knowledge
about the subject under consideration.
 Biological databases are libraries of life-science
information collected from scientific
experiments, published literature, high-throughput
experimental technology and
computational analysis. They contain information
from genomics, proteomics and microarray gene
expression studies.
 Information contained in biological databases
includes gene function, structure, localization (both
cellular and chromosomal), and biological sequences
and structures.
Databases Architecture
Layers, from top to bottom: Information system, Query system, Storage system, Data
Data: GenBank flat file, PDB file, interaction record (library analogy: the title of a book, a book)
Storage system: Oracle, MySQL, PC binary files, Unix text files (library analogy: boxes, bookshelves)
Query system: Google, Entrez, SRS
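The four layers can be illustrated with a minimal sketch. It assumes an in-memory SQLite table as the storage system (the slides name Oracle and MySQL; SQLite is swapped in only because it ships with Python). The table layout, the function names, the second accession, and the descriptions and sequences are all hypothetical.

```python
import sqlite3

# Data layer: two illustrative records (accession, description, sequence).
# Only the accession U07418 appears in the slides; everything else is made up.
RECORDS = [
    ("U07418", "hypothetical example record", "TATAGCCG"),
    ("X00001", "second hypothetical record", "AGCTCCGATA"),
]


def build_storage(records):
    """Storage system: load the records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entry ("
        "accession TEXT PRIMARY KEY, description TEXT, sequence TEXT)"
    )
    conn.executemany("INSERT INTO entry VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def query_by_accession(conn, accession):
    """Query system: look up one entry by accession, or return None."""
    cur = conn.execute(
        "SELECT accession, description, sequence FROM entry WHERE accession = ?",
        (accession,),
    )
    return cur.fetchone()


def show_entry(conn, accession):
    """Information system: turn a query result into text for the user."""
    row = query_by_accession(conn, accession)
    if row is None:
        return f"{accession}: not found"
    acc, desc, seq = row
    return f"{acc} | {desc} | {len(seq)} bp"


conn = build_storage(RECORDS)
print(show_entry(conn, "U07418"))
```

Each layer only talks to the one below it, so the storage back end could be exchanged without touching the information system.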
 1. Primary Database.
 2. Secondary database.
 3. Composite Database.
 These are the primary repositories of data, used to
store nucleic acid sequences, protein sequences and structural
information on biological macromolecules.
 Some primary databases:
GenBank (maintained by NCBI, the National Center for Biotechnology
Information), DDBJ (DNA Data Bank of Japan), SWISS-PROT
(the manually annotated and reviewed section of the
UniProt Knowledgebase, UniProtKB), PIR (Protein Information
Resource), PDB (Protein Data Bank).
The sequences collected in these databases come from the
efforts of basic research in academic, industrial and
sequencing labs.
 These repositories are developed in
collaboration with each other and as a result
contain similar data. However, each database
has a different user interface for querying and
searching the information it holds.
 A secondary database contains additional
information derived from the analysis of data
available in primary repositories. Secondary
databases are built by analysing primary data in a variety of ways
and hold different information in different
formats. SWISS-PROT, one of the major primary databases,
is used to derive several
secondary databases.
 Some secondary databases:
TrEMBL, Pfam, PROSITE, Profiles, SCOP, CATH
 A composite database combines information
from various primary databases and makes it
convenient to search for the desired information
without querying each of those primary databases.
 Composite databases make searching much
simpler because information from different
resources is gathered in a single database. Each has
its own format and its own strategy for storing
data from the various primary databases.
Some composite databases:
OWL, MIPSX, NRDB (Non-Redundant Database)
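The idea of gathering several primary sources into one searchable set can be sketched as follows. This is a toy illustration, not the procedure used by any real composite database: it assumes records are plain (identifier, sequence) pairs and that identical sequences should collapse into a single entry. All identifiers and sequences are hypothetical.

```python
def build_composite(sources):
    """Merge records from several primary sources into one non-redundant list.

    `sources` maps a source name to a list of (identifier, sequence) pairs.
    Identical sequences from different sources collapse into one entry that
    lists every source identifier as a cross-reference.
    """
    composite = {}
    for source_name, records in sources.items():
        for identifier, sequence in records:
            key = sequence.upper()  # compare sequences case-insensitively
            if key not in composite:
                composite[key] = {"sequence": key, "ids": []}
            composite[key]["ids"].append(f"{source_name}:{identifier}")
    return list(composite.values())


# Hypothetical input: the first PIR record duplicates a SWISS-PROT sequence.
sources = {
    "SWISS-PROT": [("P1", "MKTAYIAK"), ("P2", "MSTNPKPQ")],
    "PIR": [("A1", "mktayiak"), ("A2", "MGSSHHHH")],
}
entries = build_composite(sources)
for entry in entries:
    print(entry["sequence"], entry["ids"])
```

A single search over `entries` then covers both sources, and the cross-references show where each sequence came from.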
NCBI was created in 1988 as a part of the
National Library of Medicine at NIH, Bethesda, MD
– Establish public databases
– Research in computational biology
– Develop software tools for sequence analysis
– Disseminate biomedical information
 GenBank, the EMBL Nucleotide Sequence
Database and DDBJ are the major sequence
repositories from which various databases
have been derived.
 GenBank File format
 GenBank is the most comprehensive
annotated collection of publicly available DNA
sequences and is a part of the International
Nucleotide Sequence Database
Collaboration (INSDC), which consists of the DNA
Data Bank of Japan (DDBJ), the European
Molecular Biology Laboratory (EMBL) and
GenBank at the National Center for Biotechnology
Information (NCBI, USA). A new release of
GenBank is made every two months.
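A GenBank flat file can be requested through NCBI's Entrez E-utilities. The sketch below only builds the request URL and makes no network call; it assumes the EFetch endpoint with the `nuccore` database and the `gb`/`text` return settings, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# EFetch endpoint of the NCBI E-utilities.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_flatfile_url(accession):
    """Build the EFetch URL for the GenBank flat file of one nucleotide record."""
    params = {
        "db": "nuccore",     # nucleotide database
        "id": accession,     # accession, optionally with ".version"
        "rettype": "gb",     # GenBank flat-file format
        "retmode": "text",   # plain text rather than XML
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


print(genbank_flatfile_url("U07418.1"))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the record as plain text.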
ACCESSION U07418
VERSION U07418.1 GI:466461
Accession: stable, reportable, universal
Version: tracks changes in the sequence
GI number: NCBI internal use
[Figure labels: the header is well annotated; the sequence is the data]
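The relationship between accession, version and GI number can be made concrete by parsing a VERSION line like the one shown above. This is a minimal sketch: the regular expression and the function name are illustrative assumptions, and the GI part is treated as optional.

```python
import re

# "VERSION", then accession.version, then an optional "GI:<number>".
VERSION_RE = re.compile(
    r"^VERSION\s+(?P<accession>[A-Z_0-9]+)\.(?P<version>\d+)"
    r"(?:\s+GI:(?P<gi>\d+))?\s*$"
)


def parse_version_line(line):
    """Split a GenBank VERSION line into accession, version number and GI."""
    match = VERSION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a VERSION line: {line!r}")
    gi = match.group("gi")
    return {
        "accession": match.group("accession"),  # stable across updates
        "version": int(match.group("version")),  # bumped when the sequence changes
        "gi": int(gi) if gi is not None else None,
    }


print(parse_version_line("VERSION     U07418.1  GI:466461"))
```

If the sequence were revised, the accession would stay U07418 while the version suffix would become .2.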
 The NCBI (National Center for Biotechnology
Information) was established on November 4,
1988 as a part of the National Library of
Medicine (NLM) at the National Institutes of
Health (NIH), USA. Its multidisciplinary
research group consists of scientists from
diverse fields
(computing, mathematics, biochemistry,
physics, etc.).
[Figure: GenBank receives sequences (e.g. TATAGCCG) from labs and sequencing centers and is updated ONLY by submitters; UniGene (algorithms), RefSeq (curators) and Genome Assembly are updated continually by NCBI.]
 The DNA Data Bank of Japan (DDBJ) was established in
1986 at the National Institute of Genetics
(NIG), Japan, with the support of the Ministry of
Education, Science, Sports and Culture, Japan.
DDBJ serves as one of the three
collaborating international DNA databases.
 Protein sequence data are held in a wide range of databases, such as
SWISS-PROT, TrEMBL, the Protein Information Resource (PIR) and
UniProt.
SWISS-PROT: a database of protein sequences
that provides high-quality annotation with minimal redundancy. It
was created in 1986 at the Department of Medical
Biochemistry, University of Geneva.
SWISS-PROT is cross-referenced with several other
databases, including nucleic acid and protein structure
databases. It divides its data into two classes:
i) Core data
ii) Annotation
 TrEMBL is a computer-annotated supplement
to SWISS-PROT that contains all the
translations of EMBL nucleotide sequence
entries not yet integrated into SWISS-PROT.
These databases are developed by the SWISS-PROT
groups at SIB and at EBI.
 It was created in 1996 with the objective of
filling the gap between the flow of genomic data
and annotated protein sequences.
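The translation of a nucleotide entry into a protein sequence can be sketched with the standard genetic code. This shows the general principle only, not the actual TrEMBL pipeline; it assumes the coding sequence starts in frame at position 0, and the example sequence is made up.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    # Ignore any trailing bases that do not form a complete codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCGATAAATAG"))  # prints MADK
```

Running every coding region of a nucleotide database through such a step yields protein sequences that still lack manual annotation, which is the gap the slide describes.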
 PIR HomePage
 The Protein Information Resource (PIR),
located at Georgetown University Medical
Center (GUMC), is an integrated public
bioinformatics resource to support genomic
and proteomic research and scientific studies.
 PIR was established in 1984 by the National
Biomedical Research Foundation (NBRF) as a
resource to assist researchers and customers
in the identification and interpretation of
protein sequence information.
 UniProt is a freely accessible database of
protein sequence and functional information,
many entries being derived from genome
sequencing projects. It contains a large
amount of information about the biological
function of proteins derived from the
research literature.
 The UniProt consortium comprises the European
Bioinformatics Institute (EBI), the Swiss Institute
of Bioinformatics (SIB), and the Protein
Information Resource (PIR). EBI, located at the
Wellcome Trust Genome Campus in Hinxton, UK,
hosts a large resource of bioinformatics
databases and services. SIB, located in Geneva,
Switzerland, maintains the ExPASy (Expert Protein
Analysis System) servers that are a central
resource for proteomics tools and databases. PIR,
hosted by the National Biomedical Research
Foundation (NBRF) at the Georgetown University
Medical Center in Washington, DC, USA, is heir to
the oldest protein sequence database.
LOCUS: A unique string of 10 letters and numbers in the database. It is not maintained
across databases and is therefore a poor sequence identifier.
ACCESSION: A unique identifier for the record and a citable entity; it does not change
when the record is updated. A good record identifier, ideal for citation in publications.
VERSION: A newer system in which the accession and version together serve the same function as
the accession and gi number.
Nucleotide gi: Geninfo identifier (gi), a unique integer which will change every
time the sequence changes.
PID: Protein Identifier: g, e or d prefix to gi number. Can have one or two on one
CDS.
Protein gi: Geninfo identifier (gi), a unique integer which will change every time
the sequence changes.
protein_id: Identifier which has the same structure and function as the nucleotide
Differences…..
International Nucleotide Sequence Database Collaboration
GenBank EMBL DDBJ
Main Objectives of Biological Databases
 Recognize various data formats, and know
their primary uses.
 Know, understand and utilize all types of sequence
identifiers.
 Know and understand the various feature types
present in GenBank flat files.
 Know and understand the various GenBank
divisions.
 WIKIPEDIA
 NCBI
 DDBJ
 PDB
 GenBank
 PIR
 SWISS-PROT/UniPROT
THANK YOU

Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester

  • 1. By– Elufer Akram (14/BBT/06) University Of Science and Technology, Meghalaya
  • 2.  What is the Database?  Databases Architecture  Variants Of Biological Database  Nucleotide sequence database  GenBank  NCBI  DDBJ  Protein Sequence Database  PDB ( Protein Data Bank)  TrEMBL, PIR, UniPROT  Collaboration  Main Objectives of Biological Databases
  • 3.  Database are convenient system to properly store, search and retrieve any type of data. A database helps to easily handle and share large amount of data and supports large scale analysis by easy access and data updation.Further the databases link information generated from various knowledge about the subject under consideration
  • 4.  Biological databases are libraries of life sciences information ,collected from scientific experiments, published literature, high- throughput experiment technology and computational analysis. They contain information from genomics,proteomics,microarry gene expression.  Information contained in biological databases includes gene function,structure,localization(both cellular and chromosomal),biological sequences and structures.
  • 5. Information system Query system Storage System Data Databases Architecture
  • 6. Information system Query system Storage System Data GenBank flat file PDB file Interaction Record Title of a book Book Databases Architecture
  • 7. Information system Query system Storage System Data Boxes Oracle MySQL PC binary files Unix text files Bookshelves Databases Architecture
  • 8. The Google Entrez SRS Information system Query system Storage System Data Databases Architecture
  • 9.  1. Primary Database.  2. Secondary database.  3. Composite Database.
  • 10.  Theses are the primary repositories of data used to store nucleic acid, protein sequences and structural information of biological macromolecules.  Some primary databases-> NCBI(The National Centre for Biotechnology Information),GenBank,DDBJ (DNA data bank of Japan),SWISS- PROT(Swiss-Prot is the manually annotated and reviewed section of the UniProt Knowledgebase (UniProtKB)),PIR (Protein Information Resource),PDB(Protein Data Bank) This sequence collection of this database is due to the efforts of basic research from academic industrial and sequencing lab)
  • 11.  This repositories are developed in collaboration to each other and as a result contain similar data. However this database have different user interface to query and search information available in the database.
  • 12.  A Secondary database contain additional information derived from the analysis of data available in primary repositories.Secondary databases are analysed in a variety of ways and contain different information in different formats. One of the major primary database SWISS-PROT is used to derive several other secondary databases.  Some secondary databases TrEMBL,Pfam,PROSITE,Profiles,SCOP,CATH
  • 13.  A composite database is combines information from various primary database and makes it convenient to search the desired information without querying to all these primary database.  Composite database make searching much simpler because information from different resources is gathered in a single database. It has its own format and different strategies to store data from various primary database. Some composite database-> OWL (The Web Ontology Language),MISPX,NRDB (Natural Resources Database)
  • 14. NCBI was created in 1988 as a part of the National Library of Medicine at NIH, Bethesda, MD. Aims: establish public databases; carry out research in computational biology; develop software tools for sequence analysis; disseminate biomedical information.
  • 15. GenBank, the EMBL Nucleotide Sequence Database and DDBJ are the major sequence repositories from which various other databases have been derived.
  • 17. GenBank is the most comprehensive annotated collection of publicly available DNA sequences and is a part of the International Nucleotide Sequence Database Collaboration (INSDC), which consists of the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL) and GenBank at the National Center for Biotechnology Information (NCBI, USA). A new release of GenBank is made every two months.
  • 18. ACCESSION U07418; VERSION U07418.1; GI:466461. Accession: stable, reportable, universal. Version: tracks changes in the sequence. GI number: for NCBI internal use. Well annotated; the sequence is the data.
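The three identifiers on this slide can be pulled apart programmatically. The sketch below assumes the header text appears on one line exactly as on the slide (`ACCESSION`, then `VERSION` as accession plus a numeric suffix, then `GI:`); real GenBank flat files spread these fields over several lines, and the function name `parse_identifiers` is hypothetical.

```python
import re

# Header text copied from the slide; the single-line layout is an assumption.
HEADER = "ACCESSION U07418 VERSION U07418.1 GI:466461"

PATTERN = re.compile(
    r"ACCESSION\s+(?P<accession>\w+)\s+"
    r"VERSION\s+(?P<versioned>\w+)\.(?P<version>\d+)\s+"
    r"GI:(?P<gi>\d+)"
)


def parse_identifiers(header):
    """Return the accession, the version number and the GI as a dict."""
    match = PATTERN.search(header)
    if match is None:
        raise ValueError("header does not match the assumed layout")
    # The VERSION field is the accession followed by ".<n>".
    if match["versioned"] != match["accession"]:
        raise ValueError("VERSION does not extend the ACCESSION")
    return {
        "accession": match["accession"],
        "version": int(match["version"]),
        "gi": int(match["gi"]),
    }


ids = parse_identifiers(HEADER)
print(ids)
```

The accession is the part to cite, since it stays stable; the version suffix is what changes when the sequence itself changes.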
  • 19. The NCBI (National Center for Biotechnology Information) was established on November 4, 1988 as a part of the National Library of Medicine (NLM) at the National Institutes of Health (NIH), USA. Its multidisciplinary research group consists of scientists from diverse fields (computer science, mathematics, biochemistry, physics, etc.).
  • 23. The DNA Data Bank of Japan (DDBJ) was established in 1986 at the National Institute of Genetics (NIG), Japan, with the support of the Ministry of Education, Science, Sports and Culture, Japan. DDBJ has served as one of the three collaborating international DNA databases.
  • 25. Protein sequence data are held in a wide range of databases, such as SWISS-PROT, TrEMBL, the Protein Information Resource (PIR) and UniProt. SWISS-PROT: a database of protein sequences that provides high-quality annotation with minimal redundancy. It was created in 1986 at the Department of Medical Biochemistry, University of Geneva. SWISS-PROT is cross-referenced with several other databases, including nucleic acid and protein structure databases. It classifies its data in two ways: i) core data; ii) annotation.
  • 27. TrEMBL is a computer-annotated supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated into SWISS-PROT. These databases are developed by the SWISS-PROT groups at SIB and at EBI. TrEMBL was created in 1996 with the objective of closing the gap between the flow of genomic data and the availability of annotated protein sequences.
  • 29. The Protein Information Resource (PIR), located at Georgetown University Medical Center (GUMC), is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies. PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) as a resource to assist researchers and customers in the identification and interpretation of protein sequence information.
  • 30. UniProt is a freely accessible database of protein sequence and functional information, with many entries derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins, derived from the research literature.
  • 31. The UniProt consortium comprises the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). EBI, located at the Wellcome Trust Genome Campus in Hinxton, UK, hosts a large resource of bioinformatics databases and services. SIB, located in Geneva, Switzerland, maintains the ExPASy (Expert Protein Analysis System) servers, which are a central resource for proteomics tools and databases. PIR, hosted by the National Biomedical Research Foundation (NBRF) at the Georgetown University Medical Center in Washington, DC, USA, is heir to the oldest protein sequence database.
  • 33. Differences: LOCUS: a unique string of 10 letters and numbers in the database; not maintained across databases, and therefore a poor sequence identifier. ACCESSION: a unique identifier for the record and a citable entity; it does not change when the record is updated, so it is a good record identifier, ideal for citation in publications. VERSION: a newer system in which the accession and version together play the same role as the accession and gi number. Nucleotide gi: GenInfo identifier (gi), a unique integer that changes every time the sequence changes. PID: protein identifier, a g, e or d prefix to a gi number; one CDS can have one or two. Protein gi: GenInfo identifier (gi), a unique integer that changes every time the sequence changes. protein_id: an identifier with the same structure and function as the nucleotide…
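The practical difference between a stable accession and a changing version can be shown with a small comparison. This is a sketch under assumptions: the helper names `split_versioned`, `same_record` and `same_sequence` are hypothetical, and `U07418.2` is an invented later version used only to illustrate the comparison.

```python
def split_versioned(identifier):
    """Split 'ACCESSION.version' into (accession, version number)."""
    accession, _, version = identifier.partition(".")
    if not accession or not version.isdigit():
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return accession, int(version)


def same_record(a, b):
    """True if both identifiers share an accession, i.e. point at the same record."""
    return split_versioned(a)[0] == split_versioned(b)[0]


def same_sequence(a, b):
    """True only if accession and version both match, i.e. the sequence is unchanged."""
    return split_versioned(a) == split_versioned(b)


# 'U07418.2' is a hypothetical later version, invented for this example.
print(same_record("U07418.1", "U07418.2"))
print(same_sequence("U07418.1", "U07418.2"))
```

Citing the accession alone identifies the record; adding the version pins down exactly which sequence was used.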
  • 34. International Nucleotide Sequence Database Collaboration (INSDC): GenBank, EMBL, DDBJ.
  • 35. Main Objectives of Biological Databases: Recognize the various data formats and know their primary uses. Know, understand and use all types of sequence identifiers. Know and understand the various feature types present in GenBank flat files. Know and understand the various GenBank divisions.