1. Biomedical Genomics 1
- Investigating protein structure using online tools
Dr L. Therese Bergendahl
Research Fellow
Joseph Marsh Lab
MRC Human Genetics Unit
MRC Institute of Genetics and Molecular Medicine
www.igmm.ac.uk
4. Protein Structure – Ultrafast recap
A linear sequence of amino acids folds into a stable 3D structure
All α-helices
All β-strands
Combination of the two, either alternating (α/β) or segregated (α+β)
Membrane proteins
Disordered Proteins
Classified by SCOP or CATH
Classic examples are the globin fold, immunoglobulin fold, SH2
domain and TIM barrel
5. Ovomucoid (green) and the C-terminal fragment of the L7/L12 ribosomal protein (red)
Sequence similarity is: 3 %
RMS difference (Cα) is 1.6 Å
Laurents et al., Protein Sci, 1994
6. Protein Structure
Domains are analogous to folds, seen from the point of view of the full protein: essentially
the folded units that can function independently
Proteins interact!
With ligands crucial for function
As members of transient protein signaling networks
As subunits in stable protein complexes
7. Protein Structure
Disordered proteins are also important, and are overrepresented in signalling
networks
They lack any recognisable 3D structure, either across the entire protein or in parts of it.
Data. What is it good for?
Sequence
Folds/ Motifs/ Domains
Interaction with other proteins or macromolecules
Stability
Function
Phenotype
9. My protein: inositol-1,4,5-trisphosphate receptor 1 (IP3R1)
The IGMM crew used exome sequencing and found a set of interesting de novo mutations in
the ITPR1 gene in individuals with Gillespie syndrome (GS)
Iris hypoplasia and cerebral volume loss are common phenotypes in GS patients
Inositol-1,4,5-trisphosphate receptors (InsP3Rs) are ubiquitous ion channels responsible for cytosolic Ca²⁺ signalling and essential for a broad array of cellular processes ranging from contraction to secretion, and from proliferation to cell death. Despite decades of research on InsP3Rs, a mechanistic understanding of their structure-function relationship is lacking.
Go to UniProt.org and type the name of your protein into the search bar at the top of the page. Make sure you are searching the protein knowledgebase (UniProtKB). How many hits do you get?
Take a note of interesting aspects such as disease variants and interaction sites.
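The same search can also be scripted. The sketch below only builds a search URL, assuming the UniProt REST endpoint at rest.uniprot.org/uniprotkb/search with its `query`, `format` and `size` parameters. The helper name `uniprot_search_url` and the example query (gene ITPR1 restricted to human, taxonomy ID 9606) are illustrative choices, not part of the exercise.

```python
from urllib.parse import urlencode

# Assumed UniProtKB search endpoint of the UniProt REST API.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def uniprot_search_url(query: str, size: int = 10, fmt: str = "json") -> str:
    """Build a UniProtKB search URL (hypothetical helper; no request is sent)."""
    params = {"query": query, "format": fmt, "size": size}
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


# Example query: the ITPR1 gene, restricted to human entries (taxonomy ID 9606).
url = uniprot_search_url("gene:ITPR1 AND organism_id:9606")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the matching UniProtKB entries in JSON form.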
BLAST (Basic Local Alignment Search Tool) is a sequence similarity search method, in which a query protein or nucleotide sequence is compared to nucleotide or protein sequences in a target database to identify regions of local alignment and report those alignments that score above a given score threshold.
To determine whether matches to the database are "significant", we use a threshold E-value. The E-value describes the number of hits one can expect to see by chance when searching a database of a particular size. The lower the E-value, the more "significant" a match to a database sequence is (i.e. the fewer matches of that quality are expected just by chance).
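As a rough illustration of why the database size matters, the sketch below uses the standard Karlin-Altschul relation between a bit score S' and the E-value, E = m · n · 2^(−S'), where m is the query length and n the total length of the database. The function name and the example numbers are made up for illustration. Real BLAST uses effective (edge-corrected) lengths, so its reported E-values will differ somewhat.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """E-value from a bit score: E = m * n * 2**(-S').

    query_len (m) and db_len (n) are raw lengths here; BLAST itself
    uses effective lengths, which this sketch ignores.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical numbers: a 100-residue query against two database sizes.
small_db = expected_hits(bit_score=40, query_len=100, db_len=1_000_000)
large_db = expected_hits(bit_score=40, query_len=100, db_len=10_000_000)

# The same alignment score is ten times less "significant" in the larger database.
print(small_db, large_db)
```

Each extra bit of score halves the E-value, while doubling the database doubles it, which is why the same hit can pass a threshold in a small database and fail it in a large one.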
STRING uses a spring model to generate the network images. Nodes are modelled as masses and edges as springs; the final position of the nodes in the image is computed by minimizing the 'energy' of the system. High-confidence edges are given a higher 'spring strength' so that they reach an optimal position before lower-confidence edges.
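The idea of a spring model can be sketched in a few lines. This is not STRING's actual layout code, only a minimal gradient-descent version under simple assumptions: every edge is a spring with the same rest length, the confidence score is used directly as the spring constant, and there is no repulsion between unconnected nodes. The node labels and scores are placeholders.

```python
import math
import random


def spring_layout(nodes, edges, rest_length=1.0, step=0.05, iterations=2000, seed=0):
    """Place nodes in 2D by letting the edge springs relax (illustrative sketch)."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        for (a, b), confidence in edges.items():
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            # Hooke's law: a stretched spring pulls its ends together, a
            # compressed one pushes them apart; confidence acts as stiffness.
            pull = confidence * (dist - rest_length)
            fx, fy = pull * dx / dist, pull * dy / dist
            force[a][0] += fx
            force[a][1] += fy
            force[b][0] -= fx
            force[b][1] -= fy
        # Move every node a small step along the net force (downhill in energy).
        for n in nodes:
            pos[n][0] += step * force[n][0]
            pos[n][1] += step * force[n][1]
    return pos


def spring_energy(pos, edges, rest_length=1.0):
    """Total elastic energy: sum of 0.5 * k * (d - L)^2 over all edges."""
    return sum(
        0.5 * c * (math.dist(pos[a], pos[b]) - rest_length) ** 2
        for (a, b), c in edges.items()
    )


# Placeholder network: three proteins, two high-confidence edges and one weaker edge.
nodes = ["A", "B", "C"]
edges = {("A", "B"): 0.9, ("B", "C"): 0.9, ("A", "C"): 0.4}
pos = spring_layout(nodes, edges)
print(pos, spring_energy(pos, edges))
```

Because stiffer springs exert a larger force for the same stretch, the high-confidence edges settle close to their rest length first, which matches the behaviour described above.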