CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2004 Dr. Eric Rouchka

Lecture 14: Protein Structure Prediction

Review of Proteins

• Proteins: polypeptides with a three-dimensional structure

• Primary structure – sequence of amino acids constituting the polypeptide chain

• Secondary structure – local organization of polypeptide chain into secondary structures such as α helices and β sheets

Review of Proteins

• Tertiary structure – three-dimensional arrangement of the amino acids, shaped by their polarity and by interactions between side chains

• Quaternary structure – Interaction of several protein subunits

Protein Structure

• Proteins: chains of amino acids joined by peptide bonds

• Amino acids: polar (separate positively and negatively charged regions)
  – free C=O (carbonyl) group, which can act as a hydrogen-bond acceptor
  – free N-H (amide) group, which can act as a hydrogen-bond donor

Protein Structure

• Many conformations are possible because of rotation around the bonds on either side of the alpha carbon (Cα) atom

• Conformational changes lead to differences in the three-dimensional structure of the protein

Protein Structure

• The polypeptide backbone repeats the pattern N-Cα-C

• The rotation angle about the N-Cα bond (between the amide group and Cα) is the PHI (φ) angle; the rotation angle about the Cα-C bond (between Cα and the carbonyl group) is the PSI (ψ) angle
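The φ and ψ angles above are torsion (dihedral) angles, each defined by four consecutive backbone atoms: φ by C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below computes a torsion angle from Cartesian coordinates with the standard vector formula in plain Python; the function and helper names are illustrative and not taken from any particular package.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (between -180 and 180) about the p1-p2 bond.

    phi: p0..p3 = C(i-1), N(i), CA(i), C(i)
    psi: p0..p3 = N(i), CA(i), C(i), N(i+1)
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For example, `dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))` describes a trans arrangement (180 degrees). Applied to backbone atoms read from a PDB file, the same function gives each residue's φ and ψ.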

Differences between Amino Acids

• The 20 amino acids differ in their R side chains

• Amino acids can be grouped by the chemical properties of their side chains:
  – Hydrophobic
  – Charged
  – Polar

Differences between Amino Acids

• Hydrophobic: Alanine (A), Valine (V), Phenylalanine (F), Proline (P), Methionine (M), Isoleucine (I), and Leucine (L)

• Charged: Aspartic acid (D), Glutamic acid (E), Lysine (K), Arginine (R)

• Polar: Serine (S), Threonine (T), Tyrosine (Y), Histidine (H), Cysteine (C), Asparagine (N), Glutamine (Q), Tryptophan (W)
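To make the grouping concrete, the sketch below tallies side-chain classes in a one-letter sequence using exactly the assignments listed above. Glycine (G) is not assigned to any group on this slide, so the sketch reports it as "unclassified"; the dictionary and function names are hypothetical.

```python
from collections import Counter

# Side-chain classes exactly as grouped on the slide; glycine (G) is not listed there.
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("AVFPMIL", "hydrophobic"),
    **dict.fromkeys("DEKR", "charged"),
    **dict.fromkeys("STYHCNQW", "polar"),
}

def class_composition(seq):
    """Count residues of each side-chain class in a one-letter protein sequence."""
    return Counter(SIDE_CHAIN_CLASS.get(aa, "unclassified") for aa in seq.upper())

if __name__ == "__main__":
    print(class_composition("VLSPADKTNVKAAWG"))
```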

Secondary Structure

Image source: http://www.ebi.ac.uk/microarray/biology_intro.html

Secondary Structures

• Core of each protein made up of regular secondary structures

• Regular patterns of hydrogen bonds are formed between neighboring amino acids

• Amino acids in secondary structures have similar φ and ψ angles

Secondary Structures

• Structures act to neutralize the polar groups on each amino acid

• Secondary structures are tightly packed in the protein core, a hydrophobic environment

• Each amino acid side group has a limited space to occupy -- therefore a limited number of possible interactions

Types of Secondary Structures

• α Helices
• β Sheets
• Loops
• Coils

α Helix

• Most abundant secondary structure

• 3.6 amino acids per turn

• Hydrogen bond formed between the C=O of each residue and the N-H of the residue four positions further along the chain (i to i+4)

• Average length: 10 amino acids, or 3 turns

• Varies from 5 to 40 amino acids

Image source: http://www.hhmi.princeton.edu/sw/2002/psidelsk/scavengerhunt.htm; http://www4.ocn.ne.jp/~bio/biology/protein.htm
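A quick worked check of the numbers on this slide, using only the 3.6 residues per turn:

```latex
\text{turns} = \frac{\text{residues}}{3.6}, \qquad
\frac{10}{3.6} \approx 2.8 \approx 3 \text{ turns}, \qquad
\Delta\theta = \frac{360^\circ}{3.6} = 100^\circ \text{ per residue}
```

So an average 10-residue helix spans roughly three turns, and the 5 to 40 residue range corresponds to about 1.4 to 11 turns.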

α Helix

• Normally found on the surface of protein cores

• Interact with the aqueous environment
  – Inner-facing side has hydrophobic amino acids
  – Outer-facing side has hydrophilic amino acids

α Helix

• Hydrophobic residues tend to recur every third or fourth position (3.6 residues per turn), so they line up along one face of the helix

• Pattern can be detected computationally

• Rich in alanine (A), glutamic acid (E), leucine (L), and methionine (M)

• Poor in proline (P), glycine (G), tyrosine (Y), and serine (S)
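Because a helix has 3.6 residues per turn, residues i, i+3, i+4, and i+7 sit on roughly the same face. One common way to detect this amphipathic pattern computationally is the hydrophobic moment: place each residue on a helical wheel at 100 degrees per residue and add up the hydropathy vectors. The sketch below is an illustration under stated assumptions: it uses the Kyte-Doolittle hydropathy scale (moment calculations often use other scales), and the function name and example sequences are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy scale (positive = hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_moment(seq, angle_deg=100.0, scale=KYTE_DOOLITTLE):
    """Mean hydrophobic moment of `seq` projected onto a helical wheel.

    Residue n is placed at angle n * angle_deg; 100 degrees (360 / 3.6) models an
    alpha helix. A large value means hydrophobic residues cluster on one face.
    """
    x = y = 0.0
    for n, aa in enumerate(seq.upper()):
        theta = math.radians(angle_deg * n)
        x += scale[aa] * math.cos(theta)
        y += scale[aa] * math.sin(theta)
    return math.hypot(x, y) / len(seq)

if __name__ == "__main__":
    print(hydrophobic_moment("LKKLLKKLKKLLKKLKKL"))  # leucines on one face: large moment
    print(hydrophobic_moment("L" * 18))              # uniform sequence: moment near zero
```

Setting `angle_deg=180` instead models a β strand, whose side chains alternate above and below the sheet.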

β Sheet

Image source: http://broccoli.mfn.ki.se/pps_course_96/ss_960723_12.html; http://www4.ocn.ne.jp/~bio/biology/protein.htm

β Sheet

• Hydrogen bonds between 5-10 consecutive amino acids in one portion of the chain with another 5-10 farther down the chain

• Interacting regions may be adjacent with a short loop, or far apart with other structures in between

β Sheet

• Directions:
  – Same: parallel sheet
  – Opposite: anti-parallel sheet
  – Mixed: mixed sheet

• Pattern of hydrogen bond formation in parallel and anti-parallel sheets is different

β Sheet

• Slight counterclockwise rotation

• Alpha carbons (as well as R side groups) alternate above and below the sheet

• Prediction difficult, due to wide range of φ and ψ angles

Interactions in Helices and Sheets

Loop

• Regions between α helices and β sheets

• Various lengths and three-dimensional configurations

• Located on surface of the structure

Loop

• Hairpin loops: a complete turn in the polypeptide chain, joining two strands of an anti-parallel β sheet

• Sequence and structure are more variable than in helices and sheets

• Tend to have charged and polar amino acids

• Frequently a component of active sites

Coil

• Region of secondary structure that is not a helix, sheet, or loop

6 Classes of Protein Structure

1) Class α: bundles of α helices connected by loops on surface of proteins

2) Class β: antiparallel β sheets, usually two sheets in close contact forming sandwich

3) Class α/β: mainly parallel β sheets with intervening α helices; may also have mixed β sheets (metabolic enzymes)

6 Classes of Protein Structure

4) Class α+β: mainly segregated α helices and antiparallel β sheets

5) Multidomain (α and β): proteins containing more than one of the above four domain types

6) Membrane and cell-surface proteins and peptides excluding proteins of the immune system

α Class Protein (hemoglobin)

• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=3hhb;page=;pid=&opt=show&size=250

β Class Protein (T-Cell CD8)

• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1cd8;page=;pid=&opt=show&size=500

α/β Class Protein (tryptophan synthase)

• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=2wsy;page=;pid=&opt=show&size=500

α+β Class Protein (1RNB)

• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1rnb;page=;pid=&opt=show&size=500

Membrane Protein (1OPF)

• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1opf;page=;pid=&opt=show&size=500

Protein Structure Databases

• Databases of three-dimensional structures of proteins, where the structure has been solved using X-ray crystallography or nuclear magnetic resonance (NMR) techniques

• Protein databases:
  – PDB
  – SCOP
  – Swiss-Prot
  – PIR

Protein Structure Databases

• Most extensive for 3-D structure is the Protein Data Bank (PDB)

• Current release of PDB (April 8, 2003) has 20,622 structures

Partial PDB File

ATOM 1 N VAL A 1 6.452 16.459 4.843 7.00 47.38 3HHB 162
ATOM 2 CA VAL A 1 7.060 17.792 4.760 6.00 48.47 3HHB 163
ATOM 3 C VAL A 1 8.561 17.703 5.038 6.00 37.13 3HHB 164
ATOM 4 O VAL A 1 8.992 17.182 6.072 8.00 36.25 3HHB 165
ATOM 5 CB VAL A 1 6.342 18.738 5.727 6.00 55.13 3HHB 166
ATOM 6 CG1 VAL A 1 7.114 20.033 5.993 6.00 54.30 3HHB 167
ATOM 7 CG2 VAL A 1 4.924 19.032 5.232 6.00 64.75 3HHB 168
ATOM 8 N LEU A 2 9.333 18.209 4.095 7.00 30.18 3HHB 169
ATOM 9 CA LEU A 2 10.785 18.159 4.237 6.00 35.60 3HHB 170
ATOM 10 C LEU A 2 11.247 19.305 5.133 6.00 35.47 3HHB 171
ATOM 11 O LEU A 2 11.017 20.477 4.819 8.00 37.64 3HHB 172
ATOM 12 CB LEU A 2 11.451 18.286 2.866 6.00 35.22 3HHB 173
ATOM 13 CG LEU A 2 11.081 17.137 1.927 6.00 31.04 3HHB 174
ATOM 14 CD1 LEU A 2 11.766 17.306 .570 6.00 39.08 3HHB 175
ATOM 15 CD2 LEU A 2 11.427 15.778 2.539 6.00 38.96 3HHB 176

Description of PDB File

• Second column: atom serial number

• Fourth column: amino acid (residue) name

• Sixth column: amino acid position in the polypeptide chain

• Columns 7, 8, and 9: x, y, and z coordinates (in angstroms)

• 11th column: temperature factor, which can be used as a measure of uncertainty
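PDB ATOM records are defined by fixed character columns rather than by whitespace (adjacent fields can run together, for example with negative coordinates), so a robust reader slices each line by position; the listing above shows the fields separated by single spaces. A minimal sketch, assuming the standard PDB column layout (x, y, z in columns 31-54, occupancy in 55-60, temperature factor in 61-66); `Atom` and `format_atom` are hypothetical helpers added only for a round-trip demonstration.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float

def parse_atom(line):
    """Parse one ATOM record by fixed column positions (0-based slices)."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        temp_factor=float(line[60:66]),
    )

def format_atom(a):
    """Hypothetical helper: write an Atom in the same fixed columns (first 66 columns only)."""
    name = a.name if len(a.name) == 4 else " " + a.name.ljust(3)
    return (f"ATOM  {a.serial:5d} {name:<4} {a.res_name:>3} {a.chain:1}{a.res_seq:4d}    "
            f"{a.x:8.3f}{a.y:8.3f}{a.z:8.3f}{a.occupancy:6.2f}{a.temp_factor:6.2f}")

if __name__ == "__main__":
    # First atom of the 3HHB excerpt above, written out and read back
    line = format_atom(Atom(1, "N", "VAL", "A", 1, 6.452, 16.459, 4.843, 7.00, 47.38))
    print(line)
    print(parse_atom(line))
```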

Protein Structure Classification Databases

• Structural Classification of Proteins (SCOP)

• Based on expert definition of structural similarities

• SCOP classifies proteins hierarchically by class, fold, superfamily, and family

• http://scop.mrc-lmb.cam.ac.uk/scop/

Protein Structure Classification Databases

• Classification by class, architecture, topology, and homology (CATH)

• Classifies proteins into hierarchical levels, starting with class

• α/β and α+β are considered to be a single class

• http://www.biochem.ucl.ac.uk/bsm/cath/

Protein Structure Classification Databases

• Molecular Modeling Database (MMDB)

• Structures from PDB categorized into structurally related groups using VAST (Vector Alignment Search Tool)

• looks for similar arrangements of secondary structural elements

• http://www.ncbi.nlm.nih.gov/Entrez

Protein Structure Classification Databases

• Spatial Arrangement of Backbone Fragments (SARF)

• categorized on structural similarities, similar to the MMDB

• http://www-lmmb.ncifcrf.gov/~nicka/sarf2.html

Visualization of Proteins

• A number of programs convert atomic coordinates of 3-D structures into views of the molecule

• allow the user to manipulate the molecule by rotation, zooming, etc.

• Critical in drug design -- yields insight into how the protein might interact with ligands at active sites

Visualization of Proteins

• Most popular program for viewing 3-dimensional structures is Rasmol

Rasmol: http://www.umass.edu/microbio/rasmol/
Chime: http://www.umass.edu/microbio/chime/
Cn3D: http://www.ncbi.nlm.nih.gov/Structure/
Mage: http://kinemage.biochem.duke.edu/website/kinhome.html
Swiss 3D viewer: http://www.expasy.ch/spdbv/mainpage.html

Alignment of Protein Structure

• Three-dimensional structure of one protein compared against three-dimensional structure of second protein

• Atoms fit together as closely as possible to minimize the average deviation

• Structural similarity between proteins does not necessarily mean evolutionary relationship
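One standard way to fit atoms together as closely as possible is to find the rigid rotation and translation that minimize the root-mean-square deviation (RMSD) between equivalent atoms. The sketch below uses the Kabsch algorithm with NumPy; it assumes the atom correspondence (which atom in one protein matches which in the other) is already known, which is the part that structural alignment methods are designed to find. The function name is hypothetical.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two equal-length (N, 3) coordinate sets after optimal superposition.

    Kabsch algorithm: center both sets, find the rotation that best maps P onto Q
    from the SVD of their covariance matrix, then measure the remaining deviation.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)  # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc            # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

A rotated and shifted copy of a structure should give an RMSD of (numerically) zero, while any change in internal geometry leaves a positive residual.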

Alignment of Protein Structure

• Positions of atoms in three-dimensional structures compared

• Look for positions of secondary structural elements (helices and strands) within a protein domain

Alignment of Protein Structure

• Distances between carbon atoms are examined to determine the degree to which the structures can be superimposed

• Side-chain information can be incorporated (e.g., whether a residue is buried or exposed)

SSAP

• Sequential Structure Alignment Program

• Incorporates double dynamic programming to produce a structural alignment between two proteins

Steps in SSAP

• 1) Calculate vectors from the Cβ of one amino acid to a set of nearby amino acids
  – Vectors from the two proteins are compared
  – The difference (expressed as an angle) is calculated and converted to a score

• 2) Matrix for scores of vector differences from one protein to the next is computed.

Steps in SSAP

• 3) Optimal alignment found using global dynamic programming, with a constant gap penalty

• 4) Next amino acid residue considered, optimal path to align this amino acid to the second sequence computed

Steps in SSAP

• 5) Alignments transferred to a summary matrix
  – If paths cross the same matrix position, scores are summed
  – If part of an alignment path is found in both matrices, this is evidence of similarity

Steps in SSAP

• 6) Dynamic programming alignment is performed on the summary matrix
  – The final alignment represents the optimal alignment between the protein structures
  – The resulting score is converted so it can be used to judge how closely related the two structures are
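The six SSAP steps can be illustrated with a much simplified double dynamic programming sketch. Several assumptions depart from SSAP itself: each residue's "view" is the list of Cα-Cα distances to every other residue (instead of Cβ vectors), the similarity score a / (|difference| + b) with a = 50 and b = 2 is hypothetical, the gap penalty is a constant -1, and the lower-level path is not forced through the residue pair being compared.

```python
import math

def global_align(score, gap):
    """Global alignment of a score matrix with a constant gap penalty.

    Returns (best total score, list of aligned (row, column) index pairs).
    """
    n, m = len(score), len(score[0])
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score[i - 1][j - 1],
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the aligned pairs
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return F[n][m], pairs[::-1]

def distance_matrix(coords):
    return [[math.dist(p, q) for q in coords] for p in coords]

def double_dp(A, B, gap=-1.0, a=50.0, b=2.0):
    """Simplified SSAP-style double dynamic programming on two C-alpha coordinate lists."""
    dA, dB = distance_matrix(A), distance_matrix(B)
    n, m = len(A), len(B)
    summary = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Lower level: compare residue i's view of protein A with residue j's view of B
            view = [[a / (abs(dA[i][k] - dB[j][l]) + b) for l in range(m)] for k in range(n)]
            _, path = global_align(view, gap)
            for k, l in path:  # accumulate path scores in the summary matrix
                summary[k][l] += view[k][l]
    return global_align(summary, gap)  # final alignment on the summary matrix
```

As a sanity check, `double_dp(A, A)` for a coordinate list A should align each residue with itself.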

Distance Matrix Approach

• Uses graphical procedure similar to dot plots

• Identifies atoms that lie most closely together in three-dimensional structure

• Two sequences with similar structure can have dot plots superimposed

Distance Matrix Approach

• Values in distance matrix represent distance between the Cα atoms in the three dimensional structure

• positions of closest packing atoms marked with a dot to highlight regions of interest

• Similar groups superimposed as closely as possible by minimizing sum of atomic distances
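A minimal sketch of the distance-matrix idea: compute all Cα-Cα distances, mark pairs closer than a cutoff with a dot, and overlay two such plots to see shared close contacts. The 8 Å cutoff, the characters used in the text plot, and the function names are assumptions for illustration; DALI's actual scoring is more elaborate.

```python
import math

def ca_distance_matrix(coords):
    """All pairwise distances (in angstroms) between C-alpha coordinates (x, y, z)."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def dot_plot(coords, cutoff=8.0):
    """One text row per residue: '*' where two C-alpha atoms lie within `cutoff`, '.' otherwise."""
    return ["".join("*" if d <= cutoff else "." for d in row)
            for row in ca_distance_matrix(coords)]

def overlay(plot_a, plot_b):
    """Superimpose two equal-size dot plots: '#' = contact in both, '*' = in one only."""
    return ["".join("#" if a == b == "*" else ("*" if "*" in (a, b) else ".")
                    for a, b in zip(row_a, row_b))
            for row_a, row_b in zip(plot_a, plot_b)]

if __name__ == "__main__":
    chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print("\n".join(dot_plot(chain)))
```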

DALI

• Distance matrix ALIgnment (DALI)

• Uses distance matrix method to align protein structures

• Assembly step uses Monte Carlo simulation to find submatrices that can be aligned

• Existing structures that have been compared are organized into the FSSP database

Fast Structural Similarity Search

• Compare types and arrangements of secondary structures within two proteins

• If elements similarly arranged, three-dimensional structures are similar

• VAST and SARF are programs that use these fast methods

Structural Motifs Based on Sequence Analysis

• Some structural elements can be determined by looking at sequence composition:
  – zinc finger motifs
  – leucine zippers
  – coiled-coil structures

Zinc Finger Motifs

• Found by looking at order and spacing of cysteine and histidine residues

• Typical zinc finger motifs are composed of two cysteines followed by two histidines

Image source: www.bmb.psu.edu/faculty/tan/lab/tanlab_gallery_protdna.html
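The order-and-spacing search can be expressed as a regular expression. The sketch assumes the commonly quoted C2H2 consensus C-x(2,4)-C-x(12)-H-x(3,5)-H (x = any residue); motif databases use more refined patterns, so treat both the pattern and the function name as illustrative.

```python
import re

# Assumed C2H2 consensus: C-x(2,4)-C-x(12)-H-x(3,5)-H, where x is any residue
C2H2 = re.compile(r"C.{2,4}C.{12}H.{3,5}H")

def find_zinc_fingers(seq):
    """Return (start, end) spans of non-overlapping matches to the assumed C2H2 pattern."""
    return [(m.start(), m.end()) for m in C2H2.finditer(seq.upper())]

if __name__ == "__main__":
    print(find_zinc_fingers("MK" + "CAAC" + "K" * 12 + "HAAAH" + "GG"))
```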

Leucine Zippers

• Found by looking for two parallel alpha helices held together

• Interactions between hydrophobic leucine residues found at every seventh position in each helix

Image source: ww2.mcgill.ca/biology/undergra/c200a/sec3-5.htm
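The seven-residue spacing translates directly into a pattern search. This sketch assumes a zipper candidate is four leucines at exactly seven-residue spacing; real predictors tolerate substitutions, so the count of four and the function name are hypothetical.

```python
import re

# Assumed pattern: four leucines spaced exactly seven residues apart (L-x6-L-x6-L-x6-L).
# The lookahead lets overlapping candidates all be reported.
HEPTAD_LEU = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")

def find_leucine_zippers(seq):
    """Start positions of every run of four leucines at seven-residue spacing."""
    return [m.start() for m in HEPTAD_LEU.finditer(seq.upper())]

if __name__ == "__main__":
    print(find_leucine_zippers("MS" + "LAEKAEE" * 4))
```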

Transmembrane Proteins

• Cross the membrane back and forth as alpha helices

• Typical length of a transmembrane helix: 20-30 residues

• Transmembrane alpha helices have hydrophobic residues on the inside facing portions, and hydrophilic residues on the outside

Image source: http://www.northwestern.edu/neurobiology/faculty/pinto2/pinto_12big.jpg
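A simpler technique than PHDhtm or Tmpred is a hydropathy window: average a hydrophobicity scale over a sliding window about the length of a membrane-spanning helix and report stretches above a cutoff. The sketch assumes the Kyte-Doolittle scale with its commonly cited window of 19 residues and threshold of 1.6; segment boundaries are approximate because every window that overlaps the hydrophobic stretch enough is counted. Function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Average hydropathy of each full window; entry i covers seq[i:i + window]."""
    vals = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]

def predict_tm_segments(seq, window=19, threshold=1.6):
    """Merge consecutive above-threshold windows into (start, end_exclusive) segments."""
    segments, start = [], None
    prof = hydropathy_profile(seq, window)
    for i, v in enumerate(prof):
        if v >= threshold and start is None:
            start = i
        elif v < threshold and start is not None:
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(prof) - 1 + window))
    return segments

if __name__ == "__main__":
    print(predict_tm_segments("K" * 20 + "L" * 22 + "K" * 20))
```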

Membrane Prediction Programs

• PHDhtm: employs neural network approach; neural network trained to recognize sequence patterns and variations of helices in transmembrane proteins of known structures

• Tmpred: functions by searching a protein against a sequence scoring matrix obtained by aligning the sequences of all known transmembrane alpha helix regions

Chou-Fasman Method

• Based on analyzing the frequency of amino acids in different secondary structures
  – A, E, L, and M are strong predictors of alpha helices
  – P and G are predictors of a break in a helix

• Table of predictive values created for alpha helices, beta sheets, and loops

• The structure with the greatest overall prediction value, provided that value is greater than 1, is assigned
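The core of the Chou-Fasman idea is a conformational parameter: how much more often a residue type occurs in a given structure than it does overall. The sketch below estimates these parameters from sequences with known structure, then assigns each residue the state with the highest window-averaged value, if that value exceeds 1. It omits the method's nucleation and extension rules, and the training data in the usage example is a hypothetical toy set, not the published table.

```python
from collections import Counter

def propensities(seqs, labels):
    """Chou-Fasman-style conformational parameters from sequences of known structure.

    P[(aa, s)] = (fraction of state-s residues that are aa) / (fraction of all residues
    that are aa). Values above 1 mean aa is over-represented in state s.
    """
    aa_count, state_count, pair_count = Counter(), Counter(), Counter()
    for seq, lab in zip(seqs, labels):
        for aa, s in zip(seq, lab):
            aa_count[aa] += 1
            state_count[s] += 1
            pair_count[aa, s] += 1
    n = sum(aa_count.values())
    return {(aa, s): (pair_count[aa, s] / state_count[s]) / (aa_count[aa] / n)
            for aa in aa_count for s in state_count}

def predict(seq, P, states, half=3, default="C"):
    """Give each residue the state whose propensity, averaged over residues i-half..i+half,
    is largest, provided that average exceeds 1; otherwise use `default`."""
    out = []
    for i in range(len(seq)):
        win = seq[max(0, i - half): i + half + 1]
        best, best_val = default, 1.0
        for s in states:
            # Residues never seen in training count as neutral (1.0)
            avg = sum(P.get((aa, s), 1.0) for aa in win) / len(win)
            if avg > best_val:
                best, best_val = s, avg
        out.append(best)
    return "".join(out)

if __name__ == "__main__":
    # Hypothetical toy training data: H = helix, C = coil
    P = propensities(["AAAA", "GGGG"], ["HHHH", "CCCC"])
    print(predict("AAGG", P, "HC", half=1))
```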

GOR Method

• Improves upon the Chou-Fasman method

• Assumes the amino acids surrounding the central amino acid influence the secondary structure that the central amino acid is likely to adopt

• Scoring matrices used in the GOR method incorporate information theory and Bayesian statistics

• Mount, pp. 450-451
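A simplified sketch in the spirit of the GOR method: for every residue at offset m (from -half to +half) around a position, add the information log(P(S | residue at m) / P(S)) for each candidate state S, then pick the state with the largest total. The add-one smoothing, the window half-width, and the toy training data are assumptions; published GOR versions use tables derived from many known structures and more refined statistics.

```python
import math
from collections import Counter, defaultdict

def train_gor(seqs, labels, half=8):
    """Count, for each offset m and residue type aa, the states of positions that have aa at m."""
    counts = defaultdict(Counter)  # (m, aa) -> Counter of states
    states = Counter()
    for seq, lab in zip(seqs, labels):
        for i, s in enumerate(lab):
            states[s] += 1
            for m in range(-half, half + 1):
                if 0 <= i + m < len(seq):
                    counts[m, seq[i + m]][s] += 1
    return counts, states

def predict_gor(seq, counts, states, half=8):
    """For each residue, pick the state S maximising the summed information
    log(P(S | residue at offset m) / P(S)), with add-one smoothing over states."""
    total, k = sum(states.values()), len(states)
    out = []
    for i in range(len(seq)):
        best, best_score = None, -math.inf
        for s, n_s in states.items():
            score = 0.0
            for m in range(-half, half + 1):
                if 0 <= i + m < len(seq):
                    c = counts.get((m, seq[i + m]), Counter())
                    p_s_given_aa = (c[s] + 1) / (sum(c.values()) + k)
                    score += math.log(p_s_given_aa / (n_s / total))
            if score > best_score:
                best, best_score = s, score
        out.append(best)
    return "".join(out)

if __name__ == "__main__":
    # Hypothetical toy training data: H = helix, C = coil
    counts, states = train_gor(["AAAA", "GGGG"], ["HHHH", "CCCC"], half=1)
    print(predict_gor("AAGG", counts, states, half=1))
```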

Neural Network Models

• Programs trained to recognize amino acid patterns located in known secondary structures

• distinguish these patterns from patterns not located in structures

• PHD and NNPREDICT use neural networks

Nearest-neighbor

• Machine learning method

• Secondary structure conformation of an amino acid is calculated by identifying sequences of known structure that are similar to the query in the surrounding amino acids

• Nearest-neighbor programs include PSSP, Simpa96, SOPM, and SOPMA
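A minimal nearest-neighbor sketch: cut the query and a library of sequences with known structure into fixed-width windows, score window similarity as the number of identical residues, and let the k best-matching library windows vote on the central residue's state. Identity scoring, the window half-width, k, the '-' padding, and the toy data are all assumptions; the programs listed above use more sophisticated similarity measures and much larger libraries.

```python
from collections import Counter

def windows(seq, half):
    """Window of 2*half+1 residues centred on each position; '-' pads past the chain ends."""
    padded = "-" * half + seq + "-" * half
    return [padded[i:i + 2 * half + 1] for i in range(len(seq))]

def predict_nn(query, seqs, labels, half=3, k=3):
    """Label each query residue by majority vote of the k most similar library windows,
    where similarity is the number of identical positions."""
    library = [(w, s) for seq, lab in zip(seqs, labels)
               for w, s in zip(windows(seq, half), lab)]
    out = []
    for qw in windows(query, half):
        ranked = sorted(library, key=lambda ws: -sum(a == b for a, b in zip(qw, ws[0])))
        votes = Counter(s for _, s in ranked[:k])
        out.append(votes.most_common(1)[0][0])
    return "".join(out)

if __name__ == "__main__":
    # Hypothetical toy library: H = helix, C = coil
    print(predict_nn("AAGG", ["AAAA", "GGGG"], ["HHHH", "CCCC"], half=1, k=3))
```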

Prediction of 3d Structures

• Threading is the most robust technique
  – Time consuming
  – Requires knowledge of protein structure

Threading

• Searches for structures with similar folds without sequence similarity

• Threading takes a sequence with unknown structure and threads it through the coordinates of a target protein whose structure has been solved by:
  – X-ray crystallography
  – NMR spectroscopy

Threading

• Considered position by position subject to predetermined constraints

• Thermodynamic calculations made to determine the most energetically favorable and conformationally stable alignment

Environmental Template

• Environment of each amino acid in each known structural core is determined:
  – secondary structure
  – area of side chain buried by closeness to other atoms
  – types of nearby side chains

Environmental Template

• Each position classified into one of 18 types (6 burial levels × 3 secondary-structure classes):
  – 6 representing increasing levels of residue burial
  – three classes of secondary structure (alpha helices, beta sheets, and loops)
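The 18 environment types are the combinations of 6 burial levels and 3 secondary-structure classes. The sketch below assumes burial is given as the fraction of the side chain that is buried, split into six equal-width levels; both that binning and the function name are hypothetical.

```python
SS_CLASSES = ("helix", "sheet", "loop")
N_BURIAL_LEVELS = 6

def environment_class(ss, fraction_buried):
    """Map a position to one of 3 x 6 = 18 environment classes (0..17).

    fraction_buried in [0, 1] is split into six equal-width burial levels (an assumption);
    the class index is ss_index * 6 + burial_level.
    """
    if ss not in SS_CLASSES:
        raise ValueError(f"unknown secondary structure: {ss!r}")
    level = min(int(fraction_buried * N_BURIAL_LEVELS), N_BURIAL_LEVELS - 1)
    return SS_CLASSES.index(ss) * N_BURIAL_LEVELS + level

if __name__ == "__main__":
    print(environment_class("sheet", 0.5))
```

Threading then scores how well each amino acid of the query suits the environment class of the template position it is placed on.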

Upcoming Seminars

• Topic TBA: Rafael Irizarry, Johns Hopkins University

• Friday, 4/23/2004
• 8:30 AM – 9:30 AM
• LOCATION: K-Building Room 2036 (HSC Campus)

Presentations

• 4:45 – 5:00 Richard Jones
• 5:00 – 5:15 Steven Xu
• 5:15 – 5:30 Olutola Iyun
• 5:30 – 5:45 Frank Baker
• 5:45 – 6:00 Guanghui Lan
• 6:00 – 6:15 Tim Hardin
• 6:15 – 6:30 Satish Bollimpalli & Ravi Gundlapalli