Lessons on Protein Structure from Lattice Model
HC Lee 李弘謙
Nanjing University Nanjing, China 2002 May 22 – 25
What is a protein?
• Large molecule:
chain of amino acids
• Several tens to thousands of residues
• Folds to specific shape
• Biological machines
DNA & Gene
Now we know, for higher life forms: one gene, many proteins
Transcription and Translation
Gene to Protein
What do proteins do?
• Links Genotype & Phenotype
• Structural and Functional
– Structural: blood, muscle, bone, etc.
– Functional: catalytic (enzyme), metabolic, neural, reproductive
Aberrant gene > malfunctioning protein > disease
Protein Conformation
Alpha helix
Beta sheets
HIV reverse transcriptase
Understanding protein folding
Driving Force for Protein Folding
-Most important is interaction of residues with water – hydrophobic and hydrophilic
Miyazawa-Jernigan Statistical Interaction
Li-Tang-Wingreen’s representation of MJ Matrix
one-body, two-body
Theoretical analysis [Wang & Lee, PRL 84 (2000)]
Fit to one-body (a) and two-body (b) terms
[Figure: MJ matrix vs. theory]
Compare with MJ-matrix
Correct to first order; dominated by the one-body term (hydrophobicity)
Lattice Model
-Simple way to learn something about a very complex subject
Lattice model
• Represent space (or, in field theory, space-time) by a discrete lattice.
• Represent a structure by a path on the lattice.
• A peptide is a string of residues.
• A peptide whose residues occupy a path is in a state, or has a conformation.
• Residues may interact with each other according to their relative distance; or,
• In the mean-field model, a residue interacts only with lattice sites.
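To make the idea of a peptide occupying a path concrete, here is a minimal sketch in Python. It assumes a 2D square lattice with integer coordinates; the function names and the two example paths are illustrative, not taken from the talk. It checks that a path is a valid self-avoiding walk and counts contacts, i.e. residue pairs that are lattice neighbours without being chain neighbours.

```python
from typing import List, Tuple

Site = Tuple[int, int]


def is_self_avoiding_path(path: List[Site]) -> bool:
    """True if no site repeats and consecutive sites are lattice neighbours."""
    if len(set(path)) != len(path):
        return False
    return all(
        abs(x1 - x2) + abs(y1 - y2) == 1
        for (x1, y1), (x2, y2) in zip(path, path[1:])
    )


def count_contacts(path: List[Site]) -> int:
    """Count residue pairs that touch on the lattice but are not bonded."""
    index = {site: i for i, site in enumerate(path)}
    contacts = 0
    for i, (x, y) in enumerate(path):
        # Look only right and up so that each pair is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index.get(neighbour)
            if j is not None and abs(i - j) > 1:
                contacts += 1
    return contacts


# A compact path filling a 3x3 square, and a fully extended path.
compact = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1), (0, 2), (1, 2), (2, 2)]
extended = [(i, 0) for i in range(9)]

print(count_contacts(compact), count_contacts(extended))
```

The compact path has contacts while the extended one has none, which is the sense in which compactness and binding go together in such models.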
Putting a binary peptide on 2D lattice
Random coil and compact path
Binary representation of peptide: 0101011010010110010110010
• The most important interaction for protein folding is that of residues with water: residues are hydrophobic or hydrophilic.
• In a real protein in its native conformation, hydrophobic residues tend to be buried and hydrophilic residues tend to be exposed to water.
• Simplest model: divide residues into hydrophobic and hydrophilic, structure into core and surface sites.
• Both peptide and structure are binary sequences.
Mean-Field HP Model
Structure-path on a 2D lattice
Pay attention only to whether the path is on a core (1) or a surface (0) site
Structure has a binary representation: 001100110000110000110011000011111100 (from Li et al. PRL 79 (1997) 765-768)
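The mapping from a path to its binary string can be sketched as follows. This assumes that "core" means a site not on the boundary of an n x n square, which is consistent with the 36-character string above containing sixteen 1s (the 4 x 4 interior of a 6 x 6 lattice). The serpentine path and the function names are illustrative choices, not from the talk.

```python
from typing import List, Tuple

Site = Tuple[int, int]


def snake_path(n: int) -> List[Site]:
    """A simple compact path: sweep each row, alternating direction."""
    path: List[Site] = []
    for y in range(n):
        xs = range(n) if y % 2 == 0 else range(n - 1, -1, -1)
        path.extend((x, y) for x in xs)
    return path


def structure_string(path: List[Site], n: int) -> str:
    """Write 1 for each core (interior) site and 0 for each surface site."""
    return "".join(
        "1" if 0 < x < n - 1 and 0 < y < n - 1 else "0" for x, y in path
    )


print(structure_string(snake_path(4), 4))  # 0000011001100000
```

Different compact paths visit the same core sites in a different order, so each path yields its own binary string with the same number of 1s.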
Designability of Structures
-Very, very few structures are good for proteins
Structure space >> observed structures
Protein Designability
The LTW model
The ground state of peptide p is the structure s closest to it in n-dimensional hyperspace. All peptides in the Voronoi volume of s have s as their ground state.
The Hamiltonian H = ½ (p – s)² is a mapping from the set of peptides P to the set of structures S that partitions P into equivalence classes labeled by s in S. The target of each class is the ground state (conformation) of the class.
Designability of a structure is the number of peptides in the class mapped to that structure
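A brute-force sketch of this definition, assuming binary peptides and structures of equal length, so that H is half the Hamming distance. The three length-4 "structures" are hypothetical toy strings, not real lattice paths, and skipping peptides whose lowest energy is shared by more than one structure is a choice made here for the sketch.

```python
from itertools import product
from typing import Dict, List


def hamiltonian(peptide: str, structure: str) -> float:
    """H = 1/2 * sum_i (p_i - s_i)^2 for binary strings of equal length."""
    return 0.5 * sum((int(p) - int(s)) ** 2 for p, s in zip(peptide, structure))


def designability(structures: List[str]) -> Dict[str, int]:
    """Count, for each structure, the peptides that have it as unique ground state."""
    n = len(structures[0])
    counts = {s: 0 for s in structures}
    for bits in product("01", repeat=n):
        peptide = "".join(bits)
        ranked = sorted((hamiltonian(peptide, s), s) for s in structures)
        # Assumption of this sketch: peptides with a degenerate ground state are skipped.
        if len(ranked) == 1 or ranked[0][0] < ranked[1][0]:
            counts[ranked[0][1]] += 1
    return counts


toy_structures = ["0110", "0011", "1100"]  # hypothetical, for illustration only
print(designability(toy_structures))
```

Even in this tiny example the classes are unequal in size. Enumerating all 2^n peptides is only feasible for short chains.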
Voronoi volume
In hyperspace, all peptide sequences within the Voronoi volume of a structure are closest to that structure (from Li et al. PRL (1997)).
No. of structures vs designability
Li, Tang and Wingreen, PRL (1997)
Very few structures have high designability
[Figure axes: designability; number of structures]
• Shortest possible Hamming distance between two paths is proportional to the difference in their switchback numbers (n10)
• Few paths have high n10
• Path with high n10 has large Voronoi volume, hence high designability
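The two quantities in these bullets can be computed directly from binary structure strings. The sketch below assumes that n10 counts the occurrences of "10" in the string (a core site followed by a surface site); that reading of the subscript is an assumption made here, and the function names are illustrative.

```python
def switchback_number(structure: str) -> int:
    """Count occurrences of '10' (assumed meaning of n10)."""
    return sum(1 for a, b in zip(structure, structure[1:]) if a + b == "10")


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


# The 6x6 structure string quoted earlier from Li et al.
example = "001100110000110000110011000011111100"
print(switchback_number(example))
print(hamming_distance("0110", "0011"))
```

For binary strings the Hamming distance equals (p – s)², so a structure far from all others in Hamming distance has a large Voronoi volume.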
Paths with high switchback numbers have high designability
[Shih et al. & HCL, PRL 84 (2000)]
High switchback number > high designability
Distribution of Hamming distances
Log distribution vs switchback number
Designability vs n10; (a) 6x6 (b) 21-site triangular
Foldability of Peptides
-Vast majority of peptides do not fold
Alpha helices like paths with high switchback numbers
• Conformational degeneracy disfavors peptides with long strings of identical or similar residues
• Hence proteins rarely have long strings of contiguous hydrophobic or hydrophilic residues
• Alternating short stretches of hydrophobic and hydrophilic residues yield structurally non-degenerate and robust conformations
• The 0011 switchback motif simulates an alpha helix on the surface
• Empirically, most alpha helices are on the surface
Compare with real proteins
• Compare model high-designability peptides with binarized (by hydrophobicity) protein sequences in PDB
– Represent a peptide by the frequency of occurrence of the set of all binary words of fixed length l = 2k
– There are 2^(2k) such words; put the frequencies on a 2^k x 2^k lattice
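A minimal sketch of the word-frequency representation. Two details are assumptions made here rather than taken from the talk: words are read with an overlapping sliding window, and a word is placed on the lattice with its first k bits as the row index and its last k bits as the column index.

```python
from typing import List


def word_frequency_grid(sequence: str, k: int) -> List[List[float]]:
    """Frequencies of all binary words of length 2k on a 2^k x 2^k grid."""
    length = 2 * k
    size = 2 ** k
    total = len(sequence) - length + 1
    if total <= 0:
        raise ValueError("sequence is shorter than the word length")
    counts = [[0] * size for _ in range(size)]
    for i in range(total):
        word = sequence[i:i + length]
        row = int(word[:k], 2)  # first k bits (assumed layout)
        col = int(word[k:], 2)  # last k bits (assumed layout)
        counts[row][col] += 1
    return [[c / total for c in row] for row in counts]


# The binary peptide shown earlier, with words of length 2 (k = 1).
peptide = "0101011010010110010110010"
for row in word_frequency_grid(peptide, 1):
    print(row)
```

Two sets of sequences can then be compared through their grids; the overlap measure used in the cited work is not reproduced here.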
[Shih et al. & HCL, PRE 65 (2002)]
[Figure: overlap of binary sequences vs. oligomer length; curves: PDBAlpha-HP, HP-LS, PDBAll-PDBAlpha, PDBAll-HP]
Highly foldable peptides in the HP model resemble alpha-helices in real proteins
[Shih et al. PRL 84 (2000)]
In the HP model, peptides that fold into high-designability conformations correspond to peptides that fold into alpha helices in real proteins
Many models give designability, but not all are correct
• Any Hamiltonian (H) is a mapping of peptide space (P) onto conformation space (C)
• For coarse-grained C, H partitions P into equivalence classes, each class corresponding to a point in C
• Designability results from a highly skewed distribution of the SIZES of the classes
• Example. The LS (Large-Small) model: structure dominated by steric effect; small residues inside, large residues outside. Almost same math as HP model; has designability but wrong physics.
[Figure: overlap of binary sequences vs. oligomer length; curves: PDBAlpha-LS, HP-LS, PDBAll-PDBAlpha, PDBAll-LS]
Highly foldable peptides in the LS model do not resemble alpha-helices in real proteins
[Shih et al. PRL 84 (2000)]
Unlike hydrophobicity, the steric effect does not play a dominant role in the determination of native structure
Folding Funnel and Free-energy Barrier
-Why is folding so fast yet so slow?
Folding funnel
Folding Funnel
Folding funnel (picture)
http://www.npaci.edu/envision/v15.4/proteinfolding.html
Free energy and entropy
Free Energy, Entropy and Monte Carlo
Free-energy barrier
Free-energy barrier [Guan, Su, Shih & Lee (2000)]
(a) Binding energy increases with compactness
(b) Entropy is lost rapidly as binding energy increases
(c) Free-energy barrier formed by competition between energy gain and entropy loss
[Figure: three panels; axis labels: |E/Enative|, number of contacts, log(S), G = (E – TS)/Enative; annotations: low T, high T, annealing, barrier]
Getting over the barrier takes all the folding time
Summary of lessons
• Average hydrophobic/hydrophilic property of residues can be understood by simple physics.
• Lattice model useful for examining coarse-grain phenomena.
• Long folding time is caused by the need to surmount a free-energy barrier formed by rapid loss of entropy.
• Designability of structure is a direct consequence of the hydrophobic/hydrophilic dichotomy of residues.
• Very few structures are highly designable; those that are have large switchback numbers.
• Very few peptides are foldable; many of those that are alternate rapidly between hydrophobic and hydrophilic residues.
• Highly foldable peptides folded into high designability structures form robust proteins.
• They fold easily into alpha-helices and, to a lesser extent, into beta-sheets; hence alpha-helices are formed very early in the folding process, followed by beta-sheets.
Molecular Dynamics - atomistic description of protein folding
-A one-gigaflop PC would need to run for one million days to fold a medium-small protein
Massively Distributive Computation
• Molecular dynamics
– Atomistic-level simulation is needed to understand protein folding and function relevant to biology and drug design
• Annealing time very long
– Boltzmann probability:
one machine x 1 M days = 1 M machines x one day
• Starting a program of massively distributive computation - use screen saver program for simulation
• of Vijay Pande, Stanford
The End. Thank you, everyone.