Forces and Prediction of Protein Structure
Ming-Jing Hwang (黃明經)
Institute of Biomedical Sciences, Academia Sinica
http://gln.ibms.sinica.edu.tw/
Science 2005
Sequence - Structure - Function
MADWVTGKVTKVQNWTDALFSLTVHAPVLPFTAGQFTKLGLEIDGERVQRAYSYVNSPDNPDLEFYLVTVPDGKLSPRLAALKPGDEVQVVSEAAGFFVLDEVPHCETLWMLATGTAIGPYLSILR
Sequence/Structure Gap
Current (May 15, 2007) entries in protein sequence and structure databases:
- SWISS-PROT/TREMBL: 267,354/4,361,897
- PDB: 43,459
[Figure: number of entries vs. year, for sequence and structure databases]
Structural Bioinformatics: Sequence/Structure Relationship
All possible sequences of amino acids
Protein sequences observed in nature
Protein structures observed in nature
[Figure: percent identity scale from 100 down to 0, with the twilight zone and the midnight zone marked]
Structure Prediction Methods
[Figure: ab initio, fold recognition, and homology modeling placed along a % sequence identity axis from 0 to 100]
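The axis on this slide is percent sequence identity between the target and a known structure. As a minimal sketch, the quantity can be computed from a pairwise alignment as below. The choice of denominator (aligned, non-gap columns) is one common convention among several, and the cutoffs in `suggest_method` are illustrative assumptions, not values taken from the slide.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def suggest_method(identity: float,
                   homology_cutoff: float = 30.0,   # assumed, illustrative
                   fold_cutoff: float = 20.0) -> str:  # assumed, illustrative
    """Rough rule of thumb mapping % identity to a prediction approach."""
    if identity >= homology_cutoff:
        return "homology modeling"
    if identity >= fold_cutoff:
        return "fold recognition"
    return "ab initio"


if __name__ == "__main__":
    pid = percent_identity("MADW-", "MACWV")
    print(pid, suggest_method(pid))
```

In practice the ranges of the three methods overlap, so a hard cutoff like this is only a starting point.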
Levinthal’s paradox (1969)
If we assume three possible states for every flexible dihedral angle in the backbone of a 100-residue protein, the number of possible backbone configurations is 3^200. Even an incredibly fast computational or physical sampling in 10^-15 s would mean that a complete sampling would take 10^80 s, which exceeds the age of the universe by more than 60 orders of magnitude.
Yet proteins fold in seconds or less!
Berendsen
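The arithmetic behind the paradox can be checked in a few lines. This is a sketch working in log10 to keep the numbers readable. It assumes 200 flexible backbone dihedrals (two per residue for 100 residues) and an age of the universe of about 4.3 x 10^17 s; neither figure is stated explicitly on the slide.

```python
import math

STATES_PER_ANGLE = 3
N_ANGLES = 200               # assumed: two backbone dihedrals per residue, 100 residues
TIME_PER_SAMPLE_S = 1e-15    # one configuration sampled per femtosecond
AGE_OF_UNIVERSE_S = 4.3e17   # assumed: roughly 13.8 billion years

# Number of configurations is 3**200; work with its base-10 logarithm.
log10_configs = N_ANGLES * math.log10(STATES_PER_ANGLE)
# Time for exhaustive sampling, in seconds (log10).
log10_total_time = log10_configs + math.log10(TIME_PER_SAMPLE_S)
# Orders of magnitude by which that exceeds the age of the universe.
log10_excess = log10_total_time - math.log10(AGE_OF_UNIVERSE_S)

print(f"configurations ~ 10^{log10_configs:.1f}")
print(f"exhaustive sampling ~ 10^{log10_total_time:.1f} s")
print(f"exceeds age of universe by ~{log10_excess:.1f} orders of magnitude")
```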
Energy landscapes of protein folding
Borman, C&E News, 1998
Levitt’s lecture for S*
Levitt
Other factors
- Formation of secondary-structure elements
- Packing of secondary-structure elements
- Topologies of fold
- Metal/co-factor binding
- Disulfide bond
- …
Ab initio/new fold prediction
- Physics-based (laws of physics)
- Knowledge-based (rules of evolution)
Levitt
Molecular Mechanics (Force Field)
Levitt
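A molecular-mechanics force field typically writes the potential energy as a sum of bonded terms (bond stretching, angle bending, torsions) and non-bonded terms (van der Waals and electrostatics). The sketch below shows commonly used functional forms for each term. It is not the specific force field from the lecture slides, and every parameter value in the usage example is hypothetical.

```python
import math

# Coulomb conversion factor in kcal*Angstrom/(mol*e^2); approximate conventional value.
COULOMB_CONSTANT = 332.06


def harmonic(value: float, equilibrium: float, force_constant: float) -> float:
    """Harmonic term used for bond lengths and bond angles: k * (x - x0)^2."""
    return force_constant * (value - equilibrium) ** 2


def torsion(phi: float, barrier: float, periodicity: int, phase: float) -> float:
    """Periodic torsion term: V * (1 + cos(n * phi - phase)), phi in radians."""
    return barrier * (1.0 + math.cos(periodicity * phi - phase))


def lennard_jones(r: float, epsilon: float, r_min: float) -> float:
    """12-6 Lennard-Jones term with well depth epsilon at separation r_min."""
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)


def coulomb(r: float, qi: float, qj: float, dielectric: float = 1.0) -> float:
    """Coulomb term between point charges qi and qj (in units of e) at distance r."""
    return COULOMB_CONSTANT * qi * qj / (dielectric * r)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    total = (
        harmonic(1.55, 1.53, 300.0)           # one stretched bond
        + harmonic(1.95, 1.91, 50.0)          # one bent angle (radians)
        + torsion(math.pi / 3, 1.4, 3, 0.0)   # one torsion
        + lennard_jones(4.0, 0.1, 3.8)        # one non-bonded pair
        + coulomb(4.0, 0.3, -0.3)
    )
    print(f"total energy: {total:.3f} kcal/mol")
```

The accuracy of a force field depends on how these parameters are fitted, which is separate from the choice of functional form shown here.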
1-microsecond MD simulation
980 ns
- villin headpiece
- 36 a.a.
- 3000 H2O
- 12,000 atoms
- 256 CPUs (CRAY)
- ~4 months
- single trajectory
Duan & Kollman, 1998
Protein folding by MD
PROTEIN FOLDING: A Glimpse of the Holy Grail?
Herman J. C. Berendsen
"The Grail had many different manifestations throughout its long history, and many have claimed to possess it or its like". We might have seen a glimpse of it, but the brave knights must prepare for a long pursuit.
Massively distributed computing
- SETI@home
- Folding@home
- Distributed folding
- Sengent’s drug design
- FightAIDS@home
- …
Letters to nature (2002)
- engineered protein (BBA5)
- zinc finger fold (w/o metal)
- 23 a.a.
- solvation model
- thousands of trajectories, each of 5-20 ns, totaling 700 µs
- Folding@home
- 30,000 internet volunteers
- several months, or ~a million CPU days of simulation
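As a rough back-of-the-envelope comparison, the computing cost per simulated microsecond can be estimated from the figures on the two slides. The sketch below assumes a 30-day month, and it takes the Folding@home total as 700 microseconds of simulated time. The two simulations differ in protein size and solvent treatment, so the ratio is only indicative.

```python
DAYS_PER_MONTH = 30  # assumed, for converting "~4 months" to days

# Single-trajectory villin headpiece run: 256 CPUs for ~4 months, 1 microsecond simulated.
md_cpu_days = 256 * 4 * DAYS_PER_MONTH
md_simulated_us = 1.0

# Distributed BBA5 run: ~a million CPU days, total simulated time taken as 700 microseconds.
fah_cpu_days = 1_000_000
fah_simulated_us = 700.0

md_cost = md_cpu_days / md_simulated_us      # CPU days per simulated microsecond
fah_cost = fah_cpu_days / fah_simulated_us   # CPU days per simulated microsecond

print(f"single trajectory: {md_cost:.0f} CPU days per microsecond")
print(f"distributed:       {fah_cost:.0f} CPU days per microsecond")
```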
Massively distributed computing
Energy landscapes of protein folding
Borman, C&E News, 1998
Protein-folding prediction technique
CGU: Convex Global Underestimation
- K. Dill’s group
Challenges of physics-based methods
- Simulation time scale
- Computing power
- Sampling
- Accuracy of energy functions
Structure Prediction Methods
[Figure: ab initio, fold recognition, and homology modeling placed along a % sequence identity axis from 0 to 100]
Flowchart of homology (comparative) modeling
From Marti-Renom et al.
Fold recognition
Find, from a library of folds, the 3D template that accommodates the target sequence best.
Also known as “threading” or “inverse folding”
Useful for twilight-zone sequences
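The idea of finding the template that best accommodates a sequence can be illustrated with a toy scoring scheme. In this sketch each template is reduced to a string of per-position environments ("B" for buried, "E" for exposed), and a residue scores +1 when its hydrophobicity matches the environment and -1 otherwise. The environment alphabet, the hydrophobic set, the scoring rule, and the gapless placement are all simplifying assumptions for illustration; real threading methods use richer potentials and allow gaps.

```python
# Residues treated as hydrophobic in this toy model (an assumption).
HYDROPHOBIC = set("AVLIMFWC")


def position_score(residue: str, environment: str) -> int:
    """+1 if a hydrophobic residue is buried or a polar residue is exposed, else -1."""
    buried = environment == "B"
    return 1 if (residue in HYDROPHOBIC) == buried else -1


def best_gapless_score(sequence: str, environments: str) -> float:
    """Best score over all gapless placements of the sequence on the template."""
    n, m = len(sequence), len(environments)
    if n > m:
        return float("-inf")  # sequence does not fit on this template
    return max(
        sum(position_score(r, e)
            for r, e in zip(sequence, environments[offset:offset + n]))
        for offset in range(m - n + 1)
    )


def recognize_fold(sequence: str, fold_library: dict) -> tuple:
    """Return the best-scoring template name and the score of every template."""
    scores = {name: best_gapless_score(sequence, env)
              for name, env in fold_library.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical two-template library.
    library = {"fold_A": "BEBEBEBE", "fold_B": "BBBBEEEE"}
    print(recognize_fold("LKVELKVE", library))
```

Because the score depends on structural environment rather than on sequence similarity to the template, this kind of matching can still work when sequence identity is low.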
Fold recognition (aligning sequence to structure)
(David Shortle, 2000)
3D->1D score
On X-ray, NMR, and computed models
(Rost, 1996)
Marti-Renom et al. (2000)
Reliability and uses of comparative models
Pitfalls of comparative modeling
- Cannot correct alignment errors
- More similar to template than to true structure
- Cannot predict novel folds
Ab initio/new fold prediction
- Physics-based (laws of physics)
- Knowledge-based (rules of evolution)
From 1D to 2D to 3D
LGINCRGSSQCGLSGGNLMVRIRDQACGNQGQTWCPGERRAKVCGTGNSISAYSISAYVQVQSTNNCISGTEACRHLTNLVNHGTEACRHLTNLVNHGCRVCGSDPLYAGNDVSRGQLTVNYVNSC
[Figure: primary sequence mapped to secondary-structure fragments (seq. to str. mapping), followed by fragment assembly into the tertiary structure]
CASP Experiments
One lab dominated in CASP4
One group dominates the ab initio (knowledge-based) prediction
Some CASP4 successes
Baker’s group
Ab initio structure prediction server
The prediction of protein structure from amino acid sequence is a grand challenge of computational molecular biology. By using a combination of improved low- and high-resolution conformational sampling methods, improved atomically detailed potential functions that capture the jigsaw puzzle–like packing of protein cores, and high-performance computing, high-resolution structure prediction (<1.5 angstroms) can be achieved for small protein domains (<85 residues). The primary bottleneck to consistent high-resolution prediction appears to be conformational sampling.
Toward High-Resolution de Novo Structure Prediction for Small Proteins
Philip Bradley, Kira M. S. Misura, David Baker (Science 2005)
Science 2003
3D to 1D?
A computer-designed protein (93 aa) with 1.2 Å resolution
Structure prediction servers
http://bioinfo.pl/cafasp/list.html
Hybrid approach for solving macromolecular complex structures
Thank You!