6B -1 The Prediction of Protein Structures

Post on 19-Dec-2015


Page 1: 6B -1 The Prediction of Protein Structures. 6B -2 Amino Acids ( 胺基酸 ) 胺基酸:蛋白質的基本單位,共 20 種

6B -1

The Prediction of Protein Structures


6B -2

Amino Acids

Amino acids are the basic units of proteins; there are 20 kinds.


6B -3

Amino Acid Molecules


6B -4

Protein Molecules


6B -5

Primary Structure of Protein

Primary structure: the linear sequence of amino acids

Amino acid sequence of bovine insulin (a protein):


6B -6

Secondary Structure of Protein

Secondary structure: α-helix, β-sheet, loop


6B -7

Tertiary Structure of Protein

Tertiary structure of the hemoglobin molecule


6B -8

Quaternary Structure of Protein

Quaternary structure of the hemoglobin molecule


6B -9

Protein animation

Source: http://elearning.bioinfo.ntu.edu.tw/


6B -10

Protein folding animation

Source: http://elearning.bioinfo.ntu.edu.tw/


6B -11

Relation between Structures

Sequence → structure → function


6B -12

Reason for Prediction

Why do we need protein structure prediction?

Biological techniques:
- X-ray crystallography
- Nuclear magnetic resonance (NMR)

These techniques are expensive, time-consuming, and limited to small or medium-sized proteins (about 700 residues).

Computational strategies


6B -13

Prediction Competition

Goal: advance the methods of identifying protein structure from sequence.

CASP (Critical Assessment of Techniques for Protein Structure Prediction), http://predictioncenter.org
- Held every 2 years (1994 to now)
- CASP6 (Gaeta, Italy, Dec. 2004)
- CASP7 (Pacific Grove, USA, Nov. 2006)


6B -14


6B -15

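Accuracy Measurement

RMSD (Root Mean Square Deviation):

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i^{A} - x_i^{B}\right)^{2}}$$

Distance RMSD:

$$\mathrm{dRMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(d_{ij}^{A} - d_{ij}^{B}\right)^{2}}$$

Here $x_i^{A}$ and $x_i^{B}$ are the positions of residue $i$ in structures A and B, and $d_{ij}$ is the distance between residues $i$ and $j$ within one structure.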
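The two accuracy measures on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `rmsd` and `distance_rmsd` are hypothetical, the coordinate RMSD assumes the two structures are already superimposed, and the 1/n normalization of the distance RMSD follows the slide as it appears here (other conventions divide by n² or by the number of residue pairs).

```python
import math

def rmsd(a, b):
    """Coordinate RMSD between two equal-length lists of 3D points.

    Assumes the structures are already superimposed (no fitting is done).
    """
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))

def distance_rmsd(a, b):
    """Distance RMSD: compares intra-structure distance matrices.

    Needs no superposition. The 1/n normalization is an assumption
    taken from the slide; other conventions exist.
    """
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and of equal length")
    n = len(a)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
    return math.sqrt(total / n)

# Two toy "structures" with two residues each.
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
B = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(A, B))
print(distance_rmsd(A, A))
```

The distance-based variant is useful when no superposition is available, because it only looks at distances inside each structure.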


6B -16

Prediction of Protein Structures

Ab initio methods:
- Based on molecular thermodynamics
- No reference to other known structures

Homology modeling:
- Knowledge-based modeling
- Relies on sequence similarity
- More accurate


6B -17

Previous Works

- PHDthreader (http://www.embl-heidelberg.de/predictprotein): fewer than 30% of the predicted first hits are true remote homologues; ab initio method
- SWISS-MODEL (http://expasy.hcuge.ch/swissmod/SWISS-MODEL.html): an automated knowledge-based protein modeling server
- InsightII (http://www.accelrys.com/products/insight/index.html) (commercial): protein structure prediction
- Paircoil (http://ostrich.lcs.mit.edu/cgi-bin/score): prediction of coiled-coil regions
- List of other methods or programs: http://restools.sdsc.edu/biotools/biotools9.html


6B -18

Properties of Ab Initio Methods

Score functions:
- HMM (Hidden Markov Model)
- Electrostatics, van der Waals (VdW) forces, hydrogen bonds (H-bonds), and others
- Hydrophobic and hydrophilic properties

Protein folding problem


6B -19

Homology Modeling

General presumption:
- Small changes in a protein's sequence cause only small changes in its structure.
- Sequence identity > 30%

General procedure:
1. Database searching and template selection
2. Energy minimization
3. Rationality evaluation


6B -20

General Procedure of Protein Structure Prediction on Homology Model

Input: S1 = SSKCSRLKTFPQNACVYHK

Output: The backbone conformation model of S1.

Step 1: Select a template.

S2 = SVYCSSLACSDHN

Step 2: Perform sequence alignment.

S1 = SSKCSRLKTFPQNACVYHK

S2 = SVYCSSL------ACSDHN
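Once S1 and S2 are aligned, the target splits into stretches that have a template residue underneath and stretches that face only gaps. The sketch below shows one way to recover those stretches from the two aligned strings. It is an illustration under assumptions: `split_regions` is a hypothetical helper, and "aligned to a template residue" is used here as a simple stand-in for the structurally conserved regions discussed in the slides.

```python
def split_regions(target_aln, template_aln):
    """Split the ungapped target into aligned and unaligned index ranges.

    Returns (conserved, loops); each is a list of (start, end) ranges over
    the ungapped target sequence, with end exclusive.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned strings must have equal length")
    conserved, loops = [], []
    pos = 0                      # index into the ungapped target
    run_start, run_is_loop = None, None
    for t, s in zip(target_aln, template_aln):
        if t == "-":
            continue             # template-only column: no target residue
        is_loop = (s == "-")
        if run_start is None:
            run_start, run_is_loop = pos, is_loop
        elif is_loop != run_is_loop:
            (loops if run_is_loop else conserved).append((run_start, pos))
            run_start, run_is_loop = pos, is_loop
        pos += 1
    if run_start is not None:
        (loops if run_is_loop else conserved).append((run_start, pos))
    return conserved, loops

S1 = "SSKCSRLKTFPQNACVYHK"
S2 = "SVYCSSL------ACSDHN"
conserved, loops = split_regions(S1, S2)
print(conserved)                          # [(0, 7), (13, 19)]
print([S1[a:b] for a, b in loops])        # ['KTFPQN']
```

The unaligned stretch KTFPQN is the part of S1 that has no template coordinates to copy and therefore has to be folded separately.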


6B -21

Step 3: Find the structurally conserved regions. Copy the coordinates of the structurally conserved regions from S2 to S1.


6B -22


6B -23

Step 4: Apply the folding algorithm to position the residues that lack sequence similarity.

LKTFPQNA
10011001
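The binary string under LKTFPQNA marks each residue as hydrophobic (1) or hydrophilic (0). A minimal sketch of that encoding follows. The hydrophobic set used here is an assumption chosen to agree with the example on this slide; the slides do not list the full classification, and published H-P classifications differ in a few residues.

```python
# Assumed hydrophobic residues (one-letter codes). Hypothetical choice that
# reproduces the slide's example; it is not taken from the source.
HYDROPHOBIC = frozenset("ACFGILMPVW")

def hp_encode(seq):
    """Map an amino acid sequence to its H-P string: 1 = H, 0 = P."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq.upper())

print(hp_encode("LKTFPQNA"))  # 10011001
```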


6B -24

Step 5:

- Find the structure-known proteins with 70% or higher sequence similarity.

- Construct a segment of B-spline curve for every four points.

[Figure: candidate protein structures and three folding structures (1, 2, 3) for the residues LKTFPQNA]


6B -25

Final Conformation


6B -26

Template Search on Protein Databases

- PDB (Protein Data Bank): http://www.rcsb.org/pdb/
- Swiss-Prot: http://tw.expasy.org/sprot/

Classification:
- CATH (Class, Architecture, Topology and Homologous superfamily): http://cathwww.biochem.ucl.ac.uk/latest/
- SCOP (Structural Classification of Proteins): http://scop.mrc-lmb.cam.ac.uk/scop/index.html


6B -27


6B -28

Template Selection Methods (Tools)

How to select:
- Sequence alignment: ClustalW, Blastp and others
- Secondary structure prediction [Al-Lazikani et al.]
- Structurally conserved blocks


6B -29

PAM250 Score Matrix

A C D E F G H I K L M N P Q R S T V W Y
A 2
C -2 12
D 0 -5 4
E 0 -5 3 4
F -4 -4 -6 -5 9
G 1 -3 1 0 -5 5
H -1 -3 1 1 -2 -2 6
I -1 -2 -2 -2 1 -3 -2 5
K -1 -5 0 0 -5 -2 0 -2 5
L -2 -6 -4 -3 2 -4 -2 2 -3 6
M -1 -5 -3 -2 0 -3 -2 2 0 4 6
N 0 -4 2 1 -4 0 2 -2 1 -3 -2 2
P 1 -3 -1 -1 -5 -1 0 -2 -1 -3 -2 -1 6
Q 0 -5 2 2 -5 -1 3 -2 1 -2 -1 1 0 4
R -2 -4 -1 -1 -4 -3 2 -2 3 -3 0 0 0 1 6
S 1 0 0 0 -3 1 -1 -1 0 -3 -2 1 1 -1 0 2
T 1 -2 0 0 -3 0 -1 0 0 -2 -1 0 0 -1 -1 1 3
V 0 -2 -2 -2 -1 -1 -2 4 -2 2 2 -2 -1 -2 -2 -1 0 4
W -6 -8 -7 -7 0 -7 -3 -5 -3 -2 -4 -4 -6 -5 2 -2 -5 -6 17
Y -3 0 -4 -4 7 -5 0 -1 -4 -1 -2 -2 -5 -4 -4 -3 -3 -2 0 10


6B -30

Blosum62 Matrix

A C D E F G H I K L M N P Q R S T V W Y

A 4

C 0 9

D -2 -3 6

E -1 -4 2 5

F -2 -2 -3 -3 6

G 0 -3 -1 -2 -3 6

H -2 -3 1 0 -1 -2 8

I -1 -1 -3 -3 0 -4 -3 4

K -1 -3 -1 1 -3 -2 -1 -3 5

L -1 -1 -4 -3 0 -4 -3 2 -2 4

M -1 -1 -3 -2 0 -3 -2 1 -1 2 5

N -2 -3 1 0 -3 0 -1 -3 0 -3 -2 6

P -1 -3 -1 -1 -4 -2 -2 -3 -1 -3 -2 -1 7

Q -1 -3 0 2 -3 -2 0 -3 1 -2 0 0 -1 5

R -1 -3 -2 0 -3 -2 0 -3 2 -2 -1 0 -2 1 5

S 1 -1 0 0 -2 0 -1 -2 0 -2 -1 1 -1 0 -1 4

T -1 -1 1 0 -2 1 0 -2 0 -2 -1 0 1 0 -1 1 4

V 0 -1 -3 -2 -1 -3 -3 3 -2 1 1 -3 -2 -2 -3 -2 -2 4

W -3 -2 -4 -3 1 -2 -2 -3 -3 -2 -1 -4 -4 -2 -3 -3 -3 -3 11

Y -2 -2 -3 -2 3 -3 2 -1 -2 -1 -1 -2 -3 -1 -2 -2 -2 -1 2 7
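A substitution matrix such as this one is typically used to score the columns of an alignment. The sketch below is only an illustration: `BLOSUM62_SUBSET` holds six entries copied from the lower-triangular table on this slide (not the full matrix), `pair_score` and `alignment_score` are hypothetical helper names, and the linear gap penalty of -4 is an assumed value that does not come from the slides.

```python
# Six entries copied from the table above; the real matrix has 210.
BLOSUM62_SUBSET = {
    ("A", "A"): 4,
    ("C", "A"): 0, ("C", "C"): 9,
    ("S", "A"): 1, ("S", "C"): -1, ("S", "S"): 4,
}

def pair_score(x, y, matrix=BLOSUM62_SUBSET):
    """Look up a symmetric score stored in lower-triangular form."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]

def alignment_score(a, b, gap_penalty=-4, matrix=BLOSUM62_SUBSET):
    """Sum column scores of two aligned strings ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned strings must have equal length")
    total = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            total += gap_penalty      # assumed linear gap cost
        else:
            total += pair_score(x, y, matrix)
    return total

print(alignment_score("SCS", "ACS"))  # 1 + 9 + 4 = 14
print(alignment_score("S-S", "SAS"))  # 4 - 4 + 4 = 4
```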


6B -31

Protein Folding Problem

Given the primary structure of a protein, compute its 3-dimensional structure.

The H-P model was proposed by Dill in 1985 [Dill’85]:
- Minimize the total free energy.
- Each of the 20 amino acids is classified as one of:
  - H (hydrophobic, non-polar): 1 (water-hating)
  - P (hydrophilic, polar): 0 (water-loving)

The amino acid sequence of a protein can then be viewed as a binary sequence of H’s (1’s) and P’s (0’s).


6B -32

Example of H-P Model

Input sequence: 011001001110010

[Figure: two lattice foldings of the input sequence, one with Score = 5 and one with Score = 3]


6B -33

Protein Folding on H-P Model

The protein folding problem on the H-P model: given a sequence of 1’s (H’s) and 0’s (P’s), find a self-avoiding path embedded in either a 2D or 3D lattice such that the number of pairs of adjacent 1’s is maximized.

The problem is NP-complete even for the 2D lattice [Hart’97].
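Evaluating a candidate fold is easy even though finding the best one is hard. The sketch below scores a fold on the 2D square lattice. It is a minimal illustration with a hypothetical function name, and it assumes the common convention that only lattice-adjacent 1's that are not consecutive in the chain count toward the score; the slides do not spell this detail out.

```python
def hp_score(sequence, path):
    """Count adjacent 1-1 contacts of an H-P string folded on a 2D lattice.

    sequence: string of '0'/'1'; path: list of (x, y) lattice points,
    one per residue, in chain order.
    """
    if len(sequence) != len(path):
        raise ValueError("need one lattice point per residue")
    if len(set(path)) != len(path):
        raise ValueError("path is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")
    index = {p: i for i, p in enumerate(path)}
    score = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "1":
            continue
        # Look only right and up so each unordered pair is counted once.
        for nb in ((x + 1, y), (x, y + 1)):
            j = index.get(nb)
            if j is not None and sequence[j] == "1" and abs(i - j) > 1:
                score += 1
    return score

# "1001" folded into a unit square brings its two 1's next to each other.
print(hp_score("1001", [(0, 0), (1, 0), (1, 1), (0, 1)]))  # 1
# The same string laid out in a straight line has no contacts.
print(hp_score("1001", [(0, 0), (1, 0), (2, 0), (3, 0)]))  # 0
```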


6B -34

U-Fold Algorithm for HP

Find a suitable point at which to split the string into two substrings.

Example: 0100101001110101000010

0100----101001

01000010101--1


6B -35

Ant Colony Optimization System

The ant colony optimization (ACO) algorithm was presented by Dorigo et al. in 1991.


6B -36

General Lattice Model

Square Lattice Model Triangular Lattice Model


6B -37

Experiments of Different Models

Model   1b1u       1a6n       118l       102l       1b8k
Cubic   12.08891   13.35721   13.01421   13.98656   17.50644
FCC     10.18907   12.09836   12.39913   11.93452   15.06346

•Measured by RMSD (Å)

•Data source: PDB

•Folding by genetic algorithm

•FCC: face-centered cubic model

[Figure: RMSD versus sequence length (residues) for the cubic and FCC models, with the σ of each]


6B -38

Structure Alignment by Curve Fitting

B-spline curves
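The slides mention building one B-spline segment for every four points. The sketch below evaluates such segments with the standard uniform cubic B-spline basis. Treating the slides' curves as uniform cubic B-splines is an assumption, and the names `bspline_point` and `bspline_curve` are hypothetical.

```python
def bspline_point(p0, p1, p2, p3, t):
    """Point at parameter t in [0, 1] on one uniform cubic B-spline segment."""
    b0 = (1 - t) ** 3 / 6
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6
    b3 = t**3 / 6
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))

def bspline_curve(points, samples=10):
    """Sample one segment per window of four consecutive control points."""
    curve = []
    for k in range(len(points) - 3):
        window = points[k:k + 4]
        for s in range(samples):
            curve.append(bspline_point(*window, s / samples))
    return curve

# Control points could be, for example, consecutive C-alpha positions.
ctrl = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (12.0, 0.0, 0.0),
        (18.0, 0.0, 0.0), (18.0, 6.0, 0.0), (18.0, 12.0, 0.0)]
curve = bspline_curve(ctrl, samples=5)
print(len(curve))  # 3 segments x 5 samples = 15
```

The four basis weights sum to 1 for every t, so each sampled point is a weighted average of its four control points and the curve smooths out the zigzag of the raw backbone trace.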


6B -39

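Curve Matching

Curve matching measure function:

$$\mu[g] = \int \left|ds_2 - ds_1\right| + r\,\left|d\theta_2 - d\theta_1\right|$$

[Figure: curves C1 and C2 with labeled points A1 = A2, B1, B2 and T1 = T2, arc lengths s1 and s2 (difference s2 - s1), and angles θ1 and θ2]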


6B -40

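- Apply the curve alignment.

Our score function of the curve alignment:

$$A[i,j] = \max\left(A[i-1,j],\; A[i,j-1],\; A[i-1,j-1] + w_{i,j}\right)$$

where the weight $w_{i,j}$ is defined piecewise by the value of $d$ [the case values and thresholds are illegible in this transcript].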
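The curve-alignment score on this slide appears to follow a longest-common-subsequence-style recurrence, A[i,j] = max(A[i-1,j], A[i,j-1], A[i-1,j-1] + w), which can be filled in with a standard dynamic-programming table. The sketch below shows only that table-filling pattern. The weight function `example_weight` is entirely hypothetical, because the piecewise weights of the original slide cannot be read from this transcript, and `d` is assumed to be a precomputed matrix of differences between curve segments.

```python
def curve_alignment_score(d, weight):
    """Fill A[i][j] = max(A[i-1][j], A[i][j-1], A[i-1][j-1] + weight(d)).

    d[i][j] is an assumed precomputed difference between segment i of the
    first curve and segment j of the second; weight maps it to a score.
    """
    n = len(d)
    m = len(d[0]) if d else 0
    A = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            A[i][j] = max(A[i - 1][j],
                          A[i][j - 1],
                          A[i - 1][j - 1] + weight(d[i - 1][j - 1]))
    return A[n][m]

def example_weight(diff):
    """Hypothetical weights: reward close matches, penalize distant ones."""
    if diff == 0:
        return 2
    if diff <= 1:
        return 1
    return -diff

# Two segments per curve; the diagonal pairs match exactly.
print(curve_alignment_score([[0, 5], [5, 0]], example_weight))  # 4
```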


6B -41

Additional Constraints

Improvement on the HP model:
- Prediction results are not successful enough; considering hydrophobicity alone is not enough.
- Other features should also be considered:
  - Secondary structure elements (SSEs): 1. α-helix, 2. β-sheet
  - Electrostatic attractions
  - Disulfide bonds


6B -42

Electrostatic Attractions and Disulfide Bonds

Electrostatic attractions:

Disulfide bond: formed between two cysteines (C’s)


6B -43

Probabilistic Disulfide Bonds

Folding with the constraint of disulfide bonds.


6B -44

Experiments for Disulfide Bonds

Experiments of folding with disulfide constraints

[Figure: RMSD versus sequence length for folding without SS and with SS (SS fold), with the σ of each]


6B -45

Secondary Structures

Conformations of the α-helix

Distance between the ith amino acid and the (i+4)th amino acid


6B -46

Secondary Structures

Conformations of the β-sheet


6B -47

Further Improvement--Sliced Lattice Model

The original lattice models do not work well.

Slice the lattice into smaller lattices.


6B -48

Sliced Lattice Model


6B -49

Global Folding


6B -50

Experimental Materials

Database: PDB (http://www.rcsb.org/pdb/)
- April 17, 2005
- 20,380 proteins

Data of CASP6 (http://predictioncenter.llnl.gov/), 2004

Alignment: Blastp (http://www.ncbi.nlm.nih.gov/)
- Sequence identity < 90%
- Blosum-62


6B -51

Experiment Results

Target protein: 1LIN (146)

Template protein   Sequence similarity   RMSD(‘03)   RMSD(‘04)   RMSD(‘05)
1CFD               100%                  7.34        -           -
1TNW               69%                   18.72       13.37       10.56
1IQ5               55%                   15.15       9.18        7.35
1DTL               52.9%                 10.22       7.48        6.17
5PAL               36.4%                 12.18       8.43        5.89

Measured by RMSD


6B -52

Experiment Results

Target protein: 1QG1 (104)

Template protein   Sequence similarity   RMSD(‘03)   RMSD(‘04)   RMSD(‘05)
1JYQ               90.4%                 4.15        -           4.24
1JYU               90.4%                 13.89       -           10.89
1SHA               46.7%                 4.82        4.82        3.65
1SHD               45.2%                 8.89        6.77        5.55
5PDR               24.4%                 10.55       8.0         6.76

Measured by RMSD


6B -53

Experimental Results of CASP6

Compared with Chen’03:

# of proteins                  77
# of positive improvement      59
# of negative improvement      12
Average improvement            21.44%
Average sequence length        208 (53~435)
Average template identity      36%
Average template similarity    21%


6B -54

Compared with Palu et al.

Palu et al. [Palu’04]:
- Without template
- FCC lattice model


6B -55

Compared with Zheng et al.

Zheng et al. [Zheng02]:
- Homology
- Lattice model


6B -56

An Example of Our Results

PDB code: 7RSA, Length = 124, RMSD = 1.48 Å

Our result Real structure


6B -57

Protein Structure Prediction System

Target protein: 7RSA

Step 1: Prepare


6B -58

Protein Structure Prediction System

http://par.cse.nsysu.edu.tw/main.html

Step 2: Predict


6B -59

Protein Structure Prediction System

Step 3: Display result


6B -60

Protein Structure Prediction System

Step 3: Display result


6B -61

Protein Structure Prediction System

Step 3: Display result


6B -62

Protein Structure Prediction System

Step 4: Compare

Our result    Real structure    RMSD


6B -63

Protein Structure Prediction System

Step 4: Compare

Our result    Real structure


6B -64

Protein Side Chain Packing


6B -65

Amino Acids & Side-chain

Elements of protein Three groups

Lysine (LYS)

Side-chain


6B -66

Protein Structure Prediction

Input: 1D sequence

Output: 3D structure (3D backbone structure in general)

Protein structure = Backbone structure + side-chain structure

ACE GLY ASP VAL GLU LYS GLY LYS LYS ILE PHE VAL GLN


6B -67

Backbone and Side Chain

Protein SAV1595, Journal of Biomolecular NMR (2004) 29: 391–394

Backbone

Side-chain


6B -68

Protein Side Chain Packing Problem

PSCPP:

Given the fixed backbone of the protein, each residue of the backbone other than glycine has a set of possible rotamers.

Problem: Choose one suitable rotamer for each residue such that the total energy of the protein is minimized.

The PSCPP is NP-hard.


6B -69

Graph Model of PSCPP Problem

Let R = {r_1, r_2, ..., r_n} be the set of residues of the target protein.

Let an undirected graph G = (V, E) represent the side chains of the protein.

V_i = {v_{i,j} | rotamer v_{i,j} of residue r_i does not collide with any backbone atom}.

Then V = ∪ V_i and E = {(v_{i,j}, v_{i+1,k}) | v_{i,j} does not collide with v_{i+1,k}}.
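The graph model can be sketched as a layered graph: one layer of surviving rotamers per residue, with edges between compatible rotamers of neighbouring residues. The code below is an illustration only. `build_rotamer_graph` is a hypothetical name, and the two collision tests are passed in as callbacks because the slides do not define how a collision is detected.

```python
def build_rotamer_graph(rotamers, hits_backbone, clash):
    """Build the layered rotamer graph G = (V, E).

    rotamers[i]   : list of candidate rotamers for residue i
    hits_backbone : callback, True if a rotamer collides with the backbone
    clash         : callback, True if two rotamers collide with each other
    Vertices are (residue index, rotamer index) pairs.
    """
    layers = [
        [(i, j) for j, r in enumerate(cands) if not hits_backbone(r)]
        for i, cands in enumerate(rotamers)
    ]
    edges = [
        ((i, j), (i + 1, k))
        for i in range(len(layers) - 1)
        for (_, j) in layers[i]
        for (_, k) in layers[i + 1]
        if not clash(rotamers[i][j], rotamers[i + 1][k])
    ]
    return layers, edges

# Toy example: rotamers are 1D positions, purely for illustration.
toy = [[0.0, 5.0], [1.0, 9.0]]
layers, edges = build_rotamer_graph(
    toy,
    hits_backbone=lambda r: r == 9.0,     # pretend 9.0 hits the backbone
    clash=lambda a, b: abs(a - b) < 2.0,  # pretend close rotamers collide
)
print(layers)  # [[(0, 0), (0, 1)], [(1, 0)]]
print(edges)   # [((0, 1), (1, 0))]
```

A path that picks one vertex per layer then corresponds to one choice of rotamer for every residue.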


6B -70

Dihedral Angles

Side-chain atoms: Cβ, Cγ, Oδ.

Dihedral angles [Iupa70]:

φ: C_{i-1}-N_i-Cα_i-C_i

ψ: N_i-Cα_i-C_i-N_{i+1}

X1: N_i-Cα_i-Cβ_i-O_i

[Figure: backbone and side-chain atoms of an Asp residue, with the dihedral angles X1 and X2 marked]


6B -71

The Rotamer Library

The accuracy of side chain prediction depends primarily on the quality of the rotamer library.

Our rotamer library is a coordinate rotamer library, which retains the bond lengths and bond angles that do not appear in the standard rotamer library.

Our rotamer library is built from 850 proteins, the same set used for the backbone-dependent rotamer library proposed by Dunbrack and Karplus [Dunb93].


6B -72

Example of the Rotamer Library

[A.A.] [φ] [ψ] [X1] [Prob.] [3-D Coordinate]


6B -73

Formulas of ACO for PSCPP

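Pheromone probability formula:

$$p_k(s,u) = \frac{[\tau_{s,u}(t)]^{\alpha}\,[\eta(u)]^{\beta}}{\sum_{w \in J_k(s)} [\tau_{s,w}(t)]^{\alpha}\,[\eta(w)]^{\beta}}, \qquad s \in V_i,\; u \in V_{i+1}$$

Pheromone update formula:

$$\tau_{s,u}(t+1) = (1-\rho)\,\tau_{s,u}(t) + \sum_{k=1}^{m} \Delta\tau_{s,u}^{k}(t)$$

where $0 < \rho < 1$ is the rate of pheromone evaporation.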
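The two ACO formulas on this slide follow the usual ant-system pattern: a selection probability proportional to pheromone and heuristic value, and an update that evaporates old pheromone before adding new deposits. The sketch below illustrates that pattern with hypothetical function names and dictionary-based data structures; how the heuristic values and the per-ant deposits are computed for side-chain packing is not specified here and is left to the caller.

```python
def transition_probabilities(tau_row, eta, alpha=1.0, beta=1.0):
    """Probability of moving from the current rotamer s to each candidate u.

    tau_row[u] : pheromone on edge (s, u)
    eta[u]     : heuristic desirability of u (assumed supplied by caller)
    """
    weights = {u: (tau_row[u] ** alpha) * (eta[u] ** beta) for u in tau_row}
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}

def update_pheromone(tau, deposits, rho):
    """Evaporate pheromone at rate rho, then add every ant's deposit.

    tau      : dict mapping edge -> pheromone value
    deposits : list with one dict per ant, mapping edge -> deposited amount
    """
    if not 0 < rho < 1:
        raise ValueError("rho must lie strictly between 0 and 1")
    return {
        edge: (1 - rho) * value + sum(d.get(edge, 0.0) for d in deposits)
        for edge, value in tau.items()
    }

probs = transition_probabilities({"a": 1.0, "b": 3.0}, {"a": 1.0, "b": 1.0})
print(probs)  # {'a': 0.25, 'b': 0.75}

tau = update_pheromone({("s", "a"): 1.0}, [{("s", "a"): 0.5}, {}], rho=0.5)
print(tau)    # {('s', 'a'): 1.0}
```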


6B -74

ACO Prediction for PSCPP

Input: Backbone coordinate data.

Output: The route with near-minimum score.

Step 1: Set parameters and initialize pheromone trails.

Step 2: Each ant k chooses one rotamer u of residue i according to the probability function p_k(s, u), for all 1 ≤ i ≤ n, u ∈ V_i.

Step 3: Update the pheromone trails.

Step 4: If the current best solution has not improved by more than some percentage after some predefined number of generations, or the number of generations has reached the predefined value, return the route with minimum score; otherwise, go to Step 2.


6B -75

The Score Function

Features in ACO score functions:

- The disulfide bonds: S1 = BonS × #(disulfide bonds)
- The hydrogen bonds: S2 = BonH × #(hydrogen bonds)
- The charge-charge interactions: S3 = BonC × (#(different charge pairs) − #(same charge pairs))
- The van der Waals interactions: S4 = BonV × Σ E_{i,j}

Energy score function: E = S1 + S2 + S3 + S4


6B -76

Experiments

Two test sets:
- 25 proteins from Xiang and Honig 2001
- 5 proteins from Canutescu et al. 2003

Cutoff value: 20° [Xie06, R3]. If X1 is within 20° of the corresponding angle in the real structure, the predicted angle is considered correct.

Compared with SCWRL 3.0 [Canu03] and R3 [Xie06].
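The 20° criterion can be checked with a small helper that compares angles on a circle, so that -175° and 175° count as 10° apart. This is a minimal sketch with hypothetical function names. It assumes angles are given in degrees and that "within 20°" includes a difference of exactly 20°.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def chi1_accuracy(predicted, actual, cutoff=20.0):
    """Percentage of residues whose predicted X1 is within the cutoff."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty angle lists of equal length")
    correct = sum(1 for p, q in zip(predicted, actual)
                  if angle_diff(p, q) <= cutoff)
    return 100.0 * correct / len(actual)

print(angle_diff(-175.0, 175.0))  # 10.0
print(chi1_accuracy([-175.0, 60.0, 180.0, 65.0],
                    [175.0, 90.0, -60.0, 60.0]))  # 50.0
```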


6B -77

Parameters in Experiments

Weights of features in score function:

Feature   Value
BonS      0.5S4
BonH      5
BonC      2
BonV      1

Parameters used in ACO algorithm:

Parameter           Value
Population          50
Generation          300~600
α                   1.0
β                   1.0
Initial pheromone   1.0


6B -78

Experimental Results (First Case)

No.   Protein   Length   Our Method X1   SCWRL 3.0 X1   R3 Method X1
1     1AAC      85       87.1            84.7/95        76.5/86
2     1AHO      54       85.2            68.5/67        64.8/65
3     1B9O      112      70.5            68.8/73        66.1/77
4     1C5E      71       81.7            81.7/86        73.2/82
5     1C9O      53       84.9            66.0/72        71.7/70
6     1CC7      66       80.3            68.2/83        63.6/79
7     1CEX      146      85.6            76.7/82        75.3/77
8     1CKU      60       81.7            76.7/82        68.3/80

Columns 5-6: IUPAC-IUB rules / Xie and Sahinidis’s (R3) result


6B -79

Experimental Results (First Case)

No.   Protein   Length   Our Method X1   SCWRL 3.0 X1   R3 Method X1
9     1CTJ      61       77.0            68.9/79        70.5/80
10    1CZ9      111      70.3            64.0/73        64.0/76
11    1CZP      83       79.5            77.1/86        73.5/81
12    1D4T      89       77.5            76.4/86        67.4/82
13    1IGD      50       82.0            68.0/74        54.0/68
14    1MFM      118      75.4            68.6/80        70.3/81
15    1PLC      82       72.0            67.1/72        70.7/71
16    1QJ4      221      71.5            72.9/84        67.9/80
17    1QQ4      143      83.9            73.4/78        71.3/78


6B -80

Experimental Results (First Case)

No.   Protein   Length   Our Method X1   SCWRL 3.0 X1   R3 Method X1
18    1QTN      134      86.6            74.6/82        67.9/78
19    1QU9      99       79.8            71.7/81        73.7/78
20    1RCF      142      79.6            83.8/86        81.7/80
21    1VFY      63       79.4            69.8/76        71.4/75
22    2PTH      151      82.1            78.8/83        78.1/84
23    3LZT      105      73.3            78.1/86        69.5/82
24    5P2L      144      78.5            70.8/78        63.2/71
25    7RSA      109      75.2            65.1/75        61.5/67

Columns 5-6: IUPAC-IUB rules / Xie and Sahinidis’s (R3) result


6B -81

Experimental Results (Second Case)

No.   Protein   Length   Our Method X1   SCWRL 3.0 X1   R3 Method X1
1     1A8I      704      73.4            71.3/80        64.1/75
2     1B0P      978      70.8            62.3/69        -/66
3     1BU7      399      74.9            70.4/78        64.4/72
4     1GAI      386      73.6            72.8/81        66.6/72
5     1XWL      496      71.5            66.7/73        61.5/72

Columns 5-6: IUPAC-IUB rules / Xie and Sahinidis’s (R3) result