Page 1: Ligand-binding site prediction  based on 3D protein modeling

Ligand-binding site prediction based on 3D protein modeling

Mina Oh, Keehyoung Joo and Jooyoung Lee
Korea Institute for Advanced Study

The 3rd Protein Research Presentation Meeting (제 3 회 단백질연구발표회)
Aug. 24, 2009

Page 2: Ligand-binding site prediction  based on 3D protein modeling

Ligand-binding site prediction

• Sequence-Structure-Function paradigm

Binding Site Prediction and Binding-Ligand candidates: to understand the protein structure and function relationship

Example: T0457 (PDB code: 3dev), MG Binding

The detection of ligand-binding sites is the starting point for protein function identification and drug discovery

Page 3: Ligand-binding site prediction  based on 3D protein modeling

CASP Experiments: world-wide blind test

Page 4: Ligand-binding site prediction  based on 3D protein modeling

Motivation of Method

• How can we use 3D protein models to predict protein binding sites?
• There are many known ligand-bound structures in the PDB
• In CASP7, we achieved top results in the high-accuracy structure prediction category

Predicted protein model → Superimposed model (ligand-bound templates are superimposed) → Putative binding sites & binding residues identification

Brylinski M. et al. PNAS 105, p129 (2008)

Page 5: Ligand-binding site prediction  based on 3D protein modeling

Part I: Template-based modeling

Query sequence → Fold recognition (PDB) → Templates → Protein modeling based on global optimization → protein 3D models

Part II: 3D model based binding-site residue prediction

protein 3D models + Templates with ligands (a) → Structure superposition between model and templates → Superimposed model → Clustering of ligands → Putative binding sites (b) → Determination of binding residues (c)

Method details

(a) Ligands from the superimposed templates are clustered by calculating all distances between the ligand centers of mass and applying a distance cutoff (1-4.5 Å)

(b) Binding residues are identified from each putative binding site by a distance cutoff

– Ranking: by cluster size, i.e. the number of ligands in a cluster

(c) The contact residues are determined by detecting all atoms within the distance cutoffs (3.0-4.5 Å)
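Steps (a) to (c) above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the slides do not name the clustering algorithm, so single-linkage clustering is assumed here; the center of mass is approximated by an unweighted centroid; and the function names `center_of_mass`, `cluster_ligands` and `contact_residues` are hypothetical. The 4.5 Å defaults are taken from the upper end of the cutoff ranges quoted on the slide.

```python
import math
from collections import defaultdict


def center_of_mass(atoms):
    """Unweighted centroid of (x, y, z) atom coordinates (assumption: masses ignored)."""
    n = len(atoms)
    return tuple(sum(a[i] for a in atoms) / n for i in range(3))


def cluster_ligands(ligands, cutoff=4.5):
    """Group ligands whose centroids lie within `cutoff` angstroms of each other.

    ligands: list of atom-coordinate lists, already superimposed onto the model.
    Returns clusters as lists of ligand indices, largest cluster first
    (the slide ranks putative sites by cluster size).
    Assumption: single-linkage merging via union-find.
    """
    centers = [center_of_mass(lig) for lig in ligands]
    parent = list(range(len(ligands)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(ligands)):
        for j in range(i + 1, len(ligands)):
            if math.dist(centers[i], centers[j]) <= cutoff:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(ligands)):
        groups[find(i)].append(i)
    return sorted(groups.values(), key=len, reverse=True)


def contact_residues(model_atoms, ligand_atoms, cutoff=4.5):
    """Residue numbers having at least one atom within `cutoff` of any ligand atom.

    model_atoms: list of (residue_number, (x, y, z)) tuples from the 3D model.
    """
    return sorted({
        res for res, xyz in model_atoms
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms)
    })


# Toy coordinates, made up purely for illustration.
ligands = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(1.0, 1.0, 0.0), (2.0, 1.0, 0.0)],
    [(20.0, 0.0, 0.0)],
]
clusters = cluster_ligands(ligands)
top_site_atoms = [atom for idx in clusters[0] for atom in ligands[idx]]
model = [(1, (0.0, 3.0, 0.0)), (2, (3.0, 0.0, 0.0)), (3, (30.0, 30.0, 30.0))]
print(clusters, contact_residues(model, top_site_atoms))
```

The pairwise loops are quadratic in the number of ligands and atoms, which is acceptable for a handful of templates; a spatial index would be the natural replacement for larger inputs.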

Page 6: Ligand-binding site prediction  based on 3D protein modeling

Benchmark & Test set

• Benchmark: the set of CASP7 function prediction targets
– 22 proteins

• Test set: CASP8 (2008)
– 27 proteins
– In a blind fashion

• Two measures for assessing binding residue predictions

Accuracy (%) = N/P × 100

Coverage (%) = N/T × 100

N: # of correctly predicted binding residues
P: # of predicted binding residues
T: # of annotated binding residues
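The two measures can be computed directly from sets of residue numbers. The sketch below is a minimal illustration; the function name `accuracy_coverage` and the toy residue numbers are hypothetical, and returning `None` when nothing is predicted or annotated is an assumption about how to handle an undefined ratio.

```python
def accuracy_coverage(predicted, annotated):
    """Return (accuracy %, coverage %) for predicted vs. annotated binding residues.

    N = correctly predicted residues, P = predicted residues, T = annotated residues.
    A ratio with a zero denominator is reported as None (assumption).
    """
    predicted, annotated = set(predicted), set(annotated)
    n_correct = len(predicted & annotated)
    accuracy = 100.0 * n_correct / len(predicted) if predicted else None
    coverage = 100.0 * n_correct / len(annotated) if annotated else None
    return accuracy, coverage


# Toy example: 4 residues predicted, 5 annotated, 3 in common.
print(accuracy_coverage({1, 2, 3, 4}, {2, 3, 4, 5, 6}))
```

Accuracy here plays the role of precision and coverage the role of recall, so a prediction that lists many residues can reach full coverage while its accuracy drops.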

Page 7: Ligand-binding site prediction  based on 3D protein modeling

CASP7 Benchmark (22 targets)

A: Accuracy (%); C: Coverage (%)

Target | Class (Difficulty) | PDB | Ligand | Ligand binding residues | LEE A | LEE C | Native A | Native C
T0284 | TBM | 3b8i | OXL-MG2 | 48 49 50 88 159 212 235 | 57 | 57 | 57 | 57
T0289 | TBM | 2gu2 | ZN | 20 23 115 | 100 | 100 | 100 | 100
T0292 | HA-TBM | 2cl1 | 5Z5 | 12 33 34 66 84-87 88 90 146 160 | 38 | 67 | 38 | 67
T0293 | TBM | 2h00 | SAH | 34 36 41 69-71 75 76 91-93 97 119 121 122-124 143-145 147 186 189 | 58 | 78 | 81 | 91
T0308 | HA-TBM | 2h57 | GTP-MG | 10-16 31 34 55-56 59 114 115 117 118 147 149 | 72 | 72 | 71 | 83
T0312 | TBM | 2h6l | ZN | 89 91 104 | 100 | 67 | 75 | 100
T0313 | HA-TBM | 2h58 | ADP-MG | 7 9 10 12 86-92 | 79 | 100 | 79 | 100
T0315 | TBM | 2gzx | NI-NI | 6 8 92 128 153 204 | 100 | 100 | 100 | 100
T0316 | TBM | 2hma | SAM-MG | 12-14 16 18-19 36-38 100 104 108 126-128 152 155 | 50 | 76 | 61 | 65
T0318 | TBM | 2hb6 | ZN-ZN | 252 257 275 334 336 | 71 | 100 | 71 | 100
T0319 | FM | 2j6a | ZN | 11 16 112 115 | DNP | DNP | DNP | DNP
T0320 | TBM | – | FAD | 59 61 66 106 107 144 148 161 163-165 181 182 185 188 190 300 | 31 | 47 | 30 | 35
T0324 | HA-TBM | 2hdo | PO4 | 9-11 104 105 137 | 38 | 50 | 38 | 50
T0329 | TBM | 2hl0 | NA | 9 11 189 | 50 | 100 | 50 | 100
T0330 | TBM | 2hcf | MG | 9 11 177 | 60 | 100 | 67 | 67
T0332 | HA-TBM | 2ha8 | SAH | 87-89 110-112 115 129 130 132 137 138 139 141 144 | 70 | 93 | 75 | 100
T0339 | HA-TBM | 2hdy | PLR | 71 72 75 117 119 166 205 207 208 228 230 231 267 268 | 67 | 86 | 75 | 86
T0341 | TBM | 2h04 | PO4-MG | 13 15 46 47 179 204 | 100 | 100 | 100 | 83
T0348 | TBM/FM | 2hf1 | ZN | 11 14 29 32 | 43 | 75 | 57 | 100
T0369 | TBM | 2hkv | NI | 48 123 127 | 100 | 100 | 100 | 100
T0371 | TBM | 2hx1 | MG | 19 21 232 | 100 | 100 | 75 | 100
T0372 | TBM | 2hqy | COA | 175 190 246 270 272 275 | 24 | 67 | 19 | 67
Avg. | | | | | 67 | 83 | 68 | 83

DNP: did not predict

22 different targets
• HA-TBM: easy (6 targets)
• TBM: medium (14 targets)
• FM: hard (1 target)
• TBM/FM (1 target)

PDB-code

A total of 28 biologically relevant bound ligands

Page 8: Ligand-binding site prediction  based on 3D protein modeling

CASP8 Prediction (27 targets)

A: Accuracy (%); C: Coverage (%)

Target | Class | Ligand | PDB | Ligand binding residues | LEE A | LEE C | LEE-Server A | LEE-Server C | Native A | Native C
T0391 | TBM | FES | 3d89 | 57 59 60-62 80 82 83 85 | 90 | 100 | 100 | 78 | 82 | 100
T0394 | TBM | PO4 | 3dcy | 15 16 22 28 66 94 203 204 | 53 | 100 | 71 | 62 | 50 | 100
T0396 | HA | FAD | 0396 | 3 4 7 8 10 11 15 44 48 49 52 75 77-79 81 82 84 85 87 90 95 98 | 78 | 91 | 83 | 65 | NNS | NNS
T0406 | TBM | NI | 3di5 | 48 127 131 | 100 | 100 | 100 | 100 | 100 | 100
T0407 | TBM | ZN ZN ZN | 3e38 | 44 46 113 214 122 157 51 76 216 | 62 | 56 | DNP | DNP | 100 | 78
T0410 | Canceled | FE | 3d3l | 206 211 386 | 12 | 100 | 38 | 100 | 19 | 100
T0422 | TBM/HA | ADP | 3d8b | 78 79 85 86 87 127-132 259 288 289 292 | 71 | 80 | 65 | 87 | 76 | 87
T0425 | TBM | ZN | 3czx | 11 25 77 | 100 | 100 | DNP | DNP | 75 | 100
T0426 | HA | ZN | 3da2 | 117 119 142 | 100 | 100 | 60 | 100 | 67 | 67
T0430 | TBM | AMP MG | 3dlz | 49 50 51 54 57 68 70 116 164 165 166 167 168 170 212 213 215 245 246 211 275 277 306 310 | 59 | 54 | 68 | 71 | 76 | 79
T0431 | TBM | HEM | 3dax | 84 112 116 268 269 272 273 276 343 419 420 425 427-429 432 433 436 471 | 62 | 95 | DNP | DNP | 75 | 95
T0440 | TBM | FE ZN FE | 3dcp | 6 8 93 258 123 181 14 40 260 | 100 | 44 | DNP | DNP | 100 | 33
T0444 | HA | FE | 2vux | 135 198 232 235 | 100 | 100 | 80 | 100 | 80 | 100
T0450 | HA | FAD | 3dal | 24 27-29 47-49 54-57 60 61 63 65 191 193 228-231 235 252 254 292 338 339 340 372-375 | 67 | 100 | 73 | 100 | 73 | 100
T0453 | HA | CA CA CA | 3ded | 76-78 83 | 0 | 0 | 0 | 0 | 100 | 75
T0457 | TBM | MG | 3dev | 29 83 106 158 | 67 | 100 | DNP | DNP | 80 | 100
T0461 | HA | ZN | 3dh1 | 75 111 114 | 100 | 100 | 100 | 100 | 75 | 100
T0470 | HA | MG | 3djb | 29 58 59 122 | 100 | 100 | 100 | 100 | 100 | 100
T0476 | TBM/FM | ZN | 2k5c | 4 7 47 50 | - | 0 | DNP | DNP | - | 0
T0477 | TBM | ADP | 3dkp | 49 51 53 56 75-80 | 45 | 100 | 56 | 100 | 50 | 100
T0478 | TBM | MG FE | 3d19 | 30 121 248 252 117 154 158 | 0 | 0 | 0 | 0 | 0 | 0
T0480 | TBM | ZN | 2k4x | 21 24 39 42 | 100 | 100 | DNP | DNP | 80 | 100
T0483 | TBM | ADP MG MG | 3dls | 32 33 40 53 55 92 109-111 114 116 159 160 162 172 173 | 80 | 100 | 88 | 94 | 73 | 100
T0485 | TBM | SAM | 3dlc | 8 16 28 50-53 72-74 77 99-101 117-119 122 123 | 71 | 89 | 71 | 89 | 79 | 100
T0487 | TBM | MG | 3f73 | 478 546 548 660 | 67 | 50 | DNP | DNP | NNS | NNS
T0490 | TBM | FAD | 3dme | 10 11 13-15 33-35 43 44 46-48 50 52 171-173 204-206 208 234 272 315 316 348 349 350-354 | 65 | 97 | 65 | 97 | 65 | 100
T0508 | TBM | SAM | 3dou | 22 46-52 67-69 82-85 111-113 151 | 70 | 100 | 70 | 100 | 70 | 100
Avg. | | | | | 70 | 80 | 68 | 81 | 74 | 82

DNP: did not predict; NNS: no native structure is available
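The binding-residue columns mix single residue numbers with ranges such as 60-62. To count residues (for example, the T in the coverage formula), such an entry can be expanded as in the sketch below. The helper name `expand_residues` is hypothetical, and the sketch assumes whitespace-separated tokens with inclusive ranges and no spaces inside a range.

```python
def expand_residues(spec):
    """Expand a residue string such as '57 59 60-62' into a list of integers.

    Assumptions: tokens are whitespace-separated, and 'a-b' denotes the
    inclusive range a..b.
    """
    residues = []
    for token in spec.split():
        if "-" in token:
            start, end = token.split("-")
            residues.extend(range(int(start), int(end) + 1))
        else:
            residues.append(int(token))
    return residues


# T0391 entry from the table: expands to 9 residues.
print(expand_residues("57 59 60-62 80 82 83 85"))
```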

Page 9: Ligand-binding site prediction  based on 3D protein modeling

T0391 (TBM)

PDB code: 3d89

HETERO ATOMS: FES

FES 57 59 60 61 62 80 82 83 85

Prediction: FES Binding

Ligand: FES complex

Accuracy = 9/10 = 90 %

Coverage = 9/9 = 100 %

GDT-HA: 50.73

Magenta: X-ray (3d89A)

Blue: LEE model

Page 10: Ligand-binding site prediction  based on 3D protein modeling

T0425 (TBM)

PDB code: 3czx

HETERO ATOMS: ZN

ZN 11, 25, 77 (H, E, H)

Prediction: ZN Binding

Accuracy = 3/3 = 100 %

Coverage = 3/3 = 100 %

GDT-HA : 50.14

LEE model

Page 11: Ligand-binding site prediction  based on 3D protein modeling

What factor is significant?

An example (T0369) from a metal (NI)-bound target

LEE: A = 3/3 = 100 %, C = 3/3 = 100 %

Zhang: A = 1/2 = 50 %, C = 1/3 = 33 %

Page 12: Ligand-binding site prediction  based on 3D protein modeling

Model quality vs Acc+Cov

• Data set: models above 1σ in GDT-TS among the 100~120 submitted models

• Model quality measures:
– Backbone: GDT-TS, GDT-HA, GDT-TL
– Side-chain (global & local): H-bond, χ1, χ1+2 accuracy
– Local rmsd

• Investigation according to metal / non-metal targets

• Pearson's correlation coefficient (r) between model quality measures and Acc+Cov

$$
r = \frac{\sum_{n=1}^{N} (X_n - \bar{X})(Y_n - \bar{Y})}{\sqrt{\sum_{n=1}^{N} (X_n - \bar{X})^2 \, \sum_{n=1}^{N} (Y_n - \bar{Y})^2}},
\qquad
\bar{X} = \frac{1}{N} \sum_{n=1}^{N} X_n
$$
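As a minimal sketch of the correlation analysis on this slide, Pearson's r between one model quality measure (X) and Acc+Cov (Y) can be computed as below. The function name `pearson_r` is hypothetical and the numbers in the usage lines are made up for illustration; they are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations (numerator) and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: a quality score per model and the matching Acc+Cov.
quality = [45.0, 50.0, 55.0, 60.0]
acc_plus_cov = [120.0, 150.0, 140.0, 190.0]
print(round(pearson_r(quality, acc_plus_cov), 3))
```

A value of r near +1 would indicate that the quality measure tracks Acc+Cov closely, while a value near 0 would indicate no linear relationship.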

Page 13: Ligand-binding site prediction  based on 3D protein modeling

Metal-bound Targets

Correlation coefficient (r) between each model quality measure and Acc+Cov

Page 14: Ligand-binding site prediction  based on 3D protein modeling

Non-metal-bound Targets

Correlation coefficient (r) between each model quality measure and Acc+Cov

Page 15: Ligand-binding site prediction  based on 3D protein modeling

Conclusions

• We developed a new method to predict protein binding sites / ligands using 3D protein models (Acc=70, Cov=80 for LEE)

• Highly accurate 3D models increase the accuracy as well as the coverage of binding site prediction
– Metal-bound proteins: local side-chain accuracy is an important factor
– Non-metal-bound proteins: backbone accuracy is an important factor

• Future work
– Method development: finding the clustering & contact cutoffs depending on metal / non-metal ligands
– Challenging: applying to mutation study & drug discovery

Page 16: Ligand-binding site prediction  based on 3D protein modeling

Thank you for your attention !