Lee, Juyong. 2009. 08. 26.
About BoostThreader
What is BoostThreader?
A Sequence-Structure threading program
Published by J. Xu’s group
Known to be good for hard cases
Does not work… for me…
Let’s thread!
What we need: a sequence, a protein structure, a scoring function, an algorithm
[Slide diagram: toy alignment of sequence letters A, B, C, D, E, F, G onto structure positions, with moves labeled Good, Bad, Deletion, and Match]
Three algorithms for alignment!
- Generative model: traditional
- Hidden Markov Model: not that old
- Conditional Random Field: up to date
All of them run on dynamic programming.
("I'm your father." "I'm Andrei Andreyevich Markov.")
Dynamic programming
Finding the best-scoring path on the alignment matrix
From the initial cell to the final cell: the path is the alignment!
More about Dynamic Programming
[Slide diagram: DP matrix with SEQUENCE along one axis and STRUCTURE along the other; the three moves are match (A over a), deletion (A over ―), and insertion (― over a).]
g = gap penalty (deletion) = -1
h = gap penalty (insertion) = -1
f = match score
F(i+1, j+1) = max( F(i, j) + f, F(i, j+1) + g, F(i+1, j) + h )
Follow the maximum-scoring path!
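The DP recurrence on this slide can be sketched in Python. The scores below (match = 1, mismatch = -1, g = h = -1) are illustrative assumptions, not the program's actual values:

```python
# Minimal Needleman-Wunsch-style DP for sequence-structure alignment.
# F(i+1, j+1) is the best of a match (f), a deletion (g), or an
# insertion (h), exactly as in the recurrence above.

def align_score(seq, struct, match=1, mismatch=-1, g=-1, h=-1):
    n, m = len(seq), len(struct)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + g          # all-deletion edge
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + h          # all-insertion edge
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f = match if seq[i - 1] == struct[j - 1].upper() else mismatch
            F[i][j] = max(F[i - 1][j - 1] + f,   # match
                          F[i - 1][j] + g,       # deletion
                          F[i][j - 1] + h)       # insertion
    return F[n][m]

print(align_score("ABDE", "abcd"))
```

Following the maximum-scoring cell at each step (a traceback) would recover the alignment itself, not just its score.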
In conventional seq.-str. alignment
The score is a linear sum of property similarities, so only functions for the Match and Gap cases are needed:
Fmatch = w1 * (predicted SS * real SS) + w2 * (predicted SA * real SA) + w3 * (predicted residue depth * real depth) + …
Fgap = opening penalty + (# of gaps) * extension penalty
Only the next step is considered!
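As a sketch, the conventional linear scoring might look like this in Python; the weights w1..w3 and the gap penalties are made-up illustrative values, not trained ones:

```python
# Conventional linear match/gap scoring (a sketch; the weights and
# penalties are illustrative assumptions, not trained values).

def f_match(pred, obs, w=(1.0, 0.5, 0.25)):
    """Linear sum of similarities; pred/obs hold (SS, SA, depth) features."""
    return sum(wi * p * o for wi, p, o in zip(w, pred, obs))

def f_gap(n_gaps, opening=-3.0, extension=-0.5):
    """Affine gap cost: opening penalty + number of gaps * extension."""
    return opening + n_gaps * extension

print(f_match((1, 1, 1), (1, 0, 1)))   # SS and depth agree, SA does not
```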
What's different in BoostThreader?
The score depends on both the current and the next state, so nine scoring functions (one per transition between the three states) are necessary!
The gap penalty is context-dependent.
The functions are trained from reference alignments (DALI, TM-align, etc.).
Regression trees, not linear functions, are used as the scoring functions!
So what is a Regression Tree?
(An intermission)
Hey Nature, not all flies are Drosophila!
Regression Tree!
Example: the average price of 100 used cars
- Engine over 1500cc?
  - No → Driven more than 200,000 km?
    - Yes → average 5 million won
    - No → average 8 million won
  - Yes → More than 5 years old?
    - Yes → average 11 million won
    - No → average 15 million won
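A regression tree is just nested threshold tests with a constant prediction at each leaf. Here is a hand-built tree mirroring the used-car example; the branch layout is my reading of the slide, and real trees are learned from data, not written by hand:

```python
# A hand-built regression tree for the used-car example (branch layout
# reconstructed from the slide; in practice the tree is fit to data).

def predict_price(cc, km, years):
    """Average price in million won for a used car."""
    if cc <= 1500:                # "over 1500cc?" -> No branch
        if km >= 200_000:         # "driven more than 200,000 km?"
            return 5
        return 8
    if years > 5:                 # "more than 5 years old?"
        return 11
    return 15

print(predict_price(cc=1300, km=250_000, years=3))   # small, high-mileage car
```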
Training!
Example in threading: the sequence gives predicted properties, the structure gives observed properties.
- Is the predicted SS the same as the observed SS?
  - No → Is the SA level the same?
    - No → probability 0.1 (1 out of 10)
    - Yes → probability 0.3 (3 out of 10)
  - Yes → Is the SA level the same?
    - No → probability 0.6 (6 out of 10)
    - Yes → probability 0.9 (9 out of 10)
Estimate the probabilities from examples.
Advantages of Trees
Fast
Interaction between variables can be easily considered
What's really happening in BoostThreader?
Initial setting: set all F0(u→v, seq(i), str(j)) = 0, with P ∝ exp(F)
30 reference ("gold standard") sequence-structure alignments!
Calculate the probability of every possible state transition — the probabilities of all examples — with the forward-backward algorithm.
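The forward-backward step can be sketched on a toy chain. The sketch below assumes, for brevity, a single shared transition-score table F over the states M, I, D (the real program scores each (seq(i), str(j)) position separately):

```python
import math
from itertools import product

# Forward-backward on a length-T chain over states M, I, D with
# P ~ exp(F): compute the posterior probability of every transition.
# (Simplified sketch: one shared score table F, no emissions.)

STATES = "MID"

def forward_backward(F, T):
    """F: dict mapping (u, v) -> score; T: number of chain positions."""
    # alpha[t][s]: total exp-score of all paths ending in state s at step t
    alpha = [{s: 1.0 for s in STATES}]
    for t in range(1, T):
        alpha.append({v: sum(alpha[-1][u] * math.exp(F[u, v])
                             for u in STATES) for v in STATES})
    # beta[t][s]: total exp-score of all paths leaving state s at step t
    beta = [{s: 1.0 for s in STATES} for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = {u: sum(math.exp(F[u, v]) * beta[t + 1][v]
                          for v in STATES) for u in STATES}
    Z = sum(alpha[-1][s] for s in STATES)          # partition function
    post = {}                                       # posterior of u -> v at t
    for t in range(T - 1):
        for u, v in product(STATES, STATES):
            post[t, u, v] = alpha[t][u] * math.exp(F[u, v]) * beta[t + 1][v] / Z
    return post

# Initial setting F0 = 0: every transition is equally likely (1/9 per step).
F0 = {(u, v): 0.0 for u in STATES for v in STATES}
post = forward_backward(F0, T=4)
```

With all scores at zero, each of the nine transitions gets posterior 1/9 at every step, which is exactly the "initial setting" state the slide describes.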
"All Possible" Transitions?
Example alignment:
  sequence:  A B – D E
  structure: a b c d –
  states:    m m i m d
For MM, generate examples — every consecutive matched sequence pair crossed with every consecutive structure pair:
  (A,B)-(a,b)  (A,B)-(b,c)  (A,B)-(c,d)
  (B,D)-(a,b)  (B,D)-(b,c)  (B,D)-(c,d)
  (D,E)-(a,b)  (D,E)-(b,c)  (D,E)-(c,d)
Examples (2)
Same alignment:
  sequence:  A B – D E
  structure: a b c d –
  states:    m m i m d
For MI, generate examples — each residue paired with a gap, crossed with every consecutive structure pair:
  (B,–)-(a,b)  (B,–)-(b,c)  (B,–)-(c,d)
  (A,–)-(a,b)  (A,–)-(b,c)  (A,–)-(c,d)
  (D,–)-(a,b)  (D,–)-(b,c)  (D,–)-(c,d)
  (E,–)-(a,b)  (E,–)-(b,c)  (E,–)-(c,d)
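The example enumeration on these two slides is a cross product, which is easy to sketch in Python (the pairing rule below is my reading of the slides):

```python
from itertools import product

# Generate the MM and MI training examples: consecutive pairs from one
# side of the alignment crossed with every consecutive structure pair.

def consecutive_pairs(items):
    return list(zip(items, items[1:]))

seq_matched = ["A", "B", "D", "E"]      # residues that sit in match states
struct_pos  = ["a", "b", "c", "d"]

# MM: (A,B), (B,D), (D,E) crossed with (a,b), (b,c), (c,d)
mm_examples = list(product(consecutive_pairs(seq_matched),
                           consecutive_pairs(struct_pos)))

# MI: each residue paired with a gap, crossed with the structure pairs
mi_pairs    = [(r, "-") for r in ["B", "A", "D", "E"]]
mi_examples = list(product(mi_pairs, consecutive_pairs(struct_pos)))

print(len(mm_examples), len(mi_examples))   # 9 MM and 12 MI examples
```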
Inside BoostThreader
The examples and their probabilities are calculated with the current scoring functions.
Modify the scoring functions:
  if the example is correct, increase F: F1 = F0 + (1 – P)
  if the example is wrong, decrease F: F1 = F0 – P
Add trees until the prediction quality stops improving: F = F0 + F1 + F2 + F3 + F4 + F5 + …
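The update rule F1 = F0 + (1 – P) / F1 = F0 – P can be seen in action on a toy case. In the sketch below a plain softmax over the scores stands in for the forward-backward probabilities (an assumption for brevity), and each round applies the update directly instead of fitting a tree:

```python
import math

# Toy demo of the boosting-style update: correct examples get
# F += (1 - P), wrong ones get F -= P, where P is the model's
# current probability for the example.

def softmax(scores):
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]

F = [0.0, 0.0, 0.0]   # scores for three competing transitions
correct = 0           # the first one is the reference ("correct") one

for _ in range(20):   # a few boosting rounds
    P = softmax(F)
    for k in range(len(F)):
        if k == correct:
            F[k] += 1.0 - P[k]   # correct example: increase F
        else:
            F[k] -= P[k]         # wrong example: decrease F

P = softmax(F)        # the correct transition now dominates
```

Each round pushes probability mass toward the reference transition, which is the effect each added tree has on the accumulated score F = F0 + F1 + F2 + ….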
Performance
Summary
BoostThreader considers both the "current" and the "next" step.
The scoring function consists of regression trees.
The trees are trained on examples.
Thank you!