Lee, Juyong. 2009. 08. 26.
About BoostThreader
What is BoostThreader?
A Sequence-Structure threading program
Published by J. Xu’s group
Known to be good for hard cases
Does not work… for me…
Let’s thread!
What we need: a sequence, a protein structure, a scoring function, an algorithm
[Slide diagram: toy alignment of sequence letters A, B, C, D, E, F, G onto structure positions, with moves labeled Good, Bad, Deletion, and Match]
Three algorithms for alignment!
- Generative model: traditional
- Hidden Markov Model: not that old
- Conditional Random Field: up to date
All of them run on dynamic programming.
("I'm your father." "I'm Andrei Andreyevich Markov.")
Dynamic programming
Finding the best-scoring path on the alignment matrix
From the initial cell to the final cell: the path is the alignment!
More about Dynamic Programming
[Slide diagram: DP matrix with SEQUENCE along one axis and STRUCTURE along the other; the three moves are match (A over a), deletion (A over ―), and insertion (― over a).]
g = gap penalty (deletion) = -1
h = gap penalty (insertion) = -1
f = match score
F(i+1, j+1) = max( F(i, j) + f, F(i, j+1) + g, F(i+1, j) + h )
Follow the maximum-scoring path!
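The DP recurrence on this slide can be sketched in Python. The scores below (match = 1, mismatch = -1, g = h = -1) are illustrative assumptions, not the program's actual values:

```python
# Minimal Needleman-Wunsch-style DP for sequence-structure alignment.
# F(i+1, j+1) is the best of a match (f), a deletion (g), or an
# insertion (h), exactly as in the recurrence above.

def align_score(seq, struct, match=1, mismatch=-1, g=-1, h=-1):
    n, m = len(seq), len(struct)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + g          # all-deletion edge
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + h          # all-insertion edge
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f = match if seq[i - 1] == struct[j - 1].upper() else mismatch
            F[i][j] = max(F[i - 1][j - 1] + f,   # match
                          F[i - 1][j] + g,       # deletion
                          F[i][j - 1] + h)       # insertion
    return F[n][m]

print(align_score("ABDE", "abcd"))
```

Following the maximum-scoring cell at each step (a traceback) would recover the alignment itself, not just its score.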
In conventional seq.-str. alignment
The score is a linear sum of property similarities, so only functions for the Match and Gap cases are needed:
Fmatch = w1 * (predicted SS * real SS) + w2 * (predicted SA * real SA) + w3 * (predicted residue depth * real depth) + …
Fgap = opening penalty + (# of gaps) * extension penalty
Only the next step is considered!
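As a sketch, the conventional linear scoring might look like this in Python; the weights w1..w3 and the gap penalties are made-up illustrative values, not trained ones:

```python
# Conventional linear match/gap scoring (a sketch; the weights and
# penalties are illustrative assumptions, not trained values).

def f_match(pred, obs, w=(1.0, 0.5, 0.25)):
    """Linear sum of similarities; pred/obs hold (SS, SA, depth) features."""
    return sum(wi * p * o for wi, p, o in zip(w, pred, obs))

def f_gap(n_gaps, opening=-3.0, extension=-0.5):
    """Affine gap cost: opening penalty + number of gaps * extension."""
    return opening + n_gaps * extension

print(f_match((1, 1, 1), (1, 0, 1)))   # SS and depth agree, SA does not
```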
What's different in BoostThreader?
The score depends on both the current and the next state, so nine scoring functions (one per transition between the three states) are necessary!
The gap penalty is context-dependent.
The functions are trained from reference alignments (DALI, TM-align, etc.).
Regression trees, not linear functions, are used as the scoring functions!
So what is a Regression Tree?
(An intermission)
Hey Nature, not all flies are Drosophila!
Regression Tree!
Example: the average price of 100 used cars
- Engine over 1500cc?
  - No → Driven more than 200,000 km?
    - Yes → average 5 million won
    - No → average 8 million won
  - Yes → More than 5 years old?
    - Yes → average 11 million won
    - No → average 15 million won
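A regression tree is just nested threshold tests with a constant prediction at each leaf. Here is a hand-built tree mirroring the used-car example; the branch layout is my reading of the slide, and real trees are learned from data, not written by hand:

```python
# A hand-built regression tree for the used-car example (branch layout
# reconstructed from the slide; in practice the tree is fit to data).

def predict_price(cc, km, years):
    """Average price in million won for a used car."""
    if cc <= 1500:                # "over 1500cc?" -> No branch
        if km >= 200_000:         # "driven more than 200,000 km?"
            return 5
        return 8
    if years > 5:                 # "more than 5 years old?"
        return 11
    return 15

print(predict_price(cc=1300, km=250_000, years=3))   # small, high-mileage car
```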
Training!
Example in threading: the sequence gives predicted properties, the structure gives observed properties.
- Is the predicted SS the same as the observed SS?
  - No → Is the SA level the same?
    - No → probability 0.1 (1 out of 10)
    - Yes → probability 0.3 (3 out of 10)
  - Yes → Is the SA level the same?
    - No → probability 0.6 (6 out of 10)
    - Yes → probability 0.9 (9 out of 10)
Estimate the probabilities from examples.
Advantages of Trees
Fast
Interaction between variables can be easily considered
What's really happening in BoostThreader?
Initial setting: set all F0(u→v, seq(i), str(j)) = 0, with P ∝ exp(F)
30 reference ("gold standard") sequence-structure alignments!
Calculate the probability of every possible state transition — the probabilities of all examples — with the forward-backward algorithm.
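The forward-backward step can be sketched on a toy chain. The sketch below assumes, for brevity, a single shared transition-score table F over the states M, I, D (the real program scores each (seq(i), str(j)) position separately):

```python
import math
from itertools import product

# Forward-backward on a length-T chain over states M, I, D with
# P ~ exp(F): compute the posterior probability of every transition.
# (Simplified sketch: one shared score table F, no emissions.)

STATES = "MID"

def forward_backward(F, T):
    """F: dict mapping (u, v) -> score; T: number of chain positions."""
    # alpha[t][s]: total exp-score of all paths ending in state s at step t
    alpha = [{s: 1.0 for s in STATES}]
    for t in range(1, T):
        alpha.append({v: sum(alpha[-1][u] * math.exp(F[u, v])
                             for u in STATES) for v in STATES})
    # beta[t][s]: total exp-score of all paths leaving state s at step t
    beta = [{s: 1.0 for s in STATES} for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = {u: sum(math.exp(F[u, v]) * beta[t + 1][v]
                          for v in STATES) for u in STATES}
    Z = sum(alpha[-1][s] for s in STATES)          # partition function
    post = {}                                       # posterior of u -> v at t
    for t in range(T - 1):
        for u, v in product(STATES, STATES):
            post[t, u, v] = alpha[t][u] * math.exp(F[u, v]) * beta[t + 1][v] / Z
    return post

# Initial setting F0 = 0: every transition is equally likely (1/9 per step).
F0 = {(u, v): 0.0 for u in STATES for v in STATES}
post = forward_backward(F0, T=4)
```

With all scores at zero, each of the nine transitions gets posterior 1/9 at every step, which is exactly the "initial setting" state the slide describes.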
"All Possible" Transitions?
Example alignment:
  sequence:  A B – D E
  structure: a b c d –
  states:    m m i m d
For MM, generate examples — every consecutive matched sequence pair crossed with every consecutive structure pair:
  (A,B)-(a,b)  (A,B)-(b,c)  (A,B)-(c,d)
  (B,D)-(a,b)  (B,D)-(b,c)  (B,D)-(c,d)
  (D,E)-(a,b)  (D,E)-(b,c)  (D,E)-(c,d)
Examples (2)
Same alignment:
  sequence:  A B – D E
  structure: a b c d –
  states:    m m i m d
For MI, generate examples — each residue paired with a gap, crossed with every consecutive structure pair:
  (B,–)-(a,b)  (B,–)-(b,c)  (B,–)-(c,d)
  (A,–)-(a,b)  (A,–)-(b,c)  (A,–)-(c,d)
  (D,–)-(a,b)  (D,–)-(b,c)  (D,–)-(c,d)
  (E,–)-(a,b)  (E,–)-(b,c)  (E,–)-(c,d)
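The example enumeration on these two slides is a cross product, which is easy to sketch in Python (the pairing rule below is my reading of the slides):

```python
from itertools import product

# Generate the MM and MI training examples: consecutive pairs from one
# side of the alignment crossed with every consecutive structure pair.

def consecutive_pairs(items):
    return list(zip(items, items[1:]))

seq_matched = ["A", "B", "D", "E"]      # residues that sit in match states
struct_pos  = ["a", "b", "c", "d"]

# MM: (A,B), (B,D), (D,E) crossed with (a,b), (b,c), (c,d)
mm_examples = list(product(consecutive_pairs(seq_matched),
                           consecutive_pairs(struct_pos)))

# MI: each residue paired with a gap, crossed with the structure pairs
mi_pairs    = [(r, "-") for r in ["B", "A", "D", "E"]]
mi_examples = list(product(mi_pairs, consecutive_pairs(struct_pos)))

print(len(mm_examples), len(mi_examples))   # 9 MM and 12 MI examples
```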
Inside BoostThreader
The examples and their probabilities are calculated with the current scoring functions.
Modify the scoring functions:
  if the example is correct, increase F: F1 = F0 + (1 – P)
  if the example is wrong, decrease F: F1 = F0 – P
Add trees until the prediction quality stops improving: F = F0 + F1 + F2 + F3 + F4 + F5 + …
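The update rule F1 = F0 + (1 – P) / F1 = F0 – P can be seen in action on a toy case. In the sketch below a plain softmax over the scores stands in for the forward-backward probabilities (an assumption for brevity), and each round applies the update directly instead of fitting a tree:

```python
import math

# Toy demo of the boosting-style update: correct examples get
# F += (1 - P), wrong ones get F -= P, where P is the model's
# current probability for the example.

def softmax(scores):
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]

F = [0.0, 0.0, 0.0]   # scores for three competing transitions
correct = 0           # the first one is the reference ("correct") one

for _ in range(20):   # a few boosting rounds
    P = softmax(F)
    for k in range(len(F)):
        if k == correct:
            F[k] += 1.0 - P[k]   # correct example: increase F
        else:
            F[k] -= P[k]         # wrong example: decrease F

P = softmax(F)        # the correct transition now dominates
```

Each round pushes probability mass toward the reference transition, which is the effect each added tree has on the accumulated score F = F0 + F1 + F2 + ….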
Performance
Summary
BoostThreader considers both the "current" and the "next" step.
The scoring function consists of regression trees.
The trees are trained on examples.
Thank you!