sumoylation site prediction

SUMOylation-site Prediction
Denis C. Bauer, Fabian A. Buske, Mikael Bodén

Upload: denis-bauer

Post on 09-Jun-2015


DESCRIPTION

This presentation is about predicting the sites within the primary sequence of a protein that are involved in the SUMOylation process.

TRANSCRIPT

Page 1: SUMOylation site prediction

SUMOylation-site Prediction

Denis C. Bauer

Fabian A. Buske

Mikael Bodén

Page 2: SUMOylation site prediction

Overview

• Background
  – SUMOylation: what is that?
• Published predictors
• Our approach
• What makes SUMO hard to tackle

Page 3: SUMOylation site prediction

SUMO is not 相撲 (sumo wrestling)

• Small Ubiquitin-related Modifier (SUMO) is a small protein of 97 amino acids.
• About 20% sequence identity to ubiquitin
• Post-translational modification
• Covalently attached to lysines
• Involved in many pathways/mechanisms
  – Transcriptional regulation
  – Compartmentalisation

Page 4: SUMOylation site prediction

SUMOylation pathway

Page 5: SUMOylation site prediction

SUMOylation motif

• One consensus motif, [ILV]K.E, accounts for about 60% of known sites

However
• Not all [ILV]K.E sites are SUMOylated
• Not all SUMOylated sites have the consensus motif

[Figure labels: TP, FP, FN]

Page 6: SUMOylation site prediction

Baseline prediction

Method                        CC
Regular Expression scanner    0.68
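The baseline scanner can be sketched in a few lines of Python. This is a minimal illustration of scanning for the consensus motif [ILV]K.E given on the motif slide, not the authors' implementation; the function names and the toy sequence are assumptions made for this example.

```python
import re

# Consensus motif quoted on the motif slide: [ILV]K.E
CONSENSUS = re.compile(r"[ILV]K.E")


def scan_consensus(sequence):
    """Return 1-based positions of lysines that sit inside a motif match."""
    # m.start() is the 0-based index of the [ILV] residue; the lysine
    # follows it directly, so its 1-based position is m.start() + 2.
    return [m.start() + 2 for m in CONSENSUS.finditer(sequence.upper())]


def predict_all_lysines(sequence):
    """Label every lysine: 1 if it lies in a motif match, 0 otherwise."""
    hits = set(scan_consensus(sequence))
    return {
        i + 1: int(i + 1 in hits)
        for i, residue in enumerate(sequence.upper())
        if residue == "K"
    }


if __name__ == "__main__":
    toy = "MAVKQEPLKAEGKR"  # made-up sequence, for illustration only
    print(scan_consensus(toy))
    print(predict_all_lysines(toy))
```

Every lysine outside a motif match is predicted as not SUMOylated, which is why such a scanner misses all sites that lack the consensus motif.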

Page 7: SUMOylation site prediction

Comparison with existing predictors

Method                        CC
Regular Expression scanner    0.68
SUMOpre+                      0.64
SUMOsp‡                       0.26
SUMOplot†                     0.48

+ Xu J., BMC Bioinformatics 2008, 9:8
‡ Xue Y., Nucleic Acids Res 2006, W254-W257
† http://www.abgent.com/doc/sumoplot (commercial)

Page 8: SUMOylation site prediction

Case study: Core histones in yeast

• Identified SUMOylation sites+
  – H2B: K6/7, K16/17
  – H2A: K2, K126
  – H4: somewhere in the tail
• No SUMOylation consensus site
• Predictors to date are not able to predict even a single SUMOylation site in the histone sequences

+ Nathan D., Genes Dev 2006, 20(8):966-76

Page 9: SUMOylation site prediction

Our approach

• Identify
  – window size
  – which ML method is best
• Voilà: better predictor!

[Diagram: Sequence window xxxxKxxxx -> ML -> SUMOylation 1/0]

Page 10: SUMOylation site prediction

Training in more Detail

Imbalance in the dataset: more negatives than positives

[Figure: a window (wU, wD) around each lysine (K) of the protein sequence is fed to the ML method; labels T: 0 1 0 and P: 1 1 0; legend: SUMOylated K, not SUMOylated K]
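A minimal sketch of how training examples might be cut from a protein sequence, assuming that wU and wD denote the number of residues taken upstream and downstream of each lysine. The function name, the "-" padding character for windows that run past the sequence ends, and the toy input are assumptions for this example, not details taken from the slides.

```python
def labelled_windows(sequence, sumo_sites, w_up=4, w_down=4, pad="-"):
    """Cut one window per lysine and attach a 1/0 SUMOylation label.

    sumo_sites holds 1-based positions of known SUMOylated lysines.
    Returns a list of (position, window, label) tuples.
    """
    seq = sequence.upper()
    # Pad both ends so lysines near the termini still yield full windows
    # (the padding character is an assumption of this sketch).
    padded = pad * w_up + seq + pad * w_down
    examples = []
    for i, residue in enumerate(seq):
        if residue != "K":
            continue
        # Residue i of seq sits at index i + w_up in the padded string,
        # so the window starts at index i and spans w_up + 1 + w_down.
        window = padded[i : i + w_up + 1 + w_down]
        label = 1 if (i + 1) in sumo_sites else 0
        examples.append((i + 1, window, label))
    return examples


if __name__ == "__main__":
    # Made-up sequence and site annotation, for illustration only.
    for example in labelled_windows("MKAAK", {2}, w_up=2, w_down=2):
        print(example)
```

Because every lysine produces an example but only a few are SUMOylated, the label column of such a data set is dominated by zeros, which is the imbalance noted on this slide.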

Page 11: SUMOylation site prediction

Prediction in more Detail

[Figure: the trained ML method is applied to a window (wU, wD) around each lysine (K) of the protein sequence; outputs: 1, 1, 0; legend: SUMOylated K, not SUMOylated K]

Page 12: SUMOylation site prediction

ML methods

• Bidirectional Recurrent Neural Network (BRNN)
  – Uses information from flanking windows
  – Decaying with distance to center window
  – Prone to overfit
• Support Vector Machine (SVM)
  – Regularized
  – Requires suitable kernel and feature representation
  – Standard kernels
    • Linear, Polynomial, RBF
  – String kernels
    • P-kernel, local-alignment kernel
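To illustrate what "feature representation" and "standard kernel" can mean for sequence windows, here is a small sketch that one-hot encodes a window and evaluates a linear and an RBF kernel on the result. The encoding, the gamma value and the function names are assumptions for this example; the slides do not say which encoding was used, and the P-kernel and local-alignment kernel are not shown here.

```python
from math import exp

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(window):
    """Encode each residue as 20 indicator values, concatenated."""
    vector = []
    for residue in window.upper():
        # Characters outside the alphabet (e.g. padding) become all zeros.
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


def linear_kernel(a, b):
    """Dot product of two feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(a, b, gamma=0.1):
    """Gaussian kernel exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return exp(-gamma * squared_distance)


if __name__ == "__main__":
    u, v = one_hot("VKQE"), one_hot("LKAE")
    print(linear_kernel(u, v))
    print(rbf_kernel(u, v))
```

With one-hot vectors, the linear kernel simply counts the positions at which two equal-length windows carry the same residue, so it cannot reward near matches or shifted motifs; string kernels are one way to relax that restriction.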

Page 13: SUMOylation site prediction

Data set

• Training/Testing data
  – 144 proteins with
  – 241 SUMOylation sites
  – 5,741 non-SUMOylated lysines
  – 68% of the SUMOylated sites conform to the consensus motif
• Hold-out
  – 13 proteins with
  – 27 SUMOylation sites
  – 48% consensus motif

Xu J., BMC Bioinformatics 2008, 9:8

Page 14: SUMOylation site prediction

Evaluation

• 5-fold cross-validation
• Matthews correlation coefficient (CC)
• Sensitivity, Specificity, Accuracy
• Area under the curve (AUC)
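The threshold-based measures listed above can all be computed from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name is an assumption, and the example counts are the class sizes from the Data set slide (241 SUMOylated and 5,741 non-SUMOylated lysines) applied to a hypothetical predictor that labels every lysine as negative.

```python
from math import sqrt


def evaluate(tp, fp, tn, fn):
    """Return sensitivity, specificity, accuracy and Matthews CC."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # CC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN));
    # by convention it is set to 0 when the denominator vanishes.
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "cc": cc,
    }


if __name__ == "__main__":
    # Hypothetical predictor that calls every lysine "not SUMOylated".
    print(evaluate(tp=0, fp=0, tn=5741, fn=241))
```

On data this imbalanced, the always-negative predictor reaches an accuracy of roughly 0.96 while its CC is 0, which is one reason to report the correlation coefficient rather than accuracy alone.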

Page 15: SUMOylation site prediction

Performance overview

SUMOsvm

Page 16: SUMOylation site prediction

Comparison with existing methods

Page 17: SUMOylation site prediction

Quest to improve performance

• Protein structural features and evolutionary features

• Separating SUMOylation sites by species or cellular compartment

• Clustering for other motifs using kernel hierarchical clustering

Page 18: SUMOylation site prediction

Summary

• Regular Expression Scanner is still the best classifier.

• SUMO is more versatile than expected!
• The road to better predictions
  – Are there other motifs?
  – Which features can discriminate?
  – Is the dataset biased?

Image: http://spot.colorado.edu/~colemab/Theatre_Resources/SumoBallerina.jpg

Page 19: SUMOylation site prediction

Acknowledgment

Predictor/Analysis
  – Mikael Bodén
  – Fabian Buske

Dataset
  – Xu et al.

PhD Supervisors
  – Tim Bailey
  – Andrew Perkins
  – Mikael Bodén

Other bioinformatics tools:
STREAM: a practical workbench for modeling transcriptional regulation.
www.bioinformatics.org.au/stream/