
Page 1

Coffee Shop

F91921025 黃仁暐, F92921029 戴志華, F92921041 施逸優, R93921142 吳於芳, R94921035 林與絜

Page 2

2005/12/14 2

Menu

Coffee Shop Opening: Why a coffee shop?

Three Flavors: COFFEE

T-Coffee

3DCoffee

Remarks

Recipes

Page 3

Multiple Sequence Alignment

Multiple sequence alignment is one of the most important tools for analyzing biological sequences. Applications include:

structure prediction

phylogenetic analysis

function prediction

polymerase chain reaction (PCR) primer design

Page 4

Multiple Sequence Alignment

However, the accuracy is still not good enough:

It is difficult to evaluate the quality of a multiple alignment.

It is algorithmically very hard to produce the optimal alignment.

To increase the accuracy of multiple sequence alignment, we opened a coffee shop serving three kinds of coffee.

Page 5

Before (drinking) COFFEE: comparative genomics, and why?

Understanding the process of evolution at the gross and local levels

Translating DNA sequence data into proteins of known function

Meaning of conserved regions

E. coli, C. elegans, Drosophila, human… What's their relationship?

Page 6

Organisms shown: Arabidopsis, E. coli, yeast, Synechocystis (cyanobacteria), nematode, fruit fly, human

Classification of genes by function

Adapted from “Principles of genome analysis and genomics” Fig. 7.5 (p.129), by S. B. Primrose and R. M. Twyman, 3rd edition

Page 7

Comparative genomics vs. multiple sequence alignment

Alignment → conserved region

Conserved region → gene location

Evolutionary evidence

http://www.public.iastate.edu/~semrich/compgen/

Page 8

http://gchelpdesk.ualberta.ca/news/02jun05/cbhd_news_02jun05.php

A: human chromosome I; B: human chromosome II; C: human chromosome III

Chromosome III region 125-128 Mb was magnified 120X

The alignment between the chromosomes

Page 9

Our Flavors

COFFEE: A New Objective Function For Multiple Sequence Alignment.

C. Notredame, L. Holm and D. G. Higgins. Bioinformatics, Vol 14(5), pp407-422, 1998.

T-Coffee: A novel method for multiple sequence alignments.

C. Notredame, D. Higgins and J. Heringa. Journal of Molecular Biology, Vol 302, pp205-217, 2000.

3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments.

O. O'Sullivan, K. Suhre, C. Abergel, D. G. Higgins and C. Notredame. Journal of Molecular Biology, Vol 340, pp385-395, 2004.

Page 10

COFFEE

Page 11

COFFEE

An objective function for multiple sequence alignments

Cédric Notredame, Liisa Holm and Desmond G. Higgins

SAGA with COFFEE score

Page 12

Introduction

COFFEE: Consistency-based Objective Function For alignmEnt Evaluation

An objective function, the COFFEE score, is proposed to measure the quality of multiple sequence alignments.

The COFFEE score of a multiple sequence alignment is optimized with the genetic algorithm package SAGA (Sequence Alignment by Genetic Algorithm).

Page 13

Overview of their method

Given a set of sequences to be aligned and a library containing all pairwise alignments between them, the COFFEE score reflects the level of consistency between a multiple sequence alignment and the library.

Page 14

COFFEE score

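$$\text{COFFEE score} = \frac{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{i,j} \times \mathrm{SCORE}(A_{i,j})}{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{i,j} \times \mathrm{LEN}(A_{i,j})}$$

with: SCORE(A_{i,j}) = number of aligned pairs of residues that are shared between A_{i,j} and the library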
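The formula can be made concrete with a minimal sketch. It rests on several assumptions of ours:

- The multiple alignment is a list of equal-length gapped strings, with '-' as the gap symbol.
- The library is a dict mapping each sequence-index pair (i, j), i < j, to its weight W_{i,j} and the set of (position in i, position in j) residue pairs of the library alignment.
- LEN(A_{i,j}) is taken as the number of columns of the induced pairwise alignment after dropping gap-only columns. The slide does not define LEN.

The function names are hypothetical.

```python
from itertools import combinations

def residue_pairs(row_i, row_j):
    """Aligned residue pairs (pos_i, pos_j) in the projection of two MSA rows."""
    pairs, pos_i, pos_j = set(), 0, 0
    for a, b in zip(row_i, row_j):
        if a != "-" and b != "-":
            pairs.add((pos_i, pos_j))
        if a != "-":
            pos_i += 1
        if b != "-":
            pos_j += 1
    return pairs

def projected_length(row_i, row_j):
    """ASSUMED meaning of LEN: columns of the projection, gap-gap columns dropped."""
    return sum(1 for a, b in zip(row_i, row_j) if a != "-" or b != "-")

def coffee_score(msa, library):
    """Weighted fraction of MSA residue pairs that also appear in the library."""
    num = den = 0.0
    for i, j in combinations(range(len(msa)), 2):
        weight, lib_pairs = library[(i, j)]
        shared = len(residue_pairs(msa[i], msa[j]) & lib_pairs)  # SCORE(A_ij)
        num += weight * shared
        den += weight * projected_length(msa[i], msa[j])          # LEN(A_ij)
    return num / den if den else 0.0

msa = ["AC-GT", "ACTGT"]
library = {(0, 1): (1.0, {(0, 0), (1, 1), (2, 3), (3, 4)})}
print(coffee_score(msa, library))
```

Under this LEN assumption, the gap column counts in the denominator. The example therefore scores 4/5 even though every aligned pair is in the library.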

Page 15

COFFEE score

Page 16

Using COFFEE in SAGA

SAGA iteratively generates multiple sequence alignments with higher COFFEE scores until the COFFEE score can no longer be improved. SAGA follows the general principle of genetic algorithms:

The notion of survival of the fittest

In each iteration, SAGA:

Evaluates the score of the alignments

The fitter an alignment, the more likely it is to survive and produce offspring

Surviving alignments may be kept unchanged, randomly modified (mutation), or combined with another alignment (cross-over)
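The loop above can be sketched generically. This is not SAGA's code, and several choices are our own assumptions:

- Selection keeps the fitter half and draws parents with fitness-proportional weights.
- The 30/40/30 split between keeping, mutating and crossing over is arbitrary.
- The current best individual is always carried over.

The alignment-specific operators are passed in as functions. The toy usage maximises the number of 1s in a bit string, purely to exercise the loop; it does not align sequences.

```python
import random

def genetic_search(population, fitness, mutate, crossover,
                   generations=50, keep=0.5, rng=None):
    """Generic GA loop in the spirit of SAGA (hypothetical sketch).
    fitness is assumed non-negative, e.g. a COFFEE score."""
    rng = rng or random.Random(0)
    size = len(population)
    for _ in range(generations):
        # Evaluate and rank; only the fitter part of the population survives.
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[:max(2, int(size * keep))]
        # Fitter survivors are more likely to produce offspring.
        weights = [fitness(s) + 1e-9 for s in survivors]
        children = [survivors[0]]  # the current best is kept unchanged
        while len(children) < size:
            parent = rng.choices(survivors, weights=weights)[0]
            roll = rng.random()
            if roll < 0.3:
                children.append(parent)                          # unchanged
            elif roll < 0.7:
                children.append(mutate(parent, rng))             # mutation
            else:
                mate = rng.choices(survivors, weights=weights)[0]
                children.append(crossover(parent, mate, rng))    # cross-over
        population = children
    return max(population, key=fitness)

# Toy operators on bit-string tuples (not alignments).
def flip_one_bit(bits, rng):
    i = rng.randrange(len(bits))
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]

def one_point_crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

seed = random.Random(1)
start = [tuple(seed.randint(0, 1) for _ in range(20)) for _ in range(10)]
best = genetic_search(start, fitness=sum, mutate=flip_one_bit,
                      crossover=one_point_crossover, rng=random.Random(2))
print(sum(best), "ones out of", len(best))
```

Carrying the best individual over unchanged makes the best score non-decreasing across generations. This mirrors "iterate until the score cannot be improved".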

Page 17

Results

COFFEE function

SAGA

Optimization of COFFEE function

Effect of optimization

Comparison: COFFEE and others

Others: PRRP, ClustalW, PILEUP, SAGA MSA, SAM

COFFEE score & alignment accuracy

(A bunch of rather dry tables is coming up, so please bear with us.)

Page 18

Optimization

The COFFEE function was optimized by SAGA

Using ClustalW alignments

Using SAGA alignments

Page 19

Comparison

Multiple alignments of SAGA-COFFEE and 5 other methods

PRRP, ClustalW, PILEUP, SAGA MSA, SAM

Performance of SAGA and ClustalW

Comparison with the 5 other methods: even when SAGA-COFFEE does not give the best result, it is not far from the best

Identity level lower → better SAGA-COFFEE results

Page 20

Page 21

Ratio of (E+H) residues correctly aligned

Better or worse alignment? SAGA-COFFEE vs. others

No such thing as an ideal method

Correctly aligned ratio: better than PRRP

Worse than PRRP

Page 22

COFFEE score and alignment accuracy

Figure: E+H accuracy (%) vs. COFFEE sequence score (r = 0.65); E+H accuracy (%) vs. average identity (%)

The COFFEE score can be used to predict the accuracy of an alignment; average identity cannot predict alignment accuracy.

For >85% of the sequences, the accuracy can be predicted (error ~ ±10%)
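The slide reports a correlation of r = 0.65 between COFFEE score and E+H accuracy. It also reports that accuracy is predictable within about ±10% for most sequences. A minimal sketch of how such numbers can be computed:

- pearson_r gives the Pearson correlation.
- fit_line gives a least-squares line of accuracy on score.
- fraction_predicted gives the share of points predicted within a tolerance.

The 10-point tolerance mirrors the slide. The sample values are synthetic toy numbers, not data from the paper, and all function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least-squares fit: accuracy ~ slope * score + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def fraction_predicted(xs, ys, tolerance=10.0):
    """Share of points whose accuracy the fitted line predicts within +/- tolerance."""
    slope, intercept = fit_line(xs, ys)
    hits = sum(abs(slope * x + intercept - y) <= tolerance for x, y in zip(xs, ys))
    return hits / len(xs)

# Synthetic toy values (perfectly linear), only to exercise the functions.
scores = [0.2, 0.4, 0.6, 0.8]
accuracy = [30.0, 50.0, 70.0, 90.0]
print(pearson_r(scores, accuracy), fit_line(scores, accuracy))
```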

Page 23

Correlation between score and accuracy

Higher score → higher accuracy

SAGA produces more high-scoring sequences than ClustalW

Page 24

Coffee Break?

Page 25

T-Coffee

Page 26

T-Coffee

A novel method for multiple sequence alignments

C.Notredame, D. Higgins, J. Heringa

ClustalW with extended library

Page 27

ClustalW

ClustalW is the core alignment strategy of T-Coffee. It follows the procedure below:

Pairwise Alignment: calculate the distance matrix

Guide Tree: Unrooted Neighbor-Joining Tree

Rooted Neighbor-Joining Tree: guide tree with sequence weights

Progressive Alignment: align following the guide tree

Page 28

Calculate distance matrix
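A minimal sketch of this step makes several assumptions:

- Each pair is aligned with a plain Needleman-Wunsch global alignment. The scores are illustrative (match +1, mismatch -1, linear gap -2), not ClustalW's substitution matrices and affine gap penalties.
- The alignment is turned into a distance of 1 - (identities / aligned positions), excluding gap positions. We understand this to be the kind of identity-based distance ClustalW uses.

Function names are hypothetical.

```python
def global_align(s, t, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    def sub(a, b):
        return match if a == b else mismatch
    n, m = len(s), len(t)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(s[i - 1], t[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(s[i - 1], t[j - 1]):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a.append(s[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(t[j - 1])
            j -= 1
    return "".join(reversed(a)), "".join(reversed(b))

def identity_distance(s, t):
    """1 - (identities / compared positions), gap positions excluded."""
    row_s, row_t = global_align(s, t)
    compared = [(x, y) for x, y in zip(row_s, row_t) if x != "-" and y != "-"]
    if not compared:
        return 1.0
    return 1.0 - sum(x == y for x, y in compared) / len(compared)

def distance_matrix(seqs):
    """seqs: {name: sequence}. Returns a symmetric dict-of-dicts of distances."""
    names = list(seqs)
    return {a: {b: 0.0 if a == b else identity_distance(seqs[a], seqs[b])
                for b in names} for a in names}

print(distance_matrix({"s1": "AAAA", "s2": "AAAT", "s3": "GARF"}))
```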

Page 29

Guide tree

Use the Neighbor-Joining method to build the guide tree from the distance matrix.

First construct an unrooted Neighbor-Joining tree, then convert it to a rooted Neighbor-Joining tree, the guide tree.
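A sketch of the standard neighbor-joining algorithm on a dict-of-dicts distance matrix. At each step it:

- joins the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of i's distances;
- assigns branch lengths d(i, j)/2 ± (r_i - r_j) / (2(n - 2));
- sets new distances to (d(i, k) + d(j, k) - d(i, j)) / 2.

This is not ClustalW's own code, and the rooting step that turns the unrooted tree into the guide tree is left out. The 5-taxon matrix is a small illustrative example, not data from the slides.

```python
def neighbor_joining(names, dist):
    """Unrooted NJ tree. Returns (final_edge, joins): each join is
    (node_i, node_j, branch_i, branch_j); merged nodes are nested tuples."""
    d = {a: dict(dist[a]) for a in names}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        pairs = [(a, b) for k, a in enumerate(nodes) for b in nodes[k + 1:]]
        # Pick the pair minimising the NJ Q-criterion.
        i, j = min(pairs, key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        bi = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        bj = d[i][j] - bi
        u = (i, j)  # new internal node
        d[u] = {}
        for k in nodes:
            if k != i and k != j:
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        joins.append((i, j, bi, bj))
        nodes = [k for k in nodes if k != i and k != j] + [u]
    a, b = nodes
    return (a, b, d[a][b]), joins

D = {
    "a": {"a": 0, "b": 5, "c": 9, "d": 9, "e": 8},
    "b": {"a": 5, "b": 0, "c": 10, "d": 10, "e": 9},
    "c": {"a": 9, "b": 10, "c": 0, "d": 8, "e": 7},
    "d": {"a": 9, "b": 10, "c": 8, "d": 0, "e": 3},
    "e": {"a": 8, "b": 9, "c": 7, "d": 3, "e": 0},
}
final_edge, joins = neighbor_joining(list(D), D)
print(joins[0])
```

For this matrix the first join is a with b, with branch lengths 2 and 3.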

Page 30

Unrooted Neighbor-Joining Tree

Page 31

Rooted Neighbor-Joining Tree

Page 32

Progressive Alignment: align following the guide tree

Seq1 Seq2 Seq3 Seq4 Seq5

Alignment 1 Alignment 2

Alignment 3 Final alignment
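Following the guide tree amounts to a bottom-up (post-order) walk: two children are aligned before their parent merges them. A minimal sketch with two assumptions:

- The guide tree is a nested tuple and is hypothetical; it is not read off the slide.
- The profile aligner is replaced by a stand-in that only records the merge order.

```python
def progressive_align(tree, align_pair):
    """Post-order walk of a rooted guide tree. Leaves are sequence names,
    internal nodes are (left, right) tuples; align_pair merges two
    child alignments and returns the merged alignment."""
    if not isinstance(tree, tuple):
        return tree                                  # a single sequence
    left = progressive_align(tree[0], align_pair)    # align left subtree first
    right = progressive_align(tree[1], align_pair)
    return align_pair(left, right)

# Hypothetical guide tree for Seq1..Seq5.
guide_tree = (("Seq1", "Seq2"), (("Seq3", "Seq4"), "Seq5"))

steps = []
def record_merge(a, b):
    """Stand-in for a real profile aligner: only logs the merge order."""
    label = f"Alignment {len(steps) + 1}"
    steps.append((label, a, b))
    return label

final = progressive_align(guide_tree, record_merge)
for label, a, b in steps:
    print(label, "=", a, "+", b)
```

With five sequences there are four merges. The last one, here labelled "Alignment 4", is the final alignment.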

Page 33

Progressive-alignment strategy

Pros: Faster and saves space (compared with computing all possible multiple alignments).

Cons: May not find the optimal solution.

Errors made in the early alignments cannot be rectified later as the rest of the sequences are added in: “Once a gap, always a gap!”

T-Coffee is an attempt to minimize that effect!

Page 34

T-Coffee Algorithm

Generating a primary library of alignments

Derivation of the primary library weights

Combination of the libraries

Extending the library

Progressive alignment strategy

Page 35

ClustalW Primary Library (Global)

Lalign Primary Library (Local)

Weighting

Primary Library
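A minimal sketch of building and combining primary libraries. It rests on our understanding of the T-Coffee paper, which the slide itself does not spell out:

- Every aligned residue pair of a pairwise alignment gets a weight equal to that alignment's percent identity.
- A pair found in both the global (ClustalW) and local (Lalign) libraries is merged into one entry whose weight is the sum.

Residues are (sequence, position) tuples. Function names are hypothetical.

```python
from collections import defaultdict

def percent_identity(row_a, row_b):
    """Identity over columns where both rows have a residue, in percent."""
    cols = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    return 100.0 * sum(x == y for x, y in cols) / len(cols) if cols else 0.0

def library_from_alignment(name_a, row_a, name_b, row_b, start_a=0, start_b=0):
    """Weighted residue pairs of one pairwise alignment; start_* give the
    offsets of a local alignment inside its full sequences."""
    weight = percent_identity(row_a, row_b)
    lib, ia, ib = {}, start_a, start_b
    for x, y in zip(row_a, row_b):
        if x != "-" and y != "-":
            lib[((name_a, ia), (name_b, ib))] = weight
        if x != "-":
            ia += 1
        if y != "-":
            ib += 1
    return lib

def combine(*libraries):
    """Merge libraries; a pair present in several gets the sum of its weights."""
    merged = defaultdict(float)
    for lib in libraries:
        for pair, weight in lib.items():
            merged[pair] += weight
    return dict(merged)

glob = library_from_alignment("A", "GAR-F", "B", "GARYF")              # 100% identity
loc = library_from_alignment("A", "RF", "B", "YF", start_a=2, start_b=3)  # 50% identity
primary = combine(glob, loc)
print(primary)
```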

Page 36

Primary Library

Page 37

ClustalW Primary Library (Global)

Lalign Primary Library (Local)

Weighting

Primary Library

Extension

Extended Library

Page 38

Extended Library

Weight(A-C-B) = min( Weight(A-C), Weight(B-C) ) = min( 77, 100 ) = 77

Weight(A-D-B) = min( Weight(A-D), Weight(B-D) ) = min( 100, 100 ) = 100
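A sketch of the extension step using the min rule shown above. For each residue z of a third sequence that is aligned to both x and y, min(W(x, z), W(z, y)) is added to the weight of the pair x-y. The direct weight counts as 0 if the pair is absent.

Residues are (sequence, position) tuples, and each pair is stored in one canonical orientation. The function names are hypothetical. The usage rebuilds the slide's numbers (A-C 77, C-B 100, A-D 100, D-B 100).

```python
from collections import defaultdict
from itertools import combinations

def canon(x, y):
    """Store each residue pair in one fixed orientation."""
    return (x, y) if x <= y else (y, x)

def extend_library(primary):
    """primary: {(res_x, res_y): weight}. Returns the extended library."""
    partners = defaultdict(dict)   # residue -> {aligned residue: weight}
    extended = defaultdict(float)
    for (x, y), w in primary.items():
        partners[x][y] = w
        partners[y][x] = w
        extended[canon(x, y)] += w          # direct weight
    for z, links in partners.items():
        for (x, wx), (y, wy) in combinations(links.items(), 2):
            # x, y and the intermediate z must come from three different sequences
            if len({x[0], y[0], z[0]}) == 3:
                extended[canon(x, y)] += min(wx, wy)
    return dict(extended)

primary = {
    (("A", 0), ("C", 0)): 77, (("C", 0), ("B", 0)): 100,
    (("A", 0), ("D", 0)): 100, (("D", 0), ("B", 0)): 100,
}
ext = extend_library(primary)
print(ext[(("A", 0), ("B", 0))])  # 77 via C plus 100 via D
```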

Page 39

Extended Library

SeqA: GARFIELD THE LAST FAT CAT

SeqB: GARFIELD THE FAST CAT

SeqA: GARFIELD THE LAST FAT CAT

SeqB: GARFIELD THE FAST CAT

Page 40

Extended Library

SeqA: GARFIELD THE LAST FAT CAT

SeqB: GARFIELD THE FAST CAT

SeqA: GARFIELD THE LAST FAT CAT

SeqB: GARFIELD THE FAST CAT

Page 41

Progressive Alignment

ClustalW Primary Library (Global)

Lalign Primary Library (Local)

Weighting

Primary Library

Extension

Extended Library

Multiple Alignment Information

Page 42

Progressive Alignment

Page 43

Complexity Analysis

Complexity of the whole procedure, for N sequences aligned in a multiple alignment of length L: O(N²L²) + O(N³L) + O(N³) + O(NL²)

O(N²L²): computation of the pairwise library

O(N³L): computation of the extended pairwise library

O(N³): computation of the NJ tree

O(NL²): computation of the progressive alignment

Page 44

Experiment

Implementation environment

Result 1: Effect of combining local and global alignments without extension; effect of the library extension

Result 2: compared with other multiple sequence alignment methods

Page 45

Implementation environment

Programming language: ANSI C

Hardware: Linux platform with Pentium II processors (330 MHz)

Test cases: the BAliBASE database of multiple sequence alignments

Page 46

Result 1

Table 1: The effect of combining local and global alignments

Name  Global / Local / Extension      Cat1(81)  Cat2(23)  Cat3(4)  Cat4(12)  Cat5(11)  Total(141)  Significance
C     ClustalW pw / - / -             70.6      26.7      43.0     56.0      60.0      58.9        7.8
CE    ClustalW pw / - / ex            77.1      33.6      47.6     64.8      75.9      66.3        17.7
L     - / Lalign pw / -               65.4      12.1      22.8     53.9      66.0      52.0        7.8
LE    - / Lalign pw / ex              72.6      25.6      47.2     77.5      85.5      64.2        16.3
CL    ClustalW pw / Lalign pw / -     76.2      32.0      48.3     76.2      74.6      66.5        12.1
CLE   ClustalW pw / Lalign pw / ex    80.6      37.1      52.9     83.2      88.6      72.0

Page 47

Result 2

Table 2: T-coffee compared with other multiple sequence alignment methods

Method    Cat1(81)  Cat2(23)  Cat3(4)  Cat4(12)  Cat5(11)  Total1(141)  Total2(141)  Significance
Dialign   71.0      25.2      35.1     74.7      80.4      61.5         57.3         11.3
ClustalW  78.5      32.2      42.5     65.7      74.3      66.4         58.6         26.2
Prrp      78.6      32.5      50.2     51.1      82.7      66.4         59.0         36.9
T-Coffee  80.6      37.1      52.9     83.2      88.6      72.0         68.6

Page 48

3DCoffee

Page 49

3DCoffee

Combining protein sequences and structures within multiple sequence alignments

O. O'Sullivan, K Suhre, C. Abergel, D.G. Higgins, C. Notredame

T-Coffee with structure information

Page 50

3DCoffee

Structural information can help to improve the quality of multiple sequence alignments

3DCoffee:

Combines protein sequences and structures

Is based on T-Coffee version 2.00

Uses a mixture of pairwise sequence alignments and pairwise structure comparison methods

Page 51

3DCoffee

Uses T-Coffee to compile:

A primary library: a list of weighted pairs of residues

An extended library: uses the column-consistency relationship between all sequences

Structure information comes from Fugue, SAP and LSQman

Page 52

3DCoffee

Fugue – a threading method that aligns a protein sequence with a 3D-structure

SAP – uses DP to compute a pairwise alignment based on a non-rigid structure superposition

LSQman – a rigid body structure superposition package

Page 53

3DCoffee

Set the weight of each new (structure-based) alignment to 100, which is the maximum score in the primary library

Add the weighted alignments into the library

Carry out progressive alignment in the same way as T-Coffee

Page 54

Remarks

COFFEE : An objective function for multiple sequence alignments

SAGA with COFFEE score

T-Coffee : A novel method for multiple sequence alignments

ClustalW with extended library

3DCoffee : Combining protein sequences and structures within multiple sequence alignments

T-Coffee with structure information

Page 55

Recipes

CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.

J. D. Thompson, D. G. Higgins and T. J. Gibson. 1994.

COFFEE: A New Objective Function For Multiple Sequence Alignment.

C. Notredame, L. Holm and D. G. Higgins. Bioinformatics, Vol 14(5), pp407-422, 1998.

T-Coffee: A novel method for multiple sequence alignments.

C. Notredame, D. Higgins and J. Heringa. Journal of Molecular Biology, Vol 302, pp205-217, 2000.

3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments.

O. O'Sullivan, K. Suhre, C. Abergel, D. G. Higgins and C. Notredame. Journal of Molecular Biology, Vol 340, pp385-395, 2004.

Page 56

Q & A

Page 57

Thank You

Page 58

Residue score

Sequence score measurement

Global measurement

A residue was scored 9 if >90% of the pairs it was involved in were also present in the reference library

Residue score evaluated → substitution defined

Class 5 substitution → residue score ≥ 5

Page 59

Residue scores (one digit per residue) above each aligned sequence:

5566677788888888899999877
vsdvprdlevvaatptslliswdap

-----66666666788888888887
-----gslevvaatptslliswdap

Page 60

• Correct substitution: SAGA > ClustalW

• Lower accuracy: more false positives in SAGA alignments

Page 61

High-scoring residues have high accuracy; the higher the substitution category, the smaller the number of predictions