The Algorithm for Constructing Phylogenetic Tree ---by MYZ

Upload: myra-boone

Post on 03-Jan-2016



TRANSCRIPT

Page 1: The Algorithm for Constructing Phylogenetic Tree

The Algorithm for Constructing Phylogenetic Tree

---by MYZ

Page 2: The Algorithm for Constructing Phylogenetic Tree

What is a phylogenetic tree?

A phylogenetic tree expresses the evolutionary relationships among species.

[Figure: The Evolutionary Tree for Some Primates. Leaves: siamang, gibbon (Hylobatidae), orangutan, human, chimpanzee. The root is labeled "common ancestor".]

Generally speaking, a phylogenetic tree is a binary tree.

Page 3: The Algorithm for Constructing Phylogenetic Tree

The value of the research

1> infer evolutionary history

2> estimate the divergence times of the existing species

3> use molecular information to compensate for the scarcity of fossils

Page 4: The Algorithm for Constructing Phylogenetic Tree

The common methods

1> maximum parsimony

2> maximum likelihood

3> distance matrix

Page 5: The Algorithm for Constructing Phylogenetic Tree

The framework of this presentation

1> introduce maximum parsimony

2> introduce maximum likelihood, and show how a heuristic algorithm can work together with it

3> explain the distance matrix method in detail

4> transform the problem to the travelling salesman problem (TSP), so that heuristic and approximation algorithms can be used to construct the phylogenetic tree

Page 6: The Algorithm for Constructing Phylogenetic Tree

Maximum parsimony

Basic principle: construct the phylogenetic tree that requires the minimum number of substitutions (nucleotide substitutions in the example that follows).

e.g.
a: GAAATTGC
b: GAACTTGT
c: GCACTTGT
d: GCCCTTGT
e: GCCATTGT

[Figure: a parsimony tree for sequences a to e, with sequences written at the nodes and a substitution count of 1 marked on four branches.]
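As a rough illustration of counting substitutions on a fixed tree, the sketch below uses Fitch's small-parsimony algorithm. The slides do not name an algorithm, so this choice is an assumption, and the tree shape `tree` is a hypothetical example rather than the tree drawn in the figure.

```python
def fitch_site(tree, leaf_states):
    """Fitch's rule for one site on a rooted binary tree.

    tree is either a leaf name (str) or a pair (left, right).
    Returns (candidate_states, substitutions_needed).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], leaf_states)
    right_set, right_cost = fitch_site(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra substitution.
        return common, left_cost + right_cost
    # The children disagree: one substitution is needed on this split.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Sum the per-site substitution counts over all aligned sites."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(length)
    )


# The five sequences from the slide.
seqs = {
    "a": "GAAATTGC",
    "b": "GAACTTGT",
    "c": "GCACTTGT",
    "d": "GCCCTTGT",
    "e": "GCCATTGT",
}
# Hypothetical topology chosen only for illustration.
tree = (("a", "b"), ("c", ("d", "e")))
score = parsimony_score(tree, seqs)
```

Maximum parsimony would evaluate this score for every candidate topology and keep the tree with the smallest value.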

Page 7: The Algorithm for Constructing Phylogenetic Tree

Maximum likelihood

Basic principle: compute the probability of a particular set of sequences on a given tree, and maximize this probability over all trees.

Input: a set of sequences and a given tree

Output: the likelihood value of the tree

Target: the tree structure with the maximum likelihood value

[Figure: a rooted tree with nodes 0 to 8. Root 0 has children 6 and 8 (branch lengths t6, t8); node 6 has children 7 and 3 (t7, t3); node 7 has leaves 1 and 2 (t1, t2); node 8 has leaves 4 and 5 (t4, t5).]

Page 8: The Algorithm for Constructing Phylogenetic Tree

How to compute the likelihood value

The probability of a given set of data arising on a given tree can be computed site by site:

$$\log L(T \mid S) = \sum_{i} \log L(T \mid S[i])$$

so the per-site likelihoods L(T | S1), L(T | S2), …, L(T | Sn) are computed one at a time.

[Figure: the tree with nodes 0 to 8 and branch lengths t1 to t8.]

1: ATCGGGTGTGTGCAGTGCTG
2: ATGCCTTGTGTGCAGTGCTG
3: ATGCCTTACTGTGCAGTGCT
4: GTCAAATCGTGATCGATAGCT
5: ATGCTAGTTGCTAGCATAGAT

Page 9: The Algorithm for Constructing Phylogenetic Tree

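The L(T | S[i])

$$L(T \mid S[i]) = \pi_{x_0}\, P_{x_0 x_6}(t_6)\, P_{x_6 x_7}(t_7)\, P_{x_7 x_1}(t_1)\, P_{x_7 x_2}(t_2)\, P_{x_6 x_3}(t_3)\, P_{x_0 x_8}(t_8)\, P_{x_8 x_4}(t_4)\, P_{x_8 x_5}(t_5)$$

where $x_k$ is the base at node k, and the indices i, j of $P_{ij}(t)$ correspond to the four bases A, T, G, C.

$P_{ij}(t)$ is the probability that a lineage which is initially in state i will be in state j after t units of time have elapsed.

$\pi_{x_0}$ is the prior probability of the base at the root, node 0.

[Figure: the tree with nodes 0 to 8 and branch lengths t1 to t8.]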

Page 10: The Algorithm for Constructing Phylogenetic Tree

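The L(T | S[i])

$$L(T \mid S[i]) = \pi_{x_0}\, P_{x_0 x_6}(t_6)\, P_{x_6 x_7}(t_7)\, P_{x_7 x_1}(t_1)\, P_{x_7 x_2}(t_2)\, P_{x_6 x_3}(t_3)\, P_{x_0 x_8}(t_8)\, P_{x_8 x_4}(t_4)\, P_{x_8 x_5}(t_5)$$

But in this formula x0, x6, x7 and x8 are unknown variables, so we sum over all of their possible values:

$$L(T \mid S[i]) = \sum_{x_0}\sum_{x_6}\sum_{x_7}\sum_{x_8} \pi_{x_0}\, P_{x_0 x_6}(t_6)\, P_{x_6 x_7}(t_7)\, P_{x_7 x_1}(t_1)\, P_{x_7 x_2}(t_2)\, P_{x_6 x_3}(t_3)\, P_{x_0 x_8}(t_8)\, P_{x_8 x_4}(t_4)\, P_{x_8 x_5}(t_5)$$

This expression has 4^4 = 256 terms. In general, a tree with n leaves has n-1 internal nodes, so the sum has 4^(n-1) terms.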

Page 11: The Algorithm for Constructing Phylogenetic Tree

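The L(T | S[i])

$$L(T \mid S[i]) = \sum_{x_0} \pi_{x_0} \left\{ \sum_{x_6} P_{x_0 x_6}(t_6) \left[ \sum_{x_7} P_{x_6 x_7}(t_7)\, P_{x_7 x_1}(t_1)\, P_{x_7 x_2}(t_2) \right] P_{x_6 x_3}(t_3) \right\} \left\{ \sum_{x_8} P_{x_0 x_8}(t_8)\, P_{x_8 x_4}(t_4)\, P_{x_8 x_5}(t_5) \right\}$$

$$\log L(T \mid S) = \sum_{i} \log L(T \mid S[i])$$

Notice that the pattern of parentheses describes exactly the topology of the tree.

[Figure: the tree with nodes 0 to 8 and branch lengths t1 to t8.]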
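The sketch below evaluates the site likelihood for the nine-node tree in two ways: as the flat sum over x0, x6, x7, x8 (256 terms) and as the nested sum that follows the tree topology. The slides do not fix a substitution model for P_ij(t), so the Jukes-Cantor form used in `p`, the uniform prior `PRIOR`, and the branch lengths in `T` are all assumptions made for illustration.

```python
import itertools
import math

BASES = "ACGT"
PRIOR = 0.25  # assumed uniform prior for the base at the root
# Hypothetical branch lengths t1..t8 (expected substitutions per site).
T = {1: 0.1, 2: 0.1, 3: 0.2, 4: 0.15, 5: 0.15, 6: 0.05, 7: 0.1, 8: 0.2}


def p(i, j, t):
    """P_ij(t) under the Jukes-Cantor model (assumed, not from the slides)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay


def site_likelihood_flat(x1, x2, x3, x4, x5):
    """Sum all 4**4 = 256 assignments of the internal nodes 0, 6, 7, 8."""
    total = 0.0
    for x0, x6, x7, x8 in itertools.product(BASES, repeat=4):
        total += (PRIOR
                  * p(x0, x6, T[6]) * p(x6, x7, T[7])
                  * p(x7, x1, T[1]) * p(x7, x2, T[2]) * p(x6, x3, T[3])
                  * p(x0, x8, T[8]) * p(x8, x4, T[4]) * p(x8, x5, T[5]))
    return total


def site_likelihood_nested(x1, x2, x3, x4, x5):
    """Same value, with the sums pushed inward along the tree."""
    total = 0.0
    for x0 in BASES:
        left = sum(
            p(x0, x6, T[6])
            * sum(p(x6, x7, T[7]) * p(x7, x1, T[1]) * p(x7, x2, T[2])
                  for x7 in BASES)
            * p(x6, x3, T[3])
            for x6 in BASES
        )
        right = sum(p(x0, x8, T[8]) * p(x8, x4, T[4]) * p(x8, x5, T[5])
                    for x8 in BASES)
        total += PRIOR * left * right
    return total


def log_likelihood(seqs):
    """log L(T|S): sum of per-site log likelihoods over aligned columns."""
    return sum(math.log(site_likelihood_nested(*col)) for col in zip(*seqs))


flat = site_likelihood_flat("A", "A", "C", "G", "T")
nested = site_likelihood_nested("A", "A", "C", "G", "T")
```

The nested form does far less work than the flat sum because each inner sum is computed once per parent state instead of once per full assignment.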

Page 12: The Algorithm for Constructing Phylogenetic Tree

Heuristic algorithm

L(T | S) is the fitness function.

The structure of the tree is the solution.

Our target is to get the tree structure with the maximum likelihood value.

The number of tree structures for n species is

$$T(n) = \prod_{i=3}^{n} (2i - 3)$$

[Flowchart: initialize, then compute L(T|S), then record the max value, then check the requirement. If yes, output. If no, generate neighbour solutions and compute L(T|S) again.]
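To give a feel for why an exhaustive search is impractical, the sketch below evaluates the product of (2i - 3) for i from 3 to n and pairs it with a generic local-search loop. The function names, the `neighbours` and `fitness` callbacks, and the toy integer problem are hypothetical; a real search would use tree rearrangements as neighbours and L(T|S) as the fitness.

```python
def count_topologies(n):
    """Product of (2i - 3) for i = 3..n."""
    total = 1
    for i in range(3, n + 1):
        total *= 2 * i - 3
    return total


def local_search(initial, neighbours, fitness, max_rounds=1000):
    """Keep moving to better neighbours until no neighbour improves."""
    best, best_score = initial, fitness(initial)
    for _ in range(max_rounds):
        improved = False
        for candidate in neighbours(best):
            score = fitness(candidate)
            if score > best_score:
                best, best_score = candidate, score
                improved = True
        if not improved:
            break  # the stopping requirement: a local optimum was reached
    return best, best_score


# Toy stand-in for the tree search: maximise -(x - 7)^2 over the integers.
result = local_search(0, lambda x: [x - 1, x + 1], lambda x: -(x - 7) ** 2)
```

Like any local search, this loop can stop at a local optimum, which is why the choice of neighbourhood and restart strategy matters in practice.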

Page 13: The Algorithm for Constructing Phylogenetic Tree

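Distance matrix: Neighbour joining

Neighbour joining seeks to build a tree which minimizes the sum of all branch lengths.

[Figure: left, a star tree with nodes 1 to 8 attached to a central node X; right, the tree after nodes 1 and 2 are joined, with two internal nodes X and Y.]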

Page 14: The Algorithm for Constructing Phylogenetic Tree

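Step 1: obtain a table of the distances between each pair of sequences

        1      2      3      4      5
1       0  0.015  0.045  0.143  0.198
2              0   0.03  0.126  0.179
3                     0  0.092  0.179
4                            0  0.179
5                                   0

distance matrix of five sequences

Jukes-Cantor one-parameter model:

$$d = -\frac{3}{4} \ln\left(\frac{4q - 1}{3}\right), \qquad q = \frac{\text{number of sites with the same base}}{\text{length of the two sequences}}$$

Kimura two-parameter model:

$$d = -\frac{1}{2} \ln\left[(1 - 2p_1 - p_2)\sqrt{1 - 2p_2}\right]$$

$$p_1 = \frac{\text{number of type I transformations (transitions)}}{\text{length of the sequence}}, \qquad p_2 = \frac{\text{number of type II transformations (transversions)}}{\text{length of the sequence}}$$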
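A minimal sketch of the two distance formulas, assuming the two sequences are already aligned and have equal length. The function names are hypothetical, and the sketch treats a change within purines (A, G) or within pyrimidines (C, T) as type I and any other change as type II.

```python
import math

PURINES = {"A", "G"}


def jukes_cantor(seq_a, seq_b):
    """d = -(3/4) ln((4q - 1) / 3), q = fraction of identical sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    same = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    q = same / len(seq_a)
    return -0.75 * math.log((4.0 * q - 1.0) / 3.0)


def kimura_2p(seq_a, seq_b):
    """d = -(1/2) ln[(1 - 2 p1 - p2) sqrt(1 - 2 p2)]."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    p1 = transitions / len(seq_a)
    p2 = transversions / len(seq_a)
    return -0.5 * math.log((1.0 - 2.0 * p1 - p2) * math.sqrt(1.0 - 2.0 * p2))


# Sequences a and b from the parsimony example: they differ at two sites.
d_jc = jukes_cantor("GAAATTGC", "GAACTTGT")
d_k2p = kimura_2p("GAAATTGC", "GAACTTGT")
```

Both formulas are undefined when the sequences are too divergent (the argument of the logarithm becomes non-positive), so a production version would need to handle that case.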

Page 15: The Algorithm for Constructing Phylogenetic Tree

Step 2: select the minimum distance and merge the nodes

        1      2      3      4      5
1       0  0.015  0.045  0.143  0.198
2              0   0.03  0.126  0.179
3                     0  0.092  0.179
4                            0  0.179
5                                   0

distance matrix of five sequences

The minimum distance is between nodes 1 and 2 (0.015), so select nodes 1 and 2 as a pair of branches, create a new node 6 as the parent of 1 and 2, add 6 to the structure, and compute the distance from 6 to each remaining node.

[Figure: nodes 1 to 5, with the new node 6 joining 1 and 2.]

Page 16: The Algorithm for Constructing Phylogenetic Tree

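Step 3: the distance from the new node to the remaining nodes

If we select the two nodes i and j with the minimum distance and create a new node x as their parent, we compute the distance from x to every other node k with the following formula:

$$D_{xk} = \frac{D_{ik} + D_{jk}}{2}, \qquad k = 1, 2, \dots, n,\ k \neq i, j$$

We should also set the distances from i and j to x, which are the branch lengths:

$$L_{ix} = \frac{D_{ij} + D_{iz} - D_{jz}}{2}$$

$$L_{jx} = \frac{D_{ij} + D_{jz} - D_{iz}}{2}$$

Here z stands for all nodes except i and j, so $D_{iz}$ is the average distance from i to those nodes.

[Figure: nodes 1 to 6.]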
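The update formulas can be tried on the five-sequence table by joining nodes 1 and 2. This is a sketch under the reading that D_iz is the average distance from i to all nodes other than i and j; the helper names `dist` and `join` are hypothetical.

```python
# Upper triangle of the five-sequence distance table from the slides.
UPPER = {
    (1, 2): 0.015, (1, 3): 0.045, (1, 4): 0.143, (1, 5): 0.198,
    (2, 3): 0.030, (2, 4): 0.126, (2, 5): 0.179,
    (3, 4): 0.092, (3, 5): 0.179,
    (4, 5): 0.179,
}


def dist(a, b):
    """Symmetric lookup into the upper-triangular table."""
    if a == b:
        return 0.0
    return UPPER[(a, b)] if (a, b) in UPPER else UPPER[(b, a)]


def join(nodes, i, j):
    """Return (D_xk for each remaining k, L_ix, L_jx) for joining i and j."""
    others = [k for k in nodes if k not in (i, j)]
    # D_xk = (D_ik + D_jk) / 2
    d_xk = {k: (dist(i, k) + dist(j, k)) / 2 for k in others}
    # D_iz, D_jz: average distance to every node except i and j
    d_iz = sum(dist(i, z) for z in others) / len(others)
    d_jz = sum(dist(j, z) for z in others) / len(others)
    l_ix = (dist(i, j) + d_iz - d_jz) / 2
    l_jx = (dist(i, j) + d_jz - d_iz) / 2
    return d_xk, l_ix, l_jx


# Join nodes 1 and 2 into the new node 6.
d_6k, l_16, l_26 = join([1, 2, 3, 4, 5], 1, 2)
```

The two branch lengths always add up to D_ij. With this particular table the second one comes out marginally negative, which neighbour joining can produce when the distances are not exactly tree-like.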

Page 17: The Algorithm for Constructing Phylogenetic Tree

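Use the rate-corrected distance (1)

$$S_{ij} = \frac{1}{2(N-2)} \sum_{k \neq i,j} (D_{ik} + D_{jk}) + \frac{1}{2} D_{ij} + \frac{1}{N-2} \sum_{\substack{m < n \\ m, n \neq i, j}} D_{mn}$$

$$S_{ij} = \frac{1}{2(N-2)} \left( \sum_{k=1}^{N} A_k - A_i - A_j \right) + \frac{1}{2} D_{ij}$$

$$A_i = \sum_{j=1,\, j \neq i}^{N} D_{ij}$$

$$M_{ij} = D_{ij} - \frac{A_i + A_j}{N-2}$$

        1      2      3      4      5      A
1       0  0.015  0.045  0.143  0.198  0.401
2   0.015      0   0.03  0.126  0.179   0.35
3   0.045   0.03      0  0.092  0.179  0.346
4   0.143  0.126  0.092      0  0.179  0.540
5   0.198  0.179  0.179  0.179      0  0.735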

Page 18: The Algorithm for Constructing Phylogenetic Tree

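Use the rate-corrected distance (2)

$$M_{ij} = D_{ij} - \frac{A_i + A_j}{N-2}$$

Table of the distance matrix:

        1      2      3      4      5
1       0  0.015  0.045  0.143  0.198
2              0   0.03  0.126  0.179
3                     0  0.092  0.179
4                            0  0.179
5                                   0

Table of the rate-corrected distance:

        1       2       3       4       5
1       0   -0.21  -0.179  -0.139  -0.143
2               0  -0.179  -0.141  -0.147
3                       0  -0.174  -0.145
4                               0  -0.214
5                                       0

Note: most of these entries correspond to using 0.3 in place of 1/(N-2) = 1/3. Evaluated exactly with the row sums A of the distance matrix, M_12 = 0.015 - (0.401 + 0.350)/3 ≈ -0.235 and M_45 = 0.179 - (0.540 + 0.735)/3 = -0.246. The minimum is M_45 in both cases.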
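The row sums A_i and the rate-corrected values M_ij can be recomputed from the distance table with a few lines. This is only a sketch with hypothetical helper names; it uses the exact factor 1/(N-2), so its values are slightly more negative than the rounded table entries.

```python
NODES = [1, 2, 3, 4, 5]
# Upper triangle of the five-sequence distance table from the slides.
UPPER = {
    (1, 2): 0.015, (1, 3): 0.045, (1, 4): 0.143, (1, 5): 0.198,
    (2, 3): 0.030, (2, 4): 0.126, (2, 5): 0.179,
    (3, 4): 0.092, (3, 5): 0.179,
    (4, 5): 0.179,
}


def dist(a, b):
    """Symmetric lookup into the upper-triangular table."""
    if a == b:
        return 0.0
    return UPPER[(a, b)] if (a, b) in UPPER else UPPER[(b, a)]


def row_sums(nodes):
    """A_i = sum of D_ij over all j != i."""
    return {i: sum(dist(i, j) for j in nodes if j != i) for i in nodes}


def rate_corrected(nodes):
    """M_ij = D_ij - (A_i + A_j) / (N - 2) for every pair i < j."""
    a = row_sums(nodes)
    n = len(nodes)
    return {
        (i, j): dist(i, j) - (a[i] + a[j]) / (n - 2)
        for idx, i in enumerate(nodes)
        for j in nodes[idx + 1:]
    }


A = row_sums(NODES)
M = rate_corrected(NODES)
closest = min(M, key=M.get)  # the pair that neighbour joining merges first
```

The pair with the smallest M_ij is not necessarily the pair with the smallest raw distance, which is the point of the rate correction.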

Page 19: The Algorithm for Constructing Phylogenetic Tree

Summarize the process

[Figure: the joining step, showing nodes i, j and k before and after i and j are joined.]

Page 20: The Algorithm for Constructing Phylogenetic Tree

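The pseudo code of the NJ algorithm

while N > 2 do
    compute A_i for every node i according to (1)
    for i = 1 to N-1 do
        for j = i+1 to N do
            compute M_ij according to (2)
    select the minimum M_ij and cluster i and j into a new node x
    compute D_xk according to (3)
    set the branch lengths from i and j to x according to (4) and (5)
    delete i and j from the table and add x to the table
    N = N - 1
end while

(1)
$$A_i = \sum_{j=1,\, j \neq i}^{N} D_{ij}$$

(2)
$$M_{ij} = D_{ij} - \frac{A_i + A_j}{N-2}$$

(3)
$$D_{xk} = \frac{D_{ik} + D_{jk}}{2}, \qquad k = 1, 2, \dots, n,\ k \neq i, j$$

(4)
$$L_{ix} = \frac{D_{ij} + D_{iz} - D_{jz}}{2}$$

(5)
$$L_{jx} = \frac{D_{ij} + D_{jz} - D_{iz}}{2}$$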
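The pseudo code can be turned into a short runnable sketch. It follows the formulas on the slides (including the simple average for D_xk), recomputes A_i in every round because the table changes, and returns the tree as a list of edges. The function name, the edge-list output and the internal node names `x1`, `x2`, ... are assumptions made for illustration.

```python
def neighbour_joining(labels, upper):
    """Return a list of (node, joined_to, branch_length) edges."""
    dist = {a: {} for a in labels}
    for (a, b), d in upper.items():
        dist[a][b] = dist[b][a] = d
    nodes = list(labels)
    edges = []
    while len(nodes) > 2:
        n = len(nodes)
        # A_i: row sums over the nodes still in the table
        a_sum = {i: sum(dist[i][k] for k in nodes if k != i) for i in nodes}
        pairs = [(i, j) for idx, i in enumerate(nodes) for j in nodes[idx + 1:]]
        # pick the pair with the minimum M_ij = D_ij - (A_i + A_j) / (N - 2)
        i, j = min(
            pairs,
            key=lambda p: dist[p[0]][p[1]] - (a_sum[p[0]] + a_sum[p[1]]) / (n - 2),
        )
        x = "x" + str(len(edges) // 2 + 1)
        # L_ix = (D_ij + D_iz - D_jz) / 2, using D_iz - D_jz = (A_i - A_j) / (N - 2)
        l_ix = dist[i][j] / 2 + (a_sum[i] - a_sum[j]) / (2 * (n - 2))
        edges.append((i, x, l_ix))
        edges.append((j, x, dist[i][j] - l_ix))
        # D_xk = (D_ik + D_jk) / 2 for every remaining node k
        dist[x] = {}
        for k in nodes:
            if k not in (i, j):
                dist[x][k] = dist[k][x] = (dist[i][k] + dist[j][k]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [x]
    # connect the last two nodes
    edges.append((nodes[0], nodes[1], dist[nodes[0]][nodes[1]]))
    return edges


UPPER = {
    ("1", "2"): 0.015, ("1", "3"): 0.045, ("1", "4"): 0.143, ("1", "5"): 0.198,
    ("2", "3"): 0.030, ("2", "4"): 0.126, ("2", "5"): 0.179,
    ("3", "4"): 0.092, ("3", "5"): 0.179,
    ("4", "5"): 0.179,
}
edges = neighbour_joining(["1", "2", "3", "4", "5"], UPPER)
```

Each round scans all pairs, so the whole run takes on the order of N^3 steps, which is still far cheaper than searching over tree topologies.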

Page 21: The Algorithm for Constructing Phylogenetic Tree

Merit and demerit

maximum parsimony
makes full use of the nucleotide information
with few species, MP will find the globally optimal tree
with many species, its performance is restricted

maximum likelihood
makes full use of the nucleotide information
highly dependent on the nucleotide substitution model
its performance is the worst of the three

Neighbour joining
the fastest algorithm of all, but it sometimes gets the wrong topology

Page 22: The Algorithm for Constructing Phylogenetic Tree

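Transform the problem to TSP

[Figure: a tree with leaves A, B, C, D and internal nodes x, y, z. Edge lengths: Ay = 1, By = 2, xy = 1, xC = 2, xz = 2, zD = 2. The tree is drawn twice, the second time with a walk around it.]

The walk around the tree visits the nodes in the order z, x, y, A, y, B, y, x, C, x, z, D and returns to z, crossing every edge twice.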

Page 23: The Algorithm for Constructing Phylogenetic Tree

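Transform the problem to TSP

[Figure: the walk z, x, y, A, y, B, y, x, C, x, z, D around the tree, with each step labeled by its edge length.]

Add up the edges between consecutive unshaded nodes (the leaves):

A to B: 3
B to C: 5
C to D: 6
D to A: 6

[Figure: the circuit A, B, C, D with edge weights 3, 5, 6, 6.]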

Page 24: The Algorithm for Constructing Phylogenetic Tree

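Transform the problem to TSP

[Figure: the circuit A, B, C, D with edge weights 3, 5, 6, 6.]

     A  B  C  D
A    0  3  4  6
B    3  0  5  7
C    4  5  0  6
D    6  7  6  0

[Figure: the complete graph on A, B, C, D with edge weights AB = 3, AC = 4, AD = 6, BC = 5, BD = 7, CD = 6.]

The circuit is one of the Hamiltonian circuits of the complete graph.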
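The link between the tree and the circuit can be checked numerically: leaf-to-leaf distances are path lengths in the tree, and the circuit A, B, C, D costs exactly twice the total edge length. The sketch below assumes the edge lengths Ay = 1, By = 2, xy = 1, xC = 2, xz = 2, zD = 2, which reproduce the distance table; the helper names are hypothetical.

```python
# Edge lengths of the example tree (leaves A-D, internal nodes x, y, z).
EDGES = {
    ("A", "y"): 1, ("B", "y"): 2, ("y", "x"): 1,
    ("C", "x"): 2, ("x", "z"): 2, ("D", "z"): 2,
}


def adjacency(edges):
    """Build an undirected adjacency map from the edge list."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def path_length(adj, start, goal, parent=None):
    """Length of the unique path start -> goal in a tree, or None."""
    if start == goal:
        return 0
    for nxt, w in adj[start].items():
        if nxt == parent:
            continue
        rest = path_length(adj, nxt, goal, start)
        if rest is not None:
            return w + rest
    return None


def circuit_cost(adj, order):
    """Sum of leaf-to-leaf path lengths around a closed circuit."""
    return sum(
        path_length(adj, a, b) for a, b in zip(order, order[1:] + order[:1])
    )


adj = adjacency(EDGES)
cost = circuit_cost(adj, ["A", "B", "C", "D"])
```

Because the circuit follows the walk around the tree, every edge is crossed twice, so `cost` equals twice `sum(EDGES.values())`.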

Page 25: The Algorithm for Constructing Phylogenetic Tree

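Transform the problem to TSP

Now assume that we have a Hamiltonian circuit. Can we construct the phylogenetic tree from it?

[Figure: the circuit A, B, C, D with weights 3, 5, 6, 6 is converted into a tree step by step: A and B are joined at y, then C is attached through x, and finally D is attached through z.]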

Page 26: The Algorithm for Constructing Phylogenetic Tree

Transform the problem to TSP

So the question is transformed into finding the minimum Hamiltonian circuit of a given complete graph.

step 1: distance matrix
step 2: TSP
step 3: construct the tree

[Figure: the complete graph on A, B, C, D, the circuit A, B, C, D with weights 3, 5, 6, 6, and the resulting tree with internal nodes x, y, z.]

Algorithms for the TSP step:

1> ant colony optimization, ACO
2> particle swarm optimization, PSO
3> genetic algorithm, GA
4> simulated annealing, SA
5> artificial bee colony algorithm, ABC
6> approximation algorithms: NN (nearest neighbour), shortest link, insertion heuristic
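For the four-leaf example the TSP step is small enough to solve exactly, which makes it a convenient check for a heuristic. The sketch below compares a brute-force search with a nearest-neighbour tour on the distance table for A, B, C, D; the function names are hypothetical.

```python
import itertools

# Distance table for the four leaves A, B, C, D.
DIST = {
    "A": {"A": 0, "B": 3, "C": 4, "D": 6},
    "B": {"A": 3, "B": 0, "C": 5, "D": 7},
    "C": {"A": 4, "B": 5, "C": 0, "D": 6},
    "D": {"A": 6, "B": 7, "C": 6, "D": 0},
}


def tour_cost(tour):
    """Cost of the closed circuit that visits the cities in this order."""
    return sum(DIST[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


def brute_force(cities):
    """Try every circuit that starts at the first city."""
    first, rest = cities[0], cities[1:]
    tours = ([first] + list(p) for p in itertools.permutations(rest))
    return min(tours, key=tour_cost)


def nearest_neighbour(start):
    """Greedy heuristic: always move to the closest unvisited city."""
    tour = [start]
    unvisited = set(DIST) - {start}
    while unvisited:
        nxt = min(sorted(unvisited), key=lambda c: DIST[tour[-1]][c])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


exact = brute_force(["A", "B", "C", "D"])
greedy = nearest_neighbour("A")
```

On this tiny instance the greedy tour happens to be optimal; on larger matrices nearest neighbour gives no such guarantee, which is where the metaheuristics in the list come in.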

Page 27: The Algorithm for Constructing Phylogenetic Tree

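Transform the problem to TSP

     A  B  C  D
A    0  3  4  6
B    3  0  5  7
C    4  5  0  6
D    6  7  6  0

Each distance is the sum of the edge lengths along the path between the two leaves:

$$\begin{aligned}
Cx + xz + zD &= 6 \\
By + yx + xz + zD &= 7 \\
By + yx + xC &= 5 \\
Ay + yx + xz + zD &= 6 \\
Ay + yx + xC &= 4 \\
Ay + yB &= 3
\end{aligned}$$

These equations are satisfied by

zD = 2, xz = 2, xC = 2, xy = 1, By = 2, Ay = 1

Note that xz and zD always appear together, so the equations determine only their sum, xz + zD = 4.

[Figure: the tree with leaves A, B, C, D and internal nodes x, y, z, labeled with these edge lengths.]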
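The edge lengths can also be read off the distance table directly, without solving the full system, by using the three-point relation: the branch from leaf a to the point where its paths to b and c split has length (d_ab + d_ac - d_bc) / 2. The sketch below applies it to the example; the helper name `pendant` is hypothetical.

```python
# Distance table for the four leaves A, B, C, D.
D = {
    ("A", "B"): 3, ("A", "C"): 4, ("A", "D"): 6,
    ("B", "C"): 5, ("B", "D"): 7, ("C", "D"): 6,
}


def d(a, b):
    """Symmetric lookup into the upper-triangular table."""
    return D[(a, b)] if (a, b) in D else D[(b, a)]


def pendant(a, b, c):
    """Length from leaf a to the node where the paths a-b and a-c split."""
    return (d(a, b) + d(a, c) - d(b, c)) / 2


ay = pendant("A", "B", "C")      # A to y
by = pendant("B", "A", "C")      # B to y
xc = pendant("C", "A", "D")      # C to x
d_to_x = pendant("D", "A", "C")  # D to x, i.e. xz + zD combined
xy = d("A", "C") - ay - xc       # what is left of the A-C path
```

The distances fix `d_to_x` as a whole; how it is divided between xz and zD depends on where the root z is placed on that branch.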

Page 28: The Algorithm for Constructing Phylogenetic Tree

Thank you !

maoyaozong