Comput. Genomics, Lecture 5b. Character Based Methods for Reconstructing Phylogenetic Trees: Maximum Parsimony. Based on presentations by Dan Geiger, Shlomo Moran, and Ido Wexler. Modified by Benny Chor. References: Durbin et al. 7.4, Gusfield 17.1-17.3, Setubal & Meidanis 6.1

Upload: natara

Post on 30-Jan-2016


Comput. Genomics, Lecture 5b

Character Based Methods for Reconstructing Phylogenetic Trees:

Maximum Parsimony

Based on presentations by Dan Geiger, Shlomo Moran, and Ido Wexler. Modified by Benny Chor.

References: Durbin et al 7.4, Gusfield 17.1-17.3, Setubal&Meidanis 6.1

Note: I have also added to the lecture the efficient implementation of the laminarity test. If the construction of ultrametric trees is moved to an earlier lesson, Fitch's algorithm can be included as well.

Phylogenetic Trees - Reminder

• Leaves represent objects (genes, species) being compared

• Internal nodes are hypothetical ancestral objects

• In a rooted tree, path from root to a node corresponds to a path in evolutionary time

• An unrooted tree specifies relationships among objects, but not evolutionary time


Parsimony Based Approach

Input: Character data (aligned sequences)

Goal/Output: A labeled tree (labeled internal nodes) that “explains” the data with a minimal number of changes across edges


Parsimony: An Example

Various trees that could explain the phylogeny of the following four sequences: AAG, AAA, GGA, AGA. For example,

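[Figure: two candidate trees over the leaves AGA, AAA, AAG, GGA. In the first tree, the root and both internal nodes are labeled AAA. In the second tree, the root is labeled AAA and the two internal nodes are labeled AAA and AGA.]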

Parsimony prefers the second tree to the first, because it requires fewer substitution events (three vs. four changes).
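The two change counts can be checked with a minimal sketch that sums Hamming distances over tree edges. The edge lists are an assumption: the leaf pairing in the second tree is the one consistent with three changes, and `hamming` and `tree_cost` are hypothetical helper names, not part of the lecture.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def tree_cost(edges):
    """Total number of changes over a list of (parent_label, child_label) edges."""
    return sum(hamming(parent, child) for parent, child in edges)

# First tree: every internal node is labeled AAA
# (with identical internal labels, the leaf pairing does not affect the count).
tree1 = [("AAA", "AAA"), ("AAA", "AAA"),   # root -> two internal nodes
         ("AAA", "AGA"), ("AAA", "AAA"),   # internal -> leaves
         ("AAA", "AAG"), ("AAA", "GGA")]

# Second tree (assumed pairing): internal AAA over {AAA, AAG},
# internal AGA over {AGA, GGA}.
tree2 = [("AAA", "AAA"), ("AAA", "AGA"),   # root -> two internal nodes
         ("AAA", "AAA"), ("AAA", "AAG"),
         ("AGA", "AGA"), ("AGA", "GGA")]

print(tree_cost(tree1), tree_cost(tree2))  # 4 3
```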


Big and Small Parsimony

Usually the approaches to finding a maximum parsimony tree have two separate components:

A search through the space of trees (BIG parsimony)

Given a specific tree topology, find an assignment of “ancestral labels” to internal nodes so as to minimize the total number of changes across tree edges (small parsimony)


Formally: Big Parsimony

Input: Character data (aligned sequences)

Goal/Output: A labeled tree (labeled internal nodes) that minimizes number of changes across edges (over all trees and internal labelings).


Formally: Small Parsimony

Input: Character data (aligned sequences) and a tree with sequences at leaves.

Goal/Output: A labeling of internal nodes that minimizes number of changes across edges (over all internal labelings).


Big, Small, and Weighted Parsimony

Small parsimony has a linear time solution (Fitch’s algorithm).

BIG parsimony is NP hard (easy reduction from vertex cover, VC).

Weighted small parsimony also has a linear time solution (Sankoff’s algorithm, dynamic programming).


Small Parsimony: Fitch’s Algorithm

Traverse tree “up”, from leaves to root, finding sets of possible ancestral states (labels) for each internal node.

Traverse tree “down”, from root to leaves, determining ancestral states (labels) for internal nodes.

Key observation: Different sites are independent. Can solve one site at a time.


Fitch’s Algorithm – Step 1

Do a post-order (from leaves to root) traversal of tree

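Determine the set of possible states R_i of internal node i with children j and k:

\[
R_i =
\begin{cases}
R_j \cap R_k & \text{if } R_j \cap R_k \neq \emptyset \\
R_j \cup R_k & \text{otherwise}
\end{cases}
\]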


Fitch’s Algorithm – Step 1

# of changes = # union operations

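[Figure: example tree for a single site, showing the candidate state set computed at each internal node; sets such as {T}, {C,T}, {G,T}, and {A,G,T} appear in the figure.]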
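The bottom-up pass can be sketched for a single site as follows. The nested-tuple tree encoding, the function name `fitch_up`, and the example tree are assumptions made for illustration; the example is not the tree from the slide.

```python
def fitch_up(tree):
    """Post-order pass for one site.

    `tree` is either a leaf (a one-character string) or a pair (left, right).
    Returns (candidate_states, number_of_union_operations).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_states, left_cost = fitch_up(tree[0])
    right_states, right_cost = fitch_up(tree[1])
    common = left_states & right_states
    if common:
        # Non-empty intersection: no extra change is needed at this node.
        return common, left_cost + right_cost
    # Empty intersection: take the union and count one change.
    return left_states | right_states, left_cost + right_cost + 1

# Hypothetical example tree with leaves C, T, G, T, A, T.
example = ((("C", "T"), ("G", "T")), ("A", "T"))
states, changes = fitch_up(example)
print(sorted(states), changes)  # ['T'] 3
```

Each of the three union operations happens at a node whose two children share no state, which is why the count of unions equals the number of changes.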


Fitch’s Algorithm – Step 2

Do a pre-order (from root to leaves) traversal of tree

Select the state r_j of internal node j with parent i:

\[
r_j =
\begin{cases}
r_i & \text{if } r_i \in R_j \\
\text{an arbitrary state in } R_j & \text{otherwise}
\end{cases}
\]


Fitch’s Algorithm – Step 2

[Figure: the same example tree, repeated six times, with a state selected for each internal node from its candidate set during the top-down pass.]
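Both passes can be combined in a short sketch for one site. The `Node` class, the helper names, and the example tree are hypothetical; "arbitrary state" is resolved here by taking the alphabetically smallest candidate, which is just one possible choice.

```python
class Node:
    def __init__(self, state=None, children=()):
        self.children = list(children)
        self.states = {state} if state is not None else set()  # candidate set R
        self.label = state                                      # chosen state r

def build(tree):
    """Convert a nested tuple of one-character leaves into Node objects."""
    if isinstance(tree, str):
        return Node(state=tree)
    return Node(children=[build(child) for child in tree])

def up_pass(node):
    """Step 1: fill in candidate sets; return the number of union operations."""
    if not node.children:
        return 0
    left, right = node.children
    cost = up_pass(left) + up_pass(right)
    common = left.states & right.states
    if common:
        node.states = common
    else:
        node.states = left.states | right.states
        cost += 1
    return cost

def down_pass(node, parent_label=None):
    """Step 2: keep the parent's state when allowed, else pick any candidate."""
    if node.children:
        if parent_label in node.states:
            node.label = parent_label
        else:
            node.label = min(node.states)  # arbitrary but deterministic choice
        for child in node.children:
            down_pass(child, node.label)

def count_changes(node):
    """Number of edges below `node` whose endpoints carry different labels."""
    return sum((child.label != node.label) + count_changes(child)
               for child in node.children)

root = build(((("C", "T"), ("G", "T")), ("A", "T")))
unions = up_pass(root)
down_pass(root)
print(unions, root.label, count_changes(root))  # 3 T 3
```

The number of changes in the final labeling matches the number of union operations from the first pass.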


Weighted Version

Instead of assuming that all state changes have unit cost (are equally likely), use different costs S(a,b) for different changes.

The first step of the algorithm is to propagate costs up through the tree.



Weighted Version of Fitch’s Algorithm

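We want to determine the minimal cost R_i(a) of the subtree rooted at node i when character a is assigned to i.

For leaves:

\[
R_i(a) =
\begin{cases}
0 & \text{if } a \text{ is the character at leaf } i \\
\infty & \text{otherwise}
\end{cases}
\]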


Weighted Version of Fitch’s Algorithm

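We want to determine the minimal cost R_i(a) of assigning character a to node i.

For internal nodes (node i with children j and k):

\[
R_i(a) = \min_b \bigl( R_j(b) + S(a,b) \bigr) + \min_b \bigl( R_k(b) + S(a,b) \bigr)
\]

[Figure: node i in state a, with children j and k each in some state b.]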


Weighted Version of Fitch’s Algorithm – Step 2

Do a pre-order (from root to leaves) traversal of the tree.

Select the minimal cost character for the root.

For each internal node j, select the character that produced the minimal cost at its parent i.
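The weighted recurrence and the traceback can be sketched for a single site as below. The cost function `S` (transitions cost 1, transversions cost 2) is a hypothetical choice for illustration, as are the dictionary-based node representation and the helper names.

```python
INF = float("inf")
ALPHABET = "ACGT"
PURINES = {"A", "G"}

def S(a, b):
    """Hypothetical cost matrix: 0 for no change, 1 for a transition, 2 for a transversion."""
    if a == b:
        return 0
    return 1 if (a in PURINES) == (b in PURINES) else 2

def up(tree):
    """Post-order pass: compute R_i(a) for every node i and character a."""
    if isinstance(tree, str):  # leaf: cost 0 for its own character, infinity otherwise
        return {"R": {a: 0 if a == tree else INF for a in ALPHABET}, "kids": []}
    kids = [up(child) for child in tree]
    R = {a: sum(min(kid["R"][b] + S(a, b) for b in ALPHABET) for kid in kids)
         for a in ALPHABET}
    return {"R": R, "kids": kids}

def down(node, parent_state=None):
    """Pre-order pass: cheapest character at the root, then the minimizing b at each child."""
    if parent_state is None:
        node["state"] = min(ALPHABET, key=lambda a: node["R"][a])
    else:
        node["state"] = min(ALPHABET, key=lambda b: node["R"][b] + S(parent_state, b))
    for kid in node["kids"]:
        down(kid, node["state"])

def labeling_cost(node):
    """Total cost of the chosen labeling, summed over all edges below `node`."""
    return sum(S(node["state"], kid["state"]) + labeling_cost(kid)
               for kid in node["kids"])

root = up((("A", "G"), ("C", "T")))
down(root)
print(min(root["R"].values()), labeling_cost(root))  # 4 4
```

With these costs every root character is tied at cost 4 for this tree; the sketch breaks ties by alphabet order, which is an arbitrary choice.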


Big Parsimony: Exploring the Space of Trees

We’ve considered small parsimony: how to find the minimum number of changes for a given tree topology.

To solve big parsimony, we need some search procedure for exploring the space of tree topologies.

There are (2n-3)!! rooted binary trees on n leaves (and (2n-5)!! unrooted ones), where

\[
(2n-3)!! = 3 \cdot 5 \cdots (2n-3)
\]
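For very small inputs, the simplest search procedure is exhaustive enumeration. The sketch below is an illustration under assumptions, not the lecture's method: it generates rooted binary trees by stepwise leaf insertion, scores each with the unit-cost bottom-up pass, and keeps the best. All function names are hypothetical. Because the unit-cost score does not depend on where the root is placed, each unrooted topology is visited several times.

```python
def insert_leaf(tree, leaf):
    """Yield every tree obtained by attaching `leaf` above the root or on an edge of `tree`."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_leaf(left, leaf):
            yield (new_left, right)
        for new_right in insert_leaf(right, leaf):
            yield (left, new_right)

def all_rooted_trees(leaves):
    """All rooted binary trees (nested tuples) over the given leaf sequences."""
    trees = [leaves[0]]
    for leaf in leaves[1:]:
        trees = [t for tree in trees for t in insert_leaf(tree, leaf)]
    return trees

def fitch_site(tree, site):
    """Unit-cost bottom-up pass for one site: (candidate_states, changes)."""
    if isinstance(tree, str):
        return {tree[site]}, 0
    (ls, lc), (rs, rc) = fitch_site(tree[0], site), fitch_site(tree[1], site)
    common = ls & rs
    return (common, lc + rc) if common else (ls | rs, lc + rc + 1)

def parsimony_score(tree, length):
    """Sites are independent, so the total score is the sum over sites."""
    return sum(fitch_site(tree, site)[1] for site in range(length))

sequences = ["AAG", "AAA", "GGA", "AGA"]
trees = all_rooted_trees(sequences)
best = min(trees, key=lambda t: parsimony_score(t, 3))
print(len(trees), parsimony_score(best, 3))  # 15 3
```

On the four sequences from the earlier example, the best score found is three changes, matching the preferred tree there.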


Exploring the Space of Trees

taxa (n)    # trees
4           15
5           105
6           945
8           135,135
10          34,459,425
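The growth of the product 3 · 5 · … · (2n-3) can be reproduced with a few lines. The function name `double_factorial_count` is hypothetical.

```python
def double_factorial_count(n):
    """Return (2n-3)!! = 3 * 5 * ... * (2n-3) for n >= 2."""
    count = 1
    for k in range(3, 2 * n - 2, 2):  # odd numbers 3, 5, ..., 2n-3
        count *= k
    return count

for n in (4, 5, 6, 8, 10):
    print(n, f"{double_factorial_count(n):,}")
# 4 15
# 5 105
# 6 945
# 8 135,135
# 10 34,459,425
```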


Does This Imply Big MP is Hard?


Not necessarily: There could be some smarter way to zoom directly to best topology.

But: We will show hardness of Big MP by a (simple) reduction from vertex cover (VC).


Big MP is NP Hard!

First, define VC and VC for triangle free graphs. Then…

1. You will show a poly time reduction from VC to VC for triangle free graphs as part of home assignment (easy).

2. In class, I will show a poly time reduction from VC for triangle free graphs to Big MP (old style, white board proof).

• This establishes NP hardness of Big MP.
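As a reminder of the two notions involved: a vertex cover is a set of vertices that touches every edge, and a graph is triangle free if no three vertices are pairwise adjacent. The sketch below only illustrates these definitions on tiny graphs by brute force; it is not either of the reductions, and the function names are hypothetical.

```python
from itertools import combinations

def is_vertex_cover(edges, cover):
    """True if every edge has at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)

def min_vertex_cover(vertices, edges):
    """Smallest vertex cover by exhaustive search (exponential time)."""
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            if is_vertex_cover(edges, set(subset)):
                return set(subset)

def is_triangle_free(vertices, edges):
    """True if no three vertices are pairwise adjacent."""
    adjacent = {frozenset(edge) for edge in edges}
    return not any(
        frozenset((a, b)) in adjacent
        and frozenset((b, c)) in adjacent
        and frozenset((a, c)) in adjacent
        for a, b, c in combinations(vertices, 3)
    )

triangle = [(1, 2), (2, 3), (1, 3)]
path = [(1, 2), (2, 3)]
print(len(min_vertex_cover([1, 2, 3], triangle)), is_triangle_free([1, 2, 3], triangle))  # 2 False
print(min_vertex_cover([1, 2, 3], path), is_triangle_free([1, 2, 3], path))              # {2} True
```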