How to Raise the Dead: The Nuts & Bolts of Ancestral Sequence Reconstruction
Jeffrey Boucher, Theobald Laboratory
GFP Chromophore - Structure & Synthesis
Chromophore residues: Ser65, Tyr66, Gly67 (Wachter 2006)
Excitation - UV/Blue, Emission - Green
• Autocatalysis begins upon folding
• Two reaction orders shown after 1. Cyclization:
– 2. Dehydration, 3. Oxidation
– 2. Oxidation, 3. Dehydration
GFP Superfamily: GFP-like Proteins from Coral
[Figure: phylogeny of GFP-like proteins; tip labels include dendRFP, clawGFP, scubGFP1 and Kaede. Ugalde, Chang & Matz 2004]
Colors of the Rainbow
Wachter 2006
2 Chemical Reactions (Oxidation, Dehydration)
3 Chemical Reactions (Oxidation, Dehydration & 2nd Oxidation, which extends the π-system)
Was complexity gained or lost?
Convergent vs. Divergent Evolution
Terrestrial Vertebrates
Tree of Life (tolweb.org) - Slide courtesy of Kristine Mackin
[Figure: phylogeny of GFP-like proteins with ancestral nodes labeled ALL, RED/GREEN, Pre-RED and RED; scale bar 0.1 substitutions/site. Ugalde, Chang & Matz 2004]
[Figure: phylogeny of GFP-like proteins with wavelength labels 495 nm, 505 nm, 518 nm and 578 nm alongside the ancestral nodes ALL, RED/GREEN, Pre-RED and RED; scale bar 0.1 substitutions/site. Ugalde, Chang & Matz 2004]
How to Resurrect a Protein
1) Acquire/Align Sequences
2) Construct Phylogeny (from Chang et al. 2002)
3) Infer Ancestral Nodes
Acquiring Sequences
• BLAST (Basic Local Alignment Search Tool)
• How does the alignment compare to an alignment of random sequences?
– An E-value of 1E-3 means about a 1:1000 chance of seeing an alignment this good between random sequences
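A minimal sketch of how an E-value relates to a chance probability. It assumes the usual Poisson approximation, P = 1 - exp(-E), for seeing at least one chance hit this good; the function name is illustrative and not part of BLAST.

```python
import math

def evalue_to_pvalue(e_value):
    """Probability of at least one chance hit scoring this well,
    assuming chance hits are Poisson-distributed with mean e_value."""
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for tiny x
    return -math.expm1(-e_value)

print(evalue_to_pvalue(1e-3))   # very close to the E-value itself
print(evalue_to_pvalue(10.0))   # close to 1: chance hits are expected
```

For small E-values the two numbers are nearly identical, which is why 1E-3 can be read as roughly 1 in 1000.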
Homology vs. Identity
• Significant BLAST hits inform us about evolutionary relationships
• Homologous - share a common ancestor
– Homology is a hypothesis, identity is calculated
– Homology is binary, not a percentage
– Homology does not ensure common function
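To make "identity is calculated" concrete, here is a small sketch that computes percent identity from two aligned sequences. The sequences are made up, and counting only ungapped columns is one convention among several (an assumption here, not something stated on the slide).

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of ungapped alignment columns where both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared

# Hypothetical aligned fragments, for illustration only
print(percent_identity("MKV-LS", "MRVALS"))
```

The result is a number on a scale; whether the two sequences are homologous is a separate yes/no hypothesis that the number can only support.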
Aligning Sequences
• Gap Penalty of -8 (heuristically determined)
• Orangutan vs. Chimpanzee, column scores summed:
4 + (-8) + 5 + 4 + 0 + 6 + 2 + 4 + 6 + 5 + 4 + 0 + 3 + 4 + (-8) + 7 + 1 + 1 = 40
• Multiple sequences aligned by a similar method
– ClustalW
– MUSCLE (MUltiple Sequence Comparison by Log-Expectation): faster than ClustalW, but methods similar
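The scoring idea can be sketched in a few lines: each aligned column contributes a substitution score, and each gap column contributes the gap penalty of -8. The match/mismatch scores in `toy_pair_score` are a stand-in (an assumption); real alignments use a substitution matrix such as BLOSUM62.

```python
GAP_PENALTY = -8  # value given on the slide

def toy_pair_score(x, y):
    """Hypothetical stand-in for a substitution matrix lookup."""
    return 4 if x == y else 0

def alignment_score(aligned_a, aligned_b, pair_score=toy_pair_score, gap=GAP_PENALTY):
    """Sum per-column scores; any column containing '-' gets the gap penalty."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            total += gap
        else:
            total += pair_score(x, y)
    return total

# The per-column scores listed on the slide add up to the stated total
slide_columns = [4, -8, 5, 4, 0, 6, 2, 4, 6, 5, 4, 0, 3, 4, -8, 7, 1, 1]
print(sum(slide_columns))

# A made-up alignment scored with the toy scheme
print(alignment_score("AC-GT", "ACTGA"))
```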
How to Resurrect a Protein
1) Acquire/Align Sequences
2) Construct Phylogeny (from Chang et al. 2002)
3) Infer Ancestral Nodes
Visual Depiction of Alignment Scores
• Suppose alignment of 3 sequences: Orangutan (O), Chimpanzee (C), Mouse (M)
• Pairwise alignment scores:
      M     C     O
O    19    40     -
C    18     -    40
M     -    18    19
[Figure: Neighbor-Joining tree with tips M, O, C]
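A rough sketch of the intuition behind turning pairwise scores into a tree: the most similar pair is grouped first. It takes the three pairwise scores on the slide to be Orangutan-Chimpanzee 40, Orangutan-Mouse 19 and Chimpanzee-Mouse 18. This is only the simplest grouping step, not full neighbor-joining, which works on distances and corrects for unequal rates between lineages.

```python
# Pairwise alignment scores as read from the slide's matrix
scores = {
    frozenset(("Orangutan", "Chimpanzee")): 40,
    frozenset(("Orangutan", "Mouse")): 19,
    frozenset(("Chimpanzee", "Mouse")): 18,
}

# The highest-scoring pair is the most similar, so it is grouped first
closest_pair = max(scores, key=scores.get)
outgroup = {"Orangutan", "Chimpanzee", "Mouse"} - closest_pair

print(sorted(closest_pair), "grouped;", sorted(outgroup), "joins last")
```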
Phylogenetic Programs
• PHYLIP (PHYLogeny Inference Package)
• PAUP (Phylogenetic Analysis Using Parsimony)
– Now incorporates Maximum Likelihood
• PhyML (Phylogenetic Maximum Likelihood)
• MrBayes
• BAli-Phy (Bayesian Alignment and Phylogeny estimation)
Maximum Likelihood (ML)
Likelihood = Probability(Data|Model)
• Likelihood:
– How surprised we should be by the data
– Maximizing the likelihood minimizes your surprise
• Example:
– Roll a 20-sided die 9 times and get a 20 every time
Maximum Likelihood (ML)
Likelihood = Probability(Data|Model)
• Fair Die Model:
– 5% chance of rolling a 20
– Likelihood = (0.05)^9 ≈ 2E-12
• Trick Die Model:
– 100% chance of rolling a 20
– Likelihood = (1)^9 = 1
Assuming the trick die model maximizes the likelihood
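The dice comparison is easy to check numerically. The sketch below assumes the data are nine rolls that all came up 20, which is what the two likelihood calculations imply; the function and variable names are illustrative.

```python
def likelihood_all_twenties(p_twenty, n_rolls=9):
    """Probability of rolling a 20 on every one of n_rolls independent rolls."""
    return p_twenty ** n_rolls

models = {
    "fair die": 0.05,   # 1 in 20 chance of a 20
    "trick die": 1.0,   # always rolls a 20
}

likelihoods = {name: likelihood_all_twenties(p) for name, p in models.items()}
best_model = max(likelihoods, key=likelihoods.get)

for name, value in likelihoods.items():
    print(f"{name}: {value:.3e}")
print("maximum-likelihood model:", best_model)
```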
From Dice to Trees
Likelihood = Probability(Data|Model)
– Data - Sequences/Alignment
– Model - Tree topology, branch lengths & model of evolution
• Choose the model that maximizes the likelihood
[Figure: Starting Tree and New Tree, each shown with its likelihood]
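The same logic can be sketched for the smallest possible "tree": two sequences joined by one branch. The example assumes the Jukes-Cantor (JC69) nucleotide model and made-up DNA sequences, and it searches a grid of candidate branch lengths for the one with the highest likelihood. Real programs also search over topologies and use richer models, so treat this as a toy.

```python
import math

def jc69_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of two aligned DNA sequences separated by branch length d (JC69)."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability a site ends in the same base
    p_diff = 0.25 - 0.25 * e   # probability it ends in one specific other base
    total = 0.0
    for x, y in zip(seq_a, seq_b):
        # 0.25 is the equilibrium frequency of the starting base
        total += math.log(0.25 * (p_same if x == y else p_diff))
    return total

# Hypothetical sequences differing at 2 of 20 sites
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTACTTACGT"

# Try branch lengths from 0.001 to 2.000 and keep the most likely one
candidates = [i / 1000 for i in range(1, 2001)]
best_d = max(candidates, key=lambda d: jc69_log_likelihood(seq_a, seq_b, d))

# Closed-form JC69 estimate for comparison
p = sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)
analytic_d = -0.75 * math.log(1.0 - 4.0 * p / 3.0)

print(best_d, analytic_d)
```

The grid search and the closed-form estimate should agree to within the grid spacing, which is the "choose the model that maximizes the likelihood" step in miniature.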
How to Resurrect a Protein
1) Acquire/Align Sequences
2) Construct Phylogeny (from Chang et al. 2002)
3) Infer Ancestral Nodes
Parsimony
• Parsimony Principle
– Best-supported evolutionary inference requires the fewest changes
– Assumes conservation as the model
• Advantage over consensus:
– Takes phylogenetic relationships into account
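Parsimony
[Figure: parsimony reconstruction on an example tree with tip states V, I and L; internal nodes carry candidate state sets such as {V}, {L}, {V, I} and {V, I, L}; one resolved reconstruction is shown. Changes = 4. Example adapted from David Hillis]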
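The state sets in the figure are what a Fitch-style parsimony pass produces. Below is a minimal sketch of that bottom-up pass on a small hypothetical tree (not the tree from the slide): where the two child sets overlap, the parent takes the intersection; where they do not, it takes the union and one change is counted.

```python
def fitch(node):
    """Return (state_set, changes) for a tree of nested 2-tuples with string tips."""
    if isinstance(node, str):
        return {node}, 0
    left, right = node
    left_set, left_changes = fitch(left)
    right_set, right_changes = fitch(right)
    changes = left_changes + right_changes
    shared = left_set & right_set
    if shared:
        return shared, changes                  # children agree: no extra change
    return left_set | right_set, changes + 1    # children disagree: one change

# Hypothetical four-tip tree: ((V, V), (I, L))
tree = (("V", "V"), ("I", "L"))
root_states, n_changes = fitch(tree)
print(sorted(root_states), n_changes)
```

A root set with more than one member, as here, is exactly the kind of ambiguous reconstruction that has to be resolved afterwards.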
Parsimony - Alternate Reconstructions
• Resolve ambiguous reconstructions
• Is conservation the best model?
ML Improvements Over Parsimony
• PAML (Phylogenetic Analysis by Maximum Likelihood)
• Includes evolutionary process & branch lengths
– Reduction in ambiguous sites
• Fit of model included in calculation
– Removes a priori choices
– Use more complex models (when applicable)
• Confidence in reconstruction
– Posterior probabilities
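To show where a posterior probability comes from, here is a toy sketch for a single ancestor with two descendant tips. It assumes the JC69 nucleotide model with equal base frequencies and made-up branch lengths; PAML works with amino acid models and whole trees, so this only illustrates the Bayes-rule step, prior times likelihood, normalized over all candidate states.

```python
import math

STATES = "ACGT"

def jc69_prob(start, end, t):
    """JC69 probability of observing `end` after branch length t, starting from `start`."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if start == end else 0.25 - 0.25 * e

def ancestor_posteriors(tip_x, tip_y, t_x, t_y):
    """Posterior probability of each ancestral state given the two tip states."""
    weights = {
        s: 0.25 * jc69_prob(s, tip_x, t_x) * jc69_prob(s, tip_y, t_y)
        for s in STATES
    }
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

# Both tips agree: the ancestor is confidently A
print(ancestor_posteriors("A", "A", 0.1, 0.1))
# Tips disagree on equal branches: A and C are equally probable
print(ancestor_posteriors("A", "C", 0.1, 0.1))
```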
Ugalde, Chang & Matz 2004
• Sites with PP<0.8 considered ambiguous
• Alternative residues considered if PP>0.2
• Alternative ancestors did not affect the conclusions
Position  Residue  PP     Residue  PP     Residue  PP
168       P        0.999
169       K        0.407  R        0.236  S        0.210
170       V        0.730  I        0.270
171       I        0.742  V        0.158  R        0.034
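The two thresholds can be applied mechanically to the posterior probabilities in the table. This is a sketch of that bookkeeping only; the function name and data layout are illustrative, not taken from the paper.

```python
# Posterior probabilities from the table, keyed by position
reconstruction = {
    168: {"P": 0.999},
    169: {"K": 0.407, "R": 0.236, "S": 0.210},
    170: {"V": 0.730, "I": 0.270},
    171: {"I": 0.742, "V": 0.158, "R": 0.034},
}

def summarize_site(residue_pp, ambiguous_below=0.8, alternate_above=0.2):
    """Return (best residue, is the site ambiguous, alternates worth testing)."""
    best = max(residue_pp, key=residue_pp.get)
    ambiguous = residue_pp[best] < ambiguous_below
    alternates = [
        residue for residue, pp in residue_pp.items()
        if residue != best and pp > alternate_above
    ]
    return best, ambiguous, alternates

for position, residue_pp in reconstruction.items():
    print(position, summarize_site(residue_pp))
```

Position 171 is a useful case: it is ambiguous (PP below 0.8), yet neither alternative clears the 0.2 cutoff, so no alternate residue would be tested there.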