Phylogenomics talk in 2000 at the University of Maryland by J. Eisen
TRANSCRIPT
TIGR
Phylogenomics:
Combining Evolutionary Reconstructions and Genome
Analysis into a Single Composite Approach
[Figure: dot plot of query ORF position against subject ORF position, both axes 0 to 1,250,000.]
[Figure: tree of species with complete genomes, grouped into Archaea, Bacteria, and Eukarya (scale bar: 0.05 changes). Taxa: Mycobacterium tuberculosis, Bacillus subtilis, Synechocystis sp., Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae, Methanobacterium thermoautotrophicum, Archaeoglobus fulgidus, Pyrococcus horikoshii, Methanococcus jannaschii, Aeropyrum pernix, Aquifex aeolicus, Thermotoga maritima, Deinococcus radiodurans, Treponema pallidum, Borrelia burgdorferi, Helicobacter pylori, Campylobacter jejuni, Neisseria meningitidis, Escherichia coli, Vibrio cholerae, Haemophilus influenzae, Rickettsia prowazekii, Mycoplasma pneumoniae, Mycoplasma genitalium, Chlamydia trachomatis, Chlamydia pneumoniae.]
[Figure: A. rRNA tree of bacterial and archaeal major groups. B. The same tree with groups that have completed genomes highlighted.]
[Figure 4: model of genome rearrangement. Gene order (genes 1 to 32) in the common ancestor of A and B, followed by successive inversions around the origin (*) and around the terminus (*) that give lineages A1, A2, A3 and B1, B2, B3.]
Three Photolyase Homologs in V. cholerae
[Figure: phylogenetic tree of photolyase homologs with four labeled groups: MTHF-type Class I CPD photolyases, 6-4 photolyases, blue-light receptors, and 8-HDF-type CPD photolyases. The three V. cholerae sequences (ORFA00965, ORF02295.Vibch, ORF01792.Vibch) are marked with asterisks.]
[Figure: A. ABC transporters: tree including OppDF, UUP, NodI, LivF, XylG, NrtDC, PstB, MDR, HlyB, TAP1, CFTR, SUR, UvrA1, and UvrA2. B. UvrA subfamily: a duplication in the UvrA family separates UvrA2 (UvrA2 S. coelicolor, DrrC S. peucetius, UvrA2 D. radiodurans) from UvrA1 (UvrA from H. influenzae, E. coli, N. gonorrhoeae, R. prowazekii, S. mutans, S. pyogenes, S. pneumoniae, B. subtilis, M. luteus, M. tuberculosis, M. thermoautotrophicum, H. pylori, C. jejuni, P. gingivalis, C. tepidum, T. thermophilus, T. pallidum, B. burgdorferi, T. maritima, A. aeolicus, Synechocystis sp., and UvrA1 D. radiodurans).]
Topics of Discussion
• Introduction to phylogenomics
• Phylogenomics Examples
  – Functional prediction
  – Not making functional predictions
  – Gene duplication
  – Genetic exchange within genomes
  – Gene loss
  – Specialization
  – Horizontal gene transfer
“Nothing in biology makes sense except in the light of evolution.”
T. H. Dobzhansky (1973)
Uses of Evolutionary Analysis in Molecular Biology
• Identification of mutation patterns (e.g., ts/tv ratio)
• Amino-acid/nucleotide substitution patterns useful in structural studies (e.g., rRNA)
• Sequence searching matrices (e.g., PAM, Blosum)
• Motif analysis (e.g., Blocks)
• Functional predictions
• Classifying multigene families
• Evolutionary history puts other information into perspective (e.g., duplications, gene loss)
Evolutionary Studies Improve Most Aspects of Genome Analysis
• Phylogeny of species places comparative data in perspective
• Evolution of genes and gene families
  – Functional predictions
  – Identification of orthologs and paralogs
  – Species-specific mutation patterns
• Evolution of pathways
  – Convergence
  – Prediction of function
• Evolution of gene order/genome rearrangements
• Phylogenetic distribution patterns
• Identification of novel features
Genome Information and Analysis Improves Studies of Evolution
• Complete genome information particularly useful
• Unbiased sampling
• More sequences of genes
• Presence/absence information needed to infer certain events (e.g., gene loss, duplication)
• Genome-wide mutation and substitution patterns (e.g., strand bias)
• Diversification and duplication
Phylogenomic Analysis
• There is a feedback loop between evolutionary and genome analysis such that, for many studies, genome and evolutionary analyses are interdependent.
• Therefore, I have proposed that they be combined into a single composite approach that I refer to as phylogenomics.
• Phylogenomics involves combining evolutionary reconstructions of genes, proteins, pathways, and species with analysis of complete genome sequences.
Outline of Phylogenomics
[Diagram linking: Database, Species tree, Gene trees, Presence/Absence, Congruence, Evol. Distribution, Gene Evolution Events, Pathway Evolution, F(x) Predictions, Phenotype Predictions.]
Uses of Phylogenomics I:
Functional Predictions
Predicting Function
• Identification of motifs
• Homology/similarity based methods
  – Highest hit
  – Top hits
  – Clusters of orthologous groups
  – HMM models
  – Structural threading and modeling
  – Evolutionary reconstructions
Types of Molecular Homology
• Homologs: genes that are descended from a common ancestor (e.g., all globins)
• Orthologs: homologs that have diverged after speciation events (e.g., human and chimp β-globins)
• Paralogs: homologs that have diverged after gene duplication events (e.g., α and β globin).
• Xenologs: homologs that have diverged after lateral transfer events
• Positional homology: common ancestry of specific amino acid or nucleotide positions in different genes
Phylogenomic Analysis of the MutS Family of Proteins
• Published analysis
  – Eisen JA et al. 1997. Nature Medicine 3(10):1076-1078.
  – Eisen JA. 1998. Nucleic Acids Research 26(18):4291-4300.
Blast Search of H. pylori “MutS”

Sequences producing significant alignments:          Score (bits)   E value
sp|P73625|MUTS_SYNY3 DNA MISMATCH REPAIR PROTEIN     117            3e-25
sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN     69             1e-10
sp|P44834|MUTS_HAEIN DNA MISMATCH REPAIR PROTEIN     64             3e-09
sp|P10339|MUTS_SALTY DNA MISMATCH REPAIR PROTEIN     62             2e-08
sp|O66652|MUTS_AQUAE DNA MISMATCH REPAIR PROTEIN     57             4e-07
sp|P23909|MUTS_ECOLI DNA MISMATCH REPAIR PROTEIN     57             4e-07

• Blast search pulls up Syn. sp. MutS#2 with a much more significant p value than the other MutS homologs
H. pylori and MutS
• Prior to this genome, all species that encoded a MutS homolog also encoded a MutL homolog
• Experimental studies have shown MutS and MutL always work together in mismatch repair
• Problem: what do we conclude about H. pylori mismatch repair?
Phylogenetic Tree of MutS Family
[Figure: tree of MutS homologs from bacteria, archaea, and eukaryotes, including the H. pylori sequence (Helpy).]
MutS Subfamilies
[Figure: the same tree with subfamilies labeled: MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2.]
MutS Subfamilies
• MutS1  Bacterial MMR
• MSH1   Euk - mitochondrial MMR
• MSH2   Euk - all MMR in nucleus
• MSH3   Euk - loop MMR in nucleus
• MSH6   Euk - base:base MMR in nucleus
• MutS2  Bacterial - function unknown
• MSH4   Euk - meiotic crossing-over
• MSH5   Euk - meiotic crossing-over
Overlaying Functions onto Tree
[Figure: MutS family tree with the subfamilies MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, and MSH2 marked.]
Functional Prediction Using Tree
[Figure: MutS family tree with each subfamily labeled by function:]
• MSH1: repair in mitochondria
• MSH3: repair of loops in nucleus
• MSH6: repair of mismatches in nucleus
• MutS1: repair of loops and mismatches
• MSH4: meiotic crossing-over
• MSH5: meiotic crossing-over
• MutS2: unknown functions
• MSH2: repair of loops and mismatches in nucleus
Table 3. Presence of MutS Homologs in Complete Genome Sequences

Species                                     # of MutS homologs   Which subfamilies?   MutL homologs
Bacteria
Escherichia coli K12                        1                    MutS1                1
Haemophilus influenzae Rd KW20              1                    MutS1                1
Neisseria gonorrhoeae                       1                    MutS1                1
Helicobacter pylori 26695                   1                    MutS2                -
Mycoplasma genitalium G-37                  -                    -                    -
Mycoplasma pneumoniae M129                  -                    -                    -
Bacillus subtilis 168                       2                    MutS1, MutS2         1
Streptococcus pyogenes                      2                    MutS1, MutS2         1
Mycobacterium tuberculosis                  -                    -                    -
Synechocystis sp. PCC6803                   2                    MutS1, MutS2         1
Treponema pallidum Nichols                  1                    MutS1                1
Borrelia burgdorferi B31                    2                    MutS1, MutS2         1
Aquifex aeolicus                            2                    MutS1, MutS2         1
Deinococcus radiodurans R1                  2                    MutS1, MutS2         1
Archaea
Archaeoglobus fulgidus VC-16, DSM4304       -                    -                    -
Methanococcus jannaschii DSM 2661           -                    -                    -
Methanobacterium thermoautotrophicum ΔH     1                    MutS2                -
Eukaryotes
Saccharomyces cerevisiae                    6                    MSH1-6               3+
Homo sapiens                                5                    MSH2-6               3+
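The pattern in Table 3 can be checked mechanically. The sketch below is only an illustration: it encodes a handful of rows from the table (the data structure and function names are made up for this example) and asks which genomes have a MutS homolog but no MutL, and whether MutL presence tracks MutS1 rather than MutS in general.

```python
# A few rows from Table 3: (species, MutS subfamilies present, number of MutL homologs).
# The tuple layout and the function names below are assumptions made for this sketch.
TABLE3 = [
    ("Escherichia coli K12", ["MutS1"], 1),
    ("Helicobacter pylori 26695", ["MutS2"], 0),
    ("Synechocystis sp. PCC6803", ["MutS1", "MutS2"], 1),
    ("Mycoplasma genitalium G-37", [], 0),
    ("Methanobacterium thermoautotrophicum", ["MutS2"], 0),
    ("Deinococcus radiodurans R1", ["MutS1", "MutS2"], 1),
]


def muts_without_mutl(rows):
    """Species that encode at least one MutS homolog but no MutL homolog."""
    return [species for species, subfams, mutl in rows if subfams and mutl == 0]


def mutl_tracks_muts1(rows):
    """True if MutL is present exactly in the genomes that have a MutS1 gene."""
    return all(("MutS1" in subfams) == (mutl > 0) for _, subfams, mutl in rows)


orphans = muts_without_mutl(TABLE3)
print(orphans)
print(mutl_tracks_muts1(TABLE3))
```

On these rows, the only genomes with MutS but no MutL are the ones whose sole MutS homolog is in the MutS2 subfamily, which is the observation the subfamily tree makes possible.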
Why was the MutS2 Family Missed?
Blast Search of Syn. sp. MutS#2

Sequences producing significant alignments:             Score (bits)   E value
sp|Q56239|MUTS_THETH DNA MISMATCH REPAIR PROTEIN MUT    91             3e-17
sp|P26359|SWI4_SCHPO MATING-TYPE SWITCHING PROTEIN      87             4e-16
sp|P27345|MUTS_AZOVI DNA MISMATCH REPAIR PROTEIN MUTS   83             1e-14
sp|P74926|MUTS_THEMA DNA MISMATCH REPAIR PROTEIN MUTS   81             3e-14
sp|Q56215|MUTS_THEAQ DNA MISMATCH REPAIR PROTEIN MUTS   81             4e-14
sp|P10564|HEXA_STRPN DNA MISMATCH REPAIR PROTEIN HEXA   80             5e-14

• Blast search pulls up standard MutS genes but with only a moderate p value (10^-17)
Problems with Similarity Based Functional Prediction
• Prone to database error propagation.
• Cannot identify orthologous groups reliably.
• Perform poorly in cases of evolutionary rate variation and non-hierarchical trees (similarity will not reflect evolutionary relationships in these cases).
• May be misled by modular proteins or large insertion/deletion events.
• Are not set up to deal with expanding data sets.
Evolutionary Rate Variation
[Figure: tree of sequences 1 to 6.]
Rate Variation and Duplication
[Figure: tree for Species 1, 2, and 3 in which a duplication gives paralogs 1A, 2A, 3A and 1B, 2B, 3B.]
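To see why rate variation misleads similarity-based methods, here is a small sketch with made-up branch lengths (the node names and numbers are hypothetical, not taken from the slides). gene1 and gene2 are sister sequences, but gene2 sits on a long branch, so the sequence closest to gene1 by distance is the more distantly related gene3.

```python
# Hypothetical rooted tree as child -> (parent, branch length).
# gene1 and gene2 are sisters (they share the ancestor "anc12"), but gene2 evolves fast.
PARENT = {
    "gene1": ("anc12", 0.1),
    "gene2": ("anc12", 0.9),
    "anc12": ("root", 0.1),
    "gene3": ("root", 0.1),
}


def path_to_root(node):
    """Map each ancestor of node (and node itself) to its distance from node."""
    dist = 0.0
    path = {node: 0.0}
    while node in PARENT:
        node, length = PARENT[node]
        dist += length
        path[node] = dist
    return path


def distance(a, b):
    """Path length between two nodes through their most recent common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    return min(pa[n] + pb[n] for n in pa if n in pb)


# The "top hit" for gene1 is whichever other gene is at the smallest distance.
closest = min(["gene2", "gene3"], key=lambda g: distance("gene1", g))
print(closest, distance("gene1", "gene2"), distance("gene1", "gene3"))
```

A top-hit approach would pair gene1 with gene3, while the tree shows that gene2 is its closest relative.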
Evolutionary Method: Phylogenetic Prediction of Gene Function
[Figure: flowchart of the method applied to Example A (sequences 1 to 6) and Example B (paralogs 1A, 2A, 3A and 1B, 2B, 3B from Species 1, 2, and 3), shown next to the actual evolution, which is assumed to be unknown. Inferred events are labeled Duplication or Duplication?, and one outcome is labeled Ambiguous.]
1. Choose gene(s) of interest
2. Identify homologs
3. Align sequences
4. Calculate gene tree
5. Overlay known functions onto tree
6. Infer likely function of gene(s) of interest
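The last two steps of the method (overlay known functions, then infer the function of the gene of interest) can be sketched in a few lines. This is a simplified illustration and not the procedure used in the published analyses: the gene tree is assumed to be given as nested tuples, the function labels are invented, and the rule used here is to take the smallest clade that contains the query and at least one characterized gene.

```python
def leaves(node):
    """Return the leaf names under a node; a leaf is a plain string."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def predict_function(tree, query, known):
    """Predict a function for query from the smallest clade with annotated members.

    Returns the shared function, "ambiguous" if that clade mixes functions,
    or None if no other gene in the tree has a known function.
    """
    node, best = tree, None
    while not isinstance(node, str):
        functions = {known[leaf] for leaf in leaves(node)
                     if leaf in known and leaf != query}
        if functions:
            best = functions  # smaller clades overwrite larger ones as we descend
        node = next(child for child in node if query in leaves(child))
    if best is None:
        return None
    return next(iter(best)) if len(best) == 1 else "ambiguous"


# Gene names follow Example B on the slide; the topology and labels are illustrative.
gene_tree = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
known = {"1A": "function A", "3A": "function A",
         "1B": "function B", "3B": "function B"}
prediction = predict_function(gene_tree, "2A", known)
print(prediction)
```

The rule deliberately reports "ambiguous" when the nearest characterized relatives disagree, instead of picking the most similar sequence.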
[Figure: phylogenomic analysis of the MutS family in four panels, A to D. Panel A shows the tree of MutS homologs with full sequence names; the later panels mark the subfamilies (MSH4, MSH5, MutS2, MutS1, MSH1, MSH3, MSH6, MSH2), and panel D overlays functions:]
• MSH4: segregation & crossover
• MSH5: segregation & crossover
• MutS1: all MMR (Bacteria)
• MSH1: MMR in mitochondria
• MSH3: MMR of large loops in nucleus
• MSH6: MMR of mismatches and small loops in nucleus
• MSH2: all MMR in nucleus
• MutS2: no function shown
Evolution of the SNF2 Family of Proteins
[Figure: tree of SNF2 family proteins (ETL1_M.m, YA19_S.c, CHD1_M.m, SYGP4_S.c, MOT1_S.c, ERCC6_H.s, RAD26_S.c, NUCP_H.s, NUCP_M.m, YB53_S.c, RAD54_S.c, DNRPPX_S.p, RAD5_S.c, RAD8_S.p, HIP116A_H.s, RAD16_S.c, LODE_D.m, NPHCG_42, HEPA_E.c, YB95_S.c, F37A4_C.e, ISWI_D.m, SNF2L_H.s, BRM_D.m, BRM_H.s, BRG1_H.s, BRG1_M.m, STH1_S.c, SNF2_S.c) with subfamilies labeled: SNF2, SNF2L, CHD1, ETL1, CSB, RAD54, RAD16, LODE.]
Novel RNA Polymerase in A. thaliana
[Figure: phylogenetic tree of RNA polymerase subunit homologs from eukaryotes, archaea, bacteria, plastids, and viruses, with branch support values. Labeled groups: Archaeal, IV, II, III, I, Viral, Bacterial - RpoB, Plastid - RpoBs.]
Novel Large Subunit Rubisco in Chlorobium tepidum
[Figure: Rubisco large subunit phylogeny with branch support values. Sequences: Agathis (gi3982533, gi3982549, gi3982535, gi3982541), Araucaria gi3982517, Venturiella gi4009420, Leucobryum gi6230571, Mougeotia gi1145415, Anabaena gi68158, Thife gi2411435, Thiin gi4105518, Metja gi2129276, Pyrho gi3257353, Pyrab gi5458634, Pyr karaensis gi3769302, Arcfu gi2648911, Arcfu gi2648975, Bacsu gi2633730, Chlte ORF02314. Labeled groups: Type I and Type X.]
Uses of Phylogenomics II:
Knowing when to Not Predict Functions
Deinococcus radiodurans
DNA Repair Genes in D. radiodurans Complete Genome

Process                        Genes in D. radiodurans
Nucleotide Excision Repair     UvrABCD, UvrA2
Base Excision Repair           AlkA, Ung, Ung2, GT, MutM, MutY-Nths, MPG
AP Endonuclease                Xth
Mismatch Excision Repair       MutS, MutL
Recombination
  Initiation                   RecFJNRQ, SbcCD, RecD
  Recombinase                  RecA
  Migration and resolution     RuvABC, RecG
Replication                    PolA, PolC, PolX, phage Pol
Ligation                       DnlJ
dNTP pools, cleanup            MutTs, RRase
Other                          LexA, RadA, HepA, UVDE, MutS2
Recombination Genes in Genomes
Columns, left to right: Bacteria, then Archaea, then Euks
Pathway / Protein Name(s)

Initiation
RecBCD pathway
  RecB            + + - - - - - - + + - + - - - - - - - -
  RecC            + + - - - - - - + ±+ - ± - - - - - - - -
  RecD            + + - - ± - - - + ±+ - ++ - ± ±+ - - - - -
RecF pathway
  RecF            + + + - + - - + + - + ± - - + - - ± ± ±
  RecJ            + + + + + - - + - + + + + + + - - - - -
  RecO            + + - - + - - + + - - - - - ± - - - - -
  RecR            + + + ±+ + - - + + - + + - + + - - - - -
  RecN            + + + + + - - + + - + - ± + + - - ± ± -
  RecQ            + + - - + - - + - - + - - - + - - - - + ++
RecE pathway
  RecE/ExoVIII    + - - - - - - - - - - - - - - - - - - -
  RecT            + - - - + - - - - - - - - - - - - - - -
SbcBCD pathway
  SbcB/ExoI       + + - - - - - - - - - - - - - - - - - -
  SbcC            + - - - + - - + - + + - + + + ± ± ± ± ± ±
  SbcD            + - - - + - - + - + + - + + + ± ± ± ± ± ±
AddAB pathway
  AddA/RexA       - - + - + - - - - - + + - ± - - - - - -
  AddB/RexB       - - - - + - - - - - - - - - - - - - - -
Rad52 pathway
  Rad52, Rad59    - - - - - - - - - - - - - - - - - - - ++ +
  Mre11/Rad32     ± - - - ± - - ± - ± ± - ± ± ± + + + + + +
  Rad50           ± - - - ± - - ± - ± ± - ± ± ± + + + ± + +

Recombinase
  RecA, Rad51     + + + + + + + + + + + + + + + + + + + ++ ++

Branch migration
  RuvA            + + + + + + + + + + + + + - + - - - - -
  RuvB            + + + + + + + + + + + + + - + - - - - -
  RecG            + + + + + - - + + + + - + + + - - - - -

Resolvases
  RuvC            + + + + - - - + + - + + + - + - - - - -
  RecG            + + + + + - - + + + + - + + + - - - - -
  Rus             + - - - - - - - ±+ - - - - ±+ - - - - - -
  CCE1            - - - - - - - - - - - - - - - - - - - +

Other recombination proteins
  Rad54           - - - - - - - - - - - - - - - - - - - + +
  Rad55           - - - - - - - - - - - - - - - - - - - + +
  Rad57           - - - - - - - - - - - - - - - - - - - + +
  Xrs2            - - - - - - - - - - - - - - - - - - - +
Unusual Features of D. radiodurans DNA Repair Genes
Process Genes
Nucleotide excision repair Two UvrAs
Base excision repair Four MutY-Nths
Recombination RecD but not RecBC
Replication Four Pol genes
dNTP pools Many MutTs, two RRases
Other UVDE
Problem:
The list of DNA repair gene homologs in the D. radiodurans genome is not significantly different from that of other bacterial genomes of similar size
Gain and Loss of Repair Genes
[Figure: tree of Ecoli, Haein, Neigo, Helpy, Bacsu, Strpy, Mycge, Mycpn, Borbu, Trepa, Synsp (Bacteria); Metjn, Arcfu, Metth (Archaea); Human, Yeast (Eukaryotes), with inferred repair gene events mapped onto the branches. Event labels carry the prefixes +, -, d, and t (for example +UvrABCD, -MutLS, dMutSI/MutSII, tRecT?); one set of events is labeled "from mitochondria".]
Repair Studies in Different Species
(determined by Medline searches as of 1998)

Humans          7028
E. coli         3926
S. cerevisiae   988
Drosophila      387
B. subtilis     284
S. pombe        116
Xenopus         56
C. elegans      25
A. thaliana     20
Methanogens     16
Haloferax       5
Giardia         0
Uses of Phylogenomics III:
Gene Duplication
Why Duplications Are Useful to Identify
• Allows division into orthologs and paralogs
• Aids functional predictions
• Recent duplications may be indicative of species-specific adaptations
• Helps identify mechanisms of duplication
• Can be used to study mutation processes in different parts of genome
Recent Duplications
MutY-Nth
[Figure: tree of MutY-Nth homologs from bacteria, archaea, and C. elegans, including three D. radiodurans sequences (DEIRA ORF00829, DEIRA ORF02784, DEIRA).]
Expansion of MCP Family in V. cholerae
[Figure: NJ tree of MCP homologs from E. coli, B. subtilis, Synechocystis sp., H. pylori, C. jejuni, A. fulgidus, T. maritima, T. pallidum, B. burgdorferi, P. abyssi, P. horikoshii, D. radiodurans, M. tuberculosis, and V. cholerae, with a large group of V. cholerae sequences (VC and VCA ORFs).]
Phosphate Transporters
[Figure: tree of phosphate transporter homologs from ARCFU, SYNSP, THEMA, AQUAE, METJA, MCYTU (two sequences), VIBCH, ECOLI, and three D. radiodurans sequences (DEIRA_ORF00198, DEIRA_ORFA00139, DEIRA_ORF00510).]
Levels of Paralogy Within a Genome
• All
  – All members of a gene family are linked together
• Top matches
  – Only top matching pairs are linked together. Therefore, in a large gene family, only the pair from the most recent duplication event is included
• Recent
  – Operational definition based on comparison to other species. Only pairs that are more similar to each other than to selected other species are included.
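The three levels can be written as three filters over the same set of within-genome matches. The sketch below is one possible reading of the definitions, with made-up gene names and similarity scores; "top matches" is taken here to mean each gene's single best within-genome partner, and "recent" compares each pair's score with each member's best score against the selected other species.

```python
# Hypothetical similarity scores between genes of one family in a single genome.
WITHIN = {("a1", "a2"): 90, ("a1", "a3"): 60, ("a2", "a3"): 55}
# Hypothetical best score of each gene against the selected other species.
OUTSIDE_BEST = {"a1": 70, "a2": 75, "a3": 80}


def all_pairs(within):
    """All: every pair in the family is linked."""
    return sorted(within)


def top_pairs(within):
    """Top matches: link each gene only to its best-scoring paralog."""
    best = {}
    for (g, h), score in within.items():
        for gene, partner in ((g, h), (h, g)):
            if gene not in best or score > best[gene][1]:
                best[gene] = (partner, score)
    return sorted({tuple(sorted((gene, partner)))
                   for gene, (partner, _) in best.items()})


def recent_pairs(within, outside_best):
    """Recent: keep pairs more similar to each other than to any other species."""
    return sorted(pair for pair, score in within.items()
                  if score > outside_best[pair[0]] and score > outside_best[pair[1]])


print(all_pairs(WITHIN))
print(top_pairs(WITHIN))
print(recent_pairs(WITHIN, OUTSIDE_BEST))
```

Each filter is stricter than the one before it, which is why the "Recent" plots contain far fewer points than the "All" plots.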
C. pneumoniae Paralogs - All
[Figure: dot plot of query ORF position against subject ORF position, both axes 0 to 1,250,000.]
C. pneumoniae Paralogs - Top
[Figure: dot plot of query ORF position against subject ORF position, both axes 0 to 1,250,000.]
C. pneumoniae Paralogs - Recent
[Figure: dot plot of query ORF position against subject ORF position, both axes 0 to 1,250,000.]
Uses of Phylogenomics IV:
Genetic Exchange within Genomes
Circular Maps
Uses of Phylogenomics V:
Gene Loss
Why Gene Loss is Useful to Identify
• Indicates that gene is not absolutely required for survival
• Helps distinguish likelihood of gene transfers
• Correlated loss of same gene in different species may indicate selective advantage of loss of that gene
• Correlated loss of genes in a pathway indicates a conserved association among those genes
Example of Tracing Gene Loss
[Figure: presence or absence of a gene for each species (given by abbreviation) across the kingdoms Euks, Arch, and Bacteria, with the evolutionary origin of the gene and the inferred losses marked on the species tree.]
Ancient Duplication in MutS Family
[Figure: A. Species tree of E. coli, H. influenzae, N. gonorrhoeae, H. pylori, Syn. sp., B. subtilis, S. pyogenes, M. pneumoniae, M. genitalium, A. aeolicus, D. radiodurans, T. pallidum, and B. burgdorferi. B. Gene tree in which a gene duplication separates the MutS1 and MutS2 lineages.]
Loss of MMR
• Lost in many pathogen species
• Mechanism of loss
  – gene deletion (e.g., M. tuberculosis, H. pylori)
  – frameshifts (e.g., N. meningitidis, S. pneumoniae)
  – some species have evolved systems to turn MMR on and off depending on conditions (e.g., E. coli)
Need for Phylogenomics Example: Gene Duplication and Loss
• Genome analysis required to determine number of homologs in different species
• Evolutionary analysis required to divide into orthology groups and identify gene duplications
• Genome analysis is then required to determine presence and absence of orthologs
• Then loss of orthologs can be traced onto evolutionary tree of species
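The last step, tracing loss onto the species tree, can be sketched as follows. This is a minimal illustration under strong assumptions: the gene is taken to be present in the common ancestor and never regained, a clade with no carriers counts as one loss on the branch leading to it, and the species topology below is invented for the example (only the MutL presence/absence calls come from Table 3).

```python
def leaves(node):
    """Return the species under a node; a leaf is a plain string."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def infer_losses(node, present):
    """List the clades (as sorted species lists) on whose stem the gene was lost.

    Assumes the gene was present in the ancestor of node and was never regained.
    """
    members = leaves(node)
    if not any(species in present for species in members):
        return [sorted(members)]          # one loss explains the whole clade
    if isinstance(node, str):
        return []                         # species still has the gene
    return [loss for child in node for loss in infer_losses(child, present)]


# Illustrative topology only; MutL presence follows Table 3.
species_tree = ((("Ecoli", "Haein"), "Helpy"), (("Mycge", "Mycpn"), "Bacsu"))
mutl_present = {"Ecoli", "Haein", "Bacsu"}
losses = infer_losses(species_tree, mutl_present)
print(losses)
```

With this topology the two Mycoplasma species are explained by a single loss in their common ancestor, and H. pylori by a separate loss.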
Uses of Phylogenomics VI:
Specialization
Circular Maps
Species Distribution of Homologs of D. radiodurans Genes
[Figure: four histograms of frequency against number of species with high hits (0 to 20), with panels labeled Papa Bear, Mama Bear, Baby Bear, and E. coli.]
Specialized Genetic Elements (Chromosome II and Megaplasmid)
• Many two-component systems
• Nitrogen metabolism
• LexA
• Ribonucleotide reductase
• UvrA2
• Many transcription factors (e.g., HepA)
• Iron metabolism
Uses of Phylogenomics VII:
Genome Rearrangements
V. cholerae vs. E. coli All Hits
[Figure: dot plot of V. cholerae coordinates (0 to 3,000,000) against E. coli coordinates (0 to 5,000,000).]
V. cholerae vs. E. coli Top Hits
[Figure: dot plot of V. cholerae coordinates (0 to 3,000,000) against E. coli coordinates (0 to 5,000,000).]
V. cholerae vs. E. coli Only if EC-Orf is Closest in All Genomes
[Figure: dot plot of V. cholerae coordinates (0 to 3,000,000) against E. coli coordinates (0 to 5,000,000).]
V. cholerae vs. E. coli Proteins Top
[Figure: plot against V. cholerae ORF coordinates, second axis 0 to 4,000,000.]
S. pneumoniae vs. S. pyogenes DNA F+R
[Figure: dot plot labeled "BSP vs Spyo", axis 0 to 2,000,000.]
M. tuberculosis vs. M. leprae DNA
[Figure: dot plot, axis 0 to 4,000,000.]
Duplication and Gene Loss Model
[Figure: an ancestral gene order A B C D E F is duplicated to give A B C D E F and A' B' C' D' E' F'. Differential gene loss then leaves A C D F A' B' E' in E. coli and B C D F A' B' D' E' in V. cholerae.]
V. cholerae vs. E. coli Proteins Top
[Figure: plot against V. cholerae ORF coordinates, second axis 0 to 4,000,000.]
C. trachomatis vs C. pneumoniae Dot Plot
[Figure: dot plot of C. trachomatis MoPn against C. pneumoniae AR39, with the origin and termination of replication marked.]
[Figure 4: model of genome rearrangement. Gene order (genes 1 to 32) in the common ancestor of A and B, followed by successive inversions around the origin (*) and around the terminus (*) that give lineages A1, A2, A3 and B1, B2, B3.]
Uses of Phylogenomics VIII:
Horizontal Gene Transfer and Species Evolution
Vertical Inheritance
Examples of Horizontal Transfers
• Antibiotic resistance genes on plasmids
• Insertion sequences
• Pathogenicity islands
• Toxin resistance genes on plasmids
• Agrobacterium Ti plasmid
• Viruses and viroids
• Organelle to nucleus transfers
Why Gene Transfers Are Useful to Identify
• Laterally transferred genes frequently involved in environmental adaptations and/or pathogenicity
• Helps identify transposons, integrons, and other vectors of gene transfer
• Helps identify species associations in the environment
Steps in Lateral Gene Transfer
[Figure: diagram with steps numbered 1, 2, 3-5, and 6 and lineages labeled A, B, C, D.]
How to Infer Gene Transfers
• Unusual distribution patterns
• Unusual nucleotide composition
• High sequence similarity to supposedly distantly related species
• Unusual gene trees
• Observe transfer events
E. coli and S. typhimurium Transfer
[Figure: two panels, Old Model and New Model, each relating E. coli and S. typhimurium]
Archaeal genes in bacterial genomes*
Bacterial species         Best hits to Archaeal
Thermotoga maritima       451 (24%)
Aquifex aeolicus          246 (16%)
Synechocystis sp.         126 (4%)
Borrelia burgdorferi      45 (3.6%)
Escherichia coli          99 (2.3%)
* 10^-5 over 60% of sequence
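A count like the one in this table could be produced along the following lines. This is a sketch only: it assumes the footnote's cutoff means a significance score of at most 10^-5 with the match covering at least 60% of the sequence, and the `Hit` record and its field names are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one similarity-search hit; field names are assumptions.
Hit = namedtuple("Hit", "orf subject_domain evalue coverage")


def best_hits_to_archaea(hits, n_orfs, max_evalue=1e-5, min_coverage=0.6):
    """Return (count, fraction) of ORFs whose best qualifying hit is archaeal."""
    best = {}
    for h in hits:
        if h.evalue > max_evalue or h.coverage < min_coverage:
            continue  # hit fails the cutoff
        if h.orf not in best or h.evalue < best[h.orf].evalue:
            best[h.orf] = h
    count = sum(1 for h in best.values() if h.subject_domain == "Archaea")
    return count, count / n_orfs


# Toy input (assumption): three ORFs with hits, in a genome of four ORFs.
hits = [
    Hit("orf1", "Archaea", 1e-30, 0.9),
    Hit("orf1", "Bacteria", 1e-10, 0.8),
    Hit("orf2", "Bacteria", 1e-50, 0.95),
    Hit("orf2", "Archaea", 1e-20, 0.9),
    Hit("orf3", "Archaea", 1e-8, 0.4),  # too little of the sequence is covered
]
print(best_hits_to_archaea(hits, n_orfs=4))
```

A best hit is only a first-pass screen; the slides that follow list the additional evidence used for Thermotoga.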
Evidence for lateral gene transfer in Thermotoga
1. 81 archaeal-like genes are clustered in 15 regions which range in size from ~ 4 to 20 kb; many share conserved gene order with their archaeal counterparts.
2. Many of the archaeal-like genes correspond to regions with a significantly different base composition than the rest of the chromosome.
3. Some of these regions are associated with a 30 bp repeat structure found only in thermophiles.
4. Initial phylogenetic analyses of some of these genes lend support to lateral gene transfer.
Region TM0987 - TM1003 (21 kb Archaea-like stretch)
[Figure: map of Thermotoga ORFs 0987 to 1003, keyed as Thermotoga ORF, Archaea homolog, Bacterial homolog, or Eukaryote homolog, with percentage labels from 48% to 79%; a transposon and a regulatory protein are marked]
Best Matches to Prokaryotes
[Figure: plot of Best Matches (0 to 700) against Orfs in Target Genome (500 to 4500); series: CAUCR, BACSU, ECOLI, MYCTU, SYNSP]
A. thaliana T1E2.8 is a Chloroplast Derived HSP60
[Figure: HSP60 tree with groups labeled Eukarya, Archaea, Bacteria, and Cyano/Cpst; ARATH T1E2.8 is marked with asterisks]
Organellar HSP60s
[Figure: HSP60 tree with groups labeled Mitochondrial Forms, α-Proteo, Cyanobacteria, and Plastid Forms]
Evolution of Chromosome Partitioning Proteins (ParA)
[Figure: ParA Phylogeny with groups labeled Chromosomal, Plasmid and Phage, Chlamydial, Inc, Borrelia Plasmids, Archaea, and Misc; the V. cholerae sequences 2603.Vibch and A00900.Vib are marked with asterisks]
Horizontal Gene Transfer II
Reconciling a Tree of Life in the Context of Lateral Gene Transfer
rRNA Tree of Complete Genomes
[Figure: rRNA tree of the species with complete genomes, grouped into Archaea, Bacteria, and Eukarya; scale bar 0.05 changes]
Whole Genome Phylogeny
rRNA vs. Whole Genome Trees
[Figure: tree of the same complete genomes, grouped into Archaea, Bacteria, and Eukarya; scale bar 0.05 changes]
Outline of Phylogenomics
[Flowchart linking: Database; Species tree; Presence/Absence; Gene trees; Congruence; Evol. Distribution; Gene Evolution Events; F(x) Predictions; Phenotype Predictions; Pathway Evolution]
Evolutionary Genome Scanning
• Distribution patterns/phylogenetic profiles
• Patterns of evolution (ds/dn, correlations, constraints)
• Lateral gene transfers (organellar genes, pathogenicity islands)
• Subdividing gene families
• Functional predictions (gene trees, PG profiles)
• Gene duplications
• Gene loss
• Specialization
• Comparing close relatives
• Species evolution
Evolutionary Diversity Still Poorly Represented in Complete Genomes
[Figure: A. rRNA tree of Bacterial and Archaeal Major Groups; B. Groups with Completed Genomes Highlighted]
True Phylogenetic Methods Work Best
[Figure: two trees of MutS family proteins (MutS, MutS2, and MSH1 to MSH6 homologs), one built with UPGMA and one with Neighbor-Joining]
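The slide contrasts UPGMA and neighbor-joining trees of the MutS family. One well-known reason the two can disagree, offered here as background rather than as the speaker's stated argument, is that UPGMA clusters by raw distance and so assumes roughly equal rates, while neighbor-joining corrects for each taxon's overall divergence. The sketch below shows only the first clustering step of each method on a made-up four-taxon distance matrix; the taxa and branch lengths are assumptions.

```python
from itertools import combinations

# Additive distances from a hypothetical true tree ((A,B),(C,D)) in which
# B and D evolve faster (branch lengths: A=1, B=5, C=1, D=5, internal=1).
D = {
    ("A", "B"): 6, ("A", "C"): 3, ("A", "D"): 7,
    ("B", "C"): 7, ("B", "D"): 11, ("C", "D"): 6,
}
TAXA = ["A", "B", "C", "D"]


def dist(x, y):
    """Look up a pairwise distance regardless of argument order."""
    return D[(x, y)] if (x, y) in D else D[(y, x)]


def upgma_first_join(taxa):
    """UPGMA joins the pair with the smallest raw distance."""
    return min(combinations(taxa, 2), key=lambda p: dist(*p))


def nj_first_join(taxa):
    """Neighbor-joining joins the pair minimizing Q = (n-2)*d(i,j) - r_i - r_j."""
    n = len(taxa)
    r = {t: sum(dist(t, u) for u in taxa if u != t) for t in taxa}
    return min(combinations(taxa, 2),
               key=lambda p: (n - 2) * dist(*p) - r[p[0]] - r[p[1]])


print(upgma_first_join(TAXA))  # ('A', 'C'): the two slow taxa, which are not sisters
print(nj_first_join(TAXA))     # ('A', 'B'): a true sister pair
```

In this toy case UPGMA groups the two slowly evolving sequences, while the neighbor-joining criterion recovers a pair that is actually sister in the generating tree.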
Acknowledgements
• Genome duplications: S. Salzberg, J. Heidelberg, O. White, A. Stoltzfus, J. Peterson
• Genome sequences and analysis: J. Heidelberg, T. Read, H. Tettelin, K. Nelson, J. Peterson, R. Fleischmann, D. Bryant
• Horizontal transfers: K. Nelson, W. F. Doolittle
• TIGR: C. Fraser, J. Venter, M-I. Benito, S. Kaul, Seqcore
• $$$: DOE, NSF, NIH, ONR
Other people
Mom and Dad
S. Karlin
M. Feldman
A. M. Campbell
R. Fernald
R. Shafer
D. Ackerly
D. Goldstein
M. Eisen
J. Courcelle
R. Myers
C. M. Cavanaugh
P. Hanawalt
NSF
J. Heidelberg
T. Read
S. Kaul
M-I Benito
J. C. Venter
C. Fraser
S. Salzberg
O. White
K. Nelson
$$$
ONR
DOE
NIH
H. Tettelin
Uses of Phylogenomics IX:
Evolution Within Species
M. tuberculosis strain phylogeny (Indels)
Musser-Type Evolution (Indel Phylogeny)
[Figure: indel-based phylogeny of M. tuberculosis strains; tips are labeled with strain identifiers such as 98a, 107a, 43a, and H37Rv]
Consistency Indices (Indel Phylogeny)
[Figure: consistency index (CI, axis 0.00 to 1.00) for characters 1 to 20, calculated over stored trees; maximum, average, and minimum shown]
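For reference, the consistency index of a character is conventionally the minimum number of changes the character could require on any tree divided by the number it requires on the tree at hand. The sketch below computes it for one character on one rooted binary tree using Fitch parsimony; the nested-tuple tree format, the strain names s1 to s4, and the choice to return 1.0 for an invariant character are assumptions made for illustration.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree.

    tree: nested 2-tuples of leaf names; states: dict mapping leaf name to state.
    """
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common
        steps += 1  # the two children share no state: one change is needed here
        return left | right

    visit(tree)
    return steps


def consistency_index(tree, states):
    """CI = m / s, with m = (number of observed states - 1) and s = steps on this tree."""
    m = len(set(states.values())) - 1
    s = fitch_steps(tree, states)
    return m / s if s else 1.0  # invariant character: treated as fully consistent


tree = (("s1", "s2"), ("s3", "s4"))
indel_a = {"s1": 1, "s2": 1, "s3": 0, "s4": 0}  # fits the tree with a single change
indel_b = {"s1": 1, "s2": 0, "s3": 1, "s4": 0}  # needs two changes on this tree
print(consistency_index(tree, indel_a), consistency_index(tree, indel_b))
```

A CI of 1.0 means the character changes only as often as it must; lower values indicate homoplasy on that tree.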
Phylogenomics I: Presence/Absence of Homologs
• Important to have complete genomes
• Similarity searches with high “homology threshold” (to prevent false positives)
• Iterative searches (to prevent false negatives)
• Multiple sequence alignments to confirm assignment of homology and to divide up multi-domain proteins
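The first two bullets can be sketched as building a presence/absence table from search hits with a stringent cutoff. This is a minimal illustration: the cutoff value, the tuple layout of a hit, and the family and genome names are assumptions, and the iterative-search and alignment-confirmation steps are not shown.

```python
def presence_absence(hits, genomes, max_evalue=1e-10):
    """Build {gene_family: {genome: bool}} from (family, genome, evalue) search hits."""
    table = {}
    for family, genome, evalue in hits:
        row = table.setdefault(family, {g: False for g in genomes})
        if evalue <= max_evalue:  # stringent threshold to limit false positives
            row[genome] = True
    return table


# Toy hits (assumption): famB has only a weak match in genome2.
hits = [
    ("famA", "genome1", 1e-80),
    ("famA", "genome2", 1e-60),
    ("famB", "genome1", 1e-40),
    ("famB", "genome2", 1e-3),
]
for family, row in presence_absence(hits, ["genome1", "genome2"]).items():
    print(family, row)
```

The table is only meaningful for complete genomes, since an absence in a partial genome may simply reflect unsequenced regions.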
Phylogenomics II: Phylogenetic Analysis of Homologs
• Multiple sequence alignment
• Mask alignment (exclude certain regions)
– ambiguous regions of alignment
– hypervariable regions and regions with large gaps
• Phylogenetic tree with method of choice
• Robustness checks
– bootstrapping
– compare trees with different alignments
– compare trees with different tree-building methods
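Two of the steps above, masking and bootstrapping, are simple enough to sketch directly. The example assumes an alignment stored as a dict of equal-length strings with "-" for gaps; the gap-fraction cutoff and the sequence names are arbitrary assumptions, and tree building on each replicate is left to whichever method is chosen.

```python
import random


def mask_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns in which more than max_gap_fraction of sequences have a gap."""
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = [i for i in range(length)
            if sum(alignment[n][i] == "-" for n in names) / len(names) <= max_gap_fraction]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


def bootstrap_replicate(alignment, rng):
    """Resample columns with replacement to make one bootstrap pseudo-alignment."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][i] for i in cols) for n in names}


aln = {"s1": "AC-GT", "s2": "AC-GA", "s3": "A--GT"}
masked = mask_columns(aln)  # removes the all-gap third column
replicate = bootstrap_replicate(masked, random.Random(0))
print(masked)
print(replicate)
```

Each replicate has the same length as the masked alignment; building a tree from many replicates and counting how often a grouping recurs gives its bootstrap support.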
Phylogenomics III: Inferring Evolutionary Events
• Infer evolutionary distribution patterns (overlay presence/absence onto species tree)
• Compare gene tree vs. species tree
• Compare gene tree vs. evolutionary distribution
• Infer gene duplication and transfer events
• Combine gene transfer and duplication information with evolutionary distribution analysis to infer gene loss, gene origin, and timing of gene duplications and transfers
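Overlaying presence/absence onto a species tree can be sketched as follows. This simplified version assumes a single origin and no lateral transfer (a loss-only reading), so it places the gene's origin at the smallest clade containing every species that has it and reports clade members lacking the gene as candidate losses. The tree and the species sets are toy assumptions, not data from the talk.

```python
def leaves(node):
    """All leaf names under a node of a nested-tuple tree."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def origin_and_losses(tree, present):
    """Return (clade, losses): the smallest clade holding all carriers, and its non-carriers."""
    if not present:
        raise ValueError("gene is not present in any species")
    node = tree
    while not isinstance(node, str):
        # descend while a single child still contains every species with the gene
        narrower = [child for child in node if present <= leaves(child)]
        if not narrower:
            break
        node = narrower[0]
    clade = leaves(node)
    return clade, clade - present


# Toy species tree (assumption) using species named in the talk.
tree = ((("E_coli", "V_cholerae"), "H_influenzae"), ("B_subtilis", "M_tuberculosis"))
clade, losses = origin_and_losses(tree, {"E_coli", "H_influenzae"})
print(sorted(clade), sorted(losses))
```

Comparing the gene tree against this distribution, as the later bullets describe, is what distinguishes a genuine loss from a transfer into one of the carriers.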
Phylogenomics IV: Functional Predictions and Evolution
• Overlay experimentally determined functions onto gene tree
• Infer changes in function
– many changes suggest caution should be used in making new predictions
• Predict functions based on position in tree relative to genes with known functions and based on orthology groups
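One simple way to read "position in tree relative to genes with known functions" is sketched below: find the smallest clade that contains the query gene and at least one characterized gene, and make a prediction only if the characterized genes in that clade agree. This rule, the nested-tuple tree format, and the gene and function names are assumptions for illustration, not the procedure used in the talk.

```python
def leaves(node):
    """All leaf names under a node of a nested-tuple tree."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def predict_function(tree, query, known):
    """Predict a function for query from its closest characterized relatives, or None."""
    path = []  # clades containing the query, innermost first

    def find(node):
        if isinstance(node, str):
            return node == query
        for child in node:
            if find(child):
                path.append(node)
                return True
        return False

    if not find(tree):
        raise ValueError("query gene is not in the tree")
    for clade in path:
        functions = {known[g] for g in leaves(clade) if g in known}
        if functions:
            # conflicting functions nearby suggest functional change: make no call
            return functions.pop() if len(functions) == 1 else None
    return None


tree = ((("a1", "a2"), "query"), ("b1", "b2"))
known = {"a1": "function X", "a2": "function X", "b1": "function Y"}
print(predict_function(tree, "query", known))
print(predict_function(tree, "b2", known))
```

Returning no prediction when nearby functions conflict mirrors the caution on the slide about lineages in which function has changed many times.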
Phylogenomics V: Pathway Analysis
• Correlated presence/absence of all genes in pathway in different species?
– If not, maybe non-orthologous gene displacement
– Alternatively, pathway may be different between species
• Correlated evolutionary events for genes in pathway
– loss of all genes at once
– correlated duplications?
• Compare evolution of function between pathways
– The number of times an activity has evolved helps in making predictions of function/phenotype
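The first question on this slide, whether all genes of a pathway are present or absent together, can be checked with a sketch like the one below. The profile format (gene mapped to the set of species that have it) and all gene and species names are hypothetical.

```python
def pathway_profile_check(profiles, pathway_genes, species):
    """For each species, report (status, missing genes) for the given pathway."""
    report = {}
    for sp in species:
        have = [g for g in pathway_genes if sp in profiles.get(g, set())]
        if len(have) == len(pathway_genes):
            status = "complete"
        elif not have:
            status = "absent"
        else:
            # candidate for non-orthologous gene displacement or a variant pathway
            status = "partial"
        report[sp] = (status, [g for g in pathway_genes if g not in have])
    return report


# Toy profiles (assumption): geneB is missing from sp2; sp3 lacks the whole pathway.
profiles = {"geneA": {"sp1", "sp2"}, "geneB": {"sp1"}, "geneC": {"sp1", "sp2"}}
for sp, result in pathway_profile_check(profiles, ["geneA", "geneB", "geneC"],
                                        ["sp1", "sp2", "sp3"]).items():
    print(sp, result)
```

Species flagged "partial" are the ones worth a closer look, since the missing step may be carried out by an unrelated gene or the pathway may differ in that species.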
Steps in Phylogenomic Analysis
• Create database of genes of interest
• Presence/absence of homologs in complete genomes
• Phylogenetic trees of each gene family
• Infer evolutionary events (gene origin, duplication, loss and transfer)
• Refine presence/absence (orthologs, paralogs, subfamilies)
• Functional predictions and functional evolution
• Analysis of pathways
Evolution as a Screening Method
• Gene duplications
• Gene loss
• Lateral gene transfers
• Organellar genes
• Structurally constrained genes
• Correlated evolutionary changes
Evolutionary Genome Scanning
• Distribution patterns/phylogenetic profiles
• Patterns of evolution
– (ds/dn)
– Structurally constrained genes
– Correlated evolutionary changes
• Lateral gene transfers
– Organellar genes
– Pathogenicity islands
• Subdividing gene families
– Orthologs vs paralogs
– Functional predictions
– Subfamilies
– Motif identification
• Gene duplications
• Gene loss
Genome Sequences Allow “Hypothesisless Research”
• DNA microarrays
• Proteomics
• GC skew and other nucleotide composition analyses
• Parallel genome wide genetic experiments
• Evolutionary genome scanning
• Phylogenetic profiles