Comparative genomics

João Carlos Setubal, IQ-USP

October 2012

5 November 2012, J. C. Setubal

Comparative genomics

• As of October 2012, there are 2,230 completely sequenced microbial genomes publicly available
• Many are of closely related species
• Why compare?
• How to do it?

Why comparative genomics?

• To understand the genomic basis of the present
  – Differences in lifestyle
    • Pathogen vs. nonpathogen
    • Obligate vs. free-living
  – Host specificity
    • Animals vs. plants, plant X vs. plant Y, etc.
  – In the case of pathogens, this understanding should help us fight disease
• To understand the past
  – How organisms evolved to be what they are

Citrus canker: Xanthomonas axonopodis pathovar citri

Black rot: Xanthomonas campestris pathovar campestris

What is comparative genomics?

• Assuming the input is the sequence and its annotation
• There are many ways that genomes can be compared
  – Different resolutions
    • Whole genome
      – Genome alignments
      – Synteny (gene order conservation)
      – Anomalous regions
    • Gene-centric
      – Gene families and unique genes
      – Gene clustering by function
    • Gene sequence variations
      – Codon usage, SNPs, indels, pseudogenes

Resolution

• Low resolution
  – Scope: entire genomes
  – Example event: rearrangement
• High resolution
  – Scope: nucleotide sequences
  – Example event: single mutation

Genome-wide evolutionary events

• Replicon rearrangements
• Gene/region duplication
• Gene/region loss
• Chromosome-plasmid DNA exchange
• Lateral transfer

Whole replicon alignments: the pairwise case

If the sequences were identical, we would see a single diagonal line.

[Figure: dot plot with replicons A and B on the axes]

An inversion

[Figure: dot plot with blocks A B C D on one axis and A, C, B, D on the other]

[Figure: dot plot with blocks A B C D on one axis and A, C, D, B on the other]

Such inversions seem to happen around the origin or terminus of replication
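
The dot plots on these slides can be imitated on toy data. The following is a minimal sketch, assuming exact k-mer matching on short strings; the function names and the example sequences are hypothetical and are not taken from any alignment tool. Forward matches fall on a diagonal, while an inverted segment shows up as reverse-complement matches on an anti-diagonal.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def kmer_matches(a, b, k):
    """Return (forward, reverse) lists of dot-plot points (i, j).

    A forward point means a[i:i+k] == b[j:j+k]; a reverse point means
    a[i:i+k] equals the reverse complement of b[j:j+k].
    """
    index = {}
    for i in range(len(a) - k + 1):
        index.setdefault(a[i:i + k], []).append(i)
    forward, reverse = [], []
    for j in range(len(b) - k + 1):
        word = b[j:j + k]
        forward += [(i, j) for i in index.get(word, [])]
        reverse += [(i, j) for i in index.get(reverse_complement(word), [])]
    return forward, reverse


# Toy replicon made of three blocks; the middle block is inverted in b.
left, middle, right = "ATGCCGTA", "ACCTGAGT", "TTGACGCA"
a = left + middle + right
b = left + reverse_complement(middle) + right

forward, reverse = kmer_matches(a, b, 4)
print("forward points:", forward)
print("reverse points:", reverse)
```

Plotting the forward points in one color and the reverse points in another gives the kind of picture shown in the alignment slides.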

[Figure: promer alignment of Xanthomonas axonopodis pv. citri against E. coli K12. Red: direct matches; green: reverse matches]

Both are γ-proteobacteria!

Eisen JA, Heidelberg JF, White O, Salzberg SL. Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol. 2000;1(6):RESEARCH0011

Replicon sequence comparisons

• Basic tool: MUMmer
  – Delcher AL, Phillippy A, Carlton J, Salzberg SL. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 2002 Jun 1;30(11):2478-83.
  – Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. Versatile and open software for comparing large genomes. Genome Biol. 2004;5(2):R12.
• http://mummer.sourceforge.net

Basics of MUMmer

• It finds Maximal Unique Matches (MUMs)
• These are exact matches above a user-specified length threshold that are unique
• Exact matches found are clustered and extended (using dynamic programming)
  – Result is approximate matches
• Data structure for exact match finding: suffix tree
  – Difficult to build but very fast
• Nucmer and promer
  – Both very fast
  – O(n + #MUMs), n = genome lengths
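
To make the definition of a maximal unique match concrete, here is a minimal sketch, assuming a brute-force quadratic scan on short strings instead of the suffix tree that MUMmer uses. The function names are hypothetical and this is not MUMmer's code; it only illustrates the three conditions: exact, maximal (cannot be extended left or right), and unique in both sequences.

```python
def occurs_once(text, sub):
    """True if sub occurs exactly once in text (overlaps are counted)."""
    first = text.find(sub)
    return first != -1 and text.find(sub, first + 1) == -1


def find_mums(a, b, min_len):
    """Return maximal unique matches as (start_in_a, start_in_b, length)."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended to the left: not left-maximal.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the two sequences agree.
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length >= min_len:
                sub = a[i:i + length]
                if occurs_once(a, sub) and occurs_once(b, sub):
                    mums.append((i, j, length))
    return mums


print(find_mums("TTTACGTGGG", "CCACGTGAA", 4))  # [(3, 2, 5)]
```

The scan above takes time proportional to the product of the sequence lengths, which is why a suffix tree is needed for whole genomes.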

Whole replicon multiple alignment

• The program MAUVE
• Darling AC, Mau B, Blattner FR, Perna NT. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 2004 Jul;14(7):1394-403.

Main chromosome alignment MAUVE

Chromosome 2 alignment MAUVE

Chromosome alignment MAUVE

[Figure: alignment of strains RSA 493, RSA 331, and Dugway]

Genome Alignments MAUVE

How MAUVE works

• Seed-and-extend hashing
• Seeds/anchors: Maximal Multiple Unique Matches (multi-MUMs) of minimum length k
• Result: Locally collinear blocks (LCBs)
• O(G²n + Gn log Gn), G = # genomes, n = average genome length

Alignment algorithm

1. Find multi-MUMs
2. Use the multi-MUMs to calculate a phylogenetic guide tree
3. Find LCBs (subset of multi-MUMs; filter out spurious matches; requires minimum weight)
4. Recursive anchoring to identify additional anchors (extension of LCBs)
5. Progressive alignment (CLUSTALW) using guide tree
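
The idea behind step 3 can be illustrated for two genomes. This is a minimal sketch, assuming anchors are given as hypothetical (position in genome 1, position in genome 2, strand) tuples; it simply groups anchors that are consecutive and in consistent orientation in both genomes. It is a simplification and not Mauve's procedure, which handles several genomes and discards low-weight blocks.

```python
def collinear_blocks(anchors):
    """Group anchors (pos1, pos2, strand) into collinear blocks.

    strand is +1 for a direct match and -1 for a reverse match.
    Two neighbouring anchors in genome 1 stay in the same block when they
    are also neighbours in genome 2, in the order implied by their strand.
    """
    by_genome1 = sorted(anchors, key=lambda anchor: anchor[0])
    if not by_genome1:
        return []
    by_genome2 = sorted(anchors, key=lambda anchor: anchor[1])
    rank2 = {anchor: rank for rank, anchor in enumerate(by_genome2)}

    blocks = [[by_genome1[0]]]
    for prev, cur in zip(by_genome1, by_genome1[1:]):
        same_strand = prev[2] == cur[2]
        step = rank2[cur] - rank2[prev]
        if same_strand and step == cur[2]:
            blocks[-1].append(cur)   # still collinear: extend current block
        else:
            blocks.append([cur])     # breakpoint: start a new block
    return blocks


anchors = [(10, 10, 1), (20, 20, 1), (30, 50, -1), (40, 40, -1), (50, 60, 1)]
for block in collinear_blocks(anchors):
    print(block)
```

In this toy input the two reverse-strand anchors form their own block, which corresponds to an inverted segment between two direct blocks.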

Gene-centric comparisons

• Homologs: genes that have the same ancestor; in general they retain the same function
• Orthologs: homologs from different species (arise from speciation)
• Paralogs: homologs from the same species (arise from duplication)
  – Duplication before speciation (ancient duplication)
    • Out-paralogs; may not have the same function
  – Duplication after speciation (recent duplication)
    • In-paralogs; likely to have the same function
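
In practice, orthology between two genomes is often approximated by bidirectional best hits (BBHs), the criterion that appears in the proteome alignment slide further on. The following is a minimal sketch, assuming similarity scores (for example, BLAST bit scores) are already available as dictionaries; the data layout, gene names, and function names are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query, subject) -> similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(x_vs_y, y_vs_x):
    """Return sorted (x_gene, y_gene) pairs that are each other's best hit."""
    forward = best_hits(x_vs_y)
    backward = best_hits(y_vs_x)
    return sorted((x, y) for x, y in forward.items() if backward.get(y) == x)


x_vs_y = {("x1", "y1"): 900, ("x1", "y2"): 300,
          ("x2", "y2"): 500, ("x3", "y2"): 450}
y_vs_x = {("y1", "x1"): 880, ("y2", "x2"): 510, ("y2", "x3"): 440}
print(bidirectional_best_hits(x_vs_y, y_vs_x))
# [('x1', 'y1'), ('x2', 'y2')]
```

Here x3 has y2 as its best hit, but y2 prefers x2, so x3 is left without a BBH partner; it could be a paralog of x2 or a gene without an ortholog in the other genome.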

Gene Set Computations

• Given a set of genomes, represented by their ‘proteomes’ or sets of protein sequences
• Given homologous relationships (as given, for example, by OrthoMCL)
  – Which genes are shared by genomes X and Y?
  – Which genes are unique to genome Z?
  – Venn or extended Venn diagrams
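
The questions above reduce to set operations once each gene family is labeled with the genomes in which it occurs. This is a minimal sketch, assuming a hypothetical dictionary from family identifier to the set of genomes containing it; it does not read OrthoMCL output, and the family and genome names are invented for illustration.

```python
from collections import Counter


def shared_by(families, genomes):
    """Families present in every genome listed (and possibly in others)."""
    wanted = set(genomes)
    return sorted(fam for fam, present in families.items() if wanted <= present)


def unique_to(families, genome):
    """Families present in the given genome and in no other."""
    return sorted(fam for fam, present in families.items() if present == {genome})


def venn_counts(families):
    """Count families in each region of the Venn diagram."""
    return Counter(frozenset(present) for present in families.values())


families = {
    "fam1": {"X", "Y", "Z"},
    "fam2": {"X", "Y"},
    "fam3": {"Z"},
    "fam4": {"X"},
    "fam5": {"Y", "Z"},
}
print(shared_by(families, ["X", "Y"]))  # ['fam1', 'fam2']
print(unique_to(families, "Z"))         # ['fam3']
for region, count in venn_counts(families).items():
    print(sorted(region), count)
```

Each key of the counter corresponds to one region of a Venn diagram, so the same function works for more than three genomes, where a drawn diagram becomes hard to read.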

3-way genome comparison

[Figure: three-way comparison of genomes A, B, and C]

Copyright ©2004 by the National Academy of Sciences

Boussau, Bastien et al. (2004) Proc. Natl. Acad. Sci. USA 101, 9722-9727

Fig. 4. Net gene loss or gain throughout the evolution of the α-proteobacterial species

Proteome alignment done with LCS (longest common subsequence); top: Xcc, bottom: Xac

Blue: BBHs (bidirectional best hits) that are in the LCS; dark blue: BBHs not in the LCS; red: Xac specifics; yellow: Xcc specifics
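
A proteome alignment of this kind can be sketched by treating each genome as a list of gene labels in chromosomal order, with BBH partners sharing the same label, and computing a longest common subsequence. This is a minimal sketch under those assumptions, using the textbook dynamic programming algorithm; the labels and function name are hypothetical, and it ignores circularity and strand.

```python
def longest_common_subsequence(x, y):
    """Return one longest common subsequence of the sequences x and y."""
    n, m = len(x), len(y)
    # table[i][j] = LCS length of x[:i] and y[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if x[i] == y[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back through the table to recover the subsequence itself.
    result = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            result.append(x[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


genome_1 = ["g1", "g2", "g3", "g4", "g5"]
genome_2 = ["g1", "g2", "g5", "g3", "g4"]
lcs = longest_common_subsequence(genome_1, genome_2)
print(lcs)                                        # ['g1', 'g2', 'g3', 'g4']
print([g for g in genome_1 if g not in lcs])      # ['g5']
```

Genes in the LCS keep their relative order in both genomes, while a shared gene outside the LCS, such as g5 here, has changed position.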

What do the tables show?

• Conserved blocks (aka “microsyntenic regions”), and how these blocks appear in different replicons across the genomes compared

• Some of these blocks are not operons (would need to show strand)

• Possible block losses

Polymorphism detection

• Indels, SNPs
• Pseudogenes
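
Once two homologous sequences have been aligned, SNPs and indels can be read directly from the alignment columns. This is a minimal sketch, assuming a pairwise alignment given as two equal-length strings with "-" for gaps and no column that is gapped in both; the function name and the example alignment are hypothetical.

```python
def call_variants(ref_aln, qry_aln):
    """List SNPs and indels from a gapped pairwise alignment.

    Returns (snps, indels):
      snps   = [(ref_pos, ref_base, qry_base), ...]
      indels = [(ref_pos, "insertion" or "deletion", length), ...]
    ref_pos is the 0-based position in the ungapped reference.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    snps, indels = [], []
    ref_pos = 0
    col = 0
    while col < len(ref_aln):
        ref_base, qry_base = ref_aln[col], qry_aln[col]
        if ref_base == "-" or qry_base == "-":
            # A gap in the reference is an insertion in the query, and vice versa.
            gapped = ref_aln if ref_base == "-" else qry_aln
            kind = "insertion" if ref_base == "-" else "deletion"
            start = col
            while col < len(gapped) and gapped[col] == "-":
                col += 1
            length = col - start
            indels.append((ref_pos, kind, length))
            if kind == "deletion":
                ref_pos += length  # deleted bases still exist in the reference
            continue
        if ref_base != qry_base:
            snps.append((ref_pos, ref_base, qry_base))
        ref_pos += 1
        col += 1
    return snps, indels


snps, indels = call_variants("ACGT-ACGTAC", "ACCTTAC--AC")
print(snps)    # [(2, 'G', 'C')]
print(indels)  # [(4, 'insertion', 1), (6, 'deletion', 2)]
```

Within a coding region, an indel whose length is not a multiple of three shifts the reading frame, which is one common signature of a pseudogene.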

[Figure 4, panels I and II]

Gluconate isomerase – A Brucella gene in the process of decay
