IAENG_IMECS_ICB II, Room E 10:45~13:00, March 21, 2007, Hong Kong Pseudo-Reverse Approach in Genetic Evolution: An Empirical Study with Enzymes Sukanya Manna Cheng-Yuan Liou National Taiwan University Department of Computer Science and Information Engineering


TRANSCRIPT

Page 1: Sukanya Manna Cheng-Yuan Liou National Taiwan University        Department of Computer Science and         Information Engineering

IAENG_IMECS_ICB II, Room E, 10:45~13:00, March 21, 2007, Hong Kong

Pseudo-Reverse Approach in Genetic Evolution: An Empirical Study with Enzymes

Sukanya Manna, Cheng-Yuan Liou

National Taiwan University, Department of Computer Science and Information Engineering

Page 2

• NTU land area: ~360 square kilometers
• Huge botanic garden in high mountains (>3000 meters); NTU water penny beetle (台大扁泥蟲)
• Eleven colleges
• 54 departments
• 96 graduate institutes (which offer 96 Master's programs and 83 doctoral programs)
• Research centers: the Division of Population and Gender Studies, the Center for Condensed Matter Sciences, the Center for Biotechnology, the Japanese Research Center, and the Biodiversity Center
• The number of students reached 29,877 in 2004, including students from the Division of Continuing Education & Professional Development

Page 3

Concepts used

• Under neutral evolution
  – Rate of synonymous substitution = rate of nonsynonymous substitution
  – Estimation of the rates of synonymous and nonsynonymous substitution has become an important subject in molecular evolution

Page 4

Why?

• 'Draft' theory: an initial and intuitive evolution model
• Part of evolution is based on a set of core systems.
• They are relatively invariant (hard and strong) over evolution.
• Qualitative changes occur as distinct systems are integrated.
• Separate systems conjoin to produce distinctive patterns of evolutionary change.
• This model provides evolutionary flexibility.

Page 5

Assumptions

• For comparative genomics
  – Species that are not distantly related, like human and mouse, share the vast majority of their genes
• The amino acid sequences obtained for each enzyme share great similarity, like homologous genes

Page 6

Our Approach

Overview of the steps undertaken (in order):

• Obtain the amino acid (aa) sequences for each enzyme protein.
• Find the least mismatch between two aa sequences, and select the trio.
• Generate the nucleotide (nt) sequences for the aa sequences from the trio.
• Perform the dn/ds ratio test among the pairs of species with the randomly generated nt sequences.

Page 7

Our Approach (contd.)

Nt to AA:

AAT GAT TGT CAA GAG CAT AAG TTT TAT gives N D C Q E H K F Y

AA to nt (REVERSE):

N D C Q E H K F Y gives, among others,

AAT GAT TGT CAA GAG CAT AAG TTT TAT
AAC GAT TGC CAA GAA CAT AAG TTT TAT
AAT GAC TGT CAG GAG CAC AAG TTC TAT

All possible combinations: infeasible, high space and time complexity
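The forward direction on this slide (nt to aa) can be sketched in a few lines. This is only an illustration, assuming the standard genetic code; the names `CODON_TABLE` and `translate` are ours and do not come from the talk.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(nt):
    """Nt to AA: read the sequence three bases at a time ('*' marks a stop)."""
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt) - 2, 3))

# The three nucleotide sequences shown on the slide (spaces removed).
slide_sequences = [
    "AATGATTGTCAAGAGCATAAGTTTTAT",
    "AACGATTGCCAAGAACATAAGTTTTAT",
    "AATGACTGTCAGGAGCACAAGTTCTAT",
]
proteins = [translate(s) for s in slide_sequences]
```

All three sequences translate to the same amino acid sequence, which is why the reverse direction (aa to nt) has many possible answers.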

Page 8

Basic Concepts

• Nucleotides
  – A, G, T, C (DNA)
  – A, G, U, C (mRNA)
• Amino acid
  – 20 naturally occurring
  – Coded by a triplet of nucleotide bases (referred to as a codon)
• Synonymous / nonsynonymous substitution
  – A substitution of a base within the codon that does not / does change the amino acid it represents
• 4^3 = 64 codons, of which 61 code for the 20 amino acids
• 3 of the 64 codons are stop codons, which mark the end of the protein-coding sequence
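As a small illustration of the synonymous / nonsynonymous distinction, the sketch below classifies a single-base change. It assumes the standard genetic code; `classify_substitution` is a hypothetical helper name, and treating a change to a stop codon as nonsynonymous is a choice made for this sketch.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_substitution(codon, position, new_base):
    """Return 'synonymous' if the encoded amino acid is unchanged,
    otherwise 'nonsynonymous' (this includes changes to a stop codon)."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    same = CODON_TABLE[mutant] == CODON_TABLE[codon]
    return "synonymous" if same else "nonsynonymous"

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
amino_acids = set(AMINO) - {"*"}

third_base_change = classify_substitution("GAT", 2, "C")  # GAT (D) to GAC (D)
first_base_change = classify_substitution("GAT", 0, "A")  # GAT (D) to AAT (N)
```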

Page 9

Model Used

• Jukes and Cantor (one-parameter method)
  – Assumes the rate of substitution between all pairs of A, T, C, G is the same.
  – $$d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$$
    where p is either ps or pn (the result is ds or dn, respectively)
    • ps = Sd/S
    • pn = Nd/N
    • Sd / Nd: total number of synonymous / nonsynonymous differences for all codons compared
    • S / N: numbers of synonymous / nonsynonymous sites
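The Jukes and Cantor correction can be sketched as follows. The function names and the choice to return `None` are assumptions of this sketch; the point it illustrates is that the logarithm is undefined once p reaches 3/4, so no distance (and hence no ratio) can be computed in that case.

```python
from math import log

def jukes_cantor(p):
    """d = -(3/4) * ln(1 - (4/3) * p).
    Returns None when p is outside [0, 3/4), where the formula is undefined."""
    if p < 0 or p >= 0.75:
        return None
    return -0.75 * log(1 - 4 * p / 3)

def dn_ds_ratio(Sd, S, Nd, N):
    """dn/ds from difference counts (Sd, Nd) and site counts (S, N).
    Returns None if either distance is undefined or ds is zero."""
    ds = jukes_cantor(Sd / S)
    dn = jukes_cantor(Nd / N)
    if ds is None or dn is None or ds == 0:
        return None
    return dn / ds

# Hypothetical counts, for illustration only (not taken from the talk).
equal_rates = dn_ds_ratio(Sd=10, S=100, Nd=10, N=100)
saturated = dn_ds_ratio(Sd=90, S=100, Nd=10, N=100)
```

With equal proportions the ratio is 1, the neutral expectation; with ps at or above 3/4 the sketch reports that no ratio is available.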

Page 10

Our Approach (contd.)

• Normally, amino acid sequences are obtained from nucleotide sequences by using the universal genetic mapping table.

• Generating the nucleotide sequences from the amino acid sequences is the reverse process.

• For a particular amino acid sequence, there can be numerous nucleotide sequences, one for each possible combination of codons.

• But generating all of these sequences is infeasible because of the very large time and space complexity.

• Here we use this reverse mechanism to match the closely related nucleotide sequences of the respective amino acids.

• The next slide shows the method we used to handle this situation.
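To see why generating every sequence is infeasible, one can count the candidates without enumerating them. The sketch below assumes the standard genetic code; `count_reverse_translations` is a hypothetical helper name.

```python
from collections import Counter
from math import prod

# Standard genetic code in T, C, A, G order; '*' marks the three stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of codons that encode each amino acid (1 for M and W, up to 6 for L, S, R).
DEGENERACY = Counter(AMINO)

def count_reverse_translations(protein):
    """Number of distinct nt sequences encoding `protein`:
    the product of the codon counts of its amino acids."""
    return prod(DEGENERACY[aa] for aa in protein)

nine_residues = count_reverse_translations("NDCQEHKFY")
three_hundred_leucines = count_reverse_translations("L" * 300)
```

Even the nine-residue example from the earlier slide has several hundred encodings, and the count grows multiplicatively with every added residue, which motivates sampling a few sequences instead.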

Page 11

Our Approach (contd.)

[Figure: Comparison of Frequency of Codons. Chart; x-axis: Codons; y-axis: Frequency of Codons (0 to 40,000,000); series: Human, Mouse, Rat.]

• Calculated the total frequency of codons from each genome
• Calculated the cumulative probability of the codons from these frequencies
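Turning codon frequencies into cumulative probabilities can be sketched as below. The counts are made-up placeholders, not the genome totals from the chart, and `cumulative_probabilities` is a name chosen for this sketch; whether the talk accumulated over all codons or per amino acid is not stated, so the function simply works on whatever set of codons it is given.

```python
from itertools import accumulate

def cumulative_probabilities(codon_counts):
    """Map raw codon counts to (codon, cumulative probability) pairs,
    in alphabetical codon order; the last value is always 1.0."""
    codons = sorted(codon_counts)
    total = sum(codon_counts.values())
    running = accumulate(codon_counts[c] for c in codons)
    return [(c, r / total) for c, r in zip(codons, running)]

# Placeholder counts for the two aspartate codons (illustrative only).
toy_counts = {"GAT": 30, "GAC": 70}
toy_cumulative = cumulative_probabilities(toy_counts)
```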

Page 12

Our Approach (contd.)

• Generated the random sequences using the cumulative probability:
  – Best matched pairs
    • Generate sequences for the trio
  – All pairs with least mismatch
    • Generate sequences only for all the pairs
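A random nucleotide sequence can be drawn from cumulative probabilities by inverse sampling: draw a uniform number and pick the first codon whose cumulative probability exceeds it. The sketch below assumes the probabilities are accumulated per amino acid and uses uniform placeholder counts in place of real genome frequencies; all names are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def build_samplers(codon_counts):
    """For each amino acid, list its codons with their cumulative probabilities.
    Assumes every amino acid has at least one codon with a nonzero count."""
    by_aa = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            by_aa.setdefault(aa, []).append(codon)
    samplers = {}
    for aa, codons in by_aa.items():
        weights = [codon_counts.get(c, 0) for c in codons]
        total = sum(weights)
        samplers[aa] = (codons, [w / total for w in accumulate(weights)])
    return samplers

def random_nt_sequence(protein, samplers, rng):
    """Pick one codon per amino acid by inverse-CDF sampling."""
    out = []
    for aa in protein:
        codons, cum = samplers[aa]
        idx = min(bisect_right(cum, rng.random()), len(codons) - 1)
        out.append(codons[idx])
    return "".join(out)

# Uniform placeholder counts; real use would plug in genome codon frequencies.
placeholder_counts = {codon: 1 for codon in CODON_TABLE}
samplers = build_samplers(placeholder_counts)
protein = "NDCQEHKFY"
nt = random_nt_sequence(protein, samplers, random.Random(0))
back = "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt), 3))
```

Whatever codons are drawn, translating the generated sequence recovers the original amino acid sequence.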

Page 13

Our Approach (contd.)

A = [a1, a2, …, an]: aa sequences for HUMAN

B = [b1, b2, …, bm]: aa sequences for MOUSE

C = [c1, c2, …, ck]: aa sequences for RAT

• Calculate all possible mismatches between AB, BC and CA.
• aa sequences with least mismatch (rat sequences are written r here): a1b1, a1b3, a2b2, a1r2, a2r5, a1r1, b1r1, b1r2, b2r6
• Selecting the best matched pair: choose randomly such that the three pairs will be a1b1, b1r2 and a1r2; a1b1r2 is the trio.
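One way to read the trio selection is: find every (human, mouse, rat) triple whose three pairwise combinations all appear in the least-mismatch list, then choose one at random. The sketch below follows that reading, which is our interpretation of the slide; identifying the species by the label prefix (a, b, r) is an assumption of the sketch.

```python
import random
from itertools import product

def candidate_trios(pairs):
    """All (a, b, r) triples whose three pairs are in the least-mismatch list."""
    pair_set = {frozenset(p) for p in pairs}
    labels = {x for p in pairs for x in p}
    humans = sorted(x for x in labels if x.startswith("a"))
    mice = sorted(x for x in labels if x.startswith("b"))
    rats = sorted(x for x in labels if x.startswith("r"))
    return [
        (a, b, r)
        for a, b, r in product(humans, mice, rats)
        if {frozenset((a, b)), frozenset((b, r)), frozenset((a, r))} <= pair_set
    ]

# The least-mismatch pairs listed on the slide.
slide_pairs = [("a1", "b1"), ("a1", "b3"), ("a2", "b2"), ("a1", "r2"), ("a2", "r5"),
               ("a1", "r1"), ("b1", "r1"), ("b1", "r2"), ("b2", "r6")]
trios = candidate_trios(slide_pairs)
chosen = random.Random(0).choice(trios)  # random pick among the candidates
```

For the pairs on the slide this yields two candidates, one of which is the trio a1b1r2 named there.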

Page 14

Our Approach (contd.)

• Least mismatch means maximum similarity between the sequences.

• Let A, B, C be the amino acid sequences for human, mouse and rat respectively.

• We compare two sequences one amino acid at a time.

• Calculated the possible mismatches between all sequences.

• Separated out the ones with least mismatch.

• The example shown here is for the amino acid sequences of one particular enzyme.
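The mismatch comparison can be sketched as a position-by-position count. The talk does not say how sequences of different lengths are handled; counting the length difference as additional mismatches is an assumption of this sketch, and the sequences below are toy examples rather than real enzyme data.

```python
def mismatches(x, y):
    """Count positions where two aa sequences differ, one amino acid at a time.
    Any length difference is counted as extra mismatches (an assumption)."""
    common = min(len(x), len(y))
    differing = sum(1 for i in range(common) if x[i] != y[i])
    return differing + abs(len(x) - len(y))

def least_mismatch_pairs(first, second):
    """Return the smallest mismatch count and every index pair that attains it."""
    scores = {(i, j): mismatches(s, t)
              for i, s in enumerate(first)
              for j, t in enumerate(second)}
    best = min(scores.values())
    return best, [pair for pair, score in scores.items() if score == best]

# Toy sequences for two species (illustrative only).
human = ["NDCQEHKFY", "NDCQEHKFW"]
mouse = ["NDCQEHKFY", "MDCQEHKFW", "AAAA"]
best, pairs = least_mismatch_pairs(human, mouse)
```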

Page 15

Our Approach (contd.)

• Generalized algorithm
  – Pathway analysis by the model of Nei and Gojobori
  – No transition matrix used here
  – No phylogenetic tree for codon comparison
  – Sliding buffer of 3 characters used for codon comparison
  – Used Jukes and Cantor's model for multiple nucleotide substitution correction

Page 16

Our Approach (contd.)

AAT GAT TGT CAA GAG CAT AAG TTT TAT
AAT GAC TGT CAG GAG CAC AAG TTC TAT

• The sliding buffer compares one codon from each sequence at a time.
• Use Nei and Gojobori's model to calculate the pathways, and Jukes and Cantor's model to get dn/ds.
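The codon-by-codon comparison can be sketched in the spirit of Nei and Gojobori's counting method. This is a simplified illustration rather than the authors' program: it assumes the standard genetic code, equal-length in-frame sequences without stop codons, averaging of site counts over the two sequences, and skipping of mutational pathways that pass through a stop codon. All function names are ours.

```python
from itertools import permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_sites(codon):
    """Sum over positions of the fraction of base changes that keep the amino acid."""
    s = 0.0
    for i in range(3):
        for b in BASES:
            if b != codon[i] and CODON_TABLE[codon[:i] + b + codon[i + 1:]] == CODON_TABLE[codon]:
                s += 1 / 3
    return s

def codon_differences(c1, c2):
    """Synonymous and nonsynonymous differences, averaged over all pathways
    (orders in which the differing positions could have changed)."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    results = []
    for order in permutations(positions):
        current, sd, nd = c1, 0, 0
        for i in order:
            step = current[:i] + c2[i] + current[i + 1:]
            if CODON_TABLE[step] == "*":
                break  # skip pathways through a stop codon
            if CODON_TABLE[step] == CODON_TABLE[current]:
                sd += 1
            else:
                nd += 1
            current = step
        else:
            results.append((sd, nd))
    if not results:
        return 0.0, 0.0
    return (sum(r[0] for r in results) / len(results),
            sum(r[1] for r in results) / len(results))

def compare_sequences(seq1, seq2):
    """Slide a 3-character buffer over both sequences and accumulate the counts."""
    n_codons = min(len(seq1), len(seq2)) // 3
    S = Sd = Nd = 0.0
    for k in range(n_codons):
        c1, c2 = seq1[3 * k:3 * k + 3], seq2[3 * k:3 * k + 3]
        S += (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    N = 3 * n_codons - S
    return {"S": S, "N": N, "Sd": Sd, "Nd": Nd, "ps": Sd / S, "pn": Nd / N}

# The two sequences shown on this slide (spaces removed).
counts = compare_sequences("AATGATTGTCAAGAGCATAAGTTTTAT",
                           "AATGACTGTCAGGAGCACAAGTTCTAT")
```

The proportions ps and pn would then go through the Jukes and Cantor correction. Under the assumptions of this sketch, the short example pair has only synonymous differences, and its ps falls outside the range where that correction is defined.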

Page 17

Experimental Results (Best matched pairs)

[Figure: Comparison of dn/ds Ratio for the Enzymes found in all three Species. Chart; x-axis: Enzymes; y-axis: dn/ds Ratio (0 to 5); series: HM, MR, HR. Enzymes shown: Glutamate dehydrogenase (558), Acid phosphatase (157), Catalase (526), Transaldolase (337), Carboxylesterase (563), Glucose-6-phosphatase (357), Pyridoxal phosphatase (241), Pyruvate oxidase (392).]

dn/ds ratio of the Human-Mouse, Mouse-Rat and Human-Rat comparisons for the enzymes common to all three species.

Numbers in brackets are the lengths of the sequences compared.

Page 18

Experimental Results (contd.) (Best matched pairs)

[Figure: dn/ds Ratio for the Enzymes Not Common for Human-Mouse and Human-Rat Respectively. Chart; x-axis: Enzymes; y-axis: dn/ds Ratio (0 to 20); series: HM, HR. Enzymes shown: Peroxidase (223), Trypsin (246), Aminopeptidase (965), Malate dehydrogenase (333), Oligopeptidase A (686).]

dn/ds ratio of the Human-Mouse and Human-Rat comparisons for the enzymes not common to them.

Page 19

Experimental Results (contd.) (Best matched pairs)

[Figure: dn/ds Ratio for Mouse-Rat Comparison. Chart; x-axis: Enzymes; y-axis: dn/ds Ratio (0 to 14). Enzymes shown: Lactate dehydrogenase (332), Lipase (137), Hexokinase (298), Lysophospholipase (230), Pyruvate carboxylase (622), Aldehyde oxidase (1334), Glucose dehydrogenase (493).]

Valid dn/ds ratio of the Mouse-Rat comparison for the enzymes found only in these two species but not in Human.

Page 20

Experimental Results (contd.) (All pairs with least mismatch)

[Figure: Chart; x-axis: Enzymes; y-axis: dn/ds Ratio (0 to 0.35); series: HM, HR, MR. Enzymes shown: Glutamate dehydrogenase (558), Acid phosphatase (157), Catalase (526), Hexokinase (298), Glucose-6-phosphatase (357), Pyridoxal phosphatase (241), Pyruvate oxidase (392).]

dn/ds ratio of the Human-Mouse, Mouse-Rat and Human-Rat comparisons for the enzymes common to all three species.

This graph shows the enzymes with only one least-mismatch sequence pair for each species pair.

Page 21

Experimental Results (contd.) (All pairs with least mismatch)

[Figure: Two panels, Transaldolase and Carboxylesterase. Each plots dn/ds ratio (y-axis) against "No. of genes for each case" (x-axis); series: HM, HR, MR.]

Three-species comparison for enzymes with more than one least mismatch: dn/ds ratio of the human-mouse, mouse-rat and human-rat comparisons for the enzymes common to all. The graphs show the enzymes with multiple least-mismatch sequence pairs for each species pair. The x-axis label indicates the sequence pair number and is not significant.

Page 22

Experimental Results (contd.) (All pairs with least mismatch)

[Figure: Two panels, Trypsin and Alkaline phosphatase. Each plots dn/ds Ratio (y-axis) against the different gene pairs (x-axis).]

Enzymes found only for the Human-Mouse comparison

Page 23

Experimental Results (contd.) (All pairs with least mismatch)

[Figure: Two panels. One plots dn/ds Ratio (y-axis, 0 to 0.25) for the enzymes Lactate dehydrogenase, Lysophospholipase, Tyrosine and Pyruvate carboxylase. The other, Aldehyde oxidase, plots dn/ds Ratio (y-axis, 3.17 to 3.195) against the different gene pairs (1, 2).]

Enzymes found only for the Mouse-Rat comparison

Page 24

Experimental Results (contd.) (All pairs with least mismatch)

[Figure: Chart; x-axis: Enzymes (Ribonuclease, Oligopeptidase-A, Tyrosine); y-axis: dn/ds Ratio (0 to 40).]

Enzymes found only for the Human-Rat comparison

Page 25

Experimental Results (contd.) (All pairs with least mismatch)

Estimated time for aa substitution per for the enzymes

Page 26

Experimental Results (contd.) (All pairs with least mismatch)

[Figure: Chart; x-axis: Enzymes; y-axis: Time in Myr (0 to 500); series: HM, HR, MR. Enzymes shown: Glutamate dehydrogenase, Acid phosphatase, Catalase, Transaldolase, Carboxylesterase, Glucose-6-phosphatase, Pyridoxal phosphatase, Pyruvate oxidase.]

Estimated time for aa substitution per for the enzymes common in all three species

Page 27

Summary

• The rate of synonymous substitution varies considerably from gene to gene.
• Many enzymes, in spite of being proteins in nature, do not provide valid results.
• Accuracy rate is about 50% to 55%.
• The number of nonsynonymous sites was too high for some cases, so there was no valid result.

Page 28

Summary (contd.)

• For enzymes, the variation is high in comparison with ordinary proteins, as reported in the case study with ordinary proteins by Prof Li.

• Enzymes possess a restoration capability after chemical reactions, which means they can resist many mutations.

Page 29

Summary (contd.)

• In this work, the estimated time for mutation is around 5 times longer (~400 Myr) than for ordinary proteins.

• We can say that enzymes are 5 times stronger than ordinary proteins.

Page 30

Summary (contd.)

Comparison between already Established Result and Our Approach
(NVR: No Valid Results, H: Human, M: Mouse, R: Rat)

Enzymes | Li's Approach: codons compared (H-M/R) | Li's Approach: dn/ds ratio | Our Approach: codons compared (H-M) | Our Approach: dn/ds ratio (H-M) | Our Approach: codons compared (H-R) | Our Approach: dn/ds ratio (H-R)
Aldolase A | 363 | 0.03 | 363 | 0.10 | 363 | NVR
Creatine kinase M | 380 | 0.06 | 381 | 0.10 | 381 | 0.10
Lactate dehydrogenase A | 331 | 0.02 | 332 | 0.50 | 332 | 0.53
Glyceraldehyde-3-phosphate dehydrogenase | 332 | 0.09 | 332 | NVR | 332 | NVR
Glutamine synthetase | 371 | 0.08 | 372 | 0.10 | 372 | 0.11
Adenine phosphoribosyltransferase | 179 | 0.19 | 179 | NVR | 179 | NVR
Carbonic anhydrase I | 260 | 0.26 | 259 | NVR | 259 | 0.26

Page 31

Summary (contd.)

• None of the values can be considered to be accurate.
• All may vary with the parameters or the assumptions taken into account.
• We can only observe the nature of selection: whether neutral, purifying or diversifying.
• In this table, variations have occurred, but we don't know which pair of genes was taken by Prof Li.
• In our case, the random sequence generated might have varied a lot from what the nucleotide sequence for that gene originally was.
• NVR means no valid result.
• In these cases the ratio could not be calculated, as the value of ds obtained was not a valid number that could be computed.
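Reading the nature of selection from a dn/ds value can be sketched as below, using the usual interpretation (ratio below 1: purifying, about 1: neutral, above 1: diversifying). The `tolerance` around 1 and the function name are assumptions of this sketch, and `None` stands in for an NVR entry.

```python
def selection_regime(dn_ds, tolerance=0.05):
    """Classify a dn/ds ratio; None represents NVR (no valid result)."""
    if dn_ds is None:
        return "no valid result"
    if abs(dn_ds - 1.0) <= tolerance:
        return "neutral"
    return "purifying" if dn_ds < 1.0 else "diversifying"

# Example inputs: a small ratio, a ratio of exactly 1, a large ratio, and NVR.
examples = [selection_regime(x) for x in (0.10, 1.0, 3.18, None)]
```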

Page 32

Thank You

Suppl. materials on the website.

Evol model is Hairy model.