IAENG_IMECS_ICB II, Room E 10:45~13:00, March 21, 2007, Hong Kong
Pseudo-Reverse Approach in Genetic Evolution: An Empirical Study with Enzymes
Sukanya Manna, Cheng-Yuan Liou
National Taiwan University Department of Computer Science and Information Engineering
2
• NTU land size: ~360 square kilometers
• A huge botanic garden in high mountains (> 3,000 meters); the NTU water penny beetle (台大扁泥蟲)
• Eleven colleges
• 54 departments
• 96 graduate institutes (offering 96 master's programs and 83 doctoral programs)
• Research centers: the Division of Population and Gender Studies, the Center for Condensed Matter Sciences, the Center for Biotechnology, the Japanese Research Center, and the Biodiversity Center
• The number of students reached 29,877 in 2004, including students in the Division of Continuing Education & Professional Development
3
Concepts used
• Under neutral evolution
  – Rate of synonymous substitution = rate of nonsynonymous substitution
  – Estimating the rates of synonymous and nonsynonymous substitution has become an important subject in molecular evolution
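Because the two rates are equal under neutral evolution, the ratio ω = dn/ds is read against 1: near 1 suggests neutral evolution, below 1 purifying selection, and above 1 diversifying selection (the three cases named in the summary). The Python sketch below only makes that reading concrete; the function name and the fixed tolerance `tol` are hypothetical, and a real analysis would test the deviation from 1 statistically.

```python
def classify_selection(dn: float, ds: float, tol: float = 0.05) -> str:
    """Read omega = dn/ds against the neutral expectation omega = 1.

    `tol` is a hypothetical tolerance band around 1, not a statistical test.
    """
    if ds <= 0:
        raise ValueError("ds must be positive to form dn/ds")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        return "neutral"        # dn is about equal to ds
    if omega < 1.0:
        return "purifying"      # amino-acid changes are selected against
    return "diversifying"       # amino-acid changes are favoured


print(classify_selection(0.10, 1.00))  # purifying
print(classify_selection(0.50, 0.50))  # neutral
print(classify_selection(2.00, 0.50))  # diversifying
```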
4
Why?
• 'Draft' theory: an initial and intuitive evolution model
• Part of evolution is based on a set of core systems.
• These are relatively invariant (hard and strong) over evolution.
• Qualitative changes occur as distinct systems are integrated.
• Separate systems conjoin to produce distinctive patterns of evolutionary change.
• This model provides evolutionary flexibility.
5
Assumptions
• For comparative genomics
  – closely related species such as human and mouse share the vast majority of their genes
• The amino acid sequences obtained for each enzyme are highly similar across species, like those of homologous genes
6
Our Approach
Overview of the steps undertaken:
• Obtain the amino acid sequences for each enzyme protein.
• Find the least mismatch between two aa sequences and select the trio.
• Generate the nucleotide (nt) sequences for the aa sequences from the trio.
• Perform the dn/ds ratio test for each pair of species with the randomly generated nt sequences.
7
Our Approach (contd.)
Nt to AA:
AAT GAT TGT CAA GAG CAT AAG TTT TAT → NDCQEHKFY
AA to nt (reverse):
NDCQEHKFY →
AAT GAT TGT CAA GAG CAT AAG TTT TAT
AAC GAT TGC CAA GAA CAT AAG TTT TAT
AAT GAC TGT CAG GAG CAC AAG TTC TAT
…
All possible combinations: infeasible (high space and time complexity)
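The forward mapping (Nt to AA) is a table lookup, while the reverse mapping (AA to nt) branches at every degenerate amino acid. A minimal Python sketch, assuming the standard genetic code; the names `CODON_TABLE`, `translate` and `count_reverse_translations` are illustrative, not from the slides. It counts the nucleotide sequences that encode a protein without enumerating them.

```python
from collections import Counter
from math import prod

BASES = "TCAG"
# Standard genetic code in TCAG order: TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Degeneracy: how many codons encode each amino acid (e.g. L -> 6, M -> 1)
DEGENERACY = Counter(CODON_TABLE.values())


def translate(nt: str) -> str:
    """Nt to AA: read the sequence one codon (3 bases) at a time."""
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt) - 2, 3))


def count_reverse_translations(protein: str) -> int:
    """AA to nt: number of distinct nt sequences encoding `protein`."""
    return prod(DEGENERACY[aa] for aa in protein)


print(translate("AATGATTGTCAAGAGCATAAGTTTTAT"))  # NDCQEHKFY
print(count_reverse_translations("NDCQEHKFY"))    # 512
```

Every residue in NDCQEHKFY is encoded by exactly two codons, so even this 9-residue example has 2^9 = 512 encodings; for a 300-residue protein in which every residue were only twofold degenerate, the count would already be 2^300 ≈ 2 × 10^90.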
8
Basic Concepts
• Nucleotides
  – A, G, T, C (DNA)
  – A, G, U, C (mRNA)
• Amino acids
  – 20 naturally occurring
  – Coded by a triplet of nucleotide bases (referred to as a codon)
• Synonymous / nonsynonymous substitution
  – A substitution of a base within the codon that does not / does change the amino acid it encodes.
• 4^3 = 64 codons code for 20 amino acids.
• 3 of the 64 codons are stop codons, which mark the end of the protein-coding sequence (i.e., where translation terminates).
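A minimal Python sketch of the synonymous / nonsynonymous distinction, assuming the standard genetic code (64 codons, 3 of them stops). `classify_substitution` is a hypothetical helper: it changes one position of a codon and compares the encoded amino acids; a change to a stop codon is treated here as nonsynonymous.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks the 3 stop codons)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify replacing codon[position] (0, 1 or 2) by new_base."""
    if codon[position] == new_base:
        raise ValueError("new_base must differ from the current base")
    mutant = codon[:position] + new_base + codon[position + 1:]
    # Synonymous if the encoded amino acid is unchanged; a change to a
    # stop codon ('*') counts as nonsynonymous here.
    if CODON_TABLE[mutant] == CODON_TABLE[codon]:
        return "synonymous"
    return "nonsynonymous"


print(len(CODON_TABLE), sum(aa == "*" for aa in CODON_TABLE.values()))  # 64 3
print(classify_substitution("GAG", 2, "A"))  # synonymous (GAA is also Glu)
print(classify_substitution("GAG", 2, "T"))  # nonsynonymous (GAT is Asp)
```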
9
Model Used
• Jukes and Cantor (one-parameter method)
  – Assumes the rate of substitution between all pairs of A, T, C, G is the same.
  – $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$, where p is either ps or pn (giving ds and dn, respectively)
• ps = Sd/S
• pn = Nd/N
• Sd / Nd: total number of synonymous / nonsynonymous differences over all codons compared
• S / N: numbers of synonymous / nonsynonymous sites
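The Jukes and Cantor correction and the ratio it feeds can be sketched in a few lines of Python. The function names are hypothetical; the sketch returns `None` when 1 - 4p/3 ≤ 0 (that is, p ≥ 3/4), because the logarithm is then undefined. That is one way a computed distance can fail to be a valid number.

```python
import math
from typing import Optional


def jukes_cantor(p: float) -> Optional[float]:
    """Corrected distance d = -(3/4) ln(1 - 4p/3); None if undefined."""
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:          # p >= 3/4: the logarithm is undefined
        return None
    return -0.75 * math.log(arg)


def dn_ds_ratio(sd: float, s: float, nd: float, n: float) -> Optional[float]:
    """dn/ds from difference counts (Sd, Nd) and site counts (S, N)."""
    ps, pn = sd / s, nd / n             # ps = Sd/S, pn = Nd/N
    ds, dn = jukes_cantor(ps), jukes_cantor(pn)
    if ds is None or dn is None or ds == 0.0:
        return None                     # the ratio cannot be formed
    return dn / ds


# Hypothetical counts, for illustration only
print(round(jukes_cantor(0.1), 4))              # 0.1073
print(jukes_cantor(0.8))                        # None
print(round(dn_ds_ratio(10, 100, 5, 300), 3))   # 0.157
```

Note that the corrected distance is always at least as large as the raw proportion p, since the correction accounts for multiple substitutions at the same site.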
10
Our Approach (contd.)
• Normally, amino acid sequences are obtained from nucleotide sequences using the universal genetic code table.
• Generating nucleotide sequences from amino acid sequences is the reverse process.
• A given amino acid sequence can be encoded by numerous nucleotide sequences, one for each possible combination of codons.
• Generating all of these sequences is infeasible because of the very large time and space complexity.
• We use this reverse mechanism to obtain closely related nucleotide sequences for the respective amino acid sequences.
• The next slide shows the method we used to handle this situation.
11
[Figure: Comparison of frequency of codons. x-axis: codons; y-axis: frequency of codons; series: Human, Mouse, Rat]
• Calculated the total frequency of each codon in each genome.
• Calculated the cumulative probability of the codons from these frequencies.
Our Approach (contd.)
12
Our Approach (contd.)
• Generated the random sequences using the cumulative probability:
  – Best matched pairs: generate sequences for the trio
  – All pairs with least mismatch: generate sequences only for those pairs
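The sampling step can be sketched with inverse-CDF sampling: draw a uniform number and take the first codon whose cumulative probability exceeds it. This Python sketch assumes the cumulative probabilities are normalised within each amino acid's synonymous codons, so every generated sequence translates back to the input protein; the codon counts in the usage example are hypothetical placeholders for the genome-wide tallies, and the helper names are illustrative.

```python
import bisect
import random
from itertools import accumulate

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def build_cumulative(codon_counts: dict) -> dict:
    """Map each amino acid to (codons, cumulative probabilities).

    Assumption: probabilities are normalised within each amino acid's
    synonymous codons, using the genome-wide codon counts.
    """
    groups: dict = {}
    for codon, count in codon_counts.items():
        aa = CODON_TABLE[codon]
        if aa != "*":                       # never emit stop codons
            groups.setdefault(aa, []).append((codon, count))
    table = {}
    for aa, items in groups.items():
        total = sum(count for _, count in items)
        cum = list(accumulate(count / total for _, count in items))
        table[aa] = ([codon for codon, _ in items], cum)
    return table


def reverse_translate(protein: str, cumulative: dict, rng: random.Random) -> str:
    """Generate one random nt sequence encoding `protein`."""
    out = []
    for aa in protein:
        codons, cum = cumulative[aa]
        u = rng.random()                                   # uniform in [0, 1)
        idx = min(bisect.bisect_right(cum, u), len(codons) - 1)  # guard round-off
        out.append(codons[idx])
    return "".join(out)


# Hypothetical counts: every codon seen once, GAG (Glu) three times as often
counts = {codon: 1 for codon in CODON_TABLE}
counts["GAG"] = 3
cumulative = build_cumulative(counts)
seq = reverse_translate("NDCQEHKFY", cumulative, random.Random(2007))
print(seq)
```

Using `bisect_right` on the cumulative list keeps each draw at O(log k) for an amino acid with k synonymous codons, and the fixed seed makes a run reproducible.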
13
A = [a1, a2, …, an]: aa sequences for HUMAN
B = [b1, b2, …, bm]: aa sequences for MOUSE
C = [c1, c2, …, ck]: aa sequences for RAT (written r1, r2, … in the pairs below)
Calculate all possible mismatches between AB, BC and CA.
aa sequence pairs with least mismatch: a1b1, a1b3, a2b2, a1r2, a2r5, a1r1, b1r1, b1r2, b2r6
Selecting the best matched pair: choose randomly such that the three pairs are a1b1, b1r2 and a1r2; a1b1r2 is then the trio.
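Selecting a trio amounts to finding human, mouse and rat sequences whose three pairwise combinations all appear among the least-mismatch pairs, then picking one at random. A small Python sketch, where `find_trios` is a hypothetical helper and the pair lists are the example pairs on this slide:

```python
import random
from itertools import product


def find_trios(ab_pairs, br_pairs, ar_pairs):
    """All (a, b, r) whose three pairs are all least-mismatch pairs."""
    ar = set(ar_pairs)
    return sorted((a, b, r)
                  for (a, b), (b2, r) in product(set(ab_pairs), set(br_pairs))
                  if b == b2 and (a, r) in ar)


# Least-mismatch pairs from the example on this slide
ab = [("a1", "b1"), ("a1", "b3"), ("a2", "b2")]
ar = [("a1", "r2"), ("a2", "r5"), ("a1", "r1")]
br = [("b1", "r1"), ("b1", "r2"), ("b2", "r6")]

trios = find_trios(ab, br, ar)
print(trios)                            # [('a1', 'b1', 'r1'), ('a1', 'b1', 'r2')]
print(random.Random(0).choice(trios))   # one consistent trio, chosen at random
```

With these pairs, two consistent trios exist; the slide's a1b1r2 is one of them, and a2b2 cannot complete a trio because a2r6 is not a least-mismatch pair.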
14
Our Approach (contd.)
• Least mismatch means maximum similarity between the sequences.
• Let A, B, C be the amino acid sequences for human, mouse and rat, respectively.
• We compare two sequences one amino acid at a time.
• We calculated all possible mismatches between the sequences.
• We separated out the pairs with the least mismatch.
• The example shown is for the amino acid sequences of one particular enzyme.
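Comparing one amino acid at a time can be sketched as a position-by-position mismatch count. This Python sketch makes two assumptions not stated on the slides: sequences are not aligned, and residues beyond the shorter sequence count as mismatches. It keeps, for each sequence in one species, every partner with the fewest mismatches; the toy sequences are hypothetical.

```python
def mismatches(x: str, y: str) -> int:
    """Compare two aa sequences one residue at a time (no alignment).

    Assumption: residues beyond the shorter sequence count as mismatches.
    """
    return sum(p != q for p, q in zip(x, y)) + abs(len(x) - len(y))


def least_mismatch_pairs(xs: dict, ys: dict) -> list:
    """For each sequence in xs, keep every ys partner with the fewest mismatches."""
    pairs = []
    for i, x in xs.items():
        scores = {j: mismatches(x, y) for j, y in ys.items()}
        best = min(scores.values())
        pairs.extend((i, j) for j, score in scores.items() if score == best)
    return pairs


# Hypothetical toy sequences for one enzyme
human = {"a1": "NDCQEHKFY", "a2": "NDCQEHRFY"}
mouse = {"b1": "NDCQEHKFY", "b2": "NDCQEHRFY", "b3": "NDCQEHKFY"}
print(least_mismatch_pairs(human, mouse))
# [('a1', 'b1'), ('a1', 'b3'), ('a2', 'b2')]
```

Keeping ties is what allows one sequence to appear in several least-mismatch pairs (here a1b1 and a1b3), as in the pair list on the previous slide.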
15
Our Approach (contd.)
• Generalized algorithm
  – Pathway analysis by the model of Nei and Gojobori
  – No transition matrix used here
  – No phylogenetic tree for codon comparison
  – Sliding buffer of 3 characters used for codon comparison
  – Used Jukes and Cantor's model for multiple nucleotide substitution correction
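The site-counting half of the Nei and Gojobori model can be sketched with the 3-character sliding buffer: each codon contributes s synonymous sites (1/3 for every single-base change that keeps the amino acid) and 3 - s nonsynonymous sites. This Python sketch assumes the standard genetic code and counts changes to stop codons as nonsynonymous, which is one common convention; the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order ('*' = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites s of one codon (Nei-Gojobori style counting).

    Each of the 9 single-base changes adds 1/3 of a site if it keeps the
    amino acid. Assumption: changes to stop codons count as nonsynonymous.
    """
    aa = CODON_TABLE[codon]
    syn = sum(CODON_TABLE[codon[:p] + b + codon[p + 1:]] == aa
              for p in range(3) for b in BASES if b != codon[p])
    return syn / 3


def count_sites(nt: str) -> tuple:
    """Slide a 3-character buffer codon by codon; return (S, N)."""
    codons = [nt[i:i + 3] for i in range(0, len(nt) - len(nt) % 3, 3)]
    S = sum(synonymous_sites(c) for c in codons)
    return S, 3 * len(codons) - S


print(synonymous_sites("TTT"))  # 0.3333333333333333 (only TTC keeps Phe)
print(synonymous_sites("CTG"))  # 1.3333333333333333
print(count_sites("AATGATTGTCAAGAGCATAAGTTTTAT"))
```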
16
AAT GAT TGT CAA GAG CAT AAG TTT TAT
AAT GAC TGT CAG GAG CAC AAG TTC TAT
The sliding buffer compares the codons of the two sequences one codon at a time.
Use Nei and Gojobori's model to calculate the pathways and Jukes and Cantor's model to get dn/ds.
Our Approach (contd.)
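The codon-by-codon comparison can be sketched end to end in Python: sites are averaged over the two sequences, differences at each codon pair are averaged over all mutational pathways (pathways through stop codons are discarded), and Jukes and Cantor's correction turns the proportions into ds and dn. As the slides state, no transition matrix and no phylogenetic tree are used. This is our reading of the Nei and Gojobori procedure, not the authors' code; the function names are hypothetical, and counting changes to stop codons as nonsynonymous sites is an assumed convention.

```python
import math
from itertools import permutations
from typing import Optional

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites of a codon; stop changes count as nonsynonymous."""
    aa = CODON_TABLE[codon]
    syn = sum(CODON_TABLE[codon[:p] + b + codon[p + 1:]] == aa
              for p in range(3) for b in BASES if b != codon[p])
    return syn / 3


def codon_differences(c1: str, c2: str) -> tuple:
    """Average (sd, nd) over all mutational pathways from c1 to c2."""
    diff = [p for p in range(3) if c1[p] != c2[p]]
    sd_sum = nd_sum = 0.0
    n_paths = 0
    for order in permutations(diff):        # each order of changes is a pathway
        current, sd, nd = c1, 0, 0
        for p in order:
            nxt = current[:p] + c2[p] + current[p + 1:]
            if CODON_TABLE[nxt] == "*":
                break                       # pathway through a stop codon
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
        else:                               # pathway completed without a stop
            sd_sum, nd_sum, n_paths = sd_sum + sd, nd_sum + nd, n_paths + 1
    if n_paths == 0:
        raise ValueError(f"no valid pathway from {c1} to {c2}")
    return sd_sum / n_paths, nd_sum / n_paths


def jukes_cantor(p: float) -> Optional[float]:
    """d = -(3/4) ln(1 - 4p/3); None when p >= 3/4."""
    arg = 1.0 - 4.0 * p / 3.0
    return None if arg <= 0.0 else -0.75 * math.log(arg)


def compare(seq1: str, seq2: str) -> dict:
    """Codon-by-codon comparison (3-character sliding buffer) giving dn and ds."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must have equal length, a multiple of 3")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S, N = S + s, N + (3 - s)
        sd, nd = codon_differences(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    ds = jukes_cantor(Sd / S) if S > 0 else None
    dn = jukes_cantor(Nd / N) if N > 0 else None
    return {"S": S, "N": N, "Sd": Sd, "Nd": Nd, "ds": ds, "dn": dn}


r = compare("AATGATTGTCAAGAGCATAAGTTTTAT", "AATGACTGTCAGGAGCACAAGTTCTAT")
print(r["Sd"], r["Nd"], r["ds"])  # 4.0 0.0 None
```

On the two example sequences from this slide, all four differing codons are synonymous (Sd = 4, Nd = 0), but S is only about 3, so ps = Sd/S exceeds 3/4 and ds is undefined: a comparison this short cannot yield a valid dn/ds.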
17
Experimental Results (Best matched pairs)
dn/ds Ratio of the Human-Mouse, Mouse-Rat and Human-Rat Comparison for the Enzymes Common in all.
Numbers in brackets are the lengths of the sequences compared.
[Figure: Comparison of dn/ds ratio for the enzymes found in all three species. x-axis: enzymes; y-axis: dn/ds ratio; series: HM (human-mouse), MR (mouse-rat), HR (human-rat). Enzymes: Glutamate dehydrogenase (558), Acid phosphatase (157), Catalase (526), Transaldolase (337), Carboxylesterase (563), Glucose-6-phosphatase (357), Pyridoxal phosphatase (241), Pyruvate oxidase (392)]
18
Experimental Results (contd.)(Best matched pairs)
dn/ds Ratio of the Human-Mouse and Human-Rat Comparisons for the Enzymes Not Common to All Three Species.
[Figure: dn/ds ratio for the enzymes not common, for human-mouse and human-rat respectively. x-axis: enzymes; y-axis: dn/ds ratio; series: HM, HR. Enzymes: Peroxidase (223), Trypsin (246), Aminopeptidase (965), Malate dehydrogenase (333), Oligopeptidase A (686)]
19
Experimental Results (contd.)(Best matched pairs)
Valid dn/ds Ratio of the Mouse-Rat Comparison for the Enzymes found only in these two species but not Human
[Figure: dn/ds ratio for the mouse-rat comparison. x-axis: enzymes; y-axis: dn/ds ratio. Enzymes: Lactate dehydrogenase (332), Lipase (137), Hexokinase (298), Lysophospholipase (230), Pyruvate carboxylase (622), Aldehyde oxidase (1334), Glucose dehydrogenase (493)]
20
Experimental Results (contd.)(All pairs with least mismatch)
[Figure: dn/ds ratio for the enzymes common to all three species. x-axis: enzymes; y-axis: dn/ds ratio; series: HM, HR, MR. Enzymes: Glutamate dehydrogenase (558), Acid phosphatase (157), Catalase (526), Hexokinase (298), Glucose-6-phosphatase (357), Pyridoxal phosphatase (241), Pyruvate oxidase (392)]
dn/ds Ratio of the Human-Mouse, Mouse-Rat and Human-Rat Comparison for the Enzymes Common in all.
This graph shows the enzymes with only one least mismatch sequence pair for each species pair.
21
Experimental Results (contd.)(All pairs with least mismatch)
[Figure, two panels: Transaldolase and Carboxylesterase. x-axis: no. of genes for each case; y-axis: dn/ds ratio; series: HM, HR, MR]
For the three-species comparison: enzymes with more than one least-mismatch pair. dn/ds ratio of the human-mouse, mouse-rat and human-rat comparisons for the enzymes common to all three species. The graphs show the enzymes with multiple least-mismatch sequence pairs for each species pair. The x-axis labels only index the sequence pairs and carry no other meaning.
22
Experimental Results (contd.)(All pairs with least mismatch)
[Figure, two panels: Trypsin and Alkaline phosphatase. x-axis: different gene pairs; y-axis: dn/ds ratio]
Enzymes found only for Human-mouse comparison
23
Experimental Results (contd.)(All pairs with least mismatch)
[Figure, two panels. First: dn/ds ratio for Lactate dehydrogenase, Lysophospholipase, Tyrosine and Pyruvate carboxylase (x-axis: enzymes). Second: Aldehyde oxidase (x-axis: different gene pairs). y-axis: dn/ds ratio]
Enzymes found only for Mouse-rat comparison
24
Experimental Results (contd.)(All pairs with least mismatch)
[Figure: dn/ds ratio for Ribonuclease, Oligopeptidase-A and Tyrosine. x-axis: enzymes; y-axis: dn/ds ratio]
Enzymes found only for human-rat comparison
25
Experimental Results (contd.)(All pairs with least mismatch)
Estimated time for aa substitution for the enzymes
26
[Figure: estimated time for aa substitution. x-axis: enzymes; y-axis: time in Myr; series: HM, HR, MR. Enzymes: Glutamate dehydrogenase, Acid phosphatase, Catalase, Transaldolase, Carboxylesterase, Glucose-6-phosphatase, Pyridoxal phosphatase, Pyruvate oxidase]
Experimental Results (contd.)(All pairs with least mismatch)
Estimated time for aa substitution for the enzymes common to all three species
27
Summary
• The rate of synonymous substitution varies considerably from gene to gene.
• Many enzymes, despite being proteins, did not give valid results.
• The accuracy rate is about 50% to 55%.
• In some cases the nonsynonymous sites were too high, so no valid result could be obtained.
28
Summary (contd.)
• For enzymes, the variation is higher than for the ordinary proteins in Prof. Li's case study.
• Enzymes are restored to their original form after the chemical reactions they take part in, which means they can resist many mutations.
29
Summary (contd.)
• In this work, the estimated time for mutation is about 5 times longer (~400 Myr).
• We can say that enzymes are about 5 times more resistant to mutation than ordinary proteins.
30
Summary (contd.)
Enzyme | Li's approach: codons compared (H-M/R) | Li's approach: dn/ds | Our approach: codons compared (H-M) | Our approach: dn/ds (H-M) | Our approach: codons compared (H-R) | Our approach: dn/ds (H-R)
Aldolase A | 363 | 0.03 | 363 | 0.10 | 363 | NVR
Creatine kinase M | 380 | 0.06 | 381 | 0.10 | 381 | 0.10
Lactate dehydrogenase A | 331 | 0.02 | 332 | 0.50 | 332 | 0.53
Glyceraldehyde-3-phosphate dehydrogenase | 332 | 0.09 | 332 | NVR | 332 | NVR
Glutamine synthetase | 371 | 0.08 | 372 | 0.10 | 372 | 0.11
Adenine phosphoribosyltransferase | 179 | 0.19 | 179 | NVR | 179 | NVR
Carbonic anhydrase I | 260 | 0.26 | 259 | NVR | 259 | 0.26
Comparison between the established results and our approach
(NVR: no valid result; H: human, M: mouse, R: rat)
31
Summary (contd.)
• None of the values can be considered accurate.
• All may vary with the parameters or assumptions taken into account.
• We can only observe the nature of selection: whether neutral, purifying or diversifying.
• The values in this table differ from Li's, but we do not know which pairs of genes Prof. Li used.
• In our case, the randomly generated sequence may differ considerably from the gene's original nucleotide sequence.
• NVR means no valid result.
• In these cases the ratio could not be calculated because the value of ds obtained was not a valid, computable number (for example, the Jukes and Cantor distance is undefined when ps ≥ 3/4).
Thank you. Supplementary materials are available on the website. The evolution model is the Hairy model.