GENOMICS Vol. 79, Number 5, May 2002. Copyright © 2002 Elsevier Science (USA). All rights reserved. 0888-7543/02 $35.00


Article doi:10.1006/geno.2002.6753, available online at http://www.idealibrary.com on IDEAL

Search for Genes Positively Selected during Primate Evolution by 5′-End-Sequence Screening of Cynomolgus Monkey cDNAs

Naoki Osada,1,2,* Jun Kusuda,1 Makoto Hirata,1 Reiko Tanuma,1 Munetomo Hida,3 Sumio Sugano,3 Momoki Hirai,4 and Katsuyuki Hashimoto1

1 Division of Genetic Resources, National Institute of Infectious Diseases, Tokyo, Japan
2 Department of Biological Science, Graduate School of Science, The University of Tokyo, Tokyo, Japan
3 Department of Genome Structure Analysis, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
4 Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan

*To whom correspondence and reprint requests should be addressed. Fax: (+81) 3-5285-1181. E-mail: [email protected].

It is possible to assess positive selection by using the ratio of Ka (nonsynonymous substitutions per plausible nonsynonymous site) to Ks (synonymous substitutions per plausible synonymous site). We searched for candidate genes positively selected during primate evolution by using 5′-end sequences of 21,302 clones derived from cynomolgus monkey (Macaca fascicularis) brain cDNA libraries. Among these candidates, 10 genes that had not been shown by previous studies to undergo positive selection exhibited a Ka/Ks ratio > 1. Of the 10 candidate genes we found, 5 were included in the mitochondrial respiratory enzyme complexes, suggesting that these nuclear-encoded genes coevolved with mitochondrial-encoded genes, which have high mutation rates. The products of the other candidate genes consisted of a cell-surface protein, a member of the lipocalin family, a nuclear transcription factor, and hypothetical proteins.

Key Words: primate evolution, positive selection, cDNA library, oligo-capping, cynomolgus monkey

INTRODUCTION

The recent increase in the number of gene and genome sequence databases provides an opportunity to elucidate human evolution at the molecular level by making large-scale comparisons between the genes or genomes of humans and other primates. Although higher primates possess highly conserved genomes, the diversity and specificity of appearance and behavior in humans especially have attracted the attention of many researchers [1]. Positive selection is one key to understanding the functions and roles of proteins from an evolutionary standpoint, because the evidence for positive selection of genes indicates episodic changes in protein function. The genes positively selected during the evolution of primates are responsible for the process of human evolution, which is referred to as hominization. Identification of the genes that have played important roles in human evolution is essential to understanding human beings.

In the widely accepted neutral theory of evolution by Kimura [2], the bulk of DNA divergence between species is driven by mutation and drift, rather than by positive selection. Although biologists generally accept positive selection with respect to morphological traits, evidence of adaptive evolution at the molecular level has been elusive. However, by using rapidly growing nucleotide sequence data and novel methods, several investigations have suggested that many genes, such as those encoding lysozyme [3], sperm lysins [4], major histocompatibility complex (MHC [5]), and protamine [6,7], are under positive selection at the molecular level. Positive selection can be assessed by using the rate of nonsynonymous nucleotide substitutions per nonsynonymous site (Ka) and the rate of synonymous nucleotide substitutions per synonymous site (Ks). When positive selection is strong enough, Ka may exceed Ks (Ka/Ks > 1).
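As a minimal illustration of the Ka/Ks criterion, the following Python sketch computes the two rates and their ratio from raw counts. The function name, the argument names, and the example counts are hypothetical, and the plain per-site proportions used here omit the multiple-hit correction that a real estimator would apply.

```python
import math

def ka_ks_ratio(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Return (Ka, Ks, Ka/Ks) from raw counts (uncorrected proportions)."""
    ka = nonsyn_subs / nonsyn_sites
    ks = syn_subs / syn_sites
    if ks == 0:
        # No synonymous substitutions: the ratio is unbounded when Ka > 0
        # and undefined when Ka is also zero.
        ratio = math.inf if ka > 0 else math.nan
    else:
        ratio = ka / ks
    return ka, ks, ratio

# Hypothetical counts: 6 nonsynonymous changes over 300 sites,
# 1 synonymous change over 100 sites.
ka, ks, ratio = ka_ks_ratio(6, 300, 1, 100)
print(f"Ka = {ka:.3f}, Ks = {ks:.3f}, Ka/Ks = {ratio:.2f}")
```

A ratio above 1 in such a calculation is only a starting point; with few substitutions the estimate carries a large sampling error.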

Endo et al. [8] searched for positively selected genes among 3,595 groups of homologous genes and found only 17 groups in which Ka/Ks values were > 1 for more than half of the gene pairs, making them candidates for positive selection. Liberles et al. [9] recently constructed a database that serves as a phylogenic molecular catalog widely covering GenBank DNA sequences, and they calculated the Ka/Ks value of each phylogenic branch above the chordata and embryophyta. However, the number of nonhuman primate DNA sequences deposited in the public databases is considerably smaller than that of human sequences. Indeed, the entries of nonhuman primate nucleotide sequences in GenBank (release 122) amount to < 0.3% of the human entries. We previously constructed oligo-capped cDNA libraries of the cynomolgus monkey brain and accumulated ~20,000 5′-end sequences [10,11]. In this study, we calculated the Ka/Ks value for each coding region of the 5′-end sequence and its human ortholog, and systematically identified 10 candidate genes under positive selection during the evolution of hominids (represented by humans) and Old World monkeys (represented by the cynomolgus monkey).

TABLE 1: Genes showing Ka/Ks values > 1

| Monkey Acc. | RefSeq Acc. | Symbol | Description | Ka (×10²) | Ks (×10²) | Ka/Ks | Amino acid length |
|---|---|---|---|---|---|---|---|
| AB072015 | NM_004074 | COX8 | Cytochrome oxidase subunit VIII | 11.93±2.23 | 6.37±2.53 | 1.87* | 69 |
| AB072016 | NM_004549 | NDUFC2 | NADH-ubiquinone oxidoreductase subunit b14.5b | 8.75±1.54 | 7.37±2.57 | 1.19 | 119 |
| AB072017 | NM_000611 | CD59 | CD59 antigen, allele A | 8.27±1.41 | 4.90±1.26 | 1.69* | 128 |
| AB072018 | | | CD59 antigen, allele B | 8.24±1.43 | 7.51±1.99 | 1.10 | 128 |
| AB072019 | NM_024300 | MGC2217 | Hypothetical protein MGC2217 | 7.89±1.76 | 2.79±1.47 | 2.82* | 85 |
| AB072020 | NM_001867 | COX7C | Cytochrome oxidase subunit VIIc | 7.47±1.97 | 0 | ∞†*** | 63 |
| AB072021 | NM_001647 | APOD | Apolipoprotein D | 6.94±1.10 | 5.20±1.41 | 1.34 | 189 |
| AB072022 | NM_002432 | MNDA | Myeloid cell nuclear differentiation antigen | 6.44±0.75 | 6.32±1.17 | 1.02 | 407 |
| AB072023 | NM_017866 | FLJ20533 | Hypothetical protein FLJ20533 | 5.05±1.06 | 4.36±1.19 | 1.16 | 161 |
| AB072025 | NM_001685 | ATP5J | ATP synthase, subunit F6 | 4.44±1.26 | 4.12±2.11 | 1.08 | 108 |
| AB072026 | NM_006004 | UQCRH | Ubiquinol-cytochrome c reductase hinge protein | 4.09±1.35 | 3.60±2.30 | 1.14 | 91 |

The standard errors are shown. The difference between Ka and Ks is significant at the 5% level (*) and at the 0.1% level (***). The last three genes (FLJ20533, ATP5J, and UQCRH; below the dashed line in the original table) showed Ka lower than the averaged Ks.
†There are no synonymous substitutions, but eight nonsynonymous substitutions.

RESULTS

First, we searched 21,302 5′-end sequences of cDNAs from cynomolgus monkey brain, cerebellar cortex (QccE), and parietal lobe (QnpA) against the NCBI RefSeq database [12] by using the BLAST program (cut-off value: 1e-60). Repeat sequences at the 5′ end were masked with Repbase Update [14] before the database search. In total, 12,815 sequences (60.2% of the 5′-end sequences) matched 3478 RefSeq sequences (average redundancy, 3.7). Each RefSeq sequence of the protein-coding region and the corresponding sequence of the 5′ end of monkey cDNA were then aligned using the Smith-Waterman algorithm, and the Ka/Ks value of each alignment was subsequently estimated. The average fragment length compared was 424 bp. As expected, many false-positive alignments yielded Ka/Ks values > 1 because of ambiguous nucleotide sequences or BLAST mismatches to closely related paralogous genes. We checked all possible alignments to eliminate incorrect ones. Where necessary, and only when the full-length cDNA could be obtained from our libraries, we selected one clone and determined the entire coding sequence of the monkey cDNA. During this process, MHC sequences were removed from the subsequent analysis, because they have already been shown to evolve at a rapid rate, and many monkey homologs have been registered in the public databases. One hypothetical protein-coding gene, GW128, which showed a Ka/Ks value > 1, contained an Alu-type repeat in the coding sequence (CDS). We considered the RefSeq annotation of GW128 to be inadequate and removed it from further analysis. We found no other repetitive sequences or CpG dinucleotide biases, which might elevate the mutation rate of the genes, among the remaining candidate genes. Ultimately, we obtained 10 novel pairs of genes with Ka/Ks values > 1, or with nonsynonymous substitutions but no synonymous substitutions. The genes with Ka/Ks values > 1 are listed in Table 1 and arranged according to their Ka values. All outputs of the program, such as the CDS alignments of paired genes, are accessible at our web site (http://www.nih.go.jp/yoken/genebank/Supplementary_data/KaKs/).

Interestingly, 5 of the 10 genes (ATP5J, COX7C, COX8, NDUFC2, and UQCRH) are related to mitochondrial energy metabolism, and all of them are components of complexes consisting of nuclear-encoded and mitochondrial-encoded subunits. APOD (a member of the lipocalin family), CD59 (a cell-surface protein), and MNDA (a nuclear transcription factor) also yielded Ka/Ks values > 1. The CD59 sequence of cynomolgus monkey clearly represented two types of alleles, between which three synonymous substitutions and one nonsynonymous substitution were observed, and both alleles yielded Ka/Ks > 1.

FIG. 1. Phylogenic trees of COX7C (A) and COX8 (B). Ka and Ks (Ka/Ks) values multiplied by 100 are shown above each branch. The branches yielding Ka values higher than Ks values are shown as bold lines.

In addition, we found two hypothetical proteins having Ka/Ks values > 1. Those hypothetical genes were discovered through recent large-scale cDNA sequencing projects, and most of their proteins have been conceptually translated. Although these coding sequences were confirmed by cDNA or expressed-sequence tag (EST) sequences alone, the open reading frame (ORF) conservation between humans and cynomolgus monkeys may help predict the protein-coding sequences in cDNA [13].

Among the 10 genes, we conducted further phylogenic analysis for cytochrome c oxidase subunits VIIc (COX7C) and VIII (COX8) to evaluate our Ka/Ks screening method. Although most COX subunits, namely COX1 [15,16], COX2 [17,18], COX4 [19,20], COX6A [21], and COX7A1 [22], as well as cytochrome c itself [23], have been shown to have coevolved at variable rates in their lineages during primate evolution, no reports have indicated an accelerated rate of nonsynonymous substitution in COX7C and COX8. COX7C is a single-copy gene, and although COX8 has two isoforms in bovines, only the liver-type isoform has been isolated in primates [24,25]. We amplified the genomic regions of COX7C and COX8 from an additional five primate species (bonobo, chimpanzee, orangutan, gorilla, and baboon) by PCR and sequenced the coding regions. We then reconstructed the ancestral sequence of each lineage by the parsimony method of the PAUP* program, using a total-evidence primate phylogeny [26,27] and other mammalian sequences available in the public databases: the bovine and mouse sequences for COX7C, and the bovine, rat, and mouse sequences for COX8 (Figs. 1A and 1B). The results showed that the rate acceleration of both COX7C and COX8 mainly appeared after the divergence of hominids and Old World monkeys, indicating that strong positive selection had occurred. Because COX7C and COX8 are rather small proteins, many lineages showed no synonymous substitutions.


DISCUSSION

In this study, we screened the cynomolgus monkey brain cDNA libraries by using 5′-end sequences. Although the libraries have already been searched for unidentified human genes [11,12], this was the first attempt to analyze them from an evolutionary standpoint. We found 10 candidate genes undergoing positive selection during the evolution of hominids and Old World monkeys.

Because the Ks of small genes varies widely, Ka/Ks values > 1 may be observed as statistical noise. In this analysis, statistical significance was observed in only 4 of the 10 candidate genes because of the relatively high standard errors. To reduce statistical noise, we could apply an averaged Ks value. Because we had obtained > 300 full-length sequences of cynomolgus monkey cDNAs that have human orthologs in the RefSeq database in previous studies [11,12], we could estimate the average number of synonymous substitutions per site between humans and cynomolgus monkeys. By using this data set, we estimated that the averaged Ks value is roughly 0.064 (unpublished data). Although this does not account for region-specific mutation rates, the variation is probably smaller than the statistical noise associated with small genes. If the individual Ks values are replaced with the averaged Ks value, the last three genes listed in Table 1 (FLJ20533, ATP5J, and UQCRH) do not show higher Ka values. However, it is noteworthy that these genes still have some potential to have undergone positive selection. We should also note that we used cDNA libraries derived from two monkeys. Polymorphic sites in cynomolgus monkeys may falsely raise the nonsynonymous substitution rate between monkeys and humans.
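The effect of the standard errors on significance can be illustrated with a short sketch. The text does not state which test was used, so the one-sided z-test below, which treats the Ka and Ks estimates as independent and normally distributed, is an assumption made for illustration, and the function names are hypothetical. The input values are the Ka and Ks entries (×10²) for two genes in Table 1.

```python
import math

def one_sided_z(ka, ka_se, ks, ks_se):
    """z statistic for Ka > Ks, treating the two estimates as independent."""
    return (ka - ks) / math.sqrt(ka_se ** 2 + ks_se ** 2)

def upper_tail_p(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# (Ka, SE of Ka, Ks, SE of Ks), all x 10^2, taken from Table 1.
genes = {
    "MGC2217": (7.89, 1.76, 2.79, 1.47),
    "NDUFC2": (8.75, 1.54, 7.37, 2.57),
}
for name, (ka, ka_se, ks, ks_se) in genes.items():
    z = one_sided_z(ka, ka_se, ks, ks_se)
    print(f"{name}: z = {z:.2f}, one-sided p = {upper_tail_p(z):.3f}")
```

Under these assumptions MGC2217 falls below the 5% level and NDUFC2 does not, which matches the markers in Table 1; the large standard error of Ks for NDUFC2 dominates the denominator even though its Ka is the higher of the two.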

Positive selection affects the part of the protein sequence that alters protein function, rather than the entire coding sequence uniformly. Because the Ka/Ks values were calculated over the entire protein sequence in our analysis, the regional mutations influencing Ka/Ks values in larger protein-encoding sequences may have been missed. It may be possible to overcome this problem by window analysis, which sets a narrow window to calculate the local Ka/Ks values [8]. Window analysis can be effective in more comprehensive analyses of long sequences such as full-length cDNA comparisons.
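A window analysis of this kind can be sketched as follows. This is a minimal illustration rather than the procedure of Endo et al. [8]: the function name, the per-codon input format, and the window and step sizes are all hypothetical, and the per-codon counts are assumed to come from whatever Ka/Ks counting method is in use.

```python
def window_ka_ks(per_codon, window=30, step=10):
    """Yield (start_codon, Ka, Ks) for each sliding window.

    per_codon is a list with one tuple per aligned codon:
    (nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites).
    """
    for start in range(0, len(per_codon) - window + 1, step):
        chunk = per_codon[start:start + window]
        nonsyn_diffs = sum(c[0] for c in chunk)
        nonsyn_sites = sum(c[1] for c in chunk)
        syn_diffs = sum(c[2] for c in chunk)
        syn_sites = sum(c[3] for c in chunk)
        # Uncorrected proportions; NaN marks windows without usable sites.
        ka = nonsyn_diffs / nonsyn_sites if nonsyn_sites else float("nan")
        ks = syn_diffs / syn_sites if syn_sites else float("nan")
        yield start, ka, ks

# Toy alignment of 40 codons: a conserved stretch, a stretch with
# nonsynonymous changes, then a stretch with synonymous changes.
toy = [(0, 2, 0, 1)] * 20 + [(1, 2, 0, 1)] * 10 + [(0, 2, 1, 1)] * 10
for start, ka, ks in window_ka_ks(toy, window=10, step=10):
    print(start, ka, ks)
```

Narrow windows localize the signal but contain few sites, so the statistical noise discussed for small genes applies to each window as well.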

Of the 10 novel pairs of genes we identified, 5 (ATP5J, COX7C, COX8, NDUFC2, and UQCRH) were included in the mitochondrial respiratory enzyme complexes. NDUFC2, UQCRH, and ATP5J are subunits of complexes I, III, and V, respectively, whereas COX7C and COX8 are components of complex IV. These four complexes are mixtures of proteins encoded by both mitochondrial DNA and nuclear DNA, and accelerated substitution rates have been reported for MTCYB in complex III [28] and for some COX subunits in complex IV. Many studies have inferred the occurrence of coevolution between mitochondrial-encoded genes and nuclear-encoded genes [16,29,30]. To maintain their functional cohesion, proteins that interact with each other undergo coevolution. Schmidt et al. [30] calculated the correlation between the physical distance from each residue of COX subunits to the residues of other subunits and the nonsynonymous substitution, and found that the nuclear-encoded residues in close physical proximity to mitochondrial-encoded residues evolve more slowly than the other nuclear-encoded residues, whereas the mitochondrial-encoded residues in close physical proximity to nuclear-encoded residues evolve more rapidly than the other mitochondrial-encoded residues. The mitochondrial genome mutates rapidly because it replicates frequently, lacks DNA proofreading and repair mechanisms, and is vulnerable to cumulative damage by free radicals generated by the electron transport chain. The 5- to 10-fold faster mutation rate of the mitochondrial genome may accelerate the evolutionary rate of nuclear DNA. This hypothesis applies not only to genetically mixed-origin complexes but also to genes that directly interact with mitochondrial-encoded genes, such as the gene for nuclear-encoded cytochrome c.

Moreover, brain oxidative metabolism is important for brain function and activity, and mounting evidence suggests that defects in energy metabolism contribute to the pathogenesis of brain diseases such as Alzheimer’s disease and Parkinson’s disease [31,32]. In our own list of genes, the protein level of APO-D has also been reported to increase in the cerebrospinal fluid and hippocampus of Alzheimer’s patients [33]. It is interesting that the proteins responsible for brain function have been positively selected during primate evolution.

APO-D is a member of the lipocalin family and is primarily associated with the high-density lipoproteins in human plasma [34]. Although APO-D can bind many ligands, such as cholesterol, progesterone, pregnenolone, bilirubin, and arachidonic acid, it is unclear whether any or all of these represent its physiological ligands. Among apolipoprotein genes, APOA also exhibits a peculiar evolutionary history. It is thought to have been generated by gene duplication of plasminogen after primate divergence and also to have evolved convergently in the hedgehog lineage [35]. Moreover, Huby et al. [36] showed that the chimpanzee APOA promoter exhibited a fivefold elevation in transcriptional activity as compared with its human counterpart. Another apolipoprotein, APO-E, has also been shown to be the major risk factor for Alzheimer’s disease [37], and the APOE2 and APOE3 alleles [38] seem to have arisen after the divergence of humans and chimpanzees (chimpanzees express only the ancestral APOE4 allele). It would be interesting to examine whether the structure and function of other lipoproteins have been conserved or altered during primate evolution.

cDNA sequences for hypothetical proteins from species closely related to humans may become effective references for human cDNA annotation [12]. Because some cDNA clones are truncated at the 5′ end and part of the 3′-untranslated region (UTR), or carry an unspliced intron, they may have no apparent ORFs, making it difficult to identify the true coding region. If the ORF of an unidentified human cDNA is conserved in the monkey cDNA sequence, the ORF may encode a protein. Although this type of analysis is often accomplished using DNA sequences of other species, such as the mouse, primate cDNAs are more useful resources because of their genetically close relation to humans. Even in the absence of information about the function of a hypothetical protein, evidence of positive selection may provide clues to elucidate the role and function of the protein.

In this study, we subjected COX subunit genes to phylogenic analysis for the following reasons. First, evidence has been found for positive selection of the other COX subunits and of cytochrome c. Second, COX is one of the best-characterized complexes, and the crystal structures of all its subunits have been determined [39]. In the literature, COX1, COX2, COX4, COX6A, and COX7A1 are reported to exhibit higher rates of amino acid replacement in primate evolution. COX7C and COX8 have also evolved rapidly in simian primates and show some of the highest rates of nonsynonymous substitution among nuclear-encoded COX subunits.

Although our candidate genes require additional analysis to confirm the evolutionary processes in detail, they may help us understand primate evolution. Recently, some studies have provided large-scale sequence data on primates [40,41]. However, the genomic and cDNA sequences of primates are still very scarce, although great technological strides have been made in the analysis of the human genome sequence. Advances in primate genetic information will illuminate episodes of evolution among the extant primate species, including human beings.

MATERIALS AND METHODS

Construction of oligo-capped cDNA libraries. RNA extraction from cynomolgus monkey tissues and construction of oligo-capped cDNA libraries have been described [10,42].

DNA sequencing. The entire sequence of each clone was determined on ABI 3700 and 310 automated sequencers (Perkin-Elmer) by the primer walking method. Cycle sequencing was done with an ABI PRISM BigDye Terminator Sequencing kit (Perkin-Elmer).

PCR. Genomic DNA samples from chimpanzees were obtained from the peripheral blood of adult individuals. The DNA samples from the bonobo, gorilla, and orangutan were obtained from Epstein–Barr virus (EBV)–transformed lymphoblastoid cells established by Takafumi Ishida (The University of Tokyo). Genomic DNA from baboon was purchased from Research Genetics Inc. (RH05CONBAB). Several primers were designed to amplify the exons of the COX7C and COX8 genes from genomic DNA. Two pairs of primers (COX7C, forward, 5′-CGCAGAGCTTCCAGCAGC-3′, reverse, 5′-TTGATCCATTTCCAGAGGCTG-3′; COX8, forward, 5′-GGCTGACAGCTTTTTGTGGTGT-3′, reverse, 5′-TTTATTGTTACAAGGGGGATCC-3′), based on the clone sequences of the cynomolgus monkey, were used except for some species described below. For COX7C of the orangutan, each exon was amplified individually using two pairs of primers (COX7C forward 2, 5′-TGTCCTTTTCAACAGACTAGAGTATACG-3′, and reverse 2, 5′-TCTGGCCTCCTGTAGGTCTC-3′; forward 3, 5′-CCATTCATTGCCTAGTGCTC-3′, and reverse 3, 5′-TGCTAGAGAACATGAGATATCGATGTAA-3′). COX7C forward 2 and COX7C reverse 2, and COX7C forward 4 (5′-CCATTTCATCCATCCTCGGT-3′) and COX7C reverse 3, were used to amplify the baboon COX7C genomic region. Annealing temperatures varied from 50°C to 58°C. Sequences of COX7C and COX8 in chimpanzee, bonobo, gorilla, orangutan, and baboon were deposited in the DDBJ database (accession numbers: AB072312–AB072331).


Computational analysis. The number of ambiguous nucleotides in the 5′-end one-pass sequences was limited to two, and the sequence from the third N nucleotide onward was cut off. Each 5′-end sequence of monkey brain cDNA was searched against the RefSeq database at NCBI, and the sequences were then aligned by the Smith-Waterman algorithm in the FASTA program package [43]. When the alignment program inserted gaps in human RefSeq sequences, both the human gap and the monkey insertion were skipped in the calculations. We calculated the Ka/Ks values by using the algorithm of Li [44], which counts the number of degenerate sites and estimates Ka/Ks values, with a slight modification. The method of Li uses an empirically derived relative likelihood of codon change to weight different pathways. However, we weighted the possible paths equally, excluding those that contained a stop codon as an intermediate step. When the synonymous substitution rate was too high, or when a monkey cDNA sequence and a human RefSeq sequence were mapped to different human chromosome loci by BLAT mapping at UCSC (http://genome.ucsc.edu/), we assumed that the two sequences were paralogous. The threshold of the synonymous substitution rate was determined by using the RNASE2 sequences of humans and cynomolgus monkeys, which show the highest variation (Ks = 0.171) among experimentally confirmed homologs of humans and cynomolgus monkeys in the GenBank database [45].
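The equal weighting of mutational pathways can be sketched as follows. This is a minimal illustration of that single step, not the full method of Li [44]: the counting of degenerate sites and the distance correction are not shown, the standard genetic code is assumed, and the function and variable names are hypothetical.

```python
from itertools import permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def classify_codon_pair(codon1, codon2):
    """Return (synonymous, nonsynonymous) changes between two codons,
    averaged with equal weight over all pathways that do not pass
    through a stop codon as an intermediate step."""
    diff = [i for i in range(3) if codon1[i] != codon2[i]]
    syn_total, nonsyn_total, n_paths = 0, 0, 0
    for order in permutations(diff):
        current, syn, nonsyn, valid = codon1, 0, 0, True
        for step, pos in enumerate(order):
            nxt = current[:pos] + codon2[pos] + current[pos + 1:]
            # Discard the pathway if an intermediate codon is a stop.
            if CODE[nxt] == "*" and step < len(order) - 1:
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        if valid:
            syn_total += syn
            nonsyn_total += nonsyn
            n_paths += 1
    if n_paths == 0:
        return 0.0, 0.0
    return syn_total / n_paths, nonsyn_total / n_paths

# TAT (Tyr) -> TGG (Trp): the pathway through TAG (stop) is excluded,
# leaving TAT -> TGT -> TGG with two nonsynonymous steps.
print(classify_codon_pair("TAT", "TGG"))
```

Summing these per-codon values over an alignment gives the substitution counts that are then divided by the numbers of synonymous and nonsynonymous sites.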

In the phylogenic analysis of COX subunits, we used the generally accepted phylogenic trees [26,27] instead of estimating the trees from the COX sequences, because some nucleotide sequences of COX subunits were identical. The PAUP* 4.0 program was used to infer the ancestral sequence at each node with the parsimony algorithm. By using the same method described earlier, we calculated the Ka and Ks values between each pair of adjacent nodes.
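The idea of parsimony-based ancestral reconstruction on a fixed tree can be sketched for a single alignment column. The sketch uses Fitch's small-parsimony algorithm, a standard textbook method; it is not a description of how PAUP* works internally, and the function name, the toy tree, and the leaf states are hypothetical.

```python
def fitch(tree, leaf_states, root):
    """Infer one most-parsimonious state per node for a single site.

    tree maps each internal node to its (left, right) children;
    leaf_states maps each leaf to its observed nucleotide.
    Returns (assignment, minimum_number_of_changes).
    """
    sets = {}
    changes = 0

    def up(node):
        # Bottom-up pass: intersect child sets, or take the union
        # and count one change when they share no state.
        nonlocal changes
        if node in leaf_states:
            sets[node] = {leaf_states[node]}
            return
        left, right = tree[node]
        up(left)
        up(right)
        common = sets[left] & sets[right]
        if common:
            sets[node] = common
        else:
            sets[node] = sets[left] | sets[right]
            changes += 1

    assignment = {}

    def down(node, parent_state):
        # Top-down pass: keep the parent's state when it is allowed.
        if parent_state in sets[node]:
            state = parent_state
        else:
            state = min(sets[node])  # arbitrary but deterministic choice
        assignment[node] = state
        for child in tree.get(node, ()):
            down(child, state)

    up(root)
    down(root, None)
    return assignment, changes

# Toy example with an assumed topology and made-up states.
tree = {
    "root": ("anc1", "mouse"),
    "anc1": ("hominoid", "macaque"),
    "hominoid": ("human", "chimp"),
}
leaves = {"human": "A", "chimp": "A", "macaque": "G", "mouse": "G"}
assignment, changes = fitch(tree, leaves, "root")
print(assignment["hominoid"], assignment["anc1"], changes)
```

Repeating this for every column yields a sequence for each internal node, and Ka and Ks can then be computed along each branch between a node and its descendant.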

ACKNOWLEDGMENTS

This study was supported in part by the health science research grant for the human genome program from the Ministry of Health and Welfare of Japan.

REFERENCES

1. Gibbons, A. (1998). Which of our genes make us human? Science 281: 1432–1434.2. Kimura, M. (1983). The Neutral Theory of Molecular Evolution. Cambridge University Press,

Cambridge, UK.3. Messier, W., and Stewart, C. B. (1997). Episodic adaptive evolution of primate lysozymes.

Nature 385: 151–154.4. Yang, A., Swanson, J., and Vacquier, V. (2000). Maximum-likelihood analysis of molec-

ular adaptation in abalone sperm lysin reveals variable selective pressures among line-ages and sites. Mol. Biol. Evol. 17: 1446–1455.

5. Hughes, A. L., and Nei, M. (1988). Pattern of nucleotide substitution at major histo-compatibility complex class I loci reveals overdominant selection. Nature 335: 167–170.

6. Rooney, A. P., and Zhang, J. (1999). Rapid evolution of a primate sperm protein: relax-ation of functional constraint or positive Darwinian selection. Mol. Biol. Evol. 16: 706–710.

7. Wyckoff, G. J., Wang, W., and Wu, C. I. (2000). Rapid evolution of male reproductivegenes in the descent of man. Nature 403: 304–309.

8. Endo, T., Ikeo, K., and Gojobori, T. (1996). Large-scale search for genes on which posi-tive selection may operate. Mol. Biol. Evol. 13: 685–690.

9. Liberles, D., Schreiber, D. R., Govindarajan, S., Chamberlin, S. G., and Benner, S. A.(2001). The adaptive evolution database (TAED). Genome Biol. 2: research0028.1–0028.6.

10. Hida, M., et al. (2000). Construction and preliminary characterization of full lengthenriched cDNA libraries for nonhuman primates. Primate Res. 16: 95–110.

11. Osada, N., et al. (2001). Assignment of 118 novel cDNAs of cynomolgus monkey brainto human chromosomes. Gene 275: 31–37.

12. Pruitt, K. D., and Maglott, D. R. (2001). RefSeq and LocusLink: NCBI gene-centeredresources. Nucleic Acids Res. 29: 137–140.

13. Osada, N., et al. (2001). Prediction of unidentified human genes on the basis of sequencesimilarity to novel cDNAs from cynomolgus monkey brain. Genome Biol. 3:research0028.1-6.

14. Jurka, J. (2000). Repbase Update: a database and an electronic journal of repetitive ele-ments. Trends Genet. 9: 418–420.

15. Andrews, T. D., and Easteal, S. (2000). Evolutionary rate acceleration of cytochrome oxi-dase subunit I in simian primates. J. Mol. Evol. 50: 562–568.

16. Wu, W., Schmidt, T. R., Goodman, M., and Grossman, L. I. (2000). Molecular evolutionof cytochrome c oxidase subunit I in primates: is there coevolution between mito-chondiral and nuclear genomes? Mol. Phyogenet. Evol. 17: 294–304,doi:10.1006/mpev.2000.0833.

17. Ramharack, R., and Deeley, R. G. (1992). Structure and evolution of primate cytochromec oxidase subunit II gene. J. Biol. Chem. 262: 14014–14021.

661

Page 6: Search for Genes Positively Selected during Primate Evolution by 5′-End-Sequence Screening of Cynomolgus Monkey cDNAs

Article doi:10.1006/geno.2002.6753, available online at http://www.idealibrary.com on IDEAL

18. Adkins, R. M., Honeycutt, R. L., and Disotell, T. R. (1996). Evolution of eutheriancytochrome c oxidase subunit II: heterogeneous rates of protein evolution and alteredinteraction with cytochrome c. Mol. Biol Evol. 13: 1393–1404.

19. Lomax, M. I., Hewett-Emmett, D., Yang, T. L., and Grossman, L. I. (1992). Rapid evolu-tion of the human gene for cytochrome c oxidase subunit IV. Proc. Natl. Acad. Sci. USA89: 5266–5270.

20. Wu, W., Goodman, M., Lomax, M. I., and Grossman, L. I. (1997). Molecular evolution of cytochrome c oxidase subunit IV: evidence for positive selection in simian primates. J. Mol. Evol. 44: 477–491.

21. Schmidt, T. R., Jaradat, S. A., Goodman, M., Lomax, M. I., and Grossman, L. I. (1997). Molecular evolution of cytochrome c oxidase: rate variation among subunit VIa isoforms. Mol. Biol. Evol. 14: 595–601.

22. Schmidt, T. R., Goodman, M., and Grossman, L. I. (1999). Molecular evolution of the COX7A gene family in primates. Mol. Biol. Evol. 16: 619–626.

23. Baba, M. L., Darga, L. L., Goodman, M., and Czelusniak, J. (1981). Evolution of cytochrome c investigated by the maximum parsimony method. J. Mol. Evol. 17: 197–213.

24. Rizzuto, R., et al. (1989). A gene specifying subunit VIII of human cytochrome c oxidase is localized to chromosome 11 and is expressed in both muscle and non-muscle tissue. J. Biol. Chem. 264: 10595–10600.

25. Seelan, R. S., and Grossman, L. I. (1997). Structural organization and promoter analysis of the bovine cytochrome oxidase subunit VIIc gene. J. Biol. Chem. 272: 10175–10181.

26. Goodman, M., et al. (1998). Toward a phylogenetic classification of Primates based on DNA evidence complemented by fossil evidence. Mol. Phylogenet. Evol. 9: 585–598, doi:10.1006/mpev.1998.0495.

27. Stewart, C. B., and Disotell, T. R. (1998). Primate evolution—in and out of Africa. Curr. Biol. 8: R582–588.

28. Andrews, T. D., Jermiin, L. S., and Easteal, S. (1998). Accelerated evolution of cytochrome b in simian primates: adaptive evolution in concert with other mitochondrial proteins? J. Mol. Evol. 47: 249–257.

29. Cann, R. L., Brown, W. M., and Wilson, A. C. (1984). Polymorphic sites and the mechanism of evolution in human mitochondrial DNA. Genetics 106: 479–499.

30. Schmidt, T. R., Wu, W., Goodman, M., and Grossman, L. I. (2001). Evolution of nuclear- and mitochondrial-encoded subunit interaction in cytochrome c oxidase. Mol. Biol. Evol. 18: 563–569.

31. Shoffner, J. M., et al. (1993). Mitochondrial DNA variants observed in Alzheimer disease and Parkinson’s disease patients. Genomics 17: 171–184.

32. Kim, S. H., Vlkolinsky, R., Cairns, N., Fountoulakis, M., and Lubec, G. (2001). The reduction of NADH ubiquinone oxidoreductase 24- and 75-kDa subunits in brains of patients with Down syndrome and Alzheimer’s disease. Life Sci. 68: 2741–2750.

33. Terrisse, L., et al. (1998). Increased levels of apolipoprotein D in cerebrospinal fluid and hippocampus of Alzheimer’s patients. J. Neurochem. 71: 1643–1650.

34. Rassart, E., et al. (2000). Apolipoprotein D. Biochim. Biophys. Acta 1482: 185–198.

35. Lawn, R. M., Schwartz, K., and Patthy, L. (1997). Convergent evolution of apolipoprotein(a) in primates and hedgehog. Proc. Natl. Acad. Sci. USA 94: 11992–11997.

36. Huby, T., et al. (2001). Functional analysis of the chimpanzee and human apo(a) promoter sequences. J. Biol. Chem. 276: 22209–22214.

37. Strittmatter, W. J., et al. (1993). Apolipoprotein E: high-avidity binding to β-amyloid and increased frequency of type 4 allele in late-onset familial Alzheimer disease. Proc. Natl. Acad. Sci. USA 90: 1977–1981.

38. Hanlon, C. S., and Rubinsztein, D. C. (1995). Arginine residues at codons 112 and 158 in the apolipoprotein E gene correspond to the ancestral state in humans. Atherosclerosis 112: 85–90.

39. Tsukihara, T., et al. (1996). The whole structure of the 13-subunit oxidized cytochrome c oxidase at 2.8 Å. Science 272: 1136–1144.

40. Johnson, M., et al. (2001). Positive selection of a gene family during the emergence of humans and African apes. Nature 413: 514–519.

41. Fujiyama, A., et al. (2002). Construction and analysis of a human–chimpanzee comparative clone map. Science 295: 131–134.

42. Suzuki, Y., et al. (2000). Statistical analysis of the 5′ untranslated region of human mRNA using “Oligo-Capped” cDNA libraries. Genomics 64: 286–297, doi:10.1006/geno.2000.6076.

43. Pearson, W. R. (2000). Flexible sequence similarity searching with the FASTA3 program package. Methods Mol. Biol. 132: 85–219.

44. Li, W. H. (1993). Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36: 96–99.

45. Rosenberg, H. F., Dyer, K. D., Tiffany, H. L., and Gonzalez, M. (1995). Rapid evolution of a unique family of primate ribonuclease genes. Nat. Genet. 10: 219–223.

Sequence data from this article have been deposited with the DDBJ/EMBL/GenBank Data Libraries under accession numbers AB072015–AB072026 and AB072312–AB072331.