

SHORT COMMUNICATION

Mapping of a Novel Human Carbonyl Reductase, CBR3, and Ribosomal Pseudogenes to Human Chromosome 21q22.2

Koji Watanabe, Chiyo Sugawara, Ayako Ono, Yasuhito Fukuzumi, Shoko Itakura, Masaaki Yamazaki, Hiroyuki Tashiro, Kazutoyo Osoegawa,* Eiichi Soeda,*,1 and Touru Nomura

Bioscience Research Laboratory, FUJIYA Co., Ltd., 228 Soya, Hadano 257, Japan; and *RIKEN, 3-1-1 Koyadai, Tsukuba 305, Japan

Received November 19, 1997; accepted May 12, 1998

To find the genes contributing to Down syndrome, we constructed a 4-Mb sequence-ready map spanning chromosome 21q22.2 with megabase-sized cosmid/P1-derived artificial chromosome (PAC) contigs. The restriction map with rare cutting enzymes, followed by sequencing from the clustering sites, has defined CpG islands and revealed the genes associated with CpG islands (Accession No. D85771). Of these, two human carbonyl reductases (CBR; EC 1.1.1.184) were found in a PAC 25P16 clone. CBR catalyzes the reduction of a large number of biologically and pharmacologically active carbonyl compounds to their corresponding alcohols and has been mapped to 21q22.1. To confirm these results, we sequenced the PAC clone by a shotgun strategy and identified a novel carbonyl reductase, designated CBR3, 62 kb downstream from the original CBR. In addition, three ribosomal pseudogenes, L23a, S9, and L3, and some cDNAs with ESTs were mapped in the sequence. In conclusion, the sequence analysis for CpG islands predicted from the megabase-sized contigs will reveal and identify the genes involved in Down syndrome. © 1998 Academic Press

The DNA sequencing of a 170-kb P1-derived artificial chromosome (PAC) was almost completed by a shotgun strategy (6, 13), using ABI 373S automated DNA sequencers (ABI). Five micrograms of PAC DNA prepared by the modified alkaline–SDS method (8) was sheared by sonication or by passing it 30 times through a 29-gauge needle. Fragments larger than 1 kb were collected with Chroma Spin-1000 (Clontech) and cloned into the SmaI site of the pUC19 vector. Approximately 1200 recombinant clones were randomly selected and submitted to a cycle sequencing reaction, using Thermo Sequenase, DYEnamic Energy Transfer Dye Primer (Amersham), and the Dye Terminator Cycle Sequencing FS kit (ABI) according to the manufacturer's instructions. More than 550 bases could be read from a clone, and the resulting sequences, with a redundancy of 6.0 per base, were assembled into 10 large contigs with the aid of S/SQ software (SDC, Tokyo). The gaps between contigs were closed by the primer extension method.
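As a rough back-of-envelope check on the shotgun figures above, the following sketch applies the idealized Lander–Waterman model, which the paper itself does not name. It assumes uniformly random reads of exactly 550 bases (the paper gives this only as a lower bound), ignores the minimum overlap needed to join two reads, and uses the 170,529-bp insert length reported for the finished sequence. The function names are hypothetical.

```python
import math

def reads_needed(target_len, read_len, redundancy):
    """Number of reads required to reach a given per-base redundancy."""
    return math.ceil(redundancy * target_len / read_len)

def expected_contigs(n_reads, redundancy):
    """Lander-Waterman estimate of the number of contigs (islands),
    ignoring the minimum detectable overlap between reads."""
    return n_reads * math.exp(-redundancy)

def fraction_uncovered(redundancy):
    """Expected fraction of bases not covered by any read."""
    return math.exp(-redundancy)

if __name__ == "__main__":
    n = reads_needed(170_529, 550, 6.0)
    print("reads for 6.0x redundancy:", n)
    print("expected contigs:", round(expected_contigs(n, 6.0), 1))
    print("expected uncovered fraction:", round(fraction_uncovered(6.0), 4))
```

Under these assumptions the model predicts only a handful of contigs at sixfold redundancy, which is the same order of magnitude as the 10 large contigs reported; real clone libraries are not perfectly random, so some excess is expected.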

The total sequence of PAC 25P16 consisted of 170,529 bp (Accession No. AB003151) and was located near two STS markers, CBR and D21S333 (Fig. 1) (9). Several software packages were used for the sequence analysis. First, the repetitive elements were masked by RepeatMasker. Second, the sequences of exons were predicted with the aid of GRAIL version 1.3 (10) and then searched against the GenBank+EMBL+DDBJ sequence databases and the expressed sequence tag database (dbEST) (2) by BLASTn (1). Of the ESTs present in the predicted exons, those containing consensus sequences at the exon–intron boundaries were extended further by the primer-walking method, using the cDNAs containing the ESTs as templates (7, 12). They are IMAGE cDNAs 70506, 128097, 300856, 546810, 139776, and 529844 (Research Genetics, Inc.) (Accession Nos. AB004848–AB004853).
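The EST filter described above depends on consensus sequences at exon–intron boundaries. The paper does not spell out its criterion, so the sketch below assumes the canonical rule that an intron begins with GT and ends with AG. The function name and the toy sequence are hypothetical and serve only to illustrate the coordinate handling.

```python
def has_consensus_splice_sites(genomic, upstream_exon_end, downstream_exon_start):
    """Check the GT...AG rule for the intron between two exons.

    Coordinates are 1-based and inclusive, as in the paper: the intron runs
    from upstream_exon_end + 1 to downstream_exon_start - 1.
    """
    intron = genomic[upstream_exon_end:downstream_exon_start - 1].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")

if __name__ == "__main__":
    # Toy example: exon (1-6), intron (7-18), exon (19-24).
    toy = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
    print(has_consensus_splice_sites(toy, 6, 19))   # boundaries placed correctly
    print(has_consensus_splice_sites(toy, 5, 19))   # boundary shifted by one base
```

An EST alignment whose block ends fail this check is more likely to reflect a repeat, a pseudogene, or an alignment artifact than a spliced transcript, which is presumably why the authors used the boundaries as a filter.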

All three exons of CBR were found in the present sequence (nucleotide positions 56,080–56,461, 57,007–57,114, and 58,503–59,221) (3). The homologous sequences with CBR were observed in the predicted exons F37, F38, and F39 (Table 1, Fig. 2). The sequence of 529844 cDNA was identical with the sequences of F38 and F39 (nucleotides 12,935–123,990 and 132,136–132,634, respectively). Furthermore, another EST (Accession No. AA320697) corresponded to two exon–intron junctions between F37 and F38 and between F38 and F39 (Table 1, Fig. 2). The sequence comparison with CBR (3.3 kb) revealed that

Sequence data from this article have been deposited with the DDBJ, EMBL, and GenBank Data Libraries under Accession Nos. AB003151 and AB004848–AB004854.

1 To whom correspondence and reprint requests should be addressed. Telephone: 81-298-36-9122. Fax: 81-298-36-9140. E-mail: [email protected].

GENOMICS 52, 95–100 (1998), ARTICLE NO. GE985380

0888-7543/98 $25.00. Copyright © 1998 by Academic Press. All rights of reproduction in any form reserved.

FIG. 1. Genomic organization deduced from the complete sequence of a PAC 25P16 clone anchored at 21q22.2. The PAC was located precisely near two STS markers, CBR and D21S333 (9). (A) The rare cutting sites with NotI, EagI, BssHII, and SacII representing CpG islands. (B) Local G+C content (%) in an average span of 500 bp with 50-bp windows, using GENETYX-Mac software package program (SDC). (C, D) The distribution of CpG islands that were defined according to Gardiner-Garden and Frommer (4) and the distribution of Alu- and non-Alu-interspersed repeats, L1, MER, MIR, and LTR elements, which were searched out by RepeatMasker (http://ftp.genome.washington.edu/cgi-bin/RepeatMasker). (E) Location of predicted exons by GRAIL version 1.3 (10). (F) DNA homologous regions identified by BLAST (1) and corresponding accession numbers, except data of WashU-Merck EST project (5). (G) The integration of IMAGE Consortium cDNA clones (7) with EST data of the WashU-Merck project into the PAC sequence.
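Panels B to D of Fig. 1 rest on two simple sequence statistics: local G+C content and the CpG island definition of Gardiner-Garden and Frommer (4). The sketch below is not the GENETYX-Mac implementation. It assumes that "an average span of 500 bp with 50-bp windows" means a 500-bp window moved in 50-bp steps, and it uses the commonly quoted form of the criteria in (4): at least 200 bp, G+C content above 50%, and an observed/expected CpG ratio above 0.6. All function names are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)

def sliding_gc(seq, window=500, step=50):
    """(1-based start, %G+C) for each window; window and step are assumed."""
    return [(i + 1, gc_percent(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]

def looks_like_cpg_island(seq, min_len=200, min_gc=50.0, min_oe=0.6):
    """Apply the three thresholds to one candidate stretch."""
    return (len(seq) >= min_len
            and gc_percent(seq) > min_gc
            and cpg_obs_exp(seq) > min_oe)

if __name__ == "__main__":
    candidate = "CG" * 100   # artificial CpG-rich stretch
    background = "AT" * 100  # artificial AT-rich stretch
    print(looks_like_cpg_island(candidate), looks_like_cpg_island(background))
```

Sites for rare cutters such as NotI (GCGGCCGC) are G+C-rich and contain CpG, so clusters of such sites tend to coincide with stretches that pass these thresholds, which is the logic behind panel A.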


an 11.2-kb segment located 62 kb telomeric of CBR can encode a new member of the CBR family, designated CBR3, because the nucleotide and predicted amino acid sequences of CBR3 are highly homologous with those of CBR (77.0 and 84.0%), taking into account synonymous substitutions, particularly in the NADP-binding domain and short-chain dehydrogenase/reductase family signature (Accession No. AB004854). When the genomic structures of CBR3 and CBR were compared, the introns and surrounding regions were different from one another but the exons were highly conserved. This suggests that the exons of both genes share a common origin. Because CBR3 was also cloned as cDNA, CBR function will be enhanced and may be implicated in the pathogenesis of Down syndrome and Alzheimer disease (11).
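The distance and structure statements can be checked by simple arithmetic on the reported coordinates. The sketch below takes the three CBR exon positions given in the text and the start of GRAIL exon F37 (the first CBR3 exon) from Table 1; treating all coordinates as 1-based and inclusive is an assumption, and the helper names are hypothetical.

```python
# CBR exon coordinates on the PAC sequence, as reported in the text.
CBR_EXONS = [(56_080, 56_461), (57_007, 57_114), (58_503, 59_221)]
# Start of predicted exon F37 (first CBR3 exon), as listed in Table 1.
CBR3_FIRST_EXON_START = 121_231

def exon_lengths(exons):
    """Length of each exon for 1-based inclusive coordinates."""
    return [end - start + 1 for start, end in exons]

def intron_lengths(exons):
    """Length of each intron lying between consecutive exons."""
    return [nxt_start - prev_end - 1
            for (_, prev_end), (nxt_start, _) in zip(exons, exons[1:])]

def gap_between(upstream_end, downstream_start):
    """Number of bases strictly between two features."""
    return downstream_start - upstream_end - 1

if __name__ == "__main__":
    print("CBR exon lengths:", exon_lengths(CBR_EXONS))
    print("CBR intron lengths:", intron_lengths(CBR_EXONS))
    print("CBR end to CBR3 start:",
          gap_between(CBR_EXONS[-1][1], CBR3_FIRST_EXON_START), "bp")
```

The gap between the last CBR exon and exon F37 comes out at 62,009 bp, consistent with the stated 62 kb.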

The sequences of several IMAGE cDNAs mapped to exons in this region that may represent novel genes with unknown function. The sequence of 139776 cDNA corresponded to three exons at nucleotides 142,360–142,277, 132,415–132,316, and 128,005–127,436, although they were not predicted by GRAIL. The sequences from the 3′-ends of the 70506, 546810, and 300856 cDNAs were shared in common and corresponded to five exons. However, the sequences at the 5′-ends of these clones differed from each other and localized to separate regions upstream, suggesting that these cDNAs were generated by alternative splicing near the region encoding the N-terminus. Therefore, the amino acid sequences (283 residues) predicted from the 70506, 546810, and 300856 clones were the same except for an extra sequence of 24 amino acid residues at the N-terminus of the 300856 clone.

The amino acid sequences predicted from nucleotide positions 2,624–2,083, 118,478–119,170, and 156,427–154,976 were very similar to those of ribosomal proteins L23a (93.5%), S9 (85.8%), and L3 (90.1%), whose genes are located on human chromosomes 17, 19, and 22, respectively. However, they may be processed pseudogenes because they are interrupted by stop codons and no exon–intron structure was found.
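The pseudogene argument above turns on stop codons interrupting an otherwise protein-like reading frame. The sketch below shows one way such a check could be done with the standard genetic code. It is not the authors' procedure, the function names are hypothetical, and the example uses a toy sequence rather than the PAC sequence. Descending coordinates such as 2,624–2,083 indicate the reverse strand, so a reverse-complement helper is included.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate in frame 1; codons with ambiguous bases become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def internal_stops(seq):
    """Count stop codons that occur before the final codon of the frame."""
    return translate(seq)[:-1].count("*")

if __name__ == "__main__":
    minus_strand_region = "CCCTTACAT"           # toy reverse-strand region
    coding = reverse_complement(minus_strand_region)
    print(coding, translate(coding), internal_stops(coding))
```

A functional ribosomal protein gene would be expected to give zero internal stops across its coding region; one or more, together with the absence of introns, points toward a processed pseudogene.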

We completed the sequence of a 170-kb PAC clone, anchored by CBR and D21S333, by a shotgun approach. The sequence comparison revealed a novel gene, designated CBR3, 62 kb downstream from CBR, in addition to three ribosomal protein pseudogenes and some genes with unknown function. These results show that the integration of a variety of sequences in the databases with the genomic sequence is more precise and efficient for gene identification and mapping than any other method under rapid sequencing of PAC (or BAC) clones.

REFERENCES

1. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215: 403–410.

2. Boguski, M. S., Lowe, T. M., and Tolstoshev, C. M. (1993). dbEST—Database for "expressed sequence tags." Nat. Genet. 4: 332–333.

3. Forrest, G. L., Akman, S., Krutzig, S., Paxton, R. J., Sparkes, R. S., Doroshow, J., Felsted, R. L., Glover, C. J., Mohandas, T., and Bachur, N. R. (1990). Introduction of a human carbonyl reductase gene located on chromosome 21. Biochim. Biophys. Acta 1048: 149–155.

4. Gardiner-Garden, M., and Frommer, M. (1987). CpG islands in vertebrate genomes. J. Mol. Biol. 196: 261–282.

5. Hillier, L., Lennon, G., Becker, M., Bonaldo, M. F., Chiapelli, B., Chissoe, S., Dietrich, N., Dubuque, T., Favello, A., Gish, W., Hawkins, M., Hultman, M., Kucaba, T., Lacy, M., Le, M., Le, N., Mardis, E., Moore, B., Morris, M., Parsons, J., Prange, C., Rifkin, L., Rohlfing, T., Schellenberg, K., Soares, M. B., Tan, F., Thierrymeg, J., Trevaskis, E., Underwood, K., Wohldman, P., Waterston, R., Wilson, R., and Marra, M. (1996). Generation and analysis of 280,000 human expressed sequence tags. Genome Res. 6: 807–828.

6. Ioannou, A. P., Amemiya, C. T., Garnes, J., Kroisel, P. M., Shizuya, H., Chen, C., Batzer, M. A., and de Jong, P. J. (1994). A new bacteriophage P1-derived vector for propagation of large human DNA fragments. Nat. Genet. 6: 84–89.

7. Lennon, G., Auffray, C., Polymeropoulos, M., and Soares, M. B. (1996). The I.M.A.G.E. Consortium: An integrated molecular analysis of genomes and their expression. Genomics 33: 151–152.

8. Maniatis, T., Fritsch, E. F., and Sambrook, J. (1989). "Molecular Cloning: A Laboratory Manual", 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

TABLE 1

Homologous Sequences from DNA Data Bank Found to GRAIL-Predicted Exons

| Exon ID | Quality | Position | Program | Accession No. | Definition | P(N) |
|---|---|---|---|---|---|---|
| F06 | Good | 29,808–29,914 | BLASTn (EST) | AA083235 | Homo sapiens cDNA clone 546810 3′ | 2.6 × 10^-37 |
| | | | BLASTn (EST) | T48498 | Homo sapiens cDNA clone 70506 3′ | 2.9 × 10^-37 |
| | | | BLASTn (EST) | W07786 | Homo sapiens cDNA clone 300856 5′ | 4.1 × 10^-37 |
| | | | BLASTn (EST) | R09552 | Homo sapiens cDNA clone 128097 3′ | 1.1 × 10^-36 |
| F08 | Good | 31,735–31,902 | BLASTn (EST) | W07786 | Homo sapiens cDNA clone 300856 5′ | 1.5 × 10^-45 |
| | | | BLASTn (EST) | R09552 | Homo sapiens cDNA clone 128097 3′ | 1.3 × 10^-12 |
| | | | BLASTn (EST) | AA083235 | Homo sapiens cDNA clone 546810 3′ | 0.61 |
| F14 | Excellent | 56,149–56,461 | BLASTn (nr-nt) | J4056 | Human carbonyl reductase mRNA | Identical |
| | | | BLASTx (nr-aa) | P16152 | Human carbonyl reductase | Identical |
| F15 | Good | 57,007–57,114 | BLASTn (nr-nt) | J4056 | Human carbonyl reductase mRNA | Identical |
| | | | BLASTx (nr-aa) | P16152 | Human carbonyl reductase | Identical |
| F17 | Excellent | 58,503–58,939 | BLASTn (nr-nt) | J4056 | Human carbonyl reductase mRNA | Identical |
| | | | BLASTx (nr-aa) | P16152 | Human carbonyl reductase | Identical |
| F33 | Good | 116,534–117,402 | BLASTn (EST) | N36700 | Homo sapiens cDNA clone 269230 5′ | 4.1 × 10^-141 |
| | | | BLASTn (EST) | AA134689 | Homo sapiens cDNA clone 531926 5′ | 3.1 × 10^-112 |
| | | | BLASTx (nr-aa) | U64587 | Caenorhabditis elegans cDNA CEESS08F | 2.0 × 10^-84 |
| | | | BLASTx (nr-aa) | P47085 | Yeast hypothetical 38.5-kDa protein | 2.5 × 10^-59 |
| F35 | Excellent | 118,512–118,632 | BLASTn (nr-nt) | U14971 | Human ribosomal protein S9 mRNA | 1.5 × 10^-32 |
| | | | BLASTx (nr-aa) | P46781 | Human 40S ribosomal protein S9 | 1.5 × 10^-15 |
| F36 | Good | 118,929–119,055 | BLASTn (nr-nt) | U14971 | Human ribosomal protein S9 mRNA | 1.8 × 10^-29 |
| | | | BLASTx (nr-aa) | P46781 | Human 40S ribosomal protein S9 | 2.6 × 10^-13 |
| F37 | Excellent | 121,231–121,543 | BLASTn (nr-nt) | J4056 | Human carbonyl reductase mRNA | 4.2 × 10^-99 |
| | | | BLASTn (EST) | AA320697 | EST23140 adipose tissue, white II | 1.1 × 10^-16 |
| | | | BLASTx (nr-aa) | P16152 | Human carbonyl reductase | 1.0 × 10^-52 |
| F38 | Excellent | 123,883–123,990 | BLASTn (EST) | AA320697 | EST23140 adipose tissue, white II | 1.7 × 10^-30 |
| | | | BLASTx (nr-aa) | P16152 | Human carbonyl reductase | 7.4 × 10^-11 |
| F39 | Excellent | 132,136–132,542 | BLASTn (nr-nt) | J4056 | Human carbonyl reductase mRNA | 3.4 × 10^-72 |
| | | | BLASTn (EST) | AA070863 | Homo sapiens cDNA clone 529844 3′ | 9.3 × 10^-123 |
| | | | BLASTn (EST) | AA071157 | Homo sapiens cDNA clone 529844 5′ | 6.3 × 10^-42 |
| | | | BLASTn (EST) | AA320697 | EST23140 adipose tissue, white II | 6.3 × 10^-41 |
| | | | BLASTn (EST) | R62251 | Homo sapiens cDNA clone 139776 5′ | 4.4 × 10^-34 |
| | | | BLASTx (nr-aa) | P16152 | Human carbonyl reductase | 5.7 × 10^-61 |
| R17 | Excellent | 2,493–2,137 | BLASTn (nr-nt) | U43701 | Human ribosomal protein L23a mRNA | 6.3 × 10^-127 |
| | | | BLASTx (nr-aa) | P39024 | Human 60S ribosomal protein L23a | 8.8 × 10^-70 |
| R12 | Excellent | 39,677–39,640 | BLASTn (EST) | N78701 | Homo sapiens cDNA clone 300856 3′ | 3.5 × 10^-8 |
| R10 | Excellent | 43,261–43,166 | BLASTn (EST) | R82136 | 2E9 Homo sapiens genomic 5′ and 3′ | 1.9 × 10^-33 |
| | | | BLASTn (EST) | N78701 | Homo sapiens cDNA clone 300856 3′ | 1.1 × 10^-28 |
| | | | BLASTn (EST) | AA83234 | Homo sapiens cDNA clone 546810 5′ | 7.9 × 10^-14 |
| R03 | Excellent | 155,597–155,031 | BLASTn (nr-nt) | X73460 | Human ribosomal protein L3 mRNA | 3.6 × 10^-195 |
| | | | BLASTx (nr-aa) | P39023 | Human 60S ribosomal protein L3 | 1.5 × 10^-95 |
| R02 | Excellent | 155,786–155,655 | BLASTn (nr-nt) | X73460 | Human ribosomal protein L3 mRNA | 2.7 × 10^-36 |
| | | | BLASTx (nr-aa) | P39023 | Human 60S ribosomal protein L3 | 7.8 × 10^-18 |


FIG. 2. Sequence comparison between CBR and CBR3. The CBR3 gene was constructed from three exons predicted from the PAC sequence with the aid of the computer program GRAIL and compared with CBR. These exons were expressed in EST (nucleotides 246–540: GenBank Accession No. AA320697) and IMAGE Consortium cDNA (nucleotides 366–902: Accession No. 529844). The initiation site for transcription was tentatively localized to the upmost region of CBR (11). The NADP-binding domain and short-chain dehydrogenase/reductase family signature are shown in a box. Identical nucleotides and amino acid residues are indicated with hyphens. The stop codons are denoted by an asterisk, and solid triangles indicate exon–intron boundaries.


9. Osoegawa, K., Susukida, R., Okano, S., Kudoh, J., Minoshima, S., Shimizu, N., de Jong, P. J., Groet, J., Ives, J., Lehrach, H., Nizetic, D., and Soeda, E. (1996). An integrated map with cosmid/PAC contigs of a 4-Mb Down syndrome critical region. Genomics 32: 375–387.

10. Uberbacher, E. C., Xu, Y., and Mural, R. J. (1996). Discovering and understanding genes in human DNA sequence using GRAIL. Methods Enzymol. 266: 554–571.

11. Wermuth, B., Bohren, K. M., Heinemann, G., von Wartburg, J. P., and Gabbay, K. H. (1988). Human carbonyl reductase. Nucleotide sequence analysis of a cDNA and amino acid sequence of the encoded protein. J. Biol. Chem. 263: 16185–16188.

12. Wolfsberg, T. G., and Landsman, D. (1997). A comparison of expressed sequence tags (ESTs) to human genomic sequences. Nucleic Acids Res. 25: 1626–1632.

13. Yamazaki, M., Ayako, O., Watanabe, K., Sasaki, K., Tashiro, H., and Nomura, T. (1995). Nucleotide sequence surrounding the locus marker D21S246 on human chromosome 21. DNA Res. 2: 187–189.
