SHORT COMMUNICATION
Mapping of a Novel Human Carbonyl Reductase, CBR3, and Ribosomal Pseudogenes to Human
Chromosome 21q22.2
Koji Watanabe, Chiyo Sugawara, Ayako Ono, Yasuhito Fukuzumi, Shoko Itakura, Masaaki Yamazaki, Hiroyuki Tashiro, Kazutoyo Osoegawa,*
Eiichi Soeda,*,1 and Touru Nomura
Bioscience Research Laboratory, FUJIYA Co., Ltd., 228 Soya, Hadano 257, Japan; and *RIKEN, 3-1-1 Koyadai, Tsukuba 305, Japan
Received November 19, 1997; accepted May 12, 1998
To find the genes contributing to Down syndrome, we constructed a 4-Mb sequence-ready map spanning chromosome 21q22.2 with megabase-sized cosmid/P1-derived artificial chromosome (PAC) contigs. The restriction map with rare cutting enzymes, followed by sequencing from the clustering sites, has defined CpG islands and revealed the genes associated with CpG islands (Accession No. D85771). Of these, two human carbonyl reductases (CBR; EC 1.1.1.184) were found in a PAC 25P16 clone. CBR catalyzes the reduction of a large number of biologically and pharmacologically active carbonyl compounds to their corresponding alcohols and has been mapped to 21q22.1. To confirm these results, we sequenced the PAC clone by a shotgun strategy and identified a novel carbonyl reductase, designated CBR3, 62 kb downstream from the original CBR. In addition, three ribosomal pseudogenes, L23a, S9, and L3, and some cDNAs with ESTs were mapped in the sequence. In conclusion, the sequence analysis for CpG islands predicted from the megabase-sized contigs will reveal and identify the genes involved in Down syndrome. © 1998 Academic Press
The DNA sequencing of a 170-kb P1-derived artificial chromosome (PAC) was almost completed by a shotgun strategy (6, 13), using ABI 373S automated DNA sequencers (ABI). Five micrograms of PAC DNA prepared by the modified alkaline–SDS method (8) was sheared by sonication or by passing 30 times through a 29-gauge needle. Fragments larger than 1 kb were collected with Chroma Spin-1000 (Clontech) and cloned into the SmaI site of the pUC19 vector. Approximately 1200 recombinant clones were randomly selected and submitted to a cycle sequencing reaction, using Thermo Sequenase, DYEnamic Energy Transfer Dye Primer (Amersham), and the Dye Terminator Cycle Sequencing FS kit (ABI) according to the manufacturer's instructions. More than 550 bases could be read from a clone, and the resulting sequences with a redundancy of 6.0 per base were assembled to form 10 large contigs with the aid of S/SQ software (SDC, Tokyo). The gaps between contigs were sealed by the primer extension method.
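The relation between clone number, read length, and redundancy described above can be checked with a short calculation. The sketch below is only a back-of-envelope illustration: the function names and the `reads_per_clone` parameter are hypothetical, the clone count and read length are the approximate lower bounds quoted in the text, and the PAC length is the 170,529 bp reported for the finished sequence. The text does not state how many reads were taken from each clone, so the difference between the value computed here and the reported redundancy of 6.0 is left unexplained.

```python
import math


def redundancy(n_clones: int, read_length: int, target_bp: int,
               reads_per_clone: int = 1) -> float:
    """Average redundancy = total bases read / length of the target."""
    return n_clones * reads_per_clone * read_length / target_bp


def reads_needed(target_redundancy: float, read_length: int, target_bp: int) -> int:
    """Smallest number of reads of a given length that reaches the redundancy."""
    return math.ceil(target_redundancy * target_bp / read_length)


PAC_LENGTH = 170_529  # bp, finished sequence of PAC 25P16

# Assumption: one 550-base read per clone (both figures are lower bounds in the text).
print(round(redundancy(1200, 550, PAC_LENGTH), 2))   # 3.87
# Number of 550-base reads that would give sixfold redundancy.
print(reads_needed(6.0, 550, PAC_LENGTH))            # 1861
```

Under these assumptions a single 550-base read per clone gives somewhat less than fourfold redundancy, so longer reads, more than one read per clone, or both would be needed to reach 6.0.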
The total sequence of PAC 25P16 consisted of 170,529 bp (Accession No. AB003151) and was located near two STS markers, CBR and D21S333 (Fig. 1) (9). Several software packages were used for the sequence analysis: First, the repetitive elements were eliminated by RepeatMasker. Second, the sequences of exons were predicted with the aid of GRAIL version 1.3 (10) and then searched in GenBank+EMBL+DDBJ sequence databases and expressed sequence tag database (dbEST) (2) by BLASTn (1). Of the ESTs present in the predicted exons, those containing consensus sequences in the exon–intron boundaries were extended further by the primer-walking method, using the cDNAs containing the ESTs as templates (7, 12). These were IMAGE cDNAs 70506, 128097, 300856, 546810, 139776, and 529844 (Research Genetics, Inc.) (Accession Nos. AB004848–AB004853).
All three exons of CBR were found in the present sequence (nucleotide positions 56,080–56,461, 57,007–57,114, and 58,503–59,221) (3). Sequences homologous to CBR were observed in the predicted exons F37, F38, and F39 (Table 1, Fig. 2). The sequence of 529844 cDNA was identical to the sequences of F38 and F39 (nucleotides 12,935–123,990 and 132,136–132,634, respectively). Furthermore, another EST (Accession No. AA320697) corresponded to two exon–intron junctions between F37 and F38 and between F38 and F39 (Table 1, Fig. 2). The sequence comparison with CBR (3.3 kb) revealed that
Sequence data from this article have been deposited with the DDBJ, EMBL, and GenBank Data Libraries under Accession Nos. AB003151 and AB004848–AB004854.
1 To whom correspondence and reprint requests should be addressed. Telephone: 81-298-36-9122. Fax: 81-298-36-9140. E-mail: [email protected].
GENOMICS 52, 95–100 (1998)
ARTICLE NO. GE985380
0888-7543/98 $25.00
Copyright © 1998 by Academic Press
All rights of reproduction in any form reserved.
FIG. 1. Genomic organization deduced from the complete sequence of a PAC 25P16 clone anchored at 21q22.2. The PAC was located precisely near two STS markers, CBR and D21S333 (9). (A) The rare cutting sites with NotI, EagI, BssHII, and SacII representing CpG islands. (B) Local G+C content (%) in an average span of 500 bp with 50-bp windows, using GENETYX-Mac software package program (SDC). (C, D) The distribution of CpG islands that were defined according to Gardiner-Garden and Frommer (4) and the distribution of Alu- and non-Alu-interspersed repeats, L1, MER, MIR, and LTR elements, which were searched out by RepeatMasker (http://ftp.genome.washington.edu/cgi-bin/RepeatMasker). (E) Location of predicted exons by GRAIL version 1.3 (10). (F) DNA homologous regions identified by BLAST (1) and corresponding accession numbers, except data of WashU-Merck EST project (5). (G) The integration of IMAGE Consortium cDNA clones (7) with EST data of the WashU-Merck project into the PAC sequence.
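The G+C profile and the CpG-island criteria mentioned in the legend to Fig. 1 can be illustrated with a minimal sketch. It is not the GENETYX-Mac implementation: all function names are hypothetical, the reading of "an average span of 500 bp with 50-bp windows" as a 500-bp span advanced in 50-bp steps is an assumption, and the thresholds (length of at least 200 bp, G+C above 50%, observed/expected CpG above 0.6) are those commonly quoted from Gardiner-Garden and Frommer (4).

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq: str, span: int = 500, step: int = 50):
    """Yield (start, G+C %) for spans of `span` bp advanced by `step` bp.

    Assumption: the 500-bp span and 50-bp step mirror the Fig. 1B description.
    A sequence shorter than `span` yields a single, shorter window.
    """
    for start in range(0, max(len(seq) - span, 0) + 1, step):
        yield start, gc_percent(seq[start:start + span])


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 50.0, min_ratio: float = 0.6) -> bool:
    """Apply the three thresholds to one candidate stretch of DNA."""
    return (len(seq) >= min_len
            and gc_percent(seq) > min_gc
            and cpg_obs_exp(seq) > min_ratio)


# Toy sequences (hypothetical, not taken from AB003151).
print(looks_like_cpg_island("CG" * 100))   # True
print(looks_like_cpg_island("AT" * 100))   # False
```

In practice such a test would be applied to each window of the masked PAC sequence, and adjacent qualifying windows would be merged into islands.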
an 11.2-kb segment located 62 kb telomeric of CBR can encode a new member of the CBR family, designated CBR3, because the nucleotide and predicted amino acid sequences of CBR3 are highly homologous with those of CBR (77.0 and 84.0%), taking into account synonymous substitutions, particularly in the NADP-binding domain and short-chain dehydrogenase/reductase family signature (Accession No. AB004854). Comparison of the genomic structures of CBR3 and CBR showed that the introns and surrounding regions differed from one another but the exons were highly conserved. This suggests that the exons of both genes share a common origin. Because CBR3 was cloned as cDNA, and is therefore expressed, CBR function will be enhanced and may be implicated in the pathogenesis of Down syndrome and Alzheimer disease (11).
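The spacing between the two genes can be checked directly from the exon coordinates. The sketch below assumes 1-based, inclusive positions on AB003151; the CBR exons are those given in the text, the CBR3 exons combine the F37 position from Table 1 with the F38 and F39 positions given in the text, and the function names are hypothetical.

```python
# Exon coordinates on PAC 25P16 (1-based, inclusive; see lead-in for sources).
CBR_EXONS = [(56_080, 56_461), (57_007, 57_114), (58_503, 59_221)]
CBR3_EXONS = [(121_231, 121_543), (123_883, 123_990), (132_136, 132_634)]


def exon_lengths(exons):
    """Length of each exon for inclusive start/end coordinates."""
    return [end - start + 1 for start, end in exons]


def intergenic_gap(upstream, downstream):
    """Bases between the last exon of one gene and the first exon of the next."""
    return downstream[0][0] - upstream[-1][1] - 1


print(exon_lengths(CBR_EXONS))                 # [382, 108, 719]
print(exon_lengths(CBR3_EXONS))                # [313, 108, 499]
print(intergenic_gap(CBR_EXONS, CBR3_EXONS))   # 62009
```

The gap of about 62 kb agrees with the distance stated above, and the second exon has the same length (108 bp) in both genes, which is in keeping with the conservation of the exons.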
The sequences of several IMAGE cDNAs were integrated in the exons that might be assigned to this region as novel genes with unknown function. The sequence of 139776 cDNA corresponded to the three exons at nucleotides 142,360–142,277, 132,415–132,316, and 128,005–127,436, although they were not predicted by GRAIL. The sequences from the 3′-ends of the 70506, 546810, and 300856 cDNAs were shared in common and corresponded to the five exons. However, the sequences at the 5′-ends of these clones differed from each other and localized to separate regions upstream, suggesting that these cDNAs were generated by alternative splicing occurring near the region encoding the N-terminus. Therefore, the amino acid sequences (283 residues) predicted from the 70506, 546810, and 300856 clones were the same except for an extra sequence of 24 amino acid residues at the N-terminus of the 300856 clone.
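Several of the coordinate pairs above run from a higher to a lower position. A small parser can make the implied orientation explicit. This is a sketch under the assumption that a descending pair denotes an exon on the complementary strand; the `Exon` type and `parse_interval` function are hypothetical names introduced only for illustration.

```python
import re
from typing import NamedTuple


class Exon(NamedTuple):
    start: int    # lower coordinate (1-based, inclusive)
    end: int      # higher coordinate (1-based, inclusive)
    strand: str   # "+" if written low-to-high, "-" if written high-to-low

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def parse_interval(text: str) -> Exon:
    """Parse a coordinate pair such as '142,360–142,277' (en dash or hyphen)."""
    parts = re.split(r"\s*[–-]\s*", text.strip())
    if len(parts) != 2:
        raise ValueError(f"not a coordinate pair: {text!r}")
    first, second = (int(p.replace(",", "")) for p in parts)
    if first <= second:
        return Exon(first, second, "+")
    return Exon(second, first, "-")


# The three exons matched by the 139776 cDNA, as written in the text.
exons = [parse_interval(s) for s in
         ("142,360–142,277", "132,415–132,316", "128,005–127,436")]
print([(e.strand, e.length) for e in exons])   # [('-', 84), ('-', 100), ('-', 570)]
```

All three exons are written in descending order, so under the stated assumption this transcript lies on the strand opposite to CBR and CBR3.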
The amino acid sequences predicted from nucleotide positions 2624–2083, 118,478–119,170, and 156,427–154,976 were very similar to those of ribosomal proteins L23a (93.5%), S9 (85.8%), and L3 (90.1%), whose genes are located on human chromosomes 17, 19, and 22, respectively. However, they may be processed pseudogenes because they are interrupted by some stop codons and no exon–intron structure was found.
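One of the two criteria used above, interruption of the reading frame by stop codons, is simple to test. The sketch below scans a single reading frame for stop codons that occur before the last codon; the helper names and the toy sequences are hypothetical and are not taken from AB003151. For a region given in descending coordinates, the reverse complement would be scanned instead.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement, for regions annotated on the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_frame_stops(seq: str, frame: int = 0) -> list[int]:
    """0-based offsets of every stop codon in the chosen reading frame."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]


def premature_stops(seq: str, frame: int = 0) -> list[int]:
    """Stop codons that occur before the final complete codon of the frame."""
    last_codon = frame + 3 * ((len(seq) - frame) // 3 - 1)
    return [i for i in in_frame_stops(seq, frame) if i < last_codon]


intact = "ATGGGGCCCTGA"        # toy ORF: only the terminal stop
interrupted = "ATGTAAGGGTGA"   # toy ORF: an internal TAA
print(premature_stops(intact))        # []
print(premature_stops(interrupted))   # [3]
```

A region with high similarity to a ribosomal protein mRNA but with a non-empty list of premature stops would be a candidate pseudogene; the absence of introns would have to be assessed separately from the alignment.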
Using a shotgun strategy, we completed the sequence of a 170-kb PAC clone that was anchored with CBR and D21S333. The sequence comparison revealed a novel gene, designated CBR3, 62 kb downstream from CBR in addition to three ribosomal protein pseudogenes and some genes with unknown function. These results show that the integration of a variety of sequences in the databases with the genomic sequence is more precise and efficient for gene identification and mapping than any other method under rapid sequencing of PAC (or BAC) clones.
REFERENCES
1. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215: 403–410.
2. Boguski, M. S., Lowe, T. M., and Tolstoshev, C. M. (1993). dbEST—Database for "expressed sequence tags." Nat. Genet. 4: 332–333.
3. Forrest, G. L., Akman, S., Krutzig, S., Paxton, R. J., Sparkes, R. S., Doroshow, J., Felsted, R. L., Glover, C. J., Mohandas, T., and Bachur, N. R. (1990). Induction of a human carbonyl reductase gene located on chromosome 21. Biochim. Biophys. Acta 1048: 149–155.
4. Gardiner-Garden, M., and Frommer, M. (1987). CpG islands in vertebrate genomes. J. Mol. Biol. 196: 261–282.
5. Hillier, L., Lennon, G., Becker, M., Bonaldo, M. F., Chiapelli, B., Chissoe, S., Dietrich, N., Dubuque, T., Favello, A., Gish, W., Hawkins, M., Hultman, M., Kucaba, T., Lacy, M., Le, M., Le, N., Mardis, E., Moore, B., Morris, M., Parsons, J., Prange, C., Rifkin, L., Rohlfing, T., Schellenberg, K., Soares, M. B., Tan, F., Thierrymeg, J., Trevaskis, E., Underwood, K., Wohldman, P., Waterston, R., Wilson, R., and Marra, M. (1996). Generation and analysis of 280,000 human expressed sequence tags. Genome Res. 6: 807–828.
6. Ioannou, A. P., Amemiya, C. T., Garnes, J., Kroisel, P. M., Shizuya, H., Chen, C., Batzer, M. A., and de Jong, P. J. (1994). A new bacteriophage P1-derived vector for propagation of large human DNA fragments. Nat. Genet. 6: 84–89.
7. Lennon, G., Auffray, C., Polymeropoulos, M., and Soares, M. B. (1996). The I.M.A.G.E. Consortium: An integrated molecular analysis of genomes and their expression. Genomics 33: 151–152.
8. Maniatis, T., Fritsch, E. F., and Sambrook, J. (1989). "Molecular Cloning: A Laboratory Manual", 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
TABLE 1
Homologous Sequences from DNA Data Bank Found to GRAIL-Predicted Exons

Exon ID  Quality    Position         Program         Accession No.  Definition                            P (N)
F06      Good       29,808–29,914    BLASTn (EST)    AA083235       Homo sapiens cDNA clone 546810 3′     2.6 × 10^-37
                                     BLASTn (EST)    T48498         Homo sapiens cDNA clone 70506 3′      2.9 × 10^-37
                                     BLASTn (EST)    W07786         Homo sapiens cDNA clone 300856 5′     4.1 × 10^-37
                                     BLASTn (EST)    R09552         Homo sapiens cDNA clone 128097 3′     1.1 × 10^-36
F08      Good       31,735–31,902    BLASTn (EST)    W07786         Homo sapiens cDNA clone 300856 5′     1.5 × 10^-45
                                     BLASTn (EST)    R09552         Homo sapiens cDNA clone 128097 3′     1.3 × 10^-12
                                     BLASTn (EST)    AA083235       Homo sapiens cDNA clone 546810 3′     0.61
F14      Excellent  56,149–56,461    BLASTn (nr-nt)  J4056          Human carbonyl reductase mRNA         Identical
                                     BLASTx (nr-aa)  P16152         Human carbonyl reductase              Identical
F15      Good       57,007–57,114    BLASTn (nr-nt)  J4056          Human carbonyl reductase mRNA         Identical
                                     BLASTx (nr-aa)  P16152         Human carbonyl reductase              Identical
F17      Excellent  58,503–58,939    BLASTn (nr-nt)  J4056          Human carbonyl reductase mRNA         Identical
                                     BLASTx (nr-aa)  P16152         Human carbonyl reductase              Identical
F33      Good       116,534–117,402  BLASTn (EST)    N36700         Homo sapiens cDNA clone 269230 5′     4.1 × 10^-141
                                     BLASTn (EST)    AA134689       Homo sapiens cDNA clone 531926 5′     3.1 × 10^-112
                                     BLASTx (nr-aa)  U64587         Caenorhabditis elegans cDNA CEESS08F  2.0 × 10^-84
                                     BLASTx (nr-aa)  P47085         Yeast hypothetical 38.5-kDa protein   2.5 × 10^-59
F35      Excellent  118,512–118,632  BLASTn (nr-nt)  U14971         Human ribosomal protein S9 mRNA       1.5 × 10^-32
                                     BLASTx (nr-aa)  P46781         Human 40S ribosomal protein S9        1.5 × 10^-15
F36      Good       118,929–119,055  BLASTn (nr-nt)  U14971         Human ribosomal protein S9 mRNA       1.8 × 10^-29
                                     BLASTx (nr-aa)  P46781         Human 40S ribosomal protein S9        2.6 × 10^-13
F37      Excellent  121,231–121,543  BLASTn (nr-nt)  J4056          Human carbonyl reductase mRNA         4.2 × 10^-99
                                     BLASTn (EST)    AA320697       EST23140 adipose tissue, white II     1.1 × 10^-16
                                     BLASTx (nr-aa)  P16152         Human carbonyl reductase              1.0 × 10^-52
F38      Excellent  123,883–123,990  BLASTn (EST)    AA320697       EST23140 adipose tissue, white II     1.7 × 10^-30
                                     BLASTx (nr-aa)  P16152         Human carbonyl reductase              7.4 × 10^-11
F39      Excellent  132,136–132,542  BLASTn (nr-nt)  J4056          Human carbonyl reductase mRNA         3.4 × 10^-72
                                     BLASTn (EST)    AA070863       Homo sapiens cDNA clone 529844 3′     9.3 × 10^-123
                                     BLASTn (EST)    AA071157       Homo sapiens cDNA clone 529844 5′     6.3 × 10^-42
                                     BLASTn (EST)    AA320697       EST23140 adipose tissue, white II     6.3 × 10^-41
                                     BLASTn (EST)    R62251         Homo sapiens cDNA clone 139776 5′     4.4 × 10^-34
                                     BLASTx (nr-aa)  P16152         Human carbonyl reductase              5.7 × 10^-61
R17      Excellent  2,493–2,137      BLASTn (nr-nt)  U43701         Human ribosomal protein L23a mRNA     6.3 × 10^-127
                                     BLASTx (nr-aa)  P39024         Human 60S ribosomal protein L23a      8.8 × 10^-70
R12      Excellent  39,677–39,640    BLASTn (EST)    N78701         Homo sapiens cDNA clone 300856 3′     3.5 × 10^-8
R10      Excellent  43,261–43,166    BLASTn (EST)    R82136         2E9 Homo sapiens genomic 5′ and 3′    1.9 × 10^-33
                                     BLASTn (EST)    N78701         Homo sapiens cDNA clone 300856 3′     1.1 × 10^-28
                                     BLASTn (EST)    AA83234        Homo sapiens cDNA clone 546810 5′     7.9 × 10^-14
R03      Excellent  155,597–155,031  BLASTn (nr-nt)  X73460         Human ribosomal protein L3 mRNA       3.6 × 10^-195
                                     BLASTx (nr-aa)  P39023         Human 60S ribosomal protein L3        1.5 × 10^-95
R02      Excellent  155,786–155,655  BLASTn (nr-nt)  X73460         Human ribosomal protein L3 mRNA       2.7 × 10^-36
                                     BLASTx (nr-aa)  P39023         Human 60S ribosomal protein L3        7.8 × 10^-18
FIG. 2. Sequence comparison between CBR and CBR3. The CBR3 gene was constructed from three exons predicted from the PAC sequence with the aid of the computer program GRAIL and compared with CBR. These exons were expressed in EST (nucleotides 246–540: GenBank Accession No. AA320697) and IMAGE Consortium cDNA (nucleotides 366–902: Accession No. 529844). The initiation site for transcription was tentatively localized to the upmost region of CBR (11). The NADP-binding domain and short-chain dehydrogenase/reductase family signature are shown in a box. Identical nucleotides and amino acid residues are indicated with hyphens. The stop codons are denoted by an asterisk, and solid triangles indicate exon–intron boundaries.
9. Osoegawa, K., Susukida, R., Okano, S., Kudoh, J., Minoshima, S., Shimizu, N., de Jong, P. J., Groet, J., Ives, J., Lehrach, H., Nizetic, D., and Soeda, E. (1996). An integrated map with cosmid/PAC contigs of a 4-Mb Down syndrome critical region. Genomics 32: 375–387.
10. Uberbacher, E. C., Xu, Y., and Mural, R. J. (1996). Discovering and understanding genes in human DNA sequence using GRAIL. Methods Enzymol. 266: 554–571.
11. Wermuth, B., Bohren, K. M., Heinemann, G., von Wartburg, J. P., and Gabbay, K. H. (1988). Human carbonyl reductase. Nucleotide sequence analysis of a cDNA and amino acid sequence of the encoded protein. J. Biol. Chem. 263: 16185–16188.
12. Wolfsberg, T. G., and Landsman, D. (1997). A comparison of expressed sequence tags (ESTs) to human genomic sequences. Nucleic Acids Res. 25: 1626–1632.
13. Yamazaki, M., Ayako, O., Watanabe, K., Sasaki, K., Tashiro, H., and Nomura, T. (1995). Nucleotide sequence surrounding the locus marker D21S246 on human chromosome 21. DNA Res. 2: 187–189.