structural analysis of arabidopsis thaliana - dna research

Downloaded from https://academic.oup.com/dnaresearch/article/6/3/183/421513 by guest on 08 January 2022

Page 1: Structural Analysis of Arabidopsis thaliana - DNA Research

DNA RESEARCH 6, 183-195 (1999) Short Communication

Structural Analysis of Arabidopsis thaliana chromosome 5. IX. Sequence Features of the Regions of 1,011,550 bp Covered by Seventeen P1 and TAC Clones

Takakazu KANEKO, Tomohiko KATOH, Shusei SATO, Yasukazu NAKAMURA, Erika ASAMIZU, Hirokazu KOTANI, Nobuyuki MIYAJIMA, and Satoshi TABATA*

Kazusa DNA Research Institute, 1532-3 Yana, Kisarazu, Chiba 292-0812, Japan

(Received 21 May 1999)

Abstract

In this series of projects for sequencing the entire Arabidopsis thaliana chromosome 5, non-redundant P1 and TAC clones have been sequenced according to the fine physical map, and as of May 7, 1999, sequences totaling 16.2 Mb, representing approximately 60% of chromosome 5, have been accumulated and released at our web site. In parallel, structural features of the sequenced regions have been analyzed by applying a variety of computer programs, and to date we have predicted a total of 2380 potential protein-coding genes in the 10,154,580 bp regions, which are covered by 142 P1 and TAC clones. In this paper, we newly analyzed the structural features of the 1,011,550 bp regions covered by an additional 17 P1 and TAC clones, and predicted 298 protein-coding genes. The average density of the genes identified was 1 gene per 3394 bp. Introns were observed in 67% of the genes, and the average number of introns per gene and the average intron length were 3.2 and 159 bp, respectively. The gene density is higher than the value estimated for the previously analyzed regions (1 gene per 4,267 bp) because the data in this paper were compiled under a new standard of gene assignment that includes computer-predicted hypothetical genes. The regions also contained 8 tRNA genes, identified by similarity to reported tRNA genes and by the tRNAscan-SE program. The sequence data and information on the potential genes are available in the database KAOS (Kazusa Arabidopsis data Opening Site) at http://www.kazusa.or.jp/arabi/.

Key words: Arabidopsis thaliana chromosome 5; genomic sequence; P1 genomic library; TAC genomic library; gene prediction

We have been conducting a sequencing project on the genome of the dicot model plant Arabidopsis thaliana, which is estimated to be approximately 120 Mb long. We focused on chromosomes 5 and 3 among the five chromosomes, and constructed accurate contig maps of both chromosomes with clones from YAC, P1, TAC, and BAC libraries.¹,² For DNA sequencing, we first isolated the chromosome-specific clones from the P1 and TAC libraries, and then sequenced the P1 and TAC clones physically assigned to the chromosomes. As of May 7, 1999, regions totaling 16.2 Mb, representing approximately 60% of chromosome 5, have been sequenced and the data have been released at our web site KAOS (Kazusa Arabidopsis data Opening Site, http://www.kazusa.or.jp/arabi/). In parallel, potential genes in the sequenced regions have been analyzed using a variety of computer programs for similarity search and gene modeling, and we have so far predicted the potential genes in a total of 10,154,580 bp covered by 142 P1 and TAC clones.³⁻¹⁰ In this paper, we newly analyzed the structural features of the 1,011,550 bp regions covered by an additional 17 P1 and TAC clones.

Communicated by Mituru Takanami
* To whom correspondence should be addressed. Tel. +81-438-52-3933, Fax. +81-438-52-3934. E-mail: [email protected]

1. Isolation and Sequencing of P1 and TAC Clones

DNA sources and the method of clone isolation were essentially the same as described in the previous paper.³ The P1 and TAC clones containing the DNA regions that cover a total of 17 DNA markers on chromosome 5 were isolated by screening the Mitsui P1¹¹ and TAC¹² libraries by means of polymerase chain reaction (PCR) with primers designed from the sequence information of the DNA markers. The DNA markers and the selected clones (in parentheses) are: ends of MUG13 and K18I23 (K2A11), CIC10D2L (K18J17), CIC4D4 (K19M13), CIC5E10 (MJG14), CIC5E10 (K12B20), ends of MOP12 and MNF13 (K21I16), g4028 (MJC20), MWF20_left end (MNL12), K9L2_right end (MFC16), CIC11H2 (MFC19), CIC11G8R (MDN11), K24G6_right end (K19E20), CIC11F10 (K3K7), MNC6_right end (MGN6), ends of MBG8 and MCO15 (K13P22), and MUA2_right end (MRI1). MTH16 was directly isolated as a clone showing restriction fragment length polymorphism (RFLP) when used as a probe for genomic Southern hybridization.¹³ The relative positions of the markers and the sequenced clones on chromosome 5 are shown in Fig. 1. The relative orientation of each clone and contig on the chromosome was confirmed by anchoring both ends of the clone to the corresponding positions of the contig map.

The nucleotide sequence of each P1 or TAC insert was determined according to the bridging shotgun method described previously.³ The length of the finally confirmed nucleotide sequence of each P1 or TAC insert is indicated together with the clone names in Fig. 2.

[Figure 1: physical map of chromosome 5 drawn as a vertical open bar with a scale in Mbp (ticks at 10, 20, and 30). Clones shown at the right, from top to bottom: K2A11, K18J17, MTH16, K19M13, MJG14, K12B20, K21I16, MJC20, MNL12, MFC16, MFC19, MDN11, K19E20, K3K7, MGN6, K13P22, MRI1.]

Figure 1. Relative locations of the sequenced P1 and TAC clones and the associated markers on the physical map of chromosome 5. The positions of DNA markers used for P1 and TAC isolation and of other major DNA markers were localized on the map on the basis of the YAC tiling path and map information in refs. 1 and 2. The vertical open bar represents the entire length of chromosome 5. The names of P1 and TAC clones are given at the right side, and those of markers at the left side. The distance (Mbp) from the telomeric site of the top arm is given in the vertical scale.
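As a rough consistency check on the insert lengths printed beside the clone names in Fig. 2, the sketch below assumes that the reported total of 1,011,550 bp is the plain sum of the 17 insert lengths, with no correction for overlaps between neighbouring clones. Sixteen lengths are entered as read from the figure, and the MTH16 length is derived as the remainder rather than entered directly. The dictionary and function names are illustrative only.

```python
# Insert lengths (bp) as printed beside the clone names in Fig. 2.
# MTH16 is deliberately left out: its length is derived below as the remainder.
INSERT_LENGTHS_BP = {
    "MJC20": 83689, "K3K7": 68726, "MDN11": 83373, "K12B20": 78874,
    "K13P22": 19971, "K18J17": 54252, "K19E20": 61712, "K19M13": 42563,
    "K21I16": 30578, "K2A11": 29292, "MFC16": 61290, "MFC19": 85020,
    "MGN6": 61380, "MNL12": 45911, "MJG14": 86121, "MRI1": 50700,
}

# Total length of the analyzed regions, as stated in the title and abstract.
TOTAL_ANALYZED_BP = 1_011_550


def remaining_length(lengths: dict, total_bp: int) -> int:
    """Length left for the one clone missing from `lengths`,
    assuming the total is a simple sum of insert lengths."""
    return total_bp - sum(lengths.values())


mth16_bp = remaining_length(INSERT_LENGTHS_BP, TOTAL_ANALYZED_BP)
print(len(INSERT_LENGTHS_BP) + 1, "clones; MTH16 remainder:", mth16_bp, "bp")
```

Under the simple-sum assumption the remainder is 68,098 bp, which is a plausible insert size next to the other sixteen values.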

2. Assignment of Potential Coding Regions

For assignment of the protein-coding regions and gene modeling, similarity search and computer prediction were performed as described in the previous paper³ with modification. Briefly, a similarity search against the non-redundant protein sequence database nr (compiled by NCBI) was carried out using the BLASTX¹⁴ program. In parallel, the positions of potential protein-coding regions were predicted with the Grail,¹⁵ GENSCAN¹⁶ and NetGene2 computer programs.¹⁷ The transcribed regions were assigned by comparison of the nucleotide sequences with Arabidopsis ESTs¹⁸,¹⁹ in the public databases using the BLASTN program.¹⁴ All the results obtained were compiled with the aid of our new web-based tool, named Arabidopsis Genome Displayer (manuscript in preparation), and the potential protein-coding genes were then assigned by taking both similarity to known genes and computer prediction into consideration. Consequently, regions predicted only by the computer programs, with no apparent similarity to known genes, were also assigned as genes, whereas such computer-predicted hypothetical genes had not been included in the previous analyses.³⁻¹⁰ To sum up, 298 potential protein-coding genes, as well as 13 partial genes located at the terminal regions of the clones and 6 pseudogenes, were assigned in the 1,011,550 bp regions. The average gene density is 1 gene per 3394 bp. We previously analyzed the structural features of a total of 10,154,580 bp and predicted 2380 potential protein-coding genes, corresponding to 1 gene per 4,267 bp. This difference arises because the data in this paper were compiled based on the new standard of gene assignment described above.
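The two gene densities quoted above follow directly from the region sizes and gene counts. A minimal sketch of the arithmetic, assuming each density is the analyzed length divided by the number of complete protein-coding genes and rounded to the nearest base pair (the function name is hypothetical):

```python
def bp_per_gene(region_bp: int, n_genes: int) -> int:
    """Average spacing in bp per gene, rounded to the nearest bp."""
    return round(region_bp / n_genes)


# Regions analyzed in this paper: 298 genes in 1,011,550 bp.
density_new = bp_per_gene(1_011_550, 298)

# Regions analyzed previously: 2380 genes in 10,154,580 bp.
density_previous = bp_per_gene(10_154_580, 2380)

print(density_new, density_previous)  # 3394 4267
```

Partial genes and pseudogenes are left out of the count here, since only that choice reproduces the stated figure of 1 gene per 3394 bp.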

The RNA-coding regions were assigned on the basis of sequence similarity to the reported structural RNAs. For tRNA genes, prediction by the tRNAscan-SE program²⁰ was also taken into account. As indicated in Fig. 2, 8 tRNA genes corresponding to 6 amino acid species and a gene for 7SL RNA were identified in the 1,011,550 bp regions. Both potential protein- and RNA-coding genes were denoted by the clone name followed by a sequential number assigned from one end of the insert to the other; they are listed in the table below each figure panel and are also schematically represented in Fig. 2.
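The naming scheme described above (clone name, a dot, then a sequential number, as in MJC20.12) lends itself to simple parsing and ordering. The following is a sketch only: the regular expression and the `GeneId` type are assumptions made for illustration, not part of any tool described in the paper.

```python
import re
from typing import NamedTuple


class GeneId(NamedTuple):
    clone: str   # P1 or TAC clone name, e.g. "MJC20"
    index: int   # sequential number along the insert, e.g. 12


# Assumed pattern: upper-case letters and digits, a dot, then a number.
_GENE_ID = re.compile(r"^([A-Z][A-Z0-9]*)\.(\d+)$")


def parse_gene_id(identifier: str) -> GeneId:
    """Split an identifier such as 'K12B20.19' into clone and number."""
    match = _GENE_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a clone.number identifier: {identifier!r}")
    return GeneId(match.group(1), int(match.group(2)))


# Identifiers taken from the text; sorting orders them by clone, then position.
ids = ["K12B20.19", "MJC20.12", "K12B20.6", "MRI1.1"]
ordered = sorted(parse_gene_id(i) for i in ids)
print(ordered)
```

Parsing the number as an integer matters for ordering: as plain strings, "K12B20.19" would sort before "K12B20.6".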

3. Structural Features of Potential Protein Coding Genes

In this paper, the complete structures of 298 potential protein-coding genes were predicted. Structural features of these genes, as well as those of the 1901 genes that include the previously identified ones, are listed in Table 1. The 1901 genes account for approximately 9.5% of the total gene complement (2 × 10⁴ genes) assumed for A. thaliana. Approximately 76% of the protein-coding genes contained introns, and the average number of introns per gene and their average length were 3.8 and 175 bp, respectively.

Table 1. Structural features of potential protein-coding genes in A. thaliana chromosome 5.

Features                              298 genes a)        1901 genes b)
Gene length (bp) including introns    62-10,618 (1615)    62-11,377 (1963)
Product length (amino acids)          19-1962 (368)       19-2756 (430)
Genes with introns                    200                 1437
Number of intron/gene                 0-34 (3.2)          0-42 (3.8)
Exon length (bp)                      6-3558 (263)        2-4287 (267)
Intron length (bp)                    25-1738 (159)       8-5405 (175)
GC content of exons                   43%                 43%
GC content of introns                 31%                 32%

Structural features of the potential protein-coding genes assigned so far are listed. a) The 298 genes are assigned based on the new standard in this study. b) The 1901 genes include 1603 potential protein genes previously assigned. Average values are shown in parentheses.
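The percentages in this section can be re-derived from the counts in Table 1 (200 and 1437 genes with introns) and the assumed total of 2 × 10⁴ genes. A short worked check, with a hypothetical helper name:

```python
def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, in percent."""
    return 100.0 * part / whole


ASSUMED_TOTAL_GENES = 20_000  # 2 x 10^4 genes assumed for A. thaliana

# Share of the assumed gene complement covered by the 1901 genes.
share_of_complement = percent(1901, ASSUMED_TOTAL_GENES)

# Genes with introns among all 1901 genes, and among the 298 new genes.
with_introns_all = percent(1437, 1901)
with_introns_new = percent(200, 298)

print(round(share_of_complement, 1), round(with_introns_all), round(with_introns_new))
```

The three values round to 9.5, 76, and 67, in agreement with the figures given in the text and the abstract.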

4. Expression Level of Potential Protein Coding Genes and Gene Segments

The nucleotide sequence of each of the potential protein-coding genes was compared with those in the Arabidopsis EST database, and the number of matched Arabidopsis ESTs was counted to monitor the transcriptional level of each gene. Of the 298 complete and 13 partial genes that we identified on chromosome 5 in this study, 104 carried matched ESTs. The putative products of the genes matched by 10 or more ESTs, which suggests that they belong to a class of highly expressed genes, include those showing sequence similarity to glutamate-ammonia ligase in A. thaliana (K12B20.6), microbody NAD-dependent malate dehydrogenase in A. thaliana (MTH16.8), CONSTANS-like protein 2 in Malus domestica (MRI1.1), the ClpC product in A. thaliana (K3K7.7), luminal binding protein in A. thaliana (MJC20.12), and the ALY transcriptional coactivator in Mus musculus (K12B20.19).
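The EST-based screen described above amounts to counting matches per gene and applying a threshold of 10. The sketch below illustrates that rule; the class labels other than the 10-EST cut-off, the gene identifiers, and the counts are all made up for illustration and are not data from the paper.

```python
from collections import Counter

HIGH_EXPRESSION_MIN_ESTS = 10  # cut-off used in the text


def expression_class(n_matched_ests: int) -> str:
    """Label a gene by its number of matched ESTs (labels are illustrative)."""
    if n_matched_ests >= HIGH_EXPRESSION_MIN_ESTS:
        return "high"
    if n_matched_ests > 0:
        return "detected"
    return "no EST"


# Made-up EST counts for hypothetical gene identifiers.
example_counts = {"CLONE1.1": 0, "CLONE1.2": 3, "CLONE1.3": 10, "CLONE2.1": 27}
summary = Counter(expression_class(n) for n in example_counts.values())
print(dict(summary))

# From the text: 104 of the 298 complete plus 13 partial genes carried ESTs.
fraction_with_est = 104 / (298 + 13)
print(f"{fraction_with_est:.1%} of the genes had at least one matched EST")
```

On the figures given in the text, roughly one third of the complete and partial genes had at least one matched EST.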

The sequence data as well as the gene information shown in this paper are available through the World Wide Web at http://www.kazusa.or.jp/arabi/.

Acknowledgments: We thank S. Sasamoto and K. Idesawa for excellent technical assistance and the members of the DNA Sequencing Laboratory, T. Kimura, T. Hosouchi, K. Kawashima, M. Matsumoto, A. Matsuno, A. Muraki, N. Nakazaki, S. Shinpo, C. Takeuchi, T. Wada, A. Watanabe, M. Yamada, and M. Yasuda, for their excellent teamwork. We are grateful to A. Tanaka for technical advice, and to the Mitsui Plant Biotechnology Research Institute and the Arabidopsis Biological Resource Center at the Ohio State University for providing the DNA markers and the DNA libraries. This work was supported by the Kazusa DNA Research Institute Foundation. We thank M. Takanami and M. Oishi for their support and encouragement to perform this project.

References

1. Kotani, H., Sato, S., Liu, Y.-G. et al. 1997, A fine physical map of Arabidopsis thaliana chromosome 5: Construction of a sequence-ready contig map, DNA Res., 4, 371-378.

2. Sato, S., Kotani, H., Hayashi, R. et al. 1998, A physical map of Arabidopsis thaliana chromosome 3 represented by two contigs of CIC YAC, P1, TAC and BAC clones, DNA Res., 5, 163-168.

3. Sato, S., Kotani, H., Nakamura, Y. et al. 1997, Structural analysis of Arabidopsis thaliana chromosome 5. I. Sequence features of the 1.6 Mb regions covered by twenty physically assigned P1 clones, DNA Res., 4, 215-230.

4. Kotani, H., Nakamura, Y., Sato, S. et al. 1997, Structural analysis of Arabidopsis thaliana chromosome 5. II. Sequence features of the regions of 1,044,062 bp covered by thirteen physically assigned P1 clones, DNA Res., 4, 291-300.

5. Nakamura, Y., Sato, S., Kaneko, T. et al. 1997, Structural analysis of Arabidopsis thaliana chromosome 5. III. Sequence features of the regions of 1,191,918 bp covered by seventeen physically assigned P1 clones, DNA Res., 4, 401-414.

6. Sato, S., Kaneko, T., Kotani, H. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. IV. Sequence features of the regions of 1,456,315 bp covered by nineteen physically assigned P1 and TAC clones, DNA Res., 5, 41-54.

7. Kaneko, T., Kotani, H., Nakamura, Y. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. V. Sequence features of the regions of 1,381,565 bp covered by twenty one physically assigned P1 and TAC clones, DNA Res., 5, 131-145.

8. Kotani, H., Nakamura, Y., Sato, S. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. VI. Sequence features of the regions of 1,367,185 bp covered by 19 physically assigned P1 and TAC clones, DNA Res., 5, 203-216.


9. Nakamura, Y., Sato, S., Asamizu, E. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. VII. Sequence features of the regions of 1,013,767 bp covered by sixteen physically assigned P1 and TAC clones, DNA Res., 5, 297-308.

10. Asamizu, E., Sato, S., Kaneko, T. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. VIII. Sequence features of the regions of 1,081,958 bp covered by seventeen physically assigned P1 and TAC clones, DNA Res., 5, 379-391.

11. Liu, Y.-G., Mitsukawa, N., Vazquez-Tello, A., and Whittier, R. F. 1995, Generation of a high-quality P1 library of Arabidopsis suitable for chromosome walking, Plant J., 7, 351-358.

12. Liu, Y.-G., Shirano, Y., Fukaki, H. et al. Complementation of plant mutants with large genomic DNA fragments by a transformation-competent artificial chromosome vector accelerates positional cloning, Proc. Natl. Acad. Sci. USA, in press.

13. Shibata, D., Seki, M., Mitsukawa, N. et al. Establishment of framework P1 clones for map-based cloning and genome sequencing: direct RFLP mapping of large clones, Gene, 225, 31-38.

14. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. 1990, Basic local alignment search tool, J. Mol. Biol., 215, 403-410.

15. Uberbacher, E. C. and Mural, R. J. 1991, Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach, Proc. Natl. Acad. Sci. USA, 88, 11261-11265.

16. Burge, C. and Karlin, S. 1997, Prediction of complete gene structures in human genomic DNA, J. Mol. Biol., 268, 78-94.

17. Hebsgaard, S. M., Korning, P. G., Tolstrup, N., Engelbrecht, J., Rouze, P., and Brunak, S. 1996, Splice site prediction in Arabidopsis thaliana DNA by combining local and global sequence information, Nucl. Acids Res., 24, 3439-3452.

18. Newman, T., Bruijn, F. J., and Green, P. 1994, Genes galore: A summary of methods for accessing results from large-scale partial sequencing of anonymous Arabidopsis cDNA clones, Plant Physiol., 106, 1241-1255.

19. Cooke, R., Raynal, M., Laudie, M. et al. 1996, Further progress towards a catalogue of all Arabidopsis genes: analysis of a set of 5000 non-redundant ESTs, Plant J., 9, 101-124.

20. Lowe, T. M. and Eddy, S. R. 1997, tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence, Nucl. Acids Res., 25, 955-964.


[Figure 2, panel MJC20 (83689 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit, and Gene for each strand, showing genes MJC20.1 to MJC20.31, followed by the table of deduced genes with the columns identifier, direction, position (5', 3'), No. of exon, No. of EST, and length, and, for the most similar sequence, ID, overlap, identity, and definition.]

Figure 2. Gene organization in the 17 P1 and TAC clones. Positions of the identified or predicted genes in each insert of the P1 and TAC clones are schematically represented by color-coded boxes above (rightward) and below (leftward) the wide line in the middle, which represents the entire insert sequence. The length of the sequenced region in each insert is given in parentheses together with the clone name at the top. The names of the adjacent overlapping clones whose sequences had been reported are shown on the middle bars. Arrowheads indicate the direction of the DNA strands (5' to 3'). Dark and faint blue bars with numbers represent the positions of the assigned potential protein-coding genes, and of pseudo and partial genes, respectively, and red bars the positions of RNA-coding genes. Gray bars indicate the positions of the regions that matched Arabidopsis ESTs. The regions that showed similarity to sequences in the protein database are shown by yellow, orange and red bars, which correspond to BLASTX scores of 70-100, 100-250, and 250 or more, respectively. The green bars indicate the positions of the potential exons predicted by the Grail program. Each of three different colors with increasing depth corresponds to regions with Grail scores of less than 70, 70-90, and 90 or more, respectively. The potential protein- and RNA-coding genes assigned as described in the text are listed below each panel. In these tables, the number of amino acid residues and the nucleotide length (in italic) of the putative gene products of the respective potential protein- and RNA-coding genes are indicated. The accession numbers are as follows: AB018107 (K12B20), AB017059 (K13P22), AB017060 (K18J17), AB017061 (K19E20), AB018110 (K19M13), AB017062 (K21I16), AB018111 (K2A11), AB017063 (K3K7), AB017064 (MDN11), AB017065 (MFC16), AB018113 (MFC19), AB017066 (MGN6), AB017067 (MJC20), AB017068 (MJG14), AB017070 (MNL12), AB018118 (MRI1), and AB020752 (MTH16).
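The color coding in the caption is a pair of score-binning rules. The sketch below expresses them as functions. It rests on two assumptions that the caption does not settle: each boundary value (100, 250, 70, 90) is placed in the higher class, and BLASTX scores below 70 are not drawn at all. The function names and the numeric depth levels are hypothetical.

```python
from typing import Optional


def blastx_bar_color(score: float) -> Optional[str]:
    """Color of a protein-database hit bar for a given BLASTX score.

    Returns None for scores below 70, assumed here not to be drawn.
    """
    if score >= 250:
        return "red"
    if score >= 100:
        return "orange"
    if score >= 70:
        return "yellow"
    return None


def grail_green_depth(score: float) -> int:
    """Depth of green for a Grail exon: 1 (lightest) to 3 (deepest)."""
    if score >= 90:
        return 3
    if score >= 70:
        return 2
    return 1


for s in (65, 70, 180, 250):
    print(s, blastx_bar_color(s), grail_green_depth(s))
```

Placing each boundary in the higher class is only one reading of ranges such as "70-100, 100-250"; the opposite convention would change the result only at the exact boundary values.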

Dow

nloaded from https://academ

ic.oup.com/dnaresearch/article/6/3/183/421513 by guest on 08 January 2022

Page 6: Structural Analysis of Arabidopsis thaliana - DNA Research

1SS

K3K7 (68726 bp)

Sequencing of Artibttlopst* thai/ana chromosome o [Vol. 0.

ii 1 1 1 mai iiiHiiiiia II II I Ii i II

i iin

K16E14

I I I I I I I8 9 *> « QO MC 1* 22 2324 27

• • I — • • — I • I I • MWD22

I I III I• ! • •

STTO20 21IiI I I l l l II I i | |

i i • I I • • I I I II I I • m a i i i m i

Grail exonProtein db hit

ESTdbhit

Gene

Gene

ESTdbhit

Protein db hK

Grail exon

rTH [niormauoVT

K3K7 .'.

K3K7 3

<"JK7 10

\ 3 K 7 11

\ 3 K 7 1 fi

\ 3 K 7 1 fi

192 e i I 3 6 2 1 0 3 pi i

nb CAB 10063.1

i 3 * 7 5 6 5 ? » r n b C A A 9 1 160

t 1 K ) « « 0 1 f-mb C A B 3 * 2 O i

i J31J3J pdb 6 R \ N

i ICTI ' IV* Hbj BAA 1* 166

1«8 ai 3 8 6 1 1 13 f-mb t ' A A I

2«' i 2« 9 I A L O 1 9 5 2 T ! p s n a We p r o t f i n A ihahaua

511 36 1 ( A C 0 0 3 6 « 0 ) u n k n o w n p r o i e i t i A tlmliHi*

III 17 3 i .AI_01<M«n p u n i v D N A - ' i i r < " T c d U N A | » ' I V I : I . T 3

566 95 5 ( A F Q 2 0 3 0 3 1 f u m a r a ^ - A ilml^im

3 0 3 57 9 ( A L 0 ' S 1 T « 3 ) p u t ; f i v e n u H » « i i d e b i n d i n g p r '> tHn

172 21 3 1 / 6 6 5 6 1 ) C ' o i n a i i s b e t a - i r a n s d ncit i f a m i i v p r ' i ^ i t ^

cfi-isf ( . - p r o t e i n s

19« 38 .3 ( A I . 0 3 5 6 0 1 ) p u t a i ve p r o t e i n A thub.Hiia

i R N A - T v n C F A l Pc,^\ \,\r i rn ron - 11166- H 1 5 1

316 1 5 . * l A ( ' Q G 5 7 2 1 ) u n k n o w n p r o t e i n \ ilmlmtin

36 51 1 R i i b r e d o x i n18_> 25 6 ( D < J 0 9 1 1 ) h v p o t h ^ i i c a l p r o i f i n S> ttr< lux > ~ n - -(>

s2 100 0 sRNA-^en At; A] A (/i«J/«r^

001 'M.3 tAI'05535S) r^spi-atorv burss oxtdase prou-in (' A

371 g-J.fi [panial] FRDt F IOTIIIN P K K t T K ^ O K

MDNll (83373 bp)

I I I I

12 3K16F13 ( I |

• IImill

• i i urn 11in IIi1 i '14 e « 7

U - 7 ' .

M i l

I I I I l l l

MIF21

IlllI I I

20 21I

I I I I I l l l • I 51 I I I I I I I I I I I l l l

Grail exon

Protein db hit

ESTdbhit

Gene

Gene

ESTdbhit

Protein db hitGrail exon

deduced gene.

Dow

nloaded from https://academ

ic.oup.com/dnaresearch/article/6/3/183/421513 by guest on 08 January 2022

Page 7: Structural Analysis of Arabidopsis thaliana - DNA Research

I . K a n o k o ot al . ISO

K12B20 (78874 bp)

• 11 i am i II i i i n inII

7 9 12 14 18 18

I • • • I I •

II III

I I22 23

MPA22

3 4 S 6• I

IIH I I H I

n n

111

O B 17_ II

19 2021I I I I I

ii III i in ni i i

Grail exonProtein db hit

ESTdbhit

Gene

Gene

ESTdbhit

Protein db hit

Grail exon

ai JOT'S r>u emb C\\7'Vltv>

-tin i

Ml 1

' { ! 0 1

K13P22 (19971 bp)

II I I If III

I1 5

MBG8 I • MC015

2 34 6 7I I I

I I

Grail exonProtein db hit

ESTdbhit

Gene

Gene

ESTdbhit

Protein db hitGrai! exon

deduced genes

11* 1 1 l"i ai !JOO1JJr>

Dow

nloaded from https://academ

ic.oup.com/dnaresearch/article/6/3/183/421513 by guest on 08 January 2022

Page 8: Structural Analysis of Arabidopsis thaliana - DNA Research

Sequencing of Arubulopsi^ thtilmnit chromosome •> [Vol . ().

K18J17 (54252 bp)

•II • III

MJJ3

I I I12 3 4

I I I I III

ITH213141II II I

II III I I III I

I I16 W 2021222324 25

M n i l l ii i i

7 89 17 «I I

I H I ! i ii i mi

Grail exon

Protein db hit

ESTdbhit

Gene

Gene

ESTdbhit

Protein db hit

Grail exon

K19E20 (61712 bp)

I B M I1

1 II I1 3

K24G6 • • I

I I • • n i l i iniii i i n nI I i

8 9 HO 13 1416 17

• • • • • • I • •I I I

K20J1

I II I I I IMS) i i ii mi II

Grail exon

Protein db hit

ESTdbhit

Gene

Gene

EST db hit

Protein db hit

Grail exon

Dow

nloaded from https://academ

ic.oup.com/dnaresearch/article/6/3/183/421513 by guest on 08 January 2022

Page 9: Structural Analysis of Arabidopsis thaliana - DNA Research

No. ;il T . K a n c k o ot al. I!) I

K19M13 (42563 bp)

• i 1 1 1 ii iiim• i

i

T32G21

i i • i • • • i •2 3 45 « 7I (I 1 Jl I 11 II

I IIII ii nil mini i in i

SB 1112I I • ! QM1

Grail exonProtein db hit

ESTdbhit

Gene

Gene

ESTdbhit

Protein db hitGrail exon

K21I16 (30578 bp)

i I

I

MP012

IIIIIIII

I

',I

I I

MNF13

I I

Grail exonProtein db hit

ESTdbhit

Gene

Gene

ESTdbhit

Protein db hit

Grail exon

K2A11 (29292 bp)

• I I I

MUG13

I I1 3 5 6

"a " " II Hill 111

I Mii i i iaai INI

Grail exonProtein db hit

ESTdbhit

Gene

Gene

ESTdbhit

Protein db hitGrail exon

Dow

nloaded from https://academ

ic.oup.com/dnaresearch/article/6/3/183/421513 by guest on 08 January 2022

Page 10: Structural Analysis of Arabidopsis thaliana - DNA Research

192 Sequencing of Arub/dopsis fhahaiia chromosome •> [Vol. ().

MFC16 (61290 bp)

111111 M im

K9L2

ii • m i i ian i iii in n

III120 6181718

II • • ! •21 2S

• • K15C23

' • *~ I I I • 1 I1 3 5 6 78 9 O 11 14I I I II I I II I

• • II I19 20 222324

III I • I II I I B I

I S H I M • I I • ! II I I • ! • ! Illl

Grail exon

Protein db hit

ESTdbhit

Gene

Gene

ESTdbhit

Protein db hit

Grail exon

MFC19 (85020 bp)

ill i nun III iin I I I

K9E15

I IIS 6 7

' I I M i l

II II I II*

8 9II I

11 13I I

M S 17I III

I I II I II III I I I I I l l l I I I I • I I I I • M l •

I IIII

III

• K2N11

Grail exonProtein db hit

ESTdbhit

Gene

Gene

ESTdbhit

Protein db hit

Grail exon

. I i V

Dow

nloaded from https://academ

ic.oup.com/dnaresearch/article/6/3/183/421513 by guest on 08 January 2022

Page 11: Structural Analysis of Arabidopsis thaliana - DNA Research

MGN6 (61380 bp)

I I I I I I I I I I I l l l l l l l l l I

1 2 4 5 7 9 D H 12 13

MNC6 ii ii I ! • • M

T. Kanrko ct al.

I II IIII I

17 W 21 22• • I I K6O8

M« « « 20

I I III I

Grail exon

Protein db hit

EST db hit

Gene

Gene

ESTdbhK

Protein db hit

Grail exon

MNL12 (45911 bp)

I I I I H I I I

K24F534 5• IB

I

I II

IIv MWF20

I I • II •1 2 6 78 9

I II

• IV. IB Illl II

II 12 13

I III II

Grail exonProtein db hit

EST db hit

Gene

Gene

EST db hit

Protein db hitGrail exon

Dow

nloaded from https://academ

ic.oup.com/dnaresearch/article/6/3/183/421513 by guest on 08 January 2022

Page 12: Structural Analysis of Arabidopsis thaliana - DNA Research

of j \ i'al)/(lops/s thai /a mi chromosome1 .)

[Figure panel: MJG14 (86121 bp). Clone label: K15O15. Tracks on both strands: Grail exon, Protein db hit, EST db hit, Gene.]

[Figure panel: MRI1 (50700 bp). Clone labels: MUA2, MTI20. Tracks on both strands: Grail exon, Protein db hit, EST db hit, Gene.]

[Table: deduced genes; entries illegible.]


[Figure panel: MTH16 (C8098 bp). Tracks on both strands: Grail exon, Protein db hit, EST db hit, Gene.]

Deduced genes (MTH16). Columns: identifier; direction; position (5', 3'); no. of exons; no. of ESTs; length; information on the most similar sequence (sequence ID, overlap, identity, definition).

MTH16

MTH16MTH16

2

31

MTH16 5MTH1S

MTH16MTH16

MTH16MTH16

MTH16MTH16MTH16MTH16MTH16MTH16MTH16MTH16MTH16MTH16

.6

.

.8

q

.10

1112131 11 5Ifi171819?o

MTH16.21

MTH16.22MTH16MTH16

>3

MTH16 25MTH16MTH16

.26'7

2019

162368829236

12197

111901 1511

1697520228

219832158628016287585012331 1073311335792361851300717267

516805323156115

577516007063989

3569

62158039

1182613980

1126316111

1861221166

210322599128 2102960730888318513526236029390891557117938

517525522157381

593176183365215

0

30

n

517

531386113

Ri|1351595|sp:QO9829

gi|1262226|eb AAD11519gi|1311366|gb AAD15577Ri!3169171

300 Ri 585322isp|P3798O

00

10

,

0000011100

000

000

74351

516275

10111165

119110113333Xf*773115221

73577110

303378109

X51513(ti|3929651|emb|CAA10321

ei 126389 |splP2319Oei|3881161|emblCAA21721

Ri|1106759|Rb|AAD2007O|Bi | 1100133

Ri|H06759!nb|AAD20O7O|ei | l 106759 lftb| A AD 20070Rill 106759 iftbi A AD 20070Rilll06759lRb!AAD2OO70|X55111Ri'2791278|emb,CAA93218ei,355U17|dbjiBAA32822:«i '31933i;

ABOO5786Rill552379|emb|CAA69318Ri|1512132|dbj|BAA75299.

Ri|3668O75Ri!3021151l8p|Q92791Rii 1510376 :eb A AD 21161.1

213

1013771212 30

74353

312126

289165

25.0

57.812.358.119.8

100.0100.0

28.813.3

15.966.3

Definitions of the most similar sequences:

[partial] (AL021811)
HYPOTHETICAL 88.2 KD PROTEIN C1GS.03C IN CHROMOSOME I
(AC006200) putative protein kinase, A. thaliana
(AC006310) hypothetical protein, A. thaliana
(AC001101) putative serine carboxypeptidase I, A. thaliana
INORGANIC PYROPHOSPHATASE (PYROPHOSPHATE PHOSPHO-HYDROLASE) (PPASE)
tRNA-Val(AAC), A. thaliana
(AJ131206) microbody NAD-dependent [...]
LORICRIN
(AL032651) similar to heme-binding [illegible] oxidoreductases, [species illegible]
(AC006836) hypothetical protein, A. thaliana
(AF000378) beta-glucosidase, Glycine max
(AC006836) hypothetical pr[...]
hypothetical protein, A. [...]
(Z69257) beta-xylosidase, Hypocrea [illegible]
(AB012703) [...] Daucus carota
[...] similarity to transcriptional activator Ra[...], A. thaliana
tRNA-Glu(CUC), A. thaliana
(AB017508) rplQ homologue (identity of 81% to B. subtilis), Bacillus halodurans
(AC001667) hypothetical protein, A. thaliana
MONOCYTIC LEUKEMIA ZINC FINGER PROTEIN
(AC007017) unknown protein, A. thaliana

57.1 51.1

13.2 100.0

710398

73379

93

29158

376

37.7 91.0

100.0 60.8 51.3

51.9 17.5 '7 .3
