DNA RESEARCH 6, 183-195 (1999) Short Communication
Structural Analysis of Arabidopsis thaliana chromosome 5. IX. Sequence Features of the Regions of 1,011,550 bp Covered by Seventeen P1 and TAC Clones
Takakazu KANEKO, Tomohiko KATOH, Shusei SATO, Yasukazu NAKAMURA, Erika ASAMIZU, Hirokazu KOTANI, Nobuyuki MIYAJIMA, and Satoshi TABATA*
Kazusa DNA Research Institute, 1532-3 Yana, Kisarazu, Chiba 292-0812, Japan
(Received 21 May 1999)
Abstract
In this series of projects sequencing the entire genome of Arabidopsis thaliana chromosome 5, non-redundant P1 and TAC clones have been sequenced according to the fine physical map, and as of May 7, 1999, the sequences of 16.2 Mb representing approximately 60% of chromosome 5 have been accumulated and released at our web site. In parallel, structural features of the sequenced regions have been analyzed by applying a variety of computer programs, and to date we have predicted a total of 2380 potential protein-coding genes in the 10,154,580 bp regions, which are covered by 142 P1 and TAC clones. In this paper, we newly analyzed the structural features of the 1,011,550 bp regions covered by an additional 17 P1 and TAC clones, and predicted 298 protein-coding genes. The average density of the genes identified was 1 gene per 3394 bp. Introns were observed in 67% of the genes, and the average number per gene and the average length of the introns were 3.2 and 159 bp, respectively. The gene density became higher than the value estimated in the previously analyzed regions (1 gene per 4,267 bp), as the data in this paper were compiled based on a new standard of gene assignment including the computer-predicted hypothetical genes. The regions also contained 8 tRNA genes when searched by similarity to reported tRNA genes and the tRNAscan-SE program. The sequence data and information on the potential genes are available on the database KAOS (Kazusa Arabidopsis data Opening Site) at http://www.kazusa.or.jp/arabi/.
Key words: Arabidopsis thaliana chromosome 5; genomic sequence; P1 genomic library; TAC genomic library; gene prediction
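The figures quoted in the abstract imply a few derived quantities that are easy to check. The following sketch is only an arithmetic illustration of the numbers stated above; the variable names are hypothetical, and the implied chromosome length rests on the approximate 60% figure, so it is itself approximate.

```python
# Values quoted in the abstract.
SEQUENCED_BP = 16_200_000      # 16.2 Mb released as of May 7, 1999
SEQUENCED_FRACTION = 0.60      # "approximately 60%" of chromosome 5
PREVIOUS_BP, PREVIOUS_CLONES = 10_154_580, 142   # previously analyzed regions
NEW_BP, NEW_CLONES = 1_011_550, 17               # regions analyzed in this paper

# Chromosome length implied by "16.2 Mb is about 60%" (approximate).
implied_chr5_mb = SEQUENCED_BP / 1_000_000 / SEQUENCED_FRACTION

# Cumulative amount of sequence whose structural features have been analyzed.
analyzed_bp = PREVIOUS_BP + NEW_BP
analyzed_clones = PREVIOUS_CLONES + NEW_CLONES
analyzed_share = analyzed_bp / SEQUENCED_BP

print(f"implied chromosome 5 length: about {implied_chr5_mb:.0f} Mb")
print(f"analyzed so far: {analyzed_bp:,} bp in {analyzed_clones} clones")
print(f"share of released sequence analyzed: {analyzed_share:.1%}")
```

Under these assumptions, the analysis of structural features trails the released sequence, which is consistent with the two activities being described as running in parallel.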
Communicated by Mituru Takanami
* To whom correspondence should be addressed. Tel. +81-438-52-3933, Fax. +81-438-52-3934. E-mail: [email protected]
We have been conducting a sequencing project on the genome of a dicot model plant, Arabidopsis thaliana, which is estimated to be approximately 120 Mb long. We focused on chromosomes 5 and 3 among the five chromosomes, and constructed accurate contig maps of both chromosomes with clones from YAC, P1, TAC, and BAC libraries [1,2]. For DNA sequencing, we first isolated the chromosome-specific clones from P1 and TAC libraries, and performed the sequence analysis of P1 and TAC clones physically assigned on the chromosomes. As of May 7, 1999, the regions of 16.2 Mb representing approximately 60% of chromosome 5 have been sequenced and the data were released at our web site KAOS (Kazusa Arabidopsis data Opening Site, http://www.kazusa.or.jp/arabi/). In parallel, potential genes in the sequenced regions have been analyzed by using a variety of computer programs for similarity search and gene modeling, and we have so far predicted the potential genes in a total of 10,154,580 bp, which are covered by 142 P1 and TAC clones [3-10]. In this paper, we newly analyzed the structural features of the 1,011,550 bp regions covered by an additional 17 P1 and TAC clones.
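The 1,011,550 bp total can be cross-checked against the per-clone insert lengths printed in the Fig. 2 panel headings. The sketch below is a minimal illustration with hypothetical variable names: it sums sixteen of the seventeen lengths and, as a choice made for this example, derives the length of the remaining clone, MTH16, as the remainder instead of reading it from the figure.

```python
# Insert lengths (bp) as given in the Fig. 2 panel headings (MTH16 omitted).
CLONE_LENGTHS_BP = {
    "MJC20": 83689, "K3K7": 68726, "MDN11": 83373, "K12B20": 78874,
    "K13P22": 19971, "K18J17": 54252, "K19E20": 61712, "K19M13": 42563,
    "K21I16": 30578, "K2A11": 29292, "MFC16": 61290, "MFC19": 85020,
    "MGN6": 61380, "MNL12": 45911, "MJG14": 86121, "MRI1": 50700,
}
TOTAL_BP = 1_011_550  # total stated in the text for all 17 clones

subtotal_bp = sum(CLONE_LENGTHS_BP.values())
# Assumption of this example: MTH16 is derived as the remainder of the total.
mth16_bp = TOTAL_BP - subtotal_bp

print(f"{len(CLONE_LENGTHS_BP)} clones: {subtotal_bp:,} bp")
print(f"MTH16 (derived): {mth16_bp:,} bp")
```

The derived MTH16 value can then be compared with the length shown in the MTH16 panel heading of Fig. 2.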
1. Isolation and Sequencing of P1 and TAC Clones
DNA sources and the method of clone isolation were essentially the same as described in the previous paper [3]. The P1 and TAC clones containing the DNA regions which cover a total of 17 DNA markers on chromosome 5 were isolated by screening the Mitsui P1 [11] and TAC [12] libraries by means of polymerase chain reaction (PCR) with primers designed from the sequence information of the DNA markers. The DNA markers and the selected clones (in parentheses) are: ends of MUG13 and K18I23 (K2A11), CIC10D2L (K18J17), CIC4D4 (K19M13), CIC5E10 (MJG14), CIC5E10 (K12B20), ends of MOP12 and MNF13 (K21I16), g4028 (MJC20), MWF20_left end (MNL12), K9L2_right end (MFC16), CIC11H2 (MFC19), CIC11G8R (MDN11), K24G6_right end (K19E20), CIC11F10 (K3K7), MNC6_right end (MGN6), ends of MBG8 and MCO15 (K13P22), and MUA2_right end (MRI1). MTH16 was directly isolated as a clone showing restriction fragment length polymorphism (RFLP) when used as a probe for genomic Southern hybridization [13]. The relative positions of the markers and the sequenced clones on chromosome 5 are shown in Fig. 1. The relative orientation of each clone and contig on the chromosome has been confirmed by anchoring both ends of the clone to those at the corresponding positions of the contig map.
[Figure 1: physical map of chromosome 5 drawn as a vertical bar with a length scale in Mbp (ticks at 10, 20 and 30); marker names at the left, clone names at the right. Clones from top to bottom: K2A11, K18J17, MTH16, K19M13, MJG14, K12B20, K21I16, MJC20, MNL12, MFC16, MFC19, MDN11, K19E20, K3K7, MGN6, K13P22, MRI1.]
Figure 1. Relative locations of the sequenced P1 and TAC clones and the associated markers on the physical map of chromosome 5. The positions of DNA markers used for P1 and TAC isolation and of other major DNA markers were localized on the map on the basis of the YAC tiling path and map information in refs. 1 and 2. The vertical open bar represents the entire length of chromosome 5. The names of P1 and TAC clones are given at the right side, and those of markers at the left side. The distance (Mbp) from the telomeric site of the top arm is given in the vertical scale.
Downloaded from https://academic.oup.com/dnaresearch/article/6/3/183/421513 by guest on 08 January 2022
The nucleotide sequence of each P1 or TAC insert was determined according to the bridging shotgun method described previously [3]. The length of the nucleotide sequence of each P1 or TAC insert finally confirmed is indicated together with the clone names in Fig. 2.
2. Assignment of Potential Coding Regions
For assignment of the protein-coding regions and gene modeling, similarity search and computer prediction were performed as described in the previous paper [3] with modifications. Briefly, a similarity search against the non-redundant protein sequence database nr (compiled by NCBI) was carried out using the BLASTX program [14]. In parallel, the positions of potential protein-coding regions were predicted with the Grail [15], GENSCAN [16] and NetGene2 [17] computer programs. The transcribed regions were assigned by comparison of the nucleotide sequences with Arabidopsis ESTs [18,19] in the public databases using the BLASTN program [14]. All the results obtained were compiled with the aid of our new web-based tool, named Arabidopsis Genome Displayer (manuscript in preparation), and the potential protein-coding genes were then assigned by taking both similarity to known genes and computer prediction into consideration. Accordingly, the regions predicted only by the computer programs with no apparent similarity to known genes were also assigned as genes, whereas such computer-predicted hypothetical genes had not been included in the previous analyses [3-10]. In total, 298 potential protein-coding genes, as well as 13 partial genes located at the terminal regions of the clones and 6 pseudogenes, were assigned in the 1,011,550 bp regions. The average gene density is 1 gene per 3394 bp. We previously analyzed the structural features of a total of 10,154,580 bp and predicted 2380 potential protein-coding genes, which corresponds to 1 gene per 4,267 bp. This difference has arisen because the data in this paper were compiled based on the new standard of gene assignment described above.
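The two gene densities quoted above follow directly from the region sizes and gene counts. The short sketch below reproduces the arithmetic; the function and variable names are illustrative and not part of the original analysis pipeline.

```python
def bp_per_gene(region_bp: int, n_genes: int) -> float:
    """Average number of base pairs per assigned gene."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return region_bp / n_genes


# Regions analyzed in this paper (new assignment standard).
new_density = bp_per_gene(1_011_550, 298)
# Previously analyzed regions (hypothetical genes not included).
previous_density = bp_per_gene(10_154_580, 2380)

print(f"this paper: 1 gene per {new_density:.0f} bp")
print(f"previous:   1 gene per {previous_density:.0f} bp")
```

Rounded to the nearest base pair, these give the 3394 bp and 4,267 bp values stated in the text. Because the two sets were compiled under different assignment standards, the ratio between them reflects the change of criteria as well as any real difference between regions.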
The RNA-coding regions were assigned on the basis of sequence similarity to the reported structural RNAs. For tRNA genes, prediction by the tRNAscan-SE program [20] was also taken into account. As indicated in Fig. 2, 8 tRNA genes corresponding to 6 amino acid species and a gene for 7SL RNA were identified in the 1,011,550 bp regions. Both potential protein- and RNA-coding genes were denoted by the clone name followed by a sequential number assigned from one end of the insert to the other; they are listed in the table below each figure and are also schematically represented in Fig. 2.
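Gene identifiers built this way, such as K12B20.6 or MJC20.12, can be split back into clone name and sequential number. The sketch below is a minimal illustration; the regular expression is an assumption about the identifier format inferred from the examples in this paper, and the helper names are hypothetical.

```python
import re
from typing import NamedTuple


class GeneId(NamedTuple):
    clone: str   # P1 or TAC clone name, e.g. "K12B20"
    index: int   # sequential number along the insert, e.g. 6


# Assumed format: clone name (capital letters and digits), a dot, then a number.
_GENE_ID = re.compile(r"^([A-Z0-9]+)\.(\d+)$")


def parse_gene_id(identifier: str) -> GeneId:
    """Split an identifier such as 'K12B20.6' into clone name and number."""
    match = _GENE_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a clone.number identifier: {identifier!r}")
    return GeneId(match.group(1), int(match.group(2)))


# Sorting on the parsed form orders genes numerically within each clone,
# so K12B20.6 comes before K12B20.19 (a plain string sort would not do this).
ids = ["K12B20.19", "MTH16.8", "K12B20.6", "MRI1.1"]
ordered = sorted(ids, key=parse_gene_id)
print(ordered)
```

Parsing the number as an integer is the main design choice here, since it keeps the order along the insert intact when identifiers are sorted.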
3. Structural Features of Potential Protein-Coding Genes
In this paper, the complete structures of 298 potential protein-coding genes were predicted. Structural features of these genes, as well as those of 1901 genes including those previously identified, are listed in Table 1. The 1901 genes account for approximately 9.5% of the total gene constituents (2 x 10^4 genes) assumed for A. thaliana. Approximately 76% of these 1901 protein-coding genes contained introns, and the average number of introns per gene and their average length were 3.8 and 175 bp, respectively.
Table 1. Structural features of potential protein-coding genes in A. thaliana chromosome 5.
Features | 298 genes (a) | 1901 genes (b)
Gene length (bp) including introns | 62-10,618 (1615) | 62-11,377 (1963)
Product length (amino acids) | 19-1962 (368) | 19-2756 (430)
Genes with introns | 200 | 1437
Number of introns/gene | 0-34 (3.2) | 0-42 (3.8)
Exon length (bp) | 6-3558 (263) | 2-4287 (267)
Intron length (bp) | 25-1738 (159) | 8-5405 (175)
GC content of exons | 43% | 43%
GC content of introns | 31% | 32%
Structural features of the potential protein-coding genes assigned so far are listed. (a) The 298 genes are assigned based on the new standard in this study. (b) The 1901 genes include 1603 potential protein genes previously assigned. Average values are shown in parentheses.
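The percentages quoted in the text can be recomputed from the counts in Table 1. The following sketch does only that; the dictionary layout and names are illustrative, and the 20,000-gene total is the assumption stated in the text.

```python
# Counts taken from Table 1: (total genes, genes with introns).
TABLE_1 = {
    "this study": (298, 200),
    "all assigned so far": (1901, 1437),
}
ASSUMED_TOTAL_GENES = 20_000  # 2 x 10^4 genes assumed for A. thaliana

intron_share = {
    label: with_introns / total
    for label, (total, with_introns) in TABLE_1.items()
}
genome_share = TABLE_1["all assigned so far"][0] / ASSUMED_TOTAL_GENES
previously_assigned = TABLE_1["all assigned so far"][0] - TABLE_1["this study"][0]

for label, share in intron_share.items():
    print(f"{label}: {share:.0%} of genes contain introns")
print(f"share of assumed gene total: {genome_share:.1%}")
print(f"previously assigned genes: {previously_assigned}")
```

This gives about 67% for the 298 genes of this study and about 76% for the 1901 genes, matching the abstract and the text respectively.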
4. Expression Level of Potential Protein-Coding Genes and Gene Segments
The nucleotide sequence of each of the potential protein-coding genes was compared with those in the Arabidopsis EST database, and the number of matched Arabidopsis ESTs was counted to monitor the transcriptional level of each gene. Of the 298 complete and 13 partial genes that we have identified on chromosome 5 in this study, 104 carried matched ESTs. The putative products of the genes matched by 10 or more ESTs, which suggests that they belong to a class of highly expressed genes, include those showing sequence similarity to glutamate-ammonia ligase in A. thaliana (K12B20.6), microbody NAD-dependent malate dehydrogenase in A. thaliana (MTH16.8), CONSTANS-like protein 2 in Malus domestica (MRI1.1), ClpC product in A. thaliana (K3K7.7), luminal binding protein in A. thaliana (MJC20.12), and ALY transcriptional coactivator in Mus musculus (K12B20.19).
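The EST-based monitoring described above amounts to counting matches per gene and applying a threshold. The sketch below illustrates this under stated assumptions: only the threshold of 10 ESTs and the gene totals come from the text, whereas the class labels, function name and example counts are hypothetical.

```python
HIGH_EXPRESSION_MIN_ESTS = 10  # "10 or more" matched ESTs, as in the text


def expression_class(n_ests: int) -> str:
    """Classify a gene by its number of matched ESTs (labels are illustrative)."""
    if n_ests < 0:
        raise ValueError("EST count cannot be negative")
    if n_ests >= HIGH_EXPRESSION_MIN_ESTS:
        return "highly expressed"
    if n_ests > 0:
        return "EST-supported"
    return "no EST match"


# Totals reported in the text.
complete_genes, partial_genes, genes_with_ests = 298, 13, 104
est_supported_share = genes_with_ests / (complete_genes + partial_genes)
print(f"genes with matched ESTs: {est_supported_share:.1%}")

# Hypothetical per-gene counts, for illustration only (not data from the paper).
example_counts = {"geneA": 0, "geneB": 3, "geneC": 12}
classes = {gene: expression_class(n) for gene, n in example_counts.items()}
print(classes)
```

With the reported totals, roughly one third of the complete and partial genes have at least one matched EST.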
The sequence data as well as the gene information shown in this paper are available through the World Wide Web at http://www.kazusa.or.jp/arabi/.
Acknowledgments: We thank S. Sasamoto and K. Idesawa for excellent technical assistance and the members of DNA Sequencing Laboratory: T. Kimura, T. Hosouchi, K. Kawashima, M. Matsumoto, A. Matsuno, A. Muraki, N. Nakazaki, S. Shinpo, C. Takeuchi, T. Wada, A. Watanabe, M. Yamada, and M. Yasuda for their excellent team work. We are grateful to A. Tanaka for technical advice, and Mitsui Plant Biotechnology Research Institute and Arabidopsis Biological Resource Center at the Ohio State University for providing the DNA markers and the DNA libraries. This work was supported by the Kazusa DNA Research Institute Foundation. We thank M. Takanami and M. Oishi for their support and encouragement to perform this project.
References
1. Kotani, H., Sato, S., Liu, Y.-G. et al. 1997, A fine physical map of Arabidopsis thaliana chromosome 5: Construction of a sequence-ready contig map, DNA Res., 4, 371-378.
2. Sato, S., Kotani, H., Hayashi, R. et al. 1998, A physical map of Arabidopsis thaliana chromosome 3 represented by two contigs of CIC YAC, P1, TAC and BAC clones, DNA Res., 5, 163-168.
3. Sato, S., Kotani, H., Nakamura, Y. et al. 1997, Structural analysis of Arabidopsis thaliana chromosome 5. I. Sequence features of the 1.6 Mb regions covered by twenty physically assigned P1 clones, DNA Res., 4, 215-230.
4. Kotani, H., Nakamura, Y., Sato, S. et al. 1997, Structural analysis of Arabidopsis thaliana chromosome 5. II. Sequence features of the regions of 1,044,062 bp covered by thirteen physically assigned P1 clones, DNA Res., 4, 291-300.
5. Nakamura, Y., Sato, S., Kaneko, T. et al. 1997, Structural analysis of Arabidopsis thaliana chromosome 5. III. Sequence features of the regions of 1,191,918 bp covered by seventeen physically assigned P1 clones, DNA Res., 4, 401-414.
6. Sato, S., Kaneko, T., Kotani, H. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. IV. Sequence features of the regions of 1,456,315 bp covered by nineteen physically assigned P1 and TAC clones, DNA Res., 5, 41-54.
7. Kaneko, T., Kotani, H., Nakamura, Y. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. V. Sequence features of the regions of 1,381,565 bp covered by twenty one physically assigned P1 and TAC clones, DNA Res., 5, 131-145.
8. Kotani, H., Nakamura, Y., Sato, S. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. VI. Sequence features of the regions of 1,367,185 bp covered by 19 physically assigned P1 and TAC clones, DNA Res., 5, 203-216.
9. Nakamura, Y., Sato, S., Asamizu, E. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. VII. Sequence features of the regions of 1,013,767 bp covered by sixteen physically assigned P1 and TAC clones, DNA Res., 5, 297-308.
10. Asamizu, E., Sato, S., Kaneko, T. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. VIII. Sequence features of the regions of 1,081,958 bp covered by seventeen physically assigned P1 and TAC clones, DNA Res., 5, 379-391.
11. Liu, Y.-G., Mitsukawa, N., Vazquez-Tello, A., and Whittier, R. F. 1995, Generation of a high-quality P1 library of Arabidopsis suitable for chromosome walking, Plant J., 7, 351-358.
12. Liu, Y.-G., Shirano, Y., Fukaki, H. et al. Complementation of plant mutants with large genomic DNA fragments by a transformation-competent artificial chromosome vector accelerates positional cloning, Proc. Natl. Acad. Sci. USA, in press.
13. Shibata, D., Seki, M., Mitsukawa, N. et al. Establishment of framework P1 clones for map-based cloning and genome sequencing: direct RFLP mapping of large clones, Gene, 225, 31-38.
14. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. 1990, Basic local alignment search tool, J. Mol. Biol., 215, 403-410.
15. Uberbacher, E. C. and Mural, R. J. 1991, Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach, Proc. Natl. Acad. Sci. USA, 88, 11261-11265.
16. Burge, C. and Karlin, S. 1997, Prediction of complete gene structures in human genomic DNA, J. Mol. Biol., 268, 78-94.
17. Hebsgaard, S. M., Korning, P. G., Tolstrup, N., Engelbrecht, J., Rouze, P., and Brunak, S. 1996, Splice site prediction in Arabidopsis thaliana DNA by combining local and global sequence information, Nucl. Acids Res., 24, 3439-3452.
18. Newman, T., Bruijn, F. J., and Green, P. 1994, Genes galore: A summary of methods for accessing results from large-scale partial sequencing of anonymous Arabidopsis cDNA clones, Plant Physiol., 106, 1241-1255.
19. Cooke, R., Raynal, M., Laudie, M. et al. 1996, Further progress towards a catalogue of all Arabidopsis genes: analysis of a set of 5000 non-redundant ESTs, Plant J., 9, 101-124.
20. Lowe, T. M. and Eddy, S. R. 1997, tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence, Nucl. Acids Res., 25, 955-964.
[Figure 2, panel MJC20 (83689 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand, followed by the table of deduced genes with the columns identifier, direction, position (5', 3'), no. of exons, no. of ESTs, length, and information on the most similar sequence (sequence ID, overlap, identity, definition).]
Figure 2. Gene organization in the 17 P1 and TAC clones. Positions of the identified or predicted genes in each insert of the P1 and TAC clones are schematically represented by color-coded boxes above (rightward) and below (leftward) the wide line in the middle, which represents the entire insert sequence. The length of the sequenced region in each insert is given in parentheses together with the clone name at the top. The names of the adjacent overlapping clones whose sequences had been reported are shown on the middle bars. Arrowheads indicate the direction of the DNA strands (5' to 3'). Dark and faint blue bars with numbers represent the positions of the assigned potential protein-coding genes, and of pseudo and partial genes, respectively, and red bars the positions of RNA-coding genes. Gray bars indicate the positions of the regions which matched Arabidopsis ESTs. The regions which showed similarity to the sequences in the protein database are shown by yellow, orange and red bars, which correspond to BLASTX scores of 70-100, 100-250, and 250 or more, respectively. The green bars indicate the positions of the potential exons predicted by the Grail program; three shades of increasing depth correspond to Grail scores of less than 70, 70-90, and 90 or more, respectively. The potential protein- and RNA-coding genes assigned as described in the text are listed below each of the figures. In these tables, the number of amino acid residues and the nucleotide length (in italic) of the putative gene products of the respective potential protein- and RNA-coding genes are indicated. The accession numbers are as follows: AB018107 (K12B20), AB017059 (K13P22), AB017060 (K18J17), AB017061 (K19E20), AB018110 (K19M13), AB017062 (K21I16), AB018111 (K2A11), AB017063 (K3K7), AB017064 (MDN11), AB017065 (MFC16), AB018113 (MFC19), AB017066 (MGN6), AB017067 (MJC20), AB017068 (MJG14), AB017070 (MNL12), AB018118 (MRI1), and AB020752 (MTH16).
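The color coding described in the caption is a simple binning of scores. The sketch below shows one way to express it; the function names and the shade labels are hypothetical, and because the caption gives overlapping range ends (for example 100 appears in both 70-100 and 100-250), the choice to treat each lower bound as inclusive is an assumption of this example.

```python
from typing import Optional


def blastx_colour(score: float) -> Optional[str]:
    """Bar color for a protein database hit, by BLASTX score.

    Assumption: lower bounds are inclusive; scores below 70 are not drawn.
    """
    if score >= 250:
        return "red"
    if score >= 100:
        return "orange"
    if score >= 70:
        return "yellow"
    return None


def grail_shade(score: float) -> str:
    """Shade of green for a Grail-predicted exon (labels are illustrative)."""
    if score >= 90:
        return "dark green"
    if score >= 70:
        return "medium green"
    return "light green"


for s in (65, 70, 100, 250):
    print(s, blastx_colour(s), grail_shade(s))
```

Returning None for BLASTX scores below 70 reflects that the caption defines no color for such hits, whereas Grail exons of any score receive a shade.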
[Figure 2, panel K3K7 (68726 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand; adjacent overlapping clones K16E14 and MWD22; followed by the table of deduced genes.]
[Figure 2, panel MDN11 (83373 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand; adjacent overlapping clones K16F13 and MIF21; followed by the table of deduced genes.]
[Figure 2, panel K12B20 (78874 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand; adjacent overlapping clone MPA22.]
[Figure 2, panel K13P22 (19971 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand; adjacent overlapping clones MBG8 and MCO15; followed by the table of deduced genes.]
[Figure 2, panel K18J17 (54252 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand; adjacent overlapping clone MJJ3.]
[Figure 2, panel K19E20 (61712 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand; adjacent overlapping clones K24G6 and K20J1.]
[Figure 2, panel K19M13 (42563 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand; adjacent overlapping clone T32G21.]
[Figure 2, panel K21I16 (30578 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand.]
[Figure 2, panel K2A11 (29292 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand; adjacent overlapping clone MUG13.]
[Figure 2, panel MFC16 (61290 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand; adjacent overlapping clones K9L2 and K15C23.]
[Figure 2, panel MFC19 (85020 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand; adjacent overlapping clones K9E15 and K2N11.]
[Figure 2, panel MGN6 (61380 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand; adjacent overlapping clones MNC6 and K6O8.]
[Figure 2, panel MNL12 (45911 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand; adjacent overlapping clones K24F5 and MWF20.]
[Figure 2, panel MJG14 (86121 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand.]
[Figure 2, panel MRI1 (50700 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand; adjacent overlapping clones MUA2 and MTI20; followed by the table of deduced genes.]
[Figure 2, panel MTH16 (68098 bp): gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand, followed by the table of deduced genes with the columns identifier, direction, position (5', 3'), no. of exons, no. of ESTs, length, and information on the most similar sequence (sequence ID, overlap, identity, definition).]