Proc. Natl. Acad. Sci. USA Vol. 89, pp. 12122-12126, December 1992 Biochemistry

Genomic structure of the human caldesmon gene
(differentiation/smooth muscle/actomyosin/tropomyosin/calmodulin)

KEN'ICHIRO HAYASHI*, HAJIME YANO*, TAKASHI HASHIDA†, RIE TAKEUCHI†, OSAMU TAKEDA†, KIYOZO ASADA†, EI-ICHI TAKAHASHI‡, IKUNOSHIN KATO†, AND KENJI SOBUE*§

*Department of Neurochemistry and Neuropharmacology, Biomedical Research Center, Osaka University Medical School, 2-2 Yamadaoka, Suita, Osaka 565, Japan; †Biotechnology Research Laboratories, Takara Shuzo Company, Ltd., 341 Seta, Otsu-shi, Shiga 520-21, Japan; and ‡Division of Genetics, National Institute of Radiological Sciences, 4-9-1 Anagawa, Inage-ku, Chiba 263, Japan

Communicated by Christian Anfinsen, September 17, 1992

ABSTRACT The high molecular weight caldesmon (h-CaD) is predominantly expressed in smooth muscles, whereas the low molecular weight caldesmon (l-CaD) is widely distributed in nonmuscle tissues and cells. The changes in CaD isoform expression are closely correlated with the phenotypic modulation of smooth muscle cells. During a search for isoform diversity of human CaDs, l-CaD cDNAs were cloned from HeLa S3 cells. HeLa l-CaD I is composed of 558 amino acids, whereas 26 amino acids (residues 202-227 for HeLa l-CaD I) are deleted in HeLa l-CaD II. The short amino-terminal sequence of HeLa l-CaDs is different from that of fibroblast (WI-38) l-CaD II and human aorta h-CaD. We have also identified WI-38 l-CaD I, which contains a 26-amino acid insertion relative to WI-38 l-CaD II. To reveal the molecular events of the expressional regulation of the CaD isoforms, the genomic structure of the human CaD gene was determined. The human CaD gene is composed of 14 exons and was mapped to a single locus, 7q33-q34. The 26-amino acid insertion is encoded in exon 4 and is specifically spliced in the mRNAs for both h-CaD and l-CaDs I. Exon 3 is the exon that encodes the central repeating domain specific to h-CaD (residues 208-436) together with the common domain in all CaDs (residues 73-207 for h-CaD and WI-38 l-CaDs, and residues 68-201 for HeLa l-CaDs). The regulation of h- and l-CaD expression is thought to depend on selection of the two 5' splice sites within exon 3. Thus, the change in expression between l-CaD and h-CaD might be caused by this splicing pathway.

Caldesmon (CaD), a calmodulin- and actin-binding protein, plays a vital role in the regulation of smooth muscle and nonmuscle contraction (1, 2). Two CaD isoforms have been identified: h-CaD (high Mr, 120,000-150,000) and l-CaD (low Mr, 70,000-80,000) as judged by NaDodSO4/polyacrylamide gel electrophoresis (3-6). Sequencing studies on chicken CaD cDNAs have demonstrated that the deduced molecular weights of h- and l-CaD are in the range of 87,000-89,000 and 59,000-60,000, respectively, and that the major parts of both CaDs have identical amino acid sequences except for the insertion of the central repeating domain of the h-CaD molecule (7-10). Structural and functional analyses have revealed that the calmodulin-, actin-, and tropomyosin-binding sites contained in a region involved in the regulation of actin-myosin interaction reside within the common carboxyl-terminal domain of both CaD isoforms (9, 11). The tissue and cell distributions of the two isoforms are distinctively different, however. h-CaD is primarily found in smooth muscles, whereas l-CaD is widely distributed in nonmuscle tissues and cells. Notably, the changes in expression of the two CaD isoforms are closely correlated with phenotypic modulation of smooth muscle cells, in which h-CaD is predominantly expressed in differentiated smooth muscle cells and is replaced by l-CaD during dedifferentiation (12-14).

To investigate the regulation of CaD isoform expression, we have searched for isoform diversity of human CaDs and have determined the genomic structure¶ and the chromosomal location of the CaD gene. Our studies have revealed two splice sites within exon 3 of the CaD gene. We discuss this feature in relation to the regulation of CaD isoform expression.

MATERIALS AND METHODS

Cloning and Sequencing of cDNA. An oligo(dT)-primed cDNA library from HeLa S3 mRNA was screened with 32P-labeled restriction fragments originating from embryonic chicken brain l-CaD cDNA. Four positive clones carrying l-CaD cDNAs were obtained and their sequences were determined.

Southern Blot Analysis. Genomic DNA (5 μg) from HeLa S3 cells or human peripheral lymphocytes was digested with restriction enzymes and the digests were electrophoresed in 0.7% agarose gels. The separated DNA fragments were blotted to nylon membranes by the method of Southern (15). The hybridization conditions with 32P-labeled HeLa l-CaD I or II cDNA fragments have been described (9).

Reverse Transcription-PCR. The first-strand cDNA from each cell type was synthesized by using (dT)12-18 and/or the antisense primer specific to the 3' noncoding sequence of human h- and l-CaD cDNAs. Primers used in this experiment were as follows: sense primer Pn, d(ATGCTGGGTGGATCCGGATC), specific to the short amino-terminal sequence of HeLa l-CaDs; antisense primer Pm, d(GTTTAAGTTTGTGGGTCATGAATTCTCC), complementary to the common sequence in all CaD isoforms, nucleotide positions 832-859 in WI-38 l-CaD II cDNA; sense primer Pn2, d(CACCATGGATGATTTTGAGCG), nucleotide positions 108-128 in WI-38 l-CaD II cDNA (16); and antisense primer Pi, d(GAAGGTAGGCTTGTCTTCTTGGAGCTTTTC), complementary to the insertion sequence of the HeLa l-CaD I sense strand (Fig. 1). DNA fragments amplified by PCR (17) were separated in 1.5% agarose gels.

Characterization of Human CaD Gene. A human placental genomic library in EMBL3 was screened by hybridization with 32P-labeled probes from the HeLa l-CaD I cDNA. Restriction mapping revealed four overlapping clones (EMBL 11, 5A, 111, and C4) and a nonoverlapping clone (EMBL 2) (see Fig. 3A). Restriction fragments from each clone were subcloned and sequenced. Clones carrying the exon that encodes the amino-terminal domain of WI-38 l-CaDs or aorta h-CaD were obtained by using the cassette primer method. This method is based on the modification and improvement of the specific-primer PCR method (18, 19). The specific DNA fragment containing the target exon thus obtained was used for isolation of the genomic clone (EMBL F5).

Chromosomal Mapping of Human CaD Gene. The human CaD gene was localized by using the genomic clones EMBL 2, 11, 111, and C4 as probes in a mapping system combining fluorescence in situ hybridization with R-banding (20, 21).

Abbreviations: CaD, caldesmon; h-CaD, high molecular weight CaD; l-CaD, low molecular weight CaD.
§To whom reprint requests should be addressed.
¶The nucleotide sequences reported in this paper have been deposited in the GenBank/EMBL/DDBJ data base (accession nos. D90452 and D90453).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
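The primer sequences listed under Reverse Transcription-PCR can be sanity-checked against the peptide sequences shown in Fig. 1. The sketch below is not part of the original methods: it assumes the standard genetic code, joins the primer sequences across the line-break hyphens of the printed text, and uses illustrative names (`translate`, `reverse_complement`, `peptide_from_first_atg`) chosen here.

```python
from itertools import product

# Standard genetic code, built in TCAG order (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Primer sequences as given in the text.
PN = "ATGCTGGGTGGATCCGGATC"             # sense, HeLa-type amino terminus
PM = "GTTTAAGTTTGTGGGTCATGAATTCTCC"     # antisense, common region
PN2 = "CACCATGGATGATTTTGAGCG"           # sense, WI-38-type amino terminus
PI = "GAAGGTAGGCTTGTCTTCTTGGAGCTTTTC"   # antisense, 26-residue insertion

# Peptide sequences as printed in Fig. 1.
HELA_NTERM = "MLGGSGSHGRRSLAALSQ"
WI38_NTERM = "MDDFERRRELRRQKREEMRLEAER"
INSERTION = "GEEKGTKVQAKREKLQEDKPTFKKEE"


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; trailing bases are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def peptide_from_first_atg(seq):
    """Translate starting at the first ATG found in the sequence."""
    return translate(seq[seq.find("ATG"):])


if __name__ == "__main__":
    print(peptide_from_first_atg(PN), HELA_NTERM.startswith(peptide_from_first_atg(PN)))
    print(peptide_from_first_atg(PN2), WI38_NTERM.startswith(peptide_from_first_atg(PN2)))
    print(translate(reverse_complement(PI)), translate(reverse_complement(PI)) in INSERTION)
    print(len(PM) == 859 - 832 + 1, len(PN2) == 128 - 108 + 1)
```

Under these assumptions the two sense primers read into the first residues of the HeLa-type and WI-38-type amino termini, the reverse complement of Pi encodes a stretch inside the 26-residue insertion, and the lengths of Pm and Pn2 agree with the nucleotide positions quoted for them.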

RESULTS

Isoform Diversity of Human CaDs. Sequence analysis in the present study revealed the two different molecules of l-CaD (l-CaDs I and II) that originated from HeLa S3 cells. The primary structures of HeLa l-CaDs I and II in comparison with those of WI-38 l-CaD II (16) and human aorta h-CaD (22) are shown schematically in Fig. 1. HeLa l-CaD I and II are composed of 558 amino acids (Mr 64,252) and 532 amino acids (Mr 61,210), respectively; 26 amino acids (residues 202-227) of HeLa l-CaD I have been deleted in HeLa l-CaD II. The short amino-terminal sequences (residues 1-18) of HeLa l-CaDs are different from those of WI-38 l-CaD II and aorta h-CaD (residues 1-24). The 26-amino acid insertion in HeLa l-CaD I is found in aorta h-CaD, but not in WI-38 l-CaD II. The central repeating domain specific to h-CaD (residues 208-436) is deleted in all l-CaDs.

To search for isoform diversity of human CaD, the reverse transcription-PCR method was introduced (Fig. 2). The primers used in this experiment are indicated in Fig. 1. The two kinds of DNA fragments [731 and 809 base pairs (bp)] were amplified from HeLa S3 mRNA by using a HeLa-type sense primer (Pn) and the common antisense primer (Pm). The similarly amplified DNA fragments (752 and 830 bp) were obtained from WI-38 mRNA by using a WI-38-type sense primer (Pn2). The result suggests that the two l-CaD isoforms are expressed in HeLa S3 and WI-38 cells. In both cases, large and small DNA fragments would be derived from the mRNAs for the respective l-CaD with the insertion of 26 amino acids (l-CaD I) and without it (l-CaD II). Large DNA fragments were not well amplified, however. Immunoblotting of HeLa S3 and WI-38 cells revealed that the expression of l-CaD I was very low compared with that of l-CaD II (data not shown). Therefore, such amplifications would reflect the amount of each mRNA for l-CaD I or II. The PCR with an antisense primer specific to the HeLa l-CaD I insertion sequence (Pi) could clearly amplify a single fragment: 670- and 691-bp fragments from HeLa S3 and WI-38 mRNAs, respectively. From these results, we have identified WI-38 l-CaD I with the insertion sequence in WI-38 cells (Fig. 1). The primary structures of the human CaD isoforms identified are summarized in Fig. 1.

FIG. 1. Isoform diversity of human CaDs. The identical sequences in all CaD isoforms and the central repeating domain specific to aorta h-CaD are indicated by solid bars and an open bar, respectively. The short amino-terminal sequences of each isoform and the insertion sequences specific to l-CaDs I and h-CaD are shown by one-letter amino acid symbols below the bars. Primers used in PCR analysis are indicated by arrows. Numbers indicate the positions of amino acids in each CaD molecule.

  HeLa l-CaD I (558 amino acids): MLGGSGSHGRRSLAALSQ (1-18); GEEKGTKVQAKREKLQEDKPTFKKEE (202-227)
  HeLa l-CaD II (532 amino acids): MLGGSGSHGRRSLAALSQ (1-18)
  WI-38 l-CaD I (564 amino acids): MDDFERRRELRRQKREEMRLEAER (1-24); GEEKGTKVQAKREKLQEDKPTFKKEE (208-233)
  WI-38 l-CaD II (538 amino acids): MDDFERRRELRRQKREEMRLEAER (1-24)
  aorta h-CaD (793 amino acids): MDDFERRRELRRQKREEMRLEAER (1-24); GEEKGTKVQAKREKLQEDKPTFKKEE (437-462)

FIG. 2. Characterization of l-CaD isoforms expressed in HeLa S3 and WI-38 cells. The reverse transcription-PCR method was done with the indicated primer sets, using first-strand cDNA from HeLa S3 and WI-38 as templates. Sizes of the amplified fragments are given in base pairs.
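The fragment sizes quoted for the reverse transcription-PCR experiment can be checked for internal consistency with a 26-amino acid insertion. The snippet below only re-derives differences from numbers stated in the text (731/809 bp and 752/830 bp with Pm, 670/691 bp with Pi); the variable and function names are illustrative and not from the original work.

```python
# Length of the alternatively spliced insertion, from the text.
INSERTION_RESIDUES = 26
INSERTION_BP = INSERTION_RESIDUES * 3  # three nucleotides per codon

# Fragments amplified with the common antisense primer Pm: (small, large) in bp.
PM_FRAGMENTS = {"HeLa S3": (731, 809), "WI-38": (752, 830)}

# Single fragments amplified with the insertion-specific antisense primer Pi.
PI_FRAGMENTS = {"HeLa S3": 670, "WI-38": 691}


def insertion_shift(cell_line):
    """Size difference between the large and small Pm fragments of one cell line."""
    small, large = PM_FRAGMENTS[cell_line]
    return large - small


def cell_line_offsets():
    """Differences between WI-38 and HeLa S3 fragments for each comparable pair."""
    hela_small, hela_large = PM_FRAGMENTS["HeLa S3"]
    wi38_small, wi38_large = PM_FRAGMENTS["WI-38"]
    return [
        wi38_small - hela_small,
        wi38_large - hela_large,
        PI_FRAGMENTS["WI-38"] - PI_FRAGMENTS["HeLa S3"],
    ]


if __name__ == "__main__":
    for line in PM_FRAGMENTS:
        print(line, insertion_shift(line), insertion_shift(line) == INSERTION_BP)
    print(cell_line_offsets())
```

In both cell lines the large and small fragments differ by exactly the length of the insertion, and the WI-38 fragments are longer than the HeLa S3 fragments by a constant amount, which presumably reflects the different sense primers and amino-terminal exons rather than any difference downstream.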

Structure of Human CaD Gene. Five positive clones were isolated from a human placental genomic library. Each clone was subjected to further characterization, and the exon-containing fragments derived from the respective clones were subcloned and sequenced. Fig. 3A shows the genomic organization of human CaD. Four overlapping clones (EMBL 11, 5A, 111, and C4) carried most of the exons. EMBL 2 and EMBL F5 were independent clones carrying an exon encoding the short amino-terminal sequence specific to HeLa l-CaDs (residues 6-17) or WI-38 l-CaDs and aorta h-CaD (residues 1-24). Since EMBL 2 and EMBL F5 did not overlap with EMBL 11, we could not clarify the spatial relationship between exon 1 and 1'. All intron/exon junctions (Table 1) are compatible with the splice consensus sequence except for exon 3 (23).

Exon 3 is constituted as follows (Fig. 4). The common domain of the CaD isoforms (residues 68-201 for HeLa l-CaDs and 73-207 for WI-38 l-CaDs and aorta h-CaD) is encoded in exon 3a, whereas the central repeating domain specific to h-CaD (residues 208-436 for aorta h-CaD) resides in exon 3b. The consensus sequence for the 5' splice site is found in the border between exon 3a and 3b (underlined in Fig. 4). Exon 4 encodes the insertion sequence specific to the two l-CaDs I and aorta h-CaD. Based on the present findings, alternative splicing pathways are summarized in Fig. 3B. Exons 2 and 5-13 are spliced in all of the mRNAs for h- and l-CaDs, and exon 4 is spliced in the mRNA for l-CaDs I and h-CaD. Exon 3a is spliced in the mRNAs for all l-CaDs, whereas exon 3ab is specifically spliced in the mRNA for h-CaD. Exons 1 and 1' encode the short amino terminus specific to HeLa l-CaDs and to WI-38 l-CaDs or aorta h-CaD, respectively.

Chromosomal Locus of Human CaD Gene. Southern blot analysis of genomic DNA from HeLa S3 cells and human peripheral lymphocytes with HeLa l-CaD I cDNA fragments as probes revealed identical hybridizing patterns (Fig. 5). The same result was obtained with HeLa l-CaD II cDNA fragments as probes (data not shown). These results suggest that the CaD isoforms are encoded by a single gene. To confirm this suggestion, the chromosomal locus of the CaD gene was determined. We examined 100 (pro)metaphase plates showing a typical R-band for all clones. The efficiency of hybridization was similar among the four kinds of probe (EMBL 2, 11, 111, and C4, indicated in Fig. 3A), and the locations of the signals were the same. For example, 51% of such R-banded chromosomes exhibited complete double-spot staining with EMBL 11 as a probe. The signals were localized in band q33-q34 of the long arm of chromosome 7. No signals were detected in the other chromosomes. Thus, the CaD gene could be assigned to band 7q33-q34 (Fig. 6).
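The statement that the intron/exon junctions in Table 1 fit the splice consensus can be spot-checked against the minimal GT...AG rule (introns begin with gt and end with ag). The sketch below is an illustration rather than the authors' analysis: the intron hexamers are transcribed from Table 1, three entries printed with an "o" are read as "c" (an assumption), the donor pattern `gtragt` (r = a or g) is the commonly quoted short form of the 5' consensus rather than the position-weighted consensus of ref. 23, and all identifiers are hypothetical.

```python
# Intron-side hexamers from Table 1 (lowercase = intron sequence).
# 5' splice sites (donors), keyed by the exon they follow.
DONOR_SITES = {
    "1": "gtgagt", "1'": "gtaagg", "2": "gtactg", "3": "gtacag",
    "4": "gtaaat", "5": "gtaaga", "6": "gtaagg", "7": "gtattt",
    "8": "gtaaat", "9": "gtgaga", "10": "gtgagc", "11": "gtaagt",
    "12": "gtaatc",
}

# 3' splice sites (acceptors), keyed by the exon they precede.
ACCEPTOR_SITES = {
    "1'": "tttcag", "2": "ttgcag", "3": "gtacag", "4": "aaaaag",
    "5": "ggttag", "6": "ccacag", "7": "tcttag", "8": "tcctag",
    "9": "ttttag", "10": "tttcag", "11": "ttgtag", "12": "tatcag",
    "13": "tggcag",
}

# Short donor consensus g-t-(a|g)-a-g-t, one set of allowed bases per position.
DONOR_CONSENSUS = ["g", "t", "ag", "a", "g", "t"]


def exons_breaking_gt_rule(donors):
    """Exons whose downstream intron does not start with gt."""
    return sorted(exon for exon, site in donors.items() if not site.startswith("gt"))


def exons_breaking_ag_rule(acceptors):
    """Exons whose upstream intron does not end with ag."""
    return sorted(exon for exon, site in acceptors.items() if not site.endswith("ag"))


def donor_mismatches(hexamer):
    """Number of positions at which a donor hexamer departs from gtragt."""
    return sum(base not in allowed for base, allowed in zip(hexamer, DONOR_CONSENSUS))


if __name__ == "__main__":
    print(exons_breaking_gt_rule(DONOR_SITES))
    print(exons_breaking_ag_rule(ACCEPTOR_SITES))
    for exon, site in DONOR_SITES.items():
        print(exon, site, donor_mismatches(site))
```

Every listed intron obeys the GT...AG rule, so this check alone does not single out exon 3. The mismatch count against the short donor pattern gives a cruder, hedged view of how far each 5' splice site departs from the consensus.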

FIG. 3. The intron/exon organization of the human CaD gene (A) and its alternative splicing pathways (B). (A) Four overlapping genomic clones and two independent clones are shown at the top. Boxes and lines indicate the exons and introns, respectively. Sizes of introns (below) are given in kilobase pairs (kbp). The introns and exon that we have not confirmed by cloning of genomic DNA are indicated by dashed lines and box, respectively. (B) The five alternative mRNA splicing pathways used to generate HeLa l-CaDs I and II, WI-38 l-CaDs I and II, and aortic smooth muscle h-CaD are shown schematically. Filled boxes represent the common exons in all CaD isoforms, and the exons encoding the short amino-terminal sequences of HeLa l-CaDs and of WI-38 l-CaDs or aorta h-CaD are indicated by shaded and hatched boxes, respectively. Open boxes represent the exons specific to h-CaD and/or l-CaDs I.

Table 1. Exon organization of human CaD gene

3' splice site      Exon   Size, bp   5' splice site

Not determined      1      >37        _CGCTCTCCCAgtgagt_
                                      _A L S_ _17_
tttcagGTCCAGACAT    1'     112        AAGCAGAAAGgtaagg
noncoding                             E A E 23
ttgcagAATCGCCTAC    2      147        CCCAGAACAGgtactg
R I A Y                               A Q N 72
gtacagTGTGCCTGAC    3      1090       AAAGAAACAGgtacag
S V P D                               L K K Q 436
aaaaagGGAGAAGAGA    4      78         AAAAGAAGAGgtaaat
G E E K                               K E E 462
ggttagATCAAAGANG    5      146        ATACTTTCAGgtaaga
I K D                                 N T F 510
ccacagCCGCCCTGGA    6      264        CAGAGAGGAGgtaagg
S R P G                               L R E E 598
tcttagGAAGAGAAGA    7      141        ATCTCTCAAGgtattt
E E K                                 S S L K 645
tcctagATAGAAGAGC    8      44         TGCAGAAAAGgtaaat
I E E                                 V Q K 659
ttttagCAGTGGTGTC    9      82         TGCAATTGAGgtgaga
S S G V                               S A I E 687
tttcagGGAACAAAAA    10     138        ACCAAATAAGgtgagc
G T K                                 T P N K 733
ttgtagGAAACTGCTG    11     96         CAAACCTTCTgtaagt
E T A                                 P K P S 765
tatcagGACTTGAGAC    12     81         CCCCACTAAGgtaatc
D L R                                 S P T K 792
tggcagGTTTGAGACG    13     >533       Not determined
V

Exon sequences are shown in uppercase letters, and intron sequences in lowercase letters. Amino acids encoded by each exon are indicated by one-letter symbols below the nucleotides, and their position numbers at each 5' splice site are from the human aorta h-CaD sequence (22). The sequences and position number with underline are from HeLa l-CaD sequence.

FIG. 4. Nucleotide sequence of exon 3 (uppercase) with flanking intron sequences (lowercase). Exons 3a and 3ab are boxed. The consensus sequences of 5' splice sites for exons 3a and 3ab are underlined, and the intron/exon junctions are indicated by arrowheads. The nucleotides and the deduced amino acid sequences from HeLa l-CaD cDNAs are numbered at right, and the numbers in parentheses are from human aorta h-CaD cDNA (22).

FIG. 5. Southern blot analysis. Genomic DNA from HeLa S3 (lanes a) and human peripheral lymphocytes (lanes b) was digested with the indicated enzymes (BamHI, EcoRI, and HindIII). Probes were made from the full-length HeLa l-CaD I cDNA. Size markers (23.1, 9.4, 6.6, 4.4, 2.3, and 2.0) are indicated in kilobases.

FIG. 6. Chromosome mapping of human CaD gene. A whole R-banded (pro)metaphase plate was hybridized with the biotinylated CaD gene. Arrows indicate the signals on 7q33-q34.

DISCUSSION

In our previous study (9), we investigated the structural and functional relationships between chicken h- and l-CaDs, in which the major parts of the amino and carboxyl termini of both isoforms have completely identical sequences. The carboxyl terminus of both chicken CaDs conserves two sequences showing high homology with the two tropomyosin-binding regions (T1 and T2) in the troponin T molecule. We have further identified the minimum regulatory domains, which are involved in the Ca2+-dependent regulation of the actin-myosin interaction, in the carboxyl terminus of chicken h- and l-CaDs. We compared the primary structures of human CaDs with those of chicken CaDs. The overall sequence identity between the two species of CaD is 65-68%. However, all CaD isoforms in different species strongly conserve the minimum regulatory domain (89% identity). In addition to this, the amino termini of all CaD isoforms (for example, residues 21-47 and 85-121 for HeLa l-CaD I and the corresponding residues of other chicken and human CaDs) also retain completely identical sequences. Therefore, these conserved domains might be important for the structure and function of CaDs.

To begin to elucidate the molecular events of the regulation of CaD isoform expression during phenotypic modulation of smooth muscle cells, we have investigated the genomic structure of the human CaD gene. The CaD gene is composed of at least 14 exons (Fig. 3) and was mapped to a single locus on 7q33-q34 by using the four kinds of probes that can cover the overall CaD gene (Fig. 6). The isoform diversity of this protein (Fig. 1) can be explained by selection of exon 1 or 1', exon 3a or 3ab, and/or exon 4. Exon 1 or 1' encodes the short amino-terminal sequences of all human CaD isoforms. Exon 4 is spliced in l-CaDs I and h-CaD. Exon 3a is also spliced in all l-CaDs, whereas exon 3ab is specifically spliced in h-CaD. Among these splicing pathways, the most interesting is the regulatory mechanism to select the two consensus sequences for the 5' splice sites in exon 3. The same exon structure as for the human CaD gene has also been found in the genome of chicken CaD (unpublished work). Such use of competing splice sites has been reported in viral transcription units of adenovirus E1A (24), simian virus 40 tumor antigen genes (25), Drosophila Ultrabithorax (26) and transformer genes (27), and the human kininogen gene (28). Among them, the two former examples have been the object of the most study, according to which trans-acting factors might be involved in the modification of the recognition process of alternative 5' splice sites by U1 small nuclear ribonucleoprotein (29, 30). A splicing factor derived from HeLa nuclear extract (SF2; ref. 31) plays a critical role in selection of the 5' splice site; it promotes use of the 5' splice site that is located near the 3' splice site in an artificial mRNA precursor containing two 5' splice sites (32). An anti-SF2 factor to suppress the activation of the 5' splice site by SF2 has also been reported (33).

Here we propose the selective usage of competing splice sites in relation to cell differentiation. Regulation of h- and l-CaD expression may depend on unknown trans-acting factors which are linked to phenotypic modulation of smooth muscle cells. Further studies are required for identification of such factors. Additionally, it is necessary to determine whether HeLa- and WI-38-type mRNAs are transcribed from the same promoter or an independent promoter. The regulation of CaD isoform expression must be studied at both the transcriptional and the mRNA processing level.
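The combinatorial picture given in the Discussion (exon 1 or 1', exon 3a or 3ab, exon 4 included or skipped) can be written as a small lookup from which the isoform lengths of Fig. 1 follow. The sketch below is bookkeeping on numbers quoted in the text: 18 or 24 residues for the two amino termini, 26 residues for the exon 4 insertion, and 229 residues (208-436) for the h-CaD-specific repeat domain. The 514-residue shared remainder is inferred here by subtraction from WI-38 l-CaD II (538 - 24) and is an assumption of this sketch, as are all identifiers.

```python
from dataclasses import dataclass

# Residues contributed by each alternative element (values from the text).
N_TERMINUS = {"1": 18, "1'": 24}   # exon 1 (HeLa type) or 1' (WI-38/aorta type)
EXON4_INSERT = 26                  # insertion present in l-CaDs I and h-CaD
EXON3B_REPEAT = 436 - 208 + 1      # central repeating domain of h-CaD
# Assumption: residues shared by all isoforms outside the amino terminus,
# inferred from WI-38 l-CaD II (538 residues, 24-residue amino terminus).
COMMON_REMAINDER = 538 - 24


@dataclass(frozen=True)
class SplicingChoice:
    first_exon: str   # "1" or "1'"
    exon3: str        # "3a" or "3ab"
    exon4: bool       # True if exon 4 is spliced in

    def length(self):
        """Predicted protein length in residues for this combination."""
        total = N_TERMINUS[self.first_exon] + COMMON_REMAINDER
        if self.exon3 == "3ab":
            total += EXON3B_REPEAT
        if self.exon4:
            total += EXON4_INSERT
        return total


# The five pathways summarized in Fig. 3B.
ISOFORMS = {
    "HeLa l-CaD I": SplicingChoice("1", "3a", True),
    "HeLa l-CaD II": SplicingChoice("1", "3a", False),
    "WI-38 l-CaD I": SplicingChoice("1'", "3a", True),
    "WI-38 l-CaD II": SplicingChoice("1'", "3a", False),
    "aorta h-CaD": SplicingChoice("1'", "3ab", True),
}

# Lengths reported in Fig. 1, used only for comparison.
REPORTED = {
    "HeLa l-CaD I": 558, "HeLa l-CaD II": 532,
    "WI-38 l-CaD I": 564, "WI-38 l-CaD II": 538,
    "aorta h-CaD": 793,
}

if __name__ == "__main__":
    for name, choice in ISOFORMS.items():
        print(name, choice.length(), choice.length() == REPORTED[name])
```

Only the WI-38 l-CaD II length is used to fix the shared remainder; the other four lengths are then reproduced from the exon choices alone, which is the sense in which three splicing decisions account for the observed isoforms.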

We thank Dr. T. Hori (National Institute of Radiological Sciences) for his suggestions. This study was partly supported by grants from the Scientific Research Fund of the Ministry of Education, Science, and Culture of Japan and from the Nissan Foundation.

1. Sobue, K., Muramoto, Y., Fujita, M. & Kakiuchi, S. (1981) Proc. Natl. Acad. Sci. USA 78, 5652-5655.
2. Sobue, K. & Sellers, J. R. (1991) J. Biol. Chem. 266, 12115-12118.
3. Owada, M., Hakura, A., Iida, K., Yahara, I., Sobue, K. & Kakiuchi, S. (1984) Proc. Natl. Acad. Sci. USA 81, 3133-3137.
4. Sobue, K., Tanaka, T., Kanda, K., Ashino, N. & Kakiuchi, S. (1985) Proc. Natl. Acad. Sci. USA 82, 5025-5029.
5. Bretscher, A. & Lynch, W. (1985) J. Cell Biol. 100, 1748-1757.
6. Dingus, J., Hwo, S. & Bryan, J. (1986) J. Cell Biol. 102, 1748-1757.
7. Hayashi, K., Kanda, K., Kimizuka, F., Kato, I. & Sobue, K. (1989) Biochem. Biophys. Res. Commun. 164, 503-511.
8. Bryan, J., Imai, M., Lee, R., Moore, P., Cook, R. G. & Lin, W.-G. (1989) J. Biol. Chem. 264, 13873-13879.
9. Hayashi, K., Fujio, Y., Kato, I. & Sobue, K. (1991) J. Biol. Chem. 266, 355-361.
10. Bryan, J. & Bryan, L. (1991) J. Muscle Cell Motil. 12, 372-375.
11. Wang, C.-L. A., Wang, L.-W. C., Xu, S., Lu, R. C., Saavedra-Alanis, V. & Bryan, J. (1991) J. Biol. Chem. 266, 9166-9172.
12. Ueki, N., Sobue, K., Kanda, K., Hada, T. & Higashino, K. (1987) Proc. Natl. Acad. Sci. USA 84, 9049-9053.
13. Sobue, K., Kanda, K., Tanaka, T. & Ueki, N. (1988) J. Cell. Biochem. 37, 317-325.
14. Glukhova, M. A., Kabakov, A. E., Frid, M. G., Ornatsky, O. I., Belkin, A. M., Mukhin, D. N., Orekhov, A. N., Koteliansky, V. E. & Smirnov, V. N. (1988) Proc. Natl. Acad. Sci. USA 85, 9542-9546.
15. Southern, E. (1975) J. Mol. Biol. 98, 503-517.
16. Novey, R. E., Lin, J. L.-C. & Lin, J. J.-C. (1991) J. Biol. Chem. 266, 16917-16924.
17. Saiki, R. K., Scharf, S., Faloona, F., Mullis, K., Horn, G. T., Erlich, H. A. & Arnheim, N. (1985) Science 230, 1350-1354.
18. Kalman, M., Kalman, E. T. & Cashel, M. (1990) Biochem. Biophys. Res. Commun. 167, 504-506.
19. Shyamala, V. & Ames, G. F.-L. (1989) Gene 84, 1-8.
20. Takahashi, E., Hori, T., O'Connell, P., Leppert, M. & White, R. (1990) Hum. Genet. 86, 14-16.
21. Takahashi, E., Yamauchi, M., Tsuji, H., Hitomi, A., Meuth, M. & Hori, T. (1991) Hum. Genet. 88, 119-121.
22. Humphrey, M. B., Herrera-Sosa, H., Gonzalez, G., Lee, R. & Bryan, J. (1992) Gene 112, 197-204.
23. Mount, S. M. (1982) Nucleic Acids Res. 10, 459-472.
24. Berk, A. J. & Sharp, P. A. (1978) Cell 14, 659-711.
25. Ziff, E. B. (1982) Nature (London) 287, 491-499.
26. Beachy, P. A., Helfand, S. L. & Hogness, D. S. (1985) Nature (London) 313, 545-551.
27. Boggs, R. T., Greagor, P., Idriss, S., Belote, J. M. & McKeown, M. (1987) Cell 50, 739-747.
28. Kitamura, N., Kitagawa, H., Fukushima, D., Takagaki, Y., Miyata, T. & Nakanishi, S. (1985) J. Biol. Chem. 260, 8610-8617.
29. Lerner, M. R., Boyle, J. A., Mount, S. M., Wolin, S. & Steitz, J. (1980) Nature (London) 283, 220-224.
30. Rogers, J. & Wall, R. (1980) Proc. Natl. Acad. Sci. USA 77, 1877-1879.
31. Krainer, A. R., Conway, G. C. & Kozak, D. (1990) Genes Dev. 4, 1158-1171.
32. Krainer, A. R., Conway, G. C. & Kozak, D. (1990) Cell 62, 35-42.
33. Mayeda, A. & Krainer, A. R. (1992) Cell 68, 365-375.
