Genomic and molecular characterization of CL-43 and its proximal promoter
Soren Hansen (a,1), Dorte Holm (a), Vivi Moeller (a), Lars Vitved (a), Christian Bendixen (b), Karsten Skjoedt (a), Uffe Holmskov (a,*)
(a) Department of Immunology and Microbiology, University of Southern Denmark, Odense, Winsloewparken 21, DK-5000 Odense C, Denmark
(b) Department of Animal Breeding and Genetics, Research Centre Foulum, DK-8830 Tjele, Denmark
Received 16 May 2002; received in revised form 5 September 2002; accepted 23 September 2002
Abstract
Collectins are part of the innate immune system as they bind nonself glycoconjugates on the surface of microorganisms and inhibit
infection by direct neutralization, agglutination or opsonization of the invaders. Conglutinin and CL-43 are serum proteins that have only
been found and characterized in Bovidae. We have studied molecular and genomic characteristics of CL-43 to identify polymorphisms that
might be associated with disease-susceptible phenotypes or other traits in cattle, and to elucidate how the Bovidae may benefit from
possessing additional collectins. Screening a bovine cDNA library resulted in the isolation of two plasmid clones that encoded the entire
translated sequence of CL-43. The 5′-untranslated end and start point of transcription were identified by 5′-RACE and showed that the
mRNA transcript comprises either 1326 or 1241 nucleotides because of alternative splicing. Both transcripts encode a protein of 321 amino
acids including a signal peptide of 20 residues. Characterization of two overlapping genomic lambda phage clones showed that the gene
comprised seven exons spanning 8.5 kbp. The CL-43 gene, like the conglutinin gene, was mapped to Bos taurus chromosome 28 at q1.8. The
CL-43 promoter has 96% identity with the conglutinin promoter recently described by us, and the assignment of potential cis-regulatory
elements shows that several hepatic transcription factors may regulate transcription in the acute phase response and in response to metabolic
changes.
© 2002 Elsevier Science B.V. All rights reserved.
Keywords: Collectin; CL-43; Innate immunity; Cattle; Comparative immunology/evolution
1. Introduction
CL-43 (43-kDa collectin) is a member of the collectin
family comprising conglutinin, mannan-binding lectin
(MBL), lung surfactant protein A (SP-A), lung surfactant
protein D (SP-D) and CL-L1 (collectin liver 1). The
collectins are made up of C-type carbohydrate recognition
domains attached to collagen-like regions via an alpha-
helical coiled-coil region [1,2]. In contrast with the other
collectins, CL-43 does not form large oligomers but exists
only as single subunits of three polypeptide chains joined by
a fixed pattern of interchain disulfide bridges in the N-
terminus [3,4]. Conglutinin and CL-43 are both humoral
proteins synthesized in the liver, similar to MBL, but have
only been found in Bovidae.
Abbreviations: CL-43, 43-kDa collectin; CL-L1, collectin liver 1; MBL, mannan-binding lectin; SP-A, lung surfactant protein A; SP-D, lung surfactant protein D.
The nucleotide sequences reported in this paper have been deposited with GenBank/EMBL data libraries under accession numbers AY071821 and AY071822.
* Corresponding author. Tel.: +45-6550-3775; fax: +45-6591-5267. E-mail address: [email protected] (U. Holmskov).
1 Present address: Department of Cell Biology, Duke University Medical Center, Durham, NC 27710, USA.
0167-4781/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0167-4781(02)00531-6
Biochimica et Biophysica Acta 1625 (2003) 1–10
Collectins play a role in the nonadaptive immune defense.
They bind via their carbohydrate recognition domain
to nonself glycoconjugates located on the surface of
microorganisms. This binding may lead directly to neutralization
and aggregation of the microorganisms, or to opsonization
and subsequent phagocytosis by macrophages expressing
different collectin receptors [5,6]. In the lung, opsonization
by the action of SP-A or SP-D decreases the secretion of
pro-inflammatory cytokines and modulates the T-cell
response, protecting the lung from inflammation and
immune mediated damage [7,8]. Mice deficient in SP-A
are susceptible to bacterial as well as viral pulmonary
infections, while mice deficient in SP-D show increased
susceptibility only to viral infections [9]. MBL is the only
collectin which on binding to microorganisms activates
complement through associated serine proteases that cleave
C4 and C2. Complement activation may lead to direct
killing or enhance opsonization by deposition of C3 [10].
Besides binding to carbohydrates, SP-A and SP-D bind
certain phospholipids and protect these and alveolar macro-
phages from oxidative damage [11,12]. This may be impor-
tant for lipid homeostasis, as mice lacking SP-D accumulate
surfactant lipid and show morphologic changes in alveolar
macrophages and type II cells [13,14]. Additional roles for
SP-D outside the lungs are implied by its presence on the
surface of mucosal epithelial cells lining the ducts of several
glands and organs [15].
Despite many attempts to find a human homologue of
conglutinin, no such protein has been identified, and it
appears that conglutinin and CL-43 are limited to the
Bovidae or Ruminantia. Few data have been published on
the biological role of CL-43. CL-43 binds to the rotavirus
Nebraska calf diarrhoea virus, reducing its viral infectivity
[16]. Like other collectins, CL-43 binds to the opportunistic
yeast pathogen, Cryptococcus neoformans in its acapsular
form [17]. It binds to mannan (KD of 2.7 × 10⁻⁸ M) and
neoglycoproteins with terminal mannose or fucose residues
[18,19].
We have previously partly characterized the cDNA
encoding CL-43 and observed that the serum concentration
of this protein varies among cattle ([20]; unpublished
results). The aim of the present work was to characterize
both CL-43 and conglutinin to obtain additional clues to
their function and possible reasons for their evolutionary
preservation in Bovidae or Ruminantia, as well as informa-
tion that might be generally applicable to closely related
collectins like SP-D. Here we describe full-length cDNA of
CL-43, the complete CL-43 gene sequence with its exon/
intron structure and promoter region, and its chromosomal
localization. Comparison with other collectin genes shows
that CL-43 and conglutinin have diverged from an SP-D-
like protein. The genomic characterization will allow for
analysis of polymorphisms that might lead to low serum
concentrations of CL-43 and thus be important for the
disease susceptibility of cattle.
2. Materials and methods
2.1. Full-length cDNA clones
A CL-43 derived probe, spanning nucleotide 633–1026
of the CL-43 cDNA sequence (GenBank accession number
X75912), was obtained by PCR amplification using the
primers: 5′-GGCCTCCCCACGCTCTTCA-3′ and
5′-CCTTCTGGCCTCATCCTGTGG-3′ with a previously isolated
CL-43 cDNA clone as template [20]. The PCR
consisted of initial denaturation at 94 °C for 4 min, followed
by 30 cycles of 94 °C for 45 s; 56 °C for 30 s; 72 °C for 30
s and a final extension step at 72 °C for 1 min. The product
was purified by means of the Qiaex II gel extraction kit
(Qiagen, Hilden, Germany) and labeled with [32P]-dCTP
using an oligolabeling kit and random primers according to
the procedure recommended by the manufacturer (Amersham
Pharmacia Biotech, Piscataway, NJ). Approximately
2.8 × 10⁵ plaques of a bovine liver lambda cDNA library
(Stratagene, La Jolla, CA) were screened with the 32P-labeled
probe. The final high stringency wash was carried
out in 0.3× SSC at 55 °C for 20 min. Positive clones were
replated and rescreened twice to uniformity. Inserts were
excised from the lambda ZAP XR vector in vivo and
transformed into Escherichia coli SOLR according to man-
ufacturer’s instructions. Plasmids were purified by means of
the Wizard Plus SV Miniprep purification system (Promega,
Madison, WI) and sequenced.
2.2. 5′-RACE
RNA was isolated from 250 mg bovine liver by
means of the TRI-reagent kit (Sigma, St. Louis, MO),
following manufacturer’s instructions. cDNA was synthesized
from 4 µg of total RNA by means of Superscript II
H⁻ reverse transcriptase (Invitrogen, Groeningen, Netherlands)
in the presence of the CL-43 specific primer
5′-GCTATCTGCTGGTGGAGC-3′. The cDNA was precipitated
twice with ethanol and ammonium acetate and
polycytosine tails were attached to the cDNA transcripts
by using 25 U of terminal transferase (Roche, Mannheim,
Germany) in the presence of 2 mM dCTP. The tailing
reaction was carried out for 30 min at 37 °C and the
terminal transferase was subsequently heat-inactivated.
One-fifth of the tailed cDNA (2 µl) was used as template
in a 90-µl PCR with the CL-43 specific primer
5′-GAGGGTTTTCTCCGAATAGACATC-3′ and a poly-G13
primer. The PCR included 2 min initial denaturation at 94
°C, followed by the addition of Taq polymerase and 10
cycles of touch-down amplification starting at 63 °C and
decreasing by 1 °C per cycle. The main amplification
comprised 25 cycles of 94 °C for 45 s, 54 °C for 30 s,
72 °C for 30 s and a final step extension of 3 min at 72
°C. Products were analyzed by agarose gel electrophoresis
and excised products purified as above. The product (10
ng) was ligated into the PCRII-vector and heat-shock
transformed into INVαF′ E. coli using the original TA
cloning kit (Invitrogen). Plasmids were purified and
sequenced as above.
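The touch-down program described above can be written out as an explicit list of annealing temperatures, which makes it easy to see that the last touch-down cycle lands on the annealing temperature of the main amplification. The following Python sketch is only an illustration: the function name and its parameters are hypothetical, and only the numbers (start at 63 °C, 1 °C decrement, 10 touch-down cycles, 25 main cycles at 54 °C) are taken from the text.

```python
def touchdown_schedule(start_c, step_c, touchdown_cycles, main_anneal_c, main_cycles):
    """Return the annealing temperature (in °C) used in each PCR cycle.

    Hypothetical helper for illustration; not part of the published protocol.
    """
    # Touch-down phase: lower the annealing temperature by step_c each cycle.
    touchdown = [start_c - step_c * cycle for cycle in range(touchdown_cycles)]
    # Main phase: constant annealing temperature.
    main = [main_anneal_c] * main_cycles
    return touchdown + main


# Values from the 5'-RACE PCR described in the text.
race_schedule = touchdown_schedule(
    start_c=63, step_c=1, touchdown_cycles=10, main_anneal_c=54, main_cycles=25
)
print(race_schedule[:10])   # annealing temperatures of the touch-down cycles
print(len(race_schedule))   # total number of cycles
```

Under these numbers the tenth touch-down cycle anneals at 54 °C, the same temperature as the 25 main cycles, so the program runs 35 cycles in total.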
2.3. Genomic cloning
The 3′-end of the CL-43 gene was isolated from a bovine
genomic EMBL3 lambda phage library (Clontech, Palo
Alto, CA) using the same probe as used in the isolation of
cDNA clones. Screening and washing conditions were as
above. Phage DNA was isolated, digested with SacI and
analyzed by agarose gel electrophoresis. Fragments not
observed in a similar digestion of empty EMBL3 vector
were purified by means of the Qiaex II gel extraction kit.
The fragments (20–100 ng) were ligated into 25 ng of SacI-digested,
calf intestinal phosphatase-treated pBS(ks+) vector
and were heat-shock transformed into chemically
competent XL-10 E. coli.
Plasmids were purified and sequenced as above.
The 5′-end of the gene was isolated from the same library
and under similar conditions with a 281-bp probe derived
from the 5′-untranslated sequence and the sequence encoding
the N-terminal segment. The probe was obtained by PCR
amplification using the primers 5′-CTGAATGGGCGCTTTATCCT-3′
and 5′-ATGACCACGAAGGCTATCTGCT-3′ with the newly
isolated full-length cDNA clone, CL-43 160, as template.
The PCR consisted of 4 min initial denaturation at 94 °C
followed by 30 cycles of 94 °C for 45 s; 54 °C for 30 s;
72 °C for 30 s and a final extension step of 1 min at
72 °C. Subcloning into SacI-linearized plasmids and subsequent
sequencing of the isolated phage clones were performed
as above. To assemble the sequences obtained from
the subclones, four PCRs (P1–P4) overlapping the junctions
were performed using the primers:
5′-CTGAATGGGCGCTTTATCCT-3′, 5′-ATGACCACGAAGGCTATCTGCT-3′ (P1);
5′-CCAGCAGATAGCCTTCGTGGTCAT-3′, 5′-CTGTGATGGGGGTGAGGAGTGAG-3′ (P2);
5′-GGGCCATCGGTCCACA-3′, 5′-GGCGTTGAACTTCTCCCTCTAA-3′ (P3); and
5′-TAAGGCAGCGGATGAGAAAC-3′, 5′-CTCTGGGCCTTCGTCTTTTG-3′ (P4).
Purified phage DNA (50 ng) was used as template in 90-µl
reactions using Pwo polymerase (Roche) under the conditions
recommended by the manufacturer. After 2 min of initial
denaturation, Pwo polymerase was added to the reactions
followed by 10 cycles of touch-down amplification starting
10 °C above the respective annealing temperatures of 54 °C
(P1), 60 °C (P2), 53 °C (P3) and 52 °C (P4) and decreasing by 1
°C per cycle. Main amplification followed, comprising 20
cycles of 94 °C for 45 s, 30 s at the respective annealing
temperatures, 72 °C for 2 min (P1)/3 min (P2)/30 s (P3)/4 min
(P4), followed by a final extension step of 8 min at 72 °C.
Products were analysed by agarose gel electrophoresis and
purified from the excised gel plugs as above. Purified
products were inserted into the pCR4Blunt-TOPO plasmid
(Invitrogen) according to manufacturer’s instructions and
heat-shock-transformed into E. coli XL-10. Plasmids were
prepared and sequenced as above.
2.4. Chromosomal localization
Genomic DNA (20 ng) of 90 hamster/cow somatic cell
hybrids [21] was used as template in PCRs with the primers:
5′-TGTGGGGCCAGGTATGC-3′ and 5′-TCAGTCCACACCATCTACATACAT-3′.
Amplification comprised a 4 min initial denaturation,
10 cycles of 94 °C for 45 s; 61–52
°C for 30 s; 72 °C for 1 min, decreasing the annealing
temperature by 1 °C per cycle, followed by an amplification
of 25 cycles of 94 °C for 45 s; 51 °C for 30 s; 72 °C for 1
min and a final extension step of 2 min at 72 °C. Products
were analyzed using agarose gel electrophoresis.
2.5. DNA sequencing
PCR products or plasmid preparations were sequenced
with the Prism Ready Reaction BigDyeDeoxy Terminator
sequencing kit (PE Applied Biosystems, Alleroed, Den-
mark) using the recommended conditions. Samples were
subjected to electrophoresis on an ABI prism 310 Genetic
Analyzer and data analyzed with the ABI Prism Software
Version 2.1.1 (PE Applied Biosystems).
3. Results
3.1. Characterization of full-length CL-43 transcript
By partial protein sequencing, isolation of a partial
cDNA clone and PCR, we had previously obtained a CL-43
cDNA sequence without the 5′-untranslated sequence,
the signal peptide and the initial N-terminal sequence [20].
To characterize the full-length transcript, we screened a liver
cDNA library with a probe encoding the carbohydrate
recognition domain of CL-43 and isolated two different
CL-43 clones, clone 151 and clone 160 (Fig. 1A). The
inserts of these clones encoded the whole translated
sequence and some of the 5′-untranslated sequence. They
differed from each other in that clone 151 lacked 85 bp of
the 5′-untranslated sequence located between the start of
transcription and the translated sequence. Forty base pairs of
additional 5′-untranslated sequence, not found in the cDNA
clones, were characterized by 5′-RACE. The total CL-43
transcript comprised 171 or 86 bp of 5′-untranslated
sequence, 966 bp encoding the protein with intact start
and stop codons, followed by 189 bp of 3′-untranslated
sequence including a complete polyadenylation site and
polyadenylation tail (Fig. 1B). The deduced amino acid
sequence revealed a structure made of a signal peptide of 20
amino acid residues, an N-terminal segment of 28 amino
acid residues, 38 Gly-Xaa-Yaa repeats, a neck region of 31
amino acid residues and a carbohydrate recognition domain
of 128 amino acid residues. In both cDNA clones, we found
a threonine codon (ACC) at position 125 instead of the
alanine codon (GCC) reported by Lim et al. [20]. Another
discrepancy with the previously reported sequence was
found at nucleotide position 390 of the cDNA, where we
found an adenine nucleotide contrary to the reported gua-
nine nucleotide. This discrepancy is silent as it occurs in the
third position of the codon encoding glycine 130 (GGG vs.
GGA). Clone 160 possesses an additional silent variation,
differing from clone 151 and the published sequence, at the
third position of the codon encoding serine 27 (TCT vs.
TCG). The complete cDNA sequence and the deduced
amino acid sequence of CL-43 have been submitted to
GenBank, with accession number AY071821.
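The segment lengths reported in this section can be cross-checked against the transcript and protein sizes given in the abstract. The short Python sketch below does this arithmetic; the variable and function names are hypothetical, and it assumes (as the text states) that the 966-bp coding segment includes both the start and the stop codon.

```python
# Segment lengths in bp, as reported in the text.
UTR5_VARIANTS_BP = {"long": 171, "short": 86}  # alternatively spliced 5'-UTR
CODING_BP = 966                                # includes start and stop codons
UTR3_BP = 189

# Domain lengths in amino acid residues, as reported in the text.
DOMAINS_AA = {
    "signal peptide": 20,
    "N-terminal segment": 28,
    "collagen-like region": 38 * 3,  # 38 Gly-Xaa-Yaa repeats
    "neck region": 31,
    "carbohydrate recognition domain": 128,
}


def transcript_length(utr5_bp):
    """Total transcript length (bp) for a given 5'-UTR length."""
    return utr5_bp + CODING_BP + UTR3_BP


transcript_lengths = {
    name: transcript_length(bp) for name, bp in UTR5_VARIANTS_BP.items()
}
protein_length = sum(DOMAINS_AA.values())
# The stop codon is part of the 966 bp but does not encode a residue.
encoded_residues = CODING_BP // 3 - 1

print(transcript_lengths)
print(protein_length, encoded_residues)
```

Both routes give the same protein length, and the two transcript lengths differ by the 85 bp that clone 151 lacks.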
3.2. Genomic characterization
Two EMBL3 phage clones (E43-1 and E43-2) were
isolated by screening a bovine genomic lambda library with
probes corresponding to the 5′-end and the 3′-end of the
CL-43 transcripts. Subclones S1 (2363 bp) and S2 (1849 bp)
(Fig. 2) were obtained from SacI digestion of clone E43-1,
which included the 5′-end of the gene. Subclones S3 (1923
bp) and S4 (1785 bp) were obtained from a similar digestion of
E43-2, which included most of the 3′-end of the gene. PCRs
were carried out to obtain overlapping sequences, and the
products P2 (2127 bp), P3 (350 bp), P4 (2779 bp) were
amplified from phage DNA derived from E43-2. In
parallel, the product P1 (1389 bp) was amplified from
E43-1. There was 100% identity in the 1132-bp overlap
of S2 and P2 derived from separate phage clones, show-
ing that E43-1 and E43-2 were fragments of the same
gene.
Fig. 1. CL-43 cDNA and deduced amino acid sequence. (A) Strategy for cDNA characterization. The resulting transcript is shown above the 5′-RACE and the two cDNA clones, clone 160 and 151. The alternatively spliced transcript is shown with a dashed angled line. The previously obtained cDNA clone is represented below. (B) cDNA and deduced amino acid sequence. Nucleotides and amino acid residues are numbered in 5′ to 3′ and N-terminal to C-terminal directions, respectively. The start of translation defines the first amino acid residue (+1) and the numbering of nucleotides. The start codon (ATG) is highlighted and the polyadenylation site (AATAAA) is underlined. The dashed underlining indicates the alternatively spliced 5′-untranslated sequence.
Fig. 2. The CL-43 gene. (A) Chromosomal localization to B. taurus chromosome 28 at position q1.8. The proximate microsatellite is typed in bold. (B) Cloning strategy. Subcloned fragments (S1–S4) and amplified PCR products (P1–P4) are shown. Numbering is from the first transcribed nucleotide in exon 1. (C) Genomic organization. Exons are designated U: 5′-untranslated sequence (exon 1); SNC: signal peptide, N-terminal segment and collagen-like region (exon 2); C: collagen-like region (exon 3–5); α: alpha-helical neck region (exon 6); and CRD/U: carbohydrate recognition domain and 3′-untranslated sequence (exon 7). Numbers indicate the size of exons and introns in bp. (D) Exon and intron boundaries. Numbering is from the start of transcription and capital letters show transcribed nucleotides. A potential TATAA box in exon 1 and polyadenylation signal AATAAA in exon 7 are underlined with solid lines. The dashed underlining in exon 1 indicates the alternatively spliced 5′-untranslated sequence. Complete sequences of both exons and introns (not shown) have been submitted to GenBank, with accession number AY071822.
We encountered a single discrepancy with the cDNA
sequence at position 1434 of the gene, where a thymine
nucleotide was found. This discrepancy is silent as it occurs
in the third position of the codon for threonine 30 (ACC vs.
ACT). The inserts of E43-1 and E43-2 were not further
characterized, as subclones and amplified products covered
the gene of 8493 bp and 1605 bp of the proximal promoter. The
first exon may vary in size (168/83 bp) because of alternative
splicing of the 5′-untranslated sequence observed on
characterizing the cDNA (Fig. 2C). The second exon
encodes the signal peptide, the N-terminal segment and
the first five Gly-Xaa-Yaa repeats. The rest of the
collagen-like region is encoded by exons 3–5. Exon 6
encodes the alpha-helical neck region and exon 7 encodes
the carbohydrate recognition domain plus 192 bp of
3′-untranslated sequence. The prediction that exon 7 contains
the carbohydrate recognition domain with 3′-untranslated
sequence is based on phylogenetic conservation of the
carbohydrate recognition domain and 3′-untranslated
sequence as a single exon in all collectin genes characterized
so far. All exon/intron boundaries obeyed the GT–AG rule
defining donor and acceptor sites for splicing out introns
(Fig. 2D). Sequences of the entire gene including introns
have been submitted to GenBank, with accession number
AY071822.
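The GT–AG rule referred to above states that an intron begins with the dinucleotide GT (donor site) and ends with AG (acceptor site). As a minimal sketch of how such a check could be run on annotated exon coordinates, the Python example below extracts the introns of a toy gene and tests both ends. The sequences, coordinates and function names are hypothetical and serve only as an illustration; they are not the CL-43 sequence or the authors' procedure.

```python
def introns_from_exons(genomic, exons):
    """Return intron sequences lying between consecutive exons.

    exons: sorted list of (start, end) pairs, 0-based, end exclusive.
    """
    return [
        genomic[prev_end:next_start]
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


def obeys_gt_ag(intron):
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Hypothetical toy gene: exons in upper case, introns in lower case.
exon_seqs = ["ATGGCC", "GGTCCT", "TGA"]
intron_seqs = ["gtaagtttttcag", "gtgagtcccatag"]

toy_gene, toy_exons = "", []
for index, exon in enumerate(exon_seqs):
    toy_exons.append((len(toy_gene), len(toy_gene) + len(exon)))
    toy_gene += exon
    if index < len(intron_seqs):
        toy_gene += intron_seqs[index]

toy_introns = introns_from_exons(toy_gene, toy_exons)
gt_ag_results = [obeys_gt_ag(intron) for intron in toy_introns]
print(gt_ag_results)
```

A boundary such as GT...AT fails the check, which is why a mis-assigned acceptor site is easy to spot with this kind of test.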
3.3. Chromosomal localization of CL-43
DNA isolated from hamster/cow somatic hybrids was
screened by PCR-amplification of a region of intron 6. The
specificity of the PCR was previously tested using bovine
genomic DNA as template under the conditions given in
Materials and methods. A single product (1111 bp) was
amplified from genomic DNA and sequencing verified that
it was part of the CL-43 gene (not shown). The screening of
the hybrid panel showed that 21 of the 90
hybrids contained the CL-43 gene. No products were
observed when using DNA isolated from CHO cells. On
the basis of the previous characterizations of the panel, the
CL-43 gene was linked with a LOD score of 7 to the
microsatellite marker ILST099 located on chromosome 28 at the
distal position q1.8 (Fig. 2A) [21].
3.4. Assigning potential cis-regulatory elements to the
proximal promoter
The proximal promoter region of CL-43 shows strikingly
high homology (99.7%) with the initially published pro-
moter sequence of conglutinin [22,23], with a difference of
only two nucleotides in the 742-bp known upstream
sequence of the conglutinin gene (not shown; will be
discussed later). A TATAA box and a CAAT box were
found at −28 and −78, respectively, in close vicinity of the
transcription start site, defined as position +1 (Fig. 3). The
first 1000 bp of the upstream sequence was analyzed for cis-
regulatory elements using the MatInspector (Biological
Databases, Braunschweig, Germany) program and the
Transfac 5.0 database [24]. Regulatory elements were
selected with stringent search criteria requiring 100% core
factor identity and a matrix identity of more than 90%
(Table 1). Several AP-1 sites and NF-1 elements were
found, with a distinct cluster of AP-1 sites located in
upstream proximity to the CAAT element (−78 to −160).
Fig. 3. Promoter region and potential regulatory elements of the CL-43 gene. Numbering is from the start of transcription, shown with an angled arrow. The
proximate TATAA-box and CAAT-box are underlined and putative binding sites for transcription factors are boxed according to their position and consensus
sequences found in Table 1. Arrows above the boxes indicate the orientation of the binding site.
Table 1
Potential cis-regulatory elements (based on matrix analysis)

| Transcription factor | Matrix (#/1000 bp) | Consensus | CL-43 sequence | Relative fit | Position |
|---|---|---|---|---|---|
| AP-1 | AP1_Q2 (1.82) | RSTGACTNMNW | ATTGACAAATT | 0.908 | −78 (−) |
| | | | GTTGACTCAGT | 0.970 | −96 (−) |
| | | | ATTGACTGAGT | 0.952 | −100 (+) |
| | | | CATGACTCAAA | 0.930 | −150 (+) |
| | | | TCTGACTCTTT | 0.901 | −613 (−) |
| | AP4_Q5 (2.48) | NNCAGCTGNN | ATCAGCGGCA | 0.914 | −137 (+) |
| | | | CCCAGCTCTT | 0.916 | −419 (+) |
| | | | GGCAGCTCCC | 0.909 | −426 (+) |
| | | | CTCAGCAGCC | 0.923 | −470 (+) |
| | | | GGCAGCTCCC | 0.901 | −508 (+) |
| | | | ATCAGCAGCT | 0.901 | −991 (+) |
| | AP4_Q6 (0.5) | CWCAGCTGGN | CACAGCAGGT | 0.905 | −361 (+) |
| | AP1FJ_Q2 (2.5) | RSTGACTNMNW | AGTGACCCCCTG | 0.930 | −524 (+) |
| | | | TGTGACCCCGT | 0.939 | −623 (−) |
| ARNT | ARNT_01 (0.69) | NNNNNCACGTGNNNNN | AAAGGCACGTGGAGGG | 0.957 | −563 (+) |
| | | | CCCTCCACGTGCCTTT | 0.914 | −563 (−) |
| cEBP | CEBP_01 (2.1) | RNRTKNNGMAAKNN | TGATTGTGCAACTT | 0.948 | −297 (+) |
| | | | TGCTTTTGCAAGAA | 0.931 | −318 (+) |
| | | | GACTTGAGCAACTG | 0.928 | −669 (+) |
| | | | ACCYGATGCAAAAA | 0.907 | −869 (+) |
| c-Myb | CMYB_01 (2.0) | NNNNNNGNCNGTTGNN | CCCACAGGCTGTTGCC | 0.940 | −477 (+) |
| | | | AGTGACCCCTGTTGGG | 0.915 | −517 (+) |
| c-Myc | MYCMAX_02 (1.7) | NANCACGTGNNW | CTCCACGTGCCT | 0.923 | −565 (+) |
| deltaEBF1 | DELTAEF1_01 (2.4) | NNNCACCTNAN | CCCCACCTGGA | 0.969 | −115 (−) |
| | | | GAGCACCTTCA | 0.945 | −180 (−) |
| | | | CAGCACCTGCT | 0.943 | −357 (−) |
| | | | GGCCACCTGAT | 0.961 | −876 (+) |
| E2F | E2F_02 (0.13) | TTTSGCGC | TTTGCCGC | 0.928 | −135 (−) |
| E47 | E47_01 (0.11) | NSNGCAGGTGKNCNN | ACAGCAGGTGCTGTT | 0.921 | −355 (+) |
| FREAC-7 | FREAC7_01 (0.52) | WNNANATAAAYANNNN | CAGGTATAAATACTCA | 0.910 | −21 (+) |
| IK-2 | IK2_01 (4.0) | NNNYGGGAWNNN | AGCTGGGAGCTG | 0.911 | −422 (−) |
| | | | CTGAGGGAAGCC | 0.939 | −650 (+) |
| | | | CTCCGGGAATTG | 0.935 | −739 (+) |
| | | | CTTGGGGATCTG | 0.906 | −981 (+) |
| Lmo-2 | LMO2COM_01 (1.1) | SNNCAGGTGNNN | ATCCAGGTGGGG | 0.914 | −137 (+) |
| | | | CAGCAGGTGCTG | 0.990 | −357 (+) |
| | | | CATCAGGTGGCC | 0.958 | −857 (−) |
| Max | MAX_01 (0.11) | NNANCACGTGNTNN | AAGGCACGTGGAGG | 0.925 | −564 (+) |
| | MYCMAX_02 (1.7) | NANCACGTGNNW | CTCCACGTGCCT | 0.923 | −565 (−) |
| MyoD | MYOD_Q6 (1.0) | NNCANCTGNY | CCCACCTGGA | 0.910 | −116 (−) |
| | | | AGCACCTGCT | 0.963 | −358 (−) |
| | | | GCCACCTGAT | 0.952 | −876 (+) |
| | | SRACAGGTGKYG | CAGCAGGTGCTG | 0.963 | −357 (+) |
| | | | CATCAGGTGGCC | 0.917 | −875 (−) |
| MZF-1 | MZF1_01 (3.8) | NGNGGGGA | CTGGGGGA | 0.956 | −504 (−) |
| | | | ATGGGGGA | 0.966 | −774 (−) |
| | | | CTTGGGGA | 0.965 | −985 (+) |
| NF-1 | NF1_06 (4.1) | NNTTGGCNNNNNNCCNNN | TATTGGCTGAGGAACTTC | 0.929 | −2 (+) |
| | | | TTTTGGCTTGTTGACTCA | 0.942 | −87 (−) |
| | | | CTTTGGCCACCTGATGCA | 0.910 | −873 (+) |
| NFAT | NFAT_06 (1.9) | NNNWGGAAAANN | CTCTGGAAACTG | 0.908 | −78 (−) |
| | | | CGATGGAAATGA | 0.939 | −761 (+) |
| | | | AGGAGGAAAATG | 0.974 | −812 (+) |
| | | | TTCTGGAAAAGA | 0.956 | −832 (+) |
| NFY | NFY_06 (0.70) | TRRCCAATSRN | CAGCCAATATT | 0.912 | −11 (−) |
| | | | TCACCAATTCC | 0.910 | −735 (−) |
| N-Myc | NMYC_01 (1.6) | NNNCACGTGNNN | AGGCACGTGGAG | 0.959 | −565 (+) |
| | | | CTCCACGTGCCT | 0.985 | −565 (−) |
| RFX1 | RFX1_02 (0.95) | NNGTNRCNNNRGYAACNN | GGGCTGCTGAGGCAACAG | 0.933 | −469 (−) |
| Sox-5 | SOX5_01 (1.1) | NNAACAATNN | ACAACAATTC | 0.982 | −583 (+) |
| SRY | SR_02 (3.9) | NWWAACAAWANN | AACAACAATTCT | 0.911 | −582 (+) |
| TCF11 | TCF11_01 (4.6) | GTCATNNWNNNNN | GTCATGAACATCA | 0.969 | −143 (+) |
| USF | USF_01 (1.2) | GYCACGTGNC | AAGGCACGTGAGGG | 0.982 | −564 (+) |
| | | | CCTCCACGTGCCTT | 0.982 | −564 (−) |

Sites were selected by the principles of Quandt et al. with a core factor identity of 100% and a relative fit of more than 0.9. The matrices and consensus motifs are from the Transfac 5.0
database (Biological Databases) and the cores of the consensus matrices are underlined. The number in brackets after each matrix is the number of times it is likely to be found in a
random sequence of 1000 nucleotides and is thus inversely related to its selective quality.
A cluster of potential elements for the transcription factors
AP-1, E47 (product of the E2A gene), Lmo-2 (Lim domain
factor 2), MyoD (myogenic determination factor) and
deltaEF1 (delta-crystallin enhancer binding factor 1) was
identified at position −355 to −361 [23–28]. The canonical
E-box (CACGTG) and flanking sequence at −563 to −578
may serve as binding sites for the factors Arnt (nuclear
transporter for the aryl hydrocarbon receptor), c-Myc
(cellular Myc factor), USF (upstream stimulatory factor),
N-Myc (activated Myc factor in neuroblastoma) and Max
(binding partner of Myc family factors) [29–31]. A third
and dense cluster of potential cis-elements for the factors
NF-1, deltaEF1, MyoD, C/EBP (CCAAT enhancer binding
protein), c-Myb (cellular Myb factor), Lmo-2, NFY (nuclear
factor Y) and USF is located further upstream at position
−869 to −890 [32]. The analyzed sequence also contained
multiple binding sites for C/EBP, deltaEF1, Ik-2 (Ikaros 2),
MZF-1 (myeloid zinc finger 1) and NFAT, whereas single
elements were observed for the factors E2F (E2 promoter
binding factor), Freac-7 (forkhead related activator 7),
RFX1 (regulator factor X 1), Sox-5 (SRY box binding
factor 5), SRY (SRY factor) and TCF11 (transcription factor
11) [33–40].
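Two simple checks help when reading Table 1: whether a site on the (−) strand is the reverse complement of the corresponding (+) strand site, and at which positions a site departs from the IUPAC consensus. The Python sketch below illustrates both, together with the percent identity quoted for the promoter comparison. It is not the MatInspector algorithm, which scores sites against weight matrices; the function names are hypothetical, and the sequences are copied from Table 1.

```python
# IUPAC nucleotide ambiguity codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def consensus_mismatches(consensus, site):
    """Return 0-based positions where the site violates the IUPAC consensus."""
    if len(consensus) != len(site):
        raise ValueError("consensus and site must have the same length")
    return [
        index
        for index, (code, base) in enumerate(zip(consensus.upper(), site.upper()))
        if base not in IUPAC[code]
    ]


def percent_identity(aligned_length, mismatches):
    """Percent identity over an ungapped alignment."""
    return 100.0 * (aligned_length - mismatches) / aligned_length


# ARNT site at -563 on the (+) strand, from Table 1.
arnt_plus = "AAAGGCACGTGGAGGG"
arnt_minus = reverse_complement(arnt_plus)

# N-Myc and AP-1 consensus sequences with one listed site each, from Table 1.
nmyc_mismatches = consensus_mismatches("NNNCACGTGNNN", "AGGCACGTGGAG")
ap1_mismatches = consensus_mismatches("RSTGACTNMNW", "ATTGACAAATT")

# Two differing nucleotides in 742 bp of promoter sequence.
promoter_identity = round(percent_identity(742, 2), 1)

print(arnt_minus, nmyc_mismatches, ap1_mismatches, promoter_identity)
```

The reverse complement of the (+) strand ARNT site reproduces the (−) strand entry at the same position, as expected for a site built around the palindromic core CACGTG. The AP-1 site at −78 departs from the consensus at two positions, which is consistent with a relative fit below 1.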
4. Discussion
The previously published cDNA sequence of CL-43
provided no information on the 5′-untranslated sequence,
the signal peptide and the sequence encoding the first eight
amino acid residues of the N-terminal segment [20]. In the
present work, we isolated two cDNA clones that both
encoded the entire translated sequence together with
some of the 5′-untranslated sequence. The two clones
differed with respect to the length of 5′-untranslated
sequence, as alternative splicing of mRNA resulted in
5′-untranslated sequence of either 171 or 86 bp.
In relation to the exon/intron structure, the alternative
splicing leads to a varying size of the first exon, which has
either 168 or 83 bp, and in the first intron, which has either
1167 or 1252 bp. Both transcripts obeyed the GT–AG rule
for splicing out the first intron. Transcription of conglutinin
shows a similar splicing pattern and the first exon of the
conglutinin gene also shows a high degree of homology
(98.2%) with that of the CL-43 gene [22,23].
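The exon and intron sizes given for the two splice variants are consistent with a single acceptor site and two alternative donor sites: a shorter first exon is compensated by a longer first intron. The small Python check below makes this explicit. It assumes that the sizes are paired in the order in which they are listed in the text (168 with 1167 bp, 83 with 1252 bp), and the names used are hypothetical.

```python
# Sizes in bp; pairing of exon and intron sizes is assumed from the order in the text.
SPLICE_VARIANTS = {
    "long 5'-UTR": {"exon1_bp": 168, "intron1_bp": 1167},
    "short 5'-UTR": {"exon1_bp": 83, "intron1_bp": 1252},
}


def exon1_plus_intron1(variant):
    """Distance (bp) from the transcription start to the start of exon 2."""
    return variant["exon1_bp"] + variant["intron1_bp"]


spans = {name: exon1_plus_intron1(v) for name, v in SPLICE_VARIANTS.items()}
exon1_difference = (
    SPLICE_VARIANTS["long 5'-UTR"]["exon1_bp"]
    - SPLICE_VARIANTS["short 5'-UTR"]["exon1_bp"]
)

print(spans, exon1_difference)
```

Both variants place the start of exon 2 at the same distance from the transcription start, and the first exons differ by the same 85 bp that separates the 171-bp and 86-bp 5′-untranslated sequences.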
In the sequences obtained from our cDNA clones and
genomic subclones, we encountered four discrepancies with
the previously published sequence. Three were silent and at
least two reflected polymorphisms, as we found different
codons in the different cDNA clones and sequences derived
from the gene (serine 27 and threonine 30). The one non-silent
discrepancy, found in the codon encoding amino acid
residue number 125 (threonine vs. alanine), was seen in all
the analyzed sequences. This residue occurs in the
collagen-like region and the substitution probably has no impact on
the structure of the collagen helix. An additional discrepancy
with the CL-43 GenBank sequence, accession number
X75912, was seen at asparagine 286, but this merely reflects
a mistaken report to GenBank, as our observation agrees
with the originally published sequence [20].
Fig. 4. Phylogenetic analysis and comparison of gene structure. (A) Structural comparison of collectin genes. *The exon/intron structures of porcine SP-D and bovine SP-D were predicted from their cDNA sequence and comparison with previously characterized collectin genes. The CL-L1 gene was drawn on the basis of its cDNA sequence and the human genomic sequence reported in GenBank accession numbers AC080033 and AC023487. Exons are designated U: 5′-untranslated sequence; SNC: signal peptide, N-terminal segment and collagen-like region; C: collagen-like region; α: alpha-helical neck region; and CRD/U: carbohydrate recognition domain and 3′-untranslated sequence. (B) Phylogenetic tree based on sequence alignment of carbohydrate recognition domains. Sequences were aligned using the Clustal V method, applying likelihood of branching orders from an ancestral sequence, and the tree was created by the neighbour-joining method. The estimated time scale refers to million years (myr).
The structure of the CL-43 gene resembles that of the
genes encoding conglutinin and human SP-D (Fig. 4A),
although CL-43 lacks an exon corresponding to the second
exon of the conglutinin gene, which encodes additional
5′-untranslated sequence. Two of the collagen-coding
exons of the CL-43 and conglutinin genes, as well as that of
the predicted bovine SP-D gene, appear to share a unique
size of 108 bp. This finding, together with the phylogenetic
comparison of the carbohydrate recognition domains, makes
it likely that the ancestral conglutinin/CL-43 molecule
evolved by duplication of the bovine SP-D gene after the
Bovidae separated from other mammals (Fig. 4B). A second
duplication of the gene for the ancestral conglutinin/CL-43
molecule then permitted conglutinin and CL-43 to diverge.
During or following this process, CL-43 lost the exons
encoding additional 5′-untranslated sequence and the initial
collagen-coding exon, while the size of the third collagen-coding
exon decreased from 108 to 72 bp. Aligning the
corresponding conglutinin exons with the CL-43 gene
reveals that the GT–AG splicing rule would be
violated if the missing 5′-untranslated exon were spliced in or
if the 72 bp collagen-coding exon were extended to 108 bp.
We could not align the missing collagen-coding exon, which
is completely deleted in the gene as shown by the fact that
there are no open reading frames encoding Gly-Xaa-Yaa
repeats in intron 2 (not shown).
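The two sequence checks described above can be illustrated with a minimal sketch. It is not the procedure used in this study; the function names and the threshold of four consecutive triplets are assumptions made for illustration. The first function tests whether a candidate intron begins with GT and ends with AG. The second searches both strands, in any reading frame, for runs of codon triplets of the form GGN-NNN-NNN, the pattern expected for Gly-Xaa-Yaa repeats, since all four glycine codons begin with GG. Stop codons in the Xaa and Yaa positions are not excluded, so a hit is only a candidate and would still need to be checked for an intact open reading frame.

```python
import re


def obeys_gt_ag(intron):
    """Return True if the intron sequence starts with GT and ends with AG."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_gly_xaa_yaa_runs(dna, min_repeats=4):
    """Find runs of at least min_repeats triplets of the form GGN-NNN-NNN.

    Returns (strand, start, end) tuples. Coordinates for the "-" strand
    refer to the reverse-complemented sequence. min_repeats is an
    illustrative threshold, not a value taken from the paper.
    """
    pattern = re.compile(r"(?:GG[ACGT][ACGT]{6}){%d,}" % min_repeats)
    hits = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for match in pattern.finditer(seq):
            hits.append((strand, match.start(), match.end()))
    return hits
```

An empty result for an intron sequence would be consistent with the complete deletion of a collagen-coding exon, whereas any hit would mark a region worth translating and inspecting by hand.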
Why cattle should have evolved additional SP-D-like
collectins after diverging from other mammals is an inter-
esting question. Because of the reported extrapulmonary
localization of SP-D to tissues of the digestive system, we
think it is likely that the function of CL-43 and conglutinin
may be associated with rumination [15]. Rumination relies
on symbiosis with microorganisms, and SP-D, conglutinin
and CL-43 may be crucial factors in the first line of defense
against invading microorganisms in the gut. If the anti-inflammatory
action that SP-D shows in the lung also
applies to the gut, these SP-D-like proteins could help
prevent tissue damage from inflammatory processes [7].
Screening the cow/hamster hybrid panel showed that the
CL-43 gene localizes to chromosome 28 in proximity to a
microsatellite at the distal position q1.8. The gene encoding
conglutinin localizes to the same position. Like the human
collectin gene cluster at 10q21.1 to 21.4, a bovine gene
cluster appears to exist on Bos taurus chromosome 28, a
phenomenon that is also supported by comparative mapping
(not shown).
Because of the strikingly high homology (99.7%) of the
CL-43 promoter described here with the initially published
conglutinin promoter, which differs in only 2 out of 742 bp,
we recharacterized the conglutinin gene and found that the
promoter region did not correspond to the published
sequence [22,23,41]. Thus, the previously characterized
conglutinin promoter represents the promoter region of the
CL-43 gene, and the confusion probably resulted from a
PCR-based cloning strategy and the high degree of homology
between the two genes [40]. The newly characterized
conglutinin promoter differs from the CL-43 promoter in 27
of 742 bp (96% identity).
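The identity values quoted above follow directly from the mismatch counts. The short sketch below reproduces the arithmetic; the function names are illustrative and not taken from any package used in the study.

```python
def count_mismatches(seq_a, seq_b):
    """Count differing positions between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))


def percent_identity(mismatches, length):
    """Percentage of identical positions, given a mismatch count and alignment length."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * (length - mismatches) / length


# Values taken from the text: 2 and 27 differences over 742 bp.
print(round(percent_identity(2, 742), 1))   # 99.7
print(round(percent_identity(27, 742)))     # 96
```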
Functional studies on the previously characterized conglutinin
promoter, which should now be regarded as applying
to the CL-43 promoter, showed that hepatic transcription
factors bound to the AP-1 site at −150 and elements at
positions −115 to −143 and −167 to −180 [23]. The
unidentified factor that bound at −167 to −180 was
termed K-factor and, combined with the AP-1 site
(−150), it had a positive synergistic effect on transcription. The
synergy of the two sites depended critically on the distance
that separated them. As the CL-43 promoter shows 100%
identity to the previously published conglutinin sequence in this
region (−180 to −150), a similar control of CL-43 transcription
may be expected. One of the two nucleotides that
differ in the 742 bp of the two promoters is located at
position −139 and may influence the binding of unidentified
factors in the region −115 to −143. However, the difference
does not change the potential rare motif for the transcription
factor E2F (−135), but may have a minor impact on the
expected affinity of the AP-1 site (−137). Using less-stringent
search criteria for potential responsive elements
in the K-element and its flanking sequences reveals that the
region may serve as a binding site for HNF3beta (hepato-
cyte nuclear factor 3 beta), also known as Foxa2, with a
core-factor identity of 86% and a total fit of 77% [42].
Foxa2 is highly expressed in hepatocytes and is a key factor
for the expression of a variety of liver-specific proteins. It
belongs to the family of transcription factors known as
forkhead proteins or winged helix proteins, and it was
recently shown that the human SP-D promoter included a
similar site that is important for transcription of the gene
[43]. Matrix analysis of the SP-D promoter shows that this
site has a lower score than that of the potential Foxa2 site in
the K-element of the CL-43 promoter (not shown).
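Site positions in this discussion are given relative to the transcription start. As a hedged illustration of that bookkeeping, the sketch below scans an upstream sequence for an IUPAC consensus and reports each match as a negative coordinate, with the last upstream base taken as −1. This is a plain consensus search, not the weight-matrix analysis referred to in the text, and it yields no core or matrix similarity scores. The function name is hypothetical, and the widely cited AP-1 consensus TGASTCA (S = C or G) is used only as an example; its reverse complement matches the same pattern, so a forward-strand scan is sufficient for this particular motif.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def scan_promoter(upstream, consensus):
    """Return start positions of forward-strand consensus matches.

    Positions are relative to the transcription start: the last base of
    `upstream` is -1, the one before it -2, and so on. A lookahead is used
    so that overlapping matches are also reported.
    """
    seq = upstream.upper()
    body = "".join(IUPAC[base] for base in consensus.upper())
    pattern = re.compile("(?=%s)" % body)
    return [match.start() - len(seq) for match in pattern.finditer(seq)]
```

Applied to a real promoter sequence, the returned coordinates could be compared directly with the positions quoted in the text, bearing in mind that conventions differ on whether a site is numbered by its first base or its centre.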
The roles of the potential cis-elements (−355 to −361)
for the factors E47, Lmo-2, MyoD and deltaEF1 are questionable,
as these factors are important regulators of differentiation
of B-lymphocytes and T-lymphocytes (E47, Lmo-2
and deltaEF1) or muscle-specific activation (MyoD)
[25,44,45]. It is nevertheless possible that these sites serve
as binding sites for transcription factors that repress transcription
of the gene in tissues other than the liver. Potential
binding sites for the factors Arnt and USF are found in the
second cluster of potential elements (−563 to −578). Since
Arnt and USF are found in hepatocytes and play key roles in
the adaptive metabolic responses to polycyclic aromatic
hydrocarbons and glucose/insulin, respectively, it is possible
that CL-43 expression might also be influenced by
energy metabolism. The two adjacent binding sites for C/EBP
(−297, −318), initially known as IL6-REs, may serve
as binding sites for homo/hetero dimers of members of the C/EBP
family of transcription factors expressed in the liver,
which indicates that the gene may be regulated by inflammatory
processes as part of an acute phase response [46]. It
should be emphasized that this assignment of regulatory
elements is theoretical and that the final assignment and
further comparison to the regulation of other collectin genes
require functional studies as well.
Acknowledgements
This work was supported by the Alfred Benzon
Foundation, Frode and Norma Jacobsen’s Foundation, and
the Novo Nordisk Foundation.
References
[1] S. Hansen, U. Holmskov, Immunobiology 199 (1998) 165–189.
[2] K. Ohtani, Y. Suzuki, S. Eda, T. Kawai, T. Kase, H. Yamazaki, T.
Shimada, H. Keshi, Y. Sakai, A. Fukuoh, T. Sakamoto, N. Waka-
miya, J. Biol. Chem. 274 (1999) 13681–13689.
[3] U. Holmskov, B. Teisner, A.C. Willis, K.B. Reid, J.C. Jensenius, J.
Biol. Chem. 268 (1993) 10120–10125.
[4] A.B. Rothmann, H.D. Mortensen, U. Holmskov, P. Hojrup, Eur. J.
Biochem. 243 (1997) 630–635.
[5] U. Holmskov, J. Leukoc. Biol. 66 (1999) 747–752.
[6] H. Sano, H. Chiba, D. Iwaki, H. Sohma, D.R. Voelker, Y. Kuroki, J.
Biol. Chem. 275 (2000) 22442–22451.
[7] A.M. LeVine, J.A. Whitsett, J.A. Gwozdz, T.R. Richardson, J.H.
Fisher, M.S. Burhans, T.R. Korfhagen, J. Immunol. 165 (2000)
3934–3940.
[8] P. Borron, J.C. McIntosh, T.R. Korfhagen, J.A. Whitsett, J. Taylor,
J.R. Wright, Am. J. Physiol., Lung. Cell. Mol. Physiol. 278 (2000)
L840–L847.
[9] A.M. LeVine, J.A. Whitsett, Microbes Infect. 3 (2001) 161–166.
[10] S. Thiel, T. Vorup Jensen, C.M. Stover, W. Schwaeble, S.B. Laursen,
K. Poulsen, A.C. Willis, P. Eggleton, S. Hansen, U. Holmskov, K.B.
Reid, J.C. Jensenius, Nature 386 (1997) 506–510.
[11] J.P. Bridges, H.W. Davis, M. Damodarasamy, Y. Kuroki, G. Howles,
D.Y. Hui, F. McCormack, J. Biol. Chem. 275 (2000) 38848–38855.
[12] Y. Kuroki, D.R. Voelker, J. Biol. Chem. 269 (1994) 25943–25946.
[13] C. Botas, F. Poulain, J. Akiyama, C. Brown, L. Allen, J. Goerke, J.
Clements, E. Carlson, A.M. Gillespie, C. Epstein, S. Hawgood, Proc.
Natl. Acad. Sci. U. S. A. 95 (1998) 11864–11869.
[14] T.R. Korfhagen, V. Sheftelyevich, M.S. Burhans, M.D. Bruno, G.F.
Ross, S.E. Wert, M.T. Stahlman, A. Jobe, M. Ikegami, J.A. Whitsett,
J.H. Fisher, J. Biol. Chem. 273 (1998) 28438–28443.
[15] J. Madsen, A. Kliem, I. Tornoe, K. Skjodt, C. Koch, U. Holmskov, J.
Immunol. 164 (2000) 5866–5870.
[16] P.C. Reading, U. Holmskov, E.M. Anders, J. Gen. Virol. 79 (Pt 9)
(1998) 2255–2263.
[17] S. Schelenz, R. Malhotra, R.B. Sim, U. Holmskov, G. Bancroft, In-
fect. Immun. 63 (1995) 3360–3366.
[18] R.W. Loveless, U. Holmskov, T. Feizi, Immunology 85 (1995)
651–659.
[19] U. Holmskov, P.B. Fischer, A. Rothmann, P. Hojrup, FEBS Lett. 393
(1996) 314–316.
[20] B.L. Lim, A.C. Willis, K.B. Reid, J. Lu, S.B. Laursen, J.C. Jensenius,
U. Holmskov, J. Biol. Chem. 269 (1994) 11820–11824.
[21] J.E. Womack, J.S. Johnson, E.K. Owens, C.E. Rexroad, J. Schlapfer,
Y.P. Yang, Mamm. Genome 8 (1997) 854–856.
[22] N. Kawasaki, T. Itoh, T. Kawasaki, Biochem. Biophys. Res. Com-
mun. 198 (1994) 597–604.
[23] N. Kawasaki, M. Satonaka, M. Imagawa, H. Naito, T. Kawasaki, J.
Biochem. (Tokyo) 124 (1998) 1188–1197.
[24] K. Quandt, K. Frech, H. Karas, E. Wingender, T. Werner, Nucleic
Acids Res. 23 (1995) 4878–4884.
[25] G. Bain, C. Murre, Semin. Immunol. 10 (1998) 143–153.
[26] O. Hobert, H. Westphal, Trends Genet. 16 (2000) 75–83.
[27] J.D. Molkentin, E.N. Olson, Proc. Natl. Acad. Sci. U. S. A. 93 (1996)
9366–9373.
[28] R. Sekido, K. Murai, J. Funahashi, Y. Kamachi, A. Fujisawa-Sehara,
Y. Nabeshima, H. Kondoh, Mol. Cell. Biol. 14 (1994) 5692–5700.
[29] K. Sogawa, Y. Fujii-Kuriyama, J. Biochem. (Tokyo) 122 (1997)
1075–1079.
[30] V.S. Vallet, M. Casado, A.A. Henrion, D. Bucchini, M. Raymondjean,
A. Kahn, S. Vaulont, J. Biol. Chem. 273 (1998) 20175–20179.
[31] B. Luscher, L.G. Larsson, Oncogene 18 (1999) 2955–2966.
[32] J. Lekstrom-Himes, K.G. Xanthopoulos, J. Biol. Chem. 273 (1998)
28545–28548.
[33] K. Georgopoulos, D.D. Moore, B. Derfler, Science 258 (1992)
808–812.
[34] R. Hromas, S.J. Collins, D. Hickstein, W. Raskind, L.L. Deaven, P.
O’Hara, F.S. Hagen, F.K. Kaushansky, J. Biol. Chem. 266 (1991)
14183–14187.
[35] J. Northrop, S.N. Ho, L. Chen, D.J. Thomas, L.A. Timmerman, G.P.
Nolan, A. Admon, G.R. Crabtree, Nature 369 (1994) 497–502.
[36] H. Muller, K. Helin, Biochim. Biophys. Acta 1470 (2000) M1–M12.
[37] S. Pierrou, M. Hellqvist, L. Samuelsson, S. Enerback, P. Carlsson,
EMBO J. 13 (1994) 5002–5012.
[38] W. Reith, E. Barras, S. Satola, M. Kobr, D. Reinhart, C.H. Sanchez,
B. Mach, Proc. Natl. Acad. Sci. U. S. A. 86 (1989) 4200–4204.
[39] H.M. Prior, M.A. Walter, Mol. Med. 2 (1996) 405–412.
[40] L. Luna, N. Skammelsrud, O. Johnsen, K.J. Abel, B.L. Weber, H.
Prydz, A.B. Kolsto, Genomics 27 (1995) 237–244.
[41] S. Hansen, V. Moeller, D. Holm, L. Vitved, C. Bendixen, K. Skjoedt,
U. Holmskov, Mol. Immunol. 39 (2002) 39–43.
[42] K.H. Kaestner, Trends Endocrinol. Metab. 11 (2000) 281–285.
[43] Y. He, E.C. Crouch, K. Rust, E. Spaite, S.L. Brody, J. Biol. Chem.
275 (2000) 31051–31060.
[44] B. Brand-Saberi, B. Christ, Cell Tissue Res. 296 (1999) 199–212.
[45] Y. Higashi, H. Moribe, T. Takagi, R. Sekido, K. Kawakami, H.
Kikutani, H. Kondoh, J. Exp. Med. 185 (1997) 1467–1479.
[46] V. Poli, J. Biol. Chem. 273 (1998) 29279–29282.