Genomic and molecular characterization of CL-43 and its proximal promoter☆

Soren Hansen(a,1), Dorte Holm(a), Vivi Moeller(a), Lars Vitved(a), Christian Bendixen(b), Karsten Skjoedt(a), Uffe Holmskov(a,*)

(a) Department of Immunology and Microbiology, University of Southern Denmark, Odense, Winsloewparken 21, DK-5000 Odense C, Denmark

(b) Department of Animal Breeding and Genetics, Research Centre Foulum, DK-8830 Tjele, Denmark

Received 16 May 2002; received in revised form 5 September 2002; accepted 23 September 2002

Abstract

Collectins are part of the innate immune system: they bind nonself glycoconjugates on the surface of microorganisms and inhibit infection by direct neutralization, agglutination or opsonization of the invaders. Conglutinin and CL-43 are serum proteins that have only been found and characterized in Bovidae. We have studied molecular and genomic characteristics of CL-43 to identify polymorphisms that might be associated with disease-susceptible phenotypes or other traits in cattle, and to elucidate how the Bovidae may benefit from possessing additional collectins. Screening a bovine cDNA library resulted in the isolation of two plasmid clones that encoded the entire translated sequence of CL-43. The 5′-untranslated end and the transcription start site were identified by 5′-RACE, which showed that the mRNA transcript comprises either 1326 or 1241 nucleotides because of alternative splicing. Both transcripts encode a protein of 321 amino acids, including a signal peptide of 20 residues. Characterization of two overlapping genomic lambda phage clones showed that the gene comprises seven exons spanning 8.5 kbp. The CL-43 gene, like the conglutinin gene, was mapped to Bos taurus chromosome 28 at q1.8. The CL-43 promoter has 96% identity with the conglutinin promoter recently described by us, and the assignment of potential cis-regulatory elements shows that several hepatic transcription factors may regulate transcription in the acute phase response and in response to metabolic changes.

© 2002 Elsevier Science B.V. All rights reserved.

Keywords: Collectin; CL-43; Innate immunity; Cattle; Comparative immunology/evolution

1. Introduction

CL-43 (43-kDa collectin) is a member of the collectin family, which comprises conglutinin, mannan-binding lectin (MBL), lung surfactant protein A (SP-A), lung surfactant protein D (SP-D) and CL-L1 (collectin liver 1). The collectins are made up of C-type carbohydrate recognition domains attached to collagen-like regions via an alpha-helical coiled-coil region [1,2]. In contrast with the other collectins, CL-43 does not form large oligomers but exists only as single subunits of three polypeptide chains joined by a fixed pattern of interchain disulfide bridges in the N-terminus [3,4]. Conglutinin and CL-43 are both humoral proteins synthesized in the liver, like MBL, but have only been found in Bovidae.

Collectins play a role in the nonadaptive immune defense. They bind via their carbohydrate recognition domains to nonself glycoconjugates located on the surface of microorganisms. This binding may lead directly to neutralization and aggregation of the microorganisms, or to opsonization and subsequent phagocytosis by macrophages expressing different collectin receptors [5,6]. In the lung, opsonization by SP-A or SP-D decreases the secretion of pro-inflammatory cytokines and modulates the T-cell response, protecting the lung from inflammation and immune-mediated damage [7,8]. Mice deficient in SP-A are susceptible to bacterial as well as viral pulmonary infections, while mice deficient in SP-D show increased susceptibility only to viral infections [9]. MBL is the only collectin that, on binding to microorganisms, activates complement through associated serine proteases that cleave C4 and C2. Complement activation may lead to direct killing or enhance opsonization by deposition of C3 [10].

Abbreviations: CL-43, 43-kDa collectin; CL-L1, collectin liver 1; MBL, mannan-binding lectin; SP-A, lung surfactant protein A; SP-D, lung surfactant protein D.

☆ The nucleotide sequences reported in this paper have been deposited with the GenBank/EMBL data libraries under accession numbers AY071821 and AY071822.

* Corresponding author. Tel.: +45-6550-3775; fax: +45-6591-5267. E-mail address: [email protected] (U. Holmskov).

1 Present address: Department of Cell Biology, Duke University Medical Center, Durham, NC 27710, USA.

PII: S0167-4781(02)00531-6

Biochimica et Biophysica Acta 1625 (2003) 1–10 (www.bba-direct.com)

Besides binding to carbohydrates, SP-A and SP-D bind certain phospholipids and protect these and alveolar macrophages from oxidative damage [11,12]. This may be important for lipid homeostasis, as mice lacking SP-D accumulate surfactant lipid and show morphologic changes in alveolar macrophages and type II cells [13,14]. Additional roles for SP-D outside the lungs are implied by its presence on the surface of mucosal epithelial cells lining the ducts of several glands and organs [15].

Despite many attempts to find a human homologue of conglutinin, no such protein has been identified, and conglutinin and CL-43 appear to be limited to the Bovidae or Ruminantia. Few data have been published on the biological role of CL-43. CL-43 binds to the rotavirus Nebraska calf diarrhoea virus, reducing its infectivity [16]. Like other collectins, CL-43 binds to the acapsular form of the opportunistic yeast pathogen Cryptococcus neoformans [17]. It binds to mannan (KD of 2.7 × 10⁻⁸ M) and to neoglycoproteins with terminal mannose or fucose residues [18,19].

We have previously partly characterized the cDNA encoding CL-43 and observed that the serum concentration of this protein varies among cattle ([20]; unpublished results). The aim of the present work was to characterize both CL-43 and conglutinin to obtain additional clues to their function and to possible reasons for their evolutionary preservation in Bovidae or Ruminantia, as well as information that might be generally applicable to closely related collectins such as SP-D. Here we describe the full-length cDNA of CL-43, the complete CL-43 gene sequence with its exon/intron structure and promoter region, and its chromosomal localization. Comparison with other collectin genes shows that CL-43 and conglutinin have diverged from an SP-D-like protein. The genomic characterization will allow analysis of polymorphisms that might lead to low serum concentrations of CL-43 and thus be important for the disease susceptibility of cattle.

2. Materials and methods

2.1. Full-length cDNA clones

A CL-43-derived probe spanning nucleotides 633–1026 of the CL-43 cDNA sequence (GenBank accession number X75912) was obtained by PCR amplification using the primers 5′-GGCCTCCCCACGCTCTTCA-3′ and 5′-CCTTCTGGCCTCATCCTGTGG-3′, with a previously isolated CL-43 cDNA clone as template [20]. The PCR consisted of initial denaturation at 94 °C for 4 min, followed by 30 cycles of 94 °C for 45 s, 56 °C for 30 s and 72 °C for 30 s, and a final extension step at 72 °C for 1 min. The product was purified by means of the Qiaex II gel extraction kit (Qiagen, Hilden, Germany) and labeled with [32P]-dCTP using an oligolabeling kit and random primers according to the procedure recommended by the manufacturer (Amersham Pharmacia Biotech, Piscataway, NJ). Approximately 2.8 × 10⁵ plaques of a bovine liver lambda cDNA library (Stratagene, La Jolla, CA) were screened with the 32P-labeled probe. The final high-stringency wash was carried out in 0.3× SSC at 55 °C for 20 min. Positive clones were replated and rescreened twice to uniformity. Inserts were excised from the lambda ZAP XR vector in vivo and transformed into Escherichia coli SOLR according to the manufacturer's instructions. Plasmids were purified by means of the Wizard Plus SV Miniprep purification system (Promega, Madison, WI) and sequenced.
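As a quick plausibility check on the probe primers listed above, the Wallace rule of thumb, Tm ≈ 2(A+T) + 4(G+C) °C, can be applied in a few lines of Python. This is an editorial sketch, not a calculation reported by the authors; the rule is an assumption that suits short oligonucleotides and tends to overestimate Tm for primers of 19–21 nt such as these. The names `wallace_tm`, `gc_fraction` and `PROBE_PRIMERS` are hypothetical.

```python
# Rough primer check with the Wallace rule: Tm ~ 2*(A+T) + 4*(G+C) deg C.
# Editorial approximation only; not the method used in the original work.

def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (deg C) of a short DNA primer."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer must contain only A, C, G and T")
    return 2 * at + 4 * gc


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Probe primers from section 2.1, written 5' -> 3'
PROBE_PRIMERS = {
    "primer_1": "GGCCTCCCCACGCTCTTCA",
    "primer_2": "CCTTCTGGCCTCATCCTGTGG",
}

if __name__ == "__main__":
    for name, seq in PROBE_PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
              f"Wallace Tm {wallace_tm(seq)} C")
```

For these two primers the sketch gives 64 °C and 68 °C, compared with the 56 °C annealing step used in the PCR.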

2.2. 5′-RACE

RNA was isolated from 250 mg of bovine liver by means of the TRI-reagent kit (Sigma, St. Louis, MO), following the manufacturer's instructions. cDNA was synthesized from 4 µg of total RNA by means of Superscript II H− reverse transcriptase (Invitrogen, Groningen, Netherlands) in the presence of the CL-43-specific primer 5′-GCTATCTGCTGGTGGAGC-3′. The cDNA was precipitated twice with ethanol and ammonium acetate, and polycytosine tails were attached to the cDNA transcripts by using 25 U of terminal transferase (Roche, Mannheim, Germany) in the presence of 2 mM dCTP. The tailing reaction was carried out for 30 min at 37 °C, and the terminal transferase was subsequently heat-inactivated. One-fifth of the tailed cDNA (2 µl) was used as template in a 90-µl PCR with the CL-43-specific primer 5′-GAGGGTTTTCTCCGAATAGACATC-3′ and a poly-G13 primer. The PCR included 2 min of initial denaturation at 94 °C, followed by the addition of Taq polymerase and 10 cycles of touch-down amplification starting at 63 °C and decreasing by 1 °C per cycle. The main amplification comprised 25 cycles of 94 °C for 45 s, 54 °C for 30 s and 72 °C for 30 s, followed by a final extension step of 3 min at 72 °C. Products were analyzed by agarose gel electrophoresis, and excised products were purified as above. The product (10 ng) was ligated into the pCRII vector and heat-shock transformed into INVαF′ E. coli using the original TA cloning kit (Invitrogen). Plasmids were purified and sequenced as above.

2.3. Genomic cloning

The 3′-end of the CL-43 gene was isolated from a bovine genomic EMBL3 lambda phage library (Clontech, Palo Alto, CA) using the same probe as used in the isolation of cDNA clones. Screening and washing conditions were as above. Phage DNA was isolated, digested with SacI and analyzed by agarose gel electrophoresis. Fragments not observed in a similar digestion of empty EMBL3 vector were purified by means of the Qiaex II gel extraction kit. The fragments (20–100 ng) were ligated into 25 ng of SacI-digested pBS(ks+) vector pretreated with calf intestinal phosphatase and were heat-shock transformed into chemically competent XL-10 E. coli. Plasmids were purified and sequenced as above.

The 5′-end of the gene was isolated from the same library under similar conditions with a 281-bp probe derived from the 5′-untranslated sequence and the sequence encoding the N-terminal segment. The probe was obtained by PCR amplification using the primers 5′-CTGAATGGGCGCTTTATCCT-3′ and 5′-ATGACCACGAAGGCTATCTGCT-3′, with the newly isolated full-length cDNA clone, CL-43 160, as template. The PCR consisted of 4 min of initial denaturation at 94 °C, followed by 30 cycles of 94 °C for 45 s, 54 °C for 30 s and 72 °C for 30 s, and a final extension step of 1 min at 72 °C. Subcloning into SacI-linearized plasmids and subsequent sequencing of the isolated phage clones were performed as above. To assemble the sequences obtained from the subclones, four PCRs (P1–P4) overlapping the junctions were performed using the primers 5′-CTGAATGGGCGCTTTATCCT-3′ and 5′-ATGACCACGAAGGCTATCTGCT-3′ (P1); 5′-CCAGCAGATAGCCTTCGTGGTCAT-3′ and 5′-CTGTGATGGGGGTGAGGAGTGAG-3′ (P2); 5′-GGGCCATCGGTCCACA-3′ and 5′-GGCGTTGAACTTCTCCCTCTAA-3′ (P3); and 5′-TAAGGCAGCGGATGAGAAAC-3′ and 5′-CTCTGGGCCTTCGTCTTTTG-3′ (P4). Purified phage DNA (50 ng) was used as template in 90-µl reactions using Pwo polymerase (Roche) under the conditions recommended by the manufacturer. After 2 min of initial denaturation, Pwo polymerase was added to the reactions, followed by 10 cycles of touch-down amplification starting 10 °C above the respective annealing temperatures of 54 °C (P1), 60 °C (P2), 53 °C (P3) and 52 °C (P4) and decreasing by 1 °C per cycle. The main amplification followed, comprising 20 cycles of 94 °C for 45 s, 30 s at the respective annealing temperature and 72 °C for 2 min (P1), 3 min (P2), 30 s (P3) or 4 min (P4), followed by a final extension step of 8 min at 72 °C. Products were analysed by agarose gel electrophoresis and purified from the excised gel plugs as above. Purified products were inserted into the pCR4Blunt-TOPO plasmid (Invitrogen) according to the manufacturer's instructions and heat-shock transformed into E. coli XL-10. Plasmids were prepared and sequenced as above.
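The touch-down programs above follow one pattern: ten cycles starting 10 °C above the product's annealing temperature and dropping 1 °C per cycle, then the main cycles at that annealing temperature. A minimal Python sketch of the resulting annealing schedule, assuming the main phase uses exactly the stated annealing temperature; `annealing_schedule` and `PRODUCTS` are hypothetical names, and denaturation and extension steps are not modelled.

```python
# Annealing temperature per cycle for the touch-down PCRs P1-P4 (section 2.3).
# Only the annealing step is modelled; the extension time is carried along for reference.

PRODUCTS = {            # product: (final annealing temperature in C, extension time in s)
    "P1": (54, 120),
    "P2": (60, 180),
    "P3": (53, 30),
    "P4": (52, 240),
}


def annealing_schedule(final_c: int, offset_c: int = 10,
                       touchdown_cycles: int = 10, main_cycles: int = 20) -> list[int]:
    """Touch-down phase starting offset_c above final_c and dropping 1 C per cycle,
    followed by main_cycles at final_c."""
    start = final_c + offset_c
    touchdown = [start - i for i in range(touchdown_cycles)]
    return touchdown + [final_c] * main_cycles


if __name__ == "__main__":
    for name, (final_c, extend_s) in PRODUCTS.items():
        temps = annealing_schedule(final_c)
        print(f"{name}: {temps[0]} -> {temps[9]} C over 10 cycles, then "
              f"{temps.count(final_c)} cycles at {final_c} C, extension {extend_s} s")
```

Under the same assumption, `annealing_schedule(51, main_cycles=25)` reproduces the 61–52 °C touch-down of the hybrid-panel PCR in section 2.4, and `annealing_schedule(54, offset_c=9, main_cycles=25)` the 63 °C start of the 5′-RACE program in section 2.2.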

2.4. Chromosomal localization

Genomic DNA (20 ng) from each of 90 hamster/cow somatic cell hybrids [21] was used as template in PCRs with the primers 5′-TGTGGGGCCAGGTATGC-3′ and 5′-TCAGTCCACACCATCTACATACAT-3′. Amplification comprised a 4 min initial denaturation; 10 cycles of 94 °C for 45 s, 61–52 °C for 30 s and 72 °C for 1 min, decreasing the annealing temperature by 1 °C per cycle; 25 cycles of 94 °C for 45 s, 51 °C for 30 s and 72 °C for 1 min; and a final extension step of 2 min at 72 °C. Products were analyzed by agarose gel electrophoresis.

2.5. DNA sequencing

PCR products or plasmid preparations were sequenced with the Prism Ready Reaction BigDyeDeoxy Terminator sequencing kit (PE Applied Biosystems, Alleroed, Denmark) using the recommended conditions. Samples were subjected to electrophoresis on an ABI Prism 310 Genetic Analyzer, and the data were analyzed with the ABI Prism Software Version 2.1.1 (PE Applied Biosystems).

3. Results

3.1. Characterization of full-length CL-43 transcript

By partial protein sequencing, isolation of a partial cDNA clone and PCR, we had previously obtained a CL-43 cDNA sequence lacking the 5′-untranslated sequence, the signal peptide and the initial N-terminal sequence [20]. To characterize the full-length transcript, we screened a liver cDNA library with a probe encoding the carbohydrate recognition domain of CL-43 and isolated two different CL-43 clones, clone 151 and clone 160 (Fig. 1A). The inserts of these clones encoded the whole translated sequence and part of the 5′-untranslated sequence. They differed from each other in that clone 151 lacked 85 bp of the 5′-untranslated sequence located between the start of transcription and the translated sequence. Forty base pairs of additional 5′-untranslated sequence, not found in the cDNA clones, were characterized by 5′-RACE. The total CL-43 transcript comprised 171 or 86 bp of 5′-untranslated sequence, 966 bp encoding the protein with intact start and stop codons, and 189 bp of 3′-untranslated sequence including a complete polyadenylation site and polyadenylation tail (Fig. 1B). The deduced amino acid sequence revealed a structure made up of a signal peptide of 20 amino acid residues, an N-terminal segment of 28 amino acid residues, 38 Gly-Xaa-Yaa repeats, a neck region of 31 amino acid residues and a carbohydrate recognition domain of 128 amino acid residues. In both cDNA clones, we found a threonine codon (ACC) at position 125 instead of the alanine codon (GCC) reported by Lim et al. [20]. Another discrepancy with the previously reported sequence was found at nucleotide position 390 of the cDNA, where we found an adenine nucleotide instead of the reported guanine nucleotide. This discrepancy is silent, as it occurs in the third position of the codon encoding glycine 130 (GGG vs. GGA). Clone 160 has an additional silent variation, differing from clone 151 and the published sequence, at the third position of the codon encoding serine 27 (TCT vs. TCG). The complete cDNA sequence and the deduced amino acid sequence of CL-43 have been submitted to GenBank with accession number AY071821.
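The segment lengths in this section can be cross-checked arithmetically: the two 5′-untranslated variants plus the 966-bp coding sequence and the 189-bp 3′-untranslated sequence should give the 1326- and 1241-nt transcripts quoted in the Abstract, and the domain sizes should add up to the 321 residues encoded by 966 bp (321 codons plus a stop codon). A minimal Python sketch using only numbers from the text; the variable names are illustrative.

```python
# Segment lengths reported in section 3.1
UTR5_LONG, UTR5_SHORT = 171, 86   # bp, the two splice variants
CODING = 966                      # bp, start codon through stop codon
UTR3 = 189                        # bp, including polyadenylation site and tail

DOMAINS = {                       # residues
    "signal peptide": 20,
    "N-terminal segment": 28,
    "collagen-like region": 38 * 3,   # 38 Gly-Xaa-Yaa triplets
    "neck region": 31,
    "carbohydrate recognition domain": 128,
}


def transcript_length(utr5: int) -> int:
    """Full transcript length for a given 5'-untranslated length."""
    return utr5 + CODING + UTR3


residues = sum(DOMAINS.values())
sense_codons = CODING // 3 - 1    # one codon of the 966 bp is the stop codon

if __name__ == "__main__":
    print(transcript_length(UTR5_LONG), transcript_length(UTR5_SHORT))  # 1326 1241
    print(residues, sense_codons)                                       # 321 321
```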

3.2. Genomic characterization

Two EMBL3 phage clones (E43-1 and E43-2) were isolated by screening a bovine genomic lambda library with probes corresponding to the 5′-end and the 3′-end of the CL-43 transcripts. Subclones S1 (2363 bp) and S2 (1849 bp) (Fig. 2) were obtained from SacI digestion of clone E43-1, which included the 5′-end of the gene. Subclones S3 (1923 bp) and S4 (1785 bp) were obtained from a similar digestion of E43-2, which included most of the 3′-end of the gene. PCRs were carried out to obtain overlapping sequences: the products P2 (2127 bp), P3 (350 bp) and P4 (2779 bp) were amplified from phage DNA derived from E43-2, and in parallel, the product P1 (1389 bp) was amplified from E43-1. The 1132-bp overlap of S2 and P2, derived from separate phage clones, showed 100% identity, indicating that E43-1 and E43-2 were fragments of the same gene.

We encountered a single discrepancy with the cDNA sequence at position 1434 of the gene, where a thymine nucleotide was found. This discrepancy is silent, as it occurs in the third position of the codon for threonine 30 (ACC vs. ACT). The inserts of E43-1 and E43-2 were not further characterized, as the subclones and amplified products covered the entire 8493-bp gene and 1605 bp of the proximal promoter. The first exon may vary in size (168/83 bp) because of the alternative splicing of the 5′-untranslated sequence observed when characterizing the cDNA (Fig. 2C). The second exon encodes the signal peptide, the N-terminal segment and the first five Gly-Xaa-Yaa repeats. The rest of the collagen-like region is encoded by exons 3–5. Exon 6 encodes the alpha-helical neck region, and exon 7 encodes the carbohydrate recognition domain plus 192 bp of 3′-untranslated sequence. The prediction that exon 7 contains the carbohydrate recognition domain together with the 3′-untranslated sequence is based on the phylogenetic conservation of the carbohydrate recognition domain and 3′-untranslated sequence as a single exon in all collectin genes characterized so far. All exon/intron boundaries obeyed the GT–AG rule defining donor and acceptor sites for splicing out introns (Fig. 2D). Sequences of the entire gene including introns have been submitted to GenBank with accession number AY071822.

Fig. 1. CL-43 cDNA and deduced amino acid sequence. (A) Strategy for cDNA characterization. The resulting transcript is shown above the 5′-RACE product and the two cDNA clones, clone 160 and clone 151. The alternatively spliced transcript is shown with a dashed angled line. The previously obtained cDNA clone is represented below. (B) cDNA and deduced amino acid sequence. Nucleotides and amino acid residues are numbered in the 5′ to 3′ and N-terminal to C-terminal directions, respectively. The start of translation defines the first amino acid residue (+1) and the numbering of nucleotides. The start codon (ATG) is highlighted and the polyadenylation site (AATAAA) is underlined. The dashed underlining indicates the alternatively spliced 5′-untranslated sequence.

Fig. 2. The CL-43 gene. (A) Chromosomal localization to B. taurus chromosome 28 at position q1.8. The proximate microsatellite is typed in bold. (B) Cloning strategy. Subcloned fragments (S1–S4) and amplified PCR products (P1–P4) are shown. Numbering is from the first transcribed nucleotide in exon 1. (C) Genomic organization. Exons are designated U: 5′-untranslated sequence (exon 1); SNC: signal peptide, N-terminal segment and collagen-like region (exon 2); C: collagen-like region (exons 3–5); α: alpha-helical neck region (exon 6); and CRD/U: carbohydrate recognition domain and 3′-untranslated sequence (exon 7). Numbers indicate the size of exons and introns in bp. (D) Exon and intron boundaries. Numbering is from the start of transcription, and capital letters show transcribed nucleotides. A potential TATAA box in exon 1 and the polyadenylation signal AATAAA in exon 7 are underlined with solid lines. The dashed underlining in exon 1 indicates the alternatively spliced 5′-untranslated sequence. Complete sequences of both exons and introns (not shown) have been submitted to GenBank with accession number AY071822.
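The GT–AG rule invoked above says that an intron begins with the dinucleotide GT (splice donor) and ends with AG (splice acceptor). A minimal Python sketch of that check, deriving introns from exon coordinates; the toy gene and its coordinates are hypothetical and are not CL-43 sequence (the real exon/intron sequence is in GenBank AY071822), and the function names are illustrative.

```python
def introns_from_exons(gene: str, exons: list[tuple[int, int]]) -> list[str]:
    """Introns between consecutive exons.

    `exons` holds 1-based, inclusive (start, end) coordinates in transcription order.
    """
    introns = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        # 1-based prev_end is the 0-based index of the first intron base;
        # next_start - 1 is the 0-based index of the next exon's first base.
        introns.append(gene[prev_end:next_start - 1])
    return introns


def obeys_gt_ag(intron: str) -> bool:
    """True if the intron begins with the GT donor and ends with the AG acceptor."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


# Hypothetical toy gene (not CL-43): exon 1 = 1-6, intron = 7-20, exon 2 = 21-26
TOY_GENE = "ATGCCA" + "GTAAGTCTTTTCAG" + "GGTTGA"
TOY_EXONS = [(1, 6), (21, 26)]

if __name__ == "__main__":
    for intron in introns_from_exons(TOY_GENE, TOY_EXONS):
        print(intron, obeys_gt_ag(intron))   # GTAAGTCTTTTCAG True
```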

3.3. Chromosomal localization of CL-43

DNA isolated from hamster/cow somatic cell hybrids was screened by PCR amplification of a region of intron 6. The specificity of the PCR was first tested using bovine genomic DNA as template under the conditions given in Materials and methods. A single product (1111 bp) was amplified from genomic DNA, and sequencing verified that it was part of the CL-43 gene (not shown). Screening of the hybrid panel showed that 21 of the 90 hybrids contained the CL-43 gene. No products were observed when using DNA isolated from CHO cells. On the basis of the previous characterizations of the panel, the CL-43 gene was linked with a LOD score of 7 to the microsatellite marker ILST099, located on chromosome 28 at the distal position q1.8 (Fig. 2A) [21].
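Hybrid-panel mapping compares, hybrid by hybrid, whether the gene and a mapped marker are present. A minimal Python sketch of retention and pairwise concordance; the 21-of-90 retention is from the text, while the five-hybrid typing vectors are hypothetical placeholders. The sketch does not reproduce the LOD score, which came from the earlier characterization of the panel [21].

```python
def retention(calls: list[bool]) -> float:
    """Fraction of hybrids in which a locus was detected."""
    return sum(calls) / len(calls)


def concordance(gene: list[bool], marker: list[bool]) -> float:
    """Fraction of hybrids in which gene and marker are both present or both absent."""
    if len(gene) != len(marker) or not gene:
        raise ValueError("need two non-empty typing vectors of equal length")
    return sum(g == m for g, m in zip(gene, marker)) / len(gene)


if __name__ == "__main__":
    # From the text: 21 of 90 hybrids were positive for CL-43
    cl43_calls = [True] * 21 + [False] * 69
    print(f"CL-43 retention: {retention(cl43_calls):.2f}")   # 0.23
    # Hypothetical typing of five hybrids, for illustration only
    gene = [True, False, False, True, False]
    marker = [True, False, False, True, True]
    print(concordance(gene, marker))                           # 0.8
```

A marker syntenic with the gene would be expected to show concordance close to 1 across the full panel, whereas unlinked loci show lower values.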

3.4. Assigning potential cis-regulatory elements to the proximal promoter

The proximal promoter region of CL-43 shows strikingly high homology (99.7%) with the initially published promoter sequence of conglutinin [22,23], differing in only two nucleotides over the 742 bp of known upstream sequence of the conglutinin gene (not shown; see Discussion). A TATAA box and a CAAT box were found at −28 and −78, respectively, close to the transcription start site, defined as position +1 (Fig. 3). The first 1000 bp of the upstream sequence were analyzed for cis-regulatory elements using the MatInspector program (Biological Databases, Braunschweig, Germany) and the Transfac 5.0 database [24]. Regulatory elements were selected with stringent search criteria requiring 100% core identity and a matrix identity of more than 90% (Table 1). Several AP-1 sites and NF-1 elements were found, with a distinct cluster of AP-1 sites located immediately upstream of the CAAT element (−78 to −160).

Fig. 3. Promoter region and potential regulatory elements of the CL-43 gene. Numbering is from the start of transcription, shown with an angled arrow. The proximate TATAA box and CAAT box are underlined, and putative binding sites for transcription factors are boxed according to their positions and the consensus sequences in Table 1. Arrows above the boxes indicate the orientation of each binding site.


Table 1

Potential cis-regulatory elements (based on matrix analysis)

Transcription factor  Matrix (#/1000 bp)  Consensus           CL-43 sequence      Relative fit  Position (strand)
AP-1                  AP1_Q2 (1.82)       RSTGACTNMNW         ATTGACAAATT         0.908         −78 (−)
                                                              GTTGACTCAGT         0.970         −96 (−)
                                                              ATTGACTGAGT         0.952         −100 (+)
                                                              CATGACTCAAA         0.930         −150 (+)
                                                              TCTGACTCTTT         0.901         −613 (−)
                      AP4_Q5 (2.48)       NNCAGCTGNN          ATCAGCGGCA          0.914         −137 (+)
                                                              CCCAGCTCTT          0.916         −419 (+)
                                                              GGCAGCTCCC          0.909         −426 (+)
                                                              CTCAGCAGCC          0.923         −470 (+)
                                                              GGCAGCTCCC          0.901         −508 (+)
                                                              ATCAGCAGCT          0.901         −991 (+)
                      AP4_Q6 (0.5)        CWCAGCTGGN          CACAGCAGGT          0.905         −361 (+)
                      AP1FJ_Q2 (2.5)      RSTGACTNMNW         AGTGACCCCCTG        0.930         −524 (+)
                                                              TGTGACCCCGT         0.939         −623 (−)
ARNT                  ARNT_01 (0.69)      NNNNNCACGTGNNNNN    AAAGGCACGTGGAGGG    0.957         −563 (+)
                                                              CCCTCCACGTGCCTTT    0.914         −563 (−)
C/EBP                 CEBP_01 (2.1)       RNRTKNNGMAAKNN      TGATTGTGCAACTT      0.948         −297 (+)
                                                              TGCTTTTGCAAGAA      0.931         −318 (+)
                                                              GACTTGAGCAACTG      0.928         −669 (+)
                                                              ACCYGATGCAAAAA      0.907         −869 (+)
c-Myb                 CMYB_01 (2.0)       NNNNNNGNCNGTTGNN    CCCACAGGCTGTTGCC    0.940         −477 (+)
                                                              AGTGACCCCTGTTGGG    0.915         −517 (+)
c-Myc                 MYCMAX_02 (1.7)     NANCACGTGNNW        CTCCACGTGCCT        0.923         −565 (+)
deltaEF1              DELTAEF1_01 (2.4)   NNNCACCTNAN         CCCCACCTGGA         0.969         −115 (−)
                                                              GAGCACCTTCA         0.945         −180 (−)
                                                              CAGCACCTGCT         0.943         −357 (−)
                                                              GGCCACCTGAT         0.961         −876 (+)
E2F                   E2F_02 (0.13)       TTTSGCGC            TTTGCCGC            0.928         −135 (−)
E47                   E47_01 (0.11)       NSNGCAGGTGKNCNN     ACAGCAGGTGCTGTT     0.921         −355 (+)
FREAC-7               FREAC7_01 (0.52)    WNNANATAAAYANNNN    CAGGTATAAATACTCA    0.910         −21 (+)
IK-2                  IK2_01 (4.0)        NNNYGGGAWNNN        AGCTGGGAGCTG        0.911         −422 (−)
                                                              CTGAGGGAAGCC        0.939         −650 (+)
                                                              CTCCGGGAATTG        0.935         −739 (+)
                                                              CTTGGGGATCTG        0.906         −981 (+)
Lmo-2                 LMO2COM_01 (1.1)    SNNCAGGTGNNN        ATCCAGGTGGGG        0.914         −137 (+)
                                                              CAGCAGGTGCTG        0.990         −357 (+)
                                                              CATCAGGTGGCC        0.958         −857 (−)
Max                   MAX_01 (0.11)       NNANCACGTGNTNN      AAGGCACGTGGAGG      0.925         −564 (+)
                      MYCMAX_02 (1.7)     NANCACGTGNNW        CTCCACGTGCCT        0.923         −565 (−)
MyoD                  MYOD_Q6 (1.0)       NNCANCTGNY          CCCACCTGGA          0.910         −116 (−)
                                                              AGCACCTGCT          0.963         −358 (−)
                                                              GCCACCTGAT          0.952         −876 (+)
                      –                   SRACAGGTGKYG        CAGCAGGTGCTG        0.963         −357 (+)
                                                              CATCAGGTGGCC        0.917         −875 (−)
MZF-1                 MZF1_01 (3.8)       NGNGGGGA            CTGGGGGA            0.956         −504 (−)
                                                              ATGGGGGA            0.966         −774 (−)
                                                              CTTGGGGA            0.965         −985 (+)
NF-1                  NF1_06 (4.1)        NNTTGGCNNNNNNCCNNN  TATTGGCTGAGGAACTTC  0.929         −2 (+)
                                                              TTTTGGCTTGTTGACTCA  0.942         −87 (−)
                                                              CTTTGGCCACCTGATGCA  0.910         −873 (+)
NFAT                  NFAT_06 (1.9)       NNNWGGAAAANN        CTCTGGAAACTG        0.908         −78 (−)
                                                              CGATGGAAATGA        0.939         −761 (+)
                                                              AGGAGGAAAATG        0.974         −812 (+)
                                                              TTCTGGAAAAGA        0.956         −832 (+)
NFY                   NFY_06 (0.70)       TRRCCAATSRN         CAGCCAATATT         0.912         −11 (−)
                                                              TCACCAATTCC         0.910         −735 (−)
N-Myc                 NMYC_01 (1.6)       NNNCACGTGNNN        AGGCACGTGGAG        0.959         −565 (+)
                                                              CTCCACGTGCCT        0.985         −565 (−)
RFX1                  RFX1_02 (0.95)      NNGTNRCNNNRGYAACNN  GGGCTGCTGAGGCAACAG  0.933         −469 (−)
Sox-5                 SOX5_01 (1.1)       NNAACAATNN          ACAACAATTC          0.982         −583 (+)
SRY                   SRY_02 (3.9)        NWWAACAAWANN        AACAACAATTCT        0.911         −582 (+)
TCF11                 TCF11_01 (4.6)      GTCATNNWNNNNN       GTCATGAACATCA       0.969         −143 (+)
USF                   USF_01 (1.2)        GYCACGTGNC          AAGGCACGTGAGGG      0.982         −564 (+)
                                                              CCTCCACGTGCCTT      0.982         −564 (−)

Sites were selected according to the principles of Quandt et al., with a core identity of 100% and a relative fit of more than 0.9. The matrices and consensus motifs are from the Transfac 5.0 database (Biological Databases), and the cores of the consensus matrices are underlined. The number in parentheses after each matrix is the number of times it is expected to be found in a random sequence of 1000 nucleotides and is thus inversely related to its selectivity.


A cluster of potential elements for the transcription factors AP-1, E47 (product of the E2A gene), Lmo-2 (LIM domain factor 2), MyoD (myogenic determination factor) and deltaEF1 (delta-crystallin enhancer binding factor 1) was identified at positions −355 to −361 [23–28]. The canonical E-box (CACGTG) and flanking sequence at −563 to −578 may serve as binding sites for the factors Arnt (aryl hydrocarbon receptor nuclear translocator), c-Myc (cellular Myc factor), USF (upstream stimulatory factor), N-Myc (activated Myc factor in neuroblastoma) and Max (binding partner of Myc family factors) [29–31]. A third, dense cluster of potential cis-elements for the factors NF-1, deltaEF1, MyoD, C/EBP (CCAAT enhancer binding protein), c-Myb (cellular Myb factor), Lmo-2, NFY (nuclear factor Y) and USF is located further upstream at positions −869 to −890 [32]. The analyzed sequence also contained multiple binding sites for C/EBP, deltaEF1, Ik-2 (Ikaros 2), MZF-1 (myeloid zinc finger 1) and NFAT, whereas single elements were observed for the factors E2F (E2 promoter binding factor), Freac-7 (forkhead-related activator 7), RFX1 (regulatory factor X1), Sox-5 (SRY box binding factor 5), SRY (SRY factor) and TCF11 (transcription factor 11) [33–40].
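The canonical E-box CACGTG is palindromic: its reverse complement is CACGTG again, which is why the same stretch appears in Table 1 on both strands (for example, ARNT at −563 (+) and −563 (−)). A minimal Python sketch of an exact-match motif scan on both strands, applied to the ARNT site sequence from Table 1; exact matching is a deliberate simplification of the matrix search actually used, and the function names are illustrative.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """0-based start positions where `motif` occurs on the given strand ('+')
    or on the opposite strand ('-'), both in coordinates of the given strand."""
    s, m = seq.upper(), motif.upper()
    rc = revcomp(m)
    hits = []
    for i in range(len(s) - len(m) + 1):
        window = s[i:i + len(m)]
        if window == m:
            hits.append((i, "+"))
        if window == rc:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    arnt_site = "AAAGGCACGTGGAGGG"          # ARNT site at -563 (+), Table 1
    print(find_motif(arnt_site, "CACGTG"))  # [(5, '+'), (5, '-')]
```

A palindromic motif is reported once per strand at the same position, whereas a non-palindromic core such as TGACTC is reported only on the strand where it actually occurs.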

4. Discussion

The previously published cDNA sequence of CL-43 provided no information on the 5′-untranslated sequence, the signal peptide or the sequence encoding the first eight amino acid residues of the N-terminal segment [20]. In the present work, we isolated two cDNA clones that both encoded the entire translated sequence together with part of the 5′-untranslated sequence. The two clones differed in the length of 5′-untranslated sequence, as alternative splicing of the mRNA resulted in a 5′-untranslated sequence of either 171 or 86 bp.

With respect to the exon/intron structure, the alternative splicing leads to a variable size of the first exon, which has either 168 or 83 bp, and of the first intron, which has either 1167 or 1252 bp. Both transcripts obey the GT–AG rule for splicing out the first intron. Transcription of conglutinin shows a similar splicing pattern, and the first exon of the conglutinin gene also shows a high degree of homology (98.2%) with that of the CL-43 gene [22,23].
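The exon and intron sizes above are internally consistent, which a short calculation makes explicit. In both variants, exon 1 plus intron 1 spans 1335 bp, so exon 2 starts at the same genomic position, and the 85-bp difference in exon 1 equals the difference between the two 5′-untranslated sequences. This pattern is consistent with two alternative splice donor sites in exon 1, an editorial inference rather than a claim made in the text. The same numbers imply that the last 3 bp of 5′-untranslated sequence lie in exon 2 in both variants. A minimal Python sketch using only sizes from the text:

```python
# Exon 1 / intron 1 sizes (bp) for the two splice variants,
# keyed by the resulting 5'-untranslated length
VARIANTS = {
    171: {"exon1": 168, "intron1": 1167},
    86: {"exon1": 83, "intron1": 1252},
}

# Distance from the transcription start to the first base of exon 2
spans = {utr: v["exon1"] + v["intron1"] for utr, v in VARIANTS.items()}
# 5'-untranslated bases that must lie in exon 2, upstream of the ATG
utr_in_exon2 = {utr: utr - v["exon1"] for utr, v in VARIANTS.items()}

if __name__ == "__main__":
    print(spans)                                                   # {171: 1335, 86: 1335}
    print(VARIANTS[171]["exon1"] - VARIANTS[86]["exon1"], 171 - 86)  # 85 85
    print(utr_in_exon2)                                            # {171: 3, 86: 3}
```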

In the sequences obtained from our cDNA clones and genomic subclones, we encountered four discrepancies with the previously published sequence. Three were silent, and at least two reflected polymorphisms, as we found different codons in the different cDNA clones and in sequences derived from the gene (serine 27 and threonine 30). The nonsynonymous discrepancy in the codon for amino acid residue 125 (threonine vs. alanine) was seen in all the analyzed sequences. This residue lies in the collagen-like region, and the substitution probably has no impact on the structure of the collagen helix. An additional discrepancy with the CL-43 GenBank sequence (accession number X75912) was seen at asparagine 286, but this merely reflects an error in the GenBank submission, as our observation agrees with the originally published sequence [20].
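Whether a nucleotide discrepancy is silent can be read directly from the genetic code. The Python sketch below classifies the four codon pairs discussed here using a deliberately minimal codon table (only these seven codons, standard genetic code), and converts a residue and codon position into a cDNA nucleotide number with the A of ATG as +1, as in Fig. 1B; this reproduces nucleotide 390 for the third position of glycine 130. The names are illustrative.

```python
# Standard genetic code, restricted to the codons discussed in the text
CODON_TO_AA = {
    "GCC": "Ala", "ACC": "Thr", "ACT": "Thr",
    "GGG": "Gly", "GGA": "Gly",
    "TCT": "Ser", "TCG": "Ser",
}

# (position, codon in one sequence, codon in the other)
DISCREPANCIES = [
    ("residue 125", "GCC", "ACC"),
    ("glycine 130", "GGG", "GGA"),
    ("serine 27", "TCG", "TCT"),
    ("threonine 30", "ACC", "ACT"),
]


def classify(codon_a: str, codon_b: str) -> str:
    """'silent' if both codons encode the same amino acid, else 'amino acid change'."""
    return "silent" if CODON_TO_AA[codon_a] == CODON_TO_AA[codon_b] else "amino acid change"


def cdna_position(residue: int, codon_position: int) -> int:
    """cDNA nucleotide number with the A of the start codon as +1 and Met as residue 1."""
    return 3 * (residue - 1) + codon_position


if __name__ == "__main__":
    for label, a, b in DISCREPANCIES:
        print(f"{label}: {a} vs. {b} -> {classify(a, b)}")
    print(cdna_position(130, 3))   # 390
```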

The structure of the CL-43 gene resembles that of the genes encoding conglutinin and human SP-D (Fig. 4A), although CL-43 lacks an exon corresponding to the second exon of the conglutinin gene, which encodes additional 5′-untranslated sequence. Two of the collagen-coding exons of the CL-43 and conglutinin genes, as well as that of the predicted bovine SP-D gene, appear to share a unique size of 108 bp. This finding, together with the phylogenetic comparison of the carbohydrate recognition domains, makes it likely that the ancestral conglutinin/CL-43 molecule evolved by duplication of the bovine SP-D gene after the Bovidae separated from other mammals (Fig. 4B). A second duplication of the gene for the ancestral conglutinin/CL-43 molecule then permitted conglutinin and CL-43 to diverge. During or following this process, CL-43 lost the exon encoding additional 5′-untranslated sequence and the initial collagen-coding exon, while the size of the third collagen-coding exon decreased from 108 to 72 bp. Aligning the corresponding conglutinin exons with the CL-43 gene reveals that the GT–AG splicing rule would be violated both by splicing in the missing 5′-untranslated exon and by extending the 72-bp collagen-coding exon to 108 bp. We could not align the missing collagen-coding exon, which is completely deleted from the gene, as shown by the absence of open reading frames encoding Gly-Xaa-Yaa repeats in intron 2 (not shown).

Fig. 4. Phylogenetic analysis and comparison of gene structure. (A) Structural comparison of collectin genes. *The exon/intron structures of porcine SP-D and bovine SP-D were predicted from their cDNA sequences and comparison with previously characterized collectin genes. The CL-L1 gene was drawn on the basis of its cDNA sequence and the human genomic sequence reported in GenBank accession numbers AC080033 and AC023487. Exons are designated U: 5′-untranslated sequence; SNC: signal peptide, N-terminal segment and collagen-like region; C: collagen-like region; α: alpha-helical neck region; and CRD/U: carbohydrate recognition domain and 3′-untranslated sequence. (B) Phylogenetic tree based on sequence alignment of carbohydrate recognition domains. Sequences were aligned using the Clustal V method, applying likelihood of branching orders from an ancestral sequence, and the tree was created by the neighbour-joining method. The estimated time scale refers to million years (myr).
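Because each Gly-Xaa-Yaa triplet is encoded by three codons, a purely collagen-coding exon should be a multiple of 9 bp. A short Python sketch, assuming (as an editorial simplification) that the 108- and 72-bp exons encode nothing but Gly-Xaa-Yaa repeats: the shortening from 108 to 72 bp then corresponds to the loss of four triplets. `triplets_encoded` is a hypothetical helper name.

```python
# One Gly-Xaa-Yaa triplet is three codons, i.e. 9 bp of coding sequence.
BP_PER_TRIPLET = 9


def triplets_encoded(exon_bp: int) -> int:
    """Number of whole Gly-Xaa-Yaa triplets in a purely collagen-coding exon."""
    if exon_bp % BP_PER_TRIPLET:
        raise ValueError(f"{exon_bp} bp is not a whole number of triplets")
    return exon_bp // BP_PER_TRIPLET


if __name__ == "__main__":
    ancestral, cl43 = 108, 72      # exon sizes (bp) discussed above
    print(triplets_encoded(ancestral), triplets_encoded(cl43))    # 12 8
    print(triplets_encoded(ancestral) - triplets_encoded(cl43))   # 4
```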

Why cattle should have evolved additional SP-D-like collectins after diverging from other mammals is an interesting question. Because of the reported extrapulmonary localization of SP-D to tissues of the digestive system, we think it likely that the function of CL-43 and conglutinin is associated with rumination [15]. Rumination relies on symbiosis with microorganisms, and SP-D, conglutinin and CL-43 may be crucial factors in the first line of defense against invading microorganisms in the gut. If the anti-inflammatory action that SP-D shows in the lung also applies to the gut, these SP-D-like proteins could help prevent tissue damage from inflammatory processes [7].

Screening the cow/hamster hybrid panel showed that the CL-43 gene localizes to chromosome 28, close to a microsatellite at the distal position q1.8. The gene encoding conglutinin localizes to the same position. Thus, analogous to the human collectin gene cluster at 10q21.1 to 21.4, a bovine collectin gene cluster appears to exist on Bos taurus chromosome 28, an arrangement that is also supported by comparative mapping (not shown).
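The principle by which a hybrid panel places a gene can be illustrated with a deliberately simplified concordance calculation: the gene-specific assay and each mapped marker are scored as present or absent in every cow/hamster hybrid clone, and the marker whose pattern agrees most often is the best positional anchor. This sketch rests on assumptions: the data and marker names (`MS_A`, `MS_B`) are invented, and actual panel analyses compute statistical linkage scores (for example two-point LOD scores) with dedicated software rather than raw concordance.

```python
# Simplified sketch of hybrid-panel scoring (toy data, hypothetical marker names).
# Each list holds one score per hybrid clone: 1 = assay positive,
# 0 = negative, None = ambiguous or not typed.
from typing import Dict, List, Optional

Scores = List[Optional[int]]


def concordance(a: Scores, b: Scores) -> float:
    """Fraction of clones typed for both assays in which the scores agree."""
    if len(a) != len(b):
        raise ValueError("score vectors must cover the same clones")
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no clone is typed for both assays")
    return sum(x == y for x, y in pairs) / len(pairs)


def closest_marker(gene: Scores, markers: Dict[str, Scores]) -> str:
    """Name of the marker whose retention pattern best matches the gene's."""
    return max(markers, key=lambda name: concordance(gene, markers[name]))


cl43 = [1, 0, 1, 1, 0, None, 1, 0, 0, 1]
markers = {
    "MS_A": [1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
    "MS_B": [0, 1, 1, 0, 0, 1, 0, 1, 1, 0],
}
for name, vec in markers.items():
    print(name, round(concordance(cl43, vec), 3))
print(closest_marker(cl43, markers))  # MS_A
```

Clones with an ambiguous score are simply skipped here; a proper analysis would model them explicitly.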

Because the CL-43 promoter described here is strikingly similar (99.7% identity) to the originally published conglutinin promoter, differing at only 2 of 742 bp, we recharacterized the conglutinin gene and found that its promoter region did not correspond to the published sequence [22,23,41]. Thus, the previously characterized conglutinin promoter represents the promoter region of the CL-43 gene; the confusion probably arose from a PCR-based cloning strategy combined with the high degree of sequence similarity between the two genes [40]. The newly characterized conglutinin promoter differs from the CL-43 promoter at 27 of 742 bp (96% identity).
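The identity values in the preceding paragraph follow directly from the mismatch counts (2 and 27 differences in 742 aligned positions). The sketch below reproduces that arithmetic and shows one way to list differing positions in promoter coordinates. It assumes gap-free alignments of equal length and the common convention that numbering runs ..., −2, −1, +1, +2, ... without a position 0; the function names and toy sequences are illustrative only.

```python
# Minimal sketch: percent identity of two gap-free, equal-length aligned
# promoter sequences, and the positions where they differ, reported in
# promoter coordinates (..., -2, -1, +1, +2, ...; there is no position 0).
from typing import List, Tuple


def to_promoter_coord(index: int, first_coord: int) -> int:
    """Convert a 0-based string index to promoter numbering."""
    coord = first_coord + index
    if first_coord < 0 and coord >= 0:
        coord += 1  # skip the non-existent position 0
    return coord


def percent_identity(a: str, b: str) -> float:
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def differences(a: str, b: str, first_coord: int) -> List[Tuple[int, str, str]]:
    """(coordinate, base in a, base in b) for every mismatched position."""
    return [
        (to_promoter_coord(i, first_coord), x, y)
        for i, (x, y) in enumerate(zip(a.upper(), b.upper()))
        if x != y
    ]


# The identity figures quoted in the text follow from the mismatch counts:
print(round(100 * (742 - 2) / 742, 1))   # 99.7
print(round(100 * (742 - 27) / 742, 1))  # 96.4, reported as 96%

# Toy 10 bp example, assumed to span -10 to -1:
a, b = "ACGTACGTAC", "ACGTTCGTAA"
print(percent_identity(a, b))  # 80.0
print(differences(a, b, first_coord=-10))  # [(-6, 'A', 'T'), (-1, 'C', 'A')]
```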

Functional studies on the previously characterized conglutinin promoter, which should now be regarded as applying to the CL-43 promoter, showed that hepatic transcription factors bound to the AP-1 site at −150 and to elements at positions −115 to −143 and −167 to −180 [23]. The unidentified factor that bound at −167 to −180 was termed K-factor; together with the AP-1 site (−150), it had a synergistic positive effect on transcription. This synergy depended critically on the distance separating the two sites. As the CL-43 promoter is 100% identical to the previously published conglutinin sequence in this region (−180 to −150), CL-43 transcription may be expected to be controlled in a similar way. One of the two nucleotides that differ between the CL-43 promoter and the previously published sequence over the 742 bp is located at position −139 and may influence the binding of unidentified factors in the region −115 to −143. This difference does not alter the potential rare motif for the transcription factor E2F (−135), but it may have a minor effect on the predicted affinity of the AP-1 site (−137). A search of the K-element and its flanking sequences with less stringent criteria suggests that the region may serve as a binding site for HNF3beta (hepatocyte nuclear factor 3 beta), also known as Foxa2, with a core identity of 86% and an overall fit of 77% [42]. Foxa2 is highly expressed in hepatocytes and is a key factor for the expression of a variety of liver-specific proteins. It belongs to the family of transcription factors known as forkhead or winged helix proteins, and it was recently shown that the human SP-D promoter includes a similar site that is important for transcription of the gene [43]. Matrix analysis of the SP-D promoter shows that this site scores lower than the potential Foxa2 site in the K-element of the CL-43 promoter (not shown).
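The "core" and "overall" scores quoted above come from a weight-matrix search. As a hedged illustration of how such numbers can arise (in the spirit of the MatInspector approach listed as reference [24]), the sketch below weights each matrix position by how conserved it is and rescales the result so that the best possible site scores 1 and the worst scores 0; the core score repeats the calculation over the four most conserved consecutive positions. The conservation weighting is a simplified assumption and may differ from the published tool, and `TOY_MATRIX` is invented: it is not the Foxa2 matrix, so its outputs say nothing about the CL-43 or SP-D sites.

```python
# Simplified sketch of an information-weighted matrix score, in the style of
# "core" and "matrix" similarity values from matrix-search tools.
# The weighting is a simplification and TOY_MATRIX is hypothetical.
import math
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"
Column = Dict[str, float]


def column_probs(counts: List[Dict[str, int]]) -> List[Column]:
    probs = []
    for col in counts:
        total = sum(col[b] for b in BASES)
        probs.append({b: col[b] / total for b in BASES})
    return probs


def conservation_index(col: Column) -> float:
    """1.0 for a fully conserved column, 0.0 for a uniform one."""
    entropy_term = sum(p * math.log(p) for p in col.values() if p > 0)
    return 1.0 + entropy_term / math.log(4)


def similarity(probs: List[Column], seq: str, positions: Sequence[int]) -> float:
    """(score - min) / (max - min), each position weighted by its conservation."""
    weights = [conservation_index(probs[i]) for i in positions]
    score = sum(w * probs[i][seq[i]] for w, i in zip(weights, positions))
    best = sum(w * max(probs[i].values()) for w, i in zip(weights, positions))
    worst = sum(w * min(probs[i].values()) for w, i in zip(weights, positions))
    if best == worst:
        raise ValueError("positions carry no information")
    return (score - worst) / (best - worst)


def core_positions(probs: List[Column], core_len: int = 4) -> List[int]:
    """The core_len consecutive positions with the highest total conservation."""
    ci = [conservation_index(c) for c in probs]
    start = max(range(len(probs) - core_len + 1),
                key=lambda s: sum(ci[s:s + core_len]))
    return list(range(start, start + core_len))


def score_site(counts: List[Dict[str, int]], seq: str) -> Tuple[float, float]:
    """Return (core similarity, matrix similarity) for a site of matrix length."""
    seq = seq.upper()
    if len(seq) != len(counts):
        raise ValueError("site length must equal matrix length")
    probs = column_probs(counts)
    core = similarity(probs, seq, core_positions(probs))
    full = similarity(probs, seq, range(len(probs)))
    return core, full


TOY_MATRIX = [
    {"A": 8, "C": 0, "G": 0, "T": 2},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 1, "C": 1, "G": 1, "T": 7},
    {"A": 3, "C": 2, "G": 3, "T": 2},
]

print(score_site(TOY_MATRIX, "ATGTTA"))  # consensus site: (1.0, 1.0)
print(score_site(TOY_MATRIX, "ATGTAA"))  # intact core, weaker flank
```

Lowering the threshold applied to such scores is what a "less stringent" search amounts to: sites with an intact core but imperfect flanks, like the second example, start to be reported.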

The roles of the potential cis-elements (−355 to −361) for the factors E47, Lmo-2, MyoD and deltaEF1 are questionable, as these factors are important regulators of B- and T-lymphocyte differentiation (E47, Lmo-2 and deltaEF1) or of muscle-specific activation (MyoD) [25,44,45]. It is nevertheless possible that these sites bind transcription factors that repress transcription of the gene in tissues other than the liver. Potential binding sites for the factors Arnt and USF are found in the second cluster of potential elements (−563 to −578). Since Arnt and USF are present in hepatocytes and play key roles in the adaptive metabolic responses to polycyclic aromatic hydrocarbons and to glucose/insulin, respectively, CL-43 expression might also be influenced by energy metabolism. The two adjacent binding sites for C/EBP (−297, −318), initially known as IL6-REs, may bind homo- or heterodimers of C/EBP family members expressed in the liver, which suggests that the gene may be regulated by inflammatory processes as part of an acute phase response [46]. It should be emphasized that this assignment of regulatory elements is theoretical, and that a definitive assignment, as well as further comparison with the regulation of other collectin genes, will require functional studies.
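Theoretical assignments of this kind start from a search of the promoter for sequences resembling known binding motifs. The sketch below shows the simplest form of such a search, an exact match to IUPAC consensus strings on both strands, with hits reported in promoter coordinates. It is an illustration under stated assumptions only: exact-consensus matching is far cruder than the weight-matrix searches behind the assignments discussed here, the two consensus strings (AP-1 TGASTCA and the E-box CACGTG, a canonical USF site) are textbook forms chosen for illustration, and the promoter sequence is invented.

```python
# Minimal sketch: exact IUPAC-consensus scan of both strands of a promoter,
# reporting hits in promoter coordinates (no position 0). The consensus
# strings and the sequence are illustrative assumptions only.
import re
from typing import Dict, List, Tuple

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")


def to_promoter_coord(index: int, first_coord: int) -> int:
    coord = first_coord + index
    if first_coord < 0 and coord >= 0:
        coord += 1  # promoter numbering skips 0
    return coord


def revcomp_iupac(motif: str) -> str:
    return motif.upper().translate(COMPLEMENT)[::-1]


def scan(promoter: str, motifs: Dict[str, str],
         first_coord: int) -> List[Tuple[int, str, str]]:
    """(coordinate of 5'-most top-strand base, motif name, strand) per hit."""
    seq = promoter.upper()
    hits = []
    for name, motif in motifs.items():
        variants = [("+", motif.upper())]
        rc = revcomp_iupac(motif)
        if rc != motif.upper():  # palindromes would otherwise be reported twice
            variants.append(("-", rc))
        for strand, m in variants:
            body = "".join(IUPAC[c] for c in m)
            # Lookahead so that overlapping occurrences are all found.
            for hit in re.finditer(f"(?=({body}))", seq):
                hits.append((to_promoter_coord(hit.start(), first_coord), name, strand))
    return sorted(hits)


motifs = {"AP-1": "TGASTCA", "E-box": "CACGTG"}  # textbook consensus forms
toy_promoter = "AAATGACTCAGGGCACGTGTTT"          # hypothetical 22 bp, -22 to -1
print(scan(toy_promoter, motifs, first_coord=-22))
# [(-19, 'AP-1', '+'), (-9, 'E-box', '+')]
```

Both example motifs are palindromic, so each hit is reported once; for a non-palindromic consensus the reverse-complement pattern is searched as well and its hits are tagged with the minus strand.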

Acknowledgements

This work was supported by the Alfred Benzon Foundation, Frode and Norma Jacobsen’s Foundation, and the Novo Nordisk Foundation.


References

[1] S. Hansen, U. Holmskov, Immunobiology 199 (1998) 165–189.

[2] K. Ohtani, Y. Suzuki, S. Eda, T. Kawai, T. Kase, H. Yamazaki, T. Shimada, H. Keshi, Y. Sakai, A. Fukuoh, T. Sakamoto, N. Wakamiya, J. Biol. Chem. 274 (1999) 13681–13689.

[3] U. Holmskov, B. Teisner, A.C. Willis, K.B. Reid, J.C. Jensenius, J. Biol. Chem. 268 (1993) 10120–10125.

[4] A.B. Rothmann, H.D. Mortensen, U. Holmskov, P. Hojrup, Eur. J. Biochem. 243 (1997) 630–635.

[5] U. Holmskov, J. Leukoc. Biol. 66 (1999) 747–752.

[6] H. Sano, H. Chiba, D. Iwaki, H. Sohma, D.R. Voelker, Y. Kuroki, J. Biol. Chem. 275 (2000) 22442–22451.

[7] A.M. LeVine, J.A. Whitsett, J.A. Gwozdz, T.R. Richardson, J.H. Fisher, M.S. Burhans, T.R. Korfhagen, J. Immunol. 165 (2000) 3934–3940.

[8] P. Borron, J.C. McIntosh, T.R. Korfhagen, J.A. Whitsett, J. Taylor, J.R. Wright, Am. J. Physiol., Lung. Cell. Mol. Physiol. 278 (2000) L840–L847.

[9] A.M. LeVine, J.A. Whitsett, Microbes Infect. 3 (2001) 161–166.

[10] S. Thiel, T. Vorup Jensen, C.M. Stover, W. Schwaeble, S.B. Laursen, K. Poulsen, A.C. Willis, P. Eggleton, S. Hansen, U. Holmskov, K.B. Reid, J.C. Jensenius, Nature 386 (1997) 506–510.

[11] J.P. Bridges, H.W. Davis, M. Damodarasamy, Y. Kuroki, G. Howles, D.Y. Hui, F. McCormack, J. Biol. Chem. 275 (2000) 38848–38855.

[12] Y. Kuroki, D.R. Voelker, J. Biol. Chem. 269 (1994) 25943–25946.

[13] C. Botas, F. Poulain, J. Akiyama, C. Brown, L. Allen, J. Goerke, J. Clements, E. Carlson, A.M. Gillespie, C. Epstein, S. Hawgood, Proc. Natl. Acad. Sci. U. S. A. 95 (1998) 11864–11869.

[14] T.R. Korfhagen, V. Sheftelyevich, M.S. Burhans, M.D. Bruno, G.F. Ross, S.E. Wert, M.T. Stahlman, A. Jobe, M. Ikegami, J.A. Whitsett, J.H. Fisher, J. Biol. Chem. 273 (1998) 28438–28443.

[15] J. Madsen, A. Kliem, I. Tornoe, K. Skjodt, C. Koch, U. Holmskov, J. Immunol. 164 (2000) 5866–5870.

[16] P.C. Reading, U. Holmskov, E.M. Anders, J. Gen. Virol. 79 (Pt 9) (1998) 2255–2263.

[17] S. Schelenz, R. Malhotra, R.B. Sim, U. Holmskov, G. Bancroft, Infect. Immun. 63 (1995) 3360–3366.

[18] R.W. Loveless, U. Holmskov, T. Feizi, Immunology 85 (1995) 651–659.

[19] U. Holmskov, P.B. Fischer, A. Rothmann, P. Hojrup, FEBS Lett. 393 (1996) 314–316.

[20] B.L. Lim, A.C. Willis, K.B. Reid, J. Lu, S.B. Laursen, J.C. Jensenius, U. Holmskov, J. Biol. Chem. 269 (1994) 11820–11824.

[21] J.E. Womack, J.S. Johnson, E.K. Owens, C.E. Rexroad, J. Schlapfer, Y.P. Yang, Mamm. Genome 8 (1997) 854–856.

[22] N. Kawasaki, T. Itoh, T. Kawasaki, Biochem. Biophys. Res. Commun. 198 (1994) 597–604.

[23] N. Kawasaki, M. Satonaka, M. Imagawa, H. Naito, T. Kawasaki, J. Biochem. (Tokyo) 124 (1998) 1188–1197.

[24] K. Quandt, K. Frech, H. Karas, E. Wingender, T. Werner, Nucleic Acids Res. 23 (1995) 4878–4884.

[25] G. Bain, C. Murre, Semin. Immunol. 10 (1998) 143–153.

[26] O. Hobert, H. Westphal, Trends Genet. 16 (2000) 75–83.

[27] J.D. Molkentin, E.N. Olson, Proc. Natl. Acad. Sci. U. S. A. 93 (1996) 9366–9373.

[28] R. Sekido, K. Murai, J. Funahashi, Y. Kamachi, A. Fujisawa-Sehara, Y. Nabeshima, H. Kondoh, Mol. Cell. Biol. 14 (1994) 5692–5700.

[29] K. Sogawa, Y. Fujii-Kuriyama, J. Biochem. (Tokyo) 122 (1997) 1075–1079.

[30] V.S. Vallet, M. Casado, A.A. Henrion, D. Bucchini, M. Raymondjean, A. Kahn, S. Vaulont, J. Biol. Chem. 273 (1998) 20175–20179.

[31] B. Luscher, L.G. Larsson, Oncogene 18 (1999) 2955–2966.

[32] J. Lekstrom-Himes, K.G. Xanthopoulos, J. Biol. Chem. 273 (1998) 28545–28548.

[33] K. Georgopoulos, D.D. Moore, B. Derfler, Science 258 (1992) 808–812.

[34] R. Hromas, S.J. Collins, D. Hickstein, W. Raskind, L.L. Deaven, P. O’Hara, F.S. Hagen, F.K. Kaushansky, J. Biol. Chem. 266 (1991) 14183–14187.

[35] J. Northrop, S.N. Ho, L. Chen, D.J. Thomas, L.A. Timmerman, G.P. Nolan, A. Admon, G.R. Crabtree, Nature 369 (1994) 497–502.

[36] H. Muller, K. Helin, Biochim. Biophys. Acta 1470 (2000) M1–M12.

[37] S. Pierrou, M. Hellqvist, L. Samuelsson, S. Enerback, P. Carlsson, EMBO J. 13 (1994) 5002–5012.

[38] W. Reith, E. Barras, S. Satola, M. Kobr, D. Reinhart, C.H. Sanchez, B. Mach, Proc. Natl. Acad. Sci. U. S. A. 86 (1989) 4200–4204.

[39] H.M. Prior, M.A. Walter, Mol. Med. 2 (1996) 405–412.

[40] L. Luna, N. Skammelsrud, O. Johnsen, K.J. Abel, B.L. Weber, H. Prydz, A.B. Kolsto, Genomics 27 (1995) 237–244.

[41] S. Hansen, V. Moeller, D. Holm, L. Vitved, C. Bendixen, K. Skjoedt, U. Holmskov, Mol. Immunol. 39 (2002) 39–43.

[42] K.H. Kaestner, Trends Endocrinol. Metab. 11 (2000) 281–285.

[43] Y. He, E.C. Crouch, K. Rust, E. Spaite, S.L. Brody, J. Biol. Chem. 275 (2000) 31051–31060.

[44] B. Brand-Saberi, B. Christ, Cell Tissue Res. 296 (1999) 199–212.

[45] Y. Higashi, H. Moribe, T. Takagi, R. Sekido, K. Kawakami, H. Kikutani, H. Kondoh, J. Exp. Med. 185 (1997) 1467–1479.

[46] V. Poli, J. Biol. Chem. 273 (1998) 29279–29282.

