Laboratoire d’Ingénierie des Systèmes
Biologiques et des Procédés
UMR INSA/CNRS 5504 – UMR INSA/INRA 792
LISBP/INSA – 135 Avenue de Rangueil – 31077 Toulouse cedex 4 (France)
Tél. : + 33 (0) 5 61 55 94 01 – Fax : + 33 (0) 5 61 55 94 00 – Mél : [email protected] www.lisbp.fr
CSC 2020
PhD project: Engineering of oligosaccharide transporters
Supervisor’s Name : Dr Gabrielle Potocki-Veronese
Laboratory: LISBP, Toulouse, France
Project description
Glycan catabolism is a crucial function, both for natural and artificial microbial ecosystems, and for the
functioning of chassis strains used in synthetic biology. In bacteria, glycan utilization pathways involve
complex machineries of glycan sensing, binding, transport and degradation. While carbohydrate active
enzymes, which constitute the catalytic part of these multi-protein systems, are relatively easy to
characterize, the specificity of transporters is much more difficult to decipher, due to their transmembrane
location, the multiplicity of transport systems in native strains, and the lack of genetic tools for many
species (especially the non-cultured organisms which make up the major part of microbial ecosystems).
However, transporters represent crucial biotechnological tools, and are important determinants of the
metabolic ability of bacteria. In the past few years, the LISBP demonstrated that molecular
characterization of transporters from uncultured bacteria (including their transmembrane
components) can be performed in E. coli, and developed several new technologies to screen and
characterize their specificity.
This PhD project will aim at engineering the specificity of glycoside transporters previously identified by
the LISBP by functional metagenomics of the human and bovine gut microbiomes. Combinatorial protein
engineering approaches will be used in order to:
- analyze the structure-function relationships of the different protein components of transporters involved in
the degradation of host and dietary glycans in gut microbiomes
- design new artificial channels capable of transporting oligosaccharides of complex structures for
synthetic biology
The project is based on the expertise of the team in protein engineering, ultra-high throughput functional
screening, and in structural biology. It targets various applications, both in synthetic biology and in the control
of microbial ecosystem functioning, including that of the human gut microbiota, in which the glycan-mediated
interrelationships between bacteria and the host play key roles for human health.
Functional metagenomics to mine the human gut microbiome for dietary fiber catabolic enzymes
Lena Tasse, Juliette Bercovici, Sandra Pizzut-Serin, et al.
Genome Res. 2010 20: 1605-1612; originally published online September 14, 2010
Access the most recent version at doi: 10.1101/gr.108332.110
Supplemental Material: http://genome.cshlp.org/content/suppl/2010/08/09/gr.108332.110.DC1.html
Copyright © 2010 by Cold Spring Harbor Laboratory Press
Downloaded from genome.cshlp.org on November 29, 2010 - Published by Cold Spring Harbor Laboratory Press
Method
Functional metagenomics to mine the human gut microbiome for dietary fiber catabolic enzymes
Lena Tasse,1,2,7 Juliette Bercovici,1,2,7 Sandra Pizzut-Serin,1,2 Patrick Robe,3 Julien Tap,4
Christophe Klopp,5 Brandi L. Cantarel,6 Pedro M. Coutinho,6 Bernard Henrissat,6
Marion Leclerc,4 Joel Dore,4 Pierre Monsan,1,2 Magali Remaud-Simeon,1,2
and Gabrielle Potocki-Veronese1,2,8
1Universite de Toulouse, INSA, UPS, INP, LISBP, F-31077 Toulouse, France; 2UMR5504, UMR792 Ingenierie des Systemes Biologiques
et des Procedes, CNRS, INRA, F-31400 Toulouse, France; 3LibraGen S.A., F-31400 Toulouse, France; 4INRA UEPSD, bat 405, Domaine
de Vilvert, F-78352 Jouy en Josas Cedex, France; 5Plateforme Bio-informatique Toulouse Genopole, UBIA INRA, BP 52627, F-31326
Castanet-Tolosan Cedex, France; 6Architecture et Fonction des Macromolecules Biologiques, UMR6098, CNRS, Universites Aix-Marseille
I & II, F-13288 Marseille, France
The human gut microbiome is a complex ecosystem composed mainly of uncultured bacteria. It plays an essential role in
the catabolism of dietary fibers, the part of plant material in our diet that is not metabolized in the upper digestive tract,
because the human genome does not encode adequate carbohydrate active enzymes (CAZymes). We describe a multi-step
functionally based approach to guide the in-depth pyrosequencing of specific regions of the human gut metagenome
encoding the CAZymes involved in dietary fiber breakdown. High-throughput functional screens were first applied to
a library covering 5.4 × 10^9 bp of metagenomic DNA, allowing the isolation of 310 clones showing beta-glucanase,
hemicellulase, galactanase, amylase, or pectinase activities. Based on the results of refined secondary screens, sequencing
efforts were reduced to 0.84 Mb of nonredundant metagenomic DNA, corresponding to 26 clones that were particularly
efficient for the degradation of raw plant polysaccharides. Seventy-three CAZymes from 35 different families were
discovered. This corresponds to a fivefold target-gene enrichment compared to random sequencing of the human gut
metagenome. Thirty-three of these CAZyme-encoding genes are highly homologous to prevalent genes found in the gut
microbiome of at least 20 individuals for whom metagenomic data are available. Moreover, 18 multigenic clusters encoding
complementary enzyme activities for plant cell wall degradation were also identified. Gene taxonomic assignment is
consistent with horizontal gene transfer events in dominant gut species and provides new insights into the human gut
functional trophic chain.
[Supplemental material is available online at http://www.genome.org. The sequence data from this study have been
submitted to GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) under accession nos. GU942928–GU942942 and
GU942944–GU942954.]
The human intestinal microbiome is the dense and complex eco-
system that resides in the distal part of our digestive tract. Its role
in metabolizing dietary constituents (Sonnenburg et al. 2005;
Flint et al. 2008; Ley et al. 2008) and in protecting the host
against pathogens (Rakoff-Nahoum et al. 2004) is crucial to human
health (Macdonald and Monteleone 2005; McGarr et al. 2005;
Manichanh et al. 2006; Turnbaugh and Gordon 2009). It is mainly
composed of commensal bacteria from the Bacteroidetes, Firmicutes,
Proteobacteria, and Actinobacteria phyla (four), and of several
archaeal and eukaryotic species. With up to 10^12 cells per gram
of feces, the bacterial abundance is estimated to reach 1000 operational
taxonomic units (OTUs) per individual, 70% to 80% of the
most dominant ones being subject-specific (Zoetendal et al. 1998;
Tap et al. 2009). However, only 20% of the bacterial species have
been successfully cultured so far (Eckburg et al. 2005). Large-scale
analyses of genomic and metagenomic sequences have provided
gene catalogs and statistical evidence on protein families involved
in the predominant functions of the human gut microbiome (Gill
et al. 2006; Kurokawa et al. 2007; Flint et al. 2008; Turnbaugh et al.
2009; Qin et al. 2010), among which the catabolism of dietary fibers
is of particular interest in human nutrition and health. Dietary
fibers are the components of vegetables, cereals, leguminous seeds,
and fruits that are not digested in the stomach or in the small in-
testine, but are fermented in the colon by the gut microbiome
and/or excreted in feces (Grabitske and Slavin 2008). Chemically,
dietary fibers are mainly composed of complex plant cell wall
polysaccharides and their associated lignin (Selvendran 1984),
along with storage polysaccharides such as fructans and resistant
starch (Institute of Medicine 2005). Dietary fibers have been
identified as a strong positive dietary factor in the prevention
of obesity, diabetes, and cardiovascular diseases (World Health
Organization 2003). Because of the wide structural diversity of dietary
fibers, the human gut bacteria produce a huge panel of carbohydrate
active enzymes (CAZymes), with widely different substrate
specificities, to degrade these compounds into metabolizable
monosaccharides and disaccharides. The functions and the evolutionary
relationships of CAZyme-encoding genes of the human
gut microbiome are being extensively studied through functional
and structural genomics investigations (Flint et al. 2008; Lozupone
et al. 2008; Mahowald et al. 2009; Martens et al. 2009), which are
nevertheless restricted to cultivated bacterial species.
7These authors contributed equally to this work.
8Corresponding author. E-mail [email protected]; fax 33-5-61-55-94-00.
Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.108332.110.
Genome Research 20:1605–1612 © 2010 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/10; www.genome.org
CAZyme diversity has also been described in three metagenomics studies
focused on this microbiome (Gill et al. 2006; Turnbaugh et al. 2009,
2010), and these revealed the presence of at least 81 families of
glycoside-hydrolases, making the human gut metagenome one of
the richest sources of CAZymes (Li et al. 2009). However, the proof
of function of annotated genes issued from metagenomes still
constitutes a goal for enzyme discovery. This can be addressed by
functional screening of metagenomic libraries, in order to retrieve
genes of interest. Numerous studies have provided conclusive ev-
idence on the potential of such an approach for the identification
of novel glycoside-hydrolases from various ecosystems such as soil
(Rondon et al. 2000; Richardson et al. 2002; Voget et al. 2003; Pang
et al. 2009), lakes (Rees et al. 2003), hot springs (Tang et al. 2006,
2008), rumen (Ferrer et al. 2005; Guo et al. 2008; Liu et al. 2008;
Duan et al. 2009), rabbit (Feng et al. 2007), and insect guts
(Brennan et al. 2004; for review, see Ferrer et al. 2009; Li et al. 2009;
Simon and Daniel 2009; Uchiyama and Miyazaki 2009). In all
cases, the identification of the gene responsible for the screened
activity was carried out by sequencing only a few kilobases of
metagenomic DNA. Collectively these studies have established an
experimental proof of function for 35 glycoside hydrolases (from
eight families) issued from metagenomes (data from the CAZy
database; http://www.cazy.org/), a number that is very small con-
sidering the known CAZy diversity. Here, we examined the po-
tential of high-throughput functional screening of large insert li-
braries to guide in-depth pyrosequencing of specific regions of the
human gut metagenome that encode the enzymatic machinery
involved in dietary fiber catabolism.
Results and Discussion
Function-based strategy to target novel CAZymes
The overall strategy (Fig. 1) relies on the screening of a large meta-
genomic library issued from the feces of a healthy volunteer adult
individual who followed a fiber-rich diet, to easily isolate genes
encoding enzymes that were able to break down raw and mostly
insoluble plant polysaccharides. First, the library was screened at
a throughput of 200,000 clones assayed per week and per activity,
using both commercial and home-made polysaccharides (Supple-
mental Table S1). In the secondary step, all positive clones were
screened again using a panel of 15 raw and chemically modified
polysaccharides of various structures (Supplemental Table S1), to
distinguish different enzyme specificities toward glycosidic linkages
within clones that were able to degrade the same polysaccharide
in the primary screens. In parallel, enzyme pH dependency
and thermostability were assayed. Then, in-depth pyrosequencing
of the metagenomic DNA insert from the most interesting clones
was carried out. To identify the enzymes responsible for plant
polysaccharide breakdown and their microbial origin, sequence
analysis was focused on taxonomic annotation of the DNA inserts
and CAZyme-encoding gene annotation.
Multi-step functional screening
The initial library consisted of 156,000 Escherichia coli fosmid
clones, covering in total 5.46 × 10^9 bp of metagenomic DNA, each
clone comprising a 30–40-kb DNA insert. The library was screened
for the ability to hydrolyze five different polysaccharides, namely,
beta-glucan, xylan, beta-(1-4)-galactan, pectin, and amylose. In
total, 704,000 tests were performed, and 310 positive clones were
obtained. Hit frequency varied from 0.05‰ to 0.8‰ (Supplemental
Table S1). No clone degraded more than one of the substrates in-
cluded in the primary screens. Secondary screening results allowed
the clustering of the 310 positive clones on the basis of their ability
to break down various polysaccharide structures (Supplemental
Table S2). One-hundred-and-forty-two clones were able to degrade
only the polysaccharide used in the primary screen, while the
others could also cleave polysaccharides carrying modifications in
the main chain and in the various side chains. In addition, the ability
of the enzymes to work at extreme pH and high temperature was
investigated for their potential use in industrial processes. Enzyme
stability is related to tight protein structural features, and not only
to the thermotolerance of the organism from which the enzymes originate.
Here, eight of the 310 positive clones maintained enzyme activity at
pH 4 and/or 9, and three were still active after a 55°C heat shock,
even though they originate from an ecosystem regulated at 37°C. A total of 26
clones were selected from the two screening steps either for their
efficiency of degradation of particularly resistant substrates, like
native heteroxylans, beta-glucans, or resistant starches, and/or
for their stability at various pH values or high temperatures. The
percentage of clones being sequenced was thus not related to hit
frequency.
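As a quick consistency check on the library figures quoted in this section, the following minimal sketch recomputes the sequence space covered by the library and the overall number of hits per thousand tests. The mean insert size of 35 kb is an assumption (the midpoint of the stated 30–40-kb range), and the function names are illustrative only.

```python
def library_coverage_bp(n_clones: int, mean_insert_bp: int) -> int:
    """Total metagenomic DNA covered by the library, in base pairs."""
    return n_clones * mean_insert_bp


def hits_per_thousand_tests(n_hits: int, n_tests: int) -> float:
    """Overall hit frequency, expressed as hits per 1000 tests."""
    return 1000 * n_hits / n_tests


# 156,000 fosmid clones; 35 kb is an ASSUMED mean insert size (text gives 30-40 kb)
coverage = library_coverage_bp(156_000, 35_000)

# 310 positive clones out of 704,000 tests (values from the text)
overall_hits = hits_per_thousand_tests(310, 704_000)

print(f"library coverage: {coverage:.2e} bp")
print(f"overall hit frequency: {overall_hits:.2f} per 1000 tests")
```

With the assumed 35-kb mean insert, the computed coverage matches the library size reported above.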
Pyrosequencing and gene prediction
The third step of our work consisted in pyrosequencing the inserts
from the 26 selected positive clones. Read assembly resulted in 27
large contigs obtained with a mean sequencing coverage depth of
44×. Two large contigs were found for clone 4. Surprisingly, three
cases of partial sequence redundancy occurred for beta-glucanase,
xylanase, and galactanase active clones, respectively. Excluding
the vector sequences, these 27 large contigs, sizing between 8.3
and 43.8 kb, included 843,256 nt of nonredundant metagenomic
DNA. The high sequencing depth allowed accurate gene prediction,
gene organization, and taxonomic assignment. The total
number of predicted genes of at least 60 nt was 665 (622
complete genes). Among the 622 complete protein sequences
reported here, 349 were assigned to clusters of orthologous groups
of proteins (COGs). The distribution pattern of COG-assigned
proteins (Fig. 2; Supplemental Table S4) highlights the dominance
of the G cluster, corresponding to proteins predicted to be involved
in carbohydrate transport and metabolism. The G cluster was
found to contain 23% of COG-assigned proteins, which is drastically
higher than what was previously obtained from random sequencing
of the human gut metagenome (Kurokawa et al. 2007;
Turnbaugh et al. 2009; Qin et al. 2010). This demonstrates the
power of the functional screening steps to isolate large metagenomic
DNA fragments that are enriched in genes encoding the
enzymatic machinery for dietary fiber digestion.
Figure 1. Overall strategy based on the use of multi-step functional screens for gene discovery from metagenomic sequences.
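Mean coverage depth is conventionally computed as the total number of sequenced bases aligned to an assembly divided by the assembly length. The sketch below assumes that definition; the read lengths and contig length are hypothetical toy values chosen for illustration, not data from this study.

```python
def mean_coverage_depth(read_lengths, assembly_length):
    """Mean depth = total aligned read bases / assembly length."""
    if assembly_length <= 0:
        raise ValueError("assembly_length must be positive")
    return sum(read_lengths) / assembly_length


# Toy example (hypothetical values): 440 reads of 250 nt over a 2,500-nt contig
toy_depth = mean_coverage_depth([250] * 440, 2_500)
print(f"mean coverage depth: {toy_depth:.0f}x")
```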
Taxonomic assignment of metagenomic DNA
To obtain new insights into the relationships existing between
bacteria taxonomy and their role in fiber metabolization, the
bacterial origin of the metagenomic DNA inserts was predicted on
the basis of sequence homology with the protein sequences con-
tained in the nonredundant (NR) protein sequence database of the
NCBI. The amount of assignable and unassignable metagenomic
DNA fragments is biased by the number of bacterial genome se-
quences present in the NR database, and it is related to the highly
stringent criteria (Kurokawa et al. 2007) that we used to avoid false
taxonomic assignment. For all clones, the metagenomic sequences
contained some genes encoding proteins without any high sequence
identity with any known proteins (Supplemental Fig. S1).
We thus conclude that they originate from microorganisms whose
genome sequence is not (or not yet) available. Moreover, using the
chosen criteria, 13 large contigs were nonassignable, one was as-
signed to a bacterial order, seven were assigned to one bacterial
genus, and six at a bacterial species level (Fig. 3). Among them,
nine corresponded to bacteria from the Bacteroidetes phylum and
five to Gram-positive bacteria. This indicates that a significant
number of genes originating from these bacteria were successfully
expressed and produced functional proteins, even if some ex-
pression bias probably occurred by using E. coli as the recombinant
host for functional screening (Gabor et al. 2004; Chen et al. 2007).
Indeed, it appears that some genes that were correctly expressed in
E. coli (based on the transposon mutagenesis results) were located
up to 30 kb from any possible upstream vector-borne promoters.
These genes came, among others, from contigs assigned to Bacteroides
(e.g., protein ID ADD61481, clone 14, 30 kb; ADD61507, clone 16, 14 kb)
and the Gram-positive Eubacterium (ADD61840, clone 3, 20 kb)
(Supplemental Table S3). In the E. coli host, transcription
of these genes was probably initiated from the native Bacteroides
and Eubacterium promoters.
Additionally, we compared the taxonomic assignment of
contigs with that of the total metagenomic DNA used for con-
structing the library (based on 4530 16S rDNA gene sequences)
(Supplemental Fig. S2). The total bacterial diversity of the origi-
nating sample, estimated by Chao index on 16S rDNA library data
sets (Supplemental Fig. S3), is consistent with the average diversity
in fecal samples from healthy individuals, cumulatively reaching
9940 OTUs for 17 individuals (Tap et al. 2009). In the initial sam-
ple, the most abundant 16S rDNA sequences were assigned to five
OTUs: two Eubacterium rectale (1207 sequences), Ruminococcus sp.
(710 sequences), Bacteroides sp. (367 sequences), and Ruminococcus
bromii (125 sequences). Surprisingly, none of the bacterial species
assigned to the contigs corresponded to these five OTUs. In addi-
tion, based on 16S rDNA sequencing, some of the metagenomic
fragments originated from species representing <1% of the initial
sample: One 16S rDNA sequence only corresponded to Bacteroides
stercoris, Bacteroides thetaiotaomicron, and Bacteroides uniformis,
while 29 16S rDNA sequences corresponded to Bifidobacterium
longum. Even if some cloning (Temperton et al. 2009) and ex-
pression (Gabor et al. 2004; Chen et al. 2007) biases may have
occurred, and considering only taxonomic assignment to the ge-
nus level, it can be concluded that the present functionally guided
strategy allows the isolation of DNA fragments from bacteria rep-
resenting only a few percent of the dominant gut bacteria (like
Bifidobacteria), provided that one is capable of exploring a suffi-
ciently large sequence space.
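The Chao index mentioned above estimates total OTU richness from the number of OTUs observed once and twice in a sample. The text does not state which variant was used, so the following sketch assumes the bias-corrected Chao1 form, S_obs + F1(F1 - 1) / (2(F2 + 1)); the per-OTU counts are hypothetical toy values.

```python
from collections import Counter


def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate from per-OTU sequence counts.

    S_obs is the number of observed OTUs, F1 the number of singletons
    (OTUs seen once), and F2 the number of doubletons (OTUs seen twice).
    """
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Toy sample (hypothetical): 7 observed OTUs, 3 singletons, 2 doubletons
toy_counts = [10, 5, 2, 2, 1, 1, 1]
print(chao1(toy_counts))
```

A sample with no singletons returns the observed richness unchanged, which is the expected behavior when rare OTUs have all been seen more than once.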
Because the frequent occurrence of horizontal gene transfer
(HGT) is thought to help gut bacteria to share their advantages
when facing common challenges (Roberts et al. 2008), taxonomic
assignment based on sequence identity may be inconsistent with
that based on 16S rDNA. It has been shown previously that the
human gut metagenome is rich in conjugative transposons, inte-
grases, and recombinases (Jones and Marchesi 2007; Kurokawa
et al. 2007; Qu et al. 2008). Based on the data available in 2008,
Tamames and Moya (2008) predicted that 1%–2.5% of contigs of
the human gut metagenome contain probable HGT events. More-
over, the analysis of 36 bacterial gut genomes revealed that
CAZyme convergence was largely due to HGT (Lozupone et al.
2008). Here, based on the analysis of only 0.84 Mb of nonredundant
metagenomic sequences, we identified 11 genes predicted to
encode transposases, recombinases, and integrases, assigned to
COG families 3385, 4584, 5433, 3464, 3547, 4973, and 4974 (COG
category L) (Supplemental Table S4). Moreover, in five cases, we
observed a drastic change of DNA taxonomic assignation based on
sequence homology around the gene encoding transposase, integrase,
or recombinase (Fig. 4). In the case of clones 2, 11, 12, and
14/15, the first part of the contigs presented a perfect synteny with
a genomic fragment from one gut bacterium, while the second part
showed synteny with a fragment of a different gut bacterial genome.
In the case of clone 16, the synteny with the B. uniformis
ATCC 8492 genome is lost for seven genes in the middle of the
contig that are not even highly similar to any B. uniformis ATCC
8492 gene. We thus hypothesize that, as for the other clones
mentioned in Figure 4, such a gene organization results from gene
transfers between bacterial species. For these clone sequences, the
genomic heterogeneity was also confirmed by tetranucleotide
frequency analysis (Supplemental Fig. S4). This provides conclusive
evidence of human gut metagenome plasticity. Such a demonstration
was rendered possible by the in-depth sequencing of
large metagenomic DNA fragments, which provided both reliable
information about gene organization and the proof that the contigable
sequences originated from a single bacterial genome.
Figure 2. Distribution pattern of COG-assigned proteins. The genes not assignable to any COGs are not shown in this figure.
(C) Energy production and conversion. (D) Cell cycle control, mitosis, and meiosis. (E) Amino acid transport and metabolism.
(F) Nucleotide transport and metabolism. (G) Carbohydrate transport and metabolism. (H) Coenzyme transport and metabolism.
(I) Lipid transport and metabolism. (J) Translation. (K) Transcription. (L) Replication, recombination, and repair.
(M) Cell wall/membrane biogenesis. (N) Cell motility. (O) Post-translational modification, protein turnover, chaperones.
(P) Inorganic ion transport and metabolism. (Q) Secondary metabolite biosynthesis, transport, and catabolism.
(R) General function prediction only. (S) Function unknown. (T) Signal transduction mechanisms.
(U) Intracellular trafficking and secretion. (V) Defense mechanisms. (Z) Cytoskeleton.
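Tetranucleotide frequency analysis compares the 4-mer composition of DNA segments; segments derived from different genomes tend to show poorly correlated profiles. The exact procedure behind Supplemental Fig. S4 is not described here, so the sketch below is only a generic illustration: it assumes that the two parts of a contig on either side of a putative transfer point are compared by the Pearson correlation of their tetranucleotide frequency vectors. The function names and the toy sequences are hypothetical.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides, in a fixed order
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freqs(seq):
    """Relative frequency of each tetranucleotide in seq (ambiguous bases skipped)."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def compare_contig_parts(seq, breakpoint):
    """Correlation between tetranucleotide profiles on either side of breakpoint."""
    return pearson(tetra_freqs(seq[:breakpoint]), tetra_freqs(seq[breakpoint:]))


# Toy contig (hypothetical): an AT-rich part fused to a GC-rich part
toy_contig = "ATAT" * 50 + "GCGC" * 50
print(compare_contig_parts(toy_contig, 200))
```

A low or negative correlation between the two parts is compatible with, but does not by itself prove, a heterogeneous genomic origin; in practice the profiles would be compared against reference genomes as well.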
Identification and organization of CAZyme-encoding genes
The detection of genes encoding CAZymes, which are responsible
for polysaccharide degradation, was the last step of the strategy
(Fig. 1). A BLAST-based sequence comparison against the CAZy
database identified 73 CAZyme proteins, encoded by 65 full-length
and eight truncated genes (SI). Several proteins were multi-
modular, resulting in a total of 86 modules assigned to 35 known
CAZy families (Supplemental Table S3), corresponding mainly
to polysaccharide degrading activities, including 20 glycoside-
hydrolase (GH), seven carbohydrate-esterase (CE), and one poly-
saccharide lyase (PL) families. In order to identify the gene that is
responsible for the detected activity in the primary screens, we
have performed a transposon mutagenesis of the fosmid inserts.
All of the proteins (labeled in Supplemental Table S3) for which an
experimental proof of function is provided, were identified as
CAZymes by using sequence-based analysis. They all contain a
catalytic module belonging to a known GH or CE family, of which
the activity described in the CAZy database is in agreement with
the activity we screened for. We did not obtain any inactivated
clones by transposon mutagenesis of clones 1, 5, 8, and 9. This
indicates that several enzymes encoded by these fosmids may be
involved in the detected activity.
Besides, many CAZymes involved in the breakdown of plant
polysaccharides display a modular structure in which the catalytic
domain carries one or several ancillary domains that can be cata-
lytic, carbohydrate-binding, or of as-yet-unknown function. Four
known families of carbohydrate-binding modules (CBM) and one
fibronectin (FN) module were also found to be associated with
catalytic modules, presumably for the attachment of enzymes
to their substrates. Moreover, five of the 73 identified CAZymes
(marked in Supplemental Table S3) harbored additional modules
with no similarity to any known CAZy family. These families of
modules of unknown function potentially represent five novel
CAZy families. The precise function of these novel protein modules
will be investigated by rational truncation of the corresponding
proteins, in order to identify the catalytic or carbohydrate-binding
function of the modules in question.
Figure 3. CAZy gene clusters for each clone sequence from 1 to 26. Below the clone number is the activity for which each clone
has been screened. (Blue) CAZy-encoding genes; (yellow) SusD homolog–encoding genes; (green) transport system protein–encoding
genes; (purple) other genes. 14/15 shows the CAZy gene clusters of assembled sequences from these clones. Clones 10 and 11 and
clones 17 and 18 have the same CAZy gene clusters; these sequences are not assembled together. On top of each bar is the
taxonomic assignation of the clone when assignable, other clones are nonassigned. (*) Synteny with Roseburia intestinalis
L1-82 (1); Bacteroides uniformis ATCC 8492 (2); Bacteroides stercoris ATCC 43183 (3); Bacteroides eggerthii DSM 20697 (4).
Among the 622 complete nonredundant genes, 19% were
predicted to encode a signal peptide. This number increased to
38% when considering only the CAZyme-encoding genes. This is
consistent with the role of these enzymes in vivo in the digestion
of polysaccharide substrates that are impossible to internalize by
bacterial cells. It is probable that most of the CAZymes were not
secreted by E. coli cells used here as the recombinant host. Instead,
CAZyme access to the insoluble polysaccharides of the functional
screens was most likely due to the release of cytoplasmic proteins
by E. coli cell lysis.
As demonstrated by the G COG-cluster enrichment, the pres-
ent function-based strategy was very powerful in focusing the se-
quencing only on metagenomic DNA fragments rich in CAZyme
modules. One module was found every 10 kb, with a fivefold
higher frequency than that observed from random sequencing
(Turnbaugh et al. 2009). The enrichment in catabolic genes can
also be estimated by the glycoside hydrolase/glycosyltransferase
(GH/GT) ratio. The functional screen strategy that we used led to
a GH/GT ratio of 33, much higher than the 1.5 ratio obtained in
the analysis of complete genomes from gut bacteria (Lozupone
et al. 2008) or even the 3.4 ratio within metagenomics short reads
(Turnbaugh et al. 2009). Our strategy for target-gene enrichment in
metagenomes is even more efficient than those based on DNA isolation
from enrichment cultures grown on polysaccharides (Grant
et al. 2004) or on labeling DNA through stable isotope probing
(Kalyuzhnaya et al. 2008).
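The enrichment figures quoted in this paragraph can be rechecked with simple arithmetic. The sketch below assumes that the "one module every 10 kb" density refers to the 86 CAZy modules found in the 843,256 nt of nonredundant sequence reported earlier; the function names are illustrative.

```python
def kb_per_module(total_nt: int, n_modules: int) -> float:
    """Average sequence length (in kb) per CAZy module."""
    return total_nt / n_modules / 1000


def fold_enrichment(ratio_observed: float, ratio_reference: float) -> float:
    """How many times larger the observed ratio is than a reference ratio."""
    return ratio_observed / ratio_reference


# ASSUMPTION: 86 modules in 843,256 nt underlie the stated module density
spacing = kb_per_module(843_256, 86)

# GH/GT ratio of 33 here, versus 1.5 (complete gut genomes) and 3.4 (short reads)
vs_genomes = fold_enrichment(33, 1.5)
vs_short_reads = fold_enrichment(33, 3.4)

print(f"one module every {spacing:.1f} kb")
print(f"GH/GT ratio: {vs_genomes:.1f}-fold vs genomes, {vs_short_reads:.1f}-fold vs short reads")
```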
The study of the organization of CAZyme-encoding genes
identified here is of particular interest. Among the 73 CAZyme-
encoding genes, 48 were found to constitute 18 multigenic clus-
ters, possibly representing operon-like systems including other
genes involved in carbohydrate transport and/or binding like SusD
homologs and putative proteins from the TonB-dependent receptor
family (Fig. 3; Martens et al. 2009). In five cases, a striking
synteny was obtained with similar gene clusters from genomes of
gastrointestinal tract bacteria, for which the biochemical proof of
function has never been described to our knowledge. For the first
time using a screening-based metagenomics approach, we describe
CAZyme gene clusters involved in dietary fiber catabolism by the
human gut microbiome.
Interestingly, the distribution of CAZyme gene clusters and
the number of CAZyme modules and families were highly variable
among the clones and found to depend on their activities. Indeed,
metagenomic DNA inserts from clones able to degrade starch
contained only one to three CAZyme modules corresponding
mainly to family GH13. In comparison, the DNA fragments inserted
in clones able to degrade beta-glucans and xylan contained
up to 17 CAZyme modules corresponding to 13 different CAZyme
families. All the functions of these CAZyme modules (cellulases,
hemicellulases, carbohydrate-esterases, and associated carbohydrate-binding
modules) are required in vivo for the complete
degradation of plant cell wall polysaccharides, whose structures are
much more complex than that of starch. These operon-like clusters
probably reflect the adaptation of the genetic potential of gut
bacteria to the degradation of highly complex polysaccharide
structures.
Finally, in order to assess how prevalent the genes we iden-
tified are among the gut microbiomes worldwide, we compared
our data to the metagenome sequences currently available, issued
from 124 European (Qin et al. 2010), 13 Japanese (Kurokawa et al.
2007), and 46 U.S. individuals (Gill et al. 2006; Turnbaugh et al.
2009, 2010). None of the genes we identified in our contigs was
found in the U.S. and Japanese individual data sets. This was
probably because we used highly stringent criteria for searching
similarities with our full-length protein sequences (E-value = 0;
identity ≥ 90%), in order to avoid any overestimation of the
gene prevalence. In contrast, when comparing our data to the 3.3-
million-gene catalog obtained from the European individuals, we
identified 154 highly prevalent genes, detected in 20 individuals
or more (identity ≥ 90%) (Supplemental Table S4). Among them,
33 encode CAZymes. In addition, among the 65 complete CAZyme-
encoding genes of the present study, 32 matched with 100%
identity to genes present in at least one individual, and six in at
least 12 individuals (protein ID ADD61840, clone 3; ADD62008,
clone 10; ADD62010, clone 10; ADD62011, clone 10; ADD61504,
clone 16; ADD61689, clone 22) (Supplemental Table S3). These six
CAZymes were found in the gut microbiomes of individuals with
very distinct body mass index (lean, overweight, obese), and with
different clinical status (healthy, inflammatory bowel–diseased
patients). Moreover, in many cases (for clones 3, 4, 9, 10/11, 14/15,
16, 19, 20, 26), the genes surrounding these highly prevalent
CAZymes were also present in several individuals. These results
show the power of such an activity-based functional meta-
genomics approach, even when applied on a single sample, to
provide an experimental proof of function to highly prevalent
genes and gene clusters of the human gut microbiome. This also
underlines the interest of coupling sequence-based and activity-
based metagenomics to investigate the gut microbiota functions
and to measure the prevalence and abundance of targeted genes.
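The prevalence criterion described above (identity of at least 90%, detection in 20 individuals or more) can be expressed as a simple filter over similarity-search hits. The following is a minimal sketch under that reading; the `Hit` record, the function name, and the toy gene and individual identifiers are all hypothetical, and the E-value criterion is assumed to have been applied upstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query_gene: str      # gene from the sequenced contigs
    individual: str      # individual whose metagenome carries the matching gene
    identity_pct: float  # percent identity of the match


def prevalent_genes(hits, min_identity=90.0, min_individuals=20):
    """Return {gene: number of carriers} for genes found in enough individuals."""
    carriers = {}
    for h in hits:
        if h.identity_pct >= min_identity:
            carriers.setdefault(h.query_gene, set()).add(h.individual)
    return {g: len(s) for g, s in carriers.items() if len(s) >= min_individuals}


# Toy data (hypothetical identifiers)
toy_hits = (
    [Hit("geneA", f"ind{i}", 95.0) for i in range(20)]    # prevalent
    + [Hit("geneB", f"ind{i}", 99.0) for i in range(5)]   # too few individuals
    + [Hit("geneC", f"ind{i}", 80.0) for i in range(30)]  # identity too low
)
print(prevalent_genes(toy_hits))
```

Counting distinct individuals rather than hits avoids inflating prevalence when one metagenome contains several near-identical copies of a gene.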
Figure 4. Evidence of horizontal gene transfers (HGTs) in human gut metagenomic sequences. HGTs were identified when rupture
was observed in gene synteny between the genes present in the metagenomic DNA fragments and their best BLASTP hits issued from
sequenced genomes. For each clone, the first line represents the clone metagenomic sequence, and the second line represents
the genome part in synteny with it. Each arrow represents a gene. (Red arrows) Genes encoding putative transposases or
integrases; (black arrows) CAZy-encoding genes; (stars within black arrows) genes encoding the CAZymes involved in the
activity detected in the primary screens, as proven by transposon insertion in the fosmid inserts.
Concluding remarks
This study demonstrates that the rational design of a multi-step
functional screening procedure to guide sequencing is a very
powerful strategy to accelerate enzyme discovery in metagenomes.
Here, it was applied to identify highly prevalent genes encoding
enzymes that are involved in the catabolism of the dietary fibers by
the human gut microbiome and provided new insights into the
gastrointestinal tract functional trophic chain. Besides, our pro-
cedure appears to efficiently identify clusters of potentially com-
plementary activities for the complete breakdown of complex
plant polysaccharides, which can be of prime interest for bio-
refinery processes and white biotechnologies. Their potential for
such applications will have to be evaluated in future work. Finally,
we note that the strategy reported here, which coupled functional
screens and sequence-based metagenomics, is highly generic and
can be applied to mine other ecosystems known to be highly
specialized for raw biomass degradation (e.g., rumen and insect gut
microbiomes) for novel biocatalysts.
Methods
Construction of the metagenomic library
The fecal sample was collected from a healthy 30-yr-old male who
followed a vegetarian and fish-eating diet. His ascendants were
omnivorous. The individual did not eat any functional food such
as prebiotics or probiotics, nor did he receive any antibiotics or
other drugs during the 6 mo before sampling. The bacterial fraction
was recovered from 2 g of feces by a gradient density technique
using Nycodenz as previously described (Courtois et al. 2003). The
bacterial cell fraction was collected, washed with ultra-pure water,
then centrifuged for 10 min at 12,000g. The cell pellet was resus-
pended in a 50 mM Tris (pH 8), 100 mM EDTA buffer and then
incorporated in low-melt-point agarose before a gentle enzy-
matic lysis, as described by Ginolhac et al. (2004). High-molecular-
weight bacterial DNA trapped in agarose plugs was immediately
inserted into the wells of a 0.8% low-melting-temperature gel (Bio-
Rad) and separated for 18 h by pulsed-field gel electrophoresis at
4.5 V/cm with 5- to 40-sec pulse times with a CHEF-DR III apparatus
(Bio-Rad). DNA fragments with size ranging from 30 to 40 kb were
isolated and recovered from the gel with GELase (Epicentre Tech-
nologies). Phylogenetic analysis of the extracted metagenomic
DNA using 16S rDNA sequencing was performed according to Tap
et al. (2009). The GenBank accession numbers for the 16S rDNA
molecular inventory are HM475513–HM480042. The correspon-
dence between the bacterial clone numbers appearing in Supple-
mental Figure S2 and the corresponding GenBank accession num-
bers is mentioned in Supplemental Table S5.
The metagenomic DNA was then cloned into fosmids by us-
ing the pCC1FOS fosmid library production kit (Epicentre Tech-
nologies) as recommended by the manufacturer. Recombinant
colonies were transferred to 384-well microtiter plates containing
freezing medium (Luria-Bertani, 8% glycerol complemented with
12.5 µg/mL chloramphenicol), using an automated colony picker
(QPixII; Genetix). After 22 h of growth at 37°C without any agitation,
the plates were stored at −80°C.
High-throughput functional screens
Metagenomic clones were screened for polysaccharide digestion
activities by spotting them on 22 cm × 22 cm bioassay trays
containing solid agar and the target polysaccharide, using a QPixII
(Genetix) colony picker. Solid agar was either PLA (agar-supplemented
LB buffered to pH 6.6 by addition of 5.4 g/L Na2HPO4·12H2O
and 4.8 g/L NaH2PO4·H2O) or, in the case of starch-related
polysaccharide-containing media, terrific broth (TB). All
media were supplemented with 12.5 mg/L chloramphenicol and
with polysaccharides (beta-glucans, xylans, pectin, amylose, gal-
actan) as listed in Supplemental Table S1. The assay plates were
incubated for 7 d at 37°C, except for plates containing AZCL-
amylose, which were incubated for only 3 d to avoid interference
with E. coli host starch-degrading activities. A final throughput of
200,000 clones assayed per week and per substrate was achieved.
After incubation on plates containing chromogenic poly-
saccharides, positive clones were visually detected by the presence
of a blue or red halo resulting from the production of colored oli-
gosaccharides that diffused around the bacterial colonies. For
pectin assays, the plates were colored for 20 min with an aqueous
solution of Ruthenium Red (0.5% m/v) at room temperature. After
removing excess Ruthenium Red solution by aspiration, clear
halos were observed around the positive clones.
Secondary screens
All positive clones were further screened for hydrolysis efficiency
and specificity toward various polysaccharide structures, by
screening them on solid agar containing polysaccharides of vari-
ous structures (Supplemental Table S1). Native polysaccharides
were added to the sterile agar media at 50°C to conserve their
crystalline structure. Ten microliters of overnight liquid cultures of
the positive clones were placed on the agar surface, and the plates
were incubated for 3 to 7 d at 37°C. Plates containing non-
chromogenic beta-glucans and xylan were stained with an aque-
ous solution of Congo Red (0.05% m/v) followed by an overnight
exposure to 1 M NaCl. Digestion zones were visible as clear halos
around the positive colonies, except the deep brown halos ob-
served for carboxymethyl cellulose. Nonchromogenic amylose-
(Potocki-Veronese et al. 2005) and starch-containing plates were
stained by exposure to iodine vapor, revealing unstained halos
around positive colonies. Nonchromogenic pectic polysaccharides
were stained with Ruthenium Red as described in the previous
section.
To measure enzyme thermostability and activities at various
pH values, positive clones were grown in liquid cultures in 96-well
microplates. Cell lysis was performed by addition of 0.5 mg/mL
lysozyme and one cycle of freeze/thaw at −20°C. For thermostability
assays, cell extracts were incubated for 15 min at 55°C. Cell
extracts were incubated in 20 mM citrate-phosphate buffer at pH 4,
7, and 9, supplemented with 0.1% AZCL-polysaccharides (same as
used in primary screens), for 24 h at 37°C. Polysaccharide hydro-
lysis resulting in the release of soluble blue oligosaccharides was
quantified by measuring absorbance at 590 nm.
Transposon mutagenesis of the DNA inserts from the 26 se-
lected clones was performed using the EZ-Tn5 <oriV/KAN-2> In-
sertion Kit (Epicentre). Inactivated clones were identified by plat-
ing isolated colonies on agar-supplemented LB containing 12.5
mg/L chloramphenicol, 50 mg/L kanamycin, and the polysaccharide
used in the primary screens. Sanger sequencing was performed
outward from the nested transposon using the primers
supplied in the kit.
Pyrosequencing, read assembly, and gene prediction
Pyrosequencing of whole fosmid inserts was performed on a 454
Life Sciences (Roche) GS FLX system by the Genoscope sequencing
facility (Evry, France), yielding in total 186,762 contigable reads.
Read assembly was done using CAP3 (Huang and Madan 1999),
a DNA Sequence Assembly Program, and resulted in 106 contigs
sizing between 113 bp and 51,798 bp, covering in total 1,002,117
bp. Ninety-eight percent of the sequenced nucleotides were included
in 27 large contigs of at least 8343 nt, obtained with a mean
sequencing depth of 44×. Two large contigs were found for clone
4. These 27 large contigs were further used for analysis. pCC1FOS
sequences were identified using Crossmatch (http://bozeman.mbt.washington.edu/phredphrapconsed.html),
discarded, and replaced by NNN. Excluding the vector sequences, these 27 large
contigs included 881,473 nt of metagenomic DNA. The compari-
son of these sequences with themselves revealed three cases of
partial sequence redundancy, which always occurred between
clones presenting the same enzymatic activity detected using the
primary screens. In the first two cases, the 5′ extremity of a contig
was identical to the 3′ extremity of a contig from another clone
(clones 14/15 and 17/18), which allowed manual assembly of
them to provide up to 71.3 kb of metagenomic DNA issued from
one unique gut bacterium. In the case of beta-glucanase active
clones, one sequence fragment (20.9 kb) from clone 10 was also
found in the contig sequence from clone 11, without any ho-
mologies of the contig extremities. As described in this report, this
particular sequence redundancy phenomenon may be due to
HGTs. The Metagene program (http://metagene.cb.k.u-tokyo.ac.jp/metagene)
was used to predict open reading frames (ORFs ≥ 20
amino acids) from the resulting sequences. No frameshift was
detected in the gene sequences by using BLASTX comparison to
the Uniref100 database, reflecting the reliability of read assembly
and gene detection. For each of the 26 clones, the large contig
sequence has been deposited in DDBJ/EMBL/GenBank under ac-
cession numbers GU942928–GU942942 and GU942944–GU942954.
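Two of the bookkeeping steps in this paragraph, the share of sequenced nucleotides carried by the large contigs and the replacement of vector sequence by N characters, can be sketched as follows. The function names, the interval convention, and the toy values are assumptions; in practice the vector coordinates would come from the Crossmatch output, which is not parsed here.

```python
def large_contig_fraction(lengths, min_length):
    """Fraction of all assembled bases that lie in contigs >= min_length."""
    total = sum(lengths)
    if total == 0:
        return 0.0
    large = sum(n for n in lengths if n >= min_length)
    return large / total

def mask_vector(seq, start, end):
    """Replace the vector-derived interval [start, end) with N characters."""
    return seq[:start] + "N" * (end - start) + seq[end:]

# Toy example (hypothetical contig lengths, not the study's data).
toy_lengths = [9000, 12000, 500, 113]
fraction = large_contig_fraction(toy_lengths, min_length=8343)
masked = mask_vector("ACGTACGT", 2, 5)
```

With the real assembly, `lengths` would hold the 106 contig sizes and `min_length` the 8343-nt threshold quoted in the text.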
ORF analysis
COG assignment of predicted gene products was made using RPS-
BLAST analysis against the reference COG data set. COG assignment
was taken into account only for E-values ≤ 10⁻⁸. When
a predicted gene product was assigned to multiple COGs, the hit
was divided by the number of assigned COGs, and the resulting
fractional count was distributed evenly among them. Signal peptide
prediction was performed using PHOBIUS (http://www.ebi.ac.uk/Tools/phobius/).
CAZyme-encoding genes were identified by
BLAST analysis of the nucleotide sequences from the 106 contigs
against the amino acid sequences derived from the CAZy database
(http://www.cazy.org) using a cut-off E-value of 7 × 10⁻⁶. Other
genes were manually annotated using NCBI-BLASTP against the
NR database (E-value < 10⁻⁸, identity > 35%, query length coverage
≥ 50%). Gene prevalence in the human gut microbiome was
detected by using a TBLASTN comparison of the protein sequences
identified in this study to the metagenomic data sets available for
124 European (Qin et al. 2010), 13 Japanese (Kurokawa et al. 2007),
and 46 U.S. individuals (Gill et al. 2006; Turnbaugh et al. 2009,
2010) (E-value = 0, identity ≥ 90% or identity = 100%).
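The rule for gene products assigned to several COGs (each gene contributes a total count of one, split evenly among its COGs) can be illustrated as below. This is a minimal sketch under stated assumptions: the input dictionary is a hypothetical summary of RPS-BLAST hits that already passed the E-value filter, and the COG labels are placeholders rather than real identifiers.

```python
from collections import defaultdict

def cog_counts(assignments):
    """Fractional COG counts.

    assignments: {gene_id: [cog_id, ...]} for hits that passed the
    E-value cut-off (hypothetical pre-filtered input).
    """
    counts = defaultdict(float)
    for gene, cogs in assignments.items():
        unique = set(cogs)
        if not unique:
            continue  # gene without any COG assignment
        weight = 1.0 / len(unique)  # one gene contributes a total of 1
        for cog in unique:
            counts[cog] += weight
    return dict(counts)

# Toy input: g2 hits two COGs, so each receives 0.5 from g2.
toy = {"g1": ["COG_A"], "g2": ["COG_A", "COG_B"], "g3": []}
counts = cog_counts(toy)
```

Summing the returned values gives the number of genes with at least one COG assignment, which is a quick sanity check on the weighting.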
Taxonomic assignment of metagenomic sequences
Two methodologies were used. The first was based on protein se-
quence similarities with proteins of sequenced genomes, using a
BLASTP analysis against the nonredundant protein sequence da-
tabase of the NCBI. For each protein of each metagenomic DNA
fragment, the microbial origin of the best BLAST hit was assigned
only for matches covering at least 50% of the protein length, with
an E-value better than 10⁻⁸ and an identity of at least 90%. Proteins
that did not pass those criteria were assigned to the "no hits"
category. We assigned a class, genus, or species to the DNA frag-
ment issued from one clone when at least 50% of the putative
proteins encoded by this fragment presented a best BLAST hit
issued from the same microbe. Also, if putative proteins encoded
from the same DNA fragment had the best BLAST hit issued from
microbes of different classes, we considered the entire fragment as
unassignable. The second approach was based on tetranucleotide
frequency count, an analysis related to genomic signatures, by
using Ocount software (Teeling et al. 2004) connected to a pre-
viously designed pipeline allowing a normalization of tetranu-
cleotide frequency according to sequence length (Tap et al. 2009).
The 26 fosmid insert sequences were divided into 10-kb fragments
for analysis. Genetic diversity, recorded as 256-tetranucleotide
distribution, was represented by a principal component analysis
(PCA) using R software (Chessel et al. 2004). Only the first two PCA
components, representing 49.7% of the total genetic diversity,
were used to illustrate this analysis.
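The tetranucleotide step of the second approach can be sketched as follows: each insert is cut into 10-kb fragments, and each fragment is described by the relative frequency of the 256 possible tetranucleotides. This is an illustrative sketch only. Dividing by the number of counted windows is an assumed form of length normalization and may differ from the pipeline of Tap et al. (2009); the function names are hypothetical, and the PCA itself (done by the authors in R) is not reproduced.

```python
from itertools import product

# All 256 tetranucleotides in a fixed order, used as feature columns.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def split_fragments(seq, size=10_000):
    """Cut a sequence into consecutive fragments of at most `size` bases."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def tetra_frequencies(fragment):
    """Relative frequency of each tetranucleotide in one fragment."""
    counts = dict.fromkeys(TETRAMERS, 0)
    windows = 0
    frag = fragment.upper()
    for i in range(len(frag) - 3):
        word = frag[i:i + 4]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
            windows += 1
    if windows == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[w] / windows for w in TETRAMERS]

# Toy sequence: 7 overlapping windows, ACGT occurs twice.
freqs = tetra_frequencies("ACGTACGTAC")
pieces = split_fragments("A" * 25_000)
```

Stacking one frequency vector per fragment gives the fragments × 256 matrix on which a PCA can then be run.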
Acknowledgments
The high-throughput screening work was performed at the Labo-
ratory for BioSystems & Process Engineering (Toulouse, France)
with the ICEO automated facility. ICEO is supported by grants
from the Region Midi-Pyrenees, France, the European Regional
Development Fund, and the Institut National de la Recherche
Agronomique, France (the French National Institute for Agricul-
tural Research). We thank Sophie Bozonnet and Sandrine Laguerre
for their assistance. This work was carried out with the financial
support of the ANR—Agence Nationale de la Recherche—The
French National Research Agency under the Programme National
de Recherche en Alimentation et nutrition humaine, project ANR-
06-PNRA-024. Pyrosequencing was funded by the French National
Institute for Agricultural Research.
References
Brennan Y, Callen WN, Christoffersen L, Dupree P, Goubet F, Healey S, Hernandez M, Keller M, Li K, Palackal N, et al. 2004. Unusual microbial xylanases from insect guts. Appl Environ Microbiol 70: 3609–3617.
Chen S, Bagdasarian M, Kaufman MG, Bates AK, Walker ED. 2007. Mutational analysis of the ompA promoter from Flavobacterium johnsoniae. J Bacteriol 189: 5108–5118.
Chessel D, Dufour AB, Thioulouse J. 2004. The ade4 package—I: One-table methods. R News 4: 5–10.
Courtois S, Cappellano CM, Ball M, Francou FX, Normand P, Helynck G, Martinez A, Kolvek SJ, Hopke J, Osburne MS, et al. 2003. Recombinant environmental libraries provide access to microbial diversity for drug discovery from natural products. Appl Environ Microbiol 69: 49–55.
Duan CJ, Xian L, Zhao GC, Feng Y, Pang H, Bai XL, Tang JL, Ma QS, Feng JX. 2009. Isolation and partial characterization of novel genes encoding acidic cellulases from metagenomes of buffalo rumens. J Appl Microbiol 107: 245–256.
Eckburg PB, Bik EM, Bernstein CN, Purdom E, Dethlefsen L, Sargent M, Gill SR, Nelson KE, Relman DA. 2005. Diversity of the human intestinal microbial flora. Science 308: 1635–1638.
Feng Y, Duan CJ, Pang H, Mo XC, Wu CF, Yu Y, Hu YL, Wei J, Tang JL, Feng JX. 2007. Cloning and identification of novel cellulase genes from uncultured microorganisms in rabbit cecum and characterization of the expressed cellulases. Appl Microbiol Biotechnol 75: 319–328.
Ferrer M, Golyshina OV, Chernikova TN, Khachane AN, Reyes-Duarte D, Santos VA, Strompl C, Elborough K, Jarvis G, Neef A, et al. 2005. Novel hydrolase diversity retrieved from a metagenome library of bovine rumen microflora. Environ Microbiol 7: 1996–2010.
Ferrer M, Beloqui A, Timmis KN, Golyshin PN. 2009. Metagenomics for mining new genetic resources of microbial communities. J Mol Microbiol Biotechnol 16: 109–123.
Flint HJ, Bayer EA, Rincon MT, Lamed R, White BA. 2008. Polysaccharide utilization by gut bacteria: Potential for new insights from genomic analysis. Nat Rev Microbiol 6: 121–131.
Gabor EM, Alkema WB, Janssen DB. 2004. Quantifying the accessibility of the metagenome by random expression cloning techniques. Environ Microbiol 6: 879–886.
Gill SR, Pop M, Deboy RT, Eckburg PB, Turnbaugh PJ, Samuel BS, Gordon JI, Relman DA, Fraser-Liggett CM, Nelson KE. 2006. Metagenomic analysis of the human distal gut microbiome. Science 312: 1355–1359.
Ginolhac A, Jarrin C, Gillet B, Robe P, Pujic P, Tuphile K, Bertrand H, Vogel TM, Perriere G, Simonet P, et al. 2004. Phylogenetic analysis of polyketide synthase I domains from soil metagenomic libraries allows selection of promising clones. Appl Environ Microbiol 70: 5522–5527.
Grabitske HA, Slavin JL. 2008. Low-digestible carbohydrates in practice. J Am Diet Assoc 108: 1677–1681.
Grant S, Sorokin DY, Grant WD, Jones BE, Heaphy S. 2004. A phylogenetic analysis of Wadi el Natrun soda lake cellulase enrichment cultures and identification of cellulase genes from these cultures. Extremophiles 8: 421–429.
Guo H, Feng Y, Mo X, Duan C, Tang J, Feng J. 2008. [Cloning and expression of a beta-glucosidase gene umcel3G from metagenome of buffalo rumen and characterization of the translated product]. Sheng Wu Gong Cheng Xue Bao 24: 232–238.
Huang X, Madan A. 1999. CAP3: A DNA sequence assembly program. Genome Res 9: 868–877.
Institute of Medicine. 2005. Dietary reference intakes. National Academy of Sciences, Washington, DC.
Jones BV, Marchesi JR. 2007. Transposon-aided capture (TRACA) of plasmids resident in the human gut mobile metagenome. Nat Methods 4: 55–61.
Kalyuzhnaya MG, Lapidus A, Ivanova N, Copeland AC, McHardy AC, Szeto E, Salamov A, Grigoriev IV, Suciu D, Levine SR, et al. 2008. High-resolution metagenomics targets specific functional types in complex microbial communities. Nat Biotechnol 26: 1029–1034.
Kurokawa K, Itoh T, Kuwahara T, Oshima K, Toh H, Toyoda A, Takami H, Morita H, Sharma VK, Srivastava TP, et al. 2007. Comparative metagenomics revealed commonly enriched gene sets in human gut microbiomes. DNA Res 14: 169–181.
Ley RE, Hamady M, Lozupone C, Turnbaugh PJ, Ramey RR, Bircher JS, Schlegel ML, Tucker TA, Schrenzel MD, Knight R, et al. 2008. Evolution of mammals and their gut microbes. Science 320: 1647–1651.
Li LL, McCorkle SR, Monchy S, Taghavi S, van der Lelie D. 2009. Bioprospecting metagenomes: Glycosyl hydrolases for converting biomass. Biotechnol Biofuels 2: 10. doi: 10.1186/1754-6834-2-10.
Liu JR, Duan CH, Zhao X, Tzen JT, Cheng KJ, Pai CK. 2008. Cloning of a rumen fungal xylanase gene and purification of the recombinant enzyme via artificial oil bodies. Appl Microbiol Biotechnol 79: 225–233.
Lozupone CA, Hamady M, Cantarel BL, Coutinho PM, Henrissat B, Gordon JI, Knight R. 2008. The convergence of carbohydrate active gene repertoires in human gut microbes. Proc Natl Acad Sci 105: 15076–15081.
Macdonald TT, Monteleone G. 2005. Immunity, inflammation, and allergy in the gut. Science 307: 1920–1925.
Mahowald MA, Rey FE, Seedorf H, Turnbaugh PJ, Fulton RS, Wollam A, Shah N, Wang C, Magrini V, Wilson RK, et al. 2009. Characterizing a model human gut microbiota composed of members of its two dominant bacterial phyla. Proc Natl Acad Sci 106: 5859–5864.
Manichanh C, Rigottier-Gois L, Bonnaud E, Gloux K, Pelletier E, Frangeul L, Nalin R, Jarrin C, Chardon P, Marteau P, et al. 2006. Reduced diversity of faecal microbiota in Crohn’s disease revealed by a metagenomic approach. Gut 55: 205–211.
Martens EC, Koropatkin NM, Smith TJ, Gordon JI. 2009. Complex glycan catabolism by the human gut microbiota: The Bacteroidetes Sus-like paradigm. J Biol Chem 284: 24673–24677.
McGarr SE, Ridlon JM, Hylemon PB. 2005. Diet, anaerobic bacterial metabolism, and colon cancer: A review of the literature. J Clin Gastroenterol 39: 98–109.
Pang H, Zhang P, Duan CJ, Mo XC, Tang JL, Feng JX. 2009. Identification of cellulase genes from the metagenomes of compost soils and functional characterization of one novel endoglucanase. Curr Microbiol 58: 404–408.
Potocki-Veronese G, Putaux JL, Dupeyre D, Albenne C, Remaud-Simeon M, Monsan P, Buleon A. 2005. Amylose synthesized in vitro by amylosucrase: Morphology, structure, and properties. Biomacromolecules 6: 1000–1011.
Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, et al. 2010. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464: 59–65.
Qu A, Brulc JM, Wilson MK, Law BF, Theoret JR, Joens LA, Konkel ME, Angly F, Dinsdale EA, Edwards RA, et al. 2008. Comparative metagenomics reveals host specific metavirulomes and horizontal gene transfer elements in the chicken cecum microbiome. PLoS ONE 3: e2945. doi: 10.1371/journal.pone.0002945.
Rakoff-Nahoum S, Paglino J, Eslami-Varzaneh F, Edberg S, Medzhitov R. 2004. Recognition of commensal microflora by toll-like receptors is required for intestinal homeostasis. Cell 118: 229–241.
Rees HC, Grant S, Jones B, Grant WD, Heaphy S. 2003. Detecting cellulase and esterase enzyme activities encoded by novel genes present in environmental DNA libraries. Extremophiles 7: 415–421.
Richardson TH, Tan X, Frey G, Callen W, Cabell M, Lam D, Macomber J, Short JM, Robertson DE, Miller C. 2002. A novel, high performance enzyme for starch liquefaction. Discovery and optimization of a low pH, thermostable alpha-amylase. J Biol Chem 277: 26501–26507.
Roberts AP, Chandler M, Courvalin P, Guedon G, Mullany P, Pembroke T, Rood JI, Smith CJ, Summers AO, Tsuda M, et al. 2008. Revised nomenclature for transposable genetic elements. Plasmid 60: 167–173.
Rondon MR, August PR, Bettermann AD, Brady SF, Grossman TH, Liles MR, Loiacono KA, Lynch BA, MacNeil IA, Minor C, et al. 2000. Cloning the soil metagenome: A strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66: 2541–2547.
Selvendran RR. 1984. The plant cell wall as a source of dietary fiber: Chemistry and structure. Am J Clin Nutr 39: 320–337.
Simon C, Daniel R. 2009. Achievements and new knowledge unraveled by metagenomic approaches. Appl Microbiol Biotechnol 85: 265–276.
Sonnenburg JL, Xu J, Leip DD, Chen CH, Westover BP, Weatherford J, Buhler JD, Gordon JI. 2005. Glycan foraging in vivo by an intestine-adapted bacterial symbiont. Science 307: 1955–1959.
Tamames J, Moya A. 2008. Estimating the extent of horizontal gene transfer in metagenomic sequences. BMC Genomics 9: 136. doi: 10.1186/1471-2164-9-136.
Tang K, Utairungsee T, Kanokratana P, Sriprang R, Champreda V, Eurwilaichitr L, Tanapongpipat S. 2006. Characterization of a novel cyclomaltodextrinase expressed from environmental DNA isolated from Bor Khleung hot spring in Thailand. FEMS Microbiol Lett 260: 91–99.
Tang K, Kobayashi RS, Champreda V, Eurwilaichitr L, Tanapongpipat S. 2008. Isolation and characterization of a novel thermostable neopullulanase-like enzyme from a hot spring in Thailand. Biosci Biotechnol Biochem 72: 1448–1456.
Tap J, Mondot S, Levenez F, Pelletier E, Caron C, Furet JP, Ugarte E, Munoz-Tamayo R, Paslier DL, Nalin R, et al. 2009. Towards the human intestinal microbiota phylogenetic core. Environ Microbiol 11: 2574–2584.
Teeling H, Waldmann J, Lombardot T, Bauer M, Glockner FO. 2004. TETRA: A web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinformatics 5: 163. doi: 10.1186/1471-2105-5-163.
Temperton B, Field D, Oliver A, Tiwari B, Muhling M, Joint I, Gilbert JA. 2009. Bias in assessments of marine microbial biodiversity in fosmid libraries as evaluated by pyrosequencing. ISME J 3: 792–796.
Turnbaugh PJ, Gordon JI. 2009. The core gut microbiome, energy balance and obesity. J Physiol 587: 4153–4158.
Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE, Sogin ML, Jones WJ, Roe BA, Affourtit JP, et al. 2009. A core gut microbiome in obese and lean twins. Nature 457: 480–484.
Turnbaugh PJ, Quince C, Faith JJ, McHardy AC, Yatsunenko T, Niazi F, Affourtit J, Egholm M, Henrissat B, Knight R, et al. 2010. Organismal, genetic, and transcriptional variation in the deeply sequenced gut microbiomes of identical twins. Proc Natl Acad Sci 107: 7503–7508.
Uchiyama T, Miyazaki K. 2009. Functional metagenomics for enzyme discovery: Challenges to efficient screening. Curr Opin Biotechnol 20: 616–622.
Voget S, Leggewie C, Uesbeck A, Raasch C, Jaeger KE, Streit WR. 2003. Prospecting for novel biocatalysts in a soil metagenome. Appl Environ Microbiol 69: 6235–6242.
World Health Organization. 2003. Diet, nutrition and the prevention of chronic disease. Technical Report Series no. 916. http://whqlibdoc.who.int/trs/who_TRS_916.pdf.
Zoetendal EG, Akkermans AD, De Vos WM. 1998. Temperature gradient gel electrophoresis analysis of 16S rRNA from human fecal samples reveals stable and host-specific communities of active bacteria. Appl Environ Microbiol 64: 3854–3859.
Received March 25, 2010; accepted in revised form July 29, 2010.
Functional characterization of a gene locus from an uncultured gut Bacteroides conferring xylo-oligosaccharides utilization to Escherichia coli
Alexandra S. Tauzin,1,2 Elisabeth Laville,1
Yao Xiao,3 Sébastien Nouaille,1
Pascal Le Bourgeois,1 Stéphanie Heux,1
Jean-Charles Portais,1 Pierre Monsan,2
Eric C. Martens,3 Gabrielle Potocki-Veronese1 and
Florence Bordes1*
1LISBP, CNRS, INRA, INSAT, Université de Toulouse, Toulouse, France.
2TWB, INRA, Ramonville Saint-Agne, France.
3Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, MI, USA.
Summary
In prominent gut Bacteroides strains, sophisticated
strategies have evolved to achieve the complete
degradation of dietary polysaccharides such as xylan,
which is one of the major components of the plant cell
wall. Polysaccharide Utilization Loci (PULs) consist of
gene clusters encoding different proteins with a vast
arsenal of functions, including carbohydrate binding,
transport and hydrolysis. Transport is often attributed
to TonB-dependent transporters, although major facili-
tator superfamily (MFS) transporters have also been
identified in some PULs. However, until now, few of
these transporters have been biochemically character-
ized. Here, we targeted a PUL-like system from an
uncultivated Bacteroides species that is highly
prevalent in the human gut metagenome. It encodes
three glycoside-hydrolases specific for xylo-
oligosaccharides, a SusC/SusD tandem homolog and
a MFS transporter. We combined PUL rational engi-
neering, metabolic and transcriptional analysis in
Escherichia coli to functionally characterize this
genomic locus. We demonstrated that the SusC and
the MFS transporters are specific for internalization of
linear xylo-oligosaccharides of polymerization degree
up to 3 and 4 respectively. These results were
strengthened by the study of growth dynamics and
transcriptional analyses in response to XOS induction
of the PUL in the native strain, Bacteroides vulgatus.
Introduction
Due to scarcity of genes coding for complex
polysaccharide-degrading enzymes (the so-called Car-
bohydrate Active enZymes, or CAZymes), humans
depend on the symbiotic microorganisms within their
digestive tract to break down dietary glycans that are
recalcitrant to digestion in the upper parts of the gut.
These glycans are mainly plant cell wall components,
consisting of a cellulose scaffold cross-linked with hemi-
celluloses and pectins. The structural complexity and
diversity make the complete degradation of these gly-
cans a complex issue.
To face this complexity, bacteria from various genera
have developed sophisticated systems involving bat-
teries of CAZymes and carbohydrate transporters,
encoded by genes co-localized on specific loci. In Bac-
teroides strains, which are the most prominent glycan
degraders in the intestine, Polysaccharide Utilization
Loci (PULs) encode all the proteins involved in sensing,
binding, transport and hydrolysis, that are required to
achieve the complete breakdown and uptake of glycans
(Hehemann et al., 2010; Larsbrink et al., 2014;
Rogowski et al., 2015; Cuskin et al., 2015). In the
archetypal PUL system specific to starch utilization
(SUS) from Bacteroides thetaiotaomicron, a TonB-
dependent transporter (SusC) works in synergy with
binding proteins (SusD, SusE and SusF) to internalize
the oligosaccharides derived from the hydrolysis of
starch by the cell surface α-amylase (SusG). The TonB-
dependent transporter in complex with ExbB and ExbD
proteins allows the transport of macromolecules across
the outer membrane of Gram negative bacteria via
energy derived from the proton motive force (for review
see ref. Ferguson and Deisenhofer, 2002; Schauer
et al., 2008).
Accepted 8 August, 2016. *For correspondence. E-mail bordes@insa-toulouse.fr; Tel. +33 5 61 55 94 39; Fax +33 5 61 55 94 00.
© 2016 The Authors. Molecular Microbiology Published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
Molecular Microbiology (2016) 00(00), 00–00. doi:10.1111/mmi.13480. First published online 2016
Xylan is a major component of plant cell walls and is
highly abundant in cereal-derived human foods. In the
human gut, most of the xylanolytic bacteria were identi-
fied among the Bacteroides genus (Dodd et al., 2011;
Martens et al., 2011). To date, only two xylan PULs from
Bacteroides ovatus (PUL-XylS and PUL-XylL) were thor-
oughly studied but their characterization focused exclu-
sively on glycoside hydrolases and carbohydrate binding
proteins (Rogowski et al., 2015). Interestingly, the PUL-
XylL exhibits two SusC-like transporters while the
PUL-XylS possesses a SusC and a major facilitator
superfamily (MFS) transporter. MFS, which is located in
the inner membrane in Bacteroides species, is a second-
ary transporter of small molecules, including carbohy-
drates, in response to electrochemical potentials (for
review see ref. Yan, 2015). Few PULs have been identi-
fied harbouring both a SusC/D transport system and a
MFS transporter, such as the glycosaminoglycan and the
N-glycan PULs of Bacteroides thetaiotaomicron or the
sialic acid cluster of Bacteroides fragilis (Martens et al.,
2008; Stafford et al., 2012; Phansopa et al., 2014).
Nevertheless, the specificity of each of these proteins in
carbohydrate harvesting has not been deeply studied.
The characterization of transporters in native strains
indeed faces several bottlenecks: (i) the deletion of a targeted
gene might not be sufficient to confirm its functionality
due to functional redundancy ensured by other native
proteins; (ii) in a PUL-like system, the activation of the
system requires sensing of a specific glycan in periplasm,
which is usually different from the internalized oligosac-
charides obtained by extracellular hydrolysis, and (iii)
despite the huge efforts dedicated to bacterial genetics,
genome engineering remains a challenge for numerous
species, and is even impossible for uncultivated ones.
During the last decade, significant efforts have been
put into functional genomics and metagenomics
(Turnbaugh et al., 2007; Hess et al., 2011; Nielsen et al.,
2014) in order to elucidate the main functionalities of
microbiomes. Functional metagenomics is a powerful tool
to decipher the diversity of functions present within the
uncultured gut bacteria fraction, which represents up to
70% of the human gut microbiota. From activity-based
screening approaches emerge large metagenomic DNA
fragments (25–40 kb) containing full multigenic clusters
such as PULs that contain putative transporters (Tasse
et al., 2010). However, the diversity of carbohydrate
transporter specificities remains largely under-explored.
In this context, we decided to extend the characteriza-
tion of carbohydrate transporters to those harboured by
uncultured gut bacteria. We thus studied the recombinant
expression and functional capabilities of a PUL issued
from a highly prevalent uncultured Bacteroides strain,
involved in the metabolism of xylo-oligosaccharides
(XOS). This assembly of genes was identified from a fecal
metagenomic library screened for prebiotic degradation
(Cecchini et al., 2013). Here, by combining a transcrip-
tomic analysis of each gene of the metagenomic insert
in Escherichia coli with the biochemical characterization
of glycan hydrolysis and transport specificities, we
showed that this highly conserved PUL-like system pos-
sesses a complete functional arsenal for XOS metabo-
lism in the E. coli recombinant host. It is composed of
two transporters, one of them working in synergy with a
carbohydrate binding protein and a battery of PUL-
associated glycoside hydrolases (GH) allowing XOS
hydrolysis into xylose, which is further metabolized by
the cells. These results were strengthened by the study
of growth dynamics and transcriptional analyses in
response to XOS induction of the PUL in the native
strain, Bacteroides vulgatus.
Results and discussion
Sequence analysis reveals a PUL involved in XOS
utilization
Previously, Cecchini et al. (2013) identified the metagenomic
clone F5, which was able to hydrolyze XOS up to
a degree of polymerization of 6 (DP 6). The metagenomic
DNA insert, 39,093 bp in size, was assigned to a
Bacteroides vulgatus strain. Over 93% of the F5 sequence
showed 99% sequence identity with
a part of the B. vulgatus ATCC 8482 genome (Fig. 1A).
Its functional annotation revealed a PUL containing
genes encoding a truncated glycoside hydrolase of fam-
ily 43 (GH43_t), a hybrid two-component system
(HTCS), a TonB-dependent porin (SusC), a binding pro-
tein (SusD), two members of the glycoside hydrolase
family 43 (GH43A and GH43B), a member of the glyco-
side hydrolase family 10 (GH10), a MFS transporter and
a member of the glycoside hydrolase family 16 (GH16)
(Cecchini et al., 2013).
By comparison with the genome of B. vulgatus ATCC
8482, this metagenomic locus was interrupted within the
gh43 gene upstream of the HTCS. This interruption
might imply that at least the gh43 gene and likely the
other PUL genes upstream have been truncated during
the library construction process.
Within fully and partially sequenced genomes of gut
bacteria (Joint Genome Institute, Markowitz et al.,
2012), such PUL organization is closely conserved
throughout B. vulgatus strains and their phylogenetically
related neighbors such as B. dorei, B. sartorii and B.
massiliensis (Fig. 1A). While the PULs from B. sartorii
and B. massiliensis are not listed in the Polysaccharide-
Utilization Loci Database (PULDB), the PULs from B.
vulgatus and B. dorei are listed as predicted with some
length differences (Terrapon et al., 2015). The closest
2 A. S. Tauzin et al. j
VC 2016 The Authors. Molecular Microbiology Published by John Wiley & Sons Ltd., Molecular Microbiology, 00, 00–00
PUL characterized so far, in terms of functionality, is the
small xylan PUL (PUL-XylS) from B. ovatus ATCC 8483
(Rogowski et al., 2015). This PUL encodes an HTCS, a
tandem SusC/D, a surface glycan binding protein
(SGBP), two GH10s, a MFS transporter, a GH43 and a
GH67 (Fig. 1A). It is induced by wheat arabinoxylan,
glucuronoxylan and linear XOS (Martens et al., 2008;
Rogowski et al., 2015). Immediately downstream of the
susD-like gene, PULs usually encode an SGBP that contributes
to additional substrate binding (Cameron et al.,
2012; Rogowski et al., 2015); this SGBP is absent from the F5
PUL.
In addition to the tandem SusC and SusD proteins, which
potentially bind and transport glycans, the clone F5
carries a gene encoding an MFS transporter. In E. coli,
sialic acid uptake is due to a specific MFS transporter
(NanT) (Vimr and Troy, 1985), while in other bacteria
the sialic-acid-targeting PULs display MFS
transporters that are sometimes associated with the
SusC/D transport system (NanO/U), such as in Tannerella
forsythia and B. fragilis (Roy et al., 2010; Stafford
et al., 2012; Phansopa et al., 2014). As introduced
above, such an association has also been observed in
other Bacteroides PULs and was demonstrated as being
part of the operon. Examples include the glycosamino-
glycan and the N-glycan PULs of B. thetaiotaomicron
and, more recently, the PUL-XylS of B. ovatus (Martens
et al., 2008; Rogowski et al., 2015).
Fig. 1. Representation of the PUL-like system.
A. Organization of the XOS utilization locus based on selected annotated Bacteroides genomes. Genes encoding known and predicted functionalities are colour-coded: glycoside hydrolase (GH) with family number in blue; hybrid two-component system (HTCS) in red; SusC in orange; SusD in yellow; surface glycan binding protein (SGBP or SusE-positioned) in light yellow; transporter of the major facilitator superfamily (MFS) in purple; transposase (Tnp) in green and unknown in grey. Synteny (corresponding to 99% identity at the DNA level) between the sequence of clone F5 and the genome locus of B. vulgatus ATCC 8482 is shown by grey bars. Black arrows represent putative transcription units in the Bacteroides natural host, according to the consensus promoter sequence of Bacteroides strains.
B. The reduced constructs of F5 used in the present work.
Carbohydrate transporters of gut bacteria 3
Finally, five transposase sequences are present in the
F5 sequence (Fig. 1A). Three are located between the
htcs- and the susC-like genes, one between the gh10
and the mfs transporter genes and one between the
gh16 and the mfs transporter genes.
Metagenomic gene expression in E. coli
We (Tasse et al., 2010) and others (Ferrer et al., 2005;
Wang et al., 2012; Strachan et al., 2014) previously showed
that the phenotype of fosmid/cosmid metagenomic
clones is often related to the presence of several genes
encoding enzymes with various, and often complementary,
activities. This is particularly true for clones harbouring
PUL-like multigenic systems derived from Bacteroidetes.
These clones encode synergistic CAZymes that are able
to completely break down complex polysaccharidic structures
(Tasse et al., 2010). However, functional expression
of such metagenomic genes in E. coli, which is still the
predominant host for activity-based metagenomics,
has never been experimentally investigated. Here, for the
first time, the ability of the E. coli system to host and express
a heterologous multigenic system involved in XOS
metabolism from uncultured Bacteroides has been
explored at the transcriptional level.
To further investigate the level of induction/expression
of the 27 genes present on the F5 metagenomic insert,
the transcriptional level of each open reading frame
(ORF) in E. coli has been measured by quantitative RT-
PCR (Fig. 2). In LB medium, among the 27 genes
present on the metagenomic DNA insert, only
10 were not expressed or were expressed at a very low level
(including the truncated gh43, htcs and gh16
genes from the PUL cluster). The 17 other genes were
expressed either at significant levels comparable to that of an
endogenous E. coli housekeeping gene (ihfB) or even at
a level close to that of the highly expressed fosmid cam gene
(chloramphenicol acetyltransferase) used for chloramphenicol
selection. The strongest expression was
detected for the genes encoding SusD, SusC and a drug
efflux protein, at a level over three-fold that of
ihfB. The genes coding for GH43A,
GH43B, GH10 and MFS were expressed at a level
1.7- to 3.5-fold lower than ihfB. In addition, more than
half of the genes contained on the metagenomic insert
were transcribed at various expression levels, indicating
that the quantified gene expression was due
to the recognition of distinct promoter sequences by the
recombinant host. A bioinformatics analysis of the full
F5 sequence revealed 6 putative Bacteroides promoters,
of which 3 are within the PUL. In the native Bacteroides
strain, the F5 PUL could be expressed as an operon
from mfs or gh43A to susD and regulated by the htcs,
which is expressed separately (Fig. 1A). Nevertheless,
the Bacteroides promoters cannot be responsible for
heterologous expression in E. coli, since Mastropaolo
et al. (2009) showed that Bacteroides promoters are not
recognized by E. coli. Using
Fig. 2. Gene expression analysis of the clone F5 on LB (black), xylose (grey) or XOS (white) growth conditions. The mean of the biological triplicates is represented ± the standard error of the mean. Hybrid two-component system (HTCS), major facilitator superfamily transporter (MFS), integration host factor β-subunit (ihfB). The dashed line represents an arbitrary threshold of expression.
the BPROM program, which predicts consensus σ70 promoter
sequences for E. coli with an accuracy of 80%,
about 184 putative promoters were identified within the
metagenomic insert, of which 34 lie within the PUL
(Supporting Information Fig. S1). Recently, Lam and Charles
(2015) suggested that metagenomic genes, especially
those derived from Bacteroides species, are spuriously
transcribed in E. coli thanks to the random presence of
E. coli rpoD/σ70 promoter sequences on metagenomic
DNA inserts, which would also be responsible for cloning
bias in metagenomic libraries.
They counted only around 10 promoter sequences/Mb
of metagenomic DNA, but they focused exclusively on
the most common consensus sequence, TTGACA (-35)
and TATAAT (-10). This specific promoter sequence was
not detected within the F5 sequence. However, it is
noteworthy that σ70 promoters vary in their sequence,
the absence/presence of the -35 box and the length of
the spacer between the -10 and -35 sequences
(Shultzaberger et al., 2007; Singh et al., 2011). The
BPROM tool, which allows the identification of degenerate
promoter sequences, thus seems more pertinent to
identify putative σ70 promoters, since its predictions are
in good agreement with the present transcriptomic
results. In functional metagenomics, the recurrent bottleneck
is access to the full potential of metagenomes, and
different strategies have been developed to overcome
this limitation, mostly by improving cloning strategies
and the screening host, i.e. E. coli (for perspective see
Lam et al., 2015). However, our data point out that a
significant proportion of metagenomic genes can be efficiently
transcribed in E. coli, and that spurious transcription
would be more advantageous than deleterious for
heterologous expression of multiple genes.
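The difference between a strict consensus search and a degenerate one, as discussed above, can be illustrated with a minimal sketch. It scans one strand of a DNA string for a -35 box (TTGACA) followed by a -10 box (TATAAT) and allows a chosen number of mismatches per box. The function names, the 15 to 19 bp spacer window and the toy sequences are assumptions made for illustration only; this is neither the BPROM algorithm nor the procedure of Lam and Charles (2015).

```python
# Canonical E. coli sigma70 consensus boxes quoted in the text.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sigma70_like(seq: str, max_mismatch: int = 0,
                      spacers: range = range(15, 20)):
    """Return (start, spacer, -35 box, -10 box) tuples found on the given strand.

    max_mismatch is applied to each box separately; the spacer window
    (15 to 19 bp by default) is an illustrative assumption.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 6 + 1):
        box35 = seq[i:i + 6]
        if mismatches(box35, MINUS_35) > max_mismatch:
            continue
        for sp in spacers:
            j = i + 6 + sp
            box10 = seq[j:j + 6]
            if len(box10) == 6 and mismatches(box10, MINUS_10) <= max_mismatch:
                hits.append((i, sp, box35, box10))
    return hits


# Toy sequences (hypothetical, not taken from the F5 insert).
strict = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
degenerate = "GG" + "TTGCCA" + "A" * 17 + "TATAAT" + "CC"
print(find_sigma70_like(strict))
print(find_sigma70_like(degenerate))                  # strict search misses it
print(find_sigma70_like(degenerate, max_mismatch=1))  # degenerate search finds it
```

With max_mismatch set to 0 the sketch behaves like an exact consensus search, which is one way to see why such a search can miss promoters that a degenerate predictor reports.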
Finally, comparison of the expression data obtained
during growth on LB medium to results obtained using
xylose- and XOS-grown cultures showed that the genes
belonging to the PUL were not differentially expressed
between all the conditions tested (Fig. 2), suggesting a
lack of regulation in E. coli. It has been shown in Bacter-
oidetes that the ability to modulate the PUL expression
depends on sensor-regulator systems such as the
hybrid two-component systems in response to their targeted
glycan (Bolam and Koropatkin, 2012). In our system
and in all culture media, the htcs gene was expressed
at a very low level in E. coli. Thus we postulate that
PUL expression is regulated by neither XOS
nor xylose in E. coli.
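As an illustration of how expression relative to a housekeeping gene such as ihfB can be compared across growth conditions, here is a minimal sketch. It assumes quantification from Ct values with the common 2^-ΔCt approach and about 100% amplification efficiency; the quantification method is not stated in this section, and the function names, the two-fold cutoff and all numeric values are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Expression of a target gene relative to a reference gene (2^-dCt).

    Assumes ~100% PCR efficiency, i.e. one Ct unit equals a two-fold change.
    """
    return 2.0 ** (ct_reference - ct_target)


def is_condition_dependent(rel_by_condition: dict, fold_cutoff: float = 2.0) -> bool:
    """True if relative expression varies by at least fold_cutoff across conditions."""
    values = list(rel_by_condition.values())
    if min(values) <= 0:
        raise ValueError("relative expression values must be positive")
    return max(values) / min(values) >= fold_cutoff


# Hypothetical Ct values for one gene and the reference in three media.
rel = {
    "LB": relative_expression(ct_target=18.5, ct_reference=20.0),
    "xylose": relative_expression(ct_target=18.3, ct_reference=20.0),
    "XOS": relative_expression(ct_target=18.6, ct_reference=20.0),
}
print(rel)
print(is_condition_dependent(rel))
```

Under these assumptions, a gene whose relative expression stays within the cutoff in all media would be read as not differentially expressed, which is the pattern described for the PUL genes in E. coli.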
Functional potential of the hydrolases in E. coli
Previously, we demonstrated that the cell extracts of the
clone F5 were able to hydrolyze XOS up to DP 6
(Supporting Information Fig. S1 from ref. Cecchini et al.,
2013). To characterize the catabolic potential of the
CAZymes contained in the multiproteic system to break down
hemicellulose, a more detailed screening of hydrolytic
activities was carried out on a panel of
polysaccharides, oligosaccharides and chromogenic
substrates representing the different components of
the plant cell wall. Xylan and arabinoxylan were both
hydrolyzed by the F5 cytoplasmic extracts, with a preference
for unbranched xylan, which was cleaved more efficiently
(Table 1). The cytoplasmic extracts were also
tested on synthetic substrates and showed activity on
pNP-β-D-xylopyranose and pNP-α-L-arabinofuranose
(Table 1). These xylanase, xylosidase and arabinofuranosidase
activities are consistent with the
known activities of CAZy families GH10 and GH43 identified
in the PUL. Conversely, no activity was detected
on β-glucan or xyloglucan, substrates likely to be targeted by
the GH16 enzyme, a result consistent with the absence of
transcription of the gh16 gene (Fig. 2). Considering the
previous sequence annotation of the transcription units in
Bacteroides, the gene expression analysis, and the
activity results of the clone F5, the GH16 seems unlikely
to be part of the PUL.
To confirm that the enzymatic activities were due to
the GH43A, GH43B and GH10 from the PUL, we gener-
ated a reduced construct named F5min_GH containing
only the three gh genes (Fig. 1B). The enzymatic activ-
ities of the F5min_GH cell extracts were similar to those
obtained for F5, implying that the enzymes responsible
for these activities are encoded by the genes gh43A,
gh43B and gh10 (Table 1). The high number of genes to
Table 1. Activities of the cell extracts on complex polysaccharides
and synthetic substrates.
                                 Clone
Substrate                        F5              F5min_GH
Complex polysaccharides
  Xylan                          176.6 ± 4.2     329.8 ± 14.2
  Arabinoxylan                   +               n.d.
  Arabinan                       /               n.d.
  Arabinogalactan                /               n.d.
  β-glucan                       /               n.d.
  Xyloglucan                     /               n.d.
Synthetic substrates
  pNP β-D-Xylopyranose           399.7 ± 10.0    609.3 ± 52.1
  pNP α-D-Xylopyranose           /               n.d.
  pNP α-L-Arabinofuranose        239.9 ± 10.2    353.7 ± 28.3
  pNP β-L-Arabinopyranose        /               n.d.
  pNP α-L-Arabinopyranose        /               n.d.
Activities are expressed in mU (with 1 U = 1 µmol/min) per litre of
culture. Mean of three biological replicates.
Abbreviations: + = residual activity after 24 h; / = no activity detected; n.d. = not determined.
be expressed in the clone F5 could explain the lower
activity values observed compared to the clone
F5min_GH. The enzymes are essential to hydrolyze
XOS to xylose, which is further metabolized by the strain.
This result also confirmed that the GH16 is not responsible
for any activity detected for F5 extracts.
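The difference in activity between the two constructs can be quantified directly from the mean values reported in Table 1. The short sketch below computes the F5min_GH to F5 ratio for the three substrates on which both clones were measured; the variable and function names are illustrative, and the standard errors are not propagated.

```python
# Mean activities from Table 1, in mU per litre of culture: (F5, F5min_GH).
table1_mU_per_L = {
    "xylan": (176.6, 329.8),
    "pNP-beta-D-xylopyranose": (399.7, 609.3),
    "pNP-alpha-L-arabinofuranose": (239.9, 353.7),
}


def fold_increase(f5: float, f5min_gh: float) -> float:
    """Ratio of F5min_GH activity to F5 activity."""
    return f5min_gh / f5


ratios = {substrate: round(fold_increase(*pair), 2)
          for substrate, pair in table1_mU_per_L.items()}
for substrate, ratio in ratios.items():
    print(f"{substrate}: F5min_GH / F5 = {ratio}")
```

On these means the reduced construct is roughly 1.5 to 1.9 times more active than the full insert, in line with the explanation proposed above.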
The theoretical subcellular localization of the GHs pro-
duced by F5 was determined using LipoP 1.0 server
(Juncker et al., 2003). For GH43B, no signal peptide or
N-terminal lipidation could be assigned, indicating a
cytoplasmic location. GH43A and GH10 exhibited a
putative signal peptidase II cleavage site and were pre-
dicted as N-terminally lipidated proteins indicating their
potential attachment to the bacterial membrane. Bacte-
rial lipoproteins are membrane proteins present both in
Gram-negative and Gram-positive bacteria. In Bacteroi-
detes some lipoproteins, including the hydrolase (SusG,
GH13) from SUS systems are known to be transported
to the outer surface of the outer membrane (Shipman
et al., 1999). In E. coli the lipoproteins are anchored
either to the inner or to the outer membrane, and ori-
ented towards the periplasmic space (Tokuda and
Matsuyama, 2004). The +2 rule postulates that the residue
at the N-terminal second position is critical for the
membrane specificity of lipoproteins in E. coli (for review
see Okuda and Tokuda, 2011). An Asp in position +2
maintains the lipoprotein in the inner membrane. GH43A
exhibits a Ser, which is characteristic of an outer membrane
sorting signal (Yamaguchi et al., 1988). GH10 possesses
a Gly in position +2, which could imply an
‘ambiguous’ sorting signal (both inner and outer membrane,
facing the periplasm) as observed for the periplasmic
maltose-binding protein expressed in E. coli (Seydel
et al., 1999).
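The sorting rule described above can be written as a small lookup. The sketch below encodes only the three cases mentioned in the text (Asp, Ser and Gly at the second position of the mature lipoprotein, counted from the lipidated Cys); the function name and the example sequences are hypothetical, and residues not discussed in the text are deliberately left unclassified.

```python
def sorting_by_plus2(mature_seq: str) -> str:
    """Predict E. coli lipoprotein sorting from the residue following the lipidated Cys.

    mature_seq is the lipoprotein sequence after signal peptidase II cleavage,
    so position 1 must be Cys and position 2 is the residue tested.
    """
    mature_seq = mature_seq.upper()
    if len(mature_seq) < 2 or mature_seq[0] != "C":
        raise ValueError("mature lipoprotein must start with the lipidated Cys")
    second_residue = mature_seq[1]
    if second_residue == "D":
        return "inner membrane"      # Asp retains the lipoprotein in the inner membrane
    if second_residue == "S":
        return "outer membrane"      # Ser: outer membrane sorting signal (as for GH43A)
    if second_residue == "G":
        return "ambiguous"           # Gly: inner and outer membrane (as for GH10)
    return "not covered by the text"


# Hypothetical N-termini, for illustration only.
for name, seq in [("Ser example", "CSNEKT"), ("Gly example", "CGQTAV"), ("Asp example", "CDKLMA")]:
    print(name, "->", sorting_by_plus2(seq))
```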
To examine the subcellular localization of the GHs in
E. coli, enzymatic assays were performed on secreted,
soluble intracellular, periplasmic and membrane protein
fractions (Supporting Information Fig. S2). No secreted
or soluble periplasmic activities were detected. Xylanase,
arabinosidase and xylosidase activities were detected in
both soluble intracellular and membrane fractions. These
results are consistent with the predicted subcellular localization
of the GHs, according to which two of the three should be
attached to the membrane and one should be cytoplasmic.
XOS uptake in E. coli
To demonstrate and characterize the transport ability of the
clone F5, its growth and that of different truncated variants
were monitored over 24 h in liquid minimal media
(MM) supplemented with different xylose-containing glycans
as sole carbon sources (Figs 3 and 4A, C and E).
As a control strain, we used a metagenomic clone (clone
F4) able to hydrolyze a mixture of XOS into xylose and
xylobiose (due to the presence of GH8, GH43 and GH120)
but without any transporter-encoding gene within its metagenomic
insert, as reported by Cecchini et al. (2013) (see
Fig. S1 from ref. Cecchini et al., 2013). The ability of the
E. coli host strain to metabolize xylose as sole
carbon source had been confirmed beforehand.
When we compared the growth of F5 and F4 in MM
containing a mixture of linear XOS of DP 2 to 6, only F5
could grow, even though both clones produce hydrolases able
to hydrolyze XOS (Fig. 3). Similarly, F5min_GH that is
Fig. 3. Growth of the clones F5 (blue), F5min_SUS (red), F5min_MFS (orange), F5min_SUSΔSusD (purple), F5min_GH (green), F4 (cyan) and empty pCC1fos (Epi, grey) on a mixture of xylo-oligosaccharides at 0.5% (m/v). The data represent the average of at least biological triplicates.
devoid of the genes encoding transporters did not grow
on XOS (Fig. 3). Moreover, E. coli clone F5 displayed
no ability to grow on xylan, although cell extracts were
able to hydrolyze this substrate (Table 1 and Supporting
Information Table S2). These results confirm, as discussed
above, that the hydrolases of F5 are not
secreted in E. coli, and suggest that they are unlikely
to be anchored to the outer surface of the outer membrane.
However, we cannot exclude the hypothesis that F5min_GH
has a GH facing outwards and that hydrolysis of the XOS
in the medium is too slow to support
growth within 24 h. These results also demonstrate that functional
transporters are required, in addition to functional
XOS-degrading GHs, to confer on recombinant E. coli the
ability to grow on these oligosaccharides. Hence, we
conclude that the clone F5 is able to metabolize XOS
thanks to XOS internalization mediated by one or several
transporters, followed by intracellular
hydrolysis into xylose.
The specificity of the transporters has been character-
ized by using MM containing XOS with DP ranging from 2
to 5 as sole carbon source (Fig. 4A). While the clone F5
was able to grow in MM with DP 2 to 4, no growth has
been detected in MM containing XOS larger than DP 4.
Thus the internalization of xylose-containing glycans is
possible up to DP 4. As both xylan and arabinoxylan could
be hydrolyzed by F5 cell extracts, we tested arabino-xylo-
oligosaccharides from 2 to 4 xylosyl residues branched
Fig. 4. Growth and xylo-oligosaccharide uptake of the clones F5, F5min_SUS and F5min_MFS. Growth curves of the clones F5 (A), F5min_SUS (C) and F5min_MFS (E) supplemented with xylopentaose, xylotetraose, xylotriose and xylobiose. HPAEC-PAD analysis of the culture supernatants of the clones F5 (B), F5min_SUS (D) and F5min_MFS (F) to measure uptake of xylotetraose, xylotriose and xylobiose. The squares indicate the sampling time points. The data represent the average of at least biological duplicates.
with 1 or 2 arabinosyl residues as sole carbon source. No
growth has been observed after 24 h of incubation with
these arabino-xylo-oligosaccharides (Supporting Information
Table S2). Thus oligosaccharide transport in F5
seems specific to unbranched XOS in the E. coli host.
The PUL encodes two transport systems, a MFS and
a SusC/D pair. MFS and SusC are known as membrane
proteins and SusD is a lipidated protein predicted to be
anchored to the outer membrane of Bacteroides species.
To assess the transport ability and specificity of the
potential transporters, new F5 variants were constructed
(Fig. 1B). The first variant, named F5min_SUS, harbours
the hydrolases and the tandem SusC/D homologs. The
second variant, named F5min_MFS, harbours the hydrolases
and the MFS transporter. Growth on a mixture
of XOS (from DP 2 to DP 6) was investigated.
F5min_SUS was able to grow on the linear XOS mixture, but
to a lower level than F5 (Fig. 3). We thus
confirmed that the SusC/D system is an active
XOS transporter. However, this transporter was not sufficient
to fully restore the growth observed for the E. coli strain
harbouring the full F5 insert, suggesting the involvement
of another transporter. On the XOS mixture, F5min_MFS
showed a growth curve similar to that of F5, confirming the
functionality of the MFS transporter (Fig. 3). To grow on
XOS, F5 thus requires at least one of its two functional
transporters, the SusC/D system or the MFS transporter.
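Comparisons such as those above rest on growth curves, and one common way to put a number on them is the maximal specific growth rate. The sketch below estimates it as the steepest slope of ln(OD600) between consecutive time points. This is a generic illustration, not the analysis used by the authors; the function name and the OD600 values are synthetic.

```python
import math


def max_specific_growth_rate(times_h, od600):
    """Largest slope of ln(OD600) between consecutive time points, in h^-1."""
    if len(times_h) != len(od600) or len(times_h) < 2:
        raise ValueError("need at least two paired time and OD600 values")
    rates = []
    for i in range(len(times_h) - 1):
        dt = times_h[i + 1] - times_h[i]
        rates.append((math.log(od600[i + 1]) - math.log(od600[i])) / dt)
    return max(rates)


# Synthetic curves (illustrative values only, not data from Fig. 3).
times = [0, 2, 4, 6, 8]
fast_clone = [0.05, 0.10, 0.20, 0.40, 0.45]
slow_clone = [0.05, 0.07, 0.10, 0.14, 0.16]
print(round(max_specific_growth_rate(times, fast_clone), 3))
print(round(max_specific_growth_rate(times, slow_clone), 3))
```

A doubling of OD600 every 2 h corresponds to ln(2)/2 per hour, which gives a quick sanity check on the estimate.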
We further investigated the function of SusD by
deleting the corresponding gene in the F5min_SUS
clone. This construct was named F5min_SUSΔSusD
(Fig. 1B). This deletion completely abolished the ability
of E. coli to grow on XOS. As the susD gene is
located downstream of susC, any transcriptional effect
can be excluded, suggesting that the presence of SusD
is essential for the functionality of the SusC transporter
and for metabolism of XOS (Fig. 3). This result is
in agreement with previous studies demonstrating that,
in Bacteroides strains, ΔSusD mutants were unable to
grow on their targeted poly- and oligosaccharides (Koropatkin
et al., 2008; Cameron et al., 2014; Tauzin et al.,
2016). The function of SusD is not restricted to its binding
ability, its physical presence being essential for strain
growth on its targeted glycan. The physical presence of SusD
is sufficient, since complementation of the ΔSusD strain
with the SusD* variant, a SusD mutant unable
to bind glycan, was enough to restore the growth of the
bacteria (Cameron et al., 2014; Tauzin et al., 2016).
To characterize more precisely the transport specific-
ity of each F5 transport system, namely SusC/D and
MFS, we monitored the variant growth on individual lin-
ear XOS of various lengths (Fig. 4C and E). While
F5min_SUS was able to grow on XOS up to DP 3,
F5min_MFS was able to grow on XOS up to DP 4. To
visualize the kinetics of internalization of XOS of different
DPs (from DP 2 to DP 3 for F5min_SUS and from DP 2
to 4 for F5 and F5min_MFS), we monitored their disap-
pearance from the culture supernatants using HPAEC-
PAD analysis. F5 and F5min_MFS, grown on XOS of
specified DP (2, 3 and 4), consumed each oligosaccha-
ride to completion in 24 h (Fig. 4B and F). The rate of
utilization was similar for xylobiose and xylotriose which
were both consumed faster than xylotetraose (Fig. 4B
and F). In contrast, F5min_SUS left residual xylobiose
and xylotriose in the culture supernatant after 24 h,
although the culture had already reached stationary
phase, with a final OD600nm lower than
with F5 and F5min_MFS (Fig. 4C and D).
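The growth phenotypes reported for linear XOS can be summarized in a small lookup. The sketch below records, for each construct, the largest degree of polymerization (DP) that supported growth according to the text; a value of 0 marks constructs that did not grow on the XOS mixture. The dictionary and function names are illustrative, and only DP 2 and above is covered since xylose is metabolized by the host itself.

```python
# Largest DP of linear XOS supporting growth in E. coli, as reported in the text.
# 0 means no growth on the XOS mixture.
MAX_DP_SUPPORTING_GROWTH = {
    "F5": 4,
    "F5min_MFS": 4,
    "F5min_SUS": 3,
    "F5min_SUSΔSusD": 0,
    "F5min_GH": 0,
}


def grows_on(construct: str, dp: int) -> bool:
    """True if the construct was reported to grow on linear XOS of the given DP."""
    return 2 <= dp <= MAX_DP_SUPPORTING_GROWTH[construct]


for construct in MAX_DP_SUPPORTING_GROWTH:
    supported = [dp for dp in range(2, 6) if grows_on(construct, dp)]
    print(construct, "grows on DP", supported if supported else "none")
```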
To date, two other XOS MFS transporters, specific to
linear XOS of either DP 3 or 6, have been reported in
Klebsiella spp. (Qian et al., 2003; Shin et al., 2010). In
addition, a TonB-dependent (SusC-like) transporter
essential for growth on XOS has been identified in Xanthomonas
campestris, but its specificity has not yet been
investigated (Déjean et al., 2013). In contrast, no functional
characterization of SusC and MFS transporters
specific to XOS has been reported in Bacteroides so
far, even though several xylan PULs have been described in
gut bacteria. SusC (BACOVA_4393) and MFS
(BACOVA_4388) from the PUL-XylS of B. ovatus
share only 37% and 30% identity with the F5 SusC and
MFS, respectively (Rogowski et al., 2015). These
transporters are to date the best homologs of the F5
transporters in validated PULs targeting XOS but, again,
their function has not been experimentally validated, nor
has their transport specificity been investigated.
Intriguingly, while no oligosaccharide was released
throughout the XOS uptake by F5min_SUS, residual xylo-
biose and xylose were observed throughout the growth of
F5 and F5min_MFS on xylobiose and xylotriose (Support-
ing Information Figs S3–S5). The released xylobiose and
xylose were consumed thereafter as well as the XOS ini-
tially present in the culture supernatant. The observed
exchanges of carbohydrates between the intra- and the
extra-cellular compartments are in line with the character-
istics of the transporters. The members of the MFS trans-
porter family indeed transport the compounds depending
on their concentration gradient while the TonB-dependent
transporters depend on energy coupling. We hypothesize
that the accumulation of xylose and xylobiose inside the
cells changed the concentration gradient, leading to
the release of these compounds outside the cells by the
MFS transporter.
It is clear from the results presented in Figs 3 and 4
that, in E. coli, the MFS transporter allows a higher growth
rate than the SusC transporter. This is not due to
transcription levels because, in E. coli grown on LB, the mfs
gene is expressed at a seven-fold lower level than susC
(Fig. 2). Three hypotheses may explain this phenomenon.
First, the affinity for XOS may be higher
for the MFS transporter than for the SusC/D transporter.
Second, the SusC/D transport system may
not be perfectly functional when expressed in E.
coli compared to the native organism. The Ser residue at
the N-terminal second position of SusD suggests a localization
at the E. coli outer membrane (Yamaguchi et al., 1988;
Okuda and Tokuda, 2011). This is in accordance with our
results showing that SusD is necessary for SusC transport
function. However, its orientation towards the outer membrane
could not be confirmed. The last explanation could
be related to the different characteristics of the two transporters.
TonB-dependent transporters (TBDTs) indeed require energy derived
from the proton motive force through interaction with
the TonB-ExbB-ExbD complex (Noinaj et al., 2010), which
is provided by the E. coli host as described by Phansopa et al.
(2014). Transport through SusC/D might thus be energy-consuming
for the E. coli host. In contrast, transport
through MFS is passive and driven only by the concentration
gradient (Yan, 2015). Whatever the mechanism of XOS
transport in the heterologous E. coli host, it cannot be
extrapolated to what happens in the native organism, as the
cellular localization of the transporters could be different.
As explained above, in Bacteroides, it is probable that the
SusC/D transporter sits in the outer membrane, working in
coordination with the MFS transporter located in the inner
membrane. Based on the growth ability of the F5 clone
and all of its variants, we assume that in E. coli the MFS
protein would be located in the outer membrane, allowing
XOS internalization into the periplasmic space, where
they would be hydrolyzed by the GHs bound to the inner
and/or outer membranes and oriented towards the periplasmic
space. Nevertheless, further experiments will be necessary
to demonstrate the exact localization of these
transporters in the native strains and when produced in
E. coli.
The PUL in B. vulgatus is involved in XOS utilization
We studied the biological function of the PUL in B. vulga-
tus as the metagenomic clone has high homology to
genes in B. vulgatus ATCC 8482. Growth dynamics of B.
vulgatus on XOS and a variety of xylans revealed that B.
vulgatus was able to grow on the XOS and the two enzy-
matically digested products of wheat arabinoxylan, AX2
and A2X4. B. vulgatus growth on XOS is very robust,
while on the branched oligosaccharides the growth is
delayed and at a lower rate. In contrast to the growth
dynamics on xylo- and arabino-xylo-oligosaccharides, B.
vulgatus was unable to utilize complex heavily decorated
xylans including RAX, SAX and CAX or the arabinoxylan
WAX (Fig. 5A). Although the intrinsic turbidity of
Fig. 5. Growth dynamics of B. vulgatus ATCC 8482 on xylo-oligosaccharides and various xylans; and transcriptional responses of B. vulgatus to xylo-oligosaccharides.
Growth curves of B. vulgatus (n = 12) on xylo-oligosaccharides (A), two wood xylans (B), arabinoxylans from various sources (C). XOS, xylo-oligosaccharides; A2X4, 2³,3³-di-α-L-arabinofuranosyl-xylotetraose; AX2, 3²-α-L-arabinofuranosyl-xylobiose; RAX, rice arabinoxylan; CAX, corn arabinoxylan; SAX, sorghum arabinoxylan; WAX, wheat arabinoxylan.
Transcription of B. vulgatus PUL genes in response to xylo-oligosaccharides (n = 3) (D).
Birchwood and Beechwood xylans resulted in elevated
initial absorbance at 600 nm as compared to other
xylans, the marginal growth rate suggested a poor and
delayed utilization of these simple wood glucuronoxylans
by B. vulgatus. Thus we hypothesize that gene clusters
in B. vulgatus have evolved to target shorter xylose poly-
mers, i.e. xylo-oligosaccharides, rather than xylans.
To investigate whether this PUL contributes to the
utilization of XOS, we performed transcriptional
analyses of the PUL in response to XOS
induction. Expression of the susC- and susD-like genes, the two
gh43 genes, gh10 and the mfs gene is highly induced by XOS,
suggesting that these genes of the PUL are responsible for
XOS utilization. The susC- and susD-like genes and the mfs
gene are the most highly induced (more than 1000-fold) as
compared to the genes encoding the glycoside hydrolases,
which are also induced at least 40-fold relative to glucose
or xylose growth conditions (Fig. 5B). Interestingly,
no induction was observed for the gh16 gene, indicating a
break in the gene cluster between MFS and GH16. Our
results suggest that this PUL in B. vulgatus is indeed
involved in XOS utilization. This finding further corroborates
our functional analysis of the metagenomic clone
in an E. coli recombinant host.
Conclusions
Metagenomics is a powerful tool to explore the diversity
and specificity of gut bacteria, as well as an extensive
genetic source for discovering new functions. However,
the rapid production of metagenomic data is vastly out-
pacing functional studies, which underscores the critical
need for protein biochemical characterization and struc-
tural enzymology to inform bioinformatics and systems
biology (André et al., 2014). As shown here, the direct
study of metagenomic clones for characterization of new
functions and/or new protein families could be a good
strategy not only to save time, but also to study some
complex mechanisms that require the synergistic action
of different proteins. Previously, some multigenic systems
derived from metagenomic libraries have been used
to optimize the ability of E. coli to produce ethanol or antifungal
activity (Chung et al., 2008; Loaces et al., 2015), but
no biochemical characterization of PULs heterologously
expressed in E. coli has been published so far. Here,
we demonstrated that E. coli is an interesting recombi-
nant host for characterizing the individual components
of PUL systems from Bacteroides strains. The present
work constitutes the first experimental study of expression
in E. coli of a metagenomic multigenic cluster cloned
in fosmids. Our results suggest that E. coli may be
able to recognize its own promoter sequences within the
metagenomic inserts, in particular in DNA derived from
Bacteroidetes (Lam and Charles, 2015). This spurious
transcription can be highly advantageous to study the
synergistic action of proteins encoded on the same
metagenomic locus.
In the present work, we characterized a PUL-like system
that confers on E. coli the ability to metabolize XOS.
Taking into account the possible inability of E. coli to
produce extracellular or cell surface attached proteins,
this new functionality requires the coordinated action of
at least 2 activities: (i) functional transport to internalize
oligosaccharides and (ii) oligosaccharide hydrolysis to
release monomers that will be used for E. coli growth.
The study of the transport system was based on growth
screening and required the presence of both transport
and hydrolytic functions.
To conclude, the present results pave the way for
boosting the functional characterization of individual
components of PULs, especially transporters issued
from cultured and uncultured Bacteroidetes. The generic
approach we developed could be extended to study
other catabolic pathways that are crucial for host and
dietary glycan harvesting by prominent gut bacteria, and
even for metabolism of other bioactive compounds.
Moreover, the construction and characterization of
recombinant E. coli strains that are able to metabolize
plant cell wall components opens the way to further
metabolic engineering works to develop microbial cell
factories dedicated to bio-sourced product synthesis.
Experimental procedures
Cloning
The metagenomic clones F5 (Genbank accession number
HE717017) and F4 (control, HE717016) were obtained from
a metagenomic library derived from a human fecal sample as previously
described (Cecchini et al., 2013). Both are metagenomic
fragments cloned into pCC1fos fosmid and transformed in
EPI100 E. coli cells (Epicentre Technologies). The minimal
variants of the clone F5 were constructed by using the In-
Fusion HD Plus cloning kit (Clontech) following the manufac-
turer’s instructions. The primers used in this study are listed in
Supporting Information Table S1. Considering that the expres-
sion of the genes might be driven by promoter sequence
potentially located in the sequence of the upstream gene,
each variant was cloned by amplifying the CDS and the cognate
~600 bp to 1 kb upstream sequence.
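The amplification strategy described above (CDS plus its cognate upstream region) amounts to a simple coordinate calculation. The following is a minimal sketch only, assuming 0-based, end-exclusive coordinates on the forward strand of the insert; the function name `amplicon_bounds` and the example coordinates are hypothetical and are not taken from Table S1.

```python
def amplicon_bounds(cds_start, cds_end, upstream=600, insert_len=None):
    """Return (start, end) of the region to amplify: the CDS plus `upstream` bp 5' of it.

    Coordinates are 0-based, end-exclusive and on the forward strand
    (assumptions made for this sketch). The start is clamped at 0 so the
    amplicon never extends past the beginning of the insert.
    """
    if not 0 <= cds_start < cds_end:
        raise ValueError("expected 0 <= cds_start < cds_end")
    if insert_len is not None and cds_end > insert_len:
        raise ValueError("CDS extends beyond the insert")
    start = max(0, cds_start - upstream)
    return start, cds_end


# Hypothetical gene at 5000-6500 on the insert, with 600 bp of upstream sequence.
print(amplicon_bounds(5000, 6500, upstream=600))
```

The `upstream` argument can be set anywhere between 600 and 1000 to cover the range quoted in the text.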
Growth study
Media. All E. coli cells were grown on minimal synthetic
media (Na2HPO4·12H2O 17.4 g l⁻¹, KH2PO4 3.03 g l⁻¹,
NH4Cl 2.04 g l⁻¹, NaCl 0.51 g l⁻¹, MgSO4 0.49 g l⁻¹,
CaCl2 4.38 mg l⁻¹, Na2EDTA·2H2O 15 mg l⁻¹, ZnSO4·7H2O
4.5 mg l⁻¹, CoCl2·6H2O 0.3 mg l⁻¹, MnCl2·4H2O 1 mg l⁻¹,
H3BO3 1 mg l⁻¹, Na2MoO4·2H2O 0.4 mg l⁻¹, FeSO4·7H2O
A. S. Tauzin et al.
© 2016 The Authors. Molecular Microbiology Published by John Wiley & Sons Ltd., Molecular Microbiology, 00, 00–00
3 mg l⁻¹, CuSO4·5H2O 0.3 mg l⁻¹, thiamine 0.1 g l⁻¹ and
leucine 0.02 g l⁻¹) containing an appropriate carbon source
and supplemented with 12.5 mg l⁻¹ chloramphenicol. After a
first growth step in LB medium supplemented with
12.5 mg l⁻¹ chloramphenicol, an overnight culture in minimal
synthetic medium containing xylose was used to inoculate
0.5 ml of minimal synthetic medium containing xylose
at an optical density at 600 nm (OD600) of 0.05 in a 48-well
microplate. Growth was followed by measuring the
OD600 over 24 h at 37 °C using the FLUOStar Optima (BMG
Labtech).
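The inoculation step above is a standard dilution (C1·V1 = C2·V2). As an illustration only, the volume of overnight culture needed to reach the starting OD600 of 0.05 can be estimated as follows, assuming that 0.5 ml is the final well volume and that OD600 scales linearly with cell density; the overnight OD value used in the example is hypothetical.

```python
def inoculum_volume_ml(od_overnight, od_target=0.05, final_volume_ml=0.5):
    """Volume of overnight culture (ml) giving `od_target` in `final_volume_ml`.

    Uses C1 * V1 = C2 * V2 and assumes OD600 is proportional to cell density.
    """
    if od_overnight <= od_target:
        raise ValueError("overnight culture must be denser than the target OD")
    return od_target * final_volume_ml / od_overnight


# Hypothetical overnight culture at OD600 = 2.0.
volume = inoculum_volume_ml(2.0)
print(f"{volume * 1000:.1f} µl of overnight culture per well")
```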
Bacteroides vulgatus ATCC 8482 strain was routinely
grown in tryptone-yeast extract-glucose (TYG) medium
(Holdeman et al., 1977) or in type-1 minimal medium (Urs,
Pudlo and Martens, unpublished data). Carbon sources
were added to a final concentration of 5 mg ml⁻¹ unless
otherwise stated. Cultures were grown at 37 °C in an anaerobic
chamber (10% H2, 5% CO2, and 85% N2; Coy Manufacturing,
Grass Lake, MI).
To quantify growth dynamics of B. vulgatus on various
carbon sources, the increase in culture absorbance
(600 nm) in 200 µl cultures was measured every 10 min on
an automated plate reader (Martens et al., 2011). Growth
curves show the average of 12 replicates for each carbon
source.
Growth substrates. Growth of E. coli and B. vulgatus
was monitored on minimal media supplemented with
a variety of oligosaccharide and polysaccharide carbon
sources. We used a mixture of XOS (WAKO and IOR-
TAIHE, from DP 2 to DP 7). Individual XOS from DP 2 to 5
and arabino-xylo-oligosaccharides (3²-α-L-arabinofuranosyl-xylobiose,
AX2; 2³-α-L-arabinofuranosyl-xylotriose, AX3;
2³,3³-di-α-L-arabinofuranosyl-xylotriose, A2X3;
2³,3³-α-L-arabinofuranosyl-xylotetraose, AX4; and
2³,3³-di-α-L-arabinofuranosyl-xylotetraose, A2X4) were purchased from
Megazyme. Simple xylans, with sparsely decorated structures,
were purchased from Sigma (beechwood xylan and
birchwood glucuronoxylan) and from Megazyme
(wheat arabinoxylan, WAX). More complex, heavily
decorated glucuronoarabinoxylans (rice, RAX; sorghum,
SAX; and corn, CAX) were kind gifts of Dr. Bruce Hamaker
(Purdue University).
Gene expression analyses
RNA extraction and reverse transcription into cDNA. From three
independent cultures of the F5 clone, total RNAs were
extracted as previously described (Nouaille et al., 2009).
Briefly, 10 ml of exponentially growing cells (OD600 = 1) in
LB medium were collected and centrifuged, and the pellets
were immediately frozen in liquid nitrogen. Cells were disrupted
through high-speed shaking with stainless steel
beads. Total RNAs were extracted using an RNeasy mini
kit (Qiagen) following the manufacturer’s instructions.
RNAs were quantified using a NanoDrop™ and their quality
was controlled using a Bioanalyzer RNA kit (Agilent
Technologies).
The equivalent of 50 µg of RNA was subjected to DNase
treatment and purified with an RNeasy Mini spin column
(Qiagen). Then 5 µg of RNA was reverse transcribed using
SuperScript® II RT (ThermoFisher Scientific) according to
the manufacturer’s protocol, and the cDNA was purified using
illustra™ MicroSpin™ G-25 columns (GE Healthcare).
For the transcriptional analysis in B. vulgatus, total RNA
was extracted using an RNeasy mini kit (Qiagen) from 5 ml of
exponentially growing B. vulgatus culture in minimal
medium containing 5 mg ml⁻¹ of XOS. Contaminating DNA
was removed with the TURBO DNA-free™ Kit (Ambion).
Reverse transcription was performed with 1 µg of RNA
using SuperScript® III Reverse Transcriptase (Thermo
Fisher Scientific) and random primers (Invitrogen) according
to the manufacturer’s instructions. cDNA quantification was
performed with a Mastercycler® ep realplex (Eppendorf),
using a homemade SYBR® qPCR mix containing Hot-start
Taq Polymerase (NEB) and 400 nM primers (62.5 nM
primers for 16S rRNA), for 40 cycles of 95 °C for
3 s, 52 °C for 20 s and 68 °C for 20 s, followed by a melting step
to determine amplicon purity. All transcript levels were normalized
to 16S rRNA abundance. The expression of
each gene in the PUL was calculated relative to its transcript
level during growth on glucose or xylose.
Primer design. The primers used for real-time quantitative
PCR of each gene on the F5 metagenomic clone insert
were designed with Bio-Rad Beacon Designer software to
have lengths from 18 to 22 bases, GC contents of more
than 50%, melting temperatures of about 60 °C and to
amplify PCR products between 83 and 148 bases long
(Supporting Information Table S1).
The primers used for real-time quantitative PCR for each
gene in the B. vulgatus PUL were designed using Primer 3.
These primers range from 18 to 24 bases, with GC contents
between 40% and 60%. The melting temperatures
lie around 60 °C and the amplicon sizes range between 80
and 150 bases (Supporting Information Table S1).
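The length and GC-content criteria quoted above for the F5 primers are easy to check programmatically. The sketch below is illustrative only: the function names and the example sequence are hypothetical, and melting temperature is deliberately left out because the model used by the design software is not stated in the text.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a primer sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_f5_primer_criteria(seq):
    """Check the two sequence-only criteria given for the F5 primers:
    length from 18 to 22 bases and GC content above 50%."""
    return 18 <= len(seq) <= 22 and gc_fraction(seq) > 0.50


# Hypothetical 20-mer, not a primer from Table S1.
primer = "GCTGACCGGTACCGATGCAC"
print(len(primer), round(gc_fraction(primer), 2), passes_f5_primer_criteria(primer))
```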
High throughput real-time quantitative PCR. High throughput
real-time quantitative PCR was carried out using
the 48.48 dynamic array™ IFCs and the BioMark™ HD
System (Fluidigm Corporation, CA, USA) following the
manufacturer’s protocol (Spurgeon et al., 2008) and performed
at GeT-PlaGe facilities (Castanet, France). Prior to RNA
expression analysis, primer specificity and the absence of
genomic DNA contamination in extracted total RNAs were
checked.
In total, 1044 data points were collected from the qPCR analyses,
combining 4 technical replicates (each used at 3 different dilutions)
from 3 biological samples and the 29 primer
pairs corresponding to the 27 genes of the metagenomic
insert and the 2 additional control genes (ihfB and cam).
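The total of 1044 follows directly from the experimental design stated above; a short check, with variable names chosen here purely for illustration:

```python
# Factors stated in the text; the variable names are illustrative only.
technical_replicates = 4
dilutions = 3
biological_samples = 3
primer_pairs = 27 + 2  # 27 genes of the insert + 2 control genes (ihfB and cam)

total_data_points = technical_replicates * dilutions * biological_samples * primer_pairs
print(total_data_points)  # 1044
```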
Data analysis. Relative mRNA expression means were calculated
from the biological triplicates after initial raw data
analysis with the Fluidigm real-time PCR
analysis software v.4.1.2. The PCR efficiency was checked
for each primer pair and was close to 100%. The comparative
ΔΔCt method was used to calculate the change in
transcript levels with correction (Livak and Schmittgen,
2001). As the best alternative, the mean expression of the 5
least expressed genes was used to determine which genes are
significantly expressed relative to a threshold that we fixed
at fivefold the value of the least expressed genes.
Two reference genes were used for data normalization
between samples: the integration host factor β-subunit
(ihfB), which is one of the commonly used reference genes in
E. coli (Weglenska et al., 1996) as its expression remains
constant throughout growth, and the chloramphenicol resistance
gene (cam), which encodes the chloramphenicol acetyltransferase
present on the recombinant vector and is essential for
antibiotic resistance.
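The ΔΔCt calculation and the fivefold threshold described above can be sketched as follows. This is a simplified illustration, not the Fluidigm software's procedure: it assumes 100% PCR efficiency (a doubling per cycle), averages the Ct values of the two reference genes, and uses hypothetical function names and Ct values.

```python
from statistics import mean


def delta_ct(ct_target, ct_refs):
    """Ct of the target gene minus the mean Ct of the reference genes."""
    return ct_target - mean(ct_refs)


def fold_change(ct_target, ct_refs, ct_target_cal, ct_refs_cal):
    """Relative expression 2^(-ddCt) of a sample against a calibrator
    (Livak and Schmittgen, 2001), assuming 100% PCR efficiency."""
    ddct = delta_ct(ct_target, ct_refs) - delta_ct(ct_target_cal, ct_refs_cal)
    return 2.0 ** (-ddct)


def expressed_genes(rel_expr, n_lowest=5, factor=5.0):
    """Genes whose relative expression exceeds `factor` times the mean
    of the `n_lowest` least expressed genes."""
    threshold = factor * mean(sorted(rel_expr.values())[:n_lowest])
    return {gene for gene, value in rel_expr.items() if value > threshold}


# Hypothetical Ct values: target gene vs. two reference genes (e.g. ihfB, cam).
print(fold_change(22.0, [18.0, 20.0], 25.0, [18.0, 20.0]))
```

In this sketch a gene sitting three cycles earlier than the calibrator, with unchanged reference genes, comes out as eightfold more expressed.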
Enzymatic assays
The cells were grown in LB supplemented with
12.5 mg l⁻¹ chloramphenicol, inoculated from an overnight
culture at an OD600 of 0.05. When the OD600 reached
1, the cells were harvested and the pellet was suspended
in 50 mM potassium phosphate buffer pH 7.0 containing
lysozyme (0.5 mg ml⁻¹ final concentration) to reach an
OD600 of 80. After incubation at 37 °C for one hour, the
suspension was frozen for 15 min at −80 °C and then
thawed. The samples were then centrifuged and the supernatant
(cell extract) was used to perform activity tests.
All reactions were carried out at 37 °C in 50 mM potassium
phosphate buffer pH 7.0.
The activities against complex polysaccharides
(xylan, arabinoxylan, arabinan, arabinogalactan, β-glucan,
xyloglucan) were measured using the 3,5-dinitrosalicylic
acid reducing-sugar (DNS) assay. Reaction samples (250 µl
of cell extract incubated with 5 mg ml⁻¹ of the specified substrate)
were added to an equal volume of DNS reagent to
terminate the reaction, and the colour was developed by
boiling for 5 min. Enzymatic activities on various para-nitrophenol
(pNP) sugar derivatives were also measured.
After incubation of 150 µl of cell extract with 1 mM pNP-glycosides
(pNP-α-D-xylopyranose, pNP-β-D-xylopyranose,
pNP-α-L-arabinofuranose, pNP-α-L-arabinopyranose and
pNP-β-L-arabinopyranose), the reaction was stopped by
raising the pH to 11.0 through the addition of an equal volume
of 0.2 M Na2CO3. The release of reducing sugars
(DNS assay) and of pNP was measured in an Optima
(TECAN) at A540nm and A405nm, respectively. A standard
curve was used to calculate product concentration.
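Converting absorbance to product concentration with a standard curve can be sketched as a least-squares line fit followed by inversion. The sketch assumes a linear response over the calibrated range; the calibration points and function names below are hypothetical and are not data from this study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert the standard curve to obtain the product concentration."""
    return (absorbance - intercept) / slope


# Hypothetical standards: concentration (mM) vs. absorbance (e.g. A405nm for pNP).
standards_mM = [0.0, 1.0, 2.0, 3.0]
absorbances = [0.1, 0.3, 0.5, 0.7]
slope, intercept = fit_line(standards_mM, absorbances)
print(round(concentration_from_absorbance(0.5, slope, intercept), 3))
```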
Cellular localization of proteins
The control (pCC1fos empty) and F5min_GH clones were
grown in 250 ml of LB at 37 °C until the OD600 reached 0.9. The
cells were collected by centrifugation at 4400 g for 10 min
at 4 °C. The supernatant was filtered (0.22 µm) and tested
for secreted activity. The other protein fractions were
obtained from the different treatments of the pellet as
described by Larsbrink et al. (2011).
Briefly, the periplasmic proteins were collected using an
osmotic shock. The cells were washed with 10 ml of
50 mM Tris-HCl (pH 7.7) and collected by centrifugation at
4400 g for 10 min at 4 °C. The pellet was resuspended in
50 ml of 30 mM Tris-HCl, 20% (w/v) sucrose and 1 mM
EDTA (pH 8.0), and the cells were incubated at room temperature
for 10 min. The cells were then collected by
centrifugation at 4400 g for 15 min at 4 °C. The pellet was
resuspended in ice-cold 5 mM MgSO4, and the cells were
incubated on ice for 10 min. The cells were collected by
centrifugation at 14 000 g for 10 min at 4 °C. The supernatant
was retained and contained the periplasmic proteins.
The pellet was resuspended in 50 mM sodium phosphate
buffer (pH 7.4) and sonicated to lyse cells. The lysate was
centrifuged at 5000 g for 10 min at 4 °C. Using an ultracentrifuge,
the supernatant was centrifuged at 100 000 g for
1 h at 4 °C to recover the cytoplasmic proteins. The pellet of
the lysate was resuspended in 100 mM sodium carbonate
buffer (pH 9.0) and centrifuged at 100 000 g for 1 h at 4 °C.
The supernatant from this step contained the potentially
trapped soluble proteins and/or weakly membrane-associated
proteins. The pellet, containing the membrane proteins, was
resuspended in 50 mM sodium phosphate buffer (pH 7.4).
XOS uptake
To assay XOS uptake, cells were grown in M9 medium
supplemented with XOS of a specified chain length or a mixture
of XOS at 37 °C. Growth was monitored by measuring
the A600nm. During growth, samples were collected at
regular time points and centrifuged. The supernatants were
filtered and stored at −20 °C. The amount of XOS present
in the culture supernatants was analyzed by HPAEC-PAD
on a Dionex ICS-3000 system (Dionex) equipped with
a CarboPac PA100 column. The analyses were carried out
at 30 °C with a flow rate of 0.5 ml min⁻¹ with the following
multistep gradient: 0–30 min (0–60% B), 30–31 min (60–
0% B) and 31–36 min (0% B). Solvents were 150 mM
NaOH (eluent A) and 150 mM NaOH, 500 mM CH3COONa
(eluent B). To quantify the XOS remaining in the
culture supernatant, the respective commercial oligosaccharides
(Megazyme) were used as standards.
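The multistep gradient above can be written as a piecewise function of time, which also gives the sodium acetate concentration reaching the column at any moment. This is a sketch that assumes linear ramps within each step; the function names are illustrative.

```python
def percent_b(t_min):
    """Percentage of eluent B at time t (min) for the gradient
    0-30 min (0-60% B), 30-31 min (60-0% B), 31-36 min (0% B),
    assuming linear ramps within each step."""
    if not 0 <= t_min <= 36:
        raise ValueError("time outside the 0-36 min run")
    if t_min <= 30:
        return 60.0 * t_min / 30.0
    if t_min <= 31:
        return 60.0 * (31.0 - t_min)
    return 0.0


def acetate_mM(t_min):
    """Sodium acetate concentration (mM) delivered at time t.
    Eluent B contains 500 mM CH3COONa; NaOH is 150 mM in both eluents."""
    return 500.0 * percent_b(t_min) / 100.0


for t in (0, 15, 30, 30.5, 33):
    print(t, percent_b(t), acetate_mM(t))
```

Because both eluents contain 150 mM NaOH, only the acetate concentration changes during the run.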
Bioinformatic analyses
Promoter consensus sequences used to identify promoters
from E. coli (rpoD/σ70) and Bacteroides (σABfr) were
(TTGACA-N15-19-TATAAT) and (TTTG-N19-21-TA-N2-TTTG), respectively
(Mastropaolo et al., 2009). The BPROM program
was used to identify the putative promoters in E. coli (Solovyev
and Salamov, 2011).
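As an illustration of what a search for these consensus sequences involves, the sketch below scans a DNA sequence for exact matches, reading the numbers in the consensus as spacer lengths (15 to 19 nt between TTGACA and TATAAT; 19 to 21 nt, then TA, 2 nt and TTTG for Bacteroides). This reading is an assumption, and exact matching is much stricter than a scoring tool such as BPROM, which tolerates mismatches; the names and test sequences are hypothetical.

```python
import re

# Lookahead patterns so that overlapping candidate sites are all reported.
ECOLI_SIGMA70 = re.compile(r"(?=(TTGACA[ACGT]{15,19}TATAAT))")
BACTEROIDES_SIGMA_ABFR = re.compile(r"(?=(TTTG[ACGT]{19,21}TA[ACGT]{2}TTTG))")


def find_exact_consensus(seq, pattern):
    """Return (0-based start, matched site) for every exact consensus match."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


# Hypothetical sequences built to contain one site each.
ecoli_like = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
bacteroides_like = "TTTG" + "C" * 20 + "TA" + "GG" + "TTTG"
print(find_exact_consensus(ecoli_like, ECOLI_SIGMA70))
print(find_exact_consensus(bacteroides_like, BACTEROIDES_SIGMA_ABFR))
```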
LipoP and SignalP servers were used to determine the
presence and location of lipoprotein and other protein signal
peptide cleavage sites, respectively (Juncker et al.,
2003; Petersen et al., 2011).
Acknowledgements
This research was funded by the French National Center of
Excellence Toulouse White Biotechnology. We cordially thank
Amandine Deroite, Nathan Davidenko, Adrien Guibert, Clar-
isse Lozano and Nelly Monties for their technical assistance.
The analytic work was carried out at the Laboratory for Bio-
Systems & Process Engineering (Toulouse, France) with the
equipment of the ICEO facility. MetaToul (Metabolomics &
Fluxomics Facilities, Toulouse, France, www.metatoul.fr) and
its staff members are gratefully acknowledged for technical
support and access to microplate reader. MetaToul is part of
the national infrastructure MetaboHUB-ANR-11-INBS-0010
(The French National infrastructure for metabolomics and
fluxomics, www.metahub.fr). MetaToul is supported by grants
from the Région Midi-Pyrénées, the European Regional
Development Fund, SICOVAL, the Infrastructures en Biologie
Santé et Agronomie (IBiSa, France), the Centre National de la
Recherche Scientifique (CNRS) and the Institut National de la
Recherche Agronomique (INRA). The work on Bacteroides
was supported by the grant GM090080 from NIH.
Conflict of interest
Authors have no conflict of interest to declare.
References
André, I., Potocki-Véronese, G., Barbe, S., Moulis, C., and
Remaud-Siméon, M. (2014) CAZyme discovery and
design for sweet dreams. Curr Opin Chem Biol 19: 17–24.
Bolam, D.N., and Koropatkin, N.M. (2012) Glycan recogni-
tion by the Bacteroidetes Sus-like systems. Curr Opin
Struct Biol 22: 563–569.
Cameron, E.A., Maynard, M.A., Smith, C.J., Smith, T.J.,
Koropatkin, N.M., and Martens, E.C. (2012) Multidomain
carbohydrate-binding proteins involved in Bacteroides
thetaiotaomicron starch metabolism. J Biol Chem 287:
34614–34625.
Cameron, E.A., Kwiatkowski, K.J., Lee, B.H., Hamaker,
B.R., Koropatkin, N.M., and Martens, E.C. (2014) Multi-
functional nutrient-binding proteins adapt human symbi-
otic bacteria for glycan competition in the gut by
separately promoting enhanced sensing and catalysis.
MBio 5: e01441–e01414.
Cecchini, D.A., Laville, E., Laguerre, S., Robe, P., Leclerc,
M., Doré, J., et al. (2013) Functional metagenomics
reveals novel pathways of prebiotic breakdown by human
gut bacteria. PLoS One 8: e72766.
Chung, E.J., Lim, H.K., Kim, J., Choi, G.J., Park, E.J., Lee,
M.H, et al. (2008) Forest soil metagenome gene cluster
involved in antifungal activity expression in Escherichia
coli. Appl Environ Microbiol 74: 723–730.
Cuskin, F., Lowe, E.C., Temple, M.J., Zhu, Y., Cameron,
E.A., Pudlo, N.A, et al. (2015) Human gut Bacteroidetes
can utilize yeast mannan through a selfish mechanism.
Nature 517: 165–169.
Déjean, G., Blanvillain-Baufumé, S., Boulanger, A., Darrasse,
A., Bernonville, T.D.D., Girard, A.L, et al. (2013) The xylan
utilization system of the plant pathogen Xanthomonas
campestris pv campestris controls epiphytic life and
reveals common features with oligotrophic bacteria and
animal gut symbionts. New Phytol 198: 899–915.
Dodd, D., Mackie, R.I., and Cann, I.K.O. (2011) Xylan deg-
radation, a metabolic property shared by rumen and
human colonic Bacteroidetes. Mol Microbiol 79: 292–304.
Ferguson, A.D., and Deisenhofer, J. (2002) TonB-depend-
ent receptors-structural perspectives. Biochim Biophys
Acta 1565: 318–332.
Ferrer, M., Golyshina, O.V., Chernikova, T.N., Khachane,
A.N., Reyes-Duarte, D., Santos, V.A., et al. (2005) Novel
hydrolase diversity retrieved from a metagenome library of
bovine rumen microflora. Environ Microbiol 7: 1996–2010.
Hehemann, J.H., Correc, G., Barbeyron, T., Helbert, W.,
Czjzek, M., and Michel, G. (2010) Transfer of
carbohydrate-active enzymes from marine bacteria to
Japanese gut microbiota. Nature 464: 908–912.
Hess, M., Sczyrba, A., Egan, R., Kim, T.W., Chokhawala,
H., Schroth, G., et al. (2011) Metagenomic discovery of
biomass-degrading genes and genomes from cow rumen.
Science 331: 463–467.
Holdeman, L.V., Cato, E.D., and Moore, W.E.C. (1977)
Anaerobe Laboratory Manual, 4th ed. Blacksburg, VA:
Virginia Polytechnic Institute and State University.
Juncker, A.S., Willenbrock, H., Heijne, G.V., Brunak, S.,
Nielsen, H., and Krogh, A. (2003) Prediction of lipoprotein
signal peptides in Gram-negative bacteria. Protein Sci
12: 1652–1662.
Koropatkin, N.M., Martens, E.C., Gordon, J.I., and Smith,
T.J. (2008) Starch catabolism by a prominent human gut
symbiont is directed by the recognition of amylose heli-
ces. Structure 16: 1105–1115.
Lam, K.N., and Charles, T.C. (2015) Strong spurious tran-
scription likely contributes to DNA insert bias in typical
metagenomic clone libraries. Microbiome 3: 22.
Lam, K.N., Cheng, J., Engel, K., Neufeld, J.D., and
Charles, T.C. (2015) Current and future resources for
functional metagenomics. Front Microbiol 6: 1196.
Larsbrink, J., Izumi, A., Ibatullin, F.M., Nakhai, A., Gilbert,
H.J., Davies, G.J., and Brumer, H. (2011) Structural and
enzymatic characterization of a glycoside hydrolase family
31 α-xylosidase from Cellvibrio japonicus involved in
xyloglucan saccharification. Biochem J 567–580.
Larsbrink, J., Rogers, T.E., Hemsworth, G.R., McKee, L.S.,
Tauzin, A.S., Spadiut, O., et al. (2014) A discrete genetic
locus confers xyloglucan metabolism in select human gut
Bacteroidetes. Nature 506: 498–502.
Livak, K.J., and Schmittgen, T.D. (2001) Analysis of relative
gene expression data using real-time quantitative PCR and
the 2(-delta delta C(T)) method. Methods 25: 402–408.
Loaces, I., Amarelle, V., and Muñoz-Gutierrez, I. (2015)
Improved ethanol production from biomass by a rumen meta-
genomic DNA fragment expressed in Escherichia coli MS04
during fermentation. Appl Environ Microbiol 99: 9049–9060.
Markowitz, V.M., Chen, I.M.A., Palaniappan, K., Chu, K.,
Szeto, E., Grechkin, Y., et al. (2012) IMG: the integrated
microbial genomes database and comparative analysis
system. Nucl Acids Res 40: D115–D122.
Martens, E.C., Chiang, H.C., and Gordon, J.I. (2008) Muco-
sal glycan foraging enhances fitness and transmission of
a saccharolytic human gut bacterial symbiont. Cell Host
Microbe 4: 447–457.
Martens, E.C., Lowe, E.C., Chiang, H., Pudlo, N. A., Wu,
M., McNulty, N.P., et al. (2011) Recognition and degrada-
tion of plant cell wall polysaccharides by two human gut
symbionts. PLoS Biol 9: e1001221.
Mastropaolo, M.D., Thorson, M.L., and Stevens, A.M.
(2009) Comparison of Bacteroides thetaiotaomicron and
Escherichia coli 16S rRNA gene expression signals.
Microbiology 155: 2683–2693.
Nielsen, H.B., Almeida, M., Juncker, A.S., Rasmussen, S.,
Li, J., Sunagawa, S., et al. (2014) Identification and
assembly of genomes and genetic elements in complex
metagenomic samples without using reference genomes.
Nat Biotechnol 32: 822–828.
Noinaj, N., Guillier, M., Barnard, T.J., and Buchanan, S.K.
(2010) TonB-dependent transporters: regulation, struc-
ture, and function. Annu Rev Microbiol 64: 43–60.
Nouaille, S., Even, S., Charlier, C., Loir, Y.L., Cocaign-
Bousquet, M., and Loubiere, P. (2009) Transcriptomic
response of Lactococcus lactis in mixed culture with Staphy-
lococcus aureus. Appl Environ Microbiol 75: 4473–4482.
Okuda, S., and Tokuda, H. (2011) Lipoprotein sorting in
bacteria. Annu Rev Microbiol 65: 239–259.
Petersen, T.N., Brunak, S., Heijne, G.V., and Nielsen, H.
(2011) SignalP 4.0: discriminating signal peptides from
transmembrane regions. Nat Methods 8: 785–786.
Phansopa, C., Roy, S., Rafferty, J.B., Douglas, C.W.I.,
Pandhal, J., Wright, P.C., et al. (2014) Structural and func-
tional characterization of NanU, a novel high-affinity sialic
acid-inducible binding protein of oral and gut-dwelling Bac-
teroidetes species. Biochem J 458: 499–511.
Qian, Y., Yomano, L.P., Preston, J.F., Aldrich, H.C., and
Ingram, L.O. (2003) Cloning, characterization, and func-
tional expression of the Klebsiella oxytoca xylodextrin uti-
lization operon (xynTB) in Escherichia coli. Appl Environ
Microbiol 69: 5957–5967.
Rogowski, A., Briggs, J.A., Mortimer, J.C., Tryfona, T.,
Terrapon, N., Lowe, E.C., et al. (2015) Glycan complexity
dictates microbial resource allocation in the large intes-
tine. Nat Commun 6: 7481.
Roy, S., Douglas, C.W.I., and Stafford, G.P. (2010) A novel
sialic acid utilization and uptake system in the periodontal
pathogen Tannerella forsythia. J Bacteriol 192: 2285–2293.
Schauer, K., Rodionov, D.A., and Reuse, H. D. (2008) New
substrates for TonB-dependent transport: do we only see
the “tip of the iceberg?”. Trends Biochem Sci 33: 330–338.
Seydel, A., Gounon, P., and Pugsley, A.P. (1999) Testing
the ’+2 rule’ for lipoprotein sorting in the Escherichia
coli cell envelope with a new genetic selection. Mol
Microbiol 34: 810–821.
Shin, H., Mcclendon, S., Vo, T., and Chen, R.R. (2010)
Escherichia coli binary culture engineered for direct fer-
mentation of hemicellulose to a biofuel. Appl Environ
Microbiol 76: 8150–8159.
Shipman, J.A., Cho, K.H., Siegel, H.A., and Salyers, A.A.
(1999) Physiological characterization of SusG, an outer
membrane protein essential for starch utilization by Bac-
teroides thetaiotaomicron. J Bacteriol 181: 7206–7211.
Shultzaberger, R.K., Chen, Z., Lewis, K.A., and Schneider,
T.D. (2007) Anatomy of Escherichia coli σ70 promoters.
Nucleic Acids Res 35: 771–788.
Singh, S.S., Typas, A., Hengge, R., and Grainger, D.C.
(2011) Escherichia coli σ70 senses sequence and conformation
of the promoter spacer region. Nucleic Acids
Res 39: 5109–5118.
Solovyev, V., and Salamov, A. (2011) Automatic annotation
of microbial genomes and metagenomic sequences. In
Metagenomics and its Applications in Agriculture, Biome-
dicine and Environmental Studies. Li, R.W. (ed.). New
York: Nova Science Publishers.
Spurgeon, S.L., Jones, R.C., and Ramakrishnan, R. (2008)
High throughput gene expression measurement with real
time pcr in a microfluidic dynamic array. PLoS One 3:
e1662.
Stafford, G., Roy, S., Honma, K., and Sharma, A. (2012)
Sialic acid, periodontal pathogens and Tannerella for-
sythia: stick around and enjoy the feast!. Mol Oral Micro-
biol 27: 11–22.
Strachan, C.R., Singh, R., VanInsberghe, D.,
Ievdokymenko, K., Budwill, K., Mohn, W.W., et al. (2014)
Metagenomic scaffolds enable combinatorial lignin trans-
formation. Proc Natl Acad Sci USA 111: 10143–10148.
Tasse, L., Bercovici, J., Pizzut-Serin, S., Robe, P., Tap, J.,
Klopp, C., et al. (2010) Functional metagenomics to mine
the human gut microbiome for dietary fiber catabolic
enzymes. Genome Res 20: 1605–1612.
Tauzin, A.S., Kwiatkowski, K.J., Orlovsky, N.I., Smith, C.J.,
Creagh, A.L., Haynes, C.A., et al. (2016) Molecular dis-
section of xyloglucan recognition in a prominent human
gut symbiont. MBio 7: e02134–e02115.
Terrapon, N., Lombard, V., Gilbert, H.J., and Henrissat, B.
(2015) Automatic prediction of polysaccharide utilization
loci in Bacteroidetes species. Bioinformatics 31: 647–655.
Tokuda, H., and Matsuyama, S.I. (2004) Sorting of lipopro-
teins to the outer membrane in E. coli. Biochim Biophys
Acta 1693: 5–13.
Turnbaugh, P.J., Ley, R.E., Hamady, M., Fraser-Liggett,
C.M., Knight, R., and Gordon, J.I. (2007) The human
microbiome project. Nature 449: 804–810.
Vimr, E.R., and Troy, F.A. (1985) Identification of an induci-
ble catabolic system for sialic acids (nan) in Escherichia
coli. J Bacteriol 164: 845–853.
Wang, Y., Chen, Y., Zhou, Q., Huang, S., Ning, K., Xu, J.,
et al. (2012) A culture-independent approach to unravel
uncultured bacteria and functional genes in a complex
microbial community. PLoS One 7: e47530.
Weglenska, A., Jacob, B., and Sirko, A. (1996) Transcriptional
pattern of Escherichia coli ihfB (himD) gene
expression. Gene 181: 85–88.
Yamaguchi, K., Yu, F., and Inouye, M. (1988) A single
amino acid determinant of the membrane localization of
lipoproteins in E. coli. Cell 53: 423–432.
Yan, N. (2015) Structural biology of the major facilitator
superfamily transporters. Annu Rev Biophys 44: 257–283.
Supporting information
Additional supporting information may be found in the
online version of this article at the publisher’s web-site.
electronic reprint
ISSN: 1399-0047
journals.iucr.org/d
Structural bases for N-glycan processing by mannoside phosphorylase
Simon Ladeveze, Gianluca Cioci, Pierre Roblin, Lionel Mourey, Samuel Tranier and Gabrielle Potocki-Veronese
Acta Cryst. (2015). D71, 1335–1346
This open-access article is distributed under the terms of the Creative Commons Attribution Licence http://creativecommons.org/licenses/by/2.0/uk/legalcode, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
research papers
Acta Cryst. (2015). D71, 1335–1346 http://dx.doi.org/10.1107/S1399004715006604 1335
Received 23 January 2015
Accepted 1 April 2015
Edited by Z. S. Derewenda, University of
Virginia, USA
Keywords: GH130 enzymes; N-glycans;
glycoside phosphorylases; human gut
microbiota.
PDB references: Uhgb_MP, apo, 4udi;
complex with mannose, 4udj; complex with
N-acetylglucosamine, 4udg; complex with
mannose and N-acetylglucosamine, 4udk
Supporting information: this article has
supporting information at journals.iucr.org/d
Structural bases for N-glycan processing by mannoside phosphorylase
Simon Ladeveze,a,b,c Gianluca Cioci,a,b,c Pierre Roblin,d Lionel Mourey,e,f
Samuel Tranier,e,f* and Gabrielle Potocki-Veronese,a,b,c*
aUniversité de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, 31077 Toulouse, France, bCNRS, UMR5504,
31400 Toulouse, France, cINRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés, 31400 Toulouse, France,
dSynchrotron SOLEIL, L’Orme des Merisiers, BP 48, Saint Aubin, 91192 Gif-sur-Yvette CEDEX, France, eInstitut de
Pharmacologie et de Biologie Structurale (IPBS), Centre National de la Recherche Scientifique (CNRS), 205 Route de
Narbonne, BP 64182, 31077 Toulouse, France, and fUniversité de Toulouse, Université Paul Sabatier, IPBS,
31077 Toulouse, France. *Correspondence email: [email protected], veronese@insa-toulouse.fr
The first crystal structure of Uhgb_MP, a β-1,4-mannopyranosyl-chitobiose
phosphorylase belonging to the GH130 family which is involved in N-glycan
degradation by human gut bacteria, was solved at 1.85 Å resolution in the apo
form and in complex with mannose and N-acetylglucosamine. SAXS and crystal
structure analysis revealed a hexameric structure, a specific feature of GH130
enzymes among other glycoside phosphorylases. Mapping of the −1 and +1
subsites in the presence of phosphate confirmed the conserved Asp104 as the
general acid/base catalytic residue, which is in agreement with a single-step
reaction mechanism involving Man O3 assistance for proton transfer. Analysis of
this structure, the first to be solved for a member of the GH130_2 subfamily,
revealed Met67, Phe203 and the Gly121–Pro125 loop as the main determinants
of the specificity of Uhgb_MP and its homologues towards the N-glycan core
oligosaccharides and mannan, and the molecular bases of the key role played by
GH130 enzymes in the catabolism of dietary fibre and host glycans.
1. Introduction
N-linked glycans are present in many living organisms, notably
eukaryotes, and play a key role in major processes, including
cell signalling and recognition, protein stability and activity
tuning (Varki et al., 2009). These oligosaccharides, which are
covalently linked to the asparagine residues of glycoproteins,
display relatively restricted structural diversity (Lehle et al.,
2006; Larkin & Imperiali, 2011). In eukaryotes, N-glycans
share a common core structure composed of the
β-D-Manp-1,4-β-D-GlcpNAc-1,4-D-GlcpNAc (Man-GlcNAc2) trisaccharide,
carrying decorations on the nonreducing β-linked
mannosyl residue to form more complex structures (Aebi et
al., 2010; Nagae & Yamaguchi, 2012). Although the whole
pathways of human N-glycan synthesis and maturation have
been well described, little is known about their degradation,
especially by bacteria or fungi, despite the fact that degrada-
tion is a key factor in microbe–host interactions (Suzuki &
Harada, 2014). Until 2013, only glycoside hydrolases (GHs)
had been shown to be implicated in N-glycan breakdown in
the CAZy database (http://www.cazy.org/; Lombard et al.,
2014). Huge efforts have been made in recent years to
understand exactly how this hydrolytic process takes place,
particularly among gut inhabitants, as the alteration of host
glycans by microbes is thought to be related to intestinal
disorders, including Crohn’s disease and other inflammatory
bowel diseases (IBDs; Martens et al., 2008; Sheng et al., 2012).
The aim of these studies was to identify a broad consortium of
enzymes acting on different parts of the N-glycan structure,
such as exo-mannosidases and endo-mannosidases or
endo-N-acetyl-β-D-glucosaminidases produced by gut commensals and
pathogens (Roberts et al., 2000; Burnaugh et al., 2008; Renzi et
al., 2011), particularly by Bacteroides species (Martens et al.,
2009; Zhu et al., 2010).
In 2013, the first evidence for N-glycan breakdown by
phosphorolysis was published, which involved gut bacterial
mannoside phosphorylases belonging to glycoside hydrolase
family 130 (GH130; Nihira et al., 2013; Ladeveze et al., 2013).
Only two enzymes, namely the mannoside phosphorylase (EC
2.4.1.–) Uhgb_MP, an enzyme produced by an uncultivated
Bacteroides bacterium, and Bt1033, produced by B. thetaiotaomicron
VPI-5482, are known to catalyze the conversion of
β-D-Manp-1,4-β-D-GlcpNAc (Man-GlcNAc) or
β-D-Manp-1,4-β-D-GlcpNAc-1,4-D-GlcpNAc (Man-GlcNAc2) and inorganic
phosphate into α-D-mannopyranose-1-phosphate and
GlcNAc or GlcNAc2, respectively (Nihira et al., 2013; Lade-
veze et al., 2013). CAZy subfamilies are subgroups found
within a family that share a more recent ancestor and that are
usually more uniform in molecular function, reflecting a high
degree of conservation in their active site (Aspeborg et al.,
2012). In the GH130 family, at least two enzyme subfamilies
have been identified (Ladeveze et al., 2013). Subfamily
GH130_1 gathers enzymes that are highly specific for β-D-
Manp-1,4-D-Glc, while GH130_2 contains enzymes that are
much more flexible towards mannosides. Uhgb_MP and
Bt1033 are classified in the GH130_2 subfamily, together with
40 other GH130 sequences, among which 15 originate from gut
bacterial genomes. Integration of metagenomic and genomic
data on the scale of the entire human gut microbiota revealed
that GH130_2 enzymes, especially Uhgb_MP and Bt1033,
probably play a critical role in alteration of the intestinal
barrier, as their encoding genes are particularly prevalent in
the human gut microbiome of patients suffering from IBDs
(Ladeveze et al., 2013). Based on genomic context analysis and
on the in silico detection of signal peptides, the physiological
role of Uhgb_MP and Bt1033 would be the intracellular
phosphorolysis of β-D-Manp-1,4-D-GlcNAc, which can be
internalized in the cell after extracellular hydrolysis of
N-glycans by glycoside hydrolases belonging to the GH18,
GH92 and possibly also the GH97 families (Ladeveze et al.,
2013). In addition to Uhgb_MP and Bt1033, subfamily
GH130_2 contains only one other biochemically characterized
enzyme, the RaMP2 enzyme from the ruminal bacterium
Ruminococcus albus 7. It has been suggested that this enzyme
is involved in mannan catabolism in the bovine rumen, as it
catalyzes the phosphorolysis of β-1,4-manno-oligosaccharides
(Kawahara et al., 2012). In vitro, these three enzymes present a
relaxed substrate specificity compared with all other known
mannoside phosphorylases. This property makes them extre-
mely interesting biocatalytic tools for the synthesis of diverse
manno-oligosaccharides by reverse phosphorolysis. In parti-
cular, Uhgb_MP is extremely efficient at producing N-glycan
core oligosaccharides such as β-D-Manp-1,4-D-GlcNAc and
β-D-Manp-1,4-β-D-GlcpNAc-1,4-D-GlcpNAc, the current
commercial price of which exceeds $10 000 per milligram
(Ladeveze et al., 2014). Moreover, it is the only known phosphorylase
to act on mannans and long manno-oligosaccharides.
Uhgb_MP-based β-mannoside synthesis processes are
highly attractive, thanks to its flexible specificity and because it
is the only known enzyme able to produce such high-added-value
compounds from a hemicellulose constituent as a
substrate. Indeed, a one-pot reaction would allow Uhgb_MP
to produce β-D-Manp-1,4-β-D-GlcpNAc directly from
N-acetylglucosamine and mannan in two reaction steps: a first
step of mannan phosphorolysis releasing α-D-Man-1-phosphate,
and a second step of reverse phosphorolysis converting
α-D-Man-1-phosphate and N-acetylglucosamine into β-D-
Manp-1,4-β-D-GlcpNAc.
Currently, six GH130 enzyme structures are available in the
RCSB Protein Data Bank, sharing a common five-bladed
β-propeller fold. The crystal structure of BfMGP, a B. fragilis
NCTC 9343 enzyme classified with 78 other sequences into the
GH130_1 subfamily, was recently solved in complex with the
genuine substrates 4-O-β-D-mannosyl-D-glucose and phosphate
and the product α-D-mannose-1-phosphate (Nakae et
al., 2013; PDB entries 3wat, 3was, 3wau and 4kmi). This
enzyme, which is involved in the final steps of mannan catabolism
in the human gut, exhibits a very narrow specificity
towards β-D-Manp-1,4-D-Glc (Senoura et al., 2011), like the
other characterized members of the GH130_1 subfamily
(RaMP1 from the ruminal bacterium R. albus 7 and the
RmMGP protein from the marine bacterium Rhodothermus
marinus DSM4252; Jaito et al., 2014). This pioneering study on
BfMGP highlighted a probably unique reaction mechanism
among known disaccharide phosphorylases, as the invariant
residue Asp131, which is assumed to be the general acid/base,
was not found close to the glycosidic O atom, which should be
protonated in the catalytic reaction.
The five other three-dimensional structures of GH130
enzymes available to date are the apo forms of (i) four
proteins belonging to the GH130_NC cluster, which gathers
enzymes that are not classified into the GH130_1 and
GH130_2 subfamilies, namely BACOVA_03624 and
BACOVA_02161 from B. ovatus ATCC 8483 (PDB entries
3qc2 and 4onz; Joint Center for Structural Genomics,
unpublished work), BDI_3141 from Parabacteroides distasonis
ATCC 8503 (PDB entry 3taw; Joint Center for Structural
Genomics, unpublished work) and BT_4094 from B. thetaiotaomicron
VPI-5482 (PDB entry 3r67; Joint Center for Structural
Genomics, unpublished work), and (ii) Tm1225 from
Thermotoga maritima MSB8 (PDB entry 1vkd; Joint Center
for Structural Genomics, unpublished work), which belongs to
the GH130_2 subfamily. No function has yet been attributed
to these five proteins, thus limiting our understanding of their
structure–specificity relationships. Until now, nothing has been
established regarding the molecular bases of the relaxed
specificity of the enzymes classified into the GH130_2 subfamily.
In addition, no structural feature has been identified to explain
the efficiency of Uhgb_MP and Bt1033 in binding and
breaking down N-glycan core oligosaccharides.
1336 Ladeveze et al. · N-Glycan processing by mannoside phosphorylase Acta Cryst. (2015). D71, 1335–1346
Here, we present the first crystal structure of an N-glycan
phosphorolytic enzyme, Uhgb_MP, solved by X-ray crystallo-
graphy in complex with inorganic phosphate, mannose and
N-acetylglucosamine. This study made it possible to revise the
previously published three-dimensional model of Uhgb_MP
and provides key information to understand its catalytic
mechanism. Comparative analysis of this new tertiary and
quaternary structure with other GH130 structures allowed us
to identify structural features specific to GH130 subfamilies
that could explain their functional specificities and hence their
key role in mannose foraging in the human gut. This work
therefore paves the way for enzyme optimization by rational
engineering to fit industrial needs as well as for the design of
specific inhibitors to investigate, and potentially to control,
interactions between host and gut microbes.
2. Materials and methods
2.1. Recombinant production of Uhgb_MP and enzyme
purification
Uhgb_MP was produced in Escherichia coli BL21-AI cells
(Invitrogen) after its encoding gene had been cloned into the
pET-28a vector, yielding an N-terminally hexahistidine-tagged
protein (detailed procedures are provided as Supporting
Information). After purification by His-tag affinity chroma-
tography and gel filtration, the enzyme was stored in 20 mM
potassium phosphate pH 7.0, 150 mM NaCl (see Supporting
Information).
2.2. Activity measurements
Phosphorolytic activity was assessed using two substrates,
pNP-β-D-mannopyranose and β-D-mannopyranosyl-1,4-D-
mannose. All reactions were carried out with 0.1 mg ml⁻¹
purified enzyme at 37°C (the optimal temperature for
Uhgb_MP) in 20 mM Tris–HCl pH 7.0 (the optimal pH for
Uhgb_MP). For measurement of the activity in the presence of
10 mM inorganic phosphate and 1 mM pNP-β-D-mannopyranose,
the pNP release rate was monitored at 405 nm using a
Cary-100 UV–visible spectrophotometer (Agilent Technologies).
The release rate of α-D-mannopyranose-1-phosphate
from 10 mM inorganic phosphate and 10 mM β-D-mannobiose
(Megazyme, Ireland) was measured by quantification of α-D-
mannopyranose-1-phosphate using high-performance anion-
exchange chromatography with pulsed amperometric detec-
tion (HPAEC-PAD) as described previously (Ladeveze et al.,
2013).
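As an illustration of how a pNP release rate recorded at 405 nm can be converted into a specific activity, the following minimal sketch applies the Beer–Lambert law. It is not the authors' procedure: the function name and all numerical inputs in the example are hypothetical, and the extinction coefficient of pNP depends on pH and must be determined under the actual assay conditions.

```python
def specific_activity(slope_abs_per_min, epsilon_per_mM_cm, path_cm, enzyme_mg_per_ml):
    """Return a specific activity in umol min^-1 mg^-1.

    slope_abs_per_min : initial slope of A405 versus time (absorbance units per min)
    epsilon_per_mM_cm : extinction coefficient of pNP (mM^-1 cm^-1); an assumed
                        input that has to be calibrated for the assay buffer
    path_cm           : optical path length of the cuvette (cm)
    enzyme_mg_per_ml  : enzyme concentration in the assay (mg ml^-1)
    """
    if min(epsilon_per_mM_cm, path_cm, enzyme_mg_per_ml) <= 0:
        raise ValueError("epsilon, path length and enzyme concentration must be positive")
    # Beer-Lambert: A = epsilon * l * c, so dc/dt = (dA/dt) / (epsilon * l).
    # 1 mM equals 1 umol ml^-1, so the rate is in umol ml^-1 min^-1.
    rate_umol_per_ml_min = slope_abs_per_min / (epsilon_per_mM_cm * path_cm)
    # Dividing by mg ml^-1 of enzyme gives umol min^-1 mg^-1.
    return rate_umol_per_ml_min / enzyme_mg_per_ml


if __name__ == "__main__":
    # Hypothetical slope and extinction coefficient; 0.1 mg ml^-1 enzyme as in the assay.
    print(specific_activity(0.01, 10.0, 1.0, 0.1))
```

Only the enzyme concentration in the example comes from the text; the other values are placeholders.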
2.3. Size-exclusion chromatography multi-angle laser light
scattering (SEC-MALLS) experiments
A 30 µl sample of gel-filtered Uhgb_MP at a concentration
of 6 mg ml⁻¹ in 20 mM potassium phosphate pH 7.0, 150 mM
NaCl was loaded onto a Superdex 200 HR 10/300 column (GE
Healthcare, Massy, France) using an Agilent 1260 Infinity LC
chromatographic system (Agilent Technologies) coupled to a
multi-angle laser light scattering (MALLS) detection system.
The protein was centrifuged for 5 min at 4°C at 10 000g before
the sample was loaded. The column was equilibrated with a
0.1 µm filtered buffer composed of 20 mM potassium phosphate
pH 7.0, 150 mM NaCl. Separation was performed at a
flow rate of 0.4 ml min⁻¹ at 15°C. Data were collected using a
DAWN HELEOS 8+ (eight-angle) light-scattering detector
and an Optilab T-rEX refractive-index detector (Wyatt
Technology, Toulouse, France). The results were analyzed
using the ASTRA v.6.0.2.9 software (Wyatt Technology).
2.4. Protein crystallization
Purified Uhgb_MP protein was concentrated using poly-
ethersulfone Vivaspin concentrators (Vivascience, Sartorius,
Göttingen, Germany). The concentration was determined by
measuring the absorbance at 280 nm using a NanoDrop instrument
(Wilmington, Delaware, USA). All crystallization experiments
were carried out at 12°C by the sitting-drop vapour-
diffusion method using MRC 96-well microplates (Molecular
Dimensions, Newmarket, England) and a Nanodrop ExtY
crystallization instrument (Innovadyne Technologies, Santa
Rosa, USA) to prepare 400 nl droplets. The best Uhgb_MP
crystals were obtained within a week with a 1:1 (v:v) ratio of
protein (9–12 mg ml⁻¹ in 20 mM potassium phosphate pH 7.0,
150 mM NaCl supplemented with 5 mM mannose
and/or 5 mM N-acetylglucosamine for the co-crystallization
assays) to precipitant solution [17.5–20% (w/v) polyethylene
glycol 3350, 0.175–0.2 M ammonium chloride]. Uhgb_MP
crystals grew to dimensions of 0.2 × 0.08 × 0.02 mm in a week.
They diffracted to a maximum resolution of 1.80 Å, while
those obtained by co-crystallization with mannose, N-acetyl-
glucosamine or both diffracted to maximum resolutions of
1.94, 1.60 and 1.76 Å, respectively.
2.5. Data collection and determination of the structure
X-ray experiments were carried out at 100 K. Crystals of
Uhgb_MP were soaked for a few seconds in reservoir solution
supplemented with 15%(v/v) glycerol (apo) or 15%(v/v) PEG
300 (complexes) prior to flash-cooling. Apo Uhgb_MP
diffraction data sets were collected on beamline ID23-1 at the
European Synchrotron Radiation Facility (ESRF), Grenoble,
France, while those for the complexes were collected on the
XALOC beamline at the ALBA Synchrotron, Cerdanyola del
Vallès, Spain (Juanhuix et al., 2014). The diffraction intensities
were integrated and scaled using XDS (Kabsch, 2010) and 5%
of the scaled amplitudes were randomly selected and excluded
from the refinement procedure. All crystals belonged to the
orthorhombic space group P2₁2₁2₁, with six molecules per
asymmetric unit, giving Matthews coefficients of 2.22 and
2.11 Å³ Da⁻¹ and solvent contents of 44 and 42% for the apo
forms and the three complexes, respectively. The structures
were solved by the molecular-replacement method using
Phaser (McCoy et al., 2007) from the CCP4 software suite
(Potterton et al., 2003) and chain A of the crystal structure of
Tm1225 from T. maritima MSB8 (PDB entry 1vkd) as a search
model for the apo form. The final translation-function Z-score
was 42.8 and the R and Rfree values of the refined structure
were 0.155 and 0.190, respectively. Once solved, the apo
structure was then used to solve the protein–ligand structures.
The structures of Uhgb_MP in complex with mannose, with
N-acetylglucosamine and with mannose and N-acetyl-
glucosamine were refined to final R/Rfree values of 0.150/0.193,
0.154/0.183 and 0.158/0.192, respectively, using REFMAC5
(Murshudov et al., 2011). Models were built manually in
σA-weighted electron-density maps using Coot (Emsley &
Cowtan, 2004). Water molecules were manually checked after
automatic assignment and ligand molecules were manually
fitted in residual maps. Refinement statistics are listed in
Table 1.
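The Matthews coefficients and solvent contents quoted above follow from the unit-cell volume, the number of molecules in the cell and the subunit mass. A minimal sketch of this arithmetic is given below, assuming the common approximation that the solvent fraction is 1 − 1.23/V_M. The function names are illustrative, and the subunit mass of 39 270 Da is an assumption chosen so that the apo cell gives the reported value of about 2.22 Å³ Da⁻¹; it is not stated in the text.

```python
def matthews_coefficient(a, b, c, n_symops, n_mol_asu, mw_da):
    """V_M in A^3 Da^-1 for an orthorhombic cell (all angles 90 degrees).

    a, b, c   : unit-cell lengths in Angstrom
    n_symops  : number of symmetry operators of the space group (4 for P2(1)2(1)2(1))
    n_mol_asu : number of molecules per asymmetric unit
    mw_da     : molecular mass of one molecule in Da (assumed value in the example)
    """
    if min(a, b, c, n_symops, n_mol_asu, mw_da) <= 0:
        raise ValueError("all inputs must be positive")
    cell_volume = a * b * c  # valid only because all cell angles are 90 degrees
    return cell_volume / (n_symops * n_mol_asu * mw_da)


def solvent_fraction(v_m, protein_term=1.23):
    """Approximate solvent fraction, 1 - 1.23 / V_M."""
    if v_m <= 0:
        raise ValueError("V_M must be positive")
    return 1.0 - protein_term / v_m


if __name__ == "__main__":
    # Apo cell from Table 1; the subunit mass is an assumption (see lead-in).
    v_m = matthews_coefficient(84.1, 141.2, 176.2, 4, 6, 39270.0)
    print(round(v_m, 2), round(100.0 * solvent_fraction(v_m), 1))
```

Values obtained this way may differ in the last digit from those reported by crystallographic software, which can use a slightly different protein density term.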
2.6. SAXS measurements
Small-angle X-ray scattering (SAXS) experiments were
performed on the SWING beamline at the SOLEIL
synchrotron, Gif-sur-Yvette, France. The wavelength was set
to 1.033 Å. A 17 × 17 cm Aviex CCD detector was positioned
1800 mm from the sample, with the direct beam off-centred.
The resulting exploitable q-range was 0.006–0.6 Å⁻¹, where q
= 4π sin θ/λ, considering 2θ as the scattering angle. The samples
were circulated in a thermostated quartz capillary with a
diameter of 1.5 mm and 10 µm wall thickness positioned inside
a vacuum chamber. An 80 µl volume of sample was injected
onto a size-exclusion column (Bio SEC3 300, Agilent) equilibrated
in phosphate-based buffer (20 mM potassium phosphate
pH 7.0, 150 mM NaCl) or Tris-based buffer (20 mM
Tris–HCl pH 7.0, 300 mM NaCl supplemented with 10%
glycerol) using an Agilent high-performance liquid-chromatography
(HPLC) system and eluted directly into the SAXS
capillary cell at a flow rate of 200 µl min⁻¹ at a temperature of
10°C. Samples were separated from the pushing liquid (water)
by two air volumes of 6 µl each, as described previously
(David & Perez, 2009). SAXS data were collected online
throughout the elution time and a total of 149 frames, each
lasting 2 s, were recorded separated by a dead time of 0.5 s
between frames. The transmitted intensity was continuously
measured with an accuracy of 0.1% using a diode embedded in
the beam stop. For each sample, the stability of the associated
radius of gyration and the global curve shape in the frames
corresponding to the main elution peak were checked, and the
resulting selection of curves were averaged as described
previously (David & Perez, 2009). The recorded curves were
normalized to the transmitted intensity and subsequently
averaged using Foxtrot, a dedicated in-house application. The
same protocol was applied to buffer scattering. Rg values were
determined by a Guinier fit of the one-dimensional curves
using the ATSAS package (Petoukhov et al., 2007). The P(r)
function was calculated using the GNOM program and the
corresponding ab initio envelopes were calculated using the
GASBOR program. Rigid-body SAXS modelling was
performed using the CORAL program.
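To make the Guinier analysis mentioned above concrete, the sketch below computes q from the scattering angle and estimates Rg and I(0) by a linear least-squares fit of ln I(q) against q², following the Guinier approximation ln I(q) = ln I(0) − q²Rg²/3. It is a generic stand-in written for illustration, not the ATSAS implementation used by the authors; the function names are hypothetical, and the caller is assumed to have already restricted the data to the low-q region where the approximation holds (commonly q·Rg below about 1.3).

```python
import math


def scattering_vector(two_theta_deg, wavelength_angstrom):
    """q = 4*pi*sin(theta)/lambda, with 2*theta the scattering angle in degrees."""
    theta_rad = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta_rad) / wavelength_angstrom


def guinier_fit(q_values, intensities):
    """Return (Rg, I0) from a straight-line fit of ln I versus q^2."""
    if len(q_values) != len(intensities) or len(q_values) < 2:
        raise ValueError("need at least two matching (q, I) points")
    x = [q * q for q in q_values]
    y = [math.log(i) for i in intensities]  # intensities must be positive
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                       # equals -Rg^2 / 3
    intercept = mean_y - slope * mean_x     # equals ln I(0)
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; no Rg can be derived")
    return math.sqrt(-3.0 * slope), math.exp(intercept)


if __name__ == "__main__":
    # Synthetic, noise-free curve for a hypothetical particle with Rg = 40 A.
    qs = [0.006 + 0.001 * k for k in range(25)]
    curve = [100.0 * math.exp(-(q * 40.0) ** 2 / 3.0) for q in qs]
    print(guinier_fit(qs, curve))
```

With real data the fit range has to be chosen iteratively, since the upper q limit depends on the Rg being estimated.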
Table 1
Data-collection and refinement statistics for Uhgb_MP.
Values in parentheses are for the outer resolution shell.

                                      Native (Pi)              Mannose + Pi             N-Acetylglucosamine + Pi   Mannose + N-acetylglucosamine + Pi
Data collection
Space group                           P2₁2₁2₁                  P2₁2₁2₁                  P2₁2₁2₁                    P2₁2₁2₁
Unit-cell lengths a, b, c (Å)         84.1, 141.2, 176.2       83.9, 140.8, 168.7       83.8, 140.9, 168.6         83.7, 140.9, 168.8
Unit-cell angles α = β = γ (°)        90                       90                       90                         90
No. of molecules in asymmetric unit   6                        6                        6                          6
Matthews coefficient (Å³ Da⁻¹)        2.22                     2.11                     2.11                       2.11
Solvent content (%)                   44.66                    41.82                    41.74                      41.75
Wavelength (Å)                        0.96863                  0.97949                  0.97949                    0.97949
Resolution range (Å)                  48.16–1.80 (1.91–1.80)   75.11–1.94 (1.98–1.94)   46.68–1.60 (1.64–1.60)     45.43–1.76 (1.80–1.76)
No. of unique reflections             190253 (27574)           148300 (21410)           261650 (41727)             197652 (31069)
No. of observed reflections           927534 (101770)          1217594 (168817)         2453350 (392431)           1818257 (276166)
Completeness (%)                      98.00 (88.70)            99.66 (98.85)            99.84 (98.42)              99.64 (95.24)
Multiplicity                          4.88 (3.69)              8.20 (7.90)              9.37 (7.00)                9.19 (8.88)
⟨I/σ(I)⟩                              10.46 (1.63)             9.30 (3.40)              14.49 (2.07)               13.70 (2.54)
Rmerge (%)                            9.1 (70.0)               15.5 (58.9)              9.3 (103.5)                11.9 (86.5)
Refinement
Rwork/Rfree                           0.157/0.190              0.152/0.193              0.155/0.183                0.158/0.192
Root-mean-square deviations
  Bond lengths (Å)                    0.0188                   0.0181                   0.0191                     0.0183
  Bond angles (°)                     1.9218                   1.8528                   1.8905                     1.9235
Ramachandran plot
  Favoured (%)                        91.6                     91.3                     91.3                       91.1
  Allowed (%)                         8.1                      7.9                      8.5                        8.6
B factors (Å²)
  Wilson B                            24                       18                       22                         22
  Mean                                35                       20                       22                         21
  Main chain                          33                       18                       20                         19
  Side chain                          37                       21                       24                         23
  Ligand/water                        28/37                    23/26                    26/30                      27/26
PDB code                              4udi                     4udj                     4udg                       4udk
3. Results and discussion
3.1. Conformational stability optimization of Uhgb_MP
Previous work on Uhgb_MP allowed the production and
purification of a recombinant form of the protein in amounts
suitable for crystallization (Ladeveze et al., 2013). However,
owing to enzyme instability, an optimized production system
was set up by subcloning the open reading frame of Uhgb_MP
into the pET-28a vector (Supporting Information §S1).
Subcloning into pET-28a made it possible to produce a
recombinant protein with a thrombin-cleavable N-terminal
hexahistidine tag and a five-residue shortened linker between
the tag and the N-terminal extremity of the native enzyme.
After protein purification and processing in the same buffer as
previously described (Ladeveze et al., 2013), the activity on
pNP-β-D-mannopyranose was increased by 73% to 10.9 ×
10⁻³ µmol min⁻¹ mg⁻¹, indicating that the 16-amino-acid
linker used in the initial construct had a negative impact on
Uhgb_MP activity. To avoid the use of Tween 80, which is not
suitable for crystallization assays, we then screened for an
optimized buffer composition by differential scanning fluori-
metry (DSF; Supporting Information §S1). In-house-prepared
96 deep-well screens adapted from Ericsson et al. (2006) were
used to assess the effect of buffer nature, pH and NaCl
concentration on protein thermal stability. The denaturation
curves revealed two melting temperatures, Tm1 = 65.7°C and
Tm2 = 70.8°C, in the initial Tris–HCl buffer, indicating that
Uhgb_MP may adopt different oligomeric states in solution.
The DSF results showed unambiguously that phosphate-based
buffers (sodium and potassium phosphate) largely stabilize
the Uhgb_MP structure at all of the assayed pH values (5.0,
5.5, 6.0, 6.5 and 7.0) and NaCl concentrations (136, 159, 287
and 439 mM). The best result was observed using 100 mM
potassium phosphate pH 6.0 with 136 mM NaCl, leading to an
increase in Tm1 and Tm2 of 6.82 ± 0.07 and 8.47 ± 0.07°C,
respectively. We therefore chose to use potassium phosphate
buffer supplemented with 150 mM NaCl to purify and store
the protein produced using pET-28a::Uhgb_MP. In addition,
the pH was set to 7.0 to allow sufficient separation efficiency
of the protein in the affinity-purification step (Tm1 and Tm2
increased by 4.80 ± 0.30 and 6.45 ± 0.30°C, respectively, at this
pH value). Under these optimal conditions, the protein
production yield reached 90 mg pure protein per litre of
culture. Finally, the β-1,4-D-mannobiose phosphorolytic
activity of the enzyme stored in these conditions was increased
tenfold compared with that of enzyme previously expressed in
pDEST17 and purified in Tris buffer (Ladeveze et al., 2013).
3.2. Crystallographic structure of Uhgb_MP subunits
The overall structure of Uhgb_MP was determined by
molecular replacement using the structure of Tm1225 from
T. maritima MSB8 (PDB entry 1vkd), which shares 60%
identity with Uhgb_MP, as a model. The crystal structure of
Figure 1
Monomeric Uhgb_MP fold, substrates and interacting residues. Like
other GH130 enzymes, the Uhgb_MP monomer has a five-bladed
β-propeller fold with a central catalytic furrow. The Pi, mannose and
N-acetylglucosamine molecules present at the catalytic site are shown as
sticks, while interacting residues are shown as lines. The catalytic Asp104
is shown in red, Pi-interacting residues in green, mannose-interacting
residues in blue and N-acetylglucosamine-interacting residues in orange.
The Asn44 and Asp104 side chains are shown in the B conformation, i.e.
the conformation that is catalytically active. Water molecules 436, 457 and
656, which mediate interactions between the N-acetylglucosamine and
residues Lys212 and Tyr242, Asp304 and His235, respectively, are shown
as black crosses. Protein–Pi interactions: NH2 of Arg150 is contacting Pi
O4 and O2, while NH2 of Asn151 contacts Pi O4. ωNH2 of the Arg168 side
chain binds to Pi O1 and its ω′NH2 is interacting with Pi O2. The side-chain
amine of Lys212 binds to Pi O2 and His231 Nε2 is at a hydrogen-bond
distance from Pi O1. Finally, the hydroxyl of the Tyr242 side chain is in
contact with Pi O3.
Figure 2
Alternative conformations of Asn44, Ser45 and the catalytic Asp104
upon mannose binding in the −1 subsite. Superposition of the apo
structure of Uhgb_MP (PDB entry 4udi) and Uhgb_MP complexed with
mannose (PDB entry 4udj), illustrating the movement of the catalytic
residues when mannose is bound at the −1 subsite. The backbone of the
apo form is shown in grey (A conformation), while the backbone of the
complexed, catalytically active form (B conformation) is in green. Water,
phosphate and glycerol molecules in the apo form are shown. 2Fo − Fc
electron-density maps are shown (contoured at 1.0σ) for the catalytic
residues in the apo and mannose-bound forms. Interatomic distances are
labelled in Å.
Uhgb_MP revealed a homohexameric organization. The
homohexamer consisted of a trimer of dimers with D3
symmetry, with six molecules per asymmetric unit. The apo
structure was refined to 1.80 Å resolution, while the
complexes with mannose, with N-acetylglucosamine and with
the two combined were refined to 1.94, 1.60 and 1.76 Å,
respectively (Table 1). The electron-density maps did not
enable construction of the N-terminal extremity of the polypeptide
chains. The N-terminal hexahistidine tag and the
first 6–8 residues following it have thus been omitted from the
final model. The overall fold of each Uhgb_MP protomer
consists of a five-bladed β-propeller (Fig. 1). The catalytic
centre is located in the central cleft as previously hypothesized
(Ladeveze et al., 2013), as a phosphate ion (Pi) and mannose
and N-acetylglucosamine were observed in the central furrow.
Pi is deeply buried in the catalytic site, strongly stabilized by
hydrogen bonds and ionic interactions with the surrounding
residues (Fig. 1). Compared with other glycoside phosphor-
ylases, Pi appeared to be quite strongly bound, since the Pi
dissociation constant values previously determined for the
ternary enzyme–Pi–mannobiose and enzyme–Pi–β-D-Manp-
1,4-β-D-GlcpNAc-1,4-D-GlcpNAc complexes (0.64 and
0.13 mM, respectively; Ladeveze et al., 2013) are more than
200 times lower than that determined for RaMP1 (belonging
to the GH130_1 subfamily), the only other GH130 enzyme for
which a Pi dissociation constant has been determined. In the
apo structure, a molecule of glycerol, which was used as a
cryoprotectant, occupied the −1 subsite. This glycerol molecule
was hydrogen-bonded to the side-chain carboxylate
moiety of Asp304 (O1 to O3 and O2 to O1), while its O2 atom
was hydrogen-bonded to the Pi, mimicking the interactions
between mannose and the surrounding amino acids that were
observed in the protein–ligand complexes. Interestingly, in the
Uhgb_MP structures where binding of mannose in the −1
subsite occurred (PDB entries 4udj and 4udk in Table 1), the
mannose ring was found in a stressed B2,5 boat conformation
stabilized by hydrogen bonding to Asp304 (O1 to the C6
hydroxyl and O2 to the C4 hydroxyl); the mutation of this
critical residue to asparagine abolishes 96% of the activity
(Ladeveze et al., 2013). This unusual conformation of mannose
was present in all six chains (Supplementary Fig. S1) and was
previously observed in the BfMGP structures (PDB entry
3was, for which in cristallo activity was observed, and PDB
entry 3wat). It should be noted that this B2,5 boat conformation
is less unstable for mannose than for other monosaccharides
owing to the pseudo-equatorial position of the C2–OH,
which is in an anti configuration to the ring O atom, thus
bending the C3–OH towards the β-glycosidic O atom, in a syn
axial position. The binding of mannose in the −1 subsite
induced a large conformational movement in the active site.
Indeed, in the apo form, where a glycerol molecule was
observed in place of mannose, the amino moiety of Asn44
interacts with the C3 hydroxyl of glycerol through a water
molecule (conformation A in Fig. 2). Upon mannose binding
(conformation B), the peptide bond between Phe43 and
Asn44 flips in order to allow direct interaction of the Asn44
side chain with the C3 and C4 hydroxyls of the sugar. The side
chain of the catalytic residue Asp104, mutation of which to
asparagine completely abolished the activity (Ladeveze et al.,
2013), is also moved towards mannose in a position that is
occupied by two water molecules in the apo form. In this B
conformation, Asn44 stabilizes the catalytic Asp104 through
hydrogen bonding, thereby imposing selection of the rotamer
facing the mannose C3 hydroxyl, which acts as a proton relay
during catalysis (Fig. 2). This is the first time that such a
concerted movement upon substrate binding in the −1 subsite
has been reported for a glycoside phosphorylase. It must be
noted that the B conformation of the catalytic Asp104 is
probably the one that is active since it has also been observed
for BfMGP, which was demonstrated to be catalytically active
in cristallo (Nakae et al., 2013). The A conformation of the
catalytic residue Asp104 of Uhgb_MP was observed in the
structure of the GH130_2 Tm1225 protein in the apo form
(PDB entry 1vkd). In the Uhgb_MP structures, the B
conformation was only observed when mannose was bound in
the −1 subsite, and is therefore independent of the presence
of N-acetylglucosamine in the +1 subsite. Indeed, the A
conformation was observed in the complex with N-acetyl-
glucosamine alone (PDB entry 4udg, with glycerol in the −1
subsite and N-acetylglucosamine in the +1 subsite), while the
B conformation was observed in the complex with mannose
and N-acetylglucosamine (PDB entry 4udk, with mannose in
the −1 subsite and N-acetylglucosamine in the +1 subsite). In
the A and B conformations, N-acetylglucosamine bound in the
+1 subsite was found in a ⁴C₁ relaxed chair conformation,
stacked with Tyr103, and bound through hydrogen bonding to
the C6 hydroxyl group, the His174 imidazole ring, the Lys212
side chain and the Tyr242 hydroxyl group via a water mole-
cule. The C3 hydroxyl interacts with the Arg59 side-chain
amine moiety, as well as with Asp304 O1 through a water
molecule. The N-acetyl moiety is also involved in binding
through hydrogen bonding between its NH group and the S
atom of Met67 and between its carbonyl moiety and the
His235 carbonyl via a water molecule. The hydrophobic
methyl moiety faces the side chain of Ala207 and Met67, with
these residues forming a hydrophobic pocket (Fig. 1).
These data enabled us to revise our previous Uhgb_MP
model, which was built using the atomic coordinates of the
Tm1225 protein from T. maritima MSB8 (PDB entry 1vkd) as
a structural template, considering a monomeric form of the
enzyme and using geometrical constraints provided by a
classical inverting GH-like single-displacement mechanism. In
this model, we previously hypothesized a +1 subsite formed by
the Tyr103, Asp304, His174, Tyr240 and Phe283 residues,
which are specifically conserved in the GH130_2 subfamily, while
the +2 subsite would be delineated by Tyr242, Pro279, Asn280
and Asp304. In this configuration, the exit of the catalytic
tunnel would be orientated towards the inside of the oligo-
meric structure, thereby reducing access to the catalytic site. In
addition, the conserved His235–Tyr240 loop from a cognate
monomer would block the furrow that we have hypothesized.
These new data emphasize the importance of taking into
account the quaternary organization when modelling oligo-
meric enzymes, using SAXS data when possible to define the
low-resolution envelope, and of not over-restraining the possible
docking modes of substrates, so that unconventional
reaction mechanisms can still be envisaged. Here, thanks to the
high-resolution crystallographic structures of hexameric Uhgb_MP, which is
catalytically active (as detailed in the next section), in complex
with mannose and N-acetylglucosamine, we propose a revised
active-site topology in which the oligosaccharide chain is rotated
by 180°, inverting the −1 and +1 subsite positions. This new
orientation allows the substrate to enter from the open side of
the tunnel, and is in complete accordance with the orientation
found in the crystal structure of BfMGP in complex with its
substrates (Nakae et al., 2013).
As previously shown by kinetic analysis of Uhgb_MP,
phosphorolysis of the N-glycan oligosaccharide core follows a
mixed-type sequential random Bi-Bi mechanism (Ladeveze et
al., 2013). However, the order of substrate binding was not
determined. Functional and structural data now lead us to
suggest that the phosphorolytic catalytic mechanism is
composed of a first step in which the inorganic phosphate is
conveyed to the catalytic centre, followed by entry of the
substrate to be phosphorolyzed. Indeed, the Pi binding site is
located deeper in the catalytic site than the glycoside
substrate, meaning that Pi could not bind after the disaccharide.
Thus, mannose binding in the −1 subsite would
induce a flip of the Phe43–Asn44 peptide bond from confor-
mation A to conformation B, thus maintaining the side chain
of Asp104 in a catalytically competent configuration. In
reverse phosphorolysis, entry of mannose-1-phosphate would
be the first step, followed by conformational changes of Phe43,
Asn44, Ser45 and Asp104. Entrance of the acceptor would
lead to the reverse phosphorolysis reaction. Regarding the
phosphorolytic reaction itself, our data show that when the
mannosyl moiety was present in the catalytic site, no water
molecule was located where it could relay the proton from the
catalytic Asp104 to the interosidic O atom. In addition,
comparison between the apo and complexed forms showed
that the change in Asp104 from conformation A to confor-
mation B did not allow the catalytic aspartic acid to be at a
hydrogen-bonding distance from the interosidic O atom,
indicating that this residue is not directly involved in proton
transfer to the interosidic O atom, as seen in the BfMGP
structures. In the latter case, Nakae and coworkers suggested a
catalytic mechanism different from that of known inverting
glycoside phosphorylases (GPs), involving the assistance of
C3—OH to relay proton transfer, because, like us, they did not
observe a water molecule at the catalytic site in any of their
structures (Supplementary Fig. S2). The stressed B2,5 boat
conformation of mannose bound in the −1 subsite and the B
conformation of Asp104, which is only stabilized when
mannose is present in the −1 subsite, are compatible with such a
mechanism. Phosphorolysis would thus involve (i)
nucleophilic attack of Pi on C1 of the mannosyl moiety bound
in the −1 subsite and (ii) a two-step protonation through
Asp104 O2–Man O3–GlcNAc O4, with the Asp104 side-chain
carboxylic acid being located at 2.5 Å from the C3–OH
(Supplementary Fig. S2). However, even if the catalytic
mechanism of Uhgb_MP and BfMGP appears to be identical,
clear differences in the substrate specificity of GH130
Figure 3
Uhgb_MP homohexameric structure. (a) Uhgb_MP is a hexameric structure
formed by a trimer of dimers. Individual monomers are shown in a single
colour and are labelled A, B, C, A′, B′ and C′ for clarity. Each pair of
dimers is coloured in pale/dark blue, green and red. (b) Close-up of the
catalytic tunnel of a single Uhgb_MP protomer in the hexameric structure.
The quaternary-structure assembly imposes structural constraints on
active-site accessibility. The inorganic phosphate, the mannose and the
N-acetylglucosamine molecules present in the catalytic site of protomer A
are shown as sticks deeply buried in the catalytic site, which is
accessible by a tunnel whose sides are formed by different protomers. The
extremity of the tunnel is located at the centre of the plane formed by
four Uhgb_MP molecules, in this case A, A′, B and B′.
subfamilies have been identified owing to structural motifs
located in other parts of the protein, as further detailed below.
3.3. Uhgb_MP quaternary structure
In the Uhgb_MP crystal structure, each subunit of the
homohexamer is roughly globular in shape. There is a large
cavity inside the homohexamer and large holes at the centre of
each of the three lateral planes formed by the homohexamer
assembly (Fig. 3a). No discretely bound solvent molecules
were found in these cavities, which are 15 Å in diameter at
their narrowest point, indicating that these holes are large
enough to enable substrate access to the active site of each
protomer. The association of the surrounding subunits in the
homohexamer caps the furrow of each protomer, giving rise to
a funnel whose entrance is orientated towards the lateral
aperture (Fig. 3b). The catalytic site is therefore deeply buried,
with the phosphate ion and the −1 subsite located at the
bottom of the tunnel.
Each dimer is formed through a large buried surface area of
1300 Å² involving two Uhgb_MP molecules related by twofold
symmetry (Fig. 4a). The interactions promoting dimer association
involve the side chains of the residues at the interface,
such as His123 and His196 stacking, and several hydrogen
bonds or salt bridges involving side-chain atoms, such as
between Arg195 and Gln142, Glu142 and Arg193, Asp93 and
Arg195, and Glu189 and Arg193. Other polar interactions
involve main-chain atoms of Ala144 and His194, Tyr122 and
His196, and Trp191 and a symmetry mate. The homohexamer
is formed by the association of three dimers arranged around a
pseudo-threefold axis. Each dimer is related to its neighbours
through symmetrical interactions involving each of the two
protomers (covering an interaction surface of 840 Å² each;
Fig. 4b). These interactions involve T-shaped stacking between
the imidazole groups of His174 and His235 and hydrogen-bond
interactions between the side chains of Asn40 and
Tyr264 and between those of Ser64 and Pro263. Main-chain carbonyl
and amino groups are also involved in the assembly. More
precisely, the Thr63 carbonyl is in contact with Tyr264 NH,
while the side chains of Asn238 and Asn280 interact with the
carbonyl group of Gly276 and the amide N atom of Tyr240,
respectively. Finally, the side chain of Asn238 makes contact
with the Pro279 carbonyl moiety.
SEC-MALLS and SAXS analysis confirmed the hexameric
organization of Uhgb_MP in solution. The protein apparent
molecular mass determined by SEC-MALLS was 240 kDa
(n = 6.16; Supplementary Fig. S3). Guinier analysis of the
SAXS data revealed that in phosphate and Tris–glycerol
buffers, the radius of gyration (Rg) was considerably larger
than the theoretical Rg, indicating protein aggregation. Based
on data collected in Tris–glycerol buffer in the presence of
1 mM TCEP as reducing agent, an Rg value of 37.9 ± 0.08 Å
was obtained, which is in good agreement with the theoretical
Rg calculated from the apo crystal structure (~37 Å). The pair
distribution function P(r) revealed a compact particle with a
Dmax of ~110 Å that closely matches the largest dimension of
Figure 4. Interaction surfaces between the different Uhgb_MP protomers. (a) The interaction surfaces between two Uhgb_MP protomers involved in dimer formation. (b) The interaction surfaces between Uhgb_MP dimers to form the hexamer. Only the interactions involving the three upper monomers from each dimer are shown here, in order to clarify the view, as these surfaces are symmetrical in the lower monomers.
the hexamer (Supplementary Fig. S4). The ab initio envelope
confirmed the compact shape of Uhgb_MP in solution with a
trimer-of-dimers organization superimposable with the crystallographic
structure. SAXS-based rigid-body modelling was
attempted by taking into account the flexibility of the
N-terminal residues missing from the crystal structure. The fit
showed that the theoretical curve closely matched the
experimental data, thus confirming the general hexameric
organization of Uhgb_MP in solution (Supplementary Fig.
S4).
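
The solution-state figures quoted above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only: it assumes that n is the ratio of the apparent molecular mass to the monomer mass, so the monomer mass it derives is back-calculated from the quoted numbers rather than a value reported here, and it takes the approximate theoretical Rg of 37 Å at face value. The function names are hypothetical.

```python
# Values quoted in the text for Uhgb_MP in solution.
APPARENT_MASS_KDA = 240.0   # SEC-MALLS apparent molecular mass
N_REPORTED = 6.16           # reported number of subunits
RG_EXPERIMENTAL = 37.9      # SAXS radius of gyration (angstrom)
RG_THEORETICAL = 37.0       # approximate value from the apo crystal structure


def implied_monomer_mass(apparent_mass_kda, n):
    """Monomer mass implied by n = apparent mass / monomer mass (assumption)."""
    return apparent_mass_kda / n


def nearest_oligomeric_state(n):
    """Nearest integer number of subunits."""
    return round(n)


def relative_deviation(observed, expected):
    """Signed relative deviation of an observed value from an expected one."""
    return (observed - expected) / expected


monomer_kda = implied_monomer_mass(APPARENT_MASS_KDA, N_REPORTED)
subunits = nearest_oligomeric_state(N_REPORTED)
rg_deviation = relative_deviation(RG_EXPERIMENTAL, RG_THEORETICAL)

print(f"implied monomer mass: {monomer_kda:.2f} kDa")
print(f"nearest oligomeric state: {subunits}")
print(f"Rg deviation from crystal structure: {rg_deviation:.1%}")
```

Under these assumptions the reported n rounds to a hexamer, and the experimental Rg differs from the crystal-structure value by only a few percent.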
Considering that (i) we never observed the existence of
Uhgb_MP monomers, either in solution or in crystals, (ii) the
quaternary structures deduced from both SAXS and crystallographic
data are superimposable and (iii) the intracellular Pi
concentration in bacteria is around 10 mM (Motomura et al.,
2011), meaning that Pi binds to Uhgb_MP in vivo as in the
crystals, we conclude that hexamerization is required for
enzyme activity and that the crystal structure presented here is
most probably the organization adopted under physiological
conditions.
All other known GPs belonging to the GH3, GH13, GH65,
GH94, GH112, GT4 and GT35 families crystallize and act as
homodimers. In contrast, it is difficult to find general features
that control the oligomerization of GH130 enzymes, even for
those belonging to the same subfamily. Indeed, all of the data
that we gathered on functionally or structurally characterized
GH130 enzymes showed that no single subfamily contained
homogeneous oligomerization profiles. Enzymes belonging to
the GH130_NC group (the BACOVA_03624, BACOVA_02161,
BT4094 and BDI_3141 proteins) crystallized as
monomers, while the two functionally characterized proteins
Teth514_1788 and Teth514_01789 have been shown to be
dimeric and monomeric in solution, respectively (Chiku et al.,
2014). Various oligomeric forms (in solution or crystals) have
been found in the GH130_1 (hexameric BfMGP, pentameric
RmMGP and dimeric RaMP1) and GH130_2 (hexameric
Uhgb_MP, dimeric Tm1225, hexameric RaMP2 and tetrameric
Bt1033) subfamilies.
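
The lack of a homogeneous oligomerization profile can be made explicit by tabulating the states listed above. The Python sketch below is a minimal illustration, not part of the original analysis. It assumes that the two Teth514 proteins belong with the GH130_NC group, as the sentence structure suggests, and it does not distinguish states observed in crystals from those observed in solution. The names `OBSERVATIONS` and `states_by_subfamily` are hypothetical.

```python
from collections import defaultdict

# (enzyme, subfamily, oligomeric state) as listed in the text.
# Grouping the Teth514 proteins under GH130_NC is an assumption.
OBSERVATIONS = [
    ("BACOVA_03624", "GH130_NC", "monomer"),
    ("BACOVA_02161", "GH130_NC", "monomer"),
    ("BT4094", "GH130_NC", "monomer"),
    ("BDI_3141", "GH130_NC", "monomer"),
    ("Teth514_1788", "GH130_NC", "dimer"),
    ("Teth514_01789", "GH130_NC", "monomer"),
    ("BfMGP", "GH130_1", "hexamer"),
    ("RmMGP", "GH130_1", "pentamer"),
    ("RaMP1", "GH130_1", "dimer"),
    ("Uhgb_MP", "GH130_2", "hexamer"),
    ("Tm1225", "GH130_2", "dimer"),
    ("RaMP2", "GH130_2", "hexamer"),
    ("Bt1033", "GH130_2", "tetramer"),
]


def states_by_subfamily(observations):
    """Collect the set of distinct oligomeric states seen in each subfamily."""
    states = defaultdict(set)
    for _enzyme, subfamily, state in observations:
        states[subfamily].add(state)
    return dict(states)


states = states_by_subfamily(OBSERVATIONS)
for subfamily, observed in sorted(states.items()):
    print(subfamily, sorted(observed))

# True when every subfamily shows more than one oligomeric state.
no_homogeneous_subfamily = all(len(s) > 1 for s in states.values())
print("no homogeneous subfamily:", no_homogeneous_subfamily)
```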
Moreover, the AxxxAxxxA motif in the
BfMGP N-terminal helix, which was thought to mediate
oligomerization (Nakae et al., 2013), was not found in RaMP1
or RmMGP, demonstrating that this particular motif is not the
only element that is able to promote the formation of oligomers
in enzymes belonging to the same GH130 subfamily.
Finally, no particular secondary-structure element appeared to
mediate interactions between the different Uhgb_MP protomers,
which were associated only by surface interactions, in
contrast to what was observed for BfMGP (the only GP of
known structure with a similar homohexameric conformation).
Indeed, the BfMGP loop Thr42–Met68, which was
found in contact with the N-terminal tab helix and is thought
to help contact the cognate protomer, is completely absent in
Uhgb_MP, and more generally in all GH130_2 sequences.
Taken together, these data highlight the structural originality
of GH130 enzymes among glycoside phosphorylases. Nevertheless,
many more GH130 structures will need to be solved
in order to identify possible structural markers of
oligomerization.
3.4. Molecular bases of specificity towards mannosides
Uhgb_MP is the first member of the GH130_2 subfamily to
be characterized both functionally and structurally. Tm1225
has only been structurally characterized, and no structural
data are available for the functionally characterized GH130_2
members RaMP2 and Bt1033. In contrast to the known
enzymes classified into the GH130_1 subfamily (including
BfMGP, the only crystallized member of this subfamily;
Senoura et al., 2011; Nakae et al., 2013), which exhibit a narrow
Figure 5. Sequence alignment of characterized GH130 enzymes. This alignment highlights the conservation of the catalytic and substrate-interacting residues among characterized GH130 enzymes, as well as family-specific loops.
specificity towards β-D-Manp-1,4-D-Glc, GH130_2 enzymes
present a highly relaxed substrate specificity. Moreover, the
Uhgb_MP structure is the first structure of an inverting
glycoside phosphorylase that is active on a polysaccharide.
Indeed, the only enzymes of known structure that are able to
phosphorolyze polysaccharides are the retaining α-maltosyl
phosphate:α-1,4-D-glucan-4-α-D-maltosyltransferases belonging
to GH13 and glycogen or starch phosphorylases belonging to
the GT35 family (Egloff et al., 2001; Mirza et al., 2006).
Uhgb_MP structures were compared with the six structures
available for GH130 enzymes in order to identify any structural
features that could explain the enzyme specificity, in
particular towards N-glycan oligosaccharides, long manno-oligosaccharides
and mannans. The residues involved in the
catalytic machinery and in substrate binding are highly
conserved in both the sequences and the three-dimensional
structures, with the notable exception of the side chains of
Asp104, Asn44 and Ser45 in the A configuration (Fig. 5 and
Fig. 6), with overall r.m.s.d. values after Cα superposition of
2.1, 2.1, 2.1, 2.0, 2.0 and 1.0 Å for BfMGP (25% identity with
Uhgb_MP), BACOVA_03624 (23% identity), BACOVA_02161
(22% identity), Bt4094 (23% identity), BDI_3141 (25%
identity) and Tm1225 (61% identity), respectively (Fig. 6). In
addition, the Pi and glycosyl moieties in the −1 and +1 subsites
superimposed perfectly with those present in the structure of
BfMGP in complex with inorganic phosphate, mannose and
glucose. The electron density of mannose was separated from
that of N-acetylglucosamine, in contrast to what would have
been observed for the disaccharide β-D-Manp-1,4-D-GlcpNAc,
because mannose O1 and N-acetylglucosamine O4 cannot be
superimposed. However, some structural features
that are conserved in the subfamily explain the differences in
substrate specificities observed between subfamilies (Fig. 6).
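
For readers unfamiliar with the metric, the r.m.s.d. quoted above is the root-mean-square distance between matched Cα atoms once two structures have been superposed. The Python sketch below shows only that final step, under the assumption that the two coordinate lists are already superposed and paired atom by atom. It does not perform the superposition or the residue matching, and the toy coordinates are invented purely for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) coordinates that are assumed to be already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Invented toy coordinates (angstrom): the second set is the first one
# displaced by 2 angstrom along z, so every atom pair is 2 angstrom apart.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
displaced = [(x, y, z + 2.0) for x, y, z in reference]

print(f"r.m.s.d. = {rmsd(reference, displaced):.2f} A")
```

In practice the superposition itself is carried out by a structure-alignment program before this value is reported.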
The most significant structural changes were identified in
the Uhgb_MP Gly121–Pro125 loop, which is 11 residues
longer in GH130_1 enzymes compared with those belonging
to the GH130_2 and GH130_NC clusters (Fig. 6). This longer loop
appeared at the extremity of the catalytic tunnel and, in the case
of BfMGP, actually filled it. Therefore, the accommodation of
substrates longer than disaccharides would be impossible for GH130_1
enzymes, which is in accordance with the biochemical data published
to date (Senoura et al., 2011; Kawahara et al., 2012; Jaito et al.,
2014). In GH130_2 enzymes the shorter loop would enable
the entry of longer substrates, such as long manno-oligosaccharides
or even mannans for Uhgb_MP, into the large cavity formed inside the
quaternary structure between the three lateral planes of the
homohexamer. Moreover, the +1 subsite flexibility of the GH130_2
members, which are able to accommodate N-acetylglucosamine and the C2
epimer of glucose, would be explained by the location of the Uhgb_MP
Arg65 residue. Indeed, the arginine side chain is at a distance of
6.43 Å from the O2 atom compared with 2.96 Å for the side
chain of the corresponding BfMGP residue, Arg94, which would be
responsible for the specificity of the GH130_1 members for
β-D-Manp-1,4-D-Glc through hydrogen bonding to O2 of
glucose.
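
The distance argument above can be stated as a simple threshold test. The Python sketch below is a hedged illustration: it borrows the 4 Å limit that this article uses elsewhere for a hydrogen-bonding distance and applies it to the two arginine to O2 distances quoted here. It ignores bond geometry and the identity of the donor and acceptor atoms, and the function name is hypothetical.

```python
# Upper limit for a hydrogen-bonding distance (angstrom), as used
# elsewhere in this article; applied here as a crude distance-only test.
HBOND_CUTOFF = 4.0

# Arginine side chain to O2 distances quoted in the text (angstrom).
ARG_TO_O2 = {
    "Uhgb_MP Arg65": 6.43,
    "BfMGP Arg94": 2.96,
}


def within_hbond_distance(distance, cutoff=HBOND_CUTOFF):
    """Return True if the distance is short enough for a hydrogen bond."""
    return distance < cutoff


for residue, distance in ARG_TO_O2.items():
    verdict = "can" if within_hbond_distance(distance) else "cannot"
    print(f"{residue}: {distance:.2f} A, {verdict} hydrogen bond to O2")
```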
In addition, the stronger specificity of
GH130_2 towards N-acetylglucosamine
at the +1 subsite compared with glucose
or mannose is explained by hydrophobic
interactions of the N-acetyl methyl
moiety with Met67, which is not
conserved in the GH130_1 subfamily,
Figure 6. Superposition of GH130 structures. Superposition of Uhgb_MP (GH130_2; red) with mannosyl-glucose phosphorylase (BfMGP) from B. fragilis NCTC 9343 (GH130_1; PDB entry 4kmi; blue), and BACOVA_03624 from Bacteroides ovatus ATCC 8483 (GH130_NC; PDB entry 3qc2; green), illustrating the structural differences between GH130 subfamilies. The catalytic Asp104 is shown in the B conformation. With the exception of the BfMGP loop Thr42–Met68, which does not exist in the Uhgb_MP structure, the loops are numbered with respect to the Uhgb_MP sequence. The 11-residue longer Gly121–Pro125 loop, which is specific to GH130_1, fills the entrance to the Uhgb_MP tunnel. In place of the Asp61–Arg65 loop, an extension is observed for GH130_NC, capping the catalytic site. These two loops, which are specific to GH130_1 and GH130_NC, respectively, may explain the inability of enzymes belonging to these subfamilies to phosphorolyze long substrates. Loop Asp171–Phe177 (the so-called ‘lid loop’ in BfMGP) is shorter in GH130_NC enzymes than in GH130_1 and GH130_2, thus allowing access to the active site. In addition, in GH130_1 enzymes loop Asp171–Phe177 is very mobile because of the GSGGG motif located at its base, which is locked close to the catalytic site only when a substrate is bound, as shown for BfMGP structures. In contrast, in Uhgb_MP the loop is not so mobile and holds His174, which is conserved in GH130_2 and which has already been shown to be involved in the +1 subsite. The BfMGP loop Thr42–Met68 in contact with the N-terminal helix involved in oligomerization is completely absent in Uhgb_MP even when both proteins are assembled as homohexamers.
while being present in half of the GH130_2 members, especially
those acting on β-D-Manp-1,4-D-GlcNAc (Bt1033 and
RaMP2).
Moreover, in the structures containing N-acetylglucosamine,
the Phe203 side chain of a cognate monomer was
found close to Met67; these two residues form a hydrophobic
pocket that interacts with the methyl group of the N-acetyl
moiety. In contrast, in the apo form and in the structure
containing mannose alone in the −1 subsite, the Phe203 side
chain was instead found rotated towards the exit of the catalytic
tunnel. Phe203, which is not conserved in the
GH130_1 subfamily, while being present in 25% of GH130_2
members, would thus be a specific feature of GH130 enzymes
that are able to phosphorolyze β-D-Manp-1,4-β-D-GlcpNAc.
The C3 stereochemistry at the +1 subsite also appears to be
critical since all pyranoside inhibitors of Uhgb_MP (allose,
L-rhamnose and altrose) share an inversion of configuration at
this position compared with that of mannose. This effect is
probably owing to the proximity of Tyr103 (or the equivalent
Tyr130 in BfMGP), which imposes a steric constraint that
would select an equatorial hydroxyl at this position in the case
of a ⁴C₁ chair, which is the case for N-acetylglucosamine in
the structure of the corresponding complex. We previously
observed that a Y103E mutation strongly destabilizes
Uhgb_MP, as is the case for the wild-type enzyme without
phosphate. The Y103E mutation also significantly increases
the ratio between hydrolysis and phosphorolysis, with the
glutamic acid playing the role of the second catalytic acid
required for hydrolysis (Ladeveze et al., 2013). The role of
Tyr103 in stabilizing the active-site conformation in the
presence of phosphate is owing to hydrogen-bonding interactions
between its side chain and that of Arg150, which
interacts with phosphate (Fig. 1). The Y103E mutation would
decrease hydrogen-bonding interactions, while positioning the
glutamic acid at a hydrogen-bonding distance (less than 4 Å)
from the interosidic O atom to allow hydrolysis to occur.
3.5. Significance
In this paper, we present the first structure of a phosphorolytic
enzyme involved in N-glycan degradation in its apo
form and in complex with mannose and N-acetylglucosamine.
As previously highlighted by the integration of biochemical,
genomic and metagenomic data, Uhgb_MP and GH130
enzymes more generally can be considered as interesting
targets to study interactions between host and gut microbes,
especially since GH130_2 sequences are overrepresented in
the metagenomes of IBD patients. Further studies, such as
metabolomic and transcriptomic analyses of the gut
bacteria that produce these enzymes, grown in the presence of
N-glycans as a carbon source or inoculated in model animals with and
without inhibitors, will be needed to confirm their physiological
role and their potential involvement in damage to the intestinal
barrier. This three-dimensional structure paves the
way for such studies through the design of specific GH130_2
inhibitors that could mimic substrate binding.
In addition, analysis of tertiary and quaternary structures
led to the identification of structural features involved in the
accommodation of long oligosaccharides and polysaccharides.
This is a key feature that is unique to Uhgb_MP and is of great
biotechnological interest for the conversion of hemicellulose
into compounds with high added value. Identification of the
structural determinants of the strong specificity of Uhgb_MP
and other GH130_2 enzymes towards Man-GlcNAc also paves
the way for the rational engineering of GH130 enzymes
optimized for manno-oligosaccharide synthesis and diversification.
Functional investigations of structurally characterized
enzymes classified in the different GH130 sequence clusters
would significantly advance our knowledge of the molecular
bases of substrate specificities and improve our understanding
of their role in key catabolic pathways, especially in the
mammalian gut.
4. Related literature
The following references are cited in the Supporting Information
for this article: Studier (2005).
Acknowledgements
This work was supported by the French Ministry of Higher
Education and Research and by the French National Institute
for Agricultural Research (INRA, ‘Meta-omics of Microbial
Ecosystems’ research program). The equipment used for
protein purification (ICEO facility) and for biophysical (DSF,
SEC-MALLS) and crystallographic experiments is part of the
Integrated Screening Platform of Toulouse (PICT, IBiSA). We
thank Dr Valerie Guillet for technical assistance with SEC-MALLS.
We also thank the European Synchrotron Radiation
Facility (ESRF), Grenoble, France, in particular the staff of
beamline ID-23-1. Experiments were also performed on the
XALOC beamline at the ALBA Synchrotron (Barcelona,
Spain) with the collaboration of the ALBA staff (Dr Jordi
Juanhuix). Author contributions: Uhgb_MP production and
purification, SL; crystallographic studies and X-ray data
collection, SL, ST, GC and LM; SAXS experiments, PR, SL
and GC; DSF and SEC-MALLS experiments, SL and ST.
Experiments were designed by SL, ST and GPV. The manuscript
was written primarily by SL and GPV with contributions
from ST, GC and PR. SL, GC and PR prepared the figures.
The research leading to these results has received funding
from the European Community’s Seventh Framework
Programme (FP7/2007-2013) under BioStruct-X (grant
agreement No. 283570).
References
Aebi, M., Bernasconi, R., Clerc, S. & Molinari, M. (2010). Trends Biochem. Sci. 35, 74–82.
Aspeborg, H., Coutinho, P. M., Wang, Y., Brumer, H. & Henrissat, B. (2012). BMC Evol. Biol. 12, 186.
Burnaugh, A. M., Frantz, L. J. & King, S. J. (2008). J. Bacteriol. 190, 221–230.
Chiku, K., Nihira, T., Suzuki, E., Nishimoto, M., Kitaoka, M., Ohtsubo, K. & Nakai, H. (2014). PLoS One, 9, e114882.
David, G. & Perez, J. (2009). J. Appl. Cryst. 42, 892–900.
Egloff, M. P., Uppenberg, J., Haalck, L. & van Tilbeurgh, H. (2001). Structure, 9, 689–697.
Emsley, P. & Cowtan, K. (2004). Acta Cryst. D60, 2126–2132.
Ericsson, U. B., Hallberg, B. M., DeTitta, G. T., Dekker, N. & Nordlund, P. (2006). Anal. Biochem. 357, 289–298.
Jaito, N., Saburi, W., Odaka, R., Kido, Y., Hamura, K., Nishimoto, M., Kitaoka, M., Matsui, H. & Mori, H. (2014). Biosci. Biotechnol. Biochem. 78, 263–270.
Juanhuix, J., Gil-Ortiz, F., Cuní, G., Colldelram, C., Nicolas, J., Lidon, J., Boter, E., Ruget, C., Ferrer, S. & Benach, J. (2014). J. Synchrotron Radiat. 21, 679–689.
Kabsch, W. (2010). Acta Cryst. D66, 125–132.
Kawahara, R., Saburi, W., Odaka, R., Taguchi, H., Ito, S., Mori, H. & Matsui, H. (2012). J. Biol. Chem. 287, 42389–42399.
Ladeveze, S., Tarquis, L., Cecchini, D. A., Bercovici, J., Andre, I., Topham, C. M., Morel, S., Laville, E., Monsan, P., Lombard, V., Henrissat, B. & Potocki-Veronese, G. (2013). J. Biol. Chem. 288, 32370–32383.
Ladeveze, S., Tarquis, L., Henrissat, B., Monsan, P., Laville, E. & Potocki-Veronese, G. (2014). International Patent WO/2015/014973.
Larkin, A. & Imperiali, B. (2011). Biochemistry, 50, 4411–4426.
Lehle, L., Strahl, S. & Tanner, W. (2006). Angew. Chem. Int. Ed. 45, 6802–6818.
Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P. M. & Henrissat, B. (2014). Nucleic Acids Res. 42, D490–D495.
Martens, E. C., Chiang, H. C. & Gordon, J. I. (2008). Cell Host Microbe, 4, 447–457.
Martens, E. C., Koropatkin, N. M., Smith, T. J. & Gordon, J. I. (2009). J. Biol. Chem. 284, 24673–24677.
McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C. & Read, R. J. (2007). J. Appl. Cryst. 40, 658–674.
Mirza, O., Skov, L. K., Sprogøe, D., van den Broek, L. A. M., Beldman, G., Kastrup, J. S. & Gajhede, M. (2006). J. Biol. Chem. 281, 35576–35584.
Motomura, K., Hirota, R., Ohnaka, N., Okada, M., Ikeda, T., Morohoshi, T., Ohtake, H. & Kuroda, A. (2011). FEMS Microbiol. Lett. 320, 25–32.
Murshudov, G. N., Skubak, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A. (2011). Acta Cryst. D67, 355–367.
Nagae, M. & Yamaguchi, Y. (2012). Int. J. Mol. Sci. 13, 8398–8429.
Nakae, S., Ito, S., Higa, M., Senoura, T., Wasaki, J., Hijikata, A., Shionyu, M., Ito, S. & Shirai, T. (2013). J. Mol. Biol. 425, 4468–4478.
Nihira, T., Suzuki, E., Kitaoka, M., Nishimoto, M., Ohtsubo, K. & Nakai, H. (2013). J. Biol. Chem. 288, 27366–27374.
Petoukhov, M. V., Konarev, P. V., Kikhney, A. G. & Svergun, D. I. (2007). J. Appl. Cryst. 40, s223–s228.
Potterton, E., Briggs, P., Turkenburg, M. & Dodson, E. (2003). Acta Cryst. D59, 1131–1137.
Renzi, F., Manfredi, P., Mally, M., Moes, S., Jeno, P. & Cornelis, G. R. (2011). PLoS Pathog. 7, e1002118.
Roberts, G., Tarelli, E., Homer, K. A., Philpott-Howard, J. & Beighton, D. (2000). J. Bacteriol. 182, 882–890.
Senoura, T., Ito, S., Taguchi, H., Higa, M., Hamada, S., Matsui, H., Ozawa, T., Jin, S., Watanabe, J., Wasaki, J. & Ito, S. (2011). Biochem. Biophys. Res. Commun. 408, 701–706.
Sheng, Y. H., Hasnain, S. Z., Florin, T. H. J. & McGuckin, M. A. (2012). J. Gastroenterol. Hepatol. 27, 28–38.
Studier, F. W. (2005). Protein Expr. Purif. 41, 207–234.
Suzuki, T. & Harada, Y. (2014). Biochem. Biophys. Res. Commun. 453, 213–219.
Varki, A., Cummings, R. D., Esko, J. D., Freeze, H. H., Stanley, P., Bertozzi, C. R., Hart, G. W. & Etzler, M. E. (2009). Editors. Essentials of Glycobiology, 2nd ed. New York: Cold Spring Harbor Laboratory Press.
Zhu, Y., Suits, M. D. L., Thompson, A. J., Chavan, S., Dinev, Z., Dumon, C., Smith, N., Moremen, K. W., Xiang, Y., Siriwardena, A., Williams, S. J., Gilbert, H. J. & Davies, G. J. (2010). Nature Chem. Biol. 6, 125–132.