
Importance of –omics and Systems Biology

Wen-Hsiung Li ( 李文雄 )

Ecology and Evolution, University of Chicago

Biodiversity and Genomics Research Center, Academia Sinica, Taiwan

What is -omics?

It is the suffix of Genomics, Proteomics, Transcriptomics, etc.

We shall start with the introduction of genome, proteome, transcriptome, etc.

Gene (DNA) → transcription → RNA → translation → polypeptide or protein

Central dogma: information flows from DNA through RNA to protein

What is a genome?

In all bacteria and eukaryotes, the genetic (hereditary) material is DNA. A gene is a DNA sequence that serves one or more functions. Genes are arranged on chromosomes. A chromosome may contain not only genes but also regulatory elements and non-coding DNA.

The genome of an organism is a complete set of the genetic material in the organism. That is, the genome should contain all the genetic information of the organism.

The human genome:

The first 22 chromosomes and the X & the Y.

For example: Humans have 22 pairs of autosomes and two sex chromosomes X and Y. So the human genome consists of the 22 autosomes, and one X and one Y chromosome.

What is Genomics? (1)

Genomics is the study of genomes.

The first step in studying a genome is to determine its entire DNA sequence.

One major purpose is to identify all the genes in the genome.

Another purpose is to identify all the regulatory elements in the genome.

What is Genomics? (2)

It also aims to understand the structure of the genome such as how genes and regulatory elements are arranged in the genome and which parts of the genome are functional and which parts are non-functional.

A segment of the E. coli genome

Modified from Lodish et al. 1999

A segment of the human genome

Modified from Lodish et al. 1999

Genome sizes

Modified from Gregory 2005

Rough estimates of gene copy numbers

Genome                Gene number
Human                 22,000
Mouse                 24,000
Chicken               16,700
Pufferfish            21,800
Ciona intestinalis    14,000
Fruit fly             14,000
Worm                  20,000
Budding yeast         6,000
E. coli               4,200

What is a transcriptome?

The transcriptome of an organism refers to the total set of RNA transcripts that the organism can produce.

For a single-cell organism such as a bacterium, it is all the RNA transcripts that the cell is capable of producing.

In a multicellular organism, the transcriptome includes all the RNA transcripts that all the cells in the organism can produce.

What is a transcriptome? (2)

In a complex organism such as a human, we may also talk about the transcriptome of an organ or a tissue such as the liver.

Even in a unicellular organism such as E. coli, the RNA transcripts can vary drastically with the external environment. Thus, the transcriptome of a cell under a given condition reflects the genes that are active under that condition.

Microarrays

DNA microarrays are a powerful tool for obtaining large amounts of gene expression data.

Once the sequences of many genes in a genome have been determined, we can use them to design DNA hybridization probes and spot the probes on a glass slide (chip). The chip can then be used to study the expression profiles of many genes at once. A profile includes when a gene is switched on and off and when its expression peaks.

Types of microarrays (1)

1. cDNA arrays: Spot the cDNA sequences of the genes you want to study on the glass. Each spot is a specific cDNA and is very tiny.

2. Oligo arrays: Instead of the entire cDNA, you select only a specific segment of the gene sequence as your probe and synthesize the DNA (or amplify it by PCR). Spot each probe on the glass.

cDNA probe: longer → stronger hybridization

Oligo probe: shorter and more specific, but cannot be too short: 40 nucleotides or longer

Types of microarrays (2)

3. Affymetrix arrays: Instead of spotting, each probe is chemically synthesized directly on the glass. Each probe is usually 25 nucleotides long, and many probes are usually selected for each gene. In addition, for each probe a comparison probe with a mismatch in the middle is synthesized on another spot. This mismatch-probe practice, however, may be eliminated.

Print the Chip

Gene Expression Studies

• The pattern of genes expressed in a cell is usually characteristic of its current state

• Virtually all differences in cell state or type are correlated with changes in mRNA levels of many genes

• Understanding the function of uncharacterized genes by comparison of expression patterns

• Combine with metabolic schemas to understand how pathways are changed under varying conditions

Competitive Hybridization

Figure labels: cancer cell and normal cell → hybridization → scan red → scan green → compute differential expression

Competitive cDNA Microarray (http://www.bioteach.ubc.ca/MolecularBiology/microarray/cDNA-array.jpg)

cDNA Microarray Experiment

Figure labels: prepare cDNA probe, hybridization, detection, quantization

(http://www.accessexcellence.org/AB/GG/microArray.html)

http://sequence.aecom.yu.edu/bioinf/microarray/reader.html

Differential Expression

Lashkari et al. (1997)

Overexpression in the untreated sample

Overexpression in the treated sample

High and equal expression between untreated and treated samples

Low and/or equal expression between untreated and treated samples
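A minimal Python sketch of how the red and green scans could be turned into the four categories listed above. The intensity values, the 2-fold cut-off, the brightness threshold, and the choice of red for the treated sample and green for the untreated sample are all illustrative assumptions; none of them is taken from Lashkari et al. (1997).

```python
import math

def classify_spot(red, green, fold=2.0, bright=1000.0):
    """Classify one spot from its red and green intensities.

    Assumes red = treated sample, green = untreated sample, and that both
    intensities are positive (background-corrected) numbers.
    """
    log_ratio = math.log2(red / green)
    if log_ratio >= math.log2(fold):
        return "overexpressed in treated sample"
    if log_ratio <= -math.log2(fold):
        return "overexpressed in untreated sample"
    # Roughly equal signal in both channels: separate bright spots from dim ones.
    if (red + green) / 2 >= bright:
        return "high and equal expression"
    return "low and equal expression"

# Hypothetical (red, green) intensities for four spots.
spots = {
    "geneA": (8000, 1000),
    "geneB": (500, 4000),
    "geneC": (5000, 5200),
    "geneD": (120, 100),
}

for name, (red, green) in spots.items():
    print(name, classify_spot(red, green))
```

Working on the log2 scale makes a 2-fold increase and a 2-fold decrease symmetric around zero, which is why the same cut-off is used in both directions.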

cDNA Microarray Image

Chen et al. (1997)

Proteome (1)

The proteome of an organism is the set of proteins produced by it during its life.

It may also refer to the expressed proteins at a given time point under a defined condition, or to the proteins expressed in a cell, tissue, or organ.

Proteome (2)

The proteome is larger than the genome, especially in eukaryotes, in the sense that there are more proteins than genes, owing to alternative splicing of genes and post-translational modifications such as glycosylation or phosphorylation.

What is Proteomics?

• The large-scale study of proteins, particularly their structures and functions.

• Much more complicated than genomics: An organism's genome is constant, but a proteome varies from cell to cell & changes through its biochemical interactions with the genome and the environment. One organism has radically different protein expression in different parts of its body, different stages of its life cycle and different environmental conditions.

Technologies for proteomics (1)

• 2-D gel electrophoresis

– Separates proteins in a mixture on the basis of their molecular weight and charge

• Mass spectrometry

– Reveals identity of proteins

• Protein chips

– A wide variety of identification methods

Technologies for proteomics (2)

• Yeast two-hybrid method

– Determines how proteins interact with each other

• Biochemical genomics

– Screens gene products for biochemical activity

2-D gel electrophoresis

• Polyacrylamide gel

• Voltage across both axes

– A pH gradient along the first axis neutralizes charged proteins at different places

– pH is constant along the second axis, where proteins are separated by weight

• The x–y position of a protein on the stained gel uniquely identifies the protein


Differential in-gel electrophoresis

• Label protein samples from control and experimental tissues

– Fluorescent dye #1 for control

– Fluorescent dye #2 for experimental sample

• Mix protein samples together

• Identify identical proteins from different samples by dye color

Figure labels: with benzoic acid, Cy3; without benzoic acid, Cy5

Caveats associated with 2-D gels

• 2-D gels perform poorly for the following:

– Very large proteins

– Very small proteins

– Less abundant proteins

– Membrane-bound proteins

• Presumably, the most promising drug targets

Mass spectrometry

• Measures mass-to-charge ratio

• Components of mass spectrometer

– Ion source

– Mass analyzer

– Ion detector

– Data acquisition unit

A mass spectrometer

Identifying proteins with mass spectrometry

• Preparation of protein sample

– Extraction from a gel

– Digestion by proteases (e.g., trypsin)

• Mass spectrometer measures mass-to-charge ratio of peptide fragments

• Identified peptides are compared with database

– Software used to generate theoretical peptide mass fingerprint (PMF) for all proteins in database

– Match of experimental readout to database PMF allows researchers to identify the protein
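The database-matching step can be sketched in a few lines of Python. This is a simplified illustration, not the software used in practice: it works with neutral monoisotopic peptide masses instead of measured mass-to-charge ratios, ignores missed cleavages and modifications, and uses a hypothetical two-protein database (`protA`, `protB`) and a hypothetical tolerance of 0.5 Da. The cleavage rule (after K or R, but not before P) is the usual textbook rule for trypsin.

```python
import re

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def trypsin_digest(protein):
    """Cleave after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]

def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def pmf_score(observed, protein, tol=0.5):
    """Count observed masses that match a theoretical peptide mass within tol."""
    theoretical = [peptide_mass(p) for p in trypsin_digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)

# Hypothetical database and hypothetical "measured" masses.
database = {"protA": "MKWVTFRPLKGAS", "protB": "GASKLLRVVK"}
observed = [peptide_mass("MK"), peptide_mass("GAS")]

best = max(database, key=lambda name: pmf_score(observed, database[name]))
print(best, {name: pmf_score(observed, seq) for name, seq in database.items()})
```

The protein whose theoretical fingerprint explains the most observed masses is reported as the best match; real tools also weigh how likely such a match is to arise by chance.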

Limitations of mass spectrometry

• Not very good at identifying minute quantities of protein

• Trouble dealing with phosphorylated proteins

• Doesn’t provide concentrations of proteins

• Improved software eliminating human analysis is necessary for high-throughput projects

A schematic of the yeast two-hybrid method

Results from a yeast two-hybrid experiment

• Goal: To characterize protein–protein interactions among 6,144 yeast ORFs

– 5,345 were successfully cloned into yeast as both bait and prey

– Identity of ORFs determined by DNA sequencing in hybrid yeast

– 692 protein–protein interaction pairs

– Interactions involved 817 ORFs

Caveats associated with the yeast two-hybrid method

• There is evidence that other methods may be more sensitive

• Some inaccuracy reported when compared against known protein–protein interactions

– False positives

– False negatives

Protein-protein interactions

Most proteins function in collaboration with other proteins, and one goal of proteomics is to identify which proteins interact. This often gives important clues about the functions of newly discovered proteins.

Methods: The traditional method is yeast two-hybrid analysis. Newer methods include protein microarrays, immunoaffinity chromatography followed by mass spectrometry, and combinations of experimental methods, such as phage display, with computational methods.

Other -omes

Large-scale high-throughput technologies have led to other –omes.

For example:

Metabolome refers to the complete set of metabolites of an organism or a cell.

What is Systems Biology? (1)

Systems biology is the study of biological systems. It includes the study of

(1) what the components or parts of the system are,

(2) how the interactions between components of a system can give rise to the function and behavior of the system

(3) the dynamics and stability of the system, and

(4) how the failure of one component may affect the function of other parts or the system.

What is Systems Biology? (2)

Approaches:

(1) Reductionist approach: Look at components individually but do not try to integrate observations from different parts.

(2) Systems approach: Observe, through quantitative measures, multiple components simultaneously and rigorously integrate data from different components with mathematical models.

Examples of Biological Systems

Biological networks:

(1) Regulatory networks

(2) Protein-protein interaction networks

(3) Others such as genetic networks

a. autoregulation b. multi-component loop c. feed forward loop

d. single input motif (SIM) e. multi-input motif (MIM)

f. regulator chain

Models of regulation (Lee et al. 2002). Blue circles are TFs; red squares are target genes.

A regulatory network in yeast

Regulation of gluconeogenesis by Cat8p and Sip4p

Figure labels: Snf1 (α), Snf4 (γ), and β subunits of the kinase complex (alternative β-subunits: Sip1, Sip2, Gal83); fermentable sugar (e.g., glucose); URE upstream of CAT8; phosphorylated Cat8; CSRE upstream of SIP4; phosphorylated Sip4; CSRE upstream of gluconeogenic genes (ICL1, MLS1, MDH2, FBP1, PCK1, ACS1, ADH2, JEN1, SFC1, IDP2, and more)

URE: upstream regulatory element

CSRE: carbon source-responsive element

Modified from Schuller, 2003


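Figure 1 shows part of the gene regulatory network controlled by the yeast transcription factors Cat8p and Sip4p. The activity of the Snf1 protein kinase complex is normally inhibited by fermentable sugars in the medium and is restored when fermentable sugars are lacking. The activated Snf1 kinase complex may have a dual function: first, it inactivates the Mig1p repressor; second, it activates Cat8p and Sip4p by phosphorylating them. Inactivated Mig1p can no longer repress expression of the CAT8 gene. The newly synthesized Cat8p transcription factor is activated through post-translational modification by the Snf1 kinase complex and in turn stimulates expression of another transcription factor, Sip4p. Through derepression, both Cat8p and Sip4p contribute to the expression of the gluconeogenic structural genes, with Cat8p being the main activator. Both Cat8p and Sip4p bind to the carbon source-responsive element (CSRE) in the promoter and thereby promote expression of gluconeogenic genes such as ICL1 and MLS1.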
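The Cat8p/Sip4p circuit described above can be written as a small directed graph and searched for motifs such as the feed-forward loop. The Python sketch below is only an illustration: the edge list is a simplified reading of the description (one node per gene, signs of regulation ignored, the Snf1 kinase left out because it acts by phosphorylation, not transcription), and the function names are hypothetical.

```python
# Directed edges (regulator, target); one node per gene, regulation signs ignored.
edges = {
    ("MIG1", "CAT8"),   # Mig1p represses CAT8
    ("CAT8", "SIP4"),   # Cat8p stimulates SIP4 expression
    ("CAT8", "ICL1"),   # Cat8p binds the CSRE of gluconeogenic genes
    ("SIP4", "ICL1"),   # Sip4p binds the same CSRE
    ("CAT8", "MLS1"),
    ("SIP4", "MLS1"),
}

def feed_forward_loops(edges):
    """Return sorted (x, y, z) triples with x -> y, y -> z and x -> z."""
    loops = []
    for x, y in edges:
        if x == y:
            continue  # a self-loop is autoregulation, not part of this motif
        for y2, z in edges:
            if y2 == y and z not in (x, y) and (x, z) in edges:
                loops.append((x, y, z))
    return sorted(loops)

def autoregulated(edges):
    """Return genes that regulate themselves (self-loops)."""
    return sorted(x for x, y in edges if x == y)

print("feed-forward loops:", feed_forward_loops(edges))
print("autoregulation:", autoregulated(edges))
```

Under these assumptions, CAT8, SIP4, and each gluconeogenic target form a feed-forward loop: CAT8 regulates the target both directly and through SIP4.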

Yeast protein interaction network

Applications to biofuel research

What is the relevance of -omics to biofuel research?

Main feedstocks for current generation biofuels

• Biodiesel: soybean

• Ethanol: corn (U.S.), sugarcane (Brazil)

Major steps in biomass conversion

• From feedstocks to cellulose and hemicellulose

• From cellulose or hemicellulose to component sugars

• From sugars to ethanol

Napiergrass (狼尾草)

Next generation: Renewable Energy Biomass Program

• The vast bulk of plant material is cell wall, which consists of cellulose (40-50%), hemicellulose (20-30%), and lignin (20-30%), depending on plant species.

• The race now is to develop technology to use cellulose and hemicellulose for ethanol production.
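To give a sense of scale, the Python sketch below computes the theoretical maximum ethanol obtainable from the cellulose fraction alone, using the stoichiometry of hydrolysis (one water added per glucose unit) and fermentation (one glucose to two ethanol plus two CO2). The 1,000 kg batch size is an arbitrary assumption, hemicellulose and lignin are ignored, and real process yields are lower than this upper bound.

```python
GLUCAN_UNIT = 162.141  # g/mol, anhydroglucose unit (C6H10O5) in cellulose
GLUCOSE = 180.156      # g/mol, C6H12O6
ETHANOL = 46.069       # g/mol, C2H5OH

def theoretical_ethanol_kg(biomass_kg, cellulose_fraction):
    """Upper bound on ethanol mass from the cellulose in dry biomass."""
    cellulose = biomass_kg * cellulose_fraction
    # Hydrolysis: each anhydroglucose unit gains one water to become glucose.
    glucose = cellulose * GLUCOSE / GLUCAN_UNIT
    # Fermentation: C6H12O6 -> 2 C2H5OH + 2 CO2.
    return glucose * 2 * ETHANOL / GLUCOSE

# Cellulose makes up 40-50% of the cell wall, depending on plant species.
for fraction in (0.40, 0.50):
    print(fraction, round(theoretical_ethanol_kg(1000, fraction), 1), "kg ethanol")
```

About half of the glucose mass is lost as CO2, so even complete conversion yields a little under 0.57 kg of ethanol per kg of cellulose.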

(http://www.jsxnw.gov.cn)

Rice Straw as a Source of Biofuels

Napiergrass (狼尾草) as a Biofuel Crop

Advantages: fast growth, disease resistance, adaptability, minimal management, easy to propagate

Composition of Napier Grass

Protein – 10%

Carbohydrate – 10%

Sugar – 8%

Lignin – 10-12%

Cellulose + hemicellulose – 60-62%

Napiergrass and Rice: Genetics and Genomics

Breeding of Napier grass for high productivity and high cellulose content to reduce cost

Establishment of Napiergrass tissue culture and transformation system for future improvement

Expression of endoglucanase and other lignocellulolytic enzymes in Napiergrass and rice as a bioreactor or for autohydrolysis

Napiergrass (狼尾草)

A combination of three enzymes is required to degrade cellulose:

Cellobiohydrolases (exoglucanases, exo-β-1,4-glucanases)

Endoglucanases (endo-β-1,4-glucanases, EG)

β-Glucosidases (BGLU)

Endo-cellulase (Endoglucanase)

Endo-cellulase breaks internal bonds to disrupt the crystalline structure of cellulose, exposing individual cellulose polysaccharide chains

Exo-cellulase (Exoglucanase)

• It cleaves 2-4 units from the ends of the chains produced by endo-cellulase, resulting in tetrasaccharides or disaccharides (cellobiose).

• There are two main types of exo-cellulases (cellobiohydrolases, CBH): one type works processively from the reducing end, and the other works processively from the non-reducing end of cellulose.

Type 1

Type 2

Beta-glucosidase (Cellobiase)

Beta-glucosidase hydrolyses cellobioses into monosaccharides.

cellobiase

The key step is to break down cellulose into glucose and hemicellulose into xylose

Two main obstacles in cellulose breakdown

• Lignin blocks enzyme access to cellulose.

• Cellulose in crystalline form cannot be degraded efficiently by cellulases.

So, there is a great need to find powerful cellulases to break down cellulose into glucose.

How to look for good cellulases?

Good cellulases come mainly from microbes found in decaying composts of rice straw, sugarcane bagasse, etc., or in the guts (stomachs) of termites, grasshoppers, cattle, etc.

Now suppose you have found a microbe that seems to possess excellent cellulases. What can you do?

Of course, you want to identify the genes that code for the cellulases.

But how do you do that?

Identification of cellulase genes in an organism (1)

Traditional approach: Try to isolate the enzyme. Sequence a small segment of the enzyme and use the amino acid sequence to design PCR primers to amplify the gene from genomic DNA. Alternatively, use the amino acid sequence to design hybridization probes and hybridize them to a cDNA library to isolate the cDNA for the gene.

Identification of cellulase genes in an organism (2)

Genomic approach: Sequence the genome and annotate the genes, using genes identified in other genomes, especially from related genomes.

From the annotated genes try to see if there are genes annotated as cellulase genes. If yes, try to select candidate genes to test for good cellulase activities.
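Once a genome has been annotated, the search for candidate genes can begin as a simple keyword filter over the annotation descriptions. The Python sketch below assumes the annotations are available as a mapping from gene ID to free-text description; the gene IDs, descriptions, and keyword list are hypothetical, and a real search would also use protein family or domain assignments.

```python
# Keywords that suggest cellulase activity (an illustrative, incomplete list).
KEYWORDS = (
    "cellulase",
    "endoglucanase",
    "exoglucanase",
    "cellobiohydrolase",
    "beta-glucosidase",
)

def candidate_cellulases(annotations):
    """Return sorted gene IDs whose description mentions a cellulase keyword."""
    return sorted(
        gene_id
        for gene_id, description in annotations.items()
        if any(keyword in description.lower() for keyword in KEYWORDS)
    )

# Hypothetical annotation records: gene ID -> description.
annotations = {
    "gene0001": "Endoglucanase, glycoside hydrolase family 5",
    "gene0002": "DNA polymerase III subunit alpha",
    "gene0003": "putative beta-glucosidase",
    "gene0004": "hypothetical protein",
}

print(candidate_cellulases(annotations))
```

The genes returned here are only candidates; each still has to be tested for good cellulase activity.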

Phanerochaete chrysosporium, a white rot fungus

• It can efficiently degrade lignin and gain access to cellulose and hemicellulose of plant cell walls.

• The genomic sequence is completed.

• Its genome contains genes for cellulase, xylanase, and lignin-degrading enzymes, which can be explored for biomass conversion and industrial usage.

How can transcriptomic study help identify cellulase genes in a microbe?

From selected candidate genes for cellulases, one can design oligonucleotide sequences as probes and spot them on chips.

One can then feed the organism with cellulose or rice straw powder and obtain the RNAs to produce cDNAs. Hybridize the cDNAs to the chip to see which cellulase genes are highly expressed.

An example of oligo probe design for Clostridium thermocellum

• Go to http://genome.jgi-psf.org (Joint Genome Institute, DOE) to download genomic sequence of Clostridium thermocellum.

• Retrieve sequences of putative EXG, EG, BGLU, and carbohydrate metabolic genes.

A total of 180 genes were selected:

Cellulosome and GHs

Carbohydrate metabolism

Regulatory proteins

• Use the Picky algorithm (download from http://www.complex.iastate.edu/download/Picky/) to design a 55-60mer oligonucleotide corresponding to a unique sequence in each gene.

• Print slides.
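The core idea of the probe-design step, choosing a stretch of each gene that occurs in no other gene, can be illustrated with the toy Python sketch below. This is not the Picky algorithm: it checks exact matches only, ignores reverse complements, near matches, and melting temperature, and the function name and sequences are hypothetical.

```python
def unique_probe(gene_id, genes, length=55):
    """Return the first window of `length` bases found in no other gene, or None.

    `genes` maps gene IDs to DNA sequences. Only exact substring matches are
    checked, which is a strong simplification of real probe design.
    """
    target = genes[gene_id]
    others = [seq for other_id, seq in genes.items() if other_id != gene_id]
    for start in range(len(target) - length + 1):
        window = target[start:start + length]
        if not any(window in other for other in others):
            return window
    return None

# Hypothetical short sequences; a short probe length keeps the example readable.
genes = {
    "g1": "AAAACCCCGGGG",
    "g2": "AAAACCCCTTTT",
}

for gene_id in genes:
    print(gene_id, unique_probe(gene_id, genes, length=8))
```

With real genes one would call the function with `length` between 55 and 60, matching the probe size given in the steps above.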

Grow C. thermocellum with the following carbohydrate sources:

• Cellobiose

• Avicel (pure cellulose)

• Microcrystalline cellulose (cellulose in crystalline form)

• Sugarcane Bagasse ( 甘蔗渣 )

• Grass ( 狼尾草 )

• Rice straw ( 稻草稈 )

1. Isolate RNAs from different cultures.

2. Perform microarray experiments using RNA from the cellobiose culture as the control.

3. Hybridize the cDNAs to the chip to see whether different cellulase genes are affected by different carbohydrate sources.

The bottom line:

• Transcriptomic study can help us make intelligent choices in selecting candidate cellulase genes for further studies.

Two approaches for bioethanol production:

• Direct cellulase treatment: inefficient and expensive

• Consolidated bioprocessing (CBP)

Engineering of a microbe or a group of compatible microbes that can carry out cellulase production, hydrolysis, and fermentation, in a single process.

Combining cellulase production, hydrolysis, and fermentation into one single process (systems biology approach) .

A major obstacle in CBP

Such microbes are currently unavailable or inefficient.

The key requirement for CBP is a microbe or a group of compatible microbes that can carry out cellulase production, hydrolysis, and fermentation in a single process.

Yeast as a starting point?

• Its genetics and genomics are well understood.

• Genetic and metabolic engineering are not too difficult.

• It is highly efficient in converting sugars to ethanol.

• It has relatively high ethanol tolerance.

• It grows fast on sugars.

But: It has no cellulase activity.

Our Aims

Aim 1. Transcriptome and regulatory network studies of Clostridium thermocellum, Phanerochaete chrysosporium, and interesting Taiwan microbial isolates during growth on different biomass feedstocks.

Clostridium thermocellum as a starting point

• A rare organism that is both cellulolytic and ethanologenic.

• It produces a cellulase system, the cellulosome, that is highly active on crystalline cellulose.

• A promising organism for consolidated bioprocessing (CBP), the industrial process in which a fermentative microorganism directly converts cellulosic materials to ethanol.

A schematic diagram of the C. thermocellum cellulosome

Demain, Newcomb and Wu (2005) Microbiol. Mol. Biol. Rev. 69: 124-154.

Phanerochaete chrysosporium, a white rot fungus

• It can efficiently degrade lignin and gain access to cellulose and hemicellulose of plant cell walls.

• The genomic sequence is completed.

• Its genome contains genes for cellulase, xylanase, and lignin-degrading enzymes, which can be explored for biomass conversion and industrial usage.

• We found that both Clostridium thermocellum and Phanerochaete chrysosporium can grow on powders of rice straw, sugarcane bagasse, or P. alopecuroides ( 狼尾草 ), even without any pretreatment.

• Microarray analyses indicate that glycosyl hydrolase genes and genes encoding cellulosome enzymes were differentially regulated in C. thermocellum grown on different substrates.

A serious disadvantage of Clostridium thermocellum and its relatives is that their genetics is poorly understood and there is no transformation tool.

Aim 2. Metabolic and genetic engineering of fungal and bacterial strains for efficient conversion of cellulose and hemicellulose to ethanol.

• Identify and test combinations of EXG and EG and BGLU from different microbial species that can efficiently degrade cell wall of different feedstocks.

Steps in metabolic engineering of S. cerevisiae for efficient conversion of cellulose to ethanol.

Step 1: Select cellulase genes from microbial species.

Step 2: Cloning of EXG, EG, and BGLU genes into appropriate yeast expression vector.

From Yan-Ping Shih et al. Protein Sci 2002; 11: 1714-1719

Employing “sticky-end PCR cloning” for high-throughput cloning of cellulase genes

Clone into an EcoRI and XhoI double-digested yeast expression vector

Step 1: Select cellulase genes from the white rot fungus Phanerochaete chrysosporium.

Step 2: Cloning of EXG, EG, and BGLU genes into appropriate yeast expression vector.

Step 3: Transform resulting constructs into a host yeast strain with desired properties, including vigorous fermentation capability and high ethanol tolerance.

Step 4: Select transformants expressing proper enzyme activity.

Step 5: Test combinations of yeast strains expressing EG, EXG, and BGLU that can efficiently utilize different sources of cellulose for ethanol production.

Aim 3: To develop genetic engineering techniques for microorganisms

• The ability to express foreign genes in cellulolytic or thermophilic microorganisms is needed.

• We will develop transformation techniques and forward and reverse genetic approaches for organisms with promising properties.

• Such techniques will be used in engineering microbes expressing various lignocellulolytic enzymes or utilizing multiple sugars for fermentation.

To do list:

• Determine cellulose, hemi-cellulose, and lignin degradation capability.

• Optimize growth conditions.

• Physiological studies and metabolic profiling.

• Gene expression profiling.

• Genomic sequencing of desirable microbes.

• Identify novel cellulases or other enzymes for improving the efficiency of biomass conversion.

Thanks
