
Upload: derron

Post on 19-Jan-2016


TRANSCRIPT

Page 1

In silico study of cancer-related genes and microRNAs
(Using microarrays to screen cancer genes and investigate their upstream regulatory microRNAs)

Ka-Lok Ng (吳家樂)
Department of Biomedical Informatics, Asia University

Page 2

Contents

Motivation
Predict cancer genes based on microarray mRNA expression levels
microRNA (miRNA) can act as an oncogene (OCG) or tumor suppressor gene (TSG)
Identify cancer-related miRNAs, their target genes, and downstream protein-protein interactions (prediction of novel cancerous proteins)

(1) Introduction – microarray, cancer, microRNA
(2) Methods – input data
(3) Results
    (a) cancer gene prediction (Bioconductor), e.g. prostate/breast cancer
    (b) correlation study of miRNA and mRNA expression levels
    (c) ncRNAppi – a platform for studying microRNAs and their target genes' protein-protein interactions
(4) Summary

Page 3

Central dogma of molecular biology

Post-transcriptional regulation – microRNA targets mRNA

transcriptome

Page 4

Introduction – Types of RNAs

RNA
    mRNA
    ncRNA (non-coding RNA): transcribed RNA with a structural, functional or catalytic role
        rRNA (ribosomal RNA): participates in protein synthesis
        tRNA (transfer RNA): interface between mRNA and amino acids
        snRNA (small nuclear RNA): includes RNAs that form part of the spliceosome
        snoRNA (small nucleolar RNA): found in the nucleolus, involved in modification of rRNA
        miRNA (microRNA): small RNA involved in regulation of expression
        Other: including large RNAs with roles in chromatin structure and imprinting
        siRNA (small interfering RNA): active molecules in RNA interference
        stRNA (small temporal RNA): RNA with a role in developmental timing

Page 5

Formation of cancer and summary of the ten leading causes of cancer death in Taiwan, ROC year 97 (2008)

Rank  Cause of death              Deaths   Percentage
      All cancer types            38,913   100%
1     Lung cancer                 7,777    20.0%
2     Hepatocellular carcinoma    7,651    19.7%
3     Colorectal cancer           4,266    11.0%
4     Female breast cancer        1,541    4.0%
5     Gastric cancer              2,292    5.9%
6     Oral cavity cancer          2,218    5.7%
7     Prostate cancer             892      2.3%
8     Cervical cancer             710      1.8%
9     Esophageal cancer           1,433    3.7%
10    Pancreatic cancer           1,364    3.5%

Page 6

Microarray – overview

Figure labels: cDNA labeled by Cy3 (green); cDNA labeled by Cy5 (red); probe genes; target

Figure by Hanne Jarmer, BioCentrum-DTU, Technical University of Denmark

Page 7

cDNA microarrays

Microarrays are used to measure gene expression levels under two different conditions, with a green label for the control sample and a red label for the experimental sample.

DNA-cDNA or DNA-mRNA hybridization.

The hybridised microarray is excited by a laser and scanned at the appropriate wavelengths for the red and green dyes.

The amount of fluorescence emitted (intensity) upon laser excitation is approximately proportional to the amount of mRNA bound to each spot.

If the transcript is more abundant in the control (experimental) sample, the spot appears green (red), which indicates the relative amount of transcript for the mRNA (EST) in the two samples.

If both are equally abundant, the spot appears yellow.

If neither is present, the spot appears black.
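The color rules on this slide can be sketched in R. This is only an illustration: the function name `spot_color`, the background cutoff `min_signal` and the two-fold threshold `fold` are assumptions introduced here, not values taken from the slides.

```r
# Classify two-color microarray spots from Cy3 (green, control) and
# Cy5 (red, experimental) intensities.
# min_signal and fold are hypothetical cutoffs chosen for illustration only.
spot_color <- function(cy3, cy5, min_signal = 100, fold = 2) {
  log_ratio <- log2(cy5 / cy3)           # > 0: more transcript in the experimental sample
  color <- rep("yellow", length(cy3))    # default: similar abundance in both samples
  color[log_ratio >= log2(fold)] <- "red"
  color[log_ratio <= -log2(fold)] <- "green"
  color[cy3 < min_signal & cy5 < min_signal] <- "black"  # neither transcript present
  color
}

cy3 <- c(5000, 400, 1200, 20)   # made-up control intensities
cy5 <- c(450, 4800, 1100, 35)   # made-up experimental intensities
result <- spot_color(cy3, cy5)
print(result)
```

Working on the log2 ratio treats a two-fold increase and a two-fold decrease symmetrically, which is why the ratio is log-transformed before thresholding.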

Page 8

Microarray data generation, processing and analysis

Information processing
Image quantitation – locating the spots and measuring their fluorescence intensities
Data normalization and integration – construction of the gene expression matrix from sets of spots
Gene expression data analysis and mining – finding differentially expressed genes (DEGs) or clusters of similarly expressed genes
Hypothesis generation – these analyses generate new hypotheses about the underlying biological processes, which in turn should be tested in follow-up experiments

Figure labels: image analysis; data analysis; clustering

http://www.mathworks.com/company/pressroom/image_library/biotech.html

Page 9

Introduction – biogenesis of microRNA

miRNA gene → pri-miRNA (stem-loop structure) → processed by Drosha → pre-miRNA (65~90 bp) → carried by Exportin 5 to the cytoplasm → mature miRNA (20~25 bp), generated by the RNase III-type enzyme Dicer → directed by RISC to the miRNA target → mRNA cleavage, or its translation into protein is impeded

Page 10

Introduction - miRNAs can play the role of an OCG and TSG

When a miRNA plays an oncogenic role, it targets TSGs or genes that control cell differentiation or apoptosis, which leads to tumor formation.

If a miRNA plays a tumor suppressor role, it targets OCGs or genes that control cell differentiation or apoptosis, so it can suppress tumor formation.

We therefore expect a negative correlation between miRNA and mRNA expression profiles.

We integrate the human miRNA-targeted (or siRNA-targeted) mRNA data, protein-protein interaction (PPI) records, tissue, pathway and disease information to establish a disease-related miRNA (or siRNA) pathway database.

Page 11

Introduction – cancer-related miRNAs

Cancer-related miRNA | Cancer type | References
miR-17-92 cluster, let-7 | Lung cancer | Martin et al., 2006; Yanaihara et al., 2006; Takamizawa et al., 2004
miR-10b, miR-21, miR-125b, miR-145, miR-155 | Breast cancer | Iorio et al., 2005; Si et al., 2007
miR-18, miR-122a, miR-224, miR-199a, miR-199a* | Liver cancer | Murakami et al., 2006; Meng et al., 2007; Gramantieri et al., 2007
miR-195, miR-125a, miR-200a, miR-15, miR-16 | B-CLL | Calin et al., 2004; Calin et al., 2002

Page 12

A platform for studying miRNAs and cancerous target genes

Diagram labels:
NCI-60 cancer data: expression profiles of miRNA and mRNA
TarBase data: experimentally verified miRNA-mRNA pairs
miRNA, mRNA, miRNA-mRNA anti-correlation pairs
Annotation: TAG (known OCG, TSG or CRG), OMIM disease genes, KEGG cancer pathways
Annotation: miR2Disease (disease-related miRNAs), chromosomal fragile sites, miRNA cluster information, CpG island proximal miRNAs

Number of cell lines for the nine cancer types in the NCI-60 data sets

Cancer type        Breast  CNS  Colon  Lung  Leukemia  Melanoma  Ovarian  Prostate  Renal
No. of cell lines  5       6    7      9     6         10        7        2         8

Page 13

miRNA, target gene, protein-protein interaction (PPI)

Tissue-specific miRNA or siRNA targets, and their PPI partners up to the second level.

If the upstream miRNA (or siRNA) is defective, its effect could be amplified downstream.

As an illustration, suppose that a miRNA (or siRNA) targets gene TG, which has two successive PPI partners, proteins L1 and L2. If genes TG and L2 are involved in the same disease, then it is highly probable that gene L1 is also related to that disease. This can be quantified by enrichment analysis.

Diagram: miRNA or siRNA → protein (mRNA is suppressed) → protein → protein (TF) → protein

                 TG    L1    L2
BP/MF            x     y     z
Overlap BP/MF    n1 (TG and L1)    n2 (L1 and L2)

Page 14

Input data and Methods

Databases:
ArrayExpress – raw data files (U95Av2) for 64 prostate cancer tissue samples and 18 normal prostate tissue samples
TAG (Tumor Associated Gene)
NCI-60 – miRNA and mRNA gene expression profiles for 9 cancer types
TarBase – miRNA targets (experimentally verified)
miR2Disease – a comprehensive resource of miRNA deregulation in various human diseases
OMIM – human disease information
KEGG – cancer pathway information
ncRNAppi – a useful tool for identifying ncRNA target pathways
PPI data (BioIR) – seven databases are integrated: HPRD, DIP, BIND, IntAct, MIPS, MINT and BioGRID
Gene Ontology (GO) – Biological Process and Molecular Function annotations

Tool: Bioconductor

Page 15

Research Protocol

Page 16

Predict DEGs using R and Bioconductor commands

Commands entered in the R environment:

library("affy")
library("limma")
library("XML")          # needed by pm.getabst()
library("annotate")
library("hgu95av2.db")
library("annaffy")

# Read all CEL files in the working directory and normalize them with RMA
eset <- justRMA()
annotation(eset)        # reports the chip type of the arrays

# The design assumes that the first 18 arrays are normal samples
# and the next 64 arrays are prostate cancer samples
design <- cbind(normal = c(rep(1, 18), rep(0, 64)),
                DM     = c(rep(0, 18), rep(1, 64)))
fit <- lmFit(eset, design)
cont.matrix <- makeContrasts(DMvsNo = DM - normal, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont.matrix))

# Top 100 probe sets with Benjamini-Hochberg adjusted p-values
top <- topTable(fit2, number = 100, adjust.method = "BH")
# Older limma versions return probe IDs in an ID column, newer ones as row names
genenames <- if (!is.null(top$ID)) as.character(top$ID) else rownames(top)
adj.P_Val <- signif(top$adj.P.Val, digits = 3)
logFC <- signif(top$logFC, digits = 3)

# Retrieve PubMed abstracts and build an annotated HTML report
absts <- pm.getabst(genenames, "hgu95av2.db")
atab <- aafTableAnn(genenames, "hgu95av2.db", aaf.handler())
stattable <- aafTable("logFC" = logFC, "adj_P.Val" = adj.P_Val)
report <- merge(atab, stattable)
saveHTML(report, file = "report.html",
         title = "Significant gene list and its annotation information")

Page 17

Results – DEGs predicted by Bioconductor

The top 100 DEGs (either up- or down-regulated) are reported.

After eliminating duplicated genes, the predicted total number of DEGs is 85, and the adjusted p-values of all DEGs are less than 1.9 × 10^-5.

TAG ∩ DEGs: 14 known cancer genes are among the 85 predicted DEGs (16.5%).
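The bookkeeping behind these numbers (removing duplicated genes, intersecting with TAG, and the Benjamini-Hochberg adjustment) can be sketched in base R. The gene symbols, the TAG list and the raw p-values below are made-up placeholders, not data from the study; only the final line uses the counts reported on the slide.

```r
# Hypothetical gene symbols standing in for the top-ranked probe sets;
# several probe sets can map to the same gene.
deg_symbols <- c("GENE_A", "GENE_B", "GENE_A", "GENE_C", "GENE_D", "GENE_B")
# Hypothetical stand-in for the list of known cancer genes in TAG
tag_genes <- c("GENE_B", "GENE_D", "GENE_X")

degs <- unique(deg_symbols)             # eliminate duplicated genes
known <- intersect(degs, tag_genes)     # intersection of TAG and DEGs
fraction <- length(known) / length(degs)

# Benjamini-Hochberg adjustment of made-up raw p-values
raw_p <- c(0.001, 0.02, 0.03, 0.5)
adj_p <- p.adjust(raw_p, method = "BH")

# Percentage reported on the slide: 14 known cancer genes among 85 DEGs
slide_percent <- round(100 * 14 / 85, 1)
print(slide_percent)
```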

Page 18

Results – miRNAs, DEGs and cancer types

Other DEGs

Page 19

Results - The relationship among miR-20a, TGFBR2 and human prostate cancer

16461460
http://ppi.bioinfo.asia.edu.tw/R_cancer/

Page 20

A platform for studying miRNAs and cancerous target genes

Page 21

A platform for studying miRNAs and cancerous target genes (same diagram and NCI-60 cell-line table as on Page 12)

Page 22

A platform for studying miRNAs and cancerous target genes

For a given cancer tissue type, we calculated both the Pearson correlation coefficient (PCC) and the Spearman rank correlation coefficient (SRC). The PCC between the expression profiles of a miRNA and its target gene is given by

$$\mathrm{PCC} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

where $x_i$ and $y_i$ denote the expression intensity of the miRNA and of the miRNA's target gene respectively, and $\bar{x}$ and $\bar{y}$ are their means over the $n$ samples.

One problem with quantifying the strength of correlation by PCC is that it can be skewed by outliers. A single outlying data point can make two genes appear correlated even when all the other data points are not. SRC is a non-parametric statistical method that is robust to outliers.

The PCC and SRC are calculated for:

Three Affymetrix chips: U95(A-E), U133A, U133B

Normalization methods: GCRMA, MAS5, RMA
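The sensitivity of PCC to a single outlier can be checked with a few lines of base R. The expression values below are made up for illustration: the first eight points have no linear trend, and the ninth point is one outlier. The manual formula follows the PCC definition on this slide, and `cor()` is the built-in equivalent.

```r
# Made-up expression values for one miRNA (x) and its target gene (y)
# across nine samples; the ninth sample is a single outlier.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 30)
y <- c(45, 47, 44, 48, 43, 46, 44, 47, 10)

# PCC written out term by term, following the formula on the slide
pcc_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

pcc <- cor(x, y, method = "pearson")    # built-in Pearson correlation
src <- cor(x, y, method = "spearman")   # rank-based Spearman correlation

print(c(PCC = pcc, SRC = src))
```

With these numbers the PCC falls below -0.9 because of the one outlier, whereas the SRC stays well above -0.5, which is the behavior the slide warns about.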

Page 23

Test of hypothesis of PCC and SRC

The Pearson product-moment table is used to test the significance of a PCC result. The hypothesis being tested is a one-tailed test. A different test is applied for the SRC results.

Critical values for a one-tailed test using Pearson and Spearman correlation at significance levels α of 0.05 and 0.10.
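As a hedged sketch of how such a table can be reproduced, the one-tailed Pearson critical value follows from the t distribution with n - 2 degrees of freedom. The helper name `pearson_critical` and the expression values are assumptions for illustration; `cor.test()` from base R performs the corresponding one-tailed tests, and Spearman critical values for small samples come from a different table, which is why the Spearman test is left to `cor.test()`.

```r
# One-tailed critical value of the Pearson correlation for n samples,
# derived from the t distribution with n - 2 degrees of freedom.
pearson_critical <- function(n, alpha = 0.05) {
  t_crit <- qt(1 - alpha, df = n - 2)
  t_crit / sqrt(t_crit^2 + n - 2)
}

n <- 9   # for example, the nine lung cell lines in the NCI-60 data
r_crit <- pearson_critical(n, alpha = 0.05)

# Made-up expression values for a one-tailed test of negative correlation
x <- c(2.1, 3.4, 1.8, 4.0, 2.9, 3.7, 1.5, 4.4, 2.6)
y <- c(7.9, 6.1, 8.3, 5.2, 7.0, 6.4, 8.8, 5.0, 7.1)
test_p <- cor.test(x, y, method = "pearson", alternative = "less")
test_s <- cor.test(x, y, method = "spearman", alternative = "less", exact = FALSE)

print(c(critical = r_crit, pearson_p = test_p$p.value, spearman_p = test_s$p.value))
```

A PCC at or below `-r_crit` is significant at the chosen level in the one-tailed test for negative correlation.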

Page 24

Results – hsa-miR-1:AXL, PCC and SRC calculations

Cases where both PCC and SRC are less than or equal to -0.5.
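The selection rule on this slide (keep a miRNA-target pair only when both PCC and SRC are at most -0.5) can be sketched as follows. The expression matrices, the miRNA and gene names, and the pair list are hypothetical placeholders; in the slides the pairs come from TarBase and the expression profiles from NCI-60.

```r
# Made-up expression matrices: rows are miRNAs or genes, columns are cell lines.
mirna_expr <- rbind(
  "miR-A" = c(1, 2, 3, 4, 5, 6),
  "miR-B" = c(3, 1, 4, 1, 5, 9)
)
mrna_expr <- rbind(
  "GENE1" = c(9, 8, 6, 5, 3, 1),
  "GENE2" = c(2, 3, 5, 4, 6, 7)
)
# Hypothetical miRNA-target pairs
pairs <- data.frame(mirna = c("miR-A", "miR-A", "miR-B"),
                    target = c("GENE1", "GENE2", "GENE1"),
                    stringsAsFactors = FALSE)

# Correlation of each pair's expression profiles across the cell lines
pairs$pcc <- mapply(function(m, g) cor(mirna_expr[m, ], mrna_expr[g, ]),
                    pairs$mirna, pairs$target)
pairs$src <- mapply(function(m, g) cor(mirna_expr[m, ], mrna_expr[g, ],
                                       method = "spearman"),
                    pairs$mirna, pairs$target)

# Keep only the anti-correlated pairs
anti <- pairs[pairs$pcc <= -0.5 & pairs$src <= -0.5, ]
print(anti)
```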

Page 25

Results – hsa-miR-10b:HOXD10

miR2Disease - hsa-mir-10b-related diseases, i.e. leukemia and breast, colon, ovarian and prostate cancers.

Other examples:
hsa-miR-21:PTEN (TSG)
hsa-miR-15b:BCL2 (TSG)
hsa-miR-16:BCL2 (TSG)

Page 26

Extension - works in progress

Validate how good the correlation prediction is

Adding further information:

– CpG islands: miRNAs located around CpG islands (i.e., miR-34b, miR-137, miR-193a, and miR-203) are silenced by DNA hypermethylation in oral cancer

– miRNA clusters, fragile sites

Positively correlated miRNA:mRNA pairs may involve TFs

Page 27

ncRNAppi – miRNA, target genes, PPI, and the protocol of enrichment analysis

There is a tendency for two directly interacting proteins to participate in the same biological process or to share the same molecular function. Let a miRNA targeting pathway be denoted by miRNA – TG – L1 – L2. We propose to rank the pathway results according to the number of overlapping biological processes (or molecular functions) between TG and L1, and between L1 and L2. The Jaccard coefficient (JC) is used to rank the significance of a pathway. The JC of sets A and B is defined by

$$JC(A,B) = \frac{|A \cap B|}{|A \cup B|}$$

where $|A \cap B|$ and $|A \cup B|$ denote the cardinality of $A \cap B$ and $A \cup B$ respectively.

Diagram: miRNA or siRNA → protein (mRNA is suppressed) → protein → protein (TF) → protein, with JC(TG,L1) and JC(L1,L2) marked on the two PPI segments
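A minimal sketch of the Jaccard coefficient in base R is given below. The function name `jaccard` and the term labels are placeholders introduced for illustration; they are not real GO identifiers and not code from ncRNAppi.

```r
# Jaccard coefficient of two annotation sets: |A intersect B| / |A union B|
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  n_union <- length(union(a, b))
  if (n_union == 0) return(NA_real_)   # undefined when both sets are empty
  length(intersect(a, b)) / n_union
}

# Placeholder biological process labels for TG, L1 and L2
bp_tg <- c("BP1", "BP2", "BP3", "BP4")
bp_l1 <- c("BP3", "BP4", "BP5")
bp_l2 <- c("BP4", "BP5", "BP6", "BP7")

jc_tg_l1 <- jaccard(bp_tg, bp_l1)   # 2 shared terms out of 5 distinct terms
jc_l1_l2 <- jaccard(bp_l1, bp_l2)   # 2 shared terms out of 5 distinct terms
print(c(jc_tg_l1, jc_l1_l2))
```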

Page 28

ncRNAppi – The protocol of enrichment analysis

The biological process (BP) and molecular function (MF) annotations are taken from Gene Ontology and are used to characterize the path TG – L1 – L2. The JC for the pathway is given by

$$JC^{ave}_{BP}(TG,L1,L2) = \frac{1}{2}\left[JC_{BP}(TG,L1) + JC_{BP}(L1,L2)\right]$$

where $JC_{BP}(TG,L1)$ and $JC^{ave}_{BP}(TG,L1,L2)$ denote the JC score of the biological process for the segment TG – L1 and for the TG – L1 – L2 pathway respectively.
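Ranking by the averaged score can be sketched as follows, assuming the two segment scores have already been computed. The pathway labels and JC values are made up for illustration.

```r
# Hypothetical pathways with made-up JC scores for the two segments
pathways <- data.frame(
  pathway  = c("miR-X - TG1 - P1 - P2",
               "miR-X - TG2 - P3 - P4",
               "miR-X - TG3 - P5 - P6"),
  jc_tg_l1 = c(0.40, 0.10, 0.25),
  jc_l1_l2 = c(0.20, 0.30, 0.75),
  stringsAsFactors = FALSE
)

# Average of the two segment scores, as in the formula for the pathway JC
pathways$jc_ave <- (pathways$jc_tg_l1 + pathways$jc_l1_l2) / 2

# Pathways with the largest average JC are ranked first
ranked <- pathways[order(pathways$jc_ave, decreasing = TRUE), ]
print(ranked[, c("pathway", "jc_ave")])
```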

Page 29

ncRNAppi – The protocol of enrichment analysis, p-value

We assign a p-value to every JC calculation as a measure of its statistical significance. The p-value is estimated as follows. Let N be the total number of BP terms found in GO. Assume that TG, L1 and L2 have x, y and z BP annotations respectively. Also, let n1 and n2 be the number of identical BP terms for TG – L1 and L1 – L2 respectively. Let p1 and p2 be the probabilities that TG – L1 and L1 – L2 have n1 and n2 common BP (or MF) terms respectively, which are defined as

$$p_1 = \frac{C^{N}_{n_1}\,C^{N-n_1}_{x-n_1}\,C^{N-x}_{y-n_1}}{C^{N}_{x}\,C^{N}_{y}}$$

and

$$p_2 = \frac{C^{N}_{n_2}\,C^{N-n_2}_{y-n_2}\,C^{N-y}_{z-n_2}}{C^{N}_{y}\,C^{N}_{z}}$$

where $C^{N}_{k}$ denotes the number of ways to choose k terms out of N.

Venn diagram (within the N BP terms): TG only, x − n1; shared by TG and L1, n1; L1 only, y − n1
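The probability p1 can be evaluated numerically as sketched below. The function name `overlap_prob` and the numbers are assumptions for illustration. The computation is done on the log scale with `lchoose()` so that a large N does not overflow, and the result is compared with the hypergeometric probability from `dhyper()`, to which the expression for p1 reduces algebraically.

```r
# Probability that two annotation sets of sizes x and y, drawn from N terms,
# share exactly n common terms (the expression for p1), on the log scale.
overlap_prob <- function(N, x, y, n) {
  log_p <- lchoose(N, n) + lchoose(N - n, x - n) + lchoose(N - x, y - n) -
    lchoose(N, x) - lchoose(N, y)
  exp(log_p)
}

# Made-up numbers: N total BP terms, x and y annotations, n1 shared terms
N <- 5000
x <- 20
y <- 30
n1 <- 5
p1 <- overlap_prob(N, x, y, n1)

# The same quantity expressed as a hypergeometric probability
p1_check <- dhyper(n1, m = x, n = N - x, k = y)
print(c(p1 = p1, dhyper = p1_check))
```

The same function gives p2 when called with the arguments y, z and n2 in place of x, y and n1.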

Page 30

ncRNAppi – Extension of TarBase targets

Limitations of miRNA target prediction tools

Many tools are available for predicting miRNA target genes, such as miRanda, TargetScan and RNAhybrid. A major problem is that the prediction accuracy remains uncertain: one report indicated that the false positive rate could be as high as 24-39% for miRanda and 22-31% for TargetScan. If the miRNA:mRNA targeting step is uncertain, then the ‘Level 1’ and ‘Level 2’ protein-protein interaction pathways derived from the PPI databases are doubtful.

Page 31

ncRNAppi – Extension of TarBase targets

miRNA target prediction tool – miRanda

Mature human miRNA FASTA sequences are downloaded from miRBase (the latest version is 13).

We then predict the possibility of each miRNA binding to OCGs and TSGs. The target prediction tool miRanda allows fine tuning of certain parameters, i.e. the MFE threshold, score, shuffle statistics, and gap open and gap extension scores. We set the MFE threshold to -25 kcal/mol and the shuffle statistics to ON. The remaining parameters are left at their default values. Once the binding lists of OCGs and TSGs are obtained, their PPI pathways can be retrieved from the BioIR database.

Page 32

Results - ncRNAppi

ncRNAppi provides web-based data access and allows disease assignment for a specific node along miRNA (siRNA) targeting pathways. For example:
Select miRNA ID: hsa-let-7
Check the ‘OMIM Disease type for individual node’ boxes labeled ‘Target’ and ‘Level-2’
Choose the item ‘lung tumor’ under the ‘TUMOR TYPE’ pull-down menu (OMIM)
Select ‘Yes’ under ‘Common expression of target, Level-1 and level-2 nodes in KEGG’
Pathways are ranked according to the Jaccard index and p-value for BP or MF

Example
1) hsa-let7
2) Unigene: liver
3) Target, L1 and L2 are OCG
4) submit

Page 33

Summary

R and Bioconductor are used to predict DEGs from prostate cancer microarray data. By integrating the Tumor Associated Gene (TAG), ncRNAppi and miR2Disease databases, it is found that certain DEGs are regulated by microRNAs.

A platform for studying miRNAs and cancer target genes
(1) PCC and SRC results are used to quantify the correlation between the expression profiles of a miRNA and its target. The predicted results are annotated with reference to the TAG, OMIM, miR2Disease and KEGG data sets.
(2) The main advantage of the two platforms is that all the miRNA-mRNA target gene information and disease records are experimentally verified.

ncRNAppi platform
ncRNAppi provides a powerful tool for identifying cancer-related miRNAs or siRNAs. For instance, the tool makes it possible to predict novel cancer genes through tissue-specific or disease-specific searches. This platform is useful for investigating the regulatory role of miRNAs and siRNAs in cancer studies.

Page 34

Acknowledgement

National Science Foundation

Professor S.C. Lee (李尚熾) – Chung Shan Medical University

Mr. Liu Hsueh-Chuan (劉學銓) – former graduate student at Asia University

Mr. C.W. Weng (翁嘉偉) – former graduate student at Asia University

Mr. Kevin Lo (羅琮傑) – MSc. graduate student at Asia University

Page 35

Thank you for your attention.