ph.d. dissertation - integrative network analysis framework for multiple omics data using...
Post on 05-Aug-2015
Integrative network analysis framework for multiple omics data
using information-theoretic measure
(Korean title: Integrative network analysis framework for multi-omics data based on an information-theoretic measure)
Department of Information and Computer Engineering
Hyun-hwan Jeong (정현환)
May 20th, 2015
2
Outline
• Introduction
  • Motivation
  • Problem statement
  • Previous studies
  • Main purpose
• Proposed method
  • Mutual information
  • Outcome-guided network construction
  • Integrative network construction
  • Software (MINA)
• Experiments
  • Simulation study
  • Real data analysis
• Conclusion
3
INTRODUCTION
4
Motivation
• Interaction network
  • A representation of the entities and their interactions in a large-scale complex system
  • Gene regulatory network, protein-protein interaction network, biological pathway, etc.
• Application
  • Protein function prediction
  • Disease gene prioritization
  • Clinical outcome prediction
  • …
[Figure: A gene-gene interaction network of ovarian cancer subtype 1 (Hofree et al. 2013)]
5
Problem statement (1/2)
• Construction of the interaction network from high-dimensional (P features) omics data
Sample ID   X1     X2     …   XP      Disease status
S001        +0.3   -2.0   …   -10.3   O
S002        -0.1   -7.0   …   -11.1   O
S003        +1.2   +3.0   …   -5.0    X
S004        +0.9   +0.5   …   -3.2    X
…
[Figure: gene expression data (N samples × P genes/features) → interaction network]
6
Problem statement (2/2)
• Multi-omics data integration
“Importantly, integrative interpretation of the data will help identify how the consequences of mutations vary across tissues, with important therapeutic implications” (TCGA et al., 2013)
[Figures: (Kim et al., 2015); (TCGA et al., 2013)]
7
Previous study (1/2)
Taxonomy of the computational network construction methods
• Interaction network construction by computational approach
8
Previous study (1/2) (Cont’d)
• Interaction network construction methods using mutual information measure
9
Previous study (2/2)
• Data integration
[Figure: (Kim et al., 2015)]
10
Main Purpose (1/2)
• Outcome-guided network construction using the mutual information measure
[Figure: survival plot labeled MYO3A and SWI5; x-axis: survival month; y-axis: survival rate]
11
Main Purpose (2/2)
• Network analysis framework for data integration
• Module finding
• Prediction
• Topology analysis
• Pathway inference
• ...
[Figure labels: Multi-omics data with outcome; population; Outcome-guided network; Single profile networks; Integrative network; Application]
12
Purpose of the study
• Outcome-guided network construction
• Integration of the outcome-guided network
• Utility of outcome-guided network and integrated network
13
PROPOSED METHOD
1. Mutual information
2. Outcome-guided network construction
3. Network integration
4. MINA
14
Mutual information
• Association measure in information theory
• Measures both linear and non-linear association between two random variables
• Widely used in GWAS to measure the strength of association between SNP interactions and traits (Leem et al., 2014; Hu et al., 2011)
$$I(X_1, X_2; Y) = H(X_1, X_2) + H(Y) - H(X_1, X_2, Y)$$
$X_1, X_2$: pair of features; $Y$: binary outcome; $H(\cdot)$: entropy of its argument
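The entropy decomposition above can be checked on a small discrete example. The following is a minimal sketch, not MINA's implementation: it assumes all variables are already discrete (continuous profiles would need to be discretized first), and the function names `entropy` and `pair_outcome_mi` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def pair_outcome_mi(x1, x2, y):
    """I(X1, X2; Y) = H(X1, X2) + H(Y) - H(X1, X2, Y) for discrete data."""
    pair = list(zip(x1, x2))       # joint variable (X1, X2)
    joint = list(zip(x1, x2, y))   # joint variable (X1, X2, Y)
    return entropy(pair) + entropy(y) - entropy(joint)


# Toy data (assumed): Y is the XOR of X1 and X2, so neither feature alone
# is informative, but the pair determines Y completely.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [0, 1, 1, 0]
mi = pair_outcome_mi(x1, x2, y)
print(mi)
```

In this toy case H(X1, X2) = 2 bits, H(Y) = 1 bit and H(X1, X2, Y) = 2 bits, so the pair carries the full 1 bit of information about the outcome.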
15
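Outcome-guided mutual information network construction
[Figure: distribution of mutual information values; x-axis: mutual information; y-axis: # of edges; markers: $\theta$ and $\theta(1+\alpha)$]
$$\theta = \max_{i \neq j} I_{\mathrm{avg}}(i, j)$$
$$I_{\mathrm{avg}}(i, j) = \frac{1}{30} \sum_{p=1}^{30} I(g_i, g_j; Y_p)$$
$$G_{\alpha}^{\mathrm{profile}} = \{(g_i, g_j) \mid g_i, g_j \in P \text{ and } I(g_i, g_j; Y) \geq \theta(1+\alpha)\}$$
$\theta$: threshold; $\alpha$: additional parameter; $g$: gene; $Y$: outcome; $Y_p$: p-th permuted outcome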
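The permutation-based threshold and the edge selection rule can be sketched as follows. This is an illustrative sketch under stated assumptions, not the MINA code: features are assumed to be discrete, the function and variable names are hypothetical, and the toy data are made up. Only the 30 permutations, the threshold θ, and the cutoff θ(1+α) come from the slide.

```python
import random
from collections import Counter
from itertools import combinations
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mi(x1, x2, y):
    """I(X1, X2; Y) = H(X1, X2) + H(Y) - H(X1, X2, Y)."""
    return (entropy(list(zip(x1, x2))) + entropy(y)
            - entropy(list(zip(x1, x2, y))))


def outcome_guided_edges(features, y, alpha=0.0, n_perm=30, seed=0):
    """features: dict gene -> list of discrete values; y: list of outcomes.

    Returns (edges, theta), where theta is the largest permutation-averaged
    mutual information over all gene pairs and edges are the pairs whose
    mutual information with the real outcome reaches theta * (1 + alpha).
    """
    rng = random.Random(seed)
    pairs = list(combinations(sorted(features), 2))
    totals = dict.fromkeys(pairs, 0.0)
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # p-th permuted outcome Y_p
        for gi, gj in pairs:
            totals[(gi, gj)] += mi(features[gi], features[gj], y_perm)
    theta = max(total / n_perm for total in totals.values())
    cutoff = theta * (1 + alpha)
    edges = {(gi, gj) for gi, gj in pairs
             if mi(features[gi], features[gj], y) >= cutoff}
    return edges, theta


# Toy data (assumed): the outcome is the XOR of features "a" and "b";
# feature "c" is unrelated to the outcome.
a = [0, 0, 1, 1] * 10
b = [0, 1, 0, 1] * 10
c = [0] * 20 + [1] * 20
y = [u ^ v for u, v in zip(a, b)]
edges, theta = outcome_guided_edges({"a": a, "b": b, "c": c}, y)
print(edges, theta)
```

Raising `alpha` makes the cutoff stricter, so fewer pairs survive as edges.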
16
Network integration (1/3)
• Integration based on edge occurrence
• Integration using network fusion technique
17
Network integration (2/3)
• Integration based on edge occurrence
  • Integrated network with co-occurrence edges
  • Integrated network with one-or-more occurrence edges
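Edge-occurrence integration can be illustrated with plain set operations. This sketch assumes that "co-occurrence" means an edge appears in every profile network and that "one-or-more occurrence" means it appears in at least one; the slide does not spell out the exact rule, and the function names and gene labels are hypothetical.

```python
def normalize(edges):
    """Store each undirected edge as a frozenset so (a, b) equals (b, a)."""
    return {frozenset(edge) for edge in edges}


def integrate_by_occurrence(networks):
    """Return (co_occurrence, one_or_more) edge sets.

    networks: non-empty list of edge lists, one per genomic profile.
    """
    edge_sets = [normalize(edges) for edges in networks]
    co_occurrence = set.intersection(*edge_sets)  # present in every profile
    one_or_more = set.union(*edge_sets)           # present in at least one
    return co_occurrence, one_or_more


# Hypothetical single-profile networks over placeholder genes g1..g4.
cna = [("g1", "g2"), ("g2", "g3")]
mrna = [("g2", "g1"), ("g3", "g4")]
methylation = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
co, any_edge = integrate_by_occurrence([cna, mrna, methylation])
print(len(co), len(any_edge))
```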
18
Network integration (3/3)
• Integration using network fusion technique
[Figure: Outcome-guided networks ($P^{(1)}$, $P^{(2)}$) → Fusion iterations ($P_t^{(1)}$, $P_t^{(2)}$) → Integrative network ($P^{(c)}$); Similarity network fusion (Wang et al., 2014)]
$$P^{(c)} = \frac{P_t^{(1)} + P_t^{(2)} + \dots + P_t^{(m)}}{m}$$
Legend: affinity matrix; kernel matrix; $P^{(c)}$: final fused network
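Only the final averaging step of the fusion is sketched here; the iterative cross-network diffusion of similarity network fusion is not reproduced. The example assumes the m per-profile matrices after t iterations are given as square nested lists, and the function name and values are hypothetical.

```python
def average_matrices(matrices):
    """Element-wise mean of m equally sized square matrices (nested lists)."""
    m = len(matrices)
    n = len(matrices[0])
    return [[sum(mat[i][j] for mat in matrices) / m for j in range(n)]
            for i in range(n)]


# Two hypothetical per-profile matrices after the fusion iterations.
p_t1 = [[0.5, 0.5], [0.5, 0.5]]
p_t2 = [[1.0, 0.0], [0.0, 1.0]]
p_c = average_matrices([p_t1, p_t2])
print(p_c)
```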
19
Development of integrated network analysis framework
• MINA: Mutual Information-based integrative Network Analysis framework
• Easy to use
  • Preprocessing of input data
  • Construction of outcome-guided networks
  • Network integration
• Written in C++ with OpenMP library
  • Runs in less than 3 hours for features on desktop
• Open source program
  • Publicly available at https://github.com/hhjeong/MINA
20
Overview of MINA
21
Experiments
• Simulation study
• Real data analysis
  • KARE dataset
  • TCGA dataset
22
Simulation study (1/2)
• Five simulated models from Ma et al. 2011
  • Expression data with 50 genes, 200 samples
  • Generated from multivariate normal distribution
  • Two disease statuses (affected/unaffected)
• Ground truth of the simulated models
  • Fully connected sub-network of 25 genes (scenarios 1-4)
  • Multiple sub-networks (mixed model, scenario 5)
• Performance measure (accuracy)
  • How many edges of the sub-network are correctly found?
• Comparisons with previous methods
  • ARACNE, CLR, MRNET, MRNETB
23
Simulation study (2/2)
• Performance assessment (accuracy)
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$
$TP$: number of edges correctly found; $TN$: number of edges correctly not found
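The accuracy measure can be computed by classifying every possible gene pair as TP, FP, FN or TN. This is a minimal sketch assuming undirected edges and that all unordered node pairs are candidate edges; the node names and edge sets are made up for illustration.

```python
from itertools import combinations


def edge_accuracy(nodes, true_edges, predicted_edges):
    """Accuracy = (TP + TN) / (TP + FP + FN + TN) over all node pairs."""
    truth = {frozenset(edge) for edge in true_edges}
    predicted = {frozenset(edge) for edge in predicted_edges}
    tp = fp = fn = tn = 0
    for pair in combinations(nodes, 2):
        edge = frozenset(pair)
        if edge in truth and edge in predicted:
            tp += 1  # edge correctly found
        elif edge in predicted:
            fp += 1  # edge reported but not in the ground truth
        elif edge in truth:
            fn += 1  # ground-truth edge missed
        else:
            tn += 1  # edge correctly not found
    return (tp + tn) / (tp + fp + fn + tn)


nodes = ["A", "B", "C", "D"]
true_edges = [("A", "B"), ("A", "C"), ("B", "C")]
predicted_edges = [("B", "A"), ("A", "C"), ("C", "D")]
acc = edge_accuracy(nodes, true_edges, predicted_edges)
print(acc)
```

Here two edges are found correctly, two absent edges are correctly left out, one edge is missed and one is spurious, giving 4 correct decisions out of 6 pairs.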
25
Real data analysis (1): Gastritis dataset in KARE project
• Korean Association REsource project
  • 185,426 Single Nucleotide Polymorphisms (SNPs)
    • 3 types of genotype: AA, Aa and aa
  • 3,770 samples for gastritis
    • Affected/unaffected by the disease history of the samples
Mapping the SNP-SNP associations onto the gene-gene interaction
26
KARE dataset - significant pairs
• Approximately 2-4% of all possible pairs in the chromosomes were significant.
27
KARE dataset – network topology
Network centralization: 0.953
Clustering coefficient: 0.848
[Figure panels: Chromosome 18; Chromosome 20]
28
KARE dataset - Functionality assessment
• 18 enriched GO terms detected in the study
Chromosome  Category  Term  Fold enrichment  FDR
1 GOTERM_CC_FAT GO:0005886~plasma membrane 1.35 1.93E-02
1 GOTERM_CC_FAT GO:0044459~plasma membrane part 1.51 1.97E-02
2 GOTERM_MF_FAT GO:0004908~interleukin-1 receptor activity 34.22 8.88E-03
7 GOTERM_CC_FAT GO:0042995~cell projection 2.67 2.47E-02
12 GOTERM_CC_FAT GO:0005626~insoluble fraction 2.40 2.98E-02
12 GOTERM_BP_FAT GO:0006811~ion transport 2.65 3.23E-02
16 GOTERM_MF_FAT GO:0005509~calcium ion binding 2.80 5.14E-03
17 GOTERM_MF_FAT GO:0003774~motor activity 5.88 2.38E-02
19 GOTERM_BP_FAT GO:0006350~transcription 2.70 5.40E-14
19 GOTERM_BP_FAT GO:0051252~regulation of RNA metabolic process 2.79 1.61E-12
19 GOTERM_BP_FAT GO:0045449~regulation of transcription 2.35 5.17E-12
19 GOTERM_BP_FAT GO:0006355~regulation of transcription, DNA-dependent 2.77 8.93E-12
19 GOTERM_MF_FAT GO:0003677~DNA binding 2.14 1.27E-07
19 GOTERM_MF_FAT GO:0008270~zinc ion binding 2.10 7.07E-07
19 GOTERM_MF_FAT GO:0046914~transition metal ion binding 1.92 2.99E-06
19 GOTERM_MF_FAT GO:0046872~metal ion binding 1.64 1.98E-05
19 GOTERM_MF_FAT GO:0043169~cation binding 1.62 3.31E-05
19 GOTERM_MF_FAT GO:0043167~ion binding 1.60 7.36E-05
30
Real data analysis (2) – Ovarian cancer dataset in TCGA
• Measurements for 10,022 genes of 340 cancer patients at three different genomic levels
• Survival month classification:
  • Short-term (<36 months), long-term (otherwise)
Genomic profile   Platform                               Data type
CNA               Affymetrix SNP 6                       Discrete (by GISTIC 2.0)
mRNA              Agilent microarray                     Continuous
Methylation       Illumina Infinium HumanMethylation27   Continuous
31
TCGA dataset - Distribution of mutual information for different genomic levels
• Cumulative distribution of mutual information values for each genomic level.
[Figure: x-axis: mutual information; y-axis: $\log_{10}|E|$]
32
TCGA dataset - Significance for survivability of the association
• Kaplan-Meier estimator used to verify statistical significance of the associations
• Effects of the extracted associations show higher significance than the effects of single genes.
[Figure: x-axis: $\alpha$; y-axis: $-\log_{10}(p\text{-value})$]
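For reference, the Kaplan-Meier (product-limit) estimate of a survival curve can be sketched as below. This covers only the survival-curve estimate; the significance test applied to the curves in the study (for example a log-rank test) is not shown and is not specified on the slide. The function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time of each patient (e.g. survival months)
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


# Hypothetical follow-up data for five patients.
times = [5, 10, 10, 20, 30]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
```

With these data the estimate drops to 0.8 at month 5, to 0.6 at month 10 and to 0.3 at month 20; censored patients leave the risk set without lowering the curve.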
33
TCGA dataset – survival analysis with the outcome-guided network
Network-based Cox regression (Zhang et al., 2013)
Prediction of survivability
Applying outcome-guided network
[Figure: survival curves; x-axis: survival month; y-axis: survival rate]
34
TCGA dataset - Network-based Cox-regression
• Comparison of prediction power between the association network and the interaction network for each profile
[Figure: y-axis: mean(timeAUC); x-axis: CNA, mRNA, methylation]
35
TCGA dataset – network topology (Integration based on edge occurrence)
• Integrated network construction scheme shows a greatly enhanced level of scale-freeness.
[Figure: $R^2$ values for the networks: 0.745, 0.749, 0.842, 0.950]
$R^2$: coefficient of determination (model-fitting index)
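One common way to quantify scale-freeness is to fit a straight line to the degree distribution on log-log axes and report the coefficient of determination. The sketch below assumes that this is how the model-fitting index was obtained, which the slide does not state; the function name and the degree sequence are hypothetical.

```python
from collections import Counter
from math import log10


def power_law_r2(degrees):
    """R^2 of a least-squares line fitted to log10(count) vs log10(degree).

    Requires at least two distinct positive degrees with differing counts.
    """
    counts = Counter(d for d in degrees if d > 0)
    xs = [log10(k) for k in counts]
    ys = [log10(c) for c in counts.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((v - mean_y) ** 2 for v in ys)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, ys))
    # For simple linear regression, R^2 equals the squared correlation.
    return sxy ** 2 / (sxx * syy)


# Hypothetical degree sequence following an exact power law:
# 100 nodes of degree 1, 10 nodes of degree 10, 1 node of degree 100.
degrees = [1] * 100 + [10] * 10 + [100]
r2 = power_law_r2(degrees)
print(r2)
```

Values close to 1 indicate that the degree distribution is well described by a power law, which is the sense in which a higher index suggests a more scale-free network.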
36
TCGA dataset - Functionality assessment (Integration based on edge occurrence)
• Comparison of enriched GO terms between the one-or-more occurrence network and the single genomic level networks
[Figure: panels labeled $I_{\exists}$]
37
TCGA dataset – Spectral clustering (Integration using network fusion technique)
[Figure: Integrative network → Spectral clustering → Common modules]
38
TCGA dataset – Spectral clustering (Integration using network fusion technique)
• Enrichment test for co-expression terms in MSigDB
Cluster number   Number of genes   Represented enrichment terms
1                3,041             Genes whose expression in suboptimally debulked ovarian tumors is associated with survival prognosis.
2                1,675             Genes up-regulated in epithelial ovarian cancer (EOC) biopsies: invasive (TOV) vs low malignant potential (LMP) tumors.
3                2,607             Genes up-regulated in SKOV3ip1 cells (ovarian cancer) upon knockdown of EZH2 by RNAi.
4                2,699             Genes down-regulated in SKOC-3 cells (ovarian cancer) after YB-1 (YBX1) knockdown by RNAi.
39
Conclusion
• Contributions
  • Outcome-guided network construction
    • Shows better statistical significance & biological functionality detection
    • Improves survivability prediction power
  • Integration of the outcome-guided network
    • Greatly enhances the biological significance
  • Development of software
• Future works
  • Outcome prediction with the integrative network
  • Software improvement
  • Application to other domains
40
PUBLICATIONS
41
Publications (1/2)
1. Hyun-hwan Jeong, So Yeon Kim, Kyubum Wee and Kyung-Ah Sohn. “Investigating the Utility of Clinical Outcome-guided Mutual Information Network in Network-based Cox Regression”. BMC Systems Biology (2015).
2. Hyun-hwan Jeong and Kyung-Ah Sohn. “Relevance epistasis network of gastritis for intra-chromosomes in the KARE cohort study”. Genomics & Informatics (2014).
3. Sangseob Leem, Hyun-hwan Jeong, Jungseob Lee, Kyubum Wee, Kyung-Ah Sohn. “Fast detection of high-order epistatic interactions in genome-wide association studies using information theoretic measure”. Computational Biology and Chemistry (2014).
4. Kyung-Ah Sohn, Joshua Ho, Djordje Djordjevic, Hyun-hwan Jeong, Peter Park, Ju Han Kim. “hiHMM: Bayesian non-parametric joint inference of chromatin state maps”. Bioinformatics (advanced access).
42
Publications (2/2)
5. Hyun-hwan Jeong, Sangseob Leem, Kyubum Wee and Kyung-Ah Sohn. “Integrative network analysis for survival-associated gene-gene interactions across multiple genomic profiles in ovarian cancer”. Journal of Ovarian Research (in revision)
6. Sangseob Leem, Hyun-hwan Jeong, Jungseob Lee, Kyubum Wee, Kyung-Ah Sohn. “MIBE: a software package for fast detection and interpretation of high-order epistatic interactions in genome-wide association study”. BioData Mining (submitted)
7. Jaeyeon Lee, Ho-min Park, Hyun-hwan Jeong, Kyung-Ah Sohn. “RecPAL: Recommending Problems of Adequate Level for Personalized Learning”. Expert Systems (submitted)
43
Conference presentations
1. Hyun-hwan Jeong, Garam Lee, Kyung-Ah Sohn. “Integrative analysis for outcome-guided gene networks from multiple omics profiles”. ISMB 2015 (Poster, will present).
2. Hyun-hwan Jeong, So Yeon Kim, Kyubum Wee and Kyung-Ah Sohn. “Investigating the Utility of Clinical Outcome-guided Mutual Information Network in Network-based Cox Regression”. APBC 2015 (Oral).
3. Hyun-hwan Jeong, So Yeon Kim, Kyubum Wee, Kyung-Ah Sohn. “Outcome-guided mutual information networks for investigating gene-gene interaction effects on clinical outcomes”. ISB/TBC 2014 (Poster).
4. Hyun-hwan Jeong, Kyubum Wee, Kyung-Ah Sohn. “Detection of pair-wise genomic interactions associated with clinical outcome in ovarian cancer patients using information theoretic measure”. APBC 2014 (Poster).
5. Hyun-Hwan Jeong, Sangseob Leem, Kyubum Wee. “High-order epistatic interaction detection using clique finding algorithm in genome-wide association studies”. TBC/ISCB 2013 (Poster).
44
References (1/2)
• Hofree et al., “Network-based stratification of tumor mutations”, Nature Methods 2013, 10:1108-1115.
• TCGA et al., “The Cancer Genome Atlas Pan-Cancer analysis project”, Nature Genetics 2013, 45:1113-1120.
• Kim et al., “Methods of integrating data to uncover genotype-phenotype interactions”, Nature Reviews Genetics 2015, 16:85-97.
• Ma et al., “COSINE: Condition-Specific sub-network identification using a global optimization method”, Bioinformatics 2011, 27(9):1290-1298.
• Wang et al., “Similarity network fusion for aggregating data types on a genomic scale”, Nature Methods 2014, 11:333-337.
45
References (2/2)
• Zhang et al., “Network-based survival analysis reveals subnetwork signatures for predicting outcomes of ovarian cancer treatment”, PLoS Computational Biology 2013, 9(3):e1002975.
• Butte AJ, Kohane IS, “Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements”, Pacific Symp Biocomput 2000:418–429.
• Margolin et al., “ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context”, BMC Bioinformatics 2006, 7(Suppl 1):S7.
• Meyer et al., “MINET: A R/Bioconductor package for inferring large transcriptional networks using mutual information”, BMC Bioinformatics 2008, 9:461.
Thank you