DREAM4 Puzzle – inferring network structure from microarray data
Qiong Cheng
Outline
Gene Network
Gene Regulatory Systems and Related Work
FunGen: Reconstructing Biological Networks Using Conditional Correlation Analysis
ARACNE: Algorithm for Reconstructing Accurate Cellular Network
Gene Network
Directed network
  – nodes: genes
  – edges: regulation
  – including loops
  – Scale-free:
    • Degree distribution: power law, $P(k) \sim k^{-\lambda}$
Genetic Network Generation Schematic
de Jong H. Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol 2002;9(1):67-103
Random Network Model
ER model
  – each pair of nodes is connected by an edge with probability p
  – edges are independent of each other
  – Poisson degree distribution (e.g. $P(k) \sim e^{-k}$ for large k)
BA model
  – Scale-free distribution ($P(k) \sim k^{-x}$)
  – Process: new nodes preferentially attach to nodes that already have high degree
http://arxiv.org/pdf/cond-mat/0010278
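To make the contrast between the two models concrete, here is a minimal Python sketch (standard library only) that builds one ER graph and one BA graph with a similar mean degree and compares their largest degrees. The function names (`er_graph`, `ba_graph`, `degrees`), the fully connected seed core of m + 1 nodes, and the parameter values are illustrative assumptions, not taken from the slides.

```python
import random

def er_graph(n, p, rng):
    """ER model: every node pair is linked independently with probability p."""
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}

def ba_graph(n, m, rng):
    """BA model: each new node attaches to m existing nodes, chosen with
    probability proportional to their current degree."""
    # Assumption: growth starts from a fully connected core of m + 1 nodes.
    edges = {(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)}
    # Every node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` is a degree-proportional draw over nodes.
    stubs = [v for e in sorted(edges) for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.add((t, new))
            stubs.extend((t, new))
    return edges

def degrees(n, edges):
    """Degree of every node, treating edges as undirected."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg

rng = random.Random(0)
n = 1000
er_edges = er_graph(n, 4 / (n - 1), rng)   # mean degree close to 4
ba_edges = ba_graph(n, 2, rng)             # mean degree close to 4
er_deg = degrees(n, er_edges)
ba_deg = degrees(n, ba_edges)
print("largest degree, ER:", max(er_deg), " BA:", max(ba_deg))
```

The BA graph should show a few hubs with degrees far above the mean, whereas the ER degrees stay close to the mean; that heavy tail is what the power law describes.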
Random Network Model
Module extraction from a source random scale-free network (used by DREAM3)
  – Hierarchical scale-free network
  – Extraction: random seed node + iteratively adding neighbor nodes with highest modularity Q
Marbach D, Schaffter T, Mattiussi C, and Floreano D (2009) Generating Realistic in silico Gene Networks for Performance Assessment of Reverse Engineering Methods. J Comput Biol, 16(2):229–239
$$Q = \frac{1}{4m}\, s^{T} B\, s; \qquad B_{ij} = A_{ij} - P_{ij}; \qquad P_{ij} = \frac{k_i k_j}{2m}$$
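The modularity score can be evaluated directly from its definition, Q = s^T B s / (4m) with B_ij = A_ij − k_i k_j / (2m). The sketch below assumes the usual reading of the symbols: A is the adjacency matrix of the graph taken as undirected, k_i is the degree of node i, m is the number of edges, and s_i is +1 for nodes inside the candidate module and −1 otherwise. The function name `modularity` and the toy graph are hypothetical.

```python
def modularity(n, edges, module):
    """Q = (1 / 4m) * sum_ij (A_ij - k_i * k_j / 2m) * s_i * s_j."""
    m = len(edges)
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1          # undirected adjacency (assumption)
    k = [sum(row) for row in adj]           # node degrees
    s = [1 if v in module else -1 for v in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += (adj[i][j] - k[i] * k[j] / (2 * m)) * s[i] * s[j]
    return total / (4 * m)

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_split = modularity(6, toy_edges, module={0, 1, 2})           # one triangle as the module
q_whole = modularity(6, toy_edges, module={0, 1, 2, 3, 4, 5})  # everything in one module
print(q_split, q_whole)
```

Splitting the toy graph along the bridge gives a positive Q, while putting every node in the same module gives Q = 0, which is why a greedy extraction can use Q to decide which neighbor to add next.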
Microarray Data Distributions
Benford’s law (in base 10): $P(D) = \log_{10}(1 + D^{-1})$
Zipf’s law
Microarray data: log-normal distribution as a potential distribution for normalization of the bulk of the corrected spot intensities
Noise
Source: “Make Sense Of Microarray Data Distributions”
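As a small illustration of Benford's law, the following Python sketch tabulates the expected leading-digit probabilities P(D) = log10(1 + 1/D) and compares them with the leading digits of a list of intensities. The helper names are hypothetical, and the six sample values are far too few to test the law; they only show the mechanics.

```python
import math

def benford_probability(d):
    """Benford's law in base 10 for a leading digit d in 1..9."""
    return math.log10(1 + 1 / d)

def leading_digit(x):
    """First significant decimal digit of a non-zero number."""
    return int(f"{abs(x):.15e}"[0])

def leading_digit_frequencies(values):
    """Observed fraction of values starting with each digit 1..9 (zeros skipped)."""
    nonzero = [v for v in values if v != 0]
    counts = {d: 0 for d in range(1, 10)}
    for v in nonzero:
        counts[leading_digit(v)] += 1
    return {d: c / len(nonzero) for d, c in counts.items()}

expected = {d: benford_probability(d) for d in range(1, 10)}
intensities = [16.477367, 20.150969, 7.6989274, 26.04019, 8.8098955, 21.554955]
observed = leading_digit_frequencies(intensities)
for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```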
Reverse Engineering
Clustering + …
Correlation measures + …
Optimization methods
  – Bayesian network (conditional independence via DAG)
  – Markov chains
  – Dynamic Bayesian network
  – Expectation maximization (maximum likelihood)
  – Genetic algorithm (GA)
  – Neural network
Simulation
  – Piecewise-linear differential equations
  – Stochastic equations
  – Stochastic/hybrid Petri net
  – Boolean network
Regression techniques
FunGen: Reconstructing Biological Networks Using Conditional Correlation Analysis
Synthetic network
Network dynamics
Simulation protocol: perturbation
Conditional correlation
  – Correlation is symmetric
  – Matrix is non-symmetric
  – May lead to indirect connections
False positives (indirect connections) + false negatives (noise)
  – error = FP/(FP+TN) + FN/(FN+TP)
Reduce false positives
  – Choose optimal ρ_opt
  – Triangle reduction construction
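The error measure error = FP/(FP+TN) + FN/(FN+TP) and the choice of a threshold ρ_opt can be sketched as follows. This is only an illustration under stated assumptions: edges are ordered (directed) node pairs, self-loops are left out of the count, an edge is predicted when the absolute score reaches the threshold, and ρ_opt is picked by scanning a list of candidate values against a known synthetic network. The names `fungen_error` and `best_threshold` and the toy scores are hypothetical and do not come from the FunGen paper.

```python
from itertools import permutations

def fungen_error(true_edges, predicted_edges, nodes):
    """error = FP/(FP+TN) + FN/(FN+TP) over all ordered node pairs."""
    n_pairs = len(list(permutations(nodes, 2)))  # self-loops excluded (assumption)
    tp = len(true_edges & predicted_edges)
    fp = len(predicted_edges - true_edges)
    fn = len(true_edges - predicted_edges)
    tn = n_pairs - tp - fp - fn
    # Assumes the true network has at least one edge and one non-edge.
    return fp / (fp + tn) + fn / (fn + tp)

def best_threshold(scores, true_edges, nodes, candidates):
    """Return the candidate threshold with the smallest error."""
    def error_at(rho):
        predicted = {e for e, s in scores.items() if abs(s) >= rho}
        return fungen_error(true_edges, predicted, nodes)
    return min(candidates, key=error_at)

nodes = range(4)
true_edges = {(0, 1), (1, 2)}
# Hypothetical correlation scores; (0, 2) is an indirect connection through node 1.
scores = {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.6, (2, 3): 0.1}
rho_opt = best_threshold(scores, true_edges, nodes, [0.05, 0.5, 0.7, 0.95])
print("rho_opt =", rho_opt)
```

A low threshold keeps the indirect edge (a false positive), a very high one drops true edges (false negatives); the scan settles on the value between the two.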
ARACNE: Algorithm for Reconstructing Accurate Cellular Network
Assume two-way interactions: pairwise potentials determine all statistical dependencies + uniform marginal distributions
Mutual information (MI) = measure of relatedness
Independence: MI is zero iff the two variables are statistically independent
Data processing inequality (DPI): if genes g1 and g3 interact only through g2, then $I(g_1, g_3) \le \min[I(g_1, g_2),\, I(g_2, g_3)]$
ARACNE starts with the network of pairwise MI values; for every edge it looks at the gene triplets containing that edge and removes the edge with the smallest MI in each triplet
Ignores the direction of the edges
Reconstructs tree-network topologies exactly
  – higher-order potential interactions will not be accounted for (ARACNE will open 3-gene loops)
  – a two-gene interaction will be detected iff there are no alternate paths
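The triplet-pruning step can be sketched in a few lines of Python. This is a simplified illustration, not the ARACNE implementation: it assumes the starting network is given as a dictionary from undirected edges to MI values, checks every fully connected triplet against the original MI values, and removes an edge only when it is strictly the weakest of the three. The name `dpi_prune` and the MI values are hypothetical.

```python
from itertools import combinations

def dpi_prune(mi):
    """mi maps frozenset({a, b}) -> MI value for each edge of the starting network.
    Returns a copy without the weakest edge of every 3-gene loop."""
    genes = sorted({g for e in mi for g in e})
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset(pair) for pair in ((a, b), (b, c), (a, c))]
        if not all(e in mi for e in triangle):
            continue  # the DPI only applies to fully connected triplets
        weakest = min(triangle, key=lambda e: mi[e])
        others = [mi[e] for e in triangle if e != weakest]
        if mi[weakest] < min(others):
            to_remove.add(weakest)  # marked now, removed after all triplets are checked
    return {e: v for e, v in mi.items() if e not in to_remove}

# g1 and g3 interact only through g2, so their MI is the smallest of the three.
mi = {
    frozenset(("g1", "g2")): 0.8,
    frozenset(("g2", "g3")): 0.7,
    frozenset(("g1", "g3")): 0.3,
}
pruned = dpi_prune(mi)
print(sorted(tuple(sorted(e)) for e in pruned))
```

Because every 3-gene loop loses its weakest edge, a genuine three-way loop is also opened, which is the limitation noted above.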
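MI estimate over M samples:
$$I(x, y) = \frac{1}{M} \sum_{i} \log \frac{p(x_i, y_i)}{p(x_i)\, p(y_i)}$$
Gaussian kernel density estimates (kernel width d):
$$p(x_i) = \frac{1}{\sqrt{2\pi}\, M d} \sum_{j} \exp\!\left(-\frac{(x_i - x_j)^2}{2 d^2}\right)$$
$$p(x_i, y_i) = \frac{1}{2\pi M d^2} \sum_{j} \exp\!\left(-\frac{(x_i - x_j)^2 + (y_i - y_j)^2}{2 d^2}\right)$$
Independence condition:
$$I(x_i, y_j) = 0 \iff p(x_i, y_j) = p(x_i)\, p(y_j)$$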
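A direct, unoptimized Python transcription of a Gaussian-kernel MI estimate, I = (1/M) Σ_i log[p(x_i, y_i) / (p(x_i) p(y_i))], is given below as a sketch. The function name `kernel_mi`, the kernel width d = 0.3, and the synthetic data are assumptions made for illustration; choosing d well is a separate problem that this example does not address.

```python
import math
import random

def kernel_mi(xs, ys, d):
    """MI estimate (in nats) using Gaussian kernel density estimates of width d."""
    M = len(xs)
    total = 0.0
    for i in range(M):
        sx = sy = sxy = 0.0
        for j in range(M):
            kx = math.exp(-((xs[i] - xs[j]) ** 2) / (2 * d * d))
            ky = math.exp(-((ys[i] - ys[j]) ** 2) / (2 * d * d))
            sx += kx
            sy += ky
            sxy += kx * ky   # exp(-(dx^2 + dy^2) / 2d^2)
        p_x = sx / (math.sqrt(2 * math.pi) * M * d)
        p_y = sy / (math.sqrt(2 * math.pi) * M * d)
        p_xy = sxy / (2 * math.pi * M * d * d)
        total += math.log(p_xy / (p_x * p_y))
    return total / M

rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
y_dep = [xi + 0.1 * rng.gauss(0, 1) for xi in x]   # strongly dependent on x
y_ind = [rng.gauss(0, 1) for _ in range(200)]      # independent of x
mi_dep = kernel_mi(x, y_dep, d=0.3)
mi_ind = kernel_mi(x, y_ind, d=0.3)
print("dependent:", mi_dep, " independent:", mi_ind)
```

The estimate for the dependent pair should come out clearly larger than for the independent pair; the independent estimate is not exactly zero because the finite sample and the kernel introduce some bias.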
ARACNE – Example & Evaluation
Synthetic networks: ER, BA
Performance to be assessed via Precision-Recall curves (PRCs)
Example:
Precision $= \dfrac{N_{TP}}{N_{TP} + N_{FP}}$: fraction of true interactions among all inferred ones (expected success ratio)
Recall $= \dfrac{N_{TP}}{N_{TP} + N_{FN}}$: fraction of true interactions correctly inferred
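A precision-recall curve can be traced by sweeping a threshold over the MI values and recomputing precision = N_TP / (N_TP + N_FP) and recall = N_TP / (N_TP + N_FN) at each step. The sketch below assumes undirected edges stored as frozensets and reports precision as 1.0 when nothing is inferred (a convention chosen here, not taken from the source); all names and numbers are hypothetical.

```python
def precision_recall(true_edges, inferred_edges):
    """Return (precision, recall) for a set of inferred edges."""
    tp = len(true_edges & inferred_edges)
    fp = len(inferred_edges - true_edges)
    fn = len(true_edges - inferred_edges)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention for an empty prediction
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

def pr_curve(true_edges, mi_scores):
    """One (threshold, precision, recall) point per distinct MI value, highest first."""
    points = []
    for threshold in sorted(set(mi_scores.values()), reverse=True):
        inferred = {e for e, v in mi_scores.items() if v >= threshold}
        points.append((threshold, *precision_recall(true_edges, inferred)))
    return points

def edge(a, b):
    """Undirected edge, since edge direction is ignored."""
    return frozenset((a, b))

true_edges = {edge(1, 2), edge(2, 3), edge(3, 4)}
mi_scores = {edge(1, 2): 0.9, edge(2, 3): 0.7, edge(1, 3): 0.4, edge(3, 4): 0.2}
curve = pr_curve(true_edges, mi_scores)
for threshold, precision, recall in curve:
    print(threshold, round(precision, 3), round(recall, 3))
```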
(Demo) Sample input data file
Input_file_name.exp
N = 3 # genes
M = 2 # microarrays
Input file has N+1=4 lines
each line has M+2 fields (2M+2 when every value is paired with a p-value)
AffyID  HG_U95Av2  SudHL6.CHP  ST486.CHP
G1      G1         16.477367   0.69939363  20.150969  0.5297595
G2      G2         7.6989274   0.55935365  26.04019   0.5445875
G3      G3         8.8098955   0.5445875   21.554955  0.31372303
  – header line: AffyID, annotation name, microarray chip names
  – data lines: AffyID, annotation name, then one (value, p-value) pair per chip
Source: ARACNE slides
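A file laid out like this sample can be read with a few lines of Python. The sketch below assumes whitespace-separated fields (so annotation names must not contain spaces) and one (value, p-value) pair per chip when `with_pvalues` is true; the function name `parse_exp` and the returned structure are hypothetical choices, not part of ARACNE.

```python
def parse_exp(text, with_pvalues=True):
    """Parse expression data: a header line followed by one line per gene."""
    rows = [line.split() for line in text.strip().splitlines()]
    header, data_rows = rows[0], rows[1:]
    chips = header[2:]                      # microarray chip names
    step = 2 if with_pvalues else 1
    genes = {}
    for fields in data_rows:
        affy_id, annotation, numbers = fields[0], fields[1], fields[2:]
        genes[affy_id] = {
            "annotation": annotation,
            "values": [float(v) for v in numbers[0::step]],
            "pvalues": [float(v) for v in numbers[1::step]] if with_pvalues else None,
        }
    return chips, genes

sample = """\
AffyID HG_U95Av2 SudHL6.CHP ST486.CHP
G1 G1 16.477367 0.69939363 20.150969 0.5297595
G2 G2 7.6989274 0.55935365 26.04019 0.5445875
G3 G3 8.8098955 0.5445875 21.554955 0.31372303
"""
chips, genes = parse_exp(sample)
print(chips)
print(genes["G2"]["values"], genes["G2"]["pvalues"])
```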
(Demo, cont’d) Sample output data file
input_data_file_name[non-default_param_vals].adj
# lines = N = # genes
G1:0   8 0.064729
G2:1   2 0.0298643  7 0.0521425
G3:2   1 0.0298643
G4:3   8 0.0427217
G5:4   5 0.403516
G6:5   4 0.403516   6 0.582265
G7:6   5 0.582265   9 0.38039
G8:7   1 0.0521425  8 0.743262
G9:8   0 0.064729   3 0.0427217  7 0.743262  9 0.333104
G10:9  6 0.38039    8 0.333104
Each line: AffyID:ID#, followed by (associated gene ID#, MI value) pairs
[Figure: graph drawing of the resulting network, nodes labeled 1 to 10]
Source: ARACNE slides
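The adjacency output shown on this slide can be turned into a weighted graph with a short parser. The sketch assumes exactly the layout of the sample, that is a first field of the form AffyID:ID# followed by whitespace-separated (gene ID#, MI value) pairs; real output files may carry extra header lines that this sketch does not handle. The function name `parse_adj` is hypothetical.

```python
def parse_adj(text):
    """Return {gene ID#: (AffyID, {neighbor ID#: MI value})}."""
    network = {}
    for line in text.strip().splitlines():
        fields = line.split()
        affy_id, gene_index = fields[0].rsplit(":", 1)
        pairs = fields[1:]
        neighbors = {
            int(pairs[k]): float(pairs[k + 1]) for k in range(0, len(pairs), 2)
        }
        network[int(gene_index)] = (affy_id, neighbors)
    return network

sample_adj = """\
G1:0 8 0.064729
G2:1 2 0.0298643 7 0.0521425
G3:2 1 0.0298643
G4:3 8 0.0427217
G5:4 5 0.403516
G6:5 4 0.403516 6 0.582265
G7:6 5 0.582265 9 0.38039
G8:7 1 0.0521425 8 0.743262
G9:8 0 0.064729 3 0.0427217 7 0.743262 9 0.333104
G10:9 6 0.38039 8 0.333104
"""
network = parse_adj(sample_adj)
n_edges = sum(len(neighbors) for _, neighbors in network.values()) // 2
print("genes:", len(network), " undirected edges:", n_edges)
```

Each edge appears twice in the file, once per endpoint, with the same MI value, so halving the total number of listed pairs gives the number of undirected edges.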