Post on 05-Jul-2018
Rationalizing CMap Gene Expression Readouts via Target Prediction
Nolen Joy Perualila
Non-Clinical Statistics Conference 2014
9 October 2014
RESEARCH GROUP
Hasselt University, Belgium: Nolen Joy Perualila, Ziv Shkedy
Durham University, UK: Adetayo Kasim
Cambridge University, UK: Aakash Chavan Ravindranath, Andreas Bender
Janssen Pharmaceutica NV, Belgium: Luc Bijnens, Willem Talloen, Hinrich W.H. Gohlmann
QSTAR http://qstar-consortium.org
N. J. Perualila NCS2014 2/18
OUTLINE
1 Background
   Mechanism of Action (MoA) of Compounds
2 Data Sources
   Target prediction Data
   Gene expression Data
3 Analysis Flow
4 Results
   Associating Protein Targets to compounds
   Associating Genes with compounds
   Gene-set Enrichment
   Using Pathways to understand MoA
   Biclustering of CMap Gene expression data
5 Discussion
MOA OF COMPOUNDS
Aim: To find subsets of compounds that share similar target prediction and gene expression profiles.
[Figure labels: Connectivity Map; In silico]
EARLY DRUG DISCOVERY
The development of every drug begins with the search for a target on which the drug can act.
Lead compounds must be able to bind well to the target protein, like a key into a lock.
If a drug binds to one protein, its drug target, it may also bind to another one (or many) (non-selective ligands).
Which drugs will bind to which protein?
[Figure: Compound - Protein Target]
image source: http://vds.cm.utexas.edu/
OVERVIEW: TARGET PREDICTION TOOL
Calculates the likelihood of binding for every protein target (see Koutsoukas, 2011).
COMPOUNDS, TARGETS, AND GENES
Ligand binding modifies the biological functions of the protein target; a series of target-related downstream genes are then influenced.
Drugs sharing common targets result in similar gene-expression profiles.
image source: http://cc.scu.edu.cn/G2S/eWebEditor/uploadfile/20120810155023582.jpg
DATA SOURCES
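1 The target prediction score matrix (binary), J targets x I compounds:

$$
T = \begin{pmatrix}
T_{11} & T_{12} & \cdots & T_{1I} \\
T_{21} & T_{22} & \cdots & T_{2I} \\
\vdots & \vdots & \ddots & \vdots \\
T_{J1} & T_{J2} & \cdots & T_{JI}
\end{pmatrix},
\qquad
T_{ji} = \begin{cases}
1 & \text{compound } i \text{ hit target } j, \\
0 & \text{otherwise.}
\end{cases}
$$

2 The gene expression matrix, G genes x I compounds:

$$
X = \begin{pmatrix}
X_{11} & X_{12} & \cdots & X_{1I} \\
X_{21} & X_{22} & \cdots & X_{2I} \\
\vdots & \vdots & \ddots & \vdots \\
X_{G1} & X_{G2} & \cdots & X_{GI}
\end{pmatrix},
\qquad
X_{gi} = \text{expression level of gene } g \text{ for compound } i.
$$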
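As a minimal sketch of how the binary matrix T might be assembled, the Python snippet below fills T_ji from a mapping of each compound to its set of predicted targets. The function name `build_target_matrix`, the input layout, and the compound and target names are hypothetical and used only for illustration; the slides do not say how the prediction output was stored.

```python
def build_target_matrix(predicted, targets, compounds):
    """Return T as a list of rows (targets) by columns (compounds).

    predicted: dict mapping a compound name to the set of targets it is
               predicted to hit (hypothetical input layout).
    T[j][i] is 1 if compound i hit target j, and 0 otherwise.
    """
    return [
        [1 if target in predicted.get(compound, set()) else 0
         for compound in compounds]
        for target in targets
    ]


# Toy example with made-up names (not taken from the CMap data).
compounds = ["cpd_a", "cpd_b"]
targets = ["target_1", "target_2", "target_3"]
predicted = {"cpd_a": {"target_1", "target_3"}, "cpd_b": {"target_3"}}
T = build_target_matrix(predicted, targets, compounds)
```

Storing targets as rows and compounds as columns matches the J x I layout used for T above.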
APPLICATION: CONNECTIVITY MAP
I = 35 compounds. MCF7 cell line, 6 hours after exposure, dose at 10 micromolar.
G ≈ 2400 genes after pre-processing.
J = 477 protein targets.
ANALYSIS FLOW
Step I: Target prediction-based clustering of compounds, giving a cluster of compounds.
Step II: For every target-driven compound cluster:
   What are the shared targets? Fisher's exact test: top K targets, then pathways.
   What genes are differentially expressed? LIMMA: top N differentially expressed genes, then pathways / MLP.
   Which biological pathways are affected? Overlap of the pathways from the two branches.
[Diagram: 2 x 2 table for Fisher's exact test. For target j, compounds are cross-classified by prediction (1 or 0) and by being in or out of the cluster.]
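The Fisher's exact test step can be sketched as follows. This is an illustrative, standard-library-only version of a one-sided test on the 2 x 2 table (target j predicted or not, compound in or out of the cluster); the function names and the choice of a one-sided alternative are assumptions, since the slides do not specify them.

```python
from math import comb


def fisher_enrichment_p(in_hit, in_miss, out_hit, out_miss):
    """One-sided Fisher's exact test p-value for over-representation.

    Returns P(X >= in_hit), where X is hypergeometric: the number of
    target hits among the cluster compounds when hits are assigned at random.
    """
    n_cluster = in_hit + in_miss
    n_hit = in_hit + out_hit
    n_total = n_cluster + out_hit + out_miss
    upper = min(n_cluster, n_hit)
    tail = sum(
        comb(n_hit, k) * comb(n_total - n_hit, n_cluster - k)
        for k in range(in_hit, upper + 1)
    )
    return tail / comb(n_total, n_cluster)


def top_targets(T, cluster, k):
    """Rank targets (rows of binary T) by enrichment in `cluster`.

    cluster: set of column indices of the compounds in the cluster.
    Returns the k smallest (p-value, target index) pairs.
    """
    n_compounds = len(T[0])
    results = []
    for j, row in enumerate(T):
        in_hit = sum(row[i] for i in cluster)
        out_hit = sum(row) - in_hit
        in_miss = len(cluster) - in_hit
        out_miss = n_compounds - len(cluster) - out_hit
        p = fisher_enrichment_p(in_hit, in_miss, out_hit, out_miss)
        results.append((p, j))
    return sorted(results)[:k]


# Toy matrix: 3 targets x 6 compounds, cluster = first three compounds.
T_toy = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 1, 1],
]
ranked = top_targets(T_toy, {0, 1, 2}, 2)
```

With only 35 compounds the tables are small, which is the situation where an exact test is preferred over a chi-square approximation.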
TARGET PREDICTION-BASED CLUSTERING
Similarity matrix based on Tanimoto scores.
[Figure: heatmap of the Tanimoto similarity matrix between compounds (color key: value from 0 to 0.6). Compounds in axis order: thioridazine, chlorpromazine, prochlorperazine, clozapine, trifluoperazine, fluphenazine, haloperidol, verapamil, dexverapamil, felodipine, nifedipine, nitrendipine, arachidonyltrifluoromethane, 15-delta prostaglandin J2, arachidonic acid, celecoxib, W-13, metformin, tetraethylenepentamine, phenformin, phenyl biguanide, rofecoxib, LM-1685, SC-58125, diclofenac, 4,5-dianilinophthalimide, flufenamic acid, N-phenylanthranilic acid, probucol, butein, benserazide, LY-294002, tioguanine, fasudil, imatinib.]
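A Tanimoto similarity matrix between compounds can be computed from the binary target matrix roughly as below. This is a sketch under stated assumptions: the helper names are hypothetical, and returning 0.0 when neither compound has any predicted target is a convention chosen here, not something stated in the slides.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two equal-length binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention (an assumption): two empty profiles get similarity 0.0.
    return both / either if either else 0.0


def similarity_matrix(T):
    """Compound-by-compound Tanimoto matrix from T (targets x compounds)."""
    n = len(T[0])
    columns = [[row[i] for row in T] for i in range(n)]
    return [[tanimoto(columns[i], columns[j]) for j in range(n)]
            for i in range(n)]


# Toy matrix: 3 targets x 3 compounds.
T_small = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
S = similarity_matrix(T_small)
```

The resulting matrix (or 1 minus it, as a distance) could then be passed to a clustering routine; the slides do not name the clustering algorithm used.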
ASSOCIATING TARGETS TO COMPOUNDS
Identify the top predicted protein targets of compounds in cluster 1.
[Figure: heatmap of predicted targets (rows) by compounds (columns, the 35 compounds). Target labels as printed: X5.hydroxyr_6, D.1B._dopator, Muscarinic_M3, Muscarinic_M1, Cytochrome2D6, NADPH_oxide_1, Histamine_tor, D.2._dopamtor.]
ASSOCIATING GENES WITH COMPOUNDS
Identify the most differentially expressed genes for compounds in cluster 1.
[Figure: volcano plot of -log p-value against log fold change (x-axis from about -0.4 to 0.2). Labeled genes: IDI1, SQLE, MSMO1, INSIG1, MNT, SRSF7, HMGCS1, CCR1, CCNG2, KLHL24, PPIF, SLC38A2, NPC2, SGCE, PNO1, BARD1, LPIN1, HMGCR, TGS1, LDLR.]
[Figure: profile plot across the 35 compounds (y-axis labeled "log2 concentration", about -0.2 to 0.8) for genes IDI1, INSIG1, MSMO1, LPIN1, SQLE, HMGCS1, NPC2, BHLHE40.]
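To illustrate the idea of ranking genes by differential expression between cluster compounds and the rest, here is a deliberately simplified sketch. It uses a plain Welch t statistic per gene as a stand-in; it is not LIMMA, which fits linear models with empirical Bayes moderated statistics. The function names and toy gene names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t statistic for two samples (each needs at least 2 values
    and the two sample variances must not both be zero)."""
    se = sqrt(variance(group_a) / len(group_a)
              + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def top_genes(X, gene_names, cluster, n):
    """Rank genes (rows of X) by |t| for cluster vs. non-cluster compounds.

    cluster: set of column indices of the compounds in the cluster.
    """
    ranked = []
    for name, row in zip(gene_names, X):
        inside = [v for i, v in enumerate(row) if i in cluster]
        outside = [v for i, v in enumerate(row) if i not in cluster]
        ranked.append((abs(welch_t(inside, outside)), name))
    ranked.sort(reverse=True)
    return [name for _, name in ranked[:n]]


# Toy data: 2 genes x 6 compounds, cluster = first three compounds.
X_toy = [
    [2.0, 2.2, 1.8, 0.1, -0.1, 0.0],    # clearly shifted in the cluster
    [0.1, -0.2, 0.0, 0.2, -0.1, 0.05],  # no clear shift
]
best = top_genes(X_toy, ["gene_a", "gene_b"], {0, 1, 2}, 1)
```

With only a handful of compounds per cluster, per-gene variance estimates are noisy, which is one reason a moderated approach such as LIMMA is commonly used in practice.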
BIOLOGICAL PATHWAYS: CLUSTER 1
Compounds: clozapine, thioridazine, chlorpromazine, trifluoperazine, prochlorperazine, fluphenazine
Pathway: Steroid metabolic process
Target: Cytochrome P450 2D6
Genes: INSIG1, LDLR

GO pathways containing the top gene sets with MLP analysis:
GO:0006695 cholesterol biosynthetic
GO:0016126 sterol biosynthetic
GO:0008203 cholesterol metabolic
GO:0016125 sterol metabolic
GO:0006694 steroid biosynthetic
GO:0006066 alcohol metabolic
GO:0008202 steroid metabolic
GO:0008610 lipid biosynthetic
GO:0046165 alcohol biosynthetic

Top genes contributing to cholesterol biosynthetic process:
DHCR24: 24-dehydrocholesterol reductase
G6PD: glucose-6-phosphate dehydrogenase
HMGCR: 3-hydroxy-3-methylglutaryl-CoA reductase
HMGCS1: 3-hydroxy-3-methylglutaryl-CoA synthase
IDI1: isopentenyl-diphosphate delta isomerase
INSIG1: insulin induced gene 1
INSIG2: insulin induced gene 2
PEX2: peroxisomal biogenesis factor 2
MSMO1: methylsterol monooxygenase 1
SOD1: superoxide dismutase 1, soluble
SQLE: squalene epoxidase
CNBP: CCHC-type zinc finger, nucleic acid binding
[Figure: bar plot titled "Significance of tested genes involved in gene set GO:0006695"; x-axis: Significance, 0 to 5.]
[Diagram: genes INSIG1 and LDLR and target CYP450 2D6 linked through the steroid metabolic process for clozapine, thioridazine, chlorpromazine, trifluoperazine, prochlorperazine, fluphenazine.]
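The gene-set scoring idea behind an MLP-style analysis can be sketched as follows: score a gene set by the mean of -log10(p) over its genes, then compare that score with scores of random gene sets of the same size. This is a simplified illustration under stated assumptions, not the procedure implemented in the MLP package; the function names, the permutation scheme, and the toy p-values are all hypothetical.

```python
import random
from math import log10


def mlp_statistic(p_values, gene_set):
    """Mean of -log10(p) over the genes in the set."""
    return sum(-log10(p_values[g]) for g in gene_set) / len(gene_set)


def mlp_permutation_p(p_values, gene_set, n_perm=2000, seed=0):
    """Permutation p-value: fraction of random same-size gene sets whose
    statistic is at least as large as the observed one (with +1 correction)."""
    rng = random.Random(seed)
    genes = list(p_values)
    observed = mlp_statistic(p_values, gene_set)
    size = len(gene_set)
    hits = sum(
        mlp_statistic(p_values, rng.sample(genes, size)) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


# Toy p-values: 20 made-up genes, three of them strongly significant.
p_toy = {f"g{i}": 0.5 for i in range(20)}
p_toy.update({"g0": 1e-4, "g1": 1e-4, "g2": 1e-4})
stat = mlp_statistic(p_toy, ["g0", "g1", "g2"])
perm_p = mlp_permutation_p(p_toy, ["g0", "g1", "g2"])
```

Averaging over all genes in a set lets many moderately significant genes contribute, rather than relying on a hard significance cutoff per gene.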
GENE EXPRESSION DATA ANALYSIS

$$
X = \begin{pmatrix}
X_{11} & X_{12} & \cdots & X_{1I} \\
X_{21} & X_{22} & \cdots & X_{2I} \\
\vdots & \vdots & \ddots & \vdots \\
X_{G1} & X_{G2} & \cdots & X_{GI}
\end{pmatrix}
$$

[Figure: Heatmap of the Expression Profiles, genes (rows) by compounds (columns), shown in two row and column orderings.]
Biclustering of gene expression data provides a simultaneous local search for a subset of genes with similar expression profiles across a subset of compounds.
BICLUSTERING OF GENE EXPRESSION DATA
Target prediction-based clustering of compounds + identification of differentially expressed genes for a compound cluster of interest.
⇒ Gives a subset of genes exhibiting consistent patterns over a subset of compounds.
⇒ A bicluster
Biclustering on gene expression data using FABIA identifies a similar cluster of compounds and subset of genes.
[Figure: profile plots across the 35 compounds (y-axis labeled "log2 concentration", about -0.2 to 0.8) for genes IDI1, INSIG1, MSMO1, LPIN1, SQLE, HMGCS1, NPC2, BHLHE40. The final panel is titled "FABIA Bicluster 1", with legend "Genes: HC, Fabia".]
DISCUSSION
The similarity of the biclustering results with the integrated approach shows that accounting for another source of information in the analysis of gene expression data gives a more refined search of ‘biclusters’ containing a subset of ‘mechanistically’ related compounds regulating a subset of functionally related genes.
Combining two sources of data provides a better understanding of the mechanism of action of a compound cluster.
The approach is not limited to gene expression and target prediction data.