Rationalizing CMap Gene Expression Readouts via Target Prediction

Nolen Joy Perualila
Non-Clinical Statistics Conference 2014
9 October 2014

RESEARCH GROUP

Hasselt University, Belgium: Nolen Joy Perualila, Ziv Shkedy

Durham University, UK: Adetayo Kasim

Cambridge University, UK: Aakash Chavan Ravindranath, Andreas Bender

Janssen Pharmaceutica NV, Belgium: Luc Bijnens, Willem Talloen, Hinrich W.H. Gohlmann

QSTAR http://qstar-consortium.org

OUTLINE

1 Background
   Mechanism of Action (MoA) of Compounds

2 Data Sources
   Target prediction Data
   Gene expression Data

3 Analysis Flow

4 Results
   Associating Protein Targets to compounds
   Associating Genes with compounds
   Gene-set Enrichment
   Using Pathways to understand MoA
   Biclustering of CMap Gene expression data

5 Discussion

MOA OF COMPOUNDS

Aim: To find subsets of compounds that share similar target prediction and gene expression profiles.

Connectivity Map In silico

EARLY DRUG DISCOVERY

The development of every drug begins with the search for a target on which the drug can act.

Lead compounds must be able to bind well to the target protein, like a key into a lock.

A drug that binds to one protein, its drug target, may also bind to one or many other proteins (non-selective ligands).

[Figure: a compound binding to its protein target.]

Which drugs will bind to which protein?

image source: http://vds.cm.utexas.edu/

OVERVIEW: TARGET PREDICTION TOOL

Calculates the likelihood of binding for every protein target (see Koutsoukas, 2011).

COMPOUNDS, TARGETS, AND GENES

Ligand binding modifies the biological function of the protein target; a series of target-related downstream genes is then influenced.

Drugs sharing common targets result in similar gene expression profiles.

image source: http://cc.scu.edu.cn/G2S/eWebEditor/uploadfile/20120810155023582.jpg

DATA SOURCES

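1 The target prediction score matrix (binary), J targets x I compounds:

$$
T =
\begin{pmatrix}
T_{11} & T_{12} & \cdots & T_{1I} \\
T_{21} & T_{22} & \cdots & T_{2I} \\
\vdots & \vdots &        & \vdots \\
T_{J1} & T_{J2} & \cdots & T_{JI}
\end{pmatrix},
\qquad
T_{ji} =
\begin{cases}
1 & \text{if compound } i \text{ hits target } j, \\
0 & \text{otherwise.}
\end{cases}
$$

2 The gene expression matrix, G genes x I compounds:

$$
X =
\begin{pmatrix}
X_{11} & X_{12} & \cdots & X_{1I} \\
X_{21} & X_{22} & \cdots & X_{2I} \\
\vdots & \vdots &        & \vdots \\
X_{G1} & X_{G2} & \cdots & X_{GI}
\end{pmatrix},
\qquad
X_{gi} = \text{expression level of gene } g \text{ for compound } i.
$$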

APPLICATION: CONNECTIVITY MAP

I = 35 compounds. MCF7 cell line, 6 hours after exposure, dose of 10 µM.

G ≈ 2400 genes after pre-processing.

J = 477 protein targets.

ANALYSIS FLOW

Step I: Target prediction-based clustering of compounds, which yields a cluster of compounds.

Step II: For every target-driven compound cluster:

What are the shared targets? Fisher’s exact test: top K targets, then pathways.

What genes are differentially expressed? LIMMA: top N differentially expressed genes, then pathways (MLP).

Which biological pathways are affected? Overlap between the target-related pathways and the gene-related pathways (MLP).

TARGET PREDICTION-BASED CLUSTERING

Similarity matrix based on Tanimoto scores.

[Figure: heatmap of the Tanimoto similarity matrix; both axes: compounds; with color key.]
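The clustering step can be illustrated with a minimal sketch. It assumes each compound is represented by its binary vector of predicted target hits and that compounds are grouped by thresholding the Tanimoto similarity (a single-linkage cut). The compound names, the toy data, and the 0.5 threshold are hypothetical, and the slides do not state which clustering algorithm was actually used.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Hypothetical toy data: one binary target-hit profile per compound
profiles = {
    "cmpd_A": [1, 1, 0, 0, 1],
    "cmpd_B": [1, 1, 0, 0, 0],
    "cmpd_C": [0, 0, 1, 1, 0],
    "cmpd_D": [0, 0, 1, 1, 1],
}

names = list(profiles)

# Full compound-by-compound similarity matrix, stored as a dict
similarity = {
    (p, q): tanimoto(profiles[p], profiles[q]) for p in names for q in names
}


def single_linkage_clusters(names, similarity, threshold):
    """Merge compounds linked by a chain of similarities >= threshold."""
    cluster_of = {n: {n} for n in names}
    for p, q in combinations(names, 2):
        if similarity[(p, q)] >= threshold and cluster_of[p] is not cluster_of[q]:
            merged = cluster_of[p] | cluster_of[q]
            for member in merged:
                cluster_of[member] = merged
    unique = {id(c): c for c in cluster_of.values()}
    return sorted(sorted(c) for c in unique.values())


clusters = single_linkage_clusters(names, similarity, threshold=0.5)
print(clusters)
```

A hierarchical clustering with a different linkage could be substituted for the thresholding step without changing the similarity computation.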

ASSOCIATING TARGETS TO COMPOUNDS

Identify the top predicted protein targets of compounds in cluster 1.

[Figure: heatmap of predicted protein targets (rows) against compounds (columns) for cluster 1. Legible target labels include Muscarinic_M1, Muscarinic_M3, and Cytochrome2D6; the remaining labels are truncated.]
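The Fisher's exact test step of the analysis flow can be sketched as follows. This is an illustration under assumptions, not the authors' code: it uses a one-sided test (enrichment of hits inside the cluster), and the target names, cluster membership, and K = 2 are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a: in cluster, hits target        b: in cluster, no hit
    c: outside cluster, hits target   d: outside cluster, no hit
    Returns P(at least `a` hits in the cluster) under the hypergeometric null.
    """
    n_cluster = a + b
    n_hits = a + c
    n_total = a + b + c + d
    upper = min(n_cluster, n_hits)
    numerator = sum(
        comb(n_hits, k) * comb(n_total - n_hits, n_cluster - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(n_total, n_cluster)


def cluster_p_value(hits, in_cluster):
    """Build the 2x2 table for one target and test it."""
    a = sum(1 for h, m in zip(hits, in_cluster) if h and m)
    b = sum(1 for h, m in zip(hits, in_cluster) if not h and m)
    c = sum(1 for h, m in zip(hits, in_cluster) if h and not m)
    d = sum(1 for h, m in zip(hits, in_cluster) if not h and not m)
    return fisher_exact_greater(a, b, c, d)


# Hypothetical toy data: 8 compounds, the first 3 form the cluster of interest
in_cluster = [True, True, True, False, False, False, False, False]
target_hits = {
    "target_1": [1, 1, 1, 0, 0, 0, 0, 0],
    "target_2": [1, 0, 1, 1, 0, 1, 0, 1],
    "target_3": [0, 0, 0, 1, 1, 0, 0, 0],
}

p_values = {t: cluster_p_value(h, in_cluster) for t, h in target_hits.items()}

# Keep the K targets with the smallest p-values (K = 2 here)
top_k = sorted(p_values, key=p_values.get)[:2]
print(top_k)
```

With many targets (477 in the application), a multiplicity adjustment of the p-values would normally be considered before interpreting the ranking.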

ASSOCIATING GENES WITH COMPOUNDS

Identify the most differentially expressed genes for compounds in cluster 1.

[Figure: plot of log fold change for cluster 1. Labeled genes: IDI1, SQLE, MSMO1, INSIG1, SRSF7, HMGCS1, CCR1, CCNG2, KLHL24, PPIF, SLC38A2, NPC2, SGCE, PNO1, BARD1, LPIN1, HMGCR, TGS1, LDLR.]

[Figure: expression profiles across compounds for INSIG1, HMGCS1, and BHLHE40.]
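The gene-ranking idea can be sketched with a plain Welch t-statistic comparing compounds inside the cluster with all other compounds. This is a simplified stand-in: LIMMA fits linear models with moderated (empirical Bayes) t-statistics, which this sketch does not reproduce. The gene names, values, and N = 2 are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group1, group2):
    """Return (Welch t-statistic, difference in means) for two samples."""
    diff = mean(group1) - mean(group2)
    se = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return diff / se, diff


# Hypothetical log-scale expression values; columns follow `in_cluster`
in_cluster = [True, True, True, False, False, False, False]
expression = {
    "gene_up":   [1.9, 2.1, 2.0, 1.0, 1.1, 0.9, 1.0],
    "gene_flat": [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.0],
    "gene_down": [0.4, 0.6, 0.5, 1.0, 1.2, 0.9, 1.1],
}

results = {}
for gene, values in expression.items():
    inside = [v for v, m in zip(values, in_cluster) if m]
    outside = [v for v, m in zip(values, in_cluster) if not m]
    results[gene] = welch_t(inside, outside)

# Rank genes by absolute t-statistic and keep the top N (N = 2 here)
top_n = sorted(results, key=lambda g: abs(results[g][0]), reverse=True)[:2]
print(top_n)
```

On log-scale data the difference in means plays the role of the log fold change shown on the slide.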

BIOLOGICAL PATHWAYS: CLUSTER 1

Compounds: clozapine, thioridazine, chlorpromazine, trifluoperazine, prochlorperazine, fluphenazine
Pathway: Steroid metabolic process
Target: Cytochrome P450 2D6
Genes: INSIG1, LDLR

GO pathways containing the top gene sets with MLP analysis:

GO:0006695 cholesterol biosynthetic
GO:0016126 sterol biosynthetic
GO:0008203 cholesterol metabolic
GO:0016125 sterol metabolic
GO:0006694 steroid biosynthetic
GO:0006066 alcohol metabolic
GO:0008202 steroid metabolic
GO:0008610 lipid biosynthetic
GO:0046165 alcohol biosynthetic

Top genes contributing to the cholesterol biosynthetic process (x-axis: significance):

DHCR24: 24-dehydrocholesterol reductase
G6PD: glucose-6-phosphate dehydrogenase
HMGCR: 3-hydroxy-3-methylglutaryl-CoA reductase
HMGCS1: 3-hydroxy-3-methylglutaryl-CoA synthase
IDI1: isopentenyl-diphosphate delta isomerase
INSIG1: insulin induced gene 1
PEX2: peroxisomal biogenesis factor 2
MSMO1: methylsterol monooxygenase 1
SOD1: superoxide dismutase 1, soluble
SQLE: squalene epoxidase
CNBP: CCHC-type zinc finger, nucleic acid bind
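The gene-set scoring behind an MLP-style analysis can be sketched as the mean of minus log10 p-values over the genes in each set. This reflects the general idea of the MLP ("mean log p") statistic as commonly described; the permutation step used to judge significance is omitted, and the gene names, set names, and p-values are hypothetical.

```python
from math import log10


def mlp_statistic(gene_set, p_values):
    """Mean of -log10(p) over the genes of a set that were measured."""
    measured = [g for g in gene_set if g in p_values]
    if not measured:
        return 0.0
    return sum(-log10(p_values[g]) for g in measured) / len(measured)


# Hypothetical per-gene p-values from the differential expression step
p_values = {"gene_a": 0.001, "gene_b": 0.01, "gene_c": 0.5, "gene_d": 1.0}

# Hypothetical gene sets; gene_x was not measured and is ignored
gene_sets = {
    "set_1": ["gene_a", "gene_b"],
    "set_2": ["gene_c", "gene_d", "gene_x"],
}

scores = {name: mlp_statistic(genes, p_values) for name, genes in gene_sets.items()}

# Gene sets with the largest mean -log10(p) are the most affected
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Pathways that rank highly here can then be compared with the pathways of the top predicted targets to find the overlap.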

GENE EXPRESSION DATA ANALYSIS

The gene expression matrix $X = (X_{gi})$, G genes x I compounds.

[Figure: heatmap of the expression profiles; rows: genes; columns: compounds.]

Biclustering of gene expression data performs a simultaneous local search for a subset of genes that show similar expression profiles across a subset of compounds.

BICLUSTERING OF GENE EXPRESSION DATA

Target prediction-based clustering of compounds, followed by identification of differentially expressed genes for a compound cluster of interest, gives a subset of genes exhibiting consistent patterns over a subset of compounds.

⇒ A bicluster

Biclustering of the gene expression data using FABIA identifies a similar cluster of compounds and a similar subset of genes.
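For context, FABIA is commonly described as a sparse multiplicative factor model. A sketch in the notation of the gene expression matrix X (G genes x I compounds) is given here; the symbols K (number of biclusters), \lambda_k, z_k, and \Upsilon are introduced for illustration and do not appear on the slides.

```latex
X = \sum_{k=1}^{K} \lambda_k z_k^{\top} + \Upsilon,
\qquad \lambda_k \in \mathbb{R}^{G}, \quad z_k \in \mathbb{R}^{I}
```

Under this reading, \lambda_k holds sparse gene loadings, z_k holds sparse factor values across compounds, and \Upsilon is additive noise. Bicluster k then consists of the genes with large |\lambda_{gk}| together with the compounds with large |z_{ki}|.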

[Figure: expression profiles of INSIG1, HMGCS1, and BHLHE40 across compounds for the target-driven cluster, shown alongside FABIA Bicluster 1; x-axis: compounds.]

DISCUSSION

The biclustering results are similar to those of the integrated approach. This shows that accounting for another source of information in the analysis of gene expression data gives a more refined search for ‘biclusters’: subsets of ‘mechanistically’ related compounds that regulate subsets of functionally related genes.

Combining two sources of data provides a better understanding of the mechanism of action of a compound cluster.

The approach is not limited to gene expression and target prediction data.

THANK YOU!
