Bayesian Multi-Topic Microarray Analysis with Hyperparameter Reestimation

Post on 27-Jan-2015


Tomonari MASADA (正田備也), Nagasaki University (長崎大学)

masada@cis.nagasaki-u.ac.jp


Overview

Problem

Latent Process Decomposition (LPD)

Hyperparameter reestimation (MVB+)

Experiment

Results

Conclusions


Problem

Explain differences among cells of different nature (e.g. cancer vs. normal cells) by analyzing differences in gene expression obtained from DNA microarray experiments.


Gene expression

(figure from http://bix.ucsd.edu/bioalgorithms/slides.php)

DNA microarray experiment

We can find out which genes are used (expressed) by different types of cells.

Latent Process Decomposition


latent Dirichlet allocation (LDA) [Blei et al. 01] vs. latent process decomposition (LPD) [Rogers et al. 05]:

text mining ↔ microarray analysis

document ↔ sample

word ↔ gene

word frequency ↔ gene expression level

latent topic ↔ latent process

LPD as a multi-topic model

row = gene, column = sample, color = process


LPD as a generative model

For each sample d, draw a multinomial θd from a Dirichlet prior Dir(α).

θd: mixing proportions of processes for sample d

For each gene g in each sample d:

Draw a process k from Mult(θd)

Draw the expression level xdg from a Gaussian N(μgk, λgk), with mean μgk and precision λgk
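The generative process above can be sketched directly in NumPy. This is a minimal toy sampler, not the inference code; all sizes and the prior draws for μgk and λgk are illustrative, and λgk is treated as a precision (standard deviation 1/√λgk), which is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
D, G, K = 5, 8, 3      # toy numbers of samples, genes, and processes
alpha = 1.0            # symmetric Dirichlet hyperparameter

# Per-gene, per-process Gaussian parameters (illustrative draws).
mu = rng.normal(0.0, 1.0, size=(G, K))    # means
lam = rng.gamma(2.0, 1.0, size=(G, K))    # precisions (assumption)

# For each sample d, draw mixing proportions theta_d ~ Dir(alpha).
theta = rng.dirichlet(np.full(K, alpha), size=D)

# For each gene g in each sample d: draw a process z_dg, then an expression level x_dg.
z = np.array([[rng.choice(K, p=theta[d]) for _ in range(G)] for d in range(D)])
x = rng.normal(mu[np.arange(G), z], 1.0 / np.sqrt(lam[np.arange(G), z]))
```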


Inference by VB [Rogers et al. 05]

Variational Bayesian inference

VB is used when exact EM is intractable.

Instead of the log likelihood, a variational lower bound is maximized.


Variational lower bound

The complete-data likelihood of LPD is

$$p(\mathbf{x}, \mathbf{z}, \theta, \mu, \lambda \mid \alpha, a_0, b_0, m_0, \beta_0)
= \prod_d \frac{\Gamma(K\alpha)}{\Gamma(\alpha)^K} \prod_k \theta_{dk}^{\alpha + n_{dk} - 1}
\times \prod_d \prod_g \sqrt{\frac{\lambda_{g z_{dg}}}{2\pi}}
\exp\left( -\frac{\lambda_{g z_{dg}} (x_{dg} - \mu_{g z_{dg}})^2}{2} \right)
\times \prod_g \prod_k \frac{b_0^{a_0}}{\Gamma(a_0)} \lambda_{gk}^{a_0 - 1} e^{-b_0 \lambda_{gk}}
\times \prod_g \prod_k \sqrt{\frac{\beta_0 \lambda_{gk}}{2\pi}}
\exp\left( -\frac{\beta_0 \lambda_{gk} (\mu_{gk} - m_0)^2}{2} \right)$$

where $n_{dk}$ is the number of genes assigned to process $k$ in sample $d$.

With a variational posterior $q(\mathbf{z}, \theta, \mu, \lambda)$, the log evidence is bounded from below:

$$\log p(\mathbf{x} \mid \alpha, a_0, b_0, m_0, \beta_0)
\ge \sum_{\mathbf{z}} \iiint q(\mathbf{z}, \theta, \mu, \lambda)
\log \frac{p(\mathbf{x}, \mathbf{z}, \theta, \mu, \lambda \mid \alpha, a_0, b_0, m_0, \beta_0)}{q(\mathbf{z}, \theta, \mu, \lambda)}
\, d\theta \, d\mu \, d\lambda$$

Inference by MVB [Ying et al. 08]

Marginalized variational Bayesian inference

Marginalizes out the multinomial parameters θ

Introduces less approximation than VB

cf. collapsed variational Bayesian inference for LDA [Teh et al. 06]

Marginalization in MVB

Integrating the multinomial parameters $\theta$ out of the complete-data likelihood gives

$$p(\mathbf{x}, \mathbf{z}, \mu, \lambda \mid \alpha, a_0, b_0, m_0, \beta_0)
= \prod_d \frac{\Gamma(K\alpha)}{\Gamma(K\alpha + G)} \prod_k \frac{\Gamma(\alpha + n_{dk})}{\Gamma(\alpha)}
\times \prod_d \prod_g \sqrt{\frac{\lambda_{g z_{dg}}}{2\pi}}
\exp\left( -\frac{\lambda_{g z_{dg}} (x_{dg} - \mu_{g z_{dg}})^2}{2} \right)
\times \prod_g \prod_k \frac{b_0^{a_0}}{\Gamma(a_0)} \lambda_{gk}^{a_0 - 1} e^{-b_0 \lambda_{gk}}
\times \prod_g \prod_k \sqrt{\frac{\beta_0 \lambda_{gk}}{2\pi}}
\exp\left( -\frac{\beta_0 \lambda_{gk} (\mu_{gk} - m_0)^2}{2} \right)$$

since $\sum_k n_{dk} = G$ for every sample. The lower bound is now taken over a variational posterior $q(\mathbf{z}, \mu, \lambda)$ only:

$$\log p(\mathbf{x} \mid \alpha, a_0, b_0, m_0, \beta_0)
\ge \sum_{\mathbf{z}} \iint q(\mathbf{z}, \mu, \lambda)
\log \frac{p(\mathbf{x}, \mathbf{z}, \mu, \lambda \mid \alpha, a_0, b_0, m_0, \beta_0)}{q(\mathbf{z}, \mu, \lambda)}
\, d\mu \, d\lambda$$
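The Dirichlet integral behind this marginalization, $\int \mathrm{Dir}(\theta \mid \alpha) \prod_k \theta_k^{n_{dk}} \, d\theta = \frac{\Gamma(K\alpha)}{\Gamma(K\alpha + n_d)} \prod_k \frac{\Gamma(\alpha + n_{dk})}{\Gamma(\alpha)}$, can be checked numerically. A small Monte Carlo sketch with toy counts (all names illustrative):

```python
import math
import numpy as np

rng = np.random.default_rng(1)
K, alpha = 4, 0.7
n = np.array([3, 0, 2, 5])   # toy counts n_dk for one sample
N = int(n.sum())

# Closed form: log of the integral of prod_k theta_k^{n_k} under a Dir(alpha) prior.
log_exact = (math.lgamma(K * alpha) - math.lgamma(K * alpha + N)
             + sum(math.lgamma(alpha + nk) - math.lgamma(alpha) for nk in n))

# Monte Carlo estimate of E[prod_k theta_k^{n_k}] under the same prior.
theta = rng.dirichlet(np.full(K, alpha), size=200000)
log_mc = math.log(float(np.mean(np.prod(theta ** n, axis=1))))
# log_mc should agree with log_exact up to Monte Carlo noise.
```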

Update formulas in MVB

Writing $\gamma_{dgk} = q(z_{dg} = k)$ and $n_{dk} = \sum_g \gamma_{dgk}$, the responsibilities are updated as

$$\gamma_{dgk} \propto \left( \alpha + n_{dk}^{\neg g} \right)
\exp\left( \frac{\psi(a_{gk}) - \log b_{gk}}{2}
- \frac{a_{gk}}{2 b_{gk}} \left( (x_{dg} - m_{gk})^2 + \frac{1}{\beta_{gk}} \right) \right)$$

where $n_{dk}^{\neg g}$ excludes gene $g$, and the Normal-Gamma posterior statistics are

$$\beta_{gk} = \beta_0 + \sum_d \gamma_{dgk}, \qquad
m_{gk} = \frac{\beta_0 m_0 + \sum_d \gamma_{dgk} x_{dg}}{\beta_{gk}},$$

$$a_{gk} = a_0 + \frac{1}{2} \sum_d \gamma_{dgk}, \qquad
b_{gk} = b_0 + \frac{1}{2} \left( \sum_d \gamma_{dgk} x_{dg}^2 + \beta_0 m_0^2 - \beta_{gk} m_{gk}^2 \right)$$
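The per-(gene, process) Normal-Gamma statistics that MVB maintains can be computed from the responsibilities in a few vectorized lines. A sketch assuming the standard conjugate updates, with random toy responsibilities standing in for real inference output (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
D, G, K = 6, 4, 3
a0, b0, m0, beta0 = 1.0, 1.0, 0.0, 1.0           # toy hyperparameters
x = rng.normal(size=(D, G))                       # expression levels
gamma = rng.dirichlet(np.ones(K), size=(D, G))    # responsibilities q(z_dg = k)

n = gamma.sum(axis=0)                             # (G, K) expected counts
s1 = np.einsum('dgk,dg->gk', gamma, x)            # weighted sums of x
s2 = np.einsum('dgk,dg->gk', gamma, x ** 2)       # weighted sums of x^2

# Normal-Gamma posterior statistics for each (gene, process) pair.
beta = beta0 + n
m = (beta0 * m0 + s1) / beta
a = a0 + n / 2.0
b = b0 + 0.5 * (s2 + beta0 * m0 ** 2 - beta * m ** 2)
```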

Our proposal: MVB+

MVB with hyperparameter reestimation

Empirical Bayes method

○ Estimate hyperparameters by maximizing the variational lower bound

Hand-tuned hyperparameter values often result in poor inference quality.

Update formulas in MVB+

Maximizing the variational lower bound with respect to the hyperparameters gives

$$m_0 = \frac{1}{GK} \sum_g \sum_k m_{gk}$$

$$\frac{a_0}{b_0} = \frac{1}{GK} \sum_g \sum_k \frac{a_{gk}}{b_{gk}}, \qquad
\psi(a_0) - \log b_0 = \frac{1}{GK} \sum_g \sum_k \left( \psi(a_{gk}) - \log b_{gk} \right)$$

Inversion of the digamma function is required to solve the last pair of equations for $a_0$.
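Inverting $\psi$ has no closed form, so a Newton iteration is typically used. A minimal self-contained sketch; the series are the standard asymptotic expansions of $\psi$ and $\psi'$, and the initializer follows Minka's well-known heuristic (function names are illustrative):

```python
import math

def digamma(x: float) -> float:
    # Recurrence pushes x above 6, then the asymptotic series is applied.
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    return (r + math.log(x) - 0.5 * inv
            - inv * inv * (1.0 / 12 - inv * inv * (1.0 / 120 - inv * inv / 252)))

def trigamma(x: float) -> float:
    # Same recurrence trick for the derivative psi'(x).
    r = 0.0
    while x < 6.0:
        r += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    return (r + inv + 0.5 * inv * inv
            + inv ** 3 * (1.0 / 6 - inv * inv * (1.0 / 30 - inv * inv / 42)))

def inv_digamma(y: float, iters: int = 20) -> float:
    # Newton's method for psi(x) = y, with Minka's initializer.
    x = math.exp(y) + 0.5 if y >= -2.22 else -1.0 / (y - digamma(1.0))
    for _ in range(iters):
        x -= (digamma(x) - y) / trigamma(x)
    return x
```

With this, the pair of moment conditions can be solved by setting $a_0 = \psi^{-1}\!\big(\log b_0 + \frac{1}{GK}\sum_{g,k}(\psi(a_{gk}) - \log b_{gk})\big)$ and alternating with the update for $b_0$.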

Hyperparameter reestimation

A notable trend in Bayesian modeling? [Asuncion et al. UAI'09]

○ Reestimate the hyperparameters of LDA

○ Overturns conventional wisdom (in perplexity):

before: "VB < CVB < CGS"

after: "VB ≈ CVB ≈ CGS"

[Masada et al. CIKM'09 (poster, to appear)]

Experiments

Datasets available on the Web

LK: Leukemia ( 白血病 , 백혈병 )

○ http://www.broadinstitute.org/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=63

D1: "Five types of breast cancer"

D2: "Three types of bladder cancer"

D3: "Healthy tissues"

○ http://www.ihes.fr/~zinovyev/princmanif2006/


Data specifications

Dataset name (abbreviation)          # of samples   # of genes

Leukemia (LK)                        72             12582

Five types of breast cancer (D1)     286            17816

Three types of bladder cancer (D2)   40             3036

Healthy tissues (D3)                 103            10383

Results

1. Can we achieve inference of better quality?

2. Can we achieve better sample clustering?

3. Are there any qualitative differences

between MVB and MVB+?


[Plots: variational lower bound vs. # of iterations for LK, D1, D2, and D3]

[Plots: variational lower bound (after convergence) vs. # of processes for LK, D1, D2, and D3]

Sample clustering evaluation

dataset   method   precision     recall        F-score

LK        MVB+     0.934±0.007   0.931±0.010   0.932±0.009

LK        MVB      0.930±0.000   0.924±0.000   0.927±0.000

D2        MVB+     0.837±0.038   0.822±0.032   0.829±0.033

D2        MVB      0.779±0.084   0.751±0.069   0.763±0.071

(averaged over 100 trials)
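The slides do not state which clustering precision/recall definition is used; a common choice for evaluating sample clusterings against known classes is the pairwise definition, sketched here under that assumption (function name illustrative):

```python
from itertools import combinations

def pairwise_prf(pred, true):
    # Counts pairs of samples placed together in the predicted vs. true clusterings.
    tp = fp = fn = 0
    for i, j in combinations(range(len(pred)), 2):
        same_pred, same_true = pred[i] == pred[j], true[i] == true[j]
        tp += same_pred and same_true
        fp += same_pred and not same_true
        fn += same_true and not same_pred
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For example, `pairwise_prf([0, 0, 0, 1], [0, 0, 1, 1])` gives precision 1/3, recall 1/2, and F-score 0.4.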

Qualitative difference (LK)

row = gene, column = sample (left: MVB+, right: MVB)

MVB+ can preserve the diversity of genes.

Conclusions

Formulas for hyperparameter reestimation

Improvement in inference quality

Larger variational lower bounds

Better sample clustering

Gene diversity preservation


Future work

Evaluate on more datasets to demonstrate effectiveness

Devise collapsed Gibbs sampling for LPD

Accelerate computations

OpenMP, Nvidia CUDA

Provide a method for gene clustering

