A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation


Page 1: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

Tomonari MASADA (正田備也), Nagasaki University (長崎大学)

[email protected]

Page 2: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

Aim
• Obtain an informative summary of a large set of documents
• by extracting word lists, each relating to a specific topic

Topic modeling


Page 3: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation


Page 4: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

Contribution
• We propose a new posterior estimation method for latent Dirichlet allocation (LDA) [Blei+ 03]
• by applying stochastic gradient variational Bayes (SGVB) [Kingma+ 14] to LDA


Page 5: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation


Page 6: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

LDA [Blei+ 03]
• Achieves a clustering of word tokens by assigning each word token to one of the topics.
• z_{d,i}: the topic to which the i-th word token in document d is assigned (discrete variables).
• θ_{d,k}: how often is topic k talked about in document d?
  • Topic probability distribution in each document (continuous variables)
• φ_{k,w}: how often is word w used to talk about topic k?
  • Word probability distribution for each topic (continuous variables)

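For reference, the variables above fit together in the standard LDA joint distribution. The following is a sketch assuming the usual symmetric Dirichlet priors with hyperparameters α and β, which the slide does not show explicitly:

```latex
p(\mathbf{w}, \mathbf{z}, \theta, \phi \mid \alpha, \beta)
  = \prod_{k=1}^{K} p(\phi_k \mid \beta)
    \prod_{d=1}^{D} \Bigl[\, p(\theta_d \mid \alpha)
      \prod_{i=1}^{n_d} p(z_{d,i} \mid \theta_d)\, p(w_{d,i} \mid \phi_{z_{d,i}}) \Bigr]
```

Here K, D, and n_d denote the number of topics, the number of documents, and the length of document d.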

Page 7: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

Variational Bayesian (VB) inference = maximization of the evidence lower bound (ELBO)
• VB tries to approximate the true posterior.
• An approximate posterior is introduced when the ELBO is obtained by applying Jensen's inequality.
• z: discrete hidden variables (topic assignments)
• θ, φ: continuous hidden variables (multinomial parameters)

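A minimal sketch of the bound referred to above, with z the discrete and θ, φ the continuous hidden variables; the left-hand side is the evidence and q is the approximate posterior (the slide's own formula is not reproduced in this transcript, so this is a standard reconstruction):

```latex
\log p(\mathbf{w})
  = \log \int \sum_{\mathbf{z}} p(\mathbf{w}, \mathbf{z}, \theta, \phi)\, d\theta\, d\phi
  \;\ge\; \mathbb{E}_{q(\mathbf{z}, \theta, \phi)}
      \bigl[ \log p(\mathbf{w}, \mathbf{z}, \theta, \phi) - \log q(\mathbf{z}, \theta, \phi) \bigr]
  \;=\; \mathcal{L}
```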

Page 8: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

Factorization assumption
• We assume the approximate posterior factorizes as q(z) q(θ) q(φ) to make the inference tractable.
• Then the ELBO can be written in terms of these three factors, as sketched below.

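Under the factorization q(z, θ, φ) = q(z) q(θ) q(φ), one standard way to write the ELBO, consistent with the description above (a sketch, not necessarily the exact form used in the talk):

```latex
\mathcal{L}
  = \mathbb{E}_{q(\mathbf{z})\, q(\theta)\, q(\phi)}
      \bigl[ \log p(\mathbf{w}, \mathbf{z} \mid \theta, \phi) - \log q(\mathbf{z}) \bigr]
    - \mathrm{KL}\bigl( q(\theta) \,\|\, p(\theta) \bigr)
    - \mathrm{KL}\bigl( q(\phi) \,\|\, p(\phi) \bigr)
```

The first expectation involves the discrete factor q(z), while the remaining terms involve only the continuous factors θ and φ, which is where SGVB enters.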

Page 9: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

Stochastic gradient variational Bayes (SGVB) [Kingma+ 14]
• A general framework for estimating the evidence lower bound (ELBO) in variational Bayes (VB)
• Only applicable to continuous distributions


Page 10: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

(SGVB) Monte Carlo integration
• By using Monte Carlo integration, the ELBO can be estimated with random samples.
• The discrete part is estimated in a similar manner to the original VB for LDA [Blei+ 03].
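As an illustration of the Monte Carlo estimate described above, the continuous expectations can be approximated with S samples drawn from the approximate posterior (the sample count S is an assumption; the slide does not state it):

```latex
\mathbb{E}_{q(\theta)\, q(\phi)} \bigl[ f(\theta, \phi) \bigr]
  \;\approx\; \frac{1}{S} \sum_{s=1}^{S} f\bigl( \theta^{(s)}, \phi^{(s)} \bigr),
  \qquad \theta^{(s)} \sim q(\theta), \quad \phi^{(s)} \sim q(\phi)
```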

Page 11: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

(SGVB) Reparameterization
• SGVB can be applied "under certain mild conditions."
• We use logistic normal distributions to approximate the true posteriors of:
  • the per-document topic probability distributions θ_d, and
  • the per-topic word probability distributions φ_k.
• We can efficiently sample from the logistic normal with reparameterization.

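A minimal sketch of reparameterized sampling from a logistic normal, as described above. The function name, the NumPy-based implementation, and the parameterization by a mean and a log standard deviation are illustrative assumptions, not the authors' code:

```python
import numpy as np

def sample_logistic_normal(mu, log_sigma, rng):
    """Draw one reparameterized sample from a logistic normal distribution.

    mu, log_sigma: mean and log standard deviation of the underlying
    Gaussian, one entry per topic (or per word), shape (K,).
    """
    eps = rng.standard_normal(mu.shape)   # all randomness lives in eps
    x = mu + np.exp(log_sigma) * eps      # Gaussian sample, differentiable in mu and log_sigma
    x = x - x.max()                       # stabilize the softmax numerically
    return np.exp(x) / np.exp(x).sum()    # softmax maps the sample onto the simplex

# Example: one sample of a per-document topic distribution theta_d with K = 5 topics.
rng = np.random.default_rng(0)
theta_d = sample_logistic_normal(np.zeros(5), np.zeros(5), rng)
print(theta_d, theta_d.sum())             # a point on the probability simplex
```

Because the sample is a deterministic, differentiable function of mu and log_sigma once eps is drawn, gradients of a Monte Carlo ELBO estimate can be propagated back into these variational parameters.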

Page 12: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

Maximize ELBO using gradient ascent


Page 13: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation


Page 14: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

"Stochastic" gradient VB
• The expectation integrals in the ELBO are estimated by the Monte Carlo method.
• The derivatives of the ELBO therefore depend on random samples.
• Randomness is incorporated into the maximization.
• SGVB = VB where the gradients are stochastic.
• (Observation) It seems easier to avoid poor local optima.

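To make "gradient ascent with stochastic gradients" concrete, here is a self-contained toy sketch in the same spirit; the objective, constants, and single Gaussian variational parameter are illustrative assumptions, not the LDA ELBO from the slides:

```python
import numpy as np

# Toy example: maximize E_{x ~ N(mu, sigma^2)}[ -(x - 3)^2 ] with respect to mu,
# estimating the gradient from a handful of reparameterized samples each step.
rng = np.random.default_rng(0)
mu, sigma, lr, n_samples = 0.0, 1.0, 0.05, 8

for step in range(200):
    eps = rng.standard_normal(n_samples)   # fresh noise at every step
    x = mu + sigma * eps                   # reparameterized samples
    grad_mu = np.mean(-2.0 * (x - 3.0))    # Monte Carlo gradient of the objective w.r.t. mu
    mu += lr * grad_mu                     # gradient ascent with a stochastic gradient

print(mu)  # close to 3.0, the maximizer of the expected objective
```

The gradient used at each step depends on the random samples, yet the iterates still climb toward the maximizer; this is the sense in which the VB gradients become stochastic.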

Page 15: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

Without randomness = with zero standard deviation
• A special case of the proposed method, with the standard deviations set to zero, is quite similar to CVB0 [Asuncion+ 09].
• Our method thus has a context among existing inference methods for LDA.

Page 16: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

Data sets for evaluation

           # docs    # vocabulary words
  NYT      99,932    46,263
  MOVIE    27,859    62,408
  NSF     128,818    21,471
  MED     125,490    42,830

Page 17: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation


Page 18: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation


Page 19: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation


Page 20: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation


Page 21: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

Not that efficient in time…
• 500 iterations for the NYT data set:
  • LNV: 43 hours
  • CGS: 14 hours
  • VB: 23 hours
• However, parallelization on a GPU works.
  • (preparing an implementation with TensorFlow)


Page 22: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

Conclusion
• We incorporate randomness into variational inference for LDA by applying SGVB to LDA.
• The proposed method gives perplexities comparable to those of existing inference methods for LDA.


Page 23: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation

Future work
• SGVB is a general framework for devising posterior inference for probabilistic models.
• We have already applied SGVB to CTM [Blei+ 05].
  • This will be presented as a poster at APWeb'16.
• SGVB is also applicable to other document models.
  • NVDM [Miao+ 16]: document modeling with an MLP


Page 24: A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
