Steering Time-Dependent Estimation of Posteriors with Hyperparameter Indexing in Bayesian Topic Models


Upload: tomonari-masada

Post on 26-Jun-2015


TRANSCRIPT

Page 1: Steering Time-Dependent Estimation of Posteriors with Hyperparameter Indexing in Bayesian Topic Models

----- Steering -----
Time-Dependent Estimation
--- of Posteriors ---
with HYperparameter Indexing
- in Bayesian Topic Models -

Tomonari MASADA (正田备也)

Nagasaki University
[email protected]

Page 2:

OUTLINE (1/3)
• Aim
  – Improve LDA [Blei et al. 03] in terms of perplexity by using document timestamps
    e.g. SNS documents are timestamped (Facebook, Twitter, Weibo, ...)
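Perplexity, the evaluation measure named in the aim, is the exponential of the negative mean per-token log-likelihood on held-out text; lower is better. The following is a minimal sketch of that definition only. The function name and the toy input are illustrative assumptions, not part of the slides.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(-(1/N) * sum of per-token log-likelihoods)."""
    n = len(token_log_probs)
    if n == 0:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / n)

# Toy check (hypothetical data): a model that gives every held-out token
# probability 1/4 should score a perplexity of about 4.
uniform = [math.log(0.25)] * 10
print(perplexity(uniform))
```

A model that exploits timestamps well should assign higher probability to held-out tokens and therefore reach a lower value than plain LDA on the same test set.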

Page 3:

OUTLINE (2/3)
• Our approach
  – Prepare a word multinomial for each timestamp
    • LDA: K word multinomials
    • (Ours): T x K word multinomials

Page 4:

topic = word multinomial
• Topic distributions vary along time. (Increase # basis coefficient vectors)
• Word distributions vary along time. (Increase # basis vectors)

Page 5:

OUTLINE (3/3)
• Problem
  – Overfitting
    • T x K x W word multinomial params
• Proposal
  – Hyperparameter indexing
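To make the overfitting concern concrete, the parameter count can be set against the amount of data. The sketch below uses the NIPS figures from the DATA SPECS slide (W, T, P); the number of topics K is a hypothetical value, since the slides do not state it.

```python
def n_word_params(W, K, T=1):
    """Number of word-multinomial parameters: T * K * W (T=1 for plain LDA)."""
    return T * K * W

# NIPS figures from the DATA SPECS slide.
W, T, P = 11_998, 13, 919_916
K = 100  # hypothetical number of topics; not given in the slides

lda = n_word_params(W, K)
ours = n_word_params(W, K, T)
print("LDA params:", lda)
print("T x K x W params:", ours, "=", ours // lda, "times LDA")
print("params per distinct doc-word pair:", round(ours / P, 1))
```

Under this assumed K, the time-indexed model has far more word parameters than the corpus has distinct doc-word pairs, which is the situation the proposal is meant to control.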

Page 6:

LDA
$\phi_1, \ldots, \phi_K$
Multi($\phi_1$), Multi($\phi_2$), ..., Multi($\phi_K$)
$\phi_k = (\phi_{k1}, \phi_{k2}, \ldots, \phi_{kW})$

Page 7:

LDA
$\phi_1, \ldots, \phi_K$
Di($\beta$)
$\beta = (\beta_1, \beta_2, \ldots, \beta_W)$

Page 8:

$\phi_{11}, \ldots, \phi_{1K}$
...
$\phi_{T1}, \ldots, \phi_{TK}$

Page 9:

Option 0
$\phi_{11}, \ldots, \phi_{1K}$
...
$\phi_{T1}, \ldots, \phi_{TK}$
Di($\beta$)
$\beta = (\beta_1, \beta_2, \ldots, \beta_W)$

Page 10:

Option 1
$\phi_{11}, \ldots, \phi_{1K}$
...
$\phi_{T1}, \ldots, \phi_{TK}$
Di($\beta_1$), ..., Di($\beta_K$)
$\beta_k = (\beta_{k1}, \beta_{k2}, \ldots, \beta_{kW})$

Page 11:

Option 2
$\phi_{11}, \ldots, \phi_{1K}$
...
$\phi_{T1}, \ldots, \phi_{TK}$
Di($\beta_1$), ..., Di($\beta_T$)
$\beta_t = (\beta_{t1}, \beta_{t2}, \ldots, \beta_{tW})$

Page 12:

Option 3
$\phi_{11}, \ldots, \phi_{1K}$
...
$\phi_{T1}, \ldots, \phi_{TK}$
Di($\beta_{11}$), ..., Di($\beta_{1K}$)
...
Di($\beta_{T1}$), ..., Di($\beta_{TK}$)
$\beta_{tk} = (\beta_{tk1}, \beta_{tk2}, \ldots, \beta_{tkW})$
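The four options differ only in which indices the Dirichlet hyperparameter carries. One way to picture this is as array shapes that are broadcast onto the full T x K x W grid of word multinomials. The sketch below is an illustration under that reading; `expand_prior` and the toy sizes are hypothetical names and values, not something defined in the slides.

```python
import numpy as np

T, K, W = 3, 2, 5  # toy sizes (hypothetical)

# Shape of the hyperparameter array under each indexing option.
SHAPES = {0: (W,), 1: (K, W), 2: (T, W), 3: (T, K, W)}

def expand_prior(beta, option):
    """Broadcast a hyperparameter array to the full (T, K, W) grid."""
    beta = np.asarray(beta, dtype=float)
    if beta.shape != SHAPES[option]:
        raise ValueError(f"option {option} expects shape {SHAPES[option]}")
    if option == 2:
        # Indexed by timestamp only: add a topic axis so it can be shared over k.
        beta = beta[:, None, :]
    return np.broadcast_to(beta, (T, K, W))

for option, shape in SHAPES.items():
    full = expand_prior(np.ones(shape), option)
    print("Option", option, "free hyperparameters:", int(np.prod(shape)),
          "-> prior grid", full.shape)
```

Option 0 shares one vector over every (t, k) cell, Option 1 shares one vector per topic across all timestamps, Option 2 shares one per timestamp across all topics, and Option 3 shares nothing.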

Page 13:

PROPOSAL
LDA → Option 1 → Option 3

Page 14:

----- Steering -----
Time-Dependent Estimation
--- of Posteriors ---
with HYperparameter Indexing
- in Bayesian Topic Models -

STEPHY

Page 15:

STEPHY
• VB for Time Independent Model (LDA), x 50 iters
• VB for Slightly Time Dependent Model (Option 1), x 140 iters
• VB for Heavily Time Dependent Model (Option 3), x 10 iters

Page 16:

[Equations: VB updates for LDA, Option 1, and Option 3. Only the subscripts survive in the transcript.]
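The subscript patterns that survive on this slide are consistent with the standard variational Bayes updates for LDA, with the word-side parameter gaining a timestamp index in Options 1 and 3. The block below is a hedged reconstruction on that basis. All symbol names are assumptions, since the Greek letters were lost: $\gamma_{jwk}$ for the responsibility of topic $k$ for word $w$ in document $j$, $\theta_{jk}$ and $\zeta$ for the variational Dirichlet parameters, $\alpha_k$ for the document-side hyperparameter, $n_{jw}$ for word counts, $t_j$ for the timestamp of document $j$, and $\Psi$ for the digamma function.

```latex
% LDA
\gamma_{jwk} \propto
  \exp\Big\{\Psi(\zeta_{kw}) - \Psi\big(\textstyle\sum_{w'} \zeta_{kw'}\big)\Big\}
  \exp\Big\{\Psi(\theta_{jk}) - \Psi\big(\textstyle\sum_{k'} \theta_{jk'}\big)\Big\}
\theta_{jk} = \alpha_k + \sum_{w} n_{jw}\,\gamma_{jwk}
\zeta_{kw} = \beta_{w} + \sum_{j} n_{jw}\,\gamma_{jwk}

% Option 1 (hyperparameters indexed by topic)
\gamma_{jwk} \propto
  \exp\Big\{\Psi(\zeta_{t_j k w}) - \Psi\big(\textstyle\sum_{w'} \zeta_{t_j k w'}\big)\Big\}
  \exp\Big\{\Psi(\theta_{jk}) - \Psi\big(\textstyle\sum_{k'} \theta_{jk'}\big)\Big\}
\theta_{jk} = \alpha_k + \sum_{w} n_{jw}\,\gamma_{jwk}
\zeta_{tkw} = \beta_{kw} + \sum_{j:\,t_j = t} n_{jw}\,\gamma_{jwk}

% Option 3 (hyperparameters indexed by timestamp and topic)
% \gamma_{jwk} and \theta_{jk} as in Option 1
\zeta_{tkw} = \beta_{tkw} + \sum_{j:\,t_j = t} n_{jw}\,\gamma_{jwk}
```

Under this reading, the three models share the document-side update and differ only in how the word-side parameter and its hyperparameter are indexed.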

Page 17:

STEPHY
• Conduct multistage inference over different topic models having compatible parameters
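A toy sketch of what multistage inference with compatible parameters could look like: every stage keeps the word-side variational parameter in the same (T, K, W) shape, so the result of one stage can seed the next. The iteration counts follow the STEPHY schedule slide. Everything else is an assumption made for illustration: the function names, the toy data, the fixed hyperparameter values (the slides do not show how hyperparameters are estimated), and the choice to hand over parameters unchanged between stages.

```python
import numpy as np

def digamma(x):
    """Digamma via upward recurrence plus an asymptotic series (x > 0)."""
    x = np.array(x, dtype=float)
    acc = np.zeros_like(x)
    while np.any(x < 6.0):
        small = x < 6.0
        acc = acc - np.where(small, 1.0 / x, 0.0)
        x = x + small
    inv2 = 1.0 / (x * x)
    return acc + np.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def vb_stage(n, t, T, theta, zeta, alpha, beta, tied, iters):
    """Run `iters` VB sweeps. zeta has shape (T, K, W); beta broadcasts to it.
    tied=True shares word statistics across timestamps (plain LDA)."""
    onehot = np.eye(T)[t]                                   # (J, T)
    for _ in range(iters):
        e_word = digamma(zeta) - digamma(zeta.sum(axis=2, keepdims=True))
        # The -digamma(sum_k theta_jk) term cancels when normalising over k.
        log_g = e_word[t] + digamma(theta)[:, :, None]      # (J, K, W)
        log_g -= log_g.max(axis=1, keepdims=True)
        gamma = np.exp(log_g)
        gamma /= gamma.sum(axis=1, keepdims=True)           # normalise over topics
        stats = n[:, None, :] * gamma                       # n_jw * gamma_jwk
        theta = alpha + stats.sum(axis=2)
        per_t = np.einsum("jt,jkw->tkw", onehot, stats)     # sum over docs with t_j = t
        if tied:
            per_t = np.broadcast_to(per_t.sum(axis=0), per_t.shape)
        zeta = beta + per_t
    return theta, zeta

# Toy corpus (hypothetical): dense count matrix and one timestamp per document.
rng = np.random.default_rng(0)
J, W, K, T = 12, 8, 2, 3
n = rng.poisson(1.0, size=(J, W)).astype(float)
t = rng.integers(0, T, size=J)
theta = np.ones((J, K))
zeta = rng.gamma(1.0, 1.0, size=(1, K, W)).repeat(T, axis=0) + 0.1

schedule = [  # (model, fixed hyperparameter array, tie across time?, iterations)
    ("LDA", np.full(W, 0.1), True, 50),
    ("Option 1", np.full((K, W), 0.1), False, 140),
    ("Option 3", np.full((T, K, W), 0.1), False, 10),
]
for name, beta, tied, iters in schedule:
    theta, zeta = vb_stage(n, t, T, theta, zeta, 0.5, beta, tied, iters)
    print(name, "done, zeta shape", zeta.shape)
```

Because the hyperparameters are held fixed at one constant here, the Option 1 and Option 3 stages differ only in the shape of the hyperparameter array; the sketch shows the staging and the hand-over, not the hyperparameter estimation.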

Page 18:

DATA SPECS

        J          W        T    P
NIPS    1,740      11,998   13   919,916
DBLP    1,235,988  273,173  20   7,814,175
DONGA   24,093     71,621   53   7,949,288
TDT     96,256     51,849   123  11,460,231
NSF     128,181    25,325   13   10,388,976
YOMI    367,910    84,060   52   32,762,456

Page 19:

COMPLEXITY
• Time: O(PK)
  P = #(diff doc-word pairs)
• Space: O(QK)
  Q = #(diff timestamp-word pairs)
  – No malloc for parameters indexed by $jwk$
  – Malloc for parameters indexed by $twk$
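The two quantities in the complexity bounds can be counted directly from a corpus. The sketch below uses a hypothetical three-document corpus and a hypothetical K; `count_pairs` is an illustrative helper, not something from the slides.

```python
def count_pairs(docs, timestamps):
    """docs: list of token lists; timestamps: one timestamp id per document.
    Returns (P, Q): distinct (doc, word) pairs and distinct (timestamp, word) pairs."""
    doc_word = {(j, w) for j, doc in enumerate(docs) for w in doc}
    time_word = {(timestamps[j], w) for j, doc in enumerate(docs) for w in doc}
    return len(doc_word), len(time_word)

# Toy corpus (hypothetical): documents 0 and 1 share timestamp 0.
docs = [["topic", "model", "topic"], ["model", "time"], ["time", "topic"]]
timestamps = [0, 0, 1]
P, Q = count_pairs(docs, timestamps)
K = 4  # hypothetical number of topics
print("P =", P, "Q =", Q)
print("work per sweep ~ P*K =", P * K, "; stored word-side values ~ Q*K =", Q * K)
```

Q can never exceed P, because several documents with the same timestamp collapse into one timestamp-word pair, which is why storing only the timestamp-indexed parameters keeps memory at O(QK).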

Page 20:

IMPLEMENTATION
• VB
  – Realm of embarrassing parallelism
    • OpenMP
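The parallelism comes from the fact that, with the word-side parameters held fixed, each document's update is independent of every other document's; only the accumulated statistics need to be summed. The slides name OpenMP; the sketch below is a Python analogue using a thread pool, with hypothetical function names and random toy inputs, meant only to show the split-and-reduce pattern.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def chunk_stats(n_chunk, log_topic_chunk, log_word):
    """Per-document work for one chunk of documents: responsibilities from
    fixed word weights, then the chunk's sum over j of n_jw * gamma_jwk."""
    log_g = log_topic_chunk[:, :, None] + log_word[None, :, :]   # (j, K, W)
    log_g -= log_g.max(axis=1, keepdims=True)
    gamma = np.exp(log_g)
    gamma /= gamma.sum(axis=1, keepdims=True)                    # normalise over topics
    return (n_chunk[:, None, :] * gamma).sum(axis=0)             # (K, W)

def parallel_stats(n, log_topic, log_word, workers=4):
    """Split documents into chunks, process them independently, sum the results."""
    chunks = np.array_split(np.arange(n.shape[0]), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda idx: chunk_stats(n[idx], log_topic[idx], log_word), chunks
        )
        return sum(parts)

# Toy inputs (hypothetical): 10 documents, 6 words, 3 topics.
rng = np.random.default_rng(1)
n = rng.poisson(1.0, size=(10, 6)).astype(float)
log_topic = rng.normal(size=(10, 3))
log_word = rng.normal(size=(3, 6))
serial = chunk_stats(n, log_topic, log_word)
print(np.allclose(parallel_stats(n, log_topic, log_word), serial))
```

In an OpenMP implementation the same structure would be a parallel loop over documents with a reduction on the statistics array.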

Page 21:

[Wang et al. 06]

Page 22:
Page 23:
Page 24:
Page 25:
Page 26:
Page 27:
Page 28:

NEW RESULTS
• CGS for / VB for Time Independent Model (LDA)
• VB for Slightly Time Dependent Model (Option 1)
• VB for Heavily Time Dependent Model (Option 3)
x 1000 iters, x 50 iters, x 5 iters; x 50 iters (LDA)

Page 29:
Page 30:
Page 31:
Page 32:
Page 33:
Page 34:

CONCLUSION
STEPHY
– Conducts multistage inference over different topic models having compatible parameters.
– Can efficiently improve LDA in terms of test set perplexity.

Page 35:

FUTURE WORK
• Other types of mixture models
  – topic = Gaussian
• Bayesian nonparametrics
  – Topic distributions are left intact.
• Practical evaluation
  e.g. Classification, Clustering, Topic detection, IR, ...