Steering Time-Dependent Estimation of Posteriors with HYperparameter Indexing in Bayesian Topic Models
Tomonari MASADA (正田备也)
Nagasaki University
OUTLINE (1/3)
• Aim
  – Improve LDA [Blei et al. 03] in terms of perplexity by using document timestamps
  – e.g. SNS documents are timestamped (Facebook, Twitter, Weibo, ...)
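Since the aim is stated in terms of perplexity, a minimal sketch of how test-set perplexity is commonly computed for a topic model may help. The names `docs`, `theta`, and `phi` are hypothetical and not taken from the slides; the function assumes point estimates of the per-document topic probabilities and the per-topic word probabilities are already available.

```python
import math

def perplexity(docs, theta, phi):
    """Perplexity = exp(-log-likelihood per token); lower is better.

    docs[j] = {word_id: count}; theta[j][k] and phi[k][w] are probabilities.
    """
    log_lik, n_tokens = 0.0, 0
    for j, doc in enumerate(docs):
        for w, n in doc.items():
            # Predictive probability of word w in document j.
            p = sum(theta[j][k] * phi[k][w] for k in range(len(phi)))
            log_lik += n * math.log(p)
            n_tokens += n
    return math.exp(-log_lik / n_tokens)
```

For a time-dependent model, `phi` would presumably be replaced by the word probabilities of the timestamp of document j.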
OUTLINE (2/3)
• Our approach
  – Prepare a word multinomial for each timestamp
    • LDA: K word multinomials
    • (Ours): T x K word multinomials
• topic = word multinomial
• Topic distributions vary along time. (Increase # basis coefficient vectors)
• Word distributions vary along time. (Increase # basis vectors)
OUTLINE (3/3)
• Problem
  – Overfitting
    • T x K x W word multinomial params
• Proposal
  – Hyperparameter indexing
LDA
• K word multinomials: Multi(φ1), Multi(φ2), ..., Multi(φK)
  – φk = (φk1, φk2, ..., φkW)
• One Dirichlet prior for φ1, ..., φK: Di(β)
  – β = (β1, β2, ..., βW)
T x K word multinomials: φ11, ..., φ1K, ..., φT1, ..., φTK

Option 0
• One prior shared by all φtk: Di(β)
  – β = (β1, β2, ..., βW)

Option 1
• One prior per topic: Di(β1), ..., Di(βK)
  – βk = (βk1, βk2, ..., βkW)

Option 2
• One prior per timestamp: Di(β1), ..., Di(βT)
  – βt = (βt1, βt2, ..., βtW)

Option 3
• One prior per timestamp-topic pair: Di(β11), ..., Di(β1K), ..., Di(βT1), ..., Di(βTK)
  – βtk = (βtk1, βtk2, ..., βtkW)
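The four options differ only in which hyperparameter vector each of the T x K word multinomials draws its Dirichlet prior from. The following is a minimal sketch of that indexing, not the author's implementation; the function names and the constant hyperparameter value 0.5 are assumptions made for illustration.

```python
import random

def beta_key(option, t, k):
    """Index of the hyperparameter vector used as the prior of phi_tk."""
    return {0: (), 1: (k,), 2: (t,), 3: (t, k)}[option]

def sample_dirichlet(beta, rng):
    """Draw one probability vector from Di(beta) via normalized gamma draws."""
    draws = [rng.gammavariate(b, 1.0) for b in beta]
    total = sum(draws)
    return [d / total for d in draws]

def sample_phi(option, T, K, W, rng, value=0.5):
    """Return (betas, phi): the distinct hyperparameter vectors and the T x K multinomials."""
    betas, phi = {}, {}
    for t in range(T):
        for k in range(K):
            # Multinomials with the same key share one hyperparameter vector.
            beta = betas.setdefault(beta_key(option, t, k), [value] * W)
            phi[t, k] = sample_dirichlet(beta, rng)
    return betas, phi
```

With T = 3 and K = 2, `len(betas)` is 1, 2, 3, and 6 for Options 0 to 3, while `len(phi)` is always T x K = 6. Only the number of hyperparameter vectors changes between options.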
LDA
Option 1
Option 3
PROPOSAL
STEPHY = Steering Time-dependent Estimation of Posteriors with HYperparameter indexing in Bayesian topic models
STEPHY
• VB for the time-independent model (LDA) x 50 iters
• VB for the slightly time-dependent model (Option 1) x 140 iters
• VB for the heavily time-dependent model (Option 3) x 10 iters
[Equations: VB updates for LDA, Option 1, and Option 3]
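The update equations on this slide lost their Greek letters in extraction; only subscripts such as jwk, jk, kw, tkw and the condition t_j = t survive. The block below is a reconstruction under the assumption that the slide shows the standard VB updates for LDA, with the word-side posterior additionally indexed by timestamp in Options 1 and 3. The symbol names γ (responsibilities), λ (document-side posterior), ζ (word-side posterior), α (document-side prior), and Ψ (digamma) are placeholders chosen here and may differ from the author's notation. n_jw is the count of word w in document j and t_j is the timestamp of document j. The hyperparameter re-estimation lines are not reconstructed because their form cannot be inferred from what remains.

```latex
\begin{align*}
% LDA (time-independent): K word multinomials, prior Di(beta)
\gamma_{jwk} &\propto \exp\Bigl\{ \Psi(\lambda_{jk}) - \Psi\bigl(\textstyle\sum_{k'} \lambda_{jk'}\bigr)
              + \Psi(\zeta_{kw}) - \Psi\bigl(\textstyle\sum_{w'} \zeta_{kw'}\bigr) \Bigr\} \\
\lambda_{jk} &= \alpha_{k} + \sum_{w} n_{jw}\,\gamma_{jwk} \\
\zeta_{kw}   &= \beta_{w} + \sum_{j} n_{jw}\,\gamma_{jwk} \\[1ex]
% Options 1 and 3: T x K word multinomials; document j uses slice t_j
\gamma_{jwk} &\propto \exp\Bigl\{ \Psi(\lambda_{jk}) - \Psi\bigl(\textstyle\sum_{k'} \lambda_{jk'}\bigr)
              + \Psi(\zeta_{t_j k w}) - \Psi\bigl(\textstyle\sum_{w'} \zeta_{t_j k w'}\bigr) \Bigr\} \\
\lambda_{jk} &= \alpha_{k} + \sum_{w} n_{jw}\,\gamma_{jwk} \\
% Option 1: prior indexed by topic
\zeta_{tkw}  &= \beta_{kw} + \sum_{j:\,t_j = t} n_{jw}\,\gamma_{jwk} \\
% Option 3: prior indexed by timestamp and topic
\zeta_{tkw}  &= \beta_{tkw} + \sum_{j:\,t_j = t} n_{jw}\,\gamma_{jwk}
\end{align*}
```

Under this reading, the only difference between Option 1 and Option 3 is the index set of β, and the responsibilities γ have the same shape in all three models.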
STEPHY
• Conduct multistage inference over different topic models having compatible parameters
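The multistage idea can be sketched as a chain of VB runs in which each stage starts from the result of the previous one. This is a hedged illustration, not the author's code. It assumes the "compatible parameters" handed from stage to stage are the per-token topic responsibilities (called `gamma` here), which have the same shape in LDA, Option 1, and Option 3. All function and variable names are hypothetical, timestamps are assumed to be 0-based integers, and the hyperparameters are held at a constant placeholder value, so the hyperparameter re-estimation that would distinguish Option 1 from Option 3 is omitted.

```python
import math
import random

def digamma(x):
    """Digamma for x > 0 via recurrence plus an asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return shift + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def vb_stage(docs, slices, gamma, beta, alpha, K, W, iters):
    """Run `iters` VB sweeps for one model and return the responsibilities.

    docs[j] = {word_id: count}; slices[j] selects the set of K word
    multinomials used by document j; beta(s, k, w) returns a hyperparameter.
    """
    S = max(slices) + 1
    for _ in range(iters):
        # Word-side posterior: prior plus expected counts pooled per slice.
        zeta = [[[beta(s, k, w) for w in range(W)] for k in range(K)] for s in range(S)]
        for j, doc in enumerate(docs):
            for w, n in doc.items():
                for k in range(K):
                    zeta[slices[j]][k][w] += n * gamma[j][w][k]
        norm = [[digamma(sum(zeta[s][k])) for k in range(K)] for s in range(S)]
        for j, doc in enumerate(docs):
            s = slices[j]
            # Document-side posterior from the current responsibilities.
            lam = [alpha + sum(n * gamma[j][w][k] for w, n in doc.items()) for k in range(K)]
            for w in doc:
                scores = [digamma(lam[k]) + digamma(zeta[s][k][w]) - norm[s][k] for k in range(K)]
                top = max(scores)
                weights = [math.exp(v - top) for v in scores]
                total = sum(weights)
                gamma[j][w] = [v / total for v in weights]
    return gamma

def stephy(docs, stamps, K, W, schedule=(50, 140, 10), alpha=0.1, seed=0):
    """Chain LDA -> Option 1 -> Option 3, passing gamma between stages."""
    rng = random.Random(seed)
    gamma = []
    for doc in docs:
        row = {}
        for w in doc:
            draws = [rng.random() + 0.1 for _ in range(K)]
            row[w] = [d / sum(draws) for d in draws]
        gamma.append(row)
    flat = lambda s, k, w: 0.1  # constant placeholder for every hyperparameter
    stages = [
        ([0] * len(docs), flat),  # LDA: one slice, beta indexed by w
        (list(stamps), flat),     # Option 1: slice per timestamp, beta indexed by (k, w)
        (list(stamps), flat),     # Option 3: slice per timestamp, beta indexed by (t, k, w)
    ]
    for (slices, beta), iters in zip(stages, schedule):
        gamma = vb_stage(docs, slices, gamma, beta, alpha, K, W, iters)
    return gamma
```

The default schedule mirrors the 50, 140, and 10 iterations quoted on the slides. Because each stage only needs `gamma` as its starting point, switching models between stages requires no conversion step in this sketch.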
DATA SPECS
               J        W    T           P
NIPS       1,740   11,998   13     919,916
DBLP   1,235,988  273,173   20   7,814,175
DONGA     24,093   71,621   53   7,949,288
TDT       96,256   51,849  123  11,460,231
NSF      128,181   25,325   13  10,388,976
YOMI     367,910   84,060   52  32,762,456
J = # documents, W = # different words, T = # timestamps, P = # different doc-word pairs
COMPLEXITY
• Time: O(PK)
  – P = #(diff doc-word pairs)
• Space: O(QK)
  – Q = #(diff timestamp-word pairs)
  – No malloc for the parameters indexed by jwk
  – Malloc for the parameters indexed by twk
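The two quantities in the complexity bounds can be counted directly from a sparse corpus. The sketch below assumes a hypothetical representation in which `docs[j]` maps word ids to counts and `stamps[j]` is the timestamp of document j; neither name comes from the slides.

```python
def count_pairs(docs, stamps):
    """Return (P, Q) for a corpus with docs[j] = {word_id: count}."""
    # P: distinct (document, word) pairs, which drives the O(PK) time bound.
    P = sum(len(doc) for doc in docs)
    # Q: distinct (timestamp, word) pairs, which drives the O(QK) space bound.
    Q = len({(stamps[j], w) for j, doc in enumerate(docs) for w in doc})
    return P, Q
```

Q can never exceed P, since each document has exactly one timestamp and documents sharing a timestamp pool their words. This is presumably why storing only the timestamp-indexed parameters keeps memory at O(QK).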
IMPLEMENTATION
• VB
  – Realm of embarrassing parallelism
    • OpenMP
[Wang et al. 06]
• CGS for
• VB for
  – Time Independent Model
• VB for
  – Slightly Time Dependent Model
• VB for
  – Heavily Time Dependent Model
LDA
Option 1
Option 3
x 1000 iters
x 50 iters
x 5 iters
NEW RESULTS
LDA x 50 iters
CONCLUSION
• STEPHY
  – Conduct multistage inference over different topic models having compatible parameters.
  – Can efficiently improve LDA in terms of test set perplexity.
FUTURE WORK
• Other types of mixture models
  – topic = Gaussian
• Bayesian nonparametrics
  – Topic distributions are left intact.
• Practical evaluation
  – e.g. Classification, Clustering, Topic detection, IR, ...