TRANSCRIPT
A General Optimization Framework for Smoothing Language Models on
Graph Structures
Qiaozhu Mei, Duo Zhang, ChengXiang Zhai
University of Illinois at Urbana-Champaign
Kullback-Leibler Divergence Retrieval Method

Document d: a text mining paper (about data mining)
Doc Language Model (LM) θd, p(w|d): text 4/100 = 0.04, mining 3/100 = 0.03, clustering 1/100 = 0.01, ..., data = 0, computing = 0, ...
Smoothed Doc LM θd', p(w|d'): text = 0.039, mining = 0.028, clustering = 0.01, ..., data = 0.001, computing = 0.0005, ...

Query q: "data mining"
Query Language Model θq, p(w|q): data 1/2 = 0.5, mining 1/2 = 0.5
Smoothed Query LM, p(w|q'): data = 0.4, mining = 0.4, clustering = 0.1, ...

Similarity function: D(θq || θd) = Σ_{w∈V} p(w|q) log [ p(w|q) / p(w|d) ]
2
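The scoring step on this slide can be sketched in Python; a minimal sketch, where `neg_kl_score` is an illustrative name and the toy probabilities are the assumed values from the slide (ranking by the negative KL divergence is rank-equivalent to the similarity above):

```python
import math

def neg_kl_score(query_lm, doc_lm):
    """Rank score: -D(theta_q || theta_d) = -sum_w p(w|q) * log(p(w|q) / p(w|d)).
    The doc LM must be smoothed so that p(w|d) > 0 wherever p(w|q) > 0."""
    return -sum(p_q * math.log(p_q / doc_lm[w])
                for w, p_q in query_lm.items() if p_q > 0)

# Toy models with the (assumed) values from this slide.
query_lm = {"data": 0.5, "mining": 0.5}
smoothed_doc_lm = {"text": 0.039, "mining": 0.028, "clustering": 0.01,
                   "data": 0.001, "computing": 0.0005}
score = neg_kl_score(query_lm, smoothed_doc_lm)  # higher (less negative) = more similar
```

Note that with the unsmoothed model, p(data|d) = 0 would make the logarithm undefined, which is exactly why smoothing is needed before scoring.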
Smoothing a Document Language Model
3
Estimate LM → smooth LM → retrieval performance
MLE P_MLE(w|d): text 4/100 = 0.04, mining 3/100 = 0.03, Assoc. 1/100 = 0.01, clustering 1/100 = 0.01, ..., data = 0, computing = 0, ...
Goal 1 – assign non-zero prob. to unseen words: text = 0.039, mining = 0.028, Assoc. = 0.009, clustering = 0.01, ..., data = 0.001, computing = 0.0005, ...
Goal 2 – estimate a more accurate distribution from sparse data: text = 0.038, mining = 0.026, Assoc. = 0.008, clustering = 0.01, ..., data = 0.002, computing = 0.001, ...
E.g., interpolation with the collection model P(w|collection):
P(w|d) = (1 − λ) P_MLE(w|d) + λ P(w|collection)
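The interpolation above can be sketched directly; a minimal sketch assuming simple token lists, with `jelinek_mercer` as an illustrative name for this classic interpolation smoothing:

```python
from collections import Counter

def jelinek_mercer(doc_tokens, collection_tokens, lam=0.1):
    """p(w|d) = (1 - lam) * p_MLE(w|d) + lam * p(w|collection)."""
    doc_counts = Counter(doc_tokens)
    coll_counts = Counter(collection_tokens)
    n_d, n_c = len(doc_tokens), len(collection_tokens)
    # Every word of the collection vocabulary receives non-zero mass;
    # assumes the collection contains all document tokens.
    return {w: (1 - lam) * doc_counts[w] / n_d + lam * c / n_c
            for w, c in coll_counts.items()}

lm = jelinek_mercer(["text", "text", "mining"],
                    ["text", "mining", "data", "data"], lam=0.2)
# "data" never occurs in the document, yet lm["data"] > 0
```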
Previous Work on Smoothing
Common recipe: estimate a reference language model θref from the corpus, then interpolate the MLE with it:
P(w|d) = (1 − λ) P_MLE(w|d) + λ P(w|θref)
- Reference = the whole collection: interpolate d with the collection [Ponte & Croft 98]
- Reference = document clusters: interpolate d with its cluster [Liu & Croft 04]
- Reference = nearest neighbors: interpolate d with its neighbor documents, forming a pseudo-document d̃ [Kurland & Lee 04]
4
Problems of Existing Methods
• Smoothing with global background – ignores the collection structure
• Smoothing with document clusters – ignores local structures inside a cluster
• Smoothing using neighbor documents – ignores the global structure
• Different heuristics on θref and interpolation – no clear objective function for optimization; no guidance on how to further improve the existing methods
5
Research Questions
• What is the right corpus structure to use?
• What are the criteria for a good smoothing method? – Accurate language model?
• What do existing methods end up optimizing?
• Could there be a general optimization framework?
6
Our Contribution
• Formulation of smoothing as optimization over graph structures
• A general optimization framework for smoothing both document LMs and query LMs
• Novel instantiations of the framework lead to more effective smoothing methods
7
A Graph-based Formulation of Smoothing
• A novel and general view of smoothing
8
[Figure: documents d1, d2, ... of the collection drawn as a graph; for a word w, the values P(w|d1), P(w|d2), ... form a surface on top of the graph – the MLE surface is rugged, the smoothed surface is smooth; the projection on a plane shows P(w|d) per document]
Collection = Graph (of documents)
Smoothed LM = Smoothed Surface!
Covering Existing Models
9
[Figure: the three existing schemes drawn as graphs – a star graph linking each document d to the background node; a forest linking documents to cluster roots C1–C4 (pseudo docs); a local graph linking d to its nearest neighbors]
Smoothing with graph structure subsumes existing methods:
- Smoothing with global background = star graph
- Smoothing with document clusters = forest w/ pseudo docs
- Smoothing with nearest neighbors = local graph
In all cases: Collection = Graph; Smoothed LM = Smoothed Surfaces
Instantiations of the Formulation
10
Types of Graphs               | Language Models to be Smoothed: Document LM                                               | Query LM
Star graph w/ background node | [Ponte & Croft 98], [Hiemstra & Kraaij 98], [Miller et al. 99], [Zhai & Lafferty 01], ... | N/A
Forest w/ cluster roots       | [Liu and Croft 04]                                                                        | N/A
Local kNN graph               | [Kurland and Lee 04], [Tao et al. 06]                                                     | N/A
Document similarity graph     | Novel                                                                                     | N/A
Word similarity graph         | Novel                                                                                     | Novel
Other graphs?                 | ?                                                                                         | ?
(The first four rows are document graphs; the word similarity graph is a word graph.)
Smoothing over Word Graphs
[Figure: a similarity graph of words; at vertices wu and wv the surface takes the values P(wu|d)/Deg(u) and P(wv|d)/Deg(v)]
Given d, {P(w|d)} = surface over the word graph!
Smoothed LM = Smoothed Surface!
11
The General Objective of Smoothing
12
O(C) = (1 − λ) Σ_{u∈V} w(u) (f_u − f̃_u)² + λ Σ_{(u,v)∈E} w(u,v) (f_u − f_v)²
- Σ_{u∈V} w(u) (f_u − f̃_u)² : fidelity to the MLE (f̃_u is the MLE-based value, f_u the smoothed one)
- Σ_{(u,v)∈E} w(u,v) (f_u − f_v)² : smoothness of the surface
- w(u): importance of vertices
- w(u,v): weights of edges (1/dist.)
The Optimization Framework
13
• Criteria:
 – Fidelity: keep close to the MLE
 – Surface smoothness: local and global consistency
 – Constraint: ∀d, Σ_w p(w|d) = 1
• Unified optimization objective:
 O(C) = (1 − λ) Σ_{u∈V} w(u) (f_u − f̃_u)²  [fidelity to MLE]
      + λ Σ_{(u,v)∈E} w(u,v) (f_u − f_v)²  [smoothness of the surface]
• Smoothing: find f_u = argmin_f O(C)
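The objective can be written directly in code; a sketch, where the dict-based vertex and edge containers are our own convention, not from the paper:

```python
def objective(f, f_tilde, w_node, edges, lam):
    """O(C) = (1 - lam) * sum_u w(u) * (f_u - f~_u)^2
            + lam * sum_{(u,v) in E} w(u,v) * (f_u - f_v)^2
    f, f_tilde, w_node: dicts keyed by vertex; edges: {(u, v): weight}."""
    fidelity = sum(w_node[u] * (f[u] - f_tilde[u]) ** 2 for u in f)
    smoothness = sum(w_uv * (f[u] - f[v]) ** 2
                     for (u, v), w_uv in edges.items())
    return (1 - lam) * fidelity + lam * smoothness
```

With λ = 0 the optimum is the MLE itself; with λ = 1 any constant surface is optimal, so λ trades fidelity against smoothness.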
The Procedure of Smoothing
14
1. Define the graph: construct a document/word graph; define reasonable w(u) and w(u,v), e.g.
   w(u) = Deg(u) = Σ_{v: (u,v)∈E} w(u,v)
2. Define the surfaces: define reasonable f_u
3. Smooth the surfaces: setting
   ∂O(C)/∂f_u = 2(1 − λ) Deg(u) (f_u − f̃_u) + 2λ Σ_{v∈V} w(u,v) (f_u − f_v) = 0
   gives the iterative update
   f_u ← (1 − λ) f̃_u + λ Σ_{v∈V} [w(u,v)/Deg(u)] f_v
4. Apply additional Dirichlet smoothing
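Step 3 can be implemented as a simple fixed-point iteration; a sketch under the assumption w(u) = Deg(u), using dict-based graph containers of our own and assuming every vertex has at least one edge:

```python
def smooth_surface(f_tilde, edges, lam=0.5, iters=50):
    """Iterate f_u <- (1-lam) * f~_u + lam * sum_v [w(u,v)/Deg(u)] * f_v.
    f_tilde: dict {vertex: MLE value}; edges: {(u, v): weight}, undirected.
    Assumes Deg(u) > 0 for every vertex."""
    # Build a symmetric adjacency map and degrees Deg(u) = sum_v w(u,v).
    nbrs = {u: {} for u in f_tilde}
    for (u, v), w_uv in edges.items():
        nbrs[u][v] = w_uv
        nbrs[v][u] = w_uv
    deg = {u: sum(ws.values()) for u, ws in nbrs.items()}
    f = dict(f_tilde)
    for _ in range(iters):
        f = {u: (1 - lam) * f_tilde[u]
                + lam * sum(w_uv * f[v] for v, w_uv in nbrs[u].items()) / deg[u]
             for u in f_tilde}
    return f
```

Because the neighbor average is a contraction scaled by λ < 1, the iteration converges to the unique minimizer of O(C).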
Smoothing Language Models using a Document Graph
15
1. Define the graph: construct a kNN graph of documents; w(u) = Deg(u), w(u,v) = cosine similarity
2. Define the surfaces: f_u = p(w|d_u), or f_u = s(q, d_u)
3. Smooth the surfaces:
   Document language model:
   P(w|d_u) = (1 − λ) P_MLE(w|d_u) + λ Σ_{v∈V} [w(u,v)/Deg(u)] P(w|d_v)
   Alternative – document relevance score, e.g., (Diaz 05):
   s(q, d_u) = (1 − λ) s̃(q, d_u) + λ Σ_{v∈V} [w(u,v)/Deg(u)] s(q, d_v)
4. Apply additional Dirichlet smoothing
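A one-pass version of the document-graph update can be sketched as follows; an illustrative sketch only: it uses a complete cosine-weighted graph instead of a kNN cut, and omits the Dirichlet step:

```python
import math
from collections import Counter

def cosine(c1, c2):
    """Cosine similarity of two term-count vectors (Counters)."""
    dot = sum(c1[w] * c2[w] for w in c1 if w in c2)
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    return dot / norm if norm else 0.0

def smooth_doc_lms(docs, lam=0.3):
    """One pass of p(w|d_u) <- (1-lam) p_MLE(w|d_u)
                            + lam * sum_v [w(u,v)/Deg(u)] p(w|d_v)."""
    counts = [Counter(d) for d in docs]
    mle = [{w: c / len(d) for w, c in cnt.items()}
           for cnt, d in zip(counts, docs)]
    n = len(docs)
    sim = [[cosine(counts[u], counts[v]) if u != v else 0.0
            for v in range(n)] for u in range(n)]
    deg = [sum(row) for row in sim]
    vocab = {w for d in docs for w in d}
    smoothed = []
    for u in range(n):
        lm = {}
        for w in vocab:
            nbr = (sum(sim[u][v] * mle[v].get(w, 0.0) for v in range(n)) / deg[u]
                   if deg[u] else 0.0)
            lm[w] = (1 - lam) * mle[u].get(w, 0.0) + lam * nbr
        smoothed.append(lm)
    return smoothed
```

Because the neighbor term is a convex combination of proper distributions, each smoothed model still sums to 1.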
Smoothing Language Models using a Word Graph
16
1. Define the graph: construct a kNN graph of words; w(u) = Deg(u), w(u,v) = PMI
2. Define the surfaces: f_u = P(w_u|d)/Deg(u) for a document LM, or f_u = P(w_u|q)/Deg(u) for a query LM
3. Smooth the surfaces:
   Document language model:
   P(w_u|d) = (1 − λ) P_MLE(w_u|d) + λ Σ_{v∈V} [w(u,v)/Deg(v)] P(w_v|d)
   Query language model:
   P(w_u|q) = (1 − λ) P_MLE(w_u|q) + λ Σ_{v∈V} [w(u,v)/Deg(v)] P(w_v|q)
4. Apply additional Dirichlet smoothing
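The word-graph update differs from the document-graph one in its Deg(v) denominator, which follows from the surface definition f_u = P(w_u|d)/Deg(u). A one-pass sketch, with arbitrary assumed edge weights standing in for PMI:

```python
def smooth_lm_on_word_graph(mle, edges, lam=0.3):
    """One pass of p(w_u|d) <- (1-lam) p_MLE(w_u|d)
                            + lam * sum_v [w(u,v)/Deg(v)] p(w_v|d).
    mle: dict {word: probability}; edges: {(u, v): weight}, undirected."""
    nbrs = {u: {} for u in mle}
    for (u, v), w_uv in edges.items():
        nbrs[u][v] = w_uv
        nbrs[v][u] = w_uv
    deg = {u: sum(ws.values()) for u, ws in nbrs.items()}
    return {u: (1 - lam) * p
                + lam * sum(w_uv * mle[v] / deg[v] for v, w_uv in nbrs[u].items())
            for u, p in mle.items()}
```

The Deg(v) denominator makes each word v distribute its mass over its edges, so total probability mass is preserved without renormalization.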
Intuitive Interpretation – Smoothing using a Word Graph
17
P(w_u|d) = (1 − λ) P_ML(w_u|d) + λ Σ_{v∈V} [w(u,v)/Deg(v)] P(w_v|d)
The smoothed LM is the stationary distribution of a Markov chain over words, with transition probabilities P(u → v). Writing a document = a random walk on the word Markov chain: write down w whenever passing w.
Intuitive Interpretation – Smoothing using a Document Graph
P(w|d_u) = (1 − λ) P_ML(w|d_u) · 1 + (1 − λ)(1 − P_ML(w|d_u)) · 0 + λ Σ_{v∈V} [w(u,v)/Deg(u)] P(w|d_v)
The smoothed probability is the absorption probability into the "1" state of a Markov chain over documents, with P(u → 1) = (1 − λ) P_ML(w|d_u), P(u → 0) = (1 − λ)(1 − P_ML(w|d_u)), and P(u → v) = λ w(u,v)/Deg(u). Writing a word w in a document = a random walk on the doc Markov chain: write down w if reaching "1".
Act as neighbors do.
18
Experiments

Data Sets | # docs | Avg doc length | Queries | # relevant docs
AP88-90   | 243k   | 273            | 51-150  | 21829
LA        | 132k   | 290            | 301-400 | 2350
SJMN      | 90k    | 266            | 51-150  | 4881
TREC8     | 528k   | 477            | 401-450 | 4728
19
Methods to evaluate (against Dirichlet smoothing and the methods of Liu and Croft '04 and Tao '06):
• Smooth Document LM on Document Graph (DMDG)
• Smooth Document LM on Word Graph (DMWG)
• Smooth Relevance Score on Document Graph (DSDG)
• Smooth Query LM on Word Graph (QMWG)
• Evaluate using MAP
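MAP, the evaluation measure used in these experiments, follows the standard definition; a sketch with function names of our own:

```python
def average_precision(ranked_ids, relevant):
    """AP for one query: mean of precision@k over ranks k of relevant docs,
    divided by the total number of relevant docs."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```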
Effectiveness of the Framework
20
Data Sets | Dirichlet | DMDG              | DMWG †            | DSDG              | QMWG
AP88-90   | 0.217     | 0.254*** (+17.1%) | 0.252*** (+16.1%) | 0.239*** (+10.1%) | 0.239 (+10.1%)
LA        | 0.247     | 0.258** (+4.5%)   | 0.257** (+4.5%)   | 0.251** (+1.6%)   | 0.247
SJMN      | 0.204     | 0.231*** (+13.2%) | 0.229*** (+12.3%) | 0.225*** (+10.3%) | 0.219 (+7.4%)
TREC8     | 0.257     | 0.271*** (+5.4%)  | 0.271** (+5.4%)   | 0.261 (+1.6%)     | 0.260 (+1.2%)
† DMWG: reranking the top 3000 results; this usually yields lower performance than ranking all the documents.
Wilcoxon test: *, **, *** mean significance levels 0.1, 0.05, 0.01.
Graph-based smoothing >> baseline; smoothing Doc LM >> relevance score >> Query LM.
Comparison with Existing Models
21
Data Sets | CBDM (Liu and Croft) | DELM (Tao et al.) | DMDG  | DMDG (1 iteration)
AP88-90   | 0.233                | 0.250             | 0.254 | 0.252
LA        | 0.259                | 0.265             | 0.260 | 0.258
SJMN      | 0.217                | 0.227             | 0.235 | 0.229
TREC8     | N/A                  | 0.267             | 0.271 | 0.270
Graph-based smoothing > state of the art; more iterations > a single iteration (similar to DELM).
Combined with Pseudo-Feedback
22
Data Sets | FB    | FB+QMWG
AP88-90   | 0.271 | 0.273
LA        | 0.258 | 0.267
SJMN      | 0.245 | 0.246
TREC8     | 0.278 | 0.280

Data Sets | DMWG  | FB    | FB+DMWG
AP88-90   | 0.252 | 0.266 | 0.271 **
LA        | 0.257 | 0.257 | 0.267 **
SJMN      | 0.229 | 0.241 | 0.249 **
TREC8     | 0.271 | 0.278 | 0.292 ***

[Figure: pipeline combining word-graph smoothing with pseudo-feedback – query q and document models θd, feedback model θF estimated from top docs, both smoothed over the word graph w, then the top docs are reranked]
Related Work
• Language modeling in information retrieval; smoothing using the collection model – (Ponte & Croft 98); (Hiemstra & Kraaij 98); (Miller et al. 99); (Zhai & Lafferty 01), etc.
• Smoothing using corpus structures
 – Cluster structure: (Liu & Croft 04), etc.
 – Nearest neighbors: (Kurland & Lee 04), (Tao et al. 06)
• Relevance score propagation: (Diaz 05), (Qin et al. 05)
• Graph-based learning: (Zhu et al. 03); (Zhou et al. 04), etc.
23
Conclusions
• Smoothing language models using document/word graphs
• A general optimization framework with various effective instantiations
• Improved performance over the state of the art
• Future work:
 – Combine document graphs with word graphs
 – Study alternative ways of constructing graphs
24