fast pair-wise and node-wise algorithms for commute times and katz scores

48
FAST MATRIX COMPUTATIONS FOR PAIRWISE AND COLUMN-WISE KATZ SCORES AND COMMUTE TIMES David F. Gleich Purdue University Emory University Math/CS Seminar January 20, 2012 With Pooya Esfandiar, Francesco Bonchi, Chen Greif, Laks V. S. Lakshmanan, and Byung-Won On David F. Gleich (Purdue) Emory Math/CS Seminar 1 / 47

Upload: david-gleich

Post on 15-Jan-2015

1.424 views

Category:

Education


0 download

DESCRIPTION

Slides from a seminar I gave at Emory University on 2012-01-20

TRANSCRIPT

Page 1: Fast pair-wise and node-wise algorithms for commute times and Katz scores

FAST MATRIX COMPUTATIONS FOR

PAIRWISE AND COLUMN-WISE KATZ

SCORES AND COMMUTE TIMES

David F. Gleich

Purdue University

Emory University

Math/CS Seminar

January 20, 2012

With Pooya Esfandiar, Francesco Bonchi, Chen Greif,

Laks V. S. Lakshmanan, and Byung-Won On

David F. Gleich (Purdue) Emory Math/CS Seminar 1 / 47

Page 2: Fast pair-wise and node-wise algorithms for commute times and Katz scores

MAIN RESULTS

A – adjacency matrix

L – Laplacian matrix

Katz score :

Commute time:

For Commute

Compute one   fast

Estimate  

David F. Gleich (Purdue) Emory Math/CS Seminar

For Katz Compute one       fast Compute top       fast

2 of 47

Page 3: Fast pair-wise and node-wise algorithms for commute times and Katz scores

OUTLINE

Why study these measures?

Katz Rank and Commute Time

How else do people compute them?

Quadrature rules for pairwise scores

Sparse linear systems solves for top-k

As many results as we have time for…

David F. Gleich (Purdue) Emory Math/CS Seminar 3 of 47

Page 4: Fast pair-wise and node-wise algorithms for commute times and Katz scores

WHY? LINK PREDICTION

David F. Gleich (Purdue) Emory Math/CS Seminar

Liben-Nowell and Kleinberg 2003, 2006 found that path based link prediction was more efficient

Neighborhood based

Path based

4 of 47

Page 5: Fast pair-wise and node-wise algorithms for commute times and Katz scores

NOTE

All graphs are undirected

All graphs are connected

David F. Gleich (Purdue) Emory Math/CS Seminar 5 of 47

Page 6: Fast pair-wise and node-wise algorithms for commute times and Katz scores

LEO KATZ

David F. Gleich (Purdue) Emory Math/CS Seminar 6 of 47

Page 7: Fast pair-wise and node-wise algorithms for commute times and Katz scores

NOT QUITE, WIKIPEDIA

    : adjacency,                     : random walk

PageRank                          

Katz                          

These are equivalent if     has constant degree

David F. Gleich (Purdue) Emory Math/CS Seminar 7 of 47

Page 8: Fast pair-wise and node-wise algorithms for commute times and Katz scores

WHAT KATZ ACTUALLY SAID

Leo Katz 1953, A New Status Index Derived from Sociometric Analysis, Psychometria 18(1):39-43

“we assume that each link independently has the

same probability of being effective” …

“we conceive a constant     , depending

on the group and the context of the particular

investigation, which has the force of a probability

of effectiveness of a single link. A k-step chain

then, has probability       of being effective.”

“We wish to find the column sums of the matrix”

David F. Gleich (Purdue) Emory Math/CS Seminar 8 of 47

Page 9: Fast pair-wise and node-wise algorithms for commute times and Katz scores

A MODERN TAKE

The Katz score (node-based) is

                   

The Katz score (edge-based) is

                   

David F. Gleich (Purdue) Emory Math/CS Seminar 9 of 47

Page 10: Fast pair-wise and node-wise algorithms for commute times and Katz scores

RETURNING TO THE MATRIX

                     

                                                     

Carl Neumann                

                                    David F. Gleich (Purdue) Emory Math/CS Seminar 10 of 47

Page 11: Fast pair-wise and node-wise algorithms for commute times and Katz scores

Carl Neumann

I’ve heard the Neumann series called the “von Neumann”

series more than I’d like! In fact, the von Neumann kernel

of a graph should be named the “Neumann” kernel!

David F. Gleich (Purdue) Emory Math/CS Seminar

Wikipedia page

11 / 47

Page 12: Fast pair-wise and node-wise algorithms for commute times and Katz scores

PROPERTIES OF KATZ’S MATRIX

  is symmetric

  exists when  

  is spd when  

Note that   1/max-degree suffices

David F. Gleich (Purdue) Emory Math/CS Seminar 12 of 47

Page 13: Fast pair-wise and node-wise algorithms for commute times and Katz scores

COMMUTE TIME

David F. Gleich (Purdue) Emory Math/CS Seminar

Picture taken from Google images, seems to be

Bay Bridge Traffic by Jim M. Goldstein.

13 / 47

Page 14: Fast pair-wise and node-wise algorithms for commute times and Katz scores

COMMUTE TIME

Consider a uniform random walk on a graph                    

                                                       

                                               

David F. Gleich (Purdue) Emory Math/CS Seminar

Also called the hitting

time from node i to j, or

the first transition time

14 of 47

Page 15: Fast pair-wise and node-wise algorithms for commute times and Katz scores

SKIPPING DETAILS

  : graph Laplacian

  is the only null-vector

David F. Gleich (Purdue) Emory Math/CS Seminar 15 of 47

Page 16: Fast pair-wise and node-wise algorithms for commute times and Katz scores

WHAT DO OTHER PEOPLE DO?

1) Just work with the linear algebra formulations

2) For Katz, Truncate the Neumann series as a few (3-5) terms, or MMQ (Benzi & Boito)

3) Use low-rank approximations from EVD(A) or EVD(L)

4) For commute, use Johnson-Lindenstrauss inspired random sampling

5) Approximately decompose into smaller problems

David F. Gleich (Purdue) Emory Math/CS Seminar

Liben-Nowell and Kleinberg CIKM2003, Acar et al. ICDM2009,

Spielman and Srivastava STOC2008, Sarkar and Moore UAI2007, Wang et al. ICDM2007 16 of 47

Page 17: Fast pair-wise and node-wise algorithms for commute times and Katz scores

THE PROBLEM

All of these techniques are

preprocessing based because

most people’s goal is to compute

all the scores.

We want to avoid

preprocessing the graph.

David F. Gleich (Purdue) Emory Math/CS Seminar

There are a few caveats here! i.e. one could solve the system instead of looking for the matrix inverse

17 of 47

Page 18: Fast pair-wise and node-wise algorithms for commute times and Katz scores

WHY NO PREPROCESSING?

The graph is constantly changing

as I rate new movies.

David F. Gleich (Purdue) Emory Math/CS Seminar 18 of 47

Page 19: Fast pair-wise and node-wise algorithms for commute times and Katz scores

WHY NO PREPROCESSING?

David F. Gleich (Purdue) Emory Math/CS Seminar

Top-k predicted “links”

are movies to watch!

Pairwise scores give

user similarity

19 of 47

Page 20: Fast pair-wise and node-wise algorithms for commute times and Katz scores

PAIR-WISE ALGORITHMS

David F. Gleich (Purdue) Emory Math/CS Seminar 20 / 47

Page 21: Fast pair-wise and node-wise algorithms for commute times and Katz scores

PAIRWISE ALGORITHMS

Katz  

Commute  

David F. Gleich (Purdue) Emory Math/CS Seminar

Golub and Meurant

to the rescue!

21 of 47

Page 22: Fast pair-wise and node-wise algorithms for commute times and Katz scores

MMQ - THE BIG IDEA

Quadratic form    

Weighted sum    

Stieltjes integral    

Quadrature approximation    

Matrix equation   David F. Gleich (Purdue) Emory Math/CS Seminar

Think  

A is s.p.d. use EVD

“A tautology”

Lanczos

22 of 47

Page 23: Fast pair-wise and node-wise algorithms for commute times and Katz scores

LANCZOS

  , k-steps of the Lanczos method produce

  and  

David F. Gleich (Purdue) Emory Math/CS Seminar

                    =

             

23 of 47

Page 24: Fast pair-wise and node-wise algorithms for commute times and Katz scores

PRACTICAL LANCZOS

Only need to store the last 2 vectors in  

Updating requires O(matvec) work

  is not orthogonal

David F. Gleich (Purdue) Emory Math/CS Seminar 24 of 47

Page 25: Fast pair-wise and node-wise algorithms for commute times and Katz scores

MMQ PROCEDURE

Goal  

Given  

1. Run k-steps of Lanczos on   starting with  

2. Compute   ,   with an additional eigenvalue at   ,

set   3. Compute   ,   with an additional eigenvalue at   , set

  4. Output   as lower and upper bounds on  

David F. Gleich (Purdue) Emory Math/CS Seminar

Correspond to a Gauss-Radau rule, with

u as a prescribed node

Correspond to a Gauss-Radau rule, with

l as a prescribed node

25 of 47

Page 26: Fast pair-wise and node-wise algorithms for commute times and Katz scores

PRACTICAL MMQ

Increase k to become more accurate

Bad eigenvalue bounds yield worse results

  and   are easy to compute

  not required, we can iteratively

update it’s LU factorization

David F. Gleich (Purdue) Emory Math/CS Seminar 26 of 47

Page 27: Fast pair-wise and node-wise algorithms for commute times and Katz scores

PRACTICAL MMQ

David F. Gleich (Purdue) Emory Math/CS Seminar 27 of 47

Page 28: Fast pair-wise and node-wise algorithms for commute times and Katz scores

ONE LAST STEP FOR KATZ

Katz  

David F. Gleich (Purdue) Emory Math/CS Seminar 28 of 47

Page 29: Fast pair-wise and node-wise algorithms for commute times and Katz scores

COLUMN-WISE

ALGORITHMS

David F. Gleich (Purdue) Emory Math/CS Seminar 29 / 47

Page 30: Fast pair-wise and node-wise algorithms for commute times and Katz scores

COLUMN-WISE COMMUTE-TIME

  requires entire diagonal of  

David F. Gleich (Purdue) Emory Math/CS Seminar

Paige and Saunders, 1975

                   

             

Each vector   is computed by the a Lanczos based CG algorithm.

30 of 47

  =    

Page 31: Fast pair-wise and node-wise algorithms for commute times and Katz scores

COLUMN-WISE COMMUTE TIME

David F. Gleich (Purdue) Emory Math/CS Seminar

  following Hein et al. 2010 is a MUCH better and faster approximation.

31

Page 32: Fast pair-wise and node-wise algorithms for commute times and Katz scores

KATZ SCORES ARE LOCALIZED

David F. Gleich (Purdue) Emory Math/CS Seminar

Up to 50 neighbors is 99.65% of the total mass

32 of 47

Page 33: Fast pair-wise and node-wise algorithms for commute times and Katz scores

PARTICIPATION RATIOS

David F. Gleich (Purdue) Emory Math/CS Seminar

Participation

Ratios

“effective non-zeros” in a vector

33 of 47

Page 34: Fast pair-wise and node-wise algorithms for commute times and Katz scores

TOP-K ALGORITHM FOR KATZ

Approximate  

where   is sparse

Keep   sparse too

Ideally, don’t “touch” all of  

David F. Gleich (Purdue) Emory Math/CS Seminar 34 of 47

Page 35: Fast pair-wise and node-wise algorithms for commute times and Katz scores

INSPIRATION - PAGERANK

Approximate  

where   is sparse

Keep   sparse too? YES!

Ideally, don’t “touch” all of   ? YES!

David F. Gleich (Purdue) Emory Math/CS Seminar

McSherry WWW2005, Berkhin 2007, Anderson et al. FOCS2008 – Thanks to Reid Anderson for telling me McSherry did this too.

35 of 47

Page 36: Fast pair-wise and node-wise algorithms for commute times and Katz scores

THE ALGORITHM - MCSHERRY

For  

Start with the Richardson iteration

Rewrite

Richardson converges if  

David F. Gleich (Purdue) Emory Math/CS Seminar 36 of 47

Page 37: Fast pair-wise and node-wise algorithms for commute times and Katz scores

THE ALGORITHM

Note   is sparse.

If   , then   is sparse.

Idea

only add one component of   to  

David F. Gleich (Purdue) Emory Math/CS Seminar 37 of 47

Page 38: Fast pair-wise and node-wise algorithms for commute times and Katz scores

THE ALGORITHM

For  

Init:  

How to pick   ?

David F. Gleich (Purdue) Emory Math/CS Seminar 38 of 47

Page 39: Fast pair-wise and node-wise algorithms for commute times and Katz scores

THE ALGORITHM FOR KATZ

For  

Init:  

Pick   as max   David F. Gleich (Purdue) Emory Math/CS Seminar

Storing the non-zeros of the residual in a heap makes picking the max log(n) time. See Anderson et al. FOCS2008 for more

39 of 47

Page 40: Fast pair-wise and node-wise algorithms for commute times and Katz scores

CONVERGENCE?

If   1/max-degree then   is sub-stochastic and the PageRank based proof applies because the matrix is diagonally dominant

For   , then for symmetric   , this algorithm is the Gauss-Southwell procedure and it still converges.

David F. Gleich (Purdue) Emory Math/CS Seminar

40 of 47

Page 41: Fast pair-wise and node-wise algorithms for commute times and Katz scores

NONSYMMETRIC ALGORITHM?

Was this algorithm known in the non-symmetric case?

YES! It’s just Gauss-Seidel/Gauss-Southwell, but with a column vs. row orientation.

Called dynamic residual to AMGers

David F. Gleich (Purdue) Emory Math/CS Seminar 41 of 47

Page 42: Fast pair-wise and node-wise algorithms for commute times and Katz scores

RESULTS – DATA, PARAMETERS

All unweighted, connected graphs

Easy                                    

Hard                            

David F. Gleich (Purdue) Emory Math/CS Seminar 42 of 47

Page 43: Fast pair-wise and node-wise algorithms for commute times and Katz scores

KATZ BOUND CONVERGENCE

David F. Gleich (Purdue) Emory Math/CS Seminar 43 of 47

Page 44: Fast pair-wise and node-wise algorithms for commute times and Katz scores

COMMUTE BOUND CONVERG.

David F. Gleich (Purdue) Emory Math/CS Seminar 44 of 47

Page 45: Fast pair-wise and node-wise algorithms for commute times and Katz scores

KATZ SET CONVERGENCE

David F. Gleich (Purdue) Emory Math/CS Seminar

For arXiv graph.

45 of 47

Page 46: Fast pair-wise and node-wise algorithms for commute times and Katz scores

TIMING

David F. Gleich (Purdue) Emory Math/CS Seminar 46 of 47

Page 47: Fast pair-wise and node-wise algorithms for commute times and Katz scores

CONCLUSIONS

These algorithms are faster than many alternatives.

For pairwise commute, stopping criteria are simpler with bounds

For top-k problems, we often need less than 1 matvec for good enough results

David F. Gleich (Purdue) Emory Math/CS Seminar 47 of 47

Page 48: Fast pair-wise and node-wise algorithms for commute times and Katz scores

By AngryDogDesign on DeviantArt

Paper at WAW2010 and J. Internet Mathematics

Slides online

Code is online already

www.cs.purdue.edu/

homes/dgleich/codes/fast-katz-2011

David F. Gleich (Purdue) Emory Math/CS Seminar 48