combinatorial hodge theory and information processing 姚 远 2010.3.25

44
Combinatorial Hodge Theory and Information Processing 2010.3.25

Upload: blanche-richard

Post on 17-Jan-2016

257 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Combinatorial Hodge Theory and Information Processing

姚 远2010.3.25

Page 2: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Hodge Decomposition

Page 3: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Visual Image Patches• Ann Lee, Kim Pedersen, David Mumford (2003) studies statistical

properties of 3x3 high contrast image patches of natural images (from Van Heteran’s database)

• Gunnar Carlsson, Vin de Silva, Tigran Ishkhanov, Afra Zomorodian (2004-present) found those image patches concentrate around a 2-dimensional klein bottle imbedded in 7-sphere

• They build up simplicial complex from point cloud data• 1-D Harmonic flows actually focus on densest region -- 3 major

circles

Page 4: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

1-D Harmonic Flows on the space of 3x3 Image Patches

• 左上图: Klein Bottle of 3x3 Image Patch Space (Courtesy of Carlsson-Ishkhanov, 2007)

• 左下图: Harmonic flows focus on 3 major circles where most of data concentrate

Page 5: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Ranking

Psychology: L. L. Thurstone (1928) (scaling)

Statistics: M. Kendall (1930s, rank corellation), F. Mosteller, Bradley-Terry,…, P. Diaconis (group theory), etc.

Economics: K. Arrow, A. Sen (voting and social choice theory, Nobel Laureates)

Computer Science: Google’s PageRank, HITS, etc.

We shall focus on ranking in the sequel as the construction of abstract simplicial complex is more natural and easier than point cloud data.

Page 6: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Had William Hodge met Maurice Kendall

Hodge Theory

Pairwise ranking

The Bridge of Sighs in Cambridge,

St John’s College

Page 7: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Ranking on Networks

• “Multicriteria” ranking/decision systems– Amazon or Netflix’s recommendation system (user-product)– Interest ranking in social networks (person-interest)– S&P index (time-price)– Voting (voter-candidate)

• “Peer-review” systems– Publication citation systems (paper-paper)– Google’s webpage ranking (web-web)– eBay’s reputation system (customer-customer)

Page 8: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Ranking on Networks

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

• Multicriteria

mv1 mv2 mv3

usr1 1 - -

usr2 2 5 -

usr3 - - 4

usr4 3 2 5

• Peer-Review

Page 9: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Characteristics• Aforementioned ranking data are often

– Incomplete: typically about 1%

– Imbalanced: heterogeneously distributed votes

– Cardinal: given in terms of scores or stochastic choices

• Pairwise ranking on graphs: implicitly or explicitly, ranking data may be viewed to live on a simple graph G=(V,E), where– V: set of alternatives (webpages, products, etc.) to be

ranked– E: pairs of alternatives to be compared

Page 10: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Pagerank

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

• Model assumption: – A Markov chain random walk on

networks, subject to the link structure

• Algorithm [Brin-Page’98]– Choose Link matrix L, where L(i,j)=#

links from i to j.– Markov matrix M=D-1 L, where D = eT

L, e is the all-one vector. – Random Surfer model: E is all-one

matrix– PageRank model: P = c M + (1-c) E/n,

where c = 0.85 chosen by Google.– Pagerank vector: the primary

eigenvector v0 such that PT v0 = v0

Note: SVD decomposition of L gives HITS [Kleinberg’99] algorithm.

Problem: Can we drop Markov Chain model assumption?

Page 11: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Netflix Customer-Product Rating

Page 12: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Netflix example continued

• The first order statistics, mean score for each product, is often inadequate because of the following:

– most customers would rate just a very small portion of the products

– different products might have different raters, whence mean scores involve noise due to arbitrary individual rating scales (right figure)

How about high order statistics?

Page 13: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

From 1st order to 2nd order: Pairwise Ranking

Page 14: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Pairwise Ranking Continued

Page 15: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Another View on Pagerank

Define pairwise ranking:

wij = logPijPji

Where P is the Pagerank Markov matrix.

Claim: if P is a reversible Markov chain, i.e.

π iPij = π jPji

Then

wij = logπ i − logπ j

Page 16: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Skew-Symmetric Matrices of Pairwise Ranking

Page 17: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Pairwise Ranking of Top 10 IMDB Movies

• Pairwise ranking graph flow among top 10 IMDB movies

Page 18: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Web Link among Chinese Universities

2002, http://cybermetrics.wlv.ac.uk/database/stats/data

Link structure correlated with Research Ranking?

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

159

53

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

3418 17

12

Page 19: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Classical Ordinal Rank Aggregation Problem

Page 20: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Rank Aggregation Problem

Page 21: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Answer: Not Always!

Page 22: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Triangular Transitivity

Page 23: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Hodge Decomposition: Matrix Theoretic

Page 24: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Hodge Decomposition: Graph Theoretic

Page 25: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Hodge Theory: Geometric Analysis on Graphs

(and Complexes)

图 ( 复形 ) 上的几何分析

Page 26: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Clique Complex of a Graph

Page 27: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Discrete Differential Forms

Page 28: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Discrete Exterior Derivatives: coboundary maps

Page 29: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Curl (旋度 ) and Divergence (散度 )

Page 30: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Basic Alg. Top.: Boundary of Boundary is Empty

Page 31: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

High Dim. Combinatorial Laplacians

Page 32: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Hodge Decomposition Theorem

Page 33: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Hodge Decomposition Theorem: Illustration

Page 34: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Harmonic rankings: locally consistent but globally inconsistent

Page 35: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Rank Aggregation as Projection

Page 36: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Don Saari’s Geometric Illustration of Different Projections

Page 37: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Measuring Inconsistency by Curls

• Define the cyclicity ratio by

Cp :=proj

im(curl* )w

w

⎜ ⎜

⎟ ⎟

2

≤1

This measures the total inconsistency within the data and model w.

Page 38: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Choose 6 Netflix Movies with Dynamic Drifts

Page 39: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Model Selection by Cyclicity Ratio

MRQE: Movie-Review-Query-Engine (http://www.mrqe.com/)

Page 40: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Just for fun: Chinese Universities (mainland) Ranking

Rank Research 2002

Pagerank HITS Authority

HITS Hub

Pku.edu.cn 1 2 2 1

Tsinghua.edu.cn

1 1 1 6

Fudan.edu.cn

3 7 50 21

DATA: 2002, http://cybermetrics.wlv.ac.uk/database/stats/data

Page 41: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Further Application I: Scanpath via Eyetracker

with Yizhou Wang and Wei Wang (PKU) et al.

Page 42: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Further Application II: Click-Pattern Analysis

queryID Search Results Click Pattern

Frequency

q1 [["u1","u2","u3"],["u4","u5","u6"]] [[],[2]] 1

q1 [["u1","u2","u3"],["u7","u6","u8"]] [[2],[]] 1

q1 [["u1","u2","u9"],["u10","u11","u7","u8","u4"]] [[],[4]] 1

q1 [["u12","u2","u9","u3","u13","u14"]] [[5]] 1

q1 [["u12","u2","u9"]] [[1]] 1

Pairwise Comparison discloses that ‘url1’ is less preferred to ‘url13’, in a contrast to the frequency count.

with Hang LI (MSR-A) et al.

Page 43: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Acknowledgement• Collaborators:

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

林力行 (Lim Lek-Heng)UC Berkeley

姜筱晔Stanford ICME

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

叶阴宇Stanford MS&E

Many Others: Steve Smale, Gunnar Carlsson, Hang Li, Yizhou Wang, Art Owen, Persi Diaconis, Don Saari, et al.

Jiang-Lim-Yao-Ye, 2009, arxiv.org/abs/0811.1067, Math. Prog. Accepted.

Page 44: Combinatorial Hodge Theory and Information Processing 姚 远 2010.3.25

Reference• [Friedman1996] J. Friedman, Computing Betti Numbers via Combinatorial Laplacians.

Proc. 28th ACM Symposium on Theory of Computing, 1996. http://www.math.ubc.ca/~jf/pubs/web_stuff/newbetti.html

• [Forman] R. Forman, Combinatorial Differential Topology and Geometry, New perspective in Algebraic Combinatorics, MSRI publications, http://math.rice.edu/~forman/combdiff.ps

• [David1988] David, H.A. (1988). The Method of Paired Comparisons. New York: Oxford University Press.

• [Saari1995] D. Saari, Basic Geometry of Voting, Springer, 1995

44