

EigenRank: A Ranking-Oriented Approach to Collaborative Filtering

IDS Lab. Seminar

Spring 2009

강민석 minsuk@europa.snu.ac.kr

May 21st, 2009

Nathan N. Liu & Qiang Yang

SIGIR 2008

Center for E-Business Technology, Seoul National University, Seoul, Korea


Contents

Introduction

Related Work

Rating Oriented Collaborative Filtering

Ranking Oriented Collaborative Filtering

Experiments

Conclusions


Copyright 2009 by CEBT

Introduction

Recommender Systems

Content-based filtering

Analyzes content information associated with items and users, e.g. product descriptions and user profiles

Represents users and items using a set of features

Collaborative filtering

Does NOT require content information about items

Assumes that a user is interested in items preferred by other, similar users

[Figure: content-based filtering (a shirt described by features such as color (red, blue, black), brand, and size) vs. collaborative filtering (User A and User B, each with Items 1 to 3)]


Introduction

Collaborative Filtering Application Scenarios

Rating prediction

Present one individual item at a time, together with its predicted rating

Top-N recommended items

Present an ordered list of the top-N recommended items

[Figures: rating prediction (MovieLens); top-N list (Amazon)]


Introduction

Motivation

Most CF methods adopt a rating-oriented approach

They predict potential ratings first, then rank items by those ratings

Higher accuracy in rating prediction does NOT necessarily lead to better ranking effectiveness

Example

              Item i   Item j   Squared error
True rating     3        4
Predicted 1     2        5      $(2-3)^2 + (5-4)^2 = 2$
Predicted 2     4        3      $(4-3)^2 + (3-4)^2 = 2$

Both prediction algorithms have the same error, but the ranking implied by "Predicted 2" is incorrect: it places item i above item j

Most existing methods predict ratings without considering the user's preference between pairs of items
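The arithmetic of this example can be checked with a short Python sketch. The dictionary layout and the helper names `squared_error` and `same_order` are illustrative choices, not part of the original slides.

```python
# Ratings from the example table; the dictionary layout is an illustrative choice.
truth = {"i": 3, "j": 4}
predicted_1 = {"i": 2, "j": 5}
predicted_2 = {"i": 4, "j": 3}

def squared_error(pred, actual):
    """Sum of squared differences over all items."""
    return sum((pred[item] - actual[item]) ** 2 for item in actual)

def same_order(pred, actual, a="i", b="j"):
    """True when pred orders items a and b the same way as actual."""
    return (pred[a] - pred[b]) * (actual[a] - actual[b]) > 0

for name, pred in [("Predicted 1", predicted_1), ("Predicted 2", predicted_2)]:
    print(name, squared_error(pred, truth), same_order(pred, truth))
```

Both predictions get a squared error of 2, yet only "Predicted 1" keeps item j above item i.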


Introduction

Overview

Ranking-oriented approach to CF

Directly addresses the item ranking problem

Without the intermediate step of rating prediction

Contribution

Similarity measure for two users' rankings

Kendall rank correlation coefficient

Methods for producing item rankings

Greedy order algorithm, random walk model

[Diagram: the rating-oriented pipeline, rating prediction followed by ranking of items]


Contents

Introduction

Related Work

Neighborhood-based Approach

Model-based Approach

Rating Oriented Collaborative Filtering

Ranking Oriented Collaborative Filtering

Experiments

Conclusions


Neighborhood-based Approach

User-based Model

Estimate the unknown ratings of a target user based on the ratings of neighboring users, using user-user similarity

Difficulties in the User-based Model

Raw ratings may contain biases

E.g., some users tend to give high ratings

Remedy: use user-specific means

User-item rating data is sparse

Remedies: dimensionality reduction, data-smoothing methods

User u   Item     User v
  4      Item A     2
  5      Item B     2
  5      Item C     1
  5      Item D     4
  4      Item E     3
  5      Item F     2
 4.67    Mean      2.33
 0.52    Stdev     1.03
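The mean and standard deviation rows can be reproduced with the Python standard library, and the same means give the user-specific centering mentioned above. This is a minimal sketch: the `center` helper is a hypothetical name, and the sample standard deviation (`statistics.stdev`) is assumed because it matches the figures in the table.

```python
from statistics import mean, stdev

# Ratings of users u and v from the table above.
ratings_u = {"A": 4, "B": 5, "C": 5, "D": 5, "E": 4, "F": 5}
ratings_v = {"A": 2, "B": 2, "C": 1, "D": 4, "E": 3, "F": 2}

def center(ratings):
    """Subtract the user's own mean rating from each rating (bias removal)."""
    mu = mean(list(ratings.values()))
    return {item: r - mu for item, r in ratings.items()}

for name, ratings in [("u", ratings_u), ("v", ratings_v)]:
    values = list(ratings.values())
    print(name, round(mean(values), 2), round(stdev(values), 2))

# After centering, a 4 from the generous user u is below that user's average,
# while a 4 from the strict user v is well above that user's average.
print(center(ratings_u)["A"], center(ratings_v)["D"])
```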


Neighborhood-based Approach

Item-based Model

Similar to the user-based model, but uses item-item similarity

Less sensitive to the sparsity problem

Number of items < number of users

Higher accuracy while allowing more efficient computation (Sarwar et al., 2001)

[Figure: item-based model (Amazon)]


Model-based Approach

Model-based Approach

Use the observed user-item ratings to train a compact model

Predict ratings via the model instead of directly manipulating the data

Algorithms

Clustering methods

Aspect models

Bayesian networks

Learning to Rank

Rank items represented in some feature space

Methods try to

Learn an item scoring function

Learn a classifier for classifying item pairs


Contents

Introduction

Related Work

Rating Oriented Collaborative Filtering

Similarity Measure

Rating Prediction

Ranking Oriented Collaborative Filtering

Experiments

Conclusions


Rating-based Similarity Measures

Pearson Correlation Coefficient (PCC)

Similarity between two users

Normalizes ratings using each user's average rating

Vector Similarity (VS)

Another measure of user-user similarity

View each user as a vector of ratings

Use the cosine of the angle between the two vectors

Item-Item Similarity

Adjusted cosine similarity is the most effective
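The formulas on this slide did not survive extraction, so the following Python sketch uses the textbook definitions of the two user-user measures. The function names and the choice to take the Pearson means over co-rated items only are assumptions; other conventions (for example, means over all of a user's ratings) are also common.

```python
import math

def pearson(ru, rv):
    """Pearson correlation over the items both users rated (means over co-rated items)."""
    common = [i for i in ru if i in rv]
    if len(common) < 2:
        return 0.0
    mu_u = sum(ru[i] for i in common) / len(common)
    mu_v = sum(rv[i] for i in common) / len(common)
    num = sum((ru[i] - mu_u) * (rv[i] - mu_v) for i in common)
    den = (math.sqrt(sum((ru[i] - mu_u) ** 2 for i in common))
           * math.sqrt(sum((rv[i] - mu_v) ** 2 for i in common)))
    return num / den if den else 0.0

def cosine(ru, rv):
    """Cosine of the angle between the two users' rating vectors."""
    common = [i for i in ru if i in rv]
    num = sum(ru[i] * rv[i] for i in common)
    den = (math.sqrt(sum(r * r for r in ru.values()))
           * math.sqrt(sum(r * r for r in rv.values())))
    return num / den if den else 0.0

# Hypothetical users: a and b agree on the ordering, c reverses it.
a = {"A": 2, "B": 3, "C": 4}
b = {"A": 3, "B": 4, "C": 5}
c = {"A": 5, "B": 4, "C": 3}
print(pearson(a, b), pearson(a, c), cosine(a, b))
```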


Rating Prediction

User-based Model

Select the set of k users most similar to the target user

Compute a weighted average of their ratings

Item-based Model

Similar to the user-based model

Uses the set of k items most similar to item i
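A minimal Python sketch of the user-based prediction step follows. The exact formula is missing from the transcript, so this assumes the common mean-centered form: the target user's mean plus a similarity-weighted average of the neighbors' deviations from their own means. The function name, the tuple layout, and the use of absolute similarities in the denominator are illustrative assumptions.

```python
def predict_user_based(target_mean, neighbors, k=2):
    """Predict a rating for one item.

    neighbors: list of (similarity, neighbor_mean, neighbor_rating_for_item).
    Only the k most similar neighbors contribute.
    """
    top = sorted(neighbors, key=lambda n: n[0], reverse=True)[:k]
    denom = sum(abs(sim) for sim, _, _ in top)
    if denom == 0:
        return target_mean  # no usable neighbors: fall back to the user's mean
    weighted = sum(sim * (rating - mean_v) for sim, mean_v, rating in top)
    return target_mean + weighted / denom

# Hypothetical neighbors of a target user whose mean rating is 3.0.
example_neighbors = [(0.9, 4.0, 5.0), (0.5, 2.0, 1.0), (0.1, 3.0, 5.0)]
print(predict_user_based(3.0, example_neighbors, k=2))
```

The item-based variant has the same shape, with the k items most similar to item i taking the place of the neighboring users.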


Contents

Introduction

Related Work

Rating Oriented Collaborative Filtering

Ranking Oriented Collaborative Filtering

Similarity Measure – Kendall Rank Correlation Coefficient

Preference Functions – Greedy Order & Random Walk Model

Experiments

Conclusions


Similarity Measure

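Motivation

PCC and VS are rating-based measures

In the ranking-oriented approach, similarity is determined by users' preferences over items

E.g., for users 1 and 2 the rating values are different, but the preferences are very close

          Item A   Item B   Item C   Ranking
User 1      2        3        4      C > B > A
User 2      3        4        5      C > B > A

Kendall Rank Correlation Coefficient

$\tau = \dfrac{N_{\text{same}} - N_{\text{different}}}{{}_nC_2}$, where $N_{\text{same}}$ is the number of item pairs with the same preference, $N_{\text{different}}$ is the number of item pairs with a different preference, and ${}_nC_2 = \dfrac{n(n-1)}{2}$ is the number of item pairs among $n$ items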
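A short Python sketch of the Kendall rank correlation between two users, using the ratings of users 1 and 2 from the table above. The function name is illustrative, and tied pairs are simply counted as neither the same nor a different preference, which is one possible convention.

```python
from itertools import combinations

def kendall_tau(ru, rv):
    """(same-preference pairs - different-preference pairs) / (n choose 2)."""
    common = sorted(set(ru) & set(rv))
    n = len(common)
    if n < 2:
        return 0.0
    same = different = 0
    for a, b in combinations(common, 2):
        sign = (ru[a] - ru[b]) * (rv[a] - rv[b])
        if sign > 0:
            same += 1
        elif sign < 0:
            different += 1
    return (same - different) / (n * (n - 1) / 2)

user_1 = {"A": 2, "B": 3, "C": 4}
user_2 = {"A": 3, "B": 4, "C": 5}
reversed_user = {"A": 5, "B": 4, "C": 3}  # hypothetical user with the opposite taste
print(kendall_tau(user_1, user_2), kendall_tau(user_1, reversed_user))
```

Users 1 and 2 never give the same rating to an item, yet all three item pairs are ordered the same way, so the coefficient is 1.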


Preference Functions

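Modeling a user's preference function

Given two items i and j, which item is more preferable, and by how much?

$\Psi(i, j) > 0$ means item i is more preferable than item j

$|\Psi(i, j)|$ indicates the strength of the preference

Characteristics

For the same item: $\Psi(i, i) = 0$

Anti-symmetric: $\Psi(i, j) = -\Psi(j, i)$

NOT transitive: $\Psi(i, j) > 0$ and $\Psi(j, k) > 0$ do not imply $\Psi(i, k) > 0$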


Preference Functions

Derive Preference Function

The key challenge is to estimate preferences between items that the target user has NOT rated

Use the same idea as neighborhood-based CF

Find the set of neighbors of the target user who have rated both items
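One way to sketch this in Python is a similarity-weighted average of the neighbors' rating differences, restricted to neighbors who rated both items. The slide's formula is missing from the transcript, so this particular weighting is an assumption, as are the function name and the `(similarity, ratings)` tuple layout.

```python
def preference(i, j, neighbors):
    """Estimate Ψ(i, j) for the target user.

    neighbors: list of (similarity_to_target, ratings_dict).
    Only neighbors who rated both i and j contribute.
    """
    num = den = 0.0
    for sim, ratings in neighbors:
        if i in ratings and j in ratings:
            num += sim * (ratings[i] - ratings[j])
            den += sim
    return num / den if den else 0.0

# Hypothetical neighbors; the third one rated only item "i" and is ignored.
example_neighbors = [
    (0.8, {"i": 5, "j": 3}),
    (0.2, {"i": 2, "j": 4}),
    (0.9, {"i": 4}),
]
print(preference("i", "j", example_neighbors))
print(preference("j", "i", example_neighbors))
```

Because every term flips sign when i and j are swapped, the estimate is anti-symmetric, and it is 0 when both arguments are the same item.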


Preference Functions

Produce Ranking

Given the preference function, we want to obtain a ranking of items

The ranking should agree with the pairwise preferences as much as possible

Ranking ρ: a ranking of the items in item set I

$\rho(i) > \rho(j)$: item i is ranked higher than item j

Value function

Measures how consistent ρ is with the preference function Ψ

Our goal is to find the ranking $\rho^*$ that maximizes the value function

Optimal solution

Finding it is an NP-complete problem, so use a greedy algorithm
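To make the objective concrete, the sketch below assumes the value of a ranking is the sum of Ψ(i, j) over all pairs in which i is ranked above j (the slide's formula is missing from the transcript), and finds the best ranking by brute force. Enumerating all permutations is only feasible for a handful of items, which is why an approximation is needed in practice. All names and the toy preference values are hypothetical.

```python
from itertools import permutations

def value(order, psi):
    """Sum of psi[(i, j)] over all pairs where i is ranked above j.

    order lists items from highest rank to lowest.
    """
    total = 0.0
    for a in range(len(order)):
        for b in range(a + 1, len(order)):
            total += psi.get((order[a], order[b]), 0.0)
    return total

def best_ranking(items, psi):
    """Exhaustive search over all n! orderings; for tiny item sets only."""
    return max(permutations(items), key=lambda order: value(order, psi))

# Toy anti-symmetric preference function over three items.
psi = {("x", "y"): 1.0, ("y", "z"): 2.0, ("x", "z"): 0.5}
psi.update({(j, i): -s for (i, j), s in list(psi.items())})

best = best_ranking(["x", "y", "z"], psi)
print(best, value(best, psi))
```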


Greedy Order Algorithm

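Motivation

Find an approximately optimal ranking

Algorithm

Input: item set I, preference function Ψ

Output: a ranking of the items

Compute a potential value for each item i; it is higher when more items are less preferred than i

Find the item with the highest potential and rank it highest

Remove that item, then iterate over the remaining items

Complexity is $O(n^2)$; the value of the resulting ranking is at least half of the optimal value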
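The three annotated steps (compute potentials, pick the highest, remove and iterate) can be sketched in Python as follows. The pseudocode image is missing from the transcript, so the potential used here, the sum over j of Ψ(i, j) − Ψ(j, i), and the update after each removal are assumptions in the style of the classic greedy ordering algorithm; `make_psi` and the toy values are hypothetical.

```python
def make_psi(pairs):
    """Build an anti-symmetric preference function from {(i, j): strength}."""
    table = {}
    for (i, j), strength in pairs.items():
        table[(i, j)] = strength
        table[(j, i)] = -strength
    return lambda i, j: table.get((i, j), 0.0)

def greedy_order(items, psi):
    """Return items ordered from highest rank to lowest."""
    remaining = list(items)
    # Potential: how strongly item i is preferred over all other items.
    potential = {i: sum(psi(i, j) - psi(j, i) for j in remaining) for i in remaining}
    ranking = []
    while remaining:
        top = max(remaining, key=lambda i: potential[i])  # find highest ranked item
        ranking.append(top)
        remaining.remove(top)                             # remove it, then iterate
        for i in remaining:
            # Drop the removed item's contribution from the remaining potentials.
            potential[i] += psi(top, i) - psi(i, top)
    return ranking

psi = make_psi({("x", "y"): 1.0, ("y", "z"): 2.0, ("x", "z"): 0.5})
print(greedy_order(["z", "y", "x"], psi))
```

Each of the n rounds does O(n) work, which gives the quadratic running time stated on the slide.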


Random Walk Model for Item Ranking

Random Walk Based on User Preferences

Motivation

Some users rated i > j and others rated j > k, but only a few rated all three items i, j, k

We want to infer the preference between i and k (implicit relationships)

Use multi-step random walks

Markov chain model

"At each step the system may change its state from the current state to another state according to a probability distribution. The changes of state are called transitions …" (Wikipedia)

Google PageRank

Random walk on Web pages based on hyperlinks: a surfer randomly picks a hyperlink to follow

The stationary distribution is used as the PageRank score

[Diagram: page → page, connected by a link]

Model for item ranking

Similarly, there are implicit links between items: a less preferred item j links to a more preferred item i

The links are weighted by transition probabilities

The stationary distribution is used to rank items

[Diagram: item → item, connected by a preference]


Random Walk Model for Item Ranking

Random Walk Based on User Preferences

Transition probability

Probability of switching from the current item i to another item j

Higher for items that are more preferred than i

Depends on the user's preference function

Why the exp function? It keeps the transition probabilities non-negative
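The formula itself is missing from the transcript. A form consistent with the bullets above is a softmax over preferences, p(j | i) proportional to exp(Ψ(j, i)); the Python sketch below assumes that form, and the function names and toy preference values are hypothetical.

```python
import math

def transition_matrix(items, psi):
    """P[a][b] = probability of moving from items[a] to items[b].

    Assumes p(j | i) is proportional to exp(psi(j, i)), so items that are
    more preferred than the current one receive more probability mass.
    """
    P = []
    for i in items:
        weights = [math.exp(psi(j, i)) for j in items]  # exp keeps weights positive
        total = sum(weights)
        P.append([w / total for w in weights])
    return P

# Toy anti-symmetric preference function: x > y > z.
table = {("x", "y"): 1.0, ("y", "z"): 2.0, ("x", "z"): 0.5}
table.update({(j, i): -s for (i, j), s in list(table.items())})

def psi(i, j):
    return table.get((i, j), 0.0)

items = ["x", "y", "z"]
P = transition_matrix(items, psi)
for row in P:
    print([round(p, 3) for p in row])
```

Every row is a probability distribution, and because exp is strictly positive, no transition has zero probability even when the preference is strongly negative.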


Random Walk Model for Item Ranking

Compute the Item Rankings

Recall the PageRank algorithm

We can use matrix notation

P: transition matrix

Each entry of P is a transition probability

Consider the probability of being at item i after t walking steps

Define the vector of these probabilities over all items

Stationary probabilities

Obtain these probabilities with the power iteration method for computing the principal eigenvector

Does it work?

Existence and uniqueness of the stationary distribution are guaranteed if P is irreducible and the entries of P are all non-negative
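Power iteration can be sketched in a few lines of Python: start from a uniform distribution and repeatedly apply the transition matrix until the distribution stops changing. The function name, tolerance, and the 2-state example matrix are illustrative assumptions; rows of `P` are assumed to sum to 1.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration for a row-stochastic matrix P (list of lists)."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One walking step: new_pi[j] = sum_i pi[i] * P[i][j]
        new_pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    return pi

# Hypothetical 2-item chain; its stationary distribution is (5/6, 1/6).
example_P = [[0.9, 0.1],
             [0.5, 0.5]]
example_pi = stationary_distribution(example_P)
print(example_pi)
```

Items would then be ranked by their stationary probability, highest first.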


Random Walk Model for Item Ranking

Personalization Vector (teleport)

Used to avoid reducibility of the stochastic matrix (Brin and Page, 1998)

Revised transition matrix

PageRank

The Web surfer sometimes "teleports" to other pages

Teleportation follows the probability distribution defined by the personalization vector v

ε controls how often the surfer teleports rather than following hyperlinks

Our model

Uses a similar idea to define the personalization vector

Teleport to items with high ratings more often

Unrated items have equal probabilities
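The revised transition matrix did not survive extraction. The Python sketch below assumes the usual PageRank-style mixture, in which each row becomes ε times the original row plus (1 − ε) times the personalization vector v, so that ε weights the preference-driven transitions and 1 − ε is the teleport probability. That convention, the function name, and the example values are assumptions.

```python
def add_teleport(P, v, eps):
    """Mix a row-stochastic matrix P with teleportation to distribution v.

    Assumed form: P_revised[i][j] = eps * P[i][j] + (1 - eps) * v[j].
    """
    n = len(P)
    return [[eps * P[i][j] + (1.0 - eps) * v[j] for j in range(n)]
            for i in range(n)]

# A reducible chain: each state only loops back to itself.
reducible_P = [[1.0, 0.0],
               [0.0, 1.0]]
v = [0.75, 0.25]  # hypothetical personalization vector favoring the first item
revised_P = add_teleport(reducible_P, v, eps=0.85)
print(revised_P)
```

When every entry of v is positive and ε < 1, every entry of the revised matrix is positive, so the chain is irreducible and has a unique stationary distribution.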


Contents

Introduction

Related Work

Rating Oriented Collaborative Filtering

Ranking Oriented Collaborative Filtering

Experiments

Conclusions


Experiments

Issues

1. Is the ranking-oriented approach better than the rating-oriented approach?

2. Which is better: the greedy order algorithm or the random walk model?

3. Is the ranking-oriented similarity measure (Kendall's) more effective?

[Diagram: the three issues mapped onto the compared settings. Similarity: Pearson's / vector similarity vs. Kendall's rank similarity. Approach: rating-oriented (user-based / item-based) vs. ranking-oriented (greedy order / random walk).]


Experiments

Data Sets

Two movie rating data sets: EachMovie and Netflix

Selected users who rated more than 40 different movies

10,000 users for training

100 users for parameter tuning

500 users for testing

Evaluation Protocol

For each user in the test set,

50% of the ratings are used for model construction

50% are held out for evaluation

               EachMovie          Netflix
# of ratings   2.8 M → ?          100 M → ?
# of users     72,000 → 10,600    480,000 → 10,600
# of movies    1,628              18,000 → 2,000
Rating scale   1 to 6             1 to 5
Density        6.1 %              6.6 %


Evaluation Metric

Which metric to use?

Rating-oriented CF

MAE (Mean Absolute Error) and RMSE (Root Mean Square Error)

Focus on the difference between the true rating and the predicted rating

Ranking-oriented CF

Our emphasis is on improving item rankings

NDCG (Normalized Discounted Cumulative Gain)

Evaluated over the top-k items of the ranked list

Discounting factor: increases with the position in the ranking
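The NDCG formula is missing from the transcript. The Python sketch below assumes a widely used variant: the gain of an item with true rating r is 2^r − 1, the discount at position p is log2(1 + p), and the sum over the top k positions is normalized by the value of the ideal ordering. The function names and example ratings are illustrative.

```python
import math

def dcg_at_k(ratings_in_ranked_order, k):
    """Discounted cumulative gain over the top-k positions (positions start at 1)."""
    return sum((2 ** r - 1) / math.log2(1 + p)
               for p, r in enumerate(ratings_in_ranked_order[:k], start=1))

def ndcg_at_k(ratings_in_ranked_order, k):
    """DCG normalized by the DCG of the ideal (descending-rating) ordering."""
    ideal = dcg_at_k(sorted(ratings_in_ranked_order, reverse=True), k)
    return dcg_at_k(ratings_in_ranked_order, k) / ideal if ideal else 0.0

# True ratings of the items, listed in the order a recommender ranked them.
perfect = [5, 4, 3]
reversed_order = [3, 4, 5]
print(ndcg_at_k(perfect, 3), ndcg_at_k(reversed_order, 1))
```

Because the discount grows with position, putting a highly rated item near the top contributes much more than putting it further down the list.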


Impact of Parameters

Impact of Neighborhood Size

The size of the neighborhood affects performance

Result

As the neighborhood size increases, NDCG increases until the size reaches 100, because with more neighbors the preference function becomes more accurate

Beyond 100, NDCG starts to decrease because many dissimilar users are included


Impact of Parameters

Impact of ε

How does the frequency of the "teleport" operation affect performance?

Result

As ε increases, NDCG increases

However, ε should NOT be too large (0.8~0.9)


Comparisons with Other Algorithms

Issues

1. Is the ranking-oriented approach better than the rating-oriented approach?

2. Which is better: the greedy order algorithm or the random walk model?

3. Is the ranking-oriented similarity measure (Kendall's) more effective?

Comparison

4 rating-oriented settings, 6 ranking-oriented settings

                         PCC      VS      KRCC
Rating    User           UPCC     UVS
          Item           IPCC     IVS
Ranking   Greedy         GOPCC    GOVS    GOKRCC
          Random Walk    RWPCC    RWVS    RWKRCC


Comparisons with Other Algorithms

Result

The ranking-oriented approach outperforms the rating-oriented approach by about 8.8% in NDCG1

The random walk model outperformed all the rating-oriented settings

The random walk model is slightly better than the greedy order algorithm

The Kendall rank correlation coefficient is more effective for the ranking-oriented approach


Conclusion

Ranking-oriented Framework for CF

Item ranking without rating prediction as an intermediate step

Extends neighborhood-based CF by identifying preferences

Two methods for computing item rankings

Greedy order algorithm

Random walk model

[Diagram: similarity measure (Kendall rank correlation coefficient) → preference function → greedy order / random walk model]


Thank you~