Summer Seminar Presentation 4xp


TRANSCRIPT

Page 1: Summer Seminar Presentation 4xp

Summer Seminar 2008 @Suzukakedai (http://umekoumeda.net/)

Badrul Sarwar, "Item-Based Collaborative Filtering Recommendation Algorithms", WWW 2001

Deguchi Lab.

Takashi UMEDA

Mail: umeda07[at]cs.dis.titech.ac.jp

Web: http://umekoumeda.net/

Page 2:

Outline…

• Introduction

• Item-Based CF

• Experimental Procedure

• Experimental Result

• Conclusions

Page 3:

INTRODUCTION

Chap.1

Page 4:

1-1. My Research Domain

• Evaluating recommendation algorithms by ABM (agent-based modeling)

– Recommendation approaches:

• Rule-based approach

• Content-based approach

• Collaborative Filtering (CF)

• Bayesian network

– Why CF?

• It is the approach most widely used on real websites

– Why ABM?

• With ABM, algorithms can be optimized for the market environment

Page 5:

1-2. What's CF? (1/3)

• Have you ever used Amazon.com?

Page 6:

1-3. What's CF? (2/3)

Recommendation

Collaborative Filtering (CF) algorithms are commonly used on e-commerce websites.

Page 7:

1-4. What's CF? (3/3)

[Figure: the book lists of Prof. Kizima and Prof. Deguchi]

CF will recommend the following book to Prof. Deguchi, based on people who are similar to him: they own the same books, so they likely have similar preferences.

Page 8:

1-5. Contribution of This Paper

• Problems of the basic CF algorithm

– Basic CF : Nearest Neighbors

– Scalability (performance)

• High scalability: even with many users, the system can produce recommendations quickly

– Accuracy (quality)

• High accuracy: even when the data are sparse, the system can recommend items a user is likely to enjoy

• In this paper, the authors propose a new algorithm

– Item-Based CF

– Both performance and quality can be improved

Page 9:

1-6. Collaborative Filtering Process

User-item matrix (input data):

      i1    i2   ..   in
u1          a1,2
u2
u3
:
um

• U = {u1, u2, .., um} : the set of users

• I = {i1, i2, .., in} : the set of items

• Iui : the set of items that user ui has rated, Iui ⊆ I

• ai,j : the rating of item ij by user ui

CF-Algorithm

Output interface:

• Prediction Pa,j : the predicted degree to which user ua will like item ij, where ij ∉ Iua

• Top-N recommendation : a list of the N items the user will like the most (Ir ⊂ I), with Ir ∩ Iua = ∅
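The input/output structures above can be sketched in Python. This is a minimal illustration, assuming a hypothetical dict-of-dicts layout (not from the paper): the user-item matrix stores the ratings a_u,i, and the candidate set for a top-N recommendation is exactly the items the user has not rated yet (Ir ∩ Iua = ∅).

```python
# Sketch of the CF input/output structures; all names are illustrative.
# ratings[u][i] = a_u,i; a missing key means "not rated".
ratings = {
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i2": 4, "i3": 2},
    "u3": {"i1": 1, "i3": 5},
}
items = {"i1", "i2", "i3"}  # the item set I

def unrated_items(user):
    """Candidate items for a top-N recommendation: Ir ∩ Iu = ∅,
    i.e. only items the user has not rated yet may be recommended."""
    return items - set(ratings[user])

print(sorted(unrated_items("u1")))  # → ['i3']
```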

Page 10:

1-7. Variation of the CF-Algorithm

Memory-based approach:

• Procedure (Nearest Neighbor)

1. The system defines a set of users known as neighbors, on-line

2. The system produces a prediction or a top-N recommendation

Model-based approach:

• Procedure

1. The system develops a model of user ratings, off-line

2. Using the model, the system produces a prediction or a top-N recommendation

• How is the model developed?

• Bayesian network

• Clustering

Page 11:

1-8. What's on-line and off-line?

• Off-line computation: at a suitable interval, off-line computation is performed automatically.

• On-line computation: when a user uses the system, on-line computation is performed quickly.

Example: Google

• Off-line: crawling, indexing, ranking

• On-line: when you input a query, the search engine outputs the results.

Page 12:

1-9. The Problems of the Basic CF

Weaknesses of the Nearest Neighbor algorithm:

• Accuracy: the user-item matrix is sparse; many users may have purchased well under 1% of all items, so the accuracy of the Nearest Neighbor algorithm may be poor.

• Scalability: with millions of users and items, the Nearest Neighbor algorithm may suffer serious scalability problems.

We need new CF algorithms…

Page 13:

ITEM-BASED CF

Chap.2

Page 14:

2-1. Overview of Item-Based CF

Off-line computation (item similarity computation):

• Si,j : the similarity between items ii and ij, computed from the ratings Ru,i in the user-item matrix

On-line computation (prediction computation):

• Pu,i : the predicted degree to which user u will like item i, based on the item-item similarities S

Page 15:

2-2. Item Similarity Computation

• Cosine-based similarity

• Correlation-based similarity

• Adjusted cosine similarity

– Corrects for the difference in rating scale between different users
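The adjusted cosine similarity listed above can be sketched as follows. This is a minimal illustration with a hypothetical dict-of-dicts rating layout: the similarity is computed over co-rating users, with each rating offset by that user's mean rating to correct for rating-scale differences.

```python
import math

# ratings[user][item] = rating; a missing key means "not rated".
# The data here is illustrative, not from the paper.
ratings = {
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i1": 4, "i2": 2, "i3": 4},
    "u3": {"i2": 1, "i3": 5},
}

def adjusted_cosine(item_a, item_b):
    """Similarity over co-rating users, each rating offset by that
    user's mean rating (corrects for rating-scale differences)."""
    co_raters = [u for u, r in ratings.items() if item_a in r and item_b in r]
    num = den_a = den_b = 0.0
    for u in co_raters:
        mean_u = sum(ratings[u].values()) / len(ratings[u])
        da = ratings[u][item_a] - mean_u
        db = ratings[u][item_b] - mean_u
        num += da * db
        den_a += da * da
        den_b += db * db
    if den_a == 0.0 or den_b == 0.0:
        return 0.0  # no overlap or zero variance: treat as "no similarity"
    return num / (math.sqrt(den_a) * math.sqrt(den_b))
```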

Page 16:

2-3. Prediction Computation

• Weighted sum

– Pu,i = Σn∈N (Si,n × Ru,n) / Σn∈N |Si,n|, where the denominator is the normalization coefficient

– N : the set of items most similar to item i; |N| is the neighborhood size

• Regression

– Ru,n is estimated by a regression model

– Ri : the target item's ratings (explanatory variable)

– Rn : the similar item's ratings (response variable)
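The weighted-sum variant above can be sketched in a few lines. The data layout is hypothetical; `sims` stands in for the precomputed similarities between the target item and its neighborhood N.

```python
# Sketch of the weighted-sum prediction; names are illustrative.
def predict_weighted_sum(user_ratings, sims):
    """P_u,i = sum_n(S_i,n * R_u,n) / sum_n(|S_i,n|), summed over the
    neighbor items n that the user has actually rated."""
    num = sum(s * user_ratings[n] for n, s in sims.items() if n in user_ratings)
    den = sum(abs(s) for n, s in sims.items() if n in user_ratings)
    return num / den if den else 0.0

# The user rated two neighbors of the target item with 4 and 2.
print(predict_weighted_sum({"i2": 4, "i3": 2}, {"i2": 0.5, "i3": 0.5}))  # → 3.0
```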

Page 17:

2-4. Time Complexity (1/2)

Nearest Neighbor performs all computation on-line:

• User similarity computation: computing one user-user similarity requires scanning n ratings → O(n); the system must compute m × m user-user similarities → O(m²n) in total

• Prediction computation: computing one Pi,j value requires scanning m user-user similarities → O(m)

Time complexity of Nearest Neighbor: O(m²n) + O(m)

Page 18:

2-4. Time Complexity (2/2)

• Off-line computation (item similarity computation): item-item similarity is static, as opposed to user-user similarity, so the item similarities can be precomputed (= the model)

• On-line computation (prediction computation): computing one Pi,j value requires scanning n item similarities → O(n)

On-line time complexity of Item-Based CF: O(n), so it performs better than Nearest Neighbor.

Page 19:

EXPERIMENTAL PROCEDURE

Chap.3

Page 20:

3-1. Experimental Procedure

1. Data dividing: the data set is divided into a train portion and a test portion.

   user  item  rating
   u1    i2    3
   u2    i1    2
   :
   u6    i3    3

2. Parameter learning: fix the optimal values of the following parameters:

• similarity algorithm

• train/test ratio (x), which sets the sparsity level of the data

• neighborhood size

3. Full experiment: to evaluate Item-Based CF, the following values are measured:

• performance

• quality
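Step 1 above, splitting the ratings by the train/test ratio x, might be sketched like this. The shuffle and fixed seed are illustrative assumptions, not the paper's exact procedure.

```python
import random

# Illustrative (user, item, rating) records, echoing the table above.
data = [("u1", "i2", 3), ("u2", "i1", 2), ("u6", "i3", 3), ("u3", "i1", 4)]

def split(records, x, seed=0):
    """Place a fraction x of the rating records in the train set and the
    remainder in the test set."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * x)
    return shuffled[:cut], shuffled[cut:]

train, test = split(data, x=0.8)  # 3 records for training, 1 for testing
```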

Page 21:

3-2. Data Sets

• Data from the website "MovieLens"

• MovieLens is a web-based recommender system

• Hundreds of users visit MovieLens to rate movies and to receive recommendations

• The data set was converted into a user-item matrix (943 users × 1682 items)

Page 22:

3-3. Evaluation Metrics

• To evaluate the quality of a recommender system, MAE is used as the evaluation metric.

• MAE (Mean Absolute Error) = (1/N) Σi |pi − qi|

– pi : the predicted rating for item i (predicted from the train data)

– qi : the true rating for item i (from the test data)

– The lower the MAE, the more accurately the recommendation engine predicts user ratings.
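The MAE definition above transcribes directly into code; the sample ratings are illustrative.

```python
def mae(predicted, actual):
    """Mean Absolute Error: (1/N) * sum of |p_i - q_i| over the test set."""
    assert len(predicted) == len(actual)
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(predicted)

# Three predictions p_i against three true ratings q_i.
print(mae([3.5, 2.0, 4.0], [4, 2, 5]))  # → 0.5
```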

Page 23:

EXPERIMENTAL RESULTS

Chap.4

Page 24:

4-1. Optimal Values of the Parameters (1/2)

• Item-similarity algorithm: adjusted cosine gives the best quality.

• Train/test ratio: x = 0.8 is the optimum value.

Page 25:

4-1. Optimal Values of the Parameters (2/2)

Considering both trends, the optimal choice of neighborhood size is 30.

In the full experiment, the basic parameters are therefore as follows:

• similarity algorithm: adjusted cosine

• train/test ratio: 0.8

• neighborhood size: 30

Page 26:

4-2. Quality

• Item-Based CF (weighted sum) outperforms the nearest-neighbor algorithm.

• Item-Based CF (regression) outperforms the other two cases at low values of x and at small neighborhood sizes.

Page 27:

4-3. Performance (1/2)

• Model size:

– Full model: at item similarity computation, all item-item similarities (1682 × 1682) are computed.

– Model size = 200: only the similarities among 200 items (200 × 200) are computed.

• If the model size is small, is good quality maintained (as with other model-based approaches)?

– If so, on-line performance is higher than in the full-model case.

• Result: with a model size of 100 to 200, it is possible to obtain reasonably good prediction quality.

Even without using all item-item similarities, prediction accuracy does not drop, and performance improves.
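One way to retain only a small subset of the item-item similarities, in the spirit of the model-size idea above, is to keep the k best neighbors per item. The table and names in this sketch are illustrative, not the paper's implementation.

```python
def truncate_model(similarities, k):
    """similarities[i] maps each neighbor item to its similarity score;
    keep the k highest-scoring neighbors for every item i."""
    return {
        i: dict(sorted(sims.items(), key=lambda kv: kv[1], reverse=True)[:k])
        for i, sims in similarities.items()
    }

# Hypothetical similarity table, truncated to the 2 best neighbors per item.
model = truncate_model({"i1": {"i2": 0.9, "i3": 0.4, "i4": 0.7}}, k=2)
print(model)  # → {'i1': {'i2': 0.9, 'i4': 0.7}}
```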

Page 28:

CONCLUSIONS

Chap.5

Page 29:

5. Conclusion

• Quality

– Item-Based CF provides better quality of predictions than nearest-neighbor algorithms, independent of the neighborhood size and the train/test ratio.

– The improvement in quality is not large, however.

• Performance

– The item similarity computation can be precomputed, because item-item similarity is static.

– This yields high on-line performance.

– It is possible to retain only a small subset of items and still produce good prediction quality and high performance.

Page 30:

THANK YOU