项亮 推荐系统实践 从入门到精通

Post on 24-Jun-2015

1.564 Views

Category:

Technology

4 Downloads

Preview:

Click to see full reader

DESCRIPTION

Recommend System 推荐系统实践 从入门到精通

TRANSCRIPT

Recommender System:Algorithms & Architecture

xiangliang@hulu.com

Outline

• Problem• Data• Algorithms• Cold start• Architecture

Recommender System

Problem

Recommend items to users to make user, content partner, websites happy!

Data

• User behaviors dataBehavior User SizePage view All user Very LargeWatch video All user LargeFavorite Register user MiddleVote Register user MiddleAdd to playlist Register user SmallFacebook like Register user SmallShare Register user SmallReview Register user Small

Data

• Which data is most important– Main behavior in the

website– All user can have such

behavior– Cost– Reflect user interests on

items

Behavior User Size

Page view All user Very Large

Watch video All user Large

Favorite Register user Middle

Vote Register user Middle

Add to playlist Register user Small

Facebook like Register user Small

Share Register user Small

Review Register user Small

Data

• Data Structure– User ID– Item ID– Behavior Type– Behavior Content– Context

• Timestamp• Location• Mood

Sheldon watch Star Trek with his friends at home

AlgorithmsRecommender System Method

Collaborative Filtering

Content Filtering

Social Filtering

Graph-based Latent Factor Model

Neighborhood-based

User-based Item-based

……

……

……

……

Neighborhood-based

• User-based– Digg

• Item-based– Amazon, Netflix, YouTube, Hulu, …

User-based

• Algorithm– For user u, find a set of users S(u) have similar

preference as u.– Recommend popular items among users in S(u)

to user u.

User-based CF

( , ) ( )ui uv vi

v S u K N ip w r

∈ ∩

= ∑

( ) ( )w

( ) ( )uv

N u N vN u N v

∩=

Item-based

• Algorithm– For user u, get items set N(u) this user like

before.– Recommend items which are similar to many

items in N(u) to user u.

Item-based CF

( , ) ( )ui ji uj

j S i K N up w r

∈ ∩

= ∑

( ) ( )w

( ) ( )ij

N i N jN i N j

∩=

Item-based CF

( ) ( )w

( )ij

N i N jN i∩

=Why not use ?

Neighborhood-based

• User-based vs. Item-basedUser-based Item-based

Scalability Bad when user size is large

Bad when item size islarge

Explanation Bad Good

Novelty Bad Good

Coverage Bad Good

Cold start Bad for new users Bad for new items

Performance Need to get many users history

Only need to get current user’s history

References

• Amazon.com Recommendations item-to-item Collaborative Filtering.

• Empirical Analysis of Predictive Algorithms for Collaborative Filtering.

Graph-based

• Users’ behaviors on items can be represented by bi-part graph.

A

B

C

D

1

2

3

4

A

B

C

D

1

2

3

4

A

B

C

D

1

2

3

4

A

B

C

D

1

2

3

4

Graph-based

• Two nodes will have high relevance if– There are many paths in graph between two

nodes.– Most of paths between two nodes is short.– Most paths do not go through nodes with high

out-degree.

Graph-based

• Advantage– Heterogeneous data

• Multiple user behaviors• Social Network• Context (Time, Location)

• Disadvantage– Statistical-based– High cost for long path

A

B

C

D

1

2

3

4

References

• A Graph-based Recommender System for Digital Library.

• Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation.

Latent Factor Model

• Users and items are connect by latent features.

A

B

C

D

1

2

3

4

a

b

c

Latent Factor Model

Science Fiction

Universe

Physical

Space Travel

0.5

0.9

0.8

0.8

Animation 0.3

Romance 0.0

Science Fiction

Universe

Physical

Space Travel

0.9

0.9

0.5

0.7

Animation 0.1

Romance 0.0

ui uk ikk

r p q=∑

Latent Factor Model

• How to get p, q?

2 22u

( , )min ( ) ( )ui uk ik i

u i kr p q p qλ− + +∑ ∑

( )( )

uk ui ik uk

ik ui uk ik

p e q pq e p q

α λα λ

+ = −+ = −

Latent Factor Model

• How to define – Rating prediction– Top-N recommendation

• Implicit feedback data: only have positive samples and missing values, how to select negative samples?

uir

1 (Sci-fi) 2 (Crime) 3 (Family) 4 (Horror)

The invisible Man Jaws 101 Dalmatians The Blair Witch Project

Frankenstein Meets the Wolf

ManLethal Weapon Back to the

Future Pacific Heights

Godzilla Total Recall Groundhog Day Stir of Echoes

Star Wars VI Reservoir Dogs Tarzan Dead Calm

The Terminator Donnie Brasco The Aristocats Phantasm

Alien The Fugitive The Jungle Book 2 Sleepy Hollow

Alien 2 La shou Shen tan Antz The Faculty

Latent Factor Model

Latent Factor Model

• Advantage– High accuracy in rating prediction– Auto group items– Scalability is good– Learning-based

• Disadvantage– Incremental updating– Real-time– Explanation

Cold Start

• Problems– User cold start : new users– Item cold start : new items– System cold start : new systems

User Cold Start

• How to recommend items to new users?– Non-personalization recommendation

• Most popular items• Highly Rated items

– Using user register profile (Age, Gender, …)

User Cold Start

• Example: Gender and TV shows

Data comes from IMDB : http://www.imdb.com/title/tt0412142/ratings

User Cold Start

MaleAge : 20-30Theoretical physicistDoctorAmericanIrreligious

How to get user interest quickly

• When new user comes, his feedback on what items can help us better understand his interest?– Not very popular– Can represent a group of items– Users who like this item have different

preference with users who dislike this item

Item Cold Start

• How to recommend new items to user?– Do not recommend

How to recommend news??

Item Cold Start

• How to recommend new items to user?– Using content information

Machine Learning Data Mining Recommendation

System Cold Start

• How to design recommender system when there is no user?– Pandora : Music Genome Project– Jinni : Movie Genome Project

Architecture

• Feature-based recommendation framework:

A

B

C

D

1

2

3

4

a

b

c

User Feature Item

Male

Scientist

Physics

Architecture

Architecture

• Advantage:– Heterogeneous data– Reasonable Explanation

• Disadvantage:– Do not support user-based methods

Open Questions

• How to weight multiple behaviors?• How to improve diversity, novelty?• How to build feedback loop?

Thanks!

top related