
Using Personal-Characteristic and Friend-Ranking in Blog Search

D93944003 趙國成
D97921018 陳信宏

2009/1/9

Outline

Scope
Problem
Solution
Evaluation
Conclusion & Future Work


Scope

The search targets are at the document level, i.e., the entries of a feed.

The search targets are text only (no photos, movies, audio, etc.).


Special features of blogs

Each article belongs to a specific category.
Each article belongs to a member, who has personal characteristics such as interests.
Members may have friends, and these relations form a social network.

Problem

How can we use this information to improve search effectiveness?

Category
Personal characteristic
Friend relation


Solution

Query = Keyword + Category

Weighting of personal characteristic
How many articles has the member posted?
Does his interest fall into the queried category?

Weighting of friends
Are his friends also interested in the queried category?

More Precise Definitions

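Personal ranking of member m for the queried category c:

$$R_{psn}(m, c) = \frac{N(m, c)}{N(m)} \times \frac{N(m)}{N_{AVG}}$$

N(m, c): number of articles member m has posted in category c
N(m): total number of articles member m has posted
N_AVG: average number of articles per member

Friend ranking of member m, where F is the set of friends of m:

$$R_{fnd}(m, c) = \frac{\sum_{f \in F} R_{psn}(f, c)}{|F|}$$

Final Ranking

$$R(d) = R_{doc}(d) \times \left[1 + \alpha R_{psn}(m, c)\right] \times \left[1 + \beta R_{fnd}(m, c)\right]$$

R_doc(d): result of the LM
R_psn(m, c): personal ranking
R_fnd(m, c): friend ranking
m: member (the author of d)
c: category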
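The ranking can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the formulas are read as R_psn(m, c) = N(m, c)/N(m) × N(m)/N_AVG, R_fnd(m, c) as the mean of R_psn(f, c) over the friends of m, and R(d) = R_doc(d) × [1 + α R_psn] × [1 + β R_fnd]. The data layout (`posts`, `friends`) and all function names are hypothetical, and the LM score `r_doc` is taken as given.

```python
def average_posts(posts):
    """N_AVG: average number of articles per member (assumed definition)."""
    if not posts:
        return 0.0
    return sum(len(cats) for cats in posts.values()) / len(posts)


def personal_rank(member, category, posts):
    """R_psn(m, c) = N(m, c) / N(m) * N(m) / N_AVG.

    posts maps a member ID to the list of categories of the articles
    that member has posted (hypothetical layout).
    """
    n_m = len(posts.get(member, []))
    n_avg = average_posts(posts)
    if n_m == 0 or n_avg == 0:
        return 0.0
    n_mc = sum(1 for c in posts[member] if c == category)
    return (n_mc / n_m) * (n_m / n_avg)


def friend_rank(member, category, posts, friends):
    """R_fnd(m, c): mean of R_psn(f, c) over the friends f of m."""
    fs = friends.get(member, set())
    if not fs:
        return 0.0
    return sum(personal_rank(f, category, posts) for f in fs) / len(fs)


def final_rank(r_doc, member, category, posts, friends, alpha, beta):
    """R(d) = R_doc(d) * [1 + alpha * R_psn(m, c)] * [1 + beta * R_fnd(m, c)]."""
    r_psn = personal_rank(member, category, posts)
    r_fnd = friend_rank(member, category, posts, friends)
    return r_doc * (1 + alpha * r_psn) * (1 + beta * r_fnd)


# Toy data, made up for illustration only.
posts = {
    "a": ["travel", "travel", "food"],
    "b": ["travel"],
    "c": ["food", "food"],
}
friends = {"a": {"b", "c"}}
score = final_rank(2.0, "a", "travel", posts, friends, alpha=0.5, beta=0.5)
```

With α = β = 0 the score reduces to the pure LM result, so the two weights control how far the author and friend signals can move a document.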

Implementation Steps

1. Define categories

2. Crawl pages from blog sites by each category

3. Generate the language model (LM) of the documents in each category.

4. Generate the member-page mapping.

5. Generate the member-friend mapping.

Define Categories

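We empirically define 13 categories.
We hope the categories are mutually independent.

1. Creative writing (創作)
2. Travel (旅遊)
3. Food (美食)
4. Health care (醫療保健)
5. Sports (運動)
6. Film & TV (影視)
7. Lifestyle & leisure (生活休閒)
8. Science & technology (科學科技)
9. Anime, comics & video games (動漫電玩)
10. Learning (學習)
11. Finance (財經)
12. Society, politics & economy (社會政經)
13. Other (其它)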

Crawl pages from blog sites by each category

Many blog websites let visitors browse by category, but not all of them do.
We crawl the pages from the websites that provide this function and use them as the training documents.
For the other documents, we use a text classification algorithm to decide their categories.
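The slides do not name the classification algorithm. As one possible reading, the sketch below trains a unigram language model per category with add-one smoothing and assigns a document to the category whose model gives it the highest likelihood. The function names are hypothetical, and documents are assumed to be already tokenized (Chinese text would need word segmentation first).

```python
import math
from collections import Counter


def train(docs_by_category):
    """Build per-category token counts from {category: [token lists]}."""
    counts = {
        category: Counter(token for doc in docs for token in doc)
        for category, docs in docs_by_category.items()
    }
    vocab = set()
    for counter in counts.values():
        vocab.update(counter)
    return counts, vocab


def log_likelihood(tokens, counter, vocab_size):
    """Log-probability of tokens under an add-one smoothed unigram model."""
    total = sum(counter.values())
    return sum(
        math.log((counter[t] + 1) / (total + vocab_size)) for t in tokens
    )


def classify(tokens, counts, vocab):
    """Return the category whose unigram model best explains the tokens."""
    vocab_size = len(vocab) + 1  # one extra slot for unseen tokens
    return max(
        counts, key=lambda c: log_likelihood(tokens, counts[c], vocab_size)
    )


# Toy training data, made up for illustration only.
training = {
    "travel": [["beach", "hotel", "flight"], ["hotel", "tour"]],
    "food": [["noodle", "soup"], ["soup", "restaurant"]],
}
counts, vocab = train(training)
label = classify(["hotel", "beach"], counts, vocab)
```

Reusing the per-category LM for classification keeps the pipeline to a single model type, though any standard text classifier could fill this step.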

Generate the member-page mapping

On almost all blog websites, the URL of each page contains the member ID.

http://www.wretch.cc/blog/ddedogtoootw/9759034
http://blog.udn.com/wong2006/2547710
http://tw.myblog.yahoo.com/jun681031-bear/article?mid=5556

Generate the member-page mapping

We can easily find the pattern rules and extract the member ID from the URL.

Page URLs:
http://www.wretch.cc/blog/ddedogtoootw/9759034
http://www.wretch.cc/blog/minyang0925/20688505
http://www.wretch.cc/blog/greezydebut/7175512

Member blog URLs:
http://www.wretch.cc/blog/ddedogtoootw/
http://www.wretch.cc/blog/minyang0925/
http://www.wretch.cc/blog/greezydebut/

Member IDs:
ddedogtoootw
minyang0925
greezydebut
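A minimal sketch of such extraction rules, assuming one path pattern per site inferred from the example URLs in these slides. The `RULES` table and the `member_id` function are hypothetical names, and real sites may need more patterns than shown here.

```python
import re
from urllib.parse import urlparse

# One path rule per host, inferred from the example URLs (an assumption).
RULES = {
    "www.wretch.cc": re.compile(r"^/blog/([^/]+)"),
    "blog.udn.com": re.compile(r"^/([^/]+)"),
    "tw.myblog.yahoo.com": re.compile(r"^/([^/]+)"),
}


def member_id(url):
    """Return the member ID embedded in a blog page URL, or None."""
    parts = urlparse(url)
    rule = RULES.get(parts.netloc)
    if rule is None:
        return None  # unknown site: no rule defined
    match = rule.match(parts.path)
    return match.group(1) if match else None


ids = [
    member_id("http://www.wretch.cc/blog/ddedogtoootw/9759034"),
    member_id("http://blog.udn.com/wong2006/2547710"),
    member_id("http://tw.myblog.yahoo.com/jun681031-bear/article?mid=5556"),
]
```

Grouping page URLs by the returned ID gives the member-page mapping directly.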

Generate the member-friend mapping

What is the definition of a friend?
My friend?
Somebody who set me as his friend?
Somebody who has visited my blog?
Somebody who has commended my blog?
Somebody who has left messages for me?
……

Which definition is suitable for each blog website?

Generate the member-friend mapping

Our definition: somebody whose page URLs occur in my articles.
This relation is usually caused by a “reply”.

Source:
http://www.wretch.cc/blog/illyqueen/12364112

URLs occurring in the source:
…
http://www.wretch.cc/blog/love6380/20856457
http://www.wretch.cc/blog/oeoehaha/5943390
http://www.wretch.cc/blog/parfaite/15050239
…
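This definition can be sketched as a scan of each article body for blog page URLs. The example assumes wretch.cc URLs only, treats the text between `/blog/` and the next slash as the member ID, and uses a hypothetical input layout of (page URL, body text) pairs. The function name is made up for illustration.

```python
import re
from collections import defaultdict

# Matches a wretch.cc blog URL and captures the member ID (assumed pattern).
WRETCH_URL = re.compile(r"http://www\.wretch\.cc/blog/([^/\s\"'<>]+)")


def build_friend_map(articles):
    """Map each member to the set of members whose page URLs occur
    in that member's articles.

    articles: iterable of (page_url, body_text) pairs (hypothetical layout).
    """
    friends = defaultdict(set)
    for page_url, body in articles:
        owner = WRETCH_URL.match(page_url)
        if owner is None:
            continue  # not a wretch.cc page: no rule for this site
        me = owner.group(1)
        for hit in WRETCH_URL.finditer(body):
            other = hit.group(1)
            if other != me:  # links to one's own pages are not friends
                friends[me].add(other)
    return dict(friends)


articles = [
    (
        "http://www.wretch.cc/blog/illyqueen/12364112",
        "http://www.wretch.cc/blog/love6380/20856457 "
        "http://www.wretch.cc/blog/oeoehaha/5943390 "
        "http://www.wretch.cc/blog/parfaite/15050239",
    ),
]
friend_map = build_friend_map(articles)
```

The relation built this way is directed: a member who links to another member's pages gains a friend, but not the other way around.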

Conclusion of Solution

For each article, we know its category and author.

For each member (author), we know all the articles he has posted and his friends.

Hence we can calculate R(d).

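$$R(d) = R_{doc}(d) \times \left[1 + \alpha R_{psn}(m, c)\right] \times \left[1 + \beta R_{fnd}(m, c)\right]$$

$$R_{psn}(m, c) = \frac{N(m, c)}{N(m)} \times \frac{N(m)}{N_{AVG}}$$

$$R_{fnd}(m, c) = \frac{\sum_{f \in F} R_{psn}(f, c)}{|F|}$$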


Evaluation

How do we decide whether a document is relevant? By feedback from users.

Comparison:
R_doc (pure LM)
R (LM + R_psn + R_fnd)

What are the effects of α and β?

$$R(d) = R_{doc}(d) \times \left[1 + \alpha R_{psn}(m, c)\right] \times \left[1 + \beta R_{fnd}(m, c)\right]$$
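One way such a comparison could be run is sketched below. The slides do not state an evaluation metric, so precision at k is an assumption here, as are the function names, the tuple layout of `docs`, and the toy scores. Setting α = β = 0 reproduces the pure LM ranking, which serves as the baseline.

```python
def rerank(docs, alpha, beta):
    """Order document IDs by R(d) = R_doc * (1 + alpha*R_psn) * (1 + beta*R_fnd).

    docs: list of (doc_id, r_doc, r_psn, r_fnd) tuples (hypothetical layout).
    """
    scored = [
        (r_doc * (1 + alpha * r_psn) * (1 + beta * r_fnd), doc_id)
        for doc_id, r_doc, r_psn, r_fnd in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that users marked as relevant."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    return sum(1 for d in top if d in relevant_ids) / len(top)


# Toy scores and feedback, made up for illustration only.
docs = [
    ("d1", 0.9, 0.0, 0.0),
    ("d2", 0.8, 1.0, 0.5),
    ("d3", 0.5, 0.2, 0.0),
]
relevant = {"d2"}  # documents users judged relevant

baseline = precision_at_k(rerank(docs, 0.0, 0.0), relevant, k=1)
combined = precision_at_k(rerank(docs, 0.5, 0.5), relevant, k=1)
```

Repeating the last two lines over a grid of α and β values would show how sensitive the result is to each weight.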


Conclusion

We use the following information to improve search effectiveness:

Category
Personal characteristic
Friend relation

We will compare search effectiveness with and without our method.

Future work

What if we consider feeds instead of entries?
Are there better definitions of personal characteristic and friend?
Is there a better equation for R(R_doc, R_psn, R_fnd)?

Thank you

We appreciate your suggestions!