Tag-based Social Interest Discovery

By yjhuang, 2008.5

Yahoo! Inc. researchers Xin Li, Lei Guo, Yihong (Eric) Zhao



These slides belong to their original authors and are used here for explanation only. The source is given at the end.


Outline

Introduction
Data Set
Analysis of Tags
The Architecture
Evaluation


Introduction

Social network systems
  Del.icio.us, Facebook, MySpace, YouTube
Discovering social interests
  Main challenge: difficult to detect and represent
  Existing approaches: online connections


This paper’s work

Based on user-generated tags
Analyze the real-world traces of tags and web content
Develop the Internet Social Interest Discovery system (ISID)
  Discover the common user interests
  Cluster users and URLs by topics
Evaluation


Data Set

Delicious bookmarks
  4.3M bookmarks, 0.2M users, 1.4M URLs


Data Collection and Pre-Processing

Crawl the URLs and download the pages
Discard all non-HTML objects
Convert the encoding to UTF-8 and remove non-English pages
Apply a stopword list
Apply the Porter stemming algorithm
Result: 298,350 distinct tags, 4,072,265 keywords
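The pre-processing steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the stopword list, the ASCII-ratio language filter, the regex tag stripping, and all function names are hypothetical and do not come from the slides. Stemming is left pluggable; a Porter implementation such as `nltk.stem.PorterStemmer().stem` could be passed as `stem`.

```python
import re
from typing import Callable, List, Optional

# Tiny illustrative stopword list; the slides do not say which list was used.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "in", "to", "is", "for", "on"})


def is_html(content_type: str) -> bool:
    """Keep only HTML pages; every other object type is discarded."""
    return content_type.split(";")[0].strip().lower() == "text/html"


def is_mostly_ascii(text: str, threshold: float = 0.9) -> bool:
    """Crude stand-in for an English-language filter (an assumption)."""
    if not text:
        return False
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    return ascii_chars / len(text) >= threshold


def extract_keywords(text: str, stem: Callable[[str], str] = lambda w: w) -> List[str]:
    """Lowercase, tokenize, drop stopwords, then stem each remaining token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(tok) for tok in tokens if tok not in STOPWORDS]


def preprocess(raw: bytes, content_type: str,
               stem: Callable[[str], str] = lambda w: w) -> Optional[List[str]]:
    """Return the keyword list of a page, or None if the page is discarded."""
    if not is_html(content_type):
        return None
    text = raw.decode("utf-8", errors="replace")  # normalize to UTF-8 text
    text = re.sub(r"<[^>]+>", " ", text)          # naive markup removal
    if not is_mostly_ascii(text):
        return None
    return extract_keywords(text, stem)
```

A real crawler would use a proper HTML parser and language detector; the point here is only the order of the steps.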


Users, URLs and Tags

Figure 1: Distribution of the frequencies that the URLs were bookmarked in our data set (log-log scale)


Users, URLs and Tags

Figure 2: Distribution of the bookmarking activity (log-log scale)


Users, URLs and Tags

Figure 3: Distribution of tag frequencies


Analysis of Tags

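Use the vector space model (VSM)
Each URL is represented by two vectors
  One in the space of all tags, one in the space of document keywords
A corpus with t terms and d documents
  A term-document matrix $A = (a_{ij}) \in \mathbb{R}^{t \times d}$, where $a_{ij}$ is the weight of term i in document j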


Weight Measurements

Tf-based

Tf-Idf based
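The formulas for the two weightings did not survive in this transcript. As a hedged illustration, the sketch below uses the common textbook definitions (relative term frequency, and term frequency times log inverse document frequency); the exact variants used by the authors may differ, and the function names are made up for this example.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_weights(doc: List[str]) -> Dict[str, float]:
    """Relative term frequency of each term within one document."""
    counts = Counter(doc)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def tfidf_weights(docs: List[List[str]]) -> List[Dict[str, float]]:
    """tf * log(N / df) for every term of every document (textbook variant)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = tf_weights(doc)
        weighted.append({t: w * math.log(n_docs / df[t]) for t, w in tf.items()})
    return weighted
```

With this variant, a term that occurs in every document gets tf-idf weight 0, which is why tf-idf favors terms that distinguish a page from the rest of the corpus.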


An Example of Tags vs. Keywords

A URL bookmarked by users
  About resolv.conf in Linux
The table shows the top 10 keywords


The Vocabulary of Tags

Compare the vocabulary of tags with that of keywords in web documents
  Are the most important words covered?
Figure 4 (5): The coverage of user-generated tags for the tf (tf-idf) keywords of 7000 random docs
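One way to read the coverage measurement above is as the fraction of a document's top-weighted keywords that also occur in the tag vocabulary. The sketch below follows that reading; it is an assumption, not the authors' definition, and the names `top_k_keywords` and `tag_coverage` are hypothetical.

```python
from typing import Dict, List, Set


def top_k_keywords(weights: Dict[str, float], k: int) -> List[str]:
    """The k keywords with the largest weight (tf or tf-idf)."""
    return sorted(weights, key=lambda term: weights[term], reverse=True)[:k]


def tag_coverage(top_keywords: List[str], tag_vocabulary: Set[str]) -> float:
    """Fraction of the given keywords that also appear as tags."""
    if not top_keywords:
        return 0.0
    covered = sum(1 for kw in top_keywords if kw in tag_vocabulary)
    return covered / len(top_keywords)


def mean_coverage(docs: List[Dict[str, float]], tag_vocabulary: Set[str], k: int) -> float:
    """Average coverage over a sample of documents."""
    if not docs:
        return 0.0
    return sum(tag_coverage(top_k_keywords(d, k), tag_vocabulary) for d in docs) / len(docs)
```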


The Convergence of Tag Selections

Measure the convergence of tags for all URLs
  X-axis: the popularity of URLs
  Y-axis: the number of distinct tags


Tags Matched by Documents

Do tags capture the main concepts of documents?
  Are they matched by the content of the URL?
Statistical analysis
  Number of occurrences -> weight
  Tag match ratio e(T, U)
    T = {t_i}: the set of tags attached to a given URL U
    The total weight of the tags that also appeared in the keyword set of U
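The formula for e(T, U) is missing from this transcript. The sketch below assumes the ratio is the total weight of the tags that also occur among the keywords of U, divided by the total weight of all tags of U, with a tag's weight being its number of occurrences. The denominator and the function name are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Set


def tag_match_ratio(tag_occurrences: Iterable[str], url_keywords: Set[str]) -> float:
    """Share of the tag weight of a URL that is matched by its page keywords.

    tag_occurrences lists every tag assignment for the URL, one entry per use,
    so the occurrence count of a tag acts as its weight.
    """
    weights = Counter(tag_occurrences)
    total = sum(weights.values())
    if total == 0:
        return 0.0
    matched = sum(w for tag, w in weights.items() if tag in url_keywords)
    return matched / total
```

For example, tags used as `linux` (twice), `dns`, and `howto` against the keyword set `{linux, dns, resolv}` give a matched weight of 3 out of 4.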


Architecture for Social Interest Discovery

1. Find topics of interest

2. Clustering

3. Indexing


Topic Discovery

Find frequent tag patterns for a given set
Association rule algorithms
  Support
  Implication rules
Identify the frequent tag patterns
  A frequent tag pattern {a,b}
  If w({a,b}) = w({a}) = w({b})
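As a rough illustration of frequent tag patterns, the sketch below counts the support of every tag combination up to a small size by brute force. This is a stand-in, not the association rule algorithm the authors used, and `max_size` and the function names are hypothetical. The slide does not state what follows from the condition w({a,b}) = w({a}) = w({b}); `drop_subsumed` shows one plausible reading (keep only the larger pattern when a subset has identical support), which is an assumption.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set

Pattern = FrozenSet[str]


def frequent_tag_patterns(posts: Iterable[Set[str]], min_support: int,
                          max_size: int = 3) -> Dict[Pattern, int]:
    """Support (number of posts containing the pattern) for frequent patterns."""
    counts: Counter = Counter()
    for tags in posts:
        unique = sorted(set(tags))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    return {p: n for p, n in counts.items() if n >= min_support}


def drop_subsumed(patterns: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Assumed reading: drop a pattern if a strict superset has equal support."""
    return {p: n for p, n in patterns.items()
            if not any(p < q and n == m for q, m in patterns.items())}
```

Brute-force enumeration is only practical for short tag lists per post; an Apriori-style algorithm prunes candidates instead of counting all combinations.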


Clustering
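The content of this slide did not survive in the transcript. Based on the earlier goal of clustering users and URLs by topics, a minimal sketch might assign a post to a topic whenever the post's tags contain every tag of the topic. The `Post` type and `cluster_by_topic` are hypothetical names, and the matching rule is an assumption.

```python
from typing import Dict, FrozenSet, Iterable, NamedTuple, Set, Tuple


class Post(NamedTuple):
    """One bookmark: who saved which URL with which tags."""
    user: str
    url: str
    tags: FrozenSet[str]


def cluster_by_topic(posts: Iterable[Post], topics: Iterable[FrozenSet[str]]
                     ) -> Dict[FrozenSet[str], Tuple[Set[str], Set[str]]]:
    """Map each topic to (users, urls) whose posts contain all topic tags."""
    clusters: Dict[FrozenSet[str], Tuple[Set[str], Set[str]]] = {
        topic: (set(), set()) for topic in topics
    }
    for post in posts:
        for topic, (users, urls) in clusters.items():
            if topic <= post.tags:  # every topic tag is on the post
                users.add(post.user)
                urls.add(post.url)
    return clusters
```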


Indexing


Evaluation

The URL similarity of intra- and inter-topics
  Cosine similarity of tf-idf keyword term vectors
  Cosine similarity of tag term vectors
500 interest topics
  > 30 bookmarked URLs
  Share 5-6 co-occurring tags
Inter-topic: 10,000 topic pairs
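The comparison above relies on cosine similarity between sparse term vectors. The sketch below shows that measure together with simple pairwise averages within one topic and across two topics. The averaging scheme and the function names are assumptions; the slides do not say how the per-pair similarities were aggregated.

```python
import math
from itertools import combinations
from typing import Dict, List

Vector = Dict[str, float]  # sparse vector: term -> weight


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all-zero."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_intra_similarity(topic_urls: List[Vector]) -> float:
    """Average similarity over all URL pairs inside one topic."""
    pairs = list(combinations(topic_urls, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def mean_inter_similarity(topic_a: List[Vector], topic_b: List[Vector]) -> float:
    """Average similarity over all URL pairs drawn from two different topics."""
    pairs = [(a, b) for a in topic_a for b in topic_b]
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

If topics are coherent, the intra-topic average should be clearly higher than the inter-topic average for both the keyword vectors and the tag vectors.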


User Interest Coverage

For each user
  Sort the user's tags by the number of times the user has used them
Top-5: the 5 most-used tags of each user
Top-10:
All:
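The ranking step above can be sketched as follows. The coverage function assumes that coverage means the fraction of a user's selected tags that appear in at least one discovered topic; that definition, the `k=None` convention for all tags, and the function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set


def top_tags(user_tags: Iterable[str], k: Optional[int] = None) -> List[str]:
    """A user's tags sorted by how often the user used them; k=None keeps all."""
    ranked = [tag for tag, _ in Counter(user_tags).most_common()]
    return ranked if k is None else ranked[:k]


def interest_coverage(user_tags: Iterable[str], topic_tags: Set[str],
                      k: Optional[int] = None) -> float:
    """Fraction of the user's top-k tags that occur in some discovered topic."""
    selected = top_tags(user_tags, k)
    if not selected:
        return 0.0
    return sum(1 for tag in selected if tag in topic_tags) / len(selected)
```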


Human Reviews

4 human editors
10 topics
20 most frequent URLs for each topic
Scores: 1-5


Cluster Properties (Add)

This page is not part of the original authors' slides. For the original version, please refer to the source.


Conclusion (Add)

Propose a tag-based social interest discovery approach

Justify the use of user-generated tags to represent user interests

Implement a system for social networks such as Delicious

This page is not part of the original authors' slides. For the original version, please refer to the source.


References

Xin Li, Lei Guo, Yihong Zhao. Tag-based Social Interest Discovery. WWW 2008. Yahoo! Inc.


Notes

Slide download source: http://fusion.grids.cn/wiki/download/attachments/1313/Tag-based+Socail+Interest+Discovery-by+yjhuang.ppt?version=1

Data set web page: http://delicious.com/