1
The EigenRumor Algorithm for Ranking Blogs
Advisor: Hsin-Hsi Chen
Speaker: Sheng-Chung Yen (嚴聖筌)
2
Outline
Motivation
Assumed Blog Structure
Classification of Blog Ranking
The EigenRumor Algorithm
  Community model
  Scores
  Algorithm
Mapping to Blog community
Experiments
Related Works
Conclusion
Future Work
References
3
Motivation
Approaches to page ranking:
  PageRank [2]
  HITS (Hypertext Induced Topic Selection) [3]
Issues:
  The number of links to a blog entry is generally very small.
  A new entry needs some time to accumulate in-links before it can earn a higher PageRank score.
4
Assumed Blog Structure
A blog consists of a top page and a set of blog entries.
A blog is generally updated and maintained by a single blogger.
There are links from the top page of the blog to each blog entry and each blog entry has a permanent URI.
Blog entries are frequently added and the notification of updates is, as an option, sent to a ping server.
A mechanism to construct a trackback [3] is provided.
5
Classification of Blog Ranking
Subject of ranking
Space of ranking
Temporal space of ranking
Semantics of ranking
Source of evaluations collected
6
The EigenRumor Algorithm –Community model (1/2)
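[Figure: community model. Agents (1 ~ m) on one side and objects (1 ~ n) on the other, connected by information provisioning links and information evaluation links; the evaluation link from agent i to object j carries the score eij.]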
7
The EigenRumor Algorithm –Community model (2/2)
When agent i provides (posts) object j, a provisioning link is established from i to j.
When agent i evaluates the usefulness of an existing object j with the scoring value eij, an evaluation link is established from i to j.
Provisioning matrix P = [pij] represents all provisioning links in the universe.
Evaluation matrix E = [eij] represents all evaluation links in the universe.
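The two matrices can be pictured with a small sketch. This is only an illustration: the function name `build_matrices`, the 0-based indices, the toy data, and the choice to record a provisioning link as 1.0 are all assumptions, not taken from the slides.

```python
def build_matrices(m, n, provisions, evaluations):
    """Return (P, E) as m x n nested lists.

    provisions:  iterable of (agent, obj) pairs, 0-based.
    evaluations: iterable of (agent, obj, score) triples, 0-based.
    """
    P = [[0.0] * n for _ in range(m)]
    E = [[0.0] * n for _ in range(m)]
    for i, j in provisions:
        P[i][j] = 1.0      # assumption: a provisioning link is stored as 1
    for i, j, score in evaluations:
        E[i][j] = score    # e_ij, the scoring value given by agent i to object j
    return P, E


# Hypothetical toy community: 2 agents, 3 objects.
P, E = build_matrices(
    m=2, n=3,
    provisions=[(0, 0), (0, 1), (1, 2)],
    evaluations=[(1, 0, 1.0), (0, 2, 0.5)],
)
```

Row i of each matrix collects everything agent i posted or evaluated; column j collects everything that points at object j.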
8
The EigenRumor Algorithm –Scores
Authority score (agent property)
  This indicates to what level agent i provided objects in the past that followed the community direction.
Hub score (agent property)
  This indicates to what level agent i submitted comments (evaluations) that followed the community direction on other past objects.
Reputation score (object property)
  This indicates the level of support object j received from the agents.
9
The EigenRumor Algorithm –Algorithm (1/4)
Assumptions
The objects that are provided by a “good” authority will follow the direction of the community.
The objects that are supported by a “good” hub will follow the direction of the community.
The agents that provide objects that follow the community direction are “good” authorities of the community.
The agents that evaluate objects that follow the community direction are “good” hubs of the community.
10
The EigenRumor Algorithm –Algorithm (2/4)
Notations
$a$: Authority vector
$h$: Hub vector
$r$: Reputation vector
$P$: Information provisioning matrix
$E$: Information evaluation matrix
11
The EigenRumor Algorithm –Algorithm (3/4)
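$r = P^{T} a$ ...(1)
$r = E^{T} h$ ...(2)
$a = P r$ ...(3)
$h = E r$ ...(4)
$r = \alpha P^{T} a + (1-\alpha) E^{T} h$ ...(5)
$r = \alpha P^{T} P r + (1-\alpha) E^{T} E r = (\alpha P^{T} P + (1-\alpha) E^{T} E)\, r = S r$ ...(6)
Where $S = \alpha P^{T} P + (1-\alpha) E^{T} E$; $\lambda$ is the largest eigenvalue of matrix $S$.
$\lambda r = S r$ ...(7)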
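The step from the combined update to the single matrix S can be checked numerically. The sketch below assumes a weighting constant `alpha` between the authority and hub contributions, and uses made-up toy values for P, E and r; the helper names are hypothetical. It compares the two-step route (compute a = P r and h = E r, then combine) with the one-step product S r.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    return [sum(x * y for x, y in zip(row, v)) for row in M]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(x * y for x, y in zip(row, col)) for col in Bt] for row in A]

alpha = 0.5                                  # assumed weighting constant
P = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]       # toy provisioning matrix (2 agents, 3 objects)
E = [[0.0, 0.0, 0.5], [1.0, 0.0, 0.0]]       # toy evaluation matrix
r = [0.2, 0.3, 0.5]                          # arbitrary reputation vector

# Two-step route: a = P r, h = E r, then r' = alpha * P^T a + (1 - alpha) * E^T h.
a = matvec(P, r)
h = matvec(E, r)
two_step = [alpha * x + (1 - alpha) * y
            for x, y in zip(matvec(transpose(P), a), matvec(transpose(E), h))]

# One-step route: S = alpha * P^T P + (1 - alpha) * E^T E, then r' = S r.
PtP = matmul(transpose(P), P)
EtE = matmul(transpose(E), E)
S = [[alpha * x + (1 - alpha) * y for x, y in zip(row_p, row_e)]
     for row_p, row_e in zip(PtP, EtE)]
one_step = matvec(S, r)

same = all(abs(x - y) < 1e-12 for x, y in zip(two_step, one_step))
```

Both routes give the same vector, which is why the reputation vector can be treated as an eigenvector of S.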
12
The EigenRumor Algorithm –Algorithm (4/4)
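$a^{(0)} = (1, \ldots, 1)^{T}$
$h^{(0)} = (1, \ldots, 1)^{T}$
while $r$ changes significantly do
  $r^{(k)} = \alpha P^{T} a^{(k)} + (1-\alpha) E^{T} h^{(k)}$
  $r^{(k+1)} = r^{(k)} / \|r^{(k)}\|_{2}$
  $a^{(k+1)} = P r^{(k+1)}$
  $h^{(k+1)} = E r^{(k+1)}$
end while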
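The iteration on this slide is a power iteration, and a minimal pure-Python sketch of it might look as follows. The function name `eigenrumor`, the default `alpha`, the tolerance `tol`, the iteration cap `max_iter`, and the toy matrices are assumptions made for illustration; "changes significantly" is interpreted here as the largest component change exceeding `tol`.

```python
import math

def eigenrumor(P, E, alpha=0.5, tol=1e-9, max_iter=1000):
    """Return (r, a, h) for m x n nested-list matrices P and E."""
    m, n = len(P), len(P[0])
    a = [1.0] * m                      # a(0) = (1, ..., 1)^T
    h = [1.0] * m                      # h(0) = (1, ..., 1)^T
    r_prev = None
    r = [0.0] * n
    for _ in range(max_iter):
        # r = alpha * P^T a + (1 - alpha) * E^T h
        r = [alpha * sum(P[i][j] * a[i] for i in range(m))
             + (1 - alpha) * sum(E[i][j] * h[i] for i in range(m))
             for j in range(n)]
        norm = math.sqrt(sum(x * x for x in r))
        if norm == 0.0:                # no links at all: nothing to rank
            return r, [0.0] * m, [0.0] * m
        r = [x / norm for x in r]      # r = r / ||r||_2
        a = [sum(P[i][j] * r[j] for j in range(n)) for i in range(m)]  # a = P r
        h = [sum(E[i][j] * r[j] for j in range(n)) for i in range(m)]  # h = E r
        if r_prev is not None and max(abs(x - y) for x, y in zip(r, r_prev)) < tol:
            break                      # r no longer changes significantly
        r_prev = r
    return r, a, h


# Hypothetical toy community: 2 agents, 3 objects.
P = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
E = [[0.0, 0.0, 0.5], [1.0, 0.0, 0.0]]
r, a, h = eigenrumor(P, E)
```

In this toy case agent 0 ends up with the higher authority score and agent 1 with the higher hub score, because agent 1 endorsed an entry by agent 0 that the iteration settles on as reputable.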
13
Mapping to Blog community (1/3)
The links from the top page of the blog site to the blog entries => information provisioning links.
The links to blog entries in other blogs => information evaluation links.
(Forward) trackback => the interest of the blogger.
(Backward) trackback => ignored, because it is often generated by spamming.
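The first two mapping rules can be sketched as a small function. Everything named here is hypothetical (the `map_links` function, the dictionary layout, the example URLs), and trackbacks are left out; the sketch only shows how entry ownership yields provisioning links and how links into another blogger's entries yield evaluation links.

```python
def map_links(entries, links):
    """Derive provisioning and evaluation links from crawled blog data.

    entries: dict mapping entry URL -> blogger who posted it.
    links:   list of (source entry URL, target entry URL) hyperlinks.
    """
    # Every entry listed on a blogger's top page is a provisioning link.
    provisions = [(blogger, url) for url, blogger in entries.items()]
    evaluations = []
    for src, dst in links:
        if src not in entries or dst not in entries:
            continue                       # link leaves the known blog space
        if entries[src] != entries[dst]:   # only links into *other* blogs count
            evaluations.append((entries[src], dst))
    return provisions, evaluations


entries = {
    "http://a.example/1": "alice",
    "http://a.example/2": "alice",
    "http://b.example/1": "bob",
}
links = [
    ("http://b.example/1", "http://a.example/1"),   # bob cites alice
    ("http://a.example/2", "http://a.example/1"),   # alice cites herself
]
provisions, evaluations = map_links(entries, links)
```

The self-citation is dropped, so only bob's link to alice's entry becomes an evaluation link.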
14
Mapping to Blog community (2/3)
The basic algorithm does not normalize the information provisioning matrix P or the information evaluation matrix E.
Problem: if a user creates many blog accounts and interlinks them, he or she can inflate the scores.
15
Mapping to Blog community (3/3)
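Solutions:
Normalization function 1:
  $P' = [p'_{ij}]$ $(i = 1..m,\ j = 1..n)$, where $p'_{ij}$ is $p_{ij}$ normalized by $|P_i|$ ...(7)
  $E' = [e'_{ij}]$ $(i = 1..m,\ j = 1..n)$, where $e'_{ij}$ is $e_{ij}$ normalized by $|E_i|$ ...(8)
  $|P_i|$ and $|E_i|$ are the total numbers of objects provided and evaluated by the agent.
Normalization function 2 (longevity factor):
  $P^{(t)} = [p^{(t)}_{ij}]$, where $p^{(t)}_{ij}$ is $p_{ij}$ damped according to the link age $t - \mathrm{time}(p_{ij})$ and normalized over $j = 1..n$ ...(9)
  $E^{(t)} = [e^{(t)}_{ij}]$, where $e^{(t)}_{ij}$ is $e_{ij}$ damped according to the link age $t - \mathrm{time}(e_{ij})$ and normalized over $j = 1..n$ ...(10)
  $t$ is the current time, $\mathrm{time}(x)$ is the time when link $x$ was created, and the damping factors lie in the range [0,1].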
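To make the two ideas concrete, here is a minimal sketch under deliberately simple assumptions: function 1 is taken to divide each agent's row by the number of links in that row, and function 2 is taken to multiply a link by `delta ** age`, where `delta` is a hypothetical damping factor in [0,1]. The formulas in [1] may differ in detail; the names `normalize_rows` and `damp` are illustrative only.

```python
def normalize_rows(M):
    """Divide each agent's row by its number of links (assumed form of function 1)."""
    out = []
    for row in M:
        count = sum(1 for x in row if x != 0)   # |P_i| or |E_i| for this agent
        out.append([x / count if count else 0.0 for x in row])
    return out

def damp(value, created, now, delta=0.9):
    """Weight a link by delta ** age (assumed form of the longevity factor)."""
    return value * delta ** (now - created)


P = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]          # toy provisioning matrix
P_norm = normalize_rows(P)                       # each row now sums to 1
old_link = damp(1.0, created=0, now=2, delta=0.5)
new_link = damp(1.0, created=2, now=2, delta=0.5)
```

With row normalization, an agent who posts many entries no longer gains weight from volume alone, and with damping an older link contributes less than a fresh one.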
16
Experiments (1/3)
The database of this system holds 9280000 entries from 30500 blog sites (2004/10/16 ~ 2005/02/03).
Original:
  1520000 (16.3%) entries have one or more hyperlinks.
  116000 (1.25%) entries are linked to other blogs.
  107000 (1.15%) entries are referred to by other blogs.
17
Experiments (2/3)
Applying the EigenRumor algorithm:
  36200 bloggers have at least one blog entry linked from other blogs.
  28300 (9.28%) bloggers have nonzero authority scores => 862000 (9.28%) entries have nonzero reputation scores.
18
Experiments (3/3)
Face-to-face user survey (40 guests, Feb. 2005)
Best result   EigenRumor   In-link   TFIDF      Not determined
Queries       18 (45%)     2 (5%)    1 (2.5%)   19 (48%)
19
Related Works
iRank
Technorati provides a commercial blog search.
EigenRumor algorithm:
  Agent-to-object links, instead of page-to-page or agent-to-agent links.
  Normalization of links.
  Dynamic structure of links.
20
Conclusion
The important feature of the algorithm is that it widens the coverage of blog entries that are assigned a score, beyond the entries that can be scored from static link analysis alone.
21
Future Work
The problem of spamming.
How to choose a better ranking algorithm for a specific keyword?
22
References
[1] K. Fujimura, T. Inoue, and M. Sugisaki, “The EigenRumor Algorithm for Ranking Blogs,” Nippon Telegraph and Telephone, 10 May 2005.
[2] S. Brin and L. Page, “The Anatomy of a Large-scale Hypertextual Web Search Engine,” in Proceedings of the 7th International World Wide Web Conference, 1998.
[3] Wikipedia, http://en.wikipedia.org/.