Using a Sentiment Map for Visualizing Credibility of News Sites on the Web
Yukiko Kawai*, Yusuke Fujita*, Tadahiko Kumamoto**, Jianwei Zhang*, Katsumi Tanaka***
* Kyoto Sangyo University, Japan
** Chiba Institute of Technology, Japan
*** Kyoto University, Japan
Outline
• Background
• Research goal
• System overview
• Offline processing
• Online processing
• Experimental evaluation
• Conclusion and future work
Background
A question: What is your attitude towards the “Iraq war”? Do you agree or disagree?
• To answer this question, I want to read some news to form an opinion about the topic.
• Web news sites (e.g., MSN, Google News) have spread rapidly.
• Different sites may have different opinions about the topic: each site has its own sentiment tendency.
Background
Is the Iraq war right or wrong?
• If the reader consults only a pro-war site: “I agree with this war.”
• If the reader consults only an anti-war site: “I disagree with this war.”
• A misconception may arise if the sites' tendencies (positive or negative) are not known in advance.
• If the reader sees the tendencies of several sites (Site A, Site B): “Well, I now have opinions from different sites.”
• Information credibility is improved, which may lead to a more fair-minded judgment.
A concept of sentiment map
• Example query: “Iraq war”
• A graph of sentiment is mapped to the location of each news site
• The top-ranked articles from each news site are listed
Demonstration
[Figure: sentiment map demonstration; the sentiment scale runs from Positive to Negative]
System overview
Offline processing (preprocessing)
• Crawl news sites on the Web (e.g., Yomiuri (Osaka), Yomiuri (Tokyo), Asahi (Tokyo), ...) to collect news articles
• Apply morphological analysis to the collected articles
• Calculate tf-idf values, and calculate sentiment values using the sentiment dictionary
• Store the results in the articles database (including tf-idf and sentiment values)
Online processing (runtime processing), given a query
1) Retrieve articles from each news site
2) Rank the articles based on tf-idf in each site
3) Calculate the average of sentiment values for each site
4) Generate a sentiment map
Offline processing
News article collection
• Crawl news articles from various news sites and store them in the DB
News article analysis
• Eliminate HTML tags
• Apply morphological analysis to extract nouns, verbs, and adjectives
• Calculate the tf-idf value of each extracted word j for each news article p_i
• Attach a sentiment vector to each news article, using a sentiment dictionary
tf-idf value of word j in article p_i:
$$ tf \cdot idf_{ij} = \frac{\log(F_j + 1)}{\log(F_{all})} \cdot \log\frac{N}{N_j} $$
F_j: the frequency of word j in article p_i
F_all: the total number of words in p_i
N: the number of all articles
N_j: the number of articles that include j
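The tf-idf weighting above can be sketched in a few lines of Python. This is only an illustrative sketch, not the system's actual code: the function name `tf_idf` and the handling of degenerate inputs (an article with at most one word, or a word that appears in no article) are assumptions that the slides do not specify.

```python
import math


def tf_idf(f_j: int, f_all: int, n_articles: int, n_j: int) -> float:
    """tf-idf of word j in article p_i.

    Implements log(F_j + 1) / log(F_all) * log(N / N_j) from the slide.
    """
    # Assumption (not in the slides): return 0.0 when log(F_all) would be
    # zero or undefined, or when the word occurs in no article.
    if f_all <= 1 or n_j <= 0:
        return 0.0
    tf = math.log(f_j + 1) / math.log(f_all)
    idf = math.log(n_articles / n_j)
    return tf * idf


# Hypothetical counts: a word occurring 3 times in a 100-word article,
# and appearing in 10 of 1000 articles.
weight = tf_idf(f_j=3, f_all=100, n_articles=1000, n_j=10)
```

A word that appears in every article gets an idf of log(1) = 0, so it contributes nothing to the ranking regardless of its frequency in the article.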
Sample of sentiment dictionary (e = a, b, c, d)
Entry word (w)   a: Dark ⇔ Bright   b: Rejection ⇔ Acceptance   c: Tension ⇔ Relaxation   d: Fear ⇔ Anger
challenge        0.618              0.687                       0.752                     0.500
collide          0.344              0.353                       0.315                     0.529
death            0.28               0.358                       0.260                     0.364
derailment       0.31               0.546                       0.403                     0.291
revival          0.91               0.521                       0.429                     0.000
rich             0.597              0.676                       0.761                     0.466
Example: Oc(death) = 0.260
• Sentiment value Oe(w) of an entry word w
• A value between 0 and 1 (e.g., 0: dark, 1: bright)
• Calculated by analyzing co-occurrence with the original sentiment words, based on 200 million articles of Nikkei newspapers
Calculation of sentiment value Oe(w)
• Sentiments and their corresponding original sentiment words
Sentiment (e = a, b, c, d)    Original sentiment words e1    Original sentiment words e2
a: Bright ⇔ Dark              bright, glad, happy            dark, sad, painful
b: Acceptance ⇔ Rejection     approval, love, like           reject, aversion, dislike
c: Relaxation ⇔ Tension       comfortable, peaceful, slow    tension, emergency
d: Anger ⇔ Fear               angry, roar                    fear, scary, dread
Sentiment value:
$$ O_e(w) = \frac{P(e_1, w)}{P(e_1, w) + P(e_2, w)} $$
$$ P(e, w) = \frac{df(e \,\&\, w)}{df(e)} $$
df(e): the number of occurrences of the original sentiment words e
df(e&w): the number of co-occurrences of the original sentiment words e and an entry word w
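As a rough illustration of the sentiment value calculation, the following Python sketch computes Oe(w) from document-frequency counts. The function names, the fallback value of 0.5 for words that never co-occur with either group, and the counts in the usage line are all hypothetical; the slides give only the formulas.

```python
def cooccurrence_ratio(df_e_and_w: int, df_e: int) -> float:
    """P(e, w) = df(e & w) / df(e)."""
    # Assumption: treat the ratio as 0 when the sentiment words never occur.
    return df_e_and_w / df_e if df_e > 0 else 0.0


def sentiment_value(df_e1_w: int, df_e1: int,
                    df_e2_w: int, df_e2: int,
                    default: float = 0.5) -> float:
    """O_e(w) = P(e1, w) / (P(e1, w) + P(e2, w)), a value in [0, 1]."""
    p1 = cooccurrence_ratio(df_e1_w, df_e1)
    p2 = cooccurrence_ratio(df_e2_w, df_e2)
    if p1 + p2 == 0:
        # Assumption (not in the slides): a word with no co-occurrence
        # evidence is placed at the midpoint of the scale.
        return default
    return p1 / (p1 + p2)


# Hypothetical counts chosen only to illustrate a word that co-occurs far
# more often with the e2 group (e.g., tension) than with e1 (relaxation).
o_c = sentiment_value(df_e1_w=26, df_e1=1000, df_e2_w=74, df_e2=1000)
```

Because each co-occurrence count is normalized by df(e), a sentiment word group that is simply more frequent in the corpus does not dominate the ratio.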
Calculation of sentiment value Oe(w): example
Sentiment value of the word “death” on dimension c (Relaxation ⇔ Tension): Oc(death) = 0.260
This is because df(“comfortable” & “death”), df(“peaceful” & “death”), and df(“slow” & “death”) are much smaller than df(“tension” & “death”) and df(“emergency” & “death”).
Sentiment vector O(TEXT) of a news article
• TEXT: the text of a news article, which contains n keywords w_1, ..., w_n
• Each sentiment value Oe(TEXT) is the average of the sentiment values Oe(w_i) of the keywords:
$$ O_e(TEXT) = \frac{1}{n} \sum_{i=1}^{n} O_e(w_i) $$
• The sentiment vector of the article collects the four sentiment values:
$$ O(TEXT) = (O_a(TEXT), O_b(TEXT), O_c(TEXT), O_d(TEXT)) $$
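The averaging step can be sketched as follows, using two entries from the sample sentiment dictionary. This is a minimal sketch under stated assumptions: the function name `article_sentiment` is hypothetical, and skipping keywords that are missing from the dictionary, as well as returning the midpoint vector when no keyword is found, are choices the slides do not describe.

```python
from typing import Dict, List, Tuple

SentimentVector = Tuple[float, float, float, float]  # (O_a, O_b, O_c, O_d)


def article_sentiment(keywords: List[str],
                      dictionary: Dict[str, SentimentVector]) -> SentimentVector:
    """Average the sentiment vectors O(w) of an article's keywords."""
    # Assumption: keywords without a dictionary entry are ignored.
    vectors = [dictionary[w] for w in keywords if w in dictionary]
    if not vectors:
        # Assumption: no evidence maps to the midpoint on every dimension.
        return (0.5, 0.5, 0.5, 0.5)
    n = len(vectors)
    return tuple(sum(v[k] for v in vectors) / n for k in range(4))


# Two entries taken from the sample sentiment dictionary.
sample_dictionary = {
    "death": (0.28, 0.358, 0.260, 0.364),
    "collide": (0.344, 0.353, 0.315, 0.529),
}
vector = article_sentiment(["death", "collide"], sample_dictionary)
```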
Online processing
When a user enters query keywords:
1. Retrieve news articles that include the keywords
2. Rank the articles based on tf-idf values for each news site
3. Calculate the average of the sentiment vectors of the top n articles for each site
4. Attach sentiment graphs to the corresponding locations of the news sites
Also present a list of articles grouped by site
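Steps 1 to 3 of the online processing might look like the sketch below. It is not the prototype's implementation: the `Article` record, the function `site_sentiments`, the requirement that an article contain every query keyword, and ranking by the sum of the query keywords' tf-idf values are all assumptions made for illustration. The tf-idf values and sentiment vectors are taken as precomputed by the offline processing.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Article:
    site: str                                     # e.g., "Yomiuri (Osaka)"
    title: str
    tfidf: Dict[str, float]                       # word -> precomputed tf-idf
    sentiment: Tuple[float, float, float, float]  # precomputed O(TEXT)


def site_sentiments(articles: List[Article], query: List[str],
                    top_n: int = 10) -> Dict[str, Tuple[float, ...]]:
    """Average sentiment vector of the top-n ranked articles per site."""
    # 1. Retrieve articles that include the keywords
    #    (assumption: every query keyword must appear).
    by_site: Dict[str, List[Article]] = defaultdict(list)
    for art in articles:
        if all(q in art.tfidf for q in query):
            by_site[art.site].append(art)

    result: Dict[str, Tuple[float, ...]] = {}
    for site, hits in by_site.items():
        # 2. Rank within the site (assumption: score = sum of the
        #    query keywords' tf-idf values).
        hits.sort(key=lambda a: sum(a.tfidf[q] for q in query), reverse=True)
        top = hits[:top_n]
        # 3. Average the sentiment vectors of the top-n articles.
        result[site] = tuple(
            sum(a.sentiment[k] for a in top) / len(top) for k in range(4)
        )
    return result
```

The returned dictionary holds one four-dimensional vector per site, which is the data a map front end would need for step 4; sites with no matching article are simply absent from the result.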
Experimental evaluation
• Query: Daisuke Matsuzaka, a famous Japanese Major Leaguer
• A reviewer read all the retrieved articles from the different news sites and judged the sentiment of each news site as positive, negative, or neutral
• For comparison, the numeric sentiment values produced by our system were categorized into the same discrete values: positive, negative, or neutral
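One way to categorize a numeric sentiment value into the three discrete labels is a neutral band around the midpoint of the scale. The slides do not state how the categorization was done, so the function `categorize`, the midpoint of 0.5, and the band width of 0.05 are purely hypothetical choices for illustration.

```python
def categorize(value: float, neutral_band: float = 0.05) -> str:
    """Map a sentiment value in [0, 1] to positive, negative, or neutral."""
    # Assumption: 0.5 is the neutral midpoint, and values within
    # +/- neutral_band of it count as neutral. Both the midpoint and the
    # band width are illustrative, not taken from the slides.
    if value > 0.5 + neutral_band:
        return "positive"
    if value < 0.5 - neutral_band:
        return "negative"
    return "neutral"


labels = [categorize(v) for v in (0.752, 0.500, 0.260)]
```

On dimension a, for example, "positive" would correspond to Bright and "negative" to Dark.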
Experimental evaluation
Web site     Judged by   a: Dark ⇔ Bright   b: Rejection ⇔ Acceptance   c: Tension ⇔ Relaxation   d: Fear ⇔ Anger
Web site 1   reviewer    Bright             Acceptance                  Tension                   Neutral
             system      Bright             Acceptance                  Tension                   Neutral
Web site 2   reviewer    Bright             Acceptance                  Relaxation                Neutral
             system      Bright             Acceptance                  Tension                   Neutral
Web site 3   reviewer    Bright             Acceptance                  Relaxation                Fear
             system      Bright             Acceptance                  Tension                   Fear
Web site 4   reviewer    Neutral            Neutral                     Neutral                   Anger
             system      Dark               Acceptance                  Tension                   Fear
• Precision is about 70%
• There are some distinctions among the different news sites
Conclusion and future work
Conclusion
• Developed a system called sentiment map for visualizing the sentiment distinctions among different news sites
• Tested its effectiveness
• A prototype: http://klab.kyoto-su.ac.jp/~fujita/cgi-bin/Fuzilla/News/
Future work
• More experiments
• Sentiment analysis of readers and information recommendation based on it
Thank you for your attention
Sample of sentiment dictionary (e = a, b, c, d)
Se(w): impression value
Me(w): weight
Entry word (w)             Value   a: Bright ⇔ Dark   b: Acceptance ⇔ Rejection   c: Relaxation ⇔ Tension   d: Anger ⇔ Fear
chosen-suru (challenge)    Se(w)   0.618              0.687                       0.752                     0.500
                           Me(w)   1.399              1.330                       1.251                     1.090
dassen (derailment)        Se(w)   0.31               0.546                       0.403                     0.291
                           Me(w)   0.514              0.603                       0.737                     0.549
hofu-da (rich)             Se(w)   0.597              0.676                       0.761                     0.466
                           Me(w)   1.416              1.352                       1.299                     1.109
shibou (death)             Se(w)   0.28               0.358                       0.260                     0.364
                           Me(w)   1.132              1.272                       1.306                     1.112
shototsu-suru (collide)    Se(w)   0.344              0.353                       0.315                     0.529
                           Me(w)   1.004              1.016                       1.099                     0.948
sosei (revival)            Se(w)   0.91               0.521                       0.429                     0.000
                           Me(w)   0.464              0.582                       0.732                     0.328
Example: Sc(death) = 0.260, Mc(death) = 1.306
Sentiment value Oe(w) of an entry word w
• A value between 0 and 1 (1: positive, 0: negative)
• Calculated by analyzing the co-occurrence with the original impression words, based on the Nikkei Newspaper Full Text Database (about 200 million articles)
• Original impression words and their correspondence with sentiments:
Sentiment (e = a, b, c, d)    Original impression words e1                                  Original impression words e2
a: Bright ⇔ Dark              akarui (bright), ureshii (glad), tanoshii (happy)             kurai (dark), kanashii (sad), kurushii (painful)
b: Acceptance ⇔ Rejection     shonin (approval), aikou (love), suki-da (like)               kyohi (reject), ken’o (aversion), kirai-da (dislike)
c: Relaxation ⇔ Tension       yuttari (comfortable), nonbiri (peaceful), yukkuri (slow)     kincho (tension), kinkyuu (emergency)
d: Anger ⇔ Fear               okoru (angry), dogou (roar)                                   osoreru (fear), kowai (scary), kyofu (dread)
Sentiment value Oe(w) of an entry word w
$$ S_e(w) = \frac{P(e_1, w)}{P(e_1, w) + P(e_2, w)}, \qquad P(e, w) = \frac{df(e \,\&\, w)}{df(e)} $$
$$ M_e(w) = \log\bigl(df(e_1 \,\&\, w) + df(e_2 \,\&\, w)\bigr) $$
$$ o_e(w) = S_e(w)\, M_e(w) $$
Se(w): impression value
Me(w): weight
Sentiment value of the word “death” on dimension c: Oc(death) = 0.260
This is because “comfortable” and “death”, and “peaceful” and “death”, co-occur far less often than “tension” and “death”, and “emergency” and “death”.
A proposition of sentiment map
Demonstration
[Figure: sentiment map for each news site for the query “scandal”, on a scale from negative (-0.5) through 0 to positive (0.5), with the top-ranked articles from each news site]