Using a Sentiment Map for Visualizing Credibility of News Sites on the Web

Yukiko Kawai*, Yusuke Fujita*, Tadahiko Kumamoto**, Jianwei Zhang*, Katsumi Tanaka***

* Kyoto Sangyo University, Japan
** Chiba Institute of Technology, Japan
*** Kyoto University, Japan

Post on 29-Mar-2015



Page 2:

Outline

• Background
• Research goal
• System overview
  • Offline processing
  • Online processing
• Experimental evaluation
• Conclusion and future work

Page 3:

Background

A question: What is your attitude towards the “Iraq war”? Do you agree or disagree?

• To answer this question, I want to read some news and form an opinion on the topic.
• Web news sites (e.g., MSN, Google News) have spread rapidly.
• Different sites may have different opinions about the topic.

Page 4:

Background: Sentiment tendencies of sites

Question: Is the Iraq war right or wrong?

Reading a single news site:
• If it is a pro-war site, the reader may conclude: "I agree with this war."
• If it is an anti-war site, the reader may conclude: "I disagree with this war."
• A misconception may arise if the sites' tendencies are not known in advance.

Reading several sites (Site A, Site B), each shown with its positive/negative sentiment:
• The reader: "Well, I now have opinions from different sites."
• Information credibility is improved, which may lead to a more fair-minded judgment.


Page 6:

A concept of sentiment map

[Figure: demonstration of the sentiment map for the query “Iraq war”. Graphs of sentiment (positive/negative) are mapped onto the locations of the news sites, together with the top-ranked articles from each news site.]


Page 8:

System overview

Offline processing (preprocessing):
• Crawl news sites on the Web (Yomiuri (Osaka), Yomiuri (Tokyo), Asahi (Tokyo), ...) to build a news articles collection
• Morphological analysis
• tf-idf value calculation
• Sentiment value calculation, using a sentiment dictionary
• Store the results in an articles database (including tf-idf and sentiment values)

Online processing (runtime processing), given a query:
1) Retrieve articles from each news site
2) Rank the articles based on tf-idf in each site
3) Calculate the average of sentiment values for each site
4) Generate a sentiment map


Page 10:

Offline processing

News articles collection
• Crawl news articles from various news sites and store them in a DB

News articles analysis
• Eliminate HTML tags
• Perform morphological analysis to extract nouns, verbs, and adjectives
• Calculate the tf-idf value of each extracted word j for each news article pi

Attach a sentiment vector to each news article
• Use a sentiment dictionary

tf-idf value of word j in article pi:

$$ tf \cdot idf_{ij} = \frac{\log(F_j + 1)}{\log(F_{all})} \cdot \log\frac{N}{N_j} $$

Fj: the frequency of word j in article pi
Fall: the number of all words in pi
N: the number of all articles
Nj: the number of articles including j
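
As a rough illustration, the weighting on this slide can be sketched in Python, reading the formula as (log(Fj + 1) / log(Fall)) · log(N / Nj). The function name `tf_idf`, the token-list input, the natural logarithm, and the guard for one-word articles are assumptions made for this sketch; the slides do not specify them.

```python
import math
from collections import Counter


def tf_idf(articles):
    """Return one {word: weight} dict per article.

    `articles` is a list of token lists; morphological analysis
    (keeping nouns, verbs, adjectives) is assumed to be done already.
    """
    n_articles = len(articles)            # N
    doc_freq = Counter()                  # N_j for every word j
    for tokens in articles:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in articles:
        counts = Counter(tokens)          # F_j for every word j
        f_all = len(tokens)               # F_all
        scores = {}
        for word, f_j in counts.items():
            if f_all < 2:
                # log(F_all) would be 0 here; returning 0.0 is an assumption.
                scores[word] = 0.0
                continue
            tf = math.log(f_j + 1) / math.log(f_all)
            idf = math.log(n_articles / doc_freq[word])
            scores[word] = tf * idf
        weights.append(scores)
    return weights
```

A word that appears in every article gets weight 0 under this formula, because log(N / Nj) is 0.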

Page 11:

Sample of sentiment dictionary (e = a, b, c, d)

Entry word (w)   a: Dark ⇔ Bright   b: Rejection ⇔ Acceptance   c: Tension ⇔ Relaxation   d: Fear ⇔ Anger
challenge        0.618              0.687                       0.752                     0.500
collide          0.344              0.353                       0.315                     0.529
death            0.28               0.358                       0.260                     0.364
derailment       0.31               0.546                       0.403                     0.291
revival          0.91               0.521                       0.429                     0.000
rich             0.597              0.676                       0.761                     0.466

Example: Oc(death) = 0.260

• Oe(w) is the sentiment value of an entry word w
• It is a value between 0 and 1 (e.g., 0: dark, 1: bright)
• It is calculated by analyzing co-occurrence with the original sentiment words, based on 200 million articles of Nikkei newspapers

Page 12:

Calculation of sentiment value Oe(w)

• Sentiments and their corresponding original sentiment words

Sentiment (e = a, b, c, d)   Original sentiment words e1    Original sentiment words e2
a: Bright ⇔ Dark             bright, glad, happy            dark, sad, painful
b: Acceptance ⇔ Rejection    approval, love, like           reject, aversion, dislike
c: Relaxation ⇔ Tension      comfortable, peaceful, slow    tension, emergency
d: Anger ⇔ Fear              angry, roar                    fear, scary, dread

Sentiment value:

$$ O_e(w) = \frac{P(e_1, w)}{P(e_1, w) + P(e_2, w)}, \qquad P(e, w) = \frac{df(e \,\&\, w)}{df(e)} $$

df(e): the number of occurrences of the original sentiment words e
df(e&w): the number of co-occurrences of the original sentiment words e and an entry word w
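
A minimal sketch of this calculation, reading the slide's formula as Oe(w) = P(e1, w) / (P(e1, w) + P(e2, w)) with P(e, w) = df(e&w) / df(e). The function names, the counts in the usage note, and the choice to return 0.5 when the word never co-occurs with either word group are assumptions for illustration, not part of the slides.

```python
def cooccurrence_ratio(df_e_and_w, df_e):
    """P(e, w) = df(e & w) / df(e); 0.0 if the sentiment words never occur."""
    return df_e_and_w / df_e if df_e else 0.0


def sentiment_value(df_e1_and_w, df_e1, df_e2_and_w, df_e2):
    """O_e(w) from co-occurrence counts with the e1 and e2 word groups."""
    p1 = cooccurrence_ratio(df_e1_and_w, df_e1)
    p2 = cooccurrence_ratio(df_e2_and_w, df_e2)
    if p1 + p2 == 0:
        # No co-occurrence evidence at all: treated as neutral (an assumption).
        return 0.5
    return p1 / (p1 + p2)
```

With hypothetical counts, `sentiment_value(10, 1000, 30, 1000)` lies close to the e2 pole: the word co-occurs three times as often with the e2 words as with the e1 words, so the value is well below 0.5.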

Page 13:

Calculation of sentiment value Oe(w): example

Dimension c: Relaxation ⇔ Tension
e1: comfortable, peaceful, slow
e2: tension, emergency

Sentiment value of the word “death” on dimension c: Oc(death) = 0.260, because

df(“comfortable” & “death”), df(“peaceful” & “death”), df(“slow” & “death”) << df(“tension” & “death”), df(“emergency” & “death”)

Page 14:

Sentiment vector O(TEXT) of a news article

• A news article text = TEXT
• TEXT has n keywords: keywords = {w}
• Sentiment vector O(TEXT) of the article:

$$ O(TEXT) = (O_a(TEXT),\ O_b(TEXT),\ O_c(TEXT),\ O_d(TEXT)) $$

• Each sentiment value Oe(TEXT) is the average of the sentiment values Oe(w) of the keywords w:

$$ O_e(TEXT) = \frac{\sum_{i=1}^{n} O_e(w_i)}{n} $$
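
The averaging on this slide might be sketched as follows. The two dictionary entries are copied from the sample sentiment dictionary in this deck; the function name, the dict layout, and the decision to skip keywords that have no dictionary entry are assumptions, since the slides do not say how such words are handled.

```python
DIMENSIONS = ("a", "b", "c", "d")

# Two entries from the sample sentiment dictionary shown in the slides.
SAMPLE_DICTIONARY = {
    "death":   {"a": 0.28, "b": 0.358, "c": 0.260, "d": 0.364},
    "revival": {"a": 0.91, "b": 0.521, "c": 0.429, "d": 0.000},
}


def article_sentiment_vector(keywords, dictionary):
    """Average O_e(w) over the article's keywords, one value per dimension.

    Keywords missing from the dictionary are ignored (an assumption).
    Returns None when no keyword has a dictionary entry.
    """
    known = [dictionary[w] for w in keywords if w in dictionary]
    if not known:
        return None
    n = len(known)
    return tuple(sum(entry[e] for entry in known) / n for e in DIMENSIONS)
```

For an article whose keywords are "death" and "revival", the value on dimension a is the mean of 0.28 and 0.91.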


Page 16:

Online processing

When a user enters query keywords:
1. Retrieve news articles including the keywords
2. Rank articles based on tf-idf values for each news site
3. Calculate the average of sentiment vectors of the top n articles for each site
4. Attach sentiment graphs to the corresponding locations of news sites

Also present a list of articles grouped by site.
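
Steps 1 to 3 above could be sketched roughly as below. The article record layout (keys `site`, `tfidf`, `sentiment`), the function name, ranking by the sum of the query keywords' tf-idf values, and the default `top_n` are all assumptions made for illustration; the slides only state that articles are ranked by tf-idf within each site and that the top n sentiment vectors are averaged.

```python
from collections import defaultdict


def site_sentiments(articles, query, top_n=10):
    """Map each site to the mean sentiment vector of its top-ranked articles.

    Each article is a dict with hypothetical keys:
      'site'      - name of the news site
      'tfidf'     - {word: tf-idf weight} for the article
      'sentiment' - 4-tuple (O_a, O_b, O_c, O_d) for the article
    """
    by_site = defaultdict(list)
    for article in articles:
        # Step 1: keep only articles that contain every query keyword.
        if all(word in article["tfidf"] for word in query):
            score = sum(article["tfidf"][word] for word in query)
            by_site[article["site"]].append((score, article))

    result = {}
    for site, scored in by_site.items():
        # Step 2: rank the site's articles by tf-idf score, highest first.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top = [article["sentiment"] for _, article in scored[:top_n]]
        # Step 3: average the sentiment vectors of the top n articles.
        result[site] = tuple(
            sum(vec[k] for vec in top) / len(top) for k in range(4)
        )
    return result
```

Step 4, drawing the graphs at each site's location, depends on the map front end and is left out of this sketch.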


Page 18:

Experimental evaluation

• Query: Daisuke Matsuzaka, a famous Japanese Major Leaguer
• A reviewer read all the retrieved articles of the different news sites and judged the sentiment of each news site as positive, negative, or neutral
• For comparison, the numeric sentiment values given by our system were categorized into the discrete values positive, negative, or neutral
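
One way the numeric values could be turned into discrete labels is sketched below. The slides do not give the thresholds that were used, so the neutral band of ±0.05 around 0.5, the function name, and the assumption that values lie on the 0 to 1 scale of the sentiment dictionary are purely hypothetical. The pole names follow the dictionary slide (0 side first, 1 side second).

```python
# (label near 0, label near 1) for each dimension, as on the dictionary slide.
POLES = {
    "a": ("Dark", "Bright"),
    "b": ("Rejection", "Acceptance"),
    "c": ("Tension", "Relaxation"),
    "d": ("Fear", "Anger"),
}


def categorize(dimension, value, margin=0.05):
    """Map a sentiment value in [0, 1] to a pole label or 'Neutral'.

    `margin` is a hypothetical half-width of the neutral band around 0.5.
    """
    low_label, high_label = POLES[dimension]
    if value > 0.5 + margin:
        return high_label
    if value < 0.5 - margin:
        return low_label
    return "Neutral"
```

Labels produced this way can then be compared cell by cell with a reviewer's labels for the same site and dimension.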

Page 19:

Experimental evaluation

                        a: Dark ⇔ Bright   b: Rejection ⇔ Acceptance   c: Tension ⇔ Relaxation   d: Fear ⇔ Anger
Web site 1   reviewer   Bright             Acceptance                  Tension                   Neutral
             system     Bright             Acceptance                  Tension                   Neutral
Web site 2   reviewer   Bright             Acceptance                  Relaxation                Neutral
             system     Bright             Acceptance                  Tension                   Neutral
Web site 3   reviewer   Bright             Acceptance                  Relaxation                Fear
             system     Bright             Acceptance                  Tension                   Fear
Web site 4   reviewer   Neutral            Neutral                     Neutral                   Anger
             system     Dark               Acceptance                  Tension                   Fear

• Precision is about 70%
• There are some differences among the news sites


Page 21:

Conclusion and future work

Conclusion
• Developed a system called sentiment map for visualizing the sentiment differences among news sites
• Tested its effectiveness
• A prototype: http://klab.kyoto-su.ac.jp/~fujita/cgi-bin/Fuzilla/News/

Future work
• More experiments
• Sentiment analysis of readers and information recommendation based on it

Page 22:

Thank you for your attention

Page 24:

Sample of sentiment dictionary (e = a, b, c, d)

For each entry word w, the first row gives Se(w), the impression value, and the second row gives Me(w), the weight.

Entry word (w)                    a: Bright ⇔ Dark   b: Acceptance ⇔ Rejection   c: Relaxation ⇔ Tension   d: Anger ⇔ Fear
chosen-suru (challenge)   Se(w)   0.618              0.687                       0.752                     0.500
                          Me(w)   1.399              1.330                       1.251                     1.090
dassen (derailment)       Se(w)   0.31               0.546                       0.403                     0.291
                          Me(w)   0.514              0.603                       0.737                     0.549
hofu-da (rich)            Se(w)   0.597              0.676                       0.761                     0.466
                          Me(w)   1.416              1.352                       1.299                     1.109
shibou (death)            Se(w)   0.28               0.358                       0.260                     0.364
                          Me(w)   1.132              1.272                       1.306                     1.112
shototsu-suru (collide)   Se(w)   0.344              0.353                       0.315                     0.529
                          Me(w)   1.004              1.016                       1.099                     0.948
sosei (revival)           Se(w)   0.91               0.521                       0.429                     0.000
                          Me(w)   0.464              0.582                       0.732                     0.328

Example: Sc(death) = 0.260, Mc(death) = 1.306

Page 25:

Sentiment value Oe(w) of an entry word w

• Original impression words and their correspondence with sentiments

Sentiment (e = a, b, c, d)   Original impression words e1                                Original impression words e2
a: Bright ⇔ Dark             akarui (bright), ureshii (glad), tanoshii (happy)           kurai (dark), kanashii (sad), kurushii (painful)
b: Acceptance ⇔ Rejection    shonin (approval), aikou (love), suki-da (like)             kyohi (reject), ken’o (aversion), kirai-da (dislike)
c: Relaxation ⇔ Tension      yuttari (comfortable), nonbiri (peaceful), yukkuri (slow)   kincho (tension), kinkyuu (emergency)
d: Anger ⇔ Fear              okoru (angry), dogou (roar)                                 osoreru (fear), kowai (scary), kyofu (dread)

• Oe(w) is a value between 1 and 0 (1: positive, 0: negative)
• It is calculated by analyzing the co-occurrence with the original impression words, based on the Nikkei Newspaper Full Text Database (about 200 million articles)

Page 26:

Sentiment value Oe(w) of an entry word w

e1 and e2 are the original impression words of each sentiment e (for example, for c: Relaxation ⇔ Tension, e1 = yuttari (comfortable), nonbiri (peaceful), yukkuri (slow) and e2 = kincho (tension), kinkyuu (emergency)).

$$ S_e(w) = \frac{P(e_1, w)}{P(e_1, w) + P(e_2, w)}, \qquad P(e, w) = \frac{df(e \,\&\, w)}{df(e)} $$

$$ M_e(w) = \log\bigl(df(e_1 \,\&\, w) + df(e_2 \,\&\, w)\bigr) $$

$$ o_e(w) = S_e(w)\, M_e(w) $$

Se(w): impression value
Me(w): weight

Sentiment value of the word “death” on dimension c: Oc(death) = 0.260, because the co-occurrences of “comfortable” and “death” and of “peaceful” and “death” << the co-occurrences of “tension” and “death” and of “emergency” and “death”

Page 27:

A proposition of sentiment map

[Figure: demonstration of the sentiment map for the query “scandal”. A sentiment graph is shown for each news site, on an axis running from -0.5 (negative) through 0 to 0.5 (positive), together with the top-ranked articles from each news site.]
