yukiko kawai*, yusuke fujita*, tadahiko kumamoto**, jianwei zhang*, katsumi tanaka*** * kyoto sangyo...

Post on 29-Mar-2015


Yukiko Kawai*, Yusuke Fujita*, Tadahiko Kumamoto**, Jianwei Zhang*, Katsumi Tanaka***

* Kyoto Sangyo University, Japan ** Chiba Institute of Technology, Japan *** Kyoto University, Japan

Using a Sentiment Map for Visualizing Credibility of News Sites on the Web

Outline
  Background
  Research goal
  System overview
    Offline processing
    Online processing
  Experimental evaluation
  Conclusion and future work

Background

A question: What is your attitude towards the “Iraq war”? Agree or disagree?

To answer this question, I want to read some news to form an opinion about this topic.
Rapid spread of web news sites (e.g., MSN, Google News)
Different sites may have different opinions about the topic

Sentiment tendencies of sites

Background

[Figure: a reader asks “Is the Iraq war right or wrong?” and consults a news site. If it is a pro-war site: “I agree with this war.” If it is an anti-war site: “I disagree with this war.” When Site A and Site B are shown with their positive/negative tendencies: “Well, I now have opinions from different sites.”]

A misconception may be caused if sites’ tendencies are not known in advance.

When the tendencies are known, information credibility is improved. This may lead to a more fair-minded judgment.


A concept of sentiment map

Demonstration: the query is “Iraq war”
Map: graphs of sentiment (positive / negative) placed at the location of each news site
Top-ranked articles from each news site


System overview

Offline processing (preprocessing)
  crawling: news articles collection from news sites on the Web (Yomiuri (Osaka), Yomiuri (Tokyo), Asahi (Tokyo), ...)
  morphological analysis
  tf-idf value calculation
  sentiment values calculation, using a sentiment dictionary
  articles database (including tf-idf and sentiment values)

Online processing (runtime processing), started by a query
  1) retrieve articles from each news site
  2) rank the articles based on tf-idf in each site
  3) calculate the average of sentiment values for each site
  4) generate a sentiment map


Offline processing

News articles collection
  Crawl news articles from various news sites and store them in a DB
News articles analysis
  Eliminate HTML tags
  Perform morphological analysis to extract nouns, verbs, and adjectives
  Calculate the tf-idf value of each extracted word j for each news article p_i
  Attach a sentiment vector to each news article
    Use a sentiment dictionary

$$tf \cdot idf_{ij} = \frac{\log(F_j + 1)}{\log(F_{all})} \cdot \log\frac{N}{N_j}$$

F_j: the frequency of word j in article p_i
F_all: the number of all words in p_i
N: the number of all articles
N_j: the number of articles including j
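The tf-idf weight defined by F_j, F_all, N and N_j can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function and parameter names are hypothetical, and the guard against a one-word article is an added assumption, since log(F_all) would be zero there.

```python
import math


def tf_idf(word_freq, total_words, num_articles, num_articles_with_word):
    """tf-idf of one word in one article.

    word_freq              -> F_j   (occurrences of word j in the article)
    total_words            -> F_all (number of all words in the article)
    num_articles           -> N     (number of all articles)
    num_articles_with_word -> N_j   (number of articles containing word j)
    """
    if total_words <= 1:
        # Assumption: log(F_all) would be 0, so such articles are rejected.
        raise ValueError("total_words must be greater than 1")
    tf = math.log(word_freq + 1) / math.log(total_words)
    idf = math.log(num_articles / num_articles_with_word)
    return tf * idf
```

A word that appears in every article gets an idf of log(1) = 0, so it contributes nothing to the ranking.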

Sample of sentiment dictionary (e = a, b, c, d)

Entry word (w)   a: Dark ⇔ Bright   b: Rejection ⇔ Acceptance   c: Tension ⇔ Relaxation   d: Fear ⇔ Anger
challenge        0.618              0.687                       0.752                     0.500
collide          0.344              0.353                       0.315                     0.529
death            0.28               0.358                       0.260                     0.364
derailment       0.31               0.546                       0.403                     0.291
revival          0.91               0.521                       0.429                     0.000
rich             0.597              0.676                       0.761                     0.466

Example: Oc(death) = 0.260

• Sentiment value Oe(w) of an entry word w
• A value between 0 and 1 (e.g., 0: dark, 1: bright)
• Calculated by analyzing co-occurrence with the original sentiment words, based on 200 million articles of Nikkei newspapers

Calculation of sentiment value Oe(w)

• Sentiments and their corresponding original sentiment words

Sentiment (e = a, b, c, d)    Original sentiment words e1      Original sentiment words e2
a: Bright ⇔ Dark              bright, glad, happy              dark, sad, painful
b: Acceptance ⇔ Rejection     approval, love, like             reject, aversion, dislike
c: Relaxation ⇔ Tension       comfortable, peaceful, slow      tension, emergency
d: Anger ⇔ Fear               angry, roar                      fear, scary, dread

Sentiment value:

$$O_e(w) = \frac{P(e_1, w)}{P(e_1, w) + P(e_2, w)}, \qquad P(e, w) = \frac{df(e \,\&\, w)}{df(e)}$$

df(e): the number of occurrences of the original sentiment words e
df(e&w): the number of co-occurrences of the original sentiment words e and an entry word w
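As a rough sketch of how the sentiment value of an entry word follows from the co-occurrence counts, the ratio can be computed as below. The function names are hypothetical, the counts in the usage note are invented for illustration, and returning None for a word that never co-occurs with either word group is an assumption the slides do not address.

```python
def cooccurrence_prob(df_e_and_w, df_e):
    """P(e, w) = df(e & w) / df(e)."""
    return df_e_and_w / df_e


def sentiment_value(df_e1_and_w, df_e1, df_e2_and_w, df_e2):
    """O_e(w) = P(e1, w) / (P(e1, w) + P(e2, w)), a value between 0 and 1."""
    p1 = cooccurrence_prob(df_e1_and_w, df_e1)
    p2 = cooccurrence_prob(df_e2_and_w, df_e2)
    if p1 + p2 == 0:
        # Assumption: no co-occurrence at all means no value can be assigned.
        return None
    return p1 / (p1 + p2)
```

With invented counts such as sentiment_value(26, 1000, 74, 1000), the word co-occurs far more often with the e2 group, so the value falls well below 0.5.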

Calculation of sentiment value Oe(w): example

Sentiment value of the word “death” on dimension c (Relaxation ⇔ Tension): Oc(death) = 0.260
Because
df(“comfortable” & “death”), df(“peaceful” & “death”), df(“slow” & “death”) << df(“tension” & “death”), df(“emergency” & “death”)

Sentiment vector O(TEXT) of a news article

Text of a news article = TEXT
TEXT has n keywords: keywords = {w_1, ..., w_n}

Sentiment vector of the article:

$$O(TEXT) = (O_a(TEXT), O_b(TEXT), O_c(TEXT), O_d(TEXT))$$

Each sentiment value O_e(TEXT) is the average of the sentiment values of the keywords:

$$O_e(TEXT) = \frac{1}{n} \sum_{i=1}^{n} O_e(w_i)$$
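The article-level vector is a per-dimension average over the keywords of the article. The sketch below uses three entries from the sample dictionary shown earlier in the slides. The names are hypothetical, and skipping keywords that are missing from the dictionary is an assumption, since the slides do not say how such words are handled.

```python
DIMENSIONS = ("a", "b", "c", "d")

# Values copied from the sample sentiment dictionary in the slides.
SAMPLE_DICTIONARY = {
    "death": (0.28, 0.358, 0.260, 0.364),
    "collide": (0.344, 0.353, 0.315, 0.529),
    "revival": (0.91, 0.521, 0.429, 0.000),
}


def article_sentiment_vector(keywords, dictionary):
    """Average the sentiment values of an article's keywords, dimension by dimension."""
    # Assumption: keywords without a dictionary entry are ignored.
    scored = [dictionary[w] for w in keywords if w in dictionary]
    if not scored:
        return None
    n = len(scored)
    return tuple(sum(vec[d] for vec in scored) / n for d in range(len(DIMENSIONS)))
```

For example, article_sentiment_vector(["death", "collide"], SAMPLE_DICTIONARY) averages the two dictionary rows into one four-dimensional vector.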


Online processing

When a user enters query keywords,
1. Retrieve news articles including the keywords
2. Rank articles based on tf-idf values for each news site
3. Calculate the average of sentiment vectors of the top n articles for each site
4. Attach sentiment graphs to the corresponding locations of news sites
   Also present a list of articles grouped by site
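Steps 1 to 3 of the online processing can be sketched as follows. This is only an illustration under stated assumptions: the article record layout (the keys "site", "tfidf" and "sentiment") is hypothetical, an article is retrieved only if it contains every query keyword, and the ranking score is the sum of the tf-idf values of the query keywords. Step 4, drawing the graphs on a map, is left out.

```python
from collections import defaultdict


def build_sentiment_map(articles, query_keywords, top_n=3):
    """Return {site: averaged sentiment vector of its top-ranked articles}."""
    if top_n < 1:
        raise ValueError("top_n must be at least 1")

    by_site = defaultdict(list)
    for art in articles:
        # Step 1: retrieve articles that include the query keywords.
        if all(k in art["tfidf"] for k in query_keywords):
            score = sum(art["tfidf"][k] for k in query_keywords)
            by_site[art["site"]].append((score, art))

    result = {}
    for site, scored in by_site.items():
        # Step 2: rank the articles of each site by their tf-idf score.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top = [art for _, art in scored[:top_n]]
        # Step 3: average the sentiment vectors of the top n articles.
        dims = len(top[0]["sentiment"])
        result[site] = tuple(
            sum(a["sentiment"][d] for a in top) / len(top) for d in range(dims)
        )
    return result
```

The returned dictionary holds one averaged vector per news site, which is the data a map front end would need in order to draw one graph at each site's location.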


Experimental evaluation

Query: Daisuke Matsuzaka (a famous Japanese Major Leaguer)

A reviewer read all the retrieved articles from the different news sites and decided the sentiment of each news site: positive, negative, or neutral

For comparison, the numeric sentiment values given by our system were categorized into discrete values: positive, negative, or neutral
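One way to turn a numeric sentiment value into a discrete label and compare it with a reviewer's label is sketched below. The slides do not give the thresholds that were actually used, so the neutral band of 0.5 ± 0.1 is purely a hypothetical choice, as are the function names. The agreement rate here is a simple fraction of matching labels and may differ from how the authors computed their precision figure.

```python
def discretize(value, low_label, high_label, margin=0.1):
    """Map a value in [0, 1] to low_label, high_label, or "Neutral".

    margin is a hypothetical half-width of the neutral band around 0.5.
    """
    if value > 0.5 + margin:
        return high_label
    if value < 0.5 - margin:
        return low_label
    return "Neutral"


def agreement_rate(system_labels, reviewer_labels):
    """Fraction of positions where the system label equals the reviewer label."""
    matches = sum(s == r for s, r in zip(system_labels, reviewer_labels))
    return matches / len(reviewer_labels)
```

For instance, discretize(0.26, "Tension", "Relaxation") falls below the neutral band and yields "Tension".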

Experimental evaluation

                        a: Dark ⇔ Bright   b: Rejection ⇔ Acceptance   c: Tension ⇔ Relaxation   d: Fear ⇔ Anger
Web site 1   reviewer   Bright             Acceptance                  Tension                   Neutral
             system     Bright             Acceptance                  Tension                   Neutral
Web site 2   reviewer   Bright             Acceptance                  Relaxation                Neutral
             system     Bright             Acceptance                  Tension                   Neutral
Web site 3   reviewer   Bright             Acceptance                  Relaxation                Fear
             system     Bright             Acceptance                  Tension                   Fear
Web site 4   reviewer   Neutral            Neutral                     Neutral                   Anger
             system     Dark               Acceptance                  Tension                   Fear

Precision is about 70%
There exist some distinctions among different news sites


Conclusion and future work

Conclusion
  Developed a system called sentiment map for visualizing the sentiment distinctions among different news sites
  Tested its effectiveness
  A prototype: http://klab.kyoto-su.ac.jp/~fujita/cgi-bin/Fuzilla/News/

Future work
  More experiments
  Sentiment analysis of readers and information recommendation based on it

Thank you for your attention

Sample of sentiment dictionary (e = a, b, c, d)

Entry word (w)            Value   a: Bright ⇔ Dark   b: Acceptance ⇔ Rejection   c: Relaxation ⇔ Tension   d: Anger ⇔ Fear
chosen-suru (challenge)   Se(w)   0.618              0.687                       0.752                     0.500
                          Me(w)   1.399              1.330                       1.251                     1.090
dassen (derailment)       Se(w)   0.31               0.546                       0.403                     0.291
                          Me(w)   0.514              0.603                       0.737                     0.549
hofu-da (rich)            Se(w)   0.597              0.676                       0.761                     0.466
                          Me(w)   1.416              1.352                       1.299                     1.109
shibou (death)            Se(w)   0.28               0.358                       0.260                     0.364
                          Me(w)   1.132              1.272                       1.306                     1.112
shototsu-suru (collide)   Se(w)   0.344              0.353                       0.315                     0.529
                          Me(w)   1.004              1.016                       1.099                     0.948
sosei (revival)           Se(w)   0.91               0.521                       0.429                     0.000
                          Me(w)   0.464              0.582                       0.732                     0.328

Se(w): impression value
Me(w): weight

Example: Sc(death) = 0.260, Mc(death) = 1.306

Sentiment value Oe(w) of an entry word w

• Sentiment value Oe(w) of an entry word w
• A value between 0 and 1 (1: positive, 0: negative)
• Calculated by analyzing the co-occurrence with the original impression words, based on the Nikkei Newspaper Full Text Database (about 200 million articles)
• Original impression words and their correspondence with sentiments

Sentiment (e = a, b, c, d)    Original impression words e1                                   Original impression words e2
a: Bright ⇔ Dark              akarui (bright), ureshii (glad), tanoshii (happy)              kurai (dark), kanashii (sad), kurushii (painful)
b: Acceptance ⇔ Rejection     shonin (approval), aikou (love), suki-da (like)                kyohi (reject), ken’o (aversion), kirai-da (dislike)
c: Relaxation ⇔ Tension       yuttari (comfortable), nonbiri (peaceful), yukkuri (slow)      kincho (tension), kinkyuu (emergency)
d: Anger ⇔ Fear               okoru (angry), dogou (roar)                                    osoreru (fear), kowai (scary), kyofu (dread)

Sentiment value Oe(w) of an entry word w

$$S_e(w) = \frac{P(e_1, w)}{P(e_1, w) + P(e_2, w)}, \qquad P(e, w) = \frac{df(e \,\&\, w)}{df(e)}$$

$$M_e(w) = \log\bigl(df(e_1 \,\&\, w) + df(e_2 \,\&\, w)\bigr)$$

$$o_e(w) = S_e(w) \cdot M_e(w)$$

Se(w): impression value
Me(w): weight

Sentiment value of the word “death” on dimension c: Oc(death) = 0.260
“comfortable” and “death”, “peaceful” and “death” << “tension” and “death”, “emergency” and “death”

A proposition of sentiment map

Demonstration: the query is “scandal”
Sentiment map for each news site (scale from positive, 0.5, through 0 to negative, -0.5)
Top-ranked articles from each news site
