wsdm’08 xiaowen ding 、 bing liu 、 philip s. yu department of computer science university of...

21
WSDM’08 Xiaowen Ding Bing Liu Philip S. Yu Department of Computer Science University of Illinois at Chicago Conference on Web Search and Data Mining 1

Upload: eustace-bryant

Post on 31-Dec-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

WSDM’08

Xiaowen Ding 、 Bing Liu 、 Philip S. YuDepartment of Computer Science University of Illinois at Chicago

Conference onWeb Search and Data Mining

1

Target: Customer Reviews of Products an increasing number of people are writing reviews

→ user 沒讀完全部 review 的話 , 或許會得到偏頗的意見→ business 考量上 , 要能追蹤商品 It is thus highly desirable to produce a summary of reviews

opinion mining or sentiment analysis product features that have been commented on by reviewers whether the comments are positive or negative (Neutral)

lexicon-based method “small” can indicate a positive or a negative opinion on a

product feature depending on the product feature and the context

2

基本上這篇經典 M. Hu and B. Liu. Mining and summarizing

customer reviews. KDD’04, 2004. 被作掉了

A-M. Popescu and O. Etzioni. Extracting Product Features and Opinions from Reviews. EMNLP-05, 2005.

3

Review - Amazon

4

5

解決形容跟上下文有關問題

解決一句內有好評有壞評問題

a holistic approach that can accurately infer the semantic orientation of an opinion word based on the review context

a new function aggregating multiple opinion words in the same sentence

better than the state-of-the-art existing methods

6

Two main research directions are sentiment classification and feature-based opinion mining Document level vs. Sentence level

▪ based on identification of opinion words or phrases▪ corpus-based approaches▪ dictionary-based approaches

Holistic lexicon-based approach to identifying the orientations of context dependent opinion words is closely related to works that identify domain opinion words use conjunction rules to find such words from large domain corpora “this room is beautiful and spacious” “the battery life is very long” && “it takes a long time to focus”

7

Object the entity that has been commented on 被評

東 has a set of components (or parts) and also a

set of attributes (or properties) 其成分 can be hierarchically decomposed according

to the part-of relationship 階層的成分關係

8

Example 1 特定品牌的數位相機 : object 電池 : component 畫素 : attribute 電池壽命 : attribute of component

Example 2 User 可以對 object, component or attribute 表示意見

Example 3 “This camera is too large”

▪ “large” is called a feature indicator “The battery life of this camera is too short”

▪ “Size” is an implicit feature in the following sentence as it does not appear in the sentence

9

“I do not like this camera”,

“the picture quality of this camera is poor”

Definition (explicit and implicit opinion): An explicit opinion on feature f is a subjective sentence that directly expresses a positive or negative opinion. An implicit opinion on feature f is an objective sentence that implies an opinion.

Example 4: The following sentence expresses an explicit positive opinion: “The picture quality of this camera is amazing.” following sentence expresses an implicit negative opinion: “The earphone broke in two days.”

10

“The picture quality is good, but the battery life is short”.

Definition (opinion holder) The holder of a particular opinion is the person or the

organization that holds the opinion. “John expressed his disagreement on the treaty”

Definition (semantic orientation of an opinion) The semantic orientation of an opinion on a feature f

states whether the opinion is positive, negative or neutral.

11

complex case“the view-finder and the lens of this camera are too close”,

Both F and W are unknown. Then, in opinion analysis, we need to perform three tasks

Task 1 Identifying and extracting object features that have

been commented on in each review d D.∈ Task 2

Determining whether the opinions on the features are positive, negative or neutral.

Task 3 Grouping synonyms of features, as different people

may use different words to express the same feature.

12

F is known but W is unknown. This is similar to Problem 1, but slightly easier.

All the three tasks for Problem 1 still need to be performed,

but Task 3 becomes the problem of matching discovered features with the set of given features F

13

W is known (then F is also known).

We only need to perform Task 2 above, namely, determining whether the opinions on the known features are positive, negative or neutral

after all the sentences that contain them are extracted.

14

The final output for each evaluative text d is a set of pairs.

Each pair is denoted by (f, SO) f is a feature SO is the semantic or opinion orientation

(positive or negative) expressed in d on feature f

15

to use the opinion words around each product feature in a review sentence to determine the opinion orientation on the product feature ( 蒐集 words, idioms)

1. how to combine multiple opinion words (which may be conflicting) to arrive at the final decision

2. how to deal with context or domain dependent opinion words without any prior knowledge from the user

3. how to deal with many important language constructs which can change the semantic orientations of opinion words

16

the feature itself can be an opinion word as it may bean adjective representing a feature indicator, “This camera is very reliable”

Negation Rules

“But” Clause Rules

17

18

Adjectives as feature indicators

▪ “this camera is very small” Explicit features that are not adjectives

▪ “the battery life of this camera is long” Intra-sentence conjunction rule

“the battery life is very long” “This camera takes great pictures and has a long battery life”

Pseudo intra-sentence conjunction rule “The camera has a long battery life, which is great”

Inter-sentence conjunction rule “The picture quality is amazing. The battery life is long”

19

多數決

20

21