extracting and ranking product features in opinion documents


Upload: wangheda

Post on 11-May-2015


DESCRIPTION

This is a presentation we gave in the Spring 2012 Data Mining class at Tsinghua University. It covers a paper by Lei Zhang, Bing Liu, Suk Hwan Lim, and Eamonn O’Brien-Strain.

TRANSCRIPT

Page 1: extracting and ranking product features in opinion documents

Extracting and Ranking Product Features in Opinion Documents

陈欣, 王鹤达, 张文昌

Page 2: extracting and ranking product features in opinion documents

A Story

• Retina Display
• 3-axis gyro & accelerometer
• A4 CPU
• Multitasking
• FaceTime
• iBooks

• Antenna Gate

Page 3: extracting and ranking product features in opinion documents

Why mining product features?

• Knowing clearly how consumers respond helps a company win more market share.

• Consumers can also make better-informed choices when shopping.

Page 4: extracting and ranking product features in opinion documents

Recent Research

• In recent years, opinion mining has been an active research area in NLP. The most important problem is extracting features from a corpus.

• Existing methods include hidden Markov models (HMM), maximum entropy (ME), pointwise mutual information (PMI), and conditional random fields (CRF).

• Double Propagation (DP) is a state-of-the-art unsupervised technique for solving this problem, though it has significant limitations of its own.

Page 5: extracting and ranking product features in opinion documents

Double Propagation

• Proposed by researchers from the University of Illinois at Chicago and Zhejiang University.

• Mainly extracts noun features; works well for medium-size corpora.

• Needs no additional resources beyond an initial seed opinion lexicon.

Page 6: extracting and ranking product features in opinion documents

DP Mechanism

Basic Assumption: Features are nouns or noun phrases, and opinion words are adjectives.

Dependency Grammar: Describes the dependency relations between words in a sentence, including direct relations (a)(b) and indirect relations (c)(d).

Example: “The camera has a good lens.” Here “camera” is the class, “good” is the opinion word, and “lens” is the feature.
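The propagation idea can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: it assumes the sentences have already been parsed into (adjective, noun) modifier pairs, it uses only that one direct relation, and the names `double_propagation`, `pairs`, and `seeds` are hypothetical.

```python
def double_propagation(pairs, seeds):
    """Grow opinion-word and feature sets from (adjective, noun) pairs.

    pairs: list of (adjective, noun) tuples, assumed to come from a
           dependency parser (the adjective modifies the noun).
    seeds: the initial seed opinion lexicon (a set of adjectives).
    """
    opinions = set(seeds)
    features = set()
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for adj, noun in pairs:
            # A known opinion word marks the noun it modifies as a feature.
            if adj in opinions and noun not in features:
                features.add(noun)
                changed = True
            # A known feature marks the adjective modifying it as an opinion word.
            if noun in features and adj not in opinions:
                opinions.add(adj)
                changed = True
    return opinions, features


# Toy input (made up for illustration).
pairs = [("good", "lens"), ("sharp", "lens"),
         ("sharp", "screen"), ("entire", "day")]
opinions, features = double_propagation(pairs, {"good"})
print(sorted(opinions))  # ['good', 'sharp']
print(sorted(features))  # ['lens', 'screen']
```

Starting from the single seed “good”, the sketch reaches “lens”, then “sharp”, then “screen”. The pair (“entire”, “day”) is never reached here, but a pair such as (“entire”, “screen”) would add “entire” as an opinion word, which is how noise can enter.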

Page 7: extracting and ranking product features in opinion documents

DP Limitations

Non-opinion adjectives may be extracted as opinion words, which introduces more and more noise as extraction proceeds.

Examples: “current” + noun, “entire” + noun.

Some important features have no opinion word modifying them.

Example: “There is a valley on my mattress.” No opinion word modifies the feature “valley”.

Page 8: extracting and ranking product features in opinion documents

Proposed Methods

Two-step feature mining method:

Feature Extraction
• Double Propagation
• Part-whole pattern
• “no” pattern

Feature Ranking
• A new angle on the noise problem.
• Use relevance and frequency to rank features.

Page 9: extracting and ranking product features in opinion documents

Ranking Principles

• Three strong clues indicate a correct feature:
  • It is modified by multiple opinion words.
  • It can be extracted by multiple part-whole patterns.
  • It is supported by a combination of part-whole patterns, the “no” pattern, and opinion word modification.

• Frequent appearance indicates an important feature.

Page 10: extracting and ranking product features in opinion documents

Process

• Feature extraction
  • part-whole relation
  • “no” pattern

• Feature ranking
  • HITS algorithm
  • consider frequency

Page 11: extracting and ranking product features in opinion documents

Part-whole relation

• Ambiguous / Unambiguous
• Phrase pattern (NP: noun phrase; CP: class concept phrase)
  • NP + Prep + CP
  • CP + with + NP
  • NP CP or CP NP

• Sentence pattern
  • CP Verb NP
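Three of these patterns can be approximated with regular expressions, as in the rough sketch below. It rests on several assumptions that are not from the slides: the class concept is given as a single known word, a noun phrase is reduced to one word after an optional determiner, and the preposition and verb lists are illustrative guesses. The function name `part_whole_candidates` is hypothetical, and the “NP CP or CP NP” compound pattern is left out because it needs part-of-speech tags.

```python
import re

# Optional determiner in front of a noun (illustrative list).
DET = r"(?:(?:the|a|an|my|this)\s+)?"


def part_whole_candidates(sentence, class_word,
                          preps=("of", "in", "on"),
                          verbs=("has", "have", "includes", "contains")):
    """Return words that a part-whole pattern links to class_word."""
    s = sentence.lower()
    cp = re.escape(class_word.lower())
    prep_alt = "|".join(preps)
    verb_alt = "|".join(verbs)
    patterns = [
        rf"\b(\w+)\s+(?:{prep_alt})\s+{DET}{cp}\b",  # NP + Prep + CP
        rf"\b{cp}\s+with\s+{DET}(\w+)",              # CP + with + NP
        rf"\b{cp}\s+(?:{verb_alt})\s+{DET}(\w+)",    # CP Verb NP
    ]
    found = set()
    for pattern in patterns:
        for match in re.finditer(pattern, s):
            found.add(match.group(1))
    return found


print(part_whole_candidates("The cover of the mattress is soft", "mattress"))
print(part_whole_candidates("I bought a mattress with a cover", "mattress"))
print(part_whole_candidates("The phone has a screen", "phone"))
```

Each call returns a one-element set: “cover”, “cover”, and “screen”. Because the sketch has no parser, a sentence like “The camera has a good lens” would return the adjective “good”; a real system would match against chunked noun phrases instead of raw words.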

Page 12: extracting and ranking product features in opinion documents

“no” Pattern

• no + feature
  • e.g. “no noise”, “no indentation”

• Exceptions
  • e.g. “no problem”, “no offense”
  • an exception list was compiled manually
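A minimal sketch of the “no” pattern follows. The exception set holds only the two examples from the slide, since the full manually compiled list is not given. The name `no_pattern_features` is hypothetical, and the sketch takes whatever word follows “no” without checking that it is a noun.

```python
import re

# Only the two exceptions named on the slide; the real list is longer.
NO_EXCEPTIONS = {"problem", "offense"}


def no_pattern_features(sentence, exceptions=NO_EXCEPTIONS):
    """Return the words that directly follow "no", minus the exceptions."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [nxt for cur, nxt in zip(words, words[1:])
            if cur == "no" and nxt not in exceptions]


print(no_pattern_features("No noise, no indentation, and no problem so far."))
# ['noise', 'indentation']
```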

Page 13: extracting and ranking product features in opinion documents

Apply HITS Algorithm

• HITS Algorithm
  • hub score / authority score
  • iterate until the scores converge

• Apply HITS Algorithm
  • split nodes into features and feature indicators
  • use a directed graph
  • compute feature relevance
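The sketch below shows one way to run HITS on such a graph, assuming each feature indicator (an opinion word, a part-whole pattern, or the “no” pattern) is a hub with a directed edge to every feature candidate it extracted. The function name `hits_authority`, the fixed iteration count, and the toy edges are assumptions made for illustration.

```python
import math


def hits_authority(edges, iterations=50):
    """Return authority scores for the targets of (indicator, feature) edges."""
    if not edges:
        return {}
    hubs = {h: 1.0 for h, _ in edges}
    auths = {a: 1.0 for _, a in edges}
    for _ in range(iterations):
        # Authority update: sum the hub scores of indicators pointing at it.
        new_auths = dict.fromkeys(auths, 0.0)
        for h, a in edges:
            new_auths[a] += hubs[h]
        # Hub update: sum the authority scores of the features it points at.
        new_hubs = dict.fromkeys(hubs, 0.0)
        for h, a in edges:
            new_hubs[h] += new_auths[a]
        # Normalize both vectors to unit length so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in new_auths.values()))
        h_norm = math.sqrt(sum(v * v for v in new_hubs.values()))
        auths = {k: v / a_norm for k, v in new_auths.items()}
        hubs = {k: v / h_norm for k, v in new_hubs.items()}
    return auths


# Toy graph (made up): "lens" is supported by three indicators.
edges = [("good", "lens"), ("sharp", "lens"), ("no", "lens"),
         ("good", "screen"), ("entire", "day")]
auths = hits_authority(edges)
print(sorted(auths, key=auths.get, reverse=True))  # ['lens', 'screen', 'day']
```

A candidate reached by several strong indicators, such as “lens”, ends up with the highest authority score, while “day”, reached only by the isolated adjective “entire”, decays toward zero.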

Page 14: extracting and ranking product features in opinion documents

Feature Ranking

• Utilize feature relevance and frequency
  • Step 1. Compute the authority score using power iteration.
  • Step 2. Compute the final score by

$$S(f) = A(f)\,\log\big(\mathrm{freq}(f)\big)$$

where $\mathrm{freq}(f)$ is the frequency of feature $f$, and $A(f)$ is the authority score of feature $f$.
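Step 2 can be sketched as follows, assuming the final score is the authority score multiplied by the logarithm of the frequency. The function name `rank_features` and the numbers are made up for illustration.

```python
import math


def rank_features(authority, frequency):
    """Rank features by authority score times log frequency, highest first."""
    scores = {f: authority[f] * math.log(frequency[f]) for f in authority}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical scores: "screen" is less relevant than "lens" but far more frequent.
authority = {"lens": 0.5, "screen": 0.4, "day": 0.1}
frequency = {"lens": 5, "screen": 50, "day": 3}
print(rank_features(authority, frequency))  # ['screen', 'lens', 'day']
```

The logarithm damps raw counts, so frequency can lift a relevant feature in the ranking without letting a very frequent but weakly supported candidate dominate. Note that a frequency of 1 gives a score of zero under this form.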

Page 15: extracting and ranking product features in opinion documents

Data Sets & Evaluation Metrics

Data Sets     Cars    Mattress    Phone    LCD
# of Sent.    2223    13233       15168    1783

“Cars” and “Mattress”: product review sites. “Phone” and “LCD”: forum sites.

Precision@N metric: the percentage of correct features among the top N feature candidates in a ranked list.
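The metric can be computed as in this short sketch. The name `precision_at_n` and the sample data are hypothetical, and dividing by the number of candidates actually returned (rather than by N) when the list is shorter than N is a choice made here, not something the slides specify.

```python
def precision_at_n(ranked, correct, n):
    """Fraction of the top-n ranked candidates that are correct features."""
    top = ranked[:n]
    if not top:
        return 0.0
    return sum(1 for f in top if f in correct) / len(top)


# Made-up ranking and gold set.
ranked = ["lens", "screen", "day", "battery"]
correct = {"lens", "screen", "battery"}
print(precision_at_n(ranked, correct, 2))  # 1.0
print(precision_at_n(ranked, correct, 3))  # two of the top three are correct
```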

Page 16: extracting and ranking product features in opinion documents

Recall & Precision Comparison

[Bar chart: results of 1000 sentences. Legend: Our Recall, DP Recall, Our Precision, DP Precision. Data labels: 0.56, 0.64, 0.44, 0.55, 0.55, 0.54, 0.23, 0.43, 0.78, 0.77, 0.68, 0.66, 0.79, 0.79, 0.69, 0.68.]

Page 17: extracting and ranking product features in opinion documents

Recall & Precision Comparison

[Bar chart: results of 2000 sentences. Legend: Our Recall, DP Recall, Our Precision, DP Precision. Data labels: 0.69, 0.66, 0.5, 0.56, 0.65, 0.58, 0.42, 0.52, 0.66, 0.7, 0.7, 0.62, 0.7, 0.7, 0.67, 0.64.]

Page 18: extracting and ranking product features in opinion documents

Recall & Precision Comparison

[Bar chart: results of 3000 sentences for Mattress and Phone. Legend: Our Recall, DP Recall, Our Precision, DP Precision. Data labels: 0.67, 0.51, 0.59, 0.48, 0.66, 0.62, 0.65, 0.64.]

Page 19: extracting and ranking product features in opinion documents

Ranking Comparison

[Bar chart: precision at top 50 for Cars, Mattress, Phone, and LCD. Legend: Our Precision, DP Precision. Data labels: 0.94, 0.9, 0.76, 0.76, 0.84, 0.81, 0.64, 0.68.]

Page 20: extracting and ranking product features in opinion documents

Ranking Comparison

[Bar chart: precision at top 100 for Cars, Mattress, Phone, and LCD. Legend: Our Precision, DP Precision. Data labels: 0.88, 0.85, 0.75, 0.73, 0.82, 0.8, 0.65, 0.68.]

Page 21: extracting and ranking product features in opinion documents

Ranking Comparison

[Bar chart: precision at top 200 for Cars, Mattress, and Phone. Legend: Our Precision, DP Precision. Data labels: 0.8, 0.79, 0.76, 0.75, 0.71, 0.7.]

Page 22: extracting and ranking product features in opinion documents

Conclusion

• Use part-whole and “no” patterns to increase recall.

• Rank extracted feature candidates by feature importance, determined by two factors:
  • Feature relevance (HITS was applied)
  • Feature frequency

Page 23: extracting and ranking product features in opinion documents

Thank you