Extracting and Ranking Product Features in Opinion Documents


DESCRIPTION

This is a presentation we gave in the Spring 2012 Data Mining class at Tsinghua University. It covers the paper "Extracting and Ranking Product Features in Opinion Documents" by Lei Zhang, Bing Liu, Suk Hwan Lim, and Eamonn O'Brien-Strain.

TRANSCRIPT

Extracting and Ranking Product Features in Opinion Documents

Presenters: 陈欣, 王鹤达, 张文昌

A Story

• Retina Display
• 3-axis gyro & accelerometer
• A4 CPU
• Multitasking
• FaceTime
• iBooks

• Antenna Gate

Why mine product features?

• Knowing clearly how consumers respond helps a company win more market share.

• It also helps consumers make well-informed choices when shopping.

Recent Research

• In recent years, opinion mining has been an active research area in NLP. The most fundamental problem is extracting features from a corpus.

• Existing methods include HMM, ME, PMI, and CRF.
• Double Propagation is a state-of-the-art unsupervised technique for this problem, though it has significant limitations.

Double Propagation

• Proposed by researchers from the University of Illinois at Chicago and Zhejiang University.

• Mainly extracts noun features; works well for medium-size corpora.

• Needs no additional resources beyond an initial seed opinion lexicon.

DP Mechanism

Basic Assumption: Features are nouns/noun phrases and opinion words are adjectives.

Dependency Grammar: describes the dependency relations between words in a sentence, including direct relations (a)(b) and indirect relations (c)(d).

The camera has a good lens.

(Figure: the dependency relation linking the opinion word "good" to the feature "lens".)
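To make the direct relation concrete, here is a minimal sketch of a single extraction step using spaCy's dependency parser. The library, the `amod` relation check, and the seed lexicon are our illustrative choices, not the paper's implementation.

```python
# Minimal sketch: extract noun features modified by known opinion adjectives
# through the direct "amod" dependency relation. Assumes spaCy is installed
# (pip install spacy && python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
seed_opinion_words = {"good"}  # the initial seed opinion lexicon

def extract_features(text, opinion_words):
    features = set()
    for tok in nlp(text):
        # direct relation: an opinion adjective modifying a noun ("good lens")
        if tok.dep_ == "amod" and tok.lemma_.lower() in opinion_words:
            if tok.head.pos_ in ("NOUN", "PROPN"):
                features.add(tok.head.lemma_.lower())
    return features

print(extract_features("The camera has a good lens.", seed_opinion_words))
# expected: {'lens'}
```

Double Propagation then runs the same relations in reverse to discover new opinion words from extracted features, and iterates.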

DP Limitations

Non-opinion adjectives may be extracted as opinion words. This introduces more and more noise as the extraction iterates.

(Figure: non-opinion adjectives such as "current" and "entire" extracted as opinion words.)

Some important features do not have opinion words modifying them.

Example: "There is a valley on my mattress." — the feature "valley" has no opinion word modifying it.

Proposed Methods

Two-step feature mining method:

Feature extraction
• Double Propagation
• Part-whole pattern
• "No" pattern

Feature ranking
• A new angle on the noise problem.
• Use relevance & frequency to rank features.

Ranking Principles

• Three strong clues indicate a correct feature:
• Modified by multiple opinion words.
• Extracted by multiple part-whole patterns.
• A combination of part-whole patterns, the "no" pattern, and opinion-word modification.

• Frequent appearance indicates an important feature.

Process

• Feature extraction
• part-whole relation
• "no" pattern

• Feature ranking
• HITS algorithm
• consider frequency

Part-whole relation

• Ambiguous / unambiguous patterns
• Phrase patterns (NP = noun phrase, CP = class concept phrase, e.g. "mattress")
• NP + Prep + CP (e.g. "valley on the mattress"; sketched below)
• CP + with + NP
• NP CP or CP NP

• Sentence pattern
• CP Verb NP
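As a rough illustration of the "NP + Prep + CP" phrase pattern, here is a sketch using plain regular expressions over lowercased text. The CP lexicon and the preposition list are made-up assumptions; a real implementation would match POS-tagged noun phrases instead of single words.

```python
# Hypothetical sketch of the "NP + Prep + CP" part-whole pattern, where CP is
# a class concept word naming the product class ("mattress", "car", ...).
import re

CLASS_WORDS = {"mattress", "car", "phone"}   # assumed CP lexicon
PREPS = r"(?:of|on|in)"                      # assumed preposition list

def part_whole_candidates(sentence):
    """Return words N matched by 'N <prep> <CP>' (e.g. 'valley on my mattress')."""
    candidates = []
    text = sentence.lower()
    for cp in CLASS_WORDS:
        for m in re.finditer(rf"\b(\w+) {PREPS} (?:the |my |a )?{cp}\b", text):
            candidates.append(m.group(1))
    return candidates

print(part_whole_candidates("There is a valley on my mattress."))
# expected: ['valley']
```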

“no” Pattern

• no + feature
• e.g. "no noise", "no indentation"

• Exceptions
• e.g. "no problem", "no offense"
• handled with a manually compiled exception list
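A correspondingly small sketch of the "no" pattern; the exception list here contains just the two idioms from the slide.

```python
# Sketch of the "no" pattern: a word right after "no" is a feature candidate
# ("no noise" -> "noise"), unless it is on a manually compiled exception list.
import re

EXCEPTIONS = {"problem", "offense"}  # from the slide's exception examples

def no_pattern_candidates(sentence):
    hits = re.findall(r"\bno (\w+)", sentence.lower())
    return [w for w in hits if w not in EXCEPTIONS]

print(no_pattern_candidates("No indentation and no noise, but no problem."))
# expected: ['indentation', 'noise']
```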

Apply HITS Algorithm

• HITS algorithm
• hub score / authority score
• iterate until convergence

• Applying HITS here
• split candidates into features and feature indicators
• build a directed graph from indicators (hubs) to features (authorities)
• compute feature relevance from the authority scores
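To show the mechanics, here is a self-contained power-iteration sketch of HITS on a tiny bipartite graph; the edge list is invented purely for illustration, with feature indicators as hubs and candidate features as authorities.

```python
# HITS on a bipartite graph: feature indicators (hubs) -> features (authorities).
edges = [("good", "lens"), ("good", "screen"), ("great", "screen")]

hub = {h: 1.0 for h, _ in edges}
auth = {a: 1.0 for _, a in edges}

for _ in range(50):  # power iteration until (effectively) convergence
    auth = {a: sum(hub[h] for h, a2 in edges if a2 == a) for a in auth}
    hub = {h: sum(auth[a] for h2, a in edges if h2 == h) for h in hub}
    na = sum(v * v for v in auth.values()) ** 0.5   # L2 normalization
    nh = sum(v * v for v in hub.values()) ** 0.5
    auth = {a: v / na for a, v in auth.items()}
    hub = {h: v / nh for h, v in hub.items()}

print(sorted(auth.items(), key=lambda kv: -kv[1]))
# "screen" (pointed to by two indicators) outranks "lens" (one indicator)
```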

Feature Ranking

• Utilize feature relevance and frequency
• Step 1. Compute the authority score s(f) using power iteration.
• Step 2. Compute the final score by

S(f) = s(f) · log(freq(f))

where freq(f) is the frequency of feature f, and s(f) is the authority score of feature f.
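A tiny sketch of Step 2; the authority scores and raw counts below are invented for illustration.

```python
# Final ranking score S(f) = s(f) * log(freq(f)).
from math import log

auth = {"screen": 0.80, "lens": 0.45, "valley": 0.30}   # s(f) from HITS
freq = {"screen": 120, "lens": 15, "valley": 40}        # raw feature counts

score = {f: auth[f] * log(freq[f]) for f in auth}
for f in sorted(score, key=score.get, reverse=True):
    print(f, round(score[f], 2))
```

The log damps the frequency term, so a very frequent but weakly relevant candidate cannot swamp a clearly relevant one.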

Data Sets & Evaluation Metrics

Data Set     Cars    Mattress   Phone    LCD
# of Sent.   2223    13233      15168    1783

"Cars" and "Mattress" are from product review sites; "Phone" and "LCD" are from forum sites.

Precision@N metric: the percentage of correct features among the top N feature candidates in a ranked list.
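For clarity, Precision@N in code; the ranked list and gold set are invented.

```python
# Precision@N: the fraction of the top-N ranked candidates that are correct.
def precision_at_n(ranked, gold, n):
    return sum(1 for f in ranked[:n] if f in gold) / n

ranked = ["screen", "lens", "current", "valley"]  # invented ranked candidates
gold = {"screen", "lens", "valley"}               # invented correct features
print(precision_at_n(ranked, gold, 3))            # -> 0.666...
```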

Recall & Precision Comparison

(Chart: Our Recall, DP Recall, Our Precision, and DP Precision per data set. Results of 1000 sentences.)

Recall & Precision Comparison

(Chart: Our Recall, DP Recall, Our Precision, and DP Precision per data set. Results of 2000 sentences.)

Recall & Precision Comparison

(Chart: Our Recall, DP Recall, Our Precision, and DP Precision on Mattress and Phone. Results of 3000 sentences.)

Ranking Comparison

Precision at top 50:

Data Set   Our Precision   DP Precision
Cars       0.94            0.90
Mattress   0.76            0.76
Phone      0.84            0.81
LCD        0.64            0.68

Ranking Comparison

Precision at top 100:

Data Set   Our Precision   DP Precision
Cars       0.88            0.85
Mattress   0.75            0.73
Phone      0.82            0.80
LCD        0.65            0.68

Ranking Comparison

Precision at top 200:

Data Set   Our Precision   DP Precision
Cars       0.80            0.79
Mattress   0.76            0.75
Phone      0.71            0.70

Conclusion

• Use part-whole and "no" patterns to increase recall.

• Rank extracted feature candidates by feature importance, determined by two factors:
• Feature relevance (computed by applying HITS)
• Feature frequency

Thank you
