extracting and ranking product features in opinion documents
DESCRIPTION
This is a presentation we made in the Spring 2012 Data Mining class at Tsinghua University. The presentation is about a paper by Lei Zhang, Bing Liu, Suk Hwan Lim, and Eamonn O'Brien-Strain.

TRANSCRIPT
Extracting and Ranking Product Features in Opinion Documents
陈欣, 王鹤达, 张文昌 (Chen Xin, Wang Heda, Zhang Wenchang)
A Story
• Retina Display
• 3-axis gyro & accelerometer
• A4 CPU
• Multitasking
• FaceTime
• iBooks

• Antennagate
Why mining product features?
• Knowing clearly how consumers respond to a product helps a company win more market share.
• Consumers can also make better-informed choices when shopping.
Recent Research
• In recent years, opinion mining has been an active research area in NLP. One of its most important problems is extracting features from a corpus.
• Existing methods include HMM, ME, PMI, and CRF.
• Double Propagation is a state-of-the-art unsupervised technique for this problem, though it has significant limitations.
Double Propagation
• Proposed by researchers from the University of Illinois and Zhejiang University.
• Mainly extracts noun features; works well for medium-size corpora.
• Needs no additional resources beyond an initial seed opinion lexicon.
DP Mechanism
Basic assumption: features are nouns/noun phrases and opinion words are adjectives.
Dependency grammar: describes the dependency relations between words in a sentence, including direct relations (a)(b) and indirect relations (c)(d).
Example: "The camera has a good lens." ("camera" is the class, "good" the opinion word, "lens" the feature.)
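The propagation between opinion words and features described above can be sketched in code. This is a minimal, illustrative version of one Double Propagation loop, not the paper's implementation: the `amod` relation name, the POS tags, and the toy sentence are assumptions, and only two of the extraction rules are shown.

```python
# A minimal sketch of Double Propagation, assuming each sentence is given
# as dependency triples (head, head_pos, relation, dependent, dep_pos).
# The rule set is simplified to two of the direct-relation rules.

def double_propagation(sentences, seed_opinions):
    opinions = set(seed_opinions)
    features = set()
    changed = True
    while changed:                      # iterate until nothing new is found
        changed = False
        for sent in sentences:
            for head, head_pos, rel, dep, dep_pos in sent:
                # Rule: a known opinion adjective modifying a noun
                # marks that noun as a feature candidate.
                if rel == "amod" and dep in opinions and head_pos == "NN":
                    if head not in features:
                        features.add(head); changed = True
                # Rule: an adjective modifying a known feature
                # becomes a new opinion word.
                if rel == "amod" and head in features and dep_pos == "JJ":
                    if dep not in opinions:
                        opinions.add(dep); changed = True
    return features, opinions

sent = [("lens", "NN", "amod", "good", "JJ"),
        ("lens", "NN", "amod", "sharp", "JJ"),
        ("screen", "NN", "amod", "sharp", "JJ")]
feats, ops = double_propagation([sent], {"good"})
print(sorted(feats), sorted(ops))  # ['lens', 'screen'] ['good', 'sharp']
```

Starting from the single seed "good", the loop first finds "lens", which makes "sharp" an opinion word, which in turn reveals "screen": this chain is exactly the propagation the method is named after.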
DP Limitations
Non-opinion adjectives (e.g. "current", "entire") may be extracted as opinion words. This introduces more and more noise during the extraction process.
Some important features are not modified by any opinion word.
Example: "There is a valley on my mattress." ("valley" is a feature, but no opinion word modifies it.)
Proposed Methods
Two-Step feature mining method:
Feature extraction
• Double Propagation
• Part-whole pattern
• "No" pattern
Feature ranking
• A new angle on the noise problem.
• Use relevance & frequency to rank features.
Ranking Principles
• Three strong clues indicate a correct feature:
  • Modified by multiple opinion words.
  • Extracted by multiple part-whole patterns.
  • A combination of part-whole patterns, the "no" pattern, and opinion-word modification.
• Frequent appearance indicates an important feature.
Process
• Feature extraction
  • part-whole relation
  • "no" pattern
• Feature ranking
  • HITS algorithm
  • consider frequency
Part-whole relation
• Ambiguous / unambiguous patterns
• Phrase patterns (NP: noun phrase; CP: class concept phrase)
  • NP + Prep + CP
  • CP + with + NP
  • NP CP or CP NP
• Sentence pattern
  • CP Verb NP
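The first two phrase patterns can be illustrated with a toy matcher over POS-tagged tokens. This is a sketch under assumptions: the class-word list, preposition list, tag names, and example sentence are all made up for illustration, and a real system would match parsed noun phrases rather than single nouns.

```python
# Illustrative matcher for two part-whole phrase patterns:
#   NP + Prep + CP  ("valley on the mattress" -> candidate "valley")
#   CP + with + NP  ("mattress with a valley" -> candidate "valley")

CLASS_WORDS = {"mattress", "car"}   # CP: class concept words (assumed list)
PREPS = {"of", "in", "on"}

def part_whole_candidates(tagged):
    """tagged: list of (word, pos) tuples for one sentence."""
    found = []
    for i, (w, pos) in enumerate(tagged):
        # NP + Prep + CP: noun followed by a preposition and a class word.
        if pos == "NN" and i + 2 < len(tagged):
            rest = [t[0].lower() for t in tagged[i + 1:i + 4]]
            if rest[0] in PREPS and any(c in rest for c in CLASS_WORDS):
                found.append(w)
        # CP + with + NP: class word followed by "with" and a noun phrase.
        if w.lower() in CLASS_WORDS and i + 1 < len(tagged) \
                and tagged[i + 1][0].lower() == "with":
            nouns = [t[0] for t in tagged[i + 2:] if t[1] == "NN"]
            if nouns:
                found.append(nouns[0])
    return found

sent = [("a", "DT"), ("valley", "NN"), ("on", "IN"),
        ("the", "DT"), ("mattress", "NN")]
print(part_whole_candidates(sent))  # ['valley']
```

Note how this finds "valley" in the mattress example from the DP-limitations slide even though no opinion word is present, which is exactly the recall gap these patterns are meant to close.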
“no” Pattern
• no + feature
  • e.g. "no noise", "no indentation"
• Exceptions
  • e.g. "no problem", "no offense"
  • handled via a manually compiled exception list
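The "no" pattern is simple enough to show end to end. A minimal sketch, assuming POS-tagged input; the exception entries come from the slide's own examples.

```python
# "no" pattern: a noun right after "no" is a feature candidate,
# unless it is on a manually compiled exception list.

EXCEPTIONS = {"problem", "offense"}  # idioms from the slide's examples

def no_pattern_features(tagged):
    """tagged: list of (word, pos) tuples; returns extracted feature nouns."""
    feats = []
    for i in range(len(tagged) - 1):
        word, _ = tagged[i]
        nxt, nxt_pos = tagged[i + 1]
        if word.lower() == "no" and nxt_pos == "NN" \
                and nxt.lower() not in EXCEPTIONS:
            feats.append(nxt.lower())
    return feats

print(no_pattern_features([("no", "DT"), ("noise", "NN"), (",", ","),
                           ("no", "DT"), ("problem", "NN")]))  # ['noise']
```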
Apply HITS Algorithm
• HITS algorithm
  • hub score / authority score
  • iterate to convergence
• Applying HITS here
  • split candidates into features and feature indicators
  • use a directed graph
  • compute feature relevance
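The bullets above can be sketched as a compact power iteration over the directed graph the slide describes, with feature indicators (opinion words, patterns) as hubs and feature candidates as authorities. The edge list and scores below are toy data, not results from the paper.

```python
# HITS power iteration on a bipartite indicator -> feature graph.
# hub(i)  = sum of authority scores of features i points to
# auth(f) = sum of hub scores of indicators pointing to f
# Both score vectors are L2-normalized each iteration.

def hits(edges, n_iter=50):
    """edges: list of (indicator, feature) pairs; returns authority scores."""
    indicators = {i for i, _ in edges}
    features = {f for _, f in edges}
    hub = {i: 1.0 for i in indicators}
    auth = {f: 1.0 for f in features}
    for _ in range(n_iter):
        auth = {f: sum(hub[i] for i, g in edges if g == f) for f in features}
        norm = sum(v * v for v in auth.values()) ** 0.5
        auth = {f: v / norm for f, v in auth.items()}
        hub = {i: sum(auth[f] for j, f in edges if j == i) for i in indicators}
        norm = sum(v * v for v in hub.values()) ** 0.5
        hub = {i: v / norm for i, v in hub.items()}
    return auth

edges = [("good", "lens"), ("sharp", "lens"), ("good", "screen")]
auth = hits(edges)
assert auth["lens"] > auth["screen"]  # lens is backed by more indicators
```

A feature supported by many distinct indicators ends up with a higher authority score, which is how the ranking principles ("modified by multiple opinion words", "extracted by multiple patterns") are realized.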
Feature Ranking
• Utilize feature relevance and frequency
• Step 1. Compute the authority score A(f) using power iteration.
• Step 2. Compute the final score

S(f) = A(f) × log(freq(f))

where freq(f) is the frequency of feature f, and A(f) is the authority score of feature f.
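Step 2 in code, assuming the final score combines authority and log-frequency as described on the slide; the authority and frequency numbers are made up for the example.

```python
# Final ranking score: S(f) = A(f) * log(freq(f)).
# The log damps frequency so a very frequent but weakly supported
# candidate cannot dominate a well-supported one.
import math

def final_scores(authority, freq):
    return {f: authority[f] * math.log(freq[f]) for f in authority}

scores = final_scores({"lens": 0.9, "screen": 0.4},
                      {"lens": 20, "screen": 5})
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['lens', 'screen']
```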
Data Sets & Evaluation Metrics
Data Sets    Cars    Mattress    Phone    LCD
# of Sent.   2223    13233       15168    1783
“Cars” and “Mattress”: product review sites.“Phone” and “LCD”: forum sites.
Precision@N metric: the percentage of the top N feature candidates in a ranked list that are correct features.
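The metric is one line of code. The ranking and gold set below are toy examples, not data from the evaluation.

```python
# Precision@N: fraction of the top-N ranked candidates that are
# correct (gold) features.

def precision_at_n(ranked, gold, n):
    top = ranked[:n]
    return sum(1 for f in top if f in gold) / len(top)

ranked = ["lens", "screen", "case", "battery"]
gold = {"lens", "battery", "screen"}
print(precision_at_n(ranked, gold, 2))  # 1.0
print(precision_at_n(ranked, gold, 4))  # 0.75
```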
Recall & Precision Comparison
[Bar chart comparing Our Recall, DP Recall, Our Precision, and DP Precision on the four data sets]
Results of 1000 sentences
Recall & Precision Comparison
[Bar chart comparing Our Recall, DP Recall, Our Precision, and DP Precision on the four data sets]
Results of 2000 sentences
Recall & Precision Comparison

[Bar chart comparing Our Recall, DP Recall, Our Precision, and DP Precision on Mattress and Phone]

Results of 3000 sentences
Ranking Comparison
Precision        Cars    Mattress    Phone    LCD
Our Precision    0.94    0.76        0.84     0.64
DP Precision     0.90    0.76        0.81     0.68
Precision at top 50
Ranking Comparison

Precision        Cars    Mattress    Phone    LCD
Our Precision    0.88    0.75        0.82     0.65
DP Precision     0.85    0.73        0.80     0.68

Precision at top 100
Ranking Comparison

Precision        Cars    Mattress    Phone
Our Precision    0.80    0.76        0.71
DP Precision     0.79    0.75        0.70

Precision at top 200
Conclusion
• Use part-whole and "no" patterns to increase recall.
• Rank extracted feature candidates by feature importance, determined by two factors:
  • Feature relevance (computed with HITS)
  • Feature frequency
Thank you