Copyright 2009 by CEBT
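Meeting
Lab move: tentatively scheduled for March 28 (Sat) to 29 (Sun)
– Get quotes for the packing and moving service and for relocating and reinstalling the heating/cooling units
KIISE database journal: first-round review completed
– Fix typos
– Add explanations of the equations, as requested
STFSSD: prepare the presentation slides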
Semantic Tech & Context - 1
A Holistic Approach to Product Review Summarization
Jung-Yeon Yang, Jaeseok Myung, Sang-goo Lee
Department of Computer Science and Engineering, Seoul National University
Center for E-Business Technology, Seoul National University
Seoul, Korea
Outline
Introduction
Related Work
Motivation
Proposed Models
Process of a Review Summarization
Feature Extraction
Sentiment Analysis
Feature Scoring
Experiment
Conclusion & Future work
Introduction
Product reviews
– Reviews contain users' opinions about a product
– Many customers consult other people's reviews before buying a product
– As the number of reviews grows, it becomes hard to read them all and grasp the overall opinion
Review summarization
– Shows the overall opinion at a glance
– Shows an evaluation of the product
  – Overall score for the product
  – Score for each representative feature
  – An evaluation should be given for each product feature
Opinion mining
– Finds users' opinions in a text
– Finds representative features
Related Work
Feature extraction
– Uses word frequencies
– Uses the structural information of sentences in a review
Sentiment analysis
– Natural Language Processing (NLP)-based approach
  – Uses a word corpus (WordNet or SentiWordNet)
– Computational statistics-based approach
  – Uses the Point-wise Mutual Information (PMI) between opinion words
Feature scoring: calculates an evaluation score for each feature
– Uses a sentiment score taken from WordNet or SentiWordNet
– Uses the rating score of a review document
[Figure: Review documents → Feature Extraction → Sentiment Analysis → Feature Scoring → Summary]
Related Work (Cont.)
Opinion Observer: uses NLP; summation of sentimental polarities
RedOpal: uses rating scores; based on a specific feature
Pulse: uses term frequencies; clustering
Motivation
Problems in previous work
– Workload of extracting features
  – Many strategies and methods are involved
– Using a word corpus
  – Sentiment polarities are based on the general usage of words
  – It cannot deal with context-sensitive words (e.g. big, small, long, short, …)
– Using the rating score of a review
  – In previous work, all features extracted from the same review receive the same evaluation score
  – Each feature should have its own evaluation score in every review
Challenges
– A dynamic and easy method for extracting features is needed (through tools)
– We want to find the meaning of an opinion about a feature when it is expressed with context-sensitive words
– A better way of scoring a product feature is needed
Example: using user scores of reviews
[Table: 11 reviews (rows: three rated ★★★★★, three ★★★★, three ★★★, two ★) against the features Size, Cost, Design, Utility, Shutter speed, Battery time, A/S, and Color (columns). An O marks each feature that a review mentions, and each marked feature is assigned the review's rating score (5, 4, 3, or 1) as its evaluation, on a scale from Bad to Good.]
Example: Considering sentimental polarities
[Table: 11 reviews (rows: three rated ★★★★★, three ★★★★, three ★★★, two ★) against the features Size, Cost, Design, Utility, Shutter speed, Battery time, A/S, and Color (columns), with an O for each feature a review mentions, on a scale from Bad to Good.]
Sample review (rating score: ★★★★): "The size of camera is good to hold in one hand and comfortable. A design is so cool, nice body!! But battery time is short. So, in outdoor, additional batteries are needed. This camera is almost perfect!!"
Proposed Models
Review Model
Each review R_j (j = 1, …, n) has a user score us_j and one row per feature i = 1, …, m:
(f_ij, o_ij, st_ij, sp_ij, e_ij)
R: review
us: user score
f: feature
o: opinion
st: strength of an opinion
sp: sentimental polarity of an opinion
e: evaluation score of a feature in a review
E: overall evaluation score of a feature
Review Summarization Model
[Figure: within a review R_j with user score us_j, each pair (f_ij, o_ij) yields st_ij and sp_ij, which yield e_ij; the e_ij values of feature i are combined into E_i.]
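The review model can be pictured as a small data structure. The sketch below is only an illustration: the class names, field names, and the sample values are assumptions chosen to mirror the slide's symbols (us, f, o, st, sp, e), not code from the talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeatureOpinion:
    """One (f, o, st, sp, e) row of a review; names are hypothetical."""
    feature: str             # f_ij
    opinion: str             # o_ij
    strength: float          # st_ij
    polarity: float          # sp_ij: > 0 positive, < 0 negative
    evaluation: float = 0.0  # e_ij, filled in later by feature scoring


@dataclass
class Review:
    """A review R_j with its user score us_j and its feature rows."""
    user_score: float
    rows: List[FeatureOpinion] = field(default_factory=list)


def features_of(review: Review) -> List[str]:
    """Return the features mentioned in a review, in order."""
    return [row.feature for row in review.rows]


# Made-up sample modeled on the 4-star camera review; strengths are placeholders.
review = Review(
    user_score=4.0,
    rows=[
        FeatureOpinion("size", "good", 1.0, +1.0),
        FeatureOpinion("design", "cool", 1.0, +1.0),
        FeatureOpinion("battery time", "short", 1.0, -1.0),
    ],
)
```

Keeping e_ij on the row and E_i outside the review reflects the two models: the row belongs to one review, while the overall score of a feature is aggregated across reviews.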
Process of a Review Summarization
[Figure: Product reviews (title, main text, reviewer, review date, rate) → Feature extraction → Sentiment analysis → Feature scoring → Review summary]
Feature extraction: review parser, POS tagger, N-gram, pattern rules, word frequency; extracts features and opinion words to produce feature-opinion pairs
Sentiment analysis: constructs sentiment dictionaries automatically; classifies sentiment polarity to produce the sentiment polarities of features
Feature scoring: feature co-occurrence, feature frequency, sentiment distribution; derives a score for each feature to produce the evaluation scores of product features
Feature Extraction
PicAChoo (Pick And Choose): a text analyzing framework
– Reduces the manual effort needed to obtain feature and opinion words
– Enables dynamic composition of several extraction methods
  – 4 primitive methods (frequency, co-occurrence, sequential pattern, plug-in)
  – 2 composite methods (logical and arithmetical)
– Utilizes the characteristics of textual data
[Figure: Documents → Preprocessing → Tokenized document → Composition of primitive extraction methods (frequency, co-occurrence, pattern rules, …) → Selected words → Opinion mining, Summarization, User modeling, …]
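To make the idea of composing primitive extraction methods concrete, here is a minimal sketch with two primitives (frequency and co-occurrence) and the logical composites. It is not PicAChoo's API: every function name, threshold, and sample document below is an assumption made for illustration, and the arithmetical composite is left out.

```python
from collections import Counter


def by_frequency(docs, min_count):
    """Primitive: words that occur at least min_count times over all documents."""
    counts = Counter(word for doc in docs for word in doc)
    return {w for w, c in counts.items() if c >= min_count}


def by_cooccurrence(docs, seed, min_count):
    """Primitive: words sharing a document with `seed` at least min_count times."""
    counts = Counter()
    for doc in docs:
        words = set(doc)
        if seed in words:
            counts.update(words - {seed})
    return {w for w, c in counts.items() if c >= min_count}


def logical_and(*selections):
    """Composite: keep only words selected by every method."""
    return set.intersection(*selections)


def logical_or(*selections):
    """Composite: keep words selected by at least one method."""
    return set.union(*selections)


# Tokenized toy documents (made up).
docs = [
    ["battery", "short", "design", "cool"],
    ["battery", "long", "size", "good"],
    ["design", "cool", "battery", "short"],
]

frequent = by_frequency(docs, min_count=2)
near_battery = by_cooccurrence(docs, seed="battery", min_count=2)
selected = logical_and(frequent, near_battery)
```

Because each primitive returns a plain set of words, new methods can be plugged in and combined without changing the others, which is one way to read "dynamic composition".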
Sentiment Analysis
Find the sentimental polarities of the opinions in reviews
– Consider the context of an opinion word
– SO = SA(opinion word, product category, product feature, user's evaluation)
Point-wise Mutual Information (PMI): a measure of association between two words
\[ \mathrm{PMI}(word_1, word_2) = \log \frac{p(word_1, word_2)}{p(word_1)\, p(word_2)} \]
\[ \mathrm{PMI}_{pos}(f, o) = \mathrm{PMI}(f, o), \quad \text{when } OF \in Dic_{positive} \]
\[ \mathrm{PMI}_{neg}(f, o) = \mathrm{PMI}(f, o), \quad \text{when } OF \in Dic_{negative} \]
\[ \text{Sentiment Orientation}(f, o) = \mathrm{PMI}_{pos}(f, o) - \mathrm{PMI}_{neg}(f, o) \]
[Figure: Review documents → Sentiment Analysis, which consults a positive word dictionary and a negative word dictionary and turns each (feature, opinion) pair into (feature, opinion, polarity)]
Dictionaries
– Built automatically
– Use user scores
– Use POS tagging
– Dic. = {reviewID, catID, type, POS, word, userScore, s_no, w_no}
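The following sketch shows how a PMI-based sentiment orientation, PMI_pos(f, o) minus PMI_neg(f, o), can resolve a context-sensitive word such as "long" or "short". Several things here are assumptions rather than the authors' implementation: probabilities are estimated as the fraction of sentences containing a word, a small constant is added to every count so the logarithm stays defined, and the two sentence sets stand in for the positive and negative dictionaries (assumed to have been split by user score beforehand).

```python
import math


def pmi(sentences, word1, word2, smoothing=0.01):
    """PMI(word1, word2) = log(p(word1, word2) / (p(word1) * p(word2))).

    Probabilities are sentence-level relative frequencies; `smoothing` is an
    assumed constant added to every count to avoid log(0).
    """
    n = len(sentences) + smoothing
    c1 = sum(1 for s in sentences if word1 in s) + smoothing
    c2 = sum(1 for s in sentences if word2 in s) + smoothing
    c12 = sum(1 for s in sentences if word1 in s and word2 in s) + smoothing
    return math.log((c12 / n) / ((c1 / n) * (c2 / n)))


def sentiment_orientation(pos_sentences, neg_sentences, feature, opinion):
    """PMI measured in the positive set minus PMI measured in the negative set."""
    return (pmi(pos_sentences, feature, opinion)
            - pmi(neg_sentences, feature, opinion))


# Made-up tokenized sentences from high-score and low-score reviews.
pos_sentences = [
    ["battery", "long"], ["battery", "long"],
    ["delay", "short"], ["size", "small"],
]
neg_sentences = [
    ["battery", "short"], ["battery", "short"],
    ["delay", "long"], ["size", "big"],
]

so_long = sentiment_orientation(pos_sentences, neg_sentences, "battery", "long")
so_short = sentiment_orientation(pos_sentences, neg_sentences, "battery", "short")
```

With this toy data, "long" comes out positive for the battery feature and "short" negative, even though "short" is positive for "delay"; a general-purpose word corpus cannot make that distinction.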
Feature Scoring
Scoring strategies
– Only use the user score (previous work)
– Consider the distribution of the sentimental polarities of users' opinions
[Table: reviews R1 … Rn (rows) against features f1 … fn (columns); each filled cell is P (positive opinion) or N (negative opinion)]
– Use the distribution of sentimental polarities within the same review
– Calculate the evaluation score of each feature by adjusting the rating score
Summary = { E_1, E_2, …, E_i, …, E_m }, m = number of features
\[ e_{ij} = \begin{cases} us_j \times st_{ij} \times \left(1 + \dfrac{N_{neg}(R_j)}{N_{total}(R_j)}\right), & \text{if } sp_{ij} \text{ is a positive number} \\ us_j \times st_{ij} \times \left(1 - \dfrac{N_{pos}(R_j)}{N_{total}(R_j)}\right), & \text{if } sp_{ij} \text{ is a negative number} \end{cases} \]
\[ E_i = \frac{1}{n} \sum_{k=1}^{n} e_{ik}, \quad n = \text{number of reviews that contain the } i\text{th feature} \]
N_total(R_j) = number of opinions in the jth review
N_pos(R_j) = number of positive opinions in the jth review
N_neg(R_j) = number of negative opinions in the jth review
st_ij = [expression in F(f_i, j); not recoverable from the transcript]
F(f_i, j) = frequency of f_i in the jth review
sp_ij = Sentiment Polarity(f_ij, o_ij)
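A small worked sketch of the scoring step follows. It assumes one reading of the slide's formula: a positive opinion scores us_j × st_ij × (1 + N_neg(R_j) / N_total(R_j)) and a negative opinion scores us_j × st_ij × (1 − N_pos(R_j) / N_total(R_j)), with E_i the mean of e_ij over the reviews that mention feature i. The signs, the multiplications, the function names, and the strength value of 1.0 are all assumptions; the strength st_ij is simply passed in.

```python
def evaluation_score(user_score, strength, polarity, n_pos, n_neg, n_total):
    """e_ij under the assumed reading: the rating is adjusted by the share of
    opposite-polarity opinions in the same review. The result is not clamped
    to the rating scale."""
    if polarity > 0:
        return user_score * strength * (1 + n_neg / n_total)
    return user_score * strength * (1 - n_pos / n_total)


def overall_score(scores):
    """E_i: mean of e_ij over the reviews that contain feature i."""
    return sum(scores) / len(scores)


# The 4-star camera review: size (+), design (+), battery time (-).
size_score = evaluation_score(4.0, 1.0, +1, n_pos=2, n_neg=1, n_total=3)
battery_score = evaluation_score(4.0, 1.0, -1, n_pos=2, n_neg=1, n_total=3)

# A second, made-up 5-star review that only praises battery time.
battery_score_2 = evaluation_score(5.0, 1.0, +1, n_pos=1, n_neg=0, n_total=1)
battery_overall = overall_score([battery_score, battery_score_2])
```

Under these assumptions the 4-star review gives size about 5.33 and battery time about 1.33, instead of the flat 4 that both features would receive if only the user score were used.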
Experiments
Data: ePinions.com

Product category | Reviews | Positive reviews | Negative reviews | Product features | <f, o> pairs | Context-sensitive words
Hand phone       | 2947    | 2196 (74.5%)     | 418 (25.5%)      | 48               | 734          | 124 (16.9%)
Digital camera   | 12917   | 9940 (76.9%)     | 1740 (23.1%)     | 37               | 974          | 137 (14.1%)

Sentiment Analysis: precision of our method vs. the previous method (PMI using Web document search)

Product category | Ours: all | Ours: context-non-sensitive | Ours: context-sensitive | Previous: all | Previous: context-non-sensitive | Previous: context-sensitive
Hand phone       | 0.784     | 0.775                       | 0.847                   | 0.786         | 0.834                           | 0.508
Digital camera   | 0.764     | 0.758                       | 0.797                   | 0.817         | 0.866                           | 0.515

Feature Scoring
– Improvement of our method over previous work: about 20%
Conclusion
Proposed two models
– Product review model
– Review summarization model
Proposed new approaches to summarizing product reviews
– Handling context-sensitive words in the sentiment analysis process
– A feature scoring method
  – Utilizes user scores and the sentimental polarities of opinions
Developed a text analyzing framework for feature extraction
[Figure: uKnow, iKnow, and weKnow. Diagram labels: Feature Extraction, <Feature, Opinion> Pairs, Opinion Extraction, Sentiment Clause, Sentiment Analysis, Feature Scoring, Feature Score, Score Summarization, Product Summary, Product Recommend, Product Comparison]
NLP approach
– Use parse trees
– Use the sentiment dictionary (defined manually by experts)
– Find the sentimental polarities of features
– Derive scores of <Feature, Opinion> pairs
Statistical approach
– Use probabilities
– Use the POS tags
– Use the sentiment dictionaries (constructed automatically)
– Use rating data of reviews
– Use PMI values between feature and opinion
– Derive the sentimental polarities

– Use rating data of reviews
– Use frequencies of features
– Use a distribution of sentiments

– Use the users' profiles
– Use inputs from users
– Use comparative objects
E-mail : [email protected]
Intelligent Database Systems Lab. : http://ids.snu.ac.kr
Thank you
Q & A