Extracting Opinion Topics for Chinese Opinions using Dependence Grammar
Guang Qiu, Kangmiao Liu, Jiajun Bu*, Chun Chen, Zhiming Kang
Reporter: Chia-Ying Lee Advisor: Prof. Hsin-Hsi Chen
Zhejiang University (浙江大學)
SIGKDD Workshop on Data Mining and Audience Intelligence for Advertising (ADKDD '07)
Introduction
Problem definition: determining whether a sentence is an opinion sentence, and extracting the topic from an opinion sentence.
Advertisement recommendation systems recommend ads without considering the sentiment polarity of the text.
When a user complains about a product, a reasonable advertisement would be one for a rival product or for a solution to the user's complaint.
Soo-Min Kim and Eduard Hovy: an opinion is described as a quadruple of Topic, Holder, Claim, and Sentiment.
Most previous work focuses only on sentiment classification, assuming the topics are given a priori.
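As a minimal sketch, the opinion quadruple described above could be held in a small record. The class name `Opinion`, its field names, and the example values are illustrative assumptions; none of them come from the slides or the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Opinion:
    """Hypothetical container for the (Topic, Holder, Claim, Sentiment) quadruple."""
    topic: Optional[str]   # what the opinion is about; None until it is extracted
    holder: Optional[str]  # who expresses the opinion
    claim: str             # the opinion sentence itself
    sentiment: str         # "POS", "NEG" or "NEU"


# Invented example: the sentiment is known, the topic still has to be extracted.
example = Opinion(topic=None, holder="blogger",
                  claim="The battery dies too fast.", sentiment="NEG")
needs_topic_extraction = example.topic is None
```

The point of the record is only to show where topic extraction fits: sentiment classification fills `sentiment`, while the work presented here targets the `topic` field.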
Related Work (1/2)
Sentiment Classification
1. Hatzivassiloglou and McKeown, 1997: Infers adjective polarity from pairs of adjectives conjoined by and, or, but, either-or, or neither-nor
2. Wiebe, 2000: Focuses on subjectivity tagging, which distinguishes opinion sentences from factual ones
Related Work (2/2)
3. Esuli and Sebastiani, 2005: Text classification by glosses.
4. Ku, Liang, and Chen, 2006: The sentiment orientation of a sentence can be inferred from that of its words
5. Pang, Lee, and Vaithyanathan, 2002: Machine learning methods: Naive Bayes, Maximum Entropy, and SVM.
Method - Acquire sentiment words (1/2)
Assumptions:
1. Sentences containing sentiment words are regarded as opinion sentences
2. A topic is assumed to exist in each of these opinion sentences
Data sets of sentiment words:
WS1: NTUSD, 2812 positive words and 8276 negative words
D1: Emotion classification by Bruce and Wiebe, 1256 positive blog articles and 1238 negative ones
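The first assumption amounts to a lexicon lookup, which can be sketched as follows. The word sets are tiny invented stand-ins for a lexicon such as NTUSD, the function names are hypothetical, and sentences are assumed to be word-segmented already.

```python
# Toy stand-ins for a sentiment lexicon; the entries are invented examples,
# not words copied from NTUSD.
POSITIVE_WORDS = {"好", "漂亮", "满意"}    # good, beautiful, satisfied
NEGATIVE_WORDS = {"差", "失望", "糟糕"}    # poor, disappointed, terrible


def sentiment_words(tokens):
    """Return the tokens that appear in either sentiment word list."""
    return [t for t in tokens if t in POSITIVE_WORDS or t in NEGATIVE_WORDS]


def is_opinion_sentence(tokens):
    """Assumption 1: a sentence containing a sentiment word is an opinion sentence."""
    return len(sentiment_words(tokens)) > 0


opinion = ["屏幕", "很", "漂亮"]               # screen / very / beautiful
neutral = ["我", "昨天", "买", "了", "手机"]    # I / yesterday / bought / (particle) / phone
```

Under this sketch `opinion` is flagged because it contains 漂亮, while `neutral` is not, since none of its tokens is in either list.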
Method - Acquire sentiment words (2/2)
D2: Blog search results from Baidu
372 product names as queries, 24146 snippets
1685 snippets manually labeled as POS, NEG, or NEU
Select adjectives correlated with the adjectives in D1
Calculate the probability that each word occurs in each sentiment category
WS1+D1+D2: 3269 positive words and 9621 negative words
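The slides do not give the formula used for the per-category probability. One plausible reading, sketched here as an assumption, is a relative-frequency estimate of P(category | word) over the manually labeled snippets. The function name and the toy data are invented for illustration.

```python
from collections import Counter, defaultdict


def category_probabilities(labeled_snippets):
    """Estimate P(category | word) by relative frequency (assumed formula).

    labeled_snippets: iterable of (tokens, label) pairs,
    with label in {"POS", "NEG", "NEU"}.
    """
    counts = defaultdict(Counter)          # word -> Counter of labels
    for tokens, label in labeled_snippets:
        for word in set(tokens):           # count each word once per snippet
            counts[word][label] += 1
    probs = {}
    for word, label_counts in counts.items():
        total = sum(label_counts.values())
        probs[word] = {label: n / total for label, n in label_counts.items()}
    return probs


# Invented toy snippets, already tokenized.
snippets = [
    (["screen", "great"], "POS"),
    (["battery", "great"], "POS"),
    (["service", "great", "not"], "NEG"),
    (["battery", "poor"], "NEG"),
]
probs = category_probabilities(snippets)
```

A word whose probability mass concentrates in POS or NEG would be a candidate for the merged word list; the threshold for that decision is not stated in the slides.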
Method - Extracting the topics using rules (1/4)
<ROLE_SENTI, RELA, ROLE_TOPIC>
Method - Extracting the topics using rules (2/4)
1. <VOB, SIBLING, SBV>
2. <DE, GRANDPARENT-SIBLING, SBV>
Method - Extracting the topics using rules (3/4)
3. <ATT, PARENT-SIBLING, SBV>
4. <HED, CHILD, VOB>
Method - Extracting the topics using rules (4/4)
5. <HED, CHILD, SBV>
6. <ADV, SIBLING, VOB>
7. <ANY, NEAREST, ANYNOUN>
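The seven rules above can be sketched as a lookup over a dependency tree. This is an illustration under stated assumptions, not the authors' implementation: the `Token` class and helper names are hypothetical, the rules are tried in the listed order, and each RELA value is interpreted as the position of the topic word relative to the sentiment word (SIBLING: same parent; PARENT-SIBLING and GRANDPARENT-SIBLING: sibling of the sentiment word's parent or grandparent; CHILD: direct dependent; NEAREST: closest noun by word position).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    word: str
    pos: str               # coarse part of speech; "n" marks a noun
    role: str              # dependency label to the parent, e.g. "SBV", "HED"
    parent: Optional[int]  # index of the parent token, None for the root


# Rules 1-6 as <ROLE_SENTI, RELA, ROLE_TOPIC>, in the order listed on the slides.
RULES = [
    ("VOB", "SIBLING", "SBV"),
    ("DE", "GRANDPARENT-SIBLING", "SBV"),
    ("ATT", "PARENT-SIBLING", "SBV"),
    ("HED", "CHILD", "VOB"),
    ("HED", "CHILD", "SBV"),
    ("ADV", "SIBLING", "VOB"),
]


def ancestor(tokens, i, levels):
    """Index of the ancestor `levels` steps above token i, or None."""
    for _ in range(levels):
        if i is None:
            return None
        i = tokens[i].parent
    return i


def candidates(tokens, senti, rela):
    """Indices standing in relation `rela` to the sentiment word (assumed semantics)."""
    if rela == "CHILD":
        return [j for j, t in enumerate(tokens) if t.parent == senti]
    levels = {"SIBLING": 0, "PARENT-SIBLING": 1, "GRANDPARENT-SIBLING": 2}[rela]
    anchor = ancestor(tokens, senti, levels)
    if anchor is None or tokens[anchor].parent is None:
        return []
    shared = tokens[anchor].parent
    return [j for j, t in enumerate(tokens) if t.parent == shared and j != anchor]


def extract_topic(tokens, senti):
    """Return the topic word for the sentiment word at index `senti`, or None."""
    for role_senti, rela, role_topic in RULES:
        if tokens[senti].role != role_senti:
            continue
        for j in candidates(tokens, senti, rela):
            if tokens[j].role == role_topic:
                return tokens[j].word
    # Rule 7 <ANY, NEAREST, ANYNOUN>: fall back to the closest noun.
    nouns = [j for j, t in enumerate(tokens) if t.pos == "n" and j != senti]
    if not nouns:
        return None
    return tokens[min(nouns, key=lambda j: abs(j - senti))].word


# Hand-built toy parses (not parser output).
# "屏幕 很 漂亮" (screen / very / beautiful): rule 5 applies.
s1 = [Token("屏幕", "n", "SBV", 2), Token("很", "d", "ADV", 2),
      Token("漂亮", "a", "HED", None)]
# "手机 有 问题" (phone / has / problem): rule 1 applies.
s2 = [Token("手机", "n", "SBV", 1), Token("有", "v", "HED", None),
      Token("问题", "n", "VOB", 1)]
topic1 = extract_topic(s1, 2)
topic2 = extract_topic(s2, 2)
```

In a real pipeline the `role` and `parent` fields would come from a dependency parser; the toy parses here are written by hand only to exercise the rules.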
Experiments and Results (1/2)
Data:
Blog search results from Baidu
372 queries
22461 snippets not labeled as POS, NEG, or NEU
51661 sentences
Two annotators annotate topics and POS, NEG, NEU labels
570 sentences: 250 with sentiment and 320 neutral
Experiments and Results (2/2)
Opinion sentence detection: SVM (using unigram words as features)
Topic extraction: topics are correctly extracted from 218 of the 250 opinion sentences, an accuracy of 87.2%.
Evaluation criterion: exact match
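The reported accuracy is the share of opinion sentences whose extracted topic matches the annotated topic exactly. A minimal sketch of that computation follows; the function name and the toy topic lists are assumptions made for illustration.

```python
def exact_match_accuracy(predicted, gold):
    """Fraction of sentences whose extracted topic equals the annotated topic."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Invented toy example: three of four topics match exactly.
toy = exact_match_accuracy(["screen", "battery", "price", "phone"],
                           ["screen", "battery", "service", "phone"])

# The figure from the slide: 218 correct out of 250 opinion sentences.
reported = 218 / 250
```

With the slide's counts, `reported` works out to 0.872, which is the 87.2% quoted above.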
Conclusion
Proposed a rule-based approach to extracting topics from opinion sentences.
The approach parses each sentence syntactically and uses the syntactic roles of words and their dependency relationships to extract the topic.
Future Work
Negation words
Noise filtering method
Co-reference resolution
Enlarge current rules to cover more situations
HIT LTP system http://ir.hit.edu.cn/demo/ltp/ltp_v2.0.py
Thank you!