Copyright 2009 by CEBT Meeting Lab. Move: tentatively scheduled for March 28 (Sat) to 29 (Sun). Packing move...


Upload: elfrieda-oliver

Post on 13-Dec-2015


TRANSCRIPT

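Page 1:

Copyright 2009 by CEBT

Meeting Lab.

Move: tentatively scheduled for March 28 (Sat) to 29 (Sun)
– Quotes for the packing move and for relocating and installing the heating/cooling units

KIISE (Korean Institute of Information Scientists and Engineers) database journal paper: first-round review completed
– Fix typos
– Add explanations of the formulas, as requested

STFSSD: prepare the presentation material

Semantic Tech & Context - 1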

Page 2:

A Holistic Approach to Product Review Summarization

Jung-Yeon Yang, Jaeseok Myung, Sang-goo Lee
Department of Computer Science and Engineering, Seoul National University

Center for E-Business Technology, Seoul National University

Seoul, Korea

Page 3:

Outline

Introduction

Related Work

Motivation

Proposed Models

Process of a Review Summarization

Feature Extraction

Sentiment Analysis

Feature Scoring

Experiment

Conclusion & Future work

Page 4:

Introduction

Product reviews
  Reviews contain users' opinions about a product
  Many customers refer to others' reviews when they buy products
  As the number of reviews increases, it becomes hard to read and grasp them all

Review summarization
  To grasp the overall opinion at a glance
  Show the evaluation of the product
    – Overall score for the product
    – Score for each representative feature
    – An evaluation should be given for each product feature

Opinion mining
  To find users' opinions in a text
  To find representative features

Page 5:

Related Work

Feature extraction
  Frequencies of words
  Structural information of sentences in a review

Sentiment analysis
  Natural Language Processing (NLP)-based approach
    – Using a word corpus (WordNet or SentiWordNet)
  Computational statistics-based approach
    – Using Point-wise Mutual Information (PMI) between opinion words

Feature scoring
  Calculate an evaluation score for each feature
    – Use a sentiment score taken from WordNet or SentiWordNet
    – Use the rating score of a review document

[Figure: Review Doc. → Feature Extraction → Sentiment Analysis → Feature Scoring → Summary]

Page 6:

Related Work (Cont.)

< Opinion Observer > Using NLP, sentimental polarity summation

< RedOpal > Using rating scores, based on a specific feature

< Pulse > Using term frequencies, clustering

Page 7:

Motivation

Problems in previous work
  Workload of extracting features
    – Many strategies and methods
  Using a word corpus
    – Sentiment polarities are based on the general usage of words
    – It cannot deal with context-sensitive words (e.g. big, small, long, short, …)
  Using the rating score of a review
    – In previous work, all features extracted from the same review have the same evaluation score
    – Each feature should have its own evaluation score in every review

Challenges
  A dynamic and easy method of extracting features is needed (through tools)
  We want to find the meaning of an opinion about a feature that is modified by context-sensitive words
  A better way of scoring a product feature is needed

Page 8:

Example: using user scores of reviews

[Table: 11 reviews (rows, labeled by rating score: ★★★★★ ×3, ★★★★ ×3, ★★★ ×3, ★ ×2, on a Bad/Good scale) against product features (columns: Size, Cost, Design, Utility, Shutter speed, Battery time, A/S, Color). An O marks each feature mentioned in a review, and every marked feature receives that review's rating score (5, 4, 3, or 1).]
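The user-score-only strategy in this example can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, the review data, and the choice to average the inherited ratings per feature are all assumptions made for the sketch.

```python
from collections import defaultdict

def baseline_feature_scores(reviews):
    """Average, per feature, the rating of every review that mentions it."""
    ratings_by_feature = defaultdict(list)
    for rating, features in reviews:
        for feature in features:
            # Every feature mentioned in a review inherits that review's rating.
            ratings_by_feature[feature].append(rating)
    return {f: sum(r) / len(r) for f, r in ratings_by_feature.items()}

# Hypothetical data: (rating score, features mentioned in the review).
reviews = [
    (5, ["size", "design"]),
    (4, ["size", "battery time"]),
    (1, ["battery time", "A/S"]),
]
scores = baseline_feature_scores(reviews)
```

Note that "battery time" gets the same credit from the 4-star review as "size" does, whether or not the reviewer actually liked the battery.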

Page 9:

Example: Considering sentimental polarities

[Table: reviews (rows, labeled by rating score from ★ to ★★★★★, on a Bad/Good scale) against product features (columns: Size, Cost, Design, Utility, Shutter speed, Battery time, A/S, Color). An O marks each feature mentioned in a review.]

Sample review
Rating score: ★★★★
"The size of camera is good to hold in one hand and comfortable. A design is so cool, nice body!! But battery time is short. So, in outdoor, additional batteries are needed. This camera is almost perfect!!"

Page 10:

Proposed Models

Review Model
Each review $R_j$ ($j = 1, \dots, n$) carries a user score $us_j$ and one tuple per feature ($i = 1, \dots, m$):

$$R_j = \{\, us_j,\ (f_{ij},\ o_{ij},\ st_{ij},\ sp_{ij},\ e_{ij}) \mid i = 1, \dots, m \,\}$$

R : review
us : user score
f : feature
o : opinion
st : strength of an opinion
sp : sentimental polarity of an opinion
e : evaluation score of a feature in a review
E : overall evaluation score of a feature

Review Summarization Model
[Figure: within review $R_j$ with user score $us_j$, each feature-opinion pair $(f_{ij}, o_{ij})$ yields a strength $st_{ij}$ and a polarity $sp_{ij}$, which produce the evaluation score $e_{ij}$; the $e_{ij}$ values are combined into the overall score $E_i$.]
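As a rough illustration of the review model, the symbols in the legend map naturally onto two record types. The class and field names below are hypothetical, as is the encoding of the polarity as +1 / -1; the slides do not prescribe a representation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FeatureOpinion:
    feature: str            # f_ij
    opinion: str            # o_ij
    strength: float = 1.0   # st_ij
    polarity: int = 0       # sp_ij: +1 positive, -1 negative (assumed encoding)
    score: float = 0.0      # e_ij, filled in by the feature-scoring step

@dataclass
class Review:
    user_score: int                                   # us_j
    pairs: List[FeatureOpinion] = field(default_factory=list)

def scores_by_feature(reviews: List[Review]) -> Dict[str, List[float]]:
    """Group the per-review scores e_ij by feature, ready to be combined into E_i."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for review in reviews:
        for pair in review.pairs:
            grouped[pair.feature].append(pair.score)
    return dict(grouped)

# Hypothetical reviews, with scores already assigned.
r1 = Review(4, [FeatureOpinion("size", "good", polarity=+1, score=5.0),
                FeatureOpinion("battery time", "short", polarity=-1, score=1.0)])
r2 = Review(5, [FeatureOpinion("size", "small", polarity=+1, score=5.0)])
grouped = scores_by_feature([r1, r2])
```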

Page 11:

Process of a Review Summarization

[Figure: process flow]
Product Reviews (Title, Main text, Reviewer, Review date, Rate)
→ Feature extraction: extract features, extract opinion words (POS tagger, review parser, N-gram, pattern rules, word frequency)
→ Feature-opinion pairs
→ Sentiment analysis: classify sentiment polarity (sentiment dictionaries, constructed automatically)
→ Sentiment polarities of features
→ Feature scoring: derive a score of each feature (feature co-occurrence, feature frequency, sentiment distribution)
→ Evaluation scores of product features
→ Review Summary

Page 12:

Feature Extraction

PicAChoo (Pick And Choose; a text-analysis framework)
  Reducing the manual effort needed to obtain feature and opinion words
  Enabling dynamic composition of several extraction methods
    – 4 primitive methods (frequency, co-occurrence, sequential pattern, plug-in)
    – 2 composite methods (logical and arithmetical)
  Utilizing characteristics of textual data

[Figure: documents → Preprocessing → Tokenized Document → Composition of primitive extraction methods (freq., co-occurrence, pattern rules, …) → Selected Words → Opinion Mining, Summarization, User Modeling]
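The idea of composing primitive extraction methods can be sketched as follows. This is not PicAChoo's API, which the slides do not show: the function names, thresholds, and tokenized documents are hypothetical, and only two primitive methods (frequency, co-occurrence) and the logical composition are illustrated.

```python
from collections import Counter

def by_frequency(docs, min_count):
    """Primitive method: words occurring at least min_count times overall."""
    counts = Counter(word for doc in docs for word in doc)
    return {w for w, c in counts.items() if c >= min_count}

def by_cooccurrence(docs, seed, min_docs):
    """Primitive method: words sharing at least min_docs documents with seed."""
    counts = Counter()
    for doc in docs:
        words = set(doc)
        if seed in words:
            counts.update(words - {seed})
    return {w for w, c in counts.items() if c >= min_docs}

def logical_and(*selections):
    """Composite method: keep only words chosen by every primitive method."""
    return set.intersection(*selections)

def logical_or(*selections):
    """Composite method: keep words chosen by any primitive method."""
    return set.union(*selections)

# Hypothetical tokenized documents.
docs = [
    ["battery", "short", "camera"],
    ["battery", "long", "design"],
    ["design", "cool", "camera"],
]
frequent = by_frequency(docs, min_count=2)
near_battery = by_cooccurrence(docs, seed="battery", min_docs=1)
selected = logical_and(frequent, near_battery)
```

Because each method returns a plain set of words, new primitive methods can be combined without changing the composite ones.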

Page 13:

Sentiment Analysis

Find the sentimental polarities of opinions in reviews
  Consider the context of an opinion word
  SO = SA(opinion word, product category, product feature, user's evaluation)

Point-wise Mutual Information (PMI)
  A measure of association between two words

$$PMI(word_1, word_2) = \log \frac{p(word_1, word_2)}{p(word_1)\, p(word_2)}$$

$$PMI_{pos}(f, o) = PMI(f, o), \quad \text{when } f, o \in \text{positive Dic.}$$

$$PMI_{neg}(f, o) = PMI(f, o), \quad \text{when } f, o \in \text{negative Dic.}$$

$$\text{Sentiment Orientation}(f, o) = PMI_{pos}(f, o) - PMI_{neg}(f, o)$$

[Figure: Review Doc. → (feature, opinion) → Sentiment Analysis, which consults the positive word Dictionary and the negative word Dictionary → (feature, opinion, polarity)]

Dictionaries
• Built automatically
• Use user scores
• POS-tagging
Dic. = {reviewID, catID, type, POS, word, userScore, s_no, w_no}
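A small sketch of PMI-based sentiment orientation may help here. It assumes that the positive and negative dictionaries can be reduced to lists of (feature, opinion) pairs, that PMI is estimated from raw counts, that a pair absent from a dictionary contributes 0, and that the orientation is the difference between the positive and the negative PMI (the usual PMI-based formulation). All names and data are hypothetical.

```python
import math
from collections import Counter

def pmi(pairs, feature, opinion):
    """PMI(f, o) = log( p(f, o) / (p(f) * p(o)) ), estimated from pair counts.

    Returns 0.0 when the pair never occurs (an assumption of this sketch).
    """
    total = len(pairs)
    joint = Counter(pairs)[(feature, opinion)]
    if total == 0 or joint == 0:
        return 0.0
    f_count = sum(1 for f, _ in pairs if f == feature)
    o_count = sum(1 for _, o in pairs if o == opinion)
    return math.log((joint / total) / ((f_count / total) * (o_count / total)))

def sentiment_orientation(pos_pairs, neg_pairs, feature, opinion):
    """Positive result: the pair leans positive; negative result: it leans negative."""
    return pmi(pos_pairs, feature, opinion) - pmi(neg_pairs, feature, opinion)

# Hypothetical pairs taken from positively and negatively rated reviews.
pos_pairs = [("battery time", "long"), ("battery time", "long"),
             ("size", "small"), ("loading time", "short")]
neg_pairs = [("battery time", "short"), ("battery time", "short"),
             ("loading time", "long"), ("size", "big")]

so_long_battery = sentiment_orientation(pos_pairs, neg_pairs, "battery time", "long")
so_short_battery = sentiment_orientation(pos_pairs, neg_pairs, "battery time", "short")
so_short_loading = sentiment_orientation(pos_pairs, neg_pairs, "loading time", "short")
```

In this toy data the context-sensitive word "short" comes out negative for "battery time" and positive for "loading time", which a fixed word-level polarity could not express.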

Page 14:

Feature Scoring

Scoring strategies
  Only use the user score (in previous work)
  Consider the distribution of the sentimental polarities of users' opinions

[Table: reviews R1 … Rn (rows) against features f1 … fn (columns); a cell is marked P for a positive opinion or N for a negative opinion. Marks per review, without their feature columns: R1: P P N P N; R2: P P P; R3: N P P P; R4: P N N P; R5: P P; R6: P P P N; R7: N P P P; Rn: P N N.]

• Use the distribution of sentimental polarities in the same review
• Calculate the evaluation score of each feature by adjusting the rating score

Summary = { E1, E2, … , Ei, … , Em }, m = number of features

$$E_i = \frac{\sum_{k=1}^{n} e_{ik}}{n}, \quad n = \text{number of reviews that contain the } i\text{th feature}$$

$$e_{ij} = \begin{cases} us_j \cdot st_{ij} \cdot \left(1 + \dfrac{N_{neg}(R_j)}{N_{total}(R_j)}\right), & \text{if } sp_{ij} \text{ is a positive number} \\[2ex] us_j \cdot st_{ij} \cdot \left(1 - \dfrac{N_{pos}(R_j)}{N_{total}(R_j)}\right), & \text{if } sp_{ij} \text{ is a negative number} \end{cases}$$

[The terms of $e_{ij}$ come from the slide, but the operators joining them are a reconstruction. The slide also defines $st_{ij}$ in terms of $F(f_i, j)$; that expression is not recoverable.]

$N_{total}(R_j)$ = number of opinions in the jth review
$N_{pos}(R_j)$ = number of positive opinions in the jth review
$N_{neg}(R_j)$ = number of negative opinions in the jth review
$F(f_i, j)$ = frequency of $f_i$ in the jth review
$sp_{ij}$ = Sentiment Polarity($f_{ij}$, $o_{ij}$)
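The scoring step can be sketched under stated assumptions. The averaging of e_ij into E_i follows the slide. The exact form of the adjustment is an assumption of this sketch: a positive opinion's score is the rating raised by the share of negative opinions in the same review, a negative opinion's score is the rating lowered by the share of positive opinions, and the strength st is fixed at 1. Function names and data are hypothetical.

```python
from collections import defaultdict

def evaluation_score(us, st, sp, n_pos, n_neg, n_total):
    """e_ij under the ASSUMED adjustment of the rating score us.

    sp > 0 is treated as a positive opinion, anything else as negative.
    """
    if sp > 0:
        return us * st * (1 + n_neg / n_total)
    return us * st * (1 - n_pos / n_total)

def summarize(reviews, st=1.0):
    """Return {feature: E_i}, the mean of e_ij over the reviews containing the feature."""
    per_feature = defaultdict(list)
    for us, polarities in reviews:
        n_total = len(polarities)
        n_pos = sum(1 for sp in polarities.values() if sp > 0)
        n_neg = sum(1 for sp in polarities.values() if sp < 0)
        for feature, sp in polarities.items():
            per_feature[feature].append(
                evaluation_score(us, st, sp, n_pos, n_neg, n_total))
    return {f: sum(e) / len(e) for f, e in per_feature.items()}

# Hypothetical reviews: (rating score, {feature: polarity as +1 or -1}).
reviews = [
    (4, {"size": +1, "design": +1, "battery time": -1}),
    (5, {"size": +1, "battery time": +1}),
]
summary = summarize(reviews)
```

Under these assumptions the 4-star review no longer gives "battery time" the same score as "size". Scores are not clamped to the rating scale in this sketch, so they can exceed 5.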

Page 15:

Experiments

Data: ePinions.com

Product category | Reviews | Positive reviews | Negative reviews | Product features | <f, o> pairs | Context-sensitive words
Hand phone | 2947 | 2196 (74.5%) | 418 (25.5%) | 48 | 734 | 124 (16.9%)
Digital camera | 12917 | 9940 (76.9%) | 1740 (23.1%) | 37 | 974 | 137 (14.1%)

Sentiment Analysis: precision of our method vs. the previous method (PMI using Web doc. search)

Product category | Ours: All | Ours: Context-non-sensitive | Ours: Context-sensitive | Previous: All | Previous: Context-non-sensitive | Previous: Context-sensitive
Hand phone | 0.784 | 0.775 | 0.847 | 0.786 | 0.834 | 0.508
Digital camera | 0.764 | 0.758 | 0.797 | 0.817 | 0.866 | 0.515

Feature Scoring
  Improvement of our method in comparison with previous work
    – about 20%

Page 16:

Conclusion

Proposed the models

Product review model

Review summarization model

Proposed new approaches to summarize product reviews

Handle context-sensitive words in the sentiment analysis process

Feature scoring method

– Utilizing user scores and sentimental polarities of opinions

Developed a text-analysis framework for feature extraction

Page 17:

[Figure: the uKnow, iKnow, and weKnow systems laid over one pipeline: Feature Extraction → <Feature, Opinion> Pairs → Opinion Extraction → Sentiment Clause → Sentiment Analysis → Feature Scoring → Feature Score → Score Summarization → Product Summary, Product Recommend, Product Comparison]

• NLP approach
  • use parse trees
  • use the Sentiment Dictionary (defined manually by experts)
  • find the sentimental polarities of features
  • derive scores of <Feature, Opinion> pairs

• Statistical approach
  • use probabilities
  • use the POS tags
  • use the Sentiment Dictionaries (constructed automatically)
  • use rating data of reviews
  • use PMI values between feature and opinion
  • derive the sentimental polarities

• use rating data of reviews
• use frequencies of features
• use a distribution of sentiments

• use the users' profiles
• use inputs from users
• use comparative objects

Page 18:

E-mail : [email protected]

Intelligent Database Systems Lab. : http://ids.snu.ac.kr

Thank you
Q & A