A Fine-Grained Analysis of User-Generated Content to Support Decision Making
A Fine-Grained Analysis of User-Generated Content to Support Decision Making. Marcirio Silveira Chaves, http://mchaves.wikidot.com. Information Systems Research Group, Business and Information Technology Research Centre (BITREC), Institute for Scientific and Technological Research of Universidade Atlântica (ISTR). Workshop.

Upload: marcirio-chaves

Post on 18-Dec-2014


DESCRIPTION

User-generated content (UGC) such as online reviews is freely available on the web. This kind of data has been used to support clients' and managerial decision making in several industries, e.g. books, tourism, or hospitality. In this workshop, I will introduce a fine-grained characterisation of UGC and a new multidomain and multilingual conceptual data model to represent UGC. Moreover, I will present a domain-specific ontology for accommodations that can also be used to support managerial decision making and end-user applications. Instead of the few categories commonly provided by Web 2.0 portals, this ontology enables accommodation managers to find specific information. The ontology is also used as input for an algorithm to recognise sentiment in online reviews. Finally, I will describe some of the main approaches to sentiment analysis. In short, I will address some of the main challenges of UGC by introducing: a) a proposal for a fine-grained characterisation of UGC; b) a structured representation of UGC which leverages the information provided by the use of Web 2.0 applications; c) the main approaches to performing sentiment analysis; d) an ontology to represent knowledge in the accommodation sector.

TRANSCRIPT

Page 1: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Marcirio Silveira Chaves
http://mchaves.wikidot.com

Information Systems Research Group
Business and Information Technology Research Centre (BITREC)
Institute for Scientific and Technological Research of Universidade Atlântica (ISTR)

Workshop

Page 2: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

User-Generated Content (UGC)
•  Also known as
   –  User-Generated Data
   –  User-Created Content
   –  User-Contributed Data
   –  Consumer-Generated Media
   –  …
•  Can be expressed through
   –  Opinions
   –  Reviews
   –  Comments
   –  Posts

Apr-18-12 Marcirio Chaves - marcirioc@uatlantica.pt

•  Notes:
   •  All the examples described in this workshop are real data.
   •  Some papers mentioned here are under review.
   •  Color legend:
      •  Examples
      •  Positive feature
      •  Negative feature

Page 3: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Example of UGC
•  An opinion posted on Facebook, Dec-10-2011, 12:30 pm
   –  "would highly recommend Infinity Motorcycles, Southampton for all motorbiking gear. Very reasonable people. Earlier they gave me a full money back for a unused (after explaining why it was unused) ladies motorbike jacket (no defects whasoever) and today the zipper on my new jacket was broken and they gave me a brand new one (no questions asked, no receipt business and no fuss created). Five Star service."
   –  This user had 226 friends.

Page 4: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Some statistics about UGC
•  More than 50% of all internet visits are now to UGC/social media sites.
•  More than 75% of time spent on the internet is "social".
•  Facebook now captures as much time spent on the internet as Google, Yahoo, and AOL.
•  More than 80% of consumers are influenced by Social Marketing.

Source: http://www.bbrisco.com/2010/05/social.html

Page 5: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Main Objectives of this Workshop
•  In-depth analysis of UGC
•  Use UGC to support decision making
•  Study a domain ontology to support Artificial Intelligence tasks
•  Address approaches for sentiment analysis
•  From theory to practice: Hands-on Session

Page 6: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Outline
Part 1
•  Workshop Context
•  User-Generated Content (UGC)
•  Characterisation of UGC
•  Knowledge Engineering - Ontology Development
•  Hands-on Session (Individual Task): Dealing with UGC

Part 2
•  Sentiment Analysis / Opinion Mining
•  Polarity Recognizer in Portuguese (PIRPO)
•  Information Visualisation

Page 7: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Workshop Context

A framework for Customer Knowledge Management based on Social Semantic Web.

Chaves, Marcirio Silveira; Trojahn, Cássia and Pedron, Cristiane Drebes. A Framework for Customer Knowledge Management based on Social Semantic Web: A Hotel Sector Approach. In: Customer Relationship Management and the Social and Semantic Web: Enabling Cliens Conexus. Colomo-Palacios, R.; Varajão, J. and Soto-Acosta, P. (Eds.). p. 141-157, Hershey, PA: IGI Global, 2012. ISBN: 978-161-35-0044-6

Page 8: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

A Fine-Grained Analysis of UGC
•  The overall opinion about a topic is only a part of the information of interest.
•  Document-level sentiment classification fails to detect sentiment about individual aspects of the topic. In reality, for example, though one could be generally happy about his car, he might be dissatisfied by the engine noise.
•  To the manufacturers, these individual weaknesses and strengths are equally important to know, or even more valuable than the overall satisfaction level of customers. (Tang et al., 2009)

Page 9: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

UGC

An opinion is simply a positive or negative sentiment, view, attitude, emotion, or appraisal about an entity or an aspect of the entity (Hu and Liu, 2004; Liu, 2006) from an opinion holder (Bethard et al., 2004; Kim and Hovy, 2004; Wiebe et al., 2005).

Page 10: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Characterisation of UGC
•  Opinion's Characterisation
   –  I use and extend the definition proposed by (Ding et al., 2008; Liu, 2010; Martin and White, 2005) to analyse the sentences of reviews.
   –  Let the review be r.
   –  In the most general case, r is characterised as a set of the following elements {O, F, SO, H, S, A, R, I, SG}, where:

Page 11: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Characterisation of UGC
•  Opinion's Characterisation
   –  O: Object
   –  F: Feature
   –  SO: Semantic-Orientation
   –  H: Holder
   –  S: Source
   –  A: Attitude
   –  SG: Suggestion
   –  R: Recommendation
   –  I: Intention
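The nine elements listed on this slide can be held in a single record per review. The sketch below is only one possible representation: the class name, field names and field types are assumptions made for illustration, and the example values are hypothetical rather than taken from the workshop data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReviewCharacterisation:
    """One review r described by the elements {O, F, SO, H, S, A, R, I, SG}."""
    obj: str                                            # O: object under review
    features: List[str] = field(default_factory=list)   # F: features mentioned
    semantic_orientation: Optional[str] = None          # SO
    holder: Optional[str] = None                        # H: opinion holder
    source: Optional[str] = None                        # S: website providing the review
    attitude: Optional[str] = None                      # A
    recommendation: Optional[str] = None                # R
    intention: Optional[str] = None                     # I
    suggestion: Optional[str] = None                    # SG


# Hypothetical example values, for illustration only.
example = ReviewCharacterisation(
    obj="hotel",
    features=["staff"],
    semantic_orientation="very positive",
    holder="solo traveller",
    source="booking.com",
)
```

Elements that a given review does not express simply stay `None`, which matches the "most general case" wording: not every review carries a suggestion or an intention.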

Page 12: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Characterisation of UGC
1 - Object (O)
–  An object is a product (e.g. movie and book) or a service (e.g. hotel and restaurant) under review, which is composed of features.
–  Objects are also called entities.

2 - Feature (F)
–  A feature is a component or part of an object.
   •  actor and photography are features of a movie.
   •  pool and staff are features of a hotel.
–  Features are also called attributes or facets.
–  A feature can be mentioned explicitly or implicitly in a review (Ding et al., 2008).

Page 13: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Characterisation of UGC
2.1 - Explicit Feature (F)
–  If a feature f appears in review r, it is called an explicit feature in r.
–  The hotel is located very near the center city.
   •  location is an explicit feature.

2.2 - Implicit Feature (F)
–  If f does not appear in r but is implied, it is called an implicit feature in r.
–  Hotel is far from public transportation.
   •  location is an implicit feature.
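The difference between explicit and implicit features can be sketched with two small lookup tables. This is a minimal illustration, not the method used in the workshop: both tables, their entries and the substring matching are assumptions.

```python
# Hypothetical lexicons: surface terms that name a feature directly,
# and clue phrases that only imply it.
EXPLICIT_TERMS = {
    "location": ["location", "located"],
    "pool": ["pool"],
    "staff": ["staff"],
}
IMPLICIT_CLUES = {
    "location": ["far from", "close to", "near"],
}


def find_features(sentence):
    """Return {feature: 'explicit' | 'implicit'} for one sentence."""
    text = sentence.lower()
    found = {}
    for feature, terms in EXPLICIT_TERMS.items():
        if any(term in text for term in terms):
            found[feature] = "explicit"
    for feature, clues in IMPLICIT_CLUES.items():
        # An explicit mention takes precedence over an implied one.
        if feature not in found and any(clue in text for clue in clues):
            found[feature] = "implicit"
    return found
```

Applied to the two sentences on the slide, the first is tagged as an explicit mention of location (through "located") and the second as an implicit one (through "far from").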

Page 14: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Characterisation of UGC
3 - Sentence-Orientation (SO)
–  A review consists of a sequence of sentences r = ⟨s1, s2, …, sm⟩ (Ding et al., 2008).
–  A sentence can be evaluated from the following perspectives:

Page 15: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Characterisation of UGC
3.1 Objectivity
–  An objective sentence contains or mentions facts.
   •  This hotel is far from the airport, ca. 15 km.
–  A subjective sentence does not mention any fact.
   •  The parking could be free.

3.2 Polarity
–  It describes the orientation present in a sentence (i.e. positive, negative, neutral and irrelevant).

Page 16: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Characterisation of UGC
3.3 Intensity (strength of the polarity)
–  It refers to the strength of the private state that is being expressed, in other words, how strong an emotion or a conviction of belief is (Wilson, 2008).
–  It describes how intense the experience of using a product or service was:
   •  very positive, positive, neutral, negative and very negative.
   •  "Very kindly staff." refers to a very positive impression of the staff service.

Page 17: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Characterisation of UGC
4 - Opinion Holder (H)
–  The holder of a particular opinion is the person or the organisation that holds the opinion (Ding et al., 2008).
–  A holder is identified with demographic characteristics (e.g. name, city and country).
–  Sites such as tripadvisor.com and booking.com classify holders as types including:
   •  families with older children
   •  families with young children
   •  mature couples
   •  groups of friends
   •  solo travellers
   •  young couples

Page 18: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Characterisation of UGC
5 - Source (S)
–  An information source is a website which provides a set of reviews.
   •  tripadvisor.com
   •  booking.com
   •  amazon.com

•  A: Attitude
•  SG: Suggestion
•  R: Recommendation
•  I: Intention


Page 20: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Limitations for representing knowledge in the accommodation sector

[Figure not recovered; surviving annotation: "language?"]

Page 21: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

More limitations
•  Currently, web agents are unable to answer questions such as:
   –  Which hotels in Rome have the longest indoor swimming pool opening hours?
   –  Which hotels in Lisbon have the cheapest breakfast?
   –  Which are the cheapest hotels in Barcelona with a family suite with sea view?
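Questions like these become simple queries once accommodation data is stored with explicit, fine-grained properties. The sketch below illustrates this for the breakfast question only; the records, the property names (`city`, `breakfast_price`) and the prices are all hypothetical and do not come from Hontology or from any real portal.

```python
# Hypothetical structured records; names and prices are invented for illustration.
HOTELS = [
    {"name": "Hotel A", "city": "Lisbon", "breakfast_price": 8.0},
    {"name": "Hotel B", "city": "Lisbon", "breakfast_price": 5.5},
    {"name": "Hotel C", "city": "Barcelona", "breakfast_price": 4.0},
]


def cheapest_breakfast(hotels, city):
    """Return the names of the hotels in `city` with the lowest breakfast price."""
    in_city = [
        h for h in hotels
        if h["city"] == city and h.get("breakfast_price") is not None
    ]
    if not in_city:
        return []
    lowest = min(h["breakfast_price"] for h in in_city)
    return [h["name"] for h in in_city if h["breakfast_price"] == lowest]
```

The point of the sketch is the data shape rather than the query: a portal that only exposes a handful of coarse categories has no `breakfast_price` property to filter on.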

Page 22: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Knowledge Engineering
•  Ontology as a support to evaluate UGC
   –  Set of concepts for a specific domain
   –  Human and machine readable
   –  Support for fine-grained analysis of the instances (e.g. reviews)
   –  Hontology (H stands for hotel, hostal and hostel)
      •  A robust, coherent and multilingual representation of the accommodation sector.


Page 24: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Knowledge Engineering
•  Development Methodology
   –  Identify existing ontologies on related domains
   –  Select the main concepts and properties
   –  Organize concepts and properties hierarchically into categories
   –  Translate the ontology (manual)
   –  Expand concepts and properties based on comments
   –  Translate the new concepts and properties (manual)
   –  Generate the ontology in several formats

Chaves, M. S. and Trojahn, C. Towards a Multilingual Ontology for Ontology-driven Content Mining in Social Web Sites. Proc. of the ISWC 2010 Workshops, Volume I, 1st International Workshop on Cross-Cultural and Cross-Lingual Aspects of the Semantic Web. Shanghai, China, November 7th, 2010.

Page 25: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Knowledge Engineering
•  Hontology
   –  A multilingual ontology for the accommodation sector.
•  Demo Protégé

Chaves, M. S.; Freitas, L. A. and Vieira, R. (2012). Hontology: A multilingual ontology for the accommodation sector. 4th International Conference on Knowledge Engineering and Ontology Development, Barcelona, Spain, 4-7 October. (Submitted)

Page 26: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Knowledge Engineering

Preliminary Hontology Statistics

Metrics                               Value
Number of Concepts                      285
Number of Object Properties              10
Number of Data Properties                31

Concept Axioms
Subconcept axioms                       270
Equivalent concepts axioms                4
Disjoint concepts axioms                 93

Object Property Axioms
Functional object property axioms         6
Object property domain axioms            11
Object property range axioms              8

Data Property Axioms
Functional data property axioms          12
Object data domain axioms                17
Object data range axioms                  1

Page 27: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Hands-on Session
•  The aim of this hands-on session is to let you think in depth about UGC in the context of the accommodation sector.
•  You are going to receive a set of 4 or 5 reviews about accommodations and should evaluate each one according to the following parameters:
   –  Features present in the review (see the concepts of Hontology)
   –  Intensity (Strength of the Polarity): (very negative, negative, neutral, positive, very positive)
•  Notes:
   –  Evaluate one feature per line.
   –  Please, [email protected]:UB:GX
   –  X = number of the group.


Page 29: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Sentiment Analysis
•  Analysis and automatic extraction of Semantic Orientation
•  Semantic orientation refers to the polarity and strength of words, phrases, or texts.
•  Approaches
   –  Lexicon-based
      •  Dictionaries of words annotated with the word's semantic orientation, or polarity.
      •  A manually built dictionary provides a solid foundation for a lexicon-based approach (Taboada et al., 2011).
   –  Statistical or Machine-learning
      •  Supervised classification

Page 30: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Sentiment Analysis
•  Lexicon-based Approach
   –  Sentiment-bearing words: a list of nouns, verbs, adjectives and adverbs (Chesley et al., 2006)
      •  use verbs and adjectives to classify English opinionated blog texts.
   –  List of conjunctions and connectives (Liu, 2010).
   –  Use of auxiliary verbs to get features and opinion-oriented words about products from texts (Khan et al., 2010).

Page 31: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Sentiment Analysis
•  Seed words
   –  are a small set of words with strong negative or positive associations, such as excellent or abysmal.
   –  In principle, a positive adjective should occur more frequently alongside the positive seed words, and thus will obtain a positive score, whereas negative adjectives will occur most often in the vicinity of negative seed words, thus obtaining a negative score (Taboada et al., 2011).
      •  This restaurant has a bad and expensive food.
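The seed-word idea can be sketched as a co-occurrence count. This is a deliberately simplified illustration and not the scoring used by Taboada et al.: "vicinity" is reduced to "same sentence", the seed sets are assumptions (only excellent and abysmal come from the slide), and the second test sentence is invented.

```python
from collections import Counter

# Seed sets: "excellent" and "abysmal" are from the slide, the rest are assumed.
POSITIVE_SEEDS = {"excellent", "good"}
NEGATIVE_SEEDS = {"abysmal", "bad"}


def seed_scores(sentences, candidates):
    """Score each candidate adjective by the seeds it shares a sentence with.

    Every positive seed in the same sentence adds 1, every negative seed
    subtracts 1. `candidates` is a set of lower-case adjectives.
    """
    scores = Counter()
    for sentence in sentences:
        cleaned = sentence.lower().replace(".", " ").replace(",", " ")
        words = set(cleaned.split())
        balance = len(words & POSITIVE_SEEDS) - len(words & NEGATIVE_SEEDS)
        for word in words & candidates:
            scores[word] += balance
    return dict(scores)
```

With the slide's example, "expensive" shares a sentence with the negative seed "bad" and therefore receives a negative score.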

Page 32: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Sentiment Analysis
•  Part-of-Speech (PoS)
   –  In order to evaluate a sentence in a review, we should consider the parts-of-speech mentioned, such as adjectives, adverbs and verbs.
   –  Adjectives are classified as:
      •  positive (good, excellent and clean),
      •  negative (awful, boring and terrible),
      •  neutral (regular and indifferent) and
      •  dual, which can express positive and negative opinion (small, long).
   –  In some approaches nouns are represented by concepts of a domain ontology and mapped as features.

Page 33: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Sentiment Analysis
•  Conjunction and Connective (CC)
   –  Connectives are words that help identify additional adjective opinion words and their orientations.
   –  One of the constraints concerns conjunction (i.e. and): conjoined adjectives usually have the same orientation (Liu, 2010).
      •  This room is beautiful and spacious.
   –  If beautiful is known to be positive, it can be inferred that spacious is also positive.
   –  Heuristic:
      •  People usually express the same opinion on both sides of a conjunction.

Page 34: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Sentiment Analysis
•  Conjunction and Connective (CC)
   –  Rules or constraints are also designed for other connectives (e.g. or, but, either-or, and neither-nor).
      •  This hotel is beautiful but difficult to get there.
   –  What occurs after the connective but is an indicator of a negative opinion.
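The two connective rules from these slides can be sketched as a small propagation step: an adjective joined by "and" to a known adjective inherits its orientation, and one joined by "but" receives the opposite orientation. The function name, the whitespace tokenisation and the treatment of "but" as a simple sign flip are assumptions of this sketch; real systems use richer constraints.

```python
def propagate_orientation(sentence, known):
    """Infer orientations of unknown words adjacent to 'and' / 'but'.

    `known` maps lower-case adjectives to +1 or -1. A copy extended with
    the inferred entries is returned; `known` itself is not modified.
    """
    inferred = dict(known)
    tokens = sentence.lower().strip(".").split()
    for i in range(1, len(tokens) - 1):
        connective = tokens[i]
        if connective not in ("and", "but"):
            continue
        sign = 1 if connective == "and" else -1
        left, right = tokens[i - 1], tokens[i + 1]
        if left in inferred and right not in inferred:
            inferred[right] = sign * inferred[left]
        elif right in inferred and left not in inferred:
            inferred[left] = sign * inferred[right]
    return inferred
```

For the slide examples, knowing only that beautiful is positive is enough to label spacious as positive and difficult as negative.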

Page 35: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Sentiment Analysis
•  Strength of the Polarity or Intensity or Intensification
   –  Amplifiers (very, a lot) increase the semantic intensity of a neighboring lexical item;
   –  Attenuators/Downtoners (a little, slightly) decrease it.
•  Some approaches have implemented intensifiers using simple addition and subtraction
   –  if a positive adjective has an SO value of 2:
      •  an amplified adjective would have an SO value of 3, and
      •  a downtoned adjective an SO value of 1.
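The addition-and-subtraction scheme on this slide can be written in a few lines. The slide only gives the positive case (2 becomes 3 or 1); moving negative values away from or towards zero in the same way is an assumption of this sketch, as are the function name and the fixed step of 1.

```python
AMPLIFIERS = {"very", "a lot"}
DOWNTONERS = {"a little", "slightly"}


def intensify(so_value, modifier):
    """Shift a semantic-orientation value by one step for a given modifier.

    Amplifiers move the value away from zero, downtoners move it towards
    zero; unknown modifiers and neutral values are left unchanged.
    """
    if so_value == 0:
        return 0
    sign = 1 if so_value > 0 else -1
    if modifier in AMPLIFIERS:
        return so_value + sign
    if modifier in DOWNTONERS:
        return so_value - sign
    return so_value
```

A fixed step treats "very good" and "very excellent" alike; approaches that scale by a percentage of the base value exist as an alternative, but they are outside what the slide describes.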

Page 36: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Sentiment Analysis
•  Negation
   –  The obvious approach to negation is simply to reverse the polarity of the lexical item next to a negator, changing good (+3) into not good (−3).
   –  Negators include not, none, nobody, never and nothing, as well as other words such as without or lack.
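Polarity reversal as described here can be sketched as follows. The negator list is taken from the slide and the value +3 for good is the slide's own example; the value for clean, the function name and the rule that only the word immediately after the negator is reversed are assumptions of this sketch.

```python
NEGATORS = {"not", "none", "nobody", "never", "nothing", "without", "lack"}
# "good" = +3 is the slide's example; "clean" = +2 is an assumed value.
LEXICON = {"good": 3, "clean": 2}


def score_phrase(phrase):
    """Sum lexicon values, reversing a word's polarity if a negator precedes it."""
    tokens = phrase.lower().split()
    total = 0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        value = LEXICON[token]
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        total += value
    return total
```

Simple reversal equates "not good" with a strongly negative word, which is often too harsh; that limitation is inherent to the approach the slide calls "obvious".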

Page 37: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Polarity Recognizer in Portuguese (PIRPO)
•  Polarity Recognizer in Portuguese to classify sentiment in online reviews.
•  PIRPO was built from the ground up for Portuguese to recognise the polarity of user opinions in accommodation reviews.
•  Each review is analysed according to concepts from a domain ontology.
•  We decompose the review into sentences in order to assign a polarity to each concept of the ontology in the sentence.

Chaves, M. S.; Freitas, L.; Souza, M. and Vieira, R. PIRPO: An Algorithm to deal with Polarity in Portuguese Online Reviews from the Accommodation Sector. 17th International Conference on Applications of Natural Language Processing to Information Systems (NLDB), Groningen, The Netherlands, 26-28 June 2012.

Page 38: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

PIRPO Information Architecture

Page 39: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

PIRPO
•  Reviews
   –  Full dataset: 1500 reviews from January 2010 to April 2011 in Portuguese, English and Spanish, of which 180 are in Portuguese.
•  Ontology Concepts
   –  The concepts used to classify the reviews are provided by Hontology, which, in its current version, has 110 concepts.

Page 40: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

PIRPO
•  List of adjectives: it is composed of sentiment-bearing words.
   –  This list of polar adjectives in Portuguese
      •  contains 30,322 entries.
      •  pairs each adjective with a polarity, which takes one of three values: +1, -1 and 0.
      •  These values correspond to the positive, negative and neutral senses of the adjective.
   –  PIRPO uses this list to calculate the semantic orientation of the concepts found in the sentences.

Page 41: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

PIRPO Algorithm
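The algorithm figure did not survive in this transcript, so the sketch below is not PIRPO itself. It only illustrates the steps the preceding slides describe: split a review into sentences, look for ontology concepts in each sentence, and derive a polarity for each concept from a list of adjectives valued +1, -1 or 0. The concept set, the adjective entries, the sentence splitting and the rule of summing all adjectives in the sentence are assumptions made for illustration.

```python
import re

# Hypothetical stand-ins for Hontology concepts and the polar adjective list.
CONCEPTS = {"quarto", "piscina", "pessoal"}
ADJECTIVES = {"limpo": 1, "sujo": -1, "simpático": 1, "normal": 0}


def concept_polarities(review):
    """Return (concept, polarity) pairs, one per concept mention per sentence."""
    results = []
    for sentence in re.split(r"[.!?]+", review.lower()):
        tokens = re.findall(r"\w+", sentence)
        concepts = [t for t in tokens if t in CONCEPTS]
        if not concepts:
            continue
        total = sum(ADJECTIVES.get(t, 0) for t in tokens)
        polarity = (total > 0) - (total < 0)  # sign of the sum: +1, 0 or -1
        results.extend((concept, polarity) for concept in concepts)
    return results
```

Because every concept in a sentence receives the same value, a sentence that praises one concept and criticises another is handled poorly; this sketch does not attempt to link each adjective to its own concept.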

Page 42: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

PIRPO Measure Evaluation

•  Precision

$$P = \frac{|\{\text{relevantConcepts}\} \cap \{\text{retrievedConcepts}\}|}{|\{\text{retrievedConcepts}\}|}$$

•  Recall

$$R = \frac{|\{\text{relevantConcepts}\} \cap \{\text{retrievedConcepts}\}|}{|\{\text{relevantConcepts}\}|}$$

•  F-score (harmonic mean of precision and recall)

$$F = 2 \times \frac{P \times R}{P + R}$$
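The three measures follow directly from set operations on the relevant and retrieved concepts. The function below is a small sketch; its name, the zero returned for empty sets and the concept names used in the checks are assumptions.

```python
def precision_recall_f(relevant, retrieved):
    """Compute precision, recall and F-score for two sets of concepts."""
    hits = len(relevant & retrieved)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

For instance, if four concepts are relevant and the algorithm retrieves three, two of them correctly, precision is 2/3, recall is 1/2 and the F-score is 4/7.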

Page 43: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

PIRPO Preliminary Results

Page 44: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

PIRPO: Discussion on the Results
•  PIRPO reached a better recall for concepts with positive polarity, while mixed polarity had a higher precision.
•  The low F-score may be mainly because the algorithm assigned a polarity to a specific concept of the ontology, while the human classified the review as a whole.

Page 45: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Outline
Part 1
•  Workshop Context
•  User-Generated Content (UGC)
•  Characterisation of UGC
•  Knowledge Engineering - Ontology Development
•  Hands-on Session (Individual Task): Dealing with UGC

Part 2
•  Knowledge Engineering - Modelling UGC
•  Sentiment Analysis / Opinion Mining
•  Polarity Recognizer in Portuguese (PIRPO)
•  Information Visualisation


Page 47: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Information Visualisation
•  What is the visual model of the potential end-user?
•  How should we properly map and render:
   –  the most valued accommodation features?
   –  the perception of the quality offered by the hotel?
   –  the correlation between the guest's profile and the most relevant features?
   –  the intensity of the positivity or negativity of the features?
•  Will the use of advanced visual techniques (such as tree-oriented ones) to map the results help accommodation managers and guests to gain better insight into the data?

Page 48: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Exploring Information Visualisation
•  In the next figures
   –  Color is used to map the polarity and the strength of the polarity values of the CO.
   –  Size is used to map how frequently the CO is mentioned in the reviews.
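The two visual encodings on this slide (color for polarity strength, size for mention frequency) can be sketched as a plain mapping step that a plotting library would then consume. The five-step color scale, the size range, the function name and the sample numbers are all assumptions; the figures in the original deck may use a different palette and scaling.

```python
# Assumed five-step scale from very negative (-2) to very positive (+2).
COLORS = {
    -2: "dark red",
    -1: "light red",
    0: "grey",
    1: "light green",
    2: "dark green",
}


def encode(concept_stats, max_size=100):
    """Map {concept: (strength, mentions)} to a color and a relative size.

    The most frequently mentioned concept gets `max_size`; the others are
    scaled linearly against it.
    """
    if not concept_stats:
        return {}
    top = max(mentions for _, mentions in concept_stats.values())
    return {
        concept: {
            "color": COLORS[strength],
            "size": round(max_size * mentions / top) if top else 0,
        }
        for concept, (strength, mentions) in concept_stats.items()
    }
```

Keeping the encoding separate from the drawing code makes it easy to reuse the same mapping for different layouts, such as the bubble tree and the treemap mentioned in the following slides.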

Page 49: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Exploring Information Visualisation

Result of the application of Bubble Tree visualisation of the relation among concepts of the ontology, polarity (left) and strength of the polarity (right).

•  Carvalho, E.; Chaves, M. S., 2012. Exploring User-Generated Data Visualization in the Accommodation Sector. 16th International Conference Information Visualisation, IEEE. (Submitted)

Page 50: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Exploring Information Visualisation

Results using Treemap visualisation of the relation among type of customer, concepts of the ontology and polarity.

Page 51: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Questionnaire (in Spanish)
•  You are going to receive a questionnaire about information visualisation using UGC in the context of the accommodation sector.
•  Please, click here http://kwiksurveys.com?u=Infovises to answer it.

Page 52: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Final Remarks
•  In-depth analysis of UGC can be used as input to improve decision making.
•  It is time to think about new models to store UGC data.
•  New algorithms need to be built from the ground up to deal with UGC in languages other than English.
•  Information visualisation of UGC is in its infancy.

Page 53: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Main References
•  Bethard, S.; Yu, H.; Thornton, A.; Hatzivassiloglou, V. and Jurafsky, D., 2004. Automatic extraction of opinion propositions and their holders. In Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text.
•  Chesley, P.; Vincent, B.; Xu, L. and Srihari, R., 2006. Using verbs and adjectives to automatically classify blog sentiment. In AAAI Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), 27–29.
•  Ding, X.; Liu, B. and Yu, P. S., 2008. A holistic lexicon-based approach to opinion mining. Proceedings of the Conference on Web Search and Web Data Mining (WSDM).
•  Hu, M. and Liu, B., 2004. Mining opinion features in customer reviews. In Proceedings of AAAI, pp. 755–760.
•  Kim, S.-M. and Hovy, E., 2004. Determining the sentiment of opinions. In Proceedings of the International Conference on Computational Linguistics (COLING).
•  Liu, Bing, 2010. Sentiment Analysis and Subjectivity. In Handbook of Natural Language Processing, Second Edition (Eds: N. Indurkhya and F. J. Damerau), CRC Press, Taylor and Francis Group, Boca Raton, FL. Chapter 28.
•  Martin, J. R. and White, P. R. R., 2005. The Language of Evaluation, Appraisal in English, Palgrave Macmillan, London & New York.
•  Taboada, M.; Brooke, J.; Tofiloski, M.; Voll, K. D. and Stede, M., 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics 37 (2), 267–307.
•  Tang, H.; Tan, S. and Cheng, X., 2009. A survey on sentiment detection of reviews. Expert Systems with Applications 36 (7), 10760–10773.
•  Whitelaw, C.; Garg, N. and Argamon, S., 2005. Using appraisal groups for sentiment analysis. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM '05). ACM, New York, NY, USA, 625-631.
•  Wilson, T., 2008. Fine-Grained Subjectivity Analysis. PhD Dissertation, Intelligent Systems Program, University of Pittsburgh.
•  Wilson, T.; Wiebe, J. and Hoffmann, P., 2009. Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis. Computational Linguistics 35, 399–433.
•  Wu, Y.; Wei, F.; Liu, S.; Au, N.; Cui, W.; Zhou, H. and Qu, H., 2010. OpinionSeer: Interactive Visualisation of Hotel Customer Feedback. IEEE Transactions on Visualization and Computer Graphics, 6, 1109-1118. Nov-Dec.

Page 54: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Open-source sentiment-analysis tools
•  Python NLTK (Natural Language Toolkit)
   –  http://www.nltk.org and http://text-processing.com/demo/sentiment
•  R, TM (text mining) module: http://cran.r-project.org/web/packages/tm/index.html
•  RapidMiner: http://rapid-i.com/content/view/184/196/
•  GATE, the General Architecture for Text Engineering: http://gate.ac.uk/sentiment
•  UIMA - plug-in annotators for sentiment. Apache UIMA is the Unstructured Information Management Architecture, http://uima.apache.org/
•  Sentiment classifiers for the WEKA data-mining workbench, http://www.cs.waikato.ac.nz/ml/weka/.
•  Stanford NLP tools - http://www-nlp.stanford.edu/software maximum-entropy classification approach for sentiment.

Page 55: A Fine-Grained Analysis of User-Generated Content to Support Decision Making

Thank you very much for your attention!!

Questions