Inferring Win–Lose Product Network from User Behavior

Shuhei Iitsuka
The University of Tokyo
Hongo 7-3-1, Bunkyo, Tokyo, Japan
[email protected]

Kazuya Kawakami
University of Oxford
Wolfson Building, Parks Road, Oxford, UK
[email protected]

Seigen Hagiwara
Recruit Marketing Partners Co., Ltd.
Kyobashi 2-1-3, Chuo, Tokyo, Japan
[email protected]

Takayoshi Kawakami
Industrial Growth Platform, Inc.
Marunouchi 1-9-2, Chiyoda, Tokyo, Japan
[email protected]

Takayuki Hamada
IGPI Business Analytics & Intelligence, Inc.
Marunouchi 1-9-2, Chiyoda, Tokyo, Japan
[email protected]

Yutaka Matsuo
The University of Tokyo
Hongo 7-3-1, Bunkyo, Tokyo, Japan
[email protected]

ABSTRACT
Various data mining techniques to extract product relations have been examined, especially in the context of building intelligent recommender systems. Most such techniques, however, specifically examine co-occurrences of browsed or purchased products on e-commerce websites, which provide little or no useful information related to the direct relation of superiority or the factor which forms that superiority. For marketers and product managers, understanding the competitive advantages of a given product is important to consolidate their product differentiation strategies.

As described in this paper, we propose a win–lose relation, a new product relation analysis method that retrieves the superiority relation between competitive products in terms of product attractiveness. Our proposed method uses the difference between user browsing and purchasing behaviors, assuming that a purchased product is superior to products that are browsed but not purchased. We also propose superiority factor analysis to examine keywords that represent the superiority factor by mining product reviews. We evaluate our methods using an actual dataset from Zexy, the largest wedding portal website in Japan. Our experimental evaluation revealed that our proposed method can estimate actual user preferences observed from a user study using only log data. Results also show that our proposed method raises the accuracy of superiority factor extraction by around 17% by considering the win–lose relation of products.

CCS CONCEPTS
• Information systems → Web log analysis; Electronic commerce; Data mining;

KEYWORDS
product network, e-commerce, review mining

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
WI ’17, Leipzig, Germany
© 2017 ACM. 978-1-4503-4951-2/17/08...$15.00
DOI: 10.1145/3106426.3106502

ACM Reference format:
Shuhei Iitsuka, Kazuya Kawakami, Seigen Hagiwara, Takayoshi Kawakami, Takayuki Hamada, and Yutaka Matsuo. 2017. Inferring Win–Lose Product Network from User Behavior. In Proceedings of WI ’17, Leipzig, Germany, August 23-26, 2017, 8 pages.
DOI: 10.1145/3106426.3106502

1 INTRODUCTION
The e-commerce market is expanding along with the increasingly wide use of the internet. As the market expands, various data mining techniques are increasingly used to extract useful information for e-commerce marketing. Starting from market basket analysis, extracting association rules of products has been studied extensively [9]. Combined with a network analysis approach, these techniques have enabled product marketers to extract interesting information in the form of product networks [2]. Most such methods, however, provide no information about the mutual superiority relations of products in terms of product attractiveness or about the reason why the superiority is formed. Understanding the competitive advantages and relative benefits of a product is important for marketers to consolidate their branding strategies and to differentiate their products from those of their competitors. Nevertheless, few data mining techniques have been proposed to address this important problem.

We propose a new product relation analysis method that specifically examines the superiority relation among substitutable products, which we call the win–lose relation. Our proposed method uses the difference between users’ browsing and purchasing behaviors on e-commerce websites. The methodology is simple: we define a win–lose relation from products that are browsed but not purchased to ones purchased by a given user. By aggregating the win–lose relations, one can define the win–lose network of products, which provides an overview of their superiority relations. We also propose superiority factor analysis, which examines keywords that represent the superiority factor by mining product reviews.

We present an overview of the proposed method in Figure 1 with the example of substitutable camera products. Each product has different selling points and competitors. The stylish design might be the selling point for a compact camera against competing compact cameras, whereas its size might be the point of appeal against professional cameras. Our proposed method enables visualization of the dynamics of competition and keywords that represent selling points against a specific competitor.

Figure 1: Overview of win–lose relation and superiority factor. (Diagram labels: user, browse, purchase, win–lose relation, aggregation, win–lose network, superiority factor; example factor words: "stylish, modern" and "compact, light".)

For the analysis, we used an actual dataset from the largest Japanese wedding portal website, Zexy¹, which is associated with thousands of wedding venues in Japan. Treating the wedding venues as products, we analyzed the competitive and win–lose relations and visualized them in the form of a network. Using reviews of the wedding venues, we also conducted superiority factor analysis and examined the superiority factor.

In the experimental evaluation, we examined whether our proposed method can make a good estimation of the actual user behavior and preferences extracted from Zexy’s user survey results. Results showed a significant correlation between them, which means that our proposed method can estimate the actual win–lose relation from log data as long as we accept the definition of a win–lose relation proposed in this paper. We also demonstrated that our proposed method can infer superiority factor keywords, improving accuracy by around 17% compared to the baseline method, which does not consider the win–lose relation.

The contributions of this paper are summarized as follows.

• We proposed a new data mining method to analyze a superiority relation in terms of product attractiveness. E-commerce website owners can estimate users’ preferences from log data, which enables them to plan effective marketing and promotion strategies.

• We proposed a text mining method to analyze the superiority factor using information from product reviews. Product managers can use this for their differentiation strategies. Researchers can introduce widely diverse natural language processing methods for additional and sophisticated investigations.

• We evaluated log data as a substitute for user survey results in terms of estimating the actual superiority relations of products and the superiority factor. Product managers and e-commerce website owners can benefit by saving the costs of conducting user surveys, which can be huge, involving outlays for questionnaire distribution and data collection.

¹ Zexy: http://zexy.net/

The remainder of this paper is organized as explained below. We introduce related works in Section 2 and our proposed method in Section 3. Section 4 introduces some application examples with an actual dataset. Section 5 describes the evaluation experiment. After the discussion presented in Section 6, we conclude this paper in Section 7.

2 RELATED WORKS
Product relations have been studied extensively by both the microeconomics and data mining research communities. In the field of microeconomics, understanding of product relations is regarded as necessary for product managers to make marketing mix decisions [6]. In consumer theory, product relations are categorized into two kinds: substitute or complementary. For example, a mobile device manufactured by another brand is a substitute product for a given mobile device, whereas a mobile charger is a complementary product. Product substitutability and complementarity have long been common means of perceiving product relations [10]. This idea has been applied to widely diverse markets to ascertain consumer behavior [12].

Product relation analysis has been imported to e-commerce marketing, especially for implementation of advanced recommender systems. Zheng implemented a recommender system to meet user needs in different purchase stages considering product substitutability and complementarity [13]. McAuley defined co-purchased products as complementary products and co-browsed products as substitute products, and proposed a method to infer a network of complementary and substitute products, which is useful to generate context-relevant recommendations [7]. Product relation analysis has been examined to improve e-commerce marketing as described above, but few studies have specifically examined the directional and adversarial relation between substitute products, which can be useful to plan product branding and differentiation strategies among competitors. Win–lose relations are a new form of product network that examines the superiority relation between substitute products by using both browsing and purchasing behaviors on e-commerce websites.

Several studies have proposed the use of product networks for e-commerce marketing. Hao et al. defined product networks based on their association rules and visualized them on a spherical surface [2]. Various characteristics of the product network have been studied by applying network analysis methods and by treating it as a social network [8, 9]. Zinoviev proposed a method to discover users’ contexts underlying purchasing behavior from product networks [14]. Product networks have been studied from various aspects as described above, but few researchers have examined the directed network of a superiority relation in product attractiveness such as a win–lose relation.

Review analysis is a major research field related to e-commerce marketing [11]. Especially, review summarization and review selection have been studied extensively to extract useful information from thousands of user reviews to help users make good decisions [3, 5]. Archak et al. proposed a method to extract important product features from product reviews by particularly addressing the included noun phrases and associated adjectives [1]. That proposal inspired us to formalize the superiority factor analysis, which uses noun phrases included in product reviews.

3 PROPOSED METHOD
3.1 Formalization of Product Relations
When users visit an e-commerce website and consider purchasing products, they might browse multiple products to make comparisons. This tendency might become most apparent when they are making large purchases. We can define a subset of products browsed by the given user. Also, we can assume that the member products are in competition because they are in consideration for purchase by the same user who has specific needs. Therefore, we assume that these products share a competitive relation when they are browsed simultaneously by the same user.

As a result of the transaction, products are separable into three categories: products not browsed, products browsed but not purchased, and products purchased. The products browsed but not purchased are explicitly declined by the user as a result of consideration, which means that they are inferior to the purchased products in terms of product attractiveness. Therefore, we can define a superiority relation of product attractiveness among these products, naming the products browsed but not purchased as loser products and designating the purchased products as winner products. We designate this directed relation as a win–lose relation, connecting loser products to winner products. We define no win–lose relation among loser products or winner products.

Consider a set of products $P$ handled on an e-commerce website and a user $u \in U$ who browses a set of products $P_u^{\mathrm{browse}} \subset P$ and purchases a set of products $P_u^{\mathrm{purchase}} \subset P$. In this case, we can define winner products as the set of purchased products $P_u^{\mathrm{win}} = P_u^{\mathrm{purchase}}$ and loser products as the subtraction of purchased products from browsed products, $P_u^{\mathrm{lose}} = P_u^{\mathrm{browse}} \setminus P_u^{\mathrm{purchase}}$. The competitive relation is defined among browsed products, whereas the win–lose relation is defined from any loser product to any winner product. Therefore, the set of competitive relations for user $u$ is denoted as $R_u^{C} = \{\{p_i, p_j\} \mid \forall p_i, p_j \in P_u^{\mathrm{browse}}, i \neq j\}$. The set of win–lose relations is $R_u^{WL} = \{(p_i, p_j) \mid \forall p_i \in P_u^{\mathrm{win}}, \forall p_j \in P_u^{\mathrm{lose}}\}$.

For example, assume that an e-commerce website is handling a set of products $P = \{p_1, \dots, p_5\}$ and that a user purchased a subset of products $P_u^{\mathrm{purchase}} = \{p_1, p_2\}$ after browsing a subset of products $P_u^{\mathrm{browse}} = \{p_1, p_2, p_3, p_4\}$. Then the winner product set is $P_u^{\mathrm{win}} = \{p_1, p_2\}$ and the loser product set is $P_u^{\mathrm{lose}} = \{p_3, p_4\}$, which presents six competitive relations $R_u^{C} = \{\{p_1, p_2\}, \{p_1, p_3\}, \{p_1, p_4\}, \{p_2, p_3\}, \{p_2, p_4\}, \{p_3, p_4\}\}$ and four win–lose relations $R_u^{WL} = \{(p_1, p_3), (p_1, p_4), (p_2, p_3), (p_2, p_4)\}$.
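The per-user definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `user_relations` and the string product IDs are assumptions made for the example, and win–lose pairs are stored as (winner, loser) tuples to mirror the ordered pairs in the set definition.

```python
from itertools import combinations


def user_relations(browsed, purchased):
    """Return (winners, losers, competitive, win_lose) for a single user."""
    winners = set(purchased)
    # Loser products: browsed but not purchased.
    losers = set(browsed) - winners
    # Competitive relations: unordered pairs of distinct browsed products.
    competitive = {frozenset(pair) for pair in combinations(sorted(browsed), 2)}
    # Win-lose relations: ordered (winner, loser) pairs.
    win_lose = {(w, l) for w in winners for l in losers}
    return winners, losers, competitive, win_lose


winners, losers, competitive, win_lose = user_relations(
    browsed={"p1", "p2", "p3", "p4"}, purchased={"p1", "p2"}
)
print(len(competitive), len(win_lose))  # prints "6 4"
```

Using `frozenset` for competitive pairs keeps them unordered, while plain tuples preserve the direction of the win–lose relation.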

3.2 Product Network
Visualizing the product relation as a network is useful to survey the characteristics of their competitive relations. By applying network analysis methods to this problem, one might identify clusters of competition or an interesting set of win–lose circulating relations similar to rock–paper–scissors. In this subsection, we introduce the competitive network, which represents competitive relations between products, and the win–lose network, which represents superiority relations in product attractiveness.

Table 1: Example of product relations.

User $u$    Loser $P_u^{\mathrm{lose}}$    Winner $P_u^{\mathrm{win}}$
$u_1$       $p_B$                          $p_C$
$u_2$       $p_A, p_B$                     $p_C$
$u_3$       $p_C$                          $p_B$

Figure 2: Example of a product network. (a) Competitive network. (b) Win–lose network.

The competitive network is an undirected graph $G^{C} = (P, R^{C})$, which consists of the nodes of products $p \in P$ and the edges of competitive relations $R^{C} = \bigcup_{u \in U} R_u^{C}$. The weight of the edge between node $p_i$ and node $p_j$ is the number of times that the competitive relation is defined between them, i.e., $w_{ij}^{C} = |\{u \in U \mid \{p_i, p_j\} \in R_u^{C}\}|$. In contrast, the win–lose network is a directed graph $G^{WL} = (P, R^{WL})$, which consists of the nodes of products $p \in P$ and the edges of win–lose relations $R^{WL} = \bigcup_{u \in U} R_u^{WL}$. Similarly, the weight of the edge from node $p_i$ to node $p_j$ is the number of times that the win–lose relation is defined in that direction, i.e., $w_{ij}^{WL} = |\{u \in U \mid (p_i, p_j) \in R_u^{WL}\}|$.

For example, we assume three users who perceive their own loser products and winner products as described in Table 1. In this case, the competitive network and the win–lose network are expressed as shown in Figure 2. The edge thickness reflects its weight. The node size represents the number of users who purchased the product.
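To make the edge weights concrete, the following sketch aggregates the three users of Table 1 into weighted networks. It is illustrative only: `build_networks` is a hypothetical helper, the browsed set of each user is assumed to be the union of that user's loser and winner products, and win–lose weights are keyed by (winner, loser) to mirror the ordered pairs in the definition of the win–lose relation set.

```python
from collections import Counter
from itertools import combinations

# Table 1 as {user: (loser products, winner products)}.
table1 = {
    "u1": ({"pB"}, {"pC"}),
    "u2": ({"pA", "pB"}, {"pC"}),
    "u3": ({"pC"}, {"pB"}),
}


def build_networks(users):
    """Count how many users support each competitive and win-lose relation."""
    competitive = Counter()  # frozenset({p_i, p_j}) -> competitive edge weight
    win_lose = Counter()     # (winner, loser) -> win-lose edge weight
    for losers, winners in users.values():
        browsed = losers | winners  # assumption: purchased products were browsed
        for pair in combinations(sorted(browsed), 2):
            competitive[frozenset(pair)] += 1
        for w in winners:
            for l in losers:
                win_lose[(w, l)] += 1
    return competitive, win_lose


competitive, win_lose = build_networks(table1)
print(competitive[frozenset({"pB", "pC"})])  # prints 3
print(win_lose[("pC", "pB")])                # prints 2
print(win_lose[("pB", "pC")])                # prints 1
```

All three users browsed both $p_B$ and $p_C$, so that competitive edge has weight 3, while the win–lose weights differ by direction (2 versus 1).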

3.3 Superiority Factor Analysis
Interpreting the product relation is useful for e-commerce marketing activities, but marketers might also want to clarify the decisive factors which differentiate their own products from those of specific competitors. E-commerce website owners can use this information for recommendation by considering a given user’s preference. Product managers might conceive a differentiation plan for their next product design. Understanding the superiority factors might enable them to ascertain which strategy is better: reinforcing their strengths or repairing their weaknesses. We introduce the superiority factor analysis method, which uses product reviews to extract keywords, or factor words, that represent superiority factors over specific competitors.

A simple means of extracting the factor word is calculating the term frequency of noun phrases that represent the product characteristics in product reviews. We can also use some techniques to filter out common terms or stop words. These techniques might be helpful to elucidate the product’s general selling points in the market, but our proposed method differs from them in that it retrieves the superiority factors against a specific competitor using a directed win–lose relation among them. We assume that each user has specific needs and preferences that are satisfied by the purchased products. Based on that assumption, the reviews of products purchased by a set of users who support a specific win–lose relation might represent the point of view which constructed the win–lose relation. Therefore, we extract the factor words of a specific win–lose relation by applying a keyword extraction method to the reviews of products purchased by a user group that supports the win–lose relation.

Figure 3: Overview of superiority factor analysis method. (Diagram labels: Users, Purchased Products, Product Reviews, Aggregated Reviews, Morphological Analysis, Noun Phrases, Set of Noun Phrases from All Product Reviews, Top K tf-idf Score, Factor Words.)

We present an overview of the superiority factor analysis in Figure 3 with an example of extracting the factor words between winner product $p_i$ and loser product $p_j$. We first define the set of users for whom winner products include product $p_i$ and loser products include product $p_j$. We denote such users, who support the win–lose relation $p_i \succ p_j$, as $U_{p_i \succ p_j} = \{u \in U \mid p_i \in P_u^{\mathrm{win}}, p_j \in P_u^{\mathrm{lose}}\}$. Subsequently, we aggregate the product reviews of their purchased products, denoted as $P_{p_i \succ p_j} = \{p \in P_u^{\mathrm{purchase}} \mid u \in U_{p_i \succ p_j}\}$, to produce a single aggregated review. We apply morphological analysis to the aggregated review and process it into a set of noun phrases.
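The user selection and review aggregation steps can be sketched as follows. This is a rough illustration under stated assumptions: the helper names `supporters` and `aggregated_review`, the dictionary layout, and the toy review texts are all hypothetical, and morphological analysis is left out.

```python
def supporters(users, winner, loser):
    """Users whose winner set contains `winner` and whose loser set contains `loser`."""
    return {
        u
        for u, sets in users.items()
        if winner in sets["win"] and loser in sets["lose"]
    }


def aggregated_review(users, reviews, winner, loser):
    """Join the reviews of every product purchased by the supporting users."""
    purchased = set()
    for u in supporters(users, winner, loser):
        purchased |= users[u]["win"]  # winner products are the purchased products
    return " ".join(
        text for p in sorted(purchased) for text in reviews.get(p, [])
    )


# Toy data (hypothetical): per-user winner/loser sets and per-product reviews.
users = {
    "u1": {"win": {"pC"}, "lose": {"pB"}},
    "u2": {"win": {"pC"}, "lose": {"pA", "pB"}},
    "u3": {"win": {"pB"}, "lose": {"pC"}},
}
reviews = {
    "pB": ["quiet garden"],
    "pC": ["grand banquet hall", "helpful staff"],
}

print(sorted(supporters(users, "pC", "pB")))
print(aggregated_review(users, reviews, "pC", "pB"))
```

In a fuller pipeline the aggregated text would then be passed to a morphological analyzer to obtain noun phrases before scoring.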

Finally, we calculate the tf–idf score of every noun phrase in the processed review text by taking all product reviews as the corpus. We extract the top $K$ noun phrases with the highest tf–idf score as the factor words of the given win–lose relation. The tf–idf score of word $w$ is calculated as the product of the term frequency $tf_w$ and the inverse document frequency $idf_w$. The term frequency $tf_w$ denotes the frequency at which the given word $w$ appears in the review text of interest. The inverse document frequency $idf_w$ is calculated as $idf_w = \log(N / N_w)$, where $N$ denotes the total number of product reviews and $N_w$ denotes the number of product reviews that include the word $w$.
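The scoring step can be illustrated with a small sketch. It assumes that the term frequency is a raw count, that reviews are already tokenized into lists of noun phrases, and that ties are broken alphabetically; none of these details are specified in the text, and `factor_words` is a hypothetical name.

```python
import math
from collections import Counter


def factor_words(aggregated, corpus, k):
    """Return the top-k noun phrases of `aggregated` by tf_w * log(N / N_w).

    aggregated: list of noun phrases from the aggregated review.
    corpus: list of product reviews, each a list of noun phrases.
    """
    n_reviews = len(corpus)
    doc_freq = Counter()
    for review in corpus:
        doc_freq.update(set(review))  # count each review once per phrase
    tf = Counter(aggregated)  # assumption: raw count as term frequency
    scores = {
        w: tf[w] * math.log(n_reviews / doc_freq[w])
        for w in tf
        if doc_freq[w] > 0  # skip phrases that never occur in the corpus
    }
    # Highest score first; alphabetical order breaks ties (an assumption).
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


corpus = [
    ["garden", "ceremony", "staff"],
    ["banquet", "staff"],
    ["staff", "ceremony"],
    ["staff", "access"],
]
aggregated = ["garden", "ceremony", "staff", "garden"]
print(factor_words(aggregated, corpus, k=2))  # prints ['garden', 'ceremony']
```

Here "staff" appears in every review, so its inverse document frequency is $\log(4/4) = 0$ and it drops out of the ranking despite being frequent.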

4 ANALYSIS RESULTS
We apply our proposed method to the largest Japanese wedding portal website, Zexy, and present results to demonstrate the use of our proposed method with the actual dataset. Our proposed method works better when users browse many products for comparison and make careful considerations for purchase because it relies on differences between users’ browsing and purchasing behaviors. The more products are browsed, the more product relations are defined, which makes the analysis more sophisticated. Wedding venues might fit our analysis method well because reserving a venue is an important contract that demands careful consideration for most people. Therefore, we selected this website as the subject of this analysis. Zexy is not an e-commerce website in the strict sense, but we treat it as one in a broader sense because users take conversion actions such as making reservations for wedding venue tours and sending inquiries as a result of comparison on this website. We apply our proposed method by treating browsing activity on wedding venues as browsing behavior and making reservations for venue tours as purchasing behavior.

The dataset comprises access log data collected from January 1, 2012 to October 31, 2012. Each record consists of an anonymized session ID, the browsed page URL, which can be decoded into the ID of the wedding venue, and a flag that indicates whether a reservation for a venue tour was completed for that venue. Hereinafter, we treat this anonymized session ID as the identifier of users. During this time period, we found that several million sessions browsed wedding venues and that several tens of thousands of sessions made reservations. For analysis, we use only those sessions which include both browsing and reservation actions.
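The session filtering described above might look like the following sketch. The record layout `(session_id, venue_id, reserved)` is an assumption for illustration (the actual log stores a page URL that must be decoded into a venue ID), and `sessions_from_log` is a hypothetical helper.

```python
from collections import defaultdict


def sessions_from_log(records):
    """Group (session_id, venue_id, reserved) records into per-session sets."""
    browsed = defaultdict(set)
    reserved = defaultdict(set)
    for session_id, venue_id, is_reserved in records:
        # Every record is a page view, so the venue counts as browsed.
        browsed[session_id].add(venue_id)
        if is_reserved:
            reserved[session_id].add(venue_id)
    # Keep only sessions that include at least one reservation.
    return {
        s: {"browse": browsed[s], "purchase": reserved[s]}
        for s in browsed
        if reserved.get(s)
    }


records = [
    ("s1", "vA", False),
    ("s1", "vB", True),
    ("s2", "vA", False),  # browsing only, so this session is dropped
]
print(sessions_from_log(records))
```

Each surviving session then provides the browsed and purchased sets needed to derive winner and loser products.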

Figure 4 presents the competitive network of wedding venues throughout Japan. Each node represents a wedding venue. Each edge represents their mutual competitive relation. The node size denotes the number of times the given product is purchased. The edge thickness represents the weight of the edge. The node color shows the region of Japan in which the wedding venue is located. For simplicity, we removed from the illustration any node whose number of browsed times is less than a threshold. As a result, the figure shows approximately 1,000 nodes and 75,000 edges. The network is laid out based on a kinetic model, which places connected nodes closer together by simulating a spring force between them and a magnetic force between unconnected nodes [4]. As this figure shows, the venues form some clusters of competition. The color segments match well with the cluster segments. We ascertain that the competition takes place primarily within each region.

Figure 4: Competitive network of wedding venues throughout Japan.

Figure 5 portrays the win–lose network of 10 selected wedding venues $(p_A, \dots, p_J)$ in Tokyo. We selected these popular wedding venues because they are more likely to have sufficient data for analysis than other venues. Results show that edges among wedding venues $p_A$, $p_C$, $p_E$, and $p_H$ are especially bold, which means that they share a strong competitive relation. When we specifically examine the win–lose relation from wedding venue $p_H$ to wedding venue $p_A$, the outbound edge is much thicker than the inbound edge, which means that wedding venue $p_H$ is inferior to wedding venue $p_A$ in terms of product attractiveness. Therefore, the e-commerce website owner can expect that users tend to make conversion actions on wedding venue $p_A$ rather than wedding venue $p_H$ when they are browsed during the same session.

Figure 5: Win–lose network of selected wedding venues in Tokyo (colored by area). (Area legend: Setagaya, Bunkyo, Ebisu, Aoyama, Tokyo Station, Roppongi, Odaiba.)

Finally, we conducted superiority factor analysis of wedding venues $p_A$, $p_H$, and $p_J$, which have large node sizes and a competitive relation among the 10 selected wedding venues. We use several thousand product reviews for the wedding venues in Tokyo, collected on the website, as the corpus. We treat every product review as a document, calculate tf–idf values of the included noun phrases, and extract factor words. We conducted morphological analysis of review texts in Japanese using Mecab², which separates a sentence into a list of words with part-of-speech tagging information.

Table 2: Factor words of wedding venue $p_H$ against wedding venues $p_A$ and $p_J$.

Against $p_A$    Against $p_J$
ceremony         Japanese dish (kaiseki)
garden           Japanese-style room
banquet          ceremony
solemnity        garden
photograph       bus

Table 2 shows the top five factor words of wedding venue $p_H$ against wedding venues $p_A$ and $p_J$. Results show that product $p_H$ has different superiority factors against each competitor. Against wedding venue $p_A$, garden and banquet arise as decisive factors. Both wedding venues $p_A$ and $p_H$ offer Japanese traditional style weddings, but wedding venue $p_H$ apparently has superiority in terms of this characteristic. In addition, the photograph service quality is one important feature on which users who prefer wedding venue $p_H$ cannot compromise. Against wedding venue $p_J$, Japanese dish (kaiseki) and Japanese-style room come to the top of the list. Because wedding venue $p_J$ is a hotel, the Japanese-style ceremony itself seems to be the strength of wedding venue $p_H$ against wedding venue $p_J$. The transportation service also seems to be regarded as a decisive factor of product attractiveness of wedding venue $p_H$. As described above, the superiority factor analysis enables us to capture the strength of the given product against a specific competitor in the form of a list of keywords.

5 EVALUATION EXPERIMENT
We evaluate whether our proposed method is useful for e-commerce marketing by comparing user perceptions estimated by mining log data with actual perceptions investigated through a user survey. Our objective is to extract user perception of the superiority relation in product attractiveness and its decisive factors on an e-commerce website. To investigate actual user perceptions, which are answered explicitly through the survey, we conducted a user survey of couples who used Zexy to reserve venue tours and who held a ceremony at one of the candidate venues.

We conduct two experiments in this section. The first experiment evaluates the correlation between two product networks: one extracted from the user survey results and the other calculated from the log data. We infer that the log data can be a good alternative to survey results in terms of extracting products’ competitive relations and win–lose relations, which have been defined in this paper. The second experiment evaluates the number of matches of factor words. We presume a baseline method that outputs factor words using only winner product review texts and compare the number of matches to the actual factor words extracted from the user survey results.
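As a rough illustration of the second experiment, counting matches between extracted and actual factor words could be sketched as below. The exact matching criterion is not given in this passage, so exact string equality is an assumption, and `count_matches` and the word lists are hypothetical.

```python
def count_matches(predicted, actual):
    """Number of distinct predicted factor words that also appear in `actual`."""
    return len(set(predicted) & set(actual))


# Hypothetical word lists, for illustration only.
actual_words = ["garden", "ceremony", "access"]
method_words = ["garden", "ceremony", "banquet"]
print(count_matches(method_words, actual_words))  # prints 2
```

The same function can be applied to the output of a baseline method so that the two match counts are compared against the same set of actual factor words.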

2 MeCab http://taku910.github.io/mecab/


Table 3: Correspondence between the user survey results $D$ and log data $D$.

|            | User survey $D$      | Log data $D$         |
|------------|----------------------|----------------------|
| Users      | Responders (N=173)   | Visitors (N=202)     |
| Browsing   | Attending tour       | Browsing             |
| Purchasing | Holding a ceremony   | Making a reservation |
| Text       | Reason for selection | Product review       |

5.1 Experimental Setup

In both experiments, we use Zexy as the target website and specifically examine the 10 wedding venues selected in Section 4. The user survey was administered from January 23, 2012 to December 14, 2013, targeting respondents who booked multiple wedding venue tours on Zexy and who finally held a ceremony at one of those venues. The survey asks respondents about the wedding venue at which they held the wedding ceremony and the reason they ultimately selected that venue after comparing the candidates. The user survey results can be regarded as representing user preferences for wedding venue selection because they are explicit responses by users. Evaluating the methods using multiple datasets would be preferable, but we specifically examine this website in this paper because it is rare to have a large-scale e-commerce website accompanied by an offline user survey that studies actual users' preferences.

Generally speaking, conducting a user survey entails huge costs, as exemplified by questionnaire distribution and data collection. In contrast, log data are recorded implicitly by observing user activities, which does not impose any burden on users to input information manually. Therefore, estimating the product relation from the log data instead of conducting actual user surveys is valuable. In this paper, we designate the product relation extracted from the user survey results as the actual product relation and the corresponding factor words as the actual factor words. We evaluate whether we can estimate them by applying our proposed methods to the log data.

Table 3 presents the correspondence between the user survey results and the log data. From the log data, we use the sessions that show both browsing and purchasing behavior for the 10 target wedding venues, i.e., the sessions that contribute to forming a win–lose relation among them. Browsing behavior on the website corresponds to the offline behavior of attending wedding venue tours. Making a reservation for a venue tour on the website corresponds to the offline behavior of finally deciding the wedding venue at which to hold the ceremony. In addition, the review texts correspond to the survey responses explaining why respondents selected the wedding venue. Given these correspondences, we can also extract the product relation by applying our proposed method to the user survey results.
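The correspondence in Table 3 means that both data sources can be reduced to the same kind of record: a set of browsed venues and one purchased venue. The following is a minimal sketch of counting win–lose observations from such records. The record layout, the function name `count_win_lose`, and the toy venue labels are assumptions for illustration; the sketch counts raw observations only and does not reproduce the weight definitions of Section 3.

```python
from collections import Counter

def count_win_lose(records):
    """Count (winner, loser) observations.

    Each record is (browsed, purchased): `browsed` is the set of venues that
    were browsed online (or toured, for survey data) and `purchased` is the
    venue that was reserved online (or where the ceremony was held).
    """
    win_lose = Counter()
    for browsed, purchased in records:
        # Every venue that was browsed but not purchased is a loser
        # against the purchased venue.
        for loser in set(browsed) - {purchased}:
            win_lose[(purchased, loser)] += 1
    return win_lose

# Hypothetical log sessions and survey responses in the same layout.
log_sessions = [({"A", "H"}, "H"), ({"A", "H", "J"}, "H"), ({"A", "J"}, "A")]
survey_responses = [({"A", "H"}, "H"), ({"H", "J"}, "J")]

log_counts = count_win_lose(log_sessions)
survey_counts = count_win_lose(survey_responses)
```

Because both sources share one layout, the same function yields the estimated relations (from log data) and the actual relations (from survey data), which is what makes the later comparison possible.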

5.2 Evaluation of Product Relation Analysis

First, we evaluate the correlation of the competitive relations. We calculate the estimated weight $w_C$ from the log data $D$ and the actual weight $w_C$ from the user survey results $D$ for the 45 competitive relations among the given 10 venues. Hereinafter, we assume that the relation between two products is determined independently of other products' relations. The results revealed a Pearson correlation coefficient of 0.685 with a p-value of $2.06 \times 10^{-7}$, which is a significant correlation. Of the 45 pairs, 6 had a value of zero in either the actual weight or the estimated weight, and the remainder had positive weights in both. These results demonstrate that our proposed method yields a good estimate of the competitive relation among the products.

Second, we evaluate the correlation of the win–lose relations. We can extract 90 win–lose relations from the 10 venues because, unlike the competitive relations, the win–lose relations are directed. As in the competitive relation analysis, we assume that the relation between two products is determined independently of the others. We calculated the estimated weight $w_{WL}$ from the log data $D$ and the actual weight $w_{WL}$ from the user survey results $D$. The results show a Pearson correlation coefficient of 0.648 with a p-value of $5.02 \times 10^{-12}$, which indicates a significant correlation between them. Of the 90 pairs, 20 had a value of zero in either the actual weight or the estimated weight, and the remainder had positive weights in both. These results show that our proposed method yields a good estimate of the win–lose relation among the products.
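The pair counts above follow from the number of venues: 10 venues give 45 unordered pairs for the competitive relation and 90 ordered pairs for the win–lose relation. As a minimal sketch, the correlation can be computed by aligning the two weight tables over every pair and treating missing pairs as zero. The helper names and the toy weights below are assumptions for illustration, not the paper's data, and the sketch does not compute the p-value.

```python
import math
from itertools import combinations, permutations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def aligned_weights(venues, estimated, actual, directed):
    """Align two weight dicts over every venue pair; missing pairs count as 0."""
    make_pairs = permutations if directed else combinations
    pairs = list(make_pairs(sorted(venues), 2))
    xs = [estimated.get(pair, 0.0) for pair in pairs]
    ys = [actual.get(pair, 0.0) for pair in pairs]
    return pairs, xs, ys

venues = "ABCDEFGHIJ"  # 10 hypothetical venue labels
estimated = {("A", "B"): 0.4, ("B", "A"): 0.1, ("C", "D"): 0.3}  # made-up weights
actual = {("A", "B"): 0.5, ("B", "A"): 0.2, ("C", "D"): 0.2}     # made-up weights

pairs, xs, ys = aligned_weights(venues, estimated, actual, directed=True)
r = pearson(xs, ys)
```

Passing `directed=False` enumerates the 45 unordered pairs used for the competitive relation instead.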

5.3 Evaluation of Superiority Factor Analysis

Finally, we evaluate the performance of the superiority factor analysis. To begin with, we explain how the actual factor words of wedding venue $p_i$ against wedding venue $p_j$ are extracted from the user survey results. We select the responses in which the respondent chose wedding venue $p_i$ to hold the wedding ceremony after attending a tour of wedding venue $p_j$, i.e., the responses that perceive product $p_i$ as the winner and product $p_j$ as the loser. We aggregate their reasons for selection into one text and conduct morphological analysis to process it into a set of noun phrases. We regard this set as the actual factor words $T_{p_i \succ p_j}$.

We calculate the estimated factor words of our proposed method, $T^{\mathrm{proposed}}_{p_i \succ p_j}$, by following the procedure introduced in Section 3, which extracts the top $K$ keywords by taking the product reviews of all wedding venues in Tokyo as the corpus. To evaluate the effectiveness of our proposed method, we compare it with a baseline method that does not consider any win–lose relation. The baseline method takes the keywords extracted from the product reviews of the winner product $p_i$ as the estimated factor words $T^{\mathrm{baseline}}_{p_i}$, no matter which product is the loser. In other words, it simply assumes that the characteristics of the winner product are the decisive factors of the win–lose relation for any loser product. This method uses the same corpus as the proposed method and returns the top $K$ words with the highest tf–idf scores.
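As a minimal sketch of the baseline, the top $K$ words of the winner's reviews can be ranked by tf–idf against a corpus with one document per venue. The exact tf–idf variant is not stated here, so the relative term frequency and the unsmoothed logarithmic idf below are assumptions, as are the function name and the toy tokens. The sketch also assumes that reviews have already been reduced to noun tokens by morphological analysis.

```python
import math
from collections import Counter

def top_k_tfidf(target_tokens, corpus, k):
    """Return the k tokens of `target_tokens` with the highest tf-idf score.

    `corpus` is a list of token lists, one per venue, and is assumed to
    include the target venue's own reviews.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))  # count each token once per document

    term_freq = Counter(target_tokens)
    total = len(target_tokens)
    scores = {
        # max(..., 1) guards against a token that is absent from the corpus.
        token: (count / total) * math.log(n_docs / max(doc_freq[token], 1))
        for token, count in term_freq.items()
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda token: (-scores[token], token))[:k]

# Toy corpus: one token list per hypothetical venue.
corpus = [
    ["church", "stained_glass", "church", "staff"],
    ["garden", "staff", "banquet"],
    ["hotel", "staff", "banquet"],
]
baseline_words = top_k_tfidf(corpus[0], corpus, k=2)
```

In this toy corpus, `staff` appears in every venue's reviews and therefore scores zero, which illustrates why the baseline favors words specific to the winner regardless of the loser.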

This experiment evaluates the performance of the analysis methods using the number of matches between the actual factor words $T_{p_i \succ p_j}$ and the factor words estimated by the proposed method, $T^{\mathrm{proposed}}_{p_i \succ p_j}$, and by the baseline method, $T^{\mathrm{baseline}}_{p_i}$, for each win–lose pair $(p_i, p_j)$. We use this metric on the assumption that the actual factor words capture the aspects and characteristics that users care about most. We set $K = 20$ for this experiment. Additionally, we skip evaluation of a pair if the number of actual factor words is less than 5 or if the number of estimated factor words does not reach $K$ for either method.

Table 4: Factor words of product $p_D$ against product $p_G$. Matched words are highlighted in bold.

| Method | Factor words |
|--------|--------------|
| Baseline $T^{\mathrm{baseline}}_{p_D}$ | map, cathedral, forbidden, she, Akka, order, impression, problem, standard, cloud, stained glass, church, European, minute, overall, exchange, movie, ring, Omotesando, bringing (3 matches) |
| Proposed $T^{\mathrm{proposed}}_{p_D \succ p_G}$ | chapel, hospitality, ceremony, guest, feeling, day, impression, staff, stained glass, banquet, dish, atmosphere, church, location, San, photograph, lovely, venue, Omotesando, weddings (13 matches) |
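The match count and the skip criterion of this experiment can be sketched as follows. The function names are hypothetical, and the sketch reads the criterion as skipping a pair when either method returns fewer than $K$ words, which is one interpretation of the text.

```python
def count_matches(actual_words, estimated_words):
    """Number of estimated factor words that also occur among the actual ones."""
    return len(set(actual_words) & set(estimated_words))

def evaluate_pair(actual_words, proposed_words, baseline_words, k=20, min_actual=5):
    """Return (proposed_matches, baseline_matches), or None if the pair is skipped."""
    actual = set(actual_words)
    if len(actual) < min_actual:
        return None  # too few actual factor words
    if len(proposed_words) < k or len(baseline_words) < k:
        return None  # a method could not supply K estimated words
    return (
        count_matches(actual, proposed_words[:k]),
        count_matches(actual, baseline_words[:k]),
    )

# Toy example with K = 3 instead of the K = 20 used in the experiment.
actual = ["chapel", "staff", "church", "dish", "venue"]
proposed = ["chapel", "staff", "dish"]
baseline = ["church", "map", "ring"]
result = evaluate_pair(actual, proposed, baseline, k=3)
```

A pair counts as a win for the proposed method when the first element of the result exceeds the second, as a tie when they are equal, and as a win for the baseline otherwise.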

The results show that 45 out of 90 pairs satisfied the criteria described above. Among them, our proposed method outperformed the baseline method in 23 pairs, 21 pairs showed a tie, and the baseline method outperformed our proposed method in the remaining one pair. In all, 1548 actual factor words were extracted; 284 of them were matched by our proposed method, whereas 230 were matched by the baseline method. A chi-square test of independence showed a statistically significant difference between the match rates of the two groups, with a p-value of 3.09%. Therefore, our proposed method is better than the baseline method at estimating the actual factor words from log data.
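For readers unfamiliar with the test, the following is a minimal sketch of a Pearson chi-square test of independence on a 2×2 table of matched and unmatched words per method. How the contingency table was laid out and whether a continuity correction was applied are not stated in the text, so the layout below (each method judged against all 1548 actual factor words, no correction) is an assumption, and the sketch is not expected to reproduce the reported p-value.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the table [[a, b], [c, d]].

    Returns (statistic, p_value) without continuity correction.
    """
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

total_actual = 1548
proposed_matches, baseline_matches = 284, 230

# Assumed layout: rows are methods, columns are (matched, not matched).
statistic, p_value = chi_square_2x2(
    proposed_matches, total_actual - proposed_matches,
    baseline_matches, total_actual - baseline_matches,
)
```

Under a different table layout or with a continuity correction, the statistic and the p-value change, so the numbers from this sketch should be read only as an illustration of the procedure.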

Table 4 presents the estimated factor words of wedding venue $p_D$ against wedding venue $p_G$, the pair for which our proposed method exceeded the baseline method by the greatest margin in terms of the match ratio. Against 68 actual factor words, our proposed method matched 13 words, whereas the baseline method matched 3 words. Wedding venue $p_D$ is known to offer Western-style weddings. The baseline method can capture that feature with keywords such as church and stained glass. However, this is a well-known point of appeal of this wedding venue, so it is not such an interesting finding. In contrast, our proposed method captured other aspects of the venue, such as hospitality, atmosphere, and staff. By reviewing these words, we ascertain that its customer service and hospitality are also valued by customers when it is compared with wedding venue $p_G$.

The experimental results show that the baseline method can extract features of the purchased product, but it cannot extract the features that users care about in a broader sense. Its output is sometimes too specific to the product and sometimes too general to explain the decisive factor. Our proposed method, however, can extract both the purchased product's specific characteristics and the important features that satisfy the users whose preferences support the given win–lose relation. Our proposed method can offer appropriate factor words by widening the scope of consideration from the purchased product to the users' preferences that contribute to the formation of the product relations.

6 DISCUSSION

We compared the product relations calculated from the log data with those calculated from the user survey results, which presumably reflect the actual perceived superiority relation among products. The results show that their competitive and win–lose relations are mutually correlated. These results show that our proposed method can capture actual users' perceptions by analyzing log data in the form of the product relations defined in this paper. They therefore also show that log data can be a good alternative to a user survey. For that reason, we might save the cost of conducting user surveys, which entail expensive work such as questionnaire distribution and collection and data input.

We also evaluated the superiority factor analysis method using the actual dataset. The experimental results show that our proposed method provides a better estimate of the actual factor words extracted from the user survey results than the baseline method, which specifically examines the purchased product's review text and does not consider the win–lose relations of products. Therefore, by considering win–lose relations among the products, our proposed method can extract from product reviews the features that customers perceive as important.

Our proposed method specifically examines keywords and does not capture the sentiment toward features, so some factor words might represent negative opinions. Nevertheless, extracting factor words is useful for capturing the aspects of the product or service that customers perceive as important. It will be a helpful hint when marketers dig into the product reviews closely. We used a combination of simple approaches to validate the existence of factor words, but more sophisticated analyses that consider the polarity of each sentence may be possible. Additionally, the keyword extraction may be improved by comparing the reviews related to $P_{p_i \succ p_j}$ with those related to $P_{p_j \succ p_i}$.

If the e-commerce website handles multiple categories of products, then we might require some technique to build product networks within each category to avoid extracting meaningless cross-category relations such as "camera versus detergent". Some e-commerce websites, such as consumer-to-consumer marketplaces and online auctions, might have multiple sellers offering different prices for one product. In this case, the attractiveness of each offering can enter the product network as a noise factor. Our proposed method is a generalized method that is compatible with any e-commerce website as long as log data are available, but it will work better with marketplaces for which the law of one price holds.

Additionally, it is noteworthy that competitive and win–lose relations are defined between substitute products, not between complementary products. This assumption holds on Zexy because it is a wedding portal website: all handled products are wedding venues, which are inherently competitors. However, generic e-commerce websites such as Amazon3 handle complementary products within a category (e.g., a laptop and its battery in the electronics category). Our proposed method is directly applicable to e-commerce websites that specialize in specific products, but some technique might be necessary to identify and filter complementary products when we apply this method to a generic e-commerce website.

3 Amazon https://www.amazon.com/

As seen in some win–lose relations, such as those among wedding venues $p_A$, $p_C$, and $p_E$ in Figure 5, there can be arrows of comparable thickness going in both directions. We can regard them as cancelling each other, but they also mean that a comparable number of users perceive the win–lose relation in each direction. If we can identify user clusters that separate the conflicting win–lose relations well, then the information will be useful for capturing the multiple dynamics of product competition, which differ depending on user characteristics. This information is also useful for product marketers to elucidate the characteristics of their already convinced customers and of those to whom they must still appeal.
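As a minimal sketch of how such two-way relations might be flagged before any user clustering, the following compares the two directed weights of each venue pair and reports the pairs whose smaller weight is close to the larger one. The function name, the ratio threshold of 0.8, and the toy weights are assumptions for illustration only.

```python
def contested_pairs(win_lose, threshold=0.8):
    """Return unordered pairs whose two directed weights are comparable.

    `win_lose` maps (winner, loser) to a non-negative weight. A pair is
    reported when min(weight) / max(weight) >= threshold.
    """
    contested = []
    seen = set()
    for (a, b), weight_ab in win_lose.items():
        key = tuple(sorted((a, b)))
        if key in seen:
            continue  # the reverse direction was already handled
        seen.add(key)
        weight_ba = win_lose.get((b, a), 0.0)
        larger = max(weight_ab, weight_ba)
        smaller = min(weight_ab, weight_ba)
        if larger > 0 and smaller / larger >= threshold:
            contested.append(key)
    return sorted(contested)

# Made-up weights among three hypothetical venues.
weights = {
    ("A", "C"): 5.0, ("C", "A"): 4.5,  # comparable in both directions
    ("A", "E"): 6.0, ("E", "A"): 1.0,  # clearly one-sided
    ("C", "E"): 3.0,                   # no reverse arrow at all
}
flagged = contested_pairs(weights)
```

The flagged pairs are candidates for the per-cluster analysis discussed above, since the aggregate network alone cannot say which users prefer which venue.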

If there is a promotional campaign on the target e-commerce website, the promoted products might tend to be the winner products during that period. However, if users visit the promoted items directly without considering other products, then the behavior does not affect the original win–lose relation of products because there is no loser product in such a case. Similarly, users might purchase a product on a particular e-commerce website because it offers a lower price than its competitors, aside from product attractiveness. In that case, it is likely that users have already decided which item to purchase when they visit the website. Therefore, other products are less likely to be browsed, which means there is little effect on the product network. Our proposed method is thus robust to temporary events such as promotional marketing campaigns and to differences in selling price across e-commerce websites.

For cases in which numerous products are handled on the website, we can consider techniques that remove products that are rarely purchased or that merge similar products. These techniques will be useful to avoid making the product network sparse. For cases in which very few products are handled, the product network might become too dense for readers to grasp an overview. In that case, we can consider a technique that filters out edges with small weights.
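Both pruning ideas are straightforward to sketch. The function names, the thresholds, and the toy values below are hypothetical, and appropriate thresholds would depend on the website.

```python
def drop_rare_products(edges, purchase_counts, min_purchases):
    """Keep only edges whose endpoints were both purchased often enough."""
    kept = {p for p, count in purchase_counts.items() if count >= min_purchases}
    return {
        (winner, loser): weight
        for (winner, loser), weight in edges.items()
        if winner in kept and loser in kept
    }

def filter_light_edges(edges, min_weight):
    """Remove edges whose weight is below `min_weight` to reduce density."""
    return {pair: weight for pair, weight in edges.items() if weight >= min_weight}

# Made-up win-lose edges and purchase counts.
edges = {("A", "B"): 0.40, ("B", "A"): 0.05, ("A", "C"): 0.30, ("C", "B"): 0.02}
purchase_counts = {"A": 120, "B": 80, "C": 3}

dense_view = filter_light_edges(edges, min_weight=0.10)
popular_view = drop_rare_products(edges, purchase_counts, min_purchases=10)
```

The first function addresses overly dense networks, and the second addresses sparsity caused by products with very few purchases.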

If too many users are included in the analysis, then it is possible to extract various product relations, but various user preferences might be mixed in each product relation, which makes it difficult to analyze the underlying superiority factors. In this case, we might categorize the users beforehand based on their attributes and conduct the product relation analysis for each user category to extract more sophisticated and actionable insights. Conversely, our proposed method functions well even if there are few users because it can scale up the data volume by using users' browsing behavior in addition to their purchasing behavior.

7 CONCLUSION

We proposed a new product relation analysis method that uses the log data of e-commerce websites to reveal the superiority relation in product attractiveness. Our proposed method also analyzes the superiority factor using text data associated with the products. We applied our analytical method to a Japanese wedding portal website, Zexy, to extract the product network and superiority factors from its actual dataset. In the evaluation experiment, we compared product networks extracted from the website log data and from user survey results. The experimental results showed that the weights of both product networks are significantly correlated, which demonstrates that the log data can be a good alternative to the user survey for extracting user preferences. The results also show that our proposed superiority factor analysis method can estimate the actual decisive factors of product attractiveness extracted from user survey results. Our proposed method helps marketers and product managers understand the competitive advantages of a given product and consolidate their product differentiation strategies.

REFERENCES

[1] Nikolay Archak, Anindya Ghose, and Panagiotis G Ipeirotis. 2007. Show me the money!: deriving the pricing power of product features by mining consumer reviews. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, 56–65.

[2] Ming C Hao, Umeshwar Dayal, Meichun Hsu, Thomas Sprenger, and Markus H Gross. 2001. Visualization of directed associations in e-commerce transaction data. Springer, Vienna.

[3] Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, 168–177.

[4] Mathieu Jacomy, Sebastien Heymann, Tommaso Venturini, and Mathieu Bastian. 2011. Forceatlas2, a continuous graph layout algorithm for handy network visualization. Medialab Center of Research.

[5] Theodoros Lappas, Mark Crovella, and Evimaria Terzi. 2012. Selecting a characteristic set of reviews. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, 832–840.

[6] James M Lattin and Leigh McAlister. 1985. Using a variety-seeking model to identify substitute and complementary relationships among competing products. Journal of Marketing Research 22, 3 (1985), 330–339.

[7] Julian McAuley, Rahul Pandey, and Jure Leskovec. 2015. Inferring networks of substitutable and complementary products. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, 785–794.

[8] Troy Raeder and Nitesh V Chawla. 2009. Modeling a store's product space as a social network. In Proceedings of International Conference on Advances in Social Network Analysis and Mining. IEEE Computer Society, Washington, DC, USA, 164–169.

[9] Troy Raeder and Nitesh V Chawla. 2011. Market basket analysis with networks. Social Network Analysis and Mining 1, 2 (2011), 97–113.

[10] Allan D Shocker, Barry L Bayus, and Namwoon Kim. 2004. Product complements and substitutes in the real world: the relevance of other products. Journal of Marketing 68, 1 (2004), 28–40.

[11] Zhiang Wu, Youquan Wang, Yaqiong Wang, Junjie Wu, Jie Cao, and Lu Zhang. 2015. Spammers detection from product reviews: a hybrid model. In Proceedings of the IEEE International Conference on Data Mining. IEEE Computer Society, Washington, DC, USA, 1039–1044.

[12] Jiao Xu, Chris Forman, Jun B Kim, and Koert Van Ittersum. 2014. News media channels: complements or substitutes? Evidence from mobile phone usage. Journal of Marketing 78, 4 (2014), 97–112.

[13] Jiaqian Zheng, Xiaoyuan Wu, Junyu Niu, and Alvaro Bolivar. 2009. Substitutes or complements: another step forward in recommendations. In Proceedings of the Tenth ACM Conference on Electronic Commerce. ACM, New York, NY, USA, 139–146.

[14] Dmitry Zinoviev, Zhen Zhu, and Kate Li. 2015. Building mini-categories in product networks. In Complex Networks VI. Vol. 597. Springer, Cham, 179–190.