data driven approaches in user experience analysis: customer...

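Attribution-NonCommercial-NoDerivs 2.0 Korea

You are free to copy, distribute, transmit, display, perform, and broadcast this work, provided that you follow the conditions below:

Attribution. You must attribute the work to the original author.

Noncommercial. You may not use this work for commercial purposes.

No Derivative Works. You may not alter, transform, or build upon this work.

For any reuse or distribution, you must make clear the license terms applied to this work.

These conditions do not apply if you obtain separate permission from the copyright holder.

Your rights under copyright law are not affected by the above.

This is an easy-to-understand summary of the Legal Code (Disclaimer).

http://creativecommons.org/licenses/by-nc-nd/2.0/kr/legalcode

http://creativecommons.org/licenses/by-nc-nd/2.0/kr/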

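Doctoral Dissertation in Engineering

Data Driven Approaches in User Experience Analysis: Customer-voice classification, User segmentation and Design elements selection

(Korean title: User Experience Design Based on Data Analysis Methodology: User Requirements Classification, User Segmentation and Design Elements Selection)

February 2019

Graduate School of Seoul National University

Department of Industrial Engineering

Younghoon Lee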


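Advisor: 조성준

Submitted as a dissertation for the degree of Doctor of Philosophy in Engineering

November 2018

Graduate School of Seoul National University, Department of Industrial Engineering

Younghoon Lee

The doctoral dissertation of Younghoon Lee is approved

December 2018

Chair: 윤명환 (seal)

Vice Chair: 조성준 (seal)

Member: 박우진 (seal)

Member: 정재윤 (seal)

Member: 홍지영 (seal)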

Abstract

Data Driven Approaches in User Experience Analysis: Customer-voice classification, User segmentation and Design elements selection

Younghoon Lee

Department of Industrial Engineering

The Graduate School

Seoul National University

This thesis proposes data driven approaches to user experience analysis. Although many studies from both academia and industry have proposed techniques to improve the user experience of smartphones, several problems remain because user experience design is usually performed heuristically by individual designers. The objective of this study is to address those problems, and it focuses on three subjects within the overall user experience design process: 1) customer-voice classification, 2) user segmentation, and 3) design elements selection. First, this study proposes an advanced document de-noising method and document representation suited to customer-voice data, which enable effective document classification and address the inefficiency of the previous manual classification. Second, this study proposes a novel user segmentation method that utilizes the app usage sequences of real users, addressing the problem that previous approaches drew on limited data sources. Last, this study proposes two design elements selection methods, one for help contents re-organization and one for product attribute prioritization, using advanced deep learning techniques to overcome the previous limitation of not considering users’ needs and characteristics. Based on the results of this thesis, it is concluded that data driven approaches effectively address the problems caused by heuristic approaches. They can also provide meaningful insights to UI designers regarding customer-voice analysis, user segmentation, product development, and layout design. Future studies can build on this work to extend the scope of research to other tasks in the overall user experience design process.

Parts of this thesis have been published in, or accepted by, the following SCI/SCIE/SSCI journals:

Lee, Y., Cho, S., Choi, J. (2018). De-noising documents with a novelty detection method utilizing class vectors. Intelligent Data Analysis, 22(4), 717-733.

Lee, Y., Park, I., Cho, S., Choi, J. (2018). Smartphone user segmentation based on app usage sequence with neural networks. Telematics and Informatics, 35(2), 329-339.

Lee, Y., Im, J., Cho, S., Choi, J. (2018). Applying convolution filter to matrix of word-clustering based document representation. Neurocomputing, 315, 210-220.

Lee, Y., Chung, M., Cho, S., Choi, J. (2019). Extraction of Product Evaluation Factors with a Convolutional Neural Network and Transfer Learning. Neural Processing Letters, 1-16.

Lee, Y., Song, S., Cho, S., Choi, J. (2019). Document representation based on probabilistic word clustering in customer-voice classification. Pattern Analysis and Applications, (Accepted).

Lee, Y., Cho, S., Choi, J. (2019). Smartphone help contents re-organization considering user specification via conditional GAN. International Journal of Human-Computer Studies, (Accepted).

Keywords: User experience, Data analysis, Document classification, User segmentation, Design elements selection

Student Number: 2016-30254


Contents

Abstract
Contents
List of Tables
List of Figures

Chapter 1 Introduction

Chapter 2 Literature Review
2.1 Traditional approaches for analysis of user experience design
2.1.1 Focus group discussion
2.1.2 Personal interview
2.1.3 Quantitative approaches
2.2 Related studies on document classification
2.2.1 Document classification method
2.2.2 Word-clustering based document representation method
2.2.3 Novelty detection in the textual domain
2.3 Related studies on user segmentation
2.4 Related studies on product attributes prioritization
2.5 Related studies on help system improvements
2.5.1 Help system user interface
2.5.2 User specification
2.6 Review on related architecture
2.6.1 Probabilistic clustering method
2.6.2 Neural embedding architecture
2.6.3 Variational auto-encoder and Neural variational document model
2.6.4 t-distributed stochastic neighbor embedding (t-SNE)
2.6.5 Seq2seq architecture
2.6.6 Louvain method
2.6.7 Explainable machine learning algorithms
2.6.8 Transfer learning
2.6.9 Conditional GAN

Chapter 3 Customer-voice classification
3.1 Background
3.2 Methodology
3.2.1 De-noising documents
3.2.2 Probabilistic word clustering based document representation
3.2.3 Word-clustering based document representation with VAE and its probabilistic version
3.2.4 Matrix representation of word-clustering based document representation
3.2.5 Applying convolution filter to matrix representation
3.3 Experiments
3.3.1 Data description
3.3.2 Experiments setup
3.3.3 Experiments results

Chapter 4 User segmentation
4.1 Background
4.2 Methodology
4.2.1 Variant of the seq2seq based approach
4.2.2 App clustering and relative similarity-based segmentation
4.3 Experiments

Chapter 5 Design elements selection
5.1 Background
5.2 Methodology
5.2.1 Prioritization of product attributes
5.2.2 Help contents re-organization
5.3 Experiments

Chapter 6 Conclusion

Bibliography

Abstract in Korean

Acknowledgements

List of Tables

Table 3.1 Word list located closest to the centroid
Table 3.2 Word list located far from the centroid
Table 3.3 Customer-voice dataset
Table 3.4 Words with lowest novelty score
Table 3.5 Words with highest novelty score
Table 3.6 Accuracy of classification performance (*: Proposed method)
Table 3.7 Accuracy of classification performance of customer-voice data
Table 3.8 Example of representation interpretation
Table 4.1 User segmentation results obtained by domain experts
Table 4.2 Comparison of the similarities between the segmentations obtained by each method and the answer set (*: proposed method, (c): utilizing cosine distance, (m): utilizing Mahalanobis distance)
Table 4.3 Example of representation interpretation
Table 5.1 Structure of convolutional neural network for aspect extraction
Table 5.2 Examples of keywords in the same cluster
Table 5.3 Baselines utilized in first experiment
Table 5.4 Performance of attributes extraction and prioritization (NDCG)
Table 5.5 Examples of extracted attributes
Table 5.6 Result of effectiveness comparison
Table 5.7 Confusion matrix of help contents usage prediction
Table 5.8 Average of help contents selection for top-k prediction

List of Figures

Figure 1.1 Process of smartphone user experience design
Figure 2.1 Original seq2seq architecture
Figure 2.2 Example of Grad CAM image
Figure 3.1 Summary of customer-voice data analysis process
Figure 3.2 Scope of proposed approaches
Figure 3.3 Limitation of the previously stated novelty detection method
Figure 3.4 Advantage of proposed novelty detection method
Figure 3.5 Document representation based on probabilistic word clustering
Figure 3.6 Reason for rearranging each representation
Figure 3.7 Preserve semantic distance
Figure 3.8 One-to-one correspondence
Figure 3.9 Rearrange the elements
Figure 3.10 TF-IDF
Figure 3.11 Neural embedding based word clustering [61, 127]
Figure 3.12 Probabilistic word clustering based approach [72]
Figure 3.13 Topic vector
Figure 3.14 LSA
Figure 3.15 Accuracy of classification performance
Figure 3.16 Accuracy of classification performance of customer-voice data
Figure 4.1 Summary of our proposed method
Figure 4.2 Variant of the seq2seq architecture (our proposed architecture)
Figure 4.3 Determination of user segmentation
Figure 4.4 Summary of app clustering-based user representation
Figure 4.5 Comparison between actual and predicted segmentation results
Figure 4.6 Summary of our proposed method for considering relative similarity
Figure 4.7 Example of the app usage sequence
Figure 4.8 Example of user segmentation by domain experts
Figure 4.9 Example of app clustering
Figure 4.10 User network construction
Figure 5.1 Example of smartphone help system
Figure 5.3 Example of weight visualization
Figure 5.5 Preprocessing of help usage data
Figure 5.6 CGAN architecture for help usage prediction
Figure 5.7 Example of help contents re-organization
Figure 5.8 Spec sheet of LG V30 (Resource: GSM Arena)

Chapter 1

Introduction

User experience (UX) encompasses all aspects of a user’s interactions with a product or service [98]. Since the revolutionary success of Apple, the competitive advantage of most Information & Communication Technology (ICT) products and services in the contemporary market has come from UX rather than from functionality and efficiency alone, especially for smartphones [20].

Researchers from both academia and industry have proposed many techniques to improve the user experience of smartphones at each design step (Figure 1.1). In the concept building step, three activities are performed to derive new insights and concepts for the future UX: 1) trend research, which answers the questions ‘Which trends are there or will there be in the future? And which of these are relevant to us?’, 2) user segmentation, which maximizes the value of each customer to the business, and 3) focus group interviews. In the validation step, the derived concept is validated through 1) prototyping and 2) acceptance testing, which evaluates the concept’s compliance with the user requirements and assesses whether it is acceptable for delivery. In the design step, the final UX design of the device is produced by 1) defining common guidelines that govern the UX design of each application, 2) selecting elements for layout and flow design, and 3) graphic design, which applies the graphic elements [67, 35].


Figure 1.1: Process of smartphone user experience design

However, in most companies, user experience design is performed heuristically by individual designers, which causes a few problems. The first problem is a lack of consistency. Because the tasks are performed by several individuals, their results vary from person to person, and additional steps are required to correct the inconsistencies. The second problem relates to resource management. These tasks need to be carried out by a domain expert with extensive background knowledge of the field. Not only are they highly time-consuming, but finding an adequate domain expert is also expensive.

The objective of this study is to effectively address the issues listed above by applying data driven approaches to user experience design. In detail, this study focuses on three research scopes within the overall user experience design process: customer-voice classification, user segmentation, and design elements selection, as illustrated in Figure 1.1.

With respect to customer-voice analysis, the classification of customer-voice data is performed manually, which causes a few problems. The first problem is a lack of consistency. Classification tasks are performed by several individuals, so results vary from person to person, and additional steps may be required to correct the inconsistencies. The second problem is the time-consuming nature of manual classification. In some cases, it may be necessary to respond to customer voice urgently, especially when the issue is related to quality assurance. The time consumed by the classification task delays customer-voice analysis, and thus the response, even when an immediate response is required. The last problem relates to resource management. Allocating human resources to a classification task unnecessarily may lead to a shortage of human resources for more important tasks, which hinders the optimization of human resource management.

Thus, this study focuses on building an automatic classifier for customer-voice data and proposes an advanced document representation method that is appropriate for such data. The customer-voice data used in this study were obtained from various channels, including phone calls, e-mails, and websites, and were stored as text documents. Customer-voice analysis therefore starts with document classification, which allows each document to be delivered to the relevant department and also provides overall information on the distribution of customer voice by function. In detail, this study proposes 1) a document de-noising method to clean the raw documents, 2) a probabilistic word clustering based document representation method that makes the document representation interpretable, and 3) a novel method that applies convolution filters to the document representation to increase classification performance.

In the user segmentation step, previous approaches based on demographics and reported usage have several limitations. First, they are inherently subjective and prone to skewing by observers and participants. Second, these studies were predominantly performed heuristically, with limited information, by persons who already had extensive domain knowledge and background information about the smartphone industry. Therefore, user segmentation based on previous studies is costly and time-consuming, because participants who report their usage data must be gathered and domain experts must be invited to analyze the reported usage [95, 27, 144].

Thus, this study proposes novel ways of segmenting smartphone users based on app usage sequences collected from real smartphone logs. Hundreds of applications are often installed on a user’s smartphone, and a log of application usage is a powerful resource for user segmentation because it contains meaningful information regarding the user’s preferences, behaviors, and interests. In detail, we propose two novel ways to segment users: 1) an approach based on a variant of the seq2seq architecture, and 2) an approach based on app clustering and relative similarity, which makes the user segmentation results interpretable.

Finally, with respect to element selection, most previous studies on developing evaluation or purchasing factors were performed heuristically by those who already had comprehensive domain knowledge and background information about the product industry, and they were expensive and time-consuming. These studies were mostly based on existing studies and focus group interviews with a few participants. Thus, they were prone to skew and could not capture the latest improvements in smartphone products, which are among the most rapidly changing devices in the industry. With respect to contents selection, help systems often provide the same help contents to all users without considering the individual user’s persona and characteristics. This causes users to question the effectiveness of the help system and results in reduced use of it.

Thus, this study deals with two subjects: 1) prioritization of product attributes with a Convolutional Neural Network (CNN) based aspect extraction method, and 2) a contents re-organization method based on a conditional Generative Adversarial Network (GAN). For product attribute prioritization, this study proposes an aspect extraction method that combines a Convolutional Neural Network with transfer learning. Additionally, we utilize an explainable neural network technique to calculate the relative importance of each product attribute. For contents re-organization, this study proposes a new method of re-organizing help contents that considers each user’s interests and preferences through their app usage sequence.

The remainder of this thesis is structured as follows: Chapter 2 discusses previous studies on each subject and the algorithms utilized herein; Chapter 3 proposes several algorithms for document classification; Chapter 4 presents two user segmentation methods; Chapter 5 addresses two subjects regarding design elements selection; and Chapter 6 provides the conclusions and discussion, as well as directions for future work.


Chapter 2

Literature Review

2.1 Traditional approaches for analysis of user experience design

2.1.1 Focus group discussion

A focus group discussion (FGD) gathers people with similar backgrounds or experiences to discuss a specific topic of interest. The group of participants is guided by a moderator or facilitator, who introduces topics for discussion and helps the group engage in a lively and natural discussion among themselves.

FGD is utilized in various UX design processes, such as user segmentation and ideation. Its strength lies in allowing the participants to agree or disagree with each other, so that it provides insight into how a group thinks about an issue, the range of opinions and ideas, and the inconsistencies and variation that exist in a particular community in terms of beliefs, experiences, and practices.

FGD can be used to explore the meaning of survey findings that cannot be explained statistically, to explore the range of opinions and views on a topic of interest, and to collect a wide variety of local terms. In bridging research and policy, FGD can provide insight into the different opinions of the parties involved in a change process, thus enabling the process to be managed more smoothly. It is also a good method to employ prior to designing questionnaires.

FGD sessions need to be prepared carefully by identifying the main objective of the meeting, developing key questions, developing an agenda, and planning how to record the session. The next step is to identify and invite suitable discussion participants; the ideal number is between six and eight.

The crucial element of FGD is facilitation. Important points to bear in mind when facilitating FGDs are ensuring even participation, wording the key questions carefully, maintaining a neutral attitude and appearance, and summarizing the session so that it reflects the opinions evenly and fairly. A detailed report should be prepared after the session is finished, and any observations made during the session should be noted and included in the report.

2.1.2 Personal interview

A personal interview survey, also called a face-to-face survey, is a survey method that is utilized when a specific target population is involved. The purpose of conducting a personal interview survey is to explore people’s responses in order to gather more and deeper information.

Personal interview surveys are used to probe the answers of the respondents and, at the same time, to observe their behavior, either individually or as a group. Researchers prefer the personal interview method for several advantages. However, before choosing this method, researchers should also consider the disadvantages of conducting personal interview surveys and understand the different types of personal or face-to-face surveys.

Like FGD, it is utilized in various steps of the UX design process. One of the main reasons why researchers achieve good response rates with this method is the face-to-face nature of the personal interview survey. Unlike with self-administered questionnaires, people are more likely to readily answer live questions about the subject simply because they can actually see, touch, feel, or even taste the product. If designers wish to probe the answers of the respondents, they may do so using a personal interview approach. Open-ended questions are better tolerated in interviews because respondents find it more convenient to express long answers orally than in writing.

2.1.3 Quantitative approaches

Various quantitative approaches are utilized in the UX design process, such as usability testing, A/B testing, eyetracking, and questionnaires. Although it is used less often, quantitative usability testing is much like qualitative usability testing: users are asked to perform realistic tasks using a product. The primary difference between the two is that qualitative usability testing prioritizes observations, such as identifying usability issues, whereas quantitative usability testing focuses on collecting metrics such as time on task or success. Once designers have collected those metrics with a relatively large sample size, they can use them to track the progress of the product’s usability over time or compare it to the usability of competitors’ products. The type of usability testing chosen (in-person, remote moderated, or remote unmoderated) affects the cost and difficulty associated with this method. Since the goals of quantitative and qualitative usability studies are different, the structure of the test and the tasks used need to be different as well.
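As a minimal illustration of the metrics mentioned above, the following Python sketch summarizes one usability task by its success rate and median time on task. The function name, the data layout, and the choice to report time for successful attempts only are assumptions made for this example, not a procedure taken from the studies reviewed here.

```python
import statistics


def summarize_task(results):
    """Summarize one usability task.

    results: list of (completed, seconds) pairs, one per participant.
    """
    if not results:
        raise ValueError("at least one participant result is required")
    success_rate = sum(1 for completed, _ in results if completed) / len(results)
    # Assumption of this sketch: time on task is reported for successful attempts only.
    times = [seconds for completed, seconds in results if completed]
    median_time = statistics.median(times) if times else None
    return {"success_rate": success_rate, "median_time_s": median_time}


# Hypothetical measurements for four participants.
sample = [(True, 30.0), (True, 50.0), (False, 120.0), (True, 40.0)]
summary = summarize_task(sample)
print(summary)
```

Tracking such a summary across product versions is one simple way to compare usability over time, provided that the task and the participant pool stay comparable.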


While designers can use analytics metrics to monitor a product’s performance, they can also create experiments that detect how different UI designs change those metrics, through either A/B testing or multivariate testing. In A/B testing, teams create two different live versions of the same UI and show each version to different users to see which version performs best. Multivariate testing is similar but involves testing several design elements at once. For example, the test could involve different button labels, typography, and placement on the page. Both of these analytics-based experiments are well suited to deciding among different variations of the same design, and they can put an end to team disputes about which version is best. A major downside of this methodology is that it is often abused: some teams fail to run the tests as long as they should and make risky decisions based on small numbers.
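The risk of deciding on small numbers can be made concrete with a standard two-proportion z-test. The sketch below is illustrative only: the function name and the conversion counts are hypothetical, and the pooled z-test is one common choice among several possible analyses of an A/B experiment.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical data: the same observed rates (10% vs. 15%) at two sample sizes.
small = two_proportion_z_test(10, 100, 15, 100)
large = two_proportion_z_test(1000, 10000, 1500, 10000)
print(small, large)
```

With 100 users per version the difference is not statistically distinguishable from noise at the usual 5% level, whereas the same rates observed on 10,000 users per version are clearly significant, which is why stopping a test early can lead to risky decisions.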

Eyetracking studies require special equipment that tracks users’ eyes as they move across an interface. When many participants perform the same task on the same interface, meaningful trends start to emerge, and designers can tell, with some reliability, which elements of the page will attract people’s attention. Eyetracking can help them identify which interface and content elements need to be emphasized or de-emphasized to enable users to reach their goals. A major obstacle to running eyetracking studies is the highly specialized, prohibitively expensive, and somewhat unstable equipment, which requires extensive training to use.


2.2 Related studies on document classification

2.2.1 Document classification method

Document representation is a key step in the document classification problem. This section reviews the major document representation methods. Many text and sentiment classifiers are still based solely on the sets of words contained in documents, as in the bag-of-words or bag-of-n-grams approaches, and do not consider sentence and discourse structure or meaning. Such approaches are straightforward and provide an intuitive interpretation. However, they are limited when a large number of documents is involved, because the resulting representations can be too high-dimensional and sparse to measure the proximity between documents reliably [68, 142].
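The dimensionality and sparsity issue can be seen even on a toy corpus. The following sketch builds plain bag-of-words count vectors and reports the fraction of zero entries; the three example sentences and the helper names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def bag_of_words(docs):
    """Return the vocabulary and one count vector per document."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for word, count in Counter(doc.lower().split()).items():
            vec[index[word]] = count
        vectors.append(vec)
    return vocab, vectors


def sparsity(vectors):
    """Fraction of entries that are zero across all document vectors."""
    total = sum(len(vec) for vec in vectors)
    zeros = sum(1 for vec in vectors for value in vec if value == 0)
    return zeros / total


# Hypothetical customer-voice snippets.
docs = ["battery drains fast", "screen is too dim", "battery life is short"]
vocab, vectors = bag_of_words(docs)
print(len(vocab), round(sparsity(vectors), 3))
```

Every new word adds a dimension, so the vocabulary grows with the corpus while each document still uses only a handful of words; on realistic corpora almost all entries are zero.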

Latent Semantic Analysis (LSA) [30], probabilistic Latent Semantic Analysis (pLSA) [15], and a more comprehensive method based on Latent Dirichlet Allocation (LDA) [9] were suggested to reduce dimensionality and select more discriminative features. However, these techniques can lose the innate interpretability of the bag-of-words representation and suffer from a few disadvantages because they are still based on word co-occurrences. They ignore the semantic relevance among words and, although to a lesser extent than the bag-of-words method, still fail to consider context information. Furthermore, the inference process is highly sensitive to initial conditions, especially for LDA-based models.

Additionally, word2vec, one of the neural embedding approaches, is based on the distributional hypothesis, which states that words occurring in similar contexts tend to have similar meanings [46]. Based on this assumption, word2vec uses a neural network model such as skip-gram, which predicts the neighboring words of an input word, or continuous bag-of-words (CBOW), which predicts a word from its neighbors [70, 85]. The most important property of word2vec is that words with similar meanings are located close to each other in the vector space. The word2vec model can be used to construct dense document vectors of reasonable dimension, in contrast to the bag-of-words approach, in which the dimensionality and sparsity of a document vector can increase significantly. Various document representation methods based on the word2vec model have been suggested. Even a simple representation, in which the vectors of the words contained in a document are averaged, shows good representation performance [142]. A promising representation method based on the word2vec model is the doc2vec model, which utilizes contextual information of words and documents to represent a document.
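Two of the ideas above can be sketched in a few lines of Python: the (input word, neighboring word) pairs that a skip-gram model is trained to predict, and the simple averaged-word-vector document representation. No model is trained here; the two-dimensional embeddings are made-up values, and all names are hypothetical.

```python
def skip_gram_pairs(tokens, window=1):
    """List the (center, context) pairs a skip-gram model learns to predict."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def average_word_vectors(tokens, embeddings):
    """Represent a document as the mean of its known word vectors."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        raise ValueError("no token has an embedding")
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


pairs = skip_gram_pairs(["battery", "drains", "fast"], window=1)

# Made-up 2-D embeddings standing in for trained word2vec vectors.
toy_embeddings = {"battery": [1.0, 0.0], "charge": [0.8, 0.2], "screen": [0.0, 1.0]}
doc_vector = average_word_vectors(["battery", "charge", "unknown"], toy_embeddings)
print(pairs, doc_vector)
```

Unlike a bag-of-words vector, the averaged vector has a fixed, small dimension regardless of vocabulary size; words without an embedding are simply skipped in this sketch.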

2.2.2 Word-clustering based document representation method

This section reviews the major document representation methods based on word clustering. The bag-of-concepts approach, one of the word-clustering based document representation methods, combines the advantages of the previous approaches. Semantically similar terms are grouped into a common concept by clustering the word vectors generated from a neural embedding architecture, so that the impact of semantically similar words is incorporated and document proximity is preserved. Document vectors are subsequently represented by the frequencies of these concepts [61]. Similarly, Paniagua et al. utilized word vectors and word clusters generated by a neural embedding architecture and added the word-clustering result to the feature set of documents [127].
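The concept-frequency representation can be sketched as follows. The sketch assumes that the word-to-concept assignment has already been produced by clustering word embeddings; the mapping below is invented by hand for illustration, and the weighting schemes used in the cited studies are omitted.

```python
from collections import Counter


def bag_of_concepts(tokens, word_to_concept, n_concepts):
    """Count how often each concept (word cluster) occurs in a document."""
    counts = Counter(word_to_concept[t] for t in tokens if t in word_to_concept)
    return [counts.get(concept, 0) for concept in range(n_concepts)]


# Hypothetical clustering result: 0 = power, 1 = display, 2 = camera.
word_to_concept = {
    "battery": 0, "charger": 0, "power": 0,
    "screen": 1, "display": 1,
    "camera": 2,
}

doc_a = "battery charger screen".split()
doc_b = "power power display".split()
vec_a = bag_of_concepts(doc_a, word_to_concept, n_concepts=3)
vec_b = bag_of_concepts(doc_b, word_to_concept, n_concepts=3)
print(vec_a, vec_b)
```

The two documents share no words, so their bag-of-words vectors would be orthogonal, yet they receive the same concept vector. This is the sense in which semantically similar words help preserve document proximity, and each dimension remains interpretable as a named group of words.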

In Mitrofanova et al., a set of key words describing the major topics of the plot is assigned to each text, and clusters of words with similar distributions are created for each key word based on a word vector model utilizing a co-occurrence matrix [86, 112]. Moreover, Saha et al. constructed a word-clustering based cosine similarity measure for the named entity recognition task [111], and Bekkerman et al. directly compared the simple bag-of-words approach with a word-clustering based document representation approach to demonstrate the effectiveness of the latter.

2.2.3 Novelty detection in the textual domain

Novelty detection can be defined as the task of recognizing that data differ in some respect from the data considered normal. Novelty detection methods are commonly classified into five categories: probabilistic approaches, distance/density based approaches, reconstruction based approaches, domain based approaches, and information theoretic techniques. Of these, the probabilistic and distance/density based approaches are the most commonly used [99, 122]. The probabilistic approach uses probability density estimation and assumes that low-density areas have a low probability of containing normal data. The distance/density based approach assumes that normal data are tightly clustered and located close to each other, in contrast to novel data. This study improves on these novelty detection methods by combining a Gaussian mixture model, which is a probabilistic approach, with k-means clustering, which is a distance/density based approach.

Novelty detection in the textual domain aims to detect novel documents, sentences,
words, or interesting topics. Many novelty detection methods have been proposed in the
textual domain, and these studies apply various methods including the statistical
approach, mixture of models approach, neural network based approach, support vector
machine based approach, and clustering based approach [3, 124, 147, 6, 79, 80].
However, these studies focused on novelty detection at the document or sentence level,
mainly because various features, such as word frequency, frequent POS list, and average
length, can be easily extracted from a document or sentence [42, 41, 43]. Meanwhile,
novelty detection studies at the word level are mostly based on a dictionary or a corpus
only, owing to the lack of suitable methods to represent words in a vector space [47, 17].

2.3 Related studies on user segmentation

According to Kotler, user segmentation refers to the classification of users into groups

depending on their characteristics and behaviors in order to identify those who may

require separate products [66]. User segmentation has also been identified as a key

element of product development. With user segmentation, product developers can

develop differentiated and personalized products for each segment, and marketing

personnel can create segmented advertisements and marketing communications for

each segment [25].

As mentioned earlier, many studies have focused on mobile internet services

based on their usage pattern. Cheng and Sun used messages, entertainment, and

micro-payment services to segment users with an improved segmentation model,

which is called the TFM (time, frequency, money) model [18]. Wu and Chou devel-

oped a soft clustering method that uses a latent mixed-class membership clustering

approach to classify online users based on their purchasing data across categories.

Bose and Chen selected internet usage, revenue, services, and user categories as re-

search indicators that were employed to cluster users [12]. Shafig et al. provided

a fine-grained characterization of the geospatial dynamics of application usage in

cellular networks [118].

However, these studies focus on the sequential pattern of mobile internet service

usage, which is only one aspect of the entire smartphone usage, so the clustering

result does not fully reflect the various smartphone usage behaviors.

Several studies have tried to collect additional data sources and consider the ef-

fects of other aspects on users’ smartphone behavior, unlike previous mobile internet

service-based methods. Uronen, Falaki, and Lin obtained mobile usage data using

call detail records collected by an operator, and segmented users using those voice

call usage data [134, 31, 74]. Walsh and Plaza utilized demographics: the former shows
that younger users are most likely to be extensively involved with their mobile phone
[137], and the latter finds that elderly people utilize mobile phones primarily to
communicate with relatives, as memory and daily-life aids, for enjoyment, for
self-actualization, and as tools to feel safe and secure [100].

In addition, Sell, Tao, Bouwman, and de Reuver and Bouwman utilized psychological
information by combining it with demographic and behavioral segmentation, and they
found that each group has different motivations and product attributes [116, 130]. In
particular, Bouwman presents a psychographic segmentation that is based on sociological
factors, to understand how people deal with their social lives, and on psychological
factors of the person [87]. De Reuver and Bouwman

found that each segment moderates the effect on the context-use of mobile phones

towards a user’s intention to use products and services [28].

The smartphone industry stands to benefit from user segmentation more than

other industries because of the following reasons: 1) smartphones have the capability

to collect and store various types of information, 2) several hundreds of applications

are often installed on a user’s smartphone, and 3) a log of their application usage is a

powerful resource for user segmentation because it contains meaningful information

regarding the user’s preferences, behavioral patterns, and interests. However, these

studies were mainly based on reported usage and limited sources, such as voice calls

and data usage.

One recent study utilized the smart log data that is stored in each device to

segment users in objective and quantitative ways [45]. It utilized the average number

of calls and messages, average amount of data used, average number of URLs visited,

and the average number of applications that are installed and run daily. The use of

smartphone log data to segment users is meaningful, but it is also limited in terms

of its ability to use data from the apps that are used by each user as well as the

sequence in which the apps are used, even if app usage sequences are key elements

for effective user segmentation, as mentioned earlier.

2.4 Related studies on product attributes prioritization

The previous works on aspect extraction are categorized into supervised and unsupervised
approaches. Our discussion here focuses on supervised approaches, which are utilized in
our method. Supervised learning methods are mostly based on standard sequence labeling
approaches, such as the Conditional Random Field (CRF) and the Hidden Markov Model
(HMM). Huang et al. treated product feature extraction as a sequence labeling task and
employed a discriminative learning model using CRF [49]. In comparison, Choi et al.
applied a hierarchical parameter sharing technique using CRF for fine-grained opinion
analysis, jointly detecting the boundaries of the opinion expressions [21]. Moreover,
Yang et al. proposed a joint inference model that leveraged knowledge from predictors
optimizing the subtasks of an opinion [145], and many other studies are also based on
HMM [53, 73, 133].

Meanwhile, Jin et al. extracted highly specific product-related entities based on

lexicalized HMMs [58]. Furthermore, a few domain-knowledge-based methods [139,

52] have been utilized in supervised approaches. CNN-based approaches [101] have

recently been suggested, and they show state-of-the-art performance compared to

those used in the previous studies. The authors of that study utilized Amazon em-

beddings for word representation and constructed a seven-layer CNN architecture.

The present study basically utilizes this CNN structure in the first phase and in-

troduces variations to address the limitations of the previous study considering the

rapidly changing smartphone industry.

However, previous studies mostly focused only on the extraction of aspects and

not on the relative importance of the extracted aspects. Although a few recent studies

deal with the relative importance of the aspects [8], they are intuitively based on

the frequency of each aspect in the textual review. Thus, we focus on deriving the

relative importance of the extracted aspects utilizing an explainable neural network.

2.5 Related studies on help system improvements

2.5.1 Help system user interface

Several studies on help systems have been conducted during the past decade. As
previously stated, however, those studies mainly focused on the design aspect or on
common guidelines for usability, and not on the content organization problem that
considers users’ specifications.

In the design aspect, those studies focused on graphical user interface (GUI) to

ensure that users did not find it difficult to locate information, or find it confusing,

time-consuming, or frustrating [1]. Baker et al. provided tips and practical advice

for using colors, such as avoiding reserved colors for on-line help systems [7]. Al-

berts and Geest also recommended using a maximum of three colors in on-line help

documentation, and argued for functional use of colors [2].

In the usability aspect, most studies focus on providing general guidelines or

tips for designing help systems [89]. Ellison et al. proposed 7 golden rules of on-line

help design, and Crane et al. presented 12 techniques for improving on-line help.

Moreover, Roy et al. proposed a guide for appropriately choosing and designing

task support tools based on tasks and characteristics of help tools [108], and Corbin

et al. presented the design attributes of on-line help systems in a series of design

checklists [24].

2.5.2 User specification

As previously mentioned, item recommendation studies considering users’ characteristics
and preferences in the on-line commerce field are primarily based on collaborative
filtering. In the smartphone industry, however, these approaches are not appropriate
because the required information, such as users’ purchasing history or metadata, is not
sufficiently available for smartphone users.

Thus, previous user specification studies in the context of the smartphone have
typically been based on demographics and reported usage, which are inherently subjective
and prone to be skewed by the observers and participants. Furthermore, those studies
were predominantly performed by domain experts who already have comprehensive domain
knowledge and background information regarding the smartphone industry.

These can further be classified into several types as follows: (1) geographic seg-

mentation based on dividing the market into different geographical areas, such as

nations, regions, and cities; (2) demographic segmentation based on age, gender,

family size, etc.; (3) psychographic segmentation based on social class, lifestyle,

and/or personality characteristics; and (4) behavior segmentation based on occasion

segmentation, benefit segmentation, service usage, and intention to use [115, 22].

Thus, this study uses the app usage sequences collected from each user, which are the
most meaningful and interesting source for effectively identifying a user’s preferences
and characteristics [51].

2.6 Review on related architecture

2.6.1 Probabilistic clustering method

The studies mentioned in the previous section, however, utilized hard clustering

methods such as K-means, K-medoids, or spherical K-means clustering and did not

consider the membership strength of each word with respect to each cluster. There-

fore, in the present study, an advanced document representation method utilizing

neural embedding architecture based on the probabilistic clustering method was pro-

posed to capture the membership strength of each word. The utilized probabilistic

clustering method included the fuzzy C-means (FCM) clustering method [55] and

the Gaussian mixture model (GMM) clustering method [33].

The FCM algorithm attempts to partition a finite collection of $n$ elements
$X = \{x_1, ..., x_n\}$ into a collection of $c$ fuzzy clusters with respect to a
specified criterion. Given a finite set of data, the algorithm returns a list of $c$
cluster centers $C = \{c_1, ..., c_c\}$ and a partition matrix $W = (w_{ij})$,
$w_{ij} \in [0, 1]$, $i = 1, ..., n$, $j = 1, ..., c$, where each element $w_{ij}$
specifies the degree to which element $x_i$ belongs to cluster $c_j$. The FCM algorithm
aims to minimize an objective function as follows:

$$\underset{C}{\arg\min} \sum_{i=1}^{n} \sum_{j=1}^{c} w_{ij}^{m} \, \mathrm{dist}^{2}(x_i, c_j)$$

where

$$w_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{\mathrm{dist}(x_i, c_j)}{\mathrm{dist}(x_i, c_k)} \right)^{\frac{2}{m-1}}}$$

and $m > 1$ is the fuzzifier that controls how soft the cluster assignments are.
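As a minimal sketch of the membership formula above, the following NumPy code computes the partition matrix W for fixed cluster centers and evaluates the objective. The function names `fcm_memberships` and `fcm_objective`, the Euclidean choice for dist, the fuzzifier value m = 2, and the toy points are assumptions made for illustration only; they are not taken from the experiments in this thesis.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Return the n x c partition matrix W for fixed cluster centers."""
    # Pairwise Euclidean distances dist(x_i, c_j), shape (n, c).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, eps)  # guard against a point lying exactly on a center
    # ratio[i, j, k] = dist(x_i, c_j) / dist(x_i, c_k)
    ratio = d[:, :, None] / d[:, None, :]
    return 1.0 / np.sum(ratio ** (2.0 / (m - 1.0)), axis=2)

def fcm_objective(X, centers, W, m=2.0):
    """Sum over i, j of w_ij^m * dist^2(x_i, c_j)."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return float(np.sum((W ** m) * d2))

# Toy data (hypothetical): two tight groups and one point in between.
X = np.array([[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.9], [2.0, 2.0]])
centers = np.array([[0.1, 0.05], [4.05, 3.95]])
W = fcm_memberships(X, centers)
print(np.round(W, 3))
print(fcm_objective(X, centers, W))
```

Each row of W sums to one, and the in-between point receives a membership close to one half for both clusters, which is the membership strength that hard clustering discards.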

A GMM is a parametric probability density function that is represented as the weighted
sum of Gaussian component densities. In a multivariate distribution, $p(x|\theta)$ is
defined as a finite mixture model with $J$ components, and each component is a
multivariate Gaussian density defined with parameters $\theta_j = \{\mu_j, \Sigma_j\}$
as follows:

$$p(x|\theta) = \sum_{j=1}^{J} \alpha_j \, p_j(x|z_j, \theta_j),$$

$$p_j(x|\theta_j) = \frac{1}{(2\pi)^{d/2} |\Sigma_j|^{1/2}} \exp\left( -\frac{1}{2} (x-\mu_j)^{T} \Sigma_j^{-1} (x-\mu_j) \right)$$

where $d$ is the dimension of $x$, $\alpha_j = p(z_j)$ denotes the mixture weight
representing the probability that a randomly selected $x$ was generated by component
$j$, and $\sum_{j=1}^{J} \alpha_j = 1$. After each parameter is calculated using the
expectation-maximization (EM) algorithm, the membership weight of data point $x_i$ is
computed as follows:

$$w_{ij} = p(z_{ij} = 1 | x_i, \theta) = \frac{p_j(x_i|z_j, \theta_j) \cdot \alpha_j}{\sum_{m=1}^{J} p_m(x_i|z_m, \theta_m) \cdot \alpha_m}$$
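The membership weight above can be illustrated with a short NumPy sketch. It assumes that the mixture weights, means, and covariances have already been estimated (for example by EM); the function names and the toy parameters are hypothetical and only serve to show how the posterior membership of each point is obtained.

```python
import numpy as np

def gaussian_pdf(X, mu, cov):
    """Multivariate Gaussian density evaluated at every row of X."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(cov)
    quad = np.einsum("ni,ij,nj->n", diff, inv, diff)  # (x - mu)^T cov^-1 (x - mu)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def gmm_memberships(X, alphas, mus, covs):
    """w_ij = alpha_j p_j(x_i) / sum_m alpha_m p_m(x_i), shape (n, J)."""
    weighted = np.stack(
        [a * gaussian_pdf(X, mu, cov) for a, mu, cov in zip(alphas, mus, covs)],
        axis=1,
    )
    return weighted / weighted.sum(axis=1, keepdims=True)

# Hypothetical two-component mixture with identity covariances.
alphas = np.array([0.5, 0.5])
mus = np.array([[0.0, 0.0], [4.0, 4.0]])
covs = np.array([np.eye(2), np.eye(2)])
X = np.array([[0.0, 0.0], [4.0, 4.0], [2.0, 2.0]])
memberships = gmm_memberships(X, alphas, mus, covs)
print(np.round(memberships, 3))
```

Points at a component mean are assigned almost entirely to that component, while the midpoint is shared equally between both.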

2.6.2 Neural embedding architecture

As mentioned earlier, the neural embedding architecture is based on the distributional
hypothesis, which implies that words occurring in similar contexts tend to have similar
meanings [46]. Based on this assumption, word2vec, which is one of the neural embedding
architectures, uses a neural network model, such as skip-gram or continuous bag of words
(CBOW), that predicts the neighboring words of input words [70, 85]. The neural network
model in that particular architecture is first trained with respect to the optimization
function

$$\frac{1}{T} \sum_{t=k}^{T-k} \log p(\omega_t | \omega_{t-k}, ..., \omega_{t+k})$$

in CBOW, or

$$\frac{1}{T} \sum_{t=k}^{T-k} \log p(\omega_{t-k}, ..., \omega_{t+k} | \omega_t)$$

in skip-gram, where $T$ denotes the number of words and $k$ denotes the window size of
the neighboring words. The hidden nodes can then be used as representations of the words
$\omega_t$. The most important aspect of word2vec is that words with similar meanings
are located close to each other in the vector space.

A class vector is trained from a neural network similar to the simple neural embedding
model. Sachan and Kumar suggested an architecture to embed word vectors in conjunction
with a class vector by incorporating both into a neural network [110]. In a manner
similar to the simple neural embedding model, the neural network model is trained with
the optimization function

$$\sum_{i=1}^{V} \log p(w_i | w_{context}) + \sum_{j=1}^{k} \sum_{i=1}^{V} \log p(w_i | c_j)$$

where $V$ denotes the number of words and $k$ denotes the number of classes. Training
the class vectors $c_j$ together with the word vectors $w_i$ leads to class vectors that
have high cosine similarity with words that discriminate between classes. For instance,
with respect to the IMDB dataset, there are two classes of words, namely positive words
and negative words. Negative words, such as ‘awful’, are located close to the negative
class vector, while positive words, such as ‘wonderful’ or ‘lovely’, are located close
to the positive class vector [97].

2.6.3 Variational auto-encoder and Neural variational document

model

VAE is a directed model that uses learned approximate inference and can be trained

purely with gradient-based methods. To generate a sample from the model, the

VAE first draws a sample z from the code distribution pmodel(z). The sample is

then run through a differentiable generator network g(z). Finally, x is sampled from

a distribution pmodel(x; g(z)) = pmodel(x|z). During the training, the approximate

inference network (or encoder) q(z|x) is used to obtain z, and pmodel(x|z) is then

viewed as a decoder network. It is then trained by maximizing the variational lower

bound L(q) with data point x:

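$$\begin{aligned}
L(q) &= \mathbb{E}_{z \sim q(z|x)} \log p_{model}(z, x) + H(q(z|x)) && (2.1) \\
&= \mathbb{E}_{z \sim q(z|x)} \log p_{model}(x|z) - D_{KL}(q(z|x) \,\|\, p_{model}(z)) && (2.2) \\
&\le \log p_{model}(x) && (2.3)
\end{aligned}$$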

The VAE usually has Gaussian distribution for pmodel(x; g(z)) and maximizing a

lower bound on the likelihood of such a distribution is similar to training a traditional

auto-encoder [64, 105, 71].
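To make the lower bound in Eq. (2.2) concrete, the sketch below estimates it by Monte Carlo for a toy linear-Gaussian model, for which the exact value of log p(x) is available in closed form. Everything here is an illustrative assumption: a diagonal Gaussian encoder, a standard normal code distribution, a unit-variance Gaussian decoder, the hypothetical linear generator `decoder`, and the function names themselves.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form D_KL(q || p) for q = N(mu, diag(exp(log_var))) and p = N(0, I)."""
    return 0.5 * float(np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var))

def gaussian_log_likelihood(x, mean):
    """log N(x; mean, I); `mean` may hold many decoder outputs (last axis = features)."""
    d = x.shape[-1]
    return -0.5 * np.sum((x - mean) ** 2, axis=-1) - 0.5 * d * np.log(2 * np.pi)

def elbo_estimate(x, mu, log_var, decoder, n_samples=2000, seed=0):
    """Monte Carlo estimate of Eq. (2.2) with reparameterized samples z ~ q(z|x)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples, mu.size))
    z = mu + np.exp(0.5 * log_var) * eps
    reconstruction = float(np.mean(gaussian_log_likelihood(x, decoder(z))))
    return reconstruction - kl_to_standard_normal(mu, log_var)

# Hypothetical linear generator g(z) = G z with G = I.
G = np.eye(2)

def decoder(z):
    return z @ G.T

x = np.array([1.0, -1.0])
# A deliberately poor encoder (equal to the prior), so the bound is visibly loose.
elbo = elbo_estimate(x, mu=np.zeros(2), log_var=np.zeros(2), decoder=decoder)

# For this toy model the exact marginal is x ~ N(0, G G^T + I).
cov = G @ G.T + np.eye(2)
log_px = -0.5 * x @ np.linalg.solve(cov, x) - 0.5 * np.log(np.linalg.det(2 * np.pi * cov))
print(elbo, log_px)
```

The gap between the two printed values corresponds to the divergence between the approximate and the true posterior, which is why a better encoder tightens the bound.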

The neural variational document model (NVDM) utilizes this VAE framework to derive
document representations [84]. In this process, word representations are also derived
from the model. In detail, an encoder network $q(z|x)$ compresses the document
representation into a hidden vector $z$, and a softmax decoder
$p(x|z) = \prod_{i=1}^{N} p(x_i|z)$ reconstructs the document by independently
generating the words, where $N$ is the number of words in the document. Similar to the
VAE, the NVDM is trained by maximizing the variational lower bound:

$$L(q) = \mathbb{E}_{q(z|x)} \left[ \sum_{i=1}^{N} \log p_{model}(x_i|z) \right] - D_{KL}\left[ q(z|x) \,\|\, p(z) \right] \qquad (2.4)$$

In addition, the conditional probability over words $p(x_i|z)$ is modeled by multinomial
logistic regression and shared across documents:

$$p(x_i|z) = \frac{\exp(-E(x_i; z))}{\sum_{j=1}^{|V|} \exp(-E(x_j; z))} \qquad (2.5)$$

$$E(x_i; z) = -z^{T} R x_i - b_{x_i} \qquad (2.6)$$

where $R \in \mathbb{R}^{K \times |V|}$ is the word representation matrix derived from
the VAE architecture and $b_{x_i}$ is the bias term of word $x_i$.
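The softmax decoder of Eqs. (2.5) and (2.6) can be sketched in a few lines of NumPy. The sketch assumes the usual sign convention in which the word probability is proportional to exp(-E), so that the logits are simply z^T R + b; the function names, the random matrix R, and the toy document are hypothetical placeholders rather than trained quantities.

```python
import numpy as np

def word_distribution(z, R, b):
    """p(x_i | z) for every word i of the vocabulary (softmax over -E)."""
    neg_energy = z @ R + b                        # -E(x_i; z) for one-hot x_i
    neg_energy = neg_energy - neg_energy.max()    # shift for numerical stability
    p = np.exp(neg_energy)
    return p / p.sum()

def document_log_likelihood(word_ids, z, R, b):
    """sum_i log p(x_i | z): words are generated independently given z."""
    log_p = np.log(word_distribution(z, R, b))
    return float(log_p[word_ids].sum())

K, V = 2, 4                                       # latent size and vocabulary size (toy values)
rng = np.random.default_rng(0)
R = rng.normal(size=(K, V))                       # column i is the representation of word i
b = np.zeros(V)
z = np.array([0.5, -1.0])
doc = np.array([0, 2, 2, 3])                      # a document as a list of word indices

p = word_distribution(z, R, b)
ll = document_log_likelihood(doc, z, R, b)
print(np.round(p, 3), ll)
```

Because R is shared across documents, its columns can be read off as word representations once the model has been trained.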

2.6.4 t-distributed stochastic neighbor embedding (t-SNE)

t-SNE [77] is a nonlinear dimensionality reduction technique that is particularly

well-suited for embedding high-dimensional data into a space of low dimensions

while preserving the distance between data points. Specifically, it models each high-

dimensional object by low-dimensional point in such a way that similar objects are

modeled by nearby points and dissimilar objects are modeled by distant points.

The t-SNE algorithm comprises two main stages. First, t-SNE constructs a prob-

ability distribution over pairs of high-dimensional objects in such a way that similar

objects have a high probability of being picked, while dissimilar points have an

extremely small probability of being picked. Second, t-SNE defines a similar prob-

ability distribution over the points in the low-dimensional map, and it minimizes

the Kullback–Leibler divergence between the two distributions with respect to the

locations of the points in the map.
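The two stages can be sketched as follows. This is a simplified illustration, not a full t-SNE implementation: it assumes a single fixed Gaussian bandwidth `sigma` instead of the per-point perplexity search, and it only evaluates the Kullback-Leibler objective for two candidate maps rather than minimizing it by gradient descent. All names and toy coordinates are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=2)

def high_dim_affinities(X, sigma=1.0):
    """Stage 1: Gaussian-based joint probabilities p_ij in the original space."""
    P = np.exp(-pairwise_sq_dists(X) / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum(axis=1, keepdims=True)        # conditional p_{j|i}
    return (P + P.T) / (2 * X.shape[0])      # symmetrized joint p_ij

def low_dim_affinities(Y):
    """Stage 2: Student-t based joint probabilities q_ij in the map."""
    Q = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()

def kl_divergence(P, Q, eps=1e-12):
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

# Two well-separated pairs of points in three dimensions (toy data).
X = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]])
Y_good = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])  # keeps neighbors together
Y_bad = Y_good[[0, 2, 1, 3]]                                          # separates the neighbors

P = high_dim_affinities(X)
kl_good = kl_divergence(P, low_dim_affinities(Y_good))
kl_bad = kl_divergence(P, low_dim_affinities(Y_bad))
print(kl_good, kl_bad)
```

The map that keeps similar objects close yields the smaller divergence, which is the quantity that t-SNE drives down when it updates the point locations.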

2.6.5 Seq2seq architecture

This study proposes herein variants to the previously established seq2seq architec-

ture to represent each app usage sequence in vector space. The seq2seq architecture

is based on recurrent neural networks (RNN), which is a family of neural networks for

processing sequential data [109]. The RNN creates an internal state of the network,

which allows it to exhibit dynamic temporal behavior. Unlike feedforward neural

networks, RNNs can use their internal memory to process arbitrary sequences of

inputs [39].

The seq2seq architecture was first proposed by Cho et al. (2014) and Sutskever et al.
(2014), as illustrated in Figure 2.1 [19, 128]. An encoder or input RNN processes the
input sequence, and the encoder emits the context $C$, usually as a simple function of
its final hidden state. A decoder or output RNN is conditioned on that fixed-length
vector to generate an output sequence. In the seq2seq architecture, the two RNNs are
jointly trained to maximize the average of
$\log P(y^{(1)}, ..., y^{(n_y)} | x^{(1)}, ..., x^{(n_x)})$ over all the pairs of x and
y sequences in the training set.

Figure 2.1: Original seq2seq architecture

2.6.6 Louvain method

The Louvain method is a network-clustering algorithm that optimizes the modularity

to detect nodes that are more densely connected [11]. This technique is a greedy

optimization method that does not always assure a globally optimal result; however,

the method’s time complexity is O(n log n). The modularity function to be optimized

in the Louvain method is presented as follows:

$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

where $m$ represents the sum of the weights of all of the edges in the graph, $A_{ij}$
denotes the edge weight between nodes $i$ and $j$, $k_i$ and $k_j$ are the sums of all
edge weights connected to nodes $i$ and $j$, respectively, $c_i$ and $c_j$ represent the
communities of the given nodes $i$ and $j$, respectively, and $\delta$ denotes the
Kronecker delta function, which equals 1 if $c_i = c_j$ and 0 otherwise.
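The modularity function can be evaluated directly from a weighted adjacency matrix. The sketch below assumes an undirected graph stored as a symmetric NumPy matrix, so that the sum of all matrix entries equals 2m; the function name and the toy graph of two triangles joined by a single edge are illustrative choices.

```python
import numpy as np

def modularity(A, communities):
    """Q = (1 / 2m) * sum_ij [A_ij - k_i k_j / 2m] * delta(c_i, c_j)."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                      # weighted degree of every node
    two_m = A.sum()                        # equals 2m for a symmetric matrix
    c = np.asarray(communities)
    same = c[:, None] == c[None, :]        # delta(c_i, c_j) as a boolean mask
    return float(np.sum((A - np.outer(k, k) / two_m) * same) / two_m)

# Two triangles (0-1-2 and 3-4-5) connected by the edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

q_triangles = modularity(A, [0, 0, 0, 1, 1, 1])
q_single = modularity(A, [0, 0, 0, 0, 0, 0])
print(q_triangles, q_single)
```

Grouping each triangle into its own community gives a clearly positive modularity, whereas placing all nodes in one community gives zero.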

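The Louvain method consists of two phases that are iterated to optimize the modularity
and detect communities accordingly. In the first step, each node is initially assigned
to its own small community. Each node $i$ is then removed from its own community and
tentatively transferred to the community of each of its neighbors $j$. The change in
modularity is then calculated and denoted as $\Delta Q$:

$$\Delta Q = \left[ \frac{\Sigma_{in} + k_{i,in}}{2m} - \left( \frac{\Sigma_{tot} + k_i}{2m} \right)^{2} \right] - \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^{2} - \left( \frac{k_i}{2m} \right)^{2} \right]$$

where $\Sigma_{in}$ is the sum of the weights of the links inside the target community,
$\Sigma_{tot}$ is the sum of the weights of the links incident to nodes in that
community, and $k_{i,in}$ is the sum of the weights of the links from $i$ to nodes in
that community. Once $\Delta Q$ has been calculated for all communities connected to
$i$, node $i$ is moved to the community for which the modularity increases the most.
The above-mentioned steps are repeated until the value of $\Delta Q$ can no longer be
improved.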

For the second step, each community that is formed in the first step is expressed

as a node upon the completion of the first step. The links within the same commu-

nity are expressed as self-loops, while those between different community nodes are

expressed as weighted edges. The first step is then executed on the newly constructed

networks.
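Both phases can be illustrated on a small graph. In the sketch below, the gain of a move is obtained by recomputing the modularity before and after the move rather than through the closed-form expression for the change, which is slower but easy to verify. The aggregation step assumes that a self-loop stores the internal edge weight of a community counted in both directions; the function names and the toy graph are hypothetical.

```python
import numpy as np

def modularity(A, communities):
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    two_m = A.sum()
    c = np.asarray(communities)
    same = c[:, None] == c[None, :]
    return float(np.sum((A - np.outer(k, k) / two_m) * same) / two_m)

def move_gain(A, communities, node, target):
    """First phase: change in modularity when `node` joins community `target`."""
    moved = list(communities)
    moved[node] = target
    return modularity(A, moved) - modularity(A, communities)

def aggregate(A, communities):
    """Second phase: one super-node per community, internal links become self-loops."""
    c = np.asarray(communities)
    labels = np.unique(c)
    S = (c[:, None] == labels[None, :]).astype(float)   # node-to-community indicator
    return S.T @ np.asarray(A, dtype=float) @ S

# Two triangles (0-1-2 and 3-4-5) connected by the edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

singletons = [0, 1, 2, 3, 4, 5]
gain_join = move_gain(A, singletons, node=0, target=1)    # node 0 joins its neighbor 1
partition = [0, 0, 0, 1, 1, 1]
gain_leave = move_gain(A, partition, node=2, target=1)    # node 2 leaves its triangle
B = aggregate(A, partition)
print(gain_join, gain_leave)
print(B)
```

Joining a neighbor from the singleton start increases the modularity, moving a node out of its triangle decreases it, and the aggregated network preserves the modularity of the partition it was built from.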

2.6.7 Explainable machine learning algorithms

An explainable algorithm concept is proposed to explain how machine learning algo-

rithms arrive at a specific decision in contrast with the black-box characteristic of the

existing machine learning algorithms. In this study, we introduce variations to Grad

CAM (Gradient-weighted Class Activation Mapping) [117], one of the explainable

machine learning algorithms derived for image classification, to calculate the relative

importance of the extracted aspects. Grad CAM uses the gradient information flow-

ing into the last convolutional layer of a CNN to understand the importance of each

neuron for a decision of interest (Figure 2.2). Similarly, we construct a sentiment

classification model utilizing the concept of the Grad CAM algorithm to capture the

importance of each aspect.

Figure 2.2: Example of Grad CAM image

Additionally, we utilize other explainable machine learning algorithms as base-

lines of our proposed method in the experiments, such as a sequence model based on

attention mechanism [140] and LIME (local interpretable model-agnostic explana-

tions) [106]. The attention mechanism allows a decoder to consider different parts of

a source sentence at each step of the output generation. Then, the model learns how

to generate a context vector for each output time step and what to focus on based on

the input sentence and what it has produced [143]. Moreover, LIME is an algorithm

that can explain the prediction of any classifier faithfully by approximating it

locally with an interpretable model.
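The core computation of the original Grad CAM can be sketched with NumPy: the gradients flowing into the last convolutional layer are averaged per feature map to obtain importance weights, and the weighted sum of the feature maps is passed through a ReLU. The sketch assumes that the gradients of the class score with respect to the feature maps have already been obtained from an automatic differentiation framework; the arrays below are toy values, and the variations introduced in this study for aspect importance are not reproduced here.

```python
import numpy as np

def grad_cam(activations, gradients):
    """activations, gradients: arrays of shape (K, H, W) for the last convolutional layer."""
    weights = gradients.mean(axis=(1, 2))               # one importance weight per feature map
    cam = np.tensordot(weights, activations, axes=1)    # weighted sum over the K feature maps
    return np.maximum(cam, 0.0)                         # ReLU keeps only positive evidence

# Toy example with K = 2 feature maps of size 2 x 2 (hypothetical values).
activations = np.array([
    [[1.0, 0.0], [0.0, 0.0]],    # map 0 fires in the top-left cell
    [[0.0, 0.0], [0.0, 1.0]],    # map 1 fires in the bottom-right cell
])
gradients = np.array([
    np.ones((2, 2)),             # the class score increases with map 0
    -np.ones((2, 2)),            # the class score decreases with map 1
])
cam = grad_cam(activations, gradients)
print(cam)
```

Only the region supported by the positively weighted feature map remains in the resulting map.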

2.6.8 Transfer learning

Transfer learning is a machine learning method where a model developed for one task is
reused as the starting point for a model on a second task. A typical situation is that
we have a classification task in one domain of interest, but only have sufficient
training data in another domain of interest, where the latter data may be in a different
feature space or follow a different data distribution. For example, knowledge gained
while learning to recognize cars could apply when trying to recognize trucks. In such
cases, transfer learning, if done successfully, would significantly improve the
performance of learning by avoiding expensive data-labeling efforts [96]. This study
utilizes the off-the-shelf feature approach of transfer learning. In this approach, we
use the outputs of one or more layers of a network trained on a different task as
generic feature detectors and train a new shallow model based on these features for the
target data [119, 135]. It is a popular approach in deep learning, where pre-trained
models are used as the starting point for natural language processing tasks, given the
vast compute and time resources required to develop neural network models for these
problems and the large performance gains that they provide on related problems. It is
especially useful when much training data can be found in one domain but little to none
in another [123], such as in sentiment classification [37, 10].

2.6.9 Conditional GAN

Conditional Generative Adversarial Nets (CGAN), which is an extension of the vanilla
GAN, was originally designed to generate artificial images that can scarcely be
distinguished from real images under a specific condition given as a continuous vector
value.

GAN simultaneously trains two networks: a generator that learns to generate

fake samples from an unknown distribution or noise and a discriminator that learns

to distinguish fake from real samples [38].

In the CGAN, the generator learns to generate a fake sample with a specific condition
or characteristic (such as a label associated with an image or a more detailed tag)
rather than a generic sample from an unknown noise distribution. To add such a
condition to both the generator and the discriminator, a vector $y$ must be fed into
both networks. Hence, both the discriminator $D(x, y)$ and the generator $G(z, y)$ are
jointly conditioned on two variables, $z$ or $x$ and $y$.

The objective function of CGAN is:

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x, y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z, y), y))]$$

The difference between GAN loss and CGAN loss lies in the additional parameter

y in both a discriminator and generator function. The architecture of CGAN shown

in the following figure now has an additional input layer (in the form of condition

vector C) that is fed into both the discriminator and generator networks.
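A small NumPy sketch can show how the condition vector enters both networks and how the value function is evaluated, with the discriminator conditioned on y in both terms. The single-layer generator and discriminator, the concatenation of the condition with the other input, and all names and sizes below are assumptions made for illustration; no training loop is included.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def generator(z, y, Wg):
    """G(z, y): the condition y is concatenated with the noise z."""
    return np.tanh(np.concatenate([z, y], axis=1) @ Wg)

def discriminator(x, y, Wd):
    """D(x, y): the same condition y is concatenated with the sample x."""
    return sigmoid(np.concatenate([x, y], axis=1) @ Wd).ravel()

def cgan_value(x_real, y, z, Wg, Wd, eps=1e-12):
    """Sample estimate of V(D, G) = E[log D(x, y)] + E[log(1 - D(G(z, y), y))]."""
    d_real = discriminator(x_real, y, Wd)
    d_fake = discriminator(generator(z, y, Wg), y, Wd)
    return float(np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps)))

rng = np.random.default_rng(0)
n, x_dim, z_dim, y_dim = 8, 3, 2, 2
y = np.eye(y_dim)[rng.integers(0, y_dim, size=n)]   # one-hot condition vectors
x_real = rng.normal(size=(n, x_dim))
z = rng.normal(size=(n, z_dim))
Wg = rng.normal(size=(z_dim + y_dim, x_dim))
Wd = np.zeros((x_dim + y_dim, 1))                   # an untrained discriminator outputs 0.5

value = cgan_value(x_real, y, z, Wg, Wd)
print(value)
```

With an uninformative discriminator the value equals 2 log 0.5; during training the discriminator is updated to increase this value and the generator to decrease it.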

Chapter 3

Customer-voice classification

3.1 Background

Customer voice (Voice of the customers, VOC) is a term that denotes the feelings of

customers regarding their experience with a product, service, or business. Explicit

complaints and requirements, as well as the unsatisfied needs of customers and over-

all satisfaction, are inherent in customer voice. By analyzing customer voice, thus,

product developers obtain a detailed understanding of customer requirements and

appropriate design specifications for a new product. Additionally, it could be a com-

mon language for a team to proceed forward during product development and a

highly useful springboard for product innovation [36, 40].

Thus, several companies attempt to identify and respond to customer needs and

expectations through customer-voice analysis [59] [131], and it is important to cat-

egorize customer-voice data for relevant departments and responsible individuals.

For instance, the categorization of customer-voice data of a mobile device into sys-

tem, user interface, design, and appearance categories allows it to be delivered to

relevant departments and also provides overall information on customer-voice dis-

tribution according to function. Therefore, it is necessary for customer-voice data to

be classified into functional categories prior to analyzing the data.

Figure 3.1: Summary of customer-voice data analysis process

Then, the customer-voice data is gleaned across a variety of channels including

phone, e-mail, and the web, and it is stored in a text document such as Figure 3.1.

The customer-voice data consists of extremely unstructured text since e-mail con-

tents or phone call recordings are stored without any proofreading. Thus, it typically

includes mistakes, such as typos and other informal terms including interjections

and slang. With respect to the aspects related to the representation and classifi-

cation of customer-voices, these words are considered as noisy data since they do

not provide significant information on the meaning of a customer-voice. Further-

more, noisy data typically exerts a negative effect on the classification task, and

even small amounts of noisy data can severely decrease overall performance [78].

Figure 3.2: Scope of proposed approaches

Thus, this study mainly focused on proposing a document de-noising method to

clean the customer-voice data and an advanced document representation method that is

appropriate for customer-voice data, while building the automatic classifier because

the representation of a document is an essential task in document classification.

Moreover, the performance of a document representation method for customer-voice

data must be better than previous methods. Further, it must provide representa-

tional interpretability, as it might be analyzed for various purposes after the classifi-

cation task. Thus, we also consider the interpretability factor in our proposed docu-

ment representation method (Figure 3.2). Additionally this study proposes another

novel approach to apply convolution filter to document representation to improve

the classification performance.

3.2 Methodology

3.2.1 De-noising documents

As described above, customer-voice data involves extremely unstructured data containing
mistakes such as typos or other informal terms. It also contains words that are less
important for effectively representing each class. The removal of these noisy words by
novelty detection improves the representation and classification performance.

First, it is necessary to consider the application of the previously described nov-

elty detection method in a vector space of words calculated by neural embedding

model to detect the noisy words. A data set as shown in figure 3.3 is assumed to

exist. Each circle refers to word vectors calculated by neural embedding model. Ide-

ally, it is expected that purple circles and green circles are clustered into two main

clusters, and a yellow circle is classified as a novelty. However, the application of the

GMM novelty detection method on these data leads to the detection of both green

and yellow circles as novelties since these words are located at a distance from other

words as shown in figure 3.3. Additionally, red ‘+’ and blue ‘+’ indicate the means

of each Gaussian distribution. This implies that words that are distant from other

words due to their uniqueness and low frequency are classified as novel words based

on the previously described novelty detection method although these words consti-

tute meaningful words that explain specific classes or important words with respect

to the classification task. Thus, the application of the previously stated novelty de-

tection method without modification is not sufficient for the effective detection of

novel words.

Figure 3.3: Limitation of the previously stated novelty detection method

The utilization of the class vector addresses this limitation. As described in

section 2, class vectors have high cosine similarity with words that discriminate

between classes. Hence, each class vector is assumed to be the mean or centroid of each
word distribution, so that words that are close to a class vector or have a high PDF
value are considered meaningful words that effectively explain each class. Meanwhile,
words that are far from the class vector or possess a low PDF value are considered
as noisy words, such as typos, or less important words to discriminate between

classes. Figure 3.4 shows the advantage of the proposed novelty detection method

that utilizes a class vector. In the proposed method, the class vector is located near

the centroid of each word distribution that is composed of words that represent each

class. Although a word distribution composed of a small number of green words

exists, a class vector that is indicated by a green ‘+’ is located near the centroid

of the word distribution. Therefore, the proposed method effectively classifies the

meaningful words and novel words by utilizing a class vector. Thus, in this study,

an alternative is proposed to previous novelty detection methods, such as Gaussian

mixture model and K-means clustering approach which are most frequently used in

novelty detection task, to utilize a class vector.

Figure 3.4: Advantage of proposed novelty detection method

The details of the proposed novelty detection are presented below. Formally, let the
set of documents be $D = \{d_1, ..., d_N\}$, where $N$ denotes the number of documents.
Additionally, let the set of words be $W = \{w_1, ..., w_V\}$, where $V$ denotes the
total number of words in $D$, and let $C = \{c_1, ..., c_k\}$, where $k$ denotes the
total number of classes in $D$. The word vector $w_i$ and the class vector $c_j$ are
$h$-dimensional vectors that represent each word and each class, where $h$ denotes the
number of hidden nodes as defined by a user in the neural embedding model. The number
of class vectors is equal to the number of data classes.

1) Calculate the vector representation of each word $w_i$ and each class $c_j$.
Specifically, $w_i$ and $c_j$ are calculated by optimizing the function
$\sum_{i=1}^{V} \log p(w_i | w_{context}) + \sum_{j=1}^{k} \sum_{i=1}^{V} \log p(w_i | c_j)$.

2) Calculate the novelty score with variations of the Gaussian mixture model (GMM) and the K-means clustering (KMC) method that utilize a class vector.

(1) Variation of the Gaussian mixture model:

The GMM variation treats each class vector as the mean of one component distribution, and each component is assumed to be the distribution of the words of one class. As in the standard GMM, the density is represented as a weighted sum of k component Gaussian densities:

$$p(W \mid \mu, \Sigma) = \sum_{j=1}^{k} m_j \, g(W \mid \mu_j, \Sigma_j)$$

where mj, j = 1, ..., k, denotes the mixture weights and g(W | µj, Σj), j = 1, ..., k, denotes the component Gaussian densities. Each component density is a Gaussian function of the following form:

$$g(W \mid \mu_j, \Sigma_j) = \frac{1}{(2\pi)^{h/2} |\Sigma_j|^{1/2}} \exp\left(-\frac{1}{2} (W - \mu_j)^{T} \Sigma_j^{-1} (W - \mu_j)\right)$$

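Then, the mean vector µj is fixed to the class vector cj, and only mj and Σj are calculated and updated by the Expectation-Maximization (EM) algorithm as follows:

$$m_j = \frac{1}{V} \sum_{i=1}^{V} \frac{m_j \, p(w_i \mid \mu_j, \Sigma_j)}{p(w_i \mid \mu, \Sigma)}$$

$$\Sigma_j = \frac{\sum_{i=1}^{V} (w_i - \mu_j)(w_i - \mu_j)^{T} \, \dfrac{m_j \, p(w_i \mid \mu_j, \Sigma_j)}{p(w_i \mid \mu, \Sigma)}}{\sum_{i=1}^{V} \dfrac{m_j \, p(w_i \mid \mu_j, \Sigma_j)}{p(w_i \mid \mu, \Sigma)}}$$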

(2) Variation of K-means clustering:

The KMC variation treats each class vector as the centroid of one cluster, and each cluster is assumed to be the cluster of the words of one class. As in the standard KMC method, the aim is to minimize the objective function J, known as the squared error function:

$$J = \sum_{j=1}^{k} \sum_{W \in S_j} dist(W, \mu_j)^{2}$$

where S = {S1, ..., Sk} denotes the set of clusters. The centroid vector µj is fixed to the class vector cj, and each data point is assigned to the cluster whose center is closest to it. No additional step is required to recalculate the centroids. The distance between a word wi and the centroid of the cluster containing wi is utilized as the novelty score.

3) Finally, the PDF value, i.e., the weighted sum of the k component Gaussian densities, is utilized as the novelty score in the GMM variation, and the distance between a word wi and the centroid of the cluster containing wi is utilized as the novelty score in the KMC variation. That is, words with a PDF value lower than a user-defined probability are considered novel words in the GMM variation, and words whose distance from the centroid exceeds a user-defined distance are considered novel words in the KMC variation.

In step 1), each word vector wi and class vector cj is calculated by the neural embedding model. The number of dimensions of wi and cj equals the number of hidden nodes of the neural embedding model, as defined by the user. In step 2), a variation of the Gaussian mixture model or of the K-means clustering method that utilizes a class vector is used to calculate the novelty score, which is the PDF value or the distance from the centroid, respectively. In step 3), novel words are detected by applying a user-defined threshold to the novelty score in each method.
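The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in this study: the word and class vectors are hypothetical 2-dimensional values standing in for the output of step 1), the distance in the KMC variation is assumed to be Euclidean, each covariance is simplified to a spherical one, and the GMM novelty score is taken as the negative logarithm of the PDF value, as in the experiments section. All function names are made up for the example.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmc_novelty(word_vecs, class_vecs):
    """KMC variation: centroids are fixed to the class vectors, so no centroid
    update is run. Score = distance to the nearest class vector."""
    return {w: min(math.sqrt(sq_dist(v, c)) for c in class_vecs)
            for w, v in word_vecs.items()}

def log_density(x, mean, var):
    """Log of a spherical Gaussian N(mean, var * I); a simplifying assumption."""
    return -0.5 * (len(x) * math.log(2 * math.pi * var) + sq_dist(x, mean) / var)

def mixture_pdf(x, class_vecs, weights, variances):
    """Weighted sum of the k component densities, with means = class vectors."""
    return sum(m * math.exp(log_density(x, c, s))
               for m, c, s in zip(weights, class_vecs, variances))

def gmm_novelty(word_vecs, class_vecs, n_iter=20):
    """GMM variation: means stay fixed to the class vectors; EM updates only
    the mixture weights and the variances. Score = -log(PDF value)."""
    xs = list(word_vecs.values())
    k, h, n = len(class_vecs), len(xs[0]), len(xs)
    weights, variances = [1.0 / k] * k, [1.0] * k
    for _ in range(n_iter):
        # E-step: responsibility of component j for word i.
        resp = []
        for x in xs:
            joint = [m * math.exp(log_density(x, c, s))
                     for m, c, s in zip(weights, class_vecs, variances)]
            total = sum(joint) or 1e-300
            resp.append([q / total for q in joint])
        # M-step: update weights and variances only (means are not touched).
        for j in range(k):
            r_j = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = r_j / n
            spread = sum(r[j] * sq_dist(x, class_vecs[j]) for r, x in zip(resp, xs))
            variances[j] = max(spread / (h * r_j), 1e-6)
    return {w: -math.log(max(mixture_pdf(v, class_vecs, weights, variances), 1e-300))
            for w, v in word_vecs.items()}

def detect_novel(scores, threshold):
    """Words whose novelty score exceeds the user-defined threshold."""
    return sorted(w for w, s in scores.items() if s > threshold)

# Hypothetical 2-D embeddings: two class vectors and five words.
class_vecs = [(0.0, 0.0), (5.0, 5.0)]
word_vecs = {"lcd": (0.1, -0.2), "rust": (0.3, 0.1), "wifi": (5.1, 4.8),
             "network": (4.7, 5.2), "blah": (2.5, 9.0)}
print(detect_novel(kmc_novelty(word_vecs, class_vecs), threshold=2.0))
print(detect_novel(gmm_novelty(word_vecs, class_vecs), threshold=5.0))
```

In this toy setting, the word placed far from both class vectors receives the highest score under both variations, which mirrors the intuition of Figure 3.4. The thresholds are arbitrary and would be set by the user in practice.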

3.2.2 Probabilistic word clustering based document representation

Consideration of the membership strength

As mentioned above, previous word-clustering-based approaches have a limitation in reflecting the membership strength of words with respect to each cluster. That is, previous approaches represent a document based on a hard clustering method and do not differentiate, in terms of the frequency count, between a word located close to the centroid of a cluster and a word located far from the centroid.

To illustrate the limitation of ignoring the membership strength of a word with respect to a cluster, words in customer-voice data for a mobile device collected from LG Electronics were clustered using the spherical K-means method [148], in a manner identical to that of a previous study. In the spherical K-means method, data located near a centroid are considered to exhibit a strong membership strength with that centroid. Table 3.1 and Table 3.2 list words in the 7th cluster among 70 clusters, ordered by cosine dissimilarity from the centroid. Cosine dissimilarity, 1 − cos(x, y), is the distance measure used in the spherical K-means method. A close look at the 7th cluster indicates that it contains words related to water damage or breakage. Additionally, words located near the centroid, such as rust, humidity, and LCD, are meaningful keywords that clearly represent the property of the cluster, whereas words located far from the centroid, such as think, daily, and terminal, are relatively general words that are not strongly related to the water damage or breakage topic. A domain expert from LG Electronics who was involved in the study agreed with these observations.

Table 3.1: Word list located closest to the centroid

Word        Dissimilarity    Word             Dissimilarity    Word          Dissimilarity
rust        0.3053           careful          0.3417           broken        0.3718
mistake     0.3252           carelessness     0.3579           part          0.3809
humidity    0.3317           LCD              0.3662           dent          0.3846
throw       0.3405           tempered glass   0.3697           appearance    0.3895

Table 3.2: Word list located far from the centroid

Word        Dissimilarity    Word             Dissimilarity    Word          Dissimilarity
integrated  0.9137           sticker          0.8922           do            0.8422
just        0.9078           two              0.8873           ambiguous     0.8314
pay         0.9023           tear             0.8713           grudge        0.8076
terminal    0.8973           daily            0.8573           think         0.8033

Hence, it is reasonable to differentiate between words in the frequency count. That is, words exhibiting a strong membership strength with a cluster should receive a higher weight in the frequency count, as these words better represent the property of the cluster. Considering the membership strength is expected to increase the impact of meaningful keywords in the document representation. Further, the proposed representation method is expected to be more robust to noisy words, as noisy words receive a lower weight in the frequency count.
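The ranking used in Tables 3.1 and 3.2 can be illustrated with a short sketch. The 3-dimensional vectors below are hypothetical stand-ins for real word vectors and a real cluster centroid, and the function names are chosen for this example only.

```python
import math

def cosine_dissimilarity(x, y):
    """1 - cos(x, y) for two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

def rank_by_dissimilarity(word_vecs, centroid):
    """Words of one cluster, closest to the centroid first."""
    return sorted(word_vecs, key=lambda w: cosine_dissimilarity(word_vecs[w], centroid))

# Hypothetical 3-D vectors; real word vectors would have h dimensions.
centroid = [1.0, 0.2, 0.0]
cluster_words = {"rust": [0.9, 0.3, 0.1],
                 "think": [0.1, 0.2, 0.9],
                 "dent": [0.7, 0.6, 0.2]}
print(rank_by_dissimilarity(cluster_words, centroid))
```

Words at the front of the ranking correspond to the entries of Table 3.1, and words at the end correspond to those of Table 3.2.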

Probabilistic document representation

In this study, two soft clustering methods, namely, the fuzzy C-means (FCM) and GMM clustering methods, are applied to measure the membership strength of each word with respect to the clusters. Applying the soft clustering methods enables the membership strength of words to be measured by mij. In the FCM clustering method, mij denotes the degree to which word wi belongs to cluster Cj, and in the GMM clustering method, mij denotes the probability that word wi is generated from the distribution of cluster Cj.

Figure 3.5 summarizes the proposed document representation method.

Figure 3.5: Document representation based on probabilistic word clustering

Formally, let the set of documents be D = {d1, ..., dN} and the set of words be W = {w1, ..., wn}, where N denotes the number of documents and n denotes the total number of words in D. Let c denote the number of clusters defined by the user, and let dist(a, b) denote the cosine dissimilarity between a and b. Furthermore, centj denotes the centroid of cluster j.

The membership strength mij is a scalar value in [0, 1] that represents the membership strength of word wi with the jth cluster. Additionally, Vk denotes the document vector of the kth document, and V̂k denotes its normalized vector.

Then, the proposed document representation is calculated as follows:

1) Calculate the h-dimensional vector of each word in W by using the neural embedding model, where h denotes the number of hidden nodes in the model. Each wi is calculated by optimizing the function

$$\sum_{i=1}^{n} \log p(w_i \mid w_{i-k}, \ldots, w_{i+k})$$

where k denotes the window size of neighboring words. The hidden-node activations are then used as the representation of each word wi.

2) Cluster all words wi and calculate the membership strength mij for all i, j by using the FCM and the GMM clustering methods.

(1) Apply the FCM clustering method to calculate the membership strength mij by using the following equation:

$$m_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \dfrac{dist(w_i, cent_j)}{dist(w_i, cent_k)} \right)^{2}} \qquad (3.1)$$

while minimizing the following objective function:

$$\arg\min_{C} \sum_{i=1}^{n} \sum_{j=1}^{c} m_{ij}^{2} \, dist^{2}(w_i, cent_j)$$

(2) Apply the GMM clustering method to calculate the membership strength mij by using the following equation:

$$m_{ij} = \frac{p_j(w_i \mid z_j, \theta_j) \cdot \alpha_j}{\sum_{k=1}^{c} p_k(w_i \mid z_k, \theta_k) \cdot \alpha_k} \qquad (3.2)$$

where p(w|θ) is defined as a finite mixture model with c components, and each component is a multivariate Gaussian density defined with parameter θj = {µj, Σj} as follows:

$$p(w \mid \theta) = \sum_{j=1}^{c} \alpha_j \, p_j(w \mid z_j, \theta_j),$$

$$p_j(w \mid \theta_j) = \frac{1}{(2\pi)^{h/2} |\Sigma_j|^{1/2}} \exp\left(-\frac{1}{2} (w - \mu_j)^{T} \Sigma_j^{-1} (w - \mu_j)\right)$$

Here, αj = p(zj) denotes the mixture weight that represents the probability that a randomly selected w is generated by component j, where $\sum_{j=1}^{c} \alpha_j = 1$. Each parameter is updated by the EM algorithm.

3) Calculate the document vector Vk = [vk1, ..., vkj, ..., vkc] by the following equation:

$$v_{kj} = \sum_{i} (cf_{ijk} \times m_{ij}) \qquad (3.3)$$

where cf ijk denotes the frequency in dk of word wi, which is included in the jth cluster.

4) Calculate the normalized document vector V̂k = [v̂k1, ..., v̂kj, ..., v̂kc] by the following equation:

$$\hat{v}_{kj} = \frac{v_{kj}}{\sum_{j'=1}^{c} v_{kj'}} \times \log \frac{N}{df_j} \qquad (3.4)$$

where j = 1, ..., c, k = 1, ..., N, and df j denotes the number of documents containing words included in the jth cluster.

In step 1), the word vector wi is calculated by the neural embedding model. As

described previously, the number of dimensions of wi corresponds to the number of

hidden nodes of the neural embedding model that is defined by the user.

The membership strength mij for all i, j is calculated in step 2). Two soft clustering methods, namely the fuzzy C-means method and the Gaussian mixture model, are used. Equation (3.1) is used to calculate mij with the FCM clustering method, and Equation (3.2) is used to calculate mij with the GMM clustering method.

In step 3), the document vector prior to normalization is calculated based on Equation (3.3) by multiplying the membership strength mij by cf ijk, the frequency in the kth document of word wi, which is included in the jth cluster.

In step 4), each dimension is first divided by the sum over all dimensions for normalization, based on Equation (3.4). Normalization makes the document representation robust to the length of the document. As mentioned above, the customer-voice data consist of extremely unstructured texts of various lengths, and a longer text often contains a large amount of repetition. Without normalization, customer-voice data of different lengths containing similar contents could be represented differently and classified into different categories. Second, each dimension is multiplied by log(N/df j) according to Equation (3.4) to obtain the concept frequency-inverse document frequency (CF-IDF) effect used in the previously described bag-of-concepts approach. CF-IDF is a weighting scheme that readjusts the count of a concept based on its frequency in the entire corpus. If a certain concept occurs in every document in the corpus, it is considered relatively unimportant, and its weight is reduced accordingly.
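Steps 2) to 4) with the FCM membership of Equation (3.1) can be sketched as follows. The sketch rests on several assumptions that are not stated in the text: the centroids are taken as given rather than learned, the word vectors are hypothetical 2-dimensional values, and cf ijk is read as the count of a word credited only to the cluster in which its membership is strongest. The function names are illustrative.

```python
import math
from collections import Counter

def cos_dist(a, b):
    """Cosine dissimilarity 1 - cos(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def fcm_membership(vec, centroids):
    """Equation (3.1): m_ij = 1 / sum_k (dist_ij / dist_ik)^2."""
    d = [max(cos_dist(vec, c), 1e-12) for c in centroids]  # avoid division by zero
    return [1.0 / sum((dj / dk) ** 2 for dk in d) for dj in d]

def document_vectors(docs, word_vecs, centroids):
    """Normalized, CF-IDF weighted document vectors (Equations 3.3 and 3.4)."""
    c, n_docs = len(centroids), len(docs)
    member = {w: fcm_membership(v, centroids) for w, v in word_vecs.items()}
    raw, df = [], [0] * c
    for doc in docs:
        v = [0.0] * c
        for w, count in Counter(t for t in doc if t in member).items():
            j = max(range(c), key=member[w].__getitem__)  # cluster containing w
            v[j] += count * member[w][j]                  # Equation (3.3)
        raw.append(v)
        for j in range(c):
            if v[j] > 0:
                df[j] += 1  # documents containing words of cluster j
    normalized = []
    for v in raw:
        total = sum(v) or 1.0
        normalized.append([v[j] / total * math.log(n_docs / df[j]) if df[j] else 0.0
                           for j in range(c)])            # Equation (3.4)
    return normalized

# Hypothetical centroids, word vectors, and tokenized documents.
centroids = [[1.0, 0.0], [0.0, 1.0]]
word_vecs = {"rust": [0.95, 0.05], "humidity": [0.9, 0.2], "think": [0.7, 0.7],
             "battery": [0.1, 0.9], "charge": [0.2, 0.95]}
docs = [["rust", "humidity", "rust", "think"],
        ["battery", "charge", "charge"],
        ["battery", "rust"]]
for vec in document_vectors(docs, word_vecs, centroids):
    print([round(x, 4) for x in vec])
```

A general word such as "think", which lies between the two centroids, contributes only half of its count, whereas "rust" contributes almost its full count. This is the intended effect of considering the membership strength.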

3.2.3 Word-clustering based document representation with VAE and its probabilistic version

Formally, let the set of documents be D = {d1, ..., dN} and the set of words be W = {w1, ..., wn}, where N and n respectively denote the number of documents and the total number of words in D. Let c and dist(a, b) respectively denote the number of clusters defined by the user and the cosine distance between a and b. Each word vector wi is an h-dimensional vector that represents a word, where h denotes the number of hidden nodes defined by the user in the VAE model. The membership strength mij is a scalar value in [0, 1] that represents the membership strength of word wi with cluster j. The document vector Vk denotes the c-dimensional vector of the kth document, formally Vk = [vk1, ..., vkj, ..., vkc], where k = 1, ..., N.

First, calculate the vector representation of each word wi using the VAE architecture. wi is given by Rxi in the function $E(x_i; z) = -z^{T} R x_i - b_{x_i}$, as explained in section 2. Second, cluster all words wi and calculate the membership strength mij for all i, j. In the hard clustering version, the membership strength is assigned a binary value of 1 or 0. In the probabilistic clustering version, it is calculated using the following equation:

$$m_{ij} = \frac{p_j(w_i \mid z_j, \theta_j) \cdot \alpha_j}{\sum_{k=1}^{c} p_k(w_i \mid z_k, \theta_k) \cdot \alpha_k} \qquad (3.5)$$

where i = 1, ..., n, j = 1, ..., c, p(w|θ) is defined as a finite mixture model with c components, and each component is a multivariate Gaussian density defined with parameter θj = {µj, Σj} as follows:

$$p_j(w_i \mid \theta_j) = \frac{1}{(2\pi)^{h/2} |\Sigma_j|^{1/2}} \exp\left(-\frac{1}{2} (w_i - \mu_j)^{T} \Sigma_j^{-1} (w_i - \mu_j)\right) \qquad (3.6)$$

The EM algorithm updates each parameter. Finally, calculate the document vector Vk = [vk1, ..., vkj, ..., vkc] by the following equation:

$$v_{kj} = \sum_{i} (cf_{ijk} \times m_{ij}) \qquad (3.7)$$

where cf ijk denotes the frequency in document dk of word wi, which is included in cluster j.
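The difference between the hard and the probabilistic versions can be sketched as follows. This is a minimal illustration under stated assumptions: the mixture parameters are given rather than fitted by EM, each covariance is simplified to a spherical one, the word vectors are hypothetical 2-dimensional values instead of VAE outputs, and each word count is credited to the cluster in which its membership is strongest. All names are made up for the example.

```python
import math

def gaussian_pdf(w, mu, var):
    """Equation (3.6) with Sigma_j = var * I (spherical; a simplifying assumption)."""
    h = len(w)
    sq = sum((a - b) ** 2 for a, b in zip(w, mu))
    return math.exp(-0.5 * sq / var) / (2 * math.pi * var) ** (h / 2)

def soft_membership(w, components):
    """Equation (3.5): posterior probability of each component given w.
    components is a list of (alpha_j, mu_j, var_j) tuples."""
    joint = [alpha * gaussian_pdf(w, mu, var) for alpha, mu, var in components]
    total = sum(joint)
    return [q / total for q in joint]

def hard_membership(w, components):
    """Hard clustering version: 1 for the most probable cluster, 0 elsewhere."""
    soft = soft_membership(w, components)
    best = max(range(len(soft)), key=soft.__getitem__)
    return [1.0 if j == best else 0.0 for j in range(len(soft))]

def document_vector(counts, word_vecs, components, membership):
    """Equation (3.7): add count * m_ij to the cluster j that contains the word."""
    v = [0.0] * len(components)
    for word, count in counts.items():
        m = membership(word_vecs[word], components)
        j = max(range(len(m)), key=m.__getitem__)
        v[j] += count * m[j]
    return v

# Hypothetical mixture with two components and two words of one document.
components = [(0.5, [0.0, 0.0], 1.0), (0.5, [4.0, 0.0], 1.0)]
word_vecs = {"rust": [0.0, 0.0], "think": [1.9, 0.0]}
counts = {"rust": 2, "think": 1}
print(document_vector(counts, word_vecs, components, hard_membership))
print(document_vector(counts, word_vecs, components, soft_membership))
```

In the hard version, both words count fully toward the first cluster. In the probabilistic version, the word lying between the two components is down-weighted.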

3.2.4 Matrix representation of word-clustering based document representation

As mentioned above, previous document representation studies based on word clustering utilized word representations from an individual architecture, such as co-occurrence or neural embedding. Consequently, the discriminative power, one of the criteria of document representation performance, varies with the kind or attributes of the documents [62]. This study therefore proposes a matrix representation approach that concatenates the various word-clustering-based document representation methods, as shown in figure 5.4.

In previous studies, each document is represented by a vector derived from an individual architecture. By contrast, the proposed method represents each document in matrix form. Irrespective of the architecture or word representation method, the matrix can be constructed solely by specifying the number of clusters, which makes it easy to combine document representations derived from various word representation algorithms or various clustering algorithms. For instance, an additional document representation derived from another algorithm, such as fuzzy C-means clustering, can easily be concatenated by calculating mij as follows:

$$m_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \dfrac{dist(w_i, centroid_j)}{dist(w_i, centroid_k)} \right)^{2}} \qquad (3.8)$$

while minimizing the objective function

$$\arg\min_{C} \sum_{i=1}^{n} \sum_{j=1}^{c} m_{ij}^{2} \, dist^{2}(w_i, centroid_j),$$

where centroidj denotes the centroid of cluster j in section 3.1.

As a side note, the matrix representation approach was inspired by multiple feature extraction and ensemble learning. Multiple feature extraction is used in many studies in document analysis [146]. Ensemble learning improves machine learning results by combining several models, which yields better predictive performance than a single model. Furthermore, the effectiveness of ensembles of classifiers has been demonstrated both theoretically and practically in many studies [82]. In this sense, we expected that the concatenation of document representations would show better discriminative power than an individual representation, similar to the multiple feature extraction approach or an ensemble of classifiers.

In the experiments, we construct the matrix representation by combining seven different word-clustering-based representations: the co-occurrence-based word-clustering approach, the neural-embedding-based word-clustering approach and its probabilistic versions with FCM and GMM, and the VAE-based word-clustering approach and its probabilistic versions with FCM and GMM.
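The construction of the matrix form can be illustrated with a small sketch. The document vectors below are hypothetical, only three of the seven representations are shown, and the function name is chosen for this example.

```python
def matrix_representation(vectors_by_method):
    """Stack the c-dimensional document vectors of several methods into one
    matrix with one row per method (rows follow the given order)."""
    rows = [list(v) for v in vectors_by_method.values()]
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all representations must use the same number of clusters")
    return rows

# Hypothetical vectors of one document with c = 4 clusters.
doc_vectors = {
    "co-occurrence": [0.0, 0.4, 0.1, 0.0],
    "neural embedding": [0.1, 0.5, 0.0, 0.0],
    "neural embedding + FCM": [0.1, 0.3, 0.1, 0.1],
}
matrix = matrix_representation(doc_vectors)
print(len(matrix), "x", len(matrix[0]))
```

The only requirement is a common number of clusters c, which is why representations from different architectures can be combined without further changes.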


3.2.5 Applying convolution filter to matrix representation

As mentioned previously, the matrix representation is appropriate for word-clustering-based document representation and offers benefits similar to those of the ensemble approach. In spite of those advantages, the matrix representation is relatively large compared to an individual vector representation, which increases the complexity of the subsequent classification model and the possibility of over-fitting. Thus, we apply convolution filters to the matrix representation to address those limitations.

Rearrange the elements of each document representation vector

To apply a convolution filter to the matrix representation, the elements of each representation vector are rearranged by semantic meaning so that the semantic distance among the word clusters is preserved.

Figure 3.6 illustrates why the rearrangement process is required. In the word-clustering-based document representation, each element carries a semantic meaning. For instance, in the customer-voice data, the grey-scale elements contain design-related words and the blue-scale elements contain battery-related words. Without rearrangement, there is no correlation between neighboring elements, as in the left dog image in Figure 3.6, and applying a convolution filter cannot extract appropriate local features. Thus, we apply the rearrangement process while preserving the semantic meaning of each document representation to obtain an appropriate matrix representation, like the right dog image.

Figure 3.6: The reason for rearranging each representation

In the rearrangement process, we first apply the t-SNE algorithm to one specific representation vector, used as the benchmark, to determine an order of elements that preserves the semantic distance of the word clusters. We project all word clusters onto a 1-dimensional space with the t-SNE algorithm, since it is designed to preserve the distances between neighboring data points when embedding a high-dimensional space into a low-dimensional space. Then, we use the semantic order determined by the t-SNE algorithm as the order of the elements of the representation vector, as shown in figure 3.7.

In detail, each individual document representation is a 1-dimensional vector in which each element represents one word cluster. By projecting all word clusters onto one dimension, we can easily match the semantic order of the word clusters of the benchmark to the elements of the other document representations.

Second, we put the elements of the representation vectors in one-to-one correspondence based on the semantic meaning of the benchmark representation vector. To this end, we linearly transform the word clustering space of each representation vector to that of the benchmark representation while minimizing the sum of squared errors of the distances between data points, namely words. Then, we match the word clusters that are located closest to each other, as shown in figure 3.8. In the matching, we find the closest clusters heuristically in order of the silhouette index of the benchmark representation, since clusters with a low silhouette index do not have much significance for the representation.

Once the aforementioned process is done, all elements of each individual document representation are arranged in the same semantic order as the benchmark representation. For instance, if the elements of the benchmark representation are ordered as design-related word cluster, performance-related word cluster, ..., battery-related word cluster, then the other representations are also rearranged in the same semantic order by the proposed method.
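The rearrangement can be sketched as follows, under several assumptions. The 1-dimensional coordinates stand in for a t-SNE projection of the benchmark clusters, the centroids of the other representation are assumed to be already linearly transformed into the benchmark space, both representations are assumed to have the same number of clusters, and "in order of silhouette index" is read as visiting clusters from the highest to the lowest index. All values and function names are hypothetical.

```python
def semantic_order(coords_1d):
    """Indices of the benchmark clusters sorted by their 1-D coordinate."""
    return sorted(range(len(coords_1d)), key=coords_1d.__getitem__)

def sq_dist(a, b):
    """Squared Euclidean distance between two centroids."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match_clusters(bench_centroids, other_centroids, silhouette):
    """Greedy one-to-one matching: visit benchmark clusters from the highest to
    the lowest silhouette index and give each the closest unused cluster."""
    unused = list(range(len(other_centroids)))
    match = {}
    for b in sorted(range(len(bench_centroids)), key=lambda i: -silhouette[i]):
        best = min(unused, key=lambda o: sq_dist(bench_centroids[b], other_centroids[o]))
        match[b] = best
        unused.remove(best)
    return match

def rearrange(vector, order, match=None):
    """Reorder a document vector by the benchmark's semantic order.
    For non-benchmark vectors, match maps benchmark index -> own index."""
    if match is None:
        return [vector[b] for b in order]
    return [vector[match[b]] for b in order]

# Hypothetical benchmark: 1-D projection, centroids, and silhouette indices.
order = semantic_order([2.0, -1.0, 0.5])
bench_centroids = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
silhouette = [0.6, 0.9, 0.3]
# Hypothetical centroids of another representation, already aligned.
other_centroids = [[9.5, 0.2], [0.3, -0.1], [5.2, 4.9]]
match = match_clusters(bench_centroids, other_centroids, silhouette)

print(rearrange([10, 20, 30], order))            # benchmark document vector
print(rearrange([300, 100, 200], order, match))  # other document vector
```

After rearrangement, the elements at the same position in both vectors refer to semantically corresponding word clusters, which is what allows a convolution filter to capture local features across representations.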

Applying convolution filter

After rearranging the elements of each representation vector, namely the word clusters, according to the semantic order, we apply convolution filters to the matrix representation. We use two levels of convolution filters with sizes of 3x1 (within each representation vector) and 2x2, respectively (figure 5.4). We use two layers of relatively small convolution filters instead of a large filter, with reference to the experimental results of the VGG network [121] and Inception-v2 [129]. Finally, we add a fully connected neural network layer as the final classification model in the following experiments section.
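The effect of the two filter sizes on the shape of the matrix representation can be illustrated with a plain sketch. The orientation is an assumption: rows are taken as the seven representations and columns as the rearranged clusters, so the 3x1 filter is written as a kernel that spans three neighboring clusters within one row. The kernel weights are fixed averages chosen for illustration, whereas in the actual model they would be learned, and no activation function or fully connected layer is included.

```python
def conv2d_valid(matrix, kernel):
    """2-D cross-correlation with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(matrix), len(matrix[0])
    return [[sum(matrix[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(cols - kw + 1)]
            for r in range(rows - kh + 1)]

# Hypothetical 7 x 10 matrix: 7 representations, 10 rearranged clusters.
matrix = [[float(r + c) for c in range(10)] for r in range(7)]

within_row = [[1 / 3, 1 / 3, 1 / 3]]        # three neighboring clusters of one representation
across_rows = [[0.25, 0.25], [0.25, 0.25]]  # 2 x 2 patch across two representations

layer1 = conv2d_valid(matrix, within_row)
layer2 = conv2d_valid(layer1, across_rows)
print(len(layer1), "x", len(layer1[0]))
print(len(layer2), "x", len(layer2[0]))
```

The first filter only mixes semantically adjacent clusters of the same representation, while the second also mixes adjacent representations, which is only meaningful after the rearrangement described above.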


Figure 3.7: Preserve semantic distance

Figure 3.8: One-to-one correspondence

Figure 3.9: Rearrange the elements


3.3 Experiments

3.3.1 Data description

In order to verify the discriminative power of the proposed method, we collected customer-voice data, a real business text dataset, and two public text datasets, the Reuters news and 20 Newsgroups datasets. First, the customer-voice data were collected from the Mobile Communication (MC) department of LG Electronics between April 23, 2014 and March 23, 2017 (Table 3.3). The data were manually labeled by domain experts at LG Electronics into 12 classes. In order to avoid a class imbalance problem, 900 customer-voice records were collected from each class.

Table 3.3: Customer-voice dataset

Classes                     Number of data    Average words per data
OS upgrade                  900               107.62
Multimedia                  900               121.64
Hard key & input error      900               119.57
Water-proof & dust-proof    900               108.26
Accessory                   793               84.31
Security & backup           900               95.73
Network connection          900               111.37
Call & message              900               108.49
Heating & processing        900               110.27
Battery & power             900               81.24
Appearance & display        900               90.34
User Interface              900               93.66

3.3.2 Experiments setup

The first experiment is performed to verify the effectiveness of de-noising. A de-noised customer-voice data representation, composed only of words that are not determined to be novel, is constructed to compare the representational effectiveness and the classification performance of the customer-voice data representations obtained with the proposed method and with the previous method. For each removal ratio of 1%, 2%, 3%, 4%, 5%, 6%, 8%, 10%, 12%, 15%, and 20%, the novel words detected by the proposed method and by the previous method are removed as a means of de-noising prior to constructing the customer-voice data representation. Then, the representational effectiveness and classification performance of each de-noised customer-voice data representation are compared.
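The de-noising step can be sketched as follows. The sketch assumes that the removal ratio refers to the share of the vocabulary with the highest novelty scores, which is one possible reading of the setup above. The scores and documents are hypothetical, and the function name is chosen for this example.

```python
def remove_novel_words(docs, novelty, ratio):
    """Drop the given share of the vocabulary with the highest novelty scores."""
    ranked = sorted(novelty, key=novelty.get, reverse=True)
    removed = set(ranked[:round(len(ranked) * ratio)])
    cleaned = [[t for t in doc if t not in removed] for doc in docs]
    return cleaned, removed

# Hypothetical novelty scores (higher = more novel) and tokenized documents.
novelty = {"lcd": 0.1, "battery": 0.2, "wifi": 0.3, "blah": 0.9, "uu": 0.8}
docs = [["lcd", "blah", "battery"], ["uu", "wifi"]]
cleaned, removed = remove_novel_words(docs, novelty, ratio=0.4)
print(sorted(removed))
print(cleaned)
```

The cleaned documents are then passed to each representation method in place of the original documents.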

The customer-voice data representation methods include Term Frequency-Inverse Document Frequency (TF-IDF), Latent Semantic Analysis (LSA), the topic vector, the neural-embedding-based word clustering approach, and the probabilistic word-clustering-based approach. TF-IDF is the most common document representation method, in which a document is fundamentally represented by the counts of word occurrences within the document [5, 81]. LSA applies singular value decomposition (SVD) to the term-frequency matrix to reduce the number of rows while preserving the similarity structure among columns [69]. The topic vector is an inferred topic proportion that is typically used as a topic feature to represent the document [16]. Additionally, in the neural-embedding-based word clustering approach [61, 127] and the probabilistic word-clustering-based approach [72], semantically similar terms are grouped into a common cluster by clustering the word vectors generated from neural embedding. Document vectors are subsequently represented by the frequencies of these clusters. The only difference between these methods is that the probabilistic word-clustering-based approach additionally considers the membership strength of words by utilizing a soft clustering method. In this experiment, the number of clusters is fixed at 150 for both the neural-embedding-based word clustering approach and the probabilistic word-clustering-based approach to minimize the impact of the number of clusters on the experiments.

The second experiment is performed to measure the classification performance of the proposed method. A classification result is considered correct if the prediction model predicts the actual class of the document. A majority voting ensemble of neural networks, which is used in several studies [94, 13, 92], is constructed for the classification task.

In the experiments, the classification performance based on the proposed document representation method is compared to that of bag-of-words, the co-occurrence-based word-clustering approach, doc2vec, the neural-embedding-based word-clustering approach and its probabilistic version, and the VAE-based word-clustering approach and its probabilistic version.

Additionally, we compared the proposed method to document representation methods that retain word order, namely a convolutional neural network (CNN) based model [29, 63] and a recurrent neural network (RNN) based model [76], to validate the performance of the proposed method more rigorously. These two models were configured with the same parameters as in the experiments of their original studies.

Moreover, in order to isolate the effect of applying the convolution filter, we also experimented with matrix decomposition variants of the matrix representation: Singular Value Decomposition (SVD) [44] and Non-negative Matrix Factorization (NMF) [23, 32].

We implemented the proposed method and the benchmark methods in Python with TensorFlow. In the pre-processing stage, we removed stop words with the NLTK library and stemmed words with the Snowball library. The proposed method, the doc2vec method, and the neural-embedding-based word-clustering approach were designed to share the same window size (8) and the same number of hidden nodes for training the word vectors (300) to minimize the influence of hyperparameters on the experiment. In addition, to observe the impact of the number of clusters on the overall experiment, several values were tested, from 20 to 200 in increments of 10. The CNN- and RNN-based approaches were implemented following the aforementioned papers [29, 63, 76].

Lastly, we measure the classification accuracy as the proportion of correct predictions, which are located on the diagonal of the confusion matrix. A confusion matrix is a table layout that allows visualization of the performance of an algorithm. Each row of the matrix represents the instances in a predicted class, while each column represents the instances in an actual class.
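The accuracy computation can be sketched as follows, using the row and column convention described above (rows for predicted classes, columns for actual classes). The labels are hypothetical, and the function names are chosen for this example.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows index the predicted class, columns index the actual class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1
    return matrix

def accuracy(matrix):
    """Share of instances on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical labels for five documents and two classes.
classes = ["battery", "display"]
actual = ["battery", "battery", "display", "display", "display"]
predicted = ["battery", "display", "display", "display", "battery"]
cm = confusion_matrix(actual, predicted, classes)
print(cm)
print(accuracy(cm))
```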

3.3.3 Experiments results

De-noising documents

Table 3.4 shows the words with the lowest novelty score as determined by the proposed method and the previous method. The novelty score of the GMM method is the negative logarithm of the PDF value, and that of the KMC method is the distance from the closest centroid. In the proposed method, the words with the lowest novelty score are highly discriminative words that represent each class, such as ‘LCD’, ‘voice’, ‘security’, and ‘WiFi’. In the previous method, the words with the lowest novelty score are general words that do not discriminate between classes, such as ‘phone’, ‘again’, and ‘after’. In particular, the previous KMC method extracts extremely general words, such as ‘my’, ‘of’, and ‘it’. This implies that the novelty score of the proposed method is a more suitable measure than that of the previous method for determining whether a word effectively represents a class.

Table 3.4: Words with lowest novelty score

Novelty detection method            Examples of words (Novelty score)
GMM with class vector (Proposed)    LCD (-143.32), breakage (-142.87), Marshmellow (-141.85), break (-141.69),
                                    health (-137.66), voice (-133.93), GPS (-133.23), battery (-132.46),
                                    volume (-130.04), QWERTY (-127.24)
KMC with class vector (Proposed)    touch (0.1728), restore (0.1947), security (0.2676), Lollipop (0.3927),
                                    WiFi (0.3022), ringtone (0.3169), LCD (0.3173), memo (0.3414),
                                    message (0.3503), backup (0.3850)
Previous GMM                        is (-176.90), do (-175.71), again (-175.75), various (-174.43),
                                    season (-162.19), after (-160.06), opposite (-154.20), Samsung (-152.29),
                                    phone (-149.64), important (-147.67)
Previous KMC                        and (0.1584), my (0.1606), of (0.1697), it (0.1754), your (0.1808),
                                    was (0.1811), have (0.1989), this (0.1990), is (0.2000), no (0.2006)

Table 3.5: Words with highest novelty score

Novelty detection method            Examples of words (Novelty score)
GMM with class vector (Proposed)    aguardo (146.61), de (146.61), suddenly (146.59), regards (146.59),
                                    may (146.57), holiday (146.45), sus (146.38), poseedor (146.32),
                                    why (146.30), method (146.28)
KMC with class vector (Proposed)    both (0.7652), volkswagen (0.7652), normal (0.7652), blah (0.7651),
                                    if (0.7651), time (0.7651), age (0.7651), last (0.7651),
                                    uu (0.7647), SIRS (0.7647)
Previous GMM                        electronic (29.39), statement (29.33), eBay (29.16), YouTube (29.07),
                                    native (28.95), connection (28.86), showing (28.65), progress (28.52),
                                    VOLTE (28.33), photography (28.33)
Previous KMC                        premium (0.4109), repair (0.4101), than (0.4101), Media (0.4092),
                                    provide (0.4090), open (0.4074), Windows (0.4071), read (0.4070),
                                    music (0.4064), GUI (0.4043)

Table 3.5 shows the words with the highest novelty score determined by each method. Typos, including ‘de’, ‘sus’, and ‘uu’, and meaningless words, including ‘blah’, ‘last’, and ‘may’, are effectively detected by the proposed method but not by the previous method. From a qualitative viewpoint, these results indicate that the proposed method performs better in the detection of novel words. Additionally, the representational effectiveness and classification performance are intuitively expected to improve when those words are detected and removed by the proposed method.

Table 3.6 and Figure 3.15 show the classification performance on the customer-voice data obtained by applying the proposed method and the previous method. In a manner similar to the results for representational effectiveness, the classification performance of the proposed method improves overall as the removal ratio of novel words increases and exceeds that of the previous method for all representation methods. The better performance of the proposed method is attributed to the fact that it detects novel words more effectively than the previous method by utilizing a class vector.

Table 3.6: Accuracy of classification performance (*: Proposed method)

Novelty detection method     No de-noising    5% de-noising    10% de-noising    20% de-noising

TF-IDF
GMM with class vector*       0.6403           0.6471           0.6510            0.6523
KMC with class vector*       0.6403           0.6506           0.6522            0.6545
Previous GMM                 0.6403           0.6311           0.6358            0.6364
Previous KMC                 0.6403           0.6329           0.6391            0.6346

Neural embedding based clustering [61, 127]
GMM with class vector*       0.6723           0.6902           0.6982            0.7027
KMC with class vector*       0.6723           0.6874           0.6918            0.7053
Previous GMM                 0.6723           0.6668           0.6498            0.6555
Previous KMC                 0.6723           0.6739           0.6700            0.6690

Probabilistic clustering based [72]
GMM with class vector*       0.8638           0.8808           0.8876            0.8907
KMC with class vector*       0.8638           0.8867           0.8856            0.8994
Previous GMM                 0.8638           0.8642           0.8661            0.8657
Previous KMC                 0.8638           0.8695           0.8605            0.8738

Topic vector
GMM with class vector*       0.6401           0.6626           0.6651            0.6698
KMC with class vector*       0.6401           0.6616           0.6719            0.6758
Previous GMM                 0.6401           0.3890           0.6270            0.6419
Previous KMC                 0.6401           0.3802           0.6487            0.6497

LSA
GMM with class vector*       0.6443           0.6532           0.6572            0.6627
KMC with class vector*       0.6443           0.6552           0.6631            0.6728
Previous GMM                 0.6443           0.6469           0.6514            0.6507
Previous KMC                 0.6443           0.6502           0.6467            0.6439

Figure 3.10: TF-IDF

Figure 3.11: Neural embedding based word clustering [61, 127]

Figure 3.12: Probabilistic word clustering based approach [72]

Figure 3.13: Topic vector

Figure 3.14: LSA

Figure 3.15: Accuracy of classification performance

Document representation

Table 3.7 presents the classification performance on the customer-voice data with respect to varying dimensions.

First, the matrix representation with a convolution filter outperforms all other document representation methods in all dimensions. As expected, concatenating the individual representation vectors yields better discriminative power than not only the other word-clustering-based representations but also the methods that retain word order, such as the RNN- and CNN-based approaches. Moreover, the proposed method shows quite stable performance with respect to varying dimensions, while the previous methods show extremely low classification performance at dimensions of 20 or 30. This indicates that the matrix representation approach is an appropriate representation method for the document classification task, which accords with our expectation.

In the comparison of convolution filters, the 2x2 filter shows somewhat higher performance than the 3x1 filter. This means that local features spanning the individual representations have a meaningful effect on the classification of the document. We can infer that the differences between the individual word clustering results lead to differences in the semantic order of the elements, and that these differences work as meaningful features in the classification task.

Regarding the effect of matrix factorization, we find no critical difference after applying it; rather, it decreases the classification performance slightly compared with the naive matrix representation. This indicates that matrix factorization has no benefit other than dimension reduction in the matrix representation of a document. Additionally, the VAE based representations show considerably higher results than the other representation vectors. However, they do not outperform the neural embedding based representation, since the VAE based approach could not capture the contextual information of words while deriving the word representation.

Furthermore, this study provides an intuitive interpretation of the generated vector. This strength of the clustering-based approach is inherited by the approach proposed in the present study. Table 3.8 shows that the proposed method successfully offers a clear interpretation of the generated vector. The words in the clusters listed in Table 3.8 indicate that each cluster contains words that are closely related to each class. This implies that the customer-voice data in each class are represented by words in

Table 3.7: Accuracy of classification performance of customer-voice data

Method | 40 clusters | 80 clusters | 120 clusters | 160 clusters | 200 clusters
Matrix representation (2x2 filter) | 80.64% | 85.90% | 88.22% | 87.45% | 88.73%
Matrix representation (3x1 filter) | 79.78% | 84.78% | 86.74% | 86.37% | 87.90%
Matrix representation (NMF) | 78.42% | 83.79% | 85.28% | 86.42% | 86.59%
Matrix representation (SVD) | 78.43% | 83.46% | 85.86% | 86.22% | 86.23%
Matrix representation (Naive) | 79.51% | 84.24% | 86.99% | 87.24% | 86.45%
VAE based probabilistic clustering | 71.51% | 78.51% | 83.23% | 82.10% | 81.45%
VAE based word clustering | 72.13% | 77.78% | 76.75% | 78.73% | 78.94%
Neural embedding based probabilistic clustering | 77.51% | 80.24% | 84.05% | 83.61% | 84.45%
Neural embedding based word clustering [61, 127] | 71.13% | 76.78% | 78.05% | 78.84% | 78.94%
Co-occurrence based word clustering [86] | 65.40% | 66.42% | 65.81% | 64.46% | 65.75%

Method (no cluster parameter) | Accuracy
CNN based [29, 63] | 83.19%
RNN based [76] | 81.64%
VAE based document representation [84] | 70.67%
Doc2Vec [70] | 72.22%
Bag-of-words | 64.67%

frequent clusters, and a name or topic can easily be assigned to each cluster by viewing those keywords.

Figure 3.16: Accuracy of classification performance of customer-voice data

Table 3.8: Example of representation interpretation

Customer-voice example | Class | Most frequent cluster | Words in most frequent cluster (cosine dissimilarity with centroid)
There are other problems, most of which involves display brightness... | Display | 3rd / 70 | Screen (0.1649), Brightness (0.1873), Display (0.1947)
When I took pictures, they were saved in the sd card until now... | Multimedia | 24th / 70 | Camera (0.2073), Photo (0.2491), Shutter (0.2556)
After mounting camplus, I press the shutter button. The backup battery is activated... | Battery & Power | 12th / 70 | Battery (0.2134), Charge (0.2619), Charging (0.2843)
I got various accessories along with the G5. However, the VR device leaves much to be desired... | Accessory | 52nd / 70 | Accessory (0.2267), Toneplus (0.2341), VR (0.2682)

Chapter 4

User segmentation

4.1 Background

The term “user segmentation” refers to classifying users into groups depending on

their specific needs, characteristics, or behaviors to identify those who might re-

quire separate products or services [65]. Users can be segmented in different ways.

One way is to characterize the target customers by homogeneous preferences, that

is, grouping together customers that have roughly the same preferences [66]. User

segmentation has been identified as a key element of product development and mar-

keting. For instance, with user segmentation, product/service developers can develop

differentiated and personalized products/services for each segment, and marketing

personnel can create segmented advertisements and marketing communications for

each segment.

Applying user segmentation strategies for information gathering is highly beneficial, particularly in the smartphone industry. First, smartphones have the capability to collect and store various types of information, including the user’s location, communications, social networks, and lifestyle, which are effective sources for user segmentation [26]. Second, hundreds of applications are often installed on a user’s smartphone, and a log of their application usage is a powerful resource for user segmentation because it contains meaningful information regarding the user’s preferences, behaviors, and interests.

In the smartphone industry, user segmentation is typically performed based on the user’s preferences, interests, or willingness to use. Furthermore, the applications used by each user are the most meaningful and interesting source for identifying a user’s preferences and interests [51]. Therefore, it is essential for smartphone user segmentation to consider which apps a user uses and in what patterns those apps are used. In this study, we propose novel ways of segmenting smartphone users based on their app usage logs collected from LG smartphones.

This study first proposes a variant of the seq2seq architecture to represent each app usage sequence. The variant processes the whole sequence rather than a limited window, and it represents the input sequence itself rather than a corresponding target sequence. We then calculate the vector representation of each user based on the representations of that user’s app usage sequences and derive the segmentation results by clustering the user representations.

Despite the meaningful results of this first approach, it cannot provide an intuitive interpretation of the user segmentation because the users are represented in a continuous vector space generated by a seq2seq architecture. It therefore falls short of real business applications, which need to determine which app is most critical for user segmentation.

Here, for user segmentation, we additionally propose two types of approaches that are able to provide an intuitive interpretation based on the observations in the study: (1) app clustering-based user segmentation and (2) network representation-based segmentation. First, each app is embedded in a vector space by calculating its vector representation using a neural embedding architecture, and characteristically similar apps, which are located close to each other in the vector space, are grouped into a cluster. Each user is then represented by the frequencies of these clusters.

4.2 Methodology

4.2.1 Variant of the seq2seq based approach

Figure 4.1: Summary of our proposed method

By thoroughly reviewing the app usage sequences, we could determine that the

usage of each app is closely related to the usage of other neighboring apps. For

instance, similar categories of gallery apps are usually used next to the camera app,

and similar categories of voice call apps are usually used next to call log apps. In

addition, many people have their own habits of running through SNS apps, such

as Facebook, Twitter, and Instagram. In other words, sequential and contextual

information are meaningful in the app usage sequence of each user.

Contextual information is as meaningful in app usage sequences as it is in words and documents. Accordingly, the neural embedding architecture would be the first option for representing app usage sequences because it is designed to represent words and documents based on their contextual information. However, the neural embedding architecture considers only the words within the window size, and not the whole sequence. That is, the neural embedding model is limited in its ability to represent the entire app usage sequence.

Thus, the existing seq2seq architecture would be the second option. It was originally proposed to generate sequences of words by predicting the next word while considering the entire sequence, and it performs very well in the machine translation field. As mentioned earlier, the sequential and contextual information of app usage sequences is also meaningful. Therefore, app usage sequences can be processed with these kinds of architectures. Moreover, the seq2seq architecture contains a context node C. This node is suitable for the representation of sequences because it is designed to summarize all the encoded information. Thus, we use the context vector as the representation of the encoded app sequence information, which further supports the applicability of these kinds of architectures to the representation of app usage sequences. However, previously developed architectures use different sequences as input and output because they were originally designed to generate a corresponding sequence (for example, a translation), and not the input sequence itself.

Thus, instead of the conventional seq2seq architecture, we propose a variant that receives an app usage sequence as the input of the encoder and generates the same app usage sequence in the decoder (Figure 4.2). By training the architecture this way, we take advantage of both the neural embedding architecture, which calculates vector representations by being trained to predict context words, and the seq2seq architecture, which considers the entire sequence in the training step. By combining the advantages of both architectures, we expect the proposed method to yield a more appropriate context vector C for representing each usage sequence. A summary of our proposed method is illustrated in Figure 4.1.

Each app usage sequence A = (a1, a2, . . . , aT ) specifically defines the series of

apps used from the time the smartphone screen is turned on to the time it is turned

off. Each user normally has several app sequences per day.

Figure 4.2: Variant of the seq2seq architecture (our proposed architecture)

The hidden state $h_t$ of the encoder at each time step $t$ is updated by the following equation:

\[ h_t = f(h_{t-1}, x_t) \]

After reading the end of the sequence, the hidden state of the encoder becomes the context vector $C$ of the whole input sequence. The decoder of the proposed model is trained to generate the output sequence by predicting the next app used given the hidden state $s_t$. The hidden state of the decoder at time $t$, $s_t$, is computed as follows:

\[ s_t = f(s_{t-1}, y_{t-1}, C) \]

The conditional distribution of the next app used is:

\[ p(y_t \mid y_{t-1}, y_{t-2}, \ldots, y_1, C) = g(s_t, y_{t-1}, C) \]

where $f$ is a sigmoid function and $g$ is a softmax function. The two components of the proposed architecture are jointly trained to maximize the conditional log-likelihood:

\[ \max_{\theta} \frac{1}{N} \sum_{n=1}^{N} \log p_{\theta}(y_n \mid x_n) \]

where $\theta$ is the set of model parameters and each $(x_n, y_n)$ is a pair of input and output sequences from the training set; in the proposed variant, $y_n$ is the same app usage sequence as $x_n$.
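The encoder recurrence above can be illustrated with a minimal sketch. It assumes a single recurrent layer with elementwise sigmoid activation, one-hot app inputs, and hand-picked toy weights `W` and `U`; none of these specifics come from the thesis, which uses three hidden layers and learned parameters. The sketch only shows how the final hidden state serves as the context vector and why that vector depends on the order of the apps.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def encode(inputs: List[List[float]], W: List[List[float]], U: List[List[float]]) -> List[float]:
    """Run h_t = sigmoid(W h_{t-1} + U x_t) over the sequence and return the last state."""
    hidden_size = len(W)
    h = [0.0] * hidden_size  # h_0 is assumed to be the zero vector
    for x in inputs:
        h = [
            sigmoid(
                sum(W[i][j] * h[j] for j in range(hidden_size))
                + sum(U[i][j] * x[j] for j in range(len(x)))
            )
            for i in range(hidden_size)
        ]
    return h  # plays the role of the context vector C


# Hypothetical toy setup: 3 apps (one-hot encoded), hidden size 2.
W = [[0.5, -0.3], [0.1, 0.8]]
U = [[1.0, -1.0, 0.5], [0.2, 0.4, -0.6]]
camera, gallery = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]

context_forward = encode([camera, gallery], W, U)
context_reverse = encode([gallery, camera], W, U)
```

Because each state is computed from the previous one, `context_forward` and `context_reverse` differ even though both sequences contain the same two apps.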

As regards the details of our neural network architecture, the network contains three levels of hidden layers in each of the encoder and decoder sequences. The length of the encoder/decoder sequence is set to 15, considering the maximum length of the app usage sequences (Figure 4.2). Sequences shorter than the maximum length are padded with a constant value, as is done in most practical implementations, to handle the variable length of app usage sequences [132, 60].
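The padding step can be sketched as follows. The length of 15 comes from the text; the padding constant `PAD_ID = 0`, the integer app IDs, and the truncation of over-long inputs are assumptions made only for illustration.

```python
from typing import List

MAX_LEN = 15  # encoder/decoder sequence length stated in the text
PAD_ID = 0    # hypothetical constant reserved for padding


def pad_sequence(app_ids: List[int], max_len: int = MAX_LEN, pad_id: int = PAD_ID) -> List[int]:
    """Right-pad one app usage sequence to exactly max_len entries.

    Inputs longer than max_len are truncated; this is an assumption, since
    the text treats 15 as the maximum sequence length.
    """
    clipped = app_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))


# Example: three apps used in one screen-on session, with made-up integer IDs.
padded = pad_sequence([12, 7, 31])
```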

We utilize context vector C as the vector representation of each app usage se-

quence after training the architecture using the set of the app usage sequences. This

calculated vector represents the usage sequence, and not the user. Thus, we need an

additional step for the final user segmentation result.

First, we cluster the app usage sequences using the GMM method, which showed the highest performance among the clustering methods we tested, including K-means clustering and fuzzy C-means clustering. We assign each sequence to the cluster with the highest conditional probability $p_j(x \mid \theta_j)$ and fix the number of clusters to ten, which is the same as the number of segments identified by the domain experts. Each user is then assigned to the segment in which most of his/her usage sequences are found (Figure 4.3).
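The last step, assigning each user to the segment that holds most of their sequences, amounts to a majority vote. A minimal sketch, assuming the per-sequence cluster labels have already been produced by the GMM step and that every user has at least one sequence (the user names and labels are hypothetical):

```python
from collections import Counter
from typing import Dict, List


def assign_users(sequence_clusters: Dict[str, List[int]]) -> Dict[str, int]:
    """Map each user to the cluster that contains most of their usage sequences."""
    segments = {}
    for user, clusters in sequence_clusters.items():
        # most_common(1) returns the label with the highest count;
        # ties go to the label encountered first.
        segments[user] = Counter(clusters).most_common(1)[0][0]
    return segments


# Hypothetical cluster labels of each user's sequences.
example = {"user_a": [3, 3, 7, 3, 1], "user_b": [5, 2, 5]}
segments = assign_users(example)
```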

Figure 4.3: Determination of user segmentation

4.2.2 App clustering and relative similarity-based segmentation

In this section, we 1) describe an interpretable approach of user representation based

on app clustering, 2) present two novel techniques to normalize and adjust the vector

value of user representation, and 3) propose a novel user segmentation method to

address the inherent limitations arising from absolute similarity by determining the

relative similarities between users.

App clustering-based user representation

In this approach, each app in the usage sequence of each user is represented using

neural embedding architecture. Based on the vector representations of the respective

apps derived from the architecture, characteristically similar apps are gathered into

neighborhoods and neighboring apps are then gathered into common clusters. Each

user is assigned a vector representing the counts of their total app usage within

each cluster, and the users are segmented based on these representations, as shown

schematically in Figure 4.4. This approach can also serve as an effective dimension-

ality reduction method for user representation to address the sparsity problem of

the N-gram model.

Figure 4.4: Summary of app clustering-based user representation.

Additional techniques for effective user representation

In this sub-section, we propose two novel techniques to make our proposed user

representation method more effective through assessments of relative importance

between app clusters and the significance of each app within a cluster.

Our initial goal is to make our proposed method robust to the effects of app clus-

tering. Under most clustering algorithms, the apps are normally assigned to different

clusters, with the obvious differences between apps located close to and distant from

the centroid of each cluster representing the characteristics of the cluster. We there-

fore seek to differentiate the calculation of app usage frequency by considering the

membership strength of each app in its cluster. To do so, we utilize a probabilistic

clustering method based on a Gaussian mixture model (GMM) to assess the mem-

bership strength of each app in calculating the app usage frequency. This increases

the effects of apps located near the cluster centroids while reducing the effects of

apps located away from the centroids.

We then normalize the user representation based on the relative importance

of various app clusters. In other words, we seek to discount app clusters that are

frequently used across most users as ineffective/insignificant clusters for representing

and segmenting users. To this end, we apply our normalizing method to emphasize

significant app clusters while reducing the impact of commonly used clusters.

Formally, this is done as follows. Let $c_j$ denote the centroid of the $j$th app cluster and let $m_{ij}$ denote the membership strength of app $a_i$ in the $j$th cluster. We first derive the vector value of each app $a_i$ from the neural embedding architecture, then cluster all $a_i$, and finally calculate the membership strength $m_{ij}$ for all $i, j$ using the GMM clustering method.

From this, we calculate the $k$th user vector $U_k = [\ldots, u_{kj}, \ldots]$ using the equation

\[ u_{kj} = \sum_{i} (f_{ijk} \times m_{ij}) \]

where $f_{ijk}$ is the frequency of $a_i$ within the $j$th cluster for the $k$th user and $c$ is the user-specified number of clusters, so that $j = 1, \ldots, c$. In this step, we consider the membership strength derived from the probabilistic clustering method to address the user similarity limitations described above.

Finally, we normalize each user representation vector value using the equation

\[ \frac{u_{kj}}{\sum_{j'} u_{kj'}} \times \log \frac{N}{uf_j} \]

where $uf_j$ denotes the number of users using apps included within the $j$th cluster and $N$ is taken to be the total number of users. Using this equation, it is possible to emphasize significant app clusters while reducing the impact of commonly used clusters.
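The two steps above (membership-weighted cluster frequencies, followed by a normalization that discounts clusters used by most users) can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation used in the study: each app is hard-assigned to the cluster with its largest membership strength, N is taken to be the number of users, and the toy counts and membership values are invented.

```python
import math
from typing import List


def user_vectors(counts: List[List[float]], membership: List[List[float]]) -> List[List[float]]:
    """counts[k][i]: times user k used app i; membership[i][j]: strength of app i in cluster j."""
    n_clusters = len(membership[0])
    # Assumed hard assignment: each app belongs to its strongest cluster.
    home = [max(range(n_clusters), key=lambda j: m[j]) for m in membership]

    # u_kj = sum over apps i in cluster j of (frequency * membership strength)
    raw = []
    for user_counts in counts:
        u = [0.0] * n_clusters
        for i, f in enumerate(user_counts):
            u[home[i]] += f * membership[i][home[i]]
        raw.append(u)

    n_users = len(counts)
    # uf_j = number of users with any usage in cluster j
    uf = [sum(1 for u in raw if u[j] > 0) for j in range(n_clusters)]

    normalized = []
    for u in raw:
        total = sum(u)
        normalized.append([
            (u[j] / total) * math.log(n_users / uf[j]) if total > 0 and uf[j] > 0 else 0.0
            for j in range(n_clusters)
        ])
    return normalized


# Hypothetical data: 3 users, 3 apps, 2 clusters.
membership = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
counts = [[10, 0, 5], [4, 6, 0], [2, 2, 0]]
vectors = user_vectors(counts, membership)
```

In this toy data every user uses the first cluster, so its weight becomes log(3/3) = 0 for all users, while the second cluster, used by only one user, keeps a positive weight.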

Relative similarity-based user segmentation approach

By considering the relative importance of app clusters and the significance of apps

within these clusters, our proposed user representation approach provides an effective

method of user representation. However, it is still necessary to consider other aspects

of the user segmentation problem, which normally requires that users be evenly

distributed among various clusters as opposed to mostly belonging to a specific

cluster, as illustrated in Figure 4.5. An examination of the segmentation results

produced on our dataset based on absolute similarity reveals that, in contrast to the

predicted results, several clusters tend to contain many users.

Figure 4.5: Comparison between actual and predicted segmentation results

To address this issue, we propose a novel user segmentation method based on relative similarity between users instead of absolute similarity. Under this approach, pairs of relatively similar users are found based on our app clustering-based user representation, and a network is constructed using these pairs. Users are then segmented using a modularity-based community detection algorithm. This approach is summarized in Figure 4.6.

Figure 4.6: Summary of our proposed method for considering relative similarity.

The segmentation approach is implemented as follows. First, every user is looked up in the embedding space learned from the app clustering-based representation. For each user, the top k users with the greatest pairwise cosine similarity to that user are selected. Once all users have been looked up, the overlaps among the returned users are counted.

Using these counts, we construct a bipartite graph of users in which each edge weight is determined by the number of edges shared by the corresponding pair. For example, if user 1 and user 2 each have the other among their highest cosine similarities, the edge weight between user 1 and user 2 becomes two.

Using our experimental set of 540 users (see the Experiments section below) and

a parameter k = 10, a nearly fully connected projected network is produced. Be-

cause this projected network requires edge pruning, edges with weights of less than

75% of the maximum source-target weight are removed. We assessed the influence

of the edge pruning parameter on the overall experimental results produced by the

proposed method by testing several values from 60 to 80% in increments of 5%. Al-

though none of the values within the assessed range produces significantly dominant

results, we selected the parameter value 75%, which exhibited the best results.
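The network construction and pruning can be sketched under one possible reading of the description: each user links to their top k most similar users, the weight of an undirected edge counts how many of the two directed links exist (so a mutual pair gets weight two), and edges below a fraction of the maximum weight are dropped. The function names, the toy vectors, and this reading of the edge weights are assumptions, not the implementation used in the study.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_pruned_network(
    vectors: Dict[str, List[float]], k: int, keep_ratio: float = 0.75
) -> Dict[Tuple[str, str], int]:
    """Link each user to their top-k neighbours, then prune weak edges."""
    weights: Dict[Tuple[str, str], int] = defaultdict(int)
    for u, vec in vectors.items():
        others = [v for v in vectors if v != u]
        # top-k users with the greatest cosine similarity to u
        top = sorted(others, key=lambda v: cosine(vec, vectors[v]), reverse=True)[:k]
        for v in top:
            weights[tuple(sorted((u, v)))] += 1  # a mutual pair reaches weight 2
    if not weights:
        return {}
    threshold = keep_ratio * max(weights.values())
    return {edge: w for edge, w in weights.items() if w >= threshold}


# Hypothetical user vectors over two app clusters.
users = {
    "a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0],
    "d": [0.1, 0.9], "e": [0.6, 0.4],
}
network = build_pruned_network(users, k=1)
```

Here user `e` selects `b`, but `b` selects `a`, so the one-directional edge between `b` and `e` has weight one and is pruned at the 75% threshold.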

Finally, the users represented in the network are segmented using the Louvain method [11], which demonstrated the best performance among the community detection methods we compared, including the smart local moving (SLM) algorithm [138] and the Infomap algorithm [107]. The Louvain algorithm is a hierarchical agglomerative method that takes a greedy approach to local optimization using an iterated two-step procedure. In the first step, it iterates over the nodes in the graph and moves each node into a neighboring community if the move leads to an increase in modularity. In the second step, it creates super-nodes out of the communities found in the first step. The two steps are then repeated on the resulting graph of super-nodes until no further gain in modularity is obtained.
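The quantity that the Louvain method greedily increases can be made concrete with a short sketch. The function below only evaluates the standard Newman modularity Q of a weighted undirected graph for a given partition; it does not perform the Louvain search itself, and the toy graph (two triangles joined by a single edge) is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def modularity(edges: Iterable[Tuple[str, str, float]], community: Dict[str, int]) -> float:
    """Q = sum over communities of (internal weight / m) - (community degree / 2m)^2."""
    edges = list(edges)
    total = sum(w for _, _, w in edges)  # m: total edge weight (assumed > 0)
    internal: Dict[int, float] = {}      # weight of edges inside each community
    degree: Dict[int, float] = {}        # summed weighted degree per community
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0.0) + w
        degree[cv] = degree.get(cv, 0.0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + w
    return sum(
        internal.get(c, 0.0) / total - (d / (2 * total)) ** 2
        for c, d in degree.items()
    )


# Two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q_split = modularity(edges, split)
q_single = modularity(edges, {node: 0 for node in split})
```

Splitting the graph into its two triangles gives Q = 5/14, whereas placing all nodes in one community gives Q = 0, which is why a modularity-maximizing method prefers the split.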

Finally, the algorithm determines the modularity for 10 clusters, which is the

same number of clusters identified by the domain expert. In the practical application

of our proposed method, the number of clusters varies with the product developer’s

needs or intentions; for evaluation purposes, however, we set the number of clusters

to correspond to the segmentation identified by the domain expert.

4.3 Experiments


To perform the comparison experiments, we obtained app usage sequence data from LG Electronics. Each app usage sequence consists of the sequence of apps that a user accessed between the time the smartphone screen was turned on and the time it was turned off (Figure 4.7).

Figure 4.7: Example of the app usage sequence.

We also obtained the results of the user segmentation performed by domain experts at LG Electronics (Figure 4.8) and used them as the answer set in our experiments. According to LG Electronics, the segmentation results were derived through widespread consultation among 32 domain experts. The demographic information of some of the domain experts is presented in the acknowledgements.

The user segmentation results of the domain experts consisted of 10 segments

presented in Table 4.1. We collected the user segmentation results of 180 people

and 180,000 app usage sequences (1,000 usage sequences were randomly selected per

user) for the experiments. All of the datasets were processed after anonymization

was performed.


We evaluated the similarities of our proposed method with an answer set established

by the domain experts to verify the utilization and performance of our proposed user

segmentation method. As mentioned previously, the user segmentation results of the

domain experts consisted of 10 segments presented in Table 4.1.

Figure 4.8: Example of user segmentation by domain experts

We set the number of clusters for the app clustering to 50. Because the proposed method is influenced by this number, we tested several values from 10 to 100 in increments of 10 to observe its influence on the overall experiments. We selected 50, which showed the best results; no significant improvement was observed when the number of clusters exceeded 50.

We also compared the similarities with the answer set for the following benchmark methods: (1) the method proposed by Hamka et al., which was the first to utilize smartphone logs for user segmentation; (2) the neural embedding-based method; (3) the seq2seq-based method; and (4) the N-gram model, which represents sequential data as the frequency of the whole n-gram combination [120, 14]. We also experimented with matrix reduction techniques as another benchmark (i.e., singular value decomposition (SVD) [44] and non-negative matrix factorization

Table 4.1: User segmentation results obtained by domain experts.

Segment | Description | Number of users
Conversationalists | Use smartphones primarily for making calls, sending messages, and chatting with very low “other app” usage. | 14
Utilitarians | Usage is primarily utility driven, and they spend the greatest amount of time on apps, such as organizers and productivity apps. | 16
Social stars | Identified by greatest engagement on social networking and chat platforms. | 15
Photographers | Identified by greatest engagement on camera-related apps. They usually use several dozen camera apps with different features and post their photos to several types of SNS or their communities. | 22
Music lovers | People who discover and listen to music wherever they are. They usually use push and in-app messages that highlight new songs, new playlists, and new artists. | 12
News and magazine readers | Identified by the greatest amount of time spent on browsing and reading articles. Their data consumption is also very high. | 13
Video streamers | Usage is dedicated to watching missed shows and movies when they commute and rest. | 13
Gaming buffs | Usage primarily involves playing games on their smartphones. | 16
Power users | Identified as spending the most time on their smartphones, regardless of the type of apps. Their engagement with shopping apps is greatest. | 27
Beginners | They use a very limited number of apps. Most users in this segment are senior users. | 32

(NMF) [23, 32]) to address the sparsity problem of the N-gram model.

In this study, we set the number of clusters to 10, which equals the number of segments identified by the domain experts. For the similarity measurement, we utilized (1) the Adjusted Rand index (ARI), which corrects for chance the Rand index, i.e., the number of pairs that are either in the same group or in different groups in both partitions, divided by the total number of pairs [136, 50]; (2) Normalized Mutual Information (NMI), which is a normalized variation of mutual information [83, 126]; and (3) Homogeneity and Completeness. These metrics do not depend on the absolute values of the cluster labels; rather, they assess whether the clustering defines separations of the data similar to those of the answer set of classes.
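The pair-counting definition above, and the chance correction that turns the Rand index into the ARI, can be sketched as follows. This is a plain illustration of the standard formulas with hypothetical labels; it assumes at least two items and is not the evaluation code used in the study.

```python
from collections import Counter
from math import comb
from typing import Sequence, Tuple


def rand_and_adjusted_rand(truth: Sequence[int], pred: Sequence[int]) -> Tuple[float, float]:
    """Return (Rand index, Adjusted Rand index) for two labelings of the same items."""
    pairs = comb(len(truth), 2)
    same_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    same_truth = sum(comb(c, 2) for c in Counter(truth).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred).values())
    # Pairs separated in both partitions, by inclusion-exclusion.
    diff_both = pairs - same_truth - same_pred + same_both
    rand = (same_both + diff_both) / pairs
    expected = same_truth * same_pred / pairs  # agreement expected by chance
    maximum = (same_truth + same_pred) / 2
    ari = 1.0 if maximum == expected else (same_both - expected) / (maximum - expected)
    return rand, ari


# Same grouping with permuted label names, then a grouping that cuts across the truth.
ri_same, ari_same = rand_and_adjusted_rand([0, 0, 1, 1], [1, 1, 0, 0])
ri_cross, ari_cross = rand_and_adjusted_rand([0, 0, 1, 1], [0, 1, 0, 1])
```

The first comparison shows that both scores ignore the label values themselves. In the second, the Rand index is still 1/3 because two pairs are separated in both partitions, while the ARI drops to -0.5, below the value of 0 expected for a random assignment.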


Observation of app clustering and user network construction result

Figure 4.9: Example of app clustering.

Figure 4.9 shows an example of the app clustering results derived from the neural embedding architecture, with the apps clustered by their characteristics for k = 50. Cluster 6 contained camera-related apps, while cluster 8 contained social networking apps. Thus, we conclude that each app cluster effectively contained characteristically similar apps, and that the user representation based on this clustering could be an effective representation.

Figure 4.10 shows the network construction of users. The user network consisted of 180 nodes, which is the number of users, and 1800 edges, which represent the top 10 similar users per user. For visualization, we used the Yifan Hu layout, which is a force-directed graph-drawing technique [48]. The nodes that were likely to be in the same community were clearly located together, and users exhibiting similarity were located closer to one another.

Figure 4.10: User network construction.

Comparison of segmentation results

To validate our proposed method, we compared the similarity of its segmentation results to the answer set established by the domain experts with that of the results produced by the other assessed approaches. Table 4.2 lists the results of the similarity analysis.

The proposed app clustering method generally outperformed the other methods

because it successfully captured each app’s semantic characteristic by clustering

apps with contextual similarity. Furthermore, the proposed relative similarity-based

approach outperformed the baseline methods because it tended to evenly cluster

users rather than group closely located users into a specific cluster. It is also a

more straightforward method for capturing usage patterns than the conventional

seq2seq-based method, which, as a black box mechanism, produced no information

through training. Finally, there were no significant differences between the relative

and absolute similarity measure results.

The seq2seq-based architecture, which is a state-of-the-art architecture, also returned better similarity results than the other methods, in particular outperforming the other sequence models with latent representations, such as RNN-AE, LSTM-AE, and RNN-VAE. This suggests that the seq2seq-based architecture is a more straightforward method that does not require additional calculation for latent representations such as AE or VAE, which allowed it to outperform those methods.

The study performed by Hamka et al. was limited in terms of sources of data use

and number of apps used to determine the users’ preferences. Consequently, they

produced less meaningful results, with only the Power and Beginner user groups

effectively segmented.

The N-gram model assessed in this study produced better results than those

obtained by Hamka et al. because it considered app usage data when segmenting

users. However, this model also had a sparsity problem owing to its large matrix size,

which led to a worse similarity result than produced by the proposed method. In

addition, no significant changes were observed when matrix decomposition methods

such as NMF and SVD were applied.

Additionally, with our proposed user representation method, each user representation is easily understandable because each dimension of the representation intuitively shows the frequency of an app cluster. This allows an analyst to easily grasp the underlying logic of the derived segmentation results and the characteristics associated with each segment by viewing the representation of the users who belong to it. In other words, as each app cluster represents a specific characteristic, it is possible to perceive each user as a collection of interests and intuitively understand the components of the generated user vectors.

Table 4.2: Comparison of the similarities between the segmentations obtained by each method and the answer set (*: proposed method, (c): utilizing cosine distance, (m): utilizing Mahalanobis distance).

Method | ARI | NMI | Homogeneity | Completeness
Relative similarity-based (c)* | 0.6004 | 0.6996 | 0.7294 | 0.7504
Relative similarity-based (m) | 0.5946 | 0.6841 | 0.7203 | 0.7349
App clustering-based (c)* | 0.5776 | 0.6496 | 0.6783 | 0.7010
App clustering-based (m) | 0.5713 | 0.6311 | 0.6731 | 0.6973
Seq2seq-based approach (Lee et al.) | 0.5671 | 0.6314 | 0.6697 | 0.6927
RNN-AE-based approach | 0.5472 | 0.6148 | 0.6404 | 0.6761
LSTM-AE-based approach | 0.5317 | 0.6031 | 0.6308 | 0.6673
RNN-VAE-based approach | 0.5391 | 0.6079 | 0.7271 | 0.6656
Vanilla neural embedding-based (win:2) | 0.4946 | 0.5543 | 0.5973 | 0.6075
Vanilla neural embedding-based (win:4) | 0.5273 | 0.5878 | 0.6343 | 0.6276
Hamka et al. | 0.2298 | 0.3004 | 0.4015 | 0.417
Bi-gram | 0.3873 | 0.4404 | 0.5175 | 0.5137
Tri-gram | 0.3901 | 0.4373 | 0.5137 | 0.5705
Bi-gram (SVD) | 0.3781 | 0.4215 | 0.4735 | 0.5264
Bi-gram (NMF) | 0.3974 | 0.4318 | 0.5157 | 0.5076

An examination of Table 4.3 reveals that the proposed method successfully offers

a clear interpretation of the generated user representation. The apps in the clusters

listed in the table clearly indicate the characteristics of each cluster, implying that

the users in each segment are represented by the apps in frequent clusters and

allowing a name or characteristic to be easily assigned to each segment.

Table 4.3: Example of representation interpretation.

Segment ID | Most frequent app cluster | Apps in the most frequent cluster | Corresponding segment of answer set
#2 | 3rd/50 | Instagram, Facebook, Twitter | Social stars
#4 | 24th/50 | Spotify, SoundCloud, LG Music | Music lovers
#7 | 12th/50 | LG camera, Candy camera, Camera MX | Photographers

Chapter 5

Design elements selection

5.1 Background

In this chapter, we address two sub-topics of design elements selection: 1) prioritization of product attributes and 2) help contents selection and re-organization.

Customers generally make purchase decisions based on their evaluation and knowledge of the attributes of a product [54, 113]. Thus, product developers and marketers are frequently interested in identifying the product attributes that customers consider most important during their evaluation and purchase of products [34]. For instance, they select the attributes identified as important for product promotion. Another example is a spec sheet (Figure 5.8), which is a list describing the specifications of a product on a commercial site. By identifying the significant product attributes, they can effectively select the specifications to be contained in the spec sheet.

Recently, with the growing prominence and availability of user-generated reviews, numerous product attribute extraction studies have been performed based on these textual reviews [104]. However, most previous studies focused only on extracting product aspects, treating them as product attributes, and not on the relative importance of the extracted aspects, although this is critical information for promotion or for the development of spec sheets, as mentioned previously. For example, the sentence ‘I love the touchscreen of this, but the battery life is too short.’ contains two aspects [102], namely touchscreen and battery life. However, the previous approaches cannot capture the relative importance of touchscreen and battery life.

Thus, the present study firstly focuses on the development of an attribute set

for a product by considering the relative importance of the extracted attributes. We

select a smartphone as a target product because it is the most frequently purchased

electronic device. Moreover, we utilize thousands of customer reviews collected from

commercial and review sites of LG Electronics.

Second, there are several terms for the help systems used in web sites or digital devices, such as ‘Help’, ‘FAQ’, and ‘Docs’. These contents are intended to provide assistance to users (Figure 5.1). Thus, help systems should be conveniently accessible so that users can get answers to their questions, for example, when they begin using a device and can benefit from useful information [114].

In smartphones, in particular, help systems are critical because smartphones

constantly add new features and improvements, and a help system is one of the last

places users consult when they have difficulty using a device. Moreover, smartphone

manufacturers explain their major improvements effectively through the help system

and boost user satisfaction [93, 75].

In this study, the app usage sequence was used because it is a powerful resource for user specification: it contains meaningful information regarding the user’s preferences, behaviors, interests, and even demographic information such as age, gender, and occupation [26]. Based on the user specification derived from app usage sequence information, a help contents organization reflecting the user’s needs and characteristics was generated and predicted. Although a few studies utilize app usage sequences, they are limited to context/pattern modeling [90, 88, 125] or next app prediction [4, 150, 149]. Thus, this is the first study that addresses the complicated user interface problem of content recommendation using app usage sequences.

Figure 5.1: Example of smartphone help system

5.2 Methodology

5.2.1 Prioritization of product attributes

Our proposed method is composed of two phases: 1) attribute extraction using a CNN and transfer learning, and 2) calculation of the relative importance of the extracted attributes by applying variants of Grad CAM with a sentiment classification model. Additionally, we perform minor refinements such as attribute clustering (Figure 5.2).


Extraction of product aspect

For the first phase, we utilized a CNN approach, which is a state-of-the-art supervised approach, to extract the attributes, following the study of Poria et al. [101]. We also applied transfer learning to capture the latest improvements of smartphones, one of the most rapidly changing products, for which data easily become outdated.

Before constructing the CNN model, we first embedded all the customer reviews in a 300-dimensional vector space using the word2vec architecture [85]. Amazon and smartphone review datasets collected from LG Electronics were used for the word embedding task.

After the word embedding task, we constructed and trained the CNN (Figure 5.4) using the existing datasets of SemEval 2014 [91] and Qiu et al. [103]. We fed each word into the CNN together with its context, using a window size of 5, because the features of an aspect term depend on its context words.
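The windowing step described above can be sketched as follows. This is a minimal illustration, assuming the window is centred on the target word and that sentence boundaries are padded; the `PAD` token and the function name `context_windows` are hypothetical and do not come from the original study.

```python
from typing import List

PAD = "<pad>"  # hypothetical padding token; not named in the source


def context_windows(tokens: List[str], window: int = 5) -> List[List[str]]:
    """Return one window of `window` tokens centred on each token."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a centre word")
    half = window // 2
    # Pad both ends so that the first and last words also get a full window.
    padded = [PAD] * half + tokens + [PAD] * half
    return [padded[i:i + window] for i in range(len(tokens))]


windows = context_windows(["the", "battery", "life", "is", "short"])
for centre, win in zip(["the", "battery", "life", "is", "short"], windows):
    print(centre, win)
```

Each window would then be mapped to its word vectors before entering the network; that lookup is omitted here.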

The network contained one input layer, three convolutional layers, three max-

pooling layers, and two fully connected layers with a softmax output. The convo-

lutional layers are constructed as described in Table 5.1, and the stride in each

convolutional layer was 1 because we wanted to tag each word.

Table 5.1: Structure of convolutional neural network for aspect extraction

| Layer | Number of feature maps | Size of filter |
|---|---|---|
| 1st layer | 100 | 3×3 |
| 2nd layer | 50 | 2×2×100 |
| 3rd layer | 25 | 2×2×50 |

The pool size used in the max-pooling layers was 2×2. The output of each convolutional layer was computed using a hyperbolic tangent. The other parameters of the CNN were based on a previous study [101]. Additionally, we applied dropout regularization on the penultimate layer with a constraint on the L2-norms of the weight vectors, and trained the network for 50 epochs.

We applied the off-the-shelf feature concept after training the basic convolutional network. We kept the weights of the convolutional layers of the previously trained model and re-trained only the last two fully connected layers for each product, such as V10, G5, V20, G6, and V30. The datasets used to train the CNN and the off-the-shelf approach are described in the Experiments section. We then extracted the attribute keywords from the entire review dataset of each product with the trained model. Our smartphone review dataset contained 1000 reviews written between September 23, 2014, and July 23, 2018, and the aspect keywords were labeled by domain experts from the Mobile Communication Department of LG Electronics.

All the above-mentioned datasets were labeled using a widely used coding scheme for representing sequences. In this scheme, the first word of each aspect is given a B-A tag, the I-A tag denotes the continuation of the aspect, and O is used to tag a word that is not an aspect.
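As an illustration of this coding scheme, the sketch below tags the example sentence from Section 5.1. The span format (start index inclusive, end index exclusive) and the helper name `bio_tags` are assumptions made for the example, not part of the original labeling tool.

```python
from typing import List, Tuple


def bio_tags(tokens: List[str], aspect_spans: List[Tuple[int, int]]) -> List[str]:
    """Tag tokens with B-A / I-A / O; spans are (start, end) with end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end in aspect_spans:
        tags[start] = "B-A"              # first word of the aspect
        for i in range(start + 1, end):
            tags[i] = "I-A"              # continuation of the same aspect
    return tags


tokens = "I love the touchscreen of this , but the battery life is too short .".split()
# 'touchscreen' is token 3; 'battery life' covers tokens 9 and 10.
tags = bio_tags(tokens, [(3, 4), (9, 11)])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```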

Calculation of relative importance of extracted aspect

As mentioned previously, without prioritization there are numerous limitations in utilizing the extracted aspects. Moreover, a simple prioritization approach based on raw frequencies introduces a bias whereby extremely general aspects are considered the most important. Thus, we provide a novel approach to calculate the relative importance of the extracted aspects based on variants of Grad CAM.

We assume that an aspect that has a significant effect on the sentiment toward the overall product is also more important than the other attributes. Thus, we utilize the weight with which each aspect affects the overall product sentiment as the importance score of each product attribute.

First, we construct a sentiment classification model utilizing the CNN. To improve the overall efficiency of our proposed method, we reuse part of the aspect extraction model described in the previous section for the sentiment classification model. We retain the parameters of the filters used in each convolutional layer and re-train only the final two layers for the sentiment classification model.

Second, we add a weighting layer, similar to that of Grad CAM, to calculate the weight of each aspect influencing the sentiment decision, as shown in Figure 5.3.

Figure 5.3: Example of weight visualization

Further, we add up the weights of all the aspects for a complete text review to

understand the importance of each aspect. Additionally, the weights of the aspects

in each review text are normalized to remove the bias caused by the different lengths

of the textual reviews.

We then sort the attributes in order of importance score to reveal the relative importance of each attribute. Sorting makes it easy to select the relevant attributes when only a limited number of attributes can be used.
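The aggregation and sorting steps above can be sketched as below. The text does not state the exact normalization, so this example assumes that the weights within each review are rescaled to sum to one; the function name `rank_aspects` and the toy weights are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rank_aspects(review_weights: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Sum per-review normalized aspect weights and sort by importance."""
    totals: Dict[str, float] = defaultdict(float)
    for weights in review_weights:
        norm = sum(weights.values())
        if norm == 0:
            continue  # a review without weighted aspects contributes nothing
        for aspect, weight in weights.items():
            # Assumed normalization: weights in one review sum to 1,
            # so long reviews do not dominate short ones.
            totals[aspect] += weight / norm
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Toy per-review aspect weights (made-up numbers for illustration only).
reviews = [
    {"battery": 3.0, "screen": 1.0},
    {"battery": 1.0, "camera": 1.0},
    {"screen": 2.0, "camera": 2.0},
]
ranking = rank_aspects(reviews)
print(ranking)
```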

Evaluation factor clustering and refinement

Furthermore, we conducted additional minor refinements to achieve better performance. Inspection of the extracted attribute factors revealed many typographical errors, incorrect expressions, and different words with the same meaning, because the user review data were extremely unstructured texts. Thus, we applied a clustering technique to assign synonymous words standing for an extracted attribute factor to the same cluster.

We clustered the words based on the embedding vector of the extracted factors

calculated in the first step using the spherical k-means method [148] to make the

silhouette index the lowest. Cosine dissimilarity 1 − cos(x, y) is the distance measure used in the spherical k-means method. In the clustering result, ‘screen ration’,

‘16:9’, ‘18:9’, and ‘full screen’ were assigned to the same cluster. Table 5.2 provides

examples of the extracted keywords belonging to the same cluster.
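To make the distance measure concrete, the following is a minimal pure-Python sketch of spherical k-means with the cosine dissimilarity 1 − cos(x, y). The two-dimensional toy vectors, the fixed initial centroids, and the iteration count are assumptions for illustration; the study clusters word embedding vectors, and its initialization and choice of the number of clusters are not reproduced here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_dissimilarity(x: Vector, y: Vector) -> float:
    """Return 1 - cos(x, y)."""
    xn, yn = normalize(x), normalize(y)
    return 1.0 - sum(a * b for a, b in zip(xn, yn))


def spherical_kmeans(points: List[Vector], centroids: List[Vector],
                     iterations: int = 10) -> List[int]:
    """Assign each point to the centroid with the smallest cosine dissimilarity."""
    centres = [normalize(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(len(centres)),
                key=lambda k: cosine_dissimilarity(p, centres[k]))
            for p in points
        ]
        for k in range(len(centres)):
            members = [normalize(p) for p, lab in zip(points, labels) if lab == k]
            if members:
                # New centroid: mean direction of the members, re-normalized.
                centres[k] = normalize([sum(col) for col in zip(*members)])
    return labels


toy_vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
labels = spherical_kmeans(toy_vectors, [toy_vectors[0], toy_vectors[2]])
print(labels)
```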

Table 5.2: Examples of keywords in the same cluster

| Attribute | Synonymous extracted keywords |
|---|---|
| Screen ratio | screen ration, 16:9, 18:9, full vision |
| Design | design, look, LG Signature, appearance |
| OS version | OS, N OS, Nougat, Android |
| User interface | User interface, UX, UX4.0, GUI |

We converted the indirect expression of a user in a review comment into an ap-

propriate wording representing each attribute after the clustering task. For instance,

we converted ‘fast’ and ‘speed’ into ‘Processor’ and ‘Clearance’, respectively, and

‘Screen color’ into ‘display type’ and ‘glass type’.

5.2.2 Help contents re-organization

Our proposed method consists of four steps: 1) Seq2seq architecture training for user

specification; 2) CGAN architecture training for help contents’ usage generation; 3)

Calculation of the new user’s specification and generation of help contents’ usage

prediction and 4) Re-organization of help contents based on those predictions (Figure

5.4).

User specification based on app usage sequence

First, the user specification value is calculated using the seq2seq architecture, which was originally proposed to generate sequences of words by predicting the next word while considering the entire sequence. App usage sequences can be suitably processed with seq2seq architectures because their sequential and contextual information is meaningful in the same way as that of words and sentences. Moreover, the seq2seq architecture contains a context node C that is suitable for representing sequences because it is designed to summarize all the encoded information.

In the proposed method, seq2seq architecture receives an app usage sequence

as the encoder input and generates the same app usage sequence in the decoder.

We utilize context vector C as the vector representation of each app usage sequence

after training the architecture using the set of the app usage sequences.

Each app usage sequence A = (a1, a2, ..., aT) is defined as the series of apps used from the time the smartphone screen is turned on to the time it is turned off. Each user typically has several app usage sequences per day, and the user specification is calculated by averaging the vector representations of those usage sequences. Regarding the details of the proposed architecture, the network contains three levels of hidden layers for each sequence. The length of the encoder/decoder sequence is set to 15, considering the maximum length of the app usage sequence.
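The two bookkeeping steps around the seq2seq model, fixing the sequence length and averaging per-sequence vectors into one user specification, can be sketched as follows. The padding symbol, the app names, and the toy two-dimensional context vectors are hypothetical; in the study the context vectors come from the trained encoder, which is not reproduced here.

```python
from typing import List, Sequence

MAX_LEN = 15        # encoder/decoder sequence length stated in the text
PAD_APP = "<pad>"   # hypothetical padding symbol; not named in the source


def pad_sequence(apps: Sequence[str], max_len: int = MAX_LEN) -> List[str]:
    """Pad (or, defensively, truncate) an app usage sequence to max_len."""
    return (list(apps) + [PAD_APP] * max_len)[:max_len]


def user_specification(context_vectors: List[Sequence[float]]) -> List[float]:
    """Average the context vectors of all sequences belonging to one user."""
    if not context_vectors:
        raise ValueError("at least one context vector is required")
    count = len(context_vectors)
    return [sum(column) / count for column in zip(*context_vectors)]


padded = pad_sequence(["Camera", "Gallery", "Instagram"])
# Toy context vectors standing in for the encoder output of three sequences.
spec = user_specification([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(len(padded), spec)
```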

Training of conditional GAN

Next, to train the CGAN for help usage prediction, the usage data were preprocessed into a form suitable as input to the architecture. The dataset contained information regarding the help contents selected by each user during the first month after purchase. Thus, we converted the data into a binary format that indicates whether each help content was selected during the first month, as shown in Figure 5.5.

Figure 5.5: Preprocessing of help usage data
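A minimal sketch of this binary conversion is given below. The content indices and the small vector length are toy values chosen for readability; the actual number of help contents is a parameter of the dataset.

```python
from typing import Iterable, List


def encode_help_usage(selected_ids: Iterable[int], n_contents: int) -> List[int]:
    """Return a 0/1 vector marking which help contents a user selected."""
    vector = [0] * n_contents
    for content_id in selected_ids:
        vector[content_id] = 1  # selected at least once in the first month
    return vector


# A user who opened help contents 0 and 3 out of six (toy example).
usage_vector = encode_help_usage({0, 3}, n_contents=6)
print(usage_vector)
```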

After preprocessing, the help usage data and the user specification data are input into the CGAN architecture: the user specification vector is processed as the condition, and the help usage data are processed as the real-data input, as shown in Figure 5.6. The CGAN is then trained to generate, for new users, help usage predictions as artificial data that can scarcely be distinguished from real help usage data given the user specification.

In the proposed architecture, the help usage data have 80 dimensions, considering the number of help contents, and the user specification data have 50 dimensions, which yielded the highest performance in the experiments.

Figure 5.6: CGAN architecture for help usage prediction

Generation of help contents usage and its re-organization

After training the seq2seq and CGAN architecture, the help usage prediction of

new users can be generated. First, new users’ app usage sequences are input into a

seq2seq architecture to calculate their specification vector. These specifications are

then input into the CGAN to generate the help contents usage prediction.

Based on the usage prediction, help contents can be re-organized according to the prediction score of each help content. For instance, if the usage prediction score of a content is close to 1, the content is likely to be selected by the user and deserves to be located at the top of the help content list. On the other hand, if the usage prediction score is close to 0, the content does not hold the user’s interest and should be hidden in the help list (Figure 5.7).


Figure 5.7: Example of help contents re-organization
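The re-organization rule can be sketched as a sort by prediction score followed by a cut-off. The content titles, the scores, and the `hide_below` threshold are hypothetical; the text only states that contents with scores near 0 should be hidden, without giving a numeric cut-off.

```python
from typing import Dict, List, Tuple


def reorganize(scores: Dict[str, float],
               hide_below: float = 0.1) -> Tuple[List[str], List[str]]:
    """Order help contents by predicted usage and split off the hidden ones."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    visible = [name for name, score in ranked if score >= hide_below]
    hidden = [name for name, score in ranked if score < hide_below]
    return visible, hidden


# Toy prediction scores for one user (made-up titles and values).
predicted = {"Camera tips": 0.92, "Dual window": 0.05, "Battery saver": 0.61}
visible, hidden = reorganize(predicted)
print(visible, hidden)
```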

5.3 Experiments


For the first subject, we acquired survey results from LG Electronics in which the product attributes are ordered by their importance as purchasing factors. Such surveys are conducted periodically for each device, such as G4, V10, G5, V20, G6, and V30.

Further, the spec sheet (Figure 5.8), addressed in the second experiment, is a list describing the specifications of a product or property on a commercial site, such as Amazon.com. The spec sheet contains the information that customers look at first when they collect information about a product, particularly when buying electronic devices. Thus, selecting the attributes contained in the spec sheet is an important task, considering how frequently spec sheets are consulted.


Figure 5.8: Spec sheet of LG V30 (Source: GSM Arena)

Below, we list the attributes of the spec sheets presented on commercial or review sites for a smartphone product. We examine five major websites (Amazon, Best Buy, GSM Arena, CNET, and Phone Arena) and the official LG website. The attributes are presented in the order in which the websites list them.

Amazon (17): Screen size, Display type, Color spectrum, Resolution, Glass type, Network, Storage, RAM, SD slot, First rear camera resolution, Second rear camera resolution, Front camera resolution, OS version, Processor, Battery, Wireless charging, In the box

Best Buy (19): Processor, OS version, Network, Screen size, Screen ratio, Resolution, Display type, First rear camera resolution, Second rear camera resolution, Front camera resolution, Camera angle, Network, Storage, SD slot, Mobile hotspot, QSlide, QuickMemo, Water resistant, Warranty

GSM Arena (30): OS version, Dimensions, Weight, Materials, Fingerprint, Water resistant, Dust resistant, Colors, Screen size, Resolution, Pixel density, Display type, Glass type, Sensor, First rear camera resolution, Second rear camera resolution, Front camera resolution, Camera angle, Camera feature, Camcorder resolution, Processor, Storage, RAM, SD slot, Battery, Wireless charging, Speaker, Microphone, Network, Voice feature

CNET (34): Weight, Color, Network, Form factor, OS version, User interface, Intelligent assistant, SIM Card, Sensor, Materials, Water resistant, Dust resistant, Messaging, Processor, Wireless interface, Resolution, Pixel density, Screen size, Screen features, Screen ratio, Audio codec, Video codec, Memory, SD card, Battery, Wireless charging, Camera feature, Security, RAM, 1st rear camera resolution, 2nd rear camera resolution, Front camera resolution, Warranty, Dimensions

Phone Arena (36): Network, Dimensions, Weight, Materials, Glass type, SIM card, Display type, Screen size, Resolution, Screen ratio, Multi-touch, Display feature, User interface, OS version, Processor (CPU), Processor (GPU), Memory, SD card, SIM card, 1st rear camera resolution, 2nd rear camera resolution, Front camera resolution, Video resolution, Speaker, Earphone jack, Network, GPS, NFC, Radio, USB, Sensor, Messaging, Browser, Battery, Colors, Test results

Official site (44): Screen size, Display type, Pixel density, Screen ratio, Camera feature, System features, Display features, 1st rear camera resolution, 2nd rear camera resolution, Front camera resolution, Front camera angle, Rear camera angle, Camera feature, Video resolution, Video feature, Voice recording feature hardware, Voice recording feature, Hi-Fi, DAC, Material, Fingerprint, Dimensions, Weight, Water resistant, Shock resistant, Glass type, Security features, Productivity features, Convenience features, Entertainment features, Connectivity features, OS version, User interface, Processor, Battery, Network, Fast charging, USB, Memory, Micro SD, RAM, Earphone jack, Accessory

As shown above, many differences exist among the sites. For instance, Best Buy and GSM Arena do not contain the User interface attribute, and only Amazon contains the In the box item. The LG Electronics official site and Phone Arena contain more than double the number of attributes contained on Amazon. Thus, we conclude that, to create an effective spec sheet, it is worthwhile to study and select reasonable attributes that influence the purchase intention of a customer.

For the second subject, app usage sequence data were collected from LG Electronics to perform the comparison experiments. Each app usage sequence consisted of the sequence of apps that a user accessed from the time the smartphone screen was turned on to the time it was turned off. The users’ help contents selection results were also acquired for training and verifying the proposed architecture. The help contents selection data of 1,800 people were collected, covering 60 help contents per user. Further, 180,000 app usage sequences of 1,800 people (1,000 usage sequences per user) were collected for the experiments.


For the first subject, we verify the performance of our proposed method with two experiments. First, we calculate the similarity between our prioritized product attributes and the results of a real survey conducted internally by LG Electronics to identify the product attributes that real customers consider the most important purchasing factors.

The survey results, utilized as an answer set, consist of the product attributes ordered by their importance as purchasing factors.

To compare the order of the product attributes in the answer set and in our proposed method, we measure the results with the normalized discounted cumulative gain (NDCG), which is one of the most well-known evaluation measures in information retrieval for ranking systems [56, 57]. NDCG allows each retrieved result to have a graded relevance, whereas most traditional ranking measures only allow a binary relevance. In addition, it associates a discount function with the rank, whereas many other measures weigh all the positions uniformly [141].

We measure the NDCG value with the top 30 extracted attributes and then compare the results with other baselines, as presented in Table 5.3. We assign relevance grades on a scale from 10 down to 1, with each grade shared by three consecutive attributes, and the positional weights are discounted with a logarithmic function from 1.0 toward zero. For instance, the attributes in the answer set are assigned the grades [10, 10, 10, 9, 9, 9, 8, 8,..., 2, 1, 1, 1], and the weights are reduced as [1.0, 1.0, 1.0, 0.6309, 0.6309, 0.6309, 0.5, 0.5, 0.5, 0.4307,...]. The NDCG value therefore increases when the attributes with the largest grades appear at the front of the ranking.
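The weighting scheme above can be reproduced with a short sketch. The listed weights match 1/log2(g + 1) for the g-th group of three positions, which is what the code uses. Two points are assumptions rather than statements from the text: the gain is taken to be the relevance grade itself (not an exponential gain), and an attribute absent from the answer set receives grade 0. The attribute names are placeholders.

```python
import math
from typing import Dict, List

GROUP = 3  # attributes sharing one relevance grade and one discount


def discount(position: int) -> float:
    """Weight of a 0-based rank position: 1 / log2(group index + 1)."""
    return 1.0 / math.log2(position // GROUP + 2)


def relevance_from_answer(answer: List[str], top_grade: int = 10) -> Dict[str, int]:
    """Grades 10, 10, 10, 9, 9, 9, ... following the order of the answer set."""
    return {attr: max(top_grade - i // GROUP, 0) for i, attr in enumerate(answer)}


def ndcg(predicted: List[str], answer: List[str]) -> float:
    """Discounted gain of the predicted order divided by that of the answer order."""
    rel = relevance_from_answer(answer)
    dcg = sum(rel.get(attr, 0) * discount(i) for i, attr in enumerate(predicted))
    ideal = sum(rel[attr] * discount(i) for i, attr in enumerate(answer))
    return dcg / ideal


answer_set = [f"attr{i}" for i in range(30)]   # placeholder attribute names
weights = [round(discount(i), 4) for i in range(10)]
print(weights)
print(ndcg(answer_set, answer_set), ndcg(list(reversed(answer_set)), answer_set))
```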

Table 5.3: Baselines utilized in first experiment

| No. | Extraction method | Prioritization method |
|---|---|---|
| 1 | CNN-based | LSTM attention |
| 2 | CNN-based | LIME |
| 3 | CNN-based [101] | Frequency-based |
| 4 | HMM-based [58] | Frequency-based |
| 5 | CRF-based [49] | Frequency-based |

Second, we compare the effectiveness of our proposed method for the development of spec sheets with that of existing major commercial sites. In detail, we verify that the proposed method can extract the specialized factors of the LG V30. Extracting the specialized factors of each product is one of the most important considerations in the smartphone industry, which is the most rapidly changing industry.

For the experiment, we conducted two five-point Likert scale user surveys with 40 participants on the following points: 1) how much influence each attribute in the spec sheet exerts on their intention to purchase the LG V30 and 2) their satisfaction with the overall product and with each attribute in the spec sheet for the LG V30. We then constructed a multiple regression model of the overall product satisfaction on the satisfaction with each attribute and compared the coefficients of determination (R²). The regression model indicates the completeness of the composition of the attribute set in the spec sheet. In the experiment, we used three different numbers of attributes (i.e., 17, 30, and 44), which correspond to the minimum, average, and maximum numbers of attributes in the existing spec sheets.

For the second subject, two experiments were performed to verify the performance of the proposed method. In the first experiment, the accuracy of help contents’ usage prediction was verified with five-fold cross validation. The seq2seq architecture was trained for user specification and the CGAN architecture was trained for help contents’ usage prediction with four-fifths of the dataset, and the accuracy of the proposed method was tested with the remaining data.

In the second experiment, the effectiveness of the proposed re-organizing method was compared with that of other benchmark methods. The top-20, top-30, and top-40 help contents were selected for each user by our proposed method and by each benchmark method, and the number of contents actually selected by the user within those top-k contents was compared. We set k = 20, 30, and 40 because these are the numbers of contents a user can see within a few scrolls on a smartphone. The results were compared with the following benchmark methods: (1) average selection of demographically similar users; (2) average selection of users who have similar n-gram app usage within a week; (3) average selection of users who have a similar user specification, without applying the CGAN; and (4) the proposed method with only the seq2seq-based user specification replaced by a neural embedding-based user specification method.

For the neural embedding-based user specification method, the window size was limited to 4 and 6, considering the average app usage sequence length of 8.13. For the n-gram models, the number of uses of each app during the week was counted, and only bi-grams and tri-grams were used, considering the extensive number of n-gram combinations.
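The top-k comparison used in the second experiment can be sketched as follows: for each user, take the k contents with the highest predicted scores and count how many of them the user actually selected, then average over users. The content labels, scores, and function names are hypothetical toy values.

```python
from typing import Dict, List, Set, Tuple


def hits_in_top_k(scores: Dict[str, float], selected: Set[str], k: int) -> int:
    """Count the actually selected contents among the k highest-scored ones."""
    top_k = sorted(scores, key=lambda name: scores[name], reverse=True)[:k]
    return sum(1 for name in top_k if name in selected)


def average_hits(users: List[Tuple[Dict[str, float], Set[str]]], k: int) -> float:
    """Average the top-k hit count over all users."""
    return sum(hits_in_top_k(scores, sel, k) for scores, sel in users) / len(users)


# Two toy users with four help contents each (made-up scores and selections).
users = [
    ({"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}, {"a", "c"}),
    ({"a": 0.2, "b": 0.7, "c": 0.6, "d": 0.4}, {"b", "c"}),
]
print(average_hits(users, k=2), average_hits(users, k=3))
```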


Prioritization of product attributes

Table 5.4 lists the NDCG results comparing the extracted product attributes with the answer set acquired from the user surveys conducted by LG Electronics. As mentioned previously, we tested our proposed method and several baselines for each of the LG V10, G5, V20, G6, and V30.

Although the NDCG results vary with the product, our proposed method outperforms the other methods for all the considered products. In terms of the prioritization method, our proposed variants of the Grad CAM approach yield better results than the other explainable machine learning approaches, namely Long Short-Term Memory (LSTM) attention and LIME. By examining the detailed results, we conclude that the LSTM attention shows inconsistent weight calculations depending on the length of each textual review and thus causes bias in the overall weight calculation. The LIME approach is more appropriate for binary decisions for each aspect than for weight calculation. Nonetheless, all the explainable machine learning-based approaches clearly outperform the simple frequency-based approaches. Based on the experimental results, we conclude that the explainable machine learning-based approaches provide an effective weight score for calculating the relative importance of the product attributes, and that the frequency-based approaches cause general aspects to be inappropriately ranked highest. Furthermore, the CNN-based method outperforms the CRF- and HMM-based approaches in the aspect extraction problem, as verified in previous studies [101].

Table 5.4: Performance of attribute extraction and prioritization (NDCG)

| Method | V10 | G5 | V20 | G6 | V30 |
|---|---|---|---|---|---|
| Our proposed method | 0.9273 | 0.9046 | 0.9215 | 0.9171 | 0.9013 |
| CNN + LSTM Attention | 0.9046 | 0.8920 | 0.9103 | 0.9018 | 0.8876 |
| CNN + LIME | 0.8844 | 0.8803 | 0.8961 | 0.8916 | 0.8803 |
| CNN + Frequency [101] | 0.8013 | 0.8164 | 0.7913 | 0.7813 | 0.7851 |
| CRF + Frequency [49] | 0.7556 | 0.7418 | 0.7519 | 0.7409 | 0.7491 |
| HMM + Frequency [58] | 0.7216 | 0.7276 | 0.7137 | 0.7374 | 0.7104 |

Table 5.5 summarizes the attributes extracted by the proposed method for the LG V30. The key features of the product, such as video features (Cine-video) and camera lens (Crystal clear lens), are effectively extracted by the proposed method, mostly because the transfer learning approach is applied. These features also have a relevant slope coefficient, β, in the regression model for the satisfaction score.

Table 5.5: Examples of extracted attributes

| Attribute | β | Attribute | β | Attribute | β |
|---|---|---|---|---|---|
| Hi-Fi | 0.1019 | Voice features | 0.0846 | Camera angle | 0.0785 |
| AI features | 0.0743 | Camera lens | 0.0716 | Display type | 0.0673 |
| Video features | 0.0654 | Finger print | 0.0584 | Water resistance | 0.0519 |

As presented in Table 5.6, our proposed method shows a higher influence score

and larger coefficients of determination than the values for the existing spec sheet for

all the corresponding number of attributes. Thus, our proposed method effectively

reflects the interest of a customer and identifies the essential element affecting the

purchasing intention of a customer. We also consider the recent improvements of LG

V30 by utilizing the transfer learning approach.

Therefore, the existing spec sheets are less effective than the proposed method, even though they are constructed by domain experts who have extensive background knowledge of the smartphone industry.

Table 5.6: Result of effectiveness comparison

| Source of spec sheet | Average influence score | R² |
|---|---|---|
| Proposed method (Min) | 4.13 | 0.6236 |
| Proposed method (Avg) | 4.01 | 0.5329 |
| Proposed method (Max) | 3.84 | 0.4829 |
| Amazon | 3.94 | 0.5219 |
| Best Buy | 3.61 | 0.4917 |
| GSM Arena | 3.59 | 0.4532 |
| CNET | 3.46 | 0.4048 |
| Phone Arena | 3.33 | 0.4129 |
| Official site | 3.42 | 0.4483 |


Table 5.7: Confusion matrix of help contents usage prediction

| n = 108,000 | Predicted: Select | Predicted: Unselect | Total |
|---|---|---|---|
| Actual: Select | 30,473 | 3,839 | 34,312 |
| Actual: Unselect | 9,477 | 64,211 | 73,688 |
| Total | 39,950 | 68,050 | |

Help contents re-organization

Table 5.7 is a confusion matrix of help contents’ usage prediction with a probability threshold of 0.5. Computed row-wise from the table, 88.81% of the actually selected contents and 87.14% of the actually unselected contents are predicted correctly (the recall of the Select and Unselect classes, respectively), and the harmonic mean of the two is 0.8797. The experimental results demonstrate a high absolute performance level, given that a generative prediction problem, rather than a simple classification problem, was addressed, and that the user specification vector as well as the usage data was considered.
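The figures reported for Table 5.7 can be re-derived from its four cells. The sketch below computes the row-wise rates of correctly predicted contents and their harmonic mean; the variable names are chosen for this example only.

```python
# Cell counts copied from Table 5.7.
tp, fn = 30_473, 3_839    # actual Select row: predicted Select / Unselect
fp, tn = 9_477, 64_211    # actual Unselect row: predicted Select / Unselect

total = tp + fn + fp + tn

# Share of actually selected contents that were predicted as Select.
select_rate = tp / (tp + fn)
# Share of actually unselected contents that were predicted as Unselect.
unselect_rate = tn / (tn + fp)
# Harmonic mean of the two row-wise rates.
harmonic_mean = 2 * select_rate * unselect_rate / (select_rate + unselect_rate)

print(total)
print(f"{select_rate:.4f} {unselect_rate:.4f} {harmonic_mean:.4f}")
```

These row-wise rates reproduce the 88.81% and 87.14% values, and their harmonic mean reproduces 0.8797.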

Table 5.8: Average of help contents selection for top-k prediction

| Method | Top-20 | Top-30 | Top-40 |
|---|---|---|---|
| Proposed method | 17.91 | 26.57 | 33.16 |
| Average selection of demographically similar user | 10.23 | 15.11 | 19.67 |
| Average selection based on similar n-gram usage | 11.46 | 17.86 | 21.13 |
| Average selection based on similar user specification | 13.49 | 20.01 | 25.63 |
| Neural embedding based approach with user specification (win=4) | 15.79 | 23.73 | 30.51 |
| Neural embedding based approach with user specification (win=6) | 15.91 | 25.01 | 31.17 |

Table 5.8 presents the effectiveness of the proposed method and the benchmark methods in terms of the number of help contents selected by the user within the top-k contents. As shown in the table, the proposed method achieves the highest effectiveness score among all approaches, including the demographics-based and n-gram-based approaches.

The proposed method performs better because it captures the user specification value based on the app usage sequence, which represents the users’ interests and characteristics effectively.

Although the other benchmark methods that consider user specification are relatively more effective than the first two methods, they exhibit lower performance than the proposed approach. The method without the CGAN performs worse because it predicts help usage only by averaging similar users’ selection data, without utilizing the state-of-the-art generative model.

Further, the lower performance of the neural embedding-based approach is attributable to its limited window size. The neural embedding-based method only considered the neighboring apps within window sizes of four or six, whereas the proposed method considered entire sequences as contextual information of the app sequences.


Chapter 6

Conclusion

Previously, various tasks with respect to user experience design were performed heuristically, and thus there are a few problems associated with them. Therefore, in this study, data-driven UX design approaches are proposed for the whole process of user experience design. In detail, this study focuses on three research scopes: customer-voice classification, user segmentation, and design elements selection.

First, for customer-voice classification, this study proposes a document de-noising approach, a probabilistic word clustering-based document representation, and a novel method of applying convolution filters to the matrix of document representations.

The class vector is utilized in a novelty detection method that modifies the previous novelty detection methods, and the proposed method was observed to detect novel words more effectively than the previous method. In the actual experiments, the classification performance of the customer-voice representation obtained by applying the proposed method exceeded that of the previous method. Therefore, it is concluded that the novelty score of the proposed method is a more appropriate measure than that of the previous method for determining whether a word effectively represents a class.


In addition, a probabilistic word clustering-based approach that considers the membership strength of each word with respect to each cluster was proposed. The proposed method is expected to be robust on customer-voice data consisting of extremely unstructured texts, including typos, because it considers the membership strength of those words. The proposed method outperforms all other document representation methods in actual experiments with regard to classification performance.

In addition, this study proposes another novel approach that applies convolution filters to the matrix of document representations to address the complexity and the number of parameters of the subsequent classification model. To do this, we rearrange the elements of each document representation vector with the t-SNE algorithm to preserve the semantic distance between clusters, and put them in one-to-one correspondence based on semantic meaning with a linear transformation. The approach outperforms all other document representation methods in actual experiments on classification performance. The reason for its higher performance is that it captures the various aspects of each representation method rather than relying on an individual representation, especially on the customer-voice data, which are extremely unstructured text.

Second, in the user segmentation studies, a variant of a neural network is proposed to effectively utilize the app usage sequence. We represented the app usage sequences and each user as vectors in a vector space.

This study also proposes app clustering-based and relative similarity-based approaches for user segmentation that can provide an intuitive interpretation based on the observations in the study. In the app clustering-based user representation, each app was represented as a vector in the vector space generated by the neural embedding architecture, and apps with similar characteristics were grouped into clusters. Each user was represented by the frequencies of these clusters. For the relative similarity-based user segmentation, we proposed a network representation-based method that utilizes the order of relative similarity. Based on the vector representation of the app clustering-based approach, relatively similar users, which are located close to each other, were connected as nodes in a network. The users were then segmented using the Louvain method on the constructed network.
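
The pipeline in this paragraph can be sketched in a few steps. The code below is an illustration under stated assumptions, not the thesis implementation: the app-to-cluster mapping is taken as given, cosine similarity and a k-nearest-neighbor rule are assumed for building the network, and instead of running the Louvain method it only computes modularity, the score that the Louvain method greedily maximizes. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple


def user_vector(apps: Sequence[str], app_to_cluster: Dict[str, int],
                n_clusters: int) -> List[float]:
    """Represent a user by the relative frequency of each app cluster."""
    counts = Counter(app_to_cluster[a] for a in apps if a in app_to_cluster)
    total = sum(counts.values())
    return [counts.get(k, 0) / total if total else 0.0 for k in range(n_clusters)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def knn_graph(vectors: Sequence[Sequence[float]], k: int) -> Set[Tuple[int, int]]:
    """Undirected edges linking every user to its k most similar users."""
    edges = set()
    for i, u in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: cosine(u, vectors[j]), reverse=True)
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def modularity(edges: Set[Tuple[int, int]], community: Sequence[int]) -> float:
    """Modularity of a partition; community[i] is the segment of user i."""
    m = len(edges)
    if m == 0:
        return 0.0
    inside, degree_sum = Counter(), Counter()
    for i, j in edges:
        degree_sum[community[i]] += 1
        degree_sum[community[j]] += 1
        if community[i] == community[j]:
            inside[community[i]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


app_to_cluster = {"mail": 0, "docs": 0, "game": 1, "video": 1}
sequences = [["mail", "docs", "mail"], ["docs", "mail"],
             ["game", "video", "game"], ["video", "game"]]
vectors = [user_vector(s, app_to_cluster, 2) for s in sequences]
edges = knn_graph(vectors, k=1)
score = modularity(edges, [0, 0, 1, 1])
```

Because each dimension of a user vector corresponds to a cluster of characteristically similar apps, a segment found on the network can be read directly in terms of which app clusters its users rely on.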

These approaches provided an interpretation for the generated user representa-

tion. Our proposed methods also outperformed all of the other methods in terms of

the similarity metrics.

Third, in the design elements selection studies, an advanced method for developing the attributes of a spec sheet that effectively reflect the user's purchasing intention was proposed. Most previous studies focused on developing evaluation or purchasing factors and were performed heuristically by those who already had comprehensive domain knowledge and background information on the product industry. The experiments showed that the major key features of each product were effectively extracted using the proposed method, and that its extraction performance was better than that of the existing spec sheet for all corresponding numbers of attributes.

Lastly, user specification was considered in help contents re-organization by utilizing the app usage sequence, which effectively reflects the user's interests and preferences. The experiments showed a higher absolute level of performance in predicting help contents usage, given that generative and prediction problems were considered rather than a simple classification problem. The method was also more effective at re-organizing the top-k contents than the existing benchmark methods for all corresponding values of k.
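
The passage above does not state which measure was used for the top-k comparison. As an illustration only, one common way to score a re-organized list of contents is NDCG@k, sketched below under the assumption that each ranked item carries a relevance value (for example, 1 if the user actually opened that help content and 0 otherwise). The function names are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k items in ranked order."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Relevance of help contents in the order in which they are shown to a user.
original_order = [0, 1, 1]
reorganized_order = [1, 0, 1]
gain = ndcg_at_k(reorganized_order, 3) - ndcg_at_k(original_order, 3)
```

A measure of this kind rewards placing the contents a user needs nearer the top of the list, so it can be evaluated for every value of k considered.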

These results lead to the conclusion that data-driven approaches effectively address the problems caused by heuristic approaches. They can also provide meaningful insight to UI designers regarding customer-voice analysis, user segmentation, product development, and layout design. Future research will incorporate other features of the app usage sequence, such as duration and absolute time stamps. The scope of this research can also be extended to other tasks in the overall UX design process, such as usage pattern analysis, app recommendation, and graphical design. Finally, these studies are expected to aid the widespread application of the proposed data-driven UX design approaches to other tasks arising in real business environments.



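Abstract (Korean)

This thesis proposes user experience design methodologies based on data analysis. Many studies in academia and industry have attempted to improve the user experience, particularly that of smartphones, but most of the techniques assume reliance on the abilities of designers and planners and therefore carry several inherent problems. To resolve these problems, this study focuses on three topics: customer requirement classification, user segmentation, and design element selection. First, for customer requirement classification, document cleaning, representation, and classification methods that outperform existing ones are proposed. Second, for user segmentation, unlike previous studies, a user segmentation method based on the actual app usage patterns of users is proposed. Finally, for design element selection, methods for re-organizing contents and for selecting the items of a spec sheet are proposed. The high level of performance and the experimental results described in the main text confirm that this study, based on data analysis techniques, can effectively resolve the problems inherent in existing methods, and suggest that it can provide meaningful insights to practitioners involved in user requirement analysis, product planning, and design. In the future, this work is expected to be developed further so that its scope extends to the parts of the overall UX design process not covered here, such as usage pattern and behavior analysis, app recommendation, and graphic design.

Keywords: user experience, data analysis, document classification, user segmentation, design element selection

Student Number: 2016-30254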


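Acknowledgments

First of all, I would like to express my deepest gratitude to Professor 조성준, who spared no effort in guiding me academically until this thesis was completed and who showed me so much consideration in everyday matters as well. I also sincerely thank Professor 강석호, who always leads me with warm encouragement, and Professor 윤명환, Professor 박우진, Professor 정재윤, and Principal Researcher 홍지영, who gave me excellent advice during the thesis review.

Above all, I would like to dedicate this small but precious honor to my beloved wife 경혜, who quietly supported me for years so that I could complete the doctoral program safely, to my lovely son 유안, who was born during the program, and to my respected parents, father-in-law, mother-in-law, and brother-in-law.

I would also like to thank the senior and junior members of the laboratory, who helped me in many ways throughout my research. In particular, I thank my seniors 태훈, 호성, 진원, 태욱, 훈식, 제혁, 용대, 은지, 현창, and 현중, who taught me a great deal; my peers 인범, 석민, 진배, 동영, 혜진, 성환, and 지형, from whom I learned much while we did research together; and my juniors 민기, 도형, 효창, 동민, 노일, and 연국, who made life in the lab enjoyable.

Finally, I would like to thank my supervisors at work, 최진해, 안정, 안신희, 조민행, 윤정혁, 김은영, 손주희, and 이지은, who allowed me to pursue my studies alongside my job, and my colleagues 임소연, 이진희, 문윤정, 김진욱, 이주혜, 윤지은, 정주현, 김성민, 나하나, and 채병기. I am also grateful to the seniors and juniors of 생산방, to the members of S3, 꾸러기, 스크린, 탁구, JB, 대진, 세일, and 금성, and to all my other colleagues, who cheered me on in many ways.

Hoping that this thesis will be of some help to someone in the future, I thank everyone once again. Thank you.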
