

TOPICAL AND DOMAIN-SPECIFIC FRAMEWORKS FOR EMOTION DETECTION

AN EXPERIMENTAL CLUSTER ANALYSIS OF EMOTIONS IN REALITY TV

Word count: 16,919

Annaïs Airapetian
Student number: 01600351

Supervisor(s): prof. dr. Orphée De Clercq, prof. Luna De Bruyne

Master's thesis submitted to obtain the degree of Master in Translation

Academic year: 2019 - 2020


Copyright statement

The author and the supervisor(s) give permission to make this study as a whole available for consultation for personal use. Any other use is subject to the restrictions of copyright law, in particular with regard to the obligation to explicitly cite the source when quoting data from this study.


Preface

First of all, a well-deserved thank you goes to my supervisor, prof. dr. De Clercq, and co-supervisor, prof. De Bruyne, for their guidance and patience. I know I am not the easiest to work with, but their continuous positivity and faith in my work kept me motivated to bring this dissertation to a good end. I am very thankful to have been part of such a wonderful team, and I hope my work will help with future research.

I would also like to thank my friends for believing in me when I did not. Some of them shared this final journey with me, which was both a blessing and a curse at times. Nevertheless, I could not have done this without their support, and for that I will be forever grateful. Honestly, anyone who can handle me during one of my many breakdowns or overdramatic rants deserves a medal.

I am very proud of what I have achieved, and I hope the people around me are too. Enjoy the pinnacle of my academic career; it is an interesting read if I may say so myself.


Preamble

Due to the unusual circumstances caused by the coronavirus outbreak, the writing process of this paper progressed more slowly and with more difficulty than usual. However, even though the situation affected my mental capacity, there was no direct impact on the research itself, as it relied on my individual work and no third parties were involved. Thanks to the complete lockdown in the spring of 2020, I was given some extra time to process the data and analyse the results, which can probably be considered the only advantage of the pandemic. But as I decided to prioritise course material and exams during my time in isolation, I had no choice but to postpone the completion of this paper to the summer of 2020.


Abstract

In the research field of natural language processing, emotion detection has become a prominent topic. As there is no standard framework for automatic emotion detection, neither for a specific domain nor in general, our goal was to provide an emotion framework for the domain of reality TV that is motivated both theoretically and empirically. This paper presents a cluster analysis on Dutch reality TV transcriptions, which results in label sets for automatic emotion detection. Seeing that automatic applications first require manually annotated data, an extensive model of 25 emotion categories from psychological research was used to manually annotate 450 utterances from a self-made corpus of reality TV transcriptions. Three Flemish TV series (“Bloed, Zweet en Luxeproblemen”, “Blind Getrouwd” and “Ooit Vrij”) were included in the dataset, each representing a different topic. We conducted a frequency and cluster analysis on the annotations in order to uncover underlying relations between the emotion categories and eventually present limited, modified label sets. The results revealed three topical label sets, as well as one general domain-specific label set. Even though the majority of the emotions in the sets were found to be basic emotions, the selection of every label in the final sets was at least supported by empirical research.


Table of contents

List of abbreviations
List of figures
List of tables
1 Introduction
2 Theoretical background
2.1 Emotion frameworks
2.1.1 Dimensional versus categorical models
2.1.2 Ekman’s Basic Six
2.1.2.1 Background/context
2.1.2.2 Definition of emotions
2.1.2.3 Emotion families
2.1.2.4 Previous research
2.2 Cluster analysis
2.2.1 Similarity measures
2.2.2 Clustering techniques
2.2.2.1 Hierarchical agglomerative (bottom-up)
2.2.2.2 Hierarchical divisive (top-down)
2.2.2.3 Iterative partitioning
2.2.2.4 Density search
2.2.2.5 Factor analysis variants
2.2.3 Linking methods
2.2.3.1 Single linkage (nearest neighbour)
2.2.3.2 Complete linkage (furthest neighbour)
2.2.3.3 Average linkage
2.2.3.4 Ward’s method
2.2.4 Criteria
2.2.5 Validation
3 Methodology
3.1 Annotation
3.2 Frequency analysis
3.3 Cluster analysis
4 Results
4.1 Frequency analysis
4.2 Cluster analysis
4.2.1 Bloed, Zweet en Luxeproblemen
4.2.2 Blind Getrouwd
4.2.3 Ooit Vrij
4.2.4 Combined data
5 Discussion
5.1 Analysis of the results
5.2 Comparison to previous research
5.3 Validity, reliability and added value
6 Conclusion
References
Appendices


List of abbreviations

NLP: natural language processing
AI: artificial intelligence
BZL: Bloed, Zweet en Luxeproblemen
BG: Blind Getrouwd
OV: Ooit Vrij
IAA: inter-annotator agreement

List of figures

Figure 1: Comparison of five clustering methods for nine criteria
Figure 2: 25 emotion categories with their subcategories
Figure 3: Frequencies of emotion categories compared per TV series
Figure 4: Frequencies of emotion categories for Bloed, Zweet en Luxeproblemen
Figure 5: Frequencies of emotion categories for Blind Getrouwd
Figure 6: Frequencies of emotion categories for Ooit Vrij
Figure 7: Frequencies of emotion categories for the three TV series combined
Figure 8a: Initial dendrogram for BZL
Figure 8b: Adapted dendrogram for BZL without infrequent emotions
Figure 9a: Initial dendrogram for BG
Figure 9b: Adapted dendrogram for BG without infrequent emotions
Figure 10a: Initial dendrogram for OV
Figure 10b: Adapted dendrogram for OV without infrequent emotions
Figure 11a: Initial dendrogram for the three TV series combined
Figure 11b: Adapted dendrogram for the combined data without infrequent emotions
Figure 12a: Initial dendrogram for tweets
Figure 12b: Adapted dendrogram for tweets without infrequent emotions

List of tables

Table 1: Polarity combinations with surprise
Table 2: Emotion sets


1 INTRODUCTION

Emotion detection and emotion analysis have many applications. When it comes to textual data, they are often performed on product reviews to evaluate customer satisfaction, or on tweets to uncover linguistic trends in emotional speech. Especially given today's technological advances, automatic applications are becoming increasingly popular research topics in the field of natural language processing (NLP).

To enable automatic emotion detection, though, manually annotated data is first needed to train the machines. Although emotion analysis is a popular research field, no standard framework is offered for emotion annotation tasks. There are plenty of categorical models available, and many researchers opt for one of the better-known models, such as Ekman’s basic emotions set (1992), consisting of anger, disgust, fear, joy, sadness and surprise. While these basic emotions are the most widely agreed-upon universal emotions, supported by substantial evidence (Ekman, Friesen, & Ellsworth, 1972), and are frequently used for this type of research, most of the time no other valid reason is given as to why those emotions would be the most appropriate for certain data.

As Mohammad (2016, p. 215) argues, it would be beneficial to adapt the emotion set to the domain of the research. That is why the goal of this study is to propose an empirically grounded framework for automatic emotion detection on Dutch reality TV data. The study that we have conducted was inspired by a cluster analysis on Dutch tweets by De Bruyne, De Clercq and Hoste (2019), and can be seen as follow-up research to their study. We have adopted a similar process, but applied it to different data to expand the research field.

Similar to our previous study related to emotion analysis (Airapetian, 2019), the data for this research consists of transcriptions from Flemish reality TV shows. Three TV shows were selected, each with a different topic. The extensive emotion model used for the annotations consists of 25 emotion categories and stems from psychological research by Shaver, Schwartz, Kirson and O’Connor (1987). The compiled corpus of annotated transcriptions was then used as input for a frequency and cluster analysis. The study presented in this paper was designed to examine emotion clusters and to compare the emotion labels derived from the topical data to those derived from the general data. This brings us to our main research question: “Is it possible to deduce a label set from experimental cluster analysis?” We also intend to answer the following subquestions:

- Do the emotion clusters differ depending on the topic?
- Is there a difference between the emotion clusters for reality TV compared to those for tweets?
- Do basic emotions provide a good foundation for emotion frameworks?


This thesis is structured as follows: section 2 offers some theoretical background. It provides an overview of the most important aspects in relation to emotion frameworks and cluster analysis. More specifically, the first part elaborates on the different classification models, Ekman’s basic emotions model and the meaning of emotions, while the second part focuses on several approaches for the different features of a cluster analysis, namely similarity measures, clustering techniques and linking methods. The data and methodology for this study are described in detail in section 3. Then section 4 presents the results of both the frequency and cluster analyses, which are supported by clarifying graphs and dendrograms. Section 5 further analyses the results and compares them to previous works, while also reflecting on the validity and added value of this study. Finally, section 6 gives a closing statement by repeating the main features of the study and summarising the arguments that support the answers to our research questions.


2 THEORETICAL BACKGROUND

2.1 Emotion frameworks

2.1.1 Dimensional versus categorical models

There are several possible methods to classify emotions. As Buechel and Hahn (2016, pp. 1114-1115) mention, a distinction can be made between categorical and dimensional models. The categorical approach divides emotional states into emotion categories. The dimensional approach, on the other hand, describes emotional states according to emotional dimensions, with the three most common dimensions being valence, arousal and dominance. Valence is the polarity of the text and can be positive, negative or neutral. Arousal describes the level of reaction to stimuli and the intensity of the emotion, ranging from low to high. Finally, dominance means the control given by the emotion, which can range from dominant to submissive. Dimensional models often use Russell and Mehrabian’s (1977) Valence-Arousal-Dominance model, while categorical models usually refer to Ekman’s (1992) Basic Emotion model, which divides emotions into six categories: anger, disgust, fear, joy, sadness and surprise.

2.1.2 Ekman’s Basic Six

2.1.2.1 Background/context

Psychologist and emotion scientist Paul Ekman is well known for his studies of facial expressions and emotions. His study of the non-verbal behaviour of the Fore tribe in Papua New Guinea was groundbreaking. Members of the tribe were told a simple story while being shown a set of three faces and were then asked to select the face that they thought matched the story (Ekman & Friesen, 1971). Surprisingly, the subjects generally interpreted the facial expressions in the same way as someone from a Western society would. The fact that the tribe had lived in complete isolation from the rest of the world but nonetheless recognized the same emotions as people from the West was taken as proof that facial expressions are indeed universal.

Ekman, Friesen and Ellsworth (1972) found evidence for six basic emotions and eventually confirmed with their research that all six had universal facial expressions. Most scientists now agree on those six basic emotions and their distinctive facial expressions. However, this has not always been the case, and acceptance was certainly a gradual process. Charles Darwin (1872) was actually the first to claim that emotions were a product of evolution and that they were universal, but it was not until Ekman’s research that substantial evidence for that theory emerged (Paul Ekman International, 2018).


2.1.2.2 Definition of emotions

Ekman (1999, p. 46) says that “emotions are designed to deal with inter-organismic encounters”, i.e. interactions between groups of people or even between humans and animals. However, actual interaction with a second party is not always necessary. Emotions can also occur when we are not in the presence of others, whether that presence would be physical or imagined. The primary function of emotions remains the same, namely to “mobilize the organism to deal quickly with important interpersonal encounters” (Ekman, 1999, p. 46).

He further elaborates on past, present and future as three important factors that should be taken into consideration when distinguishing basic emotions. A first factor to consider is the current situation: what is happening inside and around the person? Are there any factors that could influence the person? The second factor is the preceding situation: what was the situation before this point in time? Did something happen that could have possibly triggered a certain emotion or reaction? The third factor is the possible continuation of the situation: what is most likely to happen next? What are the possible consequences?

Ekman identifies six basic emotions: anger, disgust, fear, joy, sadness and surprise. His model is the most frequently used in the field of emotion analysis. It originates from his research on emotions and, more specifically, their relation to facial expressions. Plutchik (1980), like many others (Ekman, 1992; Frijda, 1988; Izard, 1991; Parrot, 2001; Tomkins, 1962), agrees that some emotions are indeed more basic than others, but he proposes a different set of emotions. He includes trust and anticipation alongside Ekman’s six, while also further dividing the emotions into different degrees of intensity. It is also worth mentioning that basic emotions such as those of Ekman can occur on their own, but can also be combined to form more complex emotions (Ekman, 1999; Plutchik, 1962).

2.1.2.3 Emotion families

Ekman (1999, p. 47) is of the opinion that there is no such thing as a sine qua non for emotions, meaning that a certain emotion cannot be distinguished from another emotion by a pre-defined set of characteristics. According to Ekman (1999, p. 55), “each emotion is not a single affective state, but a family of related states”. Emotions can therefore be divided into so-called emotion families. These can vary in intensity and form, as well as in whether the emotion can be controlled or not and whether it occurs spontaneously or deliberately (Ekman, 1997).

In contrast to individual emotions, which cannot be told apart by a pre-defined set of characteristics, members of the same emotion family do share characteristics, and these distinguish them from other emotion families. The Atlas of Emotions (2019), a project of the Dalai Lama in cooperation with Paul Ekman, describes the five most widely agreed-upon basic emotions as follows:

- anger: when we are mentally or physically blocked, e.g. annoyance, bitterness, fury
- disgust: when we are faced with something toxic or unpleasant, e.g. dislike, aversion, loathing
- fear: when our safety or wellbeing is threatened, e.g. anxiety, panic, terror
- joy: when we experience comfort, connection or pleasure, e.g. amusement, excitement, ecstasy
- sadness: when we lose something valuable, e.g. disappointment, misery, grief

2.1.2.4 Previous research

As mentioned before, research using Ekman’s emotions mainly focuses on their link with facial expressions. However, Ekman’s model is also often applied in the emotion analysis of text. In this research field, there have been numerous studies on tweets (Bakliwal et al., 2012; Wood, McCrae, Andryushechkin, & Buitelaar, 2018), reviews (Fang & Zhan, 2015; Thet, Na, & Khoo, 2010), blog posts (Bakliwal, Arora, & Varma, 2012) and news stories (Godbole, Srinivasaiah, & Skiena, 2007), but only a few on other kinds of data such as subtitles or transcriptions. As already presented in previous work (Airapetian, 2019) and repeated above, there are numerous frameworks for emotion analysis. The most popular categorical model is still that of Ekman (1992), but there are many more options to choose from. While basic emotion frameworks are usually fairly restricted, extensive emotion frameworks often introduce more complex secondary emotions by combining basic emotions. However, emotion frameworks are often selected arbitrarily and so are not adapted to the task or domain they are intended for.

One way to group data in order to produce a more limited set is to conduct a cluster analysis. A cluster analysis attempts to uncover the relations between emotion categories and, based on similarity, divides them into clusters, which are comparable to what is referred to as emotion families in section 2.1.2.3. If each cluster is then assigned an umbrella term, this results in a limited set of final emotion labels where each label represents several emotion categories.

This process was employed by De Bruyne et al. (2019). They conducted a cluster analysis using an extensive emotion framework in an attempt to generate a more limited emotion set that was much more grounded in its task (emotion detection) and domain (tweets). They used the annotations of 229 Dutch tweets as input for their cluster analysis, which resulted in a final label set containing love, joy, anger, nervousness and sadness. Because of their data-driven approach, this framework for automatic emotion detection was not only motivated theoretically, but also empirically.

The extensive framework they used is that of Shaver et al. (1987), which originates from the field of psychology. For that study, 112 psychology students were asked to rate 213 emotion words based on their prototypicality. The task resulted in a selection of 135 emotion words, which were then sorted by 100 students into a non-predefined number of categories. The results were used as input for a cluster analysis, which eventually resulted in a final set of 25 emotions: affection, cheerfulness, contentment, disappointment, disgust, enthrallment, envy, exasperation, horror, irritability, longing, lust, neglect, nervousness, optimism, pride, rage, relief, sadness, shame, suffering, surprise, sympathy, torment and zest.

2.2 Cluster analysis

In the field of AI, cluster analysis is seen as a form of unsupervised learning; more specifically, it is a data mining technique used for the natural grouping of data. Cluster analysis is sometimes also referred to as typology construction, classification analysis or numerical taxonomy. The objective is to identify underlying structures by grouping the cases from a dataset that are the most similar. The first step in that process, though, is gathering and preparing the data that will be clustered and later analysed. During this process there are a few topics that need to be taken into consideration.

First, it might be useful to discuss what exactly a cluster is. There are many descriptions of clusters and not just one general definition, but what mainly characterizes a cluster is its “high internal homogeneity” and “high external heterogeneity” (Lazar, 2012). This means that in a dimensional space, members of a cluster are located close to each other, while the clusters themselves lie further apart. Lazar (2012) mentions some elements of the research design that need to be evaluated before processing the data. These include the variables, the size of the dataset, outliers and the standardization of the data.

Firstly, it is important that the variables are chosen based on theoretical, conceptual and practical considerations. Lazar (2012) makes a distinction between two selective methods: feature extraction enables researchers to derive new and possibly more relevant features from the already existing features, while with feature selection they solely choose the most relevant features.

Secondly, the dataset should be broad enough so that it represents all relevant categories and the underlying structure can be studied. If the objective is to identify relatively large groups then a smaller dataset will suffice, but if the objective is to identify small groups then the dataset must be large enough to ensure that every group is included. Only then can the dataset be considered as representative.

Thirdly, there is always the possibility of outliers, and so researchers need to decide whether or not to include these in the results. If a certain observation is not at all representative and could negatively affect the hierarchy or help produce unrepresentative clusters, then that observation should be removed. If the observation represents a smaller group which is considered irrelevant for that particular research, then that observation should be removed as well in order to keep the focus of the resulting clusters on groups that are relevant. If, however, outliers represent a relevant group and are merely underrepresented because of their small number, then they should not be removed from the clustering results.

Lastly, according to Lazar (2012, p. 33), “clustering variables that are not all of the same scale should be standardized”. He describes some standardization techniques and makes a distinction between variable standardization and sample standardization.
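As an illustration of why standardization matters, the following minimal sketch rescales two variables measured on very different scales. It assumes z-score standardization per variable, which is one common technique and not necessarily one of those Lazar describes; the function name `standardize` and the example values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant variable cannot separate cases, so map it to zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Hypothetical variables on different scales (illustrative values only).
ages = [20, 30, 40]
incomes = [1000, 2000, 3000]

# After standardization both variables contribute on the same scale.
print(standardize(ages))
print(standardize(incomes))
```

Without this step, a distance computed over both variables would be dominated by the variable with the larger numeric range.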

Another point to consider is how to measure the similarity between individual cases. For this step there are several similarity measures at hand (see Section 2.2.1), and it is common to draw up a similarity matrix for all cases. Once the similarity has been determined, the next step is to form clusters. The most similar cases are grouped into the same cluster. However, that is not where the clustering ends: the clusters which are closest to each other are merged as well, and this is done repeatedly. Another question that might arise is how to determine the number of clusters that have eventually been formed. To this end, researchers can measure the homogeneity of each cluster by calculating the average distance between cases from the same cluster. The clustering solution can then be visualized in a graph or tree diagram to make the structure even clearer.
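The merging procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Euclidean distance and average linkage as the merge criterion, stops at a requested number of clusters, and operates on hypothetical two-dimensional points. The names `agglomerate`, `average_linkage` and `homogeneity` are introduced here for the example.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean distance between all pairs drawn from clusters a and b."""
    total = sum(dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # every case starts alone
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)


def homogeneity(cluster, points):
    """Average pairwise distance inside one cluster (0.0 for a singleton)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(dist(points[i], points[j]) for i, j in pairs) / len(pairs)


# Hypothetical cases: two tight pairs and one isolated case.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters = agglomerate(points, 3)
print(clusters)
print([homogeneity(c, points) for c in clusters])
```

A lower within-cluster average distance indicates a more homogeneous cluster, which is one way to judge where to stop merging.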

2.2.1 Similarity measures

According to Fisher and van Ness (1971, p. 92), “one of the first steps in clustering is to get some measure of closeness between two observations”. Additionally, Blashfield and Aldenderfer (1988, p. 457) describe similarity as being “fundamental to the process of classification. Objects perceived as similar are often classified as being in the same group, whereas those perceived as different are placed in other groups.”

Most of the clustering techniques mentioned later (see Section 2.2.2) rely on the calculation of similarity

between cases. Whereas in everyday life the concept of classification based on similarity comes quite

naturally, scientists need to find an objective way to process their data and measure similarity. In order

to do this, they turn to statistical approaches. There are many ways to determine the similarity between

objects, but most of the similarity measures use the concept of metrics, which means that the degree of

dissimilarity is represented as the distance between cases when they are projected as points in space.

This can for example be supported by creating an N×N similarity matrix, with N referring to the number

of cases being clustered. However, this is not the standard procedure, and it is important that researchers

base their choices on the design of their research. Below are given three similarity measures that are

relevant within the context of cluster analysis, as presented by Sneath and Sokal (1973), Blashfield and

Aldenderfer (1988), and Lazar (2012) among others.

A first way to measure object similarity is correlation coefficients. A correlation coefficient is a method

to determine the correlation between cases for certain variables. The coefficient compares the values of the

variables for each pair of cases and assigns a value between -1 and +1. If the value is 0, then there is no

linear relation between the cases.
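
As a minimal sketch of this measure, the Pearson correlation between two cases can be computed over their variable profiles. The function name and the toy profiles below are illustrative assumptions and are not taken from the data used in this study.

```python
from math import sqrt

def pearson(case_a, case_b):
    """Pearson correlation between two cases measured on the same variables."""
    n = len(case_a)
    mean_a = sum(case_a) / n
    mean_b = sum(case_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(case_a, case_b))
    var_a = sum((a - mean_a) ** 2 for a in case_a)
    var_b = sum((b - mean_b) ** 2 for b in case_b)
    return cov / sqrt(var_a * var_b)

# Hypothetical profiles of three cases on four variables.
case_1 = [1.0, 2.0, 3.0, 4.0]
case_2 = [2.0, 4.0, 6.0, 8.0]   # same shape as case_1, larger values
case_3 = [4.0, 3.0, 2.0, 1.0]   # reversed shape

print(pearson(case_1, case_2))  # close to +1
print(pearson(case_1, case_3))  # close to -1
```

Note that the first two cases are perfectly correlated although their values differ: the coefficient reacts to the shape of a profile, not to its level.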

A second possibility is distance measures. Distance measures are also described as ‘dissimilarity

measures’, because a high distance value means that the two cases that were compared are quite

dissimilar. In contrast to the correlation coefficient mentioned above, a distance value of zero means

that the cases have the same values for the same variables. Something correlation coefficients and

distance measures have in common is that they both require metric data. There are several distance

measures which are often used, such as the Minkowski distances, the Euclidean distance, which is

actually a special form of the Minkowski distance formula, and Mahalanobis D², also known as the

generalized distance. The latter also incorporates the correlations among variables.
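
The relation between the Minkowski and Euclidean distances can be illustrated with a short sketch. The function name, the parameter p and the example points are our own assumptions for illustration; the Mahalanobis distance is left out because it additionally requires the covariance matrix of the variables.

```python
def minkowski(case_a, case_b, p=2):
    """Minkowski distance of order p; p=2 gives the Euclidean distance."""
    return sum(abs(a - b) ** p for a, b in zip(case_a, case_b)) ** (1 / p)

# Two hypothetical cases described by two metric variables.
case_1 = [0.0, 0.0]
case_2 = [3.0, 4.0]

print(minkowski(case_1, case_2, p=2))  # Euclidean distance
print(minkowski(case_1, case_2, p=1))  # city-block distance
print(minkowski(case_1, case_1))       # identical cases have distance zero
```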

A final category of similarity measures is that of association coefficients. As opposed to the two

previously mentioned similarity measures, association coefficients do not require metric data.

One association coefficient that deserves particular mention is Dice’s coefficient (Dice, 1945). It is most

commonly used to compare Boolean vectors, which are vectors that contain no other values than 0 and 1.

It assigns a double weight to double positives, that is, positions at which both vectors have the value 1

and which are therefore cases of mutual agreement, while double negatives (0 in both vectors) are ignored.
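
To make the weighting concrete, the sketch below computes the Dice dissimilarity between two Boolean vectors as the number of mismatches divided by twice the number of double positives plus the number of mismatches. The function name and the example vectors are illustrative assumptions, as is the choice to treat two all-zero vectors as identical.

```python
def dice_dissimilarity(u, v):
    """Dice dissimilarity between two Boolean (0/1) vectors of equal length."""
    both = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)  # double positives
    mismatch = sum(1 for a, b in zip(u, v) if a != b)         # 1/0 or 0/1
    if both == 0 and mismatch == 0:
        return 0.0  # assumption: two all-zero vectors are treated as identical
    return mismatch / (2 * both + mismatch)

u = [1, 1, 0, 0, 1]
v = [1, 0, 0, 1, 1]
print(dice_dissimilarity(u, v))  # 2 mismatches, 2 double positives -> 2 / 6
```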

2.2.2 Clustering techniques

As with the similarity measures above, there are several possibilities when it comes to grouping data

and forming clusters. Summarised below are five well-known approaches for clustering, each of them

differing in how the cluster groups are formed (Blashfield & Aldenderfer, 1988).

2.2.2.1 Hierarchical agglomerative (bottom-up)

This first technique is probably the most frequently used method for clustering. Following this

technique, clusters are combined based on the similarity between the types used for a certain study, until

all types are grouped into one cluster. The degree of similarity, which is needed to form and distinguish

the clusters, is determined by calculating a similarity matrix. It is useful to mention that opting for this

approach results in non-overlapping clusters, meaning that each type can only be part of one cluster.

However, a cluster can belong to a larger cluster, which creates the typical hierarchical aspect of this

technique. The clusters are often graphically displayed in a dendrogram, also known as a tree diagram,

to give a clear overview of the hierarchical relations. Another aspect of the hierarchical agglomerative

technique is the importance of linkage rules. Here again, there are different options to choose from,

which will be elaborated on in Section 2.2.3.

2.2.2.2 Hierarchical divisive (top-down)

The hierarchical divisive technique is often seen as the opposite of the hierarchical agglomerative one.

Contrary to the abovementioned technique, here all types first belong to one and the same cluster,

which is then divided into smaller parts. Even though both of these hierarchical methods are popular,

many researchers prefer the agglomerative technique over the divisive one because algorithms for

divisive strategies require more computation. Within this method, a distinction is made between monothetic

clusters and polythetic clusters, depending on the criteria to be part of a cluster. With the monothetic

strategy, cluster members are determined based on one (or more) specific variable(s). In order for a type

to be part of such a cluster, it needs to have a certain score for those specific variables. That is why the

monothetic divisive strategy is mostly used with binary data. In order to belong to a polythetic cluster,

however, there is not one single variable which is needed; it suffices for the type to have certain subsets

of the variables.

2.2.2.3 Iterative partitioning

With the iterative partitioning technique, a series of processes is followed. First the dataset is divided

into clusters. After the centroids of the clusters have been computed, each type or data point is allocated

to the closest centroid. This results in new clusters, and so new centroids can be computed. This process

is then repeated until each data point stays in the same cluster and no further shifts take place. What

differentiates this technique from some other techniques, is that it works with the data itself, and not the

similarities between data points. It also processes the data more than once. Useful to mention here is

that this method results in single-rank clusters, meaning that the clusters are not nested and so there is

no hierarchy. A well-known example of partitioning-based clustering is K-means clustering.
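
The loop described above can be sketched in a few lines. This is a simplified illustration and not a reference implementation of K-means: the function name, the example points and the starting centroids are assumptions, and the sketch starts from given centroids rather than from an initial division of the dataset.

```python
def kmeans(points, centroids, max_iter=100):
    """Iterative partitioning: reassign points and recompute centroids until stable."""
    assignment = None
    for _ in range(max_iter):
        # Allocate each point to the closest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster: the partition is stable
        assignment = new_assignment
        # Recompute each centroid as the mean of its current members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids

points = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]]
initial = [[0.0, 0.0], [10.0, 10.0]]  # hypothetical starting centroids
labels, centres = kmeans(points, [c[:] for c in initial])
print(labels)
print(centres)
```

The function works on the coordinates of the data points themselves and passes over the data several times, which are the two properties that set this technique apart from the hierarchical methods.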

2.2.2.4 Density search

To clarify, the term ‘density’ refers to the number of points within a certain space. When using

the method of density search, a cluster is seen as a region with a high density of data points in relation

to the regions surrounding it. The purpose of this technique is to form new clusters instead of joining

new cases to already existing clusters. To do this, the distance is measured between an existing cluster

and a new case or cluster.

2.2.2.5 Factor analysis variants

The usual methods of factor analysis typically produce an N×N correlation matrix between variables.

However, variants of factor analysis which are used to form clusters produce a correlation matrix

between the cases, not between variables. From that correlation matrix, factors are then extracted, with

each of the factors resulting in a separate cluster. When a case belongs to a certain cluster, it means that

it has a high correlation to that corresponding factor.

2.2.3 Linking methods

When discussing the hierarchical agglomerative approach for clustering (see Section 2.2.2.1), we briefly

mentioned the use of linkage rules. These are essential when it comes to forming and linking clusters.

While there are many linkage methods, our focus will be on the four most common ones, namely average

linkage, complete linkage, single linkage and Ward’s method. Something all hierarchical agglomerative

methods have in common is that they first look for the two most similar cases in the similarity matrix.

After these two cases have been merged into a cluster, they do the same for the next two most similar

cases in the matrix, and so on. After two cases have been merged, their separate entries in the matrix are

replaced by the newly formed cluster they now belong to. Where agglomerative methods differ, however,

is the way in which they merge two clusters instead of two cases. This will be discussed below for the

four most popular linking methods as presented by Blashfield and Aldenderfer (1988). Each explanation

is supplemented with the definition of a cluster for that particular method, as well as some other

characteristics.
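
The shared procedure can be sketched as a small loop in which only the linkage rule is exchangeable. The function name, the `linkage` parameter and the distance matrix are hypothetical; passing `min` corresponds to single linkage and `max` to complete linkage, both of which are discussed below.

```python
def agglomerate(dist, linkage=min):
    """Merge the two closest clusters repeatedly; return the merge history.

    dist    -- symmetric N x N distance matrix (list of lists)
    linkage -- reduces the case-to-case distances between two clusters
               to one number: min = single linkage, max = complete linkage
    """
    clusters = [[i] for i in range(len(dist))]  # every case starts on its own
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        # Replace the two merged clusters by the newly formed one.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history

# Hypothetical distances between four cases.
dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 2.0, 6.0],
    [4.0, 2.0, 0.0, 3.0],
    [5.0, 6.0, 3.0, 0.0],
]
for left, right, d in agglomerate(dist, linkage=min):
    print(left, right, d)
```

With these distances the first merge is the same under both rules, but the later merges differ, which shows that the linkage rule only matters once clusters rather than single cases are being compared.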

2.2.3.1 Single linkage (nearest neighbour)

When using single linkage to merge two clusters, the merging is based on the similarity between a single

pair of cases, one from each cluster. This means that it suffices for one case from the first cluster to be

similar to one case from the second cluster, or in other words, that there is a single link between two cases from

different clusters. Blashfield and Aldenderfer (1988, p. 450) define this type of cluster as “a group of

entities such that every member of the cluster is more similar to at least one member of the same cluster

than it is to any member of another cluster”. The distance between the two clusters is then equal to the

shortest distance between a case from the first cluster and a case from the second cluster.

The biggest advantage of single linkage is that it is one of the few methods whose results are not

affected by monotonic transformations of the similarity matrix, that is, changes to the values that

preserve their rank order. Even though single linkage has this advantage

over most other hierarchical agglomerative linking methods and is one of the most commonly used

methods, the main problem with single linkage is that it tends to form long and thin cluster chains. As

the clustering process continues, that long chain will gradually add new cases to the cluster. This can

create meaningless outcomes, for example when the data is divided into only two clusters, with one

cluster containing only one single case and the other cluster containing all the other cases.

2.2.3.2 Complete linkage (furthest neighbour)

This method can be seen as the opposite of the single linkage rule. Whereas single linkage only needs

one similar pair of cases from two different clusters, complete linkage requires all cases

from both clusters to be similar in order to merge those clusters. The high level of similarity implies that

a cluster can be defined as “a group of entities in which each member is more similar to all members of

the same cluster than it is to all members of any other cluster” (Blashfield & Aldenderfer, 1988, p. 451)

and so the resulting clusters will be relatively compact compared to those of other linking methods.

Complete linkage is said to be a space-dilating method, which means that when clusters are being

merged, they contract and leave more space in between them. This results in smaller and more separated

clusters of approximately the same size. The distance between two clusters is then equal to the greatest

distance between a case from the first cluster and a case from the second cluster. A disadvantage of

complete linkage is that it tends to produce spherical clusters.

2.2.3.3 Average linkage

The average linking rule is some sort of compromise between the single and complete linking methods.

Sneath and Sokal (1973) labelled single linkage too liberal because only a small level of similarity was

required for the clusters to be merged, while labelling complete linkage too conservative because of the

high requirements of similarity. As the name already implies, the average linking method relies on an

average value of similarity between all cases from one cluster and all cases from another cluster. If a

certain level of similarity is reached, then the two clusters are merged. According to Blashfield and

Aldenderfer (1988, p. 452), “this method defines a cluster as a group of entities in which each member

has a greater mean similarity with all members of the same cluster than it does with all members of any

other cluster”.

Contrary to complete linkage, the average linkage method is said to be space-conserving. This

means that the clusters neither contract nor expand, but instead maintain the original distance between

objects in that space. For this method, the distance between two clusters is equal to the average of the distances

between every case from the first cluster and every case from the second cluster. The main advantages of average

linkage are that it is less affected by outliers than some of the other linking methods are, and that it

usually comes close to recovering known structures in the data. A disadvantage though is that it tends

to generate clusters with approximately the same amount of variance within clusters.

Average linking has also inspired some variations on this method, such as centroid clustering,

median clustering and the weighted average method. With centroid linkage, the distance between two

clusters is the distance between the centres of those two clusters. Median clustering works in mostly the

same way, but when two clusters are merged, the centre of the new cluster is placed midway between the

two old centres, regardless of how many cases each of them contains. Finally, the weighted average method

differs from normal (unweighted) average linking in how the distance to a newly merged cluster is

calculated. The unweighted method calculates the proportionate average, in which the two merged

clusters count in proportion to their size, while the weighted method simply averages the distances to

the two merged clusters without taking any proportions into account. As a result, cases from the smaller

cluster implicitly receive a higher weight. These weights are not determined by the researchers; instead

they result from the algorithm.
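
The difference between average and centroid linkage can be made concrete with a small sketch. The function names and the two example clusters are illustrative assumptions, and Euclidean distance is assumed as the underlying distance measure.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def average_linkage(cluster_a, cluster_b):
    """Mean of all case-to-case distances between two clusters."""
    pairs = [dist(p, q) for p in cluster_a for q in cluster_b]
    return sum(pairs) / len(pairs)

def centroid_linkage(cluster_a, cluster_b):
    """Distance between the centres (mean points) of two clusters."""
    centre_a = [sum(dim) / len(cluster_a) for dim in zip(*cluster_a)]
    centre_b = [sum(dim) / len(cluster_b) for dim in zip(*cluster_b)]
    return dist(centre_a, centre_b)

# Two hypothetical clusters of two cases each.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(4.0, 0.0), (4.0, 2.0)]

print(average_linkage(cluster_a, cluster_b))   # averages four pairwise distances
print(centroid_linkage(cluster_a, cluster_b))  # distance between (0, 1) and (4, 1)
```

In this example the average linkage distance is slightly larger than the centroid distance, because the diagonal pairs of cases lie further apart than the two cluster centres.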

2.2.3.4 Ward’s method

The objective of Ward’s method is to minimize the variance within clusters (Ward, 1963). Ward’s

method introduces the concept of ‘the error sum of squares’ (ESS). This value equals zero when every case

still forms its own cluster, and increases when clusters are merged. For Ward’s method, the distance

between two clusters is equal to how much the sum of squares increases. When two clusters show the

lowest increase in ESS, those clusters are merged. The clusters are usually fairly equal in size. So for

this method, a cluster can be defined as “a group of entities in which the variance among the members

is relatively small” (Blashfield & Aldenderfer, 1988, p. 452). A great advantage of this method is that

it can find known structures in the data. Just like complete linkage, Ward’s method is space-dilating as

well.
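
A minimal sketch of the criterion is given below. The function names and the three one-case clusters are hypothetical; the sketch only shows how the increase in ESS is obtained for a candidate merge, not a complete clustering routine.

```python
def ess(cluster):
    """Error sum of squares: squared distances of all cases to the cluster mean."""
    centre = [sum(dim) / len(cluster) for dim in zip(*cluster)]
    return sum((x - c) ** 2 for case in cluster for x, c in zip(case, centre))

def ward_increase(cluster_a, cluster_b):
    """Increase in total ESS caused by merging two clusters."""
    return ess(cluster_a + cluster_b) - ess(cluster_a) - ess(cluster_b)

# Three hypothetical clusters that each still contain a single case.
a = [(0.0, 0.0)]
b = [(2.0, 0.0)]
c = [(10.0, 0.0)]

print(ess(a))               # a single case has an ESS of zero
print(ward_increase(a, b))  # small increase: a and b are close together
print(ward_increase(a, c))  # large increase: a and c are far apart
```

Of the candidate merges shown, merging a and b yields the smaller increase and would therefore be carried out first.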

2.2.4 Criteria

Researchers have several ways of choosing which clustering method is the best for their particular

research. One way to compare clustering methods is to subject them to different criteria. In their study,

Fisher and van Ness (1971) compared a variety of conditions for five different clustering methods.

Figure 1 shows the results and suggests that single linkage satisfied the most conditions. Described

below are the nine admissibility conditions or criteria for evaluating clustering methods as presented by

Fisher and van Ness (1971).

o connected admissible: When all cases from the same cluster are connected by lines, the

clustering method should not produce clusters in which lines from other clusters intersect.

o image admissible: It should not be possible for the data to be clustered in any other way which

is considered better than the original clustering. This means that with another method the

differences within clusters should not be larger compared to the original clustering and

differences between clusters should not be smaller.

o convex admissible: The convex hulls of different clusters should not intersect.

o well-structured admissible: The data should be structured in such a way that the clustering

becomes clear. This criterion is further divided into two separate admissibility conditions:

▪ exact tree: The distance matrix can be reconstructed by only consulting the hierarchical

tree structure.

▪ k-group: All distances within the cluster are smaller than all distances between different

clusters.

o point proportion admissible: When one or more points of a dataset are duplicated, so that the modified

dataset consists of the original dataset plus the duplicated points, then the results for the modified

dataset should remain the same as those for the original dataset, and so the boundaries of the

clusters should not change.

o cluster proportion admissible: When duplicating each cluster, the clustering method should

produce clusters with the same boundaries.

o cluster omission admissible: When the dataset has been clustered and then one cluster is

removed from the original dataset, the clustering method should produce the same clusters from

the modified dataset as from the original dataset.

o monotone admissible (= monotonic invariant): When a monotone transformation is applied to the

elements of the similarity or distance matrix, the clusters should stay the same.
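
As an illustration of how such a condition can be checked, the sketch below tests the k-group structure for a given clustering: every distance within a cluster must be smaller than every distance between clusters. The function name, the distance matrix and the cluster labels are hypothetical, and treating a clustering without within-cluster or between-cluster pairs as trivially satisfied is our own assumption.

```python
def is_k_group(dist, labels):
    """True if all within-cluster distances are smaller than all between-cluster distances."""
    n = len(labels)
    within = [dist[i][j] for i in range(n) for j in range(i + 1, n)
              if labels[i] == labels[j]]
    between = [dist[i][j] for i in range(n) for j in range(i + 1, n)
               if labels[i] != labels[j]]
    if not within or not between:
        return True  # assumption: nothing to compare counts as satisfied
    return max(within) < min(between)

# Hypothetical distances between four cases.
dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 4.5, 6.0],
    [4.0, 4.5, 0.0, 1.5],
    [5.0, 6.0, 1.5, 0.0],
]
print(is_k_group(dist, [0, 0, 1, 1]))  # cases 0-1 and 2-3 form tight groups
print(is_k_group(dist, [0, 1, 1, 0]))  # this grouping mixes distant cases
```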

Figure 1: Comparison of five clustering methods for nine criteria (Fisher & van Ness, 1971, p. 94)

2.2.5 Validation

After the cluster analysis has been conducted, there are several possibilities to validate the results. Lazar

(2012) proposes some indices for cluster validation, namely an external, internal and relative index.

Following the external index, the cluster labels are compared to already existing labels provided by

experts. The internal index evaluates the data on its own and does not match the results with external

information. Lastly, the relative index is used to compare various clustering methods and their cluster

solutions. This can be done with both internal and external indices.
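
To illustrate what an external index can look like, the sketch below computes the Rand index, a well-known measure that is chosen here purely as an example and is not necessarily one of the indices discussed by Lazar (2012). The function name and the labels are hypothetical.

```python
from itertools import combinations

def rand_index(cluster_labels, expert_labels):
    """Share of case pairs on which the clustering and the expert labels agree."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(cluster_labels)), 2):
        same_cluster = cluster_labels[i] == cluster_labels[j]
        same_expert = expert_labels[i] == expert_labels[j]
        agree += same_cluster == same_expert  # both together or both apart
        total += 1
    return agree / total

# Hypothetical cluster solution and expert labels for five cases.
clusters = [0, 0, 1, 1, 1]
experts = ["pos", "pos", "neg", "neg", "pos"]
print(rand_index(clusters, experts))  # 6 of the 10 pairs agree
```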

3 METHODOLOGY

There are several parts to this research. A first substantial part consisted of an annotation task

(Section 3.1). This was then used as input for the subsequent and most important parts of the research,

namely the frequency analysis (Section 3.2) and cluster analysis (Section 3.3). Below is a detailed

description of the entire process. The main research question we wish to answer is “Is it possible to

deduce a label set from experimental cluster analysis?”. Our hypothesis for that research question is that

it is indeed possible to provide a label set based on emotion clusters, seeing that both Shaver et al. (1987)

as well as De Bruyne et al. (2019) have already succeeded in doing so (see Section 2.1.2.4). Furthermore,

we think the subquestions about topic and domain will have an affirmative answer as well, as domain

adaptation is a frequently occurring problem in the field of NLP (Daumé & Marcu, 2006; Glorot,

Bordes, & Bengio, 2011).

For the first part, a total of nine episodes was selected from three different Flemish reality TV series. The

series in question are Bloed, Zweet en Luxeproblemen (BZL), Blind Getrouwd (BG), and Ooit Vrij (OV)

and they were selected based on their emotional content. Additionally, each of these series was selected

because they had a different topic, which makes them suitable for a study on domain adaptation. Bloed,

Zweet en Luxeproblemen shows six privileged youngsters being faced with problematic issues in third

world countries; Blind Getrouwd follows people who were chosen to marry their perfect match without

having ever met them before; Ooit Vrij is a documentary series on prisoners in Belgium and their journey

to being released. For each series we selected the first and last episodes, as well as an episode in the

middle to have a good general overview of its content. The nine episodes were either downloaded

beforehand, or could be consulted on the online video player of the providing TV channel. It is important

to mention that even though the video files were already subtitled for certain parts, those subtitles were

not taken into consideration; instead, the episodes were transcribed manually by a student worker. The transcriber did not

shorten or correct the spoken language in any way and every sentence was transcribed exactly the way

it was said.¹

After the episodes were transcribed, 1000 utterances, so approximately 333 utterances per series,

were selected for further research.² For the research presented here, a subset of 450 utterances was

selected from the initial dataset: 145 utterances from BZL, 155 from BG and 150 from OV.

¹ The transcriptions are available on request. Please contact [email protected].

² For more information on the PhD research in question, see https://research.flw.ugent.be/en/projects/emotionl-emotion-detection-dutch.

3.1 Annotation

The 450 utterances were first prepared for manual annotation using an emotion set based on

psychological research (see Section 2.1.2.4) containing 25 emotion categories: anger, contentment,

disappointment, disgust, enthrallment, enthusiasm, envy, fear, frustration, irritation, joy, longing, love,

lust, nervousness, optimism, pity, pride, rejection, relief, remorse, sadness, suffering, surprise, and

torment.

The transcriptions were placed in separate Excel documents per TV series, which were designed

as follows (see Appendix 1): The top row of the file contained all utterances next to each other in

different columns. Below that, the first column of the annotation files contained 25 rows for the emotion

categories and several subcategories (Figure 2). There was also an extra row for any further comments,

such as the presence of irony. All utterance columns were labelled using binary 0|1 annotations for all

of the 25 emotion categories, depending on whether or not the emotion was present in that utterance.

The annotation task was not limited to one label per utterance, so that multiple emotion labels could be

assigned. Not only emotion words were taken into account, but also the overall feeling of the utterances.

To do this, the annotator was asked to perform the labelling task from a speaker perspective, meaning

that they took the point of view of the speaker to judge their feelings at that moment. To clarify certain

features of the task, we will present and comment on some examples in the following paragraphs.

Figure 2: 25 emotion categories with their subcategories

The first utterance contains an example of an emotion word, namely ‘content’ (equivalent to the English

adjective ‘content’ meaning ‘glad’ or ‘pleased’). This might indicate the presence of contentment or joy.

Other possible emotion words include ‘blij’, ‘prachtig’, ‘grappig’, ‘leuk’, ‘gelukkig’, ‘benieuwd’,

‘gefrustreerd’, ‘boos’, ‘gechoqueerd’, ‘verschrikkelijk’, ‘teleurgesteld’, ‘nerveus’ (English translations:

‘joyful’, ‘beautiful’, ‘funny’, ‘nice’, ‘happy’, ‘curious’, ‘frustrated’, ‘angry’, ‘shocked’, ‘horrible’,

‘disappointed’, ‘nervous’) and so on.

“Ah dag Aagje. Content dak u zie.”

The next example from the BG dataset (only the bold sentences are meant to be annotated) shows an

utterance that was labelled with multiple emotion categories. The speaker was told he was about to go

skydiving, which came as an unexpected surprise to him. He felt scared and nervous about the news

because it is totally out of his comfort zone, but also seemed excited to try something adventurous. That

is why these sentences in bold were labelled with the four emotions of surprise, fear, nervousness and

enthusiasm.

“What the fuck… Wa gaan wij doen? - Ge ziet daar een vliegtuig staan eh. - Wij gaan vliegen

jongen. - En gij ga springen. - Nee nee nee nee nee. Zijde ant zwanzen ofwa?”

The utterance below was taken from the dataset for OV. It contains an example of when irony was

indicated in the comment section of the annotation file. At the same time, it helps to clarify what is

meant by ‘annotated from a speaker perspective’. The first part of the excerpt (the preceding sentence)

was uttered by someone who was teasing her colleague about being bald, while the second part, the ironic

utterance, is the answer of said colleague. While the first person enjoyed the situation and found it quite

funny, her colleague most likely did not. This illustrates the difference in perspective: from the

perspective of an outsider, the teasing is seen as positive, but from the speaker’s perspective, the event

is actually perceived in a negative way and so the utterance in bold should be annotated for negative

emotion categories. Even though the speaker used a positive emotion word, namely ‘geestig’ (a Flemish

word meaning ‘funny’), that word was used in a sarcastic way, making the utterance rather negative.

This ironic use of ‘geestig’ also demonstrates that not only emotion words should be taken into account,

but rather the entire context of the utterance.

“Kan kik jou ook nie verplichten vo jon haar te laten groeien eh. - Zeer geestig wih.”

3.2 Frequency analysis

The annotation results for the three TV series were first compared on the frequency of the emotion

categories. Using the formulas provided by Excel, we calculated absolute frequencies, as well as relative

frequencies in percentages. These frequencies, along with their emotion categories, were then ordered

from high to low. All results were combined in one table to give a clear overview and facilitate the

comparison between the different series. Besides this, all numbers were also added and the frequencies

were recalculated to show what the results would be if no distinction was made between the different

series and topics. This will be useful when it comes to studying the issue of domain adaptation. All this

data was also converted into graphs for visual clarification.
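
Although these calculations were carried out in Excel, the same steps can be sketched in code. The annotations below are invented for illustration, the function name is our own, and the sketch assumes that relative frequencies are expressed as a percentage of the number of utterances.

```python
# Hypothetical 0|1 annotations: one row per emotion, one column per utterance.
annotations = {
    "joy":         [1, 0, 1, 1, 0, 0],
    "frustration": [0, 1, 0, 0, 1, 0],
    "surprise":    [1, 0, 0, 0, 0, 0],
    "fear":        [0, 0, 0, 0, 0, 0],
}

def frequency_table(annotations):
    """Absolute and relative frequency per emotion, ordered from high to low."""
    rows = []
    for emotion, labels in annotations.items():
        absolute = sum(labels)
        relative = 100 * absolute / len(labels)  # percentage of utterances
        rows.append((emotion, absolute, relative))
    return sorted(rows, key=lambda row: row[1], reverse=True)

for emotion, absolute, relative in frequency_table(annotations):
    print(f"{emotion:<12} {absolute:>3} {relative:6.1f}%")
```

The combined frequencies across series can be obtained in the same way after concatenating the rows of the three annotation files per emotion.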

For the frequency analysis, we first studied the overview that shows a comparison of the three

series per emotion category, looking for any striking peaks or patterns. Then we studied the top 3, top 5

and top 10 of most frequent emotions per topic. More specifically, we took a closer look at how many

of these emotions were positive or negative, and how many belonged to the emotion model proposed by

Ekman (1992). Additionally, the number and positive/negative polarity of emotions that occurred less

than ten times were compared, as well as the emotions that did not occur at all for a certain series, if

any. The reason for examining the spread of Ekman’s emotions, is to conclude whether such basic

emotion frameworks would be suitable for emotion analysis. The results for the three series were then

compared to the combined results to study the difference between topical emotion frequencies and

general emotion frequencies. When analysing the combined frequencies, we referred back to the results

per series and examined the positions in the topical ranking in order to find an explanation for their

position in the general ranking.

3.3 Cluster analysis

The next part of our research was the cluster analysis, which involved several preparatory steps. In order to

distinguish different clusters, it is first necessary to form tree diagrams, also known as dendrograms, which

depict the relations between the emotion categories. To build them, we started by calculating the Dice

dissimilarity. As explained in Section 2.2.1, the Dice dissimilarity is used to compare Boolean

data. For our study, this was done by taking the binary 0|1 annotations and using the resulting Dice

dissimilarity between emotion pairs to draw up a 25×25 distance matrix, which gave us a first insight into

which emotion categories are most similar. This data was then used as input for a hierarchical clustering

algorithm. For each TV series, we generated dendrograms using the seven different linkage methods

described in section 2.2.3: average, centroid, complete, median, single, weighted linkage and Ward’s

method. Dendrograms were also generated for the overall results of the three series combined. After

comparing all linkage methods and using the process of elimination, Ward’s method was chosen as

the most suitable one.
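
The construction of the distance matrix can be sketched as follows. The four emotion rows are invented for illustration (the actual matrix covers 25 emotion categories), the variable and function names are our own, and treating two all-zero rows as identical is an assumption of this sketch.

```python
def dice_dissimilarity(u, v):
    """Dice dissimilarity between two 0|1 vectors of equal length."""
    both = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    mismatch = sum(1 for a, b in zip(u, v) if a != b)
    total = 2 * both + mismatch
    return mismatch / total if total else 0.0  # assumption: all-zero pairs are identical

# Hypothetical annotations: one 0|1 row per emotion, one column per utterance.
annotations = {
    "joy":         [1, 0, 1, 1, 0, 0],
    "contentment": [1, 0, 1, 0, 0, 0],
    "frustration": [0, 1, 0, 0, 1, 0],
    "irritation":  [0, 1, 0, 1, 1, 1],
}

emotions = list(annotations)
matrix = [[dice_dissimilarity(annotations[a], annotations[b]) for b in emotions]
          for a in emotions]

# Report the most similar pair of different emotions.
pairs = [(matrix[i][j], emotions[i], emotions[j])
         for i in range(len(emotions)) for j in range(i + 1, len(emotions))]
print(min(pairs))
```

In this toy example joy and contentment are the closest pair, so they would be the first to be joined by any of the linkage methods.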

The next step was to decide on a cut-off value and define the clusters. After trying out and

considering several options, 1.6 was found to be a suitable cut-off value, as it produced an acceptable

number of clusters with a clear structure. If the cut-off line happened to coincide with a horizontal

linking line between members or clusters, we decided to disregard that link and split the clusters. Each

resulting cluster was assigned a different colour to create a clear overview of the separate clusters. For

the umbrella term per cluster, it was decided to adopt the Ekman emotion if one was part of that

particular cluster. If not, we selected the emotion category with the highest frequency. In case of multiple

Ekman emotions within one cluster, the two approaches were combined, meaning that the Ekman

emotion with the highest frequency was selected. For the first cluster analysis, the number of clusters

per TV series was compared and the clusters themselves were studied on their members and polarity, as

well as on their resemblance to Ekman’s basic emotions.
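
The two decision rules described above, cutting the dendrogram and choosing an umbrella term, can be sketched as follows. All names, frequencies and clusters are hypothetical, the set of Ekman emotions is assumed to be the six emotions commonly attributed to that model, and the cluster count assumes that merge heights never decrease, as is the case for Ward’s method.

```python
# Assumption: the six emotions commonly attributed to Ekman's basic emotion model.
EKMAN = {"anger", "disgust", "fear", "joy", "sadness", "surprise"}

def clusters_at_cutoff(n_items, merge_heights, cutoff=1.6):
    """Number of clusters when only merges strictly below the cut-off are kept."""
    kept = sum(1 for height in merge_heights if height < cutoff)
    return n_items - kept

def umbrella_label(cluster, frequency):
    """Most frequent Ekman emotion in the cluster, else the most frequent member."""
    ekman_members = [e for e in cluster if e in EKMAN]
    candidates = ekman_members or list(cluster)
    return max(candidates, key=lambda e: frequency[e])

# Hypothetical frequencies and clusters, for illustration only.
frequency = {"anger": 12, "frustration": 40, "irritation": 30,
             "joy": 20, "surprise": 31, "enthusiasm": 35,
             "contentment": 18, "optimism": 9}

print(umbrella_label(["anger", "frustration", "irritation"], frequency))
print(umbrella_label(["joy", "surprise", "enthusiasm"], frequency))
print(umbrella_label(["contentment", "optimism"], frequency))
print(clusters_at_cutoff(5, [0.4, 0.9, 1.6, 2.3]))
```

A merge whose height coincides exactly with the cut-off is not counted as kept, which mirrors the decision to disregard such a link and split the clusters.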

However, as one of our objectives is to examine topical emotion frequencies, we expected some

emotions to be represented more than others when processing the frequency scores for the different TV

series. Following this hypothesis, we generated three new dendrograms (one per TV series) using

Ward’s method once again, but this time excluding the emotion categories which occurred fewer than ten

times. This threshold of ten was adopted from a similar cluster analysis performed by De Bruyne et al.

(2019). The same cut-off value of 1.6 was maintained to differentiate between the new clusters. In the

new dendrograms, each adapted cluster was assigned the same colour as their corresponding original

cluster. That way, it would immediately become clear which clusters disappeared and which members

shifted or were removed from the original clusters.
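
The filtering step that precedes the second round of clustering can be sketched as follows. The function name and the annotation rows are hypothetical; only the threshold of ten is taken from the description above.

```python
def drop_rare(annotations, threshold=10):
    """Keep only the emotion categories annotated at least `threshold` times."""
    return {emotion: labels for emotion, labels in annotations.items()
            if sum(labels) >= threshold}

# Hypothetical 0|1 rows over 20 utterances.
annotations = {
    "joy":  [1] * 12 + [0] * 8,   # 12 occurrences: kept
    "pity": [1] * 10 + [0] * 10,  # exactly 10 occurrences: kept
    "envy": [1] * 3 + [0] * 17,   # 3 occurrences: excluded
}
print(sorted(drop_rare(annotations)))
```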

The adapted dendrograms and more importantly the clusters were then compared to the original

hierarchical structures. For the second cluster analysis, we examined the changes within clusters, as well

as the modified linking between categories or clusters, if any. Similar to the first cluster analysis, the

distribution of positive and negative emotion clusters was studied as well. Finally, we also took a closer

look at the resemblance between our final cluster labels and Ekman’s basic emotions.

4 RESULTS

4.1 Frequency analysis

This section presents the results from the frequency analysis. First the frequency scores from the three

TV series are presented next to each other in a general overview, and we will give some first impressions.

Then we will take a closer look at the frequency graphs per TV series, where the specific scores are

mentioned and the emotion categories are ranked from high to low. For each graph, the top 3, top 5 and

top 10 of most frequent emotions are examined, as well as the emotions that occur less than ten times

and the spread of the Ekman emotions. We also comment on the polarity of the emotion categories,

which is indicated in colour next to the emotion label. After the analysis per TV series, the same aspects

are studied for the combined frequency scores, of which the results are presented in the final part of this

section.

Figure 3: Frequencies of emotion categories compared per TV series

Figure 3 gives an overview of the number of annotations per TV series for each of the 25 emotion labels.

When examining the emotion model, it was concluded that it consists of ten positive emotions (marked

green) and fourteen negative emotions (marked red). Surprise was considered a neutral emotion (marked

blue) until further notice, because it can be either positive or negative depending on the context. However,

considering the content of the different TV series, we expected surprise to have a positive connotation

in Blind Getrouwd (BG), and a negative connotation in both Bloed, Zweet en Luxeproblemen (BZL) and

Ooit Vrij (OV). This hypothesis in relation to the different polarities was based on the topic summaries

of the TV series, as well as the occurring events that were shown throughout the episodes.

A first look at the graph tells us that the frequency scores for BZL and OV are often the opposite

of the results for BG: when BZL and OV have a high frequency for a certain emotion, the frequency for

that emotion is often low for BG, and vice versa. Take for example the results for joy and frustration.

BG scores roughly twice as high (57) as BZL (23) and OV (29) for joy, while its frequency for frustration

(9) is not even a quarter of the frequencies for BZL (47) and OV (53). This clear contrast between

relatively high frequencies for BZL and OV and a lower frequency for BG, or vice versa, was the case

for more than half of the emotion categories. The frequency results per TV series will be discussed in

more detail in the following paragraphs.
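The comparison for joy and frustration can be checked with a short calculation. The sketch below uses only the six frequencies quoted in the paragraph above; the dictionary layout and the helper name `ratio_to_bg` are our own choices.

```python
# Frequencies as reported in the text for two emotion categories.
reported = {
    "joy": {"BG": 57, "BZL": 23, "OV": 29},
    "frustration": {"BG": 9, "BZL": 47, "OV": 53},
}


def ratio_to_bg(emotion, series):
    """Return the BG frequency divided by the frequency in another series."""
    return reported[emotion]["BG"] / reported[emotion][series]


joy_vs_bzl = round(ratio_to_bg("joy", "BZL"), 2)
joy_vs_ov = round(ratio_to_bg("joy", "OV"), 2)
frustration_vs_bzl = round(ratio_to_bg("frustration", "BZL"), 2)
frustration_vs_ov = round(ratio_to_bg("frustration", "OV"), 2)
```

For joy, the BG score is about 2.5 times the BZL score and just under twice the OV score; for frustration, the BG score stays below a fifth of either of the other two.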

Figure 4: Frequencies of emotion categories for Bloed, Zweet en Luxeproblemen

Figure 4 shows the frequency scores for BZL, ranked from high to low. The top 3 consists of sadness,

frustration and disgust, which are all negative emotions. With the addition of irritation and pity, even

the top 5 consists of all negative emotions. When looking at the top 10 of emotions, two positive

emotions are introduced, as well as the temporarily neutral emotion surprise. On the other end of the

frequency scale, it is clear that there are eight emotions which appear less than ten times (lust, envy,

longing, optimism, pride, relief, enthrallment and love). Only one of these emotions is negative, while

the other seven are positive emotions. Among these is lust, which was not even indicated at all in the

dataset from BZL. Interesting to mention here is that none of the Ekman emotions³ appear less than ten

times. Two of the Ekman emotions, namely sadness and disgust, are even part of the top 3 most frequent

emotions, with a frequency of 55 and 46, respectively.

Figure 5: Frequencies of emotion categories for Blind Getrouwd

Contrary to the results for BZL, the top 3 most frequent emotions for BG does not consist of only

negative emotions. In Figure 5 above it can be seen that the 2 most frequent emotions are positive,

namely contentment and joy, with the negative emotion nervousness closing the top 3. The results for

3 As mentioned in our literature study (see Section 2.1), the Ekman emotions are anger, disgust, fear, joy,

sadness and surprise.

BG continue in this positive direction, as the emotions that complete the top 5 are enthusiasm and

optimism, which are both positive again. However, the top 10 shows a more balanced outcome. Five of

the emotions are positive, four are negative, and the top 10 is closed by the neutral surprise. When

looking at the lower frequency scores, we notice that there are many emotions which appear less than

ten times. More specifically, there are 11 emotions of that kind, which is almost half of the emotion

categories. Nine of these emotions are negative and only two are positive. Surprisingly, the emotion

with a frequency score of zero happens to be part of Ekman’s emotion set. Whereas disgust was the third

most frequent emotion for BZL, it was never indicated for BG. The second least frequent emotion is an

Ekman emotion as well, namely anger. The rest of Ekman’s basic six were labelled more than ten times,

although joy is the only one that made it to the top 5 and top 3 of most frequent emotions with a frequency

of 57.

Figure 6: Frequencies of emotion categories for Ooit Vrij

Figure 6 shows the results for the final TV series, OV. Similarly to BZL, the top 3 for this dataset only

contains negative emotions, namely frustration, sadness and irritation. This remains the same for the

top 5, with the addition of disappointment and nervousness. The top 10 most frequent emotions shows

the first positive emotions, even though the majority still remains negative: seven negative emotions

compared to only three positive emotions. Six emotions appeared less than ten times, which is a lower

number compared to BZL and BG. Two of those six emotions, one positive and one negative, were

never indicated. Four of Ekman’s emotions (sadness, anger, joy and fear) can be found in the top 10,

but only one in the top 5. That emotion is sadness and is again part of the top 3, which was also the case

for BZL, but this time it is only the second most frequent emotion.

Figure 7: Frequencies of emotion categories for the three TV series combined

The last part of the frequency analysis adds all results together without making a distinction between

the different topics from the dataset as was done in Figure 3. Figure 7 above shows the frequency scores

for all three TV series combined. We can see that the top 3 most frequent emotions consists of two

negative and one positive emotion, respectively sadness, nervousness and joy. Sadness already appeared

in the top 3 of two previously discussed TV series. It was the most frequent emotion for BZL and the

second most frequent emotion for OV. Nervousness also appeared in one of the top 3’s, as it was the

third most frequent emotion for BG. It also appeared in the top 5 for OV and the top 10 for BZL. The

last emotion in the top 3 shown above is joy, which was the second most frequent emotion for BG.

Besides that, it was also part of the top 10 for both BZL and OV.

Frustration and contentment are the emotions that complete the top 5 in this combined graph,

setting the balance at three negative and two positive emotions. Interestingly, those two emotions to

complete the top 5 both had the highest frequency score for one of the separate datasets: frustration was

the most frequent emotion for OV and contentment for BG. Frustration was even the second most

frequent emotion for BZL, but remarkably appeared less than ten times in BG. Contentment on the other

hand appeared in the top 10 for both BZL and OV.

The top 10 shows a relatively equal distribution of positive and negative emotions: five negative

emotions, four positive, and again the neutral emotion surprise. When looking at the spread of the

Ekman emotions, we see that only three made it to the top 10 (sadness, joy and surprise), two of which

appear in the top 3 (sadness and joy).

4.2 Cluster analysis

The following paragraphs elaborate on the results from the cluster analysis. As already mentioned in

Section 3.3, the hierarchical structures presented below were generated using Ward’s linkage method

(see Section 2.2.3 about cluster linking). Each section is divided into two parts: an initial cluster analysis

and an adapted cluster analysis. The dendrograms are studied on the number and polarity of clusters and

cluster members, as well as the spread of Ekman emotions. The transformation between the two different

cluster analyses is also examined. Lastly the final label set is compared to Ekman’s basic emotions set.

Results are presented for each of the TV series, as well as for the overall dataset.
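To make the procedure more concrete, the following sketch shows how flat clusters can be obtained with Ward's linkage and a distance cut-off. It is a minimal pure-Python illustration under several assumptions: the two-dimensional toy vectors are invented, the function names are hypothetical, and the Ward distance follows one common convention in which two single categories are separated by their plain Euclidean distance. It does not reproduce the input vectors or the software used for the dendrograms in this study.

```python
from itertools import combinations
from math import sqrt


def ward_distance(a, b):
    """Ward distance between two clusters given as (members, centroid)."""
    (members_a, centroid_a), (members_b, centroid_b) = a, b
    n_a, n_b = len(members_a), len(members_b)
    squared = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return sqrt(2.0 * n_a * n_b / (n_a + n_b) * squared)


def ward_flat_clusters(vectors, cutoff):
    """Merge the closest clusters until the next merge would exceed `cutoff`."""
    clusters = [([name], list(vec)) for name, vec in vectors.items()]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest Ward distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        if ward_distance(clusters[i], clusters[j]) > cutoff:
            break  # every remaining merge lies above the cut-off line
        (members_a, centroid_a), (members_b, centroid_b) = clusters[i], clusters[j]
        n_a, n_b = len(members_a), len(members_b)
        centroid = [
            (n_a * x + n_b * y) / (n_a + n_b)
            for x, y in zip(centroid_a, centroid_b)
        ]
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [(members_a + members_b, centroid)]
    return sorted(sorted(members) for members, _ in clusters)


# Invented coordinates, for illustration only.
toy_vectors = {
    "joy": (1.0, 0.0), "contentment": (0.9, 0.1), "enthusiasm": (1.0, 0.2),
    "sadness": (0.0, 1.0), "disappointment": (0.1, 0.9), "fear": (3.0, 3.0),
}
clusters_at_1_6 = ward_flat_clusters(toy_vectors, cutoff=1.6)
clusters_at_2_0 = ward_flat_clusters(toy_vectors, cutoff=2.0)
```

Because merge heights under Ward's method do not decrease as clustering proceeds, stopping at the first merge above the cut-off should give the same groups as cutting the full dendrogram at that height. With the toy data, a cut-off of 1.6 yields three clusters and a cut-off of 2.0 yields two, which illustrates how sensitive the number of clusters is to the chosen value.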

4.2.1 Bloed, Zweet en Luxeproblemen

Figure 8a: Initial dendrogram for BZL

Figure 8a depicts the dendrogram for Bloed, Zweet en Luxeproblemen using Ward’s linkage method and

applying a distance of 1.6 as the cut-off value. It defines eight different clusters: joy, relief, love, anger,

sadness, remorse, suffering and fear. Of these eight clusters, four have been assigned an emotion label

related to Ekman’s basic emotions (joy, anger, sadness, and fear). When looking at the structure of the

dendrogram, we can clearly see a distinction between two groups based on their polarity. The first three

clusters contain positive emotions, while the last five clusters contain mostly negative emotions.

Compared to Ekman’s set, which only has joy as a positive emotion, these first results are more equally

divided in positive and negative emotion clusters.

A striking observation here is that the fifth cluster contains no fewer than three of Ekman’s

emotions, namely surprise, disgust and sadness. Another interesting observation is that longing, a

positive emotion, is clustered together with remorse and envy, two negative emotions.

Figure 8b: Adapted dendrogram for BZL without infrequent emotions

Figure 8b shows the adapted dendrogram after excluding the emotion categories that were indicated less

than ten times. As expected, there are fewer clusters than in the original hierarchical structure, but

they are very similar. Cluster 1 contains enthusiasm, contentment and joy; cluster 2 consists of rejection,

disappointment, anger, frustration and irritation; cluster 3 comprises surprise, disgust, pity and sadness;

and suffering and torment form cluster 4. These four clusters have remained the same. Cluster 5 has remorse,

fear and nervousness as its members, which shows a new link between the previously existing cluster

of fear and nervousness with the newly added remorse. When looking back at Figure 8a, we can see that

remorse used to be grouped together with envy and longing. As the latter two were removed, remorse

shifted to one of the other clusters. This new dendrogram results in five clusters and a final label set with

joy, anger, sadness, suffering and fear.
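The exclusion step that precedes the adapted cluster analysis amounts to a simple frequency filter. The sketch below applies it to the five BZL frequencies that are stated explicitly in Section 4.1 (sadness, frustration, disgust, joy and lust); the function name `drop_infrequent` is our own, and the remaining categories are omitted because their exact scores are only shown in Figure 4.

```python
def drop_infrequent(counts, threshold=10):
    """Keep only the categories annotated at least `threshold` times."""
    return {label: n for label, n in counts.items() if n >= threshold}


# BZL frequencies quoted in the text; all other categories are left out here.
bzl_reported = {"sadness": 55, "frustration": 47, "disgust": 46, "joy": 23, "lust": 0}
bzl_kept = drop_infrequent(bzl_reported)
```

Only the categories that survive this filter are passed on to the second cluster analysis, which is why a category such as lust no longer appears in Figure 8b.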

Furthermore, as the one original cluster that contained both negative emotions and a positive

emotion has been split and partially removed, all clusters now consist of either only positive or only

negative members (with the exception of surprise). However, compared to the first cluster analysis for

BZL, it is clear that the distribution between positive and negative emotions is now less equal than

before: one positive cluster against four negative clusters. The only negative emotion label that

disappeared from the original cluster labels is remorse, but the one remaining member of that cluster is

now part of the fear cluster. Two of the positive labels have disappeared, which leaves only joy. This is

very similar to the distribution in Ekman’s emotion set. Overall, our final label set shows a fair

resemblance to Ekman’s basic emotions, with a total of four shared emotion labels (joy, anger, sadness

and fear).

4.2.2 Blind Getrouwd

Figure 9a: Initial dendrogram for BG

Figure 9a shows the dendrogram for Blind Getrouwd, with the same linkage method and cut-off value

used for BZL. The first cluster analysis results once again in eight separate clusters: sadness, anger, joy,

suffering, surprise, pity, longing and love. Similar to the results for BZL, four out of the eight clusters

have been assigned an emotion label from Ekman’s basic emotions (sadness, anger, joy and surprise).

When examining the polarity of the clusters, we can see a mixture of three positive labels, four negative

labels and the temporarily neutral surprise. However, as the surprise cluster further contains all negative

emotions, we will consider this entire cluster to be negative as well, regardless of the neutral umbrella

term. This makes for a total of three positive cluster labels and five negative cluster labels. Yet again,

this is a fairly equal distribution in terms of polarity, and definitely more equal than within Ekman’s

emotion set.
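The rule used here for assigning a polarity to a whole cluster can be written down explicitly. The sketch below is one possible formalisation, not a tool used in this study: the function name `cluster_polarity` is hypothetical, and the polarity map only covers a handful of categories whose polarity is stated in the text.

```python
# Polarity of a few categories as stated in the text; surprise is neutral.
POLARITY = {
    "joy": "positive", "contentment": "positive", "enthusiasm": "positive",
    "longing": "positive", "pity": "negative", "remorse": "negative",
    "envy": "negative", "surprise": "neutral",
}


def cluster_polarity(members, polarity=POLARITY):
    """Derive a cluster's polarity from its members, ignoring neutral ones."""
    signs = {polarity[member] for member in members} - {"neutral"}
    if not signs:
        return "neutral"  # the cluster holds only neutral categories
    if len(signs) == 1:
        return signs.pop()  # all non-neutral members agree
    return "mixed"


surprise_with_pity = cluster_polarity(["surprise", "pity"])
longing_group = cluster_polarity(["longing", "remorse", "envy"])
```

Under this rule, a cluster in which surprise is joined only by negative categories counts as negative, whereas a cluster that combines longing with remorse and envy is flagged as mixed.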

In addition, some further interesting observations are to be found in this dendrogram. There is

one cluster which again, similarly to the results for BZL, contains more than one Ekman emotion,

namely the third cluster with both surprise and fear. Additionally, longing and envy are once more

clustered together, even though they are respectively a positive and a negative emotion. This was also

the case with the initial cluster analysis for BZL.

Figure 9b: Adapted dendrogram for BG without infrequent emotions

Figure 9b shows the adapted dendrogram for the second cluster analysis. What immediately becomes

clear is that there are remarkably fewer clusters. Due to the many emotion categories that were indicated

less than ten times, the number of clusters was halved. The clusters that remain, however, are very

similar to the original clusters. The sadness, surprise and joy clusters have stayed intact, while the love

cluster has both lost members and gained a new member. After envy was removed, instead of forming a

cluster on its own, longing joined the positive love cluster. This gives us a total of four clusters: cluster

1, labelled sadness, consists of rejection, disappointment and sadness; cluster 2, labelled surprise,

contains surprise, fear and nervousness; cluster 3, labelled joy, comprises enthusiasm, contentment, joy,

optimism and pride; and finally cluster 4, labelled love, covers enthrallment and love as well as longing.

When judging on polarity, we see a balance of two negative and two positive emotion clusters,

respectively sadness and surprise in contrast with joy and love. This balance is undoubtedly more equal

than Ekman’s distribution. Comparing the final labels to Ekman’s basic emotions, it can be observed

that our clusters only have three labels in common with Ekman, namely sadness, surprise and joy.

Surprisingly, one of the clusters that disappeared is anger, which happens to be an Ekman emotion. But

then again, that particular emotion category was only indicated once for BG, and the other members of

the cluster also occurred less than ten times. Disgust, another Ekman emotion, was never even indicated

and was therefore excluded from both cluster analyses.

4.2.3 Ooit Vrij

Figure 10a: Initial dendrogram for OV

Figure 10a plots the dendrogram for Ooit Vrij, again adopting the same linkage method and cut-off value

as for the previously discussed structures. Similar to the other TV series, the first cluster analysis for

OV results in eight different clusters: joy, optimism, anger, disgust, sadness, fear, lust and surprise.

Remarkably, all six of Ekman’s basic emotions appear in different clusters, meaning that six of the

clusters are labelled with one of Ekman’s emotions. When looking at the polarity of the clusters, we

again see a mixture of positive and negative emotions. There are three positive emotion clusters, four

negative clusters and one labelled with the temporarily neutral surprise. Given the fact that the emotion

category surprise is only clustered together with pity, a negative emotion, we will consider this cluster

to be negative as well. This results in a final five negative clusters against three positive clusters, which

is still more equally distributed than Ekman’s emotion set. As this has also been the case for the previous

topics, it could mean that a first unedited cluster analysis would typically produce more equally

distributed clusters in the sense of polarity.

Contrary to the first cluster analyses of BZL and BG, Figure 10a shows that all negative

emotions are grouped exclusively with negative emotions, and the same applies to all positive emotions.

However, the clustering results for OV show the first cluster in this research that consists of only one

emotion category, which in this case is lust.

Figure 10b: Adapted dendrogram for OV without infrequent emotions

Figure 10b shows the last adapted dendrogram generated for domain-specific cluster analysis. As

opposed to the other adapted dendrograms, this one shows a rather high number of clusters, seven to be

precise. At first glance, it seems as if not much has changed, because some of the structures have

remained intact. The only cluster which completely disappeared is the cluster that consisted of only one

member, lust. This results in a final label set with joy, optimism, anger, sadness, disgust, fear and

surprise, which means that all six of Ekman’s emotion labels were preserved. Cluster 1 is formed by

contentment, enthusiasm and joy; cluster 2 consists of love, longing and optimism; cluster 3 covers

anger, frustration and irritation; cluster 4 groups together rejection and sadness; cluster 5 contains

disgust, suffering and torment; cluster 6 comprises fear and nervousness; and lastly cluster 7 compiles

surprise, disappointment and remorse.

A striking observation is that one of the original clusters, namely the sadness cluster, has been

split into two parts, each with two members. Rejection and sadness stay linked and remain in the sadness

cluster, while disappointment and remorse are now clustered together with surprise.

When we assess the polarity of the clusters, it is obvious that not much has changed. The one

cluster that disappeared was a positive emotion cluster, so now the distribution is five negative clusters

against only two positive clusters instead of three. Based on the distribution of polarity and the emotion

labels of the clusters, these results show a great resemblance to Ekman’s basic emotions.

4.2.4 Combined data

As already mentioned in our methodology, the three TV series for our research were chosen because of

their different topics. As part of this research, we also wanted to investigate whether emotion clusters

resulting from experimental cluster analysis are dependent on the topic of the data. To answer this sub-

question about domain adaptation, we combined the annotation results from the three TV series without

differentiating between the topics and used the complete dataset as input for a separate dendrogram. The

results from the cluster analysis are described in the following paragraphs.

Figure 11a: Initial dendrogram for the three TV series combined

Just like with BZL, BG and OV, we selected the dendrogram generated with Ward’s method and decided

on a cut-off value of 1.6. As can be seen in Figure 11a, this results in eight different clusters, which was

also the case for the topical cluster analyses. Cluster 1 contains enthusiasm, contentment and joy, cluster

2 lust, envy and longing, cluster 3 relief, enthrallment, love, optimism and pride, cluster 4 anger,

frustration and irritation, cluster 5 rejection, disappointment and sadness, cluster 6 remorse, surprise,

disgust and pity, cluster 7 suffering and torment, and finally cluster 8 contains fear and nervousness. The

preliminary label set for the combined data now consists of joy, longing, optimism, anger, sadness,

surprise, suffering and fear, which shows a fair resemblance to Ekman’s emotion set.

Figure 11a also clearly shows two distinct groups within the hierarchical structure. The first

three clusters are positive emotion clusters, while the last five clusters are negative emotion clusters.

Even though cluster 6 is labelled with the neutral surprise, it can be considered a negative cluster because

all other emotions in that cluster are negative. Compared to Ekman’s set, these results are more equally

divided in terms of polarity. Another interesting remark about this cluster is that it contains two of

Ekman’s basic emotions, namely surprise and disgust.

When looking at the members within the separate clusters, we can see that all members are

either exclusively positive or negative, except for the members of the second cluster. In this positive

cluster, lust and longing, two positive emotions, are grouped together with envy, a negative emotion. It

is now the third time that such an observation has been made, as this already occurred within two of the

topical cluster analyses, both times also involving longing and envy. This could possibly mean that these

two emotions often occur together, or that longing tends to be clustered together with negative emotions,

but this observation and the rest of the results are examined in more detail in the

discussion.

Figure 11b: Adapted dendrogram for the combined data without infrequent emotions

Figure 11b shows the adapted dendrogram for the dataset of all three TV series combined. As opposed

to the previous adapted dendrograms, for this dendrogram we removed the emotion categories that

occurred less than thirty times, instead of ten times. Considering the fact that the combined dataset is

approximately three times the size of the dataset per TV series, the threshold was tripled as well.
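The scaling of the threshold can be illustrated with the frequencies for joy and frustration reported in Section 4.1. The sketch below sums the per-series counts and multiplies the per-series threshold of ten by the number of TV series; the variable names are our own, and the other 23 categories are left out because their exact scores are not quoted in the text.

```python
from collections import Counter

# Frequencies for two categories as reported in Section 4.1.
per_series = {
    "BG": Counter(joy=57, frustration=9),
    "BZL": Counter(joy=23, frustration=47),
    "OV": Counter(joy=29, frustration=53),
}

# Adding Counters sums the counts per label (labels with a zero total
# would be dropped by this operation, which does not matter here).
combined = sum(per_series.values(), Counter())

per_series_threshold = 10
combined_threshold = per_series_threshold * len(per_series)

kept = {label: n for label, n in combined.items() if n >= combined_threshold}
```

Both categories reach a combined frequency of 109 and therefore remain well above the threshold of thirty, even though frustration fell below the threshold of ten in the BG dataset on its own.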

Similarly to the results for the domain-specific cluster analysis, the adapted dendrogram shows

fewer clusters than before. There are now only six clusters compared to the original eight. However,

when we take a closer look at the composition of the clusters, we see that none of the original clusters

completely disappeared. All clusters are still represented by at least one cluster member. Four of the

original clusters stayed intact, one was joined by a member of another cluster, and another one lost some

original members while gaining a new one. This results in these final six clusters: cluster 1 remained

intact with enthusiasm, contentment and joy; cluster 2 still contains love and optimism and was joined

by longing; cluster 3 also stayed intact with anger, frustration and irritation; cluster 4 remained the

same with fear and nervousness, but just shifted to another position in the hierarchical structure; cluster

5 still consists of the original members rejection, disappointment and sadness, and gained a new

member, namely suffering; and lastly cluster 6 is an original cluster as well, composed of remorse,

surprise, disgust and pity. This results in a final label set with joy, optimism, anger, fear, sadness and

surprise. With only one deviating emotion label (optimism instead of disgust, which is part of the

surprise cluster), this shows a great resemblance to Ekman’s basic emotions set.
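The resemblance to Ekman's basic emotions can be summarised as a set comparison. The sketch below uses the four final label sets reported in Sections 4.2.1 to 4.2.4; only the variable names are our own.

```python
EKMAN = {"anger", "disgust", "fear", "joy", "sadness", "surprise"}

# Final label sets of the adapted cluster analyses, as reported in the text.
final_labels = {
    "BZL": {"joy", "anger", "sadness", "suffering", "fear"},
    "BG": {"sadness", "surprise", "joy", "love"},
    "OV": {"joy", "optimism", "anger", "sadness", "disgust", "fear", "surprise"},
    "combined": {"joy", "optimism", "anger", "fear", "sadness", "surprise"},
}

# Labels shared with Ekman's set, and Ekman labels that are missing.
shared = {name: labels & EKMAN for name, labels in final_labels.items()}
missing = {name: EKMAN - labels for name, labels in final_labels.items()}
```

The overlap ranges from three shared labels for BG to all six for OV; for the combined data, disgust is the only Ekman label that does not appear as a cluster label.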

In terms of polarity, we can see that the positive emotions are still outnumbered in the final label

set. As the surprise cluster has not changed and so can be considered a negative cluster, this leaves four

negative emotion labels (anger, fear, sadness and surprise) against only two positive emotion labels

(joy and optimism). Compared to Ekman’s set, though, this is already an improvement. Just like in the

initial dendrogram for the combined data, these two polarities are again clearly separated in the

hierarchical structure. Contrary to the clusters in Figure 11a, all clusters now consist of all positive or

all negative emotions. The orange cluster, which contained members of different polarities, has been

split. Two of the members were removed when applying the threshold of thirty, and the one remaining

member longing, a positive emotion, was grouped together with the remaining members of the positive

optimism cluster.

5 DISCUSSION

In this part of the paper, we will discuss the results from our research, compare them to previous research,

and comment on certain characteristics. First of all, the results will be examined in more detail. The

most striking observations are selected for further analysis and will be presented per subject. We will

attempt to give an explanation for certain observations by linking the frequency analysis to the different

cluster analyses. Some other aspects of the data, such as topic and polarity, will be taken into account

as well. Furthermore, we will compare our outcome to the results from a similar study performed on

Dutch tweets (De Bruyne et al., 2019) and discuss some interesting similarities and differences in terms

of clustering, polarity and of course the final label set. Finally, we will briefly discuss the validity and

reliability of our research, as well as comment on the added value of this research in the field of natural

language processing (NLP).

5.1 Analysis of the results

A first interesting observation was made when analysing the frequency overview. The results showed

that BZL and OV often score rather high for a certain emotion while BG will score relatively low for

that same emotion category, or vice versa. This can be explained by the difference in topics. Even though

all three TV series showed both positive and negative emotions, during the transcription and annotation

task it appeared that the majority of the events shown in BZL and OV were predominantly negative,

while those in BG were found to be predominantly positive. We had expected to see an impact of this

in the results, and this expectation was met quite early in our analysis. This matter of different

predominant polarities is also reflected in the frequency tables per TV series: the top 3 and top 5 most

frequent emotions for BZL and OV contained solely negative emotions, whereas the top 3 and top 5 for

BG only contained a single negative emotion. The same goes for the emotion categories that were

indicated less than ten times, which were mostly positive for BZL and OV and mostly negative for BG.

When comparing the cluster analyses for the three TV series, we can see that the first cluster analysis

always resulted in a total of eight emotion clusters. Even the initial dendrogram for the combined data

revealed eight clusters. This is presumably just a coincidence related to the linkage method and cut-off

value that were selected. Even if we had chosen Ward’s method but decided on a different cut-off value,

it is very likely that the number of clusters would have varied. Besides that, a more relevant observation

regarding the clusters is that the first cluster analysis always resulted in clusters that were fairly equally

divided in terms of polarity. In fact, the initial distribution is three positive emotion clusters and five

negative emotion clusters for all three TV series and even for the combined data as well. When analysing

the distribution in polarity, we have to keep in mind that the emotion model used for this research

contains more negative than positive emotions (respectively fourteen against ten), and so it is more likely

that there would be fewer positive clusters than negative clusters. Logically though, this would only be

the case if positive emotions are primarily grouped together with other positive emotions (and vice

versa), and if the clusters are relatively similar in size. As we have mentioned in our literature study

(Section 2.2.3.4), clusters generated with Ward’s method are indeed usually fairly equal in size, so the

assumption made here can be considered plausible.
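This reasoning can be turned into a rough back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that the eight initial clusters are of equal size and contain members of a single polarity, and it leaves the neutral category surprise out of the count.

```python
# Category counts from the emotion model (surprise excluded as neutral).
n_positive, n_negative = 10, 14
n_clusters = 8  # number of clusters in each initial dendrogram

share_positive = n_positive / (n_positive + n_negative)

# Expected number of clusters per polarity under the stated assumptions.
expected_positive = n_clusters * share_positive
expected_negative = n_clusters - expected_positive
```

Under these simplifying assumptions, we would expect roughly 3.3 positive and 4.7 negative clusters, which is close to the three positive and five negative clusters observed in every initial dendrogram.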

The second cluster analysis, where the emotion categories with a frequency score lower than ten

were excluded, clearly shows that more discrepancies occurred between the TV series. First of all, the

number of final clusters differs. The eight original clusters were reduced to five final clusters for BZL

and only four for BG, while OV still retained a total of seven clusters. Examining the polarity of the

clusters, we can see that the fairly equal distribution from before has now shifted towards the negative

side, which corresponds to the unbalanced distribution in Ekman’s basic emotions set. Only one out of

five remaining clusters for BZL is positive, and for OV only two out of seven clusters are positive. The

final clusters for BG are the only exception, as the second dendrogram does not show a polarity shift

towards the negative side. In fact, the adapted clusters show a perfect balance of two positive and two

negative clusters.

This shift in polarity can be explained by the clusters that were removed, which is linked to the

frequency analysis. For BZL, the emotion categories that were indicated less than ten times were almost

all positive. Consequently, the two clusters of which all members completely disappeared (the orange

and yellow clusters in Figure 8a) were both positive, leaving only one positive cluster and so one positive

emotion label in the final label set. For OV, the only cluster that completely disappeared (the pink cluster

in Figure 10a) was a positive cluster. Even though it only consisted of one emotion category, this positive

cluster label was still removed from the final label set, increasing the imbalance between positive and

negative emotion labels even more. The other emotion categories that were indicated less than ten times

were all part of different clusters. So even after removing those categories, the final emotion label and

the polarity of the cluster were still represented by other members in that cluster. On the other hand, the

clusters that completely disappeared for the second cluster analysis of BG (the orange, green and purple

clusters in Figure 9a) were all negative, leaving only two negative emotions in the final label set. Then

again, the remaining members of two of the positive clusters merged into one cluster, decreasing the

total number of positive emotion labels from three to two, but causing the perfect balance between

positive and negative cases.

Additionally, there are some interesting remarks to be made about how the positive and negative clusters

are linked. If we take a look at both the initial and adapted dendrograms for BZL, we can see that there

is a clear separation between the positive and negative clusters. The initial dendrogram shows three

positive clusters grouped together on the left, and five negative clusters grouped together on the right.

The adapted dendrogram shows four negative clusters grouped together on the right, which are then

linked to the one positive cluster on the left. The same observation can be made with the adapted

dendrogram for OV: five negative clusters are grouped together on the right side of the dendrogram and

are then linked to the two positive clusters grouped together on the left side.

At first sight, it seems as if the adapted dendrogram for BG too shows a clear separation in terms

of polarity. However, when we take a closer look at the hierarchical structure, it becomes clear that

disregarding the highest linking line does not separate the two positive clusters from the two negative

clusters. First the two positive clusters on the right side are grouped, and then this positive group is

linked to a negative cluster. Lastly, the final negative cluster on the left is linked to this group of three

clusters with different polarities. While the polarities are not really mixed, this is still a significant

difference compared to some of the other dendrograms where the highest line is the only link between

positive and negative clusters.

When we look at the initial dendrogram for BG, though, we immediately notice that the initial

clustering does not result in two distinct groups of all positive clusters on one side and all negative

clusters on the other. The polarities are mixed: on the far right, we start with two positive clusters

grouped together. These are then linked to a negative cluster, and then this group is linked to another

group of two negative clusters. These five clusters (of which two are now positive and three negative)

are then linked with a positive cluster. Finally, this group of six clusters on the right is connected to a

group of two negative clusters on the left. A first hypothesis as to why the polarities are mixed in this

case is that certain utterances from BG may have been labelled with both positive and negative emotion

categories. A similar observation can be made with the initial dendrogram for OV, as this dendrogram

too shows a positive cluster intruding in the negative cluster group.

The clusters for OV show almost the exact same polarity distribution, and the initial dendrogram

for OV would have already achieved the perfect separation between positive and negative clusters that

can be seen in the adapted dendrogram, had it not been for the positive lust cluster amidst the negative

cluster group. This positive cluster is first grouped with a negative cluster, and then four more negative

clusters are linked to the group. Finally, the highest linking line connects this group of almost all negative

clusters on the right side of the dendrogram to a group of two positive clusters on the left.

Furthermore, we want to analyse some odd behaviour of certain emotion categories. A first emotion that

is worth discussing is surprise. As mentioned before, surprise was considered a neutral emotion, as it

can be either positive or negative depending on the context. Based on the topics of the TV series, we

made the hypothesis that surprise would be positive for BG, but negative for BZL and OV. Yet, when

we take a closer look at the emotion categories that were clustered together with surprise, we can see

that they are all negative for each of the TV series. For BZL, surprise is clustered with disgust, pity and

sadness. The dendrograms for BG show that the surprise cluster further consists of fear and nervousness.

For OV, surprise was first linked with pity, and then in the adapted dendrogram it was grouped with

disappointment and remorse. Even the combined data shows all other emotions in the surprise cluster,

namely remorse, disgust and pity, to be negative. From this observation, we can derive that surprise

often occurred in utterances with negative emotions, which might imply that it should be considered

a negative emotion as well. Our previous hypothesis on the polarity of surprise can therefore be rejected.

However, if we examine the annotations in more detail, there are some interesting results. We

have counted the number of times that each of the emotion categories was indicated with surprise, and

some of the observations actually oppose the cluster formation and instead support our initial hypothesis

on the polarity of surprise. As already mentioned, the surprise cluster for BZL further contained disgust,

pity and sadness, which corresponds to the annotations: those three emotions are indeed the labels with

which surprise was indicated the most for this topic (respectively 11, 9 and 11 times). For BG, though,

the emotion categories that were clustered together with surprise (fear and nervousness) were not the

most frequent emotions to be annotated with surprise. While fear was only indicated 3 times in

combination with surprise and nervousness only 8 times, contentment, enthusiasm and joy were each

indicated ten times or more. What is striking about this observation is that for BG the three most

indicated emotions in combination with surprise are all positive, meaning that surprise more frequently

occurred with positive emotions. This conflicts with what is shown in the emotion clusters for BG, as

all other members in the surprise cluster were negative, making surprise seem negative as well. If we

then take a closer look at the data for OV, we notice that the emotions that occurred the most with

surprise are frustration and nervousness, which were both indicated 6 times. Again, these emotion labels

do not match the members in the surprise clusters, as the initial dendrogram shows surprise in a cluster

with pity and the adapted dendrogram shows disappointment and remorse clustered together with

surprise. Nevertheless, seeing that frustration and nervousness are negative emotions just like pity,

disappointment and remorse, those alternative combinations would still result in surprise being

considered negative, and so for this topic our hypothesis on the polarity of surprise can be confirmed.

Overall, there was a clear difference between the number of co-occurrences with positive

emotions and with negative emotions. The frequency with which surprise was indicated in combination with

either positive or negative emotions is presented in Table 1 below. It becomes immediately clear that

surprise was more often indicated with negative emotions for both BZL and OV, while it was more

often indicated with positive emotions for BG. In relation to the polarity of surprise, this means that

surprise can be considered negative for two and positive for one of the topics. As opposed to the

clustering results, these observations from the annotation dataset correspond to our hypothesis on the

polarity of surprise.

Furthermore, these frequency scores also reveal another characteristic of surprise. In the entire

dataset, surprise was indicated a total of 58 times, but interestingly only occurred on its own twice. All

56 other times, it was indicated in combination with one or more emotion categories. In fact, almost all

emotion categories from our label set (22 of the 24 labels other than surprise) were indicated

at least once in combination with surprise. This implies that surprise is a versatile, ever-changing

emotion that can be paired with all sorts of emotions. On the other hand, it might also suggest that

surprise can often be considered as a trigger emotion followed by another more prominent ‘main’

emotion.

                    BZL   BG   OV
negative emotion     57   12   32
positive emotion     30   42    9
no other emotion      1    0    1

Table 1: Polarity combinations with surprise
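
A count of this kind can be obtained from multi-label annotations with a short script. The sketch below is illustrative only: it assumes that every label co-occurring with surprise is counted separately (one possible reading, which would explain why the counts in Table 1 add up to more than the 58 occurrences of surprise), the polarity mapping is a partial one based on the polarities mentioned in the text, and the toy annotations are hypothetical rather than taken from the dataset.

```python
from collections import Counter

# Partial polarity mapping, limited to labels whose polarity is stated in the text.
POLARITY = {
    "joy": "positive", "contentment": "positive", "enthusiasm": "positive",
    "fear": "negative", "nervousness": "negative", "disgust": "negative",
    "pity": "negative", "sadness": "negative",
}


def surprise_polarity_counts(annotations):
    """Count labels co-occurring with surprise, grouped by their polarity.

    `annotations` is a list of label sets, one per utterance. Each co-occurring
    label is counted separately, so one utterance can add more than one count.
    """
    counts = Counter()
    for labels in annotations:
        if "surprise" not in labels:
            continue
        others = labels - {"surprise"}
        if not others:
            counts["no other emotion"] += 1
        for label in others:
            counts[POLARITY[label] + " emotion"] += 1
    return counts


# Hypothetical toy annotations, not taken from the dataset.
toy_annotations = [
    {"surprise", "joy", "enthusiasm"},
    {"surprise", "fear"},
    {"surprise"},
    {"sadness", "pity"},
]

surprise_counts = surprise_polarity_counts(toy_annotations)
for row in ("negative emotion", "positive emotion", "no other emotion"):
    print(row, surprise_counts[row])
```

Running the same function once per TV series would yield one column of a table like Table 1.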

Another interesting emotion to examine is longing, which is one of the ten positive emotions in our label

set. When examining the other cluster members, though, we notice that longing was occasionally

grouped together with negative emotions. The initial dendrogram for BZL shows longing in the same

cluster as remorse and envy, two negative emotions. In the initial dendrogram for BG, longing again

shares a cluster with envy. It is only after the infrequent emotion categories for BG are removed that

longing becomes part of a positive cluster with enthrallment and love. The initial dendrogram for the

combined data also shows longing clustered together with envy, but then again it is grouped with the

positive emotion lust as well. This repeated link between longing and envy might indicate that these two

emotion categories often occurred together. Interestingly, when we take a look at the frequency scores,

we see that envy was always removed for the adapted cluster analysis, while this only happened once

with longing. This suggests that longing was more often indicated for an utterance containing envy, and

not the other way around. In other words, envy probably triggered the feeling of longing too, while

longing did not always trigger envy and so could be labelled on its own. This connection between envy

and longing can initially be supported by simply thinking about situations in which these two emotions

would occur together: if one is envious of someone or something, they would most likely long for that

specific situation to happen to them (Utterance 1). But when an individual longs for someone or

something, there is not always a specific person that they envy; it could just be a general feeling

(Utterance 2). To clarify, we present some supporting examples from the annotations dataset:

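Utterance 1: “Ik kom uit een euh een warm gezin. Mijn ouders zijn toch al heel lang getrouwd.

Tis een heel warm, goed koppel. Tzijn ook mensen die echt wel pro-huwelijk zijn. En zo wil ik

het ook wel voor mezelf.”

(English: “I come from, uh, a warm family. My parents have been married for a very long time. They are a very warm, good couple. They are also people who are really pro-marriage. And that is how I want it for myself as well.”)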

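Utterance 2: “In mijn leven is alles eigenlijk compleet. Kheb een toffe job, supertoffe vrienden

euh, lieve familie. Alles is er euh om gelukkig te zijn buiten da euh da één stukje da denk ik toch

alles wel compleet kan maken. Tis volgens mij wel leuker me twee.”

(English: “Everything in my life is actually complete. I have a great job, super great friends, uh, a sweet family. Everything is there, uh, to be happy, except that, uh, that one piece that I think could make everything complete. I think it is more fun with two.”)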

Utterance 1 was labelled with both envy and longing, while utterance 2 was only labelled with longing.

The speaker of utterance 1 mentions their parents and how they want to have a loving marriage like

theirs, which is the reason why not only longing but also envy was indicated for this utterance. With

utterance 2, on the other hand, the speaker merely says what they would like to experience and does not

mention a specific person they are envious of. That is why only longing was indicated for this utterance.

In addition, we can refer back to the annotations and frequency analysis to further support our findings.

The frequency overview (Figure 3) tells us how many times envy and longing were labelled for each of

the TV series. After further examining the annotations, we have come to the conclusion that for BZL

envy and longing were always indicated on their own. Envy was never indicated for OV, and so longing

was always labelled separately. For BG though, these two emotions also occurred together: apart from

the 11 times that longing was indicated by itself, it was also indicated all four times that envy was

labelled, meaning that envy only occurred in combination with longing. However, we should mention

that these two emotions do not occur together by default. Longing was

more frequently linked to other positive emotions such as love and optimism and was part of positive

emotion clusters too, which is shown in both dendrograms for OV and the general dataset, as well as the

adapted dendrogram for BG, which was already mentioned earlier.
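
The asymmetry between envy and longing for BG can be made explicit with a small worked calculation. The counts below are the ones reported above; the variable names are our own, and reading "indicated by itself" as "indicated without envy" is an assumption.

```python
from fractions import Fraction

# Counts for BG as reported in the text.
envy_total = 4             # envy was labelled four times
envy_with_longing = 4      # every envy label co-occurred with longing
longing_without_envy = 11  # assumption: "by itself" is read as "without envy"
longing_total = longing_without_envy + envy_with_longing

# Share of envy annotations that also carry longing, and vice versa.
share_of_envy_with_longing = Fraction(envy_with_longing, envy_total)
share_of_longing_with_envy = Fraction(envy_with_longing, longing_total)

print(f"envy co-occurring with longing: {float(share_of_envy_with_longing):.2f}")
print(f"longing co-occurring with envy: {float(share_of_longing_with_envy):.2f}")
```

Under these assumptions, envy is accompanied by longing in all four cases, whereas longing is accompanied by envy in only 4 of 15 cases, which is consistent with the one-directional link described above.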

One more striking observation is the continuous clustering of certain emotion categories. This occurred

for two groups of three emotions: the negative group of anger, frustration and irritation, as well as the

positive group of joy, contentment and enthusiasm.

For all three topical datasets and even the combined dataset, anger, frustration and irritation

were always clustered together. Utterances 3 and 4 below are examples from the annotations where all

three emotions were labelled at once. The only dendrogram in which this cluster does not appear is the

adapted dendrogram for BG, as those emotion categories were removed from the set for the second

cluster analysis. Both dendrograms for BZL show anger, frustration and irritation grouped together

with rejection and disappointment as well, but in all the other dendrograms, those three emotion

categories form a cluster on their own. After taking a closer look at the annotations, we counted 40

utterances where anger, frustration and irritation were indicated together. For 43 other utterances, one

of these three emotions was annotated in combination with another one of the three emotions.

Considering the fact that anger only appeared 57 times in the entire dataset, this tells us that anger

primarily occurred alongside other emotions.
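
Such co-occurrence counts can be derived from the annotations as follows. This is a minimal sketch with hypothetical toy data, and it assumes that "one of these three emotions annotated in combination with another one of the three" means that exactly two of the three labels are present.

```python
ANGER_GROUP = {"anger", "frustration", "irritation"}


def group_overlap(annotations, group=ANGER_GROUP):
    """Return (all_three, exactly_two): utterance counts by overlap with the group."""
    all_three = 0
    exactly_two = 0
    for labels in annotations:
        hits = len(group & labels)
        if hits == 3:
            all_three += 1
        elif hits == 2:
            exactly_two += 1
    return all_three, exactly_two


# Hypothetical toy annotations, one label set per utterance.
example_annotations = [
    {"anger", "frustration", "irritation"},
    {"anger", "irritation", "disappointment"},
    {"frustration"},
    {"joy", "contentment"},
]

print(group_overlap(example_annotations))
print(group_overlap(example_annotations, {"joy", "contentment", "enthusiasm"}))
```

The same function can be applied to any group of three labels by passing a different `group` argument.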

This remarkable pattern either suggests that anger, frustration and irritation regularly occur

together, or that the difference in meaning between the three is rather small. In fact, the latter explanation

is supported by certain emotion frameworks, as some do not even view these labels as separate emotions.

Basic emotion frameworks in particular often capture frustration and irritation under the umbrella term

anger. Take for example Ekman’s Atlas of Emotions (atlasofemotions.org), which is an educational tool

about the five most agreed on universal emotions that form the foundation for many emotion frameworks

(see Section 2.1.2.3). This model considers frustration as one of the states of anger, not as a separate

emotion. Shaver et al. (1987), and later also Parrott (2001), even classify frustration as a

tertiary and irritation as a secondary emotion of the primary emotion anger. In fact, many basic emotion

sets (Tomkins, 1984; Plutchik, 1980; Izard, 1971) are fairly limited and only include anger, whereas the

more extensive emotion frameworks (Shaver et al., 1987; Russell, 1980; Schröder, Pirker, & Lamolle,

2006) sometimes also include frustration and in some cases even irritation.

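Utterance 3: “Gij komt hier altijd met van die stomme flauwekul, gij. Kheb da nie nodig.”

(English: “You always come here with that kind of stupid nonsense, you. I don't need that.”)

Utterance 4: “(baas roept hurry up) - Jaaaaaaah. Gohhhh.”

(English: “(boss shouts hurry up) - Yeaaaaah. Gosh.”)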

Besides this group of negative emotions, joy, contentment and enthusiasm were always clustered

together as well, this time without any exceptions. The clustering of these positive emotions can be seen

in each and every one of the dendrograms, though not always as a separate cluster. Both dendrograms

for BG show this group of three being joined by two other cluster members, namely optimism and pride.

In the initial dendrogram for OV, only pride completes the joy cluster, but this emotion category

is then removed for the second clustering due to the frequency threshold of ten. We again examined the

annotations and counted a total of 55 instances where joy, enthusiasm and contentment were indicated

for the same utterance. This is about half of the number of times that each of these emotions occurred

(the total frequency score in the entire dataset is 109 for joy, 107 for contentment and 93 for enthusiasm).

In 55 other cases, two of these three emotions were annotated together. Utterances 5 and 6 are examples

of utterances for which all three emotion categories were indicated.

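Utterance 5: “Oh, top! Nee? Echt wel chill, eh?”

(English: “Oh, great! No? Really chill, eh?”)

Utterance 6: “Eindelijk ga ik vannacht in een goe bed slapen. Amai. Die flutmatrasjes, kga ze

nie missen. Voila.”

(English: “Finally I am going to sleep in a good bed tonight. Wow. Those lousy little mattresses, I am not going to miss them. There.”)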

A final aspect of the results that we want to examine is the spread of Ekman’s basic emotions.

Remarkable here is that almost all dendrograms show a cluster that contains multiple Ekman emotions.

The only structures that show the Ekman emotions in separate clusters are both

the initial and the adapted dendrogram for OV. All other topical datasets and the combined dataset as

well generate one particular cluster with two or even three Ekman emotions. The cluster for BZL

contains surprise, disgust and sadness from Ekman’s set. For BG, surprise and fear are clustered

together. Lastly, the combined dataset shows surprise and disgust in the same cluster. Interestingly,

these clusters remained intact in the adapted dendrograms. What immediately stands out when

examining these labels is that surprise is always included in the cluster with more than one Ekman

emotion. We think this can be explained by the variable nature of surprise. As we have already

mentioned, surprise can be either a positive or a negative emotion. Furthermore, surprise can be paired

with a variety of other emotions, as it does not often occur alone, but rather in combination with another

emotion category (see Table 1). Surprise can then be regarded as some sort of trigger emotion, followed

by another more prominent emotion. It has also come to our attention that the polarity of surprise then

depends on the polarity of the other emotions in that utterance: if for example surprise triggers the

positive emotion joy, then surprise itself will be considered a positive emotion as well. Utterance 7

presents an example where surprise triggered the emotions contentment, enthusiasm and joy. As these

are all positive emotions, for this utterance surprise would be considered a positive emotion as well.

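Utterance 7: “Man, man, man. Waar ben ik aan begonnen? - Ik ga trouwen! - Oh my god! - Ho

jongen, da meende nie? Toch nog vant straat. Proficiat. Nu gaat beginnen.”

(English: “Man, man, man. What have I got myself into? - I am getting married! - Oh my god! - Oh boy, you are not serious? Off the market after all. Congratulations. Now it is going to begin.”)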

5.2 Comparison to previous research

To evaluate the outcome of our research, we would like to compare our results to those for another

domain, as well as to an already existing and frequently used emotion framework. De Bruyne et al.

(2019) have conducted a frequency and cluster analysis similar to ours, but instead of TV series, they

used Dutch tweets as data. Their study resulted in an empirically grounded framework intended for

domain-specific automatic emotion detection on tweets. To further examine the resemblance of our

results to already existing frameworks, our final label set will also be compared to Ekman’s well-known

basic emotions set.

We decided to compare the findings of our analysis to those of De Bruyne et al. (2019), which

revealed a number of similarities. Firstly, even though they adopted a different linking method and cut-

off value, their first cluster analysis (Figure 12a) resulted in a total of 8 clusters. This was also the case

for all four of our datasets. While this is a rather superficial observation that is probably a mere

coincidence, it remains a first noticeable similarity between both studies.

Figure 12a: Initial dendrogram for tweets (De Bruyne et al., 2019)

Figure 12b: Adapted dendrogram for tweets without infrequent emotions (De Bruyne et al., 2019)

When we take a closer look at the dendrograms for tweets, we see that some of our striking observations

presented in the section above recur in this study as well. One of the remarkable things all results have

in common, is the repeated clustering of two different groups: anger, frustration and irritation on the

one hand, and joy, contentment and enthusiasm on the other hand. As we have pointed out in our analysis

above, these two groups mostly occurred on their own, but sometimes also shared a cluster with other

emotion categories. In the dendrograms for tweets, however, only the latter can be observed: in both

dendrograms, the negative cluster with anger, frustration and irritation also contains disgust and

disappointment, and the positive cluster with joy, contentment and enthusiasm is completed by pride.

Interestingly, disappointment and pride belong to the selection of additional emotions that in some cases

were clustered together with one of the two groups in our research as well.

Furthermore, we have noticed that in both the initial and adapted dendrogram for tweets,

surprise is clustered with solely negative emotions, namely pity and sadness. As mentioned in the

previous section, the results for reality TV also showed all other members in the surprise cluster to be

negative. Some of those negative emotion categories were pity and sadness as well, among others.

Another situation that recurs is the incorporation of multiple Ekman emotions in one and the

same cluster. With our results, we saw this happening for three of the four datasets, and each time the

emotion category surprise was involved. This is not very different from what can be seen in the

dendrograms for tweets: the two dendrograms show both surprise and sadness in one cluster. However,

these are not the only Ekman emotions that are clustered together. One of the other clusters also shows

anger and disgust grouped together. Although one of our clusters contained no fewer than three Ekman

emotions, none of our dendrograms had more than one cluster containing multiple Ekman emotions,

whereas both dendrograms for tweets do, which makes this a notable difference.

A final similarity can be found in the frequency analyses of both studies. The most frequent

emotions for tweets actually show a great resemblance to the most frequent emotions for one of our

topical datasets, namely the one for BG. This observation again involves the emotion categories joy,

contentment and enthusiasm: in both frequency overviews, we can see that the two most frequent

emotions are contentment and joy. The third most frequent emotion in the tweets dataset is enthusiasm,

which happens to be the fourth most frequent emotion for BG.

In relation to emotion labels, our research on TV series transcriptions has produced a total of four label

sets: three are topic-specific, one for each TV series, while the fourth is rather

general. In theory, the general label set should resemble Ekman’s emotion set the most, seeing that both

frameworks are intended for all sorts of broad topics. Table 2 below gives an overview of the different

emotion frameworks for comparison. The emotion categories which are not part of Ekman’s set are

underlined.

Set                               Number of emotions   Emotion labels
BZL (topic 1)                     5                    anger, fear, joy, sadness, _suffering_
BG (topic 2)                      4                    joy, _love_, sadness, surprise
OV (topic 3)                      7                    anger, disgust, fear, joy, _optimism_, sadness, surprise
BZL+BG+OV (general)               6                    anger, fear, joy, _optimism_, sadness, surprise
Basic emotions (Ekman, 1992)      6                    anger, disgust, fear, joy, sadness, surprise
Tweets (De Bruyne et al., 2019)   5                    anger, joy, _love_, _nervousness_, sadness

Table 2: Emotion sets

With the total number of emotions ranging between four and seven, and the majority of the emotion labels

being Ekman emotions, it is clear that all sets show a fair resemblance to one another. However, if we

compare our label sets to the two other frameworks, it appears that our sets are more similar to Ekman’s

basic emotions set than to the label set for tweets. This difference between the label sets for reality TV

and the one for tweets confirms that the content of both genres clearly differs, which implies that it might

be beneficial to modify the label set according to the topic or genre of the data.

Looking at the final labels for the topics from our research, we notice that there is always only one

emotion category which is not part of Ekman’s set. Interestingly, the label set for OV even incorporates

all of Ekman’s emotions, with the addition of optimism. Our general label set shows all Ekman emotions

but one, disgust, which has been replaced by optimism. With only one discrepancy each, these two label

sets show the greatest resemblance to Ekman’s set. And with only one difference between the two sets,

which is caused by whether or not disgust is incorporated, our general label set and the one for OV are

the most similar to one another. By contrast, the label set for BG deviates the most from Ekman’s set:

only half of his basic emotions remained, and another emotion category, love, was added.

When we examine the frequency of each of the Ekman emotions in Table 2, a first striking

observation is that joy and sadness appear in all of the label sets. These two most frequent emotions are

then followed by anger, which appears in five of the six label sets as it is not incorporated in the set for

BG. Fear is also not incorporated in the set for BG, but neither in the label set for tweets, which brings

fear to a total of four appearances. This is the same frequency score as for surprise, as surprise does not

appear in the label set for BZL, nor in the set for tweets. The last Ekman emotion to examine, and the

one which appears the least in the table above, is disgust. Apart from Ekman’s set, disgust only appears

once, namely in our label set for OV.

When it comes to the non-Ekman emotions mentioned in the overview, we can see that love and

optimism are the most popular emotions with two appearances each. Love is part of our label set for BG,

but also appears in the framework for tweets. Remarkably, optimism even appears twice in our label sets

alone, in both the label set for OV and the general label set. The other non-Ekman emotion that appears

in one of our label sets is suffering, which is part of the set for BZL. On the other hand, the label set for

tweets also shows nervousness as a final label, but this emotion was not incorporated in any of our label

sets.
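
The comparison with Ekman's set described in the preceding paragraphs can be reproduced with basic set operations. The sketch below uses the label sets from Table 2; the variable names are our own, and the script is meant only as an illustration of how the appearance counts and the added or missing labels can be derived.

```python
from collections import Counter

EKMAN = {"anger", "disgust", "fear", "joy", "sadness", "surprise"}

# Label sets as listed in Table 2.
LABEL_SETS = {
    "BZL": {"anger", "fear", "joy", "sadness", "suffering"},
    "BG": {"joy", "love", "sadness", "surprise"},
    "OV": {"anger", "disgust", "fear", "joy", "optimism", "sadness", "surprise"},
    "general": {"anger", "fear", "joy", "optimism", "sadness", "surprise"},
    "Ekman": EKMAN,
    "tweets": {"anger", "joy", "love", "nervousness", "sadness"},
}

# Number of label sets (out of six) in which each label appears.
appearances = Counter(label for labels in LABEL_SETS.values() for label in labels)

for name, labels in LABEL_SETS.items():
    added = sorted(labels - EKMAN)    # labels that are not part of Ekman's set
    missing = sorted(EKMAN - labels)  # Ekman emotions absent from this set
    print(f"{name}: added={added}, missing={missing}")

for label in sorted(appearances):
    print(label, appearances[label])
```

For the OV set, for instance, the only added label is optimism and no Ekman emotion is missing, whereas the general set lacks disgust.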

These observations, especially the similarity to Ekman’s set, suggest that Ekman’s emotions can

indeed be considered the most basic emotions that occur the most often and so some, if not all, should

therefore be incorporated in all label sets. Additionally, the non-Ekman emotions that complete the label

sets are found to give a more accurate indication of the specific data topics. For our research, this means

that those non-Ekman emotions summarise the main theme of the TV series by indicating the overall

emotional feeling. In short, we can infer that Ekman’s basic emotions framework can form a good

foundation for emotion label sets, but should be supplemented by other emotions that fit the content of

the data better.

5.3 Validity, reliability and added value

In this last section of the discussion, we want to reflect on the set-up of our study. It is important to

critically evaluate the process, as there are of course certain limitations to our study. First of all, the

annotation task was performed by one individual and was not reviewed by peers afterwards. Because of

this, it is possible that mistakes or odd annotations went unnoticed. As the annotations are at the root

of this study and were used as input for the frequency and cluster analysis, this may have influenced the

outcome. However, we are confident that such mistakes were minimised: clear guidelines based on inter-annotator agreement (IAA)

studies [4] were provided, and the annotator in question was already familiar with this type of work.

Secondly, emotion annotation remains quite a subjective matter. Even with guidelines

containing a clear explanation and several subcategories for each of the emotions, the labelling of

emotional content still relies on the intuition of the annotator. Results may differ depending on the

individual performing the task.

Thirdly, the utterances were not selected at random from the transcriptions. The 450

utterances for the subset were selected based on their emotional content, and utterances without

emotions were left out. This decision was made with the purpose of our study in mind, not to tamper

with the data or the results. Including objective utterances with no emotional content would simply be

of little to no use, as emotional content is exactly what we are trying to study. That is why the subset

should not be considered as a false representation of the topics, but rather a targeted selection focusing

on the emotional value.

[4] The study on inter-annotator agreement is presented in section 3 of De Bruyne et al. (2019); see

https://www.lt3.ugent.be/publications/towards-an-empirically-grounded-framework-for-emot/ or

https://www.thinkmind.org/index.php?view=article&articleid=huso_2019_1_30_88038.

Lastly, when choosing the umbrella terms for the clusters, priority was given to the Ekman

emotions, even if the cluster contained a non-Ekman emotion category with a higher frequency. We are

aware that this might bias the final label sets towards a greater resemblance

with Ekman’s emotion set. However, as our intention was to compare the label sets from this study to

already existing frameworks, we decided to select the emotions that were incorporated in a well-known

and frequently used label set. If we had chosen another emotion category for the umbrella term, the

Ekman emotions in a cluster would still be represented by that other emotion label. For this reason, we

did not see any problem in favouring the Ekman emotions for the umbrella terms, as it would only

facilitate the comparison process and make the differences more distinct.

Overall, we are of the opinion that we have presented a valid study. Considering the limitations of this

study, our dataset of TV series transcriptions was extensive enough to give a good representation of the

different topics, and the label set we selected was broad enough to first cover a considerable share of the

emotion spectrum and then be narrowed down to a more limited set. While emotion annotation remains

a subjective task, we tried to maximise the objectivity and consistency of the annotations by providing

the annotator with clear guidelines. Our decisions during the research process were not based on personal

preference, but were made with the intention to achieve the best possible outcome.

For future work, it might be interesting to conduct a similar study with a bigger dataset to get a more

accurate representation of the predominant emotions in reality TV. As our research process can easily

be replicated, we also encourage researchers to apply this approach to various other genres and topics

in order to establish frameworks which are modified for specific types of data. This type of research

certainly has an added value in the field of NLP, as it provides empirically grounded emotion

frameworks. With a domain-specific label set, models can be trained more adequately for

automatic emotion detection and annotation in line with the content and purpose of the data. After all, some

existing frameworks have their origins in psychological research rather than

linguistics and are therefore not intended for specific language-related purposes.

6 CONCLUSION

Due to its many applications, emotion detection has become a popular research topic in the field of

natural language processing. While the majority of emotion frameworks have their origins in

psychology, many linguistic researchers still borrow those frameworks without any justification other

than the fact that they contain the most basic emotions. As there is no standard framework available, the

goal of this study was to provide a theoretically and empirically grounded framework for emotion

detection on Dutch data.

We used an extensive emotion framework consisting of 25 emotion categories to label 450 utterances

from Flemish reality TV transcriptions. This corpus of transcriptions incorporated utterances from three

TV series representing three different topics, with a selection of approximately 150 utterances per TV

series. Subsequently, we conducted a frequency and cluster analysis using those annotations, which

resulted in three topical label sets, as well as one general label set for this particular domain of reality

TV. This result immediately provides an affirmative answer to our first and main research question of

whether it is possible to deduce a label set from experimental cluster analysis, confirming our

hypothesis.

For the dataset, we selected three reality TV series with different topics so that we could not

only compare domains (reality TV versus tweets), but also different topics within the same domain. As

we had expected, the clusters and accordingly the final labels clearly differed depending on the topic:

The label set for the first topic (Bloed, Zweet en Luxeproblemen) consists of the five emotions joy, anger,

sadness, suffering and fear; the label set for the second topic (Blind Getrouwd) contains the four

emotions sadness, surprise, joy and love; finally the label set for the third topic (Ooit Vrij) comprises

the seven emotions joy, optimism, anger, sadness, disgust, fear and surprise. Moreover, the label set for

the general dataset, in which all three topics were combined without distinction, also differs from each of

the topical label sets, as it consists of the six emotions joy, optimism, anger, fear, sadness and surprise.
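The differences between these label sets can be made explicit with a short sketch. The label sets themselves are those reported above; the variable names and the use of a set-based Dice overlap as a comparison measure are assumptions made for illustration and are not part of the original analysis.

```python
# Label sets as reported in this conclusion.
label_sets = {
    "Bloed, Zweet en Luxeproblemen": {"joy", "anger", "sadness", "suffering", "fear"},
    "Blind Getrouwd": {"sadness", "surprise", "joy", "love"},
    "Ooit Vrij": {"joy", "optimism", "anger", "sadness", "disgust", "fear", "surprise"},
}
general = {"joy", "optimism", "anger", "fear", "sadness", "surprise"}


def dice_overlap(a, b):
    """Set-based Dice overlap: 2 * |A and B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b))


# Labels that occur in every topical label set.
shared = set.intersection(*label_sets.values())

# Labels of each topical set that are absent from the general set.
topic_specific = {name: labels - general for name, labels in label_sets.items()}

# Overlap of each topical set with the general set (hypothetical measure).
overlap_with_general = {
    name: dice_overlap(labels, general) for name, labels in label_sets.items()
}
```

On these sets, joy and sadness are the only labels shared by all three topics, and each topical set contains exactly one label that is absent from the general set (suffering, love and disgust, respectively).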

In our discussion, we described the comparison of our label sets and clusters for reality TV to

those for tweets (De Bruyne et al., 2019). Although there were some similarities, such as the repeated

clustering of certain emotion categories, there was no clear match between the clusters for the two

domains. The label sets also had some emotion categories in common, but still varied too much for a

clear pattern to emerge. This suggests that the domain is crucial when it comes to deciding which emotion

categories need to be included in the label set.

As our research revealed, it is very likely that the majority of label sets consist of basic emotions

such as those from Ekman’s model (1992). At the end of our discussion, we argued that basic

emotions form a good basis for emotion frameworks, but should be supplemented with other

emotions that are specifically adapted to the content of the data (depending on the domain and/or topic).

To briefly answer our last subquestion: basic emotions are indeed a good foundation for emotion

frameworks, but basic emotions alone do not suffice.

In conclusion, first conducting a cluster analysis is a good way to motivate the choice of emotion labels

for a given dataset, as emotion clusters differ depending on the topic and domain. As our approach was

data-driven, the final label sets for emotion detection are motivated both theoretically and empirically

rather than selected arbitrarily, and can therefore be expected to perform better in the specific context they are

intended for.

References

Airapetian, A. (2019). Emotion analysis in reality tv: A comparison between emotion annotations for

text and image. (Bachelor’s thesis). Ghent University, Belgium.

Bakliwal, A., Arora, P., & Varma, V. K. (2012). Entity centric opinion mining from blogs. In

Proceedings of the 2nd Workshop on Sentiment Analysis where AI meets Psychology, 53–64.

Retrieved from https://semanticscholar.org

Bakliwal, A., Arora, P., Madhappan, S., Kapre, N., Singh, M., & Varma, V. K. (2012). Mining

sentiments from tweets. In Proceedings of the 3rd Workshop in Computational Approaches to

Subjectivity and Sentiment Analysis, 11-18. Retrieved from https://semanticscholar.org

Blashfield, R. K., & Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In

J. R. Nesselroade, & R. B. Cattell (Eds.), Perspectives on individual differences. Handbook

of multivariate experimental psychology, 447–473. Plenum Press. doi:10.1007/978-1-4613-0893-5_14

Buechel, S., & Hahn, U. (2016). Emotion analysis as a regression problem — Dimensional models

and their implications on emotion representation and metrical evaluation. In ECAI 2016: 22nd

European Conference on Artificial Intelligence, 1114-1122. doi:10.3233/978-1-61499-672-9-1114

Darwin, C. R. (1872). The expression of the emotions in man and animals. London: John Murray.

Retrieved from http://darwin-online.org.uk

Daumé III, H., & Marcu, D. (2006). Domain adaptation for statistical classifiers. Journal of Artificial

Intelligence Research, 26, 101–126. Retrieved from https://semanticscholar.org

De Bruyne, L., De Clercq, O., & Hoste, V. (2019). Towards an empirically grounded framework for

emotion analysis. In Proceedings of HUSO 2019, The fifth international conference on human

and social analytics, 11–16. Presented at the HUSO 2019: The Fifth International Conference

on Human and Social Analytics, IARIA, International Academy, Research, and Industry

Association. Retrieved from https://biblio.ugent.be/publication/8624200

Dice, L. (1945). Measures of the amount of ecologic association between species. Ecology, 26(3),

297-302. doi:10.2307/1932409

Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3/4), 169-200.

doi:10.1080/02699939208411068

Ekman, P. (1997). Emotion families. In Semiotics around the World: Synthesis in Diversity, 191-193.

Berlin: Mouton de Gruyter. Retrieved from https://www.paulekman.com

Ekman, P. (1999). Basic emotions. In Handbook of Cognition and Emotion, 45-60. New York: John

Wiley & Sons Ltd. doi:10.1002/0470013494.ch3

Ekman, P., & Friesen, W. V. (1971). Constants across cultures in the face and emotion. Journal of

Personality and Social Psychology, 17(2), 124-129. doi:10.1037/h0030377

Ekman, P., Friesen, W. V., & Ellsworth, P. (1972). Emotion in the human face: Guidelines for

research and an integration of findings. New York: Pergamon Press. doi:10.1016/C2013-0-02458-9

Fang, X., & Zhan, J. Z. (2015). Sentiment analysis using product review data. Journal of Big Data,

2(5), 1-14. doi:10.1186/s40537-015-0015-2

Fisher, L., & Van Ness, J. (1971). Admissible Clustering Procedures. Biometrika, 58(1), 91-104.

doi:10.2307/2334320

Frijda, N. H. (1988). The laws of emotion. American Psychologist, 43(5), 349-358.

doi:10.1037/0003-066X.43.5.349

Ghent University (2020). EmotioNL: Emotion detection for Dutch. Retrieved from

https://research.flw.ugent.be/en/projects/emotionl-emotion-detection-dutch

Glorot, X., Bordes, A., & Bengio, Y. (2011). Domain adaptation for large-scale sentiment

classification: a deep learning approach. In Proceedings of the 28th International Conference

on Machine Learning (ICML’11), 513–520. Retrieved from

https://semanticscholar.org

Godbole, N., Srinivasaiah, M., & Skiena, S. (2007). Large-scale sentiment analysis for news and

blogs. In Proceedings of the International Conference on Weblogs and Social Media

(ICWSM’2007). Retrieved from https://semanticscholar.org

Houbregs, J. (Writer), & Belien, S. (Director). (2018). Bloed, Zweet en Luxeproblemen [Television

series]. In J. Houbregs (Executive Producer). Zaventem: Warner Bros. ITVP België.

Izard, C. E. (1971). The face of emotion. Appleton-Century-Crofts.

Izard, C. E. (1991). The psychology of emotions. New York: Springer. doi:10.1007/978-1-4899-0615-1

Lazar, C. (2012). Cluster analysis [PowerPoint slides]. Retrieved from

https://ai.vub.ac.be/sites/default/files/lecturemaster2011.pdf

Mohammad, S. M. (2016). Sentiment analysis: Detecting valence, emotions, and other affectual states

from text. In Emotion Measurement, 201-237. doi:10.1016/b978-0-08-100508-8.00009-6

Parrott, W. G. (2001). Emotions in social psychology: Essential readings. Philadelphia: Psychology

Press. Retrieved from http://books.google.be/books

Paul Ekman International. (2018, April 24). A brief look into Dr. Paul Ekman's early research.

Retrieved May 2019, from https://www.ekmaninternational.com/a-brief-history-into-paul-ekmans-early-research/

Plutchik, R. (1962). The emotions: Facts, theories and a new model. New York: Random House.

Retrieved from https://archive.org

Plutchik, R. (1980). A general psychoevolutionary theory of emotion. Emotion: Theory, Research, and

Experience, 1(3), 3-33. doi:10.1016/B978-0-12-558701-3.50007-7

Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology,

39(6), 1161–1178. doi:10.1037/h0077714

Russell, J., & Mehrabian, A. (1977). Evidence for a three-factor theory of emotions. Journal of

Research in Personality, 11(3), 273-294. doi:10.1016/0092-6566(77)90037-X

Schröder, M., Pirker, H., & Lamolle, M. (2006). First suggestions for an emotion annotation and

representation language. In Proceedings of LREC, 88-92. Retrieved from

https://www.academia.edu/

Shaver, P., Schwartz, J., Kirson, D., & O'Connor, C. (1987). Emotion knowledge: Further exploration

of a prototype approach. Journal of Personality and Social Psychology, 52(6), 1061–1086.

doi:10.1037/0022-3514.52.6.1061

Sneath, P. H. A., & Sokal, R. R. (1973). Numerical Taxonomy: The Principles and Practice of

Numerical Classification. San Francisco: Freeman.

The Ekmans’ Atlas of Emotions. (n.d.). Retrieved May 2020, from http://atlasofemotions.org/

Thet, T. T., Na, J., & Khoo, C. S. (2010). Aspect-based sentiment analysis of movie reviews on

discussion boards. Journal of Information Science, 36(6), 823-848.

doi:10.1177/0165551510388123

Tomkins, S. S. (1962). Affect, imagery, consciousness. Volume I: The positive affects. New York:

Springer. Retrieved from http://books.google.be/books

Tomkins, S. S. (1984). Affect theory. In K. R. Scherer, & P. Ekman (Eds.), Approaches to Emotion,

163-195. Lawrence Erlbaum Associates.

Uytterhoeven, T. (Director). (2019). Ooit Vrij [Television series]. In I. Colpaert (Producer).

Vilvoorde: Woestijnvis.

Van Hoecke, E. (Director). (2019). Blind Getrouwd [Television series]. In M. Miller (Producer), L.

Lombaert (Executive Producer). Vilvoorde: Productie PIT, Antwerpen: DPG Media.

Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the

American Statistical Association, 58(301), 236-244. doi:10.2307/2282967

Wood, I. D., McCrae, J. P., Andryushechkin, V., & Buitelaar, P. (2018). A comparison of emotion

annotation approaches for text. Information, 9(5), 117. doi:10.3390/info9050117

Appendices

Appendix 1 (electronic): Annotations